In this problem we are going to analyze the famous color similarities data
with the same model we used in problem (1). We assume that the
observed distances have a log-normal distribution. We are also
going to take a Bayesian approach and state prior distributions for
all of the parameters of our model. Hence, rather than minimizing
a squared-error loss function we are going to maximize the
log-likelihood of a posterior distribution. We are going to use a version of
metric_mds_nations_ussr_usa.r
which we used in problem (1) to find
the coordinates and the variance term that maximizes
the log-likelihood of the posterior distribution.
Download the R program:
metric_mds_colors_log_normal.r
-- Program to optimize the Log-Normal Bayesian Model for the Colors Similarities data.
colors.txt
-- Colors Data used by metric_mds_colors_log_normal.r
Here is the function that computes the log-likelihood:
fr <- function(zzz){
#
xx[1:8] <- zzz[1:8] There are 26 parameters: 14*2=28-3+1=26
xx[9:10] <- 0.0 The 14 points in 2 dimensions = 28 coordinates
xx[11:19] <- zzz[9:17] The 5th point -- 490 Angstroms is the origin and
xx[20] <- 0.0 the 2nd Dimension of 600 Angstroms is set to 0.0
xx[21:28] <- zzz[18:25] Hence, 25 coordinates plus the variance term = 26 parameters
sigmasq <- zzz[26]
kappasq <- 100.0
sumzsquare <- sum(zzz[1:25]**2)
#
i <- 1
sse <- 0.0
while (i <= np) {
ii <- 1
while (ii <= i) {
j <- 1
dist_i_ii <- 0.0
while (j <= ndim) {
#
# Calculate distance between points
#
dist_i_ii <- dist_i_ii + (xx[ndim*(i-1)+j]-xx[ndim*(ii-1)+j])*(xx[ndim*(i-1)+j]-xx[ndim*(ii-1)+j])
j <- j + 1
}
if (i != ii)sse <- sse + (log(sqrt(TT[i,ii])) - log(sqrt(dist_i_ii)))**2
ii <- ii + 1
}
i <- i + 1
}
sse <- (((np*(np-1))/2.0)/2.0)*log(sigmasq)+(1/(2*sigmasq))*sse+(1/(2*kappasq))*sumzsquare
return(sse)
}
The number of parameters is equal to 26 (nparam=26) and further
down in the program before the optimizers are called zzz[26] is
initialized to 0.2. This program also has much more efficient code for putting the coordinates estimated by the
optimizers back into a matrix for plotting purposes:#
# Put Coordinates into matrix for plot purposes
#
xmetric <- rep(0,np*ndim)
dim(xmetric) <- c(np,ndim)
zdummy <- NULL
zdummy[1:(np*ndim)] <- 0 Note the parentheses around "np*ndim" -- R does weird things if you write [1:np*ndim]
zdummy[1:8] <- rhomax[1:8]
zdummy[11:19] <- rhomax[9:17] I transfer the coordinates from the optimizer back into a vector np*ndim long with the points stacked
zdummy[21:28] <- rhomax[18:25]
#
i <- 1
while (i <= np)
{
j <- 1
while (j <= ndim)
{
xmetric[i,j] <- zdummy[ndim*(i-1)+j] Now it is easy to put the coordinates back into an np by ndim matrix
j <- j +1
}
i <- i + 1
}
- Run the program and turn in a
NEATLY FORMATTED plot of the colors.
- Report xfit and rfit and a NEATLY
FORMATTED listing of the results table.
- Add this code fragment from metric_mds_nations_log_normal_2.r:
#
InvCheck <- xhessian%*%XXInv
checkinverse <- sum(abs(InvCheck-diag(nparam)))
#
and report checkinverse.
- Add the code to produce a graph of the eigenvalues of the
Hessian matrix. Turn in NEATLY FORMATTED
plot with appropriate labeling.