pydiodon.mds

pydiodon.mds(dis, k=- 1, meth='svd', no_loop=True)[source]

Multidimensional Scaling of a distance or dissimilarity array

Parameters
disnumpy array, n x n

distances or dissimilarities

kinteger

number of axis

methstring

method for MDS (see notes)

no_loopboolean

see notes

Returns
X2D numpy array,

n x k, coordinates of points

S1D numpy array

k values, singular values |

Notes

Objective: builds a point cloud where one point represents an item, such that the discrepancies beween distances in a matrix dis and between associated points in \(R^k\) is as small as possible. It is classical MDS, not Least Square Scaling.

Procedure: There are three steps
1. computes the Gram matrix G between items from distance matrix dis
2. runs a SVD or EVD of G
3. computes the coordinates from the SVD (or EVD) in X
argument no_loop : it is used in the computation of the Gram matrix:
- if n is large, no_loop=False is recommended
- if not, no_loop=True is recommended (default value)
methods for MDS: the argument meth specifies which method is selected for the core of MDS. Default value is svd. Let G be the Gram matrix associated to the distance array.
- if meth=svd, the a SVD of G is run
- if meth=grp, the SVD is run with Gaussian Random Projection
- if meth=evd, the eigenvalues and eigenvectors of G are computed
Here are some suggestions for selection of the method:
- if n is not too large (\(n < 10,000\)), svd method is recommended
- if n is large, grp method is recommended, and compulsory if n is very large
- if k is small, \(k \simeq 10\), evd method is recommended.
computations : EVD or SVD of the Gram matrix are two equivalent ways to compute a solution. Let G be the Gram matrix. They are linked by
- EVD: \(GU = US\) ; U is the matrix of columnwise eigenvectors of G and S the diagonal matrix of its eigenvalues
- SVD: \(G = USU'\) is the SVD of G (which is symmetric, U’ is the transpose of U)
Then, in both cases, if X is the matrix of coordinates of n items in \(R^r\), \(X = US^{1/2}\)

Examples

This is a first toy example for running a MDS.

First, building a Euclidean distance matrix.

>>> import pydiodon as dio
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import scipy.spatial.distance as ssd # for computing distances
>>> # creates a m x n point cloud
>>> m = 10 ; n = 4
>>> A = np.random.randn(m,n)
>>> # computes the pairwise distances
>>> D = ssd.pdist(A)
>>> D = ssd.squareform(D)       # distance matrix in square form

Second, run the MDS

>>> # runs the MDS
>>> X, S = dio.mds(D)

Third, interpret the quality and displays the result

>>> # plotting the quality
>>> plt.plot(S, color="red")
>>> plt.title("Eigenvalues")
>>> plt.show()
>>> # plotting the point cloud
>>> F1 = X[:,0] ; F2 = X[:,1]
>>> plt.scatter(F1, F2, c="chartreuse", s=20)
>>> plt.xlabel("axis 1")
>>> plt.ylabel("axis 2")
>>> plt.show()

This example is in mds_xmpl_1.py

Here is now a real life example.

>>> import pydiodon as dio
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> #
>>> rank = 200
>>> filename = "guiana_trees.dissw"
>>> Dis, rn, cn = dio.load(filename, rownames=True, colnames=True)
>>> X, S = dio.mds(Dis, k=rank)

This is focused on running the MDS of Dis . Here is a complementary example for visualizing the results.

>>> # plotting the singular values
>>> plt.plot(S, color="red")
>>> plt.title("Quality along the axis")
>>> plt.show()
>>> # plotting the point cloud, axis 1 and 2
>>> F1 = X[:,0]; F2 = X[:,1]; F3 = X[:,2]
>>> plt.scatter(F1,F2,c="blue", s=5)
>>> plt.xlabel("axis 1")
>>> plt.ylabel("axis 2")
>>> plt.title("data set: Guiana trees")
>>> plt.show()

This example is in mds_xmpl_guiana_trees.py

References

[1] T.F. Cox and M. A. A. Cox. Multidimensional Scaling - Second edition, volume 88 of Monographs on Statistics and Applied Probability. Chapman & al., 2001.
[2] I. Borg and P. J. F. Groenen. Modern Multidimensional Scaling. Springer Series in Statistics. Springer, second edition, 2005.
[3] K. V. Mardia, J.T. Kent, and J. M. Bibby. Multivariate Analysis. Probability and Mathematical Statistics. Academic Press, 1979.

Diodon notes: section 10

07/11/2017 - revised 21.03.20, 22.10.13