pydiodon.mds¶
- pydiodon.mds(dis, k=- 1, meth='svd', no_loop=True)[source]¶
Multidimensional Scaling of a distance or dissimilarity array
- Parameters
- disnumpy array, n x n
distances or dissimilarities
- kinteger
number of axis
- methstring
method for MDS (see notes)
- no_loopboolean
see notes
- Returns
- X2D numpy array,
n x k, coordinates of points
- S1D numpy array
k values, singular values |
Notes
Objective: builds a point cloud where one point represents an item, such that the discrepancies beween distances in a matrix dis and between associated points in \(R^k\) is as small as possible. It is classical MDS, not Least Square Scaling.
Procedure: There are three steps1. computes the Gram matrix G between items from distance matrix dis2. runs a SVD or EVD of G3. computes the coordinates from the SVD (or EVD) in Xargument no_loop : it is used in the computation of the Gram matrix:- if n is large,no_loop=False
is recommended- if not,no_loop=True
is recommended (default value)methods for MDS: the argumentmeth
specifies which method is selected for the core of MDS. Default value issvd
. Let G be the Gram matrix associated to the distance array.- ifmeth=svd
, the a SVD of G is run- ifmeth=grp
, the SVD is run with Gaussian Random Projection- ifmeth=evd
, the eigenvalues and eigenvectors of G are computedHere are some suggestions for selection of the method:- if n is not too large (\(n < 10,000\)),svd
method is recommended- if n is large,grp
method is recommended, and compulsory if n is very large- if k is small, \(k \simeq 10\),evd
method is recommended.computations : EVD or SVD of the Gram matrix are two equivalent ways to compute a solution. Let G be the Gram matrix. They are linked by- EVD: \(GU = US\) ; U is the matrix of columnwise eigenvectors of G and S the diagonal matrix of its eigenvalues- SVD: \(G = USU'\) is the SVD of G (which is symmetric, U’ is the transpose of U)Then, in both cases, if X is the matrix of coordinates of n items in \(R^r\), \(X = US^{1/2}\)Examples
This is a first toy example for running a MDS.
First, building a Euclidean distance matrix.
>>> import pydiodon as dio >>> import numpy as np >>> import matplotlib.pyplot as plt >>> import scipy.spatial.distance as ssd # for computing distances >>> # creates a m x n point cloud >>> m = 10 ; n = 4 >>> A = np.random.randn(m,n) >>> # computes the pairwise distances >>> D = ssd.pdist(A) >>> D = ssd.squareform(D) # distance matrix in square form
Second, run the MDS
>>> # runs the MDS >>> X, S = dio.mds(D)
Third, interpret the quality and displays the result
>>> # plotting the quality >>> plt.plot(S, color="red") >>> plt.title("Eigenvalues") >>> plt.show() >>> # plotting the point cloud >>> F1 = X[:,0] ; F2 = X[:,1] >>> plt.scatter(F1, F2, c="chartreuse", s=20) >>> plt.xlabel("axis 1") >>> plt.ylabel("axis 2") >>> plt.show()
This example is in
mds_xmpl_1.py
Here is now a real life example.
>>> import pydiodon as dio >>> import numpy as np >>> import matplotlib.pyplot as plt >>> # >>> rank = 200 >>> filename = "guiana_trees.dissw" >>> Dis, rn, cn = dio.load(filename, rownames=True, colnames=True) >>> X, S = dio.mds(Dis, k=rank)
This is focused on running the MDS of Dis . Here is a complementary example for visualizing the results.
>>> # plotting the singular values >>> plt.plot(S, color="red") >>> plt.title("Quality along the axis") >>> plt.show() >>> # plotting the point cloud, axis 1 and 2 >>> F1 = X[:,0]; F2 = X[:,1]; F3 = X[:,2] >>> plt.scatter(F1,F2,c="blue", s=5) >>> plt.xlabel("axis 1") >>> plt.ylabel("axis 2") >>> plt.title("data set: Guiana trees") >>> plt.show()
This example is in
mds_xmpl_guiana_trees.py
The data set can be downloaded from https://doi.org/10.15454/XSJ079 where it is dataset
guiana_trees.dissw
.Related publication: Caron, H, Molino, J‐F, Sabatier, D, et al. Chloroplast DNA variation in a hyperdiverse tropical tree community. Ecol Evol. 2019; 9: 4897– 4905. https://doi.org/10.1002/ece3.5096
Can be downloaded from Wiley website at https://onlinelibrary.wiley.com/doi/pdf/10.1002/ece3.5096 doi: DOI: 10.1002/ece3.5096
References
[1] T.F. Cox and M. A. A. Cox. Multidimensional Scaling - Second edition, volume 88 of Monographs on Statistics and Applied Probability. Chapman & al., 2001.[2] I. Borg and P. J. F. Groenen. Modern Multidimensional Scaling. Springer Series in Statistics. Springer, second edition, 2005.[3] K. V. Mardia, J.T. Kent, and J. M. Bibby. Multivariate Analysis. Probability and Mathematical Statistics. Academic Press, 1979.Diodon notes: section 10
07/11/2017 - revised 21.03.20, 22.10.13