排序【5】--PCoA主坐标分析

一、PCoA简介

PCoA这个方法也是MDS (Metric Multidimensional Scaling)。

PCA用的距离矩阵为Euclidean distances，CA用的距离矩阵为chi-square distances，PCoA一般也是用的为Euclidean来表示样品之间的距离关系。同时PCoA也有特征值，用以说明轴的贡献， (This method is also known as MDS (Metric Multidimensional Scaling). While PCA preserves Euclidean distances among samples and CA chi-square distances, PCoA provides Euclidean representation of a set of objects whose relationship is measured by any similarity or distance measure chosen by the user. As well as PCA and CA, PCoA returns a set of orthogonal axes whose importance is measured by eigenvalues. This means that calculating PCoA on Euclidean distances among samples yields the same results as PCA calculated on covariance matrix of the same dataset (if scaling 1 is used).)

NMDS (Non-metric Multidimensional Scaling):

Non-metric alternative to PCoA analysis - it can use any distance measure among samples, and the main focus is on projecting the relative position of sample points into low dimensional ordination space (two or three axes). The method is distance based, not eigenvalue based - it means that it does not attempt to maximize the variance preserved by particular ordination axes and resulting projection could therefore be rotated in any direction.

二、 PCoA分析步骤

1)构成样本间差异的数据矩阵M

M的数据一般不是生态工作者观察群落的原始记录,而是由原始记录推导出来的,它 有各种推导方法。

2)构成离差矩阵A

mij2=mji2=aii+ajj-2aij( i,j=1,2,……,N).


3)求出N个样方点的坐标矩阵C

4 ) 排列N个样方

$$\sum _{i=1}^{K}\lambda_{i}/ \sum _{i=1}^{s}\lambda_{i}$$

。特别是只选择二、三维主坐标就可直观地画出它们的排序图形

三、PCOA分析

cmdscale( ) stats包
wcmdscale( ) vegan包：加权的经典标度
pco( ),ecodist包
pco( ),labdsv包
pcoa( ),ape包


sammon映射：sammon（），MASS包

isoMDS（），MASS包

metaMDS（）,vegan包

monoMDS（），vegan包（全局和局部的非度量标度，局部的多维标度在非线性维数缩减部分介绍）

isoreg（），stats包

(2) SMACOF方法和smacof包，复杂MDS问题的求解。

1，vegan包中的cmdscale根据距离矩阵做PCOA分析

spe.bray <- vegdist(spe)#先算距离矩阵
spe.b.pcoa <- cmdscale(spe.bray, k=(nrow(spe)-1), eig=TRUE) #输入距离矩阵计算PCoA
ordiplot(spe.b.pcoa)
ordiplot来作图

city1<-city[,-1]#这个数据集的第一列是名字，先把它去掉
for (i in 1:9)
for(j in(i+1):10)
city1[i,j]=city1[j,i] #把上三角部分补足
rownames(city1)<-colnames(city1) #再把行名加回来
city2<-as.dist(city1, diag = TRUE, upper = TRUE)#转换为dist类型
city3<-as.matrix(city2) #转化为矩阵
citys<-cmdscale(city3,k=2) #计算MDS，为可视化，取前两个主坐标plot(citys[,1],citys[,2],type='n') #绘图
text(citys[,1],citys[,2],labels(city2),cex=.7)#标上城市名字

plot(-citys[,1],citys[,2],type='n')
text(-citys[,1],citys[,2],labels(city2),cex=.7)


2，ape包中的pcoa

am<-pcoa(ab, correction="none", rn=NULL)
# Correction methods for negative eigenvalues (details below): "lingoes" and "cailliez". Default value: "none".
rn     ：An optional vector of row names, of length n, for the n objects.

## S3 method for class 'pcoa':
biplot(am, Y=NULL, plot.axes = c(1,2), dir.axis1=1, dir.axis2=1, rn=NULL, ...)
biplot(am, Y=NULL, plot.axes = c(2,3))  #取2,3轴
biplot(am)
biplot.pcoa(am)


3，可以用vegan包中的capscale(…)，可以自己设定距离矩阵

ab.res<-capscale(ab~1,distance="bray");
# ab~1指示ab数据结构是sample/variable，distance表明用的距离矩阵为bray。其中vegdist(…)距离矩阵有"manhattan", "euclidean", "canberra", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup" , "binomial" or "chao" and the default is “bray” or Bray-Curtis.

summary(ab.res);
#看到特征值

scores(ab.res,display="sites");
#or display=”species”

plot(ab.res)
#做出来的图不含有标签，需要points(…) and text(…)给加上去


1，用来自beta_diversity_through_plots.

py（qiime）脚本产生的关于不同样品的之间的unifrac距离的文件weighted_unifrac_dm.txt改变为weighted_unifrac_dm.csv

#读取数据

aa<-read.csv("weighted_unifrac_dm.csv",header=F,sep="\t");
#排序（）
aa<-aa[,order(aa[1,])]
aa<-aa[order(aa[,24]),];


#修改行和列的坐标名字

ab<-aa[1:23,1:23];
colnames(ab)<-c("SH_C_1","SH_C_2","SH_0_1","SH_0_2","SH_0_3","SH_0.5_1","SH_0.5_2","SH_0.5_3","SH_2_1","SH_2_2","SH_2_3","SH_4_2","SH_4_3","SH_10_1","SH_10_2"
,"SH_10_3","SH_15_1","SH_15_2","SH_15_3","SH_25_1","SH_25_2","SH_25_3","M82_35_b");
rownames(ab)<-c("SH_C_1","SH_C_2","SH_0_1","SH_0_2","SH_0_3","SH_0.5_1","SH_0.5_2","SH_0.5_3","SH_2_1","SH_2_2","SH_2_3","SH_4_2","SH_4_3","SH_10_1","SH_10_2"
,"SH_10_3","SH_15_1","SH_15_2","SH_15_3","SH_25_1","SH_25_2","SH_25_3","M82_35_b");


#根据矩阵做pcoa分析

am<-pcoa(ab, correction="none", rn=NULL)
biplot(am)
biplot(am, plot.axes = c(2,3))  显示2,3轴


NMDS

Example of use

NMDS of river valley data

vltava.spe <- read.delim ('http://www.sci.muni.cz/botany/zeleny/wiki/anadat-r/data-download/vltava-spe.txt', row.names = 1)
NMDS <- metaMDS (vltava.spe)
NMDS
Call:
metaMDS(comm = vltava.spe)
Nonmetric Multidimensional Scaling using isoMDS (MASS package)
Data:     wisconsin(sqrt(vltava.spe))
Distance: bray
Dimensions: 2
Stress:     21.66520
Two convergent solutions found after 3 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on ‘wisconsin(sqrt(vltava.spe))’
If the default setting of metaMDS function is used, the data are automatically (if necessary) transformed (in this case, combination of wisconsin and sqrt transformation was used). In this case, stress value is 21.7.
plot (NMDS)


四、PCoA与PCA

Principal Coordinates Analysis (PCoA, = Multidimensional scaling, MDS)是一种研究和可视化数据的相似性和不一样的方法。PCoA通过相似性或不相似性矩阵来发现主要的因子。就想特征分析，计算一系列的特征值和特征向量，每一个特征值有一个特征向量。特征值一般会排序，第一个特征值一般就是主导的特征值。

PCoA分析加入了进化树的内容，而PCA仅仅比较的是OUT丰度的不同。PCA is used for similarities and PCoA for dissimilaritties. However, all binary measures (Jaccard, Dice etc.) are distance measures and, therefore PCoA should be used。

Principal Components analysis (PCA)

• transforms a number of possibly correlated variables (a similarity matrix!) into a smaller number of uncorrelated variables called principal components. So it reduces the dimensions of a complex data set and can be used to visulalize complex data.

The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.

• captures as much of the variation in the data as possible
• principal components are …
• summary variables
• linear combinations of the original variables
• uncorrelated with each other
• capture as much of the original variance as possible

Classical Multidimensional Scaling (CMDS)

[syn. Torgerson scaling, Torgerson-Gower scaling]

• is similar in spirit to PCA but it takes a dissimilarity as input! A dissimilarity matrix shows the distance between every possible pair of objects.
• is a set of data analysis techniques that display the structure of (complex) distance-like data (a dissimilarity matrix!) in a high dimensional space into a lower dimensional space without too much loss of information.
• The goal of MDS is to faithfully represent these distances with the lowest possible dimensional space.

ClusterVis calculates a principal coordinate analysis (PCoA) of a distance matrix (see Gower, 1966) and calculates a centered matrix. The centered matrix is then decomposed into its component eigenvalues and eigenvectors.

The eigenvectors, standardized by dividing by the square root of their corresponding eigenvalue, are output as the principal coordinate axes. This analysis is also called metric multi-dimensional scaling. It is useful for ordination of multivariate data on the basis of any distance function

参考资料：

R中作图

http://qzongy007.blog.sohu.com/261236424.html

http://qzongy007.blog.sohu.com/259865878.html

http://www.inside-r.org/packages/cran/ape/docs/biplot.pcoa

http://ecology.msu.montana.edu/labdsv/R/labs/lab8/lab8.html

http://www.inside-r.org/packages/cran/ape/docs/pcoa

http://www.planta.cn/forum/viewtopic.php?p=214581&sid=50cdbfe1861b15ec0ba81a66f1cda8d8

http://www.pmc.ucsc.edu/~mclapham/Rtips/ordination.htm

analysis of community ecology data in R: http://www.davidzeleny.net/anadat-r/doku.php/en:pcoa_nmds (超赞)

Principal Coordinates Analysis ：http://www.sequentix.de/gelquest/help/principal_coordinates_analysis.htm