基于phyloseq的排序分析

Original 文涛聊科研微生信生物 2022-05-08

wentao

2019年5月16日

载入本次分析需要的包

phyloseq的排序函数为我们的排序分析进行了极大的简化，保证计算的同时兼顾可视化工作。

library("phyloseq"); packageVersion("phyloseq")## [1] '1.26.1'

data(GlobalPatterns)
library("ggplot2"); packageVersion("ggplot2")

## [1] '3.1.0'library("cluster"); packageVersion("cluster")## [1] '2.0.7.1'library("plyr"); packageVersion("plyr")## [1] '1.8.4'

设置主题准备数据

theme_set(theme_bw())

数据得到并进行过滤

count数量在一半样品中出现抵低于5的OTU

GP = GlobalPatterns
wh0 = genefilter_sample(GP, filterfun_sample(function(x) x > 5), A=0.5*nsamples(GP))
GP1 = prune_taxa(wh0, GP)

相对丰度标准化，这里使用10的六次方

GP1 = transform_sample_counts(GP1, function(x) 1E6 * x/sum(x))

仅仅保留主要的五个门类的OTU

phylum.sum = tapply(taxa_sums(GP1), tax_table(GP1)[, "Phylum"], sum, na.rm=TRUE)
top5phyla = names(sort(phylum.sum, TRUE))[1:5]
GP1 = prune_taxa((tax_table(GP1)[, "Phylum"] %in% top5phyla), GP1)

get_variable 函数返回mapping文件中的目标列

get_variable函数结合需要判断的变量，将分组分为两类，并添加到mapping文件中

human = get_variable(GP1, "SampleType") %in% c("Feces", "Mock", "Skin", "Tongue")
sample_data(GP1)$human <- factor(human)

phyloaeq封装的排序脚本使用

ordinate函数，我们只需要定义使用的排序方法：method，和使用的距离参数 distance,下面是基于brayZ距离的NMDS排序

下面展示loading矩阵信息

GP.ord <- ordinate(GP1, "NMDS", "bray")

## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1333468 
## Run 1 stress 0.1333468 
## ... Procrustes: rmse 4.32893e-06  max resid 7.783055e-06 
## ... Similar to previous best
## Run 2 stress 0.1637672 
## Run 3 stress 0.1844742 
## Run 4 stress 0.1529686 
## Run 5 stress 0.1498725 
## Run 6 stress 0.1469145 
## Run 7 stress 0.1505794 
## Run 8 stress 0.1891198 
## Run 9 stress 0.1452642 
## Run 10 stress 0.1488978 
## Run 11 stress 0.1469145 
## Run 12 stress 0.1685942 
## Run 13 stress 0.1333468 
## ... Procrustes: rmse 5.818375e-06  max resid 1.112148e-05 
## ... Similar to previous best
## Run 14 stress 0.1675647 
## Run 15 stress 0.171298 
## Run 16 stress 0.1385323 
## Run 17 stress 0.1710784 
## Run 18 stress 0.1385323 
## Run 19 stress 0.1498724 
## Run 20 stress 0.1385323 
## *** Solution reached

p1 = plot_ordination(GP1, GP.ord, type="taxa", color="Phylum", title="taxa")
print(p1)

### 可选分面loading展示，更加清晰直观

p1 + facet_wrap(~Phylum, 3)

### 对样品进行排序

type 选择samples 对样品进行排序，这也是我们使用更多的一种方法。

p2 = plot_ordination(GP1, GP.ord, type="samples", color="SampleType", shape="human") 
p2 + geom_polygon(aes(fill=SampleType)) + geom_point(size=5) + ggtitle("samples")

### biplot 选项将样品和loading一同展示

但是重叠很严重

p3 = plot_ordination(GP1, GP.ord, type="biplot", color="SampleType", shape="Phylum", title="biplot")
# Some stuff to modify the automatic shape scale
GP1.shape.names = get_taxa_unique(GP1, "Phylum")
GP1.shape <- 15:(15 + length(GP1.shape.names) - 1)
names(GP1.shape) <- GP1.shape.names
GP1.shape["samples"] <- 16
p3 + scale_shape_manual(values=GP1.shape)

### split 通过将两者分开分，面展示解决问题

p4 = plot_ordination(GP1, GP.ord, type="split", color="Phylum", shape="human", label="SampleType", title="split") 
p4

### 自定义颜色

gg_color_hue <- function(n){
    hues = seq(15, 375, length=n+1)
    hcl(h=hues, l=65, c=100)[1:n]
}
color.names <- levels(p4$data$Phylum)
p4cols <- gg_color_hue(length(color.names))
names(p4cols) <- color.names
p4cols["samples"] <- "black"
p4 + scale_color_manual(values=p4cols)

### 下面使用多种排序方法进行排序，并比对

使用llply函数

dist = "bray"
ord_meths = c("DCA", "CCA", "RDA", "DPCoA", "NMDS", "MDS", "PCoA")
plist = llply(as.list(ord_meths), function(i, physeq, dist){
        ordi = ordinate(physeq, method=i, distance=dist)
        plot_ordination(physeq, ordi, "samples", color="SampleType")
}, GP1, dist)

## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1333468 
## Run 1 stress 0.1528906 
## Run 2 stress 0.1827625 
## Run 3 stress 0.1333468 
## ... New best solution
## ... Procrustes: rmse 3.155156e-06  max resid 7.816854e-06 
## ... Similar to previous best
## Run 4 stress 0.1468477 
## Run 5 stress 0.1333468 
## ... New best solution
## ... Procrustes: rmse 2.975489e-06  max resid 7.840301e-06 
## ... Similar to previous best
## Run 6 stress 0.1681228 
## Run 7 stress 0.16608 
## Run 8 stress 0.1521722 
## Run 9 stress 0.1385323 
## Run 10 stress 0.1570722 
## Run 11 stress 0.1455574 
## Run 12 stress 0.1510167 
## Run 13 stress 0.1653386 
## Run 14 stress 0.1658681 
## Run 15 stress 0.1385327 
## Run 16 stress 0.1507161 
## Run 17 stress 0.169588 
## Run 18 stress 0.1469145 
## Run 19 stress 0.180481 
## Run 20 stress 0.1518735 
## *** Solution reached

提取作图所需数据

names(plist) <- ord_meths
pdataframe = ldply(plist, function(x){
    df = x$data[, 1:2]
    colnames(df) = c("Axis_1", "Axis_2")
    return(cbind(df, x$data))
})
names(pdataframe)[1] = "method"

p = ggplot(pdataframe, aes(Axis_1, Axis_2, color=SampleType, shape=human, fill=SampleType))
p = p + geom_point(size=4) + geom_polygon()
p = p + facet_wrap(~method, scales="free")
p = p + scale_fill_brewer(type="qual", palette="Set1")
p = p + scale_colour_brewer(type="qual", palette="Set1")
p

### 选择其中一个展示

plist[[2]]

### 对展示图形进行修饰

p = plist[[2]] + scale_colour_brewer(type="qual", palette="Set1")
p = p + scale_fill_brewer(type="qual", palette="Set1")
p = p + geom_point(size=5) + geom_polygon(aes(fill=SampleType))
p

### 使用unifrac距离进行PCoa分析

ordu = ordinate(GP1, "PCoA", "unifrac", weighted=TRUE)
plot_ordination(GP1, ordu, color="SampleType", shape="human")

展示排序结果

p = plot_ordination(GP1, ordu, color="SampleType", shape="human")
p = p + geom_point(size=7, alpha=0.75)
p = p + scale_colour_brewer(type="qual", palette="Set1")
p + ggtitle("MDS/PCoA on weighted-UniFrac distance, GlobalPatterns")

当我们将排序做完之后

这里仅仅做了排序的分析，其实我们的排序需要进行检验。

这样的洞庭湖决堤，实在让人同情不起来

李尚福、魏凤和双双被拿下，与美国一份报告是否有关？

抗洪靠嘴，堵漏靠沙？印度官员真是绝了！

有的人走了，却永远活着

圈内疯传某谣言

基于phyloseq的排序分析

wentao

2019年5月16日

载入本次分析需要的包

设置主题准备数据

数据得到并进行过滤

相对丰度标准化，这里使用10的六次方

仅仅保留主要的五个门类的OTU

get_variable 函数返回mapping文件中的目标列

phyloaeq封装的排序脚本使用

提取作图所需数据

展示排序结果

当我们将排序做完之后

欢迎加入微生信生物讨论群，扫描下方二维码添加小编微信，小编带你入伙啦，大牛如云，让交流变得简单。

历史目录

R语言分析技术

扩增子专题

基于phyloseq的微生物群落分析

代谢组专题

当科研遇见python

科学知识图谱

杂谈

您可能也对以下帖子感兴趣

这样的洞庭湖决堤，实在让人同情不起来

李尚福、魏凤和双双被拿下，与美国一份报告是否有关？

抗洪靠嘴，堵漏靠沙？印度官员真是绝了！

有的人走了，却永远活着

圈内疯传某谣言

生成图片，分享到微信朋友圈

基于phyloseq的排序分析

wentao

2019年5月16日

载入本次分析需要的包

设置主题 准备数据

数据得到 并进行过滤

相对丰度标准化，这里使用10的六次方

仅仅保留主要的五个门类的OTU

get_variable 函数返回mapping文件中的目标列

phyloaeq封装的排序脚本使用

提取作图所需数据

展示排序结果

当我们将排序做完之后

欢迎加入微生信生物讨论群，扫描下方二维码添加小编微信，小编带你入伙啦，大牛如云，让交流变得简单。

历史目录

R语言分析技术

扩增子专题

基于phyloseq的微生物群落分析

代谢组专题

当科研遇见python

科学知识图谱

杂谈

您可能也对以下帖子感兴趣

设置主题准备数据

数据得到并进行过滤