对单细胞亚群取子集后继续降维聚类分群该如何做

Original 生信技能树生信技能树 2022-08-15

收录于合集 #单细胞进阶 139个

这个需求实在是太常见了，也是很多粉丝在各种交流群咨询过我的问题，恰好我看到了文献《Single-cell sequencing of human midbrain reveals glial activation and a Parkinson-specific neuronal state》还刻意描述了这个过程：

对glia细胞亚群进行细分

描述的很清楚，每个单细胞亚群细分后取子集的时候，仍然是需要UMI 的raw counts值，从代码的角度就是：

library(Seurat)
library(ggplot2)
library(clustree)
library(cowplot)
library(dplyr)

load('../3-cell/phe-by-markers.Rdata') 
table(phe$celltype)
sce=readRDS('../2-int/sce.all_int.rds')


sce=sce[,phe$celltype %in% c( 'CD4' ,'CD8','neg' )]

sce
sce=CreateSeuratObject(counts = sce@assays$RNA@counts,
                       meta.data = sce@meta.data)

可以看到这样的seurat对象，就可以进行继续降维聚类分群啦，标准代码如下所示：


sce <- NormalizeData(sce, normalization.method =  "LogNormalize",  
                     scale.factor = 1e4)
GetAssay(sce,assay = "RNA")
sce <- FindVariableFeatures(sce, 
                            selection.method = "vst", nfeatures = 2000)  
sce <- ScaleData(sce) 
sce <- RunPCA(object = sce, pc.genes = VariableFeatures(sce)) 
DimHeatmap(sce, dims = 1:12, cells = 100, balanced = TRUE)
ElbowPlot(sce) 

sce <- FindNeighbors(sce, dims = 1:15)
sce <- FindClusters(sce, resolution = 0.8)
table(sce@meta.data$RNA_snn_res.0.8)  

set.seed(123)
sce <- RunTSNE(object = sce, dims = 1:15, do.fast = TRUE)
DimPlot(sce,reduction = "tsne",label=T)


sce <- RunUMAP(object = sce, dims = 1:15, do.fast = TRUE)
DimPlot(sce,reduction = "umap",label=T) 
head(sce@meta.data)
DimPlot(sce,reduction = "umap",label=T,group.by = 'orig.ident')

值得注意的是我们的单细胞亚群取子集，这个seurat对象仍然是来源于多个单细胞样品，所以仍然是需要进行某种程度合理的整合哦，我们这里给大家的示范代码是：


library(harmony)
seuratObj <- RunHarmony(sce, "orig.ident")
names(seuratObj@reductions)
seuratObj <- RunUMAP(seuratObj,  dims = 1:15, 
                     reduction = "harmony")
DimPlot(seuratObj,reduction = "umap",label=T )  


sce=seuratObj
sce <- FindNeighbors(sce, reduction = "harmony",
                     dims = 1:15)
sce <- FindClusters(sce, resolution = 0.1)
table(sce@meta.data$seurat_clusters)
DimPlot(sce,reduction = "umap",label=T) 
ggsave(filename = 'harmony_umap_recluster_by_0.1.pdf') 
DimPlot(sce,reduction = "umap",label=T,
        group.by = 'orig.ident') 
ggsave(filename = 'harmony_umap_sce_recluster_by_orig.ident.pdf')

这个文献《Single-cell sequencing of human midbrain reveals glial activation and a Parkinson-specific neuronal state》，就是对常见细胞亚群都进行了细分，首先是第一层次降维聚类分群；

各自的标记基因很容易自己看图表拿到，我们以前分享的帕金森疾病单细胞里面也有这样的基因列表：

astrocytes = c("AQP4", "ADGRV1", "GPC5", "RYR3") 
  endothelial = c("CLDN5", "ABCB1", "EBF1") 
  excitatory = c("CAMK2A", "CBLN2", "LDB2") 
  inhibitory = c("GAD1", "LHFPL3", "PCDH15") 
  microglia = c("C3", "LRMDA", "DOCK8") 
  oligodendrocytes = c("MBP", "PLP1", "ST18") 
  OPC='Tnr,Igsf21,Neu4,Gpr17'
  Ependymal='Cfap126,Fam183b,Tmem212,pifo,Tekt1,Dnah12'
  pericyte=c(  'DCN', 'LUM',  'GSN' ,'FGF7','MME', 'ACTA2','RGS5')

作者这里在展现不同细胞亚群在两个分组的细胞总数和比例，并不符合我的常规做法，如下所示：

可以看到 microglia和Astrocytes都是数量在4000附近的单细胞亚群，作者在正文就描述了这两个细胞亚群的细分：

Astrocytes的细分

一般来说，细分亚群，如果自己的领域没有比较好的积累，是很难给出每个细分亚群一个生物学名字的，可以使用其高表达量基因或者激活的转录因子来标记它。

写在文末

我在《生信技能树》，《生信菜鸟团》，《单细胞天地》的大量推文教程里面共享的代码都是复制粘贴即可使用的，有任何疑问欢迎留言讨论，也可以发邮件给我，详细描述你遇到的困难的前因后果给我，我的邮箱地址是 jmzeng1314@163.com

如果你确实觉得我的教程对你的科研课题有帮助，让你茅塞顿开，或者说你的课题大量使用我的技能，烦请日后在发表自己的成果的时候，加上一个简短的致谢，如下所示：

We thank Dr.Jianming Zeng(University of Macau), and all the members of his bioinformatics team, biotrainee, for generously sharing their experience and codes.

十年后我环游世界各地的高校以及科研院所（当然包括中国大陆）的时候，如果有这样的情谊，我会优先见你。

贺雪峰：精准扶贫为何陷入形式主义？！

中国初创风投公司数量下降了98%

隐形的小地方豪门，好日子真到头了

太突然！网红夜市被搬空，市民懵了

这是我见过水平极高的公众号！（清华北大）

对单细胞亚群取子集后继续降维聚类分群该如何做

写在文末

您可能也对以下帖子感兴趣

贺雪峰：精准扶贫为何陷入形式主义？！

中国初创风投公司数量下降了98%

隐形的小地方豪门，好日子真到头了

太突然！网红夜市被搬空，市民懵了

这是我见过水平极高的公众号！（清华北大）

生成图片，分享到微信朋友圈

对单细胞亚群取子集后继续降维聚类分群该如何做

写在文末

您可能也对以下帖子感兴趣