26  Single-cell data normalization

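Normalization of scRNA-seq data is an important preprocessing step. Its goal is to remove technical variation (for example, differences in sequencing depth and gene length) so that comparisons between genes and/or samples are more reliable. Normalization methods range from simple global scaling and uniform transformations to more complex per-gene corrections. Different methods may suit different datasets and research goals, and understanding each method's assumptions and limitations is essential for interpreting results correctly.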

26.1 Load R packages

Use rm(list = ls()) to clear all variables from the environment.

library(tidyverse)
library(Seurat)
library(cowplot)

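rm(list = ls())                                    # clear all variables from the workspace
options(stringsAsFactors = FALSE)                  # keep strings as character vectors
options(future.globals.maxSize = 10000 * 1024^2)   # allow roughly 10 GB of globals for future-based parallel code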

grp_names <- c("Early Stage", "Late Stage")
grp_colors <- c("#8AC786", "#B897CA")
grp_shapes <- c(15, 16)

26.2 Import data


seurat_obj <- readRDS("./data/result/scRNA/Seurat_filter.RDS")

seurat_obj

26.3 Remove the effect of sequencing depth

First, understand the following two concepts:


seurat_norm <- NormalizeData(
  object = seurat_obj,
  normalization.method = "LogNormalize",
  scale.factor = 10000,
  verbose = FALSE)

seurat_norm
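
To make the "LogNormalize" step concrete, here is a minimal base-R sketch of the calculation on a toy matrix: each cell's counts are divided by that cell's total, multiplied by the scale factor, and transformed with log1p. The matrix and the names counts, depth, and norm are made up for illustration and are not part of the pipeline above.

```r
# Toy count matrix: 3 genes (rows) x 3 cells (columns); values are illustrative only.
counts <- matrix(c(1, 0, 3,
                   4, 2, 0,
                   0, 5, 5),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))

scale_factor <- 10000

# Sequencing depth of each cell = total counts in that column.
depth <- colSums(counts)

# Divide each column by its depth, rescale, then apply log(1 + x).
norm <- log1p(sweep(counts, 2, depth, "/") * scale_factor)

# After undoing the log, every cell sums to the same total (the scale factor),
# so depth differences between cells no longer drive the values.
colSums(expm1(norm))
```

Because every cell is brought to the same total before the log transform, a gene that makes up the same fraction of two cells' counts gets the same normalized value in both, regardless of how deeply each cell was sequenced.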

26.4 Assess the effect of the cell cycle

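The cell cycle is the entire process a cell goes through from the completion of one division to the end of the next, and it comprises interphase and the division phase. Assessing the cell cycle helps identify the specific phase each cell is in (G1, S, G2, or M), which is important for understanding the cell's functional state, metabolic activity, and growth characteristics.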

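load("./data/result/scRNA/Homo_sapiens_cycle.rda")

# Score each cell for S and G2/M phase; this adds S.Score, G2M.Score and Phase to the metadata.
# Assumption: the loaded .rda provides gene vectors named s_genes and g2m_genes.
# These names are hypothetical; check ls() after load() and adjust them.
seurat_phase <- CellCycleScoring(
  object = seurat_norm,
  s.features = s_genes,
  g2m.features = g2m_genes)

# Plot the two scores against each other, colored by assigned phase
seurat_phase@meta.data %>% 
  ggplot(aes(x = S.Score, y = G2M.Score)) + 
  geom_point(aes(color = Phase)) +
  theme_minimal()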
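
The Phase column plotted here is derived from the two scores. As a rough sketch of that rule, assuming the usual convention that a cell is called G1 when both scores are negative and otherwise takes the phase with the higher score, the assignment can be written in base R as follows. The helper assign_phase and the toy scores are hypothetical, and ties between the two scores are not treated specially in this simplified version.

```r
# Hypothetical helper: derive a phase label from S and G2/M scores.
# G1 when both scores are negative, otherwise the phase with the larger score.
assign_phase <- function(s_score, g2m_score) {
  ifelse(s_score < 0 & g2m_score < 0,
         "G1",
         ifelse(s_score > g2m_score, "S", "G2M"))
}

# Toy scores for four cells (illustrative values only).
scores <- data.frame(
  S.Score   = c(-0.2, 0.5, 0.1, -0.1),
  G2M.Score = c(-0.1, 0.2, 0.6,  0.3))

scores$Phase <- assign_phase(scores$S.Score, scores$G2M.Score)
scores
```

In the scatter plot this corresponds to three regions: the lower-left quadrant (G1), the area below the diagonal (S), and the area above it (G2M).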

26.5 Identify highly variable features

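Seurat's FindVariableFeatures function can select highly variable genes with the vst (Variance Stabilizing Transformation) method, which ranks genes by the relationship between their mean expression and their variability across cells. Here the number of highly variable genes is set to 2000.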


seurat_var <- FindVariableFeatures(
  object = seurat_phase,
  selection.method = "vst",
  nfeatures = 2000,
  verbose = FALSE)

LabelPoints(plot = VariableFeaturePlot(seurat_var),
            points = head(VariableFeatures(seurat_var), 10),
            repel = TRUE)
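
The idea behind the vst selection can be sketched in base R: fit a loess curve of log10(variance) against log10(mean), standardize each gene with its observed mean and the variance expected from the curve, clip large values, and rank genes by the variance of the standardized values. This is a simplified illustration on simulated counts, not Seurat's implementation; the simulated data and all variable names are assumptions made for the example.

```r
set.seed(1)
n_genes <- 300
n_cells <- 100

# Simulated background genes: Poisson counts, so variance tracks the mean.
mu <- runif(n_genes, min = 0.5, max = 10)
counts <- matrix(rpois(n_genes * n_cells, lambda = rep(mu, times = n_cells)),
                 nrow = n_genes)

# Genes 1-5 are highly expressed in only half of the cells, so they vary
# more than their mean expression would predict.
counts[1:5, ] <- matrix(
  rpois(5 * n_cells, lambda = rep(c(0.5, 11), each = 5 * n_cells / 2)),
  nrow = 5)

gene_mean <- rowMeans(counts)
gene_var  <- apply(counts, 1, var)

# Step 1: loess fit of log10(variance) on log10(mean).
d <- data.frame(log_mean = log10(gene_mean), log_var = log10(gene_var))
fit <- loess(log_var ~ log_mean, data = d, span = 0.3)
expected_sd <- sqrt(10^fitted(fit))

# Step 2: standardize with the observed mean and expected sd, then clip.
z <- (counts - gene_mean) / expected_sd
clip_max <- sqrt(n_cells)
z[z > clip_max] <- clip_max

# Step 3: variance of the standardized values; larger means more variable
# than expected for that expression level.
std_var <- rowSums(z^2) / (n_cells - 1)
top_genes <- order(std_var, decreasing = TRUE)[1:5]
top_genes
```

Ranking by standardized variance rather than raw variance keeps highly expressed genes from dominating the list simply because their counts are larger.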

26.6 Scale the data


seurat_scale <- ScaleData(
  object = seurat_var,
  verbose = FALSE)

seurat_scale
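
With its default settings, ScaleData centers each gene to mean 0 and scales it to unit variance across cells, and caps scaled values at 10 (the scale.max default). A minimal base-R sketch of that per-gene z-score on a toy matrix, with made-up data and names, looks like this:

```r
set.seed(42)

# Toy normalized expression: 5 genes (rows) x 20 cells (columns).
expr <- matrix(rnorm(5 * 20, mean = 3, sd = 2),
               nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("cell", 1:20)))

# scale() works on columns, so transpose to scale each gene, then transpose back.
scaled <- t(scale(t(expr)))

# Cap extreme values, mirroring a maximum of 10.
scaled[scaled > 10] <- 10

round(rowMeans(scaled), 10)   # each gene is centered at 0
apply(scaled, 1, sd)          # each gene has unit standard deviation
```

Scaling gives every gene equal weight in the PCA that follows, so highly expressed genes do not dominate the principal components.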

26.7 Dimensionality reduction and clustering

  • Check the effect of the cell cycle

seurat_pca <- RunPCA(
  object = seurat_scale,
  features = VariableFeatures(object = seurat_scale),
  verbose = FALSE)

DimPlot(seurat_pca,
        reduction = "pca",
        group.by = "Phase")

26.8 Save the results

if (!dir.exists("./data/result/scRNA/")) {
  dir.create("./data/result/scRNA/", recursive = TRUE)
}

saveRDS(seurat_pca, file = "./data/result/scRNA/Seurat_pca.RDS", compress = TRUE)

26.9 Summary

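We followed a rigorous data preprocessing workflow. Its core is to reduce technical variation through normalization and to handle biological variation appropriately, so that the analysis results are accurate and reliable.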

System information
sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Asia/Shanghai
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] cowplot_1.1.3      Seurat_5.0.3       SeuratObject_5.0.2 sp_2.1-4          
 [5] lubridate_1.9.3    forcats_1.0.0      stringr_1.5.1      dplyr_1.1.4       
 [9] purrr_1.0.2        readr_2.1.5        tidyr_1.3.1        tibble_3.2.1      
[13] ggplot2_3.5.1      tidyverse_2.0.0   

loaded via a namespace (and not attached):
  [1] deldir_2.0-4           pbapply_1.7-2          gridExtra_2.3         
  [4] rlang_1.1.3            magrittr_2.0.3         RcppAnnoy_0.0.22      
  [7] spatstat.geom_3.2-9    matrixStats_1.3.0      ggridges_0.5.6        
 [10] compiler_4.3.3         reshape2_1.4.4         png_0.1-8             
 [13] vctrs_0.6.5            pkgconfig_2.0.3        fastmap_1.1.1         
 [16] utf8_1.2.4             promises_1.3.0         rmarkdown_2.26        
 [19] tzdb_0.4.0             xfun_0.43              jsonlite_1.8.8        
 [22] goftest_1.2-3          later_1.3.2            spatstat.utils_3.0-4  
 [25] irlba_2.3.5.1          parallel_4.3.3         cluster_2.1.6         
 [28] R6_2.5.1               ica_1.0-3              spatstat.data_3.0-4   
 [31] stringi_1.8.4          RColorBrewer_1.1-3     reticulate_1.37.0     
 [34] parallelly_1.37.1      scattermore_1.2        lmtest_0.9-40         
 [37] Rcpp_1.0.12            knitr_1.46             tensor_1.5            
 [40] future.apply_1.11.2    zoo_1.8-12             sctransform_0.4.1     
 [43] httpuv_1.6.15          Matrix_1.6-5           splines_4.3.3         
 [46] igraph_2.0.3           timechange_0.3.0       tidyselect_1.2.1      
 [49] abind_1.4-5            rstudioapi_0.16.0      yaml_2.3.8            
 [52] spatstat.random_3.2-3  spatstat.explore_3.2-7 codetools_0.2-19      
 [55] miniUI_0.1.1.1         listenv_0.9.1          plyr_1.8.9            
 [58] lattice_0.22-6         shiny_1.8.1.1          withr_3.0.0           
 [61] ROCR_1.0-11            evaluate_0.23          Rtsne_0.17            
 [64] future_1.33.2          fastDummies_1.7.3      survival_3.7-0        
 [67] polyclip_1.10-6        fitdistrplus_1.1-11    pillar_1.9.0          
 [70] BiocManager_1.30.23    KernSmooth_2.23-22     renv_1.0.0            
 [73] plotly_4.10.4          generics_0.1.3         RcppHNSW_0.6.0        
 [76] hms_1.1.3              munsell_0.5.1          scales_1.3.0          
 [79] globals_0.16.3         xtable_1.8-4           glue_1.7.0            
 [82] lazyeval_0.2.2         tools_4.3.3            data.table_1.15.4     
 [85] RSpectra_0.16-1        RANN_2.6.1             leiden_0.4.3.1        
 [88] dotCall64_1.1-1        grid_4.3.3             colorspace_2.1-0      
 [91] nlme_3.1-164           patchwork_1.2.0        cli_3.6.2             
 [94] spatstat.sparse_3.0-3  spam_2.10-0            fansi_1.0.6           
 [97] viridisLite_0.4.2      uwot_0.2.2             gtable_0.3.5          
[100] digest_0.6.35          progressr_0.14.0       ggrepel_0.9.5         
[103] htmlwidgets_1.6.4      htmltools_0.5.8.1      lifecycle_1.0.4       
[106] httr_1.4.7             mime_0.12              MASS_7.3-60.0.1