
About sotk2
sotk2 is an R package for integrating omics datasets using modules derived from non-negative matrix factorization (NMF) or consensus NMF (cNMF). The core idea is to treat each gene expression program (metagene) as a comparable unit across datasets, then integrate programs through a correlation-based network followed by community detection.
sotk2 is designed to be self-contained and independent: it does not require any prior packages or objects outside this repository. Inputs may come from any platform or modality (for example, bulk RNA-seq, single-cell RNA-seq, spatial transcriptomics, or protein abundance), as long as NMF/cNMF outputs are available (or can be imported).
To run the full interactive version, with every input live and results recomputed on demand, use the public Docker image on your own machine:
docker run --rm -p 11630:11630 -e SOTK2_MODE=full thebiohub/sotk2:latest
Then open http://localhost:11630. See Snyder-Institute/sotk2-docker for build details and a launcher script that auto-opens the browser.
Materials
This demo uses three cohorts from the sotk2 tutorial:
- GLASS (Glioma Longitudinal AnalySiS Consortium; PMID: 29432615): bulk RNA-seq from matched primary and recurrent glioma samples
- IVYGAP (Ivy Glioblastoma Atlas Project; PMID: 29748285): bulk RNA-seq stratified by anatomic features (CT, IT, LE, MVP, PAN)
- HEILAND (Ravi et al., Cancer Cell, 2022; PMID: 35700707): 10x Genomics Visium v1 spatial transcriptomics
Methods
At a high level, sotk2:
- Collects metagenes from multiple NMF runs
- Computes cross-metagene correlation structure
- Constructs a metagene correlation network
- Detects biological modules (communities)
- Abstracts communities and constructs a community-level network
- Summarizes communities via cohort-aware annotation and usage-based diagnostics
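The first four steps above can be illustrated with a small, language-agnostic sketch. It is written in Python rather than R and does not use the sotk2 API. The metagene names, the gene weights, and the 0.7 correlation threshold are hypothetical, and connected components stand in for whichever community detection algorithm sotk2 actually applies. The sketch also assumes all metagenes are defined over the same ordered set of genes.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length weight vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Step 1: collect metagenes from several NMF runs (hypothetical toy values;
# each vector holds weights for the same five genes).
metagenes = {
    "GLASS_k3_m1": [0.9, 0.8, 0.1, 0.0, 0.1],
    "IVYGAP_k4_m2": [0.8, 0.9, 0.2, 0.1, 0.0],
    "HEILAND_k5_m1": [0.0, 0.1, 0.9, 0.8, 0.7],
    "GLASS_k3_m2": [0.1, 0.0, 0.8, 0.9, 0.8],
}


def build_network(metagenes, threshold=0.7):
    """Steps 2-3: correlate every metagene pair, keep edges at or above threshold."""
    edges = {}
    for a, b in combinations(metagenes, 2):
        r = pearson(metagenes[a], metagenes[b])
        if r >= threshold:
            edges[(a, b)] = r
    return edges


def communities(nodes, edges):
    """Step 4 (stand-in): connected components of the thresholded network."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        result.append(sorted(component))
    return result


edges = build_network(metagenes)
modules = communities(list(metagenes), edges)
for module in modules:
    print(module)
```

In this toy input, metagenes from different cohorts that weight the same genes end up in the same module, which is the behaviour the correlation-based integration is meant to capture. A real analysis would replace the connected-components step with a proper community detection method.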
Conceptual schematic
Features
- Identification of biologically meaningful latent factors from deconvoluted omics data
- Data-driven rank selection across multiple NMF or cNMF runs
- Correlation-based integration of biological modules across datasets and platforms
- Support for spatial, bulk, single-cell, and protein-level omics data
- Community abstraction and network-level visualization for large integrative analyses
- Assessment of sample-type enrichment using Pearson residuals (observed vs. expected counts in a chi-squared framework)
- Consistent network layouts for comparative, cross-dataset interpretation
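As an illustration of the Pearson-residual feature listed above, the following minimal sketch computes residuals for a contingency table of communities by sample types. It is plain Python, not sotk2 code. The counts are hypothetical, and the function name `pearson_residuals` is an assumption made for this example. Each residual is (observed - expected) / sqrt(expected), where the expected count comes from the row and column totals under independence.

```python
from math import sqrt


def pearson_residuals(table):
    """Return (observed - expected) / sqrt(expected) for each cell.

    Expected counts assume independence of rows and columns:
    expected[i][j] = row_total[i] * col_total[j] / grand_total.
    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            residual_row.append((observed - expected) / sqrt(expected))
        residuals.append(residual_row)
    return residuals


# Hypothetical counts: rows are communities, columns are sample types
# (for example, primary vs. recurrent).
counts = [
    [30, 10],
    [10, 30],
]

residuals = pearson_residuals(counts)
# The sum of squared residuals equals the chi-squared statistic.
chi_squared = sum(r ** 2 for row in residuals for r in row)

for row in residuals:
    print([round(r, 3) for r in row])
print(round(chi_squared, 3))
```

A positive residual marks a sample type that is over-represented in a community relative to the independence expectation, and a negative residual marks under-representation.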
GitHub and Documentation
Source code: github.com/Snyder-Institute/sotk2
Documentation: Snyder-Institute.github.io/sotk2