sotk2 LITE

About sotk2

sotk2 is an R package for integrating omics datasets using modules derived from non-negative matrix factorization (NMF) or consensus NMF (cNMF). The core idea is to treat each gene expression program (metagene) as a comparable unit across datasets, then integrate programs through a correlation-based network followed by community detection.

sotk2 is designed to be self-contained and independent: it does not require any prior packages or objects outside this repository. Inputs may come from any platform or modality (for example, bulk RNA-seq, single-cell RNA-seq, spatial transcriptomics, or protein abundance), as long as NMF/cNMF outputs are available (or can be imported).

You are viewing the LITE deployment

This public instance displays a precomputed analysis of the demo cohorts for fast, safe browsing. Recompute controls (correlation method, datasets, threshold, community detection algorithm, rewiring weights) are disabled. Display-only controls such as labels, node size, and annotation pickers remain interactive.

To run the full interactive version, with every input live and recomputation on demand, use the public Docker image on your own machine:

docker run --rm -p 11630:11630 -e SOTK2_MODE=full thebiohub/sotk2:latest

Then open http://localhost:11630. See Snyder-Institute/sotk2-docker for build details and a launcher script that auto-opens the browser.

Materials

This demo uses three cohorts from the sotk2 tutorial:

GLASS (Glioma Longitudinal AnalySiS Consortium; PMID: 29432615): bulk RNA-seq from matched primary and recurrent glioma samples
IVYGAP (Ivy Glioblastoma Atlas Project; PMID: 29748285): bulk RNA-seq stratified by anatomic features (CT, IT, LE, MVP, PAN)
HEILAND (Ravi et al., Cancer Cell, 2022; PMID: 35700707): 10x Genomics Visium v1 spatial transcriptomics

Reprocessed data used in this tutorial are deposited on Zenodo.

Methods

At a high level, sotk2:

Collects metagenes from multiple NMF runs
Computes cross-metagene correlation structure
Constructs a metagene correlation network
Detects biological modules (communities)
Abstracts communities and constructs a community-level network
Summarizes communities via cohort-aware annotation and usage-based diagnostics
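
The first four steps above can be illustrated with a small standalone sketch. This is not the sotk2 API (sotk2 is an R package). The function names, the toy gene weights, the 0.5 threshold, and the cohort labels below are all hypothetical. Connected components are used as a simple stand-in for a real community detection algorithm, which would normally be more selective.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / sqrt(var_x * var_y)


def correlation_edges(metagenes, threshold):
    """Correlate every pair of metagenes over their shared genes and
    keep the pairs whose correlation reaches the threshold."""
    edges = []
    for a, b in combinations(sorted(metagenes), 2):
        shared = sorted(set(metagenes[a]) & set(metagenes[b]))
        if len(shared) < 2:
            continue  # not enough overlap to compute a correlation
        r = pearson([metagenes[a][g] for g in shared],
                    [metagenes[b][g] for g in shared])
        if r >= threshold:
            edges.append((a, b, r))
    return edges


def connected_components(nodes, edges):
    """Group nodes that are linked by edges (stand-in for community detection)."""
    adjacency = {node: set() for node in nodes}
    for a, b, _ in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for node in sorted(nodes):
        if node in seen:
            continue
        component, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical gene weights for two metagenes from each of two cohorts.
metagenes = {
    "cohortA_GEP1": {"G1": 5.0, "G2": 4.0, "G3": 1.0, "G4": 0.0, "G5": 0.5},
    "cohortA_GEP2": {"G1": 0.1, "G2": 0.3, "G3": 0.5, "G4": 5.0, "G5": 4.0},
    "cohortB_GEP1": {"G1": 4.5, "G2": 4.2, "G3": 0.8, "G4": 0.2, "G5": 0.4},
    "cohortB_GEP2": {"G1": 0.2, "G2": 0.1, "G3": 0.9, "G4": 4.0, "G5": 4.8},
}

edges = correlation_edges(metagenes, threshold=0.5)
communities = connected_components(metagenes.keys(), edges)
print(communities)
```

In this toy input each community pairs one metagene from cohort A with its counterpart from cohort B, which is the kind of cross-dataset grouping the integration aims for.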

Metagene and GEP are used interchangeably to match NMF/cNMF terminology.

Conceptual schematic

Features

Identification of biologically meaningful latent factors from deconvoluted omics data
Data-driven rank selection across multiple NMF or cNMF runs
Correlation-based integration of biological modules across datasets and platforms
Support for spatial, bulk, single-cell, and protein-level omics data
Community abstraction and network-level visualization for large integrative analyses
Assessment of sample-type enrichment using Pearson residuals (observed vs. expected counts; chi-squared framework)
Consistent network layouts for comparative, cross-dataset interpretation
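
The Pearson residual mentioned in the list above can be worked through on a small contingency table. The sketch below uses the standard definition, (observed - expected) / sqrt(expected), with expected counts taken from the row and column totals. How sotk2 itself builds the table is not described here, so the layout (communities as rows, sample types as columns) and the counts are assumptions for illustration only.

```python
from math import sqrt


def pearson_residuals(table):
    """Pearson residuals for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            out.append((observed - expected) / sqrt(expected) if expected > 0 else 0.0)
        residuals.append(out)
    return residuals


# Hypothetical counts: rows are two communities, columns are two sample types.
counts = [
    [30, 10],
    [10, 30],
]

residuals = pearson_residuals(counts)
# The chi-squared statistic is the sum of the squared Pearson residuals.
chi_squared = sum(r ** 2 for row in residuals for r in row)

for row in residuals:
    print([round(r, 3) for r in row])
print(round(chi_squared, 3))
```

A positive residual marks a sample type that is over-represented in a community relative to independence, and a negative residual marks under-representation.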

GitHub and Documentation

Source code: github.com/Snyder-Institute/sotk2

Documentation: Snyder-Institute.github.io/sotk2

Citation

Reproducible transcriptional modules define glioblastoma ecosystems across independent cohorts.
