Oncology | RNA-seq | DESeq2 | Completed

Tumor vs Normal Differential Expression Analysis

Project: Colorectal Cancer Cohort v2
Generated: March 28, 2026
Job ID: job_8f2a1c9d

Total genes tested: 18,432
Significant DEGs: 1,481
Tumor samples: 6
Normal samples: 6

Executive Summary

This analysis identifies differentially expressed genes (DEGs) between colorectal tumor tissue samples and matched adjacent normal tissue from 6 patients. Using a paired DESeq2 model, we identified 1,481 statistically significant DEGs (padj < 0.05, |log2FC| > 1.0) out of 18,432 genes passing the minimum count filter.

The transcriptional landscape of tumor samples is characterized by significant upregulation of oncogenes including MYC, KRAS, and PIK3CA, and reciprocal downregulation of tumor suppressor genes including TP53, SMAD4, and APC. These expression changes are consistent with the canonical molecular features of colorectal carcinogenesis.

Pathway analysis of the significant DEG set reveals strong enrichment for Wnt signaling, EGFR/MAPK pathway components, and TGF-β signaling — all established drivers of colorectal cancer progression. The Wnt/β-catenin axis shows particularly strong disruption, evidenced by the reciprocal expression pattern of APC (down, log2FC = −1.65) and CTNNB1 (up, log2FC = +2.14).

These results provide a quantitative, reproducible characterization of the tumor transcriptome in this cohort and establish a ranked gene list suitable for downstream pathway enrichment, network analysis, or validation studies.

Methods

1. Input data and preprocessing

Raw count matrices were generated from RNA-seq FASTQ files aligned to the human reference genome (GRCh38) using STAR v2.7.11a. Gene-level counts were quantified using featureCounts v2.0.6 against the Ensembl v110 annotation. Genes were retained only if they had at least 10 counts in at least 2 samples; this filter kept 18,432 expressed genes from an initial set of 24,851.
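The minimum count filter can be illustrated with a short Python sketch, assuming the rule listed under Workflow parameters (≥ 10 reads in ≥ 2 samples). The function names and the toy counts are hypothetical and are not part of the pipeline that produced this report.

```python
def passes_count_filter(counts, min_count=10, min_samples=2):
    """Return True if at least `min_samples` samples have >= `min_count` reads."""
    return sum(1 for c in counts if c >= min_count) >= min_samples


def filter_genes(count_matrix, min_count=10, min_samples=2):
    """Keep only the genes (gene -> per-sample counts) that pass the filter."""
    return {
        gene: counts
        for gene, counts in count_matrix.items()
        if passes_count_filter(counts, min_count, min_samples)
    }


# Hypothetical toy counts, not data from this report.
toy_counts = {
    "GENE_A": [0, 3, 12, 25, 1, 0],   # two samples reach 10 reads: kept
    "GENE_B": [9, 9, 9, 9, 9, 9],     # no sample reaches 10 reads: dropped
    "GENE_C": [150, 0, 0, 0, 0, 0],   # only one sample reaches 10 reads: dropped
}

kept = filter_genes(toy_counts)
print(sorted(kept))
```

Filtering on the number of samples that reach the threshold, rather than on the row total, prevents a single high-count sample from rescuing an otherwise unexpressed gene.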

2. Normalization

Library size normalization was performed using DESeq2's median-of-ratios method. A variance-stabilizing transformation (VST) was applied for visualization and clustering. Size factors ranged from 0.78 to 1.34 across samples, indicating comparable sequencing depth across libraries.
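The idea behind median-of-ratios size factors can be sketched in plain Python. This is a simplified illustration under stated assumptions, not DESeq2's implementation: the function name and toy matrix are hypothetical, and only genes with a nonzero count in every sample contribute to the estimate.

```python
import math
from statistics import median


def size_factors(count_matrix):
    """Median-of-ratios size factors for a list of gene rows (per-sample raw counts)."""
    n_samples = len(count_matrix[0])
    # Genes with any zero count are skipped so the log geometric mean stays finite.
    usable = [row for row in count_matrix if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log of each gene's geometric mean across samples (the pseudo-reference).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of this sample's count to the pseudo-reference, on the log scale.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical toy matrix: sample 2 was sequenced exactly twice as deeply as sample 1.
toy = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 7],  # contains a zero, so it is excluded from the estimate
]
factors = size_factors(toy)
print([round(f, 3) for f in factors])
```

Dividing each sample's raw counts by its size factor puts the libraries on a common scale; using the median makes the estimate robust to a minority of strongly differentially expressed genes.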

3. Differential expression testing

Differential expression was modeled using a negative binomial generalized linear model in DESeq2 v1.42.0, with a paired design formula (~patient + condition) to account for inter-patient variation. Log2 fold changes and their standard errors were estimated from the fitted model, and Wald tests on the condition coefficient were used to assess significance. Multiple testing correction was performed using the Benjamini-Hochberg procedure (FDR). Genes with padj < 0.05 and |log2FC| > 1.0 were considered statistically significant.
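The two statistical steps named here, the Wald test and the Benjamini-Hochberg adjustment, can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up p-values; DESeq2 layers further steps on top (such as dispersion estimation and independent filtering) that the sketch omits.

```python
import math


def wald_p_value(log2_fc, std_err):
    """Two-sided p-value for the Wald statistic z = estimate / standard error."""
    z = log2_fc / std_err
    # For a standard normal, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for illustration only.
p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(p_values)
print([round(q, 4) for q in adjusted])
```

A gene would then be called significant when its adjusted p-value and its absolute log2 fold change both meet the thresholds stated above.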

4. Output generation

Results were exported as a ranked DEG table (CSV), volcano plot (PNG/SVG), and this structured analysis report. All outputs are attached to the job provenance record and available for download.

Key Findings

847 upregulated genes [Upregulated]
Significant upregulation (log2FC ≥ 1, padj < 0.05) in tumor vs. normal samples.

634 downregulated genes [Downregulated]
Significant downregulation (log2FC ≤ −1, padj < 0.05) in tumor vs. normal samples.

MYC oncogene activation [Key oncogene]
MYC shows the highest fold change (log2FC = 3.12, padj = 2.1e-09), consistent with colorectal oncogenesis.

Wnt pathway disruption [Pathway signal]
APC and CTNNB1 show reciprocal expression changes, indicating Wnt/β-catenin pathway dysregulation.

Top Differentially Expressed Genes

log2 fold change (tumor vs. normal) for the top 12 DEGs by significance. Red bars indicate upregulation; blue bars indicate downregulation in tumor.


Top DEG Results Table

Sorted by adjusted p-value

Gene     log2FC   Base mean   padj      Direction
MYC      +3.12    1,420.3     2.1e-09   ↑ Up
BRCA2    +2.41      654.1     8.3e-09   ↑ Up
KRAS     +2.89    2,103.4     3.2e-08   ↑ Up
SMAD4    -2.31      892.6     4.8e-08   ↓ Down
CDH1     -2.54    1,231.8     1.7e-07   ↓ Down
TP53     -1.87    3,412.7     2.9e-07   ↓ Down
PIK3CA   +1.98      781.2     1.4e-06   ↑ Up
APC      -1.65      567.4     8.1e-06   ↓ Down

Showing top 8 of 1,481 significant DEGs. Download CSV for full results.
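For readers working from the downloaded CSV, the sketch below shows one way to rank rows by adjusted p-value and label their direction. The eight rows are copied from the table above; the `DegRow` type and `direction` function are hypothetical helpers, and the inclusive cutoff (|log2FC| ≥ 1.0) follows the Workflow parameters listing, which is an assumption given that the Methods text states a strict inequality.

```python
from typing import NamedTuple


class DegRow(NamedTuple):
    gene: str
    log2_fc: float
    base_mean: float
    padj: float


# Values copied from the Top DEG Results Table in this report.
ROWS = [
    DegRow("MYC", 3.12, 1420.3, 2.1e-09),
    DegRow("SMAD4", -2.31, 892.6, 4.8e-08),
    DegRow("BRCA2", 2.41, 654.1, 8.3e-09),
    DegRow("CDH1", -2.54, 1231.8, 1.7e-07),
    DegRow("KRAS", 2.89, 2103.4, 3.2e-08),
    DegRow("TP53", -1.87, 3412.7, 2.9e-07),
    DegRow("PIK3CA", 1.98, 781.2, 1.4e-06),
    DegRow("APC", -1.65, 567.4, 8.1e-06),
]


def direction(row, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Label a row as 'up', 'down', or 'not significant' under the given cutoffs."""
    if row.padj >= padj_cutoff or abs(row.log2_fc) < lfc_cutoff:
        return "not significant"
    return "up" if row.log2_fc > 0 else "down"


# Rank by adjusted p-value, most significant first.
ranked = sorted(ROWS, key=lambda r: r.padj)
for row in ranked:
    print(f"{row.gene:<7} {row.log2_fc:+.2f}  {row.padj:.1e}  {direction(row)}")
```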

Provenance

Workflow parameters

Tool: DESeq2 v1.42.0
Reference genome: GRCh38 (hg38)
Annotation: Ensembl v110
Minimum count filter: ≥ 10 reads in ≥ 2 samples
FDR threshold: 0.05 (Benjamini-Hochberg)
log2FC threshold: |log2FC| ≥ 1.0
Size factor estimation: Median-of-ratios
Normalization: VST (Variance Stabilizing Transformation)

Input files

File                       Type           SHA256          Size
counts_matrix_tumor.tsv    Count matrix   a4f2c1...d8e9   4.2 MB
counts_matrix_normal.tsv   Count matrix   b7d3e2...c5f1   3.9 MB
sample_metadata.csv        Metadata       c9a4b3...e2d7   1.1 KB
gene_annotation_hg38.gtf   Annotation     d1f5c4...a8b3   42.8 MB

This analysis is fully reproducible. The complete provenance record — including exact software versions, parameter values, input file checksums, and runtime environment — is stored and can be used to re-run this analysis from the same inputs at any time.
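Before re-running, the input files can be checked against the recorded checksums. The sketch below is one possible approach using Python's standard `hashlib`; the function names are hypothetical. Because the checksums shown in this report are abbreviated, the helper also accepts a `prefix...suffix` form, which is only a weak sanity check. The full digest from the provenance record should be preferred where available.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded(digest_hex, recorded):
    """Compare a full digest with a recorded value, full or abbreviated as 'prefix...suffix'."""
    recorded = recorded.lower()
    if "..." in recorded:
        # Abbreviated form: only the visible prefix and suffix can be compared.
        prefix, suffix = recorded.split("...", 1)
        return digest_hex.startswith(prefix) and digest_hex.endswith(suffix)
    return digest_hex == recorded


# In-memory demonstration with arbitrary bytes (not one of the report's input files).
demo_digest = hashlib.sha256(b"example input").hexdigest()
abbreviated = demo_digest[:6] + "..." + demo_digest[-4:]
print(matches_recorded(demo_digest, abbreviated))
```

In practice, `sha256_of_file` would be called on each file listed under Input files and the result passed to `matches_recorded` together with the recorded checksum.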
