Tumor vs Normal Differential Expression Analysis
Executive Summary
This analysis identifies differentially expressed genes (DEGs) between colorectal tumor tissue samples and matched adjacent normal tissue from 6 patients. Using a paired DESeq2 model, we identified 1,481 statistically significant DEGs (padj < 0.05, |log2FC| > 1.0) out of 18,432 genes passing the minimum count filter.
The transcriptional landscape of tumor samples is characterized by significant upregulation of oncogenes including MYC, KRAS, and PIK3CA, and reciprocal downregulation of tumor suppressor genes including TP53, SMAD4, and APC. These expression changes are consistent with the canonical molecular features of colorectal carcinogenesis.
Pathway analysis of the significant DEG set reveals strong enrichment for Wnt signaling, EGFR/MAPK pathway components, and TGF-β signaling — all established drivers of colorectal cancer progression. The Wnt/β-catenin axis shows particularly strong disruption, evidenced by the reciprocal expression pattern of APC (down, log2FC = −1.65) and CTNNB1 (up, log2FC = +2.14).
These results provide a quantitative, reproducible characterization of the tumor transcriptome in this cohort and establish a ranked gene list suitable for downstream pathway enrichment, network analysis, or validation studies.
Methods
1. Input data and preprocessing
Raw count matrices were generated from RNA-seq FASTQ files aligned to the human reference genome (GRCh38) using STAR v2.7.11a. Gene-level counts were quantified using featureCounts v2.0.6 against the Ensembl v110 annotation. Genes were retained only if they had at least 10 counts in at least 2 samples; this filter kept 18,432 expressed genes from an initial set of 24,851.
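The minimum count filter can be illustrated with a small sketch. This is not the pipeline's actual code: the function name `filter_low_counts` and the toy matrix are hypothetical, and the only values taken from this report are the thresholds (at least 10 reads in at least 2 samples).

```python
import numpy as np

def filter_low_counts(counts, min_count=10, min_samples=2):
    """Return a boolean mask over genes (rows) marking which ones to keep.

    A gene is kept when at least `min_samples` samples have
    `min_count` or more reads.
    """
    counts = np.asarray(counts)
    samples_passing = (counts >= min_count).sum(axis=1)
    return samples_passing >= min_samples

# Toy matrix: 4 genes x 4 samples (illustrative values, not study data)
toy_counts = np.array([
    [0, 3, 12, 1],     # only one sample reaches 10 reads -> dropped
    [15, 22, 9, 30],   # three samples reach 10 reads -> kept
    [10, 10, 0, 0],    # exactly two samples reach 10 reads -> kept
    [2, 1, 0, 4],      # no sample reaches 10 reads -> dropped
])

keep = filter_low_counts(toy_counts)
filtered = toy_counts[keep]
print(f"kept {keep.sum()} of {keep.size} genes")
```

Applied to the real count matrix, the same mask would be expected to keep 18,432 of the 24,851 genes, assuming the filter was implemented as described here.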
2. Normalization
Library size normalization was performed using DESeq2's median-of-ratios method. Variance-stabilizing transformation (VST) was applied for visualization and clustering only. Size factors ranged from 0.78 to 1.34 across samples, indicating that sequencing depth was broadly comparable across libraries.
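For readers unfamiliar with the median-of-ratios method, the following sketch reproduces the idea in NumPy. It is a simplified illustration rather than DESeq2's own code: the function name and the toy matrix are hypothetical, and genes with a zero count in any sample are skipped because their geometric mean is zero.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Estimate one size factor per sample (column) from a genes x samples matrix."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes observed in every sample, so the log is defined
    positive = (counts > 0).all(axis=1)
    log_counts = np.log(counts[positive])
    # Per-gene log geometric mean acts as a pseudo-reference sample
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median over genes of the ratio sample / reference
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Toy matrix: sample 2 was sequenced twice as deeply as sample 1
toy = np.array([
    [100, 200],
    [50, 100],
    [30, 60],
    [0, 5],    # contains a zero, so it is ignored when estimating factors
])

size_factors = median_of_ratios_size_factors(toy)
normalized = toy / size_factors
print(size_factors)
```

Dividing each column by its size factor puts the samples on a common scale, which is why factors close to 1 (0.78 to 1.34 in this report) indicate similar sequencing depth.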
3. Differential expression testing
Differential expression was modeled using a negative binomial generalized linear model in DESeq2 v1.42.0, with a paired design formula (~patient + condition) to account for inter-patient variation. Log2 fold changes and their standard errors were estimated from the fitted model, and Wald tests were used to assess the significance of the condition effect. Multiple testing correction was performed using the Benjamini-Hochberg procedure to control the false discovery rate (FDR). Genes with padj < 0.05 and |log2FC| > 1.0 were considered statistically significant.
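The multiple testing correction and significance call can be sketched as follows. This is a minimal NumPy version of the standard Benjamini-Hochberg adjustment under stated assumptions: the function names and toy p-values are hypothetical, and DESeq2's independent filtering step (which can set some adjusted p-values to NA) is not reproduced.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def is_significant(padj, log2fc, alpha=0.05, lfc_cutoff=1.0):
    """Apply the report's thresholds: padj < alpha and |log2FC| > lfc_cutoff."""
    padj = np.asarray(padj, dtype=float)
    log2fc = np.asarray(log2fc, dtype=float)
    return (padj < alpha) & (np.abs(log2fc) > lfc_cutoff)

# Toy inputs (illustrative values, not study data)
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_lfc = [2.0, 0.5, -1.5, 3.0]

padj = benjamini_hochberg(toy_p)
calls = is_significant(padj, toy_lfc)
print(padj, calls)
```

In the toy data the second gene passes the FDR threshold but is not called significant, because its absolute log2 fold change is below the cutoff.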
4. Output generation
Results were exported as a ranked DEG table (CSV), volcano plot (PNG/SVG), and this structured analysis report. All outputs are attached to the job provenance record and available for download.
Key Findings
847 upregulated genes
Upregulated: Significant upregulation (log2FC ≥ 1, padj < 0.05) in tumor vs. normal samples.
634 downregulated genes
Downregulated: Significant downregulation (log2FC ≤ −1, padj < 0.05) in tumor vs. normal samples.
MYC oncogene activation
Key oncogene: MYC shows the highest fold change (log2FC = 3.12, padj = 2.1e-09), consistent with colorectal oncogenesis.
Wnt pathway disruption
Pathway signal: APC and CTNNB1 show reciprocal expression changes, indicating Wnt/β-catenin pathway dysregulation.
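Because the findings are reported on the log2 scale, it can help to convert them back to linear fold changes. The sketch below does this for three values quoted in this report (MYC, CTNNB1, APC); the helper name `log2fc_to_fold_change` is hypothetical.

```python
def log2fc_to_fold_change(log2fc):
    """Convert a log2 fold change to a linear tumor/normal expression ratio."""
    return 2.0 ** log2fc

# log2FC values as quoted in this report
reported = {"MYC": 3.12, "CTNNB1": 2.14, "APC": -1.65}

for gene, lfc in reported.items():
    ratio = log2fc_to_fold_change(lfc)
    print(f"{gene}: log2FC = {lfc:+.2f} -> {ratio:.2f}x tumor/normal")
```

On this scale, MYC's log2FC of 3.12 corresponds to roughly an 8.7-fold increase in tumor, while APC's log2FC of −1.65 corresponds to expression at roughly one third of the normal level.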
Top Differentially Expressed Genes
log2 fold change (tumor vs. normal) for the top 12 DEGs by significance. Red bars indicate upregulation; blue bars indicate downregulation in tumor.
Top DEG Results Table
Sorted by adjusted p-value
| Gene | log2FC | Base mean | padj | Direction |
|---|---|---|---|---|
| MYC | +3.12 | 1,420.3 | 2.1e-09 | ↑ Up |
| BRCA2 | +2.41 | 654.1 | 8.3e-09 | ↑ Up |
| KRAS | +2.89 | 2,103.4 | 3.2e-08 | ↑ Up |
| SMAD4 | −2.31 | 892.6 | 4.8e-08 | ↓ Down |
| CDH1 | −2.54 | 1,231.8 | 1.7e-07 | ↓ Down |
| TP53 | −1.87 | 3,412.7 | 2.9e-07 | ↓ Down |
| PIK3CA | +1.98 | 781.2 | 1.4e-06 | ↑ Up |
| APC | −1.65 | 567.4 | 8.1e-06 | ↓ Down |
Showing top 8 of 1,481 significant DEGs. Download CSV for full results.
Provenance
Workflow parameters
| Parameter | Value |
|---|---|
| Tool | DESeq2 v1.42.0 |
| Reference genome | GRCh38 (hg38) |
| Annotation | Ensembl v110 |
| Minimum count filter | ≥ 10 reads in ≥ 2 samples |
| FDR threshold | 0.05 (Benjamini-Hochberg) |
| log2FC threshold | \|log2FC\| ≥ 1.0 |
| Size factor estimation | Median-of-ratios |
| Transformation (visualization and clustering) | VST (Variance Stabilizing Transformation) |
Input files
| File | Type | SHA256 | Size |
|---|---|---|---|
| counts_matrix_tumor.tsv | Count matrix | a4f2c1...d8e9 | 4.2 MB |
| counts_matrix_normal.tsv | Count matrix | b7d3e2...c5f1 | 3.9 MB |
| sample_metadata.csv | Metadata | c9a4b3...e2d7 | 1.1 KB |
| gene_annotation_hg38.gtf | Annotation | d1f5c4...a8b3 | 42.8 MB |
This analysis is fully reproducible. The complete provenance record — including exact software versions, parameter values, input file checksums, and runtime environment — is stored and can be used to re-run this analysis from the same inputs at any time.
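Before re-running the analysis, the input files can be checked against their recorded checksums. The sketch below uses Python's standard `hashlib` module; the helper names are hypothetical, and because the digests shown in this report are truncated, the full SHA256 values would have to be taken from the stored provenance record.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_recorded(path, recorded_hex):
    """Return True when the file's digest equals the full recorded digest."""
    return sha256_of_file(path) == recorded_hex.strip().lower()
```

A typical use would be to call `matches_recorded("counts_matrix_tumor.tsv", full_digest)` for each input file, where `full_digest` is the untruncated value from the provenance record, and to stop if any comparison returns False.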