date: 2024-11-05
title: Lab-2
status: DONE
author:
- AllenYGY
tags:
- Lab
publish: True
Lab-2
install.packages("BiocManager")
BiocManager::install(c("S4Vectors", "IRanges", "GenomicRanges", "SummarizedExperiment",
"BiocGenerics", "Biobase", "BiocParallel", "matrixStats", "locfit", "ggplot2"))
In this step, we load the raw counts and sample mapping data for further analysis.
rawCounts <- read.delim("E-GEOD-50760-raw-counts.tsv")
sampleData <- read.delim("E-GEOD-50760-experiment-design.tsv")
# Prepare countData and colData
countData <- subset(rawCounts, select = -c(Gene.Name, Gene.ID))
rownames(countData) <- rawCounts$Gene.ID
colData <- data.frame(condition = sampleData$Sample.Characteristic.biopsy.site.)
rownames(colData) <- sampleData$Run
Read the DESeq2 manual for reference: DESeq2 Manual
# Load DESeq2
library(DESeq2)
# Create DESeq2 dataset
dds <- DESeqDataSetFromMatrix(countData = countData,
colData = colData,
design = ~condition)
# Run DESeq2 to identify differentially expressed genes
dds <- DESeq(dds)
res <- results(dds)
# Summary of results
summary(res)
# Filter and save DE genes by p-value
DE_genes <- subset(res, padj < 0.05)
write.csv(as.data.frame(DE_genes), file = "DE_genes.csv") # Save as .csv
Identify functional enrichment of the DE genes using clusterProfiler and org.Hs.eg.db packages.
# Load necessary libraries
library(clusterProfiler)
library(org.Hs.eg.db)
library(ggplot2)
# Gene set enrichment analysis
gene_id_list <- rownames(DE_genes)
GO_result <- enrichGO(gene_id_list,
OrgDb = org.Hs.eg.db,
ont = "ALL",
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
keyType = 'ENSEMBL')
# Plotting gene set enrichment results
dotplot(GO_result, x = "GeneRatio", color = "p.adjust", showCategory = 10, title = "Gene Ontology Dotplot") +
theme(axis.text.y = element_text(size = 8))
barplot(GO_result, x = "GeneRatio", color = "p.adjust", showCategory = 10, title = "Gene Ontology Barplot") +
theme(axis.text.y = element_text(size = 8))
cnetplot(GO_result, showCategory = 10)
The results of your gene set enrichment analysis (GSEA) for the differentially expressed (DE) genes between tumor and normal samples provide insights into the biological processes that are significantly associated with the observed differences.
The dot plot and bar plot you generated highlight the top enriched Gene Ontology (GO) terms based on the adjusted p-value (p.adjust). The size of each point in the dot plot represents the count of genes associated with each GO term, while the color gradient shows the level of significance, with red representing the most significant terms. In both plots, terms related to metabolic processes, particularly those involved in amino acid and small molecule metabolism, are prominently enriched.
The GSEA results support the idea that tumor cells may have a distinct metabolic profile compared to normal cells, potentially highlighting pathways that could be targeted for therapeutic intervention.
Apply hierarchical clustering to genes and generate a heatmap.
# Load pheatmap
library(pheatmap)
# Create heatmap
pheatmap(cor(countData),
annotation = colData,
cluster_cols = FALSE,
color = hcl.colors(50, "BluYl")) # Adjust color for heatmap