log.tdata.FPKM.sample.info.subset.WGCNA <- readRDS(here("Data","log.tdata.FPKM.sample.info.subset.missing.WGCNA.RData"))
The WGCNA package requires some strict R session environment set up.
options(stringsAsFactors=FALSE)
# Run the following only if running from terminal
# Does NOT work in RStudio
#enableWGCNAThreads()
WGCNA uses a smooth softmax function to force a scale-free topology on the gene expression network. I cycled through a set of threshold power values to see which one is the best for the specific topology. As recommended by the authors, I choose the smallest power such that the scale-free topology correlation attempts to hit 0.9.
threshold <- softthreshold(soft)
knitr::kable(soft$fitIndices)
Power | SFT.R.sq | slope | truncated.R.sq | mean.k. | median.k. | max.k. |
---|---|---|---|---|---|---|
1 | 0.8612614 | 2.7655275 | 0.9878786 | 6450.08917 | 6489.58052 | 9192.6265 |
2 | 0.4603610 | 0.6900897 | 0.8777347 | 3403.63718 | 3317.36163 | 6173.7855 |
3 | 0.0115589 | -0.0654616 | 0.5811361 | 2104.88148 | 1960.49020 | 4565.3419 |
4 | 0.3740690 | -0.4796736 | 0.6154529 | 1428.36622 | 1261.13792 | 3615.3711 |
5 | 0.5856041 | -0.6794014 | 0.7173936 | 1030.57772 | 848.96741 | 2955.3740 |
6 | 0.7186021 | -0.7823568 | 0.8109538 | 776.74911 | 594.07548 | 2473.5780 |
7 | 0.8102699 | -0.8682678 | 0.8829116 | 604.94642 | 426.65977 | 2137.6500 |
8 | 0.8967570 | -0.9196192 | 0.9550448 | 483.36991 | 312.54528 | 1877.8028 |
9 | 0.9464581 | -0.9647767 | 0.9899286 | 394.28772 | 232.98209 | 1668.3541 |
10 | 0.9566120 | -1.0260007 | 0.9925092 | 327.16020 | 179.37372 | 1520.1972 |
11 | 0.9597939 | -1.0793124 | 0.9872616 | 275.39576 | 142.03410 | 1396.6941 |
12 | 0.9627006 | -1.1166370 | 0.9822050 | 234.69752 | 113.93864 | 1291.7740 |
13 | 0.9622471 | -1.1496961 | 0.9739067 | 202.16652 | 92.12115 | 1201.4449 |
14 | 0.9612874 | -1.1732983 | 0.9665201 | 175.78938 | 75.19930 | 1122.7884 |
15 | 0.9604404 | -1.1962895 | 0.9610598 | 154.13277 | 61.84509 | 1058.3629 |
16 | 0.9605814 | -1.2137460 | 0.9576800 | 136.15417 | 51.14912 | 1002.0938 |
17 | 0.9609191 | -1.2279039 | 0.9543414 | 121.08105 | 42.36921 | 951.8397 |
18 | 0.9607941 | -1.2413496 | 0.9517050 | 108.33143 | 35.28572 | 906.6480 |
19 | 0.9560648 | -1.2510075 | 0.9441395 | 97.46037 | 29.38987 | 865.7590 |
20 | 0.9564626 | -1.2607085 | 0.9441400 | 88.12325 | 24.88051 | 828.5598 |
soft$fitIndices %>%
download_this(
output_name = "Soft Threshold All Data",
output_extension = ".csv",
button_label = "Download data as csv",
button_type = "info",
has_icon = TRUE,
icon = "fa fa-save"
)
I generated a weighted adjacency matrix for the gene expression profile network using the softmax threshold I determined in the previous step. Then I generate the Topological Overlap Matrix (TOM) for the adjacency matrix. This reduces noise and spurious associations according to the authors. I use the TOM matrix to cluster the profiles of each metabolite based on their similarity. I use dynamic tree cutting to generate modules.
soft.power <- 8
cat("\n", "Power chosen")
##
## Power chosen
soft.power
## [1] 8
cat("\n \n")
adj.mtx <- adjacency(log.tdata.FPKM.sample.info.subset.WGCNA, power=soft.power)
TOM = TOMsimilarity(adj.mtx)
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
diss.TOM = 1 - TOM
metabolite.clust <- hclust(as.dist(diss.TOM), method="average")
min.module.size <- 15
modules <- cutreeDynamic(dendro=metabolite.clust, distM=diss.TOM, deepSplit=2, pamRespectsDendro=FALSE, minClusterSize=min.module.size)
## ..cutHeight not given, setting it to 0.995 ===> 99% of the (truncated) height range in dendro.
## ..done.
# Plot modules
cols <- labels2colors(modules)
table(cols)
## cols
## black blue brown brown4 cyan
## 500 1798 1634 30 231
## darkgreen darkgrey darkmagenta darkolivegreen darkorange
## 134 127 85 86 113
## darkorange2 darkred darkturquoise floralwhite green
## 38 158 128 40 1214
## greenyellow grey grey60 ivory lightcyan
## 404 29 198 44 210
## lightcyan1 lightgreen lightsteelblue1 lightyellow magenta
## 45 175 56 169 450
## mediumpurple3 midnightblue orange orangered4 paleturquoise
## 57 228 117 58 93
## pink plum1 purple red royalblue
## 487 62 445 665 165
## saddlebrown salmon sienna3 skyblue skyblue3
## 100 254 82 103 65
## steelblue tan turquoise violet white
## 96 326 4287 87 108
## yellow yellowgreen
## 1274 77
plotDendroAndColors(
metabolite.clust, cols, xlab="", sub="", main="Gene Clustering on TOM-based Similarity",
dendroLabels=FALSE, hang=0.03, addGuide=TRUE, guideHang=0.05
)
This function in WGCNA calculates the total connectivity of a node (metabolite) in the network. The kTotal column specifies this connectivity. The kWithin column specifies the node connectivity within its module. The kOut column specifies the difference between kTotal and kWithin. Generally, the out degree should be lower than the within degree, since the WGCNA procedure groups similar metabolite profiles together.
net.deg <- intramodularConnectivity(adj.mtx, merged.cols)
saveRDS(net.deg,here("Data","Chang_2DG_BL6_connectivity.RData"))
head(net.deg)
## kTotal kWithin kOut kDiff
## ENSMUSG00000000001 296.99507 99.48080 197.51426 -98.03346441
## ENSMUSG00000000028 376.29779 343.86570 32.43209 311.43360697
## ENSMUSG00000000031 216.08673 116.59827 99.48846 17.10980402
## ENSMUSG00000000037 142.45133 73.60603 68.84530 4.76072455
## ENSMUSG00000000049 241.81391 115.13560 126.67831 -11.54270168
## ENSMUSG00000000056 56.11815 28.09418 28.02397 0.07020203
The dataset generated coexpression modules of the following sizes. The grey submodule represents metabolites that did not fit well into any of the modules.
table(merged.cols)
## merged.cols
## black darkgreen darkolivegreen darkorange2 floralwhite
## 675 3102 892 321 40
## grey ivory lightcyan lightcyan1 lightsteelblue1
## 29 44 327 45 156
## lightyellow mediumpurple3 midnightblue orangered4 pink
## 5201 311 610 58 487
## royalblue skyblue skyblue3 turquoise violet
## 165 430 65 4287 87
# Module names in descending order of size
module.labels <- names(table(merged.cols))[order(table(merged.cols), decreasing=T)]
# Order eigenmetabolites by module labels
merged.mes <- merged.mes[,paste0("ME", module.labels)]
colnames(merged.mes) <- paste0("log.tdata.FPKM.sample.info.subset.WGCNA_", module.labels)
module.eigens <- merged.mes
# Add log.tdata.FPKM.sample.info.subset.WGCNA tag to labels
module.labels <- paste0("log.tdata.FPKM.sample.info.subset.WGCNA_", module.labels)
# Generate and store metabolite membership
module.membership <- data.frame(
Gene=colnames(log.tdata.FPKM.sample.info.subset.WGCNA),
Module=paste0("log.tdata.FPKM.sample.info.subset.WGCNA_", merged.cols)
)
saveRDS(module.labels, here("Data","log.tdata.FPKM.sample.info.subset.WGCNA.module.labels.RData"))
saveRDS(module.eigens, here("Data","log.tdata.FPKM.sample.info.subset.WGCNA.module.eigens.RData"))
saveRDS(module.membership, here("Data", "log.tdata.FPKM.sample.info.subset.WGCNA.module.membership.RData"))
write.table(module.labels, here("Data","log.tdata.FPKM.sample.info.subset.WGCNA.module.labels.csv"), row.names=F, col.names=F, sep=",")
write.csv(module.eigens, here("Data","log.tdata.FPKM.sample.info.subset.WGCNA.module.eigens.csv"))
write.csv(module.membership, here("Data","log.tdata.FPKM.sample.info.subset.WGCNA.module.membership.csv"), row.names=F)
Analysis performed by Ann Wells
The Carter Lab The Jackson Laboratory 2023
ann.wells@jax.org