Project: NEGEDIA

This report summarises differential gene expression analysis as performed by the negedia/degsanalysis pipeline.

A summary of sample sequencing metrics is below:

Comparisons defined by the user are listed in the following table:

Results

Counts

The results are derived from the global gene expression analysis of the experiment and include only genes expressed in at least one sample (genes with 0 counts in all samples are excluded).
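The filtering rule above can be sketched in a few lines. This is a minimal illustration, not the pipeline's own code; the gene names, the count values and the function name `drop_unexpressed` are hypothetical.

```python
# Hypothetical raw count table: gene name -> counts per sample.
counts = {
    "GENE_A": [0, 0, 0, 0],
    "GENE_B": [12, 0, 7, 3],
    "GENE_C": [0, 0, 0, 1],
}


def drop_unexpressed(table):
    """Keep only genes with a non-zero count in at least one sample."""
    return {gene: values for gene, values in table.items()
            if any(v > 0 for v in values)}


expressed = drop_unexpressed(counts)
print(sorted(expressed))  # ['GENE_B', 'GENE_C']
```

GENE_A is removed because its count is 0 in every sample, while GENE_C is kept because a single non-zero count is enough.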

Preliminary analysis

Expression value distributions

The following plots show the log2 expression value distributions of the experiment samples.

The distribution of normalised counts per gene, for each sample, is represented in the form of a box plot.

The density plot represents the distribution of the log2 normalised counts.

Sample correlation

Principal components analysis was conducted based on the 500 most variable genes. Each component was annotated with its percent contribution to variance.
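The two steps described here, selecting the most variable genes and annotating each component with its share of variance, can be sketched as follows. This is an illustration under assumptions: the function names and the toy expression values are hypothetical, the variance is computed on whatever normalised values are supplied, and the eigenvalues passed to `percent_variance` are assumed to come from a separate PCA routine that is not shown.

```python
from statistics import variance


def top_variable_genes(expr, n=500):
    """Return the n gene names with the highest variance across samples."""
    ranked = sorted(expr, key=lambda gene: variance(expr[gene]), reverse=True)
    return ranked[:n]


def percent_variance(eigenvalues):
    """Convert PCA eigenvalues into percent contribution to total variance."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


# Hypothetical normalised expression values: gene name -> value per sample.
expr = {
    "GENE_A": [5.0, 5.1, 4.9, 5.0],
    "GENE_B": [1.0, 9.0, 2.0, 8.0],
    "GENE_C": [3.0, 4.0, 3.0, 4.0],
}

selected = top_variable_genes(expr, n=2)
print(selected)
print(percent_variance([3.0, 1.0]))
```

With the toy data, GENE_B and GENE_C are selected because GENE_A barely varies between samples.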

Normalised (Condition)

An ANOVA test was used to determine associations between continuous principal components and categorical covariates (including the variable of interest).

The resulting p values are illustrated below.

The variable ‘Condition’ shows an association with PC1 (49.7%) (p = 0.00).
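As an illustration of the test behind these p values, the sketch below computes a one-way ANOVA F statistic for the scores of one principal component grouped by a categorical covariate. It is a minimal sketch, not the pipeline's implementation: the PC1 scores and the group labels are invented, and converting F to a p value needs the F distribution, which is left out here (a statistics library such as SciPy provides it through `scipy.stats.f_oneway`).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups (levels of the covariate)
    n = len(all_values)    # total number of samples
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical PC1 scores for two levels of 'Condition'.
pc1_control = [-3.0, -2.0, -1.0]
pc1_treated = [1.0, 2.0, 3.0]

f_stat = one_way_anova_f([pc1_control, pc1_treated])
print(f_stat)
```

A large F value means the component separates the groups far more than it varies within them, which corresponds to a small p value.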

Hierarchical clustering of genes was performed using the 500 most variable genes. Distances between genes were estimated from Spearman correlation.
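A correlation-based distance of this kind can be sketched as below. The report does not state how the correlation is turned into a distance, so the form 1 - rho used here is an assumption, and all function names and values are illustrative.

```python
from math import sqrt
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_distance(x, y):
    """Assumed distance: 1 minus the Spearman rank correlation."""
    return 1.0 - pearson(ranks(x), ranks(y))


gene_1 = [1.0, 2.0, 3.0, 4.0]
gene_2 = [1.0, 4.0, 9.0, 100.0]   # same ordering as gene_1
gene_3 = [4.0, 3.0, 2.0, 1.0]     # reversed ordering

print(spearman_distance(gene_1, gene_2))
print(spearman_distance(gene_1, gene_3))
```

Because Spearman correlation works on ranks, gene_1 and gene_2 are at distance 0 despite their very different magnitudes, while the perfectly reversed gene_3 is at the maximum distance of 2.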

Normalised (Condition)

Differential analysis

Differentially expressed genes

Differential expression details

The red dots represent Differentially Expressed Genes (DEGs). Genes with a log2 fold change greater than or equal to the specified value are upregulated compared to the reference (dots on the right). Genes with a log2 fold change smaller than or equal to the specified value are downregulated compared to the reference (dots on the left). Genes with an FDR or padj value less than or equal to 0.05 are considered statistically significant.
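The classification rule described above can be written as a small function. This is a sketch rather than the pipeline's code: the default thresholds (1.5 for the absolute log2 fold change, 0.05 for padj) are taken from the DEGs ANALYSIS table of this report, while the function name, the labels and the handling of a missing padj as not significant are assumptions.

```python
def classify_gene(log2_fc, padj, lfc_threshold=1.5, padj_threshold=0.05):
    """Label a gene as 'up', 'down' or 'not significant'."""
    # Assumption: a missing adjusted p value is treated as not significant.
    if padj is None or padj > padj_threshold:
        return "not significant"
    if log2_fc >= lfc_threshold:
        return "up"
    if log2_fc <= -lfc_threshold:
        return "down"
    return "not significant"


print(classify_gene(2.0, 0.01))    # up
print(classify_gene(-1.5, 0.05))   # down
print(classify_gene(3.0, 0.20))    # not significant
print(classify_gene(0.5, 0.001))   # not significant
```

Both conditions must hold for a gene to count as a DEG: a strong fold change with a high padj, or a significant padj with a small fold change, is not enough.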

Heatmap of normalized scaled expression per replicate of DEGs in the selected comparison.
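The report does not specify how expression is scaled for the heatmap. A common choice, assumed here purely for illustration, is a per-gene z-score: each gene's values are centred on their mean and divided by their standard deviation so that genes with different expression levels share one colour scale. The function name and values below are hypothetical.

```python
from statistics import mean, stdev


def scale_row(values):
    """Centre a gene's values on their mean and divide by the sample standard deviation."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        # A gene with identical values in all samples carries no contrast.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


scaled = scale_row([1.0, 2.0, 3.0])
print(scaled)
```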

Gene set analysis

Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
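The core of the method is a running-sum statistic over a ranked gene list: the sum increases when a gene belongs to the set and decreases otherwise, and the enrichment score (ES) is the largest deviation from zero. The sketch below follows that idea with hits weighted by the absolute ranking score. It is a simplified illustration, not the software used by the pipeline; the gene names and scores are invented, and the permutation step that yields NES and FDR is not shown.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum ES for a gene list sorted by decreasing ranking score."""
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_weights = [abs(s) ** weight if hit else 0.0
                   for s, hit in zip(scores, in_set)]
    total_hit_weight = sum(hit_weights)
    running, best = 0.0, 0.0
    for w, hit in zip(hit_weights, in_set):
        if hit:
            running += w / total_hit_weight   # step up on a set member
        else:
            running -= 1.0 / n_miss           # step down otherwise
        if abs(running) > abs(best):
            best = running                    # keep the largest deviation
    return best


genes = ["g1", "g2", "g3", "g4"]
ranking = [2.0, 1.0, -1.0, -2.0]

es_top = enrichment_score(genes, ranking, {"g1", "g2"})
es_bottom = enrichment_score(genes, ranking, {"g3", "g4"})
print(es_top, es_bottom)
```

A set concentrated at the top of the ranking gives a positive ES and a set concentrated at the bottom gives a negative ES. The sketch assumes at least one set member has a non-zero ranking score.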

Analysed databases:

c2.cp.kegg: Canonical Pathways gene sets derived from the KEGG pathway database.
c2.cp.reactome: Canonical Pathways gene sets derived from the Reactome pathway database.
c2.cp.wikipathways: Canonical Pathways gene sets derived from the WikiPathways pathway database.
c5.go.bp: Gene sets derived from the GO Biological Process ontology.
c5.go.cc: Gene sets derived from the GO Cellular Component ontology.
c5.go.mf: Gene sets derived from the GO Molecular Function ontology.
c8.all: Gene sets that contain curated cluster markers for cell types identified in single-cell sequencing studies of human tissue.

GSEA

The tables report the pathway name, the pathway size, the enrichment score (ES), the normalized enrichment score (NES), and the FDR value.

Materials and Methods

Library Preparation

Total RNA was quantified using the Qubit 4.0 fluorimetric assay (Thermo Fisher Scientific). Libraries were prepared from 125 ng of total RNA using the NEGEDIA Digital mRNA-seq research grade sequencing service v2.0 (Next Generation Diagnostic srl)¹, which included library preparation, quality assessment and sequencing on a NovaSeq 6000 sequencing system (Illumina Inc.) using a single-end, 100-cycle strategy.

Bioinformatics workflow

The raw data were analyzed with the proprietary NEGEDIA Digital mRNA-seq pipeline (v2.0) from Next Generation Diagnostic srl, which involves quality filtering and trimming, alignment to the reference genome and counting by gene²³⁴. The raw expression data were normalized and analyzed with the NEGEDIA degsanalysis pipeline (v1.2.0)⁵⁶ and visualized in a proprietary report (v1.0).

QUICK GEODATASET REFERENCE

PROTOCOLS	Description
Growth Protocol	Defined by the user
Treatment Protocol	Defined by the user
Extract Protocol	Defined by the user
Library Construction Protocol	NEGEDIA Digital mRNA-seq research grade sequencing service v2.0 (Next Generation Diagnostic srl)
Library Strategy	NEGEDIA Digital mRNA-seq v2.0

DATA PROCESSING PIPELINE	Description
Data Processing Step	Illumina NovaSeq 6000 base call (BCL) files were converted to fastq files using bcl2fastq
Data Processing Step	Trimming and cleaning with bbduk
Data Processing Step	Alignment was performed with STAR 2.6.0a
Data Processing Step	Gene expression levels were determined with htseq-count (HTSeq 0.9.1)
Genome Build	hg38
Processed Data Files Format and Content	Tab-delimited text files including raw counts

DEGs ANALYSIS	Description
Fold Change	log2(Fold Change) <= -1.5 or log2(Fold Change) >= 1.5
pAdj	<= 0.05

GSEA ANALYSIS	Description
FDR q-val	<= 25%
Permutation number	1000

¹ Xiong Y, Soumillon M, Wu J et al. A Comparison of mRNA Sequencing with Random Primed and 3′-Directed Libraries. Sci Rep 7, 14626 (2017). https://doi.org/10.1038/s41598-017-14892-x
² Bushnell, Brian. 2014. “BBMap: A Fast, Accurate, Splice-Aware Aligner”. United States. https://www.osti.gov/servlets/purl/1241166.
³ Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan 1;29(1):15-21. doi: 10.1093/bioinformatics/bts635. Epub 2012 Oct 25. PubMed PMID: 23104886; PubMed Central PMCID: PMC3530905.
⁴ Anders S, Pyl PT, Huber W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015 Jan 15;31(2):166-9. doi: 10.1093/bioinformatics/btu638. Epub 2014 Sep 25. PubMed PMID: 25260700; PubMed Central PMCID: PMC4287950.
⁵ Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol.
⁶ Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005 Oct 25;102(43):15545-50. doi: 10.1073/pnas.0506580102. Epub 2005 Sep 30. PMID: 16199517; PMCID: PMC1239896.
