Metatranscriptomics without Metagenomics (defined Community)

Protocol provided by Anna Sintsova.

Important

This documentation is currently under construction.

Metatranscriptomic data arising from defined communities (i.e. community, whose composition is known) can be analysed in a way that’s similar to traditional RNASeq with a few key differences. In this case, we first map the quality-controlled reads to the bacterial genomes, and then count number of reads mapping to each feature. The statistical analysis to identify differentially expressed features can be performed using DESeq2.

flowchart LR id1(Preprocessing the genome) --> id2(create metagenome fa:fa-cog cat) id2 --> id3(genome index fa:fa-cog bowtie2) classDef tool fill:#96D2E7,stroke:#F8F7F7,stroke-width:1px; style id1 fill:#5A729A,stroke:#F8F7F7,stroke-width:1px,color:#fff class id2,id3 tool

Once the metagenome is ready, you are read to proceed with transcript quantification workflow

Transcript quantification

flowchart LR id1(Data preprocessing) --> id2(Genome alignment fa:fa-cog bowtie2) id2 --> id3(Insert counting fa:fa-cog featureCounts) id3 --> id4(Statistical/ analysis fa:fa-cog DESeq2) classDef tool fill:#96D2E7,stroke:#F8F7F7,stroke-width:1px; style id1 fill:#5A729A,stroke:#F8F7F7,stroke-width:1px,color:#fff class id2,id3,id4 tool

Preprocessing the genomes

First we create a metagenome, i.e. a concatenation of genome sequences for all of the organisms present in the community. We also need to create a combined annotation file (gff). This will be needed later on to count how many reads mapped to each gene.

Note

Why is competitive mapping important? It properly accounts for sequences that potentially map to multiple targets/species (multi-mappers, count as fractions). If the sample was mapped to each species individually, these reads will be counted towards each genome, overestimating the counts.

cat species1.fasta species2.fasta > metagenome.fasta

Here, we build the genome index using bowtie2. This is an essential step before any read alignment step regardless of aligner you choose.

bowtie2-build metagenome.fasta metagenome

Transcript profiling

(Optional) Depending on the library preparation strategy, metatranscriptomic samples can contain large amounts of rRNA. You can use fastqc_screen to assess the amount of rRNA in your samples, and sortmerna[] to filter it out.

Next, we align reads from each sample to our indexed metagenome.

bowtie2 ...

Note

Different alignment and counting tools can be used for this step. We have tested BWA + sushiCounter, salmon as well as bowtie2 + featureCounts. In our hands, all of these pipelines produce very similar results. It is always best to test and see what works for your data!

Next, we count number of inserts aligned to each feature of interest (i.e. gene). For this we use featureCounts and we use –fraction to assign multi-mapped reads …

featureCounts ...

Warning

Be careful when combining different aligners and counting methods - not all of them are perfectly compatible. For example, featureCounts cannot recognize multi-mapped reads in alignment files generated by BWA, and smth about STAR and featureCounts as well.

Statistical analysis. To effectively analyse metatranscriptomic data, you need to account for variation in taxonomic composition across samples. Above, we used matching metagenomic data for this purpose. While we cannot do this here, we can still perform taxon-specific scaling, since we know the taxonomic composition of the community. This is dissected in detail in this paper [], which also provides template code for the analysis. This analysis ends up being equivalent to analysing the dataset as if it were a composition of N traditional RNAseq datasets, where N is the number of species in the community.