MAG abundance estimation
Once we have recovered MAGs from metagenomic data, we may want to estimate their abundance in each sample. We do this by mapping the original reads to the dereplicated MAGs and deriving the abundance from the read mapping results. Two common metrics are RPKM (Reads Per Kilobase per Million mapped reads) and TPM (Transcripts Per Million); both normalize read counts by MAG length and by sequencing depth. Here we will use TPM to estimate the abundance of each MAG in all samples.
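To make the two metrics concrete, here is a minimal sketch of the textbook RPKM and TPM formulas applied to MAGs. The MAG names, read counts, and lengths are made up for illustration, and the plugin's actual implementation (for example, how it counts paired reads) may differ.

```python
def rpkm(counts, lengths):
    """Textbook RPKM: reads per kilobase of MAG per million mapped reads."""
    total_reads = sum(counts.values())
    if total_reads == 0:
        return {mag: 0.0 for mag in counts}
    return {
        mag: counts[mag] * 1e9 / (lengths[mag] * total_reads)
        for mag in counts
    }


def tpm(counts, lengths):
    """Textbook TPM: length-normalized rates rescaled to sum to one million."""
    rates = {mag: counts[mag] / lengths[mag] for mag in counts}
    total_rate = sum(rates.values())
    if total_rate == 0:
        return {mag: 0.0 for mag in counts}
    return {mag: rate * 1e6 / total_rate for mag, rate in rates.items()}


# Hypothetical numbers of mapped reads and MAG lengths (in bp) for one sample.
counts = {"mag_a": 3000, "mag_b": 1000}
lengths = {"mag_a": 3_000_000, "mag_b": 500_000}

print(rpkm(counts, lengths))
print(tpm(counts, lengths))
```

In this toy sample, `mag_b` receives fewer reads than `mag_a` but is six times shorter, so both metrics rank it as the more abundant MAG. Unlike RPKM, TPM values always sum to one million within a sample, which makes them easier to compare across samples.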
Get MAG lengths
This step calculates the length of each dereplicated MAG, which is needed later to estimate abundance.
mosh annotate get-feature-lengths \
--i-features ./cache:mags_derep \
--o-lengths ./cache:mags_derep_length \
--verbose
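For intuition, the following sketch computes the length of a single MAG from FASTA-formatted text. It assumes that the length of a MAG is the sum of the lengths of its contigs; this is an assumption for illustration, not a statement about how the action above is implemented. The function name and the example sequences are hypothetical.

```python
import io


def fasta_total_length(handle):
    """Sum the sequence lengths of all records in a FASTA file-like object."""
    total = 0
    for line in handle:
        line = line.strip()
        # Header lines start with ">"; everything else is sequence.
        if line and not line.startswith(">"):
            total += len(line)
    return total


# A tiny, made-up MAG with two contigs (12 bp + 4 bp).
example_mag = io.StringIO(">contig_1\nACGTACGT\nACGT\n>contig_2\nGGCC\n")
print(fasta_total_length(example_mag))  # 16
```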
Index dereplicated MAGs
This step indexes the dereplicated MAGs for read mapping. The index is necessary to efficiently map the input reads back to the MAGs.
mosh assembly index-derep-mags \
--i-mags ./cache:mags_derep \
--p-threads 8 \
--p-seed 100 \
--o-index ./cache:mags_derep_index \
--verbose
Map reads to dereplicated MAGs
In this step, we map the input paired-end reads back to the dereplicated MAGs. The resulting alignments are the basis for calculating the abundance of each MAG in each sample.
mosh assembly map-reads \
--i-index ./cache:mags_derep_index \
--i-reads ./cache:reads_filtered \
--p-threads 8 \
--p-seed 100 \
--o-alignment-map ./cache:reads_to_derep_mags \
--verbose
Estimate MAG abundance
This step estimates the abundance of each MAG in the sample based on the read mapping results.
metric
: the abundance metric to compute; RPKM and TPM are currently supported
min-mapq
: the minimum required read mapping quality; for Bowtie 2, a value of 42 is very strict and retains only the most confidently mapped reads (in practice, unique and near-perfect matches)
min-base-quality
: only keep alignments with at least this Phred quality score
For more options, see --help.
mosh annotate estimate-mag-abundance \
--i-mag-lengths ./cache:mags_derep_length \
--i-maps ./cache:reads_to_derep_mags \
--p-threads 10 \
--p-metric tpm \
--p-min-mapq 42 \
--o-abundances ./cache:mags_derep_ft \
--verbose
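The effect of the mapping-quality threshold can be illustrated on plain-text SAM records. This is only a sketch: the action above works on alignment map artifacts and its internals are not shown here, the function name is hypothetical, and the example records are made up. The code counts reads per reference sequence, skipping unmapped reads and reads below the MAPQ threshold.

```python
from collections import Counter


def count_reads_per_contig(sam_lines, min_mapq=42):
    """Count mapped reads per reference, keeping only MAPQ >= min_mapq."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4 or rname == "*":  # 0x4 marks an unmapped read
            continue
        if mapq >= min_mapq:
            counts[rname] += 1
    return counts


# Made-up SAM records: two confident hits, one low-MAPQ hit, one unmapped read.
sam = [
    "@SQ\tSN:contig_1\tLN:1000",
    "r1\t0\tcontig_1\t10\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "r2\t0\tcontig_1\t50\t23\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "r3\t16\tcontig_2\t5\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII",
]

print(count_reads_per_contig(sam, min_mapq=42))
print(count_reads_per_contig(sam, min_mapq=0))
```

With the strict threshold, read `r2` is dropped and `contig_1` keeps a single read; with a threshold of 0 it keeps both. To obtain per-MAG counts, the per-contig counts would still need to be summed over the contigs belonging to each MAG.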
Let’s have a look at our estimated MAG abundance!
First we will use Kraken 2 to classify the dereplicated MAGs into taxonomic groups.
Note
Refer to Taxonomic classification of reads section for more details on taxonomic classification with Kraken 2.
The database used here is the PlusPF database, defined here.
mosh annotate classify-kraken2 \
--i-seqs ./cache:mags_derep \
--i-kraken2-db ./cache:kraken2_db \
--p-threads 40 \
--p-confidence 0.5 \
--p-report-minimizer-data \
--o-reports ./cache:kraken_reports_mags_derep \
--o-hits ./cache:kraken_hits_mags_derep \
--verbose
Then we will convert the Kraken 2 reports and hits into a generic taxonomy artifact for downstream analyses.
mosh annotate kraken2-to-mag-features \
--i-reports ./cache:kraken_reports_mags_derep \
--i-hits ./cache:kraken_hits_mags_derep \
--o-taxonomy ./cache:mags_derep_taxonomy \
--verbose
Now we are ready to generate a taxa bar plot.
mosh taxa barplot \
--i-table ./cache:mags_derep_ft \
--i-taxonomy ./cache:mags_derep_taxonomy \
--m-metadata-file ./cocoa-metadata.tsv \
--o-visualization ./results/mags-derep-taxa-bar-plot.qzv \
--verbose
Your visualization should look similar to this one.
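Conceptually, a taxa bar plot sums the abundances of all MAGs that share the same taxonomy down to a chosen level. The sketch below shows that aggregation for a single sample. The MAG IDs, abundances, semicolon-separated taxonomy strings, and the function name are all hypothetical, and the real visualizer may treat missing ranks differently.

```python
from collections import defaultdict


def collapse_by_level(abundances, taxonomy, level):
    """Sum per-MAG abundances over the first `level` ranks of each taxonomy."""
    collapsed = defaultdict(float)
    for mag_id, value in abundances.items():
        # MAGs without a taxonomy entry are grouped as "Unassigned".
        lineage = taxonomy.get(mag_id, "Unassigned")
        ranks = [rank.strip() for rank in lineage.split(";")]
        collapsed[";".join(ranks[:level])] += value
    return dict(collapsed)


# Made-up relative abundances and lineages for three MAGs in one sample.
abundances = {"mag1": 0.5, "mag2": 0.25, "mag3": 0.25}
taxonomy = {
    "mag1": "d__Bacteria;p__Firmicutes;c__Bacilli",
    "mag2": "d__Bacteria;p__Firmicutes;c__Clostridia",
    "mag3": "d__Bacteria;p__Proteobacteria",
}

print(collapse_by_level(abundances, taxonomy, level=2))
```

At level 2 (phylum, in this hypothetical lineage format) the two Firmicutes MAGs merge into one bar segment with abundance 0.75, while the Proteobacteria MAG keeps 0.25.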