Taxonomic classification of reads
In this section we will focus on the taxonomic classification of shotgun metagenomic reads using two different tools: Kraken 2 and Kaiju. We will use the data obtained in the data retrieval section.
Approach 1: Kraken 2
Before we can use Kraken 2, we need to build or download a database. We will use the build-kraken-db
action to fetch the prebuilt PlusPF database, which covers RefSeq sequences for archaea, bacteria, viruses, plasmids,
human, UniVec_Core, protozoa and fungi.
mosh annotate build-kraken-db \
--p-collection pluspf \
--o-kraken2-database ./cache:kraken2_db \
--o-bracken-database ./cache:bracken_db
We can now use the classify-kraken2
action to run Kraken 2 with the paired-end reads as the query and the PlusPF database retrieved in the previous step:
mosh annotate classify-kraken2 \
--i-seqs ./cache:reads_filtered \
--i-kraken2-db ./cache:kraken2_db \
--p-threads 72 \
--p-confidence 0.5 \
--p-no-memory-mapping \
--p-report-minimizer-data \
--o-reports ./cache:kraken_reports_reads \
--o-hits ./cache:kraken_hits_reads \
--verbose
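Once a report is available as plain text (for example after exporting the reports artifact), a quick sanity check is to list its species-level rows. The sketch below assumes the standard Kraken 2 report layout with the two extra minimizer columns that `--p-report-minimizer-data` adds, so the rank code sits in column 6 and the indented name in column 8. The function name `kraken_species` and the sample rows are invented for illustration.

```shell
# Print "<clade reads><TAB><species name>" for species-level rows (rank code "S")
# of a Kraken 2 report that includes minimizer data (8 tab-separated columns).
kraken_species() {
  awk -F '\t' '$6 == "S" { name = $8; sub(/^ +/, "", name); print $2 "\t" name }'
}

# Invented sample rows: percent, clade reads, direct reads, minimizers,
# distinct minimizers, rank code, taxid, indented name.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  '10.00' 100 100 0 0 U 0 'unclassified' \
  '90.00' 900 5 5000 1200 R 1 'root' \
  '60.00' 600 10 3000 800 G 561 '      Escherichia' \
  '55.00' 550 550 2500 700 S 562 '        Escherichia coli' \
  | kraken_species
```

For a report generated without minimizer data, the rank code and name would be in columns 4 and 6 instead.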
See also
Bracken is a related tool that additionally estimates the relative abundances of species or genera, adjusting for the genome size of the organisms from which each read originated. To use it, we need the Bracken database that was fetched in the first step.
mosh annotate estimate-bracken \
--i-kraken-reports ./cache:kraken_reports_reads \
--i-bracken-db ./cache:bracken_db \
--p-threshold 5 \
--p-read-len 150 \
--o-taxonomy ./cache:bracken_taxonomy \
--o-table ./cache:bracken_ft \
--o-reports ./cache:bracken_reports
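The value passed to `--p-read-len` is meant to reflect the read length of the data. As a rough check, the sketch below tallies read lengths in a FASTQ stream and prints the most frequent one. It assumes plain, uncompressed FASTQ with four lines per record; the function name `modal_read_length` and the file name in the usage comment are hypothetical.

```shell
# Print the most common read length in a FASTQ stream read from stdin.
# Ties are resolved in favour of the longer length.
modal_read_length() {
  awk 'NR % 4 == 2 { count[length($0)]++ }
       END { for (len in count) print count[len], len }' \
    | sort -k1,1nr -k2,2nr | head -n 1 | awk '{ print $2 }'
}

# Example usage on a hypothetical exported file:
#   gunzip -c sample_R1.fastq.gz | modal_read_length
```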
To remove the unclassified read fraction we can use the filter-table
action from the q2-taxa
QIIME 2 plugin:
mosh taxa filter-table \
--i-table ./cache:bracken_ft \
--i-taxonomy ./cache:bracken_taxonomy \
--p-exclude Unclassified \
--o-filtered-table ./cache:bracken_ft_filtered
Approach 2: Kaiju
Like Kraken 2, Kaiju requires a reference database to perform taxonomic classification. We will use the fetch-kaiju-db
action to download the nr_euk database, which includes both
prokaryotes and eukaryotes (see the Kaiju documentation for the full list of taxa).
mosh annotate fetch-kaiju-db \
--p-database-type nr_euk \
--o-database ./cache:kaiju_nr_euk
We run Kaiju with a confidence of 0.1, using the paired-end reads as the query and the database artifact generated in the previous step:
mosh annotate classify-kaiju \
--i-seqs ./cache:reads_paired \
--i-db ./cache:kaiju_nr_euk \
--p-z 16 \
--p-c 0.1 \
--o-taxonomy ./cache:kaiju_taxonomy \
--o-abundances ./cache:kaiju_ft
Finally, we filter the table to remove the unclassified reads:
mosh taxa filter-table \
--i-table ./cache:kaiju_ft \
--i-taxonomy ./cache:kaiju_taxonomy \
--p-exclude unclassified,belong,cannot \
--o-filtered-table ./cache:kaiju_ft_filtered
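The three search terms target the catch-all labels that Kaiju adds to its summary output alongside real taxa. The sketch below imitates the effect with `grep` on a few invented labels. It assumes substring matching that ignores case, which only approximates what filter-table does internally; the function name `exclude_taxa` is hypothetical.

```shell
# Drop every input line containing any of the comma-separated terms in $1
# (case-insensitive substring match).
exclude_taxa() {
  pattern=$(printf '%s' "$1" | tr ',' '|')
  grep -v -i -E -- "$pattern"
}

# Invented labels: one real taxon plus three catch-all rows.
printf '%s\n' \
  'Bacteria;Proteobacteria;Escherichia coli' \
  'unclassified' \
  'cannot be assigned to a (non-viral) species' \
  'belong to a (non-viral) species with less than 0.1% of all reads' \
  | exclude_taxa 'unclassified,belong,cannot'
```

Only the Escherichia coli line survives, since each of the other labels contains one of the three terms.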
Visualization
You can now generate a taxa bar plot from either set of results. We will continue with the Kaiju results. To generate a taxa bar plot, run:
mosh taxa barplot \
--i-table ./cache:kaiju_ft_filtered \
--i-taxonomy ./cache:kaiju_taxonomy \
--m-metadata-file ./metadata.tsv \
--o-visualization ./results/kaiju_barplot.qzv
Your visualization should look similar to this one.