Functional annotation#
Required databases#
Functional annotation requires two reference databases: a DIAMOND database for the ortholog search and the eggNOG database for mapping the resulting hits to annotations. The commands below download both using MOSHPIT.
mosh annotate fetch-diamond-db \
--o-diamond-db ./cache:diamond_db \
--verbose
mosh annotate fetch-eggnog-db \
--o-eggnog-db ./cache:eggnog_db \
--verbose
Alternatively, you can use:
mosh annotate build-eggnog-diamond-db
to create a DIAMOND-formatted reference database for the specified taxon, or
mosh annotate build-custom-diamond-db
to create a DIAMOND-formatted reference database from a FASTA input file.
EggNOG search using DIAMOND aligner#
We will search the dereplicated MAGs against the EggNOG database using the DIAMOND aligner to find the ortholog hits that are annotated in the next step.
mosh annotate search-orthologs-diamond \
--i-sequences ./cache:mags_derep \
--i-diamond-db ./cache:diamond_db \
--p-num-cpus 16 \
--p-db-in-memory \
--o-eggnog-hits ./cache:eggnog_hits \
--o-table ./cache:eggnog_ft \
--verbose
Annotate orthologs against eggNOG database#
Orthologs from dereplicated MAGs are annotated against the EggNOG database, providing functional insights into the genes and gene products present in the MAGs.
mosh annotate map-eggnog \
--i-eggnog-hits ./cache:eggnog_hits \
--i-eggnog-db ./cache:eggnog_db \
--p-num-cpus 16 \
--p-db-in-memory \
--o-ortholog-annotations ./cache:eggnog_annotations \
--verbose
Extract annotations#
This method extracts a specific type of annotation from the table generated by EggNOG and calculates its frequency across all MAGs.
Note
The mosh annotate extract-annotations method allows us to extract specific types of functional annotations, such as CAZymes, KEGG pathways, COG categories, or other functional elements, and calculate their frequency across all dereplicated MAGs.
In this tutorial, we focus on demonstrating the extraction of CAZymes.
mosh annotate extract-annotations \
--i-ortholog-annotations ./cache:eggnog_annotations \
--p-annotation caz \
--p-max-evalue 0.0001 \
--o-annotation-frequency ./cache:caz_annot_ft \
--verbose
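To make the frequency calculation concrete, here is a minimal Python sketch of the idea: filter hits by e-value, then count each annotation per MAG. It is not the MOSHPIT implementation. The function name `annotation_frequencies`, the row layout, the use of "-" for a missing annotation, and the choice to keep hits whose e-value equals the threshold are all assumptions made for illustration.

```python
from collections import Counter


def annotation_frequencies(rows, max_evalue=0.0001):
    """Count annotation occurrences per MAG, skipping hits above max_evalue.

    rows: iterable of (mag_id, evalue, annotations) tuples, where
    annotations is a comma-separated string (hypothetical layout).
    """
    freqs = {}
    for mag_id, evalue, annotations in rows:
        if evalue > max_evalue:
            continue  # hit is too weak, ignore it
        counter = freqs.setdefault(mag_id, Counter())
        for name in annotations.split(","):
            name = name.strip()
            if name and name != "-":  # "-" is assumed to mean "no annotation"
                counter[name] += 1
    return freqs


# Toy input: three hits for mag1 (one too weak), two for mag2 (one unannotated).
rows = [
    ("mag1", 1e-30, "GH13"),
    ("mag1", 1e-12, "GH13,CBM48"),
    ("mag1", 0.5, "GT2"),
    ("mag2", 1e-8, "GT2"),
    ("mag2", 1e-20, "-"),
]
table = annotation_frequencies(rows)
for mag_id, counts in table.items():
    print(mag_id, dict(counts))
```

The result is a MAG-by-annotation count table, which is the shape of data that the next step combines with MAG abundances.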
Multiply tables#
This step calculates the dot product of the mags_derep_ft and caz_annot_ft feature tables. This is useful for combining the annotation data (e.g., CAZymes) with MAG abundance: knowing how specific functional annotations are distributed across MAGs lets us estimate the total frequency of each annotation in each sample.
mosh annotate multiply-tables \
--i-table1 ./cache:mags_derep_ft \
--i-table2 ./cache:caz_annot_ft \
--o-result-table ./cache:caz_ft \
--verbose
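The dot product can be illustrated with a small Python sketch. It assumes the first table is laid out as samples by MAGs and the second as MAGs by annotations; the function name `multiply_tables` and the toy values are hypothetical and only mirror the idea, not the MOSHPIT code.

```python
def multiply_tables(table1, table2):
    """Matrix product of table1 (samples x MAGs) and table2 (MAGs x annotations).

    Both tables are nested dicts; missing entries count as zero.
    """
    annotations = sorted({a for row in table2.values() for a in row})
    result = {}
    for sample, abundances in table1.items():
        result[sample] = {
            a: sum(
                abundance * table2.get(mag, {}).get(a, 0)
                for mag, abundance in abundances.items()
            )
            for a in annotations
        }
    return result


# Toy MAG abundances per sample and CAZyme counts per MAG.
mags_ft = {"s1": {"mag1": 10, "mag2": 0}, "s2": {"mag1": 2, "mag2": 5}}
caz_annot = {"mag1": {"GH13": 2, "CBM48": 1}, "mag2": {"GT2": 3}}

caz_ft = multiply_tables(mags_ft, caz_annot)
for sample, counts in caz_ft.items():
    print(sample, counts)
```

Each output value is the abundance-weighted sum of an annotation's count over all MAGs in a sample, so a CAZyme carried by an abundant MAG contributes more to that sample's total.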
Let’s have a look at our CAZyme functional diversity!#
We will start by calculating a Bray-Curtis dissimilarity matrix to measure the dissimilarity between each pair of samples, based on the observed frequency of the different CAZyme annotations in each sample.
mosh diversity beta \
--i-table ./cache:caz_ft \
--p-metric braycurtis \
--o-distance-matrix ./cache:caz_braycurtis_dist
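For reference, Bray-Curtis dissimilarity between two samples is the sum of absolute differences in feature counts divided by the sum of all counts in both samples. The sketch below applies that standard formula to two toy CAZyme profiles; the function name and the data are illustrative only, and returning 0.0 for two empty samples is an arbitrary choice made here.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count profiles given as dicts."""
    features = set(u) | set(v)
    numerator = sum(abs(u.get(f, 0) - v.get(f, 0)) for f in features)
    denominator = sum(u.get(f, 0) + v.get(f, 0) for f in features)
    # Two all-zero profiles have no defined dissimilarity; 0.0 is a convention
    # chosen for this sketch.
    return numerator / denominator if denominator else 0.0


s1 = {"GH13": 20, "CBM48": 10, "GT2": 0}
s2 = {"GH13": 4, "CBM48": 2, "GT2": 15}

d = bray_curtis(s1, s2)
print(round(d, 4))
```

The value ranges from 0 (identical profiles) to 1 (no shared annotations).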
Next, we will perform principal coordinate analysis (PCoA) on the resulting Bray-Curtis matrix.
mosh diversity pcoa \
--i-distance-matrix ./cache:caz_braycurtis_dist \
--o-pcoa ./cache:caz_braycurtis_pcoa
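As background, classical PCoA first double-centers the squared distance matrix (Gower centering) and then takes the eigenvectors of the centered matrix as ordination axes. The sketch below shows only the centering step in plain Python, on a toy distance matrix; the function name `gower_center` is hypothetical, and the eigendecomposition that follows is left to a numerical library.

```python
def gower_center(dist):
    """Double-center a square distance matrix: B = -1/2 * J * D^2 * J."""
    n = len(dist)
    a = [[-0.5 * d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in a]
    col_means = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [a[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


# Toy distances between three points located at 0, 1 and 3 on a line.
dist = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
b = gower_center(dist)
for row in b:
    print([round(x, 4) for x in row])
```

For Euclidean distances such as this toy case, the centered matrix equals the matrix of inner products of the mean-centered coordinates, which is why its eigenvectors recover the sample positions.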
Visualization time! Let’s plot the PCoA results.
mosh emperor plot \
--i-pcoa ./cache:caz_braycurtis_pcoa \
--m-metadata-file ./metadata.tsv \
--o-visualization caz-pcoa.qzv
Your visualization should look similar to this one.
Tip
Once your visualization is ready, click on the Color tab at the top right and select scatter:seed on the first tab to color your samples by seed type. Then click on the Animations tab and choose timepoint as gradient and seed as trajectory. Now, press play! You should see the progression of samples over time.