Functional annotation#

Required databases#

To perform the functional annotation, we need two reference databases: a DIAMOND-formatted database for the ortholog search and the EggNOG database for the annotation step. The commands below download both using MOSHPIT.

mosh annotate fetch-diamond-db \
    --o-diamond-db ./cache:diamond_db \
    --verbose
mosh annotate fetch-eggnog-db \
    --o-eggnog-db ./cache:eggnog_db \
    --verbose

Alternatively, you can use:

  • mosh annotate build-eggnog-diamond-db to create a DIAMOND-formatted reference database for a specified taxon.

  • mosh annotate build-custom-diamond-db to create a DIAMOND-formatted reference database from a FASTA input file.

EggNOG search using Diamond aligner#

We will search the dereplicated MAGs against the EggNOG database using the DIAMOND aligner. The resulting hits are the input for the annotation step that follows.

mosh annotate search-orthologs-diamond \
    --i-sequences ./cache:mags_derep \
    --i-diamond-db ./cache:diamond_db \
    --p-num-cpus 16 \
    --p-db-in-memory \
    --o-eggnog-hits ./cache:eggnog_hits \
    --o-table ./cache:eggnog_ft \
    --verbose

Annotate orthologs against the EggNOG database#

Orthologs from dereplicated MAGs are annotated against the EggNOG database, providing functional insights into the genes and gene products present in the MAGs.

mosh annotate map-eggnog \
    --i-eggnog-hits ./cache:eggnog_hits \
    --i-eggnog-db ./cache:eggnog_db \
    --p-num-cpus 16 \
    --p-db-in-memory \
    --o-ortholog-annotations ./cache:eggnog_annotations \
    --verbose

Extract annotations#

This method extracts a specific type of annotation from the table generated by EggNOG and calculates its frequency across all MAGs.

Note

The mosh annotate extract-annotations method allows us to extract specific types of functional annotations, such as CAZymes, KEGG pathways, COG categories, or other functional elements, and calculate their frequency across all dereplicated MAGs.

In this tutorial, we focus on demonstrating the extraction of CAZymes.

mosh annotate extract-annotations \
    --i-ortholog-annotations ./cache:eggnog_annotations \
    --p-annotation caz \
    --p-max-evalue 0.0001 \
    --o-annotation-frequency ./cache:caz_annot_ft \
    --verbose
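To make the idea of "annotation frequency per MAG" concrete, here is a minimal sketch in plain Python. It is not the plugin's implementation: the row layout (`mag_id`, `evalue`, a comma-separated CAZy field), the `-` placeholder for a missing annotation, and the inclusive e-value cutoff are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified annotation rows: (mag_id, evalue, cazy_field).
# Real EggNOG annotation tables have many more columns; this layout is illustrative.
rows = [
    ("mag1", 1e-30, "GH13,CBM48"),
    ("mag1", 1e-12, "GH13"),
    ("mag1", 1e-2, "GT2"),   # dropped: e-value is above the cutoff
    ("mag2", 1e-8, "GT2"),
    ("mag2", 1e-20, "-"),    # dropped: no CAZy annotation for this gene
]


def annotation_frequency(rows, max_evalue=0.0001):
    """Count how often each annotation occurs in each MAG."""
    freq = defaultdict(Counter)
    for mag_id, evalue, field in rows:
        # Skip weak hits and genes without an annotation of this type.
        if evalue > max_evalue or field == "-":
            continue
        # A single gene may carry several comma-separated annotations.
        for family in field.split(","):
            freq[mag_id][family] += 1
    return {mag: dict(counts) for mag, counts in freq.items()}


caz_annot = annotation_frequency(rows)
print(caz_annot)
```

The result is a MAG-by-annotation count table, which is the shape of data that the frequency table produced by this step represents.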

Multiply tables#

This step calculates the dot (matrix) product of the mags_derep_ft and caz_annot_ft feature tables. The first table holds the abundance of each MAG in each sample, and the second holds the frequency of each annotation (e.g., CAZymes) in each MAG. Multiplying them weights every MAG's annotation counts by that MAG's abundance, which gives an estimate of the total frequency of each annotation in each sample.

mosh annotate multiply-tables \
    --i-table1 ./cache:mags_derep_ft \
    --i-table2 ./cache:caz_annot_ft \
    --o-result-table ./cache:caz_ft \
    --verbose
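The arithmetic behind this step can be sketched in a few lines of plain Python. The tables below are toy stand-ins with made-up values, and the nested-dictionary representation is an assumption chosen for readability; the plugin itself works on feature table artifacts.

```python
# Toy stand-in for mags_derep_ft: rows are samples, columns are MAGs.
mag_abundance = {
    "sample1": {"mag1": 2.0, "mag2": 0.0},
    "sample2": {"mag1": 1.0, "mag2": 3.0},
}

# Toy stand-in for caz_annot_ft: rows are MAGs, columns are annotations.
annotation_counts = {
    "mag1": {"GH13": 2, "GT2": 0},
    "mag2": {"GH13": 1, "GT2": 4},
}


def multiply_tables(table1, table2):
    """Matrix product: (samples x MAGs) times (MAGs x annotations)."""
    result = {}
    for sample, mags in table1.items():
        row = {}
        for mag, abundance in mags.items():
            # Weight this MAG's annotation counts by its abundance in the sample.
            for annotation, count in table2[mag].items():
                row[annotation] = row.get(annotation, 0.0) + abundance * count
        result[sample] = row
    return result


caz_per_sample = multiply_tables(mag_abundance, annotation_counts)
print(caz_per_sample)
```

For sample2, the GH13 estimate is 1.0 × 2 + 3.0 × 1 = 5.0: each MAG contributes its annotation count scaled by how abundant it is in that sample.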

Let’s have a look at our CAZyme functional diversity!#

We will start by calculating a Bray-Curtis dissimilarity matrix, which measures the dissimilarity between each pair of samples based on the observed frequencies of the different CAZyme annotations in each sample.

mosh diversity beta \
    --i-table ./cache:caz_ft \
    --p-metric braycurtis \
    --o-distance-matrix ./cache:caz_braycurtis_dist
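As a reminder of what the metric computes, the Bray-Curtis dissimilarity between two samples is the sum of absolute differences in their counts divided by the sum of all counts in both samples. The sketch below uses made-up CAZyme frequencies and a hypothetical helper name, `bray_curtis`; it illustrates the formula only and is not how the diversity plugin is implemented.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count profiles given as dicts."""
    features = set(u) | set(v)
    # Features missing from a sample are treated as having a count of zero.
    numerator = sum(abs(u.get(f, 0) - v.get(f, 0)) for f in features)
    denominator = sum(u.get(f, 0) + v.get(f, 0) for f in features)
    if denominator == 0:
        raise ValueError("Bray-Curtis is undefined when both samples are empty")
    return numerator / denominator


# Made-up CAZyme frequencies for two samples.
sample1 = {"GH13": 4.0, "GT2": 0.0}
sample2 = {"GH13": 5.0, "GT2": 12.0}

distance = bray_curtis(sample1, sample2)
print(round(distance, 3))
```

Here the numerator is |4 − 5| + |0 − 12| = 13 and the denominator is 4 + 5 + 0 + 12 = 21, so the dissimilarity is 13/21. A value of 0 means identical profiles, and a value of 1 means the samples share no annotations.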

Next, we will perform principal coordinate analysis (PCoA) on the resulting Bray-Curtis matrix.

mosh diversity pcoa \
    --i-distance-matrix ./cache:caz_braycurtis_dist \
    --o-pcoa ./cache:caz_braycurtis_pcoa

Visualization time! Let’s plot the PCoA results.

mosh emperor plot \
    --i-pcoa ./cache:caz_braycurtis_pcoa \
    --m-metadata-file ./metadata.tsv \
    --o-visualization caz-pcoa.qzv

Your visualization should look similar to this one.

Tip

Once your visualization is ready, click on the Color tab at the top right and select scatter:seed on the first tab to color your samples by seed type. Then click on the Animations tab and choose timepoint as gradient and seed as trajectory. Now, press play! You should see the progression of samples over time.