Example and Interpretation

Example dataset

We have provided a worked example that will hopefully help beginners get started with this assay.

This dataset contains 26 pooled strongyle worm samples collected from Brazilian cattle farms. Each sample contains 1000 L3 larvae cultured from fecal samples. The Nemabiome assay was used to investigate species proportions within these pooled samples. The raw results from the assay are usually downloaded directly from the Illumina website (Basespace). We have provided the raw results from these samples below. Begin by downloading and unzipping all three files.

Example Miseq data files

Also download the example file nemabiome.files. This file contains a list of the samples that we are analyzing, and specifies which files belong to each sample. From this you can generate your stability.files that should look like this.

Use these files to work through the steps set-out under the Analysis tab.

Example output

You can download the file example_nemabiome_results.txt. This is what your output file should look like if you have completed the analysis correctly.

Initial interpretation of results

A youtube video outlines the main steps involved with the initial manipulation of raw read count data and basic visualisation of the example dataset.

1. Visualize raw read counts in excel spreadsheet

The first step is to copy and paste the raw read counts into an excel spreadsheet. The first three columns (rankID, taxon, daughterlevel) indicate the level of taxonomic classification. The fourth column (total) is the total number of reads assigned to that taxonomic level. The individual read counts for each sample begins in the 5th column and the sample name is given in the top row. To display reads solely assigned at the species level, delete all rows other than “taxlevel 12”.

2. Filter raw read count data

Delete rows that have very few total reads. These are very likely the result of background contamination. The number of reads chosen as a minimum threshold will vary between laboratories and specific applications. It is advisable to run blank sequencing controls alongside your experimental samples that can be used to assess the level of background contamination. In addition, rows that are “unclassified” at the species level also need to examined critically. If the total read count for these unclassified species are high then they may represent a species that is not present in the database. if the total read count is low (below the threshold that has been set), then the reads are probably the result of PCR/sequencing error and therefore need to be removed from the dataset.

3. Correction for PCR bias

The efficiency of ITS-2 rDNA PCR amplification varies between species for a variety of different reasons (see Avramenko et al., 2015 for a more comprehensive discussion of the issues involved). Consequently, correction factors have been estimated that can be used to correct for this amplification bias. The correction factor that we have empirically determined for the major cattle and sheep species are provided below (from Avramenko et al 2017 and Redman et al, unpublished data). Simply multiply the raw read count by the correction factor for each individual species.

Correction factors

Parasite species Correction factor
Cooperia punctata 2.7087
Cooperia oncophora 1.1545
Haemonchus placei 0.9339
Nematodirus helvetianus 0.7127
Ostertagia ostertagi 1.2592
Parasite species Correction factor
Cooperia curticei 1.6107
Teladorsagia circumcincta 1.1885
Trichostrongylus vitrinus 1.0239
Trichostrongylus axei 0.9647
Trichostrongylus colubriformis 1.0239
Haemonchus contortus 0.6970

4. Calculate relative abundance of each species

To estimate the relative abundance of each species in a sample simply divide the corrected read count by the total corrected read count for each individual sample and multiply by 100.

5. Graphical visualisation of species proportions

The resulting percentages can then be turned into a stacked bar chart as the the first step at visualising your results.