Calculating Diversity
This page explains how to calculate common diversity metrics, which summarize how many taxa are present in a sample and how evenly they are distributed. These steps are intended to precede the calculation of relative abundance.
Alpha Diversity
To calculate alpha diversity, first filter out unassigned taxa (NAs) and then use the estimate_richness() function from the phyloseq package:
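library(phyloseq)
library(dplyr)    # provides %>% and mutate()
library(stringr)  # provides str_replace_all()
# Drop taxa with no assignment at the highest rank
ps.filtered <- ps %>%
  subset_taxa(!is.na(superkingdom))
# Compute per-sample metrics, then restore the original sample names from the row names
alphadiv <- estimate_richness(ps.filtered, measures = c("Observed", "Shannon")) %>%
  mutate(barcode_well = rownames(.)) %>%
  # R prefixes "X" to names that start with a digit; strip only that leading "X"
  mutate(barcode_well = str_replace_all(barcode_well, "^X", "")) %>%
  # R replaces "-" with "." in row names; convert back
  mutate(barcode_well = str_replace_all(barcode_well, "\\.", "-")) %>%
  as.data.frame()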
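For intuition about the two measures requested above, the following sketch computes them by hand in base R for a single sample. The count vector and taxon names are made up for illustration. Observed is the number of taxa with at least one read, and Shannon is -sum(p * ln(p)) over the taxa present, using the natural log; these values should agree with what estimate_richness() reports for the same counts.

```r
# Hypothetical read counts for one sample across five taxa
counts <- c(taxonA = 50, taxonB = 30, taxonC = 20, taxonD = 0, taxonE = 0)

# Observed richness: number of taxa with at least one read
observed <- sum(counts > 0)

# Shannon diversity: -sum(p * ln(p)) over the taxa that are present
p <- counts[counts > 0] / sum(counts)
shannon <- -sum(p * log(p))

observed           # 3
round(shannon, 3)  # 1.03
```

Two samples with the same Observed value can differ in Shannon diversity: the more evenly the reads are spread across taxa, the higher the Shannon value.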
Filtering unassigned taxa
It is recommended to filter out NAs at the highest rank (i.e., superkingdom) for trnL and at Order + Family for 12Sv5. This is because trnL taxonomy is assigned by exact sequence matching via assignSpecies(), so any non-NA assignment reflects a true match in the reference, and NAs appear only when no match exists. 12Sv5, by contrast, is assigned via assignTaxonomy(), which makes assignments at every rank along with bootstrap confidence values, and that confidence tends to fall off below Order. For 12Sv5, filter out any ASV that is NA at both the Order and Family levels. Including Family in the filtering criterion ensures that you retain ASVs that lack an Order assignment but still have valid assignments at Family and lower ranks (e.g., due to gaps in the reference).
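The Order + Family rule for 12Sv5 can be sketched on a toy taxonomy table as follows. The table contents and the lowercase column names order and family are assumptions made for illustration; match them to the rank names in your own tax_table. With a phyloseq object, the same logical condition can be passed to subset_taxa(), as shown in the final comment.

```r
# Toy taxonomy table (hypothetical ASVs and assignments)
tax <- data.frame(
  asv    = c("ASV1", "ASV2", "ASV3", "ASV4"),
  order  = c("Primates", NA, "Rodentia", NA),
  family = c("Hominidae", "Bovidae", NA, NA),
  stringsAsFactors = FALSE
)

# Keep an ASV unless it is NA at BOTH order and family
keep <- !(is.na(tax$order) & is.na(tax$family))
tax.filtered <- tax[keep, ]

tax.filtered$asv  # ASV4 is dropped; ASV2 and ASV3 are retained

# With a phyloseq object `ps`, the equivalent call would be (not run here):
# ps.filtered <- subset_taxa(ps, !(is.na(order) & is.na(family)))
```

Note that ASV2 survives only because Family is part of the criterion; filtering on Order alone would have removed it.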
The alpha diversity code above creates a data frame with a barcode_well column holding the name of each sample, plus two alpha diversity measures per sample: Observed, the observed number of taxa, and Shannon, the Shannon diversity. This data frame can now be joined to the sample metadata by barcode_well, provided your sample metadata contains a matching barcode_well column.
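A minimal sketch of this join, using small stand-in data frames in base R. The object name samdf, the group column, and all values are hypothetical; in practice alphadiv comes from estimate_richness() and the metadata from your phyloseq object. The sketch assumes the metadata row names are the sample names, which phyloseq needs in order to write the metadata back.

```r
# Hypothetical stand-in for the alpha diversity data frame
alphadiv <- data.frame(
  barcode_well = c("1-A01", "1-A02", "1-A03"),
  Observed = c(12, 7, 20),
  Shannon = c(1.8, 1.1, 2.4)
)

# Hypothetical stand-in for the sample metadata; with phyloseq this could be
# obtained with: samdf <- data.frame(sample_data(ps.filtered))
samdf <- data.frame(
  barcode_well = c("1-A02", "1-A01", "1-A03"),
  group = c("control", "treated", "treated"),
  row.names = c("1-A02", "1-A01", "1-A03")
)

# Look up each sample's metrics by barcode_well, preserving the metadata row order
idx <- match(samdf$barcode_well, alphadiv$barcode_well)
samdf$Observed <- alphadiv$Observed[idx]
samdf$Shannon <- alphadiv$Shannon[idx]

# With phyloseq loaded, the updated metadata could be written back (not run here):
# sample_data(ps.filtered) <- sample_data(samdf)
```

Using match() rather than merge() keeps the metadata rows and row names in their original order; samples without a matching barcode_well receive NA for both metrics.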
Now that you have a phyloseq object with alpha diversity metrics added to your sample metadata, you can compare diversity between samples or groups, or continue on to calculating relative abundance.