Henrik Nilsson, University of Gothenburg, Sweden
Henrik Nilsson, University of Gothenburg, Sweden; Julia Pawlowska, University of Warsaw, Poland
Urmas Kõljalg, UTARTU
PlutoF, CoL, GBIF, ELIXIR and ENA
UNITE, MycoBank, IF
DNA sequences, taxon names, specimens, living specimens, curating (third-party annotations) of DNA sequences
Fig.1. Screenshot of a single UNITE Species Hypothesis in PlutoF. Here researchers can curate eukaryote species based on public rDNA ITS sequences. Curated third-party annotations can be sent directly to the European Nucleotide Archive, and they are included in the UNITE reference datasets used in eDNA analyses.
Fig.2. Screenshot of the UNITE homepage displaying the number of fungal and other eukaryotic species hypotheses (SHs) in the current version 9.0. Reference databases of SHs can be downloaded from the Resources page in different formats and used in eDNA pipelines such as QIIME, mothur, and CREST.
The fungal kingdom is being redefined by the staggering numbers of hitherto undescribed species uncovered by environmental sequencing. Mycologists have precious few tools to explore these data, so the data remain woefully underexplored for fungal diversity. We propose a collaboration between PlutoF, Index Fungorum (IF), MycoBank (MB), Catalogue of Life (CoL), GBIF, ELIXIR, and the European Nucleotide Archive (ENA) to develop a set of software tools that allow fungal taxonomists and ecologists to make use of these data in their research.
Many fungal taxonomists do not have the bioinformatics background needed to extract taxonomic data from the enormous pool of available metabarcoding datasets. The tools presented in this proposal essentially remove the need for such a background when exploring eDNA sequences for taxonomic signal. The proposal seeks to blend fungal taxonomy and molecular ecology so that it will be straightforward to draw on molecular ecology results in taxonomy, and the other way around. This is not the case today, and the two fields are often pursued more or less independently of each other (https://mycokeys.pensoft.net/article/76053/ and https://mycokeys.pensoft.net/article/56691/).
The project was supervised by the BiCIKL partner UTARTU. Dr. R. Henrik Nilsson (University of Gothenburg) and Dr. Julia Pawlowska (University of Warsaw) led the research team that used the new services. Many new services and tools were developed or upgraded and deployed in the PlutoF platform for molecular identification of eukaryotes. Examples include importing new taxon names from other infrastructures (MycoBank and GBIF), third-party annotation of INSDC sequences and publication of these annotations in ENA, an SH-matching analysis tool, and publishing eDNA data in GBIF. A total of 8,395,383 HTS (high-throughput sequencing) and Sanger ITS sequences were linked to the taxonomic backbones of PlutoF and GBIF, which allowed these sequences to be published in the GBIF database. More importantly, eDNA-based studies can now publish their ITS-based findings as taxon occurrences in GBIF. Users of the sequence third-party annotation tool made 27,941 annotations, of which 4,584 were exported to ENA through the ELIXIR Contextual Data Clearinghouse.