Hanieh Saeedi, Senckenberg Research Institute and Natural History Museum, Germany
GBIF, LifeWatch ERIC, ENA
Data in the Ocean Biodiversity Information System (OBIS) use the World Register of Marine Species (WoRMS), the most reliable source for marine species taxonomy, as their taxonomic backbone. That is why the AphiaID (a unique identifier for each taxon in WoRMS, preferably given as an LSID) is a mandatory field for data contributed to OBIS.
In GBIF, occurrence data are matched to the GBIF backbone by a scientific name string. Thus, many museums worldwide share their data with GBIF through different platforms (e.g. IPT and BioCASe installations) without any taxon ID. When those museums try to pass their data on from GBIF to OBIS, they face the problem of missing identifiers: to comply with the OBIS data quality requirements, they have to match all available scientific names to WoRMS, either through the available WoRMS services or through the LifeWatch Species Information Backbone, which includes WoRMS. For medium-sized to large museums with huge collections, this taxon matching can be complicated and time-consuming.
Although the WoRMS catalogue is available as a checklist in GBIF and almost 95% of it is mapped to the GBIF backbone, we found no way to use this connection between the two checklists. For occurrences already mapped to a taxon in the GBIF backbone, it should be possible to extract the AphiaID of the corresponding taxon from the WoRMS taxonomy.
Our idea is to interlink the data contributed to GBIF with the WoRMS taxonomy, either automatically or via the LifeWatch vLab, based on taxon matches in GBIF. The AphiaID could then simply be extracted for those entries in GBIF. This would result in a much more straightforward and (taxonomically) quality-controlled data flow from natural history collections to OBIS. In addition, it would provide major quality benefits to GBIF for available (marine) scientific names and foster closer collaboration between the OBIS and GBIF communities.
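The idea can be illustrated with a minimal Python sketch. It assumes that a lookup table from GBIF backbone taxon keys to AphiaIDs is already available, for example derived from the WoRMS checklist mapping in GBIF; how that table is built is not shown. The function names, the record layout, and all keys, names, and IDs in the example are hypothetical placeholders. The sketch also assumes that occurrences carry the backbone key in a `taxonKey` field and that the AphiaID is delivered to OBIS as an LSID in the Darwin Core term `scientificNameID`.

```python
from typing import Dict, List, Tuple

# WoRMS LSIDs have the form urn:lsid:marinespecies.org:taxname:<AphiaID>.
LSID_PREFIX = "urn:lsid:marinespecies.org:taxname:"


def aphia_lsid(aphia_id: int) -> str:
    """Format an AphiaID as a WoRMS LSID."""
    return f"{LSID_PREFIX}{aphia_id}"


def attach_aphia_ids(
    occurrences: List[dict],
    backbone_to_aphia: Dict[int, int],
) -> Tuple[List[dict], List[dict]]:
    """Add scientificNameID to occurrences whose backbone key is mapped.

    Returns (matched, unmatched). Unmatched records are returned unchanged
    so that they can be sent to a name-based matching step instead.
    """
    matched, unmatched = [], []
    for occ in occurrences:
        aphia_id = backbone_to_aphia.get(occ.get("taxonKey"))
        if aphia_id is None:
            unmatched.append(occ)
        else:
            # Copy the record rather than mutating the caller's data.
            matched.append({**occ, "scientificNameID": aphia_lsid(aphia_id)})
    return matched, unmatched


if __name__ == "__main__":
    # Placeholder values only: these are not real GBIF keys or AphiaIDs.
    mapping = {1001: 9001}
    records = [
        {"occurrenceID": "occ-1", "scientificName": "Genus speciesa", "taxonKey": 1001},
        {"occurrenceID": "occ-2", "scientificName": "Genus speciesb", "taxonKey": 1002},
    ]
    matched, unmatched = attach_aphia_ids(records, mapping)
    print(matched)
    print(unmatched)
```

Records left in the unmatched list would still need name-based matching against WoRMS, so the lookup reduces the matching workload rather than removing it entirely.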
Implementing an interlink between GBIF data records and the WoRMS taxonomy via the LifeWatch connection, and extracting the AphiaID directly from GBIF, might open up the same pathway for taxon checklists in other resources that overlap strongly with the GBIF taxonomy. One of these resources is the European Nucleotide Archive (ENA), which archives comprehensive information on the world’s nucleotide sequencing. This would facilitate the data flow between GBIF, ENA, and OBIS, using WoRMS and LifeWatch vLab taxon match services to link diverse data types together in new ways. This effort will greatly enhance marine data sharing and quality control worldwide.
The project was supervised by the BiCIKL partner LifeWatch, and the work was performed by a research group of three scientists (two from Germany and one from Belgium), led by Dr. Hanieh Saeedi from the Senckenberg Museum, Frankfurt am Main, Germany. During this project, the data mapping between GBIF and OBIS was significantly improved. The use of ChecklistBank for taxon name mapping increased the accuracy of data transferred from GBIF to OBIS, enhancing the reliability of biodiversity assessments. Automation streamlined the process by reducing the need for manual data handling, which accelerated data integration and minimised errors. Challenges with ambiguous taxon matches remain, and the mapping algorithms are being continuously refined to expand the coverage of accurately matched taxa.