Torsten Dikow, Smithsonian National Museum of Natural History, USA
Donat Agosti, Plazi
TreatmentBank, BLR, BHL, Zenodo, COL, and GBIF
A major impediment to advancing the discovery and description of biodiversity is the poor accessibility of data published in earlier taxonomic revisions and monographs. Although digitization of the natural history literature often makes such older publications available online, the data they contain, such as specimen occurrence data, species descriptions and re-descriptions, or illustrations, remain available only in human-readable form. As a result, these data are not fit for re-use and re-purposing by researchers today and in the future.
Through tools developed by Plazi, digitized taxonomic articles – either previously (retroactive) or newly (proactive) published – can be marked up in XML to make the original data accessible in a machine-readable format. Ultimately, published information on individual specimens needs to be linked through the digital/extended specimen platform, and providing machine-readable access to such data from the previously published literature is an important step in this direction.
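To illustrate what machine-readable access means in practice, the sketch below marks up a single treatment in a deliberately simplified XML vocabulary and extracts its material citations with Python's standard library. The element and attribute names (`treatment`, `taxonomicName`, `materialsCitation`, `collectionCode`, `catalogNumber`, `specimenCount`) and all values are hypothetical placeholders chosen for illustration; they are not the exact schema produced by Plazi's tools and not real data.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup of one treatment. Element names and values
# are placeholders, not Plazi's actual schema or real specimen data.
TREATMENT_XML = """
<treatment>
  <taxonomicName rank="species">Genus species</taxonomicName>
  <materialsCitation country="South Africa" collectionCode="MUS" catalogNumber="0001" specimenCount="2"/>
  <materialsCitation country="Namibia" collectionCode="MUS" catalogNumber="0002" specimenCount="1"/>
</treatment>
"""


def extract_material_citations(xml_text):
    """Return one dict per materialsCitation element, tagged with the taxon name."""
    root = ET.fromstring(xml_text)
    taxon = root.findtext("taxonomicName")
    citations = []
    for mc in root.iter("materialsCitation"):
        citations.append({
            "taxon": taxon,
            "country": mc.get("country"),
            "collectionCode": mc.get("collectionCode"),
            "catalogNumber": mc.get("catalogNumber"),
            # Missing counts default to 0 rather than raising an error.
            "specimenCount": int(mc.get("specimenCount", "0")),
        })
    return citations
```

Once the text is marked up this way, questions such as "how many specimens are cited from Namibia?" become simple queries over structured records rather than manual reading of the printed page.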
This proposal aims to make the taxonomic treatments of a diverse group of insects – apiocerid, assassin, and mydas fly species – openly accessible online. As a proof of concept, we will focus on the southern African fauna (some 1,000 species in Botswana, Eswatini, Lesotho, Mozambique, Namibia, South Africa, and Zimbabwe) and extract all species treatments from the published literature through XML markup. Persistent links to the TreatmentBank record for each species will be added to an existing online catalogue, the Afrotropical Asilidae Portal, so that specialist taxonomists working on this fauna can access the data in a cybercatalog setting. The entry for the species Microphontes safra illustrates what such a comprehensive species entry might look like.
To facilitate digitally connecting the published data with the actual specimens held in natural history museums within a biodiversity knowledge graph, the published specimen occurrence data will be annotated. We also propose to enhance specimen records from older publications by retrieving the unique specimen identifiers from recently digitized museum collections and matching them to the material cited in the publications.
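Matching cited material to digitized specimens often comes down to reconciling catalogue numbers that were printed with different spacing, case, or separators. The following is a minimal sketch of one possible approach, assuming that each citation carries a catalogue number and that digitized records expose `catalogNumber` and `id` fields; the field names, the `MUS` prefix, and all sample values are hypothetical.

```python
import re


def normalize_catalog_number(raw):
    """Reduce a catalogue number to a comparison key by upper-casing it and
    dropping everything except letters and digits ("mus-0001" -> "MUS0001")."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def match_cited_material(cited_numbers, digitized_records):
    """Map each cited catalogue number to the id of the digitized record with
    the same normalized key, or to None if there is no unique match."""
    index = {}
    for record in digitized_records:
        key = normalize_catalog_number(record["catalogNumber"])
        index.setdefault(key, []).append(record["id"])
    matches = {}
    for number in cited_numbers:
        candidates = index.get(normalize_catalog_number(number), [])
        # Link only unambiguous matches; anything else is left for manual review.
        matches[number] = candidates[0] if len(candidates) == 1 else None
    return matches


# Hypothetical inputs: catalogue numbers as printed in an older publication,
# and records from a recently digitized collection.
cited = ["MUS 0001", "mus-0002", "MUS 0003"]
digitized = [
    {"catalogNumber": "MUS0001", "id": "specimen-a"},
    {"catalogNumber": "MUS-0002", "id": "specimen-b"},
]
```

With these inputs, `MUS 0001` and `mus-0002` are linked while `MUS 0003` stays unmatched. Real catalogue numbers can differ in ways this normalization does not cover (leading zeros, changed institutional prefixes), so unmatched or ambiguous citations would still need manual curation.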
In summary, the proposed project aims to free important primary biodiversity data on southern African flies, mostly locked in older taxonomic publications, and to make them digitally accessible in an open-access framework through TreatmentBank, GBIF, and the Afrotropical Asilidae Portal.
The existing online taxon catalogues of asiloid flies, as well as their GBIF records, will be enhanced by the detailed data processed in TreatmentBank and deposited in the Biodiversity Literature Repository. For the first time, they will provide all original descriptions and every subsequent redescription of each species. The taxonomic treatments furthermore include specimen occurrence data and illustrations of asiloid flies from southern Africa published from about 1800 to the present. Specialist researchers on this fly fauna will thus be able to re-use and re-purpose all previously published data to advance the discovery and description of biodiversity in this diverse region of the world. The existing species catalogues will also be transformed into true cybercatalogs with persistent links to online repositories, and will help develop a biodiversity knowledge graph and a digital/extended specimen platform.
This proposal will therefore enhance biodiversity knowledge of the southern African insect fauna and connect to local initiatives, particularly in Namibia and South Africa, through T. Dikow's collaborations and research.
We also envision publishing a data paper describing the workflow and the advantages of embracing retroactive data capture from taxonomic revisions for biodiversity discovery.
The methods developed by Plazi to extract taxonomic names, species descriptions (termed ‘treatments’), and examined specimens (termed ‘material citations’) were applied to articles from two journals (African Invertebrates and Annals of the Natal Museum) that focus on asiloid flies from Africa, with a particular focus on southern Africa and on articles published by the eminent fly taxonomist Jason Londt. In total, 1,467 pages were processed from 47 unique journal articles (out of 64 targeted articles). These pages included treatments for 652 asiloid fly species, citing a total of 3,649 specimens; Londt's publications accounted for 45 articles, 575 treatments, and 2,780 material citations. As a next step, the material citations will be linked to specimens using the unique specimen identifiers in GBIF. This will be complemented by processing printed articles and articles from other journals. The project was led by the BiCIKL partner Plazi and Dr. Torsten Dikow of the Smithsonian Institution, USA.
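The planned linking of material citations to GBIF specimen records could start by querying GBIF's public occurrence search API for each cited specimen. The sketch below only builds the query URL and interprets an already-parsed JSON response, so it runs offline; the choice of parameters (`institutionCode`, `catalogNumber`, `basisOfRecord`), the hypothetical `MUS` institution code, and the rule of linking only when exactly one record matches are assumptions for illustration, not the project's documented procedure.

```python
from urllib.parse import urlencode

# Public GBIF occurrence search endpoint (API v1).
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def gbif_query_url(institution_code, catalog_number):
    """Build a search URL for preserved specimens with the given codes.

    The URL is only constructed here; no request is sent.
    """
    params = {
        "institutionCode": institution_code,
        "catalogNumber": catalog_number,
        "basisOfRecord": "PRESERVED_SPECIMEN",
    }
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def unique_occurrence_key(response):
    """Return the occurrence key if the parsed search response contains
    exactly one result; otherwise return None (assumed linking rule)."""
    results = response.get("results", [])
    if len(results) == 1:
        return results[0].get("key")
    return None
```

For example, `gbif_query_url("MUS", "0001")` yields `https://api.gbif.org/v1/occurrence/search?institutionCode=MUS&catalogNumber=0001&basisOfRecord=PRESERVED_SPECIMEN`. The response could then be fetched with `urllib.request.urlopen` and parsed with `json.load` before being passed to `unique_occurrence_key`. Requiring a single hit keeps automatic links conservative; zero or multiple hits would be flagged for a specialist to resolve.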