Biodiversity Systematics and Software Applications

The Directory of Research Systematics Collections (DRSC) is a directory of natural history research collections. It is developed and maintained through a partnership between the Association of Systematics Collections (ASC) and the U.S.G.S. Biological Resources Division (BRD). The system was developed in part to support the National Biological Information Infrastructure (NBII) and the Electronic Natural History Museum Initiative of the BRD. The DRSC database currently contains information submitted by 525 natural history research collections.

Globally, natural history museums are estimated to hold about two and a half billion specimens (Duckworth et al. 1993). Seven countries in the hemisphere have National Biodiversity Sites (see annex 1). Overall, the basic information associated with a specimen specifies (i) information intrinsic to the specimen itself -- taxonomic identification, sex, life-stage, etc. -- and (ii) information that describes the circumstances of its collection and its context in nature -- date/time of collection, names of collector(s), method of collection, habitat description, geographic location, etc.
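The two groups of specimen information described above map naturally onto a small record structure. The following is a minimal sketch in Python, not the schema of any system named in this document; the class names, field names, and sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpecimenInfo:
    """Group (i): information intrinsic to the specimen itself."""
    taxon_name: str
    sex: Optional[str] = None
    life_stage: Optional[str] = None


@dataclass
class CollectingEvent:
    """Group (ii): circumstances of collection and context in nature."""
    date: Optional[str] = None                      # ISO 8601 text, e.g. "1998-07-14"
    collectors: List[str] = field(default_factory=list)
    method: Optional[str] = None
    habitat: Optional[str] = None
    locality: Optional[str] = None


@dataclass
class SpecimenRecord:
    """One catalogued specimen: an identifier plus both groups of information."""
    catalog_number: str                             # hypothetical identifier field
    specimen: SpecimenInfo
    event: CollectingEvent

    def label(self) -> str:
        """Return a one-line summary of the record."""
        who = ", ".join(self.event.collectors) or "unknown collector"
        when = self.event.date or "no date"
        return f"{self.catalog_number}: {self.specimen.taxon_name} ({who}, {when})"


# Invented sample values, for illustration only.
record = SpecimenRecord(
    catalog_number="EX-0001",
    specimen=SpecimenInfo("Peromyscus maniculatus", sex="female", life_stage="adult"),
    event=CollectingEvent(date="1998-07-14", collectors=["A. Collector"],
                          method="live trap", habitat="grassland"),
)
print(record.label())
```

Keeping the collecting event separate from the specimen itself means several specimens taken at the same place and time could share one event, although this sketch does not enforce that.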

Tools and Search Engines. Several systems exist, including Species Analyst, Tree of Life, Allspecies.org, and European Collections Database. Another is the Catalogue of Life, which has already collated basic reference data on 250,000 species and plans to reach 500,000 by 2003. Overall, the Catalogue of Life hopes to create a unified catalog of the 1.75 million known species of living organisms on earth.

There are four significant projects that aim to compile and/or provide information across major taxonomic groups on a global scale:

  1. Integrated Taxonomic Information System (ITIS)
  2. Species 2000. A checklist built as a federation of "Global Species Databases".
  3. International Plant Names Index (IPNI) of the Royal Botanic Gardens, Kew (RBGK), the Harvard University Herbaria (HUH), and the Australian National Botanic Garden (ANBG).
  4. Index to Organism Names (ION). With more than 1.4 million names, ION represents the largest compilation of biological nomenclature.

Both of the checklist projects (ITIS and Species 2000) are attempting to engage members of the systematics community to act as compilers and "curators" of taxonomic information on an on-going basis. ITIS is building a centralized database, whereas Species 2000 is creating a federation among distributed and independently managed databases. The two nomenclator projects (IPNI and ION) are derived from "traditional" literature indexing operations. (An indexer reads publications and records the taxonomic names and subjects addressed in each publication; the resulting indexes are published annually.)

Software Packages for Compiling Taxonomic Names

Platypus - a Visual Basic application for capturing and managing taxonomic (names and relationships), bibliographic, and distributional data.

ITIS Taxonomic Workbench - another Visual Basic application for capturing and managing names, relationships, references, and distributional data.

Problem: Neither of these packages has attained widespread acceptance relative to the potential market. Every practicing taxonomist could use one of these applications, but most use either a word processor or their own custom database (built in dBase, Access, FileMaker Pro, etc.) to manage the taxonomic and bibliographic information in their research.

Phylogenetic Data. Another class of taxon-by-character data matrices consists of the data sets used for phylogenetic analysis.

PAUP (Phylogenetic Analysis Using Parsimony) is a program used to find the best "tree" (hypothesis of phylogenetic relationships) for a given taxon-by-character data matrix.
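To make "best tree" concrete: under parsimony, a tree is better when it requires fewer character-state changes to explain the matrix. The sketch below scores a candidate tree with Fitch's algorithm, a standard method for unordered characters. It is an illustration only and makes no claim about how PAUP is implemented; the nested-tuple tree encoding, the function names, and the four-taxon matrix are assumptions.

```python
from typing import Dict, Set, Tuple, Union

# A tree is a leaf name, or a pair of subtrees (rooted and strictly binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_length(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes one character needs on the tree."""
    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):                  # leaf: observed state, no cost
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:                                 # subtrees can agree: no new change
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tree_length(tree: Tree, matrix: Dict[str, str]) -> int:
    """Sum the Fitch length over every character (column) of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_length(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


# Hypothetical matrix: four taxa (rows) scored for three binary characters.
matrix = {"A": "000", "B": "001", "C": "110", "D": "111"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(tree_length(tree_1, matrix), tree_length(tree_2, matrix))  # prints "4 5"
```

Here the first tree needs four changes and the second needs five, so the first is the more parsimonious hypothesis for this matrix. A real search must also decide which trees to score, since the number of possible trees grows very quickly with the number of taxa.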

MacClade, on the other hand, can use the same data sets (in the NEXUS format), but allows users to draw and rearrange their own trees.
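For readers who have not seen the format, the sketch below writes a small taxon-by-character matrix as a minimal NEXUS DATA block. It covers only the basic DIMENSIONS, FORMAT, and MATRIX commands and assumes taxon names contain no whitespace or punctuation that would need quoting. The function name and sample matrix are hypothetical.

```python
from typing import Dict


def to_nexus(matrix: Dict[str, str], missing: str = "?", gap: str = "-") -> str:
    """Serialize a taxon-by-character matrix as a minimal NEXUS DATA block."""
    lengths = {len(row) for row in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("matrix must be non-empty with rows of equal length")
    n_char = lengths.pop()
    width = max(len(name) for name in matrix)      # pad names so columns line up
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(matrix)} NCHAR={n_char};",
        f"  FORMAT DATATYPE=STANDARD MISSING={missing} GAP={gap};",
        "  MATRIX",
    ]
    lines += [f"    {name.ljust(width)}  {row}" for name, row in matrix.items()]
    lines += ["  ;", "END;"]
    return "\n".join(lines)


# Hypothetical matrix: three taxa scored for four characters, one score missing.
print(to_nexus({"Taxon_A": "0101", "Taxon_B": "01?1", "Taxon_C": "1100"}))
```

Because the block is plain text, the same file can in principle be handed to any program that reads NEXUS, which is what lets different tools share data sets.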

TreeBASE holds data from more than 450 different studies, covering almost 14,000 taxa.

Significant Software Projects

DELTA/IntKey and LucID provide good explanations of interactive keys. DELTA differs from LucID (and is unique) because it was designed to produce taxonomic descriptions, not just keys.
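The idea behind an interactive key can be sketched in a few lines: the user reports character states in any order, and the key narrows the list of candidate taxa after each observation. The sketch below is a toy illustration, not the data model of DELTA, IntKey, or LucID; the taxa, characters, and function names are invented. The last function hints at why one coded data set can yield descriptions as well as keys.

```python
from typing import Dict, List

# Hypothetical taxa scored for hypothetical characters.
TAXA: Dict[str, Dict[str, str]] = {
    "Taxon A": {"wings": "present", "legs": "6", "antennae": "long"},
    "Taxon B": {"wings": "present", "legs": "6", "antennae": "short"},
    "Taxon C": {"wings": "absent", "legs": "8", "antennae": "none"},
}


def remaining_taxa(taxa: Dict[str, Dict[str, str]],
                   observed: Dict[str, str]) -> List[str]:
    """Return the taxa consistent with every observation made so far."""
    return [name for name, states in taxa.items()
            if all(states.get(ch) == st for ch, st in observed.items())]


def useful_characters(taxa: Dict[str, Dict[str, str]],
                      candidates: List[str]) -> List[str]:
    """List the characters that still separate the candidate taxa."""
    chars = sorted({ch for name in candidates for ch in taxa[name]})
    return [ch for ch in chars
            if len({taxa[name].get(ch) for name in candidates}) > 1]


def describe(taxa: Dict[str, Dict[str, str]], name: str) -> str:
    """Generate a plain-text description from the same coded data."""
    parts = "; ".join(f"{ch} {st}" for ch, st in sorted(taxa[name].items()))
    return f"{name}: {parts}."


candidates = remaining_taxa(TAXA, {"wings": "present"})
print(candidates)                             # taxa still possible
print(useful_characters(TAXA, candidates))    # characters worth checking next
print(describe(TAXA, "Taxon A"))
```

Unlike a printed dichotomous key, nothing here fixes the order in which characters are examined, so the user can start with whatever feature is easiest to observe.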

Another significant organization in the area of interactive keys is the "Expert Center for Taxonomic Identification" (ETI) in Amsterdam. ETI produces CD-ROMs about particular taxonomic groups.

BIOTICA. The Biótica Information System is designed to manage curatorial data, nomenclature, geography, references and observations, and to help users capture and update recorded data in a simple and reliable manner. Biótica manages information about nomenclature (both scientific and vernacular), geography, specimens (and observations), people and institutions, and literature.

BIOLINK. Designed to assist those working with taxon- and specimen-based information. It is primarily intended for use by taxonomists, ecologists, collection managers and biogeographers. It is suitable for use by individual researchers, large museums or collections, or teams of global collaborators.

BIOTA. Developed for Macintosh as a "Fourth Dimension" (4-D) application, but along with 4-D has become capable of running on Windows. Biota provides an excellent balance between capability and simplicity. Biota seems to target individual researcher- or project-level collections, up to medium-sized institutional collections. (This is based on features offered, not on scalability testing or analysis of the install-base.)

SPECIFY. Specify was implemented as a Delphi application using the Microsoft Jet database engine as the database.

The Species Analyst (TSA). The Species Analyst is a distributed information retrieval system for searching a network of museum label databases, each maintained by the museum that holds the actual specimen collection. It can query multiple collection databases at the same time and return the data in a simple, tabular format. Using ANSI/NISO Z39.50, a standard for information interchange and retrieval, The Species Analyst accepts Web-based queries and requests answers from the various databases. The results can be displayed as tables or maps, or be sent to the San Diego Supercomputer Center for analysis using artificial-intelligence algorithms. The Species Analyst also has links with ITIS and GenBank and is capable of handling multiple catalogues of names.

Currently The Species Analyst allows access to about 18 databases. FishNet has recently been funded to bring about 20 collections on-line. The Mammal Network Information System (MANIS) is in preparation and, if funded, will bring another 18 collections on-line.

The Mexican search engine Mallos, developed under the Mexican Network of Biodiversity Information (REMIB), will soon be made compatible with The Species Analyst, and REMIB and NABIN will operate a single portal. Both NABIN and REMIB share with GBIF a philosophy of granting users free access to biodiversity data, in compliance with intellectual property rights and with the approval of the institutions that provide the information. Essentially, The Species Analyst and Mallos demonstrate the feasibility of accessing biodiversity data in a variety of formats, and of analyzing and displaying such data in very powerful ways, using a Web interface.

Species Analyst - http://habanero.nhm.ukans.edu
Fishnet - http://habanero.nhm.ukans.edu/ Fishnet
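The pattern of a distributed query (one request sent to many independently managed databases, with the answers merged into a single table) can be sketched as follows. This is a simplified illustration that uses in-memory stand-ins instead of Z39.50 connections; it is not The Species Analyst's code, and every function name, museum name, and record here is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, str]
Provider = Callable[[str], List[Record]]


def make_provider(name: str, records: List[Record]) -> Provider:
    """Build a stand-in for one museum's database (in memory, no network)."""
    def search(taxon: str) -> List[Record]:
        # Tag every hit with its source so merged rows remain traceable.
        return [dict(r, source=name) for r in records
                if r["taxon"].lower() == taxon.lower()]
    return search


def federated_search(providers: List[Provider], taxon: str) -> List[Record]:
    """Send the same query to every provider in parallel and merge the rows."""
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        results = list(pool.map(lambda p: p(taxon), providers))
    return [row for rows in results for row in rows]


def as_table(rows: List[Record], columns: List[str]) -> str:
    """Render merged rows as tab-separated text with a header line."""
    lines = ["\t".join(columns)]
    lines += ["\t".join(row.get(c, "") for c in columns) for row in rows]
    return "\n".join(lines)


# Hypothetical collections and records, for illustration only.
museum_a = make_provider("Museum A", [
    {"taxon": "Genus species", "catalog": "A-1"},
    {"taxon": "Other taxon", "catalog": "A-2"},
])
museum_b = make_provider("Museum B", [{"taxon": "genus species", "catalog": "B-7"}])

rows = federated_search([museum_a, museum_b], "Genus species")
print(as_table(rows, ["source", "catalog", "taxon"]))
```

Each provider stays under its owner's control and only needs to answer the shared query. A production system would also have to handle slow or unreachable providers and differences in field names between collections, which this sketch ignores.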

Problem: There is a pervasive tendency for each collection to develop its own collection cataloging application.