The Publication Validator module

The Publication Validator checks the publication type of a given literature DOI. This step addresses the growing number of non-literature DOIs found in the OAI-PMH metadata provided by the center libraries. The publication type is validated against metadata harvested from DataCite, CrossRef, and ScholeXplorer.
If neither DataCite nor CrossRef returns metadata for a DOI, its registrar (registration agency) is looked up at doi.org and logged.
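The lookups described above can be sketched as follows. This is a minimal sketch, not the Toolbox's harvesting code. It assumes the public endpoints `https://api.datacite.org/dois/{doi}`, `https://api.crossref.org/works/{doi}` and `https://doi.org/ra/{doi}`, and it assumes that doi.org returns a list of the form `[{"DOI": ..., "RA": ...}]`. The helper names (`metadata_urls`, `fetch_json`, `registration_agency`, `harvest`) and the logger name are hypothetical.

```python
import json
import logging
import urllib.error
import urllib.request
from typing import Any, Optional

log = logging.getLogger("publication_validator")  # hypothetical logger name

# Assumed public endpoints; the Toolbox's own configuration may differ.
DATACITE_URL = "https://api.datacite.org/dois/{doi}"
CROSSREF_URL = "https://api.crossref.org/works/{doi}"
DOI_RA_URL = "https://doi.org/ra/{doi}"


def metadata_urls(doi: str) -> dict:
    """Build the lookup URLs for one DOI."""
    return {
        "datacite": DATACITE_URL.format(doi=doi),
        "crossref": CROSSREF_URL.format(doi=doi),
        "doi_ra": DOI_RA_URL.format(doi=doi),
    }


def fetch_json(url: str, timeout: float = 10.0) -> Optional[Any]:
    """Return the decoded JSON body, or None on an HTTP error (e.g. 404)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError:
        return None


def registration_agency(ra_response: Any) -> Optional[str]:
    """Extract the agency name from a doi.org/ra response.

    Assumes a list of the form [{"DOI": "...", "RA": "DataCite"}].
    """
    if isinstance(ra_response, list) and ra_response and isinstance(ra_response[0], dict):
        return ra_response[0].get("RA")
    return None


def harvest(doi: str) -> dict:
    """Fetch DataCite and CrossRef metadata; log the registrar if both are missing."""
    urls = metadata_urls(doi)
    result = {
        "datacite": fetch_json(urls["datacite"]),
        "crossref": fetch_json(urls["crossref"]),
        "registrar": None,
    }
    if result["datacite"] is None and result["crossref"] is None:
        result["registrar"] = registration_agency(fetch_json(urls["doi_ra"]))
        log.info("%s: no DataCite/CrossRef metadata, registrar=%s", doi, result["registrar"])
    return result
```

Only `harvest` touches the network. The URL building and response parsing are kept in separate functions so that they can be checked offline.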

Consequences

The workflow of the HMC Toolbox for Data Mining is primarily designed to find literature DOIs and, from there, linked data publications. This is treated as the default case.

Alternatively, if validation shows that a publication DOI found in the library metadata is actually a data DOI, a secondary Linked Data Finder query is triggered to find literature publications linked to that data DOI.

If none of the validation steps is conclusive, the publication is treated as if it were a literature publication and is marked as unvalidated in the database.
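These consequences can be summarized as a small decision function. In this sketch the validation outcome determines two things: which kind of linked publication the Linked Data Finder searches for, and whether the entry is stored as validated. `PublicationType`, `Plan` and `plan_for` are hypothetical names chosen for illustration. They are not part of the Toolbox's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PublicationType(Enum):
    LITERATURE = "literature"
    DATA = "data"


@dataclass
class Plan:
    """What the workflow does next for one DOI (hypothetical structure)."""
    search_for: str  # kind of linked publications the Linked Data Finder looks for
    validated: bool  # flag stored with the publication in the database


def plan_for(validated_type: Optional[PublicationType]) -> Plan:
    if validated_type is PublicationType.DATA:
        # Secondary query: data DOI -> linked literature publications
        return Plan(search_for="literature", validated=True)
    if validated_type is PublicationType.LITERATURE:
        # Default case: literature DOI -> linked data publications
        return Plan(search_for="data", validated=True)
    # Validation inconclusive: treat as literature, but mark as unvalidated
    return Plan(search_for="data", validated=False)
```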

Validation - How To

The type validation itself is done in the following way:

  1. Check the DataCite property data.attributes.types.resourceTypeGeneral:
    If the value is one of the following
    • Book
    • BookChapter
    • ConferencePaper
    • ConferenceProceeding
    • DataPaper
    • Dissertation
    • Journal
    • JournalArticle
    • PeerReview
    • Preprint
    • Report
    • Text
    a publication DOI is concluded to be a literature publication.
    If the value is one of the following
    • Dataset
    • Collection
    a publication DOI is concluded to be that of a data publication. If none of these is true, move on.
  2. Check the CrossRef properties message.type and message.publisher: if message.type equals dataset and message.publisher equals Worldwide Protein Data Bank, the publication is assumed to be a dataset.
    If both attributes exist but this check fails, the publication is assumed to be a literature publication.
    If neither conclusion holds, move on.
  3. Check the Scholix metadata property result[0].sourceType: if the value is literature, a literature publication is assumed; if the value is dataset or software, the program proceeds.
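The three steps can be sketched as a cascade of checks over JSON records that have already been harvested. This is a minimal sketch, not the Toolbox's implementation. The function names are hypothetical, and the JSON paths follow the property names given above, assuming that the dots are key separators. One further assumption: the text only says "the program proceeds" when Scholix reports dataset or software, so the sketch treats that case as inconclusive. An inconclusive result falls back to an unvalidated literature publication, as described under Consequences.

```python
from typing import Any, Optional

# Values of data.attributes.types.resourceTypeGeneral, as listed in step 1
DATACITE_LITERATURE = {
    "Book", "BookChapter", "ConferencePaper", "ConferenceProceeding",
    "DataPaper", "Dissertation", "Journal", "JournalArticle",
    "PeerReview", "Preprint", "Report", "Text",
}
DATACITE_DATA = {"Dataset", "Collection"}

LITERATURE, DATA = "literature", "data"


def dig(record: Any, *path: Any) -> Any:
    """Follow a key/index path into nested JSON; None if any step is missing."""
    node = record
    for key in path:
        try:
            node = node[key]
        except (KeyError, IndexError, TypeError):
            return None
    return node


def check_datacite(record: Any) -> Optional[str]:
    value = dig(record, "data", "attributes", "types", "resourceTypeGeneral")
    if not isinstance(value, str):
        return None  # property missing: move on
    if value in DATACITE_LITERATURE:
        return LITERATURE
    if value in DATACITE_DATA:
        return DATA
    return None  # any other value: move on


def check_crossref(record: Any) -> Optional[str]:
    kind = dig(record, "message", "type")
    publisher = dig(record, "message", "publisher")
    if kind is None or publisher is None:
        return None  # move on
    if kind == "dataset" and publisher == "Worldwide Protein Data Bank":
        return DATA
    return LITERATURE  # both attributes exist but the dataset check failed


def check_scholix(record: Any) -> Optional[str]:
    if dig(record, "result", 0, "sourceType") == "literature":
        return LITERATURE
    return None  # dataset, software or missing: inconclusive in this sketch


def validate(datacite: Any, crossref: Any, scholix: Any) -> tuple:
    """Return (publication type, validated flag)."""
    for check, record in ((check_datacite, datacite),
                          (check_crossref, crossref),
                          (check_scholix, scholix)):
        verdict = check(record)
        if verdict is not None:
            return verdict, True
    # Nothing conclusive: treat as literature, mark as unvalidated
    return LITERATURE, False
```

Usage: `validate(None, {"message": {"type": "dataset", "publisher": "Worldwide Protein Data Bank"}}, None)` returns `("data", True)`. The order of the tuple in `validate` mirrors the order of the steps above, so the first source that gives a conclusive answer wins.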

Disclaimer

Please note that the list of data publications obtained by harvesting with the HMC Toolbox for Data Mining, as presented in the HMC FAIR Data Dashboard, is neither complete nor entirely free of misidentified entries. If you wish to reuse the data shown in the dashboard for sensitive purposes such as funding mechanisms, we strongly recommend a manual review of the data.