The Publication Validator module
The Publication Validator checks the publication type of a given literature DOI.
This accounts for the increasing number of non-literature DOIs found in the OAI-PMH metadata provided by the center libraries.
The publication type is validated against metadata harvested from DataCite, CrossRef, and ScholeXplorer.
If no metadata is found at either DataCite or CrossRef, the DOI registrar is queried from doi.org and the result is logged.
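The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the toolbox's actual code: the function names are hypothetical, the example DOI is made up, and the choice of the public DataCite, CrossRef, and doi.org "doiRA" endpoints is an assumption about which URLs are queried.

```python
from urllib.parse import quote

# Public endpoints of the three services. That the toolbox calls exactly
# these URLs is an assumption made for this sketch.
DATACITE_API = "https://api.datacite.org/dois/"
CROSSREF_API = "https://api.crossref.org/works/"
DOI_RA_API = "https://doi.org/doiRA/"


def lookup_urls(doi: str) -> dict[str, str]:
    """Return the metadata URLs to try for a DOI, keyed by source."""
    encoded = quote(doi, safe="/")  # keep the prefix/suffix slash intact
    return {
        "datacite": DATACITE_API + encoded,
        "crossref": CROSSREF_API + encoded,
        "registrar": DOI_RA_API + encoded,
    }


def needs_registrar_lookup(datacite_record, crossref_record) -> bool:
    """The registrar is only consulted when both sources returned nothing."""
    return datacite_record is None and crossref_record is None


urls = lookup_urls("10.1234/example")  # hypothetical DOI
```

The fallback check is kept separate from the URL construction so that the decision to query the registrar can be logged independently of the request itself.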
Consequences
The workflow of the HMC Toolbox for Data Mining is primarily designed to find literature DOIs and from there to find linked data publications. This is assumed to be the default case.
Alternatively, if validation shows that a publication DOI found in the library metadata is a data-DOI instead, a secondary Linked Data Finder query is triggered to find literature publications that are linked to the respective data-DOI.
If none of these steps is successful, the respective publication is treated as if it were a literature publication. In the database, it is marked as unvalidated.
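The three outcomes described in this section could be expressed as a small decision function. This is only a sketch: the `Plan` type, its field names, and the query labels are hypothetical and not taken from the toolbox.

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    treated_as: str     # how the DOI is handled downstream
    finder_query: str   # which Linked Data Finder query is run
    validated: bool     # False means "marked as unvalidated" in the database


def plan_for(validated_type: Optional[str]) -> Plan:
    """Map a validation result to the follow-up step."""
    if validated_type == "dataset":
        # Secondary query: look for literature linked to the data DOI.
        return Plan("dataset", "linked_literature", True)
    if validated_type == "literature":
        # Default case: look for data publications linked to the literature DOI.
        return Plan("literature", "linked_data", True)
    # Validation gave no result: treat as literature, flag as unvalidated.
    return Plan("literature", "linked_data", False)
```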
Validation - How To
The type validation itself is done in the following way:
- Check the DataCite property data → attributes → types → resourceTypeGeneral:
  - If the value is one of the following, a literature publication is assumed:
    - Book
    - BookChapter
    - ConferencePaper
    - ConferenceProceeding
    - DataPaper
    - Dissertation
    - Journal
    - JournalArticle
    - PeerReview
    - Preprint
    - Report
    - Text
  - If the value is one of the following, a data publication is assumed:
    - Dataset
    - Collection
- Check the CrossRef properties message → type and message → publisher:
  - If message → type equals "dataset" and message → publisher equals "Worldwide Protein Data Bank", the publication is assumed to be a dataset.
  - If both attributes exist but this validation fails, the publication is assumed to be a literature publication.
  - If none of these conclusions holds, move on.
- Check the Scholix metadata property result → [0] → source → Type:
  - If the value is "literature", a literature publication is assumed.
  - If the value is "dataset" or "software", the program proceeds.
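The three checks above can be sketched as a cascade that stops at the first conclusion. This is a minimal illustration under stated assumptions: all function names are hypothetical, the records are assumed to be the already-parsed JSON responses, the DataCite outcomes ("literature" for the first list, "dataset" for the second) are inferred from the lists, and returning no verdict for Scholix values other than "literature" is one reading of "the program proceeds".

```python
from typing import Optional

LITERATURE_TYPES = {
    "Book", "BookChapter", "ConferencePaper", "ConferenceProceeding",
    "DataPaper", "Dissertation", "Journal", "JournalArticle",
    "PeerReview", "Preprint", "Report", "Text",
}
DATA_TYPES = {"Dataset", "Collection"}


def dig(record, *path):
    """Follow keys/indices through nested dicts and lists; None if missing."""
    for key in path:
        try:
            record = record[key]
        except (KeyError, IndexError, TypeError):
            return None
    return record


def check_datacite(record) -> Optional[str]:
    value = dig(record, "data", "attributes", "types", "resourceTypeGeneral")
    if not isinstance(value, str):
        return None
    if value in LITERATURE_TYPES:
        return "literature"
    if value in DATA_TYPES:
        return "dataset"
    return None


def check_crossref(record) -> Optional[str]:
    pub_type = dig(record, "message", "type")
    publisher = dig(record, "message", "publisher")
    if pub_type is None or publisher is None:
        return None  # an attribute is missing: no conclusion, move on
    if pub_type == "dataset" and publisher == "Worldwide Protein Data Bank":
        return "dataset"
    return "literature"  # both attributes exist but the dataset test failed


def check_scholix(record) -> Optional[str]:
    source_type = dig(record, "result", 0, "source", "Type")
    if source_type == "literature":
        return "literature"
    return None  # assumption: "dataset"/"software" yield no verdict here


def validate(datacite=None, crossref=None, scholix=None) -> Optional[str]:
    """Run the checks in order and return the first verdict, else None."""
    steps = (
        (check_datacite, datacite),
        (check_crossref, crossref),
        (check_scholix, scholix),
    )
    for check, record in steps:
        verdict = check(record)
        if verdict is not None:
            return verdict
    return None
```

A `None` result from `validate` corresponds to the case in which the publication is treated as literature and marked as unvalidated.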
Disclaimer
Please note that the list of data publications obtained from data harvesting using the HMC Toolbox for Data Mining, as presented in the HMC FAIR Data Dashboard, is neither complete nor entirely free of falsely identified data. If you wish to reuse the data shown in the dashboard for sensitive topics such as funding mechanisms, we highly recommend a manual review of the data.