HERDC: The Metadata Issues

The HERDC Category Code

A category code is required in addition to the resource type. It must appear in a format such as A1, B1 or C2, and may additionally include the definition.

MACAR recommends:

  • that this code be recorded and stored in a resource type or genre property, with values assigned from a DEST category code vocabulary. Category code definitions should also be provided, eg A1 – Book; B1 – Book chapter; C1 – Journal article, etc.
  • Use type/genre property/field. Best practice is to use a controlled vocabulary.
  • MARC : 655 field – Index Term-Genre/Form subfield a and subfield 2
  • Dublin Core: dcterms:type or dc:type in simple DC
  • MACAR : macar:type

If the repository software cannot manage more than one resource type, or if the repository prefers not to make this data publicly available, a standard mapping from a resource type vocabulary used to categorise the resource (eg MACAR Resource Type vocabulary) to DEST category codes can be created to identify and collect this data for DEST/DEEWR purposes.
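A mapping of this kind could be sketched as follows. This is only an illustration: the resource type labels and the function name are hypothetical, and only the three category codes named on this page are included. A real mapping would need to follow the current HERDC specifications.

```python
# Hypothetical crosswalk from a resource type vocabulary to DEST category codes.
# The labels on the left are illustrative; the codes are the three named on this page.
RESOURCE_TYPE_TO_DEST = {
    "book": "A1",
    "book chapter": "B1",
    "journal article": "C1",
}


def collect_for_reporting(records):
    """Group record identifiers by derived DEST category code.

    Records whose resource type has no mapping are returned separately
    so that they can be checked by hand.
    """
    by_code = {}
    unmapped = []
    for record in records:
        # Normalise case and surrounding whitespace before the lookup.
        code = RESOURCE_TYPE_TO_DEST.get(record["type"].strip().lower())
        if code is None:
            unmapped.append(record["id"])
        else:
            by_code.setdefault(code, []).append(record["id"])
    return by_code, unmapped
```

Because the code is derived at reporting time, it never has to be stored in, or displayed by, the repository itself.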

Can HERDC-specific metadata be calculated using standard descriptive metadata?

Deriving DEST category codes may be a good idea for anyone considering a repository for HERDC reporting, because in most publication reporting systems the attribution of DEST categories to publications is highly controlled and audited: the codes affect the amount of funding returned to a university. Category codes are often entered by administrators and then signed off by heads of department (this, at least, is the University of Melbourne experience). If this sort of workflow is difficult to implement in a repository, then deriving the DEST category codes from publication metadata using established business rules might be a more appropriate way of enforcing rigour in their reporting. (Simon Porter)

An example might be a calculated field such as IF resource type = journal article AND status = peer-reviewed THEN DEST category = C1 (Katie Blake)

It might not be possible to derive all of it. For example, one B1 criterion (for book chapters) is “must have been published by a commercial publisher”, where “For the purposes of these specifications, a commercial publisher is an entity for which the core business is producing books and distributing them for sale. If publishing is not the core business of an organisation but there is a distinct organisational entity devoted to commercial publication and its publications are not completely paid for or subsidised by the parent organisation or a third party, the publisher is acceptable as a commercial publisher.” Generally only the publisher name is recorded, not whether it is a “commercial publisher”, e.g. Tom Ruthven Vainglorious Publishing Company does not tell me if it is a commercial publisher. (Tom Ruthven)
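The discussion above could be sketched as a small rule function. The function, field names and values are hypothetical, the rules are heavily simplified, and the codes follow the definitions given under "The HERDC Category Code" (B1 for book chapters, C1 for journal articles). When the metadata cannot answer a criterion, such as whether the publisher is commercial, the function returns None so that the record goes to manual assessment.

```python
from typing import Optional


def derive_dest_category(resource_type: str,
                         status: Optional[str] = None,
                         commercial_publisher: Optional[bool] = None) -> Optional[str]:
    """Return a DEST category code, or None when it cannot be derived.

    None means "needs manual assessment", not "ineligible".
    Only one criterion per category is modelled; the real specifications
    contain further criteria that are not represented here.
    """
    if resource_type == "journal article" and status == "peer-reviewed":
        return "C1"
    # commercial_publisher is None when only the publisher name was recorded,
    # which is the usual case described above.
    if resource_type == "book chapter" and commercial_publisher is True:
        return "B1"
    return None
```

For example, `derive_dest_category("book chapter")` returns None, because a publisher name alone does not establish the commercial publisher criterion.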

Status value

A status value (as in eprints.org) would be useful. Peer review is relevant for journal articles and conference publications counted for DEST/DEEWR returns. A status vocabulary encoding scheme as in the Scholarly Works Application Profile could be used. This is a list of terms to indicate the peer-reviewed status of a publication. It is a simple vocabulary with just 2 terms – peer-reviewed and non-peer-reviewed and their definitions.

This has been added, and is at http://macar.wikidot.com/status-type-vocabulary

Total number of authors and ranking of these

An indication of the total number of authors and the ranking of each of those authors is required. These figures determine the order in which authors are displayed, as well as the weighting applied to each author’s “points” for participating in the publication. The total number of authors can be derived automatically from the resource descriptions themselves, but author ranking may need to be recorded manually from the display in the record. However, the display in the record may be misleading, as some repository submission software does not allow authors to be added and displayed in the order in which they appear in the publication; some submission tools display the author inputting the data first. A data element indicating the preferred citation of the described resource may therefore be required for HERDC, as well as to support the further referencing and citing of research publications.
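As a minimal sketch of the automatic part, assuming the author names are available as a list already in the order printed on the publication (the function name is hypothetical), the total and a 1-based ranking could be derived like this. The weighting of “points” is not modelled, as its rules are not given here.

```python
def author_statistics(authors_in_citation_order):
    """Return the author total and a 1-based rank for each author.

    The input list must already be in the order printed on the publication;
    that order cannot be recovered here if the submission tool rearranged it.
    """
    total = len(authors_in_citation_order)
    ranking = [
        (rank, name)
        for rank, name in enumerate(authors_in_citation_order, start=1)
    ]
    return total, ranking
```

If the stored order is unreliable, the ranking would have to be taken from a citation field or entered by hand instead.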

MACAR recommendation :

  • Add a bibliographic citation field to the record
  • MARC : 524 field – Preferred citation of described materials note – subfield a
  • Dublin Core: dcterms:bibliographicCitation or dc:identifier in Simple DC
  • MACAR : macar:bibliographic citation

Author affiliation

Repositories need to capture single authors and single/multiple affiliations and multiple authors and multiple affiliations in the metadata for research publications. Most agent/person descriptions do provide an author affiliation attribute. This relationship is usually expressed in authority data through links to authorized names and information notes.

Work is being done on this by the IFLA Working Group on Functional Requirements for Authority Data (FRANAR), in the ongoing work of the DC Agent Working Group, and in the FOAF specification (a Semantic Web initiative). These standards are in early development and we will continue to monitor them closely.

One view is that the affiliation for a research publication eg journal article or conference paper is a property of the resource rather than that of the author. The author’s affiliation at the time the resource was created will persist even if the author moves to a different institution. If affiliation is then the property of the resource and not the author, then the resource description can contain multiple affiliations for multiple authors and there wouldn’t be a need to correlate particular authors and their affiliations.

MACAR recommendation :

  • Capture author affiliation in the descriptive metadata, since agent standards are still in early development.
  • MARC: 110/710 - Corporate name
  • Dublin Core : Use dc:creator for author and dc:contributor for author affiliation (DCMI recommendation)
  • MACAR : Affiliated institution (university, faculty and school)
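The Dublin Core option could be sketched as below, with affiliation held at the level of the resource rather than tied to individual authors. The `record` wrapper element and the function name are hypothetical; only the dc:creator and dc:contributor usage comes from the recommendation above.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_simple_dc(title, authors, affiliations):
    """Build a simple DC record with affiliations held at resource level."""
    record = ET.Element("record")  # hypothetical wrapper element
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    # One repeatable dc:creator per author.
    for name in authors:
        ET.SubElement(record, f"{{{DC_NS}}}creator").text = name
    # One repeatable dc:contributor per affiliated institution; no attempt is
    # made to correlate a particular author with a particular affiliation.
    for institution in affiliations:
        ET.SubElement(record, f"{{{DC_NS}}}contributor").text = institution
    return record
```

Calling `ET.tostring(build_simple_dc(...), encoding="unicode")` serialises the record with the dc namespace declared on the wrapper element.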

Total number of chapters in a book

It was suggested that the DC description element, or its table of contents qualifier, could be used in repeatable fields. If METS is used, the division (<div>) elements within a structural map can be used to record the individual chapters of a book.
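For the METS option, the chapter total could be derived by counting the relevant divisions in the structural map. In this sketch the sample document and the TYPE value "chapter" are assumptions for illustration; METS does not prescribe the values of TYPE, so a repository would need its own convention.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Illustrative fragment only; a complete METS document carries more sections.
SAMPLE_METS = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="logical">
    <mets:div TYPE="book" LABEL="Example book">
      <mets:div TYPE="chapter" LABEL="Chapter 1"/>
      <mets:div TYPE="chapter" LABEL="Chapter 2"/>
      <mets:div TYPE="index" LABEL="Index"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def count_chapters(mets_xml):
    """Count the <div TYPE="chapter"> elements inside any structural map."""
    root = ET.fromstring(mets_xml)
    chapters = root.findall(
        ".//mets:structMap//mets:div[@TYPE='chapter']", METS_NS
    )
    return len(chapters)
```

With the sample above, only the two chapter divisions are counted and the index division is ignored.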

MACAR recommendation :

  • Work collaboratively with the Research Office to determine where this data should be stored. It does not aid discovery of the resource and may be required only for administrative purposes, so it could be stored somewhere else.
  • MARC : 773 field – host item entry
  • Dublin Core: dc:description in Simple DC or dcterms:tableOfContents
  • MACAR : none

Broader discussion

There has been broader discussion of the various HERDC models being proposed. There is a question as to what metadata is suitable for storing in a repository and what is more appropriate for the Research Office to maintain.

It would be useful to have a web interface for researchers to maintain their own profiles.

It was also recommended that the data for HERDC be recorded once, and that the repository and the Research Office work collaboratively to collect the outputs for HERDC reporting to DEEWR.

Repositories are unlikely to make use of this data other than for this kind of reporting. It may not be advisable to try and ‘shoehorn’ purely HERDC metadata into a repository object. MACAR questions whether or not repositories need to record and store this kind of very specific purpose-driven metadata.

HERDC container or separate the metadata across metadata sections

Perhaps there should be a separate HERDC container (eg a datastream in VITAL repository software) which could be used to keep all the HERDC metadata together. Some institutions are already creating separate datastreams to capture and store this kind of data. The METS standard allows digital objects to be packaged with all kinds of metadata in a single file, and this standard could be used to implement such a container. Finding a logical place for it in the administrative section of a METS file may be problematic. Namespaces are also a very important part of the METS standard, and any XML schema incorporated in a METS package requires a namespace declaration for validation purposes. Further investigation of the METS standard is required to test whether extension schemas such as this can be incorporated into a METS package and ingested into a repository.
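To make the namespace point concrete, here is a rough sketch of wrapping HERDC values in a METS administrative metadata section. Everything under the `herdc` prefix is an assumption: the namespace URI, the element names and the OTHERMDTYPE value are invented for illustration, and no such schema is defined here. Placing the wrapper in techMD is also an arbitrary choice, given that the logical place for it is the open question noted above.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
HERDC_NS = "http://example.org/herdc"  # hypothetical namespace, not a real schema

ET.register_namespace("mets", METS_NS)
ET.register_namespace("herdc", HERDC_NS)


def build_herdc_section(section_id, category_code, author_count):
    """Wrap HERDC values in a METS amdSec and return it as an XML string."""
    amd_sec = ET.Element(f"{{{METS_NS}}}amdSec")
    # techMD is chosen arbitrarily for this sketch.
    tech_md = ET.SubElement(amd_sec, f"{{{METS_NS}}}techMD", ID=section_id)
    md_wrap = ET.SubElement(
        tech_md, f"{{{METS_NS}}}mdWrap", MDTYPE="OTHER", OTHERMDTYPE="HERDC"
    )
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")
    # Hypothetical HERDC elements in their own namespace.
    ET.SubElement(xml_data, f"{{{HERDC_NS}}}categoryCode").text = category_code
    ET.SubElement(xml_data, f"{{{HERDC_NS}}}authorCount").text = str(author_count)
    return ET.tostring(amd_sec, encoding="unicode")
```

The serialised output declares both namespaces on the amdSec element. Whether a given repository would validate and ingest such a section still needs to be tested.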

Total number of authors within a department or school

This data is no longer a HERDC requirement.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License