The HERDC Category Code
A category code is required in addition to the resource type. It must appear in the form A1, B1, C2 and so on, and may also include the definition.
MACAR recommends:
- that this code be recorded and stored in a resource type or genre property, with values assigned from a DEST category code vocabulary. Category code definitions should also be provided, eg A1 – Book; B1 – Book chapter; C1 – Journal article, etc.
- Use type/genre property/field. Best practice is to use a controlled vocabulary.
- MARC : 655 field – Index Term-Genre/Form subfield a and subfield 2
- Dublin Core: dcterms:type or dc:type in simple DC
- MACAR : macar:type
If the repository software cannot manage more than one resource type, or if the repository prefers not to make this data publicly available, a standard mapping from a resource type vocabulary used to categorise the resource (eg MACAR Resource Type vocabulary) to DEST category codes can be created to identify and collect this data for DEST/DEEWR purposes.
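A minimal sketch of such a mapping, in Python, might look like the following. The resource type labels and the function names are hypothetical (they are not taken from the MACAR Resource Type vocabulary); only the three codes A1, B1 and C1 come from the definitions given above.

```python
# Hypothetical mapping from resource type terms to DEST category codes.
# The labels on the left are assumed; a real mapping would use the terms
# of whatever resource type vocabulary the repository has adopted.
RESOURCE_TYPE_TO_CATEGORY = {
    "book": "A1",
    "book chapter": "B1",
    "journal article": "C1",
}


def category_for(resource_type):
    """Return the category code for a resource type, or None if unmapped."""
    return RESOURCE_TYPE_TO_CATEGORY.get(resource_type.strip().lower())


def report_rows(records):
    """Yield (identifier, category code) pairs for records that map to a code."""
    for record in records:
        code = category_for(record["type"])
        if code is not None:
            yield record["id"], code
```

Because the codes are looked up only when the report is produced, they never need to be displayed in the public record.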
Can HERDC-specific metadata be calculated using standard descriptive metadata?
One of the reasons that deriving DEST category codes might be a good idea if you were considering using a repository for HERDC reporting is that in most publication reporting systems, the attribution of DEST categories to publications is highly controlled and audited. One of the reasons for this is that these DEST codes have an impact on the amount of funding that is returned to a University. The assignation of DEST category codes is often entered by administrators, and then signed off on by heads of department. (This at least is the University of Melbourne experience). If this sort of workflow is difficult to implement in a repository, then deriving the DEST category codes from publication metadata using established business rules might be a more appropriate method of enforcing rigour around the reporting of DEST category codes. (Simon Porter)
An example might be a calculated field such as IF resource type=journal article AND type=peer-reviewed THEN DEST Category = A1 (Katie Blake)
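As a rough illustration, a rule of this kind could be written as a small function. The field values compared here are assumed labels, and the function name is hypothetical. The sketch returns C1 rather than A1 so that it agrees with the category definitions listed earlier (C1 - Journal article); the code value should be treated as a parameter of whatever business rule is agreed.

```python
from typing import Optional


def derive_category(resource_type: str, status: str) -> Optional[str]:
    """Apply one illustrative business rule; return None when no rule matches."""
    # Rule: a journal article whose status is peer-reviewed gets one code.
    # "C1" is an assumption based on the definition list on this page.
    if resource_type == "journal article" and status == "peer-reviewed":
        return "C1"
    # Anything else is left for manual assignment.
    return None
```

A record that matches no rule returns None, which leaves room for the manual entry and sign-off workflow described above.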
It might not be possible to calculate all of it. For example, one B1 criterion (for book chapters) is “must have been published by a commercial publisher”, where “For the purposes of these specifications, a commercial publisher is an entity for which the core business is producing books and distributing them for sale. If publishing is not the core business of an organisation but there is a distinct organisational entity devoted to commercial publication and its publications are not completely paid for or subsidised by the parent organisation or a third party, the publisher is acceptable as a commercial publisher.” Generally only the publisher name is recorded, not whether it is a “commercial publisher”; e.g. the name Tom Ruthven Vainglorious Publishing Company does not tell me if it is a commercial publisher. (Tom Ruthven)
Status value
A status value (as in eprints.org) would be useful. Peer review is relevant for journal articles and conference publications counted for DEST/DEEWR returns. A status vocabulary encoding scheme as in the Scholarly Works Application Profile could be used. This is a list of terms to indicate the peer-reviewed status of a publication. It is a simple vocabulary with just 2 terms – peer-reviewed and non-peer-reviewed and their definitions.
This has been added, and is at http://macar.wikidot.com/status-type-vocabulary
Total number of authors and ranking of these
An indication of the total number of authors and the ranking of each author is required. These figures determine the order in which authors are displayed, as well as the weighting applied to each author’s “points” for participating in the publication. The total number of authors can be derived automatically from the resource descriptions themselves, but author ranking may need to be recorded manually from the display in the record. However, the display in the record may be misleading, as some repository submission software does not allow authors to be added and displayed in the order in which they appear in the publication; some submission tools display the author entering the data first. A data element indicating the preferred citation of the described resource may therefore be required for HERDC, as well as to support the further referencing and citing of research publications.
MACAR recommendation :
- Add a bibliographic citation field to the record
- MARC : 524 field – Preferred citation of described materials note – subfield a
- Dublin Core: dcterms:bibliographicCitation or dc:identifier in Simple DC
- MACAR : macar:bibliographic citation
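To illustrate how the author count and ranking could be derived once the authors are held in citation order, here is a small sketch. The function name is hypothetical, and the equal share given to each author is an assumption made for the example, not a statement of the HERDC weighting rules.

```python
from fractions import Fraction


def rank_authors(authors_in_citation_order):
    """Return (total, [(rank, name, share), ...]) for an ordered author list.

    The list is assumed to follow the order printed on the publication,
    not the order in which a submission tool happened to display names.
    """
    total = len(authors_in_citation_order)
    # Assumption: each author receives an equal share of one publication point.
    share = Fraction(1, total) if total else Fraction(0)
    ranked = [
        (rank, name, share)
        for rank, name in enumerate(authors_in_citation_order, start=1)
    ]
    return total, ranked
```

The result is only as reliable as the input order, which is why a stored preferred citation is useful as the source of that order.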
Author affiliation
Repositories need to capture single authors and single/multiple affiliations and multiple authors and multiple affiliations in the metadata for research publications. Most agent/person descriptions do provide an author affiliation attribute. This relationship is usually expressed in authority data through links to authorized names and information notes.
Work is being done on this by the IFLA Working Group (FRANAR) on Functional Requirements for Authority Data, in the ongoing work of the DC Agent Working Group, and in the FOAF specification (a Semantic Web initiative). These standards are in early development and we will continue to monitor them closely.
One view is that the affiliation for a research publication eg journal article or conference paper is a property of the resource rather than that of the author. The author’s affiliation at the time the resource was created will persist even if the author moves to a different institution. If affiliation is then the property of the resource and not the author, then the resource description can contain multiple affiliations for multiple authors and there wouldn’t be a need to correlate particular authors and their affiliations.
MACAR recommendation :
- Capture author affiliation in the descriptive metadata, since agent standards are still in early development.
- MARC: 110/710 - Corporate name
- Dublin Core : Use dc:creator for author and dc:contributor for author affiliation (DCMI recommendation)
- MACAR : Affiliated institution (university, faculty and school)
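The view that affiliation belongs to the resource can be sketched with simple Dublin Core, following the dc:creator and dc:contributor usage listed above. The wrapper element name `record` and the function name are hypothetical; the Dublin Core element namespace is the standard one.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def describe(title, authors, affiliations):
    """Build a simple DC fragment with affiliations attached to the resource."""
    record = ET.Element("record")  # hypothetical wrapper element
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    for name in authors:
        ET.SubElement(record, f"{{{DC_NS}}}creator").text = name
    # Affiliations are listed once each, with no link to a particular author.
    for org in affiliations:
        ET.SubElement(record, f"{{{DC_NS}}}contributor").text = org
    return record
```

Calling `ET.tostring(describe(...), encoding="unicode")` gives the serialised fragment. Note that nothing in it says which author belongs to which institution, which is exactly the trade-off described above.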
Total number of chapters in a book
It was suggested that DC description, with the table of contents qualifier, could be used in repeatable fields. If using METS, the division <div> elements within a structural map can be used to record the individual chapters of a book.
MACAR recommendation :
- To work collaboratively with the Research Office to determine where this data should be stored as it isn’t the kind of data that aids discovery of the resource but may only be required for administrative purposes and could be stored somewhere else.
- MARC : 773 field – host item entry
- Dublin Core: dc:description in Simple DC or dcterms:tableOfContents
- MACAR : none
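If chapters are recorded as <div> elements in a METS structural map, the chapter total could be counted along these lines. The sample document and the TYPE value "chapter" are assumptions for illustration; METS does not prescribe TYPE values, so a repository would need to agree on its own.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Illustrative fragment only; the labels and TYPE values are made up.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="logical">
    <mets:div TYPE="book" LABEL="Example book">
      <mets:div TYPE="chapter" LABEL="Chapter 1"/>
      <mets:div TYPE="chapter" LABEL="Chapter 2"/>
      <mets:div TYPE="index" LABEL="Index"/>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def count_chapters(mets_xml):
    """Count <div> elements typed as chapters anywhere in the structural map."""
    root = ET.fromstring(mets_xml)
    divs = root.findall(".//mets:structMap//mets:div", METS_NS)
    return sum(1 for div in divs if div.get("TYPE") == "chapter")
```

With the sample above, `count_chapters(SAMPLE)` counts the two chapter divisions and ignores the book and index divisions.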
Broader discussion
There has been broader discussion of the various HERDC models being proposed. There is a question as to what metadata is suitable for storing in a repository and what is more appropriate for the Research Office to maintain.
It would be useful to have a web interface for researchers to maintain their own profiles.
It was also recommended that the data for HERDC be recorded once and both the repository and the Research Office work collaboratively to collect the outputs for HERDC reporting to DEEWR.
Repositories are unlikely to make use of this data other than for this kind of reporting. It may not be advisable to try and ‘shoehorn’ purely HERDC metadata into a repository object. MACAR questions whether or not repositories need to record and store this kind of very specific purpose-driven metadata.
HERDC container or separate the metadata across metadata sections
Perhaps there should be a separate HERDC container (eg a datastream in VITAL repository software) which could be used to keep all the HERDC metadata together. Some institutions are already creating separate datastreams to capture and store this kind of data. The METS standard allows the packaging of digital objects with all kinds of metadata in a single file, and this standard could be used to implement this. Finding a logical place for this in the administrative section of a METS file may be problematic. Also, namespaces are a very important part of the METS standard, and any XML schema incorporated in a METS package requires a namespace declaration for validation purposes. Further investigation of the METS standard is required to test whether extension schemas such as this can be incorporated into a METS package and ingested into a repository.
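One way such a container might look is sketched here, purely as a starting point for that investigation. The `herdc` namespace URI, the element names inside it and the ID values are all hypothetical (no published HERDC schema is assumed), and sourceMD is used only as a placeholder, since where this belongs in the administrative section is the open question noted above.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
HERDC_NS = "http://example.org/herdc"  # hypothetical namespace, not a published schema
ET.register_namespace("mets", METS_NS)
ET.register_namespace("herdc", HERDC_NS)


def herdc_section(category_code, author_count):
    """Wrap HERDC-only values in a METS administrative metadata section."""
    amd = ET.Element(f"{{{METS_NS}}}amdSec", {"ID": "AMD-HERDC"})
    # sourceMD is a placeholder choice; the right home is undecided.
    source = ET.SubElement(amd, f"{{{METS_NS}}}sourceMD", {"ID": "HERDC1"})
    wrap = ET.SubElement(
        source,
        f"{{{METS_NS}}}mdWrap",
        {"MDTYPE": "OTHER", "OTHERMDTYPE": "HERDC"},
    )
    data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    ET.SubElement(data, f"{{{HERDC_NS}}}categoryCode").text = category_code
    ET.SubElement(data, f"{{{HERDC_NS}}}authorCount").text = str(author_count)
    return amd
```

Serialising the result with `ET.tostring(..., encoding="unicode")` emits the namespace declarations for both prefixes on the outer element. This produces well-formed XML only; validating it against a schema would still need the extension schema itself.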
Total number of authors within a department or school
This data is no longer a HERDC requirement.
In the latest Teleconference minutes, there is the following discussion:
4.1. DEST category code
It was agreed that this code could be recorded and stored in a resource type or genre property and values could be assigned from a DEST category code vocabulary. Category code definitions should also be provided, eg A1 – Book; B1 – Book chapter; C1 – Journal article, etc.
• MACAR recommendation: Use type/genre property/field. Best practice is to use a controlled vocabulary.
• MARC : 655 field – Index Term-Genre/Form subfield a and subfield 2
• Dublin Core: dcterms:type or dc:type in simple DC
• MACAR : macar:type
Are you actually meaning we need to include the code and type in the 655 field, e.g. $aA1 - Book?
We are coding the code "A1" into 592 and the type "book" into 655.
I don't think having a resource type "A1 - Book" is very helpful for people who don't know DEST category codes. Even if you do want the 'A1' in the resource type, I would suggest you have at the end of the field 'Book - A1' so all the books file together. Users just want to know it is a book, they don't care it is an A1 book. What is the problem with continuing to use 592?
Any further info would be good.
Fiona
Good questions, Fiona. The MACAR recommendation is to use the type property and values from a controlled vocabulary. This is based on current international standards. Yes – type and code can be in dc:type or the MARC 655 field, and if more than one type and value/code is to be used from multiple vocabularies, then repeatable fields can be used if allowed by the repository software. I don’t know why the 592 field was recommended or used.
A DEST category code is a value identifying the genre of a resource, just more granular than the values used from the resource type vocabulary. I agree these codes are not very useful other than for DEST reporting. If you would prefer that this data not be displayed to users, you could consider either creating a mapping/translation from existing values in the resource type vocabulary to the DEST category codes or, as Simon suggested, deriving/calculating these codes from the publication metadata for reporting purposes (please see the minutes). Ultimately it is up to the repository how this is implemented.
Hope this helps
Joan
Just by way of background, here is some of the relevant conversation, so you all know what the issues are.
Simon Porter said:
…. the DEST category code could be derived from other metadata against the record at reporting time. This would be a nontrivial exercise, but it would be an alternative to essentially translating information that is already in the record at data entry time.
One of the reasons that deriving DEST category codes might be a good idea if you were considering using a repository for HERDC reporting is that in most publication reporting systems, the attribution of DEST categories to publications is highly controlled and audited. One of the reasons for this is that these DEST codes have an impact on the amount of funding that is returned to a University. The assignation of DEST category codes is often entered by administrators, and then signed off on by heads of department. (This at least is the University of Melbourne experience). If this sort of workflow is difficult to implement in a repository, then deriving the DEST category codes from publication metadata using established business rules might be a more appropriate method of enforcing rigour around the reporting of DEST category codes.
Tom Ruthven replied:
UNSW uses a similar assignation of DEST codes as University of Melbourne (entered by administrators, and then signed off on by heads of department). However, as the codes are auditable information I’d prefer to store the code used for the DEST return rather than re-calculate it for an auditor. Initially the code could be calculated but I would prefer it to be stored as the code.
Angela Lang asked how the calculation might be done.
Katie Blake answered that it might be something along the lines of IF resource type=journal article AND type=peer-reviewed THEN DEST Category = A1
Simon Porter responded that
For journals it would probably end up being:
IF resource type=journal article and journal name belongs to a pre approved list THEN DEST category = Blah
The reason there might need to be a pre approved list is that in general the university is responsible for coordinating what it considers to be a peer reviewed journal. Having a pre approved list is another way of putting a control process around the process of deciding what is considered as peer reviewed for the purposes of DEST reporting. Up until 2006, DEST maintained a pre approved list…
I think the rules would get pretty tricky pretty quickly, but I raised it mostly to make the point that recording the code is probably the easy bit. It’s enforcing all the business processes around the recording of the codes that might make the use of a repository as a sole HERDC reporting device very challenging.
Tom raised auditing, which is a good point. After a head of department has signed off on the records, those records cannot be changed without an audit trail that results in a head of department re-approval step. I’m not sure how this would be managed purely in a repository environment. Happy to be told otherwise.
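The journal rule Simon describes, with its pre-approved list, could be sketched roughly as follows. The journal title, the list itself and the function name are hypothetical, and "C1" stands in for the unspecified "Blah" on the assumption that the definitions quoted in the minutes (C1 – Journal article) apply.

```python
from typing import Optional

# Hypothetical pre-approved list maintained by the university.
APPROVED_JOURNALS = {"journal of example studies"}


def category_from_journal(
    resource_type: str, journal_name: str, approved=APPROVED_JOURNALS
) -> Optional[str]:
    """Return a category code only when the journal is on the approved list."""
    if resource_type != "journal article":
        return None
    if journal_name.strip().lower() in approved:
        return "C1"  # assumed code for a journal article
    return None  # not on the list: leave for manual assessment
```

This covers only the calculation. It does not address the sign-off and audit trail raised above, which is why storing the reported code alongside the record, as Tom prefers, may still be needed.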
Tom Ruthven then came back with
Although this is not directly related to MACAR’s work I am not sure everything can be calculated using standard descriptive metadata.
For example, a B1 criteria (for book chapters) includes “must have been published by a commercial publisher”, where “For the purposes of these specifications, a commercial publisher is an entity for which the core business is producing books and distributing them for sale. If publishing is not the core business of an organisation but there is a distinct organisational entity devoted to commercial publication and its publications are not completely paid for or subsidised by the parent organisation or a third party, the publisher is acceptable as a commercial publisher.” Only the publisher name is generally recorded not whether it is a “commercial publisher”, e.g. Tom Ruthven Vainglorious Publishing Company does not tell me if it is a commercial publisher :)
We are capturing the DEST code from our research system, or if we were entering records into our repository manually, we would enter the code into our repository. We would not use a rule such as "IF resource type=journal article and journal name belongs to a pre approved list THEN DEST category = Blah". I just don't see the sense of having the code 'A1' and type 'book' in the same field, and in particular a public field. I really feel the 'A1' should be stored in a separate field that can then be retrieved and reported on where necessary. I was very happy with the idea of a 592 field used to describe the DEST category.