MACAR Minutes 17 12 2007


Metadata Advisory Committee for Australian Repositories (MACAR) Meeting 3

13th December 2007, Qantas Club, Melbourne Airport, 10.00am to 3.00pm

1. Attendance and apologies

Present:

KB - Katie Blake - ARROW Central (Co-convenor)
JG - Joan Gray - ARROW Central (Co-convenor and minute taker)
AL - Angela Lang - ARROW Central
NG - Neil Godfrey - Rubric
TR - Tom Ruthven - UNSW/ADT
KS - Kate Sergeant - University of SA
AD - Alison Dellit - National Discovery Service
JM - Jenny Millea - Higher Education, education.au

Ann Huthwaite joined the meeting by teleconference at 2pm.

Apologies:

John Butera (Swinburne), Helen Wolff (Swinburne), Scott Yeadon (ANU), Susanne Moir (State Library of NSW), Simon Mcmillan (University of New England), Graham Reynolds (DEST), Belinda Weaver (University of Queensland), Kerri Blinco.

1.1 Welcome and introductions

KB welcomed everyone to the last MACAR meeting for the year. Those present were asked to introduce themselves and give a brief update on their current roles and responsibilities and the work they’ve been doing.

2. Minutes from previous meeting

Minutes of meeting of 22nd August 2007 – accepted

3. Summary of Group 3 work – introducing the principles, the Register and the instructions

KB reviewed the year's work, including that of Groups 1 and 2. She then spoke to the Group 3 documents, explaining the rationale and principles, the ready reference tool itself and the accompanying instructions. At first it was thought that a set of MARC and MODS templates and crosswalks based on the resource types and metadata schemas might be suitable tools for the repository community. However, after looking at a number of metadata schemas and application profiles, it became apparent that using only one schema might not provide the richer metadata that is needed. We might need to blend elements from several schemas and add new elements where required. Group 3 therefore agreed that it would be useful to step back from any particular schema and develop a data dictionary of elements/properties for each of the resource types. The work was divided among the Group 3 members. The schemas for each resource type were surveyed, in some instances in consultation with the stakeholder community. A set of properties, with subproperties and attributes, was then identified. The lists were sent to KB, who built and populated a tool to represent this data. An Excel spreadsheet was chosen in preference to a database for three reasons: ARROW was already using Excel for its central register, everyone has access to it, and its advanced features (eg sort, autofilter and pivot tables) can be used to manipulate, search and display the data. It is hoped that this data dictionary can be used to create XML expressions of the spreadsheet data, and later RDF expressions when tools become available.

AD commented that the properties required for harvesting metadata differ from those required for recording/creating it, and the spreadsheet should provide some guidelines on this. She noted that she had recently completed a detailed analysis of data harvested from the institutional repositories of 21 institutions. The analysis shows inconsistencies in the DC metadata provided. TR agreed that this is also a problem in the data harvested for ADT.

4. Analysis of the scope of the spreadsheet – new columns? Assigning the population of examples and values

In addition to the properties and sub-properties, other attributes were identified, eg obligation and repeatability. KB asked whether more columns should be added to the spreadsheet, with more attributes identified and documented, such as definition/scope notes, source of definition, examples and comments. JG said that other application profiles, eg the library AP, provide a guide to which attributes should be included. AD has provided a mapping from all the properties to Dublin Core.

Action: KB to add additional columns for definition/scope note, example as well as a mapping to MARC

5. Review of the cluster approach

KB introduced the cluster concept in the spreadsheet. She identified the main categories, such as description, identification, event, rights, creator and technical description. JG asked whether these groupings were like FRBR entities and could be mapped to those domains later. AD noted that she has done a DC mapping of the properties.

KB said that grouping properties into DC clusters would be useful for encoding XML expressions. A lot of discussion ensued. Is this a kind of data model we are developing? How will it work with METS? It was agreed that the relationships we were defining were at the element level only. This could also be useful in the future when creating descriptions and description sets. This is still under discussion.

Action: Dublin Core to be considered as the basis of clusters of properties and sub properties. Discussions are proceeding.

6. Review of each of the properties

KB demonstrated how the tool works. The data sheet contains all the data. Excel's sort and filter options and pivot tables allow the data to be managed, searched and displayed in a user-friendly way. There are currently four worksheets: data, by resource type, by property, and the list of resource types. The data sheet has more than 700 rows, one for each property per resource type. This means there is a lot of duplication, because the same property has to be entered for each resource type. The column headings identifying the various attributes were discussed, and it was decided to include definition/scope note and example. It was also agreed that XML expressions of this data would be useful. Some properties may be missing and others may need revising. JG said that the ARROW team has been doing a lot of work on dataset metadata in consultation with the e-research community at Monash. The dataset properties in the spreadsheet will be revised accordingly.

It was agreed that the work of populating all the properties with definitions/scope notes, examples and values would be divided up by DC element as follows:

Alison Dellit: Relation; Identifier

Katie Blake: Date; Title

Joan Gray: Description; Format; Language

Neil Godfrey: Coverage; Rights

Kate Sergeant: Creator; Publisher

Tom Ruthven: Subject; Type

Action: KB to email MACAR members with details of division of tasks based on DC elements.

Action: KB to change spreadsheet with new column headings as discussed

Action: KB to investigate tools to provide XML expressions of the spreadsheet data.

7. Mapping to Dublin Core, MARCXML, MODS and QDC – who will do this and when

AD has already provided the mappings to Dublin Core. KB suggested that a MARCXML mapping should be done. Because crosswalks already exist from MARCXML to a range of other schemas, this may be the only mapping MACAR needs to do at this stage.

Action: KB to email a format for the mapping if members wish to include this in the spreadsheet.

8. Decide which resource types to address next and who will do them

JG asked whether the type vocabulary developed by Group 2 was being used in the community. Are repository managers using the new vocabulary? TR said that he was using it, but was also using multiple resource types for complex objects. JG said that the best practice rules recommend choosing the most dominant type applicable to the resource. Multiple resource types can also be used if the repository solution permits.

JG also asked about the resource types website, multimedia and interactive resource. It was agreed that there was still merit in keeping website, as some digital resources, eg archived sites, are websites and not any other type.

It was also agreed that both multimedia and interactive resource should be retained. Although there is some overlap, the types of resources they describe can differ. The user interaction involved in an interactive resource distinguishes it from multimedia. It was also agreed to remove scholarly text from the list, as textual forms are more likely to be covered by the subtypes.

JG also asked whether we should investigate metadata registries for declaring and publishing the MACAR type vocabulary. She had discussed with Diane Hillmann of Cornell University whether MACAR could use the NSDL registry, should it wish to do so. A metadata registry sandbox is available for experimenting before going ahead. It was agreed that this was possibly too early, as we don’t yet know how much take-up of the vocabulary there is in the Australian repository community. It has been publicised on various websites, but we don’t know how many repository managers are using it. We need to promote these tools to the repository community and invite feedback.

Action: JG to amend the type vocabulary as discussed.

Action: MACAR to discuss ways of promoting the type vocabulary and invite feedback.

9. Creation of .xsd schema(s) and any other tools – what needs to be done?

KB said she has been experimenting with the XML import/export options in Excel to translate spreadsheet data into XML. She demonstrated a sample of the data encoded in XML. We might also want to look at RDF translations in the future as tools become available.
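KB's experiment used Excel's own XML import/export features. The sketch below shows the same idea outside Excel and is for illustration only. It takes data dictionary rows and groups them by resource type, which mirrors the "by resource type" worksheet. Each row becomes a property element carrying its attributes. The column names, the element names (dataDictionary, resourceType, property) and the sample rows are all hypothetical and are not taken from the MACAR spreadsheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical column order for one data-dictionary row; the real
# spreadsheet headings may differ.
COLUMNS = ["resource_type", "property", "subproperty",
           "dc_element", "obligation", "repeatable"]


def rows_to_xml(rows):
    """Group data-dictionary rows by resource type into one XML tree."""
    root = ET.Element("dataDictionary")
    by_type = {}  # resource type name -> its <resourceType> element
    for row in rows:
        record = dict(zip(COLUMNS, row))
        rtype = record["resource_type"]
        if rtype not in by_type:
            by_type[rtype] = ET.SubElement(root, "resourceType", name=rtype)
        prop = ET.SubElement(by_type[rtype], "property", name=record["property"])
        # Only record a subproperty when the row actually has one.
        if record["subproperty"]:
            prop.set("subproperty", record["subproperty"])
        # Remaining attributes become child elements of the property.
        for attr in ("dc_element", "obligation", "repeatable"):
            ET.SubElement(prop, attr).text = record[attr]
    return root


if __name__ == "__main__":
    # Illustrative rows only, not MACAR data.
    sample = [
        ("Thesis", "Title", "", "title", "Mandatory", "No"),
        ("Thesis", "Title", "Alternative title", "title", "Optional", "Yes"),
        ("Dataset", "Date", "Date created", "date", "Mandatory", "No"),
    ]
    print(ET.tostring(rows_to_xml(sample), encoding="unicode"))
```

Grouping by resource type reflects how the duplication in the data sheet arises: the same property is repeated once per resource type. Grouping by DC cluster instead would only require changing the key used for `by_type`.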

10. Registering the MAP (MACAR Application Profile) and what needs to be done?

This will be a topic for a future meeting.

11. Publicising the work of MACAR – what has been done, what remains to be done?

AD said we need to make these tools more visible. We need to put them out there and find out how repository managers are using them. It was also suggested that we should talk to the APSR people.

JM suggested that we should publicise our work more through attendance at various conferences and forums, eg DC-2008, Educause 2008, AICTEC (Australian ICT in Education Committee), Carrick Institute etc.

We have publicised our work on some websites and promoted our work at some conferences/seminars but we need to do more.

There was further discussion on how to promote the work of MACAR. A strategy document will be prepared outlining a five-point approach:

1. A consultation stage with managers and IT staff, institution by institution, as well as with user groups in the DSpace, DigiTool and EPrints communities.

2. Working with IT19 and ANDS to ensure that the work MACAR does is recognised through official channels

3. Registration of the MACAR Application Profile and development of associated tools

4. Conducting seminars, workshops, and presentations at national and international conferences

5. Preparation and dissemination of a series of briefing and instructional papers

12. Teleconference

Ann Huthwaite joined the meeting by teleconference at 2.00pm. KB summarised what had been discussed so far (spreadsheet, principles and clusters, resource types) and noted the related action items.

AH asked for more information about the clusters. KB said that the clusters were useful for grouping properties and subproperties that share some characteristics. They also make these properties easier to find and search for recording purposes. The DC clusters would facilitate encoding the properties and subproperties in XML.

AH also asked whether MACAR will continue next year. KB said that ARROW will continue to support MACAR next year, and that she will post a summary of decisions to Google Groups.

AH also asked about a crosswalk to MARCXML. KB said that we will be providing this in the spreadsheet. In the meantime there are templates on the ARROW website and a central register on Google Docs.

13. Next meeting

Meeting closed at 3pm. Date of next meeting to be advised.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License