Chapter 11
Meta Data and the Dublin Core

Difficulty

Newcomer

Skills

Problem/Task

Any advanced system needs to specify meta-data for its various artifacts, especially for objects that represent content. For a publishing environment like Zope 3, it is important to have a standard set of meta-data fields for all content objects. The Dublin Core was already used to provide such a set of fields in Zope 2’s CMF.

Solution

Even though I expect that you know what the term “meta-data” means, it can be useful to do a quick review, since people use the term in a very broad sense. Data in general is the information an object inherently carries. It represents the object’s state and is necessary to identify the object. Meta-data, on the other hand, is information about the object and its content. It is not required for the object itself to function (i.e., object methods should not depend on meta-data), but it might be important for the object to function inside a larger framework, providing additional information for identification, cataloging, indexing, integration with other systems, etc.

One standard set of meta-data is called “Dublin Core” (dublincore.org). The Dublin Core provides additional information about content-centric objects, such as the title, description (summary or abstract) and author of the object. As mentioned before, the Dublin Core was very successful in Zope 2’s CMF and Plone.

In the Dublin Core (DC for short), all elements are lists, meaning that they can have multiple values. The DC elements are useful to us because they cover the most common meta-data items, such as the creation date, title, and author. This data is useful in the context of most objects, and at least its high-level meaning is easily understood. But there are some issues with the Dublin Core as well. Because it is such a well-established standard, developers are tempted to read some deep meaning into the DC elements. As Ken Manheimer pointed out, even the Dublin Core designers succumbed to that temptation and tried to be a bit too ambitious with some of the fields.
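To make the "all elements are lists" point concrete, here is a minimal sketch in plain Python. It is not Zope 3's implementation; the class name DublinCoreRecord and its methods are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class DublinCoreRecord:
    """Hypothetical holder for multi-valued DC elements (not Zope 3's API)."""

    def __init__(self):
        # Every element maps to a list, so it can carry several values.
        self._elements = defaultdict(list)

    def add(self, element, value):
        # Append rather than overwrite: elements are multi-valued.
        self._elements[element].append(value)

    def values(self, element):
        # Return a copy so callers cannot mutate the stored list.
        return list(self._elements.get(element, []))

    def first(self, element, default=None):
        # Convenience for elements that are usually treated as single-valued.
        values = self._elements.get(element)
        return values[0] if values else default


record = DublinCoreRecord()
record.add("title", "Meta Data and the Dublin Core")
record.add("creator", "alice")
record.add("creator", "bob")
```

With this sketch, record.values("creator") returns both creators, while record.first("title") gives the single value that a user interface would typically display.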

A good example here is the contributor element. It is not clear what is meant by a contributor. Is it an editor, translator, or an additional content author? And how does this information help me, if I want to find the person who last modified the object or publication? Therefore it becomes important to specify the meaning of the various elements (and the items in a particular element) for each specific implementation, such as Zope 3. All the elements and how they are implemented are well documented by the interfaces found in ZOPE3/src/zope/app/interfaces/dublincore.py and in the section below.

The Dublin Core Elements

The following Dublin Core element list was taken from http://dublincore.org/documents/2003/02/04/dces/. I added and edited some of the comments with regard to Zope 3’s implementation.

Title

Label

Title

Definition

A human-readable name given to the resource.

Comment

In Zope 3, the name of a resource is a unique string within its container (it used to be called “id” in Zope 2). However, names of objects are often not presented to the end user. The title is used to represent an object instead.

Creator

Label

Creator

Definition

An entity primarily responsible for making the content of the resource.

Comment

A creator in Zope is a special example of a principal, which can take a lot of forms, but it will typically be a user of the application. Zope 3 stores the user id in this field.

Subject

Label

Subject and Keywords

Definition

A topic of the content of the resource.

Comment

Typically, Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme. Note that this is ideal for cataloging.

Description

Label

Description

Definition

An account of the content of the resource.

Comment

Examples of Description include, but are not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content. In Zope 3 we usually use the description to give some more details about the semantics of the object/resource, so that the user gains a better understanding of its purpose.

Publisher

Label

Publisher

Definition

An entity responsible for making the resource available.

Comment

It is unlikely that this element will be used heavily in Zope 3, but it might be useful for workflows of news sites and other publishing applications. The Publisher is the name/id of a principal.

Contributor

Label

Contributor

Definition

An entity responsible for making contributions to the content of the resource.

Comment

Examples of Contributor include a person, an organization, or a service. Typically, the name of a Contributor should be used to indicate the entity. As mentioned before, this term is incredibly vague and needs some additional policy; Zope 3 has not made up such a policy yet. The Contributor is the name/id of a principal.

Date

Label

Date

Definition

A date of an event in the lifecycle of the resource.

Comment

Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and includes (among others) dates of the form YYYY-MM-DD. Note that the time often matters to us as well, and that instead of saving text we store Python datetime objects. Also note that the definition is very vague and needs some more policy to be useful.
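Since the values are stored as Python datetime objects, the W3CDTF text form only has to be produced on output. The following is a minimal sketch using the standard library only; the function names are hypothetical, and treating naive values as UTC is an assumption of this example, not a documented Zope 3 policy.

```python
from datetime import datetime, timezone


def to_w3cdtf_date(value):
    """Render only the calendar date, in the YYYY-MM-DD form."""
    return value.strftime("%Y-%m-%d")


def to_w3cdtf_datetime(value):
    """Render date and time with a UTC offset.

    Assumption for this sketch: naive datetimes are interpreted as UTC.
    """
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value.isoformat(timespec="seconds")


created = datetime(2004, 3, 1, 14, 30, 0, tzinfo=timezone.utc)
print(to_w3cdtf_date(created))      # 2004-03-01
print(to_w3cdtf_datetime(created))  # 2004-03-01T14:30:00+00:00
```

Keeping the datetime object as the stored value and formatting it late means the time of day is not lost when only the date is displayed.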

Type

Label

Resource Type

Definition

The nature or genre of the content of the resource.

Comment

Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the DCMI Type Vocabulary [DCT1]). To describe the physical or digital manifestation of the resource, use the “Format” element. For content objects the resource type is clearly the “Content Type”. For other objects it might be simply the registered component name. However, Zope 3 is not using this element yet.

Format

Label

Format

Definition

The physical or digital manifestation of the resource.

Comment

Typically, Format may include the media-type or dimensions of the resource. Format may be used to identify the software, hardware, or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types [MIME] defining computer media formats). We have not used this element so far, even though I think we could use some of the existing framework for this.

Identifier

Label

Resource Identifier

Definition

An unambiguous reference to the resource within a given context.

Comment

Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Formal identification systems include but are not limited to the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN). In Zope 3’s case this could be either the object’s path or unique id (as assigned by some utility).

Source

Label

Source

Definition

A reference to a resource from which the present resource is derived.

Comment

The present resource may be derived from the Source resource in whole or in part. Recommended best practice is to identify the referenced resource by means of a string or number conforming to a formal identification system. I do not see how this is generically useful to Zope components, though I think it could be applicable in specific applications written in Zope.

Language

Label

Language

Definition

A language of the intellectual content of the resource.

Comment

Recommended best practice is to use RFC 3066 [RFC3066] which, in conjunction with ISO 639 [ISO639], defines two- and three-letter primary language tags with optional subtags. Examples include “en” or “eng” for English, “akk” for Akkadian, and “en-GB” for English used in the United Kingdom. Note that we have a system in place to describe locales; see the introduction to internationalization and localization.
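The structure of such tags (a primary tag followed by optional subtags, separated by hyphens) can be sketched as follows. This is a simplified illustration, not a full RFC 3066 validator and not Zope 3's locale machinery; the function name split_language_tag is hypothetical.

```python
def split_language_tag(tag):
    """Split a tag such as "en-GB" into (primary, subtags).

    Simplification for this sketch: only checks that the primary tag is
    alphabetic; it does not verify the tag against ISO 639.
    """
    parts = tag.strip().split("-")
    primary = parts[0].lower()
    if not primary.isalpha():
        raise ValueError("invalid primary language tag: %r" % tag)
    return primary, parts[1:]


print(split_language_tag("en"))     # ('en', [])
print(split_language_tag("en-GB"))  # ('en', ['GB'])
print(split_language_tag("akk"))    # ('akk', [])
```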

Relation

Label

Relation

Definition

A reference to a related resource.

Comment

Recommended best practice is to identify the referenced resource by means of a string or number conforming to a formal identification system. Another vague element, since it does not specify what sort of relation is meant; it could be “containment” for example, in which case the parent object would be a good candidate. However, more policy is required to make the field useful to Zope.

Coverage

Label

Coverage

Definition

The extent or scope of the content of the resource.

Comment

Typically, Coverage will include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary (for example, the Thesaurus of Geographic Names [TGN]) and to use, where appropriate, named places or time periods in preference to numeric identifiers such as sets of coordinates or date ranges. This seems not to be useful to Zope generically.

Rights

Label

Rights Management

Definition

Information about rights held in and over the resource.

Comment

Typically, Rights will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the Rights element is absent, no assumptions may be made about any rights held in or over the resource. Zope 3 could use this element to show its security settings on this object, in other words who has read and modification access to the resource. It makes little sense to use this element to generically define a copyright or license entry. Again, specific applications might have a better use for this element.