An Introduction to Metadata
FAQs on Metadata


Assembled below is a batch of commonly asked questions about metadata and related topics. Click the "disclosure triangle" next to each entry to reveal an answer and additional discussion points.

 

FAQs

What is metadata, anyway?

"Metadata" is descriptive information about a resource. The resource may be video or audio, an image or graphic, a text-based document, an interactive module, or any other content item either electronic/digital or physical/analog.

The primary purpose of metadata is to enhance findability and facilitate sharing: it describes a resource so that someone can discover, review, select, and retrieve it.

Examples of metadata include the name of an item; descriptions or abstracts about its content; keywords or subject classifications; file formats; authors; producers; distributors; publishers; copyright and usage restrictions; etc.

Metadata needs to be structured in some way. The descriptions available through metadata should not be created in a random or ad hoc manner. In other words, metadata should follow a well-documented, formalized scheme. The flip side of using standardized metadata schemes is referred to as a "Folksonomy"...

In contrast to professionally developed taxonomies with controlled vocabularies, folksonomies are unsystematic and, from an information scientist's point of view, unsophisticated; however, for Internet users, they dramatically lower content categorization costs because there is no complicated, hierarchically organized nomenclature to learn. One simply creates and applies tags on the fly.
http://en.wikipedia.org/wiki/Folksonomy

By the way, the "descriptions" one creates are called "Metadata." However, the "thing" being described is often referred to as the "Essence" or a "media item." When you combine Essence with its Metadata you generate a media Asset that now has value to various end-user communities searching for and displaying content.

For further reading, visit an online Metadata Primer available from the NSDL--the National Science Digital Library.

What is a metadata dictionary and a metadata schema?

Metadata consists of descriptions about media items. Those descriptions can be generated either with the free-form methods of a populist Folksonomy or with the stricter prescriptions of a set of systematic metadata rules and usage guidelines. Whether created from scratch or harvested from other sources, the metadata rules and guidelines must be defined and documented so others can reference the standards and apply them consistently. That documentation is a Metadata Dictionary.

Basically, a Metadata Dictionary provides the following:

  1. the meaning or definition of a metadata descriptor (semantics),
  2. the grammar and rules for entering or cataloging descriptions/data (syntax),
  3. the defined properties associated with each metadata descriptor in a dictionary (attributes).

 

In building a Metadata Dictionary...

  • The metadata descriptors are called Elements.
  • An Element may stand alone, or be associated with or bound to other sibling elements in the dictionary.
  • Each Element has a carefully defined set of properties or Attributes.
  • And finally, Elements from various sources may be combined into a larger Application Profile that is specific to the needs of a particular community or type of user.
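
As a minimal sketch of the ideas above, a dictionary entry can be modeled as a small record that carries an element's definition (semantics), its entry rule (syntax), and its remaining attributes. The class name, field names, and sample entries below are illustrative assumptions, not the actual PBS DLL definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One hypothetical entry in a metadata dictionary."""
    name: str          # label of the descriptor
    definition: str    # semantics: what the element means
    syntax: str        # syntax: how values should be entered
    attributes: dict = field(default_factory=dict)  # other defined properties


# A tiny illustrative dictionary; these entries are made up for the example.
dictionary = {
    "title": Element(
        name="title",
        definition="The name given to a media item.",
        syntax="free text",
        attributes={"obligation": "mandatory", "repeatable": False},
    ),
    "keywords": Element(
        name="keywords",
        definition="Terms that summarize the subject of a media item.",
        syntax="terms separated by semicolons",
        attributes={"obligation": "optional", "repeatable": True},
    ),
}


def application_profile(*sources):
    """Combine elements from several dictionaries into one profile.

    Later sources win when two dictionaries define the same name.
    """
    profile = {}
    for source in sources:
        profile.update(source)
    return profile


profile = application_profile(dictionary)
mandatory = [e.name for e in profile.values()
             if e.attributes.get("obligation") == "mandatory"]
print(mandatory)
```

Keeping the attributes in an open-ended mapping is only one possible design; a dictionary with a fixed attribute set could just as well declare each attribute as its own field.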

 

A step above a Metadata Dictionary in sophistication is a Metadata Schema. According to the Moving Image Collections (MIC) website, a schema has these traits...

A metadata schema is a standardized structure for metadata which allows repositories or machines to share data with mutual understanding. The metadata schema defines the data elements (fields) or tags (labels) used to enable indexing, retrieval, display, and sharing of records by computer systems.
http://mic.imtc.gatech.edu/catalogers_portal/cat_descriptmeta.htm

A Metadata Schema uses a Metadata Dictionary to build a Data Model that expresses and controls the relationships and hierarchies between metadata Elements; the goal is to exchange or share that data between different information systems or media asset repositories.

What is an application profile?

 

In the article "Application Profiles: Mixing and Matching Metadata Schemas," by Rachel Heery and Manjula Patel, an Application Profile is further defined as...

...schemas which consist of data elements drawn from one or more namespaces, combined together by implementers, and optimized for a particular local application. The experience of implementers is critical to effective metadata management...implementers use standard metadata schemas in a pragmatic way... ‘there are no metadata police’, [metadata] implementers will bend and fit metadata schemas for their own purposes.
http://www.ariadne.ac.uk/issue25/app-profiles/

For the PBS Digital Learning Library, numerous metadata dictionaries were referenced, harvested and folded into its Metadata Application Profile. Candidate metadata dictionaries and schemas included...

PBCore
Dublin Core
NETA Media Exchange Prototype
IEEE-LTSC LOM (Learning Technology Standards Committee Learning Object Metadata)
WGBH Teacher's Domain
Maryland Public Television ThinkPort
PBS Teacher's Activity Database
UMAP--Utah Metadata Application Profile from Utah Education Network
ODRL--Open Digital Rights Language Model for Digital Rights Management
vCard Personal Data Interchange Metadata Model
PBS PODS

 

What is a metadata element?

The Periodic Table of Elements contains a carefully structured visualization of the chemical building blocks of the universe as we know it. Metadata Elements are the descriptive building blocks used to verbally or visually describe the world of resources, assets, media items, or "essence." They are referred to as Data Elements.

A Metadata Element is a single descriptor, such as a Title, a Duration, a Grade Level as a target audience, or a set of Keywords. The PBS DLL Metadata contains 69 separate elements. Each one is separately defined and can be reviewed by clicking on the entry in the left-hand Table of Contents called "Individual Element Definitions."

What are element attributes?

There are many specifications on how to define Data Elements, such as Metadata Descriptors. If one hopes to share metadata descriptions with other organizations and entities (interoperability), then it's best to follow an established set of guidelines in setting up and defining metadata elements. A commonly understood framework allows diverse groups to appreciate, understand, and harvest data or metadata descriptions from each other.

The PBS DLL Metadata employs a modified standard for describing data elements used in databases and documents. It is called ISO/IEC 11179: Specification and Standardization of Data Elements. Technically speaking, the PBS DLL Metadata is considered to be "cognizant of ISO/IEC 11179." In using the ISO/IEC standard, each descriptor or metadata element is identified by numerous attributes or characteristics that define and refine the definition of an element.

The attributes the PBS DLL Metadata employs are fully explained on a separate web page you can access from an entry in the left-hand Table of Contents called An Introduction to Metadata > Defining Elements: Attributes.

What is a controlled vocabulary, an authority, or structured syntax?

The definition and prescriptions for metadata entry or cataloging can be made more systematic by establishing restrictions on how data should actually be entered into an information or content management system... in other words, supplying rules or pre-established terms that can be employed when describing a media item.

The grammar of a description for a metadata element can be prescribed. A good example of this type of refinement is the order in which a person's name is displayed, e.g., LastName, FirstName MiddleName, Credentials or FirstName MiddleName LastName, Credentials. (For a fascinating discussion on the complexities behind entering and displaying the names of people and organizations, see the article "Representing People's Names in Dublin Core.")

Another example of a prescription for data entry is the manner in which dates are represented. Do you order a date by Month/Day/Year, Day/Month/Year, or Year/Month/Day? (For a discussion of the variables in representing dates and times, see the W3C report on "Date and Time Formats.")

There are basically three ways to control the terms and descriptions used while cataloging. These refinements use formal notations, vocabularies, or specific parsing rules.

  • Use an "authority file" from another agency that specifies how to properly enter descriptive information for a type of metadata element. It may provide taxonomies of terms organized into logical hierarchies, such as the Library of Congress "subject" terms or state and national core curriculum standards and objectives for education.

  • Use a short listing of prescribed terms, often called a "controlled vocabulary." The best practice is to select a term or terms from a picklist. The picklist ensures consistency in data entry.

  • Follow a particular structured syntax, punctuation or grammar when entering data, e.g., LastName, FirstName MiddleName, Credentials or dates as 2005-02-24 (YYYY-MM-DD).

Controlling the descriptions entered for a metadata element ultimately means that end users are able to conduct successful searches for relevant media items and avoid an explosive number of irrelevant "hits."
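
The controlled-vocabulary and structured-syntax approaches above can be sketched as simple checks applied at data-entry time. This is a hedged illustration only: the picklist contents, the function names, and the exact name pattern are assumptions made for the example, not rules taken from the PBS DLL Metadata.

```python
import re
from datetime import datetime

# Hypothetical picklist; a real controlled vocabulary would come from the
# metadata dictionary or from an outside authority.
MEDIA_TYPES = {"Video", "Audio", "Image", "Document", "Interactive"}

# Assumed pattern for: LastName, FirstName MiddleName[, Credentials]
NAME_PATTERN = re.compile(r"^[^,]+, [^,]+(, [^,]+)?$")


def valid_media_type(value):
    """True if the value is one of the prescribed picklist terms."""
    return value in MEDIA_TYPES


def valid_date(value):
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def valid_name(value):
    """True if the name follows the assumed comma-separated syntax."""
    return NAME_PATTERN.match(value) is not None


print(valid_date("2005-02-24"), valid_date("02/24/2005"))
print(valid_media_type("Video"), valid_media_type("video"))
print(valid_name("Smith, Jane Q., PhD"), valid_name("Jane Smith"))
```

Note that the picklist check is case-sensitive on purpose: accepting both "Video" and "video" would reintroduce the inconsistency a controlled vocabulary is meant to prevent.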

The PBS DLL Metadata utilizes Authorities, Controlled Vocabularies, and Structured Syntax wherever precision is needed and ambiguity is to be avoided.

For an excellent discussion on Controlled Vocabularies and Authorities, consult the MIC-- Moving Image Collections Cataloging and Metadata Portal for Standards and Tools at this web page: http://mic.imtc.gatech.edu/catalogers_portal/cat_cntrldVocab.htm

What is the Dublin Core (DCMI)?

The PBS DLL Metadata Application Profile folds in Dublin Core Metadata. Dublin Core (ISO 15836) is an international metadata standard for resource discovery (http://dublincore.org).

The Dublin Core Metadata Initiative (DCMI) is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. DCMI's activities include consensus-driven working groups, global workshops, conferences, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices.

The Dublin Core specifies a limited set of metadata elements intended to facilitate discovery of electronic resources. The Dublin Core has been in development since 1995 through a series of focused invitational workshops that gather experts from the library world, the networking and digital library research communities, and a variety of content specialties.

The Dublin Core Metadata Initiative is the body responsible for the ongoing maintenance of Dublin Core. The work of DCMI is done by contributors from many institutions in many countries. DCMI is a consensus-driven organization organized into working groups to address particular problems and tasks.

What is the PBCore?

The PBS DLL Metadata Application Profile folds in PBCore metadata (the Public Broadcasting Core of Metadata descriptions).
http://www.pbcore.org

The PBCore project started in January of 2002. During its first two phases of funding from the Corporation for Public Broadcasting, a team of individuals representing public broadcasting's key institutions and endeavors, along with subject matter experts, worked to:

  • Develop consensus regarding project objectives and timeline;
  • Recognize and codify the way public broadcasting's constituents use content and content information;
  • Examine relevant metadata standards in the media and library communities, to ascertain their applicability to public broadcasting content and its constituencies;
  • Make information about the PBCore Project available via numerous conference presentations and a project website;
  • Contribute and combine the substantial metadata work already performed at key institutions in public broadcasting (PBS, NPR, WGBH, University of Utah, MPR);
  • Form a preliminary consensus regarding a single set of metadata protocols - the Public Broadcasting Metadata Dictionary (PBCore), Version 1.0 (enhanced as v1.1 in January 2007, enhanced as v1.2 in early 2010).

In subsequent phases of the project, PBCore built advocacy for the metadata dictionary and its associated XML Schema Definition (XSD), created and conducted training in the use of PBCore (live and on-demand), and recommended and provided for the long-term sustainability and support of PBCore.

PBCore has successfully been employed in indexing and cataloging endeavors in a variety of industries, not just broadcasting, and can be found in use in many nations.

How do I review the metadata elements of PBS DLL?

This website, the PBS DLL Metadata User Guide, is a companion to the main PBS Digital Learning Library On-Board project site. Where the On-Board site focuses on participation in the project and content contributions, this Metadata User Guide documents the metadata dictionary developed for the PBS DLL. This documentation is designed to be the sole source for the metadata element definitions and the guidelines for their proper usage by content contributors to the PBS Digital Learning Library.

The Metadata User Guide provides both introductory and detailed information about PBS DLL Metadata. All pages are accessed by using the Table of Contents to the left as your main site navigation. The documentation is divided into these pages...

  • ABOUT METADATA
    • Introduction to Metadata
      • About the PBS DLL Metadata
      • FAQs on Metadata in general
      • Defining Elements: Attributes Used in our definitions
  • VIEW THE METADATA ELEMENTS
    • Overviews of All Elements
      • All-In-One Definitions Document
      • Diagram View of the Elements
      • Alphabetical Listing of the Elements
      • Elements Grouped by Obligation to Use
      • Elements Grouped by Related Categories
    • Individual Element Definitions
      • Select a Specific Element to View its Definition & Guidelines for Usage
    • Samples of Full Metadata Records
      • Video Assets
      • Audio Assets
      • Image Assets
      • Document Assets
      • Interactive Assets
      • Group Assets (multiple, related Video, Audio, Image, Document & Interactive media items)
    • Mappings to Other Metadata Standards
      • Index and Documentation for Various Mappings (including PBCore, Dublin Core, and others as needed)
    • Exchanging Metadata
      • Download and use the PBS DLL Metadata XSD: XML Schema Definition exchange document
      • About the OAI: Open Archives Initiative and metadata harvesting
    • Metadata Schema Changes
      • Version Control & Change Orders (ongoing notifications and documentation of changes made to the PBS Digital Learning Library Metadata Elements and Schema over time)

 

Metadata Exchanges and XML Schema Definitions (XSD)?

A metadata schema, along with the actual descriptions of media items that use the schema, needs to be presented in a logical, clearly expressed manner so that the information can be interpreted correctly. More importantly, using well-formed methods to express metadata schemas and descriptions allows different parties to share data; they are communicating using the same language and the same grammar.

A language that is often used to express well-formed data is XML, Extensible Markup Language. Unfortunately, unless the party offering and the party accepting the well-formed data are using a common grammar, information is likely to be mangled as it is interpreted and validated.

This situation is where an XML Schema (also called an XSD--XML Schema Definition) is used to define the grammar and validate the data being shared. Some have stated that an XML Schema functions as a blueprint for describing the structure of the XML language in a document. These blueprints supply the...

  • Sequence in which elements appear in an XML document
  • Interrelationships between different elements (parent-child associations or nested relationships)
  • Types of data that are used to express elements and attributes (text string, number, date, timestamp, etc.)
  • References to Authorities and Controlled Vocabularies
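
To make the kinds of rules in this list concrete, here is a rough sketch that hand-checks element sequence and data types in a small XML record. It is not an XSD validator (Python's standard library does not validate against XML Schemas), and the element names and the blueprint table are hypothetical stand-ins, not part of the PBS DLL or PBCore schemas.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical blueprint: child elements in their required order, each paired
# with a converter that raises ValueError when the text has the wrong type.
BLUEPRINT = [
    ("title", str),
    ("dateCreated", date.fromisoformat),
    ("durationSeconds", int),
]


def check_record(xml_text):
    """Return a list of problems found; an empty list means the record passes."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    problems = []
    found = [child.tag for child in root]
    expected = [tag for tag, _ in BLUEPRINT]
    if found != expected:
        problems.append(f"expected sequence {expected}, found {found}")
        return problems
    for child, (tag, convert) in zip(root, BLUEPRINT):
        try:
            convert(child.text or "")
        except ValueError:
            problems.append(f"{tag}: bad value {child.text!r}")
    return problems


good = ("<record><title>Sample</title><dateCreated>2005-02-24</dateCreated>"
        "<durationSeconds>90</durationSeconds></record>")
bad_type = ("<record><title>Sample</title><dateCreated>24/02/2005</dateCreated>"
            "<durationSeconds>90</durationSeconds></record>")
bad_order = ("<record><dateCreated>2005-02-24</dateCreated><title>Sample</title>"
             "<durationSeconds>90</durationSeconds></record>")

for sample in (good, bad_type, bad_order):
    print(check_record(sample))
```

A real exchange would express these rules declaratively in the XSD itself and rely on a schema-aware validator, so that both parties test their data against the same published blueprint.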

DTDs, or Document Type Definitions, are an alternative method for describing the blueprint. DTDs have been around longer than XML Schemas, and are very widely used. However, they have some limitations: a DTD is written in a non-XML syntax, supports only a limited set of data types, cannot identify namespaces, and offers no support for extensibility or inheritance. XML Schemas do not have these limitations, and they also allow users to craft their own data types.

Typically, complex data structures, with multiple data types, require the use of an XML Schema rather than a DTD.

If your metadata cataloging system is never intended to share data or descriptions with other systems, then any variations and customizations you make to a metadata schema are confined to your own instance. If you need to export your data to another information system, then metadata descriptions for your media items and assets must be transformed into a standard framework that other information systems can interpret correctly.

We have created what is called an XML Schema Definition document (XSD) for PBCore v1.1. It is a standard framework upon which data exported from one information system can be transformed into PBCore compliant structures. It is a standard framework with which data can be interpreted in a known fashion by another information system, and imported into its metadata structures. Below is an illustration of the process.

PBS DLL is offering its metadata blueprint via an XML Schema Definition. The download link can be found by using the entry in the left-hand Table of Contents called Exchanging Metadata > XSD (XML Schema Definition).

For further information about XSDs, use the links listed below for Primers, XML Schema Definitions and Specifications as provided by W3C.

Do the PBS DLL elements have a Hierarchical Tree Structure?

The PBS Digital Learning Library Metadata Dictionary is...

  • a core set of terms and descriptors (elements)...
  • used to create information (metadata)...
  • that categorizes or describes...
  • media items (sometimes called assets or resources).

As a simple dictionary of elements, no "hierarchy" would be implied. Elements are presented in a "flat" arrangement as a listing of descriptors with specific attributes from which you can pick and choose and apply in whatever cataloging or information/asset/content management system you have.

Beyond a metadata dictionary, actual metadata elements are organized into a framework. According to the Wikipedia article entitled Data Modeling...

Data dictionaries are usually separate from data models since data models usually include complex relationships between data elements.

When data modeling, we are structuring and organizing data. These data structures are then typically implemented in a database management system. In addition to defining and organizing the data, data modeling will impose (implicitly or explicitly) constraints or limitations on the data placed within the structure.

A data model represents classes of entities (kinds of things) about which a company wishes to hold information, the attributes of that information, and relationships among those entities and (often implicit) relationships among those attributes.


Some metadata models or schemas are based on a logical, hierarchical arrangement of their metadata elements, not only in the way they are conceptually presented, but also in how they are applied in actual metadata and asset management systems. For example, the IEEE 1484.12.1-2002 Standard for Learning Object Metadata is hierarchical. At the base of their hierarchy is a "root" element. The root element contains many sub-elements. If a sub-element itself contains additional sub-elements it is called a "branch." Sub-elements that do not contain any sub-elements are called "leaves." This entire hierarchical model is called a "tree structure" and is witnessed in this early rendition of the Learning Object Metadata Elements...

IEEE-LOM Metadata Tree Structure

The PBS DLL Metadata is not organized hierarchically, either conceptually or as a data model. It can be considered for all intents and purposes to be a flat arrangement of metadata elements, which simplifies data transformation chores and metadata exchanges.
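
The difference between a tree structure and a flat arrangement can be illustrated with a short sketch that walks a nested record from its root, through its branches, down to its leaves, and emits a flat listing. The nested record and its element names are illustrative assumptions only; they are not an actual IEEE-LOM or PBS DLL record.

```python
def flatten(node, prefix=""):
    """Walk a nested dict and return {dotted.path: leaf value}."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):   # a branch: descend further
            flat.update(flatten(value, path))
        else:                         # a leaf: record its value
            flat[path] = value
    return flat


# Hypothetical hierarchical record, invented for this example.
tree = {
    "root": {
        "general": {"title": "Sample Lesson", "language": "en"},
        "technical": {"format": "video/mp4"},
    }
}

flat_record = flatten(tree)
for path, value in flat_record.items():
    print(path, "=", value)
```

Going from a tree to a flat listing is mechanical, as shown here. Going the other way requires knowing where each flat element belongs in the hierarchy, which is one reason flat arrangements tend to be easier to transform and exchange.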

Do PBS DLL elements map or crosswalk to other metadata schemas?

What happens when one community desires to share metadata information entered in its systems with another community that maintains its own metadata standard? In a perfect world, each metadata element from the "source" metadata standard could be paired with a similar metadata element in the "target" metadata standard, and the data would be transferred.

Unfortunately, such a pure one-to-one pairing or "harmonization" is rare. Although each standard may use a common method to express the properties of its metadata elements, the actual data held within the element may not "crosswalk" or "map" perfectly.
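
In its simplest form, a crosswalk is a lookup table that pairs source elements with target elements. The sketch below assumes made-up element names on both sides and reports any source element that has no companion in the target, which is the situation described above where a pure one-to-one pairing breaks down.

```python
# Hypothetical source-to-target element pairs; None marks a source element
# with no companion in the target standard.
CROSSWALK = {
    "title": "Title",
    "description": "Abstract",
    "gradeLevel": None,
}


def crosswalk(record, mapping=CROSSWALK):
    """Return (mapped record, names of source elements left behind)."""
    mapped, dropped = {}, []
    for name, value in record.items():
        target = mapping.get(name)
        if target is None:        # unmapped or unknown element
            dropped.append(name)
        else:
            mapped[target] = value
    return mapped, dropped


source_record = {"title": "Sample Lesson", "gradeLevel": "5", "mood": "upbeat"}
mapped, dropped = crosswalk(source_record)
print(mapped)
print(dropped)
```

Returning the dropped element names alongside the mapped record keeps data loss visible, so a cataloger can decide whether to extend the mapping or accept the gap.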

The following quote was extracted from an excellent article entitled "Issues in Crosswalking Content Metadata Standards." It was originally published through NISO, the National Information Standards Organization, and authored by Margaret St. Pierre of Blue Angel Technologies, Inc. and William P. LaPlant, Jr., of the U.S. Bureau of the Census Statistical Research Division.

A crosswalk is a specification for mapping one metadata standard to another. Crosswalks provide the ability to make the contents of elements defined in one metadata standard available to communities using related metadata standards. Unfortunately, the specification of a crosswalk is a difficult and error-prone task requiring in-depth knowledge and specialized expertise in the associated metadata standards. Obtaining the expertise to develop a crosswalk is particularly problematic because the metadata standards themselves are often developed independently, and specified differently using specialized terminology, methods and processes. Furthermore, maintaining the crosswalk as the metadata standards change becomes even more problematic due to the need to sustain a historical perspective and ongoing expertise in the associated standards.

When harmonizing metadata elements from different standards, there are several points of intersection where collisions, rather than smooth merging, may occur.

Matching Semantic Definitions
An element in the source standard may not find a companion element in the target standard because the definition, semantics, or meaning for elements are different. With such a mismatch, a descriptor may not translate well.

Matching Element-to-Element Relationships
Suppose the source standard uses separate metadata elements to identify the (1) Last name of a person, (2) First name, (3) Middle name, and (4) Credentials for an individual. What if the target standard only employs a single element to contain all of a person's names, prefixes and suffixes? How do the "many" elements of the source map to the "one" element in the target? There is a "many-to-one" mismatch. Likewise, there may exist a "one-to-many" element mismatch between the source and target standards. Furthermore, one standard may contain extra elements and descriptors that cannot even be paired with the other system.
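
The many-to-one and one-to-many cases can be sketched with the name example above. The function names and the returned field names are hypothetical, and the split is only a best-effort reversal that assumes the "LastName, FirstName MiddleName, Credentials" syntax was followed consistently.

```python
def join_name(last, first, middle="", credentials=""):
    """Many-to-one: build 'LastName, FirstName MiddleName, Credentials'."""
    given = " ".join(part for part in (first, middle) if part)
    name = f"{last}, {given}"
    return f"{name}, {credentials}" if credentials else name


def split_name(full_name):
    """One-to-many: a best-effort split of the format produced above."""
    pieces = [p.strip() for p in full_name.split(",")]
    given = pieces[1].split() if len(pieces) > 1 else []
    return {
        "last": pieces[0],
        "first": given[0] if given else "",
        "middle": " ".join(given[1:]),
        "credentials": pieces[2] if len(pieces) > 2 else "",
    }


combined = join_name("Smith", "Jane", "Q.", "PhD")
print(combined)
print(split_name(combined))
```

Joining is lossless here, but splitting is not guaranteed to be: a target value entered as "Jane Q. Smith PhD" carries no commas, so the parts could not be recovered reliably without extra rules.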

Matching & Converting Content
The properties for a metadata element may define or restrict its contents by...

  • data types (e.g., text, numeric, string, date, etc.),
  • ranges of values,
  • data refinements derived from the use of various authorities or controlled vocabularies,
  • specific syntaxes for the presentation of the data (e.g., keywords separated by semicolons),
  • repeatability of the element in order to express multiple values or descriptions, or
  • mandatory or optional usage of the element when entering values.

Even though a metadata element from a source standard may semantically match an element in a target standard, the rules by which the actual data is entered in the element may differ between the systems. The mismatch may be resolved by some form of conversion or data reformatting. Consistency in how data was originally entered is key to formulating conversion utilities or crosswalks.
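
Two small conversion utilities illustrate the kind of reformatting described above. The source conventions assumed here (dates as Month/Day/Year, keywords separated by commas) and the target conventions (YYYY-MM-DD, semicolon-separated keywords) are examples chosen for the sketch, not documented requirements of any particular standard.

```python
from datetime import datetime


def convert_date(value, source_format="%m/%d/%Y"):
    """Reformat a date from the source syntax to YYYY-MM-DD."""
    return datetime.strptime(value, source_format).strftime("%Y-%m-%d")


def convert_keywords(value, source_sep=",", target_sep="; "):
    """Re-delimit a keyword list, trimming stray whitespace and blanks."""
    terms = [t.strip() for t in value.split(source_sep)]
    return target_sep.join(t for t in terms if t)


print(convert_date("02/24/2005"))
print(convert_keywords("science, biology ,cells"))
```

Both functions depend on the source data having been entered consistently. `convert_date` raises a `ValueError` on a value that does not match the assumed source format, which is usually preferable to silently passing a misread date along to the target system.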

Matching Single vs. Multiple or Compound Data Objects
Many asset management systems and databases allow the relationships between several data records/media items to be expressed. For example, a video program might have a transcript (text document), brochure (pdf), DVD (non-digital medium for order fulfillment), and other items associated with it. If an end user searches for the video program, the search results report the related media items as well. These associated/related items are often housed as a "multiple" or "compound" data object. Many databases actually refer to them as "container fields." If the source and target metadata system use different methods to identify and report multiple or compound objects, then a mismatch in mapping occurs.

Matching Hierarchical and Flat Metadata Standards
Some metadata standards, like IEEE-LOM (Learning Object Metadata), use a very hierarchical structure to organize the relationships between metadata elements. These relationships can often become quite complex. The PBS DLL Metadata Dictionary and Dublin Core are both flat in nature, with no implied or expressed hierarchy. Trying to pair metadata elements from a hierarchical and a flat system can be troublesome.

So, do PBS DLL elements map or crosswalk to other metadata standards? Yes. This website documents a variety of mappings to and from PBS DLL Metadata, including to and from the Public Broadcasting Metadata Dictionary, PBCore. An index to the various mappings can be found by using the entry in the left-hand Table of Contents called Mappings to Other Standards > Index to Mappings.