A quick introduction to metadata for people working in content-related disciplines
This is another article in which I write about and reflect upon topics that I learned about while studying Content Strategy at FH Joanneum Graz.
While my colleagues are pondering faltering client relationships, how to critique your own work or how to set boundaries at work for better mental health, my approach is less original but, I hope, no less informative.
It is a personal recap: an attempt to finally understand what I have learned by summing it up in my own words.
This article is about metadata, so let’s jump right into the topic:
The basics
Let’s go all the way back: what is data?
Data is the raw material all things digital are built out of.
Without any further context, it is an aggregation of zeros and ones held by a container, the file. The file is encoded in a certain format that makes it readable or viewable by certain applications, and it has a distinct name.
This data file is potentially a piece of content, but not yet, because we don’t know enough about it to understand what it is. What is missing is the context.
With context, data turns into content
Let’s take a media asset, for example an image file. From the format, e.g. PNG, we know it is a picture. But that’s about it. What we don’t know is:
When was it created? By whom? Who holds the copyright to it? What are its dimensions? What does it show?
It is metadata that puts data into context. Done right and thoroughly, it attaches a label to the file that describes the data it contains as precisely as possible.
This makes the file discoverable in searches, processable by CMSs and usable for multiple purposes. If the metadata is in the right format, it can also provide context to AI and machine learning systems.
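Some of the questions above can be answered from the file itself, others cannot. Here is a minimal sketch, assuming a PNG file: the dimensions are technical metadata stored in the file's header, while creator and copyright have to be supplied by a person or another system. The function name `png_dimensions` and the sample values are my own assumptions for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) as stored in the IHDR chunk of PNG data."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # After the 8-byte signature: chunk length, chunk type, width, height.
    _length, chunk_type, width, height = struct.unpack(">I4sII", data[8:24])
    if chunk_type != b"IHDR":
        raise ValueError("missing IHDR chunk")
    return width, height


# Just enough of a PNG header for the demonstration (not a complete image).
header = PNG_SIGNATURE + struct.pack(">I4sII", 13, b"IHDR", 640, 480)

asset = {
    "dimensions": png_dimensions(header),  # read from the file itself
    "creator": None,    # unknown until someone records it
    "copyright": None,  # unknown until someone records it
}
print(asset["dimensions"])  # (640, 480)
```

To inspect a real file, you would pass in the first bytes of that file, for example `open(path, "rb").read(24)`.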
Metadata types
There are many kinds of metadata that describe different aspects of a file. Here are some of them:
· descriptive: information about the author(s), time and date of creation/modification, keywords (think of tags), …
· administrative: technical (e.g. decoding) and legal (intellectual property) information
· provenance: info on the creation and modifications throughout the file’s lifetime
· structural: e.g. the information on how many pages and chapters a book has
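To make the four types more tangible, here is how they might look for a single image. This is only a sketch: the record layout, the field names and all values are invented for illustration and do not follow any particular standard.

```python
# Hypothetical metadata record for one image; all values are invented.
image_metadata = {
    "descriptive": {
        "creator": "Jane Doe",
        "created": "2023-05-01",
        "keywords": ["Graz", "architecture"],
    },
    "administrative": {
        "file_format": "image/png",
        "rights": "CC BY 4.0",
    },
    "provenance": [
        {"event": "created", "date": "2023-05-01"},
        {"event": "cropped", "date": "2023-05-03"},
    ],
    "structural": {"part_of": "gallery-2023", "position": 4},
}

METADATA_TYPES = ("descriptive", "administrative", "provenance", "structural")


def missing_types(record: dict) -> list[str]:
    """List the metadata types for which the record holds no information."""
    return [name for name in METADATA_TYPES if not record.get(name)]


print(missing_types(image_metadata))  # []
print(missing_types({"descriptive": {"creator": "Jane Doe"}}))
# ['administrative', 'provenance', 'structural']
```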
Metadata structure
The structure of metadata consists of elements that ask a question (e.g. when was the file created?), values that answer it (e.g. with a date) and attributes that further specify the information. Attributes can, for example, determine the required data type, the meaning of the element or the obligation to use it.
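A small sketch may help to see how elements, values and attributes work together. The `Element` class, the three sample elements and the `validate` function are my own assumptions, not part of any standard: each element carries attributes (data type, meaning, obligation), and a record supplies the values.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Element:
    name: str            # the element itself
    data_type: type      # attribute: required data type of the value
    meaning: str         # attribute: the question the element asks
    required: bool = False  # attribute: obligation to use the element


SCHEMA = [
    Element("created", date, "When was the file created?", required=True),
    Element("creator", str, "Who created the file?", required=True),
    Element("keywords", list, "Which topics does it cover?"),
]


def validate(record: dict, schema: list[Element]) -> list[str]:
    """Check the values in a record against the attributes of each element."""
    problems = []
    for element in schema:
        if element.name not in record:
            if element.required:
                problems.append(f"missing required element: {element.name}")
            continue
        if not isinstance(record[element.name], element.data_type):
            expected = element.data_type.__name__
            problems.append(f"wrong type for {element.name}: expected {expected}")
    return problems


record = {"created": date(2023, 5, 1), "keywords": "graz"}
print(validate(record, SCHEMA))
# ['missing required element: creator', 'wrong type for keywords: expected list']
```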
Schemas, standards and interoperability
A metadata schema is a defined set of elements designed to describe a specific type of data. Once a schema is officially published by an organization, it becomes a metadata standard. There are many of these standards available, one of the most prominent being Dublin Core, a set of 15 basic and common elements that describe resources. There are standards for data structure, data values, data contents and data exchange. Many topical standards have been developed; you can find some of them in this extensive list.
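As an illustration of what a standard gives you, here is a minimal sketch that writes a record using the 15 Dublin Core elements as XML. The element names and the namespace come from the Dublin Core Metadata Element Set; the function name, the `metadata` wrapper element and the sample values are my own assumptions.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def to_dublin_core_xml(record: dict) -> str:
    """Serialize a flat record as XML, accepting Dublin Core elements only."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


print(to_dublin_core_xml({
    "title": "A quick introduction to metadata",
    "creator": "Jane Doe",
    "date": "2023-05-01",
    "format": "text/html",
}))
```

Because the element names are fixed by the standard, any system that knows Dublin Core can interpret `dc:creator` without asking what the author meant by it.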
You are free to use and layer different standards and schemas as you like, creating a custom set of elements that fully describes the various types of data assets within your domain. For example, the PRISM Metadata Standard for publishing incorporates Dublin Core elements and combines them with its own specific elements.
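Layering can be pictured as prefixing each element with the schema it comes from. The following sketch assumes a simple `prefix:name` convention; the `prism:` names are given for illustration and should be checked against the PRISM specification, and the `acme:` prefix is an invented in-house schema.

```python
from collections import defaultdict

# One record that mixes elements from three schemas (values are invented).
record = {
    "dc:title": "A quick introduction to metadata",
    "dc:creator": "Jane Doe",
    "prism:publicationName": "Content Strategy Review",
    "prism:volume": "12",
    "acme:campaign": "spring-2024",  # hypothetical in-house element
}


def group_by_schema(record: dict) -> dict:
    """Split a layered record into one dictionary per schema prefix."""
    grouped = defaultdict(dict)
    for key, value in record.items():
        prefix, _, name = key.partition(":")
        grouped[prefix][name] = value
    return dict(grouped)


layers = group_by_schema(record)
print(sorted(layers))  # ['acme', 'dc', 'prism']
print(layers["dc"])
# {'title': 'A quick introduction to metadata', 'creator': 'Jane Doe'}
```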
Schema.org is an extensive standard founded by Google, Microsoft (Bing), Yahoo and Yandex to make information on websites recognizable to search engines and part of the semantic web.
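In practice, Schema.org markup is often embedded in a page as JSON-LD. Here is a minimal sketch that builds such a snippet for an article: the `Article` type and the `headline`, `author`, `datePublished` and `keywords` properties come from Schema.org, while the function name and the sample values are assumptions of mine.

```python
import json


def article_json_ld(headline: str, author_name: str,
                    date_published: str, keywords: list[str]) -> str:
    """Build a JSON-LD script tag that describes an article."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
        "keywords": ", ".join(keywords),
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(article_json_ld(
    "A quick introduction to metadata",
    "Jane Doe",
    "2023-05-01",
    ["metadata", "content strategy"],
))
```

The resulting tag would go into the HTML of the page, where search engines can read it alongside the visible content.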
Interoperability is the concept of marking up data with metadata so that information can be automatically exchanged and interpreted across diverse systems and organizations. Schema.org is an example of this in the web domain, but any large organization with multiple data silos can benefit greatly from connecting its data with the help of semantic metadata that shows the connections between data assets rather than merely describing them as single entities.
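One common way to connect silos is a so-called crosswalk: a mapping from each system's own field names to a shared set of elements. The sketch below assumes two hypothetical systems, a CMS and a digital asset manager, whose field names are invented, and maps both onto Dublin Core element names.

```python
# Hypothetical field names of two silos, mapped to Dublin Core elements.
CMS_TO_DC = {"headline": "title", "byline": "creator", "published_on": "date"}
DAM_TO_DC = {"asset_name": "title", "photographer": "creator", "shot_at": "date"}


def crosswalk(record: dict, mapping: dict) -> dict:
    """Translate a record into the shared vocabulary, dropping unmapped fields."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


article = {"headline": "Graz by night", "byline": "Jane Doe", "published_on": "2023-05-04"}
photo = {"asset_name": "Clock tower", "photographer": "Jane Doe", "shot_at": "2023-05-01"}

shared = [crosswalk(article, CMS_TO_DC), crosswalk(photo, DAM_TO_DC)]

# Once both records use the same elements, they can be queried together.
by_jane = [item["title"] for item in shared if item["creator"] == "Jane Doe"]
print(by_jane)  # ['Graz by night', 'Clock tower']
```

Neither system has to change its internal field names; only the mapping needs to be maintained.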