Customized Video Playback using a
Standard Metadata Format

Michael BUSH

Associate Director

Center for Language Studies




Professor of Linguistics

Department of Linguistics


Brigham Young University
Provo, Utah 84602 (USA)

Making Video Access as Tractable as Access to Text

The mantra of the Information Age?

Anything, anytime, anywhere!


The new mantra of the Information Age:

“Nothing, nowhere, never, unless it is timely, important, amusing,
relevant, or capable of engaging my imagination!”

(Nicholas Negroponte, formerly of the MIT Media Lab, now with One Laptop per Child)

Rupert Murdoch has recently put his
lieutenants at News Corporation on notice:

“The days of top-down, force-fed, one-size-fits-all media are over. The new imperative is to deliver precisely what audiences want, when and where they want it.”

(Quoted by Spencer Reiss in Wired, Issue 14.07, July 2006)

Requirements for finding the video we need, when we need it:

An approach in the spirit of the Text Encoding Initiative but that works for all time-based media.

A standards-based schema for the adequate description of all relevant aspects of time-based media assets.

A storage and retrieval system that can function (a) on a “global” level to find the right asset at the right time and (b) on a “local” level to enable the playback (or avoidance of playback!) of the specific segments of interest.

A solution that serves educators and learners as well as the media industry will serve general consumers!

Video Asset Descriptions (VAD)

Describing Video Assets

Full MPEG-7 (ISO/IEC 15938): Too large!

IEEE Learning Object Metadata (LOM): Too small!

Video Asset Description using the MPEG-7 Core Description Profile (in MPEG-7 Part 9): Just right!


Development of MPEG-7 Parts 9 & 11

Three subsets of the very large MPEG-7 specification: “profiles” (2002-2005)

Core Description Profile (CDP)

Japanese National Television
Brigham Young University

Two other profiles

Simple Metadata Profile (SMP) &

User Description Profile (UDP)

Customized Video Playback (CVP)

What is customized video playback?

Customized video playback is the playing of a sequence of video clips under the control of the user (teacher, learner, video viewer, etc.).

Clips can have associated annotations
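A minimal sketch of the idea, assuming a clip is just a start/end span plus optional annotations. The `Clip` class and `playback_plan` function are hypothetical names for illustration, not part of EFR or MPEG-7, and the time values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    """One playable span of a video asset (hypothetical structure)."""
    start: int  # start position, in whatever unit the player uses
    end: int    # end position, same unit
    annotations: list = field(default_factory=list)


def playback_plan(clips, selected):
    """Return the (start, end) spans to play, in the order the user chose."""
    return [(clips[i].start, clips[i].end) for i in selected]


# Made-up sample clips.
clips = [
    Clip(0, 100, ["opening scene"]),
    Clip(100, 250),
    Clip(250, 400, ["key dialogue"]),
]

# The user chooses clip 2 first, then clip 0, and skips clip 1 entirely.
plan = playback_plan(clips, [2, 0])
print(plan)  # [(250, 400), (0, 100)]
```

The player only needs the list of spans; the annotations travel with each clip and can be shown alongside it.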


Applications of Customized Video Playback

Electronic Film Review (EFR)

Avoids copyright restrictions

Film Studies

Use EFR to develop materials about important films for distribution to other universities for their film studies programs.

Digital libraries

Information about parts of a video asset can be stored in a VAD and searched to find a clip about a particular topic.

Annotated Films for Language Learning

Al Aragouz: 1972 Egyptian Film with Omar Sharif.

Initial interactive application was developed in 1994.

Annotations were converted to Unicode in 2004.

A DVD with subtitles is under development (2006).

An EFR has been created from the original, non-standard film annotations.
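The digital-library application listed above (searching stored descriptions to find a clip about a particular topic) could look roughly like this. It is only a sketch: it assumes the segment descriptions have already been extracted from a VAD into a dictionary, and the segment ids and texts are invented sample data.

```python
def find_clips(descriptions, topic):
    """Return ids of segments whose description text mentions the topic."""
    needle = topic.casefold()
    return [seg_id for seg_id, text in descriptions.items()
            if needle in text.casefold()]


# Invented sample data: segment id -> free-text description.
descriptions = {
    "seg1": "Opening credits and street scene",
    "seg2": "Puppet show in the village square",
    "seg3": "Family dinner; discussion of the puppet show",
}

matches = find_clips(descriptions, "puppet")
print(matches)  # ['seg2', 'seg3']
```

A real system would search structured fields rather than plain substrings, but the principle is the same: the description, not the video itself, is what gets searched.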

Examples of Data Transformation for
Customized Video Playback

Al Aragouz (1994 to 2004)

Converting annotation file (with proprietary Arabic fonts) to Unicode


• آبا ... آبا ... آبا ...

• بسم الله الرحمن الرحيم. بسم الله الرحمن الرحيم. مالك يا بهلول؟

• أمة يابا!


• Dad! Dad! Dad!

• In the name of God, the Merciful, the Compassionate. In the name of God, the Merciful, the Compassionate. What's the matter, Bahloul?

• Mommy, Dad!


The phrase بسم الله الرحمن الرحيم bism illaah ir-raHmaan ir-raHiim is a traditional Islamic invocational formula used in prayer and when reciting the Qur'an, as well as at the onset of a meal or when concluding a contract. It is also the opening phrase of the فاتحة faatiHa, the first سورة suura or chapter of the Qur'an. The phrase is also known as the بسملة basmala.

مالك maa-lak : What's the matter (with you)? Literally: what (is) to you.
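Converting a legacy font encoding to Unicode is, at its core, a table lookup from each legacy byte to a Unicode character. The sketch below illustrates this under loud assumptions: the byte values in `LEGACY_TO_UNICODE` are made up for illustration and do not describe the proprietary fonts used in the 1994 application, and real conversions may also have to deal with contextual letter forms and text direction.

```python
# Hypothetical mapping from legacy font byte values to Unicode characters.
# These byte values are illustrative only, not the actual 1994 encoding.
LEGACY_TO_UNICODE = {
    0xC2: "\u0622",  # ARABIC LETTER ALEF WITH MADDA ABOVE
    0xC8: "\u0628",  # ARABIC LETTER BEH
    0xC7: "\u0627",  # ARABIC LETTER ALEF
}


def to_unicode(legacy: bytes) -> str:
    """Map each legacy byte to Unicode; unmapped bytes pass through as-is."""
    return "".join(LEGACY_TO_UNICODE.get(b, chr(b)) for b in legacy)


# Three mapped letters followed by a space and an ASCII ellipsis.
sample = bytes([0xC2, 0xC8, 0xC7, 0x20, 0x2E, 0x2E, 0x2E])
converted = to_unicode(sample)
print(converted)
```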

Examples of Data Transformation for
Customized Video Playback

Al Aragouz EFR (2006)

<?xml version="1.0" encoding="utf-8"?>

<!DOCTYPE efr SYSTEM "D:\Work\Bush_Mike\aragouz\AragouzTimeCodes\InsertTimeCodes\InsertTimeCodes\bin\Debug\EFRv05b.dtd"[]>

<efr version="1.2" dvduniqueid="f5211d6626c39d72" id="efraid_autogen_Al_Aragouz_20060328T053332" name="al-Aragouz" lang="en" xmlns="">


<descrip type="note">(deprecated)</descrip>



<clip level="1" num="C00001" start="4580" end="7996" name="0" id="C1">

<clip level="2" num="C00002" start="4580" end="4911" name="0" id="C1-S1">

<descrip type="transcrip">آبا ... آبا ... آبا ...::بسم الله الرحمن الرحيم. بسم الله الرحمن الرحيم. مالك يا بهلول؟::أمة يابا!</descrip>

<descrip type="translat">Dad! Dad! Dad!::In the name of God, the Merciful, the Compassionate. In the name of God, the Merciful, the Compassionate. What's the matter, Bahloul?::Mommy, Dad!</descrip>

<descrip type="schema">The phrase بسم الله الرحمن الرحيم bism illaah ir-raHmaan ir-raHiim is a traditional Islamic invocational formula used in prayer and when reciting the Qur'an, as well as at the onset of a meal or when concluding a contract. It is also the opening phrase of the فاتحة faatiHa, the first سورة suura or chapter of the Qur'an. The phrase is also known as the بسملة basmala. مالك maa-lak : What's the matter (with you)? Literally: what (is) to you.</descrip>
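In the fragment above, the `transcrip` and `translat` descriptions appear to use `::` to separate the individual lines of dialogue. Assuming that reading is correct, a player could pair each Arabic line with its translation as sketched below. The sample is an abbreviated, well-formed version of the fragment: the closing tags are added, the DTD reference is dropped, and the middle line of dialogue is omitted. The function name `aligned_lines` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated, well-formed version of the EFR fragment (closing tags added).
SAMPLE = """<efr version="1.2" name="al-Aragouz" lang="en">
  <clip level="1" num="C00001" start="4580" end="7996" name="0" id="C1">
    <clip level="2" num="C00002" start="4580" end="4911" name="0" id="C1-S1">
      <descrip type="transcrip">آبا ... آبا ... آبا ...::أمة يابا!</descrip>
      <descrip type="translat">Dad! Dad! Dad!::Mommy, Dad!</descrip>
    </clip>
  </clip>
</efr>"""


def aligned_lines(clip):
    """Pair transcript lines with translations, assuming '::' separates lines."""
    texts = {d.get("type"): (d.text or "") for d in clip.findall("descrip")}
    source = texts.get("transcrip", "").split("::")
    target = texts.get("translat", "").split("::")
    return list(zip(source, target))


root = ET.fromstring(SAMPLE)
inner = root.find(".//clip[@id='C1-S1']")

# start/end are kept as plain integers; the fragment does not state their unit.
span = (int(inner.get("start")), int(inner.get("end")))
pairs = aligned_lines(inner)

for arabic, english in pairs:
    print(arabic, "|", english)
```

Because `zip` stops at the shorter list, a transcript and translation with different numbers of lines would be silently truncated; a stricter version could compare the lengths first.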




Examples of Data Transformation for
Customized Video Playback


<?xml version="1.0" encoding="UTF-8"?>

<Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001" xmlns:xsi="" xsi:schemaLocation="urn:mpeg:mpeg7:schema:2001 CDPschemaFromMP7P11.xsd">

<!-- Mpeg7 is the root element of every MPEG-7 document instance,

and a document instance that conforms to an MPEG-7 profile, such as CDP,

must also validate against the master MPEG-7 schema -->

<DescriptionProfile profileAndLevelIndication="urn:mpeg:mpeg7:profiles:2004:CDP"/>




<!-- DescriptionMetadata is meta-metadata, that is, data about the MPEG-7 description,

rather than about the video asset in question -->

<Description xsi:type="ContentEntityType">

<MultimediaContent xsi:type="AudioVisualType">


[here we identify the video asset as a DVD and provide its unique ID]




<!-- After the DescriptionMetadata element, a CDP file consists of a sequence of Description elements. -->

<Description xsi:type="ContentEntityType">

<MultimediaContent xsi:type="VideoType">

<Video id="MainTitle">

[here we segment the video asset hierarchically]




