<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>Getting a Handle on Systems Metadata for the Long Haul &#8212; v2.1.0-beta</title>
    
    <link rel="stylesheet" href="../_static/dataone.css" type="text/css" />
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '../',
        VERSION:     '2.1.0-beta',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true,
        SOURCELINK_SUFFIX: '.txt'
      };
    </script>
    <script type="text/javascript" src="../_static/mathjax_pre.js"></script>
    <script type="text/javascript" src="../_static/jquery.js"></script>
    <script type="text/javascript" src="../_static/underscore.js"></script>
    <script type="text/javascript" src="../_static/doctools.js"></script>
    <script type="text/javascript" src="//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML"></script>
    <script type="text/javascript" src="../_static/sidebar.js"></script>
    <link rel="author" title="About these documents" href="../about.html" />
    <link rel="index" title="Index" href="../genindex.html" />
    <link rel="search" title="Search" href="../search.html" />
    <link rel="next" title="Natural History of System Metadata" href="SysmetaLifecycle.html" />
    <link rel="prev" title="System Metadata" href="SystemMetadata.html" />
   
  
  <link media="only screen and (max-device-width: 480px)" href="../_static/small_dataone.css" type= "text/css" rel="stylesheet" />

  </head>
  <body role="document">
  
    <div class="version_notice">
      <p>
      <span class='bold'>Warning:</span> These documents are under active 
      development and subject to change (version 2.1.0-beta).<br />
      The latest release documents are at:
      <a href="https://purl.dataone.org/architecture">https://purl.dataone.org/architecture</a>
      </p>
    </div>

    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="../py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="SysmetaLifecycle.html" title="Natural History of System Metadata"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="SystemMetadata.html" title="System Metadata"
             accesskey="P">previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="../index.html"></a> &#187;</li>
          <li class="nav-item nav-item-1"><a href="index.html" accesskey="U">&lt;no title&gt;</a> &#187;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
            
  <div class="section" id="getting-a-handle-on-systems-metadata-for-the-long-haul">
<h1>Getting a Handle on Systems Metadata for the Long Haul<a class="headerlink" href="#getting-a-handle-on-systems-metadata-for-the-long-haul" title="Permalink to this headline">¶</a></h1>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Revisions:</th><td class="field-body"><table border="1" class="first last docutils">
<colgroup>
<col width="12%" />
<col width="88%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Date</th>
<th class="head">Comment</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>20100416</td>
<td>(Sandusky) Additional text; discussions of PREMIS, BagIt, ORE</td>
</tr>
<tr class="row-odd"><td>20100402</td>
<td>(Allen, Sandusky) Additional text</td>
</tr>
<tr class="row-even"><td>20100326</td>
<td>(Allen) Added more text and structure</td>
</tr>
<tr class="row-odd"><td>20100312</td>
<td>(Allen) First draft</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<div class="section" id="introduction">
<h2>Introduction<a class="headerlink" href="#introduction" title="Permalink to this headline">¶</a></h2>
<p>The DataONE systems metadata is critical to ensuring that science data, science
metadata, and other digital objects stored in DataONE are discoverable,
accessible, auditable, verifiable, and are associated with meaningful related
digital objects. Digital objects in DataONE must also be viable for the long
term - for many decades - and so the system metadata must also include
provenance information.</p>
<p>Due primarily to the project deliverable schedule, the current DataONE system
metadata definition (DataONE, 2010b) is focused on the essential metadata values
that must be available to support the earliest versions of DataONE. So far,
relatively little attention has been paid to ensuring that system metadata
contains appropriate attributes for the long haul.</p>
<p>This document describes some of the results of a practicum project carried out
by Elizabeth (Betsy) Allen in Spring 2010. The PREMIS Data Dictionary for
Preservation Metadata was used as a standard against which the explicit and
implicit requirements of DataONE would be measured: &#8220;The PREMIS Data Dictionary
defines &#8216;preservation metadata&#8217; as the information a repository uses to support
the digital preservation process&#8221; (PREMIS Editorial Committee, 2008, p.3).
PREMIS is focused on the &#8220;viability, renderability, understandability,
authenticity, and identity&#8221; of digital objects &#8220;in a preservation context&#8221;
(i.e., DataONE), and pays particular attention to &#8220;digital provenance (the
history of an object) and to the documentation of relationships, especially
relationships among different objects within the preservation repository&#8221; (p.3).</p>
<p>The PREMIS Data Dictionary seeks to identify &#8220;core&#8221; elements with its set of
definitions, where core implies &#8220;things that most working preservation
repositories are likely to need to know in order to support digital
preservation&#8221; (p.3). PREMIS also considered &#8220;implementability&#8221;: because of the
large amounts of data held within preservation systems, metadata values should
be supplied automatically and be capable of automated processing, and required
human intervention should be avoided.</p>
<p>It should also be noted that PREMIS has been created according to the &#8220;1:1
principle&#8221; which &#8220;asserts that each description describes one and only one
resource.... It is not possible to change a file...; one can only create a new
file... that is related to the source object&#8221; (p.14). The practicality of this
principle in DataONE has been debated by the Core Cyberinfrastructure Team,
which recognizes its conceptual cleanliness as well as its operational
impracticality.</p>
<p>PREMIS does not specify formats or other requirements for how preservation is
implemented in a preservation repository: these are left as local decisions.</p>
<p>DataONE, as a preservation repository, could aim toward &#8220;PREMIS conformance&#8221; by
implementing metadata elements that share the names and semantics of PREMIS Data
Dictionary semantic units. PREMIS is also intended to be a foundation for
interoperable preservation repositories (pp.15-16); PREMIS recommends using its
semantic unit names to aid this interoperability. This document does not argue
for or against seeking PREMIS conformance for DataONE; rather, it seeks to
identify and summarize topics and outstanding issues for discussion within the
broader DataONE community.</p>
<p>This practicum project also took the Open Archives Initiative&#8217;s Object Reuse and
Exchange (OAI-ORE) (Lagoze et al, 2008) and the BagIt File Packaging Format
(Boyko et al, 2009) into account as possible standards for aggregations, or
packages, of Web resources and as possible methods for recording the
relationships between preserved objects in DataONE. BagIt &#8220;is a hierarchical
file packaging format designed to support disk-based or network-based storage
and transfer of generalized digital content&#8221; (p.3). OAI-ORE &#8220;defines standards
for the description and exchange of aggregations of Web resources&#8221;. This
document looks at the points where BagIt and OAI-ORE may play a role in
supporting the long-term preservation needs of DataONE.</p>
<p>BagIt and OAI-ORE have different strengths, and both systems have potential for
use within DataONE. Their differences are significant, and so they cannot be
viewed as substitutes for each other. The strengths of BagIt include simplicity
(text and file/directory orientations; supports opaque payloads; supports
aggregation; self-describing); its limitations include its hierarchical
structure and what appears to be the &#8220;fixed-in-time&#8221; nature of a bag: the
specification doesn&#8217;t discuss how the content of a bag might evolve over time,
which limits its usefulness for provenance tracking of objects in a
preservation repository.</p>
<p>OAI-ORE&#8217;s strengths include its flexibility and extensibility, its graph-based
architecture, specifications based upon stable and widely used technologies
including URI and RDF, and the ability for relationships to be added to
existing OAI-ORE resources as time passes. OAI-ORE is also related to other
efforts that may play a role in DataONE, such as the Open Annotation
Collaboration (<a class="reference external" href="http://www.openannotation.org/">http://www.openannotation.org/</a>). OAI-ORE&#8217;s flexibility has a cost
in terms of its complexity, so it will be more costly to develop and maintain a
reliable implementation (although its current popularity may mean that existing
code implementations may be available for use within DataONE). OAI-ORE is also
expressed in XML, which tends to increase storage consumption. Large XML data
stores are also time-consuming to parse without optimization.</p>
<p>The document is organized as follows. A set of high-level requirements was
developed to represent the general needs of the DataONE system metadata. For
each high-level requirement, the relevant sections of the PREMIS Data
Dictionary were identified, and missing, additional, or mismatched aspects are
noted in the text. The documentation for BagIt and OAI-ORE was consulted, and
its relevance to the requirement is also discussed. Where applicable, use cases
relevant to the requirement are described, using science data specified in EML,
Dryad, ORNL DAAC, and/or NBII formats as examples. The section on each
requirement ends with a general discussion of the overall analysis.</p>
</div>
<div class="section" id="system-metadata-requirements">
<h2>System Metadata Requirements<a class="headerlink" href="#system-metadata-requirements" title="Permalink to this headline">¶</a></h2>
<div class="section" id="requirement-1-perform-replication-on-digital-objects">
<h3>Requirement 1: Perform replication on digital objects<a class="headerlink" href="#requirement-1-perform-replication-on-digital-objects" title="Permalink to this headline">¶</a></h3>
<div class="section" id="description">
<h4>Description<a class="headerlink" href="#description" title="Permalink to this headline">¶</a></h4>
<p>To increase accessibility and help ensure long-term preservation, the
Coordinating Nodes will perform replications on digital objects. Systems
metadata will be replicated at each of the three Coordinating Nodes, while
datasets and their associated descriptive metadata will be replicated at a
minimum of two Member Nodes.</p>
</div>
<div class="section" id="what-premis-suggests">
<h4>What PREMIS suggests<a class="headerlink" href="#what-premis-suggests" title="Permalink to this headline">¶</a></h4>
<p>When a replication is performed, the DataONE system will need to record which
object was replicated (1.1), the unique identifier of the new copy (1.1), where
the replicate is stored in the system (1.7), information on the derivative
relationship between the original object and the new one (1.10) [in PREMIS,
replication of an object is defined as one type of a derivation relationship;
see p.13], and information on the event that created the replicate such as the
unique identifier of the event (2.1), type = replication (2.2), time (2.3), who
performed the replication (2.6), and a link between the replicated object and
the event (2.7).</p>
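<p>The metadata listed above can be sketched as a small record. This is only an
illustration under stated assumptions, not the DataONE schema: the names
<code>ReplicationRecord</code> and <code>record_replication</code>, the field
names, and the example identifiers are hypothetical. Only the PREMIS unit
numbers in the comments come from the text.</p>

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ReplicationRecord:
    """Hypothetical record of one replication; PREMIS units in comments."""
    source_id: str          # 1.1 identifier of the object that was replicated
    replica_id: str         # 1.1 identifier of the new copy
    replica_location: str   # 1.7 where the replica is stored
    relationship_type: str = "derivation"    # 1.10 relationship to the original
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))   # 2.1
    event_type: str = "replication"          # 2.2
    event_time: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())  # 2.3
    agent: str = "unknown"                   # 2.6 who performed the replication
    linked_object_id: str = ""               # 2.7 link between event and object


def record_replication(source_id, replica_id, location, agent):
    """Build the record for a replica created at `location` by `agent`."""
    return ReplicationRecord(
        source_id=source_id,
        replica_id=replica_id,
        replica_location=location,
        agent=agent,
        linked_object_id=replica_id,
    )


rec = record_replication("obj-1", "obj-1-copy", "member-node-B", "cn-1")
print(rec.event_type, rec.linked_object_id)
```

<p>Keeping the object-level units (1.x) and the event-level units (2.x) in one
record is a simplification for the sketch; PREMIS models objects and events as
separate entities linked by identifiers.</p>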
</div>
<div class="section" id="what-bagit-and-oai-ore-provide">
<h4>What BagIt and OAI-ORE provide<a class="headerlink" href="#what-bagit-and-oai-ore-provide" title="Permalink to this headline">¶</a></h4>
</div>
<div class="section" id="dataone-use-cases-and-requirements">
<h4>DataONE use cases and requirements<a class="headerlink" href="#dataone-use-cases-and-requirements" title="Permalink to this headline">¶</a></h4>
<p>(Requirement) System supports data storage <a class="reference external" href="https://trac.dataone.org/ticket/383">https://trac.dataone.org/ticket/383</a></p>
<p>(Requirement) The infrastructure must survive destruction of one or more data
storage nodes <a class="reference external" href="https://trac.dataone.org/ticket/411">https://trac.dataone.org/ticket/411</a></p>
<p>(Requirement) Data and metadata is replicated to at least one other Member Node
<a class="reference external" href="https://trac.dataone.org/ticket/433">https://trac.dataone.org/ticket/433</a></p>
</div>
<div class="section" id="discussion">
<h4>Discussion<a class="headerlink" href="#discussion" title="Permalink to this headline">¶</a></h4>
</div>
</div>
<div class="section" id="requirement-2-perform-preservation-migration">
<h3>Requirement 2: Perform preservation migration<a class="headerlink" href="#requirement-2-perform-preservation-migration" title="Permalink to this headline">¶</a></h3>
<div class="section" id="id1">
<h4>Description<a class="headerlink" href="#id1" title="Permalink to this headline">¶</a></h4>
<p>Migration is one kind of preservation strategy that Coordinating Nodes may
choose to use when a particular format of an object is in danger of
obsolescence. Also, over time, the physical media on which the digital objects
are stored will degrade, and objects will need to be migrated to new media.</p>
</div>
<div class="section" id="id2">
<h4>What PREMIS suggests<a class="headerlink" href="#id2" title="Permalink to this headline">¶</a></h4>
<p>Prior to migrating an object to a different format the system must first know
the following information: current format name and version (1.5.4.1.1 and
1.5.4.1.2, respectively), assurance that the file to be migrated is not
corrupted (1.5.2 fixity), which alternative format is the best possible format
to migrate the file to given the hardware and software requirements (refer to a
digital format registry?), and name and version of application that created the
object (1.5.5.1 and 1.5.5.2, respectively).</p>
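<p>The preconditions above can be expressed as a simple readiness check. The
sketch below is an assumption-laden illustration: the function name
<code>is_ready_for_migration</code>, the dictionary keys, and the choice of
SHA-256 for the fixity digest are all hypothetical. Choosing the target format
(for example via a format registry) is left out.</p>

```python
import hashlib


def is_ready_for_migration(metadata, content):
    """Return a list of problems; an empty list means migration can proceed.

    `metadata` is a plain dict with hypothetical keys; the PREMIS units
    they stand for are noted beside each key.
    """
    problems = []
    for key in ("format_name",            # 1.5.4.1.1
                "format_version",         # 1.5.4.1.2
                "creating_app_name",      # 1.5.5.1
                "creating_app_version"):  # 1.5.5.2
        if not metadata.get(key):
            problems.append("missing " + key)

    # 1.5.2 fixity: recompute the digest and compare it with the stored value.
    stored = metadata.get("sha256")
    if stored is None:
        problems.append("missing sha256")
    elif hashlib.sha256(content).hexdigest() != stored:
        problems.append("fixity check failed")
    return problems


data = b"year,count\n2010,42\n"
meta = {
    "format_name": "CSV",
    "format_version": "n/a",
    "creating_app_name": "R",
    "creating_app_version": "2.10",
    "sha256": hashlib.sha256(data).hexdigest(),
}
print(is_ready_for_migration(meta, data))          # []
print(is_ready_for_migration(meta, data + b"x"))   # ['fixity check failed']
```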
<p>When performing a migration due to format obsolescence, the DataONE system will
need to record the following metadata: which object is being migrated (1.1),
what is the unique identifier of the new object (1.1), where in the system is
the new object stored (1.7), information on the derivative relationship between
the original object and the new one (1.10). Also, it needs to record metadata on
the event that created the newly migrated object such as unique identifier
(2.1), type = migration (2.2), time (2.3), who performed the migration (2.6),
and a link between the migrated object and the event (2.7).</p>
<p>When migration for physical media obsolescence occurs, the system should record
where the object is now located (1.7.1 contentLocation).</p>
</div>
<div class="section" id="id3">
<h4>What BagIt and OAI-ORE provide<a class="headerlink" href="#id3" title="Permalink to this headline">¶</a></h4>
</div>
<div class="section" id="id4">
<h4>DataONE use cases and requirements<a class="headerlink" href="#id4" title="Permalink to this headline">¶</a></h4>
<p>(Requirement) The infrastructure must support long term preservation of data
<a class="reference external" href="https://trac.dataone.org/ticket/410">https://trac.dataone.org/ticket/410</a></p>
<p>(Requirement) Maintain original copies of all science metadata
<a class="reference external" href="https://trac.dataone.org/ticket/439">https://trac.dataone.org/ticket/439</a></p>
</div>
<div class="section" id="id5">
<h4>Discussion<a class="headerlink" href="#id5" title="Permalink to this headline">¶</a></h4>
</div>
</div>
<div class="section" id="requirement-3-record-specific-types-of-relationships-between-objects">
<h3>Requirement 3: Record specific types of relationships between objects<a class="headerlink" href="#requirement-3-record-specific-types-of-relationships-between-objects" title="Permalink to this headline">¶</a></h3>
<div class="section" id="id6">
<h4>Description<a class="headerlink" href="#id6" title="Permalink to this headline">¶</a></h4>
</div>
<div class="section" id="id7">
<h4>What PREMIS suggests<a class="headerlink" href="#id7" title="Permalink to this headline">¶</a></h4>
<p>PREMIS suggests that the system record semantic units that define structural,
derivation, and dependency relationships. For structural relationships, which
&#8220;show relationships between parts of objects&#8221; (p.13), characterizations of these
relationship types are recorded with a description of the relationship type,
such as &#8220;structural&#8221; (1.10.1), relationship sub-type such as &#8220;is a part of&#8221;
(1.10.2), and the unique identifier of the related object (1.10.3).</p>
<p>Derivative relationships &#8220;result from the replication or transformation of an
object&#8221; (p.13). Because this type of relationship involves an event, the system
must record the unique event identifier (1.10.4).</p>
<p>Dependency relationships exist &#8220;when one object requires another to support its
function, delivery, or coherence of content&#8221;. Examples include a data type
definition needed to render another file or modules needed by a software program
that is required to render an object. These relationships are characterized in
1.8.4 &#8220;dependency&#8221; and 1.8.5.5 &#8220;swDependency&#8221; respectively.</p>
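<p>As an illustration of the structural and derivation units described above,
the following sketch stores relationships as flat tuples and looks them up by
type. The <code>Relationship</code> tuple, its field names, and the example
identifiers are hypothetical; dependency relationships (1.8.4, 1.8.5.5) are
recorded elsewhere in PREMIS and are not modelled here.</p>

```python
from collections import namedtuple

# Hypothetical flat representation of a PREMIS relationship (1.10).
Relationship = namedtuple(
    "Relationship",
    ["subject_id",          # object that holds the relationship
     "rel_type",            # 1.10.1, e.g. "structural" or "derivation"
     "rel_subtype",         # 1.10.2, e.g. "is a part of"
     "related_object_id",   # 1.10.3
     "related_event_id"])   # 1.10.4, None when no event is involved


def related_objects(relationships, object_id, rel_type):
    """Identifiers of objects related to `object_id` by `rel_type`."""
    return [r.related_object_id for r in relationships
            if r.subject_id == object_id and r.rel_type == rel_type]


rels = [
    Relationship("table.csv", "structural", "is a part of", "package-1", None),
    Relationship("table.csv", "derivation", "is a source of", "table-v2.csv",
                 "evt-7"),
]
print(related_objects(rels, "table.csv", "derivation"))   # ['table-v2.csv']
```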
</div>
<div class="section" id="id8">
<h4>What BagIt and OAI-ORE provide<a class="headerlink" href="#id8" title="Permalink to this headline">¶</a></h4>
</div>
<div class="section" id="id9">
<h4>DataONE use cases and requirements<a class="headerlink" href="#id9" title="Permalink to this headline">¶</a></h4>
<p>(Requirement) Identifiers for all objects <a class="reference external" href="https://trac.dataone.org/ticket/317">https://trac.dataone.org/ticket/317</a></p>
<p>(Requirement) Support arbitrary unique identifiers
<a class="reference external" href="https://trac.dataone.org/ticket/385">https://trac.dataone.org/ticket/385</a></p>
<p>(Requirement) Identifiers always refer to the same object
<a class="reference external" href="https://trac.dataone.org/ticket/412">https://trac.dataone.org/ticket/412</a></p>
</div>
<div class="section" id="id10">
<h4>Discussion<a class="headerlink" href="#id10" title="Permalink to this headline">¶</a></h4>
</div>
</div>
<div class="section" id="requirement-4-support-digital-object-discovery">
<h3>Requirement 4: Support digital object discovery<a class="headerlink" href="#requirement-4-support-digital-object-discovery" title="Permalink to this headline">¶</a></h3>
<div class="section" id="id11">
<h4>Description<a class="headerlink" href="#id11" title="Permalink to this headline">¶</a></h4>
<p>Digital object discovery by DataONE users is supported primarily by the
descriptive metadata associated with data objects ingested into DataONE. The
DataONE design refers to this metadata as &#8220;science metadata&#8221; (DataONE, 2010a).</p>
<p>Other digital object scenarios should also be considered. For example, when
managing digital objects for long-term curation and stewardship, DataONE
personnel and processes may use the system metadata (DataONE, 2010b) as the
means for digital object discovery.</p>
</div>
<div class="section" id="id12">
<h4>What PREMIS suggests<a class="headerlink" href="#id12" title="Permalink to this headline">¶</a></h4>
<p>PREMIS defines descriptive metadata as &#8220;...metadata ... used to describe
Intellectual Entities&#8221; (p.23), which in DataONE maps to the science metadata
submitted to the system.</p>
</div>
<div class="section" id="id13">
<h4>What BagIt and OAI-ORE provide<a class="headerlink" href="#id13" title="Permalink to this headline">¶</a></h4>
</div>
<div class="section" id="id14">
<h4>DataONE use cases and requirements<a class="headerlink" href="#id14" title="Permalink to this headline">¶</a></h4>
<p>DataONE Use Case 33 - Search for Data
(<a class="reference external" href="http://mule1.dataone.org/ArchitectureDocs/UseCases/33_uc.html">http://mule1.dataone.org/ArchitectureDocs/UseCases/33_uc.html</a>)</p>
<p>(Requirement) Enable efficient mechanisms for users to discover content
<a class="reference external" href="https://trac.dataone.org/ticket/384">https://trac.dataone.org/ticket/384</a></p>
</div>
<div class="section" id="id15">
<h4>Discussion<a class="headerlink" href="#id15" title="Permalink to this headline">¶</a></h4>
</div>
</div>
<div class="section" id="requirement-5-support-digital-object-re-use">
<h3>Requirement 5: Support digital object re-use<a class="headerlink" href="#requirement-5-support-digital-object-re-use" title="Permalink to this headline">¶</a></h3>
<div class="section" id="id16">
<h4>Description<a class="headerlink" href="#id16" title="Permalink to this headline">¶</a></h4>
<p>Relationships, entities, citation, life science identifiers [exchange of
digital objects between repositories? METS?]</p>
</div>
<div class="section" id="id17">
<h4>What PREMIS suggests<a class="headerlink" href="#id17" title="Permalink to this headline">¶</a></h4>
<p>Potential users of digital objects need to know of any structural, derivative,
and dependency relationships in order to re-use an object. For example,
databases are often stored in repositories as two files: one for content and
one for the schema. The user needs to access both files to re-use the database.
The suggested PREMIS semantic units for relationships are described under
Requirement 3. Citation and persistent identifiers, such as LSIDs, are not
addressed in PREMIS.</p>
</div>
<div class="section" id="id18">
<h4>What BagIt and OAI-ORE provide<a class="headerlink" href="#id18" title="Permalink to this headline">¶</a></h4>
</div>
<div class="section" id="id19">
<h4>DataONE use cases and requirements<a class="headerlink" href="#id19" title="Permalink to this headline">¶</a></h4>
<p>(Requirement) Enable efficient mechanisms for users to discover content
<a class="reference external" href="https://trac.dataone.org/ticket/384">https://trac.dataone.org/ticket/384</a></p>
</div>
</div>
</div>
<div class="section" id="id20">
<h2>Discussion<a class="headerlink" href="#id20" title="Permalink to this headline">¶</a></h2>
<div class="section" id="requirement-6-record-software-and-hardware-specifications-for-future-object-rendering">
<h3>Requirement 6: Record software and hardware specifications for future object rendering<a class="headerlink" href="#requirement-6-record-software-and-hardware-specifications-for-future-object-rendering" title="Permalink to this headline">¶</a></h3>
<div class="section" id="id21">
<h4>Description<a class="headerlink" href="#id21" title="Permalink to this headline">¶</a></h4>
<p>Emulation is a core preservation strategy for digital objects.</p>
</div>
<div class="section" id="id22">
<h4>What PREMIS suggests<a class="headerlink" href="#id22" title="Permalink to this headline">¶</a></h4>
<p>PREMIS defines a representation as &#8220;the set of files required&#8221;
to &#8220;maintain usable versions of intellectual entities over time&#8221; (p. 8).
Emulation is one preservation approach to ensure long-term usability of digital
objects. To emulate a digital object whose format is obsolete, the DataONE
system must record information that characterizes both the software (1.8.5) and
hardware (1.8.6) environment for each object. PREMIS requires software/hardware
name and type to be recorded, while software version (1.8.5.2), software
components needed by the software (1.8.5.5), and other information are optional.</p>
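<p>The required/optional split described above could be checked as follows.
This is a minimal sketch: the layout of the environment record, the function
name <code>validate_environment</code>, and the example values are assumptions
made for illustration.</p>

```python
def validate_environment(env):
    """Check a hypothetical environment record: every software and hardware
    entry needs a name and a type; all other fields are optional.
    """
    errors = []
    for kind in ("software", "hardware"):      # 1.8.5 and 1.8.6
        for i, entry in enumerate(env.get(kind, [])):
            for required in ("name", "type"):
                if not entry.get(required):
                    errors.append("%s[%d] lacks %s" % (kind, i, required))
    return errors


env = {
    "software": [
        {"name": "R", "type": "renderer", "version": "2.10"},  # 1.8.5.2 optional
        {"name": "", "type": "operating system"},
    ],
    "hardware": [{"name": "x86", "type": "processor"}],
}
print(validate_environment(env))   # ['software[1] lacks name']
```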
</div>
<div class="section" id="id23">
<h4>What BagIt and OAI-ORE provide<a class="headerlink" href="#id23" title="Permalink to this headline">¶</a></h4>
</div>
<div class="section" id="id24">
<h4>DataONE use cases and requirements<a class="headerlink" href="#id24" title="Permalink to this headline">¶</a></h4>
<p>(Requirement) The infrastructure must support long term preservation of data
<a class="reference external" href="https://trac.dataone.org/ticket/410">https://trac.dataone.org/ticket/410</a></p>
<p>(Requirement) Maintain original copies of all science metadata
<a class="reference external" href="https://trac.dataone.org/ticket/439">https://trac.dataone.org/ticket/439</a></p>
</div>
<div class="section" id="id25">
<h4>Discussion<a class="headerlink" href="#id25" title="Permalink to this headline">¶</a></h4>
</div>
</div>
<div class="section" id="requirement-7-record-provenance-information-e-g-prinicpal-timestamp-event-rights">
<h3>Requirement 7: Record provenance information (e.g., principal, timestamp, event, rights)<a class="headerlink" href="#requirement-7-record-provenance-information-e-g-prinicpal-timestamp-event-rights" title="Permalink to this headline">¶</a></h3>
<div class="section" id="id26">
<h4>Description<a class="headerlink" href="#id26" title="Permalink to this headline">¶</a></h4>
<p>Recording provenance allows users of digital objects to follow who has created
and acted upon the object, what action was taken, and when the action occurred.
PREMIS uses associations between events and objects to record provenance.
PREMIS, however, leaves decisions on which events are worthy of recording to the
preservation system.</p>
</div>
<div class="section" id="id27">
<h4>What PREMIS suggests<a class="headerlink" href="#id27" title="Permalink to this headline">¶</a></h4>
<p>PREMIS states that provenance is one of the many attributes necessary for a
digital object to be authentic (pg. 200); however, because demonstrating
provenance involves many semantic units, it deserves to be its own requirement
rather than a sub-requirement for authenticity [bad justification?]. The DataONE
system would capture provenance by recording who is doing what to the digital
object through time. This includes recording information on the unique object
identifier (1.1), the original name of the object if it was not created within
the repository (1.6), and any relationships this item has with other digital
objects such as &#8220;is a source of&#8221; (1.10). The majority of semantic units
necessary to record provenance come from the event entity. The system will need
to create a unique identifier for each event (2.1), describe the event type
taken from a controlled vocabulary (e.g., migration and ingestion) (2.2), and
record when the event occurred (2.3). Optionally, it could store details about
the event, which are non-machine readable (2.4), and any information on the
success of the event (2.5).</p>
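<p>One way to picture provenance as &#8220;who did what, and when&#8221; is an event log
that can be filtered and ordered per object. The sketch below assumes ISO 8601
UTC timestamps in a single format so that string order matches time order. The
<code>Event</code> class, the <code>history</code> function, and all example
values are hypothetical.</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical event record; PREMIS units in comments."""
    event_id: str                    # 2.1
    event_type: str                  # 2.2, from a controlled vocabulary
    timestamp: str                   # 2.3, ISO 8601 UTC so strings sort by time
    object_id: str                   # 2.7 link to the object acted upon
    agent: str                       # 2.6 who performed the event
    detail: Optional[str] = None     # 2.4 free text, not machine readable
    outcome: Optional[str] = None    # 2.5


def history(events: List[Event], object_id: str) -> List[Event]:
    """Events for one object, oldest first."""
    own = [e for e in events if e.object_id == object_id]
    return sorted(own, key=lambda e: e.timestamp)


log = [
    Event("e2", "migration", "2011-06-01T00:00:00Z", "obj-1", "cn-1"),
    Event("e1", "ingestion", "2010-04-16T12:00:00Z", "obj-1", "mn-3"),
    Event("e3", "ingestion", "2010-05-01T09:30:00Z", "obj-2", "mn-3"),
]
print([e.event_id for e in history(log, "obj-1")])   # ['e1', 'e2']
```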
</div>
<div class="section" id="id28">
<h4>What BagIt and OAI-ORE provide<a class="headerlink" href="#id28" title="Permalink to this headline">¶</a></h4>
</div>
<div class="section" id="id29">
<h4>DataONE use cases and requirements<a class="headerlink" href="#id29" title="Permalink to this headline">¶</a></h4>
<p>(Requirement) Identifiers for all objects <a class="reference external" href="https://trac.dataone.org/ticket/317">https://trac.dataone.org/ticket/317</a></p>
<p>(Requirement) Support arbitrary unique identifiers
<a class="reference external" href="https://trac.dataone.org/ticket/385">https://trac.dataone.org/ticket/385</a></p>
<p>(Requirement) Consistent mechanism for identifying users
<a class="reference external" href="https://trac.dataone.org/ticket/390">https://trac.dataone.org/ticket/390</a></p>
<p>(Requirement) Identifiers always refer to the same object
<a class="reference external" href="https://trac.dataone.org/ticket/412">https://trac.dataone.org/ticket/412</a></p>
</div>
<div class="section" id="id30">
<h4>Discussion<a class="headerlink" href="#id30" title="Permalink to this headline">¶</a></h4>
</div>
</div>
<div class="section" id="requirement-8-record-information-to-ensure-viability-of-preserved-objects">
<h3>Requirement 8: Record information to ensure viability of preserved objects<a class="headerlink" href="#requirement-8-record-information-to-ensure-viability-of-preserved-objects" title="Permalink to this headline">¶</a></h3>
<div class="section" id="id31">
<h4>Description<a class="headerlink" href="#id31" title="Permalink to this headline">¶</a></h4>
</div>
<div class="section" id="id32">
<h4>What PREMIS suggests<a class="headerlink" href="#id32" title="Permalink to this headline">¶</a></h4>
<p>PREMIS defines viability as the &#8220;property of being readable from media&#8221;. The
PREMIS working group intentionally avoided defining detailed semantic units for
viability with the exception of 1.7.2, storage media, where the medium for
storing an object is defined. More detailed information on media would likely be
desirable so that repository managers would know when to refresh the medium.</p>
</div>
<div class="section" id="id33">
<h4>What BagIt and OAI-ORE provide<a class="headerlink" href="#id33" title="Permalink to this headline">¶</a></h4>
</div>
<div class="section" id="id34">
<h4>DataONE use cases and requirements<a class="headerlink" href="#id34" title="Permalink to this headline">¶</a></h4>
<p>(Requirement) The infrastructure must support long term preservation of data
<a class="reference external" href="https://trac.dataone.org/ticket/410">https://trac.dataone.org/ticket/410</a></p>
<p>(Requirement) Maintain original copies of all science metadata
<a class="reference external" href="https://trac.dataone.org/ticket/439">https://trac.dataone.org/ticket/439</a></p>
</div>
<div class="section" id="id35">
<h4>Discussion<a class="headerlink" href="#id35" title="Permalink to this headline">¶</a></h4>
</div>
</div>
<div class="section" id="requirement-9-record-information-to-ensure-authenticity-of-preserved-objects">
<h3>Requirement 9: Record information to ensure authenticity of preserved objects<a class="headerlink" href="#requirement-9-record-information-to-ensure-authenticity-of-preserved-objects" title="Permalink to this headline">¶</a></h3>
<div class="section" id="id36">
<h4>Description<a class="headerlink" href="#id36" title="Permalink to this headline">¶</a></h4>
<p>Authenticity is the &#8220;quality of being what it purports to be&#8221;. This includes the
concepts of fixity, integrity, and the use of digital signatures.</p>
</div>
<div class="section" id="id37">
<h4>What PREMIS suggests<a class="headerlink" href="#id37" title="Permalink to this headline">¶</a></h4>
<p>PREMIS has many semantic units that can be used as evidence of an object&#8217;s
authenticity (1.5 and its sub-units). It is mandatory either to record the format
designation of the object from a controlled vocabulary (e.g. base64 or Adobe
PDF) (1.5.4) or to identify the format type through reference to a format registry
(1.5.4.2). It is recommended, though optional, that the DataONE system record
the message digest (1.5.2.2), the specific algorithm used to create the message
digest (1.5.2.1), and who created the original digest (1.5.2.3).</p>
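<p>The three fixity sub-units can be illustrated with a minimal Python sketch.
The <code>FixityRecord</code> class, the <code>compute_fixity</code> function, and the originator
value <code>urn:node:EXAMPLE</code> are hypothetical names chosen for illustration; they are
not part of PREMIS or of any DataONE API. Only the standard library
<code>hashlib</code> module is assumed.</p>

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FixityRecord:
    """Hypothetical container for the three PREMIS fixity sub-units."""
    message_digest_algorithm: str   # PREMIS 1.5.2.1
    message_digest: str             # PREMIS 1.5.2.2
    message_digest_originator: str  # PREMIS 1.5.2.3


def compute_fixity(data: bytes, algorithm: str = "sha256",
                   originator: str = "urn:node:EXAMPLE") -> FixityRecord:
    """Digest the object bytes and record how and by whom it was made."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return FixityRecord(algorithm, digest, originator)


# The originator default above is a placeholder, not a real node identifier.
record = compute_fixity(b"hello")
print(record.message_digest_algorithm, record.message_digest)
```

<p>Storing the algorithm name next to the digest means a later check can
recompute the value without guessing how the original was produced.</p>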
<p>Digital signature information is an optional unit in PREMIS (1.9). [&#8220;A
repository may have a policy of generating digital signatures for files on
ingest, or may have a need to store and later validate incoming digital
signatures&#8221;. Which applies to DataONE, or do both?] To use digital signatures,
the system needs to record the signature value (1.9.1.4), the &#8220;designation for
the encryption and hash algorithms used for signature generation&#8221; (1.9.1.3), the
rules for validating the signature (1.9.1.5), the encoding used for the
signature (1.9.1.1), the signer&#8217;s public key (1.9.1.7), and who created the
signature (1.9.1.2 or 3.1). [Should recording the object&#8217;s size be a requirement
for authenticity? It is a characteristic, but I think it is more important for
ensuring that a replication was successful.]</p>
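<p>As a rough illustration, the signature units listed above could be held in a
single record and checked for completeness before a signature is accepted. The
class name, field names, and sample values below are assumptions made for this
sketch; it does not perform any cryptographic validation.</p>

```python
from dataclasses import dataclass, fields


@dataclass
class SignatureInformation:
    """Hypothetical record of the PREMIS signature units named in the text."""
    signature_encoding: str = ""          # PREMIS 1.9.1.1
    signer: str = ""                      # PREMIS 1.9.1.2 (or agent 3.1)
    signature_method: str = ""            # PREMIS 1.9.1.3
    signature_value: str = ""             # PREMIS 1.9.1.4
    signature_validation_rules: str = ""  # PREMIS 1.9.1.5
    key_information: str = ""             # PREMIS 1.9.1.7


def missing_units(info):
    """Return the names of the units that have not been filled in."""
    return [f.name for f in fields(info) if not getattr(info, f.name)]


# Sample values are placeholders for illustration only.
info = SignatureInformation(
    signature_encoding="Base64",
    signer="agent:1",
    signature_method="RSA-SHA256",
    signature_value="(elided)",
)
print(missing_units(info))  # ['signature_validation_rules', 'key_information']
```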
<p>The semantic unit 1.5 is used to record object characteristics, but
demonstrating that the object characteristics are in fact valid occurs through
events. For example, performing regular fixity checks is captured through the
units event identifier (2.1), event type such as &#8220;fixity check&#8221; (2.2), and event
date (2.3). Digital signature validation and format validation are also types of
events that need to be recorded to show authenticity (2.3).</p>
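<p>A fixity check recorded as an event might look like the following Python
sketch. The <code>Event</code> class, the <code>fixity_check</code> function, and the
<code>outcome</code> field are hypothetical and serve only to show how the identifier,
type, and date units fit together; the sketch assumes the stored digest was
produced with SHA-256.</p>

```python
import hashlib
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical event record using the three units named in the text."""
    event_identifier: str  # PREMIS 2.1
    event_type: str        # PREMIS 2.2
    event_date_time: str   # PREMIS 2.3
    outcome: str           # illustrative only: "pass" or "fail"


def fixity_check(data: bytes, expected_digest: str,
                 algorithm: str = "sha256") -> Event:
    """Recompute the digest, compare it with the stored one, return an event."""
    actual = hashlib.new(algorithm, data).hexdigest()
    outcome = "pass" if actual == expected_digest else "fail"
    return Event(
        event_identifier=str(uuid.uuid4()),
        event_type="fixity check",
        event_date_time=datetime.now(timezone.utc).isoformat(),
        outcome=outcome,
    )


stored = hashlib.sha256(b"object bytes").hexdigest()
print(fixity_check(b"object bytes", stored).outcome)    # pass
print(fixity_check(b"tampered bytes", stored).outcome)  # fail
```

<p>Keeping each check as its own event, rather than overwriting a single
&#8220;last checked&#8221; value, preserves the history that demonstrates authenticity
over time.</p>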
</div>
<div class="section" id="id38">
<h4>What BagIt and OAI-ORE provide<a class="headerlink" href="#id38" title="Permalink to this headline">¶</a></h4>
</div>
<div class="section" id="id39">
<h4>DataONE use cases and requirements<a class="headerlink" href="#id39" title="Permalink to this headline">¶</a></h4>
<p>(Requirement) The infrastructure must support long-term preservation of data
<a class="reference external" href="https://trac.dataone.org/ticket/410">https://trac.dataone.org/ticket/410</a></p>
<p>(Requirement) Maintain original copies of all science metadata
<a class="reference external" href="https://trac.dataone.org/ticket/439">https://trac.dataone.org/ticket/439</a></p>
</div>
<div class="section" id="id40">
<h4>Discussion<a class="headerlink" href="#id40" title="Permalink to this headline">¶</a></h4>
</div>
</div>
<div class="section" id="requirement-10-ensure-that-principals-are-authenticated">
<h3>Requirement 10: Ensure that principals are authenticated<a class="headerlink" href="#requirement-10-ensure-that-principals-are-authenticated" title="Permalink to this headline">¶</a></h3>
<div class="section" id="id41">
<h4>Description<a class="headerlink" href="#id41" title="Permalink to this headline">¶</a></h4>
<p>Software, organization, public key.</p>
</div>
<div class="section" id="id42">
<h4>What PREMIS suggests<a class="headerlink" href="#id42" title="Permalink to this headline">¶</a></h4>
<p>Principals are called agents in PREMIS. They are associated with either events
that occur to a digital object or the rights associated with an object, but they
are never directly linked to an object. PREMIS has only one required semantic
unit for an agent, agentIdentifier (3.1). Optional units used to
describe an agent include name (3.2) and type, such as organization, software, or
person (3.3). The PREMIS Data Dictionary suggests that systems use digital
signatures for authenticating submitters to and distributors from the system;
however, because validation takes place right after signing, there is no need
for the repository to preserve the signature itself through time. The system
can record the act of validation as an Event if desired.</p>
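<p>The indirect link between agents and objects can be sketched in Python as
follows. The class names, identifiers, and the <code>agents_for_object</code> helper are
hypothetical; the point is only that an object&#8217;s agents are found by way of
its events, never through a direct reference on the object.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agent:
    agent_identifier: str             # PREMIS 3.1 (required)
    agent_name: Optional[str] = None  # PREMIS 3.2 (optional)
    agent_type: Optional[str] = None  # PREMIS 3.3 (optional)


@dataclass(frozen=True)
class Event:
    event_type: str
    object_identifier: str  # the object the event happened to
    agent_identifier: str   # the agent that carried out the event


def agents_for_object(object_id, events, agents):
    """Find an object's agents indirectly, by way of its events."""
    by_id = {a.agent_identifier: a for a in agents}
    return [by_id[e.agent_identifier] for e in events
            if e.object_identifier == object_id]


# All identifiers below are made up for illustration.
agents = [Agent("agent:1", "Example Member Node", "organization"),
          Agent("agent:2", agent_type="software")]
events = [Event("ingestion", "obj:A", "agent:1"),
          Event("signature validation", "obj:A", "agent:2"),
          Event("ingestion", "obj:B", "agent:1")]
print([a.agent_identifier for a in agents_for_object("obj:A", events, agents)])
# ['agent:1', 'agent:2']
```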
</div>
<div class="section" id="id43">
<h4>What BagIt and OAI-ORE provide<a class="headerlink" href="#id43" title="Permalink to this headline">¶</a></h4>
</div>
<div class="section" id="id44">
<h4>DataONE use cases and requirements<a class="headerlink" href="#id44" title="Permalink to this headline">¶</a></h4>
<p>(Requirement) Consistent mechanism for identifying users
<a class="reference external" href="https://trac.dataone.org/ticket/390">https://trac.dataone.org/ticket/390</a></p>
<p>(Requirement) Enable different classes of users commensurate with their roles
<a class="reference external" href="https://trac.dataone.org/ticket/391">https://trac.dataone.org/ticket/391</a></p>
</div>
<div class="section" id="id45">
<h4>Discussion<a class="headerlink" href="#id45" title="Permalink to this headline">¶</a></h4>
</div>
</div>
</div>
<div class="section" id="conclusion">
<h2>Conclusion<a class="headerlink" href="#conclusion" title="Permalink to this headline">¶</a></h2>
</div>
<div class="section" id="references">
<h2>References<a class="headerlink" href="#references" title="Permalink to this headline">¶</a></h2>
<p>Boyko, A., Kunze, J., Littman, J., Madden, L., Vargas, B. (2009). The BagIt File
Packaging Format (V0.96). Retrieved April 2, 2010, from
<a class="reference external" href="http://www.ietf.org/Internet-drafts/draft-kunze-bagit-04.txt">http://www.ietf.org/Internet-drafts/draft-kunze-bagit-04.txt</a>.</p>
<p>DataONE. (2010a). Metadata Attributes for Discovery. Retrieved April 2, 2010,
from <a class="reference external" href="http://mule1.dataone.org/ArchitectureDocs/SearchMetadata.html">http://mule1.dataone.org/ArchitectureDocs/SearchMetadata.html</a>.</p>
<p>DataONE. (2010b). System Metadata. Retrieved April 2, 2010, from
<a class="reference external" href="http://mule1.dataone.org/ArchitectureDocs/SystemMetadata.html">http://mule1.dataone.org/ArchitectureDocs/SystemMetadata.html</a>.</p>
<p>Lagoze, C., Van de Sompel, H., Johnston, P., Nelson, M., Sanderson, R., Warner,
S. (2008). Open Archives Initiative Object Reuse and Exchange: ORE User Guide -
Primer. Retrieved April 2, 2010, from
<a class="reference external" href="http://www.openarchives.org/ore/1.0/primer">http://www.openarchives.org/ore/1.0/primer</a>.</p>
<p>PREMIS Editorial Committee. (2008). Data Dictionary for Preservation
Metadata: PREMIS version 2.0. S.l. Retrieved April 2, 2010, from
<a class="reference external" href="http://www.loc.gov/standards/premis/v2/premis-2-0.pdf">http://www.loc.gov/standards/premis/v2/premis-2-0.pdf</a>.</p>
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
    <p class="logo"><a href="http://dataone.org">
      <img class="logo" src="../_static/dataone_logo.png" alt="Logo"/>
    </a></p>
  <h3><a href="../index.html">Table Of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">Getting a Handle on Systems Metadata for the Long Haul</a><ul>
<li><a class="reference internal" href="#introduction">Introduction</a></li>
<li><a class="reference internal" href="#system-metadata-requirements">System Metadata Requirements</a><ul>
<li><a class="reference internal" href="#requirement-1-perform-replication-on-digital-objects">Requirement 1: Perform replication on digital objects</a><ul>
<li><a class="reference internal" href="#description">Description</a></li>
<li><a class="reference internal" href="#what-premis-suggests">What PREMIS suggests</a></li>
<li><a class="reference internal" href="#what-bagit-and-oai-ore-provide">What BagIt and OAI-ORE provide</a></li>
<li><a class="reference internal" href="#dataone-use-cases-and-requirements">DataONE use cases and requirements</a></li>
<li><a class="reference internal" href="#discussion">Discussion</a></li>
</ul>
</li>
<li><a class="reference internal" href="#requirement-2-perform-preservation-migration">Requirement 2: Perform preservation migration</a><ul>
<li><a class="reference internal" href="#id1">Description</a></li>
<li><a class="reference internal" href="#id2">What PREMIS suggests</a></li>
<li><a class="reference internal" href="#id3">What BagIt and OAI-ORE provide</a></li>
<li><a class="reference internal" href="#id4">DataONE use cases and requirements</a></li>
<li><a class="reference internal" href="#id5">Discussion</a></li>
</ul>
</li>
<li><a class="reference internal" href="#requirement-3-record-specific-types-of-relationships-between-objects">Requirement 3: Record specific types of relationships between objects</a><ul>
<li><a class="reference internal" href="#id6">Description</a></li>
<li><a class="reference internal" href="#id7">What PREMIS suggests</a></li>
<li><a class="reference internal" href="#id8">What BagIt and OAI-ORE provide</a></li>
<li><a class="reference internal" href="#id9">DataONE use cases and requirements</a></li>
<li><a class="reference internal" href="#id10">Discussion</a></li>
</ul>
</li>
<li><a class="reference internal" href="#requirement-4-support-digital-object-discovery">Requirement 4: Support digital object discovery</a><ul>
<li><a class="reference internal" href="#id11">Description</a></li>
<li><a class="reference internal" href="#id12">What PREMIS suggests</a></li>
<li><a class="reference internal" href="#id13">What BagIt and OAI-ORE provide</a></li>
<li><a class="reference internal" href="#id14">DataONE use cases and requirements</a></li>
<li><a class="reference internal" href="#id15">Discussion</a></li>
</ul>
</li>
<li><a class="reference internal" href="#requirement-5-support-digital-object-re-use">Requirement 5: Support digital object re-use</a><ul>
<li><a class="reference internal" href="#id16">Description</a></li>
<li><a class="reference internal" href="#id17">What PREMIS suggests</a></li>
<li><a class="reference internal" href="#id18">What BagIt and OAI-ORE provide</a></li>
<li><a class="reference internal" href="#id19">DataONE use cases and requirements</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#id20">Discussion</a><ul>
<li><a class="reference internal" href="#requirement-6-record-software-and-hardware-specifications-for-future-object-rendering">Requirement 6: Record software and hardware specifications for future object rendering</a><ul>
<li><a class="reference internal" href="#id21">Description</a></li>
<li><a class="reference internal" href="#id22">What PREMIS suggests</a></li>
<li><a class="reference internal" href="#id23">What BagIt and OAI-ORE provide</a></li>
<li><a class="reference internal" href="#id24">DataONE use cases and requirements</a></li>
<li><a class="reference internal" href="#id25">Discussion</a></li>
</ul>
</li>
<li><a class="reference internal" href="#requirement-7-record-provenance-information-e-g-prinicpal-timestamp-event-rights">Requirement 7: Record provenance information (e.g., principal, timestamp, event, rights)</a><ul>
<li><a class="reference internal" href="#id26">Description</a></li>
<li><a class="reference internal" href="#id27">What PREMIS suggests</a></li>
<li><a class="reference internal" href="#id28">What BagIt and OAI-ORE provide</a></li>
<li><a class="reference internal" href="#id29">DataONE use cases and requirements</a></li>
<li><a class="reference internal" href="#id30">Discussion</a></li>
</ul>
</li>
<li><a class="reference internal" href="#requirement-8-record-information-to-ensure-viability-of-preserved-objects">Requirement 8: Record information to ensure viability of preserved objects</a><ul>
<li><a class="reference internal" href="#id31">Description</a></li>
<li><a class="reference internal" href="#id32">What PREMIS suggests</a></li>
<li><a class="reference internal" href="#id33">What BagIt and OAI-ORE provide</a></li>
<li><a class="reference internal" href="#id34">DataONE use cases and requirements</a></li>
<li><a class="reference internal" href="#id35">Discussion</a></li>
</ul>
</li>
<li><a class="reference internal" href="#requirement-9-record-information-to-ensure-authenticity-of-preserved-objects">Requirement 9: Record information to ensure authenticity of preserved objects</a><ul>
<li><a class="reference internal" href="#id36">Description</a></li>
<li><a class="reference internal" href="#id37">What PREMIS suggests</a></li>
<li><a class="reference internal" href="#id38">What BagIt and OAI-ORE provide</a></li>
<li><a class="reference internal" href="#id39">DataONE use cases and requirements</a></li>
<li><a class="reference internal" href="#id40">Discussion</a></li>
</ul>
</li>
<li><a class="reference internal" href="#requirement-10-ensure-that-principals-are-authenticated">Requirement 10: Ensure that principals are authenticated</a><ul>
<li><a class="reference internal" href="#id41">Description</a></li>
<li><a class="reference internal" href="#id42">What PREMIS suggests</a></li>
<li><a class="reference internal" href="#id43">What BagIt and OAI-ORE provide</a></li>
<li><a class="reference internal" href="#id44">DataONE use cases and requirements</a></li>
<li><a class="reference internal" href="#id45">Discussion</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#conclusion">Conclusion</a></li>
<li><a class="reference internal" href="#references">References</a></li>
</ul>
</li>
</ul>
<h3>Related Topics</h3>
<ul>
  <li><a href="../index.html">Documentation Overview</a><ul>
  <li><a href="index.html">&lt;no title&gt;</a><ul>
      <li>Previous: <a href="SystemMetadata.html" title="previous chapter">System Metadata</a></li>
      <li>Next: <a href="SysmetaLifecycle.html" title="next chapter">Natural History of System Metadata</a></li>
  </ul></li>
  </ul></li>
</ul>
<div id="searchbox" style="display: none" role="search">
  <h3>Quick search</h3>
    <form class="search" action="../search.html" method="get">
      <div><input type="text" name="q" /></div>
      <div><input type="submit" value="Go" /></div>
      <input type="hidden" name="check_keywords" value="yes" />
      <input type="hidden" name="area" value="default" />
    </form>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>

    <div class="footer">
      <div id="copyright">
      &copy; Copyright <a href="http://www.dataone.org">2009-2017, DataONE</a>.
        [ <a href="../_sources/design/SystemMetadataAnalysis.txt"
               rel="nofollow">Page Source</a> |
          <a href='https://redmine.dataone.org/projects/d1/repository/changes/documents/Projects/cicore/architecture/api-documentation/source/design/SystemMetadataAnalysis.txt'
            rel="nofollow">Revision History</a> ]&nbsp;&nbsp;
      </div>
      <div id="acknowledgement">
        <p>This material is based upon work supported by the National Science Foundation
          under Grant Numbers <a href="http://www.nsf.gov/awardsearch/showAward?AWD_ID=0830944">0830944</a> and <a href="http://www.nsf.gov/awardsearch/showAward?AWD_ID=1430508">1430508</a>.</p>
        <p>Any opinions, findings, and conclusions or recommendations expressed in this
           material are those of the author(s) and do not necessarily reflect the views
           of the National Science Foundation.</p>
      </div>
    </div>
  </body>
</html>