Natural History of System Metadata
==================================

:Revisions:
  ======== ==================================================================
  Date     Comment
  ======== ==================================================================
  20100303 (D.V.) Initial draft
  20100304 (D.V.) Revised draft and added scenario where system 
           metadata is stored only on the Coordinating Nodes

           Corrected DataSysMetadataModified changes in scenario B. 
  ======== ==================================================================

This document describes the generation of system metadata associated with a
data object and its associated science metadata document as they are added to
a DataONE Member Node and subsequently propagated through the DataONE
infrastructure.

.. contents::
   :depth: 1
   :local:


Process Overview
----------------

There are two scenarios examined here: A) the system metadata is stored on all
nodes (MN and CN) and B) system metadata is stored only on the CNs. The general
sequence of operations is the same for both scenarios with the exclusion of a
couple of steps from scenario B. 

1. Scientist using an ITK tool generates some science data and some science
   metadata describing the science data.

2. The ITK populates the required fields of the system metadata documents.

3. Not necessarily in this order: 

   a) The ITK uploads the science data

   b) The ITK uploads the science metadata

   On receipt, the MN updates the system metadata fields for which it is
   responsible.

4. The CN detects new content is available on the MN using the listObjects()
   method.

5. Copies of the science metadata and the two (incomplete) system metadata
   documents are retrieved by the CN.

   :A: The CN updates the system metadata replication information, and indicates
       to the MN that it should update its copy of the system metadata objects.

   :B: The CN updates the system metadata replication information.

6. The CN issues a replication order to an MN.

   :A: The recipient MN retrieves copies of the system metadata and 
       associated objects (science data and metadata)

   :B: The recipient MN retrieves copies of the various objects indicated 
       by the CN

7. The recipient MN requests content from the origin MN, which in turn
   verifies the request with the CN.

   :A: The origin MN and the CN update copies of system metadata

   :B: The CN updates system metadata


8. After verifying the successful transfer of content, the recipient MN
   indicates completion to the CN.

   :A: The CN updates the system metadata for the objects and ensures that the
       MN copies of the system metadata are accurate.

   :B: The CN updates the system metadata for the objects.


The change in system metadata state is shown in the following process detail
sections. In each case, the identifiers used for the various elements are
indicated in table 1.

.. table:: Table 1. Identifiers used in the example sequence.

   =============== ================================================
   Identifier      Description
   =============== ================================================
   sciD.1          Science Data
   sciM.1          Science metadata describing sciD.1
   sysD.1          System metadata for object sciD.1
   sysM.1          System metadata for object sciM.1
   cn1.dataone.org A Coordinating Node
   mn1.dataone.org Member Node that is initial recipient of data
   mn2.dataone.org Member Node that receives a copy of the objects
   =============== ================================================


The following fictitious script provides an example of how uploading content
to a Member Node using a DataONE client library might occur.

.. code-block:: python

  import sys
  import logging
  from pyd1.client import *
  from pyd1.exceptions import *
  
  d1client = pyd1.client(identity="/home/vieglais/.d1/credentials",
                         defaults="/home/vieglais/.d1/defaults.ini")
  d1client.login()
  
  # The science data and metadata as files on disk
  data_file = "/home/vieglais/data/mydata.csv"
  meta_file = "/home/vieglais/data/mymeta.xml"
  
  # Reserve a couple of identifiers. Note that this reservation will
  # be system-wide.
  try:
    data_id = d1client.reserve(u"sysD.1")
    meta_id = d1client.reserve(u"sysM.1")
  except IdentifierInUseException, e:
    logging.fatal(repr(e))
    sys.exit()
  
  # Generate stub system metadata documents. Default values
  # for access rules etc are loaded from defaults.ini
  sysm_data = d1client.generateSysMeta(id=data_id, 
                                       format="text/csv",
                                       describedBy=meta_id)
  sysm_meta = d1client.generateSysMeta(id=meta_id, 
                                       format="eml://ecoinformatics.org/eml-2.1.0",
                                       describedBy=data_id)
  # Open file handles for data and metadata
  fdata = file(data_file, "r")
  fmetadata = file(meta_file, "r")
  
  # Send the data to the Member Node
  target_node = 'mn1.dataone.org'
  data_sysid = d1client.create(data_id, fdata, sys_data,
                               target=target_node)
  meta_sysid = d1client.create(meta_id, fmeta, sys_meta,
                               target=target_node)
  fdata.close()
  fmeta.close()


Process Detail - Scenario A
---------------------------

Step A1.
~~~~~~~~

The scientist generates a data set and associated science metadata.

Step A2.
~~~~~~~~

Using tools in the ITK, the scientist generates two initial system metadata
documents, one describing the science metadata (``sysm_M.1``), one describing
the science data (``sysm_D.1``).

=========================== ========================================== ========================
Field                       sysD.1                                     sysM.1
=========================== ========================================== ========================
identifier                  sciD.1                                     sciM.1
objectFormat                text/csv                                   eml://ecoinformatics.org/eml-2.1.0
size                        1450238                                    8132
submitter                   uid=jones,o=NCEAS,dc=ecoinformatics,dc=org uid=jones,o=NCEAS,dc=ecoinformatics,dc=org
rightsHolder                uid=jones,o=NCEAS,dc=ecoinformatics,dc=org uid=jones,o=NCEAS,dc=ecoinformatics,dc=org
obsoletes                   \                                          \
obsoletedBy                 \                                          \
derivedFrom                 \                                          \
describes                   \                                          sciD.1
describedBy                 sciM.1                                     \
checksum                    2e01e17467891f7c933dbaa00e1459d23db3fe4f   9967467891f7c933dbaa00e1459d23db3f342
checksumAlgorithm           SHA-1                                      SHA-1
accessRule[1]               \                                          \
.rule                       Allow                                      Allow
.service                    Read                                       Read
.principal                  \*                                         \*
accessRule[2]               \                                          \
.rule                       Allow                                      Allow
.service                    Write                                      Write
.principal                  o=NCEAS,dc=ecoinformatics,dc=org           o=NCEAS,dc=ecoinformatics,dc=org
replicationPolicy           \                                          \
.preferredMemberNode        mn2.dataone.org                            mn2.dataone.org
.blockedMemberNode          \                                          \
.replicationAllowed         true                                       true
.numberReplicas             1                                          1 
dateUploaded                \                                          \
dateSysMetadataModified     \                                          \
originMemberNode            \                                          \
authoritativeMemberNode     \                                          \
replica[1]                  \                                          \
.replicaMemberNode          \                                          \
.replicationStatus          \                                          \
.replicaVerified            \                                          \
=========================== ========================================== ========================


Step A3.
~~~~~~~~

The client tool transmits the data, science metadata and two stub system
metadata documents to a Member Node. The Member Node updates values of the
system metadata for which it is responsible.

=========================== ========================================== ==========================================
Field                       sysD.1                                     sysM.1
=========================== ========================================== ==========================================
identifier                  sciD.1                                     sciM.1
objectFormat                text/csv                                   eml://ecoinformatics.org/eml-2.1.0
size                        1450238                                    8132
submitter                   uid=jones,o=NCEAS,dc=ecoinformatics,dc=org uid=jones,o=NCEAS,dc=ecoinformatics,dc=org
rightsHolder                uid=jones,o=NCEAS,dc=ecoinformatics,dc=org uid=jones,o=NCEAS,dc=ecoinformatics,dc=org
obsoletes                   \                                          \
obsoletedBy                 \                                          \
derivedFrom                 \                                          \
describes                   \                                          sciD.1
describedBy                 sciM.1                                     \
checksum                    2e01e17467891f7c933dbaa00e1459d23db3fe4f   9967467891f7c933dbaa00e1459d23db3f342
checksumAlgorithm           SHA-1                                      SHA-1
accessRule[1]               \                                          \
.rule                       Allow                                      Allow
.service                    Read                                       Read
.principal                  \*                                         \*
accessRule[2]               \                                          \
.rule                       Allow                                      Allow
.service                    Write                                      Write
.principal                  o=NCEAS,dc=ecoinformatics,dc=org           o=NCEAS,dc=ecoinformatics,dc=org
replicationPolicy           \                                          \
.preferredMemberNode        mn2.dataone.org                            mn2.dataone.org
.blockedMemberNode          \                                          \
.replicationAllowed         true                                       true
.numberReplicas             1                                          1 
dateUploaded                **2010-03-04T18:13:51.0Z**                 **2010-03-04T18:14:45.0Z**
dateSysMetadataModified     **2010-03-04T18:13:51.0Z**                 **2010-03-04T18:14:45.0Z**
originMemberNode            **mn1.dataone.org**                        **mn1.dataone.org**
authoritativeMemberNode     **mn1.dataone.org**                        **mn1.dataone.org**
replica[1]                  \                                          \
.replicaMemberNode          **mn1.dataone.org**                        **mn1.dataone.org**
.replicationStatus          **Queued**                                 **Queued**
.replicaVerified            \                                          \
=========================== ========================================== ==========================================


Since there are many copies of system metadata from this point onwards, only
the replication properties of the ``sysD.1`` document are considered,
summarized in a table as follows. MN1, CN, MN2 = copy of system metadata on
``mn1.dataone.org``, ``cn1.dataone.org``, ``mn2.dataone.org`` respectively.

========================== ====================== ====================== ======================
Property                   MN1                    CN                     MN2
========================== ====================== ====================== ======================
R[1].replicaMemberNode     mn1.dataone.org
R[1].replicationStatus     Queued
R[1].replicaVerified       
========================== ====================== ====================== ======================


Step A4.
~~~~~~~~

A CN detects new content available on ``mn1.dataone.org`` through the
listObjects() method.

There is no change of state in system metadata resulting from this operation.


Step A5.
~~~~~~~~

Copies of ``sciM.1``, ``sysM.1``, and ``sysD.1`` are retrieved to the CN, and
the system metadata objects are updated. There are three steps to this process:

1. The CN requests a copy of the objects.

========================== ====================== ====================== ======================
Property                   MN1                    CN                     MN2
========================== ====================== ====================== ======================
R[1].replicaMemberNode     mn1.dataone.org
R[1].replicationStatus     **Requested**
R[1].replicaVerified       
========================== ====================== ====================== ======================


2. The CN receives a copy of the metadata.

========================== ====================== ====================== ======================
Property                   MN1                    CN                     MN2
========================== ====================== ====================== ======================
R[1].replicaMemberNode     mn1.dataone.org        **mn1.dataone.org**
R[1].replicationStatus     Requested              **Requested**
R[1].replicaVerified       
========================== ====================== ====================== ======================


3. The CN verifies the checksums and indicates to the MN that the copy is
   complete, providing the time stamp that it used for the verification (ensuring 
   identical copies of the system metadata).

========================== ========================== ========================== ==========================
Property                   MN1                        CN                         MN2
========================== ========================== ========================== ==========================
R[1].replicaMemberNode     mn1.dataone.org            mn1.dataone.org
R[1].replicationStatus     **Completed**              **Completed**
R[1].replicaVerified       **2010-03-04T19:13:51.0Z** **2010-03-04T19:13:51.0Z**
========================== ========================== ========================== ==========================


Step A6.
~~~~~~~~

The CN initiates replication of content from ``mn1.dataone.org`` to ``mn2.dataone.org``.

1. The CN sends a replication request to MN2, indicating that it should
   retrieve the object from MN1.

========================== ====================== ====================== ======================
Property                   MN1                    CN                     MN2
========================== ====================== ====================== ======================
R[1].replicaMemberNode     mn1.dataone.org        mn1.dataone.org
R[1].replicationStatus     Completed              Completed
R[1].replicaVerified       2010-03-04T19:13:51.0Z 2010-03-04T19:13:51.0Z
R[2].replicaMemberNode                            **mn2.dataone.org**
R[2].replicationStatus                            **Queued**
R[2].replicaVerified       
========================== ====================== ====================== ======================


2. MN2 requests the content from MN1.

========================== ====================== ====================== ==========================
Property                   MN1                    CN                     MN2
========================== ====================== ====================== ==========================
R[1].replicaMemberNode     mn1.dataone.org        mn1.dataone.org        **mn1.dataone.org**
R[1].replicationStatus     Completed              Completed              **Completed**
R[1].replicaVerified       2010-03-04T19:13:51.0Z 2010-03-04T19:13:51.0Z **2010-03-04T19:13:51.0Z**
R[2].replicaMemberNode     **mn2.dataone.org**    mn2.dataone.org        **mn2.dataone.org**
R[2].replicationStatus     **Requested**          Queued                 **Requested**
R[2].replicaVerified       
========================== ====================== ====================== ==========================


3. MN1 checks with CN that the object was requested

========================== ====================== ====================== ======================
Property                   MN1                    CN                     MN2
========================== ====================== ====================== ======================
R[1].replicaMemberNode     mn1.dataone.org        mn1.dataone.org        mn1.dataone.org
R[1].replicationStatus     Completed              Completed              Completed
R[1].replicaVerified       2010-03-04T19:13:51.0Z 2010-03-04T19:13:51.0Z 2010-03-04T19:13:51.0Z
R[2].replicaMemberNode     mn2.dataone.org        mn2.dataone.org        mn2.dataone.org
R[2].replicationStatus     Requested              **Requested**          Requested
R[2].replicaVerified       
========================== ====================== ====================== ======================


4. MN2 verifies that the checksum is correct, then indicates to MN1 that the
   copy was correctly transferred, forwarding the timestamp for when the
   object was verified to MN1.

========================== ========================== ========================== ==========================
Property                   MN1                        CN                         MN2
========================== ========================== ========================== ==========================
R[1].replicaMemberNode     mn1.dataone.org            mn1.dataone.org            mn1.dataone.org
R[1].replicationStatus     Completed                  Completed                  Completed
R[1].replicaVerified       2010-03-04T19:13:51.0Z     2010-03-04T19:13:51.0Z     2010-03-04T19:13:51.0Z
R[2].replicaMemberNode     mn2.dataone.org            mn2.dataone.org            mn2.dataone.org
R[2].replicationStatus     **Completed**              Requested                  **Completed**
R[2].replicaVerified       **2010-03-04T20:00:00.0Z**                            **2010-03-04T20:00:00.0Z**
========================== ========================== ========================== ==========================


Step A8.
~~~~~~~~

MN2 informs the Coordinating Node that it has retrieved a valid copy of the
object from MN1, forwarding the time stamp for when the verification was made.  

========================== ====================== ========================== ======================
Property                   MN1                    CN                         MN2
========================== ====================== ========================== ======================
R[1].replicaMemberNode     mn1.dataone.org        mn1.dataone.org            mn1.dataone.org
R[1].replicationStatus     Completed              Completed                  Completed
R[1].replicaVerified       2010-03-04T19:13:51.0Z 2010-03-04T19:13:51.0Z     2010-03-04T19:13:51.0Z
R[2].replicaMemberNode     mn2.dataone.org        mn2.dataone.org            mn2.dataone.org
R[2].replicationStatus     Completed              **Completed**              Completed
R[2].replicaVerified       2010-03-04T20:00:00.0Z **2010-03-04T20:00:00.0Z** 2010-03-04T20:00:00.0Z
========================== ====================== ========================== ======================



Process Detail - Scenario B.
----------------------------

This scenario uses the notion that system metadata is stored on the CNs only.


Step B1.
~~~~~~~~

The scientist generates a data set and associated science metadata.

Step B2.
~~~~~~~~

Using tools in the ITK, the scientist generates two initial system metadata
documents, one describing the science metadata (``sysm_M.1``), one describing
the science data (``sysm_D.1``).

=========================== ========================================== ========================
Field                       sysD.1                                     sysM.1
=========================== ========================================== ========================
identifier                  sciD.1                                     sciM.1
objectFormat                text/csv                                   eml://ecoinformatics.org/eml-2.1.0
size                        1450238                                    8132
submitter                   uid=jones,o=NCEAS,dc=ecoinformatics,dc=org uid=jones,o=NCEAS,dc=ecoinformatics,dc=org
rightsHolder                uid=jones,o=NCEAS,dc=ecoinformatics,dc=org uid=jones,o=NCEAS,dc=ecoinformatics,dc=org
obsoletes
obsoletedBy
derivedFrom
describes                                                              sciD.1
describedBy                 sciM.1
checksum                    2e01e17467891f7c933dbaa00e1459d23db3fe4f   9967467891f7c933dbaa00e1459d23db3f342
checksumAlgorithm           SHA-1                                      SHA-1
accessRule[1]
.rule                       Allow                                      Allow
.service                    Read                                       Read
.principal                  \*                                         \*
accessRule[2]
.rule                       Allow                                      Allow
.service                    Write                                      Write
.principal                  o=NCEAS,dc=ecoinformatics,dc=org           o=NCEAS,dc=ecoinformatics,dc=org
replicationPolicy
.preferredMemberNode        mn2.dataone.org                            mn2.dataone.org
.blockedMemberNode
.replicationAllowed         true                                       true
.numberReplicas             1                                          1 
dateUploaded
dateSysMetadataModified
originMemberNode
authoritativeMemberNode
replica[1]
.replicaMemberNode
.replicationStatus
.replicaVerified
=========================== ========================================== ========================


Step B3.
~~~~~~~~

The client tool transmits the data, science metadata and two stub system
metadata documents to a Member Node (``mn1.dataone.org``). On receipt, the
Member Node sets the values of several fields as indicted in the table below.

=========================== ========================================== ==========================================
Field                       sysD.1                                     sysM.1
=========================== ========================================== ==========================================
identifier                  sciD.1                                     sciM.1
objectFormat                text/csv                                   eml://ecoinformatics.org/eml-2.1.0
size                        1450238                                    8132
submitter                   uid=jones,o=NCEAS,dc=ecoinformatics,dc=org uid=jones,o=NCEAS,dc=ecoinformatics,dc=org
rightsHolder                uid=jones,o=NCEAS,dc=ecoinformatics,dc=org uid=jones,o=NCEAS,dc=ecoinformatics,dc=org
obsoletes
obsoletedBy
derivedFrom
describes                                                              sciD.1
describedBy                 sciM.1
checksum                    2e01e17467891f7c933dbaa00e1459d23db3fe4f   9967467891f7c933dbaa00e1459d23db3f342
checksumAlgorithm           SHA-1                                      SHA-1
accessRule[1]
.ruleType                   Allow                                      Allow
.service                    Read                                       Read
.principal                  \*                                         \*
accessRule[2]
.ruleType                   Allow                                      Allow
.service                    Write                                      Write
.principal                  o=NCEAS,dc=ecoinformatics,dc=org           o=NCEAS,dc=ecoinformatics,dc=org
replicationPolicy
.preferredMemberNode        mn2.dataone.org                            mn2.dataone.org
.blockedMemberNode
.replicationAllowed         true                                       true
.numberReplicas             1                                          1 
dateUploaded                **2010-03-04T18:13:51.0Z**                 **2010-03-04T18:14:45.0Z**
dateSysMetadataModified     **2010-03-04T18:13:51.0Z**                 **2010-03-04T18:14:45.0Z**
originMemberNode            **mn1.dataone.org**                        **mn1.dataone.org**
authoritativeMemberNode     **mn1.dataone.org**                        **mn1.dataone.org**
replica[1]
.replicaMemberNode          **mn1.dataone.org**                        **mn1.dataone.org**
.replicationStatus          **Queued**                                 **Queued**
.replicaVerified
=========================== ========================================== ==========================================


Step B4.
~~~~~~~~

A CN detects new content available on ``mn1.dataone.org`` through
:func:`MN_replication.listObjects`.

There is no change of state in system metadata resulting from this operation.


Step B5.
~~~~~~~~

Copies of ``sciM.1``, ``sysM.1``, and ``sysD.1`` are retrieved to the
Coordinating Node, and the system metadata objects are updated on the
Coordinating Node. For sake of brevity, only the elements from
ReplicationPolicy onwards are shown in the remaining state tables since the
remainder of the content does not change.

=========================== ========================================== ==========================================
Field                       sysD.1                                     sysM.1
=========================== ========================================== ==========================================
identifier                  sciD.1                                     sciM.1
...
replicationPolicy
.preferredMemberNode        mn2.dataone.org                            mn2.dataone.org
.blockedMemberNode
.replicationAllowed         true                                       true
.numberReplicas             1                                          1 
dateUploaded                2010-03-04T18:13:51.0Z                     2010-03-04T18:14:45.0Z
dateSysMetadataModified     **2010-03-04T19:13:51.0Z**                 **2010-03-04T19:13:52.0Z**
originMemberNode            mn1.dataone.org                            mn1.dataone.org
authoritativeMemberNode     mn1.dataone.org                            mn1.dataone.org
replica[1]
.replicaMemberNode          mn1.dataone.org                            mn1.dataone.org
.replicationStatus          **Completed**                              **Completed**
.replicaVerified            **2010-03-04T19:13:51.0Z**                 **2010-03-04T19:13:52.0Z**
=========================== ========================================== ==========================================

Step B6.
~~~~~~~~

The Coordinating Node initiates replication of the objects ``sciM.1`` and
``sciD.1`` from ``mn1.dataone.org`` to ``mn2.dataone.org``. It does this by
calling the "replicate()" method on ``mn2.dataone.org``.

=========================== ========================================== ==========================================
Field                       sysD.1                                     sysM.1
=========================== ========================================== ==========================================
identifier                  sciD.1                                     sciM.1
...
replicationPolicy
.preferredMemberNode        mn2.dataone.org                            mn2.dataone.org
.blockedMemberNode
.replicationAllowed         true                                       true
.numberReplicas             1                                          1 
dateUploaded                2010-03-04T18:13:51.0Z                     2010-03-04T18:14:45.0Z
dateSysMetadataModified     **2010-03-04T20:58:00.0Z**                 **2010-03-04T20:58:00.0Z**
originMemberNode            mn1.dataone.org                            mn1.dataone.org
authoritativeMemberNode     mn1.dataone.org                            mn1.dataone.org
replica[1]
.replicaMemberNode          mn1.dataone.org                            mn1.dataone.org
.replicationStatus          Completed                                  Completed  
.replicaVerified            2010-03-04T19:13:51.0Z                     2010-03-04T19:13:52.0Z  
replica[2]
.replicaMemberNode          **mn2.dataone.org**                        **mn2.dataone.org**
.replicationStatus          **Queued**                                 **Queued**
.replicaVerified                                                       
=========================== ========================================== ==========================================


Step B7.
~~~~~~~~

The node ``mn2.dataone.org`` requests the objects ``sciM.1`` and ``sciD.1``
from ``mn1.dataone.org``. After verifying the request with a Coordinating Node
(which has a side effect of setting the system metadata ReplicationStatus to
Requested), ``mn1.dataone.org`` returns the objects. (Note that in actuality,
each object is requested individually.)

=========================== ========================================== ==========================================
Field                       sysD.1                                     sysM.1
=========================== ========================================== ==========================================
identifier                  sciD.1                                     sciM.1
...
replicationPolicy
.preferredMemberNode        mn2.dataone.org                            mn2.dataone.org
.blockedMemberNode
.replicationAllowed         true                                       true
.numberReplicas             1                                          1 
dateUploaded                2010-03-04T18:13:51.0Z                     2010-03-04T18:14:45.0Z
dateSysMetadataModified     **2010-03-04T20:59:00.0Z**                 **2010-03-04T20:59:00.0Z**
originMemberNode            mn1.dataone.org                            mn1.dataone.org
authoritativeMemberNode     mn1.dataone.org                            mn1.dataone.org
replica[1]
.replicaMemberNode          mn1.dataone.org                            mn1.dataone.org
.replicationStatus          Completed                                  Completed  
.replicaVerified            2010-03-04T19:13:51.0Z                     2010-03-04T19:13:52.0Z  
replica[2]
.replicaMemberNode          **mn2.dataone.org**                        **mn2.dataone.org**
.replicationStatus          **Requested**                              **Requested**
.replicaVerified                                                       
=========================== ========================================== ==========================================


Step B8.
~~~~~~~~

The node ``mn2.dataone.org`` verifies the retrieved objects against their
checksums and indicates to the Coordinating Node that transfer was successful.

=========================== ========================================== ==========================================
Field                       sysD.1                                     sysM.1
=========================== ========================================== ==========================================
identifier                  sciD.1                                     sciM.1
...
replicationPolicy
.preferredMemberNode        mn2.dataone.org                            mn2.dataone.org
.blockedMemberNode
.replicationAllowed         true                                       true
.numberReplicas             1                                          1 
dateUploaded                2010-03-04T18:13:51.0Z                     2010-03-04T18:14:45.0Z
dateSysMetadataModified     **2010-03-04T21:00:00.0Z**                 **2010-03-04T21:00:00.0Z**
originMemberNode            mn1.dataone.org                            mn1.dataone.org
authoritativeMemberNode     mn1.dataone.org                            mn1.dataone.org
replica[1]
.replicaMemberNode          mn1.dataone.org                            mn1.dataone.org
.replicationStatus          Completed                                  Completed  
.replicaVerified            2010-03-04T19:13:51.0Z                     2010-03-04T19:13:52.0Z  
replica[2]
.replicaMemberNode          mn2.dataone.org                            mn2.dataone.org
.replicationStatus          **Completed**                              **Completed**
.replicaVerified            **2010-03-04T21:00:00.0Z**                 **2010-03-04T21:00:00.0Z**
=========================== ========================================== ==========================================


Final State of System Metadata 
------------------------------

Assuming the replication policy was for one replica to be created for
``sciM.1`` and ``sciD.1``, then the eventual end state of the associated
system metadata documents (``sysM.1`` and ``sysD.1`` respectively) after
synchronization of the Coordinating Nodes, would be something like the
following.

Note that synchronization between Coordinating Nodes does not trigger an
update to the DataSysMetadataModified field. This is necessary to avoid a
situation where the state of the metadata is never consistent between
Coordinating Nodes.

Note also that even though the replication policy of ``sciD.1`` indicates that
NumberReplicas is one, there are actually three additional replicas stored on
the Coordinating Nodes.

=========================== ========================================== ==========================================
Field                       sysD.1                                     sysM.1
=========================== ========================================== ==========================================
identifier                  sciD.1                                     sciM.1
objectFormat                text/csv                                   eml://ecoinformatics.org/eml-2.1.0
size                        1450238                                    8132
submitter                   uid=jones,o=NCEAS,dc=ecoinformatics,dc=org uid=jones,o=NCEAS,dc=ecoinformatics,dc=org
rightsHolder                uid=jones,o=NCEAS,dc=ecoinformatics,dc=org uid=jones,o=NCEAS,dc=ecoinformatics,dc=org
obsoletes
obsoletedBy
derivedFrom
describes                                                              sciD.1
describedBy                 sciM.1
checksum                    2e01e17467891f7c933dbaa00e1459d23db3fe4f   9967467891f7c933dbaa00e1459d23db3f342
checksumAlgorithm           SHA-1                                      SHA-1
accessRule[1]
.ruleType                   Allow                                      Allow
.service                    Read                                       Read
.principal                  \*                                         \*
accessRule[2]
.ruleType                   Allow                                      Allow
.service                    Write                                      Write
.principal                  o=NCEAS,dc=ecoinformatics,dc=org           o=NCEAS,dc=ecoinformatics,dc=org
replicationPolicy
.preferredMemberNode        mn2.dataone.org                            mn2.dataone.org
.blockedMemberNode
.replicationAllowed         true                                       true
.numberReplicas             1                                          1 
dateUploaded                2010-03-04T18:13:51.0Z                     2010-03-04T18:14:45.0Z
dateSysMetadataModified     2010-03-04T18:13:51.0Z                     2010-03-04T18:14:45.0Z
originMemberNode            mn1.dataone.org                            mn1.dataone.org
authoritativeMemberNode     mn1.dataone.org                            mn1.dataone.org
replica[1]
.replicaMemberNode          mn1.dataone.org                            mn1.dataone.org
.replicationStatus          Completed                                  Completed  
.replicaVerified            2010-03-04T19:13:51.0Z                     2010-03-04T19:13:52.0Z  
replica[2]
.replicaMemberNode          mn2.dataone.org                            mn2.dataone.org
.replicationStatus          Completed                                  Completed  
.replicaVerified            2010-03-04T21:00:00.0Z                     2010-03-04T21:00:00.0Z
=========================== ========================================== ==========================================


Conclusions
-----------

Maintaining complete copies of system metadata on all nodes is cumbersome and
seems to be overly complicated. However, it is still necessary for Member Nodes
to provide accurate information for some elements of the system metadata (to
assist with transfer verification, to validate a data object is intact).

As such, it seems that scenario B above is the most tractable and provides the
desired functionality for tracking content in the DataONE infrastructure. The
role of system metadata on Member Nodes and Coordinating Nodes is further
clarified below.

The recommendations are:

- Complete, authoritative copies of system metadata are maintained on the
  Coordinating Nodes.

- A system metadata record becomes authoritative once the replicationStatus is
  set to complete after initial transfer of content to the Coordinating Node
  (step B5 above)

- Member Nodes still provide access to system metadata through the
  :func:`MN_crud.getSystemMetadata` calls, though the copy of system metadata
  returned is authoritative for the properties controlled by the Member Nodes
  only.

- Member Nodes should not return any of the :attr:`SystemMetdata.replica`
  attributes.