METADATA format registration
============================
**Status:  initial draft**

Overview
~~~~~~~~
While DataONE's architecture is designed to accommodate any metadata format that
Member Nodes use, each new metadata format requires some development work before
DataONE's discovery mechanisms can handle documents in that format.  This takes
effort from both the Content Curator (usually a Member Node administrator) and a
DataONE developer and, more significantly, requires a patch-level release of the CN
software stack so that content in the new format can be synchronized, indexed, and
ultimately discovered.  Building, testing, and deploying the necessary items to the
CNs introduces a lag between when the new format is published and when content using
it can be successfully created.  Content Curators adopting a new format, or a new
version of an existing format, therefore need to account for that lag in their own
planning.

The process of registering a new metadata format involves creating and testing
the following items:

1. a **published schema or DTD** (done by Content Curator)
2. an **indexing parser** (a DataONE developer responsibility)
3. an **XSLT template** (built by either party, depending on time and ability; who is responsible is still to be confirmed)

Once all three items are available and tested, the format can be fully registered
with DataONE as a new object format.
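
The readiness rule above can be pictured as a simple gate.  The following is a
minimal Python sketch, not DataONE code: the item names, the ``ObjectFormat`` class,
and the ``missing_items`` and ``register`` functions are hypothetical, and the record
fields are only loosely modeled on the formatId, formatName, and formatType of a
DataONE object format.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Dict, List

   # Hypothetical names for the three items a format needs before registration.
   REQUIRED_ITEMS = ("schema", "indexing_parser", "xslt_template")


   @dataclass(frozen=True)
   class ObjectFormat:
       """Illustrative object format record; not an official DataONE class."""

       format_id: str
       format_name: str
       format_type: str = "METADATA"


   def missing_items(tested: Dict[str, bool]) -> List[str]:
       """Return the required items that are not yet available and tested."""
       return [item for item in REQUIRED_ITEMS if not tested.get(item, False)]


   def register(fmt: ObjectFormat, tested: Dict[str, bool]) -> ObjectFormat:
       """Accept the record only once every required item has been tested."""
       missing = missing_items(tested)
       if missing:
           raise ValueError(f"cannot register {fmt.format_id}: missing {', '.join(missing)}")
       return fmt

For example, ``missing_items({"schema": True, "indexing_parser": True})`` returns
``['xslt_template']``, so ``register`` would refuse the record until the XSLT
template has been tested.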

When this is done as part of a new Member Node deployment, plan to start the work
early, because final testing of the node requires that all of its objects use a
registered format.



Metadata Format Registration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Whether or not it is part of a Member Node deployment, registering a metadata format follows the same steps:

Content Curators:

1. develop and test their schema or DTD.  The schema or DTD needs to pass standard schema 
   validation tests that can be found at numerous testing services online (search
   for "online XML schema validation").
   
2. publish the schema so that the namespace and schemaLocation of the metadata
   documents point to an immutable copy of it that will continue to resolve
   consistently and indefinitely.
   
3. contact DataONE via support@dataone.org, attaching example metadata documents,
   or providing a link to a test instance of the Member Node that contains them. 
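
Before contacting DataONE, a Content Curator might run a quick self-check on the
example documents.  The sketch below (Python standard library only; the function
names are hypothetical) checks that an XML Schema-based document is well-formed and
that its ``xsi:schemaLocation`` maps the root namespace to an absolute URL.  It does
not validate documents against the schema, confirm that the URL resolves, or check
that the published copy is immutable, and DTD-based formats, which declare their
grammar differently, are not covered.

.. code-block:: python

   import xml.etree.ElementTree as ET
   from typing import List, Optional, Tuple
   from urllib.parse import urlparse

   XSI = "http://www.w3.org/2001/XMLSchema-instance"


   def schema_location_for(xml_text: str) -> Tuple[str, Optional[str]]:
       """Return (root namespace, schema URL declared for that namespace)."""
       root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
       namespace = ""
       if root.tag.startswith("{"):
           namespace = root.tag[1:].split("}", 1)[0]
       # xsi:schemaLocation holds whitespace-separated "namespace URL" pairs.
       tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
       pairs = dict(zip(tokens[0::2], tokens[1::2]))
       return namespace, pairs.get(namespace)


   def check_document(xml_text: str) -> List[str]:
       """List problems with how a document points at its schema (empty if none)."""
       namespace, url = schema_location_for(xml_text)
       problems = []
       if not namespace:
           problems.append("root element has no namespace")
       if url is None:
           problems.append("no schemaLocation entry for the root namespace")
       elif urlparse(url).scheme not in ("http", "https"):
           problems.append(f"schema location is not an absolute URL: {url}")
       return problems

An empty list from ``check_document`` only means the pointer to the schema looks
plausible; full validation still happens in step 1 and in DataONE's own testing.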

DataONE developers:

4. test the schema against the example documents, iterating with the Content Curator
   on any fixes.

5. write an indexing parser and/or XSLT template.

6. test the indexing parser and XSLT template in the DEV environment.

7. review the test results with the Content Curator (search results and metadata
   visualizations).

8. deploy the indexing parser, the XSLT template, and the new object format record
   to additional environments (STAGE and/or production).  The XSLT template is
   currently handed off to the ONEMercury maintainers.

9. notify the Content Curator when the work is done.

The Content Curator can then start submitting metadata objects that use the new format.

Open question: who names the object format (that is, who assigns its format identifier)?

As part of Member Node deployment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Deployment-phase testing of a Member Node requires every metadata format used by the
prospective Member Node to be registered, so that the processes under test
(synchronization, indexing, ONEMercury presentation) can be run.  Because DataONE
will need to build, test, and deploy items to the Coordinating Nodes, format
registration should ideally begin during the implementation/development phase of
the Member Node on-boarding process.  Specifically, the schema (the first item above)
must be published and tested, and the object format registered in the target testing
environment, before the Member Node itself can be tested.  Without these,
synchronization will fail and the indexing and ONEMercury tests cannot be run.
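
The gating condition can be expressed as a set check over the format identifiers the
prospective Member Node uses.  In this minimal sketch (the function name and input
shapes are hypothetical), each object whose format is not registered in the target
environment is listed under that format, since those are the objects whose
synchronization would fail.

.. code-block:: python

   from collections import defaultdict
   from typing import Dict, Iterable, List, Set, Tuple


   def objects_blocked_by_format(
       objects: Iterable[Tuple[str, str]], registered: Set[str]
   ) -> Dict[str, List[str]]:
       """Map each unregistered format ID to the identifiers of objects using it.

       `objects` holds (identifier, formatId) pairs from the Member Node;
       `registered` is the set of format IDs registered in the target environment.
       """
       blocked: Dict[str, List[str]] = defaultdict(list)
       for identifier, format_id in objects:
           if format_id not in registered:
               blocked[format_id].append(identifier)
       return dict(blocked)

An empty result means every format in use is registered in the target environment,
so the deployment-phase tests can proceed.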

Typically, the indexing parser and XSLT template are first deployed to the
Coordinating Nodes of the DEV environment for testing by DataONE developers and then,
if successful, deployed to the STAGE environment in preparation for registering the
prospective Member Node there.
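
A minimal sketch of that promotion order follows.  It assumes a strict DEV, then
STAGE, then production sequence; step 8 above allows "STAGE and/or production", so
actual practice may be looser.  The environment names and functions are
hypothetical, and other environments (such as the sandbox mentioned in the notes)
are left out.

.. code-block:: python

   from typing import Optional, Set

   # Assumed promotion order; other environments are omitted.
   ENVIRONMENTS = ("DEV", "STAGE", "PRODUCTION")


   def can_deploy(target: str, passed: Set[str]) -> bool:
       """True if testing has passed in every environment that precedes `target`."""
       position = ENVIRONMENTS.index(target)  # ValueError for an unknown environment
       return all(env in passed for env in ENVIRONMENTS[:position])


   def next_environment(passed: Set[str]) -> Optional[str]:
       """Return the first environment where testing has not yet passed, if any."""
       for env in ENVIRONMENTS:
           if env not in passed:
               return env
       return None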

Member Node implementers should work out specific timings and placements with 
their primary DataONE contact to optimize their development cycles. 



Notes
~~~~~
What information is pulled from metadata into the search index:

  http://mule1.dataone.org/ArchitectureDocs-current/design/SearchMetadata.html#values-extracted-from-science-metadata
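
To illustrate what an indexing parser does, the sketch below maps a few index fields
to element paths in an imaginary format and collects the text found there.  It is not
DataONE's indexing component, which runs as part of the CN software stack; the
namespace, field names, and paths are assumptions for illustration only, and the real
list of extracted values is on the page linked above.

.. code-block:: python

   import xml.etree.ElementTree as ET
   from typing import Dict, List

   # Hypothetical namespace and field-to-path mapping for an imaginary format.
   NS = {"m": "http://example.org/ns/mymetadata-1.0"}
   FIELD_PATHS = {
       "title": "m:title",
       "abstract": "m:abstract",
       "keywords": "m:keywords/m:keyword",
   }


   def extract_index_fields(xml_text: str) -> Dict[str, List[str]]:
       """Return index field -> non-empty text values found at its path."""
       root = ET.fromstring(xml_text)
       fields: Dict[str, List[str]] = {}
       for field, path in FIELD_PATHS.items():
           values = [
               el.text.strip()
               for el in root.findall(path, NS)
               if el.text and el.text.strip()
           ]
           if values:  # omit fields the document does not populate
               fields[field] = values
       return fields

Supporting a new format in this style mostly means writing a new path mapping, which
is one reason new versions of existing formats tend to need less development.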

Current effort estimate:

- 2 days of development, 2 days of testing (sandbox, staging), 1 day for the
  release, and 1 day for the ONEMercury upgrade.

- New versions of existing formats require less development and can be tested more
  quickly.
- What is the process for registering a data format?
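
The estimate above adds up to about six working days.  The sketch below turns that
into a rough calendar date, assuming the work runs back to back with only weekends
skipped; holidays, queueing behind other requests, and CN release scheduling are
ignored, so real lead times will likely be longer.  The names are hypothetical.

.. code-block:: python

   from datetime import date, timedelta

   # Effort figures from the estimate above, in working days.
   EFFORT_DAYS = {
       "development": 2,
       "testing (sandbox, staging)": 2,
       "release": 1,
       "ONEMercury upgrade": 1,
   }


   def add_working_days(start: date, days: int) -> date:
       """Return the date `days` working days after `start`, skipping weekends."""
       current = start
       remaining = days
       while remaining > 0:
           current += timedelta(days=1)
           if current.weekday() < 5:  # Monday=0 ... Friday=4
               remaining -= 1
       return current


   total = sum(EFFORT_DAYS.values())  # 6 working days

For example, work starting on Monday 2024-01-01 would finish on Tuesday 2024-01-09
under these assumptions.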

Remaining issue
~~~~~~~~~~~~~~~
Because failed objects are difficult to re-synchronize, the Member Node depends on
DataONE to register the data format before it can even start entering data on its
node.  This seems like a backwards dependency that puts DataONE resources on the
critical path of external projects.

Q. Is there a more graceful way to handle this situation?
