Äcdocutils.nodes
document
q)Åq}q(U	nametypesq}q(X���opacityqNX���serializingqNX���guidqàX���ogc wktq	àX���immutabilityq
NX���3àX	���authorityqNX���1àX
���uniquenessqNX���2àX���5àX���4àX���granularityq
NX���identifiers in dataoneqNX
���resolvabilityqNX	���structureqNuUsubstitution_defsq}qUparse_messagesq]qUcurrent_sourceqNU
decorationqNUautofootnote_startqKUnameidsq}q(hUopacityqhUserializingqhUguidqh	Uogc-wktqh
UimmutabilityqX���3Uid9qhU	authorityq X���1Uid7q!hU
uniquenessq"X���2Uid8q#X���5Uid11q$X���4Uid10q%h
Ugranularityq&hUidentifiers-in-dataoneq'hU
resolvabilityq(hU	structureq)uUchildrenq*]q+cdocutils.nodes
section
q,)Åq-}q.(U	rawsourceq/U�Uparentq0hUsourceq1X`���/var/lib/jenkins/jobs/API_Documentation_trunk/workspace/api-documentation/source/design/PIDs.txtq2Utagnameq3Usectionq4U
attributesq5}q6(Udupnamesq7]Uclassesq8]Ubackrefsq9]Uidsq:]q;h'aUnamesq<]q=hauUlineq>KUdocumentq?hh*]q@(cdocutils.nodes
title
qA)ÅqB}qC(h/X���Identifiers in DataONEqDh0h-h1h2h3UtitleqEh5}qF(h7]h8]h9]h:]h<]uh>Kh?hh*]qGcdocutils.nodes
Text
qHX���Identifiers in DataONEqIÖÅqJ}qK(h/hDh0hBubaubcdocutils.nodes
paragraph
qL)ÅqM}qN(h/Xp���Identifiers (PIDs, Persistent IDentifiers) are handles that uniquely identify
objects within the DataONE system.qOh0h-h1h2h3U	paragraphqPh5}qQ(h7]h8]h9]h:]h<]uh>Kh?hh*]qRhHXp���Identifiers (PIDs, Persistent IDentifiers) are handles that uniquely identify
objects within the DataONE system.qSÖÅqT}qU(h/hOh0hMubaubcdocutils.nodes
bullet_list
qV)ÅqW}qX(h/U�h0h-h1h2h3Ubullet_listqYh5}qZ(Ubulletq[X���*h:]h9]h7]h8]h<]uh>Kh?hh*]q\(cdocutils.nodes
list_item
q])Åq^}q_(h/XR���All data, metadata, and resource map objects in DataONE have a unique
identifier.
h0hWh1h2h3U	list_itemq`h5}qa(h7]h8]h9]h:]h<]uh>Nh?hh*]qbhL)Åqc}qd(h/XQ���All data, metadata, and resource map objects in DataONE have a unique
identifier.qeh0h^h1h2h3hPh5}qf(h7]h8]h9]h:]h<]uh>Kh*]qghHXQ���All data, metadata, and resource map objects in DataONE have a unique
identifier.qhÖÅqi}qj(h/heh0hcubaubaubh])Åqk}ql(h/Xu���PIDs will always refer to the same set of bytes accessed through the DataONE
API methods such as :func:`MNRead.get`.
h0hWh1h2h3h`h5}qm(h7]h8]h9]h:]h<]uh>Nh?hh*]qnhL)Åqo}qp(h/Xt���PIDs will always refer to the same set of bytes accessed through the DataONE
API methods such as :func:`MNRead.get`.h0hkh1h2h3hPh5}qq(h7]h8]h9]h:]h<]uh>K
h*]qr(hHXa���PIDs will always refer to the same set of bytes accessed through the DataONE
API methods such as qsÖÅqt}qu(h/Xa���PIDs will always refer to the same set of bytes accessed through the DataONE
API methods such as h0houbcsphinx.addnodes
pending_xref
qv)Åqw}qx(h/X���:func:`MNRead.get`qyh0hoh1h2h3Upending_xrefqzh5}q{(UreftypeX���funcUrefwarnq|âU	reftargetq}X
���MNRead.getU	refdomainX���pyq~h:]h9]Urefexplicitâh7]h8]h<]UrefdocqX���design/PIDsqÄUpy:classqÅNU	py:moduleqÇNuh>K
h*]qÉcdocutils.nodes
literal
qÑ)ÅqÖ}qÜ(h/hyh5}qá(h7]h8]qà(Uxrefqâh~X���py-funcqäeh9]h:]h<]uh0hwh*]qãhHX���MNRead.get()qåÖÅqç}qé(h/U�h0hÖubah3UliteralqèubaubhHX���.ÖÅqê}që(h/X���.h0houbeubaubh])Åqí}qì(h/Xh���The location of content identified by a PID is determined by calling the
:func:`CNCore.resolve` method.
h0hWh1h2h3h`h5}qî(h7]h8]h9]h:]h<]uh>Nh?hh*]qïhL)Åqñ}qó(h/Xg���The location of content identified by a PID is determined by calling the
:func:`CNCore.resolve` method.h0híh1h2h3hPh5}qò(h7]h8]h9]h:]h<]uh>K
h*]qô(hHXI���The location of content identified by a PID is determined by calling the
qöÖÅqõ}qú(h/XI���The location of content identified by a PID is determined by calling the
h0hñubhv)Åqù}qû(h/X���:func:`CNCore.resolve`qüh0hñh1h2h3hzh5}q†(UreftypeX���funch|âh}X���CNCore.resolveU	refdomainX���pyq°h:]h9]Urefexplicitâh7]h8]h<]hhÄhÅNhÇNuh>K
h*]q¢hÑ)Åq£}q§(h/hüh5}q•(h7]h8]q¶(hâh°X���py-funcqßeh9]h:]h<]uh0hùh*]q®hHX���CNCore.resolve()q©ÖÅq™}q´(h/U�h0h£ubah3hèubaubhHX��� method.q¨ÖÅq≠}qÆ(h/X��� method.h0hñubeubaubh])ÅqØ}q∞(h/XÅ���PIDs are persistent. Once content is registered with DataONE, the identifier
for that content will remain in the DataONE system.
h0hWh1h2h3h`h5}q±(h7]h8]h9]h:]h<]uh>Nh?hh*]q≤hL)Åq≥}q¥(h/XÄ���PIDs are persistent. Once content is registered with DataONE, the identifier
for that content will remain in the DataONE system.qµh0hØh1h2h3hPh5}q∂(h7]h8]h9]h:]h<]uh>Kh*]q∑hHXÄ���PIDs are persistent. Once content is registered with DataONE, the identifier
for that content will remain in the DataONE system.q∏ÖÅqπ}q∫(h/hµh0h≥ubaubaubh])Åqª}qº(h/X6���PIDs are unique, and can not be reused once assigned.
h0hWh1h2h3h`h5}qΩ(h7]h8]h9]h:]h<]uh>Nh?hh*]qæhL)Åqø}q¿(h/X5���PIDs are unique, and can not be reused once assigned.q¡h0hªh1h2h3hPh5}q¬(h7]h8]h9]h:]h<]uh>Kh*]q√hHX5���PIDs are unique, and can not be reused once assigned.qƒÖÅq≈}q∆(h/h¡h0høubaubaubh])Åq«}q»(h/Xã���PIDs are generally controlled by Member Nodes, however their uniqueness and
immutability is enforced primarily by the Coordinating Nodes.

h0hWh1h2h3h`h5}q…(h7]h8]h9]h:]h<]uh>Nh?hh*]q hL)ÅqÀ}qÃ(h/Xâ���PIDs are generally controlled by Member Nodes, however their uniqueness and
immutability is enforced primarily by the Coordinating Nodes.qÕh0h«h1h2h3hPh5}qŒ(h7]h8]h9]h:]h<]uh>Kh*]qœhHXâ���PIDs are generally controlled by Member Nodes, however their uniqueness and
immutability is enforced primarily by the Coordinating Nodes.q–ÖÅq—}q“(h/hÕh0hÀubaubaubeubh,)Åq”}q‘(h/U�h0h-h1h2h3h4h5}q’(h7]h8]h9]h:]q÷h"ah<]q◊hauh>Kh?hh*]qÿ(hA)ÅqŸ}q⁄(h/X
���Uniquenessq€h0h”h1h2h3hEh5}q‹(h7]h8]h9]h:]h<]uh>Kh?hh*]q›hHX
���UniquenessqfiÖÅqfl}q‡(h/h€h0hŸubaubhL)Åq·}q‚(h/X¢��Generation of identifiers in DataONE is largely under the control of the Member
Nodes (i.e. the data providers), with the requirement that an existing
identifier (i.e. one that is already registered in the DataONE system) can not
be reused. This rule is enforced for new content by checking the uniqueness of a
proposed identifier in the :func:`MNStorage.create` method, and for existing
content by ignoring content with identifiers that are already in use. The
:func:`CNCore.reserveIdentifier` method may be used to reserve an identifier, so
that a client may for example compose a composite object prior to committing the
new content to storage on the Member Node. Similarly, Tier 3 and above Member
Nodes may support the :func:`MNStorage.generateIdentifier` which will typically
delegate to a third party persistent identifier service such as EZID [1]_ to
return an identifier guaranteed to be unique within the DataONE system.h0h”h1h2h3hPh5}q„(h7]h8]h9]h:]h<]uh>Kh?hh*]q‰(hHXR��Generation of identifiers in DataONE is largely under the control of the Member
Nodes (i.e. the data providers), with the requirement that an existing
identifier (i.e. one that is already registered in the DataONE system) can not
be reused. This rule is enforced for new content by checking the uniqueness of a
proposed identifier in the qÂÖÅqÊ}qÁ(h/XR��Generation of identifiers in DataONE is largely under the control of the Member
Nodes (i.e. the data providers), with the requirement that an existing
identifier (i.e. one that is already registered in the DataONE system) can not
be reused. This rule is enforced for new content by checking the uniqueness of a
proposed identifier in the h0h·ubhv)ÅqË}qÈ(h/X���:func:`MNStorage.create`qÍh0h·h1h2h3hzh5}qÎ(UreftypeX���funch|âh}X���MNStorage.createU	refdomainX���pyqÏh:]h9]Urefexplicitâh7]h8]h<]hhÄhÅNhÇNuh>Kh*]qÌhÑ)ÅqÓ}qÔ(h/hÍh5}q(h7]h8]qÒ(hâhÏX���py-funcqÚeh9]h:]h<]uh0hËh*]qÛhHX���MNStorage.create()qÙÖÅqı}qˆ(h/U�h0hÓubah3hèubaubhHXd��� method, and for existing
content by ignoring content with identifiers that are already in use. The
q˜ÖÅq¯}q˘(h/Xd��� method, and for existing
content by ignoring content with identifiers that are already in use. The
h0h·ubhv)Åq˙}q˚(h/X ���:func:`CNCore.reserveIdentifier`q¸h0h·h1h2h3hzh5}q˝(UreftypeX���funch|âh}X���CNCore.reserveIdentifierU	refdomainX���pyq˛h:]h9]Urefexplicitâh7]h8]h<]hhÄhÅNhÇNuh>Kh*]qˇhÑ)År���}r��(h/h¸h5}r��(h7]h8]r��(hâh˛X���py-funcr��eh9]h:]h<]uh0h˙h*]r��hHX���CNCore.reserveIdentifier()r��ÖÅr��}r��(h/U�h0j���ubah3hèubaubhHXÊ��� method may be used to reserve an identifier, so
that a client may for example compose a composite object prior to committing the
new content to storage on the Member Node. Similarly, Tier 3 and above Member
Nodes may support the r	��ÖÅr
��}r��(h/X��� method may be used to reserve an identifier, so
that a client may for example compose a composite object prior to committing the
new content to storage on the Member Node. Similarly, Tier 3 and above Member
Nodes may support the h0h·ubhv)År��}r
��(h/X$���:func:`MNStorage.generateIdentifier`r��h0h·h1h2h3hzh5}r��(UreftypeX���funch|âh}X���MNStorage.generateIdentifierU	refdomainX���pyr��h:]h9]Urefexplicitâh7]h8]h<]hhÄhÅNhÇNuh>Kh*]r��hÑ)År��}r��(h/j��h5}r��(h7]h8]r��(hâj��X���py-funcr��eh9]h:]h<]uh0j��h*]r��hHX���MNStorage.generateIdentifier()r��ÖÅr��}r��(h/U�h0j��ubah3hèubaubhHX[��� which will typically
delegate to a third party persistent identifier service such as EZID r��ÖÅr��}r��(h/X[��� which will typically
delegate to a third party persistent identifier service such as EZID h0h·ubcdocutils.nodes
footnote_reference
r��)År��}r ��(h/X���[1]_Uresolvedr!��Kh0h·h3Ufootnote_referencer"��h5}r#��(h:]r$��Uid1r%��ah9]h7]h8]h<]Urefidr&��h!uh*]r'��hHX���1ÖÅr(��}r)��(h/U�h0j��ubaubhHXK��� to
return an identifier guaranteed to be unique within the DataONE system.r*��ÖÅr+��}r,��(h/XK��� to
return an identifier guaranteed to be unique within the DataONE system.h0h·ubeubeubh,)År-��}r.��(h/U�h0h-h1h2h3h4h5}r/��(h7]h8]h9]h:]r0��h ah<]r1��hauh>K+h?hh*]r2��(hA)År3��}r4��(h/X	���Authorityr5��h0j-��h1h2h3hEh5}r6��(h7]h8]h9]h:]h<]uh>K+h?hh*]r7��hHX	���Authorityr8��ÖÅr9��}r:��(h/j5��h0j3��ubaubhL)År;��}r<��(h/X{��DataONE treats the original identifier (i.e. the first assignment of the
identifier to an object that becomes known to DataONE) as the authoritative
identifier for an object. Although generally not encouraged, multiple
identifiers may refer to a particular object and in such cases, DataONE will
attempt to utilize the original identifier for all communications about the
object.r=��h0j-��h1h2h3hPh5}r>��(h7]h8]h9]h:]h<]uh>K-h?hh*]r?��hHX{��DataONE treats the original identifier (i.e. the first assignment of the
identifier to an object that becomes known to DataONE) as the authoritative
identifier for an object. Although generally not encouraged, multiple
identifiers may refer to a particular object and in such cases, DataONE will
attempt to utilize the original identifier for all communications about the
object.r@��ÖÅrA��}rB��(h/j=��h0j;��ubaubeubh,)ÅrC��}rD��(h/U�h0h-h1h2h3h4h5}rE��(h7]h8]h9]h:]rF��hah<]rG��hauh>K6h?hh*]rH��(hA)ÅrI��}rJ��(h/X���OpacityrK��h0jC��h1h2h3hEh5}rL��(h7]h8]h9]h:]h<]uh>K6h?hh*]rM��hHX���OpacityrN��ÖÅrO��}rP��(h/jK��h0jI��ubaubhL)ÅrQ��}rR��(h/XS��Identifiers utilized by Member Nodes can take many different forms from
automatically generated sequential or random character strings to strings that
conform to schemes such as the LSID [2]_ and DOI [3]_ specifications. DataONE
does not directly utilize implied functionality and services that might be
available for some of the identifier schemes. This is not to say that mechanisms
such as metadata retrieval for LSIDs is not used by any components of the
DataONE infrastructure, but rather that the DataONE infrastructure and services
have no functional dependency on such external services.h0jC��h1h2h3hPh5}rS��(h7]h8]h9]h:]h<]uh>K8h?hh*]rT��(hHXª���Identifiers utilized by Member Nodes can take many different forms from
automatically generated sequential or random character strings to strings that
conform to schemes such as the LSID rU��ÖÅrV��}rW��(h/Xª���Identifiers utilized by Member Nodes can take many different forms from
automatically generated sequential or random character strings to strings that
conform to schemes such as the LSID h0jQ��ubj��)ÅrX��}rY��(h/X���[2]_j!��Kh0jQ��h3j"��h5}rZ��(h:]r[��Uid2r\��ah9]h7]h8]h<]j&��h#uh*]r]��hHX���2ÖÅr^��}r_��(h/U�h0jX��ubaubhHX	��� and DOI r`��ÖÅra��}rb��(h/X	��� and DOI h0jQ��ubj��)Årc��}rd��(h/X���[3]_j!��Kh0jQ��h3j"��h5}re��(h:]rf��Uid3rg��ah9]h7]h8]h<]j&��huh*]rh��hHX���3ÖÅri��}rj��(h/U�h0jc��ubaubhHXá�� specifications. DataONE
does not directly utilize implied functionality and services that might be
available for some of the identifier schemes. This is not to say that mechanisms
such as metadata retrieval for LSIDs is not used by any components of the
DataONE infrastructure, but rather that the DataONE infrastructure and services
have no functional dependency on such external services.rk��ÖÅrl��}rm��(h/Xá�� specifications. DataONE
does not directly utilize implied functionality and services that might be
available for some of the identifier schemes. This is not to say that mechanisms
such as metadata retrieval for LSIDs is not used by any components of the
DataONE infrastructure, but rather that the DataONE infrastructure and services
have no functional dependency on such external services.h0jQ��ubeubhL)Årn��}ro��(h/X��Identifiers are treated as opaque strings in the DataONE system, with no meaning
inferred from structure or pattern that may be present in identifiers. The rules
for identifier construction in DataONE are minimal and intended to ensure
practical utility of identifiers. There is a set of characters that can not be
used within an identifier string (non-printing and whitespace characters), and
the maximum number of characters that such a string may contain (800 characters,
#577). Leading and trailing white space is not allowed.rp��h0jC��h1h2h3hPh5}rq��(h7]h8]h9]h:]h<]uh>KAh?hh*]rr��hHX��Identifiers are treated as opaque strings in the DataONE system, with no meaning
inferred from structure or pattern that may be present in identifiers. The rules
for identifier construction in DataONE are minimal and intended to ensure
practical utility of identifiers. There is a set of characters that can not be
used within an identifier string (non-printing and whitespace characters), and
the maximum number of characters that such a string may contain (800 characters,
#577). Leading and trailing white space is not allowed.rs��ÖÅrt��}ru��(h/jp��h0jn��ubaubeubh,)Årv��}rw��(h/U�h0h-h1h2h3h4h5}rx��(h7]h8]h9]h:]ry��hah<]rz��h
auh>KKh?hh*]r{��(hA)År|��}r}��(h/X���Immutabilityr~��h0jv��h1h2h3hEh5}r��(h7]h8]h9]h:]h<]uh>KKh?hh*]rÄ��hHX���ImmutabilityrÅ��ÖÅrÇ��}rÉ��(h/j~��h0j|��ubaubhL)ÅrÑ��}rÖ��(h/X7��Once assigned and registered in the DataONE infrastructure, an identifier will
always refer to the same sequence of bytes. Generation of other representations
of objects may be supported by services (e.g. an image may be transformed from
TIFF to JPEG), but the identifier will always refer to the original form.r��h0jv��h1h2h3hPh5}r�(h7]h8]h9]h:]h<]uh>KMh?hh*]r�hHX7��Once assigned and registered in the DataONE infrastructure, an identifier will
always refer to the same sequence of bytes. Generation of other representations
of objects may be supported by services (e.g. an image may be transformed from
TIFF to JPEG), but the identifier will always refer to the original form.râ��ÖÅrä��}rã��(h/jÜ��h0jÑ��ubaubeubh,)Årå��}rç��(h/U�h0h-h1h2h3h4h5}ré��(h7]h8]h9]h:]rè��h(ah<]rê��hauh>KTh?hh*]rë��(hA)Årí��}rì��(h/X
���Resolvabilityrî��h0jå��h1h2h3hEh5}rï��(h7]h8]h9]h:]h<]uh>KTh?hh*]rñ��hHX
���Resolvabilityró��ÖÅrò��}rô��(h/jî��h0jí��ubaubhL)Årö��}rõ��(h/Xi��A fundamental goal of DataONE is to ensure that any identifier utilized in the
system is resolvable, that is, DataONE provides a mechanism that will enable the
location of the object to be determined. Resolution is handled by the
Coordinating Nodes through the :func:`CNCore.resolve` method, which returns a
list of nodes from which the object may be retrieved.h0jå��h1h2h3hPh5}rú��(h7]h8]h9]h:]h<]uh>KVh?hh*]rù��(hHX��A fundamental goal of DataONE is to ensure that any identifier utilized in the
system is resolvable, that is, DataONE provides a mechanism that will enable the
location of the object to be determined. Resolution is handled by the
Coordinating Nodes through the rû��ÖÅrü��}r†��(h/X��A fundamental goal of DataONE is to ensure that any identifier utilized in the
system is resolvable, that is, DataONE provides a mechanism that will enable the
location of the object to be determined. Resolution is handled by the
Coordinating Nodes through the h0jö��ubhv)År°��}r¢��(h/X���:func:`CNCore.resolve`r£��h0jö��h1h2h3hzh5}r§��(UreftypeX���funch|âh}X���CNCore.resolveU	refdomainX���pyr•��h:]h9]Urefexplicitâh7]h8]h<]hhÄhÅNhÇNuh>KVh*]r¶��hÑ)Årß��}r®��(h/j£��h5}r©��(h7]h8]r™��(hâj•��X���py-funcr´��eh9]h:]h<]uh0j°��h*]r¨��hHX���CNCore.resolve()r≠��ÖÅrÆ��}rØ��(h/U�h0jß��ubah3hèubaubhHXN��� method, which returns a
list of nodes from which the object may be retrieved.r∞��ÖÅr±��}r≤��(h/XN��� method, which returns a
list of nodes from which the object may be retrieved.h0jö��ubeubhL)År≥��}r¥��(h/X«���A guarantee of identifier resolvability is an important, core function of the
DataONE infrastructure upon which many other services may be constructed, both
within DataONE and by third party systems.rµ��h0jå��h1h2h3hPh5}r∂��(h7]h8]h9]h:]h<]uh>K\h?hh*]r∑��hHX«���A guarantee of identifier resolvability is an important, core function of the
DataONE infrastructure upon which many other services may be constructed, both
within DataONE and by third party systems.r∏��ÖÅrπ��}r∫��(h/jµ��h0j≥��ubaubeubh,)Årª��}rº��(h/U�h0h-h1h2h3h4h5}rΩ��(h7]h8]h9]h:]ræ��h&ah<]rø��h
auh>Kbh?hh*]r¿��(hA)År¡��}r¬��(h/X���Granularityr√��h0jª��h1h2h3hEh5}rƒ��(h7]h8]h9]h:]h<]uh>Kbh?hh*]r≈��hHX���Granularityr∆��ÖÅr«��}r»��(h/j√��h0j¡��ubaubhL)År…��}r ��(h/XN��Identifiers refer to managed objects in DataONE. Initially data, science metadata
documents, and resource maps have identifiers. The definition of "data" is
somewhat arbitrary though, and a single data object may be a single record
within some larger collection, or may refer to an entire set of records
contained within some package.rÀ��h0jª��h1h2h3hPh5}rÃ��(h7]h8]h9]h:]h<]uh>Kdh?hh*]rÕ��hHXN��Identifiers refer to managed objects in DataONE. Initially data, science metadata
documents, and resource maps have identifiers. The definition of "data" is
somewhat arbitrary though, and a single data object may be a single record
within some larger collection, or may refer to an entire set of records
contained within some package.rŒ��ÖÅrœ��}r–��(h/jÀ��h0j…��ubaubeubh,)År—��}r“��(h/U�h0h-h1h2h3h4h5}r”��(h7]h8]h9]h:]r‘��h)ah<]r’��hauh>Klh?hh*]r÷��(hA)År◊��}rÿ��(h/X	���StructurerŸ��h0j—��h1h2h3hEh5}r⁄��(h7]h8]h9]h:]h<]uh>Klh?hh*]r€��hHX	���Structurer‹��ÖÅr›��}rfi��(h/jŸ��h0j◊��ubaubhL)Årfl��}r‡��(h/X!��The characters that may appear in an identifier string acceptable to the
DataONE system is constrained by the XMLSchema definition
(:class:`Types.Identifier`), which is essentially a string of length greater
than zero but less than 800 characters with no whitespace (spaces, tabs,
non-printing characters, carriage returns, new lines). Identifiers may be
Unicode provided they conform to the fairly liberal restrictions imposed by
the XML specification [4]_. Examples of valid identifiers in DataONE are shown
in the section *Serializing* below.h0j—��h1h2h3hPh5}r·��(h7]h8]h9]h:]h<]uh>Knh?hh*]r‚��(hHXÑ���The characters that may appear in an identifier string acceptable to the
DataONE system is constrained by the XMLSchema definition
(r„��ÖÅr‰��}rÂ��(h/XÑ���The characters that may appear in an identifier string acceptable to the
DataONE system is constrained by the XMLSchema definition
(h0jfl��ubhv)ÅrÊ��}rÁ��(h/X���:class:`Types.Identifier`rË��h0jfl��h1h2h3hzh5}rÈ��(UreftypeX���classh|âh}X���Types.IdentifierU	refdomainX���pyrÍ��h:]h9]Urefexplicitâh7]h8]h<]hhÄhÅNhÇNuh>Knh*]rÎ��hÑ)ÅrÏ��}rÌ��(h/jË��h5}rÓ��(h7]h8]rÔ��(hâjÍ��X���py-classr��eh9]h:]h<]uh0jÊ��h*]rÒ��hHX���Types.IdentifierrÚ��ÖÅrÛ��}rÙ��(h/U�h0jÏ��ubah3hèubaubhHX(��), which is essentially a string of length greater
than zero but less than 800 characters with no whitespace (spaces, tabs,
non-printing characters, carriage returns, new lines). Identifiers may be
Unicode provided they conform to the fairly liberal restrictions imposed by
the XML specification rı��ÖÅrˆ��}r˜��(h/X(��), which is essentially a string of length greater
than zero but less than 800 characters with no whitespace (spaces, tabs,
non-printing characters, carriage returns, new lines). Identifiers may be
Unicode provided they conform to the fairly liberal restrictions imposed by
the XML specification h0jfl��ubj��)År¯��}r˘��(h/X���[4]_j!��Kh0jfl��h3j"��h5}r˙��(h:]r˚��Uid4r¸��ah9]h7]h8]h<]j&��h%uh*]r˝��hHX���4ÖÅr˛��}rˇ��(h/U�h0j¯��ubaubhHXD���. Examples of valid identifiers in DataONE are shown
in the section r���ÖÅr��}r��(h/XD���. Examples of valid identifiers in DataONE are shown
in the section h0jfl��ubcdocutils.nodes
emphasis
r��)År��}r��(h/X
���*Serializing*h5}r��(h7]h8]h9]h:]h<]uh0jfl��h*]r��hHX���Serializingr��ÖÅr	��}r
��(h/U�h0j��ubah3Uemphasisr��ubhHX��� below.r��ÖÅr
��}r��(h/X��� below.h0jfl��ubeubeubh,)År��}r��(h/U�h0h-h1h2h3h4h5}r��(h7]h8]h9]h:]r��hah<]r��hauh>Kyh?hh*]r��(hA)År��}r��(h/X���Serializingr��h0j��h1h2h3hEh5}r��(h7]h8]h9]h:]h<]uh>Kyh?hh*]r��hHX���Serializingr��ÖÅr��}r��(h/j��h0j��ubaubhL)År��}r��(h/XT���When identifiers appear in text, the full identifier should be presented
unmodified.r��h0j��h1h2h3hPh5}r ��(h7]h8]h9]h:]h<]uh>K{h?hh*]r!��hHXT���When identifiers appear in text, the full identifier should be presented
unmodified.r"��ÖÅr#��}r$��(h/j��h0j��ubaubhL)År%��}r&��(h/X¬���Identifiers appearing in URLs or other representations that have reserved
characters should be escaped according to the rules of the targeted
serialization format. For example, the identifiers::h0j��h1h2h3hPh5}r'��(h7]h8]h9]h:]h<]uh>K~h?hh*]r(��hHX¡���Identifiers appearing in URLs or other representations that have reserved
characters should be escaped according to the rules of the targeted
serialization format. For example, the identifiers:r)��ÖÅr*��}r+��(h/X¡���Identifiers appearing in URLs or other representations that have reserved
characters should be escaped according to the rules of the targeted
serialization format. For example, the identifiers:h0j%��ubaubcdocutils.nodes
literal_block
r,��)År-��}r.��(h/XÒ���10.1000/182
urn:lsid:ubio.org:namebank:11815
http://example.com/data/mydata?row=24
ldap://ldap1.example.net:6666/o=University%20of%20Michigan,c=US??sub?(cn=Babs%20Jensen)
ฉันกินกระจกได้
Is_f√©idir_liom_ithe_gloineh0j��h1h2h3U
literal_blockr/��h5}r0��(U	xml:spacer1��Upreserver2��h:]h9]h7]h8]h<]uh>KÇh?hh*]r3��hHXÒ���10.1000/182
urn:lsid:ubio.org:namebank:11815
http://example.com/data/mydata?row=24
ldap://ldap1.example.net:6666/o=University%20of%20Michigan,c=US??sub?(cn=Babs%20Jensen)
ฉันกินกระจกได้
Is_f√©idir_liom_ithe_gloiner4��ÖÅr5��}r6��(h/U�h0j-��ubaubhL)År7��}r8��(h/Xê���would be serialized in DataONE :func:`MNRead.get` URLs (or any other URL path)
according to RFC3986_ encoding guidelines for URI path segments::h0j��h1h2h3hPh5}r9��(h7]h8]h9]h:]h<]uh>Kâh?hh*]r:��(hHX���would be serialized in DataONE r;��ÖÅr<��}r=��(h/X���would be serialized in DataONE h0j7��ubhv)År>��}r?��(h/X���:func:`MNRead.get`r@��h0j7��h1h2h3hzh5}rA��(UreftypeX���funch|âh}X
���MNRead.getU	refdomainX���pyrB��h:]h9]Urefexplicitâh7]h8]h<]hhÄhÅNhÇNuh>Kâh*]rC��hÑ)ÅrD��}rE��(h/j@��h5}rF��(h7]h8]rG��(hâjB��X���py-funcrH��eh9]h:]h<]uh0j>��h*]rI��hHX���MNRead.get()rJ��ÖÅrK��}rL��(h/U�h0jD��ubah3hèubaubhHX+��� URLs (or any other URL path)
according to rM��ÖÅrN��}rO��(h/X+��� URLs (or any other URL path)
according to h0j7��ubcdocutils.nodes
problematic
rP��)ÅrQ��}rR��(h/X���RFC3986_rS��h0j7��h1Nh3UproblematicrT��h5}rU��(h:]rV��Uid13rW��ah9]h7]h8]h<]UrefidUid12rX��uh>Nh?hh*]rY��hHX���RFC3986_rZ��ÖÅr[��}r\��(h/U�h0jQ��ubaubhHX+��� encoding guidelines for URI path segments:r]��ÖÅr^��}r_��(h/X+��� encoding guidelines for URI path segments:h0j7��ubeubj,��)År`��}ra��(h/X'��http://mn.example.com/mn/object/10.1000%2F182
http://mn.example.com/mn/object/urn:lsid:ubio.org:namebank:11815
http://mn.example.com/mn/object/http:%2F%2Fexample.com%2Fdata%2Fmydata%3Frow=24
http://mn.example.com/mn/object/ldap:%2F%2Fldap1.example.net:6666%2Fo=University%2520of%2520Michigan,c=US%3F%3Fsub%3F(cn=Babs%2520Jensen)
http://mn.example.com/mn/object/%E0%B8%89%E0%B8%B1%E0%B8%99%E0%B8%81%E0%B8%B4%E0%B8%99%E0%B8%81%E0%B8%A3%E0%B8%B0%E0%B8%88%E0%B8%81%E0%B9%84%E0%B8%94%E0%B9%89
http://mn.example.com/mn/object/Is_f%C3%A9idir_liom_ithe_gloineh0j��h1h2h3j/��h5}rb��(j1��j2��h:]h9]h7]h8]h<]uh>Kåh?hh*]rc��hHX'��http://mn.example.com/mn/object/10.1000%2F182
http://mn.example.com/mn/object/urn:lsid:ubio.org:namebank:11815
http://mn.example.com/mn/object/http:%2F%2Fexample.com%2Fdata%2Fmydata%3Frow=24
http://mn.example.com/mn/object/ldap:%2F%2Fldap1.example.net:6666%2Fo=University%2520of%2520Michigan,c=US%3F%3Fsub%3F(cn=Babs%2520Jensen)
http://mn.example.com/mn/object/%E0%B8%89%E0%B8%B1%E0%B8%99%E0%B8%81%E0%B8%B4%E0%B8%99%E0%B8%81%E0%B8%A3%E0%B8%B0%E0%B8%88%E0%B8%81%E0%B9%84%E0%B8%94%E0%B9%89
http://mn.example.com/mn/object/Is_f%C3%A9idir_liom_ithe_gloinerd��ÖÅre��}rf��(h/U�h0j`��ubaubcdocutils.nodes
note
rg��)Årh��}ri��(h/X§��The "+" (plus) character is a special case since it was once treated as a
space character in URLs, and was changed in RFC3986 [5]_ such that the "+"
would not be treated as a space. To minimize confusion when the plus
character appears in an identifier, DataONE recommends that the character
is percent escaped (``%2B``) when it appears in DataONE service URLs. All
DataONE libraries and services operate in this manner.h0j��h1h2h3Unoterj��h5}rk��(h7]h8]h9]h:]h<]uh>Nh?hh*]rl��hL)Årm��}rn��(h/X§��The "+" (plus) character is a special case since it was once treated as a
space character in URLs, and was changed in RFC3986 [5]_ such that the "+"
would not be treated as a space. To minimize confusion when the plus
character appears in an identifier, DataONE recommends that the character
is percent escaped (``%2B``) when it appears in DataONE service URLs. All
DataONE libraries and services operate in this manner.h0jh��h1h2h3hPh5}ro��(h7]h8]h9]h:]h<]uh>Kñh*]rp��(hHX~���The "+" (plus) character is a special case since it was once treated as a
space character in URLs, and was changed in RFC3986 rq��ÖÅrr��}rs��(h/X~���The "+" (plus) character is a special case since it was once treated as a
space character in URLs, and was changed in RFC3986 h0jm��ubj��)Årt��}ru��(h/X���[5]_j!��Kh0jm��h3j"��h5}rv��(h:]rw��Uid5rx��ah9]h7]h8]h<]j&��h$uh*]ry��hHX���5ÖÅrz��}r{��(h/U�h0jt��ubaubhHX∂��� such that the "+"
would not be treated as a space. To minimize confusion when the plus
character appears in an identifier, DataONE recommends that the character
is percent escaped (r|��ÖÅr}��}r~��(h/X∂��� such that the "+"
would not be treated as a space. To minimize confusion when the plus
character appears in an identifier, DataONE recommends that the character
is percent escaped (h0jm��ubhÑ)År��}rÄ��(h/X���``%2B``h5}rÅ��(h7]h8]h9]h:]h<]uh0jm��h*]rÇ��hHX���%2BrÉ��ÖÅrÑ��}rÖ��(h/U�h0j��ubah3hèubhHXe���) when it appears in DataONE service URLs. All
DataONE libraries and services operate in this manner.rÜ��ÖÅrá��}rà��(h/Xe���) when it appears in DataONE service URLs. All
DataONE libraries and services operate in this manner.h0jm��ubeubaubhL)Årâ��}rä��(h/Xu��The necessary encoding of URLs can be usually achieved through standard
libraries available in many languages, with the caveat that the encoding
follows the RFC3986 encoding rules. Many packages over-escape, keeping only
the unreserved character set unescaped. For its client libraries, DataONE is
taking a minimal escaping approach within the latitude RFC3986 allows.
Specifically, using [pchar] - ['+'] as the set of unescaped characters for
identifiers in path segments, and [pchar] - ['+', '&', '='] + ['/', '?'] for
identifiers in query segments, (segments in both cases meaning characters
between delimiters). For example::h0j��h1h2h3hPh5}rã��(h7]h8]h9]h:]h<]uh>Kûh?hh*]rå��hHXt��The necessary encoding of URLs can be usually achieved through standard
libraries available in many languages, with the caveat that the encoding
follows the RFC3986 encoding rules. Many packages over-escape, keeping only
the unreserved character set unescaped. For its client libraries, DataONE is
taking a minimal escaping approach within the latitude RFC3986 allows.
Specifically, using [pchar] - ['+'] as the set of unescaped characters for
identifiers in path segments, and [pchar] - ['+', '&', '='] + ['/', '?'] for
identifiers in query segments, (segments in both cases meaning characters
between delimiters). For example:rç��ÖÅré��}rè��(h/Xt��The necessary encoding of URLs can be usually achieved through standard
libraries available in many languages, with the caveat that the encoding
follows the RFC3986 encoding rules. Many packages over-escape, keeping only
the unreserved character set unescaped. For its client libraries, DataONE is
taking a minimal escaping approach within the latitude RFC3986 allows.
Specifically, using [pchar] - ['+'] as the set of unescaped characters for
identifiers in path segments, and [pchar] - ['+', '&', '='] + ['/', '?'] for
identifiers in query segments, (segments in both cases meaning characters
between delimiters). For example:h0jâ��ubaubj,��)Årê��}rë��(h/XQ���example-location-dependent-__/__?__&__=__
example-common-unescaped-;:@$-_.!*()',~h0j��h1h2h3j/��h5}rí��(j1��j2��h:]h9]h7]h8]h<]uh>K®h?hh*]rì��hHXQ���example-location-dependent-__/__?__&__=__
example-common-unescaped-;:@$-_.!*()',~rî��ÖÅrï��}rñ��(h/U�h0jê��ubaubhL)Åró��}rò��(h/X���will be encoded in paths to::rô��h0j��h1h2h3hPh5}rö��(h7]h8]h9]h:]h<]uh>K´h?hh*]rõ��hHX���will be encoded in paths to:rú��ÖÅrù��}rû��(h/X���will be encoded in paths to:h0jó��ubaubj,��)Årü��}r†��(h/XU���example-location-dependent-__%2F__%3F__&__=__
example-common-unescaped-;:@$-_.!*()',~h0j��h1h2h3j/��h5}r°��(j1��j2��h:]h9]h7]h8]h<]uh>K≠h?hh*]r¢��hHXU���example-location-dependent-__%2F__%3F__&__=__
example-common-unescaped-;:@$-_.!*()',~r£��ÖÅr§��}r•��(h/U�h0jü��ubaubhL)År¶��}rß��(h/X%���and encoded in the query section to::r®��h0j��h1h2h3hPh5}r©��(h7]h8]h9]h:]h<]uh>K∞h?hh*]r™��hHX$���and encoded in the query section to:r´��ÖÅr¨��}r≠��(h/X$���and encoded in the query section to:h0j¶��ubaubj,��)ÅrÆ��}rØ��(h/XU���example-location-dependent-__/__?__%26__%3D__
example-common-unescaped-;:@$-_.!*()',~h0j��h1h2h3j/��h5}r∞��(j1��j2��h:]h9]h7]h8]h<]uh>K≤h?hh*]r±��hHXU���example-location-dependent-__/__?__%26__%3D__
example-common-unescaped-;:@$-_.!*()',~r≤��ÖÅr≥��}r¥��(h/U�h0jÆ��ubaubhL)Årµ��}r∂��(h/X|��Note that RFC3986 [5]_ treats the query section of the URI as a blackbox, so '&'
and '=' are unescaped (to be used as sub-delimiters). For the purpose of
encoding content, we take the approach of encoding at the segment level, so
need to escape those characters. For those implementations using standard
encoding routines, it is important to know that package's treatment of this.h0j��h1h2h3hPh5}r∑��(h7]h8]h9]h:]h<]uh>Kµh?hh*]r∏��(hHX���Note that RFC3986 rπ��ÖÅr∫��}rª��(h/X���Note that RFC3986 h0jµ��ubj��)Årº��}rΩ��(h/X���[5]_j!��Kh0jµ��h3j"��h5}ræ��(h:]rø��Uid6r¿��ah9]h7]h8]h<]j&��h$uh*]r¡��hHX���5ÖÅr¬��}r√��(h/U�h0jº��ubaubhHXf�� treats the query section of the URI as a blackbox, so '&'
and '=' are unescaped (to be used as sub-delimiters). For the purpose of
encoding content, we take the approach of encoding at the segment level, so
need to escape those characters. For those implementations using standard
encoding routines, it is important to know that package's treatment of this.rƒ��ÖÅr≈��}r∆��(h/Xf�� treats the query section of the URI as a blackbox, so '&'
and '=' are unescaped (to be used as sub-delimiters). For the purpose of
encoding content, we take the approach of encoding at the segment level, so
need to escape those characters. For those implementations using standard
encoding routines, it is important to know that package's treatment of this.h0jµ��ubeubhL)År«��}r»��(h/X#��The following examples in Python and Java illustrate percent encoding of data
such as an identifier appropriate for appending to a URL. Each processes utf-8
encoded input through *stdin* and outputs percent encoded or decoded
responses. In java pseudo-code the general process is as follows.h0j��h1h2h3hPh5}r…��(h7]h8]h9]h:]h<]uh>Kªh?hh*]r ��(hHX≥���The following examples in Python and Java illustrate percent encoding of data
such as an identifier appropriate for appending to a URL. Each processes utf-8
encoded input through rÀ��ÖÅrÃ��}rÕ��(h/X≥���The following examples in Python and Java illustrate percent encoding of data
such as an identifier appropriate for appending to a URL. Each processes utf-8
encoded input through h0j«��ubj��)ÅrŒ��}rœ��(h/X���*stdin*h5}r–��(h7]h8]h9]h:]h<]uh0j«��h*]r—��hHX���stdinr“��ÖÅr”��}r‘��(h/U�h0jŒ��ubah3j��ubhHXi��� and outputs percent encoded or decoded
responses. In java pseudo-code the general process is as follows.r’��ÖÅr÷��}r◊��(h/Xi��� and outputs percent encoded or decoded
responses. In java pseudo-code the general process is as follows.h0j«��ubeubj,��)Årÿ��}rŸ��(h/XÙ��// pseudo-code: this will not compile!

CharacterSet PATH_SAFE = RFC3986_PCHAR and not ['+'];
CharacterSet QUERY_SAFE = PATH_SAFE and not ['&','='] or ['?','/'];

String encodeUtf8_pathSegment(identifier) {
    String utf8ID = identifier.translate("UTF-8");
    return encodedID = percentEscape(utf8ID,PATH_SAFE);
}

String encodeUtf8_querySegment(identifier) {
    String utf8ID = identifier.translate("UTF-8");
    return encodedID = percentEscape(utf8ID,QUERY_SAFE);
}

String decodeString(string) {
    // older clients may encode spaces with '+'
    // so if we see them in the input, it is due to that
    // and we need to decode them, too.

    String correctedString = string.replace("+","%2B");
    return decodePercentEscaped(correctedString);
}h0j��h1h2h3j/��h5}r⁄��(Ulinenosr€��âUlanguager‹��X���javaj1��j2��h:]h9]h7]Uhighlight_argsr›��}h8]h<]uh>K¿h?hh*]rfi��hHXÙ��// pseudo-code: this will not compile!

CharacterSet PATH_SAFE = RFC3986_PCHAR and not ['+'];
CharacterSet QUERY_SAFE = PATH_SAFE and not ['&','='] or ['?','/'];

String encodeUtf8_pathSegment(identifier) {
    String utf8ID = identifier.translate("UTF-8");
    return encodedID = percentEscape(utf8ID,PATH_SAFE);
}

String encodeUtf8_querySegment(identifier) {
    String utf8ID = identifier.translate("UTF-8");
    return encodedID = percentEscape(utf8ID,QUERY_SAFE);
}

String decodeString(string) {
    // older clients may encode spaces with '+'
    // so if we see them in the input, it is due to that
    // and we need to decode them, too.

    String correctedString = string.replace("+","%2B");
    return decodePercentEscaped(correctedString);
}rfl��ÖÅr‡��}r·��(h/U�h0jÿ��ubaubj,��)År‚��}r„��(h/Xá��import sys
import codecs
import urllib

def pctEncode(data):
  '''Encode the unicode string data as utf-8 then percent encode that
  ready for appending as a path element to a URL.
  '''
  response = urllib.quote(data.encode("utf-8"), safe=":")
  return response


def pctDecode(data):
  '''Decode a percent encoded string and return the unicode object.
  but first handle any mistaken '+' in the data string
  '''
  data = data.replace("+","%2B")
  response = urllib.unquote(data)
  return response


if __name__ == "__main__":
  '''
  Read utf-8 encoded input from stdin and percent encode or
  decode (with command line argument -d).

  e.g. given test_ids.txt, a UTF-8 encoded file with identifiers
  appearing one per line:
    cat test_ids.txt | python PctEncode.py | python PctEncode.py -d

  should output equivalent to:
    cat test_ids.txt
  '''
  doEncode = True
  try:
    if sys.argv[1] == "-d":
      doEncode = False
  except IndexError:
    pass
  id = unicode(sys.stdin.readline(), "utf-8").strip()
  while len(id) > 0:
    if doEncode:
      print pctEncode(id)
    else:
      print pctDecode(id)
    id = unicode(sys.stdin.readline(), "utf-8").strip()h0j��h1h2h3j/��h5}r‰��(j€��âj‹��X���pythonj1��j2��h:]h9]h7]j›��}h8]h<]uh>K€h?hh*]rÂ��hHXá��import sys
import codecs
import urllib

def pctEncode(data):
  '''Encode the unicode string data as utf-8 then percent encode that
  ready for appending as a path element to a URL.
  '''
  response = urllib.quote(data.encode("utf-8"), safe=":")
  return response


def pctDecode(data):
  '''Decode a percent encoded string and return the unicode object.
  but first handle any mistaken '+' in the data string
  '''
  data = data.replace("+","%2B")
  response = urllib.unquote(data)
  return response


if __name__ == "__main__":
  '''
  Read utf-8 encoded input from stdin and percent encode or
  decode (with command line argument -d).

  e.g. given test_ids.txt, a UTF-8 encoded file with identifiers
  appearing one per line:
    cat test_ids.txt | python PctEncode.py | python PctEncode.py -d

  should output equivalent to:
    cat test_ids.txt
  '''
  doEncode = True
  try:
    if sys.argv[1] == "-d":
      doEncode = False
  except IndexError:
    pass
  id = unicode(sys.stdin.readline(), "utf-8").strip()
  while len(id) > 0:
    if doEncode:
      print pctEncode(id)
    else:
      print pctDecode(id)
    id = unicode(sys.stdin.readline(), "utf-8").strip()rÊ��ÖÅrÁ��}rË��(h/U�h0j‚��ubaubj,��)ÅrÈ��}rÍ��(h/X7
��import java.io.*;
import java.net.*;

class PctEncode
{
  /**
  Simple example of URL path encoding of UTF-8 strings for including as
  path elements in URLs as per RFC3986.

  e.g. given test_ids.txt, a UTF-8 encoded file with identifiers
  appearing one per line:
    cat test_ids.txt | java PctEncode | java PctEncode -d

  should output equivalent to:
    cat test_ids.txt
  */

  public static String pctDecode(String data) {
    /**
    Decode a percent encoded string, returning a Java Unicode string
    */
    String response = null;
    try {
      data = data.replace("+","%2B");
      response = URLDecoder.decode( data, "UTF-8");
    } catch (java.io.UnsupportedEncodingException e) {
      System.out.println("Error pctDecode : " + e.getMessage());
    }
    return response;
  }


  public static String pctEncodePathSegment(String data) {
    /**
    Encode a Java string according to the path encoding rules in
    RFC3986. Note that this does not encode properly for data that
    is to be the root of the path; it is assumed that the data will
    be appended to the end of a URL path.
    */
    String response = null;
    try {
      response = URLEncoder.encode( data, "UTF-8" );
      // fix outdated space-to-+ convention
      response = response.replace("+","%20");
      // now un-escape for minimally escaped result
      response = response.replace("%3A",":").replace("%28","(");
      response = response.replace("%3B",";").replace("%29",")");
      response = response.replace("%40","@").replace("%27","'");
      response = response.replace("%24","$").replace("%2C",",");
      response = response.replace("%21","!").replace("%7E","~");

    } catch (java.io.UnsupportedEncodingException e) {
      System.out.println("Error  pctEncode: " + e.getMessage());
    }
    return response;
  }


  public static void main( String[] args ) {
    try {
      boolean doEncode = true;
      try {
        if (args[0].equals( "-d" ))
          doEncode = false;
      } catch(ArrayIndexOutOfBoundsException e) {
      }

      PrintStream outs = new PrintStream( System.out, true, "UTF-8" );
      InputStreamReader isr = new InputStreamReader( System.in, "UTF-8" );
      BufferedReader reader = new BufferedReader( isr );
      String id = null;
      String data = null;
      while ( (id = reader.readLine()) != null ) {
        if (doEncode) {
          data = pctEncodePathSegment( id );
        } else {
          data = pctDecode( id );
        }
        outs.println( data );
      }
    } catch(java.io.IOException e) {
      System.out.println("Error main: " + e.getMessage());
    }
  }
}h0j��h1h2h3j/��h5}rÎ��(j€��âj‹��X���javaj1��j2��h:]h9]h7]j›��}h8]h<]uh>M
h?hh*]r��hHX7
��import java.io.*;
import java.net.*;

class PctEncode
{
  /**
  Simple example of URL path encoding of UTF-8 strings for including as
  path elements in URLs as per RFC3986.

  e.g. given test_ids.txt, a UTF-8 encoded file with identifiers
  appearing one per line:
    cat test_ids.txt | java PctEncode | java PctEncode -d

  should output equivalent to:
    cat test_ids.txt
  */

  public static String pctDecode(String data) {
    /**
    Decode a percent encoded string, returning a Java Unicode string
    */
    String response = null;
    try {
      data = data.replace("+","%2B");
      response = URLDecoder.decode( data, "UTF-8");
    } catch (java.io.UnsupportedEncodingException e) {
      System.out.println("Error pctDecode : " + e.getMessage());
    }
    return response;
  }


  public static String pctEncodePathSegment(String data) {
    /**
    Encode a Java string according to the path encoding rules in
    RFC3986. Note that this does not encode properly for data that
    is to be the root of the path; it is assumed that the data will
    be appended to the end of a URL path.
    */
    String response = null;
    try {
      response = URLEncoder.encode( data, "UTF-8" );
      // fix outdated space-to-+ convention
      response = response.replace("+","%20");
      // now un-escape for minimally escaped result
      response = response.replace("%3A",":").replace("%28","(");
      response = response.replace("%3B",";").replace("%29",")");
      response = response.replace("%40","@").replace("%27","'");
      response = response.replace("%24","$").replace("%2C",",");
      response = response.replace("%21","!").replace("%7E","~");

    } catch (java.io.UnsupportedEncodingException e) {
      System.out.println("Error  pctEncode: " + e.getMessage());
    }
    return response;
  }


  public static void main( String[] args ) {
    try {
      boolean doEncode = true;
      try {
        if (args[0].equals( "-d" ))
          doEncode = false;
      } catch(ArrayIndexOutOfBoundsException e) {
      }

      PrintStream outs = new PrintStream( System.out, true, "UTF-8" );
      InputStreamReader isr = new InputStreamReader( System.in, "UTF-8" );
      BufferedReader reader = new BufferedReader( isr );
      String id = null;
      String data = null;
      while ( (id = reader.readLine()) != null ) {
        if (doEncode) {
          data = pctEncodePathSegment( id );
        } else {
          data = pctDecode( id );
        }
        outs.println( data );
      }
    } catch(java.io.IOException e) {
      System.out.println("Error main: " + e.getMessage());
    }
  }
}rÌ��ÖÅrÓ��}rÔ��(h/U�h0jÈ��ubaubhL)År��}rÒ��(h/XH���Given this code and a utf-8 encoded source file *test_ids.txt* such as::rÚ��h0j��h1h2h3hPh5}rÛ��(h7]h8]h9]h:]h<]uh>Mfh?hh*]rÙ��(hHX0���Given this code and a utf-8 encoded source file rı��ÖÅrˆ��}r˜��(h/X0���Given this code and a utf-8 encoded source file h0j��ubj��)År¯��}r˘��(h/X���*test_ids.txt*h5}r˙��(h7]h8]h9]h:]h<]uh0j��h*]r˚��hHX���test_ids.txtr¸��ÖÅr˝��}r˛��(h/U�h0j¯��ubah3j��ubhHX	��� such as:rˇ��ÖÅr���}r��(h/X	��� such as:h0j��ubeubj,��)År��}r��(h/X˘���√∂
10.1000/182
urn:lsid:ubio.org:namebank:11815
http://example.com/data/mydata?row=24
ldap://ldap1.example.net:6666/o=University%20of%20Michigan,%20c=US??sub?(cn=Babs%20Jensen)",
ฉันกินกระจกได้
Is_f√©idir_liom_ithe_gloineh0j��h1h2h3j/��h5}r��(j1��j2��h:]h9]h7]h8]h<]uh>Mhh?hh*]r��hHX˘���√∂
10.1000/182
urn:lsid:ubio.org:namebank:11815
http://example.com/data/mydata?row=24
ldap://ldap1.example.net:6666/o=University%20of%20Michigan,%20c=US??sub?(cn=Babs%20Jensen)",
ฉันกินกระจกได้
Is_f√©idir_liom_ithe_gloiner��ÖÅr��}r��(h/U�h0j��ubaubhL)År	��}r
��(h/XG���The following commands should output the same as ``cat test_ids.txt``::r��h0j��h1h2h3hPh5}r��(h7]h8]h9]h:]h<]uh>Mph?hh*]r
��(hHX1���The following commands should output the same as r��ÖÅr��}r��(h/X1���The following commands should output the same as h0j	��ubhÑ)År��}r��(h/X���``cat test_ids.txt``h5}r��(h7]h8]h9]h:]h<]uh0j	��h*]r��hHX���cat test_ids.txtr��ÖÅr��}r��(h/U�h0j��ubah3hèubhHX���:ÖÅr��}r��(h/X���:h0j	��ubeubj,��)År��}r��(h/Xu���cat test_ids.txt | java PctEncode | python PctEncode.py -d
cat test_ids.txt | python PctEncode.py | java PctEncode -dh0j��h1h2h3j/��h5}r��(j1��j2��h:]h9]h7]h8]h<]uh>Mrh?hh*]r��hHXu���cat test_ids.txt | java PctEncode | python PctEncode.py -d
cat test_ids.txt | python PctEncode.py | java PctEncode -dr��ÖÅr��}r ��(h/U�h0j��ubaubcdocutils.nodes
target
r!��)År"��}r#��(h/XK���.. _guid: http://en.wikipedia.org/wiki/Globally_unique_identifier#Algorithmh0j��h1h2h3Utargetr$��h5}r%��(Urefurir&��XA���http://en.wikipedia.org/wiki/Globally_unique_identifier#Algorithmh:]r'��hah9]h7]h8]h<]r(��hauh>Mvh?hh*]ubj!��)År)��}r*��(h/X9���.. _OGC WKT: http://en.wikipedia.org/wiki/Well-known_texth0j��h1h2h3j$��h5}r+��(j&��X,���http://en.wikipedia.org/wiki/Well-known_texth:]r,��hah9]h7]h8]h<]r-��h	auh>Myh?hh*]ubcdocutils.nodes
footnote
r.��)År/��}r0��(h/X���http://n2t.net/ezid/
j!��Kh0j��h1h2h3Ufootnoter1��h5}r2��(h7]h8]h9]r3��j%��ah:]r4��h!ah<]r5��X���1auh>M{h?hh*]r6��(cdocutils.nodes
label
r7��)År8��}r9��(h/X���1h5}r:��(h7]h8]h9]h:]h<]uh0j/��h*]r;��hHX���1ÖÅr<��}r=��(h/U�h0j8��ubah3Ulabelr>��ubhL)År?��}r@��(h/X���http://n2t.net/ezid/rA��h0j/��h1h2h3hPh5}rB��(h7]h8]h9]h:]h<]uh>M{h*]rC��cdocutils.nodes
reference
rD��)ÅrE��}rF��(h/jA��h5}rG��(UrefurijA��h:]h9]h7]h8]h<]uh0j?��h*]rH��hHX���http://n2t.net/ezid/rI��ÖÅrJ��}rK��(h/U�h0jE��ubah3U	referencerL��ubaubeubj.��)ÅrM��}rN��(h/X���http://lsids.sourceforge.net/
j!��Kh0j��h1h2h3j1��h5}rO��(h7]h8]h9]rP��j\��ah:]rQ��h#ah<]rR��X���2auh>M}h?hh*]rS��(j7��)ÅrT��}rU��(h/X���2h5}rV��(h7]h8]h9]h:]h<]uh0jM��h*]rW��hHX���2ÖÅrX��}rY��(h/U�h0jT��ubah3j>��ubhL)ÅrZ��}r[��(h/X���http://lsids.sourceforge.net/r\��h0jM��h1h2h3hPh5}r]��(h7]h8]h9]h:]h<]uh>M}h*]r^��jD��)År_��}r`��(h/j\��h5}ra��(Urefurij\��h:]h9]h7]h8]h<]uh0jZ��h*]rb��hHX���http://lsids.sourceforge.net/rc��ÖÅrd��}re��(h/U�h0j_��ubah3jL��ubaubeubj.��)Årf��}rg��(h/X���http://www.doi.org/
j!��Kh0j��h1h2h3j1��h5}rh��(h7]h8]h9]ri��jg��ah:]rj��hah<]rk��X���3auh>Mh?hh*]rl��(j7��)Årm��}rn��(h/X���3h5}ro��(h7]h8]h9]h:]h<]uh0jf��h*]rp��hHX���3ÖÅrq��}rr��(h/U�h0jm��ubah3j>��ubhL)Års��}rt��(h/X���http://www.doi.org/ru��h0jf��h1h2h3hPh5}rv��(h7]h8]h9]h:]h<]uh>Mh*]rw��jD��)Årx��}ry��(h/ju��h5}rz��(Urefuriju��h:]h9]h7]h8]h<]uh0js��h*]r{��hHX���http://www.doi.org/r|��ÖÅr}��}r~��(h/U�h0jx��ubah3jL��ubaubeubj.��)År��}rÄ��(h/X%���http://www.w3.org/TR/xml11/#charsets
j!��Kh0j��h1h2h3j1��h5}rÅ��(h7]h8]h9]rÇ��j¸��ah:]rÉ��h%ah<]rÑ��X���4auh>MÅh?hh*]rÖ��(j7��)ÅrÜ��}rá��(h/X���4h5}rà��(h7]h8]h9]h:]h<]uh0j��h*]râ��hHX���4ÖÅrä��}rã��(h/U�h0jÜ��ubah3j>��ubhL)Årå��}rç��(h/X$���http://www.w3.org/TR/xml11/#charsetsré��h0j��h1h2h3hPh5}rè��(h7]h8]h9]h:]h<]uh>MÅh*]rê��jD��)Årë��}rí��(h/jé��h5}rì��(Urefurijé��h:]h9]h7]h8]h<]uh0jå��h*]rî��hHX$���http://www.w3.org/TR/xml11/#charsetsrï��ÖÅrñ��}ró��(h/U�h0jë��ubah3jL��ubaubeubj.��)Årò��}rô��(h/X$���http://tools.ietf.org/html/rfc3986

j!��Kh0j��h1h2h3j1��h5}rö��(h7]h8]h9]rõ��(jx��j¿��eh:]rú��h$ah<]rù��X���5auh>MÉh?hh*]rû��(j7��)Årü��}r†��(h/X���5h5}r°��(h7]h8]h9]h:]h<]uh0jò��h*]r¢��hHX���5ÖÅr£��}r§��(h/U�h0jü��ubah3j>��ubhL)År•��}r¶��(h/X"���http://tools.ietf.org/html/rfc3986rß��h0jò��h1h2h3hPh5}r®��(h7]h8]h9]h:]h<]uh>MÉh*]r©��jD��)År™��}r´��(h/jß��h5}r¨��(Urefurijß��h:]h9]h7]h8]h<]uh0j•��h*]r≠��hHX"���http://tools.ietf.org/html/rfc3986rÆ��ÖÅrØ��}r∞��(h/U�h0j™��ubah3jL��ubaubeubcdocutils.nodes
comment
r±��)År≤��}r≥��(h/Xx7��OLD Notes follow, preserved here for now but likely to be removed

Suggested Strategy
------------------

1. DataONE supports all identifier schemes where the PID can be represented as
   a Unicode string (this should be any identifier).

2. The original identifier first assigned by a Member Node is the identifier
   promoted as the authoritative identifier for that content. Other identifiers
   that may be assigned by MNs that don't support the original scheme will be
   mapped to the original.

3. If the original MN discontinues participation in DataONE, then the
   identifier originally used remains as the authoritative identifier.

4. Any identifiers in use by the DataONE system can be resolved at any node
   (CN or MN). A caching system (e.g. memcached) should be used to improve
   resolution performance (can be primed with existing IDs).

This strategy will enable the use of any identifier that is represented by a
string, and will persist the original identifier for the object regardless of
what happens to the originating Member Node.

An obvious concern with this strategy is that a single object may have
multiple identifiers associated with it. Since the original identifier is
persisted, however, it will be the primary identifier by which that content
will be referenced, regardless of which node the object is located on.


..
   @startuml images/resolve.png
   title Resolve PID
   actor User
   participant "CRUD API" as m_crud << Member Node >>
   participant "Cache" as m_cache << Member Node >>
   participant "CRUD API" as cn_crud << Coordinating Node >>
   participant "Directory" as cn_dir << Coordinating Node >>
   User -> m_crud: resolve(token, "A5548D")
   m_crud -> m_cache: cache_lookup("A5548D")
   m_cache --> m_crud: FAIL
   m_crud -> cn_crud: resolve(token, "A5548D")
   cn_crud -> cn_dir: lookup("A5548D")
   cn_dir --> cn_crud: metadata
   cn_crud --> m_crud: metadata
   m_crud --> m_cache: addEntry("A5548D", metadata)
   m_crud --> User: metadata
   @enduml

.. image:: images/resolve.png

*Figure 1.* Resolving a PID. In this scenario a user is trying to determine
what the ID "A5548D" refers to, and uses the resolution service of a Member
Node to that effect.


..
   @startuml images/resolve-detail.png
   title Resolve PID Detail
   actor User
   participant "CRUD API" as m_crud << Member Node >>
   participant "Cache" as m_cache << Member Node >>
   participant "CRUD API" as cn_crud << Coordinating Node >>
   participant "Directory" as cn_dir << Coordinating Node >>
   participant "CRUD API" as m_crud2 << Member Node 2 >>
   User -> m_crud: get(token, "A5548D")
   m_crud -> m_cache: lookup("A5548D")
   note right
     Local resolve failed, defer to CN
   endnote
   m_cache --> m_crud: FAIL
   m_crud -> cn_crud: resolve(token, "A5548D")
   cn_crud -> cn_dir: lookup("A5548D")
   cn_dir --> cn_crud: metadata
   cn_crud --> m_crud: metadata
   m_crud --> m_cache: addEntry(GUID, metadata)
   m_crud -> m_crud: parseMetadata(metadata)
   note right
     Found data URL = http://mn2.dataone.org/objects/A4448D
   endnote
   m_crud --> User: HTTP 302: http://mn2.dataone.org/objects/A4448D
   note right
     Return a redirect to the MN 2 get object
     interface for the specified object.
   endnote
   User -> m_crud2: GET "http://mn2.dataone.org/objects/A4448D"
   m_crud2 --> User: bytes
   @enduml

.. image:: images/resolve-detail.png

*Figure 2.* Detail for object retrieval of an object identified by a PID. In
this case, the User is requesting a data object from MN 1, though the data is
actually located on MN 2.


..
   @startuml images/resolve-conflict.png
   title Conflicting IDs
   participant "MN_A" as mn_a
   participant "MN_B" as mn_b
   participant "CN" as cn
   participant "CN OStore" as cn_os

   mn_a -> cn: registerID("435")
   cn -> cn_os: store("mn_a:435")
   cn_os <-- cn: ACK
   mn_a <-- cn: ACK

   mn_b -> cn: registerID("435")
   cn -> cn_os: store("mn_b:435")
   cn_os <-- cn: ACK
   mn_b <-- cn: ACK

   actor user
   user -> cn: resolve("435")
   user <-- cn: "mn_a:435", "mn_b:435"
   @enduml

.. image:: images/resolve-conflict.png

*Figure 3.* A scenario where two MNs happen to add different content to the
system with the same identifier. Resolving the identifier without including
the namespace results in two matches that must be interpreted by the client.
The likelihood of such a scenario should be low, given that MNs should be
utilizing identifier schemes that under ideal circumstances should not
generate duplicate identifiers.


Notes from the 20090602 Albuquerque Meeting
-------------------------------------------

These lightly edited notes were taken by Bruce Wilson of the group discussion
about identifiers during the VDC-TWG 20090602 Albuquerque Meeting.

Original notes are located in subversion at:

/documents/Projects/VDC/docs/20090602_04_ABQ_Meeting


Design Goals
~~~~~~~~~~~~

From the DataONE perspective, an identifier is opaque. DataONE does not attach
any meaning or resolution protocol based on the identifier.

A call to return the object associated with a particular identifier should
always return either identically the same object or n/a if that object is no
longer available. This raises a number of implementation issues, noted below.
Particular issues include how to handle data which is regularly updated and
things like status changes.

A Member Node may use its own internal identification scheme, but must be able
to retrieve an object based on its DataONE globally unique identifier.

Member Nodes may generate their own unique identifiers, such as DOIs_,
Handles_, PURLs_, or UUIDs_. The only requirement is that the identifier is
unique across the space of DataONE. This implies that CN's must have
functionality to:

.. _DOIs: http://www.doi.org/
.. _Handles: http://www.handle.net/
.. _PURLs: http://purl.org/docs/index.html
.. _UUIDs: http://en.wikipedia.org/wiki/UUID


(a) check that an identifier is unique and

(b) to "reserve" or stub-out an identifier while the MN goes through the
    process of assembling the package to submit the object into DataONE.

When an object is replicated from one MN to another MN, the receiving MN must
be able to accept and resolve the supplied DataONE identifier. That is, an
object, no matter where it is within the DataONE network must be retrievable
by its DataONE identifier, regardless of location. There was a lot of
discussion on this point, and this is my interpretation of the conclusion. I
believe we came out with the point that if a receiving Member Node assigns its
own permanent identifier, then that creates more confusion, requires the MN to
register that second ID with the CNs, and we can have confusion regarding the
citation (for example) of the piece of data. It also complicates tracking things
like metrics, since the originating MN must then find out all other
identifiers for the data and search for all of those. And while it can be
argued that nobody "owns" the data, there is (currently) a culture and need
for the original archive to feel like it still can receive credit for that
investment.

A system doesn't need to maintain every version, but it does need to be able
to identify every version.

Identifiers also apply to metadata as well as data.


Questions for Further Consideration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a MN uses a DOI for a data set identifier, is it appropriate to include
"doi:" in the identifier? For example, 10.3334/ORNLDAAC/840 is the DOI for a
particular data set at the ORNL DAAC. Both "doi:10.3334/ORNLDAAC/840" and
"10.3334/ORNLDAAC/840" can be presumed to be unique identifiers. Which should
be used?

BEW: My personal preference is to use the one with the resolution
protocol included. That does, however, make the identifier more of a "smart"
identifier, which is generally problematic.

Where an identifier has a mechanism to resolve to multiple locations (such as
is possible with an LSID and some DOI mechanisms) and that object is
replicated from one MN to another MN, this would suggest that the originating
MN needs to be notified of the additional location and has the option of
registering the new location with the handle registration authority. This also
means that if a replication is removed, the original MN should have the option
of being notified, so that the resolution points are updated. Ideally, this
should happen before the replica is removed (where possible), so that we
eliminate (or at least minimize) the amount of time that an invalid resolution
point is in someone else's system.

Where an identifier (such as a Handle) has a URL resolution, what should that
resolution be? ORNL DAAC DOI's resolve to a web page where a user (after
logging in) can see and download the components of the data set. Our opinion
is that the DOI resolving to a human interpretable description of the object
is more important than a machine interpretable resolution point. Some thought
and guidance on this point for the overall DataONE community of practice is
desirable.

Do we want/need a registry of name spaces? Where a MN uses a UUID (for
example), there may not be a way to describe the name space for identifiers,
unless the MN prefixes the UUID with some descriptor, which generally violates
the general admonition about smart identifiers. It might, however, be helpful
to have something like a set of regexps that describe the name space for a
MN's identifiers, particularly if an automated way could be developed to look
for potential collisions (non-null overlaps) between name spaces. BEW: My
thought is that this is far from an initial feature, but the desirability of
this as a possible future feature could have implications on the way we do
things from the start.

Can the metadata standards support multiple globally unique identifiers? For
example, what happens in the case that a MN starts down the DOI path and then
switches to LSID's because of economic costs, for example, and goes back and
assigns an LSID to historical data sets. Those data sets now have both an LSID
and a DOI. Where is this in the metadata? Is there a mechanism for indicating
the preferred ID and the alternate ID's? Likewise, how should things be
handled when a MN decides to register an object with e.g. GCMD and the
namespace that GCMD allows for identifiers does not allow for the MN's
preferred identifier. Can a MN update the metadata to show an alternate key
with the GCMD identifier (data set is also known as)? What is the implication
for the metadata identifier in such a case? This is an update operation to the
metadata, which implies that the metadata identifier is changed. How would one
update the old metadata record to indicate that it is:

(a) deprecated and

(b) the id of the new metadata record?

The above also relates to the issue of establishing predecessor-successor
relationships between identifiers. How should this be done across the system?

How do versions enter into the identifiers scheme? The general concept is that
different versions of an object have different identifiers. What about having
some type of an identifier that aggregates all versions of an object and which
always points to the latest version of that object? How does D1 know that an
object is a new version of an existing object? Update operation should take
the old identifier and the new identifier. That would allow for the tracking
of updates. A Member Node may track versions. Could create an interface
specification for "latest version" where the CN calls the authoritative MN for
the DS and asks for the identifier of the latest version of a particular
identifier. Points back to the need for what amounts to meta-metadata - where
the metadata object can be updated to indicate the status level of the data
set (e.g. deprecated). Where is the identifier for something like World Ocean
Data Base - this gets updated quarterly. They think of the fundamental unit as
an observation point, which is either a location (e.g. buoy, possibly with
different identifiers for different depths) or a leg of a trip, with multiple
observations along a path.

For identifiers, we may need to specify the character space. What happens when
a MN stores unique identifiers in a database field that supports just ASCII,
but a different MN does its unique identifiers in some other character set?
PURL is a possible unique identifier, but we can get into cases now where URLs
have characters from other language character sets (such as Arabic, Kanji, ...).

What happens when a request for a replicated version of a data set comes to
the replicate MN and the data set has been updated and the originating MN has
not supplied the information about the update (e.g. they did an insert for the
new version)?

How do we assign ID's for a continuous data stream or for a subset calculated
on the fly? Does this mean that every request for a continuous data stream
gets its own data set identifier, which then gets stored in the D1 system
someplace? What is the value to the overall enterprise for storing the data
set identifiers for each request, particularly in the context of something
like a stream, where the on-the-fly processing is used to get a dynamic subset
or dynamic reprojection? Examples of this sort of situation include the stream
gauge data or the Atmospheric Radiation Measurement (ARM) archive. Ameriflux
Flux tower data is a simpler case, in that they work on the basis of a
site-year as a unit of data. The World Oceanic DataBase (WODB), however,
operates on a location (and possibly depth) as a unit of data. Many of these
are updated quarterly. Each unit of data has an identifier, unique within
WODB, and WODB publishes a data stream that indicates what data packages were
updated at what point in time. It is possible to determine whether a
particular data package changed between two points in time. The differences
are human interpretable, but it is not possible (in any generally automated
fashion) to recreate the data stream for a particular data package at an
arbitrary point in prior time.

Do the CN's need a method to determine the object type for an identifier? Do
identifiers need to be unique across all types of identified objects?h0j��h1h2h3Ucommentr¥��h5}rµ��(j1��j2��h:]h9]h7]h8]h<]uh>M∏h?hh*]r∂��hHXx7��OLD Notes follow, preserved here for now but likely to be removed

Suggested Strategy
------------------

1. DataONE supports all identifier schemes where the PID can be represented as
   a Unicode string (this should be any identifier).

2. The original identifier first assigned by a Member Node is the identifier
   promoted as the authoritative identifier for that content. Other identifiers
   that may be assigned by MNs that don't support the original scheme will be
   mapped to the original.

3. If the original MN discontinues participation in DataONE, then the
   identifier originally used remains as the authoritative identifier.

4. Any identifiers in use by the DataONE system can be resolved at any node
   (CN or MN). A caching system (e.g. memcached) should be used to improve
   resolution performance (can be primed with existing IDs).

This strategy will enable the use of any identifier that is represented by a
string, and will persist the original identifier for the object regardless of
what happens to the originating Member Node.

An obvious concern with this strategy is that a single object may have
multiple identifiers associated with it. Since the original identifier is
persisted, however, it will be the primary identifier by which that content
will be referenced, regardless of which node the object is located on.


..
   @startuml images/resolve.png
   title Resolve PID
   actor User
   participant "CRUD API" as m_crud << Member Node >>
   participant "Cache" as m_cache << Member Node >>
   participant "CRUD API" as cn_crud << Coordinating Node >>
   participant "Directory" as cn_dir << Coordinating Node >>
   User -> m_crud: resolve(token, "A5548D")
   m_crud -> m_cache: cache_lookup("A5548D")
   m_cache --> m_crud: FAIL
   m_crud -> cn_crud: resolve(token, "A5548D")
   cn_crud -> cn_dir: lookup("A5548D")
   cn_dir --> cn_crud: metadata
   cn_crud --> m_crud: metadata
   m_crud --> m_cache: addEntry("A5548D", metadata)
   m_crud --> User: metadata
   @enduml

.. image:: images/resolve.png

*Figure 1.* Resolving a PID. In this scenario a user is trying to determine
what the ID "A5548D" refers to, and uses the resolution service of a Member
Node to that effect.


..
   @startuml images/resolve-detail.png
   title Resolve PID Detail
   actor User
   participant "CRUD API" as m_crud << Member Node >>
   participant "Cache" as m_cache << Member Node >>
   participant "CRUD API" as cn_crud << Coordinating Node >>
   participant "Directory" as cn_dir << Coordinating Node >>
   participant "CRUD API" as m_crud2 << Member Node 2 >>
   User -> m_crud: get(token, "A5548D")
   m_crud -> m_cache: lookup("A5548D")
   m_cache --> m_crud: FAIL
   note right
     Local resolve failed, defer to CN
   endnote
   m_crud -> cn_crud: resolve(token, "A5548D")
   cn_crud -> cn_dir: lookup("A5548D")
   cn_dir --> cn_crud: metadata
   cn_crud --> m_crud: metadata
   m_crud --> m_cache: addEntry("A5548D", metadata)
   m_crud -> m_crud: parseMetadata(metadata)
   note right
     Found data URL = http://mn2.dataone.org/objects/A4448D
   endnote
   m_crud --> User: HTTP 302: http://mn2.dataone.org/objects/A4448D
   note right
     Return a redirect to the MN 2 get object
     interface for the specified object.
   endnote
   User -> m_crud2: GET "http://mn2.dataone.org/objects/A4448D"
   m_crud2 --> User: bytes
   @enduml

.. image:: images/resolve-detail.png

*Figure 2.* Detail of retrieving an object identified by a PID. In this case,
the User requests a data object from MN 1, though the data is actually located
on MN 2.
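
The redirect decision in Figure 2 can be sketched as follows. This is an
illustrative sketch only: the function ``get``, its parameters, and the
``data_url`` metadata key are assumptions made for the example and are not
taken from a DataONE schema. The URL is the one shown in the figure.

.. code-block:: python

   def get(pid, local_objects, resolve):
       """Return (status, body) for a request to a hypothetical MN.

       local_objects: dict of PID -> bytes held on this node.
       resolve: callable returning metadata for a PID; the "data_url"
                key is an assumed field name.
       """
       if pid in local_objects:
           return 200, local_objects[pid]
       metadata = resolve(pid)
       # The object lives elsewhere: redirect the client instead of proxying.
       return 302, metadata["data_url"]


   directory = {"A5548D": {"data_url": "http://mn2.dataone.org/objects/A4448D"}}
   status, location = get("A5548D", {}, directory.__getitem__)

Returning a redirect keeps MN 1 out of the data path, so the bytes travel
directly from MN 2 to the user.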


..
   @startuml images/resolve-conflict.png
   title Conflicting IDs
   participant "MN_A" as mn_a
   participant "MN_B" as mn_b
   participant "CN" as cn
   participant "CN OStore" as cn_os

   mn_a -> cn: registerID("435")
   cn -> cn_os: store("mn_a:435")
   cn <-- cn_os: ACK
   mn_a <-- cn: ACK

   mn_b -> cn: registerID("435")
   cn -> cn_os: store("mn_b:435")
   cn <-- cn_os: ACK
   mn_b <-- cn: ACK

   actor user
   user -> cn: resolve("435")
   user <-- cn: "mn_a:435", "mn_b:435"
   @enduml

.. image:: images/resolve-conflict.png

*Figure 3.* A scenario where two MNs happen to add different content to the
system with the same identifier. Resolving the identifier without including
the namespace results in two matches that must be interpreted by the client.
The likelihood of such a scenario should be low, because MNs are expected to
use identifier schemes that, under ideal circumstances, do not generate
duplicate identifiers.
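
The conflict in Figure 3 can be reproduced with a small in-memory store. The
class ``IdentifierStore`` and its methods are hypothetical names chosen for
this sketch; only the ``node:identifier`` qualified form comes from the
figure.

.. code-block:: python

   from collections import defaultdict


   class IdentifierStore:
       """Hypothetical CN store keyed by the bare (unqualified) identifier."""

       def __init__(self):
           self._by_id = defaultdict(list)

       def register(self, node, identifier):
           qualified = f"{node}:{identifier}"
           if qualified not in self._by_id[identifier]:
               self._by_id[identifier].append(qualified)
           return qualified

       def resolve(self, identifier):
           # More than one entry means the client must disambiguate.
           return list(self._by_id.get(identifier, []))


   store = IdentifierStore()
   store.register("mn_a", "435")
   store.register("mn_b", "435")
   matches = store.resolve("435")

Here ``matches`` holds both qualified identifiers, so a client resolving the
bare identifier has to choose between them.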


Notes from the 20090602 Albuquerque Meeting
-------------------------------------------

These lightly edited notes were taken by Bruce Wilson of the group discussion
about identifiers during the VDC-TWG 20090602 Albuquerque Meeting.

Original notes are located in subversion at:

/documents/Projects/VDC/docs/20090602_04_ABQ_Meeting


Design Goals
~~~~~~~~~~~~

From the DataONE perspective, an identifier is opaque. DataONE does not attach
any meaning or resolution protocol based on the identifier.

A call to return the object associated with a particular identifier should
always return either the identical object or n/a if that object is no longer
available. This raises a number of implementation issues, noted below, in
particular how to handle data that is regularly updated and events such as
status changes.

A Member Node may use its own internal identification scheme, but must be able
to retrieve an object based on its DataONE globally unique identifier.

Member Nodes may generate their own unique identifiers, such as DOIs_,
Handles_, PURLs_, or UUIDs_. The only requirement is that the identifier is
unique across the space of DataONE. This implies that CNs must have
functionality to:

.. _DOIs: http://www.doi.org/
.. _Handles: http://www.handle.net/
.. _PURLs: http://purl.org/docs/index.html
.. _UUIDs: http://en.wikipedia.org/wiki/UUID


(a) check that an identifier is unique, and

(b) "reserve" or stub out an identifier while the MN goes through the
    process of assembling the package to submit the object into DataONE.
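
A minimal sketch of the two CN functions listed above, assuming an in-memory
registry. The class ``IdentifierRegistry`` and its method names are
hypothetical; a real service would also need persistence and some way to
expire reservations that are never completed, neither of which is shown here.

.. code-block:: python

   class IdentifierRegistry:
       """Hypothetical CN-side registry with uniqueness checks and reservations."""

       def __init__(self):
           self._state = {}  # identifier -> ("reserved" | "registered", owner)

       def is_unique(self, identifier):
           return identifier not in self._state

       def reserve(self, identifier, owner):
           if not self.is_unique(identifier):
               raise ValueError(f"identifier already in use: {identifier!r}")
           self._state[identifier] = ("reserved", owner)

       def register(self, identifier, owner):
           state = self._state.get(identifier)
           # Allowed only for a new identifier or one this owner reserved.
           if state is not None and state != ("reserved", owner):
               raise ValueError(f"identifier not available: {identifier!r}")
           self._state[identifier] = ("registered", owner)


   registry = IdentifierRegistry()
   registry.reserve("doi:10.3334/ORNLDAAC/840", "mn_a")
   registry.register("doi:10.3334/ORNLDAAC/840", "mn_a")

The reservation step lets an MN claim an identifier before the package is
assembled, so a second MN cannot register the same string in the meantime.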

When an object is replicated from one MN to another MN, the receiving MN must
be able to accept and resolve the supplied DataONE identifier. That is, an
object must be retrievable by its DataONE identifier no matter where it is
located within the DataONE network. There was a lot of discussion on this
point, and this is my interpretation of the conclusion. I believe we came out
with the point that if a receiving Member Node assigns its own permanent
identifier, then that creates more confusion, requires the MN to register that
second ID with the CNs, and can cause confusion regarding the citation (for
example) of the piece of data. It also makes tracking things like metrics
harder, since the originating MN must then find out all other identifiers for
the data and search for all of those. And while it can be argued that nobody
"owns" the data, there is (currently) a culture and need for the original
archive to feel like it still can receive credit for that investment.

A system doesn't need to maintain every version, but it does need to be able
to identify every version.

Identifiers also apply to metadata as well as data.


Questions for Further Consideration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a MN uses a DOI for a data set identifier, is it appropriate to include
"doi:" in the identifier? For example, 10.3334/ORNLDAAC/840 is the DOI for a
particular data set at the ORNL DAAC. Both "doi:10.3334/ORNLDAAC/840" and
"10.3334/ORNLDAAC/840" can be presumed to be unique identifiers. Which should
be used?

BEW: My personal preference is to use the one with the resolution
protocol included. That does, however, make the identifier more of a "smart"
identifier, which is generally problematic.

Where an identifier has a mechanism to resolve to multiple locations (such as
is possible with an LSID and some DOI mechanisms) and that object is
replicated from one MN to another MN, this would suggest that the originating
MN needs to be notified of the additional location and has the option of
registering the new location with the handle registration authority. This also
means that if a replica is removed, the original MN should have the option
of being notified, so that the resolution points are updated. Ideally, this
should happen before the replica is removed (where possible), so that we
eliminate (or at least minimize) the amount of time that an invalid resolution
point is in someone else's system.

Where an identifier (such as a Handle) has a URL resolution, what should that
resolution be? ORNL DAAC DOIs resolve to a web page where a user (after
logging in) can see and download the components of the data set. Our opinion
is that the DOI resolving to a human interpretable description of the object
is more important than a machine interpretable resolution point. Some thought
and guidance on this point for the overall DataONE community of practice is
desirable.

Do we want/need a registry of name spaces? Where a MN uses a UUID (for
example), there may not be a way to describe the name space for identifiers,
unless the MN prefixes the UUID with some descriptor, which generally violates
the general admonition about smart identifiers. It might, however, be helpful
to have something like a set of regexps that describe the name space for a
MN's identifiers, particularly if an automated way could be developed to look
for potential collisions (non-null overlaps) between name spaces. BEW: My
thought is that this is far from an initial feature, but the desirability of
this as a possible future feature could have implications on the way we do
things from the start.
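
One way to picture the regexp idea is sketched below. It is an assumption-laden
illustration: the node names and patterns in ``NAMESPACES`` are invented for
the example, and the overlap check only tests sample identifiers. Proving that
two regular expressions can never match the same string requires automata
techniques that Python's ``re`` module does not provide.

.. code-block:: python

   import re
   from itertools import combinations

   # Hypothetical namespace declarations; the patterns are illustrative only.
   NAMESPACES = {
       "mn_a": re.compile(r"doi:10\.\d{4,}/\S+"),
       "mn_b": re.compile(
           r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
       ),
       "mn_c": re.compile(r"\d+"),
       "mn_d": re.compile(r"[0-9A-F]{3,6}"),
   }


   def claimants(identifier):
       """Return the nodes whose declared namespace matches the identifier."""
       return sorted(
           node for node, rx in NAMESPACES.items() if rx.fullmatch(identifier)
       )


   def overlapping_pairs(samples):
       """Return node pairs whose namespaces both match at least one sample."""
       pairs = set()
       for identifier in samples:
           pairs.update(combinations(claimants(identifier), 2))
       return pairs


   overlaps = overlapping_pairs(["435", "A5548D", "doi:10.3334/ORNLDAAC/840"])

With these invented patterns, the sample "435" is claimed by both ``mn_c`` and
``mn_d``, which is the kind of potential collision such a registry could flag.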

Can the metadata standards support multiple globally unique identifiers? For
example, what happens if a MN starts down the DOI path, then switches to LSIDs
(because of economic costs, for example), and goes back and assigns an LSID to
historical data sets? Those data sets now have both an LSID and a DOI. Where is
this in the metadata? Is there a mechanism for indicating the preferred ID and
the alternate IDs? Likewise, how should things be handled when a MN decides to
register an object with e.g. GCMD and the namespace that GCMD allows for
identifiers does not allow for the MN's preferred identifier? Can a MN update
the metadata to show an alternate key with the GCMD identifier (data set is
also known as)? What is the implication for the metadata identifier in such a
case? This is an update operation to the metadata, which implies that the
metadata identifier is changed. How would one update the old metadata record
to indicate:

(a) that it is deprecated and

(b) the id of the new metadata record?

The above also relates to the issue of establishing predecessor-successor
relationships between identifiers. How should this be done across the system?

How do versions enter into the identifier scheme? The general concept is that
different versions of an object have different identifiers. What about having
some type of identifier that aggregates all versions of an object and always
points to the latest version of that object? How does D1 know that an object
is a new version of an existing object? An update operation should take the
old identifier and the new identifier, which would allow for the tracking of
updates. A Member Node may track versions. We could create an interface
specification for "latest version", where the CN calls the authoritative MN
for the data set and asks for the identifier of the latest version of a
particular identifier. This points back to the need for what amounts to
meta-metadata, where the metadata object can be updated to indicate the status
level of the data set (e.g. deprecated). Where is the identifier for something
like the World Ocean Data Base, which gets updated quarterly? They think of
the fundamental unit as an observation point, which is either a location (e.g.
a buoy, possibly with different identifiers for different depths) or a leg of
a trip, with multiple observations along a path.
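
The update operation described above, taking both the old and the new
identifier, can be sketched as a chain of successor links. The class
``VersionIndex`` and its methods are hypothetical names for this illustration;
where such links would actually be stored (CN, MN, or metadata) is one of the
open questions in these notes.

.. code-block:: python

   class VersionIndex:
       """Hypothetical record of predecessor-successor links between identifiers."""

       def __init__(self):
           self._successor = {}  # old identifier -> new identifier

       def update(self, old_id, new_id):
           if old_id in self._successor:
               raise ValueError(f"{old_id!r} already has a successor")
           if new_id == old_id or self.latest(new_id) == old_id:
               raise ValueError("update would create a cycle")
           self._successor[old_id] = new_id

       def is_deprecated(self, identifier):
           return identifier in self._successor

       def latest(self, identifier):
           # Follow successor links until reaching an identifier with none.
           while identifier in self._successor:
               identifier = self._successor[identifier]
           return identifier


   versions = VersionIndex()
   versions.update("v1", "v2")
   versions.update("v2", "v3")
   newest = versions.latest("v1")

Because every version keeps its own identifier, old identifiers still resolve
to the same object, while ``latest`` answers the "latest version" question
without a separate aggregating identifier.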

For identifiers, we may need to specify the character space. What happens when
a MN stores unique identifiers in a database field that supports just ASCII,
but a different MN does its unique identifiers in some other character set?
PURL is a possible unique identifier, but we can get into cases now where URLs
have characters from other language character sets (such as Arabic, Kanji, ...).
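
One possible way to carry a non-ASCII identifier through an ASCII-only field
is to percent-encode its UTF-8 bytes. This is only a sketch of that option,
not a decision recorded in these notes; the helper names ``to_ascii`` and
``from_ascii`` and the sample identifier are hypothetical.

.. code-block:: python

   from urllib.parse import quote, unquote


   def to_ascii(identifier):
       """Percent-encode UTF-8 bytes of non-ASCII and reserved characters."""
       return quote(identifier, safe="")


   def from_ascii(encoded):
       """Reverse to_ascii, recovering the original identifier string."""
       return unquote(encoded)


   original = "\u30c7\u30fc\u30bf/001"  # a made-up identifier with katakana
   encoded = to_ascii(original)
   restored = from_ascii(encoded)

The encoded form contains only ASCII characters and round-trips to the
original string, though all nodes would have to agree on when the encoded or
decoded form is the canonical identifier.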

What happens when a request for a replicated version of a data set comes to
the replica MN and the data set has been updated and the originating MN has
not supplied the information about the update (e.g. they did an insert for the
new version)?

How do we assign IDs for a continuous data stream or for a subset calculated
on the fly? Does this mean that every request for a continuous data stream
gets its own data set identifier, which then gets stored in the D1 system
someplace? What is the value to the overall enterprise for storing the data
set identifiers for each request, particularly in the context of something
like a stream, where the on-the-fly processing is used to get a dynamic subset
or dynamic reprojection? Examples of this sort of situation include the stream
gauge data or the Atmospheric Radiation Measurement (ARM) archive. AmeriFlux
flux tower data is a simpler case, in that they work on the basis of a
site-year as a unit of data. The World Oceanic DataBase (WODB), however,
operates on a location (and possibly depth) as a unit of data. Many of these
are updated quarterly. Each unit of data has an identifier, unique within
WODB, and WODB publishes a data stream that indicates what data packages were
updated at what point in time. It is possible to determine whether a
particular data package changed between two points in time. The differences
are human interpretable, but it is not possible (in any generally automated
fashion) to recreate the data stream for a particular data package at an
arbitrary point in prior time.

Do the CNs need a method to determine the object type for an identifier? Do
identifiers need to be unique across all types of identified objects?