Rough Draft of CN Audit SystemMetadata Design
---------------------------------------------

Activity Diagrams

*Figure 1.* Audit Job Controller

.. figure:: images/01_activity.png

The Audit Job Controller is the entry point of the CN Audit SystemMetadata
process. The CN Audit SystemMetadata process communicates between the CN
members of a cluster via Hazelcast topics. These topics are configured as the
process begins. All the nodes that should be part of the cluster are
gathered from the configuration file and saved in a data structure for ease of
access and further description.

The nodes in the Hazelcast configuration are identified by IP address. These
IPs will be cross-referenced against the DataONE configuration in order to
determine their NodeIDs. The DataONE configuration can be found online or in
/etc/dataone.

	/etc/dataone/d1DebConfig.xml

or

	https://repository.dataone.org/software/cicore/trunk/cn-buildout/dataone-cn-os-core/etc/dataone/d1DebConfig.xml

or

	http://releases.dataone.org/debian/conf/d1DebConfig.xml
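
The cross-reference step might be sketched as a plain lookup. This is only an
illustration: the class name ``NodeIdResolver``, the IP-to-NodeID map (assumed
to have been parsed from d1DebConfig.xml beforehand), and the choice to fail on
an unknown IP are all assumptions, not part of the design above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: resolves Hazelcast member IPs to DataONE NodeIDs.
final class NodeIdResolver {
    private NodeIdResolver() {}

    // ipToNodeId is assumed to have been parsed from the DataONE configuration.
    // The result preserves the order of the Hazelcast member list.
    static Map<String, String> resolve(List<String> hazelcastIps,
                                       Map<String, String> ipToNodeId) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (String ip : hazelcastIps) {
            String nodeId = ipToNodeId.get(ip);
            if (nodeId == null) {
                // Assumption: an IP without a NodeID is a configuration error.
                throw new IllegalStateException(
                        "No NodeID configured for Hazelcast member " + ip);
            }
            resolved.put(ip, nodeId);
        }
        return resolved;
    }
}
```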

The active node of the cluster will need to be determined. The active node will
be responsible for merging all documents and keeping track of the progress of
the passive nodes. The active node is determined by a property named
cn.audit.activenode, set in the CNAudit.properties file. If the value in
node.properties equals that of cn.audit.activenode, then the CN is considered
the active node.
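
A minimal sketch of that comparison follows. The text does not name the key
that is read from node.properties, so ``cn.nodeId`` is an assumed key name, and
the class ``ActiveNodeCheck`` is hypothetical.

```java
import java.util.Properties;

// Hypothetical helper: decides whether the local CN is the active node.
final class ActiveNodeCheck {
    private ActiveNodeCheck() {}

    // Assumed key name; the design only says "the value in node.properties".
    static final String LOCAL_NODE_KEY = "cn.nodeId";
    // Property named in the design, read from CNAudit.properties.
    static final String ACTIVE_NODE_KEY = "cn.audit.activenode";

    static boolean isActiveNode(Properties nodeProps, Properties auditProps) {
        String local = nodeProps.getProperty(LOCAL_NODE_KEY);
        String active = auditProps.getProperty(ACTIVE_NODE_KEY);
        // A missing value on either side means the node is treated as passive.
        return local != null && active != null
                && local.trim().equals(active.trim());
    }
}
```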

The Audit Job Controller will initialize the static state of the program. Since
this is a multithreaded application, the state of a single running instance
will need to be synchronized in order to avoid race conditions. Some of the
static state will also be persisted at the filesystem level; those files will
need to be either created or truncated.
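
Creating or truncating a persisted state file could look roughly like the
following. The class name ``StateFiles`` is hypothetical, and the design does
not say which files are involved, so the path is left to the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for resetting persisted state files at startup.
final class StateFiles {
    private StateFiles() {}

    // Creates the file if it is absent, or empties it if it already exists.
    // Synchronized so that concurrent threads do not reset a file at the same time.
    static synchronized void createOrTruncate(Path file) throws IOException {
        Files.write(file, new byte[0],
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE);
    }
}
```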

There are several listeners and singletons that will need to be
instantiated before the rest of the system is running. One or more of the
singletons will manage a persistence layer for Java data
objects. Listeners will be used to process commands from the AuditJob.

Any node that is passive will exit the Controller; only the active
node proceeds beyond this point.

The Audit Job Controller will start a loop that checks whether all the nodes
found in the Hazelcast configuration are active in the Hazelcast cluster. Once
it is determined that all the nodes are active, the Audit Job will be started.
Since the Controller for the active node performs a loop, there will be a check
to ascertain whether the Audit Job is already running.
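
The readiness check performed on each pass of the loop can be sketched with
plain sets. The class ``ClusterReadiness`` and its method names are
hypothetical; in practice the live member set would be read from the Hazelcast
cluster, which is deliberately left out of this sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for the Controller's polling loop.
final class ClusterReadiness {
    private ClusterReadiness() {}

    // True when every configured node is currently a live cluster member.
    static boolean allNodesActive(Set<String> configured, Set<String> liveMembers) {
        return !configured.isEmpty() && liveMembers.containsAll(configured);
    }

    // Configured nodes that have not joined the cluster yet.
    static Set<String> missingNodes(Set<String> configured, Set<String> liveMembers) {
        Set<String> missing = new HashSet<>(configured);
        missing.removeAll(liveMembers);
        return missing;
    }

    // The job is started only when the cluster is complete and no job is running.
    static boolean shouldStartAuditJob(Set<String> configured,
                                       Set<String> liveMembers,
                                       boolean jobRunning) {
        return !jobRunning && allNodesActive(configured, liveMembers);
    }
}
```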

The Audit Job will end either because all pids have been processed, because of
a node failure, or because of an Audit Job failure.

If the Audit Job ended from all pids having been processed then the passive
nodes will be sent a signal to end processing. The Audit Job Controller will 
then evict all systemMetadata from the Storage Cluster.

If the CN Audit process still has pids to process, then the reason for the
Audit Job returning must be determined. If the Audit Job ended with a Passive
Node failure, then the failure reason must be checked and an appropriate state
should be determined for continued processing.

If the Audit Job ended with an Active Node failure, then it should be
determined whether the error is recoverable. If it is recoverable without a
reset, then the Audit Job may be continued from where it left off. If a reset
is needed, then the failure reason should be checked and an appropriate state
should be determined for continued processing.

If all the nodes are active, then all the nodes are sent the appropriate reset
signal. 
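
One way to picture how the Controller reacts to the end of an Audit Job is a
small decision function. The enum names and the class ``AuditJobOutcome`` are
hypothetical, and the mapping is only a reading of the paragraphs above, not a
confirmed implementation.

```java
// Hypothetical sketch of the Controller's reaction to an ended Audit Job.
final class AuditJobOutcome {
    private AuditJobOutcome() {}

    enum EndReason { ALL_PIDS_PROCESSED, PASSIVE_NODE_FAILURE, ACTIVE_NODE_FAILURE }

    enum NextAction {
        END_AND_EVICT,          // signal passive nodes to end, evict systemMetadata
        RESUME,                 // continue the job from where it left off
        DETERMINE_RESET_STATE   // inspect the failure reason, then reset the nodes
    }

    static NextAction next(EndReason reason, boolean resetNeeded) {
        switch (reason) {
            case ALL_PIDS_PROCESSED:
                return NextAction.END_AND_EVICT;
            case ACTIVE_NODE_FAILURE:
                // Recoverable without a reset: pick up where the job stopped.
                return resetNeeded ? NextAction.DETERMINE_RESET_STATE
                                   : NextAction.RESUME;
            case PASSIVE_NODE_FAILURE:
            default:
                return NextAction.DETERMINE_RESET_STATE;
        }
    }
}
```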

*Figure 2.* Audit Job

.. figure:: images/02_activity.png

The Audit Job will only run on the active node. The passive nodes and the
active node in the cluster will react to the messages sent by the Audit Job
running on the active node.

The first task of the Audit Job is to send a signal to all the nodes to harvest
pids, SerialVersions, lastSystemMetadataModificationDate and deleted indicators
from Metacat. The Audit Job will first receive from each node the number of
pids to expect from it. Then the Audit Job waits until all data
have been processed from each of the nodes (including itself).
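
The wait condition for the harvest could be tracked as below. This is a sketch
under assumptions: the class ``HarvestTracker`` and its methods are
hypothetical, and messages are assumed to arrive one pid at a time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical tracker for the harvest phase of the Audit Job.
final class HarvestTracker {
    private final Map<String, Integer> expected = new HashMap<>();
    private final Map<String, Integer> received = new HashMap<>();

    // Called when a node announces how many pids it will send.
    synchronized void expect(String nodeId, int pidCount) {
        expected.put(nodeId, pidCount);
        received.putIfAbsent(nodeId, 0);
    }

    // Called once for every harvested pid record that arrives from a node.
    synchronized void recordReceived(String nodeId) {
        received.merge(nodeId, 1, Integer::sum);
    }

    // Complete only when every node has announced a count and delivered it.
    synchronized boolean isComplete(Set<String> clusterNodes) {
        for (String node : clusterNodes) {
            Integer want = expected.get(node);
            if (want == null || received.getOrDefault(node, 0) < want) {
                return false;
            }
        }
        return true;
    }
}
```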

Once the Audit Job has a list of all pids with their corresponding state,
it will loop through all the pids and process them. It compares the
SerialVersions, lastSystemMetadataModificationDate and deleted indicator of
the working pid. It will determine if any of the instances have the deleted
indicator set. If the deleted indicator is set, then a message must be sent
to delete the object.

If the deleted indicator is not set, then the comparison investigates the
serial version. If all the SerialVersions are the same, then the
lastSystemMetadataModificationDate is compared. If there is a discrepancy
between the nodes in either the serial version or the
lastSystemMetadataModificationDate, then further processing must take place.
Otherwise, a message is sent to all the nodes that processing of the working
pid is complete.
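
The per-pid comparison can be sketched as a function over the state reported by
each node. The names ``PidComparison``, ``PidState`` and ``Decision`` are
hypothetical, and ``long`` values stand in for the serial version and the
modification date purely to keep the example short.

```java
import java.util.List;

// Hypothetical sketch of the comparison applied to the working pid.
final class PidComparison {
    private PidComparison() {}

    enum Decision { DELETE, COMPLETE, RECONCILE }

    // State of one pid as harvested from one node.
    static final class PidState {
        final long serialVersion;
        final long modifiedMillis;
        final boolean deleted;

        PidState(long serialVersion, long modifiedMillis, boolean deleted) {
            this.serialVersion = serialVersion;
            this.modifiedMillis = modifiedMillis;
            this.deleted = deleted;
        }
    }

    static Decision compare(List<PidState> states) {
        if (states.isEmpty()) {
            throw new IllegalArgumentException("no state reported for pid");
        }
        // The deleted indicator on any instance takes precedence.
        for (PidState s : states) {
            if (s.deleted) {
                return Decision.DELETE;
            }
        }
        // Any difference in serial version or date requires further processing.
        PidState first = states.get(0);
        for (PidState s : states) {
            if (s.serialVersion != first.serialVersion
                    || s.modifiedMillis != first.modifiedMillis) {
                return Decision.RECONCILE;
            }
        }
        return Decision.COMPLETE;
    }
}
```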

If there is a discrepancy in either the serialVersion or
lastSystemMetadataModificationDate, then SystemMetadata must be collected by
the active node from all the nodes in the cluster. The AuditJob will send
a Retrieve SystemMetadata signal. The active node will wait until all the nodes
in the cluster have responded. The version of the SystemMetadata with the
highest serial version, or, if the serial versions are the same, the latest
lastSystemMetadataModificationDate, will be considered the ascendant version
of the SystemMetadata, into which data from the other revisions will be merged.
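
Choosing the ascendant version amounts to a two-level ordering, which might be
expressed with a comparator. ``AscendantSelector`` and ``Revision`` are
hypothetical names, and ``long`` fields are a simplification of the real
serial version and date types.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: pick the ascendant SystemMetadata revision.
final class AscendantSelector {
    private AscendantSelector() {}

    // One node's copy of the SystemMetadata, reduced to the ordering fields.
    static final class Revision {
        final String nodeId;
        final long serialVersion;
        final long modifiedMillis;

        Revision(String nodeId, long serialVersion, long modifiedMillis) {
            this.nodeId = nodeId;
            this.serialVersion = serialVersion;
            this.modifiedMillis = modifiedMillis;
        }
    }

    // Serial version first; the modification date only breaks ties.
    static final Comparator<Revision> ORDER = Comparator
            .comparingLong((Revision r) -> r.serialVersion)
            .thenComparingLong(r -> r.modifiedMillis);

    // Throws NoSuchElementException if no revisions were collected.
    static Revision select(List<Revision> revisions) {
        return Collections.max(revisions, ORDER);
    }
}
```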

Once all the nodes have responded then the active node will merge the 
systemMetadata.

Once the systemMetadata is merged, the active node will send an Update
systemMetadata request. The Update systemMetadata request will begin a
transaction on all the listening nodes. The AuditJob will place the
systemMetadata to be updated on a Hazelcast structure to be read by all the
nodes. The AuditJob will wait until all the nodes have acknowledged that the
systemMetadata has been updated.

After the AuditJob has confirmed that all the nodes have updated the
SystemMetadata, it will send a commit message. This commit
message will complete the transaction that began with the Update. The AuditJob
will wait until all the listening nodes have responded with an acknowledgement
that the transaction was committed successfully.
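
The update-then-commit exchange resembles a two-phase acknowledgement, which
could be tracked on the active node as follows. ``UpdateTransaction`` and its
phases are hypothetical names; timeouts and failure handling are left out of
this sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker for one pid's update/commit round.
final class UpdateTransaction {
    enum Phase { UPDATING, COMMITTING, COMMITTED }

    private final Set<String> nodes;
    private final Set<String> pendingAcks;
    private Phase phase = Phase.UPDATING;

    UpdateTransaction(Set<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new HashSet<>(nodes);
        this.pendingAcks = new HashSet<>(nodes);
    }

    // Records an acknowledgement and advances the phase once all nodes replied.
    synchronized Phase acknowledge(String nodeId) {
        pendingAcks.remove(nodeId);
        if (pendingAcks.isEmpty()) {
            if (phase == Phase.UPDATING) {
                // All updates confirmed: the commit message would be sent here.
                phase = Phase.COMMITTING;
                pendingAcks.addAll(nodes);
            } else if (phase == Phase.COMMITTING) {
                phase = Phase.COMMITTED;
            }
        }
        return phase;
    }

    synchronized Phase phase() {
        return phase;
    }
}
```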

The AuditJob can then move the WorkingPid to the list of pids that have been
completed.  The Audit job will then pull the next pid off of the active pid
list and continue with the processing loop. Otherwise, if all the pids have
been processed the AuditJob will end.

*Figure 2.1* Merge SystemMetadata

.. figure:: images/02-01_activity.png

SystemMetadata Merge Rule

The highest serial version establishes the ascendant revision of the
SystemMetadata. Only if the serial versions are the same for all
SystemMetadata instances across the cluster will the most recent
dateSystemMetadataModified be used to determine the ascendant revision.

First, determine if the serialVersions are equal. If they are not then the
SystemMetadata record with the highest serial version becomes the ascendant
revision of the SystemMetadata.  If the serial versions are equal, then find
the SystemMetadata record with the most recent dateSystemMetadataModified.

Determine if any record has the archive flag set. If any record does have the
archive flag set, then the ascendant revision must have its archive flag set.

Determine if any record has the obsoletes or obsoletedBy field set. If any record 
does have the obsoletes or obsoletedBy field set, then copy the value from
either the obsoletes or obsoletedBy field to the ascendant revision.
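
The archive and obsoletes/obsoletedBy rules might be sketched as two small
helpers. ``FieldMerge`` is a hypothetical name. Keeping the ascendant
revision's own value when it already has one is an assumption, since the rule
above does not say which value wins when several records disagree.

```java
import java.util.List;

// Hypothetical helpers for carrying single-valued fields to the ascendant.
final class FieldMerge {
    private FieldMerge() {}

    // The ascendant is archived if any record in the cluster is archived.
    static boolean anyArchived(List<Boolean> archivedFlags) {
        for (Boolean flag : archivedFlags) {
            if (Boolean.TRUE.equals(flag)) {
                return true;
            }
        }
        return false;
    }

    // Used for obsoletes and obsoletedBy alike. Assumption: an existing
    // ascendant value is kept; otherwise the first value found is copied.
    static String carryOver(String ascendantValue, List<String> otherValues) {
        if (ascendantValue != null) {
            return ascendantValue;
        }
        for (String value : otherValues) {
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```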

Determine if any record has replication policy information set. On the
ascendant revision, set replication allowed if it is true on any revision
instance, and set the number of replicas to the highest number from any
revision instance. Merge the preferred member node lists and the blocked member
node lists, so long as there are no conflicts between the preferred and blocked
lists. If there are conflicts, use the ascendant revision's lists. The
preferred list must maintain its original order.
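
The replication policy rule could be sketched as below. ``ReplicationPolicyMerge``
and ``Policy`` are hypothetical names. Two points are assumptions: "original
order" is read as the ascendant's preferred entries first, followed by new
entries in the order they are encountered, and a conflict only reverts the two
lists, not the allowed flag or the replica count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the replication policy merge.
final class ReplicationPolicyMerge {
    private ReplicationPolicyMerge() {}

    static final class Policy {
        final boolean allowed;
        final int replicas;
        final List<String> preferred;
        final List<String> blocked;

        Policy(boolean allowed, int replicas,
               List<String> preferred, List<String> blocked) {
            this.allowed = allowed;
            this.replicas = replicas;
            this.preferred = Collections.unmodifiableList(new ArrayList<>(preferred));
            this.blocked = Collections.unmodifiableList(new ArrayList<>(blocked));
        }
    }

    static Policy merge(Policy ascendant, List<Policy> others) {
        boolean allowed = ascendant.allowed;
        int replicas = ascendant.replicas;
        // LinkedHashSet keeps insertion order and drops duplicates.
        Set<String> preferred = new LinkedHashSet<>(ascendant.preferred);
        Set<String> blocked = new LinkedHashSet<>(ascendant.blocked);
        for (Policy p : others) {
            allowed |= p.allowed;
            replicas = Math.max(replicas, p.replicas);
            preferred.addAll(p.preferred);
            blocked.addAll(p.blocked);
        }
        if (!Collections.disjoint(preferred, blocked)) {
            // A node is both preferred and blocked: keep the ascendant's lists.
            return new Policy(allowed, replicas, ascendant.preferred, ascendant.blocked);
        }
        return new Policy(allowed, replicas,
                new ArrayList<>(preferred), new ArrayList<>(blocked));
    }
}
```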



*Figure 2.2* Audit Job Listener

.. figure:: images/02-02_activity.png

*Figure 2.2.1* Audit Job Listener Harvest List

.. figure:: images/02-02-01_activity.png

*Figure 2.2.1-a* Audit Job Listener Process Temp Harvest List

.. figure:: images/02-02-01-a_activity.png

*Figure 2.2.2* Audit Job Listener Get Pid, SerialVersion and Date

.. figure:: images/02-02-02_activity.png

*Figure 2.2.3* Audit Job Listener Get SystemMetadata Record

.. figure:: images/02-02-03_activity.png

*Figure 2.2.4* Process Update to SystemMetadata

.. figure:: images/02-02-04_activity.png

*Figure 3* CN Audit Package Structures

.. figure:: images/01_class.png

*Figure 3.1*  CN Audit Control Package Structure

.. figure:: images/02_class.png

*Figure 3.2*  CN Audit Event Package Structure

.. figure:: images/03_class.png

*Figure 3.3*  CN Audit Strategy Package Structure

.. figure:: images/04_class.png

*Figure 3.4*  CN Audit Data Package Structure

.. figure:: images/05_class.png

*Figure 3.4a*  CN Audit Hazelcast Data Package Structure

.. figure:: images/05_01_class.png

*Figure 3.4b*  CN Audit Persistent Data Package Structure

.. figure:: images/05_02_class.png

*Figure 3.5*  CN Audit SQL Package Structure

.. figure:: images/06_class.png
