The following logs are created by Coordinating Node services:
/var/log/dataone/replicate/cn-replication.log
/var/log/dataone/synchronize/cn-synchronization.log
TODO: Need a complete list of logs on the CNs
Step 0. If hostname -i does not report the public IP address of the system, then edit /etc/hosts and set the public IP there. For example, on cn-dev, hostname -i reported 127.0.1.1. The hosts file was updated with the correct value:
127.0.0.1 localhost
#127.0.1.1 cn-dev.dataone.org cn-dev
128.111.220.50 cn-dev.dataone.org cn-dev
...
See: http://docs.oracle.com/javase/1.5.0/docs/guide/management/faq.html#linux1
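The check in Step 0 can be scripted so it is easy to repeat on other hosts. The following is a minimal sketch, not part of any DataONE tooling: the function names is_loopback_ip and check_host_ip are hypothetical, and only the first address printed by hostname -i is inspected.

```shell
# Succeed (exit status 0) when the given address is in the IPv4
# loopback range 127.0.0.0/8, which covers both 127.0.0.1 and 127.0.1.1.
is_loopback_ip() {
    case "$1" in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Report whether the address from `hostname -i` looks usable for JMX/RMI.
# Hypothetical helper: it only flags loopback or empty results; it cannot
# tell whether a non-loopback address is really the public one.
check_host_ip() {
    ip=$(hostname -i 2>/dev/null | awk '{print $1}')
    if [ -z "$ip" ] || is_loopback_ip "$ip"; then
        echo "hostname -i reports '${ip}': edit /etc/hosts"
        return 1
    fi
    echo "hostname -i reports ${ip}"
}

# Example:
#   check_host_ip
```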
In this example, d1-processing is enabled for JMX monitoring.
Create the file /etc/dataone/process/jmx.passwd with contents:
monitorRole {PASSWORD}
and the file /etc/dataone/process/jmx.access with contents:
monitorRole readonly
Change the owner of both files to user tomcat6 and make them readable only by that user. The owner must be the same user as the process that launches the JMX service:
sudo chown tomcat6:tomcat6 /etc/dataone/process/jmx.*
sudo chmod 600 /etc/dataone/process/jmx.*
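The JVM refuses to start the JMX agent when the password file is readable by anyone other than its owner, so it can be worth confirming the permissions before restarting the service. The sketch below is one way to do that; has_mode is a hypothetical helper name, and it relies only on the standard find utility.

```shell
# has_mode FILE MODE
# Succeed when FILE is a regular file whose permission bits are exactly
# MODE (octal, for example 600). `find -perm MODE` without a leading
# dash matches the mode exactly.
has_mode() {
    [ -n "$(find "$1" -prune -type f -perm "$2" 2>/dev/null)" ]
}

# Example:
#   for f in /etc/dataone/process/jmx.passwd /etc/dataone/process/jmx.access; do
#       has_mode "$f" 600 || echo "unexpected permissions on $f"
#   done
```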
Shut down d1-processing:
sudo /etc/init.d/d1-processing stop
Now start d1-processing with the JMX startup flags:
sudo env JAVA_OPTS="-Djava.awt.headless=true -Xmx4096M -Xms1024M \
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=8010 \
-Dcom.sun.management.jmxremote.authenticate=true \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.password.file=/etc/dataone/process/jmx.passwd \
-Dcom.sun.management.jmxremote.access.file=/etc/dataone/process/jmx.access \
-Djava.rmi.server.hostname=128.111.220.50 \
-Dhazelcast.jmx=true" \
/etc/init.d/d1-processing start
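The same set of JMX flags is used again for tomcat with a different port, so it may help to generate them from a small helper rather than retyping them. This is only a sketch: jmx_opts is a hypothetical function, and the default configuration directory is the /etc/dataone/process location used in this example.

```shell
# jmx_opts PORT HOSTNAME [CONF_DIR]
# Print, on one line, the JMX-related JVM flags shown in the command above.
# CONF_DIR defaults to /etc/dataone/process and must contain jmx.passwd
# and jmx.access.
jmx_opts() {
    port=$1
    host=$2
    conf=${3:-/etc/dataone/process}
    printf '%s ' \
        "-Dcom.sun.management.jmxremote" \
        "-Dcom.sun.management.jmxremote.port=${port}" \
        "-Dcom.sun.management.jmxremote.authenticate=true" \
        "-Dcom.sun.management.jmxremote.ssl=false" \
        "-Dcom.sun.management.jmxremote.password.file=${conf}/jmx.passwd" \
        "-Dcom.sun.management.jmxremote.access.file=${conf}/jmx.access" \
        "-Djava.rmi.server.hostname=${host}"
    printf '%s\n' "-Dhazelcast.jmx=true"
}

# Example (memory settings taken from the d1-processing command above):
#   sudo env JAVA_OPTS="-Djava.awt.headless=true -Xmx4096M -Xms1024M \
#       $(jmx_opts 8010 128.111.220.50)" /etc/init.d/d1-processing start
```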
Temporarily disable the firewall. This is necessary because, although the JMX service listens on the specified port, it directs the JMX client to an RMI service that listens on a random port:
sudo ufw disable
Open jconsole on your desktop, select “Remote process”, and enter:
hostname:port
and the username “monitorRole” with the password specified in /etc/dataone/process/jmx.passwd.
After a couple of seconds the JMX client should be connected and start collecting statistics.
Remember to re-enable the firewall when you’re done:
sudo ufw enable
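Besides the hostname:port form, jconsole also accepts a full JMX service URL, which is the same form that check_jmx expects for its -U option. A small sketch for building that URL from a host and port follows; jmx_url is a hypothetical helper name.

```shell
# jmx_url HOST PORT
# Print the RMI-based JMX service URL for the given host and port.
jmx_url() {
    printf 'service:jmx:rmi:///jndi/rmi://%s:%s/jmxrmi\n' "$1" "$2"
}

# Example (8010 is the d1-processing JMX port used above):
#   jconsole "$(jmx_url cn-dev.dataone.org 8010)"
```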
There’s an issue with Java security when monitoring tomcat; it probably needs permission to access the password and access files. As an interim measure, disable JAVA_SECURITY in /etc/init.d/tomcat6, stop tomcat6, and restart it with the following parameters to enable JMX monitoring of tomcat:
sudo env JAVA_OPTS="-Djava.awt.headless=true -Xmx2048M -Xms1024M \
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=8020 \
-Dcom.sun.management.jmxremote.authenticate=true \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.password.file=/etc/dataone/process/jmx.passwd \
-Dcom.sun.management.jmxremote.access.file=/etc/dataone/process/jmx.access \
-Djava.rmi.server.hostname=128.111.220.50 \
-Dhazelcast.jmx=true" \
/etc/init.d/tomcat6 start
sudo env JAVA_OPTS="-Djava.awt.headless=true -Xmx2048M -Xms1024M \
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=8020 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.password.file=/etc/dataone/monitor/jmx.passwd \
-Dcom.sun.management.jmxremote.access.file=/etc/dataone/monitor/jmx.access \
-Djava.rmi.server.hostname=129.24.0.109 \
-Dhazelcast.jmx=true" \
/etc/init.d/tomcat6 start
Check jmx tool:
Usage: check_jmx -U url -O object_name -A attribute [-K compound_key] [-I attribute_info] [-J attribute_info_key] -w warn_limit -c crit_limit [-v[vvv]] [-help]
, where options are:
-help Prints this page
-U JMX URL, for example: "service:jmx:rmi:///jndi/rmi://localhost:1616/jmxrmi"
-O Object name to be checked, for example, "java.lang:type=Memory"
-A Attribute of the object to be checked, for example, "NonHeapMemoryUsage"
-K Attribute key for -A attribute compound data, for example, "used" (optional)
-I Attribute of the object containing information for text output (optional)
-J Attribute key for -I attribute compound data, for example, "used" (optional)
-v[vvv] verbatim level controlled as a number of v (optional)
-w warning integer value
-c critical integer value
Note that if warning level > critical, system checks object attribute value to be LESS THAN OR EQUAL warning, critical
If warning level < critical, system checks object attribute value to be MORE THAN OR EQUAL warning, critical
./check_jmx -U service:jmx:rmi:///jndi/rmi://localhost:8020/jmxrmi \
-O java.lang:type=Memory -A HeapMemoryUsage -K used -I HeapMemoryUsage \
-J used -vvvv -w 4248302272 -c 5498760192
./check_jmx -U service:jmx:rmi:///jndi/rmi://localhost:8020/jmxrmi \
-O java.lang:type=ClassLoading -A LoadedClassCount -vvvv -w 4248302272 -c 5498760192
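The note in the usage text about warning and critical limits is easy to misread, so here is a small sketch of the rule as stated there. It is an illustration of the documented behavior, not the check_jmx implementation; classify is a hypothetical function, and the handling of equal limits (treated like warning < critical) is an assumption.

```shell
# classify VALUE WARN CRIT
# Print OK, WARNING or CRITICAL following the rule in the usage text:
#   WARN > CRIT  : low values are bad  (value <= limit raises the alarm)
#   otherwise    : high values are bad (value >= limit raises the alarm)
classify() {
    value=$1
    warn=$2
    crit=$3
    if [ "$warn" -gt "$crit" ]; then
        if [ "$value" -le "$crit" ]; then
            echo CRITICAL
        elif [ "$value" -le "$warn" ]; then
            echo WARNING
        else
            echo OK
        fi
    else
        if [ "$value" -ge "$crit" ]; then
            echo CRITICAL
        elif [ "$value" -ge "$warn" ]; then
            echo WARNING
        else
            echo OK
        fi
    fi
}

# Example with the heap limits used above:
#   classify 4500000000 4248302272 5498760192   # prints WARNING
```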
Get a JMX console tool. The one used in the examples here is jmxterm, available from: http://wiki.cyclopsgroup.org/jmxterm
Fire up jmxterm with something like java -jar jmxterm.jar, then connect to the target using the open command:
java -jar jmxterm.jar
$> open 127.0.0.1:8020
#Connection to 127.0.0.1:8020 is opened
Get a list of domains:
$>domains
#following domains are available
Catalina
JMImplementation
Users
com.sun.management
java.lang
java.util.logging
solr/
Select a domain, in this case Catalina, and see what beans it offers:
$>domain Catalina
#domain is set to Catalina
$>beans
#domain = Catalina:
Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/,j2eeType=Servlet,name=default
Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/,j2eeType=Servlet,name=jsp
Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/,name=jsp,type=JspMonitor
Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,j2eeType=Filter,name=SolrRequestFilter
Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,j2eeType=Servlet,name=Logging
Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,j2eeType=Servlet,name=SolrServer
Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,j2eeType=Servlet,name=SolrUpdate
Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,j2eeType=Servlet,name=default
Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,j2eeType=Servlet,name=jsp
Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,j2eeType=Servlet,name=ping
Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,name=jsp,type=JspMonitor
Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,name=ping,type=JspMonitor
Catalina:J2EEApplication=none,J2EEServer=none,j2eeType=WebModule,name=//localhost/
Catalina:J2EEApplication=none,J2EEServer=none,j2eeType=WebModule,name=//localhost/solr
Catalina:class=org.apache.catalina.UserDatabase,name="UserDatabase",resourcetype=Global,type=Resource
Catalina:host=localhost,name=ErrorReportValve,type=Valve
Catalina:host=localhost,name=StandardContextValve,path=/,type=Valve
Catalina:host=localhost,name=StandardContextValve,path=/solr,type=Valve
Catalina:host=localhost,name=StandardHostValve,type=Valve
Catalina:host=localhost,name=solr/home,path=/solr,resourcetype=Context,type=Environment
Catalina:host=localhost,path=/,resourcetype=Context,type=NamingResources
Catalina:host=localhost,path=/,type=Cache
Catalina:host=localhost,path=/,type=Loader
Catalina:host=localhost,path=/,type=Manager
Catalina:host=localhost,path=/,type=WebappClassLoader
Catalina:host=localhost,path=/solr,resourcetype=Context,type=NamingResources
Catalina:host=localhost,path=/solr,type=Cache
Catalina:host=localhost,path=/solr,type=Loader
Catalina:host=localhost,path=/solr,type=Manager
Catalina:host=localhost,path=/solr,type=WebappClassLoader
Catalina:host=localhost,type=Deployer
Catalina:host=localhost,type=Host
Catalina:name=StandardEngineValve,type=Valve
Catalina:name=common,type=ServerClassLoader
Catalina:name=http-8080,type=GlobalRequestProcessor
Catalina:name=http-8080,type=ThreadPool
Catalina:name=server,type=ServerClassLoader
Catalina:name=shared,type=ServerClassLoader
Catalina:port=8080,type=Connector
Catalina:port=8080,type=Mapper
Catalina:port=8080,type=ProtocolHandler
Catalina:realmPath=/realm0,type=Realm
Catalina:resourcetype=Global,type=NamingResources
Catalina:serviceName=Catalina,type=Service
Catalina:type=Engine
Catalina:type=MBeanFactory
Catalina:type=Server
Catalina:type=StringCache
The Host bean looks interesting:
$>bean Catalina:host=localhost,type=Host
#bean is set to Catalina:host=localhost,type=Host
$>info
#mbean = Catalina:host=localhost,type=Host
#class name = org.apache.tomcat.util.modeler.BaseModelMBean
# attributes
%0 - aliases ([Ljava.lang.String;, rw)
%1 - appBase (java.lang.String, rw)
%2 - autoDeploy (boolean, rw)
%3 - children ([Ljavax.management.ObjectName;, rw)
%4 - configClass (java.lang.String, rw)
%5 - deployOnStartup (boolean, rw)
%6 - deployXML (boolean, rw)
%7 - managedResource (java.lang.Object, rw)
%8 - modelerType (java.lang.String, r)
%9 - name (java.lang.String, rw)
%10 - realm (org.apache.catalina.Realm, rw)
%11 - unpackWARs (boolean, rw)
%12 - valveNames ([Ljava.lang.String;, rw)
%13 - valveObjectNames ([Ljavax.management.ObjectName;, rw)
%14 - xmlNamespaceAware (boolean, rw)
%15 - xmlValidation (boolean, rw)
# operations
%0 - void addAlias(java.lang.String alias)
%1 - void addChild(org.apache.catalina.Container child)
%2 - void destroy()
%3 - [Ljava.lang.String; findAliases()
%4 - void init()
%5 - void removeAlias(java.lang.String alias)
%6 - void start()
%7 - void stop()
#there's no notifications
Now let’s get a couple of attribute values:
$>get appBase
#mbean = Catalina:host=localhost,type=Host:
appBase = webapps;
$>get children
#mbean = Catalina:host=localhost,type=Host:
children = [ Catalina:j2eeType=WebModule,name=//localhost/,J2EEApplication=none,J2EEServer=none, Catalina:j2eeType=WebModule,name=//localhost/solr,J2EEApplication=none,J2EEServer=none ];
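The interactive session above can also be replayed without typing each command, by piping the same commands into jmxterm. The sketch below only generates the command list; jmxterm_script is a hypothetical helper, and the example invocation assumes that jmxterm.jar is in the current directory and that its -n (non-interactive) option is available in the installed version.

```shell
# jmxterm_script TARGET BEAN ATTR...
# Print the jmxterm commands that open TARGET (host:port), select BEAN,
# read each ATTR in turn, and close the connection.
jmxterm_script() {
    target=$1
    bean=$2
    shift 2
    echo "open ${target}"
    echo "bean ${bean}"
    for attr in "$@"; do
        echo "get ${attr}"
    done
    echo "close"
}

# Example, reproducing the two "get" calls shown above:
#   jmxterm_script 127.0.0.1:8020 Catalina:host=localhost,type=Host \
#       appBase children | java -jar jmxterm.jar -n
```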
Check_mk provides a layer of functionality over Nagios that simplifies configuration and monitoring of remote machines. The check_mk installation uses the central LDAP for authentication.
To monitor a new server with check_mk, it is necessary to install check-mk-agent, enable it as a service using xinetd, and ensure that firewalls are set to allow requests from the check_mk server (monitor.dataone.org, 129.237.201.155). By default, the check-mk-agent service listens on TCP port 6556.
For Ubuntu servers, install the check-mk-agent:
sudo apt-get update
sudo apt-get install xinetd check-mk-agent
Edit the xinetd configuration:
service check_mk
{
type = UNLISTED
port = 6556
socket_type = stream
protocol = tcp
wait = no
user = root
server = /usr/bin/check_mk_agent
# If you use fully redundant monitoring and poll the client
# from more then one monitoring servers in parallel you might
# want to use the agent cache wrapper:
#server = /usr/bin/check_mk_caching_agent
# configure the IP address(es) of your Nagios server here:
#only_from = 127.0.0.1 10.0.20.1 10.0.20.2
only_from = 127.0.0.1 129.237.201.155
# Don't be too verbose. Don't log every check. This might be
# commented out for debugging. If this option is commented out
# the default options will be used for this service.
log_on_success =
disable = no
}
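A common mistake is leaving the monitoring server's address only in the commented-out only_from line. The following sketch checks that an active only_from entry lists a given address. allows_from is a hypothetical helper, and the configuration file path is left as a parameter because this document does not name it. It assumes the "only_from = addr addr ..." layout shown above, with spaces around the equals sign.

```shell
# allows_from CONFIG_FILE IP
# Succeed when an uncommented "only_from = ..." line in CONFIG_FILE
# lists IP as one of its whitespace-separated addresses.
allows_from() {
    awk -v ip="$2" '
        /^[[:space:]]*#/ { next }            # ignore commented-out lines
        $1 == "only_from" {
            for (i = 3; i <= NF; i++)        # fields after "only_from ="
                if ($i == ip) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Example (CONFIG_FILE stands for wherever the service block was saved):
#   allows_from "$CONFIG_FILE" 129.237.201.155 || echo "monitor not allowed"
```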
Then restart xinetd and poke a hole through the firewall:
sudo service xinetd restart
sudo ufw allow from 129.237.201.155 to any port 6556
You can check that this is running by connecting with telnet from an address listed in the only_from configuration parameter:
telnet MY_HOST 6556
The response should be immediate and verbose.
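For a scripted version of the telnet check, the agent output can be inspected instead of read by eye. The sketch below assumes that the agent's output begins with a "<<<check_mk>>>" section header, which is the case for the standard Linux agent but should be confirmed against the installed version. is_agent_output is a hypothetical helper, and the example assumes nc (netcat) is installed.

```shell
# is_agent_output
# Read text on stdin and succeed when its first line is the
# "<<<check_mk>>>" section header expected at the start of agent output.
is_agent_output() {
    first_line=$(head -n 1)
    [ "$first_line" = "<<<check_mk>>>" ]
}

# Example, run from an address listed in only_from:
#   nc -w 5 MY_HOST 6556 | is_agent_output && echo "agent is responding"
```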
Add the server to the monitored set of servers by logging in to https://monitor.dataone.org/check_mk, then under WATO | Hosts add a new host to the appropriate group. Check the services and save the configuration; the status should then appear in the monitored servers.