Inspect is a new monitoring tool included in Sun Grid Engine (SGE) 6.2 Update 3. It is a flexible interface for viewing current and historical data on SGE clusters and the Service Domain Manager.
Inspect is written entirely in Java and uses JMX, so JMX must be enabled in the SGE installation. Inspect requires version 6 or later of one of the following: Sun JDK, OpenJDK, IcedTea, or Apple JDK. The GUI-based tool supports the following operating systems:
- Microsoft Windows (XP, Vista, Server), 32bit and 64bit
- Linux: Intel platform, 32bit and 64bit
- Solaris: Intel & SPARC platform, 32bit and 64bit
- Mac OS X: Intel platform, 32bit and 64bit
The Sun wiki on Inspect lists several steps required to install and run Inspect successfully:
- Install JDK 6 or higher. JRE 6 is not sufficient because the underlying VisualVM depends on JDK 6.
- Set the JAVA_HOME environment variable. If this variable is not set, an error message is shown and the application is terminated.
- Use a JMX port greater than 1024 during the qmaster installation if the Sun Grid Engine system is installed with a non-root admin user. If the admin user is root, this restriction does not apply.
- If you enable JMX on a previously installed system with inst_sge -addjmx, the necessary credentials are not always created automatically. At a minimum, a keystore for the admin user should exist, usually under /var/sgeCA/port$SGE_QMASTER_PORT/userkeys/<admin user>/keystore. If the credentials were not created, see How to Generate Certificates, Private Keys and Keystores for Users.
- If you upgrade an existing SGE installation, the JMX-specific settings may disappear after a restart. If JMX is enabled (see $SGE_ROOT/$SGE_CELL/common/bootstrap: jvm_threads 1), then qconf -sconf | grep libjvm_path should show a valid libjvm shared library path. If libjvm_path is missing, you can set it as follows:
1. Open the cluster configuration editor: qconf -mconf
2. Set the shared library path and the JVM arguments (the path may vary by architecture):
libjvm_path /<java home>/lib/<arch>/libjvm.so
additional_jvm_args -Xmx256m
3. Restart the JVM thread (jvm), or restart the qmaster:
qconf -kt jvm
qconf -at jvm
or
$SGE_ROOT/$SGE_CELL/common/sgemaster stop
$SGE_ROOT/$SGE_CELL/common/sgemaster start
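The prerequisites above can be checked before launching Inspect. The following is a minimal sketch only, not part of SGE or Inspect: the function names are hypothetical, the presence of bin/javac is assumed to be a rough indicator of a JDK rather than a JRE, and the last function expects the cluster configuration text on standard input.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; none of these names ship with SGE.

# JAVA_HOME must be set and should point at a JDK, not just a JRE.
# Assumption: an executable bin/javac is treated as evidence of a JDK.
check_java_home() {
    java_home=$1
    if [ -z "$java_home" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$java_home/bin/javac" ]; then
        echo "no javac under $java_home (JRE only?)"
        return 1
    fi
    echo "ok"
}

# A non-root admin user needs a JMX port greater than 1024;
# the restriction does not apply when the admin user is root.
check_jmx_port() {
    port=$1
    admin_user=$2
    if [ "$admin_user" != "root" ] && [ "$port" -le 1024 ]; then
        echo "port $port requires root"
        return 1
    fi
    echo "ok"
}

# Reads cluster configuration text on stdin and prints the value of
# libjvm_path; exits non-zero if the setting is not present.
libjvm_path_from_conf() {
    awk '$1 == "libjvm_path" { print $2; found = 1 }
         END { exit (found ? 0 : 1) }'
}
```

A possible use is to call check_java_home "$JAVA_HOME" and check_jmx_port with the port and admin user chosen at qmaster installation, then pipe the cluster configuration into libjvm_path_from_conf and confirm that the printed path exists.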
The wiki offers a host of information on using Inspect to connect to and monitor a cluster.
More Information
Sun Grid Engine Inspect - Wiki
Screenshots of Inspect by Chris Dag
Sun Grid Engine 6.2 Update 3