Learn more about the inner workings of Sun Grid Engine software and how to tune it for maximum effect in large-scale and super-scale environments.
White Paper: Extreme Scalability Using Sun Grid Engine Software
Best Practices on Tuning the Software for Maximum Effect in Large- and Super-Scale Environments
The Sun white paper "Extreme Scalability Using Sun Grid Engine Software" shares several best practices for building super-scale clusters with Sun Grid Engine software and also identifies practices to avoid. The 27-page paper, authored by Daniel Templeton, explains distributed resource management (DRM) systems, of which Sun Grid Engine software is one example, and focuses on the software's features and architecture.
"While the Sun Grid Engine software is capable of scaling to meet the largest super-computing challenges, some care must be taken in cluster construction and configuration to unlock the software’s full potential," Templeton writes. "The tips and suggestions presented in this document are aimed at helping readers approach the building of a super-scale or large-scale compute cluster."
Regarding best practices, Templeton identifies the following factors (only briefly summarized here), along with information on how Sun Grid Engine software meets each of them:
Architectural Considerations
Master Host: The Sun Grid Engine qmaster process has been multithreaded since the 6.0 release. As of the 6.2 release, the scheduler is no longer a separate process but simply another thread in the qmaster process. On a typical system running Sun Grid Engine software 6.2 update 3, the qmaster process uses approximately 13 active threads. The paper also discusses how the number of processor cores affects performance, as well as memory requirements and considerations.
Networking: Inter-process communication in a Sun Grid Engine cluster is minimal. In most clusters, the network traffic this communication generates does not require a special high-speed network interconnect.
Cluster Configuration
Load Reports: For clusters running Sun Grid Engine software release 6.2 or later, the load report interval is only a minor factor in scalability. If the qmaster process runs on a Solaris or OpenSolaris host, the DTrace script included with Sun Grid Engine software can be used to count the load reports the qmaster process receives in a given interval.
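As a rough back-of-the-envelope aid, the sketch below estimates how many load reports per second reach the qmaster process. It assumes each execution daemon sends exactly one load report per `load_report_time` interval (the global configuration parameter that sets this interval) and ignores job reports and other traffic; the 4,000-host cluster and the helper names are hypothetical. The DTrace script mentioned above remains the way to measure the real number.

```python
# Sketch: estimate the rate of load reports arriving at the qmaster process.
# Simplifying assumption: each execution daemon sends exactly one load report
# per load_report_time interval; job reports and other traffic are ignored.


def parse_sge_time(value: str) -> int:
    """Convert an h:m:s time string such as "0:0:40" into seconds."""
    parts = value.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected h:m:s, got {value!r}")
    hours, minutes, seconds = (int(p) for p in parts)
    return hours * 3600 + minutes * 60 + seconds


def load_reports_per_second(num_exec_hosts: int, load_report_time: str) -> float:
    """Average load reports per second for a cluster of num_exec_hosts hosts."""
    interval = parse_sge_time(load_report_time)
    if interval <= 0:
        raise ValueError("load report interval must be positive")
    return num_exec_hosts / interval


if __name__ == "__main__":
    # Hypothetical 4,000-host cluster, comparing two report intervals.
    for interval in ("0:0:40", "0:2:0"):
        rate = load_reports_per_second(4000, interval)
        print(f"load_report_time={interval}: {rate:.1f} reports/s")
```

Tripling the interval cuts the rate to a third (100.0 versus 33.3 reports per second here); for release 6.2 and later, the paper treats either rate as a minor scalability concern.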
Scheduling Interval: The first thing to consider when configuring the scheduling interval is the system on which the qmaster process is running. When only one processor is available, the scheduler must compete with the other threads in the qmaster process for CPU time. On machines where two or more cores are available to the qmaster process, the scheduler does not compete with other qmaster process threads for compute time, and is free to consume up to 100% of a processor core. The qmaster process operates at peak capacity when scheduling runs happen back-to-back.
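To make "back-to-back" concrete, here is a simplified model, not Sun Grid Engine source code: a scheduling run is due every `schedule_interval` seconds (the scheduler configuration parameter), and if a run takes longer than the interval, the next one starts as soon as the previous one finishes. The function name and the run times are hypothetical.

```python
# Simplified model (an assumption, not Sun Grid Engine source code): a scheduling
# run is due every interval_s seconds; if a run takes longer than the interval,
# the next run starts as soon as the previous one ends (back-to-back).


def scheduler_duty_cycle(run_s: float, interval_s: float) -> float:
    """Fraction of wall-clock time the scheduler thread spends in scheduling runs."""
    if run_s < 0 or interval_s <= 0:
        raise ValueError("run time must be >= 0 and interval must be > 0")
    period = max(run_s, interval_s)  # effective time between the starts of two runs
    return run_s / period


if __name__ == "__main__":
    for run_s in (3.0, 15.0, 40.0):
        busy = scheduler_duty_cycle(run_s, interval_s=15.0)
        print(f"{run_s:4.1f}s run, 15s interval: scheduler busy {busy:.0%} of the time")
```

Once runs take as long as the interval, the duty cycle reaches 100%, which in this model means the scheduler thread keeps a full core busy. On a single-core master host, that core is shared with every other qmaster thread.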
Scheduling Runs: To minimize scheduler overhead in large clusters, it is highly recommended to avoid adding unnecessary configuration objects, such as additional queues, resources, and projects.
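One reason extra queues are costly: in Sun Grid Engine 6.x a cluster queue gets one queue instance on every host in its host list, so each additional queue spread across the cluster multiplies the queue instances the scheduler must consider. The sketch below only counts instances. It assumes host groups such as `@allhosts` have already been expanded into host names, and the `short.q` and `long.q` queues are hypothetical.

```python
from typing import Mapping, Sequence


def count_queue_instances(cluster_queues: Mapping[str, Sequence[str]]) -> int:
    """Total queue instances: one per distinct host in each cluster queue's host list.

    Assumes host groups (such as @allhosts) have already been expanded into
    individual host names.
    """
    return sum(len(set(hosts)) for hosts in cluster_queues.values())


if __name__ == "__main__":
    hosts = [f"node{i:04d}" for i in range(2000)]  # hypothetical host names
    lean = {"all.q": hosts}
    cluttered = {"all.q": hosts, "short.q": hosts, "long.q": hosts}  # hypothetical extra queues
    print(count_queue_instances(lean))       # 2000
    print(count_queue_instances(cluttered))  # 6000
```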
Execution Host Spool Directories: It is recommended that each execution daemon's spool directory reside on a local file system rather than a shared file system. This decreases the time required to write and then read job information.
Master Host Spooling Method: The Sun Grid Engine qmaster process offers two spooling methods, classic and berkeley, and the right choice depends on the size and availability needs of the cluster.
File System Sharing: In small clusters, the entire Sun Grid Engine root directory is often shared. The root directory contains the cell directory, the qmaster process spool directory, and the product binaries. This practice makes installing execution hosts extremely easy, as all of the required software is available via the shared file system. However, in clusters that are not highly available, placing the qmaster process (classic) spool directory on a shared file system can negatively impact performance. For the same reason, it is always better from a performance perspective to have a local copy of the product binaries on every execution host.
Portable Data Collector (PDC): Responsible for gathering useful information about running jobs and enforcing CPU time and memory limits, the PDC can produce a noticeable CPU load. Running the PDC less frequently reduces its CPU consumption, but doing so involves trade-offs, which the paper discusses.
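The sketch below illustrates one side of that trade-off under a simplified model. It assumes the PDC samples a job once per interval, so a limit crossed just after one sample is only noticed at the next, and that the job's memory grows linearly. The interval is typically tuned through the `PDC_INTERVAL` entry of `execd_params` (verify against the sge_conf man page for your release); the function name and growth rate are hypothetical.

```python
from typing import Tuple

# Simplified model (an assumption): the PDC samples a job once per interval,
# so a limit crossed just after one sample is only noticed at the next sample.


def pdc_tradeoff(interval_s: float, mem_growth_mb_per_s: float) -> Tuple[float, float]:
    """Return (PDC runs per hour on one host, worst-case memory overshoot in MB).

    The overshoot assumes the job's memory grows linearly at mem_growth_mb_per_s.
    """
    if interval_s <= 0:
        raise ValueError("PDC interval must be positive")
    runs_per_hour = 3600.0 / interval_s
    worst_overshoot_mb = mem_growth_mb_per_s * interval_s
    return runs_per_hour, worst_overshoot_mb


if __name__ == "__main__":
    for interval_s in (1, 5, 30):
        runs, overshoot = pdc_tradeoff(interval_s, mem_growth_mb_per_s=50.0)
        print(f"every {interval_s:>2}s: {runs:6.0f} runs/h, up to {overshoot:6.0f} MB past the limit")
```

Fewer PDC runs mean less CPU load on each execution host but later enforcement of limits and coarser usage data. The same worst-case reasoning applies to CPU time limits.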
Data Management
The Sun Grid Engine software does not manage data. It assumes the data required for a job is available on the execution host, and that the administrator has used the provided facilities to implement the required data staging. For clusters where some form of data staging has been implemented, the extensibility of the Sun Grid Engine software’s resource model presents some interesting opportunities for scheduling optimization, especially when dealing with very large data sets.
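As one hedged illustration of such an optimization, each host could advertise a boolean resource per data set staged locally, and jobs could request that resource as a hard requirement (in Sun Grid Engine, roughly a boolean complex set on the execution host and requested with `qsub -l`). The Python below models only the matching logic, not how the scheduler is implemented. The resource name `has_ds_climate`, the host names, and the "most free slots" tie-breaker are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExecHost:
    name: str
    free_slots: int
    # Hypothetical boolean resources advertising which data sets are staged locally.
    staged: Dict[str, bool] = field(default_factory=dict)


def eligible_hosts(hosts: List[ExecHost], requests: Dict[str, bool]) -> List[ExecHost]:
    """Hosts with a free slot whose boolean resources match every hard request."""
    return [
        h
        for h in hosts
        if h.free_slots > 0
        and all(h.staged.get(name, False) == wanted for name, wanted in requests.items())
    ]


def pick_host(hosts: List[ExecHost], requests: Dict[str, bool]) -> Optional[ExecHost]:
    """Among eligible hosts, prefer the one with the most free slots (None if none fit)."""
    return max(eligible_hosts(hosts, requests), key=lambda h: h.free_slots, default=None)


if __name__ == "__main__":
    cluster = [
        ExecHost("node01", 4, {"has_ds_climate": True}),
        ExecHost("node02", 8),  # most free slots, but the data set is not staged here
        ExecHost("node03", 2, {"has_ds_climate": True}),
    ]
    chosen = pick_host(cluster, {"has_ds_climate": True})
    print(chosen.name if chosen else "no eligible host")  # node01
```

Sending the job to a host that already holds its data avoids restaging a very large data set, even when another host has more free capacity.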
High Availability
Sun Grid Engine software includes a built-in high availability mechanism called a shadow daemon or shadow master. The qmaster process regularly updates a heartbeat file, and every shadow daemon running in the cluster reads that file periodically. If the file has not been updated since the last read, a failover process is initiated. After several fail-safe checks confirm that the current qmaster process has stopped responding, one of the shadow daemons starts a qmaster process of its own on its local machine. A highly available cluster must place the qmaster process spool directory and the common directory on a shared file system. In a large-scale cluster, placing the qmaster process spool directory on dual-attached storage can help improve performance.
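The failover decision can be pictured with a simplified model. This sketch is not `sge_shadowd` internals: it assumes the heartbeat file holds an integer that changes on every update, and it uses a fixed number of consecutive unchanged (or unreadable) reads as a stand-in for the shadow daemon's real fail-safe checks. The class name, the `misses_before_failover` parameter, and any heartbeat path you pass to `read_heartbeat_file` are hypothetical.

```python
from typing import Callable, Optional


def read_heartbeat_file(path: str) -> Optional[int]:
    """Return the integer stored in a heartbeat file, or None if it cannot be read."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


class HeartbeatMonitor:
    """Simplified model of a shadow daemon watching the qmaster heartbeat."""

    def __init__(self, read_heartbeat: Callable[[], Optional[int]], misses_before_failover: int = 3):
        self._read = read_heartbeat
        self._misses_needed = misses_before_failover
        self._last_value: Optional[int] = None
        self._misses = 0

    def check(self) -> bool:
        """Read the heartbeat once; return True when failover should begin."""
        value = self._read()
        if value is not None and value != self._last_value:
            # The heartbeat advanced: qmaster looks alive, so reset the miss count.
            self._last_value = value
            self._misses = 0
            return False
        # Unchanged or unreadable: count a miss and fail over only after several in a row.
        self._misses += 1
        return self._misses >= self._misses_needed


if __name__ == "__main__":
    readings = iter([101, 102, 102, 102, 102])  # simulated heartbeat values
    monitor = HeartbeatMonitor(lambda: next(readings))
    print([monitor.check() for _ in range(5)])  # [False, False, False, False, True]
```

In a real setup the reader would be something like `lambda: read_heartbeat_file(path)` with the path pointing into the shared qmaster spool directory, which is one reason that directory must live on a shared file system.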
What to Avoid
Practices the author cautions against when building and running large-scale Sun Grid Engine clusters are covered under the following subheadings: accounting files, cluster state, scheduling information, job submission verification (JSV), and JSV creation.