System News
Topology-aware Scheduling in Sun Grid Engine
Upgrade 5 Adds Valuable Capability for CPU Scheduling
February 12, 2010,
Volume 144, Issue 2

I won't be surprised if 12 months from now I include adding that switch to the sge_request file in my top 10 list of best practices

-- Dan Templeton
 

Dan Templeton explains the benefits of the topology-aware scheduling built into Sun Grid Engine 6.2 update 5 in a blog entry. Because a typical OS performs context switches at a high frequency, an application may find itself executing on a different CPU and core every time it gets the chance to run. If that application makes any use of the CPU cache, its performance will suffer, because the data it cached on the previous core is no longer local. The loss may be small, but it is usually measurable. Hence the virtues of topology-aware scheduling.

With this feature, users can specify one of three distribution strategies: linear, striding, or explicit, Templeton explains. With the linear strategy, the execution daemon places the job's threads or processes on consecutive cores if possible. If it can't fit the entire job on a single socket, it spans the job across sockets. The striding strategy tells the execution daemon to place the job on every nth core, e.g. every 4th core or every other core. The explicit strategy lets the user decide exactly which cores will be assigned to the job. Note that core binding is a request, not a requirement. If for some reason the execution daemon can't fulfill the request, the job will still be executed; it just won't be bound.
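
As a rough illustration of the three strategies, the following Python sketch picks cores on a host described only by its number of cores per socket. It is not Sun Grid Engine's implementation: the function names are hypothetical, and the sketch assumes every core on the host is free. A return value of None stands for a request that cannot be met, in which case the job would simply run unbound.

```python
from typing import List, Optional, Sequence, Tuple

# A core is identified as (socket index, core index within that socket).
Core = Tuple[int, int]


def all_cores(cores_per_socket: Sequence[int]) -> List[Core]:
    """List every core of a host, socket by socket (assumes all are free)."""
    return [(s, c) for s, n in enumerate(cores_per_socket) for c in range(n)]


def linear(cores_per_socket: Sequence[int], amount: int) -> Optional[List[Core]]:
    """Consecutive cores: one socket if the job fits, else span sockets."""
    for socket, n in enumerate(cores_per_socket):
        if n >= amount:
            return [(socket, c) for c in range(amount)]
    cores = all_cores(cores_per_socket)
    return cores[:amount] if len(cores) >= amount else None


def striding(cores_per_socket: Sequence[int], amount: int, step: int,
             first: int = 0) -> Optional[List[Core]]:
    """Every `step`-th core, starting at position `first` in the core list."""
    picked = all_cores(cores_per_socket)[first::step][:amount]
    return picked if len(picked) == amount else None


def explicit(cores_per_socket: Sequence[int],
             wanted: Sequence[Core]) -> Optional[List[Core]]:
    """Exactly the cores the user listed, provided they all exist."""
    available = set(all_cores(cores_per_socket))
    return list(wanted) if all(w in available for w in wanted) else None
```

On a hypothetical host with two sockets of two cores each, linear([2, 2], 3) fills socket 0 and takes one core from socket 1, while striding([2, 2], 4, 2) returns None because only two cores lie on the stride.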

Templeton goes on to describe the three binding mechanisms available in Update 5. Users can let Sun Grid Engine do the binding automatically as part of job execution, have it add the binding parameters to the machines file for OpenMPI jobs, or have it simply describe the intended binding in an environment variable, with the expectation that the job will bind itself based on that information.
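
For the third mechanism, a job that binds itself first has to read the intended binding from its environment. The Python sketch below shows one way that could look. The article does not name the variable or describe its format, so both the name SGE_BINDING and the whitespace-separated list of processor numbers are assumptions here, and the function name is hypothetical.

```python
from typing import Mapping, Optional, Set


def cores_from_env(env: Mapping[str, str],
                   name: str = "SGE_BINDING") -> Optional[Set[int]]:
    """Parse the intended binding from an environment mapping.

    Assumes (not confirmed by the article) that the variable holds
    whitespace-separated processor numbers, e.g. "0 2 4".
    Returns None when the variable is missing, empty, or not numeric,
    in which case the job should simply run unbound.
    """
    raw = env.get(name, "").strip()
    if not raw:
        return None
    try:
        return {int(token) for token in raw.split()}
    except ValueError:
        return None
```

A job would typically call cores_from_env(os.environ) at startup and pass the result to whatever affinity call its platform provides.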

When a job is bound by Sun Grid Engine during execution, Templeton writes, the job will be tied to specific CPU cores using an OS-specific system call. On Linux, the bound processors may be shared with other processes. On Solaris, the bound processors are used exclusively for the job. In either case, the job will only be allowed to execute on the bound processors.
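
To make the Linux case concrete, here is a minimal Python sketch of pinning a process to a set of processors. It uses os.sched_setaffinity, Python's wrapper around the Linux affinity system call. The article does not say which call Sun Grid Engine itself uses, so treat this as an illustration of the idea only; the function name is hypothetical. In keeping with the request-not-requirement behavior described earlier, the sketch leaves the process unbound rather than failing.

```python
import os
from typing import Iterable, Set


def bind_to_cores(cores: Iterable[int]) -> Set[int]:
    """Try to pin the calling process to `cores`.

    Returns the affinity set in effect afterwards, or an empty set on
    platforms where Python exposes no affinity call.
    """
    if not hasattr(os, "sched_setaffinity"):
        return set()  # no affinity call exposed here: stay unbound

    allowed = os.sched_getaffinity(0)   # 0 means "the calling process"
    wanted = set(cores) & allowed       # ignore processors we may not use
    if wanted:
        try:
            os.sched_setaffinity(0, wanted)
        except OSError:
            pass  # binding refused: keep running unbound
    return set(os.sched_getaffinity(0))
```

Other processes can still be scheduled on the same processors, which matches the shared behavior described for Linux.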

Templeton next explains how to find out which topologies the machines in the cluster provide: new default complexes describe each machine's socket/core/thread layout. These complexes can be used during job submission to request specific topologies, or with qhost to report what's available. He then presents examples of the linear, striding and explicit distribution strategies.
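
A socket/core/thread layout can be encoded compactly as a string. The Python sketch below assumes a notation in which S starts a socket, C adds a core to the current socket, and T marks a hardware thread, so that "SCCSCC" would mean two sockets with two cores each. The article does not spell out the format of the new complexes, so this notation and the function name are assumptions made for illustration.

```python
from typing import List


def cores_per_socket(topology: str) -> List[int]:
    """Count cores on each socket of a layout string such as "SCCSCC".

    Assumed notation: S = new socket, C = core on the current socket,
    T = hardware thread (ignored for core counting).
    """
    counts: List[int] = []
    for ch in topology:
        if ch == "S":
            counts.append(0)
        elif ch == "C":
            if not counts:
                raise ValueError("core listed before any socket")
            counts[-1] += 1
        elif ch == "T":
            continue  # threads do not change the core count
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return counts
```

The resulting list is enough to decide, for example, whether a four-core job can fit on a single socket.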

Templeton concludes that jobs that benefit from specific process placement with respect to CPU cores will perform much better in a 6.2u5 cluster, thanks to this new feature. Even for ordinary jobs, though, submitting with -binding linear:1 should provide a small performance bump, because it keeps them from being moved between cores across context switches. "In fact, I won't be surprised if 12 months from now I include adding that switch to the sge_request file in my top 10 list of best practices," he projects.

More Information

Templeton's blog

Sun Grid Engine 6.2 Update 5 Feature Release

White Paper: Extreme Scalability using Sun Grid Engine Software
