System News
Dynamic Resource Reallocation and Apache Hadoop with Sun Grid Engine
Latest Release Takes Sun Solution to the Next Level -- the Cloud
January 28, 2010
Volume 143, Issue 4

"Sun Grid Engine 6.2 update 5 enables a great example of on-demand resource management"

-- Steve Wilson, Sun

In its latest iteration, Sun's Grid Engine comes of age -- the cloud age, that is. According to Steve Wilson's blog, the new version enables dynamic resource reallocation, the ability to use on-demand resources from Amazon EC2, and deep integration with Apache Hadoop at the enterprise level.

Wilson explains that Grid Engine users can now manage resources across logical clusters (or even clouds). These might be two collections of systems inside a corporation, or they can include non-local cloud resources such as EC2, he notes.

As an example of this use, Wilson cites the way many auto companies use Grid Engine to coordinate the resources on the Grid/Cluster/Cloud they use for mechanical design and simulation. Users across the company submit jobs (e.g., a crash simulation) and Grid Engine queues them and dispatches them based on priority and policy.

When submissions start to outpace a system's capacity, instead of buying new hardware as was previously the case, Grid Engine users can now configure rules to "cloud burst" these workloads out to another cloud. With Amazon EC2 specifically, users pre-configure a set of AMI images on EC2 containing their application software and register them with Grid Engine. They also give Grid Engine the credentials to manage their EC2 account. Then, based on their policy, Grid Engine will:

  • Fire up new EC2 instances on demand (using the supplied AMIs)
  • Automatically set up a secure VPN tunnel between the corporate network and the EC2 instances
  • Join them to the Grid Engine cluster
  • Dispatch work to them
  • Take them back down once demand has subsided
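The burst-and-release cycle above can be sketched as a simple policy loop. Everything here -- the thresholds, the `CloudPool` helper, and the sizing arithmetic -- is a hypothetical illustration of the idea, not Grid Engine's actual API or configuration:

```python
# Hypothetical sketch of a cloud-bursting policy: when the local queue
# backs up, size a pool of on-demand instances to absorb the overflow;
# when demand subsides, release them. Names and thresholds are invented.

class CloudPool:
    """Tracks how many on-demand instances are currently running."""
    def __init__(self, max_instances):
        self.max_instances = max_instances
        self.running = 0

    def scale_to(self, target):
        # Clamp to [0, max_instances]; a real pool would start/stop
        # instances here via the cloud provider's API.
        self.running = max(0, min(target, self.max_instances))
        return self.running

def burst_decision(queued_jobs, local_slots, jobs_per_instance):
    """Return how many remote instances the queue overflow needs."""
    overflow = max(0, queued_jobs - local_slots)
    # Round up: a partial instance's worth of jobs still needs one.
    return -(-overflow // jobs_per_instance)

pool = CloudPool(max_instances=10)
# 50 queued jobs, 32 local slots, 4 jobs per instance -> burst out 5.
print(pool.scale_to(burst_decision(50, 32, 4)))   # -> 5
# Demand subsides: the queue fits locally, so instances are released.
print(pool.scale_to(burst_decision(10, 32, 4)))   # -> 0
```

The round-up division matters: four overflow jobs on a three-job instance still require a second instance, which is why over-provisioning the internal cluster is the alternative this policy avoids.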

Wilson calls this "a great example of on-demand resource management," adding that it has the potential to save customers real money in avoiding over-provisioning their internal clouds.

No less exciting, in Wilson's view, is Grid Engine's new integration with Hadoop, a popular open-source implementation of Map-Reduce. Map-Reduce is the fundamental building block that powers the internal clouds at Yahoo and Google, and it is commonly used to enable applications that process huge collections of data, he explains.

Wilson expects the new Grid Engine release to accelerate adoption within the enterprise because Grid Engine is now a key ingredient in making Hadoop enterprise ready. At a technical level, Hadoop applications can now be submitted to Grid Engine, just like any other kind of parallel computation job, he explains. This means you can now more easily share a single set of physical resources between Hadoop and other traditional applications (financial risk modeling, crash simulations, weather prediction, batch processing, etc.).
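The sharing described above comes down to one scheduler dispatching mixed job types from a single slot pool by policy. Here is a minimal sketch of that idea; the job types, priority weights, and `dispatch` function are invented for illustration, and Grid Engine's real policies (share tree, functional, override) are far richer:

```python
import heapq

# Minimal sketch of policy-based dispatch over one shared slot pool.
# Weights and job types are illustrative, not Grid Engine's real policy model.
POLICY_WEIGHT = {"hadoop": 50, "risk-model": 80, "crash-sim": 80, "batch": 20}

def dispatch(jobs, free_slots):
    """Schedule the highest-priority jobs that fit in the free slots."""
    # Max-heap by policy weight (negated, since heapq is a min-heap).
    heap = [(-POLICY_WEIGHT[kind], name, slots) for name, kind, slots in jobs]
    heapq.heapify(heap)
    scheduled = []
    while heap and free_slots > 0:
        _, name, slots = heapq.heappop(heap)
        if slots <= free_slots:
            scheduled.append(name)
            free_slots -= slots
    return scheduled

jobs = [("wordcount", "hadoop", 8),      # a Hadoop job, 8 slots
        ("var-calc", "risk-model", 4),   # a traditional batch job, 4 slots
        ("nightly-etl", "batch", 6)]
print(dispatch(jobs, 12))   # -> ['var-calc', 'wordcount']
```

With 12 free slots, the higher-priority risk model runs first and the Hadoop job fills the remainder; the low-priority ETL job waits -- the same pool serves both workload classes without static partitioning.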

The result is reduced cost to the customer. Beyond that, Grid Engine now has a deep understanding of Hadoop's distributed file system (HDFS), which means that Grid Engine can send work to the right part of the cluster (where the data lives locally) to make it ultra-efficient -- even when sharing, according to Wilson.
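Data-aware dispatch of this kind can be illustrated with a toy block map. The block IDs, node names, and `pick_node` helper below are hypothetical; in real HDFS, block replica locations come from the NameNode:

```python
# Toy illustration of data-locality-aware dispatch: send each task to a
# node that already holds a replica of its input block, so the read is
# local. The block map and node names are invented for the example.
BLOCK_LOCATIONS = {
    "blk_001": ["node-a", "node-b"],   # each block is replicated on nodes
    "blk_002": ["node-b", "node-c"],
}

def pick_node(block_id, busy):
    """Prefer an idle node holding the block; fall back to any replica."""
    replicas = BLOCK_LOCATIONS[block_id]
    for node in replicas:
        if node not in busy:
            return node    # idle node with a local copy of the data
    # All replicas busy: queue behind one rather than read remotely.
    return replicas[0]

print(pick_node("blk_002", busy={"node-b"}))   # -> node-c
```

Choosing a node that holds the data turns a network transfer into a local disk read, which is the efficiency Wilson is pointing at -- and the scheduler can still make that choice while other applications share the cluster.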

Finally, Grid Engine has a mature usage accounting and billing feature (ARCo) built in. That means you can now track and (internally) charge back for Hadoop jobs -- giving IT a real way to interact with the business, Wilson points out.
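The chargeback idea reduces to aggregating per-job usage records into a bill per department. The record fields and the flat slot-hour rate below are invented for illustration; ARCo's actual reporting schema is considerably richer:

```python
from collections import defaultdict

# Hypothetical chargeback sketch: roll up accounting records into a
# per-department bill. Fields and the rate are illustrative only.
RATE_PER_SLOT_HOUR = 0.10   # invented internal rate, dollars

records = [
    {"dept": "design",    "job": "crash-sim", "slots": 16, "hours": 2.0},
    {"dept": "analytics", "job": "wordcount", "slots": 8,  "hours": 1.5},
    {"dept": "design",    "job": "mesh-gen",  "slots": 4,  "hours": 1.0},
]

def chargeback(recs):
    """Sum slot-hours per department and price them at the flat rate."""
    bill = defaultdict(float)
    for r in recs:
        bill[r["dept"]] += r["slots"] * r["hours"] * RATE_PER_SLOT_HOUR
    return dict(bill)

print(chargeback(records))   # design ~ 3.60, analytics ~ 1.20
```

Because Hadoop jobs now flow through the same scheduler as everything else, they show up in the same accounting stream -- which is what lets IT put a number in front of the business.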

More Information

Steve Wilson's blog

Dan Templeton's blog on Sun Grid Engine 6.2 update 5

Sun Grid Engine 6.2 Update 5 Feature Release

Sun Grid Engine Product Page




