DanT's GridBlog describes the nature and functions of Service Domain Manager (SDM), an add-on component for Sun Grid Engine 6.2u5. He writes that SDM is designed to allow services of all types to share resources with each other. He explains that each service, such as a Sun Grid Engine cluster, has a set of performance metrics specified via service level objectives (SLOs). If at any point a cluster is in violation of its SLOs, it appeals to the SDM resource provider service for additional resources. The resource provider looks for resources wherever they are available and, on finding them, reassigns them to the cluster in need.
The blogger identifies the resource provider as the heart and brain of SDM, whose job is to keep track of services and resources and adjust resource assignments as needed. At the level of the resource provider, he continues, everything is very abstract. It doesn't know (or care) what any of its managed services do, as long as they implement the required interface. It also doesn't care about the details of the resources it's managing, beyond the fact that there are details, and that the services it's managing may care about those details.
The resource provider is also able to assess the degree of need in a request for additional resources. A request describes the resources that would satisfy the need (including quantity) and states how important the need is. A set of policies defines the relative importance of the services and lets the resource provider decide whether the importance of the requesting service plus the criticality of its need outweighs the importance of the potential donor service plus how heavily the donor is using the resources in question. If there are no resources that can reasonably be reassigned to the needy cluster, then the request stays pending and will be reevaluated later, DanT explains.
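The comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SDM's actual policy engine: the names ResourceRequest, Donor, outweighs and evaluate, the additive scoring, and the 0-100 scales are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ResourceRequest:
    service: str   # name of the requesting service
    quantity: int  # how many resources are wanted
    urgency: int   # how important the need is (hypothetical 0-100 scale)


@dataclass
class Donor:
    service: str   # service currently holding one resource
    usage: int     # how heavily it uses that resource (hypothetical 0-100 scale)


def outweighs(request, donor, importance):
    """True if requester importance + urgency exceeds donor importance + usage."""
    need = importance[request.service] + request.urgency
    hold = importance[donor.service] + donor.usage
    return need > hold


def evaluate(request, donors, importance):
    """Return the donors to take from; an empty list means the request stays pending."""
    willing = [
        d for d in donors
        if d.service != request.service and outweighs(request, d, importance)
    ]
    # Prefer donors with the weakest claim on their resource.
    willing.sort(key=lambda d: importance[d.service] + d.usage)
    return willing[:request.quantity]
```

A caller would keep any request that comes back with an empty list in a pending queue and pass it to evaluate again on a later cycle, which mirrors the reevaluation DanT describes.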
Another component of SDM is the service adapter, whose job is to be the shim between the service itself and the resource provider. It implements the abstracted service interface that the resource provider expects and translates the abstract concepts of availability and need into concrete artifacts understood by the service, the blog continues. It is the service adapter's function to define and implement the SLOs for the service in a more granular fashion than the resource provider can manage. DanT uses the Sun Grid Engine adapter to provide a detailed illustration of the behavior of the service adapter.
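To make the shim role concrete, here is a minimal Python sketch of what such an abstract interface and one adapter might look like. It is not SDM's real interface: ServiceAdapter, ToyClusterAdapter, every method name, and the "pending jobs per host" SLO are hypothetical and chosen only to show how abstract need and availability could map onto service-specific facts.

```python
from abc import ABC, abstractmethod


class ServiceAdapter(ABC):
    """Hypothetical abstract interface a resource provider could rely on."""

    @abstractmethod
    def add_resource(self, host): ...

    @abstractmethod
    def remove_resource(self, host): ...

    @abstractmethod
    def usage(self, host): ...

    @abstractmethod
    def needed_resources(self): ...


class ToyClusterAdapter(ServiceAdapter):
    """Toy adapter whose only SLO caps pending jobs per host (an assumption)."""

    def __init__(self, name, max_pending_per_host):
        self.name = name
        self.max_pending_per_host = max_pending_per_host
        self.hosts = {}        # host name -> number of running jobs
        self.pending_jobs = 0  # jobs waiting in the queue

    def add_resource(self, host):
        # A real adapter would prepare the host before adding it to the service.
        self.hosts[host] = 0

    def remove_resource(self, host):
        self.hosts.pop(host, None)

    def usage(self, host):
        # Translate a service-specific fact into an abstract 0-100 value.
        return min(100, self.hosts[host] * 25)

    def needed_resources(self):
        # The SLO is violated when pending jobs exceed what the hosts may queue.
        capacity = len(self.hosts) * self.max_pending_per_host
        shortfall = self.pending_jobs - capacity
        if shortfall <= 0:
            return 0
        return -(-shortfall // self.max_pending_per_host)  # ceiling division
```

The resource provider in this sketch would only ever call the four abstract methods, so it never needs to know that the service behind the adapter is a batch cluster.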
The blog notes that, when the resource provider assigns a resource to the service, the service adapter is responsible for prepping the resource and adding it into the service, where it remains until another service needs it more.
"Resources are shared, not leased. It is possible to configure SDM to behave in a fashion that is in effect leasing, but it's something you have to explicitly set up," DanT explains.
When the resource provider is asked for a resource, the blog continues, it talks to the service adapters for the managed services to find out who has something that can be borrowed. The resource provider keeps a map of where all the resources are assigned, and can immediately tell which services are currently holding resources that are candidates for reassignment.
The next step is for the resource provider to contact the service adapters of those services to find out whether the resources are in use. Each service adapter then looks at its service and assigns a numerical value to how well the resources are being used by the service. Once the resource provider has collected the usage values for all the candidate resources, it applies policies (such as relative importance of the services) and picks the resources that seem most available, DanT writes. He adds that this process applies equally to services, spare pools, and cloud service providers, noting that there is a built-in spare pool in the resource provider that doesn't actually have its own service adapter, but it works as though it did.
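The selection steps above can be sketched as a single Python function. Everything here is an assumption for illustration: the function name pick_resources, the usage_of callback standing in for a query to a service adapter, and the rule of ranking candidates by importance plus usage are not taken from SDM itself.

```python
def pick_resources(assignments, usage_of, importance, requester, quantity):
    """Choose the resources that look most available for reassignment.

    assignments: dict mapping resource -> service currently holding it
    usage_of:    callable (service, resource) -> numeric usage value,
                 standing in for a query to that service's adapter
    importance:  dict mapping service -> relative importance (policy)
    """
    # Step 1: use the assignment map to find candidates held by other services.
    candidates = [(res, svc) for res, svc in assignments.items() if svc != requester]

    # Step 2: collect a usage value for every candidate resource.
    scored = []
    for resource, service in candidates:
        usage = usage_of(service, resource)
        # Step 3: apply policy; a lower score means the resource is more available.
        scored.append((importance.get(service, 0) + usage, resource))

    # Step 4: pick the most available resources, up to the requested quantity.
    scored.sort()
    return [resource for _, resource in scored[:quantity]]
```

In this sketch a spare pool is simply another entry in the assignment map with low importance and zero usage, so its resources are naturally chosen first, much as the built-in spare pool behaves as though it had an adapter.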
Finally, DanT points out that, with the 6.2u5 release, there are two service adapter implementations. One is for the Sun Grid Engine software itself. The other is a generic cloud adapter that comes with integration scripts for use with Amazon EC2 and for use with IPMI power management. Out of the box, you can use SDM to manage Sun Grid Engine clusters and to supply those clusters with resources on demand from EC2. It is also possible to configure a spare pool that powers down idle or underutilized machines. The intention, he writes, is to add further service adapter implementations as concrete demand for them emerges.
DanT invites input from users with an interest in seeing or developing a service adapter for a particular service.
More Information
Topology-aware Scheduling in Sun Grid Engine
Sun Grid Engine 6.2 Update 5 wiki page
Sun Grid Engine 6.2 Update 5 Product Page