The authors of the IBM technical article "Capitalizing on Large Numbers of Processors with IBM WebSphere Portal on Solaris," Martin Presler-Marshall, Laura Yen, and Daniel Edwin, share their experience evaluating IBM WebSphere Portal V6.1 on a Sun T5240 server with 16 processor cores supporting 128 simultaneous compute threads. They tested a variety of approaches, configurations, and tuning techniques to determine how best to get optimal results from software on a system with well over a hundred simultaneous compute threads.
The authors chose to model a portal in a corporate intranet environment, with users assigned to groups, perhaps by job function or along organizational boundaries. Some of the content in the portal (for example, places, pages, and applications) is available only to specific groups, while other content is available to all authenticated users. Only a small number of pages are available to unauthenticated users, so in this scenario most users log in to the site.
For the content, the team relied on simple portlets created specifically for this scenario. The portlets developed for these measurements use the API defined in the Java Portlet Specification 1.0, they write, adding that none of the portlets uses a database, network content source, or other external system; instead, the portlets generate all of their content from hard-coded inputs.
User interactions were simulated through a virtual user script. Each iteration of the script simulated a single user interacting with the site through multiple page views, and each virtual user ran many iterations during a measurement. Each page view in the script was considered a single transaction. The response time was measured for each transaction, along with the total transaction rate.
The goal of the measurements was to find the capacity of the system, defined as the highest transaction rate at which response times still meet established criteria.
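The capacity definition above can be sketched in a few lines of Python. This is only an illustration: the sample figures and the two-second threshold are made-up placeholders, not measurements or criteria from the article, and `find_capacity` is a hypothetical helper name.

```python
from typing import Iterable, Optional, Tuple

# Each sample is (transactions per second, response time in seconds)
# observed at one load level. All values used below are hypothetical.
Sample = Tuple[float, float]


def find_capacity(samples: Iterable[Sample], max_response_s: float) -> Optional[float]:
    """Return the highest transaction rate whose response time meets the limit.

    Returns None when no load level satisfies the response-time criterion.
    """
    passing = [rate for rate, response_s in samples if response_s <= max_response_s]
    return max(passing) if passing else None


if __name__ == "__main__":
    # Hypothetical load levels, not data from the article.
    runs = [(50.0, 0.4), (100.0, 0.7), (150.0, 1.6), (200.0, 3.9)]
    print(find_capacity(runs, max_response_s=2.0))
```

In this sketch the highest load level is excluded because its response time exceeds the assumed limit, so the capacity is the best rate among the levels that still pass.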
The authors used a separate database server, directory server, HTTP server, and Deployment Manager server, all of which were in addition to the main system under test, the application server.
The authors describe four configurations:
- Standalone application server. The simplest deployment is to deploy WebSphere Portal in a single Java virtual machine (JVM) on the portal server node.
- Vertical cluster environment. A vertical cluster uses the clustering capabilities in IBM WebSphere Application Server to deploy multiple Java virtual machines, all running WebSphere Portal, on a single system.
- Vertical cluster environment with processor sets. This configuration takes the vertical cluster configuration from the previous item and binds each cluster member's JVM to a subset of the available compute threads.
- Solaris virtualization using zones to provide a horizontal cluster. While the authors did not include this configuration in their lab measurements, it provides a way to further separate each application server JVM from the others running on the same physical server.
The findings show that the vertical cluster with processor sets delivered the best relative throughput. The vertical cluster also demonstrated the highest average processor utilization at maximum capacity.
Using Solaris processor sets, the authors grouped virtual processors into pools and bound each pool to a Java process. Through their experiments, they found that the most efficient arrangement for this scenario was to bind 21 compute threads to one JVM. They created a vertical cluster with six WebSphere Portal members and then bound each member to a Solaris processor set. The "right" number of compute threads (and therefore of WebSphere Portal members) can vary with the application, but the authors expect that four to six members will give good performance for most WebSphere Portal applications.
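The arithmetic behind that layout can be checked with a short sketch. It assumes the 128 compute threads are numbered 0 to 127 and hands them out in contiguous blocks; the summary does not say which processor IDs the authors actually placed in each set, and `plan_processor_sets` is a hypothetical helper. The printed lines use the Solaris `psrset -c` form, which creates a processor set from a list of processor IDs.

```python
from typing import List


def plan_processor_sets(total_threads: int, members: int,
                        threads_per_member: int) -> List[List[int]]:
    """Split processor IDs 0..total_threads-1 into contiguous groups.

    Contiguous numbering is an assumption made for illustration only.
    """
    needed = members * threads_per_member
    if needed > total_threads:
        raise ValueError(f"need {needed} threads but only {total_threads} exist")
    return [
        list(range(i * threads_per_member, (i + 1) * threads_per_member))
        for i in range(members)
    ]


if __name__ == "__main__":
    sets = plan_processor_sets(total_threads=128, members=6, threads_per_member=21)
    for ids in sets:
        # One processor set per cluster member, 21 IDs each.
        print("psrset -c " + " ".join(str(i) for i in ids))
    unassigned = 128 - sum(len(ids) for ids in sets)
    print(f"{unassigned} compute threads left outside any set")
```

Six sets of 21 threads cover 126 of the 128 compute threads, leaving two outside any set. Binding each JVM to its set would be a separate step (on Solaris, `psrset -b` takes a set ID and a process ID), which this sketch does not attempt.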
The authors also cover tuning details, finding that a 3.5 GB heap gave good performance for the application under study. Other applications may perform best with different heap sizes, they note.
In their concluding assessment, the authors point out that any environment with a large number of compute threads can run into performance problems if there is a significant amount of locking within the application. This problem is exacerbated in an environment such as the Sun T5240. Because each compute thread is a relatively slow virtual processor, this type of processor architecture often holds locks longer than more conventional processor architectures do, they contend.
Finally, the writers note that significant effort went into WebSphere Portal V6.1 to reduce locking within the WebSphere Portal framework, which reduces performance problems due to locking. Even so, WebSphere Portal is an application framework that can be used to run a large variety of applications, and if the applications (portlets) deployed on it contain significant amounts of locking, then locking can still become a performance bottleneck.
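Why longer lock hold times matter can be shown with a deliberately simplified model, which is not taken from the article. It assumes every transaction takes one global lock for a fixed time, so throughput can never exceed one transaction per lock hold time, however many compute threads are available. The function name and all numbers below are hypothetical.

```python
def throughput_bound(threads: int, service_time_s: float, lock_hold_s: float) -> float:
    """Upper bound on transactions per second in a one-global-lock model.

    threads / service_time_s is the rate with no contention at all;
    1 / lock_hold_s is the rate at which the single lock can change hands.
    The achievable rate cannot exceed the smaller of the two.
    """
    parallel_rate = threads / service_time_s
    if lock_hold_s <= 0:
        return parallel_rate  # no locking, so only the thread count limits throughput
    return min(parallel_rate, 1.0 / lock_hold_s)


if __name__ == "__main__":
    # Hypothetical figures: many slow threads versus fewer fast ones,
    # with the slow threads also holding the lock four times longer.
    many_slow = throughput_bound(threads=128, service_time_s=0.4, lock_hold_s=0.01)
    few_fast = throughput_bound(threads=16, service_time_s=0.1, lock_hold_s=0.0025)
    print(f"many slow threads: {many_slow:.0f} tx/s")
    print(f"few fast threads:  {few_fast:.0f} tx/s")
```

With these assumed figures the many-thread system is limited by the lock rather than by its thread count, while the few-thread system is not, which is the kind of effect the authors warn about.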