Advanced Edition of IBM WebSphere Application Server

IBM WebSphere Application Server 3.5 Standard / Advanced Edition

Tuning Guide

WebSphere Application Server product graphic
 

Introduction

    What's New for Performance Tuning?

    Big Hitter Tuning Knobs

    Make/Break Tuning Knobs

Individual Performance Parameters

    Hardware capacity and settings

       Processor Speed

       Memory

       Network

    Operating System settings

       AIX (4.3.2/4.3.3)

             AIX file descriptors (ulimit)

       Solaris (2.6-2.7)

             Solaris file descriptors (ulimit)
             Solaris tcp_close_wait_interval/tcp_time_wait_interval
             Solaris tcp_fin_wait_2_flush_interval
             Solaris tcp_keepalive_interval
             Other Solaris TCP parameters
             Solaris kernel semsys:seminfo_semume
             Solaris kernel semsys:seminfo_semopm

       HP-UX 11

             Adjusting the operating system priority of the WebSphere Application Server Process
             Setting the Virtual Page Size to 64K for WebSphere Application Server JVM
             HP-UX 11 tcp_conn_request_max
             HP-UX 11 Kernel Parameter Recommendations

    The Web server

       Web Server Configuration Reload Interval

       IBM HTTP Server - AIX and Solaris

             MaxClients
             MinSpareServers, MaxSpareServers, and StartServers

       Netscape Enterprise Server - AIX and Solaris

             Active threads

       Microsoft Internet Information Server - Windows NT

             IIS Permission Properties
             Number of Expected Hits per Day
             ListenBackLog parameter

       IBM HTTP Server - Linux

             MaxRequestsPerChild

       IBM HTTP Server - Windows NT

             ThreadsPerChild
             ListenBacklog
             Disable FRCA

    The WebSphere application server process

       Servlet engines

             Favor the OSE queue type
             Transport Type
             Max Connections
             URL Invocation Cache

       Web Applications

             Servlet Reload Interval and AutoReload
             JSP Reload Interval

       EJB Container

             EJB Container Thread Pool Size
             Cache Settings
             Deployment Descriptors
             Option A Caching

       Security

             Turn off security when you do not need it
             Fine-tune the security cache timeout for your environment
             Configure SSL sessions appropriately

       Object Request Brokers (ORBs)

             Call-by-Value vs Call-by-Reference (noLocalCopies)

    Java Virtual Machines (JVMs)

       JIT

       Heap Size Settings (-Xmx and -Xms)

       HP-UX "-mn" Heap Size Parameter

       Garbage Collection Settings

    The Database

       Database Location

       WebSphere Data Source Connection Pool Size

       Prepared Statement Cache Size

       DB2

             Use TCP Sockets for DB2 on UNIX
             Choice of JDBC driver for DB2
             DB2 MaxAppls
             DB2 MaxAgents
             DB2 BUFFPAGE
             Query Optimization Level

    Session Management

       Session Affinity

       Keep sessions in memory whenever possible

       Close finished sessions promptly

       Configure the Session Manager database connections

       Using Cache with persistent session

Synergistic Performance Parameter Discussions

    Adjusting WebSphere's System Queues

       WebSphere Queuing Network

             Closed Queues vs. Open Queues
             Queue Settings in WebSphere

       Determining Settings

             Queuing before WebSphere
             Drawing a Throughput Curve
             Queue Adjustments
             Queue Adjustments for Accessing Patterns

       Queuing and Enterprise Java Beans

       Queuing and Clustering

    Tuning Java Memory

       The Garbage Collection Bottleneck

       The Garbage Collection Gauge

       Detecting Over Utilization of Objects

       Detecting memory leaks

       Java Heap Parameters

    Relaxing Auto Reloads

    Number of Connections to DB2

    Solaris TCP parameters

Appendix A - SEStats.java

Appendix B - GCStats.java

Appendix C - Additional Reference

Appendix D - Performance Tool Procedures

       Starting NT Performance Monitor

       Edit IBM HTTP Server file httpd.conf

       WebSphere AppServer Resource Analyzer

             Enable Resource Analyzer counter collection
             Start Resource Analyzer

Introduction

The intent of this document is to provide guidance related to tuning WebSphere Application Server through discussion of:

  1. Individual performance knobs (a reference).
  2. Synergistic performance parameters (general discussions and relationships between knobs).

In addition to hardware capacity, hardware settings, and operating system settings, the main areas that affect tuning of WebSphere Application Server performance are:



A.  Web server
B.  WebSphere application server process
C.  Java Virtual Machine (JVM)
D.  Database

Tuning the topology

All of the above decision points have their own tuning options.

Performance tuning is an ongoing learning experience. Your results might vary from those presented in this guide. It is possible to use a good tuning knob and not see any performance improvement because of a blocking bottleneck. Eliminate the blocking bottleneck and you should witness a difference in the results of using the tuning knob.

Application tuning sometimes offers the greatest tuning improvements. The following white paper addresses application tuning: WebSphere Application Server Development Best Practices for Performance and Scalability (see Appendix C - Additional Reference). Application rewrites have yielded performance improvements of 2x to 5x. It is wise to assume a reasonable amount of application tuning precedes knob/parameter tuning. Check Appendix C - Additional Reference periodically for other updates and additions.

For your convenience, some procedures are described for knobs in other products. These procedures should be considered hints, as the other products may change.

What's New for Performance Tuning?

Big Hitter Tuning Knobs

"Big Hitter" tuning knobs have made a significant difference in performance. Because these are APPLICATION dependent, the appropriate settings/knobs for your application and environment could differ.

   Relaxing Auto Reloads
   Adjusting WebSphere's System Queues
   Call-by-Value vs Call-by-Reference (noLocalCopies)
   Solaris TCP parameters
   Tuning Java Memory
   MaxRequestsPerChild: on Linux with IBM HTTP Server, throughput +50% (PingServlet)
   WebSphere Data Source Connection Pool Size
   Prepared Statement Cache Size: throughput +12% (Trade2JDBC), +21% (Trade2EJB)
   Using Cache with persistent session

Make/Break Tuning Knobs

"Make/Break" parameters need to be set under certain conditions to prevent functional problems.

   ListenBackLog parameter: applies if running NT with IIS under heavy client load
   Transport Type: use INET Sockets on Solaris (the default for WebSphere Application Server 3.5)
   Number of Connections to DB2: applies if you establish more connections than DB2 sets up by default
   Use TCP Sockets for DB2 on UNIX: for local databases
   WebSphere Data Source Connection Pool Size: avoid deadlock for applications that require more than one connection per thread
   Disable FRCA: applies only if running on NT with IHS and response times are greater than 2 minutes

Individual Performance Parameters

Hardware capacity and settings

This section discusses philosophies to consider when selecting and configuring the hardware on which your application servers will run.

Processor Speed

Memory

Network

Operating System settings

This section discusses philosophies to consider when tuning the operating systems in the server environment.

AIX (4.3.2/4.3.3)

AIX file descriptors (ulimit)

Solaris (2.6-2.7)

Solaris file descriptors (ulimit)
Solaris tcp_close_wait_interval/tcp_time_wait_interval
Solaris tcp_fin_wait_2_flush_interval
Solaris tcp_keepalive_interval
Other Solaris TCP parameters

There are customer success stories from modifications to other Solaris TCP parameters, such as:

   tcp_conn_req_max_q
   tcp_comm_hash_size
   tcp_xmit_hiwat

and others. Although significant performance differences due to raising these settings have not been seen, your system might benefit.

Solaris kernel semsys:seminfo_semume
Solaris kernel semsys:seminfo_semopm

HP-UX 11

HP-UX 11 settings can be modified to significantly improve WebSphere Application Server performance.

Adjusting the operating system priority of the WebSphere Application Server Process
Setting the Virtual Page Size to 64K for WebSphere Application Server JVM
HP-UX 11 tcp_conn_request_max
HP-UX 11 Kernel Parameter Recommendations

The Web server

The WebSphere Application Server product is designed to "plug in" to several different Web server brands and versions. Each Web server-operating system combination features specific tuning parameters that affect application performance.

This section discusses the performance tuning alternatives associated with the Web servers.

Web Server Configuration Reload Interval

IBM HTTP Server - AIX and Solaris

IHS is a multi-process, single-threaded server. More information on tuning the IBM HTTP Server can be found at:

http://www.software.ibm.com/webservers/httpservers/doc/v136/misc/perf.html


MaxClients

MinSpareServers, MaxSpareServers, and StartServers

Netscape Enterprise Server - AIX and Solaris

Netscape server default configuration provides a single-process, multi-threaded server.

Active threads

Microsoft Internet Information Server - Windows NT

IIS Permission Properties

Number of Expected Hits per Day
ListenBackLog parameter

IBM HTTP Server - Linux

MaxRequestsPerChild

IBM HTTP Server - Windows NT

The IBM HTTP Server is quite configurable. The default settings are usually acceptable.

ThreadsPerChild
ListenBacklog
Disable FRCA

The WebSphere application server process

Each WebSphere application server process has several parameters influencing application performance. Each application server in your WebSphere Application Server product consists of an enterprise bean container and a servlet engine.

Use the WebSphere Application Server Administrative Console to configure and tune applications, servlet engines, EJB containers, application servers, and nodes in your administrative domain.

Administrative console

Servlet engines

To route servlet requests from the Web server to the servlet engines, the product establishes a transport queue between the Web server plug-in and each servlet engine.

Favor the OSE queue type
Transport Type
Max Connections
URL Invocation Cache

Web Applications

You can also set parameters specific to each Web Application you deploy. These settings can affect performance.

Servlet Reload Interval and AutoReload
JSP Reload Interval

EJB Container

EJB Container Thread Pool Size
Cache Settings

Deployment Descriptors
Option A Caching

Security

Turn off security when you do not need it
Fine-tune the security cache timeout for your environment
Configure SSL sessions appropriately

Object Request Brokers (ORBs)

Several settings are available for controlling internal ORB processing. Use these to improve application performance in the case of applications containing enterprise beans.

Use the Command Line property of the "Default Server" or any additional WebSphere application server you configure in the WebSphere administrative domain to set these parameters.

Call-by-Value vs Call-by-Reference (noLocalCopies)

Java Virtual Machines (JVMs)

Tuning the JVM

The JVM offers several tuning parameters impacting the performance of WebSphere application servers (which are primarily Java applications), as well as the performance of your own applications. Set JVM parameters in the Command Line property of the "Default Server" or any additional WebSphere application server you configure in the WebSphere administrative domain.

JIT

Heap Size Settings (-Xmx and -Xms)

HP-UX "-mn" Heap Size Parameter

Garbage Collection Settings

The Database

WebSphere Application Server Version 3.5 is tightly integrated with a supported database of your choice. (See the Getting Started book software requirements for details). WebSphere Application Server uses the database as a persistent backing store for administration, as well as to store session state and enterprise bean data for your application.

If your application uses WebSphere Session State, JDBC Database Connection Pooling or enterprise beans, pay special attention to how you configure these resources and their database settings within the WebSphere administrative domain. During WebSphere Application Server installation a database named "WAS" is typically established, although you can specify a different name. This document assumes you used "WAS."

Database Location

WebSphere Data Source Connection Pool Size

Prepared Statement Cache Size

DB2

DB2 has many parameters that you can configure to optimize database performance. This document does not attempt to be an all-encompassing DB2 tuning reference; for complete DB2 tuning information, refer to the DB2 System Monitor Guide and Reference.

Use TCP Sockets for DB2 on UNIX
Choice of JDBC driver for DB2
DB2 MaxAppls
DB2 MaxAgents
DB2 BUFFPAGE
Query Optimization Level

Session Management

Session Affinity

Keep sessions in memory whenever possible

Close finished sessions promptly

Configure the Session Manager database connections

Using Cache with persistent session

Synergistic Performance Parameter Discussions

Adjusting WebSphere's System Queues

WebSphere has a series of interrelated components that must be harmoniously tuned to support the custom needs of your end-to-end eBusiness application. These adjustments will help your system achieve maximum throughput, while maintaining overall system stability.

WebSphere Queuing Network

WebSphere establishes a queuing network: a series of interconnected queues representing the various components of the application serving platform. These queues include the network, the web server, the servlet engine, the EJB container, the data source, and possibly a connection manager to a custom back-end system. Each of these WebSphere resources represents a queue of requests waiting to use that resource.

Queueing network diagram

The WebSphere queues are load-dependent resources -- the average service time of a request depends on the number of concurrent clients.

Closed Queues vs. Open Queues

Most of the queues comprising the WebSphere queuing network are closed queues. A closed queue places a limit on the maximum number of requests active in the queue. (Conversely, an open queue places no such restrictions on the maximum number of requests active in a queue). A closed queue allows system resources to be tightly managed. For example, the WebSphere servlet engine's Max Connections setting controls the size of the servlet engine queue. If the average servlet running in a servlet engine creates 10 megabytes of objects during each request, then setting Max Connections to 100 would limit the memory consumed by the servlet engine to approximately 1 gigabyte. Hence, closed queues typically allow the system administrators to manage their applications more effectively and robustly.
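The memory arithmetic in the example above can be sketched as a quick calculation. The class and method names below are illustrative, and the 10 MB per request and Max Connections of 100 are the example's figures, not measured values:

```java
// Worst-case memory bound implied by a closed queue: a closed queue
// caps concurrently active requests at maxConnections, so per-request
// object allocation times that cap bounds total memory consumption.
class QueueMemoryBound {
    static long worstCaseMb(int maxConnections, int mbPerRequest) {
        return (long) maxConnections * mbPerRequest;
    }

    public static void main(String[] args) {
        // 100 concurrent requests x 10 MB of objects each ~= 1 gigabyte
        System.out.println(worstCaseMb(100, 10) + " MB");
    }
}
```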

In a closed queue, a request can be in one of two states -- active or waiting. An active request is either doing work or waiting for a response from a downstream queue. For example, an active request in the web server is either doing work (such as retrieving static HTML) or waiting for a request to complete in the servlet engine. In waiting state, the request is waiting to become active. The request will remain in waiting state until one of the active requests leaves the queue.

All web servers supported by WebSphere are closed queues. The WebSphere servlet engine and data source are also closed queues, in that they allow you to specify the maximum concurrency at the resource. The EJB container inherits its queue behavior from its built-in Java ORB; therefore the EJB container, like the Java ORB, is an open queue. Given this fact, it is important for the application calling enterprise beans to place limits on the number of concurrent callers into the EJB container. If enterprise beans are being called by servlets, the servlet engine limits the total number of concurrent requests into an EJB container, because the servlet engine has a limit itself. This holds only if you are calling enterprise beans from the servlet thread of execution; nothing stops you from creating your own threads and bombarding the EJB container with requests. This is one of the reasons why it is not a good idea for servlets to create their own work threads.

Queue Settings in WebSphere

The following outlines the various WebSphere queue settings:

Determining Settings

The following section outlines a methodology for configuring the WebSphere queues. You can always change the dynamics of your system, and therefore your tuning parameters, by moving resources around (such as moving your database server onto another machine) or providing more powerful resources (such as a faster set of CPUs with more memory). Thus, tuning should be done using a carbon copy of your production environment.

Queuing before WebSphere

The first rule of WebSphere tuning is to minimize the number of requests in WebSphere queues. In general, it is better for requests to wait in the network (in front of the web server) than it is for them to wait in WebSphere. This configuration will result in only allowing requests into the WebSphere queuing network that are ready to be processed. (Later, we will discuss how to prevent bottlenecking that might occur from this configuration by using WebSphere clustering.) To effectively configure WebSphere in this fashion, the queues furthest upstream (closest to the client) should be slightly larger. Queues further downstream should be progressively smaller. A sample configuration leading to client requests being queued upstream shows arriving requests being queued in the network as the number of concurrent clients increases beyond 75 concurrent users.

Upstream queueing diagram

The queues in this WebSphere queuing network are progressively smaller as work flows downstream. When 200 clients arrive at the web server, 125 requests will remain queued in the network because the web server is set to handle 75 concurrent clients. As the 75 requests pass from the web server to the servlet engine, 25 will remain queued in the web server and the remaining 50 will be handled by the servlet engine. This process progresses through the data source, until finally 25 users arrive at the final destination, the database server. No component in this system has to wait for work to arrive because, at each point upstream, there is some work waiting to enter that component. The bulk of the requests wait outside of WebSphere, in the network. This adds stability because no single component is overloaded. Waiting users can also be routed to other servers in a WebSphere cluster using routing software such as IBM's Network Dispatcher.
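The request distribution just described can be modeled as a simple calculation over a chain of closed queues. This is a sketch using the example's capacities (75, 50, 25); the class and method names are ours, not WebSphere settings:

```java
import java.util.Arrays;

// Distributes a client load across a chain of progressively smaller
// closed queues: each stage admits at most its capacity, and the
// surplus waits immediately upstream (ultimately in the network).
class QueueChain {
    // capacities ordered upstream to downstream, e.g. {75, 50, 25}
    static int[] waiting(int clients, int[] capacities) {
        int[] result = new int[capacities.length + 1];
        int inFlight = clients;
        for (int i = 0; i < capacities.length; i++) {
            int admitted = Math.min(inFlight, capacities[i]);
            result[i] = inFlight - admitted; // left waiting before stage i
            inFlight = admitted;
        }
        result[capacities.length] = inFlight; // active at the final stage
        return result;
    }

    public static void main(String[] args) {
        // 200 clients against web server 75, servlet engine 50, data source 25:
        // 125 wait in the network, 25 in the web server, 25 in the servlet
        // engine, and 25 reach the database.
        System.out.println(Arrays.toString(waiting(200, new int[]{75, 50, 25})));
    }
}
```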

Drawing a Throughput Curve

Using a test case that represents the full spirit of the production application (for example, it should exercise all meaningful code paths) or using the production application itself, run a set of experiments to determine when the system capabilities are maximized (the saturation point). Conduct these tests after most of the bottlenecks have been removed from the application. The typical goal of these tests is to drive CPUs to near 100% utilization.

Start your initial baseline experiment with large queues. This will allow maximum concurrency through the system. For example, start the first experiment with a queue size of 100 at each of the servers in the queuing network: web server, servlet engine and data source.

Now, begin a series of experiments to plot a throughput curve, increasing the concurrent user load after each experiment. For example, perform experiments with 1 user, 2 users, 5, 10, 25, 50, 100, 150 and 200 users. After each run, record the throughput (requests/second) and response times (seconds/request).
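A useful cross-check on the recorded throughput and response-time pairs comes from standard queueing theory rather than from this guide: Little's Law states that the number of concurrent requests in the system is approximately throughput times response time. If the recorded values drift far from this identity, the measurements are suspect. A minimal sketch:

```java
// Little's Law: N = X * R, where N is the number of concurrent users,
// X is throughput (requests/second) and R is response time
// (seconds/request).
class LittlesLaw {
    static double impliedConcurrency(double throughputPerSec, double responseTimeSec) {
        return throughputPerSec * responseTimeSec;
    }

    public static void main(String[] args) {
        // e.g. 50 requests/sec at 0.8 sec/request implies about 40
        // concurrent users in the system
        System.out.println(impliedConcurrency(50.0, 0.8));
    }
}
```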

Throughput curve diagram

The curve resulting from your baseline experiments should resemble the typical throughput curve shown above. The throughput of WebSphere servers is a function of the number of concurrent requests present in the total system. Section A, "the light load zone," shows that as the number of concurrent user requests increases, the throughput increases almost linearly with the number of requests. This reflects the fact that, at light loads, concurrent requests face very little congestion within WebSphere's system queues. After some point, congestion starts to build up and throughput increases at a much lower rate until it hits a saturation point that represents the maximum throughput value, as determined by some bottleneck in the WebSphere system. The best type of bottleneck is saturated CPUs on the WebSphere Application Server machine, because a CPU bottleneck is easily remedied by adding additional or more powerful CPUs.

Section B is the "heavy load zone." As you increase the concurrent client load in this zone, throughput remains relatively constant, but response time increases proportionally to the user load. That is, if you double the user load in the heavy load zone, the response time doubles.

At some point, represented by Section C (the "buckle zone"), one of the system components becomes exhausted and throughput starts to degrade. For example, the system might enter the buckle zone when the network connections at the web server exhaust the limits of the network adapter, or if you exceed the operating system limits for file handles.

If the saturation point is reached by driving the system CPUs close to 100%, move on to the next step. If the CPU was not driven to 100%, there is likely a bottleneck that is being aggravated by the application. For example, the application might be creating Java objects excessively, causing garbage collection bottlenecks (as discussed further in the Tuning Java Memory section). There are two ways to deal with application bottlenecks. The best way is to remove them: use a Java-based application profiler to examine your overall object utilization. Profilers such as JProbe, available from the KL Group, or Jinsight, available from the IBM alphaWorks web site, can be used with WebSphere to help "turn the lights on" within the application. Cloning is another way to deal with application bottlenecks, as discussed in a later section.

Queue Adjustments

The number of concurrent users at the saturation point represents the maximum concurrency of the application. It also defines the boundary between the "light" and "heavy" zones. Select a concurrent user value in the "light load zone" that has a desirable response time and throughput combination. For example, if the application saturated WebSphere at 50 users, you may find 48 users gave the best throughput and response time combination. This value is called the Max. Application Concurrency value. Max. Application Concurrency becomes the value to use as the basis for adjusting your WebSphere system queues. Remember, we want most users to wait in the network, so queue sizes should decrease as you move downstream. For example, given a Max. Application Concurrency value of 48, you might want to start with system queues at the following values: web server 75, servlet engine 50, data source 45. Perform a set of additional experiments adjusting these values slightly higher and lower to find the exact "sweet spot."

Appendix A - SEStats.java provides the source listing of a Java utility servlet that reports the number of concurrent users in the servlet engine. SEStats can be run either after or during a performance experiment. In WebSphere 3.5, the Resource Analyzer can also be used to determine Max Application Concurrency.

Queue Adjustments for Accessing Patterns

In many cases, only a fraction of the requests passing through one queue will enter the next queue downstream. In a site with many static pages, many requests will be turned around at the web server and will not pass to the servlet engine. The web server queue can then be significantly larger than the servlet engine queue. In the previous section, the web server queue was set to 75 rather than to something closer to the Max. Application Concurrency. Similar adjustments need to be made when different components have vastly different execution times. Consider an application that spends 90% of its time in a complex servlet and only 10% making a short JDBC query. On average only 10% of the servlets will be using database connections at any given time, so the database connection queue can be significantly smaller than the servlet engine queue. Conversely, if much of a servlet's execution time is spent making a complex query to a database, then consider increasing the queue values at both the servlet engine and the data source. As always, you must monitor the CPU and memory utilization for both WebSphere Application Server and database servers to ensure that you are not saturating either CPU or memory.
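The fractions above translate directly into downstream queue estimates. The helper name and the rounding choice in this sketch are ours, not a WebSphere formula; the 10% figure is the example's:

```java
// Estimates a downstream queue size from an upstream queue size and
// the fraction of each upstream request's time spent holding the
// downstream resource.
class AccessPatternSizing {
    static int downstreamSize(int upstreamSize, double fractionUsingDownstream) {
        // Round up so the estimate never starves the downstream queue.
        return (int) Math.ceil(upstreamSize * fractionUsingDownstream);
    }

    public static void main(String[] args) {
        // A servlet engine of 50 threads, each spending ~10% of its time
        // in JDBC, keeps only about 5 connections busy on average.
        System.out.println(downstreamSize(50, 0.10));
    }
}
```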

Queuing and Enterprise Java Beans

Method invocations to Enterprise Java Beans are queued only if the client making the method call is remote -- that is, if the EJB client is running in a separate Java Virtual Machine (another address space) from the enterprise bean. On the other hand, if the EJB client (either a servlet or another enterprise bean) is installed in the same JVM, the EJB method runs on the same thread of execution as the EJB client and there is no queuing.

Remote enterprise beans communicate using the RMI/IIOP protocol. Method invocations initiated over RMI/IIOP are processed by a server-side ORB. The EJB container's thread pool acts as a queue for incoming requests. However, if a remote method request is issued and there are no more available threads in the thread pool, a new thread is created; after the method request completes, the thread is destroyed. Hence, when the ORB is used to process remote method requests, the EJB container is an open queue, because its use of threads is unbounded. The following illustration depicts the two queuing options of EJBs.

EJB queueing options diagram

When configuring the thread pool, it is important to understand the calling patterns of the EJB client (see the above section on Queue Adjustments for Accessing Patterns). If a servlet is making a small number of calls to remote enterprise beans and each method call is relatively quick, consider setting the number of threads in the ORB thread pool to a smaller value than the servlet engine's Max Concurrency.

EJB queueing lifetimes diagram

The degree to which you should increase the ORB thread pool value is a function of the number of simultaneous servlets (that is, clients) calling EJBs and the duration of each method call. If the method calls are longer, consider making the ORB thread pool size equal to the servlet engine Max Concurrency size, because there will be little interleaving of remote method calls.

Two servlet-to-EJB calling patterns that might occur in a WebSphere environment are as follows. In the first pattern, the servlet makes a small number of short-lived (quick) calls. In this model there is interleaving of requests to the ORB, and servlets can potentially reuse the same ORB thread. In this case, the ORB thread pool can be small, perhaps even half of the Max Concurrency setting of the servlet engine. In the second pattern, longer-lived EJB calls hold a connection to the remote ORB longer and therefore "tie up" threads to the remote ORB. In this case, configure more of a one-to-one relationship between the servlet engine and the remote ORB thread pools.
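The two patterns can be reduced to a rough sizing rule. The 50% factor below encodes the "perhaps even half" suggestion for short-lived calls; it is a starting point for experiments, not a fixed WebSphere rule, and the names are illustrative:

```java
// Rule-of-thumb ORB thread pool sizing based on the EJB calling
// pattern of the servlets.
class OrbPoolSizing {
    static int orbThreads(int servletMaxConcurrency, boolean longLivedCalls) {
        // Long-lived EJB calls tie up ORB threads, so match them
        // one-to-one with servlet threads; short-lived calls interleave,
        // so roughly half can suffice.
        return longLivedCalls ? servletMaxConcurrency
                              : Math.max(1, servletMaxConcurrency / 2);
    }

    public static void main(String[] args) {
        System.out.println(orbThreads(50, false)); // short-lived calls
        System.out.println(orbThreads(50, true));  // long-lived calls
    }
}
```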

Queuing and Clustering

The application server cloning capabilities of WebSphere can be a valuable asset in configuring highly scalable production environments. This is especially true when the application is experiencing bottlenecks that prevent full CPU utilization of SMP servers. When adjusting WebSphere's system queues in clustered configurations, remember that adding a server to a cluster doubles the load on the server downstream.

Queueing and clustering diagram

Two servlet engine clones sit between a web server and a data source. We can assume that the web server, servlet engines and data source (but not the database) are all running on a single SMP server. Given these constraints, the following queue considerations need to be made:

Tuning Java Memory

The following section focuses on tuning Java's memory management. Enterprise applications written in the Java programming language often involve complex object relationships and utilize large numbers of objects. Although Java automatically manages memory associated with an object's life cycle, it is important to understand the object usage patterns of the application. In particular, ensure that:

  1. The application is not over utilizing objects
  2. The application is not leaking objects (i.e., memory)
  3. Java heap parameters are set to handle your object utilization
Before discussing these topics, we first review how garbage collection impacts performance and how it can be used to gauge the health of a WebSphere application.

The Garbage Collection Bottleneck

Examining Java garbage collection (GC) can give insight into how the application is utilizing memory. First, it's important to mention that garbage collection is one of the strengths of Java. By taking the burden of memory management away from the application writer, Java applications tend to be much more robust than applications written in non-garbage-collected languages. This robustness applies as long as the application is not abusing objects. It is normal for garbage collection to consume anywhere from 5% to 20% of the total execution time of a well-behaved WebSphere application. If not kept in check, GC can be your application's biggest bottleneck, especially when running on SMP server machines.

The problem with GC is simple -- during garbage collection, all application work stops. This is because modern JVMs support a single-threaded garbage collector. During GC, not only are freed objects collected, but memory is also compacted to avoid fragmentation. It is this compacting that forces Java to stop all other activity in the JVM.

The following example shows how GC can impact performance on a 2-way SMP computer.

Garbage collection diagram

Here we plot CPU utilization over time. Processor #1 is represented by the thick line and Processor #2 by the thin line. Garbage collection always runs on Processor #1. The graph starts with both processors working efficiently at nearly 100% utilization. At some point, GC begins on Processor #1. Because all work is suspended (except the work associated with GC), Processor #2 drops to almost 0% utilization. After GC completes, both processors resume work.

JVM technology is evolving rapidly. In particular, the IBM family of JVMs continues to improve features like garbage collection, threading and the just-in-time compiler. In a typical WebSphere workload with servlets, JSPs and data access, moving from IBM JVM 1.1.6 to JVM 1.1.8 improved GC performance by 2x. JVM 1.2.2 continues to improve GC performance. JVM 1.2.2 is supported in WebSphere 3.5.

IBM's JVM 1.3 addresses the single-threaded GC issue by adding multithreaded GC support.

The Garbage Collection Gauge

Use garbage collection to gauge the application's health. By monitoring garbage collection during the execution of a fixed workload, users gain insight as to whether the application is over utilizing objects. GC can even be used to detect the presence of memory leaks.

GCStats is a utility program that tabulates statistics from the output of the JVM's -verbosegc flag. A listing of the program is provided in Appendix B. The following is sample output from GCStats:



java GCStats stderr.txt 68238

-------------------------------------------------
- GC Statistics for file - stderr.txt
-------------------------------------------------
-* Totals
- 265 Total number of GCs
- 7722 ms.  Total time in GCs
- 12062662 Total objects collected during GCs
- 4219647 Kbytes.  Total memory collected during GCs
-* Averages
- 29 ms.  Average time per GC. (stddev=4 ms.)
- 45519 Average objects collected per GC. (stddev=37 objects)
- 15923 Kbytes.  Average memory collected per GC. (stddev=10 Kbytes)
- 97 %. Free memory after each GC. (stddev=0%)
- 8% of total time (68238ms.) spent in GC.
___________________________ Sun Mar 12 12:13:22 EST 2000


Using GCStats is simple. First, enable verbose garbage collection messages by setting the -verbosegc flag on the Java command line of the WebSphere Application Server. For production environments, the minimum and maximum heap sizes should be the same. Pick an initial heap parameter of 128M or greater assuming you have ample system memory. (We will go into heap settings later.) The following screen shot illustrates how to set these parameters:

Default server diagram

After the Application Server is started, detailed information on garbage collection is logged to the standard error file. Clear the standard error file before each run so that the statistics are limited to a single experiment. The test case run during these experiments must be a representative, repetitive workload so that you can measure how much memory is allocated in a "steady-state cycle" of work. (These are the same requirements as for the test used to generate the throughput curve in Step 1.)

It is also important to determine how much time the fixed workload requires. The tool that is driving the workload to the test application must be able to track the total time spent. This number is passed into GCStats on the command line. If your fixed workload (such as 1000 HTTP page requests) takes 68238 ms to execute, and verbosegc output is logged to stderr.txt in the current directory, then use the following command-line:



java GCStats stderr.txt 68238


To ensure meaningful statistics, run the fixed workload until the state of the application is "steady." This typically means running for at least several minutes.

Detecting Over Utilization of Objects

GCStats can provide clues as to whether the application is over utilizing objects. The first statistic to examine is the total time spent in GC (the last statistic presented in the output). As a rule of thumb, this number should not be much larger than 15%. Next, examine the average time per GC, the average memory collected per GC, and the average objects collected per GC (average objects is not available in JDK 1.2.2). Together, these statistics give an idea of the amount of work occurring during a single GC.

If the test numbers show that over utilization of objects is causing a GC bottleneck, there are three possible actions. The most cost-effective remedy is to optimize the application by implementing object caches and pools; use a Java profiler to determine which objects in the application should be targeted. If for some reason the application cannot be optimized, a brute-force solution is to combine server cloning with additional memory and processors (in an SMP computer). The additional memory allows each clone to maintain a reasonable heap size, and the additional processors allow the clones to distribute the workload among multiple processors. Statistically speaking, when one application server clone starts a GC, the others are likely to be doing application work rather than collecting. A third possibility is to move to WebSphere 3.5, which is based on JDK 1.2.2. The improved GC technology in this JVM will likely reduce GC times dramatically.
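The caching and pooling remedy can be sketched as follows. This is a minimal, hypothetical pool: the class name SimplePool and the choice of StringBuffer as the pooled type are illustrative only, not from this guide. A production pool would bound its size and pool whichever objects the profiler identifies as heavily allocated.

```java
import java.util.Stack;

// A minimal object-pool sketch. Borrowing reuses an existing instance when
// one is available, so steady-state request traffic stops allocating (and
// the garbage collector stops collecting) these objects.
public class SimplePool {
    private final Stack pool = new Stack();

    // Borrow an object, creating a new one only when the pool is empty.
    public synchronized StringBuffer borrow() {
        if (pool.isEmpty()) {
            return new StringBuffer();
        }
        StringBuffer sb = (StringBuffer) pool.pop();
        sb.setLength(0);   // reset state before reuse
        return sb;
    }

    // Return an object to the pool instead of letting it become garbage.
    public synchronized void release(StringBuffer sb) {
        pool.push(sb);
    }
}
```

A servlet would call borrow() at the start of a request and release() in a finally block, so objects circulate instead of becoming garbage on every request.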

Detecting memory leaks

Memory leaks in Java are a dangerous contributor to GC bottlenecks. They are worse than over utilization of objects because a memory leak ultimately leads to system instability. Over time, garbage collection occurs more and more frequently until the heap is exhausted and Java fails with a fatal out-of-memory error. Memory leaks occur when an unneeded object has references that are never deleted. This most commonly occurs in collection classes, such as Hashtable, because the table itself always holds a reference to the object, even after all "real" references have been deleted.
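The collection-class leak pattern described above can be sketched in a few lines. The class and method names here are hypothetical, invented for illustration; the essential point is the missing remove() call.

```java
import java.util.Hashtable;

// Sketch of a typical collection-class leak: entries are added to a static
// Hashtable but never removed, so the table's own reference keeps every
// entry alive even after the caller drops all "real" references to it.
public class LeakExample {
    static Hashtable cache = new Hashtable();

    static void handleRequest(String sessionId) {
        // Entry is stored "temporarily," but with no matching remove() the
        // Hashtable grows without bound as requests accumulate.
        cache.put(sessionId, new byte[1024]);
    }

    // The fix: explicitly delete the reference when it is no longer needed,
    // so the garbage collector can reclaim the entry.
    static void endSession(String sessionId) {
        cache.remove(sessionId);
    }
}
```

A profiler showing unbounded growth in the count of the cached objects would point directly at a table like this one.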

GCStats can provide insight into whether your application is leaking memory. There are several statistics to monitor for memory leaks. However, for best results, repeat experiments with increasing workload durations (such as 1000, 2000, and 4000 page requests). Clear the standard error file, and run GCStats after each experiment. You are likely to have a memory leak if these statistics trend worse after each experiment.

Memory leaks must be fixed. The best way to fix a memory leak is to use a Java profiler that allows you to count the number of object instances. Object counts that exhibit unbounded growth over time indicate a memory leak.

Java Heap Parameters

The Java heap parameters influence the behavior of garbage collection. There are two heap parameters on the Java command line: -ms (starting heap size) and -mx (maximum heap size). Increasing them creates more space for objects, and because that space takes longer for the application to fill, the application runs longer before a GC occurs. However, a larger heap also takes longer to sweep for freed objects and to compact, so each garbage collection takes longer.

Set -ms and -mx to values close to each other for performance analysis

When tuning a production environment, it is important to configure -ms and -mx to equal values (e.g., -ms256m -mx256m). This ensures that the optimal amount of heap memory is available to the application and that Java is not overworked trying to grow the heap. On newer JVM versions (e.g., 1.2, 1.3), however, it is advisable to allow the JVM to adapt the heap to the workload: at a high level, these JVMs contain a mechanism that tries to adapt the heap size to the application's working set, and a heap fixed too large or too small works against it. With WebSphere 3.5 and beyond, consider setting the -ms value to half the value of -mx (e.g., -ms128m -mx256m).

Varying Java heap settings diagram

The above illustration represents three CPU profiles, each running a fixed workload with varying Java heap settings. The center profile has -ms128M and -mx128M. We see 4 GCs. The total time in GC is about 15% of the total run. When we double the heap parameters to 256M, as in the top profile, we see the length of the work time between GCs increase. There are only 3 GCs, but the length of each GC also increased. In the third graph, the heap size was reduced to 64M and exhibits the opposite effect. With a smaller heap, both the time between GCs and time for each GC are shorter. Note that for all three configurations, the total time in garbage collection is approximately 15%. This illustrates an important concept about Java heap and its relationship to object utilization. Garbage collection is a fact of life in Java. You can "pay now" by setting your heap parameters to smaller values or "pay later" by setting your heap to larger values, but garbage collection is never free.

Use GCStats to search for optimal heap settings. Run a series of test experiments varying Java heap settings. For example, run experiments with 128M, 192M, 256M, and 320M. During each experiment, monitor total memory usage. If you expand the heap too aggressively, paging may occur. (Use vmstat or the Windows NT performance monitor to check for paging.) If paging occurs, reduce the size of the heap or add more memory to the system. After each experiment, run GCStats, passing it the total time of the last run. When all runs are done, compare the following statistics from GCStats:

You will likely see behavior similar to the graph above. However, if the application is not over utilizing objects and it has no memory leaks, it will hit a state of steady memory utilization, in which garbage collection will occur less frequently and for short durations.
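In addition to vmstat or the NT Performance Monitor, total heap consumption can be spot-checked from inside the JVM with the standard java.lang.Runtime calls. The small utility below is a hypothetical helper, not part of the guide's tooling:

```java
// Prints current heap consumption. Run (or call) this alongside each heap
// experiment to watch whether the heap is approaching physical memory and
// risking paging. Uses only standard java.lang.Runtime methods.
public class HeapCheck {
    // Returns used heap in kilobytes.
    static long usedKB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / 1024;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long totalKB = rt.totalMemory() / 1024;
        long freeKB  = rt.freeMemory() / 1024;
        System.out.println("Heap total: " + totalKB + " KB, free: " + freeKB
                + " KB, used: " + usedKB() + " KB");
    }
}
```

Note that totalMemory() reports the heap currently reserved by the JVM, which sits between the -ms and -mx settings.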

Relaxing Auto Reloads

Solaris TCP parameters

Number of Connections to DB2

Appendix A - SEStats.java



/**
 * SEStats
 *
 * A servlet program that keeps some simple statistics for a web application
 * running in a servlet engine.  The servlet initializes itself as a listener
 * of servlet invocation events to keep track of the number of requests being
 * concurrently serviced and the time it takes each request to finish.
 *
 * The servlet reports statistics on an overall basis, i.e., since the servlet
 * was initialized, and on an interval basis. To configure the interval length,
 * set the "intervalLength" initial servlet parameter to the number of requests
 * you want in each interval's statistics.  The default intervalLength setting
 * is 100 requests.
 *
 * @version     1.5, 2/4/01
 * @author      Carmine F. Greco
 *
 * 3/17/2000          Initial coding
 */

import com.ibm.websphere.servlet.event.*;
import javax.servlet.*;
import javax.servlet.http.*;
import java.io.PrintWriter;
import java.util.*;

public class SEStats extends HttpServlet implements ServletInvocationListener {
    static Integer lock = new Integer(0);
    static int activeThreads;
    static int aggregateCount;
    static int aggregateComplete;
    static int intervalMax;
    static int overallMax;
    static long totalServiceTime;
    static long lastIntervalServiceTime;
    static int intervalLength = 100;
    static Vector intervalMaxs;
    static Vector intervalTimes;
    static Hashtable urlCount;

    public void init(ServletConfig config) {
        // Initialize all servlet counters and variables.
        activeThreads = 0;
        aggregateCount = 0;
        aggregateComplete = 0;
        intervalMax = 0;
        overallMax = 0;
        totalServiceTime = 0L;
        lastIntervalServiceTime = 0L;
        intervalMaxs = new Vector();
        intervalTimes = new Vector();
        urlCount = new Hashtable();

        // Check for initial parameters
        String tmp;
        if ((tmp = config.getInitParameter("intervalLength")) != null) {
            intervalLength = Integer.valueOf(tmp).intValue();
        }

        // Register as a listener for this WebApplication's servlet invocation events
        ServletContextEventSource sces = (ServletContextEventSource)
            config.getServletContext().getAttribute(ServletContextEventSource.ATTRIBUTE_NAME);
        sces.addServletInvocationListener(this);
    }

    public void doGet(HttpServletRequest req, HttpServletResponse res) {
        try {
            PrintWriter out = res.getWriter();
            out.println("<HTML><HEAD><TITLE>Servlet Engine Statistics</TITLE></HEAD>");
            out.println("<BODY>");
            out.println("<h1>Servlet Engine Statistics</h1>");
            out.println("<h2>Overall Statistics</h2>");
            out.println("<table border>");
            out.println("<TR><TD>Total service requests:</TD><TD>"+aggregateCount+"</TD></TR>");
            out.println("<TR><TD>Overall maximum concurrent thread count:</TD><TD>"+overallMax+"</TD></TR>");
            out.println("<TR><TD>Total service time (ms):</TD><TD>"+totalServiceTime+"</TD></TR>");
            out.println("</table>");

            out.println("<h2>Interval Statistics</h2>");
            out.println("Interval length: " + intervalLength);
            out.println("<table border>");
            out.println("<TR><TH>Interval</TH><TH>Interval Maximum concurrent thread count</TH><TH>Interval Time</TH></TR>");
            for (int i = 0; i < intervalMaxs.size(); i++) {
                if (((Integer)intervalMaxs.elementAt(i)).intValue() == overallMax) {
                    out.println("<b><TR><TD>"+i+"</TD><TD>"+intervalMaxs.elementAt(i)+"</TD><TD>"+intervalTimes.elementAt(i)+"</TD></TR></b>");
                } else {
                    out.println("<TR><TD>"+i+"</TD><TD>"+intervalMaxs.elementAt(i)+"</TD><TD>"+intervalTimes.elementAt(i)+"</TD></TR>");
                }
            }
            out.println("</table>");
            out.println("</BODY>");
            out.println("</HTML>");
        } catch (Exception e) { e.printStackTrace(); }
    }

    public void destroy() {
        // Clean up listener
        ServletContextEventSource sces = (ServletContextEventSource)
            getServletConfig().getServletContext().getAttribute(ServletContextEventSource.ATTRIBUTE_NAME);
        sces.removeServletInvocationListener(this);
    }

    /*
     * ServletInvocationListener
     */
    public void onServletStartService(ServletInvocationEvent event) {
        synchronized (lock) {
            activeThreads++;
            aggregateCount++;

            // Keep track of the interval maximum
            if (activeThreads > intervalMax) {
                intervalMax = activeThreads;
            }

            // Keep track of the overall maximum
            if (intervalMax > overallMax) {
                overallMax = intervalMax;
            }

            if (aggregateCount % intervalLength == 0) {
                // Record and reset interval stats
                intervalMaxs.addElement(new Integer(intervalMax));
                intervalMax = 0;
            }
        }
    }

    public void onServletFinishService(ServletInvocationEvent event) {
        synchronized (lock) {
            aggregateComplete++;
            activeThreads--;

            // Add response time to total time
            totalServiceTime += event.getResponseTime();

            if (aggregateComplete % intervalLength == 0) {
                // Record total interval response time
                intervalTimes.addElement(new Long(totalServiceTime - lastIntervalServiceTime));
                lastIntervalServiceTime = totalServiceTime;
            }
        }
    }
}


Appendix B - GCStats.java

// GCStats.java
// This utility tabulates data generated from a verbose garbage collection trace.
// To run this utility type:
//     java GCStats inputfile [total_time]
//
// Gennaro (Jerry) Cuomo - IBM Corp.  03/2000
// Carmine F. Greco 3/17/00 - JDK1.2.2 compatibility
//
import java.io.*;
import java.util.*;

public class GCStats {

    static int   total_time=-1;                     // total time of run in ms
    static long  total_gctime=0, total_gctime1=0;   // total time spent in GCs
    static long  total_bytes=0, total_bytes1=0;     // total bytes collected
    static long  total_free=0, total_free1=0;       // total free memory percentages
    static int   total_gc=0;                        // total number of GCs

    static boolean verbose=false; // debug trace on/off

    public static void parseLine(String line) {
        //  parsing a string that looks like this...
        //  <GC(31): freed 16407744 bytes in 107 ms, 97% free (16417112/16777208)>
        if (isGCStatsLine(line)) {  // First test if line starts with "<GC..."
            if (verbose) System.out.println("GOT a GC - "+line);
            long temp=numberBefore(line, " bytes")/1024;     // get total memory collected
            total_bytes+=temp; total_bytes1+=(temp*temp);
            temp=numberBefore(line, " ms");             // get time in GC
            total_gctime+=temp; total_gctime1+=(temp*temp);
            temp=numberBefore(line, "% free");          // get % free
            total_free+=temp; total_free1+=(temp*temp);
            if (temp!=0) {
                total_gc++;                              // total number of GCs
            }
        }
    }

    public static int numberBefore(String line, String s) {
        int ret = 0;
        int idx = line.indexOf(s);
        int idx1= idx-1;
        if (idx>0) {
            // the string was found, now walk backwards until we find the blank
            while (idx1!=0 && line.charAt(idx1)!=' ') idx1--;
            if (idx1>0) {
                String temp=line.substring(idx1+1,idx);
                if (temp!=null) {
                    ret=Integer.parseInt(temp);    // convert from string to number
                }
            } else {
                if (verbose) System.out.println("ERROR: numberBefore() - Parse Error looking for "+s);
            }
        }
        return ret;
    }

    public static boolean isGCStatsLine(String line) {
        return ((line.indexOf("<GC") > -1) && (line.indexOf(" freed")>0) && (line.indexOf(" bytes")>0));
    }

    public static void main(String args[]) {
        String filename=null;
        BufferedReader foS=null;
        boolean keepgoing=true;

        if (args.length==0) {
            System.out.println("GCStats - ");
            System.out.println("     - ");
            System.out.println("     - Syntax: GCStats filename [run_duration(ms)]");
            System.out.println("     -  filename = file containing -verbosegc data");
            System.out.println("     -  run_duration(ms) = duration of fixed work run in which GCs took place");
            return;
        }
        if (args.length>0) {
            filename=args[0];
        }
        if (args.length>1) {
            total_time=Integer.parseInt(args[1]);
        }
        if (verbose) System.out.println("Filename="+filename);

        try {
            foS = new BufferedReader(new FileReader(filename));
        } catch (Throwable e) {
            System.out.println("Error opening file="+filename);
            return;
        }

        while (keepgoing) {
            String nextLine;
            try {
                nextLine=foS.readLine();
            } catch (Throwable e) {
                System.out.println("Cannot read file="+filename);
                return;
            }
            if (nextLine!=null) {
                parseLine(nextLine);
            } else {
                keepgoing=false;
            }
        }
        try {
            foS.close();
        } catch (Throwable e) {
            System.out.println("Cannot close file="+filename);
            return;
        }

        System.out.println("-------------------------------------------------");
        System.out.println("- GC Statistics for file - "+filename);
        System.out.println("-------------------------------------------------");
        System.out.println("-**** Totals ***");
        System.out.println("- "+total_gc+" Total number of GCs");
        System.out.println("- "+total_gctime+" ms. Total time in GCs");
        System.out.println("- "+total_bytes+" Kbytes. Total memory collected during GCs");
        System.out.println("- ");
        System.out.println("-**** Averages ***");

        double mean=total_gctime/total_gc, stddev=Math.sqrt((total_gctime1-2*mean*total_gctime+total_gc*mean*mean)/total_gc);
        int    imean=new Double(mean).intValue(), istddev=new Double(stddev).intValue();
        System.out.println("- "+imean+" ms. Average time per GC. (stddev="+istddev+" ms.)");

        mean=total_bytes/total_gc; stddev=Math.sqrt((total_bytes1-2*mean*total_bytes+total_gc*mean*mean)/total_gc);
        imean=new Double(mean).intValue(); istddev=new Double(stddev).intValue();
        System.out.println("- "+imean+" Kbytes. Average memory collected per GC. (stddev="+istddev+" Kbytes)");

        mean=total_free/total_gc; stddev=Math.sqrt((total_free1-2*mean*total_free+total_gc*mean*mean)/total_gc);
        imean=new Double(mean).intValue(); istddev=new Double(stddev).intValue();
        System.out.println("- "+imean+"%. Free memory after each GC. (stddev="+istddev+"%)");

        if (total_time>0 && total_gctime>0) {
            System.out.println("- "+((total_gctime*1.0)/(total_time*1.0))*100.0+"% of total time ("+total_time+"ms.) spent in GC.");
        }
        System.out.println("___________________________ "+new Date());
        System.out.println("");
    }
}


Appendix C - Additional References


FOOTNOTES:

1 Since the Servlet Redirector function (typically used as a means of separating the web server and servlet engine on different machines) is an EJB Client application, it is an Open Queue.

2 Prepared statements are optimized for handling parametric SQL statements that benefit from precompilation. If the JDBC driver, specified in the data source, supports precompilation, the creation of a prepared statement will send the statement to the database for precompilation. Some drivers may not support precompilation. In this case, the statement may not be sent to the database until the prepared statement is executed.

Appendix D - Performance Tool Procedures

Starting NT Performance Monitor



Start -> Programs -> Administrative Tools -> Performance Monitor


Edit IBM HTTP Server file httpd.conf

Tune the IBM HTTP Server by editing the file httpd.conf with a text editor. The file is located in the directory:



<IBM HTTP Server root directory>/conf/


WebSphere AppServer Resource Analyzer

Enable Resource Analyzer counter collection

From the WebSphere Application Server Admin Console:



select your application server (DefaultServer, for example)
right-click on your application server and select "Performance"
double-click on "Performance Modules" to see Resource Analyzer categories
     (each category is associated with a set of counters appropriate
     for the category; one RA category is database connection pools)
expand a category to see objects that relate to your particular configuration
     (if you defined a datasource myDS, you would see it under
     the category of database connection pools)
select an object of interest
right-click on the object of interest and select: none, low, medium, high
press the "Set" button


If you selected medium for myDS, then all counters in the database connection pools category with a low and medium impact are collected. Counter impacts are fixed by Resource Analyzer. In this example,



poolSize (impact=medium) would be collected
percentUsed (impact=high) would not be collected


The Resource Analyzer Help defines counters that are associated with each category and the impact for each counter.

Start Resource Analyzer

Run ra.bat or ra.sh as directed.