Your pager hasn't been sleeping well. It periodically wakes you up in the middle of the night to tell you that your server is firing off "OutOfMemoryError" messages. Worse still, your significant other forcibly relocated you to the couch and told you not to return until your pager stops buzzing.
Sound familiar? If so, you may have a case of memory-leak-induced insomnia, but fortunately we've got a cure for what ails you. This tutorial will teach you everything you need to know to ease your suffering, including what memory leaks are, why they happen, and how to diagnose and fix 'em.
In this article we'll focus on techniques that will enable you to address memory leaks with any commercial or free/open source memory profiler; we're not here to recommend one tool over another. After all, the most important thing is that you fix the problem and get some rest, not the tool you use to get it done.
Ever heard the story about how Java has "automatic memory management"—you know, the one that someone in marketing upgraded to an epic tale about how you'll never have to worry about memory leaks ever again? As is often the case, the truth is more complex than the marketing department made it out to be. While it's true that Java’s garbage collector (GC) helps to eliminate the most common memory leak issues from applications, it is unfortunately still possible to experience memory leaks in Java. However, they happen a lot less often than they used to in the C or C++ days.
Many people believe that black magic and complex tools are required to fix memory leaks. This undeserved reputation comes from a lack of good explanations of what memory leaks are and what to do when you encounter them. With the proper tools and knowledge, they aren't nearly as intimidating.
In this article we'll cover everything from memory leak basics to analyzing heap dumps, so, whether you're an experienced Java developer or encountering Java memory leaks for the first time, you'll be better prepared to deal with memory leaks by the time you reach the conclusion. We won't outline a series of steps like "do ABC with commercial tool XYZ and don't ask why," as that approach doesn't work and implies that remedies are more complex than they really are. Instead, we'll give you the background information necessary to address memory leaks, with emphasis on the particular steps you'll need to execute. Similarly, we'll assume that you can learn on your own how to use the memory profiler of your choice; what's usually missing is an understanding of what the tool is trying to do and why, so that will be the focus of this article.
Java will be used for all examples, so all information in this article directly applies to Java applications running standalone or as part of a J2EE/JEE/Tomcat-based application server. But remember, although our primary focus is on Java, most of the process of diagnosing and fixing memory leaks described herein applies to other languages with garbage collectors. So even if you're using Ruby, C#, or Python, there should be something for you in this article.
Let's start by describing how memory leaks happen in Java. Java implements automatic garbage collection (GC), and once you stop using an object you can depend on the garbage collector to collect it. While additional details of the collection process are important when tuning GC performance, for the sole purpose of fixing a memory leak we can safely ignore them.
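To make this concrete, here is a small illustrative sketch (the class and variable names are hypothetical). It shows that an object becomes eligible for collection as soon as the last reference to it is dropped; there is no explicit call to free memory, and the program does not control when the collector actually reclaims the object.

```java
// Illustrative only: eligibility for GC depends on reachability, not on an explicit free().
class GcEligibilityDemo {

    static int demo() {
        // One StringBuilder object on the heap, reachable through 'order'.
        StringBuilder order = new StringBuilder("order #1");
        // A second reference to the same object.
        StringBuilder alias = order;

        order = null;  // Still reachable through 'alias', so NOT yet eligible for GC.
        int length = alias.length();

        alias = null;  // No references remain: the object is now eligible for GC.
                       // When (or whether) it is actually reclaimed is up to the collector.
        return length;
    }

    public static void main(String[] args) {
        System.out.println("length = " + demo());
    }
}
```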
When is memory eligible for GC? An object becomes eligible as soon as the running program no longer holds any reference to it. We don't have to do anything special to make an object eligible for GC: we just eliminate any references to it, and it "magically" disappears and stops using memory. That's why we say that Java performs "automatic" GC.
Why "eligible" for GC? Because objects are not collected immediately. GC is not instantaneous and comes with some performance impact. Consequently, Java doesn't immediately collect every object that is eligible for collection; it typically postpones collection until a more convenient time. The way to think about GC in Java is that it's a "lazy bachelor" who hates taking out the trash and typically puts it off for a while. However, if the trash can begins to overflow, Java takes it out immediately. In other words, if memory becomes scarce, Java runs GC right away to free memory.
Since we don't need to do anything special to dispose of objects in Java, how do memory leaks happen? They occur when a program never stops using an object and thus keeps a permanent reference to it.
Consider a program, MemoryLeakDemo, that keeps adding new elements to a list called memoryLeakArea and never removes them. Eventually it exhausts all available memory in the JVM, at which point an OutOfMemoryError is thrown and the program fails with output like this:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at MemoryLeakDemo.main(MemoryLeakDemo.java:14)
In this example, we continue adding new elements to memoryLeakArea without ever removing them. In addition, we keep a reference to memoryLeakArea itself, which prevents GC from collecting the list. So although GC is available, it cannot help, because the program still references every element it has added.
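A minimal sketch of a program that leaks in the way just described. It reuses the names MemoryLeakDemo and memoryLeakArea from the text; the byte-array elements, the 1MB chunk size, and the leakOnce helper are assumptions for illustration, so its line numbers will not match the stack trace.

```java
import java.util.ArrayList;
import java.util.List;

class MemoryLeakDemo {

    // A static field keeps the list reachable for as long as the class is loaded,
    // so neither the list nor anything added to it can ever be collected.
    static final List<byte[]> memoryLeakArea = new ArrayList<>();

    // Allocates a chunk and keeps a permanent reference to it; nothing is ever removed.
    static void leakOnce(int chunkBytes) {
        memoryLeakArea.add(new byte[chunkBytes]);
    }

    public static void main(String[] args) {
        // Unbounded loop: each iteration retains another 1MB chunk, so the heap
        // eventually fills up and the JVM throws OutOfMemoryError.
        while (true) {
            leakOnce(1024 * 1024);
        }
    }
}
```

Running it with a small heap (for example `java -Xmx64m MemoryLeakDemo`) should make the error appear quickly; a larger -Xmx only delays it, which is the signature of an unbounded leak.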
The more time passes, the more memory we use, so in effect this program would need an infinite amount of memory to keep running. This is an example of an unbounded memory leak: the longer the program runs, the more memory it takes. So even if the memory size is increased, the application will still run out of memory eventually.
As illustrated by the flowchart below, the process of fixing memory leaks is fairly simple:
Memory leaks are misunderstood creatures. Just getting an OutOfMemoryError doesn't necessarily mean that you're suffering from a memory leak. So, before you dive into "fixing" the problem, you must first find out whether or not a memory leak actually exists. If one does, the next step is to determine which objects are leaking and uncover the source of the leak. Then, you fix it.
We'll skip past the initial steps and dive right into diagnosing whether or not the problem is a memory leak.
Not every OutOfMemoryError alert indicates that a program is suffering from a memory leak. Some programs simply need more memory to run. In other words, some OutOfMemoryError alerts are caused by the load, not by the passage of time, and as a result they indicate the need for more memory in a program rather than a memory leak.
To distinguish between a memory leak and an application that simply needs more memory, we need to look at the "peak load" concept. When a program has just started, no users have used it yet, and as a result it typically needs much less memory than when thousands of users are interacting with it. Thus, measuring memory usage immediately after a program starts is not the best way to gauge how much memory it needs! To measure how much memory an application needs, take memory measurements at the time of peak load, when the application is most heavily used.
The graph below shows the memory usage in a healthy Java application that does not suffer from memory leaks, with the peak load occurring around 10 AM and application usage drastically decreasing at 5 PM. Naturally, the peak load on business applications often correlates with normal business hours.
The application illustrated by the chart above reaches its peak load around 10 AM and needs around 900MB of memory to run. This is normal behavior for an application with no memory leaks; the difference in memory requirements throughout the day is caused solely by the user load.
Now, let's suppose that we have a memory leak in the application. The primary characteristic of a memory leak is that memory requirements increase as a function of time, not as a function of load. Let's see how the application would look after running for a few days with a memory leak and the same peak user loads reached around 10 AM every day:
Because peak loads on the system are similar every morning but memory usage grows over a period of a few days, this picture indicates a strong possibility of a memory leak. If the program eventually started failing with OutOfMemoryError, that would be a very strong indication of a memory leak.
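One way to apply the peak-load idea numerically is to record the post-GC heap usage at the same peak hour each day and fit a trend line through those readings. The sketch below uses an ordinary least-squares slope; the class name and sample readings are hypothetical, and the result is only meaningful when the daily peak load is comparable from day to day.

```java
// Estimates how fast peak-load memory grows, in MB per day, with a least-squares fit.
class LeakTrend {

    /**
     * @param peakUsedMb heap in use (after GC) at the daily peak, one sample per day
     * @return fitted slope in MB/day; near zero suggests load-driven usage,
     *         a steady positive value suggests a leak
     */
    static double mbPerDay(double[] peakUsedMb) {
        int n = peakUsedMb.length;
        if (n < 2) throw new IllegalArgumentException("need at least two days of samples");
        double meanX = (n - 1) / 2.0;          // days are numbered 0, 1, ..., n-1
        double meanY = 0;
        for (double y : peakUsedMb) meanY += y;
        meanY /= n;
        double num = 0, den = 0;
        for (int day = 0; day < n; day++) {
            double dx = day - meanX;
            num += dx * (peakUsedMb[day] - meanY);
            den += dx * dx;
        }
        return num / den;
    }

    public static void main(String[] args) {
        // Hypothetical readings: same 10 AM peak load each day.
        double[] leaking = {900, 1000, 1100, 1200};
        double[] healthy = {900, 905, 898, 902};
        System.out.printf("leaking: %.1f MB/day%n", mbPerDay(leaking));
        System.out.printf("healthy: %.1f MB/day%n", mbPerDay(healthy));
    }
}
```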
The picture above shows a memory leak of about 100MB per day.
Note that the key to this example is that the only thing changing is the amount of time the system has been up; the system's peak load doesn't change over time. This is not the case for all businesses. For example, the peak load for a tax preparation service is seasonal, as there are likely more users on the system in April than in July.
There is one special case that should be noted here: a program that needs to be restarted periodically to prevent it from crashing with an OutOfMemoryError. Imagine that on the previous graph the max memory size was 1100MB. If the program started with about 900MB of memory used, it would take about 48 hours to crash because it leaks about 100MB of memory per day. Similarly, if the max memory size was set to 1000MB, the program would crash every 24 hours. However, if the program was regularly restarted more often than this interval, it would appear that all is fine.
Regularly scheduled restarts may appear to help, but they can also make "upward-sloping memory use" (as shown in the previous graph) more difficult to notice, because the graph is cut short before the pattern emerges. In a case like this, you'll need to look more carefully at the memory usage, or try increasing the available memory so that it's easier to see the pattern.
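The crash intervals above follow from simple arithmetic: the remaining headroom divided by the leak rate. A minimal sketch, assuming a constant leak rate (real leaks are rarely that steady); the class and method names are hypothetical.

```java
// Back-of-the-envelope estimate of how long a steady leak takes to exhaust the heap.
class TimeToOom {

    /** Hours until usedMb reaches maxMb when memory grows by leakMbPerDay. */
    static double hoursUntilOom(double maxMb, double usedMb, double leakMbPerDay) {
        if (leakMbPerDay <= 0) return Double.POSITIVE_INFINITY; // no growth, no leak-driven crash
        return (maxMb - usedMb) / leakMbPerDay * 24.0;
    }

    public static void main(String[] args) {
        // Figures from the text: about 900MB in use, leaking about 100MB per day.
        System.out.println(hoursUntilOom(1100, 900, 100)); // 48.0
        System.out.println(hoursUntilOom(1000, 900, 100)); // 24.0
    }
}
```

Any scheduled restart interval shorter than the computed figure will keep the leak from ever producing a crash, which is exactly why it can go unnoticed.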
As you may already know, you need to measure the memory that is free and used inside the JVM, not the memory that the JVM process is using. In other words, tools such as top, Task Manager, or Activity Monitor measure how much memory your Java process is using, but that's not what you need. You need a tool that can look inside the JVM process and tell you how much memory is available to your program running inside the JVM.
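From inside a Java program, java.lang.Runtime reports heap figures for the JVM itself rather than for the operating-system process. A minimal sketch (the class name is hypothetical); keep in mind that "used" here still includes garbage that has not been collected yet.

```java
// Reads heap figures from inside the JVM, as opposed to the process size
// reported by top, Task Manager, or Activity Monitor.
class HeapSnapshot {

    static final long MB = 1024 * 1024;

    /** Heap currently occupied by objects, including garbage not yet collected. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // totalMemory: heap currently committed; maxMemory: the most the JVM will try to use (-Xmx).
        System.out.printf("used=%dMB committed=%dMB max=%dMB%n",
                usedBytes() / MB, rt.totalMemory() / MB, rt.maxMemory() / MB);
    }
}
```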
In addition, keep in mind that Java GC is not a constant process; it runs in intervals. The memory usage that you see in the JVM is usually higher than what your program needs at the moment, because GC hasn't run yet. Remember, lazy bachelors usually have some trash in their apartments waiting to be taken out.
So what you need to investigate is not the current memory usage, but rather the average usage over a long period of time. For example, if your program is currently using 100MB of memory and five seconds later it's using 101MB, that's not an indication of a memory leak, because GC might free up memory when it eventually runs. But if your program's memory usage keeps increasing over a long period of time under constant load, you might have trouble on your hands.
There are a couple of options for measuring the amount of memory a program uses. The simplest one, which does not require any tools and works even with production systems, is the Verbose GC log.
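The java.lang.management API can report a closer proxy for what the program really needs: MemoryPoolMXBean.getCollectionUsage() returns each pool's usage as of the end of its most recent collection. A rough sketch that sums this across heap pools (the class name is hypothetical); the sum mixes pools collected at different moments, so treat it as an approximation to sample over hours or days, not a precise figure.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Sums heap usage as measured right after the most recent GC of each heap pool,
// which filters out most of the "trash not yet taken out".
class PostGcUsage {

    static long heapUsedAfterLastGcBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool doesn't support it
            if (afterGc != null) total += afterGc.getUsed();
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        // A real monitor would log this for days, not seconds.
        for (int i = 0; i < 3; i++) {
            System.out.printf("heap after last GC: %d KB%n", heapUsedAfterLastGcBytes() / 1024);
            Thread.sleep(1000);
        }
    }
}
```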
The Verbose GC log is enabled when the JVM process is started. There are a couple of switches that can be used:
The following is an example of the output generated for Tomcat running in the default configuration with all of the previous switches enabled:
1.854: [GC 1.854: [DefNew: 570K->62K(576K), 0.0012355 secs] 2623K->2175K(3980K), 0.0012922 secs]
1.871: [GC 1.871: [DefNew: 574K->55K(576K), 0.0009810 secs] 2687K->2229K(3980K), 0.0010752 secs]
1.881: [GC 1.881: [DefNew: 567K->30K(576K), 0.0007417 secs] 2741K->2257K(3980K), 0.0007947 secs]
1.890: [GC 1.890: [DefNew: 542K->64K(576K), 0.0012155 secs] 2769K->2295K(3980K), 0.0012808 secs]
The most important set of numbers on each line is the one around the second -> (e.g., in the top line it is 2623K->2175K(3980K)). These figures are the total heap in use before and after the collection, with the total heap size in parentheses. They show that at the end of each GC cycle, we are using around 2200K of memory.
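When a Verbose GC log covers days or weeks, it helps to extract the after-GC heap figures automatically and look at their trend. A minimal sketch (the class name is hypothetical); the regular expression is written for the JDK 6-era HotSpot lines shown above and the shorter form printed without detailed GC output, and other JVM versions, collectors, and the unified logging format of JDK 9 and later would need a different pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pulls the whole-heap "after GC" figure (in KB) out of old-style verbose GC lines.
class GcLogParser {

    // The whole-heap transition follows either the closing bracket of the young
    // generation details ("secs] 2623K->2175K(3980K)") or the GC label itself
    // ("[GC 2623K->2175K(3980K)" when detailed output is off).
    private static final Pattern HEAP_TRANSITION =
            Pattern.compile("(?:\\]|GC)\\s+(\\d+)K->(\\d+)K\\((\\d+)K\\)");

    static List<Long> heapAfterGcKb(List<String> lines) {
        List<Long> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = HEAP_TRANSITION.matcher(line);
            if (m.find()) {
                result.add(Long.parseLong(m.group(2))); // group 2 = heap in use after GC
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "1.854: [GC 1.854: [DefNew: 570K->62K(576K), 0.0012355 secs] 2623K->2175K(3980K), 0.0012922 secs]",
                "1.871: [GC 1.871: [DefNew: 574K->55K(576K), 0.0009810 secs] 2687K->2229K(3980K), 0.0010752 secs]");
        System.out.println(heapAfterGcKb(log)); // [2175, 2229]
    }
}
```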
This trace is not an indication of a memory leak: it shows only a short-term trend, with less than a second between samples, which is why we must observe long-term trends. However, if the Verbose GC log showed that the program was using around 2200K of memory after running for two days, and after running for 10 days it was using 2GB of memory (even right after GC had run), we could then conclude that there's a memory leak.
All the information that needs to be collected in order to determine if a memory leak exists can be found in the results of the Verbose GC logs. The other memory monitoring tools we'll cover in this article simply provide more information in a form that's easier to interpret.
The following approach works for any Java process, including standalone clients as well as application servers like JBoss and servlet containers like Tomcat. It is based on starting the Java process with JMX monitoring enabled and attaching with the JMX monitoring tools. We'll use Tomcat in the following example.
To start Tomcat or any other Java process with JMX monitoring enabled, use the following options when starting the JVM:
Note that if you're on a production system, you'll most likely want to secure JMX access before running the JVM with these parameters. For that, you can specify these additional options:
Once started, you can use JConsole or VisualVM to attach to the process. Note that later JDK 6 versions include VisualVM.
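JConsole and VisualVM are JMX clients, so the same heap figures can also be read programmatically, for example to log them periodically. A minimal sketch using the standard javax.management.remote API (the class name is hypothetical); the host and port are placeholders for whatever the target JVM was started with, and the service URL is the usual form for the JVM's built-in JMX agent.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Reads the heap figures that JConsole shows on its Memory tab, over any MBeanServerConnection.
class JmxHeapReader {

    static MemoryUsage heapUsage(MBeanServerConnection conn) throws IOException {
        MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
        return memory.getHeapMemoryUsage();
    }

    /** Connects to a JVM started with remote JMX enabled on the given host and port. */
    static MemoryUsage remoteHeapUsage(String host, int port) throws IOException {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            return heapUsage(connector.getMBeanServerConnection());
        }
    }

    public static void main(String[] args) throws IOException {
        MemoryUsage heap = args.length == 2
                ? remoteHeapUsage(args[0], Integer.parseInt(args[1]))    // e.g. a Tomcat JVM
                : heapUsage(ManagementFactory.getPlatformMBeanServer()); // this JVM
        System.out.printf("heap used=%dKB committed=%dKB%n",
                heap.getUsed() / 1024, heap.getCommitted() / 1024);
    }
}
```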
This is an example of JConsole monitoring Tomcat. As shown in the example below, click the Memory tab to get memory information:
Again, we're interested in the long-term trends of heap memory usage, not trends from just a few minutes of running.
Finally, note that monitoring tools like Hyperic can typically show historical trends over longer periods of time than VisualVM or JConsole. In addition, tools like Hyperic allow more fine-grained control over the operations users are permitted to perform. As a result, we recommend dedicated monitoring tools for production systems, while the other tools discussed in this section are more appropriate for developers or for "first aid" in the absence of real monitoring tools.
We now know how to find out whether or not a memory leak exists, and we can even determine the speed at which we're leaking memory (e.g., 100MB per day). But are we better off with a "slow" leak (e.g., 1MB per day) or a "fast" leak (e.g., 500MB/hour)?
It depends. With a "slow" leak, it takes longer for the system to run out of memory. For example, if we have 256MB of free memory remaining at the peak load time, it can take quite a long time to run out of memory with a 1MB/day memory leak. On the other hand, a faster memory leak makes it easier to reproduce and fix the problem.
One of the best ways to quickly determine whether there's a fast memory leak or you simply need more memory to run at the peak time is to allot more memory to the program and see what happens. If you increase the available heap in a Java program and the crashes still happen, only with more time between them, rather than stopping altogether, you are likely suffering from a memory leak.
Let's assume that we're lucky enough to have a fast memory leak, on the order of 100MB per hour, on our hands, and that the initial memory picture looks like this:
In this example, we can expect to run out of memory after about five hours of running time.
Somewhat surprisingly, it is much easier to debug large memory leaks than small memory leaks. The reason is that memory leaks present a sort of "needle in the haystack" type problem—you need to find the leaked objects amongst all the other objects in the program.
Suppose that the program we're debugging just ran out of memory. If the program initially had 512MB of free memory and now has none, it is obvious that the leaked objects used about 512MB of memory. The figure below illustrates this example:
In this situation, half of the objects in memory have been leaked! If we randomly select an object, there's a 50% chance that it has leaked. And because memory leaks usually consist of objects of a few classes, you can get a really good start on determining which objects are leaking by sorting memory usage by the aggregate memory use of all objects of the same class.
What if the ratio is less favorable (e.g., a 512MB heap, 480MB of initially used memory, and 32MB of leaked objects at the end)? If you have an easy-to-reproduce memory leak, you can increase the heap size, because an unbounded memory leak will eventually fill any amount of memory allotted to the program. So you can increase the heap to 1GB, reproduce the memory leak, and get 544MB of leaked objects in a 1GB heap.
If we had a way to look at the "complete picture" of memory, it would be fairly easy to pinpoint leaked objects. Fortunately, there's a way to do exactly this: take a heap dump of the process.
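The arithmetic behind growing the heap is simple: once an unbounded leak has filled the heap, everything above the program's normal footprint is leaked. A small sketch (the class and method names are hypothetical); the half-leaked case is modeled as a 1GB heap with 512MB of normal usage.

```java
// Share of the final heap made up of leaked objects, assuming an unbounded leak
// eventually fills everything above the program's normal (non-leaked) footprint.
class LeakShare {

    static double leakedFraction(double heapMb, double normalUsageMb) {
        return (heapMb - normalUsageMb) / heapMb;
    }

    public static void main(String[] args) {
        System.out.println(leakedFraction(1024, 512)); // 0.5     (512MB leaked in 1GB)
        System.out.println(leakedFraction(512, 480));  // 0.0625  (32MB leaked in 512MB)
        System.out.println(leakedFraction(1024, 480)); // 0.53125 (544MB leaked in 1GB)
    }
}
```

Doubling the heap turns a leak that is about 6% of memory into one that is more than half of it, so a randomly chosen object is far more likely to be a leaked one.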
A heap dump is a list of the objects in the JVM's memory, together with the content of the memory occupied by those objects. It preserves the values of the objects' attributes, including references to other objects. In other words, a heap dump gives you a complete picture of the memory.
There are multiple tools that allow you to dump the heap of a Java process:
Some tools, like VisualVM and memory profilers, allow you to initiate a heap dump from the GUI, but you don't need any fancy tools here; jmap will do just fine. Because it covers the most general case, we'll use jmap in the next example.
Before you dump the heap, be sure to keep the following issues in mind:
With those final words of caution out of the way, you should now be ready to run the following command:
jmap -dump:live,format=b,file=FILENAME PID
Note that the -F option, which forces a dump of a non-responsive program, might be useful on UNIX systems, but is not available on Windows. Note also that JDK 6 includes the option -XX:+HeapDumpOnOutOfMemoryError, which dumps the heap whenever an OutOfMemoryError is thrown. This can be a useful option, but keep in mind that it has the potential to consume significant amounts of disk space.
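If you can add code to the application, the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean can write a heap dump from inside the running process. A minimal sketch (the class name is hypothetical); it assumes a HotSpot-based JVM, the target file must not already exist and its name must end in .hprof, and the temporary directory is an arbitrary choice. Dumping pauses the application and can produce a large file, so use it with the same care as jmap.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Triggers a heap dump from inside the running JVM (HotSpot-based JVMs only).
class InProcessHeapDump {

    /**
     * Writes an HPROF heap dump to 'file', which must not already exist.
     * live=true dumps only reachable objects, similar to jmap's "live" option.
     */
    static void dump(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diag.dumpHeap(file.toString(), live);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("dumps").resolve("heap.hprof");
        dump(file, true);
        System.out.println("wrote " + Files.size(file) + " bytes to " + file);
    }
}
```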
You now have a heap dump in the file FILENAME and are ready to analyze it.
With the heap dump complete, we can now take a look at the memory and find out what's really causing the memory leak.
Suppose that objects are holding references to each other as illustrated by the picture below. For the sake of easy calculation, let's assume that each object is 100 bytes, so that all of them together occupy 600 bytes of memory.
Now, suppose that the program holds a reference to object A for a prolonged period of time. As a result, objects B, C, D, E, and F are all ineligible for garbage collection, and we have the following amount of memory leaking:
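So, holding a reference to object A causes a memory leak of 600 bytes. The shallow heap of an object is the memory the object itself occupies, while its retained heap is all the memory that would be freed if that object were released. Here, the shallow heap of object A is 100 bytes (object A itself), and the retained heap of object A is 600 bytes.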
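Shallow and retained heap can be computed directly from their definitions: retained heap is everything that becomes unreachable once the object is released. The sketch below does this on a toy graph of objects A to F. The text doesn't spell out every reference, so the edges here are hypothetical, chosen to include a cycle between D, E, and B.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RetainedHeap {

    // Hypothetical edges (object -> objects it references), including a D -> E -> B -> D cycle.
    static final Map<String, List<String>> REFS = Map.of(
            "A", List.of("B", "C"),
            "B", List.of("D"),
            "C", List.of("D", "F"),
            "D", List.of("E", "F"),
            "E", List.of("B"));
    // Every object is 100 bytes, as in the text.
    static final Map<String, Long> SIZE = Map.of(
            "A", 100L, "B", 100L, "C", 100L, "D", 100L, "E", 100L, "F", 100L);

    /** Objects reachable from the roots, pretending 'excluded' (may be null) is gone. */
    static Set<String> reachable(Set<String> roots, String excluded) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            String obj = todo.pop();
            // Skip the released object and anything already visited (this handles cycles).
            if (obj.equals(excluded) || !seen.add(obj)) continue;
            for (String next : REFS.getOrDefault(obj, List.of())) todo.push(next);
        }
        return seen;
    }

    /** Retained heap: bytes that become unreachable if 'obj' is released. */
    static long retainedBytes(String obj, Set<String> roots) {
        Set<String> stillReachable = reachable(roots, obj);
        long total = 0;
        for (String o : reachable(roots, null)) {
            if (!stillReachable.contains(o)) total += SIZE.get(o);
        }
        return total;
    }

    public static void main(String[] args) {
        Set<String> roots = Set.of("A"); // the program holds on to A
        System.out.println("shallow(A)  = " + SIZE.get("A"));             // 100
        System.out.println("retained(A) = " + retainedBytes("A", roots)); // 600
        System.out.println("retained(D) = " + retainedBytes("D", roots)); // 200 (D and E)
    }
}
```

Note that retained(D) is 200 bytes, not 300: F stays reachable through C, so releasing D frees only D and E. Heap analyzers compute retained sizes far more efficiently (Eclipse MAT, for instance, builds a dominator tree), but the quantity being computed is the same.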
Although objects A through F are all leaked, the real cause of the memory leak is the program holding a reference to object A. So how can we fix the root cause of this leak? If we first identify that object F is leaked, we can follow the reference chain back through objects D, C, and A to find the cause of the memory leak. However, there are some complications to this "follow the reference chain" process:
If we start following inbound references from object F in this example, we have to choose between following object C or object D. In addition, there's the possibility of getting caught in a cycle by repeatedly following the path between objects D, E, and B. On this small diagram it's easy to see that the root cause is holding object A, but when you're dealing with hundreds of thousands of objects (as any self-respecting memory leak does), you quickly realize that manually following the reference chain can be very complex and time-consuming.
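Heap analyzers automate this search. A breadth-first search outward from the GC roots finds the shortest reference chain to a suspect object and, because it never revisits an object, cannot get stuck in a cycle. A minimal sketch on a toy graph of objects A to F; the class name and the edges are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class PathToRoot {

    // Hypothetical edges (object -> objects it references), including a D -> E -> B -> D cycle.
    static final Map<String, List<String>> REFS = Map.of(
            "A", List.of("B", "C"),
            "B", List.of("D"),
            "C", List.of("D", "F"),
            "D", List.of("E", "F"),
            "E", List.of("B"));

    /** Shortest chain from any root to 'target' (root first), or an empty list if unreachable. */
    static List<String> shortestChain(List<String> roots, Map<String, List<String>> refs, String target) {
        Map<String, String> parent = new HashMap<>(); // object -> object that first reached it
        Queue<String> queue = new ArrayDeque<>();
        for (String r : roots) {
            parent.put(r, null);
            queue.add(r);
        }
        while (!queue.isEmpty()) {
            String obj = queue.poll();
            if (obj.equals(target)) {
                List<String> chain = new ArrayList<>();
                for (String o = obj; o != null; o = parent.get(o)) chain.add(o);
                Collections.reverse(chain);
                return chain;
            }
            for (String next : refs.getOrDefault(obj, List.of())) {
                if (!parent.containsKey(next)) { // visit each object once: cycles are harmless
                    parent.put(next, obj);
                    queue.add(next);
                }
            }
        }
        return List.of(); // unreachable from the roots: garbage, not a leak
    }

    public static void main(String[] args) {
        System.out.println(shortestChain(List.of("A"), REFS, "F")); // [A, C, F]
    }
}
```

Here the shortest chain is A, C, F, which points straight at A as the object being held.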
This is where some shortcuts can come in handy:
Now that we understand what memory leaks are and how to trace them back to their root cause, let's find out how to fix them by analyzing heap dumps.
Strictly speaking, you don't need any tools that are not already part of the JDK. The JDK ships with a tool called jhat, which you can use to inspect the heap dump. The process of fixing memory leaks is the same with all tools, and it's our opinion that no single tool is "light years" ahead of the others when it comes to fixing memory leaks.
Although jhat will get the job done, better tools often provide a few extra helpful features:
There are several free tools that are useful for analyzing heap dumps in Java. One that's widely used, and is even included with later JDK 6 releases, is VisualVM. VisualVM is a nice tool that gives you just enough to resolve memory leaks, and it shows heap dumps and relations between objects in graphical form.
Feature-wise, one step above VisualVM is the Eclipse Memory Analyzer Tool (MAT), a free tool that includes a lot of additional options. Although it's still in the incubation phase as of publication of this article, MAT is free and we've found it to be extremely useful.
Commercial products like JProfiler, YourKit, and JProbe are also excellent tools for debugging memory leaks. These applications include a few options that go above and beyond VisualVM and MAT, but they're certainly not necessary to successfully debug memory leaks. Unless you already have a license for one of these commercial tools, we recommend trying MAT first.
If you have a fast memory leak and are able to reproduce it in such a way that the leaked objects make up a significant portion of the final memory picture, then identifying them is simple. Sort the classes by the total memory used by all instances of each class; the classes near the top of the list are usually the leaked ones. Then, you can follow the reference chain to those objects until you find the cause of the memory leak.
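The sorting step is a class histogram: sum the size of every instance per class and order the classes by total. A minimal sketch over a hypothetical list of (class, size) records standing in for a parsed heap dump; the class names are made up. For a running JVM, `jmap -histo PID` prints a similar per-class table of instance counts and bytes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Aggregates per-object sizes by class and sorts classes by total bytes, descending.
class ClassHistogram {

    static final class Obj {
        final String className;
        final long shallowBytes;

        Obj(String className, long shallowBytes) {
            this.className = className;
            this.shallowBytes = shallowBytes;
        }
    }

    static List<Map.Entry<String, Long>> byTotalSize(List<Obj> heap) {
        Map<String, Long> totals = new HashMap<>();
        for (Obj o : heap) totals.merge(o.className, o.shallowBytes, Long::sum);
        List<Map.Entry<String, Long>> rows = new ArrayList<>(totals.entrySet());
        rows.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return rows;
    }

    /** Hypothetical heap: a few ordinary objects plus many leaked Session objects. */
    static List<Obj> sampleHeap() {
        List<Obj> heap = new ArrayList<>();
        heap.add(new Obj("java.lang.String", 48));
        heap.add(new Obj("com.example.Config", 200));
        for (int i = 0; i < 1000; i++) heap.add(new Obj("com.example.Session", 120));
        return heap;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> row : byTotalSize(sampleHeap())) {
            System.out.println(row.getValue() + "\t" + row.getKey());
        }
    }
}
```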
A useful heuristic here is that if you sort all the objects by retained heap, you can find the objects that are likely the root cause of the memory leak. Again, an example:
If we assume that each instance of object C is 100 bytes, then holding object A resulted in a memory leak of almost 1GB! In other words, the retained heap of object A in this example is about 1GB. So, if we were investigating this memory leak, it would be fairly obvious that we should start there. Remember, the Eclipse Memory Analyzer Tool (MAT) allows you to sort the heap dump by the retained heap of the objects, so it would have been easy to use MAT to determine the retained heap of object A.
It always helps if you can easily reproduce the problem, but what if the memory leak is really small and slow? What should you do if after a few days you can reproduce only a 10MB leak in a heap that contains 2GB of objects? In situations like this, looking for the cause of the problem can feel like finding a needle in the haystack.
One solution in this case is the brute force approach: simply devote enough time to looking at the objects to find the problem. While that will ultimately work, there's another way to address the problem if you have an idea of which operations are leaking memory. If you know or suspect which operations leak memory, you can find even relatively slow memory leaks.
The secret here is to compare heap dumps, as this highlights objects that are present in one heap dump but not in another. For example, suppose that objects A, B, and C are present in the first heap dump, while objects A, B, C, D, and E are present in the second heap dump. In this case, objects D and E are the difference.
Comparing heap dumps makes it easier to find memory leaks because it effectively reduces the size of the "haystack" containing your needle. It can easily shrink the set of objects you need to examine from 2GB to a few KB.
Many commercial profilers, as well as Eclipse MAT, can be used to compare heaps. Once you've selected a tool that can compare heaps, simply follow the process explained in the following diagram:
In effect, what you're doing is creating two snapshots, one before and one after executing the use case that you know is leaking memory. That way, you significantly reduce the number of objects that you need to investigate to find leaked objects. After you find the objects that leaked, you can find the cause of the memory leak as described in the previous section.
Why perform garbage collection before taking a snapshot? Because some tools won't automatically perform GC for you, and consequently the snapshots might include objects that are no longer reachable. Many modern tools will perform GC before taking the snapshot, but if you are unsure whether your tool performs a full GC before taking a snapshot, it is recommended that you do so yourself. That way you don't have to worry about objects that are eligible for GC but have not yet been collected.
Among the objects that are created during the use case, some are supposed to be there because they are the result of the use case execution (e.g., we created a new customer, and that customer object should be retained), while other objects are the memory leak.
However, since the only objects present in the difference between the snapshots are the ones that were created during the use case, the size of the haystack you need to look through is significantly reduced.
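The same idea can be sketched at the level of a class histogram: count the instances of each class in both snapshots and keep only the classes that grew. This is a simplification of the object-by-object comparison described above, and the class names and counts are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Compares per-class instance counts from two snapshots, taken before and after
// running the suspect use case, and keeps only the classes whose counts grew.
class HeapDiff {

    static Map<String, Long> grown(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> delta = new TreeMap<>(); // sorted by class name for stable output
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long diff = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (diff > 0) delta.put(e.getKey(), diff);
        }
        return delta;
    }

    public static void main(String[] args) {
        // Hypothetical counts; both snapshots taken right after a full GC.
        Map<String, Long> before = Map.of("java.lang.String", 50_000L, "com.example.Customer", 10L);
        Map<String, Long> after = Map.of("java.lang.String", 50_000L, "com.example.Customer", 11L,
                "com.example.AuditEvent", 500L);
        System.out.println(grown(before, after)); // {com.example.AuditEvent=500, com.example.Customer=1}
    }
}
```

The single extra Customer is the expected result of the use case; 500 new AuditEvent instances are the kind of growth worth investigating first.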
Why do memory leaks sometimes take a long time to fix if the process is this simple? In our opinion, the main reasons are:
Fortunately, the practical issues most commonly encountered aren't very difficult to solve. The most common problems are:
This article described the methodology and techniques necessary to fix memory leaks. These techniques are universal—they apply to any profiler, and some of them even apply to languages other than Java that use garbage collection. The information presented above should give you everything you need in order to use the tools of your choice to investigate and fix memory leaks.
The proper usage of specific tools will be discussed in future articles, which will focus primarily on free tools and problems in Java. However, if there's enough interest we might tackle other languages as well. If you're interested in follow-up articles related to memory leak resolution in a different environment or language, please leave a comment or contact us at docs-at-openlogic.com.
This work is licensed under a Creative Commons Attribution 3.0 Unported License.