A collection is empty if it contains no elements. The overhead of an empty collection is defined as the amount of memory that would be saved if the collection were not allocated at all, that is, the implementation-inclusive size in bytes of the collection (see previous section). Empty collections whose capacity may exceed their size (typically array-backed collections such as HashMap and ArrayList) may incur surprisingly high overhead, especially if the initial capacity is higher than the default value. For a ConcurrentHashMap in JDK releases prior to 7, an empty instance initialized with default values takes a whopping 1200 bytes. Fortunately, in more recent JDK versions the code has been changed to create the internal ConcurrentHashMap.Segment objects lazily, which greatly reduced this overhead.
A radical way to avoid the overhead of empty collections is to not allocate them until first use (lazy initialization). If a collection is referenced by a data field that many methods access, switching to lazy initialization requires changing all these methods so that they check whether the field is null and initialize it if so. That makes the code less readable and more brittle; a better alternative is to replace all these direct uses with calls to a single accessor method that handles lazy initialization. Still, such changes may be time-consuming, and if these methods can be called by multiple threads, they may introduce races in object initialization.
For this reason, in some situations it makes sense to leave eager collection initialization in place but reduce the initial capacity, e.g. use new HashMap(8) instead of new HashMap() (the latter is equivalent to new HashMap(16)). This reduces the overhead of the HashMaps that stay empty, but may cost some performance through more frequent resizing if many HashMaps allocated at the same site grow beyond their initial capacity. Note that in JDK 8 and later, HashMap allocates its internal table only on the first put, so for HashMaps the saving mainly applies to instances that receive only a few entries. Here it helps to compare the number of good and problematic collections referenced by the same reference chain. If most collections are bad, the cost for the few good ones, which the fix will affect negatively, is likely to be negligible. However, if a considerable number of the collections allocated at the given point in the code are good, the fix will have to be more sophisticated.
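A minimal sketch of both options, using hypothetical classes `Order` and `Tag`. In `Order`, all writes go through the single accessor `attributes()`, which is the only place where the map is allocated, while read-only methods simply treat a null field as an empty map. The accessor is deliberately not synchronized, so it exhibits exactly the initialization race mentioned above if several threads call it. `Tag` shows the alternative: eager initialization with a reduced capacity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class whose attribute map is often left empty.
// Option 1: lazy initialization behind a single accessor.
class Order {
    // Stays null until the first write; no empty map is allocated up front.
    private Map<String, String> attributes;

    // The only place that allocates the map. Not thread-safe: two threads
    // could both see null and each create a map, losing one thread's writes.
    private Map<String, String> attributes() {
        if (attributes == null) {
            attributes = new HashMap<>();
        }
        return attributes;
    }

    void setAttribute(String key, String value) {
        attributes().put(key, value);
    }

    // Read-only paths check for null instead of forcing an allocation.
    String getAttribute(String key) {
        return attributes == null ? null : attributes.get(key);
    }

    boolean hasAttributes() {
        return attributes != null && !attributes.isEmpty();
    }
}

// Option 2: keep eager initialization, but with a smaller initial capacity
// (new HashMap<>() is equivalent to new HashMap<>(16)).
class Tag {
    final Map<String, Integer> counts = new HashMap<>(8);
}
```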
###
Empty array
###
A collection is sparse if the number of elements in it is less than half of its capacity. Consequently, only collections whose capacity may exceed their size (typically array-backed collections) can be sparse. The overhead is defined as the total size in bytes of the empty slots (null pointers) in the collection, that is, how much memory would be saved if the collection did not contain any empty slots. A collection is furthermore small sparse if its capacity does not exceed the default capacity (e.g. 16 for HashMap). See the discussion of fixing problems in collections for why the distinction between small and large (see below) sparse collections is important.
###
A collection is sparse if the number of elements in it is less than half of its capacity. Consequently, only collections whose capacity may exceed their size (typically array-backed collections) can be sparse. The overhead is defined as the total size in bytes of the empty slots (null pointers) in the collection, that is, how much memory would be saved if the collection did not contain any empty slots. A collection is furthermore large sparse if it is sparse and its capacity is larger than the default capacity.
###
A collection is sparse if the number of elements in it is less than half of its capacity. Consequently, only collections whose capacity may exceed their size (typically array-backed collections) can be sparse. The overhead is defined as the total size in bytes of the empty slots (null pointers) in the collection, that is, how much memory would be saved if the collection did not contain any empty slots.
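The sparseness definitions can be sketched as simple checks. This is a minimal illustration, not the tool's implementation: the hypothetical class `SparseCheck` takes size, capacity, default capacity and pointer size as parameters (assumptions) rather than reading them from the JVM, and computes the overhead by counting null slots in a backing array.

```java
// Hypothetical helper illustrating the sparse / small sparse / large sparse rules.
// Pointer size and default capacity are parameters, not values read from the JVM.
final class SparseCheck {
    // Sparse: fewer elements than half of the capacity.
    static boolean isSparse(int size, int capacity) {
        return 2L * size < capacity;
    }

    // Small sparse: sparse, and capacity does not exceed the default (e.g. 16 for HashMap).
    static boolean isSmallSparse(int size, int capacity, int defaultCapacity) {
        return isSparse(size, capacity) && capacity <= defaultCapacity;
    }

    // Large sparse: sparse, and capacity is larger than the default.
    static boolean isLargeSparse(int size, int capacity, int defaultCapacity) {
        return isSparse(size, capacity) && capacity > defaultCapacity;
    }

    // Overhead: total size of the empty (null) slots in a backing array.
    static long overheadBytes(Object[] slots, int pointerSize) {
        long empty = 0;
        for (Object slot : slots) {
            if (slot == null) {
                empty++;
            }
        }
        return empty * pointerSize;
    }
}
```

For an ArrayList found to be sparse, ArrayList.trimToSize() shrinks the backing array to the current size, at the cost of reallocating if the list grows again.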
###
Boxed collections are collections that contain boxed numbers, i.e. instances of classes such as java.lang.Integer, java.lang.Double, etc. Such collections are considered problematic because a boxed number takes much more memory than the int, double, etc. value that it wraps. The overhead of a boxed collection is defined as the amount of memory that would be saved if the collection were replaced with one (for lists) or two (for maps) arrays of the corresponding primitive numbers. Technically, it is calculated as collection_impl_size + (sizeof(Number) - sizeof(number)) * num_elements, where Number stands for java.lang.Integer, java.lang.Double, etc., and number stands for int, double, etc. Note that currently a collection is considered boxed if a randomly sampled key, value, or both is a boxed number object.
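A minimal sketch of the overhead formula and of the fix it models for lists, using a hypothetical class `BoxedOverhead`. Object sizes are parameters here; on a typical 64-bit HotSpot JVM with compressed oops, for example, an Integer takes 16 bytes versus 4 bytes for an int, so each element adds 12 bytes of overhead on top of the collection's implementation size.

```java
import java.util.List;

// Hypothetical helper for the boxed-collection overhead formula.
final class BoxedOverhead {
    // collection_impl_size + (sizeof(Number) - sizeof(number)) * num_elements
    static long overhead(long collectionImplSize, int boxedSize, int primitiveSize, int numElements) {
        return collectionImplSize + (long) (boxedSize - primitiveSize) * numElements;
    }

    // The fix the formula models for a list: copy the values into an int[].
    static int[] toIntArray(List<Integer> values) {
        int[] result = new int[values.size()];
        int i = 0;
        for (Integer v : values) {
            result[i++] = v; // auto-unboxing; throws NullPointerException for null elements
        }
        return result;
    }
}
```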
###
Zero-size arrays have length zero, i.e. they cannot contain any elements. Such arrays are useful in very few situations; usually they are a useless by-product of methods that blindly allocate arrays of a specified length. The overhead of such an array is defined as its size in bytes, which is the array header plus any object alignment padding.
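A common way to avoid repeatedly allocating zero-length arrays is to return a shared constant. The sketch below uses a hypothetical helper, `NonNullStrings.nonNull`; sharing is safe because a zero-length array has no elements that could be modified.

```java
// Hypothetical helper that would otherwise allocate a fresh zero-length array
// whenever the input contains no non-null elements.
final class NonNullStrings {
    // A zero-length array has no elements to mutate, so one instance can be shared.
    private static final String[] EMPTY = new String[0];

    static String[] nonNull(String[] input) {
        int count = 0;
        for (String s : input) {
            if (s != null) {
                count++;
            }
        }
        if (count == 0) {
            return EMPTY; // reuse the shared instance instead of new String[0]
        }
        String[] result = new String[count];
        int i = 0;
        for (String s : input) {
            if (s != null) {
                result[i++] = s;
            }
        }
        return result;
    }
}
```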
###
WeakHashMaps with references from values to keys are instances of java.util.WeakHashMap that contain key-value pairs (K, V) such that a field in V points back at K (or at some other key in the same map). If there is such a reference, K and V will not be removed from the map even when there are no other references to K. This, in effect, breaks the weakness property of the map and creates a memory leak. Note that a weak reference from V back to K is fine: a WeakHashMap works as intended in this case.
The overhead of a problematic WeakHashMap is calculated as the total shallow size of all key and value objects that are linked in the above way. This is a conservative estimate: each of these objects may point to others, which collectively might consume a lot more memory.
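The sketch below contrasts the problematic pattern with the weak back-reference described above as safe. `Session`, `StrongMetadata`, `WeakMetadata` and `WeakMapDemo` are hypothetical names. With `StrongMetadata` as the value type, every entry would keep its key reachable and never be cleared; with `WeakMetadata`, the key becomes eligible for collection once nothing else references it, after which the map can drop the entry.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical key class.
final class Session {
    final String id;
    Session(String id) { this.id = id; }
}

// Problematic value: a strong field pointing back at the key keeps the key
// reachable through the map itself, so the entry is never removed.
final class StrongMetadata {
    final Session owner;
    StrongMetadata(Session owner) { this.owner = owner; }
}

// Safe value: the back-reference is weak and does not keep the key alive.
final class WeakMetadata {
    private final WeakReference<Session> owner;
    WeakMetadata(Session owner) { this.owner = new WeakReference<>(owner); }

    // May return null once the key has been garbage-collected.
    Session owner() { return owner.get(); }
}

final class WeakMapDemo {
    static final Map<Session, WeakMetadata> REGISTRY = new WeakHashMap<>();

    static void register(Session s) {
        REGISTRY.put(s, new WeakMetadata(s));
    }
}
```

Because the back-reference can now be cleared, code that calls `owner()` must be prepared for a null result.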
###
Vertical bar arrays are two-dimensional arrays where the outer dimension M is (much) larger than the inner dimension N. Such a collection is wasteful, since each short inner array or list incurs a per-object overhead (list or array implementation size) and requires a pointer from the outer array or list. The overhead of a bar collection is defined as the amount of memory that would be saved if the outer and inner dimensions were "flipped", i.e. if an M by N array or list were replaced with an N by M one. Technically, it is calculated as (M - N) * (pointer_size + array_or_list_overhead).
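A minimal sketch of the "flip" for a rectangular int[][] array, plus the overhead formula, in a hypothetical class `BarFlip`. Element [i][j] of the original becomes [j][i] of the flipped array, so code that accesses the array must swap its indices accordingly. With M = 1,000,000, N = 2 and assumed values of 4 bytes per pointer and 16 bytes of per-array overhead, the formula gives 999,998 * 20 = 19,999,960 bytes, roughly 20 MB.

```java
// Hypothetical helper: flip an M-by-N array (M much larger than N) into N-by-M,
// so there are N long inner arrays instead of M short ones.
final class BarFlip {
    static int[][] flip(int[][] bars) {
        int m = bars.length;
        int n = m == 0 ? 0 : bars[0].length; // assumes a rectangular array
        int[][] flipped = new int[n][m];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                flipped[j][i] = bars[i][j];
            }
        }
        return flipped;
    }

    // (M - N) * (pointer_size + array_or_list_overhead)
    static long overhead(int m, int n, int pointerSize, int arrayOverhead) {
        return (long) (m - n) * (pointerSize + arrayOverhead);
    }
}
```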
###
Long zero-tail arrays end with a contiguous block of zeros that is longer than half of the array's length. Such arrays are often buffers in classes such as I/O streams, text parsers, etc., that have been allocated with excessive capacity. The overhead of such an array is defined as the size in bytes of the block of zeros at the end of the array.
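A sketch of the detection rule and the overhead calculation for a byte[], in a hypothetical class `ZeroTail`; for wider element types the overhead would be the tail length multiplied by the element size. The `trim` method shows one possible fix, but it is only safe when the trailing zeros are unused capacity rather than real data, e.g. when the owning buffer tracks its used length separately.

```java
import java.util.Arrays;

// Hypothetical helper for detecting and trimming long zero tails in byte arrays.
final class ZeroTail {
    // Length of the trailing run of zero bytes.
    static int zeroTailLength(byte[] a) {
        int i = a.length;
        while (i > 0 && a[i - 1] == 0) {
            i--;
        }
        return a.length - i;
    }

    // "Long" zero tail: the trailing zeros take more than half of the array.
    static boolean hasLongZeroTail(byte[] a) {
        return 2L * zeroTailLength(a) > a.length;
    }

    // For byte[], each trailing zero element costs one byte.
    static int overheadBytes(byte[] a) {
        return zeroTailLength(a);
    }

    // One possible fix: copy the array without its zero tail.
    static byte[] trim(byte[] a) {
        return Arrays.copyOf(a, a.length - zeroTailLength(a));
    }
}
```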
###
Small collections contain a very small number of elements (currently defined as 4 or fewer). Such collections are considered problematic because for them the "fixed costs" (the size of implementation details that do not depend on the number of elements in the collection) are comparable to, or higher (sometimes much higher) than, the total size of the "workload" in that collection. For example, the minimum amount of memory used by a HashSet, which internally consists of several Java objects with multiple data fields each, is around 100 bytes on a 32-bit JVM. If a HashSet contains just 3 elements, its workload size is 12 bytes (three 4-byte pointers), an order of magnitude smaller than its implementation size. The overhead of a small collection is defined as the amount of memory that would be saved if the collection were replaced with one (for lists) or two (for maps) arrays of objects containing this collection's elements. Technically, it is calculated as collection_impl_size - (pointer_size * num_elements + array_overhead) for lists, or collection_impl_size - 2 * (pointer_size * num_elements + array_overhead) for maps.
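The two formulas can be written down directly, together with a hypothetical array-based replacement for a tiny map (`TinyMap`), which trades hashing for a linear scan that is cheap at four or fewer entries. Plugging in the HashSet example above with an assumed 12-byte array header gives listOverhead(100, 4, 3, 12) = 100 - (12 + 12) = 76 bytes.

```java
import java.util.Objects;

// Hypothetical helper for the small-collection overhead formulas.
final class SmallCollections {
    // Lists: collection_impl_size - (pointer_size * num_elements + array_overhead)
    static long listOverhead(long implSize, int pointerSize, int numElements, int arrayOverhead) {
        return implSize - ((long) pointerSize * numElements + arrayOverhead);
    }

    // Maps: collection_impl_size - 2 * (pointer_size * num_elements + array_overhead)
    static long mapOverhead(long implSize, int pointerSize, int numElements, int arrayOverhead) {
        return implSize - 2 * ((long) pointerSize * numElements + arrayOverhead);
    }
}

// Hypothetical replacement for a map with a handful of entries: two parallel arrays.
final class TinyMap<K, V> {
    private final Object[] keys;
    private final Object[] values;

    TinyMap(Object[] keys, Object[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        this.keys = keys;
        this.values = values;
    }

    // Linear search; adequate for 4 or fewer entries.
    @SuppressWarnings("unchecked")
    V get(Object key) {
        for (int i = 0; i < keys.length; i++) {
            if (Objects.equals(keys[i], key)) {
                return (V) values[i];
            }
        }
        return null;
    }
}
```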
###
Instances that have some or all data fields equal to null or zero. More precisely, the first subsection reports every class in which certain field(s) are null in every instance of that class, whereas the second subsection reports each class in which certain field(s) are null in at least 90 per cent of its instances.
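A minimal reflection-based sketch of the statistic behind both subsections: the fraction of instances in which a given field is null (or zero, for numeric primitive fields). `Customer` and `NullFieldStats` are hypothetical, and the sketch assumes that all instances share one class that declares the field itself.

```java
import java.lang.reflect.Field;
import java.util.List;

// Hypothetical sample class: comment is rarely set, flags is often zero.
final class Customer {
    String name;
    String comment;
    int flags;

    Customer(String name, String comment, int flags) {
        this.name = name;
        this.comment = comment;
        this.flags = flags;
    }
}

final class NullFieldStats {
    // Fraction of instances in which the named field is null (reference fields)
    // or zero (numeric primitive fields, which Field.get returns boxed).
    // Assumes all instances share one class that declares the field itself.
    static double nullFraction(List<?> instances, String fieldName) {
        if (instances.isEmpty()) {
            return 0.0;
        }
        try {
            Field field = instances.get(0).getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            int nullCount = 0;
            for (Object instance : instances) {
                Object value = field.get(instance);
                if (value == null
                        || (value instanceof Number && ((Number) value).doubleValue() == 0.0)) {
                    nullCount++;
                }
            }
            return (double) nullCount / instances.size();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot read field " + fieldName, e);
        }
    }

    // First subsection: the field is null in every instance.
    static boolean alwaysNull(double fraction) {
        return fraction == 1.0;
    }

    // Second subsection: the field is null in at least 90 per cent of instances.
    static boolean mostlyNull(double fraction) {
        return fraction >= 0.9;
    }
}
```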
###
Multiple String instances with the same value. Some operations, e.g. parsing a text document, inevitably produce duplicate strings. Depending on how these strings are handled subsequently, it may be possible to come up with efficient case-specific deduplication methods.
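One simple case-specific approach is a canonicalizing map that is used while parsing and discarded afterwards, so that each distinct value is kept only once. The class `StringDeduplicator` below is a hypothetical sketch of this; String.intern() is a built-in alternative that uses a JVM-wide pool instead of a map the application controls.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical deduplicator: returns one shared String instance per distinct value.
final class StringDeduplicator {
    private final Map<String, String> canonical = new HashMap<>();

    String dedup(String s) {
        if (s == null) {
            return null;
        }
        // putIfAbsent returns the previously stored instance, or null if s is new.
        String existing = canonical.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int distinctCount() {
        return canonical.size();
    }
}
```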
###
Duplicated primitive arrays are multiple instances of primitive arrays, such as int[] or byte[], with identical type, length and contents. Note that only standalone arrays are considered: for instance, char[] arrays that back String instances are not checked for duplication here.
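Primitive arrays do not override equals() and hashCode(), so deduplicating them requires a wrapper key based on java.util.Arrays.equals and Arrays.hashCode. Below is a minimal sketch for int[], with a hypothetical class `IntArrayDeduplicator`. The shared instances must be treated as read-only afterwards, since a write through one reference would be visible to every former duplicate.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical deduplicator for int[] arrays with identical contents.
final class IntArrayDeduplicator {
    // Wrapper that gives arrays content-based equality and hashing.
    private static final class Key {
        final int[] data;

        Key(int[] data) {
            this.data = data;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && Arrays.equals(data, ((Key) o).data);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(data);
        }
    }

    private final Map<Key, int[]> canonical = new HashMap<>();

    // Returns the first array seen with the same contents as a.
    int[] dedup(int[] a) {
        if (a == null) {
            return null;
        }
        int[] existing = canonical.putIfAbsent(new Key(a), a);
        return existing != null ? existing : a;
    }
}
```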

