A Stack Overflow user was wondering if the JVM can eliminate the allocation of a temporary object by replacing it with an implicit static instance. Is the JVM “smart enough” to do so?

The example in question:

public static String toJson(final Object object) throws JsonProcessingException {
   final ObjectMapper mapper = new ObjectMapper();
   return mapper.writeValueAsString(object); 
}

The JVM does not implicitly share this object as a static instance; doing so would violate the Java Language Specification. Even if the object itself had no fields, the behavior could change if it relies on other objects and their state, which might not be thread-safe. Even if the object were thread-safe or the code single-threaded, implicit reuse could break the assumptions of the original code because the instance would no longer be guaranteed to start from its initial state.
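To see why implicit sharing would be observable, consider a minimal sketch. The classes `ReuseDemo` and `SequenceLabeler` are hypothetical and not part of the original question; `SequenceLabeler` simply stands in for any class whose result depends on internal mutable state.

```java
final class ReuseDemo {
  // What an implicitly shared static instance would amount to.
  static final SequenceLabeler SHARED = new SequenceLabeler();

  // Original semantics: every call starts from the initial state.
  static String labelWithFreshInstance(String text) {
    return new SequenceLabeler().label(text);
  }

  // Shared instance: the result depends on all previous calls.
  static String labelWithSharedInstance(String text) {
    return SHARED.label(text);
  }

  public static void main(String[] args) {
    System.out.println(labelWithFreshInstance("a"));
    System.out.println(labelWithFreshInstance("a"));
    System.out.println(labelWithSharedInstance("a"));
    System.out.println(labelWithSharedInstance("a"));
  }
}

// Hypothetical example class with mutable state (assumption for illustration).
final class SequenceLabeler {
  private int next = 1;

  String label(String text) {
    return text + "#" + next++;
  }
}
```

The two calls with a fresh instance return the same label, while the two calls on the shared instance do not. A JVM that silently substituted the shared variant would change the program's result.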

As for eliminating the object allocation itself, it can be done and is done under specific circumstances. By means of escape analysis, the JIT compiler determines whether references to a newly created object can leave, or “escape”, the current frame, that is, the current set of local variables. A reference escapes, for example, when it is stored in a field of another heap object or in a static field. If the analysis finds that a given object does not escape, the heap allocation could in principle be replaced with a stack allocation because the lifetime of the object is restricted to the current frame.
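The difference between an escaping and a non-escaping reference can be sketched as follows. `EscapeDemo` and `Pair` are hypothetical names introduced for illustration only. Whether a JIT compiler actually eliminates the first allocation depends on the VM, the compiler tier, and its inlining decisions.

```java
final class EscapeDemo {
  // A static field is reachable from anywhere, so anything stored here escapes.
  static Pair published;

  // The Pair is only used inside this method: no reference to it is stored
  // in a field or returned, so it does not escape the frame.
  static int sumLocal(int a, int b) {
    Pair pair = new Pair(a, b);
    return pair.first + pair.second;
  }

  // Same computation, but the reference is published through a static field
  // and therefore escapes.
  static int sumPublished(int a, int b) {
    Pair pair = new Pair(a, b);
    published = pair;
    return pair.first + pair.second;
  }

  public static void main(String[] args) {
    System.out.println(sumLocal(2, 3));
    System.out.println(sumPublished(2, 3));
  }
}

// Hypothetical value holder (assumption for illustration).
final class Pair {
  final int first;
  final int second;

  Pair(int first, int second) {
    this.first = first;
    this.second = second;
  }
}
```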

Scalar Replacement

OpenJDK performs escape analysis. The compiler represents the outcome as follows:

  typedef enum {
    UnknownEscape = 0,
    NoEscape      = 1, // An object does not escape method or thread and it is
                       // not passed to call. It could be replaced with scalar.
    ArgEscape     = 2, // An object does not escape method or thread but it is
                       // passed as argument to call or referenced by argument
                       // and it does not escape during call.
    GlobalEscape  = 3  // An object escapes the method or thread.
  } EscapeState;

With that information, allocations can be optimized. However, the JVM does not replace heap allocations with stack allocations. Instead, it performs an optimization known as “scalar replacement”: accesses to the object's fields are replaced with accesses to corresponding local variables, provided that references to the object do not escape the current frame and all methods called on the object can be inlined. The JVM thereby eliminates the object instance completely. Depending on the implementation of ObjectMapper, this optimization could apply.
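Conceptually, the transformation resembles the following hand-written rewrite. This is only a source-level analogy under the assumption that `Range.length()` is inlined; the JIT performs the transformation on its internal representation, not on Java source. `ScalarReplacementSketch` and `Range` are hypothetical names.

```java
final class ScalarReplacementSketch {
  // Source form: allocates a temporary Range object.
  static int lengthWithObject(int start, int end) {
    Range range = new Range(start, end);
    return range.length();
  }

  // Roughly what remains after inlining and scalar replacement: the fields
  // of the temporary object have become plain local variables.
  static int lengthWithScalars(int start, int end) {
    int rangeStart = start;
    int rangeEnd = end;
    return rangeEnd - rangeStart;
  }

  public static void main(String[] args) {
    System.out.println(lengthWithObject(3, 10));
    System.out.println(lengthWithScalars(3, 10));
  }
}

// Hypothetical temporary object (assumption for illustration).
final class Range {
  final int start;
  final int end;

  Range(int start, int end) {
    this.start = start;
    this.end = end;
  }

  int length() {
    return end - start;
  }
}
```

Both methods compute the same value; the second one simply has no object left to allocate.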

To illustrate the effect of scalar replacement, consider the following example:

public class Test {
  public static void main(String[] args) {
    int count = 0;
    for (int i = 0; i < 1000*1000*100; i++) {
      TestProcessor proc = new TestProcessor("context");
      String output = proc.process("input");
      if (output.length() > 0) {
        count++;
      }
    }
    System.out.println(count);
  }
}

class TestProcessor {
  String m_context;

  TestProcessor(String context) {
    m_context = context;
  }

  String process(String input) {
    return m_context != null && System.currentTimeMillis() > 0 ? m_context : input;
  }
}  

The temporary TestProcessor instance never leaves the current frame; it does not “escape”. Running the example with scalar replacement disabled (-XX:-EliminateAllocations) triggers the garbage collector:

[0.021s][info][gc] Using G1
[0.317s][info][gc] Periodic GC disabled
[0.100s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 13M->1M(66M) 1.455ms
[...]
[5.709s][info][gc] GC(58) Pause Young (Normal) (G1 Evacuation Pause) 58M->1M(96M) 0.458ms
100000000

With default options, the GC is not triggered at all and the total runtime is considerably shorter:

[0.011s][info][gc] Using G1
[0.035s][info][gc] Periodic GC disabled
100000000
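GC logs are one way to observe the effect. Another option, sketched below, is to compare the number of bytes allocated by the current thread before and after the loop. The sketch assumes a HotSpot-based JVM that provides `com.sun.management.ThreadMXBean`; on other VMs the method returns -1. `AllocationProbe` and `Box` are hypothetical names, and the warm-up count is an arbitrary assumption.

```java
import java.lang.management.ManagementFactory;

final class AllocationProbe {
  public static void main(String[] args) {
    // Warm up so that the JIT has a chance to compile run() first.
    for (int i = 0; i < 20; i++) {
      run(1_000_000);
    }
    System.out.println("allocated bytes: " + allocatedBytesDuring(10_000_000));
  }

  // Allocates one temporary Box per iteration; the Box never escapes.
  static int run(int iterations) {
    int count = 0;
    for (int i = 0; i < iterations; i++) {
      Box box = new Box(i);
      if (box.value() >= 0) {
        count++;
      }
    }
    return count;
  }

  // Returns the bytes allocated by the current thread while run() executes,
  // or -1 if the VM does not support the measurement.
  static long allocatedBytesDuring(int iterations) {
    java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    if (!(bean instanceof com.sun.management.ThreadMXBean)) {
      return -1;
    }
    com.sun.management.ThreadMXBean extended = (com.sun.management.ThreadMXBean) bean;
    if (!extended.isThreadAllocatedMemorySupported()
        || !extended.isThreadAllocatedMemoryEnabled()) {
      return -1;
    }
    long threadId = Thread.currentThread().getId();
    long before = extended.getThreadAllocatedBytes(threadId);
    run(iterations);
    long after = extended.getThreadAllocatedBytes(threadId);
    return after - before;
  }
}

// Hypothetical temporary object (assumption for illustration).
final class Box {
  private final int value;

  Box(int value) {
    this.value = value;
  }

  int value() {
    return value;
  }
}
```

If the allocation is eliminated, the reported figure should stay small regardless of the iteration count. With -XX:-EliminateAllocations it should grow roughly in proportion to the number of iterations. The exact numbers depend on the VM and its settings.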

Scalar Replacement and Fields

Storing the instance reference in an object field, an array element, or a static field disables scalar replacement, even if the object field belongs to an object that does not escape the frame. Each of the four stores in the following example has that effect:

public class Test {
  static Object s_obj;
  static Object[] s_objs = new Object[1];

  public static void main(String[] args) {
    Holder holder = new Holder();
    int count = 0;
    for (int i = 0; i < 1000*1000*100; i++) {
      TestProcessor proc = new TestProcessor("context");
      // Store the reference in a static field
      s_obj = proc;
      // Store the reference in an array element
      s_objs[0] = proc;
      // Store the reference in an object field (non-escaping object #1)
      proc.m_obj = proc;
      // Store the reference in an object field (non-escaping object #2)
      holder.m_obj = proc;
      String output = proc.process("input");
      if (output.length() > 0) {
        count++;
      }
    }
    System.out.println(count);
  }
}

class Holder {
  Object m_obj;
}

class TestProcessor {
  String m_context;
  Object m_obj;

  TestProcessor(String context) {
    m_context = context;
  }

  String process(String input) {
    return m_context != null && System.currentTimeMillis() > 0 ? m_context : input;
  }
}

Scalar Replacement and Inlining

Inlining is a requirement for scalar replacement. If the object’s methods or the code that accesses the object to be scalar-replaced cannot be inlined, the object is not suitable for scalar replacement. The reason is simple: the code must access object fields and those fields are represented by local variables in the current frame, so the code must run in the current frame to gain local variable access.

Adding an object access that can be inlined (Object.equals()) does not prevent scalar replacement:

      if (output.length() > 0 && args.equals(proc)) {
        count++;
      }

Adding an object access that cannot be inlined (Object.hashCode() has a native implementation) disables scalar replacement:

      if (output.length() > 0 && proc.hashCode() > 0) {
        count++;
      }

In that scenario, overriding hashCode enables scalar replacement again because the new method can be inlined:

  public int hashCode() {
    return 42;
  }

The inlining decision is also affected by the method size. In the original example, the method TestProcessor.process() was small enough. The example calls the method numerous times, so the relevant limit is controlled by -XX:FreqInlineSize. The default in my tests based on Java 14 is 325:

$ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version 2>&1 | grep FreqInlineSize
     intx FreqInlineSize                           = 325 {pd product} {default}

Increasing the method size by means of a switch block allows us to exceed the limit:

  String process(String input) {
    switch (input.hashCode()) {
      case 0: System.currentTimeMillis();
      // ...
      case 35: System.currentTimeMillis();
    }
    return m_context != null && System.currentTimeMillis() > 0 ? m_context : input;
  }

Before (javap output):

        16: getfield      #7                  // Field m_context:Ljava/lang/String;
        19: goto          23
        22: aload_1
        23: areturn

After (javap output):

       324: getfield      #7                  // Field m_context:Ljava/lang/String;
       327: goto          331
       330: aload_1
       331: areturn

By exceeding the limit, scalar replacement is disabled again.
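One possible way to keep a frequently called method small is to move bulky, rarely executed code into a separate method. The following is a sketch under that assumption, not a guaranteed fix. `SplitDemo`, `SplitProcessor`, and `describeEmptyInput` are hypothetical names. The helper is static and receives only the value it needs, so the processor instance itself is never passed to a call that might not be inlined.

```java
final class SplitDemo {
  public static void main(String[] args) {
    int count = 0;
    for (int i = 0; i < 1_000_000; i++) {
      SplitProcessor proc = new SplitProcessor("context");
      if (proc.process("input").length() > 0) {
        count++;
      }
    }
    System.out.println(count);
  }
}

// Hypothetical processor (assumption for illustration).
final class SplitProcessor {
  private final String context;

  SplitProcessor(String context) {
    this.context = context;
  }

  // Hot path: only a branch and a field read, so the bytecode stays small.
  String process(String input) {
    if (input.isEmpty()) {
      return describeEmptyInput(context); // assumed to be rarely taken
    }
    return context != null ? context : input;
  }

  // Cold path: the bulky code lives here and does not count towards the
  // bytecode size of process().
  static String describeEmptyInput(String context) {
    StringBuilder text = new StringBuilder("empty input");
    if (context != null) {
      text.append(" for ").append(context);
    }
    return text.toString();
  }
}
```

Whether the compiler then inlines `process()` and eliminates the allocation still depends on its heuristics; javap can confirm the resulting bytecode size of the method.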

Scalar Replacement and Arrays

In Java, arrays are objects. Consequently, arrays can be subject to scalar replacement. However, the constraints are significant:

    if (call->is_AllocateArray()) { 
      if (!cik->is_array_klass()) { // StressReflectiveCode
        es = PointsToNode::GlobalEscape;
      } else {
        int length = call->in(AllocateNode::ALength)->find_int_con(-1);
        if (length < 0 || length > EliminateAllocationArraySizeLimit) {
          // Not scalar replaceable if the length is not constant or too big.
          scalar_replaceable = false;
        }
      }
    }

The array length must not exceed -XX:EliminateAllocationArraySizeLimit=<value>; the default is 64. The length must also be constant. The size of a stack frame and its set of local variables are fixed when the method is compiled, so this limitation makes sense.

Similarly, the index used for element access must be constant. Iterating an array with a loop is incompatible with scalar replacement.
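The two array constraints can be illustrated with a short sketch. `ArrayDemo` is a hypothetical name. The comments describe which method is a candidate according to the constraints above; whether the allocation is actually eliminated depends on the VM.

```java
final class ArrayDemo {
  // Constant length (2, well below the limit) and constant indices:
  // a candidate for scalar replacement.
  static int sumConstant(int a, int b) {
    int[] values = new int[2];
    values[0] = a;
    values[1] = b;
    return values[0] + values[1];
  }

  // The length comes from a parameter and the elements are accessed with a
  // loop variable, so neither the length nor the index is a constant here.
  static int sumVariable(int length, int fill) {
    int[] values = new int[length];
    int sum = 0;
    for (int i = 0; i < values.length; i++) {
      values[i] = fill;
      sum += values[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println(sumConstant(2, 3));
    System.out.println(sumVariable(4, 2));
  }
}
```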

Scalar Replacement and Other Constraints

The source reveals more constraints affecting the object definition itself:

    } else {  // Allocate instance
      if (cik->is_subclass_of(_compile->env()->Thread_klass()) || 
          cik->is_subclass_of(_compile->env()->Reference_klass()) ||
         !cik->is_instance_klass() || // StressReflectiveCode
         !cik->as_instance_klass()->can_be_instantiated() ||
          cik->as_instance_klass()->has_finalizer()) {
        es = PointsToNode::GlobalEscape;
      } else {
        int nfields = cik->as_instance_klass()->nof_nonstatic_fields();
        if (nfields > EliminateAllocationFieldsLimit) {
          // Not scalar replaceable if there are too many fields.
          scalar_replaceable = false;
        }
      }
    }

In terms of real-world relevance, the most important aspect is that overriding Object.finalize() disables scalar replacement. Finalization requires the object to be accessible when the hook is called, which implies GlobalEscape.

Less relevant is a field limit configurable via -XX:EliminateAllocationFieldsLimit=<value>. The default is 512, so it’s unlikely to cause problems.

Scalar Replacement vs. Stack Allocation

Scalar replacement is a valuable optimization. It certainly isn’t a catch-all solution for eliminating the allocation overhead of temporary objects, and it was never intended to be. As with other optimizations, there are constraints that need to be satisfied.

Stack allocation has the potential to eliminate the inlining dependency because object references could be passed directly, but it comes with its own set of implications and could turn out to be less efficient. Stack allocation requires representing the object in memory, whereas scalar replacement can potentially keep object fields in registers.

Bottom Line

If you know that the object can safely be shared and reused based on its documentation, the best approach is to explicitly reuse the instance rather than relying on potential optimizations that can vary from one JVM implementation to another and, more importantly, might be disabled due to caller/callee changes in the future.
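As a sketch of explicit reuse, the following uses `java.time.format.DateTimeFormatter`, chosen only because its Javadoc documents it as immutable and thread-safe. It stands in for whatever class your code instantiates repeatedly; the class name `DateText` is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DateText {
  // Documented as immutable and thread-safe, so a single shared instance
  // can be reused by all callers.
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd");

  static String format(LocalDate date) {
    return FORMAT.format(date);
  }

  public static void main(String[] args) {
    System.out.println(format(LocalDate.of(2020, 3, 17))); // prints 2020-03-17
  }
}
```

The reuse is visible in the source, does not depend on JIT heuristics, and survives later changes to callers or callees.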
