Internal architecture of the Java Virtual Machine (JVM)
A thread is a single sequential flow of execution within a program. The JVM allows an application to have multiple threads of execution running concurrently. In the HotSpot JVM there is a direct mapping between a Java thread and a native operating system thread. The operating system is therefore responsible for scheduling all threads and dispatching them to any available CPU.
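As a minimal sketch of several threads of execution running concurrently, the example below starts a handful of Java threads and waits for them to finish. The class name ThreadDemo and the method runThreads are hypothetical names chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ThreadDemo {
    // Starts `count` Java threads and returns how many of them ran to completion.
    static int runThreads(int count) {
        AtomicInteger finished = new AtomicInteger();
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            // Each thread simply increments the shared counter once.
            threads[i] = new Thread(finished::incrementAndGet, "worker-" + i);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait until the thread has terminated
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println("threads finished: " + runThreads(4));
    }
}
```

The order in which the workers run is decided by the operating system scheduler, so the example only relies on the final count.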
JVM System Threads
VM thread: performs operations that require the JVM to reach a safepoint, such as “stop-the-world” garbage collections, thread stack dumps, thread suspension and biased locking revocation.
Periodic task thread: handles timer events (interrupts) that are used to schedule periodic operations.
GC threads: support garbage collection activities
Compiler threads: compile byte code to native code at runtime
Signal dispatcher thread: receives signals sent to the JVM process and handles them inside the JVM by calling the appropriate JVM methods.
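Some of these threads can be observed from Java code. The sketch below lists the names of the live threads that the standard Thread API reports; ThreadLister is a hypothetical class name. Which names appear depends on the JVM and its version, and internal threads such as the VM thread and GC threads are generally not reported by this API (a thread dump tool such as jstack shows them).

```java
import java.util.ArrayList;
import java.util.List;

class ThreadLister {
    // Returns the names of all live threads visible through the Java API.
    static List<String> liveThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : liveThreadNames()) {
            System.out.println(name);
        }
    }
}
```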
Each Thread Contains:
Program Counter (PC): holds the address of the instruction currently being executed, unless the current method is native, in which case the PC is undefined.
Stack: Each thread has its own stack that holds a frame for each method executing on that thread.
Native Stack: JVMs that support native methods typically give each thread a separate native method stack. A native method can typically call back into the JVM and invoke a Java method. When it does, the thread leaves the native stack and creates a new frame on the Java stack.
Stack Restrictions: A stack can be of dynamic or fixed size. If a thread requires a larger stack than allowed, a StackOverflowError is thrown. If a thread requires a new frame and there isn’t enough memory to allocate it, an OutOfMemoryError is thrown.
Frame: A new frame is created and added (pushed) to the top of the stack for every method invocation. The frame is removed (popped) when the method returns normally or if an uncaught exception is thrown during the method invocation.
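The relationship between frames and the stack limit can be illustrated with unbounded recursion. This is a sketch only: StackDepthDemo and measureDepth are hypothetical names, and the depth reached depends on the JVM, its settings and the size of each frame.

```java
class StackDepthDemo {
    private static int depth;

    // Every call pushes a new frame; with no base case the thread
    // eventually needs more stack than it is allowed.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many calls were made before StackOverflowError was thrown.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames were popped as the error propagated up to this handler.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("calls before overflow: " + measureDepth());
    }
}
```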
Each frame contains:
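Local Variables Array: contains all the variables used during the execution of the method, including a reference to this (for instance methods), the method parameters and the locally defined variables.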
Operand Stack: is used during the execution of byte code instructions in a similar way that general-purpose registers are used in a native CPU.
Dynamic Linking: Each frame holds a reference to the run time constant pool of the class that declares its method. When a Java class is compiled, all references to variables and methods are stored in the class’s constant pool as symbolic references. A symbolic reference is a logical reference, not a reference that actually points to a physical memory location.
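To make the local variable array and the operand stack concrete, the sketch below shows a small static method together with the bytecode that javac typically emits for it (as can be viewed with javap -c). FrameDemo is a hypothetical class name.

```java
class FrameDemo {
    // Typical bytecode for this method:
    //   iload_0   // push local variable 0 (a) onto the operand stack
    //   iload_1   // push local variable 1 (b) onto the operand stack
    //   iadd      // pop both values, push their sum
    //   ireturn   // pop the sum and return it to the caller's frame
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

Because the method is static, its parameters occupy slots 0 and 1 of the local variable array. In an instance method slot 0 would hold the reference to this and the parameters would start at slot 1.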
Shared Between Threads
The heap is used to allocate class instances and arrays at runtime. Arrays and objects can never be stored on the stack because a frame is not designed to change in size after it has been created. The frame only stores references that point to objects or arrays on the heap. Unlike primitive variables and references in the local variable array (in each frame), objects are always stored on the heap, so they are not removed when a method ends. Instead, objects are only removed by the garbage collector.
To support garbage collection the heap is divided into three sections:
- Young Generation: Eden and Survivor
- Old Generation
- Permanent Generation (HotSpot before Java 8; later releases replace it with Metaspace in native memory)
Objects and arrays are never explicitly de-allocated; instead, the garbage collector automatically reclaims them.
- New objects and arrays are created in the young generation.
- Minor garbage collection operates in the young generation. Objects that are still alive are moved from the eden space to a survivor space.
- Major garbage collection, which typically causes the application threads to pause, moves objects between generations. Objects that are still alive are moved from the young generation to the old (tenured) generation.
- The permanent generation is collected every time the old generation is collected. They are both collected when either becomes full.
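The collectors that carry out these minor and major collections can be inspected through the standard java.lang.management API. The sketch below reports each collector's name and how often it has run; GcInfo is a hypothetical class name, and the collector names depend on the JVM version and the garbage collector selected.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcInfo {
    // Returns one summary line per garbage collector known to this JVM.
    static List<String> collectorSummaries() {
        List<String> out = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : collectorSummaries()) {
            System.out.println(line);
        }
    }
}
```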
Objects that are logically considered as part of the JVM are not created on the Heap. The non-heap memory includes:
- Permanent Generation: contains method area and interned strings
- Code Cache: used for compilation and storage of methods that have been compiled to native code by the JIT compiler
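The split between heap and non-heap memory is visible through the memory pool beans of the java.lang.management API. The following sketch groups pool names by type; MemoryPools is a hypothetical class name. The pool names are an implementation detail: on Java 8 and later a HotSpot JVM reports Metaspace rather than a permanent generation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class MemoryPools {
    // Returns the names of all memory pools of the given type (HEAP or NON_HEAP).
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```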
Just In Time (JIT) Compilation
Java byte code is interpreted; however, this is not as fast as directly executing native code on the JVM’s host CPU. To improve performance the Oracle HotSpot VM looks for “hot” areas of byte code that are executed frequently and compiles these to native code. The native code is then stored in the code cache in non-heap memory.
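A rough way to observe this is to call a small method many times and then ask the JVM how much time it has spent compiling. The sketch below assumes a JVM with a JIT compiler; JitDemo, sumTo and jitSummary are hypothetical names, and the iteration counts are arbitrary. Whether and when sumTo is actually compiled is up to the JVM.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitDemo {
    // A small method that becomes a candidate for compilation when called often.
    static long sumTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Describes the JIT compiler, if the JVM has one.
    static String jitSummary() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "no JIT compiler";
        }
        String time = jit.isCompilationTimeMonitoringSupported()
                ? jit.getTotalCompilationTime() + " ms"
                : "n/a";
        return jit.getName() + ", total compilation time: " + time;
    }

    public static void main(String[] args) {
        long checksum = 0;
        for (int i = 0; i < 20_000; i++) {
            checksum += sumTo(1_000); // repeated calls make this code "hot"
        }
        System.out.println("checksum: " + checksum);
        System.out.println(jitSummary());
    }
}
```

On HotSpot, running the program with -XX:+PrintCompilation prints the methods as they are compiled.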
The method area stores per-class information such as:
- Classloader Reference
- Run Time Constant Pool: Numeric constants, Field references, Method References, Attributes
- Field data: Name, Type, Modifiers, Attributes
- Method data: Name, Return Type, Parameter Types, Modifiers, Attributes
- Method code: Bytecodes, Operand stack size, Local variable size, Local variable table
- Exception table: Start point, End point, PC offset for handler code, Constant pool index for exception class being caught
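Much of this per-class information is exposed to programs through reflection. The sketch below prints the classloader, fields and methods of a small class; ClassInfoDemo, Point and describe are hypothetical names used only for illustration, and reflection is shown here as a view onto the data rather than as the way the JVM stores it.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

class ClassInfoDemo {
    static class Point {
        private int x;
        public int getX() { return x; }
    }

    // Builds a textual summary of the class's loader, fields and methods.
    static String describe(Class<?> type) {
        StringBuilder sb = new StringBuilder();
        sb.append("class: ").append(type.getName()).append('\n');
        sb.append("loader: ").append(type.getClassLoader()).append('\n');
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic()) continue; // skip compiler-generated members
            sb.append("field: ").append(Modifier.toString(f.getModifiers()))
              .append(' ').append(f.getType().getName())
              .append(' ').append(f.getName()).append('\n');
        }
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()) continue;
            StringJoiner params = new StringJoiner(", ", "(", ")");
            for (Class<?> p : m.getParameterTypes()) {
                params.add(p.getName());
            }
            sb.append("method: ").append(Modifier.toString(m.getModifiers()))
              .append(' ').append(m.getReturnType().getName())
              .append(' ').append(m.getName()).append(params).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(Point.class));
    }
}
```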
All threads share the same method area, so access to the method area data and the process of dynamic linking must be thread safe. If two threads attempt to access a field or method on a class that has not yet been loaded it must only be loaded once and both threads must not continue execution until it has been loaded.
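The same once-only guarantee can be observed from Java code through class initialization. In the sketch below several threads race to use a class for the first time, and a counter records how many times its static initializer ran. InitOnceDemo, Config and raceToInitialize are hypothetical names; the example demonstrates initialization rather than loading itself, since initialization is what a program can observe directly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class InitOnceDemo {
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    static class Config {
        static final long LOADED_AT;
        static {
            INIT_COUNT.incrementAndGet(); // runs once, however many threads race
            LOADED_AT = System.nanoTime();
        }
        static long loadedAt() { return LOADED_AT; }
    }

    // Lets `threadCount` threads use Config at the same moment and
    // returns how many times its static initializer has executed.
    static int raceToInitialize(int threadCount) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await();
                    Config.loadedAt(); // first use triggers initialization
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        start.countDown(); // release all threads together
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return INIT_COUNT.get();
    }

    public static void main(String[] args) {
        System.out.println("initializer ran " + raceToInitialize(8) + " time(s)");
    }
}
```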
Although the method area is logically part of the heap, simple implementations may choose not to garbage collect or compact it. In HotSpot, the CodeCache is likewise a field of the VM that is separate from the ObjectHeap.
The JVM starts up by loading an initial class using the bootstrap classloader. The class is then linked and initialized before public static void main(String[]) is invoked. The execution of this method will in turn drive the loading, linking and initialization of additional classes and interfaces as required.
Loading is the process of finding the class file that represents the class or interface, reading it into a byte array and creating a Class object from it.
Linking verifies, prepares and optionally resolves the class or interface and its direct superclass and superinterfaces. Verification confirms that the class or interface representation is structurally correct and obeys the semantic requirements of the Java language and the JVM. Performing these checks slows down class loading, but it avoids having to repeat them each time the bytecode is executed. Preparation allocates memory for static storage and any data structures used by the JVM, such as method tables. Resolution is an optional stage that checks symbolic references to other classes and interfaces.
Initialization executes the class or interface initialization method <clinit>.
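That initialization is a separate step from loading can be seen with Class.forName, whose three-argument form lets the caller decide whether the class should be initialized. The sketch below is illustrative only; LoadingDemo, Lazy and loadThenInitialize are hypothetical names.

```java
class LoadingDemo {
    static boolean initialized = false;

    static class Lazy {
        static {
            // Runs as part of the class initialization method.
            LoadingDemo.initialized = true;
        }
    }

    // Returns { flag after loading only, flag after initialization }.
    static boolean[] loadThenInitialize() {
        String name = LoadingDemo.class.getName() + "$Lazy";
        ClassLoader loader = LoadingDemo.class.getClassLoader();
        try {
            Class.forName(name, false, loader); // load, but do not initialize
            boolean afterLoad = initialized;
            Class.forName(name, true, loader);  // now run the static initializer
            return new boolean[] { afterLoad, initialized };
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

On the first call in a fresh JVM the first element is false and the second is true, because the static initializer only runs when initialization is requested.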
There are multiple classloaders in the JVM, each with a different role. Each classloader delegates to its parent classloader (the one that loaded it), except the bootstrap classloader, which is the top classloader.
- The bootstrap classloader is responsible for loading the basic Java APIs, including for example rt.jar.
- The extension classloader loads classes from standard Java extension APIs such as security extension functions.
- The system classloader loads application classes from the classpath.
- User-defined classloaders can alternatively be used to load application classes, for example to load or reload classes at runtime.
Each classloader holds a reference to all the classes that it has loaded.
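The delegation chain can be walked from Java code by following getParent() until it returns null, which is how the Java API represents the bootstrap classloader. LoaderChain and chainFor are hypothetical names in this sketch. The loaders printed depend on the JVM version: from Java 9 onwards the extension classloader is replaced by a platform classloader.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    // Returns the class names of the loaders from the class's own loader
    // up to the bootstrap classloader.
    static List<String> chainFor(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap"); // the bootstrap loader appears as null in the API
        return chain;
    }

    public static void main(String[] args) {
        System.out.println("LoaderChain: " + chainFor(LoaderChain.class));
        System.out.println("String:      " + chainFor(String.class));
    }
}
```

Core classes such as String are loaded by the bootstrap classloader, so their chain contains only the final "bootstrap" entry.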
Class Data Sharing (CDS)
A set of key JVM classes, such as those in rt.jar, is loaded into a memory-mapped shared archive. This improves JVM start-up speed and allows the archive to be shared between different instances of the JVM, reducing the memory footprint.
Run Time Constant Pool
Byte codes in Java require data. Often this data is too large to store directly in the byte codes, so it is stored in the constant pool and the byte code holds a reference to the constant pool entry.
The exception table stores per-exception handler information.
The symbol table holds pointers to all symbols, including those held in the run time constant pools of each class.
Interned Strings (String Table)
The Java Language Specification requires that identical string literals, which contain the same sequence of Unicode code points, refer to the same instance of String. String literals are stored in the class file’s constant pool by the compiler and interned by the JVM into the string table at run time.
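The effect of interning can be checked by comparing references with ==. In the sketch below, InternDemo and compare are hypothetical names; the comparisons deliberately use reference equality, which is normally the wrong way to compare strings but is exactly what interning is about.

```java
class InternDemo {
    // Returns the results of three reference comparisons.
    static boolean[] compare() {
        String literalA = "jvm";
        String literalB = "jvm";          // same literal, same interned instance
        String built = new String("jvm"); // a distinct object on the heap
        return new boolean[] {
            literalA == literalB,         // true: both refer to the interned String
            literalA == built,            // false: different objects, equal content
            literalA == built.intern()    // true: intern() returns the pooled instance
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```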