Memory

Session Chair: Hridesh Rajan, Iowa State University
Allocation Wall: a Limiting Factor of Java Applications on Emerging Multi-core Platforms
Yi Zhao, IBM China Research Lab
Jin Shi, Tsinghua University
Kai Zheng, IBM China Research Lab
Haichuan Wang, IBM China Research Lab
Haibo Lin, IBM China Research Lab
Ling Shao, IBM China Research Lab

Multi-core processors are widely used in computer systems. Because microprocessor performance greatly exceeds memory performance, the "memory wall" has become a well-known issue. It is important to understand how this large speed gap between processor and memory affects the performance and scalability of Java applications on emerging multi-core platforms.

In this paper, we study two popular Java benchmarks, SPECjbb2005 and SPECjvm2008, on multi-core platforms including Intel Clovertown and AMD Phenom. We focus on the "partially scalable" benchmark programs: with a small number of CPU cores these programs scale perfectly, but as more cores and software threads are used, the slope of the scalability curve degrades dramatically.

In the experiments, we identified a strong correlation between scalability, object allocation rate, and write traffic on the memory bus for these "partially scalable" applications. We find that they all allocate large amounts of memory and consume almost all of the hardware platform's memory write bandwidth. Because write bandwidth on emerging multi-core platforms is so limited by the hardware, we propose the following hypothesis: "For object-allocation-intensive Java applications on emerging multi-core platforms, scalability and performance are limited by object allocation." In effect, these applications run into an "Allocation Wall".
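A simplified bandwidth-bound model, which is our own illustration rather than the paper's analysis, shows why such a scalability curve bends. Assume (hypothetically) that each software thread's allocation generates memory write traffic at roughly r bytes/s, that the platform sustains at most B_w bytes/s of memory writes, and that one thread completes x_1 operations/s when unconstrained. Aggregate throughput X(n) for n threads is then bounded by:

```latex
X(n) \approx x_1 \cdot \min\!\left(n,\ \frac{B_w}{r}\right),
\qquad
n_{\text{sat}} \approx \frac{B_w}{r}
```

Throughput grows linearly while n·r stays well below B_w and flattens beyond n_sat. Measured curves may bend earlier than this bound, since bus contention rises before bandwidth is fully saturated.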

To verify this hypothesis, we performed several experiments: measuring key architecture-level metrics, writing a micro-benchmark program, and modifying some of the "partially scalable" programs to observe the effects. All the experiments gave positive results, strongly suggesting the existence of the Allocation Wall.
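The abstract does not describe the authors' micro-benchmark. The Java program below is a hypothetical stand-in showing the general shape of such a test: n threads each allocate short-lived objects (roughly 64 bytes of fields plus an object header) for a fixed time, and total allocations are reported per thread count. If the Allocation Wall hypothesis holds on a given machine, allocations/s should stop growing once enough threads are running. The class name, object size, and batch size are all assumptions; the volatile `sink` only discourages, and does not guarantee against, the JIT eliminating allocations through escape analysis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

public class AllocationMicroBenchmark {
    // Hypothetical short-lived object: eight long fields (~64 bytes) plus a header.
    static final class Payload {
        long a, b, c, d, e, f, g, h;
    }

    // Volatile sink so the JIT cannot easily prove the objects are dead.
    static volatile Object sink;

    /** Runs `threads` allocating workers for `millis` ms and returns the total allocation count. */
    static long run(int threads, long millis) throws InterruptedException {
        LongAdder total = new LongAdder();
        long deadline = System.nanoTime() + millis * 1_000_000L;
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                long count = 0;
                Payload last = null;
                while (System.nanoTime() < deadline) {
                    // Allocate a batch of objects that die almost immediately.
                    for (int i = 0; i < 1024; i++) {
                        last = new Payload();
                        last.a = i;
                    }
                    sink = last;   // one volatile write per batch keeps overhead low
                    count += 1024;
                }
                total.add(count);
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return total.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        int maxThreads = Runtime.getRuntime().availableProcessors();
        // Double the thread count each step; a flattening curve suggests a shared bottleneck.
        for (int n = 1; n <= maxThreads; n *= 2) {
            long ops = run(n, 1000);
            System.out.printf("threads=%d allocations/s=%d%n", n, ops);
        }
    }
}
```

Raw allocation counts alone cannot attribute a plateau to memory write bandwidth; as the abstract notes, that attribution comes from correlating such curves with architecture-level metrics such as bus write traffic.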

NUMA-Aware Memory Manager with Thread Affinity Based Object Copying
Takeshi Ogasawara, IBM Tokyo Research Laboratory

We propose a novel online method of identifying the preferred NUMA nodes for objects with negligible overhead at both garbage collection time and object allocation time. As the number of CPUs (or NUMA nodes) in a system increases, it is critical for the memory manager of an object-oriented language runtime to exploit the low latency of local memory for high performance. To determine on which CPU the thread that frequently accesses an object is running, prior research uses runtime information about memory accesses sampled by the hardware. However, the overhead of this approach is high for a garbage collector. We demonstrate that our approach, which does not require memory access samples, can improve the performance of benchmark programs in a variety of categories, including SPECpower_ssj2008, SPECjbb2005, and SPECjvm2008. We prototyped a NUMA-aware memory manager on a modified version of the IBM Java VM and tested it on a cc-NUMA POWER6 machine with eight NUMA nodes. Our NUMA-aware GC achieved an additional performance improvement of up to 18.6% (2.8% on average) over a JVM with only the NUMA-aware allocator. The total improvement from using both the NUMA-aware allocator and the GC is up to 50.3% (9.3% on average).
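The abstract does not spell out the placement heuristic. The Java sketch below simulates one plausible reading of "thread affinity based object copying", which is our assumption and not necessarily the paper's method: during evacuation, trace from each mutator thread's roots and copy every newly reached object to the NUMA node on which that thread runs, so objects first reached from a thread end up in that thread's local memory. `Obj`, `MutatorThread`, and `evacuate` are hypothetical names; the placement map stands in for both the forwarding table and the per-node to-spaces of a real copying collector.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NumaAffinityCopySketch {
    /** Hypothetical heap object: an id plus outgoing references. */
    static final class Obj {
        final int id;
        final List<Obj> refs = new ArrayList<>();
        Obj(int id) { this.id = id; }
    }

    /** Hypothetical mutator thread: the NUMA node it runs on and its stack roots. */
    record MutatorThread(int node, List<Obj> roots) {}

    /**
     * Assigns each live object to the node of the first thread whose roots reach it,
     * tracing breadth-first per thread. Returns object id -> destination node.
     */
    static Map<Integer, Integer> evacuate(List<MutatorThread> threads) {
        Map<Integer, Integer> placement = new HashMap<>(); // doubles as the forwarding table
        for (MutatorThread t : threads) {
            Deque<Obj> work = new ArrayDeque<>();
            for (Obj root : t.roots()) {
                // putIfAbsent returns null only if the object was not yet copied.
                if (placement.putIfAbsent(root.id, t.node()) == null) work.add(root);
            }
            while (!work.isEmpty()) {
                Obj o = work.poll();
                for (Obj child : o.refs) {
                    if (placement.putIfAbsent(child.id, t.node()) == null) work.add(child);
                }
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        Obj a = new Obj(1), b = new Obj(2), c = new Obj(3), shared = new Obj(4);
        a.refs.add(b);
        a.refs.add(shared);
        c.refs.add(shared);
        List<MutatorThread> threads = List.of(
                new MutatorThread(0, List.of(a)),
                new MutatorThread(3, List.of(c)));
        System.out.println(evacuate(threads));
    }
}
```

In the example, objects 1, 2 and the shared object 4 are placed on node 0 because the thread on node 0 is traced first, while object 3 goes to node 3. Shared objects simply follow tracing order here; a real collector would need a better tie-breaking policy, which this sketch does not attempt.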

Executing Code in the Past: Efficient In-Memory Object Graph Versioning
Frédéric Pluquet, Université Libre de Bruxelles
Stefan Langerman, Université Libre de Bruxelles
Roel Wuyts, Imec, Leuven and Katholieke Universiteit Leuven

Object versioning refers to how an application can access previous states of its objects. Implementing this mechanism is hard because it needs to be efficient in space and time, and well integrated with the programming language. This paper presents HistOOry, an object versioning system that uses an efficient data structure to store and retrieve past states. It needs only three primitives, and existing code does not need to be modified in order to be versioned. It provides fine-grained control over what parts of objects are versioned and when. It stores all states, past and present, in memory. Code can be executed in the past of the system and will see the complete system at that point in time. We have implemented our model in Smalltalk and used it for three applications that need versioning: checked postconditions, stateful execution tracing, and a planar point location implementation. Benchmarks are provided to assess the practical complexity of our implementation.
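HistOOry itself is written in Smalltalk, and the abstract does not describe its data structure or name its three primitives. As a hedged illustration of the general idea only, the Java sketch below uses the classic "fat node" approach to partial persistence: each versioned field keeps every value it has held, tagged with the version in which it was written, and reading "in the past" binary-searches for the last write at or before a given snapshot. `Clock`, `snapshot`, `VersionedField`, and `getAt` are hypothetical names, not HistOOry's API.

```java
import java.util.ArrayList;
import java.util.List;

public class VersionedFieldSketch {
    /** Hypothetical global clock: every write is tagged with the current version. */
    static final class Clock {
        private int current = 0;
        /** Freezes the current state and returns its version number. */
        int snapshot() { return current++; }
        int current() { return current; }
    }

    /** A field that keeps all past values, ordered by the version they were written in. */
    static final class VersionedField<T> {
        private final Clock clock;
        private final List<Integer> versions = new ArrayList<>();
        private final List<T> values = new ArrayList<>();

        VersionedField(Clock clock, T initial) {
            this.clock = clock;
            set(initial);
        }

        void set(T value) {
            int v = clock.current();
            int last = versions.size() - 1;
            if (last >= 0 && versions.get(last) == v) {
                values.set(last, value);   // same version: overwrite in place
            } else {
                versions.add(v);           // new version: append, keeping order
                values.add(value);
            }
        }

        /** Present value. */
        T get() { return values.get(values.size() - 1); }

        /** Value as of snapshot `version`: the last write tagged <= version. */
        T getAt(int version) {
            int lo = 0, hi = versions.size() - 1, found = -1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                if (versions.get(mid) <= version) { found = mid; lo = mid + 1; }
                else { hi = mid - 1; }
            }
            if (found < 0) throw new IllegalArgumentException("no value at version " + version);
            return values.get(found);
        }
    }

    public static void main(String[] args) {
        Clock clock = new Clock();
        VersionedField<String> status = new VersionedField<>(clock, "draft"); // written at version 0
        int v0 = clock.snapshot();   // freeze version 0
        status.set("review");        // written at version 1
        int v1 = clock.snapshot();   // freeze version 1
        status.set("final");         // written at version 2
        System.out.println(status.getAt(v0) + " " + status.getAt(v1) + " " + status.get());
    }
}
```

Running `main` prints `draft review final`. Lookups in the past cost O(log k) for a field with k recorded versions, while present-state reads stay O(1); how HistOOry balances these costs is not stated in the abstract.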