Tracks
Technical Program
Tutorials
Workshops
DesignFest
Educators' Symposium
Doctoral Symposium
Demonstrations
Posters
Student Research
Competition
Student Volunteers
Special Events
Exhibits
Housing Information
Registration Information
Transportation

Thursday, 7 November – 10:30-12:00 – Ballroom 6C

Session A

Optimizations

Chair: Craig Chambers
University of Washington

This session showcases a collection of new optimization techniques to significantly improve the runtime performance of object-oriented programs. The first paper proposes a generalized feedback-directed optimization system that requires no prior instrumentation, describing ways how traditional as well as new optimizations are driven by feedback information, and measures their effectiveness. The second paper proposes a novel algorithm for reducing thread locking overhead called "lock reservation", by which Java thread locks are reserved in advance before they are actually used, exploiting locality of lock usage by threads. The third paper shows an extension of an efficient single-dispatch algorithm into a multi-dispatch variant, and demonstrates its time and space efficiency.

Online Feedback-Directed Optimization of Java

Matthew Arnold
Rutgers University
Michael Hind
IBM T. J. Watson Research Center
Barbara G. Ryder
Rutgers University

This paper describes the implementation of a general online feedback-directed optimization system. The system is fully automatic; it requires no prior profiling run. It uses a previously developed low-overhead instrumentation sampling framework to collect control flow graph edge profiles. This profile information is used to drive several traditional optimizations, as well as a novel algorithm for performing feedback-directed splitting. We empirically evaluate this system and demonstrate improvements in peak performance of up to 20% while overhead remains low, with no individual execution being degraded by more than 2% because of initial instrumentation.

Lock Reservation: Java Locks Can Mostly Do Without Atomic Operations

Kiyokuni Kawachiya
IBM Tokyo Research Laboratory
Akira Koseki
IBM Tokyo Research Laboratory
Tamiya Onodera
IBM Tokyo Research Laboratory

Because of the built-in support for multi-threaded programming, Java programs perform many lock operations. Although the overhead has significantly been reduced in the recent virtual machines, one or more atomic operations are required for locking and unlocking an object even in the fastest cases. This paper presents a novel algorithm called "lock reservation". It exploits "thread locality" of Java locks, which claims that the locking sequence of a Java lock contains a very long sequence of a specific thread. The algorithm allows locks to be reserved for threads. When a thread attempts to acquire a lock, it can do without any atomic operation if the lock is reserved for the thread. Otherwise, it cancels the reservation and falls back to a conventional locking algorithm. We have evaluated an implementation of lock reservation in IBM's production virtual machine and compiler. The results show that it achieved performance improvements up to 28% in real Java programs.

Fast Algorithm for Creating Space Efficient Dispatching Tables with Application to Multi-Dispatching

Yoav Zibin
Technion
Yossi Gil
Technion

The dispatching problem can be solved very efficiently in the single-inheritance setting. In this paper we show how to extend one such solution to the multiple-inheritance setting. This generalization comes with an increase to the space requirements by a small factor of nSlices. This factor can be thought of as a metric of the complexity of the topology of the inheritance hierarchy. On a data set of 35 hierarchies totaling some 64 thousand types, our dispatching data structure, based on a novel type slicing technique, exhibits very significant improvements over previous dispatching techniques, not only in terms of the time for creating the underlying data structure, but also in terms of total space used. The cost is in the dispatching time, which is no longer constant, but doubly logarithmic in the number of types. Conversely, by using a simple binary search, dispatching time is logarithmic in the number of different implementations. In practice dispatching uses one indirect branch and, on average, only 2.5 binary branches. Our results also have applications to the space-efficient implementation of the more general problem of dispatching multi-methods.