This session showcases a collection of new optimization techniques to significantly improve the runtime performance of object-oriented programs. The first paper proposes a generalized feedback-directed optimization system that requires no prior instrumentation, describing ways how traditional as well as new optimizations are driven by feedback information, and measures their effectiveness. The second paper proposes a novel algorithm for reducing thread locking overhead called "lock reservation", by which Java thread locks are reserved in advance before they are actually used, exploiting locality of lock usage by threads. The third paper shows an extension of an efficient single-dispatch algorithm into a multi-dispatch variant, and demonstrates its time and space efficiency.
This paper describes the implementation of a general online feedback-directed optimization system. The system is fully automatic; it requires no prior profiling run. It uses a previously developed low-overhead instrumentation sampling framework to collect control flow graph edge profiles. This profile information is used to drive several traditional optimizations, as well as a novel algorithm for performing feedback-directed splitting. We empirically evaluate this system and demonstrate improvements in peak performance of up to 20% while overhead remains low, with no individual execution being degraded by more than 2% because of initial instrumentation.
Because of the built-in support for multi-threaded programming, Java programs perform many lock operations. Although the overhead has significantly been reduced in the recent virtual machines, one or more atomic operations are required for locking and unlocking an object even in the fastest cases. This paper presents a novel algorithm called "lock reservation". It exploits "thread locality" of Java locks, which claims that the locking sequence of a Java lock contains a very long sequence of a specific thread. The algorithm allows locks to be reserved for threads. When a thread attempts to acquire a lock, it can do without any atomic operation if the lock is reserved for the thread. Otherwise, it cancels the reservation and falls back to a conventional locking algorithm. We have evaluated an implementation of lock reservation in IBM's production virtual machine and compiler. The results show that it achieved performance improvements up to 28% in real Java programs.
The dispatching problem can be solved very efficiently in the single-inheritance setting. In this paper we show how to extend one such solution to the multiple-inheritance setting. This generalization comes with an increase to the space requirements by a small factor of nSlices. This factor can be thought of as a metric of the complexity of the topology of the inheritance hierarchy. On a data set of 35 hierarchies totaling some 64 thousand types, our dispatching data structure, based on a novel type slicing technique, exhibits very significant improvements over previous dispatching techniques, not only in terms of the time for creating the underlying data structure, but also in terms of total space used. The cost is in the dispatching time, which is no longer constant, but doubly logarithmic in the number of types. Conversely, by using a simple binary search, dispatching time is logarithmic in the number of different implementations. In practice dispatching uses one indirect branch and, on average, only 2.5 binary branches. Our results also have applications to the space-efficient implementation of the more general problem of dispatching multi-methods.