Research Papers

Chair: Gregor Kiczales
University of British Columbia

OOPSLA has been the forum for the introduction and dissemination of many key programming paradigms and methodologies, and this year is no exception. This year's research papers present substantiated new research or novel technical results, advance the state of the art, and report on significant experience and experimentation.


Research Program

Query Integration

Caching and Incrementalisation in the Java Query Language

Room: 204 | Date: Oct 21, 2008 | Time: 10:30 - 11:00
Darren Willis
Victoria University of Wellington
David J. Pearce
Victoria University of Wellington
James Noble
Victoria University of Wellington

Abstract

Many contemporary object-oriented programming languages support first-class queries or comprehensions. These language extensions make it easier for programmers to write queries, but are generally implemented no more efficiently than the code using collections, iterators, and loops that they replace. Crucially, whenever a query is re-executed, it is recomputed from scratch. We describe a general approach to optimising queries over mutable objects: query results are cached, and those caches are incrementally maintained whenever the collections and objects underlying those queries are updated. We hope that the performance benefits of our optimisations may encourage more general adoption of first-class queries by object-oriented programmers.
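
To make the mechanism concrete, here is a minimal Java sketch of a cached, incrementally maintained query (ours, not the paper's JQL implementation; the Employee type and the salary threshold are invented):

```java
import java.util.ArrayList;
import java.util.List;

// Toy analogue of a cached, incrementally maintained query:
// "all employees with salary > 50_000". Instead of re-running the query
// from scratch, the cache is maintained as the collection changes.
class EmployeeDirectory {
    record Employee(String name, int salary) {}

    private final List<Employee> all = new ArrayList<>();
    private final List<Employee> highPaidCache = new ArrayList<>();

    void add(Employee e) {
        all.add(e);
        if (e.salary() > 50_000) highPaidCache.add(e);   // incremental update
    }

    void remove(Employee e) {
        all.remove(e);
        highPaidCache.remove(e);                          // incremental update
    }

    // The "query" is now a cache read, not a re-scan of the collection.
    List<Employee> highPaid() {
        return List.copyOf(highPaidCache);
    }
}
```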



Interprocedural Query Extraction for Transparent Persistence

Room: 204 | Date: Oct 21, 2008 | Time: 11:00 - 11:30
Ben Wiedermann
The University of Texas at Austin
Ali Ibrahim
The University of Texas at Austin
William R. Cook
The University of Texas at Austin

Abstract

Transparent persistence promises to integrate programming languages and databases by allowing programs to access persistent data with the same ease as non-persistent data. In this work we demonstrate the feasibility of optimizing transparently persistent programs by extracting queries to efficiently prefetch required data. A static analysis derives query structure and conditions across methods that access persistent data. Using the static analysis, our system transforms the program to execute explicit queries. The transformed program composes queries across methods to handle method calls that return persistent data. We extend an existing Java compiler to implement the static analysis and program transformation, handling recursion and parameterized queries. We evaluate the effectiveness of query extraction on the OO7 and TORPEDO benchmarks. This work is focused on programs written in the current version of Java, without language changes. However, the techniques developed here may also be of value in conjunction with object-oriented languages extended with high-level query syntax.
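
The before/after shape of the transformation can be sketched as follows; the entity classes and the JPQL-style query are our own illustration, not the paper's system:

```java
import java.util.ArrayList;
import java.util.List;

class Department {
    private final String name;
    Department(String name) { this.name = name; }
    String getName() { return name; }
}

class Employee {
    private final int salary;
    private final Department dept;
    Employee(int salary, Department dept) { this.salary = salary; this.dept = dept; }
    int getSalary() { return salary; }
    Department getDepartment() { return dept; }
}

class Report {
    // Navigational style: under transparent persistence, each call to
    // getDepartment() on a not-yet-loaded object can cost a database round
    // trip (the classic "n+1 selects" problem).
    static List<String> wellPaidDepartments(List<Employee> employees) {
        List<String> names = new ArrayList<>();
        for (Employee e : employees) {
            if (e.getSalary() > 50_000) {
                names.add(e.getDepartment().getName());
            }
        }
        return names;
    }
    // Query extraction rewrites the method to issue one explicit,
    // prefetching query up front, conceptually:
    //   SELECT e.department.name FROM Employee e WHERE e.salary > 50000
}
```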



Deep Typechecking and Refactoring

Room: 204 | Date: Oct 21, 2008 | Time: 11:30 - 12:00
Zachary Tatlock
UC San Diego
Chris Tucker
UC San Diego
David Shuffelton
UC San Diego
Ranjit Jhala
UC San Diego
Sorin Lerner
UC San Diego

Abstract

Large software systems are typically composed of multiple layers, written in different languages and loosely coupled using a string-based interface. For example, in modern web applications, a server written in Java communicates with a database back end by passing in query strings. This widely prevalent approach is unsafe, as the analyses developed for the individual layers are oblivious to the semantics of the dynamically constructed strings, making it impossible to statically reason about the correctness of the interaction. Further, even simple refactoring in such systems is daunting and error-prone, as the changes must also be applied to isolated string fragments scattered across the code base.

We present techniques for deep typechecking and refactoring for systems that combine Java code with a database back end using the Java Persistence API [10]. Deep typechecking ensures that the queries that are constructed dynamically are type safe and that the values returned from the queries are used safely by the program. Deep refactoring builds upon typechecking to allow programmers to safely and automatically propagate code refactorings through the query string fragments.

Our algorithms are implemented in a tool called QUAIL. We present experiments evaluating the effectiveness of QUAIL on several benchmarks ranging from 3,369 to 82,907 lines of code. We show that QUAIL is able to verify that 84% of query strings in our benchmarks are type safe. Finally, we show that QUAIL reduces the number of places in the code that a programmer must look at in order to perform a refactoring by several orders of magnitude.
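
A small made-up example of the refactoring hazard the paper addresses; the query fragments are JPQL-style strings of our own invention:

```java
// Renaming the persistent field "salary" to "wage" is an ordinary Java
// refactoring for the entity class, but it must also be applied inside
// the string fragments below, which a conventional refactoring engine
// never sees; note the second, easily missed fragment.
class EmployeeDao {
    String buildQuery(boolean ordered) {
        String q = "SELECT e FROM Employee e WHERE e.salary > 50000";
        if (ordered) {
            q += " ORDER BY e.salary";
        }
        return q;
    }
}
```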



Language Design

Join Patterns for Visual Basic

Room: 204 | Date: Oct 21, 2008 | Time: 13:30 - 14:00
Claudio V. Russo
Microsoft Research Ltd

Abstract

We describe an extension of Visual Basic 9.0 with asynchronous concurrency constructs, join patterns, based on the join calculus. Our design of Concurrent Basic (CB) builds on earlier work on Polyphonic C# and Comega. Since that work, the need for language-integrated concurrency has only grown, both due to the arrival of commodity multi-core hardware and due to the trend toward Rich Internet Applications that rely on asynchronous client-server communication to hide latency. Unlike its predecessors, CB adopts an event-like syntax that should be familiar to existing VB programmers. Coupled with Generics, CB allows one to declare reusable concurrency abstractions that were previously clumsy to express. CB removes its ancestors' inconvenient inheritance restriction, while providing new extensibility points useful in practical applications that must co-exist with, or want to exploit, alternative threading models available on the platform. CB is implemented as an extension of the production VB 9.0 compiler.



Whiteoak: Introducing Structural Typing into Java

Room: 204 | Date: Oct 21, 2008 | Time: 14:00 - 14:30
Itay Maman
Technion - Israel Institute of Technology
Joseph (Yossi) Gil
Technion - Israel Institute of Technology

Abstract

This paper presents Whiteoak: a Java extension that introduces structural type equivalence and subtyping into the language. We argue that structural subtyping addresses common software design problems, and promotes the development of loosely coupled modules without compromising type safety. We discuss language design issues, including subtyping in the face of self-referencing structural types, compile-time operators for computing new types from existing ones, and the semantics of constructors and non-abstract methods in structural types. We describe implementation techniques, including the compile-time and run-time challenges that we faced (in particular, preserving the identity of objects). Measurements indicate that the performance of our implementation of structural dispatching is comparable to that of the JVM's standard invocation mechanisms.
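
The design problem structural subtyping solves can be seen in plain Java; the structural type sketched in the comment approximates the idea and is not necessarily Whiteoak's exact notation:

```java
// Two independently developed classes share a "shape" but no common
// supertype, so nominal typing offers them no common static type.
class Window { public void draw() { System.out.println("window"); } }
class Icon   { public void draw() { System.out.println("icon"); } }

class Client {
    // A structural type, roughly  type Drawable { void draw(); },  would
    // accept both classes as-is. In plain Java one must either retrofit a
    // shared interface (impossible for third-party code) or fall back on
    // reflection, as below.
    static void render(Object o) throws Exception {
        o.getClass().getMethod("draw").invoke(o);
    }

    public static void main(String[] args) throws Exception {
        render(new Window());
        render(new Icon());
    }
}
```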



Mixing Source and Bytecode - A Case for Compilation by Normalization

Room: 204 | Date: Oct 21, 2008 | Time: 14:30 - 15:00
Lennart C. L. Kats
Delft University of Technology
Martin Bravenboer
University of Oregon
Eelco Visser
Delft University of Technology

Abstract

Language extensions increase programmer productivity by providing concise, often domain-specific syntax, and support for static verification of correctness, security, and style constraints. Language extensions can often be realized through translation to the base language, supported by preprocessors and extensible compilers. However, various kinds of extensions require further adaptation of a base compiler's internal stages and components, for example to support separate compilation or to make use of low-level primitives of the platform (e.g., jump instructions or unbalanced synchronization). To allow for a more loosely coupled approach, we propose an open compiler model based on normalization steps from a high-level language to a subset of it, the core language. We developed such a compiler for a mixed Java and (core) bytecode language, and evaluate its effectiveness for composition mechanisms such as traits, as well as statement-level and expression-level language extensions.



Runtime

Tolerating Memory Leaks

Room: 204 | Date: Oct 21, 2008 | Time: 15:30 - 16:00
Michael D. Bond
University of Texas at Austin
Kathryn S. McKinley
University of Texas at Austin

Abstract

Type safety and garbage collection in managed languages eliminate memory errors such as dangling pointers, double frees, and leaks of unreachable objects. Unfortunately, a program still leaks memory if it maintains references to objects it will never use again. Leaked objects decrease program locality and increase garbage collection frequency and workload. A growing leak will eventually exhaust memory and crash the program.

This paper introduces a leak tolerance approach called Melt that safely eliminates performance degradations and crashes due to leaks of dead but reachable objects in managed languages, given sufficient disk space to hold leaking objects. Melt (1) identifies stale objects that the program is not accessing; (2) segregates in-use and stale objects by storing stale objects to disk; and (3) preserves safety by activating stale objects if the program subsequently accesses them. We design and build a prototype implementation of Melt in a Java VM and show it adds overhead low enough for production systems. Whereas existing VMs grind to a halt and then crash on programs with leaks, Melt keeps many of these programs running much longer without significantly degrading performance. Melt provides users the illusion of a fixed leak and gives developers more time to fix leaky programs.
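
A minimal example of the leak class Melt targets, dead but reachable objects, written by us for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// A canonical "dead but reachable" leak: entries are added and never
// evicted, so the collector must retain them even though the program
// will never read them again. Melt would stage such stale objects to
// disk instead of letting them exhaust the heap.
class RequestCache {
    private static final Map<Integer, byte[]> cache = new HashMap<>();

    static void handleRequest(int id) {
        cache.put(id, new byte[1024]);   // reachable forever, used never
    }

    public static void main(String[] args) {
        for (int id = 0; ; id++) handleRequest(id);   // ends in OutOfMemoryError
    }
}
```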



Jolt: Lightweight Dynamic Analysis and Removal of Object Churn

Room: 204 | Date: Oct 21, 2008 | Time: 16:00 - 16:30
Ajeet Shankar
UC Berkeley
Matthew Arnold
IBM
Rastislav Bodik
UC Berkeley

Abstract

It has been observed that component-based applications exhibit "object churn": the excessive creation of short-lived objects, often caused by trading performance for modularity. Because churned objects are short-lived, they appear to be good candidates for stack allocation. Unfortunately, most churned objects escape their allocating function, making escape analysis ineffective.

We reduce object churn with three contributions. First, we formalize two measures of object churn, capture and control. Second, we develop lightweight dynamic analyses for measuring both capture and control. Third, we develop an algorithm that uses capture and control to inline portions of the call graph to make churned objects non-escaping, enabling churn optimization via escape analysis.

Jolt is a lightweight dynamic churn optimizer that uses our algorithms. We embedded Jolt in the JIT compiler of the IBM J9 commercial JVM, and evaluated Jolt on large application frameworks, including Eclipse and JBoss. We found that Jolt eliminates over 4 times as many allocations as a state-of-the-art escape analysis alone.
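
A hand-made illustration of object churn and why escape analysis alone misses it (our example, not taken from the paper's benchmarks):

```java
// Typical churn: a modular API returns a small wrapper that the caller
// unpacks immediately. Range escapes rangeOf(), so escape analysis on
// rangeOf() alone cannot stack-allocate it; inlining rangeOf() into
// width() makes the wrapper non-escaping and optimizable.
class Churn {
    record Range(int lo, int hi) {}

    static Range rangeOf(int[] xs) {
        int lo = Integer.MAX_VALUE, hi = Integer.MIN_VALUE;
        for (int x : xs) { lo = Math.min(lo, x); hi = Math.max(hi, x); }
        return new Range(lo, hi);        // escapes its allocating method
    }

    static int width(int[] xs) {
        Range r = rangeOf(xs);           // short-lived wrapper, dead right after
        return r.hi() - r.lo();
    }
}
```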



QVM: An Efficient Runtime for Detecting Defects in Deployed Systems

Room: 204 | Date: Oct 21, 2008 | Time: 16:30 - 17:00
Matthew Arnold
IBM TJ Watson Research Center
Martin Vechev
IBM TJ Watson Research Center
Eran Yahav
IBM TJ Watson Research Center

Abstract

Coping with software defects that occur in the post-deployment stage is a challenging problem: bugs may occur only when the system uses a specific configuration and only under certain usage scenarios. Nevertheless, halting production systems until the bug is tracked down and fixed is often impossible. Thus, developers have to try to reproduce the bug in laboratory conditions. Often, reproducing the bug constitutes the lion's share of the debugging effort.

In this paper we suggest an approach to address the aforementioned problem by using a specialized runtime environment (QVM, for Quality Virtual Machine). QVM efficiently detects defects by continuously monitoring the execution of the application in a production setting. QVM enables the efficient checking of violations of user-specified correctness properties, e.g., typestate safety properties, Java assertions, and heap properties pertaining to ownership.

QVM is markedly different from existing techniques for continuous monitoring: it uses a novel overhead manager which enforces a user-specified overhead budget for quality checks. Existing tools for error detection in the field usually disrupt the operation of the deployed system. QVM, on the other hand, provides a balanced trade-off between the cost of the monitoring process and the accuracy needed to detect defects. Specifically, the overhead cost of using QVM instead of a standard JVM is low enough to be acceptable in production environments.

We implemented QVM on top of IBM's J9 Java Virtual Machine and used it to detect and fix various errors in real-world applications.



Concurrency

Contention-Aware Scheduler: Unlocking Execution Parallelism in Multithreaded Java Programs

Room: 204 | Date: Oct 22, 2008 | Time: 8:30 - 9:00
Feng Xian
University of Nebraska-Lincoln
Witawas Srisa-an
University of Nebraska-Lincoln
Hong Jiang
University of Nebraska-Lincoln

Abstract

In multithreaded programming, locks are frequently used as a mechanism for synchronization. Because today's operating systems do not consider lock usage as a scheduling criterion, scheduling decisions can be unfavorable to multithreaded applications, leading to performance issues such as convoying and heavy lock contention in systems with multiple processors. Previous efforts to address these issues (e.g., transactional memory, lock-free data structures) often treat scheduling decisions as "a fact of life," and therefore these solutions try to cope with the consequences of undesirable scheduling instead of dealing with the problem directly.

In this paper, we introduce the Contention-Aware Scheduler (CA-Scheduler), which is designed to support efficient execution of large multithreaded Java applications on multiprocessor systems. Our proposed scheduler employs a scheduling policy that reduces lock contention. As will be shown in this paper, our prototype implementation of the CA-Scheduler in Linux and the Sun HotSpot virtual machine incurs only 3.5% runtime overhead, while the overall performance differences, when compared with a system with no contention awareness, range from a degradation of 3% in a small multithreaded benchmark to an improvement of 15% in a large Java application server benchmark.



Dynamic Optimization for Efficient Strong Atomicity

Room: 204 | Date: Oct 22, 2008 | Time: 9:00 - 9:30
Florian T. Schneider
ETH
Vijay Menon
Google
Tatiana Shpeisman
Intel Corporation
Ali-Reza Adl-Tabatabai
Intel Corporation

Abstract

Transactional memory (TM) is a promising concurrency control alternative to locks. Recent work has highlighted important memory model issues regarding TM semantics and exposed problems in existing TM implementations. For safe, managed languages such as Java, there is a growing consensus towards strong atomicity semantics as a sound, scalable solution.

Strong atomicity has presented a challenge to implement efficiently because it requires instrumentation of non-transactional memory accesses, incurring significant overhead even when a program makes minimal or no use of transactions. To minimize overhead, existing solutions require either a sophisticated type system, specialized hardware, or static whole-program analysis. These techniques do not translate easily into a production setting on existing hardware.

In this paper, we present novel dynamic optimizations that significantly reduce strong atomicity overheads and make strong atomicity practical for dynamic language environments. We introduce analyses that optimistically track which non-transactional memory accesses can avoid strong atomicity instrumentation, and we describe a lightweight speculation and recovery mechanism that applies these analyses to generate speculatively-optimized but safe code for strong atomicity in a dynamically-loaded environment. We show how to implement these mechanisms efficiently by leveraging existing dynamic optimization infrastructure in a Java system. Measurements on a set of transactional and non-transactional Java workloads demonstrate that our techniques substantially reduce the overhead of strong atomicity from a factor of 5x down to 10% or less over an efficient weak atomicity baseline.



Design and Implementation of Transactional Constructs for C/C++

Room: 204 | Date: Oct 22, 2008 | Time: 9:30 - 10:00
Yang Ni
Intel Corporation
Adam Welc
Intel Corporation
Ali-Reza Adl-Tabatabai
Intel Corporation
Moshe Bach
Intel Corporation
Sion Berkowits
Intel Corporation
James Cownie
Intel Corporation
Robert Geva
Intel Corporation
Sergey Kozhukow
Intel Corporation
Ravi Narayanaswamy
Intel Corporation
Jeffrey Olivier
Intel Corporation
Serguei Preis
Intel Corporation
Bratin Saha
Intel Corporation
Ady Tal
Intel Corporation
Xinmin Tian
Intel Corporation

Abstract

This paper presents a software transactional memory system that introduces first-class C++ language constructs for transactional programming. We describe new C++ language extensions, a production-quality optimizing C++ compiler that translates and optimizes these extensions, and a high-performance STM runtime library. The transactional language constructs support C++ language features including classes, inheritance, virtual functions, exception handling, and templates. The compiler automatically instruments the program for transactional execution and optimizes TM overheads. The runtime library implements multiple execution modes and implements a novel STM algorithm that supports both optimistic and pessimistic concurrency control. The runtime switches a transaction's execution mode dynamically to improve performance and to handle calls to precompiled functions and I/O libraries. We present experimental results on a modern 8-core processor running a set of 20 non-trivial parallel programs ported to use our language extensions. Our measurements show that (1) our system scales well as the number of cores increases; (2) unlike prior work that assumed or concluded pessimistic concurrency control performs strictly worse than optimistic, pessimistic concurrency is competitive with optimistic and is sometimes better, so no one scheme clearly dominates; (3) the ability to switch modes dynamically is important for workloads with high contention; and (4) our compiler optimizations help both optimistic and pessimistic concurrency control.



Formal Methods

jStar: Towards Practical Verification for Java

Room: 204 | Date: Oct 22, 2008 | Time: 10:30 - 11:00
Dino Distefano
Queen Mary, University of London
Matthew J. Parkinson
University of Cambridge

Abstract

In this paper we introduce a novel methodology for verifying a large set of Java programs which builds on recent theoretical developments in program verification: it combines the idea of abstract predicate families and the idea of symbolic execution and abstraction using separation logic. The proposed technology has been implemented in a new automatic verification system, called jStar, which combines theorem proving and abstract interpretation techniques.

We demonstrate the effectiveness of our methodology by using jStar to verify example programs implementing four popular design patterns (subject/observer, visitor, factory, and pooling). Although these patterns are extensively used by object-oriented developers in real-world applications, so far they have been highly challenging for existing object-oriented verification techniques.



Verifying Correct Usage of Atomic Blocks and Typestate

Room: 204 | Date: Oct 22, 2008 | Time: 11:00 - 11:30
Nels E. Beckman
Carnegie Mellon University
Kevin Bierhoff
Carnegie Mellon University
Jonathan Aldrich
Carnegie Mellon University

Abstract

The atomic block, a synchronization primitive provided to programmers in transactional memory systems, has the potential to greatly ease the development of concurrent software. However, atomic blocks can still be used incorrectly, and race conditions can still occur at the level of application logic. In this paper, we present an intraprocedural static analysis, formalized as a type system and proven sound, that helps programmers use atomic blocks correctly. Using access permissions, which describe how objects are aliased and modified, our system statically prevents race conditions and enforces typestate properties in concurrent programs. We have implemented a prototype static analysis for the Java language based on our system and have used it to verify several realistic examples.



Enforcing Object Protocols by Combining Static and Runtime Analysis

Room: 204 | Date: Oct 22, 2008 | Time: 11:30 - 12:00
Madhu Gopinathan
Indian Institute of Science
Sriram K Rajamani
Microsoft Research India

Abstract

In this paper, we consider object protocols that constrain interactions between objects in a program. Several such protocols have been proposed in the literature. For many APIs (such as JDOM, JDBC), API designers constrain how API clients interact with API objects. In practice, API clients violate such constraints, as evidenced by postings in discussion forums for these APIs. Thus, it is important that API designers specify constraints using appropriate object protocols and enforce them. The goal of an object protocol is expressed as a protocol invariant. Fundamental properties such as ownership can be expressed as protocol invariants. We present a language, PROLANG, to specify object protocols along with their protocol invariants, and a tool, INVCOP++, to check if a program satisfies a protocol invariant. INVCOP++ separates the problem of checking if a protocol satisfies its protocol invariant (called protocol correctness), from the problem of checking if a program conforms to a protocol (called program conformance). The former is solved using static analysis, and the latter using runtime analysis. Due to this separation (1) errors made in protocol design are detected at a higher level of abstraction, independent of the program's source code, and (2) performance of conformance checking is improved as protocol correctness has been verified statically. We present theoretical guarantees about the way we combine static and runtime analysis, and empirical evidence that our tool INVCOP++ finds usage errors in widely used APIs. We also show that statically checking protocol correctness greatly optimizes the overhead of checking program conformance, thus enabling API clients to test whether their programs use the API as intended by the API designer.
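
A familiar instance of such an object protocol, expressible in plain Java, is the JDK's own collection/iterator contract. The example below is ours; the paper's tools target protocols of this kind in APIs such as JDOM and JDBC:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// The implicit protocol invariant: no structural modification of the
// list while an iterator over it is live. The violation below is only
// detected at run time, when next() throws.
class IteratorProtocol {
    public static void main(String[] args) {
        List<String> xs = new ArrayList<>(List.of("a", "b", "c"));
        Iterator<String> it = xs.iterator();
        xs.add("d");    // violates the collection/iterator protocol
        it.next();      // throws ConcurrentModificationException
    }
}
```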



Refactoring

The Impact of Static-Dynamic Coupling on Remodularization

Room: 204 | Date: Oct 22, 2008 | Time: 13:30 - 14:00
Rick Chern
The University of British Columbia
Kris De Volder
The University of British Columbia

Abstract

We explore the concept of static-dynamic coupling---the degree to which changes in a program's static modular structure imply changes to its dynamic structure. This paper investigates the impact of static-dynamic coupling in a programming language on the effort required to evolve the coarse modular structure of programs written in that language. We performed a series of remodularization case studies in both Java and SubjectJ. SubjectJ is designed to be similar to Java, but to have strictly less static-dynamic coupling. Our results include quantitative measures---time taken and number of bugs introduced---as well as a more subjective qualitative analysis of the remodularization process. All results point in the same direction and suggest that static-dynamic coupling causes substantial accidental complexity for the remodularization of Java programs.



Sound and Extensible Renaming for Java

Room: 204 | Date: Oct 22, 2008 | Time: 14:00 - 14:30
Max Schäfer
University of Oxford
Torbjörn Ekman
University of Oxford
Oege de Moor
University of Oxford

Abstract

Descriptive names are crucial to understanding code. However, good names are notoriously hard to choose, and manually changing a globally visible name can be a maintenance nightmare. Hence, tool support for automated renaming is an essential aid for developers and is widely supported by popular development environments. This work improves on two limitations of current refactoring tools: too-weak preconditions that lead to unsoundness, where names do not bind to the correct declarations after renaming, and too-strong preconditions that prevent renaming of certain programs. We identify two main reasons for unsoundness: complex name lookup rules make it hard to define sufficient preconditions, and new language features require additional preconditions. We alleviate both problems by presenting a novel extensible technique for creating symbolic names that are guaranteed to bind to a desired entity in a particular context, obtained by inverting lookup functions. The inverted lookup functions can then be tailored to create qualified names where otherwise a conflict would occur, allowing the refactoring to proceed and thereby addressing the problem of too-strong preconditions. We have implemented renaming for Java as an extension to the JastAdd Extensible Java Compiler and integrated it in Eclipse. We show examples for which other refactoring engines have too-weak preconditions, as well as examples where our approach succeeds in renaming entities by inserting qualifications. To validate the extensibility of the approach we have implemented renaming support for Java 5 and AspectJ-like inter-type declarations as modular extensions to the initial Java 1.4 refactoring engine. The renaming engine is only a few thousand lines of code including extensions, and performance is on par with industrial-strength refactoring tools.
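
An illustration, constructed by us, of the qualification trick: renaming a parameter so that it shadows a field would change the field reference's binding, and inserting a `this.` qualifier restores it:

```java
class Account {
    int balance;

    // Before the rename:
    //   void deposit(int amount) { balance += amount; }
    // After renaming "amount" to "balance", the naive result
    //   balance += balance;
    // silently captures the field reference. Inverting the lookup
    // function yields a qualified name that restores the old binding:
    void deposit(int balance) {
        this.balance += balance;
    }
}
```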



Annotation Refactoring: Inferring Upgrade Transformations for Legacy Applications

Room: 204 | Date: Oct 22, 2008 | Time: 14:30 - 15:00
Wesley Tansey
Virginia Tech
Eli Tilevich
Virginia Tech

Abstract

Since annotations were added to the Java language, many frameworks have moved to using annotated Plain Old Java Objects (POJOs) in their newest releases. Legacy applications are thus forced to undergo extensive restructuring in order to migrate from old framework versions to new versions based on annotations (Version Lock-in). Additionally, because annotations are embedded in the application code, changing between framework vendors may also entail large-scale manual changes (Vendor Lock-in). This paper presents a novel refactoring approach that effectively solves these two problems. Our approach infers a concise set of semantics-preserving transformation rules from two versions of a single class. Unlike prior approaches that detect only simple structural refactorings, our algorithm can infer general composite refactorings and is more than 97% accurate on average. We demonstrate the effectiveness of our approach by automatically upgrading more than 80K lines of the unit testing code of four open-source Java applications to use the latest version of the popular JUnit testing framework.
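
The JUnit migration that the evaluation automates follows the pattern below (hand-written here for illustration; the class under test is invented):

```java
// Before: JUnit 3.x, with conventions encoded in inheritance and naming.
//   import junit.framework.TestCase;
//   public class CounterTest extends TestCase {
//       private Counter counter;
//       protected void setUp() { counter = new Counter(); }
//       public void testIncrement() { counter.increment(); assertEquals(1, counter.value()); }
//   }

// After: JUnit 4.x, the same semantics expressed with an annotated POJO.
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class CounterTest {
    static class Counter {
        private int value;
        void increment() { value++; }
        int value() { return value; }
    }

    private Counter counter;

    @Before public void setUp() { counter = new Counter(); }

    @Test public void increment() {
        counter.increment();
        assertEquals(1, counter.value());
    }
}
```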



Program Analysis

Enabling Static Analysis for Partial Java Programs

Room: 204 | Date: Oct 22, 2008 | Time: 15:30 - 16:00
Barthelemy Dagenais
McGill University
Laurie Hendren
McGill University

Abstract

Software engineering tools often deal with the source code of programs retrieved from the web or from source code repositories. Typically, these tools only have access to a subset of a program's source code (one file or a subset of files), which makes it difficult to build a complete and typed intermediate representation (IR). Indeed, for incomplete object-oriented programs, it is not always possible to completely disambiguate the syntactic constructs and to recover the declared type of certain expressions, because the declarations of many types and class members are not accessible.

We present a framework that performs partial type inference and uses heuristics to recover the declared type of expressions and resolve ambiguities in partial Java programs. Our framework produces a complete and typed IR suitable for further static analysis. We have implemented this framework and used it in an empirical study on four large open source systems which shows that our system recovers most declared types with a low error rate, even when only one class is accessible.



Safer Unsafe Code for .NET

Room: 204 | Date: Oct 22, 2008 | Time: 16:00 - 16:30
Pietro Ferrara
École polytechnique
Francesco Logozzo
Microsoft Research
Manuel Fahndrich
Microsoft Research

Abstract

The .NET intermediate language (MSIL) allows expressing both statically verifiable, memory- and type-safe code (typically called managed), as well as unsafe code using direct pointer manipulations. Unsafe code can be expressed in C# by marking regions of code as unsafe. Writing unsafe code can be useful where the rules of managed code are too strict. The obvious drawback of unsafe code is that it opens the door to programming errors typical of C and C++, namely memory access errors such as buffer overruns. Worse, a single piece of unsafe code may corrupt memory and destabilize the entire runtime or allow attackers to compromise the security of the platform. We present a new static analysis based on abstract interpretation to check memory safety for unsafe code in the .NET framework. The core of the analysis is a new numerical abstract domain, Strp, which is used to efficiently compute memory invariants. Strp is combined with lightweight abstract domains to raise precision while achieving scalability. We implemented this analysis in Clousot, a generic static analyzer for .NET. In combination with contracts expressed in FoxTrot, an MSIL-based annotation language for .NET, our analysis provides static safety guarantees on memory accesses in unsafe code. We tested it on all the assemblies of the .NET framework. We compare our results with those obtained using existing domains, showing how they are either too imprecise (e.g., Intervals or Octagons) or too expensive (Polyhedra) to be used in practice.



Typestate-like Analysis of Multiple Interacting Objects

Room: 204 | Date: Oct 22, 2008 | Time: 16:30 - 17:00
Nomair A. Naeem
University of Waterloo
Ondrej Lhotak
University of Waterloo

Abstract

This paper presents a static analysis of typestate-like temporal specifications of groups of interacting objects, which are expressed using tracematches. Whereas typestate expresses a temporal specification of one object, a tracematch state may change due to operations on any of a set of related objects bound by the tracematch. The paper proposes a lattice-based operational semantics, equivalent to the original tracematch semantics but better suited to static analysis. The paper defines a static analysis that computes precise local points-to sets and tracks the flow of individual objects, thereby enabling strong updates of the tracematch state. The analysis has been proved sound with respect to the semantics. A context-sensitive version of the analysis has been implemented as instances of the IFDS and IDE algorithms. The analysis was evaluated on tracematches used in earlier work and found to be very precise. Remaining imprecisions could be eliminated with more precise modeling of references from the heap and of exceptional control flow.



Performance

Java Performance Evaluation through Rigorous Replay Compilation

Room: 204 | Date: Oct 23, 2008 | Time: 8:30 - 9:00
Andy Georges
Ghent University
Lieven Eeckhout
Ghent University
Dries Buytaert
Ghent University

Abstract

A managed runtime environment, such as the Java virtual machine, is non-trivial to benchmark. Java performance is affected in various complex ways by the application and its input, as well as by the virtual machine (JIT optimizer, garbage collector, thread scheduler, etc.). In addition, non-determinism due to timer-based sampling for JIT optimization, thread scheduling, and various system effects further complicate the Java performance benchmarking process. Replay compilation is a recently introduced Java performance analysis methodology that aims at controlling non-determinism to improve experimental repeatability. The key idea of replay compilation is to control the compilation load during experimentation by inducing a pre-recorded compilation plan at replay time. Replay compilation also enables teasing apart performance effects of the application versus the virtual machine. This paper argues that in contrast to current practice which uses a single compilation plan at replay time, multiple compilation plans add statistical rigor to the replay compilation methodology. By doing so, replay compilation better accounts for the variability observed in compilation load across compilation plans. In addition, we propose matched-pair comparison for statistical data analysis. Matched-pair comparison considers the performance measurements per compilation plan before and after an innovation of interest as a pair, which enables limiting the number of compilation plans needed for accurate performance analysis compared to statistical analysis assuming unpaired measurements.
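
A toy rendering of matched-pair analysis over per-plan measurements; the numbers are made up and the t-quantile is hard-coded for four plans:

```java
// For each compilation plan i, measure the baseline VM and the modified
// VM under the same plan, then analyze the per-plan differences rather
// than pooling unpaired measurements.
class MatchedPairs {
    public static void main(String[] args) {
        double[] base    = {10.2, 11.1, 10.8, 10.5};   // seconds, plans 1..4
        double[] changed = { 9.9, 10.6, 10.7, 10.1};

        int n = base.length;
        double mean = 0;
        for (int i = 0; i < n; i++) mean += (changed[i] - base[i]) / n;

        double var = 0;
        for (int i = 0; i < n; i++) {
            double d = (changed[i] - base[i]) - mean;
            var += d * d / (n - 1);
        }

        // 95% confidence interval for the mean difference; 3.182 is the
        // t-quantile for n-1 = 3 degrees of freedom.
        double half = 3.182 * Math.sqrt(var / n);
        System.out.printf("mean diff %.3f s, 95%% CI [%.3f, %.3f]%n",
                          mean, mean - half, mean + half);
    }
}
```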



Analysis and Reduction of Memory Inefficiencies in Java Strings

Room: 204 | Date: Oct 23, 2008 | Time: 9:00 - 9:30
Kiyokuni Kawachiya
IBM Tokyo Research Laboratory
Kazunori Ogata
IBM Tokyo Research Laboratory
Tamiya Onodera
IBM Tokyo Research Laboratory

Abstract

This paper describes a novel approach to reduce the memory consumption of Java programs, by focusing on their "string memory inefficiencies". In recent Java applications, string data occupies a large amount of the heap area. For example, about 40% of the live heap area is used for string data when a production J2EE application server is running.

By investigating the string data in the live heap, we identified two types of memory inefficiencies --- "duplication" and "unused literals". In the heap, there are many string objects that have the same values. There also exist many string literals whose values are not actually used by the application. Since these inefficiencies exist as live objects, they cannot be eliminated by existing garbage collection techniques, which only remove dead objects. Quantitative analysis of Java heaps in real applications revealed that more than 50% of the string data in the live heap is wasted by these inefficiencies.

To reduce the string memory inefficiencies, this paper proposes two techniques at the Java virtual machine level, "StringGC" for eliminating duplicated strings at the time of garbage collection, and "Lazy Body Creation" for delaying part of the literal instantiation until the literal's value is actually used. We also present an interesting technique at the Java program level, which we call "BundleConverter", for preventing unused message literals from being instantiated.

Prototype implementations on a production Java virtual machine have achieved about 18% reduction of the live heap in the production application server. The proposed techniques could also reduce the live heap of standard Java benchmarks by 11.6% on average, without noticeable performance degradation.
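
An application-level analogue of the deduplication idea, written by us for illustration (the paper's StringGC does this inside the collector instead):

```java
import java.util.HashMap;
import java.util.Map;

// Canonicalize equal strings so the heap keeps one body instead of many,
// in the spirit of String.intern() but application-managed.
class StringPool {
    private static final Map<String, String> pool = new HashMap<>();

    static String canonical(String s) {
        return pool.computeIfAbsent(s, k -> k);
    }

    public static void main(String[] args) {
        String a = canonical(new String("status=OK"));
        String b = canonical(new String("status=OK"));
        System.out.println(a == b);   // true: one shared instance
    }
}
```

A user-level pool like this itself keeps its strings alive, which is one reason to perform the deduplication at garbage collection time, where dead entries disappear for free.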



Analyzing the Performance of Code-copying Virtual Machines

Room: 204 | Date: Oct 23, 2008 | Time: 9:30 - 10:00
Gregory B. Prokopski
McGill University, Sable Research Group
Clark Verbrugge
McGill University, Sable Research Group

Abstract

Many popular programming languages use interpreter-based execution for portability, support for dynamic or reflective properties, and ease of implementation. Code-copying is an optimization technique for interpreters that reduces the performance gap between interpretation and JIT compilation, offering significant speedups over direct-threading interpretation. Due to varying language features and virtual machine designs, however, not all languages benefit from code-copying to the same extent. We consider here properties of interpreted languages, and in particular of bytecode and virtual machine construction, that enhance or reduce the impact of code-copying. We implemented code-copying and compared performance with the original direct-threading virtual machines for three languages, Java (SableVM), OCaml, and Ruby (Yarv), examining performance on three different architectures: ia32 (Pentium 4), x86-64 (AMD64), and PowerPC (G5). The best speedups are achieved on ia32 by OCaml (a maximum of 4.88 times, 2.81 times on average), where a small and simple bytecode design facilitates the improvements to branch prediction brought by code-copying. Yarv only slightly improves over direct-threading; the large working size of its bytecodes and the relatively small fraction of time spent in the actual interpreter loop both limit the applicability of code-copying and its overall net effect. We are able to show that simple ahead-of-time analysis of VM and execution properties can help determine the suitability of code-copying for a particular VM before an implementation of code-copying is even attempted.



Type Systems

Generics of a Higher Kind

Room: 204 | Date: Oct 23, 2008 | Time: 10:30 - 11:00
Adriaan Moors
Katholieke Universiteit Leuven
Frank Piessens
Katholieke Universiteit Leuven
Martin Odersky
École Polytechnique Fédérale de Lausanne

Abstract

With Java 5 and C# 2.0, first-order parametric polymorphism was introduced in mainstream object-oriented programming languages under the name of generics. Although the first-order variant of generics is very useful, it also imposes some restrictions: it is possible to abstract over a type, but the resulting type constructor cannot be abstracted over. This can lead to code duplication. We removed this restriction in Scala, by allowing type constructors as type parameters and abstract type members. This paper presents the design and implementation of the resulting type constructor polymorphism. Furthermore, we study how this feature interacts with existing object-oriented constructs, and show how it makes the language more expressive.
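
The duplication that first-order generics force can be shown directly in Java (our example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Java generics are first-order: we can abstract over the element type A,
// but not over the container constructor itself, so "map" must be written
// once per container. Type constructor polymorphism allows a single map
// over an abstract F[_] that covers both.
class FirstOrderOnly {
    static <A, B> List<B> mapList(List<A> xs, Function<A, B> f) {
        List<B> out = new ArrayList<>();
        for (A a : xs) out.add(f.apply(a));
        return out;
    }

    static <A, B> Optional<B> mapOptional(Optional<A> x, Function<A, B> f) {
        return x.map(f);   // the same shape, duplicated again
    }

    // static <F<_>, A, B> F<B> map(F<A> fa, Function<A, B> f)   // not legal Java
}
```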



The Visitor Pattern as a Reusable, Generic, Type-Safe Component

Room: 204 | Date: Oct 23, 2008 | Time: 11:00 - 11:30
Bruno Oliveira
University of Oxford
Meng Wang
University of Oxford
Jeremy Gibbons
University of Oxford

Abstract

The Visitor design pattern shows how to separate the structure of an object hierarchy from the behaviour of traversals over that hierarchy. The pattern is very flexible; this very flexibility makes it difficult to capture the pattern as anything more formal than prose, pictures and prototypes.

We show how to capture the essence of the Visitor pattern as a reusable software library, by using advanced type system features appearing in modern object-oriented languages such as Scala. We preserve type safety statically and modularly: no reflection or similar mechanisms are used, and modules can be independently compiled. The library is generic, in two senses: not only is it parametrised by both the return type and the shape of the object hierarchy, but it also allows a number of implementation choices (internal versus external control, imperative versus functional behaviour, orthogonal aspects such as tracing and memoisation) to be specified by parameters rather than fixed in early design decisions. Finally, we propose a generalised datatype-like notation on top of our visitor library: this provides a convenient functional decomposition style in object-oriented languages.
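
The return-type axis of that genericity is visible even in a plain Java rendering of the pattern; this sketch is ours and elides the library's other parameters (shape, control, behaviour):

```java
// The hierarchy is fixed here (literals and additions); the visitor is
// generic in its return type R, so evaluation and pretty-printing reuse
// the same traversal machinery.
interface Expr { <R> R accept(Visitor<R> v); }

interface Visitor<R> {
    R lit(int value);
    R add(Expr left, Expr right);
}

record Lit(int value) implements Expr {
    public <R> R accept(Visitor<R> v) { return v.lit(value); }
}

record Add(Expr left, Expr right) implements Expr {
    public <R> R accept(Visitor<R> v) { return v.add(left, right); }
}

class Eval implements Visitor<Integer> {
    public Integer lit(int value) { return value; }
    public Integer add(Expr l, Expr r) { return l.accept(this) + r.accept(this); }
}
// new Add(new Lit(1), new Lit(2)).accept(new Eval())  evaluates to 3.
```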



Constrained Types for Object-Oriented Languages

Room: 204 | Date: Oct 23, 2008 | Time: 11:30 - 12:00
Nathaniel Nystrom
IBM Research
Vijay Saraswat
IBM Research
Jens Palsberg
UCLA
Christian Grothoff
University of Denver

Abstract

X10 is a modern object-oriented language designed for productivity and performance in concurrent and distributed systems. In this setting, dependent types offer significant opportunities for detecting design errors statically, documenting design decisions, eliminating costly run-time checks (e.g., for array bounds and null values), and improving the quality of generated code.

We present the design and implementation of constrained types, a natural, simple, clean, and expressive extension to object-oriented programming: a type C{c} names a class or interface C and a constraint c on the immutable state of C and in-scope final variables. Constraints may also be associated with class definitions (representing class invariants) and with method and constructor definitions (representing preconditions). Dynamic casting is permitted. The system is parametric on the underlying constraint system: the compiler supports a simple equality-based constraint system but, in addition, supports extension with new constraint systems using compiler plugins.



Type Inference and Tools

Efficient Local Type Inference

Room: 204 | Date: Oct 23, 2008 | Time: 13:30 - 14:00
Ben Bellamy
Programming Tools Group, University of Oxford
Pavel Avgustinov
Programming Tools Group, University of Oxford
Oege de Moor
Programming Tools Group, University of Oxford
Damien Sereni
Programming Tools Group, University of Oxford

Abstract

Inference of static types for local variables in Java bytecode is the first step for any serious tool that manipulates bytecode, be it for decompilation, transformation, or analysis. It is important, therefore, to perform that step as accurately and efficiently as possible. Previous work has sought to give solutions with good worst-case complexity. We present a novel algorithm, which is optimised for the common case rather than for worst-case performance. It works by first finding a set of minimal typings that are valid for all assignments, and then checking whether these minimal typings satisfy all uses. Unlike previous algorithms, it does not explicitly build a data structure of type constraints, and it is easy to implement efficiently. We prove that the algorithm produces a typing that is both sound (obeying the rules of the language) and as tight as possible. We then go on to present extensive experiments, comparing the results of the new algorithm against the previously best known method. The experiments include bytecode that is generated in ways other than compilation of Java source. The new algorithm is always faster, typically by a factor of 6, but on some real benchmarks the gain is as high as a factor of 92. Furthermore, whereas the previous method is sometimes suboptimal, our algorithm always returns a tightest possible type. We also discuss in detail how we handle primitive types, which is a difficult issue due to the discrepancy in their treatment between Java bytecode and Java source. For the application to decompilation, however, it is very important to handle this correctly.
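
A source-level picture, invented by us, of the core typing problem at the bytecode level, where locals carry no declared types:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class LocalTyping {
    static void use(boolean flag) {
        // In bytecode the local carries no declared type; both ArrayList and
        // LinkedList flow into it. A sound, tight typing must satisfy all
        // assignments and all uses, yielding List here rather than Object.
        List<String> xs;
        if (flag) xs = new ArrayList<String>();
        else      xs = new LinkedList<String>();
        System.out.println(xs.size());   // the use constrains the typing
    }
}
```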



Efficient Software Model Checking of Soundness of Type Systems

Room: 204 | Date: Oct 23, 2008 | Time: 14:00 - 14:30
Michael Roberson
University of Michigan
Melanie Harries
University of Michigan
Paul T. Darga
University of Michigan
Chandrasekhar Boyapati
University of Michigan

Abstract

This paper presents novel techniques for checking the soundness of a type system automatically using a software model checker. Our idea is to systematically generate every type-correct intermediate program state (within some finite bounds), execute the program one step forward if possible using its small-step operational semantics, and then check that the resulting intermediate program state is also type-correct---but do so efficiently by detecting similarities in this search space and pruning away large portions of the search space. Thus, given only a specification of type correctness and the small-step operational semantics for a language, our system automatically checks type soundness by checking that the progress and preservation theorems hold for the language (albeit for program states of at most some finite size). Our preliminary experimental results on several languages---including a language of integer and boolean expressions, a simple imperative programming language, an object-oriented language which is a subset of Java, and a language with ownership types---indicate that our approach is feasible and that our search space pruning techniques do indeed significantly reduce what is otherwise an extremely large search space. Our paper thus makes contributions both in the area of checking soundness of type systems, and in the area of reducing the state space of a software model checker.



Java Type Inference Is Broken: Can We Fix It?

Room: 204 | Date: Oct 23, 2008 | Time: 14:30 - 15:00
Daniel Smith
Rice University
Robert Cartwright
Rice University

Abstract

Java 5, the most recent major update to the Java Programming Language, introduced a number of sophisticated features, including a major extension to the type system. While the technical details of these new features are complex, much of this complexity is hidden from the typical Java developer by an ambitious type inference mechanism. Unfortunately, the extensions to the Java 5 type system were so novel that their technical details had not yet been thoroughly investigated in the research literature. As a result, the Java 5 compiler includes a pragmatic but flawed type inference algorithm that is, by design, neither sound nor locally complete. The language specification points out that neither of these failures is catastrophic: the correctness of potentially unsound results must be verified during type checking; and incompleteness can usually be worked around by manually providing the method type parameter bindings for a given call site.

This paper dissects the type inference algorithm of Java 5 and proposes a significant revision that is sound and able to calculate correct results where the Java 5 algorithm fails. The new algorithm is locally complete with the exception of one difficult corner case. Moreover, the new algorithm demonstrates that several arbitrary restrictions in the Java type system (most notably the ban on lower-bounded type parameter declarations and the limited expressibility of intersection types) are unnecessary. We hope that this work will spur the evolution of a more coherent, more comprehensive generic type system for Java.
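
One well-known instance of the incompleteness, reconstructed by us (the commented call is rejected by the Java 5/6 compiler and needs an explicit type argument):

```java
import java.util.ArrayList;
import java.util.List;

class InferenceGap {
    static <T> List<T> nil() { return new ArrayList<T>(); }

    static <T> List<T> cons(T head, List<T> tail) {
        List<T> r = new ArrayList<T>(tail);
        r.add(0, head);
        return r;
    }

    public static void main(String[] args) {
        // Rejected by the Java 5/6 compiler: nil() is inferred in isolation
        // as List<Object>, so cons infers T = Object rather than T = String.
        //   List<String> xs = cons("a", nil());
        List<String> xs = cons("a", InferenceGap.<String>nil());   // manual workaround
        System.out.println(xs);
    }
}
```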



Aspects and Modularity

Delegation-based Semantics for Modularizing Crosscutting Concerns

Room: 204 | Date: Oct 23, 2008 | Time: 15:30 - 16:00
Hans Schippers
University of Antwerp
Dirk Janssens
University of Antwerp
Michael Haupt
Hasso-Plattner-Institut, University of Potsdam
Robert Hirschfeld
Hasso-Plattner-Institut, University of Potsdam

Abstract

We describe semantic mappings of four high-level programming languages to our delegation-based machine model for aspect-oriented programming. One of the languages is a class-based object-oriented one. The other three represent extensions thereof that support various approaches to modularizing crosscutting concerns. We explain informally that an operational semantics expressed in terms of the model's concepts preserves the behavior of a program written in one of the high-level languages. We hence argue that our model is semantically sound in that sense, as well as sufficiently expressive to correctly support features such as class-based object-oriented programming, the open-classes and pointcut-and-advice flavors of aspect-oriented programming, and dynamic layers. For the latter, a core feature of context-oriented programming, we also provide a formal semantics.



A Theory of Aspects as Latent Topics

Room: 204 | Date: Oct 23, 2008 | Time: 16:00 - 16:30
Pierre F. Baldi
University of California, Irvine
Cristina V. Lopes
University of California, Irvine
Erik J. Linstead
University of California, Irvine
Sushil K. Bajracharya
University of California, Irvine

Abstract

After more than 10 years, Aspect-Oriented Programming (AOP) is still a controversial idea. While the concept of aspects appeals to everyone's intuitions, concrete AOP solutions often fail to convince researchers and practitioners alike. This discrepancy results in part from a lack of an adequate theory of aspects, which in turn leads to the development of AOP solutions that are useful in limited situations.

We propose a new theory of aspects that can be summarized as follows: concerns are latent topics that can be automatically extracted using statistical topic modeling techniques adapted to software. Software scattering and tangling can be measured precisely by the entropies of the underlying topic-over-files and files-over-topics distributions. Aspects are latent topics with high scattering entropy. The theory is validated empirically both on the large scale, with a study of 4,632 Java projects, and on the small scale, with a study of 5 individual projects. From these analyses, we identify two dozen topics that emerge as general-purpose aspects across multiple projects, as well as project-specific topics/concerns. The approach is also shown to produce results that are compatible with previous methods for identifying aspects, and extends them.

Our work provides not only a concrete approach for identifying aspects at several scales in an unsupervised manner but, more importantly, a formulation of AOP grounded in information theory. The understanding of aspects under this new perspective makes additional progress toward the design of models and tools that facilitate software development.
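
Our reading of the scattering measure, as a small worked example: the entropy of a topic's distribution over files, in bits:

```java
import java.util.Map;

// A topic concentrated in one file has entropy 0; one spread uniformly
// over n files has entropy log2(n). High scattering entropy marks a
// candidate aspect.
class Scattering {
    static double entropy(Map<String, Double> topicOverFiles) {
        double h = 0;
        for (double p : topicOverFiles.values()) {
            if (p > 0) h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(entropy(Map.of("A.java", 1.0)));   // 0.0
        System.out.println(entropy(Map.of("A.java", 0.25, "B.java", 0.25,
                                          "C.java", 0.25, "D.java", 0.25)));   // 2.0
    }
}
```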



Multiple Dispatch in Practice

Room: 204 | Date: Oct 23, 2008 | Time: 16:30 - 17:00
Radu Muschevici
Victoria University of Wellington
Alex Potanin
Victoria University of Wellington
Ewan Tempero
University of Auckland
James Noble
Victoria University of Wellington

Abstract

Multiple dispatch uses the run-time types of more than one argument to a method call to determine which method body to run. While several languages over the last 20 years have provided multiple dispatch, most object-oriented languages still support only single dispatch, forcing programmers to implement multiple dispatch manually when required. This paper presents an empirical study of the use of multiple dispatch in practice, considering six languages that support multiple dispatch, and also investigating the potential for multiple dispatch in Java programs. We hope that this study will help programmers understand the uses and abuses of multiple dispatch, virtual machine implementors optimise multiple dispatch, and language designers evaluate the choice of providing multiple dispatch in new programming languages.
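
The manual workaround the paper alludes to looks like this in Java (our sketch, using two invented shape classes):

```java
// Single dispatch selects on the receiver only; the argument's run-time
// type must be recovered with instanceof tests. A multiple-dispatch
// language would select the right case from both argument types directly.
abstract class Shape {
    boolean intersects(Shape other) {
        if (this instanceof Circle c && other instanceof Circle d) return circleCircle(c, d);
        if (this instanceof Circle c && other instanceof Rect r)   return circleRect(c, r);
        if (this instanceof Rect r   && other instanceof Circle c) return circleRect(c, r);
        return rectRect((Rect) this, (Rect) other);
    }

    // Geometry stubbed out; only the dispatch structure matters here.
    static boolean circleCircle(Circle a, Circle b) { return true; }
    static boolean circleRect(Circle a, Rect b)     { return true; }
    static boolean rectRect(Rect a, Rect b)         { return true; }
}
class Circle extends Shape {}
class Rect   extends Shape {}
```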