Faster Feedback with AI? -- A Test Prioritization Study (Programming with AI 2024)

Who

Toni Mattis, Lukas Böhme, Eva Krebs, Martin C. Rinard, Robert Hirschfeld

Track

Programming with AI 2024

Time Zone

The program is currently displayed in (GMT+01:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 12 Mar 2024 13:15 - 13:45 at M:Teknodromen - Presentations and Panel

Abstract

Feedback during programming is desirable, but its usefulness depends on immediacy and relevance to the task. Unit and regression testing are practices to ensure programmers can obtain feedback on their changes; however, running a large test suite is rarely fast, and only a few results are relevant.

Identifying which tests in a test suite are most relevant to a change helps detect defects earlier during programming and helps select tests that serve as examples for understanding particular code.

In this work, we describe an approach to evaluate how well large language models (LLMs) and embedding models can judge the relevance of a test to a change. We construct a dataset by applying faulty variations of real-world code changes and measuring whether the model could nominate the failing tests beforehand.
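The measurement described above can be made concrete with a small sketch. It assumes a model has already produced a ranked list of test names for one faulty change and reports how early the actually failing tests appear. The function names, the test names, and the choice of metrics (rank of the first failing test, and APFD, a common test prioritization metric) are illustrative assumptions and not necessarily what the study uses.

```python
from typing import Sequence, Set


def first_failure_rank(ranking: Sequence[str], failing: Set[str]) -> int:
    """Return the 1-based position of the first failing test in the ranking."""
    for position, test in enumerate(ranking, start=1):
        if test in failing:
            return position
    raise ValueError("no failing test appears in the ranking")


def apfd(ranking: Sequence[str], failing: Set[str]) -> float:
    """Average Percentage of Faults Detected for one ranking.

    Assumption: each failing test counts as one fault that only it reveals.
    Failing tests that are missing from the ranking are ignored.
    """
    n = len(ranking)
    positions = [i for i, test in enumerate(ranking, start=1) if test in failing]
    m = len(positions)
    if n == 0 or m == 0:
        raise ValueError("need a non-empty ranking containing a failing test")
    return 1 - sum(positions) / (n * m) + 1 / (2 * n)


# Hypothetical data: the order a model nominated, and what really failed.
ranking = ["test_parse", "test_render", "test_io", "test_cache"]
failing = {"test_render", "test_cache"}

rank = first_failure_rank(ranking, failing)
score = apfd(ranking, failing)
```

A lower first-failure rank and a higher APFD both mean the programmer would see a relevant failure sooner if tests ran in the nominated order.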

We found that, while embedding models perform best on such a task, even simple information retrieval models are surprisingly competitive. In contrast, pre-trained LLMs are of limited use as they focus on confounding aspects like coding styles.
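To illustrate what a simple, non-AI retrieval baseline of this kind might look like, the sketch below ranks tests by TF-IDF cosine similarity between the changed code and each test's source text. The abstract does not specify the study's models or preprocessing, so the tokenizer, the weighting scheme, and all names here (`tokenize`, `rank_tests`, the sample tests) are assumptions made for illustration only.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(code: str) -> List[str]:
    """Split source text into lowercase identifier parts (snake_case, camelCase)."""
    tokens: List[str] = []
    for word in re.findall(r"[A-Za-z]+", code):
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
        tokens.extend(part.lower() for part in parts)
    return tokens


def rank_tests(change: str, tests: Dict[str, str]) -> List[str]:
    """Order test names from most to least similar to the changed code."""
    docs = {name: Counter(tokenize(body)) for name, body in tests.items()}
    n = len(docs)
    # Document frequency: in how many tests each token occurs.
    df = Counter(tok for counts in docs.values() for tok in counts)
    idf = {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}

    def vector(counts: Counter) -> Dict[str, float]:
        # Tokens never seen in any test carry no weight.
        return {tok: tf * idf[tok] for tok, tf in counts.items() if tok in idf}

    def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
        dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = vector(Counter(tokenize(change)))
    scores = {name: cosine(query, vector(counts)) for name, counts in docs.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical change and test suite.
change = "def parse_date(text): return datetime.strptime(text, DATE_FORMAT)"
tests = {
    "test_render_page": "def test_render_page(): assert render(page) == html",
    "test_parse_date": "def test_parse_date(): assert parse_date('2024-03-12').year == 2024",
}
order = rank_tests(change, tests)
```

Running the top of `order` first is the prioritization step. A baseline like this needs no model inference, which is the kind of cost trade-off the abstract raises.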

We argue that the high computational cost of AI models is not always justified, and tool developers should also consider non-AI models for code-related retrieval and recommendation tasks. Lastly, we generalize from unit tests to live examples and outline how our approach can benefit live programming environments.

Toni Mattis

University of Potsdam; Hasso Plattner Institute

Germany

Lukas Böhme

Hasso Plattner Institute, University of Potsdam, Potsdam, Germany

Germany

Eva Krebs

Hasso Plattner Institute (HPI), University of Potsdam, Germany

Germany

Martin C. Rinard

Massachusetts Institute of Technology

United States

Robert Hirschfeld

University of Potsdam; Hasso Plattner Institute

Germany

Session Program

Tue 12 Mar
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

13:15 - 15:00	Presentations and Panel (Programming with AI) at M:Teknodromen

13:15	30m	Paper	Faster Feedback with AI? -- A Test Prioritization Study. Toni Mattis (University of Potsdam; Hasso Plattner Institute); Lukas Böhme (Hasso Plattner Institute, University of Potsdam, Potsdam, Germany); Eva Krebs (Hasso Plattner Institute (HPI), University of Potsdam, Germany); Martin C. Rinard (Massachusetts Institute of Technology); Robert Hirschfeld (University of Potsdam; Hasso Plattner Institute)
13:45	30m	Talk	Extrapolating a programmer career - from Vim to LLM and beyond. Andreas Bexell (Ericsson)
14:15	45m	Panel	Industry Panel. Markus Borg (CodeScene); Gustaf Lundh (Axis Communications); Mikael Lindberg (Saab Kockums)