Combining Machine Learning and Combinatorial Search in Program Repair

We consider the problem of automatically generating repair suggestions for a defective database program that behaves incorrectly due to an error in the WHERE condition of a SELECT statement.
A common setting in database programs is that the output is incorrect only for part of the data, e.g., for certain key values. In this paper, we use techniques from machine learning to take advantage of the information revealed by the defect-free data. Our basic approach is to learn a decision tree from correct behavior—including correct behavior on the defect-inducing data—of the SELECT statement. This decision tree can give valuable hints, if not directly the correct WHERE condition.
Our novelty is in the crucial step of determining the correct behavior of the defect-inducing data. We do this using a combination of SAT-based search and prediction generated by support vector machines (SVMs). Our insight is that SVMs can learn from the behavior of the defect-free data to predict the behavior of defect-inducing data with high accuracy, with SAT-based search bridging over any deficit in the accuracy efficiently.
We implemented this approach and present experimental results using a suite of programs and data sets obtained from real-world applications.

By: Divya Gopinath, Sarfraz Khurshid, Diptikalyan Saha, Satish Chandra

Published in: RI13002 in 2013


This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.


Questions about this service can be mailed to .