S3 - Searching, Selecting, and Synthesizing Source Code
Software developers rely on reusing source code snippets from existing libraries or
applications to develop software features on time and within budget. The reality is
that most previously implemented features are embedded in billions of lines of scattered
source code. State-of-the-art code search engines provide no guarantee that retrieved
code snippets implement these features. Even if relevant code fragments are located,
developers face the rather complex task of selecting and moving these fragments into
their applications. This research program proposes an integrated model for addressing
the fundamental problems of searching, selecting, and synthesizing (S3) source code. The
S3 model integrates program analysis and information retrieval to automatically
search, select, and synthesize relevant source code fragments. The S3 model will
directly support new methodologies for software
change and automated tools that assist programmers with various development, reuse, and
maintenance activities. Thus far, we have built three source code search engines.
Exemplar is a search engine that combines information retrieval and program analysis
techniques to reliably link high-level concepts to the source code of software
applications via the standard API calls that these applications use. CLAN is an engine
for computing similarities among software applications in large software repositories.
Finally, Portfolio is a source code search engine that retrieves and visualizes
relevant functions and their uses across 18,203 C/C++ software projects comprising over
260 million lines of code in FreeBSD Ports. This project is sponsored by the NSF.
Collaborative Research: Creating and Evolving Software via Searching, Selecting and Synthesizing Relevant Source Code
SE2 - Software evolution, based on semantic and evolutionary information
Software maintenance and evolution is a vital and resource-consuming phase of the
software lifecycle. Introducing software changes is particularly complex
in the case of long-lived, large-scale, and globally distributed systems. Years of research
efforts have recognized three core tasks to support developers during software
maintenance: feature location (a starting point of a change in source code), impact
analysis (other software entities that are also change prone), and expert developer
recommendations (appropriate developers to implement changes). The project will
develop a novel one-stop solution for these tasks by integrating and mining the latent
information, largely untapped by current solutions, that is scattered across the
structured and unstructured software artifacts produced and constantly changed during
the evolution of software systems. This project has three main goals: 1) define a new
integrated framework, SE2, for a comprehensive analysis of software evolution based on
conceptual and evolutionary information under a single umbrella; 2) define new
methodologies for software maintenance tasks based on SE2; and 3) perform empirical
studies to evaluate SE2 and the supported methodologies. Central to our solution are
state-of-the-art data mining, information retrieval, and program analysis methods. This project is sponsored
by the NSF.
Collaborative Research: An Inductive Framework to Support Software Maintenance
TraceLab - Traceability Instrument to Facilitate and Empower Traceability Research and Technology Transfer
The work will support a critical research agenda of the software engineering community
and facilitate technology transfer of traceability solutions to business and industry. The
traceability instrument, namely TraceLab, will contain a library of reusable trace
algorithms and utilities; a benchmarked repository of trace-related datasets, tasks,
metrics, and experimental results; a plug-and-play environment for conducting
trace-related experiments; and predefined experimental templates representing common
types of empirical traceability experiments. The traceability instrument will also facilitate the
application of traceability solutions across a broad range of software engineering
activities including requirements analysis, architectural design, maintenance, reverse
engineering, and independent verification and validation. This is a collaborative effort
led by Jane Cleland-Huang (PI), DePaul University; Jonathan Maletic (co-PI), Kent
State University; and Denys Poshyvanyk (co-PI), William and Mary. This project is
sponsored by the NSF.
MRI-R2: Development of a Software Traceability Instrument to Facilitate and Empower Traceability Research and Technology Transfer
We gratefully acknowledge financial support from the NSF on this research project.