There are two basic approaches to searching source code. In one, your query is matched directly to the text contained in source code elements. The results you see are typically fragments of code with variables, methods, and comments containing words from your entry. This might help you find a previously known method or a useful API call, but it generally gives you few guarantees and no context in which to use the fragments found.

Other engines can give you very intricate results, provided you supply equally intricate queries. If you happen to know that you want a method with a certain return type and a specific set of inputs that throws a particular type of exception, this type of tool may help you fit something into your programming context.

Exemplar is a new approach combining the benefits of both. Given a few keywords, Exemplar uses the intrinsic qualities of code (rather than text-only artifacts) in a source repository to return complete projects, providing a context that implements the task you need.

1. Operation

Let's take a look at what happens when you query Exemplar. Say, for example, you want to find code that records musical instrument data to a MIDI file. The first page should look familiar:

Exemplar is populated with projects from the open-source repository SourceForge. Clicking "Exemplar Search" gives you a page something like this:

Notice that in the second column, two scores are listed. The first score is a text-based comparison of the user query to the project description shown in the third column. The second score is the heart of Exemplar's operation; it measures the relevance of the query to the descriptions of the API calls made in the project. When a project is added to Exemplar's internal repository, all API calls are extracted from its source code, along with their file and line locations. These calls all have help documentation (e.g., JavaDocs). An Exemplar search compares the text of the query against these help pages, and then maps the matching help pages to the projects containing the API calls they describe. Details on this process are located in Section 2.
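The two-score idea above can be illustrated with a small sketch. This is not Exemplar's implementation: the word-overlap measure, the project and API names, the help-page texts, and the way the two scores are summed for ordering are all assumptions made up for this example. It only shows the flow from query to help pages, and from help pages to the projects that make the documented calls.

```python
import re

def tokens(text):
    """Lowercase word set; a stand-in for real text-retrieval preprocessing."""
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap(query, text):
    """Fraction of query words that occur in text (0.0 to 1.0)."""
    q = tokens(query)
    return len(q & tokens(text)) / len(q) if q else 0.0

# Hypothetical help pages: API call name -> documentation text.
API_DOCS = {
    "Midi.writeFile": "Writes a sequence to a MIDI file",
    "Midi.startRecording": "Starts recording MIDI data",
    "List.add": "Appends an element to the end of a list",
}

# Hypothetical repository: each project has a description and the API calls
# extracted from its source, with file and line locations.
PROJECTS = {
    "midi-recorder": {
        "description": "Record instrument data to a MIDI file",
        "calls": [("Midi.writeFile", "Recorder.java", 42),
                  ("Midi.startRecording", "Recorder.java", 17)],
    },
    "todo-list": {
        "description": "A simple task list manager",
        "calls": [("List.add", "Tasks.java", 8)],
    },
    "synth-toolkit": {
        "description": "Audio synthesis toolkit",
        "calls": [("Midi.writeFile", "Export.java", 101)],
    },
}

def search(query):
    """Return (project, description_score, api_score, relevant_calls) tuples."""
    # Step 1: compare the query to every help page.
    doc_score = {api: overlap(query, text) for api, text in API_DOCS.items()}
    results = []
    for name, project in PROJECTS.items():
        # Score 1: query versus project description.
        description_score = overlap(query, project["description"])
        # Score 2: follow relevant help pages to the calls this project makes.
        hits = [call for call in project["calls"] if doc_score.get(call[0], 0.0) > 0]
        api_score = max((doc_score[call[0]] for call in hits), default=0.0)
        results.append((name, description_score, api_score, hits))
    # Assumption: order by the plain sum of both scores.
    results.sort(key=lambda r: r[1] + r[2], reverse=True)
    return results

for name, description_score, api_score, hits in search("record midi file"):
    print(f"{name}: description={description_score:.2f} api={api_score:.2f} calls={hits}")
```

In this toy data, "synth-toolkit" has a description score of zero but still surfaces because it calls an API whose help page matches the query, and the file and line of each relevant call are reported alongside the project.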

The first two projects have a high description score as well as relevant API calls. The third has no relevant calls. The fourth retrieved project has a description score of zero but a high reported API call relevance, and thus is of special interest, as it probably would not be located by search engines that look only at project artifacts. Clicking the "Tritonus" button lets the user explore exactly which API calls the project makes:

At this point, an inquisitive programmer has several projects at hand, each relevant to the query for a different reason. The listing of API calls gives him or her the location of specific useful parts of code. Just as importantly, because these morsels occur within an entire project, the developer has an executable context in which to understand them.

2. Components

Stay Tuned!

3. Case Study

We found a statistically significant improvement over SourceForge's search mechanism during our case study with student and professional programmers! Here you can download our raw materials (CaseStudyMaterials.zip) and copies of all questionnaires (CaseStudyResults.zip).

4. Publications

Mark Grechanik, Chen Fu, Qing Xie, Collin McMillan, Denys Poshyvanyk, and Chad Cumby, A Search Engine For Finding Highly Relevant Applications, in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE'10), Cape Town, South Africa, May 2-8, 2010, to appear, 10 pages (14% acceptance ratio).

5. The Exemplar Team

Current Members

  • Mark Grechanik

    E-mail: drmark at uic dot edu
    Affiliation: Accenture Technology Labs and University of Illinois at Chicago

  • Chen Fu

    E-mail: chen dot fu at accenture dot com
    Affiliation: Accenture Technology Labs

  • Qing Xie

    E-mail: qing dot xie at accenture dot com
    Affiliation: Accenture Technology Labs

  • Collin McMillan

    E-mail: cmc at cs dot wm dot edu
    Affiliation: College of William & Mary

  • Denys Poshyvanyk

    E-mail: denys at cs dot wm dot edu
    Affiliation: College of William & Mary

  • Chad Cumby

    E-mail: chad dot m dot cumby at accenture dot com
    Affiliation: Accenture Technology Labs

Former Members

6. Support

We gratefully acknowledge financial support from the NSF on this research project.