TopicXP: Exploring Topics in Source Code using Latent Dirichlet Allocation

TopicXP is an Eclipse plugin that uses Latent Dirichlet Allocation (LDA) to extract topics present in the natural language used in identifiers and comments in source code. It then presents these topics to the user through a set of visualizations which first introduce the topics and their relationships and then let the user examine how these topics are found in actual source code.


Walkthrough

Analyzing a Project

  • In Eclipse's Project Explorer, right click on a project that you would like to analyze and select Explore with LDA.

  • An options dialog will be displayed. This dialog lets you tune the options that LDA uses to extract topics from the project. Notably, changing the number of topics can greatly affect the resulting visualization. Likewise, changing the threshold (cutoff) at which classes are assigned to topics changes how many classes you see in the Topic Contents View. Lowering the number of iterations makes the analysis run faster, but may result in a less useful model. Custom stopwords let you ignore words that are common in a particular project but carry no useful meaning.

  • After you click OK in the options dialog, the model will be generated. This may take some time, particularly if you are analyzing a large project. Once the analysis is done, the help dialog will be displayed to get you started with understanding the visualizations. The Topic Dependency View will also be displayed.
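
The effect of the custom stopwords and the threshold option can be illustrated with a small sketch. This is not TopicXP's code: the function names, the default stopword set, the identifier-splitting rule, and the "probability at or above the threshold" comparison are all assumptions made for illustration.

```python
import re

# Hypothetical built-in stopwords; TopicXP's real list is not documented here.
DEFAULT_STOPWORDS = {"the", "a", "of", "to", "get", "set"}


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    words = []
    for part in re.split(r"[^A-Za-z]+", identifier):
        # Either an acronym run (e.g. "XML") or a capitalized/lowercase word.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [w.lower() for w in words]


def tokenize(identifiers, custom_stopwords=()):
    """Turn identifiers into words, dropping default and custom stopwords."""
    stop = DEFAULT_STOPWORDS | set(custom_stopwords)
    return [w for ident in identifiers
            for w in split_identifier(ident)
            if w not in stop]


def assign_classes(class_topic_probs, threshold):
    """Map each topic index to the classes whose probability is >= threshold.

    class_topic_probs: dict of class name -> list of per-topic probabilities.
    """
    assignment = {}
    for cls, probs in class_topic_probs.items():
        for topic, p in enumerate(probs):
            if p >= threshold:
                assignment.setdefault(topic, []).append(cls)
    return assignment
```

With made-up probabilities such as `{"Buffer": [0.7, 0.2, 0.1], "TextArea": [0.3, 0.3, 0.4]}`, a threshold of 0.5 keeps only `Buffer` (in topic 0), while a threshold of 0.25 also places `TextArea` in all three topics. This is why lowering the threshold makes the Topic Contents View more crowded.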

Topic Dependency View

  • In this view, each box represents a topic. The words most relevant to the topic are displayed on the first line, followed by the names of the most relevant classes and packages. Arrows represent dependency links between topics (calls in one topic to methods in another topic). The color and size of each arrow correspond to the number of method calls. Clicking on a topic box will take you to the Topic Contents View for that topic.

  • The toolbar provides a few operations for manipulating the Topic Dependency and Topic Contents views. A user query can be used to filter the classes displayed, and an active query can be removed again.
  • A user can also filter the arrows representing dependency links to see only those that represent stronger relationships, take a snapshot of the current view, and change the LDA options. Changing the options regenerates the model automatically if necessary.
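
To make the arrow semantics concrete, the sketch below counts method calls that cross topic boundaries and then drops weak links, in the spirit of the dependency filter. It is a simplified illustration, not the plugin's implementation: the function name, the `min_calls` parameter, and the assumption that each class belongs to exactly one topic are hypothetical.

```python
from collections import Counter


def topic_dependencies(calls, class_topic, min_calls=1):
    """Count method calls between topics.

    calls: iterable of (caller_class, callee_class) pairs, one per call.
    class_topic: dict of class name -> topic index (assumed: one topic
        per class, which is a simplification).
    min_calls: hypothetical filter; links with fewer calls are dropped.
    """
    counts = Counter()
    for caller, callee in calls:
        src = class_topic.get(caller)
        dst = class_topic.get(callee)
        # Skip classes outside the model and calls within a single topic.
        if src is None or dst is None or src == dst:
            continue
        counts[(src, dst)] += 1
    return {edge: n for edge, n in counts.items() if n >= min_calls}
```

Each key in the result is a directed (source topic, target topic) pair, and its value is the number of calls that would determine the arrow's color and size. Raising `min_calls` corresponds to showing only the stronger relationships.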

Topic Contents View

  • Each colored box represents a class. The size of the box corresponds to the probability with which the class belongs to the topic. Each grey box corresponds to a package, so you can view both the package hierarchy and the classes within that hierarchy.
  • Right clicking takes you back to the Topic Dependency View. Double clicking on a class opens the class's source code. Toolbar icons function as in the Topic Dependency View.

  • The color of each class box corresponds to the class's MWE Cohesion. Red maps to a value of about 0, green to around 0.5, and blue/purple to around 1.

  • Hovering over a class shows you a tooltip with more information on the class and its place in the model.
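
One way to picture the color scale described above is as a sweep through hue. The sketch below assumes a linear mapping from cohesion to hue that passes through red at 0, green at 0.5, and blue at 1. The actual color ramp used by TopicXP is not specified here, so the function name and the exact endpoints are assumptions.

```python
import colorsys


def cohesion_to_rgb(cohesion):
    """Map a cohesion value in [0, 1] to an (R, G, B) tuple of 0-255 ints.

    Assumed ramp: hue runs linearly from red (0) through green (0.5)
    to blue (1). Values outside [0, 1] are clamped.
    """
    value = min(max(cohesion, 0.0), 1.0)
    hue = value * (2.0 / 3.0)  # 0 = red, 1/3 = green, 2/3 = blue
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return tuple(round(c * 255) for c in (r, g, b))
```

Under this assumed ramp, a class with cohesion near 0 is drawn red, one near 0.5 green, and one near 1 blue.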

Installation

To install the plugin, download the JAR file and drop it in the "plugins" folder inside your Eclipse home directory. When you restart Eclipse, the Explore with LDA contextual menu item should be available when you right click on a project in Eclipse's Project Explorer. You can also open the plug-in's view to see the help file by opening the Window menu and selecting Show View -> Other -> TopicXP -> LDA Exploration View.

To load the Eclipse project containing the TopicXP source code, you will also need to download the JAR file. Make sure that you are running a version of Eclipse that includes the Plug-in Development Environment, then open the import dialog in Eclipse (File -> Import) and select Plug-in Development -> Plug-ins and Fragments. Click Next, then select Directory under Import From and browse for the folder containing the JAR file. Click Next again, and TopicXP should be listed in the left box. Select the plug-in, click the Add button, and finally click Finish. The plug-in's source should be loaded into a new Eclipse project.

TopicXP was developed on Eclipse 3.5, but may also work in earlier versions. It should work on Linux, Windows, and Mac OS X.


User Study

This section is a companion to our ICSM 2010 tool demo:

Trevor Savage, Bogdan Dit, Malcom Gethers, and Denys Poshyvanyk, TopicXP: Exploring Topics in Source Code using Latent Dirichlet Allocation, in Proceedings of the 26th IEEE International Conference on Software Maintenance (ICSM'10), Formal Research Tool Demonstration, Timişoara, Romania, September 12-18, 2010, to appear, 6 pages.

The handout given to the students before the user study contains the following information:

  • instructions on how to install and run the suite of tools necessary for the user study (e.g., Eclipse, TopicXP, DevMon)
  • details about how to use TopicXP
  • the description of the tasks for jEdit and muCommander

Acknowledgements

Thanks are due in particular to JGibbLDA, which provides LDA functionality, and to X-Ray, which is used to acquire dependency links and which also provides the basis for the topic dependency view.


People

  • Trevor Savage (main developer)

    E-mail: tcsava at email dot wm dot edu

  • Bogdan Dit

    E-mail: bdit at email dot wm dot edu

  • Malcom Gethers

    E-mail: mgethers at cs dot wm dot edu

  • Denys Poshyvanyk

    E-mail: denys at cs dot wm dot edu


We gratefully acknowledge financial support from the NSF on this research project.