Denys Poshyvanyk's Webpage

Denys Poshyvanyk, Ph.D.

Software

This page is a portal to notable software projects developed by the SEMERU team. We have grouped the projects into relevant categories and give a brief overview of each one below. Some project descriptions link to repositories for open-source tools or to running web versions of the tools. For the most current list of projects, please visit the SEMERU website.

Deep Learning Projects

Neural Code Translator. Neural Code Translator provides instructions, datasets, and a deep learning infrastructure (based on seq2seq) for learning code transformations. A Recurrent Neural Network (RNN) Encoder-Decoder model is trained on a set of known transformations to translate the code before a transformation into the code after it.

AutoenCODE. AutoenCODE is a Deep Learning infrastructure that encodes source code fragments into vector representations, which can be used to learn similarities. The repository contains code, data, and instructions on how to learn sentence-level embeddings for a given textual corpus (source code or any other text). The learned embeddings (i.e., continuous-valued vectors) can then be used to identify similarities among the sentences in the corpus. AutoenCODE uses a Neural Network Language Model (word2vec), which pre-trains word embeddings on the corpus, and a Recursive Neural Network (Recursive Autoencoder) that recursively combines embeddings to learn sentence-level embeddings.
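As a rough illustration of how learned embeddings can be compared, the sketch below ranks code fragments by cosine similarity. It is not taken from the AutoenCODE repository: the fragment names, the three-dimensional vectors, and the helper functions are hypothetical, and cosine similarity is assumed here as one common choice of similarity measure.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings):
    """Return (name, score) of the embedding closest to `query`, excluding `query` itself."""
    scores = {
        name: cosine_similarity(embeddings[query], vector)
        for name, vector in embeddings.items()
        if name != query
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical sentence-level embeddings; real ones would be learned from a corpus.
embeddings = {
    "frag_a": [1.0, 0.0, 1.0],
    "frag_b": [0.9, 0.1, 1.1],
    "frag_c": [-1.0, 1.0, 0.0],
}

nearest_name, nearest_score = most_similar("frag_a", embeddings)
```

In this toy data, `frag_b` points in nearly the same direction as `frag_a`, so it is returned as the nearest fragment.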

Android Projects

CrashScope aims to overcome current shortcomings in automated testing tools for mobile apps by using static analysis to identify GUI-specific locations where contextual features exist, and multiple input generation strategies to effectively test these locations and uncover crashes. When the tool crashes a target application, it generates an expressive report with the steps for reproduction and a replayable test script.

FUSION: Despite striking advancements in program analysis techniques, reporters typically enter textual information to construct a bug report. However, this type of report has been shown to be woefully inadequate for developers looking to reproduce and fix reported bugs. The goal of the project is to leverage static and dynamic program analyses to improve the bug reporting process and produce higher quality reports with more detailed information, while requiring less effort from reporters.

GEMMA (Gui Energy Multi-objective optiMization for Android apps) generates color palettes for mobile apps using a multi-objective optimization technique, which produces color solutions that optimize energy consumption and contrast while using colors consistent with the original color palette. An empirical evaluation that we performed on 25 Android apps not only demonstrates significant improvements in terms of the three objectives, but also confirms that in most cases users still perceived the color choices as attractive. A live web instance can be found here.
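The idea of trading off several objectives at once can be sketched with Pareto dominance, the notion that usually underlies multi-objective optimization. This is only an illustration under stated assumptions: the description above does not say which algorithm GEMMA uses, and the palette names and scores below are invented. All three objectives are expressed so that lower is better (contrast is negated).

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly better on at least one.

    All objectives are minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
    }


# Hypothetical palettes scored as (energy, -contrast, distance from original colors).
palettes = {
    "original": (10.0, -4.0, 0.0),
    "dark": (6.0, -5.0, 3.0),
    "dim": (8.0, -4.5, 1.0),
    "worse": (11.0, -3.0, 2.0),
}

front = pareto_front(palettes)
```

Here `worse` is dropped because `original` beats it on all three objectives, while the other three palettes each represent a different trade-off and remain on the front.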

The major goal of the ODBR project is to provide developers with a practical automated bug-reporting tool that can accurately record and replay a set of actions that reproduce a bug in a mobile application directly on a physical or virtual Android device. Our tool achieves this by leveraging low-level Linux kernel user input event streams coupled with GUI information extracted from the Android uiautomator framework.

CLANdroid is an approach for automatically detecting Closely reLated applications in ANdroid by relying on advanced Information Retrieval techniques and five semantic anchors: identifiers, Android APIs, intents, permissions, and sensors. To evaluate CLANdroid, we created a benchmark consisting of 14,450 apps along with information on similar apps provided by Google Play. We also compared the effectiveness of different semantic anchors for detecting similar apps as perceived by 27 users. The results show that using Android-specific semantic anchors is useful for detecting similar Android apps across different categories.

Automated Software Documentation Projects

UnitTestScribe automatically generates natural language (NL) documentation of unit test cases. The approach aims to ease the burden of maintaining unit test cases and, ideally, to help developers rapidly identify outdated unit test cases to avoid regressions in their systems. UnitTestScribe is a novel combination of static analysis, natural language processing, backward slicing, and code summarization techniques that generates descriptions at the unit test method level. UnitTestScribe generates the descriptions by detecting focal methods, assertions, and data dependencies in unit test methods. Source Code

DBScribe automatically generates natural language descriptions of database-related code at the method level. DBScribe statically analyzes the source code and the database schema to detect database usages, and propagates the usages and schema constraints through the call graph to document local and delegated execution of SQL queries/statements. Source Code
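The distinction between local and delegated execution can be sketched as a reachability computation over a call graph. This is a simplified illustration, not DBScribe's implementation: the method names, the SQL strings, and the `delegated_sql` helper are hypothetical, and schema constraints are left out.

```python
def delegated_sql(call_graph, local_sql):
    """Map each method to the SQL statements executed by methods it calls, directly or transitively.

    `call_graph` maps a method to the methods it calls; `local_sql` maps a method
    to the statements it executes itself.
    """
    def reachable(method):
        # Iterative depth-first search; the `seen` set also guards against call cycles.
        seen = set()
        stack = list(call_graph.get(method, []))
        while stack:
            callee = stack.pop()
            if callee in seen:
                continue
            seen.add(callee)
            stack.extend(call_graph.get(callee, []))
        return seen

    methods = set(call_graph) | set(local_sql)
    for callees in call_graph.values():
        methods.update(callees)

    result = {}
    for method in methods:
        statements = set()
        for callee in reachable(method):
            if callee != method:  # a method's own statements count as local, not delegated
                statements.update(local_sql.get(callee, []))
        result[method] = statements
    return result


# Hypothetical call graph and locally executed statements.
call_graph = {
    "Controller.save": ["Service.store"],
    "Service.store": ["Dao.insert", "Dao.audit"],
    "Dao.insert": [],
    "Dao.audit": [],
}
local_sql = {
    "Dao.insert": ["INSERT INTO users"],
    "Dao.audit": ["INSERT INTO audit_log"],
}

delegated = delegated_sql(call_graph, local_sql)
```

With this input, `Controller.save` executes no SQL itself but is documented as delegating both statements through `Service.store`.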

ChangeScribe is a tool for automatically generating commit messages using natural language summaries for Java applications hosted on Git SCM. The generated commit message describes the rationale of changes by using commit stereotypes, impact sets, and specific templates. ChangeScribe also lets users control the length of the message by using an elegant impact set-based heuristic. Install via Eclipse

Software Traceability Projects

TraceLab seeks to develop an experimental workbench for designing, constructing, and executing traceability experiments, and for facilitating the rigorous evaluation of different traceability techniques. TraceLab is similar in some respects to existing tools such as Weka, MATLAB, or RapidMiner, except that it is highly customized to support rigorous SE experiments. The project is led by Jane Cleland-Huang at the University of Notre Dame with several collaborators, including W&M.

Traceclipse is an Eclipse plugin which enables software developers to specify, view, and manipulate traceability links within Eclipse, and it provides an API through which recovery techniques may be added, specified and run within an integrated development environment.

Information Retrieval Projects

We have released Portfolio, a new code search engine for finding functions relevant to user queries in more than 18,000 C/C++ FreeBSD ports projects. We use a novel combination of indexing, spreading activation, and a variation of PageRank, as well as a cool visualization of search results. For more information, please take your browser to http://www.searchportfolio.net.
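To give a feel for the ranking ingredient, the sketch below runs textbook PageRank by power iteration over a small function call graph. It is not Portfolio's algorithm, which is described above only as a variation of PageRank. The graph, the function names, and the parameter values are assumptions chosen for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `graph` maps each node to the list of nodes it links to (here: functions it calls).
    Nodes with no outgoing edges spread their rank evenly over all nodes.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical call graph: an edge f -> g means function f calls function g.
calls = {
    "main": ["parse", "log"],
    "parse": ["log"],
    "run": ["log"],
    "log": [],
}

ranks = pagerank(calls)
top_function = max(ranks, key=ranks.get)
```

In this toy graph `log` is called by every other function, so it receives the highest rank. A search engine could combine such a score with textual relevance when ordering results.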

ImpactMiner is a tool that implements an integrated approach to software change impact analysis. The proposed approach estimates an impact set using an adaptive combination of static textual analysis, dynamic execution tracing, and mining software repositories techniques.

FLAT3 is an open-source Eclipse plugin that integrates static and dynamic feature location techniques with a feature annotation function, providing a complete suite of tools that lets developers locate the code that implements a feature and then save these mappings in a rich format. For more information (and the ICSE demo paper), please take your browser to http://www.cs.wm.edu/semeru/flat3/

TopicXP is an Eclipse plugin that uses Latent Dirichlet Allocation (LDA) to extract topics present in the natural language used in identifiers and comments in source code. TopicXP presents these topics to the user through a set of visualizations, which first introduce the topics and their relationships, and then let the user examine how these topics are found in actual source code. For more information, please see http://www.cs.wm.edu/semeru/TopicXP/

CodeTopics is an Eclipse plugin that shows developers the similarity between source code and high-level artifacts (HLAs) and highlights the extent to which the code under development covers topics described in HLAs. Such views complement the information obtained from the similarity between source code and HLAs alone, helping (i) developers to identify functionality that is not yet implemented and (ii) newcomers to comprehend source code artifacts by showing them the topics that these artifacts relate to. Check it out: http://www.cs.wm.edu/semeru/CodeTopics/

We gratefully acknowledge financial support from the NSF for these research projects.