When and Why Your Code Starts to Smell Bad
(and Whether the Smells Go Away)
This web page is a companion to our TSE 2016 journal paper entitled "When and Why Your Code Starts to Smell Bad (and Whether the Smells Go Away)".
Abstract
Technical debt is a metaphor introduced by Cunningham to indicate "not quite right code which we postpone making it right". One noticeable symptom of technical debt is represented by code smells, defined as symptoms of poor design and implementation choices. Previous studies showed the negative impact of code smells on the comprehensibility and maintainability of code. While the repercussions of smells on code quality have been empirically assessed, there is still only anecdotal evidence on when and why bad smells are introduced, what their survivability is, and how they are removed by developers. To empirically corroborate such anecdotal evidence, we conducted a large empirical study over the change history of 200 open source projects. This study required the development of a strategy to identify smell-introducing commits, the mining of over half a million commits, and the manual analysis and classification of over 10K of them. Our findings mostly contradict common wisdom, showing that most of the smell instances are introduced when an artifact is created and not as a result of its evolution. At the same time, 80% of them survive in the system. Also, among the 20% of removed instances, only 9% are removed as a direct consequence of refactoring operations.
RQ1: When are code smells introduced?
- Raw data on the number of commits needed for a smell to be introduced: rawDataTime.zip
- Raw data of metrics trend: rawDataMetrics.zip
- R² achieved by the different functions tested for regression analysis (and leading to the selection of the linear regression model): regressionR2.zip
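As an illustration of how an R² value for a linear model can be obtained, the following is a minimal sketch and not the analysis script used in the study. The function names `linear_fit` and `r_squared` and the sample metric values are hypothetical, chosen only for this example.

```python
from statistics import mean


def linear_fit(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def r_squared(ys, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical values of one quality metric over five consecutive commits.
commits = [1, 2, 3, 4, 5]
metric = [10.0, 12.0, 15.0, 15.0, 19.0]

intercept, slope = linear_fit(commits, metric)
fitted = [intercept + slope * c for c in commits]
print(f"slope={slope:.2f}, R^2={r_squared(metric, fitted):.3f}")
```

The same `r_squared` function can be applied to the predictions of any other candidate function (for example, a logarithmic or exponential fit) to compare how well each one describes the trend.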
RQ2: Why are code smells introduced?
- Raw data of assigned tags: assignedTags.zip
RQ3: What is the survivability of code smells?
The following file contains all the results for this research question: rq3.zip
Survival analysis considering as temporal variable:
- #commits
- #days
- All instances
- Closed instances
- Censored instances
- Boxplot
- Boxplot (log scale)
- Cumulative Density Function
- Closed instances
- Censored instances
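For readers unfamiliar with survival analysis, the sketch below shows one common way to estimate a survival curve from closed and censored intervals, the Kaplan-Meier estimator. This page does not state which estimator produced the linked results, so the choice is an assumption, and the function name `kaplan_meier` and the sample intervals are hypothetical.

```python
def kaplan_meier(intervals):
    """Estimate the survival curve of smell instances.

    intervals: list of (duration, closed) pairs, where duration is the
    observed lifetime (in commits or days) and closed is True when the
    smell was removed, False when the observation is censored.
    Returns a list of (time, survival_probability) points, one for each
    time at which at least one smell was removed.
    """
    at_risk = len(intervals)
    survival = 1.0
    curve = []
    for t in sorted({d for d, _ in intervals}):
        removed = sum(1 for d, closed in intervals if d == t and closed)
        leaving = sum(1 for d, _ in intervals if d == t)
        if removed:
            # Fraction of the instances still at risk that survive past t.
            survival *= 1.0 - removed / at_risk
            curve.append((t, survival))
        # Both removed and censored instances leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical smell lifetimes in days: True = removed, False = censored.
sample = [(5, True), (8, False), (12, True), (20, False), (20, True)]
for time, prob in kaplan_meier(sample):
    print(f"day {time}: S(t) = {prob:.3f}")
```

Censored instances never lower the curve directly; they only shrink the set of instances still at risk, which is why they are kept in the analysis instead of being discarded.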
Proportion of Censored and Closed Intervals
The table shows the percentage of closed and censored intervals when considering (i) the original change history (no instance removed), (ii) the first quartile as threshold, (iii) the median value as threshold, and (iv) the third quartile as threshold. In particular, the proportion of censored and closed intervals after excluding censored intervals using the median value remains almost identical to the initial proportion (i.e., the rows labeled "All"). Indeed, in most cases the difference is less than 1%, while in only a few cases it reaches 2%.
| Smell | History | Android Closed (%) | Android Censored (%) | Apache Closed (%) | Apache Censored (%) | Eclipse Closed (%) | Eclipse Censored (%) |
|---|---|---|---|---|---|---|---|
| Blob | All | 36 | 64 | 15 | 85 | 31 | 69 |
| | Ex. 1st Q. | 36 | 64 | 15 | 85 | 31 | 69 |
| | Ex. Median | 36 | 64 | 15 | 85 | 31 | 69 |
| | Ex. 3rd Q. | 38 | 62 | 16 | 84 | 31 | 69 |
| CDSBP | All | 13 | 87 | 12 | 88 | 17 | 83 |
| | Ex. 1st Q. | 13 | 87 | 12 | 88 | 17 | 83 |
| | Ex. Median | 14 | 86 | 12 | 88 | 17 | 83 |
| | Ex. 3rd Q. | 14 | 86 | 13 | 87 | 18 | 82 |
| CC | All | 15 | 85 | 14 | 86 | 29 | 71 |
| | Ex. 1st Q. | 15 | 85 | 14 | 86 | 29 | 71 |
| | Ex. Median | 15 | 85 | 14 | 86 | 30 | 70 |
| | Ex. 3rd Q. | 15 | 85 | 14 | 86 | 33 | 67 |
| FD | All | 9 | 91 | 8 | 92 | 10 | 90 |
| | Ex. 1st Q. | 9 | 91 | 8 | 92 | 10 | 90 |
| | Ex. Median | 9 | 91 | 9 | 91 | 10 | 90 |
| | Ex. 3rd Q. | 9 | 91 | 9 | 91 | 11 | 89 |
| SC | All | 11 | 89 | 12 | 88 | 41 | 59 |
| | Ex. 1st Q. | 11 | 89 | 12 | 88 | 41 | 59 |
| | Ex. Median | 11 | 89 | 13 | 87 | 43 | 57 |
| | Ex. 3rd Q. | 11 | 89 | 13 | 87 | 46 | 54 |
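The kind of computation behind one row of such a table can be sketched as follows. The exact exclusion rule is not spelled out on this page, so the sketch assumes that the threshold is a quartile of the lifetimes of closed intervals and that censored intervals observed for less than that threshold are dropped. The function name `closed_censored_share` and the sample data are hypothetical.

```python
from statistics import quantiles


def closed_censored_share(intervals, quartile=None):
    """Return (closed %, censored %) as rounded integers.

    intervals: list of (duration, closed) pairs.
    quartile: None keeps every interval ("All"); 1, 2 or 3 excludes the
    censored intervals whose observed duration is below the first
    quartile, median or third quartile of the closed durations
    (an assumed reading of the exclusion rule).
    """
    closed = [d for d, is_closed in intervals if is_closed]
    censored = [d for d, is_closed in intervals if not is_closed]
    if quartile is not None:
        threshold = quantiles(closed, n=4)[quartile - 1]
        censored = [d for d in censored if d >= threshold]
    total = len(closed) + len(censored)
    return (round(100 * len(closed) / total),
            round(100 * len(censored) / total))


# Hypothetical lifetimes in commits: True = closed, False = censored.
data = [(10, True), (20, True), (30, True), (40, True),
        (5, False), (15, False), (50, False), (60, False),
        (70, False), (80, False)]

print("All:       ", closed_censored_share(data))
print("Ex. Median:", closed_censored_share(data, quartile=2))
```

Note that `statistics.quantiles` uses its default "exclusive" method here; a different quantile definition could shift the threshold slightly.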
Replicating the study by just considering the 2,555 smell-introducing commits manually validated
- Achieved results: validated.pdf
Authors
- Michele Tufano
- The College of William and Mary, VA, USA.
E-mail: mtufano at cs dot wm dot edu
- Fabio Palomba
- University of Salerno, Salerno, Italy.
E-mail: fpalomba at unisa dot it
- Gabriele Bavota
- Free University of Bozen-Bolzano, Bolzano, Italy.
E-mail: gabriele.bavota at unibz dot it
- Rocco Oliveto
- University of Molise, Pesche (IS), Italy.
E-mail: rocco.oliveto at unimol dot it
- Massimiliano Di Penta
- University of Sannio, Benevento, Italy.
E-mail: dipenta at unisannio dot it
- Andrea De Lucia
- University of Salerno, Salerno, Italy.
E-mail: adelucia at unisa dot it
- Denys Poshyvanyk
- The College of William and Mary.
E-mail: denys at cs dot wm dot edu