When and Why Your Code Starts to Smell Bad
(and Whether the Smells Go Away)


This web page is a companion to our TSE 2016 journal paper entitled "When and Why Your Code Starts to Smell Bad (and Whether the Smells Go Away)".

Abstract

Technical debt is a metaphor introduced by Cunningham to indicate "not quite right code which we postpone making it right". One noticeable symptom of technical debt is represented by code smells, defined as symptoms of poor design and implementation choices. Previous studies showed the negative impact of code smells on the comprehensibility and maintainability of code. While the repercussions of smells on code quality have been empirically assessed, there is still only anecdotal evidence on when and why bad smells are introduced, what their survivability is, and how they are removed by developers. To empirically corroborate such anecdotal evidence, we conducted a large empirical study over the change history of 200 open source projects. This study required the development of a strategy to identify smell-introducing commits, the mining of over half a million commits, and the manual analysis and classification of over 10K of them. Our findings mostly contradict common wisdom, showing that most of the smell instances are introduced when an artifact is created and not as a result of its evolution. At the same time, 80% of them survive in the system. Also, among the 20% of removed instances, only 9% are removed as a direct consequence of refactoring operations.

RQ1: When are code smells introduced?


  • Raw data of number of commits needed by a smell for its introduction: rawDataTime.zip
  • Raw data of metrics trend: rawDataMetrics.zip
  • R² achieved by the different functions tested for regression analysis (leading to the selection of the linear regression model): regressionR2.zip
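
As a rough illustration of how a linear model can be fitted to a metric trend and scored with R², here is a minimal sketch in plain Python. The function name `linear_fit_r2` and the sample values are hypothetical and are not taken from the archives above; the study's own analysis scripts may differ.

```python
def linear_fit_r2(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b, r2).

    xs: commit indexes (or any numeric time variable)
    ys: metric values observed at those commits
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equally long sequences with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b = sxy / sxx            # slope: metric change per commit
    a = mean_y - b * mean_x  # intercept

    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # A constant series is fitted perfectly by a horizontal line.
    r2 = 1.0 if ss_tot == 0 else 1 - ss_res / ss_tot
    return a, b, r2


if __name__ == "__main__":
    # Hypothetical metric values over four consecutive commits.
    intercept, slope, r2 = linear_fit_r2([1, 2, 3, 4], [3, 5, 7, 9])
    print(intercept, slope, r2)
```

The slope gives the average change of the metric per commit, while R² indicates how much of the metric's variance the fitted line explains.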



RQ2: Why are code smells introduced?




RQ3: What is the survivability of code smells?

The following file contains all the results for this research question: rq3.zip

Survival analysis using as the temporal variable:
  • #commits
  • #days
Distribution of number of days and commits for:
  • All instances
  • Closed instances
  • Censored instances
We show these distributions using:
  • Boxplot
  • Boxplot (log scale)
  • Cumulative Distribution Function
Number of modifications (i.e., commits performed) for:
  • Closed instances
  • Censored instances
Finally, we show the complete survival comparison between born-smelly and not-born-smelly instances for all the code smell types analyzed.
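
To make the role of closed and censored instances concrete, the sketch below estimates a survival curve with a Kaplan-Meier estimator in plain Python. It assumes a Kaplan-Meier style analysis; the function name `kaplan_meier` and the sample intervals are hypothetical and are not taken from rq3.zip. A closed instance is one whose smell was removed (the event was observed); a censored instance is one whose smell was still present at the end of the observed history.

```python
from collections import Counter


def kaplan_meier(intervals):
    """Estimate the survival function from (duration, closed) pairs.

    duration: observed lifetime of a smell instance, in days or commits
    closed:   True if the smell was removed, False if the interval is censored
    Returns a list of (time, survival_probability), one entry per removal time.
    """
    removals = Counter(d for d, closed in intervals if closed)
    leaving = Counter(d for d, _ in intervals)  # removed or censored at time d

    at_risk = len(intervals)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        removed = removals.get(t, 0)
        if removed:
            # Censored instances still count as at risk up to their own time.
            survival *= 1 - removed / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    # Hypothetical instances: two removed smells, two still alive.
    sample = [(5, True), (8, False), (12, True), (20, False)]
    print(kaplan_meier(sample))  # [(5, 0.75), (12, 0.375)]
```

Censored instances never lower the curve directly; they only shrink the at-risk set used at later removal times, which is why the same data can be analyzed with either #commits or #days as the duration.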



Proportion of Censored and Closed Intervals


The table shows the percentage of closed and censored intervals considering (i) the original change history (no instance removed), (ii) the first quartile as threshold, (iii) the median value as threshold, and (iv) the third quartile as threshold. In particular, the proportion of closed and censored intervals after excluding censored intervals using the median value remains almost identical to the initial proportion (i.e., the row corresponding to "All"). Indeed, in most cases the difference is less than 1%, and in only a few cases it reaches 2%.

Smell   History       Android (%)         Apache (%)          Eclipse (%)
                      Closed  Censored    Closed  Censored    Closed  Censored
Blob    All           36      64          15      85          31      69
        Ex. 1st Q.    36      64          15      85          31      69
        Ex. Median    36      64          15      85          31      69
        Ex. 3rd Q.    38      62          16      84          31      69
CDSBP   All           13      87          12      88          17      83
        Ex. 1st Q.    13      87          12      88          17      83
        Ex. Median    14      86          12      88          17      83
        Ex. 3rd Q.    14      86          13      87          18      82
CC      All           15      85          14      86          29      71
        Ex. 1st Q.    15      85          14      86          29      71
        Ex. Median    15      85          14      86          30      70
        Ex. 3rd Q.    15      85          14      86          33      67
FD      All            9      91           8      92          10      90
        Ex. 1st Q.     9      91           8      92          10      90
        Ex. Median     9      91           9      91          10      90
        Ex. 3rd Q.     9      91           9      91          11      89
SC      All           11      89          12      88          41      59
        Ex. 1st Q.    11      89          12      88          41      59
        Ex. Median    11      89          13      87          43      57
        Ex. 3rd Q.    11      89          13      87          46      54
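
The proportions in the table can be recomputed along the following lines. This is a minimal sketch under stated assumptions: a censored interval is dropped when its observed duration is shorter than the threshold, and the threshold is a statistic (here the median) of the closed-interval durations. The function name `closed_censored_share` and the sample data are hypothetical.

```python
from statistics import median


def closed_censored_share(intervals, threshold=None):
    """Return (closed %, censored %) for a list of (duration, closed) pairs.

    If a threshold is given, censored intervals observed for less than
    `threshold` are excluded first (assumption: such intervals were
    introduced too close to the end of the analyzed history).
    Closed intervals are always kept.
    """
    kept = [
        (duration, closed)
        for duration, closed in intervals
        if closed or threshold is None or duration >= threshold
    ]
    if not kept:
        raise ValueError("no intervals left after filtering")

    n_closed = sum(1 for _, closed in kept if closed)
    pct_closed = 100 * n_closed / len(kept)
    return pct_closed, 100 - pct_closed


if __name__ == "__main__":
    # Hypothetical smell instances: (duration in days, was the smell removed?)
    sample = [(10, True), (30, True), (5, False), (40, False), (50, False)]

    print(closed_censored_share(sample))  # row "All": (40.0, 60.0)

    # Row "Ex. Median": threshold = median duration of the closed intervals.
    cutoff = median(d for d, closed in sample if closed)
    print(closed_censored_share(sample, cutoff))  # (50.0, 50.0)
```

Replacing `median` with the first or third quartile of the closed-interval durations would give the other two threshold rows.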


Replicating the study by just considering the 2,555 smell-introducing commits manually validated




Authors

  • Michele Tufano - The College of William and Mary, VA, USA.
    E-mail: mtufano at cs dot wm dot edu
  • Fabio Palomba - University of Salerno, Salerno, Italy.
    E-mail: fpalomba at unisa dot it
  • Gabriele Bavota - Free University of Bozen-Bolzano, Bolzano, Italy.
    E-mail: gabriele.bavota at unibz dot it
  • Rocco Oliveto - University of Molise, Pesche (IS), Italy.
    E-mail: rocco.oliveto at unimol dot it
  • Massimiliano Di Penta - University of Sannio, Benevento, Italy.
    E-mail: dipenta at unisannio dot it
  • Andrea De Lucia - University of Salerno, Salerno, Italy.
    E-mail: adelucia at unisa dot it
  • Denys Poshyvanyk - The College of William and Mary.
    E-mail: denys at cs dot wm dot edu