How Do Static and Dynamic Test Case Prioritization Techniques Perform on Modern Software Systems? A Large-Scale Study on GitHub Projects

- TSE 2017 Online Appendix

This web page is a companion to our TSE 2017 submission entitled "How Do Static and Dynamic Test Case Prioritization Techniques Perform on Modern Software Systems? A Large-Scale Study on GitHub Projects".

1. Studied TCP techniques

We conducted our empirical study on four static TCP techniques and four dynamic techniques.


2. Subject programs

We collected 58 large, real-world Java systems. The program names, links, versions, and sizes in lines of code (LOC) are shown in the following table. The numbers of test cases at method level and class level are shown in Columns 4 and 5 (#TM and #TC), respectively. Columns 6 and 7 (Detected and All) show the number of mutation faults that can be detected and the number of all mutation faults for each subject.

Subjects & Links | Version | #LOC | #TM #TC Detected All
P1-geojson-jackson (Link) | f4c606 | 1,151 | 4413301717
P2-statsd-jvm-profiler (Link) | 73ba39 | 1,355 | 2912290708
P3-stateless4j (Link) | ec3b30 | 1,756 | 6110392696
P4-jarchivelib (Link) | 563b7c | 1,940 | 2212655948
P5-JSONassert (Link) | ce4eeb | 1,957 | 121109351,116
P6-java-faker (Link) | 01c77a | 2,069 | 2811392600
P7-jackson-datatype-joda (Link) | bb552d | 2,409 | 5786751,212
P8-Java-apns (Link) | 2665ef | 3,234 | 87154121,122
P9-pusher-websocket-java (Link) | fdb4ad | 3,259 | 199118511,470
P10-gson-fire (Link) | 6ac258 | 3,421 | 55148471,064
P11-jackson-datatype-guava (Link) | a049b9 | 3,994 | 91153131,832
P12-dictomaton (Link) | c3ef36 | 4,099 | 53112,02410,857
P13-jackson-uuid-generator (Link) | 487289 | 4,158 | 4568022,039
P14-JAdventure (Link) | 3784cf | 4,416 | 35107385,098
P15-exp4j (Link) | 2c7795 | 4,617 | 28591,3651,563
P16-jumblr (Link) | 68fa79 | 4,623 | 103156101,192
P17-efflux (Link) | 264221 | 4,940 | 41101,1902,840
P18-metrics-core (Link) | d83b98 | 5,027 | 144281,6565,265
P19-low-gc-membuffers (Link) | febe59 | 5,198 | 51181,8613,654
P20-xembly (Link) | 05101d | 5,319 | 58161,1902,546
P21-scribe-java (Link) | 0311a4 | 5,355 | 99185631,622
P22-jpush-api-java-client (Link) | 7aedcb | 5,462 | 65108222,961
P23-gdx-artemis (Link) | 1687eb | 6,043 | 31209681,687
P24-Protoparser (Link) | 8be66b | 6,074 | 171143,3464,640
P25-commons-cli (Link) | b486fb | 6,601 | 317262,3622,801
P26-mp3agic (Link) | 99fa15 | 6,939 | 205193,3626,391
P27-webbit (Link) | f628a7 | 7,363 | 131251,2683,833
P28-RestFixture (Link) | bb4c70 | 7,421 | 268302,2343,278
P29-LastCalc (Link) | 7e0fc9 | 7,707 | 34132,8146,635
P30-jackson-dataformat-csv (Link) | a4c104 | 7,850 | 98271,6936,795
P31-skype-java-api (Link) | a7df1a | 8,264 | 24168856,494
P32-lambdaj (Link) | bd3afc | 8,510 | 252353,3824,341
P33-jackson-dataformat-xml (Link) | a1ab20 | 8,648 | 134451,7064,149
P34-jopt-simple (Link) | a16d86 | 8,778 | 511792,3252,525
P35-jline2 (Link) | 8d6c53 | 8,783 | 130163,5238,368
P36-javapoet (Link) | 1a1694 | 9,007 | 246163,4004,601
P37-Liqp (Link) | 54f9bd | 9,139 | 235587,96218,608
P38-cassandra-reaper (Link) | ef76a2 | 9,896 | 40121,1865,105
P39-JSqlParser (Link) | a0e372 | 10,335 | 3131915,69832,785
P40-raml-java-parser (Link) | 6b6916 | 11,126 | 190364,6786,431
P41-redline-smalltalk (Link) | 6322ac | 11,228 | 3791,83410,763
P42-user-agent-utils (Link) | ecc991 | 11,456 | 627376688
P43-javaewah (Link) | 558ab0 | 13,293 | 229116,30711,939
P44-jsoup-learning (Link) | 2c0580 | 13,505 | 380257,76113,230
P45-wsc (Link) | a0ab08 | 13,652 | 1681,68717,942
P46-rome (Link) | 772d4f | 13,874 | 443454,92010,744
P47-JActor (Link) | 204739 | 14,171 | 54431321,375
P48-RoaringBitmap (Link) | 13038a | 16,341 | 286159,70913,574
P49-JavaFastPFOR (Link) | 2dd09c | 17,695 | 42846,42964,372
P50-jprotobuf (Link) | e4ee6e | 21,161 | 48181,53910,338
P51-worldguard (Link) | 323413 | 24,457 | 148121,12725,940
P52-commons-jxpath (Link) | e48043 | 24,910 | 4113913,61124,369
P53-commons-io (Link) | c49315 | 27,263 | 1125927,630110,365
P54-nodebox (Link) | 2717b3 | 32,244 | 293407,82436,793
P55-asterisk-java (Link) | 4cbd23 | 39,542 | 220393,29917,664
P56-ews-java-api (Link) | 95c6df | 46,863 | 130282,41931,569
P57-commons-lang (Link) | e42dad | 61,518 | 238811425,77532,291
P58-joda-time (Link) | c3ef36 | 82,998 | 4,02612220,95728,382

3. Tools

  • PIT: A mutation testing system that provides mutation testing and test coverage for Java and the JVM (LINK), together with the list of all its available mutators (LINK).
  • WALA: A tool to collect the RTA static call graph for each test (LINK).
  • JDT: A tool to collect the textual test information (LINK).
  • R-lda: An R package to build topic models for test cases (LINK).
  • Mallet: A tool to build topic models for test cases (LINK).
  • ASM: A tool to collect the coverage information for each test case (LINK).

4. Results

4.1 The results of APFD and APFDc for different TCP techniques across all subjects at test-class level.
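
For readers who want to recompute the metric, APFD for a prioritized order of n tests that together detect m faults is 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where TF_i is the 1-based position of the first test that detects fault i. The following is a minimal Python sketch of this standard formula, not the scripts used in the study. The function name `apfd`, the test names, and the fault ids are hypothetical.

```python
from typing import Dict, List, Set


def apfd(order: List[str], detects: Dict[str, Set[str]]) -> float:
    """Standard APFD for one prioritized test order.

    order   -- test names in prioritized execution order
    detects -- hypothetical map: test name -> ids of the faults it detects
    Only faults detected by at least one test in `order` are counted.
    """
    n = len(order)
    first_pos: Dict[str, int] = {}
    for pos, test in enumerate(order, start=1):
        for fault in detects.get(test, set()):
            first_pos.setdefault(fault, pos)  # keep the earliest position
    m = len(first_pos)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one detected fault")
    return 1.0 - sum(first_pos.values()) / (n * m) + 1.0 / (2 * n)


# Made-up example: four tests, three faults.
detects = {"t1": {"f1"}, "t2": {"f1", "f2"}, "t3": {"f3"}, "t4": set()}
good = apfd(["t2", "t3", "t1", "t4"], detects)  # fault-revealing tests first
bad = apfd(["t4", "t1", "t3", "t2"], detects)   # fault-revealing tests last
print(good, bad)
```

An order that reveals faults earlier yields a higher value, so `good` is larger than `bad` here.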



4.2 The results of APFD and APFDc for different TCP techniques across all subjects at test-method level.
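
APFDc is the cost-cognizant variant of APFD: it weights each test by its cost and each fault by its severity. Below is a minimal Python sketch of the usual formulation. It assumes that test cost is an execution time and that all faults have equal severity unless severities are supplied; these choices, and every name and number in the example, are illustrative assumptions rather than details taken from this page.

```python
from typing import Dict, List, Optional, Set


def apfdc(order: List[str],
          detects: Dict[str, Set[str]],
          cost: Dict[str, float],
          severity: Optional[Dict[str, float]] = None) -> float:
    """Cost-cognizant APFD for one prioritized test order.

    cost     -- hypothetical map: test name -> cost (e.g. execution time)
    severity -- optional map: fault id -> severity (defaults to 1 for all)
    """
    n = len(order)
    costs = [cost[t] for t in order]
    # suffix[i] = total cost of the tests at positions i..n-1
    suffix = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + costs[i]
    first_pos: Dict[str, int] = {}
    for pos, test in enumerate(order):  # 0-based positions
        for fault in detects.get(test, set()):
            first_pos.setdefault(fault, pos)
    if not first_pos or suffix[0] <= 0:
        raise ValueError("need a detected fault and a positive total cost")
    numerator = 0.0
    total_severity = 0.0
    for fault, pos in first_pos.items():
        sev = 1.0 if severity is None else severity[fault]
        # cost from the detecting test to the end, minus half of that test
        numerator += sev * (suffix[pos] - 0.5 * costs[pos])
        total_severity += sev
    return numerator / (suffix[0] * total_severity)


# Made-up example: same detection data, unit costs versus one slow test.
detects = {"t1": {"f1"}, "t2": {"f1", "f2"}, "t3": {"f3"}, "t4": set()}
order = ["t2", "t3", "t1", "t4"]
unit = apfdc(order, detects, {"t1": 1, "t2": 1, "t3": 1, "t4": 1})
timed = apfdc(order, detects, {"t1": 1, "t2": 5, "t3": 1, "t4": 1})
print(unit, timed)
```

With unit costs and equal severities the value coincides with plain APFD; making the first test expensive lowers it.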



4.3 The results for the ANOVA and Tukey HSD tests on the average APFD and APFDc values at test-class level.



4.4 The results for the ANOVA and Tukey HSD tests on the average APFD and APFDc values at test-method level.



4.5 The box-and-whisker plots represent the values of APFDc for different TCP techniques at different test granularities. The x-axis represents the APFDc values. The y-axis represents the different techniques. The central box of each plot represents the values from the lower to the upper quartile (i.e., the 25th to the 75th percentile).



4.6 The table shows the results of the Wilcoxon signed-rank test on the average APFD values for each pair of TCP techniques. The techniques T1 to T9 refer to TPcg-tot, TPcg-add, TPstr, TPtopic-r, TPtopic-m, TPtotal, TPadd, TPart, and TPsearch, respectively. The cell for each pair of TCP techniques has two sub-cells: the first gives the p-value at test-class level and the second gives the p-value at test-method level. If a p-value is less than 0.05, the corresponding cell is shaded.
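
As an illustration of how one such p-value can be obtained, the sketch below implements an exact two-sided Wilcoxon signed-rank test for paired samples in plain Python. It is a sketch under stated assumptions, not the procedure used for the table: zero differences are discarded, tied absolute differences share their average rank, and the p-value comes from the exact sign-permutation distribution, whereas the study's statistical tooling may have handled ties and zeros differently or used a normal approximation. The function name and the APFD values are made up.

```python
from typing import List, Sequence


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact two-sided p-value of the Wilcoxon signed-rank test (paired)."""
    if len(x) != len(y):
        raise ValueError("samples must be paired")
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank |d| ascending; ties share the average rank. Ranks are stored
    # doubled so they stay integers. Ties use exact float equality.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    rank2: List[int] = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            rank2[order[k]] = i + j + 2  # 2 * average of ranks i+1..j+1
        i = j + 1
    total = sum(rank2)
    w_plus = sum(r for r, d in zip(rank2, diffs) if d > 0)
    # counts[s] = number of sign assignments whose positive-rank sum is s
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in rank2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    extreme = abs(2 * w_plus - total)
    tail = sum(c for s, c in enumerate(counts) if abs(2 * s - total) >= extreme)
    return tail / 2 ** n


# Hypothetical average APFD values of two techniques on six subjects.
tech_a = [0.81, 0.75, 0.90, 0.66, 0.72, 0.88]
tech_b = [0.70, 0.71, 0.78, 0.67, 0.59, 0.80]
p_value = wilcoxon_signed_rank(tech_a, tech_b)
print(p_value, p_value < 0.05)
```

In the table's terms, a cell would be shaded when the printed comparison is true.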



4.7 The table shows the results of the Wilcoxon signed-rank test on the average APFDc values for each pair of TCP techniques. The techniques T1 to T9 refer to TPcg-tot, TPcg-add, TPstr, TPtopic-r, TPtopic-m, TPtotal, TPadd, TPart, and TPsearch, respectively. The cell for each pair of TCP techniques has two sub-cells: the first gives the p-value at test-class level and the second gives the p-value at test-method level. If a p-value is less than 0.05, the corresponding cell is shaded.



4.8 Results for average APFD values on different sizes of mutation faults. The last column shows the Kendall tau rank correlation coefficient between the average APFD values with different sizes of mutation faults and the average APFD values with the default mutation faults.
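
The correlation in the last column can be illustrated with a small pure-Python computation of Kendall's tau. The sketch below uses the tau-b variant, which corrects for ties; whether the study used tau-a or tau-b is not stated on this page, so that choice is an assumption, as are the function name and the APFD values in the example.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between two equally long sequences of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = untied_x = untied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sx = (x1 > x2) - (x1 < x2)  # sign of the difference in x
        sy = (y1 > y2) - (y1 < y2)  # sign of the difference in y
        untied_x += sx != 0
        untied_y += sy != 0
        if sx * sy > 0:
            concordant += 1
        elif sx * sy < 0:
            discordant += 1
    if untied_x == 0 or untied_y == 0:
        raise ValueError("tau is undefined when one sequence is constant")
    return (concordant - discordant) / sqrt(untied_x * untied_y)


# Hypothetical average APFD per technique: default fault set vs. a variant.
default_apfd = [0.52, 0.61, 0.70, 0.83]
variant_apfd = [0.55, 0.65, 0.60, 0.90]
tau = kendall_tau_b(default_apfd, variant_apfd)
print(tau)
```

A value close to 1 means the techniques are ranked almost identically under both fault sets.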



4.9 Results for average APFDc values on different sizes of mutation faults. The last column shows the Kendall tau rank correlation coefficient between the average APFDc values with different sizes of mutation faults and the average APFDc values with the default mutation faults.



4.10 Results for average APFD values on different types of mutation faults. The last column shows the Kendall tau rank correlation coefficient between the average APFD values with different types of mutation faults and the average APFD values with the default mutation faults.



4.11 Results for average APFDc values on different types of mutation faults. The last column shows the Kendall tau rank correlation coefficient between the average APFDc values with different types of mutation faults and the average APFDc values with the default mutation faults.



4.12 The classification of subjects on different granularities using Jaccard distance at cut point 10%. The four values in each cell are the numbers of subjects for which the faults detected by the two techniques are highly dissimilar, dissimilar, similar, and highly similar, respectively. The results at other cut points can be found here.
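
The comparison behind each cell can be sketched as follows: collect the faults detected by the first part of each technique's prioritized order, then take the Jaccard distance between the two fault sets. This Python sketch assumes that "cut point 10%" means the first 10% of the prioritized tests and that the prefix size is rounded up; both are assumptions, as are all names and data. The thresholds that map a distance to the four similarity classes are not given on this page and are not reproduced here.

```python
from typing import Dict, List, Set


def faults_at_cut(order: List[str],
                  detects: Dict[str, Set[str]],
                  cut_percent: int) -> Set[str]:
    """Faults detected by the first cut_percent% of a prioritized order."""
    # Integer ceiling of len(order) * cut_percent / 100 (rounding up is
    # an assumption of this sketch).
    prefix_size = (len(order) * cut_percent + 99) // 100
    found: Set[str] = set()
    for test in order[:prefix_size]:
        found |= detects.get(test, set())
    return found


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Made-up data: ten tests, two techniques that start with different tests.
detects = {"t0": {"m1", "m2"}, "t1": {"m2", "m3"}, "t2": {"m4"}}
tests = ["t%d" % i for i in range(10)]
order_a = tests                      # begins with t0
order_b = ["t1", "t0"] + tests[2:]   # begins with t1
distance = jaccard_distance(faults_at_cut(order_a, detects, 10),
                            faults_at_cut(order_b, detects, 10))
print(distance)
```

A distance of 0 means both techniques detect the same faults within the cut, and a distance of 1 means they detect disjoint sets.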



4.13 Counts and percentages for different types of mutation faults across all subjects at cut point 10% for class-level granularity.



4.14 Counts and percentages for different types of mutation faults across all subjects at cut point 10% for method-level granularity.



4.15 The results for execution costs for different TCP techniques.



4.16 The results for TCPs in terms of APFD at test-class level in the evolution scenario.



4.17 The results for TCPs in terms of APFD at test-method level in the evolution scenario.



4.18 The results for TCPs in terms of APFDc at test-class level in the evolution scenario.



4.19 The results for TCPs in terms of APFDc at test-method level in the evolution scenario.




5. Authors

  • Qi Luo - The College of William and Mary, VA, USA.
    E-mail: qluo at cs dot wm dot edu
  • Kevin Moran - The College of William and Mary, VA, USA.
    E-mail: kpmoran at cs dot wm dot edu
  • Lingming Zhang - University of Texas at Dallas, Dallas, TX.
    E-mail: lingming.zhang at utdallas dot edu
  • Denys Poshyvanyk - The College of William and Mary.
    E-mail: denys at cs dot wm dot edu