On Automatically Detecting Similar Android Apps - ICPC 2016 Online Appendix
This web page is a companion to our ICPC 2016 paper
entitled "On Automatically Detecting Similar Android Apps".
1. What is CLANdroid?
- CLANdroid is an approach for detecting similar Android applications. It extracts different types of data from each app (API calls, identifiers, intents, permissions, and sensors) and then computes the similarity between apps separately for each data type. For instance, CLANdroid computes how similar apps are based on their API calls, or based on the intents declared in their source code.
- An online version of CLANdroid can be found here. Note that this version returns only the top 20 ranked apps. To see more than the top 20, download the similarity files and the CLANdroid tool below.
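The idea of scoring app pairs separately per data type can be illustrated with a minimal sketch. It is not CLANdroid's implementation: CLANdroid applies LSI to its corpus (see Scripts and Data), whereas this sketch uses plain cosine similarity over raw term-frequency vectors. The app profiles and term strings are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(terms_a: list[str], terms_b: list[str]) -> float:
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical per-data-type profiles for two apps (illustrative only).
app_x = {
    "api": ["android.hardware.Camera.open", "android.graphics.Bitmap.createBitmap"],
    "intent": ["android.intent.action.MAIN", "android.media.action.IMAGE_CAPTURE"],
    "permission": ["android.permission.CAMERA"],
}
app_y = {
    "api": ["android.hardware.Camera.open", "android.location.LocationManager.getLastKnownLocation"],
    "intent": ["android.intent.action.MAIN"],
    "permission": ["android.permission.CAMERA", "android.permission.ACCESS_FINE_LOCATION"],
}

# One similarity score per data type, as in the per-type comparison described above.
for feature in ("api", "intent", "permission"):
    print(feature, round(cosine_similarity(app_x[feature], app_y[feature]), 3))
# api 0.5, intent 0.707, permission 0.707
```

Keeping one score per data type, rather than merging all terms into a single vector, is what makes it possible to compare how well each type of data ranks similar apps.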
2. Data
Android apps
- The list of apps used for RQ1 is:
| App | Category | Package Name | Google Play website |
|---|---|---|---|
| Basketball Dood | Sports | air.BasketballDoodFree | LINK |
| Adobe Photoshop Express | Photography | com.adobe.psmobile | LINK |
| Mind Fire (Free version) | Card | com.e_gadget.MindFireF | LINK |
| The Building Game | Puzzle | com.apesoup.buildinggame | LINK |
| OANDA fxTrade for Android | Finance | com.oanda.fxtrade | LINK |
| Office Depot® For Business | Shopping | com.officedepot.mobile.ui.bsd.us.prod | LINK |
| Funny Warp | Entertainment | com.rm.android.facewarp | LINK |
| AutoKiller Memory Optimizer | Productivity | com.rs.autokiller | LINK |
| Bubble Sky Blaster | Casual | com.enlightenedapps.bubbleblaster | LINK |
| SWF Player | Media and Video | air.br.com.bitlabs.SWFPlayer | LINK |
| Meteor Storm | Arcade | air.com.terrypaton.meteorstorm | LINK |
| Fluxo de Caixa Retro | Finance | om.dadonas.fluxocaixa.retro | LINK |

- The list of apps and their categories used in our study for RQ2 and RQ3 is available as a CSV file.
3. Data extraction

4. Results
RQ1-What semantic anchors used in CLANdroid produce better results when compared to the others?
- Survey results: spreadsheet with the cross-validation design and the answers provided by participants.
RQ2-How orthogonal are the apps detected by CLANdroid compared to Google Play?
- Average and top similarity files, per category and overall. Each file contains: [app name, (avg or top) API similarity value, (avg or top) identifier similarity value, (avg or top) intent similarity value, (avg or top) permission similarity value, (avg or top) sensor similarity value, (avg or top) combined similarity value]
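A minimal sketch for loading one of these files and summarizing each similarity column. It assumes a comma-separated layout with no header row and the column order listed above; the actual file format may differ, and the sample values are placeholders, not study results.

```python
import csv
import io
from dataclasses import dataclass
from statistics import mean

# Column order as listed above; comma-separated with no header is an ASSUMPTION.
FEATURES = ("api", "identifier", "intent", "permission", "sensor", "combined")


@dataclass
class AppScores:
    app: str
    scores: dict[str, float]  # data type -> avg or top similarity value


def parse_results(text: str) -> list[AppScores]:
    """Parse rows of [app name, api, identifier, intent, permission, sensor, combined]."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        app, *values = row
        if len(values) != len(FEATURES):
            raise ValueError(f"expected {len(FEATURES)} values for {app!r}, got {len(values)}")
        rows.append(AppScores(app, dict(zip(FEATURES, map(float, values)))))
    return rows


def feature_means(rows: list[AppScores]) -> dict[str, float]:
    """Mean of each similarity column across all apps in one file."""
    return {f: mean(r.scores[f] for r in rows) for f in FEATURES}


# Placeholder values for illustration only, not results from the study.
sample = "app.one,0.2,0.4,0.6,0.8,0.1,0.5\napp.two,0.4,0.2,0.6,0.6,0.3,0.5\n"
print(feature_means(parse_results(sample)))
```

Rejecting rows with the wrong number of values makes a mismatch between the assumed and actual layout fail loudly instead of silently dropping columns.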
RQ3-Do third-party libraries and obfuscated apps impact the accuracy of CLANdroid?
- Average and top similarity files, per category and overall, for the dataset excluding third-party libraries (inside the results_notpl folder) and the dataset excluding obfuscated apps (inside the results_noobf folder). Each file contains: [app name, (avg or top) API similarity value, (avg or top) identifier similarity value, (avg or top) intent similarity value, (avg or top) permission similarity value, (avg or top) sensor similarity value, (avg or top) combined similarity value]
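The two reduced datasets imply two filtering steps. The sketch below shows one simple way to approximate them; it is not the procedure used in the study. The third-party prefix list is hypothetical, and the obfuscation check is a crude heuristic based on the very short class names that renaming obfuscators typically produce.

```python
# Hypothetical prefix list for illustration; NOT the list used in the study.
THIRD_PARTY_PREFIXES = ("com.google.ads.", "com.facebook.", "org.apache.", "com.squareup.")


def is_third_party(class_name: str, prefixes: tuple[str, ...] = THIRD_PARTY_PREFIXES) -> bool:
    """True if the fully qualified class name falls under a listed library package."""
    return class_name.startswith(prefixes)


def strip_third_party(class_names: list[str], prefixes: tuple[str, ...] = THIRD_PARTY_PREFIXES) -> list[str]:
    """Keep only classes that do not belong to a listed third-party package."""
    return [c for c in class_names if not is_third_party(c, prefixes)]


def short_name_ratio(class_names: list[str]) -> float:
    """Fraction of classes whose simple name has at most two characters (e.g. 'a', 'bq')."""
    if not class_names:
        return 0.0
    short = sum(1 for c in class_names if len(c.rsplit(".", 1)[-1]) <= 2)
    return short / len(class_names)


def looks_obfuscated(class_names: list[str], threshold: float = 0.5) -> bool:
    """Crude heuristic (assumed threshold): flag an app when many class names are very short."""
    return short_name_ratio(class_names) >= threshold


classes = [
    "com.example.app.MainActivity",
    "com.facebook.login.LoginManager",
    "com.example.app.a",
    "com.example.app.b",
]
print(strip_third_party(classes))
print(short_name_ratio(classes), looks_obfuscated(classes))
```

A prefix list only catches libraries it already knows about, so in practice it would need to be curated for the apps under study.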
5. Tools
- Apache Commons BCEL library: required for extracting class signatures from JAR files.
- Apktool: required for extracting the contents of an APK file.
- Dex2jar: used for converting APK files (Dalvik bytecode) into JAR files.
- JAD Java decompiler: used for obtaining source code from JAR files.
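A sketch of chaining these tools for one APK. It assumes `apktool`, `d2j-dex2jar.sh`, and `jad` are installed and on the PATH; the directory layout and the function names are hypothetical, and the study's own scripts may invoke the tools differently. Because JAD decompiles `.class` files rather than JARs, the JAR is unpacked first.

```python
import subprocess
import zipfile
from pathlib import Path


def extraction_commands(apk: Path, work: Path) -> dict[str, list[str]]:
    """Build the command line for each extraction step (nothing is executed here)."""
    jar = work / f"{apk.stem}.jar"
    return {
        # Decode the APK (manifest, resources) into work/apktool, overwriting if present.
        "apktool": ["apktool", "d", "-f", "-o", str(work / "apktool"), str(apk)],
        # Convert the APK's Dalvik bytecode into a JAR file.
        "dex2jar": ["d2j-dex2jar.sh", "-f", "-o", str(jar), str(apk)],
        # Decompile every unpacked .class file into .java sources under work/src;
        # JAD expands the '**' wildcard itself, so no shell is involved.
        "jad": ["jad", "-o", "-r", "-sjava", "-d" + str(work / "src"),
                str(work / "classes" / "**" / "*.class")],
    }


def run_pipeline(apk: Path, work: Path) -> None:
    """Run the three tools in order; raises CalledProcessError if any step fails."""
    cmds = extraction_commands(apk, work)
    work.mkdir(parents=True, exist_ok=True)
    subprocess.run(cmds["apktool"], check=True)
    subprocess.run(cmds["dex2jar"], check=True)
    # JAD works on .class files, so unpack the JAR produced by dex2jar first.
    with zipfile.ZipFile(work / f"{apk.stem}.jar") as jar:
        jar.extractall(work / "classes")
    subprocess.run(cmds["jad"], check=True)


# Example (requires the tools on PATH): run_pipeline(Path("app.apk"), Path("work/app"))
```

Building the argument lists separately from running them keeps the commands easy to inspect and adjust for different tool versions.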
6. Scripts and Data
- Corpus Preprocessing: scripts used to create the corpus, run Latent Semantic Indexing (LSI), and parse the output into multiple similarity files.
- CLANdroid: once the similarity files are available, the CLANdroid tool itself can be found here, together with instructions for generating the RQ2 and RQ3 results.
- CLANdroid Similarities: full similarity values for each engine and each dataset, including the datasets excluding third-party libraries and excluding obfuscated apps. NOTE: 12GB download, approximately 66GB when unzipped.
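Given the size of the similarity files, retrieving more than the top 20 apps for one query is best done by streaming rather than loading a whole file into memory. The sketch below assumes a hypothetical row format of `app_a,app_b,score`; check the downloaded files for their actual layout before using it.

```python
import csv
import heapq
from typing import Iterable, Iterator


def neighbours(lines: Iterable[str], query_app: str) -> Iterator[tuple[float, str]]:
    """Yield (score, other_app) for every row that mentions query_app.

    Assumed row format (hypothetical): app_a,app_b,score
    """
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        app_a, app_b, score = row
        if app_a == query_app:
            yield float(score), app_b
        elif app_b == query_app:
            yield float(score), app_a


def top_k(lines: Iterable[str], query_app: str, k: int = 20) -> list[tuple[str, float]]:
    """Return the k most similar apps; heapq.nlargest keeps only k candidates in memory."""
    best = heapq.nlargest(k, neighbours(lines, query_app))
    return [(app, score) for score, app in best]


# In-memory demo; with a real file use: with open(path, newline="") as f: top_k(f, app, k=50)
rows = ["com.a,com.b,0.9\n", "com.a,com.c,0.5\n", "com.d,com.a,0.7\n", "com.b,com.c,0.99\n"]
print(top_k(rows, "com.a", k=2))  # [('com.b', 0.9), ('com.d', 0.7)]
```

The default `k=20` mirrors the online version; raising it only grows the heap, not the amount of the file held in memory.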
7. Authors
- Mario Linares-Vásquez, The College of William and Mary, VA, USA. E-mail: mlinarev at cs dot wm dot edu
- Andrew Holtzhauer, The College of William and Mary, VA, USA. E-mail: asholtzh at cs dot wm dot edu
- Denys Poshyvanyk, The College of William and Mary. E-mail: denys at cs dot wm dot edu