On Automatically Detecting Similar Android Apps - ICPC 2016 Online Appendix

This web page is a companion to our ICPC 2016 paper, "On Automatically Detecting Similar Android Apps".

1. What is CLANdroid?

  • CLANdroid is an approach for detecting similar Android applications. It extracts different types of data from Android applications and then computes the similarity between the applications using each of these types. For instance, CLANdroid computes how similar apps are based on their API calls, or based on the intents declared in their source code.
  • An online version of CLANdroid can be found here. Please note that this version returns only the top 20 ranked apps. To see more than the top 20, please download the similarity files and the CLANdroid tool below.
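As a rough illustration of ranking apps by one type of data, the sketch below scores apps by the cosine similarity of their API-call counts and keeps the top 20. This is a simplified stand-in and not the CLANdroid implementation: the package names, the counts, and the helper names `cosine_similarity` and `rank_similar` are all hypothetical, and plain cosine similarity over raw counts is assumed here only for brevity.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (term -> count)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an app with no extracted data is similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar(query, features, top_n=20):
    """Return up to top_n (app, score) pairs, most similar to `query` first."""
    scores = [
        (other, cosine_similarity(features[query], vector))
        for other, vector in features.items()
        if other != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical API-call counts per app (illustrative values only).
api_calls = {
    "com.example.camera": Counter({
        "android.hardware.Camera.open": 3,
        "android.graphics.Bitmap.compress": 2,
    }),
    "com.example.photo": Counter({
        "android.hardware.Camera.open": 1,
        "android.graphics.Bitmap.compress": 4,
    }),
    "com.example.player": Counter({
        "android.media.MediaPlayer.start": 5,
    }),
}

ranking = rank_similar("com.example.camera", api_calls)
for app, score in ranking:
    print(f"{app}\t{score:.3f}")
```

The same function could be applied to any other type of extracted data (identifiers, intents, permissions, sensors) by swapping in a different `features` mapping.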

2. Data


Android apps

  • The list of apps used for RQ1 is:
    App | Category | Package Name | Google Play website
    Basketball Dood | Sports | air.BasketballDoodFree | LINK
    Adobe Photoshop Express | Photography | com.adobe.psmobile | LINK
    Mind Fire (Free version) | Card | com.e_gadget.MindFireF | LINK
    The Building Game | Puzzle | com.apesoup.buildinggame | LINK
    OANDA fxTrade for Android | Finance | com.oanda.fxtrade | LINK
    Office Depot® For Business | Shopping | com.officedepot.mobile.ui.bsd.us.prod | LINK
    Funny Warp | Entertainment | com.rm.android.facewarp | LINK
    AutoKiller Memory Optimizer | Productivity | com.rs.autokiller | LINK
    Bubble Sky Blaster | Casual | com.enlightenedapps.bubbleblaster | LINK
    SWF Player | Media and Video | air.br.com.bitlabs.SWFPlayer | LINK
    Meteor Storm | Arcade | air.com.terrypaton.meteorstorm | LINK
    Fluxo de Caixa Retro | Finance | om.dadonas.fluxocaixa.retro | LINK

  • The list of apps and their categories used in our study for RQ2 and RQ3 is available as a CSV file.


3. Data extraction

Data extraction from APKs

4. Results

RQ1-What semantic anchors used in CLANdroid produce better results when compared to the others?

  • Survey Results. Spreadsheet with the cross-validation design and the answers provided by participants.

RQ2-How orthogonal are the apps detected by CLANdroid compared to Google Play?

  • Average & Top files per category and overall which contain: [app name, (avg or top) api similarity value, (avg or top) identifier similarity value, (avg or top) intent similarity value, (avg or top) permission similarity value, (avg or top) sensor similarity value, (avg or top) combined similarity value]
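To work with rows in the column order listed above, a reader along the following lines may be enough. This is only a sketch: it assumes the files are comma-separated text with no header row, which should be checked against the downloaded files, and the names `SimilarityRow` and `read_similarity_rows` as well as the sample values are hypothetical.

```python
import csv
import io
from typing import NamedTuple


class SimilarityRow(NamedTuple):
    """One row, in the column order described above."""
    app: str
    api: float
    identifier: float
    intent: float
    permission: float
    sensor: float
    combined: float


def read_similarity_rows(handle):
    """Parse rows from an open text file; assumes comma-separated, no header."""
    rows = []
    for record in csv.reader(handle):
        if len(record) != 7:
            continue  # skip blank or malformed lines
        name, *values = record
        rows.append(SimilarityRow(name.strip(), *(float(v) for v in values)))
    return rows


# Made-up sample content standing in for a real results file.
sample = io.StringIO(
    "com.example.one,0.91,0.40,0.12,0.75,0.00,0.62\n"
    "com.example.two,0.35,0.88,0.50,0.20,0.10,0.47\n"
)
rows = read_similarity_rows(sample)
best = max(rows, key=lambda row: row.combined)
print(best.app, best.combined)
```

For a real file, `open(path, newline="")` would replace the `io.StringIO` object.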

RQ3-Do third-party libraries and obfuscated apps impact the accuracy of CLANdroid?

  • Average & Top files per category and overall for datasets excluding third-party libraries (inside results_notpl folder) and excluding obfuscated apps (inside results_noobf folder) which contain: [app name, (avg or top) api similarity value, (avg or top) identifier similarity value, (avg or top) intent similarity value, (avg or top) permission similarity value, (avg or top) sensor similarity value, (avg or top) combined similarity value]


5. Tools



6. Scripts and Data

  • Corpus Preprocessing: used to create corpus, run LSI, and parse output into multiple similarity files
  • CLANdroid: the CLANdroid tool itself, to be run once the similarity files have been acquired, can be found here, along with instructions for generating the RQ2 and RQ3 results
  • CLANdroid Similarities: full similarity values for each engine and each dataset (excluding third-party libraries and excluding obfuscated apps) - NOTE: 12 GB download, approximately 66 GB when unzipped
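For readers unfamiliar with the LSI step mentioned under Corpus Preprocessing, the sketch below shows the general technique on a toy term-by-app matrix: a truncated SVD followed by cosine similarity in the reduced space. It is not the preprocessing script itself. NumPy is assumed to be available, and the function name `lsi_similarities`, the matrix values, and the choice of `k` are all illustrative.

```python
import numpy as np


def lsi_similarities(term_doc, k):
    """Cosine similarities between documents in a rank-k LSI space.

    term_doc: 2-D array-like, rows are terms, columns are documents (apps).
    Returns a documents-by-documents similarity matrix.
    """
    matrix = np.asarray(term_doc, dtype=float)
    # Thin SVD: matrix = U @ diag(s) @ Vt
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Coordinates of each document in the k-dimensional latent space.
    docs = (s[:k, None] * vt[:k, :]).T
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero documents keep similarity 0
    unit = docs / norms
    return unit @ unit.T


# Toy corpus: 4 terms (rows) by 3 apps (columns); values are made up.
term_doc = [
    [2, 2, 0],
    [1, 1, 0],
    [0, 0, 3],
    [0, 0, 1],
]
similarity = lsi_similarities(term_doc, k=2)
print(np.round(similarity, 3))
```

Each row of `similarity` could then be written to its own file to obtain per-app similarity lists, one set per type of extracted data.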


Authors

  • Mario Linares-Vásquez - The College of William and Mary, VA, USA.
    E-mail: mlinarev at cs dot wm dot edu
  • Andrew Holtzhauer - The College of William and Mary, VA, USA.
    E-mail: asholtzh at cs dot wm dot edu
  • Denys Poshyvanyk - The College of William and Mary, VA, USA.
    E-mail: denys at cs dot wm dot edu