Jakob Bleier

Projektass. / BSc BSc MSc

Jakob Bleier

android and the similarity of electric sheep

Roles
  • PreDoc Researcher
Courses
Publications (created while at TU Wien)
    2023
    • Of Ahead Time: Evaluating Disassembly of Android Apps Compiled to Binary OATs Through the ART
      Bleier, J., & Lindorfer, M. (2023). Of Ahead Time: Evaluating Disassembly of Android Apps Compiled to Binary OATs Through the ART. In J. Polakis & E. van der Kouwe (Eds.), EUROSEC ’23: Proceedings of the 16th European Workshop on System Security (pp. 21–29).
      DOI: 10.1145/3578357.3591219 Metadata
      Abstract
      The Android operating system has evolved significantly since its initial release in 2008. Most importantly, in a continuing effort to increase the run-time performance of mobile applications (apps) and to reduce resource requirements, the way code is executed has transformed from being bytecode-based to a binary-based approach: Apps are still mainly distributed as Dalvik bytecode, but the Android Runtime (ART) uses an optimizing compiler to create binary code ahead-of-time (AOT), just-in-time (JIT), or as a combination of both. These changes in the build pipeline, including increasing obfuscation and optimization of the Dalvik bytecode, invalidate assumptions of bytecode-based static code analysis approaches through identifier renaming and code shrinking. Furthermore, customized apps can be distributed pre-compiled with devices’ firmware, sidestepping the bytecode altogether. Finally, Android apps have always relied on native binary code libraries for performance-critical tasks. We propose to narrow the gap between bytecode and binary code by leveraging the ART compiler’s capability to create well-formed ELF binaries, called OATs, as the basis for further static code analysis. To this end, we created a pipeline to automatically and efficiently compile APKs to OATs into a benchmark dataset of 1,339 apps. We then evaluate five popular disassemblers based on how well they can analyze these OATs based on how well they can detect function boundaries. Our results, in particular, compared to the success rate of two bytecode-based analyzers, demonstrate that our OAT-based approach can help to bring a wider set of code analysis tools and techniques to the area of Android app analysis.
    • Mixed Signals: Analyzing Software Attribution Challenges in the Android Ecosystem
      Hageman, K., Feal, A., Gamba, J., Girish, A., Bleier, J., Lindorfer, M., Tapiador, J., & Vallina-Rodriguez, N. (2023). Mixed Signals: Analyzing Software Attribution Challenges in the Android Ecosystem. IEEE Transactions on Software Engineering, 49(4), 2964–2979.
      DOI: 10.34726/5296 Metadata
      Abstract
      The ability to identify the author responsible for a given software object is critical for many research studies and for enhancing software transparency and accountability. However, as opposed to other application markets like Apple's iOS App Store, attribution in the Android ecosystem is known to be hard. Prior research has leveraged market metadata and signing certificates to identify software authors without questioning the validity and accuracy of these attribution signals. However, Android application (app) authors can, either intentionally or by mistake, hide their true identity due to: (1) the lack of policy enforcement by markets to ensure the accuracy and correctness of the information disclosed by developers in their market profiles during the app release process, and (2) the use of self-signed certificates for signing apps instead of certificates issued by trusted CAs. In this paper, we perform the first empirical analysis of the availability, volatility and overall aptness of publicly available market and app metadata for author attribution in Android markets. To that end, we analyze a dataset of over 2.5 million market entries and apps extracted from five Android markets for over two years. Our results show that widely used attribution signals are often missing from market profiles and that they change over time. We also invalidate the general belief about the validity of signing certificates for author attribution. For instance, we find that apps from different authors share signing certificates due to the proliferation of app building frameworks and software factories. Finally, we introduce the concept of an attribution graph and we apply it to evaluate the validity of existing attribution signals on the Google Play Store. Our results confirm that the lack of control over publicly available signals can confuse automatic attribution processes.
    2022
    • No Spring Chicken: Quantifying the Lifespan of Exploits in IoT Malware Using Static and Dynamic Analysis
      Al Alsadi, A. A., Sameshima, K., Bleier, J., Yoshioka, K., Lindorfer, M., van Eeten, M., & Hernández Gañán, C. (2022). No Spring Chicken: Quantifying the Lifespan of Exploits in IoT Malware Using Static and Dynamic Analysis. In Yuji Suga, Kouichi Sakurai, Xuhua Ding, & Kazue Sako (Eds.), ASIA CCS ’22: Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security (pp. 309–321). Association for Computing Machinery.
      DOI: 10.1145/3488932.3517408 Metadata
      Abstract
      The Internet of things (IoT) is composed by a wide variety of software and hardware components that inherently contain vulnerabilities. Previous research has shown that it takes only a few minutes from the moment an IoT device is connected to the Internet to the first infection attempts. Still, we know little about the evolution of exploit vectors: Which vulnerabilities are being targeted in the wild, how has the functionality changed over time, and for how long are vulnerabilities being targeted? Understanding these questions can help in the secure development, and deployment of IoT networks. We present the first longitudinal study of IoT malware exploits by analyzing 17,720 samples collected from three different sources from 2015 to 2020. Leveraging static and dynamic analysis, we extract exploits from these binaries to then analyze them along the following four dimensions: (1) evolution of infection vectors over the years, (2) exploit lifespan, vulnerability age, and the time-to-exploit of vulnerabilities, (3) functionality of exploits, and (4) targeted IoT devices and manufacturers. Our descriptive analysis uncovers several patterns: IoT malware keeps evolving, shifting from simply leveraging brute force attacks to including dozens of device-specific exploits. Once exploits are developed, they are rarely abandoned. The most recent binaries still target (very) old vulnerabilities. In some cases, new exploits are developed for a vulnerability that has been known for years. We find that the mean time-to-exploit after vulnerability disclosure is around 29 months, much longer than for malware targeting other environments.
Presentations (created while at TU Wien)
    2022
    • ART-assisted App Diffing: Defeating Dalvik Bytecode Shrinking, Obfuscation, and Optimization with Android's OAT Compiler
      Bleier, J., & Lindorfer, M. (2022, May 23). ART-assisted App Diffing: Defeating Dalvik Bytecode Shrinking, Obfuscation, and Optimization with Android’s OAT Compiler [Poster Presentation]. 43rd IEEE Symposium on Security and Privacy, San Francisco, United States of America (the).
      Metadata
      Abstract
      Android aims to provide a secure and feature-rich, yet resource-saving platform for its applications (apps). To achieve these goals, the compilation to distributable packages shrinks, obfuscates, and optimizes the code by default. As an additional optimization, the Android Runtime (ART) nowadays compiles the app’s bytecode to native code on the device instead of executing it in the Dalvik VM. We study the effects of these changes in the Android build and runtime environment on the problem of calculating app similarity. We compare existing bytecode-based tools to our novel approach of using the recompiled (and optimized) binary form. We propose OATMEAL, an extensible framework to generate reliable ground truth for evaluating app similarity approaches and provide a benchmark dataset to the community. We built this dataset from open-source apps available on F-Droid in various configurations that optimize and obfuscate the bytecode. Using this dataset, we show the limitations of existing Android-specific bytecode analysis approaches when faced with the new optimizing R8 bytecode compiler. We further demonstrate how well BinDiff, a state-of-the-art binary-based alternative, works in scoring the similarity of apps. With OATMEAL, we provide the foundation for integrating and benchmarking further approaches, both for calculating the similarity between apps (based on bytecode or binary code), and for evaluating their robustness to evolving optimization and obfuscation techniques.