In summary, we feel Android malware analysis is trending in the right direction. Many simple solutions and anti-virus products do provide protection against the bulk of malware, but methods such as bytecode signatures are weak against the growing amount of advanced malware [143, 183, 200, 214]. We therefore suggest the following areas for future research, including several addressed in this thesis.
2.5.1
Hybrid Analysis and Multi-Levelled Analysis
Static solutions are beginning to harden against trivial obfuscations [82], but many apps, and most malware, already use higher levels of obfuscation [79, 183]. As current static systems are still effective and scalable, dynamic analysis can be used in conjunction with them, for a more complete analysis, in the cases where obfuscation (e.g., native code, reflection, encryption) is detected. Conversely, dynamic solutions inherently have less code coverage, but can use static analysis to guide analyses through more paths [137, 193, 242], or use app exercisers like MonkeyRunner, manual input, or intelligent event injectors [22, 133, 137]. Hybrid solutions could therefore combine static and dynamic analysis so that the strengths of each mitigate the weaknesses of the other.
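As a minimal sketch of the first hybrid strategy above, the following Python fragment routes an app to dynamic analysis only when a static pass has flagged obfuscation. The `StaticReport` type, the indicator names, and the two-queue design are assumptions made for illustration; they do not describe any particular framework.

```python
from dataclasses import dataclass

# Hypothetical indicator names, mirroring the examples given in the text.
# A real static analyser would derive them from bytecode and bundled libraries.
OBFUSCATION_INDICATORS = ("native_code", "reflection", "encryption")


@dataclass
class StaticReport:
    """Illustrative summary of what a static pass observed in one app."""
    app_id: str
    indicators: frozenset  # subset of OBFUSCATION_INDICATORS


def needs_dynamic_analysis(report: StaticReport) -> bool:
    """Escalate when any obfuscation indicator was detected statically."""
    return any(name in report.indicators for name in OBFUSCATION_INDICATORS)


def plan_analysis(reports):
    """Split apps into a static-only queue and a static-plus-dynamic queue."""
    static_only, hybrid = [], []
    for report in reports:
        queue = hybrid if needs_dynamic_analysis(report) else static_only
        queue.append(report.app_id)
    return static_only, hybrid
```

Under these assumptions, the scalable static pass handles every app, and the more expensive dynamic pass is reserved for the apps where static results are likely to be incomplete.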
It also seems beneficial to develop multi-level systems, as they often provide more, and richer, features. Furthermore, in a multi-level analysis, it would be harder for malware to hide actions if multiple layers of the Android architecture are being monitored. Moreover, while some layers may be tampered with by malware, or monitors on them may only be able to analyse some Android versions, monitors on the lower levels should be able to function securely across all OS versions. Parallel processing could also greatly enhance multi-level analyses and provide faster detection systems [66]. The downside of this multi-level method, however, is that it can cause large additional overhead, decrease transparency, and be less portable. Hence, at this point in time, we believe that the most desirable techniques enable the analysis of multiple layers from a single low point of observation.
2.5.2
Code Coverage
As mentioned previously, code coverage is essential for complete, robust malware analyses. Statically, this can be difficult when dealing with dynamically loaded code, native code, and network-based activity. Dynamically, this is challenging as only one path is shown per execution, user interactions are difficult to automate, and malware may have split-personality behaviours. Considering the launch of ART [218], there are several benefits to dynamic out-of-the-box solutions, such as being able to cope with available Android versions and preventing malware from evading analysis with native code or reflection. For example, system-call centric analysis is out-of-the-box but can still analyse Android-level behaviours and dynamic network behaviours (see Chapter 3) and can be used to stop certain root exploits [215]. While hybrid solutions and smarter stimulations would greatly increase code coverage, different approaches should be further researched based on malware trends. For example, while manual input is normally not scalable, crowd sourcing [87] may be an interesting approach. However, zero-day malware will introduce complications, as time is needed to create and collect user input traces.
This also introduces an interesting question on whether malware tend to use “easily” chosen paths to execute more malicious behaviour, or harder paths to avoid detection. This would be an interesting area for future research, as it would help identify malware trends and, in turn, increase the effectiveness of future analyses. Another topic of interest is identifying and understanding subsets of malware behaviour through path restrictions (e.g., removing permissions or triggers such as user-interface or system events), to see what behaviour equates to what permission(s) and/or trigger(s).
We also feel that there needs to be a better understanding of when an event is user-triggered or performed in the background, and how. To increase code coverage, apps should also be run on several different Android OS versions, as different versions have different sets of vulnerabilities (several root exploit examples can be found in Chapter 5). This would be much more difficult to implement if any modifications were made to the runtime or the OS to accommodate high-level analyses.
2.5.3
Hybrid Devices and Virtualization
In addition to smart stimuli, modifying emulators for increased transparency (e.g., realistic GPS, realistic phone identifiers) or using emulators with access to real physical hardware (e.g., sensors, accelerometer) to fool VM-aware malware may prove useful [239]. Newer, more sophisticated, malware from 2014 and 2015 are becoming increasingly aware of emulated environments, but achieving a perfect emulator is, unfortunately, infeasible. Timing attacks, where certain operations are timed for discrepancies, remain an open problem for traditional malware as well. Furthermore, such malware (e.g., DenDroid, Android.HeHe) do not just detect their emulated environments, but often hide their malicious behaviours or tamper with the environment.
Based on a previous study, malware can check several device features to detect emulators. These include, but are not limited to, the device IMEI, routing table, timing attacks, realistic sensory output, and device serial number [162]. It is also possible to fingerprint and identify specific emulated environments, e.g., dynamic analysis frameworks, via the aforementioned device performance features [138]. One solution would be to use real devices in all dynamic experiments. However, this makes analysing large malware sets a laborious and expensive task, as many devices would be needed, as well as a way to restore a device to a clean state for quick, efficient, and reliable analysis.
Thus it would be interesting to combine real devices and emulators as a hybrid solution, where real devices pass necessary values to emulators to enhance their transparency. Real device data can also be slightly or randomly altered and fed to multiple emulators. This would ideally reduce the cost and duration of experiments while revealing more malicious behaviours. This hybrid method has proved to be effective for analysing embedded systems’ firmware [239], and it would be interesting to see if it could work for Android malware detection and analysis, and how effective it would be against VM-aware malware.
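To illustrate how real device data might be "slightly or randomly altered and fed to multiple emulators", the sketch below derives several plausible IMEIs from one real value. It assumes the standard 15-digit IMEI layout (8-digit type allocation code, 6-digit serial number, one Luhn check digit); the function names are hypothetical, and how the values are injected into an emulator is left out of scope.

```python
import random


def luhn_check_digit(digits: str) -> str:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def perturb_imei(real_imei: str, rng: random.Random) -> str:
    """Keep the type allocation code, randomise the serial, fix the check digit."""
    if len(real_imei) != 15 or not real_imei.isdigit():
        raise ValueError("expected a 15-digit IMEI")
    tac = real_imei[:8]
    serial = f"{rng.randrange(10**6):06d}"
    body = tac + serial
    return body + luhn_check_digit(body)


def emulator_imeis(real_imei: str, count: int, seed: int = 0):
    """Derive `count` check-digit-valid IMEIs for a pool of emulators."""
    rng = random.Random(seed)
    return [perturb_imei(real_imei, rng) for _ in range(count)]
```

Keeping the type allocation code means each emulator still reports a model-consistent identifier, while the randomised serial avoids every emulator in the pool sharing one value. The same idea would need to be repeated for the other features malware is known to check, such as the serial number and sensor output.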
As an alternative to virtualization, it would be interesting to split the Android kernel so that untrusted system calls are directed to the hardened kernel code. This method has only been applied to a traditional Linux kernel, and it would be interesting to see whether regular application system calls can be redirected to, and monitored by, the hardened part of the “split” kernel [124]. Lastly, we look forward to new technology, such as the new ARM with full virtualization support, and more explorations into ART and its new challenges.
2.5.4
Datasets
Sample selection is essential for accurate analysis. For example, as can be seen from Tables A.1 and A.2, many studies draw from only one market, but due to many social, geographical, and technical factors, different markets host varying amounts of malware from different families [127, 183]. Malware samples should also be chosen from several families, unless testing for very specific behaviours, to provide a more diverse set of behaviours and evasion techniques to analyse. A full list of malware families analysed in Chapter 3 can be found in Appendix B, and more details on the families used in Chapter 4 can be found in [54]. Although the latter does not provide an exhaustive list, the dataset used is larger, more current, and a superset of those in Appendix B.
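One way to act on this recommendation is to sample evenly across markets and families rather than taking whatever a single source provides. The following is a minimal sketch, assuming each sample is described by a hypothetical `(sample_id, market, family)` tuple; it is not the selection procedure used in this thesis.

```python
import random
from collections import defaultdict


def stratified_sample(samples, per_group, seed=0):
    """Draw up to `per_group` samples from every (market, family) group.

    `samples` is an iterable of (sample_id, market, family) tuples; this
    field layout is an assumption made for the sketch.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sample_id, market, family in samples:
        groups[(market, family)].append(sample_id)

    chosen = []
    for key in sorted(groups):  # sorted so runs with one seed are reproducible
        members = sorted(groups[key])
        rng.shuffle(members)
        chosen.extend(members[:per_group])
    return chosen
```

Capping each group prevents a large family from one market from dominating the dataset, at the cost of discarding samples from over-represented groups.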
Similarly, market apps should be chosen from several categories, including some free/paid pairs if possible. This is because a paid app behaves differently from its free version, and popular apps behave differently than unpopular ones. However, as popular apps affect more users, it could be argued that they are more essential for research [144, 175]. In the future, datasets should continue to be expanded by incorporating malware from multiple sources to provide more globally representative datasets. If used correctly, specialized datasets for benchmarking (e.g., DroidBench [18]), testing types of obfuscation (e.g., DroidChameleon [170]), etc., would also be highly useful to identify specific weaknesses or traits in analyses. As the work in this thesis focuses on malware specifically, we draw from several large and diverse datasets. However, in the future, it would be interesting to introduce benign applications into the frameworks in this thesis. One such source of benign applications is PlayDrone, which contains over a million Google Play apps [217].