Android Malware Detection Using Code Item Grayscale Images and a Convolutional Neural Network

  1. (Dept. of Computer Science and Engineering, Sun Moon University, Korea spil3141@naver.com )
  2. (Dept. of Computer Science and Engineering, Sun Moon University, Korea young@sunmoon.ac.kr )



Keywords: Android malware detection, Code item, Convolutional neural network, Grayscale image, Static analysis

1. Introduction

Smartphones have become a daily necessity. In the smartphone market, Android is the most used operating system (OS), and it is still expanding its market share. This popularity also makes Android a target for developers with malicious intentions. Compared to other platforms, Android is vulnerable because it allows the installation of applications from multiple third-party markets. Most third-party markets have no anti-malware screening, so the chance of downloading a malicious application is high. Developers of malicious applications keep devising attacks that are difficult to detect, and mobile malware continues to grow while finding new ways to avoid detection [8]. Hence, there is a vital need for Android malware security.

The methods used in malware analysis can be divided into two groups: static and dynamic methods. Static analysis is widely used by researchers and industry. It relies on scanning disassembled code without executing the application to capture information. The file is disassembled to obtain both syntactic and semantic information by exploiting API calls, permission lists, and opcodes. In contrast, dynamic analysis involves methods that can monitor the behavior of applications at runtime.

This study proposes a technique to detect Android malware effectively based on converting malware binaries into images and then applying a machine learning technique to them. Other methods merely convert the data section of the classes.dex file and use it as features. However, our technique converts only a part of the data section, the code item, into an image. The code item section is shown in Fig. 1 and was inspired by previous work [3]. The code item is the target of our APK file pre-processing step.

The rest of this paper is arranged as follows. Section 2 discusses previous works. Our methodology is explained in section 3. Experiments and results with related information are presented in section 4. Finally, section 5 concludes the paper.

Fig. 1. The DEX file structure.
../../Resources/ieie/IEIESPC.2021.10.2.116/fig1.png

2. Related Work

Many methods detect Android malware by converting application packages into images and classifying them with a convolutional neural network (CNN) [1, 3, 5, 12-18]. These methods can be organized by how they analyze and engineer data. Static and dynamic analysis approaches using generic, machine learning, or deep learning methods are the best-known approaches.

2.1 Machine Learning and Deep Learning

Some methods that engineer features using dynamic analysis are TaintDroid [9], DroidRanger [10], and DroidScope [11]. TaintDroid provides real-time analysis by leveraging Android’s virtualized execution environment to detect malicious behavior of third-party Android applications. DroidScope utilizes virtualization-based malware analysis to reconstruct both OS-level and Java-level semantics. A few related works take a more static route and focus on generating colored images from the bytecode of the whole DEX file. Gamut converts DEX files into images with a user-controlled level of semantics [3]. R2-D2 decompresses Android application packages to retrieve classes.dex (the DEX file) and maps the malware application bytes to an RGB color image using a pre-defined rule [1].

Malware detection approaches that do not utilize machine learning or deep learning pattern recognition have a noticeable caveat. For example, even though dynamic analysis is effective at identifying malicious activities at runtime, it introduces runtime overhead. A static analysis method without machine learning or deep learning works well but can easily be dodged by malware developers, who can trick the disassemblers into producing incorrect code. Malware detection based on machine learning has been introduced to mitigate these limitations, and our research expands on these approaches by focusing on a sub-section of the DEX file called the code item and utilizing deep learning. Machine learning has several advantages:

· It can handle malware variants

· It can detect unknown or packed malware

· It does not require an Android emulator environment

· It can achieve high code coverage

Similar research to ours, which used the data section of the DEX file, achieved a storage reduction of 17.5% on average. Our results show a lower performance overhead. Our goal was to determine and utilize the sections that best represent the APK file. This work shows that using only the code item section offers a greater reduction in memory while maintaining acceptable generalization performance.

3. Methods

Our method converts APK files to grayscale images and then trains a deep learning model for classification using the generated images. First, APK apps were processed using Androguard [2] to analyze each APK file and gain access to the Dalvik executable (DEX/classes.dex) files. The exact classes used for obtaining the code item byte data are APK and DalvikVMFormat. Because the code item bytes cannot be extracted directly through the default API, the original open-source Androguard code that parses the Dalvik object bytes was modified. In the end, the hexadecimal string representation of the code item bytes was obtained; the bytecode in classes.dex is represented as hexadecimal.

In the image creation stage, a 2D grayscale image was generated from the parsed hexadecimal string representation of the code item binaries. This hexadecimal string was converted into a byte array, a mutable sequence with elements in the range 0 ≤ x ≤ 255. The result is a one-dimensional array of bytes, so an algorithm was needed to create a two-dimensional image from this one-dimensional vector. The work by Jordy Gennissen [3] was very helpful and contained information on continuous fractal space-filling curve algorithms. The first algorithm is linear plotting, which plots a one-dimensional array of elements row by row, jumping to a new line after a predetermined width is reached.
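The linear plotting step can be sketched as follows (a minimal illustration, not the authors' exact code; the hex string below is a made-up stand-in for real code item bytes):

```python
def linear_plot(data: bytes, width: int) -> list:
    """Fold a 1-D byte sequence into rows of `width` pixels (one byte = one
    grayscale pixel), zero-padding the last row so the image is rectangular."""
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]

# The code item arrives as a hexadecimal string (hypothetical sample).
hex_string = "6a6f686e00646f65001200ff"
image = linear_plot(bytes.fromhex(hex_string), width=4)
# Each row of `image` is one scanline of the grayscale image.
```

In a real pipeline, the resulting rows would be written out with an image library; here the nested list is enough to show the mapping.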

The other algorithm is a Hilbert curve technique that creates a space-filling curve from a one-dimensional vector by visiting every point in a square grid whose side is a power of two (2×2, 4×4, etc.). The resulting figure is a square image. After testing, the two techniques showed little difference in performance (about a 1% difference in accuracy), so we chose linear plotting as our conversion algorithm.
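For comparison, a standard iterative index-to-coordinate conversion for the Hilbert curve (a common textbook formulation, not necessarily the exact variant used in [3]) looks like this:

```python
def hilbert_d2xy(n: int, d: int):
    """Map distance d along a Hilbert curve to (x, y) on an n-by-n grid,
    where n is a power of two. Consecutive d values land on adjacent cells,
    so nearby bytes stay nearby in the image."""
    x = y = 0
    s = 1
    while s < n:
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        if ry == 0:                      # rotate/flip the quadrant if needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        d //= 4
        s *= 2
    return x, y

# Place each byte of a 1-D vector onto a 4x4 grid along the curve.
data = bytes(range(16))
grid = [[0] * 4 for _ in range(4)]
for d, value in enumerate(data):
    x, y = hilbert_d2xy(4, d)
    grid[y][x] = value
```

Because the curve only fills square grids with power-of-two sides, the byte vector must be padded or truncated to such a size, which is one practical reason linear plotting is simpler to apply.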

The generated images had diverse resolutions based on the size of their bytecodes. Therefore, after converting all the samples into images, we resized their resolution to a fixed size. More information concerning our decision is given in section 3.1.
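The resize step can be illustrated with a simple nearest-neighbor reduction (an illustrative sketch only; the actual experiments presumably used a standard image library's resize routine):

```python
def resize_nearest(img, new_h, new_w):
    """Resize a 2-D list of pixels to (new_h, new_w) by nearest-neighbor
    sampling: each output pixel copies the closest input pixel."""
    h, w = len(img), len(img[0])
    return [[img[i * h // new_h][j * w // new_w] for j in range(new_w)]
            for i in range(new_h)]

small = [[10, 20], [30, 40]]
big = resize_nearest(small, 4, 4)       # upsample 2x2 -> 4x4
back = resize_nearest(big, 2, 2)        # downsample returns the original
```

Downsampling in this way discards bytes, which is the "information loss" the resolution experiments in section 3.1 try to minimize.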

The last stage is the classification stage, which involves a CNN architecture. We tested many popular CNN architectures on our dataset, including InceptionV3 [7], ResNet50, ResNet101, DenseNet121, NASNet, and InceptionResNetV2 [4]. We ultimately chose the InceptionResNetV2 model, which showed the best results.

3.1 Image Resolution and Experiment Environment

Finding the input resolution for our targeted CNN model required testing various image resolutions. Resolutions of 100×100, 128×128, 150×150, and 256×256 were tested. Our goal was to determine the resolution with the least information loss after resizing, but we found little change in model performance across resolutions. Therefore, for our experiment, we chose 100×100.

Another reason for choosing 100×100 was our experiment environment. Table 1 shows the hardware and software libraries used in our experiments. The GPU of our system had limited VRAM, so an input size of 100×100 reduced the memory used for training at a small cost in performance. The initial model training was conducted on a system with an AMD Ryzen 7 2700X 8-core processor, 32 gigabytes of DDR4 RAM, and 3.6 terabytes of storage to hold our dataset in comma-separated value (CSV) format.

Table 1. Experiment parameters.

Label   | Information
CPU     | AMD Ryzen 7 2700X Eight-Core Processor
Memory  | 32 GB
GPU     | NVIDIA GeForce GTX 1080 Ti (11 GB VRAM)
HDD     | 3.6 TB
Library | TensorFlow 2.x, Matplotlib, NumPy, etc.
CUDA    | v10.0
cuDNN   | v7.6.5

3.2 The Architecture of Our Methodology

Fig. 2 shows our technique. There are three main stages: APK file processing, image creation, and classification.

· Stage 1: AndroGuard is used to reverse engineer the APK files and retrieve the classes.dex information as bytecode.

· Stage 2: Using linear plotting, grayscale images are generated using the bytecode.

· Stage 3: An InceptionResNetV2 model is trained on the generated image datasets.

Fig. 2. Architecture.
../../Resources/ieie/IEIESPC.2021.10.2.116/fig2.png
Fig. 3. Dataset split.
../../Resources/ieie/IEIESPC.2021.10.2.116/fig3.png

4. Performance Evaluation

4.1 Dataset

We used two types of APK files: malicious and benign. The APK files came from Google Play, Amazon, APKpure, AMD, and Drebin [6]. Samples were divided into 10,000 malware APKs and 10,000 benign APKs; the benign APKs were obtained from Google Play, Amazon, and APKpure, while the malware APKs came from AMD and Drebin. The samples were cleaned to create a balanced distribution of malware and benign data, and corrupted or damaged APK files were discarded. In the end, 20,000 samples were used, separated into 18,000 for training, 1,000 for validation, and 1,000 for testing.

4.2 Images

After reverse-engineering the APK file and retrieving the code item binary as a one-dimensional vector (an array of parsed bytes), we converted the 1D vector into two dimensions to form a grayscale image. This can be done using the plotting algorithms explained in the methods section. Fig. 4 shows examples of samples generated with the linear plotting algorithm after resizing to a fixed 2D resolution. The generated images cannot be distinguished with the naked eye, which is why we used a deep learning classification model.

Fig. 4. Generated images.
../../Resources/ieie/IEIESPC.2021.10.2.116/fig4.png

4.3 Model Performance

Tables 2-4 show the results from the evaluations. Images were generated from 20,000 APK files, with resolutions depending on the size of the code item binary. Afterward, all the images were resized to a fixed resolution corresponding to the input layer of the CNN (100×100). Using the 20,000 generated images (10,000 malware and 10,000 benign), an InceptionResNetV2 CNN model was trained using a stochastic gradient descent (SGD) optimizer with a vanilla hyperparameter setup (a learning rate of 0.01, 10 epochs, a batch size of 100, etc.).

Table 2. Experiment results.

Evaluation          | DEX Image (100×100) | Code Item Image (100×100)
Training accuracy   | 98%                 | 98%
Validation accuracy | 94%                 | 89%
Test accuracy       | 94%                 | 90%
F1 score            | 0.94                | 0.90

Table 3. DEX file confusion matrix.

Actual Class | Predicted Positive | Predicted Negative
Positive     | 469                | 31
Negative     | 12                 | 488

Table 4. Code item confusion matrix.

Actual Class | Predicted Positive | Predicted Negative
Positive     | 464                | 36
Negative     | 44                 | 456
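As a quick sanity check, the standard metrics can be recomputed from the code item confusion matrix (Table 4 counts, treating the positive class as malware; small differences from the rounded figures in Table 2 are expected):

```python
# Counts from the code item confusion matrix (Table 4).
tp, fn = 464, 36     # actual positives predicted positive / negative
fp, tn = 44, 456     # actual negatives predicted positive / negative

accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
# accuracy and f1 both come out near 0.92, in line with Table 2's rounded values.
```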

4.4 Memory Comparison

This study compared 2,000 APKs to calculate the minimum, maximum, and average size of the code item section relative to the whole DEX file. Of the 2,000 APKs, 1,000 were malware and 1,000 were benign. The experiments led to the observations in Tables 5 and 6.

The size comparison tables show that the code item sections occupy approximately 44.6% of the DEX files. This implies that memory usage can be reduced by 55.4% when using only the code item section for Android malware detection.

Table 5. Size ratios.

Dataset | Min.   | Max.   | Avg.
Benign  | 3.4%   | 48.8%  | 45.56%
Malware | 15.87% | 47.26% | 43.68%
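The approximately 44.6% figure follows directly from the per-class averages in Table 5 (a simple unweighted mean, which is valid here because the two classes have equal sample counts):

```python
# Average code item / DEX size ratios (%) from Table 5.
benign_avg, malware_avg = 45.56, 43.68

code_item_share = (benign_avg + malware_avg) / 2   # overall average share
memory_saving = 100 - code_item_share              # bytes that can be skipped
# code_item_share is about 44.62%, so roughly 55.4% of DEX bytes are avoided.
```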

Table 6. Size comparison.

Dataset                              | Min.     | Max.        | Avg.
Size of benign DEXes                 | 6.19 KB  | 10,120.7 KB | 3,320.7 KB
Size of malicious DEXes              | 2.7 KB   | 6,098.5 KB  | 1,418.9 KB
Size of code item in benign DEXes    | 0.21 KB  | 4,898.5 KB  | 1,513.0 KB
Size of code item in malicious DEXes | 0.429 KB | 2,882.4 KB  | 619.8 KB

4.5 Execution Time of Conversion

In an experiment, a single 1.6-GB sample was selected from the dataset as our target; a single APK file served as a good representative for measuring the execution time of our image generation approach. We measured the time each algorithm (code item-to-image and DEX-to-image) took to complete the image generation process. The code item conversion took about 1.92 seconds, while the whole-DEX conversion took 2.27 seconds, making the code item conversion about 15% faster.
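The "about 15%" claim is just the relative reduction in conversion time, computed from the two measurements above:

```python
# Measured conversion times (seconds) from the single-sample experiment.
dex_time, code_item_time = 2.27, 1.92

speedup_pct = (dex_time - code_item_time) / dex_time * 100
# About 15.4%, consistent with the "about 15% faster" statement.
```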

5. Conclusion

This research adopted deep learning to construct an Android malware detection technique that converts Android APK binaries into images for classification. Our experimental results indicate that image generation is faster when using the code item section than when using the whole DEX file or the data section: the overall execution time of image generation decreased by 15%. In future work, higher performance will be our objective. The reduction in byte size when using the code item section leaves room for a hybrid system that combines the code item with other representative features while still keeping the data size low.

ACKNOWLEDGMENTS

REFERENCES

[1] Huang, T. H., Kao, H. Y., "R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections," IEEE BigData 2018, pp. 2633-2642, Dec. 2018.
[2] Desnos, A., Androguard Documentation, Release 3.4.0, 2019.
[3] Gennissen, J., Blasco, J., "Gamut: Sifting through Images to Detect Android Malware," June 2017.
[4] Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A. A., "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning," Thirty-First AAAI Conference on Artificial Intelligence, Feb. 2017.
[5] Nataraj, L., Karthikeyan, S., Jacob, G., "Malware Images: Visualization and Automatic Classification," Proceedings of the 8th International Symposium on Visualization for Cyber Security, p. 4, ACM, 2011.
[6] Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K., Siemens, C. E., "Drebin: Effective and Explainable Detection of Android Malware in Your Pocket," NDSS, Vol. 14, pp. 23-26, Feb. 2014.
[7] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., "Rethinking the Inception Architecture for Computer Vision," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818-2826, 2016.
[8] McAfee Mobile Threat Report Q1, 2020.
[9] Enck, W., Gilbert, P., Chun, B.-G., Cox, L. P., Jung, J., McDaniel, P., Sheth, A., "TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones," Proc. of USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 393-407, 2010.
[10] Zhou, Y., Wang, Z., Zhou, W., Jiang, X., "Hey, You, Get Off of My Market: Detecting Malicious Apps in Official and Alternative Android Markets," Proc. of Network and Distributed System Security Symposium (NDSS), 2012.
[11] Yan, L.-K., Yin, H., "DroidScope: Seamlessly Reconstructing OS and Dalvik Semantic Views for Dynamic Android Malware Analysis," Proc. of USENIX Security Symposium, 2012.
[12] Vidas, T., Christin, N., "Evading Android Runtime Analysis via Sandbox Detection," Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security (ASIA CCS '14), Kyoto, Japan, June 2014.
[13] Yang, C., Xu, Z., Gu, G., Yegneswaran, V., Porras, P., "DroidMiner: Automated Mining and Characterization of Fine-grained Malicious Behaviors in Android Applications," Proceedings of the 19th European Symposium on Research in Computer Security (ESORICS '14), Wroclaw, Poland, Sept. 2014.
[14] Hardy, W., Chen, L., Hou, S., Ye, Y., "DL4MD: A Deep Learning Framework for Intelligent Malware Detection," International Conference on Data Mining (DMIN), 2016.
[15] Krizhevsky, A., Sutskever, I., Hinton, G. E., "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems 25 (NIPS 2012), Lake Tahoe, pp. 1097-1105, 2012.
[16] Simonyan, K., Zisserman, A., "Very Deep Convolutional Networks for Large-Scale Image Recognition," International Conference on Learning Representations (ICLR 2015), San Diego, CA, 2015.
[17] Saxe, J., Berlin, K., "Deep Neural Network Based Malware Detection Using Two Dimensional Binary Program Features," 10th International Conference on Malicious and Unwanted Software (MALWARE), Fajardo, 2015.
[18] Yuan, Z., Lu, Y., Wang, Z., Xue, Y., "Droid-Sec: Deep Learning in Android Malware Detection," Proceedings of the 2014 ACM Conference on SIGCOMM, Chicago, Illinois, USA, 2014.

Author

Seung-Pil W. Coleman
../../Resources/ieie/IEIESPC.2021.10.2.116/au1.png

Seung-Pil W. Coleman received his B.S. degree in Computer Engineering and Electronics from Sun Moon University, Korea, in 2018. He is currently a graduate student at Sun Moon University, Korea.

Young-Sup Hwang
../../Resources/ieie/IEIESPC.2021.10.2.116/au2.png

Young-Sup Hwang received his Ph.D. from the Department of Computer Science and Engineering, POSTECH, Korea, in 1997. He is currently a Professor in the Division of Computer Science and Engineering, Sun Moon University, Korea. His research interests include pattern recognition, machine learning, and neural networks.