CN109902487B

CN109902487B - Android application malicious property detection method based on application behaviors

Info

Publication number: CN109902487B
Application number: CN201711296153.5A
Authority: CN
Inventors: 俞研; 黄兴远; 苏铓; 黄婵颖; 付安民; 王永利
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2017-12-08
Filing date: 2017-12-08
Publication date: 2022-09-13
Anticipated expiration: 2037-12-08
Also published as: CN109902487A

Abstract

The invention discloses an Android application maliciousness detection method based on application behaviors. In view of the characteristics of the Android system, factors such as the component life cycle, asynchronous call functions, and call relations among components are taken into account and handled accordingly, which ensures the completeness of the analysis and yields a complete control flow graph and function call graph of the Android application. Reliable security-sensitive behavior path information is then obtained by defining security-sensitive functions and combining reverse analysis with program slicing. Finally, the extracted behavior paths are used to train a convolutional neural network, one kind of deep learning model, and the trained model can be used to detect maliciousness in unknown Android applications. The method can effectively extract all behavior paths in an Android application that may be related to malicious behavior and retains the key information in those paths for subsequent analysis. Because the behavior paths accurately depict the specific behaviors of the application, an analysis model based on them achieves better detection accuracy.

Description

Android application maliciousness detection method based on application behaviors

Technical Field

The invention relates to Android application detection methods, and in particular to an Android application maliciousness detection method based on static analysis.

Background

The Android system is a mobile operating system introduced by Google, and its usage and market share have grown rapidly owing to its inherent openness and customizability. According to data from the overseas market research company Kantar Worldpanel, the market share of Android phones in China rose to 86.4% in the first quarter of 2017, making Android the leading mobile operating system.

Because the Android system has so many mobile phone users, developing malicious applications can bring large profits to their developers. At the same time, the open-source nature of the Android system introduces potential security risks for Android applications and makes it easier for attackers to develop malicious ones. Driven by this combination of profit and ease of development, malicious applications on the Android platform keep emerging in many different forms. As many as 27,000 malicious applications were collected on the VirusShare website between 2012 and 2014 alone, and the number appearing each year is still increasing.

Android malicious application detection relies mainly on two approaches: static analysis and dynamic analysis. Dynamic detection can capture the behavior that is actually executed, but because dynamic testing is random, the coverage of the examined code is low and the detection precision is not high. Static detection works by analyzing bytecode files, so its code coverage is high, but the analysis time is long, and the large amount of code and the many branch jumps easily cause path explosion. Some research combines machine learning: features that may be related to malicious behavior are selected manually, and malicious applications are distinguished by a trained model. This approach often depends too heavily on human judgment, and in most cases the selected features are coarse-grained, so they can hardly describe specific behaviors precisely.

Because static analysis of all execution paths of an Android application suffers from path explosion and path completeness problems, existing behavior analysis methods for Android applications usually focus on one specific feature of a behavior and do not analyze the execution paths or runtime behavior of the application. Moreover, if application behaviors are described at a fine-grained code level, the behavior description becomes accurate, but a simple machine learning model cannot be trained to classify it, because a shallow machine learning model has limited expressive power and cannot learn a representation of such complex information.

Disclosure of Invention

The invention aims to provide an Android application maliciousness detection method that is based on application behaviors and uses deep neural network learning.

The technical solution for realizing the purpose of the invention is as follows: an Android application malice detection method based on application behaviors comprises the following steps:

1) decompiling the Android application to be detected to obtain the bytecode file of the application, analyzing the bytecode file, and constructing an intra-method control flow graph (CFG) and an intra-component call graph (CG);

2) analyzing a Manifest configuration file, acquiring component information defined in the file, analyzing an Android application code, searching a code segment related to function calling among components, and generating a calling relation among the components;

3) analyzing the Android application code, analyzing related function calls of an asynchronous event processing mechanism, and generating an asynchronous event call relation;

4) adding the calling relation between the components and the asynchronous event processing calling relation into CG, and defining the CG added with the two calling relations as an Android application extended call graph AECG;

5) taking security-sensitive function calls as starting points, searching the AECG backward by reverse analysis for all execution paths that can reach each such call, to obtain the set of security-sensitive behavior paths of the Android application, and performing slice analysis on each path in the set to obtain the instruction sequence of the security-sensitive behavior path;

6) analyzing the instruction sequence of each security-sensitive behavior path, extracting the instruction opcodes and the key operands related to method parameters and return value types, and then converting them into a text sequence;

7) training a custom-designed convolutional neural network on the text sequences, using the known benign or malicious nature of each Android application as the label, to construct a behavior-based deep neural network classification model for Android application maliciousness detection;

8) for an Android application whose benign or malicious nature is unknown, performing the operations of steps 1) to 6) and inputting the resulting text sequence into the neural network classification model constructed in step 7) to obtain a detection result.
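Step 6) can be pictured as a small tokenization pass. The following is a minimal sketch under stated assumptions, not the patented implementation: the instruction record layout (opcode, parameter types, return type), the helper names `to_text_sequence`, `build_vocab` and `encode`, and the fixed-length padding scheme are all hypothetical and are not given in the patent.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction record: (opcode, parameter types, return type).
Instruction = Tuple[str, List[str], str]

PAD, UNK = "<pad>", "<unk>"


def to_text_sequence(path: List[Instruction]) -> List[str]:
    """Flatten a behavior path into tokens: each opcode, then its key operands."""
    tokens: List[str] = []
    for opcode, param_types, return_type in path:
        tokens.append(opcode)
        tokens.extend(param_types)
        if return_type:
            tokens.append(return_type)
    return tokens


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every distinct token, in first-seen order."""
    vocab = {PAD: 0, UNK: 1}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], length: int) -> List[int]:
    """Map tokens to ids, truncating or padding to a fixed length."""
    ids = [vocab.get(t, vocab[UNK]) for t in tokens[:length]]
    return ids + [vocab[PAD]] * (length - len(ids))


# Illustrative path only; the values are made up for the example.
example_path: List[Instruction] = [
    ("invokevirtual", ["java.lang.String"], "void"),
    ("if", [], ""),
    ("return", [], ""),
]
example_tokens = to_text_sequence(example_path)
example_vocab = build_vocab([example_tokens])
example_ids = encode(example_tokens, example_vocab, 8)
```

Because only opcodes and type-level operands are kept, the vocabulary stays finite even though concrete code varies without bound, which is what makes the path usable as classifier input.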

Compared with the prior art, the invention has the following notable advantages: (1) to address the inherent multi-component and event-driven nature of the Android system and its applications, a static analysis method is adopted that combines analysis of component life-cycle methods, event callback methods, and inter-component calling methods, so that complete and accurate behavior path information of the Android application is obtained; (2) sensitive behavior paths that may endanger the security of system and user resources are extracted and analyzed through reverse analysis and program slicing, which avoids path explosion, shortens analysis time, and improves analysis efficiency; (3) the method analyzes only function call information and control flow information and does not perform extensive data flow analysis, which makes the analysis of large-scale applications feasible; (4) the Android executable file is analyzed and converted into the Jimple intermediate language, so source code is not needed, and the path analysis problem is converted into a text classification problem of the kind found in natural language processing, so the behavior features of the Android application can be acquired automatically by a convolutional neural network; (5) a deep learning model is used for training, and the ability of the convolutional neural network to acquire features automatically enables automatic classification learning of the security-sensitive behaviors of Android applications.

Drawings

Fig. 1 is a framework diagram of an application behavior-based Android application malice detection system according to the present invention.

Fig. 2 is a flowchart of Android application security-sensitive behavior path extraction.

Detailed Description

The invention provides a static analysis method for Android application behavior analysis. Taking the characteristics of the Android system into account, the method uses the function call graph and control flow graphs of the Android application together with reverse analysis and program slicing to extract the complete behavior paths that may contain malicious behaviors, and finally detects the maliciousness of the Android application with a deep learning model.

The principle of the invention is as follows. Malicious functionality is usually hidden inside legitimate functional code, and the inherent multi-component and event-driven nature of the Android system allows malicious code to be fragmented and concealed, which makes the malicious functionality of an Android application harder to analyze. The method first uses static analysis to analyze the control flow and data flow of the Dalvik executable file (dex file) of the Android application, and combines this with analysis of Android life-cycle methods, event callback methods, and the inter-component message mechanism and its handling methods, so that a complete Android application dependency graph is constructed. The purpose of a malicious Android application is to steal private information or perform malicious operations, that is, to access or operate on sensitive system resources, and access to sensitive resources in the Android system is performed by calling system APIs. Because the full dependency graph is too complex for the application's behaviors to be analyzed accurately, the method uses reverse analysis and program slicing guided by sensitive API calls to extract security-sensitive behavior path information and the related execution context, which serve as input data for the subsequent machine-learning-based malicious application classification model. The multilayer structure of a deep learning model has strong expressive power and can acquire features from the input data automatically, so it is well suited to classifying complex information such as Android application behavior paths. The method converts the path analysis problem into a text classification problem of the kind found in natural language processing, uses the ability of the convolutional neural network to acquire features automatically to learn to classify the security-sensitive behaviors of Android applications, and thereby detects malicious Android applications.
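To make the text-classification framing concrete, the following is a minimal sketch of the forward pass of a one-layer convolutional text classifier in plain Python. The patent does not specify the network architecture, so everything here is an assumption: the embedding lookup, the filter width, the max-over-time pooling, the sigmoid output, and the names `conv1d_relu` and `predict`. The weights are random and untrained, so the score only demonstrates the data flow and carries no meaning.

```python
import math
import random
from typing import List

Vector = List[float]


def conv1d_relu(embedded: List[Vector], kernel: List[Vector], bias: float) -> Vector:
    """Slide one filter of width len(kernel) over the token embeddings."""
    width = len(kernel)
    out: Vector = []
    for start in range(len(embedded) - width + 1):
        total = bias
        for k in range(width):
            total += sum(w * x for w, x in zip(kernel[k], embedded[start + k]))
        out.append(max(0.0, total))  # ReLU activation
    return out


def predict(token_ids: List[int], embedding: List[Vector],
            kernels: List[List[Vector]], biases: Vector,
            out_weights: Vector, out_bias: float) -> float:
    """Embed tokens, convolve, max-pool over time, then apply a sigmoid."""
    # Assumes the sequence is at least as long as the filter width.
    embedded = [embedding[i] for i in token_ids]
    pooled = [max(conv1d_relu(embedded, k, b)) for k, b in zip(kernels, biases)]
    logit = out_bias + sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


# Untrained, randomly initialised parameters (illustration only).
rng = random.Random(0)
VOCAB_SIZE, DIM, WIDTH, FILTERS = 7, 4, 2, 3
embedding = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]
kernels = [[[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(WIDTH)]
           for _ in range(FILTERS)]
biases = [0.0] * FILTERS
out_weights = [rng.uniform(-1, 1) for _ in range(FILTERS)]

score = predict([2, 3, 4, 5, 6, 0, 0, 0], embedding, kernels, biases, out_weights, 0.0)
```

In practice the parameters would be learned from labeled behavior paths with a deep learning framework; the pure-Python version is used here only to keep the sketch self-contained.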

The invention will be further explained with reference to the drawings.

Fig. 2 is a flowchart of extracting a security sensitive behavior path of an Android application, which specifically includes the following steps:

Step 1: Decompile the APK file to obtain the dex file information of the application and the information in the manifest file, and convert the dex file into the Jimple intermediate code representation.

Step 2: Analyze the Android application code, use the Soot tool to generate the intra-method control flow graphs (CFG), use the control flow graphs to extract all function call relations within the classes, and combine them with the component life-cycle function call relations of the Android system to generate the intra-component call graph (CG).

Step 3: Analyze asynchronous events and inter-component communication separately, and add the resulting asynchronous call relations and inter-component call relations to the intra-component call graph generated in step 2 to produce the Android Extended Call Graph (AECG). The specific analysis method is as follows:

(1): For the asynchronous event processing mechanism, three kinds of events are mainly analyzed: thread starting, asynchronous tasks, and message passing. Analyze the Android application code, search for the start function calls of these three kinds of events, determine the function that is actually executed for each start function call, generate a call relation from the start function call to the executed function, and add it to the asynchronous event call relations;

(2): Analyze the Manifest configuration file and obtain the component information and intent-filter information defined in it. Analyze the Android application code, search for function calls used for inter-component communication, analyze the intent parameters of those calls using data flow analysis, determine the call relations among components from the intent parameters and the component information in the configuration file, and generate the inter-component call relations;

(3): Add the asynchronous call relations and the inter-component call relations generated in (1) and (2) to the CG, supplementing the call relations that the original CG lacks because of Android system mechanisms, to obtain the extended call graph.
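The construction in (1) to (3) can be sketched as a union of edge sets. This is an illustrative sketch under assumptions, not the patented implementation: the call-site tuples, the `manifest_components` mapping from a component class to its entry method, and the function names are hypothetical. The three start-to-execution pairs in `ASYNC_TARGETS` reflect standard Android behavior (`Thread.start()` runs `run()`, `AsyncTask.execute()` runs `doInBackground()`, `Handler.sendMessage()` leads to `handleMessage()`), and a real analysis would cover more cases.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)

# Start call -> method the Android framework actually executes.
ASYNC_TARGETS: Dict[str, str] = {
    "start": "run",                  # Thread.start() -> run()
    "execute": "doInBackground",     # AsyncTask.execute() -> doInBackground()
    "sendMessage": "handleMessage",  # Handler.sendMessage() -> handleMessage()
}


def async_edges(call_sites: List[Tuple[str, str, str]]) -> Set[Edge]:
    """call_sites holds (caller method, receiver class, invoked method name)."""
    edges: Set[Edge] = set()
    for caller, receiver_class, method in call_sites:
        target = ASYNC_TARGETS.get(method)
        if target is not None:
            edges.add((caller, f"{receiver_class}.{target}"))
    return edges


def icc_edges(icc_sites: List[Tuple[str, str]],
              manifest_components: Dict[str, str]) -> Set[Edge]:
    """icc_sites holds (caller method, component resolved from the Intent).

    Only targets declared in the manifest produce an edge; the mapping value
    is the (assumed) entry method of that component.
    """
    edges: Set[Edge] = set()
    for caller, target in icc_sites:
        entry = manifest_components.get(target)
        if entry is not None:
            edges.add((caller, f"{target}.{entry}"))
    return edges


def build_aecg(cg: Set[Edge], async_rel: Set[Edge], icc_rel: Set[Edge]) -> Set[Edge]:
    """AECG = intra-component CG plus asynchronous and inter-component edges."""
    return set(cg) | async_rel | icc_rel


# Illustrative data only.
cg = {("MainActivity.onCreate", "MainActivity.upload")}
sites = [("MainActivity.upload", "UploadTask", "execute"),
         ("MainActivity.upload", "Logger", "log")]
icc = [("MainActivity.onCreate", "SmsService"),
       ("MainActivity.onCreate", "Undeclared")]
manifest = {"MainActivity": "onCreate", "SmsService": "onStartCommand"}
aecg = build_aecg(cg, async_edges(sites), icc_edges(icc, manifest))
```

Representing each relation as a set of `(caller, callee)` pairs keeps the merge a plain set union and makes the later backward search a matter of indexing edges by callee.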

Step 4: Starting from each security-sensitive function call, perform reverse slicing analysis on the Android extended call graph (AECG) generated in step 3, and search for and store all security-sensitive behavior paths that can reach that function call. The specific analysis method is as follows:

(1): Among all function call relations of the AECG, find the edges whose callee is a security-sensitive call, and generate the control flow graph of the caller in each such relation. Perform control-dependence slice analysis on the generated control flow graph to obtain the code execution segment in that graph that reaches the callee;

(2): Continue searching the AECG for edges whose callee is the caller found in the previous step, repeat step (1), and proceed as a depth-first traversal. The search for one behavior path ends when no function call relation in the AECG has the caller from the previous depth as its callee. The code sequences obtained during the search are then combined in order of depth to generate a complete security-sensitive behavior path, which is stored. When the full depth-first traversal is complete, all context-dependent path information for reaching the call to the security-sensitive function has been obtained;

(3): Repeat steps (1) and (2) for each of the different security-sensitive system calls to obtain all behavior paths in the Android application that are related to security-sensitive function calls.
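The backward traversal in (1) to (3) can be sketched as a depth-first search over the AECG with its edges indexed by callee. This is a minimal sketch under assumptions: it returns only the chains of method names, omits the per-caller control-dependence slicing of the CFG, and adds a cycle guard that the patent does not describe. The function name `sensitive_paths` and the sample method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def sensitive_paths(aecg: Set[Edge], sensitive: Set[str]) -> List[List[str]]:
    """Return every call chain, entry point first, that reaches a sensitive callee."""
    callers_of: Dict[str, List[str]] = defaultdict(list)
    for caller, callee in aecg:
        callers_of[callee].append(caller)

    paths: List[List[str]] = []

    def walk(node: str, suffix: List[str]) -> None:
        chain = [node] + suffix
        # Skip callers already on the chain so recursive call cycles terminate.
        preds = sorted(c for c in callers_of.get(node, []) if c not in chain)
        if not preds:
            paths.append(chain)  # no further caller: one complete path
            return
        for caller in preds:
            walk(caller, chain)

    for callee in sorted(sensitive):
        if callee in callers_of:
            walk(callee, [])
    return paths


# Illustrative graph only.
graph = {
    ("A.onCreate", "A.collect"),
    ("B.run", "A.collect"),
    ("A.collect", "TelephonyManager.getDeviceId"),
    ("A.onCreate", "A.render"),
}
found = sensitive_paths(graph, {"TelephonyManager.getDeviceId"})
```

Note that `A.render` never appears in the result: starting from the sensitive call and walking backward visits only the methods that can reach it, which is how the target-guided search avoids exploring irrelevant paths.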

In summary, the invention is the first to use context-based behavior traces for maliciousness detection of Android applications, and it provides an Android application security-sensitive behavior analysis method that accurately depicts the specific behavior traces of an application and achieves higher accuracy. When the security-sensitive behaviors are generated, target-guided reverse analysis and program slicing are used, which greatly reduces the analysis of irrelevant paths, alleviates path explosion to a certain extent, and improves efficiency. The invention converts the path analysis problem into a text analysis problem, so that raw code instructions, which cannot be enumerated exhaustively, are converted into a finite text representation that a deep learning model can analyze. Using a convolutional neural network provides automatic feature extraction and avoids the bias introduced by manual feature analysis. Maliciousness analysis based on Android application behaviors is realized with a deep learning model, and the strong expressive power of its multilayer structure gives finer analysis granularity and higher detection precision.

Claims

1. An Android application malice detection method based on application behaviors is characterized by comprising the following steps:

1) decompiling the Android application to be detected to obtain the bytecode file of the application, analyzing the bytecode file, and constructing an intra-method control flow graph (CFG) and an intra-component call graph (CG);

5) taking security-sensitive function calls as starting points, searching the AECG backward by reverse analysis for all execution paths that can reach each such call, to obtain the set of security-sensitive behavior paths of the Android application, and performing slice analysis on each path in the set to obtain the instruction sequence of the security-sensitive behavior path;

7) using the known benign or malicious nature of each Android application as the label, and training a custom-designed convolutional neural network on the text sequences and labels, thereby constructing a behavior-based deep neural network classification model for Android application maliciousness detection;

2. The Android application malice detection method of claim 1, wherein: in step 1), the application to be detected is reverse-engineered using an Android application reverse analysis technique, and the code logic of the Android application is restored into a bytecode file in Jimple form.

3. The Android application maliciousness detection method according to claim 1, characterized in that: in step 1), the control flow graph is constructed using the control flow graph generation method provided by the open-source Soot tool; the bytecode file is analyzed to find the direct call relations it contains; the function call relations related to the component life cycle provided by the Android system are added; and the two kinds of call relations are combined to form the intra-component call graph.

4. The Android application malice detection method of claim 1, wherein: the method for generating the call relations among components in step 2) is as follows: the code segments to be analyzed are determined by searching the bytecode file for function calls related to inter-component communication; the component information contained in each code segment is obtained and matched against the component information extracted from the Manifest file; and an inter-component call relation is generated from the inter-component function call to the matched component.

5. The Android application malice detection method of claim 1, wherein: all function calls related to the asynchronous event processing mechanism in step 3) are indirect function calls, comprising thread start function calls, asynchronous task function calls, and Handler message-passing function calls; the bytecode file is searched for function calls related to the asynchronous event processing mechanism, a call relation is generated from each such call to the function that is actually executed for it, and this relation is defined as an asynchronous event call relation.

6. The Android application maliciousness detection method according to claim 1, characterized in that: the security-sensitive function calls in step 5) are the set of the following four types of function calls: permission-protected Android system function calls, Java reflection function calls, native function calls, and file operation function calls.

7. The Android application maliciousness detection method according to claim 1, characterized in that: the method for acquiring the instruction sequence of the security-sensitive behavior path in step 6) is as follows: slice analysis is performed on the CFG of each function call in the path to obtain, for each function, the code execution segment from the start of its execution to the next function call in the path, and the code execution segments are combined according to the order of the function calls in the path.