CN110941828A - Static detection method for Android malware based on AndroGRU - Google Patents

Static detection method for Android malware based on AndroGRU

Info

Publication number
CN110941828A
Authority
CN
China
Prior art keywords
android
gru
input data
similarity calculation
sim
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911106998.2A
Other languages
Chinese (zh)
Other versions
CN110941828B (en)
Inventor
周翰逊
郭薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Morning Intellectual Property Operations Co ltd
Sichuan Hengying Information Technology Service Co ltd
Original Assignee
Liaoning University
Application filed by Liaoning University
Priority to CN201911106998.2A
Publication of CN110941828A
Application granted
Publication of CN110941828B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Devices For Executing Special Programs (AREA)
  • Stored Programmes (AREA)

Abstract

A static detection method for Android malware based on AndroGRU comprises the following steps: 1) decompiling the Android APK file with the Androguard reverse-engineering tool, parsing AndroidManifest.xml and extracting the Intents features used by the Android application; 2) generating a function call graph with a Python script (…py) and extracting sensitive function call sequences from it; 3) training the AndroGRU model on the extracted static features in a model training module, and detecting unknown Android APK samples with the trained AndroGRU model in a detection module. By combining the similarity among Android malware samples with the characteristics of the deep learning GRU model, the invention provides a static detection method for Android malware based on AndroGRU.

Description

Static detection method for Android malware based on AndroGRU
Technical Field
The invention relates to static detection methods for Android malware, and in particular to a static detection method for Android malware based on AndroGRU.
Background
At present, static detection of unknown Android malware generally applies existing deep learning techniques directly. Because these techniques do not take the characteristics of Android malware into account, they do not achieve on Android malware the clear gains they achieve in fields such as image processing.
Disclosure of Invention
To address these problems, the invention provides a static detection method for Android malware based on AndroGRU. First, the Android APK file is reverse-engineered, and sensitive function call sequences and Intents features are extracted from it and used as training data for a deep learning model. For malware, the sensitive function call sequences of different malware samples are similar to some extent. The invention therefore improves the GRU structure using the text similarity principle and proposes AndroGRU, a GRU-based Android malware detection model.
The technical solution adopted by the invention is as follows:
A static detection method for Android malware based on AndroGRU, characterized by comprising the following steps:
1) decompiling the Android APK file with the Androguard reverse-engineering tool, parsing AndroidManifest.xml and extracting the Intents features used by the Android application;
2) generating a function call graph with a Python script (…py) and extracting sensitive function call sequences from the function call graph;
3) training the AndroGRU model on the extracted static features in the model training module, and detecting unknown Android APK samples with the trained AndroGRU model in the detection module.
Step 2) specifically comprises:
2.1) preprocessing the function call graph: filtering it with the Androguard reverse-engineering tool so that it is reduced to contain only sensitive function calls;
2.2) traversing the sensitive function call graph, extracting sensitive function call sequences from it, and using the extracted sequences as training data.
In step 3), the AndroGRU model combines the text similarity principle with the GRU structure: by analyzing the gating mechanism of the GRU, its internal structure is improved, yielding the proposed GRU-based Android malware detection model.
Step 3) comprises:
3.1) Similarity calculation based on input data:
The input data $x_t$ of the GRU is a vectorized representation of the raw data. Computing the similarity between the input data of two adjacent GRU units gives $s = \mathrm{sim}(x_{t-1}, x_t)$, from which the difference information between $x_{t-1}$ and $x_t$ is obtained as $\Delta x = (1 - s)\,x_{t-1}$. The GRU structure that performs similarity calculation on the input data is named InputGRU.
The difference information between the input data of two adjacent GRU units is fed, together with the current input data, into the reset gate and the update gate, so that it controls the information transfer process and more abstract information is learned from it:
$$z_t = \sigma\big(W_z x_t + U_{\Delta x}(1 - \mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_z h_{t-1} + b_z\big) \tag{3}$$
$$r_t = \sigma\big(W_r x_t + U_{\Delta x}(1 - \mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_r h_{t-1} + b_r\big) \tag{4}$$
For the candidate state $\tilde{h}_t$, the input data $x_t$ is selected as the input so that the input information of the current time step is retained:
[Equation (5): formula image for the candidate state $\tilde{h}_t$, not available]
The hidden state $h_t$ is the information learned from the input data and the hidden state; at time step $t$, the hidden state $h_t$ of InputGRU is calculated as follows:
[Equation (6): formula image for the hidden state $h_t$, not available]
Step 3) further comprises:
3.2) Similarity calculation based on hidden states:
The hidden state $h_{t-1}$ input to the GRU at time step $t$ contains all information of the input data of the first $t-1$ time steps, and the hidden state $h_{t-2}$ input to the GRU at time step $t-1$ contains all information of the input data of the first $t-2$ time steps. Analogously to the similarity calculation based on input data, the similarity of $h_{t-1}$ and $h_{t-2}$ is computed as $s = \mathrm{sim}(h_{t-2}, h_{t-1})$, and the difference information between them is $\Delta h = (1 - s)\,h_{t-2}$. The GRU structure that performs similarity calculation on the hidden states is named HiddenGRU.
The similarity of the hidden states $h_{t-2}$ and $h_{t-1}$ output by the two GRU units preceding the current time step $t$ is computed to obtain their difference information, which is fed, together with the current input hidden state $h_{t-1}$, into the reset gate and the update gate:
$$r_t = \sigma\big(W_r x_t + U_r h_{t-1} + U_{\Delta h}(1 - \mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_r\big) \tag{7}$$
$$z_t = \sigma\big(W_z x_t + U_z h_{t-1} + U_{\Delta h}(1 - \mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_z\big) \tag{8}$$
The candidate state $\tilde{h}_t$ and the hidden state $h_t$ are calculated with the same formulas as in the InputGRU structure.
Step 3) further comprises:
3.3) Similarity calculation based on both input data and hidden states:
Performing similarity calculation on the input data and the hidden states simultaneously within the GRU structure allows more abstract information to be learned from the data; the resulting GRU structure is named InputHiddenGRU:
$$r_t = \sigma\big(W_r x_t + U_{\Delta x}(1 - \mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_r h_{t-1} + U_{\Delta h}(1 - \mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_r\big) \tag{9}$$
$$z_t = \sigma\big(W_z x_t + U_{\Delta x}(1 - \mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_z h_{t-1} + U_{\Delta h}(1 - \mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_z\big) \tag{10}$$
The three GRU structures of 3.1) to 3.3) can be represented abstractly by the following formula:
$$\mathrm{NewGRU} = i \cdot \mathrm{InputGRU} + j \cdot \mathrm{HiddenGRU}, \quad i, j \in \{0, 1\} \tag{11}$$
where NewGRU denotes the abstract definition of the GRU structure.
Since $i$ and $j$ can only take the values 0 or 1, the formula represents the following 4 GRU structures:
when $i = 0$ and $j = 0$, the structure degenerates into the native GRU structure;
when $i = 1$ and $j = 0$, the structure is the InputGRU structure, which performs similarity calculation on the input data only;
when $i = 0$ and $j = 1$, the structure is the HiddenGRU structure, which performs similarity calculation on the hidden states only;
when $i = 1$ and $j = 1$, the structure is the InputHiddenGRU structure, which performs similarity calculation on both the input data and the hidden states.
With this scheme, the invention provides a static detection method for Android malware based on AndroGRU. The method combines the characteristics of Android malware with a deep learning model and can significantly improve detection performance.
Description of the drawings:
FIG. 1 is an overall architecture diagram of the present invention.
FIG. 2 is a schematic diagram of the internal structure of InputGRU.
FIG. 3 is a schematic diagram of the internal structure of HiddenGRU.
FIG. 4 is a schematic diagram of the internal structure of InputHiddenGRU.
FIG. 5 is a schematic diagram of the AndroGRU model.
FIG. 6 is a schematic diagram of the internal structure of a GRU.
Detailed Description
A static detection method for Android malware based on AndroGRU, characterized by comprising the following steps:
1) decompiling the Android APK file with the Androguard reverse-engineering tool, parsing AndroidManifest.xml and extracting the Intents features used by the Android application;
2) generating a function call graph with a Python script (…py) and extracting sensitive function call sequences from the function call graph;
2.1) preprocessing the function call graph: filtering it with the Androguard reverse-engineering tool so that it is reduced to contain only sensitive function calls;
2.2) traversing the sensitive function call graph, extracting sensitive function call sequences from it, and using the extracted sequences as training data.
3) Training the AndroGRU model on the extracted static features in the model training module, and detecting unknown Android APK samples with the trained AndroGRU model in the detection module.
The AndroGRU model combines the text similarity principle with the GRU structure: by analyzing the gating mechanism of the GRU, its internal structure is improved, yielding the proposed GRU-based Android malware detection model.
3.1) Similarity calculation based on input data:
The input data $x_t$ of the GRU is a vectorized representation of the raw data. Computing the similarity between the input data of two adjacent GRU units gives $s = \mathrm{sim}(x_{t-1}, x_t)$, from which the difference information between $x_{t-1}$ and $x_t$ is obtained as $\Delta x = (1 - s)\,x_{t-1}$. The GRU structure that performs similarity calculation on the input data is named InputGRU.
The difference information between the input data of two adjacent GRU units is fed, together with the current input data, into the reset gate and the update gate, so that it controls the information transfer process and more abstract information is learned from it:
$$z_t = \sigma\big(W_z x_t + U_{\Delta x}(1 - \mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_z h_{t-1} + b_z\big) \tag{3}$$
$$r_t = \sigma\big(W_r x_t + U_{\Delta x}(1 - \mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_r h_{t-1} + b_r\big) \tag{4}$$
All of the models are based on the GRU, which works as follows:
A recurrent neural network (RNN) is suited to processing time-series data and is widely used in natural language processing. The GRU is a special RNN model that alleviates the vanishing-gradient problem of the native RNN and is widely applied to tasks such as text classification. The GRU uses a gating mechanism to control the state of incoming data without separate memory cells. It has two gates, a reset gate and an update gate, which together control how the GRU learns from the input data and from the hidden state output by the previous GRU unit. The internal structure of the GRU is shown in FIG. 6.
At time step $t$, the hidden state $h_t$ is calculated as follows:
[Formula image for the hidden state $h_t$ of the native GRU, not available]
$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$
[Formula image for the candidate state $\tilde{h}_t$ of the native GRU, not available]
$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$
Here $x_t$ is the input vector at time step $t$, $\sigma$ is the sigmoid activation function, $W_z$, $W_h$ and $W_r$ are mapping matrices, $U_z$, $U_h$ and $U_r$ are weight matrices, and $b$ denotes the bias terms. $x_t$ and $h_{t-1}$ are the inputs of the GRU structure at time step $t$, $r_t$ is the output of the reset gate and $z_t$ is the output of the update gate. The hidden state $h_t$ is learned from the input data and the hidden state under the joint control of the reset gate and the update gate: the reset gate determines how much of the state information of the previous time step $t-1$ is discarded (the smaller its value, the more information is discarded), and the update gate determines how much of that state information is retained. In other words, the reset gate and the update gate store and filter information from the input data and the hidden states. The candidate hidden state $\tilde{h}_t$ stores not only the information contained in the input data at time step $t$ but also the hidden-state information controlled by the reset gate.
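The gate equations above can be illustrated with a minimal pure-Python sketch of one native GRU step. The formulas for $\tilde{h}_t$ and $h_t$ are images missing from the source, so the sketch assumes the common forms $\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)$ and $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$ (some references swap the roles of $z_t$ and $1 - z_t$). The helper names (`gru_cell`, `init_params`) and the parameter dictionary layout are hypothetical.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vadd(*vs):
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def gru_cell(x_t, h_prev, p):
    """One native GRU step; p holds W*, U*, b* for the z, r and h branches."""
    z = sigmoid(vadd(matvec(p["Wz"], x_t), matvec(p["Uz"], h_prev), p["bz"]))  # update gate
    r = sigmoid(vadd(matvec(p["Wr"], x_t), matvec(p["Ur"], h_prev), p["br"]))  # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]  # r_t ⊙ h_{t-1}
    h_cand = [math.tanh(a) for a in vadd(matvec(p["Wh"], x_t), matvec(p["Uh"], reset_h), p["bh"])]
    # Assumed interpolation: h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h~_t
    return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical random initialisation, for illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for g in "zrh":
        p["W" + g] = mat(hidden_dim, input_dim)
        p["U" + g] = mat(hidden_dim, hidden_dim)
        p["b" + g] = [0.0] * hidden_dim
    return p


params = init_params(input_dim=3, hidden_dim=2)
h = [0.0, 0.0]
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):  # toy one-hot inputs
    h = gru_cell(x, h, params)
print(h)
```

Because $h_t$ is a convex combination of $h_{t-1}$ and a $\tanh$ output, every component stays strictly between -1 and 1 when starting from a zero state.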
For the candidate state $\tilde{h}_t$ of InputGRU, the input data $x_t$ is selected as the input so that the input information of the current time step is retained:
[Equation (5): formula image for the candidate state $\tilde{h}_t$, not available]
The hidden state $h_t$ is the information learned from the input data and the hidden state; at time step $t$, the hidden state $h_t$ of InputGRU is calculated as follows:
[Equation (6): formula image for the hidden state $h_t$, not available]
3.2) Similarity calculation based on hidden states:
The hidden state $h_{t-1}$ input to the GRU at time step $t$ contains all information of the input data of the first $t-1$ time steps, and the hidden state $h_{t-2}$ input to the GRU at time step $t-1$ contains all information of the input data of the first $t-2$ time steps. Analogously to the similarity calculation based on input data, the similarity of $h_{t-1}$ and $h_{t-2}$ is computed as $s = \mathrm{sim}(h_{t-2}, h_{t-1})$, and the difference information between them is $\Delta h = (1 - s)\,h_{t-2}$. The GRU structure that performs similarity calculation on the hidden states is named HiddenGRU.
The similarity of the hidden states $h_{t-2}$ and $h_{t-1}$ output by the two GRU units preceding the current time step $t$ is computed to obtain their difference information, which is fed, together with the current input hidden state $h_{t-1}$, into the reset gate and the update gate:
$$r_t = \sigma\big(W_r x_t + U_r h_{t-1} + U_{\Delta h}(1 - \mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_r\big) \tag{7}$$
$$z_t = \sigma\big(W_z x_t + U_z h_{t-1} + U_{\Delta h}(1 - \mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_z\big) \tag{8}$$
The candidate state $\tilde{h}_t$ and the hidden state $h_t$ are calculated with the same formulas as in the InputGRU structure.
3.3) Similarity calculation based on both input data and hidden states:
Performing similarity calculation on the input data and the hidden states simultaneously within the GRU structure allows more abstract information to be learned from the data; the resulting GRU structure is named InputHiddenGRU:
$$r_t = \sigma\big(W_r x_t + U_{\Delta x}(1 - \mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_r h_{t-1} + U_{\Delta h}(1 - \mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_r\big) \tag{9}$$
$$z_t = \sigma\big(W_z x_t + U_{\Delta x}(1 - \mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_z h_{t-1} + U_{\Delta h}(1 - \mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_z\big) \tag{10}$$
The three GRU structures of 3.1) to 3.3) can be represented abstractly by the following formula:
$$\mathrm{NewGRU} = i \cdot \mathrm{InputGRU} + j \cdot \mathrm{HiddenGRU}, \quad i, j \in \{0, 1\} \tag{11}$$
where NewGRU denotes the abstract definition of the GRU structure.
Since $i$ and $j$ can only take the values 0 or 1, the formula represents the following 4 GRU structures:
when $i = 0$ and $j = 0$, the structure degenerates into the native GRU structure;
when $i = 1$ and $j = 0$, the structure is the InputGRU structure, which performs similarity calculation on the input data only;
when $i = 0$ and $j = 1$, the structure is the HiddenGRU structure, which performs similarity calculation on the hidden states only;
when $i = 1$ and $j = 1$, the structure is the InputHiddenGRU structure, which performs similarity calculation on both the input data and the hidden states.
Example 1:
1 Overall framework
The overall architecture of the GRU-based Android malware detection method, shown in FIG. 1, comprises 4 parts: a data collection module, a static feature extraction module, a model training module and a detection module.
The data collection module collects available Android software data sets containing both benign and malicious Android samples; such samples are usually crawled from platforms such as Android application markets and malware forums. The static feature extraction module extracts the sensitive function call sequences and the Intents static features from the Android APK file: first, the APK file is decompiled with the Androguard reverse-engineering tool, AndroidManifest.xml is parsed and the Intents features used by the Android application are extracted; then a Python script (…py) is used to generate a function call graph and extract sensitive function call sequences from it. The model training module trains the AndroGRU model on the extracted static features, and the detection module detects unknown Android APK samples with the trained AndroGRU model.
2 Static feature extraction
When an Android application is published, the developer packages the application's source files. An Android APK file is a compressed package whose size typically ranges from a few KB to tens of MB; training on it directly would consume considerable computing resources and would fail to capture the key information in the file. This section therefore takes the internal files of the Android APK as the objects of study. An Android APK file generally contains META-INF/, res/, libs/, AndroidManifest.xml, classes.dex, resources.arsc and other files, some of which are not human-readable, so a reverse-engineering tool (Androguard) is required to reverse-engineer the APK file and extract static information such as function call graphs, control flow graphs, permissions and Intents.
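Since an APK is a ZIP archive, its internal files can be listed with Python's standard `zipfile` module. The sketch below builds a tiny stand-in archive in memory (the file contents are placeholders, not real APK data) and picks out the manifest and DEX files that the static features come from; the function names are hypothetical. A real AndroidManifest.xml inside an APK is stored in binary form, which is why a decoder such as Androguard is still needed afterwards.

```python
import io
import zipfile


def list_apk_entries(apk_bytes):
    """Return the names of all files stored in an APK (a ZIP archive)."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as zf:
        return zf.namelist()


def entries_of_interest(names):
    """Keep the manifest and the DEX bytecode files used for static features."""
    return [n for n in names if n == "AndroidManifest.xml" or n.endswith(".dex")]


# Build a small stand-in archive in memory; a real APK would be read from disk.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in ("META-INF/MANIFEST.MF", "res/layout/main.xml",
                 "AndroidManifest.xml", "classes.dex", "resources.arsc"):
        zf.writestr(name, b"placeholder")

names = list_apk_entries(buf.getvalue())
print(entries_of_interest(names))  # ['AndroidManifest.xml', 'classes.dex']
```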
In an Android APK file, AndroidManifest.xml provides the information required to install and run the application, and classes.dex contains features that describe its behavior. The bytecode file contains information ranging from coarse granularity, such as packages, to fine granularity, such as instructions. Fine-grained program analysis is complex and computationally expensive, so function-call-level information, which still captures the behavior of the Android application, is extracted instead. This section focuses on the sensitive function call sequences and Intents features extracted from malware.
2.1 Extraction of Intents features
Intents serve as a complex messaging system in the Android operating system: communication within an Android application and between applications is mainly accomplished through Intents, which provide an abstract description of the operation an application is to perform. An Intent consists of three components: action, category and data. The action component describes the type of operation to be performed, such as MAIN, CALL, BATTERY_LOW, SCREEN_ON and EDIT. An Intent also specifies the category to which it belongs, such as LAUNCHER, BROWSABLE and GADGET. The data component provides the data needed by the action; for example, a CALL action requires a telephone number, and an EDIT action requires a document or an HTTP URL. The Intents of an Android application carry rich semantic information, and compared with static features such as permissions, Intents can identify malware more accurately [19].
The method takes all Intents contained in AndroidManifest.xml of the Android APK file as the feature set. Android malware often listens for specific Intents to trigger malicious behavior directly. A typical example is BOOT_COMPLETED, which malware uses to start malicious activity right after the device reboots. Because AndroidManifest.xml is stored in a binary, non-readable format, it has to be parsed with the Androguard reverse-engineering tool before the Intents features can be extracted from it.
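Once AndroidManifest.xml has been decoded to plain-text XML (the decoding step, for example with Androguard, is assumed and not shown), collecting the Intents feature set reduces to walking the `<intent-filter>` elements. A minimal sketch with the standard library; `extract_intents` and the sample manifest are hypothetical, and keeping only action and category names (not data components) is an assumption.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the android: prefix into this namespaced attribute key
ANDROID_NAME = "{http://schemas.android.com/apk/res/android}name"


def extract_intents(manifest_xml):
    """Collect the action and category names declared in all <intent-filter> elements."""
    root = ET.fromstring(manifest_xml)
    features = set()
    for intent_filter in root.iter("intent-filter"):
        for child in intent_filter:
            if child.tag in ("action", "category") and ANDROID_NAME in child.attrib:
                features.add(child.attrib[ANDROID_NAME])
    return sorted(features)


MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.demo">
  <application>
    <receiver android:name=".BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
      </intent-filter>
    </receiver>
    <activity android:name=".MainActivity">
      <intent-filter>
        <action android:name="android.intent.action.MAIN"/>
        <category android:name="android.intent.category.LAUNCHER"/>
      </intent-filter>
    </activity>
  </application>
</manifest>"""

print(extract_intents(MANIFEST))
```

The receiver listening for BOOT_COMPLETED in the toy manifest mirrors the reboot-trigger pattern described above.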
Although Intents are an effective feature for identifying malicious Android applications, prior experiments have shown that using Intents features alone is not the best way to identify malware and that they should be combined with other features. The invention therefore uses sensitive function calls as a second class of features.
2.2 Extraction of sensitive function call sequences from function call graphs
The invention focuses on features extracted from the function call graph (FCG), mainly because the FCG preserves the structural information of the binary file better than, for example, n-gram features. Besides information about the malware code in the form of functions and their code, the FCG also captures the interactions between functions. Static features based on function call graphs provide a powerful representation of malware and have been used successfully to detect malware on Windows systems.
Android APKs are typically written in Java, and the Java source code is compiled into classes.dex. This bytecode file is therefore also reverse-engineered with Androguard, and the function call graph is generated with an Androguard script.
Android malware carries out malicious activities on private data by triggering sensitive functions. The sensitive function calls in the function call graph are crucial because they determine which malicious operations the malware performs. Malware samples of the same family, including variants, generally behave similarly, that is, they call similar sensitive functions. For example, the getVoiceMailNumber() method is called by malware of the Geinimi family, a bot-like malware mainly used to steal private information and send it to a remote server. Sensitive functions frequently called by malware include the following: setWifiEnabled() turns on Wi-Fi, which may allow applications to update without the user's permission and consume excessive data traffic; Runtime.exec() executes external commands, which may leak the user's information or install further malicious software; sendTextMessage(), sendBroadcast() and sendDataMessage() are used to send and receive SMS/MMS messages; and getDeviceId() and getSimSerialNumber() are used to access sensitive information on the handset.
Because the FCG typically contains thousands of nodes, analyzing it directly is time-consuming and computationally expensive, so the invention preprocesses it: the FCG is filtered with the Androguard reverse-engineering tool and reduced to a graph containing only sensitive function calls. This preserves the malicious behavior of the Android application while reducing the complexity of the FCG. To keep the relationships between function calls in the sensitive function call graph, the invention traverses it with a graph traversal algorithm, extracts sensitive function call sequences, and uses the extracted sequences as training data.
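The two preprocessing steps can be sketched as follows, under stated assumptions: the call graph is a plain dictionary from caller to callees, "filtering" is read as pruning every function that cannot reach a sensitive API, and the traversal is a depth-first walk from entry points (functions with no caller) that records sensitive APIs in visit order. The source does not specify the filtering rule or the traversal algorithm, and the graph below is a hypothetical toy, not Androguard output.

```python
def prune_to_sensitive(fcg, sensitive):
    """Keep only functions that are sensitive or can reach a sensitive function."""
    keep = set(sensitive)
    changed = True
    while changed:  # propagate "reaches a sensitive call" backwards to a fixpoint
        changed = False
        for caller, callees in fcg.items():
            if caller not in keep and any(c in keep for c in callees):
                keep.add(caller)
                changed = True
    return {caller: [c for c in callees if c in keep]
            for caller, callees in fcg.items() if caller in keep}


def sensitive_sequences(fcg, sensitive):
    """Depth-first walk from each entry point, recording sensitive calls in visit order."""
    pruned = prune_to_sensitive(fcg, sensitive)
    called = {c for callees in pruned.values() for c in callees}
    entry_points = sorted(f for f in pruned if f not in called)
    sequences = []
    for entry in entry_points:
        seq, seen, stack = [], set(), [entry]
        while stack:
            node = stack.pop()
            if node in seen:  # guards against recursion cycles
                continue
            seen.add(node)
            if node in sensitive:
                seq.append(node)
            stack.extend(reversed(pruned.get(node, [])))  # visit callees in listed order
        if seq:
            sequences.append(seq)
    return sequences


# Hypothetical toy call graph: caller -> list of callees
fcg = {
    "MainActivity.onCreate": ["Util.log", "Spy.collect"],
    "BootReceiver.onReceive": ["Spy.collect"],
    "Spy.collect": ["TelephonyManager.getDeviceId", "Spy.upload"],
    "Spy.upload": ["SmsManager.sendTextMessage"],
    "Util.log": ["Log.d"],
}
sensitive = {"TelephonyManager.getDeviceId", "SmsManager.sendTextMessage"}
for seq in sensitive_sequences(fcg, sensitive):
    print(" -> ".join(seq))
```

`Util.log` is dropped by the pruning step because it never leads to a sensitive call, while both entry points yield the same getDeviceId-then-sendTextMessage sequence.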
3 AndroGRU model
Based on the correlation among the sensitive function call sequence features, the invention combines the text similarity principle with the GRU structure, improves the internal structure of the GRU by analyzing its gating mechanism, and proposes a GRU-based Android malware detection model, the AndroGRU model.
Text similarity is the similarity between two texts computed by a mathematical formula. It is widely applied in text classification, text clustering and related fields. The most common measure is the Euclidean distance, which expresses the similarity of two objects by the distance between them; another is the cosine similarity, which measures the angle between two vectors and is widely used in text classification. Both are common methods in machine learning and pattern recognition. The Euclidean and cosine measures are calculated as follows:
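$$\mathrm{sim}_{Euc}(h_{t-1}, h_t) = \big[(h_{t-1} - h_t) \cdot (h_{t-1} - h_t)\big]^{1/2} \tag{1}$$
$$\mathrm{sim}_{Cos}(h_{t-1}, h_t) = \frac{h_{t-1} \cdot h_t}{\lVert h_{t-1} \rVert \, \lVert h_t \rVert} \tag{2}$$
where $h_t$ and $h_{t-1}$ are vectors of the same dimension, $\lVert h \rVert$ is the length of $h$, and $h_t \cdot h_{t-1}$ is their dot product.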
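Both measures, and the difference information $(1 - s)\,v_{t-1}$ that the GRU variants below feed into their gates, can be sketched in a few lines of plain Python. The source does not state which measure is plugged into $\mathrm{sim}$ inside the gates; defaulting `difference_info` to cosine similarity, and treating a zero vector as dissimilar, are assumptions, and the function names are hypothetical.

```python
import math


def sim_euclidean(a, b):
    """Equation (1): [(a - b)·(a - b)]^(1/2), i.e. the Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sim_cosine(a, b):
    """Cosine measure: a·b / (||a|| ||b||), the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # assumption: a zero vector counts as dissimilar


def difference_info(prev, cur, sim=sim_cosine):
    """Difference information (1 - sim(prev, cur)) * prev, as used for Δx and Δh."""
    s = sim(prev, cur)
    return [(1 - s) * v for v in prev]


x_prev, x_cur = [1.0, 0.0], [1.0, 1.0]
print(sim_euclidean(x_prev, x_cur))         # 1.0
print(round(sim_cosine(x_prev, x_cur), 4))  # 0.7071
print(difference_info(x_prev, x_cur))
```

Note the opposite directions: (1) grows as the two vectors diverge, while the cosine measure grows as they align. The factor $(1 - s)$ therefore acts as a "how different" weight only when $s$ is a similarity bounded by 1, such as the cosine measure.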
For Android malware, different samples of the same family generally call some common sensitive functions, and there are call relationships among these functions, which the sensitive function call sequences obtained by the graph traversal algorithm preserve. Moreover, the sensitive functions called by malware of the same family are similar to some extent, and the text similarity principle is used to describe the correlation between two adjacent features.
Since the sensitive function call sequences and the Intents features are both text data, the invention uses the recurrent neural network model GRU for modeling. The input data and the hidden state of the GRU structure carry different information, and the reset gate and the update gate jointly control the information transfer process within the GRU structure. The invention therefore applies similarity calculation only to the inputs of the reset gate and the update gate (the input data and the hidden state), so that as much information as possible is passed into the GRU structure.
3.1 Similarity calculation based on input data
The input data $x_t$ of the GRU is a vectorized representation of the raw data. Computing the similarity between the input data of two adjacent GRU units gives $s = \mathrm{sim}(x_{t-1}, x_t)$. Based on the relationship between information theory and text similarity introduced above, the difference information between $x_{t-1}$ and $x_t$ is obtained from this similarity as $\Delta x = (1 - s)\,x_{t-1}$. For distinction, the GRU structure that performs similarity calculation on the input data is named InputGRU, as shown in FIG. 2.
In FIG. 2, the wide arrow marks where the similarity calculation based on the input data $x_t$ takes place. Since the reset gate and the update gate control the information transfer process in the GRU structure, the difference information between the input data of two adjacent GRU units is fed into both gates together with the current input data; the gates thus control the information transfer and learn more abstract information from it:
$$z_t = \sigma\big(W_z x_t + U_{\Delta x}(1 - \mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_z h_{t-1} + b_z\big) \tag{3}$$
$$r_t = \sigma\big(W_r x_t + U_{\Delta x}(1 - \mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_r h_{t-1} + b_r\big) \tag{4}$$
For the candidate state $\tilde{h}_t$, however, the input data $x_t$ is still selected as the input, so as to retain the input information of the current time step:
[Equation (5): formula image for the candidate state $\tilde{h}_t$, not available]
The hidden state $h_t$ is the information learned from the input data and the hidden state. Therefore, at time step $t$, the hidden state $h_t$ of InputGRU is calculated as follows:
[Equation (6): formula image for the hidden state $h_t$, not available]
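A minimal sketch of one InputGRU step in pure Python. It assumes cosine similarity for $\mathrm{sim}$ and the common GRU forms for the candidate and hidden states, whose formula images are missing from the source. Following equations (3) and (4) literally, the same matrix $U_{\Delta x}$ (`Udx`) feeds $\Delta x$ into both gates. Function and parameter names, and the constant toy weights, are hypothetical.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vadd(*vs):
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vectors treated as dissimilar


def input_gru_cell(x_prev, x_t, h_prev, p, sim=cosine):
    """One InputGRU step: equations (3)-(4), then assumed standard candidate/hidden forms."""
    s = sim(x_prev, x_t)
    dx = [(1 - s) * v for v in x_prev]  # Δx = (1 - s) x_{t-1}
    udx = matvec(p["Udx"], dx)  # shared U_Δx term, as written in (3) and (4)
    z = sigmoid(vadd(matvec(p["Wz"], x_t), udx, matvec(p["Uz"], h_prev), p["bz"]))  # (3)
    r = sigmoid(vadd(matvec(p["Wr"], x_t), udx, matvec(p["Ur"], h_prev), p["br"]))  # (4)
    # The candidate state takes only x_t, so the current input is kept as-is
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = [math.tanh(a) for a in vadd(matvec(p["Wh"], x_t), matvec(p["Uh"], reset_h), p["bh"])]
    return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def const(v):
    """Hypothetical 2x2 weight matrix filled with a single value."""
    return [[v, v], [v, v]]


params = {"Wz": const(0.1), "Wr": const(0.2), "Wh": const(0.3), "Uz": const(0.1),
          "Ur": const(0.1), "Uh": const(0.1), "Udx": const(5.0),
          "bz": [0.0, 0.0], "br": [0.0, 0.0], "bh": [0.0, 0.0]}
h_prev = [0.1, -0.1]
same = input_gru_cell([1.0, 0.0], [1.0, 0.0], h_prev, params)     # identical inputs: Δx = 0
changed = input_gru_cell([0.0, 1.0], [1.0, 0.0], h_prev, params)  # orthogonal inputs: Δx = x_{t-1}
print(same, changed)
```

When two consecutive inputs are identical, $\Delta x$ vanishes and the step behaves like a native GRU; the more the inputs differ, the more $U_{\Delta x}$ shifts both gates.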
3.2 Similarity calculation based on hidden states
The hidden state $h_{t-1}$ input to the GRU at time step $t$ contains all information of the input data of the first $t-1$ time steps. Similarly, the hidden state $h_{t-2}$ input to the GRU at time step $t-1$ contains all information of the input data of the first $t-2$ time steps. Analogously to the similarity calculation based on input data, this subsection computes the similarity of $h_{t-1}$ and $h_{t-2}$ as $s = \mathrm{sim}(h_{t-2}, h_{t-1})$, and their difference information is $\Delta h = (1 - s)\,h_{t-2}$. The GRU structure that performs similarity calculation on hidden states is named HiddenGRU, as shown in FIG. 3.
In FIG. 3, the wide arrow marks the similarity calculation based on the hidden states. The similarity of the hidden states $h_{t-2}$ and $h_{t-1}$ output by the two GRU units preceding the current time step $t$ is computed to obtain their difference information, which is fed, together with the current input hidden state $h_{t-1}$, into the reset gate and the update gate:
$$r_t = \sigma\big(W_r x_t + U_r h_{t-1} + U_{\Delta h}(1 - \mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_r\big) \tag{7}$$
$$z_t = \sigma\big(W_z x_t + U_z h_{t-1} + U_{\Delta h}(1 - \mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_z\big) \tag{8}$$
The candidate state $\tilde{h}_t$ and the hidden state $h_t$, however, are calculated with the same formulas as in the InputGRU structure.
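Because HiddenGRU needs the two most recent hidden states, it is easiest to show as a loop over a whole sequence. The sketch below assumes cosine similarity, zero initial states $h_{-1} = h_0 = 0$, and the standard GRU candidate/hidden forms; with zero initial states $\Delta h$ contributes nothing at the first two steps. All names and the hand-picked toy weights are hypothetical.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vadd(*vs):
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vectors treated as dissimilar


def hidden_gru_cell(x_t, h_prev2, h_prev, p, sim=cosine):
    """One HiddenGRU step: gates also see Δh = (1 - sim(h_{t-2}, h_{t-1})) h_{t-2}."""
    s = sim(h_prev2, h_prev)
    dh = [(1 - s) * v for v in h_prev2]
    r = sigmoid(vadd(matvec(p["Wr"], x_t), matvec(p["Ur"], h_prev), matvec(p["Udh"], dh), p["br"]))  # (7)
    z = sigmoid(vadd(matvec(p["Wz"], x_t), matvec(p["Uz"], h_prev), matvec(p["Udh"], dh), p["bz"]))  # (8)
    # Candidate and hidden state as in InputGRU (assumed standard GRU forms)
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = [math.tanh(a) for a in vadd(matvec(p["Wh"], x_t), matvec(p["Uh"], reset_h), p["bh"])]
    return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def run_hidden_gru(xs, p, hidden_dim):
    """Run HiddenGRU over a sequence, keeping the two most recent hidden states."""
    h_prev2 = [0.0] * hidden_dim  # assumed zero initial states
    h_prev = [0.0] * hidden_dim
    for x_t in xs:
        h_t = hidden_gru_cell(x_t, h_prev2, h_prev, p)
        h_prev2, h_prev = h_prev, h_t
    return h_prev


ZERO = [[0.0, 0.0], [0.0, 0.0]]
params = {"Wz": ZERO, "Wr": ZERO, "Wh": [[1.0, 0.0], [0.0, 1.0]],
          "Uz": ZERO, "Ur": ZERO, "Uh": ZERO, "Udh": [[3.0, 0.0], [0.0, 3.0]],
          "bz": [0.0, 0.0], "br": [0.0, 0.0], "bh": [0.0, 0.0]}
print(run_hidden_gru([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], params, hidden_dim=2))
```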
4.3.3 similarity calculation based on input data and hidden states
The similarity calculation is carried out on the input data and the hidden state respectively in the first two subsections, and because the information contained in the input data and the hidden state is different, the similarity calculation is carried out on the input data and the hidden state simultaneously in the GRU structure, so that more abstract information can be learned from the data. Similarity calculations are performed simultaneously on the input data and hidden states within the GRU structure, which is named InputHiddenGRU, as shown in fig. 4.
In Fig. 4, the wide black arrows mark the similarity calculation based on the hidden state, and the wide gray arrows mark the similarity calculation based on the input data. The reset gate and the update gate are calculated as follows:
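$$r_t = \sigma\big(W_r x_t + U_{\Delta x}\,(1-\mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_r h_{t-1} + U_{\Delta h}\,(1-\mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_r\big) \tag{9}$$
$$z_t = \sigma\big(W_z x_t + U_{\Delta x}\,(1-\mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_z h_{t-1} + U_{\Delta h}\,(1-\mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_z\big) \tag{10}$$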
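As a sanity check on equations (9) and (10), assume that sim(·,·) returns 1 for identical vectors. Cosine similarity has this property, but it is only an assumption here, because the text does not define sim. When neither the input nor the hidden state changes between steps, both difference terms vanish:

```latex
\begin{aligned}
x_{t-1} = x_t &\;\Rightarrow\; \bigl(1-\mathrm{sim}(x_{t-1},x_t)\bigr)\,x_{t-1} = 0,\\
h_{t-2} = h_{t-1} &\;\Rightarrow\; \bigl(1-\mathrm{sim}(h_{t-2},h_{t-1})\bigr)\,h_{t-2} = 0,\\
\text{so (9) reduces to}\quad r_t &= \sigma\bigl(W_r x_t + U_r h_{t-1} + b_r\bigr).
\end{aligned}
```

Equation (10) reduces to the standard update gate in the same way. The added terms therefore act only when consecutive inputs or hidden states differ.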
In summary, the three GRU structures can be represented abstractly by the following formula:
$$\mathrm{NewGRU} = i \cdot \mathrm{InputGRU} + j \cdot \mathrm{HiddenGRU}, \quad i, j \in \{0, 1\} \tag{11}$$
where NewGRU denotes an abstract definition of the GRU structure. Since $i$ and $j$ each take only the values 0 or 1, the formula represents the following four GRU structures:
when $i = 0$ and $j = 0$, the structure degenerates into the standard GRU structure;
when $i = 1$ and $j = 0$, the structure is the InputGRU structure, and similarity calculation is performed on the input data only;
when $i = 0$ and $j = 1$, the structure is the HiddenGRU structure, and similarity calculation is performed on the hidden state only;
when $i = 1$ and $j = 1$, the structure is the InputHiddenGRU structure, and similarity calculation is performed on both the input data and the hidden state within the GRU structure.
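The abstraction in equation (11) can be sketched as a single cell with two switches. This is a hypothetical Python illustration using only the standard library, not the patented implementation. It makes the following assumptions:

- sim(·,·) is cosine similarity.
- The candidate-state and hidden-state formulas appear only as images in the text, so the standard GRU forms are used: $\tilde{h}_t = \tanh(W_h x_t + U_h(r_t \odot h_{t-1}) + b_h)$ and $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$.
- The parameter names (U_dx, U_dh, W_h, U_h, b_h) and the helper functions are hypothetical.
- $x_{t-1}$, $h_{t-1}$ and $h_{t-2}$ are zero vectors at the first step.

```python
import math
import random

# Small list-based vector helpers (standard library only).
def _mv(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def _add(*vs):
    return [sum(c) for c in zip(*vs)]

def _mul(a, b):
    return [x * y for x, y in zip(a, b)]

def _sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def cosine_sim(a, b):
    # ASSUMPTION: sim() is cosine similarity; zero vectors count as dissimilar.
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def new_gru_step(x_t, x_prev, h_prev1, h_prev2, p, i=1, j=1):
    """One NewGRU step (eq. 11); i and j in {0, 1} switch the difference terms."""
    extra = [0.0] * len(h_prev1)
    if i:  # InputGRU term: U_dx * (1 - sim(x_{t-1}, x_t)) * x_{t-1}
        dx = [(1.0 - cosine_sim(x_prev, x_t)) * v for v in x_prev]
        extra = _add(extra, _mv(p["U_dx"], dx))
    if j:  # HiddenGRU term: U_dh * (1 - sim(h_{t-2}, h_{t-1})) * h_{t-2}
        dh = [(1.0 - cosine_sim(h_prev2, h_prev1)) * v for v in h_prev2]
        extra = _add(extra, _mv(p["U_dh"], dh))
    # Reset and update gates share the same difference terms, as in eqs. (7)-(10).
    r = _sigmoid(_add(_mv(p["W_r"], x_t), _mv(p["U_r"], h_prev1), extra, p["b_r"]))
    z = _sigmoid(_add(_mv(p["W_z"], x_t), _mv(p["U_z"], h_prev1), extra, p["b_z"]))
    # ASSUMPTION: standard GRU candidate state (x_t only) and interpolation.
    h_cand = [math.tanh(a) for a in
              _add(_mv(p["W_h"], x_t), _mv(p["U_h"], _mul(r, h_prev1)), p["b_h"])]
    return [(1.0 - zk) * hk + zk * ck for zk, hk, ck in zip(z, h_prev1, h_cand)]

def init_params(n_in, n_h, seed=0):
    """Hypothetical random initialisation, for illustration only."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    p = {k: mat(n_h, n_in) for k in ("W_r", "W_z", "W_h", "U_dx")}
    p.update({k: mat(n_h, n_h) for k in ("U_r", "U_z", "U_h", "U_dh")})
    p.update({k: [0.0] * n_h for k in ("b_r", "b_z", "b_h")})
    return p

def run_sequence(xs, p, i=1, j=1):
    """Run NewGRU over a sequence of input vectors and return the final hidden state."""
    n_h = len(p["b_r"])
    x_prev, h_prev2, h_prev1 = [0.0] * len(xs[0]), [0.0] * n_h, [0.0] * n_h
    for x_t in xs:
        h_t = new_gru_step(x_t, x_prev, h_prev1, h_prev2, p, i, j)
        x_prev, h_prev2, h_prev1 = x_t, h_prev1, h_t
    return h_prev1
```

With i = j = 0 the step reduces exactly to a standard GRU step. The same result is obtained when U_dx and U_dh are zero matrices and both switches are on.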
3.4 AndroGRU model
Based on two different static features, the entries and the sensitive function call sequences, the invention combines the extracted static features with the improved GRU structures. It provides a GRU-based Android malware detection model, the AndroGRU model, as shown in Fig. 5:
The recurrent layer of the model may use any one of the three GRU structures proposed in this section. The model trains a separate GRU on each feature type and combines the learned information through a fully connected layer. A final SoftMax layer makes the prediction, determining whether an unknown Android application is malicious.
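The fusion and classification stage can be sketched as follows. The sketch makes these assumptions, all of them hypothetical along with the function names:

- The outputs of the two GRU branches (entries features and sensitive function call sequences) are their final hidden states.
- The outputs are combined by concatenation before a single fully connected layer.
- Output index 1 means malicious.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dense(W, b, v):
    # Fully connected layer: W v + b
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def androgru_head(h_entries, h_calls, W_fc, b_fc):
    """Hypothetical fusion head: concatenate both branch outputs, FC layer, SoftMax."""
    fused = h_entries + h_calls  # list concatenation of the two feature vectors
    return softmax(dense(W_fc, b_fc, fused))  # ASSUMPTION: [P(benign), P(malicious)]

# Usage with made-up numbers (not trained weights):
probs = androgru_head([0.2, -0.1], [0.4, 0.3],
                      W_fc=[[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]],
                      b_fc=[0.0, 0.0])
is_malicious = probs[1] > probs[0]
```

Concatenation keeps the two branches' information separate until the fully connected layer, which can then learn how to weight each feature type.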

Claims (7)

1. A static detection method for Android malware based on AndroGRU, characterized by comprising the following steps:
1) decompiling the Android APK file with an Android reverse-engineering tool, parsing AndroidManifest.xml, and extracting the entries features used by the Android application;
2) using py to generate a function call graph and extracting sensitive function call sequences from it;
3) a model training module trains the AndroGRU model on the extracted static features, and a detection module detects unknown Android APK samples with the trained AndroGRU model.
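Step 1) can be illustrated with Python's standard xml.etree.ElementTree module. The sketch makes two assumptions. First, the manifest has already been decoded to text XML by the reverse-engineering tool; inside the APK it is stored as binary XML. Second, the "entries" features are the permissions declared in `<uses-permission>` elements; this is an interpretation the text does not confirm. The vocabulary and the integer-index encoding used as GRU input are also hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute keys in the android namespace, as ElementTree spells them.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def extract_permissions(manifest_xml):
    """Return declared permission names in document order, without duplicates."""
    root = ET.fromstring(manifest_xml)
    perms = []
    for elem in root.iter("uses-permission"):
        name = elem.get(ANDROID_NS + "name")
        if name and name not in perms:
            perms.append(name)
    return perms

def to_index_sequence(perms, vocabulary):
    """Map permissions to integer ids for a GRU branch; 0 = unknown (hypothetical)."""
    index = {p: k + 1 for k, p in enumerate(vocabulary)}
    return [index.get(p, 0) for p in perms]

SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <application/>
</manifest>"""

perms = extract_permissions(SAMPLE)
ids = to_index_sequence(perms, ["android.permission.INTERNET",
                                "android.permission.READ_CONTACTS",
                                "android.permission.SEND_SMS"])
```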
2. The AndroGRU-based Android malware static detection method of claim 1, wherein step 2) specifically comprises:
2.1) preprocessing the function call graph: filtering it with an Android reverse-engineering tool and simplifying it so that it contains only sensitive function calls;
2.2) traversing the sensitive function call graph, extracting sensitive function call sequences from it, and using the extracted sequences as training data.
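Steps 2.1) and 2.2) can be sketched on a call graph stored as a caller-to-callees dictionary. The claim does not specify how the graph is simplified or traversed, so the sketch makes the following assumptions:

- An edge is kept from a node to every sensitive call reachable through non-sensitive intermediate calls.
- Traversal is depth-first from hypothetical entry-point methods.
- Each maximal cycle-free path yields one sequence.

All names are illustrative.

```python
def nearest_sensitive(graph, start, sensitive):
    """Sensitive calls reachable from `start` through non-sensitive intermediate calls."""
    found, seen = set(), set()
    stack = list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in sensitive:
            found.add(node)  # stop here: this node survives in the simplified graph
        else:
            stack.extend(graph.get(node, []))  # look through non-sensitive calls
    return sorted(found)

def simplify_call_graph(graph, sensitive, entries):
    """Step 2.1 sketch: keep only entry points and sensitive calls."""
    return {n: nearest_sensitive(graph, n, sensitive) for n in set(entries) | set(sensitive)}

def extract_sequences(simple, entries, sensitive):
    """Step 2.2 sketch: depth-first traversal; each maximal cycle-free path is one sequence."""
    sequences = []

    def dfs(node, path):
        successors = [c for c in simple.get(node, []) if c not in path]  # avoid cycles
        if not successors:
            seq = [n for n in path if n in sensitive]
            if seq:
                sequences.append(seq)
            return
        for c in successors:
            dfs(c, path + [c])

    for e in sorted(entries):
        dfs(e, [e])
    return sequences

# Illustrative graph (caller -> callees); S1..S3 stand for sensitive API calls.
GRAPH = {"entry": ["a", "S1"], "a": ["S2"], "S1": ["b"], "b": ["S3"]}
SENSITIVE = {"S1", "S2", "S3"}
simple = simplify_call_graph(GRAPH, SENSITIVE, {"entry"})
print(extract_sequences(simple, {"entry"}, SENSITIVE))  # [['S1', 'S3'], ['S2']]
```

Enumerating every maximal path can grow exponentially on large call graphs, so a practical extractor would likely cap the path depth or the number of paths.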
3. The AndroGRU-based Android malware static detection method of claim 1, wherein in step 3) the AndroGRU model is a GRU-based Android malware detection model. The model is obtained by combining the principle of text similarity with the GRU structure and by improving the internal structure of the GRU through an analysis of its gating mechanism.
4. The AndroGRU-based Android malware static detection method of claim 1, wherein in step 3),
3.1) similarity calculation based on input data:
the input data $x_t$ of the GRU is a vectorized representation of the original data; similarity calculation is performed on the input data of two adjacent GRU units to obtain the similarity $s = \mathrm{sim}(x_{t-1}, x_t)$; from this similarity, the difference information between $x_{t-1}$ and $x_t$ is obtained as $\Delta x = (1 - s)\,x_{t-1}$; the GRU structure that performs similarity calculation on input data is named InputGRU;
the difference information between the input data of two adjacent GRU units is fed, together with the current input data, into the reset gate and the update gate, so that the difference information controls the information flow and more abstract information is learned from it:
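$$z_t = \sigma\big(W_z x_t + U_{\Delta x}\,(1-\mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_z h_{t-1} + b_z\big) \tag{3}$$
$$r_t = \sigma\big(W_r x_t + U_{\Delta x}\,(1-\mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_r h_{t-1} + b_r\big) \tag{4}$$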
for the candidate state $\tilde{h}_t$, the input data $x_t$ is selected as input so that the input information of the current time step is retained:
[Formula image: candidate state $\tilde{h}_t$ of the InputGRU; not reproduced in the text]
the hidden state $h_t$ is the information learned from the input data and the hidden state; at time step $t$, the hidden state $h_t$ of the InputGRU is calculated as follows:
[Formula image: hidden state $h_t$ of the InputGRU; not reproduced in the text]
5. The AndroGRU-based Android malware static detection method of claim 4, wherein in step 3),
3.2) similarity calculation based on hidden states:
the hidden state $h_{t-1}$ input to the GRU at time step $t$ contains all information of the input data of the first $t-1$ time steps; the hidden state $h_{t-2}$ input to the GRU at time step $t-1$ contains all information of the input data of the first $t-2$ time steps; analogous to the similarity calculation based on input data, the similarity of $h_{t-1}$ and $h_{t-2}$ is computed as $s = \mathrm{sim}(h_{t-2}, h_{t-1})$, and the difference information between the two is $\Delta h = (1 - s)\,h_{t-2}$; the GRU structure that performs similarity calculation on hidden states is named HiddenGRU;
the similarity between the hidden states $h_{t-2}$ and $h_{t-1}$ output by the two GRU units preceding the current time step $t$ is computed to obtain their difference information, which is fed into the reset gate and the update gate together with the current input hidden state $h_{t-1}$:
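$$r_t = \sigma\big(W_r x_t + U_r h_{t-1} + U_{\Delta h}\,(1-\mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_r\big) \tag{7}$$
$$z_t = \sigma\big(W_z x_t + U_z h_{t-1} + U_{\Delta h}\,(1-\mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_z\big) \tag{8}$$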
the candidate state $\tilde{h}_t$ and the hidden state $h_t$ are calculated with the same formulas as in the InputGRU structure.
6. The AndroGRU-based Android malware static detection method of claim 5, wherein in step 3),
3.3) similarity calculation based on input data and hidden states:
similarity calculation is carried out on the input data and the hidden state simultaneously within the GRU structure, so that more abstract information can be learned from the data; this GRU structure is named InputHiddenGRU, and its reset gate and update gate are calculated as follows:
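$$r_t = \sigma\big(W_r x_t + U_{\Delta x}\,(1-\mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_r h_{t-1} + U_{\Delta h}\,(1-\mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_r\big) \tag{9}$$
$$z_t = \sigma\big(W_z x_t + U_{\Delta x}\,(1-\mathrm{sim}(x_{t-1}, x_t))\,x_{t-1} + U_z h_{t-1} + U_{\Delta h}\,(1-\mathrm{sim}(h_{t-2}, h_{t-1}))\,h_{t-2} + b_z\big) \tag{10}$$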
7. The AndroGRU-based Android malware static detection method of claim 6, wherein the three GRU structures of 3.1) to 3.3) are represented abstractly by the following formula:
$$\mathrm{NewGRU} = i \cdot \mathrm{InputGRU} + j \cdot \mathrm{HiddenGRU}, \quad i, j \in \{0, 1\} \tag{11}$$
where NewGRU denotes an abstract definition of the GRU structure;
since $i$ and $j$ each take only the values 0 or 1, the formula represents the following four GRU structures:
when $i = 0$ and $j = 0$, the structure degenerates into the standard GRU structure;
when $i = 1$ and $j = 0$, the structure is the InputGRU structure, and similarity calculation is performed on the input data only;
when $i = 0$ and $j = 1$, the structure is the HiddenGRU structure, and similarity calculation is performed on the hidden state only;
when $i = 1$ and $j = 1$, the structure is the InputHiddenGRU structure, and similarity calculation is performed on both the input data and the hidden state within the GRU structure.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911106998.2A CN110941828B (en) 2019-11-13 2019-11-13 Android malicious software static detection method based on android GRU

Publications (2)

Publication Number Publication Date
CN110941828A true CN110941828A (en) 2020-03-31
CN110941828B CN110941828B (en) 2023-12-15

Family

ID=69906709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911106998.2A Active CN110941828B (en) 2019-11-13 2019-11-13 Android malicious software static detection method based on android GRU

Country Status (1)

Country Link
CN (1) CN110941828B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205396A (en) * 2015-10-15 2015-12-30 Shanghai Jiao Tong University Detecting system for Android malicious code based on deep learning and method thereof
CN107169351A (en) * 2017-05-11 2017-09-15 Beijing Institute of Technology Android unknown malware detection method combining dynamic behaviour features
CN108647518A (en) * 2018-03-16 2018-10-12 Guangdong University of Technology An Android platform malware detection method based on deep learning
CN109711163A (en) * 2018-12-26 2019-05-03 Xidian University Android malware detection method based on API call sequences
US10437999B1 (en) * 2016-08-31 2019-10-08 Symantec Corporation Runtime malware detection
KR20190125880A (en) * 2018-04-30 2019-11-07 Korea Advanced Institute of Science and Technology Static analysis method and apparatus for activity injection detecting

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Du Wei et al.: "Android malware detection and malicious behavior analysis based on semi-supervised learning" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114491525A (en) * 2021-12-16 2022-05-13 Sichuan University Android malicious software detection feature extraction method based on deep reinforcement learning
CN114491525B (en) * 2021-12-16 2023-04-07 Sichuan University Android malicious software detection feature extraction method based on deep reinforcement learning
CN116484382A (en) * 2023-04-07 2023-07-25 Unit 61660 of the Chinese People's Liberation Army Dynamic detection method, system, electronic equipment and storage medium for Android vulnerabilities

Also Published As

Publication number Publication date
CN110941828B (en) 2023-12-15

Similar Documents

Publication Publication Date Title
Chawla et al. Host based intrusion detection system with combined CNN/RNN model
Li et al. Deeppayload: Black-box backdoor attack on deep learning models through neural payload injection
Vinayakumar et al. Deep android malware detection and classification
Almomani et al. An automated vision-based deep learning model for efficient detection of android malware attacks
Agrawal et al. Neural sequential malware detection with parameters
CN109614795B (en) Event-aware android malicious software detection method
CN111092894A (en) Webshell detection method based on incremental learning, terminal device and storage medium
CN111428236A (en) Malicious software detection method, device, equipment and readable medium
CN110941828B (en) Android malicious software static detection method based on android GRU
CN113239354A (en) Malicious code detection method and system based on recurrent neural network
Kakisim et al. Sequential opcode embedding-based malware detection method
CN114817924B (en) Android malicious software detection method and system based on AST and cross-layer analysis
Fu et al. An LSTM-based malware detection using transfer learning
Fathurrahman et al. Lightweight convolution neural network for image-based malware classification on embedded systems
Dhabal et al. Towards Design of a Novel Android Malware Detection Framework Using Hybrid Deep Learning Techniques
CN116702143A (en) Intelligent malicious software detection method based on API (application program interface) characteristics
CN114817925B (en) Android malicious software detection method and system based on multi-modal graph features
Yang et al. A novel Android malware detection method with API semantics extraction
CN116306672A (en) Data processing method and device
Sasidharan et al. Memdroid-lstm based malware detection framework for android devices
Ale et al. Few-shot learning to classify android malwares
CN114491528A (en) Malicious software detection method, device and equipment
Han et al. A novel malware detection approach based on behavioral semantic analysis and LSTM model
CN114648679A (en) Neural network training method, neural network training device, target detection method, target detection device, equipment and storage medium
CN114638984A (en) Malicious website URL detection method based on capsule network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231013

Address after: 518000 909, Building 49, No. 3, Queshan Yunfeng Road, Taoyuan Community, Dalang Street, Longhua District, Shenzhen, Guangdong

Applicant after: Shenzhen Morning Intellectual Property Operations Co.,Ltd.

Address before: No. 12, 19th Floor, Zone 1, Taifeng Building, East Section of Dangui Street, Ziliujing District, Zigong City, Sichuan Province, 643000

Applicant before: Sichuan Hengying Information Technology Service Co.,Ltd.

Effective date of registration: 20231013

Address after: No. 12, 19th Floor, Zone 1, Taifeng Building, East Section of Dangui Street, Ziliujing District, Zigong City, Sichuan Province, 643000

Applicant after: Sichuan Hengying Information Technology Service Co.,Ltd.

Address before: 110000 58 Shenbei New Area Road South, Shenyang, Liaoning.

Applicant before: Liaoning University

TA01 Transfer of patent application right
GR01 Patent grant