CN115018046B

CN115018046B - Deep learning method for detecting malicious traffic of mobile APP

Info

Publication number: CN115018046B
Application number: CN202210533158.XA
Authority: CN
Inventors: 陆凯; 胡香利
Original assignee: HAINAN VOCATIONAL COLLEGE OF POLITICAL SCIENCE AND LAW
Current assignee: HAINAN VOCATIONAL COLLEGE OF POLITICAL SCIENCE AND LAW
Priority date: 2022-05-17
Filing date: 2022-05-17
Publication date: 2023-09-15
Anticipated expiration: 2042-05-17
Also published as: CN115018046A

Abstract

The invention discloses a deep learning method for detecting malicious traffic of mobile APPs, comprising the following steps. First, network traffic features are selected with a correlation information decision matrix (CIDM): the CIDM is constructed, and attribute scoring is then performed. Second, detection is carried out with a malicious traffic detection model based on a capsule neural network. Compared with other state-of-the-art malware detection techniques, the proposed method improves accuracy and recall by 9.71% and 20.18%, respectively.

Description

Deep learning method for detecting malicious traffic of mobile APP

Technical Field

The invention relates to a deep learning method for detecting malicious traffic of a mobile APP.

Background

With the popularity of the internet and mobile devices, malware has become a major threat to the growing mobile ecosystem. A Kaspersky statistical report showed that by the end of 2021, the number of new malicious files detected per day had reached 381,600, a 6.1% increase over the previous year. Although mobile antivirus scanners provide security protection for Android devices, increasingly advanced mobile malware can still bypass these mechanisms and penetrate mobile systems. As mobile devices carry more and more private user information, there is an urgent need for an efficient malware detection scheme.

Malware detection techniques can be divided into three types: static analysis, dynamic analysis, and network traffic analysis. The essential difference among them lies in which characteristics of the malware they use as features. Static analysis methods use application code and its binary structure as features. However, to evade antivirus scanners, malware authors use techniques such as repackaging and code obfuscation to generate malware variants. Dynamic analysis methods use the call relationships between functions during application execution as features. This approach must be carried out in a dedicated sandbox and requires sufficient execution to cover the behavior of the application. When a malware author repackages malware or obfuscates its code, the features used by these methods change significantly, degrading the performance of the detection model. From another perspective, however, these malware variants exhibit similar malicious behavior at runtime; in other words, the malicious traffic they trigger is similar. Network traffic analysis takes the network traffic triggered by an application as its object of study and extracts statistical features (such as packet size and packet interval) or HTTP header semantic features (such as host and method) from it for analysis. Network traffic analysis therefore overcomes the drawbacks of static and dynamic analysis, because certain traffic characteristics remain similar even when the malicious code changes significantly.

Machine learning provides many methods for malware detection. If the sole goal is to detect malware accurately, deep learning is often the better choice: research has shown that deep learning outperforms other machine learning techniques in various application areas, and deep learning studies in malware detection have achieved high performance. However, the deep learning algorithms used for malware detection modeling are almost always based on convolutional neural networks. Although pooling simplifies analysis in convolutional neural networks, it discards some local information, which reduces robustness.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a deep learning method for detecting malicious traffic of a mobile APP.

The technical solution adopted to solve the above technical problem is as follows: a deep learning method for detecting malicious traffic of a mobile APP, comprising the following steps:

Step 1: select network traffic features using the correlation information decision matrix (CIDM).

First, construct the correlation information decision matrix CIDM:

calculating the correlation coefficient between each pair of features using formula (1) to obtain a correlation coefficient matrix C, where Var(A_i), calculated by formula (2), is the variance of the values of feature A_i, M is the number of features, and μ_i is the mean of A_i; Cov(A_i, A_j), given by formula (3), is the covariance between features A_i and A_j, with 1 ≤ i, j ≤ M; if an element of matrix C is less than 0, it is replaced by its opposite number, so that all elements of the matrix are non-negative;

establishing an initial correlation decision matrix O of matrix C, in which every element equals the index of its column (for example, every element in the i-th column from the left is i); then arranging the elements of each row of matrix C in ascending order, with the corresponding row of O rearranged accordingly, to obtain the CDM O'; finally, in each iteration, statistically analyzing a local matrix of width columns to determine which features are reduced;

(1)

(2)

(3)

(4)

(5)

then attribute scoring is performed: in the local matrix, the frequency of each feature that appears is counted and combined with the mean and variance of the correlation coefficients of all features to produce a score, which is used as the basis for deciding feature reduction; the score is given by formula (6), where Ave(C_i) and Var(C_i) denote the mean and variance of row i of matrix C, and S_score(A_i) denotes the statistical frequency of attribute A_i in the local matrix of the current iteration, i.e., its score:

$$\mathrm{Score}(A_i) = \mathrm{Ave}(C_i) + \mathrm{Var}(C_i) + S_{\mathrm{score}}(A_i) \tag{6}$$

Step 2: detect with a malicious traffic detection model based on a capsule neural network.

The capsule neural network processes data as follows: the outputs of three capsules v¹, v² and v³ are used as the input of the next capsule; v¹, v² and v³ are multiplied by the matrices w¹, w² and w³, respectively, to obtain u¹, u² and u³; next, u¹, u² and u³ are summed with weights to obtain s, and s is squashed to obtain v; the parameters w¹, w² and w³ are learned by backpropagation; c¹, c² and c³ are the coupling coefficients. Further, c¹, c² and c³ are determined by a dynamic routing algorithm.

Further, the capsule neural network employs the margin loss function of formula (10):

$$L_k = E_k \max(0,\, m^{+} - \lVert v_k \rVert)^2 + \lambda\,(1 - E_k)\max(0,\, \lVert v_k \rVert - m^{-})^2 \tag{10}$$

where E_k = 1 if class k is present and 0 otherwise; m⁺ = 0.9 penalizes false negatives (class k is present but predicted absent); m⁻ = 0.1 penalizes false positives (class k is absent but predicted present).

The beneficial effects of the invention are as follows:

the advantages of the invention in the mobile APP malicious flow detection are mainly as follows:

1. A feature selection method based on the correlation information decision matrix (CIDM) is proposed. It applies soft dimension reduction to high-dimensional data, overcoming the tendency of the hard dimension reduction used in common feature selection methods to lose feature information; it effectively preserves the correlation information of traffic features and improves subsequent detection performance.

2. A new malware detection method combining feature selection with a malware detection model is presented. The detection model is a capsule network, a deep learning algorithm, which overcomes the poor robustness caused by the pooling operations of convolutional neural networks. To our knowledge, this is the first application of capsule networks to malware detection, and the capsule network improves the robustness of the detection model.

3. The effectiveness of the method was evaluated through detailed experiments and compared with state-of-the-art malware detection techniques. The experiments show that, compared with other state-of-the-art malware detection techniques, the method improves accuracy and recall by 9.71% and 20.18%, respectively.

Drawings

FIG. 1 shows the CIDM construction flow;

FIG. 2 shows the steps of dimension reduction Algorithm 1;

FIG. 3 shows the steps of dimension reduction Algorithm 2;

FIG. 4 shows the processing of a capsule;

FIG. 5 shows the steps of the dynamic routing algorithm;

FIG. 6 shows the processing flow of the dynamic routing algorithm of the capsule network.

Detailed Description

For a better understanding of the present invention, embodiments are explained in detail below with reference to FIGS. 1-6. The invention provides a feature selection method based on a correlation information decision matrix (CIDM), covering the process of optimizing the CIDM from an initial correlation decision matrix (CDM), a feature attribute scoring method that provides the basis for deciding feature reduction, and a feature selection dimension reduction algorithm based on the CIDM and the scoring method. The invention is also the first to apply the capsule network, a deep learning algorithm, to mobile APP malicious traffic detection, covering the operating mechanism of the capsule network, the core routing algorithm that dynamically determines the coupling coefficients during operation, and the margin loss function for malicious traffic detection.

First, the method selects network traffic features using the correlation information decision matrix (CIDM), implemented as follows:

(1) Building the correlation information decision matrix (CIDM, Correlation Information Decision Matrix):

The reference information for deciding which features are redundant is obtained from the CIDM, which is optimized from a correlation decision matrix (CDM). The CDM generation process is shown in FIG. 1. First, we calculate the correlation coefficient between each pair of features using formula (1) to obtain the correlation coefficient matrix C. Here Var(A_i), computed by formula (2), is the variance of the values of feature A_i, M is the number of features, and μ_i is the mean of A_i. In addition, Cov(A_i, A_j), given by formula (3), is the covariance between features A_i and A_j. In these formulas, 1 ≤ i, j ≤ M. If an element of matrix C is less than 0, it is replaced by its opposite number, so all elements of the matrix are non-negative. Next, an initial correlation decision matrix O corresponding to matrix C is established, in which every element equals the index of its column; for example, every element in the i-th column from the left is i. Then the elements of each row of C are sorted in ascending order, and the elements of the corresponding row of O are rearranged in the same order, giving the CDM O'. Finally, in each iteration we statistically analyze a local matrix of width columns (the boxed region of matrix O' in FIG. 1) to determine which features are reduced. Of course, when constructing the CIDM, the rows may also be arranged in descending order instead of ascending order to obtain the CDM O'.

(1)

(2)

(3)

(4)

(5)
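Formulas (1)-(3) are not reproduced in the text. Assuming they are the standard Pearson definitions (which reproduces the values 0.2 and 0.32 in the worked example that follows), the construction of C and the CDM O' can be sketched in NumPy. Reading "sort each row of C ascending and carry O along" as a row-wise argsort is an interpretation. The function names and the choice to treat constant features as uncorrelated are hypothetical.

```python
import numpy as np


def correlation_matrix(X):
    """|Pearson correlation| between the feature columns of X (samples x features)."""
    with np.errstate(invalid="ignore", divide="ignore"):
        C = np.corrcoef(X, rowvar=False)
    # Negative coefficients become their opposites; constant features (NaN) count as 0.
    return np.abs(np.nan_to_num(C, nan=0.0))


def correlation_decision_matrix(C):
    """Build the CDM O' from the initial matrix O, whose elements equal their column index."""
    M = C.shape[0]
    O = np.tile(np.arange(M), (M, 1))              # initial decision matrix O
    order = np.argsort(C, axis=1, kind="stable")   # ascending order within each row of C
    return np.take_along_axis(O, order, axis=1)    # O' (equal to `order` since O[i, j] == j)


rng = np.random.default_rng(42)
X = rng.random((50, 5))
X[:, 4] = 2 * X[:, 0] + 0.01 * rng.random(50)   # feature 4 is a near-copy of feature 0
C = correlation_matrix(X)
O_prime = correlation_decision_matrix(C)
print(np.round(C, 2))
print(O_prime)
```

In row 0 of O', the last entry is feature 0 itself (self-correlation 1) and the one before it is the near-duplicate feature 4, which is the kind of pattern the local matrix is meant to expose.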

Furthermore, if only correlation is considered during dimension reduction, important features may be reduced in some extreme cases. For example, consider three vectors a = [1,0,0,0,0,0], b = [0,1,0,0,0,0] and c = [1,1,1,0,1,0], and let corrcoef(x, y) be the correlation coefficient function of two vectors x and y (after taking the absolute value). Then corrcoef(a, b) = 0.2 and corrcoef(a, c) = corrcoef(b, c) = 0.32. Under the rule that the more strongly a feature is correlated with the others, the less information it carries, the high-information feature c would be reduced. This is obviously wrong: it is as if the two low-information features a and b had jointly displaced c. Such cases do occur in the data used in the later experiments, so it is essential to take the amount of information into account. The amount of information aoi(i) of feature i is represented by the number of its non-zero elements. All elements of matrix C are adjusted by equations (4) and (5).

Here, a weighting factor is used in equation (4) to bring the correlation and the amount of information to the same order of magnitude. ave_C denotes the mean of all elements of matrix C, and ave_aoi_r denotes the mean of aoi_r. γ controls the relative weight of correlation and amount of information in the assignment, with range [1, +∞). As equation (4) shows, when γ equals 1, only correlation is considered; as γ tends to positive infinity, only the amount of information is considered. Parameter a controls the strength of low-information feature selection; here it is set to 0.9. The matrix CIDM can now be obtained following the flow in FIG. 1.
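The numbers in the three-vector example can be checked with NumPy's Pearson correlation, with aoi counted as the number of non-zero entries. Equations (4) and (5) are not reproduced in the text, so this sketch shows only the problem they address, not the adjustment itself. The helper names are hypothetical.

```python
import numpy as np

a = np.array([1, 0, 0, 0, 0, 0])
b = np.array([0, 1, 0, 0, 0, 0])
c = np.array([1, 1, 1, 0, 1, 0])


def corrcoef_abs(x, y):
    # |Pearson r|, since matrix C keeps only non-negative entries
    return float(abs(np.corrcoef(x, y)[0, 1]))


def aoi(x):
    # amount of information = number of non-zero elements
    return int(np.count_nonzero(x))


print(round(corrcoef_abs(a, b), 2), round(corrcoef_abs(a, c), 2), round(corrcoef_abs(b, c), 2))
# → 0.2 0.32 0.32
print(aoi(a), aoi(b), aoi(c))
# → 1 1 4

# Total correlation with the other features: c is the largest, so a purely
# correlation-based rule would reduce c even though it carries the most information.
C = np.abs(np.corrcoef(np.vstack([a, b, c])))
np.fill_diagonal(C, 0.0)
print(C.sum(axis=1).round(2))
```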

(2) Attribute scoring: the local matrix consists of width columns and M rows. Within this matrix, the frequency of each feature that appears is counted and combined with the mean and variance of the feature's correlation coefficients to produce a score, which is used as the basis for deciding feature reduction. The scores are ranked, and each iteration takes the width features with the highest scores as the reduction targets. Instead of relying on frequency of occurrence alone, adding the mean and variance of the correlation coefficients between features avoids ambiguity when several features occur equally often in the local matrix. Specifically, the score is given by formula (6), where Ave(C_i) and Var(C_i) denote the mean and variance of row i of matrix C, and S_score(A_i) denotes the statistical frequency of attribute A_i in the local matrix of the current iteration, i.e., its score.

$$\mathrm{Score}(A_i) = \mathrm{Ave}(C_i) + \mathrm{Var}(C_i) + S_{\mathrm{score}}(A_i) \tag{6}$$
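Formula (6) can be sketched as follows. The boxed region of FIG. 1 is not specified in the text, so taking the right-most width columns of O' (the features most correlated with each row's feature after the ascending sort) as the local matrix is an assumption. So is zeroing self-correlations so that a feature is not counted as its own neighbour. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def attribute_scores(C, O_prime, width):
    """Score(A_i) = Ave(C_i) + Var(C_i) + S_score(A_i), formula (6)."""
    M = C.shape[0]
    local = O_prime[:, -width:]                        # local matrix: M rows x width columns
    s_score = np.bincount(local.ravel(), minlength=M)  # frequency of each feature in it
    return C.mean(axis=1) + C.var(axis=1) + s_score


# Toy |correlation| matrix with self-correlations zeroed (assumption)
C = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])
O_prime = np.argsort(C, axis=1, kind="stable")
scores = attribute_scores(C, O_prime, width=1)
print(scores)  # feature 1 scores highest, so it is reduced first
```

Because the frequency term is an integer count while the mean and variance lie in [0, 1], the correlation statistics mainly act as tie-breakers between features that occur equally often. This matches the stated purpose of adding them.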

(3) Dimension reduction algorithm: based on the CIDM and the feature scoring method, a CIDM-based feature selection method is provided. The pseudocode of the dimension reduction algorithm is shown in FIGS. 2 and 3.

The whole dimension reduction process can be understood from the algorithm pseudocode. In Algorithm 1 (FIG. 2), the parameter goal_dim is the dimension to which data set X is reduced; it is set so that the reduced data match the input interface of the subsequent classification model. E is an M×M identity matrix. The number of iterations of the whole reduction process is given by iterations, the quotient of the difference between the initial data dimension M and goal_dim divided by width, and remainder is the corresponding remainder. The parameter width is determined experimentally. If remainder is not equal to 0, then after the iterations are completed, a further remainder attributes, fewer than the number reduced per iteration, still need to be reduced. Note that features are scored with a local matrix in each iteration in order to resolve the trade-off among multiple highly correlated features.

For example, in an extreme case two features are highly correlated, and only one of them needs to be reduced. Which one should be chosen? Statistical scoring based on the local matrix solves this problem well.
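The iteration structure described above (iterations passes of width features, then one pass for remainder) can be sketched end to end. This is only an approximation of Algorithms 1 and 2, which are not reproduced in the text. The following are all assumptions:

- plain absolute Pearson correlation is used in place of the CIDM adjustment of equations (4) and (5);
- self-correlations are zeroed;
- the right-most width columns of O' serve as the local matrix;
- columns are indexed directly rather than multiplied by a reduced identity matrix E.

All names are hypothetical.

```python
import numpy as np


def abs_corr(X):
    """|Pearson correlation| between the columns of X, self-correlation set to 0."""
    with np.errstate(invalid="ignore", divide="ignore"):
        C = np.atleast_2d(np.corrcoef(X, rowvar=False))
    C = np.abs(np.nan_to_num(C, nan=0.0))   # constant columns treated as uncorrelated
    np.fill_diagonal(C, 0.0)                # assumption: a feature is not its own neighbour
    return C


def attribute_scores(C, width):
    """Formula (6), using the right-most `width` columns of O' as the local matrix."""
    O_prime = np.argsort(C, axis=1, kind="stable")
    freq = np.bincount(O_prime[:, -width:].ravel(), minlength=C.shape[0])
    return C.mean(axis=1) + C.var(axis=1) + freq


def cidm_reduce(X, goal_dim, width):
    """Return the indices of the columns of X kept after reduction to goal_dim features."""
    M = X.shape[1]
    if not 1 <= goal_dim <= M or width < 1:
        raise ValueError("need 1 <= goal_dim <= M and width >= 1")
    iterations, remainder = divmod(M - goal_dim, width)
    keep = np.arange(M)
    # `iterations` passes that each reduce `width` features, then one pass for the remainder
    passes = [width] * iterations + ([remainder] if remainder else [])
    for n_drop in passes:
        scores = attribute_scores(abs_corr(X[:, keep]), width)
        drop = np.argsort(-scores, kind="stable")[:n_drop]   # highest scores are reduced
        keep = np.delete(keep, drop)
    return keep


rng = np.random.default_rng(0)
X = rng.random((200, 20))
kept = cidm_reduce(X, goal_dim=7, width=4)   # 3 passes of 4, then a remainder pass of 1
print(len(kept))  # → 7
```

Scores are recomputed on the surviving features in every pass, so a feature whose redundant partner has just been removed is no longer penalised for that correlation.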

Then, the method adopts a malicious traffic detection model based on a capsule network, implemented as follows:

To describe the capsule network-based detection model more clearly, we present it from three aspects: the operating mechanism of a single capsule, the core algorithm of the capsule network (dynamic routing), and the loss function.

(1) Capsule processing: the processing of a capsule is shown in FIG. 4, where the outputs of three capsules v¹, v² and v³ are used as the input of the next capsule. v¹, v² and v³ are multiplied by the matrices w¹, w² and w³, respectively, to obtain u¹, u² and u³. Next, u¹, u² and u³ are summed with weights to obtain s, and s is squashed to obtain v; squashing changes only the length of the vector, not its direction. The parameters w¹, w² and w³ are learned by backpropagation. c¹, c² and c³ are called coupling coefficients; they are determined dynamically while the capsule network runs. This decision process is called dynamic routing and is detailed in (2) below. The values of u, s and v are calculated from equations (7), (8) and (9).

$$u^i = v^i w^i \tag{7}$$

$$s = \sum_i c^i u^i \tag{8}$$

(9)
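A NumPy sketch of one capsule's forward pass as in FIG. 4 follows. Formula (9) is not reproduced in the text. The squash function used here is the standard one from Sabour et al. (2017), "Dynamic Routing Between Capsules", which matches the description (direction kept, length changed), but its use here is an assumption. The dimensions (4-D inputs, 8-D output) and the fixed coupling coefficients are illustrative only.

```python
import numpy as np


def squash(s, eps=1e-9):
    """Standard CapsNet squashing: keeps the direction of s, maps its length into [0, 1)."""
    norm_sq = np.sum(s * s, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)


rng = np.random.default_rng(0)
v_in = rng.normal(size=(3, 4))      # outputs v^1, v^2, v^3 of three lower capsules (4-D each)
W = rng.normal(size=(3, 4, 8))      # one learned matrix w^i per lower capsule (4x8 each)
c = np.array([0.5, 0.3, 0.2])       # coupling coefficients c^1..c^3 (fixed here, routed in practice)

u = np.einsum("id,ide->ie", v_in, W)   # formula (7): u^i = v^i w^i
s = np.sum(c[:, None] * u, axis=0)     # formula (8): weighted sum of the u^i
v = squash(s)                          # formula (9), assumed to be the standard squash
print(np.linalg.norm(s), np.linalg.norm(v))
```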

(2) Dynamic routing: c¹, c² and c³ are determined by a dynamic routing algorithm, whose pseudocode is shown in FIG. 5. First, a set of parameters B is required, with all initial values zero; its elements correspond one-to-one to {c¹, c², c³, ..., cⁱ}. The algorithm runs for T iterations, where T is a preset hyperparameter; its processing flow is shown in FIG. 6. It should be noted that the initial values of the parameters of the core routing algorithm, which determines the coupling coefficients dynamically while the capsule network runs, may also be replaced by other values; for example, the initial value 0 of parameter B may be replaced by another value.
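The pseudocode of FIG. 5 is not reproduced in the text. The sketch below follows routing-by-agreement from Sabour et al. (2017), which fits the description: B starts at zero (or any other value, via b_init), T iterations are run, and the coupling coefficients are derived from B. The softmax over output capsules, the agreement update, and all names are assumptions.

```python
import numpy as np


def squash(s, eps=1e-9):
    norm_sq = np.sum(s * s, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_routing(u_hat, T=3, b_init=0.0):
    """Routing-by-agreement over prediction vectors u_hat of shape (n_in, n_out, dim).

    Returns the output capsules v (n_out, dim) and coupling coefficients c (n_in, n_out).
    """
    if T < 1:
        raise ValueError("T must be at least 1")
    n_in, n_out, _ = u_hat.shape
    b = np.full((n_in, n_out), b_init)           # parameters B, all zero by default
    for _ in range(T):
        c = softmax(b, axis=1)                   # each lower capsule splits its output over parents
        s = np.einsum("ij,ijd->jd", c, u_hat)    # weighted sum, as in formula (8)
        v = squash(s)                            # squashing, as assumed for formula (9)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # raise b where a prediction agrees with v
    return v, c


rng = np.random.default_rng(1)
u_hat = rng.normal(size=(3, 2, 8))   # 3 lower capsules, 2 output capsules (e.g. benign, malicious)
v, c = dynamic_routing(u_hat, T=3)
print(np.linalg.norm(v, axis=-1))    # output lengths, usable as class scores
```

Passing a non-zero b_init corresponds to the variant mentioned above in which the initial value 0 of B is replaced by another value.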

(3) Loss function: the capsule neural network provides two loss functions: a margin loss for classification tasks and a reconstruction loss for sample reconstruction. Since our task is to detect malicious traffic, the margin loss is employed. The margin loss function is shown in equation (10).

$$L_k = E_k \max(0,\, m^{+} - \lVert v_k \rVert)^2 + \lambda\,(1 - E_k)\max(0,\, \lVert v_k \rVert - m^{-})^2 \tag{10}$$

where E_k = 1 if class k is present and 0 otherwise; m⁺ = 0.9 penalizes false negatives (class k is present but predicted absent); m⁻ = 0.1 penalizes false positives (class k is absent but predicted present); λ weights the loss term for absent classes; and ||v_k|| is the length of output capsule v_k.

The specific experimental design and methods used by the invention for capsule network-based malicious traffic detection of mobile APPs are as follows:

(1) Data set

The present invention employs the data set published in the paper "Lexical Mining of Malicious URLs for Classifying Android Malware". To collect network traffic, that paper uses the Android tool Monkey to send random events to the device, triggering network traffic while each application runs. To prevent traffic from different applications from being mixed, only one application is executed at a time. The data set provides the method, host, page, and name fields of the requested URLs, and each sample is represented by 1708 features. The numbers of benign and malicious traffic samples are shown in Table 1. Our feature selection operates on these 1708 features per sample.

Label	Number of samples
Benign	25,276
Malicious	11,251

Table 1 Data set attribute information

(2) Experimental design

1. Parameter setting analysis: width and γ are the key parameters of the dimension reduction method. Different parameter values affect dimension reduction efficiency. To obtain suitable parameter values for this data set while ensuring efficient and stable dimension reduction, we designed experiments for setting width and γ.

2. Feature analysis: the purpose of the feature analysis is to verify the effectiveness of the proposed dimension reduction method. To this end, we ran classification algorithms on both the original data and the dimension-reduced data. We use four popular algorithms: decision tree (DT), random forest (RF), logistic regression (LR), and k-nearest neighbors (KNN).

3. Model analysis: model analysis focuses on evaluating the capsule network and shows the performance differences between the capsule network and other deep learning networks. We selected convolutional neural networks (CNNs) for comparison. To ensure fairness, all methods used the same training and test sets in the evaluation experiments.

4. Comprehensive analysis: to further verify the effectiveness of this approach, the comprehensive analysis compares our approach with other state-of-the-art malware detection techniques.

(3) Evaluation metrics

The evaluation metrics we use are accuracy, precision, recall and F-value, all calculated from the confusion matrix shown in Table 2. TP (true positive) means that the true label of the sample is positive and the model also predicts positive. TN (true negative) means that the true label is negative and the model predicts negative. FP (false positive) means that the true label is negative but the model predicts positive. FN (false negative) means that the true label is positive but the model predicts negative. The equations for the metrics are shown below.

Table 2 Confusion matrix

(11)

(12)

(13)

(14)

(15)
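Formulas (11)-(15) are not reproduced in the text. The sketch below uses the standard definitions of the four named metrics, computed from confusion-matrix counts. Taking the F-value to be F1 (β = 1) is an assumption, and the counts in the example are made up for illustration.

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts (standard definitions)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-value taken as F1, the harmonic mean of precision and recall (beta = 1 is an assumption)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


m = evaluation_metrics(tp=8, tn=5, fp=2, fn=1)   # hypothetical counts
print(m)
```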

It should be noted that, in this document, terms of art that are not explained have their common meanings in the art, and method steps not described in detail are common knowledge to a person skilled in the art. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, or apparatus.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A deep learning method for detecting malicious traffic of a mobile APP, characterized by comprising the following steps:

first, constructing a correlation information decision matrix CIDM:

calculating the correlation coefficient between each pair of features using formula (1) to obtain a correlation coefficient matrix C, wherein Var(A_i), calculated by formula (2), is the variance of the values of feature A_i, M is the number of features, and μ_i is the mean of A_i; Cov(A_i, A_j), given by formula (3), is the covariance between features A_i and A_j, with 1 ≤ i, j ≤ M; if an element of matrix C is less than 0, it is replaced by its opposite number, so that all elements of the matrix are non-negative;

establishing an initial correlation decision matrix O of matrix C, in which every element equals the index of its column; then arranging the elements of each row of matrix C in ascending order, with the corresponding row of matrix O rearranged accordingly, to obtain O' of a correlation decision matrix CDM; finally, statistically analyzing, in each iteration, a local matrix of width columns to determine which features are reduced; the local matrix consists of width columns and M rows, width being the number of columns of the local matrix; the iteration refers to counting the occurrences of each feature in the local matrix;

then performing attribute scoring: in the local matrix, counting the frequency of each feature that appears and combining it with the mean and variance of the correlation coefficients of all features to produce a score; the score is used as the basis for deciding feature reduction; the score is given by formula (6), wherein Ave(C_i) and Var(C_i) denote the mean and variance of row i of matrix C, and S_score(A_i) denotes the statistical frequency of attribute A_i in the local matrix in the current iteration, i.e., its score:

$$\mathrm{Score}(A_i) = \mathrm{Ave}(C_i) + \mathrm{Var}(C_i) + S_{\mathrm{score}}(A_i) \tag{6}$$

reducing features according to the attribute scores by a dimension reduction algorithm;

if remainder is not equal to 0, then after the algorithm has completed the iterative reduction process, a further remainder attributes, fewer than the number reduced per iteration, are reduced; remainder is the remainder of the difference between the initial data dimension M and goal_dim divided by width, and the parameter goal_dim represents the target dimension.

2. The deep learning method for detecting malicious traffic of a mobile APP according to claim 1, wherein the processing steps of the capsule neural network are as follows: the outputs of three capsules v¹, v² and v³ are used as the input of the next capsule; v¹, v² and v³ are multiplied by matrices w¹, w² and w³, respectively, to obtain u¹, u² and u³; next, u¹, u² and u³ are summed with weights to obtain s, and s is squashed to obtain v; parameters w¹, w² and w³ are learned by backpropagation; c¹, c² and c³ are coupling coefficients;

wherein the values of u, s and v are calculated from formulas (7), (8) and (9):

$$u^i = v^i w^i \tag{7}$$

3. The deep learning method for detecting malicious traffic of a mobile APP according to claim 2, wherein c¹, c² and c³ are determined by a dynamic routing algorithm.

4. The deep learning method for detecting malicious traffic of a mobile APP according to claim 2, wherein the capsule neural network employs the margin loss function of formula (10):

$$L_k = E_k \max(0,\, m^{+} - \lVert v_k \rVert)^2 + \lambda\,(1 - E_k)\max(0,\, \lVert v_k \rVert - m^{-})^2 \tag{10}$$

wherein E_k indicates whether class k is present (1 if present, 0 if absent); m⁺ = 0.9 penalizes false negatives (class k is present but predicted absent); m⁻ = 0.1 penalizes false positives (class k is absent but predicted present); v_k denotes the capsule network's prediction for a training sample belonging to class k.