CN115018046A - Deep learning method for detecting malicious traffic of mobile APP

Info

Publication number: CN115018046A
Application number: CN202210533158.XA
Authority: CN
Inventors: 陆凯; 胡香利
Original assignee: HAINAN VOCATIONAL COLLEGE OF POLITICAL SCIENCE AND LAW
Current assignee: HAINAN VOCATIONAL COLLEGE OF POLITICAL SCIENCE AND LAW
Priority date: 2022-05-17
Filing date: 2022-05-17
Publication date: 2022-09-06
Anticipated expiration: 2042-05-17
Also published as: CN115018046B

Abstract

The invention discloses a deep learning method for detecting malicious traffic of a mobile APP (application), comprising the following steps. First, network traffic features are selected using a correlation information decision matrix (CIDM): the CIDM is constructed, and attribute scoring is then carried out. Second, detection is performed with a malicious traffic detection model based on a capsule neural network. Compared with other state-of-the-art malware detection techniques, the method improves accuracy and recall by 9.71% and 20.18%, respectively.

Description

Deep learning method for detecting malicious traffic of mobile APP

Technical Field

The invention relates to a deep learning method for detecting malicious traffic of a mobile APP.

Background

With the popularity of the Internet and mobile devices, malware has become a major threat to the growing mobile ecosystem. A statistical report from Kaspersky showed that by the end of 2021, the number of new malicious files detected each day had reached 381,600, an increase of 6.1% over the previous year. Although mobile antivirus scanners provide security protection mechanisms for Android devices, increasingly advanced mobile malware can still penetrate mobile systems by bypassing these mechanisms. As mobile devices carry more and more private user information, an efficient malware detection scheme is urgently needed.

Malware detection techniques can be divided into three types: static analysis, dynamic analysis, and network traffic analysis. The essential difference between these three methods is that they use different features of the malware. Static analysis uses the application code and its binary structure as features. However, to avoid detection by antivirus scanners, malware authors use techniques such as repackaging and code obfuscation to generate malware variants. Dynamic analysis uses the calling relations between functions while the application is running as features. This method must be carried out in a dedicated sandbox and needs enough execution to cover the behavior of the application. When a malware author repackages malware or obfuscates its code, the features used by these two methods change significantly, which degrades the performance of the detection model. From another perspective, these malware variants exhibit similar malicious behavior at runtime. In other words, the malicious traffic they trigger is similar. Network traffic analysis takes the network traffic triggered by an application as its object of study and extracts statistical features (such as packet size and packet interval) or semantic features of the HTTP header (such as host and method) from that traffic for analysis. Network traffic analysis thus overcomes the disadvantages of static and dynamic analysis, because some traffic features remain similar even if the malicious code changes significantly.

Machine learning provides a number of methods for malware detection. If the only goal is to detect malware accurately, deep learning is often the better option. Research shows that deep learning performs excellently in many application fields compared with other machine learning techniques. Deep learning has also been studied in the field of malware detection and has achieved high performance. However, the deep learning algorithms used for malware detection modeling are almost always based on convolutional neural networks. Pooling simplifies the analysis in a convolutional neural network, but it also discards some local information, which reduces robustness.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a deep learning method for detecting malicious traffic of a mobile APP.

The technical scheme adopted by the invention to solve the technical problem is as follows: a deep learning method for detecting malicious traffic of a mobile APP, comprising the following steps.

First, network traffic features are selected using a correlation information decision matrix (CIDM).

To begin, the correlation information decision matrix CIDM is constructed:

The correlation coefficient between each pair of features is calculated using formula (1) to obtain the correlation coefficient matrix C, where Var(A_i), calculated by formula (2), is the variance of the values of feature A_i, M is the number of features, and μ_i is the mean of feature A_i; Cov(A_i, A_j), shown in formula (3), is the covariance between features A_i and A_j, where 1 ≤ i, j ≤ M. If the value of an element in matrix C is less than 0, it is converted to its opposite number, so that all elements in the matrix are non-negative.

An initial correlation decision matrix O of matrix C is then established. Each element value in each row of matrix O is the sequence number of its column; for example, the value in the i-th column from the left is i. The elements of each row of matrix C are then arranged in ascending order, together with the values in the corresponding row of matrix O, to obtain the CDM O'. Finally, in each iteration, a statistical analysis is performed on a local matrix that is width columns wide to determine which features are to be reduced.

(1)

(2)

(3)

(4)

(5)

Then, attribute scoring is carried out: the frequency of each feature present in the local matrix is calculated and combined with the mean and variance of the correlation coefficients of all the features to produce a score; the score is used as the basis for deciding which features to reduce. The scoring equation is shown in formula (6), in which ave(C_i) and var(C_i) are the mean and variance of row i of matrix C, and S_score(A_i) denotes the statistical frequency of attribute A_i in the local matrix in the current iteration.

(6)

Second, detection is performed with a malicious traffic detection model based on the capsule neural network.

The capsule neural network comprises the following processing steps: the output vectors of three capsules, v¹, v² and v³, are used as input for the next capsule; v¹, v² and v³ are multiplied by the matrices w¹, w² and w³ respectively to obtain u¹, u² and u³; a weighted sum of u¹, u² and u³ is then computed to obtain s; v is obtained by squashing s; the parameters w¹ and w² are obtained by back-propagation learning; c¹, c² and c³ are the coupling coefficients. Further, c¹, c² and c³ are determined by a dynamic routing algorithm.

Further, the capsule neural network uses the margin loss function of formula (10):

(10)

where E_k indicates whether class k is present (1 if present, 0 if absent); m⁺ is 0.9 and penalizes false negatives, where class k is present but not predicted; m⁻ is 0.1 and penalizes false positives, where class k is absent but predicted.

The beneficial effects of the invention are as follows.

The method has the following advantages in detecting malicious traffic of mobile APPs:

1. A feature selection method based on the correlation information decision matrix (CIDM) is proposed. It uses soft dimension reduction to reduce the dimension of high-dimensional data, overcomes the tendency of the hard dimension reduction used in common feature selection methods to lose feature information, effectively retains the correlation information of traffic features, and improves subsequent detection performance.

2. A new malware detection method is proposed that combines feature selection with a malware detection model. The malware detection model is a capsule network based on a deep learning algorithm, which overcomes the poor robustness caused by the pooling operation of convolutional neural networks. To the best of our knowledge, this is the first time the capsule network has been applied to malware detection, and it improves the robustness of the detection model.

3. The effectiveness of the method was evaluated through detailed experiments and compared with state-of-the-art malware detection techniques. The experiments show that, compared with state-of-the-art malware detection techniques, the method improves accuracy and recall by 9.71% and 20.18%, respectively.

Drawings

FIG. 1 shows the CIDM construction process;

FIG. 2 shows the steps of dimension reduction Algorithm 1;

FIG. 3 shows the steps of dimension reduction Algorithm 2;

FIG. 4 shows the processing of a capsule network;

FIG. 5 shows the steps of the dynamic routing algorithm;

FIG. 6 shows the processing flow of the capsule network dynamic routing algorithm.

Detailed Description

For a better understanding of the present invention, embodiments are explained in detail below with reference to FIGS. 1 to 6. The invention provides a feature selection method based on a correlation information decision matrix (CIDM), including the engineering process of optimizing the CIDM from an initial correlation decision matrix (CDM), a feature attribute scoring method that serves as the basis for feature reduction, and a feature selection dimension reduction algorithm built on the CIDM and the scoring method. The invention also applies, for the first time, a deep learning capsule network (CapsNet) to the detection of mobile APP malicious traffic, including the operating mechanism of the capsule network, the core routing algorithm that dynamically determines the coupling coefficients during operation, and the margin loss function for malicious traffic detection.

First, the method of the invention uses a correlation information decision matrix (CIDM) to select network traffic features. The specific implementation steps are as follows:

(1) Constructing the Correlation Information Decision Matrix (CIDM):

Reference information for determining which features are redundant is obtained from the CIDM, and the CIDM is optimized from a Correlation Decision Matrix (CDM). The generation process of the CDM is shown in FIG. 1. First, we calculate the correlation coefficient between each pair of features using formula (1) and obtain the correlation coefficient matrix C, where Var(A_i), calculated by formula (2), is the variance of the values of feature A_i, M is the number of features, and μ_i is the mean of feature A_i. Furthermore, Cov(A_i, A_j), shown in formula (3), is the covariance between features A_i and A_j. In these formulas, 1 ≤ i, j ≤ M. If the value of an element in matrix C is less than 0, it is converted to its opposite number, so that all elements in the matrix are non-negative. Second, an initial correlation decision matrix O of matrix C is established. Each element value in each row of matrix O is the sequence number of its column; for example, the value in the i-th column from the left is i. Then, the elements of each row of matrix C are arranged in ascending order, together with the values in the corresponding row of matrix O, to obtain the CDM O'. Finally, in each iteration we perform a statistical analysis on a local matrix that is width columns wide (the boxed position in matrix O' in FIG. 1) to determine which features are reduced. Of course, when constructing the correlation information decision matrix (CIDM), the elements of each row may be arranged in either ascending or descending order to obtain the CDM O'.

(1)

(2)

(3)

(4)

(5)
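The first two construction steps can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented procedure: formulas (1) to (3) are taken to be the standard Pearson correlation, the names `abs_correlation_matrix` and `sorted_index_matrix` are hypothetical, the data are random, and indices are 0-based rather than 1-based. The adjustment by formulas (4) and (5) is not reproduced because those formulas are not available in this text.

```python
import numpy as np

def abs_correlation_matrix(X):
    """Absolute Pearson correlation between every pair of feature columns."""
    # Column i of X holds the values of feature A_i (assumption).
    # A constant column would give NaN and should be removed beforehand.
    C = np.corrcoef(X, rowvar=False)
    return np.abs(C)  # negative entries are replaced by their opposite number

def sorted_index_matrix(C):
    """Row i holds the column indices of C[i] in ascending order of value."""
    return np.argsort(C, axis=1, kind="stable")

rng = np.random.default_rng(0)   # hypothetical data: 50 samples, 5 features
X = rng.normal(size=(50, 5))
C = abs_correlation_matrix(X)
O_prime = sorted_index_matrix(C)
print(O_prime)
```

With an ascending sort, the last column of `O_prime` is each feature's own index, because a feature's correlation with itself is 1. Which columns form the local matrix is determined by FIG. 1 and is not assumed here.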

Furthermore, if dimension reduction considers only correlation, dominant features may be reduced in some extreme cases. For example, take three vectors: a = [1,0,0,0,0,0], b = [0,1,0,0,0,0], c = [1,1,1,0,1,0]. Let corrcoef(x, y) be the correlation coefficient function of two vectors x and y. Then corrcoef(a, b) = 0.2 and corrcoef(a, c) = corrcoef(b, c) = 0.32. According to the rule that the higher the correlation between features, the less information they carry, feature c, which has high information content, would be reduced. This is clearly wrong: it is as if the two low-information features a and b were voting out the feature c that differs from them. To avoid this situation, which does in fact occur in the data used in the experiments below, it is important to take the amount of information into account. The amount of information aoi(i) is represented by the number of non-zero elements of feature i. All elements in matrix C are adjusted by formula (4) and formula (5).

Here, the weighting factor in formula (4) adjusts the ranges of the correlation and the amount of information to the same order of magnitude. ave_C represents the average of all the elements in matrix C, and ave_aoi_r represents the average of aoi_r. γ controls the proportion of correlation and amount of information in the assignment, with value range [1, ∞). As is evident from formula (4), when γ equals 1, only the correlation is considered; as γ tends to infinity, only the amount of information is considered. The parameter a controls the strength of the low-information feature selection and is set to 0.9 here. The CIDM can now be obtained following the flow of FIG. 1.
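The three-vector example above can be checked numerically. The sketch below assumes the correlation function is the Pearson coefficient with negative values replaced by their opposite number, as described for matrix C; the helper names `abs_corr` and `aoi` are illustrative only.

```python
import numpy as np

a = np.array([1, 0, 0, 0, 0, 0])
b = np.array([0, 1, 0, 0, 0, 0])
c = np.array([1, 1, 1, 0, 1, 0])

def abs_corr(x, y):
    """Absolute Pearson correlation coefficient of two vectors."""
    return float(abs(np.corrcoef(x, y)[0, 1]))

def aoi(x):
    """Amount of information: the number of non-zero elements."""
    return int(np.count_nonzero(x))

print(round(abs_corr(a, b), 2))                           # 0.2
print(round(abs_corr(a, c), 2), round(abs_corr(b, c), 2)) # 0.32 0.32
print(aoi(a), aoi(b), aoi(c))                             # 1 1 4
```

Feature c has the highest correlation with the other two but also four times their amount of information, which is the situation the adjustment in formulas (4) and (5) is meant to handle.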

(2) Attribute scoring: The local matrix consists of width columns and M rows. In this matrix, the frequency of each feature present is calculated and combined with the mean and variance of the correlation coefficients of all features to produce a score. The score is used as the basis for deciding which features to reduce. The scores are sorted, and each iteration takes the width features with the largest scores as the objects to be reduced. Rather than using the occurrence frequency alone as the criterion, the mean and variance of the correlation coefficients between features are added, which avoids the ambiguity that arises when several features have the same occurrence frequency. Specifically, the scoring equation is shown in formula (6), where ave(C_i) and var(C_i) are the mean and variance of row i of matrix C, and S_score(A_i) denotes the statistical frequency of attribute A_i in the local matrix in the current iteration, i.e. the score.

(6)
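The frequency-counting part of the scoring step might look like the following sketch. It only counts occurrences; how formula (6) combines the frequency with the mean and variance is not reproduced, since the formula is not available in this text. The function name `local_frequencies`, the toy index matrix, and the choice of the first `width` columns as the local matrix are all assumptions made for illustration.

```python
import numpy as np

def local_frequencies(O_prime, cols):
    """Count how often each feature index occurs in the selected columns."""
    M = O_prime.shape[0]
    local = O_prime[:, cols]          # local matrix: M rows, `width` columns
    return np.bincount(local.ravel(), minlength=M)

# Hypothetical index matrix for 4 features, one permutation of 0..3 per row.
O_prime = np.array([[1, 2, 3, 0],
                    [0, 2, 3, 1],
                    [3, 0, 1, 2],
                    [2, 1, 0, 3]])
width = 2
freq = local_frequencies(O_prime, slice(0, width))
print(freq.tolist())  # [2, 2, 3, 1]
```

In this toy case features 0 and 1 have the same frequency, which is the kind of tie that the mean and variance terms of the score are described as resolving.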

(3) Dimension reduction algorithm: Based on the CIDM and the feature scoring method, a CIDM-based feature selection method is provided. The pseudocode for the specific implementation of the dimension reduction algorithm is shown in FIGS. 2 and 3.

The pseudocode makes the whole dimension reduction process clear. In Algorithm 1 of FIG. 2, the parameter goal_dim is the dimension to which the data set X is to be reduced. The purpose of this parameter is to match the input interface of the subsequent classification model. E is an identity matrix of M rows and M columns. iteration is the number of iterations in the whole reduction process. remainder is the remainder obtained when the difference between the original data dimension M and goal_dim is divided by width. The parameter width is determined by experiment. If remainder is not equal to 0 once the algorithm has completed the iterative reduction process, a further reduction of remainder attributes, fewer than in a regular iteration, is required. It should be noted that scoring features with a local matrix in each iteration is a way of deciding between features that are highly correlated.

For example, in an extreme case, two features are highly correlated and only one of them needs to be reduced. Which of the two should be chosen? Statistical scoring based on the local matrix resolves this well.
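The iteration count and remainder described for Algorithm 1 amount to an integer division, sketched below. The function name `reduction_schedule` and the values goal_dim = 1024 and width = 100 are hypothetical; only M = 1708 comes from the data set described in this document, and the pseudocode of FIG. 2 may organize the computation differently.

```python
def reduction_schedule(M, goal_dim, width):
    """Return (iterations, remainder) for reducing M features to goal_dim.

    Each full iteration removes `width` features; `remainder` features are
    left over for one final, smaller reduction step.
    """
    if not 0 < goal_dim <= M:
        raise ValueError("goal_dim must be in the range 1..M")
    if width <= 0:
        raise ValueError("width must be positive")
    return divmod(M - goal_dim, width)

iterations, remainder = reduction_schedule(M=1708, goal_dim=1024, width=100)
print(iterations, remainder)  # 6 84
```

Here 684 features must be removed: six iterations remove 100 each, and a final step removes the remaining 84.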

Next, the method uses a malicious traffic detection model based on a capsule network (CapsNet). The specific implementation steps are as follows:

To introduce the capsule-network-based detection model more clearly, we describe it from three aspects: the operating mechanism of a single capsule, the core algorithm (dynamic routing), and the loss function of the capsule network.

(1) Capsule processing: The processing of a capsule is shown in FIG. 4, where the outputs of three capsules serve as the input for the next capsule. The output vectors v¹, v² and v³ of the three capsules are multiplied by the matrices w¹, w² and w³ respectively to obtain u¹, u² and u³. A weighted sum of u¹, u² and u³ is then computed to obtain s. v is obtained by squashing s, which changes only its length and not its direction. The parameters w¹ and w² are obtained by back-propagation learning. c¹, c² and c³ are called the coupling coefficients. They are determined dynamically by the capsule at test time. This decision process is called dynamic routing, and the details are given in the next subsection. The values of u, s and v are calculated from formulas (7), (8) and (9).

(7)

(8)

(9)
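The capsule processing described above (matrix transform, weighted sum, then the squashing or "extrusion" step) can be sketched as follows. Formulas (7) to (9) are not available in this text, so the sketch assumes the squashing function from the original capsule network paper by Sabour et al.; the names `squash` and `capsule_forward`, the dimensions, and the fixed coupling coefficients are illustrative assumptions.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Shrink the length of s into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def capsule_forward(v_in, W, c):
    """v_in: (n, d_in) input capsule outputs, W: (n, d_out, d_in), c: (n,)."""
    u = np.einsum("nij,nj->ni", W, v_in)   # u^i = w^i v^i
    s = np.sum(c[:, None] * u, axis=0)     # s = sum over i of c^i u^i
    return u, s, squash(s)

rng = np.random.default_rng(0)
v_in = rng.normal(size=(3, 4))     # three input capsules, 4-dimensional (hypothetical)
W = rng.normal(size=(3, 2, 4))     # one 2x4 matrix per input capsule
c = np.array([0.2, 0.3, 0.5])      # coupling coefficients fixed for illustration
u, s, v = capsule_forward(v_in, W, c)
print(np.linalg.norm(s), np.linalg.norm(v))
```

The printed norms show that squashing changes the length of s but leaves its direction unchanged, which matches the description of v.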

(2) Dynamic routing: The values of c¹, c² and c³ are determined by a dynamic routing algorithm, whose pseudocode is shown in FIG. 5. First, there must be a set of parameters B, all initialized to zero, whose elements correspond one-to-one to {c¹, c², c³, ..., cⁱ}. Assuming that T iterations are run, where T is a predetermined hyperparameter, the processing flow is shown in FIG. 6. It should be noted that the initial values of the parameters of the core routing algorithm, which dynamically determines the coupling coefficients while the capsule network is running, may be replaced by other values; for example, the initial value 0 of parameter B may be replaced by another value.
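A routing loop consistent with this description is sketched below. FIGS. 5 and 6 are not available in this text, so the update rule is assumed to be the usual routing-by-agreement step: the coupling coefficients are a softmax of B, and each element of B is increased by the dot product between the corresponding u and the current output v. The sketch follows the single-output-capsule picture used here, with the softmax taken over the input capsules; in the original capsule network formulation the softmax runs over the output capsules instead. All function names and input values are hypothetical.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Shrink the length of s into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(b):
    e = np.exp(b - np.max(b))   # subtract the maximum for numerical stability
    return e / e.sum()

def dynamic_routing(u, T=3):
    """u: (n, d) prediction vectors feeding one output capsule."""
    b = np.zeros(u.shape[0])    # parameter set B, initialised to zero
    for _ in range(T):
        c = softmax(b)          # coupling coefficients c^1 .. c^n
        s = c @ u               # weighted sum of the prediction vectors
        v = squash(s)           # output of the capsule
        b = b + u @ v           # reward inputs that agree with the output
    return v, c

# Two prediction vectors that agree and one outlier (hypothetical values).
u = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [-1.0, 0.2]])
v, c = dynamic_routing(u, T=3)
print(c)
```

After a few iterations the outlier receives the smallest coupling coefficient, so it contributes least to the output capsule.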

(3) Loss function: The capsule neural network provides two loss functions: margin loss, used for classification tasks, and reconstruction loss, used for sample reconstruction. Since our task is to detect malicious traffic, margin loss is used. The margin loss function is shown in formula (10).

(10)

where E_k indicates whether class k is present (1 if present, 0 if absent). m⁺ is 0.9 and penalizes false negatives, where class k is present but not predicted. m⁻ is 0.1 and penalizes false positives, where class k is absent but predicted.
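The loss in formula (10), known as margin loss in the capsule network literature, can be sketched as below. Formula (10) itself is not available in this text, so the sketch assumes the form given in the original capsule network paper, including a down-weighting factor of 0.5 for absent classes that this document does not mention. The function name `margin_loss` and the example lengths are hypothetical.

```python
import numpy as np

def margin_loss(lengths, E, m_plus=0.9, m_minus=0.1, lam=0.5):
    """lengths: output capsule lengths ||v_k||; E: 1 if class k is present, else 0.

    lam is an assumed down-weighting factor for absent classes.
    """
    lengths = np.asarray(lengths, dtype=float)
    E = np.asarray(E, dtype=float)
    present = E * np.maximum(0.0, m_plus - lengths) ** 2
    absent = lam * (1.0 - E) * np.maximum(0.0, lengths - m_minus) ** 2
    return float(np.sum(present + absent))

# Two classes; class 0 is the true class in both examples.
print(margin_loss([0.95, 0.05], [1, 0]))  # confident and correct: 0.0
print(margin_loss([0.2, 0.8], [1, 0]))    # wrong on both capsules: about 0.735
```

The first term is active only when the true class has a capsule length below m⁺, and the second only when an absent class has a capsule length above m⁻.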

The specific experimental design for the capsule-network-based method of detecting mobile APP malicious traffic is as follows:

(1) Data set

The invention uses the data set published in the paper "Lexical Mining of Malicious URLs for Classifying Android Malware". To collect network traffic, the authors of that paper used the Android tool Monkey to send random events to the device, triggering network traffic while each application was running. To avoid mixing the network traffic of different applications, they executed only one application at a time. The data set provides information on the method, host, page, and name fields of the URL. Each sample is represented by 1708 features. The numbers of benign and malicious traffic samples are shown in Table 1. Our feature selection work is based on the 1708 features of each sample.

Label	Number
Benign	25,276
Malicious	11,251

TABLE 1 Attribute information of the data set

(2) Experimental setup

1. Parameter setting analysis: width and γ are the key parameters for dimension reduction in this method. Different parameter values affect the efficiency of the dimension reduction. To obtain parameter values suited to the data set while ensuring high dimension reduction efficiency and a stable reduction process, corresponding experiments were designed for setting width and γ.

2. Feature analysis: The purpose of the feature analysis is to verify the validity of the proposed dimension reduction method. To this end, we ran classification algorithms on the data both without and with dimension reduction. We use four of the most popular algorithms: Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), and K-Nearest Neighbor (KNN).

3. Model analysis: Model analysis focuses on the evaluation of the capsule network and shows the performance difference between the capsule network and other deep learning networks. We chose Convolutional Neural Networks (CNNs) as the object of comparison. To ensure fairness and rationality, all methods used the same training and test sets in the evaluation experiments.

4. Comprehensive analysis: To further validate the effectiveness of the approach, we compare it with other state-of-the-art malware detection techniques in the comprehensive analysis.

(3) Evaluation index

The evaluation metrics we use are accuracy, precision, recall, and F-measure, which are calculated from the confusion matrix. The confusion matrix is shown in Table 2, where TP is a true positive, meaning that the true label of the sample is positive and the model also predicts positive. TN is a true negative, meaning that the true label of the sample is negative and the model predicts negative. FP is a false positive, meaning that the true label of the sample is negative but the model predicts positive. FN is a false negative, meaning that the true label of the sample is positive but the model predicts negative. The formulas we use for these indices are shown below.

TABLE 2 Confusion matrix

(11)

(12)

(13)

(14)

(15)
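The four indices can be computed from the TP, TN, FP and FN counts as in the sketch below. Formulas (11) to (15) are not available in this text, so the standard definitions are assumed, with the F-measure taken as the harmonic mean of precision and recall; the function name and the counts in the example are hypothetical and are not results from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard accuracy, precision, recall and F-measure from the four counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print(metrics)
```

With these counts the accuracy is 0.85, the recall is 0.9, the precision is 90/110 and the F-measure is 6/7.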

It is to be noted that, in this context, unexplained terms are generic names in the art, and method steps not described in detail are also common knowledge of the person skilled in the art. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, or apparatus.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A deep learning method for detecting malicious traffic of a mobile APP, characterized by comprising the following steps:

first, constructing a correlation information decision matrix (CIDM):

calculating the correlation coefficient between each pair of features using formula (1) to obtain a correlation coefficient matrix C, wherein Var(A_i), calculated by formula (2), is the variance of the values of feature A_i, M is the number of features, and μ_i is the mean of feature A_i; Cov(A_i, A_j), shown in formula (3), is the covariance between features A_i and A_j, where 1 ≤ i, j ≤ M; if the value of an element in matrix C is less than 0, it is converted to its opposite number, so that all elements in the matrix are non-negative;

establishing an initial correlation decision matrix O of matrix C, wherein each element value in each row of matrix O is the sequence number of its column; for example, the value in the i-th column from the left is i; then arranging the elements of each row of matrix C in ascending order, together with the values in the corresponding row of matrix O, to obtain the CDM O'; finally, in each iteration, performing a statistical analysis on a local matrix that is width columns wide to determine which features are reduced;

(1)

(2)

(3)

(4)

(5)

(6)

2. The method of claim 1, wherein the capsule neural network is processed as follows: the output vectors of three capsules, v¹, v² and v³, are used as input for the next capsule; v¹, v² and v³ are multiplied by the matrices w¹, w² and w³ respectively to obtain u¹, u² and u³; a weighted sum of u¹, u² and u³ is then computed to obtain s; v is obtained by squashing s; the parameters w¹ and w² are obtained by back-propagation learning; c¹, c² and c³ are the coupling coefficients.

3. The deep learning method for detecting mobile APP malicious traffic as claimed in claim 2, wherein c¹, c² and c³ are determined by a dynamic routing algorithm.

4. The deep learning method for detecting mobile APP malicious traffic as claimed in claim 2, wherein the capsule neural network employs the margin loss function of formula (10):

(10)

where E_k indicates whether class k is present (1 if present, 0 if absent); m⁺ is 0.9 and penalizes false negatives, where class k is present but not predicted; m⁻ is 0.1 and penalizes false positives, where class k is absent but predicted.