CN115314239A - Analysis method and related equipment for hidden malicious behaviors based on multi-model fusion - Google Patents

Analysis method and related equipment for hidden malicious behaviors based on multi-model fusion

Info

Publication number
CN115314239A
Authority
CN
China
Prior art keywords
data
model
fusion
analysis
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210708183.7A
Other languages
Chinese (zh)
Inventor
刘西广
周岐文
申永利
邓敦毅
朱良军
贾润枝
崔华义
王士龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANDONG HIGHWAY ENGINEERING TECHNOLOGY RESEARCH CENTER CO LTD
Tianjin Haidel Technology Co ltd
China National Chemical Communications Construction Group Coltd
Original Assignee
SHANDONG HIGHWAY ENGINEERING TECHNOLOGY RESEARCH CENTER CO LTD
Tianjin Haidel Technology Co ltd
China National Chemical Communications Construction Group Coltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANDONG HIGHWAY ENGINEERING TECHNOLOGY RESEARCH CENTER CO LTD, Tianjin Haidel Technology Co ltd, China National Chemical Communications Construction Group Coltd filed Critical SHANDONG HIGHWAY ENGINEERING TECHNOLOGY RESEARCH CENTER CO LTD
Priority to CN202210708183.7A priority Critical patent/CN115314239A/en
Publication of CN115314239A publication Critical patent/CN115314239A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/04: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/16: Implementing security features at a particular protocol layer
    • H04L63/168: Implementing security features at a particular protocol layer above the transport layer
    • H04L63/20: Network architectures or network communication protocols for network security for managing network security; network security policies in general

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides an analysis method for hidden malicious behaviors based on multi-model fusion and related equipment, comprising the following steps: acquiring encrypted traffic data and preprocessing the encrypted traffic data; inputting the preprocessed encrypted traffic data into a pre-trained fusion analysis model, and outputting analysis result data through the fusion analysis model; in response to determining that the analysis result data is smaller than a preset threshold, the encrypted traffic data is hidden malicious behavior traffic data; and in response to determining that the analysis result data is greater than or equal to the preset threshold, the encrypted traffic data is normal traffic data. By constructing a fusion analysis model that fully extracts features, the method and device significantly improve the detection accuracy for hidden malicious behaviors.

Description

Analysis method and related equipment for hidden malicious behaviors based on multi-model fusion
Technical Field
The application relates to the technical field of network security, in particular to an analysis method and related equipment for hidden malicious behaviors based on multi-model fusion.
Background
In the communication process, various encryption technologies are often adopted to protect information content. While protecting privacy, encryption also brings hidden dangers to network security: a network attacker can use encryption to hide malicious behaviors, evade detection by network monitoring systems, carry out network attacks and compromise information security.
Current encrypted traffic identification methods mainly fall into six categories: methods based on payload randomness detection, classification methods based on the payload, classification methods based on packet distribution, classification methods based on deep learning, classification methods based on host behavior, and hybrid methods combining multiple strategies. Deep-learning-based classification is the mainstream approach to encrypted traffic identification; commonly used models include the deep belief network (DBN), the convolutional neural network (CNN), the deep autoencoder (AE) and the recurrent neural network (RNN), all of which are widely applied to encrypted traffic analysis. However, a single model suffers from insufficient feature learning when used for encrypted traffic identification, which affects detection accuracy.
Disclosure of Invention
In view of this, an object of the present application is to provide an analysis method for hidden malicious behaviors based on multi-model fusion and related devices, which use multiple deep learning models to fully extract features and improve detection accuracy.
Based on the above purpose, the present application provides a method for analyzing hidden malicious behaviors based on multi-model fusion, which includes:
acquiring encrypted traffic data and preprocessing the encrypted traffic data;
inputting the preprocessed encrypted traffic data into a pre-trained fusion analysis model, and outputting analysis result data through the fusion analysis model;
in response to determining that the analysis result data is smaller than a preset threshold, the encrypted traffic data is hidden malicious behavior traffic data;
and in response to determining that the analysis result data is greater than or equal to the preset threshold, the encrypted traffic data is normal traffic data.
Based on the same inventive concept, the application also provides an analysis device for hidden malicious behaviors based on multi-model fusion, which comprises:
a data processing module configured to: acquire encrypted traffic data and preprocess the encrypted traffic data;
a data analysis module configured to: input the preprocessed encrypted traffic data into a pre-trained fusion analysis model, and output analysis result data through the fusion analysis model;
a data determination module configured to: in response to determining that the analysis result data is smaller than a preset threshold, determine that the encrypted traffic data is hidden malicious behavior traffic data; and in response to determining that the analysis result data is greater than or equal to the preset threshold, determine that the encrypted traffic data is normal traffic data.
Based on the same inventive concept, the present application further provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable by the processor, wherein the processor implements the method as described above when executing the computer program.
Based on the same inventive concept, the present application also provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method as described above.
As can be seen from the above, with the analysis method for hidden malicious behaviors based on multi-model fusion and the related devices provided by the present application, the acquired encrypted traffic data is preprocessed and then input into the pre-trained fusion analysis model to obtain analysis result data, and whether hidden malicious behaviors exist in the encrypted traffic data is judged by comparing the analysis result data with a preset threshold. Compared with a detection model built from a single deep learning model, the fusion analysis model recognizes the features of the data more comprehensively and applies to a wider range of data types, thereby improving the detection accuracy for encrypted traffic data.
Drawings
In order to more clearly illustrate the technical solutions in the present application or in the related art, the drawings needed for describing the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating a method for analyzing concealed malicious behavior according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of data preprocessing according to an embodiment of the present application;
FIG. 3 is a diagram illustrating an MLP model according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a 1D-CNN model according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an LSTM model according to an embodiment of the present application;
FIG. 6 is a flow chart of pre-training of a fusion analysis model according to an embodiment of the present application;
FIG. 7 is a block diagram of an apparatus for analyzing hidden malicious behaviors according to an embodiment of the present application;
FIG. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below with reference to the accompanying drawings in combination with specific embodiments.
It should be noted that technical terms or scientific terms used in the embodiments of the present application should have a general meaning as understood by those having ordinary skill in the art to which the present application belongs, unless otherwise defined. The use of "first," "second," and similar terms in the embodiments of the present application is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item preceding the word comprises the element or item listed after the word and its equivalent, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
As described in the background, the detection of hidden malicious behaviors in encrypted traffic data currently relies mainly on classification methods based on a single deep learning model. When a single deep learning model performs feature learning, it learns only part of the features, and the judgment of hidden malicious behaviors is made from that partial feature set; because the model fails to learn certain classes of features, the detection accuracy suffers.
To address the problem of detecting hidden malicious behaviors in encrypted traffic data in the related art, the present application builds a fusion analysis model for feature learning by fusing multiple deep learning models, such as a multilayer perceptron (MLP), a one-dimensional convolutional neural network (1D-CNN) and a long short-term memory (LSTM) network, outputs analysis result data, and judges whether hidden malicious behaviors exist in the encrypted traffic data by comparing the analysis result data with a preset threshold, which reduces misjudgments caused by the insufficient feature learning of a single deep learning model and improves detection accuracy.
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
The present application provides an analysis method for hidden malicious behaviors based on multi-model fusion. Referring to FIG. 1, the method comprises the following steps:
Step S100: acquiring encrypted traffic data and preprocessing the encrypted traffic data.
Encrypted traffic data is screened out of the received information. Before the encrypted traffic data is input into the fusion analysis model, it is split, irrelevant data that may affect the detection accuracy or the detection speed is removed, the remaining encrypted traffic data is unified to a preset data length, and the data is converted into a format that the fusion analysis model can recognize.
Step S200: inputting the preprocessed encrypted traffic data into a pre-trained fusion analysis model, and outputting analysis result data through the fusion analysis model.
After the preprocessed encrypted traffic data is input into the pre-trained fusion analysis model, the deep learning models contained in the fusion analysis model each perform feature learning on the preprocessed encrypted traffic data and output their own recognition result data, and the analysis result data is obtained by combining these results according to the weighting formula built into the fusion analysis model.
Step S300: in response to determining that the analysis result data is smaller than a preset threshold, the encrypted traffic data is hidden malicious behavior traffic data.
The analysis result data is the probability that the encrypted traffic data is normal traffic data. When the analysis result data is smaller than the preset threshold, the encrypted traffic data contains hidden malicious behaviors and is therefore hidden malicious behavior traffic data. The preset threshold is set in advance according to the actual situation.
Step S400: in response to determining that the analysis result data is greater than or equal to the preset threshold, the encrypted traffic data is normal traffic data.
As described above, the analysis result data is the probability that the encrypted traffic data is normal traffic data; when the analysis result data is greater than or equal to the preset threshold, the encrypted traffic data contains no hidden malicious behaviors and is normal traffic data.
As an alternative embodiment, referring to fig. 2, the preprocessing of the encrypted traffic data in step S100 in the foregoing embodiment specifically includes:
step S110: carrying out flow splitting on the encrypted flow data;
if the acquired encrypted flow data is too large, the processing speed is very slow, even no response occurs, and in order to avoid such events, the encrypted flow data is firstly split into a plurality of small files and stored after being acquired.
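For illustration only, the splitting step could be sketched in Python with Scapy as below; the embodiment does not name a tool, so Scapy, the chunk size and the file-name pattern are assumptions.

```python
from scapy.all import PcapReader, wrpcap

def split_pcap(in_path: str, prefix: str, packets_per_file: int = 10000) -> None:
    """Stream a large capture and write it back out as a series of small pcap files."""
    batch, index = [], 0
    with PcapReader(in_path) as reader:          # streaming read avoids loading everything at once
        for pkt in reader:
            batch.append(pkt)
            if len(batch) == packets_per_file:
                wrpcap(f"{prefix}_{index:04d}.pcap", batch)
                batch, index = [], index + 1
    if batch:                                    # flush the final, possibly short, chunk
        wrpcap(f"{prefix}_{index:04d}.pcap", batch)

# split_pcap("captured_traffic.pcap", "flows/part")   # paths are placeholders
```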
Step S120: performing data cleaning on the encrypted traffic data after traffic splitting.
The split encrypted traffic data may contain interference data that affects detection accuracy, such as ARP packets, TCP packets without a payload, and the IP and port information carried in TCP and UDP data. Such data needs to be removed from the split encrypted traffic data to improve detection accuracy.
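A minimal sketch of this cleaning step is shown below, assuming Scapy is used (the embodiment names no specific tool): ARP packets and payload-less TCP segments are dropped, and IP addresses and ports are masked here as one possible way of removing that information.

```python
from scapy.all import rdpcap, wrpcap, ARP, IP, TCP, UDP, Raw

def clean_pcap(in_path: str, out_path: str) -> None:
    kept = []
    for pkt in rdpcap(in_path):
        if pkt.haslayer(ARP):
            continue                          # ARP carries no application payload
        if pkt.haslayer(TCP) and not pkt.haslayer(Raw):
            continue                          # payload-less TCP segment (e.g. a pure ACK)
        if pkt.haslayer(IP):
            pkt[IP].src = "0.0.0.0"           # mask IP information
            pkt[IP].dst = "0.0.0.0"
            del pkt[IP].chksum                # let Scapy recompute checksums on write
        if pkt.haslayer(TCP):
            pkt[TCP].sport = 0                # mask port information
            pkt[TCP].dport = 0
            del pkt[TCP].chksum
        elif pkt.haslayer(UDP):
            pkt[UDP].sport = 0
            pkt[UDP].dport = 0
            del pkt[UDP].chksum
        kept.append(pkt)
    wrpcap(out_path, kept)
```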
Step S130: performing traffic optimization on the cleaned encrypted traffic data to remove redundant data.
The cleaned encrypted traffic data may still contain empty files or duplicate files, and such redundant data reduces the detection speed. Empty files are removed entirely and only one copy of each duplicate file is kept, which improves the detection speed.
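The redundancy removal can be sketched as follows, assuming each split flow is stored as its own file; hashing is used here only as one convenient way to find byte-identical duplicates.

```python
import hashlib
import os

def optimize_flows(directory: str) -> None:
    """Delete empty flow files and keep a single copy of byte-identical flow files."""
    seen = set()
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        if os.path.getsize(path) == 0:
            os.remove(path)                   # empty file: nothing to learn from
            continue
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest in seen:
            os.remove(path)                   # duplicate: only the first copy is kept
        else:
            seen.add(digest)
```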
Step S140: unifying the data length of the traffic-optimized encrypted traffic data and converting it into a preset data format.
The detection process requires controlling a single variable so that irrelevant factors do not influence the detection result, so all data lengths are unified to a fixed length. The same data can be truncated at different byte lengths to form several comparison groups, which are input separately into the pre-trained fusion analysis model, and the byte length of the comparison group with the highest accuracy is taken as the unified fixed length. Unifying the length means truncating the part of the data that exceeds the fixed length and padding the tail of data that falls short of it. Because the fusion analysis model contains several deep learning models and the length-unified encrypted traffic data must be input into all of them for feature learning, the data format is also unified into a format the models can recognize, such as the IDX format.
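A possible length-unification helper is sketched below; the fixed length of 784 bytes is an assumed example value only (the embodiment selects the length experimentally through the comparison groups), and the resulting arrays can afterwards be serialized into the IDX format the models read.

```python
import numpy as np

FIXED_LEN = 784   # assumed example value; in practice chosen via the comparison-group test

def unify_length(raw_bytes: bytes, fixed_len: int = FIXED_LEN) -> np.ndarray:
    """Truncate flows longer than fixed_len and zero-pad shorter flows at the tail."""
    buf = np.frombuffer(raw_bytes[:fixed_len], dtype=np.uint8).copy()
    if buf.size < fixed_len:
        buf = np.concatenate([buf, np.zeros(fixed_len - buf.size, dtype=np.uint8)])
    return buf
```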
As an alternative embodiment, referring to FIGS. 3, 4 and 5, the fusion analysis model includes at least a multilayer perceptron MLP, a one-dimensional convolutional neural network 1D-CNN and a long short-term memory LSTM network; the multilayer perceptron MLP at least comprises an input layer, a hidden layer, a Dropout layer and an output layer, the one-dimensional convolutional neural network 1D-CNN at least comprises a convolutional layer, a pooling layer, a Batchnorm layer, a Flatten layer, a Dropout layer and a fully connected layer, and the long short-term memory LSTM network at least comprises an LSTM layer, a Dropout layer and a fully connected layer.
For example, as shown in FIG. 3, in the multilayer perceptron MLP the preprocessed data first enters the embedding layer of the fusion analysis model to be encoded and converted into vector form, and the vectors are then passed to the input layer. The input layer needs to flatten the vectors, where flattening means turning them into one-dimensional input, so the input layer can be regarded as a Flatten layer. The flattened data enters the hidden layers for feature learning, and each hidden layer is followed by a Dropout layer, which randomly discards neural network units of the preceding hidden layer to avoid overfitting; overfitting means that the model picks up information from irrelevant data during training and encodes it in the parameters of the model structure. Both the hidden layers and the Dropout layers can be stacked in multiple layers: more layers learn more features but consume more time, and the detection performance does not rise linearly with depth. The main detection performance indexes are accuracy, precision and recall, where recall is the ratio of the number of correctly predicted positive samples to the number of actual positive samples. A suitable number of hidden and Dropout layers can therefore be chosen according to the actual application. The last layer of the model is the output layer, which is responsible for outputting the result data.
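A minimal Keras sketch of such an MLP branch is given below; the embedding size, the number of hidden layers, the layer widths and the dropout rate are assumed example values rather than values fixed by the embodiment.

```python
import tensorflow as tf

def build_mlp(vocab: int = 256, embed_dim: int = 32, hidden_units: int = 128,
              n_hidden: int = 2, dropout: float = 0.5) -> tf.keras.Model:
    """Embedding -> Flatten -> (Dense + Dropout) x n_hidden -> sigmoid output."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Embedding(vocab, embed_dim))       # encode bytes as vectors
    model.add(tf.keras.layers.Flatten())                         # the flattened "input layer"
    for _ in range(n_hidden):
        model.add(tf.keras.layers.Dense(hidden_units, activation="relu"))
        model.add(tf.keras.layers.Dropout(dropout))              # randomly drops units to curb overfitting
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))    # probability of normal traffic
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```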
The one-dimensional convolutional neural network 1D-CNN can be structured as shown in FIG. 4. The preprocessed data first enters the embedding layer of the fusion analysis model to be encoded and converted into vector form, and is then input into a convolutional layer for feature learning. The output of the convolutional layer enters a Batchnorm layer, which renormalizes the convolutional output to a standard normal distribution, improving training speed and reducing the probability of gradient explosion and vanishing gradients. The output of the Batchnorm layer is passed through an activation function, which introduces non-linear factors into the neural network so that it can fit various curves, and the activation output is fed into a pooling layer for feature compression; feature compression reduces the complexity of the data and helps prevent overfitting. The output of the pooling layer then passes through further convolutional, Batchnorm and pooling layers in sequence, and the detection performance indexes for different numbers of convolutional, Batchnorm and pooling layers can be obtained by testing. The data finally output by the convolutional stack is flattened, passed through a Dropout layer, and input into a fully connected layer, which outputs the result data.
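The 1D-CNN branch could be sketched as follows, again with assumed layer counts and hyperparameters; the Conv1D, Batchnorm, activation and pooling ordering mirrors the description above.

```python
import tensorflow as tf

def build_1d_cnn(vocab: int = 256, embed_dim: int = 32, dropout: float = 0.5) -> tf.keras.Model:
    """Embedding -> [Conv1D -> BatchNorm -> ReLU -> MaxPool] x 2 -> Flatten -> Dropout -> Dense."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Embedding(vocab, embed_dim))
    for filters in (32, 64):                                     # assumed two convolutional stages
        model.add(tf.keras.layers.Conv1D(filters, kernel_size=3, padding="same"))
        model.add(tf.keras.layers.BatchNormalization())          # standardizes activations, speeds training
        model.add(tf.keras.layers.Activation("relu"))            # non-linearity after normalization
        model.add(tf.keras.layers.MaxPooling1D(pool_size=2))     # feature compression
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dropout(dropout))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```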
The long short-term memory LSTM network can be structured as shown in FIG. 5. The preprocessed data enters the embedding layer of the fusion analysis model to be encoded and converted into vector form, and is then input into the LSTM layer for feature learning; a Dropout layer follows the LSTM layer. The relationship between the number of LSTM and Dropout layers and the detection performance indexes can be obtained by testing, and the result data is output by the fully connected layer.
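A corresponding LSTM branch can be sketched in the same way; the unit count and dropout rate are assumed values.

```python
import tensorflow as tf

def build_lstm(vocab: int = 256, embed_dim: int = 32, units: int = 64,
               dropout: float = 0.5) -> tf.keras.Model:
    """Embedding -> LSTM -> Dropout -> sigmoid output."""
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab, embed_dim),
        tf.keras.layers.LSTM(units),                 # sequence-level feature learning
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```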
As an alternative embodiment, inputting the preprocessed encrypted traffic data into the pre-trained fusion analysis model and outputting analysis result data via the fusion analysis model includes: inputting the encrypted traffic data into the multilayer perceptron MLP, the one-dimensional convolutional neural network 1D-CNN and the long short-term memory LSTM network respectively, and outputting their respective result data through the multilayer perceptron MLP, the one-dimensional convolutional neural network 1D-CNN and the long short-term memory LSTM network; and performing a calculation based on all the result data to obtain the analysis result data.
The encrypted traffic data is input into the multilayer perceptron MLP, the one-dimensional convolutional neural network 1D-CNN and the long short-term memory LSTM network respectively for feature learning and DNS (Domain Name System) covert tunnel detection. If the detection result is normal traffic data, the result data output by the output layer or the fully connected layer is 1; if the encrypted traffic data contains hidden malicious behaviors, the result data output by the output layer or the fully connected layer is 0. DNS covert tunnel detection here means checking whether the indexes of the DNS behavior features among the features learned by the deep learning model exceed normal limits; the DNS behavior features include the maximum number of requests per minute, the maximum number of requests for a single domain, the number of different cities to which the resolved IP addresses belong, and the like. The multilayer perceptron MLP, the one-dimensional convolutional neural network 1D-CNN and the long short-term memory LSTM network are assigned different weights, and the analysis result data is obtained by summing the product of each deep learning model's result data and its weight.
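Putting the three branches together, the weighted fusion of formula (1) and the threshold decision of steps S300/S400 could look like the sketch below; the threshold value of 0.5 is only an assumed example, and the weights are those computed for the MLP, 1D-CNN and LSTM (in that order).

```python
import numpy as np

def fuse_predictions(x, models, weights) -> float:
    """Weighted sum of the per-model probabilities, i.e. formula (1)."""
    # x: one preprocessed flow with shape (1, fixed_len); models: [mlp, cnn, lstm]
    probs = [float(m.predict(x, verbose=0).ravel()[0]) for m in models]
    return float(np.dot(probs, weights))

def classify(x, models, weights, threshold: float = 0.5) -> str:
    p = fuse_predictions(x, models, weights)
    return "normal traffic" if p >= threshold else "hidden malicious behavior"
```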
As an alternative embodiment, referring to FIG. 6, the pre-training of the fusion analysis model used in step S200 in the foregoing embodiment specifically includes:
step S210: constructing an initial training set;
Training a model requires collecting enough data; the data can be collected manually or a suitable network security data set can be selected directly. The collected data is labeled: normal traffic is labeled 1 and malicious traffic is labeled 0, and all labeled data together form the initial training set. The ratio of the amount of normal traffic to abnormal traffic is kept within a reasonable range to optimize the training effect.
Step S220: preprocessing the initial training set;
The data in the initial training set is subjected to traffic splitting, data cleaning, traffic optimization and data length unification, and is converted into a data format recognizable by the fusion analysis model; preprocessing the data in the initial training set improves both the detection accuracy and the detection speed.
Step S230: dividing the preprocessed initial training set into a training set, a verification set and a test set;
The proportions of the training set, the verification set and the test set are chosen according to the amount of data in the initial training set: if the amount of data is small, a ratio of 6:2:2 for the training, verification and test sets is generally suitable; if the amount of data is large, the proportion of the training set can be increased appropriately.
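An illustrative 6:2:2 split is shown below, with random stand-ins for the preprocessed samples X and labels y so the sketch runs on its own; stratification keeps the normal/malicious ratio comparable across the three sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-ins: X would be the preprocessed flows, y the labels (1 = normal, 0 = malicious).
X = np.random.randint(0, 256, size=(1000, 784), dtype=np.uint8)
y = np.random.randint(0, 2, size=1000)

# 60 % train, then split the remaining 40 % evenly into verification and test sets.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42)
```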
Step S240: pre-training the fusion analysis model with the training set, verifying the fusion analysis model with the verification set, and testing the fusion analysis model with the test set;
First, the training set is input into the fusion analysis model for training, and the training effect is judged from the accuracy and the loss value during training, where the loss value is an index that directly evaluates how well the model fits the training set. If the training effect does not meet the training requirement, the hyperparameters are adjusted and training continues; the adjustable hyperparameters include the dimension of the convolution kernel, the drop rate of the Dropout layer, and the like. If the training effect is good, the fusion analysis model is verified with the verification set: if the verification effect is good, the model is saved; if not, the hyperparameters are adjusted and the model is retrained. After all training rounds are completed, the model is validated with the test set, which simulates real-environment data.
Step S250: in response to the fusion analysis model reaching a preset training cut-off condition, ending the pre-training.
The preset training cut-off condition includes a normal cut-off condition and an early cut-off condition. The normal cut-off condition is reaching the preset number of training rounds. The early cut-off condition is triggered when the loss value on the training set keeps decreasing while the loss value on the verification set no longer changes, which indicates that the model is overfitting; stopping early at this point reduces wasted time and computing power.
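The training of one member model, with the early cut-off condition expressed through Keras EarlyStopping, could be sketched as follows; it reuses the builder functions and split arrays from the sketches above, and the epoch count, batch size and patience are assumed values.

```python
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",     # watch the verification loss
                                              patience=5,
                                              restore_best_weights=True)
model = build_mlp()                      # or build_1d_cnn() / build_lstm()
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=100, batch_size=64,
                    callbacks=[early_stop])
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)   # test accuracy = v_i for formula (2)
```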
As an alternative embodiment, the analysis result data is obtained by calculation based on all the result data according to formula (1):
p = p_MLP × w_1 + p_1D-CNN × w_2 + p_LSTM × w_3    (1)
wherein p represents the analysis result data, p_MLP, p_1D-CNN and p_LSTM represent the result data output by the MLP, the 1D-CNN and the LSTM, respectively, and w_1, w_2 and w_3 represent the weights of the MLP, the 1D-CNN and the LSTM, respectively.
The analysis result data is the probability that the encrypted traffic data is normal traffic data, and whether hidden malicious behaviors exist in the encrypted traffic data can be judged by comparing this probability with the preset threshold.
As an alternative embodiment, the weights are calculated by formula (2):
w_i = f(v_i) / Σ_{j=1}^{n} f(v_j)    (2)
wherein i represents the i-th model, w_i represents the weight of the i-th model, v_i represents the accuracy of the i-th model on the test set, n represents the total number of models, f(v_i) = v_i^α, and α is a constant greater than 1.
The weights are an important index of multi-model fusion. In order to improve the accuracy of the fusion analysis model, f(v_i) = v_i^α with α greater than 1 is introduced into formula (2), which raises the weight of the more accurate models within the fusion analysis model and thereby improves its overall accuracy.
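Assuming the normalized form shown above, the weight computation can be sketched as follows; the value α = 2 and the test-set accuracies are illustrative only (the embodiment leaves both to the actual test results).

```python
def fusion_weights(accuracies, alpha: float = 2.0):
    """w_i = v_i**alpha / sum_j(v_j**alpha); alpha > 1 boosts the more accurate models."""
    powered = [v ** alpha for v in accuracies]
    total = sum(powered)
    return [p / total for p in powered]

# Illustrative test-set accuracies for the MLP, 1D-CNN and LSTM.
w1, w2, w3 = fusion_weights([0.93, 0.97, 0.95], alpha=2.0)
```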
It should be noted that the method of the embodiment of the present application may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the multiple devices may only perform one or more steps of the method of the embodiment, and the multiple devices interact with each other to complete the method.
It should be noted that the above describes some embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, corresponding to the method of any embodiment, the application also provides a hidden malicious behavior analysis device based on multi-model fusion.
Referring to fig. 7, the hidden malicious behavior analysis apparatus based on multi-model fusion includes:
a data processing module 11 configured to: acquire encrypted traffic data and preprocess the encrypted traffic data;
a data analysis module 12 configured to: input the preprocessed encrypted traffic data into a pre-trained fusion analysis model, and output analysis result data through the fusion analysis model;
a data determination module 13 configured to: in response to determining that the analysis result data is smaller than a preset threshold, determine that the encrypted traffic data is hidden malicious behavior traffic data; and in response to determining that the analysis result data is greater than or equal to the preset threshold, determine that the encrypted traffic data is normal traffic data.
As an optional embodiment, the data processing module 11 is further configured to:
perform traffic splitting on the encrypted traffic data;
perform data cleaning on the encrypted traffic data after traffic splitting;
perform traffic optimization on the cleaned encrypted traffic data to remove redundant data;
unify the length of the traffic-optimized encrypted traffic data and convert it into a preset data format.
As an optional embodiment, the data analysis module 12 is further configured to:
input the encrypted traffic data into the multilayer perceptron MLP, the one-dimensional convolutional neural network 1D-CNN and the long short-term memory LSTM network respectively, and output their respective result data through the multilayer perceptron MLP, the one-dimensional convolutional neural network 1D-CNN and the long short-term memory LSTM network;
and perform a calculation based on all the result data to obtain the analysis result data.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the various modules may be implemented in the same one or more pieces of software and/or hardware in the practice of the present application.
The apparatus of the above embodiment is used to implement the analysis method for hidden malicious behaviors based on multi-model fusion in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
Based on the same inventive concept, corresponding to the method of any embodiment described above, the present application further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the analysis method for hidden malicious behaviors based on multi-model fusion described in any of the above embodiments.
Fig. 8 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static Memory device, a dynamic Memory device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component within the device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various sensors, etc., and the output devices may include a display, speaker, vibrator, indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present device and other devices. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, bluetooth and the like).
The bus 1050 includes a path to transfer information between various components of the device, such as the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only the components necessary to implement the embodiments of the present disclosure, and need not include all of the components shown in the figures.
The electronic device of the foregoing embodiment is used to implement the analysis method for hidden malicious behaviors based on multi-model fusion in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
Based on the same inventive concept, corresponding to any of the above embodiments, the present application further provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the analysis method for hidden malicious behaviors based on multi-model fusion as described in any of the above embodiments.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The computer instructions stored in the storage medium of the above embodiment are used to cause the computer to execute the analysis method for hidden malicious behaviors based on multi-model fusion as described in any of the above embodiments, and have the beneficial effects of the corresponding method embodiments, which are not repeated here.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the context of the present application, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the application. Further, devices may be shown in block diagram form in order to avoid obscuring embodiments of the application, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the application are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that the embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures, such as Dynamic RAM (DRAM), may use the discussed embodiments.
The present embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalents, improvements, and the like that may be made without departing from the spirit or scope of the embodiments of the present application are intended to be included within the scope of the claims.

Claims (10)

1. A hidden malicious behavior analysis method based on multi-model fusion is characterized by comprising the following steps:
acquiring encrypted traffic data and preprocessing the encrypted traffic data;
inputting the preprocessed encrypted traffic data into a pre-trained fusion analysis model, and outputting analysis result data through the fusion analysis model;
in response to determining that the analysis result data is smaller than a preset threshold, the encrypted traffic data is hidden malicious behavior traffic data;
and in response to determining that the analysis result data is greater than or equal to the preset threshold, the encrypted traffic data is normal traffic data.
2. The method for analyzing concealed malicious behaviors based on multi-model fusion as claimed in claim 1, wherein the preprocessing specifically includes:
carrying out traffic splitting on the encrypted traffic data;
carrying out data cleaning on the encrypted traffic data subjected to traffic splitting;
carrying out traffic optimization on the encrypted traffic data subjected to data cleaning to remove redundant data;
unifying the length of the encrypted traffic data subjected to traffic optimization, and converting the encrypted traffic data into a preset data format.
3. The analysis method for concealed malicious behaviors based on multi-model fusion as claimed in claim 1, wherein the fusion analysis model at least comprises a multilayer perceptron MLP, a one-dimensional convolutional neural network 1D-CNN and a long short-term memory LSTM network;
the multilayer perceptron MLP at least comprises an input layer, a hidden layer, a Dropout layer and an output layer, the one-dimensional convolutional neural network 1D-CNN at least comprises a convolutional layer, a pooling layer, a Batchnorm layer, a Flatten layer, a Dropout layer and a fully connected layer, and the long short-term memory LSTM network at least comprises an LSTM layer, a Dropout layer and a fully connected layer.
4. The method for analyzing concealed malicious behaviors based on multi-model fusion according to claim 3, wherein the pre-processed encrypted traffic data is input into a pre-trained fusion analysis model, and analysis result data is output via the fusion analysis model, and the method comprises the following steps:
inputting the encrypted traffic data into the multilayer perceptron MLP, the one-dimensional convolutional neural network 1D-CNN and the long short-term memory LSTM network respectively, and outputting respective result data through the multilayer perceptron MLP, the one-dimensional convolutional neural network 1D-CNN and the long short-term memory LSTM network;
and calculating based on all the result data to obtain the analysis result data.
5. The analysis method for concealed malicious behaviors based on multi-model fusion as claimed in claim 1, wherein the pre-training specifically comprises:
constructing an initial training set;
performing the pre-processing on the initial training set;
dividing the initial training set subjected to the preprocessing into a training set, a verification set and a test set;
pre-training the fusion analysis model through the training set, verifying the fusion analysis model through the verification set, and testing the fusion analysis model through the test set;
and responding to the fusion analysis model reaching a preset training cut-off condition, and finishing the pre-training.
6. The method for analyzing concealed malicious behaviors based on multi-model fusion as claimed in claim 3, wherein the step of calculating based on all the result data to obtain the analysis result data comprises:
the analysis result data is as follows:
p = p_MLP × w_1 + p_1D-CNN × w_2 + p_LSTM × w_3    (1)
wherein p represents the analysis result data, p_MLP, p_1D-CNN and p_LSTM represent the result data output by the MLP, the 1D-CNN and the LSTM, respectively, and w_1, w_2 and w_3 represent the weights of the MLP, the 1D-CNN and the LSTM, respectively.
7. The method for analyzing concealed malicious behaviors based on multi-model fusion as claimed in claim 6, wherein the formula for calculating the weight is as follows:
w_i = f(v_i) / Σ_{j=1}^{n} f(v_j)    (2)
wherein i represents the i-th model, w_i represents the weight of the i-th model, v_i represents the accuracy of the i-th model on the test set, n represents the total number of models, f(v_i) = v_i^α, and α is a constant greater than 1.
8. An analysis device for hidden malicious behaviors based on multi-model fusion is characterized by comprising:
a data processing module configured to: acquire encrypted traffic data and preprocess the encrypted traffic data;
a data analysis module configured to: input the preprocessed encrypted traffic data into a pre-trained fusion analysis model, and output analysis result data through the fusion analysis model;
a data determination module configured to: in response to determining that the analysis result data is smaller than a preset threshold, the encrypted traffic data is concealed malicious behavior traffic data; and in response to determining that the analysis result data is greater than or equal to the preset threshold, the encrypted traffic data is normal traffic data.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202210708183.7A 2022-06-21 2022-06-21 Analysis method and related equipment for hidden malicious behaviors based on multi-model fusion Pending CN115314239A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210708183.7A CN115314239A (en) 2022-06-21 2022-06-21 Analysis method and related equipment for hidden malicious behaviors based on multi-model fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210708183.7A CN115314239A (en) 2022-06-21 2022-06-21 Analysis method and related equipment for hidden malicious behaviors based on multi-model fusion

Publications (1)

Publication Number Publication Date
CN115314239A true CN115314239A (en) 2022-11-08

Family

ID=83854529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210708183.7A Pending CN115314239A (en) 2022-06-21 2022-06-21 Analysis method and related equipment for hidden malicious behaviors based on multi-model fusion

Country Status (1)

Country Link
CN (1) CN115314239A (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555450A (en) * 2018-05-31 2019-12-10 北京深鉴智能科技有限公司 Face recognition neural network adjusting method and device
CN113469366A (en) * 2020-03-31 2021-10-01 北京观成科技有限公司 Encrypted flow identification method, device and equipment
CN111629006A (en) * 2020-05-29 2020-09-04 重庆理工大学 Malicious flow updating method fusing deep neural network and hierarchical attention mechanism
CN111860395A (en) * 2020-07-28 2020-10-30 公安部第三研究所 Method for realizing prison violent behavior detection based on vision and acceleration information
CN112800468A (en) * 2021-02-18 2021-05-14 支付宝(杭州)信息技术有限公司 Data processing method, device and equipment based on privacy protection
CN113989583A (en) * 2021-09-03 2022-01-28 中电积至(海南)信息技术有限公司 Method and system for detecting malicious traffic of internet
CN113837323A (en) * 2021-11-08 2021-12-24 中国联合网络通信集团有限公司 Satisfaction prediction model training method and device, electronic equipment and storage medium
CN114022914A (en) * 2021-11-11 2022-02-08 江苏理工学院 Palm print identification method based on fusion depth network
CN114629718A (en) * 2022-04-07 2022-06-14 浙江工业大学 Hidden malicious behavior detection method based on multi-model fusion
CN114463825A (en) * 2022-04-08 2022-05-10 北京邮电大学 Face prediction method based on multi-mode fusion and related equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116401659A (en) * 2023-02-17 2023-07-07 安芯网盾(北京)科技有限公司 Multi-model fusion computer virus detection method based on deep learning
CN116401659B (en) * 2023-02-17 2024-01-30 安芯网盾(北京)科技有限公司 Multi-model fusion computer virus detection method based on deep learning

Similar Documents

Publication Publication Date Title
TWI673625B (en) Uniform resource locator (URL) attack detection method, device and electronic device
US20230041233A1 (en) Image recognition method and apparatus, computing device, and computer-readable storage medium
CN109302410A (en) A kind of internal user anomaly detection method, system and computer storage medium
CN111866024B (en) Network encryption traffic identification method and device
CN113691542B (en) Web attack detection method and related equipment based on HTTP request text
CN112789626A (en) Scalable and compressed neural network data storage system
CN110855648B (en) Early warning control method and device for network attack
CN111614599A (en) Webshell detection method and device based on artificial intelligence
CN111291817A (en) Image recognition method and device, electronic equipment and computer readable medium
CN112468487B (en) Method and device for realizing model training and method and device for realizing node detection
CN114143049B (en) Abnormal flow detection method and device, storage medium and electronic equipment
CN112948578B (en) DGA domain name open set classification method, device, electronic equipment and medium
CN111753290A (en) Software type detection method and related equipment
CN116962047A (en) Interpretable threat information generation method, system and device
CN112671724A (en) Terminal security detection analysis method, device, equipment and readable storage medium
CN115314239A (en) Analysis method and related equipment for hidden malicious behaviors based on multi-model fusion
CN110958244A (en) Method and device for detecting counterfeit domain name based on deep learning
CN113761282A (en) Video duplicate checking method and device, electronic equipment and storage medium
CN111753729A (en) False face detection method and device, electronic equipment and storage medium
CN114448661B (en) Method for detecting slow denial of service attack and related equipment
US20230084532A1 (en) Emulator detection using user agent and device model learning
CN111310176B (en) Intrusion detection method and device based on feature selection
CN114218574A (en) Data detection method and device, electronic equipment and storage medium
CN109359462A (en) False device identification method, equipment, storage medium and device
CN113963282A (en) Video replacement detection and training method and device of video replacement detection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination