Disclosure of Invention
The invention aims to provide a network homologous attack analysis method, a system, equipment and a storage medium, which are used for solving the problems of low analysis efficiency and low accuracy aiming at the network homologous attack in the prior art.
A network homologous attack analysis method comprises the following steps:
S1: establishing a historical data set from historical attack events, selecting sequence features and field features of the data streams in the historical data set based on Modbus TCP data stream features, and establishing a homologous attack analysis model based on the selected sequence features and field features;
S2: obtaining, from the attack data to be detected, the sequence features and field features of the attacker's data stream, and comparing them with the sequence features and field features in the homologous attack analysis model, respectively, to obtain attack source or attack organization information for the attack data.
Preferably, the establishing a historical data set according to the historical attack event, selecting sequence features and field features in the data stream from the historical data set based on Modbus TCP data stream features, and establishing a homologous attack analysis model based on the acquired sequence features and field features includes:
based on Modbus TCP data message format, converting sequence features and field features in data streams of a historical dataset into multidimensional vectors which can be used for feature learning in CNN according to key fields in honeypot data streams, and inputting the converted multidimensional vectors into CNN for convolution operation and maximum value pooling operation; and then taking the feature vector output by the CNN as an input layer of the LSTM, and calculating a weight matrix based on an attention mechanism to obtain a homologous attack analysis model.
Preferably, calculating the weight matrix based on the attention mechanism to obtain the homologous attack analysis model includes: iteratively optimizing the weight matrix with a back-propagation algorithm so as to obtain the homologous attack analysis model with the optimal feature vector.
Preferably, the comparing with the sequence features and field features in the homologous attack analysis model further includes: normalizing the sequence features and the field features with a softmax activation function, and classifying according to the probabilities computed from the normalized result to obtain a probability distribution result.
Preferably, after the sequence features and the field features are normalized with the softmax activation function and classified to obtain the probability distribution result, the probability distribution result generated by the softmax activation function is used to evaluate the model based on the cross-entropy loss.
A network homologous attack analysis system comprises a preprocessing module and an analysis module;
the preprocessing module is used for establishing a historical data set according to the historical attack event, selecting sequence features and field features in the data stream from the historical data set based on Modbus TCP data stream features, and establishing a homologous attack analysis model based on the acquired sequence features and field features;
the analysis module acquires sequence characteristics and field characteristics of an attacker about the data flow from the to-be-detected attack data, and compares the sequence characteristics and the field characteristics with the sequence characteristics and the field characteristics in the homologous attack analysis model respectively, so that attack source or attack organization information of the attack data is obtained.
Preferably, the preprocessing module establishes a historical data set according to the historical attack event, selects sequence features and field features in the data stream from the historical data set based on Modbus TCP data stream features, and establishes a homologous attack analysis model based on the acquired sequence features and field features, wherein the establishing the homologous attack analysis model comprises the following steps: based on Modbus TCP data message format, converting sequence features and field features in data streams of a historical dataset into multidimensional vectors which can be used for feature learning in CNN according to key fields in honeypot data streams, and inputting the converted multidimensional vectors into CNN for convolution operation and maximum value pooling operation; then, taking the feature vector output by the CNN as an input layer of the LSTM, and carrying out weight matrix calculation based on an attention mechanism to obtain a homologous attack analysis model;
calculating the weight matrix based on the attention mechanism to obtain the homologous attack analysis model comprises: iteratively optimizing the weight matrix with a back-propagation algorithm so as to obtain the homologous attack analysis model with the optimal feature vector.
Preferably, the comparing performed by the analysis module with the sequence features and field features in the homologous attack analysis model further includes: normalizing the sequence features and the field features with a softmax activation function, and classifying according to the probabilities computed from the normalized result.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the network homologous attack analysis method described above when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the network homologous attack analysis method described above.
Compared with the prior art, the invention has the following beneficial technical effects:
the invention provides a network homologous attack analysis method in which a historical data set is established from historical attack events, sequence features and field features of the data streams are selected from the historical data set based on Modbus TCP data stream features, and a homologous attack analysis model is established from the selected sequence features and field features. By comparing sequence features and field features, the method can quickly and accurately obtain the homologous classification of the attack data, so that the latest attacks and vulnerability exploits launched against the server can be effectively guarded against.
By means of the model optimization method based on the attention mechanism, unsupervised training and supervised adjustment are carried out on the homologous attack analysis model, so that the accuracy of model classification is greatly improved, and the accuracy of homologous attack analysis is improved.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a network homologous attack analysis method, which specifically comprises the following steps:
S1: establishing a historical data set from historical attack events, selecting sequence features and field features of the data streams in the historical data set based on Modbus TCP data stream features, and establishing a homologous attack analysis model based on the selected sequence features and field features;
S2: obtaining, from the attack data to be detected, the sequence features and field features of the attacker's data stream, and comparing them with the sequence features and field features in the homologous attack analysis model, respectively, to obtain attack source or attack organization information for the attack data.
Network traffic contains many attributes that are redundant or irrelevant to tracing; these reduce detection accuracy and increase computational load and complexity. In the invention, the Modbus TCP data stream features comprise the data stream duration, the TCP port number, the mean time interval between two consecutive data packets, the variance of that time interval, and the average payload size.
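As an illustration only, the five flow-level features named above can be computed from packet timestamps and payload sizes as in the following sketch. The `Packet` record, its field names and the function name are hypothetical and are not part of the described method; the use of the population variance is likewise an assumption.

```python
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass
class Packet:
    timestamp: float   # capture time in seconds (hypothetical field name)
    payload_len: int   # payload size in bytes (hypothetical field name)


def flow_statistics(packets: list[Packet], dst_port: int) -> dict[str, float]:
    """Compute the five flow-level features for a single data stream."""
    if len(packets) < 2:
        raise ValueError("need at least two packets to measure intervals")
    times = sorted(p.timestamp for p in packets)
    # Time interval between each pair of consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "duration": times[-1] - times[0],
        "tcp_port": float(dst_port),
        "interval_mean": mean(gaps),
        "interval_variance": pvariance(gaps),  # population variance (assumed)
        "payload_mean": float(mean(p.payload_len for p in packets)),
    }


flow = [Packet(0.0, 12), Packet(0.5, 12), Packet(1.5, 6)]
features = flow_statistics(flow, dst_port=502)  # 502 is the registered Modbus TCP port
```

Whether the population or the sample variance is intended is not stated in the text, so either choice should be treated as a tunable detail.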
Feature selection is performed on the Modbus TCP data stream based on flow statistics, taking into account the data stream and data message characteristics specific to Modbus TCP, the cloud server protocol. Fifteen cloud server flow features are selected and described in Table 1. The first six are specific to the Modbus TCP protocol: the Modbus transaction identifier, the Modbus protocol identifier, the MBAP length field, the unit identifier, the IP packet header length, and the Modbus TCP function code. The remaining nine are conventional traffic features: total number of bytes, sequence duration, data packet transmission rate, time interval mean, time interval variance, time interval standard deviation, time interval maximum, time interval minimum, and number of connection requests.
Table 1. Flow-layer features
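The Modbus-specific fields listed above sit at fixed offsets in a Modbus TCP frame: a 7-byte MBAP header followed by the function code as the first byte of the PDU. A minimal parsing sketch is given below; the class and function names are hypothetical, and the IP packet header length is not parsed here because it belongs to the IP layer rather than to the Modbus payload.

```python
import struct
from typing import NamedTuple


class ModbusFields(NamedTuple):
    transaction_id: int  # 2 bytes
    protocol_id: int     # 2 bytes, 0 for Modbus
    length: int          # 2 bytes, number of bytes that follow (unit id + PDU)
    unit_id: int         # 1 byte
    function_code: int   # 1 byte, first byte of the PDU


def parse_modbus_tcp(payload: bytes) -> ModbusFields:
    """Extract the MBAP header fields and the function code (big-endian)."""
    if len(payload) < 8:
        raise ValueError("a Modbus TCP frame needs at least 8 bytes")
    return ModbusFields(*struct.unpack(">HHHBB", payload[:8]))


# Read Holding Registers request: transaction 1, unit 0x11, function 0x03,
# start address 0x006B, quantity 3.
frame = bytes.fromhex("0001 0000 0006 11 03 006B 0003")
fields = parse_modbus_tcp(frame)
```

The parsed tuple can be appended to the flow statistics of the same data stream to form the 15-feature record.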
Based on Modbus TCP data message format, converting sequence features and field features in data streams of a historical dataset into multidimensional vectors which can be used for feature learning in a CNN (convolutional neural network) according to key fields in honeypot data streams, and inputting the converted multidimensional vectors into the CNN for convolutional operation and maximum value pooling operation; and then taking the feature vector output by the CNN as an input layer of an LSTM (long short term memory network), and calculating a weight matrix based on an attention mechanism to obtain a homologous attack analysis model.
The invention uses a back propagation algorithm (Back propagation algorithm, BP algorithm) to carry out iterative optimization on the weight matrix, so as to find a homologous attack analysis model of the optimal feature vector; and classifying the attack data to be detected and evaluating the model by using the obtained homologous attack analysis model.
Specifically, each data packet in the historical data set is taken in binary form. With a key field length of $m/8$ bytes, each data record is embedded into an $m$-dimensional space; with a total of $n$ data records, this yields an $m \times n$ bit matrix.
Following the Modbus TCP data flow characteristics, the convolution in the CNN uses a one-dimensional convolution layer with $k = 5$ convolution kernels, a filter size of $m \times q$ and a stride of 2, so the layer outputs a matrix of size $5 \times (n/2)$. The pooling layer reduces the dimensionality of the feature values by max pooling with a window of 2 and generates the corresponding feature map, so the CNN output feature map has size $5 \times (n/4)$.
The feature map output by the CNN is fed into the constructed LSTM network, whose input is the 5-dimensional feature vector $c_0$.
Unsupervised training and supervised fine-tuning in the LSTM network yield the data features $l_0$. Then, using the attention-based model optimization method, the weight matrix over the feature vectors of the fully connected layer is iteratively optimized with the back-propagation algorithm to obtain the optimal weight matrix, from which the optimal feature vector, and hence the homologous attack analysis model, is obtained.
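The shape bookkeeping of the embedding, convolution and pooling steps can be checked with a small sketch. It is written in plain Python for readability and rests on assumptions not stated in the text: columns past the right edge are treated as zeros so that a stride of 2 gives exactly n/2 outputs, no activation function is applied, and the kernel values are random placeholders. All function names are hypothetical.

```python
import random


def bytes_to_bits(field: bytes) -> list[int]:
    """Expand an m/8-byte key field into m bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in field for shift in range(7, -1, -1)]


def embed(fields: list[bytes]) -> list[list[int]]:
    """Build the m x n bit matrix: one row per bit, one column per record."""
    columns = [bytes_to_bits(f) for f in fields]
    return [list(row) for row in zip(*columns)]


def conv1d(matrix, kernels, stride=2):
    """Slide each m x q kernel along the n axis (zero padding on the right)."""
    m, n = len(matrix), len(matrix[0])
    feature_map = []
    for kernel in kernels:
        q = len(kernel[0])
        row = []
        for start in range(0, n, stride):
            total = 0.0
            for i in range(m):
                for j in range(q):
                    if start + j < n:  # columns beyond n count as zero
                        total += kernel[i][j] * matrix[i][start + j]
            row.append(total)
        feature_map.append(row)
    return feature_map


def max_pool(feature_map, window=2):
    """Keep the maximum of each non-overlapping window along the n axis."""
    return [[max(row[i:i + window]) for i in range(0, len(row), window)]
            for row in feature_map]


rng = random.Random(0)
m, n, q, k = 16, 8, 3, 5  # illustrative sizes; only k = 5 comes from the text
fields = [bytes(rng.randrange(256) for _ in range(m // 8)) for _ in range(n)]
kernels = [[[rng.uniform(-1, 1) for _ in range(q)] for _ in range(m)]
           for _ in range(k)]

bits = embed(fields)             # m x n
conv = conv1d(bits, kernels)     # 5 x (n/2)
pooled = max_pool(conv)          # 5 x (n/4)
```

Each column of `pooled` is a 5-dimensional vector, which matches the 5-dimensional input expected by the LSTM stage.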
The attention mechanism is calculated by the following formula:
[formula not preserved in the source text]
where the formula's terms denote, in order: the neuron output; the weight matrix at a given time step; the matrix of data features; the bias value under supervised fine-tuning; the attention vector; the iteratively optimized weight matrix; and the attention-based representation of the data stream.
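The quantities listed above (a neuron output, a weight matrix, a bias, an attention vector, an iteratively optimized weight and an attention-based representation) match a widely used additive-attention layout over LSTM hidden states. The sketch below assumes that layout; it is not taken from the original formula, and the names `W`, `b`, `u_w`, `alpha` and `s` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(v - peak) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, W, b, u_w):
    """Assumed form: u_t = tanh(W h_t + b), alpha = softmax(u_t . u_w),
    s = sum_t alpha_t h_t."""
    scores = []
    for h in hidden_states:
        u_t = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
               for row, b_i in zip(W, b)]
        scores.append(sum(u_i * c_i for u_i, c_i in zip(u_t, u_w)))
    alpha = softmax(scores)  # one attention weight per time step
    dim = len(hidden_states[0])
    s = [sum(a * h[d] for a, h in zip(alpha, hidden_states)) for d in range(dim)]
    return alpha, s


states = [[0.1, 0.0], [0.9, 0.2], [0.3, -0.4]]  # toy LSTM outputs
identity = [[1.0, 0.0], [0.0, 1.0]]
alpha, s = attention_pool(states, W=identity, b=[0.0, 0.0], u_w=[1.0, 0.0])
```

In this toy case the second time step has the largest first component and therefore receives the largest attention weight; in training, `W`, `b` and `u_w` would be the parameters adjusted by back-propagation.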
The sequence features and the field features are normalized with the softmax activation function, and classification probabilities are computed from the normalized result to obtain a probability distribution result; the probability distribution result generated by the softmax activation function is then used to evaluate the model based on the cross-entropy loss.
The back-propagation algorithm is used to iteratively optimize the weight matrix so as to minimize the deviation of the model feature vector. The algorithm comprises two stages, excitation propagation and weight updating. In the first stage, the training input is passed forward through the network to obtain the activations, and the errors of the hidden layer and the output layer are then obtained by propagating the difference from the corresponding target backward. In the second stage, the weights are updated by the following formula to obtain the weight vector:
[formula not preserved in the source text]
where the formula's terms denote, in order: the initial weight; the data features; the weight at a given iteration; the gradient correction; and one further term whose definition is not legible. b is the bias vector.
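The two stages can be illustrated on a single linear neuron. Since the update formula itself is not legible, the sketch assumes the plain gradient-descent rule (new weight = old weight minus learning rate times gradient) with a squared-error loss; the learning rate, the loss and all names are assumptions made for illustration.

```python
def forward(w, b, x):
    """Stage 1a: propagate the input forward to obtain the activation."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def backprop_step(w, b, x, target, lr=0.1):
    """One excitation-propagation and weight-update cycle."""
    # Stage 1b: compare the activation with the target to obtain the error.
    error = forward(w, b, x) - target
    loss = 0.5 * error ** 2
    # Stage 2: the gradient of the loss is error * x_i for w_i and error for b.
    new_w = [wi - lr * error * xi for wi, xi in zip(w, x)]
    new_b = b - lr * error
    return new_w, new_b, loss


w, b = [0.0, 0.0], 0.0   # initial weights and bias
x, target = [1.0, 2.0], 3.0
losses = []
for _ in range(20):
    w, b, loss = backprop_step(w, b, x, target)
    losses.append(loss)
```

With these values the error shrinks by a constant factor on every iteration, so the recorded losses decrease monotonically toward zero.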
After the feature vector is obtained from the CNN- and LSTM-based homologous attack analysis model, the classification probability is computed with the softmax activation function in order to solve the traffic classification problem. The softmax formula is:
$$y_j = \frac{e^{z_j}}{\sum_{k} e^{z_k}}$$
where $y_j$ is the classification output and $z_j$ is the j-th input of softmax. The deviation between the true value and the estimated value is then calculated, and the model output is evaluated with the cross-entropy loss to account for factors such as model parameters, vector prediction and high complexity. The cross-entropy of the probability distribution is calculated by the following formula:
[formula not preserved in the source text]
where the formula's terms denote the cross-entropy function, the sample X, the desired output, and the actual output. The cross-entropy loss is reduced by iterating the attention vector and the attention matrix.
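A short numerical sketch of the classification and evaluation step follows. The softmax is the standard definition; for the cross entropy, the categorical form (negative sum of desired output times the logarithm of actual output) is assumed, since the exact formula is not legible in the text. The three-class logits are invented for illustration.

```python
import math


def softmax(z):
    """Map raw scores to probabilities that sum to 1 (numerically stable)."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(expected, actual, eps=1e-12):
    """Assumed categorical form: -sum(expected_i * log(actual_i))."""
    return -sum(y * math.log(max(a, eps)) for y, a in zip(expected, actual))


logits = [2.0, 1.0, 0.1]                        # hypothetical scores, 3 sources
probs = softmax(logits)                         # actual output
loss_right = cross_entropy([1, 0, 0], probs)    # desired output is class 0
loss_wrong = cross_entropy([0, 0, 1], probs)    # desired output is class 2
```

The loss is small when the highest probability falls on the desired class and large otherwise, which is the signal the back-propagation stage minimizes.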
In the network homologous attack analysis method, homologous attack analysis is performed on Modbus TCP data streams. First, features of the attack data to be detected are selected based on Modbus TCP data flow characteristics. Then the sequence features and field features in the data stream are extracted, and the CNN applies convolution and pooling to the data stream features. The feature vectors output by the CNN are fed into the LSTM (long short-term memory) network to learn the sequence content, and an attention matrix is added at this stage. All weights in the attention matrix are iteratively optimized and updated with the back-propagation algorithm; classification probabilities are then obtained with the softmax activation function, while the CNN-LSTM model is evaluated with the cross-entropy loss function.
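The LSTM stage of this pipeline can be sketched as a single cell step applied to each column of the CNN feature map. The gate equations are the standard LSTM ones; the hidden size, the random initialization and every name in the sketch are assumptions, and no training is performed here.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def init_params(input_size, hidden_size, rng):
    """One (W, U, b) triple per gate, with small random placeholder weights."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {gate: (matrix(hidden_size, input_size),
                   matrix(hidden_size, hidden_size),
                   [0.0] * hidden_size)
            for gate in ("input", "forget", "output", "candidate")}


def lstm_step(x, h_prev, c_prev, params):
    """Standard LSTM cell: returns the new hidden state and cell state."""
    def gate(name, squash):
        W, U, b = params[name]
        return [squash(sum(w * xi for w, xi in zip(W[k], x))
                       + sum(u * hi for u, hi in zip(U[k], h_prev)) + b[k])
                for k in range(len(b))]
    i = gate("input", sigmoid)
    f = gate("forget", sigmoid)
    o = gate("output", sigmoid)
    g = gate("candidate", math.tanh)
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


rng = random.Random(0)
hidden_size = 4  # assumed; the text does not state the hidden size
params = init_params(input_size=5, hidden_size=hidden_size, rng=rng)
feature_map = [[rng.random() for _ in range(6)] for _ in range(5)]  # 5 x (n/4)

h, c = [0.0] * hidden_size, [0.0] * hidden_size
hidden_states = []
for t in range(len(feature_map[0])):
    x_t = [row[t] for row in feature_map]  # 5-dimensional input at step t
    h, c = lstm_step(x_t, h, c, params)
    hidden_states.append(h)
```

The resulting `hidden_states` list is what an attention layer would weight and sum before the softmax classifier.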
Based on Modbus TCP data flow characteristics, the invention provides, for the first time, a CNN-LSTM-based method for analyzing homologous attacks on the cloud server protocol. Unsupervised training and supervised fine-tuning are applied to the homologous attack analysis model, which greatly improves its classification accuracy and thereby the accuracy of homologous attack analysis.
In one embodiment of the present invention, as shown in Fig. 1, a network homologous attack analysis system includes a preprocessing module and an analysis module;
the preprocessing module is used for establishing a historical data set according to the historical attack event, selecting sequence features and field features in the data stream from the historical data set based on Modbus TCP data stream features, and establishing a homologous attack analysis model based on the acquired sequence features and field features;
the analysis module acquires sequence characteristics and field characteristics of an attacker about the data flow from the to-be-detected attack data, and compares the sequence characteristics and the field characteristics with the sequence characteristics and the field characteristics in the homologous attack analysis model respectively, so that attack source or attack organization information of the attack data is obtained.
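The division of work between the two modules can be outlined as follows. To keep the outline short, a nearest-centroid comparison stands in for the trained CNN-LSTM classifier; that substitution, the class interfaces and the toy feature vectors are all assumptions made purely for illustration.

```python
from collections import defaultdict
from math import dist


class PreprocessingModule:
    """Builds the analysis model from labeled historical attack records."""

    def build_model(self, history):
        # history: list of (feature_vector, attack_source_label) pairs.
        grouped = defaultdict(list)
        for features, source in history:
            grouped[source].append(features)
        # Placeholder model: one mean feature vector per attack source.
        return {source: [sum(col) / len(col) for col in zip(*rows)]
                for source, rows in grouped.items()}


class AnalysisModule:
    """Compares features of new attack data with the model."""

    def __init__(self, model):
        self.model = model

    def attribute(self, features):
        # Return the attack source whose stored features are closest.
        return min(self.model, key=lambda src: dist(self.model[src], features))


history = [([1.0, 1.0], "group_a"), ([1.2, 0.8], "group_a"),
           ([5.0, 5.0], "group_b")]
model = PreprocessingModule().build_model(history)
analysis = AnalysisModule(model)
source = analysis.attribute([1.1, 1.0])
```

In the described system the dictionary of centroids would be replaced by the trained network, and `attribute` would return the class with the highest softmax probability.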
In yet another embodiment of the present invention, a terminal device is provided. The terminal device includes a processor and a memory; the memory stores a computer program comprising program instructions, and the processor executes the program instructions stored in the computer storage medium. The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. As the computational core and control core of the terminal, it is adapted to implement one or more instructions, and in particular to load and execute one or more instructions so as to implement a corresponding method flow or function. The processor provided by this embodiment of the invention can be used to carry out the network homologous attack analysis method.
In a further embodiment, the present invention also provides a storage medium, specifically a computer-readable storage medium (Memory), which is a memory device in the terminal device used for storing programs and data. The computer-readable storage medium here may include both a built-in storage medium of the terminal device and an extended storage medium supported by the terminal device. The computer-readable storage medium provides storage space in which the operating system of the terminal is stored. The storage space also holds one or more instructions, which may be one or more computer programs (including program code), adapted to be loaded and executed by the processor. The computer-readable storage medium may be a high-speed RAM or a non-volatile memory, such as at least one magnetic disk memory. The one or more instructions stored in the computer-readable storage medium may be loaded and executed by the processor to implement the corresponding steps of the network homologous attack analysis method in the above embodiments.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.