CN115883261A - ATT&CK-based APT attack modeling method for power system - Google Patents


Info

Publication number
CN115883261A
CN115883261A (application CN202310187310.8A)
Authority
CN
China
Prior art keywords
attack
layer
att
vector
apt
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310187310.8A
Other languages
Chinese (zh)
Inventor
邱日轩
周欣
陈明亮
曹娜
井思桐
周自岚
肖子洋
鄂驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202310187310.8A
Publication of CN115883261A
Legal status: Pending

Abstract

The invention discloses an ATT&CK-based APT attack modeling method for a power system, comprising the following steps: analyzing the attack target and determining the attack means to use; preparing the attack according to the determined means and the characteristics of the victim, developing the corresponding program, and implanting the malicious program into the system. By modeling and analyzing the APT, the invention helps security technicians understand attack processes, identify vulnerable nodes in a network system, and perform information extraction, threat-intelligence analysis, threat hunting, attack tracing and the like, so that security risks can be understood dynamically and holistically in context, the capabilities of discovering, identifying, understanding, analyzing and responding to security threats are improved from a global perspective, and cyberspace security situation awareness is enhanced.

Description

ATT&CK-based APT attack modeling method for power system
Technical Field
The invention relates to the field of APT (advanced persistent threat) attack modeling, and in particular to an ATT&CK-based APT attack modeling method for a power system.
Background
A new-type power system centered on new energy is the new development direction of the power industry. Such a system reduces reliance on traditional energy and integrates more new energy into the grid, so the energy structure of the future power system becomes more diversified while power allocation and storage become more complicated. The new-type power system uses a large amount of digital and intelligent equipment; the addition of these new digital technologies makes the system more complex and multiplies the vulnerabilities of each link, while the share of new energy and diversified loads rises greatly.
Compared with the traditional power system, the new-type power system is more complex and changeable and carries more potential safety hazards. As critical national infrastructure it has become an important target of APT attacks, so modeling APT attacks against the new-type power system is urgent.
Disclosure of Invention
The invention aims to provide an ATT&CK-based APT attack modeling method for a power system, so as to remedy the defects noted in the background art.
To achieve the above purpose, the invention provides the following technical scheme. An ATT&CK-based APT attack modeling method for a power system comprises the following steps:
S1: analyzing the attack target and determining the attack means to use;
S2: preparing the attack according to the determined attack means and the characteristics of the victim, developing the corresponding program, and implanting the malicious program into the system;
S3: using email as the attack carrier, inducing the victim to click a phishing email, obtaining the user's account and password information, and proceeding to the next stage of the attack;
S4: obtaining system operation authority, running the malicious program, and carrying out attack operations together with the attack strategy;
S5: establishing an attack foothold, installing a backdoor program in the system, replacing a legitimate program or adding a startup program, and exploiting system-configuration errors or modifying the registry to escalate attack privileges;
S6: observing the network and system, stealing account names and passwords to access the system, creating more accounts to help achieve the goal, exploring the environment around the attack point, and spreading the attack or stealing data across systems and accounts;
S7: controlling the operation of the target system by establishing command and control at different stealth levels according to the victim's network structure and defenses;
S8: launching the attack according to its goal, packaging the stolen data, extracting it from the target network, and transmitting it through the control channel.
Preferably, in step S1, analyzing the attack target includes collecting information about the target, determining the target's defense mechanism and whether the target has attack value, and infiltrating by technical means to obtain intelligence.
In step S2, the developed programs include Trojan viruses and malicious programs; the malicious program is implanted into the system via a 0-day vulnerability, spear phishing or mobile storage devices, and preparation for the attack includes building servers, purchasing web services, registering attack accounts, purchasing domain names or stealing development certificates.
Preferably, in step S3, the attack means include watering-hole attacks and spear-phishing attacks.
In step S4, the attack operations include obtaining the user's account and password, learning the detailed structure of the system, stealing or tampering with the required data, controlling the system, disrupting its normal operation, and communicating with other nodes from the compromised device or node to propagate the attack.
Preferably, in step S8, stealing data includes setting size limits on transmission so as to move data out of the system; the attack aims to suspend and disturb the normal operation of the system, often by deleting or tampering with key files, modifying user rights, or destroying hardware devices or devices controlled by the system.
Preferably, ATT&CK comprises an Enterprise matrix, a Mobile matrix and an ICS matrix; the techniques in ATT&CK are assigned to each stage of the attack, syntactic-structure information and semantic information are extracted, technique texts with similar semantics are found, and stage division is carried out.
The technique-description texts in ATT&CK are processed, semantic information of attack techniques and tactics is obtained automatically, and the techniques are divided into the attack stages of the corresponding APT attack model.
A data set is constructed from the attack-technique texts in ATT&CK, the data are trained with a Bert model, and the APT attack techniques are divided into the tasks of the corresponding kill-chain stages.
The technique information in ATT&CK is expressed in unstructured natural language and arranged in the Enterprise matrix by attack tactic. A crawler tool obtains the raw data, including the technique texts and the tactics they belong to; a Bert model extracts context information and relations for prediction; parameters are iterated continuously to optimize the model, finally achieving automatic division of APT attack techniques into stages.
Preferably, for the APT attack text, timing information is fed into the model by adding a position encoding. Let pos be the position of an element in the sequence and d_k the dimension:

PE_{(pos, 2i)} = \sin(pos / 10000^{2i/d_k})

If the dimension of the position vector equals the hidden-state dimension of the whole model, the i-th element of the encoding vector for position pos is computed as:

PE_{(pos, 2i+1)} = \cos(pos / 10000^{2i/d_k})

The encoding vector p_t of position t is:

p_t = (PE_{(t,0)}, PE_{(t,1)}, \ldots, PE_{(t, d_k - 1)})

where x_1, x_2, \ldots are the first, second, … words of the input sequence.
The words of the sentence pass through the model's word-embedding layer to obtain word vectors: for the t-th word x_t, e_t is the word vector obtained after x_t passes through the embedding layer, and adding the position vector yields a word vector with position information:

w_t = e_t + p_t
The attention module of the Transformer encoder layer converts the input word vectors into feature vectors z:

z = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^{T} / \sqrt{d_k})\,V

where Attention denotes the result computed with the attention mechanism, a method for computing the correlation between different positions in the input sequence. The three matrices Q, K and V represent Query, Key and Value respectively: Query is the vector indicating information at the current position, Key indicates information at other positions, and Value indicates content at other positions. Q, K and V are obtained by linear transformation of the input sequence X, i.e. by multiplying the embedded vectors by three different randomly initialized weight matrices W^{Q}, W^{K}, W^{V}; d_k is the dimension of the word vector/hidden layer.
Preferably, the Transformer encoder comprises at least 6 identical layers, each with two sub-layers: a self-attention mechanism and a feed-forward fully connected layer. Residual connections are used in each sub-layer and the output is normalized through Add & Norm, so the output of each encoder layer is LayerNorm(x + Sublayer(x)), where LayerNorm is a normalization layer that normalizes the output vector at each position, reducing internal covariate shift and accelerating model convergence, and Sublayer is the function implemented by the sub-layer itself (the self-attention mechanism or the feed-forward fully connected layer). The output dimension of each layer is set to 512.
Preferably, the attack-stage process comprises the following steps:
using pre-trained word vectors obtained by the Bert model on a large-scale training set, converting the technique text into a word-vector representation as input to the encoder;
the encoder layer is divided into two sub-layers, a multi-head self-attention layer and a fully connected feed-forward network layer;
computing weight coefficients through the Attention layer and taking a weighted sum of the Value vectors of the elements to produce the attention value;
applying residual connection and Layer Normalization through an Add & Norm layer;
applying two linear transformations with a ReLU activation through the feed-forward layer;
and finally passing through the Add & Norm layer once more.
Preferably, the crawler tool captures the attack-technique names and description texts from the ATT&CK web pages and arranges them in the matrix in their original order, thereby generating the raw data set; the attacks in ATT&CK are divided into three domains: Enterprise, Mobile and ICS.
In the technical scheme, the invention provides the following technical effects and advantages:
1. By modeling and analyzing the APT, the invention helps security technicians understand attack processes, identify vulnerable nodes in a network system, and perform information extraction, threat-intelligence analysis, threat hunting and attack tracing, so that security risks can be understood dynamically and holistically in context, the capabilities of discovering, identifying, understanding, analyzing and responding to security threats are improved from a global perspective, and cyberspace security situation awareness is enhanced.
2. The method effectively captures context semantic information and the dependency relationships in technique texts, so that the technique texts in the ATT&CK knowledge base are automatically divided into the corresponding stages of the kill-chain model. Experimental results show that on the MITRE knowledge-base data set the proposed algorithm outperforms related algorithms from other network-security fields and improves the division of APT attack techniques into the kill chain.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
FIG. 1 is a diagram of the APT attack strategy and target framework of the present invention.
FIG. 2 is a schematic diagram of an APT attack chain model according to the present invention.
Fig. 3 is a flow framework diagram of APT attack phase division according to the present invention.
FIG. 4 is a schematic diagram of an attack technology partition structure of the Bert model of the present invention.
FIG. 5 is a diagram of the Transformer model architecture according to the present invention.
FIG. 6 is a diagram of the Bert model pre-training architecture of the present invention.
FIG. 7 is a comparative bar graph of the present invention.
FIG. 8 is a model experiment line graph of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
The ATT&CK-based APT attack modeling method for the power system comprises the following steps:
By modeling and analyzing the APT, security technicians are helped to understand attack processes, identify weak nodes in the network system, and perform information extraction, threat-intelligence analysis, threat hunting, attack tracing and the like, so that security risks can be understood dynamically and holistically in context, the capabilities of discovering, identifying, understanding, analyzing and responding to security threats are improved from a global perspective, and cyberspace security situation awareness is enhanced.
Referring to FIGS. 1 and 2, A. Reconnaissance phase
The attacker analyzes the attack target, which includes collecting information about the target, determining the target's defense mechanism and whether the target has attack value, and performing penetration by technical means (such as interception) to obtain intelligence, finally determining the attack means to be used.
B. Tool preparation phase
The attacker develops the corresponding Trojan viruses and malicious code according to the determined attack means and the characteristics of the victim, and uses means such as 0-day vulnerabilities, spear phishing or mobile storage devices to implant the malicious program into the system. This stage also includes other preparations for the attack, such as building servers or purchasing network services, registering accounts for the attack, purchasing domain names, and stealing code and development certificates.
C. Load delivery phase
The malicious code constructed in the previous stage is implanted into the system. Most APT attacks use email as the attack carrier: the victim is induced to click a phishing email through means such as watering-hole attacks and spear-phishing attacks, so that information such as the user's account and password is obtained, and the next stage of the attack proceeds.
D. Exploit stage
After obtaining sufficient system operation authority, the attacker runs the malicious code under their control and, combined with other attack strategies, carries out attack operations: obtaining the user's account and password, learning the detailed structure of the system, stealing or tampering with the required data, controlling the system, disrupting its normal operation, and communicating with other nodes from the compromised device or node to propagate the attack.
E. Installation and implant stage
To further gain access to and control over the system, the attacker establishes a foothold, installs backdoor programs and other control software in the system, and replaces or hijacks legitimate code or adds boot code. Then, to explore the network further, the attacker may also escalate their privileges by exploiting system-configuration errors or bugs, forging tokens, modifying the registry, and so on.
F. Exploration collection phase
To extend the attack further, the attacker observes the network and system, possibly stealing account names and passwords, accessing the system with these legitimate credentials, and creating more accounts to help achieve the goal. The attacker then explores the environment around the attack point and the units they can operate; with this knowledge of the environment, the attacker spreads the attack or steals data across systems and accounts.
G. Command and control phases
The attacker attempts to communicate with the infected system over network protocols and, depending on the victim's network architecture and defenses, may establish command and control at different stealth levels in a variety of ways, thereby controlling the operation of the target system.
H. Target achievement phase
The attacker launches the attack according to its goal and usually packages the stolen data. Techniques for extracting data from the target network typically transmit it through the command-and-control channel or an alternate channel, and may include setting size limits on transmission to move data out of the system. The purpose of the attack is to suspend and interfere with the normal operation of the system, often by deleting or tampering with key files, modifying user rights, or destroying hardware devices or devices controlled by the system.
An APT attacker generally attacks through the procedures and techniques described above, and reducing the corresponding vulnerabilities can disrupt some steps of the attack, causing it to fail. By modeling and analyzing the APT, security technicians are helped to understand attack processes, identify weak nodes in the network system, and perform information extraction, threat-intelligence analysis, threat hunting, attack tracing and the like, so that security risks can be understood dynamically and holistically in context, the capabilities of discovering, identifying, understanding, analyzing and responding to security threats are improved from a global perspective, and cyberspace security situation awareness is enhanced.
Example 2
MITRE's ATT&CK knowledge base analyzes network attacks at both the tactical and the technical level by studying a large number of real APT attack events, and describes attack entities such as attack groups and malicious code in detail; APT attacks can be predicted and defended against by analyzing these techniques and entities. Because ATT&CK is open source, the knowledge base is receiving increasing attention from attack analysts.
ATT&CK divides the attack matrix into an Enterprise matrix, a Mobile matrix and an ICS matrix according to the platform used, and divides the techniques into more than ten tactics (Tactics). However, this division into tactics lacks a definite order: when a technique of a certain stage is discovered, timely defense is impossible, and it is hard to apply to real APT attacks. The techniques in ATT&CK need to be divided and assigned to each stage of the attack so that defensive measures can be taken at the corresponding stage. Because the techniques in ATT&CK are expressed in natural language such as text, the syntactic-structure and semantic information must be extracted so that technique texts with similar semantics can be found and the stages divided.
The application provides an algorithm based on the Bert model that processes the technique-description texts in ATT&CK, automatically obtains semantic information of attack techniques and tactics, and divides them into the attack stages of the corresponding APT attack model.
A data set is constructed from the attack-technique texts in ATT&CK, and the Bert model is trained on the data, realizing the task of automatically dividing APT attack techniques into the corresponding stages of the kill chain.
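To make the division target concrete, the ATT&CK Enterprise tactics can be mapped onto the eight kill-chain stages (A–H) of this model. The assignment below is a hypothetical illustration of such a mapping, not the patent's learned output:

```python
# Hypothetical mapping from ATT&CK Enterprise tactics to the eight
# kill-chain stages (A-H) described in this application.
TACTIC_TO_STAGE = {
    "Reconnaissance":       "A. Reconnaissance",
    "Resource Development": "B. Tool preparation",
    "Initial Access":       "C. Load delivery",
    "Execution":            "D. Exploit",
    "Persistence":          "E. Installation/implant",
    "Privilege Escalation": "E. Installation/implant",
    "Defense Evasion":      "E. Installation/implant",
    "Credential Access":    "F. Exploration/collection",
    "Discovery":            "F. Exploration/collection",
    "Lateral Movement":     "F. Exploration/collection",
    "Collection":           "F. Exploration/collection",
    "Command and Control":  "G. Command and control",
    "Exfiltration":         "H. Target achievement",
    "Impact":               "H. Target achievement",
}

def stage_of(tactic: str) -> str:
    """Return the kill-chain stage a tactic is assigned to."""
    return TACTIC_TO_STAGE[tactic]
```

In the actual method the assignment is produced per technique text by the Bert classifier rather than by a fixed tactic-level table.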
To extract the semantic information in APT attack-technique texts and divide the attack techniques into the stages of the kill-chain model, the application proposes an attack-stage division workflow, as shown in FIG. 3:
The technique information in ATT&CK is expressed in unstructured natural language and arranged in the Enterprise matrix by attack tactic. A crawler tool obtains the raw data, including the technique texts and the tactics they belong to; a Bert model extracts context information and relations for prediction; parameters are iterated continuously to optimize the model, finally realizing automatic division of APT attack techniques into stages.
Example 3
To match APT attack techniques to the corresponding stages of the kill-chain model, the network shown in FIG. 4 is designed based on the Bert model, where the attack-technique text is "Adversaries may gather information about the victim's host hardware that can be used during targeting. Information about hardware infrastructure may …", and the inputs are "Adversaries, may, gather, information, about, …" respectively.
The Embedding layer maps the words or phrases of the APT attack-technique text to low-dimensional dense vectors: the words are first segmented and converted into vectors of a specific dimension by the Token Embedding (word-encoding) layer, then added to the Segment Embedding and Position Embedding to generate the input vectors of the encoding layer.
The Transformer model is an important component of the Bert model; stacking multiple Transformer encoder modules gives Bert bidirectional encoding capability and strong feature-extraction capability.
The Transformer model consists of an encoder and a decoder based on the self-attention mechanism, as shown in FIG. 5. Since the decoder is a generative model intended for natural-language generation and is not suitable for a text-classification task, only the encoder is used.
The positional relationship of each word in the APT attack-technique text has a great influence on its semantics, and the Transformer, which relies entirely on the attention mechanism, cannot capture position information automatically, so timing information must be actively injected into the model by adding a position encoding. The method is based on trigonometric functions. Let pos be the position of an element in the sequence and d_k the dimension:

PE_{(pos, 2i)} = \sin(pos / 10000^{2i/d_k})

If the dimension of the position vector equals the hidden-state dimension of the whole model, the i-th element of the encoding vector for position pos is computed as:

PE_{(pos, 2i+1)} = \cos(pos / 10000^{2i/d_k})

For example, the encoding vector p_t of position t is:

p_t = (PE_{(t,0)}, PE_{(t,1)}, \ldots, PE_{(t, d_k - 1)})

where x_1, x_2, \ldots are the first, second, … words of the input sequence.
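The sinusoidal position encoding above can be sketched as follows (a minimal implementation, assuming the standard Transformer formulation of the sin/cos pair):

```python
import math

def positional_encoding(pos: int, d_k: int) -> list:
    """Encoding vector p_pos: even indices use sin, odd indices use cos,
    both at frequency 1 / 10000**(2i/d_k) for pair index i."""
    pe = []
    for j in range(d_k):
        i = j // 2                               # pair index
        angle = pos / (10000 ** (2 * i / d_k))
        pe.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
    return pe
```

Because the encoding depends only on pos and d_k, the same position always produces the same vector, which is what lets the model recover word order from the summed embeddings.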
The words of the sentence pass through the model's word-embedding layer to obtain word vectors: for the t-th word x_t, e_t is the word vector obtained after x_t passes through the embedding layer, and adding the position vector yields the word vector with position information:

w_t = e_t + p_t
The attention module of the Transformer encoder layer converts the input word vectors into feature vectors z:

z = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^{T} / \sqrt{d_k})\,V

where Attention denotes the result computed with the attention mechanism, a method for computing the correlation between different positions in the input sequence. The three matrices Q, K and V represent Query, Key and Value respectively: Query is the vector indicating information at the current position, Key indicates information at other positions, and Value indicates content at other positions. Q, K and V are obtained by linear transformation of the input sequence X, i.e. by multiplying the embedded vectors by three different randomly initialized weight matrices W^{Q}, W^{K}, W^{V}; d_k is the dimension of the word vector/hidden layer. The embedded vector is treated as a Query against a set of Key-Value pairs; each computation yields different weights that act on the Values, and because the weight matrices are computed from the embedded vectors themselves, this is called a self-attention mechanism. The attention mechanism computes different weights from the context of each word, giving important words high weight and unimportant words low weight, so the resulting weight vector encodes the interrelations among the words.
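The scaled dot-product self-attention described here can be sketched in NumPy (a minimal sketch; the weight matrices W^Q, W^K, W^V are randomly initialized as in the text, and multi-head splitting is omitted):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, with
    Q, K, V all derived from the same input X (self-attention)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

# Example: a sequence of 4 word vectors with embedding dimension 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
Z, A = self_attention(X, Wq, Wk, Wv)
```

Each row of the weight matrix A sums to 1, so every output vector in Z is a convex combination of the Value vectors, weighted by how strongly that position attends to the others.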
The encoder consists of N = 6 identical layers, each with two sub-layers: a self-attention layer and a feed-forward fully connected layer. Residual connections are used in each sub-layer and the output is normalized through Add & Norm, i.e. the output of each encoder layer is LayerNorm(x + Sublayer(x)), where LayerNorm is a normalization layer that normalizes the output vector at each position, reducing internal covariate shift and accelerating model convergence, and Sublayer is the function implemented by the sub-layer itself (the self-attention layer or the feed-forward fully connected layer). The output dimension of each layer is set to 512.
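The LayerNorm(x + Sublayer(x)) computation can be sketched as follows (a simplified version assuming the learned gain/bias parameters are omitted):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize the vector at each position to zero mean and unit
    variance (learned gain/bias omitted for brevity)."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def add_and_norm(x, sublayer):
    """Output of each encoder sub-layer: LayerNorm(x + Sublayer(x))."""
    return layer_norm(x + sublayer(x))
```

The residual term x is added before normalization, so the sub-layer only needs to learn a correction to its input, which is what stabilizes the deep stack of encoder layers.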
The Bert model belongs to the unsupervised fine-tuning family of methods and mainly comprises two steps, pre-training and fine-tuning, as shown in FIG. 6. In the pre-training stage, the model is trained on a large amount of unlabeled data from various tasks; in the fine-tuning stage, an output layer is added to the Bert model for the specific downstream task, the model is initialized with the pre-trained parameters, and the parameters are then fine-tuned on the labeled data set of that task.
(1) Pre-training stage
At this stage, the Bert model is trained on a large unlabeled data set, and two tasks are designed for learning so that the trained model can be applied to different downstream tasks.
1) Masked LM (MLM):
15% of the tokens in the input sequence are randomly masked. A selected token has an 80% probability of being replaced by the [MASK] symbol, a 10% probability of being replaced by a random token, and a 10% probability of remaining unchanged; these masked tokens are then predicted.
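The 15% / 80-10-10 masking scheme can be sketched as follows (a minimal sketch; tokenization and vocabulary handling are simplified):

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15, seed=None):
    """Select ~15% of positions; of the selected tokens, replace 80%
    with [MASK], 10% with a random vocabulary token, and leave 10%
    unchanged. Returns the corrupted sequence and the positions the
    model must predict."""
    rng = random.Random(seed)
    out = list(tokens)
    targets = []
    for i in range(len(tokens)):
        if rng.random() < mask_rate:
            targets.append(i)
            r = rng.random()
            if r < 0.8:
                out[i] = "[MASK]"
            elif r < 0.9:
                out[i] = rng.choice(vocab)
            # else: token stays unchanged but is still predicted
    return out, targets
```

Keeping 10% of selected tokens unchanged forces the model to produce a useful representation for every position, since it cannot tell which tokens were corrupted.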
2) Next Sentence Prediction (NSP):
Sentence A and sentence B are picked from a monolingual corpus; sentence B has a 50% probability of being the next sentence after sentence A and a 50% probability of not being. In the Transformer encoder, the hidden state of the special token [CLS] is obtained by a weighted sum over the information of every token in the sequence, so the final hidden state C of [CLS] can represent the semantics of the whole sequence and be used for classification tasks. The final goal of the pre-training stage is to minimize the sum of the MLM and NSP task losses, loss = loss(MLM) + loss(NSP).
(2) Fine-tuning stage
Fine-tuning is performed for each downstream task: a corresponding output layer is designed and added on top of the Bert model, the model is initialized with the pre-trained parameters, and the parameters are then fine-tuned.
Example 4
Following the process in embodiment 3, the stage-division procedure for APT attack techniques is as follows. For a given attack-technique text, pre-trained word vectors obtained by the Bert model on a large-scale training set first convert the technique text into a word-vector representation as input to the encoder, where x_i denotes the i-th word vector in the sentence. The encoder layer is divided into two sub-layers, a multi-head self-attention layer and a fully connected feed-forward network layer: weight coefficients are computed through the Attention layer and the Value vectors of the elements are summed with these weights to produce the attention value; residual connection and Layer Normalization are applied through an Add & Norm layer; two linear transformations with a ReLU activation are applied through the feed-forward layer; and finally the Add & Norm layer is passed once more. Before training starts, the required parameters are initialized. The text is vectorized through the embedding layer, the multi-head attention layer analyzes the similarity between elements and computes the weight coefficients, the fully connected layer produces the corresponding output, the loss value is computed, the model parameters are updated by back-propagation, and training continues. The result returned by the final model is the kill-chain stage to which the APT attack technique is assigned.
Example 5
The data set used by this application is derived from MITRE's ATT&CK knowledge base. The MITRE ATT&CK framework describes network attack activities at the level of Tactics and Techniques, and also catalogues attack groups and attack events to form a knowledge base of cyber-attack behavior; it provides a general taxonomy for describing the different stages of an attack life cycle and describes in detail the attack processes and the malicious code/software used.
A crawler tool is used to capture the attack-technique names and introduction texts from the ATT&CK web pages, and they are arranged in the order in which the techniques originally appear in the matrix, thereby generating the original data set.
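A minimal sketch of the parsing step such a crawler would perform, using only the standard library. The markup and the `technique-name` class used here are hypothetical; the real ATT&CK page structure would have to be inspected to choose the right selectors:

```python
from html.parser import HTMLParser

class TechniqueParser(HTMLParser):
    """Collect technique names from anchors carrying an assumed
    'technique-name' class (illustrative markup, not the real pages)."""

    def __init__(self):
        super().__init__()
        self._grab = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and ("class", "technique-name") in attrs:
            self._grab = True

    def handle_data(self, data):
        if self._grab and data.strip():
            self.names.append(data.strip())
            self._grab = False

# Hypothetical fragment standing in for a fetched ATT&CK matrix page.
html = """
<td><a class="technique-name" href="/techniques/T1566">Phishing</a></td>
<td><a class="technique-name" href="/techniques/T1190">Exploit Public-Facing Application</a></td>
"""
parser = TechniqueParser()
parser.feed(html)
```

In a real crawler the fragment would come from an HTTP fetch of each matrix page, with the extracted names written out in matrix order.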
The attack techniques in ATT&CK are divided into three domains: Enterprise, Mobile and ICS, of which Enterprise is the most detailed. The techniques of the ATT&CK Enterprise domain are selected; this domain comprises 222 techniques and 526 sub-techniques, 748 technique texts in total. The distribution of the techniques in ATT&CK is shown in Table 1:
(Table 1: distribution of techniques across the ATT&CK tactics; rendered as an image in the original document.)
the data are processed and divided into stages of a killing chain according to the prior knowledge, so as to construct the label. The resulting data set is shown in table 2:
(Table 2: the labelled data set; rendered as an image in the original document.)
as can be seen from table 2, the difference between the number of technologies in each stage is large, and the detection means and the technical means of different APT organizations for enterprises are relatively consistent from the detection stage to the load delivery stage, so that the number of technologies in these stages is small, and the installation and implantation stages are the key points of the APT attack, and the means used are complicated and the number of technologies is large.
Only a few hundred technique entries are obtained from the ATT&CK knowledge base, while model training needs a large amount of data for effective comparison, so the original data must be processed to increase data diversity, improve the generalization ability of the model and reduce over-fitting; that is, data augmentation. In NLP, text augmentation is mainly realized by three methods: paraphrasing, noising and sampling. A text-augmentation tool (nlpaug) is adopted; each method generates 25 pieces of data, so each technique text yields hundreds of pieces, and the original technique-text data set is thus enhanced into a large data set of tens of thousands of entries.
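A minimal noise-style augmentation sketch in the spirit described above. The actual tool and operators used are not fully specified in the source, so this stand-in only swaps adjacent words or drops a word:

```python
import random

def augment(text, n=25, seed=0):
    """Produce n noisy variants of a technique description by randomly
    swapping adjacent words or dropping one word (a toy stand-in for a
    text-augmentation library, not the patent's actual pipeline)."""
    rng = random.Random(seed)
    words = text.split()
    variants = []
    for _ in range(n):
        w = words[:]
        if rng.random() < 0.5 and len(w) > 1:
            i = rng.randrange(len(w) - 1)
            w[i], w[i + 1] = w[i + 1], w[i]   # swap two neighbouring words
        elif len(w) > 1:
            del w[rng.randrange(len(w))]      # drop one word
        variants.append(" ".join(w))
    return variants
```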
Different text-classification algorithms are selected to verify the effectiveness of the model: several neural-network classifiers, as well as algorithms from the network-security research literature, are compared with the present method of assigning APT attack-technique texts to the kill chain, so as to verify its feasibility. The algorithms used are as follows. The convolutional neural network (TextCNN) uses pre-trained word vectors derived from an unsupervised learning model, with a simple architectural modification that allows using both pre-trained and task-specific vectors through multiple channels. Recurrent-neural-network (RNN) text classification based on multi-task learning was proposed by Pengfei Liu et al.; it implements multi-task learning with three RNN-based sharing schemes, each giving the tasks either shared or task-specific LSTM layers. The RCNN model uses a bidirectional recurrent network (Bi-LSTM) together with a max-pooling layer that automatically judges which words play a key role in classification, capturing the key components of the text. BiLSTM-Attention (TextRNN-Att) text classification was proposed by Peng Zhou et al.; it uses the improved RNN structure BiLSTM, which mainly comprises forward and backward passes, each time step containing an LSTM cell for selective memory, forgetting and output of information, and adds an attention layer that combines features by a weighted sum with a weight vector. The fast text classifier FastText was proposed by Armand Joulin et al.; it is based on a simple linear model rather than a deep neural network, decomposing the linear classifier into low-rank matrices, with information shared through a hidden layer.
DPCNN text classification was proposed by Rie Johnson et al.; it deepens and improves CNN models, captures longer-distance text relationships through pooling layers, and mitigates the vanishing-gradient problem using shortcut connections.
For classification tasks, the commonly used indicators are accuracy, precision, recall and the F1 value, which are calculated from the confusion matrix shown in Table 3:
(Table 3: confusion matrix; rendered as an image in the original document. Its standard layout is:)

                        Actually positive    Actually negative
 Predicted positive            TP                   FP
 Predicted negative            FN                   TN
TP means the classifier's result is correct and it judged the sample to be positive; TN means the result is correct and it judged the sample to be negative; FP means the result is wrong: the classifier judged the sample to be positive when it is actually negative; FN means the result is wrong: the classifier judged the sample to be negative when it is actually positive.
The calculation formulas are as follows:

ACC = (TP + TN) / (TP + TN + FP + FN)

P = TP / (TP + FP)

R = TP / (TP + FN)

F1 = 2 * P * R / (P + R)

wherein ACC, P, R and F1 respectively represent the accuracy, precision, recall and F1 value.
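The four indicators can be computed directly from the confusion-matrix counts:

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    p = tp / (tp + fp)          # precision
    r = tp / (tp + fn)          # recall
    f1 = 2 * p * r / (p + r)    # harmonic mean of precision and recall
    return acc, p, r, f1
```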
This application uses N = 2 encoder layers; the word-vector dimension is set to 300; dropout is used to discard some features, which improves model generalization and reduces over-fitting, with dropout = 0.5. Because text lengths vary widely, texts are processed to the fixed length pad_size = 256. The batch size is set to 4, i.e. each training batch contains 4 samples; the learning rate is set to the value shown in Table 4. The number of multi-head attention heads is head = 5, and the number of iterations of the final model is epoch = 20; after each epoch the model back-propagates the updated parameters and continues the optimization training. The parameter settings are shown in Table 4:
(Table 4: parameter settings; rendered as an image in the original document. The values are as listed above.)
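The parameter settings listed above can be collected into a configuration sketch. The learning rate is omitted because its value is rendered as an image in the source and cannot be recovered here:

```python
# Hyperparameters as stated in the text (learning rate not recoverable).
CONFIG = {
    "num_encoder_layers": 2,   # N
    "embed_dim": 300,          # word-vector dimension
    "dropout": 0.5,
    "pad_size": 256,           # texts padded/truncated to this length
    "batch_size": 4,
    "num_heads": 5,            # multi-head attention heads
    "epochs": 20,
}

def batches_per_epoch(num_samples, batch_size):
    """Number of optimisation steps one epoch takes (ceiling division)."""
    return -(-num_samples // batch_size)
```

Note that embed_dim = 300 divides evenly by num_heads = 5, giving 60 dimensions per attention head, which is the usual multi-head constraint.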
Example 6
1) In order to verify the feasibility of the attack-technique stage-division method provided by this application, it is compared with other related research in the network-security field; the results are shown in Fig. 7 and Table 5:
(Table 5: comparison results; rendered as an image in the original document.)
In Fig. 7 and Table 5, P, R and F1 denote the precision, recall and F1 value respectively.
As can be seen from Fig. 7, the method provided by this application outperforms the baseline models on every indicator. The TextCNN and DPCNN models implement feature extraction through convolutional neural networks and work well when the data set is large. The RNN suffers from vanishing gradients on long texts and has difficulty capturing dependencies when the relevant positions are far apart, so its accuracy is poor, only 57.73%. TextRNN-Att improves the RNN with a BiLSTM: the data are trained in both directions and the semantic dependencies between words are captured, which effectively alleviates the above problem, and adding the attention mechanism further improves the model.
The FastText model uses a shallow network; although its training accuracy is lower, it takes far less time than a deep network. Because the Bert model uses pre-trained word vectors, its understanding of context information and inter-word semantics is greatly improved, and it is trained with a bidirectional Transformer network structure; the results show that the Bert-based method is 4-6% higher than the other methods on every indicator.
2) In order to verify the effectiveness of the algorithm, the sample data set is re-partitioned for validation: with all other parameters unchanged, the proportions of the training, validation and test sets are adjusted to observe the model's performance with small or insufficient samples. Different percentages of the data are taken as the training set, and the remainder is split evenly into validation and test sets; using precision as the measure, the results are shown in Fig. 8 and Table 6:
(Table 6: precision under different training-set proportions; rendered as an image in the original document.)
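The re-partitioning procedure described in this example can be sketched as follows (a hypothetical helper, not code from the patent):

```python
import random

def split(dataset, train_frac, seed=0):
    """Shuffle, take train_frac of the data for training, and split the
    remainder evenly into validation and test sets, as in the
    small-sample experiment."""
    data = dataset[:]
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train_frac)
    rest = data[n_train:]
    half = len(rest) // 2
    return data[:n_train], rest[:half], rest[half:]
```

Calling it repeatedly with decreasing train_frac values reproduces the shrinking-training-set condition being measured.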
As can be seen from Fig. 8, when the training-sample size is small, the accuracy of every model drops to varying degrees. Because the Bert model uses a pre-trained model that has been trained on a large amount of data for different tasks, its comprehension of text is greatly improved and it can effectively capture context semantics and inter-sentence relationships, so it still performs well even when the training samples are insufficient.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or the computer program are loaded or executed on a computer, the procedures or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more collections of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive.
It should be understood that the term "and/or" in this application describes only an association relationship between associated objects, meaning that three relationships may exist; e.g., A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. In addition, the character "/" in this application generally indicates that the former and latter associated objects are in an "or" relationship, but may also indicate an "and/or" relationship; the specific meaning can be understood from the context.
In the present application, "at least one" means one or more, "a plurality" means two or more. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or multiple.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art would appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. An ATT&CK-based APT attack modeling method for a power system, characterized in that the modeling method comprises the following steps:
S1: analyzing the attack target and determining the attack means to use;
S2: preparing for the attack according to the determined attack means and the characteristics of the victim, developing a corresponding program, and implanting the malicious program into the system;
S3: using e-mail as the attack carrier, inducing the victim to click a phishing mail, obtaining the user's account and password information, and proceeding to the next stage of the attack;
S4: obtaining system operation authority, running the malicious program, and carrying out the attack operations together with the attack strategy;
S5: establishing an attack foothold, installing a backdoor program in the system, replacing a legitimate program or adding a start-up program, and elevating privileges in the system by exploiting system-configuration errors to modify the registry;
S6: observing the network and the system, stealing account names and passwords to access the system, creating more accounts to help achieve the goal, exploring the environment around the attack point, and spreading the attack or stealing data across systems and accounts;
S7: controlling the operation of the target system by establishing command and control with different levels of stealth according to the network structure and defenses of the victim;
S8: launching the attack according to the attack goal, packaging the stolen data, and acquiring data from the target network and transmitting it through the control channel.
2. The ATT&CK-based APT attack modeling method for a power system according to claim 1, characterized in that: in step S1, the attack target is analyzed, target information is collected, the target's defense mechanisms and whether the target has attack value are determined, and infiltration is performed through technical means to obtain information;
in step S2, the developed programs include Trojan viruses and malicious programs; the malicious program is implanted into the system by means of a 0-day vulnerability, spear phishing or a removable storage device; and the preparation for the attack includes building servers, purchasing web services, registering attack accounts, purchasing domains or stealing program-development (code-signing) certificates.
3. The ATT&CK-based APT attack modeling method for a power system according to claim 2, characterized in that: in step S3, the attack means include watering-hole attacks and spear-phishing attacks;
in step S4, the attack operations include obtaining user accounts and passwords, learning the detailed structure of the system, stealing or tampering with required data, controlling the system, disrupting its normal operation, and communicating with other nodes from the compromised device or node to propagate the attack.
4. The ATT&CK-based APT attack modeling method for a power system according to claim 3, characterized in that: in step S8, stealing data includes setting a size limit on transmissions so as to exfiltrate data from the system; the purpose of the attack is to suspend and disturb the normal operation of the system, and deleting or tampering with key files, modifying user permissions, and destroying hardware devices or equipment controlled by the system are often used.
5. The ATT&CK-based APT attack modeling method for a power system according to claim 1, characterized in that: ATT&CK comprises an Enterprise matrix, a Mobile matrix and an ICS matrix; the ATT&CK techniques are assigned to the stages of the attack by extracting syntactic-structure information and semantic information, finding technique texts with similar semantics, and performing the stage division;
the technique-description texts in ATT&CK are processed, the semantic information of attack techniques and tactics is acquired automatically, and the techniques are divided into the attack stages corresponding to the APT attack model;
a data set is constructed from the attack-technique texts in ATT&CK, the data are trained with the Bert model, and the task of dividing the APT attack techniques into the corresponding kill-chain stages is performed;
the technique information in ATT&CK is expressed in unstructured natural language and is distributed in the Enterprise matrix according to the attack tactics; a crawler tool is used to obtain the original data, including the technique texts and the tactics to which they belong; the Bert model is used to extract context information and relations, predictions are made, the parameters are iterated continuously to optimize the model, and finally the automatic division of APT attack techniques into stages is achieved.
6. The ATT&CK-based APT attack modeling method for a power system according to claim 5, characterized in that: in the APT attack text, time-sequence information is fed into the model by adding a position encoding; the position of an element in the sequence is pos and the dimension is d_k, for an input sequence X = (w_1, w_2, ...), where w_1 and w_2 are the first and second words of the input sequence.
If the dimension of the position vector is the same as the hidden-state dimension d_model of the whole model, the i-th element of the encoding vector for position pos is calculated as:
PE(pos, 2i) = sin(pos / 10000^(2i / d_model)),
PE(pos, 2i + 1) = cos(pos / 10000^(2i / d_model)),
and the encoding vector p_t of position t is:
p_t = (PE(t, 1), PE(t, 2), ..., PE(t, d_model)).
The words of the sentence are passed through the model's word-embedding layer to obtain word vectors: w_t is the t-th word, and e_t is the word vector obtained after embedding w_t; adding the position vector p_t yields the word vector with position information:
x_t = e_t + p_t.
The input word vectors are converted into feature vectors by the attention module of the Transformer encoder layer:
Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,
where Attention denotes the result computed with the attention mechanism, a method for computing the correlation between different positions in the input sequence; the three matrices Q, K and V represent the Query, Key and Value respectively; the Query is the vector representing information at the current position, the Key is the vector representing information at other positions, and the Value is the vector representing the content at other positions; Q, K and V are obtained by linear transformations of the input sequence X, i.e. by multiplying the embedding vectors by three different randomly initialized weight matrices W_Q, W_K and W_V; d_k is the dimension of the word vector/hidden layer.
7. The ATT&CK-based APT attack modeling method for a power system according to claim 6, characterized in that: the Transformer encoder comprises at least 6 repeated layers, each layer containing two sublayers, namely a self-attention mechanism and a feed-forward fully connected layer; a residual connection is used in each sublayer, and the output is normalized by Add & Norm; the output of each layer of the Transformer encoder is LayerNorm(x + Sublayer(x)), where LayerNorm is the normalization layer that normalizes the output vector at each position, Sublayer(x) is the function implemented by the sublayer itself (the self-attention mechanism or the feed-forward fully connected layer), x is the input of the sublayer, and the output dimension of each layer is set to 512.
8. The ATT&CK-based APT attack modeling method for a power system according to claim 5, characterized in that: the attack-stage division process comprises the following steps:
converting a technical text into word vector representation as input of an encoder by using a pre-training word vector obtained by a Bert model on a large-scale training set;
the encoder layer is divided into two sublayers, namely a multi-head self-attention mechanism layer and a fully-connected feedforward network layer;
calculating a weight coefficient through the Attention layer, and carrying out weighted summation on Value values in the elements to generate an Attention Value;
performing residual error connection and LayerNormalization normalization through an Add & Norm layer;
performing two linear transformations with a ReLU activation function through the feed-forward layer;
finally, the Add & Norm layer is passed once again.
9. The ATT&CK-based APT attack modeling method for a power system according to claim 8, characterized in that: the crawler tool captures the attack-technique names and introduction texts from the ATT&CK web pages and arranges them in the order in which the techniques originally appear in the matrix, thereby generating the original data set; the attack techniques in ATT&CK are divided into three domains: Enterprise, Mobile and ICS.
CN202310187310.8A 2023-03-02 2023-03-02 ATT and CK-based APT attack modeling method for power system Pending CN115883261A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310187310.8A CN115883261A (en) 2023-03-02 2023-03-02 ATT and CK-based APT attack modeling method for power system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310187310.8A CN115883261A (en) 2023-03-02 2023-03-02 ATT and CK-based APT attack modeling method for power system

Publications (1)

Publication Number Publication Date
CN115883261A true CN115883261A (en) 2023-03-31

Family

ID=85761740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310187310.8A Pending CN115883261A (en) 2023-03-02 2023-03-02 ATT and CK-based APT attack modeling method for power system

Country Status (1)

Country Link
CN (1) CN115883261A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116192537A (en) * 2023-04-27 2023-05-30 四川大学 APT attack report event extraction method, system and storage medium
CN116886379A (en) * 2023-07-21 2023-10-13 鹏城实验室 Network attack reconstruction method, model training method and related devices

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Yuancheng: "A new type of power-system APT attack modeling based on ATT&CK", Netinfo Security (《信息网络安全》) *


Similar Documents

Publication Publication Date Title
Chawla et al. Host based intrusion detection system with combined CNN/RNN model
Lian et al. An intrusion detection method based on decision tree-recursive feature elimination in ensemble learning
Li et al. A framework for enhancing deep neural networks against adversarial malware
Liu et al. ATMPA: attacking machine learning-based malware visualization detection methods via adversarial examples
CN115883261A (en) ATT and CK-based APT attack modeling method for power system
Chen et al. Backdoor attacks and defenses for deep neural networks in outsourced cloud environments
Andrade et al. A model based on LSTM neural networks to identify five different types of malware
CN109492355B (en) Software anti-analysis method and system based on deep learning
Srivastava et al. An ensemble model for intrusion detection in the internet of softwarized things
Li et al. Deep learning backdoors
Zhao et al. Maldeep: A deep learning classification framework against malware variants based on texture visualization
Zhu et al. Android malware detection based on multi-head squeeze-and-excitation residual network
Yan et al. A survey of adversarial attack and defense methods for malware classification in cyber security
Ongun et al. Living-off-the-land command detection using active learning
Xiao et al. A multitarget backdooring attack on deep neural networks with random location trigger
Li et al. Enhancing deep neural networks against adversarial malware examples
Rawal et al. Recent advances in adversarial machine learning: status, challenges and perspectives
Wei et al. Toward identifying APT malware through API system calls
Sree et al. Artificial intelligence based predictive threat hunting in the field of cyber security
Kamran et al. Semi-supervised conditional GAN for simultaneous generation and detection of phishing URLs: A game theoretic perspective
Jiang et al. Incremental learning, incremental backdoor threats
Zhan et al. AMGmal: Adaptive mask-guided adversarial attack against malware detection with minimal perturbation
Andrade et al. Malware classification using word embeddings algorithms and long‐short term memory networks
Pan Blackbox trojanising of deep learning models: Using non-intrusive network structure and binary alterations
Gopaldinne et al. Overview of pdf malware classifiers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230331