CN113162794B

CN113162794B - Next attack event prediction method and related equipment

Info

Publication number: CN113162794B
Application number: CN202110113711.XA
Authority: CN
Inventors: 李泽科; 多志林; 陈泽文; 王森淼; 赵梓伦; 涂腾飞; 林静怀; 金正平; 梁野; 张华�; 徐志光; 肖飞; 秦素娟
Original assignee: Beijing University of Posts and Telecommunications; Beijing Kedong Electric Power Control System Co Ltd; State Grid Fujian Electric Power Co Ltd; State Grid Shanghai Electric Power Co Ltd; State Grid Electric Power Research Institute
Current assignee: Beijing University of Posts and Telecommunications; Beijing Kedong Electric Power Control System Co Ltd; State Grid Fujian Electric Power Co Ltd; State Grid Shanghai Electric Power Co Ltd; State Grid Electric Power Research Institute
Priority date: 2021-01-27
Filing date: 2021-01-27
Publication date: 2024-01-16
Anticipated expiration: 2041-01-27
Also published as: CN113162794A

Abstract

One or more embodiments of the present disclosure provide a next attack event prediction method and related devices, which can be used to predict the next attack event in a power grid. Attack chains formed from alarm logs generated in the power grid serve as source data; after data preprocessing, feature extraction, model selection and optimization, the features extracted from an attack chain are input into a prediction model to predict the next attack event. In addition, a stratified sampling method is proposed during attack chain preprocessing to address the imbalance of power grid attack chain data, and the text feature extraction method for alarm content is improved during feature extraction. Experiments show that the method, and equipment using it, reach an accuracy of 84.90% and a recall of 84.91% in actual prediction, so that likely attack events can be predicted effectively.

Description

Next attack event prediction method and related equipment

Technical Field

The invention relates to the field of computer technology, and in particular to a next attack event prediction method and related equipment.

Background

Most attacks on the Internet today are multi-step attacks. Correlation analysis of multi-step attack events distills high-level security events from redundant low-level alarm information. In a complete multi-step attack the steps are correlated, and each step is the cause of the next. A multi-step attack is generally divided into five stages: reconnaissance, vulnerability scanning and analysis, privilege acquisition, privilege maintenance and attack execution, and trace removal.

In the initial stage of a multi-step attack event, an attacker often performs less harmful attack behaviors, and in the later stage of the multi-step attack event, the attacker often performs more harmful attack behaviors.

Conventional intrusion detection systems and firewalls in common use today can only report individual attacks to a security administrator and lack active defense capability, so the security data on the network is voluminous and tedious to interpret directly. Because this data contains a great deal of redundant information and lacks unified standards, it tends to produce a high false alarm rate, which hinders effective identification of real attack events.

Research on fusing multi-source heterogeneous security data has therefore become a hot topic in recent years. For complex network attacks, establishing a network security event analysis model is important for managing and analyzing multi-source heterogeneous security data: by analyzing and processing such data, the occurrence of attack events can be revealed, network attacks can be prevented, and the security situation of the whole network can be monitored effectively.

Disclosure of Invention

In view of this, it is an object of one or more embodiments of the present disclosure to provide a method and related apparatus for predicting a next attack event, so as to solve the problems encountered in the prior art.

Based on the above objects, one or more embodiments of the present disclosure provide a method for predicting a next attack event, which includes the following steps:

analyzing a power grid alarm log to generate an attack chain, wherein the attack chain is built by using host IP node association in a power grid;

preprocessing the attack chain, namely deduplicating the attack chain and extracting its features, wherein the node features and event features of the attack chain are obtained through the feature extraction operation;

inputting the extracted node features and event features into a prediction model, and outputting a prediction result by the prediction model.

Optionally, the prediction model is trained with a random forest algorithm: training attack chains are generated by parsing a training alarm log, preprocessed, and divided by stratification into a training set and a test set. The node features and event features extracted from the training set are input into the random forest algorithm to obtain a training model; the node features and event features extracted from the test set are then input into the training model, and the accuracy of its output is calculated. If the accuracy reaches a preset threshold, the training model is taken as the prediction model; otherwise, the training model continues to be trained, tuned and optimized with the node features and event features of the training set until the accuracy obtained by inputting the node features and event features of the test set into the optimized training model reaches the preset threshold.

Based on the same inventive concept, one or more embodiments of the present disclosure further provide a next attack event prediction apparatus, including:

the attack chain generation module, configured to analyze the power grid alarm log to generate the attack chain, wherein the attack chain is built through host IP node association in the power grid;

the feature extraction module, configured to preprocess the attack chain, including deduplicating the attack chain and extracting its features, wherein the node features and event features of the attack chain are obtained through the feature extraction operation;

and the prediction module, configured to input the extracted node features and event features into a prediction model and observe the prediction result output by the prediction model.

Based on the same inventive concept, one or more embodiments of the present disclosure further provide an electronic device including a memory, a processor, and a computer program stored in the memory and executable by the processor, wherein the processor implements the method for predicting a next attack event when executing the computer program.

Based on the same inventive concept, one or more embodiments of the present specification further provide a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions that, when executed by a computer, cause the computer to implement the above-described next attack event prediction method.

As can be seen from the foregoing, the next attack event prediction method and related device provided in one or more embodiments of the present disclosure provide a next attack event prediction model for multi-step attack events that predicts major attack events likely to occur in a power grid; grid technicians can take precautionary measures based on the prediction results output by the model and thereby prevent major losses to the power grid system.

In the next attack event prediction method and related equipment provided by one or more embodiments of the present disclosure, attack chains formed from alarm logs generated in a power grid are used as source data, and data preprocessing, feature extraction, model selection and optimization yield a next attack event prediction model with good generalization performance. The node features and event features extracted from an attack chain are input into the prediction model, which processes them and outputs the prediction result. In addition, a stratified sampling method is proposed during attack chain preprocessing to address the imbalance of power grid attack chain data, and the text feature extraction method for alarm content is improved during feature extraction. Experiments show that the prediction model reaches an accuracy of 84.90% and a recall of 84.91%, so that likely attack events can be predicted effectively.

Drawings

For a clearer description of one or more embodiments of the present description or of the solutions of the prior art, the drawings that are necessary for the description of the embodiments or of the prior art will be briefly described, it being apparent that the drawings in the description below are only one or more embodiments of the present description, from which other drawings can be obtained, without inventive effort, for a person skilled in the art.

FIG. 1 is a block diagram of a method for predicting next attack events according to one or more embodiments of the present disclosure;

FIG. 2 is a flowchart of attack chain generation provided by one or more embodiments of the present disclosure;

FIG. 3 is a diagram of a predictive model training step provided in one or more embodiments of the present disclosure;

FIG. 4 is a schematic diagram of an attack chain provided by one or more embodiments of the present disclosure;

FIG. 5 is a block diagram of a next attack event prediction apparatus provided in one or more embodiments of the present disclosure;

FIG. 6 is a schematic diagram of an electronic device capable of implementing a method for predicting a next attack event according to one or more embodiments of the present disclosure.

Detailed Description

For the purposes of promoting an understanding of the principles and advantages of the disclosure, reference will now be made to the embodiments illustrated in the drawings and specific language will be used to describe the same.

It is noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present disclosure should be taken in a general sense as understood by one of ordinary skill in the art to which the present disclosure pertains. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items.

As described in the Background section, conventional intrusion detection systems and firewalls can only report individual attacks to security administrators and lack active defense capability, so the security data on the network is voluminous and tedious to interpret directly. Because this data contains a great deal of redundant information and lacks unified standards, it tends to produce a high false alarm rate, which hinders effective identification of real attack events. As a result, multi-step attack events are poorly predicted.

To solve these problems, one or more embodiments of the present disclosure provide a next attack event prediction method and related devices. An attack chain is generated from the results of analyzing the power grid alarm log; after deduplication, features are extracted from the attack chain to obtain its node features and event features; these two kinds of features are input into an optimized prediction model; and the prediction result output by the model is analyzed and assessed. The prediction model is trained with a random forest algorithm on features extracted from the attack chains in the alarm logs used for training. The method can better predict the next attack event in a multi-step attack and assists technicians.

Referring to FIG. 1, the next attack event prediction method provided by one or more embodiments of the present disclosure comprises the following steps:

Step S101: analyze the power grid alarm log to generate an attack chain, where the attack chain is constructed through host IP node association in the power grid.

In this step, the attack chain is built from the data in the alarm log, as shown in FIG. 2:

Step S201: construct an attack tree from the alarm log data through host IP node association.

Step S202: aggregate the attack tree to obtain an initial attack chain.

Step S203: prune and denoise the initial attack chain to obtain the attack chain.
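Steps S201 to S203 do not specify the association rule, the aggregation, or the pruning criteria. The Python sketch below is one possible reading under stated assumptions: each alert carries a time, a source IP, a destination IP and an event type (the Alert fields are hypothetical); alert j is linked after alert i when the victim of i is the source of j; aggregation enumerates the root-to-leaf paths of the resulting tree; and pruning drops single-alert chains and duplicate step sequences.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Alert:
    time: float       # occurrence time (assumed numeric timestamp)
    src_ip: str       # suspicious source host
    dst_ip: str       # victim / destination host
    event_type: str   # alarm event type, e.g. "host scan"


def build_attack_chains(alerts: List[Alert], min_len: int = 2) -> List[List[Alert]]:
    """Hypothetical S201-S203 sketch: link alerts through shared host IPs."""
    alerts = sorted(alerts, key=lambda a: a.time)

    # S201: attack tree. Alert j becomes a child of alert i when the victim of i
    # is the source of j; j occurs no earlier because the list is time-sorted.
    children: Dict[int, List[int]] = {i: [] for i in range(len(alerts))}
    has_parent: Set[int] = set()
    for i, a in enumerate(alerts):
        for j in range(i + 1, len(alerts)):
            if a.dst_ip == alerts[j].src_ip:
                children[i].append(j)
                has_parent.add(j)

    # S202: aggregate the tree into root-to-leaf paths (initial attack chains).
    initial: List[List[Alert]] = []

    def walk(i: int, path: List[Alert]) -> None:
        path = path + [alerts[i]]
        if not children[i]:
            initial.append(path)
        for j in children[i]:
            walk(j, path)

    for root in range(len(alerts)):
        if root not in has_parent:
            walk(root, [])

    # S203: prune and denoise (assumed criteria): drop single-alert chains and
    # chains whose (src, dst, type) step sequence duplicates one already kept.
    seen: Set[Tuple[Tuple[str, str, str], ...]] = set()
    chains: List[List[Alert]] = []
    for chain in initial:
        key = tuple((a.src_ip, a.dst_ip, a.event_type) for a in chain)
        if len(chain) >= min_len and key not in seen:
            seen.add(key)
            chains.append(chain)
    return chains
```

With alerts mirroring the FIG. 4 example (20.16 scans 154.2, then 154.2 performs an abnormal data access on 20.17) plus one unrelated alert, the sketch returns a single two-step chain and prunes the isolated alert as noise. Enumerating every path can grow quickly on dense alert graphs, so a real implementation would likely bound chain length or time gaps.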

Step S102: preprocess the attack chain, that is, deduplicate it and extract its features; the feature extraction yields the node features and event features of the attack chain.

In this step, the generated attack chains are first deduplicated. Each attack chain is traversed in turn, and the occurrence time of its first alarm event is recorded and compared with the start time of the generation window of that chain. If the two differ, the chain is discarded; otherwise it is retained for subsequent processing.
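The deduplication rule above can be sketched as follows. The AttackChain record and its field names are assumptions made for illustration; the exact-equality test mirrors the description. One plausible rationale, not stated in the text, is that overlapping generation windows reproduce the same chain, and only the copy whose window begins at the chain's first alarm is kept.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AttackChain:
    window_start: float        # start time of the window that generated the chain (assumed field)
    alert_times: List[float]   # occurrence times of the chain's alarm events (assumed field)


def deduplicate_chains(chains: List[AttackChain]) -> List[AttackChain]:
    """Keep a chain only if its first alarm event occurs exactly at its window start."""
    kept: List[AttackChain] = []
    for chain in chains:
        if not chain.alert_times:
            continue  # an empty chain has no first alarm event to compare
        first_alert_time = min(chain.alert_times)
        if first_alert_time == chain.window_start:
            kept.append(chain)
    return kept
```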

In this step, to extract the node features of an attack chain, the chain is first abstracted into a vector set suitable for machine learning, and feature extraction is then performed on that vector set to obtain the node features.

To extract the node features, a text feature extraction function converts the attack chain event type text in the vector set into one-hot codes, and the resulting codes are processed into 8-dimensional node features, namely:

the number of IPs accessing the local host;

the number of suspicious IPs accessing the local host;

whether the local IP is a suspicious source IP (1 if so, otherwise 0);

whether the local IP is a victim IP (1 if so, otherwise 0);

whether the local IP is both a suspicious source IP and a victim IP (1 if so, otherwise 0);

the number of other IPs accessed;

the number of victim IPs accessed;

the total number of events with a destination IP.
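The description does not define the inputs behind these eight values. The sketch below assumes the alarm records for a chain can be reduced to (source IP, destination IP) pairs and that the sets of suspicious and victim IPs are already known; the function name node_features and the exact reading of each count (for example, treating the last item as the number of events whose destination is the local IP) are hypothetical. The returned order follows the list above, which differs slightly from the row order of Table 1.

```python
from typing import Iterable, List, Set, Tuple


def node_features(local_ip: str,
                  flows: Iterable[Tuple[str, str]],
                  suspicious_ips: Set[str],
                  victim_ips: Set[str]) -> List[int]:
    """Hypothetical 8-dimensional node feature vector for one host IP.

    flows holds (source IP, destination IP) pairs taken from the alarm records;
    every count below is one assumed reading of the corresponding list item.
    """
    flows = list(flows)
    inbound = {src for src, dst in flows if dst == local_ip}                       # IPs that access the local host
    outbound = {dst for src, dst in flows if src == local_ip and dst != local_ip}  # other IPs the local host accesses
    is_source = int(local_ip in suspicious_ips)
    is_victim = int(local_ip in victim_ips)
    return [
        len(inbound),                                   # 1. IPs accessing the local host
        len(inbound & suspicious_ips),                  # 2. suspicious IPs accessing the local host
        is_source,                                      # 3. local IP is a suspicious source IP
        is_victim,                                      # 4. local IP is a victim IP
        is_source & is_victim,                          # 5. local IP is both
        len(outbound),                                  # 6. other IPs accessed
        len(outbound & victim_ips),                     # 7. victim IPs accessed
        sum(1 for _, dst in flows if dst == local_ip),  # 8. events whose destination is the local IP
    ]
```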

The text feature extraction function considers only the frequency of each word and builds a feature matrix in which each row holds the word frequency counts of one training text. One-hot encoding uses an N-bit state register to encode N states: each state has its own register bit, and only one bit is valid at any time.
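The two encodings described above can be illustrated in plain Python. count_matrix behaves like a simplified bag-of-words counter (scikit-learn's CountVectorizer is a well-known implementation of the same idea, although it orders its vocabulary alphabetically and uses a different tokenizer); the text does not state which function the embodiment uses. Whitespace tokenization and the helper names are assumptions.

```python
from typing import Dict, List, Sequence


# Hypothetical helpers; lower-casing and whitespace tokenization are assumptions.
def build_vocabulary(texts: Sequence[str]) -> Dict[str, int]:
    """Assign each distinct word a column index in order of first appearance."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def count_matrix(texts: Sequence[str], vocab: Dict[str, int]) -> List[List[int]]:
    """One row per text; each entry is the frequency of a vocabulary word."""
    rows = []
    for text in texts:
        row = [0] * len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words unseen during vocabulary building are ignored
                row[vocab[word]] += 1
        rows.append(row)
    return rows


def one_hot(label: str, categories: Sequence[str]) -> List[int]:
    """N categories -> N-bit vector with exactly one bit set."""
    if label not in categories:
        raise ValueError(f"unknown category: {label!r}")
    return [1 if c == label else 0 for c in categories]
```

For example, the texts "host scan", "abnormal data access" and "host scan host" produce a five-word vocabulary, and the third row counts "host" twice.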

Step S103: input the extracted node features and event features into a prediction model, which outputs the prediction result.

As an optional embodiment, referring to FIG. 3, the prediction model used for next attack event prediction is trained as follows:

Step S301: obtain alarm logs from power grid security equipment, analyze them to obtain security data including attack chains, key devices and key events, preprocess the attack chains, and divide them into a training set and a test set.

In this step, the training set and the test set are divided as follows: the attack chains are first sorted by the time shown in their labels, and the attack chains of each day are then sorted in chronological order; according to the sorting result, the first 80% are taken as the training set and the remaining 20% as the test set.
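The description leaves open whether the 80/20 cut is taken over the whole sorted list or within each day. The sketch below assumes the latter, a per-day split that treats each day as a stratum so that every day contributes to both sets; the function name and the (timestamp, chain) input format are hypothetical. The global reading would instead sort all chains once and slice at 80%.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, List, Tuple

Labeled = Tuple[datetime, Any]  # (label timestamp, attack chain); assumed input format


def split_by_day(chains: List[Labeled],
                 train_ratio: float = 0.8) -> Tuple[List[Labeled], List[Labeled]]:
    """Hypothetical per-day chronological split: the earliest train_ratio of each day trains."""
    by_day: Dict[Any, List[Labeled]] = defaultdict(list)
    for item in chains:
        by_day[item[0].date()].append(item)
    train: List[Labeled] = []
    test: List[Labeled] = []
    for day in sorted(by_day):
        ordered = sorted(by_day[day], key=lambda item: item[0])  # sort by timestamp only
        cut = int(len(ordered) * train_ratio)
        train.extend(ordered[:cut])
        test.extend(ordered[cut:])
    return train, test
```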

Step S302: extract features from the attack chains in the training set, input the resulting node features and event features (including key devices and key events) into a random forest algorithm for training, and optimize the training result to obtain the next attack event prediction model.

Step S303: extract features from the attack chains in the test set, input the resulting node features and event features into the next attack event prediction model, and check whether the result reaches a preset threshold.

In this step, if the output of the prediction model does not reach the preset threshold, the prediction model continues to be tuned and optimized with the node features and event features of the training set until the prediction result obtained by inputting the node features and event features of the test set into the optimized model reaches the preset threshold. At that point, a prediction model capable of predicting the next attack event has been obtained.
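Steps S302 and S303 describe a train, evaluate and re-tune loop that stops once test accuracy reaches a preset threshold. A library-agnostic sketch follows; train is a hypothetical callable that fits a model for one hyperparameter configuration and returns its predict function, and configs is an assumed list of configurations to try in order.

```python
from typing import Any, Callable, Iterable, List, Optional, Sequence, Tuple

Predict = Callable[[Sequence[Any]], List[Any]]


def accuracy(y_true: Sequence[Any], y_pred: Sequence[Any]) -> float:
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def tune_until_threshold(train: Callable[[dict], Predict],
                         configs: Iterable[dict],
                         X_test: Sequence[Any],
                         y_test: Sequence[Any],
                         threshold: float) -> Tuple[Optional[dict], float]:
    """Try configurations in order and stop at the first whose test accuracy meets threshold.

    If none does, return the best configuration seen; the caller can compare the
    returned accuracy with threshold to tell the two cases apart.
    """
    best_cfg: Optional[dict] = None
    best_acc = -1.0
    for cfg in configs:
        acc = accuracy(y_test, train(cfg)(X_test))
        if acc >= threshold:
            return cfg, acc
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc
```

In practice, train could wrap scikit-learn, for example lambda cfg: RandomForestClassifier(**cfg).fit(X_train, y_train).predict, with configurations such as {"n_estimators": 100, "max_depth": 10}. Because the loop checks the same test set repeatedly, as the described procedure does, the final test accuracy tends to be optimistic; tuning against a separate validation split is a common alternative.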

As an optional embodiment, the execution of the next attack event prediction method is illustrated with an attack chain taken from the power grid, with IP addresses anonymized to protect privacy. In this attack chain, host 20.16 initiates a host scan event against host 154.2, and host 154.2 then initiates an abnormal data access event against host 20.17. The attack chain is shown in FIG. 4.

The power grid system detects these two alarm events in the attack chain, together with the following: 23 IPs access the attack chain; 64 suspicious IPs access the local host; the attack chain contains both a victim IP and a suspicious source IP; the host IPs in the attack chain access 12 other IPs; 2 victim IPs are accessed; and the total number of events reaching a destination IP is 2.

Table 1 Features extracted from the attack chain

No.	Feature	Value
1	Host scan event	1
2	Abnormal data access event	1
3	Number of IPs accessing the attack chain	23
4	Number of suspicious IPs accessing the local host	64
5	Whether there is a victim IP	1
6	Whether there is a source IP	1
7	Whether there is both a victim IP and a source IP	1
8	Number of accesses to other IPs	12
9	Number of victim IPs accessed	2
10	Total number of events reaching a destination IP	2

Features 1 and 2 in Table 1 are the extracted event features of the attack chain; the remaining 125 event-type dimensions, all of which have the value 0, are omitted from Table 1. Features 3 to 10 are the node features of the attack chain. When the feature data in Table 1 are input into the trained and optimized prediction model, the model predicts that the next event in the attack chain is an intrusion event.
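The arithmetic behind this example can be made explicit: 2 observed event types plus 125 zero-valued event types give 127 event-feature dimensions, and appending the 8 node features yields a 135-dimensional input vector. The sketch assumes a fixed ordering of event types and uses hypothetical placeholder names for the 125 unlisted types, which the text does not name.

```python
from typing import Dict, List, Sequence


def assemble_features(event_types: Sequence[str],
                      observed_events: Dict[str, int],
                      node_feats: Sequence[int]) -> List[int]:
    """Event-type part (one slot per known type, 0 if unseen) followed by node features."""
    event_part = [observed_events.get(t, 0) for t in event_types]
    return event_part + list(node_feats)


# 2 listed event types + 125 unlisted ones (hypothetical placeholder names)
EVENT_TYPES = ["host scan", "abnormal data access"] + [f"event_type_{i}" for i in range(125)]
# Features 3 to 10 of Table 1, in table order
NODE_FEATURES = [23, 64, 1, 1, 1, 12, 2, 2]

x = assemble_features(EVENT_TYPES, {"host scan": 1, "abnormal data access": 1}, NODE_FEATURES)
```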

The next attack event prediction method provided by one or more embodiments of the present disclosure can predict multi-step attack events in a power grid: during the early stages of a multi-step attack, it predicts the next attack event that is likely to occur. This allows major harm to be actively prevented in real time, attack weaknesses and security vulnerabilities in the network to be corrected promptly according to the predicted event, and network attacks to be responded to and blocked in time.

For multi-step attack events occurring in a power grid, one or more embodiments of the present disclosure provide a next attack event prediction method in which attack chains formed from the power grid's alarm logs serve as source data, and data preprocessing, feature extraction, model selection and optimization yield a next attack event prediction model with good generalization performance. In addition, a stratified sampling method is proposed during data set preprocessing to address the imbalance of power grid data, and the text feature extraction method for alarm content is improved during feature extraction. The next attack event prediction model of the invention reaches an accuracy of 84.90% and a recall of 84.91%, so that likely attack events can be predicted effectively.

It should be noted that the methods of one or more embodiments of the present description may be performed by a single device, such as a computer or server. The method of this embodiment may also be applied in a distributed scenario and completed by several cooperating devices. In such a distributed scenario, each device may perform only one or more of the steps of the method, and the devices interact with each other to complete the method.

It should be noted that the foregoing describes specific embodiments of the present invention. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

Based on the same inventive concept, one or more embodiments of the present disclosure also provide a next attack event prediction device corresponding to the method of any embodiment.

Referring to fig. 5, the next attack event prediction apparatus includes:

The attack chain generation module 501 analyzes the power grid alarm log to generate the attack chain, where the attack chain is constructed through host IP node association in the power grid.

The feature extraction module 502 preprocesses the attack chain, including deduplicating it and extracting its features; the feature extraction yields the node features and event features of the attack chain.

The prediction module 503 inputs the extracted node features and event features into a prediction model and observes the prediction result output by the model.
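The three modules can be read as a simple pipeline. The sketch below wires them together as injected callables; the class name, field names, and the toy callables used to exercise it are hypothetical, and any concrete implementation of steps S101 to S103 could be plugged in.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class NextAttackEventPredictor:
    """Hypothetical wiring of modules 501-503 as injected callables."""
    generate_chains: Callable[[Sequence[Any]], List[Any]]       # module 501: alarm log -> attack chains
    extract_features: Callable[[List[Any]], List[List[float]]]  # module 502: deduplication + feature extraction
    predict: Callable[[List[List[float]]], List[str]]           # module 503: trained prediction model

    def run(self, alarm_log: Sequence[Any]) -> List[str]:
        chains = self.generate_chains(alarm_log)
        features = self.extract_features(chains)
        return self.predict(features)
```

Keeping each module behind a plain callable makes it easy to swap, for example, a different chain builder without touching the prediction step.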

For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, the functions of each module may be implemented in one or more pieces of software and/or hardware when implementing one or more embodiments of the present description.

The device of the foregoing embodiment is configured to implement the corresponding next attack event prediction method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which is not described herein.

Based on the same inventive concept, one or more embodiments of the present disclosure further provide an electronic device, corresponding to the method of any of the embodiments, including a memory, a processor, and a computer program stored on the memory and capable of running on the processor, where the processor executes the program to implement the method of predicting a next attack event according to any of the embodiments.

FIG. 6 shows a more specific hardware architecture of an electronic device according to this embodiment. The device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040 and a bus 1050. The processor 1010, memory 1020, input/output interface 1030 and communication interface 1040 are communicatively connected to one another within the device via the bus 1050.

The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.

The memory 1020 may be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs; when the embodiments of the present specification are implemented in software or firmware, the associated program code is stored in the memory 1020 and executed by the processor 1010.

The input/output interface 1030 is used to connect with an input/output module for inputting and outputting information. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.

The communication interface 1040 is used to connect a communication module (not shown) to enable communication between the device and other devices. The communication module may communicate over a wired connection (such as USB or a network cable) or wirelessly (such as a mobile network, Wi-Fi or Bluetooth).

Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).

It should be noted that although the above-described device only shows processor 1010, memory 1020, input/output interface 1030, communication interface 1040, and bus 1050, in an implementation, the device may include other components necessary to achieve proper operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.

The electronic device of the foregoing embodiment is configured to implement the corresponding next attack event prediction method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which is not described herein.

Based on the same inventive concept, one or more embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the next attack event prediction method according to any of the embodiments.

The computer-readable media of the present embodiments, including permanent and non-permanent, removable and non-removable media, may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

The storage medium of the foregoing embodiment stores computer instructions for causing the computer to execute the next attack event prediction method according to any of the foregoing embodiments, and has the advantages of the corresponding method embodiments, which are not described herein.

Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples; combinations of features of the above embodiments or in different embodiments are also possible within the spirit of the present disclosure, steps may be implemented in any order, and there are many other variations of the different aspects of one or more embodiments described above which are not provided in detail for the sake of brevity.

Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure one or more embodiments of the present description. Furthermore, the apparatus may be shown in block diagram form in order to avoid obscuring the one or more embodiments of the present description, and also in view of the fact that specifics with respect to implementation of such block diagram apparatus are highly dependent upon the platform within which the one or more embodiments of the present description are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that one or more embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.

While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.

The present disclosure is intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Any omissions, modifications, equivalents, improvements, and the like, which are within the spirit and principles of the one or more embodiments of the disclosure, are therefore intended to be included within the scope of the disclosure.

Claims

1. A next attack event prediction method, comprising:

analyzing the power grid alarm log to generate an attack chain; the attack chain is built by using host IP node association in the power grid;

deduplicating the attack chain and extracting its features; the node features and event features of the attack chain are obtained through the feature extraction operation;

wherein filtering the attack chain for deduplication comprises:

iterating over each attack chain and recording the occurrence time of the first alarm event in each attack chain;

comparing the occurrence time with the start time of the generation window of the corresponding attack chain, and removing the attack chain if the two are not equal;

extracting the attack chain characteristics, including:

abstracting the attack chain into a vector set, converting the vector set into a one-hot encoding by means of a text feature extraction function, and using the one-hot encoding as the node features;

the key events and key hosts recorded in the power grid alarm log are used as the event characteristics;

2. The method of claim 1, wherein analyzing the power grid alarm log to generate an attack chain comprises:

analyzing the alarm log and constructing an attack tree through host IP node association;

aggregating the attack tree to obtain an initial attack chain;

pruning and denoising the initial attack chain to obtain the attack chain.

3. The method of claim 2, wherein the prediction model is trained by a random forest algorithm, comprising:

parsing a training alarm log to generate training attack chains, preprocessing them, and dividing the training attack chains by stratification into a training set and a test set;

inputting the node features and event features of the training set into the random forest algorithm to obtain a training model, inputting the node features and event features of the test set into the training model, and calculating the accuracy of the output of the training model, wherein if the accuracy reaches a preset threshold, the training model is the prediction model;

otherwise, continuing to train, tune and optimize the training model with the node features and event features of the training set until the accuracy of the output obtained by inputting the node features and event features of the test set into the optimized training model reaches the preset threshold.

4. The method of claim 3, wherein dividing the training attack chains by stratification comprises:

sorting the training attack chains by the time shown in their labels, and sorting the training attack chains of each day in chronological order;

according to the sorting result, taking the first 80% as the training set and the remaining 20% as the test set.

5. A next attack event prediction apparatus comprising:

wherein filtering the attack chain for deduplication comprises:

extracting the attack chain characteristics, including:

6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable by the processor, characterized in that the processor implements the method according to any one of claims 1 to 4 when executing the computer program.

7. A non-transitory computer readable storage medium storing computer instructions which, when executed by a computer, cause the computer to implement the method of any one of claims 1 to 4.