CN116401680A

CN116401680A - Industrial control vulnerability detection method and system based on gradient boosting decision tree algorithm

Info

Publication number: CN116401680A
Application number: CN202310677008.0A
Authority: CN
Inventors: 原树生
Original assignee: Beijing Wangteng Technology Co ltd
Current assignee: Beijing Wangteng Technology Co ltd
Priority date: 2023-06-08
Filing date: 2023-06-08
Publication date: 2023-07-07

Abstract

The application discloses an industrial control vulnerability detection method and system based on a gradient boosting decision tree algorithm, comprising the following steps: acquiring data packets in an industrial control system, parsing the data packets, extracting data features, and dividing the data features into a training set and a test set; training a vulnerability detection model on the training set based on the gradient boosting decision tree algorithm; evaluating the trained vulnerability detection model on the test set by selecting and calculating performance evaluation indexes of the model and evaluating the model according to the calculated index values; and detecting vulnerabilities of the industrial control system with a vulnerability detection model that meets the evaluation standard. The vulnerability detection model is trained repeatedly until it meets the expected standard, which yields high detection accuracy.

Description

Industrial control vulnerability detection method and system based on gradient boosting decision tree algorithm

Technical Field

The application relates to the field of computer network security, in particular to an industrial control vulnerability detection method and system based on a gradient boosting decision tree algorithm.

Background

The industrial control system transmits data to the industrial Internet in real time through equipment such as a sensor, an instrument and the like, and remote control and monitoring of industrial equipment and industrial production flow are realized. In recent years, with the rapid development of the internet of things and the continuous promotion of market demands, industrial control systems are continuously developed towards the direction of intellectualization, automation and high efficiency. However, with popularization and promotion of industrial control systems, network security problems of industrial control systems are increasingly receiving attention of enterprises. Once the network security of the industrial control system is problematic, unpredictable losses are caused to the industrial production activities of enterprises and the life and property security of workers. Therefore, enterprises commonly adopt the industrial control vulnerability detection system to monitor network security vulnerabilities, so that vulnerability information can be mastered in time, corresponding risk avoidance measures are adopted, the safety and the production efficiency of the industrial control system are improved, and the orderly performance of industrial production activities is ensured.

The industrial control vulnerability detection system is a tool special for carrying out security vulnerability scanning and detection on the industrial control system, can help enterprises to discover vulnerabilities and potential safety hazards existing in the industrial control system in time, and improves the security protection capability of the industrial control system. The industrial control vulnerability detection system integrates the functions of automatically identifying the industrial control system, constructing a vulnerability database, generating a risk assessment report, a safety detection report and the like according to a targeted vulnerability scanning technology, and has the core functions of rapidly and automatically detecting and analyzing the safety vulnerability and hidden danger in the industrial control system.

At present, with the continuous development of information technology and the spread of Internet applications, the number of vulnerabilities in the industrial Internet keeps growing, their types are diversifying, and their complexity is increasing, so traditional industrial control vulnerability detection systems struggle to meet the security requirements of modern industrial control systems. Traditional industrial control vulnerability detection systems mainly have the following problems: 1) Low detection accuracy: as modern industrial control systems grow more complex, the accuracy of vulnerability detection falls, and missed detections and false detections occur frequently, which greatly affects the efficiency of industrial production; 2) High resource usage: when facing a new vulnerability detection scenario, a traditional vulnerability detection system has high computational complexity and occupies a large amount of memory, which degrades the performance and stability of the whole industrial control system; 3) High demands on professional skill: traditional systems cannot cope with increasingly complex vulnerability detection scenarios, so a large number of professional security personnel must be assigned to make manual judgments, and those personnel must be highly skilled; 4) Poor system compatibility: existing industrial control systems use many different systems and protocols, and traditional vulnerability detection systems are not well compatible with them.

Disclosure of Invention

(I) Purpose of the application

Based on the above, in order to improve the accuracy of the vulnerability detection and solve the problem that the vulnerability detection system cannot be compatible when the industrial control system is changed, the application discloses the following technical scheme.

(II) Technical solution

The application discloses an industrial control vulnerability detection method based on a gradient boosting decision tree algorithm, which comprises the following steps:

S1, acquiring data packets in an industrial control system, parsing the data packets, extracting data features, and dividing the data features into a training set and a test set;

S11, performing combined dynamic and static analysis on the data features, and forming the data features into a digital vector;

S12, dividing the data features in digital-vector form to obtain the training set and the test set;

S2, training a vulnerability detection model on the training set based on the gradient boosting decision tree algorithm;

S3, evaluating the trained vulnerability detection model on the test set;

S31, selecting and calculating performance evaluation indexes of the vulnerability detection model;

S32, evaluating the vulnerability detection model according to the calculated values of the performance evaluation indexes;

S4, detecting vulnerabilities of the industrial control system with a vulnerability detection model that meets the evaluation standard.

In one possible implementation, the data features include static features and dynamic features. The static features include the number of annotations, the number of variables, the number of functions, the number of operators, the instruction sequence, and the control flow graph. The dynamic features include API calls, function calls, input/output, resource utilization, and the memory map.

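In one possible implementation, the digital vector is constructed from the eleven feature values listed above, one component per feature (the construction formula itself is not reproduced in this text).

In one possible implementation, the training process of the vulnerability detection model includes: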

S21, initializing the classifier: setting the initial classifier to the average of the actual values of all samples in the training set;

$$f_0(x) = \frac{1}{n}\sum_{i=1}^{n} y_i$$

where $f_0(x)$ denotes the current classifier; $x_i$ denotes the data features of the i-th sample in the training set; n is the number of samples in the current training set; and $y_i$ denotes the actual value of the data features of the i-th sample;

S22, calculating residuals: calculating the residual of each sample;

$$r_{mi} = y_i - f_{m-1}(x_i)$$

where $r_{mi}$ denotes the residual of sample i on the m-th decision tree, and $f_{m-1}(x_i)$ denotes the predicted value of the current decision tree;

S23, building a tree model: fitting the residuals $r_{mi}$ to learn a regression tree $h_m(x)$;

S24, increasing model complexity: adding the newly learned regression tree to the current decision tree model to obtain an updated model;

S25, repeating S22 to S24: stopping the iteration once the fitting effect is reached, and obtaining the final boosted tree;

$$f_M(x) = \sum_{m=1}^{M} w_m\, h_m(x)$$

where the boosted tree $f_M(x)$ is a weighted sum of the first M trees, with $w_m$ the weight of the m-th tree.
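The text does not name the loss function that the residual of S22 is derived from. As an illustrative assumption only, if a squared-error loss is used, the residual is exactly the negative gradient of the loss at the current prediction, and the mean initializer of S21 is the constant that minimizes the loss. Here $y_i$ is the actual value of sample i, $f_{m-1}(x_i)$ is the current prediction, and $r_{mi}$ is the residual; the symbols are notation chosen for this sketch.

```latex
\[
L(y, f) = \tfrac{1}{2}\,(y - f)^2
\quad\Longrightarrow\quad
-\left.\frac{\partial L(y_i, f)}{\partial f}\right|_{f = f_{m-1}(x_i)}
= y_i - f_{m-1}(x_i) = r_{mi},
\]
\[
\arg\min_{c}\ \sum_{i=1}^{n} \tfrac{1}{2}\,(y_i - c)^2
= \frac{1}{n}\sum_{i=1}^{n} y_i .
\]
```

Under this assumption, fitting each new regression tree to the residuals is one step of gradient descent in function space, which is where the name of the algorithm comes from.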

In one possible implementation, the performance evaluation indexes and the evaluation method include:

accuracy P:

$$P = \frac{TP}{TP + FP}$$

where the accuracy (precision) P is the proportion of correctly classified code among all samples that the model classifies as benign; the higher the accuracy, the higher the success rate of vulnerability detection; TP denotes true benign code, and FP denotes false benign code;

recall R:

$$R = \frac{TP}{TP + FN}$$

where the recall R is the proportion of samples correctly predicted as benign code by the vulnerability detection model among all samples that are actually benign code; the higher the recall, the fewer positive samples the classifier misses; FN denotes false malicious code;

F1 score:

$$F1 = \frac{2 \cdot P \cdot R}{P + R}$$

where the F1 score is the harmonic mean of the accuracy and the recall; the higher the F1 value, the better the classifier performance.
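The three indexes can be computed directly from the predicted and actual labels. The following is a minimal sketch, assuming labels are encoded as 1 for benign code (the positive class, as in the definitions of TP, FP and FN above) and 0 for malicious code; the names `Metrics` and `evaluate` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    precision: float  # P = TP / (TP + FP)
    recall: float     # R = TP / (TP + FN)
    f1: float         # harmonic mean of P and R


def evaluate(y_true, y_pred, positive=1):
    """Compute P, R and F1, treating `positive` (benign) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators (no positive predictions or no positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return Metrics(precision, recall, f1)
```

For example, with actual labels `[1, 1, 1, 0, 0]` and predictions `[1, 1, 0, 1, 0]`, there are two true benign samples, one false benign sample and one false malicious sample, so all three indexes equal 2/3.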

As a second aspect of the present application, the present application further discloses an adaptive industrial control vulnerability detection system based on a gradient boosting decision tree algorithm, including:

an acquisition module, configured to acquire data packets in an industrial control system, parse the data packets, extract data features, and divide the data features into a training set and a test set;

a training module, configured to train a vulnerability detection model on the training set based on the gradient boosting decision tree algorithm;

an evaluation module, configured to evaluate the trained vulnerability detection model on the test set;

a detection module, configured to detect vulnerabilities of the industrial control system with a vulnerability detection model that meets the evaluation standard.

In one possible implementation, the acquisition module includes:

an analysis submodule, configured to perform combined dynamic and static analysis on the data features and form the data features into digital vectors;

a division submodule, configured to divide the data features in digital-vector form to obtain the training set and the test set.

In one possible implementation, the evaluation module includes:

a selection submodule, configured to select and calculate the performance evaluation indexes of the vulnerability detection model;

an evaluation submodule, configured to evaluate the vulnerability detection model according to the calculated values of the performance evaluation indexes.
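The way the four modules hand data to one another can be sketched as follows. This is only an illustrative arrangement, not the structure of the disclosed system: the class name `DetectionSystem`, its fields and the callable signatures are hypothetical, and each module is represented by a plain function supplied by the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence, Tuple


@dataclass
class DetectionSystem:
    # Acquisition module: packets -> (training set, test set).
    acquire: Callable[[Sequence[Any]], Tuple[list, list]]
    # Training module: training set -> candidate model (itself a callable).
    train: Callable[[list], Callable[[Any], Any]]
    # Evaluation module: (model, test set) -> True if the standard is met.
    evaluate: Callable[[Callable[[Any], Any], list], bool]
    model: Optional[Callable[[Any], Any]] = None

    def build(self, packets: Sequence[Any]) -> bool:
        """Run acquisition, training and evaluation; keep the model if accepted."""
        train_set, test_set = self.acquire(packets)
        candidate = self.train(train_set)
        if self.evaluate(candidate, test_set):
            self.model = candidate
        return self.model is not None

    def detect(self, feature_vector: Any) -> Any:
        """Detection module: only a model that passed evaluation is used."""
        if self.model is None:
            raise RuntimeError("no model has passed evaluation yet")
        return self.model(feature_vector)
```

Keeping the modules as injected callables means a changed industrial control system only requires a new acquisition function and a retraining run, while the detection entry point stays the same.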

As a third aspect of the application, the application also discloses a storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method of any of the above.

As a fourth aspect of the present application, the present application also discloses an electronic device, including: one or more processors and a memory, wherein the memory is configured to store one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the preceding claims.

(III) Beneficial effects

According to the industrial control vulnerability detection method and system based on the gradient boosting decision tree algorithm, the vulnerability detection model is trained with the gradient boosting decision tree algorithm, and the data features are described by multidimensional digital vectors, so that whether the industrial control system has a vulnerability is analyzed in as much detail as possible, which improves the detection accuracy and reduces the false alarm probability. The detection performance of the vulnerability detection model is evaluated with the performance evaluation indexes, and the overall performance of the current model and the direction of the next optimization are determined from the index values, which further improves the accuracy of vulnerability detection.

Drawings

The embodiments described below with reference to the drawings are exemplary and intended for the purpose of illustrating and explaining the present application and are not to be construed as limiting the scope of protection of the present application.

Fig. 1 is a schematic flow chart of an industrial control vulnerability detection method based on a gradient boosting decision tree algorithm disclosed in the present application.

Fig. 2 is a block diagram of an industrial control vulnerability detection system based on a gradient boosting decision tree algorithm disclosed in the present application.

Detailed Description

In order to make the purposes, technical solutions and advantages of the implementation of the present application more clear, the technical solutions in the embodiments of the present application will be described in more detail below with reference to the accompanying drawings in the embodiments of the present application.

An embodiment of the industrial control vulnerability detection method based on the gradient boosting decision tree algorithm disclosed in the application is described in detail below with reference to fig. 1. The method disclosed by the embodiment mainly comprises the following steps.

The industrial control system is composed of different hardware, software, operating systems, network protocols and network environments, and generates a large number of data packets during interaction. Data packets are captured in the industrial control system and manually classified into benign data packets without vulnerabilities and malicious data packets with vulnerabilities;

wherein the data features include static features and dynamic features. The static features include the number of annotations, the number of variables, the number of functions, the number of operators, the instruction sequence, and the control flow graph. The dynamic features include API calls, function calls, input/output, resource utilization, and the memory map.

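The digital vector is formed from these eleven feature values, one component per feature (the formation formula itself is not reproduced in this text).

The data features in the data packets are all described by digital vectors, and each data packet is represented by an 11-dimensional digital vector.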
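The 11-dimensional vector can be illustrated with a small record type. The text does not say how non-scalar features such as the instruction sequence or the control flow graph are reduced to numbers, so this sketch assumes every feature has already been encoded as a single numeric value; the class name `PacketFeatures` and its field names are hypothetical.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class PacketFeatures:
    # Static features (assumed to be pre-encoded as numbers).
    annotation_count: float
    variable_count: float
    function_count: float
    operator_count: float
    instruction_sequence: float
    control_flow_graph: float
    # Dynamic features (assumed to be pre-encoded as numbers).
    api_calls: float
    function_calls: float
    input_output: float
    resource_utilization: float
    memory_map: float

    def to_vector(self):
        """Return the features as an 11-dimensional list in declaration order."""
        return [float(value) for value in astuple(self)]
```

A fixed field order matters here: every data packet must place the same feature in the same component, otherwise the trees learned during training would split on inconsistent dimensions.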

4/5 of the digital vectors of the benign data packets and 4/5 of the digital vectors of the malicious data packets are selected as the training set, and the remaining 1/5 of each are used as the test set.
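Taking 4/5 from each class separately keeps the ratio of benign to malicious samples the same in both sets. A minimal sketch of such a per-class split is given below; the function name `stratified_split`, the shuffling step and the fixed seed are assumptions added for illustration, since the text does not state how the 4/5 are chosen.

```python
import random


def stratified_split(vectors, labels, train_parts=4, total_parts=5, seed=0):
    """Split each class separately into train_parts/total_parts and the rest."""
    rng = random.Random(seed)
    by_label = {}
    for vector, label in zip(vectors, labels):
        by_label.setdefault(label, []).append(vector)

    train, test = [], []
    for label, group in by_label.items():
        group = group[:]           # copy so the caller's data is not reordered
        rng.shuffle(group)         # assumed: random choice of the 4/5
        cut = len(group) * train_parts // total_parts
        train += [(v, label) for v in group[:cut]]
        test += [(v, label) for v in group[cut:]]
    return train, test
```

With 10 benign and 5 malicious vectors, for example, the training set receives 8 benign and 4 malicious vectors and the test set receives 2 benign and 1 malicious vector.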

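The training comprises the following specific steps:

S21, initializing the classifier: setting the initial classifier to the average of the actual values of all samples in the training set;

$$f_0(x) = \frac{1}{n}\sum_{i=1}^{n} y_i$$

where $f_0(x)$ denotes the current classifier; n is the number of samples in the current training set; and $y_i$ denotes the actual value of the data features of the i-th sample;

S22, calculating residuals: calculating the residual of each sample, namely the difference between the target variable and the predicted value;

$$r_{mi} = y_i - f_{m-1}(x_i)$$

where $r_{mi}$ denotes the residual of sample i on the m-th decision tree, and $f_{m-1}(x_i)$ denotes the predicted value of the current decision tree for the data features $x_i$ of the i-th sample;

S23, building a tree model: fitting the residuals $r_{mi}$ to learn a regression tree $h_m(x)$;

S24, increasing model complexity: adding the newly learned regression tree to the current decision tree model to obtain an updated model;

S25, repeating S22 to S24: stopping the iteration once the fitting effect is reached, and obtaining the final boosted tree;

$$f_M(x) = \sum_{m=1}^{M} w_m\, h_m(x)$$

where the boosted tree $f_M(x)$ is a weighted sum of the first M trees, with $w_m$ the weight of the m-th tree.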
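Steps S21 to S25 can be sketched in plain Python as below. This is a simplified illustration rather than the disclosed implementation: each regression tree is a depth-1 tree (a stump), the loss is assumed to be squared error, a single learning rate serves as the weight of every tree, and labels are assumed to be 1 for benign and 0 for malicious. The names `fit_stump` and `fit_gbdt` are hypothetical.

```python
def fit_stump(X, residuals):
    """Fit a depth-1 regression tree to the residuals by least squares (S23)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:  # every feature is constant: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda row: mean
    _, j, threshold, left_mean, right_mean = best
    return lambda row: left_mean if row[j] <= threshold else right_mean


def fit_gbdt(X, y, n_trees=20, learning_rate=0.5):
    """Return a predictor built by boosting stumps on squared-error residuals."""
    f0 = sum(y) / len(y)                       # S21: mean of the actual values
    trees = []
    predictions = [f0] * len(y)
    for _ in range(n_trees):                   # S25: repeat S22 to S24
        residuals = [yi - pi for yi, pi in zip(y, predictions)]       # S22
        tree = fit_stump(X, residuals)                                # S23
        trees.append(tree)
        predictions = [pi + learning_rate * tree(row)                 # S24
                       for pi, row in zip(predictions, X)]

    def predict(row):
        # Initial value plus the weighted sum of all learned trees.
        return f0 + learning_rate * sum(tree(row) for tree in trees)

    return predict
```

A returned score at or above 0.5 can be read as benign and a score below 0.5 as malicious. A fixed number of trees stands in for the stopping condition of S25; a practical implementation would more likely stop when the residuals on held-out data no longer shrink.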

The evaluating step comprises:

The performance evaluation indexes of the test comprise the accuracy, the recall, and the F1 score,

where the accuracy P is:

$$P = \frac{TP}{TP + FP}$$

the recall R is:

$$R = \frac{TP}{TP + FN}$$

and the F1 score is:

$$F1 = \frac{2 \cdot P \cdot R}{P + R}$$

When the test result shows an accuracy higher than 99%, a recall rate lower than 0.05%, and an F1 score above 0.98, the vulnerability detection model meets the evaluation standard and can be deployed; the data features collected from the industrial control system are then input into the vulnerability detection model to perform vulnerability detection.

For the vulnerability detection model which does not meet the evaluation standard, the data in the training set needs to be expanded, and the training process of S2 is continued.
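The accept-or-retrain loop described above can be sketched as follows. The function names, the callable signatures and the `max_rounds` limit are hypothetical. The sketch also assumes that all three indexes are lower bounds supplied by the caller (a higher recall being better, as in the definition of the recall), so no threshold values are hard-coded.

```python
def meets_standard(precision, recall, f1, min_precision, min_recall, min_f1):
    """Return True when every index exceeds its caller-supplied lower bound."""
    return precision > min_precision and recall > min_recall and f1 > min_f1


def train_until_accepted(train_fn, evaluate_fn, expand_fn,
                         train_set, test_set, thresholds, max_rounds=5):
    """Train, evaluate, and expand the training set until the model is accepted.

    train_fn(train_set) -> model
    evaluate_fn(model, test_set) -> (precision, recall, f1)
    expand_fn(train_set) -> larger training set
    Returns the accepted model, or None if max_rounds is exhausted.
    """
    for _ in range(max_rounds):
        model = train_fn(train_set)
        precision, recall, f1 = evaluate_fn(model, test_set)
        if meets_standard(precision, recall, f1, *thresholds):
            return model
        train_set = expand_fn(train_set)   # model rejected: add training data
    return None
```

The round limit is an added safeguard: without it, a model that can never reach the standard would keep the loop running indefinitely.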

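An embodiment of the industrial control vulnerability detection system based on the gradient boosting decision tree algorithm disclosed in the present application is described in detail below with reference to fig. 2. The system disclosed in this embodiment includes:

an acquisition module, configured to acquire data packets in an industrial control system, parse the data packets, extract data features, and divide the data features into a training set and a test set;

a training module, configured to train a vulnerability detection model on the training set based on the gradient boosting decision tree algorithm;

an evaluation module, configured to evaluate the trained vulnerability detection model on the test set;

a detection module, configured to detect vulnerabilities of the industrial control system with a vulnerability detection model that meets the evaluation standard;

wherein the acquisition module includes:

an analysis submodule, configured to perform combined dynamic and static analysis on the data features and form the data features into digital vectors;

a division submodule, configured to divide the data features in digital-vector form to obtain the training set and the test set;

wherein the evaluation module includes:

a selection submodule, configured to select and calculate the performance evaluation indexes of the vulnerability detection model;

an evaluation submodule, configured to evaluate the vulnerability detection model according to the calculated values of the performance evaluation indexes.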

In summary, the invention uses a combined dynamic and static data analysis method to construct an 11-dimensional digital vector that describes the data features, which helps to analyze in as much detail as possible whether a vulnerability occurs in the industrial control system, improves the detection accuracy, and reduces the false alarm probability. The invention uses three indexes, the accuracy, the recall, and the F1 score, to evaluate the vulnerability detection performance of the model. In general, when the accuracy of a classifier is high, its recall tends to be low, and vice versa. To balance the accuracy and the recall, the F1 score is introduced for comprehensive evaluation, which further improves the accuracy of model vulnerability detection.
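The balancing role of the F1 score can be seen from a short worked example. The numbers below are invented purely for illustration and are not test results from the application: a classifier with high accuracy but low recall receives a low F1 score, whereas one that is strong on both receives a high one.

```latex
\[
P = 0.99,\; R = 0.50:\qquad
F1 = \frac{2 \cdot 0.99 \cdot 0.50}{0.99 + 0.50} = \frac{0.99}{1.49} \approx 0.664
\]
\[
P = 0.99,\; R = 0.98:\qquad
F1 = \frac{2 \cdot 0.99 \cdot 0.98}{0.99 + 0.98} = \frac{1.9404}{1.97} \approx 0.985
\]
```

Because the harmonic mean is pulled toward the smaller of its two arguments, a model cannot obtain a high F1 score by optimizing one index at the expense of the other.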

The gradient boosting decision tree algorithm is used for vulnerability detection in an industrial control system. The algorithm offers high precision and strong robustness, and achieves better vulnerability detection performance than traditional vulnerability detection systems. Vulnerability detection requires the features of a vulnerability to be interpreted and analyzed comprehensively so that the vulnerability can be located quickly; the gradient boosting decision tree algorithm is highly interpretable and produces output that is easy to understand, which gives it a great advantage in the field of vulnerability detection.

The invention can train the vulnerability detection model matched with the industrial control system formed by different hardware, software, operating systems, network protocols and network environments, can quickly train the model when the industrial control system is changed, and can update the model at any time to adapt to the new system condition, thereby having extremely strong system compatibility and expansibility which are not possessed by the traditional industrial control vulnerability detection system.

It should be noted that the above modules may be executed on a computer terminal.

The adaptive industrial control vulnerability detection system based on the gradient boosting decision tree algorithm comprises a processor and a memory; the above modules are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.

The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided. The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.

The embodiment of the invention provides a computer readable storage medium on which a program is stored; when executed by a processor, the program realizes the adaptive industrial control vulnerability detection method based on the gradient boosting decision tree algorithm.

The embodiment of the invention provides electronic equipment, which comprises a processor, a memory and a program stored in the memory and capable of running on the processor, wherein the adaptive industrial control vulnerability detection method based on the gradient boosting decision tree algorithm is realized when the processor executes the program. The device herein may be a server, a PC (computer), a PAD (tablet computer), a mobile phone, etc.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.

Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

In the description of this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article or apparatus that comprises the element.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims

1. The industrial control vulnerability detection method based on the gradient boosting decision tree algorithm is characterized by comprising the following steps of:

2. The method of claim 1, wherein the data features include static features and dynamic features, the static features including the number of annotations, the number of variables, the number of functions, the number of operators, the instruction sequence, and the control flow graph, and the dynamic features including API calls, function calls, input/output, resource utilization, and the memory map.

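3. The method of claim 2, wherein the digital vector is constructed from the eleven feature values, one component per feature (the construction formula itself is not reproduced in this text).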

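4. The method of claim 3, wherein the training process of the vulnerability detection model comprises:

S21, initializing the classifier: setting the initial classifier to the average of the actual values of all samples in the training set;

$$f_0(x) = \frac{1}{n}\sum_{i=1}^{n} y_i$$

where $f_0(x)$ denotes the current classifier; n is the number of samples in the current training set; and $y_i$ denotes the actual value of the data features of the i-th sample;

S22, calculating residuals: calculating the residual of each sample;

$$r_{mi} = y_i - f_{m-1}(x_i)$$

where $r_{mi}$ denotes the residual of sample i on the m-th decision tree, and $f_{m-1}(x_i)$ denotes the predicted value of the current decision tree for the data features $x_i$ of the i-th sample;

S23, building a tree model: fitting the residuals $r_{mi}$ to learn a regression tree $h_m(x)$;

S24, increasing model complexity: adding the newly learned regression tree to the current decision tree model to obtain an updated model;

S25, repeating S22 to S24: stopping the iteration once the fitting effect is reached, and obtaining the final boosted tree;

$$f_M(x) = \sum_{m=1}^{M} w_m\, h_m(x)$$

where the boosted tree $f_M(x)$ is a weighted sum of the first M trees, with $w_m$ the weight of the m-th tree.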

5. The method of claim 1, wherein the performance evaluation indexes and the evaluation method comprise:

accuracy P:

$$P = \frac{TP}{TP + FP}$$

where the accuracy (precision) P is the proportion of correctly classified code among all samples that the model classifies as benign; the higher the accuracy, the higher the success rate of vulnerability detection; TP denotes true benign code, and FP denotes false benign code;

recall R:

$$R = \frac{TP}{TP + FN}$$

where FN denotes false malicious code;

F1 score:

$$F1 = \frac{2 \cdot P \cdot R}{P + R}$$

6. The adaptive industrial control vulnerability detection system based on the gradient boosting decision tree algorithm is characterized by comprising the following steps:

7. The system of claim 6, wherein the acquisition module comprises:

8. The system of claim 6, wherein the evaluation module comprises:

9. A storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to carry out the method of any one of claims 1 to 5.

10. An electronic device, comprising: one or more processors and a memory, wherein the memory is configured to store one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.