CN113159615A - Intelligent information security risk measuring system and method for industrial control system - Google Patents

Info

Publication number
CN113159615A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110505744.9A
Other languages
Chinese (zh)
Inventor
麦荣章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements

Abstract

The invention belongs to the technical field of information security and discloses an intelligent measuring system and method for the information security risk of an industrial control system. The system comprises a data set acquisition module, a data preprocessing module, a central control module, a risk assessment model construction module, a model training module, a risk prediction module, an information security evaluation module, an early warning module, a data storage module and an updating display module. The invention adopts a random forest prediction model optimized by a time series prediction method, which reduces the influence of the many factors affecting the predicted value and improves prediction precision. The random forest algorithm is advantageous when processing large numbers of data samples, places low requirements on the data, accepts both categorical and continuous variables, and yields more stable accuracy. The invention also improves the algorithm on the basis of the prior art, improves the prediction accuracy of the model, and accurately predicts future risk values from existing risk values.

Description

Intelligent information security risk measuring system and method for industrial control system
Technical Field
The invention belongs to the technical field of information security, and particularly relates to an intelligent measuring system and method for information security risk of an industrial control system.
Background
Currently, industrial control systems are required to transmit large amounts of data (e.g., image and audio signals) at high speed, which has led to the combination of Ethernet with control networks, an approach that is now very popular in the commercial field. This networking trend integrates general-purpose technologies such as embedded technology, multi-standard industrial control network interconnection and wireless technology, thereby expanding the development space of the industrial control field and bringing new development opportunities.
With the development of computer technology, communication technology and control technology, the traditional control field has undergone unprecedented change and has begun to develop towards networking. With the rapid development of industrialization and informatization, industrial control systems increasingly adopt information technology and communication network technology, and the information security of industrial control systems faces serious challenges.
In some prediction problems, the dependent variable is affected by changes in multiple independent variables. For example, if the subject of a survey is a house, the relevant attributes include the house price, the number of rooms, the floor, the geographic location or the residential area. In this case, multiple linear regression or a neural network model is needed to predict the dependent variable from the independent variables. Time series prediction uses a different analysis method: it uses the historical data of a particular variable to predict future data of that same variable. It has two main characteristics: first, the survey includes only that variable and examines how it has changed in the past; second, time series analysis does not need to consider the properties of other variables that may affect the target. Time series prediction relies on time series decomposition, i.e., decomposing the data into trend, seasonal and noise components. The first two are called systematic components because they are predictable, whereas the noise component is random and is therefore sometimes called the non-systematic component.
Random forest algorithms have become a common tool for many researchers and are used in many fields. The random forest places low requirements on the data it processes, which may be categorical or continuous variables, so that data processing becomes easier and the application range is wider. Secondly, the random forest can perform the tasks of discriminant analysis, logistic regression and multiple linear regression; in general, the preconditions for applying a random forest are relatively loose, and no statistical assumptions such as normality or homogeneity of variance of the independent variables are required.
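The decomposition into trend, seasonal and noise components can be illustrated with a minimal sketch. It assumes an additive model (series = trend + seasonal + noise) and a centered moving average as the trend estimate; the function name `decompose` and the `period` parameter are illustrative choices and do not come from the patent.

```python
from statistics import mean

def decompose(series, period):
    """Additive decomposition sketch: series = trend + seasonal + noise."""
    n = len(series)
    half = period // 2
    # Trend: centered moving average (None where the window does not fit).
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        if period % 2 == 0:
            # Even period: the two end points each get half weight.
            total = sum(window) - 0.5 * (window[0] + window[-1])
            trend[i] = total / period
        else:
            trend[i] = mean(window)
    # Seasonal: average detrended value at each position of the cycle.
    detrended = [(i, series[i] - trend[i]) for i in range(n) if trend[i] is not None]
    seasonal_means = []
    for k in range(period):
        vals = [d for i, d in detrended if i % period == k]
        seasonal_means.append(mean(vals) if vals else 0.0)
    offset = mean(seasonal_means)  # center the seasonal pattern on zero
    seasonal = [seasonal_means[i % period] - offset for i in range(n)]
    # Noise: whatever trend and seasonal components do not explain.
    noise = [
        series[i] - trend[i] - seasonal[i] if trend[i] is not None else None
        for i in range(n)
    ]
    return trend, seasonal, noise
```

For a series built as a straight line plus a repeating four-step pattern, the sketch recovers the line as the trend and the pattern as the seasonal component, leaving a noise component close to zero.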
Through the above analysis, the problems and defects of the prior art are as follows:
(1) Industrial control systems have difficulty eliminating information risks at the source.
(2) The core of risk assessment is to estimate the total loss to the industrial control system caused by external threats or resource loss, which presupposes an assessment of the vulnerabilities and threat levels across the whole system.
(3) The algorithms adopted in the prior art place high requirements on the data, overfit easily and are insufficiently accurate.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an intelligent measuring system and method for information security risk of an industrial control system.
The invention is realized as follows: an intelligent measuring system for information security risk of an industrial control system comprises:
a data set acquisition module, a data preprocessing module, a central control module, a risk assessment model construction module, a model training module, a risk prediction module, an information security evaluation module, an early warning module, a data storage module and an updating display module.
The data set acquisition module is connected with the central control module and used for acquiring a plurality of risk influencing elements of the industrial control system and a plurality of groups of evaluation values corresponding to the risk influencing elements through data set acquisition equipment and taking the risk influencing elements and the evaluation values as an initial sample data set;
the data preprocessing module is connected with the central control module and used for preprocessing the acquired initial sample data set of the industrial control system through a data preprocessing program, and the data preprocessing module comprises:
(1) labeling a plurality of risk influence elements of the industrial control system and a plurality of groups of evaluation values corresponding to the risk influence elements by using a plurality of labeling frames of different types to obtain a first data set;
(2) determining the data acquisition cost according to the anti-crawling mechanism triggered in the labeling process of the first data set; determining the data cleaning cost according to the structure type of the collected test data packet;
(3) determining data storage cost according to at least one data storage format and data quantity corresponding to various data storage formats;
(4) determining a data processing cost based on the data acquisition cost, the data cleaning cost and the data storage cost, and processing the initial sample data set according to the data processing cost;
the central control module is connected with the data set acquisition module, the data preprocessing module, the risk assessment model construction module, the model training module, the risk prediction module, the information security assessment module, the early warning module, the data storage module and the updating display module and is used for coordinating and controlling the normal operation of each module of the information security risk intelligent determination system of the industrial control system through the central processing unit;
the risk assessment model building module is connected with the central control module and used for building a risk assessment model through a model building program, and the risk assessment model building module comprises:
(1) acquiring a plurality of groups of evaluation value data corresponding to a plurality of risk influence elements;
(2) analyzing the correlation of the evaluation value data, and drawing a time series curve of the evaluation value data;
(3) obtaining the jump points and inflection points of the time series curve, and selecting a stationary time series ARMA model for curve fitting;
(4) constructing according to the curve fitting data by using a model construction program to obtain a risk assessment model;
the model training module is connected with the central control module and used for taking the initial sample data set after preprocessing as a training sample, and training a random forest optimized by a time sequence algorithm through a model training program to obtain a final risk assessment model, and the model training module comprises:
(1) taking the initial sample data set after preprocessing as a training sample, predicting the training sample data by adopting a time sequence, and predicting future data of a single variable by using historical data of the variable;
(2) constructing a random forest decision tree and determining attribute test conditions;
(3) training a random forest risk assessment model by using the regRF_train function defined in the RF toolkit;
the risk prediction module is connected with the central control module and used for inputting the evaluation value of the risk influence element into a risk evaluation model through a risk prediction program to obtain a prediction value of the information security risk of the industrial control system;
the information security evaluation module is connected with the central control module and used for evaluating the obtained predicted value of the information security risk of the industrial control system through an information security evaluation program and generating an evaluation report;
the early warning module is connected with the central control module and is used for issuing early warning notifications of abnormal information security risks of the industrial control system through a sound-and-light early warning device;
the data storage module is connected with the central control module and used for storing the acquired initial sample data set, data preprocessing results, risk assessment models, model training results, predicted values of information security risks of the industrial control system, information security assessment reports and early warning notifications of the industrial control system through a memory;
and the updating display module is connected with the central control module and is used for updating and displaying the acquired initial sample data set, the data preprocessing result, the risk assessment model, the model training result, the predicted value of the information security risk of the industrial control system, the information security assessment report and the real-time data of the early warning notice of the industrial control system through a display.
Further, in the data set acquisition module, the risk influencing elements include: enterprise management layer elements, process control layer elements and field control layer elements; the enterprise management layer elements include: unauthorized access, malicious code, distributed denial of service, viruses and Trojans, and forgery attacks; the process control layer elements include: denial of service attacks, DoS attacks, flooding attacks, response spoofing, and direction misleading attacks; the field control layer elements include: physical attacks, information theft, data tampering, denial of service attacks, illegal access, and replay attacks.
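One way to organize the three layers of risk influencing elements and their evaluation values into a fixed-order sample is sketched below. The dictionary keys, the `layer/element` naming scheme and the default value of 0.0 are assumptions made for illustration; the patent does not specify a data layout.

```python
# Risk influencing elements per layer, as listed in the text above.
# The key names and the "layer/element" naming scheme are illustrative assumptions.
RISK_ELEMENTS = {
    "enterprise_management": [
        "unauthorized access", "malicious code", "distributed denial of service",
        "viruses and Trojans", "forgery attacks",
    ],
    "process_control": [
        "denial of service attacks", "DoS attacks", "flooding attacks",
        "response spoofing", "direction misleading attacks",
    ],
    "field_control": [
        "physical attacks", "information theft", "data tampering",
        "denial of service attacks", "illegal access", "replay attacks",
    ],
}

def feature_names():
    """Return one name per risk influencing element, in a fixed order."""
    return [f"{layer}/{element}"
            for layer, elements in RISK_ELEMENTS.items()
            for element in elements]

def to_vector(evaluation, default=0.0):
    """Turn a {feature name: evaluation value} mapping into an ordered sample."""
    return [evaluation.get(name, default) for name in feature_names()]
```

A group of evaluation values then becomes one row of the initial sample data set, for example `to_vector({"field_control/replay attacks": 0.7})`.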
Further, in the data preprocessing module, the determining the data acquisition cost includes:
1) looking up the acquisition difficulty corresponding to each triggered anti-crawling mechanism in a preset correspondence between anti-crawling mechanisms and acquisition difficulties; the harder an anti-crawling mechanism is to defeat, the greater its corresponding acquisition difficulty;
2) determining the product of the sum of the looked-up acquisition difficulties and the basic acquisition cost as the data acquisition cost.
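The two steps above amount to a table lookup followed by a multiplication. The sketch below assumes a hypothetical difficulty table; the mechanism names and the numeric difficulty values are placeholders and are not given in the patent.

```python
# Hypothetical preset correspondence between anti-crawling mechanisms and
# acquisition difficulty (names and values are illustrative assumptions).
DIFFICULTY = {"rate_limit": 1.0, "captcha": 2.0, "ip_ban": 3.0}

def acquisition_cost(triggered, base_cost, table=DIFFICULTY):
    """Data acquisition cost = (sum of looked-up difficulties) * basic cost."""
    total_difficulty = sum(table[mechanism] for mechanism in triggered)
    return total_difficulty * base_cost
```

With this table, triggering `rate_limit` and `captcha` at a basic acquisition cost of 10.0 gives a data acquisition cost of (1.0 + 2.0) × 10.0 = 30.0.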
Further, in the model training module, the following statistics are needed in the data-based time sequence analysis:
(1) time interval: $t = 1, 2, 3, \ldots, n$;
(2) time-series data: $y_1, y_2, y_3, \ldots, y_n$;
(3) predicted value: $F_{n+h}$ denotes the predicted value for the $h$-th time interval after $n$; $h = 1$ denotes the time interval immediately following; $h$ represents the time span and may be set to a value greater than 1;
(4) prediction error: at time $t$, $e_t = y_t - F_t$.
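The prediction error defined in item (4) can be computed directly once observed values and forecasts are aligned by time index. The following is a minimal sketch; the mean absolute error is added only as one common summary of the errors and is not mentioned in the patent.

```python
def forecast_errors(y, F):
    """Prediction error at each time t: e_t = y_t - F_t."""
    return [yt - ft for yt, ft in zip(y, F)]

def mean_absolute_error(errors):
    """Average magnitude of the errors (an illustrative summary measure)."""
    return sum(abs(e) for e in errors) / len(errors)
```

For observations `[3, 5, 4]` and forecasts `[2, 5, 6]`, the errors are `[1, 0, -2]` and their mean absolute value is 1.0.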
Further, in the model training module, the attribute test conditions include:
(1) binary attributes: the test condition of a binary attribute produces two outcomes;
(2) nominal attributes: a nominal attribute has multiple values, and its test condition can be expressed in two ways, as a multi-way split or as a binary split;
(3) ordinal attributes: the attribute values may be grouped into two or more partitions as desired, provided the grouping does not violate the order of the values;
(4) continuous attributes: for a continuous attribute, the test condition is either a comparison test with binary outcomes, (A < v) or (A ≥ v), or a range query.
Further, if a multi-way split is applied, all possible split points and contiguous intervals must be considered; a continuous attribute can be discretized, with each discretized interval assigned a new ordinal value, and, provided the ordering of the intervals is preserved, adjacent intervals can be merged into wider ones. The optimal feature attribute is then selected, wherein:
Gini coefficient: the Gini coefficient is a relative index widely used in economics and statistics; in the decision tree algorithm, it represents the degree of impurity (disorder) of the attribute categories in the pre-classified data set. If $P(X, Y) = P(X) \times P(Y)$, i.e., X and Y are independent of each other, then:
$\log(XY) = \log(X) + \log(Y)$;
[Formula image BDA0003058352180000071]
[Formula image BDA0003058352180000072]
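As an illustration of how an impurity measure selects a test condition for a continuous attribute, the sketch below uses the standard Gini impurity, 1 − Σ p_k², where p_k is the proportion of class k in a node. Whether this matches the formulas in the patent's omitted images is an assumption; the function names are illustrative.

```python
from collections import Counter

def gini(labels):
    """Standard Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Find the threshold v whose binary split (A < v) / (A >= v) has the
    lowest size-weighted Gini impurity. Returns (v, weighted impurity)."""
    pairs = sorted(zip(values, labels))
    best_v, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split point between equal attribute values
        v = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
        left = [label for a, label in pairs if a < v]
        right = [label for a, label in pairs if a >= v]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_v, best_score = v, score
    return best_v, best_score
```

Candidate thresholds are taken as midpoints between consecutive distinct attribute values, which is one common convention; a node containing a single class has impurity 0, and a node split evenly between two classes has impurity 0.5.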
Further, in the model training module, the random forest further includes:
the main idea of the random forest algorithm is random sampling, namely drawing a fixed number of samples at random from the preprocessed training set, each draw being made with replacement; the same number of samples is therefore collected each time, but their contents differ. The specific algorithm flow is as follows:
given a sample set D = {(x1, y1), (x2, y2), …, (xn, yn)} and the number of weak-classifier iterations T, output a strong classifier f(x); for t = 1, 2, …, T:
(1) randomly sample the training set for the t-th time, drawing m times in total, to obtain a sampling set Dt containing m samples;
(2) train the t-th weak classifier with the sampling set Dt;
(3) finally, aggregate the results of the T weak classifiers into the result f(x) of the strong classifier.
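The sampling-and-aggregation flow above (Bagging) can be sketched as follows. To keep the example short, the weak learner is a deliberately trivial stand-in that predicts the mean target of its sampling set; a real random forest trains a decision tree at this step. The aggregation by averaging corresponds to the regression case. All function names are illustrative assumptions.

```python
import random
from statistics import mean

def bootstrap_sample(data, m, rng):
    """Draw m samples from data with replacement (the sampling set Dt)."""
    return [rng.choice(data) for _ in range(m)]

def train_weak_learner(sample):
    """Toy weak learner (assumption): always predicts the mean target of its
    sampling set. A random forest would fit a decision tree here instead."""
    avg = mean(y for _, y in sample)
    return lambda x: avg

def bagging(data, T, m, seed=0):
    """Train T weak learners on bootstrap samples and average their outputs."""
    rng = random.Random(seed)
    learners = [train_weak_learner(bootstrap_sample(data, m, rng))
                for _ in range(T)]
    # Strong predictor f(x): mean of the T weak predictions.
    return lambda x: mean(h(x) for h in learners)
```

Each call to `bootstrap_sample` returns the same number of samples, but the contents differ between calls because the draws are made with replacement.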
Further, in the model training module, the format of the regRF _ train function defined in the RF toolkit is as follows:
model=regRF_train(X,Y,ntree,mtry,extra_options);
in this call, two parameters are mandatory and the others are optional. X is the data matrix; when the function is called, the normalized input training set pn_train is passed as the input data to be trained. Y is the target value; the normalized output training set tn_train is passed as the output data of the current training. The optional parameters are as follows: ntree is the number of trees built by the model and is set to 20100, i.e., 20100 trees; mtry is the number of features considered when splitting a node of a tree and is set to 45. The trained model is returned in model;
finally, a segment of existing risk values is input into the prediction model, and future risk values are predicted step by step according to the trained model.
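Step-by-step prediction of future risk values can be read as recursive forecasting: each predicted value is appended to the history and used as input for the next step. The sketch below assumes this reading; `predict` stands in for the trained model and `window` for the number of past values it consumes, both of which are hypothetical names not taken from the patent.

```python
def recursive_forecast(predict, history, window, steps):
    """Predict `steps` future values, feeding each prediction back as input.

    predict: callable taking the last `window` values and returning the next one
             (a stand-in for the trained risk assessment model).
    """
    values = list(history)
    forecasts = []
    for _ in range(steps):
        next_value = predict(values[-window:])
        forecasts.append(next_value)
        values.append(next_value)  # the prediction becomes part of the input
    return forecasts
```

With a toy predictor that averages its window, `recursive_forecast(lambda w: sum(w) / len(w), [1.0, 3.0], 2, 2)` returns `[2.0, 2.5]`. Because errors are fed back into later steps, accuracy typically degrades as the horizon grows.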
Another object of the present invention is to provide a computer program product stored on a computer readable medium, which includes a computer readable program for providing a user input interface to apply the intelligent risk determination system for information security of an industrial control system when the computer program product is executed on an electronic device.
Another object of the present invention is to provide a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to apply the intelligent industrial control system information security risk determination system.
By combining all the above technical schemes, the invention has the following advantages and positive effects: the intelligent measuring system for information security risk of an industrial control system adopts a random forest prediction model optimized by a time series prediction method, which reduces the influence of the many factors affecting the predicted value and greatly improves prediction precision; the random forest algorithm is advantageous when processing large numbers of data samples and is more stable in precision.
Meanwhile, the random forest used by the invention places low requirements on the data it processes, which may be categorical or continuous variables, so that data processing is easier and the application range is wider. Secondly, the random forest can perform the tasks of discriminant analysis, logistic regression and multiple linear regression; in general, the preconditions for applying a random forest are relatively loose, and no statistical assumptions such as normality or homogeneity of variance of the independent variables are required. Therefore, the invention improves the algorithm on the basis of the prior art, improves the prediction accuracy of the model, and accurately predicts future risk values from existing risk values.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of an intelligent risk determination system for information security of an industrial control system according to an embodiment of the present invention;
In the figure: 1. data set acquisition module; 2. data preprocessing module; 3. central control module; 4. risk assessment model construction module; 5. model training module; 6. risk prediction module; 7. information security evaluation module; 8. early warning module; 9. data storage module; 10. updating display module.
Fig. 2 is a flowchart of an intelligent risk determination method for information security of an industrial control system according to an embodiment of the present invention.
Fig. 3 is a flowchart of a method for preprocessing an acquired initial sample data set of the industrial control system by using a data preprocessing program through a data preprocessing module according to an embodiment of the present invention.
Fig. 4 is a flowchart of a method for constructing a risk assessment model by using a risk assessment model construction module and a model construction program according to an embodiment of the present invention.
Fig. 5 is a flowchart of a method for obtaining a final risk assessment model by using a model training program to train a random forest optimized by a time sequence algorithm with a preprocessed initial sample data set as a training sample through a model training module according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of a random forest Bagging principle provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Aiming at the problems in the prior art, the invention provides an intelligent determination system and method for information security risk of an industrial control system, and the invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, an intelligent measuring system for information security risk of an industrial control system according to an embodiment of the present invention includes: a data set acquisition module 1, a data preprocessing module 2, a central control module 3, a risk assessment model construction module 4, a model training module 5, a risk prediction module 6, an information security evaluation module 7, an early warning module 8, a data storage module 9 and an updating display module 10.
The data set acquisition module 1 is connected with the central control module 3, and is used for acquiring a plurality of risk influencing elements of the industrial control system and a plurality of groups of evaluation values corresponding to the risk influencing elements through data set acquisition equipment, and taking the risk influencing elements and the evaluation values as an initial sample data set;
the data preprocessing module 2 is connected with the central control module 3 and used for preprocessing the acquired initial sample data set of the industrial control system through a data preprocessing program;
the central control module 3 is connected with the data set acquisition module 1, the data preprocessing module 2, the risk assessment model construction module 4, the model training module 5, the risk prediction module 6, the information security evaluation module 7, the early warning module 8, the data storage module 9 and the updating display module 10, and is used for coordinating and controlling the normal operation of each module of the information security risk intelligent determination system of the industrial control system through a central processing unit;
the risk assessment model building module 4 is connected with the central control module 3 and used for building a risk assessment model through a model building program;
the model training module 5 is connected with the central control module 3 and used for training a random forest optimized by a time sequence algorithm through a model training program by taking the preprocessed initial sample data set as a training sample to obtain a final risk assessment model;
the risk prediction module 6 is connected with the central control module 3 and used for inputting the evaluation value of the risk influence element into a risk evaluation model through a risk prediction program to obtain a prediction value of the information security risk of the industrial control system;
the information security evaluation module 7 is connected with the central control module 3 and used for evaluating the obtained predicted value of the information security risk of the industrial control system through an information security evaluation program and generating an evaluation report;
the early warning module 8 is connected with the central control module 3 and is used for issuing early warning notifications of abnormal information security risks of the industrial control system through a sound-and-light early warning device;
the data storage module 9 is connected with the central control module 3 and is used for storing the acquired initial sample data set, data preprocessing results, risk assessment models, model training results, predicted values of information security risks of the industrial control system, information security assessment reports and early warning notifications of the industrial control system through a memory;
and the updating display module 10 is connected with the central control module 3 and is used for updating and displaying the acquired initial sample data set, the data preprocessing result, the risk assessment model, the model training result, the predicted value of the information security risk of the industrial control system, the information security assessment report and the real-time data of the early warning notice of the industrial control system through a display.
As shown in fig. 2, the intelligent determination method for information security risk of an industrial control system according to an embodiment of the present invention includes the following steps:
S101, acquiring a plurality of risk influencing elements of the industrial control system and a plurality of groups of evaluation values corresponding to the risk influencing elements by using a data set acquisition module through data set acquisition equipment, and taking the risk influencing elements and the evaluation values as an initial sample data set;
S102, preprocessing the acquired initial sample data set of the industrial control system by a data preprocessing module through a data preprocessing program;
S103, coordinating and controlling the normal operation of each module of the information security risk intelligent determination system of the industrial control system by a central control module through a central processing unit; constructing a risk assessment model by a risk assessment model construction module through a model construction program;
S104, using the preprocessed initial sample data set as a training sample through a model training module, and training a random forest optimized by a time sequence algorithm by using a model training program to obtain a final risk assessment model;
S105, inputting the evaluation value of the risk influence element into the risk assessment model by using a risk prediction program through a risk prediction module to obtain a predicted value of the information security risk of the industrial control system;
S106, evaluating the obtained predicted value of the information security risk of the industrial control system by using an information security evaluation program through an information security evaluation module, and generating an evaluation report; issuing early warning notifications of abnormal information security risks of the industrial control system by using a sound-and-light early warning device through an early warning module;
S107, storing the acquired initial sample data set, data preprocessing results, risk assessment models, model training results, predicted values of information security risks of the industrial control system, information security assessment reports and early warning notifications of the industrial control system by using a memory through a data storage module;
S108, updating and displaying the acquired initial sample data set, data preprocessing results, risk assessment models, model training results, predicted values of information security risks of the industrial control system, information security assessment reports and real-time data of early warning notifications by using a display through an updating display module.
In step S101 provided in the embodiment of the present invention, the risk influencing elements include: enterprise management layer elements, process control layer elements and field control layer elements; the enterprise management layer elements include: unauthorized access, malicious code, distributed denial of service, viruses and Trojans, and forgery attacks; the process control layer elements include: denial of service attacks, DoS attacks, flooding attacks, response spoofing, and direction misleading attacks; the field control layer elements include: physical attacks, information theft, data tampering, denial of service attacks, illegal access, and replay attacks.
The invention is further described with reference to specific examples.
Example 1
The intelligent determination method for the information security risk of the industrial control system provided by the embodiment of the present invention is shown in fig. 1. As a preferred embodiment, shown in fig. 3, preprocessing the acquired initial sample data set of the industrial control system by the data preprocessing module using a data preprocessing program includes:
S201, labeling a plurality of risk influence elements of the industrial control system and a plurality of groups of evaluation values corresponding to the risk influence elements by using a plurality of labeling frames of different types to obtain a first data set;
S202, determining the data acquisition cost according to the anti-crawling mechanism triggered during the labeling of the first data set; determining the data cleaning cost according to the structure type of the collected test data packet;
S203, determining the data storage cost according to at least one data storage format and the data amount corresponding to each data storage format;
S204, determining the data processing cost based on the data acquisition cost, the data cleaning cost and the data storage cost, and processing the initial sample data set according to the data processing cost.
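The cost bookkeeping in S202 to S204 can be illustrated with a minimal Python sketch. Only the acquisition-cost rule (the sum of the looked-up acquisition difficulties multiplied by a basic acquisition cost) follows the text of claim 3; the mechanism names, the difficulty values, the per-format storage rule and the plain sum used for the processing cost are all assumptions made for illustration.

```python
# Hypothetical lookup table: anti-crawling mechanism -> acquisition difficulty.
# The names and values are illustrative assumptions, not taken from the patent.
ANTI_CRAWL_DIFFICULTY = {
    "rate_limit": 1.0,
    "captcha": 2.5,
    "ip_ban": 4.0,
}


def acquisition_cost(triggered, base_cost):
    """Sum the difficulties of the triggered mechanisms, then scale by the base cost."""
    total_difficulty = sum(ANTI_CRAWL_DIFFICULTY[name] for name in triggered)
    return total_difficulty * base_cost


def storage_cost(amount_by_format, unit_cost_by_format):
    """Assumed rule: per-format data amount times a per-format unit cost."""
    return sum(amount * unit_cost_by_format[fmt]
               for fmt, amount in amount_by_format.items())


def processing_cost(acquisition, cleaning, storage):
    """Assumed rule: the processing cost is the plain sum of the three parts."""
    return acquisition + cleaning + storage


acq = acquisition_cost(["rate_limit", "captcha"], base_cost=10.0)
sto = storage_cost({"csv": 200, "json": 50}, {"csv": 0.01, "json": 0.02})
total = processing_cost(acq, cleaning=5.0, storage=sto)
print(acq, sto, total)
```

The cleaning cost is passed in as a number here because the text only says it depends on the structure type of the collected test data packet and gives no formula.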
Example 2
The intelligent determination method for the information security risk of the industrial control system provided by the embodiment of the present invention is shown in fig. 1. As a preferred embodiment, shown in fig. 4, constructing the risk assessment model by the risk assessment model building module using a model building program includes:
S301, acquiring multiple groups of evaluation value data corresponding to multiple risk influence elements;
S302, analyzing the correlation of the evaluation value data, and drawing a time series curve of the evaluation value data;
S303, acquiring the jump points and inflection points of the time series curve, and performing curve fitting with a stationary time series ARMA model;
S304, constructing the risk assessment model from the curve fitting data by using the model building program.
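The text does not define how jump points and inflection points are located in S303. As a rough sketch under assumed definitions, a jump point can be taken as an index where the first difference exceeds a threshold, and an inflection point as an index where the second difference changes sign. The function names and the threshold are hypothetical, and the ARMA fitting itself is not shown.

```python
def jump_points(series, threshold):
    """Indices t where |y[t] - y[t-1]| exceeds the threshold (assumed definition)."""
    return [t for t in range(1, len(series))
            if abs(series[t] - series[t - 1]) > threshold]


def inflection_points(series):
    """Indices where the second difference changes sign (assumed definition)."""
    second = [series[t + 1] - 2 * series[t] + series[t - 1]
              for t in range(1, len(series) - 1)]
    points = []
    for i in range(1, len(second)):
        if second[i - 1] * second[i] < 0:
            points.append(i + 1)  # second[i] is centred on series index i + 1
    return points


values = [1, 2, 4, 12, 13, 13, 12]
print(jump_points(values, threshold=5))
print(inflection_points(values))
```

A strict sign change is used, so a second difference that is exactly zero is not reported; a real implementation would need a tolerance suited to the noise in the evaluation value data.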
Example 3
The intelligent determination method for the information security risk of the industrial control system provided by the embodiment of the present invention is shown in fig. 1. As a preferred embodiment, shown in fig. 5, training, by the model training module using a model training program, the random forest optimized by the time series algorithm with the preprocessed initial sample data set as the training sample, to obtain the final risk assessment model, includes:
S401, taking the preprocessed initial sample data set as the training sample, and predicting the training sample data by time series prediction, in which the future data of a single variable are predicted from the historical data of that variable;
S402, constructing the random forest decision trees, and determining the attribute test conditions;
S403, training the random forest risk assessment model by using the regRF_train function defined in the RF toolkit.
In step S401 provided in the embodiment of the present invention, the time series analysis of the data uses the following quantities:
(1) time interval: $t = 1, 2, 3, \ldots, n$;
(2) time-series data: $y_1, y_2, y_3, \ldots, y_n$;
(3) predicted value: $F_{n+h}$ denotes the predicted value for the $h$-th time interval after $n$; $h = 1$ means the time interval immediately following $n$; $h$ represents the time span and may be set to a value greater than 1;
(4) prediction error: at time $t$, $e_t = y_t - F_t$.
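To make the quantities $F_t$ and $e_t = y_t - F_t$ concrete, the following Python sketch computes one-step forecasts and their errors. The moving-average forecaster is an assumption chosen only because it is short; the text does not say which forecasting rule produces $F_t$.

```python
def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def one_step_errors(series, window):
    """Return e_t = y_t - F_t for every t that has `window` earlier observations."""
    errors = []
    for t in range(window, len(series)):
        forecast = moving_average_forecast(series[:t], window)  # F_t
        errors.append(series[t] - forecast)                      # e_t
    return errors


y = [2.0, 4.0, 6.0, 8.0]
print(one_step_errors(y, window=2))          # errors for t = 3 and t = 4
print(moving_average_forecast(y, window=2))  # F_{n+1}, the h = 1 forecast
```

On a steadily rising series the errors stay positive, which shows how a lagging forecaster systematically under-predicts an upward trend.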
In step S402 provided in the embodiment of the present invention, the attribute test condition includes:
(1) binary attributes: the test condition of a binary attribute produces two outcomes;
(2) nominal attributes: a nominal attribute has multiple values, and its test condition can be expressed in two ways, as a multi-way split or as a binary split;
(3) ordinal attributes: the attribute values may be grouped into two or more branches as desired, provided the grouping does not violate the order of the values;
(4) continuous attributes: for a continuous attribute, the test condition is either a comparison test with a binary outcome, (A < v) or (A ≥ v), or a range query.
If a multi-way split is applied, all possible split points and contiguous intervals are taken into account. A continuous attribute can be discretized, with each discretized interval assigned a new ordinal value; provided the order of the intervals is preserved, adjacent ordinal values can be merged into wider intervals.
Selecting the best characteristic attribute:
Gini coefficient: the Gini coefficient is a relative index widely used in economics and statistics. In the decision tree algorithm, it represents the degree of disorder of the attribute classes in the data set prior to classification. If X and Y are mutually independent, i.e. $P(X, Y) = P(X)\,P(Y)$, then:
$\log(XY) = \log(X) + \log(Y)$;
Figure BDA0003058352180000171
Figure BDA0003058352180000172
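The two formula images above are not reproduced in this text, so the exact expressions used by the invention cannot be confirmed here. As an illustration only, the Python sketch below uses the textbook Gini impurity, $1 - \sum_k p_k^2$, to score binary splits of a continuous attribute of the form (A < v) versus (A ≥ v); the function names and the sample data are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, weighted Gini)."""
    distinct = sorted(set(values))
    best = (None, float("inf"))
    for low, high in zip(distinct, distinct[1:]):
        v = (low + high) / 2
        left = [lab for val, lab in zip(values, labels) if val < v]
        right = [lab for val, lab in zip(values, labels) if val >= v]
        score = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / len(labels)
        if score < best[1]:
            best = (v, score)
    return best


scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
levels = ["low", "low", "low", "high", "high", "high"]
print(gini_impurity(levels))
print(best_threshold(scores, levels))
```

The attribute and threshold with the lowest weighted impurity would be chosen as the test condition at a node; a split that separates the classes perfectly scores zero.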
The random forest provided by the embodiment of the invention further comprises:
The core idea of the random forest algorithm is random sampling: a fixed number of samples is drawn at random from the preprocessed training set, and each draw is made with replacement. Every sampling set therefore contains the same number of samples but different contents. The specific algorithm flow is as follows:
given a sample set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ and a number of weak-classifier iterations $T$, the output is a strong classifier $f(x)$; for $t = 1, 2, \ldots, T$:
firstly, randomly sampling the training set for the $t$-th time, drawing $m$ times in total to obtain a sampling set $D_t$ containing $m$ samples;
secondly, training the $t$-th weak classifier by using the sampling set $D_t$;
and thirdly, aggregating the results of the $T$ weak classifiers into the result $f(x)$ of the strong classifier.
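The sampling-and-aggregation flow above can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: the weak learner is a stand-in that predicts the mean target of its bootstrap sample rather than a decision tree, aggregation is by averaging (a regression setting), and the names `bootstrap_sample`, `train_weak_learner` and `bagging` are hypothetical.

```python
import random


def bootstrap_sample(dataset, m, rng):
    """Draw m (x, y) pairs from the dataset uniformly, with replacement."""
    return [rng.choice(dataset) for _ in range(m)]


def train_weak_learner(sample):
    """Stand-in weak learner: always predicts the mean target of its sample."""
    mean_y = sum(y for _, y in sample) / len(sample)
    return lambda x: mean_y


def bagging(dataset, T, m, seed=0):
    """Train T weak learners on bootstrap samples and average their outputs."""
    rng = random.Random(seed)
    learners = [train_weak_learner(bootstrap_sample(dataset, m, rng))
                for _ in range(T)]
    return lambda x: sum(f(x) for f in learners) / len(learners)


D = [(0.1, 1.0), (0.4, 2.0), (0.5, 3.0), (0.9, 4.0)]
f = bagging(D, T=25, m=len(D))
print(f(0.3))
```

Replacing `train_weak_learner` with a decision tree that also samples a subset of features at each split would turn this bagging skeleton into a random forest.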
In step S403 provided by the embodiment of the present invention, the usage format of the regRF_train function defined in the RF toolkit is as follows:
model=regRF_train(X,Y,ntree,mtry,extra_options);
In this expression, two parameters are required and the remaining parameters are optional. X is the data matrix: when the function is called, the normalized input training set pn_train is passed as the input data to be trained. Y is the target value: the normalized output training set tn_train is passed as the output data set for this training run. The optional parameter ntree is the number of trees the model builds; to optimize the training result it is set to 20100, i.e. 20100 trees. The parameter mtry is the number of features used when a tree branches; it is set to 45 in this model. The trained model is returned in the variable model.
Finally, a segment of existing risk values is input into the prediction model, and future risk values are predicted step by step by the trained model.
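The regRF_train call in the text is written in MATLAB syntax and is not reproduced here. The Python sketch below only illustrates the surrounding data flow under stated assumptions: how a series of risk values might be turned into an input matrix X and a target vector Y, and how a trained model could then predict future values step by step by feeding each prediction back in. The window length, the function names and the `mean_predictor` placeholder (which stands in for the trained model and is not a random forest) are all hypothetical.

```python
def make_lagged_dataset(series, window):
    """Build X (rows of `window` past values) and Y (the value that follows each row)."""
    X = [series[i:i + window] for i in range(len(series) - window)]
    Y = [series[i + window] for i in range(len(series) - window)]
    return X, Y


def forecast_recursive(predict, history, window, steps):
    """Predict `steps` future values, feeding each prediction back as input."""
    values = list(history)
    for _ in range(steps):
        values.append(predict(values[-window:]))
    return values[len(history):]


def mean_predictor(row):
    """Placeholder for a trained model's predict call (not a random forest)."""
    return sum(row) / len(row)


risk = [0.2, 0.4, 0.6, 0.8]
X, Y = make_lagged_dataset(risk, window=2)
print(X, Y)
print(forecast_recursive(mean_predictor, risk, window=2, steps=2))
```

Because each forecast becomes an input to the next one, errors accumulate with the number of steps, so the horizon should be kept short relative to the training history.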
The schematic diagram of the random forest Bagging principle provided by the embodiment of the invention is shown in fig. 6.
The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented wholly or partially in software, they may take the form of a computer program product that includes one or more computer instructions. When the computer instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
In the description of the present invention, "a plurality" means two or more unless otherwise specified; the terms "upper", "lower", "left", "right", "inner", "outer", "front", "rear", "head", "tail", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are only for convenience in describing and simplifying the description, and do not indicate or imply that the device or element referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, should not be construed as limiting the invention. Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The above description only illustrates specific embodiments of the present invention and is not to be construed as limiting its scope; all modifications, equivalents and improvements made within the spirit and principles of the invention are intended to fall within the scope of protection defined by the appended claims.

Claims (10)

1. An industrial control system information security risk intelligent measurement system, characterized in that, industrial control system information security risk intelligent measurement system includes:
the system comprises a data set acquisition module, a data preprocessing module, a central control module, a risk assessment model construction module, a model training module, a risk prediction module, an information safety assessment module, an early warning module, a data storage module and an update display module;
the data set acquisition module is connected with the central control module and used for acquiring a plurality of risk influencing elements of the industrial control system and a plurality of groups of evaluation values corresponding to the risk influencing elements through data set acquisition equipment and taking the risk influencing elements and the evaluation values as an initial sample data set;
the data preprocessing module is connected with the central control module and used for preprocessing the acquired initial sample data set of the industrial control system through a data preprocessing program, and the data preprocessing module comprises:
(1) labeling a plurality of risk influence elements of the industrial control system and a plurality of groups of evaluation values corresponding to the risk influence elements by using a plurality of labeling frames of different types to obtain a first data set;
(2) determining the data acquisition cost according to the anti-crawling mechanism triggered during the labeling of the first data set; determining the data cleaning cost according to the structure type of the collected test data packet;
(3) determining data storage cost according to at least one data storage format and data quantity corresponding to various data storage formats;
(4) determining a data processing cost based on the data acquisition cost, the data cleaning cost and the data storage cost, and processing the initial sample data set according to the data processing cost;
the central control module is connected with the data set acquisition module, the data preprocessing module, the risk assessment model construction module, the model training module, the risk prediction module, the information security assessment module, the early warning module, the data storage module and the updating display module and is used for coordinating and controlling the normal operation of each module of the information security risk intelligent determination system of the industrial control system through the central processing unit;
the risk assessment model building module is connected with the central control module and used for building a risk assessment model through a model building program, and the risk assessment model building module comprises:
(1) acquiring a plurality of groups of evaluation value data corresponding to a plurality of risk influence elements;
(2) analyzing the correlation of the evaluation value data, and drawing a time series curve of the evaluation value data;
(3) acquiring the jump points and inflection points of the time series curve, and selecting a stationary time series ARMA model for curve fitting;
(4) constructing according to the curve fitting data by using a model construction program to obtain a risk assessment model;
the model training module is connected with the central control module and used for taking the initial sample data set after preprocessing as a training sample, and training a random forest optimized by a time sequence algorithm through a model training program to obtain a final risk assessment model, and the model training module comprises:
(1) taking the initial sample data set after preprocessing as a training sample, predicting the training sample data by adopting a time sequence, and predicting future data of a single variable by using historical data of the variable;
(2) constructing a random forest decision tree and determining attribute test conditions;
(3) training a random forest risk assessment model by using the regRF_train function defined in the RF toolkit;
the risk prediction module is connected with the central control module and used for inputting the evaluation value of the risk influence element into a risk evaluation model through a risk prediction program to obtain a prediction value of the information security risk of the industrial control system;
the information security evaluation module is connected with the central control module and used for evaluating the obtained predicted value of the information security risk of the industrial control system through an information security evaluation program and generating an evaluation report;
the early warning module is connected with the central control module and is used for issuing an early warning notification of abnormal information security risks of the industrial control system through a sound-and-light early warning device;
the data storage module is connected with the central control module and used for storing the acquired initial sample data set, data preprocessing results, risk assessment models, model training results, predicted values of information security risks of the industrial control system, information security assessment reports and early warning notifications of the industrial control system through a memory;
and the updating display module is connected with the central control module and is used for updating and displaying the acquired initial sample data set, the data preprocessing result, the risk assessment model, the model training result, the predicted value of the information security risk of the industrial control system, the information security assessment report and the real-time data of the early warning notice of the industrial control system through a display.
2. The intelligent industrial control system information security risk determination system of claim 1, wherein in the data set acquisition module, the risk influencing elements comprise: enterprise management layer elements, process control layer elements and field control layer elements; the enterprise management layer elements include: unauthorized access, malicious code, distributed denial of service, viruses and Trojans, and forgery attacks; the process control layer elements include: denial of service attacks, DoS attacks, flooding attacks, response spoofing, and direction misleading attacks; the field control layer elements include: physical attacks, information theft, data tampering, denial of service attacks, illegal access, and replay attacks.
3. The intelligent industrial control system information security risk determination system of claim 1, wherein the determining the data collection cost in the data preprocessing module comprises:
1) searching the acquisition difficulty corresponding to the triggered anti-crawling mechanism from the corresponding relation between the preset anti-crawling mechanism and the acquisition difficulty; the larger the cracking difficulty of the anti-crawling mechanism is, the larger the acquisition difficulty corresponding to the anti-crawling mechanism is;
2) and determining the product of the sum of the searched acquisition difficulties and the basic acquisition cost as the data acquisition cost.
4. The intelligent industrial control system information security risk measurement system of claim 1, wherein the model training module requires the following statistics for data-based time series analysis:
(1) time interval: $t = 1, 2, 3, \ldots, n$;
(2) time-series data: $y_1, y_2, y_3, \ldots, y_n$;
(3) predicted value: $F_{n+h}$ denotes the predicted value for the $h$-th time interval after $n$; $h = 1$ means the time interval immediately following $n$; $h$ represents the time span and may be set to a value greater than 1;
(4) prediction error: at time $t$, $e_t = y_t - F_t$.
5. The intelligent industrial control system information security risk measurement system of claim 1, wherein the attribute test conditions in the model training module comprise:
(1) binary attributes: the test condition of a binary attribute produces two outcomes;
(2) nominal attributes: a nominal attribute has multiple values, and its test condition can be expressed in two ways, as a multi-way split or as a binary split;
(3) ordinal attributes: the attribute values may be grouped into two or more branches as desired, provided the grouping does not violate the order of the values;
(4) continuous attributes: for a continuous attribute, the test condition is either a comparison test with a binary outcome, (A < v) or (A ≥ v), or a range query.
6. The intelligent industrial control system information security risk measurement system of claim 5, wherein if multi-path division is applied, all possible division points and continuous intervals are fully considered; the discretization method can be adopted for the continuous attribute, each discretization interval is endowed with a new ordinal value, if the ordering of the intervals is kept, adjacent ordinal values can be gathered into a wider interval, and the optimal characteristic attribute is selected, wherein:
Gini coefficient: the Gini coefficient is a relative index widely used in economics and statistics; in the decision tree algorithm, it represents the degree of disorder of the attribute classes in the data set prior to classification, and if X and Y are mutually independent, i.e. $P(X, Y) = P(X)\,P(Y)$, then:
$\log(XY) = \log(X) + \log(Y)$;
Figure FDA0003058352170000051
Figure FDA0003058352170000052
7. The intelligent industrial control system information security risk measurement system of claim 1, wherein the random forest in the model training module further comprises:
the core idea of the random forest algorithm is random sampling, namely a fixed number of samples is drawn at random from the preprocessed training set, and each draw is made with replacement; every sampling set therefore contains the same number of samples but different contents, and the specific algorithm flow is as follows:
given a sample set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ and a number of weak-classifier iterations $T$, the output is a strong classifier $f(x)$; for $t = 1, 2, \ldots, T$:
firstly, randomly sampling the training set for the $t$-th time, drawing $m$ times in total to obtain a sampling set $D_t$ containing $m$ samples;
secondly, training the $t$-th weak classifier by using the sampling set $D_t$;
and thirdly, aggregating the results of the $T$ weak classifiers into the result $f(x)$ of the strong classifier.
8. The intelligent industrial control system information security risk measurement system of claim 1, wherein in the model training module, the format of the regRF_train function defined in the RF toolkit is as follows:
model=regRF_train(X,Y,ntree,mtry,extra_options);
in this expression, two parameters are required and the remaining parameters are optional; X is the data matrix, and when the function is called the normalized input training set pn_train is passed as the input data to be trained; Y is the target value, and the normalized output training set tn_train is passed as the output data set for this training run; the optional parameter ntree is the number of trees the model builds and is set to 20100, i.e. 20100 trees; mtry is the number of features used when a tree branches and is set to 45; the trained model is returned in the variable model;
and finally, a segment of existing risk values is input into the prediction model, and future risk values are predicted step by step by the trained model.
9. A computer program product stored on a computer readable medium, comprising a computer readable program for providing a user input interface for applying the intelligent industrial control system information security risk determination system of any one of claims 1 to 8 when executed on an electronic device.
10. A computer readable storage medium storing instructions which, when executed on a computer, cause the computer to apply the intelligent industrial control system information security risk determination system according to any one of claims 1 to 8.
CN202110505744.9A 2021-05-10 2021-05-10 Intelligent information security risk measuring system and method for industrial control system Pending CN113159615A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110505744.9A CN113159615A (en) 2021-05-10 2021-05-10 Intelligent information security risk measuring system and method for industrial control system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110505744.9A CN113159615A (en) 2021-05-10 2021-05-10 Intelligent information security risk measuring system and method for industrial control system

Publications (1)

Publication Number Publication Date
CN113159615A true CN113159615A (en) 2021-07-23

Family

ID=76874170

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110505744.9A Pending CN113159615A (en) 2021-05-10 2021-05-10 Intelligent information security risk measuring system and method for industrial control system

Country Status (1)

Country Link
CN (1) CN113159615A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108400895A (en) * 2018-03-19 2018-08-14 西北大学 One kind being based on the improved BP neural network safety situation evaluation algorithm of genetic algorithm
CN109359469A (en) * 2018-10-16 2019-02-19 上海电力学院 A kind of Information Security Risk Assessment Methods of industrial control system
AU2020100709A4 (en) * 2020-05-05 2020-06-11 Bao, Yuhang Mr A method of prediction model based on random forest algorithm
CN111292020A (en) * 2020-03-13 2020-06-16 贵州电网有限责任公司 Power grid real-time operation risk assessment method and system based on random forest
CN111784486A (en) * 2020-06-12 2020-10-16 苏宁金融科技(南京)有限公司 Construction method and device of business risk prediction model and computer equipment

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114518988A (en) * 2022-02-10 2022-05-20 中国光大银行股份有限公司 Resource capacity system, method of controlling the same, and computer-readable storage medium
CN114518988B (en) * 2022-02-10 2023-03-24 中国光大银行股份有限公司 Resource capacity system, control method thereof, and computer-readable storage medium
CN115097796A (en) * 2022-07-08 2022-09-23 广州市物码信息科技有限公司 Quality control system and method for simulating big data and correcting AQL value
CN115953032A (en) * 2023-03-10 2023-04-11 山东柏源技术有限公司 Enterprise project execution risk assessment system based on data analysis
CN116130043A (en) * 2023-04-04 2023-05-16 中国标准化研究院 Risk prediction system based on dressing and smelting computer model
CN117273467A (en) * 2023-11-17 2023-12-22 江苏麦维智能科技有限公司 Multi-factor coupling-based industrial safety risk management and control method and system
CN117273467B (en) * 2023-11-17 2024-01-26 江苏麦维智能科技有限公司 Multi-factor coupling-based industrial safety risk management and control method and system
CN117455245A (en) * 2023-12-22 2024-01-26 赛飞特工程技术集团有限公司 Intelligent risk assessment system for enterprise safety production

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination