CN113159181B - Industrial control system anomaly detection method and system based on improved deep forest - Google Patents


Info

Publication number
CN113159181B
CN113159181B (application CN202110438900.4A)
Authority
CN
China
Prior art keywords
feature
feature vector
vector
class
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110438900.4A
Other languages
Chinese (zh)
Other versions
CN113159181A (en)
Inventor
李肯立
陈伟杰
余思洋
肖国庆
段明星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202110438900.4A priority Critical patent/CN113159181B/en
Publication of CN113159181A publication Critical patent/CN113159181A/en
Application granted granted Critical
Publication of CN113159181B publication Critical patent/CN113159181B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425Traffic logging, e.g. anomaly detection

Abstract

The invention discloses an anomaly detection method for industrial control networks based on a deep forest with an improved annular multi-granularity scanning structure, which specifically comprises the following steps: normalizing the constructed training-set samples with the Z-score method and mapping the feature data to the interval [-1, 1]; reducing the dimensionality of the sample feature set by principal component analysis to generate a new feature vector set whose features are mutually uncorrelated; passing the dimension-reduced feature vector set through the annular multi-granularity scanning structure to generate the feature sub-vectors of each sample; inputting the feature sub-vector sets into a semi-random forest and a completely random forest respectively to generate the corresponding class feature vectors, and combining them into a new class feature vector used as the feature input of the cascade forest; and inputting the generated class feature vector set into a diversified cascade forest structure, iterating until convergence, and generating the final class feature vector. The method addresses the low detection rate, weak generalization, and related shortcomings of existing methods for detecting abnormal behavior in industrial control system networks.

Description

Industrial control system anomaly detection method and system based on improved deep forest
Technical Field
The invention belongs to the technical field of information security, and particularly relates to an industrial control system anomaly detection method and system based on an improved deep forest.
Background
Industrial control networks typically communicate using proprietary protocols, which were often designed only around functional requirements, leaving security insufficiently robust. With the development of network technology, industrial control networks have become increasingly connected to the internet, and with this the probability of external attack or intrusion has grown. As more and more industrial control networks are connected to public networks such as the internet, their security problems become more and more apparent.
The traditional measure for protecting an industrial control network against network attacks is mainly to analyze and identify abnormal communication behavior of general internet protocols and to match it against preset rules and characteristic values, thereby realizing simple security filtering.
However, this approach to abnormal-behavior detection has non-negligible drawbacks: because it relies mainly on feature matching to identify dangerous network behavior, its recognition rate for industrial control network anomalies is low and its generalization is weak.
Disclosure of Invention
In view of the above defects or improvement requirements of the prior art, the invention provides an industrial control system anomaly detection method based on a deep forest with improved annular multi-granularity scanning, aiming to solve the technical problems of low recognition rate and weak generalization in existing industrial control network anomaly detection methods.
To achieve the above object, according to one aspect of the present invention, there is provided an improved deep forest based industrial control system anomaly detection method, comprising the steps of:
(1) acquiring network data from an industrial control system to be detected, and preprocessing the network data to obtain a sample set;
(2) and (3) inputting the sample set obtained in the step (1) into a pre-trained anomaly detection model to obtain an anomaly detection result.
Preferably, step (1) specifically comprises: performing unified numerical conversion on the data types of the acquired network data, with the labels of normal data and abnormal data represented by 0 and 1 respectively; then normalizing the numerically converted network data with the Z-score method; and finally treating each feature of each sample in the normalized network data as one data dimension, so that each sample is converted into a feature vector, the feature vectors of all samples forming the sample set.
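As an illustrative aside (not part of the patent), the Z-score normalization of the preprocessing step above can be sketched in Python; the function name and toy data are assumptions:

```python
import numpy as np

def zscore_normalize(X):
    """Z-score each feature column: (x - mean) / std, so all features
    share a unified dimension. Zero-variance columns are guarded to
    avoid division by zero."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0
    return (X - mu) / sigma

# Toy example: 4 samples x 3 numerically converted features,
# labels 0 = normal data, 1 = abnormal data
X = np.array([[1.0, 200.0, 3.0],
              [2.0, 220.0, 1.0],
              [3.0, 210.0, 2.0],
              [4.0, 230.0, 4.0]])
y = np.array([0, 0, 1, 0])
Xn = zscore_normalize(X)
```

After this step every feature column has zero mean and unit standard deviation, which is the dimensional consistency the patent requires before PCA.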
Preferably, step (2) specifically comprises: performing feature extraction on the sample set obtained in step (1) with the PCA method to obtain a new dimension-reduced feature vector set; processing each feature vector in this set with the annular multi-granularity scanning structure of the trained anomaly detection model to obtain the feature sub-vector set corresponding to that feature vector, all feature sub-vector sets forming a large set; inputting each feature sub-vector of each feature sub-vector set in the large set into a fully random forest classifier and a semi-random forest classifier respectively to obtain class feature vectors, the initial class feature vectors corresponding to all feature sub-vector sets forming a class feature vector set; then inputting the class feature vectors into the cascade forest model of the trained anomaly detection model; and finally inputting the final class feature vector into the last-layer integrated classification model to obtain a plurality of classification results and taking the average of all classification results. If the average is greater than 0.5, the industrial control system to be detected is abnormal; otherwise it is normal.
Preferably, the anomaly detection model is trained by the following steps:
(1) Obtain network data and construct a data set X from it, where X ∈ R^(n×m), R denotes the set of real numbers, n the total number of samples in the data set, and m the number of features per sample:

X = [x_1 x_2 … x_m]

where x_i = [x_1i x_2i … x_ni]^T (i = 1, 2, …, m) denotes the feature set composed of the i-th dimension feature of each sample in the data set.
(2) Normalize the data set obtained in step (1) with the Z-score normalization method, and split the normalized data set at a ratio of 5:1 into a training set X_train ∈ R^(n×m) and a test set X_test ∈ R^(n×m).
(3) Perform feature extraction on the training set X_train obtained in step (2) using the PCA method to obtain a new dimension-reduced feature vector set.
(4) Process each feature vector in the dimension-reduced feature vector set from step (3) with the annular multi-granularity scanning structure of the anomaly detection model to obtain the feature sub-vector set corresponding to that feature vector; all n feature sub-vector sets (which serve to enhance the representation learning of the subsequent diversified cascade forest structure) form a large set H.
(5) Input each feature sub-vector of every set H_d (d ∈ {1, 2, …, n}) in the large set H obtained in step (4) into a fully random forest classifier and a semi-random forest classifier respectively, each classifier producing a c-dimensional class feature vector U = [u_1, u_2, …, u_c]. For each feature sub-vector set H_d, the 2k c-dimensional class feature vectors produced for its k feature sub-vectors are combined into an initial class feature vector b_d of dimension 1 × 2kc. The n initial class feature vectors corresponding to all feature sub-vector sets in H form the initial class feature vector set B = [b_1 b_2 … b_n] of dimension n × 2kc, where c denotes the number of classification categories;
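The construction of one initial class feature vector b_d can be sketched as follows. As an assumption (the patent does not prescribe a library), scikit-learn's RandomForestClassifier and ExtraTreesClassifier stand in for the semi-random and completely random forests, and the training data are a toy illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

def class_feature_vector(sub_vectors, rf, et):
    """Concatenate both forests' c-dimensional class-probability vectors
    for each of the k feature sub-vectors into one 1 x 2kc initial class
    feature vector b_d."""
    parts = []
    for F in sub_vectors:                      # k sub-vectors of length t
        F = F.reshape(1, -1)
        parts.append(rf.predict_proba(F)[0])   # c dims, semi-random forest
        parts.append(et.predict_proba(F)[0])   # c dims, fully random forest
    return np.concatenate(parts)               # length 2 * k * c

# Toy training data: sub-vectors of length t = 3, c = 2 classes
rng = np.random.default_rng(0)
Xs = rng.normal(size=(40, 3))
ys = (Xs.sum(axis=1) > 0).astype(int)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(Xs, ys)
et = ExtraTreesClassifier(n_estimators=10, random_state=0).fit(Xs, ys)

H_d = rng.normal(size=(5, 3))                  # k = 5 sub-vectors of one sample
b_d = class_feature_vector(H_d, rf, et)        # dimension 2kc = 2*5*2 = 20
```

Each group of c entries in b_d is one forest's class distribution for one sub-vector, so the vector carries 2k class estimates per sample.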
(6) Input the initial class feature vector set generated in step (5) into the diversified cascade forest structure for iterative training until the structure converges, thereby obtaining the trained anomaly detection model.
Preferably, step (3) specifically comprises the following. First, the m feature sets of the training set X_train are linearly transformed with the PCA method:

y_i = a_1i·x_1 + a_2i·x_2 + … + a_mi·x_m, i = 1, 2, …, m

where Y = [y_1 y_2 … y_m] is the transformed feature set, and the sample covariance matrix A equals:

A = (1/n) · (X − X̄)^T (X − X̄)

where x_i is the feature set composed of the i-th dimension feature of each sample in step (1), X̄ is the n × m matrix each of whose rows equals x̄, and x̄ denotes the average of the feature vectors of all samples in the data set;
Then the eigenvector set α and eigenvalue set λ are obtained from the eigen-equation Aα = λα of the sample covariance matrix, where α = [α_1, α_2, …, α_m] and λ = [λ_1, λ_2, …, λ_m], with λ_i denoting the eigenvalue corresponding to the i-th principal component;
Then, from the eigenvalue λ_i corresponding to the i-th principal component, the variance contribution rate p_i of the i-th principal component is computed:

p_i = λ_i / Σ_{j=1}^{m} λ_j

where the denominator sums the eigenvalues of all m principal components.
Then, setting k = 1, the cumulative variance contribution rate of the first k principal components is obtained from the contribution rates as

P_k = Σ_{i=1}^{k} p_i

Next, k is set to 2 and P_2 is computed with the same formula, and the growth P_2 − P_1 over the cumulative contribution rate at k = 1 is examined. If this growth is less than 1%, the value of k is fixed at 2; otherwise k is set to 3, P_3 is computed, and the growth P_3 − P_2 is examined; if it is less than 1%, k is fixed at 3, otherwise k is set to 4, and so on;
Then the principal components are sorted in descending order of variance contribution rate, the eigenvalues corresponding to the variance contribution rates of the first k principal components are selected from the sorted result, and the set of their indices is recorded as index = {index_1, index_2, …, index_k}. According to this index set, the corresponding columns are selected from the new feature set Y, giving the dimension-reduced feature vector set of all samples:

Q = [z_1 z_2 … z_n]^T

where z_d (d = 1, 2, …, n) denotes the dimension-reduced feature vector of the d-th sample, which contains k features, i.e. z_d = {f_d1 f_d2 … f_dk}, of dimension 1 × k.
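The PCA step with the 1% growth rule for choosing k can be sketched as follows. Note that the growth of the cumulative contribution rate from k to k+1 equals p_{k+1}, so the stopping rule reduces to checking the next contribution rate; function names and toy data are assumptions:

```python
import numpy as np

def pca_reduce(X, growth_tol=0.01):
    """PCA via eigen-decomposition of the sample covariance matrix.
    k grows while adding one more principal component still raises the
    cumulative variance contribution rate by at least growth_tol (1%)."""
    Xc = X - X.mean(axis=0)
    A = np.cov(Xc, rowvar=False)            # m x m sample covariance matrix
    lam, alpha = np.linalg.eigh(A)          # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]           # sort descending by variance
    lam, alpha = lam[order], alpha[:, order]
    p = lam / lam.sum()                     # variance contribution rates p_i
    k = 1
    while k < len(p) and p[k] >= growth_tol:   # growth from k to k+1 is p[k]
        k += 1
    return Xc @ alpha[:, :k], k             # n x k reduced feature vectors

# Toy data: 6 features, but only 2 independent directions of variance
rng = np.random.default_rng(1)
base = rng.normal(size=(100, 2))
X = np.hstack([base, base @ rng.normal(size=(2, 4)) * 0.01])
Q, k = pca_reduce(X)                        # expect k == 2 here
```

Because the extra four columns are tiny linear combinations of the first two, nearly all variance sits in the first two components and the rule fixes k at 2.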
Preferably, step (4) specifically comprises the following. First, the first feature vector z_1 = {f_11 f_12 … f_1k} of the dimension-reduced feature vector set from step (3) undergoes annular multi-granularity scanning: feature f_11 is connected to feature f_1k, so that the feature vector z_1 forms a ring joined end to end. A sliding window of length t then starts at feature f_11 and slides one unit to the right at a time, yielding one feature sub-vector per position: with starting feature f_11 the sub-vector is F_1 = {f_11, f_12, …, f_1t}; after moving the window one unit the starting feature is f_12 and the sub-vector is F_2 = {f_12, f_13, …, f_1(t+1)}; and so on, until the window's starting feature is f_1k and the sub-vector is F_k = {f_1k, f_11, …, f_1(t-1)}. All k feature sub-vectors form the feature sub-vector set H_1 = {F_1 F_2 … F_k}.
Then the second feature vector z_2 = {f_21 f_22 … f_2k} of the dimension-reduced set from step (3) undergoes the same annular multi-granularity scanning: feature f_21 is connected to feature f_2k so that z_2 forms a ring joined end to end; the sliding window of length t starts at f_21 and slides one unit at a time, giving F_1 = {f_21, f_22, …, f_2t}, then F_2 = {f_22, f_23, …, f_2(t+1)}, and so on, until the starting feature is f_2k and the sub-vector is F_k = {f_2k, f_21, …, f_2(t-1)}; all k feature sub-vectors form the feature sub-vector set H_2 = {F_1 F_2 … F_k};
… and so on;
Subsequently, the n-th feature vector z_n = {f_n1 f_n2 … f_nk} of the dimension-reduced set from step (3) undergoes the same annular multi-granularity scanning: feature f_n1 is connected to feature f_nk so that z_n forms a ring joined end to end; the sliding window of length t starts at f_n1 and slides one unit at a time, giving F_1 = {f_n1, f_n2, …, f_nt}, then F_2 = {f_n2, f_n3, …, f_n(t+1)}, and so on, until the starting feature is f_nk and the sub-vector is F_k = {f_nk, f_n1, …, f_n(t-1)}; all k feature sub-vectors form the feature sub-vector set H_n = {F_1 F_2 … F_k}.
Finally, the large set composed of the feature sub-vector sets of all n sample feature vectors is recorded as H = {H_1, H_2, …, H_n}.
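A minimal sketch of the annular (circular) multi-granularity scan described above, for one 1 × k feature vector and window length t; the function name is illustrative:

```python
import numpy as np

def circular_scan(z, t):
    """Annular multi-granularity scan: treat the 1 x k feature vector z as
    a ring joined end to end and slide a window of length t one unit at a
    time, yielding k sub-vectors F_1 ... F_k (the window wraps past the
    last feature back to the first)."""
    k = len(z)
    return np.array([[z[(start + j) % k] for j in range(t)]
                     for start in range(k)])

z = np.array([10, 20, 30, 40, 50])   # k = 5 features of one sample
H = circular_scan(z, t=3)            # k = 5 sub-vectors of length t = 3
# H[0] = [10 20 30], ..., H[4] = [50 10 20] (wraps around the ring)
```

Unlike the plain multi-granularity scan of standard deep forest, the wrap-around yields exactly k sub-vectors per sample, so no window position is lost at the end of the vector.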
Preferably, step (6) specifically comprises the following. First, the first layer of the diversified cascade forest structure is processed as follows:
First, the first initial class feature vector b_1 of the initial class feature vector set B is input into the five classifiers of the diversified cascade forest structure (semi-random forest, completely random forest, XGBoost, GBDT and CatBoost), obtaining five c-dimensional class feature vectors o_1, o_2, o_3, o_4, o_5; these are combined with the initial class feature vector b_1 to generate the (5c + 2kc)-dimensional class feature vector e_11 = {o_1 o_2 o_3 o_4 o_5 b_1};
Then, the second initial class feature vector b_2 of the set B is input into the same five classifiers, obtaining five c-dimensional class feature vectors o_1, o_2, o_3, o_4, o_5, which are combined with b_2 to generate the (5c + 2kc)-dimensional class feature vector e_12 = {o_1 o_2 o_3 o_4 o_5 b_2}; and so on;
Finally, the (5c + 2kc)-dimensional class feature vectors corresponding to all n initial class feature vectors in B form the first layer's class feature vector set E_1 = {e_11 e_12 … e_1n}.
Then, the second layer of the diversified cascade forest structure is processed as follows:
First, the first class feature vector e_11 of the first layer's output set E_1 = {e_11 e_12 … e_1n} is input into the five classifiers of the diversified cascade forest structure (semi-random forest, completely random forest, XGBoost, GBDT and CatBoost), obtaining five c-dimensional class feature vectors o_1, o_2, o_3, o_4, o_5, which are combined with the initial class feature vector b_1 to generate the (5c + 2kc)-dimensional class feature vector e_21 = {o_1 o_2 o_3 o_4 o_5 b_1}.
Next, the second class feature vector e_12 of the set E_1 is input into the same five classifiers, obtaining five c-dimensional class feature vectors o_1, o_2, o_3, o_4, o_5, which are combined with the initial class feature vector b_2 to generate the (5c + 2kc)-dimensional class feature vector e_22 = {o_1 o_2 o_3 o_4 o_5 b_2}; and so on.
Finally, the (5c + 2kc)-dimensional class feature vectors corresponding to all n initial class feature vectors in B form the second layer's class feature vector set E_2 = {e_21 e_22 … e_2n}.
Subsequent layers of the diversified cascade forest structure are then processed in the same manner as the first and second layers, until the structure converges and model training is complete.
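One layer of the diversified cascade can be sketched as follows. For availability, scikit-learn ensembles stand in for the five classifiers (GradientBoostingClassifier for GBDT; AdaBoost and Bagging replace XGBoost and CatBoost, which are separate libraries), and fitting and transforming on the same toy data is only to illustrate the shapes (real cascade-forest training would use held-out estimates):

```python
import numpy as np
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, AdaBoostClassifier,
                              BaggingClassifier)

def cascade_layer(B, y, classifiers):
    """One layer of the diversified cascade: each classifier emits a
    c-dimensional class vector per sample; all five are concatenated with
    the layer input, giving (5c + input_dim)-dimensional vectors for the
    next layer."""
    outs = [clf.fit(B, y).predict_proba(B) for clf in classifiers]  # n x c each
    return np.hstack(outs + [B])

rng = np.random.default_rng(2)
B = rng.normal(size=(60, 8))            # n = 60 initial class feature vectors
y = (B[:, 0] > 0).astype(int)           # c = 2 classes
clfs = [RandomForestClassifier(n_estimators=5, random_state=0),
        ExtraTreesClassifier(n_estimators=5, random_state=0),
        GradientBoostingClassifier(n_estimators=5, random_state=0),
        AdaBoostClassifier(n_estimators=5, random_state=0),
        BaggingClassifier(n_estimators=5, random_state=0)]
E1 = cascade_layer(B, y, clfs)          # shape (60, 5c + 8) = (60, 18)
```

Carrying the input vector b_d forward alongside the five class vectors is what lets each deeper layer refine, rather than overwrite, the original class evidence.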
According to another aspect of the invention, there is provided an improved deep forest based industrial control system anomaly detection system comprising:
a first module, configured to acquire network data from the industrial control system to be detected and preprocess the network data to obtain a sample set;
and the second module is used for inputting the sample set obtained by the first module into a pre-trained anomaly detection model so as to obtain an anomaly detection result.
In general, compared with the prior art, the above technical solution conceived by the present invention achieves the following beneficial effects:
(1) Because the invention adopts step (4), the feature vectors of the samples are fully scanned by the annular multi-granularity scanning structure, yielding rich feature sub-vectors and enhancing the representation learning capacity of the subsequent diversified cascade forest structure; this improves the anomaly detection recognition rate and thus addresses the low recognition rate of existing industrial control network anomaly detection;
(2) Because the invention adopts step (5), increasing the diversity of the classifiers in the diversified cascade forest structure lets the classifiers complement one another, which addresses the weak generalization of existing industrial control network anomaly detection;
(3) Step (5) adopts the idea of ensemble learning and integrates multiple classifiers, making distributed parallel operation feasible and accelerating the detection efficiency of the anomaly detection model.
Drawings
FIG. 1 is a flow chart of an improved deep forest based industrial control system anomaly detection method of the present invention;
FIG. 2 is a schematic diagram of an anomaly detection model used in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Specifically, in view of the network particularity of industrial control systems, the invention constructs an industrial control system network anomaly detection model using deep learning ideas: on the one hand, the sample data are preprocessed and the sample features are extracted; on the other hand, an annular multi-granularity scanning structure fully scans the feature vectors to construct feature subsets, and a diversified cascade forest structure performs the anomaly detection. The detection effect is enhanced from these two angles.
As shown in FIG. 1, the invention provides an industrial control system anomaly detection method based on improved deep forest, which comprises the following steps:
(1) acquiring network data from an industrial control system to be detected, and preprocessing the network data to obtain a sample set;
specifically, according to different data acquisition methods, the data are represented in different modes, namely normal data or abnormal data, the data types of the acquired network data are subjected to unified numerical conversion, labels of the normal data and the abnormal data are represented by 0 and 1 respectively, then the network data subjected to the numerical conversion are subjected to normalization processing by adopting a Z-score method, the unified dimension among the features is ensured, finally, one feature of each sample in the normalized network data is used as a data dimension, the sample is converted into a feature vector, and the feature vectors corresponding to all the samples form a sample set.
The advantage of step (1) is that it ensures dimensional consistency among the features and avoids degrading the subsequent anomaly detection performance.
(2) And (2) inputting the sample set obtained in the step (1) into a pre-trained anomaly detection model to obtain an anomaly detection result.
Specifically, the sample set is processed in sequence by the following steps (3), (4) and (5) to obtain class feature vectors; these are input into the cascade forest model trained in step (6) to obtain the final class feature vector; the final class feature vector is input into the last-layer integrated classification model to obtain a plurality of classification results, and the mean of all classification results is computed. If the mean is greater than 0.5, the industrial control system to be detected is abnormal; otherwise it is normal.
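The final decision rule (mean of the last-layer outputs thresholded at 0.5) can be sketched in a few lines; the values below are illustrative:

```python
import numpy as np

def decide(class_outputs, threshold=0.5):
    """Average the last-layer classifiers' outputs for the abnormal class;
    above the threshold the sample is flagged abnormal (1), else normal (0)."""
    return int(np.mean(class_outputs) > threshold)

# Five last-layer classifiers each output a probability for class 1 (abnormal)
assert decide([0.9, 0.8, 0.7, 0.6, 0.4]) == 1   # mean 0.68 > 0.5 -> abnormal
assert decide([0.1, 0.2, 0.3, 0.2, 0.1]) == 0   # mean 0.18 -> normal
```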
As shown in fig. 2, the anomaly detection model of the present invention includes a circular multi-granularity scanning structure and a diversified cascade forest structure connected to each other.
The annular multi-granularity scanning structure takes as input a 1 × m-dimensional feature vector (m being the total number of features of each sample) and scans it with a sliding window of length 1 × t (t a natural number, generally adjusted adaptively to the specific feature vectors), generating the corresponding m feature sub-vectors;
The diversified cascade forest structure takes as input the m feature sub-vectors generated by the first part; these are input into the first-layer ensemble learning module of the structure to generate the corresponding class vectors, which are linearly combined with the m feature sub-vectors as the input of the second-layer ensemble learning module, and so on;
specifically, the anomaly detection model in this step is obtained by training through the following steps:
(1) and acquiring network data and constructing a data set according to the network data.
In this step a data set X ∈ R^(n×m) is constructed, where R denotes the set of real numbers and n the total number of samples in the data set:

X = [x_1 x_2 … x_m]

where x_i = [x_1i x_2i … x_ni]^T (i = 1, 2, …, m) denotes the feature set composed of the i-th dimension feature of each sample in the data set.
(2) Normalize the data set obtained in step (1) with the Z-score normalization method, and split the normalized data set at a ratio of 5:1 into a training set X_train ∈ R^(n×m) and a test set X_test ∈ R^(n×m).
(3) Perform feature extraction on the training set X_train obtained in step (2) using the principal component analysis (PCA) method to obtain a new dimension-reduced feature vector set.
Specifically, the method first linearly transforms the m feature sets of the training set X_train with the PCA method:

y_i = a_1i·x_1 + a_2i·x_2 + … + a_mi·x_m, i = 1, 2, …, m

where Y = [y_1 y_2 … y_m] is the transformed feature set, and the sample covariance matrix is recorded as A:

A = (1/n) · (X − X̄)^T (X − X̄)

where x_i is the feature set composed of the i-th dimension feature of each sample in step (1), X̄ is the n × m matrix each of whose rows equals x̄, and x̄ denotes the average of the feature vectors of all samples in the data set.
The transformed features satisfy:
1) y_i and y_j (i ≠ j) are mutually independent;
2) the variance of y_1 is greater than the variance of y_2, and so on;
3) the total variance is preserved, i.e. Σ_{i=1}^{m} Var(y_i) = Σ_{i=1}^{m} Var(x_i).
Then the eigenvector set α and eigenvalue set λ are obtained from the eigen-equation Aα = λα of the sample covariance matrix, where α = [α_1, α_2, …, α_m] and λ = [λ_1, λ_2, …, λ_m], with λ_i denoting the eigenvalue corresponding to the i-th principal component.
Then, from the eigenvalue λ_i corresponding to the i-th principal component, the variance contribution rate p_i of the i-th principal component is computed:

p_i = λ_i / Σ_{j=1}^{m} λ_j

where the denominator sums the eigenvalues of all m principal components.
Subsequently, setting k = 1, the cumulative variance contribution rate of the first k principal components is obtained from the contribution rates as

P_k = Σ_{i=1}^{k} p_i

Next, k is set to 2 and P_2 is computed with the same formula, and the growth P_2 − P_1 over the cumulative contribution rate at k = 1 is examined. If this growth is less than 1%, the value of k is fixed at 2; otherwise k is set to 3, P_3 is computed, and the growth P_3 − P_2 is examined; if it is less than 1%, k is fixed at 3, otherwise k is set to 4, and so on;
Then the principal components are sorted in descending order of variance contribution rate, the eigenvalues corresponding to the variance contribution rates of the first k principal components are selected from the sorted result, and the set of their indices (which need not be contiguous) is recorded as index = {index_1, index_2, …, index_k}. According to this index set, the corresponding columns are selected from the new feature set Y, giving the dimension-reduced feature vector set of all samples:

Q = [z_1 z_2 … z_n]^T

(the matrix Q has dimension n × k, with each row representing one sample's dimension-reduced feature vector and each column one feature), where z_d (d = 1, 2, …, n) denotes the dimension-reduced feature vector of the d-th sample, containing k features, i.e. z_d = {f_d1 f_d2 … f_dk}, of dimension 1 × k.
(4) Process each feature vector in the dimension-reduced feature vector set from step (3) with the annular multi-granularity scanning structure of the anomaly detection model to obtain the feature sub-vector set corresponding to that feature vector; all n feature sub-vector sets (which serve to enhance the representation learning of the subsequent diversified cascade forest structure) form a large set H.
Specifically, the annular multi-granularity scanning method uses a sliding window of length t (2 ≤ t ≤ k, preferably t = 3) to scan each sample feature vector zd (d = 1, 2, …, n) obtained in step (3) one by one.
Specifically, first, the first feature vector z1 = {f11 f12 … f1k} in the dimension-reduced new feature vector set of step (3) is subjected to annular multi-granularity scanning: the feature f11 and the feature f1k are connected so that z1 forms a ring joined end to end. A sliding window of length t then slides from left to right one unit at a time, starting from the feature f11, one feature sub-vector being obtained per step (when the starting feature is f11 the obtained sub-vector is recorded as F1 = {f11, f12, …, f1t}); the window is then moved one unit to the right so that the starting feature is f12, giving F2 = {f12, f13, …, f1(t+1)}; and so on, until the window is moved so that the starting feature is f1k, giving Fk = {f1k, f11, …, f1(t-1)}. All k feature sub-vectors form the feature sub-vector set H1 = {F1 F2 … Fk}.
Then, the second feature vector z2 = {f21 f22 … f2k} in the dimension-reduced new feature vector set of step (3) is subjected to annular multi-granularity scanning: the feature f21 and the feature f2k are connected so that z2 forms a ring joined end to end; a sliding window of length t slides from left to right one unit at a time, starting from the feature f21, one feature sub-vector being obtained per step (when the starting feature is f21 the obtained sub-vector is recorded as F1 = {f21, f22, …, f2t}); the window is then moved one unit to the right so that the starting feature is f22, giving F2 = {f22, f23, …, f2(t+1)}; and so on, until the starting feature is f2k, giving Fk = {f2k, f21, …, f2(t-1)}; all k feature sub-vectors form the feature sub-vector set H2 = {F1 F2 … Fk};
and so on, until the n-th feature vector zn = {fn1 fn2 … fnk} in the dimension-reduced new feature vector set of step (3) is subjected to annular multi-granularity scanning: the feature fn1 and the feature fnk are connected so that zn forms a ring joined end to end; a sliding window of length t slides from left to right one unit at a time, starting from the feature fn1 (when the starting feature is fn1 the obtained sub-vector is recorded as F1 = {fn1, fn2, …, fnt}); the window is then moved one unit to the right so that the starting feature is fn2, giving F2 = {fn2, fn3, …, fn(t+1)}; and so on, until the starting feature is fnk, giving Fk = {fnk, fn1, …, fn(t-1)}; all k feature sub-vectors form the feature sub-vector set Hn = {F1 F2 … Fk}.
Finally, the large set composed of the feature sub-vector sets corresponding to all n sample feature vectors is recorded as H = {H1, H2, …, Hn}.
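The annular (circular) scan described above can be sketched in a few lines: the vector is treated as a ring, so a vector of k features always yields exactly k sub-vectors of length t, including the wrap-around windows. The function name `ring_scan` is my own shorthand for the patent's annular multi-granularity scanning.

```python
def ring_scan(z, t=3):
    """Annular multi-granularity scan: join the last feature of z back to
    the first so z forms a ring, then slide a window of length t one step
    at a time; a vector of k features yields exactly k sub-vectors."""
    k = len(z)
    # modular indexing implements the end-to-end connection of the ring
    return [[z[(start + j) % k] for j in range(t)] for start in range(k)]
```

For example, `ring_scan([f1, f2, f3, f4], 2)` produces the four sub-vectors `[f1, f2]`, `[f2, f3]`, `[f3, f4]`, `[f4, f1]`; an ordinary (non-annular) scan would yield only three.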
(5) Each feature sub-vector in each feature sub-vector set Hd (d ∈ {1, 2, …, n}) of the large set H obtained in step (4) is input into a fully random forest classifier and a semi-random forest classifier respectively, each of which outputs a c-dimensional class vector U = {u1, u2, …, uc} (wherein c denotes the number of classification categories; anomaly detection is a binary classification problem, so c = 2), so that each sub-vector yields a 2c-dimensional class feature vector. The k 2c-dimensional class feature vectors corresponding to the k feature sub-vectors of each set Hd are concatenated into an initial class feature vector bd of dimension 1 × 2kc, and the n initial class feature vectors corresponding to all feature sub-vector sets in H form the initial class feature vector set B = {b1 b2 … bn} of dimension n × 2kc.
The advantage of step (3) is that redundant and irrelevant features are removed, which improves the efficiency and accuracy of subsequent anomaly detection.

The advantage of step (4) is that the multi-granularity scanning structure enhances the representation learning ability of the diversified cascade forest structure, thereby improving the recognition rate of anomaly detection.

The advantage of step (5) is that the diversified cascade forest structure introduces several different classifiers, which strengthens the generalization of the model.
(6) Inputting the initial class feature vector generated in the step (5) into a diversified cascade forest structure for iterative training until the diversified cascade forest structure is converged, thereby obtaining a trained anomaly detection model.
Specifically, in step (6), the initial class feature vector set B generated in step (5) is used as the training set and input into the diversified cascade forest structure for model training.

First, the first layer of the diversified cascade forest structure is processed as follows:
First, the first initial class feature vector b1 in the initial class feature vector set B is input into the five classifiers of the diversified cascade forest structure, namely the semi-random forest, the completely random forest, XGBoost, GBDT and CatBoost, to obtain five c-dimensional class feature vectors o1, o2, o3, o4, o5; these class feature vectors are combined with the initial class feature vector b1 to generate the (5c + 2kc)-dimensional class feature vector e11 = {o1 o2 o3 o4 o5 b1};
Then, the second initial class feature vector b2 in the initial class feature vector set B is input into the same five classifiers to obtain five c-dimensional class feature vectors o1, o2, o3, o4, o5, which are combined with b2 to generate the (5c + 2kc)-dimensional class feature vector e12 = {o1 o2 o3 o4 o5 b2}; and so on.
Finally, the (5c + 2kc)-dimensional class feature vectors corresponding to all n initial class feature vectors in the initial class feature vector set B form the class feature vector set E1 = {e11 e12 … e1n} of the first layer.
Then, the second layer of the diversified cascade forest structure is processed as follows:
First, the first class feature vector e11 in the class feature vector set E1 = {e11 e12 … e1n} generated by the first layer is input into the five classifiers of the diversified cascade forest structure, namely the semi-random forest, the completely random forest, XGBoost, GBDT and CatBoost, to obtain five c-dimensional class feature vectors o1, o2, o3, o4, o5, which are combined with the initial class feature vector b1 to generate the (5c + 2kc)-dimensional class feature vector e21 = {o1 o2 o3 o4 o5 b1}.
Then, the second class feature vector e12 in the class feature vector set E1 = {e11 e12 … e1n} generated by the first layer is input into the same five classifiers to obtain five c-dimensional class feature vectors o1, o2, o3, o4, o5, which are combined with the initial class feature vector b2 to generate the (5c + 2kc)-dimensional class feature vector e22 = {o1 o2 o3 o4 o5 b2}; and so on.
Finally, the (5c + 2kc)-dimensional class feature vectors corresponding to all n initial class feature vectors in the initial class feature vector set B form the class feature vector set E2 = {e21 e22 … e2n} of the second layer.
And then, processing subsequent layers of the diversified cascading forest structure in the same mode as the first layer and the second layer until the diversified cascading forest structure is converged, and finishing model training.
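One layer of the cascade described above can be sketched generically: every classifier maps a sample's vector from the previous layer to a c-dimensional class vector, and the five outputs are concatenated with the sample's original initial class feature vector bd. The stub classifier and the function name `cascade_layer` are assumptions of this sketch; in practice the five classifiers would be the semi-random forest, completely random forest, XGBoost, GBDT and CatBoost named above, each exposing a `predict_proba`-style interface.

```python
import numpy as np

def cascade_layer(B, E_prev, classifiers):
    """One layer of the diversified cascade forest: each of the (five)
    classifiers yields a c-dimensional class vector per sample; these are
    concatenated with the sample's ORIGINAL initial class feature vector
    b_d, giving a (5c + 2kc)-dimensional vector per sample."""
    out = []
    for b, e in zip(np.asarray(B), np.asarray(E_prev)):
        probs = [clf.predict_proba(e.reshape(1, -1))[0] for clf in classifiers]
        out.append(np.concatenate(probs + [b]))
    return np.asarray(out)
```

The first layer is obtained as `cascade_layer(B, B, classifiers)`, and each subsequent layer re-uses B while consuming the previous layer's output, which is exactly the e2j = {o1 o2 o3 o4 o5 bj} pattern described above.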
Experimental results
In order to illustrate the effectiveness and detection performance of the invention in the field of industrial control system network anomaly detection, verification experiments were performed on several data sets. Using the natural gas pipeline test data of the Critical Infrastructure Protection Center of Mississippi State University, the results obtained by the invention were compared with currently common methods; the evaluation results are shown in Table 1:
TABLE 1
[Table 1, reproduced as an image in the original publication, compares the accuracy, missed-report rate and false-report rate of the proposed method with three commonly used classification algorithms.]
As shown in Table 1, in the experimental comparison on the natural gas pipeline test data of the Critical Infrastructure Protection Center of Mississippi State University, the proposed classification algorithm outperforms the other three common classification algorithms in accuracy, missed-report rate and false-report rate. In the deep forest anomaly detection model based on the annular multi-granularity scanning structure and the diversified cascade forest structure proposed by the invention, the annular multi-granularity scanning structure fully scans the feature vectors to strengthen the representation learning of the subsequent cascade forest, and the diversified cascade forest structure introduces multiple weak classifiers whose disadvantages complement one another, yielding an integrated model with strong generalization and performance, so that the whole anomaly detection model performs better.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (4)

1. An industrial control system anomaly detection method based on improved deep forests is characterized by comprising the following steps:
firstly, acquiring network data from an industrial control system to be detected, and preprocessing the network data to obtain a sample set;
inputting the sample set obtained in the step one into a pre-trained anomaly detection model to obtain an anomaly detection result; the anomaly detection model is obtained by training the following steps:
(1) obtaining network data, and constructing a data set X ∈ R^(n×m) from the network data, wherein R represents the set of real numbers and n represents the total number of samples in the data set:

X = [x1 x2 … xm]

wherein xi = [x1i x2i … xni]ᵀ (i = 1, 2, …, m) represents the feature set composed of the i-th dimension features of all samples in the data set;
(2) normalizing the data set obtained in step (1) by the Z-score normalization method, and dividing the normalized data set at a ratio of 5:1 into a training set Xtrain ∈ R^(n×m) and a test set Xtest ∈ R^(n×m);
(3) performing feature extraction on the training set Xtrain obtained in step (2) by a PCA method to obtain a dimension-reduced new feature vector set; step (3) is specifically that, first, the m feature sets in Xtrain are linearly transformed by the PCA method:

Y = [y1 y2 … ym] = Xtrain·α

wherein Y = [y1 y2 … ym] is the new feature set after conversion, α is the eigenvector matrix of the sample covariance matrix A defined below, and the elements of A are equal to:

Aij = (1/(n−1))·(xi − x̄i)ᵀ(xj − x̄j), i, j = 1, 2, …, m,

wherein xi is the feature set composed of the i-th dimension features of each sample in step (1), and x̄i is the n-dimensional vector whose entries all equal the mean of xi, i.e. the average value of that feature over all samples in the data set;
then, the eigenvector set α and eigenvalue set λ are obtained from Aα = λα, wherein α = [α1, α2, …, αm], λ = [λ1, λ2, …, λm], and λi represents the eigenvalue corresponding to the i-th principal component;
then, the variance contribution rate pi of the i-th principal component is calculated from the eigenvalue λi corresponding to the i-th principal component:

pi = λi / (λ1 + λ2 + … + λm)

wherein pi represents the variance contribution rate of the i-th principal component;
then, k is set to 1, and the cumulative variance contribution rate P1 of the first principal component is obtained from the eigenvalues by the following formula:

Pk = p1 + p2 + … + pk;
then, k is set to 2 and, following the formula in the previous paragraph, the cumulative variance contribution rate P2 of the first two principal components is obtained; whether the growth rate of P2 relative to P1 is less than 1% is judged: if so, the value of k is fixed to 2, otherwise k is set to 3 and the cumulative variance contribution rate P3 is obtained; whether the growth rate of P3 relative to P2 is less than 1% is judged: if so, the value of k is fixed to 3, otherwise k is set to 4, and so on;
then, the principal components are sorted by variance contribution rate from large to small, the eigenvalues corresponding to the variance contribution rates of the first k principal components are selected from the sorted result, and the subscript set composed of the subscripts corresponding to those eigenvalues is recorded as index = {index1, index2, …, indexk}; the columns corresponding to these subscripts are selected from the new feature set Y according to the subscript set index, thereby obtaining the dimension-reduced new feature vector set of all samples, recorded as the n × k matrix Q = [z1 z2 … zn]ᵀ, wherein zd (d = 1, 2, …, n) represents the dimension-reduced feature vector of the d-th sample, which contains k features, i.e. zd = {fd1 fd2 … fdk}, with dimension 1 × k;
(4) processing each feature vector in the dimension-reduced new feature vector set of step (3) by using the annular multi-granularity scanning structure in the anomaly detection model to obtain the feature sub-vector set corresponding to that feature vector, all n feature sub-vector sets forming a large set H; step (4) is specifically that, first, the first feature vector z1 = {f11 f12 … f1k} in the dimension-reduced new feature vector set of step (3) is subjected to annular multi-granularity scanning: the feature f11 and the feature f1k are connected so that z1 forms a ring joined end to end; a sliding window of length t then slides from left to right one unit at a time, starting from the feature f11, one feature sub-vector being obtained per step (the first sub-vector being F1 = {f11, f12, …, f1t}); the window is then moved one unit to the right so that the starting feature is f12, giving F2 = {f12, f13, …, f1(t+1)}; and so on, until the starting feature is f1k, giving Fk = {f1k, f11, …, f1(t-1)}; all k feature sub-vectors form the feature sub-vector set H1 = {F1 F2 … Fk};
then, the second feature vector z2 = {f21 f22 … f2k} in the dimension-reduced new feature vector set of step (3) is subjected to annular multi-granularity scanning: the feature f21 and the feature f2k are connected so that z2 forms a ring joined end to end; a sliding window of length t slides from left to right one unit at a time, starting from the feature f21 (the first sub-vector being F1 = {f21, f22, …, f2t}); the window is then moved one unit to the right so that the starting feature is f22, giving F2 = {f22, f23, …, f2(t+1)}; and so on, until the starting feature is f2k, giving Fk = {f2k, f21, …, f2(t-1)}; all k feature sub-vectors form the feature sub-vector set H2 = {F1 F2 … Fk};
and so on, until the n-th feature vector zn = {fn1 fn2 … fnk} in the dimension-reduced new feature vector set of step (3) is subjected to annular multi-granularity scanning: the feature fn1 and the feature fnk are connected so that zn forms a ring joined end to end; a sliding window of length t slides from left to right one unit at a time, starting from the feature fn1 (the first sub-vector being F1 = {fn1, fn2, …, fnt}); the window is then moved one unit to the right so that the starting feature is fn2, giving F2 = {fn2, fn3, …, fn(t+1)}; and so on, until the starting feature is fnk, giving Fk = {fnk, fn1, …, fn(t-1)}; all k feature sub-vectors form the feature sub-vector set Hn = {F1 F2 … Fk};
finally, the large set composed of the feature sub-vector sets corresponding to all n sample feature vectors is recorded as H = {H1, H2, …, Hn};
(5) inputting each feature sub-vector in each feature sub-vector set Hd (d ∈ {1, 2, …, n}) of the large set H obtained in step (4) into a fully random forest classifier and a semi-random forest classifier respectively, each of which outputs a c-dimensional class vector U = {u1, u2, …, uc}, so that each sub-vector yields a 2c-dimensional class feature vector; the k 2c-dimensional class feature vectors corresponding to the k feature sub-vectors of each set Hd are concatenated into an initial class feature vector bd of dimension 1 × 2kc, and the n initial class feature vectors corresponding to all feature sub-vector sets in H form the initial class feature vector set B = {b1 b2 … bn} of dimension n × 2kc, wherein c represents the number of classification categories;
(6) inputting the initial class feature vectors generated in step (5) into the diversified cascade forest structure for iterative training until the diversified cascade forest structure converges, thereby obtaining the trained anomaly detection model; step (6) is specifically that,

first, the first layer of the diversified cascade forest structure is processed as follows:
first, the first initial class feature vector b1 in the initial class feature vector set B is input into the five classifiers of the diversified cascade forest structure, namely the semi-random forest, the completely random forest, XGBoost, GBDT and CatBoost, to obtain five c-dimensional class feature vectors o1, o2, o3, o4, o5, which are combined with the initial class feature vector b1 to generate the (5c + 2kc)-dimensional class feature vector e11 = {o1 o2 o3 o4 o5 b1};
then, the second initial class feature vector b2 in the initial class feature vector set B is input into the same five classifiers to obtain five c-dimensional class feature vectors o1, o2, o3, o4, o5, which are combined with b2 to generate the (5c + 2kc)-dimensional class feature vector e12 = {o1 o2 o3 o4 o5 b2}; and so on;
finally, the (5c + 2kc)-dimensional class feature vectors corresponding to all n initial class feature vectors in the initial class feature vector set B form the class feature vector set E1 = {e11 e12 … e1n} of the first layer;
Then, the second layer of the diversified cascade forest structure is processed as follows:
first, the first class feature vector e11 in the class feature vector set E1 = {e11 e12 … e1n} generated by the first layer is input into the five classifiers of the diversified cascade forest structure, namely the semi-random forest, the completely random forest, XGBoost, GBDT and CatBoost, to obtain five c-dimensional class feature vectors o1, o2, o3, o4, o5, which are combined with the initial class feature vector b1 to generate the (5c + 2kc)-dimensional class feature vector e21 = {o1 o2 o3 o4 o5 b1};
then, the second class feature vector e12 in the class feature vector set E1 = {e11 e12 … e1n} generated by the first layer is input into the same five classifiers to obtain five c-dimensional class feature vectors o1, o2, o3, o4, o5, which are combined with the initial class feature vector b2 to generate the (5c + 2kc)-dimensional class feature vector e22 = {o1 o2 o3 o4 o5 b2}; and so on;
finally, the (5c + 2kc)-dimensional class feature vectors corresponding to all n initial class feature vectors in the initial class feature vector set B form the class feature vector set E2 = {e21 e22 … e2n} of the second layer;
And then, processing subsequent layers of the diversified cascading forest structure in the same mode as the first layer and the second layer until the diversified cascading forest structure is converged, and finishing model training.
2. The industrial control system anomaly detection method based on improved deep forest according to claim 1, wherein step (1) is specifically: performing unified numerical conversion on the data types of the acquired network data, with the labels of normal data and abnormal data represented by 0 and 1 respectively; then normalizing the numerically converted network data by the Z-score method; and finally taking each feature of each sample in the normalized network data as one data dimension, so that the samples are converted into feature vectors, the feature vectors corresponding to all samples forming the sample set.
3. The method according to claim 1 or 2, wherein step (2) is specifically: performing feature extraction on the sample set obtained in step (1) by a PCA method to obtain a dimension-reduced new feature vector set; processing each feature vector in the dimension-reduced new feature vector set by the annular multi-granularity scanning structure in the trained anomaly detection model to obtain the feature sub-vector set corresponding to that feature vector, all feature sub-vector sets forming a large set; inputting each feature sub-vector in each feature sub-vector set of the large set into a fully random forest classifier and a semi-random forest classifier respectively to obtain class feature vectors, the initial class feature vectors corresponding to all feature sub-vector sets forming a class feature vector set; and finally inputting the final class feature vectors into the integrated classification model of the final layer to obtain a plurality of classification results and taking the mean of all classification results, wherein if the mean is greater than 0.5 the industrial control system to be detected is abnormal, and otherwise it is normal.
4. An improved deep forest based industrial control system anomaly detection system, comprising:
the system comprises a first module, a second module and a third module, wherein the first module is used for acquiring network data from an industrial control system to be detected and preprocessing the network data to obtain a sample set;
the second module is used for inputting the sample set obtained by the first module into a pre-trained anomaly detection model so as to obtain an anomaly detection result; the anomaly detection model is obtained by training the following steps:
(1) obtaining network data, and constructing a data set X ∈ R^(n×m) from the network data, wherein R represents the set of real numbers and n represents the total number of samples in the data set:

X = [x1 x2 … xm]

wherein xi = [x1i x2i … xni]ᵀ (i = 1, 2, …, m) represents the feature set composed of the i-th dimension features of all samples in the data set;
(2) normalizing the data set obtained in step (1) by the Z-score normalization method, and dividing the normalized data set at a ratio of 5:1 into a training set Xtrain ∈ R^(n×m) and a test set Xtest ∈ R^(n×m);
(3) performing feature extraction on the training set Xtrain obtained in step (2) by a PCA method to obtain a dimension-reduced new feature vector set; step (3) is specifically that, first, the m feature sets in Xtrain are linearly transformed by the PCA method:

Y = [y1 y2 … ym] = Xtrain·α

wherein Y = [y1 y2 … ym] is the new feature set after conversion, α is the eigenvector matrix of the sample covariance matrix A defined below, and the elements of A are equal to:

Aij = (1/(n−1))·(xi − x̄i)ᵀ(xj − x̄j), i, j = 1, 2, …, m,

wherein xi is the feature set composed of the i-th dimension features of each sample in step (1), and x̄i is the n-dimensional vector whose entries all equal the mean of xi, i.e. the average value of that feature over all samples in the data set;
then, the eigenvector set α and eigenvalue set λ are obtained from Aα = λα, wherein α = [α1, α2, …, αm], λ = [λ1, λ2, …, λm], and λi represents the eigenvalue corresponding to the i-th principal component;
then, the variance contribution rate pi of the i-th principal component is calculated from the eigenvalue λi corresponding to the i-th principal component:

pi = λi / (λ1 + λ2 + … + λm)

wherein pi represents the variance contribution rate of the i-th principal component;
then, k is set to 1, and the cumulative variance contribution rate P1 of the first principal component is obtained from the eigenvalues by the following formula:

Pk = p1 + p2 + … + pk;
then, k is set to 2 and, following the formula in the previous paragraph, the cumulative variance contribution rate P2 of the first two principal components is obtained; whether the growth rate of P2 relative to P1 is less than 1% is judged: if so, the value of k is fixed to 2, otherwise k is set to 3 and the cumulative variance contribution rate P3 is obtained; whether the growth rate of P3 relative to P2 is less than 1% is judged: if so, the value of k is fixed to 3, otherwise k is set to 4, and so on;
then, the principal components are sorted by variance contribution rate from large to small, the eigenvalues corresponding to the variance contribution rates of the first k principal components are selected from the sorted result, and the subscript set composed of the subscripts corresponding to those eigenvalues is recorded as index = {index1, index2, …, indexk}; the columns corresponding to these subscripts are selected from the new feature set Y according to the subscript set index, thereby obtaining the dimension-reduced new feature vector set of all samples, recorded as the n × k matrix Q = [z1 z2 … zn]ᵀ, wherein zd (d = 1, 2, …, n) represents the dimension-reduced feature vector of the d-th sample, which contains k features, i.e. zd = {fd1 fd2 … fdk}, with dimension 1 × k;
(4) processing each feature vector in the dimension-reduced new feature vector set of step (3) by using the annular multi-granularity scanning structure in the anomaly detection model to obtain the feature sub-vector set corresponding to that feature vector, all n feature sub-vector sets forming a large set H; step (4) is specifically that, first, the first feature vector z1 = {f11 f12 … f1k} in the dimension-reduced new feature vector set of step (3) is subjected to annular multi-granularity scanning: the feature f11 and the feature f1k are connected so that z1 forms a ring joined end to end; a sliding window of length t then slides from left to right one unit at a time, starting from the feature f11, one feature sub-vector being obtained per step (the first sub-vector being F1 = {f11, f12, …, f1t}); the window is then moved one unit to the right so that the starting feature is f12, giving F2 = {f12, f13, …, f1(t+1)}; and so on, until the starting feature is f1k, giving Fk = {f1k, f11, …, f1(t-1)}; all k feature sub-vectors form the feature sub-vector set H1 = {F1 F2 … Fk};
then, the second feature vector z2 = {f21 f22 … f2k} in the dimension-reduced new feature vector set of step (3) is subjected to annular multi-granularity scanning: the feature f21 and the feature f2k are connected so that z2 forms a ring joined end to end; a sliding window of length t slides from left to right one unit at a time, starting from the feature f21 (the first sub-vector being F1 = {f21, f22, …, f2t}); the window is then moved one unit to the right so that the starting feature is f22, giving F2 = {f22, f23, …, f2(t+1)}; and so on, until the starting feature is f2k, giving Fk = {f2k, f21, …, f2(t-1)}; all k feature sub-vectors form the feature sub-vector set H2 = {F1 F2 … Fk};
and so on, until the n-th feature vector zn = {fn1 fn2 … fnk} in the dimension-reduced new feature vector set of step (3) is subjected to annular multi-granularity scanning: the feature fn1 and the feature fnk are connected so that zn forms a ring joined end to end; a sliding window of length t slides from left to right one unit at a time, starting from the feature fn1 (the first sub-vector being F1 = {fn1, fn2, …, fnt}); the window is then moved one unit to the right so that the starting feature is fn2, giving F2 = {fn2, fn3, …, fn(t+1)}; and so on, until the starting feature is fnk, giving Fk = {fnk, fn1, …, fn(t-1)}; all k feature sub-vectors form the feature sub-vector set Hn = {F1 F2 … Fk};
finally, the large set composed of the feature sub-vector sets corresponding to all n sample feature vectors is recorded as H = {H1, H2, …, Hn};
(5) inputting each feature sub-vector in each feature sub-vector set Hd (d ∈ {1, 2, …, n}) of the large set H obtained in step (4) into a fully random forest classifier and a semi-random forest classifier respectively, each of which outputs a c-dimensional class vector U = {u1, u2, …, uc}, so that each sub-vector yields a 2c-dimensional class feature vector; the k 2c-dimensional class feature vectors corresponding to the k feature sub-vectors of each set Hd are concatenated into an initial class feature vector bd of dimension 1 × 2kc, and the n initial class feature vectors corresponding to all feature sub-vector sets in H form the initial class feature vector set B = {b1 b2 … bn} of dimension n × 2kc, wherein c represents the number of classification categories;
(6) The initial class feature vectors generated in step (5) are input into a diversified cascade forest structure for iterative training until the structure converges, yielding the trained anomaly detection model. Step (6) proceeds as follows:
firstly, the first layer of the diversified cascade forest structure is processed as follows:
First, the first initial class feature vector b1 in the initial class feature vector set B is input separately into the five base learners of the diversified cascade forest structure (semi-random forest, completely random forest, XGBoost, GBDT and CatBoost) to obtain five c-dimensional class feature vectors o1, o2, o3, o4, o5; these are combined with the initial class feature vector b1 to generate the (5c + 2kc)-dimensional class feature vector e11 = {o1, o2, o3, o4, o5, b1};
Then, the second initial class feature vector b2 in the set B is input separately into the same five base learners to obtain five c-dimensional class feature vectors o1, o2, o3, o4, o5, which are combined with the initial class feature vector b2 to generate the (5c + 2kc)-dimensional class feature vector e12 = {o1, o2, o3, o4, o5, b2}, and so on;
finally, the (5c + 2kc)-dimensional class feature vectors corresponding to all n initial class feature vectors in the set B form the first layer's class feature vector set E1 = {e11, e12, …, e1n};
Then, the second layer of the diversified cascade forest structure is processed as follows:
First, the first class feature vector e11 in the set E1 = {e11, e12, …, e1n} generated by the first layer is input separately into the five base learners (semi-random forest, completely random forest, XGBoost, GBDT and CatBoost) to obtain five c-dimensional class feature vectors o1, o2, o3, o4, o5, which are combined with the initial class feature vector b1 to generate the (5c + 2kc)-dimensional class feature vector e21 = {o1, o2, o3, o4, o5, b1};
Next, the second class feature vector e12 in the set E1 is input separately into the same five base learners to obtain five c-dimensional class feature vectors o1, o2, o3, o4, o5, which are combined with the initial class feature vector b2 to generate the (5c + 2kc)-dimensional class feature vector e22 = {o1, o2, o3, o4, o5, b2}, and so on;
finally, the (5c + 2kc)-dimensional class feature vectors corresponding to all n class feature vectors in E1 form the second layer's class feature vector set E2 = {e21, e22, …, e2n};
Subsequent layers of the diversified cascade forest structure are then processed in the same manner as the first and second layers until the structure converges, at which point model training is complete.
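The per-layer dimension arithmetic of step (6) can be sketched as follows. Again a sketch only: `forest_proba` stands in for the `predict_proba` output of the five trained base learners (semi-random forest, completely random forest, XGBoost, GBDT, CatBoost); the point is that each layer re-attaches the original initial class feature vector bd, so every layer's output stays (5c + 2kc)-dimensional.

```python
import numpy as np

rng = np.random.default_rng(1)
C, KC2 = 4, 40          # c classes; 2kc-dimensional initial class vector

def forest_proba(x, c):
    # Stand-in for one of the five trained base learners: predict_proba
    # on the layer input, returning a c-dimensional class vector.
    p = rng.random(c)
    return p / p.sum()

def cascade_layer(x_in, b, c=C):
    """One cascade layer: five c-dim class vectors + the ORIGINAL vector b."""
    outs = [forest_proba(x_in, c) for _ in range(5)]
    return np.concatenate(outs + [b])   # dimension 5c + 2kc

b1 = rng.random(KC2)
e1 = cascade_layer(b1, b1)   # layer 1 consumes b1 itself
e2 = cascade_layer(e1, b1)   # layer 2 consumes e1 but re-attaches b1
print(e1.shape, e2.shape)    # (60,) (60,)  -> 5*4 + 40 = 60 in both layers
```

Because b1 (not the previous layer's output) is re-concatenated at every layer, the layer input dimension is constant, so layers can be stacked until a convergence criterion (e.g. no validation-accuracy gain) is met.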
CN202110438900.4A 2021-04-23 2021-04-23 Industrial control system anomaly detection method and system based on improved deep forest Active CN113159181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110438900.4A CN113159181B (en) 2021-04-23 2021-04-23 Industrial control system anomaly detection method and system based on improved deep forest

Publications (2)

Publication Number Publication Date
CN113159181A CN113159181A (en) 2021-07-23
CN113159181B true CN113159181B (en) 2022-06-10

Family

ID=76869794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110438900.4A Active CN113159181B (en) 2021-04-23 2021-04-23 Industrial control system anomaly detection method and system based on improved deep forest

Country Status (1)

Country Link
CN (1) CN113159181B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115078552B (en) * 2022-07-06 2023-09-08 江南大学 Flip chip defect detection method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107958255A (en) * 2017-11-21 2018-04-24 Institute of Microelectronics, Chinese Academy of Sciences An image-based object detection method and device
CN109741597A (en) * 2018-12-11 2019-05-10 Dalian University of Technology A bus-section running-time prediction method based on improved deep forest
WO2020215671A1 (en) * 2019-08-19 2020-10-29 Ping An Technology (Shenzhen) Co., Ltd. Method and device for smart analysis of data, and computer device and storage medium
CN111931953A (en) * 2020-07-07 2020-11-13 Beijing University of Technology Multi-scale-feature deep forest identification method for waste mobile phones
CN112633368A (en) * 2020-12-21 2021-04-09 Sichuan University Flat vibration motor defect detection system and method based on improved multi-granularity cascade forest
CN112686313A (en) * 2020-12-31 2021-04-20 Jiangxi University of Science and Technology Improved parallel deep forest classification method based on information theory

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11277420B2 (en) * 2017-02-24 2022-03-15 Ciena Corporation Systems and methods to detect abnormal behavior in networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孙一恒 (Sun Yiheng). Research on an Intrusion Detection Method Based on Improved Deep Forest. China Master's Theses Full-text Database (Electronic Journal). 2021, I139-95. *
王雪宁 (Wang Xuening). A Research of GcForest Methods for Network Abnormal Behavior Detection. 2020 International Conference on Computer Engineering and Application (ICCEA). 2020, 218-221. *

Also Published As

Publication number Publication date
CN113159181A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
CN108632279B (en) Multilayer anomaly detection method based on network traffic
CN110287983B (en) Single-classifier anomaly detection method based on maximum correlation entropy deep neural network
Veenman Statistical disk cluster classification for file carving
CN111901340B (en) Intrusion detection system and method for energy Internet
CN111835707B (en) Malicious program identification method based on improved support vector machine
CN113489685B (en) Secondary feature extraction and malicious attack identification method based on kernel principal component analysis
CN110602120B (en) Network-oriented intrusion data detection method
CN111143838B (en) Database user abnormal behavior detection method
CN108256449B (en) Human behavior identification method based on subspace classifier
Batal et al. A supervised time series feature extraction technique using DCT and DWT
CN111598179A (en) Power monitoring system user abnormal behavior analysis method, storage medium and equipment
CN115277189B (en) Unsupervised intrusion flow detection and identification method based on generation type countermeasure network
CN113159181B (en) Industrial control system anomaly detection method and system based on improved deep forest
Shao et al. Deep learning hierarchical representation from heterogeneous flow-level communication data
CN113098862A (en) Intrusion detection method based on combination of hybrid sampling and expansion convolution
Du et al. Large-scale signature matching using multi-stage hashing
Huang et al. A high security BioHashing encrypted speech retrieval algorithm based on feature fusion
JP4476078B2 (en) Time series data judgment program
Wu et al. Intrusion Detection System Using a Distributed Ensemble Design Based Convolutional Neural Network in Fog Computing
CN111581640A (en) Malicious software detection method, device and equipment and storage medium
CN106778775B (en) Image classification method based on SIFT feature soft matching
CN113505826B (en) Network flow anomaly detection method based on joint feature selection
Seyedghorban et al. Anomaly Detection in File Fragment Classification of Image File Formats
CN113609480B (en) Multipath learning intrusion detection method based on large-scale network flow
Yin et al. High-Quality Triggers Based Fragile Watermarking for Optical Character Recognition Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant