CN111275074A - Power CPS information attack identification method based on stack type self-coding network model - Google Patents

Power CPS information attack identification method based on stack type self-coding network model

Info

Publication number
CN111275074A
CN111275074A CN202010015226.4A CN202010015226A CN111275074A CN 111275074 A CN111275074 A CN 111275074A CN 202010015226 A CN202010015226 A CN 202010015226A CN 111275074 A CN111275074 A CN 111275074A
Authority
CN
China
Prior art keywords
feature
information
features
attack
maximum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010015226.4A
Other languages
Chinese (zh)
Other versions
CN111275074B (en)
Inventor
魏晓明
曲朝阳
武赟
王蕾
薄小永
曹杰
齐四清
吕洪波
胡可为
孙建
薛凯
徐鹏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taipingwan Power Station State Grid Northeast Branch Department Lyuyuan Hydroelectric Co
State Grid Jilin Electric Power Corp
Northeast Electric Power University
Information and Telecommunication Branch of State Grid East Inner Mogolia Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Jilin Electric Power Co Ltd
Original Assignee
Taipingwan Power Station State Grid Northeast Branch Department Lyuyuan Hydroelectric Co
Northeast Dianli University
State Grid Jilin Electric Power Corp
Information and Telecommunication Branch of State Grid East Inner Mogolia Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Jilin Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taipingwan Power Station State Grid Northeast Branch Department Lyuyuan Hydroelectric Co, Northeast Dianli University, State Grid Jilin Electric Power Corp, Information and Telecommunication Branch of State Grid East Inner Mogolia Electric Power Co Ltd, Information and Telecommunication Branch of State Grid Jilin Electric Power Co Ltd
Priority to CN202010015226.4A
Publication of CN111275074A
Application granted
Publication of CN111275074B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Marketing (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Primary Health Care (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Public Health (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Water Supply & Treatment (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a power CPS information attack identification method based on a stacked auto-encoder network model, comprising the following: in view of properties of CPS data such as non-functional dependence and nonlinear correlation, the maximum information coefficient is introduced to select data features and determine an optimal attack feature set; an information attack identification model based on a stacked auto-encoder network is constructed, with an unsupervised pre-training encoder and a supervised fine-tuning classifier used to train and update the network parameters; and the initial parameters of the model are optimized with an adaptive cuckoo algorithm. The method addresses the complex data features and relatively low identification accuracy encountered when identifying power CPS information attacks, and is scientifically sound, widely applicable and effective.

Description

Power CPS information attack identification method based on stack type self-coding network model
Technical Field
The invention relates to the field of power cyber-physical systems (CPS), and in particular to a power CPS information attack identification method based on a stacked auto-encoder network model.
Background
With the continuous development of information technology, the information side and the physical side of the power system are increasingly coupled and interactive, gradually forming a power cyber-physical system (CPS) that integrates computing systems, communication networks and the physical environment. Power grid production management and dispatching control cannot be separated from the information system. At the same time, however, vulnerabilities in the information system can be exploited by attackers, posing a serious cross-domain threat to the physical system and even temporarily paralyzing critical infrastructure. Security on the information side of the power CPS has therefore attracted growing attention, and effectively identifying information attacks has become an urgent problem for the safe and stable operation of today's power systems. Most existing research combines artificial intelligence algorithms with the power CPS and analyzes attack behavior in massive CPS data in order to detect and identify attacks. Although this research takes the complex characteristics of the data features into account, those characteristics slow the convergence of the model during identification, and the identification results easily fall into local optima.
Disclosure of Invention
The invention aims to solve the problems of complex data features and relatively low identification accuracy in power CPS information attack identification and, starting from the correlation and redundancy of CPS data features, provides a power CPS information attack identification method based on a stacked auto-encoder network model. First, properties of CPS data such as non-functional dependence and nonlinear correlation are analyzed, a feature selection method based on the maximum information coefficient is proposed, and an optimal attack feature set is determined. Then, an information attack identification model based on a stacked auto-encoder network is constructed, with an unsupervised pre-training encoder and a supervised fine-tuning classifier used to train and update the network parameters. Finally, the initial parameters of the model are optimized with an adaptive cuckoo algorithm. Example analysis shows that the method effectively improves the identification accuracy of information attacks.
The technical scheme adopted for realizing the aim of the invention is as follows: a power CPS information attack identification method based on a stack type self-coding network model is characterized by comprising the following steps:
1) a maximum-relevance, minimum-redundancy attack feature selection method improved with the maximum information coefficient (MIC) is used to capture the nonlinear correlations and non-functional dependencies among the data features, analyze the correlation and redundancy among features, and select the optimal attack feature set:
(a) a set D formed by the feature pair < x, y > is partitioned by a grid G, and the mutual information values in the sub-grids are calculated while the positions of the partition points are varied, giving the maximum mutual information value for the grid G; the maximum normalized I(D, x, y) values obtained for the different partitions form the feature matrix M(D)_{x,y}, as shown in formula (1):
Figure BDA0002358628160000021
the maximum information coefficient is defined as formula (2):
Figure BDA0002358628160000022
where MIC(D) lies in the range [0, 1] and B(n) is the upper limit of the grid size; if B(n) is too large, the data in the set D may all be concentrated in a small number of sub-grids, and if B(n) is too small, too little of the data can be searched; the value B(n) = n^0.6 works best;
(b) the larger the MIC between a feature and the category, the stronger the correlation and the greater the influence on the final classification result; the larger the MIC between two features, the stronger their substitutability, i.e. the stronger the redundancy; correlation and redundancy are quantified as shown in formula (3) and formula (4):
Figure BDA0002358628160000023
Figure BDA0002358628160000024
where D represents the correlation between the feature set F and the attack category c, and R represents the redundancy among the features in the set F; F and |F| are the feature set and the number of features, respectively; x_i represents the i-th feature and c represents the category label; MIC(x_i, c) represents the maximum information coefficient between feature i and the target category, and MIC(x_i, x_j) represents the maximum information coefficient between feature i and feature j;
(c) the optimal attack feature set is selected on the basis of feature correlation and redundancy, and the selected set must satisfy the conditions of maximum correlation and minimum redundancy; let the original feature set be F and suppose the optimal feature subset F_(m-1) of m-1 features has been obtained; the m-th feature selected from the remaining features F - F_(m-1) should satisfy formula (5):
Figure BDA0002358628160000025
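The formula images for (1) to (5) are not reproduced in this text. As a hedged reconstruction, assuming the patent follows the standard definition of the maximum information coefficient and the usual maximum-relevance, minimum-redundancy criterion (both consistent with the symbols defined above), the formulas would read roughly as follows. The exact forms in the original figures may differ, for example in their normalization.

```latex
\begin{align}
M(D)_{x,y} &= \frac{\max I(D, x, y)}{\log \min(x, y)} \tag{1} \\
\mathrm{MIC}(D) &= \max_{xy < B(n)} M(D)_{x,y} \tag{2} \\
D &= \frac{1}{|F|} \sum_{x_i \in F} \mathrm{MIC}(x_i, c) \tag{3} \\
R &= \frac{1}{|F|^{2}} \sum_{x_i, x_j \in F} \mathrm{MIC}(x_i, x_j) \tag{4} \\
\max_{x_j \in F - F_{(m-1)}} &\left[ \mathrm{MIC}(x_j, c)
  - \frac{1}{m-1} \sum_{x_i \in F_{(m-1)}} \mathrm{MIC}(x_j, x_i) \right] \tag{5}
\end{align}
```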
2) considering the complex characteristics of power CPS data features, and based on analysis of historical data, an information attack identification model based on a stacked auto-encoder (SAE) is proposed, with the following steps:
(d) construct an unsupervised pre-training encoder so that the output layer of the network reproduces the input layer as closely as possible and the low-dimensional data of the intermediate hidden layer can represent the original data; pre-train each layer of the neural network with a greedy layer-by-layer training method and initialize the network parameters layer by layer; in this way the physical features and the information features are each abstracted layer by layer and encoded into low-dimensional data features, which reduces the difficulty of model training;
(e) construct a supervised fine-tuning classifier: after the data have been encoded several times to obtain the dimension-reduced physical and information features, build a softmax classifier for the final attack identification step; the output layer has N neurons, one for each of the N types of power CPS information attack;
(f) when the SAE identification model tunes its parameters, it is sensitive to the setting of the initial parameters; the objective function for the initial parameters of the model is given by formula (6):
Figure BDA0002358628160000031
where n is the total number of samples, y' (i) represents the desired output sample, and y (i) represents the actual output sample;
3) after the objective function of the model is obtained, an adaptive cuckoo algorithm is proposed to solve it, so that the initial parameters are set effectively and the weights and thresholds of the SAE identification model are optimized:
(g) the adaptive step size factor α0 is set dynamically; the larger its value, the stronger the global search capability but the lower the convergence accuracy of the algorithm; the smaller its value, the higher the optimization accuracy but the slower the convergence; the dynamic setting is given by formula (7):
Figure BDA0002358628160000032
where t_i represents the current number of iterations and t_max represents the maximum number of iterations;
(h) to solve for the initial parameters of the model, the traditional cuckoo algorithm is further improved by setting the discovery probability p_a dynamically so that it increases gradually as the search proceeds; this maintains the balance between global and local search in the later stage of evolution, improves the overall convergence accuracy, and keeps the algorithm from becoming trapped in a local optimum; the dynamic setting is given by formula (8):
Figure BDA0002358628160000033
where p_a represents the probability that a nest is discovered, p_a,max represents the maximum discovery probability, t_i represents the current number of iterations, and t_max represents the maximum number of iterations;
4) after the network parameters have been initialized by the adaptive cuckoo algorithm, they are fine-tuned by back-propagation on this basis, the weights of the neural network are trained, CPS information attacks are identified, and operation and maintenance personnel respond according to the identification results.
Compared with the prior art, the power CPS information attack identification method based on the stacked auto-encoder network has the following beneficial effects:
1) considering the high dimensionality, nonlinear correlation and non-functional dependence of power CPS data, the maximum information coefficient is introduced to select data features and determine the optimal attack feature set; analyzing the correlation and redundancy among features effectively improves the identification accuracy and training speed of the model;
2) a stacked auto-encoder identification model is constructed, and each layer of the neural network is pre-trained with a greedy layer-by-layer training method, which addresses the high dimensionality of the data features and extracts abstract features in depth; the labels are compared with the classification results, and the model parameters are adjusted with the back-propagation algorithm so that the parameters of all layers of the identification model come as close as possible to the global optimum;
3) the cuckoo algorithm is improved: adaptively setting the discovery probability and the step size factor speeds up convergence of the objective function and helps avoid local optima;
4) the method is scientifically sound, widely applicable and effective.
Drawings
FIG. 1 is a flow chart of an electric CPS information attack identification method based on a stacked self-coding network according to the present invention;
FIG. 2 is a graph comparing loss functions in different initialization parameter modes;
FIG. 3 is a diagram of a power CPS feature correlation ranking based on maximum information coefficients;
FIG. 4 is a comparison of identification accuracy for different numbers of initial features;
FIG. 5 is a diagram illustrating comparison of recognition accuracy between different recognition methods.
Detailed Description
The following describes in detail an electric CPS information attack identification method based on a stacked self-coding network according to the present invention with reference to the accompanying drawings.
Referring to fig. 1, a power CPS information attack identification method based on a stacked self-coding network includes the following steps:
1) Properties of CPS data such as high dimensionality, nonlinear correlation and non-functional dependence seriously hinder research and application. The invention therefore provides a maximum-relevance, minimum-redundancy attack feature selection method improved with the maximum information coefficient (MIC), which captures the nonlinear correlations and non-functional dependencies among the data features, analyzes the correlation and redundancy among features, and selects the optimal attack feature set:
(a) Data preprocessing. CPS data may contain null values or obviously abnormal infinite values (such as NaN and INF), which seriously affect attack identification; because the data set used by the invention is very large, every record containing such a value is deleted. Attack identification is a multi-class classification problem, so the category attribute is converted to one-hot encoding; for example, event type 1 is converted to (1,0,0,…,0) and event type 41 to (0,0,…,0,1). The values of different features in the raw data differ greatly in scale, which easily produces large errors, so the raw data are normalized;
(b) A set D formed by the feature pair < x, y > is partitioned by a grid G, and the mutual information values in the sub-grids are calculated while the positions of the partition points are varied, giving the maximum mutual information value for the grid G. The maximum normalized I(D, x, y) values obtained for the different partitions form the feature matrix M(D)_{x,y}, as shown in formula (1):
Figure BDA0002358628160000041
the maximum information coefficient is defined as formula (2):
Figure BDA0002358628160000051
where MIC(D) lies in the range [0, 1] and B(n) is the upper limit of the grid size. If B(n) is too large, the data in the set D may all be concentrated in a small number of sub-grids; if B(n) is too small, too little of the data can be searched. The value B(n) = n^0.6 works best;
(c) The larger the MIC between a feature and the category, the stronger the correlation and the greater the influence on the final classification result; the larger the MIC between two features, the stronger their substitutability, i.e. the stronger the redundancy. Correlation and redundancy are quantified as shown in formula (3) and formula (4):
Figure BDA0002358628160000052
Figure BDA0002358628160000053
where D represents the correlation between the feature set F and the attack category c, and R represents the redundancy among the features in the set F; F and |F| are the feature set and the number of features, respectively; x_i represents the i-th feature and c represents the category label; MIC(x_i, c) represents the maximum information coefficient between feature i and the target category, and MIC(x_i, x_j) represents the maximum information coefficient between feature i and feature j;
(d) The optimal attack feature set is selected on the basis of feature correlation and redundancy, and the selected set must satisfy the conditions of maximum correlation and minimum redundancy. Let the original feature set be F and suppose the optimal feature subset F_(m-1) of m-1 features has been obtained; the m-th feature selected from the remaining features F - F_(m-1) should satisfy formula (5):
Figure BDA0002358628160000054
(e) The algorithm flow is as follows. Input: feature set F and category label C. Output: optimal attack feature set F'.
① Discretize the physical features F_P in the feature set F, and initialize the feature set F' as empty.
② Calculate the maximum information coefficient between each feature and the category label C, and remove irrelevant and weakly relevant features.
③ Find the feature F_i in F that maximizes formula (5), add it to the optimal attack feature set F', and delete F_i from F.
④ Repeat step ③ to continue selecting features from the remaining features of the feature set F.
⑤ Obtain the optimal attack feature set F'.
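The selection flow in steps ① to ⑤ can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation. `mic_estimate` is a hypothetical, simplified estimator that only tries equal-frequency grids, whereas a full MIC computation also optimizes the partition positions. The exponent 0.6 follows B(n) = n^0.6 from the text. `mrmr_select` assumes formula (5) takes the usual form of relevance minus mean redundancy. All function names are illustrative, and the threshold for discarding weakly relevant features in step ② is not given in the text, so it is omitted.

```python
import math
from bisect import bisect_left
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def rank_bins(values, k):
    """Assign each value to one of k rank-based bins; tied values share a bin."""
    ordered = sorted(values)
    return [bisect_left(ordered, v) * k // len(values) for v in values]


def mic_estimate(xs, ys, exponent=0.6):
    """Simplified MIC: best normalized MI over equal-frequency grids
    with kx * ky < n ** exponent (assumption: no partition optimization)."""
    limit = len(xs) ** exponent
    best = 0.0
    for kx in range(2, int(limit) + 1):
        for ky in range(2, int(limit) + 1):
            if kx * ky >= limit:
                break
            mi = mutual_information(rank_bins(xs, kx), rank_bins(ys, ky))
            best = max(best, mi / math.log2(min(kx, ky)))
    return best


def mrmr_select(names, relevance, redundancy, n_select):
    """Greedy selection: repeatedly add the feature with the highest
    relevance minus mean redundancy against the already selected features."""
    remaining = list(names)
    selected = []
    while remaining and len(selected) < n_select:
        def score(name):
            penalty = (sum(redundancy(name, s) for s in selected) / len(selected)
                       if selected else 0.0)
            return relevance(name) - penalty
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In practice `relevance` could be something like `lambda name: mic_estimate(columns[name], labels)` and `redundancy` the estimate between two feature columns, where `columns` and `labels` are hypothetical containers for the preprocessed data.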
2) Considering the complex characteristics of power CPS data features, and based on analysis of historical data, an information attack identification model based on a stacked auto-encoder (SAE) is proposed, with the following steps:
(f) Construct an unsupervised pre-training encoder so that the output layer of the network reproduces the input layer as closely as possible and the low-dimensional data of the intermediate hidden layer can represent the original data. Pre-train each layer of the neural network with a greedy layer-by-layer training method and initialize the network parameters layer by layer. In this way the physical features and the information features are each abstracted layer by layer and encoded into low-dimensional data features, which reduces the difficulty of model training.
(g) Construct a supervised fine-tuning classifier: after the data have been encoded several times to obtain the dimension-reduced physical and information features, build a softmax classifier for the final attack identification step. The output layer has N neurons, one for each of the N types of power CPS information attack.
(h) When the SAE identification model tunes its parameters, it is sensitive to the setting of the initial parameters. The objective function for the initial parameters of the model is given by formula (6):
Figure BDA0002358628160000061
where n is the total number of samples, y' (i) represents the desired output samples, and y (i) represents the actual output samples.
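As a small illustration of steps (g) and (h), the sketch below shows a softmax output layer with N neurons and an objective that compares desired and actual outputs. Formula (6) is not reproduced in this text, so the mean squared error used here is an assumption, and the function names and activation values are illustrative only.

```python
import math


def softmax(logits):
    """Turn N output-layer activations into N class probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(class_index, n_classes):
    """Desired output y'(i): 1.0 at the true attack class, 0.0 elsewhere."""
    return [1.0 if k == class_index else 0.0 for k in range(n_classes)]


def objective(desired, actual):
    """Assumed form of the objective: squared error averaged over n samples."""
    n = len(desired)
    return sum((d - a) ** 2
               for d_vec, a_vec in zip(desired, actual)
               for d, a in zip(d_vec, a_vec)) / n


# Three attack classes and two samples with hypothetical activations.
desired = [one_hot(0, 3), one_hot(2, 3)]
actual = [softmax([2.0, 0.5, 0.1]), softmax([0.2, 0.1, 1.5])]
predicted = [max(range(3), key=p.__getitem__) for p in actual]
loss = objective(desired, actual)
```

The predicted class is the index of the largest softmax output, which matches the one-neuron-per-attack-type layout described in step (g).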
3) After the objective function of the model is obtained, an adaptive cuckoo algorithm is proposed to solve it, so that the initial parameters are set effectively and the weights and thresholds of the SAE identification model are optimized:
(i) Randomly generate n initial nest positions
Figure BDA0002358628160000062
which correspond to the initial weights and threshold parameters of the stacked auto-encoder network model. The neural network trains the model with these parameter values, and the result is calculated according to the following formulas:
Figure BDA0002358628160000063
Figure BDA0002358628160000064
where
Figure BDA0002358628160000065
denotes the position of the i-th nest in generation t, α denotes the step size control factor,
Figure BDA0002358628160000066
denotes point-to-point (element-wise) multiplication,
Figure BDA0002358628160000067
denotes the optimal solution of the current generation, α0 is fixed at 0.01, and L(λ) is a random search path that obeys the Lévy distribution:
Figure BDA0002358628160000068
where μ and v are normally distributed, β = 1.5, and φ is given by:
Figure BDA0002358628160000069
(j) In general, the larger the step size factor α0, the stronger the global search capability but the lower the convergence accuracy of the algorithm; the smaller its value, the higher the optimization accuracy but the slower the convergence. The standard cuckoo algorithm fixes this value, so its convergence process lacks adaptivity. The invention sets it dynamically as shown in formula (7):
Figure BDA0002358628160000071
where t_i represents the current number of iterations and t_max represents the maximum number of iterations. The value of α0 decreases gradually as the number of iterations increases, so the step size shrinks gradually; the algorithm favors global search in the early stage and improves optimization accuracy in the later stage;
(k) Combining the above, the adaptive cuckoo algorithm generates new individuals according to the following expression:
Figure BDA0002358628160000072
(l) In each iteration, evaluate all nests and retain the best nest position of the current generation:
Figure BDA0002358628160000073
(m) After the nest positions of the new generation are obtained, replace the poorer nest positions of the previous generation with the better ones, giving a better set of nest positions:
Figure BDA0002358628160000074
(n) Generate a random number rand in the range [0, 1]. If rand > p_a, discard part of the solutions and generate the same number of new solutions by a preferential random walk, as follows:
Figure BDA0002358628160000075
where
Figure BDA0002358628160000076
and
Figure BDA0002358628160000077
are two random solutions of generation t.
(o) The discovery probability p_a is usually fixed at 0.25, and its size determines whether the current solution is retained. To keep the algorithm from falling into a local optimum, the cuckoo algorithm is further improved: the discovery probability p_a is set dynamically so that it increases gradually as the search proceeds, which maintains the balance between global and local search in the later stage of evolution, improves the overall convergence accuracy, and keeps the algorithm from becoming trapped in a local optimum. The expression is given in formula (8):
Figure BDA0002358628160000078
where p_a represents the probability that a nest is discovered, p_a,max represents the maximum discovery probability, t_i represents the current number of iterations, and t_max represents the maximum number of iterations.
(p) After the new set of nest positions is obtained, replace the poorer nest positions in e_k with the better ones according to the objective function, giving the latest nest positions:
Figure BDA0002358628160000079
(q) Find the optimal nest position in Q_k:
Figure BDA00023586281600000710
If the maximum number of iterations has not been reached, return to step (l) and continue the search; otherwise, output the optimal position:
Figure BDA00023586281600000711
(r) Take the values corresponding to the optimal nest position
Figure BDA0002358628160000081
as the initial parameters of the model, and carry out forward training and backward fine-tuning of the model.
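Steps (i) to (r) can be sketched as follows. This is a minimal illustration, not the patented procedure: formulas (7) and (8) are not reproduced in this text, so the linear decay of α0 and the linear growth of p_a are assumptions, as are the parameter names and default values. Lévy steps are drawn with Mantegna's method using β = 1.5 as stated above. The `sphere` function is a hypothetical stand-in for the model objective; in the patent each nest would encode the initial weights and thresholds of the SAE network.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one Levy-distributed step length with Mantegna's method."""
    phi = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
           / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    mu = rng.gauss(0.0, phi)
    v = rng.gauss(0.0, 1.0)
    return mu / max(abs(v), 1e-12) ** (1 / beta)


def adaptive_cuckoo_search(objective, dim, n_nests=15, t_max=200,
                           alpha_start=0.5, alpha_end=0.01, pa_max=0.5, seed=0):
    """Minimize `objective` over `dim` real parameters."""
    rng = random.Random(seed)
    nests = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_nests)]
    fitness = [objective(nest) for nest in nests]
    history = []
    for t in range(1, t_max + 1):
        progress = t / t_max
        # Assumed schedules: step factor shrinks, discovery probability grows.
        alpha0 = alpha_start - (alpha_start - alpha_end) * progress
        pa = pa_max * progress
        best = nests[min(range(n_nests), key=fitness.__getitem__)]
        # Levy-flight move relative to the current best nest.
        for i in range(n_nests):
            candidate = [x + alpha0 * (x - b) * levy_step(rng)
                         for x, b in zip(nests[i], best)]
            value = objective(candidate)
            if value < fitness[i]:
                nests[i], fitness[i] = candidate, value
        # Discovery: components with rand > pa take a preferential random walk.
        for i in range(n_nests):
            a, b = rng.sample(nests, 2)
            r = rng.random()
            candidate = [x + r * (p - q) if rng.random() > pa else x
                         for x, p, q in zip(nests[i], a, b)]
            value = objective(candidate)
            if value < fitness[i]:
                nests[i], fitness[i] = candidate, value
        history.append(min(fitness))
    k = min(range(n_nests), key=fitness.__getitem__)
    return nests[k], fitness[k], history


def sphere(position):
    """Hypothetical stand-in objective; the patent uses the model objective."""
    return sum(x * x for x in position)


best_position, best_value, history = adaptive_cuckoo_search(sphere, dim=3, t_max=50)
```

Because a nest is only replaced by a strictly better candidate, the best objective value recorded in `history` never increases from one generation to the next.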
4) After the network parameters have been initialized by the adaptive cuckoo algorithm, they are fine-tuned by back-propagation on this basis, the weights of the neural network are trained, CPS information attacks are identified, and operation and maintenance personnel respond according to the identification results.
To verify that the power CPS information attack identification method based on the stacked auto-encoder network can effectively identify information attacks, the inventors compared the proposed method with traditional machine learning methods. FIG. 2 shows that the adaptive cuckoo algorithm places the initial parameters in a suitable region; after fine-tuning, the trained model converges to a better state, and training speed improves to some extent. As FIG. 3 shows, in the optimal feature selection process about 75% of the 128 features provide a high learning value, and the two most strongly correlated features are the phase angles of the phase-A voltages of physical devices R1 and R2. FIG. 4 shows that identification accuracy varies considerably with the number of initial features: model 1 selects relatively few features and loses part of the useful information, while model 3 has a large feature dimension and contains redundant and weakly correlated features, which confuse the model to some degree, complicate training and slightly reduce identification accuracy. FIG. 5 compares the proposed identification method with traditional machine learning algorithms and demonstrates its feasibility and accuracy.

Claims (1)

1. A power CPS information attack identification method based on a stacked self-coding network model, characterized by comprising the following steps:
1) a maximum-relevance minimum-redundancy attack feature selection method improved with the maximum information coefficient is used to capture the nonlinear correlations and non-functional dependencies among the data features, analyze the relevance and redundancy among the features, and select the optimal attack feature set:
(a) a set D formed by the feature pairs <x, y> is partitioned with a grid G; the mutual information within the sub-grids is computed while the positions of the partition points are varied, giving the maximum mutual information value over the grid G, and the normalized maximum values $I(D, x, y)$ obtained for the different partitions form the characteristic matrix $M(D)_{x,y}$, as shown in formula (1):
Figure FDA0002358628150000011
the maximum information coefficient is defined as formula (2):
Figure FDA0002358628150000012
wherein $\mathrm{MIC}(D)$ lies in the range $[0, 1]$ and $B(n)$ is the upper limit of the grid size; if $B(n)$ is too large, the data in the set D may be concentrated in a small fraction of the sub-grids, and if $B(n)$ is too small, too little of the data is searched; the value $B(n) = n^{0.6}$ gives the best effect;
(b) the larger the MIC between a feature and the category, the stronger the relevance and the greater the influence on the final classification result; the larger the MIC between two features, the more substitutable they are, i.e. the stronger the redundancy; relevance and redundancy are quantified as shown in formula (3) and formula (4):
Figure FDA0002358628150000013
Figure FDA0002358628150000014
wherein D represents the relevance between the feature set F and the attack category c, and R represents the redundancy among the features in the set F; $F$ and $|F|$ are the feature set and the number of features, respectively; $x_i$ represents the i-th feature and $c$ represents the category label; $\mathrm{MIC}(x_i, c)$ represents the maximum information coefficient between feature i and the target category, and $\mathrm{MIC}(x_i, x_j)$ represents the maximum information coefficient between feature i and feature j;
(c) the optimal attack feature set is obtained by selecting features according to their relevance and redundancy, and the selected set must satisfy the conditions of maximum relevance and minimum redundancy; let the original feature set be $F$ and let $F_{m-1}$ be the optimal feature subset of $m-1$ features already obtained; the $m$-th feature selected from the remaining features $F - F_{m-1}$ should satisfy formula (5):
Figure FDA0002358628150000021
2) considering the complex relationships among the data features of the power CPS, an information attack identification model based on a stacked auto-encoder (SAE) is proposed through analysis of historical data, with the following steps:
(d) an unsupervised pre-training encoder is constructed so that the output layer of the network reproduces the input layer as closely as possible and the low-dimensional data of the intermediate hidden layer represents the original data; each layer of the neural network is pre-trained with a greedy layer-by-layer training method and the network parameters are initialized layer by layer; in this way the physical features and information features are abstracted layer by layer and encoded into low-dimensional data features, reducing the difficulty of model training;
(e) a supervised fine-tuning classifier is constructed: the data are encoded multiple times to obtain the dimension-reduced physical features and information features, and a softmax classifier is built to perform the final attack identification step; the output layer has N neurons, corresponding to N types of power CPS information attack modes, with each neuron representing one type of attack;
(f) fine-tuning of the SAE identification model places high demands on the setting of the initial parameters; the objective function for the initial parameters of the model is given by formula (6):
Figure FDA0002358628150000022
where n is the total number of samples, y' (i) represents the desired output sample, and y (i) represents the actual output sample;
3) after the objective function of the model is obtained, an adaptive cuckoo algorithm is used to solve it, so that the initial parameters are set effectively and the weights and thresholds in the SAE identification model are optimized:
(h) the adaptive step-size factor $\alpha_0$ is set dynamically: the larger its value, the stronger the global search capability but the lower the convergence accuracy of the algorithm; the smaller its value, the higher the optimization accuracy but the slower the convergence; the dynamic setting is given by formula (7):
Figure FDA0002358628150000023
in the formula, $t_i$ represents the current number of iterations and $t_{max}$ represents the maximum number of iterations;
(g) the adaptive cuckoo algorithm is used to solve for the initial parameters of the model; it improves the traditional cuckoo algorithm by setting the discovery probability $p_a$ dynamically so that it increases gradually as the search proceeds, which keeps the balance between global search and local search in the later stage of evolution, improves the overall convergence accuracy of the algorithm and prevents it from being trapped in a local optimum; the dynamic setting is given by formula (8):
Figure FDA0002358628150000024
in the formula, $p_a$ represents the probability that a bird nest is discovered, $p_{a,max}$ represents the maximum discovery probability, $t_i$ represents the current number of iterations, and $t_{max}$ represents the maximum number of iterations;
4) after the network parameters are initialized by the adaptive cuckoo algorithm, they are fine-tuned and optimized by reverse adjustment on that basis, the weights of the neural network are trained, CPS information attacks are identified, and operation and maintenance personnel take the corresponding action according to the identification result.
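The identification stage of the claim (a softmax output layer with one neuron per attack type, and an objective that compares desired and actual outputs) can be sketched as below. This is a minimal sketch under assumptions: the objective is taken to be a plain mean squared error over samples, which may differ in scaling from formula (6); the class names and the helper names `softmax`, `identify_attack` and `mean_squared_error` are hypothetical and do not come from the patent.

```python
import math


def softmax(logits):
    """Convert output-layer activations into class probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def identify_attack(logits, class_names):
    """Return the most probable class and its probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return class_names[idx], probs[idx]


def mean_squared_error(desired, actual):
    """ASSUMED objective: average over samples of the squared output error."""
    n = len(desired)
    return sum(
        sum((d - a) ** 2 for d, a in zip(d_vec, a_vec))
        for d_vec, a_vec in zip(desired, actual)
    ) / n


# Hypothetical attack classes, one output neuron each (N = 3).
CLASSES = ["normal", "data_injection", "command_injection"]

label, prob = identify_attack([0.1, 2.0, -1.0], CLASSES)
print(label, round(prob, 3))

# One-hot desired output versus the softmax output for a single sample.
error = mean_squared_error([[0.0, 1.0, 0.0]], [softmax([0.1, 2.0, -1.0])])
print(error)
```

Operation and maintenance staff would act on the predicted label; how a probability is turned into an alert is outside the scope of this sketch.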
CN202010015226.4A 2020-01-07 2020-01-07 Power CPS information attack identification method based on stacked self-coding network model Active CN111275074B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010015226.4A CN111275074B (en) 2020-01-07 2020-01-07 Power CPS information attack identification method based on stacked self-coding network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010015226.4A CN111275074B (en) 2020-01-07 2020-01-07 Power CPS information attack identification method based on stacked self-coding network model

Publications (2)

Publication Number Publication Date
CN111275074A true CN111275074A (en) 2020-06-12
CN111275074B CN111275074B (en) 2022-08-05

Family

ID=71001564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010015226.4A Active CN111275074B (en) 2020-01-07 2020-01-07 Power CPS information attack identification method based on stacked self-coding network model

Country Status (1)

Country Link
CN (1) CN111275074B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180082058A1 (en) * 2016-09-20 2018-03-22 Ut Battelle, Llc Cyber physical attack detection
CN109039766A (en) * 2018-08-29 2018-12-18 东北电力大学 A kind of electric power CPS network risks transmission threshold based on seepage flow probability determines method
CN109167349A (en) * 2018-08-29 2019-01-08 东北电力大学 A kind of electric power CPS biological treatability quantitative estimation method counted and load optimal is reconfigured
CN109598336A (en) * 2018-12-05 2019-04-09 国网江西省电力有限公司信息通信分公司 A kind of Data Reduction method encoding neural network certainly based on stack noise reduction
CN110300018A (en) * 2019-05-30 2019-10-01 武汉大学 A kind of electric network information physical system hierarchical modeling method of object-oriented
CN110610708A (en) * 2019-08-31 2019-12-24 浙江工业大学 Voiceprint recognition attack defense method based on cuckoo search algorithm
CN110635474A (en) * 2019-09-16 2019-12-31 东北电力大学 Power grid dynamic trajectory trend prediction method based on long-term and short-term memory network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
丁达 等: "信息物理融合系统网络安全综述", 《信息与控制》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417524A (en) * 2020-10-14 2021-02-26 浙江工业大学 Fingerprint identification method based on multivariate physical characteristic mining
CN112417524B (en) * 2020-10-14 2024-04-16 浙江工业大学 Fingerprint identification method based on multi-element physical feature mining
CN112699936A (en) * 2020-12-29 2021-04-23 东北电力大学 Electric power CPS generalized false data injection attack identification method
CN112699936B (en) * 2020-12-29 2022-06-28 东北电力大学 Electric power CPS generalized false data injection attack identification method
CN115174170A (en) * 2022-06-23 2022-10-11 东北电力大学 VPN encrypted flow identification method based on ensemble learning
CN115174170B (en) * 2022-06-23 2023-05-09 东北电力大学 VPN encryption flow identification method based on ensemble learning

Also Published As

Publication number Publication date
CN111275074B (en) 2022-08-05

Similar Documents

Publication Publication Date Title
CN108564192B (en) Short-term photovoltaic power prediction method based on meteorological factor weight similarity day
CN111275074B (en) Power CPS information attack identification method based on stacked self-coding network model
Chakraborty Feature subset selection by particle swarm optimization with fuzzy fitness function
CN111785329A (en) Single-cell RNA sequencing clustering method based on confrontation automatic encoder
CN111583031A (en) Application scoring card model building method based on ensemble learning
CN111143838B (en) Database user abnormal behavior detection method
CN110717610A (en) Wind power prediction method based on data mining
CN112529638B (en) Service demand dynamic prediction method and system based on user classification and deep learning
CN114330659A (en) BP neural network parameter optimization method based on improved ASO algorithm
CN114118596A (en) Photovoltaic power generation capacity prediction method and device
CN115099461A (en) Solar radiation prediction method and system based on double-branch feature extraction
CN113722980A (en) Ocean wave height prediction method, system, computer equipment, storage medium and terminal
CN110177112B (en) Network intrusion detection method based on double subspace sampling and confidence offset
CN115293400A (en) Power system load prediction method and system
CN117574776A (en) Task planning-oriented model self-learning optimization method
CN117117850A (en) Short-term electricity load prediction method and system
CN115795035A (en) Science and technology service resource classification method and system based on evolutionary neural network and computer readable storage medium thereof
CN115713144A (en) Short-term wind speed multi-step prediction method based on combined CGRU model
Liu et al. A novel hybrid model for image classification
Saroj et al. A genetic algorithm with entropy based probabilistic initialization and memory for automated rule mining
Jiang et al. A CTR prediction approach for advertising based on embedding model and deep learning
Zhao et al. A hybrid method for incomplete data imputation
CN116881854B (en) XGBoost-fused time sequence prediction method for calculating feature weights
CN113807005B (en) Bearing residual life prediction method based on improved FPA-DBN
CN113162914B (en) Intrusion detection method and system based on Taylor neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 132012, Changchun Road, Jilin, Jilin, 169

Patentee after: NORTHEAST DIANLI University

Patentee after: STATE GRID INNER MONGOLIA EASTERN ELECTRIC POWER CO.,LTD. INFORMATION AND COMMUNICATION BRANCH

Patentee after: INFORMATION COMMUNICATION COMPANY OF STATE GRID JILIN ELECTRIC POWER Co.,Ltd.

Patentee after: STATE GRID JILIN ELECTRIC POWER SUPPLY Co.

Patentee after: TAIPINGWAN POWER STATION, STATE GRID NORTHEAST BRANCH DEPARTMENT LYUYUAN HYDROELECTRIC Co.

Address before: 132012, Changchun Road, Jilin, Jilin, 169

Patentee before: NORTHEAST DIANLI University

Patentee before: INFORMATION COMMUNICATION COMPANY OF STATE GRID JILIN ELECTRIC POWER Co.,Ltd.

Patentee before: STATE GRID JILIN ELECTRIC POWER SUPPLY Co.

Patentee before: STATE GRID INNER MONGOLIA EASTERN ELECTRIC POWER CO.,LTD. INFORMATION AND COMMUNICATION BRANCH

Patentee before: TAIPINGWAN POWER STATION, STATE GRID NORTHEAST BRANCH DEPARTMENT LYUYUAN HYDROELECTRIC Co.