CN114511194A - Operation risk prediction method and system of power Internet of things and electronic equipment - Google Patents

Info

Publication number
CN114511194A
Authority
CN
China
Prior art keywords
things
power internet
data
data set
risk prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210015149.1A
Other languages
Chinese (zh)
Inventor
曲朝阳
梁丰
高秀芝
刘世民
董运昌
崔鸣石
姜涛
王蕾
薄小永
张振明
曹杰
杨明升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeast Electric Power University
Information and Telecommunication Branch of State Grid East Inner Mogolia Electric Power Co Ltd
Original Assignee
Northeast Dianli University
Information and Telecommunication Branch of State Grid East Inner Mogolia Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Dianli University, Information and Telecommunication Branch of State Grid East Inner Mogolia Electric Power Co Ltd
Priority to CN202210015149.1A
Publication of CN114511194A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0635Risk analysis of enterprise or organisation activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/251Fusion techniques of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Marketing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Probability & Statistics with Applications (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention relates to the technical field of the power Internet of Things, and in particular to a method, a system and electronic equipment for predicting the operation risk of the power Internet of Things. In the method, multi-source data within a preset historical time period are fused with the time sequence as a reference to obtain a complete data set, and data balance processing is performed on the complete data set based on an adaptive synthetic oversampling method to obtain a balanced data set; an operation risk prediction model of the power Internet of Things is trained on the balanced data set; and an operation risk prediction result for the power Internet of Things under test is obtained from its current multi-source data and the operation risk prediction model. Because the measurement data of the information side, the physical side and the social side are fused, and the fused data are balanced with the adaptive synthetic oversampling method, the prediction precision of the trained model, and hence the accuracy of the operation risk prediction result, can be improved.

Description

Operation risk prediction method and system of power Internet of things and electronic equipment
Technical Field
The invention relates to the technical field of power internet of things, in particular to a method and a system for predicting operation risk of power internet of things and electronic equipment.
Background
The risks faced by the power Internet of Things during operation are diverse in type and wide in scope. Risks such as equipment faults, network attacks and human errors can have a non-negligible impact on the power Internet of Things; if they are not handled in time, they can trigger a series of cross-space cascading faults and, in severe cases, even catastrophic blackouts. The earlier a risk is found and the more promptly measures are taken, the lower the cost of controlling it. Therefore, research on operation risk prediction for the power Internet of Things, which introduces multi-source data from the information, physical and social sides and fully mines the information hidden in the data, makes it possible to predict the various safety risks faced by the power Internet of Things before a fault occurs, so that weak links can be found and improved. This is of great significance for guaranteeing the safe and stable operation of the power Internet of Things.
The current risk analysis of the power Internet of Things has the following defects:
1) Traditional power Internet of Things operation risk prediction mainly studies the risks of the power information domain and the physical domain in isolation, rarely considers the influence of social-side risks, and does not comprehensively analyze the operation risk from the information-physical-social perspective. The operation risk of the power Internet of Things is essentially determined by the risks of three spaces, namely the information side, the physical side and the social side, so the measurement data of all three sides should be considered together in risk prediction.
2) Risk samples are scarce in the operation data of the power Internet of Things, which causes data imbalance. The trained classifier is then biased toward the majority class, its performance drops, the training precision of the subsequent model is challenged, and the model produces false alarms when predicting risk. Therefore, multi-source data from the information, physical and social sides need to be processed effectively before model training.
Disclosure of Invention
The invention aims to solve the technical problem of the prior art and provides a method and a system for predicting the operation risk of an electric power Internet of things and electronic equipment.
The technical scheme of the operation risk prediction method of the power Internet of things is as follows:
with the time sequence as a reference, fusing multi-source data within a preset historical time period to obtain a complete data set, wherein the multi-source data comprise: measurement data of the information side, measurement data of the physical side and measurement data of the social side of the power Internet of Things;
when the data in the complete data set are unbalanced, performing data balance processing on the complete data set based on an adaptive synthetic oversampling method to obtain a balanced data set;
training based on the balance data set to obtain an electric power Internet of things operation risk prediction model;
and obtaining an operation risk prediction result of the to-be-tested power internet of things according to the current multi-source data of the to-be-tested power internet of things and the power internet of things operation risk prediction model.
The operation risk prediction method of the power internet of things has the following beneficial effects:
on one hand, measurement data of the information side, the physical side and the social side that affect the safety of the power Internet of Things are introduced and fused with the time sequence as a reference, and a complete data set fusing the three is constructed based on random matrix theory; on the other hand, data balance processing is performed on the fused data based on the adaptive synthetic oversampling (ADASYN) method, which generates pseudo samples highly similar to real samples and helps construct a balanced data set, overcoming the unstable performance of the risk prediction model caused by low training precision when certain classes have too few samples. Based on these two aspects, the prediction precision of the trained power Internet of Things operation risk prediction model, and hence the accuracy of the operation risk prediction result, can be improved.
On the basis of the scheme, the operation risk prediction method of the power internet of things can be further improved as follows.
Further, the obtaining of the power internet of things operation risk prediction model based on the balance data set training comprises:
constructing a Catboost ensemble learning model by taking a symmetric decision tree as a base classifier, and training based on the balanced data set to obtain a Catboost ensemble classifier;
obtaining an optimal parameter corresponding to each parameter of the Catboost integrated classifier by using a Bayesian optimization method;
and transmitting all the optimal parameters to the Catboost integrated classifier to obtain the power Internet of things operation risk prediction model.
The beneficial effect of adopting the further scheme is that: the traditional Catboost model improves classification performance by combining a plurality of classifiers, but its performance depends on key parameters; manual parameter tuning is somewhat blind, easily misses the optimal parameter values and takes too long, which affects the precision of the risk prediction model. In the present application, the modeling process comprises two model training and learning stages: in the first stage, a Catboost ensemble learning model is constructed with the symmetric decision tree as the base classifier, and a Catboost ensemble classifier is obtained through training; in the second stage, a Bayesian optimization algorithm is introduced to optimize the parameters of the Catboost model, so that the resulting power Internet of Things operation risk prediction model has higher prediction precision.
Further, the fusing the multi-source data in the preset historical time period by using the time sequence as a reference to obtain a complete data set, including:
generating an original data set Dataset according to the multi-source data in the preset historical time period,
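$$\text{Dataset} = \{D_c, D_p, D_s\}, \quad D_c = [x_1, x_2, \ldots, x_n], \quad D_p = [y_1, y_2, \ldots, y_n], \quad D_s = [z_1, z_2, \ldots, z_n]$$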
wherein $x_i = (x_{i1}, x_{i2}, \ldots, x_{iN})^T$, $y_i = (y_{i1}, y_{i2}, \ldots, y_{iN})^T$ and $z_i = (z_{i1}, z_{i2}, \ldots, z_{iN})^T$; $D_c$ represents the measurement data of the information side of the power Internet of Things within the preset historical time period, $D_p$ represents the measurement data of the physical side of the power Internet of Things within the preset historical time period, and $D_s$ represents the measurement data of the social side of the power Internet of Things within the preset historical time period; $x_{i1}, x_{i2}, \ldots, x_{iN}$ represent the N measurement data of the information side acquired at the i-th moment in the preset historical time period, $y_{i1}, y_{i2}, \ldots, y_{iN}$ represent the N measurement data of the physical side acquired at the i-th moment, and $z_{i1}, z_{i2}, \ldots, z_{iN}$ represent the N measurement data of the social side acquired at the i-th moment; i, n and N are positive integers;
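constructing the complete data set D from the original data set Dataset by using random matrix theory with the time sequence as a reference:
$$D = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{n1} & y_{11} & y_{21} & \cdots & y_{n1} & z_{11} & z_{21} & \cdots & z_{n1} \\ x_{12} & x_{22} & \cdots & x_{n2} & y_{12} & y_{22} & \cdots & y_{n2} & z_{12} & z_{22} & \cdots & z_{n2} \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ x_{1N} & x_{2N} & \cdots & x_{nN} & y_{1N} & y_{2N} & \cdots & y_{nN} & z_{1N} & z_{2N} & \cdots & z_{nN} \end{bmatrix}$$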
the technical scheme of the operation risk prediction system of the power Internet of things is as follows:
the system comprises a fusion module, a balance module, a training module and a prediction module;
the fusion module is configured to: with the time sequence as a reference, fuse multi-source data within a preset historical time period to obtain a complete data set, wherein the multi-source data comprise: measurement data of the information side, measurement data of the physical side and measurement data of the social side of the power Internet of Things;
the balancing module is configured to: when the data in the complete data set are unbalanced, perform data balance processing on the complete data set based on an adaptive synthetic oversampling method to obtain a balanced data set;
the training module is configured to: training based on the balance data set to obtain an electric power Internet of things operation risk prediction model;
the prediction module is to: and obtaining an operation risk prediction result of the to-be-tested power internet of things according to the current multi-source data of the to-be-tested power internet of things and the power internet of things operation risk prediction model.
The operation risk prediction system of the power internet of things has the following beneficial effects:
on one hand, measurement data of the information side, the physical side and the social side that affect the safety of the power Internet of Things are introduced and fused with the time sequence as a reference, and a complete data set fusing the three is constructed based on random matrix theory; on the other hand, data balance processing is performed on the fused data based on the adaptive synthetic oversampling (ADASYN) method, which generates pseudo samples highly similar to real samples and helps construct a balanced data set, overcoming the unstable performance of the risk prediction model caused by low training precision when certain classes have too few samples. Based on these two aspects, the prediction precision of the trained power Internet of Things operation risk prediction model, and hence the accuracy of the operation risk prediction result, can be improved.
On the basis of the scheme, the operation risk prediction system of the power internet of things can be further improved as follows.
Further, the training module is specifically configured to:
constructing a Catboost ensemble learning model by taking a symmetric decision tree as a base classifier, and training based on the balanced data set to obtain a Catboost ensemble classifier;
obtaining an optimal parameter corresponding to each parameter of the Catboost integrated classifier by using a Bayesian optimization method;
and transmitting all the optimal parameters to the Catboost integrated classifier to obtain the power Internet of things operation risk prediction model.
The beneficial effect of adopting the further scheme is that: the traditional Catboost model improves classification performance by combining a plurality of classifiers, but its performance depends on key parameters; manual parameter tuning is somewhat blind, easily misses the optimal parameter values and takes too long, which affects the precision of the risk prediction model. In the present application, the modeling process comprises two model training and learning stages: in the first stage, a Catboost ensemble learning model is constructed with the symmetric decision tree as the base classifier, and a Catboost ensemble classifier is obtained through training; in the second stage, a Bayesian optimization algorithm is introduced to optimize the parameters of the Catboost model, so that the resulting power Internet of Things operation risk prediction model has higher prediction precision.
Further, the fusion module is specifically configured to:
generating an original data set Dataset according to the multi-source data in the preset historical time period,
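$$\text{Dataset} = \{D_c, D_p, D_s\}, \quad D_c = [x_1, x_2, \ldots, x_n], \quad D_p = [y_1, y_2, \ldots, y_n], \quad D_s = [z_1, z_2, \ldots, z_n]$$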
wherein $x_i = (x_{i1}, x_{i2}, \ldots, x_{iN})^T$, $y_i = (y_{i1}, y_{i2}, \ldots, y_{iN})^T$ and $z_i = (z_{i1}, z_{i2}, \ldots, z_{iN})^T$; $D_c$ represents the measurement data of the information side of the power Internet of Things within the preset historical time period, $D_p$ represents the measurement data of the physical side of the power Internet of Things within the preset historical time period, and $D_s$ represents the measurement data of the social side of the power Internet of Things within the preset historical time period; $x_{i1}, x_{i2}, \ldots, x_{iN}$ represent the N measurement data of the information side acquired at the i-th moment in the preset historical time period, $y_{i1}, y_{i2}, \ldots, y_{iN}$ represent the N measurement data of the physical side acquired at the i-th moment, and $z_{i1}, z_{i2}, \ldots, z_{iN}$ represent the N measurement data of the social side acquired at the i-th moment; i, n and N are positive integers;
based on the raw data set Dataset, constructing the complete data set D by using a random matrix theory and taking a time sequence as a reference:
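$$D = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{n1} & y_{11} & y_{21} & \cdots & y_{n1} & z_{11} & z_{21} & \cdots & z_{n1} \\ x_{12} & x_{22} & \cdots & x_{n2} & y_{12} & y_{22} & \cdots & y_{n2} & z_{12} & z_{22} & \cdots & z_{n2} \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ x_{1N} & x_{2N} & \cdots & x_{nN} & y_{1N} & y_{2N} & \cdots & y_{nN} & z_{1N} & z_{2N} & \cdots & z_{nN} \end{bmatrix}$$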
A storage medium of the present invention stores instructions which, when read by a computer, cause the computer to execute any one of the above operation risk prediction methods of the power Internet of Things.
An electronic device of the present invention includes a processor and the storage medium, where the processor executes instructions in the storage medium.
Drawings
FIG. 1 is a schematic flow chart of an operation risk prediction method of an electric power internet of things according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the acquisition process of a balanced data set;
FIG. 3 is a schematic flow diagram of a training power Internet of things operation risk prediction model;
FIG. 4 is a schematic diagram of a GBDT training process;
FIG. 5 is a schematic diagram of a topology;
FIG. 6 is a confusion matrix of risk prediction results prior to data balancing;
FIG. 7 is a confusion matrix of risk prediction results after data balancing;
FIG. 8 is a schematic representation of a ROC curve;
FIG. 9 is a schematic of a precision-recall curve;
FIG. 10 is a schematic diagram of a confusion matrix;
FIG. 11 is a ROC curve after parameter optimization;
FIG. 12 is a precision-recall curve after parameter optimization;
FIG. 13 is a confusion matrix after parameter optimization;
FIG. 14 is a schematic structural diagram of an operation risk prediction system of an electric power internet of things according to an embodiment of the present invention.
Detailed Description
As shown in fig. 1, the operation risk prediction method for the power internet of things according to the embodiment of the present invention includes the following steps:
S1, fusing multi-source data within a preset historical time period with the time sequence as a reference to obtain a complete data set, wherein the multi-source data comprise: measurement data of the information side, measurement data of the physical side and measurement data of the social side of the power Internet of Things;
the preset historical time period can be set according to actual conditions, and the multi-source data can be obtained by collecting measurement data of the information side, the physical side and the social side of a plurality of power Internet of Things.
The measurement data of the information side include attack signals against the power Internet of Things, network traffic of the power Internet of Things, and the like; the measurement data of the physical side include the three-phase voltage, three-phase current and the like of the power Internet of Things; the measurement data of the social side include meteorological data such as humidity, temperature and precipitation.
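As a rough illustration of fusing the three data sources on a common time base, the following minimal Python sketch joins readings that share a sampling time. It is a plain timestamp join and does not implement the random matrix theory construction of the invention; the function name `fuse_by_time`, the dictionary layout and all numeric values are hypothetical.

```python
def fuse_by_time(info, physical, social):
    """Join three {timestamp: [measurements]} mappings on shared timestamps."""
    # Keep only the moments at which all three sides have a reading.
    common = sorted(set(info) & set(physical) & set(social))
    # Concatenate information-side, physical-side and social-side values.
    return [(t, info[t] + physical[t] + social[t]) for t in common]

# Hypothetical readings keyed by sampling time (all values are made up).
info = {0: [120.0, 0.0], 1: [450.0, 1.0], 2: [130.0, 0.0]}      # traffic, attack flag
physical = {0: [220.1, 5.0], 1: [219.7, 5.2], 2: [220.3, 4.9]}  # voltage, current
social = {0: [21.5, 0.40], 1: [21.7, 0.41], 3: [22.0, 0.39]}    # temperature, humidity

fused = fuse_by_time(info, physical, social)
for t, row in fused:
    print(t, row)
```

Moments missing from any one source (here, times 2 and 3) are dropped in this sketch; a real system might instead interpolate or resample.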
S2, when the data in the complete data set are unbalanced, performing data balance processing on the complete data set based on the adaptive synthetic oversampling method to obtain a balanced data set. Because data imbalance may exist in the complete data set and degrade the performance of the subsequent model, the minority-class samples are oversampled based on the adaptive synthetic oversampling (ADASYN) algorithm before model training, as shown in FIG. 2. The specific steps are as follows:
S20, determining whether the data in the complete data set are balanced, specifically: the complete data set D comprises a risk data set and a normal-operation data set. Risk data are data generated by network attacks, system faults, system disturbances and the like on the power Internet of Things, and are distributed across the information side, the physical side and the social side. All risk samples in the complete data set D, such as "$x_{11}\ x_{21} \ldots x_{n1}\ y_{11}\ y_{21} \ldots y_{n1}\ z_{11}\ z_{21} \ldots z_{n1}$", can be identified from the risk data and together form the risk data set; the normal-operation data set consists of the samples remaining in D after the risk data set is removed.
That is, each row of the complete data set D is one sample, for example "$x_{11}\ x_{21} \ldots x_{n1}\ y_{11}\ y_{21} \ldots y_{n1}\ z_{11}\ z_{21} \ldots z_{n1}$"; a sample that contains risk data is determined to be a risk sample and placed in the risk data set, and the remaining samples form the normal-operation data set.
Let the number of samples in the risk data set be $m_s$ and the number of samples in the normal-operation data set be $m_l$. The degree of imbalance d is calculated by the following formula:
$$d = \frac{m_s}{m_l}$$
When d is smaller than a preset threshold, the data in the complete data set D are determined to be unbalanced. The preset threshold is, for example, 1%, and can be set and adjusted according to actual conditions.
Whether the data in the complete data set are balanced is determined as described above; if so, the current complete data set is taken as the balanced data set, and if not, S21 is executed;
S21, oversampling, namely performing data balance processing on the complete data set and taking the oversampled complete data set as the balanced data set. The oversampling process is specifically as follows:
S210, calculating the number G of samples that need to be synthesized: $G = (m_l - m_s) \times b$, where $b \in [0, 1]$. The value of b may be set according to the actual situation. For example, when b = 1, the number of synthesized samples plus the number of risk samples in the risk data set equals the number of samples in the normal-operation data set; b may also be set to other values such as 0.5;
S211, for each risk sample, finding its K nearest neighbours and calculating:
$$r = \frac{\Delta / K}{Z}$$
where Δ is the number of samples among the K neighbours that belong to the normal-operation data set, and Z is a normalization factor that ensures the values of r form a distribution. Thus, the more normal-operation samples surround a risk sample, the higher its r. Δ and K are integers.
S212, calculating the number of samples to be synthesized for each risk sample by the formula $g_j = r_j \times G$, where $r_j$ is the value of r for the j-th risk sample, $g_j$ is the number of samples to be synthesized for the j-th risk sample, and j is an integer.
S213, synthesizing the synthetic samples for the j-th risk sample. The risk samples serve as the minority-class samples of the ADASYN algorithm and the samples in the normal-operation data set serve as the majority-class samples, so that synthetic samples are generated for each risk sample. By automatically determining the number of samples to be synthesized for each minority-class sample, the ADASYN algorithm produces sufficient pseudo data highly similar to the original data, and addresses at the data level the influence of data imbalance on the training accuracy of machine learning algorithms.
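Steps S20 and S210 to S213 can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions rather than the implementation of the invention: the text does not spell out how a synthetic sample is formed in S213, so the sketch assumes the usual ADASYN choice of linear interpolation between a risk sample and one of its nearest risk-class neighbours. The function names, the Euclidean distance, the rounding of $g_j$ and the toy data are all hypothetical, and at least two risk samples are required.

```python
import math
import random

def imbalance_degree(m_s, m_l):
    """d = m_s / m_l: risk-sample count over normal-sample count (S20)."""
    return m_s / m_l

def adasyn(risk, normal, b=1.0, k=3, seed=0):
    """Return synthetic risk samples following steps S210 to S213."""
    rng = random.Random(seed)
    G = (len(normal) - len(risk)) * b                  # S210: total to synthesize
    everyone = [(p, True) for p in risk] + [(p, False) for p in normal]
    ratios = []
    for p in risk:                                     # S211: Delta / K per risk sample
        nearest = sorted((e for e in everyone if e[0] is not p),
                         key=lambda e: math.dist(p, e[0]))[:k]
        delta = sum(1 for _, is_risk in nearest if not is_risk)
        ratios.append(delta / k)
    Z = sum(ratios)                                    # normalization factor
    if Z == 0 or G <= 0:
        return []
    synthetic = []
    for p, r in zip(risk, ratios):
        g = round(r / Z * G)                           # S212: g_j = r_j * G (rounded)
        mates = sorted((q for q in risk if q is not p),
                       key=lambda q: math.dist(p, q))[:k]
        for _ in range(g):                             # S213: assumed interpolation
            q = rng.choice(mates)
            lam = rng.random()
            synthetic.append([a + lam * (c - a) for a, c in zip(p, q)])
    return synthetic

# Toy two-feature data (made up): 3 risk samples, 9 normal-operation samples.
risk = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]
normal = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.1, 1.1], [2.0, 2.0],
          [2.1, 1.9], [0.3, 0.0], [1.9, 2.2], [0.0, 0.3]]

d = imbalance_degree(len(risk), len(normal))
synthetic = adasyn(risk, normal) if d < 0.5 else []    # 0.5 is a hypothetical threshold
print(d, len(synthetic))
```

Because each $g_j$ is rounded to an integer, the number of synthetic samples can differ slightly from G in general.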
S3, training based on the balance data set to obtain an electric power Internet of things operation risk prediction model;
S4, obtaining an operation risk prediction result of the to-be-tested power internet of things according to the current multi-source data of the to-be-tested power internet of things and the power internet of things operation risk prediction model.
On one hand, measurement data of the information side, the physical side and the social side that affect the safety of the power Internet of Things are introduced and fused with the time sequence as a reference, and a complete data set fusing the three is constructed based on random matrix theory; on the other hand, data balance processing is performed on the fused data based on the adaptive synthetic oversampling (ADASYN) method, which generates pseudo samples highly similar to real samples and helps construct a balanced data set, overcoming the unstable performance of the risk prediction model caused by low training precision when certain classes have too few samples. Based on these two aspects, the prediction precision of the trained power Internet of Things operation risk prediction model, and hence the accuracy of the operation risk prediction result, can be improved.
Optionally, in the above technical solution, training based on a balanced data set to obtain an electric power internet of things operation risk prediction model includes:
S30, constructing a Catboost ensemble learning model with the symmetric decision tree as the base classifier, and training on the balanced data set to obtain a Catboost ensemble classifier; as shown in FIG. 3, specifically:
the balanced data set is divided into a training set and a test set; a Catboost ensemble learning model is then constructed with the symmetric decision tree as the base classifier, specifically, a plurality of symmetric decision trees are constructed, and each symmetric decision tree is trained for classification on the basis of the training set and the test set, yielding the Catboost ensemble classifier.
S31, obtaining the optimal parameter corresponding to each parameter of the Catboost integrated classifier by using a Bayesian optimization method; as shown in fig. 3, specifically:
S310, determining a parameter initialization population, specifically: acquiring each parameter of the Catboost integrated classifier and establishing the parameter initialization population;
S311, establishing a surrogate probability model and substituting the parameter initialization population into it;
S312, calculating the objective function;
S313, constructing a Bayesian network;
S314, sampling the Bayesian network and judging whether the maximum number of iterations has been reached; if so, outputting the optimal parameter combination, and if not, modifying the probability model and returning to S311 until the optimal parameter combination is output, wherein the optimal parameter combination comprises the optimal value of each parameter of the Catboost integrated classifier.
The specific technical details of S310 to S314 are known to those skilled in the art and are not described herein.
And S32, transmitting all the optimal parameters to a Catboost integrated classifier to obtain an electric power Internet of things operation risk prediction model. Specifically, the method comprises the following steps:
A Catboost ensemble learning model is constructed with the symmetric decision tree as the base classifier, and, to further improve model performance, a Bayesian optimization algorithm is used to search for the optimal model parameters. First, a Catboost risk prediction model is constructed with the symmetric decision tree as the base classifier. Bayesian optimization then updates the posterior distribution of the given objective function by continuously adding sample points, thereby obtaining the optimal model parameters and further improving the model's classification accuracy. This yields a BO-Catboost-based scheme for constructing the power Internet of Things operation risk prediction model. Specifically: the gradient boosting decision tree (GBDT) is a decision-tree-based ensemble learning framework. It uses an additive model (i.e., a linear combination of basis functions) and classifies or regresses data by continuously reducing the residual produced during training. It improves the accuracy of the final classifier by reducing the bias during training, and the weak classifiers obtained in each training round are weighted and summed to obtain the final classifier. The training process of GBDT is shown in FIG. 4.
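The additive, residual-fitting idea behind GBDT can be illustrated with a short pure-Python sketch. It is a simplified stand-in, not Catboost and not the model of the invention: it assumes one-dimensional inputs, squared-error regression, depth-1 trees (stumps) as weak learners and a fixed learning rate. The names `fit_stump` and `fit_gbdt` and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-threshold regression stump on 1-D inputs (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:                  # candidate thresholds
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_gbdt(xs, ys, rounds=20, lr=0.5):
    """Additive model: mean prediction plus a weighted sum of stumps."""
    base = sum(ys) / len(ys)
    stumps, preds = [], [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]   # what is still unexplained
        stump = fit_stump(xs, residuals)                 # weak learner fits the residual
        stumps.append(stump)
        preds = [p + lr * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

# Toy data (made up): a step from about 1 to about 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = fit_gbdt(xs, ys)
mean = sum(ys) / len(ys)
mse_base = sum((y - mean) ** 2 for y in ys) / len(ys)
mse_model = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_base, mse_model)
```

Each round fits the next stump to the residual left by the previous rounds, so the training error of the summed model falls as weak learners are added.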
As data volumes grow geometrically, the GBDT algorithm generally suffers from overfitting and slow training. Catboost improves on GBDT. Its main improvement is an ordered boosting strategy that addresses the gradient bias and prediction shift present in the standard GBDT model. It also uses fully symmetric decision trees to improve the generalization ability and prediction speed of the model while maintaining training and prediction accuracy.
1) Fast scoring
Catboost uses the fully symmetric decision tree, also called the oblivious decision tree (ODT), as its base learner. Unlike in a general decision tree, all internal nodes at the same depth of a fully symmetric decision tree split on the same feature and the same feature threshold. A fully symmetric decision tree can therefore be converted into a decision table with 2^d entries, where d is the number of levels of the tree. A tree of this structure is more balanced and evaluates features much faster than a general decision tree.
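The decision-table view can be sketched in a few lines. The sketch below is illustrative only: the split list, thresholds and table values are made up, and the function names are hypothetical. Because every level uses one (feature, threshold) pair, each level contributes one bit of the table index.

```python
def leaf_index(features, splits):
    """Map a feature vector to a table index; splits holds one (feature_index, threshold) per level."""
    index = 0
    for feature_index, threshold in splits:
        bit = 1 if features[feature_index] > threshold else 0
        index = (index << 1) | bit  # append this level's decision as one bit
    return index


def predict(features, splits, table):
    return table[leaf_index(features, splits)]


# A depth-3 tree: three levels, hence a table with 2**3 = 8 entries (made-up values).
splits = [(0, 0.5), (1, 10.0), (0, 0.8)]
table = [0.1 * i for i in range(2 ** len(splits))]
```

Evaluation needs only d comparisons and one table lookup, with no branching on tree structure, which is why this layout scores quickly.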
2) Ordered boosting algorithm
Catboost uses ordered boosting to reduce gradient bias and address prediction shift. For each sample $X_k$ in the training set, a corresponding model $M_k$ is trained on sample data that exclude that sample, and the weak learner is trained continually by computing gradient estimates of the sample data to obtain an optimized model, which improves the generalization ability of the model. The algorithm proceeds as follows:
Input: training set
Figure BDA0003460194390000111
the number of iteration rounds T;
Output: models $M_1, M_2, \ldots, M_Q$
① A random permutation σ is generated, the sample data in the training set W are ordered according to it, and the corresponding models $M_1, M_2, \ldots, M_Q$ are computed;
② each model $M_k$, $M_k \in (M_1, M_2, \ldots, M_Q)$, is trained on the first k samples of the random permutation;
③ during iterative updating, an unbiased gradient estimate for the k-th sample is computed with model $M_{k-1}$;
④ the weak learner is trained continually on the sample gradients until the number of iteration rounds reaches the maximum; the final model is then output and training stops. Here k and Q are positive integers.
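The key point of steps ① to ③, that the estimate for the k-th sample never uses that sample itself, can be sketched as follows. This is a deliberately simplified illustration: the per-prefix "model" here is just the running mean of the preceding targets, a trivial stand-in for a tree ensemble, and the function name and `prior` parameter are hypothetical.

```python
import random


def ordered_residuals(targets, prior=0.0, seed=0):
    """Compute residuals where sample k is scored only by a model built from earlier samples."""
    rng = random.Random(seed)
    order = list(range(len(targets)))
    rng.shuffle(order)                 # the random permutation sigma
    residuals = [0.0] * len(targets)
    running_sum = 0.0
    for position, sample in enumerate(order):
        # Stand-in for M_{k-1}: mean of the targets seen so far (prior if none).
        estimate = running_sum / position if position > 0 else prior
        residuals[sample] = targets[sample] - estimate
        running_sum += targets[sample]  # only now does the sample join the prefix
    return order, residuals
```

Because each sample is added to the running statistics only after its own residual has been computed, the residual is not contaminated by the sample's own target, which is the source of the prediction shift that ordered boosting is meant to avoid.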
The Catboost model improves classification performance by combining multiple classifiers, but its performance depends on key parameters. Manual tuning is laborious and somewhat blind and can easily miss the optimal parameter values, which reduces the accuracy of the risk prediction model. Compared with other hyper-parameter optimization algorithms such as grid search, random search and genetic algorithms, Bayesian optimization needs few initial sample points and optimizes efficiently, so it is better suited to hyper-parameter tuning.
The Bayesian optimization algorithm replaces the crossover and mutation of the genetic algorithm with Bayesian network sampling. First, the joint probability distribution of the better solutions is estimated from samples and a Bayesian network model is generated. The network model is then sampled to generate new candidate solutions for the next iteration. Repeating this process finally yields the optimal solution; the specific algorithm flow is shown in fig. 3.
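The select, model, sample loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: instead of a Bayesian network it fits an independent Gaussian to each variable of the better solutions, so dependencies between variables are ignored, and the function name, population sizes and toy objective are all hypothetical.

```python
import random
import statistics


def eda_maximise(objective, bounds, population=40, elite=10, iterations=30, seed=0):
    """Maximise objective by repeatedly modelling and resampling the better solutions."""
    rng = random.Random(seed)
    dims = range(len(bounds))
    candidates = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(population)]
    best = None
    for _ in range(iterations):
        ranked = sorted(candidates, key=objective, reverse=True)
        if best is None or objective(ranked[0]) > objective(best):
            best = ranked[0]
        better = ranked[:elite]  # the "better solutions"
        # Probability model: one Gaussian per variable (stand-in for a Bayesian network).
        means = [statistics.fmean(c[d] for c in better) for d in dims]
        spreads = [max(statistics.pstdev([c[d] for c in better]), 1e-6) for d in dims]
        # Sample the model to produce the next generation, clipped to the bounds.
        candidates = [
            [min(max(rng.gauss(means[d], spreads[d]), bounds[d][0]), bounds[d][1])
             for d in dims]
            for _ in range(population)
        ]
    # For brevity the last sampled generation is not scored.
    return best


def toy_objective(point):
    """Toy objective with its maximum (value 0) at (2, -1)."""
    return -((point[0] - 2.0) ** 2 + (point[1] + 1.0) ** 2)
```

A call such as `eda_maximise(toy_objective, [(-5, 5), (-5, 5)])` returns a point near (2, -1). In a tuning setting the objective would instead train and score a model for a given parameter combination.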
In order to find a hyper-parameter set suitable for the model and improve its prediction accuracy, a BO-Catboost-based power Internet of things operation risk prediction model is constructed as follows:
① Set the optimization interval of each parameter of the Catboost algorithm. Each parameter can take any value within its interval.
② Initialize the regression prediction model of the Catboost algorithm, with the Catboost algorithm as the training target and the performance indexes of the model as the evaluation criteria.
③ Initialize the Bayesian optimization algorithm and establish a candidate probability model. Select a parameter combination from the parameter set as the initial parameters of the Catboost model and train it. After training, the model is tested on the test set, and the test set and the result set are fed to the evaluation function. The result is an evaluation of the effectiveness of the Catboost model under this parameter combination.
④ Search for the optimal parameters on the surrogate model according to the evaluation value obtained in step ③, and output the corresponding evaluation value and parameters.
⑤ Stop the optimization when the number of iterations reaches the maximum, and find from the surrogate model the parameter combination that maximizes the evaluation value. The optimal parameter combination is used as the parameters of the Catboost algorithm, and the final prediction model is obtained by training.
The traditional Catboost model improves classification performance by combining multiple classifiers, but its performance depends on key parameters; manual parameter tuning is somewhat blind, easily misses the optimal parameter values and takes too long, which reduces the accuracy of the risk prediction model. In the present application, the modeling process comprises two training stages: in the first stage, a Catboost ensemble learning model is constructed with symmetric decision trees as base classifiers and trained to obtain the Catboost integrated classifier; in the second stage, a Bayesian optimization algorithm is introduced to optimize the parameters of the Catboost model, so that the resulting power Internet of things operation risk prediction model has higher prediction accuracy.
Optionally, in the above technical solution, fusing the multi-source data in a preset historical time period with the time sequence as a reference to obtain a complete data set includes:
S10, generating an original data set Dataset according to the multi-source data in the preset historical time period,
Figure BDA0003460194390000131
where $x_i=(x_{i1},x_{i2},\ldots,x_{iN})^T$, $y_i=(y_{i1},y_{i2},\ldots,y_{iN})^T$ and $z_i=(z_{i1},z_{i2},\ldots,z_{iN})^T$; $D_c$ denotes the measurement data of the information side of the power Internet of things within the preset historical time period, $D_p$ denotes the measurement data of the physical side of the power Internet of things within the preset historical time period, and $D_s$ denotes the measurement data of the social side of the power Internet of things within the preset historical time period; $x_{i1},x_{i2},\ldots,x_{iN}$ denote the N measurement data of the information side of the power Internet of things acquired at the i-th moment in the preset historical time period, $y_{i1},y_{i2},\ldots,y_{iN}$ denote the N measurement data of the physical side acquired at the i-th moment, and $z_{i1},z_{i2},\ldots,z_{iN}$ denote the N measurement data of the social side acquired at the i-th moment; i and N are positive integers.
Based on the original data set Dataset, by taking the time sequence as a reference and utilizing a random matrix theory, constructing a complete data set D:
Figure BDA0003460194390000132
specifically, the method comprises the following steps:
In order to comprehensively analyze the operation risk of the power Internet of things from the three perspectives of the information side, the physical side and the social side, a random matrix method is introduced. Over any period of time, the measurement data acquired for any attribute at any node of the information side, physical side and social side can form a column vector; an example of such attributes at one node is the attack signal, the A-phase voltage and the temperature of the 2nd node. The measurement data of the information side, physical side and social side are extracted to form the original data set Dataset,
Figure BDA0003460194390000133
Data from the different spaces at the same moment are then fused with the time sequence as a reference: the time sequence in one data file is selected as the reference, that file is called the reference file, and the parameters of the other data stream files are aligned to this time reference. The data of each sampling moment of the information side, physical side and social side are imported in time order to construct a high-dimensional random matrix, which is the complete data set D:
Figure BDA0003460194390000141
The measurement data of the information side include attack signals against the power Internet of things, the network traffic of the power Internet of things, and the like; the measurement data of the physical side include the three-phase voltages and three-phase currents of the power Internet of things, and the like; and the measurement data of the social side include meteorological data such as humidity, temperature and precipitation.
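The time-referenced fusion can be sketched as follows. This is a minimal illustration, not the construction used in this application: it assumes each source is a mapping from an integer sampling tick to a list of measurements, and it drops ticks missing from any source (an inner join), which is an assumption the text does not state. The function name and all sample values are made up.

```python
def fuse_by_time(reference, *others):
    """Concatenate the measurements of all sources at each tick of the reference source."""
    fused = {}
    for tick in sorted(reference):
        if all(tick in source for source in others):
            row = list(reference[tick])
            for source in others:
                row.extend(source[tick])
            fused[tick] = row
    return fused


# Made-up example values: tick -> measurements.
physical = {0: [220.1, 5.0], 1: [219.8, 5.1], 2: [0.0, 30.2]}       # voltage, current
information = {0: [0, 120.0], 1: [0, 118.5], 2: [1, 560.0]}         # attack flag, traffic
social = {0: [55.0, 21.5, 0.0], 2: [55.0, 21.5, 0.0]}               # humidity, temperature, precipitation

fused = fuse_by_time(physical, information, social)
```

Each fused entry corresponds to one sampling moment of the combined matrix. Here the physical-side file plays the role of the reference file.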
The technical effect of the operation risk prediction method of the power internet of things is described below by an example, specifically:
A 16-node topology is built by RT-LAB and OPNET co-simulation to simulate the measurement data of the information side and the physical side. The topology is shown in fig. 5, in which elements in circular frames are circuit breakers, elements in solid-line rectangular frames are power transmission lines, elements in dotted-line rectangular frames are transformers, and elements in octagonal frames are power supplies.
In the constructed 16-node topology, 20,000 data points covering 200 s are collected at intervals of 0.01 s. A single-phase short-circuit risk is set in the period 15-16.5 s, a two-phase short-circuit risk in 45-45.5 s, a two-phase grounding risk in 75-76.5 s, a three-phase short-circuit risk in 120-120.5 s, a false-command injection attack risk in 150-151 s, and a human misoperation risk in 195-200 s, and the measurement data of the information side and the physical side are collected.
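The class imbalance implied by this schedule can be made concrete with a short labeling sketch. It assumes half-open windows (start inclusive, end exclusive) and string labels of my own choosing, neither of which is specified in the text. Sampling 200 s at 0.01 s intervals gives 200 / 0.01 = 20,000 ticks.

```python
from collections import Counter

TICK_SECONDS = 0.01
TOTAL_TICKS = round(200 / TICK_SECONDS)  # 20,000 sampling instants

# (start in s, end in s, label); label names are hypothetical.
RISK_WINDOWS_SECONDS = [
    (15.0, 16.5, "single_phase_short_circuit"),
    (45.0, 45.5, "two_phase_short_circuit"),
    (75.0, 76.5, "two_phase_grounding"),
    (120.0, 120.5, "three_phase_short_circuit"),
    (150.0, 151.0, "false_command_injection"),
    (195.0, 200.0, "human_misoperation"),
]

# Convert to integer ticks once to avoid repeated floating-point comparisons.
RISK_WINDOWS_TICKS = [
    (round(start / TICK_SECONDS), round(end / TICK_SECONDS), label)
    for start, end, label in RISK_WINDOWS_SECONDS
]


def label_for_tick(tick):
    for start, end, label in RISK_WINDOWS_TICKS:
        if start <= tick < end:
            return label
    return "normal"


labels = [label_for_tick(tick) for tick in range(TOTAL_TICKS)]
counts = Counter(labels)
```

Under these assumptions only 1,000 of the 20,000 samples carry a risk label, and the smallest classes have 50 samples each, which illustrates why data balancing is applied before training.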
Meanwhile, meteorological data of the social side for the same time period, mainly humidity, temperature and precipitation, are downloaded from the China meteorological network. After the measurement data of the information side, the physical side and the social side are obtained, the measurement data of the information side and the social side are fused into the simulated physical data set with the acquisition time sequence as a reference to obtain a complete data set, as shown in table 1, where "node" in the table denotes each node.
Table 1:
Figure BDA0003460194390000151
In order to analyze how much oversampling improves the risk prediction model, the Catboost algorithm is trained on the data sets before and after balancing and the risk prediction performance is compared. The confusion matrices of the risk prediction results before and after data balancing are shown in fig. 6 and fig. 7.
As can be seen from fig. 6, risk categories 1, 2, 3 and 4 in the raw data have low prediction accuracy and high false alarm rates because they have few samples. As can be seen from fig. 7, after data balancing the prediction accuracy of risk categories 1, 2, 3 and 4 improves by 10%, 2%, 9% and 10%, respectively, and the false alarm rate drops significantly. Oversampling the minority-class samples to obtain a balanced data set therefore plays an important role in reducing the false alarm rate of risk prediction and improving the stability of the model.
Precision, Recall and F1-Score are adopted as performance indexes for the risk prediction model. Precision measures the proportion of truly positive samples among all samples predicted as positive. Recall measures the proportion of actual positive samples that are correctly predicted as positive. F1-Score is the harmonic mean of precision and recall. The indexes are calculated as follows:
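$$\mathrm{Precision}=\frac{TP}{TP+FP},\qquad \mathrm{Recall}=\frac{TP}{TP+FN},\qquad F_1=\frac{2\times\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$$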
where TP (true positive) indicates that a positive sample is predicted as positive; TN (true negative) indicates that a negative sample is predicted as negative; FP (false positive) indicates that a negative sample is predicted as positive; FN (false negative) indicates that a positive sample is predicted as negative.
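Using the standard definitions of these indexes, the calculation can be sketched as follows. The function name is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1-Score from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

For a multi-class problem such as risk category prediction, these counts would be taken per class (one class as positive, all others as negative) and the per-class values averaged.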
The oversampled balanced data are put into the model for training, with the training set and test set split at a ratio of 7:3. The average precision, average recall and average F1-Score of the risk prediction of the Catboost algorithm are 99.76%, 98.09% and 98.9%, respectively. The ROC curve, precision-recall curve and confusion matrix of the Catboost algorithm are shown in figs. 8-10.
From the analysis of fig. 8, it can be seen that the ROC inflection point of the Catboost model is close to (0, 1), which indicates that the model can achieve high prediction accuracy under the condition of low false alarm rate. From the analysis of fig. 9, the inflection point of the curve is close to (1, 1), which shows that the model can obtain high precision under the condition of high recall rate. As can be seen from the analysis of fig. 10, the overall classification accuracy is better, and only the categories 1, 3, and 4 need to be improved slightly. By combining the analysis, the Catboost model has certain applicability to processing the prediction problem of the operation risk of the power Internet of things.
The performance of the Catboost model is affected by several key parameters, which are listed in Table 2. The maximum number of trees affects the computational cost of the model, and too large a value can lead to overfitting. The learning rate affects the total training time. The maximum depth of the tree also has a large effect on model performance and overfitting.
Table 2:
Parameter | Meaning | Default value
iterations | Maximum number of trees | 500
learning_rate | Learning rate | 0.03
depth | Maximum depth of tree | 6
In order to further improve the performance of the model, a Bayesian optimization method is adopted to find the optimal parameters of the model, and in the experiment, the parameter interval of the Bayesian optimization is set as shown in Table 3.
Table 3:
Parameter | Meaning | Optimization interval
iterations | Maximum number of trees | [100, 1000]
learning_rate | Learning rate | [0.01, 0.3]
depth | Maximum depth of tree | [1, 10]
To prevent chance effects in the training result, the parameter optimization process uses five-fold cross validation and takes the mean AUC over the 5 folds as the objective function. The optimal parameter set finally obtained for the risk prediction model is {iterations = 883, learning_rate = 0.13, depth = 9}.
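The objective described here, the mean AUC over five folds, can be sketched as follows. This is a simplified illustration: it uses contiguous folds without shuffling and a binary AUC, whereas the risk prediction task is multi-class, and `score_fn` is a hypothetical hook that would train a model on the training indices and return scores for the validation indices.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


def auc(labels, scores):
    """Binary AUC: chance that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def mean_cv_auc(labels, score_fn, k=5):
    """Average the validation AUC over k folds; this is the value to be maximised."""
    values = []
    for train, validation in k_fold_indices(len(labels), k):
        scores = score_fn(train, validation)
        values.append(auc([labels[i] for i in validation], scores))
    return sum(values) / len(values)
```

In a tuning loop, `mean_cv_auc` would be evaluated once per candidate parameter combination, and the combination with the highest value would be kept.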
The optimal parameter combinations are set in the model, and the obtained ROC curve, precision-recall curve and confusion matrix are shown in FIGS. 11-13.
From the analysis of figs. 11-13, the average precision, average recall and average F1-Score of the risk prediction of the BO-Catboost algorithm are 99.77%, 98.8% and 99.07%, respectively, which are 0.01, 0.71 and 0.17 percentage points higher than before parameter optimization. The model performs better after parameter optimization, and its false alarm rate is only 1%, which shows that the risk prediction model has good stability.
(1) The operation risk of the power Internet of things is essentially determined by the risks in three spaces: the information side, the physical side and the social side. Risk prediction should therefore consider the measurement data of all three sides together. Most current risk prediction methods consider only the measurement data of the information side and the physical side and ignore the measurement data of the social side.
(2) Risk samples are scarce in the operation data of the power Internet of things, which causes data imbalance: the trained classifier is biased toward the majority classes, its performance degrades, subsequent model training accuracy suffers, and the model may raise false alarms when predicting risk. It is therefore necessary to process the multi-source data from the information, physical and social sides effectively before model training. The traditional SMOTE algorithm randomly selects a minority-class sample as the main sample, randomly selects one of its K nearest minority-class neighbors, and takes a convex combination of the two as the synthetic sample. Because it does not simply replicate minority-class samples, it mitigates overfitting. However, it is sensitive to noise: when the main sample is a noise sample, the newly synthesized sample may also be noise.
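The SMOTE procedure described in (2) can be sketched as follows. This is a minimal illustration of the traditional algorithm only, not of the adaptive method used in this application; the function name, the use of squared Euclidean distance and the sample points are assumptions made for the example.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points as convex combinations of minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        main = rng.choice(minority)                       # randomly chosen main sample
        others = [p for p in minority if p is not main]
        # Sort the remaining minority samples by distance to the main sample.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(main, p)))
        neighbour = rng.choice(others[:k])                # one of the k nearest neighbours
        lam = rng.random()                                # interpolation weight in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(main, neighbour)])
    return synthetic
```

Every synthetic point lies on the segment between two real minority samples, which is also why a noisy main sample tends to produce noisy synthetic samples.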
(3) Traditional risk analysis in the power Internet of things focuses mainly on after-the-fact control, that is, solving problems that have already occurred quickly and accurately to reduce actual losses, as in risk assessment and fault location. This emergency mode is too passive and does not help maintain the safe operation of the power Internet of things. Accurately and promptly predicting the risk categories facing the operation of the power Internet of things helps grid personnel isolate risks and eliminate faults in time. Model design should therefore focus on the accuracy of risk prediction and on prediction performance on unseen data. The traditional Catboost model improves classification performance by combining multiple classifiers, but its performance depends on key parameters; manual parameter tuning is somewhat blind, easily misses the optimal parameter values and takes too long, which reduces the accuracy of the risk prediction model.
Interpretation of terms:
Power Internet of things: the application of the Internet of things in the smart grid and a result of information and communication technology reaching a certain stage of development. It effectively integrates communication infrastructure resources with power system infrastructure resources, raises the informatization level of the power system, improves the utilization efficiency of existing power system infrastructure, and provides important technical support for grid generation, transmission, transformation, distribution and consumption.
Power cyber-physical system (CPS): as the information side and the physical side of the power system become increasingly coupled, large numbers of electrical devices, data acquisition devices and computing terminals are interconnected through two physical networks, gradually forming a power cyber-physical system (power CPS) that integrates the computing system, the communication network and the physical environment.
In the foregoing embodiments, the steps are numbered S1, S2, etc., but these are only the specific embodiments given in this application. Those skilled in the art may adjust the execution order of S1, S2, etc. according to the actual situation, which also falls within the protection scope of the present invention. It is understood that some embodiments may include some or all of the above embodiments.
As shown in fig. 14, an operational risk prediction system 200 of an electric power internet of things according to an embodiment of the present invention includes a fusion module 210, a balance module 220, a training module 230, and a prediction module 240;
the fusion module 210 is configured to: fuse multi-source data in a preset historical time period with the time sequence as a reference to obtain a complete data set, wherein the multi-source data comprise measurement data of the information side, measurement data of the physical side and measurement data of the social side of the power Internet of things;
the balance module 220 is configured to: when the data in the complete data set are unbalanced, perform data balancing on the complete data set based on the adaptive synthetic oversampling method to obtain a balanced data set;
the training module 230 is configured to: train on the balanced data set to obtain a power Internet of things operation risk prediction model;
the prediction module 240 is configured to: obtain an operation risk prediction result for the power Internet of things under test according to its current multi-source data and the power Internet of things operation risk prediction model.
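How the four modules compose can be sketched as follows. This is a structural illustration only: each module is represented by a caller-supplied function, the class and field names are hypothetical, and applying the same fusion step to the current data before prediction is an assumption not stated in the text.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RiskPredictionSystem:
    fuse: Callable[[Any], Any]                    # fusion module 210
    is_imbalanced: Callable[[Any], bool]          # imbalance check used by module 220
    balance: Callable[[Any], Any]                 # balance module 220
    train: Callable[[Any], Callable[[Any], Any]]  # training module 230, returns a model
    model: Any = None

    def fit(self, historical_sources):
        complete = self.fuse(historical_sources)
        # Balancing is applied only when the complete data set is unbalanced.
        balanced = self.balance(complete) if self.is_imbalanced(complete) else complete
        self.model = self.train(balanced)
        return self

    def predict(self, current_sources):           # prediction module 240
        if self.model is None:
            raise RuntimeError("fit must be called before predict")
        return self.model(self.fuse(current_sources))
```

Keeping the modules as separate callables makes it straightforward to swap, for example, the balancing method or the classifier without touching the rest of the flow.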
On the one hand, measurement data of the information side, the physical side and the social side, all of which affect the safety of the power Internet of things, are introduced; the data are fused on the basis of the time sequence, and a complete data set combining the measurement data of the three sides is constructed on the basis of random matrix theory. On the other hand, the fused data are balanced by the adaptive synthetic oversampling (ADASYN) method, which generates pseudo samples highly similar to real samples and helps construct a balanced data set, overcoming the unstable performance of the risk prediction model caused by low training accuracy when some classes have too few samples. Together, these two aspects improve the prediction accuracy of the trained power Internet of things operation risk prediction model and hence the accuracy of the operation risk prediction result.
Optionally, in the above technical solution, the training module 230 is specifically configured to:
constructing a Catboost ensemble learning model by taking a symmetric decision tree as a base classifier, and training based on a balanced data set to obtain a Catboost ensemble classifier;
obtaining an optimal parameter corresponding to each parameter of the Catboost integrated classifier by using a Bayesian optimization method;
and transmitting all the optimal parameters to a Catboost integrated classifier to obtain an operation risk prediction model of the power Internet of things.
The traditional Catboost model improves classification performance by combining multiple classifiers, but its performance depends on key parameters; manual parameter tuning is somewhat blind, easily misses the optimal parameter values and takes too long, which reduces the accuracy of the risk prediction model. In the present application, the modeling process comprises two training stages: in the first stage, a Catboost ensemble learning model is constructed with symmetric decision trees as base classifiers and trained to obtain the Catboost integrated classifier; in the second stage, a Bayesian optimization algorithm is introduced to optimize the parameters of the Catboost model, so that the resulting power Internet of things operation risk prediction model has higher prediction accuracy.
Optionally, in the above technical solution, the fusion module 210 is specifically configured to:
generating a raw data set Dataset according to multi-source data in a preset historical time period,
Figure BDA0003460194390000201
where $x_i=(x_{i1},x_{i2},\ldots,x_{iN})^T$, $y_i=(y_{i1},y_{i2},\ldots,y_{iN})^T$ and $z_i=(z_{i1},z_{i2},\ldots,z_{iN})^T$; $D_c$ denotes the measurement data of the information side of the power Internet of things within the preset historical time period, $D_p$ denotes the measurement data of the physical side of the power Internet of things within the preset historical time period, and $D_s$ denotes the measurement data of the social side of the power Internet of things within the preset historical time period; $x_{i1},x_{i2},\ldots,x_{iN}$ denote the N measurement data of the information side of the power Internet of things acquired at the i-th moment in the preset historical time period, $y_{i1},y_{i2},\ldots,y_{iN}$ denote the N measurement data of the physical side acquired at the i-th moment, and $z_{i1},z_{i2},\ldots,z_{iN}$ denote the N measurement data of the social side acquired at the i-th moment, where i and N are positive integers;
based on the original data set Dataset, by taking the time sequence as a reference and utilizing a random matrix theory, constructing a complete data set D:
Figure BDA0003460194390000202
For the steps by which the parameters and unit modules of the operation risk prediction system 200 of the power Internet of things implement their corresponding functions, refer to the parameters and steps in the above embodiments of the operation risk prediction method of the power Internet of things, which are not repeated here.
The storage medium stores instructions, and when the instructions are read by a computer, the computer executes any one of the above operation risk prediction methods for the power internet of things.
The electronic device of the embodiment of the invention comprises a processor and the storage medium, wherein the processor executes instructions in the storage medium, and the electronic device can be a computer, a mobile phone and the like.
As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product.
Accordingly, the present disclosure may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of hardware and software, any of which may be referred to herein generally as a "circuit," "module" or "system." Furthermore, in some embodiments, the invention may also be embodied in the form of a computer program product in one or more computer-readable media having computer-readable program code embodied in the medium.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (8)

1. An operation risk prediction method of an electric power Internet of things is characterized by comprising the following steps:
fusing multi-source data in a preset historical time period with the time sequence as a reference to obtain a complete data set, wherein the multi-source data comprise: measurement data of the information side, measurement data of the physical side and measurement data of the social side of the power Internet of things;
when the data in the complete data set are unbalanced, performing data balancing on the complete data set based on an adaptive synthetic oversampling method to obtain a balanced data set;
training based on the balanced data set to obtain a power Internet of things operation risk prediction model;
and obtaining an operation risk prediction result of the power Internet of things under test according to the current multi-source data of the power Internet of things under test and the power Internet of things operation risk prediction model.
2. The method for predicting the operational risk of the power internet of things according to claim 1, wherein training based on the balanced data set to obtain the power Internet of things operation risk prediction model comprises:
constructing a Catboost ensemble learning model by taking a symmetric decision tree as a base classifier, and training based on the balanced data set to obtain a Catboost ensemble classifier;
obtaining an optimal parameter corresponding to each parameter of the Catboost integrated classifier by using a Bayesian optimization method;
and transmitting all the optimal parameters to the Catboost integrated classifier to obtain the power Internet of things operation risk prediction model.
3. The method for predicting the operational risk of the power internet of things according to claim 1 or 2, wherein the step of fusing multi-source data in a preset historical time period by taking the time sequence as a reference to obtain a complete data set comprises the following steps:
generating an original data set Dataset according to the multi-source data in the preset historical time period,
Figure FDA0003460194380000021
where $x_i=(x_{i1},x_{i2},\ldots,x_{iN})^T$, $y_i=(y_{i1},y_{i2},\ldots,y_{iN})^T$ and $z_i=(z_{i1},z_{i2},\ldots,z_{iN})^T$; $D_c$ denotes the measurement data of the information side of the power Internet of things within the preset historical time period, $D_p$ denotes the measurement data of the physical side of the power Internet of things within the preset historical time period, and $D_s$ denotes the measurement data of the social side of the power Internet of things within the preset historical time period; $x_{i1},x_{i2},\ldots,x_{iN}$ denote the N measurement data of the information side of the power Internet of things acquired at the i-th moment in the preset historical time period, $y_{i1},y_{i2},\ldots,y_{iN}$ denote the N measurement data of the physical side acquired at the i-th moment, and $z_{i1},z_{i2},\ldots,z_{iN}$ denote the N measurement data of the social side acquired at the i-th moment, where i and N are positive integers;
based on the raw data set Dataset, constructing the complete data set D by using a random matrix theory and taking a time sequence as a reference:
Figure FDA0003460194380000022
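The time-sequence alignment underlying this fusion step can be sketched in Python as follows. The sketch only shows how information-side, physical-side and social-side measurements taken at the same moment might be joined into one row; it assumes an inner join on the timestamp, and it does not reproduce the random-matrix-theory construction of D, whose formula is given only in the figure. The function name `fuse_by_time`, the `Series` type and the sample values are hypothetical.

```python
from typing import Dict, List, Sequence

# timestamp -> the N measurements acquired at that moment (hypothetical layout)
Series = Dict[int, Sequence[float]]


def fuse_by_time(d_c: Series, d_p: Series, d_s: Series) -> Dict[int, List[float]]:
    """Concatenate the three sources row by row for moments present in all of them.

    Assumption: moments missing from any one source are dropped (inner join).
    """
    common = sorted(set(d_c) & set(d_p) & set(d_s))
    return {t: [*d_c[t], *d_p[t], *d_s[t]] for t in common}


# Made-up measurements for three moments; the social side lacks moment 3.
d_c = {1: [0.10, 0.20], 2: [0.30, 0.40], 3: [0.50, 0.60]}
d_p = {1: [220.0, 5.0], 2: [221.0, 5.5], 3: [219.0, 5.2]}
d_s = {1: [1.0, 0.0], 2: [0.0, 1.0]}
fused = fuse_by_time(d_c, d_p, d_s)
```

Each fused row holds the x, y and z vectors of one moment side by side, so a row can later carry a single risk label.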
4. An operation risk prediction system of a power Internet of things, characterized by comprising a fusion module, a balancing module, a training module and a prediction module;
the fusion module is configured to: fuse multi-source data in a preset historical time period by taking the time sequence as a reference to obtain a complete data set, wherein the multi-source data comprises: measurement data of an information side, measurement data of a physical side and measurement data of a social side of the power Internet of things;
the balancing module is configured to: when the data in the complete data set are unbalanced, perform data balancing processing on the complete data set based on a self-adaptive comprehensive oversampling method to obtain a balanced data set;
the training module is configured to: train on the balanced data set to obtain a power Internet of things operation risk prediction model;
the prediction module is configured to: obtain an operation risk prediction result of the power Internet of things to be tested according to current multi-source data of the power Internet of things to be tested and the power Internet of things operation risk prediction model.
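The balancing step can be illustrated with the following pure-Python sketch. It assumes that the "self-adaptive comprehensive oversampling method" corresponds to the adaptive synthetic sampling approach commonly called ADASYN; that mapping is an assumption, not something the claims state. The function names `adasyn_like` and `_nearest`, the parameter `k` and the fallback to uniform weights are hypothetical choices made for illustration, and a production system would more likely rely on an existing library implementation.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def _nearest(target: Point, pool: List[Point], k: int) -> List[int]:
    """Indices of the k points in `pool` closest to `target`, excluding `target` itself."""
    order = sorted(
        (i for i, p in enumerate(pool) if p is not target),
        key=lambda i: math.dist(target, pool[i]),
    )
    return order[:k]


def adasyn_like(minority: List[Point], majority: List[Point],
                k: int = 3, seed: int = 0) -> List[List[float]]:
    """Generate synthetic minority samples, more of them near hard-to-learn points."""
    rng = random.Random(seed)
    n_minority = len(minority)
    deficit = len(majority) - n_minority  # samples needed to balance the classes
    if deficit <= 0 or n_minority < 2:
        return []
    everything = list(minority) + list(majority)
    # r_i: share of majority points among the nearest neighbours of minority point i
    ratios = []
    for x in minority:
        neighbours = _nearest(x, everything, k)
        ratios.append(sum(1 for j in neighbours if j >= n_minority) / len(neighbours))
    total = sum(ratios)
    # Assumption: if no minority point has majority neighbours, spread evenly.
    weights = [r / total for r in ratios] if total > 0 else [1 / n_minority] * n_minority
    synthetic = []
    for x, w in zip(minority, weights):
        mates = _nearest(x, minority, k)  # minority-only neighbours to interpolate towards
        for _ in range(round(w * deficit)):
            mate = minority[rng.choice(mates)]
            lam = rng.random()
            synthetic.append([a + lam * (b - a) for a, b in zip(x, mate)])
    return synthetic
```

Because each per-point count is rounded separately, the number of synthetic samples can differ slightly from the exact class deficit; each synthetic sample lies on the segment between two existing minority samples.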
5. The operation risk prediction system of the power Internet of things according to claim 4, wherein the training module is specifically configured to:
construct a CatBoost ensemble learning model with a symmetric decision tree as the base classifier, and train the model on the balanced data set to obtain a CatBoost ensemble classifier;
obtain an optimal value for each parameter of the CatBoost ensemble classifier by using a Bayesian optimization method;
and pass all the optimal parameter values to the CatBoost ensemble classifier to obtain the power Internet of things operation risk prediction model.
6. The operation risk prediction system of the power Internet of things according to claim 4 or 5, wherein the fusion module is specifically configured to:
generate a raw data set Dataset according to the multi-source data in the preset historical time period,
Figure FDA0003460194380000031
wherein $x_i = (x_{i1}, x_{i2}, \ldots, x_{iN})^T$, $y_i = (y_{i1}, y_{i2}, \ldots, y_{iN})^T$ and $z_i = (z_{i1}, z_{i2}, \ldots, z_{iN})^T$; $D_c$ represents the measurement data of the information side of the power Internet of things within the preset historical time period; $D_p$ represents the measurement data of the physical side of the power Internet of things within the preset historical time period; $D_s$ represents the measurement data of the social side of the power Internet of things within the preset historical time period; $x_{i1}, x_{i2}, \ldots, x_{iN}$ represent the N measurement data of the information side of the power Internet of things acquired at the i-th moment in the preset historical time period; $y_{i1}, y_{i2}, \ldots, y_{iN}$ represent the N measurement data of the physical side of the power Internet of things acquired at the i-th moment in the preset historical time period; $z_{i1}, z_{i2}, \ldots, z_{iN}$ represent the N measurement data of the social side of the power Internet of things acquired at the i-th moment in the preset historical time period; and i and N are positive integers;
and, based on the raw data set Dataset, construct the complete data set D by using random matrix theory and taking the time sequence as a reference:
Figure FDA0003460194380000041
7. A storage medium, wherein instructions are stored in the storage medium, and when the instructions are read by a computer, the instructions cause the computer to execute the operation risk prediction method of the power Internet of things according to any one of claims 1 to 3.
8. An electronic device comprising a processor and the storage medium of claim 7, the processor executing instructions in the storage medium.
CN202210015149.1A 2022-01-07 2022-01-07 Operation risk prediction method and system of power Internet of things and electronic equipment Pending CN114511194A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210015149.1A CN114511194A (en) 2022-01-07 2022-01-07 Operation risk prediction method and system of power Internet of things and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210015149.1A CN114511194A (en) 2022-01-07 2022-01-07 Operation risk prediction method and system of power Internet of things and electronic equipment

Publications (1)

Publication Number Publication Date
CN114511194A true CN114511194A (en) 2022-05-17

Family

ID=81549495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210015149.1A Pending CN114511194A (en) 2022-01-07 2022-01-07 Operation risk prediction method and system of power Internet of things and electronic equipment

Country Status (1)

Country Link
CN (1) CN114511194A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117522177A (en) * 2024-01-08 2024-02-06 国网江苏省电力有限公司信息通信分公司 Smart power grid stability prediction method
CN117522177B (en) * 2024-01-08 2024-03-12 国网江苏省电力有限公司信息通信分公司 Smart power grid stability prediction method

Similar Documents

Publication Publication Date Title
CN111523778A (en) Power grid operation safety assessment method based on particle swarm algorithm and gradient lifting tree
CN112508442A (en) Transient stability evaluation method and system based on automation and interpretable machine learning
CN113935440A (en) Iterative evaluation method and system for error state of voltage transformer
CN109389253B (en) Power system frequency prediction method after disturbance based on credibility ensemble learning
CN107274015A (en) A kind of method and system of prediction of wind speed
CN116401532B (en) Method and system for recognizing frequency instability of power system after disturbance
CN112200038A (en) CNN-based rapid identification method for oscillation type of power system
CN109378835A (en) Based on the large-scale electrical power system Transient Stability Evaluation system that mutual information redundancy is optimal
CN114492675A (en) Intelligent fault cause diagnosis method for capacitor voltage transformer
CN114386024A (en) Power intranet terminal equipment abnormal attack detection method based on ensemble learning
CN114511194A (en) Operation risk prediction method and system of power Internet of things and electronic equipment
CN108805419B (en) Power grid node importance calculation method based on network embedding and support vector regression
Zhang et al. Prediction for the maximum frequency deviation of post-disturbance based on the deep belief network
CN116127447A (en) Virtual power plant false data injection attack detection method, device, terminal and medium
CN116204771A (en) Power system transient stability key feature selection method, device and product
CN112967154B (en) Assessment method and device for Well-rolling of power system
Li et al. Fault location of active distribution network based on improved gray wolf algorithm
CN115598459A (en) Power failure prediction method for 10kV feeder line fault of power distribution network
CN113721461A (en) New energy unit parameter identification method and system based on multiple test scenes
Zhou et al. Transient stability assessment of large-scale AC/DC hybrid power grid based on separation feature and deep belief networks
CN112748358A (en) Power distribution network ground fault identification method and system based on artificial immune network
Yu et al. Probabilistic stability of small disturbance in wind power system based on a variational Bayes and Lyapunov theory using PMU data
CN117390418B (en) Transient stability evaluation method, system and equipment for wind power grid-connected system
CN116702629B (en) Power system transient stability evaluation method with migration capability
Wang et al. Transient Stability Evaluation of Power System based on Neighborhood Rough Set and Extreme Learning Machine [J]

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination