CN114692729A - New energy station bad data identification and correction method based on deep learning
- Publication number: CN114692729A
- Application number: CN202210230736.2A
- Authority: CN (China)
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/044: Recurrent networks, e.g. Hopfield networks
- G06N3/08: Learning methods
- G06Q50/06: Energy or water supply
Abstract
The invention provides a new energy station bad data identification and correction method based on deep learning, which comprises the following steps: acquiring historical operating data of an identification object in the new energy station, and marking historical normal data and historical bad data in the historical operating data; establishing an identification model, and performing deep learning training on the identification model according to historical normal data; establishing a correction model, inputting historical bad data into the correction model and the trained identification model, and performing deep learning training on the correction model by combining the output of the identification model; acquiring real-time operation data of an identification object, and inputting the real-time operation data into a trained identification model to distinguish real-time normal data and real-time bad data in the real-time operation data; and inputting the real-time bad data into the trained correction model to obtain the corrected value of the real-time bad data. The method can obviously improve the efficiency of identifying and correcting the bad data and ensure the real-time safe and stable operation of the new energy power station.
Description
Technical Field
The invention belongs to the field of energy data management, and particularly relates to a new energy station bad data identification and correction method based on deep learning.
Background
As the construction of new energy stations continues to expand, the data they acquire are becoming both high-volume and high-dimensional, and the problem of bad data is increasingly prominent. Bad data, such as missing, invalid, repeated and erroneous values, often appear in the real-time data collected at a new energy station, and usually have one of two causes. First, faults in the new energy power system, such as the temporary interruption of a data channel in the data acquisition system, produce data that do not reflect the true state. Second, special events, such as sudden accidental fluctuations of large industrial loads or abrupt adverse environmental conditions, cause irregular oscillations in the data. Bad data distort the state estimation results of the new energy station, affect the operation scheduling and stable operation of the power system, and may even lead to unforeseen safety consequences.
Each type of new energy station produces a huge volume of data with varied relationships: station, unit, environmental and other data are coupled with one another, and coupling also exists within each of these groups. With the continuing development of modern information technology, artificial intelligence has been applied in many fields, and deep learning is widely used for pattern recognition, classification and load prediction in new energy scenarios. Deep learning adapts well to the time-varying patterns of time series, has memory and association functions for historical information, and can continuously learn from massive, coupled data. Bad data of a new energy station can therefore be identified and corrected by deep learning, which helps guarantee the safe and stable operation of the station.
Disclosure of Invention
In order to solve the defects in the prior art, the invention provides a new energy station bad data identification and correction method based on deep learning, which comprises the following steps:
S100: acquiring historical operating data of an identification object in the new energy station, and marking historical normal data and historical bad data in the historical operating data;
S200: establishing an identification model, and performing deep learning training on the identification model according to the historical normal data;
S300: establishing a correction model, inputting the historical bad data into the correction model and the trained identification model, and performing deep learning training on the correction model by combining the output of the identification model;
S400: acquiring real-time operation data of the identification object, and inputting the real-time operation data into the trained identification model to distinguish real-time normal data and real-time bad data in the real-time operation data;
S500: inputting the real-time bad data into the trained correction model to obtain the corrected value of the real-time bad data.
Optionally, the identification object includes a wind power generation parameter, a photovoltaic power generation parameter, and installed capacity and active power of a unit in the new energy station;
the wind power generation parameters comprise wind speed, temperature, wind direction cosine value, humidity and pressure intensity;
the photovoltaic power generation parameters comprise irradiation intensity, irradiation duration and assembly area.
Optionally, the S200 includes:
S210: dividing the historical normal data into training data and test data in time order, and initializing the hyper-parameters of the identification model;
S220: inputting the training data into the identification model for training, and calculating, through the identification model, predicted values of the identification object at the time steps corresponding to the test data;
S230: calculating whether the convergence accuracy of the identification model meets a preset threshold, wherein the calculation formula of the convergence accuracy is as follows:
where $a$ is the convergence accuracy, $n$ is the total number of predicted values, $x_{fi}$ is the i-th test data value, and $x_i$ is the i-th predicted value;
S240: if the convergence accuracy meets the preset threshold, the training is finished; otherwise, the hyper-parameters of the identification model are adjusted and S220 to S230 are repeated until the convergence accuracy does not exceed the preset threshold.
Optionally, the method further includes: preprocessing historical operating data before S200, including:
identifying missing values in the historical operating data, acquiring the historical operating data belonging to the same class as each missing value, calculating the same-class mean to obtain an interpolation value for the missing value, and replacing the missing value with the interpolation value, wherein the calculation formula of the interpolation value is as follows:
where $a_i$ is the mean coefficient, equal to 0 when the i-th input historical operating data value $s_i$ is missing and 1 otherwise, $m$ is the total number of historical operating data values of the same class, and the result of the formula is the interpolation value.
Optionally, the S300 includes:
S310: initializing hyper-parameters of the identification model;
S320: inputting the historical bad data into the trained identification model, and taking the predicted values output by the identification model for the historical bad data as accurate values;
S330: inputting the historical bad data into the correction model for training, analyzing the characteristics of the historical bad data through the correction model, and outputting correction values for the historical bad data according to the analysis result;
S340: analyzing the degree of error and the degree of error dispersion of the correction values relative to the accurate values, and finishing training when the analysis result meets a preset condition; otherwise, adjusting the hyper-parameters of the correction model and repeating S320 to S330 until the preset condition is met.
Optionally, the analyzing the degree of error and the degree of error dispersion of the corrected value with respect to the accurate value includes:
analyzing the degree of error by calculating the mean absolute error of the corrected value relative to the accurate value;
and analyzing the degree of error dispersion by calculating the root mean square error of the corrected value relative to the accurate value.
Optionally, the method further includes:
while executing S500, selecting normal data with a preset proportion and inputting the normal data into the correction model, and calculating the average absolute error and the root mean square error of the output value of the correction model and the normal data;
and when any one of the average absolute error and the root mean square error does not meet the preset condition, taking the real-time operation data input during the execution of the S400 as training data, and re-training the identification model and the correction model.
Optionally, the identification model is provided with a solver on an output layer, and the solver is a softmax function.
Optionally, the solver outputs the probability of the real-time bad data by comparing the calculated predicted value with the error of the actually measured real-time operation data, and outputs the identified real-time bad data and the time sequence position where the real-time bad data is located according to the probability.
The technical scheme provided by the invention has the beneficial effects that:
By using deep learning to jointly establish and train the identification model and the correction model, and using the resulting models to rapidly identify and correct, in real time, the bad data collected in the new energy station, the method can significantly improve the efficiency of bad data identification and correction, support real-time analysis applications of the new energy power station, and help guarantee its safe and stable real-time operation.
Drawings
In order to illustrate the technical solution of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a new energy station bad data identification and correction method based on deep learning according to an embodiment of the present invention;
Fig. 2 is a line graph showing the relationship between the number of neurons and the convergence accuracy of the neural network.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
It should be understood that, in various embodiments of the present invention, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
It should be understood that in the present application, "comprising" and "having" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that, in the present invention, "a plurality" means two or more. "And/or" merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "Comprises A, B and C" and "comprises A, B, C" mean that all three of A, B and C are comprised; "comprises A, B or C" means that one of A, B and C is comprised; and "comprises A, B and/or C" means that any one, any two, or all three of A, B and C are comprised.
It should be understood that, in the present invention, "B corresponding to A", "A corresponds to B", or "B corresponds to A" means that B is associated with A and that B can be determined from A. Determining B from A does not mean that B is determined from A alone; B may also be determined from A and/or other information. A "matching" B means that the similarity between A and B is greater than or equal to a preset threshold.
As used herein, "if" may be interpreted as "at … …" or "when … …" or "in response to a determination" or "in response to a detection", depending on the context.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
As shown in fig. 1, the present embodiment provides a method for identifying and correcting bad data of a new energy station based on deep learning, including:
S100: acquiring historical operating data and real-time operating data of an identification object in the new energy station, and marking historical normal data and historical bad data in the historical operating data;
S200: establishing an identification model, and performing deep learning training on the identification model according to the historical normal data;
S300: establishing a correction model, inputting the historical bad data into the correction model and the trained identification model, and performing deep learning training on the correction model by combining the output of the identification model;
S400: distinguishing real-time normal data and real-time bad data in the real-time operation data by inputting the real-time operation data into the trained identification model;
S500: inputting the real-time bad data into the trained correction model to obtain the corrected value of the real-time bad data.
In this embodiment, a deep neural network algorithm learns from the historical data collected by the new energy power station to obtain a deep neural network identification model that meets the accuracy requirement. Data acquired in real time are input into the identification model to obtain a predicted value, which is taken as the accurate value. A deviation threshold is set and the actually acquired data are compared with the accurate value; data that fall outside the threshold range are determined to be bad data, so that bad data are identified in real time as they are acquired. This process exploits the strengths of deep learning: it adapts well to the time-varying patterns of time series, has memory and association functions for historical information, and can continuously learn from massive, coupled data. Bad data can therefore be identified and corrected rapidly in real time, which significantly improves the efficiency of bad data identification and correction, supports real-time analysis applications of the new energy power station, and helps guarantee its safe and stable real-time operation.
In this embodiment, the identification object includes a wind power generation parameter, a photovoltaic power generation parameter, and an installed capacity and active power of a unit in the new energy station; the wind power generation parameters comprise wind speed, temperature, wind direction cosine value, humidity and pressure intensity; the photovoltaic power generation parameters comprise irradiation intensity, irradiation duration and assembly area.
In this embodiment, both the historical operating data and the real-time operating data are acquired by the SCADA system, which performs data acquisition once every 15 minutes. Because uncontrollable random faults of the SCADA system may introduce obvious errors into the collected data, which would affect the accuracy of the subsequently trained models, this embodiment preprocesses the historical operating data and the real-time operating data before S200, including:
identifying missing values in historical operating data, acquiring historical operating data belonging to the same class as the missing values, calculating the same-class mean value of the missing values to obtain interpolation values of the missing values, and replacing the missing values with the interpolation values, wherein the calculation formula of the interpolation values is as follows:
where $a_i$ is the mean coefficient, equal to 0 when the i-th input historical operating data or real-time operating data value $s_i$ is missing and 1 otherwise, $m$ is the total number of historical or real-time operating data values of the same class, and the result of the formula is the interpolation value.
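The same-class mean replacement described above can be sketched in a few lines. The formula itself is not reproduced in this text, so the sketch assumes that the interpolation value is the sum of a_i * s_i divided by the sum of a_i, that is, the mean of the non-missing samples of the same class. The function name and the use of `None` to mark a missing value are illustrative assumptions, not part of the patent.

```python
from typing import List, Optional


def impute_with_class_mean(samples: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the non-missing ones.

    Assumption: interpolation value = sum(a_i * s_i) / sum(a_i),
    where a_i is 0 for a missing sample and 1 otherwise.
    """
    # a_i = 0 when s_i is missing, 1 otherwise
    coeffs = [0 if s is None else 1 for s in samples]
    valid_count = sum(coeffs)
    if valid_count == 0:
        raise ValueError("no valid same-class samples to average")
    total = sum(s for s in samples if s is not None)
    interpolation_value = total / valid_count
    # Keep measured values, fill only the gaps
    return [interpolation_value if s is None else s for s in samples]


# Hypothetical wind-speed readings (m/s) of one class with one missing value.
print(impute_with_class_mean([4.0, None, 6.0, 5.0]))  # [4.0, 5.0, 6.0, 5.0]
```

Averaging only over the valid samples keeps a gap from pulling the mean toward zero, which would happen if the sum were divided by the total count m instead.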
In this embodiment, data preprocessing yields a data set of 35040 samples spanning 1 year. To preserve time continuity, the last 1000 samples are set aside as the real-time identification data set, and the remaining samples are used as the historical operating data for constructing and training the identification model and the correction model. Historical normal data and historical bad data in the historical operating data are marked in advance based on experience.
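The sample count is consistent with one reading every 15 minutes over a 365-day year (96 readings per day). A minimal sketch of the arithmetic and of the chronological hold-out follows; the function name and the placeholder records are assumptions made for illustration.

```python
SAMPLES_PER_DAY = 24 * 60 // 15        # one SCADA reading every 15 minutes
TOTAL_SAMPLES = SAMPLES_PER_DAY * 365  # one year of readings
REALTIME_HOLDOUT = 1000                # last samples kept as the real-time set


def split_chronologically(records, holdout=REALTIME_HOLDOUT):
    """Return (historical, realtime) without shuffling, preserving time order."""
    if holdout <= 0 or holdout >= len(records):
        raise ValueError("holdout must be between 1 and len(records) - 1")
    return records[:-holdout], records[-holdout:]


records = list(range(TOTAL_SAMPLES))   # placeholder for the preprocessed samples
historical, realtime = split_chronologically(records)
print(TOTAL_SAMPLES, len(historical), len(realtime))  # 35040 34040 1000
```

Slicing rather than random sampling keeps the real-time set strictly later than every historical sample, so the models are never trained on data from the period they are evaluated on.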
First, the identification model is trained. S200 includes:
S210: dividing the historical normal data into training data and test data in time order, and initializing the hyper-parameters of the identification model;
S220: inputting the training data into the identification model for training, and calculating, through the identification model, predicted values of the identification object at the time steps corresponding to the test data;
S230: calculating whether the convergence accuracy of the identification model meets a preset threshold, wherein the calculation formula of the convergence accuracy is as follows:
where $a$ is the convergence accuracy, $n$ is the total number of predicted values, $x_{fi}$ is the i-th test data value, and $x_i$ is the i-th predicted value;
S240: if the convergence accuracy meets the preset threshold, the training is finished; otherwise, the hyper-parameters of the identification model are adjusted and S220 to S230 are repeated until the convergence accuracy does not exceed the preset threshold.
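The loop formed by S210 to S240 can be sketched as below. Two things are assumptions, because the text does not reproduce them: the convergence accuracy is taken here to be the mean absolute relative error between test values and predictions (an error-like quantity, so smaller is better), and the model is replaced by a toy predictor, `mean_of_last`, that stands in for the neural network. All names are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def convergence_accuracy(test: Sequence[float], predicted: Sequence[float]) -> float:
    """Assumed metric: mean absolute relative error (test values must be non-zero)."""
    n = len(predicted)
    return sum(abs(xf - x) / abs(xf) for xf, x in zip(test, predicted)) / n


def train_identification_model(
    history: Sequence[float],
    candidates: Iterable[int],
    fit_predict: Callable[[Sequence[float], int, int], List[float]],
    threshold: float,
    train_fraction: float = 0.8,
) -> Tuple[int, float]:
    # S210: chronological split, no shuffling
    cut = int(len(history) * train_fraction)
    train, test = history[:cut], history[cut:]
    for hyper in candidates:
        # S220: train, then predict over the time steps of the test data
        predicted = fit_predict(train, hyper, len(test))
        # S230: evaluate the convergence accuracy
        a = convergence_accuracy(test, predicted)
        # S240: stop once the value does not exceed the threshold
        if a <= threshold:
            return hyper, a
    raise RuntimeError("no candidate hyper-parameter met the threshold")


def mean_of_last(train: Sequence[float], window: int, horizon: int) -> List[float]:
    """Toy stand-in for a neural network: repeats the mean of the last `window` points."""
    level = sum(train[-window:]) / window
    return [level] * horizon


history = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 10.0]
best, score = train_identification_model(history, [1, 2, 4], mean_of_last, threshold=0.005)
print(best, score)
```

In this toy run the first candidate (window 1) misses the threshold and the second (window 2) passes, which mirrors the "adjust and repeat" behaviour of S240.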
In this embodiment, the hyper-parameters of the identification model are set first, including the initial network structure, the network thresholds and the weights. Because the identification of bad data is essentially a classification problem, the output layer of the identification model is provided with a solver that uses the softmax function, while the activation function is the sigmoid function. The solver outputs the probability that real-time data are bad by comparing the error between the calculated predicted value and the actually measured real-time operation data, and, according to that probability, outputs the identified real-time bad data and their positions in the time series.
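For reference, the softmax function used by the output-layer solver turns a vector of scores into probabilities that sum to 1. The sketch below is a generic, numerically stable implementation; the two-class example scores (normal, bad) are hypothetical and are not taken from the patent.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output-layer scores for the classes (normal, bad).
probs = softmax([2.0, 0.5])
print(probs)
```

Subtracting the maximum score does not change the result but avoids overflow when the scores are large.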
Specifically, during training of the identification model, the training data are used to predict the operating data of the identification object over a subsequent time sequence that matches the time sequence of the test data, and the identification accuracy of the model is then judged from the error between the predicted values and the test data. When S400 is subsequently executed, the identification model predicts the subsequent time sequence from the real-time operation data of the preceding time sequence and compares the prediction with the real-time operation data of the subsequent time sequence, that is, with the measured values of the identification object. Through the solver of its output layer, the identification model sets a threshold Δe that floats up and down around the output accurate value, and compares each output accurate value with the corresponding measured value; a measured value outside the threshold range is regarded as bad data. The threshold Δe is:
where $x_{max}$ is the largest measured value and $x_{min}$ is the smallest measured value.
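The comparison of measured values against predicted accurate values can be sketched as follows. The formula for Δe is not reproduced in this text; the sketch assumes, purely for illustration, that Δe is a fixed fraction of the measured range, `ratio * (x_max - x_min)`, where `ratio` is a hypothetical parameter.

```python
from typing import List, Sequence


def flag_bad_data(measured: Sequence[float], predicted: Sequence[float],
                  ratio: float = 0.1) -> List[int]:
    """Return the time-step indices whose measured value lies outside the band.

    Assumption: delta_e = ratio * (x_max - x_min). The source does not
    reproduce the actual formula, and `ratio` is a hypothetical parameter.
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    delta_e = ratio * (max(measured) - min(measured))
    # A point is bad when it deviates from the prediction by more than delta_e
    return [t for t, (x, p) in enumerate(zip(measured, predicted))
            if abs(x - p) > delta_e]


measured = [5.0, 5.2, 9.0, 5.1, 4.9]    # hypothetical readings with one spike
predicted = [5.0, 5.1, 5.2, 5.1, 5.0]   # hypothetical model output
print(flag_bad_data(measured, predicted))  # [2]
```

Returning indices rather than values matches the description that the model reports both the bad data and their positions in the time series.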
Because the number of samples used in this embodiment is large, two hidden layers are sufficient when both accuracy and processing speed are considered. Fig. 2 shows the convergence accuracy of bad data identification for 5, 10, 15, 20, 25 and 30 neurons per hidden layer; the horizontal axis is the number of neurons and the vertical axis is the corresponding convergence accuracy, and for comparison every layer has the same number of neurons. During training, the simulation time grows almost multiplicatively as the number of neurons increases, while the accuracy of the identification model rises gradually. Once both hidden layers have 20 nodes, the identification result no longer improves noticeably: the accuracy of the deep model increases monotonically and then levels off, which shows that the performance of the identification model is gradually optimized as the hidden layers grow. Weighing model time against performance, this embodiment selects 20 nodes per hidden layer for the identification model, and the resulting bad data identification results are shown in Table 1. As can be seen from Table 1, the model accuracy for each type of data exceeds 97%, which indicates that the identification model identifies the different types of bad data with high accuracy and good convergence.
TABLE 1
After the identification model has been trained, the correction model is trained. The correction model in this embodiment is a BP neural network trained with the trainbr algorithm, which has better functional capability and a higher convergence rate than the basic gradient algorithm and is better suited to mutually coupled data sets.
Specifically, the S300 includes:
S310: initializing hyper-parameters of the identification model, namely determining the neural network structure, the network thresholds and the weights;
S320: normalizing the historical bad data, inputting the normalized historical bad data into the trained identification model, and taking the predicted values output by the identification model for the historical bad data as accurate values; specifically, the historical bad data are normalized in Matlab with the mapminmax function;
S330: inputting the historical bad data into the correction model, analyzing the characteristics of the historical bad data through the correction model, and outputting correction values for the historical bad data according to the analysis result;
S340: analyzing the degree of error and the degree of error dispersion of the correction values relative to the accurate values, and finishing training when the analysis result meets the preset condition; otherwise, adjusting the hyper-parameters of the correction model, namely the neural network structure, the network thresholds and the weights, and repeating S320 to S330 until the preset condition is met.
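The normalization in S320 uses Matlab's mapminmax, which by default rescales each row linearly to the range [-1, 1]. A Python sketch of the same linear mapping for a single series is shown below; the function names are hypothetical, and the handling of a constant series (raising an error) is a choice made for this sketch rather than a description of Matlab's behaviour.

```python
from typing import List, Sequence, Tuple


def map_min_max(values: Sequence[float], lower: float = -1.0,
                upper: float = 1.0) -> Tuple[List[float], Tuple[float, float]]:
    """Linearly rescale values to [lower, upper]; also return (x_min, x_max)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("constant input cannot be rescaled by this sketch")
    scale = (upper - lower) / (x_max - x_min)
    scaled = [lower + (x - x_min) * scale for x in values]
    return scaled, (x_min, x_max)


def reverse_map(scaled: Sequence[float], settings: Tuple[float, float],
                lower: float = -1.0, upper: float = 1.0) -> List[float]:
    """Map normalized values back to the original units."""
    x_min, x_max = settings
    scale = (x_max - x_min) / (upper - lower)
    return [x_min + (y - lower) * scale for y in scaled]


power = [0.0, 5.0, 10.0]               # hypothetical active power readings
scaled, settings = map_min_max(power)
print(scaled)                          # [-1.0, 0.0, 1.0]
print(reverse_map(scaled, settings))   # [0.0, 5.0, 10.0]
```

Keeping the minimum and maximum from the training data makes it possible to apply the same mapping to later data and to convert model outputs back to physical units.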
In this embodiment, the hyper-parameters of the identification model are set first, including the initial network structure, the network thresholds and the weights. When selecting the number of hidden layers and the number of hidden layer nodes of the neural network, too few hidden layer nodes leave the network without the necessary learning and information processing capacity; conversely, too many make the network more complex, slow down processing, and make the network more prone to falling into local minima during learning. Therefore, in this embodiment, when the identification model has multiple outputs, using more than 2 hidden layers gives better identification and correction of bad data, and the number of nodes and the other parameters of each layer are obtained during training.
Analyzing the error degree and the error dispersion degree of the correction value relative to the accurate value includes:
analyzing the error degree by calculating the mean absolute error (MAE) of the correction value relative to the accurate value, wherein the specific calculation formula is as follows:
$$\mathrm{MAE} = \frac{1}{l}\sum_{i=1}^{l}\left|y_i - y_{ti}\right|$$
where $y_i$ is the i-th correction value, $y_{ti}$ is the i-th accurate value, and $l$ is the total number of correction values.
Analyzing the error dispersion degree by calculating the root mean square error (RMSE) of the correction value relative to the accurate value, wherein the specific calculation formula is as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{l}\sum_{i=1}^{l}\left(y_i - y_{ti}\right)^2}$$
where $y_i$, $y_{ti}$ and $l$ have the same meanings as for the mean absolute error.
Specifically, the error degree and the error dispersion degree of the correction value relative to the accurate value during training of the correction model are shown in Table 2. The values of the correction evaluation indexes RMSE and MAE are small, meaning that the correction values are relatively close to the actual values, which shows that the method corrects the various types of bad data of the new energy station well. Both indexes meet the preset condition, so the training of the correction model is finished.
TABLE 2
Data type | RMSE | MAE |
---|---|---|
Active power | 0.47706 | 0.07211 |
Wind speed | 0.51107 | 0.05084 |
Wind direction cosine value | 0.17631 | 0.13896 |
Temperature | 0.08850 | 0.00889 |
Pressure intensity | 0.35447 | 0.05220 |
Humidity | 0.41174 | 0.03381 |
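The two evaluation indexes reported in Table 2 can be computed with their usual definitions: MAE is the mean of the absolute differences between correction values and accurate values, and RMSE is the square root of the mean squared difference. The following Python sketch shows one way to do this; the function names are illustrative and not taken from the patent.

```python
import math
from typing import Sequence


def mean_absolute_error(corrected: Sequence[float],
                        accurate: Sequence[float]) -> float:
    """MAE: average magnitude of the correction error."""
    if len(corrected) != len(accurate) or len(corrected) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(y - yt) for y, yt in zip(corrected, accurate))
    return total / len(corrected)


def root_mean_square_error(corrected: Sequence[float],
                           accurate: Sequence[float]) -> float:
    """RMSE: penalizes large individual errors more than MAE does."""
    if len(corrected) != len(accurate) or len(corrected) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((y - yt) ** 2 for y, yt in zip(corrected, accurate))
    return math.sqrt(total / len(corrected))
```

For the same data, RMSE is never smaller than MAE, which is consistent with every row of Table 2; a large gap between the two suggests that a few points carry most of the error.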
In order to cope with random changes in the operating conditions of the new energy station, the embodiment further includes: while executing S500, selecting a preset proportion of the normal data and inputting it into the correction model, and calculating the mean absolute error and the root mean square error between the output values of the correction model and the normal data; and when either the mean absolute error or the root mean square error does not meet the preset condition, taking the real-time operation data input during the execution of S400 as training data and retraining the identification model and the correction model. Through this process the hyper-parameters of the identification model and the correction model are adjusted in time, so that the accuracy of both models remains optimized.
In this embodiment, the real-time normal data is left unchanged. While the real-time bad data is being corrected, part of the real-time normal data is also input into the trained correction model, and the error indexes of the corrected output are computed for comparison. If both correction evaluation indexes, RMSE and MAE, meet the preset condition, the accuracy of the current models meets the requirement, and the hyper-parameters of the identification model and the correction model need not be adjusted for the time being.
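The accuracy check described here can be sketched as follows in Python. Several details are assumptions made only for illustration: the function name `needs_retraining`, the `correct` callable standing in for the trained correction model, random sampling as the way of selecting the preset proportion, and the interpretation of the preset condition as upper limits `mae_limit` and `rmse_limit`.

```python
import math
import random
from typing import Callable, Optional, Sequence


def needs_retraining(normal_data: Sequence[float],
                     correct: Callable[[float], float],
                     sample_ratio: float,
                     mae_limit: float,
                     rmse_limit: float,
                     seed: Optional[int] = None) -> bool:
    """Return True when either error index exceeds its (assumed) limit."""
    if not normal_data:
        raise ValueError("normal_data must not be empty")
    if not 0.0 < sample_ratio <= 1.0:
        raise ValueError("sample_ratio must lie in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(len(normal_data) * sample_ratio))
    sample = rng.sample(list(normal_data), k)
    # Normal data should pass through the correction model almost unchanged,
    # so the deviation of the output from the input is treated as the error.
    errors = [correct(x) - x for x in sample]
    mae = sum(abs(e) for e in errors) / k
    rmse = math.sqrt(sum(e * e for e in errors) / k)
    return mae > mae_limit or rmse > rmse_limit
```

When the function returns True, the real-time operation data collected during S400 would be used to retrain both models; otherwise the current hyper-parameters are kept.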
The sequence numbers in the above embodiments are merely for description, and do not represent the sequence of the assembly or the use of the components.
The above description is intended to illustrate the present invention and should not be taken as limiting it; the invention is intended to cover various modifications, equivalents, and improvements that may be made within its spirit and scope.
Claims (9)
1. A method for identifying and correcting bad data of a new energy station based on deep learning, characterized by comprising the following steps:
S100: acquiring historical operating data and real-time operating data of an identification object in the new energy station, and marking historical normal data and historical bad data in the historical operating data;
S200: establishing an identification model, and performing deep learning training on the identification model according to the historical normal data;
S300: establishing a correction model, inputting the historical bad data into the correction model and the trained identification model, and performing deep learning training on the correction model in combination with the output of the identification model;
S400: distinguishing real-time normal data and real-time bad data in the real-time operation data by inputting the real-time operation data into the trained identification model;
S500: inputting the real-time bad data into the trained correction model to obtain the corrected value of the real-time bad data.
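The real-time flow of S400 and S500 can be illustrated with a short Python sketch. It does not implement the models themselves: `is_bad` and `correct` are hypothetical callables standing in for the trained identification model and the trained correction model, and the function name `identify_and_correct` is likewise an assumption.

```python
from typing import Callable, List, Sequence, Tuple


def identify_and_correct(stream: Sequence[float],
                         is_bad: Callable[[int, float], bool],
                         correct: Callable[[int, float], float]
                         ) -> Tuple[List[float], List[int]]:
    """Replace points flagged as bad and leave normal points untouched.

    Returns the cleaned series and the time-sequence positions that
    were flagged as bad data.
    """
    cleaned: List[float] = []
    bad_positions: List[int] = []
    for t, value in enumerate(stream):
        if is_bad(t, value):           # S400: identification
            bad_positions.append(t)
            cleaned.append(correct(t, value))  # S500: correction
        else:
            cleaned.append(value)      # normal data is not changed
    return cleaned, bad_positions
```

For example, with a stand-in rule that flags negative readings and a stand-in correction that returns 0.0, the series `[1.0, -5.0, 2.0]` would come back as `[1.0, 0.0, 2.0]` with position 1 flagged.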
2. The method for identifying and correcting the bad data of the new energy station based on the deep learning as claimed in claim 1, wherein the identification objects comprise wind power generation parameters, photovoltaic power generation parameters and installed capacity and active power of units in the new energy station;
the wind power generation parameters comprise wind speed, temperature, wind direction cosine value, humidity and pressure intensity;
the photovoltaic power generation parameters comprise irradiation intensity, irradiation duration and assembly area.
3. The method for identifying and correcting the bad data of the new energy station based on the deep learning as claimed in claim 1, wherein the method further comprises: preprocessing historical operating data and real-time operating data before S200, including:
identifying missing values in historical operating data, acquiring historical operating data which belongs to the same class as the missing values, calculating the same-class mean value of the missing values to obtain interpolation values of the missing values, and replacing the missing values with the interpolation values, wherein the interpolation values have the calculation formula:
wherein the result is the interpolation value; $a_i$ is the mean coefficient, equal to 0 when the i-th input historical operating data or real-time operating data value $s_i$ is missing and 1 otherwise; and $m$ is the total number of historical operating data or real-time operating data values of the same class.
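A minimal Python sketch of same-class mean interpolation is given below. The exact formula is not reproduced in this text, so the sketch assumes that the mean is taken over the values that are actually present, that is, the sum of $a_i s_i$ divided by the sum of $a_i$. Missing values are represented by `None`, and the function name is hypothetical.

```python
from typing import List, Optional


def impute_same_class_mean(samples: List[Optional[float]]) -> List[float]:
    """Fill missing entries (None) with the mean of the observed entries."""
    # a_i = 0 for a missing value, 1 otherwise.
    coeffs = [0 if s is None else 1 for s in samples]
    observed = sum(coeffs)
    if observed == 0:
        raise ValueError("no observed values of this class to average")
    mean = sum(s for s in samples if s is not None) / observed
    return [mean if s is None else s for s in samples]
```

Dividing by the number of observed values rather than by the total count $m$ avoids biasing the interpolated value toward zero when many entries are missing; if the intended formula divides by $m$, only the denominator changes.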
4. The method according to claim 1, wherein the S200 includes:
S210: dividing the historical normal data into training data and test data in time-sequence order, and initializing the hyper-parameters of the identification model;
S220: inputting the training data into the identification model for training, and calculating, through the identification model, the predicted values of the identification object at the time-sequence positions of the test data;
S230: calculating whether the convergence accuracy of the identification model meets a preset threshold, wherein the calculation formula of the convergence accuracy is as follows:
where $a$ is the convergence accuracy, $n$ is the total number of predicted values, $x_{fi}$ is the i-th test data value, and $x_i$ is the i-th predicted value;
S240: if the convergence accuracy meets the preset condition, finishing the training; otherwise, adjusting the hyper-parameters of the identification model and repeating S220-S230 until the convergence accuracy meets the preset condition.
5. The method for identifying and correcting the bad data of the new energy station based on the deep learning of claim 1, wherein the step S300 comprises:
S310: initializing the hyper-parameters of the identification model;
S320: normalizing the historical bad data, inputting the normalized historical bad data into the trained identification model, and taking the predicted value output by the identification model for the historical bad data as the accurate value;
S330: inputting the historical bad data into the correction model for training, analyzing the characteristics of the historical bad data through the correction model, and outputting a correction value for the historical bad data according to the analysis result;
S340: analyzing the error degree and the error dispersion degree of the correction value relative to the accurate value, and finishing training when the analysis result meets a preset condition; otherwise, adjusting the hyper-parameters of the correction model and repeating S320-S330 until the preset condition is met.
6. The method for identifying and correcting the bad data of the new energy station based on the deep learning of claim 5, wherein analyzing the degree of error and the degree of error dispersion of the corrected value with respect to the accurate value comprises:
analyzing the degree of error by calculating the average absolute error of the corrected value relative to the accurate value;
and analyzing the error dispersion degree by calculating the root mean square error of the corrected value relative to the accurate value.
7. The method for identifying and correcting the bad data of the new energy station based on the deep learning as claimed in claim 1, wherein the method further comprises:
while executing S500, selecting normal data with a preset proportion and inputting the normal data into the correction model, and calculating the average absolute error and the root mean square error of the output value of the correction model and the normal data;
and when any one of the average absolute error and the root mean square error does not meet the preset condition, taking the real-time operation data input during the execution of the S400 as training data, and re-training the identification model and the correction model.
8. The method for identifying and correcting the bad data of the new energy station based on the deep learning as claimed in claim 1, wherein the identification model is provided with a solver at an output layer, and the solver is a softmax function.
9. The method for identifying and correcting the bad data of the new energy station based on the deep learning of claim 8, wherein the solver outputs the probability of bad data in real time by comparing the calculated predicted value with the actually measured real-time operation data to obtain the error between them, and outputs, in real time and according to the probability, the identified bad data and the time-sequence position where the bad data is located.
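The role of a softmax output in turning a prediction error into a bad-data probability can be illustrated with the Python sketch below. Only the softmax function itself is standard. How the error is converted into scores is not specified in this text, so the scoring rule used here (the excess of the absolute error over a hypothetical `tolerance`, in units of that tolerance), the `threshold` parameter, and the function name `flag_bad_points` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: shift by the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flag_bad_points(predicted: Sequence[float],
                    measured: Sequence[float],
                    tolerance: float,
                    threshold: float = 0.5) -> List[Tuple[int, float]]:
    """Return (time index, probability of bad data) for flagged points."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have equal length")
    if tolerance <= 0.0:
        raise ValueError("tolerance must be positive")
    flagged: List[Tuple[int, float]] = []
    for t, (p, m) in enumerate(zip(predicted, measured)):
        # Hypothetical score for the 'bad' class; the 'normal' class score is 0.
        z = (abs(p - m) - tolerance) / tolerance
        prob_bad = softmax([0.0, z])[1]
        if prob_bad > threshold:
            flagged.append((t, prob_bad))
    return flagged
```

Under this scoring rule, an error exactly equal to the tolerance gives a probability of 0.5, and larger errors push the probability toward 1.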
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210230736.2A CN114692729A (en) | 2022-03-10 | 2022-03-10 | New energy station bad data identification and correction method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114692729A true CN114692729A (en) | 2022-07-01 |
Family
ID=82137416
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210230736.2A Pending CN114692729A (en) | 2022-03-10 | 2022-03-10 | New energy station bad data identification and correction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114692729A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||