CN114692729A - New energy station bad data identification and correction method based on deep learning - Google Patents


Info

Publication number
CN114692729A
Authority
CN
China
Legal status
Pending
Application number
CN202210230736.2A
Other languages
Chinese (zh)
Inventor
陈文进
陈水耀
祁炜雯
张俊
朱峰
茹伟
范强
宋美雅
刘震
刘皓明
Current Assignee
Shaoxing Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Shaoxing Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Application filed by Shaoxing Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority to CN202210230736.2A
Publication of CN114692729A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply


Abstract

The invention provides a deep-learning-based method for identifying and correcting bad data at a new energy station. The method comprises the following steps: acquiring historical operating data of an identification object in the new energy station, and labeling the historical normal data and historical bad data within it; establishing an identification model and training it by deep learning on the historical normal data; establishing a correction model, inputting the historical bad data into both the correction model and the trained identification model, and training the correction model by deep learning using the output of the identification model; acquiring real-time operating data of the identification object and inputting it into the trained identification model to distinguish real-time normal data from real-time bad data; and inputting the real-time bad data into the trained correction model to obtain corrected values of the real-time bad data. The method significantly improves the efficiency of bad-data identification and correction and helps ensure the safe and stable real-time operation of the new energy power station.

Description

New energy station bad data identification and correction method based on deep learning
Technical Field
The invention belongs to the field of energy data management, and particularly relates to a new energy station bad data identification and correction method based on deep learning.
Background
As new energy station construction continues to expand, the data acquired at new energy stations is becoming increasingly voluminous and high-dimensional, and the problem of bad data is becoming more prominent. Bad data, such as missing, invalid, duplicated, and erroneous values, often appears in the real-time data collected at new energy stations. It is usually caused by one of two factors: first, faults in the new energy power system, such as a temporary interruption of a data channel in the data acquisition system, which produce spurious data; second, special events, such as sudden fluctuations of large industrial loads or sudden adverse environmental conditions, which cause irregular oscillations in the data. Bad data distorts the state estimation results of the new energy station, affects the operational scheduling and stability of the power system, and may even have unforeseen safety consequences.
Every type of new energy station produces a huge volume of data with diverse interrelationships: station, unit, and environmental data are coupled with one another, and coupling also exists within each data category. With the continuing development of information technology, artificial intelligence is being applied across many fields, and deep learning in particular is widely used for pattern recognition, classification, and load forecasting in new energy applications. Deep learning adapts well to the time-varying characteristics of time series, can memorize and associate historical information, and can learn continuously from massive, coupled data. Deep learning can therefore be used to identify and correct bad data at new energy stations and help ensure their safe and stable operation.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a deep-learning-based method for identifying and correcting bad data at a new energy station, comprising the following steps:
S100: acquiring historical operating data of an identification object in the new energy station, and labeling historical normal data and historical bad data in the historical operating data;
S200: establishing an identification model, and training it by deep learning on the historical normal data;
S300: establishing a correction model, inputting the historical bad data into the correction model and the trained identification model, and training the correction model by deep learning using the output of the identification model;
S400: acquiring real-time operating data of the identification object, and inputting it into the trained identification model to distinguish real-time normal data from real-time bad data;
S500: inputting the real-time bad data into the trained correction model to obtain corrected values of the real-time bad data.
Optionally, the identification object includes wind power generation parameters, photovoltaic power generation parameters, and the installed capacity and active power of the units in the new energy station;
the wind power generation parameters include wind speed, temperature, the cosine of the wind direction, humidity, and air pressure;
the photovoltaic power generation parameters include irradiance, irradiation duration, and module area.
Optionally, S200 includes:
S210: dividing the historical normal data chronologically into training data and test data, and initializing the hyper-parameters of the identification model;
S220: training the identification model on the training data, and using it to calculate predicted values of the identification object for the time steps covered by the test data;
S230: checking whether the convergence accuracy of the identification model meets a preset threshold, where the convergence accuracy is calculated as follows:
[Formula image not reproduced in this text: convergence accuracy $a$ as a function of $x_{fi}$, $x_i$ and $n$]
where $a$ is the convergence accuracy, $n$ is the total number of predicted values, $x_{fi}$ is the $i$-th test data value, and $x_i$ is the $i$-th predicted value;
S240: if the convergence accuracy meets the preset threshold, training ends; otherwise, the hyper-parameters of the identification model are adjusted and S220 to S230 are repeated until the convergence accuracy meets the preset threshold.
Optionally, the method further includes preprocessing the historical operating data before S200, which includes:
identifying missing values in the historical operating data, acquiring historical operating data of the same class as each missing value, calculating the mean of that same-class data as the interpolated value, and replacing the missing value with it, where the interpolated value is calculated as follows:
$$\hat{s} = \frac{\sum_{i=1}^{m} a_i s_i}{\sum_{i=1}^{m} a_i}$$
where $a_i$ is the mean coefficient, equal to 0 when the $i$-th input historical operating data value $s_i$ is missing and 1 otherwise, $m$ is the total number of historical operating data values of the same class, and $\hat{s}$ is the interpolated value.
Optionally, S300 includes:
S310: initializing the hyper-parameters of the correction model;
S320: inputting the historical bad data into the trained identification model, and taking the predicted values it outputs for the historical bad data as the accurate values;
S330: inputting the historical bad data into the correction model for training, so that the correction model analyzes the characteristics of the historical bad data and outputs corrected values based on that analysis;
S340: analyzing the degree of error and the error dispersion of the corrected values relative to the accurate values; ending training when the result meets a preset condition, and otherwise adjusting the hyper-parameters of the correction model and repeating S320 to S330 until the preset condition is met.
Optionally, analyzing the degree of error and the error dispersion of the corrected values relative to the accurate values includes:
analyzing the degree of error by calculating the mean absolute error of the corrected values relative to the accurate values;
analyzing the error dispersion by calculating the root mean square error of the corrected values relative to the accurate values.
Optionally, the method further includes:
while executing S500, inputting a preset proportion of normal data into the correction model, and calculating the mean absolute error and the root mean square error between the correction model's outputs and the normal data;
when either the mean absolute error or the root mean square error fails to meet the preset condition, retraining the identification model and the correction model using the real-time operating data input during S400 as training data.
Optionally, the output layer of the identification model is provided with a solver, and the solver is a softmax function.
Optionally, the solver outputs the probability that a point is real-time bad data based on the error between the calculated predicted value and the measured real-time operating data, and, according to this probability, outputs the identified real-time bad data and its position in the time series.
The technical solution provided by the invention has the following beneficial effects:
the method uses deep learning to jointly establish and train an identification model and a correction model, and uses the resulting models to rapidly identify and correct bad data collected in real time at the new energy station. This significantly improves the efficiency of bad-data identification and correction, supports real-time analysis applications at the new energy power station, and helps ensure its safe and stable real-time operation.
Drawings
To illustrate the technical solution of the invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of the deep-learning-based method for identifying and correcting bad data at a new energy station according to an embodiment of the invention;
Fig. 2 is a line graph showing the relationship between the number of neurons and the convergence accuracy of the neural network.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments clearer, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art derives from the embodiments given herein without creative effort fall within the protection scope of the invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
It should be understood that, in various embodiments of the present invention, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
It should be understood that in the present application, "comprising" and "having" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that, in the invention, "a plurality" means two or more. "And/or" merely describes an association between objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. "Comprises A, B and C" and "comprises A, B, C" mean that all three of A, B, and C are included; "comprises A, B or C" means that one of A, B, and C is included; "comprises A, B and/or C" means that any one, any two, or all three of A, B, and C are included.
It should be understood that, in the invention, "B corresponding to A", "A corresponds to B", or "B corresponds to A" means that B is associated with A and that B can be determined from A. Determining B from A does not mean that B is determined from A alone; B may also be determined from A and/or other information. "A matches B" means that the similarity between A and B is greater than or equal to a preset threshold.
As used herein, "if" may be interpreted as "when", "upon", "in response to determining", or "in response to detecting", depending on the context.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
As shown in Fig. 1, this embodiment provides a deep-learning-based method for identifying and correcting bad data at a new energy station, including:
S100: acquiring historical operating data and real-time operating data of an identification object in the new energy station, and labeling historical normal data and historical bad data in the historical operating data;
S200: establishing an identification model, and training it by deep learning on the historical normal data;
S300: establishing a correction model, inputting the historical bad data into the correction model and the trained identification model, and training the correction model by deep learning using the output of the identification model;
S400: inputting the real-time operating data into the trained identification model to distinguish real-time normal data from real-time bad data;
S500: inputting the real-time bad data into the trained correction model to obtain corrected values of the real-time bad data.
In this embodiment, a deep neural network is trained on historical data collected at the new energy power station to obtain an identification model that meets the accuracy requirement. Data acquired in real time is then input into the identification model to obtain its predicted values, which are taken as the accurate values. A deviation threshold is set, and each actually acquired value is compared with the corresponding accurate value; if the deviation exceeds the threshold, the value is treated as bad data. In this way, bad data is identified as it is acquired. Because deep learning adapts well to the time-varying characteristics of time series, memorizes and associates historical information, and can learn continuously from massive, coupled data, this process identifies and corrects bad data quickly and in real time. It significantly improves the efficiency of bad-data identification and correction, supports real-time analysis applications at the new energy power station, and helps ensure its safe and stable real-time operation.
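The identify-then-correct flow described above (S400 followed by S500) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: `predict_next` and `correct` are hypothetical stand-ins for the trained identification and correction models, the fixed look-back `window` is an assumption (the patent gives no window length), and the threshold `delta_e` is passed in directly rather than computed.

```python
from typing import Callable, List, Sequence, Tuple


def identify_and_correct(
    measured: Sequence[float],
    predict_next: Callable[[Sequence[float]], float],
    correct: Callable[[Sequence[float], float], float],
    delta_e: float,
    window: int,
) -> Tuple[List[float], List[int]]:
    """Flag points whose deviation from the one-step prediction exceeds
    delta_e (S400) and replace them with the corrector's output (S500).
    Returns the cleaned series and the time-series positions of bad data."""
    if window < 1:
        raise ValueError("window must be at least 1")
    cleaned = list(measured)
    bad_positions: List[int] = []
    for t in range(window, len(cleaned)):
        history = cleaned[t - window:t]   # preceding time steps, already cleaned
        accurate = predict_next(history)  # prediction taken as the "accurate value"
        if abs(cleaned[t] - accurate) > delta_e:
            bad_positions.append(t)
            cleaned[t] = correct(history, cleaned[t])  # corrected value
    return cleaned, bad_positions


# Hypothetical stand-ins for the trained models: a moving-average predictor
# and a corrector that simply returns that prediction.
def moving_average(history: Sequence[float]) -> float:
    return sum(history) / len(history)


def replace_with_prediction(history: Sequence[float], bad_value: float) -> float:
    return moving_average(history)


series = [10.0, 10.2, 10.1, 10.3, 55.0, 10.4, 10.2]
cleaned, bad = identify_and_correct(series, moving_average, replace_with_prediction,
                                    delta_e=2.0, window=3)
print(bad)  # [4]
```

Feeding the already-cleaned history into the predictor keeps one corrupted point from distorting the next few predictions; whether the original method does this is not stated.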
In this embodiment, the identification object includes wind power generation parameters, photovoltaic power generation parameters, and the installed capacity and active power of the units in the new energy station; the wind power generation parameters include wind speed, temperature, the cosine of the wind direction, humidity, and air pressure; the photovoltaic power generation parameters include irradiance, irradiation duration, and module area.
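The parameter list above can be held as one record per sampling instant, as in the minimal sketch below. The class name, field names, and feature order are hypothetical; the patent lists the quantities but not how they are stored. The cosine feature is computed from a raw wind direction in degrees, which is an assumption about the SCADA output; one plausible benefit is that directions such as 1° and 359° map to the same value instead of sitting at opposite ends of the range.

```python
import math
from dataclasses import dataclass


@dataclass
class StationSample:
    """One SCADA sample of the identification object (hypothetical schema)."""
    # Wind power generation parameters
    wind_speed: float
    temperature: float
    wind_direction_deg: float  # raw direction (assumed degrees); only its cosine is a feature
    humidity: float
    air_pressure: float
    # Photovoltaic power generation parameters
    irradiance: float
    irradiation_duration: float
    module_area: float
    # Unit parameters
    installed_capacity: float
    active_power: float

    @property
    def wind_direction_cos(self) -> float:
        # cos() gives 1 deg and 359 deg the same value, avoiding a jump at 0/360.
        return math.cos(math.radians(self.wind_direction_deg))

    def features(self) -> list:
        """Feature vector in a fixed (assumed) order for model input."""
        return [self.wind_speed, self.temperature, self.wind_direction_cos,
                self.humidity, self.air_pressure, self.irradiance,
                self.irradiation_duration, self.module_area,
                self.installed_capacity, self.active_power]


sample = StationSample(5.0, 20.0, 180.0, 60.0, 1013.0, 800.0, 0.25, 1.6, 50.0, 30.0)
print(sample.features())
```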
In this embodiment, both the historical operating data and the real-time operating data are acquired by the SCADA system, which samples the data once every 15 minutes. Because the SCADA system may suffer uncontrollable random faults, the collected data may contain obvious errors that would reduce the accuracy of the models trained later. This embodiment therefore preprocesses the historical operating data and the real-time operating data before S200, including:
identifying missing values in the historical operating data, acquiring operating data of the same class as each missing value, calculating the mean of that same-class data as the interpolated value, and replacing the missing value with it, where the interpolated value is calculated as follows:
$$\hat{s} = \frac{\sum_{i=1}^{m} a_i s_i}{\sum_{i=1}^{m} a_i}$$
where $\hat{s}$ is the interpolated value; $a_i$ is the mean coefficient, equal to 0 when the $i$-th input historical or real-time operating data value $s_i$ is missing and 1 otherwise; and $m$ is the total number of historical or real-time operating data values of the same class.
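The same-class mean interpolation can be sketched as follows: the mean coefficients $a_i$ act as a presence mask, and every missing entry receives the mean of the present entries. This is a sketch under assumptions: missing values are represented as `None` or NaN, and the caller is assumed to have already grouped the values into one "same class" (the patent does not define how classes are formed).

```python
import math
from typing import List, Optional, Sequence


def same_class_mean_impute(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries (None or NaN) with the same-class mean
    s_hat = sum(a_i * s_i) / sum(a_i), where a_i = 0 if s_i is missing, else 1."""
    def present(v: Optional[float]) -> bool:
        return v is not None and not math.isnan(v)

    a = [1 if present(v) else 0 for v in values]  # mean coefficients a_i
    if sum(a) == 0:
        raise ValueError("no observed values in this class")
    s_hat = sum(v for v, ai in zip(values, a) if ai) / sum(a)
    return [v if ai else s_hat for v, ai in zip(values, a)]


print(same_class_mean_impute([1.0, None, 3.0, float("nan")]))  # [1.0, 2.0, 3.0, 2.0]
```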
In this embodiment, data preprocessing yields a data set of 35,040 samples spanning one year. Preserving chronological order, the last 1,000 samples are set aside as the real-time identification data set; the remaining samples serve as the historical operating data used to build and train the identification model and the correction model, and the historical normal data and historical bad data within them are labeled in advance based on experience.
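The data-set size follows from the sampling interval: 96 samples per day at 15-minute spacing times 365 days gives 35,040. The chronological hold-out of the last 1,000 samples, and the later time-ordered train/test split of S210, are sketched below. The 0.8 training fraction is an assumption; the patent does not give the split ratio.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

SAMPLES_PER_DAY = 24 * 60 // 15            # 96 samples at a 15-minute interval
SAMPLES_PER_YEAR = SAMPLES_PER_DAY * 365   # 35,040, matching the data-set size


def holdout_last(samples: Sequence[T], n_holdout: int = 1000) -> Tuple[List[T], List[T]]:
    """Split a time-ordered series into (historical, real-time) without shuffling."""
    if not 0 < n_holdout < len(samples):
        raise ValueError("n_holdout must be between 1 and len(samples) - 1")
    return list(samples[:-n_holdout]), list(samples[-n_holdout:])


def train_test_by_time(samples: Sequence[T],
                       train_fraction: float = 0.8) -> Tuple[List[T], List[T]]:
    """S210: earlier part for training, later part for testing (fraction assumed)."""
    cut = int(len(samples) * train_fraction)
    return list(samples[:cut]), list(samples[cut:])


timeline = list(range(SAMPLES_PER_YEAR))  # stand-in for 35,040 time-ordered samples
historical, realtime = holdout_last(timeline)
print(len(historical), len(realtime))  # 34040 1000
```

In the described method, `train_test_by_time` would be applied only to the historical normal data, after the labeled bad data has been separated out.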
First, the identification model is trained; S200 includes:
S210: dividing the historical normal data chronologically into training data and test data, and initializing the hyper-parameters of the identification model;
S220: training the identification model on the training data, and using it to calculate predicted values of the identification object for the time steps covered by the test data;
S230: checking whether the convergence accuracy of the identification model meets a preset threshold, where the convergence accuracy is calculated as follows:
[Formula image not reproduced in this text: convergence accuracy $a$ as a function of $x_{fi}$, $x_i$ and $n$]
where $a$ is the convergence accuracy, $n$ is the total number of predicted values, $x_{fi}$ is the $i$-th test data value, and $x_i$ is the $i$-th predicted value;
S240: if the convergence accuracy meets the preset threshold, training ends; otherwise, the hyper-parameters of the identification model are adjusted and S220 to S230 are repeated until the convergence accuracy meets the preset threshold.
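The S220 to S240 loop can be illustrated as a sweep over one hyper-parameter, the hidden-layer width (the same width in both hidden layers, as in the Fig. 2 comparison of 5 to 30 neurons). This is a sketch under assumptions: the patent does not specify the search procedure, `train_and_score` is a hypothetical callable that trains the model and returns its convergence accuracy, and stopping at the smallest width that meets the threshold is one simple way to favor shorter training time.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def search_hidden_width(
    train_and_score: Callable[[int], float],
    widths: Iterable[int],
    target_accuracy: float,
) -> Tuple[Optional[int], Dict[int, float]]:
    """Repeat S220-S230 for increasing hidden-layer widths and stop (S240) at the
    first width whose convergence accuracy reaches target_accuracy."""
    scores: Dict[int, float] = {}
    for width in sorted(widths):
        scores[width] = train_and_score(width)  # train, predict the test span, score
        if scores[width] >= target_accuracy:
            return width, scores
    return None, scores  # no candidate met the preset threshold


# Hypothetical scoring stub standing in for real training; not data from the patent.
def toy_score(width: int) -> float:
    return min(0.99, 0.80 + 0.01 * width)


best, history = search_hidden_width(toy_score, [5, 10, 15, 20, 25, 30], target_accuracy=0.97)
print(best)  # 20
```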
In this embodiment, the hyper-parameters of the identification model, such as the initial network structure, the network thresholds, and the weights, are set first. Identifying bad data is essentially a classification problem, so the output layer of the identification model is provided with a solver that uses the softmax function, and the activation function is the sigmoid function. The solver outputs the probability that a point is real-time bad data based on the error between the calculated predicted value and the measured real-time operating data, and, according to this probability, outputs the identified real-time bad data and its position in the time series.
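The two functions named above can be written in numerically stable form as below. The two-class layout `[normal, bad]`, the example logits, and the 0.5 decision cutoff are assumptions for illustration; the patent does not say how the prediction error is turned into the solver's inputs.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Hidden-layer activation, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(logits: Sequence[float]) -> List[float]:
    """Output-layer solver: turns class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum before exponentiating, for stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-class output for one time step: [normal, bad].
p_normal, p_bad = softmax([0.4, 2.1])
is_bad = p_bad > 0.5
print(round(p_bad, 3), is_bad)
```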
Specifically, during training, the identification model uses the training data to predict the operating data of the identification object over the subsequent time steps, which coincide with the time steps of the test data; its identification accuracy is then judged from the error between the predicted values and the test data. Later, when S400 is executed, the identification model predicts the next time steps from the real-time operating data of the preceding time steps and compares the prediction with the real-time operating data actually measured for those next time steps. Through the solver in its output layer, the identification model sets a threshold band of $\pm\Delta e$ around each output accurate value and compares the accurate value with the corresponding measured value; if the measured value falls outside this band, it is regarded as bad data. The threshold $\Delta e$ is:
[Formula image not reproduced in this text: threshold $\Delta e$ as a function of $x_{max}$ and $x_{min}$]
where $x_{max}$ is the largest measured value and $x_{min}$ is the smallest measured value.
Because this embodiment uses a large number of samples, two hidden layers are sufficient when both accuracy and processing speed are taken into account. Fig. 2 plots the convergence accuracy of bad-data identification for 5, 10, 15, 20, 25, and 30 neurons per hidden layer, with the number of neurons on the horizontal axis and the corresponding convergence accuracy on the vertical axis; for comparability, both hidden layers have the same number of neurons. During training, the simulation time increases almost multiplicatively as the number of neurons grows, and the accuracy of the identification model gradually rises. Once each hidden layer has 20 nodes, however, the identification result no longer improves noticeably: the accuracy curve rises and then levels off, indicating that model performance improves with more hidden-layer neurons only up to a point. Weighing training time against performance, this embodiment uses 20 nodes in each hidden layer of the identification model; the resulting bad-data identification results are shown in Table 1. As Table 1 shows, the model accuracy for every data type exceeds 97%, indicating that the identification model identifies the different types of bad data with high accuracy and converges well.
TABLE 1
[Table image not reproduced in this text: bad-data identification results for each data type]
After the identification model has been trained, the correction model is trained. In this embodiment, the correction model is a BP (back-propagation) neural network trained with the trainbr algorithm (Bayesian regularization back-propagation in MATLAB), which generalizes better and converges faster than the basic gradient-descent algorithm and is better suited to mutually coupled data sets.
Specifically, S300 includes:
S310: initializing the hyper-parameters of the correction model, that is, determining the neural network structure, the network thresholds, and the weights;
S320: normalizing the historical bad data, inputting the normalized data into the trained identification model, and taking the predicted values it outputs for the historical bad data as the accurate values; specifically, the historical bad data is normalized in MATLAB with the mapminmax function;
S330: inputting the historical bad data into the correction model, which analyzes the characteristics of the historical bad data and outputs corrected values based on that analysis;
S340: analyzing the degree of error and the error dispersion of the corrected values relative to the accurate values; ending training when the result meets the preset condition, and otherwise adjusting the hyper-parameters of the correction model, that is, the neural network structure, the network thresholds, and the weights, and repeating S320 to S330 until the preset condition is met.
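MATLAB's `mapminmax`, used in S320, rescales data linearly to [-1, 1] by default via $y = (y_{max} - y_{min})(x - x_{min})/(x_{max} - x_{min}) + y_{min}$. A Python equivalent for a single series is sketched below as an illustration, not as the embodiment's code; the names `MinMaxSettings`, `minmax_fit`, `minmax_apply`, and `minmax_reverse` are hypothetical. The settings fitted on historical data would be reused to scale new data and to map corrected values back to engineering units. This sketch simply rejects a constant series.

```python
from typing import List, NamedTuple, Sequence


class MinMaxSettings(NamedTuple):
    xmin: float
    xmax: float
    ymin: float = -1.0
    ymax: float = 1.0


def minmax_fit(x: Sequence[float], ymin: float = -1.0, ymax: float = 1.0) -> MinMaxSettings:
    """Record the input range of x and the target range."""
    xmin, xmax = min(x), max(x)
    if xmax == xmin:
        raise ValueError("cannot min-max scale a constant series")
    return MinMaxSettings(xmin, xmax, ymin, ymax)


def minmax_apply(x: Sequence[float], s: MinMaxSettings) -> List[float]:
    """Forward mapping, analogous to mapminmax('apply', ...)."""
    return [(s.ymax - s.ymin) * (v - s.xmin) / (s.xmax - s.xmin) + s.ymin for v in x]


def minmax_reverse(y: Sequence[float], s: MinMaxSettings) -> List[float]:
    """Inverse mapping, analogous to mapminmax('reverse', ...)."""
    return [(v - s.ymin) * (s.xmax - s.xmin) / (s.ymax - s.ymin) + s.xmin for v in y]


settings = minmax_fit([0.0, 5.0, 10.0])
print(minmax_apply([0.0, 5.0, 10.0], settings))  # [-1.0, 0.0, 1.0]
```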
In this embodiment, the hyper-parameters of the correction model, such as the initial network structure, the network thresholds, and the weights, are set first. When choosing the number of hidden layers and the number of nodes per hidden layer, too few nodes leave the network without the necessary learning and information-processing capacity, whereas too many make the network more complex, slow down processing, and make it more likely to become trapped in local minima during learning. Therefore, in this embodiment, when the identification model has multiple outputs, using two or more hidden layers gives better bad-data identification and correction, and the number of nodes per layer and the other parameters are determined during training.
Analyzing the degree of error and the error dispersion of the corrected values relative to the accurate values includes:
analyzing the degree of error by calculating the mean absolute error (MAE) of the corrected values relative to the accurate values, as follows:
$$\mathrm{MAE} = \frac{1}{l}\sum_{i=1}^{l} \left| y_i - y_{ti} \right|$$
where $y_i$ is the $i$-th corrected value, $y_{ti}$ is the $i$-th accurate value, and $l$ is the total number of corrected values;
analyzing the error dispersion by calculating the root mean square error (RMSE) of the corrected values relative to the accurate values, as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{l}\sum_{i=1}^{l} \left( y_i - y_{ti} \right)^2}$$
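The two correction evaluation indexes translate directly into code. The sketch below implements the standard MAE and RMSE definitions with $y_i$ as the corrected values and $y_{ti}$ as the accurate values; the function names are illustrative.

```python
import math
from typing import Sequence


def _check(corrected: Sequence[float], accurate: Sequence[float]) -> int:
    """Return l, the number of corrected values, after validating the inputs."""
    if len(corrected) != len(accurate) or not corrected:
        raise ValueError("inputs must be non-empty and of equal length")
    return len(corrected)


def mae(corrected: Sequence[float], accurate: Sequence[float]) -> float:
    """Degree of error: (1/l) * sum |y_i - y_ti|."""
    n_values = _check(corrected, accurate)
    return sum(abs(y - yt) for y, yt in zip(corrected, accurate)) / n_values


def rmse(corrected: Sequence[float], accurate: Sequence[float]) -> float:
    """Error dispersion: sqrt((1/l) * sum (y_i - y_ti)^2)."""
    n_values = _check(corrected, accurate)
    return math.sqrt(sum((y - yt) ** 2 for y, yt in zip(corrected, accurate)) / n_values)


print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), rmse([0.0, 0.0], [3.0, -4.0]))
```

RMSE is never smaller than MAE on the same data, and a large gap between them indicates that the error is concentrated in a few points rather than spread evenly.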
Specifically, Table 2 shows the degree of error and the error dispersion of the corrected values relative to the accurate values during training of the correction model. The correction evaluation indexes RMSE and MAE are small, meaning the corrected values are close to the accurate values, which indicates that the method corrects the various types of bad data at the new energy station well. Both indexes meet the preset condition, so training of the correction model ends.
Table 2
Data type                        RMSE      MAE
Active power                     0.47706   0.07211
Wind speed                       0.51107   0.05084
Wind direction (cosine value)    0.17631   0.13896
Temperature                      0.08850   0.00889
Pressure                         0.35447   0.05220
Humidity                         0.41174   0.03381
To cope with random changes in the operating conditions of the new energy station, this embodiment further includes: while executing S500, selecting a preset proportion of the normal data, inputting it into the correction model, and calculating the mean absolute error and root mean square error between the output of the correction model and the normal data; when either error fails to meet the preset condition, the real-time operating data input during execution of S400 are used as training data to retrain the identification model and the correction model. Through this process, the hyperparameters of both models are adjusted promptly so that their accuracy is maintained.
In this embodiment, the real-time normal data themselves are left unchanged. While the real-time bad data are being corrected, a portion of the real-time normal data is also fed into the trained correction model, and the error indices of the resulting output are computed for comparison. If both correction evaluation indices, RMSE and MAE, meet the preset condition, the current model is accurate enough, and the hyperparameters of the identification model and the correction model do not need to be adjusted for the time being.
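The monitoring step described in the two paragraphs above can be sketched as follows. This is a minimal illustration under stated assumptions: the correction model is any callable mapping an array to an array, the "preset condition" is taken to be an upper limit on MAE and on RMSE, and the function name `check_model_drift` and all numeric limits are hypothetical.

```python
import numpy as np

def check_model_drift(correction_model, normal_data, proportion, mae_limit, rmse_limit, seed=0):
    """Feed a random sample of normal data through the correction model.

    A well-trained correction model should return normal data almost
    unchanged; if either error exceeds its limit, retraining is signalled.
    """
    data = np.asarray(normal_data, dtype=float)
    rng = np.random.default_rng(seed)
    k = max(1, int(round(proportion * len(data))))  # size of the preset-proportion sample
    sample = data[rng.choice(len(data), size=k, replace=False)]
    err = np.asarray(correction_model(sample), dtype=float) - sample
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    retrain = mae > mae_limit or rmse > rmse_limit
    return retrain, mae, rmse

normal = np.linspace(0.0, 1.0, 100)
identity = lambda x: x        # leaves normal data untouched
biased = lambda x: x + 0.2    # a drifted model
print(check_model_drift(identity, normal, 0.1, 0.05, 0.08)[0])  # False
print(check_model_drift(biased, normal, 0.1, 0.05, 0.08)[0])    # True
```

When the first element is `True`, the real-time data collected during S400 would be used to retrain both models.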
The serial numbers in the above embodiments are for description only and do not indicate the order of assembly or use of the components.
The above description is merely illustrative of the present invention and is not intended to limit it; any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (9)

1. A deep-learning-based method for identifying and correcting bad data of a new energy station, characterized by comprising the following steps:
S100: acquiring historical operating data and real-time operating data of an identification object in the new energy station, and labeling the historical normal data and historical bad data within the historical operating data;
S200: establishing an identification model and performing deep learning training on it using the historical normal data;
S300: establishing a correction model, inputting the historical bad data into both the correction model and the trained identification model, and performing deep learning training on the correction model in combination with the output of the identification model;
S400: distinguishing real-time normal data from real-time bad data by inputting the real-time operating data into the trained identification model;
S500: inputting the real-time bad data into the trained correction model to obtain corrected values of the real-time bad data.
2. The deep-learning-based method for identifying and correcting bad data of a new energy station according to claim 1, wherein the identification objects comprise wind power generation parameters, photovoltaic power generation parameters, and the installed capacity and active power of the units in the new energy station;
the wind power generation parameters comprise wind speed, temperature, wind direction cosine value, humidity, and pressure;
the photovoltaic power generation parameters comprise irradiance, irradiation duration, and module area.
3. The deep-learning-based method for identifying and correcting bad data of a new energy station according to claim 1, further comprising preprocessing the historical operating data and the real-time operating data before S200, including:
identifying missing values in the historical operating data, acquiring historical operating data of the same class as each missing value, calculating the same-class mean to obtain an interpolated value for the missing value, and replacing the missing value with the interpolated value, the interpolated value being calculated as:
$$\bar{s} = \frac{\sum_{i=1}^{m} a_i s_i}{\sum_{i=1}^{m} a_i}$$
where $\bar{s}$ is the interpolated value; $a_i$ is the mean coefficient, equal to 0 when the $i$th input historical or real-time operating data value $s_i$ is missing and 1 otherwise; and $m$ is the total amount of historical or real-time operating data of the same class.
4. The deep-learning-based method for identifying and correcting bad data of a new energy station according to claim 1, wherein S200 comprises:
S210: dividing the historical normal data chronologically into training data and test data, and initializing the hyperparameters of the identification model;
S220: inputting the training data into the identification model for training, and using the identification model to calculate predicted values of the identification object at the time steps corresponding to the test data;
S230: calculating whether the convergence accuracy of the identification model meets a preset threshold, the convergence accuracy being calculated as:
[Formula image FDA0003540418130000021; not reproduced in this text]
where $a$ is the convergence accuracy, $n$ is the total number of predicted values, $x_{fi}$ is the $i$th test data value, and $x_i$ is the $i$th predicted value;
S240: if the convergence accuracy meets the preset condition, ending training; otherwise, adjusting the hyperparameters of the identification model and repeating S220-S230 until the convergence accuracy meets the preset condition.
5. The deep-learning-based method for identifying and correcting bad data of a new energy station according to claim 1, wherein S300 comprises:
S310: initializing the hyperparameters of the identification model;
S320: normalizing the historical bad data, inputting it into the trained identification model, and taking the predicted values output by the identification model for the historical bad data as the accurate values;
S330: inputting the historical bad data into the correction model for training, the correction model analyzing the characteristics of the historical bad data and outputting corrected values for the historical bad data according to the analysis result;
S340: analyzing the degree of error and the degree of error dispersion of the corrected values relative to the accurate values; ending training when the analysis result meets a preset condition, and otherwise adjusting the hyperparameters of the correction model and repeating S320-S330 until the preset condition is met.
6. The deep-learning-based method for identifying and correcting bad data of a new energy station according to claim 5, wherein analyzing the degree of error and the degree of error dispersion of the corrected values relative to the accurate values comprises:
analyzing the degree of error by calculating the mean absolute error of the corrected values relative to the accurate values;
analyzing the degree of error dispersion by calculating the root mean square error of the corrected values relative to the accurate values.
7. The deep-learning-based method for identifying and correcting bad data of a new energy station according to claim 1, further comprising:
while executing S500, selecting a preset proportion of the normal data, inputting it into the correction model, and calculating the mean absolute error and root mean square error between the output of the correction model and the normal data;
when either the mean absolute error or the root mean square error fails to meet the preset condition, taking the real-time operating data input during execution of S400 as training data and retraining the identification model and the correction model.
8. The deep-learning-based method for identifying and correcting bad data of a new energy station according to claim 1, wherein the identification model is provided with a solver at its output layer, the solver being a softmax function.
9. The deep-learning-based method for identifying and correcting bad data of a new energy station according to claim 8, wherein the solver outputs, in real time, the probability that data are bad based on the error between the calculated predicted value and the measured real-time operating data, and, according to that probability, outputs in real time the identified bad data and their positions in the time series.
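The same-class mean interpolation of claim 3 can be sketched as below, assuming missing samples are stored as NaN and that one array holds all m values of the same class (for example, one quantity over one period). The function name is hypothetical; the computation is the mean over observed values, with the 0/1 mean coefficients a_i as defined in claim 3.

```python
import numpy as np

def class_mean_impute(values):
    """Replace missing entries (NaN) with the mean of the observed entries.

    a_i = 0 for a missing s_i and 1 otherwise, so the interpolated value is
    sum(a_i * s_i) / sum(a_i) over the m values of the same class.
    """
    s = np.asarray(values, dtype=float)
    a = (~np.isnan(s)).astype(float)   # mean coefficients a_i
    if a.sum() == 0:
        raise ValueError("no observed values in this class")
    s_bar = np.where(a == 1.0, s, 0.0).sum() / a.sum()
    return np.where(a == 1.0, s, s_bar), float(s_bar)

filled, s_bar = class_mean_impute([2.0, np.nan, 4.0, 6.0])
print(filled, s_bar)  # [2. 4. 4. 6.] 4.0
```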
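Steps S210 to S240 of claim 4 can be sketched as a chronological split followed by a fit, check, and adjust loop. Because the convergence-accuracy formula is not reproduced in this text, the sketch substitutes the mean absolute error between test data and predictions as a stand-in and treats "meets the threshold" as "does not exceed it"; the toy forecasting model, the window-halving rule, and all function names are hypothetical.

```python
import numpy as np

def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series without shuffling (earlier part trains)."""
    series = np.asarray(series, dtype=float)
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

def train_until_converged(fit_predict, train, test, hyperparams, threshold, adjust, max_rounds=20):
    """Repeat S220-S230: fit, predict the test horizon, check accuracy, adjust."""
    score = float("inf")
    for round_no in range(1, max_rounds + 1):
        pred = np.asarray(fit_predict(train, len(test), hyperparams), dtype=float)
        score = float(np.mean(np.abs(test - pred)))  # stand-in for the patent's formula
        if score <= threshold:
            return hyperparams, score, round_no
        hyperparams = adjust(hyperparams)  # S240: adjust hyperparameters and retry
    return hyperparams, score, max_rounds

# Toy "model": forecast = mean of the last `window` training points.
def fit_predict(train, horizon, hp):
    return np.full(horizon, train[-hp["window"]:].mean())

train, test = chronological_split(np.arange(10.0))
result = train_until_converged(fit_predict, train, test, {"window": 8}, 2.0,
                               lambda hp: {"window": max(1, hp["window"] // 2)})
print(result)  # ({'window': 2}, 2.0, 3)
```

Splitting by time rather than at random keeps future samples out of the training set, which matters for operating data with strong temporal correlation.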
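Claims 8 and 9 describe a softmax output that turns the prediction error into a probability of bad data and reports where bad data sit in the time series. The sketch below is illustrative only: in the patent the class scores come from the trained network, whereas here the two scores [normal, bad] = [0, scale·|error| − offset] are hand-set, and `scale`, `offset`, `p_min`, and `flag_bad_data` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def flag_bad_data(predicted, measured, scale=2.0, offset=4.0, p_min=0.5):
    """Return P(bad) per time step and the time-series positions flagged as bad.

    The class scores [normal, bad] = [0, scale*|error| - offset] are a
    hand-set stand-in for the trained output layer feeding the softmax.
    """
    err = np.abs(np.asarray(measured, dtype=float) - np.asarray(predicted, dtype=float))
    logits = np.stack([np.zeros_like(err), scale * err - offset], axis=-1)
    p_bad = softmax(logits)[:, 1]
    return p_bad, np.flatnonzero(p_bad > p_min)

p_bad, positions = flag_bad_data([10.0, 10.0, 10.0, 10.0], [10.1, 9.9, 15.0, 10.0])
print(positions)  # [2]
```

With two classes, the softmax reduces to a logistic function of the score difference, so the probability of bad data rises smoothly as the prediction error grows.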

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210230736.2A CN114692729A (en) 2022-03-10 2022-03-10 New energy station bad data identification and correction method based on deep learning

Publications (1)

Publication Number Publication Date
CN114692729A true CN114692729A (en) 2022-07-01

Family

ID=82137416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210230736.2A Pending CN114692729A (en) 2022-03-10 2022-03-10 New energy station bad data identification and correction method based on deep learning

Country Status (1)

Country Link
CN (1) CN114692729A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115481916A (en) * 2022-09-27 2022-12-16 呼伦贝尔安泰热电有限责任公司海拉尔热电厂 Method and system for identifying heating area heat index in central heating in alpine region
CN115481916B (en) * 2022-09-27 2023-09-19 呼伦贝尔安泰热电有限责任公司海拉尔热电厂 Method and system for identifying heat index of heating area in central heating in alpine region

Similar Documents

Publication Publication Date Title
CN106874581B (en) Building air conditioner energy consumption prediction method based on BP neural network model
CN116757534B (en) Intelligent refrigerator reliability analysis method based on neural training network
CN111210024A (en) Model training method and device, computer equipment and storage medium
CN115018021B (en) Machine room abnormity detection method and device based on graph structure and abnormity attention mechanism
CN113486578A (en) Method for predicting residual life of equipment in industrial process
CN110879377A (en) Metering device fault tracing method based on deep belief network
CN113344288B (en) Cascade hydropower station group water level prediction method and device and computer readable storage medium
CN116579768B (en) Power plant on-line instrument operation and maintenance management method and system
CN116227637A (en) Active power distribution network oriented refined load prediction method and system
CN112070322A (en) High-voltage cable line running state prediction method based on long-short term memory network
CN113758652B (en) Oil leakage detection method and device for converter transformer, computer equipment and storage medium
CN114692729A (en) New energy station bad data identification and correction method based on deep learning
Liu et al. Research on the strategy of locating abnormal data in IOT management platform based on improved modified particle swarm optimization convolutional neural network algorithm
CN114189047A (en) False data detection and correction method for active power distribution network state estimation
CN113379116A (en) Cluster and convolutional neural network-based line loss prediction method for transformer area
CN116485049B (en) Electric energy metering error prediction and optimization system based on artificial intelligence
CN117407675A (en) Lightning arrester leakage current prediction method based on multi-variable reconstruction combined dynamic weight
CN111863153A (en) Method for predicting total amount of suspended solids in wastewater based on data mining
CN116151799A (en) BP neural network-based distribution line multi-working-condition fault rate rapid assessment method
CN116644358A (en) Power system transient stability evaluation method based on Bayesian convolutional neural network
CN114252266A (en) Rolling bearing performance degradation evaluation method based on DBN-SVDD model
CN113761795A (en) Aircraft engine fault detection method and system
Yan et al. Remaining Useful Life Interval Prediction for Complex System Based on BiGRU Optimized by Log-Norm
CN110826690A (en) Equipment state identification method and system and computer readable storage medium
CN116595883B (en) Real-time online system state correction method for numerical reactor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination