CN113128612A - Processing method of abnormal value in power data and terminal equipment - Google Patents


Info

Publication number
CN113128612A
Authority
CN
China
Prior art keywords
data
power data
abnormal
output value
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110463520.6A
Other languages
Chinese (zh)
Other versions
CN113128612B (en)
Inventor
张凯
何胜
郭威
魏新杰
吴清普
李士林
刘梅
罗欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tsingsoft Technology Co ltd
State Grid Corp of China SGCC
State Grid Hebei Electric Power Co Ltd
Marketing Service Center of State Grid Hebei Electric Power Co Ltd
Original Assignee
Beijing Tsingsoft Technology Co ltd
State Grid Corp of China SGCC
State Grid Hebei Electric Power Co Ltd
Marketing Service Center of State Grid Hebei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tsingsoft Technology Co ltd, State Grid Corp of China SGCC, State Grid Hebei Electric Power Co Ltd, Marketing Service Center of State Grid Hebei Electric Power Co Ltd filed Critical Beijing Tsingsoft Technology Co ltd
Priority to CN202110463520.6A
Publication of CN113128612A
Application granted
Publication of CN113128612B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/10 Pre-processing; Data cleansing
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G06N 20/20 Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of data processing and provides a method for processing abnormal values in power data, together with a terminal device. The method comprises the following steps: obtaining power data generated in a power operation link; performing data dimensionality reduction and data standardization on the power data to obtain processed standard power data; inputting the standard power data into a kernel function extreme learning model and a superposition integration model respectively for abnormal value identification, obtaining a first output value set and a second output value set; performing dynamic analysis with a regularization-based linear regression analysis method according to the first output value set and the second output value set to obtain an abnormal data set; and performing data cleaning on the abnormal data set to obtain cleaned power data. The invention improves both the detection accuracy and the detection efficiency for abnormal values in power data, and yields corrected power data after the abnormal data set is cleaned.

Description

Processing method of abnormal value in power data and terminal equipment
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a method for processing abnormal values in electric power data and terminal equipment.
Background
With the gradual expansion of power construction across the five links of the grid (power generation, transmission, transformation, distribution, and power utilization), a large amount of power data is generated, and these data play an important role in the digital and intelligent development of China's power grid. However, during power operation, events such as equipment damage, blown fuses, or illegal intrusion cause abnormalities in the power data. These abnormalities may disturb users' normal power consumption, increase line loss, and cause loss of electric quantity; they may even cause fires, affect the stability and safety of the grid, and bring serious economic and reputational losses to grid enterprises. Effective detection of abnormal values in power data therefore remains a difficult problem.
However, currently employed methods for detecting abnormal values in power data cannot achieve the dual objectives of detection accuracy and detection efficiency at the same time.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method for processing an abnormal value in power data and a terminal device, which are used to solve the problem in the prior art that the dual objectives of detection accuracy and efficiency cannot be achieved simultaneously.
To achieve the above object, a first aspect of an embodiment of the present invention provides a method for processing an abnormal value in power data, including:
acquiring power data generated in a power operation link;
performing data dimensionality reduction and data standardization processing on the electric power data to obtain processed standard electric power data;
respectively inputting the standard power data into a kernel function extreme learning model and a superposition integration model for abnormal value identification to obtain a first output value set and a second output value set;
according to the first output value set and the second output value set, performing dynamic analysis by adopting a regularization-based linear regression analysis method to obtain an abnormal data set;
and carrying out data cleaning on the abnormal data set to obtain cleaned power data.
A second aspect of an embodiment of the present invention provides an apparatus for processing an abnormal value in power data, including:
the acquisition module is used for acquiring power data generated in a power operation link;
the first processing module is used for carrying out data dimensionality reduction and data standardization processing on the electric power data to obtain processed standard electric power data;
the second processing module is used for inputting the standard power data into the kernel function extreme learning model and the superposition integration model respectively for abnormal value identification to obtain a first output value set and a second output value set;
the third processing module is used for carrying out dynamic analysis by adopting a linear regression analysis method based on regularization according to the first output value set and the second output value set to obtain an abnormal data set;
and the data cleaning module is used for cleaning the data of the abnormal data set to obtain the cleaned power data.
A third aspect of an embodiment of the present invention provides a terminal device, including: a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method for processing abnormal values in power data as described in the first aspect.
Compared with the prior art, the embodiment of the invention has the following beneficial effects. The standard power data are input into the kernel function extreme learning model and the superposition integration model respectively for abnormal value identification, yielding a first output value set and a second output value set; a regularization-based linear regression analysis method is then applied to these two sets to perform dynamic analysis and obtain an abnormal data set. This avoids the problems that arise when only a single model is used to obtain the abnormal data set, namely becoming trapped in a local minimum and a limited statistical hypothesis space; it improves the accuracy and reliability of abnormal value detection in power data while also improving detection efficiency. In addition, data cleaning is performed on the abnormal data set, yielding corrected power data.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without inventive effort.
Fig. 1 is a schematic flow chart of an implementation of a method for processing an abnormal value in power data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a basic architecture of cloud computing provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data warehouse provided by an embodiment of the invention;
FIG. 4 is a basic architecture diagram of a kernel function extreme learning model according to an embodiment of the present invention;
FIG. 5 is a comparison graph of the results of three algorithms provided by the embodiment of the present invention;
FIG. 6 is a schematic diagram of an apparatus for processing abnormal values in power data provided by an embodiment of the present invention;
fig. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Fig. 1 is a schematic flow chart illustrating an implementation of a method for processing an abnormal value in power data according to an embodiment of the present invention, which is described in detail below.
Step 101, acquiring power data generated in a power operation link.
Important hidden information can be mined from the power data, and this information plays an important role in power enterprises' strategic decision making, troubleshooting, operation cost reduction, safe and stable grid operation and maintenance, and long-term development. Because power data is large in volume, abnormal values in it can be processed using the basic cloud-computing architecture shown in fig. 2, drawing on cloud computing's capacity to store and process very large-scale data.
Optionally, obtaining the power data generated in the power operation link may include: obtaining source power data generated in the power operation link; extracting, transforming, and loading the source power data to obtain the power data; and saving the power data in a data warehouse.
Abnormal value analysis is carried out on data from the five power operation links of the grid (power generation, transmission, transformation, distribution, and power utilization). Because the operation data of these links are stored in different data sources, integration is needed to obtain the power data generated in the power operation links. The integrated acquisition of power data is mainly realized through data warehouse technology; the data warehouse structure is shown in fig. 3.
The purpose of the data warehouse is to construct an analysis-oriented, integrated data environment and provide decision support for grid enterprises. Its key component is Extract-Transform-Load (ETL), the process by which data is extracted from the source end, transformed, and loaded into the destination end. ETL is the pipeline of the data warehouse, sometimes described as its lifeblood, since it maintains the flow of data through the warehouse; most of the daily management and maintenance effort of a data warehouse goes into keeping ETL running normally and stably.
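As a rough illustration of the ETL process described above (not from the patent; all file, table, and column names are assumed for the example), a minimal extract-transform-load step might look like this:

```python
import csv
import sqlite3

def etl_power_data(source_csv, db_path="power.db"):
    """Toy ETL sketch: extract rows from a source CSV, transform
    readings to floats while dropping malformed records, and load
    the result into a warehouse table. Names are illustrative."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS power_data "
                 "(meter_id TEXT, ts TEXT, kwh REAL)")
    with open(source_csv, newline="") as f:
        for row in csv.DictReader(f):        # Extract
            try:
                kwh = float(row["kwh"])      # Transform: cast and validate
            except (KeyError, ValueError):
                continue                     # skip malformed records
            conn.execute("INSERT INTO power_data VALUES (?, ?, ?)",
                         (row.get("meter_id"), row.get("ts"), kwh))
    conn.commit()
    return conn
```

A real pipeline would add incremental extraction and error logging; this only shows the extract, transform, load order of operations.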
Step 102: performing data dimensionality reduction and data standardization processing on the power data to obtain processed standard power data.
In the process of acquiring power data, the influence of various factors means that the quality of the acquired data may not meet the standard required for subsequent abnormal value detection and analysis. Various problems always exist, such as extreme maximum or minimum values, null values, noise, data inconsistency, and data confusion, so preprocessing is required; the processing consists of data dimensionality reduction and data standardization.
Dimensionality reduction means representing the original feature variables of the data with fewer new feature variables, such that the new feature variables are mutually uncorrelated while still carrying the main information reflecting the business problem. The purpose of data dimensionality reduction is to reduce the amount of data, reduce computational complexity, and remove noisy data.
Data from different sources differ in unit or dimension, and data with different units or dimensions cannot be directly compared and analyzed; therefore, data standardization is required to eliminate the unit or dimension differences among data from different sources.
Optionally, performing data reduction and data standardization processing on the power data to obtain processed standard power data, which may include: constructing a data covariance matrix according to the power data; performing data dimensionality reduction according to the eigenvalue of the data covariance matrix to obtain new data subjected to dimensionality reduction; determining the maximum value and the minimum value in the new data after dimensionality reduction; and according to the maximum value and the minimum value, sequentially carrying out standardization processing on each datum in the new data after dimensionality reduction to obtain processed standard electric power data.
Optionally, performing data dimensionality reduction according to the eigenvalues of the data covariance matrix to obtain new dimension-reduced data may include: calculating the eigenvalues and eigenvectors of the data covariance matrix; sorting the eigenvalues from large to small according to their contribution; taking the eigenvectors corresponding to the first K eigenvalues as the principal components; and converting the power data into the new data space constructed by the new eigenvectors, completing the dimensionality reduction and obtaining the new dimension-reduced data.
Optionally, standardizing each datum in the new dimension-reduced data in turn according to the maximum value and the minimum value to obtain the processed standard power data may include obtaining the standard power data according to

a' = (a - min a_n) / (max a_n - min a_n),

where a' represents the processed standard power data, a represents the current dimension-reduced datum to be processed, min a_n and max a_n denote the minimum and maximum values over all the dimension-reduced data a_n to be processed, and n is the number of dimension-reduced data (a positive integer).
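The preprocessing steps above (covariance-based dimensionality reduction followed by min-max standardization) can be sketched as follows; this is an illustrative implementation, and the function and variable names are assumptions, not from the patent text:

```python
import numpy as np

def standardize_power_data(data, k):
    """Sketch of the preprocessing step: PCA-style dimensionality
    reduction via the data covariance matrix, followed by min-max
    normalization a' = (a - min) / (max - min)."""
    # Center the data and build the covariance matrix
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # Eigen-decomposition; sort eigenvalues by descending contribution
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    # Keep the eigenvectors of the first k eigenvalues as principal components
    components = eigvecs[:, order[:k]]
    reduced = centered @ components
    # Min-max normalization per retained component
    lo, hi = reduced.min(axis=0), reduced.max(axis=0)
    return (reduced - lo) / (hi - lo)
```

Each output column then lies in [0, 1], which removes the unit and dimension differences discussed above.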
Step 103: inputting the standard power data into the kernel function extreme learning model and the superposition integration model respectively for abnormal value identification, obtaining a first output value set and a second output value set.
Optionally, the part of this step that yields the first output value set may include: setting a preset constant; performing optimal quadratic solution processing on the weights between the hidden layer and the output layer in the kernel function extreme learning model based on the preset constant to obtain a new weight; updating the kernel function extreme learning model with the new weight to obtain a new kernel function extreme learning model; and inputting the standard power data into the new kernel function extreme learning model to obtain the first output value set.
The kernel function extreme learning model belongs to a single-layer feedforward neural network algorithm. The basic extreme learning model is expressed in the form of:
f(x)=h(x)β;
where h (x) represents the calculated output of the hidden layer. Beta is ═ beta12,…,βL]TRepresenting the connection weight between the hidden layer and the output layer. The error expression of the extreme learning model is as follows:
Figure BDA0003039062730000061
wherein L represents the number of neurons, fO(x) Representing a genuine mark. The basic architecture of the extreme learning model is shown in fig. 4, and the output function can be expressed as:
Figure BDA0003039062730000062
wherein, gi(x) And G (b)i,ciX) represents the output function of the i-th hidden node, bi,ciRepresenting a hidden layer parameter, betaiRepresenting the connection weight between the hidden layer and the output layer,training the feedforward neural network requires an optimal quadratic solution of the weights:
Figure BDA0003039062730000063
wherein, T represents a preset constant, H represents a neural network hidden layer matrix, and based on the preset constant, optimal second-order solution processing is performed on the weights of the hidden layer and the output layer in the kernel function extreme learning model to obtain a new weight, which may include:
according to

β = H† T = H^T (I/C + H H^T)^{-1} T,

a new weight is obtained, where β represents the new weight, H† is the generalized inverse matrix of H, H^T is the transpose of H, and C represents a second preset constant. Increasing the constant 1/C based on the second preset constant C gives the solution for the new weight better generalization ability.
Illustratively, the kernel function employed in this embodiment may be a Gaussian kernel function with kernel matrix Ω_ELM, and the output of the Gaussian-kernel extreme learning model is expressed as

f(x) = [K(x, x_1), …, K(x, x_N)] (I/C + Ω_ELM)^{-1} T,

where N represents the input-layer dimension.
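The regularized solution above can be sketched in a few lines; this is an illustrative kernel extreme learning machine under the assumption of a Gaussian kernel, and the class and parameter names are not from the patent:

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    # Pairwise Gaussian (RBF) kernel between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KernelELM:
    """Minimal kernel extreme learning machine sketch using the
    regularized solution (I/C + Omega)^{-1} T from the text."""
    def __init__(self, C=1.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, T):
        self.X = X
        omega = gaussian_kernel(X, X, self.gamma)  # kernel matrix Omega_ELM
        n = omega.shape[0]
        # Regularized least-squares solution with the 1/C ridge term
        self.alpha = np.linalg.solve(np.eye(n) / self.C + omega, T)
        return self

    def predict(self, Xnew):
        return gaussian_kernel(Xnew, self.X, self.gamma) @ self.alpha
```

The 1/C term on the diagonal is what gives the solution its generalization ability: larger C moves the model toward exact interpolation of the training targets.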
Optionally, the part of this step that yields the second output value set may include: obtaining k groups of training data from the standard power data, together with an initial weight for each group; training the preset learning model according to the initial weights and the k groups of training data to obtain the first group of weak learning models in the superposition integration model and the sum of its output errors; updating the weight of each group of training data according to that error sum, and continuing training with the updated weights and the k groups of training data until T groups of weak learning models are obtained; combining the T groups of weak learning models into a strong learning model; and inputting the standard power data into the strong learning model to obtain the second output value set.
For example, the superposition integration model may be an AdaBoost model, which trains a number of weak learning models and then combines them into a strong learning model. The general idea is to give lower weights to correctly classified samples and higher weights to misclassified samples, improving model performance through repeated weighted combination.
For example, k groups of training data may be obtained from the standard power data, with an initial weight D_1(i) = 1/k for each group. When the t-th weak learning model is trained, the k groups of training data are used to train a decision tree, and the sum of the output errors of the t-th weak learning model is

e_t = Σ_i D_t(i) I(f_t(x_i) ≠ y_i).

The weight α_t of the t-th weak learning model is calculated from this error sum by

α_t = (1/2) ln((1 - e_t) / e_t).

According to the weight α_t of the t-th weak learning model, the weight of each group of training data is updated by

D_{t+1}(i) = (D_t(i) / B_t) exp(-α_t y_i f_t(x_i)), i = 1, 2, …, k,

where B_t represents a normalization factor.

After T rounds of training, T groups of weak learning models f_t are obtained, and combining the T groups of weak learning models gives the strong learning model q(x):

q(x) = Σ_{t=1}^{T} α_t f_t(x).
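The boosting loop above can be sketched with decision stumps standing in for the decision trees; this is a hand-rolled illustration of the weight formulas (α_t, D_{t+1}, B_t), with assumed function names, not the patent's implementation:

```python
import numpy as np

def fit_adaboost(X, y, T=10):
    """AdaBoost sketch with decision stumps as weak learners.
    Labels y are in {-1, +1}; D holds the sample weights D_t(i)."""
    n = len(y)
    D = np.full(n, 1.0 / n)          # initial weights D_1(i) = 1/k
    stumps, alphas = [], []
    for _ in range(T):
        best = None
        # Exhaustive search for the stump with the lowest weighted error
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] <= thr, 1, -1)
                    err = D[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)      # keep the log finite
        alpha = 0.5 * np.log((1 - err) / err)      # weak-learner weight
        pred = sign * np.where(X[:, j] <= thr, 1, -1)
        D *= np.exp(-alpha * y * pred)
        D /= D.sum()                               # B_t normalization
        stumps.append((j, thr, sign))
        alphas.append(alpha)
    return stumps, alphas

def predict_adaboost(stumps, alphas, X):
    # Strong model q(x): sign of the alpha-weighted vote
    total = np.zeros(len(X))
    for (j, thr, sign), a in zip(stumps, alphas):
        total += a * sign * np.where(X[:, j] <= thr, 1, -1)
    return np.sign(total)
```

Misclassified samples gain weight via the exp(-α_t y_i f_t(x_i)) factor, so later stumps concentrate on them, exactly as the update formula above describes.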
optionally, the standard power data may be input into the neural network model to perform outlier identification, and the result obtained through the neural network model identification and the result obtained through the superposition integration model identification are combined to obtain the second output value set.
Step 104: performing dynamic analysis by adopting a regularization-based linear regression analysis method according to the first output value set and the second output value set to obtain an abnormal data set.
Optionally, performing dynamic analysis with the regularization-based linear regression analysis method according to the first output value set and the second output value set to obtain an abnormal data set may include obtaining the abnormal data set according to

Y = Xδ + ε,

where Y denotes the abnormal data set, X = (X^(1), X^(2)) with X^(1) representing the first output value set and X^(2) representing the second output value set, δ represents the regression coefficients, and ε represents a random error term.
The regularization-based linear regression analysis method may be the Lasso regression method, whose main idea is to minimize the residual sum of squares under the constraint that the sum of the absolute values of the coefficients is less than a threshold.
Here X^(2) may represent the second output value set obtained by inputting the standard power data into the superposition integration model for abnormal value identification, or the second output value set obtained by inputting the standard power data into both the neural network model and the superposition integration model. The random error term is ε = (ε_1, ε_2, …, ε_m)^T with ε_i ~ N(0, σ²), the regression coefficient vector is δ = (δ_1, δ_2, …, δ_d)^T, and m and d denote the corresponding numbers of terms and coefficients. The Lasso regression method adds an L1 penalty term, producing the Lasso estimate

δ̂ = argmin_δ ||Y - Xδ||² + k Σ_{j=1}^{d} |δ_j|,

where δ̂ represents the dynamic weights corresponding to the first and second output value sets, k represents a tuning coefficient, and δ_j represents the weight of the j-th learning algorithm. The dynamic weights δ̂ are solved during the training of the Lasso model according to the first output value set and the second output value set. Once the dynamic weights δ̂ have been calculated, the Lasso model is established; that is, the abnormal data set is obtained from the trained Lasso model, the first output value set, and the second output value set.
In this embodiment, the abnormal value identification results of the kernel function extreme learning model and the superposition integration model (or of the kernel function extreme learning model, the neural network model, and the superposition integration model) are further learned through a Lasso linear combination. The advantages of each combined model can thus be learned, avoiding the problems of becoming trapped in a local minimum and of a limited statistical hypothesis space that arise when only a single model is used to obtain the abnormal data set.
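The Lasso estimate above can be illustrated with a minimal coordinate-descent solver, where the two models' output value sets form the columns of X and the fitted coefficients play the role of the dynamic weights; this sketch and its names are illustrative, not the patent's solver:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso sketch: minimizes
    0.5 * ||y - X d||^2 + lam * sum(|d_j|)."""
    n, p = X.shape
    d = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual excluding coordinate j
            r = y - X @ d + X[:, j] * d[j]
            rho = X[:, j] @ r
            # Soft-thresholding update (the L1 penalty's effect)
            d[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return d
```

The soft-thresholding step shrinks small coefficients exactly to zero, which is how the Lasso combination can suppress a model whose scores carry little information.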
Optionally, to keep the processing method for abnormal values in power data up to date, when the method for obtaining the abnormal data set reaches a certain time threshold or a certain error threshold, the first output value set and the second output value set may be regenerated by the procedures described above, and the dynamic weights of the updated sets solved again with the formula above. This yields an abnormal-data identification model that rolls forward in time, so that the model can learn the structural characteristics of the latest data and remains matched to the data.
Step 105: performing data cleaning on the abnormal data set to obtain cleaned power data.
Optionally, performing data cleaning on the abnormal data set to obtain cleaned power data may include obtaining the cleaned power data according to

L_{d,t} = Σ_{g=1}^{m_a} λ_g y_{d-g,t},

where L_{d,t} represents the cleaned power data at time t on day d, y_{d-g,t} denotes the load observation at time t on day d-g, λ_g represents the influence weight of the load observation at time t on day d-g on the power data to be cleaned at time t on day d, g = 1, 2, …, m_a, and m_a denotes the number of selected similar days.
In this embodiment, the load abnormal values in the power data are identified on the basis of the abnormal data set, and the identified abnormal values are then corrected with the weighted average method.
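The weighted-average correction formula above amounts to a single dot product; a minimal sketch, with assumed names, where `history` holds y_{d-g,t} for g = 1..m_a and `weights` holds the λ_g influence weights:

```python
import numpy as np

def clean_load_value(history, weights):
    """Replace a flagged load value at (day d, time t) with the
    weighted average of readings at the same time t on m_a similar
    prior days: L_{d,t} = sum_g lambda_g * y_{d-g,t}."""
    history = np.asarray(history, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(history @ weights)
```

The λ_g weights would typically sum to 1 so that the corrected value stays on the scale of the observations.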
The method for processing the abnormal value in the power data will be further described below by specific examples.
A table of daily power consumption data for nearly 20000 users over the year 2018 is obtained, and 77798020 records are selected from it as test samples; the distribution of the samples is shown in table 1.
TABLE 1 Sample distribution

Category    Abnormal data    Normal data
Number      26562555         51235465
The overall index testing method is as follows:
1) accuracy of detection
When the sample data set used for evaluating algorithm performance is class-imbalanced, directly using classification accuracy for evaluation produces a large error between the obtained result and the actual result. Detection accuracy is therefore analyzed with the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC).
The ROC curve is drawn on a two-dimensional plane whose abscissa is the False Positive Rate (FPR) and whose ordinate is the True Positive Rate (TPR). For a classifier, a (TPR, FPR) pair can be derived from its performance on the test samples, so the classifier can be mapped to a point on the ROC plane. By adjusting the threshold the classifier uses, a curve passing through (0,0) and (1,1) is obtained: the classifier's ROC curve. Since a single value indicating the quality of the classifier is still desirable, the AUC is used.
The AUC value is the area of the region covered by the ROC curve, and it is obvious that the larger the AUC is, the better the classifier classification effect is.
AUC = 1 corresponds to a perfect classifier: whatever threshold is set, a perfect prediction is obtained. In most prediction scenarios no perfect classifier exists.
0.5 < AUC < 1: better than random guessing. With a properly set threshold, the classifier has predictive value.
AUC = 0.5: equivalent to random guessing; the model has no predictive value.
AUC < 0.5: worse than random guessing; however, simply inverting its predictions makes it better than random guessing.
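The AUC can be computed directly from its rank interpretation, without drawing the curve; a sketch (function name illustrative) of the equivalent pairwise computation:

```python
import numpy as np

def auc_score(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen
    abnormal sample (label 1) scores higher than a randomly chosen
    normal sample (label 0). Ties count half."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Compare every abnormal/normal score pair
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

This matches the interpretation above: 1.0 for a perfect ranking, 0.5 for random scores, and below 0.5 when the ranking is inverted.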
2) Efficiency of detection
The detection efficiency refers to the time consumed by the algorithm in each operating period, and the shorter the time is, the higher the detection efficiency is.
To better illustrate the advantages of the processing method for abnormal values in power data of this embodiment in terms of detection accuracy and computational efficiency, comparison algorithms are adopted: the processing method for abnormal values in power data of this embodiment (model 1), a Gaussian process algorithm (model 2), and a random forest algorithm (model 3) are run on the same data.
The results were analyzed as follows:
1) accuracy of detection
As can be seen from fig. 5, when model 1 is used to detect abnormal values in the power data, the ROC curve drawn from the detection results lies closer to the upper left corner and the AUC value is 0.8025; both the ROC and the AUC are better than the results obtained with models 2 and 3, which shows that the algorithm of the embodiment of the present invention has better detection accuracy.
2) Efficiency of detection
As can be seen from table 2, when model 1 runs for one cycle, the time consumed to detect abnormal values in the power operation big data is 85 s, which is less than that of the other two methods. Therefore, the algorithm of the embodiment of the invention has high detection efficiency.
TABLE 2 calculation of time
(Table 2 is reproduced only as images in the original publication; its values are not recoverable here.)
According to the processing method for abnormal values in power data described above, the standard power data are input into the kernel function extreme learning model and the superposition integration model respectively to identify abnormal values, so as to obtain a first output value set and a second output value set, and dynamic analysis is performed with a regularization-based linear regression analysis method on the first output value set and the second output value set to obtain the abnormal data set. In addition, data cleaning is performed on the abnormal data set, and corrected power data after cleaning can be obtained.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 6 is a diagram showing an example of a processing apparatus for processing an abnormal value in power data according to an embodiment of the present invention, which corresponds to the processing method for an abnormal value in power data described in the above embodiment. As shown in fig. 6, the apparatus may include: an acquisition module 61, a first processing module 62, a second processing module 63, a third processing module 64, and a data cleansing module 65.
An obtaining module 61, configured to obtain power data generated in a power operation link;
the first processing module 62 is configured to perform data dimensionality reduction and data standardization processing on the power data to obtain processed standard power data;
the second processing module 63 is configured to input the standard power data into the kernel function extreme learning model and the superposition integration model respectively for abnormal value identification, so as to obtain a first output value set and a second output value set;
a third processing module 64, configured to perform dynamic analysis by using a linear regression analysis method based on regularization according to the first output value set and the second output value set, so as to obtain an abnormal data set;
and the data cleaning module 65 is configured to perform data cleaning on the abnormal data set to obtain cleaned power data.
Optionally, the obtaining module 61 may be configured to obtain source power data generated in a power operation link; extracting, converting and loading the source power data to obtain power data; saving the power data in a data repository.
Optionally, the first processing module 62 may be configured to construct a data covariance matrix according to the power data; performing data dimensionality reduction according to the eigenvalue of the data covariance matrix to obtain new data subjected to dimensionality reduction; determining the maximum value and the minimum value in the new data after dimension reduction; and sequentially carrying out standardization processing on each datum in the new data after dimensionality reduction according to the maximum value and the minimum value to obtain processed standard electric power data.
Optionally, the first processing module 62 may be adapted to obtain the processed standard power data according to
a' = (a − min a_n) / (max a_n − min a_n)
wherein a' represents the processed standard power data, a represents the reduced-dimension datum currently to be processed, min a_n denotes the minimum value, max a_n denotes the maximum value, a_n denotes all the reduced-dimension data to be processed, and n, a positive integer, is the number of reduced-dimension data.
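The first processing module's two steps, covariance-eigenvalue dimensionality reduction followed by min-max standardization, can be sketched as follows; the component count and the sample matrix are illustrative assumptions, not the patent's data.

```python
import numpy as np

def reduce_and_standardize(X, n_components):
    """Covariance-eigenvalue dimensionality reduction, then min-max scaling."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)          # data covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    reduced = X_centered @ top                      # new data after dimensionality reduction
    lo, hi = reduced.min(), reduced.max()           # min a_n and max a_n
    return (reduced - lo) / (hi - lo)               # a' = (a - min) / (max - min)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))                       # stand-in for raw power data
std_data = reduce_and_standardize(X, n_components=3)
```

The output lies in [0, 1] by construction, which is what the standardization step is for.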
Optionally, the second processing module 63 may be configured to set a preset constant;
based on the preset constant, carrying out optimal least-squares solution processing on the weights of the hidden layer and the output layer in the kernel function extreme learning model to obtain a new weight;
updating the kernel function extreme learning model according to the new weight to obtain a new kernel function extreme learning model;
inputting the standard power data into the new kernel function extreme learning model to obtain a first output value set;
acquiring k groups of training data and the initial weight of each group of training data according to the standard power data;
training a preset learning model according to the initial weight of each group of training data and the k groups of training data to obtain the sum of a first group of weak learning models in the superposition integration model and the output error of the first group of weak learning models;
updating the weight of each group of training data according to the sum of the output errors of the first group of weak learning models, and continuing to train according to the updated weight of each group of training data and the k groups of training data until T groups of weak learning models are obtained;
combining the T groups of weak learning models to obtain a strong learning model;
and inputting the standard power data into the strong learning model to obtain a second output value set.
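The boosting loop described above (train a weak model, re-weight the training data by its output error, repeat until T groups of weak models are combined into a strong model) can be sketched with decision stumps; the stump learner, T = 10, and the synthetic labels are illustrative assumptions rather than the patent's configuration.

```python
import numpy as np

def train_stump(X, y, w):
    """Pick the weighted-error-minimizing threshold stump; y in {-1, +1}."""
    best = (0, 0.0, 1, np.inf)                      # (feature, threshold, sign, error)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.where(X[:, j] > t, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def adaboost(X, y, T=10):
    """Train T weak stumps, re-weighting samples by each stump's output error."""
    w = np.full(len(y), 1.0 / len(y))               # initial weight of each sample
    models = []
    for _ in range(T):
        j, t, s, err = train_stump(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)       # this weak model's vote
        pred = s * np.where(X[:, j] > t, 1, -1)
        w *= np.exp(-alpha * y * pred)              # raise weights where it erred
        w /= w.sum()
        models.append((j, t, s, alpha))
    return models

def predict(models, X):
    """Combine the T weak models into the strong model's output."""
    agg = sum(a * s * np.where(X[:, j] > t, 1, -1) for j, t, s, a in models)
    return np.sign(agg)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))                       # stand-in standard power data
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)    # +1 = abnormal, -1 = normal
models = adaboost(X, y, T=10)
second_output_set = predict(models, X)              # the "second output value set"
```

Each round re-weights exactly as the text describes: samples the previous weak model got wrong gain weight, so the next weak model focuses on them.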
Optionally, the second processing module 63 may be adapted to obtain the new weight according to
β = H†T, where H† = H^T (H H^T + I/C)^(-1);
wherein β represents the new weight, H represents the neural network hidden layer matrix, H† is a generalized inverse matrix of H, H^T is the transpose matrix of H, T represents the preset constant, and C represents the second preset constant.
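Since the weight formula survives only as its variable definitions, the sketch below assumes the standard regularized generalized-inverse form used by kernel extreme learning machines, β = H^T (H H^T + I/C)^(-1) T; the matrices H, T and the constant C are illustrative stand-ins, not the patent's values.

```python
import numpy as np

rng = np.random.default_rng(3)
H = rng.normal(size=(50, 20))     # hidden-layer output matrix (samples x nodes)
T = rng.normal(size=(50, 1))      # training targets (the "preset constant" T)
C = 10.0                          # second preset constant (regularization)

# beta = H^T (H H^T + I/C)^(-1) T  -- regularized generalized-inverse solution
beta = H.T @ np.linalg.solve(H @ H.T + np.eye(H.shape[0]) / C, T)

# Equivalent primal ridge form, useful as a consistency check
beta_primal = np.linalg.solve(H.T @ H + np.eye(H.shape[1]) / C, H.T @ T)
```

The two forms are algebraically identical; the first inverts a samples-by-samples matrix and the second a nodes-by-nodes matrix, so one picks whichever is smaller.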
Optionally, the third processing module 64 may be configured to obtain the abnormal data set according to Y = Xδ + ε;
wherein Y denotes the abnormal data set, X = (X^(1), X^(2)), X^(1) represents the first output value set, X^(2) represents the second output value set, δ represents the regression coefficient, and ε represents the random error term.
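One concrete regularization-based estimator for the model Y = Xδ + ε is ridge regression. The sketch below stacks two synthetic output value sets column-wise as X = (X^(1), X^(2)) and recovers δ; the sets, the noise level, and the regularization strength are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
first_set = rng.normal(size=(100, 1))               # X^(1): stand-in for kernel-ELM outputs
second_set = rng.normal(size=(100, 1))              # X^(2): stand-in for ensemble outputs
X = np.hstack([first_set, second_set])              # X = (X^(1), X^(2))

delta_true = np.array([0.7, 0.3])                   # ground-truth regression coefficients
Y = X @ delta_true + 0.01 * rng.normal(size=100)    # epsilon: random error term

lam = 0.1                                           # regularization strength
delta = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ Y)  # ridge estimate of delta
residual = Y - X @ delta                            # leftover random error
```

With a small regularization term the estimate stays close to the true coefficients while remaining well-conditioned even if the two output sets are correlated.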
Optionally, the data cleaning module 65 may be adapted to obtain the cleaned power data according to
L_{d,t} = Σ_{g=1}^{m_a} λ_g · y_{d−g,t}
wherein L_{d,t} represents the cleaned power data at time t on day d, y_{d−g,t} denotes the load observation data at time t on day d−g, λ_g represents the weight of the influence of the load observation data at time t on day d−g on the power data to be cleaned at time t on day d, g = 1, 2, …, m_a, and m_a denotes the number of similar days selected.
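Assuming the weighted-sum form implied by the variable definitions above, L_{d,t} = Σ_g λ_g · y_{d−g,t}, the cleaning step reduces to a dot product over the selected similar days; the weights and the load history below are illustrative values, not the patent's data.

```python
import numpy as np

def clean_value(history, weights):
    """history[g-1] = y_{d-g,t}; weights[g-1] = lambda_g, g = 1..m_a."""
    return float(np.dot(weights, history))

m_a = 3                                      # number of similar days selected
lambdas = np.array([0.5, 0.3, 0.2])          # lambda_g, summing to 1
y_history = np.array([98.0, 102.0, 100.0])   # y_{d-1,t}, y_{d-2,t}, y_{d-3,t}
L_dt = clean_value(y_history, lambdas)       # cleaned power data at (d, t)
```

Weights that sum to 1 keep the cleaned value on the same scale as the observed loads, so an abnormal reading is replaced by a plausible one.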
According to the processing device for abnormal values in power data described above, the standard power data are input into the kernel function extreme learning model and the superposition integration model respectively to identify abnormal values, so as to obtain a first output value set and a second output value set, and dynamic analysis is performed with a regularization-based linear regression analysis method on the two sets to obtain the abnormal data set. This avoids the problems of easily falling into a local minimum and of a limited statistical hypothesis space that arise when only a single model is used to obtain the abnormal data set, and improves the efficiency of detecting abnormal values in the power data while improving the accuracy and reliability of the detection. In addition, data cleaning is performed on the abnormal data set, and corrected power data after cleaning can be obtained.
Fig. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 7, the terminal device 700 of this embodiment includes: a processor 701, a memory 702, and a computer program 703, such as a processing program for abnormal values in power data, stored in the memory 702 and executable on the processor 701. The processor 701 implements the steps in the embodiment of the processing method for the abnormal value in the power data, such as the steps 101 to 105 shown in fig. 1, when executing the computer program 703, and the processor 701 implements the functions of the modules in the embodiments of the apparatuses, such as the modules 61 to 65 shown in fig. 6, when executing the computer program 703.
Illustratively, the computer program 703 may be partitioned into one or more program modules, which are stored in the memory 702 and executed by the processor 701 to implement the present invention. The one or more program modules may be a series of computer program instruction segments capable of performing a specific function, which are used to describe the execution process of the computer program 703 in the processing apparatus or the terminal device 700 of the abnormal value in the power data. For example, the computer program 703 may be divided into the obtaining module 61, the first processing module 62, the second processing module 63, the third processing module 64, and the data cleansing module 65, and specific functions of the modules are shown in fig. 6, which is not described herein again.
The terminal device 700 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 701, a memory 702. Those skilled in the art will appreciate that fig. 7 is merely an example of a terminal device 700 and does not constitute a limitation of terminal device 700 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The Processor 701 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 702 may be an internal storage unit of the terminal device 700, such as a hard disk or a memory of the terminal device 700. The memory 702 may also be an external storage device of the terminal device 700, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the terminal device 700. Further, the memory 702 may also include both an internal storage unit and an external storage device of the terminal device 700. The memory 702 is used for storing the computer program and other programs and data required by the terminal device 700, and may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer readable media do not include electrical carrier signals and telecommunications signals, in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A method for processing an abnormal value in power data, comprising:
acquiring power data generated in a power operation link;
performing data dimensionality reduction and data standardization processing on the electric power data to obtain processed standard electric power data;
respectively inputting the standard power data into a kernel function extreme learning model and a superposition integration model for abnormal value identification to obtain a first output value set and a second output value set;
according to the first output value set and the second output value set, performing dynamic analysis by adopting a regularization-based linear regression analysis method to obtain an abnormal data set;
and carrying out data cleaning on the abnormal data set to obtain cleaned power data.
2. The method for processing abnormal values in power data according to claim 1, wherein the acquiring of the power data generated in the power operation link comprises:
acquiring source power data generated in a power operation link;
extracting, converting and loading the source power data to obtain power data;
saving the power data in a data repository.
3. The method for processing the abnormal value in the power data according to claim 1, wherein the step of performing data dimensionality reduction and data normalization on the power data to obtain the processed standard power data comprises:
constructing a data covariance matrix according to the power data;
performing data dimensionality reduction according to the eigenvalue of the data covariance matrix to obtain new data subjected to dimensionality reduction;
determining the maximum value and the minimum value in the new data after dimension reduction;
and sequentially carrying out standardization processing on each datum in the new data after dimensionality reduction according to the maximum value and the minimum value to obtain processed standard electric power data.
4. The method for processing abnormal values in power data according to claim 3, wherein the step of sequentially normalizing each datum in the new reduced-dimension datum according to the maximum value and the minimum value to obtain processed standard power data comprises the steps of:
according to
a' = (a − min a_n) / (max a_n − min a_n)
obtaining processed standard electric power data;
wherein a' represents the processed standard power data, a represents the reduced-dimension datum currently to be processed, min a_n denotes the minimum value, max a_n denotes the maximum value, a_n denotes all the reduced-dimension data to be processed, and n, a positive integer, is the number of reduced-dimension data.
5. The method for processing the abnormal value in the power data according to any one of claims 1 to 4, wherein the step of inputting the standard power data into a kernel function extreme learning model and a superposition integration model respectively for abnormal value identification to obtain a first output value set and a second output value set comprises the steps of:
setting a preset constant;
based on the preset constant, carrying out optimal least-squares solution processing on the weights of the hidden layer and the output layer in the kernel function extreme learning model to obtain a new weight;
updating the kernel function extreme learning model according to the new weight to obtain a new kernel function extreme learning model;
inputting the standard power data into the new kernel function extreme learning model to obtain a first output value set;
acquiring k groups of training data and the initial weight of each group of training data according to the standard power data;
training a preset learning model according to the initial weight of each group of training data and the k groups of training data to obtain the sum of a first group of weak learning models in the superposition integration model and the output error of the first group of weak learning models;
updating the weight of each group of training data according to the sum of the output errors of the first group of weak learning models, and continuing to train according to the updated weight of each group of training data and the k groups of training data until T groups of weak learning models are obtained;
combining the T groups of weak learning models to obtain a strong learning model;
and inputting the standard power data into the strong learning model to obtain a second output value set.
6. The method for processing the abnormal value in the power data according to claim 5, wherein the performing optimal least-squares solution processing on the weights of the hidden layer and the output layer in the kernel function extreme learning model based on the preset constant to obtain a new weight comprises:
according to
β = H†T, where H† = H^T (H H^T + I/C)^(-1),
obtaining a new weight;
wherein β represents the new weight, H represents the neural network hidden layer matrix, H† is a generalized inverse matrix of H, H^T is the transpose matrix of H, T represents the preset constant, and C represents the second preset constant.
7. The method for processing abnormal values in power data according to claim 6, wherein the dynamic analysis is performed by a linear regression analysis method based on regularization according to the first output value set and the second output value set, so as to obtain an abnormal data set, and the method comprises:
obtaining an abnormal data set according to Y = Xδ + ε;
wherein Y denotes the abnormal data set, X = (X^(1), X^(2)), X^(1) represents the first output value set, X^(2) represents the second output value set, δ represents a regression coefficient, and ε represents a random error term.
8. The method for processing abnormal values in power data according to claim 5, wherein the step of performing data cleaning on the abnormal data set to obtain cleaned power data comprises the following steps:
according to
L_{d,t} = Σ_{g=1}^{m_a} λ_g · y_{d−g,t}
obtaining cleaned power data;
wherein L_{d,t} represents the cleaned power data at time t on day d, y_{d−g,t} denotes the load observation data at time t on day d−g, λ_g represents the weight of the influence of the load observation data at time t on day d−g on the power data to be cleaned at time t on day d, g = 1, 2, …, m_a, and m_a denotes the number of similar days selected.
9. An apparatus for processing an abnormal value in power data, comprising:
the acquisition module is used for acquiring power data generated in a power operation link;
the first processing module is used for carrying out data dimensionality reduction and data standardization processing on the electric power data to obtain processed standard electric power data;
the second processing module is used for inputting the standard power data into the kernel function extreme learning model and the superposition integration model respectively for abnormal value identification to obtain a first output value set and a second output value set;
the third processing module is used for carrying out dynamic analysis by adopting a linear regression analysis method based on regularization according to the first output value set and the second output value set to obtain an abnormal data set;
and the data cleaning module is used for cleaning the data of the abnormal data set to obtain the cleaned power data.
10. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 8 when executing the computer program.
CN202110463520.6A 2021-04-26 2021-04-26 Processing method of abnormal value in power data and terminal equipment Active CN113128612B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110463520.6A CN113128612B (en) 2021-04-26 2021-04-26 Processing method of abnormal value in power data and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110463520.6A CN113128612B (en) 2021-04-26 2021-04-26 Processing method of abnormal value in power data and terminal equipment

Publications (2)

Publication Number Publication Date
CN113128612A true CN113128612A (en) 2021-07-16
CN113128612B CN113128612B (en) 2022-11-29

Family

ID=76780423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110463520.6A Active CN113128612B (en) 2021-04-26 2021-04-26 Processing method of abnormal value in power data and terminal equipment

Country Status (1)

Country Link
CN (1) CN113128612B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114254702A (en) * 2021-12-16 2022-03-29 南方电网数字电网研究院有限公司 Method, device, equipment, medium and product for identifying abnormal data of bus load
CN115827621A (en) * 2023-02-17 2023-03-21 河北雄安睿天科技有限公司 Water affair data management system based on cloud computing and data analysis

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105719002A (en) * 2016-01-18 2016-06-29 重庆大学 Wind turbine generator state parameter abnormity identification method based on combination prediction
CN109299156A (en) * 2018-08-21 2019-02-01 平安科技(深圳)有限公司 Electronic device, the electric power data predicting abnormality method based on XGBoost and storage medium
CN110287983A (en) * 2019-05-10 2019-09-27 杭州电子科技大学 Based on maximal correlation entropy deep neural network single classifier method for detecting abnormality
CN110363384A (en) * 2019-06-03 2019-10-22 杭州电子科技大学 Exception electric detection method based on depth weighted neural network
CN111210846A (en) * 2020-01-07 2020-05-29 重庆大学 Parkinson voice recognition system based on integrated manifold dimensionality reduction
US20200234165A1 (en) * 2018-01-26 2020-07-23 Dalian University Of Technology Prediction method for aero-engine starting exhaust temperature
CN111563615A (en) * 2020-04-17 2020-08-21 国网天津市电力公司 Load prediction method based on feature analysis and combination learning
CN111985546A (en) * 2020-08-10 2020-11-24 西北工业大学 Aircraft engine multi-working-condition detection method based on single-classification extreme learning machine algorithm
CN112084237A (en) * 2020-09-09 2020-12-15 广东电网有限责任公司中山供电局 Power system abnormity prediction method based on machine learning and big data analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHI, YING et al.: "Fast Detection Algorithm for Abnormal Values in Power Operation Big Data Based on Cloud Computing", Electronic Design Engineering *
QIU, MING et al.: "Research on Photovoltaic Power Prediction Method Based on Data Cleaning and Combination Learning", Renewable Energy Resources *


Also Published As

Publication number Publication date
CN113128612B (en) 2022-11-29

Similar Documents

Publication Publication Date Title
CN115018021B (en) Machine room abnormity detection method and device based on graph structure and abnormity attention mechanism
CN113128612B (en) Processing method of abnormal value in power data and terminal equipment
CN110175541B (en) Method for extracting sea level change nonlinear trend
CN115587666A (en) Load prediction method and system based on seasonal trend decomposition and hybrid neural network
CN112418476A (en) Ultra-short-term power load prediction method
Karamizadeh et al. Using the clustering algorithms and rule-based of data mining to identify affecting factors in the profit and loss of third party insurance, insurance company auto
WO2023159760A1 (en) Convolutional neural network model pruning method and apparatus, electronic device, and storage medium
CN116205863A (en) Method for detecting hyperspectral image abnormal target
CN118134046A (en) Wind farm power prediction method and system based on machine learning
CN117131022B (en) Heterogeneous data migration method of electric power information system
CN110866672B (en) Data processing method, device, terminal and medium
CN116776209A (en) Method, system, equipment and medium for identifying operation state of gateway metering device
Wibowo et al. Food price prediction using time series linear ridge regression with the best damping factor
CN117272145A (en) Health state evaluation method and device of switch machine and electronic equipment
CN116610973A (en) Sensor fault monitoring and failure information reconstruction method and system
CN116885697A (en) Load prediction method based on combination of cluster analysis and intelligent algorithm
CN116090546A (en) Training method of energy consumption model, energy consumption characterization method and related equipment
CN114722941A (en) Credit default identification method, apparatus, device and medium
CN115169740A (en) Sequence prediction method and system of pooled echo state network based on compressed sensing
Wei et al. Sparse reduced-rank regression with adaptive selection of groups of predictors
CN113051809A (en) Virtual health factor construction method based on improved restricted Boltzmann machine
CN116723083B (en) Cloud server online fault diagnosis method and device
CN111489011A (en) Economic information processing system based on machine learning algorithm
CN112541554B (en) Multi-mode process monitoring method and system based on time constraint and nuclear sparse representation
CN118194028B (en) Low orbit satellite abnormal state identification method based on mixed probability principal component analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant