CN113128612A - Processing method of abnormal value in power data and terminal equipment - Google Patents


Info

Publication number
CN113128612A
Authority
CN
China
Prior art keywords
data
power data
abnormal
output value
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110463520.6A
Other languages
Chinese (zh)
Other versions
CN113128612B (en)
Inventor
张凯
何胜
郭威
魏新杰
吴清普
李士林
刘梅
罗欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tsingsoft Technology Co ltd
State Grid Corp of China SGCC
State Grid Hebei Electric Power Co Ltd
Marketing Service Center of State Grid Hebei Electric Power Co Ltd
Original Assignee
Beijing Tsingsoft Technology Co ltd
State Grid Corp of China SGCC
State Grid Hebei Electric Power Co Ltd
Marketing Service Center of State Grid Hebei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tsingsoft Technology Co ltd, State Grid Corp of China SGCC, State Grid Hebei Electric Power Co Ltd, Marketing Service Center of State Grid Hebei Electric Power Co Ltd filed Critical Beijing Tsingsoft Technology Co ltd
Priority to CN202110463520.6A
Publication of CN113128612A
Application granted
Publication of CN113128612B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/10 Pre-processing; Data cleansing
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G06N 20/20 Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of data processing and provides a method for processing abnormal values in power data, together with a terminal device. The method comprises the following steps: obtaining power data generated in a power operation link; performing data dimensionality reduction and data standardization on the power data to obtain processed standard power data; inputting the standard power data into a kernel function extreme learning model and a superposition integration model respectively for abnormal value identification, obtaining a first output value set and a second output value set; performing dynamic analysis with a regularization-based linear regression analysis method according to the first output value set and the second output value set to obtain an abnormal data set; and performing data cleaning on the abnormal data set to obtain cleaned power data. The invention improves both the detection accuracy and the detection efficiency for abnormal values in power data, and yields corrected power data after the abnormal data set is cleaned.

Description

Processing method of abnormal value in power data and terminal equipment
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a method for processing abnormal values in electric power data and terminal equipment.
Background
With the gradual expansion of power construction across the five links of the grid (power generation, transmission, transformation, distribution, and power utilization), a large amount of power data is generated, and these data play an important role in the digital and intelligent development of China's power grid. However, during power operation, events such as equipment damage, blown fuses, or illegal intrusion cause abnormalities in the power data. These abnormalities may disturb users' normal power consumption, increase line loss, and cause loss of electric quantity; they may even cause fires, affect the stability and safety of the grid, and bring serious economic and reputational losses to grid enterprises. Effective detection of abnormal values in power data therefore remains a difficult problem.
However, currently employed methods for detecting abnormal values in power data cannot achieve the dual objectives of detection accuracy and detection efficiency at the same time.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method for processing an abnormal value in power data and a terminal device, which are used to solve the problem in the prior art that the dual objectives of detection accuracy and efficiency cannot be achieved simultaneously.
To achieve the above object, a first aspect of an embodiment of the present invention provides a method for processing an abnormal value in power data, including:
acquiring power data generated in a power operation link;
performing data dimensionality reduction and data standardization processing on the electric power data to obtain processed standard electric power data;
respectively inputting the standard power data into a kernel function extreme learning model and a superposition integration model for abnormal value identification to obtain a first output value set and a second output value set;
according to the first output value set and the second output value set, performing dynamic analysis by adopting a regularization-based linear regression analysis method to obtain an abnormal data set;
and carrying out data cleaning on the abnormal data set to obtain cleaned power data.
A second aspect of an embodiment of the present invention provides an apparatus for processing an abnormal value in power data, including:
the acquisition module is used for acquiring power data generated in a power operation link;
the first processing module is used for carrying out data dimensionality reduction and data standardization processing on the electric power data to obtain processed standard electric power data;
the second processing module is used for inputting the standard power data into the kernel function extreme learning model and the superposition integration model respectively for abnormal value identification to obtain a first output value set and a second output value set;
the third processing module is used for carrying out dynamic analysis by adopting a linear regression analysis method based on regularization according to the first output value set and the second output value set to obtain an abnormal data set;
and the data cleaning module is used for cleaning the data of the abnormal data set to obtain the cleaned power data.
A third aspect of an embodiment of the present invention provides a terminal device, including: a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method for processing abnormal values in power data as described in the first aspect.
Compared with the prior art, the embodiment of the invention has the following beneficial effects. The standard power data are input into the kernel function extreme learning model and the superposition integration model respectively for abnormal value identification, yielding a first output value set and a second output value set; a regularization-based linear regression analysis method is then applied to these two sets to perform dynamic analysis and obtain an abnormal data set. This avoids the problems that arise when only a single model is used to obtain the abnormal data set, namely becoming trapped in a local minimum and a limited statistical hypothesis space; it improves the accuracy and reliability of abnormal value detection in power data while also improving detection efficiency. In addition, data cleaning is performed on the abnormal data set, yielding corrected power data.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without inventive effort.
Fig. 1 is a schematic flow chart of an implementation of a method for processing an abnormal value in power data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a basic architecture of cloud computing provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data warehouse provided by an embodiment of the invention;
FIG. 4 is a basic architecture diagram of a kernel function extreme learning model according to an embodiment of the present invention;
FIG. 5 is a comparison graph of the results of three algorithms provided by the embodiment of the present invention;
FIG. 6 is a schematic diagram of an apparatus for processing abnormal values in power data provided by an embodiment of the present invention;
fig. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Fig. 1 is a schematic flow chart illustrating an implementation of a method for processing an abnormal value in power data according to an embodiment of the present invention, which is described in detail below.
Step 101, acquiring power data generated in a power operation link.
Important hidden information can be mined from the power data, and this information plays an important role in power enterprises' strategic decision making, troubleshooting, operation cost reduction, safe and stable grid operation and maintenance, and long-term development. Because power data is large in volume, abnormal values in it can be processed using the basic cloud-computing architecture shown in fig. 2, drawing on cloud computing's capacity to store and process very large-scale data.
Optionally, obtaining the power data generated in the power operation link may include: obtaining source power data generated in the power operation link; extracting, transforming, and loading the source power data to obtain the power data; and saving the power data in a data warehouse.
Abnormal value analysis is carried out on data from the five power operation links of the grid (power generation, transmission, transformation, distribution, and power utilization). Because the operation data of these links are stored in different data sources, integration is needed to obtain the power data generated in the power operation links. The integrated acquisition of power data is mainly realized through data warehouse technology; the data warehouse structure is shown in fig. 3.
The purpose of the data warehouse is to construct an analysis-oriented, integrated data environment and provide decision support for grid enterprises. Its key component is Extract-Transform-Load (ETL), the process by which data is extracted from the source end, transformed, and loaded into the destination end. ETL is the pipeline of the data warehouse, sometimes described as its lifeblood, since it maintains the flow of data through the warehouse; most of the daily management and maintenance effort of a data warehouse goes into keeping ETL running normally and stably.
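As a rough illustration of the ETL process described above (not from the patent; all file, table, and column names are assumed for the example), a minimal extract-transform-load step might look like this:

```python
import csv
import sqlite3

def etl_power_data(source_csv, db_path="power.db"):
    """Toy ETL sketch: extract rows from a source CSV, transform
    readings to floats while dropping malformed records, and load
    the result into a warehouse table. Names are illustrative."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS power_data "
                 "(meter_id TEXT, ts TEXT, kwh REAL)")
    with open(source_csv, newline="") as f:
        for row in csv.DictReader(f):        # Extract
            try:
                kwh = float(row["kwh"])      # Transform: cast and validate
            except (KeyError, ValueError):
                continue                     # skip malformed records
            conn.execute("INSERT INTO power_data VALUES (?, ?, ?)",
                         (row.get("meter_id"), row.get("ts"), kwh))
    conn.commit()
    return conn
```

A real pipeline would add incremental extraction and error logging; this only shows the extract, transform, load order of operations.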
Step 102: performing data dimensionality reduction and data standardization processing on the power data to obtain processed standard power data.
In the process of acquiring power data, the influence of various factors means that the quality of the acquired data may not meet the standard required for subsequent abnormal value detection and analysis. Various problems always exist, such as extreme maximum or minimum values, null values, noise, data inconsistency, and data confusion, so preprocessing is required; the processing consists of data dimensionality reduction and data standardization.
Dimensionality reduction means representing the original feature variables of the data with fewer new feature variables, such that the new feature variables are mutually uncorrelated while still carrying the main information reflecting the business problem. The purpose of data dimensionality reduction is to reduce the amount of data, reduce computational complexity, and remove noisy data.
Data from different sources differ in unit or dimension, and data with different units or dimensions cannot be directly compared and analyzed; therefore, data standardization is required to eliminate the unit or dimension differences among data from different sources.
Optionally, performing data reduction and data standardization processing on the power data to obtain processed standard power data, which may include: constructing a data covariance matrix according to the power data; performing data dimensionality reduction according to the eigenvalue of the data covariance matrix to obtain new data subjected to dimensionality reduction; determining the maximum value and the minimum value in the new data after dimensionality reduction; and according to the maximum value and the minimum value, sequentially carrying out standardization processing on each datum in the new data after dimensionality reduction to obtain processed standard electric power data.
Optionally, performing data dimensionality reduction according to the eigenvalues of the data covariance matrix to obtain new dimension-reduced data may include: calculating the eigenvalues and eigenvectors of the data covariance matrix; sorting the eigenvalues from large to small according to their contribution; taking the eigenvectors corresponding to the first K eigenvalues as the principal components; and converting the power data into the new data space constructed by the new eigenvectors, completing the dimensionality reduction and obtaining the new dimension-reduced data.
Optionally, standardizing each datum in the new dimension-reduced data in turn according to the maximum value and the minimum value to obtain the processed standard power data may include obtaining the standard power data according to

a' = (a - min a_n) / (max a_n - min a_n),

where a' represents the processed standard power data, a represents the current dimension-reduced datum to be processed, min a_n and max a_n denote the minimum and maximum values over all the dimension-reduced data a_n to be processed, and n is the number of dimension-reduced data (a positive integer).
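The preprocessing steps above (covariance-based dimensionality reduction followed by min-max standardization) can be sketched as follows; this is an illustrative implementation, and the function and variable names are assumptions, not from the patent text:

```python
import numpy as np

def standardize_power_data(data, k):
    """Sketch of the preprocessing step: PCA-style dimensionality
    reduction via the data covariance matrix, followed by min-max
    normalization a' = (a - min) / (max - min)."""
    # Center the data and build the covariance matrix
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # Eigen-decomposition; sort eigenvalues by descending contribution
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    # Keep the eigenvectors of the first k eigenvalues as principal components
    components = eigvecs[:, order[:k]]
    reduced = centered @ components
    # Min-max normalization per retained component
    lo, hi = reduced.min(axis=0), reduced.max(axis=0)
    return (reduced - lo) / (hi - lo)
```

Each output column then lies in [0, 1], which removes the unit and dimension differences discussed above.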
Step 103: inputting the standard power data into the kernel function extreme learning model and the superposition integration model respectively for abnormal value identification, obtaining a first output value set and a second output value set.
Optionally, the part of this step that yields the first output value set may include: setting a preset constant; performing optimal quadratic solution processing on the weights between the hidden layer and the output layer in the kernel function extreme learning model based on the preset constant to obtain a new weight; updating the kernel function extreme learning model with the new weight to obtain a new kernel function extreme learning model; and inputting the standard power data into the new kernel function extreme learning model to obtain the first output value set.
The kernel function extreme learning model belongs to a single-layer feedforward neural network algorithm. The basic extreme learning model is expressed in the form of:
f(x)=h(x)β;
where h (x) represents the calculated output of the hidden layer. Beta is ═ beta12,…,βL]TRepresenting the connection weight between the hidden layer and the output layer. The error expression of the extreme learning model is as follows:
Figure BDA0003039062730000061
wherein L represents the number of neurons, fO(x) Representing a genuine mark. The basic architecture of the extreme learning model is shown in fig. 4, and the output function can be expressed as:
Figure BDA0003039062730000062
wherein, gi(x) And G (b)i,ciX) represents the output function of the i-th hidden node, bi,ciRepresenting a hidden layer parameter, betaiRepresenting the connection weight between the hidden layer and the output layer,training the feedforward neural network requires an optimal quadratic solution of the weights:
Figure BDA0003039062730000063
wherein, T represents a preset constant, H represents a neural network hidden layer matrix, and based on the preset constant, optimal second-order solution processing is performed on the weights of the hidden layer and the output layer in the kernel function extreme learning model to obtain a new weight, which may include:
according to

β = H† T = H^T (I/C + H H^T)^{-1} T,

a new weight is obtained, where β represents the new weight, H† is the generalized inverse matrix of H, H^T is the transpose of H, and C represents a second preset constant. Increasing the constant 1/C based on the second preset constant C gives the solution for the new weight better generalization ability.
Illustratively, the kernel function employed in this embodiment may be a Gaussian kernel function with kernel matrix Ω_ELM, and the output of the Gaussian-kernel extreme learning model is expressed as

f(x) = [K(x, x_1), …, K(x, x_N)] (I/C + Ω_ELM)^{-1} T,

where N represents the input-layer dimension.
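The regularized solution above can be sketched in a few lines; this is an illustrative kernel extreme learning machine under the assumption of a Gaussian kernel, and the class and parameter names are not from the patent:

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    # Pairwise Gaussian (RBF) kernel between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KernelELM:
    """Minimal kernel extreme learning machine sketch using the
    regularized solution (I/C + Omega)^{-1} T from the text."""
    def __init__(self, C=1.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, T):
        self.X = X
        omega = gaussian_kernel(X, X, self.gamma)  # kernel matrix Omega_ELM
        n = omega.shape[0]
        # Regularized least-squares solution with the 1/C ridge term
        self.alpha = np.linalg.solve(np.eye(n) / self.C + omega, T)
        return self

    def predict(self, Xnew):
        return gaussian_kernel(Xnew, self.X, self.gamma) @ self.alpha
```

The 1/C term on the diagonal is what gives the solution its generalization ability: larger C moves the model toward exact interpolation of the training targets.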
Optionally, the part of this step that yields the second output value set may include: obtaining k groups of training data from the standard power data, together with an initial weight for each group; training the preset learning model according to the initial weights and the k groups of training data to obtain the first group of weak learning models in the superposition integration model and the sum of its output errors; updating the weight of each group of training data according to that error sum, and continuing training with the updated weights and the k groups of training data until T groups of weak learning models are obtained; combining the T groups of weak learning models into a strong learning model; and inputting the standard power data into the strong learning model to obtain the second output value set.
For example, the superposition integration model may be an AdaBoost model, which trains a number of weak learning models and then combines them into a strong learning model. The general idea is to give lower weights to correctly classified samples and higher weights to misclassified samples, improving model performance through repeated weighted combination.
For example, k groups of training data may be obtained from the standard power data, with an initial weight D_1(i) = 1/k for each group. When the t-th weak learning model is trained, the k groups of training data are used to train a decision tree, and the sum of the output errors of the t-th weak learning model is

e_t = Σ_i D_t(i) I(f_t(x_i) ≠ y_i).

The weight α_t of the t-th weak learning model is calculated from this error sum by

α_t = (1/2) ln((1 - e_t) / e_t).

According to the weight α_t of the t-th weak learning model, the weight of each group of training data is updated by

D_{t+1}(i) = (D_t(i) / B_t) exp(-α_t y_i f_t(x_i)), i = 1, 2, …, k,

where B_t represents a normalization factor.

After T rounds of training, T groups of weak learning models f_t are obtained, and combining the T groups of weak learning models gives the strong learning model q(x):

q(x) = Σ_{t=1}^{T} α_t f_t(x).
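The boosting loop above can be sketched with decision stumps standing in for the decision trees; this is a hand-rolled illustration of the weight formulas (α_t, D_{t+1}, B_t), with assumed function names, not the patent's implementation:

```python
import numpy as np

def fit_adaboost(X, y, T=10):
    """AdaBoost sketch with decision stumps as weak learners.
    Labels y are in {-1, +1}; D holds the sample weights D_t(i)."""
    n = len(y)
    D = np.full(n, 1.0 / n)          # initial weights D_1(i) = 1/k
    stumps, alphas = [], []
    for _ in range(T):
        best = None
        # Exhaustive search for the stump with the lowest weighted error
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] <= thr, 1, -1)
                    err = D[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)      # keep the log finite
        alpha = 0.5 * np.log((1 - err) / err)      # weak-learner weight
        pred = sign * np.where(X[:, j] <= thr, 1, -1)
        D *= np.exp(-alpha * y * pred)
        D /= D.sum()                               # B_t normalization
        stumps.append((j, thr, sign))
        alphas.append(alpha)
    return stumps, alphas

def predict_adaboost(stumps, alphas, X):
    # Strong model q(x): sign of the alpha-weighted vote
    total = np.zeros(len(X))
    for (j, thr, sign), a in zip(stumps, alphas):
        total += a * sign * np.where(X[:, j] <= thr, 1, -1)
    return np.sign(total)
```

Misclassified samples gain weight via the exp(-α_t y_i f_t(x_i)) factor, so later stumps concentrate on them, exactly as the update formula above describes.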
optionally, the standard power data may be input into the neural network model to perform outlier identification, and the result obtained through the neural network model identification and the result obtained through the superposition integration model identification are combined to obtain the second output value set.
Step 104: performing dynamic analysis by adopting a regularization-based linear regression analysis method according to the first output value set and the second output value set to obtain an abnormal data set.
Optionally, performing dynamic analysis with the regularization-based linear regression analysis method according to the first output value set and the second output value set to obtain an abnormal data set may include obtaining the abnormal data set according to

Y = Xδ + ε,

where Y denotes the abnormal data set, X = (X^(1), X^(2)) with X^(1) representing the first output value set and X^(2) representing the second output value set, δ represents the regression coefficients, and ε represents a random error term.
The regularization-based linear regression analysis method may be the Lasso regression method, whose main idea is to minimize the residual sum of squares under the constraint that the sum of the absolute values of the coefficients is less than a threshold.
Here X^(2) may represent the second output value set obtained by inputting the standard power data into the superposition integration model for abnormal value identification, or the second output value set obtained by inputting the standard power data into both the neural network model and the superposition integration model. The random error term is ε = (ε_1, ε_2, …, ε_m)^T with ε_i ~ N(0, σ²), the regression coefficient vector is δ = (δ_1, δ_2, …, δ_d)^T, and m and d denote the corresponding numbers of terms and coefficients. The Lasso regression method adds an L1 penalty term, producing the Lasso estimate

δ̂ = argmin_δ ||Y - Xδ||² + k Σ_{j=1}^{d} |δ_j|,

where δ̂ represents the dynamic weights corresponding to the first and second output value sets, k represents a tuning coefficient, and δ_j represents the weight of the j-th learning algorithm. The dynamic weights δ̂ are solved during the training of the Lasso model according to the first output value set and the second output value set. Once the dynamic weights δ̂ have been calculated, the Lasso model is established; that is, the abnormal data set is obtained from the trained Lasso model, the first output value set, and the second output value set.
In this embodiment, the abnormal value identification results of the kernel function extreme learning model and the superposition integration model (or of the kernel function extreme learning model, the neural network model, and the superposition integration model) are further learned through a Lasso linear combination. The advantages of each combined model can thus be learned, avoiding the problems of becoming trapped in a local minimum and of a limited statistical hypothesis space that arise when only a single model is used to obtain the abnormal data set.
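The Lasso estimate above can be illustrated with a minimal coordinate-descent solver, where the two models' output value sets form the columns of X and the fitted coefficients play the role of the dynamic weights; this sketch and its names are illustrative, not the patent's solver:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso sketch: minimizes
    0.5 * ||y - X d||^2 + lam * sum(|d_j|)."""
    n, p = X.shape
    d = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual excluding coordinate j
            r = y - X @ d + X[:, j] * d[j]
            rho = X[:, j] @ r
            # Soft-thresholding update (the L1 penalty's effect)
            d[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return d
```

The soft-thresholding step shrinks small coefficients exactly to zero, which is how the Lasso combination can suppress a model whose scores carry little information.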
Optionally, to keep the processing method for abnormal values in power data up to date, when the method for obtaining the abnormal data set reaches a certain time threshold or a certain error threshold, the first output value set and the second output value set may be regenerated by the procedures described above, and the dynamic weights of the updated sets solved again with the formula above. This yields an abnormal-data identification model that rolls forward in time, so that the model can learn the structural characteristics of the latest data and remains matched to the data.
Step 105: performing data cleaning on the abnormal data set to obtain cleaned power data.
Optionally, performing data cleaning on the abnormal data set to obtain cleaned power data may include obtaining the cleaned power data according to

L_{d,t} = Σ_{g=1}^{m_a} λ_g y_{d-g,t},

where L_{d,t} represents the cleaned power data at time t on day d, y_{d-g,t} denotes the load observation at time t on day d-g, λ_g represents the influence weight of the load observation at time t on day d-g on the power data to be cleaned at time t on day d, g = 1, 2, …, m_a, and m_a denotes the number of selected similar days.
In this embodiment, the load abnormal values in the power data are identified on the basis of the abnormal data set, and the identified abnormal values are then corrected with the weighted average method.
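The weighted-average correction formula above amounts to a single dot product; a minimal sketch, with assumed names, where `history` holds y_{d-g,t} for g = 1..m_a and `weights` holds the λ_g influence weights:

```python
import numpy as np

def clean_load_value(history, weights):
    """Replace a flagged load value at (day d, time t) with the
    weighted average of readings at the same time t on m_a similar
    prior days: L_{d,t} = sum_g lambda_g * y_{d-g,t}."""
    history = np.asarray(history, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(history @ weights)
```

The λ_g weights would typically sum to 1 so that the corrected value stays on the scale of the observations.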
The method for processing the abnormal value in the power data will be further described below by specific examples.
A table of daily power consumption data for nearly 20000 users over the year 2018 is obtained, and 77798020 records are selected from it as test samples; the distribution of the samples is shown in table 1.
TABLE 1 Sample distribution

Category    Abnormal data    Normal data
Number      26562555         51235465
The overall index testing method is as follows:
1) accuracy of detection
When the sample data set used for evaluating algorithm performance is class-imbalanced, directly using classification accuracy for evaluation produces a large error between the obtained result and the actual result. Detection accuracy is therefore analyzed with the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC).
The ROC curve is drawn on a two-dimensional plane whose abscissa is the False Positive Rate (FPR) and whose ordinate is the True Positive Rate (TPR). For a classifier, a (TPR, FPR) pair can be derived from its performance on the test samples, so the classifier can be mapped to a point on the ROC plane. By adjusting the threshold the classifier uses, a curve passing through (0,0) and (1,1) is obtained: the classifier's ROC curve. Since a single value indicating the quality of the classifier is still desirable, the AUC is used.
The AUC value is the area of the region covered by the ROC curve, and it is obvious that the larger the AUC is, the better the classifier classification effect is.
AUC = 1 corresponds to a perfect classifier: whatever threshold is set, a perfect prediction is obtained. In most prediction scenarios no perfect classifier exists.
0.5 < AUC < 1: better than random guessing. With a properly set threshold, the classifier has predictive value.
AUC = 0.5: equivalent to random guessing; the model has no predictive value.
AUC < 0.5: worse than random guessing; however, simply inverting its predictions makes it better than random guessing.
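The AUC can be computed directly from its rank interpretation, without drawing the curve; a sketch (function name illustrative) of the equivalent pairwise computation:

```python
import numpy as np

def auc_score(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen
    abnormal sample (label 1) scores higher than a randomly chosen
    normal sample (label 0). Ties count half."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Compare every abnormal/normal score pair
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

This matches the interpretation above: 1.0 for a perfect ranking, 0.5 for random scores, and below 0.5 when the ranking is inverted.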
2) Efficiency of detection
The detection efficiency refers to the time consumed by the algorithm in each operating period, and the shorter the time is, the higher the detection efficiency is.
To better illustrate the advantages of the processing method for abnormal values in power data of this embodiment in terms of detection accuracy and computational efficiency, comparison algorithms are adopted: the processing method for abnormal values in power data of this embodiment (model 1), a Gaussian process algorithm (model 2), and a random forest algorithm (model 3) are run on the same data.
The results were analyzed as follows:
1) accuracy of detection
As can be seen from fig. 5, when model 1 is used to detect abnormal values in the power data, the ROC curve drawn from the detection results lies closer to the upper left corner and the AUC value is 0.8025; both the ROC and the AUC are better than the results obtained with models 2 and 3, which shows that the algorithm of the embodiment of the present invention has better detection accuracy.
2) Efficiency of detection
As can be seen from table 2, when model 1 runs for one cycle, the time consumed to detect abnormal values in the power operation big data is 85 s, which is less than that of the other two methods. Therefore, the algorithm of the embodiment of the invention has high detection efficiency.
TABLE 2 calculation of time
(Table 2 is reproduced only as images in the original publication; its values are not recoverable here.)
According to the processing method for abnormal values in power data described above, the standard power data are input into the kernel function extreme learning model and the superposition integration model respectively to identify abnormal values, so as to obtain a first output value set and a second output value set, and dynamic analysis is performed with a regularization-based linear regression analysis method on the first output value set and the second output value set to obtain the abnormal data set. In addition, data cleaning is performed on the abnormal data set, and corrected power data after cleaning can be obtained.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 6 is a diagram showing an example of a processing apparatus for processing an abnormal value in power data according to an embodiment of the present invention, which corresponds to the processing method for an abnormal value in power data described in the above embodiment. As shown in fig. 6, the apparatus may include: an acquisition module 61, a first processing module 62, a second processing module 63, a third processing module 64, and a data cleansing module 65.
An obtaining module 61, configured to obtain power data generated in a power operation link;
the first processing module 62 is configured to perform data dimensionality reduction and data standardization processing on the power data to obtain processed standard power data;
the second processing module 63 is configured to input the standard power data into the kernel function extreme learning model and the superposition integration model respectively for abnormal value identification, so as to obtain a first output value set and a second output value set;
a third processing module 64, configured to perform dynamic analysis by using a linear regression analysis method based on regularization according to the first output value set and the second output value set, so as to obtain an abnormal data set;
and the data cleaning module 65 is configured to perform data cleaning on the abnormal data set to obtain cleaned power data.
Optionally, the obtaining module 61 may be configured to obtain source power data generated in a power operation link; extracting, converting and loading the source power data to obtain power data; saving the power data in a data repository.
Optionally, the first processing module 62 may be configured to construct a data covariance matrix according to the power data; performing data dimensionality reduction according to the eigenvalue of the data covariance matrix to obtain new data subjected to dimensionality reduction; determining the maximum value and the minimum value in the new data after dimension reduction; and sequentially carrying out standardization processing on each datum in the new data after dimensionality reduction according to the maximum value and the minimum value to obtain processed standard electric power data.
Optionally, the first processing module 62 may be adapted to obtain the processed standard power data according to
a' = (a − min a_n) / (max a_n − min a_n)
wherein a' represents the processed standard power data, a represents the reduced-dimension datum currently to be processed, min a_n denotes the minimum value, max a_n denotes the maximum value, a_n denotes all the reduced-dimension data to be processed, and n, a positive integer, is the number of reduced-dimension data.
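The first processing module's two steps, covariance-eigenvalue dimensionality reduction followed by min-max standardization, can be sketched as follows; the component count and the sample matrix are illustrative assumptions, not the patent's data.

```python
import numpy as np

def reduce_and_standardize(X, n_components):
    """Covariance-eigenvalue dimensionality reduction, then min-max scaling."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)          # data covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    reduced = X_centered @ top                      # new data after dimensionality reduction
    lo, hi = reduced.min(), reduced.max()           # min a_n and max a_n
    return (reduced - lo) / (hi - lo)               # a' = (a - min) / (max - min)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))                       # stand-in for raw power data
std_data = reduce_and_standardize(X, n_components=3)
```

The output lies in [0, 1] by construction, which is what the standardization step is for.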
Optionally, the second processing module 63 may be configured to set a preset constant;
based on the preset constant, carrying out optimal least-squares solution processing on the weights of the hidden layer and the output layer in the kernel function extreme learning model to obtain a new weight;
updating the kernel function extreme learning model according to the new weight to obtain a new kernel function extreme learning model;
inputting the standard power data into the new kernel function extreme learning model to obtain a first output value set;
acquiring k groups of training data and the initial weight of each group of training data according to the standard power data;
training a preset learning model according to the initial weight of each group of training data and the k groups of training data to obtain the sum of a first group of weak learning models in the superposition integration model and the output error of the first group of weak learning models;
updating the weight of each group of training data according to the sum of the output errors of the first group of weak learning models, and continuing to train according to the updated weight of each group of training data and the k groups of training data until T groups of weak learning models are obtained;
combining the T groups of weak learning models to obtain a strong learning model;
and inputting the standard power data into the strong learning model to obtain a second output value set.
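The boosting loop described above (train a weak model, re-weight the training data by its output error, repeat until T groups of weak models are combined into a strong model) can be sketched with decision stumps; the stump learner, T = 10, and the synthetic labels are illustrative assumptions rather than the patent's configuration.

```python
import numpy as np

def train_stump(X, y, w):
    """Pick the weighted-error-minimizing threshold stump; y in {-1, +1}."""
    best = (0, 0.0, 1, np.inf)                      # (feature, threshold, sign, error)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.where(X[:, j] > t, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def adaboost(X, y, T=10):
    """Train T weak stumps, re-weighting samples by each stump's output error."""
    w = np.full(len(y), 1.0 / len(y))               # initial weight of each sample
    models = []
    for _ in range(T):
        j, t, s, err = train_stump(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)       # this weak model's vote
        pred = s * np.where(X[:, j] > t, 1, -1)
        w *= np.exp(-alpha * y * pred)              # raise weights where it erred
        w /= w.sum()
        models.append((j, t, s, alpha))
    return models

def predict(models, X):
    """Combine the T weak models into the strong model's output."""
    agg = sum(a * s * np.where(X[:, j] > t, 1, -1) for j, t, s, a in models)
    return np.sign(agg)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))                       # stand-in standard power data
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)    # +1 = abnormal, -1 = normal
models = adaboost(X, y, T=10)
second_output_set = predict(models, X)              # the "second output value set"
```

Each round re-weights exactly as the text describes: samples the previous weak model got wrong gain weight, so the next weak model focuses on them.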
Optionally, the second processing module 63 may be adapted to obtain the new weight according to
β = H†T, where H† = H^T (H H^T + I/C)^(-1);
wherein β represents the new weight, H represents the neural network hidden layer matrix, H† is a generalized inverse matrix of H, H^T is the transpose matrix of H, T represents the preset constant, and C represents the second preset constant.
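Since the weight formula survives only as its variable definitions, the sketch below assumes the standard regularized generalized-inverse form used by kernel extreme learning machines, β = H^T (H H^T + I/C)^(-1) T; the matrices H, T and the constant C are illustrative stand-ins, not the patent's values.

```python
import numpy as np

rng = np.random.default_rng(3)
H = rng.normal(size=(50, 20))     # hidden-layer output matrix (samples x nodes)
T = rng.normal(size=(50, 1))      # training targets (the "preset constant" T)
C = 10.0                          # second preset constant (regularization)

# beta = H^T (H H^T + I/C)^(-1) T  -- regularized generalized-inverse solution
beta = H.T @ np.linalg.solve(H @ H.T + np.eye(H.shape[0]) / C, T)

# Equivalent primal ridge form, useful as a consistency check
beta_primal = np.linalg.solve(H.T @ H + np.eye(H.shape[1]) / C, H.T @ T)
```

The two forms are algebraically identical; the first inverts a samples-by-samples matrix and the second a nodes-by-nodes matrix, so one picks whichever is smaller.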
Optionally, the third processing module 64 may be configured to obtain the abnormal data set according to Y = Xδ + ε;
wherein Y denotes the abnormal data set, X = (X^(1), X^(2)), X^(1) represents the first output value set, X^(2) represents the second output value set, δ represents the regression coefficient, and ε represents the random error term.
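One concrete regularization-based estimator for the model Y = Xδ + ε is ridge regression. The sketch below stacks two synthetic output value sets column-wise as X = (X^(1), X^(2)) and recovers δ; the sets, the noise level, and the regularization strength are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
first_set = rng.normal(size=(100, 1))               # X^(1): stand-in for kernel-ELM outputs
second_set = rng.normal(size=(100, 1))              # X^(2): stand-in for ensemble outputs
X = np.hstack([first_set, second_set])              # X = (X^(1), X^(2))

delta_true = np.array([0.7, 0.3])                   # ground-truth regression coefficients
Y = X @ delta_true + 0.01 * rng.normal(size=100)    # epsilon: random error term

lam = 0.1                                           # regularization strength
delta = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ Y)  # ridge estimate of delta
residual = Y - X @ delta                            # leftover random error
```

With a small regularization term the estimate stays close to the true coefficients while remaining well-conditioned even if the two output sets are correlated.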
Optionally, the data cleaning module 65 may be adapted to obtain the cleaned power data according to
L_{d,t} = Σ_{g=1}^{m_a} λ_g · y_{d−g,t}
wherein L_{d,t} represents the cleaned power data at time t on day d, y_{d−g,t} denotes the load observation data at time t on day d−g, λ_g represents the weight of the influence of the load observation data at time t on day d−g on the power data to be cleaned at time t on day d, g = 1, 2, …, m_a, and m_a denotes the number of similar days selected.
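Assuming the weighted-sum form implied by the variable definitions above, L_{d,t} = Σ_g λ_g · y_{d−g,t}, the cleaning step reduces to a dot product over the selected similar days; the weights and the load history below are illustrative values, not the patent's data.

```python
import numpy as np

def clean_value(history, weights):
    """history[g-1] = y_{d-g,t}; weights[g-1] = lambda_g, g = 1..m_a."""
    return float(np.dot(weights, history))

m_a = 3                                      # number of similar days selected
lambdas = np.array([0.5, 0.3, 0.2])          # lambda_g, summing to 1
y_history = np.array([98.0, 102.0, 100.0])   # y_{d-1,t}, y_{d-2,t}, y_{d-3,t}
L_dt = clean_value(y_history, lambdas)       # cleaned power data at (d, t)
```

Weights that sum to 1 keep the cleaned value on the same scale as the observed loads, so an abnormal reading is replaced by a plausible one.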
According to the processing device for abnormal values in power data described above, the standard power data are input into the kernel function extreme learning model and the superposition integration model respectively to identify abnormal values, so as to obtain a first output value set and a second output value set, and dynamic analysis is performed with a regularization-based linear regression analysis method on the two sets to obtain the abnormal data set. This avoids the problems of easily falling into a local minimum and of a limited statistical hypothesis space that arise when only a single model is used to obtain the abnormal data set, and improves the efficiency of detecting abnormal values in the power data while improving the accuracy and reliability of the detection. In addition, data cleaning is performed on the abnormal data set, and corrected power data after cleaning can be obtained.
Fig. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 7, the terminal device 700 of this embodiment includes: a processor 701, a memory 702, and a computer program 703, such as a processing program for abnormal values in power data, stored in the memory 702 and executable on the processor 701. The processor 701 implements the steps in the embodiment of the processing method for the abnormal value in the power data, such as the steps 101 to 105 shown in fig. 1, when executing the computer program 703, and the processor 701 implements the functions of the modules in the embodiments of the apparatuses, such as the modules 61 to 65 shown in fig. 6, when executing the computer program 703.
Illustratively, the computer program 703 may be partitioned into one or more program modules, which are stored in the memory 702 and executed by the processor 701 to implement the present invention. The one or more program modules may be a series of computer program instruction segments capable of performing a specific function, which are used to describe the execution process of the computer program 703 in the processing apparatus or the terminal device 700 of the abnormal value in the power data. For example, the computer program 703 may be divided into the obtaining module 61, the first processing module 62, the second processing module 63, the third processing module 64, and the data cleansing module 65, and specific functions of the modules are shown in fig. 6, which is not described herein again.
The terminal device 700 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 701, a memory 702. Those skilled in the art will appreciate that fig. 7 is merely an example of a terminal device 700 and does not constitute a limitation of terminal device 700 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The Processor 701 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 702 may be an internal storage unit of the terminal device 700, such as a hard disk or a memory of the terminal device 700. The memory 702 may also be an external storage device of the terminal device 700, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the terminal device 700. Further, the memory 702 may also include both an internal storage unit and an external storage device of the terminal device 700. The memory 702 is used for storing the computer program and other programs and data required by the terminal device 700, and may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer readable media do not include electrical carrier signals and telecommunications signals, in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A method for processing an abnormal value in power data, comprising:
acquiring power data generated in a power operation link;
performing data dimensionality reduction and data standardization processing on the electric power data to obtain processed standard electric power data;
respectively inputting the standard power data into a kernel function extreme learning model and a superposition integration model for abnormal value identification to obtain a first output value set and a second output value set;
according to the first output value set and the second output value set, performing dynamic analysis by adopting a regularization-based linear regression analysis method to obtain an abnormal data set;
and carrying out data cleaning on the abnormal data set to obtain cleaned power data.
2. The method for processing abnormal values in power data according to claim 1, wherein the acquiring of the power data generated in the power operation link comprises:
acquiring source power data generated in a power operation link;
extracting, converting and loading the source power data to obtain power data;
saving the power data in a data repository.
3. The method for processing the abnormal value in the power data according to claim 1, wherein the step of performing data dimensionality reduction and data normalization on the power data to obtain the processed standard power data comprises:
constructing a data covariance matrix according to the power data;
performing data dimensionality reduction according to the eigenvalue of the data covariance matrix to obtain new data subjected to dimensionality reduction;
determining the maximum value and the minimum value in the new data after dimension reduction;
and sequentially carrying out standardization processing on each datum in the new data after dimensionality reduction according to the maximum value and the minimum value to obtain processed standard electric power data.
4. The method for processing abnormal values in power data according to claim 3, wherein the step of sequentially normalizing each datum in the new reduced-dimension datum according to the maximum value and the minimum value to obtain processed standard power data comprises the steps of:
according to
a' = (a − min a_n) / (max a_n − min a_n)
obtaining processed standard electric power data;
wherein a' represents the processed standard power data, a represents the reduced-dimension datum currently to be processed, min a_n denotes the minimum value, max a_n denotes the maximum value, a_n denotes all the reduced-dimension data to be processed, and n, a positive integer, is the number of reduced-dimension data.
5. The method for processing the abnormal value in the power data according to any one of claims 1 to 4, wherein the step of inputting the standard power data into a kernel function extreme learning model and a superposition integration model respectively for abnormal value identification to obtain a first output value set and a second output value set comprises the steps of:
setting a preset constant;
based on the preset constant, carrying out optimal least-squares solution processing on the weights of the hidden layer and the output layer in the kernel function extreme learning model to obtain a new weight;
updating the kernel function extreme learning model according to the new weight to obtain a new kernel function extreme learning model;
inputting the standard power data into the new kernel function extreme learning model to obtain a first output value set;
acquiring k groups of training data and the initial weight of each group of training data according to the standard power data;
training a preset learning model according to the initial weight of each group of training data and the k groups of training data to obtain the sum of a first group of weak learning models in the superposition integration model and the output error of the first group of weak learning models;
updating the weight of each group of training data according to the sum of the output errors of the first group of weak learning models, and continuing to train according to the updated weight of each group of training data and the k groups of training data until T groups of weak learning models are obtained;
combining the T groups of weak learning models to obtain a strong learning model;
and inputting the standard power data into the strong learning model to obtain a second output value set.
6. The method for processing the abnormal value in the power data according to claim 5, wherein the performing optimal least-squares solution processing on the weights of the hidden layer and the output layer in the kernel function extreme learning model based on the preset constant to obtain a new weight comprises:
according to
β = H†T, where H† = H^T (H H^T + I/C)^(-1),
obtaining a new weight;
wherein β represents the new weight, H represents the neural network hidden layer matrix, H† is a generalized inverse matrix of H, H^T is the transpose matrix of H, T represents the preset constant, and C represents the second preset constant.
7. The method for processing abnormal values in power data according to claim 6, wherein the dynamic analysis is performed by a linear regression analysis method based on regularization according to the first output value set and the second output value set, so as to obtain an abnormal data set, and the method comprises:
obtaining an abnormal data set according to Y = Xδ + ε;
wherein Y denotes the abnormal data set, X = (X^(1), X^(2)), X^(1) represents the first output value set, X^(2) represents the second output value set, δ represents a regression coefficient, and ε represents a random error term.
8. The method for processing abnormal values in power data according to claim 5, wherein the step of performing data cleaning on the abnormal data set to obtain cleaned power data comprises the following steps:
according to
L_{d,t} = Σ_{g=1}^{m_a} λ_g · y_{d−g,t}
obtaining cleaned power data;
wherein L_{d,t} represents the cleaned power data at time t on day d, y_{d−g,t} denotes the load observation data at time t on day d−g, λ_g represents the weight of the influence of the load observation data at time t on day d−g on the power data to be cleaned at time t on day d, g = 1, 2, …, m_a, and m_a denotes the number of similar days selected.
9. An apparatus for processing an abnormal value in power data, comprising:
the acquisition module is used for acquiring power data generated in a power operation link;
the first processing module is used for carrying out data dimensionality reduction and data standardization processing on the electric power data to obtain processed standard electric power data;
the second processing module is used for inputting the standard power data into the kernel function extreme learning model and the superposition integration model respectively for abnormal value identification to obtain a first output value set and a second output value set;
the third processing module is used for carrying out dynamic analysis by adopting a linear regression analysis method based on regularization according to the first output value set and the second output value set to obtain an abnormal data set;
and the data cleaning module is used for cleaning the data of the abnormal data set to obtain the cleaned power data.
10. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 8 when executing the computer program.
CN202110463520.6A 2021-04-26 2021-04-26 Processing method of abnormal value in power data and terminal equipment Active CN113128612B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110463520.6A CN113128612B (en) 2021-04-26 2021-04-26 Processing method of abnormal value in power data and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110463520.6A CN113128612B (en) 2021-04-26 2021-04-26 Processing method of abnormal value in power data and terminal equipment

Publications (2)

Publication Number Publication Date
CN113128612A true CN113128612A (en) 2021-07-16
CN113128612B CN113128612B (en) 2022-11-29

Family

ID=76780423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110463520.6A Active CN113128612B (en) 2021-04-26 2021-04-26 Processing method of abnormal value in power data and terminal equipment

Country Status (1)

Country Link
CN (1) CN113128612B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114254702A (en) * 2021-12-16 2022-03-29 南方电网数字电网研究院有限公司 Method, device, equipment, medium and product for identifying abnormal data of bus load
CN115827621A (en) * 2023-02-17 2023-03-21 河北雄安睿天科技有限公司 Water affair data management system based on cloud computing and data analysis

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105719002A (en) * 2016-01-18 2016-06-29 重庆大学 Wind turbine generator state parameter abnormity identification method based on combination prediction
CN109299156A (en) * 2018-08-21 2019-02-01 平安科技(深圳)有限公司 Electronic device, the electric power data predicting abnormality method based on XGBoost and storage medium
CN110287983A (en) * 2019-05-10 2019-09-27 杭州电子科技大学 Based on maximal correlation entropy deep neural network single classifier method for detecting abnormality
CN110363384A (en) * 2019-06-03 2019-10-22 杭州电子科技大学 Exception electric detection method based on depth weighted neural network
CN111210846A (en) * 2020-01-07 2020-05-29 重庆大学 Parkinson voice recognition system based on integrated manifold dimensionality reduction
US20200234165A1 (en) * 2018-01-26 2020-07-23 Dalian University Of Technology Prediction method for aero-engine starting exhaust temperature
CN111563615A (en) * 2020-04-17 2020-08-21 国网天津市电力公司 Load prediction method based on feature analysis and combination learning
CN111985546A (en) * 2020-08-10 2020-11-24 西北工业大学 Aircraft engine multi-working-condition detection method based on single-classification extreme learning machine algorithm
CN112084237A (en) * 2020-09-09 2020-12-15 广东电网有限责任公司中山供电局 Power system abnormity prediction method based on machine learning and big data analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHI, YING et al.: "Fast Detection Algorithm for Abnormal Values in Power Operation Big Data Based on Cloud Computing", Electronic Design Engineering *
QIU, MING et al.: "Research on Photovoltaic Power Prediction Method Based on Data Cleaning and Combination Learning", Renewable Energy Resources *


Also Published As

Publication number Publication date
CN113128612B (en) 2022-11-29

Similar Documents

Publication Publication Date Title
CN115018021B (en) Machine room abnormity detection method and device based on graph structure and abnormity attention mechanism
CN113128612B (en) Processing method of abnormal value in power data and terminal equipment
CN110175541B (en) Method for extracting sea level change nonlinear trend
CN115587666A (en) Load prediction method and system based on seasonal trend decomposition and hybrid neural network
CN112418476A (en) Ultra-short-term power load prediction method
Karamizadeh et al. Using the clustering algorithms and rule-based of data mining to identify affecting factors in the profit and loss of third party insurance, insurance company auto
WO2023159760A1 (en) Convolutional neural network model pruning method and apparatus, electronic device, and storage medium
CN116205863A (en) Method for detecting hyperspectral image abnormal target
CN118134046A (en) Wind farm power prediction method and system based on machine learning
CN117131022B (en) Heterogeneous data migration method of electric power information system
CN110866672B (en) Data processing method, device, terminal and medium
CN116776209A (en) Method, system, equipment and medium for identifying operation state of gateway metering device
Wibowo et al. Food price prediction using time series linear ridge regression with the best damping factor
CN117272145A (en) Health state evaluation method and device of switch machine and electronic equipment
CN116610973A (en) Sensor fault monitoring and failure information reconstruction method and system
CN116885697A (en) Load prediction method based on combination of cluster analysis and intelligent algorithm
CN116090546A (en) Training method of energy consumption model, energy consumption characterization method and related equipment
CN114722941A (en) Credit default identification method, apparatus, device and medium
CN115169740A (en) Sequence prediction method and system of pooled echo state network based on compressed sensing
Wei et al. Sparse reduced-rank regression with adaptive selection of groups of predictors
CN113051809A (en) Virtual health factor construction method based on improved restricted Boltzmann machine
CN116723083B (en) Cloud server online fault diagnosis method and device
CN111489011A (en) Economic information processing system based on machine learning algorithm
CN112541554B (en) Multi-mode process monitoring method and system based on time constraint and nuclear sparse representation
CN118194028B (en) Low orbit satellite abnormal state identification method based on mixed probability principal component analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant