CN113051256A

CN113051256A - Method, device and equipment for filling missing data of user

Info

Publication number: CN113051256A
Application number: CN202110302543.9A
Authority: CN
Inventors: 马浩; 付文杰; 申洪涛; 杨迪; 马红明; 刘林青
Original assignee: State Grid Corp of China SGCC; Electric Power Research Institute of State Grid Hebei Electric Power Co Ltd; Marketing Service Center of State Grid Hebei Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; Electric Power Research Institute of State Grid Hebei Electric Power Co Ltd; Marketing Service Center of State Grid Hebei Electric Power Co Ltd
Priority date: 2021-03-22
Filing date: 2021-03-22
Publication date: 2021-06-29

Abstract

The invention discloses a method, a device and equipment for filling missing data of a user, wherein the method comprises the following steps: extracting power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix; pre-filling missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix; determining a first user in the pre-populated matrix, wherein the first user is a target user for which there is missing power data; for each first user, searching a plurality of second users similar to the power data of the first user in a pre-filling matrix, forming the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, determining filling values of missing data of the first user in the similar matrix corresponding to the first user based on a singular value threshold algorithm, and filling the filling values into corresponding positions in an original power data matrix, so that the filling precision of the missing data of the users can be improved.

Description

Method, device and equipment for filling missing data of user

Technical Field

The present invention relates to the field of power technologies, and in particular, to a method, an apparatus, and a device for filling missing data of a user.

Background

With the continuous advancement of energy internet construction and the rapid development of smart grid technology, the demands of the distribution network side for the storage, analysis and mining of electric power data are increasing day by day. In practice, during the acquisition, transmission and storage of power consumer data, data loss is common owing to sensing equipment failures, transmission network delays, storage and processing errors and the like, and this seriously affects the subsequent analysis, mining and modeling of the power data.

Traditional missing data filling methods, such as linear interpolation, fill gaps according to local characteristics of the data; they are easily disturbed by noise and abnormal data, and their filling precision is low.

Disclosure of Invention

In view of this, the present invention provides a method, an apparatus, and a device for filling missing data of a user, and aims to solve the problem of low precision in filling missing data of a user in the prior art.

A first aspect of an embodiment of the present invention provides a method for filling missing data of a user, including:

extracting power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix, wherein the original power data matrix comprises appointed power data of each target user acquired at a plurality of preset sampling moments;

pre-filling missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix;

determining a first user in the pre-populated matrix, wherein the first user is a target user for which there is missing power data;

for each first user, searching a plurality of second users similar to the power data of the first user in the pre-filling matrix, forming the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, and determining a filling value of missing data of the first user in the similar matrix corresponding to the first user based on a singular value threshold algorithm;

filling the filling value of the missing data of each first user into the corresponding position in the original power data matrix.

Optionally, searching for a plurality of second users similar to the first user power data in the pre-populated matrix includes:

and classifying the power data of each user in the pre-filling matrix based on a K nearest neighbor classification algorithm, and determining a plurality of second users similar to the power data of the first user according to the classification result.

Optionally, the similarity degree formula for evaluating similarity of the two pieces of user power data in the K-nearest neighbor classification algorithm is as follows:

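$d = 1 - |\rho|$

wherein $d$ is the degree of similarity between the power data of two users, and $\rho$ is the similarity measure,

$\rho = \dfrac{\mathrm{cov}(x', y')}{\sigma_{x'}\,\sigma_{y'}} = \dfrac{E\big[(x' - E(x'))(y' - E(y'))\big]}{\sigma_{x'}\,\sigma_{y'}}$

$x'$ and $y'$ are the redefined power data vectors of the two users, $\mathrm{cov}(x', y')$ is the covariance of $x'$ and $y'$, $\sigma_{x'}$ is the standard deviation of $x'$, $\sigma_{y'}$ is the standard deviation of $y'$, $E(x')$ is the mean of $x'$, and $E(y')$ is the mean of $y'$; $x' = wx$, $y' = wy$, where $x$ and $y$ are the power data vectors of the two users, $x = [x_1, x_2, \ldots, x_t]$, $y = [y_1, y_2, \ldots, y_t]$, and $w$ is a vector of weight coefficients, $w = [w_1, w_2, \ldots, w_t]$.

The weight coefficients are defined in terms of a constant $q$ less than 1 (the defining expression is not reproduced in this text).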

Optionally, the number of the second users is a preset number, and the value range of the preset number is [40, 50].

Optionally, the value range of the time length of the preset sliding time window is [60, 90] days.

Optionally, after the preset sliding time window extracts data from the original power data matrix to be filled to form a time window data matrix, the method further includes:

carrying out data deduplication processing and abnormal value discarding processing on the time window data matrix;

the pre-filling missing data in the time window data matrix based on the interpolation algorithm to obtain a pre-filled matrix, comprising:

and pre-filling missing data in the time window data matrix after the data deduplication processing and the abnormal value discarding processing based on an interpolation algorithm to obtain the pre-filled matrix.

Optionally, the method further includes:

marking the position of the missing data in the time window data matrix;

the determining a first user in the pre-populated matrix comprises:

based on the location, a first user in the pre-populated matrix is determined.

Optionally, the specified power data includes at least one of: the positive active total electric quantity, the peak valley-leveling moment electric quantity, the active power, the ABC three-phase active power, the reactive power and the ABC three-phase reactive power.

Optionally, the interpolation algorithm comprises a linear interpolation method.

Optionally, the original power data matrix is $M \in \mathbb{R}^{t \times n}$, where $t$ is the total number of preset sampling moments and $n$ is the total number of target users.

Optionally, the method further includes:

acquiring appointed power data of each target user at a plurality of preset sampling moments according to a preset period;

and adjusting the same type of specified power data into the same format, and establishing the original power data matrix based on the adjusted specified power data.

Optionally, before extracting power data from the original power data matrix to be filled through a preset sliding time window to form a time window data matrix, the method further includes:

acquiring the original electric power data matrix from a data source platform;

after padding the padding values of the missing data of each first user to the respective position in the original power data matrix, the method further comprises:

and updating the filled original power data matrix to the data source platform.

A second aspect of the embodiments of the present invention provides a device for filling missing data of a user, including:

an extraction module, configured to extract power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix, wherein the original power data matrix comprises specified power data of each target user acquired at a plurality of preset sampling moments;

a pre-filling module, configured to pre-fill missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix;

a filling module, configured to determine a first user in the pre-filled matrix, wherein the first user is a target user for which there is missing power data;

the filling module is further configured to search, for each first user, a plurality of second users similar to the power data of the first user in the pre-filling matrix, construct a similarity matrix corresponding to the first user from the power data of the first user and the power data of the plurality of second users, and determine a filling value of missing data of the first user in the similarity matrix corresponding to the first user based on a singular value threshold algorithm;

the filling module is further configured to fill the filling value of the missing data of each first user into a corresponding position in the original power data matrix.

A third aspect of the embodiments of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the user missing data padding method according to the first aspect when executing the computer program.

A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the user missing data padding method according to the first aspect.

Compared with the prior art, the invention has the following beneficial effects:

the method comprises the steps of extracting power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix, wherein the original power data matrix comprises appointed power data of each target user acquired at a plurality of preset sampling moments; pre-filling missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix; determining a first user in the pre-populated matrix, wherein the first user is a target user for which there is missing power data; for each first user, searching a plurality of second users similar to the power data of the first user in a pre-filling matrix, forming the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, and determining a filling value of missing data of the first user in the similar matrix corresponding to the first user based on a singular value threshold algorithm; filling values of missing data of each first user into corresponding positions in an original power data matrix, pre-filling the missing data by using an interpolation algorithm, searching users with similar data based on the pre-filled data, and re-filling by using a singular value threshold algorithm, so that the filling precision of the missing data of the users can be improved.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a diagram illustrating an application environment of a method for filling missing data in a user according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating an implementation of a method for filling missing data in a user according to an embodiment of the present invention;

FIG. 3 is a graph illustrating the fill error and time consumption for data filling with different numbers of similar users according to an embodiment of the present invention;

FIG. 4 is a comparison graph of fill errors for data filling with different sliding time windows for five specific power data, according to an embodiment of the present invention;

FIG. 5 is a graph comparing the filling errors obtained when power data with different data loss rates are filled by the user missing data filling method provided in this embodiment, the classical interpolation method, the conventional singular value threshold algorithm, and the secondary filling method;

FIG. 6 is a schematic structural diagram of a user missing data filling device according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of an electronic device provided by an embodiment of the invention.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

In order to make the objects, technical solutions and advantages of the present invention more apparent, the following description is made by way of specific embodiments with reference to the accompanying drawings.

With the continuous advancement of energy internet construction and the rapid development of smart grid technology, the distribution side of the power grid places ever-increasing demands on the storage, analysis and mining of 10kV special transformer marketing data. Compared with ordinary power data, 10kV special transformer user data covers more types of power data, has a longer acquisition period and larger values, and differences between data of the same type can reach the order of 10⁶. Because of these characteristics, 10kV special transformer user data often has to be processed differently from ordinary power data. The 10kV special transformer user data contains rich information, and analyzing it is valuable in many respects, from accurately profiling electricity customers and guiding power production to accurately reflecting the state of the national economy.

In the process of collecting, transmitting and storing the 10kV special transformer user data, data loss can occur owing to sensing equipment failures, transmission network delays, storage and processing errors and the like, which seriously affects the subsequent analysis, mining and modeling of the power data. Existing data filling methods usually consider only the local characteristics of the data and are easily disturbed by noise and abnormal data. With the development of artificial intelligence technology, some scholars have adopted deep learning methods to repair missing data, for example filling lost data based on a neural network algorithm. Although such methods capture global characteristics, they consume more computing resources in use and raise concerns about the privacy and security of user data.

Fig. 1 is an application environment diagram of a method for filling missing data by a user according to an embodiment of the present invention. The user missing data filling method provided by the embodiment of the invention can be applied to the power system in the application environment diagram but is not limited to the application environment diagram. The power system comprises a power data acquisition device 11, a data source platform 12 and an electronic device 13. The data source platform 12 and the electronic device 13 may be the same device or different devices, and are not limited herein.

The power data acquisition device 11 is configured to acquire power data of users and send the acquired power data to the data source platform 12. The data source platform 12 is configured to store the power data, for example by storing the power data of each user in a database, where all or part of the power data may constitute the original power data matrix to be filled. The electronic device 13 is configured to obtain the original power data matrix from the data source platform 12, fill the power data missing for users in the original power data matrix, and send the filled original power data matrix to the data source platform 12. The data source platform 12 updates the database based on the filled original power data matrix. The electronic device 13 may extract power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix, where the original power data matrix includes specified power data of each target user acquired at a plurality of preset sampling moments; pre-fill missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix; determine a first user in the pre-filled matrix, where the first user is a target user for which there is missing power data; for each first user, search the pre-filled matrix for a plurality of second users whose power data is similar to that of the first user, form the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, and determine a filling value of the missing data of the first user in that similar matrix based on a singular value threshold algorithm; and fill the filling values of the missing data of each first user into the corresponding positions in the original power data matrix.

The power data collection device 11 may be an electromechanical integrated electric meter, an all-electronic electric meter, etc., and is not limited herein. The data source platform 12 may be a server, a terminal, or the like, and is not limited herein, for example, the data source platform may be a data management platform of a power consumption information acquisition system, and may also be a data management platform of other power systems. The electronic device 13 may be a server, a terminal, etc., and is not limited herein. The server may be implemented as a stand-alone server or as a server cluster comprised of multiple servers. The terminal may include, but is not limited to, a desktop computer, a laptop computer, a tablet computer, and the like.

Fig. 2 is a flowchart of an implementation of a method for filling missing data by a user according to an embodiment of the present invention. In this embodiment, the method is applied to the electronic device in fig. 1 as an example. As shown in fig. 2, the method includes:

s201, extracting power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix, wherein the original power data matrix comprises appointed power data of each target user acquired at a plurality of preset sampling moments.

In this embodiment, the size and the step of the sliding time window may be set according to actual conditions and are not limited herein. Each entry in the time window data matrix corresponds to one target user and one acquisition time point. Data of the same target user lie in the same row (or column) of the time window data matrix, and data of the same acquisition time point lie in the same column (or row). Each original power data matrix corresponds to one item of specified power data. The preset sampling moments may be once a day or once every half hour, set according to actual conditions, and are not limited herein. Data may be collected according to a preset period; readings with duplicate acquisition time points may be averaged, and readings whose acquisition time is not a preset sampling moment may be assigned to the nearest preset sampling moment and averaged. The specified power data may include, but is not limited to, electric quantity data and load data of the user.
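As a minimal sketch of the window extraction in S201, assuming the original matrix is stored with rows as sampling moments and columns as users and that missing readings are encoded as NaN (the function name `extract_windows` and the `window` and `step` parameters are illustrative, not taken from the source):

```python
import numpy as np

def extract_windows(raw, window, step):
    """Yield (start_row, window_matrix) pairs for each sliding time window.

    raw    : 2-D array, rows = sampling moments, columns = users (assumed layout)
    window : number of sampling moments covered by one window
    step   : number of sampling moments the window advances each time
    """
    t = raw.shape[0]
    for start in range(0, max(t - window, 0) + 1, step):
        yield start, raw[start:start + window, :]

# Toy data: 6 sampling moments, 2 users, NaN marks a missing reading.
raw = np.array([[1.0, 10.0],
                [2.0, np.nan],
                [3.0, 30.0],
                [np.nan, 40.0],
                [5.0, 50.0],
                [6.0, 60.0]])
windows = list(extract_windows(raw, window=4, step=2))
```

Keeping the start row alongside each window makes it possible to map a position inside the window back to its position in the original matrix later on.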

S202, based on an interpolation algorithm, the missing data in the time window data matrix is pre-filled to obtain a pre-filled matrix.

S203, a first user in the pre-populated matrix is determined, wherein the first user is a target user with missing power data.
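Steps S202 and S203 can be sketched as follows, assuming linear interpolation along the time axis as the interpolation algorithm and NaN as the missing-value marker (the helper name `prefill_linear` and the column-per-user layout are assumptions made for illustration):

```python
import numpy as np

def prefill_linear(window):
    """Return (prefilled, missing_mask) for a window matrix with NaN gaps.

    Each column (one user) is interpolated independently along time.
    A column with no known value at all is left as NaN.
    """
    missing = np.isnan(window)
    filled = window.copy()
    rows = np.arange(window.shape[0])
    for j in range(window.shape[1]):
        gap = missing[:, j]
        known = ~gap
        if known.any() and gap.any():
            # np.interp is linear between known points and holds the
            # nearest known value before the first / after the last one.
            filled[gap, j] = np.interp(rows[gap], rows[known], window[known, j])
    return filled, missing

window = np.array([[1.0, 10.0, 7.0],
                   [np.nan, 20.0, 8.0],
                   [3.0, np.nan, 9.0],
                   [4.0, np.nan, 10.0]])
prefilled, missing = prefill_linear(window)

# "First users" are the columns that had at least one missing reading.
first_users = np.flatnonzero(missing.any(axis=0))
```

The mask is computed before filling, so the positions of the original gaps remain available after the pre-filled matrix no longer contains NaN.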

S204, aiming at each first user, searching a plurality of second users similar to the power data of the first user in a pre-filling matrix, forming the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, and determining filling values of missing data of the first user in the similar matrix corresponding to the first user based on a singular value threshold algorithm.

In this embodiment, the number of the second users may be a preset number. Each user with missing data corresponds to a similar matrix, and each similar matrix is used for solving the missing data of the user.
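The source does not spell out which variant of the singular value threshold algorithm is used. The sketch below assumes the standard singular value thresholding iteration for matrix completion (soft-thresholding of singular values alternated with a correction on the observed entries); the parameter names `tau`, `delta`, `max_iter` and `tol` and their default values are illustrative assumptions.

```python
import numpy as np

def shrink(Y, tau):
    """Soft-threshold the singular values of Y by tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def svt_complete(M, observed, tau, delta=1.2, max_iter=1000, tol=1e-6):
    """Estimate the unobserved entries of a similar matrix M.

    M        : 2-D float array (values at unobserved positions are ignored)
    observed : boolean mask, True where the entry of M is known
    """
    M0 = np.where(observed, M, 0.0)
    Y = np.zeros_like(M0)
    X = np.zeros_like(M0)
    scale = np.linalg.norm(M0)
    for _ in range(max_iter):
        X = shrink(Y, tau)
        residual = np.where(observed, M0 - X, 0.0)
        if np.linalg.norm(residual) <= tol * scale:
            break
        Y += delta * residual   # step only on the observed entries
    return X

# Toy similar matrix: rank 1, with one reading treated as missing.
M = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
observed = np.ones(M.shape, dtype=bool)
observed[0, 0] = False
X = svt_complete(M, observed, tau=20.0)
```

In the method described here, the entries marked as unobserved would be the originally missing positions of the first user, so that the pre-filled values are replaced by the low-rank estimate.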

And S205, filling the filling value of the missing data of each first user into the corresponding position in the original power data matrix.
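A minimal sketch of the write-back in S205, assuming that the row offset of the time window in the original matrix and the mask of originally missing positions were kept from the earlier steps (the function `write_back` and its arguments are hypothetical names):

```python
import numpy as np

def write_back(raw, start, missing, user, completed_column):
    """Copy filling values into the original matrix, in place.

    raw              : original matrix (rows = sampling moments, columns = users)
    start            : row index in raw where the time window begins
    missing          : boolean mask of the window, True at missing positions
    user             : column index of the first user
    completed_column : that user's column after completion (window length)
    """
    rows = np.flatnonzero(missing[:, user])
    raw[start + rows, user] = completed_column[rows]
    return raw

raw = np.array([[1.0, np.nan],
                [2.0, 5.0],
                [np.nan, 6.0]])
start = 1                              # the window covers rows 1..2
missing = np.isnan(raw[start:start + 2])
write_back(raw, start, missing, user=0, completed_column=np.array([2.0, 3.5]))
```

Only positions that were missing inside the window are overwritten; observed readings and gaps outside the window are left untouched.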

In the embodiment, electric power data are extracted from an original electric power data matrix to be filled through a preset sliding time window to form a time window data matrix, wherein the original electric power data matrix comprises appointed electric power data of each target user acquired at a plurality of preset sampling moments; pre-filling missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix; determining a first user in the pre-populated matrix, wherein the first user is a target user for which there is missing power data; for each first user, searching a plurality of second users similar to the power data of the first user in a pre-filling matrix, forming the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, and determining a filling value of missing data of the first user in the similar matrix corresponding to the first user based on a singular value threshold algorithm; filling values of missing data of each first user into corresponding positions in an original power data matrix, pre-filling the missing data by using an interpolation algorithm, searching users with similar data based on the pre-filled data, and re-filling by using a singular value threshold algorithm, so that the filling precision of the missing data of the users can be improved.

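In some embodiments, based on the embodiment shown in fig. 2, finding a plurality of second users in the pre-filled matrix that are similar to the first user power data may include:

classifying the power data of each user in the pre-filled matrix based on a K nearest neighbor classification algorithm, and determining a plurality of second users similar to the power data of the first user according to the classification result.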

In this embodiment, the power data of each user in the pre-population matrix is classified based on a K-nearest neighbor classification algorithm, each user corresponds to a power data vector in the pre-population matrix, the power data vector of a first user in the pre-population matrix is compared with the power data vectors of other users, and a second user is determined.

Optionally, the similarity degree formula for evaluating similarity of the two pieces of user power data in the K-nearest neighbor classification algorithm may be:

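$d = 1 - |\rho|$ (1)

wherein $d$ is the degree of similarity between the power data of two users, and $\rho$ is the similarity measure,

$\rho = \dfrac{\mathrm{cov}(x', y')}{\sigma_{x'}\,\sigma_{y'}} = \dfrac{E\big[(x' - E(x'))(y' - E(y'))\big]}{\sigma_{x'}\,\sigma_{y'}}$

$x'$ and $y'$ are the redefined power data vectors of the two users, $\mathrm{cov}(x', y')$ is the covariance of $x'$ and $y'$, $\sigma_{x'}$ is the standard deviation of $x'$, $\sigma_{y'}$ is the standard deviation of $y'$, $E(x')$ is the mean of $x'$, and $E(y')$ is the mean of $y'$; $x' = wx$, $y' = wy$, where $x$ and $y$ are the power data vectors of the two users, $x = [x_1, x_2, \ldots, x_t]$, $y = [y_1, y_2, \ldots, y_t]$, and $w$ is a vector of weight coefficients, $w = [w_1, w_2, \ldots, w_t]$.

The weight coefficients are defined in terms of a constant $q$ less than 1 (the defining expression is not reproduced in this text).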
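The distance $d = 1 - |\rho|$ can be sketched as below. Because the expression for the weight coefficients is not available in this text, the weights are simply passed in by the caller; the weight values used in the example are arbitrary placeholders, and the weighting is assumed to be an element-wise product.

```python
import numpy as np

def similarity_distance(x, y, w):
    """Return d = 1 - |rho| for two users' power data vectors.

    x, y : power data vectors of equal length t
    w    : weight coefficients of length t (supplied by the caller)
    Note: a constant weighted vector has zero standard deviation and
    yields NaN, so such users need separate handling.
    """
    w = np.asarray(w, dtype=float)
    xw = w * np.asarray(x, dtype=float)   # x' = w * x (element-wise, assumed)
    yw = w * np.asarray(y, dtype=float)   # y' = w * y
    cov = np.mean((xw - xw.mean()) * (yw - yw.mean()))
    rho = cov / (xw.std() * yw.std())
    return 1.0 - abs(rho)

ones = [1.0, 1.0, 1.0, 1.0]
d_same = similarity_distance([1, 2, 3, 4], [2, 4, 6, 8], ones)       # proportional
d_opposite = similarity_distance([1, 2, 3, 4], [8, 6, 4, 2], ones)   # anti-correlated
d_weak = similarity_distance([1, 2, 3, 4], [1, -1, 1, -1], ones)
d_weighted = similarity_distance([1, 2, 3, 4], [2, 4, 6, 8], [0.25, 0.5, 1.0, 2.0])
```

Because the absolute value of the correlation is used, strongly anti-correlated users are treated as just as close as strongly correlated ones.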

Referring to fig. 3, fig. 3 is a graph illustrating a filling error and a filling time consumption of data filling with different numbers of similar users according to an embodiment of the present invention. The horizontal axis represents the number of similar users, the left vertical axis represents filling errors of missing data, the right vertical axis represents filling time consumption of the missing data, the solid line represents a filling error curve of the number of different similar users, and the dotted line represents a filling time consumption curve of the number of different similar users. As can be seen from the figure, the difference of the number of similar users has an influence on the data filling precision and speed, and 40 to 50 similar users can be extracted to achieve the highest filling precision and short filling time.

In this embodiment, by presetting the number of the second users, where the value range of the preset number is [40, 50], the filling accuracy of filling missing data of the users can be improved, and the filling time can be shortened.

Optionally, in this embodiment, a value range of the time length of the preset sliding time window is [60, 90] days.

Fig. 4 is a comparison graph of filling errors for data filling with different sliding time windows for five types of specified power data according to an embodiment of the present invention. As shown in fig. 4, the horizontal axis represents the length of the sliding time window and the vertical axis represents the filling error of the missing data. The five curves in the graph correspond to five types of specified power data: curve 1 represents active total power, curve 2 represents power at a sharp moment, curve 3 represents active power, curve 4 represents reactive power, and curve 5 represents A-phase voltage. As can be seen from the figure, the length of the sliding time window has a certain influence on the data filling precision, and the filling precision is highest when a sliding time window of 60 to 90 days is selected.

In this embodiment, the data filling accuracy can be improved by presetting the value range of the time length of the sliding time window to [60, 90] days.

Optionally, on the basis of any of the above embodiments, the specified power data includes, but is not limited to, at least one of the following: the positive active total electric quantity, the peak valley-leveling moment electric quantity, the active power, the ABC three-phase active power, the reactive power and the ABC three-phase reactive power.

Optionally, the interpolation algorithm includes, but is not limited to, a linear interpolation method, a nearest neighbor interpolation algorithm, and a bilinear interpolation algorithm, and is not limited herein.

In some embodiments, on the basis of any of the above embodiments, the method further comprises:

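acquiring specified power data of each target user at a plurality of preset sampling moments according to a preset period;

and adjusting the same type of specified power data into the same format, and establishing an original power data matrix based on the adjusted specified power data.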

In this embodiment, the data collection period may be set to once per day for the electric quantity data, such as the positive active total electric quantity, the peak valley time electric quantity, and the like. The load data, such as active power, ABC three-phase active power, reactive power, ABC three-phase reactive power data, etc., may be set to have a data collection period of once every half hour.

In some embodiments, on the basis of any of the above embodiments, before extracting power data from the original power data matrix to be filled through a preset sliding time window to form a time window data matrix, the method further includes:

a raw power data matrix is obtained from a data source platform.

In this embodiment, after padding the padding value of the missing data of each first user into the corresponding position in the original power data matrix, the method further includes:

and updating the filled original power data matrix to a data source platform.

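In some embodiments, on the basis of any of the above embodiments, after extracting data from the original power data matrix to be filled through a preset sliding time window to form a time window data matrix, the method further includes:

carrying out data deduplication processing and abnormal value discarding processing on the time window data matrix;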

correspondingly, the missing data in the time window data matrix is pre-filled based on an interpolation algorithm to obtain a pre-filled matrix, which includes:

and based on an interpolation algorithm, pre-filling missing data in the time window data matrix after the data deduplication processing and the abnormal value discarding processing to obtain a pre-filled matrix.

In some embodiments, on the basis of any of the above embodiments, the method further includes:

marking the position of the missing data in the time window data matrix;

accordingly, determining a first user in the pre-populated matrix comprises:

based on the location, a first user in the pre-populated matrix is determined.

The method for filling missing user data is described below with an implementation example, but the method is not limited to filling the missing data of 10kV special transformer users. In this implementation example, the method includes:

step 1, data normalization: obtaining specified power data of power users from a power utilization information acquisition system, the specified power data mainly comprising forward active total electric quantity, peak valley-leveling moment electric quantity, active power, ABC three-phase active power, reactive power, ABC three-phase reactive power and the like, and organizing data of the same type into the same format according to the acquisition period, so as to construct an original "power user - acquisition time" power data matrix for each type of data.

The acquisition period of electric quantity data, such as forward active total electric quantity and peak valley-leveling moment electric quantity, is set to once a day; readings with duplicate acquisition time points are averaged, and readings whose acquisition time is not on the hour are assigned to the nearest on-the-hour time point and averaged. The acquisition period of load data, such as active power, ABC three-phase active power, reactive power and ABC three-phase reactive power, is set to once every half hour; readings with duplicate acquisition time points are averaged, and readings whose acquisition time is not on the hour are assigned to the nearest on-the-hour time point and averaged. The original "power user - acquisition time" power data matrix is $M \in \mathbb{R}^{t \times n}$, where $t$ is the 10kV special transformer user data acquisition time period and $n$ is the number of 10kV special transformer users in the whole power data set.
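The alignment described in step 1 (averaging duplicate readings and assigning off-grid readings to the nearest sampling moment) might be sketched as follows, assuming timestamps are expressed as non-negative integer minutes and the sampling grid starts at minute 0 (the function `align_readings` and this time encoding are illustrative assumptions):

```python
from collections import defaultdict

def align_readings(readings, period):
    """Snap readings to the sampling grid and average each grid slot.

    readings : list of (timestamp_minutes, value) pairs for one user
    period   : minutes between preset sampling moments (e.g. 30 or 1440)
    Readings exactly halfway between two moments go to the later one.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        slot = (ts + period // 2) // period * period   # nearest sampling moment
        buckets[slot].append(value)
    return {slot: sum(v) / len(v) for slot, v in sorted(buckets.items())}

# Two duplicate readings at minute 0, one early reading, one late reading.
readings = [(0, 10.0), (0, 12.0), (28, 20.0), (61, 30.0)]
aligned = align_readings(readings, period=30)
```

Sampling moments with no reading at all simply do not appear in the result; they would become the missing entries of the original power data matrix.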

Step 2, data pre-filling: and dividing the original power data matrix by adopting a sliding time window to obtain a time window data matrix. And carrying out data duplicate removal and abnormal value processing on the data in the time window data matrix. And then performing data pre-filling on the missing data in the time window by adopting a linear interpolation method to construct a pre-filling matrix.

The method comprises the following specific steps:

step 2.1, setting a proper sliding time window size, extracting data in the sliding time window, and constructing a time window data matrix;

step 2.2, data deduplication and abnormal value processing are carried out on the time window data matrix, repeated data and abnormal data are discarded, then missing data in the time window data matrix are marked, and the location of the missing data in the time window data matrix is recorded;

step 2.3, pre-filling missing data in the time window data matrix based on a traditional linear interpolation method to obtain a pre-filled matrix.
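Steps 2.1 to 2.3 can be sketched with NumPy as below. The sketch assumes that missing values are stored as NaN, that rows are sampling times and columns are users (matching the t × n layout of M), and that deduplication and abnormal value handling have already been done. The names `extract_window` and `prefill_linear` are illustrative only.

```python
import numpy as np


def extract_window(M: np.ndarray, start: int, length: int) -> np.ndarray:
    """Copy rows [start, start + length) of the matrix (rows = sampling times)."""
    return M[start:start + length].copy()


def prefill_linear(W: np.ndarray):
    """Return a linearly interpolated copy of W and the mask of missing cells."""
    missing = np.isnan(W)              # recorded positions of missing data
    filled = W.copy()
    t = np.arange(W.shape[0])
    for j in range(W.shape[1]):        # interpolate each user's column over time
        known = ~missing[:, j]
        if known.any():                # a column with no data at all stays NaN
            filled[:, j] = np.interp(t, t[known], W[known, j])
    return filled, missing


M = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan],
              [4.0, 40.0]])
W = extract_window(M, start=0, length=4)
P, mask = prefill_linear(W)
```

Note that `np.interp` holds the nearest known value constant for gaps at the start or end of a column; whether the embodiment treats edge gaps this way is not stated.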

Step 3, clustering similar users: the users with missing data are located, and for each such user a suitable number of similar users is selected by a K-nearest neighbor classification algorithm (KNN), so that a plurality of similar matrices are constructed.

The method comprises the following specific steps:

step 3.1, positioning a plurality of missing data users according to the marked missing data positions;

step 3.2, aiming at each missing data user, extracting the electric power data vector of each missing data user in the time window data matrix, and extracting 50 user data vectors similar to the missing data users by adopting a K nearest neighbor classification algorithm;

the similarity degree formula for evaluating similarity of two user power data in the K nearest neighbor classification algorithm is as follows:

$$d = 1 - |\rho|, \qquad \rho = \frac{\operatorname{cov}(x', y')}{\sigma_{x'}\,\sigma_{y'}} = \frac{E\big[(x' - E(x'))\,(y' - E(y'))\big]}{\sigma_{x'}\,\sigma_{y'}}$$

where x' and y' are the redefined power data vectors of the two users; cov(x', y') is the covariance of x' and y'; $\sigma_{x'}$ is the standard deviation of x' and $\sigma_{y'}$ is the standard deviation of y'; E(x') is the average of x' and E(y') is the average of y'. Here x' = wx and y' = wy, where x and y are the power data vectors of the two users, $x = [x_1, x_2, \dots, x_t]$, $y = [y_1, y_2, \dots, y_t]$, and w is a vector of weight coefficients, $w = [w_1, w_2, \dots, w_t]$, whose entries are defined using q, a constant less than 1.

Step 3.3, for each missing-data user, constructing a similar matrix from that user's vector and the similar user vectors obtained by the K nearest neighbor classification algorithm, so that the number of similar matrices equals the number of missing-data users.
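The similarity measure d = 1 - |ρ| and the construction of one similar matrix can be sketched as follows. Several points are assumptions: the product wx is taken element-wise, the weight vector `w` is passed in by the caller because its defining formula in terms of q is not reproduced here, a constant vector is treated as maximally dissimilar, and the function names are illustrative.

```python
import numpy as np


def weighted_distance(x, y, w):
    """d = 1 - |rho|, with rho the Pearson correlation of x' = w*x and y' = w*y."""
    xp, yp = w * x, w * y
    sx, sy = xp.std(), yp.std()
    if sx == 0 or sy == 0:
        return 1.0  # assumption: a constant vector counts as dissimilar
    cov = np.mean((xp - xp.mean()) * (yp - yp.mean()))
    return 1.0 - abs(cov / (sx * sy))


def similar_matrix(P, target, w, k=50):
    """Stack the target user's column with its k nearest columns of P."""
    n = P.shape[1]
    d = np.full(n, np.inf)             # the target itself is never a neighbour
    for j in range(n):
        if j != target:
            d[j] = weighted_distance(P[:, target], P[:, j], w)
    neighbours = np.argsort(d)[:min(k, n - 1)]
    cols = np.concatenate(([target], neighbours))
    return P[:, cols], cols            # column 0 is the missing-data user


rng = np.random.default_rng(0)
base = rng.normal(size=48)
P = np.column_stack([base, 2 * base + 1, rng.normal(size=48), -base])
w = np.ones(48)                        # placeholder weights for the example
S, cols = similar_matrix(P, target=0, w=w, k=2)
```

Because the distance uses |ρ|, a user whose curve is the mirror image of the target (the last column above) is as close as one with the same shape (the second column).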

Step 4, data filling stage: the missing data in each similar matrix is located and repaired by a singular value threshold algorithm, and the successfully filled data is returned to the original power data matrix to obtain complete 10kV special transformer user data.

The method comprises the following specific steps:

step 4.1, substituting the similar matrix into a singular value threshold algorithm equation:

$$\min_{X}\ \tau\,\|X\|_* + \frac{1}{2}\,\|X\|_F^2 \quad \text{s.t.} \quad P_\Omega(X) = P_\Omega(M)$$

wherein X is the matrix after data recovery; M is the matrix to be repaired; $\|\cdot\|_*$ is the nuclear norm of a matrix, that is, the sum of its singular values; $P_\Omega(\cdot)$ is a projection operator, representing the orthogonal projection of the matrix onto Ω, the set of positions of the known entries; $\|\cdot\|_F$ is the Frobenius norm of a matrix, that is, the square root of the sum of squares of all its elements; τ is a fixed constant.

Step 4.2, solving the above formula by an alternating iteration method, wherein the iteration sequence is shown as:

wherein k is the number of iterations; $\delta_k$ is a weight coefficient; $D_\tau(\cdot)$ is the soft threshold operator; the singular values of the matrix are expressed as $\sigma = (\sigma_1, \sigma_2, \dots, \sigma_r)$, $r = \min\{m, n\}$; τ is a constant vector, expressed as τ = (τ, τ, …, τ); U and V are the left and right singular matrices.

According to the above formula, the missing data in each similar matrix is obtained through solving;
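A minimal sketch of step 4.2 follows. It assumes the standard singular value thresholding iteration, $X^k = D_\tau(Y^{k-1})$ and $Y^k = Y^{k-1} + \delta_k P_\Omega(M - X^k)$, with $D_\tau(Y) = U\,\mathrm{diag}(\max(\sigma - \tau, 0))\,V^T$. The auxiliary matrix Y, the constant step size `delta`, the stopping rule, and the parameter values in the usage lines are assumptions for illustration, not values given in this embodiment.

```python
import numpy as np


def shrink(Y, tau):
    """Soft threshold operator D_tau: reduce every singular value by tau, floor at 0."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def svt_complete(M, observed, tau, delta, n_iter=500, tol=1e-4):
    """Approximately recover a low-rank X that agrees with M on observed entries."""
    M_obs = np.where(observed, M, 0.0)         # P_Omega(M); NaNs are masked out
    Y = np.zeros_like(M_obs)
    X = Y
    norm_obs = np.linalg.norm(M_obs)
    for _ in range(n_iter):
        X = shrink(Y, tau)
        residual = np.where(observed, M_obs - X, 0.0)   # P_Omega(M - X)
        if np.linalg.norm(residual) <= tol * norm_obs:
            break
        Y = Y + delta * residual
    return X


rng = np.random.default_rng(0)
truth = np.outer(rng.normal(size=20), rng.normal(size=20))  # rank-1 test matrix
observed = rng.random(truth.shape) < 0.7                    # about 30% missing
S = np.where(observed, truth, np.nan)
X = svt_complete(S, observed, tau=100.0, delta=1.5)
filled = np.where(observed, S, X)   # keep known values, take the rest from X
```

Only the entries that were missing are taken from X in the last line, so known measurements are never altered by the recovery.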

step 4.3, filling the solved data into the corresponding positions of the sliding time window power data according to the located missing-data users and the located missing data positions, to obtain a complete time window data matrix;

step 4.4, filling the complete sliding time window power data matrix back to the position of the corresponding time window in the original power data matrix.
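Steps 4.3 and 4.4 amount to two masked copies, sketched below. The sketch assumes that column 0 of each recovered similar matrix holds the missing-data user's vector, that `missing` is the boolean mask recorded in step 2.2, and that `start` is the row offset of the window in the original matrix. The function names and the sample numbers are illustrative.

```python
import numpy as np


def fill_user_column(window, missing, user, X_similar):
    """Step 4.3: copy recovered values for one user, only where data was missing."""
    rows = missing[:, user]
    window[rows, user] = X_similar[rows, 0]   # column 0 = the missing-data user


def write_back(M, start, missing, filled_window):
    """Step 4.4: copy the filled cells of the window into the original matrix."""
    view = M[start:start + missing.shape[0]]  # a view, so M is updated in place
    view[missing] = filled_window[missing]
    return M


M = np.array([[1.0, 5.0], [np.nan, 6.0], [3.0, 7.0], [4.0, 8.0]])
start = 1
window = M[start:start + 3].copy()
missing = np.isnan(window)
X_similar = np.array([[2.0, 6.0], [3.1, 7.0], [3.9, 8.0]])  # stand-in for SVT output
fill_user_column(window, missing, user=0, X_similar=X_similar)
write_back(M, start, missing, window)
```

Restricting both copies to the mask means that measured values such as 3.0 are kept even though the recovered matrix holds a slightly different number (3.1) at that position.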

In this embodiment, when the power consumption information acquisition system acquires new power user data, the sliding time window is moved so that the new power data is included in the sliding time window, and the above steps are repeated to obtain complete 10kV special transformer user data.
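Moving the window when new data arrives can be sketched as below. It assumes that new samples are appended as rows at the bottom of the matrix and that the window always covers the most recent `length` rows. Both function names are hypothetical, and the pre-filling, similar-user and singular value threshold steps would then be rerun on the returned window.

```python
import numpy as np


def latest_window(M, length):
    """Return the row offset and a copy of the most recent `length` rows."""
    start = max(0, M.shape[0] - length)
    return start, M[start:].copy()


def append_and_slide(M, new_rows, length):
    """Append newly acquired rows, then move the window to include them."""
    M = np.vstack([M, new_rows])
    start, window = latest_window(M, length)
    return M, start, window


M = np.arange(10.0).reshape(5, 2)                 # 5 sampling times, 2 users
new_rows = np.array([[10.0, 11.0], [12.0, 13.0]])
M2, start, window = append_and_slide(M, new_rows, length=4)
```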

Fig. 5 is a graph comparing filling errors of filling power data with different data loss rates by using the user missing data filling method, the classical interpolation method, the conventional singular value threshold algorithm, and the secondary filling method provided in this embodiment.

In fig. 5, the horizontal axis represents the data missing rate and the vertical axis represents the missing data filling error. Curve a is the filling error curve of the user missing data filling method provided in the embodiment of the present invention, curve b is the filling error curve of the classical interpolation method, curve c is the filling error curve of the singular value threshold method, and curve d is the filling error curve of the secondary filling method. It can be seen from the figure that, at every data missing rate, the repair error of the classical interpolation method is the largest of the four algorithms and the repair error of the user missing data filling method of the invention is the lowest.

Compared with the prior art, the implementation example has the following beneficial effects:

1) The method for filling missing data of 10kV special transformer users in the distribution network, based on the singular value threshold algorithm, remedies the shortcomings of traditional methods, which consider only local characteristics of the data and are weakly targeted; it avoids interference from local noise and abnormal data and improves the accuracy of data filling.

2) The method overcomes the drawbacks of deep learning methods, which need a large amount of training data, consume substantial resources in training, and cannot guarantee the security of user data; it operates directly on the data with missing values and has good robustness.

3) By incorporating a similar user clustering method, the method solves the problem that traditional methods and deep learning methods cannot handle the excessively large differences in data magnitude among 10kV special transformer users; it also reduces the data set that must be processed and speeds up data filling.

4) By adding the sliding time window, the efficiency of data filling is improved and online filling of missing data of 10kV special transformer users is realized, which is a clear advantage over previous data filling methods that can only process a static data set.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

Fig. 6 is a schematic structural diagram of a user missing data filling apparatus according to an embodiment of the present invention. As shown in fig. 6, the user missing data filling apparatus 6 includes: an extracting module 610, a pre-filling module 620, and a filling module 630.

The extracting module 610 is configured to extract power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix, where the original power data matrix includes designated power data of each target user acquired at a plurality of preset sampling moments.

The pre-filling module 620 is configured to pre-fill missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix.

The filling module 630 is configured to determine a first user in the pre-filled matrix, wherein the first user is a target user for which there is missing power data.

The filling module 630 is further configured to, for each first user, search the pre-filled matrix for a plurality of second users whose power data is similar to that of the first user, construct a similar matrix corresponding to the first user from the power data of the first user and the power data of the plurality of second users, and determine a filling value of missing data of the first user in the similar matrix corresponding to the first user based on a singular value threshold algorithm.

The filling module 630 is further configured to fill the filling values of the missing data of each first user into the corresponding positions in the original power data matrix.

In the embodiment, electric power data are extracted from an original electric power data matrix to be filled through a preset sliding time window to form a time window data matrix, wherein the original electric power data matrix comprises appointed electric power data of each target user acquired at a plurality of preset sampling moments; pre-filling missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix; determining a first user in the pre-populated matrix, wherein the first user is a target user for which there is missing power data; for each first user, searching a plurality of second users similar to the power data of the first user in a pre-filling matrix, forming the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, and determining a filling value of missing data of the first user in the similar matrix corresponding to the first user based on a singular value threshold algorithm; and filling the filling value of the missing data of each first user into the corresponding position in the original power data matrix, so that the filling precision of the missing data of the users can be improved.

In some embodiments, on the basis of any of the above embodiments, the filling module 630 is configured to classify the power data of each user in the pre-filled matrix based on a K-nearest neighbor classification algorithm, and determine a plurality of second users similar to the power data of the first user according to the classification result.

Optionally, the similarity degree formula for evaluating similarity of the two pieces of user power data in the K nearest neighbor classification algorithm is as follows:

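$$d = 1 - |\rho|, \qquad \rho = \frac{\operatorname{cov}(x', y')}{\sigma_{x'}\,\sigma_{y'}}$$

where x' = wx and y' = wy are the redefined power data vectors of the two users, x and y are the power data vectors of the two users, w is a vector of weight coefficients whose entries are defined using q, and q is a constant less than 1.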

In some embodiments, based on any of the above embodiments, the time length of the preset sliding time window is within a range of [60, 90] days.

Optionally, the extracting module 610 is further configured to extract data from the original power data matrix to be filled in a preset sliding time window to form a time window data matrix, and then perform data deduplication processing and outlier discarding processing on the time window data matrix.

Accordingly, the pre-filling module 620 is configured to pre-fill, based on an interpolation algorithm, the missing data in the time window data matrix after the data deduplication processing and the outlier discarding processing, to obtain a pre-filled matrix.

Optionally, the extracting module 610 is further configured to mark a position of missing data in the time window data matrix.

Accordingly, the pre-filling module 620 is configured to determine a first user in the pre-filled matrix based on the position.

Optionally, the extracting module 610 is further configured to collect, according to a preset period, specified power data of each target user at multiple preset sampling moments; and adjusting the same type of specified power data into the same format, and establishing an original power data matrix based on the adjusted specified power data.

Optionally, the extracting module 610 is further configured to obtain the original power data matrix from the data source platform before extracting the power data from the original power data matrix to be filled through a preset sliding time window to form a time window data matrix.

Optionally, the filling module 630 is further configured to update the filled original power data matrix to the data source platform after filling the filling value of the missing data of each first user into the corresponding position in the original power data matrix.

Fig. 7 is a schematic diagram of an electronic device provided by an embodiment of the invention. As shown in fig. 7, an embodiment of the present invention provides an electronic device 7, where the electronic device 7 of the embodiment includes: a processor 70, a memory 71, and a computer program 72 stored in the memory 71 and executable on the processor 70. The processor 70, when executing the computer program 72, implements the steps in the above-mentioned embodiments of the user missing data padding method, such as the steps 201 to 205 shown in fig. 2. Alternatively, the processor 70, when executing the computer program 72, implements the functions of the various modules/units in the above-described apparatus embodiments, such as the functions of the modules 610 to 630 shown in fig. 6.

Illustratively, the computer program 72 may be divided into one or more modules/units, which are stored in the memory 71 and executed by the processor 70 to carry out the invention. One or more of the modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 72 in the electronic device 7.

The electronic device 7 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal may include, but is not limited to, a processor 70, a memory 71. It will be appreciated by those skilled in the art that fig. 7 is only an example of the electronic device 7 and does not constitute a limitation of the electronic device 7 and may comprise more or less components than those shown, or some components may be combined, or different components, e.g. the terminal may further comprise input output devices, network access devices, buses, etc.

The Processor 70 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 71 may be an internal storage unit of the electronic device 7, such as a hard disk or a memory of the electronic device 7. The memory 71 may also be an external storage device of the electronic device 7, such as a plug-in hard disk provided on the electronic device 7, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 71 may also include both an internal storage unit of the electronic device 7 and an external storage device. The memory 71 is used for storing computer programs and other programs and data required by the terminal. The memory 71 may also be used to temporarily store data that has been output or is to be output.

The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when being executed by a processor, the computer program implements the steps in the above-mentioned embodiments of the method for filling missing data for a user.

The computer-readable storage medium stores a computer program 72 that includes program instructions. All or part of the processes in the methods of the above embodiments may be implemented by the computer program 72 instructing related hardware; the computer program 72 may be stored in a computer-readable storage medium and, when executed by the processor 70, implements the steps of the above method embodiments. The computer program 72 comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.

The computer readable storage medium may be an internal storage unit of the terminal of any of the foregoing embodiments, for example, a hard disk or a memory of the terminal. The computer readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk provided on the terminal, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the terminal. The computer-readable storage medium is used for storing a computer program and other programs and data required by the terminal. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal and method may be implemented in other ways. For example, the above-described apparatus/terminal embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

1. A method for filling missing data of a user, comprising: extracting power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix; pre-filling missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix; determining a first user in the pre-populated matrix; for each first user, searching a plurality of second users similar to the power data of the first user in the pre-filling matrix, forming the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, and determining a filling value of missing data of the first user in the similar matrix corresponding to the first user based on a singular value threshold algorithm; filling values of missing data of each first user into corresponding positions in the original power data matrix; the original power data matrix comprises designated power data of each target user acquired at a plurality of preset sampling moments; the first user is a target user with missing power data.

2. The method of claim 1, wherein searching for a plurality of second users in the pre-populated matrix that are similar to the first user power data comprises:

3. The user missing data filling method according to claim 2, wherein the similarity degree formula for evaluating similarity of two user power data in the K-nearest neighbor classification algorithm is as follows:

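$$d = 1 - |\rho|, \qquad \rho = \frac{\operatorname{cov}(x', y')}{\sigma_{x'}\,\sigma_{y'}}$$

where x' = wx and y' = wy are the redefined power data vectors of the two users, x and y are the power data vectors of the two users, w is a vector of weight coefficients whose entries are defined using q, and q is a constant less than 1.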

4. The method according to claim 2, wherein the number of the second users is a preset number, and the value range of the preset number is [40, 50].

5. The method according to claim 1, wherein the length of the preset sliding time window is [60, 90] days.

6. The method of claim 1, wherein after the predetermined sliding time window extracts data from the raw power data matrix to be filled to form a time window data matrix, the method further comprises:

7. The method of claim 1, wherein the method further comprises:

marking the position of the missing data in the time window data matrix;

the determining a first user in the pre-populated matrix comprises:

based on the location, a first user in the pre-populated matrix is determined.

8. The user missing data filling method according to any of claims 1-7, wherein said specified power data comprises at least one of: forward active total electric quantity, peak valley-leveling moment electric quantity, active power, ABC three-phase active power, reactive power, and ABC three-phase reactive power;

the interpolation algorithm comprises a linear interpolation method;

the original power data matrix is $M \in \mathbb{R}^{t \times n}$, wherein t is the total number of the preset sampling times and n is the total number of the target users;

the method further comprises the following steps:

the same type of specified power data are adjusted to be in the same format, and the original power data matrix is established based on the adjusted specified power data;

before extracting power data from the raw power data matrix to be populated through a preset sliding time window to form a time window data matrix, the method further comprises:

acquiring the original electric power data matrix from a data source platform;

and updating the filled original power data matrix to the data source platform.

9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method of user missing data population as claimed in any of claims 1 to 8 above when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the user missing data population method as claimed in any one of the preceding claims 1 to 8.