CN113051256A - Method, device and equipment for filling missing data of user - Google Patents
- Publication number
- CN113051256A (application CN202110302543.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- user
- matrix
- power data
- filling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention discloses a method, a device and equipment for filling missing data of a user, wherein the method comprises the following steps: extracting power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix; pre-filling missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix; determining a first user in the pre-populated matrix, wherein the first user is a target user for which there is missing power data; for each first user, searching a plurality of second users similar to the power data of the first user in a pre-filling matrix, forming the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, determining filling values of missing data of the first user in the similar matrix corresponding to the first user based on a singular value threshold algorithm, and filling the filling values into corresponding positions in an original power data matrix, so that the filling precision of the missing data of the users can be improved.
Description
Technical Field
The present invention relates to the field of power technologies, and in particular, to a method, an apparatus, and a device for filling missing data of a user.
Background
With the continued advance of energy internet construction and the rapid development of smart grid technology, the distribution network side's requirements for storing, analyzing and mining power data are growing by the day. Throughout the acquisition, transmission and storage of power consumer data in a real power grid, data loss is common because of sensing equipment failures, transmission network delays, storage and processing errors and the like, and it seriously hampers later analysis, mining and modeling of the power data.
Traditional missing data filling methods, such as linear interpolation, fill according to local characteristics of the data; they are easily disturbed by noise and abnormal data, and their filling precision is low.
Disclosure of Invention
In view of this, the present invention provides a method, an apparatus, and a device for filling missing data of a user, and aims to solve the problem of low precision in filling missing data of a user in the prior art.
A first aspect of an embodiment of the present invention provides a method for filling missing data of a user, including:
extracting power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix, wherein the original power data matrix comprises appointed power data of each target user acquired at a plurality of preset sampling moments;
pre-filling missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix;
determining a first user in the pre-populated matrix, wherein the first user is a target user for which there is missing power data;
for each first user, searching a plurality of second users similar to the power data of the first user in the pre-filling matrix, forming the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, and determining a filling value of missing data of the first user in the similar matrix corresponding to the first user based on a singular value threshold algorithm;
filling the filling value of the missing data of each first user into the corresponding position in the original power data matrix.
Optionally, searching for a plurality of second users similar to the first user power data in the pre-populated matrix includes:
and classifying the power data of each user in the pre-filling matrix based on a K nearest neighbor classification algorithm, and determining a plurality of second users similar to the power data of the first user according to the classification result.
Optionally, the formula for evaluating the degree of similarity between the power data of two users in the K-nearest neighbor classification algorithm is as follows:
$$d = 1 - |\rho|$$
wherein $d$ is the degree of similarity between the power data of the two users and $\rho$ is the similarity measurement, $\rho = \dfrac{\operatorname{cov}(x', y')}{\sigma_{x'}\sigma_{y'}} = \dfrac{E[(x' - E(x'))(y' - E(y'))]}{\sigma_{x'}\sigma_{y'}}$; $x'$ and $y'$ are the redefined power data vectors of the two users, $\operatorname{cov}(x', y')$ is the covariance of $x'$ and $y'$, $\sigma_{x'}$ is the standard deviation of $x'$, $\sigma_{y'}$ is the standard deviation of $y'$, $E(x')$ is the average of $x'$, and $E(y')$ is the average of $y'$; $x' = wx$ and $y' = wy$, where $x = [x_1, x_2, \ldots, x_t]$ and $y = [y_1, y_2, \ldots, y_t]$ are the power data vectors of the two users and $w = [w_1, w_2, \ldots, w_t]$ is a vector of weight coefficients defined in terms of $q$, a constant less than 1.
Optionally, the number of the second users is a preset number, and the value range of the preset number is [40, 50].
Optionally, the value range of the time length of the preset sliding time window is [60, 90] days.
Optionally, after the preset sliding time window extracts data from the original power data matrix to be filled to form a time window data matrix, the method further includes:
carrying out data deduplication processing and abnormal value discarding processing on the time window data matrix;
the pre-filling missing data in the time window data matrix based on the interpolation algorithm to obtain a pre-filled matrix, comprising:
and pre-filling missing data in the time window data matrix after the data deduplication processing and the abnormal value discarding processing based on an interpolation algorithm to obtain the pre-filled matrix.
Optionally, the method further includes:
marking the position of the missing data in the time window data matrix;
the determining a first user in the pre-populated matrix comprises:
based on the location, a first user in the pre-populated matrix is determined.
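The marking step can be sketched as follows. This is a minimal illustration that assumes missing readings are encoded as NaN and that each column of the time window data matrix holds one user; the function name `find_first_users` is hypothetical and not taken from the patent.

```python
import numpy as np

def find_first_users(window):
    """Return (mask, first_users) for a t x n window with NaN marking gaps."""
    window = np.asarray(window, dtype=float)
    mask = np.isnan(window)                          # True where a reading is missing
    first_users = np.flatnonzero(mask.any(axis=0))   # columns with at least one gap
    return mask, first_users

# Three sampling moments (rows) for three users (columns).
window = np.array([[1.0,    2.0, 3.0],
                   [np.nan, 2.5, 3.1],
                   [1.2,    2.4, np.nan]])
mask, first_users = find_first_users(window)
print(first_users)  # [0 2]
```

Because the mask is computed before any pre-filling, it can be kept and reused later to tell interpolated values apart from real measurements.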
Optionally, the specified power data includes at least one of: the positive active total electric quantity, the peak valley-leveling moment electric quantity, the active power, the ABC three-phase active power, the reactive power and the ABC three-phase reactive power.
Optionally, the interpolation algorithm comprises a linear interpolation method.
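A pre-filling step based on linear interpolation might look like the sketch below. It assumes NaN marks missing data, that rows are sampling moments and columns are users, and that gaps at the start or end of a column take the nearest known value (this edge behavior comes from `np.interp`, not from the patent). The name `prefill_linear` is hypothetical.

```python
import numpy as np

def prefill_linear(window):
    """Pre-fill NaN entries column by column with linear interpolation over time."""
    filled = np.asarray(window, dtype=float).copy()
    t = np.arange(filled.shape[0])
    for j in range(filled.shape[1]):
        known = ~np.isnan(filled[:, j])
        if known.any():  # a column with no data at all is left untouched
            filled[~known, j] = np.interp(t[~known], t[known], filled[known, j])
    return filled

window = np.array([[1.0,    np.nan],
                   [np.nan, 2.0],
                   [3.0,    4.0]])
prefilled = prefill_linear(window)
print(prefilled)
```

In this example the gap in the first column falls midway between 1.0 and 3.0, and the leading gap in the second column copies the first known value.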
Optionally, the original power data matrix is $M \in \mathbb{R}^{t \times n}$, where $t$ is the total number of preset sampling moments and $n$ is the total number of target users.
Optionally, the method further includes:
acquiring appointed power data of each target user at a plurality of preset sampling moments according to a preset period;
and adjusting the same type of specified power data into the same format, and establishing the original power data matrix based on the adjusted specified power data.
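One way to establish the matrix from collected readings is sketched below. The input layout (tuples of user id, sampling index and value), the averaging of repeated readings and the name `build_power_matrix` are assumptions made for illustration; positions with no reading stay NaN so that they can be filled later.

```python
import numpy as np

def build_power_matrix(readings, n_times, user_ids):
    """Build a t x n matrix; repeated readings are averaged, gaps stay NaN."""
    col = {user: j for j, user in enumerate(user_ids)}
    total = np.zeros((n_times, len(user_ids)))
    count = np.zeros((n_times, len(user_ids)))
    for user, t_idx, value in readings:
        total[t_idx, col[user]] += float(value)  # values already in one unit/format
        count[t_idx, col[user]] += 1
    matrix = np.full(total.shape, np.nan)
    seen = count > 0
    matrix[seen] = total[seen] / count[seen]
    return matrix

readings = [("A", 0, 1.0), ("A", 0, 3.0), ("B", 1, 5.0)]
matrix = build_power_matrix(readings, n_times=2, user_ids=["A", "B"])
print(matrix)
```

One such matrix would be built per item of specified power data, so that each matrix holds values of a single type and format.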
Optionally, before extracting power data from the original power data matrix to be filled through a preset sliding time window to form a time window data matrix, the method further includes:
acquiring the original electric power data matrix from a data source platform;
after filling the filling values of the missing data of each first user into the corresponding positions in the original power data matrix, the method further comprises:
and updating the filled original power data matrix to the data source platform.
A second aspect of the embodiments of the present invention provides a device for filling missing data of a user, including:
an extraction module, configured to extract power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix, wherein the original power data matrix comprises specified power data of each target user acquired at a plurality of preset sampling moments;
a pre-filling module, configured to pre-fill missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix;
a filling module, configured to determine a first user in the pre-filled matrix, wherein the first user is a target user for which there is missing power data;
the filling module is further configured to search, for each first user, a plurality of second users similar to the power data of the first user in the pre-filling matrix, construct a similarity matrix corresponding to the first user from the power data of the first user and the power data of the plurality of second users, and determine a filling value of missing data of the first user in the similarity matrix corresponding to the first user based on a singular value threshold algorithm;
the filling module is further configured to fill the filling value of the missing data of each first user into a corresponding position in the original power data matrix.
A third aspect of the embodiments of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the user missing data padding method according to the first aspect when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the user missing data padding method according to the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
the method comprises the steps of extracting power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix, wherein the original power data matrix comprises appointed power data of each target user acquired at a plurality of preset sampling moments; pre-filling missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix; determining a first user in the pre-populated matrix, wherein the first user is a target user for which there is missing power data; for each first user, searching a plurality of second users similar to the power data of the first user in a pre-filling matrix, forming the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, and determining a filling value of missing data of the first user in the similar matrix corresponding to the first user based on a singular value threshold algorithm; filling values of missing data of each first user into corresponding positions in an original power data matrix, pre-filling the missing data by using an interpolation algorithm, searching users with similar data based on the pre-filled data, and re-filling by using a singular value threshold algorithm, so that the filling precision of the missing data of the users can be improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a diagram illustrating an application environment of a method for filling missing data of a user according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an implementation of a method for filling missing data of a user according to an embodiment of the present invention;
FIG. 3 is a graph illustrating the filling error and time consumption for data filling with different numbers of similar users according to an embodiment of the present invention;
FIG. 4 is a comparison graph of filling errors for data filling with different sliding time windows for five items of specified power data, according to an embodiment of the present invention;
FIG. 5 is a graph comparing filling errors of filling power data with different data loss rates by using the user missing data filling method, the classical interpolation method, the conventional singular value threshold algorithm, and the secondary filling method provided in this embodiment;
FIG. 6 is a schematic structural diagram of a user missing data filling apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an electronic device provided by an embodiment of the invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following description is made by way of specific embodiments with reference to the accompanying drawings.
With the continued advance of energy internet construction and the rapid development of smart grid technology, the distribution side of the power grid has ever-growing requirements for storing, analyzing and mining 10kV special transformer marketing data. 10kV special transformer user data covers more types of power data, has a longer data acquisition period and larger data values, and data of the same type can differ by as much as the order of $10^6$. Because of these characteristics, 10kV special transformer user data often has to be processed differently from ordinary power data. This data is a rich information resource, and analyzing it is valuable in many respects, from accurately locating power customers and feeding guidance back to power production to accurately reflecting the national economy.
In the process of collecting, transmitting and storing 10kV special transformer user data, data loss can occur because of sensing equipment failures, transmission network delays, storage and processing errors and the like, and it seriously hampers later analysis, mining and modeling of the power data. Existing data filling methods usually consider only the local characteristics of the data and are easily disturbed by noise and abnormal data. With the development of artificial intelligence technology, some scholars have used deep learning to repair missing data, for example by filling lost data with a neural network algorithm. Although such methods capture global characteristics, they consume considerable computing resources in use and raise user data privacy and security concerns.
The method comprises the steps of extracting power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix, wherein the original power data matrix comprises appointed power data of each target user acquired at a plurality of preset sampling moments; pre-filling missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix; determining a first user in the pre-populated matrix, wherein the first user is a target user for which there is missing power data; for each first user, searching a plurality of second users similar to the power data of the first user in a pre-filling matrix, forming the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, and determining a filling value of missing data of the first user in the similar matrix corresponding to the first user based on a singular value threshold algorithm; filling values of missing data of each first user into corresponding positions in an original power data matrix, pre-filling the missing data by using an interpolation algorithm, searching users with similar data based on the pre-filled data, and re-filling by using a singular value threshold algorithm, so that the filling precision of the missing data of the users can be improved.
Fig. 1 is an application environment diagram of a method for filling missing data by a user according to an embodiment of the present invention. The user missing data filling method provided by the embodiment of the invention can be applied to the power system in the application environment diagram but is not limited to the application environment diagram. The power system comprises a power data acquisition device 11, a data source platform 12 and an electronic device 13. The data source platform 12 and the electronic device 13 may be the same device or different devices, and are not limited herein.
The power data acquisition device 11 is configured to acquire power data of users and send the acquired power data to the data source platform 12. The data source platform 12 is used to store power data, for example the power data of each user in a database, where all or part of the power data may constitute the original power data matrix to be filled. The electronic device 13 is configured to obtain the original power data matrix from the data source platform 12, fill the power data missing for users in the original power data matrix, and send the filled original power data matrix to the data source platform 12. The data source platform 12 updates the database based on the filled original power data matrix. The electronic device 13 may extract power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix, where the original power data matrix includes specified power data of each target user acquired at a plurality of preset sampling moments; pre-fill missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix; determine a first user in the pre-filled matrix, where the first user is a target user for which there is missing power data; for each first user, search the pre-filled matrix for a plurality of second users whose power data is similar to that of the first user, form the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, and determine a filling value of the missing data of the first user in that similar matrix based on a singular value threshold algorithm; and fill the filling values of the missing data of each first user into the corresponding positions in the original power data matrix.
The power data collection device 11 may be an electromechanical integrated electric meter, an all-electronic electric meter, etc., and is not limited herein. The data source platform 12 may be a server, a terminal, or the like, and is not limited herein, for example, the data source platform may be a data management platform of a power consumption information acquisition system, and may also be a data management platform of other power systems. The electronic device 13 may be a server, a terminal, etc., and is not limited herein. The server may be implemented as a stand-alone server or as a server cluster comprised of multiple servers. The terminal may include, but is not limited to, a desktop computer, a laptop computer, a tablet computer, and the like.
Fig. 2 is a flowchart of an implementation of a method for filling missing data by a user according to an embodiment of the present invention. In this embodiment, the method is applied to the electronic device in fig. 1 as an example. As shown in fig. 2, the method includes:
s201, extracting power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix, wherein the original power data matrix comprises appointed power data of each target user acquired at a plurality of preset sampling moments.
In this embodiment, the size and the step of the sliding time window may be set according to the actual situation and are not limited herein. Each entry in the time window data matrix corresponds to one target user and one acquisition time point: data of the same target user lie in the same row (or column), and data of the same acquisition time point lie in the same column (or row). Each original power data matrix corresponds to one item of specified power data. The preset sampling moments may be, for example, once a day or once every half hour; they are set according to the actual situation and are not limited herein. The data can be collected according to a preset period. Repeated data for the same acquisition time point can be averaged, and data acquired at a time that is not a preset sampling moment can be assigned to the nearest preset sampling moment and averaged there. The specified power data may include, but is not limited to, the user's electric quantity data, load data and the like.
S202, based on an interpolation algorithm, the missing data in the time window data matrix is pre-filled to obtain a pre-filled matrix.
S203, a first user in the pre-populated matrix is determined, wherein the first user is a target user with missing power data.
S204, aiming at each first user, searching a plurality of second users similar to the power data of the first user in a pre-filling matrix, forming the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, and determining filling values of missing data of the first user in the similar matrix corresponding to the first user based on a singular value threshold algorithm.
In this embodiment, the number of the second users may be a preset number. Each user with missing data corresponds to a similar matrix, and each similar matrix is used for solving the missing data of the user.
And S205, filling the filling value of the missing data of each first user into the corresponding position in the original power data matrix.
In the embodiment, electric power data are extracted from an original electric power data matrix to be filled through a preset sliding time window to form a time window data matrix, wherein the original electric power data matrix comprises appointed electric power data of each target user acquired at a plurality of preset sampling moments; pre-filling missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix; determining a first user in the pre-populated matrix, wherein the first user is a target user for which there is missing power data; for each first user, searching a plurality of second users similar to the power data of the first user in a pre-filling matrix, forming the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, and determining a filling value of missing data of the first user in the similar matrix corresponding to the first user based on a singular value threshold algorithm; filling values of missing data of each first user into corresponding positions in an original power data matrix, pre-filling the missing data by using an interpolation algorithm, searching users with similar data based on the pre-filled data, and re-filling by using a singular value threshold algorithm, so that the filling precision of the missing data of the users can be improved.
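The re-filling step relies on a singular value threshold algorithm. The sketch below shows the widely used singular value thresholding iteration for matrix completion; it is an illustration under stated assumptions, not the patent's exact procedure. The threshold `tau`, the step size `delta`, the stopping rule and the name `svt_complete` are all assumptions, and the `observed` mask is assumed to mark only real measurements, so that pre-filled values are re-estimated.

```python
import numpy as np

def svt_complete(M, observed, tau=None, delta=1.2, tol=1e-4, max_iter=2000):
    """Estimate unobserved entries of M by singular value thresholding.

    M        : 2-D array; values at unobserved positions are ignored (may be NaN)
    observed : boolean mask, True where M holds a real measurement
    """
    M = np.asarray(M, dtype=float)
    M_obs = np.where(observed, M, 0.0)
    if tau is None:
        tau = 5.0 * np.sqrt(M.shape[0] * M.shape[1])  # heuristic choice (assumption)
    Y = np.zeros_like(M_obs)   # auxiliary variable, non-zero only on observed entries
    X = np.zeros_like(M_obs)   # current low-rank estimate
    norm_obs = np.linalg.norm(M_obs)
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt       # shrink singular values by tau
        residual = np.where(observed, M - X, 0.0)     # mismatch on observed entries
        if np.linalg.norm(residual) <= tol * norm_obs:
            break
        Y += delta * residual
    # Keep real measurements; use the estimate only where data was missing.
    return np.where(observed, M, X)

# Rank-1 example: the hidden entry at (2, 3) has true value 3 * 4 = 12.
truth = np.outer(np.arange(1.0, 7.0), np.arange(1.0, 6.0))
observed = np.ones(truth.shape, dtype=bool)
observed[2, 3] = False
damaged = truth.copy()
damaged[2, 3] = np.nan
filled = svt_complete(damaged, observed)
print(filled[2, 3])
```

Applied to a similar matrix, the rows of the result that belong to the first user's missing positions would supply the filling values written back to the original power data matrix.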
In some embodiments, based on the embodiment shown in fig. 2, finding a plurality of second users in the pre-populated matrix that are similar to the first user power data may include:
and classifying the power data of each user in the pre-filling matrix based on a K nearest neighbor classification algorithm, and determining a plurality of second users similar to the power data of the first user according to the classification result.
In this embodiment, the power data of each user in the pre-population matrix is classified based on a K-nearest neighbor classification algorithm, each user corresponds to a power data vector in the pre-population matrix, the power data vector of a first user in the pre-population matrix is compared with the power data vectors of other users, and a second user is determined.
Optionally, the formula used in the K-nearest neighbor classification algorithm to evaluate the similarity of the power data of two users may be:
d = 1 - |ρ| (1)
wherein d is the degree of similarity of the power data of the two users (a smaller d indicates more similar data) and ρ is the similarity measure, ρ = cov(x′, y′)/(σ_x′ σ_y′); x′ and y′ are the redefined power data vectors of the two users, cov(x′, y′) = E[(x′ - E(x′))(y′ - E(y′))] is the covariance of x′ and y′, σ_x′ is the standard deviation of x′, σ_y′ is the standard deviation of y′, E(x′) is the average of x′, and E(y′) is the average of y′; x′ = wx, y′ = wy, where x and y are the power data vectors of the two users, x = [x_1, x_2, …, x_t], y = [y_1, y_2, …, y_t], and w is a vector of weight coefficients, w = [w_1, w_2, …, w_t]; q is a constant less than 1.
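By way of non-limiting illustration, formula (1) may be sketched in Python as follows. The sketch assumes that x′ = wx denotes an element-wise product and that the covariance and standard deviations use the same 1/t normalization; the function name and the uniform weights in the usage lines are hypothetical, since the expression that defines w in terms of q is not reproduced in this text.

```python
import numpy as np

def weighted_correlation_distance(x, y, w):
    """Return d = 1 - |rho| for the weighted vectors x' = w*x and y' = w*y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    xw, yw = w * x, w * y  # assumption: element-wise weighting
    # Covariance and standard deviations, all with 1/t normalization.
    cov = np.mean((xw - xw.mean()) * (yw - yw.mean()))
    denom = xw.std() * yw.std()
    if denom == 0.0:
        # Assumption: a constant vector is treated as maximally dissimilar.
        return 1.0
    return 1.0 - abs(cov / denom)

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.ones_like(x)  # placeholder weights, not the document's w
d_scaled = weighted_correlation_distance(x, 2 * x + 1, w)
d_mirrored = weighted_correlation_distance(x, -x, w)
d_unrelated = weighted_correlation_distance(x, [1.0, -1.0, -1.0, 1.0], w)
```

Because d depends only on |ρ|, a user whose curve is a scaled or mirrored copy of the first user's curve is treated as highly similar, regardless of the magnitude of the data.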
Optionally, the number of the second users is a preset number, and the value range of the preset number is [40, 50].
Referring to Fig. 3, Fig. 3 is a graph of the filling error and the filling time consumption when data are filled using different numbers of similar users according to an embodiment of the present invention. The horizontal axis represents the number of similar users, the left vertical axis represents the filling error of the missing data, and the right vertical axis represents the filling time consumption of the missing data; the solid line is the filling error curve and the dotted line is the filling time consumption curve for different numbers of similar users. As can be seen from the figure, the number of similar users affects both the precision and the speed of data filling, and extracting 40 to 50 similar users achieves the highest filling precision with a short filling time.
In this embodiment, by presetting the number of the second users, where the value range of the preset number is [40, 50], the filling accuracy of filling missing data of the users can be improved, and the filling time can be shortened.
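The selection of a preset number of second users may be sketched as follows. This is a minimal illustration under stated assumptions: users are the columns of the pre-filled matrix, the weight vector w is taken as uniform (so ρ reduces to the ordinary correlation coefficient), and the function name is hypothetical. In practice k would be chosen in the range [40, 50]; the usage lines use a tiny matrix only to keep the example short.

```python
import numpy as np

def select_similar_users(prefilled, target, k):
    """Return the column indices of the k users closest to `target` under d = 1 - |rho|."""
    n = prefilled.shape[1]
    centered = prefilled - prefilled.mean(axis=0)
    norms = np.linalg.norm(centered, axis=0)
    norms[norms == 0.0] = np.inf  # constant columns get rho = 0, i.e. d = 1
    rho = (centered.T @ centered[:, target]) / (norms * norms[target])
    d = 1.0 - np.abs(rho)
    d[target] = np.inf  # the first user is never its own neighbour
    k = min(k, n - 1)
    return np.argsort(d, kind="stable")[:k]

t = np.linspace(0.0, 6.0, 48)
base = np.sin(t)
# Columns: first user, scaled copy, unrelated trend, mirrored copy, phase-shifted curve.
prefilled = np.column_stack([base, 3 * base + 2, t ** 2, -base, np.cos(t)])
neighbours = select_similar_users(prefilled, target=0, k=2)
```

Sorting all distances is adequate for a few thousand users; for much larger user sets a partial sort such as `np.argpartition` could be substituted.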
Optionally, in this embodiment, a value range of the time length of the preset sliding time window is [60, 90] days.
Fig. 4 is a comparison graph of the filling errors when five kinds of specified power data are filled using sliding time windows of different lengths according to an embodiment of the present invention. As shown in Fig. 4, the horizontal axis represents the length of the sliding time window, the vertical axis represents the filling error of the missing data, and the five curves correspond to the five kinds of specified power data: curve 1 represents the active total power, curve 2 the power in the sharp period, curve 3 the active power, curve 4 the reactive power, and curve 5 the A-phase voltage. As can be seen from the figure, the length of the sliding time window has a certain influence on the data filling precision, and the filling precision is highest when a sliding time window of 60 to 90 days is selected.
In this embodiment, the data filling accuracy can be improved by presetting the value range of the time length of the sliding time window to [60, 90] days.
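For illustration, extracting a time window data matrix from the original power data matrix may be sketched as below. The sketch assumes that rows are sampling moments in chronological order and columns are users, and that the window is specified in days together with a number of samples per day; the function and parameter names are hypothetical.

```python
import numpy as np

def extract_time_window(original, end, window_days, samples_per_day):
    """Return (start, window) for the rows of `original` in the window ending at row `end` (exclusive)."""
    length = window_days * samples_per_day
    start = max(0, end - length)  # shorter window near the start of the record
    # Copy so that later pre-filling does not modify the original matrix.
    return start, original[start:end, :].copy()

original = np.arange(200 * 3, dtype=float).reshape(200, 3)  # 200 daily samples, 3 users
start, window = extract_time_window(original, end=200, window_days=60, samples_per_day=1)
```

Returning `start` alongside the window keeps track of where the filled values belong when they are written back to the original matrix.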
Optionally, on the basis of any of the above embodiments, the specified power data include, but are not limited to, at least one of the following: forward active total electric quantity, sharp/peak/flat/valley period electric quantity, active power, ABC three-phase active power, reactive power, and ABC three-phase reactive power.
Optionally, the interpolation algorithm includes, but is not limited to, a linear interpolation algorithm, a nearest-neighbor interpolation algorithm, and a bilinear interpolation algorithm.
Optionally, the original power data matrix is M ∈ R^(t×n), where t is the total number of preset sampling moments and n is the total number of target users.
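A minimal sketch of pre-filling by linear interpolation is given below. It assumes that missing entries are encoded as NaN, that each column holds one user's data in chronological order, and that gaps at the beginning or end of a column are filled with the nearest observed value (the behaviour of `np.interp` outside the observed range); the function name is hypothetical.

```python
import numpy as np

def prefill_linear(window):
    """Fill the NaN entries of each column (one user) by linear interpolation over time."""
    filled = np.array(window, dtype=float, copy=True)
    rows = np.arange(filled.shape[0])
    for j in range(filled.shape[1]):
        col = filled[:, j]  # view into `filled`, so assignments below update it
        missing = np.isnan(col)
        if missing.any() and not missing.all():
            # np.interp holds the first/last observed value for gaps at the edges.
            col[missing] = np.interp(rows[missing], rows[~missing], col[~missing])
    return filled

window = np.array([[1.0, np.nan],
                   [np.nan, 4.0],
                   [3.0, np.nan]])
prefilled = prefill_linear(window)
```

A column with no observed values at all is left unchanged, since interpolation has nothing to work from.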
In some embodiments, on the basis of any of the above embodiments, the method further comprises:
acquiring, according to a preset period, specified power data of each target user at a plurality of preset sampling moments;
and adjusting specified power data of the same type into the same format, and establishing the original power data matrix based on the adjusted specified power data.
In this embodiment, for electric quantity data, such as the forward active total electric quantity and the sharp/peak/flat/valley period electric quantity, the data collection period may be set to once per day. For load data, such as active power, ABC three-phase active power, reactive power and ABC three-phase reactive power, the data collection period may be set to once every half hour.
In some embodiments, on the basis of any of the above embodiments, before extracting power data from the original power data matrix to be filled through a preset sliding time window to form a time window data matrix, the method further includes:
obtaining the original power data matrix from a data source platform.
In this embodiment, after filling the filling value of the missing data of each first user into the corresponding position in the original power data matrix, the method further includes:
and updating the filled original power data matrix to the data source platform.
In some embodiments, on the basis of any of the above embodiments, after extracting power data from the original power data matrix to be filled through a preset sliding time window to form a time window data matrix, the method further includes:
carrying out data deduplication processing and abnormal value discarding processing on the time window data matrix;
correspondingly, the missing data in the time window data matrix is pre-filled based on an interpolation algorithm to obtain a pre-filled matrix, which includes:
and based on an interpolation algorithm, pre-filling missing data in the time window data matrix after the data deduplication processing and the abnormal value discarding processing to obtain a pre-filled matrix.
In some embodiments, on the basis of any of the above embodiments, the method further comprises:
marking the position of the missing data in the time window data matrix;
accordingly, determining a first user in the pre-populated matrix comprises:
based on the location, a first user in the pre-populated matrix is determined.
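Marking the positions of missing data and determining the first users from those positions may be sketched as follows, assuming that missing entries are encoded as NaN and that columns correspond to users; the function name is hypothetical. The mask has to be computed before pre-filling, because the pre-filled matrix no longer contains NaN entries.

```python
import numpy as np

def mark_missing(window):
    """Return the missing-position mask and the column indices of users with missing data."""
    mask = np.isnan(window)
    first_users = np.flatnonzero(mask.any(axis=0))
    return mask, first_users

window = np.array([[1.0, np.nan, 3.0],
                   [4.0, 5.0, 6.0],
                   [np.nan, 8.0, 9.0]])
mask, first_users = mark_missing(window)
```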
The user missing data filling method is described below with an implementation example; the method is not limited to filling missing data of 10 kV dedicated-transformer users. In this implementation example, the method includes:
Step 1, data normalization: obtaining the specified power data of power users from the power consumption information acquisition system, the specified power data mainly including forward active total electric quantity, sharp/peak/flat/valley period electric quantity, active power, ABC three-phase active power, reactive power, ABC three-phase reactive power and the like; and organizing data of the same type into the same format according to the acquisition period, so as to construct, for each kind of data, an original power data matrix of "power user - acquisition time".
The acquisition period of electric quantity data, such as the forward active total electric quantity and the sharp/peak/flat/valley period electric quantity, is set to once per day; data with repeated acquisition time points are averaged, and data whose acquisition time point is not on the hour are assigned to the on-the-hour time point nearest to the acquisition time and averaged. The acquisition period of load data, such as active power, ABC three-phase active power, reactive power and ABC three-phase reactive power, is set to once every half hour; data with repeated acquisition time points are averaged, and data whose acquisition time point is not a whole time point are assigned to the whole time point nearest to the acquisition time and averaged. The "power user - acquisition time" original power data matrix is M ∈ R^(t×n), where t corresponds to the data acquisition time period of the 10 kV dedicated-transformer users and n is the number of 10 kV dedicated-transformer users in the whole power data set.
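The normalization of acquisition times in step 1 may be sketched as follows. The sketch assumes that each reading is a (user, timestamp, value) tuple and that "the nearest whole time point" means the nearest multiple of the grid step within the day (30 minutes for load data in the usage lines); these names and the tuple layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def snap_to_grid(ts, step_minutes):
    """Round a timestamp to the nearest multiple of `step_minutes` counted from midnight."""
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    step = timedelta(minutes=step_minutes)
    slots = round((ts - day_start) / step)
    return day_start + slots * step

def normalize_readings(readings, step_minutes):
    """Average all readings of a user that fall on the same snapped time point."""
    buckets = defaultdict(list)
    for user, ts, value in readings:
        buckets[(user, snap_to_grid(ts, step_minutes))].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

readings = [
    ("u1", datetime(2021, 3, 1, 10, 2), 10.0),   # snaps to 10:00
    ("u1", datetime(2021, 3, 1, 9, 58), 12.0),   # snaps to 10:00, averaged with the above
    ("u1", datetime(2021, 3, 1, 10, 31), 20.0),  # snaps to 10:30
]
normalized = normalize_readings(readings, step_minutes=30)
```

The resulting (user, time point) dictionary can then be laid out as the "power user - acquisition time" matrix, with absent keys becoming missing entries.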
Step 2, data pre-filling: dividing the original power data matrix with a sliding time window to obtain a time window data matrix; carrying out data deduplication and abnormal value processing on the data in the time window data matrix; and then pre-filling the missing data in the time window by a linear interpolation method to construct a pre-filled matrix.
The specific steps are as follows:
Step 2.1, setting a suitable sliding time window size, extracting the data in the sliding time window, and constructing a time window data matrix;
Step 2.2, carrying out data deduplication and abnormal value processing on the time window data matrix, discarding repeated data and abnormal data, then marking the missing data in the time window data matrix and recording the positions of the missing data in the time window data matrix;
Step 2.3, pre-filling the missing data in the time window data matrix based on the conventional linear interpolation method to obtain a pre-filled matrix.
Step 3, clustering similar users: locating the users with missing data, selecting, for each user with missing data, a suitable number of similar users by a K-nearest neighbor (KNN) classification algorithm, and constructing a plurality of similar matrices.
The specific steps are as follows:
Step 3.1, locating the users with missing data according to the marked missing data positions;
Step 3.2, for each user with missing data, extracting the power data vector of that user from the time window data matrix, and extracting the data vectors of 50 users similar to that user by the K-nearest neighbor classification algorithm;
The formula used in the K-nearest neighbor classification algorithm to evaluate the similarity of the power data of two users is: d = 1 - |ρ|
wherein d is the degree of similarity of the power data of the two users and ρ is the similarity measure, ρ = cov(x′, y′)/(σ_x′ σ_y′); x′ and y′ are the redefined power data vectors of the two users, cov(x′, y′) = E[(x′ - E(x′))(y′ - E(y′))] is the covariance of x′ and y′, σ_x′ is the standard deviation of x′, σ_y′ is the standard deviation of y′, E(x′) is the average of x′, and E(y′) is the average of y′; x′ = wx, y′ = wy, where x and y are the power data vectors of the two users, x = [x_1, x_2, …, x_t], y = [y_1, y_2, …, y_t], and w is a vector of weight coefficients, w = [w_1, w_2, …, w_t]; q is a constant less than 1.
Step 3.3, for each user with missing data, constructing a similar matrix from that user's vector and the similar user vectors obtained by the K-nearest neighbor classification algorithm, so that the number of similar matrices equals the number of users with missing data.
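Step 3.3 may be sketched as below. The sketch assumes that users are columns, that the user with missing data is placed in the first column of its similar matrix, and that every originally missing position (of the first user and of the similar users alike) is later treated as unobserved; the latter is an assumption, as are the function and parameter names.

```python
import numpy as np

def build_similar_matrix(prefilled, missing_mask, target, neighbours):
    """Stack the target user's column and its neighbours' columns into one similar matrix."""
    cols = np.concatenate(([target], neighbours)).astype(int)
    similar = prefilled[:, cols]
    observed = ~missing_mask[:, cols]  # True where the value was actually measured
    return similar, observed

prefilled = np.arange(20, dtype=float).reshape(4, 5)
missing_mask = np.zeros((4, 5), dtype=bool)
missing_mask[1, 0] = True  # one missing reading of user 0 (pre-filled in `prefilled`)
similar, observed = build_similar_matrix(prefilled, missing_mask, target=0, neighbours=[3, 1])
```

One such pair is built per user with missing data, which is why the number of similar matrices equals the number of those users.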
Step 4, data filling: locating the missing data in each similar matrix, repairing the missing data of each similar matrix by the singular value threshold algorithm, and returning the successfully filled data to the original power data matrix to obtain complete 10 kV dedicated-transformer user data.
The specific steps are as follows:
Step 4.1, substituting the similar matrix into the singular value threshold algorithm equation:
wherein X is the matrix after data recovery; M is the matrix to be repaired; ‖·‖_* is the nuclear norm of a matrix, i.e. the sum of its singular values; P_Ω(·) is a projection operator representing the orthogonal projection of a matrix onto Ω; ‖·‖_F is the Frobenius norm of a matrix, i.e. the square root of the sum of squares of all its elements; and τ is a fixed constant.
Step 4.2, solving the above formula by an alternating iteration method, wherein the iteration sequence is:
wherein k is the number of iterations; δ_k is a weight coefficient; D_τ(·) is the soft threshold operator; the singular values of the matrix are denoted σ = (σ_1, σ_2, …, σ_r), r = min{m, n}; τ is a constant vector, denoted τ = (τ, τ, …, τ); and U and V are the left and right singular matrices.
According to the above formula, the missing data in each similar matrix are obtained by solving;
Step 4.3, filling the solved data into the corresponding positions of the sliding time window power data according to the located users with missing data and the located missing data positions, to obtain a complete time window data matrix;
Step 4.4, filling the complete sliding time window power data matrix back into the position of the corresponding time window in the original power data matrix.
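The filling in step 4 may be illustrated with the following sketch. It assumes the singular value thresholding formulation commonly given in the literature, which matches the symbols defined above: minimize τ‖X‖_* + ½‖X‖_F² subject to P_Ω(X) = P_Ω(M), solved by iterating X_k = D_τ(Y_{k-1}) and Y_k = Y_{k-1} + δ_k P_Ω(M - X_k), where D_τ shrinks each singular value by τ and clips it at zero. The constant step δ, the stopping rule, the choice of τ in the usage lines and all names are assumptions and not values taken from this embodiment.

```python
import numpy as np

def svt_complete(M, observed, tau, delta, max_iter=500, tol=1e-4):
    """Fill the unobserved entries of M by singular value thresholding."""
    M_obs = np.where(observed, M, 0.0)  # P_Omega(M); NaN placeholders are ignored
    Y = np.zeros_like(M_obs)
    X = np.zeros_like(M_obs)
    norm_obs = np.linalg.norm(M_obs)
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt    # soft threshold operator D_tau(Y)
        residual = np.where(observed, M - X, 0.0)  # P_Omega(M - X)
        if np.linalg.norm(residual) <= tol * norm_obs:
            break
        Y = Y + delta * residual
    # Keep measured values untouched and take only the missing entries from X.
    return np.where(observed, M, X)

truth = np.outer(np.linspace(1.0, 2.0, 20), np.linspace(1.0, 3.0, 12))  # rank-1 test matrix
observed = np.ones(truth.shape, dtype=bool)
observed[::4, ::3] = False                     # hide 20 of the 240 entries
M = np.where(observed, truth, np.nan)
tau = 5 * np.linalg.norm(np.where(observed, truth, 0.0), 2)  # assumed heuristic scale
filled = svt_complete(M, observed, tau=tau, delta=1.2)
```

In the pipeline described above, the column of `filled` belonging to the user with missing data would be written back into the time window data matrix at the marked positions (step 4.3), and the window then written back into the original matrix (step 4.4).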
In this embodiment, when the power consumption information acquisition system acquires new power user data, the sliding time window is moved so that the new power data fall within the sliding time window, and the above steps are repeated to obtain complete 10 kV dedicated-transformer user data.
Fig. 5 is a graph comparing the filling errors obtained when power data with different data missing rates are filled by the user missing data filling method provided in this embodiment, a classical interpolation method, a conventional singular value threshold algorithm, and a secondary filling method.
In Fig. 5, the horizontal axis represents the data missing rate and the vertical axis represents the filling error of the missing data; curve a is the filling error curve of the user missing data filling method provided in the embodiment of the present invention, curve b is the filling error curve of the classical interpolation method, curve c is the filling error curve of the singular value threshold method, and curve d is the filling error curve of the secondary filling method. It can be seen from the figure that, at every data missing rate, the repair error of the classical interpolation method is the largest of the four algorithms and the repair error of the user missing data filling method of the invention is the lowest.
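A comparison of this kind can be set up by hiding known entries at a chosen missing rate and measuring the error of the filled values at the hidden positions. The sketch below is only an illustration: the error metric used for Fig. 5 is not stated in this text, so root-mean-square error is an assumption, as are the uniformly random masking and the function names.

```python
import numpy as np

def mask_at_rate(shape, missing_rate, seed=0):
    """Return a boolean mask (True = observed) hiding roughly `missing_rate` of the entries."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) >= missing_rate

def fill_error(truth, filled, observed):
    """Root-mean-square error of the filled values over the hidden positions only."""
    hidden = ~observed
    if not hidden.any():
        return 0.0
    return float(np.sqrt(np.mean((filled[hidden] - truth[hidden]) ** 2)))

truth = np.ones((4, 4))
observed = np.ones((4, 4), dtype=bool)
observed[2, 1] = False
filled = truth.copy()
filled[2, 1] = 3.0                      # a deliberately wrong fill value
error = fill_error(truth, filled, observed)
share_observed = mask_at_rate((1000, 10), missing_rate=0.3).mean()
```

Running each filling method on the same masked matrix and plotting `fill_error` against the missing rate gives curves of the kind described for Fig. 5.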
Compared with the prior art, the implementation example has the following beneficial effects:
1) The method for filling missing data of distribution network 10 kV dedicated-transformer users based on the singular value threshold algorithm overcomes the defect that conventional methods consider only local characteristics of the data and are weakly targeted, avoids interference from local noise and abnormal data, and improves the accuracy of data filling.
2) The defects of deep learning methods, namely that a large amount of training data is needed, that training consumes a large amount of resources, and that the security of user data cannot be guaranteed, are overcome; the missing data are handled directly, and robustness is good.
3) By incorporating a similar-user clustering method, the problem that conventional methods and deep learning methods cannot handle the excessively large differences in data magnitude among 10 kV dedicated-transformer users is solved, the data set to be processed is reduced, and data filling is accelerated.
4) By adding the sliding time window, the efficiency of data filling is improved and online filling of the missing data of 10 kV dedicated-transformer users is realized, which is a clear advantage over previous data filling methods that can only process a static data set.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 6 is a schematic structural diagram of a user missing data filling apparatus according to an embodiment of the present invention. As shown in Fig. 6, the user missing data filling apparatus 6 includes: an extraction module 610, a pre-filling module 620, and a filling module 630.
The extracting module 610 is configured to extract power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix, where the original power data matrix includes specified power data of each target user acquired at a plurality of preset sampling moments.
The pre-filling module 620 is configured to pre-fill missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix.
The filling module 630 is configured to determine a first user in the pre-filled matrix, wherein the first user is a target user for which power data are missing.
The filling module 630 is further configured to, for each first user, search the pre-filled matrix for a plurality of second users whose power data are similar to those of the first user, form the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, and determine a filling value of the missing data of the first user in the similar matrix corresponding to the first user based on a singular value threshold algorithm.
The filling module 630 is further configured to fill the filling value of the missing data of each first user into the corresponding position in the original power data matrix.
In this embodiment, power data are extracted from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix, wherein the original power data matrix comprises specified power data of each target user acquired at a plurality of preset sampling moments; missing data in the time window data matrix are pre-filled based on an interpolation algorithm to obtain a pre-filled matrix; a first user is determined in the pre-filled matrix, wherein the first user is a target user for which power data are missing; for each first user, a plurality of second users whose power data are similar to those of the first user are searched for in the pre-filled matrix, the power data of the first user and of the plurality of second users are formed into a similar matrix corresponding to the first user, and a filling value of the missing data of the first user is determined in that similar matrix based on a singular value threshold algorithm; and the filling value of the missing data of each first user is filled into the corresponding position in the original power data matrix, so that the filling precision of the users' missing data can be improved.
In some embodiments, on the basis of any of the above embodiments, the filling module 630 is configured to classify the power data of each user in the pre-filled matrix based on a K-nearest neighbor classification algorithm, and determine, according to the classification result, a plurality of second users whose power data are similar to those of the first user.
Optionally, the formula used in the K-nearest neighbor classification algorithm to evaluate the similarity of the power data of two users is:
d = 1 - |ρ|
wherein d is the degree of similarity of the power data of the two users and ρ is the similarity measure, ρ = cov(x′, y′)/(σ_x′ σ_y′); x′ and y′ are the redefined power data vectors of the two users, cov(x′, y′) = E[(x′ - E(x′))(y′ - E(y′))] is the covariance of x′ and y′, σ_x′ is the standard deviation of x′, σ_y′ is the standard deviation of y′, E(x′) is the average of x′, and E(y′) is the average of y′; x′ = wx, y′ = wy, where x and y are the power data vectors of the two users, x = [x_1, x_2, …, x_t], y = [y_1, y_2, …, y_t], and w is a vector of weight coefficients, w = [w_1, w_2, …, w_t]; q is a constant less than 1.
Optionally, the number of the second users is a preset number, and the value range of the preset number is [40, 50].
In some embodiments, based on any of the above embodiments, the time length of the preset sliding time window is within a range of [60, 90] days.
Optionally, the extracting module 610 is further configured to, after extracting power data from the original power data matrix to be filled through the preset sliding time window to form the time window data matrix, perform data deduplication processing and abnormal value discarding processing on the time window data matrix.
Accordingly, the pre-filling module 620 is configured to pre-fill, based on an interpolation algorithm, the missing data in the time window data matrix after the data deduplication processing and the abnormal value discarding processing, to obtain the pre-filled matrix.
Optionally, the extracting module 610 is further configured to mark a position of missing data in the time window data matrix.
Accordingly, the pre-filling module 620 is configured to determine the first user in the pre-filled matrix based on the position.
Optionally, the specified power data include at least one of the following: forward active total electric quantity, sharp/peak/flat/valley period electric quantity, active power, ABC three-phase active power, reactive power, and ABC three-phase reactive power.
Optionally, the interpolation algorithm comprises a linear interpolation method.
Optionally, the original power data matrix is M ∈ R^(t×n), where t is the total number of preset sampling moments and n is the total number of target users.
Optionally, the extracting module 610 is further configured to collect, according to a preset period, specified power data of each target user at multiple preset sampling moments; and adjusting the same type of specified power data into the same format, and establishing an original power data matrix based on the adjusted specified power data.
Optionally, the extracting module 610 is further configured to obtain the original power data matrix from the data source platform before extracting the power data from the original power data matrix to be filled through a preset sliding time window to form a time window data matrix.
Optionally, the filling module 630 is further configured to update the filled original power data matrix to the data source platform after filling the filling value of the missing data of each first user into the corresponding position in the original power data matrix.
Fig. 7 is a schematic diagram of an electronic device provided by an embodiment of the invention. As shown in Fig. 7, an embodiment of the present invention provides an electronic device 7, where the electronic device 7 of the embodiment includes: a processor 70, a memory 71, and a computer program 72 stored in the memory 71 and executable on the processor 70. The processor 70, when executing the computer program 72, implements the steps in the above-mentioned embodiments of the user missing data filling method, such as steps 201 to 205 shown in Fig. 2. Alternatively, the processor 70, when executing the computer program 72, implements the functions of the various modules/units in the above-described apparatus embodiments, such as the functions of the modules 610 to 630 shown in Fig. 6.
Illustratively, the computer program 72 may be divided into one or more modules/units, which are stored in the memory 71 and executed by the processor 70 to carry out the invention. One or more of the modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 72 in the electronic device 7.
The electronic device 7 may be a desktop computer, a notebook, a palm computer, a cloud server, or another computing device. The terminal may include, but is not limited to, the processor 70 and the memory 71. It will be appreciated by those skilled in the art that Fig. 7 is only an example of the electronic device 7 and does not constitute a limitation of the electronic device 7, which may comprise more or fewer components than those shown, or combine some components, or have different components; for example, the terminal may further comprise input/output devices, network access devices, buses, and the like.
The Processor 70 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 71 may be an internal storage unit of the electronic device 7, such as a hard disk or a memory of the electronic device 7. The memory 71 may also be an external storage device of the electronic device 7, such as a plug-in hard disk provided on the electronic device 7, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 71 may also include both an internal storage unit of the electronic device 7 and an external storage device. The memory 71 is used for storing computer programs and other programs and data required by the terminal. The memory 71 may also be used to temporarily store data that has been output or is to be output.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when being executed by a processor, the computer program implements the steps in the above-mentioned embodiments of the method for filling missing data for a user.
The computer-readable storage medium stores the computer program 72, and the computer program 72 includes program instructions. All or part of the processes in the methods of the above embodiments may be implemented by the computer program 72 instructing related hardware; the computer program 72 may be stored in a computer-readable storage medium, and when the computer program 72 is executed by the processor 70, the steps of the above method embodiments may be implemented. The computer program 72 comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
The computer readable storage medium may be an internal storage unit of the terminal of any of the foregoing embodiments, for example, a hard disk or a memory of the terminal. The computer readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk provided on the terminal, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the terminal. The computer-readable storage medium is used for storing a computer program and other programs and data required by the terminal. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal and method may be implemented in other ways. For example, the above-described apparatus/terminal embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.
Claims (10)
1. A method for filling missing data of a user, comprising:
extracting power data from an original power data matrix to be filled through a preset sliding time window to form a time window data matrix;
pre-filling missing data in the time window data matrix based on an interpolation algorithm to obtain a pre-filled matrix;
determining first users in the pre-filled matrix;
for each first user, searching the pre-filled matrix for a plurality of second users whose power data is similar to that of the first user, forming the power data of the first user and the power data of the plurality of second users into a similar matrix corresponding to the first user, and determining, based on a singular value threshold algorithm, a filling value for the missing data of the first user in the similar matrix corresponding to the first user; and
filling the filling value of the missing data of each first user into the corresponding position in the original power data matrix;
wherein the original power data matrix comprises specified power data of each target user acquired at a plurality of preset sampling moments, and the first user is a target user with missing power data.
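As a non-limiting illustration of the singular value threshold step of claim 1, the following minimal Python sketch fills a similar matrix by iteratively shrinking singular values. It assumes NumPy and a matrix with sampling moments as rows and users as columns. The function name `svt_fill`, the parameters `tau`, `delta`, `max_iter` and `tol`, and the default threshold heuristic are assumptions of this sketch and are not specified in the claims.

```python
import numpy as np


def svt_fill(S, observed, tau=None, delta=1.2, max_iter=500, tol=1e-4):
    """Fill the unobserved entries of S by singular value thresholding.

    S        : 2-D float array (sampling moments x users); entries where
               `observed` is False are ignored and may be NaN.
    observed : boolean array of the same shape, True for measured entries.
    tau, delta, max_iter, tol : hypothetical tuning parameters of this sketch.
    """
    observed = np.asarray(observed, dtype=bool)
    known = np.where(observed, S, 0.0)            # zero out unobserved entries
    if tau is None:
        # Common heuristic threshold; an assumption, not taken from the claims.
        tau = 5.0 * np.sqrt(known.shape[0] * known.shape[1])
    Y = np.zeros_like(known)                      # auxiliary matrix
    X = np.zeros_like(known)                      # current low-rank estimate
    ref = np.linalg.norm(known)
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt   # shrink singular values by tau
        residual = np.where(observed, known - X, 0.0)
        if np.linalg.norm(residual) <= tol * ref:
            break
        Y += delta * residual                     # update on observed entries only
    # Keep measured values and take the estimate only at missing positions.
    return np.where(observed, S, X)
```

In the terms of claim 1, `S` would be the similar matrix of one first user and `observed` would be False at the positions of missing data. Only the values returned at those positions would be written back to the original power data matrix.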
2. The method of claim 1, wherein searching the pre-filled matrix for a plurality of second users whose power data is similar to that of the first user comprises:
classifying the power data of each user in the pre-filled matrix based on a K-nearest-neighbor classification algorithm, and determining, according to the classification result, the plurality of second users whose power data is similar to that of the first user.
3. The method for filling missing data of a user according to claim 2, wherein the K-nearest-neighbor classification algorithm evaluates the degree of similarity between the power data of two users according to the following formula:
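$$d = 1 - |\rho|$$

wherein $d$ is the degree of similarity between the power data of two users, and $\rho$ is the similarity measure,

$$\rho = \frac{\operatorname{cov}(x', y')}{\sigma_{x'}\,\sigma_{y'}} = \frac{E\big[(x' - E(x'))\,(y' - E(y'))\big]}{\sigma_{x'}\,\sigma_{y'}}$$

$x'$ and $y'$ are the redefined power data vectors of the two users, $\operatorname{cov}(x', y')$ is the covariance of $x'$ and $y'$, $\sigma_{x'}$ is the standard deviation of $x'$, $\sigma_{y'}$ is the standard deviation of $y'$, $E(x')$ is the mean of $x'$, and $E(y')$ is the mean of $y'$; $x' = wx$ and $y' = wy$, where $x$ and $y$ are the power data vectors of the two users, $x = [x_1, x_2, \ldots, x_t]$, $y = [y_1, y_2, \ldots, y_t]$, $w$ is a vector of weight coefficients, $w = [w_1, w_2, \ldots, w_t]$, and $q$ is a constant less than 1.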
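The similarity degree of claim 3 can be sketched as follows, purely by way of example. The sketch assumes NumPy, reads x' = wx as an element-wise product, and replaces the K-nearest-neighbor classification of claim 2 with a plain ranking by d. The function names, the handling of constant vectors, and the example weights `q ** (t - i)` are assumptions of this sketch.

```python
import numpy as np


def similarity_distance(x, y, w):
    """Return d = 1 - |rho| for the weighted vectors x' = w * x and y' = w * y.

    The element-wise product is an assumption about the meaning of x' = wx.
    """
    xw, yw = w * x, w * y
    sx, sy = xw.std(), yw.std()
    if sx == 0.0 or sy == 0.0:
        return 1.0   # rho is undefined for a constant vector; treated as dissimilar here
    rho = np.mean((xw - xw.mean()) * (yw - yw.mean())) / (sx * sy)
    return max(0.0, 1.0 - abs(rho))   # clip tiny negative rounding errors


def nearest_users(P, first_user, w, k):
    """Return the column indices of the k users with the smallest d to `first_user`.

    P is a pre-filled matrix with sampling moments as rows and users as columns.
    """
    d = np.array([similarity_distance(P[:, first_user], P[:, j], w)
                  for j in range(P.shape[1])])
    d[first_user] = np.inf            # a user is not its own neighbour
    return np.argsort(d)[:k]


# Hypothetical weights that favour recent samples (q < 1); the actual
# expression for w_i is not reproduced in the text above.
t, q = 6, 0.9
w = q ** np.arange(t - 1, -1, -1)
P = np.column_stack([
    np.arange(1.0, t + 1),                       # user 0
    2.0 * np.arange(1.0, t + 1),                 # user 1: scaled copy of user 0
    np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0]),    # user 2: unrelated profile
])
neighbours = nearest_users(P, first_user=0, w=w, k=1)
```

Because d depends only on the absolute value of the correlation, a user whose load curve is a scaled copy of the first user's curve is ranked as the closest neighbour regardless of the difference in magnitude.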
4. The method according to claim 2, wherein the number of the second users is a preset number, and the preset number takes a value in the range [40, 50].
5. The method according to claim 1, wherein the length of the preset sliding time window is in the range of [60, 90] days.
6. The method of claim 1, wherein after extracting power data from the original power data matrix to be filled through the preset sliding time window to form the time window data matrix, the method further comprises:
carrying out data deduplication processing and abnormal value discarding processing on the time window data matrix;
and the pre-filling of missing data in the time window data matrix based on the interpolation algorithm to obtain the pre-filled matrix comprises:
pre-filling, based on the interpolation algorithm, missing data in the time window data matrix after the data deduplication processing and the abnormal value discarding processing, to obtain the pre-filled matrix.
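The preprocessing of claim 6 followed by linear pre-filling might look like the sketch below. It assumes NumPy, a time window matrix with sampling moments as rows and users as columns, and missing data encoded as NaN. The claims do not say how duplicates or abnormal values are detected, so the duplicate-timestamp rule, the median/MAD rule, and all function and parameter names are assumptions of this sketch.

```python
import numpy as np


def drop_duplicate_samples(times, W):
    """Keep the first row for each sampling moment (rows are returned time-sorted)."""
    unique_times, first = np.unique(times, return_index=True)
    return unique_times, W[first, :]


def discard_outliers(W, n_scaled_mad=3.0):
    """Replace abnormal values with NaN so that they are treated as missing.

    The median/MAD rule and its threshold are assumptions of this sketch.
    """
    W = W.astype(float)                                   # astype returns a copy
    med = np.nanmedian(W, axis=0)                         # per-user median
    scale = 1.4826 * np.nanmedian(np.abs(W - med), axis=0)
    bad = (np.abs(W - med) > n_scaled_mad * scale) & (scale > 0)
    W[bad] = np.nan
    return W


def prefill_linear(W):
    """Fill NaN entries of each user's column by linear interpolation over time.

    np.interp holds the nearest known value at the ends of the window.
    """
    P = W.astype(float)
    idx = np.arange(P.shape[0])
    for j in range(P.shape[1]):
        miss = np.isnan(P[:, j])
        if miss.any() and not miss.all():
            P[miss, j] = np.interp(idx[miss], idx[~miss], P[~miss, j])
    return P
```

Discarded abnormal values become NaN, so the same interpolation step pre-fills both the originally missing data and the discarded values.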
7. The method of claim 1, wherein the method further comprises:
marking the position of the missing data in the time window data matrix;
and the determining of a first user in the pre-filled matrix comprises:
determining the first user in the pre-filled matrix based on the marked position.
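A minimal sketch of the marking step of claim 7 is given below, assuming NumPy, missing data encoded as NaN, and users stored as columns. The function names are hypothetical. The mask has to be recorded before pre-filling, because interpolation leaves no NaN entries from which the first users could later be identified.

```python
import numpy as np


def mark_missing(W):
    """Return a boolean mask that is True where the time window matrix lacks a value."""
    return np.isnan(W)


def first_users(missing_mask):
    """Return the column indices of users with at least one marked position."""
    return np.flatnonzero(missing_mask.any(axis=0))


W = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, np.nan]])
mask = mark_missing(W)       # recorded before any pre-filling
users = first_users(mask)    # users 1 and 2 each have one missing sample
```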
8. The method for filling missing data of a user according to any one of claims 1 to 7, wherein the specified power data comprises at least one of: forward active total electric quantity, electric quantity in peak, flat and valley periods, active power, A/B/C three-phase active power, reactive power, and A/B/C three-phase reactive power;
the interpolation algorithm comprises a linear interpolation method;
the original power data matrix is $M \in \mathbb{R}^{t \times n}$, wherein t is the total number of the preset sampling moments, and n is the total number of the target users;
the method further comprises:
acquiring the specified power data of each target user at a plurality of preset sampling moments according to a preset period;
adjusting specified power data of the same type to the same format, and establishing the original power data matrix based on the adjusted specified power data;
before extracting power data from the original power data matrix to be filled through the preset sliding time window to form the time window data matrix, the method further comprises:
acquiring the original power data matrix from a data source platform;
after filling the filling value of the missing data of each first user into the corresponding position in the original power data matrix, the method further comprises:
updating the filled original power data matrix to the data source platform.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method for filling missing data of a user according to any one of claims 1 to 8.
10. A computer-readable storage medium in which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for filling missing data of a user according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110302543.9A CN113051256A (en) | 2021-03-22 | 2021-03-22 | Method, device and equipment for filling missing data of user |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113051256A (en) | 2021-06-29 |
Family
ID=76514145
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110302543.9A Pending CN113051256A (en) | 2021-03-22 | 2021-03-22 | Method, device and equipment for filling missing data of user |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113051256A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105893610A (en) * | 2016-04-26 | 2016-08-24 | 中国科学院信息工程研究所 | Deficiency-source completion method of multi-source heterogeneous large data |
WO2017084713A1 (en) * | 2015-11-19 | 2017-05-26 | Huawei Technologies Co., Ltd. | A computational efficient method to generate an rf coverage map taken into account uncertainty of drive test measurement data |
CN110082699A (en) * | 2019-05-10 | 2019-08-02 | 国网天津市电力公司电力科学研究院 | A kind of low-voltage platform area intelligent electric energy meter kinematic error calculation method and its system |
CN111159638A (en) * | 2019-12-26 | 2020-05-15 | 华南理工大学 | Power distribution network load missing data recovery method based on approximate low-rank matrix completion |
CN112380998A (en) * | 2020-11-16 | 2021-02-19 | 华北电力大学(保定) | Low-voltage transformer area missing data completion method based on matrix completion |
Non-Patent Citations (1)
Title |
---|
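乔文俞 (Qiao Wenyu): "基于曲线相似与低秩矩阵的缺失电量数据补全方法" [Method for completing missing electricity consumption data based on curve similarity and low-rank matrices], 《电力建设》 (Electric Power Construction) * |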
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021000556A1 (en) | Method and system for predicting remaining useful life of industrial equipment, and electronic device | |
CN106951695A (en) | Plant equipment remaining life computational methods and system under multi-state | |
CN110515931B (en) | Capacitive type equipment defect prediction method based on random forest algorithm | |
CN113177366B (en) | Comprehensive energy system planning method and device and terminal equipment | |
CN113112099A (en) | Power grid daily electric quantity prediction model training method and power grid daily electric quantity prediction method | |
CN111612275A (en) | Method and device for predicting load of regional user | |
CN113516275A (en) | Power distribution network ultra-short term load prediction method and device and terminal equipment | |
CN111537884A (en) | Method and device for acquiring service life data of power battery, computer equipment and medium | |
CN114021483A (en) | Ultra-short-term wind power prediction method based on time domain characteristics and XGboost | |
CN117744916A (en) | Method and device for predicting energy storage capacity, computer equipment and readable storage medium | |
CN114498619A (en) | Wind power prediction method and device | |
CN113807728A (en) | Performance assessment method, device, equipment and storage medium based on neural network | |
CN113673742A (en) | Distribution transformer area load prediction method, system, device and medium | |
CN115795329A (en) | Power utilization abnormal behavior analysis method and device based on big data grid | |
CN118228061A (en) | Device design diagram generation method and computing device | |
CN109977131A (en) | A kind of house type matching system | |
CN112632857A (en) | Method, device, equipment and storage medium for determining line loss of power distribution network | |
CN112801315A (en) | State diagnosis method and device for power secondary equipment and terminal | |
CN117458437A (en) | Short-term wind power prediction method, system, equipment and medium | |
CN112085926A (en) | River water pollution early warning method and system | |
CN109657907B (en) | Quality control method and device for geographical national condition monitoring data and terminal equipment | |
CN113051256A (en) | Method, device and equipment for filling missing data of user | |
CN115809795A (en) | Digitalized production team bearing capacity evaluation method and device | |
CN114676931B (en) | Electric quantity prediction system based on data center technology | |
CN115952928A (en) | Short-term power load prediction method, device, equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210629 |