CN116304949A

CN116304949A - Calibration method for energy consumption historical data

Info

Publication number: CN116304949A
Application number: CN202310329227.XA
Authority: CN
Inventors: 汪红亮; 刘钢; 罗鹏鑫; 李沁贇; 谭灿; 祝娟
Original assignee: Zhejiang Yuanchuang Intelligent Control Technology Co ltd
Current assignee: Zhejiang Yuanchuang Intelligent Control Technology Co ltd
Priority date: 2023-03-30
Filing date: 2023-03-30
Publication date: 2023-06-23

Abstract

The invention discloses a calibration method for energy consumption historical data. The energy consumption data are first converted into a time series; the data that are unreliable at a given time are then screened out using the local outlier factor (LOF) principle; the effective information contained in the abnormal data is substituted back; and the intermediate calibration data are simulated by a model prediction algorithm. The method effectively improves the accuracy of judging abnormal data and makes full use of the effective information in the abnormal data.

Description

Calibration method for energy consumption historical data

Technical Field

The invention relates to the field of real-time historical data of an energy management system, in particular to a calibration method of energy consumption historical data.

Background

The energy management system interfaces with several common types of meters, including electricity meters, water meters, gas meters and energy meters, and, by collecting meter data, performs monitoring, management, statistics, analysis, execution of energy-saving strategies and the like. Depending on the requirements of the actual project, the data acquisition period is generally between 1 minute and 15 minutes. To meet service requirements, the collected real-time data are stored as historical data. In actual project implementation and use, several problems are likely to occur:

(1) The collector cannot be guaranteed to provide service 24 hours a day, 365 days a year, and data loss caused by equipment damage, power failure, network disconnection and the like cannot be ruled out.

(2) Zero errors in engineering configuration and in user operation cannot be guaranteed. Configuration and usage problems may cause the collected data to be abnormal; as data accumulate over time, the abnormal data eventually become difficult to find and, once found, difficult to repair.

To solve the above technical problems, researchers have tried various methods. One example is the patent entitled "a method, a system and a device for filling missing values of electricity collection data". That method detects abnormal values in the electricity collection data with the mean-variance method and deletes them. It then trains a denoising autoencoder model on the electricity collection data, reconstructs the original electricity sample data with the trained denoising autoencoder network, and fills the missing samples with the reconstructed data. To prevent the model from overfitting, it proposes a new Determination-FourOrder regularization term; to obtain a better noise attenuation ratio, it reduces the noise level according to the number of units in each network layer. Finally, it corrects the filled values by combining k-means clustering, the average distance from adjacent data points to the cluster center, and the standard deviation of the data. This approach has the following defects:

1) That patent detects abnormal values in the electricity collection data with the mean-variance method and deletes them directly, but in practice this approach is prone to misjudging which data are abnormal.

2) The abnormal data are discarded directly, so some valid information in them is never extracted or used.

Therefore, the conventional technique is prone to erroneous judgment and cannot use the effective information in abnormal data.

Disclosure of Invention

The invention aims to provide a calibration method of energy consumption historical data. The method has the characteristics of effectively improving the accuracy of judging the abnormal data and fully utilizing the effective information in the abnormal data.

The technical scheme of the invention is as follows: the energy consumption data are first converted into a time series; the data that are unreliable at a given time are then screened out using the local outlier factor (LOF) principle; the effective information in the abnormal data is substituted back; and the intermediate calibration data are simulated by a model prediction algorithm.

The foregoing method for calibrating the energy consumption history data comprises the following specific steps:

A. acquiring the historical energy consumption data of a single meter, converting them into a time series, and calculating the reading value at each moment and the increment over the corresponding time interval to obtain a data point for each moment of the meter;

B. quantifying the degree of anomaly of each data point with a time-series-based LOF algorithm;

C. the user confirms the degree of anomaly of the data points and backfills the effective information;

D. simulating the calibration data from the trend of the contemporaneous historical data;

E. the calibration data take effect after the user confirms them.
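Step A can be sketched as follows. This is only a minimal illustration, assuming the meter reports cumulative readings as (timestamp, reading) pairs; the function name `to_data_points` and the dictionary keys are hypothetical and do not come from the patent.

```python
from datetime import datetime

def to_data_points(readings):
    """Turn (timestamp, cumulative_reading) pairs into time-ordered data points.

    Each data point carries the reading at that moment and the increment
    over the interval since the previous reading (assumed layout).
    """
    ordered = sorted(readings, key=lambda r: r[0])  # enforce time order
    points = []
    for (t_prev, v_prev), (t_cur, v_cur) in zip(ordered, ordered[1:]):
        points.append({
            "time": t_cur,
            "reading": v_cur,
            "increment": v_cur - v_prev,  # consumption in this interval
            "interval": t_cur - t_prev,   # width of the interval
        })
    return points

# Readings arrive out of order; the function sorts them first.
raw = [
    (datetime(2023, 3, 1, 0, 30), 1012.0),
    (datetime(2023, 3, 1, 0, 0), 1000.0),
    (datetime(2023, 3, 1, 0, 15), 1005.0),
]
points = to_data_points(raw)
```

The first reading has no predecessor, so it yields no increment and serves only as the baseline for the second point.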

In the foregoing method for calibrating energy consumption history data, the step B includes the following specific steps:

b1, for each data point, calculating the distances to all other data points and sorting them from near to far;

b2, then, for each data point, finding its k nearest neighbors according to this ordering and calculating the LOF score.
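Steps b1 and b2 can be illustrated with a short sketch. The patent does not state which distance is used, so the absolute difference between interval increments is an assumption here, and the function names are hypothetical.

```python
def sorted_neighbors(values, i):
    """Step b1: distances from point i to all other points, nearest first.

    Assumption: the distance is the absolute difference of the increments.
    Ties are ordered by index so the result is deterministic.
    """
    return sorted((abs(values[i] - v), j) for j, v in enumerate(values) if j != i)

def k_nearest(values, i, k):
    """Step b2 (first half): indices of the k nearest neighbors of point i."""
    return [j for _, j in sorted_neighbors(values, i)[:k]]

increments = [10, 11, 12, 13, 50]
ranking_for_last = sorted_neighbors(increments, 4)
nearest_to_first = k_nearest(increments, 0, 2)
```

Sorting every point against every other point costs O(n² log n) overall, which is acceptable for a single meter's history but may call for an index structure on larger data sets.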

In the foregoing calibration method for energy consumption history data, the LOF score, i.e. the local anomaly factor, is calculated as follows:

the local anomaly factor of point p is expressed as:

$$\mathrm{LOF}_k(p) = \frac{\sum_{o \in N_k(p)} \dfrac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}}{|N_k(p)|}$$

the local reachable density of point p is expressed as:

$$\mathrm{lrd}_k(p) = \left( \frac{\sum_{o \in N_k(p)} \text{reach-distance}_k(p, o)}{|N_k(p)|} \right)^{-1}$$

wherein the k-th distance neighborhood N_k(p) of point p is the set of all points whose distance from p does not exceed the k-th distance of p, including the points at exactly the k-th distance; the number of points in the k-th neighborhood of p therefore satisfies |N_k(p)| ≥ k;

the k-th reachable distance from point o to point p is defined as:

reach-distance_k(p, o) = max{k-distance(o), d(p, o)},

wherein d(p, o) represents the distance between the two points p and o; k-distance represents the k-th distance: the distance between point p and its k-th nearest point is the k-th distance of point p, denoted k-distance(p);

the k-th distance d_k(p) of point p is defined as d_k(p) = d(p, o), where o is the k-th nearest point to p.

In the foregoing method for calibrating energy consumption history data, in step C, the effective information includes a read value at a certain time and an increment of a certain time width.

Compared with the prior art, the invention adds a time factor to the classical density-based Local Outlier Factor algorithm: the energy consumption data of each meter are arranged in time order, and the energy consumption data point at each moment is assigned a local outlier factor (LOF) that depends on its neighborhood density, which is used to judge whether that data point is an outlier. Valid historical data are modeled, and by bringing in the valid information of the abnormal data, missing data are predicted and abnormal data are reassigned. The advantages are as follows:

1) The degree of anomaly (outlierness) of each data point can be quantified;

2) The algorithm only identifies possibly abnormal data and their degree of abnormality, so that the user can locate them quickly; manual confirmation is allowed, and excessive correction is avoided;

3) The effective information in the abnormal data is fully utilized: it is brought into the model, and the intermediate missing data and abnormal data are rebuilt in time order through predictive analysis.
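Advantage 2) can be pictured as a review worklist rather than an automatic deletion. The following is a hedged sketch only: the patent gives no threshold, so `threshold=1.5` and the function name `flag_for_review` are hypothetical.

```python
def flag_for_review(scores, threshold=1.5):
    """Return (index, score) pairs above a review threshold, most anomalous first.

    Nothing is modified or deleted here; the result is only a worklist
    that a user would confirm or reject (the threshold is an assumption).
    """
    flagged = [(i, s) for i, s in enumerate(scores) if s > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Example LOF scores for five data points (illustrative values only).
scores = [1.0, 0.98, 3.2, 1.1, 25.0]
worklist = flag_for_review(scores)
```

Keeping the score alongside the index lets the user see how abnormal each candidate is before deciding whether to calibrate it.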

In summary, the method and the device have the characteristics of effectively improving the accuracy of judging the abnormal data and fully utilizing the effective information in the abnormal data.

Drawings

FIG. 1 is a flowchart of the specific steps of the present invention;

FIG. 2 is a trend graph of a meter's energy consumption increment data containing anomalies;

FIG. 3 is a trend graph of the same abnormal energy consumption increment data after calibration;

FIG. 4 is a schematic diagram of the k-th reachable distance from point o to point p.

Detailed Description

The invention is further illustrated by the following figures and examples, which are not intended to be limiting.

Example. The invention provides a method for anomaly detection and data calibration of historical water, electricity, gas and energy consumption data. As shown in the flowchart of FIG. 1, the method mainly comprises the following steps:

S1: the historical energy consumption data of a single meter are obtained and converted into a time series, and the reading value at each moment and the increment over the corresponding time interval are calculated.

S2: for each data point, the distances to all other points are calculated and sorted from near to far.

S3: for each data point, its k nearest neighbors (k-nearest-neighbor) are found and the LOF score is calculated.

The LOF score, i.e. the local anomaly factor, is calculated from the following definitions:

(1) d(p, o): the distance between two points p and o.

(2) k-distance: k-th distance

Among the points nearest to data point p, the distance between the k-th nearest point and p is the k-th distance of point p, denoted k-distance(p).

The k-th distance d_k(p) of point p is defined as d_k(p) = d(p, o), where o is a point of the data set C satisfying the following conditions:

(a) there are at least k points o' ∈ C, o' ≠ p, such that d(p, o') ≤ d(p, o);

(b) there are at most k-1 points o' ∈ C, o' ≠ p, such that d(p, o') < d(p, o).

That is, the k-th distance of p is the distance from p to the k-th nearest point, not counting p itself.

(3) k-distance neighborhood of p: k-th distance neighborhood

The k-th distance neighborhood N_k(p) of a point p is the set of all points whose distance from p does not exceed the k-th distance of p, including the points at exactly the k-th distance.

The number of points in the k-th neighborhood of p therefore satisfies |N_k(p)| ≥ k.
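The k-th distance and the k-th distance neighborhood can be checked on a small example. This is a sketch under the assumption that points are scalar increments compared by absolute difference; the function names are hypothetical.

```python
def distance(a, b):
    """Assumed distance between two scalar data points."""
    return abs(a - b)

def k_distance(values, p, k):
    """k-th distance of point p: distance to its k-th nearest other point."""
    dists = sorted(distance(values[p], v) for j, v in enumerate(values) if j != p)
    return dists[k - 1]

def k_neighborhood(values, p, k):
    """All points within the k-th distance of p, ties included, p excluded."""
    kd = k_distance(values, p, k)
    return [j for j, v in enumerate(values)
            if j != p and distance(values[p], v) <= kd]

# Two points tie at distance 1 from the first point, so with k = 1 the
# neighborhood holds two points and |N_k(p)| >= k holds with strict inequality.
values = [0, 1, 1, 5]
kd = k_distance(values, 0, 1)
neighborhood = k_neighborhood(values, 0, 1)
```

The tie case is why the neighborhood is defined by distance rather than by taking exactly k points.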

(4) reach-distance: reachable distance

Reachable distance (reachability distance): the definition of the reachable distance builds on the k-th distance. Given a parameter k, the reachable distance reach-distance_k(p, o) from data point p to data point o is the maximum of the k-th distance of data point o and the direct distance between data point p and data point o.

The k-th reachable distance from point o to point p is defined as:

reach-distance_k(p, o) = max{k-distance(o), d(p, o)}

That is, the k-th reachable distance from point o to point p is at least the k-th distance of o; otherwise it is the true distance between o and p. This also means that, for the k points nearest to point o, the reachable distances from o to them are considered equal, all being d_k(o). As shown in FIG. 4, the 5th reachable distance from o₁ to p is d(p, o₁), and the 5th reachable distance from o₂ to p is d₅(o₂).
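The two cases of the reachable distance (clamped to the k-th distance of o, or equal to the true distance) can be seen in a small sketch. Scalar points compared by absolute difference are an assumption, and the function names are hypothetical.

```python
def k_distance(values, p, k):
    """k-th distance of point p: distance to its k-th nearest other point."""
    dists = sorted(abs(values[p] - v) for j, v in enumerate(values) if j != p)
    return dists[k - 1]

def reach_distance(values, p, o, k):
    """reach-distance_k(p, o) = max{k-distance(o), d(p, o)}."""
    return max(k_distance(values, o, k), abs(values[p] - values[o]))

values = [10, 11, 12, 13, 50]
k = 2

# p = 11 lies inside the k-neighborhood of o = 10, so the result is
# clamped to the k-th distance of o, which is 2.
near = reach_distance(values, p=1, o=0, k=k)

# p = 50 lies far from o = 11, so the true distance 39 dominates.
far = reach_distance(values, p=4, o=1, k=k)
```

The clamping smooths out small distance fluctuations among points that are already close neighbors of o.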

(5) local reachability density: local reachable density

Local reachable density (local reachability density): the definition of the local reachable density is based on the reachable distance. For a data point p, the data points whose distance from p is less than or equal to k-distance(p) are called its k-nearest-neighbors, denoted N_k(p), and the local reachable density of p is the inverse of its average reachable distance to these neighboring data points.

The local reachable density of point p is expressed as:

$$\mathrm{lrd}_k(p) = \left( \frac{\sum_{o \in N_k(p)} \text{reach-distance}_k(p, o)}{|N_k(p)|} \right)^{-1}$$

It represents the inverse of the average reachable distance between p and the points in the k-th neighborhood of p.

Note that the reachable distance is not symmetric: reach-distance_k(p, o) uses the k-th distance of the neighbor o, not that of p, so the direction of the relationship must be kept clear. In addition, if there are duplicate points, the sum of reachable distances in the denominator may be 0, which makes lrd infinite.

This value can be understood as a density: the higher the density, the more likely the point belongs to the same cluster as its neighbors; the lower the density, the more likely it is an outlier. If p and its surrounding neighborhood points belong to the same cluster, the reachable distances are more likely to take the smaller value d_k(o), so the sum of reachable distances is smaller and the density is higher. If p is far from its surrounding neighbor points, the reachable distances are likely to take the larger value d(p, o), so the density is smaller and p is more likely to be an outlier.
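The local reachable density, including the duplicate-point case in which the sum of reachable distances is 0, can be sketched as follows. Scalar points with absolute-difference distance are an assumption, and returning `math.inf` for the duplicate case is one possible convention, not something the patent prescribes.

```python
import math

def k_distance(values, p, k):
    """k-th distance of point p: distance to its k-th nearest other point."""
    dists = sorted(abs(values[p] - v) for j, v in enumerate(values) if j != p)
    return dists[k - 1]

def k_neighborhood(values, p, k):
    """All points within the k-th distance of p, ties included, p excluded."""
    kd = k_distance(values, p, k)
    return [j for j, v in enumerate(values) if j != p and abs(values[p] - v) <= kd]

def lrd(values, p, k):
    """Inverse of the average reachable distance from p to its k-neighborhood."""
    neighbors = k_neighborhood(values, p, k)
    total = sum(max(k_distance(values, o, k), abs(values[p] - values[o]))
                for o in neighbors)
    if total == 0:
        return math.inf  # duplicate points: assumed convention
    return len(neighbors) / total

values = [10, 11, 12, 13, 50]
dense = lrd(values, 0, 2)      # point inside the cluster
sparse = lrd(values, 4, 2)     # isolated point
duplicates = lrd([7, 7, 7, 20], 0, 2)
```

The isolated point has a much lower density than the clustered points, which is the property the LOF ratio later relies on.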

(6) local outlier factor: local anomaly factor

Local outlier factor: by the definition of local reachable density, a data point that lies far from the other points evidently has a small local reachable density. However, the LOF algorithm measures the degree of abnormality of a data point not by its absolute local density but by its density relative to the surrounding neighboring data points. This has the advantage of accommodating non-uniform data distributions with regions of different density. The local anomaly factor is therefore defined as a local relative density: the local relative density (local anomaly factor) of a data point p is the ratio of the average local reachable density of the neighbors of p to the local reachable density of p.

The local outlier factor of point p is expressed as:

$$\mathrm{LOF}_k(p) = \frac{\sum_{o \in N_k(p)} \dfrac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}}{|N_k(p)|}$$

It represents the average of the ratios of the local reachable density of each point in the neighborhood N_k(p) of point p to the local reachable density of point p.

The LOF reflects the degree of abnormality of a sample as a numerical score. The score compares the average density around the neighbors of a sample point with the density around the sample point itself. If the ratio is close to 1, the density of p is about the same as that of its neighborhood points, and p probably belongs to the same cluster as its neighborhood; if the ratio is less than 1, the density of p is higher than that of its neighborhood points, and p is a dense point; if the ratio is greater than 1, the density of p is lower than that of its neighborhood points, and p is more likely to be an outlier.
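Putting definitions (1) to (6) together, a compact LOF scorer might look like the sketch below. It is an illustration under stated assumptions, not the patented implementation: points are scalar increments compared by absolute difference, the way the time factor enters the distance is not specified in the text and is left out, and the handling of duplicate points (ratio taken as 1 when both densities are infinite) is a hypothetical convention.

```python
import math

def lof_scores(values, k):
    """Local outlier factor of every point in a list of scalar values."""
    n = len(values)

    def dist(a, b):
        return abs(values[a] - values[b])

    # k-th distance and k-th distance neighborhood of every point
    kdist, neigh = [], []
    for p in range(n):
        kd = sorted(dist(p, j) for j in range(n) if j != p)[k - 1]
        kdist.append(kd)
        neigh.append([j for j in range(n) if j != p and dist(p, j) <= kd])

    # local reachable density: inverse of the average reachable distance
    lrd = []
    for p in range(n):
        total = sum(max(kdist[o], dist(p, o)) for o in neigh[p])
        lrd.append(math.inf if total == 0 else len(neigh[p]) / total)

    # LOF: average ratio of the neighbors' density to the point's own density
    scores = []
    for p in range(n):
        ratios = [1.0 if math.isinf(lrd[o]) and math.isinf(lrd[p])
                  else lrd[o] / lrd[p]
                  for o in neigh[p]]
        scores.append(sum(ratios) / len(ratios))
    return scores

increments = [10, 11, 12, 13, 50]
scores = lof_scores(increments, k=2)
```

On this toy series the four clustered increments score about 1 and the isolated increment of 50 scores about 25, matching the interpretation of the ratio given above.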

S4: the user confirms the abnormal data and backfills the effective information: a reading value at a certain time, or an increment over a certain time width.

S5: the calibration data are simulated from the trend of the contemporaneous historical data.

S6: the calibration takes effect after the user confirms it.
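Steps S4 and S5 can be illustrated with one simple allocation scheme. The patent does not specify the prediction model, so the proportional split below is purely an assumption: the trusted readings at both ends of a gap (the backfilled effective information) fix the total increment, and the contemporaneous historical increments serve only as weights for distributing it. The name `calibrate_gap` is hypothetical.

```python
def calibrate_gap(start_reading, end_reading, historical_increments):
    """Distribute a known total increment over a gap using historical weights.

    start_reading / end_reading: trusted readings bounding the gap.
    historical_increments: increments of the same time slots in a
    comparable earlier period (assumed to be non-negative).
    Returns (calibrated increments, calibrated readings).
    """
    n = len(historical_increments)
    if n == 0:
        raise ValueError("at least one interval is required")
    total = end_reading - start_reading
    weight_sum = sum(historical_increments)
    if weight_sum == 0:
        increments = [total / n] * n  # no trend available: split evenly
    else:
        increments = [total * w / weight_sum for w in historical_increments]

    readings, running = [], start_reading
    for inc in increments:
        running += inc
        readings.append(running)
    return increments, readings

# The meter read 100 before the gap and 110 after it; last week's
# increments for the same three slots were 1, 3 and 1.
increments, readings = calibrate_gap(100.0, 110.0, [1.0, 3.0, 1.0])
```

Because the calibrated increments always sum to the trusted total, the reconstructed readings rejoin the confirmed reading at the end of the gap, and the result would still be shown to the user for confirmation in S6.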

According to the invention, the energy consumption data are first converted into a time series; the data that are unreliable at a given time are then screened out using the local outlier factor (LOF) principle; the effective information in the abnormal data is substituted back; and the intermediate calibration data are simulated by a model prediction algorithm. As a result, the degree of abnormality of each data point can be quantified. The algorithm only identifies possibly abnormal data and their degree of abnormality, so that the user can locate them quickly; manual confirmation is allowed, and excessive correction is avoided. The effective information in the abnormal data is fully utilized: it is brought into the model, and the intermediate missing data and abnormal data are rebuilt in time order through predictive analysis.

Claims

1. The method for calibrating the energy consumption historical data is characterized by comprising the following steps of: firstly, the energy consumption data is time-sequenced, then the unreliable data at the same time is screened out by combining with an outlier factor LOF principle, then effective information in abnormal data is substituted, and intermediate calibration data is simulated by a model prediction algorithm.

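2. A method for calibrating energy consumption history data according to claim 1, comprising the specific steps of:

A. acquiring the historical energy consumption data of a single meter, converting them into a time series, and calculating the reading value at each moment and the increment over the corresponding time interval to obtain a data point for each moment of the meter;

B. quantifying the degree of anomaly of each data point with a time-series-based LOF algorithm;

C. the user confirms the degree of anomaly of the data points and backfills the effective information;

D. simulating the calibration data from the trend of the contemporaneous historical data;

E. the calibration data take effect after the user confirms them.

3. A method for calibrating energy consumption history data according to claim 2, wherein step B comprises the following steps:

b1, for each data point, calculating the distances to all other data points and sorting them from near to far;

b2, then, for each data point, finding its k nearest neighbors according to this ordering and calculating the LOF score.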

4. A method for calibrating energy consumption history data according to claim 3, wherein the LOF score, i.e. the local anomaly factor, is calculated as follows:

the local anomaly factor of point p is expressed as:

$$\mathrm{LOF}_k(p) = \frac{\sum_{o \in N_k(p)} \dfrac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}}{|N_k(p)|}$$

the local reachable density of point p is expressed as:

$$\mathrm{lrd}_k(p) = \left( \frac{\sum_{o \in N_k(p)} \text{reach-distance}_k(p, o)}{|N_k(p)|} \right)^{-1}$$

the k-th reachable distance from point o to point p is defined as:

reach-distance_k(p, o) = max{k-distance(o), d(p, o)},

the k-th distance d_k(p) of point p is defined as follows: d_k(p) = d(p, o).

5. A method of calibrating energy consumption history according to claim 2, wherein: in step C, the valid information includes a read value at a certain time and an increment of a certain time width.