CN111461409A - Abnormal value processing method for medium and long-term load data - Google Patents

Abnormal value processing method for medium and long-term load data

Info

Publication number
CN111461409A
Authority
CN
China
Prior art keywords
abnormal
data
industry
increment
medium
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010163490.2A
Other languages
Chinese (zh)
Inventor
胡迎迎
王尧
李佳
邓娇娇
刘红丽
童星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Economic and Technological Research Institute of State Grid Shanxi Electric Power Co Ltd
Original Assignee
Shenzhen Orange Technology Co Ltd
Economic and Technological Research Institute of State Grid Shanxi Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Orange Technology Co Ltd and Economic and Technological Research Institute of State Grid Shanxi Electric Power Co Ltd
Priority to CN202010163490.2A
Publication of CN111461409A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention discloses a method for processing abnormal values in medium and long-term load data. The method comprises the following steps: identifying abnormal data, wherein the abnormal data comprises null, zero and negative data points, the identical-value phenomenon, outlier data points and the step phenomenon; checking the degree of abnormality of the industry data to which the abnormal data belongs; and, if the degree of abnormality is within the normal range, correcting the abnormal data. After the abnormal data is repaired with this method, the relative error of the electricity consumption forecast can be reduced.

Description

Abnormal value processing method for medium and long-term load data
Technical Field
The invention relates to the technical field of power demand early warning, forecasting and analysis, and in particular to a method for processing abnormal values in medium and long-term load data.
Background
In power demand early warning and forecasting analysis, data points that do not follow the general pattern of the load curve and would mislead the early warning and forecasting results are called abnormal data points. Under this definition, abnormal data points arise not only from errors in measurement and data transmission; abnormal fluctuations in user-side load also count as abnormal data points.
The data used for power demand early warning and forecasting analysis is mostly raw electricity consumption data collected directly from the power system, and various problems prevent it from being used directly in a specific analysis.
Abnormal data points in power demand early warning and forecasting generally have three causes: data acquisition errors, fluctuations in power distribution and consumption, and differences in statistical systems.
1. Data acquisition errors:
All abnormal data points caused by secondary-side faults, including measurement failures, communication faults and data processing system errors, are data acquisition errors. Abnormal data generated at this stage accounts for the highest proportion and interferes most with early warning and forecasting, but its main patterns are also the easiest to identify. The common patterns are zero-value data points, negative-value data points and the identical-value phenomenon. Zero-value and negative-value data points are the most frequent abnormal data points in the system; their data records are empty, zero or negative. The identical-value phenomenon refers to the situation in which the electricity consumption of the whole society and of every industry is exactly the same at two different time points; it is mostly caused by errors in the data processing system.
2. Fluctuations in power distribution and consumption:
Abnormal electricity consumption data may be caused by sporadic, irregular electricity usage on the user side. For example, if a large-scale cultural performance is held in a locality for several consecutive days, the electricity consumption for that month rises abnormally, and the resulting abnormal data pattern appears as outlier data points. Outliers are a few isolated points whose values differ significantly from those of other points. The phenomenon arises because the electricity usage that causes it is sporadic; once that usage ends, the abnormality ends immediately.
3. Differences in statistical systems:
The data used for power demand early warning and forecasting analysis involves industry electricity consumption data at the level of provincial administrative units. At present, the statistical systems for electricity consumption differ considerably between industries within a provincial administrative unit. The differences lie mainly in the fact that statistics for different industries began in different years and that the statistical scope of some industries changed during the statistical period, which produces null data points and the step phenomenon. A null data point occurs when an industry whose statistics began later has no value at time points before its starting year. The step phenomenon means that electricity consumption stays at a certain fixed level during an initial period, changes abruptly at some time point to a markedly different value, and then stays at that new level.
Because various errors can occur during the acquisition and transmission of electricity consumption data, and random factors can cause severe short-term fluctuations, erroneous or problematic data that does not conform to the overall pattern of the electricity consumption sequence can lead to wrong analysis results. For this reason, abnormal values in the raw electricity consumption data are processed before power demand early warning and forecasting analysis is performed.
Disclosure of Invention
In view of the deficiencies of the prior art, one aspect of the present invention provides a method for processing abnormal values of medium and long term load data.
The abnormal value processing method of the medium and long term load data comprises the following steps:
identifying abnormal data, wherein the abnormal data comprises null, zero and negative data points, the identical-value phenomenon, outlier data points and the step phenomenon;
performing industry data abnormality degree checking on the abnormal data;
and if the abnormal degree is within the normal range, correcting the abnormal data.
According to a preferred embodiment, the method of identifying anomalous data comprises:
generating an electricity consumption matrix E, wherein the first-dimension index of E is the industry serial number and the second-dimension index is the time serial number;
generating an abnormal data identification matrix I corresponding to the electricity consumption matrix E, wherein I has the same size as E;
for $i \in A$ and $j \in B$:
when $E(i, j)$ is correct data, $I(i, j)$ is zero; when $E(i, j)$ is abnormal data, $I(i, j)$ is set to a non-zero value;
wherein A is the first-dimension index set of E, and B is the second-dimension index set of E.
According to a preferred embodiment, null, zero and negative data points are identified as follows:
for $i \in A$ and $j \in B$, if $E(i, j)$ is null, zero or negative, then $I(i, j)$ is marked as 1, indicating a null, zero or negative data point;
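The marking rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the matrix is a list of rows (one row per industry), a null record is represented by `None`, and the function name `mark_invalid` is hypothetical rather than taken from the source.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def mark_invalid(E: Matrix) -> List[List[int]]:
    """Return I with I[i][j] = 1 where E[i][j] is null (None), zero or negative."""
    I = [[0] * len(row) for row in E]
    for i, row in enumerate(E):
        for j, value in enumerate(row):
            # The None check comes first so that the comparison is never
            # evaluated on a missing record.
            if value is None or value <= 0:
                I[i][j] = 1
    return I


# Two industries, four time points (illustrative numbers only).
E = [[120.0, None, 131.5, 0.0],
     [80.2, 82.1, -3.0, 85.0]]
I = mark_invalid(E)
```

In this toy matrix the null record, the zero and the negative value are flagged and every other cell keeps the label 0.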
the identical-value phenomenon is identified as follows:
for two different time points $j_1, j_2 \in B$, if $E(i, j_1) = E(i, j_2)$ holds for all $i \in A$, then $I(i, j_2)$ is marked as 2 for all $i \in A$, indicating a duplicate data point.
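A minimal sketch of the identical-value check follows. It assumes that the later of the two matching time points is the one to flag and that "completely consistent" means exact equality in every industry; the function name `mark_duplicate_columns` is hypothetical.

```python
from typing import List


def mark_duplicate_columns(E: List[List[float]]) -> List[List[int]]:
    """Flag column j2 with 2 when some earlier column j1 matches it in every row."""
    n_rows = len(E)
    n_cols = len(E[0]) if E else 0
    I = [[0] * n_cols for _ in range(n_rows)]
    for j2 in range(1, n_cols):
        for j1 in range(j2):
            # The whole column must match, i.e. every industry has the same
            # value at both time points.
            if all(E[i][j1] == E[i][j2] for i in range(n_rows)):
                for i in range(n_rows):
                    I[i][j2] = 2
                break
    return I


E = [[100.0, 105.0, 105.0, 110.0],
     [50.0, 52.0, 52.0, 55.0]]
I = mark_duplicate_columns(E)
```

Only the third time point is flagged here, because it repeats the second one in both industries. A match in a single industry is not enough, which mirrors the requirement that the check be made across all industries at once.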
According to a preferred embodiment, the method of identifying outlier data points comprises:
identifying abnormal increments, specifically:
for each industry $i \in A$, let $\Delta E_i(j) = E(i, j+1) - E(i, j)$ denote the increment between two adjacent points in the electricity consumption sequence; abnormal increments are identified according to the Chebyshev inequality:
$$P\{|X - E(X)| \ge k\,D(X)\} \le \frac{1}{k^{2}}$$
wherein X is a random variable, E(X) is the mathematical expectation of X, D(X) is the dispersion of X, taken as the standard deviation (the square root of the variance) so that the bound holds, and k is a positive constant that sets how far X may depart from its expectation;
when $k = 5$, the probability that X falls within the interval $[E(X) - 5D(X),\ E(X) + 5D(X)]$ is at least 96%; by the same criterion, the probability that the random variable $\Delta E_i(j)$ falls within the interval $[E(\Delta E_i(j)) - 5D(\Delta E_i(j)),\ E(\Delta E_i(j)) + 5D(\Delta E_i(j))]$ is at least 96%; an increment that falls outside this interval is an abnormal increment;
if $\Delta E_i(j)$ and $\Delta E_i(j-1)$ are both abnormal increments and fall on opposite sides of the interval, then $E(i, j)$ is an outlier point and $I(i, j)$ is set to 3.
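The outlier rule can be illustrated with the sketch below. It rests on assumptions the source does not fix: the expectation and dispersion are estimated from the increments of the series itself, the dispersion is the population standard deviation (the quantity the Chebyshev bound requires), and indices are 0-based. The names `abnormal_increments` and `outlier_indices` are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Tuple


def abnormal_increments(series: List[float], k: float = 5.0) -> Tuple[List[float], List[int]]:
    """Return (increments, flags); flags[j] is +1 or -1 when increment j lies
    above or below the k-sigma band around the mean increment, else 0."""
    inc = [b - a for a, b in zip(series, series[1:])]
    mu, sigma = fmean(inc), pstdev(inc)
    flags = [1 if d > mu + k * sigma else -1 if d < mu - k * sigma else 0
             for d in inc]
    return inc, flags


def outlier_indices(series: List[float], k: float = 5.0) -> List[int]:
    """Point j is an outlier when the increment arriving at it (j-1) and the
    increment leaving it (j) are both abnormal and have opposite signs."""
    _, flags = abnormal_increments(series, k)
    return [j for j in range(1, len(flags)) if flags[j] * flags[j - 1] == -1]


# 72 monthly values with steady growth and one isolated spike at index 30.
demo = [100.0 + t for t in range(72)]
demo[30] += 500.0
spikes = outlier_indices(demo)
```

With a threshold as wide as k = 5, a single spike only exceeds the band when the series is reasonably long, because the spike itself inflates the estimated dispersion. The 72-point series is chosen with that in mind.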
According to a preferred embodiment, the method of identifying the step phenomenon comprises:
identifying abnormal increments, specifically:
for each industry $i \in A$, let $\Delta E_i(j) = E(i, j+1) - E(i, j)$ denote the increment between two adjacent points in the electricity consumption sequence; abnormal increments are identified according to the Chebyshev inequality:
$$P\{|X - E(X)| \ge k\,D(X)\} \le \frac{1}{k^{2}}$$
wherein X is a random variable, E(X) is the mathematical expectation of X, D(X) is the dispersion of X, taken as the standard deviation (the square root of the variance) so that the bound holds, and k is a positive constant that sets how far X may depart from its expectation;
when $k = 5$, the probability that X falls within the interval $[E(X) - 5D(X),\ E(X) + 5D(X)]$ is at least 96%; by the same criterion, the probability that the random variable $\Delta E_i(j)$ falls within the interval $[E(\Delta E_i(j)) - 5D(\Delta E_i(j)),\ E(\Delta E_i(j)) + 5D(\Delta E_i(j))]$ is at least 96%; an increment that falls outside this interval is an abnormal increment;
for industry i, if only $\Delta E_i(j)$ is an abnormal increment, industry i exhibits a step phenomenon, and $I(i, j_k)$ is set to 4 for $j_k = 1, 2, \ldots, j$, indicating that these j points of industry i are step abnormal points.
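A minimal sketch of the step rule follows, under two assumptions that the source does not spell out: "only ΔE_i(j) is abnormal" is read as "exactly one increment of the industry is abnormal", and the dispersion is the population standard deviation of the increments. The name `step_points` and the 0-based indices are illustrative.

```python
from statistics import fmean, pstdev
from typing import List


def step_points(series: List[float], k: float = 5.0) -> List[int]:
    """Return the indices of the points before the step (to be labelled 4),
    or an empty list when the series has no single isolated abnormal increment."""
    inc = [b - a for a, b in zip(series, series[1:])]
    mu, sigma = fmean(inc), pstdev(inc)
    abnormal = [j for j, d in enumerate(inc) if abs(d - mu) > k * sigma]
    if len(abnormal) != 1:
        return []
    j = abnormal[0]
    # Increment j joins points j and j+1, so points 0..j sit on the old level.
    return list(range(j + 1))


# 15 points on one level followed by 25 points on a higher level.
demo = [200.0] * 15 + [320.0] * 25
before_step = step_points(demo)
```

In this example the jump lies between index 14 and index 15, so the 15 points on the old level are the ones returned for labelling.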
According to a preferred embodiment, the method for checking the degree of abnormality of the industry data comprises:
calculating the degree of abnormality of the industry with the following formula:
$$\lambda(i) = n_1(i)/n(i)$$
wherein $\lambda(i)$ is the degree of abnormality of industry i, $n_1(i)$ is the number of abnormal electricity consumption points of industry i, and $n(i)$ is the total number of electricity consumption data points of industry i;
if $\lambda(i) < \lambda_{\mathrm{lim}}$, the degree of abnormality of industry i is within the normal range and the abnormal points can be corrected;
wherein $\lambda_{\mathrm{lim}}$ is the threshold: an industry whose degree of abnormality reaches it is judged to be a data-abnormal industry.
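The check amounts to a ratio and a comparison, as the sketch below shows. The function names are hypothetical, and the default threshold of 0.10 is the example value the detailed description mentions, used here only for illustration.

```python
from typing import List


def abnormality_degree(flags: List[int]) -> float:
    """lambda(i): share of data points of one industry with a non-zero label."""
    if not flags:
        raise ValueError("an industry needs at least one data point")
    return sum(1 for f in flags if f != 0) / len(flags)


def can_correct(flags: List[int], lam_lim: float = 0.10) -> bool:
    """True when lambda(i) < lambda_lim, i.e. the industry may be corrected."""
    return abnormality_degree(flags) < lam_lim


# 24 monthly points: two flagged points pass, three flagged points do not.
two_flagged = [0] * 22 + [1, 3]
three_flagged = [0] * 21 + [1, 3, 4]
```

With 24 points, two flagged points give λ ≈ 0.083, below the 10% threshold, while three flagged points give λ = 0.125 and the industry would be set aside.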
According to a preferred embodiment, the method for correcting the data comprises: correcting the abnormal data by linear interpolation.
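A sketch of the linear-interpolation correction for one industry follows. The source does not say how flagged points at the start or end of a series are handled; copying the nearest valid value there is an assumption of this sketch, as is the function name `interpolate_flagged`.

```python
from typing import List, Optional


def interpolate_flagged(values: List[Optional[float]], flags: List[int]) -> List[Optional[float]]:
    """Replace every flagged point by linear interpolation between the nearest
    unflagged neighbours on each side (nearest valid value at the edges)."""
    good = [j for j, f in enumerate(flags) if f == 0]
    if not good:
        raise ValueError("no valid points to interpolate from")
    out = list(values)
    for j, f in enumerate(flags):
        if f == 0:
            continue
        left = max((g for g in good if g < j), default=None)
        right = min((g for g in good if g > j), default=None)
        if left is None:          # flagged run at the start of the series
            out[j] = values[right]
        elif right is None:       # flagged run at the end of the series
            out[j] = values[left]
        else:
            w = (j - left) / (right - left)
            out[j] = values[left] + w * (values[right] - values[left])
    return out


values = [100.0, None, 0.0, -5.0, 140.0, 150.0]
flags = [0, 1, 1, 1, 0, 0]
repaired = interpolate_flagged(values, flags)
```

The three flagged points between 100 and 140 are replaced by evenly spaced values, so the repaired series rises in steps of 10.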
Another aspect of the invention provides a medium and long term load data outlier processing system.
The abnormal value processing system of the medium and long term load data is characterized by comprising:
the data acquisition module is used for acquiring data;
the abnormal data identification module is used for identifying abnormal data;
the industry data abnormality degree checking module is used for checking the abnormality degree of the industry data of the abnormal data;
and the correction module is used for correcting the abnormal data.
Still another aspect of the present invention provides a medium-and-long-term load data abnormal value processing apparatus.
The abnormal value processing device for medium and long-term load data comprises a processor and a memory, wherein the memory stores instructions which, when executed by the processor, cause the device to implement the abnormal value processing method for medium and long-term load data.
Yet another aspect of the invention provides a computer-readable storage medium.
The storage medium stores computer instructions, and after the computer reads the computer instructions in the storage medium, the computer runs the abnormal value processing method of the medium and long term load data.
Compared with the prior art, the abnormal value processing method for medium and long-term load data has the following beneficial effect:
after the abnormal data is repaired with the method, the relative error of the electricity consumption forecast can be reduced.
Additional features of the invention will be set forth in part in the description which follows. Additional features of some aspects of the invention will become apparent to those of ordinary skill in the art upon examination of the following description and accompanying drawings or may be learned by the manufacture or operation of the embodiments. The features of the present disclosure may be realized and attained by practice or use of various methods, instrumentalities and combinations of the specific embodiments described below.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention without limiting the invention. Like reference symbols in the various drawings indicate like elements. In the drawings:
FIG. 1 is a schematic flow diagram of a method of outlier processing of medium and long term load data according to some embodiments of the present invention;
FIG. 2 is a schematic illustration of the results of outlier processing of medium and long term load data shown in accordance with some embodiments of the present invention;
FIG. 3 is a schematic illustration of a method of processing outliers of medium and long term load data without industry outlier checking according to some embodiments of the present invention;
FIG. 4 is a system block diagram of an outlier processing system for medium and long term load data, shown in accordance with some embodiments of the present invention;
FIG. 5 is a schematic diagram of an abnormal value processing apparatus for medium and long term load data according to some embodiments of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
One aspect of the embodiment of the invention discloses an abnormal value processing method for medium and long-term load data.
As shown in fig. 1, the abnormal value processing method for medium and long term load data mainly includes:
S11: identifying abnormal data, wherein the abnormal data comprises null, zero and negative data points, the identical-value phenomenon, outlier data points and the step phenomenon;
S12: performing industry data abnormality degree checking on the abnormal data;
S13: if the abnormal degree is within the normal range, correcting the abnormal data.
Abnormal data of the different patterns (null, zero and negative data points, the identical-value phenomenon, outlier data points and the step phenomenon) is identified in different ways. Null, zero and negative data points and the identical-value phenomenon can be judged directly from the numerical characteristics of the data, whereas outlier data points and the step phenomenon can be identified only once the statistical characteristics of normal data are known. In terms of the data range that must be examined, the identical-value phenomenon has to be judged across the electricity consumption of all industries together, while the other abnormal values only require the electricity consumption data of the industry concerned.
In this embodiment, the method for identifying abnormal data may include:
generating an electric quantity matrix E based on the acquired medium and long-term load data; the first dimension index of the electric quantity matrix E is an industry serial number, and the second dimension index is a time serial number;
generating an abnormal data identification matrix I corresponding to the electricity consumption matrix E, wherein I has the same size as E;
for $i \in A$ and $j \in B$:
when $E(i, j)$ is correct data, $I(i, j)$ is zero; when $E(i, j)$ is abnormal data, $I(i, j)$ is set to a non-zero value;
wherein A is the first-dimension index set of E, and B is the second-dimension index set of E.
Specifically, during identification, the abnormal data identification matrix I is generated to correspond to the electricity consumption matrix E and has the same size as E; its first-dimension index is the industry serial number and its second-dimension index is the time serial number. When $E(i, j)$ is correct data, $I(i, j)$ is zero; when $E(i, j)$ is abnormal data, $I(i, j)$ is set to a non-zero value that depends on the type of abnormality.
For example, for null, zero and negative data points: for $i \in A$ and $j \in B$, if $E(i, j)$ is null, zero or negative, then $I(i, j)$ is marked as 1, indicating a null, zero or negative data point.
For the identical-value phenomenon: for two different time points $j_1, j_2 \in B$, if $E(i, j_1) = E(i, j_2)$ holds for all $i \in A$, then $I(i, j_2)$ is marked as 2 for all $i \in A$, indicating a duplicate data point.
For outlier data points, abnormal increments need to be identified first.
For each industry $i \in A$, let $\Delta E_i(j) = E(i, j+1) - E(i, j)$ denote the increment between two adjacent points in the electricity consumption sequence. Because the abnormal values in the raw data have not yet been corrected at the identification stage, the statistics of the electricity consumption sequence cannot be known. To identify abnormal increments without relying on any subjective or prior information, a statistical rule that holds for sequences of unknown probability distribution is needed, and the Chebyshev inequality provides a powerful tool for this task. According to the Chebyshev inequality, a random variable with any distribution (and finite variance) satisfies:
$$P\{|X - E(X)| \ge k\,D(X)\} \le \frac{1}{k^{2}}$$
where X is a random variable, E(X) is the mathematical expectation of X, and D(X) is the dispersion of X, taken as the standard deviation (the square root of the variance) so that the bound holds. k is a positive constant that sets how far X may depart from its expectation. When $k = 5$, the probability that X falls within the interval $[E(X) - 5D(X),\ E(X) + 5D(X)]$ is at least 96%. By the same criterion, the probability that the random variable $\Delta E_i(j)$ falls within the interval $[E(\Delta E_i(j)) - 5D(\Delta E_i(j)),\ E(\Delta E_i(j)) + 5D(\Delta E_i(j))]$ is at least 96%; because actual electricity consumption increments are approximately normally distributed, the actual probability that a load increment falls within this interval is higher than 96%. Accordingly, an increment that falls outside this interval is regarded as an abnormal increment.
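As a worked check of the 96% figure, the substitution $k = 5$ can be written out. This is a sketch that assumes $D(X)$ is the standard deviation, which the Chebyshev bound requires; the symbol $\sigma$ is introduced here only as shorthand for it.

```latex
\begin{align*}
P\{|X - E(X)| \ge k\sigma\} &\le \frac{1}{k^{2}}, \qquad \sigma = \sqrt{\operatorname{Var}(X)} \\
P\{|X - E(X)| \ge 5\sigma\} &\le \frac{1}{25} = 0.04 \\
P\{|X - E(X)| < 5\sigma\} &\ge 1 - 0.04 = 0.96
\end{align*}
```

The bound is distribution-free, so 96% is a floor rather than an estimate. For comparison, an exactly normal variable stays within $5\sigma$ with probability above 99.9999%, which is consistent with the remark that real increments fall inside the interval more often than the bound suggests.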
If $\Delta E_i(j)$ and $\Delta E_i(j-1)$ are both abnormal increments and fall on opposite sides of the interval, then $E(i, j)$ is regarded as an outlier point and $I(i, j)$ is set to 3, indicating an outlier data point.
The step phenomenon is handled in the same way as outlier data points: abnormal increments are identified first. If, for industry i, only $\Delta E_i(j)$ is an abnormal increment, industry i is considered to exhibit a step phenomenon, and $I(i, j_k)$ is set to 4 for $j_k = 1, 2, \ldots, j$, indicating that these j points of industry i are step abnormal points.
In the present embodiment, when a certain data point is already identified as an abnormal data point of a certain pattern, the data point does not participate in the identification of other abnormal data patterns.
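One simple reading of this precedence rule is that a label, once assigned, is never overwritten by a later check. The sketch below merges per-pattern label matrices in priority order on that basis. It is a simplification, since it does not also exclude labelled points from the statistics of later checks, and the function name `merge_labels` is hypothetical.

```python
from typing import List

Labels = List[List[int]]


def merge_labels(*label_matrices: Labels) -> Labels:
    """Merge label matrices in priority order; for each cell the first
    non-zero label wins and later labels are ignored."""
    rows = len(label_matrices[0])
    cols = len(label_matrices[0][0]) if rows else 0
    merged = [[0] * cols for _ in range(rows)]
    for labels in label_matrices:
        for i in range(rows):
            for j in range(cols):
                if merged[i][j] == 0:
                    merged[i][j] = labels[i][j]
    return merged


invalid = [[1, 0, 0]]      # label 1: null, zero or negative
duplicate = [[2, 2, 0]]    # label 2: identical-value phenomenon
outlier = [[0, 3, 3]]      # label 3: outlier data point
I = merge_labels(invalid, duplicate, outlier)
```

The first cell keeps label 1 even though the duplicate check also flagged it, and the second keeps label 2 over label 3.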
In this embodiment, the method for checking the degree of abnormality of the industry data specifically comprises: let $\lambda(i) = n_1(i)/n(i)$,
where $\lambda(i)$ is the degree of abnormality of industry i, $n_1(i)$ is the number of abnormal electricity consumption points of industry i, and $n(i)$ is the total number of electricity consumption data points of industry i. If $\lambda(i) < \lambda_{\mathrm{lim}}$, the degree of abnormality of industry i is within the normal range and its abnormal points can be corrected; otherwise, industry i is regarded as a data-abnormal industry and its influence is not considered in early warning and forecasting analysis. $\lambda_{\mathrm{lim}}$ can be set as required; for example, as shown in FIG. 3, $\lambda_{\mathrm{lim}}$ may be set to 10%.
In this embodiment, the method for correcting data specifically includes:
when the electricity consumption data of an industry passes the abnormality degree check, the abnormal data of all patterns is uniformly corrected by linear interpolation; that is, for $i \in A$ and $j \in B$, if $I(i, j) \neq 0$, the data point $E(i, j)$ is replaced by linear interpolation.
Table 1. Comparison of the relative error of the electricity consumption forecast before and after abnormal data repair
[Table 1 appears as an image in the original publication; its values are not reproduced here.]
As shown in Table 1 and FIG. 2, after the abnormal data is repaired with the abnormal value processing method for medium and long term load data of this embodiment, the relative error of the electricity consumption forecast is reduced.
Another aspect of the embodiment of the invention discloses an abnormal value processing system for medium and long-term load data.
As shown in fig. 4, the abnormal value processing system 20 for medium and long term load data mainly includes a data acquisition module 21, an abnormal data identification module 22, an industry data abnormal degree check module 23, and a correction module 24.
The data acquiring module 21 may be used to acquire data. In this embodiment, the data acquisition module 21 may directly collect raw data of electric quantity from the power system.
The anomalous data identification module 22 may be configured to identify anomalous data; in this embodiment, the method for the abnormal data identification module 22 to identify the abnormal data is specifically as follows:
generating an electric quantity matrix E based on the acquired medium and long-term load data; the first dimension index of the electric quantity matrix E is an industry serial number, and the second dimension index is a time serial number;
generating an abnormal data identification matrix I corresponding to the electricity consumption matrix E, wherein I has the same size as E;
for $i \in A$ and $j \in B$:
when $E(i, j)$ is correct data, $I(i, j)$ is zero; when $E(i, j)$ is abnormal data, $I(i, j)$ is set to a non-zero value;
wherein A is the first-dimension index set of E, and B is the second-dimension index set of E.
Specifically, during identification, the abnormal data identification matrix I is generated to correspond to the electricity consumption matrix E and has the same size as E; its first-dimension index is the industry serial number and its second-dimension index is the time serial number. When $E(i, j)$ is correct data, $I(i, j)$ is zero; when $E(i, j)$ is abnormal data, $I(i, j)$ is set to a non-zero value that depends on the type of abnormality.
For example, for null, zero and negative data points: for $i \in A$ and $j \in B$, if $E(i, j)$ is null, zero or negative, then $I(i, j)$ is marked as 1, indicating a null, zero or negative data point.
For the identical-value phenomenon: for two different time points $j_1, j_2 \in B$, if $E(i, j_1) = E(i, j_2)$ holds for all $i \in A$, then $I(i, j_2)$ is marked as 2 for all $i \in A$, indicating a duplicate data point.
For outlier data points, abnormal increments need to be identified first.
For each industry $i \in A$, let $\Delta E_i(j) = E(i, j+1) - E(i, j)$ denote the increment between two adjacent points in the electricity consumption sequence. Because the abnormal values in the raw data have not yet been corrected at the identification stage, the statistics of the electricity consumption sequence cannot be known. To identify abnormal increments without relying on any subjective or prior information, a statistical rule that holds for sequences of unknown probability distribution is needed, and the Chebyshev inequality provides a powerful tool for this task. According to the Chebyshev inequality, a random variable with any distribution (and finite variance) satisfies:
$$P\{|X - E(X)| \ge k\,D(X)\} \le \frac{1}{k^{2}}$$
where X is a random variable, E(X) is the mathematical expectation of X, and D(X) is the dispersion of X, taken as the standard deviation (the square root of the variance) so that the bound holds. k is a positive constant that sets how far X may depart from its expectation. When $k = 5$, the probability that X falls within the interval $[E(X) - 5D(X),\ E(X) + 5D(X)]$ is at least 96%. By the same criterion, the probability that the random variable $\Delta E_i(j)$ falls within the interval $[E(\Delta E_i(j)) - 5D(\Delta E_i(j)),\ E(\Delta E_i(j)) + 5D(\Delta E_i(j))]$ is at least 96%; because actual electricity consumption increments are approximately normally distributed, the actual probability that a load increment falls within this interval is higher than 96%. Accordingly, an increment that falls outside this interval is regarded as an abnormal increment.
If $\Delta E_i(j)$ and $\Delta E_i(j-1)$ are both abnormal increments and fall on opposite sides of the interval, then $E(i, j)$ is regarded as an outlier point and $I(i, j)$ is set to 3, indicating an outlier data point.
The step phenomenon is handled in the same way as outlier data points: abnormal increments are identified first. If, for industry i, only $\Delta E_i(j)$ is an abnormal increment, industry i is considered to exhibit a step phenomenon, and $I(i, j_k)$ is set to 4 for $j_k = 1, 2, \ldots, j$, indicating that these j points of industry i are step abnormal points.
In the present embodiment, when a certain data point is already identified as an abnormal data point of a certain pattern, the data point does not participate in the identification of other abnormal data patterns.
The industry data abnormality degree checking module 23 may be configured to perform industry data abnormality degree checking on the abnormal data.
In this embodiment, the industry data abnormality degree checking module 23 checks the abnormal data as follows. Let
λ(i)=n1(i)/n(i)
where λ(i) is the abnormality degree of industry i, n1(i) is the number of abnormal electricity consumption points of industry i, and n(i) is the total number of electricity consumption data points of industry i. If λ(i) < λ_lim, the abnormality degree of industry i is within the normal range and its abnormal points can be corrected; otherwise, industry i is regarded as a data-abnormal industry, and its influence is not considered in early-warning and prediction analysis. The threshold λ_lim can be set as required. For example, as shown in FIG. 3, λ_lim may be set to 10%.
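A minimal sketch of this check, assuming one row of the marking matrix I is available as a Python list in which 0 means normal and any non-zero value means abnormal. The function names and the default threshold argument are illustrative only.

```python
def abnormality_degree(labels):
    """lambda(i) = n1(i) / n(i) for one industry's row of the marking matrix."""
    n = len(labels)
    if n == 0:
        raise ValueError("industry has no data points")
    n1 = sum(1 for v in labels if v != 0)  # number of abnormal points
    return n1 / n


def passes_check(labels, lam_lim=0.10):
    """True when the industry's abnormal points may be corrected."""
    return abnormality_degree(labels) < lam_lim


row = [0] * 11 + [3]  # hypothetical row: 12 points, one outlier
print(abnormality_degree(row), passes_check(row))
```

An industry for which `passes_check` returns False would be left out of the subsequent early-warning and prediction analysis rather than corrected.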
The correction module 24 may be used to correct the abnormal data. In this embodiment, the method for correcting the data by the correction module 24 specifically includes:
When the electricity consumption data of an industry passes the abnormality degree check, the abnormal data of all patterns are uniformly corrected by linear interpolation, i.e.
Figure BDA0002406616380000131
If I(i, j) ≠ 0, linear interpolation is performed on the data point E(i, j).
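The correction step might look like the following sketch. It interpolates each marked point between the nearest unmarked points on either side, which is one common form of linear interpolation and is assumed here rather than taken from the formula in the figure. Copying the nearest normal value when a marked point sits at the start or end of the series is likewise an assumption, and the function name is hypothetical.

```python
def correct_by_interpolation(series, labels):
    """Replace every point whose label is non-zero by linear interpolation."""
    good = [j for j, v in enumerate(labels) if v == 0]
    if not good:
        raise ValueError("no normal points to interpolate from")
    fixed = list(series)
    for j, v in enumerate(labels):
        if v == 0:
            continue
        left = max((g for g in good if g < j), default=None)
        right = min((g for g in good if g > j), default=None)
        if left is None:
            fixed[j] = series[right]  # assumption: copy the next normal value
        elif right is None:
            fixed[j] = series[left]   # assumption: copy the previous normal value
        else:
            t = (j - left) / (right - left)
            fixed[j] = series[left] + t * (series[right] - series[left])
    return fixed


# Hypothetical row: a null value and two zero values between normal points.
print(correct_by_interpolation([10.0, None, 30.0, 0.0, 0.0, 60.0], [0, 1, 0, 1, 1, 0]))
```

Only values at unmarked positions are read, so null entries in the input do not need to be replaced beforehand.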
In another aspect of the embodiment of the invention, an abnormal value processing device of medium and long term load data is disclosed.
As shown in fig. 5, the abnormal value processing apparatus 30 for medium and long term load data mainly includes a processor 33 and a memory 31 for storing an instruction 32. The instructions, when executed by the processor, cause the apparatus to implement the abnormal value processing method for medium and long term load data as described in any one of the above.
The embodiment of the invention also discloses a computer readable storage medium.
The storage medium stores computer instructions, and after the computer reads the computer instructions in the storage medium, the computer runs the abnormal value processing method of the medium and long term load data.
It should be noted that all of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except for mutually exclusive features and/or steps.
In addition, the above-described embodiments are exemplary, and those skilled in the art, having benefit of this disclosure, will appreciate numerous solutions that are within the scope of the disclosure and that fall within the scope of the invention. It should be understood by those skilled in the art that the present specification and figures are illustrative only and are not limiting upon the claims. The scope of the invention is defined by the claims and their equivalents.

Claims (7)

1. An abnormal value processing method for medium and long term load data is characterized by comprising the following steps:
identifying abnormal data; wherein the abnormal data comprises null values, zero values, negative data points, similarities, outlier data points and step phenomena;
performing industry data abnormality degree checking on the abnormal data;
and if the abnormal degree is within the normal range, correcting the abnormal data.
2. The abnormal value processing method of the medium and long term load data according to claim 1, wherein the method for identifying the abnormal data comprises:
generating an electric quantity matrix E; the first dimension index of the electric quantity matrix E is an industry serial number, and the second dimension index is a time serial number;
generating an abnormal data identification matrix I corresponding to the electric quantity matrix E; the size of the abnormal data identification matrix I is the same as that of the electric quantity matrix E;
for all i ∈ A and j ∈ B, I(i, j) is zero when E(i, j) is correct data, and I(i, j) is marked as a non-zero value when E(i, j) is abnormal data;
wherein A is a first dimension index set of E, and B is a second dimension index set of E.
3. The abnormal value processing method of medium and long term load data according to claim 2,
the method for identifying null, zero and negative data points is as follows:
for all i ∈ A and j ∈ B, if E(i, j) is a null, zero or negative value, I(i, j) is marked as 1, indicating a null, zero or negative data point;
the method for identifying the similarities comprises the following steps:
Figure FDA0002406616370000021
if
Figure FDA0002406616370000022
E(i, j1) = E(i, j2) holds throughout, then I(i, j2) is marked as 2, where i ∈ A, indicating a duplicate data point.
4. The abnormal value processing method of medium and long term load data according to claim 2,
the method of identifying outlier data points comprises:
identifying an abnormal increment, specifically:
for all i ∈ A and j ∈ B, let ΔE_i(j) = E(i, j+1) - E(i, j) denote the increment between two adjacent points in the electric quantity sequence; and identifying the abnormal increment according to the following formula:
Figure FDA0002406616370000024
wherein X represents a random variable, E(X) represents the mathematical expectation of the random variable X, D(X) is the variance of the random variable, and k is a constant representing the range by which the random variable deviates from its expectation;
when k is 5, the probability that X falls within the interval [E(X) - 5D(X), E(X) + 5D(X)] is about 96%; under the same criterion, the probability that the random variable ΔE_i(j) falls within the interval [E(ΔE_i(j)) - 5D(ΔE_i(j)), E(ΔE_i(j)) + 5D(ΔE_i(j))] is 96%; if an increment falls outside this interval, the increment is an abnormal increment;
if ΔE_i(j) and ΔE_i(j-1) are both abnormal increments and fall on the two sides of the interval respectively, then E(i, j) is an outlier point and I(i, j) is 3.
5. The abnormal value processing method of medium and long term load data according to claim 2,
the method for identifying the step phenomenon comprises the following steps:
identifying an abnormal increment, specifically:
for all i ∈ A and j ∈ B, let ΔE_i(j) = E(i, j+1) - E(i, j) denote the increment between two adjacent points in the electric quantity sequence; and identifying the abnormal increment according to the following formula:
Figure FDA0002406616370000032
wherein X represents a random variable, E(X) represents the mathematical expectation of the random variable X, D(X) is the variance of the random variable, and k is a constant representing the range by which the random variable deviates from its expectation;
when k is 5, the probability that X falls within the interval [E(X) - 5D(X), E(X) + 5D(X)] is about 96%; under the same criterion, the probability that the random variable ΔE_i(j) falls within the interval [E(ΔE_i(j)) - 5D(ΔE_i(j)), E(ΔE_i(j)) + 5D(ΔE_i(j))] is 96%; if an increment falls outside this interval, the increment is an abnormal increment;
for industry i, if only ΔE_i(j) is an abnormal increment, industry i has a step phenomenon, and I(i, j_k) is marked as 4, with j_k defined in terms of j, indicating that the points j_k of industry i are step abnormal points.
6. The abnormal value processing method of medium and long term load data according to claim 1,
the method for checking the abnormal degree of the industry data of the abnormal data comprises the following steps:
calculating the degree of abnormality of the industry by adopting the following formula
λ(i)=n1(i)/n(i)
wherein λ(i) is the industry abnormality degree, n1(i) is the number of abnormal electricity consumption points of industry i, and n(i) is the total number of electricity consumption data points of industry i;
if λ(i) < λ_lim, the abnormality degree of industry i is within the normal range, and the abnormal points can be corrected;
wherein λ_lim is the minimum value of λ(i) at which industry i is judged to belong to the data-abnormal industries.
7. The abnormal value processing method of medium and long term load data according to claim 1,
the method for correcting the data comprises the following steps:
correcting the abnormal data by linear interpolation.
Priority Applications (1)

Application number: CN202010163490.2A
Publication: CN111461409A (en)
Status: Pending
Priority date: 2020-03-10
Filing date: 2020-03-10
Title: Abnormal value processing method for medium and long-term load data

Publications (1)

Publication number: CN111461409A (en)
Publication date: 2020-07-28

Family

ID: 71682755
Country status: CN (1), CN111461409A (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1832949A2 (en) * 2002-08-22 2007-09-12 Air Products and Chemicals, Inc. Method and apparatus for producing perturbation signals
CN101888087A (en) * 2010-05-21 2010-11-17 深圳市科陆电子科技股份有限公司 Method for realizing distributed super-short-term area load forecasting in distribution network terminal
US20120150489A1 (en) * 2010-12-13 2012-06-14 International Business Machines Corporation Multi-step time series prediction in complex instrumented domains
CN104766175A (en) * 2015-04-16 2015-07-08 东南大学 Power system abnormal data identifying and correcting method based on time series analysis
CN105023198A (en) * 2015-07-16 2015-11-04 国电南瑞科技股份有限公司 Network rule constraint-based power plant data anomaly identification method
CN105488736A (en) * 2015-12-02 2016-04-13 国家电网公司 Data processing method for photovoltaic power station data acquisition system
CN105956755A (en) * 2016-04-26 2016-09-21 广西电网有限责任公司百色供电局 Method and system for establishing quantitative relationship of general power line loss rate influencing factors
CN106780121A (en) * 2016-12-06 2017-05-31 广州供电局有限公司 A kind of multiplexing electric abnormality recognition methods based on power load pattern analysis
CN106844594A (en) * 2017-01-12 2017-06-13 南京大学 A kind of electric power Optimal Configuration Method based on big data
KR101791467B1 (en) * 2017-02-27 2017-10-30 편도균 Method for remote checking electric power facility using drone
EP3336656A1 (en) * 2016-12-19 2018-06-20 OFFIS e.V. Model based detection of user reaction times and further effects as well as systems therefore
CN108830510A (en) * 2018-07-16 2018-11-16 国网上海市电力公司 A kind of electric power data preprocess method based on mathematical statistics
CN109886836A (en) * 2019-03-01 2019-06-14 西安交通大学 A kind of dynamic partition Prices Calculation based on partition clustering analysis
CN110610280A (en) * 2018-10-31 2019-12-24 山东大学 Short-term prediction method, model, device and system for power load

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIAN YANG; SUMEI LI; WEI XIANG: "A novel processing method of the subjective quality evaluation results of stereo video" *
XINYU CHEN; CHONGQING KANG; XING TONG; QING XIA; JUNFENG YANG: "Improving the Accuracy of Bus Load Forecasting by a Two-Stage Bad Data Identification Method" *
Zhu Yuemei: "Identification and correction of abnormal load data in distribution networks" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114254702A (en) * 2021-12-16 2022-03-29 南方电网数字电网研究院有限公司 Method, device, equipment, medium and product for identifying abnormal data of bus load
CN117349778A (en) * 2023-12-04 2024-01-05 湖南蓝绿光电科技有限公司 Online real-time monitoring system of consumer based on thing networking
CN117349778B (en) * 2023-12-04 2024-02-20 湖南蓝绿光电科技有限公司 Online real-time monitoring system of consumer based on thing networking

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Hu Yingying; Wang Yao; Li Jia; Deng Jiaojiao; Liu Hongli
Inventor before: Hu Yingying; Wang Yao; Li Jia; Deng Jiaojiao; Liu Hongli; Tong Xing
TA01 Transfer of patent application right
Effective date of registration: 20220127
Address after: 030000 15 / F, building 1, No.89, Fudong street, Xinghualing District, Taiyuan City, Shanxi Province
Applicant after: ECONOMIC RESEARCH INSTITUTE OF STATE GRID SHANXI ELECTRIC POWER Co.
Address before: 030000 15 / F, building 1, No.89, Fudong street, Xinghualing District, Taiyuan City, Shanxi Province
Applicant before: ECONOMIC RESEARCH INSTITUTE OF STATE GRID SHANXI ELECTRIC POWER Co.; Shenzhen orange Technology Co.,Ltd.
WD01 Invention patent application deemed withdrawn after publication
Application publication date: 20200728