CN109190184B - Heat supply system historical data preprocessing method - Google Patents

Publication number
CN109190184B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810903746.1A
Other languages
Chinese (zh)
Other versions
CN109190184A (en)
Inventor
田喆
李万程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201810903746.1A
Publication of CN109190184A
Application granted
Publication of CN109190184B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/08 Thermal analysis or thermal optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Investigating Or Analyzing Materials Using Thermal Means (AREA)

Abstract

The invention discloses a method for preprocessing historical data of a heat supply system. The method comprises: importing the collected heat supply historical data and the corresponding outdoor meteorological data into a computer; reconstructing the heat supply historical data to remove the trend items and extract the random fluctuation items, generating a group of sequences to be processed; identifying abnormal values in the sequences to be processed and locating the abnormal points; removing the corresponding values from the original heat supply historical data sequence according to the located abnormal values, forming a screened sequence with vacancy values to be filled; performing visual pattern identification on the characteristics of the screened sequence and filling the vacancy values to form a processed sequence; and performing visual pattern identification again, repeating the above steps until the task of preprocessing the heat supply historical data is completed. The method can improve the accuracy of subsequent data mining.

Description

Heat supply system historical data preprocessing method
Technical Field
The invention relates to a data processing method, in particular to a method for preprocessing historical data of a heating system.
Background Art
With the rapid development of production technology, the cost of sensors has fallen sharply, so sensors are now widely deployed across many fields to collect monitoring data. These data contain rich information about the monitored system, and data mining and big data analysis can extract useful hidden information from them to help the system operate better.
In the field of heat supply, many sensors are likewise installed on heat supply pipe networks and heat supply equipment to monitor the operation of the heat supply system. On the one hand, the water temperature, flow and similar quantities of the pipe network are monitored so that the heat load demand is met; on the other hand, equipment parameters such as pipe network pressure, water pump operating frequency and current are monitored so that the system runs safely and with low energy consumption.
Through sensor monitoring, a heat supply system accumulates a large amount of heat supply historical data, and data mining and big data analysis of these data can yield much information that helps the system operate better. However, heat supply historical data share some problems inherent to big data. On the one hand, sensor quality is uneven, which affects the quality of the monitored data; on the other hand, even when the sensor itself is of good quality, installation problems may still degrade the data.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a method for preprocessing the historical data of a heating system, which can improve the quality of the historical data of heating.
A heating system historical data preprocessing method comprises the following steps:
step one, importing collected heat supply historical data and corresponding outdoor meteorological data into a computer, wherein all the outdoor meteorological data form an outdoor meteorological data set X;
the heat supply historical data is data to be preprocessed, the time scale is minute by minute or hour by hour, the heat supply historical data comprises various types of operation data which are generated in the operation process of a heat supply system and are related to time sequence, and the outdoor meteorological data is real-time meteorological data observed by a meteorological station;
step two, reconstructing the heat supply historical data to remove the trend items in the heat supply operation data, and extracting the random fluctuation items in the sequences to generate a group of sequences to be processed;
step three, identifying abnormal values in the sequence to be processed and locating the abnormal points;
step four, removing the corresponding values from the original heat supply historical data sequence according to the located abnormal values of the sequence to be processed, to form a screened sequence with vacancy values to be filled;
step five, performing visual pattern identification on the characteristics of the screened sequence, and filling the empty values to form a processed sequence, wherein the sequence is divided into four cases;
in the first situation, when the overall fluctuation of the sequence data is small after the whole screening, the data value fluctuates around a certain value and contains random errors, the data vacancy value is filled by adopting an averaging method;
the calculation formula of the averaging method is as follows:
$$y = \frac{1}{n}\sum_{i=1}^{n} x_i$$
wherein y is the vacancy value to be filled in the screened sequence, x_i is the i-th non-vacancy value in the sequence, and n is the number of non-vacancy values in the sequence;
in the second situation, the data of the sequence after the whole screening has larger global fluctuation but more stable local, the data value fluctuates around the local mean value in a certain period of time and contains random errors, and the data is filled by adopting a previous time value method;
the calculation formula of the previous time value method is as follows:
$$y_i = y_{i-1}$$
in the formula, y_i is the vacancy value to be filled at time i in the screened sequence, and y_{i-1} is the value at time i-1;
in the third situation, the data values of the whole screened sequence increase steadily, the value at the current time being the value at the previous time plus an increment, and the vacancy value is filled by adopting an accumulative value method;
the formula of the cumulative method is as follows:
$$y_i = y_m + \frac{y_n - y_m}{n - m}\,(i - m)$$
in the formula, y_i is the vacancy value to be filled at time i in the screened sequence, y_m is the value at time m, and y_n is the value at time n; time m is the time of the first non-vacancy value before the vacancy value to be filled, and time n is the time of the first non-vacancy value after the vacancy value to be filled;
in the fourth situation, the overall fluctuation range of the whole screened sequence is large and the sequence shows no obvious regularity, and the vacancy value is filled by adopting a similarity algorithm, with the following specific steps:
(1) Firstly, normalization processing is carried out on meteorological parameters corresponding to the vacancy values and the non-vacancy values by adopting a min-max standardization method, and the calculation formula is as follows:
$$y = \frac{x - \min\{X\}}{\max\{X\} - \min\{X\}}$$
in the formula, x is a value in the outdoor meteorological data set X, min{X} is the minimum value in the outdoor meteorological data set X, max{X} is the maximum value in the outdoor meteorological data set X, and y is the value normalized by the min-max method;
(2) Calculating the Euclidean distance between the meteorological parameters corresponding to the vacancy values and the meteorological parameters corresponding to the non-vacancy values by adopting the following formula:
$$d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
wherein d is the Euclidean distance, x_i is the i-th meteorological parameter corresponding to the vacancy value, y_i is the i-th meteorological parameter corresponding to the non-vacancy value, and n is the number of types of meteorological parameters participating in the calculation;
(3) Finding out m non-vacancy value meteorological parameters with the nearest Euclidean distance to the vacancy value meteorological parameters, and then taking the average number of the non-vacancy values corresponding to the m non-vacancy value meteorological parameters as filling items of the vacancy values;
step six, performing visual pattern identification again and judging whether abnormal points exist in the processed sequence; if no abnormal points exist, the task of preprocessing the heat supply historical data is finished; if abnormal points exist, steps one to five are repeated until the preprocessing task is completed.
The invention has the advantages and positive effects that:
1. Through the preprocessing of the heat supply historical data, the quality of the heat supply historical data can be improved, and a foundation is laid for subsequent heat supply information data mining and big data analysis, so that a more accurate and reliable conclusion can be obtained.
2. Through the analysis of the characteristics of the heat supply historical data, data processing methods of different properties are provided in a targeted manner, so that the preprocessing effect is better, and the sequence data after preprocessing is closer to the true value.
Drawings
FIG. 1 is a flow chart of the present invention for pre-processing historical data;
FIG. 2 is a schematic diagram of a data sequence suitable for mean padding in the present invention;
FIG. 3 is a schematic diagram of a data sequence suitable for a previous time-valued padding in the present invention;
FIG. 4 is a schematic diagram of a data sequence suitable for filling by the accumulation method according to the present invention;
FIG. 5 is a schematic diagram of a data sequence suitable for similarity algorithm padding in the present invention.
Detailed Description
The specific steps of the method for preprocessing the historical data of the heating system according to the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
The invention discloses a method for preprocessing historical data of a heating system, which has a specific flow as shown in figure 1 and comprises the following steps:
step one, importing collected heat supply historical data and corresponding outdoor meteorological data into a computer, wherein all the outdoor meteorological data form an outdoor meteorological data set X;
the heat supply historical data is data to be preprocessed, the time scale is minute by minute or hour by hour, and the heat supply historical data comprises various operation data which are generated in the operation process of a heat supply system and are related to time sequences, namely water temperature data, flow data and pressure data of a heat supply pipe network and the tail end of a user, and information data such as power consumption, heat consumption, frequency of a pump, current of the pump and the like of a heat source and heat exchange station equipment. When the heat supply historical data contains abnormal values, the heat supply historical data needs to be subjected to next data preprocessing.
The outdoor meteorological data are real-time meteorological data observed by a meteorological station, and comprise outdoor temperature, solar radiation intensity and outdoor wind speed, and the time scale of the outdoor meteorological data is consistent with the heat supply historical data. The outdoor weather data is data obtained from a weather station, can be regarded as accurate data and does not need to be processed.
Step two, reconstructing the heat supply historical data to remove the trend items in the heat supply operation data, and extracting the random fluctuation items in the sequences to generate a group of sequences to be processed;
the original heat supply historical data sequence changes continuously in time, and because a heat supply system has large thermal inertia and large lag, heat supply operation data such as water temperature and flow change slowly, so the operation data can be regarded as a time sequence composed of a trend item and a random fluctuation item.
By a first-order difference method, the heat supply operation data at the previous moment is subtracted from the heat supply operation data at each moment to obtain the change in the operation data at each moment, which removes the trend item and extracts the random fluctuation item of the sequence. The method is simple to implement and extracts the random fluctuation item quickly.
Besides the first order difference method, the random fluctuation term can be extracted by using a wavelet decomposition method, specifically, the wavelet decomposition method disclosed in the electrocardiosignal denoising method and system (publication No. CN 107341769A).
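The first-order difference of step two can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `first_order_difference` and the sample values are hypothetical, and the sequence is assumed to be evenly sampled with no gaps.

```python
def first_order_difference(series):
    """Return the change at each moment: series[t] - series[t-1].

    The result is one element shorter than the input, because the first
    moment has no previous value to subtract.
    """
    return [curr - prev for prev, curr in zip(series, series[1:])]


# Illustrative water supply temperatures (hypothetical values).
supply_temp = [62.8, 63.0, 62.0, 61.3, 60.7]
fluctuation = first_order_difference(supply_temp)
print([round(v, 1) for v in fluctuation])
```

Element k of the result is the change between moments k and k+1 of the input, which matters later when abnormal differences are mapped back to the original sequence.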
Step three, identifying abnormal values in the sequence to be processed and locating the abnormal points;
reconstruction of the data sequence yields a sequence containing only random fluctuation. Because the volume of heat supply historical data is large and the reconstructed sequence contains only random fluctuation factors and follows a normal distribution, abnormal values in the sequence to be processed can be identified with the 3σ principle of the Lauda criterion. The method is simple, feasible, and identifies abnormal values accurately.
The 3 σ principle formula of the Lauda criterion is as follows:
P{μ-3σ≤X≤μ+3σ}=99.7%
where μ is an average value of the sequence data to be processed and σ is a standard deviation of the sequence data to be processed.
The 3 σ rule indicates that if the sequence data contains only random errors and fits a normal distribution, 99.7% of the data will fall within the 3 σ interval, and only 0.3% of the data will fall outside the 3 σ interval, and the part of the data outside the 3 σ interval can be considered as gross errors, i.e., outliers, which should be identified and located.
The 3 sigma principle of the Lauda criterion can be used for identifying abnormal values of the reconstructed sequence and positioning abnormal points.
In addition to the Lauda criterion, abnormal values can also be identified by a quartile method; specifically, see the method for obtaining a quartile box plot disclosed in "A fan abnormal data processing method and apparatus based on a quartile box plot" (publication No. CN106897941A).
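The 3σ screening of step three can be sketched as follows. The function name `three_sigma_outliers`, the parameter `k` and the sample data are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say whether σ is the population or the sample value.

```python
import statistics


def three_sigma_outliers(values, k=3.0):
    """Return the indices of values lying outside [mu - k*sigma, mu + k*sigma]."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        return []  # a constant sequence has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]


# Hypothetical fluctuation sequence: small noise with one gross error at index 7.
fluct = [0.1, -0.1] * 10
fluct[7] = 5.0
print(three_sigma_outliers(fluct))
```

With very short sequences a single spike inflates σ so much that it can never exceed 3σ, which is consistent with the text's remark that the criterion relies on a large data volume.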
Step four, removing the corresponding values from the original heat supply historical data sequence according to the located abnormal values of the sequence to be processed, to form a screened sequence with vacancy values to be filled;
the method of step three locates the positions where abnormal values appear; the corresponding values of the original heat supply historical data are removed according to the locating result and temporarily replaced by vacancy values, which are filled in the next step.
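A minimal sketch of step four is shown below. The text does not spell out how an index in the differenced sequence maps back to the original sequence; this sketch assumes that difference index k (the change from moment k to moment k+1) marks the original value at moment k+1. The function name `mask_outliers` and the use of `None` as the vacancy marker are also assumptions.

```python
def mask_outliers(original, flagged_diff_indices):
    """Return a copy of `original` with flagged moments replaced by None.

    Assumption: difference index k is the change from moment k to k+1,
    so the value removed from the original sequence is the one at k+1.
    """
    screened = list(original)
    for k in flagged_diff_indices:
        screened[k + 1] = None
    return screened


# Hypothetical sequence with one gross error at index 2.
original = [60.0, 60.2, 95.0, 60.1]
screened = mask_outliers(original, [1])
print(screened)
```

Under this mapping an isolated spike can also flag the moment right after it, because the return to the normal level is itself a large difference; the repeated visual check in step six is where such cases would be reviewed.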
Step five, performing visual pattern identification on the characteristics of the screened sequence, and filling the vacancy values to form a processed sequence;
by analyzing the characteristics of heat supply historical data, the data distribution condition mainly has four types. The first type is characterized in that the overall fluctuation of the data of the whole screened sequence is small, the data value fluctuates around a certain value and contains random errors, so that the data vacancy value with the characteristics is filled by adopting an averaging method, and the specific form of the characteristics of the data can be seen in an attached figure 2.
The calculation formula of the averaging method is as follows:
$$y = \frac{1}{n}\sum_{i=1}^{n} x_i$$
wherein y is the vacancy value to be filled in the screened sequence, x_i is the i-th non-vacancy value in the sequence, and n is the number of non-vacancy values in the sequence.
The averaging method thus fills vacancy values in heat supply data whose sequence shows small global fluctuation.
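The averaging method can be sketched as below, with `None` standing for a vacancy value. The function name `fill_with_mean` and the sample values are hypothetical.

```python
def fill_with_mean(screened):
    """Replace every None with the mean of the non-vacancy values."""
    present = [v for v in screened if v is not None]
    if not present:
        raise ValueError("sequence contains no non-vacancy values")
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in screened]


# Hypothetical sequence fluctuating around 50 with two vacancies.
filled = fill_with_mean([50.0, None, 50.4, 49.6, None, 50.0])
print(filled)
```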
The second type of heating data is characterized in that the overall fluctuation of the sequence data after the whole screening is large, but the local part is stable, the data value fluctuates around the local mean value in a certain period of time, and random errors exist, so that the data vacancy value with the characteristics is filled by adopting a previous time value method, and the specific form of the characteristics of the data can be seen in an attached figure 3.
The calculation formula of the previous time value method is as follows:
$$y_i = y_{i-1}$$
in the formula, y_i is the vacancy value to be filled at time i in the screened sequence, and y_{i-1} is the value at time i-1.
The previous time value method thus fills vacancy values in heat supply data whose sequence shows large global fluctuation but is locally stable.
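A minimal sketch of the previous time value method follows. The function name `fill_with_previous` is hypothetical. Two behaviours are assumptions not stated in the text: consecutive vacancies all take the last known value, and a vacancy at the very start of the sequence is left unfilled because it has no previous value.

```python
def fill_with_previous(screened):
    """Fill each None with the most recent earlier value (y_i = y_{i-1})."""
    filled = []
    last = None
    for v in screened:
        if v is None:
            v = last  # stays None if no earlier value exists
        filled.append(v)
        last = v
    return filled


# Hypothetical locally stable sequence with vacancies.
filled_prev = fill_with_previous([70.1, None, None, 65.3, None])
print(filled_prev)
```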
The third kind of heating data is characterized in that the data value of the whole screened sequence is stably increased, the value at the current moment is obtained by adding the value at the previous moment and the increased value, therefore, the vacancy value with the characteristics is filled by adopting an accumulative value method, and the specific form of the characteristics of the data can be seen in an attached figure 4.
The formula of the cumulative method is as follows:
$$y_i = y_m + \frac{y_n - y_m}{n - m}\,(i - m)$$
in the formula, y_i is the vacancy value to be filled at time i in the screened sequence, y_m is the value at time m, and y_n is the value at time n. Time m is the time of the first non-vacancy value before the vacancy value to be filled, and time n is the time of the first non-vacancy value after the vacancy value to be filled.
The accumulative value method thus fills vacancy values in heat supply data whose screened sequence increases cumulatively.
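The sketch below reads the accumulative value method as spreading the increase evenly between the bracketing non-vacancy values y_m and y_n, that is, as linear interpolation over time index. That reading is an assumption based on the symbol definitions above, and the function name `fill_cumulative` is hypothetical. Vacancies before the first or after the last known value are left unfilled.

```python
def fill_cumulative(screened):
    """Fill None values between known neighbours m < i < n by an even increment."""
    filled = list(screened)
    known = [i for i, v in enumerate(screened) if v is not None]
    for m, n in zip(known, known[1:]):
        step = (screened[n] - screened[m]) / (n - m)  # increase per time step
        for i in range(m + 1, n):
            filled[i] = screened[m] + step * (i - m)
    return filled


# Hypothetical steadily increasing meter reading with two vacancies.
filled_cum = fill_cumulative([100.0, None, None, 130.0, 140.0])
print(filled_cum)
```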
The fourth type of heating data is characterized in that the overall fluctuation range of the sequence data after the whole screening is large, and the sequence data rule is not obvious, so that the vacancy values with the characteristics are filled by adopting a similarity algorithm, and the specific form of the characteristics of the data can be seen in an attached figure 5.
The principle of a similarity algorithm is to judge the difference between individuals by comparing how similar they are. Similarity can be measured in many ways; the invention uses the Euclidean distance, and determines the similarity of different attribute data by calculating it.
The operating parameters of a heat supply system are strongly correlated with outdoor meteorological factors, and the outdoor meteorological parameters are the key factors driving heat supply operation. Each vacancy value corresponds to the meteorological parameters at the same moment. The Euclidean distance between the meteorological parameters corresponding to the vacancy value and those corresponding to each non-vacancy value is calculated, the m non-vacancy value meteorological parameters closest to the vacancy value meteorological parameters are found, and the average of the m corresponding non-vacancy values is taken as the filling item of the vacancy value. The value of m can be adjusted by heat supply professionals and is generally taken as 10.
The method comprises the following specific steps:
(1) Firstly, normalization processing is carried out on meteorological parameters corresponding to the vacancy values and the non-vacancy values by adopting a min-max standardization method, and the calculation formula is as follows:
$$y = \frac{x - \min\{X\}}{\max\{X\} - \min\{X\}}$$
in the formula, x is a value in the outdoor meteorological data set X, min{X} is the minimum value in the outdoor meteorological data set X, max{X} is the maximum value in the outdoor meteorological data set X, and y is the value normalized by the min-max method.
(2) Calculating the Euclidean distance between the meteorological parameters corresponding to the vacancy values and the meteorological parameters corresponding to the non-vacancy values by adopting the following formula:
$$d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
wherein d is the Euclidean distance, x_i is the i-th meteorological parameter corresponding to the vacancy value, y_i is the i-th meteorological parameter corresponding to the non-vacancy value, and n is the number of types of meteorological parameters participating in the calculation;
(3) Finding out the m non-vacancy value meteorological parameters closest in Euclidean distance to the vacancy value meteorological parameters, and then taking the average of the non-vacancy values corresponding to these m meteorological parameters as the filling item of the vacancy value; m is generally 10.
The following is a specific calculation example:
a heat supply historical data set is shown in Table 1, in which the missing water supply temperature for ID 6 needs to be filled.
TABLE 1 Heat supply historical data set
ID   Outdoor temperature (°C)   Solar radiation (W/m²)   Outdoor wind speed (m/s)   Water supply temperature (°C)
1    8.8                        105                      0                          62.8
2    9.9                        239                      0                          63
3    11                         264                      0                          62
4    11                         279                      0.8                        61.3
5    11.3                       286                      0.3                        60.7
6    11.4                       241                      2.2                        Vacancy value
7    11.3                       230                      1.8                        63.1
8    11.4                       97                       1.5                        60.5
9    11                         50                       0.7                        60.9
10   6.6                        99                       2.4                        67.7
11   7.1                        224                      0.7                        67.3
12   7.3                        209                      0                          66.3
13   7.5                        274                      1.5                        65.3
14   8.4                        347                      2.1                        64.8
15   9.3                        440                      0.8                        63.9
16   9.6                        338                      0.5                        63.8
17   9.7                        263                      0.6                        63.6
18   9.6                        108                      0.3                        64.9
19   7.9                        280                      1.8                        66.5
20   8.2                        242                      0.8                        66.3
(1) Normalizing the meteorological data. The normalized meteorological parameters are shown in Table 2.
TABLE 2 Meteorological data after normalization
[Table 2 is an image in the original publication; the normalized values are repeated in Table 3.]
The data in Table 2 are calculated from Table 1 by
$$y = \frac{x - \min\{X\}}{\max\{X\} - \min\{X\}}$$
For example, the data with ID 1 are normalized as follows:
outdoor temperature:
$$\frac{8.8 - 6.6}{11.4 - 6.6} \approx 0.46$$
solar radiation:
$$\frac{105 - 50}{440 - 50} \approx 0.14$$
outdoor wind speed:
$$\frac{0 - 0}{2.4 - 0} = 0.00$$
(2) The Euclidean distances between the meteorological parameters corresponding to the vacancy values and the meteorological parameters corresponding to the non-vacancy values are calculated and sorted, as shown in Table 3.
TABLE 3 Euclidean distance ranking (meteorological parameters normalized, dimensionless)
ID   Outdoor temperature   Solar radiation   Outdoor wind speed   Water supply temperature (°C)   Euclidean distance
6    1.00                  0.49              0.92                 Vacancy value                   0.00
7    0.98                  0.46              0.75                 63.1                            0.17
8    1.00                  0.12              0.63                 60.5                            0.47
4    0.92                  0.59              0.33                 61.3                            0.60
14   0.38                  0.76              0.88                 64.8                            0.68
19   0.27                  0.59              0.75                 66.5                            0.75
17   0.65                  0.55              0.25                 63.6                            0.76
9    0.92                  0.00              0.29                 60.9                            0.80
5    0.98                  0.61              0.13                 60.7                            0.80
16   0.63                  0.74              0.21                 63.8                            0.84
13   0.19                  0.57              0.63                 65.3                            0.87
20   0.33                  0.49              0.33                 66.3                            0.89
15   0.56                  1.00              0.33                 63.9                            0.89
3    0.92                  0.55              0.00                 62                              0.92
18   0.63                  0.15              0.13                 64.9                            0.94
2    0.69                  0.48              0.00                 63                              0.97
10   0.00                  0.13              1.00                 67.7                            1.07
11   0.10                  0.45              0.29                 67.3                            1.09
1    0.46                  0.14              0.00                 62.8                            1.12
12   0.15                  0.41              0.00                 66.3                            1.26
The data in Table 3 are calculated from Table 2 by
$$d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
For example, the Euclidean distance between the meteorological parameters with ID 1 and the meteorological parameters of the vacancy value (ID 6) is:
$$d = \sqrt{(1.00 - 0.46)^2 + (0.49 - 0.14)^2 + (0.92 - 0.00)^2} \approx 1.12$$
(3) Finding out the 10 non-vacancy value meteorological parameters closest in Euclidean distance to the vacancy value meteorological parameters, and taking the average of the 10 corresponding non-vacancy values as the filling item; the water supply temperature vacancy value with ID 6 is filled as 63.05 °C.
The similarity algorithm thus fills vacancy values in heat supply data whose sequence has a large global fluctuation range and no obvious regularity.
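The worked example can be reproduced with a short Python script. It is a sketch, not the patented implementation: the names `TABLE_1`, `min_max` and `similarity_fill` are hypothetical, each meteorological parameter is normalized separately over all rows (as the ID 1 calculation in the example suggests), and ties in distance are broken arbitrarily.

```python
import math

# Rows of Table 1: (ID, outdoor temperature, solar radiation, wind speed,
# water supply temperature or None for the vacancy value).
TABLE_1 = [
    (1, 8.8, 105, 0.0, 62.8), (2, 9.9, 239, 0.0, 63.0),
    (3, 11.0, 264, 0.0, 62.0), (4, 11.0, 279, 0.8, 61.3),
    (5, 11.3, 286, 0.3, 60.7), (6, 11.4, 241, 2.2, None),
    (7, 11.3, 230, 1.8, 63.1), (8, 11.4, 97, 1.5, 60.5),
    (9, 11.0, 50, 0.7, 60.9), (10, 6.6, 99, 2.4, 67.7),
    (11, 7.1, 224, 0.7, 67.3), (12, 7.3, 209, 0.0, 66.3),
    (13, 7.5, 274, 1.5, 65.3), (14, 8.4, 347, 2.1, 64.8),
    (15, 9.3, 440, 0.8, 63.9), (16, 9.6, 338, 0.5, 63.8),
    (17, 9.7, 263, 0.6, 63.6), (18, 9.6, 108, 0.3, 64.9),
    (19, 7.9, 280, 1.8, 66.5), (20, 8.2, 242, 0.8, 66.3),
]


def min_max(column):
    """Min-max normalize one meteorological parameter to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in column]


def similarity_fill(rows, m=10):
    """Return {ID: filled value} for every row whose last field is None."""
    columns = [min_max([row[c] for row in rows]) for c in (1, 2, 3)]
    features = list(zip(*columns))  # one normalized weather vector per row
    filled = {}
    for t, target in enumerate(rows):
        if target[4] is not None:
            continue
        # Distance from the vacancy row to every non-vacancy row, nearest first.
        candidates = sorted(
            (math.dist(features[t], features[j]), rows[j][4])
            for j in range(len(rows))
            if rows[j][4] is not None
        )
        nearest = [value for _, value in candidates[:m]]
        filled[target[0]] = sum(nearest) / len(nearest)
    return filled


filled_table = similarity_fill(TABLE_1, m=10)
print(round(filled_table[6], 2))
```

The ten nearest rows found this way are IDs 7, 8, 4, 14, 19, 17, 9, 5, 16 and 13, the same ten that head Table 3, so the script arrives at the same fill value as the text.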
Step six, performing visual pattern identification again and judging whether abnormal points exist in the processed sequence; if not, the task of preprocessing the heat supply historical data is finished; if abnormal points exist, steps one to five are repeated until the preprocessing task is completed.
Because heat supply data have the inherent characteristics of big data, such as high dimensionality, long time span and large volume, data preprocessing is a relatively complex process. Abnormal values and vacancy values are difficult to resolve in a single preprocessing pass, so the result of each pass must be judged using the professional knowledge of heat supply personnel to determine whether it meets the requirements of subsequent data mining.
Therefore, after the abnormal point identification and removal and the vacancy value filling are completed, the preprocessing result is displayed through the visual graph, the heat supply professional judges the visual result, whether the abnormal point is completely removed, whether the vacancy value is completely filled and whether the requirement of subsequent heat supply data mining is met are determined, and if the heat supply professional determines that the expected processing effect is achieved, the preprocessing process is ended; and if the heat supply professional deems that the expected treatment effect is not achieved, repeating the flow from the first step to the fifth step until the expected treatment effect is achieved, and completing the task of pretreatment.
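The repeat-until-accepted flow of step six could be organised as a small driver loop. Everything named here is hypothetical: `run_steps_one_to_five` stands for one full preprocessing pass, `is_accepted` stands in for the heat supply professional's visual judgement, and the `max_rounds` cap is an added safeguard that the text does not mention.

```python
def preprocess_until_accepted(series, run_steps_one_to_five, is_accepted, max_rounds=10):
    """Repeat the preprocessing pass until the reviewer accepts the result."""
    for round_no in range(1, max_rounds + 1):
        series = run_steps_one_to_five(series)
        if is_accepted(series):
            return series, round_no
    raise RuntimeError("result not accepted within max_rounds passes")


# Toy stand-ins only: the "pass" clips values to 100 and the "reviewer"
# accepts when nothing exceeds 100. Neither is part of the described method.
result, rounds = preprocess_until_accepted(
    [60.0, 250.0, 61.0],
    run_steps_one_to_five=lambda s: [min(v, 100.0) for v in s],
    is_accepted=lambda s: max(s) <= 100.0,
)
print(result, rounds)
```

In practice the acceptance check would be an interactive review of a plot rather than a function, so this loop mainly shows where that decision sits in the flow.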
It should be noted that the summary and the detailed description of the invention are intended to demonstrate the practical application of the technical solutions provided by the present invention, and should not be construed as limiting the scope of the present invention. Various modifications, equivalent alterations, and improvements will occur to those skilled in the art and are intended to be within the spirit and scope of the invention. Such changes and modifications are intended to be included within the scope of the appended claims.

Claims (3)

1. A heating system historical data preprocessing method is characterized by comprising the following steps:
step one, importing collected heat supply historical data and corresponding outdoor meteorological data into a computer, wherein all the outdoor meteorological data form an outdoor meteorological data set X;
the heat supply historical data is data to be preprocessed, the time scale is minute by minute or hour by hour, the heat supply historical data comprises various types of operation data which are generated in the operation process of a heat supply system and are related to time sequence, and the outdoor meteorological data is real-time meteorological data observed by a meteorological station;
step two, reconstructing the heat supply historical data to remove the trend items in the heat supply operation data, and extracting the random fluctuation items in the sequences to generate a group of sequences to be processed;
step three, identifying abnormal values in the sequence to be processed and locating the abnormal points;
step four, removing the corresponding values from the original heat supply historical data sequence according to the located abnormal values of the sequence to be processed, to form a screened sequence with vacancy values to be filled;
step five, performing visual pattern identification on the characteristics of the screened sequence, and filling the empty values to form a processed sequence, wherein the sequence is divided into four cases;
in the first situation, when the overall fluctuation of the sequence data is small after the whole screening, the data value fluctuates around a certain value and contains random errors, the data vacancy value is filled by adopting an averaging method;
the calculation formula of the averaging method is as follows:
$$y = \frac{1}{n}\sum_{i=1}^{n} x_i$$
wherein y is the vacancy value to be filled in the screened sequence, x_i is the i-th non-vacancy value in the sequence, and n is the number of non-vacancy values in the sequence;
in the second situation, the data of the sequence after the whole screening has larger global fluctuation but more stable local, the data value fluctuates around the local mean value in a certain period of time and contains random errors, and the data is filled by adopting a previous time value method;
the calculation formula of the previous time value method is as follows:
$$y_i = y_{i-1}$$
in the formula, y_i is the vacancy value to be filled at time i in the screened sequence, and y_{i-1} is the value at time i-1;
in the third situation, the data values of the whole screened sequence increase steadily, the value at the current time being the value at the previous time plus an increment, and the vacancy value is filled by adopting an accumulative value method;
the formula of the cumulative method is as follows:
$$y_i = y_m + \frac{y_n - y_m}{n - m}\,(i - m)$$
in the formula, y_i is the vacancy value to be filled at time i in the screened sequence, y_m is the value at time m, and y_n is the value at time n; time m is the time of the first non-vacancy value before the vacancy value to be filled, and time n is the time of the first non-vacancy value after the vacancy value to be filled;
in the fourth situation, the screened sequence as a whole fluctuates over a large range and shows no obvious pattern, and the vacancy values are filled by a similarity algorithm with the following specific steps:
(1) first, the meteorological parameters corresponding to the vacancy values and to the non-vacancy values are normalized by the min-max standardization method, with the following calculation formula:
$$y = \frac{x - \min\{X\}}{\max\{X\} - \min\{X\}}$$
in the formula, $x$ is a value in the outdoor meteorological data set $X$, $\min\{X\}$ is the minimum value in $X$, $\max\{X\}$ is the maximum value in $X$, and $y$ is the value normalized by the min-max method;
(2) Calculating the Euclidean distance between the meteorological parameters corresponding to the vacancy values and the meteorological parameters corresponding to the non-vacancy values by adopting the following formula:
$$d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
wherein $d$ is the Euclidean distance, $x_i$ is the i-th meteorological parameter corresponding to the vacancy value, $y_i$ is the i-th meteorological parameter corresponding to the non-vacancy value, and $n$ is the number of types of meteorological parameters participating in the calculation;
(3) finding the m sets of non-vacancy-value meteorological parameters with the smallest Euclidean distance to the meteorological parameters of the vacancy value, and then taking the average of the m corresponding non-vacancy values as the fill for the vacancy value;
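As an illustration only, steps (1) to (3) of the similarity algorithm might be sketched in Python as follows. The function names, the `None` vacancy marker, the layout of `weather` as one tuple of parameters per time step, and the handling of a constant parameter column (mapped to 0.0) are all assumptions of this sketch.

```python
import math


def min_max_normalize(column):
    """Step (1): scale one meteorological parameter into [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: assumed to map to 0
    return [(x - lo) / (hi - lo) for x in column]


def fill_by_similarity(values, weather, m=2):
    """Fill each vacancy with the mean of the m most similar time steps.

    values  -- heat supply data, None at vacancies
    weather -- one tuple of meteorological parameters per time step
    """
    # Normalize each parameter (column) over all time steps.
    columns = [min_max_normalize(col) for col in zip(*weather)]
    rows = list(zip(*columns))
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no non-vacancy values to draw from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Step (2): Euclidean distance; step (3): keep the m nearest.
        nearest = sorted(known, key=lambda j: math.dist(rows[i], rows[j]))[:m]
        filled[i] = sum(values[j] for j in nearest) / len(nearest)
    return filled


# Hypothetical data: (outdoor temperature, wind speed) per time step.
weather = [(-5.0, 2.0), (0.0, 3.0), (5.0, 4.0), (-4.0, 2.0)]
filled = fill_by_similarity([80.0, 60.0, 40.0, None], weather, m=2)
```

Normalizing first keeps a parameter with a wide numeric range, such as temperature, from dominating the distance over a parameter with a narrow range.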
step six, performing visual pattern identification again to judge whether abnormal points exist in the processed sequence; if none exist, the heat supply historical data preprocessing task is complete; if abnormal points exist, steps one to five are repeated until the preprocessing task is complete.
2. The heat supply system historical data preprocessing method according to claim 1, characterized in that: by a first-order difference method, the heat supply operation data at the previous time is subtracted from the heat supply operation data at each time to obtain the change in the heat supply operation data at each time, so as to remove the trend term from the heat supply operation data and extract the random fluctuation term of the sequence.
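As an illustration only, the first-order difference described in claim 2 might be sketched in Python as follows. The function name `first_order_difference` is an assumption of this sketch.

```python
def first_order_difference(seq):
    """Return the change at each time step: seq[t] - seq[t-1]."""
    return [curr - prev for prev, curr in zip(seq, seq[1:])]


# Hypothetical heat supply operation data.
changes = first_order_difference([10.0, 12.0, 15.0, 15.0, 11.0])
```

The result is one element shorter than the input, because the first time step has no previous value to subtract.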
3. The heat supply system historical data preprocessing method according to claim 1 or 2, characterized in that: abnormal values in the sequence to be processed are identified using the 3σ principle of the Pauta (Laida) criterion.
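As an illustration only, the 3σ identification named in claim 3 might be sketched in Python as follows. The function name `three_sigma_outliers` and the use of the population standard deviation are assumptions of this sketch; the claim does not say which estimator is intended.

```python
import statistics


def three_sigma_outliers(seq):
    """Return the indices of values more than 3 sigma from the mean."""
    mean = statistics.fmean(seq)
    sigma = statistics.pstdev(seq)  # population standard deviation (assumed)
    return [i for i, v in enumerate(seq) if abs(v - mean) > 3 * sigma]


# Hypothetical sequence to be processed: 20 ordinary values and one spike.
outliers = three_sigma_outliers([1.0] * 20 + [100.0])
```

With the population standard deviation, no value in a sequence of n points can lie more than sqrt(n - 1) standard deviations from the mean, so the rule can only flag anything when the sequence has more than ten points.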
CN201810903746.1A 2018-08-09 2018-08-09 Heat supply system historical data preprocessing method Active CN109190184B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810903746.1A CN109190184B (en) 2018-08-09 2018-08-09 Heat supply system historical data preprocessing method

Publications (2)

Publication Number Publication Date
CN109190184A CN109190184A (en) 2019-01-11
CN109190184B CN109190184B (en) 2022-12-09

Family

ID=64921312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810903746.1A Active CN109190184B (en) 2018-08-09 2018-08-09 Heat supply system historical data preprocessing method

Country Status (1)

Country Link
CN (1) CN109190184B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant