CN109190184B

CN109190184B - Heat supply system historical data preprocessing method

Info

Publication number: CN109190184B
Application number: CN201810903746.1A
Authority: CN
Inventors: 田喆; 李万程
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2018-08-09
Filing date: 2018-08-09
Publication date: 2022-12-09
Anticipated expiration: 2038-08-09
Also published as: CN109190184A

Abstract

The invention discloses a heat supply system historical data preprocessing method, which comprises the steps of importing collected heat supply historical data and corresponding outdoor meteorological data into a computer; reconstructing heat supply historical data to remove trend items, extracting random fluctuation items in the sequences to generate a group of sequences to be processed; carrying out abnormal value identification on the sequence to be processed and positioning an abnormal point; removing the value of the corresponding original heat supply historical data sequence according to the positioning result of the abnormal value of the sequence to be processed to form a screened sequence of the vacancy value to be filled; carrying out visual pattern identification on the characteristics of the screened sequence, and filling the empty values to form a processed sequence; and repeating the steps, and carrying out visual pattern identification again until completing the task of preprocessing the heat supply historical data. The method can improve the accuracy of subsequent data mining.

Description

Heat supply system historical data preprocessing method

Technical Field

The invention relates to a data processing method, in particular to a method for preprocessing historical data of a heating system.

Background Art

With the rapid development of the current production technology, the cost price of various sensors is greatly reduced, so that the sensors are widely deployed in various research fields of the whole society to obtain various monitoring data. The data contains rich information of the monitored system, and some interesting hidden information can be extracted from the massive information through the technologies of data mining and big data analysis, so that the system can be assisted to operate better.

In the field of heat supply, a plurality of sensors are also arranged on a heat supply pipe network and heat supply equipment to monitor the operation of a heat supply system, so that on one hand, the water temperature, the flow and the like of the heat supply pipe network are monitored to meet the requirements of heat load; on the other hand, the heat supply equipment including pipe network pressure, water pump operation frequency, current and the like are monitored, so that the safety and low-energy-consumption operation of a heat supply system are met.

For a heating system, sensor monitoring accumulates a large amount of heating historical data, and data mining and big data analysis of these data can yield much information that helps the heating system operate better. However, heating history data have some problems inherent to big data. On one hand, the uneven quality of the sensors affects the quality of the monitored heating system data; on the other hand, even if the sensor itself is of good quality, the data quality of the heating system may still be affected by installation problems.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a method for preprocessing the historical data of a heating system, which can improve the quality of the historical data of heating.

A heating system historical data preprocessing method comprises the following steps:

step one, importing collected heat supply historical data and corresponding outdoor meteorological data into a computer, wherein all the outdoor meteorological data form an outdoor meteorological data set X;

the heat supply historical data is data to be preprocessed, the time scale is minute by minute or hour by hour, the heat supply historical data comprises various types of operation data which are generated in the operation process of a heat supply system and are related to time sequence, and the outdoor meteorological data is real-time meteorological data observed by a meteorological station;

step two, reconstructing the heat supply historical data to remove the trend items in the heat supply operation data, and extracting the random fluctuation items in the sequences to generate a group of sequences to be processed;

step three, identifying abnormal values of the sequence to be processed and positioning abnormal points;

step four, removing the corresponding value of the original heat supply historical data sequence according to the positioning result of the abnormal value of the sequence to be processed to form a screened sequence of the vacancy value to be filled;

step five, performing visual pattern identification on the characteristics of the screened sequence, and filling the empty values to form a processed sequence, wherein the sequence is divided into four cases;

in the first situation, when the overall fluctuation of the sequence data is small after the whole screening, the data value fluctuates around a certain value and contains random errors, the data vacancy value is filled by adopting an averaging method;

the calculation formula of the averaging method is as follows:

$$y = \frac{1}{n}\sum_{i=1}^{n} x_i$$

wherein $y$ is the vacancy value to be filled in the screened sequence, $x_i$ is the i-th non-vacancy value in the sequence, and $n$ is the number of non-vacancy values in the sequence;

in the second situation, when the data of the whole screened sequence fluctuate strongly overall but are locally stable, and the data value fluctuates around a local mean value within a certain period of time and contains random errors, the vacancy value is filled by adopting a previous time value method;

the calculation formula of the previous time value method is as follows:

$$y_i = y_{i-1}$$

in the formula, $y_i$ is the vacancy value to be filled at time $i$ in the screened sequence, and $y_{i-1}$ is the value at time $i-1$;

in the third situation, when the data value of the whole screened sequence increases steadily and the value at the current time is obtained by adding an increment to the value at the previous time, the vacancy value is filled by adopting an accumulative value method;

the formula of the cumulative method is as follows:

$$y_i = y_m + \frac{y_n - y_m}{n - m}\,(i - m)$$

in the formula, $y_i$ is the vacancy value to be filled at time $i$ in the screened sequence, $y_m$ is the value at time $m$, and $y_n$ is the value at time $n$; time $m$ is the time corresponding to the first non-vacancy value before the vacancy value to be filled, and time $n$ is the time corresponding to the first non-vacancy value after the vacancy value to be filled;

in the fourth situation, when the overall fluctuation range of the whole screened sequence is large and the sequence shows no obvious regularity, the vacancy value is filled by adopting a similarity algorithm, with the following specific steps:

(1) Firstly, normalization processing is carried out on the meteorological parameters corresponding to the vacancy values and the non-vacancy values by adopting a min-max standardization method, and the calculation formula is as follows:

$$y = \frac{x - \min\{X\}}{\max\{X\} - \min\{X\}}$$

in the formula, $x$ is a certain value in the outdoor meteorological data set $X$, $\min\{X\}$ is the minimum value in the outdoor meteorological data set $X$, $\max\{X\}$ is the maximum value in the outdoor meteorological data set $X$, and $y$ is the data normalized by the min-max method;

(2) Calculating the Euclidean distance between the meteorological parameters corresponding to the vacancy values and the meteorological parameters corresponding to the non-vacancy values by adopting the following formula:

$$d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

wherein $d$ is the Euclidean distance, $x_i$ is a certain meteorological parameter corresponding to the vacancy value, $y_i$ is the same meteorological parameter corresponding to the non-vacancy value, and $n$ is the number of types of meteorological parameters participating in the calculation;

(3) Finding out the m non-vacancy value meteorological parameters with the nearest Euclidean distance to the vacancy value meteorological parameters, and then taking the average of the non-vacancy values corresponding to these m non-vacancy value meteorological parameters as the filling item of the vacancy value;

step six, visual pattern recognition is carried out again to judge whether abnormal points exist in the processed sequence; if no abnormal points exist in the processed sequence, the task of preprocessing the heat supply historical data is finished; if abnormal points exist, steps one to five are repeated until the preprocessing task is completed.

The invention has the advantages and positive effects that:

1. Through the preprocessing of the heat supply historical data, the quality of the heat supply historical data can be improved, which lays a foundation for subsequent heat supply data mining and big data analysis, so that more accurate and reliable conclusions can be obtained.

2. Through the analysis of the characteristics of the heat supply historical data, data processing methods of different properties are provided in a targeted manner, so that the preprocessing effect is better, and the sequence data after preprocessing is closer to the true value.

Drawings

FIG. 1 is a flow chart of the historical data preprocessing of the present invention;

FIG. 2 is a schematic diagram of a data sequence suitable for filling by the averaging method in the present invention;

FIG. 3 is a schematic diagram of a data sequence suitable for filling by the previous time value method in the present invention;

FIG. 4 is a schematic diagram of a data sequence suitable for filling by the cumulative method in the present invention;

FIG. 5 is a schematic diagram of a data sequence suitable for filling by the similarity algorithm in the present invention.

Detailed Description

The specific steps of the method for preprocessing the historical data of the heating system according to the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.

The invention discloses a method for preprocessing historical data of a heating system. The specific flow is shown in FIG. 1 and comprises the following steps:

Step one, importing the collected heat supply historical data and the corresponding outdoor meteorological data into a computer, wherein all the outdoor meteorological data form an outdoor meteorological data set X.

The heat supply historical data is the data to be preprocessed, with a time scale of minute by minute or hour by hour. It comprises the various time-series operation data generated during the operation of a heat supply system, namely the water temperature, flow and pressure data of the heat supply pipe network and the user end, and information such as the power consumption, heat consumption, pump frequency and pump current of the heat source and heat exchange station equipment. When the heat supply historical data contains abnormal values, it needs to go through the following data preprocessing.

The outdoor meteorological data are real-time meteorological data observed by a meteorological station, and comprise outdoor temperature, solar radiation intensity and outdoor wind speed; their time scale is consistent with that of the heat supply historical data. Because the outdoor meteorological data are obtained from a weather station, they can be regarded as accurate and do not need to be processed.

Step two, reconstructing the heat supply historical data to remove the trend items, and extracting the random fluctuation items in the sequences to generate a group of sequences to be processed.

The original heating historical data sequence changes continuously in time. Because a heating system has large inertia and large lag, heating operation data such as water temperature and flow change slowly, so the heating operation data can be regarded as being composed of a trend item and a random fluctuation item in the time sequence.

By the first-order difference method, the heat supply operation data at the previous moment is subtracted from the heat supply operation data at each moment to obtain the variation value at each moment, which removes the trend item in the heat supply operation data and extracts the random fluctuation item in the sequence. The method is simple to implement and extracts the random fluctuation item quickly.
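The first-order difference described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation; the function name `first_order_difference` and the use of `None` for the first element, which has no predecessor, are assumptions made for this example.

```python
from typing import List, Optional


def first_order_difference(series: List[float]) -> List[Optional[float]]:
    """Return d[t] = series[t] - series[t-1] for every t >= 1.

    The first position has no previous moment, so it is reported as None
    (an assumption of this sketch) to keep the output aligned with the input.
    """
    if not series:
        return []
    diffs: List[Optional[float]] = [None]
    for prev, cur in zip(series, series[1:]):
        diffs.append(cur - prev)
    return diffs


# Hypothetical hourly supply-water temperatures.
supply_temp = [62.0, 62.5, 63.0, 70.0, 63.5]
print(first_order_difference(supply_temp))
```

Keeping the output the same length as the input means that an index flagged later in the fluctuation sequence can be mapped directly back to the original sequence.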

Besides the first-order difference method, the random fluctuation item can also be extracted by a wavelet decomposition method; see specifically the wavelet decomposition method disclosed in "Electrocardiosignal denoising method and system" (publication No. CN 107341769A).

Step three, identifying abnormal values of the sequence to be processed and positioning the abnormal points.

Reconstruction of the data sequence yields a sequence containing only random fluctuation. Because the volume of heat supply historical data is large and the reconstructed sequence contains only random fluctuation factors and follows a normal distribution, the abnormal values of the sequence to be processed can be identified by using the 3σ principle of the Lauda criterion. The method is simple, feasible and identifies abnormal values accurately.

The 3σ principle formula of the Lauda criterion is as follows:

$$P\{\mu - 3\sigma \le X \le \mu + 3\sigma\} = 99.7\%$$

where $\mu$ is the average value of the sequence data to be processed and $\sigma$ is the standard deviation of the sequence data to be processed.

The 3σ rule indicates that if the sequence data contain only random errors and follow a normal distribution, 99.7% of the data fall within the interval $[\mu - 3\sigma, \mu + 3\sigma]$ and only 0.3% fall outside it. The data outside this interval can be regarded as gross errors, i.e., outliers, which should be identified and located.

The 3 sigma principle of the Lauda criterion can be used for identifying abnormal values of the reconstructed sequence and positioning abnormal points.
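A minimal Python sketch of the 3σ screening is given below. The function name `three_sigma_outliers` is hypothetical, `None` entries are skipped, and the population standard deviation (`statistics.pstdev`) is used; the text does not say whether the population or the sample standard deviation is meant, so that choice is an assumption.

```python
from statistics import mean, pstdev
from typing import List, Optional


def three_sigma_outliers(fluct: List[Optional[float]]) -> List[int]:
    """Return the indices whose value lies outside [mu - 3*sigma, mu + 3*sigma].

    `fluct` is the random-fluctuation sequence; None entries (for example the
    first element of a first-order difference) are ignored. At least one
    non-None value is assumed.
    """
    values = [v for v in fluct if v is not None]
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [
        i
        for i, v in enumerate(fluct)
        if v is not None and abs(v - mu) > 3 * sigma
    ]


# Twenty quiet points and one large jump at index 20.
fluctuation = [0.0] * 20 + [10.0]
print(three_sigma_outliers(fluctuation))  # [20]
```

With very short sequences a single extreme point can never exceed 3σ, because it inflates σ itself, so the rule is only meaningful on reasonably long sequences such as the large data volumes assumed in the text.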

In addition to the Lauda criterion, abnormal values can also be identified by the quartile method; see specifically the method for obtaining a quartile box diagram disclosed in "A fan abnormal data processing method and apparatus based on a quartile box diagram" (publication number CN106897941A).

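Step four, removing the corresponding values of the original heat supply historical data sequence according to the positioning result of the abnormal values to form a screened sequence with vacancy values to be filled.

According to the method of step three, the positions where abnormal values appear can be located. The corresponding values of the original heat supply historical data are removed according to the locating result and temporarily replaced by vacancy values, which are filled in the next step.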
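As an illustration of this screening step, the sketch below replaces the located positions with `None`. Both the function name `screen_series` and the use of `None` as the vacancy marker are assumptions made for the example.

```python
from typing import List, Optional, Sequence


def screen_series(
    original: Sequence[float], outlier_idx: Sequence[int]
) -> List[Optional[float]]:
    """Copy `original`, replacing every located abnormal position with None."""
    flagged = set(outlier_idx)
    return [None if i in flagged else v for i, v in enumerate(original)]


print(screen_series([62.0, 62.5, 70.0, 63.0], [2]))  # [62.0, 62.5, None, 63.0]
```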

Step five, performing visual pattern identification on the characteristics of the screened sequence, and filling the vacancy values to form a processed sequence.

Analysis of the characteristics of heat supply historical data shows that the data distribution mainly falls into four types. In the first type, the overall fluctuation of the whole screened sequence is small, and the data value fluctuates around a certain value and contains random errors. Vacancy values in data with these characteristics are filled by the averaging method; the specific form of such data is shown in FIG. 2.

The calculation formula of the averaging method is as follows:

$$y = \frac{1}{n}\sum_{i=1}^{n} x_i$$

wherein $y$ is the vacancy value to be filled in the screened sequence, $x_i$ is the i-th non-vacancy value in the sequence, and $n$ is the number of non-vacancy values in the sequence.

The averaging method thus fills the vacancy values of heat supply data whose sequence shows small global fluctuation.
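The averaging method can be sketched as follows, under the assumption that vacancy values are marked with `None`; the function name `fill_with_mean` is hypothetical.

```python
from typing import List, Optional


def fill_with_mean(series: List[Optional[float]]) -> List[float]:
    """Fill every vacancy (None) with the mean of all non-vacancy values.

    At least one non-vacancy value is assumed.
    """
    present = [v for v in series if v is not None]
    avg = sum(present) / len(present)
    return [avg if v is None else v for v in series]


print(fill_with_mean([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```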

In the second type of heating data, the overall fluctuation of the whole screened sequence is large but the data are locally stable: the data value fluctuates around a local mean value within a certain period of time and contains random errors. Vacancy values in data with these characteristics are filled by the previous time value method; the specific form of such data is shown in FIG. 3.

The calculation formula of the previous time value method is as follows:

$$y_i = y_{i-1}$$

in the formula, $y_i$ is the vacancy value to be filled at time $i$ in the screened sequence, and $y_{i-1}$ is the value at time $i-1$.

The previous time value method thus fills the vacancy values of heat supply data whose sequence shows large global fluctuation but local stability.
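A minimal sketch of the previous time value method follows. It assumes vacancies are marked with `None`, that consecutive vacancies reuse the most recently filled value, and that a vacancy with no earlier value is left unfilled; these edge-case choices and the name `fill_with_previous` are assumptions of the example.

```python
from typing import List, Optional


def fill_with_previous(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each vacancy with the value at the previous time step."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in series:
        if v is None:
            v = last  # stays None when there is no earlier value
        filled.append(v)
        last = v
    return filled


print(fill_with_previous([None, 1.0, None, None, 2.0]))
# [None, 1.0, 1.0, 1.0, 2.0]
```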

In the third type of heating data, the data value of the whole screened sequence increases steadily, and the value at the current moment is obtained by adding an increment to the value at the previous moment. Vacancy values in data with these characteristics are filled by the cumulative method; the specific form of such data is shown in FIG. 4.

The formula of the cumulative method is as follows:

$$y_i = y_m + \frac{y_n - y_m}{n - m}\,(i - m)$$

in the formula, $y_i$ is the vacancy value to be filled at time $i$ in the screened sequence, $y_m$ is the value at time $m$, and $y_n$ is the value at time $n$. Time $m$ is the time corresponding to the first non-vacancy value before the vacancy value to be filled, and time $n$ is the time corresponding to the first non-vacancy value after the vacancy value to be filled.

The cumulative method thus fills the vacancy values of heat supply data whose screened sequence has this cumulative (steadily increasing) characteristic.
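The sketch below illustrates one way to fill such a steadily increasing sequence. It assumes that the increase between the nearest known values before and after the gap (times m and n) is spread evenly over the gap, i.e. linear interpolation; this reading, the `None` vacancy marker and the name `fill_with_cumulative` are assumptions of the example. Vacancies before the first or after the last known value are left unfilled.

```python
from typing import List, Optional


def fill_with_cumulative(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill gaps between known values m and n with evenly spaced increments."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for m, n in zip(known, known[1:]):
        step = (series[n] - series[m]) / (n - m)  # increment per time step
        for i in range(m + 1, n):
            filled[i] = series[m] + step * (i - m)
    return filled


# Hypothetical cumulative heat meter readings with two missing values.
print(fill_with_cumulative([10.0, None, None, 16.0]))  # [10.0, 12.0, 14.0, 16.0]
```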

In the fourth type of heating data, the overall fluctuation range of the whole screened sequence is large and the sequence shows no obvious regularity. Vacancy values in data with these characteristics are filled by a similarity algorithm; the specific form of such data is shown in FIG. 5.

The principle of a similarity algorithm is to judge the difference between individuals by comparing how similar two things are. Similarity can be measured in many ways; the invention adopts the Euclidean distance, and the similarity of different attribute data is determined by calculating the Euclidean distance.

The operation parameters of a heat supply system are strongly correlated with outdoor meteorological factors, and the outdoor meteorological parameters are key factors governing heat supply operation. Each vacancy value corresponds to the meteorological parameters at the same moment. The Euclidean distance between the meteorological parameters corresponding to the vacancy value and those corresponding to the non-vacancy values is calculated, the m non-vacancy-value meteorological parameters closest to the vacancy-value meteorological parameters are found, and the average of the non-vacancy values corresponding to these m meteorological parameters is taken as the filling item of the vacancy value. The value of m can be adjusted by heat supply professionals and is generally taken as 10.

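The specific steps are as follows:

(1) First, the meteorological parameters corresponding to the vacancy values and the non-vacancy values are normalized by the min-max standardization method, with the following calculation formula:

$$y = \frac{x - \min\{X\}}{\max\{X\} - \min\{X\}}$$

in the formula, $x$ is a certain value in the outdoor meteorological data set $X$, $\min\{X\}$ is the minimum value in the outdoor meteorological data set $X$, $\max\{X\}$ is the maximum value in the outdoor meteorological data set $X$, and $y$ is the data normalized by the min-max method.

(2) The Euclidean distance between the meteorological parameters corresponding to the vacancy value and the meteorological parameters corresponding to the non-vacancy values is calculated by the following formula:

$$d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

wherein $d$ is the Euclidean distance, $x_i$ is a certain meteorological parameter corresponding to the vacancy value, $y_i$ is the same meteorological parameter corresponding to the non-vacancy value, and $n$ is the number of types of meteorological parameters participating in the calculation.

(3) The m non-vacancy-value meteorological parameters closest in Euclidean distance to the vacancy-value meteorological parameters are found, and the average of the non-vacancy values corresponding to these m meteorological parameters is taken as the filling item of the vacancy value; m is generally 10.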

The following calculation is a specific example:

A heating history data set is shown in Table 1, in which the missing water supply temperature for ID 6 needs to be filled.

Table 1 Heat supply historical data set

ID	Outdoor temperature (°C)	Solar radiation (W/m²)	Outdoor wind speed (m/s)	Water supply temperature (°C)
1	8.8	105	0	62.8
2	9.9	239	0	63
3	11	264	0	62
4	11	279	0.8	61.3
5	11.3	286	0.3	60.7
6	11.4	241	2.2	Vacancy value
7	11.3	230	1.8	63.1
8	11.4	97	1.5	60.5
9	11	50	0.7	60.9
10	6.6	99	2.4	67.7
11	7.1	224	0.7	67.3
12	7.3	209	0	66.3
13	7.5	274	1.5	65.3
14	8.4	347	2.1	64.8
15	9.3	440	0.8	63.9
16	9.6	338	0.5	63.8
17	9.7	263	0.6	63.6
18	9.6	108	0.3	64.9
19	7.9	280	1.8	66.5
20	8.2	242	0.8	66.3

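(1) Normalize the meteorological data. The normalized meteorological parameters are shown in Table 2.

Table 2 Meteorological data after normalization

The data in Table 2 are calculated from Table 1 by the min-max formula

$$y = \frac{x - \min\{X\}}{\max\{X\} - \min\{X\}}$$

For example, for the data with ID 1, the normalized outdoor temperature is

$$\frac{8.8 - 6.6}{11.4 - 6.6} \approx 0.46$$

the normalized solar radiation is

$$\frac{105 - 50}{440 - 50} \approx 0.14$$

and the normalized outdoor wind speed is

$$\frac{0 - 0}{2.4 - 0} = 0.00$$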

(2) The Euclidean distances between the meteorological parameters corresponding to the vacancy values and the meteorological parameters corresponding to the non-vacancy values are calculated and sorted, as shown in Table 3.

Table 3 Euclidean distance ranking (meteorological columns are normalized values)

ID	Outdoor temperature (normalized)	Solar radiation (normalized)	Outdoor wind speed (normalized)	Water supply temperature (°C)	Euclidean distance
6	1.00	0.49	0.92	Vacancy value	0.00
7	0.98	0.46	0.75	63.1	0.17
8	1.00	0.12	0.63	60.5	0.47
4	0.92	0.59	0.33	61.3	0.60
14	0.38	0.76	0.88	64.8	0.68
19	0.27	0.59	0.75	66.5	0.75
17	0.65	0.55	0.25	63.6	0.76
9	0.92	0.00	0.29	60.9	0.80
5	0.98	0.61	0.13	60.7	0.80
16	0.63	0.74	0.21	63.8	0.84
13	0.19	0.57	0.63	65.3	0.87
20	0.33	0.49	0.33	66.3	0.89
15	0.56	1.00	0.33	63.9	0.89
3	0.92	0.55	0.00	62	0.92
18	0.63	0.15	0.13	64.9	0.94
2	0.69	0.48	0.00	63	0.97
10	0.00	0.13	1.00	67.7	1.07
11	0.10	0.45	0.29	67.3	1.09
1	0.46	0.14	0.00	62.8	1.12
12	0.15	0.41	0.00	66.3	1.26

The data in Table 3 are calculated from Table 2 by the Euclidean distance formula

$$d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

For example, the Euclidean distance between the meteorological parameters with ID 1 and the meteorological parameters of the vacancy value (ID 6) is

$$d = \sqrt{(0.46 - 1.00)^2 + (0.14 - 0.49)^2 + (0.00 - 0.92)^2} \approx 1.12$$

(3) Find the 10 non-vacancy-value meteorological parameters closest in Euclidean distance to the vacancy-value meteorological parameters (IDs 7, 8, 4, 14, 19, 17, 9, 5, 16 and 13 in Table 3), and take the average of the 10 corresponding water supply temperatures as the filling item of the vacancy value: the water supply temperature vacancy value with ID 6 is filled as 63.05 °C.

The similarity algorithm thus fills the vacancy values of heat supply data whose sequence has a large global fluctuation range and no obvious regularity.
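The worked example can be reproduced with the short Python sketch below, which uses the data of Table 1. It is an illustration under stated assumptions, not the patented implementation: the names `TABLE_1`, `min_max_columns` and `similarity_fill` are hypothetical, each meteorological column is normalized with its own minimum and maximum over all 20 rows, and `math.dist` (Python 3.8+) computes the Euclidean distance.

```python
import math
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, float, float, float, Optional[float]]

# (ID, outdoor temperature, solar radiation, wind speed, supply temperature)
TABLE_1: List[Row] = [
    (1, 8.8, 105, 0.0, 62.8), (2, 9.9, 239, 0.0, 63.0),
    (3, 11.0, 264, 0.0, 62.0), (4, 11.0, 279, 0.8, 61.3),
    (5, 11.3, 286, 0.3, 60.7), (6, 11.4, 241, 2.2, None),
    (7, 11.3, 230, 1.8, 63.1), (8, 11.4, 97, 1.5, 60.5),
    (9, 11.0, 50, 0.7, 60.9), (10, 6.6, 99, 2.4, 67.7),
    (11, 7.1, 224, 0.7, 67.3), (12, 7.3, 209, 0.0, 66.3),
    (13, 7.5, 274, 1.5, 65.3), (14, 8.4, 347, 2.1, 64.8),
    (15, 9.3, 440, 0.8, 63.9), (16, 9.6, 338, 0.5, 63.8),
    (17, 9.7, 263, 0.6, 63.6), (18, 9.6, 108, 0.3, 64.9),
    (19, 7.9, 280, 1.8, 66.5), (20, 8.2, 242, 0.8, 66.3),
]


def min_max_columns(rows: Sequence[Sequence[float]]) -> List[Tuple[float, ...]]:
    """Min-max normalize each column to [0, 1]; assumes max > min per column."""
    normalized_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        normalized_cols.append([(x - lo) / (hi - lo) for x in col])
    return list(zip(*normalized_cols))


def similarity_fill(table: Sequence[Row], target_id: int, m: int = 10) -> float:
    """Average the m known values whose weather is closest to the target row."""
    weather = min_max_columns([row[1:4] for row in table])
    norm = {row[0]: w for row, w in zip(table, weather)}
    target = norm[target_id]
    candidates = [
        (math.dist(target, norm[row[0]]), row[4])
        for row in table
        if row[4] is not None
    ]
    candidates.sort(key=lambda c: c[0])  # nearest first
    nearest = candidates[:m]
    return sum(value for _, value in nearest) / len(nearest)


print(round(similarity_fill(TABLE_1, target_id=6), 2))  # 63.05
```

Because the distances are computed on unrounded normalized values, individual distances can differ slightly in the third decimal place from those listed in Table 3, but the ten nearest rows and the resulting fill value are the same.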

Step six, carrying out visual pattern identification again to judge whether abnormal points exist in the processed sequence; if not, the task of preprocessing the heat supply historical data is finished; if abnormal points exist, steps one to five are repeated until the preprocessing task is completed.

Because heat supply data have the inherent characteristics of big data, such as high dimensionality, long time scale and large data volume, data preprocessing is a relatively complex process. The problems of abnormal values and vacancy values in the data are difficult to solve in a single preprocessing pass, so the result of each pass needs to be judged in combination with the professional knowledge of heat supply personnel to determine whether it meets the requirements of subsequent data mining.

Therefore, after the abnormal point identification and removal and the vacancy value filling are completed, the preprocessing result is displayed through the visual graph, the heat supply professional judges the visual result, whether the abnormal point is completely removed, whether the vacancy value is completely filled and whether the requirement of subsequent heat supply data mining is met are determined, and if the heat supply professional determines that the expected processing effect is achieved, the preprocessing process is ended; and if the heat supply professional deems that the expected treatment effect is not achieved, repeating the flow from the first step to the fifth step until the expected treatment effect is achieved, and completing the task of pretreatment.

It should be noted that the summary and the detailed description of the invention are intended to demonstrate the practical application of the technical solutions provided by the present invention, and should not be construed as limiting the scope of the present invention. Various modifications, equivalent alterations, and improvements will occur to those skilled in the art and are intended to be within the spirit and scope of the invention. Such changes and modifications are intended to be included within the scope of the appended claims.

Claims

1. A heating system historical data preprocessing method is characterized by comprising the following steps:

the calculation formula of the averaging method is as follows:

$$y = \frac{1}{n}\sum_{i=1}^{n} x_i$$

the calculation formula of the previous time value method is as follows:

$$y_i = y_{i-1}$$

the formula of the cumulative method is as follows:

$$y_i = y_m + \frac{y_n - y_m}{n - m}\,(i - m)$$

in the formula, $y_i$ is the vacancy value to be filled at time $i$ in the screened sequence, $y_m$ is the value at time $m$, and $y_n$ is the value at time $n$; time $m$ is the time corresponding to the first non-vacancy value before the vacancy value to be filled, and time $n$ is the time corresponding to the first non-vacancy value after the vacancy value to be filled;

(1) Firstly, normalization processing is carried out on meteorological parameters corresponding to the vacancy value and the non-vacancy value by adopting a min-max standardization method, and the calculation formula is as follows:

$$y = \frac{x - \min\{X\}}{\max\{X\} - \min\{X\}}$$

2. A heating system history data preprocessing method according to claim 1, characterized in that: by a first-order difference method, the heat supply operation data at the previous moment is subtracted from the heat supply operation data at each moment to obtain a variation value of the heat supply operation data at each moment, so as to remove the trend item in the heat supply operation data and extract the random fluctuation item in the sequence.

3. A heating system history data preprocessing method according to claim 1 or 2, characterized in that: the identification of outliers is performed on the sequences to be processed using the 3 sigma principle of the Lauda criterion.