CN109190184B - Heat supply system historical data preprocessing method - Google Patents
- Publication number
- CN109190184B (application number CN201810903746.1A)
- Authority
- CN
- China
- Prior art keywords
- value
- data
- sequence
- vacancy
- heat supply
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2119/00—Details relating to the type or aim of the analysis or the optimisation
- G06F2119/08—Thermal analysis or thermal optimisation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Investigating Or Analyzing Materials Using Thermal Means (AREA)
Abstract
The invention discloses a method for preprocessing historical data of a heating system, comprising: importing collected heat supply historical data and the corresponding outdoor meteorological data into a computer; reconstructing the heat supply historical data to remove the trend items and extract the random fluctuation items in the sequences, generating a group of sequences to be processed; identifying abnormal values in the sequences to be processed and positioning the abnormal points; removing the corresponding values from the original heat supply historical data sequence according to the positioning result, forming a screened sequence with vacancy values to be filled; performing visual pattern identification on the characteristics of the screened sequence and filling the vacancy values to form a processed sequence; and performing visual pattern identification again, repeating the above steps until the preprocessing of the heat supply historical data is complete. The method can improve the accuracy of subsequent data mining.
Description
Technical Field
The invention relates to a data processing method, in particular to a method for preprocessing historical data of a heating system.
Background Art
With the rapid development of production technology, the cost of sensors has fallen sharply, and sensors are now widely deployed across many fields to collect monitoring data. These data contain rich information about the monitored system, and data mining and big data analysis can extract useful hidden information from them to help the system operate better.
In the heat supply field, many sensors are likewise installed on heat supply pipe networks and heat supply equipment to monitor the operation of the heating system. On the one hand, the water temperature, flow and other parameters of the pipe network are monitored to ensure that the heat load is met; on the other hand, equipment parameters such as pipe network pressure, water pump operating frequency and current are monitored to ensure safe, low-energy-consumption operation of the heating system.
Through sensor monitoring, a heating system accumulates a large amount of heat supply historical data, and data mining and big data analysis of these data can yield much information that helps the heating system operate better. However, heat supply historical data share some problems inherent to big data. On the one hand, sensors vary in quality, which affects the quality of the monitored heating system data; on the other hand, even if the sensor itself is of good quality, the data quality of the heating system may still be degraded by installation problems.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a method for preprocessing the historical data of a heating system, which can improve the quality of the historical data of heating.
A heating system historical data preprocessing method comprises the following steps:
step one, importing collected heat supply historical data and corresponding outdoor meteorological data into a computer, wherein all the outdoor meteorological data form an outdoor meteorological data set X;
the heat supply historical data is data to be preprocessed, the time scale is minute by minute or hour by hour, the heat supply historical data comprises various types of operation data which are generated in the operation process of a heat supply system and are related to time sequence, and the outdoor meteorological data is real-time meteorological data observed by a meteorological station;
step two, reconstructing the heat supply historical data to remove the trend items in the heat supply operation data and extract the random fluctuation items in the sequences, generating a group of sequences to be processed;
step three, identifying abnormal values of the sequence to be processed and positioning abnormal points;
step four, removing the corresponding values of the original heat supply historical data sequence according to the positioning result of the abnormal values of the sequence to be processed, forming a screened sequence with vacancy values to be filled;
step five, performing visual pattern identification on the characteristics of the screened sequence, and filling the empty values to form a processed sequence, wherein the sequence is divided into four cases;
in the first situation, when the overall fluctuation of the sequence data is small after the whole screening, the data value fluctuates around a certain value and contains random errors, the data vacancy value is filled by adopting an averaging method;
the calculation formula of the averaging method is as follows:
$$y = \frac{1}{n}\sum_{i=1}^{n} x_i$$
wherein y is the vacancy value to be filled in the screened sequence, $x_i$ is the i-th non-vacancy value in the sequence, and n is the number of non-vacancy values in the sequence;
in the second situation, the screened sequence as a whole fluctuates strongly but is locally stable, the data values fluctuating around a local mean over a certain period and containing random errors, and the vacancy values are filled by the previous time value method;
the calculation formula of the previous time value method is as follows:
$$y_i = y_{i-1}$$
in the formula, $y_i$ is the vacancy value to be filled at time i in the screened sequence, and $y_{i-1}$ is the value at time i-1;
in the third situation, the data values of the screened sequence as a whole increase steadily, the value at the current time being the value at the previous time plus an increment, and the vacancy values are filled by the accumulative value method;
the calculation formula of the accumulative value method is as follows:
$$y_i = y_m + \frac{y_n - y_m}{n - m}\,(i - m)$$
in the formula, $y_i$ is the vacancy value to be filled at time i in the screened sequence, $y_m$ is the value at time m, and $y_n$ is the value at time n; time m is the time of the first non-vacancy value before the vacancy value to be filled, and time n is the time of the first non-vacancy value after the vacancy value to be filled;
in the fourth situation, the screened sequence as a whole fluctuates over a wide range with no obvious regularity, and the vacancy values are filled by a similarity algorithm, with the following specific steps:
(1) first, the meteorological parameters corresponding to the vacancy values and the non-vacancy values are normalized by the min-max standardization method, with the following calculation formula:
$$y = \frac{x - \min\{X\}}{\max\{X\} - \min\{X\}}$$
in the formula, x is a value in the outdoor meteorological data set X, min{X} is the minimum value in the outdoor meteorological data set X, max{X} is the maximum value in the outdoor meteorological data set X, and y is the value after min-max normalization;
(2) calculating the Euclidean distance between the meteorological parameters corresponding to the vacancy values and the meteorological parameters corresponding to the non-vacancy values by the following formula:
$$d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
wherein d is the Euclidean distance, $x_i$ is the i-th meteorological parameter corresponding to the vacancy value, $y_i$ is the i-th meteorological parameter corresponding to the non-vacancy value, and n is the number of types of meteorological parameters participating in the calculation;
(3) Finding out m non-vacancy value meteorological parameters with the nearest Euclidean distance to the vacancy value meteorological parameters, and then taking the average number of the non-vacancy values corresponding to the m non-vacancy value meteorological parameters as filling items of the vacancy values;
step six, performing visual pattern identification again and judging whether abnormal points remain in the processed sequence; if none remain, the task of preprocessing the heat supply historical data is finished; if abnormal points remain, steps one to five are repeated until the preprocessing task is completed.
The invention has the advantages and positive effects that:
1. Preprocessing the heat supply historical data improves their quality and lays a foundation for subsequent heat supply data mining and big data analysis, so that more accurate and reliable conclusions can be obtained.
2. Through the analysis of the characteristics of the heat supply historical data, data processing methods of different properties are provided in a targeted manner, so that the preprocessing effect is better, and the sequence data after preprocessing is closer to the true value.
Drawings
FIG. 1 is a flow chart of the present invention for pre-processing historical data;
FIG. 2 is a schematic diagram of a data sequence suitable for mean padding in the present invention;
FIG. 3 is a schematic diagram of a data sequence suitable for a previous time-valued padding in the present invention;
FIG. 4 is a schematic diagram of a data sequence suitable for filling by the accumulation method according to the present invention;
FIG. 5 is a schematic diagram of a data sequence suitable for similarity algorithm padding in the present invention.
Detailed Description
The specific steps of the method for preprocessing the historical data of the heating system according to the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
The invention discloses a method for preprocessing historical data of a heating system, which has a specific flow as shown in figure 1 and comprises the following steps:
step one, importing collected heat supply historical data and corresponding outdoor meteorological data into a computer, wherein all the outdoor meteorological data form an outdoor meteorological data set X;
the heat supply historical data is data to be preprocessed, the time scale is minute by minute or hour by hour, and the heat supply historical data comprises various operation data which are generated in the operation process of a heat supply system and are related to time sequences, namely water temperature data, flow data and pressure data of a heat supply pipe network and the tail end of a user, and information data such as power consumption, heat consumption, frequency of a pump, current of the pump and the like of a heat source and heat exchange station equipment. When the heat supply historical data contains abnormal values, the heat supply historical data needs to be subjected to next data preprocessing.
The outdoor meteorological data are real-time meteorological data observed by a meteorological station, and comprise outdoor temperature, solar radiation intensity and outdoor wind speed, and the time scale of the outdoor meteorological data is consistent with the heat supply historical data. The outdoor weather data is data obtained from a weather station, can be regarded as accurate data and does not need to be processed.
Step two, reconstructing the heat supply historical data to remove the trend items in the heat supply operation data and extract the random fluctuation items in the sequences, generating a group of sequences to be processed;
The original heat supply historical data sequence varies continuously in time. Because a heating system has large inertia and large lag, heat supply operation data such as water temperature and flow change slowly, so the data can be regarded as a time series composed of a trend item and a random fluctuation item.
With the first-order difference method, the heat supply operation data at the previous moment are subtracted from the heat supply operation data at each moment to obtain the variation value at each moment, which removes the trend item in the heat supply operation data and extracts the random fluctuation item in the sequence. The method is simple to implement and extracts the random fluctuation item quickly.
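As an illustration of the first-order difference described above, a minimal Python sketch follows. The function name `first_order_difference` and the sample water supply temperatures are assumptions made for this example and are not part of the invention.

```python
def first_order_difference(series):
    """Return d[k] = series[k + 1] - series[k] for every adjacent pair."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


# Hypothetical hourly water supply temperatures in degrees Celsius.
supply_temp = [62.8, 63.0, 62.0, 61.3, 60.7]

# The difference sequence has one element fewer than the input.
fluctuation = first_order_difference(supply_temp)
print(fluctuation)
```

Note that the difference sequence is one element shorter than the original, so element k of the result describes the change from original time k to time k + 1.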
Besides the first order difference method, the random fluctuation term can be extracted by using a wavelet decomposition method, specifically, the wavelet decomposition method disclosed in the electrocardiosignal denoising method and system (publication No. CN 107341769A).
Step three, identifying abnormal values of the sequence to be processed and positioning abnormal points;
Reconstructing the data sequence yields a sequence containing only random fluctuation. Because the volume of heat supply historical data is large and the reconstructed sequence contains only random fluctuation factors and conforms to a normal distribution, abnormal values in the sequence to be processed can be identified with the 3σ principle of the Lauda criterion. The method is simple, feasible and identifies abnormal values accurately.
The 3σ principle formula of the Lauda criterion is as follows:
$$P\{\mu - 3\sigma \le X \le \mu + 3\sigma\} = 99.7\%$$
where μ is the mean of the sequence data to be processed and σ is the standard deviation of the sequence data to be processed.
The 3σ principle states that if the sequence data contain only random errors and conform to a normal distribution, 99.7% of the data fall within the interval [μ-3σ, μ+3σ] and only 0.3% fall outside it; the data outside this interval can be regarded as gross errors, i.e. abnormal values, which should be identified and located.
The 3 sigma principle of the Lauda criterion can be used for identifying abnormal values of the reconstructed sequence and positioning abnormal points.
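The 3σ check can be sketched in a few lines of Python. This is only an illustration: the function name `three_sigma_outliers`, the use of the population standard deviation (`statistics.pstdev`), and the synthetic difference sequence are assumptions, since the text does not specify how σ is estimated.

```python
from statistics import mean, pstdev


def three_sigma_outliers(values):
    """Return the indices of values lying outside [mu - 3*sigma, mu + 3*sigma]."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [i for i, v in enumerate(values) if abs(v - mu) > 3 * sigma]


# Synthetic difference sequence: small fluctuations plus one injected spike.
diffs = [0.1, -0.1] * 10
diffs[12] = 5.0

outliers = three_sigma_outliers(diffs)
print(outliers)
```

With short sequences a single extreme value inflates σ and can hide itself, so the check is most meaningful on long sequences, consistent with the large data volume assumed in the text.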
In addition to the Lauda criterion, abnormal values can also be identified by the quartile method; for details, see the method for obtaining a quartile box diagram disclosed in "a fan abnormal data processing method and apparatus based on a quartile box diagram" (publication number CN106897941A).
Step four, removing the corresponding values of the original heat supply historical data sequence according to the positioning result of the abnormal values of the sequence to be processed, forming a screened sequence with vacancy values to be filled;
The method of step three locates the positions where abnormal values occur. According to the positioning result, the corresponding values of the original heat supply historical data are removed and temporarily replaced by vacancy values, which are filled in the next step.
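A possible way to turn located abnormal points into vacancy values is sketched below. It assumes the sequence to be processed is a first-order difference, so that an abnormal difference at index k is attributed to the original value at index k + 1; this index mapping, the function name `mask_outliers`, and the use of `None` as the vacancy marker are all assumptions made for illustration.

```python
def mask_outliers(original, diff_outlier_indices):
    """Replace original values flagged through the difference sequence with None.

    Assumption: difference k equals original[k + 1] - original[k], so an
    abnormal difference k is attributed to original[k + 1].
    """
    screened = list(original)
    for k in diff_outlier_indices:
        screened[k + 1] = None
    return screened


# Hypothetical sequence with a spike at index 2, flagged by difference index 1.
screened = mask_outliers([60.0, 60.5, 75.0, 60.2], [1])
print(screened)
```

A single spike also produces a large difference on the way back down, which this simple mapping would attribute to the following (normal) point; the repeated visual check in step six is one place where such cases could be caught.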
Step five, performing visual pattern identification on the characteristics of the screened sequence, and filling the vacancy values to form a processed sequence;
Analysis of the characteristics of heat supply historical data shows that the data distribution mainly falls into four types. In the first type, the screened sequence as a whole fluctuates little, the data values fluctuating around a certain value and containing random errors, so vacancy values with this characteristic are filled by the averaging method; the specific form of such data is shown in FIG. 2.
The calculation formula of the averaging method is as follows:
$$y = \frac{1}{n}\sum_{i=1}^{n} x_i$$
wherein y is the vacancy value to be filled in the screened sequence, $x_i$ is the i-th non-vacancy value in the sequence, and n is the number of non-vacancy values in the sequence.
The averaging method thus fills vacancy values in heat supply data whose sequence shows small global fluctuation.
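The averaging method can be illustrated with a short Python sketch. The function name `fill_with_mean`, the `None` vacancy marker, and the sample values are assumptions for this example.

```python
def fill_with_mean(sequence):
    """Fill every vacancy (None) with the mean of the non-vacancy values."""
    present = [v for v in sequence if v is not None]
    average = sum(present) / len(present)
    return [average if v is None else v for v in sequence]


# Hypothetical screened sequence fluctuating around 60 with one vacancy.
filled_mean = fill_with_mean([60.0, None, 61.0, 59.0])
print(filled_mean)
```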
In the second type, the screened sequence as a whole fluctuates strongly but is locally stable, the data values fluctuating around a local mean over a certain period and containing random errors, so vacancy values with this characteristic are filled by the previous time value method; the specific form of such data is shown in FIG. 3.
The calculation formula of the previous time value method is as follows:
$$y_i = y_{i-1}$$
in the formula, $y_i$ is the vacancy value to be filled at time i in the screened sequence, and $y_{i-1}$ is the value at time i-1.
The previous time value method thus fills vacancy values in heat supply data whose sequence fluctuates strongly overall but is locally stable.
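A minimal sketch of the previous time value method follows. The function name `fill_with_previous` and the sample data are assumptions; the sketch also assumes that consecutive vacancies all inherit the last known value and that a vacancy at the very start of the sequence is left unfilled, neither of which is stated in the text.

```python
def fill_with_previous(sequence):
    """Fill each vacancy (None) with the value at the previous time step."""
    filled = list(sequence)
    for i in range(1, len(filled)):
        if filled[i] is None:
            # y_i = y_{i-1}; a run of vacancies propagates the last known value.
            filled[i] = filled[i - 1]
    return filled


# Hypothetical locally stable sequence with two consecutive vacancies.
filled_prev = fill_with_previous([64.9, None, None, 66.5])
print(filled_prev)
```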
In the third type, the data values of the screened sequence as a whole increase steadily, the value at the current moment being the value at the previous moment plus an increment, so vacancy values with this characteristic are filled by the accumulative value method; the specific form of such data is shown in FIG. 4.
The calculation formula of the accumulative value method is as follows:
$$y_i = y_m + \frac{y_n - y_m}{n - m}\,(i - m)$$
in the formula, $y_i$ is the vacancy value to be filled at time i in the screened sequence, $y_m$ is the value at time m, and $y_n$ is the value at time n. Time m is the time of the first non-vacancy value before the vacancy value to be filled, and time n is the time of the first non-vacancy value after the vacancy value to be filled.
The accumulative value method thus fills vacancy values in heat supply data whose screened sequence has this additive characteristic.
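The sketch below illustrates one reading of the accumulative value method: it assumes the increment is spread evenly between the nearest non-vacancy values at times m and n, i.e. linear interpolation. That interpretation, the function name `fill_cumulative`, and the sample meter readings are assumptions made for this example.

```python
def fill_cumulative(sequence):
    """Fill vacancies (None) between known values by adding equal increments."""
    filled = list(sequence)
    known = [i for i, v in enumerate(sequence) if v is not None]
    # Walk over each pair of neighbouring known points (m before, n after).
    for m, n in zip(known, known[1:]):
        step = (sequence[n] - sequence[m]) / (n - m)
        for i in range(m + 1, n):
            filled[i] = sequence[m] + step * (i - m)
    return filled


# Hypothetical steadily increasing heat meter reading with vacancies.
filled_cum = fill_cumulative([100.0, None, None, 130.0, None, 150.0])
print(filled_cum)
```

Vacancies before the first or after the last known value have no bracketing pair and are left unfilled in this sketch.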
In the fourth type, the screened sequence as a whole fluctuates over a wide range with no obvious regularity, so vacancy values with this characteristic are filled by a similarity algorithm; the specific form of such data is shown in FIG. 5.
A similarity algorithm judges the difference between individuals by comparing how similar two things are. There are many ways to measure similarity; the invention uses the Euclidean distance, determining the similarity of data with different attributes by calculating their Euclidean distance.
The operating parameters of a heating system are strongly correlated with outdoor meteorological factors, and the outdoor meteorological parameters are the key factors governing heat supply operation. Each vacancy value corresponds to the meteorological parameters at the same moment. The Euclidean distances between the meteorological parameters corresponding to the vacancy value and those corresponding to the non-vacancy values are calculated, the m non-vacancy value meteorological parameters closest to the vacancy value meteorological parameters are found, and the average of the non-vacancy values corresponding to these m meteorological parameters is taken as the filling item of the vacancy value. The value of m can be adjusted by heat supply professionals and is generally taken as 10.
The method comprises the following specific steps:
(1) First, the meteorological parameters corresponding to the vacancy values and the non-vacancy values are normalized by the min-max standardization method, with the following calculation formula:
$$y = \frac{x - \min\{X\}}{\max\{X\} - \min\{X\}}$$
in the formula, x is a value in the outdoor meteorological data set X, min{X} is the minimum value in the outdoor meteorological data set X, max{X} is the maximum value in the outdoor meteorological data set X, and y is the value after min-max normalization.
(2) The Euclidean distance between the meteorological parameters corresponding to the vacancy values and the meteorological parameters corresponding to the non-vacancy values is calculated by the following formula:
$$d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
wherein d is the Euclidean distance, $x_i$ is the i-th meteorological parameter corresponding to the vacancy value, $y_i$ is the i-th meteorological parameter corresponding to the non-vacancy value, and n is the number of types of meteorological parameters participating in the calculation;
(3) The m non-vacancy value meteorological parameters closest in Euclidean distance to the vacancy value meteorological parameters are found, and the average of the non-vacancy values corresponding to these m meteorological parameters is taken as the filling item of the vacancy value; m is generally 10.
The following calculation is a specific example:
Suppose a heat supply historical data set is as shown in Table 1, in which the missing water supply temperature with ID 6 needs to be filled.
Table 1. A heat supply historical data set
ID | Outdoor temperature (°C) | Solar radiation (W/m²) | Outdoor wind speed (m/s) | Water supply temperature (°C) |
1 | 8.8 | 105 | 0 | 62.8 |
2 | 9.9 | 239 | 0 | 63 |
3 | 11 | 264 | 0 | 62 |
4 | 11 | 279 | 0.8 | 61.3 |
5 | 11.3 | 286 | 0.3 | 60.7 |
6 | 11.4 | 241 | 2.2 | Vacancy value |
7 | 11.3 | 230 | 1.8 | 63.1 |
8 | 11.4 | 97 | 1.5 | 60.5 |
9 | 11 | 50 | 0.7 | 60.9 |
10 | 6.6 | 99 | 2.4 | 67.7 |
11 | 7.1 | 224 | 0.7 | 67.3 |
12 | 7.3 | 209 | 0 | 66.3 |
13 | 7.5 | 274 | 1.5 | 65.3 |
14 | 8.4 | 347 | 2.1 | 64.8 |
15 | 9.3 | 440 | 0.8 | 63.9 |
16 | 9.6 | 338 | 0.5 | 63.8 |
17 | 9.7 | 263 | 0.6 | 63.6 |
18 | 9.6 | 108 | 0.3 | 64.9 |
19 | 7.9 | 280 | 1.8 | 66.5 |
20 | 8.2 | 242 | 0.8 | 66.3 |
(1) The meteorological data are normalized. The normalized meteorological parameters are shown in Table 2.
TABLE 2 weather data after normalization
(2) The Euclidean distances between the meteorological parameters corresponding to the vacancy values and the meteorological parameters corresponding to the non-vacancy values are calculated and sorted, as shown in Table 3.
Table 3. Euclidean distance ranking
ID | Outdoor temperature (normalized) | Solar radiation (normalized) | Outdoor wind speed (normalized) | Water supply temperature (°C) | Euclidean distance |
6 | 1.00 | 0.49 | 0.92 | Vacancy value | 0.00 |
7 | 0.98 | 0.46 | 0.75 | 63.1 | 0.17 |
8 | 1.00 | 0.12 | 0.63 | 60.5 | 0.47 |
4 | 0.92 | 0.59 | 0.33 | 61.3 | 0.60 |
14 | 0.38 | 0.76 | 0.88 | 64.8 | 0.68 |
19 | 0.27 | 0.59 | 0.75 | 66.5 | 0.75 |
17 | 0.65 | 0.55 | 0.25 | 63.6 | 0.76 |
9 | 0.92 | 0.00 | 0.29 | 60.9 | 0.80 |
5 | 0.98 | 0.61 | 0.13 | 60.7 | 0.80 |
16 | 0.63 | 0.74 | 0.21 | 63.8 | 0.84 |
13 | 0.19 | 0.57 | 0.63 | 65.3 | 0.87 |
20 | 0.33 | 0.49 | 0.33 | 66.3 | 0.89 |
15 | 0.56 | 1.00 | 0.33 | 63.9 | 0.89 |
3 | 0.92 | 0.55 | 0.00 | 62 | 0.92 |
18 | 0.63 | 0.15 | 0.13 | 64.9 | 0.94 |
2 | 0.69 | 0.48 | 0.00 | 63 | 0.97 |
10 | 0.00 | 0.13 | 1.00 | 67.7 | 1.07 |
11 | 0.10 | 0.45 | 0.29 | 67.3 | 1.09 |
1 | 0.46 | 0.14 | 0.00 | 62.8 | 1.12 |
12 | 0.15 | 0.41 | 0.00 | 66.3 | 1.26 |
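For example, the Euclidean distance between the meteorological parameters with ID 1 and the meteorological parameters of the vacancy value (ID 6) is calculated from the normalized values as:
$$d = \sqrt{(1.00 - 0.46)^2 + (0.49 - 0.14)^2 + (0.92 - 0.00)^2} \approx 1.12$$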
(3) The 10 non-vacancy value meteorological parameters closest in Euclidean distance to the vacancy value meteorological parameters are found (IDs 7, 8, 4, 14, 19, 17, 9, 5, 16 and 13 in Table 3), and the average of the 10 corresponding non-vacancy values is taken as the filling item, so the water supply temperature vacancy value with ID 6 is filled as 63.05 °C.
The similarity algorithm thus fills vacancy values in heat supply data whose sequence fluctuates over a wide range with no obvious regularity.
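The worked example can be reproduced with the following Python sketch, which uses the data of Table 1. The function name `similarity_fill`, the dictionary layout, and the `None` vacancy marker are assumptions made for illustration; the sketch also assumes every meteorological column has distinct minimum and maximum values, since otherwise the min-max normalization would divide by zero.

```python
import math

# Table 1: ID -> (outdoor temp, solar radiation, wind speed, water supply temp).
table1 = {
    1: (8.8, 105, 0.0, 62.8), 2: (9.9, 239, 0.0, 63.0),
    3: (11.0, 264, 0.0, 62.0), 4: (11.0, 279, 0.8, 61.3),
    5: (11.3, 286, 0.3, 60.7), 6: (11.4, 241, 2.2, None),
    7: (11.3, 230, 1.8, 63.1), 8: (11.4, 97, 1.5, 60.5),
    9: (11.0, 50, 0.7, 60.9), 10: (6.6, 99, 2.4, 67.7),
    11: (7.1, 224, 0.7, 67.3), 12: (7.3, 209, 0.0, 66.3),
    13: (7.5, 274, 1.5, 65.3), 14: (8.4, 347, 2.1, 64.8),
    15: (9.3, 440, 0.8, 63.9), 16: (9.6, 338, 0.5, 63.8),
    17: (9.7, 263, 0.6, 63.6), 18: (9.6, 108, 0.3, 64.9),
    19: (7.9, 280, 1.8, 66.5), 20: (8.2, 242, 0.8, 66.3),
}


def similarity_fill(rows, target_id, m=10):
    """Return (fill value, IDs of the m nearest rows) for the vacancy at target_id."""
    weather = {i: r[:3] for i, r in rows.items()}
    # Step (1): min-max normalization of each meteorological column.
    lows = [min(col) for col in zip(*weather.values())]
    highs = [max(col) for col in zip(*weather.values())]
    norm = {
        i: tuple((x - lo) / (hi - lo) for x, lo, hi in zip(w, lows, highs))
        for i, w in weather.items()
    }
    # Step (2): Euclidean distance from the vacancy row to each non-vacancy row.
    candidates = [i for i, r in rows.items() if r[3] is not None]
    candidates.sort(key=lambda i: math.dist(norm[i], norm[target_id]))
    # Step (3): average the water supply temperature of the m nearest rows.
    nearest = candidates[:m]
    return sum(rows[i][3] for i in nearest) / len(nearest), nearest


fill, nearest = similarity_fill(table1, target_id=6, m=10)
print(round(fill, 2), nearest)
```

The sketch works on unrounded normalized values, whereas Table 3 lists values rounded to two decimals; for this data set the ten nearest rows and the resulting average agree with the text.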
Step six, performing visual pattern identification again and judging whether abnormal points remain in the processed sequence; if none remain, the task of preprocessing the heat supply historical data is finished; if abnormal points remain, steps one to five are repeated until the preprocessing task is completed.
Because heat supply data have the inherent characteristics of big data, such as high dimensionality, long time scale and large data volume, data preprocessing is a relatively complex process. The problems of abnormal values and vacancy values are difficult to solve in a single preprocessing pass, so the result of each pass must be judged using the professional knowledge of heat supply personnel to determine whether it meets the requirements of subsequent data mining.
Therefore, after the abnormal point identification and removal and the vacancy value filling are completed, the preprocessing result is displayed through the visual graph, the heat supply professional judges the visual result, whether the abnormal point is completely removed, whether the vacancy value is completely filled and whether the requirement of subsequent heat supply data mining is met are determined, and if the heat supply professional determines that the expected processing effect is achieved, the preprocessing process is ended; and if the heat supply professional deems that the expected treatment effect is not achieved, repeating the flow from the first step to the fifth step until the expected treatment effect is achieved, and completing the task of pretreatment.
It should be noted that the summary and the detailed description of the invention are intended to demonstrate the practical application of the technical solutions provided by the present invention, and should not be construed as limiting the scope of the present invention. Various modifications, equivalent alterations, and improvements will occur to those skilled in the art and are intended to be within the spirit and scope of the invention. Such changes and modifications are intended to be included within the scope of the appended claims.
Claims (3)
1. A heating system historical data preprocessing method is characterized by comprising the following steps:
step one, importing collected heat supply historical data and corresponding outdoor meteorological data into a computer, wherein all the outdoor meteorological data form an outdoor meteorological data set X;
the heat supply historical data is data to be preprocessed, the time scale is minute by minute or hour by hour, the heat supply historical data comprises various types of operation data which are generated in the operation process of a heat supply system and are related to time sequence, and the outdoor meteorological data is real-time meteorological data observed by a meteorological station;
step two, reconstructing the heat supply historical data to remove the trend items in the heat supply operation data and extract the random fluctuation items in the sequences, generating a group of sequences to be processed;
step three, identifying abnormal values of the sequence to be processed and positioning abnormal points;
step four, removing the corresponding values of the original heat supply historical data sequence according to the positioning result of the abnormal values of the sequence to be processed, forming a screened sequence with vacancy values to be filled;
step five, performing visual pattern identification on the characteristics of the screened sequence, and filling the empty values to form a processed sequence, wherein the sequence is divided into four cases;
in the first situation, when the overall fluctuation of the sequence data is small after the whole screening, the data value fluctuates around a certain value and contains random errors, the data vacancy value is filled by adopting an averaging method;
the calculation formula of the averaging method is as follows:
$$y = \frac{1}{n}\sum_{i=1}^{n} x_i$$
wherein y is the vacancy value to be filled in the screened sequence, $x_i$ is the i-th non-vacancy value in the sequence, and n is the number of non-vacancy values in the sequence;
in the second situation, the screened sequence as a whole fluctuates strongly but is locally stable, the data values fluctuating around a local mean over a certain period and containing random errors, and the vacancy values are filled by the previous time value method;
the calculation formula of the previous time value method is as follows:
$$y_i = y_{i-1}$$
in the formula, $y_i$ is the vacancy value to be filled at time i in the screened sequence, and $y_{i-1}$ is the value at time i-1;
in the third situation, the data values of the screened sequence as a whole increase steadily, the value at the current time being the value at the previous time plus an increment, and the vacancy values are filled by the accumulative value method;
the calculation formula of the accumulative value method is as follows:
$$y_i = y_m + \frac{y_n - y_m}{n - m}\,(i - m)$$
in the formula, $y_i$ is the vacancy value to be filled at time i in the screened sequence, $y_m$ is the value at time m, and $y_n$ is the value at time n; time m is the time of the first non-vacancy value before the vacancy value to be filled, and time n is the time of the first non-vacancy value after the vacancy value to be filled;
in the fourth situation, the screened sequence as a whole fluctuates over a wide range with no obvious regularity, and the vacancy values are filled by a similarity algorithm, with the following specific steps:
(1) first, the meteorological parameters corresponding to the vacancy values and the non-vacancy values are normalized by the min-max standardization method, with the following calculation formula:
$$y = \frac{x - \min\{X\}}{\max\{X\} - \min\{X\}}$$
in the formula, x is a value in the outdoor meteorological data set X, min{X} is the minimum value in the outdoor meteorological data set X, max{X} is the maximum value in the outdoor meteorological data set X, and y is the value after min-max normalization;
(2) calculating the Euclidean distance between the meteorological parameters corresponding to the vacancy values and the meteorological parameters corresponding to the non-vacancy values by the following formula:
$$d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
wherein d is the Euclidean distance, $x_i$ is the i-th meteorological parameter corresponding to the vacancy value, $y_i$ is the i-th meteorological parameter corresponding to the non-vacancy value, and n is the number of types of meteorological parameters participating in the calculation;
(3) The m sets of non-vacancy meteorological parameters nearest in Euclidean distance to the meteorological parameters of the vacancy value are found, and the mean of the m corresponding non-vacancy values is used to fill the vacancy value;
step six, visual pattern identification is carried out again to judge whether abnormal points remain in the processed sequence; if none remain, the task of preprocessing the heat supply historical data is complete; if abnormal points remain, steps one to five are repeated until the preprocessing task is complete.
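The two filling rules of step five that depend on neighbouring data can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `fill_cumulative` and `fill_by_similarity`, the use of NaN to mark vacancy values, and the default `m=3` are all assumptions introduced here. The cumulative value sketch further assumes that the increment is constant between the nearest non-vacancy values at times m and n, which is one reading of "previous value plus increase value".

```python
import numpy as np


def fill_cumulative(y):
    """Fill NaN gaps in a steadily increasing series.

    Assumed form: between the nearest known values y[m] and y[n], each
    vacancy is the previous value plus a constant increment.
    """
    y = np.asarray(y, dtype=float).copy()
    known = np.flatnonzero(~np.isnan(y))
    for m, n in zip(known[:-1], known[1:]):
        if n - m > 1:
            step = (y[n] - y[m]) / (n - m)  # assumed constant increment
            for i in range(m + 1, n):
                y[i] = y[i - 1] + step      # previous value plus increment
    return y


def fill_by_similarity(values, weather, m=3):
    """Fill NaN entries of `values` from the m most similar weather records.

    `weather` has one row per time step and one column per meteorological
    parameter; it is assumed to contain no missing entries.
    """
    values = np.asarray(values, dtype=float).copy()
    weather = np.asarray(weather, dtype=float)

    # (1) min-max normalization, column by column
    lo, hi = weather.min(axis=0), weather.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    scaled = (weather - lo) / span

    missing = np.isnan(values)
    known_idx = np.flatnonzero(~missing)
    for i in np.flatnonzero(missing):
        # (2) Euclidean distance to every record with a non-vacancy value
        d = np.sqrt(((scaled[known_idx] - scaled[i]) ** 2).sum(axis=1))
        # (3) mean of the values belonging to the m nearest records
        nearest = known_idx[np.argsort(d)[:m]]
        values[i] = values[nearest].mean()
    return values
```

Leading or trailing vacancies have no neighbour on one side, so `fill_cumulative` leaves them as NaN. In `fill_by_similarity`, only originally known values are averaged, so an earlier fill never feeds into a later one.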
2. The heat supply system historical data preprocessing method according to claim 1, characterized in that: using a first-order difference method, the heat supply operation data at the previous moment are subtracted from the heat supply operation data at each moment to obtain the change in the heat supply operation data at each moment, thereby removing the trend term from the heat supply operation data and extracting the random fluctuation term of the sequence.
3. The heat supply system historical data preprocessing method according to claim 1 or 2, characterized in that: abnormal values in the sequence to be processed are identified using the 3σ rule of the Pauta criterion.
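Claims 2 and 3 together amount to differencing the series and then flagging differences that lie far from their mean. The sketch below shows one way this could look. The function name `locate_outliers`, the parameter `k`, and the use of the population standard deviation are assumptions made for illustration, not details taken from the patent.

```python
import numpy as np


def locate_outliers(series, k=3.0):
    """Return positions in `series` whose first-order difference deviates
    from the mean difference by more than k standard deviations."""
    series = np.asarray(series, dtype=float)
    diff = np.diff(series)            # diff[j] = series[j + 1] - series[j]
    mu, sigma = diff.mean(), diff.std()
    flagged = np.flatnonzero(np.abs(diff - mu) > k * sigma)
    return flagged + 1                # map back to positions in `series`
```

A single spike produces two large differences, one entering it and one leaving it, so the point immediately after the spike is flagged as well. The rule also needs a reasonably long sequence: with only a handful of samples, no deviation can exceed three standard deviations.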
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810903746.1A CN109190184B (en) | 2018-08-09 | 2018-08-09 | Heat supply system historical data preprocessing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109190184A (en) | 2019-01-11 |
CN109190184B (en) | 2022-12-09 |
Family
ID=64921312
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201810903746.1A CN109190184B (en) | Active | 2018-08-09 | 2018-08-09 | Heat supply system historical data preprocessing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109190184B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110081508B (en) * | 2019-03-18 | 2021-04-30 | 天津理工大学 | Method for reducing energy consumption of regional heating system based on big data |
CN110748951A (en) * | 2019-11-01 | 2020-02-04 | 北京硕人时代科技股份有限公司 | Method, device and system for determining heat supply energy saving amount |
CN112199365A (en) * | 2020-10-26 | 2021-01-08 | 天津大学 | Abnormal identification method for monitoring data of heat supply system |
CN112559827A (en) * | 2020-12-08 | 2021-03-26 | 上海上实龙创智能科技股份有限公司 | Measurement parameter prediction and sewage treatment control method based on deep learning |
CN114964042B (en) * | 2022-05-20 | 2023-10-20 | 西安交通大学 | Method for distinguishing and identifying abnormal points in data in curve profile online measurement |
CN115933787B (en) * | 2023-03-14 | 2023-05-16 | 西安英图克环境科技有限公司 | Indoor multi-terminal intelligent control system based on indoor environment monitoring |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016101690A1 (en) * | 2014-12-22 | 2016-06-30 | 国家电网公司 | Time sequence analysis-based state monitoring data cleaning method for power transmission and transformation device |
CN106846164A (en) * | 2016-08-27 | 2017-06-13 | 董涛 | Intelligent grid data managing method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11113337B2 (en) * | 2016-09-08 | 2021-09-07 | Indian Institute Of Technology Bombay | Method for imputing missed data in sensor data sequence with missing data |
Non-Patent Citations (3)
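Title |
---|
A method for filling missing values in large-domain data streams; Zhao Fei et al.; Journal of Nanjing University (Natural Sciences); 2011-01-30 (No. 01); full text * |
A power load data preprocessing method based on time series analysis; Wang Zaiqian et al.; Technology Innovation and Application; 2018-03-08 (No. 07); full text * |
A data mining system for central heating management; Ge Shujie et al.; Journal of Heilongjiang Institute of Science and Technology; 2004-11-30 (No. 06); full text * |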
Also Published As
Publication number | Publication date |
---|---|
CN109190184A (en) | 2019-01-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||