CN114840505A

CN114840505A - Main transformer big data preprocessing method based on big data analysis platform

Info

Publication number: CN114840505A
Application number: CN202210263204.9A
Authority: CN
Inventors: 于明; 林信; 包忠强; 李波; 黄丽娟; 周恒旺; 覃晖; 郭华; 谢瑞浩
Original assignee: Guangxi Power Grid Co Ltd
Current assignee: Guangxi Power Grid Co Ltd
Priority date: 2022-03-17
Filing date: 2022-03-17
Publication date: 2022-08-02

Abstract

The invention preprocesses the current, voltage, active power and reactive power data of a main transformer extracted from a dispatching automation system. The preprocessing comprises repeated data detection and processing, abnormal data detection and processing, local outlier detection and processing, and data integrity detection and processing, and it addresses noise, anomalies, missing values, repetition and similar problems in these operating data. Big data preprocessing improves data quality and yields standardized, continuous and accurate mass data, which improves the efficiency and accuracy of subsequent main transformer big data mining and analysis. In addition, missing data completion takes into account the continuity and ordering of time, that is, the correlation between the monitoring data of the main transformer at preceding and following moments. The completed value is therefore computed as a weighted sum that follows the trend of the monitoring curve, so that it is closer to the actually monitored data and improves the accuracy of subsequent data mining.

Description

Main transformer big data preprocessing method based on big data analysis platform

Technical Field

The invention belongs to the technical field of data preprocessing, and particularly relates to a main transformer big data preprocessing method based on a big data analysis platform.

Background

A main transformer (GSU) is the principal step-down transformer used for power transmission and transformation in a generating unit or a substation, and is the core component of the substation. The transformer is also a core device of the traction power supply system of electric locomotives and a key device for ensuring the safe and stable operation of that system. A main transformer generally has a relatively large capacity and requires high operational reliability. Although its failure rate is low, a failure causes heavy losses: in mild cases equipment is damaged, and in severe cases a fire breaks out, endangering normal transportation safety. It is therefore very important to analyze the causes of transformer faults and take corresponding countermeasures. With the development of society and the progress of technology, condition-based maintenance, which reduces maintenance cost, shortens maintenance outage time and improves equipment utilization compared with periodic maintenance, has become the development direction for the maintenance of power equipment such as transformers. Whether condition-based maintenance succeeds depends on correctly grasping the operating state of the transformer.
At present, the operating state of a main transformer is grasped by monitoring it with monitoring equipment and collecting and analyzing the corresponding monitoring data. However, system errors of the monitoring equipment, network delay and other factors make the collected monitoring data abnormal, which increases the difficulty of later data analysis of the main transformer. The collected monitoring data therefore need to be preprocessed so that they are suitable for later data mining and analysis.

Disclosure of Invention

To solve the above problems, the invention provides a main transformer big data preprocessing method based on a big data analysis platform. The specific technical solution is as follows:

A main transformer big data preprocessing method based on a big data analysis platform comprises the following steps:

step S1, data acquisition and storage: extracting operating data of the main transformer from the dispatching automation system, the operating data comprising the current, voltage, active power and reactive power at each moment of operation, and storing the extracted operating data on the big data analysis platform;

step S2, repeated data detection and processing: the big data analysis platform detects repeated data in the extracted operating data of the main transformer, retains one copy of each set of repeated data, removes the redundant copies, and inputs the processed data to step S3;

step S3, abnormal data detection and processing: the big data analysis platform detects whether the extracted operating data of the main transformer are abnormal, removes any abnormal data, and inputs the processed data to step S4;

step S4, local outlier detection and processing: the big data analysis platform detects whether the extracted operating data of the main transformer contain local outliers and, if so, removes them;

step S5, data integrity detection and processing: the big data analysis platform detects whether the extracted operating data are complete, completes any missing values, and outputs the completed operating data as the final processed data.

Preferably, the repeated data detection and processing in step S2 specifically includes the following steps:

step S21: dividing the extracted operating data of the main transformer into n data blocks according to type, each data block comprising m objects; each object represents the j-th datum in the i-th data block of the k-th type of operating data, where k = 1, 2, 3, 4 denotes current, voltage, active power and reactive power respectively; i = 1, 2, ..., n; j = 1, 2, ..., m; each object is a time object, expressed as a (time, value) pair;

step S22: using an XOR operation to detect whether any two objects in each data block of the k-th type of operating data are repeated data; a result of 0 indicates that the two objects are repeated data, in which case one of them is removed, and a result of 1 indicates that they are not repeated data;

step S23: after repeated data have been removed within each data block, performing the XOR operation between any two data blocks to detect repeated data, that is, XORing each object of one data block with each object of the other data block, and retaining only one copy of any repeated data.

Preferably, the exclusive-or operation is performed on the time to eliminate the data at the repeated time.
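The repeated data detection of steps S22 and S23 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that each object is a (time, value) pair whose time is an integer timestamp, so that the XOR of two times is 0 exactly when they are equal. The names `Sample`, `dedupe_block` and `dedupe_blocks` are hypothetical.

```python
from typing import List, Tuple

# ASSUMPTION: time is an integer timestamp (e.g. seconds), value is a float.
Sample = Tuple[int, float]


def dedupe_block(block: List[Sample]) -> List[Sample]:
    """Step S22 sketch: keep the first sample for each time within one block."""
    kept: List[Sample] = []
    for t, v in block:
        # The XOR of two integer timestamps is 0 exactly when they are equal.
        if all((t ^ kept_t) != 0 for kept_t, _ in kept):
            kept.append((t, v))
    return kept


def dedupe_blocks(blocks: List[List[Sample]]) -> List[List[Sample]]:
    """Step S23 sketch: deduplicate inside each block, then across blocks."""
    cleaned = [dedupe_block(b) for b in blocks]
    for i in range(len(cleaned)):
        for j in range(i + 1, len(cleaned)):
            earlier = cleaned[i]
            # Drop from the later block every sample whose time already
            # appears in the earlier block, so one copy is retained.
            cleaned[j] = [
                (t, v) for t, v in cleaned[j]
                if all((t ^ et) != 0 for et, _ in earlier)
            ]
    return cleaned
```

The pairwise comparison mirrors the wording of the steps and is quadratic in the number of objects; for large data volumes a hash set of already seen times would give the same result in linear time.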

Preferably, the abnormal data detection and processing in step S3 specifically includes the following steps:

step S31: setting the maximum value and the minimum value of the current, the voltage, the active power and the reactive power of the main transformer on a big data analysis platform;

step S32: checking, for each type of extracted operating data, whether each value lies between the corresponding set minimum and maximum; a value that does not lie between the minimum and the maximum is judged to be abnormal data and is removed.
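Steps S31 and S32 amount to a range check per data type. The sketch below is illustrative only: the limit values in `LIMITS` are invented placeholders, not values from the invention, and the function name `remove_out_of_range` is hypothetical. It also assumes one possible reading of "the value is removed", namely that the time is kept and the value is blanked, so that the record becomes a "value missing" case for step S5.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[int, Optional[float]]  # (time, value or None)

# ASSUMPTION: placeholder (minimum, maximum) limits; real limits depend on
# the rating of the main transformer and are set on the platform (step S31).
LIMITS: Dict[str, Tuple[float, float]] = {
    "current": (0.0, 2000.0),
    "voltage": (0.0, 250.0),
    "active_power": (-300.0, 300.0),
    "reactive_power": (-150.0, 150.0),
}


def remove_out_of_range(kind: str, samples: List[Sample]) -> List[Sample]:
    """Step S32 sketch: blank every value outside [minimum, maximum]."""
    lo, hi = LIMITS[kind]
    cleaned: List[Sample] = []
    for t, v in samples:
        if v is not None and lo <= v <= hi:
            cleaned.append((t, v))
        else:
            # ASSUMPTION: keep the time, drop only the value.
            cleaned.append((t, None))
    return cleaned
```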

Preferably, the local outlier detection and processing in step S4 specifically includes the following steps:

step S41: dividing each type of extracted operating data of the main transformer into n data blocks; for each object value of a data block, initializing its distances to its (m + k) nearest neighbours to a maximum value;

step S42: calculating the distance between each object value of the operating data and each object value of the first data block, updating the (m + k) nearest neighbours of each object value in the first data block, and calculating the degree of outlier of each object value in real time; the degree of outlier is set to infinity while the number of neighbours is less than m + k, and an object value is excluded from the data block when its degree of outlier is less than an initial threshold c; the degree of outlier of an object value is the sum of the distances between that object value and its (m + 1)-th to (m + k)-th nearest neighbours;

step S43: after the first data block has been processed, sorting the object values not excluded from the first data block in descending order of degree of outlier, adding the first n object values to the TOP n outliers, and updating the threshold c;

step S44: calculating the distance between each object value of the operating data and each object value of the second data block, updating the (m + k) nearest neighbours of each object value in the second data block, and calculating the degree of outlier of each object value in real time; the degree of outlier is set to infinity while the number of neighbours is less than m + k, and an object value is excluded from the data block when its degree of outlier is less than the threshold c;

step S45: after the second data block has been processed, if the degree of outlier of an object value not excluded from the second data block is greater than a degree of outlier in the TOP n outliers, updating the TOP n outliers and the threshold c;

step S46: repeating steps S44 to S45 for the i-th data block, i = 3, 4, 5, ..., n, until all data blocks have been processed, and then outputting the TOP n outliers;

in steps S43 and S45, the threshold c is updated by setting it to the degree of outlier of the n-th of the TOP n outliers.

Preferably, the threshold value c in step S42 is set to 0.
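The block-wise TOP n search of steps S41 to S46 can be sketched as below. This is a simplified reading under stated assumptions, not the definitive algorithm: object values are treated as scalars with absolute difference as the distance, and the parameter names `block_size`, `m`, `k` and `top_n` are hypothetical (the text uses n both for the number of blocks and for the size of the TOP n list, which the sketch keeps apart). The pruning relies on the fact that the running degree of outlier can only decrease as more neighbours are seen, so an object whose running degree falls below c cannot enter the TOP n list.

```python
import bisect
import heapq
from typing import Dict, List, Tuple


def outlier_degree(dists: List[float], m: int, k: int) -> float:
    """Sum of distances to the (m+1)-th .. (m+k)-th nearest neighbours."""
    if len(dists) < m + k:
        return float("inf")  # too few neighbours seen so far
    return sum(dists[m:m + k])


def top_n_outliers(values: List[float], block_size: int,
                   m: int, k: int, top_n: int) -> List[Tuple[float, int]]:
    """Return up to top_n (degree, index) pairs, largest degree first."""
    c = 0.0  # initial threshold, set to 0 as in the preferred embodiment
    top: List[Tuple[float, int]] = []  # min-heap of the current TOP n
    for start in range(0, len(values), block_size):
        block = range(start, min(start + block_size, len(values)))
        # Sorted list of the (m + k) smallest distances seen per object.
        nearest: Dict[int, List[float]] = {i: [] for i in block}
        alive = set(block)
        for q, vq in enumerate(values):  # scan the whole data set
            for i in list(alive):
                if i == q:
                    continue
                nn = nearest[i]
                bisect.insort(nn, abs(values[i] - vq))
                del nn[m + k:]  # keep only the m + k nearest
                if outlier_degree(nn, m, k) < c:
                    alive.discard(i)  # cannot reach the TOP n any more
        for i in alive:
            deg = outlier_degree(nearest[i], m, k)
            if len(top) < top_n:
                heapq.heappush(top, (deg, i))
            elif deg > top[0][0]:
                heapq.heapreplace(top, (deg, i))
        if len(top) == top_n:
            c = top[0][0]  # degree of the n-th outlier in the TOP n list
    return sorted(top, reverse=True)
```

For a series such as `[10.0, 10.1, 9.9, 10.2, 50.0, 10.0, 9.8, 10.1]` with `top_n=1`, the value 50.0 at index 4 is the single reported outlier, and the objects of the last block are pruned as soon as m + k neighbours have been seen.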

Preferably, the data integrity detection and processing in step S5 specifically includes the following steps:

step S51: for each type of operating data processed by steps S2 to S4, data may be missing in two ways: both the time and the value are missing, or only the value is missing; checking the integrity of the processed operating data of the main transformer, that is, whether each datum contains both its time and its value, and, if data are missing, determining the corresponding type of missing data;

step S52: if both the time and the value are missing, the missing time is first filled in, which converts the case into one where only the value is missing, and the value is then completed by the method of step S53;

step S53: if only the value is missing, extracting the N values before and the N values after the moment of the missing value, and calculating the average Eq of the N preceding values and the average Eh of the N following values; taking the N preceding values as first data and the N following values as second data; in both the first data and the second data, assigning a weight λ to the value nearest to the missing value and a weight a to the second-nearest value, and assigning each of the remaining N-2 values in the first data and the second data a weight of

step S54: multiplying the N values in the first data by their corresponding weights and summing to obtain a first calculated value, and multiplying the N values in the second data by their corresponding weights and summing to obtain a second calculated value; averaging the first and second calculated values to obtain an intermediate calculated value; if the intermediate calculated value lies between the averages Eq and Eh, it is used as the completion value for the missing value; otherwise, the weights λ and a are adjusted and the calculation is repeated until the intermediate calculated value lies between Eq and Eh.
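The conversion described in steps S51 and S52, from "time and value missing" to "only value missing", can be sketched as follows. The sketch assumes integer timestamps sampled at a fixed interval `step`, which the text does not state; the function name `fill_missing_times` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[int, Optional[float]]  # (time, value or None)


def fill_missing_times(samples: List[Sample], step: int) -> List[Sample]:
    """Insert (time, None) for every expected time absent from samples.

    ASSUMPTION: times are integers on a regular grid with spacing `step`,
    starting at the earliest time present.
    """
    by_time: Dict[int, Optional[float]] = dict(samples)
    if not by_time:
        return []
    start, end = min(by_time), max(by_time)
    # After this step every gap is a "value only" gap (value is None).
    return [(t, by_time.get(t)) for t in range(start, end + 1, step)]
```

Records whose value is `None` afterwards are the ones passed on to the weighted completion of steps S53 and S54.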

Preferably, if the average values Eq and Eh are equal, the missing value at the corresponding time is the average value Eq or Eh.
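The weighted completion of steps S53 and S54, including the case of equal averages, can be sketched as below. Two points are assumptions and not taken from the text: the weight of the remaining N-2 values is not reproduced in this document, so the sketch uses (1 - λ - a)/(N - 2) so that all weights sum to 1; and the text does not say how λ and a are adjusted, so the sketch shrinks them toward the uniform weight 1/N, which drives the intermediate value toward the midpoint of Eq and Eh. The names `weights`, `impute`, `lam`, `shrink` and `max_iter` are hypothetical.

```python
from typing import List


def weights(n: int, lam: float, a: float) -> List[float]:
    """Weights ordered from the nearest to the farthest neighbour."""
    # ASSUMPTION: the remaining N-2 values share what is left of 1.
    rest = (1.0 - lam - a) / (n - 2)
    return [lam, a] + [rest] * (n - 2)


def impute(before: List[float], after: List[float], lam: float = 0.5,
           a: float = 0.3, shrink: float = 0.9, max_iter: int = 50) -> float:
    """Complete one missing value from N neighbours on each side.

    `before` and `after` are ordered nearest-first and have equal
    length N >= 3.
    """
    n = len(before)
    if n < 3 or len(after) != n:
        raise ValueError("need N >= 3 values on each side")
    eq = sum(before) / n  # average of the N preceding values
    eh = sum(after) / n   # average of the N following values
    if eq == eh:
        return eq  # equal averages: use the common average
    lo, hi = min(eq, eh), max(eq, eh)
    for _ in range(max_iter):
        w = weights(n, lam, a)
        first = sum(wi * x for wi, x in zip(w, before))
        second = sum(wi * x for wi, x in zip(w, after))
        mid = (first + second) / 2.0
        if lo <= mid <= hi:
            return mid
        # ASSUMPTION: adjust by shrinking both weights toward 1/N,
        # which preserves lam > a.
        lam = 1.0 / n + shrink * (lam - 1.0 / n)
        a = 1.0 / n + shrink * (a - 1.0 / n)
    return (eq + eh) / 2.0  # fallback if no weights satisfied the check
```

With `before = [10.0, 9.0, 8.0, 7.0]` and `after = [12.0, 13.0, 14.0, 15.0]`, the default weights give a first calculated value of 9.2, a second of 12.8 and an intermediate value of about 11.0, which lies between Eq = 8.5 and Eh = 13.5 and is accepted without adjustment.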

Preferably, if the missing value falls within the first 2 moments, the calculation uses the N values that follow it: it is observed whether these values increase or decrease over time; if they increase, the calculated intermediate value is set smaller than the average Eh of the N following values, and if they decrease, the calculated intermediate value is set larger than the average Eh of the N following values.

Preferably, if the missing value falls within the last 2 moments, the calculation uses the N values that precede it: it is observed whether these values increase or decrease over time; if they increase, the calculated intermediate value is set larger than the average Eq of the N preceding values, and if they decrease, the calculated intermediate value is set smaller than the average Eq of the N preceding values.
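The two boundary cases only fix on which side of Eh or Eq the completed value must lie, not how far from it. One way to satisfy both constraints, shown purely as an assumption, is to extend the average trend of the available N values from their mean back (or forward) to the position of the gap. The helper names `trend_slope`, `impute_head` and `impute_tail` are hypothetical, and the window is assumed to be adjacent to the gap and given in time order.

```python
from typing import List


def trend_slope(window: List[float]) -> float:
    """Average change per step over a window given in time order."""
    return (window[-1] - window[0]) / (len(window) - 1)


def impute_head(after: List[float]) -> float:
    """Gap immediately before `after`: below Eh if rising, above if falling."""
    n = len(after)
    eh = sum(after) / n
    # The mean sits (n + 1) / 2 steps after the gap on a linear trend.
    return eh - trend_slope(after) * (n + 1) / 2.0


def impute_tail(before: List[float]) -> float:
    """Gap immediately after `before`: above Eq if rising, below if falling."""
    n = len(before)
    eq = sum(before) / n
    return eq + trend_slope(before) * (n + 1) / 2.0
```

For a rising window `[2.0, 3.0, 4.0, 5.0]` (mean 3.5), `impute_head` gives 1.0 and `impute_tail` gives 6.0, so the completed value falls below Eh at the start of the series and above Eq at the end, as the two preferred cases require.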

The beneficial effects of the invention are as follows: the invention preprocesses the current, voltage, active power and reactive power data of a main transformer extracted from a dispatching automation system. The preprocessing comprises repeated data detection and processing, abnormal data detection and processing, local outlier detection and processing, and data integrity detection and processing, and it addresses noise, anomalies, missing values, repetition and similar problems in these operating data. Big data preprocessing improves data quality and yields standardized, continuous and accurate mass data, which improves the efficiency and accuracy of subsequent main transformer big data mining and analysis. In addition, missing data completion takes into account the continuity and ordering of time, that is, the correlation between the monitoring data of the main transformer at preceding and following moments. The completed value is therefore computed as a weighted sum that follows the trend of the monitoring curve, so that it is closer to the actually monitored data and improves the accuracy of subsequent data mining.

Drawings

In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.

FIG. 1 is a schematic flow chart of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As shown in fig. 1, a main transformer big data preprocessing method based on a big data analysis platform according to an embodiment of the present invention includes the following steps:

Step S1, data acquisition and storage: extracting the operating data of the main transformer from the dispatching automation system, the operating data comprising the current, voltage, active power and reactive power at each moment of operation, and storing the extracted operating data on the big data analysis platform.

Step S2, repeated data detection and processing: the big data analysis platform detects repeated data in the extracted operating data of the main transformer, retains one copy of each set of repeated data, removes the redundant copies, and inputs the processed data to step S3.

The method specifically comprises the following steps:

step S21: dividing the extracted operating data of the main transformer into n data blocks according to type, each data block comprising m objects; each object represents the j-th datum in the i-th data block of the k-th type of operating data, where k = 1, 2, 3, 4 denotes current, voltage, active power and reactive power respectively; i = 1, 2, ..., n; j = 1, 2, ..., m; each object is a time object, expressed as a (time, value) pair;

step S23: after repeated data have been removed within each data block, performing the XOR operation between any two data blocks to detect repeated data, that is, XORing each object of one data block with each object of the other data block, and retaining only one copy of any repeated data. The XOR operation is performed on the time, so that data at a repeated time are removed. The monitoring data obtained in this way have unique times: no two data items share the same time.

Step S3, abnormal data detection and processing: the big data analysis platform detects whether the extracted operating data of the main transformer are abnormal, removes any abnormal data, and inputs the processed data to step S4. The method specifically comprises the following steps:

Step S4, local outlier detection and processing: the big data analysis platform detects whether the extracted operating data of the main transformer contain local outliers and, if so, removes them.

The method specifically comprises the following steps:

step S42: calculating the distance between each object value of the operating data and each object value of the first data block, updating the (m + k) nearest neighbours of each object value in the first data block, and calculating the degree of outlier of each object value in real time; the degree of outlier is set to infinity while the number of neighbours is less than m + k, and an object value is excluded from the data block when its degree of outlier is less than an initial threshold c; the degree of outlier of an object value is the sum of the distances between that object value and its (m + 1)-th to (m + k)-th nearest neighbours; the initial threshold c is set to 0;

step S44: calculating the distance between each object value of the operating data and each object value of the second data block, updating the (m + k) nearest neighbours of each object value in the second data block, and calculating the degree of outlier of each object value in real time; the degree of outlier is set to infinity while the number of neighbours is less than m + k, and an object value is excluded from the data block when its degree of outlier is less than the threshold c;


Step S5, data integrity detection and processing: the big data analysis platform detects whether the extracted operating data are complete, completes any missing values, and outputs the completed operating data as the final processed data. The method specifically comprises the following steps:

step S51: for each type of operating data processed by steps S2 to S4, data may be missing in two ways: both the time and the value are missing, or only the value is missing; checking the integrity of the processed operating data of the main transformer, that is, whether each datum contains both its time and its value, and, if data are missing, determining the corresponding type of missing data;

step S53: if only the value is missing, extracting the N values before and the N values after the moment of the missing value, and calculating the average Eq of the N preceding values and the average Eh of the N following values; taking the N preceding values as first data and the N following values as second data; in both the first data and the second data, assigning a weight λ to the value nearest to the missing value and a weight a to the second-nearest value, and assigning each of the remaining N-2 values in the first data and the second data a weight of

Wherein λ > a;

For example, suppose the voltage value at the 12th moment is missing and N = 4. The voltage values at the 4 moments before the 12th moment are taken as the first data, where the weight of the voltage value at the 11th moment is λ, the weight of the voltage value at the 10th moment is a, and the weight of each of the voltage values at the 9th and 8th moments is

If the voltage values at the 4 moments before the 12th moment are denoted D11, D10, D9 and D8, then Eq = (D11 + D10 + D9 + D8)/4.

The voltage values at the 4 moments after the 12th moment are taken as the second data, where the weight of the voltage value at the 13th moment is λ, the weight of the voltage value at the 14th moment is a, and the weight of each of the voltage values at the 15th and 16th moments is

If the voltage values at the 4 moments after the 12th moment are denoted D13, D14, D15 and D16, then Eh = (D13 + D14 + D15 + D16)/4.

The first intermediate value is calculated as:

the second intermediate value is calculated as:

If the average values Eq and Eh are equal, the missing value at the corresponding moment is taken as the average value Eq or Eh.
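For the N = 4 voltage example, the two calculated values can be written out explicitly under one stated assumption. The weight of the two farthest values is not reproduced in this text; the block below takes it to be w = (1 - λ - a)/(N - 2), so that the weights on each side sum to 1. The symbols w, V_1, V_2 and \hat{D}_{12} are introduced here for illustration only.

```latex
\begin{aligned}
w &= \frac{1-\lambda-a}{N-2} = \frac{1-\lambda-a}{2}
  && \text{(assumed weight of the remaining values)} \\
V_1 &= \lambda D_{11} + a D_{10} + w\,(D_{9} + D_{8})
  && \text{(first calculated value)} \\
V_2 &= \lambda D_{13} + a D_{14} + w\,(D_{15} + D_{16})
  && \text{(second calculated value)} \\
\hat{D}_{12} &= \tfrac{1}{2}\,(V_1 + V_2),
  \qquad \min(E_q, E_h) \le \hat{D}_{12} \le \max(E_q, E_h)
\end{aligned}
```

If the final condition does not hold, λ and a are adjusted (keeping λ > a) and the values are recomputed.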

If the missing value falls within the first 2 moments, the calculation uses the N values that follow it: it is observed whether these values increase or decrease over time; if they increase, the calculated intermediate value is set smaller than the average Eh of the N following values, and if they decrease, the calculated intermediate value is set larger than the average Eh of the N following values.

If the missing value falls within the last 2 moments, the calculation uses the N values that precede it: it is observed whether these values increase or decrease over time; if they increase, the calculated intermediate value is set larger than the average Eq of the N preceding values, and if they decrease, the calculated intermediate value is set smaller than the average Eq of the N preceding values.

Those of ordinary skill in the art will appreciate that the elements of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components of the examples have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present application, it should be understood that the division of the unit is only one division of logical functions, and other division manners may be used in actual implementation, for example, multiple units may be combined into one unit, one unit may be split into multiple units, or some features may be omitted.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims

1. A main transformer big data preprocessing method based on a big data analysis platform, characterized in that the method comprises the following steps:

step S4: local outlier detection and processing: the big data analysis platform detects whether the extracted operation data of the main transformer has local outliers, and if the extracted operation data of the main transformer has the local outliers, the local outliers are removed;

2. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 1, wherein: the repeated data detection and processing in step S2 specifically includes the following steps:

each object represents the j-th datum in the i-th data block of the k-th type of operating data, where k = 1, 2, 3, 4 denotes current, voltage, active power and reactive power respectively; i = 1, 2, ..., n; j = 1, 2, ..., m; each object is a time object, expressed as a (time, value) pair;

step S23: after the repeated data is removed from each data block, the exclusive-or operation is performed on any two data blocks to detect whether the repeated data exists, namely, the exclusive-or operation is performed on each object of one data block and each object of the other data block, and if the repeated data exists, one data is retained.

3. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 2, wherein: the XOR operation is performed on the time, and data at a repeated time are removed.

4. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 1, wherein: the abnormal data detection and processing in step S3 specifically includes the following steps:

5. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 1, wherein: the local outlier detection and processing in step S4 specifically includes the following steps:

step S41: dividing each type of extracted operating data of the main transformer into n data blocks; initializing the distance between each object value of the data block and the (m + k) neighbor of the object value to be a maximum value;

6. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 5, wherein: the threshold value c in said step S42 is set to 0.

7. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 1, wherein: the data integrity detection and processing in step S5 specifically includes the following steps:

8. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 7, wherein: if the average values Eq and Eh are equal, the missing value at the corresponding moment is the average value Eq or Eh.

9. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 7, wherein: if the missing value falls within the first 2 moments, the calculation uses the N values that follow it; it is observed whether these values increase or decrease over time; if they increase, the calculated intermediate value is set smaller than the average Eh of the N following values, and if they decrease, the calculated intermediate value is set larger than the average Eh of the N following values.

10. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 7, wherein: if the missing value falls within the last 2 moments, the calculation uses the N values that precede it; it is observed whether these values increase or decrease over time; if they increase, the calculated intermediate value is set larger than the average Eq of the N preceding values, and if they decrease, the calculated intermediate value is set smaller than the average Eq of the N preceding values.