CN114840505A - Main transformer big data preprocessing method based on big data analysis platform - Google Patents


Info

Publication number
CN114840505A
Authority
CN
China
Prior art keywords
data
value
values
main transformer
missing
Legal status
Pending
Application number
CN202210263204.9A
Other languages
Chinese (zh)
Inventor
于明
林信
包忠强
李波
黄丽娟
周恒旺
覃晖
郭华
谢瑞浩
Current Assignee
Guangxi Power Grid Co Ltd
Original Assignee
Guangxi Power Grid Co Ltd
Application filed by Guangxi Power Grid Co Ltd
Priority to CN202210263204.9A
Publication of CN114840505A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J13/00 Circuit arrangements for providing remote indication of network conditions, e.g. an instantaneous record of the open or closed condition of each circuit breaker in the network; Circuit arrangements for providing remote control of switching means in a power distribution network, e.g. switching in and out of current consumers by using a pulse code signal carried by the network
    • H02J13/00002 Circuit arrangements for providing remote indication of network conditions, e.g. an instantaneous record of the open or closed condition of each circuit breaker in the network; Circuit arrangements for providing remote control of switching means in a power distribution network, e.g. switching in and out of current consumers by using a pulse code signal carried by the network characterised by monitoring
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J13/00 Circuit arrangements for providing remote indication of network conditions, e.g. an instantaneous record of the open or closed condition of each circuit breaker in the network; Circuit arrangements for providing remote control of switching means in a power distribution network, e.g. switching in and out of current consumers by using a pulse code signal carried by the network
    • H02J13/00032 Systems characterised by the controlled or operated power network elements or equipment, the power network elements or equipment not otherwise provided for
    • H02J13/00034 Systems characterised by the controlled or operated power network elements or equipment, the power network elements or equipment not otherwise provided for the elements or equipment being or involving an electric power substation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention processes the current, voltage, active power and reactive power data of a main transformer extracted from a dispatching automation system. The processing comprises duplicate data detection and processing, abnormal data detection and processing, local outlier detection and processing, and data integrity detection and processing, and handles problems such as noise, anomalies, missing values and duplicates in this operation data. Big data preprocessing improves data quality and yields standardized, continuous and accurate mass data, which improves the efficiency and accuracy of subsequent big data mining and analysis of the main transformer. In addition, missing-data completion takes into account the temporal continuity of the monitoring data and prior information, i.e. the correlation between monitoring values at preceding and following moments: the missing value is computed as a weighted sum that follows the trend of the monitoring curve, so the completed value is more realistic and accurate, closer to the actually monitored data, and improves the accuracy of subsequent data mining.

Description

Main transformer big data preprocessing method based on big data analysis platform
Technical Field
The invention belongs to the technical field of data preprocessing, and particularly relates to a main transformer big data preprocessing method based on a big data analysis platform.
Background
The main transformer (GSU) is the principal step-down transformer used for power transmission and transformation in a generating unit or substation, and is the core component of the substation. It is also a core device of the traction power supply system for electric locomotives and a key device for ensuring the safe and stable operation of that system. A main transformer generally has a large capacity and must operate with high reliability. Although its failure rate is low, a failure can cause heavy losses: it can damage other equipment or even start a fire, endangering normal transport safety. It is therefore very important to analyze the causes of transformer faults and take corresponding countermeasures. With social development and technological progress, condition-based maintenance, which compared with periodic maintenance reduces maintenance cost, shortens maintenance outages and improves equipment utilization, has become the development direction for maintaining power equipment such as transformers. Whether condition-based maintenance succeeds depends on correctly grasping the operating state of the transformer.
At present, monitoring equipment is used to monitor the main transformer and to collect and analyze the corresponding monitoring data so that its operating state can be correctly grasped. However, system errors of the monitoring equipment, network delay and other factors make some of the collected monitoring data abnormal, which makes later data analysis of the main transformer more difficult. The collected monitoring data therefore need to be preprocessed so that they are suitable for later data mining and analysis.
Disclosure of Invention
To solve the above problems, the invention provides a main transformer big data preprocessing method based on a big data analysis platform, with the following technical scheme:
the main transformer big data preprocessing method based on the big data analysis platform comprises the following steps:
step S1, data acquisition and storage: extract the operation data of the main transformer from the dispatching automation system, the operation data comprising the current, voltage, active power and reactive power at each moment of operation, and store the extracted operation data on the big data analysis platform;
step S2, duplicate data detection and processing: the big data analysis platform detects duplicate records in the extracted operation data of the main transformer, keeps one copy of each set of duplicates, removes the redundant copies, and passes the processed data to step S3;
step S3, abnormal data detection and processing: the big data analysis platform detects whether the extracted operation data of the main transformer contain abnormal values, removes any abnormal data, and passes the processed data to step S4;
step S4, local outlier detection and processing: the big data analysis platform detects whether the extracted operation data of the main transformer contain local outliers and, if so, removes them;
step S5, data integrity detection and processing: the big data analysis platform detects whether the extracted operation data are complete, fills in any missing values, and outputs the completed operation data as the final processed data.
Preferably, the duplicate data detection and processing in step S2 specifically comprises the following steps:
step S21: divide each type of extracted operation data of the main transformer into n data blocks, each data block containing m objects. An object is denoted by the symbol shown in Figure BDA0003551542900000021, which represents the jth data item in the ith data block of the kth type of operation data, where k = 1, 2, 3, 4 denotes current, voltage, active power and reactive power respectively, i = 1, 2, …, n, and j = 1, 2, …, m. The object shown in Figure BDA0003551542900000022 is a time object, expressed as a (time, value) pair;
step S22: use an exclusive-or (XOR) operation to check every pair of objects within each data block of the kth type of operation data for duplicates: a result of 0 indicates that the two objects are duplicates, and one of them is removed; a result of 1 indicates that they are not duplicates;
step S23: after duplicates have been removed within each data block, apply the XOR operation between every pair of data blocks, i.e. XOR each object of one data block with each object of the other, and if duplicates exist, keep only one copy.
Preferably, the XOR operation is applied to the timestamps, so that data with repeated timestamps are removed.
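The duplicate-removal steps S21 to S23 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each record is a (timestamp, value) pair with an integer timestamp (for example Unix seconds), applies XOR to the timestamps as in the preferred embodiment, and treats any non-zero XOR result as "not a duplicate" (for multi-bit integers the result is not literally 1). The helper names `split_blocks`, `is_duplicate`, `dedup_within` and `dedup_blocks` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical record layout: (integer timestamp, measured value).
Record = Tuple[int, float]


def split_blocks(records: List[Record], n: int) -> List[List[Record]]:
    """Step S21 sketch: split one type of operation data into at most n contiguous blocks."""
    size = max(1, -(-len(records) // n))  # ceiling division, never zero
    return [records[i:i + size] for i in range(0, len(records), size)]


def is_duplicate(a: Record, b: Record) -> bool:
    """XOR of two equal integer timestamps is 0; any non-zero result means they differ."""
    return (a[0] ^ b[0]) == 0


def dedup_within(block: List[Record]) -> List[Record]:
    """Step S22 sketch: compare every pair of objects in one block, keep the first copy."""
    kept: List[Record] = []
    for rec in block:
        if not any(is_duplicate(rec, k) for k in kept):
            kept.append(rec)
    return kept


def dedup_blocks(blocks: List[List[Record]]) -> List[List[Record]]:
    """Step S23 sketch: after in-block cleaning, compare every pair of blocks."""
    cleaned = [dedup_within(b) for b in blocks]
    for i in range(len(cleaned)):
        for j in range(i + 1, len(cleaned)):
            # Drop records in block j whose timestamp already appears in block i.
            cleaned[j] = [r for r in cleaned[j]
                          if not any(is_duplicate(r, k) for k in cleaned[i])]
    return cleaned


if __name__ == "__main__":
    data = [(1, 10.0), (2, 10.5), (2, 10.5), (3, 11.0), (1, 10.0), (4, 11.2)]
    blocks = dedup_blocks(split_blocks(data, 2))
    print([r for b in blocks for r in b])  # [(1, 10.0), (2, 10.5), (3, 11.0), (4, 11.2)]
```

Comparing timestamps only follows the preferred embodiment. For large data a set of already-seen timestamps would replace the pairwise scans; the loops here simply mirror the pairwise XOR comparisons that steps S22 and S23 describe.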
Preferably, the abnormal data detection and processing in step S3 specifically comprises the following steps:
step S31: set maximum and minimum values for the current, voltage, active power and reactive power of the main transformer on the big data analysis platform;
step S32: check whether each type of extracted operation data lies between the corresponding configured minimum and maximum; any value outside this range is judged to be abnormal and is removed.
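Steps S31 and S32 amount to a per-type range check. The sketch below is illustrative only: the limit values in `LIMITS` are placeholders, not figures from the patent, and it interprets "numerical value elimination" as blanking the value to `None` while keeping its timestamp, which would then show up as a value-only gap in step S51. The function name `remove_out_of_range` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, Optional[float]]  # (timestamp, value or None)

# Placeholder limits for illustration only; real limits would be configured
# on the platform from the transformer's ratings (step S31).
LIMITS: Dict[str, Tuple[float, float]] = {
    "current": (0.0, 2000.0),
    "voltage": (100.0, 130.0),
    "active_power": (-200.0, 200.0),
    "reactive_power": (-100.0, 100.0),
}


def remove_out_of_range(kind: str, records: List[Record],
                        limits: Dict[str, Tuple[float, float]] = LIMITS) -> List[Record]:
    """Step S32 sketch: blank any value outside [min, max] for its data type."""
    lo, hi = limits[kind]
    cleaned: List[Record] = []
    for t, v in records:
        if v is not None and not (lo <= v <= hi):
            v = None  # abnormal value removed; timestamp kept (assumed interpretation)
        cleaned.append((t, v))
    return cleaned
```

The bounds are treated as inclusive, matching "between the minimum value and the maximum value".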
Preferably, the local outlier detection and processing in step S4 specifically comprises the following steps:
step S41: divide each type of extracted operation data of the main transformer into n data blocks, and initialize the distances from each object value in a data block to its (m + k) nearest neighbours to the maximum value;
step S42: compute the distance between every object value of the operation data and each object value of the first data block, update the (m + k) nearest neighbours of each object value in the first data block, and compute the outlier degree of each object value in real time. While an object value has fewer than m + k neighbours, its outlier degree is set to infinity; once its outlier degree falls below the initial threshold c, the object value is excluded from the data block. The outlier degree of an object value is the sum of its distances to its (m + 1)th through (m + k)th nearest neighbours;
step S43: after the first data block has been processed, sort the object values that were not excluded from the first data block in descending order of outlier degree, add the first n of them to the TOP n outlier set, and update the threshold c;
step S44: compute the distance between every object value of the operation data and each object value of the second data block, update the (m + k) nearest neighbours of each object value in the second data block, and compute the outlier degree of each object value in real time, setting it to infinity while the object value has fewer than m + k neighbours and excluding the object value from the data block once its outlier degree falls below the threshold c;
step S45: after the second data block has been processed, if any object value not excluded from the second data block has an outlier degree greater than that of a member of the TOP n outlier set, update the TOP n outlier set and the threshold c;
step S46: repeat steps S44 to S45 for the ith data block, i = 3, 4, 5, …, n, until all data blocks have been processed, and output the TOP n outliers;
in steps S43 and S45, the threshold c is updated by setting it to the outlier degree of the nth (i.e. lowest-ranked) member of the TOP n outlier set.
Preferably, the initial threshold c in step S42 is set to 0.
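Steps S41 to S46 describe a block-wise, top-n distance-based outlier search with a pruning threshold c. The sketch below is one reading of those steps under stated assumptions: each data type is handled separately as a 1-D list of values, the distance is the absolute difference, `top_n` plays the role of the TOP n set and `n_blocks` the number of data blocks (the patent uses n for both), and m and k are free parameters. Because the sum of the (m + 1)th to (m + k)th neighbour distances can only shrink as more points are scanned, a candidate whose running outlier degree drops below c can never enter the TOP n set, so excluding it early is safe. The function name `top_n_outliers` is hypothetical.

```python
import heapq
import math
from typing import List, Tuple


def top_n_outliers(values: List[float], n_blocks: int, top_n: int,
                   m: int, k: int) -> List[Tuple[float, int]]:
    """Return up to top_n (outlier_degree, index) pairs, highest degree first."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    need = m + k
    size = max(1, math.ceil(len(values) / n_blocks))
    c = 0.0                                   # initial threshold (step S42)
    top: List[Tuple[float, int]] = []         # current TOP n set
    for start in range(0, len(values), size):  # one data block per pass
        survivors = []
        for i in range(start, min(start + size, len(values))):
            heap: List[float] = []            # negated distances: max-heap of the m+k smallest
            degree = math.inf                 # infinite until m+k neighbours are known
            excluded = False
            for j, y in enumerate(values):    # scan all operation data
                if j == i:
                    continue
                d = abs(values[i] - y)
                if len(heap) < need:
                    heapq.heappush(heap, -d)
                elif d < -heap[0]:
                    heapq.heapreplace(heap, -d)
                if len(heap) == need:
                    nearest = sorted(-x for x in heap)
                    degree = sum(nearest[m:need])  # (m+1)-th .. (m+k)-th neighbours
                    if degree < c:            # can no longer reach the TOP n set
                        excluded = True
                        break
            if not excluded:
                survivors.append((degree, i))
        # Steps S43/S45: merge survivors into the TOP n set, then raise c.
        top = sorted(top + survivors, reverse=True)[:top_n]
        if len(top) == top_n:
            c = top[-1][0]
    return top
```

With m = 0 the outlier degree reduces to the sum of distances to the k nearest neighbours. Removing the returned indices from the series gives the cleaned output of step S4.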
Preferably, the data integrity detection and processing in step S5 specifically comprises the following steps:
step S51: after processing in steps S2 to S4, each type of operation data may have two kinds of missing data: both the time and the value are missing, or only the value is missing. Check the integrity of the processed operation data of the main transformer, i.e. whether each record contains both its time and its value, and if data are missing, determine which kind of missing data it is;
step S52: if both the time and the value are missing, first fill in the missing time, which converts the case into a value-only missing case, and then fill in the value using the method of step S53;
step S53: if only the value is missing, take the N values before and the N values after the time of the missing value, and compute the average Eq of the N preceding values and the average Eh of the N following values. Take the N preceding values as the first data and the N following values as the second data. In both the first and the second data, assign a weight λ to the value adjacent to the missing value, a weight a to the second-nearest value, and the weight shown in Figure BDA0003551542900000041 to each of the remaining N - 2 values;
step S54: multiply each of the N values in the first data by its weight and sum the products to obtain a first calculated value; do the same for the second data to obtain a second calculated value; average the two to obtain an intermediate calculated value. If the intermediate calculated value lies between the averages Eq and Eh, use it as the fill-in value for the missing data; otherwise, adjust the weights λ and a and repeat the calculation until the intermediate calculated value lies between Eq and Eh.
Preferably, if the averages Eq and Eh are equal, the missing value at the corresponding time is set to Eq (equivalently Eh).
Preferably, if the missing value falls at one of the first 2 moments, take the N values immediately following it and observe whether they increase or decrease over time. If they show an increasing trend, the calculated intermediate value is set smaller than the average Eh of these N following values; if they show a decreasing trend, it is set larger than Eh.
Preferably, if the missing value falls at one of the last 2 moments, take the N values immediately preceding it and observe whether they increase or decrease over time. If they show an increasing trend, the calculated intermediate value is set larger than the average Eq of these N preceding values; if they show a decreasing trend, it is set smaller than Eq.
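The two kinds of missing data in steps S51 and S52 can be told apart when the sampling interval is known. The sketch assumes a fixed sampling interval `step`, integer timestamps aligned to that grid starting at the first timestamp, and removed values stored as `None`; the function names `classify_missing` and `restore_timestamps` are hypothetical. Timestamps missing before the first or after the last record cannot be detected without an expected start and end, so both are optional parameters.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, Optional[float]]  # (timestamp, value or None)


def classify_missing(records: List[Record], step: int,
                     start: Optional[int] = None,
                     end: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Step S51 sketch: timestamps missing entirely, and timestamps missing only a value."""
    present = dict(records)
    start = min(present) if start is None else start
    end = max(present) if end is None else end
    both_missing, value_missing = [], []
    for t in range(start, end + 1, step):
        if t not in present:
            both_missing.append(t)      # time and value both absent
        elif present[t] is None:
            value_missing.append(t)     # time present, value absent
    return both_missing, value_missing


def restore_timestamps(records: List[Record], step: int,
                       start: Optional[int] = None,
                       end: Optional[int] = None) -> List[Record]:
    """Step S52 sketch: re-insert missing timestamps so every gap becomes value-only."""
    present = dict(records)
    start = min(present) if start is None else start
    end = max(present) if end is None else end
    return [(t, present.get(t)) for t in range(start, end + 1, step)]
```

After `restore_timestamps`, every remaining gap is a `None` value at a known time, which is the input that the weighted completion of steps S53 and S54 expects.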
The beneficial effects of the invention are as follows: the invention processes the current, voltage, active power and reactive power data of the main transformer extracted from a dispatching automation system through duplicate data detection and processing, abnormal data detection and processing, local outlier detection and processing, and data integrity detection and processing, and can handle problems such as noise, anomalies, missing values and duplicates in this operation data. Big data preprocessing improves data quality and yields standardized, continuous and accurate mass data, which improves the efficiency and accuracy of subsequent big data mining and analysis of the main transformer. In addition, missing-data completion takes into account the temporal continuity of the monitoring data and prior information, i.e. the correlation between monitoring values at preceding and following moments: the missing value is computed as a weighted sum that follows the trend of the monitoring curve, so the completed value is more realistic and accurate, closer to the actually monitored data, and improves the accuracy of subsequent data mining.
Drawings
To describe the specific embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed for that description are briefly introduced below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As shown in fig. 1, a main transformer big data preprocessing method based on a big data analysis platform according to an embodiment of the present invention includes the following steps:
Step S1, data acquisition and storage: extract the operation data of the main transformer from the dispatching automation system, the operation data comprising the current, voltage, active power and reactive power at each moment of operation, and store the extracted operation data on the big data analysis platform.
Step S2, duplicate data detection and processing: the big data analysis platform detects duplicate records in the extracted operation data of the main transformer, keeps one copy of each set of duplicates, removes the redundant copies, and passes the processed data to step S3.
The method specifically comprises the following steps:
step S21: divide each type of extracted operation data of the main transformer into n data blocks, each data block containing m objects. An object is denoted by the symbol shown in Figure BDA0003551542900000071, which represents the jth data item in the ith data block of the kth type of operation data, where k = 1, 2, 3, 4 denotes current, voltage, active power and reactive power respectively, i = 1, 2, …, n, and j = 1, 2, …, m. The object shown in Figure BDA0003551542900000072 is a time object, expressed as a (time, value) pair;
step S22: use an exclusive-or (XOR) operation to check every pair of objects within each data block of the kth type of operation data for duplicates: a result of 0 indicates that the two objects are duplicates, and one of them is removed; a result of 1 indicates that they are not duplicates;
step S23: after duplicates have been removed within each data block, apply the XOR operation between every pair of data blocks, i.e. XOR each object of one data block with each object of the other, and if duplicates exist, keep only one copy. The XOR operation is applied to the timestamps, so data at repeated timestamps are removed. As a result, every timestamp in the monitoring data is unique: no two records share the same time.
Step S3, abnormal data detection and processing: the big data analysis platform detects whether the extracted operation data of the main transformer contain abnormal values, removes any abnormal data, and passes the processed data to step S4. This step specifically comprises the following steps:
step S31: set maximum and minimum values for the current, voltage, active power and reactive power of the main transformer on the big data analysis platform;
step S32: check whether each type of extracted operation data lies between the corresponding configured minimum and maximum; any value outside this range is judged to be abnormal and is removed.
Step S4, local outlier detection and processing: the big data analysis platform detects whether the extracted operation data of the main transformer contain local outliers and, if so, removes them.
The method specifically comprises the following steps:
step S41: divide each type of extracted operation data of the main transformer into n data blocks, and initialize the distances from each object value in a data block to its (m + k) nearest neighbours to the maximum value;
step S42: compute the distance between every object value of the operation data and each object value of the first data block, update the (m + k) nearest neighbours of each object value in the first data block, and compute the outlier degree of each object value in real time. While an object value has fewer than m + k neighbours, its outlier degree is set to infinity; once its outlier degree falls below the initial threshold c, the object value is excluded from the data block. The outlier degree of an object value is the sum of its distances to its (m + 1)th through (m + k)th nearest neighbours. The initial threshold c is set to 0;
step S43: after the first data block has been processed, sort the object values that were not excluded from the first data block in descending order of outlier degree, add the first n of them to the TOP n outlier set, and update the threshold c;
step S44: compute the distance between every object value of the operation data and each object value of the second data block, update the (m + k) nearest neighbours of each object value in the second data block, and compute the outlier degree of each object value in real time, setting it to infinity while the object value has fewer than m + k neighbours and excluding the object value from the data block once its outlier degree falls below the threshold c;
step S45: after the second data block has been processed, if any object value not excluded from the second data block has an outlier degree greater than that of a member of the TOP n outlier set, update the TOP n outlier set and the threshold c;
step S46: repeat steps S44 to S45 for the ith data block, i = 3, 4, 5, …, n, until all data blocks have been processed, and output the TOP n outliers;
in steps S43 and S45, the threshold c is updated by setting it to the outlier degree of the nth (i.e. lowest-ranked) member of the TOP n outlier set.
Preferably, in step S42 the initial threshold c is set to 0.
Step S5, data integrity detection and processing: the big data analysis platform detects whether the extracted operation data are complete, fills in any missing values, and outputs the completed operation data as the final processed data. This step specifically comprises the following steps:
step S51: after processing in steps S2 to S4, each type of operation data may have two kinds of missing data: both the time and the value are missing, or only the value is missing. Check the integrity of the processed operation data of the main transformer, i.e. whether each record contains both its time and its value, and if data are missing, determine which kind of missing data it is;
step S52: if both the time and the value are missing, first fill in the missing time, which converts the case into a value-only missing case, and then fill in the value using the method of step S53;
step S53: if only the value is missing, take the N values before and the N values after the time of the missing value, and compute the average Eq of the N preceding values and the average Eh of the N following values. Take the N preceding values as the first data and the N following values as the second data. In both the first and the second data, assign a weight λ to the value adjacent to the missing value, a weight a to the second-nearest value, and the weight shown in Figure BDA0003551542900000101 to each of the remaining N - 2 values, where λ > a;
step S54: multiply each of the N values in the first data by its weight and sum the products to obtain a first calculated value; do the same for the second data to obtain a second calculated value; average the two to obtain an intermediate calculated value. If the intermediate calculated value lies between the averages Eq and Eh, use it as the fill-in value for the missing data; otherwise, adjust the weights λ and a and repeat the calculation until the intermediate calculated value lies between Eq and Eh.
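Steps S53 and S54 can be sketched as below. Two details are not given in the text and are assumptions here: the remaining N - 2 weights (shown only in a figure) are taken as (1 - λ - a)/(N - 2), so that all N weights sum to 1, and the rule for "adjusting λ and a" is a hypothetical one that moves both step by step toward the uniform weight 1/N. With uniform weights the first and second calculated values equal Eq and Eh exactly, so their average lies between them and the loop always finds an acceptable value; a final fallback returns (Eq + Eh)/2 to absorb floating-point rounding. The defaults λ = 0.4 and a = 0.3 and the names `weights` and `impute_gap` are illustrative only.

```python
from typing import List, Sequence


def weights(n: int, lam: float, a: float) -> List[float]:
    """Nearest-first weights: lam, a, then N-2 equal weights (assumed form)."""
    if n < 3 or not (lam >= a >= 0.0) or lam + a > 1.0:
        raise ValueError("need N >= 3, lam >= a >= 0 and lam + a <= 1")
    rest = (1.0 - lam - a) / (n - 2)  # ASSUMPTION: all weights sum to 1
    return [lam, a] + [rest] * (n - 2)


def impute_gap(before: Sequence[float], after: Sequence[float],
               lam: float = 0.4, a: float = 0.3, steps: int = 10) -> float:
    """Fill one missing value from the N values before it and the N values after it.

    `before` and `after` are in chronological order, so before[-1] and
    after[0] are the values adjacent to the gap.
    """
    n = len(before)
    if len(after) != n or steps < 1:
        raise ValueError("need N values on each side and steps >= 1")
    eq, eh = sum(before) / n, sum(after) / n
    if eq == eh:
        return eq                               # preferred embodiment: Eq == Eh
    lo, hi = min(eq, eh), max(eq, eh)
    for t in range(steps + 1):
        mix = t / steps                         # 0 = given weights, 1 = uniform 1/N
        w = weights(n, (1 - mix) * lam + mix / n, (1 - mix) * a + mix / n)
        first = sum(wi * x for wi, x in zip(w, reversed(before)))  # nearest value first
        second = sum(wi * x for wi, x in zip(w, after))
        middle = (first + second) / 2           # intermediate calculated value
        if lo <= middle <= hi:
            return middle
    return (eq + eh) / 2                        # fallback for rounding at mix = 1
```

For the N = 4 voltage example that follows in the text, `before` would hold the values at moments 8 to 11 and `after` the values at moments 13 to 16.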
For example, suppose the voltage value at the 12th moment is missing and N = 4. The 4 voltage values before the 12th moment are taken as the first data: the voltage value at the 11th moment has weight λ, the value at the 10th moment has weight a, and the values at the 9th and 8th moments each have the weight shown in Figure BDA0003551542900000102.
Denoting the 4 voltage values before the 12th moment by $D_{11}$, $D_{10}$, $D_{9}$ and $D_{8}$, $Eq = (D_{11} + D_{10} + D_{9} + D_{8})/4$.
The 4 voltage values after the 12th moment are taken as the second data: the voltage value at the 13th moment has weight λ, the value at the 14th moment has weight a, and the values at the 15th and 16th moments each have the weight shown in Figure BDA0003551542900000103.
Denoting the 4 voltage values after the 12th moment by $D_{13}$, $D_{14}$, $D_{15}$ and $D_{16}$, $Eh = (D_{13} + D_{14} + D_{15} + D_{16})/4$.
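The first calculated value (Figure BDA0003551542900000104) is the weighted sum of the first data, $S_q = \lambda D_{11} + a D_{10} + w\,(D_{9} + D_{8})$, and the second calculated value (Figure BDA0003551542900000105) is the weighted sum of the second data, $S_h = \lambda D_{13} + a D_{14} + w\,(D_{15} + D_{16})$, where $w$ denotes the weight shown in Figure BDA0003551542900000102. As in step S54, the intermediate calculated value is $(S_q + S_h)/2$.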
If the averages Eq and Eh are equal, the missing value at the corresponding moment is set to Eq (equivalently Eh).
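For the N = 4 example, a short derivation shows why adjusting λ and a can always bring the intermediate value between Eq and Eh. It assumes, for illustration only, that the two remaining weights each equal (1 - λ - a)/2, i.e. that the four weights sum to 1; the patent gives the actual weight only in a figure.

$$
\begin{aligned}
S_q &= \lambda D_{11} + a D_{10} + \tfrac{1-\lambda-a}{2}\,(D_{9} + D_{8}), \\
S_h &= \lambda D_{13} + a D_{14} + \tfrac{1-\lambda-a}{2}\,(D_{15} + D_{16}), \\
\hat{D}_{12} &= \tfrac{1}{2}\,(S_q + S_h).
\end{aligned}
$$

With λ = a = 1/4 the remaining weights are also 1/4, so $S_q = Eq$ and $S_h = Eh$, and $\hat{D}_{12} = (Eq + Eh)/2$, which always lies between Eq and Eh. Moving λ and a toward 1/4 therefore yields an acceptable value at the latest when both reach 1/4.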
If the missing values are at the first 2 moments, the N values immediately following them are examined to see whether they increase or decrease over time. If they increase, the calculated intermediate value is set smaller than Eh, the average of those following N values; if they decrease, it is set larger than Eh.
If the missing values are at the last 2 moments, the N values immediately preceding them are examined to see whether they increase or decrease over time. If they increase, the calculated intermediate value is set larger than Eq, the average of those preceding N values; if they decrease, it is set smaller than Eq.
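For these boundary cases the text fixes only the direction of the inequality, not the value itself. The sketch below is one hypothetical way to satisfy it, not the patent's method: the trend is taken from the endpoints of the N observed values, the missing value is linearly extrapolated, and the result is then clamped so that it lands on the required side of Eh (leading gap) or Eq (trailing gap). The function name and the `distance` parameter are illustrative.

```python
from typing import Sequence


def impute_boundary(window: Sequence[float], distance: int,
                    at_start: bool) -> float:
    """Fill a value missing at the first 2 (at_start=True) or last 2
    (at_start=False) moments from the N observed values on the other side.

    `window` holds those N values in time order; `distance` (1 or 2) is how
    many steps the missing moment lies from the nearest observed value.
    HYPOTHETICAL rule: detect the trend from the window's endpoints, linearly
    extrapolate, then clamp so the inequality required by the text holds.
    """
    n = len(window)
    if n < 2:
        raise ValueError("need at least 2 observed values")
    mean = sum(window) / n  # Eh for a leading gap, Eq for a trailing gap
    slope = (window[-1] - window[0]) / (n - 1)
    if slope == 0:
        return mean  # no trend detected: hypothetical fallback
    if at_start:
        candidate = window[0] - slope * distance
        if slope > 0:  # values increase: result must be below Eh
            return min(candidate, min(window))
        return max(candidate, max(window))  # values decrease: above Eh
    candidate = window[-1] + slope * distance
    if slope > 0:  # values increase: result must be above Eq
        return max(candidate, max(window))
    return min(candidate, min(window))  # values decrease: below Eq
```

The clamp is what guarantees the inequality: a non-zero endpoint slope means the window is not constant, so its minimum is strictly below its mean and its maximum strictly above it.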
Those of ordinary skill in the art will appreciate that the elements of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components of the examples have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present application, it should be understood that the division of the unit is only one division of logical functions, and other division manners may be used in actual implementation, for example, multiple units may be combined into one unit, one unit may be split into multiple units, or some features may be omitted.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims (10)

1. A main transformer big data preprocessing method based on a big data analysis platform, characterized by comprising the following steps:
step S1, data acquisition and storage: extracting operation data of the main transformer from the dispatching automation system, wherein the operation data comprises current, voltage, active power and reactive power at each moment in operation, and storing the extracted operation data to a big data analysis platform;
step S2, repeated data detection and processing: the big data analysis platform detects repeated data in the extracted operating data of the main transformer, retains one copy of each set of repeated data, removes the redundant copies, and inputs the processed data to step S3;
step S3: abnormal data detection and processing: the big data analysis platform detects whether the extracted running data of the main transformer is abnormal, if the extracted running data is abnormal, the abnormal data is removed, and the processed data is input to the step S4;
step S4: local outlier detection and processing: the big data analysis platform detects whether the extracted operation data of the main transformer has local outliers, and if the extracted operation data of the main transformer has the local outliers, the local outliers are removed;
step S5, data integrity detection and processing: the big data analysis platform detects whether the extracted operation data are complete, fills in any missing values, and outputs the completed operation data as the final processed data.
2. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 1, wherein: the repeated data detection and processing in step S2 specifically includes the following steps:
step S21: dividing the extracted operating data of the main transformer by type into n data blocks, each data block comprising m objects; an object represents the j-th datum in the i-th data block of the k-th type of operating data, where k = 1, 2, 3, 4 denotes current, voltage, active power and reactive power respectively, i = 1, 2, …, n, and j = 1, 2, …, m; each object is a time object, expressed as a (time, value) pair;
step S22: using an XOR operation to detect whether any two objects in each data block of the k-th type of operating data are duplicates: a result of 0 indicates that the two objects are repeated data, and one of them is removed; a non-zero result indicates that they are not repeated;
step S23: after the repeated data is removed from each data block, the exclusive-or operation is performed on any two data blocks to detect whether the repeated data exists, namely, the exclusive-or operation is performed on each object of one data block and each object of the other data block, and if the repeated data exists, one data is retained.
3. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 2, wherein: the XOR operation is performed on the times, and data at repeated times are removed.
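Steps S22 and S23, with the XOR applied to the times as in claim 3, can be sketched as follows. This is an illustration under assumptions: timestamps are encoded as integers (for example epoch seconds) so that XOR is defined, and the first occurrence of each repeated time is the one retained.

```python
from typing import List, Tuple

Obj = Tuple[int, float]  # (time as an integer, value): ASSUMED encoding


def is_duplicate(x: Obj, y: Obj) -> bool:
    # Claim 3: XOR the times; a result of 0 means the two objects share a time.
    return (x[0] ^ y[0]) == 0


def dedupe_block(block: List[Obj]) -> List[Obj]:
    """Step S22: pairwise XOR within one block, keeping the first copy."""
    kept: List[Obj] = []
    for obj in block:
        if not any(is_duplicate(obj, k) for k in kept):
            kept.append(obj)
    return kept


def dedupe_blocks(blocks: List[List[Obj]]) -> List[List[Obj]]:
    """Step S23: after S22, compare every object of a block with every object
    of each earlier block and drop later duplicates, so one copy remains."""
    cleaned = [dedupe_block(b) for b in blocks]
    result: List[List[Obj]] = []
    for block in cleaned:
        survivors = [o for o in block
                     if not any(is_duplicate(o, p) for prev in result for p in prev)]
        result.append(survivors)
    return result
```

The pairwise comparison mirrors the claim and is quadratic in the number of objects; a set of already-seen timestamps would give the same result in linear time.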
4. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 1, wherein: the abnormal data detection and processing in step S3 specifically includes the following steps:
step S31: setting the maximum value and the minimum value of the current, the voltage, the active power and the reactive power of the main transformer on a big data analysis platform;
step S32: detecting, for each type of extracted operating data, whether each value lies between the corresponding set minimum and maximum; a value outside that range is judged to be abnormal data and is eliminated.
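Steps S31 and S32 amount to a per-type range check, sketched below. The limit values are hypothetical placeholders, since the patent leaves them to be configured on the platform. The sketch keeps each timestamp and blanks the out-of-range value; this is an interpretation based on step S51 later distinguishing value-only gaps, and dropping the whole pair would be the other reading.

```python
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL limits for illustration only; step S31 sets the real ones
# on the big data analysis platform.
LIMITS: Dict[str, Tuple[float, float]] = {
    "current": (0.0, 1500.0),
    "voltage": (200.0, 240.0),
    "active_power": (-400.0, 400.0),
    "reactive_power": (-200.0, 200.0),
}


def drop_abnormal(kind: str, series: List[Tuple[int, float]],
                  limits: Dict[str, Tuple[float, float]] = LIMITS
                  ) -> List[Tuple[int, Optional[float]]]:
    """Step S32: blank every value outside [min, max] for its data type.

    The timestamp is kept and the value set to None, so the gap surfaces in
    step S5 as a value-only gap (an interpretation, not stated in the claim).
    """
    lo, hi = limits[kind]
    return [(t, v if lo <= v <= hi else None) for t, v in series]
```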
5. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 1, wherein: the local outlier detection and processing in step S4 specifically includes the following steps:
step S41: dividing each type of extracted operating data of the main transformer into n data blocks, and initializing the distance between each object value in a data block and each of its (m + k) nearest neighbors to a maximum value;
step S42: calculating the distance between each object value of the operating data and each object value of the first data block, updating the (m + k) nearest neighbors of each object value in the first data block, and calculating the outlier degree of each object value in real time; the outlier degree is set to infinity while an object value has fewer than m + k neighbors, and an object value whose outlier degree falls below the initial threshold c is excluded from the data block; the outlier degree of an object value is the sum of its distances to its (m+1)-th through (m+k)-th nearest neighbors;
step S43: after the first data block is processed, sorting the object values not excluded from the first data block in descending order of outlier degree, adding the first n of them to the TOP n outliers, and updating the threshold c;
step S44: calculating the distance between each object value of the operating data and each object value of the second data block, updating the (m + k) nearest neighbors of each object value in the second data block, and calculating the outlier degree of each object value in real time; the outlier degree is set to infinity while an object value has fewer than m + k neighbors, and an object value whose outlier degree falls below the threshold c is excluded from the data block;
step S45: after the second data block is processed, if the outlier degree of an object value not excluded from the second data block is larger than an outlier degree among the TOP n outliers, updating the TOP n outliers and the threshold c;
step S46: repeating steps S44-S45 for the i-th data block, i = 3, 4, 5, …, n, until all data blocks are processed, and outputting the TOP n outliers;
in steps S43 and S45, the threshold c is updated by setting it to the outlier degree of the n-th of the TOP n outliers.
6. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 5, wherein: the threshold value c in said step S42 is set to 0.
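Steps S41 to S46, with the initial threshold c = 0 from claim 6, can be sketched for one type of scalar operating data as below. Assumptions: the distance between two object values is their absolute difference; because the claims use n both for the number of blocks and for the size of the outlier list, the sketch names these `block_size` and `top_n`; `m` and `k` keep their meaning from step S42. The returned values are the local outliers that step S4 then removes.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def outlier_degree(nearest_neg: List[float], m: int, k: int) -> float:
    """Sum of distances to the (m+1)-th .. (m+k)-th nearest neighbours;
    infinite while fewer than m + k neighbours are known."""
    if len(nearest_neg) < m + k:
        return math.inf
    dists = sorted(-d for d in nearest_neg)  # ascending distances
    return sum(dists[m:m + k])


def top_n_outliers(values: Sequence[float], block_size: int,
                   m: int, k: int, top_n: int) -> List[Tuple[float, int]]:
    """Steps S41-S46 for 1-D values; returns (outlier degree, index) pairs."""
    c = 0.0  # claim 6: initial threshold
    top: List[Tuple[float, int]] = []
    for start in range(0, len(values), block_size):
        survivors: List[Tuple[float, int]] = []
        for i in range(start, min(start + block_size, len(values))):
            nearest: List[float] = []  # max-heap of m+k smallest distances (negated)
            excluded = False
            for j, other in enumerate(values):
                if j == i:
                    continue
                d = abs(values[i] - other)
                if len(nearest) < m + k:
                    heapq.heappush(nearest, -d)
                elif d < -nearest[0]:
                    heapq.heapreplace(nearest, -d)
                # The degree can only shrink as closer neighbours appear, so
                # once it drops below c this value cannot reach the top list.
                if outlier_degree(nearest, m, k) < c:
                    excluded = True
                    break
            if not excluded:
                survivors.append((outlier_degree(nearest, m, k), i))
        top = sorted(top + survivors, reverse=True)[:top_n]
        if len(top) == top_n:
            c = top[-1][0]  # degree of the n-th TOP n outlier
    return top
```

The early exclusion is safe because each of the m + k smallest distances can only decrease as more neighbours are scanned, so an outlier degree already below c can never rise back above it.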
7. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 1, wherein: the data integrity detection and processing in step S5 specifically includes the following steps:
step S51: for each type of operating data, after the processing of steps S2-S4, data may be missing in two ways: both the time and the value are missing, or only the value is missing; detecting the integrity of the processed operating data of the main transformer, i.e. whether each datum includes both its time and its value, and, if data are missing, determining the type of missing data;
step S52: if both the time and the value are missing, the missing time is restored first, which converts the gap into a value-only gap, and the value is then filled using the method of step S53;
step S53: if only the value is missing, extracting the N values before and the N values after the moment of the missing value, calculating the average Eq of the N preceding values and the average Eh of the N following values, taking the N preceding values as the first data and the N following values as the second data, assigning a weight λ to the value adjacent to the missing value in each of the first and second data, assigning a weight a to the second-nearest value in each, and assigning each of the remaining N-2 values in the first and second data the weight [formula image not reproduced];
step S54: multiplying each of the N values in the first data by its weight and summing the products to obtain a first calculation value; doing the same for the N values in the second data to obtain a second calculation value; and averaging the two calculation values to obtain an intermediate calculation value. If the intermediate calculation value lies between the averages Eq and Eh, it is used as the supplementary value for the missing datum; otherwise, the weights λ and a are adjusted and the calculation is repeated until the intermediate calculation value lies between Eq and Eh.
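Steps S51 and S52 can be sketched as below, assuming integer times on a fixed sampling grid given by `start`, `end` and `step`; the patent does not state how an absent time is recognised, so the grid is an assumption. A value of None marks a value-only gap, and restoring absent times turns every gap into that type before step S53 fills it.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[int, Optional[float]]  # (time, value); None marks a missing value


def classify_gaps(series: List[Sample], start: int, end: int,
                  step: int) -> Dict[int, str]:
    """Step S51: label each gap on a fixed time grid (ASSUMED sampling)."""
    present = dict(series)
    kinds: Dict[int, str] = {}
    for t in range(start, end + 1, step):
        if t not in present:
            kinds[t] = "time_and_value"
        elif present[t] is None:
            kinds[t] = "value_only"
    return kinds


def restore_times(series: List[Sample], start: int, end: int,
                  step: int) -> List[Sample]:
    """Step S52: re-insert absent times with value None, so every gap
    becomes a value-only gap for step S53."""
    present = dict(series)
    return [(t, present.get(t)) for t in range(start, end + 1, step)]
```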
8. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 7, wherein: if the averages Eq and Eh are equal, the missing value at the corresponding moment is taken to be Eq (equivalently, Eh).
9. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 7, wherein: if the missing values are at the first 2 moments, the N values immediately following them are examined to see whether they increase or decrease over time; if they increase, the calculated intermediate value is set smaller than Eh, the average of those following N values, and if they decrease, it is set larger than Eh.
10. The big data preprocessing method of the main transformer based on the big data analysis platform as claimed in claim 7, wherein: if the missing values are at the last 2 moments, the N values immediately preceding them are examined to see whether they increase or decrease over time; if they increase, the calculated intermediate value is set larger than Eq, the average of those preceding N values, and if they decrease, it is set smaller than Eq.
CN202210263204.9A 2022-03-17 2022-03-17 Main transformer big data preprocessing method based on big data analysis platform Pending CN114840505A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210263204.9A CN114840505A (en) 2022-03-17 2022-03-17 Main transformer big data preprocessing method based on big data analysis platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210263204.9A CN114840505A (en) 2022-03-17 2022-03-17 Main transformer big data preprocessing method based on big data analysis platform

Publications (1)

Publication Number Publication Date
CN114840505A (en) 2022-08-02

Family

ID=82561657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210263204.9A Pending CN114840505A (en) 2022-03-17 2022-03-17 Main transformer big data preprocessing method based on big data analysis platform

Country Status (1)

Country Link
CN (1) CN114840505A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116660667A (en) * 2023-07-26 2023-08-29 山东金科电气股份有限公司 Transformer abnormality monitoring method and system
CN116660667B (en) * 2023-07-26 2023-10-24 山东金科电气股份有限公司 Transformer abnormality monitoring method and system

Similar Documents

Publication Publication Date Title
CN106779505B (en) Power transmission line fault early warning method and system based on big data driving
EP3675306A1 (en) Quick search method for cascading failure of large ac/dc power grid, and system
CN106709651B (en) Electric power system security evaluation system based on risk theory
CN109146124B (en) Power distribution terminal transformation decision method based on time-varying failure rate
CN116187593B (en) Power distribution network fault prediction processing method, device, equipment and storage medium
CN112818297B (en) Data anomaly detection method in cloud environment
CN113077075B (en) New energy uncertainty electric power system safety risk prevention control method and device
CN112383045B (en) Transient stability out-of-limit probability calculation method and device for new energy power generation uncertainty
CN104794535A (en) Leading industry based electricity demand prediction and early warning method
CN114840505A (en) Main transformer big data preprocessing method based on big data analysis platform
CN116739829B (en) Big data-based power data analysis method, system and medium
CN114172133A (en) Method for detecting abnormal direct-current voltage measurement of high-voltage flexible direct-current power transmission system
CN115453356A (en) Power equipment running state monitoring and analyzing method, system, terminal and medium
CN114997566A (en) Power grid blocking risk assessment method and system considering node connectivity loss
CN109359742B (en) Method for generating preventive maintenance period of subway subsystem
CN110348540A (en) Electrical power system transient angle stability Contingency screening method and device based on cluster
CN114594398A (en) Energy storage lithium ion battery data preprocessing method
CN114414940A (en) Fault judgment method based on basic data of electricity utilization information acquisition system
CN112332420B (en) Device and method for determining hierarchical load reduction in power system risk assessment
CN114187132A (en) Transformer substation monitoring information feature selection method, storage medium and equipment
CN117332215A (en) High-low voltage power distribution cabinet abnormal fault information remote monitoring system
CN114595966A (en) Super-huge type urban power grid toughness assessment method considering different disaster types
CN108761258A (en) Transformer short period overload capability assessment system based on artificial intelligence and big data technology
CN108090616A (en) A kind of electric system Active Splitting optimal section searching method
CN113919520A (en) Maintenance plan management method, device and equipment for power grid maintenance and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination