CN114372093A - Processing method of DGA (dissolved gas analysis) online monitoring data of transformer

Info

Publication number
CN114372093A
Authority
CN
China
Prior art keywords
data
line segment
sequence
dga
minimum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111534103.2A
Other languages
Chinese (zh)
Inventor
朱自伟
张益宁
周梦垚
谢青
徐松龄
翟嘉璐
王梦宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang University
Original Assignee
Nanchang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang University
Priority to CN202111534103.2A
Publication of CN114372093A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

The invention provides a method for processing DGA (dissolved gas analysis) online monitoring data of a transformer, in which the online data are treated as a time series according to the characteristics of the returned data. The first stage introduces the idea of a sliding window algorithm and proposes an improved piecewise linearization algorithm that divides the sequence data into line segments characterized by slope and span, symbolizes the online monitoring data using improved K-means clustering, and finally uses the Apriori algorithm to mine the associations between different indexes in the DGA data and to find the abnormal values present in it. In the second stage, for the abnormal sampling points screened out, an improved particle swarm optimization support vector regression algorithm is used, which preserves the solving speed and solution diversity of the algorithm; the support vector regression algorithm, with its key parameters optimized, is used to repair the sampling points, completing the processing of the transformer online DGA monitoring data.

Description

Processing method of DGA (dissolved gas analysis) online monitoring data of transformer
Technical Field
The invention relates to a method for processing DGA (dissolved gas analysis) online monitoring data of a transformer, and belongs to the field of power equipment data cleaning.
Background
The power transformer is a pivotal device for converting and transmitting electric energy, and its safe and stable operation is an important guarantee of power supply quality for users. Online DGA index data of the transformer are used to monitor the insulation performance of the equipment in real time, and the real-time state of the transformer can be obtained rapidly by analysing the oil chromatographic data. In addition, DGA data contain many index dimensions; mining the association relations among these indexes makes it possible to distinguish data of different abnormal modes in the online data, which enhances the reliability of the comprehensive state evaluation of the equipment.
Owing to the operating environment of the equipment and the electromagnetic interference of the transformer, the online monitoring device is prone to randomly distributed abnormal values during data acquisition and transmission, and in severe cases even to data drift and transmission interruption. The background system can quickly identify obvious data anomalies such as data drift and data interruption and raise an alarm for them. Abnormal values randomly distributed within otherwise normal online data, however, seriously interfere with the real-time representation of the equipment state indexes and affect state evaluation based on those indexes; they easily lead to false alarms and missed alarms of abnormal equipment states and waste equipment operation and maintenance resources.
The power transformer is important equipment for ensuring stable operation of the transmission and distribution network, and the iron core grounding current monitoring data of the transformer are an important basis for evaluating its state. Monitoring data over a period of time, including the overall trend, the extreme points and jump points in the variation, and the statistical characteristics of the data, can reflect possible abnormal conditions in the power transformer from several aspects.
After long-term operation of power equipment, the large-scale index data stored in the power database inevitably contain index data of different abnormal modes. Performing correlation analysis on the existing index data, mining the association relations present, analysing the data of different abnormal modes on the basis of those relations, and repairing the data effectively helps to perfect the comprehensive state evaluation system of power equipment, to find abnormal equipment states in advance, to improve overhaul efficiency, and to reduce operation and maintenance costs.
Disclosure of Invention
The invention aims to provide a method for processing transformer DGA online monitoring data, so as to solve the problems described in the Background.
The invention is realized by the following technical scheme. The method for processing the DGA online monitoring data of the transformer comprises the following steps:
S1, sliding window processing of the data set: introducing the sliding window idea, and intercepting the online data set using a window of length L;
S2, traversing the online data set by sliding the window with a certain step size: setting the sliding step length to l, and dragging the window across the whole data set until all data are traversed; letting the length of the online data set be $L_1$, the traversal yields
Figure BDA0003412507750000021
data windows, and the data in all windows are exported to form the data sets to be analyzed $DS_i$, $i \in n$;
S3, piecewise linearization of sequence data: providing a piecewise linearization algorithm for sequence data that combines a variable number of points in the online data to form multiple groups of data points; the criterion for grouping data points is that the error between the line segment fitted to all the points and the actual data points is less than a threshold, and the fitted line segment is characterized by its slope and span;
S4, constructing a model for describing the similarity of different line segments: constructing a similarity model based on the slope and span of the line segments, classifying the line segments using a K-means clustering algorithm improved by the maximum-minimum distance method, and assigning the same symbol to line segments of the same class, completing the symbolization of the sequence data;
S5, mining the relevance among different sequences: based on the idea of the Apriori algorithm, setting the minimum confidence and minimum support, mining the frequent item sets existing among different sequences, and quantifying the relevance among different sequences;
S6, extracting and screening abnormal values existing in the DGA online monitoring data: according to the strength of the correlation among the sequences, identifying the types of abnormal values in the data and separating the data of different abnormal modes;
S7, improved particle swarm optimization support vector regression: defining the distance between particles in the solution set, calculating the density of different particles based on that distance, and defining the update mode of the particles according to the density; the algorithm is used to optimize the key parameters of the support vector regression, completing the processing of the DGA online data.
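Steps S1 and S2 can be illustrated with a minimal sketch. The function name `sliding_windows` and the choice to drop a trailing partial window are assumptions made for illustration; the source's window-count formula is only available as an image and is not reproduced here.

```python
def sliding_windows(data, window_len, step):
    """Cut `data` into windows of length `window_len`, advancing by `step`.

    Assumption (not from the source): a trailing window shorter than
    `window_len` is dropped rather than padded.
    """
    if window_len <= 0 or step <= 0:
        raise ValueError("window_len and step must be positive")
    windows = []
    start = 0
    while start + window_len <= len(data):
        windows.append(data[start:start + window_len])
        start += step
    return windows


series = list(range(10))  # stand-in for one DGA index with L1 = 10 samples
windows = sliding_windows(series, window_len=4, step=2)
print(len(windows))       # number of windows produced
print(windows[0], windows[-1])
```

Under this convention the number of windows is (L1 - L) // l + 1 whenever L1 >= L; other boundary conventions would change that count.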
Further, the specific steps of the piecewise linearization algorithm of the sequence data set forth in S3 are:
1) online monitoring data of equipment indexes such as DGA are treated as time series data;
2) for the time series $X_K=\{x_1,x_2,\dots,x_k\}$, data points are intercepted with a window of length L (L < k), and, following the sliding window idea, piecewise linear fitting is performed on the data points contained in the window;
3) the first data point in the window is taken as the fitting start point of the initial line segment and denoted $x_i$; assuming that the fitting end point of the initial line segment is $x_{i+m}$ (m > 1), the m + 1 data points are fitted to a line segment;
4) such a line segment is expressed by the following equations:
Figure BDA0003412507750000031
$my-(X_{i+m-1}-X_i)X-(m-1)X_i+X_{i+m-1}=0$ (2)
The distance from an actual data point to the fitted line segment is taken as the fitting error; the distances from all actual data points within the span of the fitted line segment to the line segment are calculated, and their sum is taken as the overall fitting error ER of the line segment:
Figure BDA0003412507750000032
Figure BDA0003412507750000033
5) the fitting error threshold is set to $ER_r$; if $ER < ER_r$, the line segment can still take more fitting points, so let m = m + 1 and repeat the above steps; if $ER > ER_r$, the line segment cannot be extended further, so the fitting end point of the current line segment is stored as $X_{end}=X_{i+m-1}$ and the data sampling time is recorded; then return to step 3), reset the parameter m, and fit the next part of the data with the current fitting end point as the fitting start point of the next line segment, until all data points in the sequence are fitted.
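The greedy segmentation in steps 3) to 5) can be sketched as follows. This is an illustrative reading, not the patent's exact formulation: it assumes unit spacing between samples (the sample index serves as the time axis), takes the fitted line to be the chord through the two end points, and sums perpendicular point-to-line distances as the error ER. The names `segment_error` and `piecewise_linearize` are hypothetical.

```python
import math


def segment_error(x, start, end):
    """Sum of perpendicular distances from samples start..end to the chord
    joining (start, x[start]) and (end, x[end])."""
    span = end - start
    rise = x[end] - x[start]
    norm = math.hypot(rise, span)
    total = 0.0
    for t in range(start, end + 1):
        # Chord in implicit form: rise*(t - start) - span*(y - x[start]) = 0
        total += abs(rise * (t - start) - span * (x[t] - x[start])) / norm
    return total


def piecewise_linearize(x, err_threshold):
    """Greedy segmentation; returns a list of (start, end, slope, span)."""
    segments = []
    start = 0
    n = len(x)
    while start < n - 1:
        end = start + 1
        # Extend the segment while the error stays below the threshold.
        while end + 1 < n and segment_error(x, start, end + 1) < err_threshold:
            end += 1
        span = end - start
        segments.append((start, end, (x[end] - x[start]) / span, span))
        start = end  # the end point becomes the next segment's start point
    return segments


series = [0, 1, 2, 3, 3, 3, 3, 1, -1]
segments = piecewise_linearize(series, err_threshold=0.1)
for seg in segments:
    print(seg)
```

Each output tuple carries the slope and span that the later clustering step uses to characterize a segment.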
Further, the main steps of constructing the similarity model in S4 and performing cluster analysis based on it are as follows:
1) all line segment attributes in the same sequence are standardized according to
Figure BDA0003412507750000034
2) for cluster analysis, a standard for measuring the similarity of line segments is established; two key parameters of a line segment, its slope and span, are extracted, the similarity between line segments is described by the Euclidean distance, and the degree to which each attribute is taken into account is expressed by a weight; the established line segment similarity model is given by:
Figure BDA0003412507750000035
3) based on the line segment similarity model, the set of line segments is clustered with a K-means algorithm improved by the maximum-minimum distance method, and similar line segments are assigned to the same category.
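A weighted Euclidean distance over standardized slope and span might look like the sketch below. The source's standardization and similarity formulas are only available as images, so min-max scaling, the weight values, and the names `min_max` and `segment_distance` are all assumptions made for illustration.

```python
import math


def min_max(values):
    """Scale values to [0, 1]; min-max scaling is an assumed choice here."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def segment_distance(a, b, w_slope=0.7, w_span=0.3):
    """Weighted Euclidean distance between two (slope, span) feature pairs.

    The weights are illustrative placeholders, not values from the source.
    """
    return math.sqrt(w_slope * (a[0] - b[0]) ** 2 + w_span * (a[1] - b[1]) ** 2)


slopes = [1.0, 0.0, -2.0]
spans = [3, 3, 2]
features = list(zip(min_max(slopes), min_max(spans)))
print(segment_distance(features[0], features[1]))
```

Giving the slope a larger weight than the span reflects the text's point that the attributes need not be considered equally; the actual weighting in the source is not recoverable from this extract.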
Further, in S4, the main steps of the K-means algorithm improved by the maximum-minimum distance method are as follows:
1) the maximum-minimum distance method is also based on the Euclidean distance; it differs from the K-means algorithm in that objects at the greatest distance are taken as cluster centers; for the sample set, a proportion coefficient θ (0 < θ < 1) is given, and an arbitrary sample $s_n$ is taken as the initial cluster center, denoted $z_1$;
2) among the remaining n - 1 samples, the sample farthest from $z_1$ is taken as the second cluster center, denoted $z_2$;
3) the distances from the remaining n - 2 samples to $z_1$ and $z_2$ are calculated, and the smaller of the two is found for each sample, namely:
$D_{ij}=\|x_i-z_j\|,\ j=1,2$ (6)
$D_i=\min(D_{i1},D_{i2}),\ i=1,2,\dots,n$ (7)
4) if
$D_i=\max\{D_i\}>\theta\times\|z_1-z_2\|$ (8)
then the corresponding sample $s_i$ is selected as the third cluster center $z_3$;
5) assuming that there are K cluster centers, the distances from the remaining n - K samples to the cluster centers are calculated, and if:
$D_r=\max\{\min(D_{i1},D_{i2},\dots,D_{iK})\}>\theta\times\|z_1-z_2\|$ (9)
then the corresponding sample $x_r$ is the (K + 1)-th cluster center, denoted $z_{K+1}$; this process is repeated until no new cluster center appears;
6) when no new cluster center appears, the samples are assigned to the classes according to the minimum distance principle.
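The center-selection procedure in steps 1) to 6) can be sketched as below. The sketch uses the plain Euclidean distance (`math.dist`, Python 3.8+) and takes the first sample as the arbitrary initial center; the function names are hypothetical, and any subsequent K-means refinement of the centers is omitted.

```python
import math


def max_min_centers(samples, theta=0.5):
    """Select cluster centers by the maximum-minimum distance rule."""
    centers = [samples[0]]  # z1: arbitrary choice, here the first sample
    # z2: the sample farthest from z1
    centers.append(max(samples, key=lambda s: math.dist(s, centers[0])))
    base = math.dist(centers[0], centers[1])
    if base == 0.0:
        return centers[:1]  # all samples coincide
    while True:
        # Candidate: the sample whose nearest-center distance is largest.
        candidate = max(samples, key=lambda s: min(math.dist(s, z) for z in centers))
        gap = min(math.dist(candidate, z) for z in centers)
        if gap > theta * base:
            centers.append(candidate)
        else:
            return centers


def assign(samples, centers):
    """Assign each sample to the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: math.dist(s, centers[k]))
            for s in samples]


samples = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0), (0.0, 1.0)]
centers = max_min_centers(samples, theta=0.5)
labels = assign(samples, centers)
print(centers)
print(labels)
```

Because the centers depend only on the data and on θ (once the initial sample is fixed), repeated runs give the same centers, which is the property the method relies on to remove the randomness of ordinary K-means initialization.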
Further, the main process of sequence association mining in S5 is as follows:
1) setting the minimum support and minimum confidence parameters; the confidence and support thresholds are the basis for judging sequence association and frequent item sets; the minimum support thresholds of the frequent-1 and frequent-2 item sets are denoted $min_1$ and $min_2$, and the minimum confidence threshold in sequence association mining is mincon;
2) generating frequent item sets; the two symbolized sequences are combined into a transaction set, denoted
Figure BDA0003412507750000041
where
Figure BDA0003412507750000042
All symbol categories corresponding to the two sequences are $\{A_1,A_2,\dots,A_{CA}\}$ and $\{B_1,B_2,\dots,B_{CB}\}$; based on the basic idea of the Apriori algorithm, the frequent item sets of the sequences are obtained by scanning the transaction set in two stages; the support of each symbol in the sequence is calculated according to equation (10):
Figure BDA0003412507750000051
where $N_t$ is the number of transactions, i.e. the number of elements in the sequence; the support represents the proportion of transactions in which an item appears; when mining the frequent-1 item sets, the items whose support is greater than $min_1$ are placed in the set of frequent-1 item sets;
the sets of frequent-1 item sets of the two sequences in the association mining are denoted $P_A$ and $P_B$; the items in the sets are paired according to the index parameters in the form $(P_{Ai},P_{Bi})$ to form 2-item sets, the support of each 2-item set is calculated, and those whose support is greater than $min_2$ are placed in the frequent-2 item set, denoted $\{P_A,P_B\}_{freq}$;
3) mining sequence relevance; all sequences are combined pairwise, and the support of the items in the frequent-2 item set and the confidence between the corresponding sequences are counted for each pair;
the supports of all frequent-2 item sets between two index parameters are accumulated according to equation (11), and the sum is taken as the support count of the two parameter sequences among all multivariate sequences;
Figure BDA0003412507750000052
$\sigma(X_A)=\mathrm{sum}(\sigma(P_A))$ (12)
$\sigma(X_B)=\mathrm{sum}(\sigma(P_B))$ (13)
where m = CA + CB is the total number of line segment categories obtained from the cluster analysis of the two sequences; the minimum support threshold at the index sequence level is $minsup_3$; if the support at the parameter index level is greater than this threshold, the confidence $con(X_A\to X_B)$ of the combination of the symbol item sets in the two sequences is calculated as shown in equation (14):
Figure BDA0003412507750000053
when the confidence is greater than the set minimum confidence threshold, the association rule $X_A\to X_B$ is retained, the confidence is used to describe the strength of the association between the two indexes, and the two indexes are judged to be strongly associated.
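The two-stage scan can be sketched for a pair of symbolized sequences of equal length. Equations (10), (11) and (14) are only available as images, so this sketch assumes the usual definitions: support is a count divided by the number of transactions, the sequence-level support is the sum over the frequent-2 item sets, and the confidence is that sum divided by the summed frequent-1 support of sequence A. The function name `mine_association` and the threshold values are hypothetical.

```python
from collections import Counter


def mine_association(seq_a, seq_b, min1=0.2, min2=0.15):
    """Return (frequent 2-item sets with support, sequence-level support,
    confidence A -> B) for two aligned symbol sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(seq_a)  # N_t: number of transactions
    # Stage 1: frequent-1 item sets of each sequence.
    sup_a = {s: c / n for s, c in Counter(seq_a).items()}
    sup_b = {s: c / n for s, c in Counter(seq_b).items()}
    freq_a = {s for s, v in sup_a.items() if v > min1}
    freq_b = {s for s, v in sup_b.items() if v > min1}
    # Stage 2: frequent-2 item sets built from co-occurring frequent symbols.
    pair_sup = {p: c / n for p, c in Counter(zip(seq_a, seq_b)).items()}
    freq_pairs = {p: v for p, v in pair_sup.items()
                  if p[0] in freq_a and p[1] in freq_b and v > min2}
    support_ab = sum(freq_pairs.values())
    support_a = sum(sup_a[s] for s in freq_a)
    confidence = support_ab / support_a if support_a else 0.0
    return freq_pairs, support_ab, confidence


hydrogen = list("AABBAABC")  # illustrative symbol sequences, not real data
methane = list("XXYYXXYZ")
freq_pairs, support_ab, confidence = mine_association(hydrogen, methane)
print(freq_pairs, support_ab, confidence)
```

In this toy input the two sequences change symbol together, so the confidence is high; a sampling point where one sequence departs from a retained rule would be a candidate abnormal value in the sense of S6.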
Further, the improved particle swarm optimization support vector regression in S7 is mainly as follows: vacant data points left by the deletion of abnormal values are repaired using a support vector regression algorithm optimized by the improved particle swarm algorithm; the main steps are as follows:
1) defining the number of variables m and generating N m-dimensional particles in the feasible solution space; $S_t$ is the t-th generation of particles in the iteration, given by
Figure BDA0003412507750000061
where the elements are expressed as
Figure BDA0003412507750000062
2) determining the inertia weight, whose expression is:
Figure BDA0003412507750000063
where $w_a$ and $w_z$ are the maximum and minimum values of the inertia weight, $f_z$ is the minimum fitness value of all particles, $f_{pj}$ is the average fitness value of all particles, and the remaining fitness term in the expression is the fitness value of the particle itself;
3) dividing the particle population into types; the distance between particles is expressed by the Euclidean distance:
Figure BDA0003412507750000064
A standard distance is defined:
Figure BDA0003412507750000065
where r is the dividing radius; the density $c_i$ of particle i is calculated as:
Figure BDA0003412507750000066
$n_i$ is the number of particles in the population of particle i, and N is the number of particles in the solution set;
4) the particles initialize the two learning factors $\mu_1$ and $\mu_2$ of the algorithm according to the category of the population to which they belong; when the particle density $c_i$ is greater than a certain threshold, the update mode is:
Figure BDA0003412507750000067
when the particle density $c_i$ is less than a certain threshold, the update mode is:
Figure BDA0003412507750000068
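The density-based division of particles in step 3) can be illustrated with a short sketch. The density formula is only available as an image, so the sketch assumes $c_i = n_i / N$ with $n_i$ counted as the number of particles (including particle i itself) within the dividing radius r. The function name, the density threshold of 0.5, and the labels "crowded" and "sparse" are hypothetical, and the two velocity-update rules themselves are not reproduced.

```python
import math


def particle_densities(particles, radius):
    """Density c_i = n_i / N, with n_i the number of particles within
    `radius` of particle i (assumed to include particle i itself)."""
    n = len(particles)
    densities = []
    for p in particles:
        neighbours = sum(1 for q in particles if math.dist(p, q) <= radius)
        densities.append(neighbours / n)
    return densities


swarm = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (2.0, 2.0)]
dens = particle_densities(swarm, radius=0.5)
# Choose an update mode per particle from its density (threshold is illustrative).
modes = ["crowded" if c > 0.5 else "sparse" for c in dens]
print(dens)
print(modes)
```

Treating crowded and sparse particles differently is what lets the method trade off convergence speed against diversity of the solutions.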
The beneficial effects of the invention are as follows:
The associations among different indexes of the transformer oil chromatogram are mined through sequence segmentation and an association analysis algorithm, abnormal points in the transformer DGA online monitoring data are identified and repaired with a regression algorithm, and the processing speed of the transformer DGA online monitoring data is effectively improved.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of a sequence segmentation algorithm;
FIG. 3 is a flow chart of an improved particle swarm algorithm solution;
FIG. 4 is a comparison graph of hydrogen index fit;
FIG. 5 is a comparison graph of methane indicator fit;
FIG. 6 shows detected outliers of hydrogen and methane sequences;
FIG. 7 is a diagram of data repair results;
Detailed Description
The method for processing transformer DGA online monitoring data according to the present invention is described in detail below with reference to embodiments and the drawings.
A method for processing transformer DGA online monitoring data, as shown in FIG. 1, comprises the following steps:
S1, importing the DGA online data and setting the basic parameters of the sliding window algorithm: the value of online monitoring data lies in reflecting equipment indexes in real time. After long-term operation, the online data set of the equipment is generally very large, so analysing the whole data set is too complex to be feasible. Online data are also time-sensitive: when a given sampling point is analysed, the changes at points closer to it matter more, and those farther away matter less. The invention therefore introduces the sliding window idea, intercepts the online data set with a window of length L, and analyses the data within the window to reduce the complexity of the process.
S2, traversing the online data set by sliding the window with a certain step size: setting the sliding step length to l, and dragging the window across the whole data set until all data are traversed; letting the length of the online data set be $L_1$, the traversal yields
Figure BDA0003412507750000071
data windows, and the data in all windows are exported to form the data sets to be analyzed $DS_i$, $i \in n$.
S3, piecewise linearization of sequence data: since online data are usually numerical variables, they are not directly suitable for association mining of sequence data; the invention provides a piecewise linearization algorithm for sequence data that combines a variable number of points in the online data according to a model to form multiple groups of data points; the criterion for grouping data points is that the error between the line segment fitted to all the points and the actual data points is less than a threshold, and the fitted line segment is characterized by its slope and span.
S4, constructing a model for describing the similarity of different line segments: constructing a similarity model based on the slope and span of the line segments, classifying the line segments using a K-means clustering algorithm improved by the maximum-minimum distance method, and assigning the same symbol to line segments of the same class, completing the symbolization of the sequence data.
S5, mining the relevance among different sequences: based on the idea of the Apriori algorithm, the minimum confidence and minimum support are set, the frequent item sets existing among different sequences are mined, and the relevance among the different sequences is quantified.
S6, extracting and screening abnormal values existing in the DGA online monitoring data: according to the strength of the correlation among the sequences, the types of abnormal values in the data are identified and the data of different abnormal modes are separated.
S7, improved particle swarm optimization support vector regression: defining the distance between particles in the solution set, calculating the density of different particles based on that distance, and defining the update mode of the particles according to the density, so as to improve the solving speed of the algorithm and the diversity of the solutions; the algorithm is used to optimize the key parameters of the support vector regression, improving the regression accuracy and completing the processing of the DGA online data.
The object studied by the method is the DGA online monitoring data of a certain main transformer.
As shown in FIG. 2, the specific steps of the piecewise linearization algorithm of sequence data proposed in S3 are:
1) online monitoring data of equipment indexes such as DGA are essentially state index values acquired one by one at a certain time interval. Such data have strong temporal properties and can be treated as time series data.
2) For the time series $X_K=\{x_1,x_2,\dots,x_k\}$, data points are intercepted with a window of length L (L < k), and, following the sliding window idea, piecewise linear fitting is performed on the data points contained in the window.
3) The first data point in the window is taken as the fitting start point of the initial line segment and denoted $x_i$; assuming that the fitting end point of the initial line segment is $x_{i+m}$ (m > 1), the m + 1 data points are fitted to a line segment.
4) Such a line segment can be expressed by the following equations:
Figure BDA0003412507750000081
$my-(X_{i+m-1}-X_i)X-(m-1)X_i+X_{i+m-1}=0$ (2)
The distance from an actual data point to the fitted line segment is taken as the fitting error, which improves how accurately the fitted line segment matches the actual data points; the distances from all actual data points within the span of the fitted line segment to the line segment are calculated, and their sum is taken as the overall fitting error ER of the line segment:
Figure BDA0003412507750000082
Figure BDA0003412507750000083
5) The fitting error threshold is set to $ER_r$; if $ER < ER_r$, the line segment can still take more fitting points, so let m = m + 1 and repeat the above steps; if $ER > ER_r$, the line segment cannot be extended further, so the fitting end point of the current line segment is stored as $X_{end}=X_{i+m-1}$ and the data sampling time is recorded; then return to step 3), reset the parameter m, and fit the next part of the data with the current fitting end point as the fitting start point of the next line segment, until all data points in the sequence are fitted.
The main steps of constructing the similarity model in S4 and performing cluster analysis based on it are as follows:
1) because different indexes in DGA online monitoring differ by orders of magnitude, all line segment attributes in the same sequence need to be standardized according to
Figure BDA0003412507750000091
2) For cluster analysis, a standard for measuring the similarity of line segments needs to be established; DGA online data reflect real-time indexes of the equipment, and the trend and shape of parameter changes best reflect changes in the running state of the equipment, so when building the model for measuring line segment similarity, different attributes of the line segments need to be weighted differently; the established line segment similarity model is given by:
Figure BDA0003412507750000092
3) Based on the line segment similarity model, the set of line segments is clustered with a K-means algorithm improved by the maximum-minimum distance method, and similar line segments are assigned to the same category.
In S4, the main steps of the K-means algorithm improved by the maximum-minimum distance method are as follows:
1) the maximum-minimum distance method is also based on the Euclidean distance; it differs from the K-means algorithm in that objects at the greatest distance are taken as cluster centers; for the sample set, a proportion coefficient θ (0 < θ < 1) is given, and an arbitrary sample $s_n$ is taken as the initial cluster center, denoted $z_1$.
2) Among the remaining n - 1 samples, the sample farthest from $z_1$ is taken as the second cluster center, denoted $z_2$.
3) The distances from the remaining n - 2 samples to $z_1$ and $z_2$ are calculated, and the smaller of the two is found for each sample, namely:
$D_{ij}=\|x_i-z_j\|,\ j=1,2$ (6)
$D_i=\min(D_{i1},D_{i2}),\ i=1,2,\dots,n$ (7)
4) If
$D_i=\max\{D_i\}>\theta\times\|z_1-z_2\|$ (8)
then the corresponding sample $s_i$ is selected as the third cluster center $z_3$.
5) Assuming that there are K cluster centers, the distances from the remaining n - K samples to the cluster centers are calculated, and if:
$D_r=\max\{\min(D_{i1},D_{i2},\dots,D_{iK})\}>\theta\times\|z_1-z_2\|$ (9)
then the corresponding sample $x_r$ is the (K + 1)-th cluster center, denoted $z_{K+1}$; this process is repeated until no new cluster center appears.
6) When no new cluster center appears, the samples are assigned to the classes according to the minimum distance principle. The advantage of the K-means clustering algorithm improved by the maximum-minimum distance method is that the cluster centers are the same in every clustering run, which removes the randomness with which the traditional K-means algorithm selects cluster centers and can effectively improve the accuracy and speed of the cluster analysis.
The main process of sequence association mining in S5 is as follows:
1) setting parameters of minimum support degree and minimum confidence degree; confidence and support threshold are the basis for judging sequence association and frequent item sets, proper threshold parameters are favorable for enhancing the reliability of association relationship, and the minimum support threshold of frequent-1 and frequent-2 item sets is recorded as min1And min2The minimum confidence threshold in the sequence association mining is mincon.
2) Generating a frequent item set; using the summed two-signed sequence as a transaction set, denoted
Figure BDA0003412507750000101
Wherein
Figure BDA0003412507750000102
All symbol categories corresponding to the two sequences are: { A1,A2,…,ACAAnd { B }1,B2,…,BCBBased on the basic idea of Apriori algorithm, the invention obtains a frequent item set of a sequence by scanning a transaction set in two stages; the confidence for each symbol in the sequence is calculated according to equation (10):
Figure BDA0003412507750000103
where $N_t$ is the number of transactions, i.e. the number of elements in the sequence, and the support is the proportion of transactions that contain the item. When frequent-1 item sets are mined, the items whose support is greater than $minsup_1$ are placed in the collection of frequent-1 item sets.
The collections of frequent-1 item sets of the two sequences being mined are denoted $P_A$ and $P_B$. Items from the two collections are paired by index parameter to form 2-item sets of the form $(P_{Ai}, P_{Bi})$; the support of each 2-item set is calculated, and those whose support is greater than $minsup_2$ are placed in the frequent-2 item set, denoted $\{P_A, P_B\}_{freq}$.
3) Mine the sequence associations. All sequences are combined pairwise, and for each pair the support of the items in the frequent-2 item set and the confidence between the corresponding sequences are computed.
The supports of all frequent-2 item sets between two index parameters are accumulated according to formula (11), and the sum is taken as the support count of the two parameter sequences over all multivariate sequences:
Figure BDA0003412507750000111
$\sigma(X_A) = \mathrm{sum}(\sigma(P_A))$ (12)
$\sigma(X_B) = \mathrm{sum}(\sigma(P_B))$ (13)
where $m = CA + CB$ is the total number of line segment categories obtained from the cluster analysis of the two sequences. The minimum support threshold at the index-sequence level is denoted $minsup_3$; if the support at the parameter-index level is greater than this threshold, the confidence $con(X_A \rightarrow X_B)$ of the combined symbol item sets of the two sequences is calculated as shown in formula (14):
Figure BDA0003412507750000112
When the confidence is greater than the minimum confidence threshold $mincon$, the association rule $X_A \rightarrow X_B$ is retained, the confidence is used to describe the strength of the association between the two indexes, and the two indexes are judged to be strongly associated.
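The two-stage mining procedure can be illustrated with a short sketch. Formulas (10), (11) and (14) are not reproduced in this text, so the forms used here are assumptions: item support is taken as count divided by $N_t$, the sequence-level support as the sum of the frequent-2 supports, and the confidence as that sum divided by $\sigma(X_A)$ from formula (12). The function name `mine_association`, the requirement that the two symbol sequences be aligned element by element, and the threshold values are likewise hypothetical.

```python
from collections import Counter

def mine_association(seq_a, seq_b, minsup1, minsup2, minsup3, mincon):
    """Return (support, confidence, is_strong) for the rule X_A -> X_B."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and equally long")
    n_t = len(seq_a)                                  # number of transactions
    # Stage 1: frequent-1 items of each sequence (support > minsup1).
    sup_a = {s: c / n_t for s, c in Counter(seq_a).items() if c / n_t > minsup1}
    sup_b = {s: c / n_t for s, c in Counter(seq_b).items() if c / n_t > minsup1}
    # Stage 2: 2-item sets built only from frequent-1 items (support > minsup2).
    pairs = Counter((a, b) for a, b in zip(seq_a, seq_b)
                    if a in sup_a and b in sup_b)
    sup_ab = {p: c / n_t for p, c in pairs.items() if c / n_t > minsup2}
    support = sum(sup_ab.values())                    # assumed form of formula (11)
    sigma_a = sum(sup_a.values())                     # formula (12)
    if support <= minsup3 or sigma_a == 0:
        return support, 0.0, False
    confidence = support / sigma_a                    # assumed form of formula (14)
    return support, confidence, confidence > mincon

seq_a = list("AAAABBBBCC")   # symbolized sequence of index A (illustrative data)
seq_b = list("XXXYYYYYZX")   # symbolized sequence of index B (illustrative data)
support, confidence, strong = mine_association(seq_a, seq_b, 0.25, 0.2, 0.5, 0.6)
```

With these illustrative sequences the frequent-2 item sets are (A, X) and (B, Y), and the rule is kept because both the accumulated support and the confidence exceed their thresholds.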
The improved particle swarm optimization support vector regression in S7 works as follows: the vacant data points left by deleting abnormal values are repaired by support vector regression optimized with an improved particle swarm algorithm. As shown in Fig. 3, the main steps are as follows:
1) Define the number of variables m and generate N m-dimensional particles in the feasible solution space. $S_t$ is the t-th generation of particles in the iteration:
Figure BDA0003412507750000113
whose elements are expressed as
Figure BDA0003412507750000114
2) Determining the inertia weight, wherein the specific expression is as follows:
Figure BDA0003412507750000121
where $w_a$ and $w_z$ are the maximum and minimum values of the inertia weight, and $f$, $f_z$ and $f_{pj}$ are the fitness value of the particle, the minimum fitness value of all particles, and the average fitness value of all particles, respectively.
3) Divide the particle population into types. The distance between particles is expressed as the Euclidean distance:
Figure BDA0003412507750000122
Define a standard distance:
Figure BDA0003412507750000123
where r is the dividing radius. The density $c_i$ of particle i is then calculated:
Figure BDA0003412507750000124
where $n_i$ is the number of particles in the group of particle i and N is the number of particles in the generated solution set.
4) Initialize the two learning factors $\mu_1$ and $\mu_2$ of the algorithm. When the particle density $c_i$ is greater than a given threshold, the update rule is:
Figure BDA0003412507750000125
When the particle density $c_i$ is less than the threshold, the update rule is:
Figure BDA0003412507750000126
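The density-based division in step 3) can be sketched as below. The equations for the standard distance, the density and the two update rules are not reproduced in this text, so the sketch assumes $c_i = n_i / N$, with $n_i$ counted as the particles lying within the dividing radius r of particle i (the particle itself included) and with plain Euclidean distance compared against r. The function names and the treatment of a density exactly equal to the threshold are hypothetical, and the learning-factor updates themselves are not attempted.

```python
import math

def particle_density(particles, r):
    """Assumed density c_i = n_i / N, n_i = particles within radius r of particle i."""
    n = len(particles)
    return [sum(1 for q in particles if math.dist(p, q) <= r) / n
            for p in particles]

def split_by_density(densities, threshold):
    """Indices of particles above the threshold and of the remaining particles."""
    dense = [i for i, c in enumerate(densities) if c > threshold]
    sparse = [i for i, c in enumerate(densities) if c <= threshold]
    return dense, sparse

particles = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]   # illustrative data
densities = particle_density(particles, r=0.5)
dense, sparse = split_by_density(densities, threshold=0.5)
```

Each group would then receive its own update rule for $\mu_1$ and $\mu_2$, as described in step 4).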
A specific embodiment is given below:
A method for processing transformer DGA online monitoring data comprises the following steps:
S1, sliding-window processing of the data set: after a power transformer has run for many years, its DGA online monitoring data are usually large, and processing the whole data set at once increases both the complexity of the algorithm and the load on the server, so it is rarely feasible. A DGA online data processing method based on the sliding-window idea is therefore adopted: a data window of length L is established, and data are intercepted from the data set through this window.
S2, intercepting the data set with a fixed step length: the data window of length L is slid with step length l over the online monitoring data set of length $L_1$, which yields
Figure BDA0003412507750000127
data windows. The data in each window are exported to obtain the set of data windows to be processed, $\{DS_i\}$, $i \in n$; the data window is the basic unit of analysis in the subsequent data processing.
S3, piecewise linearization of the sequence data in the window: for each intercepted data window $W_i$, the sequence data of each DGA monitoring index are extracted. This embodiment studies two gases in DGA, $H_2$ and $CH_4$, so two sequences are obtained from each data window $W_i$, and each sequence is piecewise linearized.
S4, cluster analysis of the line segment set: for the line segment set expressed in array form, this embodiment builds a model $ds_{ij}$ that describes the similarity of line segments using the Euclidean distance over the relevant parameters of the segments. Based on this similarity model, the line segment set is clustered with the K-means algorithm improved by maximum and minimum distances; line segments with high similarity are merged into one category, each category is assigned a symbol, and the sequence data are thereby symbolized.
S5, mining the association between the sequences: for two sequences that have completed the summarizing operation, the minimum support thresholds $minsup_i$ of the different levels and the index-level minimum confidence threshold $mincon$ are set following the idea of the Apriori algorithm, the frequent item sets between the sequences are mined step by step, and the strength of the association between the indexes is finally judged.
S6, extracting and screening abnormal data based on the association relationship: invalid abnormal data in the sequences are screened out and extracted according to the association relationship between the sequences.
S7, data repair: the DGA online monitoring data are repaired using support vector regression optimized by the improved particle swarm algorithm, which completes the processing of the DGA online data.
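The window interception in S1 and S2 can be sketched in a few lines. The function name `sliding_windows` and the sample data are assumptions for illustration; the sketch simply drops any trailing readings that do not fill a complete window, which the source does not specify.

```python
def sliding_windows(data, length, step):
    """Cut `data` into windows of `length` readings, advancing by `step` each time."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    # Only complete windows are kept (an assumption, not stated in the source).
    return [data[start:start + length]
            for start in range(0, len(data) - length + 1, step)]

readings = list(range(10))                 # stand-in for one DGA index series
windows = sliding_windows(readings, length=4, step=3)
```

Each element of `windows` then plays the role of one data window $DS_i$ that the later steps analyze independently.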
Taking the hydrogen and methane indexes in the historical DGA online monitoring data of a main transformer as the research object, the proposed method is used to perform piecewise linear fitting on the window sequence data. Note that, because different index data are of different orders of magnitude, a suitable fitting error threshold should be chosen for each index. The fitting results for each index are shown in Figs. 4 and 5.
The fitting results demonstrate the feasibility of the proposed piecewise linearization algorithm for online data: the fitting error of every line segment is smaller than the set fitting error threshold, and the fitted segments reflect the trend of the online data points within each fitting interval well, which verifies the effectiveness of the algorithm.
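A greedy version of the piecewise linearization can be sketched as follows. Several details are assumptions: each segment is taken as the chord joining its first and last points, the sample index is used as the abscissa, the fitting error ER is the sum of perpendicular point-to-line distances, and the names `segment_error` and `piecewise_linearize` are hypothetical. A segment keeps growing while ER stays below the threshold, and its end point becomes the start of the next segment.

```python
import math

def segment_error(y, i, j):
    """Sum of perpendicular distances from points i..j to the chord (i, y[i])-(j, y[j])."""
    dx, dy = j - i, y[j] - y[i]
    norm = math.hypot(dx, dy)                      # dx >= 1, so norm is never zero
    return sum(abs(dy * (k - i) - dx * (y[k] - y[i])) / norm
               for k in range(i, j + 1))

def piecewise_linearize(y, er_threshold):
    """Greedy segmentation; returns (start, end, slope, span) for each segment."""
    segments, i, n = [], 0, len(y)
    while i < n - 1:
        j = i + 1                                  # a two-point chord always fits
        while j + 1 < n and segment_error(y, i, j + 1) < er_threshold:
            j += 1                                 # accept one more point
        segments.append((i, j, (y[j] - y[i]) / (j - i), j - i))
        i = j                                      # end point starts the next segment
    return segments

series = [0, 1, 2, 3, 2, 1, 0]                     # illustrative data
segments = piecewise_linearize(series, er_threshold=0.1)
```

The slope and span stored for each segment are the two attributes that the clustering step uses to compare segments, which is why a per-index error threshold matters when the indexes differ in magnitude.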
Mining of the sequence association: after the corresponding frequent item sets are obtained, the association between the two indexes is analyzed with the proposed method, using the support and the confidence to characterize the strength of the association. The rule $H_2 \rightarrow CH_4$ has a support of 0.5050 and a confidence of 0.6804, both greater than the corresponding minimum thresholds, so it is a strong association rule, which indicates a strong association between the hydrogen and methane indexes. The detection results are shown in Fig. 6, and the repaired DGA online data are shown in Fig. 7.
It can be seen that, after the screened-out data points are repaired by the method, the values of every characteristic gas return to normal levels, so the online data are effectively cleaned.

Claims (6)

1. A method for processing transformer DGA online monitoring data, characterized in that the method comprises the following steps:
S1, sliding-window processing of the data set: introducing the sliding-window idea, and intercepting the online data set with a window of length L;
S2, traversing the online data set by sliding the window with a fixed step length: setting the sliding step length to l, and sliding the window over the whole data set until all data are traversed; letting the length of the online data set be $L_1$, the traversal yields
Figure FDA0003412507740000011
data windows, and the data in all windows are exported to form the data sets to be analyzed, $DS_i$, $i \in n$;
S3, piecewise linearization of sequence data: providing a piecewise linearization algorithm for sequence data, which combines a variable number of points in the online data into multiple groups of data points; the criterion for grouping data points is that the error between the line segment fitted to all points in a group and the actual data points is less than a threshold, and each fitted line segment is characterized by its slope and its span;
S4, constructing a model for describing the similarity of different line segments: constructing a similarity model based on the slope and span of the line segments, classifying the line segments with a K-means clustering algorithm improved by maximum and minimum distances, assigning a symbol to the line segments of each class, and completing the symbolization of the sequence data;
S5, mining the association among different sequences: following the idea of the Apriori algorithm, setting the minimum confidence and minimum support, mining the frequent item sets existing among different sequences, and quantifying the association among different sequences;
S6, extracting and screening abnormal values existing in the DGA online monitoring data: according to the strength of the association among the sequences, determining the types of abnormal values in the data and separating out the data of the different abnormal modes;
S7, improved particle swarm optimization support vector regression: defining the distance between the particles of the solution set, dividing the particles into categories based on this distance, and defining the particle update rule; and optimizing the key parameters of the support vector regression with the algorithm to complete the processing of the DGA online data.
2. The method for processing transformer DGA online monitoring data as claimed in claim 1, wherein the specific steps of the piecewise linearization algorithm for sequence data in S3 are:
1) online monitoring data of equipment indexes such as DGA are treated as time series data;
2) for a time series $X_K = \{x_1, x_2, \dots, x_k\}$, data points are intercepted with a window of length L (L < k), and, following the sliding-window idea, piecewise linear fitting is performed on the data points contained in the intercepted window;
3) the first data point in the window is taken as the fitting starting point of the initial line segment and denoted $x_i$; assuming that the fitting end point of the initial line segment is $x_{i+m}$ (m > 1), the m + 1 data points are fitted to a line segment;
4) such a line segment is expressed by the following equations:
Figure FDA0003412507740000021
$m y - (X_{i+m-1} - X_i) X - (m-1) X_i + X_{i+m-1} = 0$ (2)
the distance from each actual data point to the fitted line segment is taken as its fitting error; the distances from all actual data points within the span of the fitted line segment to the segment are calculated, and their sum is taken as the overall fitting error ER of the line segment:
Figure FDA0003412507740000022
Figure FDA0003412507740000023
5) the fitting error threshold is set to $ER_r$; if $ER < ER_r$, the line segment can still take more fitting points, so let m = m + 1 and repeat the above steps; if $ER > ER_r$, the line segment cannot be extended further, so the fitting end point of the current line segment is stored as $X_{end} = X_{i+m-1}$ together with its data sampling time, the procedure returns to step 3) with the parameter m reset, and the current fitting end point is taken as the fitting starting point of the next line segment; this continues until all data points in the sequence have been fitted.
3. The method for processing transformer DGA online monitoring data as claimed in claim 1, wherein the main steps of constructing the similarity model in S4 and performing cluster analysis based on it are:
1) a standardization operation of the following form is applied to all line segment attributes present in the same sequence:
Figure FDA0003412507740000024
2) for the cluster analysis, a standard for measuring the similarity of line segments is established; the two key parameters of a line segment, its slope and its span, are extracted, the similarity between line segments is described by the Euclidean distance, and the importance given to each attribute of a line segment is expressed by a weight; the resulting line segment similarity model is shown in the following formula:
Figure FDA0003412507740000025
3) based on the line segment similarity model, the line segment set is subjected to clustering analysis by using a K-means algorithm improved based on the maximum and minimum distances, and similar line segments are divided into the same category.
4. The method for processing transformer DGA online monitoring data as claimed in claim 3, wherein in S4 the K-means algorithm improved by maximum and minimum distances mainly comprises the following steps:
1) the maximum-minimum distance method is also based on the Euclidean distance, and it differs from the K-means algorithm in that the objects farthest apart are taken as cluster centers; for the sample set $s_n$, a proportion coefficient θ (0 < θ < 1) is given, and an arbitrary sample is taken as the initial cluster center, denoted $z_1$;
2) among the remaining n-1 samples, the sample farthest from $z_1$ is taken as the second cluster center, denoted $z_2$;
3) the distances from each of the remaining n-2 samples to $z_1$ and $z_2$ are calculated, and the smaller of the two is found, namely:
$D_{ij} = \|x_i - z_j\|, \quad j = 1, 2$ (6)
$D_i = \min(D_{i1}, D_{i2}), \quad i = 1, 2, \dots, n$ (7)
4) if
$D_i = \max\{D_i\} > \theta \times \|z_1 - z_2\|$ (8)
then the corresponding sample $s_i$ is selected as the third cluster center $z_3$;
5) assuming that there are K cluster centers, the distances from the remaining n-K samples to the cluster centers are calculated, and if:
$D_r = \max\{\min(D_{i1}, D_{i2}, \dots, D_{iK})\} > \theta \times \|z_1 - z_2\|$ (9)
then the corresponding sample $x_r$ becomes the (K+1)-th cluster center, denoted $z_{K+1}$; this process repeats until no new cluster center appears;
6) when no new cluster center appears, each sample is assigned to the class of its nearest cluster center according to the minimum-distance principle.
5. The method for processing transformer DGA online monitoring data as claimed in claim 1, wherein the main process of sequence association mining in S5 is as follows:
1) setting the minimum support and minimum confidence parameters; the confidence and support thresholds are the basis for judging sequence associations and frequent item sets, the minimum support thresholds of the frequent-1 and frequent-2 item sets are denoted $minsup_1$ and $minsup_2$, and the minimum confidence threshold in sequence association mining is $mincon$;
2) generating the frequent item sets; the two summarized, symbolized sequences are taken as the transaction set, denoted
Figure FDA0003412507740000031
where
Figure FDA0003412507740000032
all symbol categories corresponding to the two sequences are $\{A_1, A_2, \dots, A_{CA}\}$ and $\{B_1, B_2, \dots, B_{CB}\}$; following the basic idea of the Apriori algorithm, the frequent item sets of the sequences are obtained by scanning the transaction set in two stages; the support of each symbol in a sequence is calculated according to equation (10):
Figure FDA0003412507740000041
where $N_t$ is the number of transactions, i.e. the number of elements in the sequence, and the support is the proportion of transactions that contain the item; when frequent-1 item sets are mined, the items whose support is greater than $minsup_1$ are placed in the collection of frequent-1 item sets;
the collections of frequent-1 item sets of the two sequences being mined are denoted $P_A$ and $P_B$; items from the two collections are paired by index parameter to form 2-item sets of the form $(P_{Ai}, P_{Bi})$, the support of each 2-item set is calculated, and those whose support is greater than $minsup_2$ are placed in the frequent-2 item set, denoted $\{P_A, P_B\}_{freq}$;
3) mining the sequence associations; all sequences are combined pairwise, and for each pair the support of the items in the frequent-2 item set and the confidence between the corresponding sequences are computed;
the supports of all frequent-2 item sets between two index parameters are accumulated according to formula (11), and the sum is taken as the support count of the two parameter sequences over all multivariate sequences;
Figure FDA0003412507740000042
$\sigma(X_A) = \mathrm{sum}(\sigma(P_A))$ (12)
$\sigma(X_B) = \mathrm{sum}(\sigma(P_B))$ (13)
where $m = CA + CB$ is the total number of line segment categories obtained from the cluster analysis of the two sequences; the minimum support threshold at the index-sequence level is denoted $minsup_3$, and if the support at the parameter-index level is greater than this threshold, the confidence $con(X_A \rightarrow X_B)$ of the combined symbol item sets of the two sequences is calculated as shown in formula (14):
Figure FDA0003412507740000043
when the confidence is greater than the minimum confidence threshold $mincon$, the association rule $X_A \rightarrow X_B$ is retained, the confidence is used to describe the strength of the association between the two indexes, and the two indexes are judged to be strongly associated.
6. The method for processing transformer DGA online monitoring data as claimed in claim 1, wherein the improved particle swarm optimization support vector regression in S7 repairs the vacant data points left by deleting abnormal values using support vector regression optimized with an improved particle swarm algorithm, and mainly comprises the following steps:
1) defining the number of variables m and generating N m-dimensional particles in the feasible solution space, where $S_t$ is the t-th generation of particles in the iteration:
Figure FDA0003412507740000051
whose elements are expressed as
Figure FDA0003412507740000052
2) determining the inertia weight, which represents how much of its velocity from the previous iteration a particle inherits; the specific expression is:
Figure FDA0003412507740000053
where $w_a$ and $w_z$ are the maximum and minimum values of the inertia weight, and $f$, $f_z$ and $f_{pj}$ are the fitness value of the particle, the minimum fitness value of all particles, and the average fitness value of all particles, respectively;
3) dividing the particle population into types, where the distance between particles is expressed as the Euclidean distance:
Figure FDA0003412507740000054
a standard distance is defined:
Figure FDA0003412507740000055
where r is the dividing radius, and the density $c_i$ of particle i is calculated:
Figure FDA0003412507740000056
where $n_i$ is the number of particles in the group of particle i and N is the number of particles in the generated solution set;
4) initializing the two learning factors $\mu_1$ and $\mu_2$ of the algorithm; when the particle density $c_i$ is greater than a given threshold, the update rule is:
Figure FDA0003412507740000057
when the particle density $c_i$ is less than the threshold, the update rule is:
Figure FDA0003412507740000058
CN202111534103.2A 2021-12-15 2021-12-15 Processing method of DGA (dissolved gas analysis) online monitoring data of transformer Pending CN114372093A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111534103.2A CN114372093A (en) 2021-12-15 2021-12-15 Processing method of DGA (dissolved gas analysis) online monitoring data of transformer

Publications (1)

Publication Number Publication Date
CN114372093A (en) 2022-04-19

Family

ID=81139694

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination