Disclosure of Invention
The invention solves the technical problem of providing a method for processing converter transformer DGA online monitoring data by first removing outliers and then repairing them. In a first stage, a piecewise linearization algorithm, K-means clustering improved based on the maximum-minimum distance, and the Apriori algorithm are used to discover abnormal values in the data; in a second stage, the abnormal sampling points are repaired using a support vector regression algorithm optimized by an improved particle swarm algorithm, thereby realizing the processing of online DGA monitoring data of the power converter transformer.
The invention is realized by the following technical scheme. The method for processing converter transformer DGA online monitoring data by first removing outliers and then repairing them comprises the following steps:
s1, importing DGA online monitoring data, and setting the length and the sliding step length of a sliding window;
s2, piecewise linearization of sequence data: using a piecewise linearization algorithm of sequence data, a variable number of points in the online data are combined together according to the model to form multiple grouped data point sets; the criterion for grouping data points is that the error between the line segment fitted to all points in a group and the actual data points is less than a threshold value, and the fitted line segment is represented by its slope and its span;
s3, constructing a model for describing the similarity of different line segments: constructing a similarity model based on the slope and span of the line segments, classifying the line segments by using a K-means clustering algorithm improved based on the maximum and minimum distances, giving symbols to the line segments of the same class, and completing the symbolization of sequence data;
s4, mining the relevance among different sequences: setting a minimum confidence coefficient and a support degree based on an Apriori algorithm, mining a frequent item set existing among different sequences, and quantifying the relevance among the different sequences;
s5, extracting and screening abnormal values existing in the DGA online monitoring data: according to the strength of the correlation among the sequences, judging the types of abnormal values in the data and separating out data belonging to different abnormal modes;
s6, optimizing the key parameters of support vector regression by an improved particle swarm algorithm, and repairing the screened abnormal numerical points: defining the distance between particles in the solution set, calculating the density of different particles based on this distance, and introducing an improved fuzzy inference rule based on the density to define different particle update modes, so as to improve the diversity of solutions and the solving speed of the particle swarm algorithm; the key parameters of support vector regression are then optimized with the improved particle swarm algorithm to improve the data regression precision, the screened abnormal numerical points are repaired, and the processing of the DGA online monitoring data is completed.
Further preferably, in step S1, DGA online monitoring data is imported, the length of the sliding window is set to L, and the sliding step length is set to l; the online data set is traversed with the sliding window: the sliding window is dragged over the whole online monitoring data set with sliding step length l until all data are traversed. Let the length of the online monitoring data set be L1; after traversal, n = floor((L1 − L)/l) + 1 data windows are obtained, and the data in all windows are exported to form the data sets to be analyzed DSi, i = 1, …, n.
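A minimal Python sketch of the windowing in step S1 (the function name and the floor-division window count are illustrative assumptions, not the patent's notation):

```python
def sliding_windows(data, L, l):
    """Split a sequence into overlapping windows of length L with step l.

    Produces floor((len(data) - L) / l) + 1 windows; assumes len(data) >= L.
    """
    n = (len(data) - L) // l + 1
    # window i starts at offset i * l and covers L consecutive samples
    return [data[i * l : i * l + L] for i in range(n)]
```

Each returned window would then become one data set DSi to be analyzed by the later steps.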
Further preferably, the step S2 provides a piecewise linearization algorithm of the sequence data, which specifically comprises the following steps:
s2.1, for monitoring data XK = {x1, x2, …, xk}, data points are intercepted by a window with length L (L < k), and piecewise linear fitting is carried out on the data points contained in the intercepted window based on the idea of a sliding window;
s2.2, the first data point in the window is taken as the fitting starting point of the initial line segment and denoted xi; assuming that the fitting end point of the initial line segment is xi+m (m > 1), the m + 1 data points are fitted into a line segment;
the distance from the actual data points to the fitted line segment is used as the fitting error, which improves the fitting accuracy of the line segment to the actual numerical points; unlike conventional least-squares fitting, let dn be the linear distance from the actual data point xn to the fitted line segment; the linear distances from all actual data points within the span of the fitted line segment are calculated, and their sum is taken as the overall fitting error ER of the line segment: ER = di + di+1 + … + di+m;
where xi represents the sampled value at time i in the time series, m represents the number of numerical points contained in the fitted line segment, and tn represents the time step;
s2.3, setting the fitting error threshold to ERr. If ER < ERr, the line segment can still continue to add fitting points; let m = m + 1 and repeat the above steps. If ER = ERr, the current point is taken as the line segment fitting end point and a line segment is generated. If ER > ERr, the point cannot be fitted into the line segment; the fitting end point of the current line segment is stored as Xend = xi+m−1, the data sampling time is recorded, the procedure returns to step S2.2 with the parameter m reset, and the next part of the data is fitted with the current fitting end point as the fitting starting point of the next line segment, until all data points in the sequence are fitted.
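Steps S2.1-S2.3 can be sketched as a greedy segment-growing routine. The sketch below uses the vertical point-to-chord distance as a simplification of the patent's point-to-line distance, and all names are illustrative:

```python
def piecewise_linearize(x, er_max):
    """Greedy piecewise linear fit (a sketch of steps S2.2-S2.3).

    Each segment grows point by point until the summed vertical distance of
    the covered points to the fitted chord would exceed er_max; a new segment
    then starts at the current end point.  Returns (slope, span) pairs.
    """
    segments = []
    i = 0
    while i < len(x) - 1:
        m = 1
        while i + m + 1 < len(x):
            # trial extension of the segment end to x[i + m + 1]
            k = (x[i + m + 1] - x[i]) / (m + 1)
            er = sum(abs(x[i + j] - (x[i] + k * j)) for j in range(m + 2))
            if er > er_max:
                break          # extension rejected, keep current end point
            m += 1
        k = (x[i + m] - x[i]) / m
        segments.append((k, m))
        i += m                  # current end point starts the next segment
    return segments
```

A strictly linear run collapses to a single (slope, span) pair, while a trend reversal forces a new segment at the turning point.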
Preferably, in step S3, since there are certain order-of-magnitude differences between the different indicators in the DGA online monitoring data, all line segment triplets (ki, mi, ri) existing in the same sequence need to be standardized;
during cluster analysis, a standard for measuring line segment similarity is established; the similarity between line segments is described by the Euclidean distance, and the degree of consideration given to the different attributes of the line segments is expressed by weights; the established line segment similarity model is shown in the following formula:
dsij = sqrt(ωk(ki − kj)^2 + ωm(mi − mj)^2 + ωr(ri − rj)^2)
where dsij represents the line segment similarity, and ωk, ωm and ωr respectively represent the weights of the slope, the span and the growth rate in the line segment similarity model.
Further preferably, in step S3 of the present invention, the improved K-means algorithm based on the maximum and minimum distances includes the following main steps:
The maximum-minimum distance criterion is also based on the Euclidean distance; its difference from the K-means algorithm is that the object at maximum distance is taken as a clustering center. For a sample set of n samples, a proportionality coefficient θ (0 < θ < 1) is given, and an arbitrary sample is taken as the initial clustering center, denoted z1; from the remaining n − 1 samples, the sample farthest from z1 is taken as the second clustering center, denoted z2;
the distances of the remaining n − 2 samples to z1 and z2 are calculated and the minimum of the two is found, namely:
Dij=||xi-zj||,j=1,2 (6)
Di=min(Di1,Di2),i=1,2,…,n (7)
If
Di = max{D1, D2, …, Dn} > θ × ||z1 − z2|| (8)
then the corresponding sample si is selected as the third clustering center z3;
Assuming that K clustering centers have been obtained, the distances from the remaining n − K samples to the clustering centers are calculated, and if:
Dr = max{min(Di1, Di2, …, DiK)} > θ × ||z1 − z2|| (9)
then the corresponding sample xr is the (K + 1)-th clustering center, denoted zK+1; this process is repeated until no new clustering center appears;
when no new cluster center is present, the samples are assigned to each class according to the minimum distance principle. The improved K-means clustering algorithm based on the maximum and minimum distances has the advantages that the clustering centers are consistent during each clustering analysis, the randomness of selecting the clustering centers by the traditional K-means algorithm is eliminated, and the accuracy and the speed of the clustering analysis can be effectively improved.
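The center-selection loop above can be sketched as follows (a simplified illustration; the arbitrary initial center is taken as the first sample, and `theta` plays the role of the proportionality coefficient θ):

```python
import math

def max_min_centers(samples, theta):
    """Select cluster centers by the maximum-minimum distance rule.

    A new center is added while the largest "distance to the nearest
    existing center" exceeds theta * ||z1 - z2||; samples are feature
    vectors (lists of floats) and theta lies in (0, 1).
    """
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    z = [samples[0]]                                      # z1: arbitrary sample
    z.append(max(samples, key=lambda s: dist(s, z[0])))   # z2: farthest from z1
    base = dist(z[0], z[1])
    while True:
        # for every sample, the distance to its nearest existing center
        d = [min(dist(s, c) for c in z) for s in samples]
        i = max(range(len(samples)), key=lambda j: d[j])
        if d[i] > theta * base:
            z.append(samples[i])                          # new center found
        else:
            return z                                      # no new center appears
```

The returned centers would then seed the K-means pass, making each run of the cluster analysis start from the same centers.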
Further preferably, in step S4 of the present invention, the process of mining the association between different sequences is as follows:
s4.1, setting the minimum support and minimum confidence parameters; the confidence and support thresholds are the basis for judging sequence association and frequent item sets, and suitable threshold parameters help enhance the reliability of the association relations. The minimum support thresholds of the frequent-1 and frequent-2 item sets are denoted min sup1 and min sup2, and the minimum confidence threshold in the sequence association mining is min con;
s4.2, generating the frequent item sets; the two symbolized sequences after the merging operation are used as the transaction set, in which all symbol categories corresponding to the two sequences are {A1, A2, …, ACA} and {B1, B2, …, BCB}; based on the basic idea of the Apriori algorithm, the frequent item sets of the sequences are obtained by scanning the transaction set in two stages. The confidence for each symbol in the sequence is calculated according to equation (10):
where X and Y represent the two index objects for which association rules are to be mined, and Nt represents the number of transactions in the transaction set, i.e., the number of elements in the sequence. The support represents the proportion of an item in the transaction set; when exploring the frequent-1 item sets, items whose support is greater than min sup1 are placed in the frequent-1 item set;
the collections of frequent-1 item sets of the two sequences in the association mining are denoted PA and PB; the items in the sets are paired according to the index parameters to form 2-item sets of the form (PAi, PBi); the support of each item in the 2-item set is calculated, and items whose support is greater than min sup2 are placed in the frequent-2 item set, denoted {PA, PB}freq;
S4.3, mining of sequence relevance: all sequences are combined pairwise, and the support of the items in the frequent-2 item sets within the sequences and the confidence between the corresponding association mining sequences are counted respectively;
First, the supports of all frequent-2 item sets between two index parameters are accumulated according to equations (12) and (13), and the accumulated support is taken as the support of the two parameter sequences among all the multivariate sequences.
σ(XA)=sum(σ(PA)) (12)
σ(XB)=sum(σ(PB)) (13)
where m = CA + CB, CA and CB are the total numbers of line segment categories divided after cluster analysis of the two sequences, and m is the number of line segment categories after the two sequences are merged; meanwhile, the minimum support threshold of the index sequence layer is min sup3. If the support at the parameter index level is greater than the set threshold, the confidence con(XA → XB) of the symbol item set combination in the two sequences is calculated as shown in equation (14):
con(XA → XB) = σ(XA ∪ XB) / σ(XA) (14)
When the confidence is greater than the set minimum confidence threshold, the association rule XA → XB is retained; the confidence describes the strength of the association between the two indexes, and the two indexes are judged to be strongly associated.
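The support/confidence bookkeeping of steps S4.1-S4.3 can be illustrated on two symbolized sequences. This sketch only covers the pairwise 2-item-set case and uses the standard Apriori definitions of support and confidence; all names are assumptions:

```python
from collections import Counter

def mine_pairs(seq_a, seq_b, min_sup, min_con):
    """Frequent symbol pairs and rule confidences for two symbolized sequences.

    Each time index contributes one transaction (a_i, b_i); support is the
    fraction of transactions containing an item, and the confidence of the
    rule a -> b is sup(a, b) / sup(a), as in the Apriori algorithm.
    """
    n = len(seq_a)
    pair_count = Counter(zip(seq_a, seq_b))   # counts of 2-item sets
    a_count = Counter(seq_a)                  # counts of 1-item sets in A
    rules = {}
    for (a, b), c in pair_count.items():
        sup = c / n
        if sup >= min_sup:
            con = c / a_count[a]              # con(a -> b)
            if con >= min_con:
                rules[(a, b)] = (sup, con)
    return rules
```

Pairs below the support threshold (rare co-occurrences) are discarded before any confidence is evaluated, mirroring the two-stage scan of the transaction set.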
The improved particle swarm algorithm in the step S6 mainly comprises the following steps:
s6.1, the number of variables m is defined, and N m-dimensional particles are generated in the feasible solution space; St denotes the t-th generation of particles in the iteration.
S6.2, determining the inertial weight: the self-adaptive weight method can better find a balance point between the two, and the inertia weight is properly increased when the target values of all particles tend to be consistent; when the target values of the particles are relatively dispersed, the inertia weight value is properly reduced, and the specific expression is as follows:
where wa and wz represent the maximum and minimum values of the inertia weight, and f, fz and fpj respectively represent the fitness value of the particle, the minimum fitness value of all particles and the average fitness value of all particles;
s6.3, defining the fuzzy inference rule input variables: the density of the population in which a particle is located is expressed via the Euclidean distance between particles, from which the calculation formula of the particle density is obtained; here ni is the number of particles in the population of particle i, N is the number of particles in the generated solution set, and ci records the density of the particle. The density and the current iteration number k are normalized and taken as the two input variables of the fuzzy rule, and their membership degrees for the different states are calculated respectively;
s6.4, fuzzy inference rules: for the normalized input variables, the fuzzy sets low, medium and high density are defined; the membership function expressions are shown below, where c1-c3 are the interval thresholds of the membership functions, and x may be either of the two input variables, particle density or iteration number. The calculated membership degrees of the particle density and of the iteration number are cross-combined to form a 3 × 3 particle state fuzzy matrix K, which is multiplied by the vector cl formed by the particle density membership degrees to obtain the probability vector cb of the particle belonging to the different density intervals; the density interval with the maximum value for the current particle is taken, and different particle update modes are formulated accordingly.
S6.5, particle update rules: the two learning factors μ1 and μ2 of the algorithm are initialized.
When cbl is the maximum, the particle is solved only toward its own optimum, and the velocity update mode is as follows:
When cbm or cbh is the maximum, the algorithm is solved toward the global optimum and subgroup optimum directions, adopting the same update mode as the traditional particle swarm algorithm.
The technical effect of the invention is to provide a two-stage online DGA data processing method based on the idea of "first remove outliers, then repair", in which the online data is treated as a time series according to the characteristics of the returned data. The first stage introduces the idea of a sliding window algorithm, uses a piecewise linearization algorithm to divide the sequence data into a number of line segments characterized by slope and span, then symbolizes the online monitoring data with K-means clustering improved based on the maximum-minimum distance, and finally uses the Apriori algorithm to mine the relevance among the different indexes in the DGA and the abnormal values existing in it. The second stage, according to the screened abnormal sampling points, provides a support vector regression algorithm optimized by an improved particle swarm algorithm: the distance between particles in the solution set is defined, different types of particles are divided using fuzzy inference rules, and different update formulas are defined for the different particle types, which guarantees both the solving speed and the diversity of solutions of the algorithm; the key parameters of the support vector regression algorithm are optimized to repair the sampling points, realizing the processing of online DGA monitoring data of the power converter transformer.
Detailed Description
The present invention will be explained in further detail with reference to examples.
Referring to fig. 1, a method for processing converter transformer DGA online monitoring data by first removing outliers and then repairing them comprises the following steps:
s1, importing DGA online monitoring data, and setting the length and the sliding step length of a sliding window;
s2, piecewise linearization of sequence data: since online data are usually numerical variables, they are not directly suitable for relevance mining of sequence data; using a piecewise linearization algorithm of sequence data, a variable number of points in the online data are combined together according to the model to form multiple grouped data point sets; the criterion for grouping data points is that the error between the line segment fitted to all points in a group and the actual data points is less than a threshold value, and the fitted line segment is characterized by its slope and its span;
s3, constructing a model for describing the similarity of different line segments: constructing a similarity model based on the slope and span of the line segments, classifying the line segments by using a K-means clustering algorithm improved based on the maximum and minimum distances, giving symbols to the line segments of the same class, and completing the symbolization of sequence data;
s4, mining the relevance among different sequences: setting a minimum confidence coefficient and a support degree based on an Apriori algorithm, mining a frequent item set existing among different sequences, and quantifying the relevance among the different sequences;
s5, extracting and screening abnormal values existing in the DGA online monitoring data: according to the strength of the correlation among the sequences, judging the types of abnormal values in the data and separating out data belonging to different abnormal modes;
s6, optimizing the key parameters of support vector regression by an improved particle swarm algorithm, and repairing the screened abnormal numerical points: defining the distance between particles in the solution set, calculating the density of different particles based on this distance, and introducing an improved fuzzy inference rule based on the density to define different particle update modes, so as to improve the diversity of solutions and the solving speed of the particle swarm algorithm; the key parameters of support vector regression are then optimized with the improved particle swarm algorithm to improve the data regression precision, the screened abnormal numerical points are repaired, and the processing of the DGA online monitoring data is completed.
Specifically, in step S1, DGA online monitoring data is imported, the length of the sliding window is set to L, and the sliding step length is set to l; the online data set is traversed with the sliding window: the sliding window is dragged over the whole online monitoring data set with sliding step length l until all data are traversed. Let the length of the online monitoring data set be L1; after traversal, n = floor((L1 − L)/l) + 1 data windows are obtained, and the data in all windows are exported to form the data sets to be analyzed DSi, i = 1, …, n.
Specifically, the specific steps of the piecewise linearization algorithm of the sequence data set forth in step S2 are:
s2.1, for monitoring data XK = {x1, x2, …, xk}, data points are intercepted by a window with length L (L < k), and piecewise linear fitting is carried out on the data points contained in the intercepted window based on the idea of a sliding window.
S2.2, the first data point in the window is taken as the fitting starting point of the initial line segment and denoted xi; assuming that the fitting end point of the initial line segment is xi+m (m > 1), the m + 1 data points are fitted into a line segment.
The distance from the actual data points to the fitted line segment is used as the fitting error, which improves the fitting accuracy of the line segment to the actual numerical points; unlike conventional least-squares fitting, let dn be the linear distance from the actual data point xn to the fitted line segment; the linear distances from all actual data points within the span of the fitted line segment are calculated, and their sum is taken as the overall fitting error ER of the line segment: ER = di + di+1 + … + di+m;
where xi represents the sampled value at time i in the time series, m represents the number of numerical points contained in the fitted line segment, and tn represents the time step;
s2.3, setting the fitting error threshold to ERr. If ER < ERr, the line segment can still continue to add fitting points; let m = m + 1 and repeat the above steps. If ER = ERr, the current point is taken as the line segment fitting end point and a line segment is generated. If ER > ERr, the point cannot be fitted into the line segment; the fitting end point of the current line segment is stored as Xend = xi+m−1, the data sampling time is recorded, the procedure returns to step S2.2 with the parameter m reset, and the next part of the data is fitted with the current fitting end point as the fitting starting point of the next line segment, until all data points in the sequence are fitted.
Assume that the slope of the fitted line segment is ki and the number of fitted numerical points in the line segment is mi; the actual growth rate ri of the line segment fitting data can then be expressed accordingly. The three elements ki, mi and ri constitute the line segment triplet (ki, mi, ri), and this array represents one fitted line segment.
Since piecewise linearization is a data fitting process, the quality of the fitting effect is related to the error magnitude. Considering the characteristic attributes of a general line segment, the present invention uses the slope ki, the fitted span mi and the growth rate ri of a line segment, forming the array {ki, mi, ri} to represent each line segment.
In particular, in step S3, during cluster analysis, a standard for measuring line segment similarity needs to be established. The DGA online monitoring data reflect real-time indexes of the equipment, and the change trend and form of the parameters best reflect changes in the equipment operating state. The invention extracts two key parameters, the slope and the span of a line segment, describes the similarity between line segments by the Euclidean distance, and defines a line segment similarity model. Based on the similarity model, cluster analysis is performed on the line segment set using the K-means algorithm improved based on the maximum-minimum distance, and similar line segments are divided into the same category.
In particular, in step S3, since there are certain order-of-magnitude differences between the different indicators in the online DGA monitoring data, all line segment triplets (ki, mi, ri) existing in the same sequence need to be standardized;
specifically, in step S3, during cluster analysis, a criterion for measuring line segment similarity is established; the similarity between line segments is described by the Euclidean distance, and the degree of consideration given to the different attributes of the line segments is expressed by weights; the established line segment similarity model is shown in the following formula:
dsij = sqrt(ωk(ki − kj)^2 + ωm(mi − mj)^2 + ωr(ri − rj)^2)
where dsij represents the line segment similarity, and ωk, ωm and ωr respectively represent the weights of the slope, the span and the growth rate in the line segment similarity model.
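Assuming the similarity model is the weighted Euclidean distance over the (slope, span, growth-rate) triplets, a minimal sketch is:

```python
import math

def segment_similarity(s1, s2, wk, wm, wr):
    """Weighted Euclidean distance between two segment triplets (k, m, r).

    Smaller values mean more similar segments; wk, wm and wr weight the
    slope, span and growth-rate attributes (assumed to sum to 1 after the
    triplets have been standardized).
    """
    return math.sqrt(wk * (s1[0] - s2[0]) ** 2
                     + wm * (s1[1] - s2[1]) ** 2
                     + wr * (s1[2] - s2[2]) ** 2)
```

This distance is what the improved K-means pass would minimize when grouping line segments into symbol classes.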
In step S3 of the present invention, the K-means algorithm improved based on the maximum and minimum distances includes the following main steps:
The maximum-minimum distance criterion is also based on the Euclidean distance; its difference from the K-means algorithm is that the object at maximum distance is taken as a clustering center. For a sample set of n samples, a proportionality coefficient θ (0 < θ < 1) is given, and an arbitrary sample is taken as the initial clustering center, denoted z1; from the remaining n − 1 samples, the sample farthest from z1 is taken as the second clustering center, denoted z2;
the distances of the remaining n − 2 samples to z1 and z2 are calculated and the minimum of the two is found, namely:
Dij=||xi-zj||,j=1,2 (6)
Di=min(Di1,Di2),i=1,2,…,n (7)
If
Di = max{D1, D2, …, Dn} > θ × ||z1 − z2|| (8)
then the corresponding sample si is selected as the third clustering center z3;
Assuming that K clustering centers have been obtained, the distances from the remaining n − K samples to the clustering centers are calculated, and if:
Dr=max{min(Di1,Di2,…Dik)}>θ×||z1-z2|| (9)
then the corresponding sample xr is the (K + 1)-th clustering center, denoted zK+1; this process is repeated until no new clustering center appears;
when no new cluster center is present, the samples are assigned to each class according to the minimum distance principle. The improved K-means clustering algorithm based on the maximum and minimum distances has the advantages that the clustering centers are consistent during each clustering analysis, the randomness of selecting the clustering centers by the traditional K-means algorithm is eliminated, and the accuracy and the speed of the clustering analysis can be effectively improved.
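Once the centers are fixed, the minimum-distance assignment can be sketched as:

```python
import math

def assign_to_centers(samples, centers):
    """Assign each sample to its nearest cluster center.

    Implements the minimum distance principle applied once no new
    clustering center appears; returns the center index for each sample.
    """
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    return [min(range(len(centers)), key=lambda j: dist(s, centers[j]))
            for s in samples]
```

Because the centers are selected deterministically by the maximum-minimum distance rule, this assignment is the same on every run of the cluster analysis.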
In step S4 of the present invention, the process of mining the association between different sequences is as follows:
s4.1, setting the minimum support and minimum confidence parameters; the confidence and support thresholds are the basis for judging sequence association and frequent item sets, and suitable threshold parameters help enhance the reliability of the association relations. The minimum support thresholds of the frequent-1 and frequent-2 item sets are denoted min sup1 and min sup2, and the minimum confidence threshold in the sequence association mining is min con.
S4.2, generating the frequent item sets; the two symbolized sequences after the merging operation are used as the transaction set, in which all symbol categories corresponding to the two sequences are {A1, A2, …, ACA} and {B1, B2, …, BCB}; based on the basic idea of the Apriori algorithm, the invention obtains the frequent item sets of the sequences by scanning the transaction set in two stages. The confidence for each symbol in the sequence is calculated according to equation (10):
where X and Y represent the two index objects for which association rules are to be mined, and Nt represents the number of transactions in the transaction set, i.e., the number of elements in the sequence. The support represents the proportion of an item in the transaction set; when exploring the frequent-1 item sets, items whose support is greater than min sup1 are placed in the frequent-1 item set.
The collections of frequent-1 item sets of the two sequences in the association mining are denoted PA and PB; the items in the sets are paired according to the index parameters to form 2-item sets of the form (PAi, PBi); the support of each item in the 2-item set is calculated, and items whose support is greater than min sup2 are placed in the frequent-2 item set, denoted {PA, PB}freq.
S4.3, mining of sequence relevance: all sequences are combined pairwise, and the support of the items in the frequent-2 item sets within the sequences and the confidence between the corresponding association mining sequences are counted respectively;
First, the supports of all frequent-2 item sets between two index parameters are accumulated according to equations (12) and (13), and the accumulated support is taken as the support of the two parameter sequences among all the multivariate sequences.
σ(XA)=sum(σ(PA)) (12)
σ(XB)=sum(σ(PB)) (13)
where m = CA + CB, CA and CB are the total numbers of line segment categories divided after cluster analysis of the two sequences, and m is the number of line segment categories after the two sequences are merged. Meanwhile, the minimum support threshold of the index sequence layer is min sup3. If the support at the parameter index level is greater than the set threshold, the confidence con(XA → XB) of the symbol item set combination in the two sequences is calculated as shown in equation (14):
con(XA → XB) = σ(XA ∪ XB) / σ(XA) (14)
When the confidence is greater than the set minimum confidence threshold, the association rule XA → XB is retained; the confidence describes the strength of the association between the two indexes, and the two indexes are judged to be strongly associated.
Based on the idea of the Apriori algorithm, the invention sets minimum support thresholds min supi at different levels for the two sequences after the merging operation, continuously mines the frequent item sets existing among the sequences, and finally judges the strength of the association relations among the indexes.
The main idea of the improved particle swarm optimization of support vector regression in step S6 is as follows: for the vacant numerical points caused by the deletion of abnormal values, a support vector regression algorithm optimized by an improved particle swarm algorithm is proposed for repair. In the classification and regression problems of support vector machines, a kernel function is introduced to convert the nonlinear problem in the input space into a linear problem in a high-dimensional space, which can effectively reduce the complexity of the algorithm. The present invention uses a radial basis function (RBF) kernel. To obtain the optimal parameters of the RBF function, the mean square error is used as the fitness function, and the parameters C and γ of the support vector machine are optimized using the improved particle swarm algorithm.
The particle swarm algorithm tends to converge prematurely and fall into local optima. In the particle iteration process, the density of different particles is defined through the Euclidean distance between particles, and the particles are updated with different update modes according to the density of the cluster to which they belong; this guarantees the convergence speed of the algorithm while preserving the diversity of the solution set and avoiding local optima. The main steps of the improved particle swarm optimization algorithm are as follows:
1) The number of variables m is defined, and N m-dimensional particles are generated in the space of feasible solutions; St denotes the t-th generation of particles in the iteration.
2) The inertia weight is determined; it represents how much of the velocity from the previous iteration the particle inherits. When its value is larger, the global optimization capability of the population is stronger and the local optimization capability is weaker; when its value is small, the learning ability of the particles is strong and they converge to a local optimum at a higher speed. The adaptive weight method can find a balance point between the two: when the target values of the particles tend to be consistent, the inertia weight is appropriately increased; when the target values of the particles are relatively dispersed, the inertia weight is appropriately decreased. The specific expression is as follows:
where wa and wz represent the maximum and minimum values of the inertia weight, and f, fz and fpj respectively represent the fitness value of the particle, the minimum fitness value of all particles and the average fitness value of all particles.
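The patent's exact weight expression is not reproduced above, so the sketch below uses a common adaptive-inertia-weight form (minimization convention; the default weight bounds are illustrative):

```python
def adaptive_weight(f, f_min, f_avg, w_max=0.9, w_min=0.4):
    """Adaptive inertia weight (a common form, not the patent's exact one).

    Particles better than the population average (smaller fitness, for
    minimization) get a smaller weight so they refine locally; worse
    particles keep the maximum weight so they continue exploring.
    """
    if f <= f_avg and f_avg != f_min:
        return w_min + (w_max - w_min) * (f - f_min) / (f_avg - f_min)
    return w_max
```

The weight thus shrinks toward `w_min` for the best particle of the swarm and stays at `w_max` for dispersed, poorly-performing particles.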
3) Defining fuzzy inference rule input variables, density of population where particles are located, and expressing the distance between each particle by Euclidean distance:
from which the calculation formula of the particle density c_i = n_i / N is obtained, where n_i is the number of particles in the subpopulation containing particle i and N is the total number of generated particles. The density c_i and the current iteration count k are normalized and used as the two input variables of the fuzzy rules, and their membership degrees with respect to the different states are calculated separately.
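The density computation of step 3) can be sketched as follows; the neighbourhood radius that delimits a subpopulation is an assumed parameter, not fixed by the patent.

```python
import math

def particle_density(swarm, i, radius):
    """Density c_i = n_i / N, where n_i is the number of particles lying
    within `radius` (Euclidean distance) of particle i and N is the total
    number of particles."""
    n_i = sum(1 for xj in swarm if math.dist(swarm[i], xj) <= radius)
    return n_i / len(swarm)

def normalize(value, v_min, v_max):
    """Scale a density or iteration count into [0, 1] for the fuzzy rules."""
    return (value - v_min) / (v_max - v_min) if v_max > v_min else 0.0
```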
4) Fuzzy inference rules. For the fuzzy sets of the normalized input variables, define low density (L), medium density (M), and high density (H); the membership function is expressed by the following formula, where c_1 to c_3 are the interval thresholds of the membership function.
Here x may be either of the two input variables, particle density and iteration count. The calculated membership degrees of particle density and iteration count are cross-combined to form a 3 x 3 fuzzy particle-state matrix K, which is multiplied by the vector c_l formed from the particle-density membership degrees to obtain the probability vector c_b of the particle belonging to the different density intervals.
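A sketch of the fuzzy machinery in step 4). The triangular membership shape and the way K is weighted by c_l follow one plausible reading of the description; the patent's exact piecewise definitions are not reproduced here.

```python
def membership(x, c1, c2, c3):
    """Membership degrees of a normalized input x in the Low / Medium / High
    fuzzy sets, with interval thresholds c1 < c2 < c3 (triangular shape
    assumed)."""
    low = max(0.0, min(1.0, (c2 - x) / (c2 - c1)))
    high = max(0.0, min(1.0, (x - c2) / (c3 - c2)))
    mid = max(0.0, 1.0 - low - high)
    return [low, mid, high]

def density_interval_probabilities(c_l, c_k):
    """Cross-combine the density memberships c_l and the iteration-count
    memberships c_k into the 3x3 state matrix K, then weight K by c_l to
    obtain the probability vector c_b = (c_bl, c_bm, c_bh)."""
    K = [[a * b for b in c_k] for a in c_l]
    return [sum(c_l[i] * K[i][j] for i in range(3)) for j in range(3)]
```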
The density interval with the maximum probability is taken as the current state of the particle, and a different particle update rule is formulated for each state.
5) Particle update rules. Initialize the two learning factors of the algorithm, μ_1 and μ_2.
When c_bl is the maximum, the particle moves only toward its own best solution, and its velocity is updated as follows:
When c_bm or c_bh is the maximum, the algorithm searches toward the global optimum and the subgroup optimum, adopting the same update scheme as the conventional particle swarm algorithm. Here, a particle's own best refers to the locally optimal solution within a low-density particle subpopulation. The flow of the APSO-SVR algorithm is shown in FIG. 2.
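The two update modes of step 5) can be sketched as follows; this is a minimal sketch in which the update formula is assumed to be the standard PSO form, with the density-dependent switch described above.

```python
import random

def update_velocity(v, x, p_best, g_best, w, mu1, mu2, mode, rng):
    """Density-dependent velocity update. In "self" mode (c_bl maximal) the
    particle moves only toward its own best; otherwise the standard PSO
    update toward both the personal best and the global/subgroup best is
    used."""
    r1, r2 = rng.random(), rng.random()
    new_v = []
    for d in range(len(v)):
        vd = w * v[d] + mu1 * r1 * (p_best[d] - x[d])
        if mode != "self":
            vd += mu2 * r2 * (g_best[d] - x[d])
        new_v.append(vd)
    return new_v
```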
Application case
1. The hydrogen and methane gas indices in the historical DGA online monitoring data of a certain main transformer are taken as the research objects. Since oil chromatogram online monitoring data are generally sampled once per day, the invention takes roughly two years of samples (720 points) as the data window length and drags the window across the entire historical data set with one quarter of samples (90 points) as the step size.
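The windowing described above can be sketched as:

```python
def sliding_windows(series, window=720, step=90):
    """Yield successive views of roughly two years of daily samples
    (720 points), advancing one quarter (90 points) at a time."""
    for start in range(0, len(series) - window + 1, step):
        yield series[start:start + window]

# e.g. a 1000-day history yields windows starting at days 0, 90, 180, 270
n_windows = sum(1 for _ in sliding_windows(list(range(1000))))
```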
2. Piecewise linearization of sequence data: the intercepted window sequence data are fitted piecewise-linearly with the method provided by the invention; the fitting results for each index are shown in figures 3 and 4. As can be seen from figures 3 and 4, the fitting of the DGA online monitoring data indices is successful, and the line segment connecting the two end points represents all data points within the segment span.
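The grouping criterion of step 2 (a segment grows while the fit error stays under a threshold, and each segment is represented by its slope and span) can be sketched as follows; the greedy growth strategy is an assumption, since the patent does not fix a particular segmentation scheme.

```python
def piecewise_linearize(data, max_err):
    """Greedy piecewise-linear fit: grow each segment until some point's
    vertical distance to the line through the segment endpoints exceeds
    max_err; report each segment as (slope, span)."""
    segments, start = [], 0
    while start < len(data) - 1:
        end = start + 1
        while end + 1 < len(data):
            cand = end + 1
            slope = (data[cand] - data[start]) / (cand - start)
            err = max(abs(data[start] + slope * (k - start) - data[k])
                      for k in range(start, cand + 1))
            if err > max_err:
                break
            end = cand
        slope = (data[end] - data[start]) / (end - start)
        segments.append((slope, end - start))
        start = end
    return segments
```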
3. Constructing a model describing the similarity of different line segments: a similarity model is constructed based on the slope and span of each line segment, the line segments are classified with the K-means clustering algorithm improved on the basis of the maximum-minimum distance, line segments of the same class are assigned a symbol, and the symbolization of the sequence data is completed.
4. Mining the relevance between different sequences: after the corresponding frequent item sets are obtained, the relevance between the two indices is analyzed with the method provided by the invention, the support degree characterizing the coverage of the association and the confidence its strength. The rule H2 → CH4 has a support of 0.5050 and a confidence of 0.6804, both greater than the set minimum thresholds, indicating that it is a strong association rule and that a strong association exists between the hydrogen and methane indices.
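The support and confidence figures above follow the standard Apriori definitions, which can be checked with a small sketch (the symbol windows below are illustrative, not the patent's data):

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support = P(antecedent and consequent); confidence =
    P(consequent | antecedent), over a list of symbol sets."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    return both / n, (both / ante if ante else 0.0)

# Illustrative windows of co-occurring trend symbols for H2 and CH4
windows = [{"H2_up", "CH4_up"}, {"H2_up"}, {"H2_up", "CH4_up"}, {"CH4_up"}]
support, confidence = rule_metrics(windows, {"H2_up"}, {"CH4_up"})
```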
5. Extracting and screening abnormal values in the DGA online monitoring data: given the known strong correlation between hydrogen and methane, outlier detection is performed on the two index sequences. Abnormal values of the hydrogen online data are found at sampling points 42 to 54, 85 to 91, and 201 to 206, while the methane online data show no abnormality in the sampling periods around these points; these abnormal sampling points are therefore judged to be caused by an abnormal operating state of the monitoring device, are removed from the cleaned data set, and serve as a basis for judging the operating state of the online monitoring device. At sampling points 466 to 471 the methane online monitoring data are abnormal, and at sampling points 466 to 473 the hydrogen online monitoring data are abnormal; since the abnormal periods of the two indices coincide, the index data in the nearby sampling period are retained and marked as abnormal points of the equipment operating state.
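The decision rule applied in step 5 — an outlier confirmed in only one gas points to the monitoring device, while coinciding outliers point to the equipment — can be sketched as follows; the ±5-sample tolerance is an assumption, not a value stated in the patent.

```python
def classify_anomalies(anom_a, anom_b, tol=5):
    """Label each anomalous sampling point of sequence A as an equipment-state
    anomaly if sequence B is also anomalous within `tol` samples, otherwise
    as a monitoring-device anomaly (to be cleaned from the data set)."""
    return {p: ("equipment" if any(abs(p - q) <= tol for q in anom_b)
                else "device")
            for p in anom_a}

labels = classify_anomalies({50, 470}, {468})
```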
6. The improved particle swarm optimization algorithm optimizes the key parameters of support vector regression, and the screened abnormal value points are repaired. To verify the effectiveness of the APSO-SVR algorithm, a section of converter transformer online data recorded under normal operation is intercepted and used to validate the data repair algorithm proposed herein. Regression analysis models are constructed with ordinary PSO and with the APSO proposed herein; the DGA online monitoring data serve as the verification object, with hydrogen as the test data set and the other four gases as the training data sets. The optimization processes and regression results of the different models are shown in FIG. 5.
As the comparison in FIG. 6 shows, the prediction results of the SVR model optimized by the APSO algorithm are closer to the actual values and have a smaller relative prediction error, demonstrating the effectiveness of the data repair strategy proposed herein. The results of repairing the DGA online monitoring data with the improved particle swarm optimization support vector regression algorithm are also shown in FIG. 6: after the screened data points are repaired with the method herein, relying on the other characteristic gases, all values return to normal levels and the online data are effectively cleaned.
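The relative prediction error used for the comparison above can be computed as follows (a minimal sketch with illustrative numbers, not the case-study measurements):

```python
def mean_relative_error(predicted, actual):
    """Mean of |prediction - actual| / |actual| over all repaired points."""
    return sum(abs(p - a) / abs(a)
               for p, a in zip(predicted, actual)) / len(actual)

err = mean_relative_error([10.5, 19.0], [10.0, 20.0])  # illustrative values
```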