CN114372093A - Method for processing transformer DGA (dissolved gas analysis) online monitoring data - Google Patents
- Publication number
- CN114372093A (application CN202111534103.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- line segment
- sequence
- minimum
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention provides a method for processing transformer dissolved gas analysis (DGA) online monitoring data. Based on the characteristics of the returned data, the online data are treated as time series. In the first stage, the idea of the sliding-window algorithm is introduced and an improved piecewise linearization algorithm for sequences is proposed, which divides the sequence data into line segments characterized by slope and span; an improved K-means clustering then symbolizes the online monitoring data; finally, the Apriori algorithm mines the associations between different DGA indicators, and the abnormal values present in the data are identified on that basis. In the second stage, for the sampling points whose abnormal values have been screened out, a support vector regression algorithm optimized by an improved particle swarm is used. The improvement safeguards the solution speed and solution diversity of the algorithm, and the key parameters of the support vector regression are optimized to repair these sampling points, which completes the processing of the transformer online DGA monitoring data.
Description
Technical Field
The invention relates to a method for processing transformer DGA online monitoring data and belongs to the field of data cleaning for power equipment.
Background
Power transformers are pivotal equipment for the conversion and transmission of electric energy, and their safe, stable operation is an important guarantee of the quality of power supplied to users. Online DGA indicator data provide real-time monitoring of the insulation performance of the equipment, and analysis of the oil chromatography data quickly yields the real-time state of the transformer. DGA data also cover many indicator dimensions; mining the associations among these indicators helps to identify data with different abnormal patterns in the online data and increases the credibility of the comprehensive equipment state evaluation.
Because of the operating environment of the equipment and electromagnetic interference from the transformer itself, online monitoring devices are prone to randomly distributed abnormal values during data acquisition and transmission, and in severe cases to data drift and transmission interruption. Obvious anomalies such as data drift and data interruption can be identified quickly by the back-end system, which then raises an alarm. Abnormal values scattered randomly among normal online data, however, seriously interfere with the real-time characterization of equipment state indicators and affect indicator-based state evaluation. They easily cause false or erroneous reports of abnormal equipment states and waste operation and maintenance resources.
Power transformers are important equipment for ensuring the stable operation of transmission and distribution networks, and monitoring data of the transformer core grounding current are an important basis for assessing the state of the transformer. Monitoring data over a period of time, including the overall trend, the extreme points and jump points within it, and the statistical characteristics of the data, can reflect possible internal abnormalities of the power transformer from several angles.
After long-term operation of power equipment, a large volume of indicator data has been stored in power databases, and these data necessarily contain indicator data with different abnormal patterns. Performing association analysis on the existing indicator data to mine the associations it contains, analyzing data with different abnormal patterns on the basis of those associations, and effectively repairing these data helps to improve the comprehensive state evaluation system for power equipment, detect abnormal equipment states early, improve maintenance efficiency, and reduce operation and maintenance costs.
Summary of the Invention
The purpose of the invention is to provide a method for processing transformer DGA online monitoring data that solves the problems described in the background above.
The invention is realized by the following technical solution: a method for processing transformer DGA online monitoring data, comprising the following steps:
S1. Sliding-window processing of the data set: introduce the idea of the sliding window and use a window of length $L$ to intercept the online data set;
S2. Traversal of the online data set with a fixed sliding step: set the sliding step to $l$ and slide the window over the whole data set until all data have been traversed; with the length of the online data set denoted $L_1$, the traversal yields $n$ data windows, and the data in all windows are exported to form the data sets to be analyzed, $DS_i$, $i \in n$;
S3. Piecewise linearization of the sequence data: a piecewise linearization algorithm for sequence data is proposed that groups a variable number of points in the online data into multiple data point sets; points are grouped on the criterion that the error between the line segment fitted to all points in a group and the actual data points is below a threshold, and each fitted segment is characterized by its slope and its span;
S4. Construction of a model describing the similarity of line segments: build a similarity model from the slope and span of the segments, classify the segments with a K-means clustering algorithm improved by the max-min distance method, and assign a symbol to the segments of each class, which completes the symbolization of the sequence data;
S5. Mining of the associations between sequences: following the idea of the Apriori algorithm, set the minimum confidence and minimum support, mine the frequent itemsets that exist between different sequences, and quantify the associations between the sequences;
S6. Extraction and removal of the abnormal values in the DGA online monitoring data: according to the strength of the associations between sequences, determine the types of abnormal values present in the data and separate the data with different abnormal patterns;
S7. Support vector regression optimized by an improved particle swarm: define the distance between particle solution sets, compute from that distance the density at which each particle sits, and define the particle update rule according to the density; use the algorithm to optimize the key parameters of the support vector regression, which completes the processing of the DGA online data.
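Steps S1 and S2 can be illustrated with a minimal Python sketch. The function name `sliding_windows` is hypothetical, and the sketch assumes that only full windows are kept, so that a data set of length L_1 yields floor((L_1 - L) / l) + 1 windows; the patent text as reproduced here does not state its own expression for the window count.

```python
def sliding_windows(series, window_len, step):
    """Return every full window of `window_len` samples, advancing by `step`.

    Assumption (not from the patent text): a trailing partial window is dropped.
    """
    if window_len <= 0 or step <= 0:
        raise ValueError("window_len and step must be positive")
    last_start = len(series) - window_len  # negative when the series is too short
    return [series[s:s + window_len] for s in range(0, last_start + 1, step)]


if __name__ == "__main__":
    readings = list(range(10))  # stand-in for one DGA indicator series
    for window in sliding_windows(readings, window_len=4, step=2):
        print(window)
```

A smaller step produces more overlap between consecutive windows, which trades extra computation for finer coverage of the series.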
Further, the specific steps of the piecewise linearization algorithm for sequence data proposed in S3 are:
1) Online monitoring data of equipment indicators such as DGA are treated as time series data;
2) For the time series $X_K = \{x_1, x_2, \dots, x_k\}$, data points are intercepted with a window of length $L$ ($L < k$), and the data points inside the window are fitted piecewise linearly, following the idea of the sliding window;
3) The first data point in the window is taken as the fitting start point of the initial line segment and denoted $x_i$; assuming that the fitting end point of the initial segment is $x_{i+m}$ ($m > 1$), these $m+1$ data points are fitted to one line segment;
4) Such a line segment is expressed by the following equation:
$m y - (X_{i+m-1} - X_i) X - (m-1) X_i + X_{i+m-1} = 0 \quad (2)$
The distance from an actual data point to the fitted segment is taken as the fitting error. The distances from all actual data points within the step of the fitted segment to the segment are calculated, and their sum is the overall fitting error $ER$ of the segment.
5) Set the fitting error threshold to $ER_r$. If $ER < ER_r$, the segment can still accept further fitting points: let $m = m + 1$ and repeat the steps above. If $ER > ER_r$, the segment can no longer fit the data: save the fitting end point of the current segment as $X_{end} = X_{i+m-1}$, record its sampling time, return to step 3), reset the parameter $m$, and use the current fitting end point as the fitting start point of the next segment. Continue until all data points in the sequence have been fitted.
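The greedy segment-growing loop in steps 3) to 5) can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: samples are assumed to be equally spaced (index used as time), each candidate segment is the chord joining its first and last points, and the error is the sum of perpendicular distances to that chord. The names `segment_error` and `piecewise_linearize` are hypothetical.

```python
import math


def segment_error(x, start, end):
    """Sum of perpendicular distances from x[start..end] to the chord
    joining (start, x[start]) and (end, x[end])."""
    span = end - start
    rise = x[end] - x[start]
    norm = math.hypot(span, rise)  # span >= 1, so norm is never zero
    return sum(
        abs(rise * (t - start) - span * (x[t] - x[start])) / norm
        for t in range(start, end + 1)
    )


def piecewise_linearize(x, err_threshold):
    """Greedy segmentation; returns (start_index, end_index, slope) tuples."""
    segments = []
    start, n = 0, len(x)
    while start < n - 1:
        end = start + 1
        # Extend the segment while adding one more point keeps ER below the threshold.
        while end + 1 < n and segment_error(x, start, end + 1) < err_threshold:
            end += 1
        slope = (x[end] - x[start]) / (end - start)
        segments.append((start, end, slope))
        start = end  # the end point becomes the start of the next segment
    return segments


if __name__ == "__main__":
    print(piecewise_linearize([0, 1, 2, 3, 2, 1, 0], err_threshold=0.5))
```

Each returned tuple carries the two attributes the method uses later: the slope directly, and the span as `end_index - start_index`.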
Further, the main steps of constructing the similarity model in S4 and performing cluster analysis on the basis of this model are:
1) Standardize the attributes of all line segments in the same sequence;
2) For the cluster analysis, establish a criterion that measures the similarity of line segments: extract the two key parameters of each segment, its slope and its span, and describe the similarity between segments by a Euclidean distance in which weights express how much each segment attribute is taken into account;
3) Using this segment similarity model, perform cluster analysis on the set of segments with the K-means algorithm improved by the max-min distance method, so that similar segments are assigned to the same class.
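A weighted Euclidean distance over standardized slope and span might look like the sketch below. The exact standardization and weighting used by the patent are not given in this text, so min-max scaling, the placement of the weights inside the square root, and the default weights of 0.5 are all assumptions; `min_max_scale` and `segment_distance` are hypothetical names.

```python
import math


def min_max_scale(values):
    """Scale values to [0, 1]; a constant list maps to all zeros.

    Assumption: min-max scaling stands in for the patent's standardization.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def segment_distance(a, b, w_slope=0.5, w_span=0.5):
    """Weighted Euclidean distance between two (slope, span) pairs."""
    return math.sqrt(w_slope * (a[0] - b[0]) ** 2 + w_span * (a[1] - b[1]) ** 2)


if __name__ == "__main__":
    slopes = min_max_scale([1.0, -1.0, 0.2])
    spans = min_max_scale([3, 3, 6])
    segments = list(zip(slopes, spans))
    print(segment_distance(segments[0], segments[1]))
```

Raising `w_slope` relative to `w_span` makes the clustering more sensitive to trend direction than to segment length.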
Further, the main steps of the K-means algorithm improved by the max-min distance method in S4 are:
1) The max-min distance method is likewise based on the Euclidean distance; it differs from the K-means algorithm in that it chooses objects that are as far apart as possible as cluster centers. For the sample set, a proportional coefficient $\theta$ ($0 < \theta < 1$) is given, and an arbitrary sample of the sample set $s_n$ is taken as the initial cluster center, denoted $z_1$;
2) Among the remaining $n-1$ samples, the sample farthest from $z_1$ is taken as the second cluster center, denoted $z_2$;
3) Compute the distances from the remaining $n-2$ samples to $z_1$ and $z_2$ and take the smaller of the two for each sample, that is:
$D_{ij} = \|x_i - z_j\|, \; j = 1, 2 \quad (6)$
$D_i = \min(D_{i1}, D_{i2}), \; i = 1, 2, \dots, n \quad (7)$
4) If
$\max_i \{D_i\} > \theta \times \|z_1 - z_2\| \quad (8)$
then the sample that attains this maximum is selected as the third cluster center $z_3$;
5) Suppose there are $K$ cluster centers. Compute the distances from the remaining $n-K$ samples to the cluster centers; if
$D_r = \max\{\min(D_{i1}, D_{i2}, \dots, D_{iK})\} > \theta \times \|z_1 - z_2\| \quad (9)$
then the corresponding sample $x_r$ becomes the $(K+1)$-th cluster center, denoted $z_{K+1}$. This process is repeated until no new cluster center appears;
6) When no new cluster center appears, the samples are assigned to the classes according to the minimum-distance principle.
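The center-selection procedure in steps 1) to 6) can be sketched in a few lines. For reproducibility the sketch takes the first sample as the initial center, whereas the text allows any sample; it also uses the plain Euclidean distance from `math.dist` (Python 3.8 or later) rather than the weighted segment distance. The names `max_min_centers` and `assign_to_nearest` are hypothetical.

```python
import math


def max_min_centers(samples, theta):
    """Pick cluster centers by the max-min distance rule."""
    centers = [samples[0]]  # assumption: first sample as z1
    # z2: the sample farthest from z1
    centers.append(max(samples, key=lambda s: math.dist(s, centers[0])))
    reference = theta * math.dist(centers[0], centers[1])
    while True:
        # For each sample take the distance to its nearest center, then the largest of those.
        candidate = max(samples, key=lambda s: min(math.dist(s, z) for z in centers))
        gap = min(math.dist(candidate, z) for z in centers)
        if gap > reference:
            centers.append(candidate)
        else:
            return centers


def assign_to_nearest(samples, centers):
    """Label each sample with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(s, centers[j]))
        for s in samples
    ]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 8)]
    centers = max_min_centers(pts, theta=0.5)
    print(centers, assign_to_nearest(pts, centers))
```

Because the number of centers falls out of the threshold test, the choice of `theta` indirectly fixes the number of clusters: a smaller value admits more centers.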
Further, the main process of mining sequence associations in S5 is:
1) Setting of the minimum support and minimum confidence parameters. The confidence and support thresholds are the basis for deciding sequence associations and frequent itemsets. The minimum support thresholds of the frequent 1-itemsets and frequent 2-itemsets are denoted $minsup_1$ and $minsup_2$, and the minimum confidence threshold in sequence association mining is denoted $mincon$;
2) Generation of frequent itemsets. The two symbolized sequences, after aggregation, are used as the transaction set. All symbol classes of the two sequences are $\{A_1, A_2, \dots, A_{CA}\}$ and $\{B_1, B_2, \dots, B_{CB}\}$. Following the basic idea of the Apriori algorithm, the frequent itemsets of the sequences are obtained by a two-stage scan of the transaction set; the confidence of each symbol in the sequence is calculated according to Equation (10).
Here $N_t$ denotes the number of transactions, that is, the number of elements in the sequence. The support expresses the proportion of an item in the transaction set. When mining the frequent 1-itemsets, items whose support is greater than $minsup_1$ are placed in the set of frequent 1-itemsets;
The sets of frequent 1-itemsets of the two sequences in association mining are denoted $P_A$ and $P_B$. According to the indicator parameters, the items of the two sets are paired to form 2-itemsets of the form $(P_{Ai}, P_{Bi})$. The support of each item in these 2-itemsets is calculated, and items whose support is greater than $minsup_2$ are placed in the frequent 2-itemsets, denoted $\{P_A, P_B\}_{freq}$;
3) Mining of sequence associations. All sequences are combined in pairs, and for each pair the support of the items in the frequent 2-itemsets and the confidence between the corresponding sequences are counted;
According to Equation (11), the supports of all frequent 2-itemsets between the two indicator parameters are accumulated, and the result is used as the support count of these two parameter sequences among all multivariate sequences;
$\sigma(X_A) = \mathrm{sum}(\sigma(P_A)) \quad (12)$
$\sigma(X_B) = \mathrm{sum}(\sigma(P_B)) \quad (13)$
where $m = CA + CB$ is the total number of segment classes obtained from the cluster analysis of the two sequences. The minimum support threshold at the indicator-sequence level is denoted $minsup_3$. If the support at the parameter-indicator level is greater than this threshold, the confidence $con(X_A \to X_B)$ of the symbol itemset combination in the two sequences is calculated according to Equation (14).
When the confidence is greater than the minimum confidence threshold, the association rule $X_A \to X_B$ is retained, the confidence is used to describe the strength of the association between the two indicators, and the two indicators are judged to be strongly associated.
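Item-level support and confidence for two aligned symbol sequences can be sketched as below. This covers only the textbook Apriori quantities (support as a relative frequency, confidence as pair support divided by antecedent support); the sequence-level aggregation in Equations (11) to (14) is not reproduced in this text and is not implemented. The function names and the example symbols are hypothetical.

```python
from collections import Counter


def frequent_pairs(seq_a, seq_b, minsup1, minsup2):
    """Return (support of A symbols, support of B symbols, frequent 2-itemsets)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and equally long")
    n = len(seq_a)  # N_t: one transaction per aligned position
    sup_a = {s: c / n for s, c in Counter(seq_a).items()}
    sup_b = {s: c / n for s, c in Counter(seq_b).items()}
    freq_a = {s for s, v in sup_a.items() if v > minsup1}
    freq_b = {s for s, v in sup_b.items() if v > minsup1}
    # Second scan: only pairs built from frequent 1-itemsets are candidates.
    pair_sup = {p: c / n for p, c in Counter(zip(seq_a, seq_b)).items()}
    pairs = {
        p: v for p, v in pair_sup.items()
        if p[0] in freq_a and p[1] in freq_b and v > minsup2
    }
    return sup_a, sup_b, pairs


def rule_confidence(pair, pairs, sup_a):
    """Confidence of the rule pair[0] -> pair[1]."""
    return pairs[pair] / sup_a[pair[0]]


if __name__ == "__main__":
    h2 = ["u", "u", "d", "u", "d", "u"]   # symbolized hydrogen segments (made up)
    ch4 = ["U", "U", "D", "U", "U", "U"]  # symbolized methane segments (made up)
    sup_a, sup_b, pairs = frequent_pairs(h2, ch4, minsup1=0.3, minsup2=0.3)
    print(pairs, rule_confidence(("u", "U"), pairs, sup_a))
```

Pruning candidate pairs to those built from frequent 1-itemsets is the Apriori property at work: a pair cannot be frequent if either of its members is not.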
Further, the support vector regression optimized by an improved particle swarm in S7 works mainly as follows: the vacant value points left by the deletion of abnormal values are repaired with the support vector regression algorithm optimized by the improved particle swarm. The main steps are:
1) Determine the number of variables $m$ and generate $N$ particles of dimension $m$ in the space of feasible solutions; $S_t$ denotes the particles of the $t$-th generation in the iteration;
2) Determine the inertia weight. In its expression, $w_a$ and $w_z$ denote the maximum and minimum values of the inertia weight, and $f$, $f_z$ and $f_{pj}$ denote the fitness value of the particle, the minimum fitness value over all particles, and the average fitness value over all particles, respectively;
3) Classify the particle population. The distance between particles is expressed as a Euclidean distance, and a standard distance is defined in terms of a partition radius $r$. The density $c_i$ of particle $i$ is then calculated from $n_i$, the number of particles in the community of particle $i$, and $N$, the number of particles in the generated solution set;
4) According to the class of the population to which a particle belongs, initialize the two learning factors $\mu_1$ and $\mu_2$ of the algorithm. One update rule applies when the particle density $c_i$ is greater than a given threshold, and a different update rule applies when $c_i$ is less than that threshold.
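The density and inertia-weight ideas can be sketched as follows. The patent's own expressions are not reproduced in this text, so both functions are assumptions: density is taken as the fraction of particles (the particle itself included) lying within radius r, and the inertia weight uses a commonly seen adaptive form that gives below-average-fitness particles a weight between the minimum and maximum and all others the maximum. The names `particle_densities` and `adaptive_inertia` are hypothetical.

```python
import math


def particle_densities(particles, radius):
    """Assumed density c_i = n_i / N, with n_i counting particles within `radius`."""
    n = len(particles)
    return [
        sum(1 for q in particles if math.dist(p, q) <= radius) / n
        for p in particles
    ]


def adaptive_inertia(f, f_min, f_avg, w_max, w_min):
    """Assumed adaptive inertia weight for a minimization problem."""
    if f <= f_avg and f_avg > f_min:
        return w_min + (w_max - w_min) * (f - f_min) / (f_avg - f_min)
    return w_max


if __name__ == "__main__":
    # Each particle could encode SVR hyperparameters, for example (C, gamma).
    swarm = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
    print(particle_densities(swarm, radius=1.0))
    print(adaptive_inertia(f=1.0, f_min=1.0, f_avg=2.0, w_max=0.9, w_min=0.4))
```

Particles in crowded regions could then be given an update rule that favors exploration, and isolated particles one that favors convergence, which is the role the density threshold plays in step 4).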
The beneficial effects of the invention are:
Sequence segmentation and association analysis are used to mine the associations between different transformer oil chromatography indicators and to identify the abnormal points in the transformer DGA online monitoring data; these abnormal points are then repaired with a regression algorithm, which effectively increases the processing speed of transformer DGA online monitoring data.
Brief Description of the Drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 is a flow chart of the sequence segmentation algorithm;
Fig. 3 is a flow chart of the improved particle swarm algorithm;
Fig. 4 is a comparison of the fitted and measured hydrogen indicator;
Fig. 5 is a comparison of the fitted and measured methane indicator;
Fig. 6 shows the abnormal points detected in the hydrogen and methane sequences;
Fig. 7 shows the data repair results.
具体实施方式Detailed ways
下面结合实施例和附图对本发明的一种变压器DGA在线监测数据的处理方法做出详细说明。A method for processing transformer DGA on-line monitoring data of the present invention will be described in detail below with reference to the embodiments and the accompanying drawings.
一种变压器DGA在线监测数据的处理方法,如图1所示,包括如下步骤:A method for processing transformer DGA online monitoring data, as shown in Figure 1, includes the following steps:
S1、DGA在线数据的导入与滑动窗口算法的基本参数设置:在线监测数据的意义在于对设备指标的实时反映,设备经过长期的运行,其在线数据集的规模普遍较为庞大,对数据集整体进行分析复杂程度较高而不具备可行性,且在线数据具有时效性,即在分析某采样点时,距离改点越近的采样点对分析的意义越大,反之越小。本发明引入滑动窗口的思想,使用长度为L的窗口截取在线数据集,对窗口内数据的进行分析以降低过程的复杂程度。S1. Import of DGA online data and basic parameter setting of sliding window algorithm: The significance of online monitoring data lies in the real-time reflection of equipment indicators. After long-term operation of equipment, the scale of online data sets is generally relatively large. The analysis is complex and unfeasible, and the online data is time-sensitive, that is, when analyzing a sampling point, the closer the sampling point is to the changed point, the greater the significance of the analysis, and vice versa. The invention introduces the idea of sliding window, uses a window of length L to intercept the online data set, and analyzes the data in the window to reduce the complexity of the process.
S2、以一定的步长滑动窗口遍历在线数据集:设置滑动步长为l,拖动窗口在整体数据集上滑动,直至遍历所有数据;令在线数据集长度为L1,遍历之后得到个数据窗口,导出所有窗口中的数据,构成待分析数据集DSi,i∈n。S2. Traverse the online dataset by sliding the window with a certain step size: set the sliding step size to l, drag the window to slide on the overall dataset until all data is traversed; let the length of the online dataset be L 1 , and get the result after traversing There are data windows, and the data in all windows are exported to form the data set DS i to be analyzed, i∈n.
S3、序列数据的分段线性化:由于在线数据通常为数值型变量,不适用于序列数据的关联性挖掘;本发明提出一种序列数据的分段线性化算法,根据模型将在线数据中不定量的点组合在一起,形成多组数据点集;数据点的分组的标准在于其中所有点拟合出的线段与实际数据点之间的误差小于阈值,且使用的线段的斜率与线段跨度表征拟合出的线段。S3. Piecewise linearization of sequence data: since online data is usually a numerical variable, it is not suitable for correlation mining of sequence data; the present invention proposes a piecewise linearization algorithm of sequence data, which will Quantitative points are combined to form multiple sets of data points; the standard of grouping of data points is that the error between the line segment fitted by all points and the actual data point is less than the threshold, and the slope of the line segment and the line segment span are used to characterize The fitted line segment.
S4、构建描述不同线段相似度的模型:基于线段的斜率与跨度构建相似度模型,并使用基于最大最小距离改进的K-means聚类算法为线段划分类别,并为同类别线段赋予符号,完成序列数据的符号化。S4. Build a model describing the similarity of different line segments: build a similarity model based on the slope and span of the line segment, and use the improved K-means clustering algorithm based on the maximum and minimum distances to classify the line segments, and assign symbols to the same type of line segments, complete Symbolization of sequence data.
S5、挖掘不同序列之间的关联性:基于Apriori算法的思想,设置最小置信度与支持度,挖掘不同序列之间存在的频繁项集,量化不同序列之间的关联性。S5. Mining the correlation between different sequences: Based on the idea of Apriori algorithm, set the minimum confidence and support, mine the frequent itemsets existing between different sequences, and quantify the correlation between different sequences.
S6. Extract and screen out the abnormal values in the DGA online monitoring data: according to the strength of the association between sequences, determine the types of abnormal values present in the data and separate the data belonging to different abnormal patterns.
S7. Improved particle swarm optimization of support vector regression: define a distance between particle solutions, compute from it the density around each particle, and define the particle update rule according to that density, so as to improve the solution speed of the algorithm and the diversity of its solutions. The algorithm is used to optimize the key parameters of the support vector regression, which improves regression accuracy and completes the processing of the DGA online data.
The object studied by the method of the invention is the DGA online monitoring data of a main transformer.
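The windowing described in S1 and S2 can be sketched as follows. This is only an illustration: the function name `sliding_windows` and the toy data are hypothetical, and dropping a trailing partial window is an assumption, since the source expression for the number of windows is not available in this text.

```python
def sliding_windows(data, window_len, step):
    """Return every full window of length `window_len`, advancing by `step`."""
    if window_len <= 0 or step <= 0:
        raise ValueError("window_len and step must be positive")
    windows = []
    start = 0
    # Stop as soon as a full window no longer fits in the data
    # (assumption: a trailing partial window is discarded).
    while start + window_len <= len(data):
        windows.append(data[start:start + window_len])
        start += step
    return windows


# Toy stand-in for an online data set of length L1 = 10.
readings = list(range(10))
ds = sliding_windows(readings, window_len=4, step=2)
print(len(ds), ds[0], ds[-1])
```

Under this convention each window becomes one unit of analysis, and overlapping windows (step smaller than window length) share data points.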
As shown in Figure 2, the piecewise linearization algorithm for sequence data proposed in S3 proceeds as follows:
1) Online monitoring data of equipment indicators such as DGA are, in essence, state indicator values collected one by one at fixed time intervals. The data therefore have a strong temporal character and can be treated as time series.
2) For the time series $X_K = \{x_1, x_2, \dots, x_k\}$, data points are cut out with a window of length $L$ ($L < k$), and the points inside each window are fitted piecewise linearly, following the sliding-window idea.
3) Take the first data point in the window as the start of the initial line segment and denote it $x_i$. Assuming the initial segment ends at $x_{i+m}$ ($m > 1$), fit these $m+1$ data points as one line segment.
4) Such a line segment can be expressed as:
$$m\,y - (X_{i+m-1} - X_i)\,X - (m-1)\,X_i + X_{i+m-1} = 0 \qquad (2)$$
The distance from each actual data point to the fitted segment is used as the fitting error, which improves how accurately the segment fits the actual values. The distances from all actual data points within the span of the fitted segment to the segment are computed, and their sum is taken as the overall fitting error $ER$ of the segment.
5) Set the fitting error threshold $ER_r$. If $ER < ER_r$, the segment can still take more points: let $m = m + 1$ and repeat the steps above. If $ER > ER_r$, the segment can no longer be fitted: save the end point of the current segment as $X_{end} = X_{i+m-1}$, record its sampling time, return to step 3), reset $m$, and use the current end point as the start of the next segment. Continue until all data points in the sequence have been fitted.
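A minimal sketch of steps 3) to 5) is given below. It rests on several assumptions that the text does not confirm: samples are equally spaced so the sample index serves as the horizontal coordinate, each segment is the chord through its first and last point, and the error is the sum of perpendicular point-to-line distances. The names `chord_error` and `piecewise_linearize` are hypothetical.

```python
import math


def chord_error(values, start, end):
    """Sum of perpendicular distances from points start..end to the chord."""
    span = end - start
    slope = (values[end] - values[start]) / span
    norm = math.sqrt(1.0 + slope * slope)
    total = 0.0
    for k in range(start, end + 1):
        on_line = values[start] + slope * (k - start)
        total += abs(values[k] - on_line) / norm
    return total


def piecewise_linearize(values, max_error):
    """Greedily split `values` into segments (start, end, slope, span)."""
    segments = []
    start = 0
    last = len(values) - 1
    while start < last:
        end = start + 1  # a two-point segment always fits exactly
        # Extend while the next, longer chord stays under the threshold.
        while end < last and chord_error(values, start, end + 1) < max_error:
            end += 1
        span = end - start
        slope = (values[end] - values[start]) / span
        segments.append((start, end, slope, span))
        start = end  # the end point starts the next segment
    return segments


# Toy series: a ramp followed by a plateau.
series = [0.0, 1.0, 2.0, 3.0, 3.0, 3.0, 3.0]
segs = piecewise_linearize(series, max_error=0.1)
print(segs)
```

Each returned tuple carries the slope and span that the later clustering step uses to characterize a segment.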
In S4, the similarity model is built and used for cluster analysis in the following main steps:
1) Because the different indicators in DGA online monitoring differ by orders of magnitude, the attributes of all line segments in the same sequence must first be standardized.
2) Cluster analysis requires a criterion for measuring the similarity of line segments. DGA online data reflect real-time equipment indicators, and the trend and shape of the parameter changes best reflect changes in the operating state of the equipment, so the similarity model has to treat the attributes of a segment differently. The invention extracts two key parameters of each segment, its slope and its span, describes the similarity between segments with a Euclidean distance, and uses weights to express how much each attribute counts. The resulting segment similarity model is a weighted Euclidean distance over these two attributes.
3) Based on this segment similarity model, the set of line segments is clustered with the K-means algorithm improved with the max-min distance method, so that similar segments fall into the same category.
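The standardization and the weighted distance of steps 1) and 2) might look like the sketch below. The exact formulas are not recoverable from this text, so z-score standardization and the weights 0.7 and 0.3 are assumptions chosen purely for illustration, as are the function names.

```python
import math


def zscore(column):
    """Standardize one attribute column to zero mean and unit variance."""
    mean = sum(column) / len(column)
    var = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero for constant columns
    return [(v - mean) / std for v in column]


def segment_distance(a, b, w_slope=0.7, w_span=0.3):
    """Weighted Euclidean distance between two (slope, span) pairs."""
    return math.sqrt(w_slope * (a[0] - b[0]) ** 2 + w_span * (a[1] - b[1]) ** 2)


# Toy segments as (slope, span): two similar ones and one very different one.
segments = [(1.0, 3), (0.9, 3), (-2.0, 10)]
slopes = zscore([s[0] for s in segments])
spans = zscore([s[1] for s in segments])
features = list(zip(slopes, spans))

d_near = segment_distance(features[0], features[1])
d_far = segment_distance(features[0], features[2])
print(d_near < d_far)
```

Giving the slope the larger weight reflects the text's emphasis on trend, but the split between the two attributes is a tuning choice.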
The K-means algorithm improved with the max-min distance method in S4 has the following main steps:
1) The max-min distance method is also based on the Euclidean distance; it differs from the K-means algorithm in that it takes objects as far apart as possible as cluster centers. For the sample set $s_n$, given a proportional coefficient $\theta$ ($0 < \theta < 1$), take any sample in $s_n$ as the initial cluster center, denoted $z_1$;
2) Among the remaining $n-1$ samples, take the one farthest from $z_1$ as the second cluster center, denoted $z_2$;
3) Compute the distances from the remaining $n-2$ samples to $z_1$ and $z_2$, and take the smaller of the two for each sample, namely:
$$D_{ij} = \lVert x_i - z_j \rVert, \quad j = 1, 2 \qquad (6)$$
$$D_i = \min(D_{i1}, D_{i2}), \quad i = 1, 2, \dots, n \qquad (7)$$
4) If
$$\max_i \{D_i\} > \theta \times \lVert z_1 - z_2 \rVert \qquad (8)$$
then the sample attaining this maximum is selected as the third cluster center $z_3$;
5) Suppose there are $K$ cluster centers. Compute the distances from the remaining $n-K$ samples to the cluster centers; if
$$D_r = \max_i \{\min(D_{i1}, D_{i2}, \dots, D_{iK})\} > \theta \times \lVert z_1 - z_2 \rVert \qquad (9)$$
then the corresponding sample $x_r$ becomes the $(K+1)$-th cluster center, denoted $z_{K+1}$. This process is repeated until no new cluster center appears;
6) When no new cluster center appears, assign the samples to the classes by the minimum-distance rule. The advantage of the K-means clustering algorithm improved with the max-min distance method is that the cluster centers are the same in every run: it removes the randomness with which traditional K-means selects its centers, and can effectively improve the accuracy and speed of the cluster analysis.
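Steps 1) to 6) can be sketched as below. The sketch covers only the center selection and the final nearest-center assignment described above; it does not run any further K-means refinement. Fixing the first center to index 0 is an assumption made so that repeated runs are reproducible, and the function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maxmin_centers(samples, theta, first=0):
    """Pick cluster centers by the max-min distance rule; returns sample indices."""
    centers = [first]
    if len(samples) < 2:
        return centers
    # Second center: the sample farthest from the first one.
    second = max(range(len(samples)), key=lambda i: dist(samples[i], samples[first]))
    centers.append(second)
    base = dist(samples[first], samples[second])  # ||z1 - z2||
    while True:
        best_i, best_d = None, -1.0
        for i in range(len(samples)):
            if i in centers:
                continue
            # Distance from sample i to its nearest existing center.
            d_i = min(dist(samples[i], samples[c]) for c in centers)
            if d_i > best_d:
                best_i, best_d = i, d_i
        # Stop when no remaining sample is far enough from every center.
        if best_i is None or best_d <= theta * base:
            return centers
        centers.append(best_i)


def assign(samples, centers):
    """Label each sample with the position of its nearest center."""
    return [min(range(len(centers)), key=lambda k: dist(s, samples[centers[k]]))
            for s in samples]


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
centers = maxmin_centers(points, theta=0.5)
labels = assign(points, centers)
print(centers, labels)
```

A smaller `theta` admits more centers, so the coefficient indirectly controls how many symbol categories the segments are divided into.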
The main process of sequence association mining in S5 is:
1) Setting the minimum support and minimum confidence. The confidence and support thresholds are the basis for judging sequence associations and frequent itemsets, and suitable thresholds make the resulting associations more credible. Denote the minimum support thresholds for frequent 1-itemsets and frequent 2-itemsets as $minsup_1$ and $minsup_2$, and the minimum confidence threshold in sequence association mining as $mincon$.
2) Generating the frequent itemsets. The two symbolized sequences, after aggregation, are used as the transaction set. The symbol categories of the two sequences are $\{A_1, A_2, \dots, A_{CA}\}$ and $\{B_1, B_2, \dots, B_{CB}\}$. Following the basic idea of the Apriori algorithm, the invention obtains the frequent itemsets of the sequences through a two-stage scan of the transaction set. The support of each symbol in the sequence is computed according to Equation (10).
Here $N_t$ is the number of transactions, i.e. the number of elements in the sequence. Support expresses the proportion of the transaction set in which an item occurs. When mining frequent 1-itemsets, items whose support exceeds $minsup_1$ are placed in the set of frequent 1-itemsets.
Denote the sets of frequent 1-itemsets of the two sequences in the association mining as $P_A$ and $P_B$. According to the indicator parameters, the items of the two sets are paired to form 2-itemsets of the form $(P_{Ai}, P_{Bi})$. The support of each item in these 2-itemsets is computed, and items whose support exceeds $minsup_2$ are placed in the frequent 2-itemsets, denoted $\{P_A, P_B\}_{freq}$;
3) Mining the sequence associations. All sequences are combined in pairs, and for each pair the support of the items in the frequent 2-itemsets and the confidence between the corresponding sequences are computed;
According to Equation (11), the supports of all frequent 2-itemsets between the two indicator parameters are accumulated, and the sum is used as the support count of these two parameter sequences among all the multivariate sequences;
$$\sigma(X_A) = \mathrm{sum}(\sigma(P_A)) \qquad (12)$$
$$\sigma(X_B) = \mathrm{sum}(\sigma(P_B)) \qquad (13)$$
where $m = CA + CB$ is the total number of line-segment categories obtained from the cluster analysis of the two sequences. Denote the minimum support threshold at the indicator-sequence level as $minsup_3$. If the support at the indicator level exceeds this threshold, the confidence $con(X_A \to X_B)$ of the symbol itemset combinations in the two sequences is computed according to Equation (14).
When the confidence exceeds the minimum confidence threshold, the association rule $X_A \to X_B$ is retained; the confidence describes the strength of the association between the two indicators, which are judged to be strongly associated.
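The two-stage scan can be illustrated with the sketch below. It assumes the two symbolized sequences are aligned position by position, so that each position forms one transaction, and it uses the textbook definitions of support and confidence; the sequence-level expressions of Equations (11) and (14) are not available in this text and are not reproduced. The symbol sequences and function names are invented for illustration.

```python
from collections import Counter


def frequent_pairs(seq_a, seq_b, minsup1, minsup2):
    """Two-pass scan over aligned symbol sequences; returns {(a, b): support}."""
    n = len(seq_a)
    # Pass 1: frequent 1-itemsets of each sequence.
    freq_a = {s for s, c in Counter(seq_a).items() if c / n > minsup1}
    freq_b = {s for s, c in Counter(seq_b).items() if c / n > minsup1}
    # Pass 2: count co-occurring pairs built only from frequent symbols.
    pair_counts = Counter(
        (a, b) for a, b in zip(seq_a, seq_b) if a in freq_a and b in freq_b
    )
    return {p: c / n for p, c in pair_counts.items() if c / n > minsup2}


def rule_confidence(seq_a, seq_b, a, b):
    """Textbook confidence of the rule a -> b: support(a, b) / support(a)."""
    count_a = sum(1 for x in seq_a if x == a)
    count_ab = sum(1 for x, y in zip(seq_a, seq_b) if x == a and y == b)
    return count_ab / count_a if count_a else 0.0


# Hypothetical symbolized sequences of equal length.
seq_a = ["A1", "A1", "A2", "A1", "A2", "A1", "A3", "A1"]
seq_b = ["B1", "B1", "B2", "B1", "B2", "B2", "B1", "B1"]

pairs = frequent_pairs(seq_a, seq_b, minsup1=0.2, minsup2=0.2)
total_support = sum(pairs.values())  # accumulated support of frequent pairs
conf = rule_confidence(seq_a, seq_b, "A1", "B1")
print(pairs, total_support, conf)
```

Pruning infrequent symbols in the first pass is what keeps the second pass small, which is the core of the Apriori idea the text refers to.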
The improved particle swarm optimization of support vector regression in S7 is used as follows: for the missing values left by deleting outliers, the invention proposes a support vector regression algorithm optimized by an improved particle swarm to repair them. As shown in Figure 3, the main steps are:
1) Fix the number of variables $m$ and generate $N$ particles of dimension $m$ in the feasible solution space. $S_t$ denotes the $t$-th generation of particles in the iteration, each element of which is an $m$-dimensional particle.
2) Determine the inertia weight. In its expression, $w_a$ and $w_z$ denote the maximum and minimum inertia weights, and $f$, $f_z$ and $f_{pj}$ denote the fitness of the current particle, the minimum fitness over all particles and the average fitness over all particles, respectively.
3) Divide the particle population into types. The distance between every pair of particles is measured with the Euclidean distance, and a standard distance is defined from these distances. With $r$ as the dividing radius, the density $c_i$ of particle $i$ is then computed, where $n_i$ is the number of particles in the community of particle $i$ and $N$ is the number of particles in the generated solution set.
4) Initialize the two learning factors of the algorithm, $\mu_1$ and $\mu_2$. One update rule is applied when the particle density $c_i$ is above a given threshold, and a different update rule is applied when $c_i$ is below it.
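The density computation of step 3) might be sketched as follows. The source expressions for the standard distance, the density and the two update rules are not available in this text, so the sketch assumes the simplest reading consistent with the symbols defined above: $c_i = n_i / N$, where $n_i$ counts the particles within radius $r$ of particle $i$ (the particle itself included). The threshold 0.5 and all names are hypothetical.

```python
import math


def particle_density(particles, radius):
    """Fraction of the swarm within `radius` of each particle (itself included)."""
    n = len(particles)
    densities = []
    for p in particles:
        neighbours = sum(1 for q in particles if math.dist(p, q) <= radius)
        densities.append(neighbours / n)
    return densities


# Toy swarm in two dimensions: three clustered particles and one isolated one.
swarm = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (5.0, 5.0)]
densities = particle_density(swarm, radius=0.5)

# Split the swarm by an assumed density threshold; each group would then
# use its own update rule (the rules themselves are not reproduced here).
threshold = 0.5
crowded = [i for i, c in enumerate(densities) if c > threshold]
sparse = [i for i, c in enumerate(densities) if c <= threshold]
print(densities, crowded, sparse)
```

Treating crowded and sparse particles differently is how the text aims to balance convergence speed against diversity of solutions.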
A specific example is given below:
A method for processing transformer DGA online monitoring data, with the following steps:
S1. Sliding-window processing of the data set: after years of operation, the DGA online monitoring data of a power transformer are usually large, and processing the whole data set at once increases the complexity of the algorithm and the load on the server, so it is rarely practical. A DGA online data processing method based on the sliding-window idea is proposed: a data window of length $L$ is established and used to cut data out of the data set.
S2. Cut the data set at a fixed step: with step size $l$, slide the data window over the online monitoring data set of length $L_1$ to obtain the data windows, and export the window data to obtain the set of data windows to be processed, $\{DS_i\}$, $i \in n$. The data window is the basic unit of analysis in the subsequent processing.
S3. Piecewise linearization of the sequence data within a window: for each extracted data window $W_i$, the sequence data are extracted per DGA monitoring indicator. This example studies two gases in the DGA data, H₂ and CH₄, so each data window $W_i$ yields two sequences, which are then piecewise linearized.
S4. Cluster analysis of the line-segment set: for the set of line segments expressed as arrays, this example builds a model $ds_{ij}$ of segment similarity from the relevant parameters using the Euclidean distance. With this similarity model, the K-means clustering algorithm improved with the max-min distance method clusters the segment set, merges highly similar segments into one category, and assigns a symbol to the segments of each category, completing the symbolization of the sequence data.
S5. Association mining between sequences: for the two sequences that have been aggregated, following the idea of the Apriori algorithm, the minimum support thresholds $minsup_i$ at the different levels and the minimum confidence threshold $mincon$ at the indicator level are set. The frequent itemsets between the sequences are mined level by level, and the strength of the association between the indicators is finally determined.
S6. Extraction and removal of abnormal data based on the associations: according to the associations between the sequences, the invalid abnormal data are identified and extracted.
S7. Data repair: support vector regression optimized by the improved particle swarm is used to repair the DGA online monitoring data, completing the processing of the DGA online data.
Taking the hydrogen and methane indicators in the historical DGA online monitoring data of a main transformer as the object of study, the method proposed by the invention is used to fit the above window sequence data piecewise linearly. Note that because the indicators differ in order of magnitude, an appropriate fitting error threshold should be chosen for each indicator when applying the piecewise linear fitting. The fitting results for each indicator are shown in Figures 4 and 5.
The fitting results demonstrate the feasibility of the proposed piecewise linearization algorithm for online data: the fitting error of every segment is below the set threshold, and the fitted segments reflect well the trend of the online data points within each fitting interval, which verifies the effectiveness of the algorithm.
Mining the sequence associations: after the frequent itemsets are obtained, the method proposed by the invention is used to analyze the association between the two indicators, with support and confidence expressing its strength. The support and confidence of H₂ → CH₄ are 0.5050 and 0.6804 respectively, both above the corresponding minimum thresholds, so the rule is a strong association rule and the hydrogen and methane indicators are strongly associated. The detection results are shown in Figure 6, and the result of repairing the DGA online data is shown in Figure 7.
As can be seen, after the screened-out data points are repaired by the method of the invention using the other characteristic gases, all values return to normal levels and the online data are effectively cleaned.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111534103.2A CN114372093A (en) | 2021-12-15 | 2021-12-15 | Processing method of DGA (dissolved gas analysis) online monitoring data of transformer |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114372093A true CN114372093A (en) | 2022-04-19 |
Family
ID=81139694
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111534103.2A Pending CN114372093A (en) | 2021-12-15 | 2021-12-15 | Processing method of DGA (dissolved gas analysis) online monitoring data of transformer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114372093A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015176565A1 (en) * | 2014-05-22 | 2015-11-26 | 袁志贤 | Method for predicting faults in electrical equipment based on multi-dimension time series |
CN107065568A (en) * | 2017-05-26 | 2017-08-18 | 广州供电局有限公司 | A kind of Diagnosis Method of Transformer Faults based on particle swarm support vector machine |
CN107992880A (en) * | 2017-11-13 | 2018-05-04 | 山东斯博科特电气技术有限公司 | A kind of optimal lump classification method for diagnosing faults of power transformer |
KR102106827B1 (en) * | 2018-11-30 | 2020-05-06 | 두산중공업 주식회사 | System and method for optimizing boiler combustion |
CN112800686A (en) * | 2021-03-29 | 2021-05-14 | 国网江西省电力有限公司电力科学研究院 | Transformer DGA online monitoring data abnormal mode judgment method |
CN113792754A (en) * | 2021-08-12 | 2021-12-14 | 国网江西省电力有限公司电力科学研究院 | A method for on-line monitoring data processing of converter transformer DGA with first removal and then repair |
Non-Patent Citations (2)
Title |
---|
吴米佳;卢锦玲;: "基于改进粒子群算法与支持向量机的变压器状态评估", 电力科学与工程, no. 03, 28 March 2011 (2011-03-28) * |
郭世伟;孟昱煜;: "一个基于二阶粒子群的关联规则挖掘算法", 兰州交通大学学报, no. 03, 15 June 2016 (2016-06-15) * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115695564A (en) * | 2022-12-30 | 2023-02-03 | 深圳市润信数据技术有限公司 | Efficient transmission method for data of Internet of things |
CN115695564B (en) * | 2022-12-30 | 2023-03-10 | 深圳市润信数据技术有限公司 | Efficient transmission method of Internet of things data |
CN116776258A (en) * | 2023-08-24 | 2023-09-19 | 北京天恒安科集团有限公司 | Power equipment monitoring data processing method and system |
CN116776258B (en) * | 2023-08-24 | 2023-10-31 | 北京天恒安科集团有限公司 | Power equipment monitoring data processing method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||