CN117031202A - K-SMOTE and depth forest based power transmission line fault multi-source diagnosis method and system - Google Patents
K-SMOTE and depth forest based power transmission line fault multi-source diagnosis method and system
- Publication number
- CN117031202A
- Authority
- CN
- China
- Prior art keywords
- fault
- data
- data set
- transmission line
- smote
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01R—MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
- G01R31/00—Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
- G01R31/08—Locating faults in cables, transmission lines, or networks
- G01R31/081—Locating faults in cables, transmission lines, or networks according to type of conductors
- G01R31/085—Locating faults in cables, transmission lines, or networks according to type of conductors in power transmission or distribution lines, e.g. overhead
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01R—MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
- G01R31/00—Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
- G01R31/08—Locating faults in cables, transmission lines, or networks
- G01R31/088—Aspects of digital computing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
Description
Technical field
The present invention relates to fault detection for power transmission and distribution lines in power grid systems, and in particular to the diagnosis of fault types and fault occurrence. Specifically, it is a multi-source diagnosis method and system for transmission line faults based on K-SMOTE and deep forest.
Background
As China's power transmission and distribution systems become larger, more complex, and more intelligent, the operating characteristics of the power grid have changed profoundly, and grid dispatching faces major new challenges. On the one hand, transmission and distribution systems are affected by variable regional climate and complex operating environments, so there is an urgent need to keep synchronized control over the grid's operating state; on the other hand, the sending and receiving ends of the grid and its AC and DC parts are tightly coupled, which can easily create system-wide security risks.
Power grid fault diagnosis is therefore essential for supporting normal grid operation and dispatching. When a grid fault occurs, the massive volume of fault alarm data collected by the monitoring system (including correct alarms, false alarms, repeated alarms, and irrelevant information) is sent from local automatic devices to the dispatch center. Existing fault diagnosis methods suffer from class imbalance in the fault data sets and achieve low diagnosis accuracy.
Summary of the invention
The present invention provides a multi-source diagnosis method and system for transmission line faults based on K-SMOTE and deep forest, to solve the technical problems in the prior art of class-imbalanced distribution line fault data sets and low diagnosis accuracy.
According to one aspect of this specification, a multi-source diagnosis method for transmission line faults based on K-SMOTE and deep forest is provided. The method includes:
obtaining transmission line fault data;
inputting the transmission line fault data into a trained fault diagnosis model to obtain a fault diagnosis result;
wherein the training of the fault diagnosis model includes:
extracting time-frequency features of the voltage and current waveforms in historical fault data, classifying faults according to the extracted time-frequency features, and forming fault waveform data sets for the different fault types;
clustering the imbalanced data sets among the fault data sets with the K-means clustering algorithm, and expanding the clustered imbalanced data sets with SMOTE oversampling;
forming fault sub-data sets from the balanced data set and the expanded imbalanced data sets;
training a model on the fault sub-data sets with the deep forest algorithm to obtain the trained fault diagnosis model.
Preferably, the extracted time-frequency features include transient waveform features in the time domain, the frequency domain, and the time-frequency domain.
Preferably, clustering the imbalanced data sets among the fault data sets with the K-means clustering algorithm further includes:
Step 1: for the data set, select k initial cluster centers $i_1, i_2, \ldots, i_k$;
Step 2: for every data point other than the cluster centers, compute its Euclidean distance to each $i_j$ ($j = 1, 2, \ldots, k$) and assign it to the nearest center, $c_i = \arg\min_j \lVert x_i - i_j \rVert^2$, which yields k classes of data;
Step 3: compute the mean of the data in each class, set that mean as the class's center, and then use the distance from Step 2 to compute the sum of the Euclidean distances from each data point to its center, denoted S;
Step 4: repeat Steps 2 and 3 until S no longer changes, then output the K-means cluster subsets.
Preferably, expanding the clustered imbalanced data sets with SMOTE oversampling includes:
Step 1: for each minority-class sample x in the imbalanced data set, compute its Euclidean distance to every sample in the minority-class sample set and obtain its k nearest neighbors;
Step 2: determine the sampling rate N and, for each minority-class sample x, randomly select several samples from its k nearest neighbors;
Step 3: randomly select one neighbor Xnew and construct a new sample from Xnew and the original sample by the formula [formula missing from the source text], where rand(0, N) is a random number generated between 0 and N, [symbol missing from the source text] is the cluster center, and x is a sample of that class.
Preferably, the deep forest algorithm uses the multi-grained cascade forest model.
According to another aspect of this specification, a multi-source diagnosis system for transmission line faults based on K-SMOTE and deep forest is provided. The system includes:
an acquisition unit, which obtains fault data;
a diagnosis unit, which inputs the transmission line fault data into the trained fault diagnosis model to obtain a fault diagnosis result;
a training unit, which extracts time-frequency features of the voltage and current waveforms in historical fault data, classifies faults according to the extracted time-frequency features, and forms fault data sets for the different fault types;
clusters the imbalanced data sets among the fault data sets with the K-means clustering algorithm, and expands the clustered imbalanced data sets with SMOTE oversampling;
forms fault sub-data sets from the balanced data set and the expanded imbalanced data sets;
and trains a model on the fault sub-data sets with the deep forest algorithm to obtain the trained fault diagnosis model.
According to yet another aspect of this specification, an electronic device is provided. The electronic device includes a processor, a memory, and a computer program stored in the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the multi-source diagnosis method for transmission line faults based on K-SMOTE and deep forest.
According to yet another aspect of this specification, a computer-readable storage medium is provided. A computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the multi-source diagnosis method for transmission line faults based on K-SMOTE and deep forest.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The multi-source diagnosis method and system for transmission line faults based on K-SMOTE and deep forest provided by the invention extract and classify features of transmission line faults and apply K-means cluster analysis to correct the imbalanced class subsets. This avoids the problem that, when fault data classes are complex and the data is deeply imbalanced, the SMOTE algorithm amplifies noisy data points while processing small samples, and thereby improves the accuracy and coverage of the subsequent deep forest training.
Description of the drawings
Figure 1 is a flow chart of the multi-source diagnosis method for transmission line faults based on K-SMOTE and deep forest according to an embodiment of the present invention;
Figure 2 is a schematic diagram of the multi-source diagnosis system for transmission line faults based on K-SMOTE and deep forest according to an embodiment of the present invention;
Figure 3 is a flow chart of the training process of the fault diagnosis model according to an embodiment of the present invention.
Detailed description
The technical solution of the present invention is described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
As shown in Figures 1 and 3, this embodiment provides a multi-source diagnosis method for transmission line faults based on K-SMOTE and deep forest, including:
obtaining transmission line fault data;
inputting the transmission line fault data into a trained fault diagnosis model to obtain a fault diagnosis result;
wherein the training of the fault diagnosis model includes:
extracting time-frequency features of the voltage and current waveforms in historical fault data, classifying faults according to the extracted time-frequency features, and forming fault waveform data sets for the different fault types;
clustering the imbalanced data sets among the fault waveform data sets with the K-means clustering algorithm, and expanding the clustered imbalanced data sets with SMOTE oversampling;
forming fault sub-data sets from the balanced data set and the expanded imbalanced data sets;
training a model on the fault sub-data sets with the deep forest algorithm to obtain the trained fault diagnosis model.
Specifically, the waveform features extracted from the historical fault waveform database are transient waveform features in three domains: time, frequency, and time-frequency. The time-domain features are the waveform mean, standard deviation, square-root amplitude, peak value, skewness, crest factor, and form factor; the frequency-domain features are the centroid frequency, mean frequency, frequency standard deviation, root-mean-square frequency, and spectral power; the time-frequency-domain feature is the wavelet entropy.
The features are also normalized to remove inconsistencies of dimension. The formula itself is missing from the source text; the symbols it defines match min-max normalization, $y_i = (x_i - x_{\min}) / (x_{\max} - x_{\min})$, where $y_i$ is the normalized data and $x_{\max}$ and $x_{\min}$ are the maximum and minimum of the extracted feature parameters.
The time-frequency features are then used for fault classification, which includes binary classification of lightning versus non-lightning faults, multi-class classification of lightning faults, multi-class classification of non-lightning faults, or overall multi-class classification, thereby constructing fault waveform data sets of different types.
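The time-domain features and the normalization step can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: the embodiment names the features without defining them, so the definitions below are the common signal-processing ones and are an assumption, as are the function names `time_domain_features` and `min_max_normalize` and the choice of min-max scaling (inferred from the use of a maximum and a minimum).

```python
import math


def time_domain_features(x):
    """Common time-domain statistics of one waveform (assumed definitions)."""
    n = len(x)
    mean = sum(x) / n
    # Population standard deviation (divides by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    rms = math.sqrt(sum(v * v for v in x) / n)
    abs_mean = sum(abs(v) for v in x) / n
    peak = max(abs(v) for v in x)
    # Square-root amplitude: square of the mean of sqrt(|x|).
    sra = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2
    skew = sum((v - mean) ** 3 for v in x) / n / std ** 3 if std > 0 else 0.0
    return {
        "mean": mean,
        "std": std,
        "square_root_amplitude": sra,
        "peak": peak,
        "skewness": skew,
        "crest_factor": peak / rms if rms > 0 else 0.0,
        "form_factor": rms / abs_mean if abs_mean > 0 else 0.0,
    }


def min_max_normalize(values):
    """Scale one feature column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical inputs: a unit square wave and one small feature column.
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
scaled = min_max_normalize([2.0, 4.0, 6.0])
```

Normalizing each feature column separately keeps features with large physical units, such as peak current, from dominating the Euclidean distances used later in clustering.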
Specifically, after classification each class contains a different number of records. The class with the most records serves as the balanced data set, and the data sets of all remaining classes are imbalanced data sets, which undergo clustering and oversampling.
Based on State Grid historical fault information, a fault waveform data set has been preliminarily built with 3,415 records in total: 994 lightning records and 2,421 non-lightning records. After removing records with no specific shielding-failure or back-flashover type, 582 lightning records remain: 485 shielding failure and 97 back flashover. After removing non-lightning records without fine-grained labels, 898 non-lightning records remain: 120 bird damage, 162 icing, 186 wind deviation, 365 external damage, and 65 other.
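To make the imbalance concrete, the sketch below computes how many synthetic records each non-lightning class would need if, as this embodiment describes, every class is oversampled until it matches the largest one. The counts come from the text; the English class labels and the function name `synthetic_needed` are illustrative assumptions.

```python
# Non-lightning record counts quoted in this embodiment.
non_lightning_counts = {
    "bird damage": 120,
    "icing": 162,
    "wind deviation": 186,
    "external damage": 365,
    "other": 65,
}


def synthetic_needed(counts):
    """Records to add per class so every class matches the largest class."""
    target = max(counts.values())
    return {label: target - count for label, count in counts.items()}


needed = synthetic_needed(non_lightning_counts)
total_after = sum(non_lightning_counts.values()) + sum(needed.values())
```

Under this rule the largest class (external damage) is left untouched and the balanced set holds five classes of 365 records each.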
Specifically, clustering the imbalanced data sets with the K-means clustering algorithm includes:
Step 1: for the data set, select k initial cluster centers $i_1, i_2, \ldots, i_k$;
Step 2: for every data point other than the cluster centers, compute its Euclidean distance to each $i_j$ ($j = 1, 2, \ldots, k$) and assign it to the nearest center, $c_i = \arg\min_j \lVert x_i - i_j \rVert^2$, so that the data is divided into k classes;
Step 3: compute the mean of the data in each class, set that mean as the class's center, and then use the distance from Step 2 to compute the sum of the Euclidean distances from each data point to its center, denoted S;
Step 4: repeat Steps 2 and 3 until S no longer changes, then output the K-means cluster subsets.
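The four steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: the function names, the random choice of initial centers, the seed, the iteration cap, and the numeric tolerance on S are not specified in the source.

```python
import math
import random


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 1: pick k initial centers from the data (random choice is assumed).
    centers = rng.sample(points, k)
    labels, previous_s = [], None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Step 3: move each center to the mean of its members ...
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
        # ... and sum the Euclidean distances of all points to their centers.
        s = sum(
            math.sqrt(squared_distance(p, centers[c]))
            for p, c in zip(points, labels)
        )
        # Step 4: stop once S no longer changes.
        if previous_s is not None and abs(s - previous_s) < 1e-12:
            break
        previous_s = s
    return labels, centers


# Hypothetical two-dimensional feature vectors forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(points, k=2)
```

Each resulting cluster supplies its own center, which is what the subsequent oversampling step interpolates against.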
Specifically, expanding the clustered imbalanced data sets with SMOTE oversampling further includes:
Step 1: for each minority-class sample x in the imbalanced data set, compute its Euclidean distance to every sample in the minority-class sample set and obtain its k nearest neighbors;
Step 2: determine the sampling rate N and, for each minority-class sample x, randomly select several samples from its k nearest neighbors;
Step 3: randomly select one neighbor Xnew and construct a new sample from Xnew and the original sample by the formula [formula missing from the source text], where rand(0, N) is a random number generated between 0 and N, [symbol missing from the source text] is the cluster center, and x is a sample of that class.
SMOTE oversampling leaves the class with the most records unchanged and increases the number of records in every other class until it matches that of the largest class. SMOTE oversampling interpolates along the line connecting each data set's cluster center and its other sample points, which corrects the clustered samples, expands the data set, and builds the fault sub-data sets.
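Because the interpolation formula did not survive in the source text, the following is only a hedged sketch of the idea stated in prose: new samples are placed on the line between a cluster center and a sample of that cluster. The exact form `center + r * (x - center)`, the use of a random factor `r` in [0, 1) rather than rand(0, N), and the names `cluster_center` and `oversample_cluster` are all assumptions.

```python
import random


def cluster_center(samples):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(samples) for col in zip(*samples))


def oversample_cluster(samples, n_new, seed=0):
    """Create n_new synthetic samples between the cluster center and members."""
    rng = random.Random(seed)
    center = cluster_center(samples)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(samples)  # a randomly chosen member of the cluster
        r = rng.random()         # assumed interpolation factor in [0, 1)
        synthetic.append(tuple(c + r * (v - c) for c, v in zip(center, x)))
    return synthetic


# Hypothetical minority-class cluster: the four corners of a square.
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
new_samples = oversample_cluster(cluster, n_new=5)
```

Interpolating toward the cluster center rather than between arbitrary neighbors keeps synthetic points inside the cluster, which is consistent with the stated goal of not amplifying noisy points.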
Specifically, the deep forest algorithm uses the multi-grained cascade forest model, as follows.
The multi-grained cascade forest model consists of multi-grained scanning and a cascade forest. The multi-grained scanning part screens features and generates features more closely related to the classification. First, a sliding window of length L scans the raw data of dimension K with stride S; after sliding sampling, N feature sub-vectors of dimension L are obtained, where N = (K - L)/S + 1. Suppose the raw data is divided into 10 fault feature classes and each input sample is a 300-dimensional vector. Scanning at a granularity of 100 with a stride of 1 generates 201 100-dimensional vectors; each of them passes through a random forest to produce one 10-dimensional class vector, for a total of 2,010 input dimensions.
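The window arithmetic in this paragraph can be checked with a short sketch. The function names `sliding_windows` and `scanned_dimension` are illustrative, and the forests themselves are omitted; only the counting is shown.

```python
def sliding_windows(vector, window, step=1):
    """All sub-vectors of length `window` taken every `step` positions."""
    return [vector[i:i + window] for i in range(0, len(vector) - window + 1, step)]


def scanned_dimension(k, window, step, n_classes):
    """Number of windows N = (K - L)/S + 1 and total class-vector length."""
    n_windows = (k - window) // step + 1
    return n_windows, n_windows * n_classes


# The worked example from the text: K = 300, L = 100, S = 1, 10 classes.
sample = list(range(300))
windows = sliding_windows(sample, window=100, step=1)
n_windows, total_dims = scanned_dimension(300, 100, 1, 10)
```

With these inputs the sketch reproduces the 201 windows and 2,010 dimensions quoted in the text.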
The multi-grained cascade forest model uses random forests as base learners for ensemble learning. Each forest in the cascade forest consists of multiple decision trees, and the decision trees use the classification and regression tree (CART) algorithm.
The Gini index is $\mathrm{Gini}(D) = 1 - \sum_{m=1}^{|y|} p_m^2$, where $p_m$ is the proportion of class-m samples in the current sample set D and $|y|$ is a nonzero integer. The smaller the Gini index, the higher the purity of data set D. When constructing a CART node, the Gini index is computed for every existing feature of the current training sample set at every possible value, and the feature and split point with the smallest Gini index are chosen as the splitting condition of the current node.
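A small sketch of the Gini computation and of choosing a split point for one numeric feature follows. It uses the standard CART criterion, the size-weighted Gini index of the two child nodes, which is an assumption about how the "smallest Gini index" in the text is evaluated; the names `gini` and `best_split` are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini index 1 - sum(p_m^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) minimizing impurity for one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical feature column with two perfectly separable classes.
threshold, impurity = best_split([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"])
```

A full CART node would repeat this search over every feature and keep the feature and threshold with the lowest score.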
The cascade forest processes sample features layer by layer to improve the algorithm's pattern-recognition accuracy. Each layer contains one random forest and one completely random forest. The input to the first layer of the cascade forest is a 4,020-dimensional vector formed by concatenating the 2,010-dimensional class vector from the random forest with the 2,010-dimensional vector from the completely random forest. Classification yields two two-dimensional class vectors, which are concatenated with the 4,020-dimensional initial feature vector to form a new 4,024-dimensional feature vector that serves as the input to the second layer, and so on for later layers. Finally, the class vectors output by the N-th layer are averaged, and the class with the largest averaged value is taken as the final classification result for the fault.
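The layer-to-layer concatenation and the final averaging can be sketched without any forest implementation. The class vectors below are hypothetical probabilities standing in for forest outputs, and the function names `next_layer_input` and `final_prediction` are illustrative.

```python
def next_layer_input(features, class_vectors):
    """Concatenate the original features with every forest's class vector."""
    augmented = list(features)
    for vector in class_vectors:
        augmented.extend(vector)
    return augmented


def final_prediction(class_vectors):
    """Average the last layer's class vectors and return (class index, averages)."""
    n = len(class_vectors)
    averages = [sum(col) / n for col in zip(*class_vectors)]
    best = max(range(len(averages)), key=averages.__getitem__)
    return best, averages


# Hypothetical outputs of the random forest and the completely random forest.
layer_outputs = [[0.7, 0.3], [0.4, 0.6]]
second_layer_input = next_layer_input([0.0] * 4020, layer_outputs)
predicted_class, averages = final_prediction(layer_outputs)
```

With two forests that each emit a two-dimensional class vector, the second-layer input has 4,020 + 2 × 2 = 4,024 dimensions, matching the figure in the text.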
In this embodiment, processing the different types of fault waveform data sets increases the number of records in the minority-class data sets and therefore in the training set, and classifying with the deep forest raises classification accuracy. Classification accuracy on the processed data was tested with Weka: with a 9:1 training-to-test split, the non-lightning multi-class accuracy was 71.27%, and the highest accuracy after SMOTE processing with 39 features was 77.90%.
In this embodiment, the deep forest algorithm uses a 7:3 training-to-test split. For K-means-SMOTE combined with the deep forest algorithm, the evaluation metric is precision, Precision = TP/(TP + FP), that is, the proportion of correctly predicted positives among all samples predicted positive. The Weka simulation results show that, with K-means-SMOTE combined with the deep forest algorithm, the accuracy reaches 90% for lightning/non-lightning binary classification, 86% for shielding-failure/back-flashover classification of lightning faults, 73% for non-lightning multi-class classification, and 79% for overall multi-class classification, more than 10% higher than traditional methods such as random forest.
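The precision metric, the share of correct positive predictions among all positive predictions, TP/(TP + FP), can be computed as in the sketch below. The function name `precision` and the example labels are illustrative and are not data from the patent.

```python
def precision(y_true, y_pred, positive):
    """TP / (TP + FP) for the class treated as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical labels: 1 = lightning fault, 0 = non-lightning fault.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1]
score = precision(y_true, y_pred, positive=1)
```

Here two of the three samples predicted as lightning faults are truly lightning faults, so the precision is 2/3.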
As shown in Figures 2 and 3, this embodiment also provides a multi-source diagnosis system for transmission line faults based on K-SMOTE and deep forest, including:
an acquisition unit, which obtains fault data;
a diagnosis unit, which inputs the transmission line fault data into the trained fault diagnosis model to obtain a fault diagnosis result;
wherein the training of the fault diagnosis model includes:
extracting time-frequency features of the voltage and current waveforms in historical fault data, classifying faults according to the extracted time-frequency features, and forming fault data sets for the different fault types;
clustering the imbalanced data sets among the fault data sets with the K-means clustering algorithm, and expanding the clustered imbalanced data sets with SMOTE oversampling;
forming fault sub-data sets from the balanced data set and the expanded imbalanced data sets;
training a model on the new fault sub-data sets with the deep forest algorithm to obtain the trained fault diagnosis model.
This embodiment also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the multi-source diagnosis method for transmission line faults based on K-SMOTE and deep forest.
The computer-readable storage medium may be an internal storage unit of the computer device described in the preceding embodiment, such as the hard disk or memory of the computer device. It may also be an external storage device of the computer device, such as a plug-in hard disk, a smart memory card (SmartMediaCard), a Secure Digital card, or a flash memory card (Flash Card) fitted to the computer device.
This embodiment also provides an electronic device that includes a processor, a memory, and a computer program stored in the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the multi-source diagnosis method for transmission line faults based on K-SMOTE and deep forest.
Although embodiments of the present invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the invention. The scope of the invention is defined by the appended claims and their equivalents.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310998033.9A CN117031202A (en) | 2023-08-09 | 2023-08-09 | K-SMOTE and depth forest based power transmission line fault multi-source diagnosis method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310998033.9A CN117031202A (en) | 2023-08-09 | 2023-08-09 | K-SMOTE and depth forest based power transmission line fault multi-source diagnosis method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117031202A (en) | 2023-11-10 |
Family
ID=88629467
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310998033.9A Pending CN117031202A (en) | 2023-08-09 | 2023-08-09 | K-SMOTE and depth forest based power transmission line fault multi-source diagnosis method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117031202A (en) |
- 2023-08-09 CN CN202310998033.9A patent/CN117031202A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lines et al. | Hive-cote: The hierarchical vote collective of transformation-based ensembles for time series classification | |
CN106248801B (en) | A rail crack detection method based on the probability of multiple acoustic emission events | |
CN111666169B (en) | Improved isolated forest algorithm and Gaussian distribution-based combined data anomaly detection method | |
CN110108992B (en) | Method and system for cable partial discharge fault identification based on improved random forest algorithm | |
CN111046931A (en) | A Random Forest-based Switch Fault Diagnosis Method | |
KR101964412B1 (en) | Method for diagnosing anomaly log of mobile commmunication data processing system and system thereof | |
CN110147760B (en) | A New Method for Feature Extraction and Recognition of High Efficiency Power Quality Disturbance Image | |
CN109948726B (en) | A Deep Forest Based Power Quality Disturbance Classification Method | |
CN107682109B (en) | A kind of interference signal classifying identification method suitable for UAV Communication system | |
CN112580471A (en) | Non-invasive load identification method based on AdaBoost feature extraction and RNN model | |
CN108595884A (en) | Power system transient stability appraisal procedure and device | |
CN108804731A (en) | Based on the dual evaluation points time series trend feature extracting method of vital point | |
CN112732748A (en) | Non-invasive household appliance load identification method based on adaptive feature selection | |
CN113489514B (en) | Method and device for noise identification of power line communication based on self-organizing mapping neural network | |
CN109975697A (en) | A kind of Mechanical Failure of HV Circuit Breaker diagnostic method based on atom sparse decomposition | |
CN108599152A (en) | The key stato variable choosing method and device of power system transient stability assessment | |
CN111553186A (en) | Electromagnetic signal identification method based on depth long-time and short-time memory network | |
CN114371009A (en) | High-speed train bearing fault diagnosis method based on improved random forest | |
CN108805295A (en) | A kind of method for diagnosing faults based on decision Tree algorithms | |
CN110244216B (en) | Fault diagnosis method of analog circuit based on cloud model optimization PNN | |
CN111310719A (en) | Unknown radiation source individual identification and detection method | |
CN114021424A (en) | PCA-CNN-LVQ-based voltage sag source identification method | |
CN117031202A (en) | K-SMOTE and depth forest based power transmission line fault multi-source diagnosis method and system | |
Morais et al. | A framework for evaluating automatic classification of underlying causes of disturbances and its application to short-circuit faults | |
CN118277823A (en) | Signal sorting method and system for TR-RAGCN-FSFM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20240408. Address after: No. 1, Jiangning District, Jiangning District, Nanjing, Jiangsu. Applicants after: STATE GRID JIANGSU ELECTRIC POWER COMPANY Research Institute; STATE GRID JIANGSU ELECTRIC POWER Co.,Ltd.; State Grid Jiangsu Electric Power Co.,Ltd. innovation and Innovation Center; JIANGSU ELECTRIC POWER RESEARCH INSTITUTE Co.,Ltd. Country or region after: China. Address before: No. 1, Jiangning District, Jiangning District, Nanjing, Jiangsu. Applicants before: STATE GRID JIANGSU ELECTRIC POWER COMPANY Research Institute; STATE GRID JIANGSU ELECTRIC POWER Co.,Ltd.; State Grid Jiangsu Electric Power Co.,Ltd. innovation and Innovation Center; Super high voltage branch of State Grid Jiangsu Electric Power Co.,Ltd. Country or region before: China. |