CN107817427A - Decision tree recognition methods based on sulfur hexafluoride gas shelf depreciation - Google Patents
- Publication number
- CN107817427A (application CN201711044243.5A)
- Authority
- CN
- China
- Prior art keywords
- sample
- attribute
- decision tree
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01R—MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
- G01R31/00—Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
- G01R31/12—Testing dielectric strength or breakdown voltage ; Testing or monitoring effectiveness or level of insulation, e.g. of a cable or of an apparatus, for example using partial discharge measurements; Electrostatic testing
- G01R31/1227—Testing dielectric strength or breakdown voltage ; Testing or monitoring effectiveness or level of insulation, e.g. of a cable or of an apparatus, for example using partial discharge measurements; Electrostatic testing of components, parts or materials
- G01R31/1254—Testing dielectric strength or breakdown voltage ; Testing or monitoring effectiveness or level of insulation, e.g. of a cable or of an apparatus, for example using partial discharge measurements; Electrostatic testing of components, parts or materials of gas-insulated power appliances or vacuum gaps
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
Abstract
The invention discloses a decision tree recognition method based on partial discharge in sulfur hexafluoride gas. The decision tree is formed as follows. S1: judge whether the training sample set is empty; if not, go to step S2; otherwise, go to step S6. S2: judge whether the samples at the decision node contain only one class; if not, go to step S3; otherwise, go to step S6. S3: judge whether attribute A, the attribute with the largest information gain ratio in the samples, is a continuous quantity; if not, go to step S4; otherwise, go to step S6. S4: find the partition threshold of attribute A. S5: grow new nodes according to attribute A, and return to step S1. S6: take the current node as a leaf node and name it after the corresponding class. S7: form the decision tree. The beneficial effects obtained by the invention are: the safe and reliable operation of SF6 equipment is ensured, the recognition rate for all kinds of defects is improved, and the efficiency of handling insulation faults can be improved. Pattern recognition of the acquired partial discharge is carried out with the decision tree, further improving the recognition rate of partial discharge.
Description
Technical field
The present invention relates to the technical field of sulfur hexafluoride decomposition, and in particular to a decision tree recognition method based on partial discharge in sulfur hexafluoride gas.
Background technology
Sulfur hexafluoride (SF6) gas is widely used in gas-insulated equipment because of its excellent insulation and arc-extinguishing performance. However, during the manufacture, transport, installation, maintenance and operation of SF6 gas-insulated equipment (SF6 electrical equipment for short, such as gas-insulated switchgear GIS, gas-insulated circuit breakers GCB, gas-insulated transformers GIT, and gas-insulated lines or pipelines GIL), various insulation defects inevitably arise inside the equipment, such as metallic burrs on conductors, loose or poorly contacting fasteners, air gaps formed where conductors and supporting insulators peel apart, and metal particles left behind in cavities after maintenance. All of these can create insulation defects of varying severity inside SF6 equipment, distorting the internal electric field and thereby producing partial discharge (PD).
When severe PD occurs, on the one hand, PD accelerates further damage to the internal insulation, ultimately causing insulation faults and power outages; it is a potential hidden danger to operating SF6 equipment and has been called the "tumour" of insulation. On the other hand, PD is also a characteristic quantity that effectively characterizes the insulation condition: by detecting the PD of SF6 electrical equipment and performing pattern recognition on it, the insulation defects inside the equipment and their types can largely be found. Therefore, identifying the occurrence of insulation defects is of great practical significance for ensuring the safe and reliable operation of SF6 electrical equipment.
The content of the invention
In view of the drawbacks of the prior art described above, an object of the present invention is to provide a decision tree recognition method based on partial discharge in sulfur hexafluoride gas that ensures the safe and reliable operation of SF6 equipment, improves the recognition rate for all kinds of defects, and improves the efficiency of handling insulation faults.
The object of the present invention is achieved by the following technical scheme. A decision tree recognition method based on partial discharge in sulfur hexafluoride gas comprises the following decision tree formation flow:
S1: judge whether the training sample set is empty; if not, go to step S2; otherwise, go to step S6;
S2: judge whether the samples at the decision node contain only one class; if not, go to step S3; otherwise, go to step S6;
S3: judge whether attribute A, the attribute with the largest information gain ratio in the samples, is a continuous quantity; if not, go to step S4; otherwise, go to step S6;
S4: find the partition threshold of attribute A;
S5: grow new nodes according to attribute A, and return to step S1;
S6: take the current node as a leaf node and name it after the corresponding class;
S7: form the decision tree.
Further, the judgement flow of step S2 also includes:
S21: calculate the information gain ratio under each attribute in the samples;
S22: find the attribute A with the largest information gain ratio.
Further, step S6 also includes:
S61: calculate the estimated misclassification rate and perform pruning.
Further, the decision tree is generated with the C4.5 algorithm; the generation flow is as follows:
S01: let S be a set of s data samples, each belonging to one of m distinct classes $C_i$ ($i = 1, \ldots, m$);
S02: let $s_i$ be the number of samples in $C_i$; for the given sample set, the total information entropy is
$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i \quad (1)$$
where $p_i$ is the probability that an arbitrary sample belongs to $C_i$, estimated by $s_i/s$;
S03: let A be an attribute of the samples, taking v distinct values $\{a_1, a_2, \ldots, a_v\}$;
S04: attribute A divides S into v subsets $\{S_1, S_2, \ldots, S_v\}$, where $S_j$ contains the samples of S whose value of A is $a_j$;
S05: if A is selected as the test attribute, these subsets are exactly the branches grown out of the node holding sample set S.
Further, the decision tree generation also includes:
S06: let $s_{ij}$ be the number of samples in subset $S_j$ that belong to class $C_i$;
S07: the entropy of the subsets into which attribute A divides S is
$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + s_{2j} + \cdots + s_{mj}}{s} I(s_{1j}, s_{2j}, \ldots, s_{mj}) \quad (2)$$
where $(s_{1j} + s_{2j} + \cdots + s_{mj})/s$ is the weight of subset $S_j$, equal to the number of samples in $S_j$ divided by the total number of samples in S; the smaller the entropy, the higher the purity of the subset division;
S08: $I(s_{1j}, s_{2j}, \ldots, s_{mj})$ is the entropy of subset $S_j$:
$$I(s_{1j}, s_{2j}, \ldots, s_{mj}) = -\sum_{i=1}^{m} p_{ij} \log_2 p_{ij} \quad (3)$$
where $p_{ij} = s_{ij}/|S_j|$ is the probability that a sample in $S_j$ belongs to class $C_i$;
S09: the information gain obtained by dividing sample set S by attribute A is
$$\mathrm{Gain}(S, A) = I(s_1, s_2, \ldots, s_m) - E(A) \quad (4).$$
Further, the method also includes:
S010: if A is a continuous attribute, sort the samples of training set S in ascending order of their values of A;
S011: supposing A takes v distinct values in the training set, the sorted value sequence of attribute A is $\{a_1, a_2, \ldots, a_v\}$; take the average of each pair of adjacent values in order as a cut-point, giving v-1 cut-points in total;
S012: calculate the information gain ratio of each cut-point, and select the cut-point with the largest information gain ratio as the local threshold;
S013: in the sequence $\{a_1, a_2, \ldots, a_v\}$, find the value $v_{max}$ closest to but not exceeding the local threshold as the partition threshold of attribute A;
S014: select the test attribute by the information-gain-ratio method; the information gain ratio is the ratio of the information gain to the split information, i.e., the information gain ratio of dividing S by A is
$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitI}(A)} \quad (5)$$
where $\mathrm{SplitI}(A) = -\sum_{j=1}^{v} \frac{|S_j|}{|S|} \log_2 \frac{|S_j|}{|S|}$ is the split information of attribute A.
Further, the method also includes pruning of the decision tree:
S611: the estimated misclassification rate of a leaf node is calculated as
$$e = \frac{f + \frac{z^2}{2N} + z\sqrt{\frac{f}{N} - \frac{f^2}{N} + \frac{z^2}{4N^2}}}{1 + \frac{z^2}{N}} \quad (6)$$
where f is the observed misclassification rate, f = E/N, E is the number of misclassified samples at the leaf node, N is the total number of samples at the current leaf, and z is the confidence limit; typically, at a confidence level of 0.25, z = 0.69. The estimated misclassification rate of the root of a subtree is the weighted average of the estimated misclassification rates of its leaf nodes, i.e.
$$e_T = \sum_{i=1}^{k} \frac{N_i}{N} e_i \quad (7)$$
where k is the number of branches and $N_i$ is the number of samples assigned to the i-th branch.
Further, the training samples in step S1 cover three kinds of defects: metallic protrusions, insulator surface air gaps, and free metal particles.
By adopting the above technical solution, the present invention has the following advantages:
(1) the safe and reliable operation of SF6 equipment is ensured, which is of important scientific and practical value for grasping the insulation condition of SF6 equipment and building a condition-based maintenance system;
(2) the recognition rate for all kinds of defects is improved, and the features of all kinds of insulation defects can be better characterized;
(3) an intelligent insulation-defect diagnosis system based on decomposition components is established using the decision tree, which can improve the efficiency of handling insulation faults;
(4) a decision tree method for insulation defect identification is founded, and the rules and conditions for identifying fault types from component ratios are found;
(5) pattern recognition is performed on the acquired partial discharge using the decision tree, further improving the recognition rate of partial discharge.
Other advantages, objects and features of the present invention will be set forth to some extent in the following description, will to some extent be apparent to those skilled in the art upon study of the text below, or may be learned from practice of the present invention. The objects and other advantages of the present invention can be realized and obtained through the following specification and claims.
Brief description of the drawings
The brief description of the drawings of the present invention is as follows:
Fig. 1 is the construction flowchart of the decision tree algorithm of the present invention.
Fig. 2 is the decision tree diagram constructed by the present invention.
Fig. 3 is the distribution diagram of the misclassified samples of the present invention.
Embodiment
The invention will be further described with reference to the accompanying drawings and examples.
Embodiment: as shown in Fig. 1 to Fig. 3, a decision tree recognition method based on partial discharge in sulfur hexafluoride gas includes the following decision tree formation flow:
S1: judge whether the training sample set is empty; if not, go to step S2; otherwise, go to step S6;
S2: judge whether the samples at the decision node contain only one class; if not, go to step S3; otherwise, go to step S6;
The judgement flow of step S2 also includes:
S21: calculate the information gain ratio under each attribute in the samples;
S22: find the attribute A with the largest information gain ratio.
S3: judge whether attribute A, the attribute with the largest information gain ratio in the samples, is a continuous quantity; if not, go to step S4; otherwise, go to step S6;
S4: find the partition threshold of attribute A;
S5: grow new nodes according to attribute A, and return to step S1;
S6: take the current node as a leaf node and name it after the corresponding class; step S6 also includes S61: calculate the estimated misclassification rate and perform pruning.
S7: form the decision tree.
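As a concrete illustration, the recursive growth loop of steps S1-S7 can be sketched in Python. This is our own minimal sketch, not the patent's implementation; the function and argument names are illustrative, and the `choose_attr` callback stands in for the information-gain-ratio selection of steps S21-S22:

```python
from collections import Counter

def grow(samples, labels, attrs, choose_attr):
    """Recursively grow a decision tree following steps S1-S7.

    samples: list of dicts mapping attribute name -> value;
    labels:  parallel list of class labels;
    choose_attr(samples, labels, attrs): returns the attribute to test,
    e.g. the one with the largest information gain ratio (S21-S22)."""
    if not samples:                                 # S1: empty set -> leaf (S6)
        return ("leaf", None)
    if len(set(labels)) == 1:                       # S2: one class -> leaf (S6)
        return ("leaf", labels[0])
    if not attrs:                                   # no attribute left: majority leaf
        return ("leaf", Counter(labels).most_common(1)[0][0])
    a = choose_attr(samples, labels, attrs)         # S2/S3: pick attribute A
    rest = [x for x in attrs if x != a]
    children = {}
    for v in sorted({s[a] for s in samples}):       # S5: one branch per value of A
        idx = [i for i, s in enumerate(samples) if s[a] == v]
        children[v] = grow([samples[i] for i in idx],
                           [labels[i] for i in idx], rest, choose_attr)
    return ("node", a, children)                    # S7: assembled subtree
```

With four samples split cleanly by one attribute, the recursion terminates after a single test node with two pure leaves.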
The present application uses the C4.5 algorithm to generate the decision tree. C4.5 is one of the most powerful and widely used decision tree algorithms at present. It is based on the ID3 algorithm proposed by Quinlan in 1986; it retains all the advantages of ID3 and makes a series of improvements to it, greatly improving the algorithm's performance.
The principle of the ID3 algorithm is that, when selecting the attribute at each level of node, the decision tree uses information entropy theory to select the attribute with the largest information gain in the current sample set as the test attribute, and establishes branches according to the distinct values of that attribute, until every subset contains only data of the same class, finally obtaining a decision tree that identifies objects.
The decision tree is generated with the C4.5 algorithm; the generation flow is as follows:
S01: let S be a set of s data samples, each belonging to one of m distinct classes $C_i$ ($i = 1, \ldots, m$);
S02: let $s_i$ be the number of samples in $C_i$; for the given sample set, the total information entropy is
$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i \quad (1)$$
where $p_i$ is the probability that an arbitrary sample belongs to $C_i$, estimated by $s_i/s$;
S03: let A be an attribute of the samples, taking v distinct values $\{a_1, a_2, \ldots, a_v\}$;
S04: attribute A divides S into v subsets $\{S_1, S_2, \ldots, S_v\}$, where $S_j$ contains the samples of S whose value of A is $a_j$;
S05: if A is selected as the test attribute, these subsets are exactly the branches grown out of the node holding sample set S.
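Equation (1) is simply the Shannon entropy of the class distribution. A minimal sketch (the function name is ours, not the patent's):

```python
import math

def total_entropy(class_counts):
    """I(s1,...,sm) = -sum_i p_i * log2(p_i), with p_i = s_i / s  (equation (1)).

    class_counts: the per-class sample counts s_1, ..., s_m."""
    s = sum(class_counts)
    return -sum(si / s * math.log2(si / s) for si in class_counts if si > 0)
```

For the three equally sized defect classes used later (8 samples each), this gives log2(3) ≈ 1.585 bits; a pure single-class set gives 0.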
The decision tree generation also includes:
S06: let $s_{ij}$ be the number of samples in subset $S_j$ that belong to class $C_i$;
S07: the entropy of the subsets into which attribute A divides S is
$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + s_{2j} + \cdots + s_{mj}}{s} I(s_{1j}, s_{2j}, \ldots, s_{mj}) \quad (2)$$
where $(s_{1j} + s_{2j} + \cdots + s_{mj})/s$ is the weight of subset $S_j$, equal to the number of samples in $S_j$ divided by the total number of samples in S; the smaller the entropy, the higher the purity of the subset division;
S08: $I(s_{1j}, s_{2j}, \ldots, s_{mj})$ is the entropy of subset $S_j$:
$$I(s_{1j}, s_{2j}, \ldots, s_{mj}) = -\sum_{i=1}^{m} p_{ij} \log_2 p_{ij} \quad (3)$$
where $p_{ij} = s_{ij}/|S_j|$ is the probability that a sample in $S_j$ belongs to class $C_i$;
S09: the information gain obtained by dividing sample set S by attribute A is
$$\mathrm{Gain}(S, A) = I(s_1, s_2, \ldots, s_m) - E(A) \quad (4).$$
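Equations (2)-(4) combine into the familiar information-gain computation: the entropy of the whole set minus the weighted entropies of the subsets induced by attribute A. A sketch under our own naming:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels (equations (1)/(3))."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """Gain(S, A) = I(S) - E(A)  (equation (4)).

    values: the value of attribute A for each sample; labels: its class.
    E(A) is the |S_j|/|S|-weighted entropy of each induced subset S_j."""
    n = len(labels)
    e_a = 0.0
    for v in set(values):                       # each value a_j of A (S04)
        subset = [l for x, l in zip(values, labels) if x == v]
        e_a += len(subset) / n * entropy(subset)    # equation (2)
    return entropy(labels) - e_a
```

An attribute that separates the classes perfectly yields a gain equal to the full entropy; an uninformative attribute yields a gain of zero.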
The ID3 algorithm selects, at each node, the attribute with the largest information gain Gain(S, A) as the test attribute. Its advantages are that the method is simple and its learning ability is strong. Its drawback is that it tends to select attributes with many values, and in most cases the attribute with the most values is not necessarily the optimal one. In addition, the ID3 algorithm is effective only for relatively small data sets, is sensitive to noise, and the decision tree may change when the training data set grows.
The C4.5 algorithm makes a series of improvements to ID3. First, it can handle continuous-valued attributes; its basic idea is to divide the value domain of a continuous attribute into a set of discrete intervals, which also includes:
S010: if A is a continuous attribute, sort the samples of training set S in ascending order of their values of A;
S011: supposing A takes v distinct values in the training set, the sorted value sequence of attribute A is $\{a_1, a_2, \ldots, a_v\}$; take the average of each pair of adjacent values in order as a cut-point, giving v-1 cut-points in total;
S012: calculate the information gain ratio of each cut-point, and select the cut-point with the largest information gain ratio as the local threshold;
S013: in the sequence $\{a_1, a_2, \ldots, a_v\}$, find the value $v_{max}$ closest to but not exceeding the local threshold as the partition threshold of attribute A;
S014: select the test attribute by the information-gain-ratio method; the information gain ratio is the ratio of the information gain to the split information, i.e., the information gain ratio of dividing S by A is
$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitI}(A)} \quad (5)$$
where $\mathrm{SplitI}(A) = -\sum_{j=1}^{v} \frac{|S_j|}{|S|} \log_2 \frac{|S_j|}{|S|}$ is the split information of attribute A.
The information gain ratio of every attribute in the current candidate attribute set is obtained by the above method, and the attribute with the highest information gain ratio is taken as the test attribute; the sample set is divided into several sub-sample sets, and each sub-sample set is further split in the same way until it is indivisible or a stopping condition is reached.
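Steps S010-S014 amount to a scan over candidate cut-points of the sorted attribute values. The simplified sketch below (all names ours) scores each midpoint cut directly by gain ratio, equation (5), and omits the S013 snap-back to the largest attribute value not exceeding the local threshold:

```python
import math
from collections import Counter

def _entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """S010-S012: sort by the continuous attribute, try the midpoint between
    each pair of adjacent distinct values, and keep the cut with the highest
    gain ratio (equation (5)). Returns (threshold, gain_ratio)."""
    pairs = sorted(zip(values, labels))           # S010: sort by attribute A
    xs = sorted(set(values))
    base = _entropy(labels)
    n = len(labels)
    best = (None, -1.0)
    for lo, hi in zip(xs, xs[1:]):                # S011: v-1 midpoint cut-points
        t = (lo + hi) / 2.0
        left = [l for x, l in pairs if x <= t]
        right = [l for x, l in pairs if x > t]
        w = len(left) / n
        gain = base - w * _entropy(left) - (1 - w) * _entropy(right)
        split_info = -(w * math.log2(w) + (1 - w) * math.log2(1 - w))
        ratio = gain / split_info if split_info > 0 else 0.0
        if ratio > best[1]:                       # S012: keep the best cut
            best = (t, ratio)
    return best
```

On a set that separates cleanly, the midpoint between the two class clusters wins with a gain ratio of 1.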
The method also includes pruning of the decision tree. Pruning a decision tree means replacing a whole subtree with a single leaf node: the complete tree is built first and then trimmed back. The pruning principle is: if the estimated misclassification rate of the subtree after branching is larger than the estimated misclassification rate of its root treated as a leaf before branching, the subtree is pruned; otherwise it is kept.
S611: the estimated misclassification rate of a leaf node is calculated as
$$e = \frac{f + \frac{z^2}{2N} + z\sqrt{\frac{f}{N} - \frac{f^2}{N} + \frac{z^2}{4N^2}}}{1 + \frac{z^2}{N}} \quad (6)$$
where f is the observed misclassification rate, f = E/N, E is the number of misclassified samples at the leaf node, N is the total number of samples at the current leaf, and z is the confidence limit; typically, at a confidence level of 0.25, z = 0.69. The estimated misclassification rate of the root of a subtree is the weighted average of the estimated misclassification rates of its leaf nodes, i.e.
$$e_T = \sum_{i=1}^{k} \frac{N_i}{N} e_i \quad (7)$$
where k is the number of branches and $N_i$ is the number of samples assigned to the i-th branch.
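Equations (6) and (7) translate directly to code. A sketch with our own function names, using the patent's defaults (z = 0.69 at confidence level 0.25):

```python
import math

def leaf_error(E, N, z=0.69):
    """Equation (6): pessimistic (upper-confidence) estimate of a leaf's
    error rate, from observed rate f = E/N and confidence limit z."""
    f = E / N
    return ((f + z * z / (2 * N)
             + z * math.sqrt(f / N - f * f / N + z * z / (4 * N * N)))
            / (1 + z * z / N))

def subtree_error(branches, z=0.69):
    """Equation (7): weighted average e_T of the leaf estimates e_i,
    each branch weighted by its sample share N_i / N.

    branches: list of (E_i, N_i) pairs, one per branch."""
    N = sum(n for _, n in branches)
    return sum(n / N * leaf_error(e, n, z) for e, n in branches)
```

Pruning then compares `leaf_error` of the subtree root collapsed to a leaf against `subtree_error` of its branches: if collapsing does not increase the estimate, the subtree is replaced by a leaf.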
The training samples in step S1 cover three kinds of defects: metallic protrusions, insulator surface air gaps, and free metal particles.
Because insulator surface contamination does not decompose the gas, the present application extracts the component content ratios c(SOF2)/c(SO2F2), c(CF4)/c(CO2) and c(SOF2+SO2F2)/c(CO2+CF4) as characteristic quantities. Using 24 groups of SF6 decomposition component data obtained under the three defect types (metallic protrusion, insulator surface air gap, and free metal particle) as training samples, a decision tree is established and pruned according to the principles described above; the minimum number of samples at a node is set to 2, the confidence factor is set to 0.25, and the classification accuracy of the decision tree is measured by ten-fold cross-validation.
From the decision tree generation result, of the three input characteristic quantities the final decision tree uses only the two component ratio features c(SOF2)/c(SO2F2) and c(CF4)/c(CO2). This shows that the test data have good discriminability: only two characteristic quantities are needed to identify the three kinds of defects. Following the maximum-information-gain-ratio principle, the C4.5 algorithm selected the two characteristic quantities c(SOF2)/c(SO2F2) and c(CF4)/c(CO2) and discarded c(SOF2+SO2F2)/c(CO2+CF4).
For the decision tree obtained in Fig. 2, another set of test data is used as test samples to verify its classification performance, namely 24 groups of SF6 decomposition component data under the three defect types; the recognition results are shown in Table 1.
Table 1. Decision tree recognition results
Defect type | N class | P class | G class | Total
---|---|---|---|---
Sample count | 8 | 8 | 8 | 24
Correctly identified | 8 | 6 | 7 | 21
Recognition rate | 100% | 75.0% | 87.5% | 87.5%
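The recognition rates in Table 1 follow directly from the counts; as a quick check:

```python
# Recompute the per-class and overall recognition rates of Table 1.
samples = {"N": 8, "P": 8, "G": 8}   # test samples per defect class
correct = {"N": 8, "P": 6, "G": 7}   # correctly identified per class
rates = {k: correct[k] / samples[k] for k in samples}
overall = sum(correct.values()) / sum(samples.values())   # 21 / 24
print(rates, overall)
```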
As can be seen from the recognition results, the typical insulation defects simulated in the laboratory are identified by the obtained decision tree with a comprehensive recognition rate of 87.50%, a fairly good recognition effect.
Table 2. Confusion matrix of the recognition results
Except for the N-class defects, which are all identified correctly, every other defect class has misclassified samples. Table 2 gives the confusion matrix of the recognition results: 2 groups of P-class defects are confused with G-class defects, and 1 group of G-class defects is confused with P-class defects. If these misclassified samples are marked in coordinates as shown in Fig. 3, it can be seen that the misidentified samples all lie near the boundary between the two defect types; this is because a decision tree boundary is a hard threshold, which easily causes objects near the boundary to be misidentified.
The beneficial effects of the invention are as follows. A decision tree recognition method using SF6 decomposition component content ratios as characteristic quantities is proposed, and a PD pattern recognition decision tree with c(SOF2)/c(SO2F2) and c(CF4)/c(CO2) as characteristic quantities is constructed; its decision process is shown in Fig. 2. It is further proposed that when the recognition rate is not high, c(SOF2+SO2F2)/c(CO2+CF4) can be used as an auxiliary characteristic quantity to further improve the recognition rate of partial discharge. Pattern recognition was performed with this decision tree on the partial discharge obtained in the laboratory, achieving satisfactory results. When the component contents do not exceed their limits and an ultra-high-frequency signal is present, there is an insulator surface contamination defect in the equipment; when the component contents do not exceed their limits and there is no ultra-high-frequency signal, the equipment is normal.
Finally, it should be noted that the above embodiments merely illustrate, and do not limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that the technical solution of the present invention may be modified or equivalently substituted without departing from the purpose and scope of the technical solution, all of which should be covered by the scope of the claims of the present invention.
Claims (8)
1. A decision tree recognition method based on partial discharge in sulfur hexafluoride gas, characterised in that the decision tree formation flow is as follows:
S1: judge whether the training sample set is empty; if not, go to step S2; otherwise, go to step S6;
S2: judge whether the samples at the decision node contain only one class; if not, go to step S3; otherwise, go to step S6;
S3: judge whether attribute A, the attribute with the largest information gain ratio in the samples, is a continuous quantity; if not, go to step S4; otherwise, go to step S6;
S4: find the partition threshold of attribute A;
S5: grow new nodes according to attribute A, and return to step S1;
S6: take the current node as a leaf node and name it after the corresponding class;
S7: form the decision tree.
2. The decision tree recognition method based on partial discharge in sulfur hexafluoride gas as claimed in claim 1, characterised in that the judgement flow of step S2 also includes:
S21: calculate the information gain ratio under each attribute in the samples;
S22: find the attribute A with the largest information gain ratio.
3. The decision tree recognition method based on partial discharge in sulfur hexafluoride gas as claimed in claim 2, characterised in that step S6 also includes:
S61: calculate the estimated misclassification rate and perform pruning.
4. The decision tree recognition method based on partial discharge in sulfur hexafluoride gas as claimed in claim 3, characterised in that the decision tree is generated with the C4.5 algorithm; the generation flow is as follows:
S01: let S be a set of s data samples, each belonging to one of m distinct classes $C_i$ ($i = 1, \ldots, m$);
S02: let $s_i$ be the number of samples in $C_i$; for the given sample set, the total information entropy is
$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i \quad (1)$$
where $p_i$ is the probability that an arbitrary sample belongs to $C_i$, estimated by $s_i/s$;
S03: let A be an attribute of the samples, taking v distinct values $\{a_1, a_2, \ldots, a_v\}$;
S04: attribute A divides S into v subsets $\{S_1, S_2, \ldots, S_v\}$, where $S_j$ contains the samples of S whose value of A is $a_j$;
S05: if A is selected as the test attribute, these subsets are exactly the branches grown out of the node holding sample set S.
5. The decision tree recognition method based on partial discharge in sulfur hexafluoride gas as claimed in claim 4, characterised in that the decision tree generation also includes:
S06: let $s_{ij}$ be the number of samples in subset $S_j$ that belong to class $C_i$;
S07: the entropy of the subsets into which attribute A divides S is
$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + s_{2j} + \cdots + s_{mj}}{s} I(s_{1j}, s_{2j}, \ldots, s_{mj}) \quad (2)$$
where $(s_{1j} + s_{2j} + \cdots + s_{mj})/s$ is the weight of subset $S_j$, equal to the number of samples in $S_j$ divided by the total number of samples in S; the smaller the entropy, the higher the purity of the subset division;
S08: $I(s_{1j}, s_{2j}, \ldots, s_{mj})$ is the entropy of subset $S_j$:
$$I(s_{1j}, s_{2j}, \ldots, s_{mj}) = -\sum_{i=1}^{m} p_{ij} \log_2 p_{ij} \quad (3)$$
where $p_{ij} = s_{ij}/|S_j|$ is the probability that a sample in $S_j$ belongs to class $C_i$;
S09: the information gain obtained by dividing sample set S by attribute A is
$$\mathrm{Gain}(S, A) = I(s_1, s_2, \ldots, s_m) - E(A) \quad (4).$$
6. The decision tree recognition method based on partial discharge in sulfur hexafluoride gas as claimed in claim 5, characterised in that it also includes:
S010: if A is a continuous attribute, sort the samples of training set S in ascending order of their values of A;
S011: supposing A takes v distinct values in the training set, the sorted value sequence of attribute A is $\{a_1, a_2, \ldots, a_v\}$; take the average of each pair of adjacent values in order as a cut-point, giving v-1 cut-points in total;
S012: calculate the information gain ratio of each cut-point, and select the cut-point with the largest information gain ratio as the local threshold;
S013: in the sequence $\{a_1, a_2, \ldots, a_v\}$, find the value $v_{max}$ closest to but not exceeding the local threshold as the partition threshold of attribute A;
S014: select the test attribute by the information-gain-ratio method; the information gain ratio is the ratio of the information gain to the split information, i.e., the information gain ratio of dividing S by A is
$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitI}(A)} \quad (5)$$
where $\mathrm{SplitI}(A) = -\sum_{j=1}^{v} \frac{|S_j|}{|S|} \log_2 \frac{|S_j|}{|S|}$ is the split information of attribute A.
7. The decision tree recognition method based on partial discharge in sulfur hexafluoride gas as claimed in claim 6, characterised in that it also includes pruning of the decision tree:
S611: the estimated misclassification rate of a leaf node is calculated as
$$e = \frac{f + \frac{z^2}{2N} + z\sqrt{\frac{f}{N} - \frac{f^2}{N} + \frac{z^2}{4N^2}}}{1 + \frac{z^2}{N}} \quad (6)$$
where f is the observed misclassification rate, f = E/N, E is the number of misclassified samples at the leaf node, N is the total number of samples at the current leaf, and z is the confidence limit; typically, at a confidence level of 0.25, z = 0.69. The estimated misclassification rate of the root of a subtree is the weighted average of the estimated misclassification rates of its leaf nodes, i.e.
$$e_T = \sum_{i=1}^{k} \frac{N_i}{N} e_i \quad (7)$$
where k is the number of branches and $N_i$ is the number of samples assigned to the i-th branch.
8. The decision tree recognition method based on partial discharge in sulfur hexafluoride gas as claimed in claim 1, characterised in that the training samples in step S1 cover three kinds of defects: metallic protrusions, insulator surface air gaps, and free metal particles.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711044243.5A CN107817427A (en) | 2017-10-31 | 2017-10-31 | Decision tree recognition method based on sulfur hexafluoride gas partial discharge |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107817427A true CN107817427A (en) | 2018-03-20 |
Family
ID=61603026
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711044243.5A Pending CN107817427A (en) | 2017-10-31 | 2017-10-31 | Decision tree recognition method based on sulfur hexafluoride gas partial discharge |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107817427A (en) |
2017-10-31: application CN201711044243.5A (CN), published as CN107817427A (en), status active, Pending
Non-Patent Citations (1)
Title |
---|
Liu Fan: "Decomposition Characteristics of Sulfur Hexafluoride under Partial Discharge, Discharge Type Identification, and Correction of Influencing Factors", China Doctoral Dissertations Full-text Database, Engineering Science & Technology II |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108805295A (en) * | 2018-03-26 | 2018-11-13 | 海南电网有限责任公司电力科学研究院 | Fault diagnosis method based on a decision tree algorithm |
CN109459522A (en) * | 2018-12-29 | 2019-03-12 | 云南电网有限责任公司电力科学研究院 | Transformer fault prediction method and device based on the ID3 algorithm |
CN110220602A (en) * | 2019-06-24 | 2019-09-10 | 广西电网有限责任公司电力科学研究院 | Switchgear overheating fault identification method |
CN110220602B (en) * | 2019-06-24 | 2020-08-25 | 广西电网有限责任公司电力科学研究院 | Switch cabinet overheating fault identification method |
CN112305354A (en) * | 2020-10-23 | 2021-02-02 | 海南电网有限责任公司电力科学研究院 | Method for diagnosing defect type of sulfur hexafluoride insulation electrical equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107817427A (en) | Decision tree recognition method based on sulfur hexafluoride gas partial discharge | |
CN107301296B (en) | Data-based qualitative analysis method for circuit breaker fault influence factors | |
Tang et al. | Partial discharge recognition through an analysis of SF 6 decomposition products part 2: feature extraction and decision tree-based pattern recognition | |
CN109444656B (en) | Online diagnosis method for deformation position of transformer winding | |
CN103076547B (en) | Method for identifying GIS (Gas Insulated Switchgear) local discharge fault type mode based on support vector machines | |
CN110687393B (en) | Valve short-circuit protection fault positioning method based on VMD-SVD-FCM | |
CN109142969A (en) | A kind of power transmission line fault phase selection based on Continuous Hidden Markov Model | |
CN105701470A (en) | Analog circuit fault characteristic extraction method based on optimal wavelet packet decomposition | |
CN106199351A (en) | The sorting technique of local discharge signal and device | |
CN109470985A (en) | A kind of voltage sag source identification methods based on more resolution singular value decompositions | |
Omar et al. | Fault classification on transmission line using LSTM network | |
CN106443380B (en) | A kind of distribution cable local discharge signal recognition methods and device | |
CN108805295A (en) | A kind of method for diagnosing faults based on decision Tree algorithms | |
CN112861417A (en) | Transformer fault diagnosis method based on weighted sum selective naive Bayes | |
CN115600088A (en) | Distribution transformer fault diagnosis method based on vibration signals | |
CN115618249A (en) | Low-voltage power distribution station area phase identification method based on LargeVis dimension reduction and DBSCAN clustering | |
CN114091549A (en) | Equipment fault diagnosis method based on deep residual error network | |
CN116243115A (en) | High-voltage cable mode identification method and device based on time sequence topology data analysis | |
CN112381667B (en) | Distribution network electrical topology identification method based on deep learning | |
CN113866552B (en) | Medium voltage distribution network user electricity consumption abnormality diagnosis method based on machine learning | |
CN114397569A (en) | Circuit breaker fault arc detection method based on VMD parameter optimization and sample entropy | |
CN114386024A (en) | Power intranet terminal equipment abnormal attack detection method based on ensemble learning | |
CN117171544B (en) | Motor vibration fault diagnosis method based on multichannel fusion convolutional neural network | |
CN112817954A (en) | Missing value interpolation method based on multi-method ensemble learning | |
CN116503710A (en) | GIS partial discharge type identification method based on self-adaptive convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180320 ||