CN112199496A - Power grid equipment defect text classification method based on a multi-head attention mechanism and an RCNN (recurrent convolutional neural network) - Google Patents
- Publication number: CN112199496A (application CN202010778393.4A)
- Authority: CN (China)
- Prior art keywords: text, word, matrix, attention, RCNN
- Prior art date: 2020-08-05
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06F16/35 - Information retrieval of unstructured textual data: clustering; classification
- G06N3/045 - Neural networks: combinations of networks
- G06N3/088 - Neural network learning methods: non-supervised learning, e.g. competitive learning
Abstract
The invention discloses a method for classifying power grid equipment defect texts based on a multi-head attention mechanism and an RCNN (recurrent convolutional neural network). The method comprises the following steps: step one, preprocessing the power grid defect text by word segmentation and stop-word removal; step two, embedding word vectors into the segmented text to obtain a text matrix; step three, inputting the text matrix into a multi-head attention model to obtain a text matrix containing attention, and fusing the attention text matrix with the original text matrix; step four, using an RCNN network model to extract features from the fused text matrix and output the final classification result; and step five, testing and tuning the multi-head attention model and the RCNN model with power grid primary equipment defect texts. The method applies a multi-head attention mechanism and an RCNN network to the classification of power grid equipment defect texts, realizing automatic classification of equipment defect texts.
Description
Technical Field
The invention belongs to the technical field of electric power systems, and particularly relates to a power grid equipment defect text classification method based on a multi-head attention mechanism and an RCNN (recurrent convolutional neural network).
Background
Power grid companies classify equipment defects into three grades (normal, important, and urgent) according to severity. Operators discover equipment defects during inspection, operation, and maintenance, report the equipment fault, defect, defect grade, and related information in Chinese text, and only after the defect has been verified and rated is a team dispatched to eliminate it. Defect classification is usually performed manually; the workload is large, it is time- and labor-consuming, and correctness is hard to guarantee because of personal subjective factors and differences in knowledge and experience. A large number of already classified and graded equipment defect texts are stored in the defect information management system of a power grid company; used properly, these texts create the conditions for classifying equipment defect texts quickly. Research on equipment defect text classification based on natural language processing technology is therefore important and urgent.
At present, many text classification methods have been applied to power grid equipment defect texts, mainly traditional machine learning algorithms and deep learning algorithms. Traditional machine learning algorithms include support vector machines, decision trees, and Bayes classifiers; their features are usually extracted with shallow methods such as LDA or TF-IDF and then classified with support vector machines, decision trees, or similar classifiers, so the classification performance of such shallow learning is mediocre and the deeper semantic information of the text cannot be learned. Text classification methods based on conventional deep learning (such as TextCNN and TextRNN) are more accurate, but they perform poorly on texts with long-distance dependencies and semantic turns, which limits their applicability.
Power grid equipment defect texts are highly specialized, the wording varies from person to person, the text length varies, large numbers of figures and units are mixed into the text, and degraded-defect texts may contain semantic turns. All of these characteristics affect the accuracy of equipment defect text classification.
Therefore, there is an urgent need in the art for a method that quickly and accurately classifies power grid equipment defect texts based on a multi-head attention mechanism and an RCNN network.
Disclosure of Invention
In view of the above, the invention provides a power grid equipment defect text classification method based on a multi-head attention mechanism and an RCNN network, applying the multi-head attention mechanism and the RCNN network to the classification of power grid equipment defect texts to realize automatic classification of equipment defect texts.
In order to achieve the purpose, the invention adopts the following technical scheme:
A power grid equipment defect text classification method based on a multi-head attention mechanism and an RCNN network comprises the following steps:
step one, preprocessing the power grid defect text by word segmentation and stop-word removal;
step two, embedding word vectors into the segmented text to obtain a text matrix;
step three, inputting the text matrix into a multi-head attention model to obtain a text matrix containing attention, and fusing the attention text matrix with the original text matrix;
step four, using an RCNN network model to extract features from the fused text matrix and output the final classification result;
step five, testing and tuning the multi-head attention model and the RCNN model with power grid primary equipment defect texts.
Preferably, in step one, the text preprocessing process is as follows:
(1) obtain in advance a data set file for text classification, containing power equipment defect texts and the corresponding labeled defect grade class labels;
(2) establish a proper-noun lexicon and a stop-word lexicon for the text content, segment the text with a python Chinese word segmentation component, and convert each text into a word sequence (a minimal sketch is given below).
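The patent names only "a python Chinese word segmentation component"; the following is a minimal sketch assuming jieba with a user dictionary for the proper-noun lexicon and a plain-text stop-word list (the resource file names and the Chinese example string are hypothetical):

```python
import jieba

# Hypothetical resource files: one term per line.
jieba.load_userdict("power_terms.txt")          # proper-noun lexicon
with open("stopwords.txt", encoding="utf-8") as f:
    stopwords = {line.strip() for line in f if line.strip()}

def preprocess(text: str) -> list[str]:
    """Segment a defect text and drop stop words, returning a word sequence."""
    return [w for w in jieba.cut(text) if w.strip() and w not in stopwords]

# Back-translation of the patent's example "the knife switch cannot be
# electrically operated on site" (assumed wording).
words = preprocess("刀闸在当地无法电动操作")
```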
Preferably, the method for embedding word vectors into the segmented text to obtain the text matrix is as follows:
(1) train the word sequences obtained by segmentation without supervision, using the CBOW algorithm of the word2vec component in the gensim library, to obtain the word vector of each word;
(2) perform word embedding on the trained word vectors with an embedding layer to obtain the text matrix.
Preferably, the CBOW algorithm predicts the probability p(w | Context(w)) of generating w from the context Context(w) of the word w, and trains the word vectors by maximizing the objective function T:
T = ∑ log p(w | Context(w)).
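A minimal training sketch, assuming gensim 4.x (sg=0 selects CBOW; the 128-dimension setting comes from the embodiment below, while the window size is an assumption not stated in the patent):

```python
from gensim.models import Word2Vec

# word_sequences: list of segmented defect texts, e.g. [["刀闸", "无法", ...], ...]
w2v = Word2Vec(
    sentences=word_sequences,
    vector_size=128,  # word-vector dimension d_k used in the embodiment
    sg=0,             # 0 = CBOW: predict w from Context(w)
    window=5,         # context window size (assumed)
    min_count=1,      # keep rare domain terms
)
vec = w2v.wv["刀闸"]  # 128-dimensional word vector for one word
```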
Preferably, the method for inputting the text matrix into the multi-head attention model to obtain the text matrix containing attention and fusing the attention text matrix with the original text matrix is as follows:
(1) let the word-vector dimension be d_k and the sentence length be L, and denote the l-th word vector in the sentence as e_l (1 ≤ l ≤ L), giving the L×d_k text matrix E = [e_1 … e_l … e_L];
(2) input the text matrix into the multi-head attention model to obtain the multi-head-attention-optimized text matrix representation Head:

Head = MultiHead(EW^Q, EW^K, EW^V) = Concat(head_1, …, head_h)W^O
head_i = Attention(EW_i^Q, EW_i^K, EW_i^V), with Attention(Q, K, V) = softmax(QK^T / √d_k)V

where Q, K, V are the input matrices, √d_k is the scaling factor, Attention is the scaled dot-product attention operation, MultiHead is the multi-head attention function, Concat is the concatenation function, head_i is the result of the i-th self-attention operation, and W^Q, W^K, W^V, W^O are linear transformation matrices;

E'_1 = Residual_Connect(E, Head)
E_1 = LayerNorm(E'_1)

where E'_1 is the matrix after the residual connection, Residual_Connect is the residual connection operation, and LayerNorm is layer normalization.
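Since the embodiment's experiments use TensorFlow, the attention-plus-fusion step can be sketched with the Keras built-ins. This is a sketch under the assumptions of 8 heads and a sentence length of 50; neither value is specified in the patent:

```python
import tensorflow as tf

d_k, L, h = 128, 50, 8  # vector dim from the embodiment; L and h assumed

inputs = tf.keras.Input(shape=(L, d_k))             # text matrix E
mha = tf.keras.layers.MultiHeadAttention(num_heads=h, key_dim=d_k // h)
head = mha(query=inputs, value=inputs, key=inputs)  # self-attention output Head
fused = tf.keras.layers.Add()([inputs, head])       # residual connection: E'_1
e1 = tf.keras.layers.LayerNormalization()(fused)    # E_1 = LayerNorm(E'_1)
```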
Preferably, the method for extracting features from the fused text matrix with the RCNN network model and outputting the final classification result is as follows:
(1) the recurrent neural network part of the RCNN adopts a bidirectional GRU network consisting of a forward-input GRU and a backward-input GRU. The GRU networks learn, for the current word w_i, its left-context representation c_l(w_i) and right-context representation c_r(w_i); these are concatenated with the attention word vector e(w_i) ∈ E_1 of the current word to form the input x_i of the subsequent convolutional layer:

c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1}))
c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1}))
x_i = [c_l(w_i); e(w_i); c_r(w_i)]

where W^(l), W^(r) are matrices that transform a hidden layer into the next hidden layer, W^(sl), W^(sr) are matrices that combine the semantics of the current word with the left or right context of the adjacent word, and f is a nonlinear activation function;
(2) the convolutional layer uses convolution kernels whose number of columns equals the dimension of x_i and whose number of rows is 1, with tanh as the activation function; convolving the output of the Bi-GRU network gives the convolution result

y_i^(2) = tanh(W^(2) x_i + b)

where b is a bias;
(3) the pooling layer applies Global Average Pooling (GAP) to sample the features of the convolution output, giving the extracted feature vector y^(3) ∈ R^(3m):

y^(3) = (1/L) ∑_{i=1}^{L} y_i^(2)

(4) the feature vector is input to a softmax function, which outputs the final classification result:

P_i = exp(z_i) / ∑_{j=1}^{n} exp(z_j), with z = W^(4) y^(3) + b^(4)

where P_i is the probability that the text belongs to class i, n is the number of classes, and the class with the highest probability is the classification result of the text.
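The Bi-GRU, 1-row convolution, GAP, and softmax stages map onto standard Keras layers; a minimal sketch continuing from e1 in the sketch above (the GRU hidden size m is assumed; n = 3 follows the three defect grades of the embodiment's data set):

```python
m, n = 128, 3  # GRU hidden size (assumed) and number of defect grades

# Bidirectional GRU over the attention text matrix E_1; its output stacks the
# left and right contexts per word (ordering differs from [c_l; e; c_r] but is
# equivalent up to a column permutation).
ctx = tf.keras.layers.Bidirectional(
    tf.keras.layers.GRU(m, return_sequences=True))(e1)
x = tf.keras.layers.Concatenate()([ctx, e1])          # x_i = [c_l; e(w_i); c_r]

# 1-row kernels spanning all columns of x_i == Conv1D with kernel_size=1.
y2 = tf.keras.layers.Conv1D(3 * m, kernel_size=1, activation="tanh")(x)
y3 = tf.keras.layers.GlobalAveragePooling1D()(y2)     # GAP feature vector y^(3)
outputs = tf.keras.layers.Dense(n, activation="softmax")(y3)  # P_i

model = tf.keras.Model(inputs, outputs)
```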
Preferably, the method for testing and tuning the classification model with power grid primary equipment defect texts is as follows:
the model is tested on the data set with five-fold cross-validation, the macro-averaged composite index Macro-F1 is adopted as the evaluation index, and model training uses the Adam gradient descent method to update the model weight parameters, realizing test tuning.
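A sketch of this evaluation loop, under the assumption of scikit-learn for the splitting and scoring (build_model is a hypothetical factory wrapping the architecture sketched above; the epoch count is assumed):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score

def evaluate(X: np.ndarray, y: np.ndarray, build_model) -> float:
    """Five-fold cross-validation with Macro-F1 and Adam, as described above."""
    scores = []
    for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True).split(X, y):
        fold_model = build_model()
        fold_model.compile(optimizer="adam",  # Adam gradient descent
                           loss="sparse_categorical_crossentropy")
        fold_model.fit(X[train_idx], y[train_idx], epochs=10, verbose=0)
        pred = fold_model.predict(X[test_idx]).argmax(axis=-1)
        scores.append(f1_score(y[test_idx], pred, average="macro"))  # Macro-F1
    return float(np.mean(scores))
```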
The invention has the following beneficial effects:
By mining power grid equipment defect text data, the invention effectively solves the classification problem of power grid equipment defect texts and overcomes the defects and shortcomings of the conventional RCNN. First, so that the RCNN gives higher weight in classification to important classification-related information and selectively ignores irrelevant features, a multi-head attention mechanism is introduced to assign more attention weight to the information relevant to classification. Second, because a word vector generated by word2vec cannot be dynamically optimized for a specific task, the multi-head attention mechanism learns the word dependencies in the text, captures the internal structure of the text, and turns the word vectors from static to dynamic. Finally, the generated attention text matrix and the original text matrix are fused through a residual connection, preventing network degradation.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below are only embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 is a diagram of a multi-head attention model according to the present invention.
FIG. 3 is a schematic diagram of a Bi-GRU of the present invention.
FIG. 4 is a graph of the experimental results of the invention compared with other models.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a method for classifying power grid equipment defect texts based on a multi-head attention mechanism and an RCNN network, comprising the following steps:
step one, preprocessing the power grid defect text by word segmentation and stop-word removal;
step two, embedding word vectors into the segmented text to obtain a text matrix;
step three, inputting the text matrix into a multi-head attention model to obtain a text matrix containing attention, and fusing the attention text matrix with the original text matrix;
step four, using an RCNN network model to extract features from the fused text matrix and output the final classification result;
step five, testing and tuning the multi-head attention model and the RCNN model with power grid primary equipment defect texts.
The power grid defect text preprocessing proceeds as follows:
(1) Obtain in advance a data set file for text classification, containing power equipment defect texts and the corresponding labeled defect grade class labels. The data set consists of the 2016-2019 primary equipment defects of a certain power supply station; its information is listed in Table 1.

Table 1. Data set information

Name | Number of classes | Training set | Test set
---|---|---|---
2016-2019 primary equipment defects of a certain power supply station | 3 | 1548 | 387

(2) Establish a proper-noun lexicon and a stop-word lexicon for the text content, segment the text with a python Chinese word segmentation component, and convert each equipment defect text into a word sequence. For example, the defect text "the knife switch cannot be electrically operated on site" is converted into the word sequence: 'knife switch', 'cannot', 'on site', 'electrically', 'operate'.
Word vectors are embedded into the segmented text to obtain the text matrix; the specific process is as follows:
(1) The word sequences obtained by segmentation are trained without supervision, using the CBOW algorithm of the word2vec component in the gensim library, to obtain the word vector of each word. The CBOW algorithm predicts the probability p(w | Context(w)) of generating w from the context Context(w) of the word w, and trains the word vectors by maximizing the objective function T:

T = ∑ log p(w | Context(w))

Training the word sequence 'knife switch', 'cannot', 'on site', 'electrically', 'operate' yields a 128-dimensional word vector for each word, as shown in Table 2.

Table 2. Word vectors corresponding to each word

(2) Word embedding is performed on the trained word vectors with an embedding layer to obtain the text matrix.
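The embedding-layer step can be sketched by copying the trained gensim vectors into a Keras Embedding layer. The vocabulary indexing scheme below is an assumption; the patent does not fix one:

```python
import numpy as np
import tensorflow as tf

vocab = {w: i for i, w in enumerate(w2v.wv.index_to_key)}  # word -> row index
emb_matrix = np.array(w2v.wv.vectors)                      # shape (|V|, 128)

embedding = tf.keras.layers.Embedding(
    input_dim=emb_matrix.shape[0],
    output_dim=emb_matrix.shape[1],
    embeddings_initializer=tf.keras.initializers.Constant(emb_matrix),
    trainable=False,  # static word2vec vectors; the attention stage adapts them
)
ids = tf.constant([[vocab[w] for w in words]])  # words from the preprocessing sketch
E = embedding(ids)                              # text matrix E, shape (1, L, 128)
```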
The text matrix is input into the multi-head attention model to obtain a text matrix containing attention, which is fused with the original text matrix; the specific process is as follows:
(1) The word-vector dimension is d_k = 128, the sentence length is L, and the l-th word vector in the sentence is denoted e_l (1 ≤ l ≤ L), giving the L×d_k text matrix E = [e_1 … e_l … e_L];
(2) The text matrix is input into the multi-head attention model, shown in FIG. 2, to obtain the multi-head-attention-optimized text matrix representation Head:

Head = MultiHead(EW^Q, EW^K, EW^V) = Concat(head_1, …, head_h)W^O
head_i = Attention(EW_i^Q, EW_i^K, EW_i^V), with Attention(Q, K, V) = softmax(QK^T / √d_k)V

where Q, K, V are the input matrices, √d_k is the scaling factor, Attention is the scaled dot-product attention operation, MultiHead is the multi-head attention function, Concat is the concatenation function, head_i is the result of the i-th self-attention operation, and W^Q, W^K, W^V, W^O are linear transformation matrices. The text matrix optimized by multi-head attention is shown in Table 3.

Table 3. Attention text matrix

Comparing Table 2 with Table 3 shows that the keyword vectors of the text matrix optimized by the multi-head attention mechanism are enhanced in different dimensions;

E'_1 = Residual_Connect(E, Head)
E_1 = LayerNorm(E'_1)

where E'_1 is the matrix after the residual connection, Residual_Connect is the residual connection operation, and LayerNorm is layer normalization.
The RCNN network model extracts features from the fused text matrix and outputs the final classification result; the specific process is as follows:
(1) The recurrent neural network part of the RCNN adopts a bidirectional GRU network, shown schematically in FIG. 3, consisting of a forward-input GRU and a backward-input GRU. The GRU networks learn, for the current word w_i, its left-context representation c_l(w_i) and right-context representation c_r(w_i); these are concatenated with the attention word vector e(w_i) ∈ E_1 of the current word to form the input x_i of the subsequent convolutional layer:

c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1}))
c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1}))
x_i = [c_l(w_i); e(w_i); c_r(w_i)]

where W^(l), W^(r) are matrices that transform a hidden layer into the next hidden layer, W^(sl), W^(sr) are matrices that combine the semantics of the current word with the left or right context of the adjacent word, and f is a nonlinear activation function;
(2) The convolutional layer uses convolution kernels whose number of columns equals the dimension of x_i and whose number of rows is 1, with tanh as the activation function; convolving the output of the Bi-GRU network gives the convolution result

y_i^(2) = tanh(W^(2) x_i + b)

where b is a bias;
(3) The pooling layer applies Global Average Pooling (GAP) to sample the features of the convolution output, giving the extracted feature vector y^(3) ∈ R^(3m):

y^(3) = (1/L) ∑_{i=1}^{L} y_i^(2)

(4) The feature vector is input to a softmax function, which outputs the final classification result:

P_i = exp(z_i) / ∑_{j=1}^{n} exp(z_j), with z = W^(4) y^(3) + b^(4)

where P_i is the probability that the text belongs to class i, n is the number of classes, and the class with the highest probability is the classification result of the text.
(5) Model training adopts a five-fold cross-validation method and uses the Adam gradient descent method to update the model weight parameters. The prediction result of the tuned model on the example text sequence is shown in Table 4.

Table 4. Classification result

Predicting the example equipment defect through the algorithm, "the knife switch cannot be electrically operated on site" is classified as a general defect, consistent with the grade of the actual defect.
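End to end, the tuned model's prediction for the example text could look like the following sketch, reusing preprocess, vocab, embedding, L, and model from the earlier sketches (the grade names come from the Background section; the zero-padding shortcut is an assumption, since a dedicated padding index would be cleaner):

```python
GRADES = ["normal", "important", "urgent"]   # defect grades from the Background

text = "刀闸在当地无法电动操作"                # back-translated example (assumed wording)
seq = [vocab[w] for w in preprocess(text)]
seq = seq + [0] * (L - len(seq))             # pad to fixed sentence length L
E_new = embedding(tf.constant([seq]))        # (1, L, 128) text matrix
probs = model.predict(E_new)[0]              # softmax probabilities P_i
print(GRADES[int(probs.argmax())])           # highest-probability class
```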
To examine the effect of the classification algorithm of this embodiment, comparative experiments with other models were also designed. The experimental environment is an Intel Core i7-8550U CPU, the experimental framework is TensorFlow, and the evaluation index is the macro-averaged composite index Macro-F1 (MF1). The experimental results are shown in FIG. 4.
FIG. 4 compares the multi-head attention and RCNN classification algorithm (MAT-RCNN) with other classification algorithms. As the figure shows, the model of the invention outperforms the compared classification algorithms, with MF1 reaching 94.51%.
By applying a deep semantic learning algorithm to the classification of power grid defect texts, the method classifies power grid equipment defect texts quickly, realizing rapid grading of equipment defects, improving the maintenance efficiency of power grid equipment, and shortening fault elimination time; it is highly practical. The method not only saves labor cost but also classifies power grid defect texts well.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (7)
1. A power grid equipment defect text classification method based on a multi-head attention mechanism and an RCNN (recurrent convolutional neural network), characterized by comprising the following steps:
step one, preprocessing the power grid defect text by word segmentation and stop-word removal;
step two, embedding word vectors into the segmented text to obtain a text matrix;
step three, inputting the text matrix into a multi-head attention model to obtain a text matrix containing attention, and fusing the attention text matrix with the original text matrix;
step four, using an RCNN network model to extract features from the fused text matrix and output the final classification result;
step five, testing and tuning the multi-head attention model and the RCNN model with power grid primary equipment defect texts.
2. The method as claimed in claim 1, wherein in step one the text preprocessing process is as follows:
(1) obtain in advance a data set file for text classification, containing power equipment defect texts and the corresponding labeled defect grade class labels;
(2) establish a proper-noun lexicon and a stop-word lexicon for the text content, segment the text with a python Chinese word segmentation component, and convert each text into a word sequence.
3. The power grid equipment defect text classification method based on the multi-head attention mechanism and the RCNN network, wherein word vectors are embedded into the segmented text to obtain the text matrix as follows:
(1) train the word sequences obtained by segmentation without supervision, using the CBOW algorithm of the word2vec component in the gensim library, to obtain the word vector of each word;
(2) perform word embedding on the trained word vectors with an embedding layer to obtain the text matrix.
4. The power grid equipment defect text classification method based on the multi-head attention mechanism and the RCNN network as claimed in claim 3, wherein the CBOW algorithm predicts the probability p(w | Context(w)) of generating w from the context Context(w) of the word w, and trains the word vectors by maximizing the objective function T:
T = ∑ log p(w | Context(w)).
5. The power grid equipment defect text classification method based on the multi-head attention mechanism and the RCNN network, wherein the text matrix is input into the multi-head attention model to obtain a text matrix containing attention, and the attention text matrix is fused with the original text matrix as follows:
(1) let the word-vector dimension be d_k and the sentence length be L, and denote the l-th word vector in the sentence as e_l (1 ≤ l ≤ L), giving the L×d_k text matrix E = [e_1 … e_l … e_L];
(2) input the text matrix into the multi-head attention model to obtain the multi-head-attention-optimized text matrix representation Head:

Head = MultiHead(EW^Q, EW^K, EW^V) = Concat(head_1, …, head_h)W^O
head_i = Attention(EW_i^Q, EW_i^K, EW_i^V), with Attention(Q, K, V) = softmax(QK^T / √d_k)V

where Q, K, V are the input matrices, √d_k is the scaling factor, Attention is the scaled dot-product attention operation, MultiHead is the multi-head attention function, Concat is the concatenation function, head_i is the result of the i-th self-attention operation, and W^Q, W^K, W^V, W^O are linear transformation matrices;

E'_1 = Residual_Connect(E, Head)
E_1 = LayerNorm(E'_1)

where E'_1 is the matrix after the residual connection, Residual_Connect is the residual connection operation, and LayerNorm is layer normalization.
6. The power grid equipment defect text classification method based on the multi-head attention mechanism and the RCNN network, wherein the RCNN network model extracts features from the fused text matrix and outputs the final classification result as follows:
(1) the recurrent neural network part of the RCNN adopts a bidirectional GRU network consisting of a forward-input GRU and a backward-input GRU; the GRU networks learn, for the current word w_i, its left-context representation c_l(w_i) and right-context representation c_r(w_i), which are concatenated with the attention word vector e(w_i) ∈ E_1 of the current word to form the input x_i of the subsequent convolutional layer:

c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1}))
c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1}))
x_i = [c_l(w_i); e(w_i); c_r(w_i)]

where W^(l), W^(r) are matrices that transform a hidden layer into the next hidden layer, W^(sl), W^(sr) are matrices that combine the semantics of the current word with the left or right context of the adjacent word, and f is a nonlinear activation function;
(2) the convolutional layer uses convolution kernels whose number of columns equals the dimension of x_i and whose number of rows is 1, with tanh as the activation function; convolving the output of the Bi-GRU network gives the convolution result

y_i^(2) = tanh(W^(2) x_i + b)

where b is a bias;
(3) the pooling layer applies Global Average Pooling (GAP) to sample the features of the convolution output, giving the extracted feature vector y^(3) ∈ R^(3m):

y^(3) = (1/L) ∑_{i=1}^{L} y_i^(2)

(4) the feature vector is input to a softmax function, which outputs the final classification result:

P_i = exp(z_i) / ∑_{j=1}^{n} exp(z_j), with z = W^(4) y^(3) + b^(4)

where P_i is the probability that the text belongs to class i, n is the number of classes, and the class with the highest probability is the classification result of the text.
7. The power grid equipment defect text classification method based on the multi-head attention mechanism and the RCNN network, wherein the method for testing and tuning the classification model with power grid primary equipment defect texts is as follows:
the model is tested on the data set with five-fold cross-validation, the macro-averaged composite index Macro-F1 is adopted as the evaluation index, and model training uses the Adam gradient descent method to update the model weight parameters, realizing test tuning.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010778393.4A | 2020-08-05 | 2020-08-05 | Power grid equipment defect text classification method based on multi-head attention mechanism and RCNN (recurrent convolutional neural network)

Publications (1)

Publication Number | Publication Date
---|---
CN112199496A | 2021-01-08
Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination
- RJ01: Rejection of invention patent application after publication (application publication date: 2021-01-08)