CN112199496A - Power grid equipment defect text classification method based on a multi-head attention mechanism and an RCNN (recurrent convolutional neural network) - Google Patents
- Publication number: CN112199496A (application CN202010778393.4A)
- Authority: CN (China)
- Prior art keywords: text, word, matrix, attention, RCNN
- Prior art date: 2020-08-05
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06F16/35 - Information retrieval of unstructured textual data: clustering; classification
- G06N3/045 - Neural networks: combinations of networks
- G06N3/088 - Neural network learning methods: non-supervised learning, e.g. competitive learning
Abstract
The invention discloses a method for classifying power grid equipment defect texts based on a multi-head attention mechanism and an RCNN (recurrent convolutional neural network). The method comprises the following steps: step one, preprocessing the power grid defect text by word segmentation and stop-word removal; step two, embedding word vectors into the segmented text to obtain a text matrix; step three, inputting the text matrix into a multi-head attention model to obtain a text matrix containing attention, and fusing the attention text matrix with the original text matrix; step four, using an RCNN network model to extract features from the fused text matrix and output the final classification result; and step five, testing and tuning the multi-head attention model and the RCNN model with power grid primary equipment defect texts. The method applies a multi-head attention mechanism and an RCNN network to the classification of power grid equipment defect texts, realizing automatic classification of equipment defect texts.
Description
Technical Field
The invention belongs to the technical field of electric power systems, and particularly relates to a power grid equipment defect text classification method based on a multi-head attention mechanism and an RCNN (recurrent convolutional neural network).
Background
Power grid companies classify equipment defects into three grades (normal, important, and urgent) according to severity. Operators discover equipment defects during inspection, operation, and maintenance, report the equipment fault, defect, defect grade, and related information in Chinese text, and only after the defect has been verified and rated is a team dispatched to eliminate it. Defect classification is usually performed manually; the workload is large, it is time- and labor-consuming, and correctness is hard to guarantee because of personal subjective factors and differences in knowledge and experience. A large number of already classified and graded equipment defect texts are stored in the defect information management system of a power grid company; used properly, these texts create the conditions for classifying equipment defect texts quickly. Research on equipment defect text classification based on natural language processing technology is therefore important and urgent.
At present, many text classification methods have been applied to power grid equipment defect texts, mainly traditional machine learning algorithms and deep learning algorithms. Traditional machine learning algorithms include support vector machines, decision trees, and Bayes classifiers; their features are usually extracted with shallow methods such as LDA or TF-IDF and then classified with support vector machines, decision trees, or similar classifiers, so the classification performance of such shallow learning is mediocre and the deeper semantic information of the text cannot be learned. Text classification methods based on conventional deep learning (such as TextCNN and TextRNN) are more accurate, but they perform poorly on texts with long-distance dependencies and semantic turns, which limits their applicability.
Power grid equipment defect texts are highly specialized, the wording varies from person to person, the text length varies, large numbers of figures and units are mixed into the text, and degraded-defect texts may contain semantic turns. All of these characteristics affect the accuracy of equipment defect text classification.
Therefore, there is an urgent need in the art for a method that quickly and accurately classifies power grid equipment defect texts based on a multi-head attention mechanism and an RCNN network.
Disclosure of Invention
In view of the above, the invention provides a power grid equipment defect text classification method based on a multi-head attention mechanism and an RCNN network, applying the multi-head attention mechanism and the RCNN network to the classification of power grid equipment defect texts to realize automatic classification of equipment defect texts.
In order to achieve the purpose, the invention adopts the following technical scheme:
A power grid equipment defect text classification method based on a multi-head attention mechanism and an RCNN network comprises the following steps:
step one, preprocessing the power grid defect text by word segmentation and stop-word removal;
step two, embedding word vectors into the segmented text to obtain a text matrix;
step three, inputting the text matrix into a multi-head attention model to obtain a text matrix containing attention, and fusing the attention text matrix with the original text matrix;
step four, using an RCNN network model to extract features from the fused text matrix and output the final classification result;
step five, testing and tuning the multi-head attention model and the RCNN model with power grid primary equipment defect texts.
Preferably, in step one, the text preprocessing process is as follows:
(1) obtain in advance a data set file for text classification, containing power equipment defect texts and the corresponding labeled defect grade class labels;
(2) establish a proper-noun lexicon and a stop-word lexicon for the text content, segment the text with a python Chinese word segmentation component, and convert each text into a word sequence (a minimal sketch is given below).
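The patent names only "a python Chinese word segmentation component"; the following is a minimal sketch assuming jieba with a user dictionary for the proper-noun lexicon and a plain-text stop-word list (the resource file names and the Chinese example string are hypothetical):

```python
import jieba

# Hypothetical resource files: one term per line.
jieba.load_userdict("power_terms.txt")          # proper-noun lexicon
with open("stopwords.txt", encoding="utf-8") as f:
    stopwords = {line.strip() for line in f if line.strip()}

def preprocess(text: str) -> list[str]:
    """Segment a defect text and drop stop words, returning a word sequence."""
    return [w for w in jieba.cut(text) if w.strip() and w not in stopwords]

# Back-translation of the patent's example "the knife switch cannot be
# electrically operated on site" (assumed wording).
words = preprocess("刀闸在当地无法电动操作")
```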
Preferably, the method for embedding word vectors into the segmented text to obtain the text matrix is as follows:
(1) train the word sequences obtained by segmentation without supervision, using the CBOW algorithm of the word2vec component in the gensim library, to obtain the word vector of each word;
(2) perform word embedding on the trained word vectors with an embedding layer to obtain the text matrix.
Preferably, the CBOW algorithm predicts the probability p(w | Context(w)) of generating w from the context Context(w) of the word w, and trains the word vectors by maximizing the objective function T:
T = ∑ log p(w | Context(w)).
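A minimal training sketch, assuming gensim 4.x (sg=0 selects CBOW; the 128-dimension setting comes from the embodiment below, while the window size is an assumption not stated in the patent):

```python
from gensim.models import Word2Vec

# word_sequences: list of segmented defect texts, e.g. [["刀闸", "无法", ...], ...]
w2v = Word2Vec(
    sentences=word_sequences,
    vector_size=128,  # word-vector dimension d_k used in the embodiment
    sg=0,             # 0 = CBOW: predict w from Context(w)
    window=5,         # context window size (assumed)
    min_count=1,      # keep rare domain terms
)
vec = w2v.wv["刀闸"]  # 128-dimensional word vector for one word
```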
Preferably, the method for inputting the text matrix into the multi-head attention model to obtain the text matrix containing attention and fusing the attention text matrix with the original text matrix is as follows:
(1) let the word-vector dimension be d_k and the sentence length be L, and denote the l-th word vector in the sentence as e_l (1 ≤ l ≤ L), giving the L×d_k text matrix E = [e_1 … e_l … e_L];
(2) input the text matrix into the multi-head attention model to obtain the multi-head-attention-optimized text matrix representation Head:

Head = MultiHead(EW^Q, EW^K, EW^V) = Concat(head_1, …, head_h)W^O
head_i = Attention(EW_i^Q, EW_i^K, EW_i^V), with Attention(Q, K, V) = softmax(QK^T / √d_k)V

where Q, K, V are the input matrices, √d_k is the scaling factor, Attention is the scaled dot-product attention operation, MultiHead is the multi-head attention function, Concat is the concatenation function, head_i is the result of the i-th self-attention operation, and W^Q, W^K, W^V, W^O are linear transformation matrices;

E'_1 = Residual_Connect(E, Head)
E_1 = LayerNorm(E'_1)

where E'_1 is the matrix after the residual connection, Residual_Connect is the residual connection operation, and LayerNorm is layer normalization.
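Since the embodiment's experiments use TensorFlow, the attention-plus-fusion step can be sketched with the Keras built-ins. This is a sketch under the assumptions of 8 heads and a sentence length of 50; neither value is specified in the patent:

```python
import tensorflow as tf

d_k, L, h = 128, 50, 8  # vector dim from the embodiment; L and h assumed

inputs = tf.keras.Input(shape=(L, d_k))             # text matrix E
mha = tf.keras.layers.MultiHeadAttention(num_heads=h, key_dim=d_k // h)
head = mha(query=inputs, value=inputs, key=inputs)  # self-attention output Head
fused = tf.keras.layers.Add()([inputs, head])       # residual connection: E'_1
e1 = tf.keras.layers.LayerNormalization()(fused)    # E_1 = LayerNorm(E'_1)
```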
Preferably, the method for extracting features from the fused text matrix with the RCNN network model and outputting the final classification result is as follows:
(1) the recurrent neural network part of the RCNN adopts a bidirectional GRU network consisting of a forward-input GRU and a backward-input GRU. The GRU networks learn, for the current word w_i, its left-context representation c_l(w_i) and right-context representation c_r(w_i); these are concatenated with the attention word vector e(w_i) ∈ E_1 of the current word to form the input x_i of the subsequent convolutional layer:

c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1}))
c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1}))
x_i = [c_l(w_i); e(w_i); c_r(w_i)]

where W^(l), W^(r) are matrices that transform a hidden layer into the next hidden layer, W^(sl), W^(sr) are matrices that combine the semantics of the current word with the left or right context of the adjacent word, and f is a nonlinear activation function;
(2) the convolutional layer uses convolution kernels whose number of columns equals the dimension of x_i and whose number of rows is 1, with tanh as the activation function; convolving the output of the Bi-GRU network gives the convolution result

y_i^(2) = tanh(W^(2) x_i + b)

where b is a bias;
(3) the pooling layer applies Global Average Pooling (GAP) to sample the features of the convolution output, giving the extracted feature vector y^(3) ∈ R^(3m):

y^(3) = (1/L) ∑_{i=1}^{L} y_i^(2)

(4) the feature vector is input to a softmax function, which outputs the final classification result:

P_i = exp(z_i) / ∑_{j=1}^{n} exp(z_j), with z = W^(4) y^(3) + b^(4)

where P_i is the probability that the text belongs to class i, n is the number of classes, and the class with the highest probability is the classification result of the text.
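The Bi-GRU, 1-row convolution, GAP, and softmax stages map onto standard Keras layers; a minimal sketch continuing from e1 in the sketch above (the GRU hidden size m is assumed; n = 3 follows the three defect grades of the embodiment's data set):

```python
m, n = 128, 3  # GRU hidden size (assumed) and number of defect grades

# Bidirectional GRU over the attention text matrix E_1; its output stacks the
# left and right contexts per word (ordering differs from [c_l; e; c_r] but is
# equivalent up to a column permutation).
ctx = tf.keras.layers.Bidirectional(
    tf.keras.layers.GRU(m, return_sequences=True))(e1)
x = tf.keras.layers.Concatenate()([ctx, e1])          # x_i = [c_l; e(w_i); c_r]

# 1-row kernels spanning all columns of x_i == Conv1D with kernel_size=1.
y2 = tf.keras.layers.Conv1D(3 * m, kernel_size=1, activation="tanh")(x)
y3 = tf.keras.layers.GlobalAveragePooling1D()(y2)     # GAP feature vector y^(3)
outputs = tf.keras.layers.Dense(n, activation="softmax")(y3)  # P_i

model = tf.keras.Model(inputs, outputs)
```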
Preferably, the method for testing and tuning the classification model with power grid primary equipment defect texts is as follows:
the model is tested on the data set with five-fold cross-validation, the macro-averaged composite index Macro-F1 is adopted as the evaluation index, and model training uses the Adam gradient descent method to update the model weight parameters, realizing test tuning.
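A sketch of this evaluation loop, under the assumption of scikit-learn for the splitting and scoring (build_model is a hypothetical factory wrapping the architecture sketched above; the epoch count is assumed):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score

def evaluate(X: np.ndarray, y: np.ndarray, build_model) -> float:
    """Five-fold cross-validation with Macro-F1 and Adam, as described above."""
    scores = []
    for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True).split(X, y):
        fold_model = build_model()
        fold_model.compile(optimizer="adam",  # Adam gradient descent
                           loss="sparse_categorical_crossentropy")
        fold_model.fit(X[train_idx], y[train_idx], epochs=10, verbose=0)
        pred = fold_model.predict(X[test_idx]).argmax(axis=-1)
        scores.append(f1_score(y[test_idx], pred, average="macro"))  # Macro-F1
    return float(np.mean(scores))
```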
The invention has the following beneficial effects:
By mining power grid equipment defect text data, the invention effectively solves the classification problem of power grid equipment defect texts and overcomes the defects and shortcomings of the conventional RCNN. First, so that the RCNN gives higher weight in classification to important classification-related information and selectively ignores irrelevant features, a multi-head attention mechanism is introduced to assign more attention weight to the information relevant to classification. Second, because a word vector generated by word2vec cannot be dynamically optimized for a specific task, the multi-head attention mechanism learns the word dependencies in the text, captures the internal structure of the text, and turns the word vectors from static to dynamic. Finally, the generated attention text matrix and the original text matrix are fused through a residual connection, preventing network degradation.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below are only embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 is a diagram of a multi-head attention model according to the present invention.
FIG. 3 is a schematic diagram of a Bi-GRU of the present invention.
FIG. 4 is a graph of the experimental results of the invention compared with other models.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a method for classifying power grid equipment defect texts based on a multi-head attention mechanism and an RCNN network, comprising the following steps:
step one, preprocessing the power grid defect text by word segmentation and stop-word removal;
step two, embedding word vectors into the segmented text to obtain a text matrix;
step three, inputting the text matrix into a multi-head attention model to obtain a text matrix containing attention, and fusing the attention text matrix with the original text matrix;
step four, using an RCNN network model to extract features from the fused text matrix and output the final classification result;
step five, testing and tuning the multi-head attention model and the RCNN model with power grid primary equipment defect texts.
The power grid defect text preprocessing proceeds as follows:
(1) Obtain in advance a data set file for text classification, containing power equipment defect texts and the corresponding labeled defect grade class labels. The data set consists of the 2016-2019 primary equipment defects of a certain power supply station; its information is listed in Table 1.

Table 1. Data set information

Name | Number of classes | Training set | Test set
---|---|---|---
2016-2019 primary equipment defects of a certain power supply station | 3 | 1548 | 387

(2) Establish a proper-noun lexicon and a stop-word lexicon for the text content, segment the text with a python Chinese word segmentation component, and convert each equipment defect text into a word sequence. For example, the defect text "the knife switch cannot be electrically operated on site" is converted into the word sequence: 'knife switch', 'cannot', 'on site', 'electrically', 'operate'.
Word vectors are embedded into the segmented text to obtain the text matrix; the specific process is as follows:
(1) The word sequences obtained by segmentation are trained without supervision, using the CBOW algorithm of the word2vec component in the gensim library, to obtain the word vector of each word. The CBOW algorithm predicts the probability p(w | Context(w)) of generating w from the context Context(w) of the word w, and trains the word vectors by maximizing the objective function T:

T = ∑ log p(w | Context(w))

Training the word sequence 'knife switch', 'cannot', 'on site', 'electrically', 'operate' yields a 128-dimensional word vector for each word, as shown in Table 2.

Table 2. Word vectors corresponding to each word

(2) Word embedding is performed on the trained word vectors with an embedding layer to obtain the text matrix.
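The embedding-layer step can be sketched by copying the trained gensim vectors into a Keras Embedding layer. The vocabulary indexing scheme below is an assumption; the patent does not fix one:

```python
import numpy as np
import tensorflow as tf

vocab = {w: i for i, w in enumerate(w2v.wv.index_to_key)}  # word -> row index
emb_matrix = np.array(w2v.wv.vectors)                      # shape (|V|, 128)

embedding = tf.keras.layers.Embedding(
    input_dim=emb_matrix.shape[0],
    output_dim=emb_matrix.shape[1],
    embeddings_initializer=tf.keras.initializers.Constant(emb_matrix),
    trainable=False,  # static word2vec vectors; the attention stage adapts them
)
ids = tf.constant([[vocab[w] for w in words]])  # words from the preprocessing sketch
E = embedding(ids)                              # text matrix E, shape (1, L, 128)
```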
The text matrix is input into the multi-head attention model to obtain a text matrix containing attention, which is fused with the original text matrix; the specific process is as follows:
(1) The word-vector dimension is d_k = 128, the sentence length is L, and the l-th word vector in the sentence is denoted e_l (1 ≤ l ≤ L), giving the L×d_k text matrix E = [e_1 … e_l … e_L];
(2) The text matrix is input into the multi-head attention model, shown in FIG. 2, to obtain the multi-head-attention-optimized text matrix representation Head:

Head = MultiHead(EW^Q, EW^K, EW^V) = Concat(head_1, …, head_h)W^O
head_i = Attention(EW_i^Q, EW_i^K, EW_i^V), with Attention(Q, K, V) = softmax(QK^T / √d_k)V

where Q, K, V are the input matrices, √d_k is the scaling factor, Attention is the scaled dot-product attention operation, MultiHead is the multi-head attention function, Concat is the concatenation function, head_i is the result of the i-th self-attention operation, and W^Q, W^K, W^V, W^O are linear transformation matrices. The text matrix optimized by multi-head attention is shown in Table 3.

Table 3. Attention text matrix

Comparing Table 2 with Table 3 shows that the keyword vectors of the text matrix optimized by the multi-head attention mechanism are enhanced in different dimensions;

E'_1 = Residual_Connect(E, Head)
E_1 = LayerNorm(E'_1)

where E'_1 is the matrix after the residual connection, Residual_Connect is the residual connection operation, and LayerNorm is layer normalization.
The RCNN network model extracts features from the fused text matrix and outputs the final classification result; the specific process is as follows:
(1) The recurrent neural network part of the RCNN adopts a bidirectional GRU network, shown schematically in FIG. 3, consisting of a forward-input GRU and a backward-input GRU. The GRU networks learn, for the current word w_i, its left-context representation c_l(w_i) and right-context representation c_r(w_i); these are concatenated with the attention word vector e(w_i) ∈ E_1 of the current word to form the input x_i of the subsequent convolutional layer:

c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1}))
c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1}))
x_i = [c_l(w_i); e(w_i); c_r(w_i)]

where W^(l), W^(r) are matrices that transform a hidden layer into the next hidden layer, W^(sl), W^(sr) are matrices that combine the semantics of the current word with the left or right context of the adjacent word, and f is a nonlinear activation function;
(2) The convolutional layer uses convolution kernels whose number of columns equals the dimension of x_i and whose number of rows is 1, with tanh as the activation function; convolving the output of the Bi-GRU network gives the convolution result

y_i^(2) = tanh(W^(2) x_i + b)

where b is a bias;
(3) The pooling layer applies Global Average Pooling (GAP) to sample the features of the convolution output, giving the extracted feature vector y^(3) ∈ R^(3m):

y^(3) = (1/L) ∑_{i=1}^{L} y_i^(2)

(4) The feature vector is input to a softmax function, which outputs the final classification result:

P_i = exp(z_i) / ∑_{j=1}^{n} exp(z_j), with z = W^(4) y^(3) + b^(4)

where P_i is the probability that the text belongs to class i, n is the number of classes, and the class with the highest probability is the classification result of the text.
(5) Model training adopts a five-fold cross-validation method and uses the Adam gradient descent method to update the model weight parameters. The prediction result of the tuned model on the example text sequence is shown in Table 4.

Table 4. Classification result

Predicting the example equipment defect through the algorithm, "the knife switch cannot be electrically operated on site" is classified as a general defect, consistent with the grade of the actual defect.
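End to end, the tuned model's prediction for the example text could look like the following sketch, reusing preprocess, vocab, embedding, L, and model from the earlier sketches (the grade names come from the Background section; the zero-padding shortcut is an assumption, since a dedicated padding index would be cleaner):

```python
GRADES = ["normal", "important", "urgent"]   # defect grades from the Background

text = "刀闸在当地无法电动操作"                # back-translated example (assumed wording)
seq = [vocab[w] for w in preprocess(text)]
seq = seq + [0] * (L - len(seq))             # pad to fixed sentence length L
E_new = embedding(tf.constant([seq]))        # (1, L, 128) text matrix
probs = model.predict(E_new)[0]              # softmax probabilities P_i
print(GRADES[int(probs.argmax())])           # highest-probability class
```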
To examine the effect of the classification algorithm of this embodiment, comparative experiments with other models were also designed. The experimental environment is an Intel Core i7-8550U CPU, the experimental framework is TensorFlow, and the evaluation index is the macro-averaged composite index Macro-F1 (MF1). The experimental results are shown in FIG. 4.
FIG. 4 compares the multi-head attention and RCNN classification algorithm (MAT-RCNN) with other classification algorithms. As the figure shows, the model of the invention outperforms the compared classification algorithms, with MF1 reaching 94.51%.
By applying a deep semantic learning algorithm to the classification of power grid defect texts, the method classifies power grid equipment defect texts quickly, realizing rapid grading of equipment defects, improving the maintenance efficiency of power grid equipment, and shortening fault elimination time; it is highly practical. The method not only saves labor cost but also classifies power grid defect texts well.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (7)
1. A power grid equipment defect text classification method based on a multi-head attention mechanism and an RCNN (recurrent convolutional neural network), characterized by comprising the following steps:
step one, preprocessing the power grid defect text by word segmentation and stop-word removal;
step two, embedding word vectors into the segmented text to obtain a text matrix;
step three, inputting the text matrix into a multi-head attention model to obtain a text matrix containing attention, and fusing the attention text matrix with the original text matrix;
step four, using an RCNN network model to extract features from the fused text matrix and output the final classification result;
step five, testing and tuning the multi-head attention model and the RCNN model with power grid primary equipment defect texts.
2. The method as claimed in claim 1, wherein in step one the text preprocessing process is as follows:
(1) obtain in advance a data set file for text classification, containing power equipment defect texts and the corresponding labeled defect grade class labels;
(2) establish a proper-noun lexicon and a stop-word lexicon for the text content, segment the text with a python Chinese word segmentation component, and convert each text into a word sequence.
3. The power grid equipment defect text classification method based on the multi-head attention mechanism and the RCNN network, wherein word vectors are embedded into the segmented text to obtain the text matrix as follows:
(1) train the word sequences obtained by segmentation without supervision, using the CBOW algorithm of the word2vec component in the gensim library, to obtain the word vector of each word;
(2) perform word embedding on the trained word vectors with an embedding layer to obtain the text matrix.
4. The power grid equipment defect text classification method based on the multi-head attention mechanism and the RCNN network as claimed in claim 3, wherein the CBOW algorithm predicts the probability p(w | Context(w)) of generating w from the context Context(w) of the word w, and trains the word vectors by maximizing the objective function T:
T = ∑ log p(w | Context(w)).
5. The power grid equipment defect text classification method based on the multi-head attention mechanism and the RCNN network, wherein the text matrix is input into the multi-head attention model to obtain a text matrix containing attention, and the attention text matrix is fused with the original text matrix as follows:
(1) let the word-vector dimension be d_k and the sentence length be L, and denote the l-th word vector in the sentence as e_l (1 ≤ l ≤ L), giving the L×d_k text matrix E = [e_1 … e_l … e_L];
(2) input the text matrix into the multi-head attention model to obtain the multi-head-attention-optimized text matrix representation Head:

Head = MultiHead(EW^Q, EW^K, EW^V) = Concat(head_1, …, head_h)W^O
head_i = Attention(EW_i^Q, EW_i^K, EW_i^V), with Attention(Q, K, V) = softmax(QK^T / √d_k)V

where Q, K, V are the input matrices, √d_k is the scaling factor, Attention is the scaled dot-product attention operation, MultiHead is the multi-head attention function, Concat is the concatenation function, head_i is the result of the i-th self-attention operation, and W^Q, W^K, W^V, W^O are linear transformation matrices;

E'_1 = Residual_Connect(E, Head)
E_1 = LayerNorm(E'_1)

where E'_1 is the matrix after the residual connection, Residual_Connect is the residual connection operation, and LayerNorm is layer normalization.
6. The power grid equipment defect text classification method based on the multi-head attention mechanism and the RCNN network, wherein the RCNN network model extracts features from the fused text matrix and outputs the final classification result as follows:
(1) the recurrent neural network part of the RCNN adopts a bidirectional GRU network consisting of a forward-input GRU and a backward-input GRU; the GRU networks learn, for the current word w_i, its left-context representation c_l(w_i) and right-context representation c_r(w_i), which are concatenated with the attention word vector e(w_i) ∈ E_1 of the current word to form the input x_i of the subsequent convolutional layer:

c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1}))
c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1}))
x_i = [c_l(w_i); e(w_i); c_r(w_i)]

where W^(l), W^(r) are matrices that transform a hidden layer into the next hidden layer, W^(sl), W^(sr) are matrices that combine the semantics of the current word with the left or right context of the adjacent word, and f is a nonlinear activation function;
(2) the convolutional layer uses convolution kernels whose number of columns equals the dimension of x_i and whose number of rows is 1, with tanh as the activation function; convolving the output of the Bi-GRU network gives the convolution result

y_i^(2) = tanh(W^(2) x_i + b)

where b is a bias;
(3) the pooling layer applies Global Average Pooling (GAP) to sample the features of the convolution output, giving the extracted feature vector y^(3) ∈ R^(3m):

y^(3) = (1/L) ∑_{i=1}^{L} y_i^(2)

(4) the feature vector is input to a softmax function, which outputs the final classification result:

P_i = exp(z_i) / ∑_{j=1}^{n} exp(z_j), with z = W^(4) y^(3) + b^(4)

where P_i is the probability that the text belongs to class i, n is the number of classes, and the class with the highest probability is the classification result of the text.
7. The power grid equipment defect text classification method based on the multi-head attention mechanism and the RCNN network, wherein the method for testing and tuning the classification model with power grid primary equipment defect texts is as follows:
the model is tested on the data set with five-fold cross-validation, the macro-averaged composite index Macro-F1 is adopted as the evaluation index, and model training uses the Adam gradient descent method to update the model weight parameters, realizing test tuning.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010778393.4A | 2020-08-05 | 2020-08-05 | Power grid equipment defect text classification method based on multi-head attention mechanism and RCNN (recurrent convolutional neural network)

Publications (1)

Publication Number | Publication Date
---|---
CN112199496A | 2021-01-08
Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination
- RJ01: Rejection of invention patent application after publication (application publication date: 2021-01-08)