CN111274386A - Work order text classification algorithm based on convolutional neural network and multi-attention mechanism


Info

Publication number
CN111274386A
CN111274386A (application CN201911147815.1A)
Authority
CN
China
Prior art keywords
word
sentence
neural network
level
vector
Prior art date
Legal status
Pending
Application number
CN201911147815.1A
Other languages
Chinese (zh)
Inventor
王晓峰
周艳
范华
尉耀稳
霍凯龙
陈杰
翁利国
施凌震
徐舒妍
姜川
陶燕增
Current Assignee
Hangzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Zhejiang Zhongxin Electric Power Engineering Construction Co Ltd
Original Assignee
Hangzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Zhejiang Zhongxin Electric Power Engineering Construction Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd and Zhejiang Zhongxin Electric Power Engineering Construction Co Ltd
Priority to CN201911147815.1A
Publication of CN111274386A
Legal status: Pending

Classifications

    • G06F 16/35: Information retrieval; clustering or classification of unstructured textual data
    • G06F 18/2415: Pattern recognition; classification techniques based on parametric or probabilistic models, e.g. likelihood ratio
    • G06N 3/045: Computing arrangements based on biological models; neural network architectures; combinations of networks

Abstract

The invention provides a work order text classification algorithm based on a convolutional neural network and a multi-attention mechanism. The algorithm comprises a training set acquisition step, a text word segmentation step, a word vector training step, a sentence splitting step, a word vector conversion step, a sentence-level convolutional neural network step, a sentence-level attention mechanism step, a sentence-level fully-connected step, a document processing step, and a classification step in which the document feature vector obtained in step S9 is linearly transformed and a softmax function then generates the probability of each class in the class set C. The model is designed in two parts, a sentence level and a document level: sentence features are first extracted at the sentence level, then document features are extracted at the document level for classification. This structure ensures that the entire text can be fed into the model while avoiding the computational waste caused by an oversized model.

Description

Work order text classification algorithm based on convolutional neural network and multi-attention mechanism
Technical Field
The invention relates to the field of text classification in computer natural language processing, and in particular to a work order text classification algorithm based on a convolutional neural network and a multi-attention mechanism.
Background
For a power supply company, user complaints represent both an opportunity and a huge challenge. The sheer volume of complaint work orders forces dispatch staff to read every work order while trying to route each one to the proper handling department, so efficiency is low and accuracy cannot be guaranteed. Handling user complaints promptly and efficiently improves the company's image, builds word of mouth, allows the business direction to be adjusted in time, raises service quality, and greatly strengthens competitiveness. Conversely, if a user's complaint is not resolved in time, or is routed to the wrong handler, the company's image suffers, complaint work orders pile up, word of mouth declines, and in the worst case large-scale complaints and negative publicity follow. How to classify power complaint texts quickly and accurately is therefore a major challenge facing every power supply company today.
With the rise of artificial neural network methods in recent years, various neural-network-based algorithms have been applied to text classification in natural language processing and have outperformed traditional classification methods. The most common approach classifies text with a convolutional neural network, which has strong sentence-modeling capability because its convolution windows capture local features at different positions in the text, and which captures the semantics of text better than other artificial neural network models. However, this approach has drawbacks. First, because the convolution window is of limited size, a convolutional neural network cannot capture dependencies between distant words, so information is lost when extracting text features. Second, a convolutional neural network by default treats all extracted text features as equally important, so the features that matter for classification cannot play a role commensurate with their importance, while irrelevant noise features interfere with the classification result. Third, the input length of a convolutional neural network is fixed, so part of a longer text is discarded. Such methods therefore have serious shortcomings for classifying power complaint texts.
Disclosure of Invention
The invention aims to provide a work order text classification algorithm based on a convolutional neural network and a multi-attention mechanism, so as to solve the many problems of manually classifying power complaint work order texts. The method is divided into a sentence level and a document level: the features of each sentence of the text are extracted first, and the document features are then extracted. The specific steps are as follows:
S1, a training set acquisition step, which comprises obtaining in advance a training set file for text classification, the training set file comprising power complaint work order texts and the corresponding labeled complaint category labels;
S2, a text word segmentation step, which comprises segmenting each text obtained in step S1 with a Python Chinese word segmentation component, converting each text into a word sequence;
S3, a word vector training step, which comprises performing unsupervised training on the word sequences obtained in step S2 with the skip-gram algorithm of the word2vec component in the gensim library, obtaining the word vector corresponding to each word;
S4, a sentence splitting step, which comprises splitting the word sequence obtained in step S2 into sentences, each of which is a word sequence of its own;
S5, a word vector conversion step, which comprises converting each word in the word sequences obtained in S4 into the corresponding word vector trained in S3;
S6, a sentence-level convolutional neural network step, which comprises using the 2-dimensional matrices obtained in step S5 as the first layer of the sentence-level convolutional neural network;
S7, a sentence-level attention mechanism step, which comprises assigning, through the attention mechanism formulas, a different attention weight to each word feature vector in the final output of the convolutional neural network obtained in step S6;
S8, a sentence-level fully-connected step, which comprises linearly transforming the output vector S of the S7 sentence-level attention step through a fully-connected neural network to obtain the sentence feature vectors;
S9, a document processing step, which comprises concatenating the sentence feature vectors obtained in step S8 into one vector as the input of the document-level part;
S10, a classification step, which comprises linearly transforming the document feature vector obtained in step S9 and then generating the probability of each class in the class set C with a softmax function.
Optionally, performing unsupervised training on the word sequences obtained in step S2 with the skip-gram algorithm of the word2vec component in the gensim library to obtain the word vector corresponding to each word comprises:
performing unsupervised training on the word sequences obtained in step S2 with the skip-gram algorithm of the word2vec component of the gensim library in Python to obtain a word vector of dimension d for each word, wherein the principle of the skip-gram algorithm is to train the word vectors of the words w_1, w_2, …, w_N by maximizing the average log probability of Formula 1, where N is the number of distinct words in the training set,

$$\frac{1}{N}\sum_{t=1}^{N}\sum_{-c\le j\le c,\ j\ne 0}\log p(w_{t+j}\mid w_t) \qquad \text{(Formula 1)}$$

where c is the context range of the word w_t, covering the c words before and the c words after w_t in the word sequence, and t ∈ (1, N);

$$p(w_{t+j}\mid w_t)=\frac{\exp\!\left(e(w_{t+j})^{\top}e(w_t)\right)}{\sum_{w=1}^{N}\exp\!\left(e(w)^{\top}e(w_t)\right)} \qquad \text{(Formula 2)}$$

where e(w_t) is the word vector corresponding to w_t; the word vectors of all words are initialized randomly, and the vector parameters are then updated iteratively by gradient descent until the objective (Formula 1) converges.
Optionally, using each 2-dimensional matrix obtained in step S5 as the first layer of the sentence-level convolutional neural network comprises:
using a convolution window matrix W ∈ R^{h×d}, where h is the size of the convolution window and d is the dimension of the word vectors, to extract text features from X_{layer 1} by Formula 3, the result X_{layer 2} again being m 2-dimensional matrices of shape n × d, and so on up to layer L, whose output X_{layer L} serves as the final output of the convolutional neural network,

$$X_{layer\,l+1}=(X_{layer\,l}*W+b)\otimes\sigma(X_{layer\,l}*V+c) \qquad \text{(Formula 3)}$$

where W ∈ R^{h×d} is the convolution window matrix, b ∈ R^d and c ∈ R^d are biases, σ is a nonlinear activation function, and V ∈ R^{h×d} is the gate unit matrix.
Optionally, assigning through the attention mechanism formulas a different attention weight to each word feature vector in the final output of the convolutional neural network obtained in step S6 comprises:
assigning, by Formulas 4 and 5, a different attention weight to each word feature vector in the output X_{layer L} = [x_{L1}, x_{L2}, …, x_{Ln}] of the convolutional neural network of step S6, where x_{Li} is the feature vector of the i-th word,

$$\alpha_i=\frac{\exp\!\left(a^{\top}x_{Li}\right)}{\sum_{j=1}^{n}\exp\!\left(a^{\top}x_{Lj}\right)} \qquad \text{(Formula 4)}$$

$$s=\sum_{i=1}^{n}\alpha_i\,x_{Li} \qquad \text{(Formula 5)}$$

where a is the attention vector, α_i is the attention weight assigned to the word feature vector x_{Li}, n is the manually set sentence length, and s is the sentence feature vector obtained as the attention-weighted sum of the word feature vectors.
Compared with the prior art, the invention has the following beneficial effects:
The method not only effectively solves the classification of power complaint work order texts, but also overcomes the defects of prior-art convolutional neural networks. First, to break through the limitation of the convolution window size and capture dependencies between distant words, the depth of the network is increased and a convolution operation is performed at every layer; as the number of layers grows, the features output by each layer capture more words than the previous layer. Second, so that the important classification-relevant features extracted by the convolutional neural network carry a higher weight during classification while irrelevant noise features are selectively ignored, a multi-attention mechanism is introduced that assigns more attention, i.e. a higher weight, to classification-relevant features. Third, the number of neurons at the input of a convolutional neural network is fixed; guaranteeing that a long text fits entirely would require so many input neurons that the time complexity of the model would rise sharply. To solve this, the model is designed in two parts, a sentence level and a document level: sentence features are first extracted at the sentence level, then document features are extracted at the document level for classification. This structure ensures that the entire text can be fed into the model while avoiding the computational waste caused by an oversized model.
Drawings
FIG. 1 is an overall flow chart of the method of the present invention.
Detailed Description
The preferred embodiments of the present invention are described below with reference to the accompanying drawings:
the work order text classification algorithm based on the convolutional neural network and the multi-attention machine system, as shown in fig. 1, includes the following steps:
S1, a training set acquisition step, which comprises obtaining in advance a training set file for text classification, the training set file comprising power complaint work order texts and the corresponding labeled complaint category labels;
S2, a text word segmentation step, which comprises segmenting each text obtained in step S1 with a Python Chinese word segmentation component, converting each text into a word sequence;
S3, a word vector training step, which comprises performing unsupervised training on the word sequences obtained in step S2 with the skip-gram algorithm of the word2vec component in the gensim library, obtaining the word vector corresponding to each word;
S4, a sentence splitting step, which comprises splitting the word sequence obtained in step S2 into sentences, each of which is a word sequence of its own;
S5, a word vector conversion step, which comprises converting each word in the word sequences obtained in S4 into the corresponding word vector trained in S3;
S6, a sentence-level convolutional neural network step, which comprises using the 2-dimensional matrices obtained in step S5 as the first layer of the sentence-level convolutional neural network;
S7, a sentence-level attention mechanism step, which comprises assigning, through the attention mechanism formulas, a different attention weight to each word feature vector in the final output of the convolutional neural network obtained in step S6;
S8, a sentence-level fully-connected step, which comprises linearly transforming the output vector S of the S7 sentence-level attention step through a fully-connected neural network to obtain the sentence feature vectors;
S9, a document processing step, which comprises concatenating the sentence feature vectors obtained in step S8 into one vector as the input of the document-level part;
S10, a classification step, which comprises linearly transforming the document feature vector obtained in step S9 and then generating the probability of each class in the class set C with a softmax function.
In implementation, word vector training step S3 is specifically: perform unsupervised training on the word sequences obtained in step S2 with the skip-gram algorithm of the word2vec component of the gensim library in Python, obtaining a word vector of dimension d for each word. The principle of the skip-gram algorithm is to train the word vectors of the words w_1, w_2, …, w_N by maximizing the average log probability (Formula 1), where N is the number of distinct words in the training set.

$$\frac{1}{N}\sum_{t=1}^{N}\sum_{-c\le j\le c,\ j\ne 0}\log p(w_{t+j}\mid w_t) \qquad \text{(Formula 1)}$$

where c is the context range of the word w_t, covering the c words before and the c words after w_t in the word sequence, and t ∈ (1, N).

$$p(w_{t+j}\mid w_t)=\frac{\exp\!\left(e(w_{t+j})^{\top}e(w_t)\right)}{\sum_{w=1}^{N}\exp\!\left(e(w)^{\top}e(w_t)\right)} \qquad \text{(Formula 2)}$$

where e(w_t) is the word vector corresponding to w_t.
The word vectors of all words are initialized randomly, and the vector parameters are then updated iteratively by gradient descent until the objective (Formula 1) converges.
S4, sentence splitting step: the word sequence obtained through step S2 is split into several sentences at sentence-ending punctuation (e.g. a period). If the word sequence contains m sentences, it is split into m parts;
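A minimal sketch of this splitting rule, assuming the segmenter keeps punctuation marks as their own tokens (the set of end marks below is illustrative):

```python
SENTENCE_END = {"。", "！", "？", ".", "!", "?"}   # illustrative end-of-sentence marks

def split_sentences(word_sequence):
    """Split one segmented text into m sentence word sequences (step S4)."""
    sentences, current = [], []
    for word in word_sequence:
        current.append(word)
        if word in SENTENCE_END:       # sentence boundary reached
            sentences.append(current)
            current = []
    if current:                        # words after the last end mark
        sentences.append(current)
    return sentences
```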
S5, word vector conversion step: each word w_t in the m word sequences obtained in step S4 is converted into the corresponding word vector e(w_t) trained in S3. This finally yields m 2-dimensional matrices of shape n × d for training the subsequent neural network, where n is the manually set sentence length: if a sentence is longer than n, the excess is truncated, and if it is shorter than n, it is padded with a placeholder symbol;
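A sketch of the truncate-or-pad logic under stated assumptions: wv is a mapping from word to vector (e.g. model.wv from the earlier sketch), and using a zero vector for the pad placeholder and unknown words is this sketch's choice, not specified in the patent:

```python
import numpy as np

PAD = "pad"

def sentence_to_matrix(sentence, wv, n=12, d=128):
    """Step S5: unify a sentence to length n, then look up each word vector,
    producing one n x d matrix for the sentence-level CNN."""
    words = sentence[:n]                      # truncate anything beyond n
    words = words + [PAD] * (n - len(words))  # pad shorter sentences with `pad`
    rows = [wv[w] if w != PAD and w in wv else np.zeros(d) for w in words]
    return np.stack(rows)                     # shape (n, d)
```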
S6, sentence-level convolutional neural network step: the m 2-dimensional matrices of shape n × d obtained in step S5 are used as the first layer X_{layer 1} of the sentence-level convolutional neural network. A convolution window matrix W ∈ R^{h×d}, where h is the size of the convolution window and d is the dimension of the word vectors, extracts text features from X_{layer 1} by Formula 3; the result X_{layer 2} is again m 2-dimensional matrices of shape n × d. This repeats up to layer L, and the output X_{layer L} serves as the final output of the convolutional neural network. The range of text the convolution window can capture increases with the number of layers.

$$X_{layer\,l+1}=(X_{layer\,l}*W+b)\otimes\sigma(X_{layer\,l}*V+c) \qquad \text{(Formula 3)}$$

where W ∈ R^{h×d} is the convolution window matrix, b ∈ R^d and c ∈ R^d are biases, σ is a nonlinear activation function, and V ∈ R^{h×d} is the gate unit matrix; the gate unit decides whether a text feature is passed to the next layer, thereby preserving the text features relevant to classification and screening out irrelevant noise features.
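For illustration, a numpy sketch of one layer of Formula 3. The R^{h×d} shapes of W and V admit several readings; this sketch uses a depthwise one (each output dimension is a weighted sum over the h-word window in that dimension), with zero padding so the output stays n × d and layers can be stacked up to layer L:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv_layer(X, W, b, V, c):
    """One gated convolutional layer (Formula 3).
    X: (n, d) feature matrix; W, V: (h, d) window matrices; b, c: (d,) biases."""
    n, d = X.shape
    h = W.shape[0]
    left = h // 2
    Xp = np.vstack([np.zeros((left, d)), X, np.zeros((h - 1 - left, d))])
    out = np.empty_like(X)
    for i in range(n):
        window = Xp[i:i + h]                          # h consecutive word features
        feat = (window * W).sum(axis=0) + b           # candidate text features
        gate = sigmoid((window * V).sum(axis=0) + c)  # gate unit in (0, 1)
        out[i] = feat * gate                          # gate decides what passes on
    return out
```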
S7, sentence-level attention mechanism step: through the attention mechanism Formulas 4 and 5, each word feature vector in the output X_{layer L} = [x_{L1}, x_{L2}, …, x_{Ln}] of the convolutional neural network of step S6 is assigned a different attention weight, where x_{Li} is the i-th word feature vector.

$$\alpha_i=\frac{\exp\!\left(a^{\top}x_{Li}\right)}{\sum_{j=1}^{n}\exp\!\left(a^{\top}x_{Lj}\right)} \qquad \text{(Formula 4)}$$

$$s=\sum_{i=1}^{n}\alpha_i\,x_{Li} \qquad \text{(Formula 5)}$$

where a is the attention vector, α_i is the attention weight assigned to the word feature vector x_{Li}, n is the manually set sentence length, and s is the sentence feature vector obtained as the attention-weighted sum of the word feature vectors. Because a multi-attention mechanism is used, there are several attention vectors, each examining the relevance of the word feature vectors to classification from a different angle, and each attention vector produces a corresponding sentence feature vector. Finally, all sentence feature vectors are concatenated into one vector S as the output of the sentence-level attention step.
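For illustration, a numpy sketch of Formulas 4 and 5 with k attention heads; in the patent the attention vectors are learned parameters, so the matrix A below is a placeholder:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())              # shift for numerical stability
    return e / e.sum()

def multi_attention(X_L, A):
    """X_L: (n, d) final CNN output [x_L1, ..., x_Ln];
    A: (k, d) matrix whose k rows each play the role of one attention vector a.
    Returns S, the concatenation of the k sentence feature vectors."""
    heads = []
    for a in A:
        alpha = softmax(X_L @ a)         # Formula 4: one weight per word
        heads.append(alpha @ X_L)        # Formula 5: weighted sum, shape (d,)
    return np.concatenate(heads)         # output S of the attention step, (k*d,)
```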
S8, sentence-level fully-connected step: the output vector S of the S7 sentence-level attention step is linearly transformed through a fully-connected neural network (Formula 6) to obtain m sentence feature vectors of dimension d.

$$y_s = S\,W_s + b_s \qquad \text{(Formula 6)}$$

where W_s is a linear transformation matrix, b_s is a bias, and y_s is the d-dimensional sentence feature vector.
S9, document-level part: the m sentence feature vectors of dimension d obtained in step S8 are concatenated into one vector as the input of the document-level part. The document-level part operates exactly like the sentence-level part, comprising a document-level convolutional neural network step, a document-level attention mechanism step and a document-level fully-connected step, and finally outputs the document feature vector, which is not described in detail here.
S10, classification step: the document feature vector D obtained in step S9 is linearly transformed by Formula 7, and the softmax function of Formula 8 then generates the probability of each class in the class set C.

$$y_c = D\,W_c + b_c \qquad \text{(Formula 7)}$$

$$P_c=\frac{\exp(y_c)}{\sum_{c'\in C}\exp(y_{c'})} \qquad \text{(Formula 8)}$$

where W_c and b_c are learned parameters and P_c is the probability that the text belongs to class c; the class with the highest probability is the classification result of the text. In the model training stage, the Adam gradient descent method is used to update the model weight parameters. The data of this example are the power complaint work order texts accepted over the whole of 2018 by the 95598 hotline of a power supply bureau; their characteristics are shown in the table below.
Table 1. Data set characteristics

Name                                  Classes   Training set size   Test set size
2018 power supply bureau complaints   5         1200                400
S2, text word segmentation step: the text "The customer reports that multiple households have no electricity" is converted into the word sequence: customer / reports / multiple households / no electricity.
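The patent does not name the Python Chinese word segmentation component; jieba is a common choice and is assumed in this sketch, and the Chinese sentence is a hypothetical reconstruction of the example:

```python
import jieba

text = "客户反映多户没电"          # "The customer reports multiple households have no electricity"
words = list(jieba.cut(text))     # step S2: text -> word sequence
print(words)                      # e.g. ['客户', '反映', '多户', '没电']
```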
S3, word vector training step: the word sequence customer / reports / multiple households / no electricity is trained into the corresponding d-dimensional word vectors, as shown in Table 2. In the example d is 128.

[Table 2 (image in the original): the word vectors corresponding to the words]
S4, sentence splitting step: the text "In the last 2 months there have been 3 power outages. The customer reports that multiple households have no electricity. The cause of the outages needs to be verified." is split into three sentences: 1) In the last 2 months there have been 3 power outages. 2) The customer reports that multiple households have no electricity. 3) The cause of the outages needs to be verified.
S5, word vector conversion step: since the 3 word sequences obtained in step S4 have different lengths, they are unified to length n, padding with the placeholder pad where a sequence is shorter than n. In the example n is 12; the result is shown in Table 3.

1) last | 2 | months | within | occurred | 3 | times | power outage | pad | pad | pad | pad
2) customer | reports | multiple households | no electricity | pad | pad | pad | pad | pad | pad | pad | pad
3) needs | verification | power outage | cause | pad | pad | pad | pad | pad | pad | pad | pad

Table 3. Unifying the sentence length
All words are then converted into the corresponding word vectors trained in S3, finally yielding 3 two-dimensional matrices of shape 12 × 128 for the subsequent training of the neural network.
S6, sentence-level convolutional neural network step: the 3 two-dimensional matrices of shape 12 × 128 obtained through step S5 are fed into an L-layer sentence-level convolutional neural network. A convolution window W ∈ R^{h×d} extracts the text features, where h is the size of the convolution window and d is the dimension of the word vectors; in the example h is 3 and L is 5. Thus text features with a dependency distance of 1 can be captured at the 2nd layer of the convolutional neural network, and text features with a dependency distance of 3 at the 3rd layer; by analogy, at the last layer, i.e. the 5th layer, text features with a dependency distance of 7 can be captured.
S7, sentence-level attention mechanism step: different words in a text differ in importance, and the attention mechanism can assign a different attention weight to each word feature vector. For example, in the word sequence customer / reports / multiple households / no electricity, the weight of "no electricity" will be higher; the specific weight assignment is shown in Table 4:

Word     customer   reports   multiple households   no electricity
Weight   0.03       0.01      0.14                  0.82

Table 4. Attention weights assigned to the words
Because a multi-attention mechanism is used, there are several attention vectors, each examining the relevance of the word feature vectors to classification from a different angle, and each attention vector produces a corresponding sentence feature vector. Finally, all sentence feature vectors are concatenated into one vector as the output of the sentence-level attention step.
S8, sentence-level fully-connected step: the output of the S7 sentence-level attention step is linearly transformed through a fully-connected neural network to obtain 3 sentence feature vectors of dimension 128.
S9, document-level part: the 3 sentence feature vectors of dimension 128 obtained in step S8 are concatenated into one vector as the input of the document-level part. The document-level part operates exactly like the sentence-level part, comprising a document-level convolutional neural network step, a document-level attention mechanism step and a document-level fully-connected step, and finally outputs the document feature vector, which is not described in detail here.
S10, classification step: the document feature vector obtained in step S9 is linearly transformed, and the softmax function then generates the probability of each class in the class set C; the class with the highest probability is the classification result of the text, as shown in Table 5.

[Table 5 (image in the original): the classification results, i.e. the probability of each class]
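A minimal numpy sketch of this classification step, i.e. Formula 7 followed by the softmax of Formula 8; the class names and weights below are placeholders, not trained values from the patent:

```python
import numpy as np

def classify(D, Wc, bc, classes):
    """Step S10: linear transform (Formula 7), then softmax (Formula 8)."""
    y = D @ Wc + bc                      # Formula 7: one score per class in C
    e = np.exp(y - y.max())              # shift scores for numerical stability
    P = e / e.sum()                      # Formula 8: probability of each class
    return classes[int(P.argmax())], P   # highest-probability class is the result

# Hypothetical usage: a 128-d document vector and 5 placeholder class names.
rng = np.random.default_rng(0)
D, Wc, bc = rng.normal(size=128), rng.normal(size=(128, 5)), np.zeros(5)
label, probs = classify(D, Wc, bc, ["billing", "outage", "service", "meter", "other"])
```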
To examine the effect of the classification algorithm of this embodiment, the following comparative experiment was also designed. The hardware configuration of the experimental environment was 4 GB of RAM and an Nvidia GeForce GTX 970M with 3 GB of video memory, and the experimental framework was TensorFlow 1.1.0.
The classification algorithm of this embodiment achieves better results than the other algorithms on the power supply bureau's 2018 95598 power complaint data set.
The above description is only an embodiment of the present invention and is not intended to limit the scope of the invention; all equivalent structural or process modifications made using the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of patent protection of the present invention.

Claims (4)

1. A work order text classification algorithm based on a convolutional neural network and a multi-attention mechanism, characterized by comprising the following steps:
S1, a training set acquisition step, which comprises obtaining in advance a training set file for text classification, the training set file comprising power complaint work order texts and the corresponding labeled complaint category labels;
S2, a text word segmentation step, which comprises segmenting each text obtained in step S1 with a Python Chinese word segmentation component, converting each text into a word sequence;
S3, a word vector training step, which comprises performing unsupervised training on the word sequences obtained in step S2 with the skip-gram algorithm of the word2vec component in the gensim library, obtaining the word vector corresponding to each word;
S4, a sentence splitting step, which comprises splitting the word sequence obtained in step S2 into sentences, each of which is a word sequence of its own;
S5, a word vector conversion step, which comprises converting each word in the word sequences obtained in S4 into the corresponding word vector trained in S3;
S6, a sentence-level convolutional neural network step, which comprises using the 2-dimensional matrices obtained in step S5 as the first layer of the sentence-level convolutional neural network;
S7, a sentence-level attention mechanism step, which comprises assigning, through the attention mechanism formulas, a different attention weight to each word feature vector in the final output of the convolutional neural network obtained in step S6;
S8, a sentence-level fully-connected step, which comprises linearly transforming the output vector S of the S7 sentence-level attention step through a fully-connected neural network to obtain the sentence feature vectors;
S9, a document processing step, which comprises concatenating the sentence feature vectors obtained in step S8 into one vector as the input of the document-level part;
S10, a classification step, which comprises linearly transforming the document feature vector obtained in step S9 and then generating the probability of each class in the class set C with a softmax function.
2. The work order text classification algorithm based on a convolutional neural network and a multi-attention mechanism according to claim 1, wherein performing unsupervised training on the word sequences obtained in step S2 with the skip-gram algorithm of the word2vec component in the gensim library to obtain the word vector corresponding to each word comprises:
performing unsupervised training on the word sequences obtained in step S2 with the skip-gram algorithm of the word2vec component of the gensim library in Python to obtain a word vector of dimension d for each word, wherein the principle of the skip-gram algorithm is to train the word vectors of the words w_1, w_2, …, w_N by maximizing the average log probability of Formula 1, where N is the number of distinct words in the training set,

$$\frac{1}{N}\sum_{t=1}^{N}\sum_{-c\le j\le c,\ j\ne 0}\log p(w_{t+j}\mid w_t) \qquad \text{(Formula 1)}$$

where c is the context range of the word w_t, covering the c words before and the c words after w_t in the word sequence, and t ∈ (1, N);

$$p(w_{t+j}\mid w_t)=\frac{\exp\!\left(e(w_{t+j})^{\top}e(w_t)\right)}{\sum_{w=1}^{N}\exp\!\left(e(w)^{\top}e(w_t)\right)} \qquad \text{(Formula 2)}$$

where e(w_t) is the word vector corresponding to w_t, the word vectors of all words being initialized randomly and the vector parameters then updated iteratively by gradient descent until the objective (Formula 1) converges.
3. The work order text classification algorithm based on a convolutional neural network and a multi-attention mechanism according to claim 1, wherein using each 2-dimensional matrix obtained in step S5 as the first layer of the sentence-level convolutional neural network comprises:
using a convolution window matrix W ∈ R^{h×d}, where h is the size of the convolution window and d is the dimension of the word vectors, to extract text features from X_{layer 1} by Formula 3, the result X_{layer 2} again being m 2-dimensional matrices of shape n × d, and so on up to layer L, whose output X_{layer L} serves as the final output of the convolutional neural network,

$$X_{layer\,l+1}=(X_{layer\,l}*W+b)\otimes\sigma(X_{layer\,l}*V+c) \qquad \text{(Formula 3)}$$

where W ∈ R^{h×d} is the convolution window matrix, b ∈ R^d and c ∈ R^d are biases, σ is a nonlinear activation function, and V ∈ R^{h×d} is the gate unit matrix.
4. The work order text classification algorithm based on a convolutional neural network and a multi-attention mechanism according to claim 1, wherein assigning through the attention mechanism formulas a different attention weight to each word feature vector in the final output of the convolutional neural network obtained in step S6 comprises:
assigning, by Formulas 4 and 5, a different attention weight to each word feature vector in the output X_{layer L} = [x_{L1}, x_{L2}, …, x_{Ln}] of the convolutional neural network of step S6, where x_{Li} is the feature vector of the i-th word,

$$\alpha_i=\frac{\exp\!\left(a^{\top}x_{Li}\right)}{\sum_{j=1}^{n}\exp\!\left(a^{\top}x_{Lj}\right)} \qquad \text{(Formula 4)}$$

$$s=\sum_{i=1}^{n}\alpha_i\,x_{Li} \qquad \text{(Formula 5)}$$

where a is the attention vector, α_i is the attention weight assigned to the word feature vector x_{Li}, n is the manually set sentence length, and s is the sentence feature vector obtained as the attention-weighted sum of the word feature vectors.
CN201911147815.1A 2019-11-21 2019-11-21 Work order text classification algorithm based on convolutional neural network and multi-attention mechanism Pending CN111274386A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911147815.1A CN111274386A (en) 2019-11-21 2019-11-21 Work order text classification algorithm based on convolutional neural network and multi-attention mechanism

Publications (1)

Publication Number Publication Date
CN111274386A true CN111274386A (en) 2020-06-12

Family

ID=71002926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911147815.1A Pending CN111274386A (en) 2019-11-21 2019-11-21 Work order text classification algorithm based on convolutional neural network and multi-attention mechanism

Country Status (1)

Country Link
CN (1) CN111274386A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107895051A (en) * 2017-12-08 2018-04-10 宏谷信息科技(珠海)有限公司 A kind of stock news quantization method and system based on artificial intelligence
US20190188277A1 (en) * 2017-12-18 2019-06-20 Fortia Financial Solutions Method and device for processing an electronic document
CN108388554A (en) * 2018-01-04 2018-08-10 中国科学院自动化研究所 Text emotion identifying system based on collaborative filtering attention mechanism
CN108681537A (en) * 2018-05-08 2018-10-19 中国人民解放军国防科技大学 Chinese entity linking method based on neural network and word vector
CN109558487A (en) * 2018-11-06 2019-04-02 华南师范大学 Document Classification Method based on the more attention networks of hierarchy
CN109902174A (en) * 2019-02-18 2019-06-18 山东科技大学 A kind of feeling polarities detection method of the memory network relied on based on aspect

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Haizhou Du et al.: "Hierarchical Gated Convolutional Networks with Multi-Head Attention for Text Classification" *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114332872A (en) * 2022-03-14 2022-04-12 四川国路安数据技术有限公司 Contract document fault-tolerant information extraction method based on graph attention network
CN114332872B (en) * 2022-03-14 2022-05-24 四川国路安数据技术有限公司 Contract document fault-tolerant information extraction method based on graph attention network
CN115731243A (en) * 2022-11-29 2023-03-03 北京长木谷医疗科技有限公司 Spine image segmentation method and device based on artificial intelligence and attention mechanism
CN115731243B (en) * 2022-11-29 2024-02-09 北京长木谷医疗科技股份有限公司 Spine image segmentation method and device based on artificial intelligence and attention mechanism

Legal Events

Code   Title
PB01   Publication
SE01   Entry into force of request for substantive examination
RJ01   Rejection of invention patent application after publication (application publication date: 20200612)