CN114722206A - Extremely short text classification method based on keyword screening and attention mechanism - Google Patents

Extremely short text classification method based on keyword screening and attention mechanism

Info

Publication number
CN114722206A
Authority
CN
China
Prior art keywords
word
short text
concept
representation
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210419204.3A
Other languages
Chinese (zh)
Inventor
朱毅
周鑫柯
李云
强继朋
袁运浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangzhou University
Original Assignee
Yangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangzhou University filed Critical Yangzhou University
Priority to CN202210419204.3A priority Critical patent/CN114722206A/en
Publication of CN114722206A publication Critical patent/CN114722206A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an extremely short text classification method based on keyword screening and an attention mechanism, which comprises the following steps: (1) designing and implementing a keyword screening algorithm, and introducing additional knowledge through a knowledge graph to optimize the feature representation of the extremely short text; (2) obtaining the representation of the extremely short text through an attention-based bidirectional long short-term memory model (Attention-based BiLSTM); (3) constructing two attention mechanisms over the additional knowledge to learn the more important and relevant knowledge; (4) finally, combining the extremely short text representation with the additional knowledge and classifying the extremely short text data set with a softmax classifier to obtain the classification result. The invention improves representation learning and feature extraction, improves the accuracy of data set classification, and has high robustness and practicability.

Description

Extremely short text classification method based on keyword screening and attention mechanism
Technical Field
The invention relates to the fields of data mining and natural language processing, and in particular to an extremely short text classification method based on keyword screening and an attention mechanism.
Background
Through an attention mechanism, a model can focus on the most important words in a sentence and thereby capture its most important semantic information.
With the rapid development of web services, more and more short texts are generated on platforms such as Twitter and microblogs and are used in many fields. In recent years there has been a strong demand for processing short texts, which has attracted extensive attention and research. Most existing short text classification methods can be roughly divided, according to how features are learned, into methods based on the text's own resources and methods based on external resources. Self-resource based methods expand the feature space with rules or statistics hidden in the current short text; external-resource based methods expand the feature space with additional external knowledge.
Although both types of short text classification methods perform well on short texts, they cannot achieve the expected effect on extremely short texts. This is mainly because an extremely short text is shorter than a conventional short text, and in extremely short text classification even one or two keywords can determine the final classification result. Moreover, because of its length, an extremely short text often contains few features and does not provide sufficient word co-occurrence. Finally, words in extremely short texts are highly ambiguous, which can affect the classification result.
Short text classification methods based on external resources do not consider that one or two keywords can determine the final result in extremely short text classification; they usually introduce the concepts of every word in the short text to enrich its features, which brings in a large number of irrelevant or useless concepts and thus a large amount of noise that affects the final classification result. However, introducing the concepts of only one keyword in an extremely short text also causes problems: a single word can map to many concepts with no relevance among them, and introducing all of them likewise affects the final classification result.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an extremely short text classification method based on keyword screening and an attention mechanism, which finds the keywords that determine classification in an extremely short text through keyword screening and selects the useful concepts through an attention mechanism, so as to optimize the feature representation vector and improve the classification accuracy on extremely short text data sets.
The purpose of the invention is realized as follows: a method for classifying extremely short texts based on keyword screening and attention mechanism comprises the following steps:
1) designing and implementing a keyword screening algorithm, and introducing additional knowledge through a knowledge graph to optimize the feature representation of the extremely short text;
2) obtaining the representation of the extremely short text through a bidirectional long short-term memory model with an attention mechanism;
3) constructing two attention mechanisms for extra knowledge to learn more important and relevant knowledge;
4) finally, the extremely short text representation is combined with additional knowledge, a softmax classifier is used for classifying the extremely short text data set, and a classification result is obtained.
Further, the step 1) specifically includes:
1.1) selecting keywords from the input with the Rake keyword extraction algorithm: the extremely short text is split into several phrases by separators, and these phrases are taken as candidates for the finally extracted keyword; each phrase is split into words by spaces, each word is given a score, the score of each phrase is obtained by accumulating its word scores, and the phrase with the highest score is finally selected as the keyword; the word score formula is:
wordScore = wordDegree(w) / wordFrequency(w)
where wordDegree(w) is the degree of word w (incremented by 1 for every word co-occurring with w inside a phrase) and wordFrequency(w) is the total number of occurrences of w;
1.2) introducing related concepts of the keyword using the knowledge graph: the concepts of the keyword are obtained from the knowledge graph base as additional knowledge by querying the API of the knowledge graph base, and the retrieved concepts are combined into a concept set.
Further, the step 2) specifically includes:
2.1) word- and character-level embedding: the input extremely short texts are represented as {(x_1,y_1),(x_2,y_2),…,(x_n,y_n)}, where n is the number of texts, y_i ∈ {1,2,…,c}, and c is the number of labels; feature representation learning adopts both word-level and character-level embedding: a convolutional neural network is used to obtain the character embedding of each word, word embedding is obtained by word2vec, the dimensions of the word vector and the character vector are both d/2, and finally the word vector and the character vector are concatenated to obtain a d-dimensional word representation;
2.2) representation of the extremely short text: the word representations obtained in step 2.1) are regarded as a sequence of d-dimensional word vectors (x_1, x_2, …, x_n), where n is the length of the extremely short text; the word vector sequence is fed into the Attention-based BiLSTM to obtain the corresponding representation; the BiLSTM contains a forward network and a backward network for processing the extremely short text, as shown in equations (1) and (2):
h_t^f = LSTM_f(x_t)  (1)
h_t^b = LSTM_b(x_t)  (2)
each forward state h_t^f and backward state h_t^b are then concatenated into the hidden state h_t; all the h_t are collected into H ∈ R^{n×2u}, as shown in equation (3):
H = (h_1, h_2, …, h_n)  (3)
where u is the number of hidden units in each direction of the BiLSTM and n is the number of word vectors; the attention weight is then calculated by equation (4):
α_i = softmax(w_1^T f(W_1 h_i + b_1))  (4)
where α_i is the attention weight of the i-th word, f is the activation function of the network, softmax normalizes the weight of each word, W_1 is a weight matrix, w_1 ∈ R^{d_a} is a weight vector, d_a is a hyperparameter, b_1 is a bias vector, and h_i is the hidden state of the i-th word;
finally, the weighted sum of the h_i gives the representation z_s of the extremely short text, as shown in equation (5):
z_s = Σ_{i=1}^{n} α_i h_i  (5)
further, the step 3) specifically includes:
3.1) constructing a first concept attention mechanism: the concept set obtained in step 1.2) is embedded at the word and character level to obtain d-dimensional concept vectors (c_1, c_2, …, c_m), where m is the number of concepts; the first concept attention mechanism calculates the semantic similarity between the i-th concept and the extremely short text representation z_s, as shown in equation (6):
β_i = softmax(w_2^T f(W_2[c_i; z_s] + b_2))  (6)
where β_i is the attention weight of the i-th concept with respect to the extremely short text, f is the activation function of the network, W_2 is a weight matrix, w_2 ∈ R^{d_b} is a weight vector, d_b is a hyperparameter, and b_2 is a bias vector;
3.2) constructing a second concept attention mechanism: the second concept attention mechanism calculates the importance of each concept to the whole concept set, as shown in equation (7):
δ_i = softmax(w_3^T f(W_3 c_i + b_3))  (7)
where δ_i is the attention weight of the i-th concept with respect to the concept set, f is the activation function of the network, W_3 is a weight matrix, w_3 ∈ R^{d_c} is a weight vector, d_c is a hyperparameter, and b_3 is a bias vector;
3.3) combining the two concept attention weights: β_i and δ_i are combined by equation (8) to obtain the final attention weight:
μ_i = softmax(λβ_i + (1-λ)δ_i)  (8)
where μ_i is the final concept weight of the i-th concept and λ is a weighting parameter that adjusts the importance of the two attention weights;
3.4) concept representation: the final concept weights μ_i obtained in step 3.3) and the concept vectors (c_1, c_2, …, c_m) obtained in step 3.1) are weighted and summed according to equation (9) to obtain the concept representation z_c:
z_c = Σ_{i=1}^{m} μ_i c_i  (9)
where c_i is the concept vector of the i-th concept.
Further, the step 4) specifically includes:
4.1) combining the extremely short text representation with the additional knowledge: the extremely short text representation z_s obtained in step 2.2) and the concept representation z_c obtained in step 3.4) are combined to obtain the output z, which is fed into a fully connected layer;
4.2) training a softmax classifier to classify the extremely short text data set, which is divided into training and test data; the softmax classifier is:
P(z) = softmax(Wz + b)  (10)
where W and b are the weight matrix and bias of the classifier; the output z_train of the training data set within the output z of step 4.1) and the class labels y of the known training data set are substituted into (10) to train the classifier;
4.3) classifying the test extremely short text data set with the trained classifier: the output z_test of the test data set within the output z of step 4.1) is substituted into the classifier trained by equation (10) to obtain the classification result T_test of the test extremely short text data set, as shown in equation (11):
T_test = argmax P(z_test)  (11).
By adopting the above technical scheme, compared with the prior art, the invention has the following beneficial effects: 1) the invention provides a hybrid extremely short text classification method that enriches the semantic information of extremely short texts by combining knowledge from an external knowledge graph; 2) the invention introduces the Attention-based BiLSTM principle to assign a different weight to each word in the extremely short text and thus strengthen the role of keywords in classification, addressing the fact that one or two keywords in an extremely short text can determine the classification result; 3) a keyword screening method is proposed to find the most critical word in the extremely short text and obtain its related concepts, so that the concepts of all words need not be introduced for an extremely short text; 4) two concept attention mechanisms are proposed to introduce useful concepts and reduce the influence of noise; the method effectively improves representation learning and feature extraction, improves the accuracy of data set classification, and has high robustness and practicability.
Drawings
FIG. 1 is a general block diagram of the proposed method of the present invention.
FIG. 2 is a structural diagram of the concept representation in the present invention.
Detailed Description
Fig. 1 shows an extremely short text classification method based on keyword screening and an attention mechanism, which comprises the following steps: 1) designing and implementing a keyword screening algorithm, and introducing additional knowledge through a knowledge graph to optimize the feature representation of the extremely short text; 2) constructing an attention-based bidirectional long short-term memory model (Attention-based BiLSTM) and inputting the extremely short text to obtain its representation; 3) constructing two attention mechanisms for the additional knowledge, and inputting the additional knowledge obtained in step 1) to obtain the concept representation; 4) finally, combining the extremely short text representation with the additional knowledge, training a softmax classifier on the extremely short text data set, and obtaining the classification result.
The method comprises the following steps:
1) designing and implementing a keyword screening algorithm, and introducing additional knowledge through a knowledge graph to optimize the feature representation of the extremely short text;
1.1) selecting keywords by a Rake keyword extraction method;
Keywords are selected from the input with the Rake keyword extraction algorithm. Considering the length of the extremely short text, the text is split into several phrases by separators such as "and", "the" and "of", and these phrases are taken as candidates for the finally extracted keyword; each phrase is split into words by spaces, each word is given a score, the score of each phrase is obtained by accumulating its word scores, and the phrase with the highest score is finally selected as the keyword; the word score formula is:
wordScore = wordDegree(w) / wordFrequency(w)
where wordDegree(w) is the degree of word w (incremented by 1 for every word co-occurring with w inside a phrase) and wordFrequency(w) is the total number of occurrences of w;
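By way of illustration, a minimal Python sketch of this scoring step follows (the stop-word list, tokenization, and function names are simplified assumptions, not the patented implementation):

import re
from collections import defaultdict

STOP_WORDS = {"and", "the", "of"}  # assumed separator words; a full RAKE stop list is larger

def rake_keyword(text):
    # Split the extremely short text into candidate phrases at stop words.
    tokens = re.findall(r"[a-zA-Z]+", text.lower())
    phrases, current = [], []
    for tok in tokens:
        if tok in STOP_WORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        phrases.append(current)

    # wordDegree(w): co-occurrence degree inside phrases; wordFrequency(w): total occurrences.
    degree, freq = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            degree[w] += len(phrase)  # each word co-occurring with w in the phrase (including w) adds 1

    # wordScore = wordDegree(w) / wordFrequency(w); phrase score = sum of its word scores.
    def phrase_score(phrase):
        return sum(degree[w] / freq[w] for w in phrase)

    return " ".join(max(phrases, key=phrase_score)) if phrases else ""

print(rake_keyword("the effect of keyword screening and attention"))  # -> "keyword screening"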
1.2) introducing related concepts of keywords by using a knowledge graph; acquiring the concept of the keyword from the knowledge graph base as additional knowledge;
The concepts of the keyword are retrieved through the API of the knowledge graph base, e.g. 'https://concept.research.microsoft.com/api/Concept/ScoreByProb?instance=<keyword>&topK=10', where instance in the interface is set to the selected keyword and topK is the number of concepts desired; the retrieved concepts are combined into a concept set;
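A minimal Python sketch of such a lookup follows, assuming the ScoreByProb endpoint reconstructed above is reachable and returns a JSON object of concept scores (the function name and response handling are illustrative assumptions):

import requests

def get_concepts(keyword, top_k=10):
    """Query the knowledge graph API for the concepts of one keyword."""
    url = "https://concept.research.microsoft.com/api/Concept/ScoreByProb"
    resp = requests.get(url, params={"instance": keyword, "topK": top_k}, timeout=10)
    resp.raise_for_status()
    # Assumed response format: a JSON object mapping concept -> probability score.
    return list(resp.json().keys())

concept_set = get_concepts("microsoft", top_k=10)  # e.g. ['company', 'vendor', ...]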
2) inputting the extremely short text into an attention-based bidirectional long short-term memory model (Attention-based BiLSTM) to obtain the representation z_s of the extremely short text;
2.1) performing word- and character-level embedding of the input extremely short text;
First, word- and character-level embedding is performed following the idea of FIG. 1. An extremely short text of length n is input as a word sequence, and the data is expressed as {(x_1,y_1),(x_2,y_2),…,(x_n,y_n)}, where n is the number of texts, y_i ∈ {1,2,…,c}, and c is the number of labels. Feature representation learning adopts both word-level and character-level embedding; word embedding maps each word to a high-dimensional vector space. A convolutional neural network (CNN) is used to obtain the character embedding of each word: the character embeddings of a word are treated as a one-dimensional input to the CNN, and the embedding size is the input channel size of the CNN. Word embedding is obtained through word2vec; the dimensions of the word vector and the character vector are both d/2, and finally the word vector and the character vector are concatenated to obtain a d-dimensional word representation;
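A minimal PyTorch-style sketch of this word- and character-level embedding follows (the layer sizes, vocabulary handling, and max-pooling over characters are assumptions used only for illustration):

import torch
import torch.nn as nn

class WordCharEmbedding(nn.Module):
    """Concatenate a word2vec word vector (d/2) with a char-CNN vector (d/2) into a d-dim representation."""
    def __init__(self, word_vocab, char_vocab, d=200, char_dim=30, kernel=3):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, d // 2)   # would be initialised from word2vec in practice
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        # 1-D convolution over a word's characters; channel size = character embedding size
        self.char_cnn = nn.Conv1d(char_dim, d // 2, kernel_size=kernel, padding=1)

    def forward(self, word_ids, char_ids):
        # word_ids: (n,)  char_ids: (n, max_chars)
        w = self.word_emb(word_ids)                          # (n, d/2) word vectors
        c = self.char_emb(char_ids).transpose(1, 2)          # (n, char_dim, max_chars)
        c = torch.max(self.char_cnn(c), dim=2).values        # max-pool over characters -> (n, d/2)
        return torch.cat([w, c], dim=-1)                     # (n, d) word representation

emb = WordCharEmbedding(word_vocab=5000, char_vocab=60)
x = emb(torch.randint(0, 5000, (7,)), torch.randint(0, 60, (7, 12)))  # 7 words -> (7, 200)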
2.2) constructing an attention-based bidirectional long short-term memory model (Attention-based BiLSTM) to obtain the representation z_s of the extremely short text.
The word representations obtained in step 2.1) are regarded as a sequence of d-dimensional word vectors (x_1, x_2, …, x_n), where n is the length of the extremely short text; the word vector sequence is fed into the Attention-based BiLSTM to obtain the corresponding representation; the BiLSTM contains a forward network and a backward network for processing the extremely short text, as shown in equations (1) and (2):
h_t^f = LSTM_f(x_t)  (1)
h_t^b = LSTM_b(x_t)  (2)
each forward state h_t^f and backward state h_t^b are then concatenated into the hidden state h_t; all the h_t are collected into H ∈ R^{n×2u}, as shown in equation (3):
H = (h_1, h_2, …, h_n)  (3)
where u is the number of hidden units in each direction of the BiLSTM and n is the number of word vectors; the attention weight is then calculated by equation (4):
α_i = softmax(w_1^T f(W_1 h_i + b_1))  (4)
where α_i is the attention weight of the i-th word, f is the activation function of the network, softmax normalizes the weight of each word, W_1 is a weight matrix, w_1 ∈ R^{d_a} is a weight vector, d_a is a hyperparameter, b_1 is a bias vector, and h_i is the hidden state of the i-th word;
finally, the weighted sum of the h_i gives the representation z_s of the extremely short text, as shown in equation (5):
z_s = Σ_{i=1}^{n} α_i h_i  (5)
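A minimal PyTorch-style sketch of this Attention-based BiLSTM encoder follows (the choice of tanh as the activation f, the layer sizes, and batching are assumptions for illustration):

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionBiLSTM(nn.Module):
    """Encode a word-vector sequence with a BiLSTM and pool it with word-level attention (eqs. (1)-(5))."""
    def __init__(self, d=200, u=128, d_a=64):
        super().__init__()
        self.bilstm = nn.LSTM(d, u, bidirectional=True, batch_first=True)
        self.W1 = nn.Linear(2 * u, d_a)           # weight matrix W_1 and bias b_1
        self.w1 = nn.Linear(d_a, 1, bias=False)   # weight vector w_1

    def forward(self, x):
        # x: (batch, n, d); H: (batch, n, 2u) -- forward and backward states concatenated
        H, _ = self.bilstm(x)
        scores = self.w1(torch.tanh(self.W1(H))).squeeze(-1)   # eq. (4) before softmax
        alpha = F.softmax(scores, dim=1)                        # attention weight of each word
        z_s = torch.sum(alpha.unsqueeze(-1) * H, dim=1)         # eq. (5): weighted sum -> (batch, 2u)
        return z_s, alpha

enc = AttentionBiLSTM()
z_s, alpha = enc(torch.randn(4, 10, 200))   # 4 texts of 10 words -> z_s: (4, 256)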
3) constructing two attention mechanisms for the additional knowledge, and inputting the additional knowledge obtained in step 1) to obtain the concept representation z_c;
3.1) constructing the first concept attention mechanism to obtain the attention weight β_i.
As shown in FIG. 2, the concept set obtained in step 1.2) is embedded at the word and character level in the same manner as in step 2.1) to obtain d-dimensional concept vectors (c_1, c_2, …, c_m), where m is the number of concepts; the first concept attention mechanism calculates the semantic similarity between the i-th concept and the extremely short text representation z_s, as shown in equation (6):
β_i = softmax(w_2^T f(W_2[c_i; z_s] + b_2))  (6)
where β_i is the attention weight of the i-th concept with respect to the extremely short text, f is the activation function of the network, W_2 is a weight matrix, w_2 ∈ R^{d_b} is a weight vector, d_b is a hyperparameter, and b_2 is a bias vector;
3.2) constructing the second concept attention mechanism to obtain the attention weight δ_i.
The second concept attention mechanism calculates the importance of each concept to the whole concept set, as shown in equation (7):
δ_i = softmax(w_3^T f(W_3 c_i + b_3))  (7)
where δ_i is the attention weight of the i-th concept with respect to the concept set, f is the activation function of the network, W_3 is a weight matrix, w_3 ∈ R^{d_c} is a weight vector, d_c is a hyperparameter, and b_3 is a bias vector;
3.3) combining the two concept attention weights to obtain the attention weight μ_i.
β_i and δ_i are combined by equation (8) to obtain the final attention weight:
μ_i = softmax(λβ_i + (1-λ)δ_i)  (8)
where μ_i is the final concept weight of the i-th concept and λ is a weighting parameter that adjusts the importance of the two attention weights;
3.4) obtaining the concept representation z_c.
The final concept weights μ_i obtained in step 3.3) and the concept vectors (c_1, c_2, …, c_m) obtained in step 3.1) are weighted and summed according to equation (9) to obtain the concept representation z_c:
z_c = Σ_{i=1}^{m} μ_i c_i  (9)
where c_i is the concept vector of the i-th concept.
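A minimal PyTorch-style sketch of the two concept attention mechanisms and their combination follows (the concatenation [c_i; z_s] in the first mechanism, tanh as f, and the layer sizes are assumptions for illustration):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptAttention(nn.Module):
    """Combine concept-to-text and concept-to-set attention into the concept representation z_c."""
    def __init__(self, d=200, z_dim=256, d_b=64, d_c=64, lam=0.5):
        super().__init__()
        self.lam = lam                               # weighting parameter lambda in eq. (8)
        self.W2 = nn.Linear(d + z_dim, d_b)          # first mechanism: concept vs. short text z_s
        self.w2 = nn.Linear(d_b, 1, bias=False)
        self.W3 = nn.Linear(d, d_c)                  # second mechanism: concept vs. concept set
        self.w3 = nn.Linear(d_c, 1, bias=False)

    def forward(self, C, z_s):
        # C: (m, d) concept vectors; z_s: (z_dim,) extremely short text representation
        m = C.size(0)
        pair = torch.cat([C, z_s.unsqueeze(0).expand(m, -1)], dim=-1)
        beta = F.softmax(self.w2(torch.tanh(self.W2(pair))).squeeze(-1), dim=0)   # eq. (6)
        delta = F.softmax(self.w3(torch.tanh(self.W3(C))).squeeze(-1), dim=0)     # eq. (7)
        mu = F.softmax(self.lam * beta + (1 - self.lam) * delta, dim=0)           # eq. (8)
        return torch.sum(mu.unsqueeze(-1) * C, dim=0)                             # eq. (9): z_c, shape (d,)

att = ConceptAttention()
z_c = att(torch.randn(10, 200), torch.randn(256))   # 10 concepts -> z_c: (200,)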
4) Finally, combining the extremely short text representation with additional knowledge, classifying the extremely short text data set by using a softmax classifier, and obtaining a classification result;
4.1) combining the extremely short text representation with the additional knowledge:
the extremely short text representation z_s obtained in step 2.2) and the concept representation z_c obtained in step 3.4) are combined to obtain the output z, which is fed into a fully connected layer;
4.2) training a softmax classifier to classify the extremely short text data set, which is divided into training and test data; the softmax classifier is:
P(z) = softmax(Wz + b)  (10)
where W and b are the weight matrix and bias of the classifier; the output z_train of the training data set within the output z of step 4.1) and the class labels y of the known training data set are substituted into (10) to train the classifier;
4.3) classifying the test extremely short text data set with the trained classifier: the output z_test of the test data set within the output z of step 4.1) is substituted into the classifier trained by equation (10) to obtain the classification result T_test of the test extremely short text data set, as shown in equation (11):
T_test = argmax P(z_test)  (11).
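A minimal PyTorch-style sketch of step 4 follows: combining z_s and z_c, passing them through a fully connected layer, and training and applying the softmax classifier of equations (10) and (11) (the concatenation-based combination, layer sizes, and cross-entropy training loop are assumptions for illustration):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ShortTextClassifier(nn.Module):
    """Fully connected layer + softmax classifier over the combined representation z = [z_s; z_c]."""
    def __init__(self, z_s_dim=256, z_c_dim=200, hidden=128, num_classes=8):
        super().__init__()
        self.fc = nn.Linear(z_s_dim + z_c_dim, hidden)
        self.out = nn.Linear(hidden, num_classes)      # eq. (10): P(z) = softmax(Wz + b)

    def forward(self, z_s, z_c):
        z = torch.cat([z_s, z_c], dim=-1)               # combine text representation with extra knowledge
        return self.out(torch.relu(self.fc(z)))         # logits; softmax is applied in the loss / at test time

model = ShortTextClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training on (z_s_train, z_c_train, y_train): cross-entropy implements the softmax of eq. (10).
z_s_train, z_c_train = torch.randn(32, 256), torch.randn(32, 200)
y_train = torch.randint(0, 8, (32,))
optimizer.zero_grad()
loss = F.cross_entropy(model(z_s_train, z_c_train), y_train)
loss.backward()
optimizer.step()

# Testing, eq. (11): T_test = argmax P(z_test)
with torch.no_grad():
    z_s_test, z_c_test = torch.randn(5, 256), torch.randn(5, 200)
    T_test = model(z_s_test, z_c_test).argmax(dim=-1)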
The invention provides an extremely short text classification method based on keyword screening and an attention mechanism, so that the keywords that determine classification in an extremely short text can be found through keyword screening, and the useful concepts can be selected through the attention mechanism, thereby optimizing the feature representation vector and improving the classification accuracy on extremely short text data sets.
The present invention is not limited to the above-mentioned embodiments, and based on the technical solutions disclosed in the present invention, those skilled in the art can make some substitutions and modifications to some technical features without creative efforts according to the disclosed technical contents, and these substitutions and modifications are all within the protection scope of the present invention.

Claims (5)

1. A method for classifying extremely short texts based on keyword screening and attention mechanism is characterized by comprising the following steps:
1) designing and implementing a keyword screening algorithm, and introducing additional knowledge through a knowledge graph to optimize the feature representation of the extremely short text;
2) obtaining the representation of the extremely short text through a bidirectional long short-term memory model with an attention mechanism;
3) constructing two attention mechanisms for extra knowledge to learn more important and relevant knowledge;
4) finally, the extremely short text representation is combined with additional knowledge, a softmax classifier is used for classifying the extremely short text data set, and a classification result is obtained.
2. The method for classifying very short texts based on keyword screening and attention mechanism as claimed in claim 1, wherein the step 1) specifically comprises:
1.1) selecting keywords from the input with the Rake keyword extraction algorithm: the extremely short text is split into several phrases by separators, and these phrases are taken as candidates for the finally extracted keyword; each phrase is split into words by spaces, each word is given a score, the score of each phrase is obtained by accumulating its word scores, and the phrase with the highest score is finally selected as the keyword; the word score formula is:
wordScore = wordDegree(w) / wordFrequency(w)
where wordDegree(w) is the degree of word w (incremented by 1 for every word co-occurring with w inside a phrase) and wordFrequency(w) is the total number of occurrences of w;
1.2) introducing related concepts of the keyword using the knowledge graph: the concepts of the keyword are obtained from the knowledge graph base as additional knowledge by querying the API of the knowledge graph base, and the retrieved concepts are combined into a concept set.
3. The method for classifying very short texts based on keyword screening and attention mechanism as claimed in claim 2, wherein said step 2) specifically comprises:
2.1) word- and character-level embedding: the input extremely short texts are represented as {(x_1,y_1),(x_2,y_2),…,(x_n,y_n)}, where n is the number of texts, y_i ∈ {1,2,…,c}, and c is the number of labels; feature representation learning adopts both word-level and character-level embedding: a convolutional neural network is used to obtain the character embedding of each word, word embedding is obtained by word2vec, the dimensions of the word vector and the character vector are both d/2, and finally the word vector and the character vector are concatenated to obtain a d-dimensional word representation;
2.2) representation of the extremely short text: the word representations obtained in step 2.1) are regarded as a sequence of d-dimensional word vectors (x_1, x_2, …, x_n), where n is the length of the extremely short text; the word vector sequence is fed into the Attention-based BiLSTM to obtain the corresponding representation; the BiLSTM contains a forward network and a backward network for processing the extremely short text, as shown in equations (1) and (2):
h_t^f = LSTM_f(x_t)  (1)
h_t^b = LSTM_b(x_t)  (2)
each forward state h_t^f and backward state h_t^b are then concatenated into the hidden state h_t; all the h_t are collected into H ∈ R^{n×2u}, as shown in equation (3):
H = (h_1, h_2, …, h_n)  (3)
where u is the number of hidden units in each direction of the BiLSTM and n is the number of word vectors; the attention weight is then calculated by equation (4):
α_i = softmax(w_1^T f(W_1 h_i + b_1))  (4)
where α_i is the attention weight of the i-th word, f is the activation function of the network, softmax normalizes the weight of each word, W_1 is a weight matrix, w_1 ∈ R^{d_a} is a weight vector, d_a is a hyperparameter, b_1 is a bias vector, and h_i is the hidden state of the i-th word;
finally, the weighted sum of the h_i gives the representation z_s of the extremely short text, as shown in equation (5):
z_s = Σ_{i=1}^{n} α_i h_i  (5)
4. the method for classifying very short texts based on keyword screening and attention mechanism as claimed in claim 3, wherein said step 3) specifically comprises:
3.1) constructing a first concept attention mechanism: the concept set obtained in step 1.2) is embedded at the word and character level to obtain d-dimensional concept vectors (c_1, c_2, …, c_m), where m is the number of concepts; the first concept attention mechanism calculates the semantic similarity between the i-th concept and the extremely short text representation z_s, as shown in equation (6):
β_i = softmax(w_2^T f(W_2[c_i; z_s] + b_2))  (6)
where β_i is the attention weight of the i-th concept with respect to the extremely short text, f is the activation function of the network, W_2 is a weight matrix, w_2 ∈ R^{d_b} is a weight vector, d_b is a hyperparameter, and b_2 is a bias vector;
3.2) constructing a second concept attention mechanism: the second concept attention mechanism calculates the importance of each concept to the whole concept set, as shown in equation (7):
δ_i = softmax(w_3^T f(W_3 c_i + b_3))  (7)
where δ_i is the attention weight of the i-th concept with respect to the concept set, f is the activation function of the network, W_3 is a weight matrix, w_3 ∈ R^{d_c} is a weight vector, d_c is a hyperparameter, and b_3 is a bias vector;
3.3) combining the two concept attention weights: β_i and δ_i are combined by equation (8) to obtain the final attention weight:
μ_i = softmax(λβ_i + (1-λ)δ_i)  (8)
where μ_i is the final concept weight of the i-th concept and λ is a weighting parameter that adjusts the importance of the two attention weights;
3.4) concept representation: the final concept weights μ_i obtained in step 3.3) and the concept vectors (c_1, c_2, …, c_m) obtained in step 3.1) are weighted and summed according to equation (9) to obtain the concept representation z_c:
z_c = Σ_{i=1}^{m} μ_i c_i  (9)
where c_i is the concept vector of the i-th concept.
5. The method for classifying very short texts based on keyword screening and attention mechanism as claimed in claim 4, wherein the step 4) specifically comprises:
4.1) combining the extremely short text representation with the additional knowledge: the extremely short text representation z_s obtained in step 2.2) and the concept representation z_c obtained in step 3.4) are combined to obtain the output z, which is fed into a fully connected layer;
4.2) training a softmax classifier to classify the extremely short text data set, which is divided into training and test data; the softmax classifier is:
P(z) = softmax(Wz + b)  (10)
where W and b are the weight matrix and bias of the classifier; the output z_train of the training data set within the output z of step 4.1) and the class labels y of the known training data set are substituted into (10) to train the classifier;
4.3) classifying the test extremely short text data set with the trained classifier: the output z_test of the test data set within the output z of step 4.1) is substituted into the classifier trained by equation (10) to obtain the classification result T_test of the test extremely short text data set, as shown in equation (11):
T_test = argmax P(z_test)  (11).
CN202210419204.3A 2022-04-20 2022-04-20 Extremely short text classification method based on keyword screening and attention mechanism Pending CN114722206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210419204.3A CN114722206A (en) 2022-04-20 2022-04-20 Extremely short text classification method based on keyword screening and attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210419204.3A CN114722206A (en) 2022-04-20 2022-04-20 Extremely short text classification method based on keyword screening and attention mechanism

Publications (1)

Publication Number Publication Date
CN114722206A true CN114722206A (en) 2022-07-08

Family

ID=82246679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210419204.3A Pending CN114722206A (en) 2022-04-20 2022-04-20 Extremely short text classification method based on keyword screening and attention mechanism

Country Status (1)

Country Link
CN (1) CN114722206A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117786092A (en) * 2024-02-27 2024-03-29 成都晓多科技有限公司 Commodity comment key phrase extraction method and system
CN117786092B (en) * 2024-02-27 2024-05-14 成都晓多科技有限公司 Commodity comment key phrase extraction method and system

Similar Documents

Publication Publication Date Title
CN110717047B (en) Web service classification method based on graph convolution neural network
Awasthi et al. Natural language processing (NLP) based text summarization-a survey
US8027977B2 (en) Recommending content using discriminatively trained document similarity
CN113268995B (en) Chinese academy keyword extraction method, device and storage medium
Sebastiani Classification of text, automatic
CN111611361A (en) Intelligent reading, understanding, question answering system of extraction type machine
Kmail et al. An automatic online recruitment system based on exploiting multiple semantic resources and concept-relatedness measures
CN110674252A (en) High-precision semantic search system for judicial domain
US11874862B2 (en) Community question-answer website answer sorting method and system combined with active learning
KR20060045786A (en) Verifying relevance between keywords and web site contents
CN110134799B (en) BM25 algorithm-based text corpus construction and optimization method
Suleiman et al. Deep learning based extractive text summarization: approaches, datasets and evaluation measures
CN114997288A (en) Design resource association method
Khalid et al. Topic detection from conversational dialogue corpus with parallel dirichlet allocation model and elbow method
Baboo et al. Sentiment analysis and automatic emotion detection analysis of twitter using machine learning classifiers
CN114722206A (en) Extremely short text classification method based on keyword screening and attention mechanism
CN115098690B (en) Multi-data document classification method and system based on cluster analysis
CN117057346A (en) Domain keyword extraction method based on weighted textRank and K-means
Zhang et al. Text information classification method based on secondly fuzzy clustering algorithm
Abalorio et al. Extended Max-Occurrence with Normalized Non-Occurrence as MONO Term Weighting Modification to Improve Text Classification
KR20070118154A (en) Information processing device and method, and program recording medium
Rezaei et al. Hierarchical three-module method of text classification in web big data
Hao Naive Bayesian Prediction of Japanese Annotated Corpus for Textual Semantic Word Formation Classification
Beumer Evaluation of Text Document Clustering using k-Means
Shahine et al. Hybrid Feature Selection Approach for Arabic Named Entity Recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination