CN113987188B - Short text classification method and device and electronic equipment - Google Patents
Short text classification method and device and electronic equipment
- Publication number
- CN113987188B CN113987188B CN202111326798.5A CN202111326798A CN113987188B CN 113987188 B CN113987188 B CN 113987188B CN 202111326798 A CN202111326798 A CN 202111326798A CN 113987188 B CN113987188 B CN 113987188B
- Authority
- CN
- China
- Prior art keywords
- short text
- vector
- knowledge information
- keywords
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a short text classification method, a short text classification device, and electronic equipment in the technical field of data processing. The technical scheme is as follows: determine the knowledge information and keywords of the short text; embed the short text, the knowledge information, and the keywords into a vector space and splice them to obtain vector matrices of the short text, the knowledge information, and the keywords; process the short text vector matrix with a bidirectional memory network layer to obtain the semantic information of the short text; perform attention calculation on the semantic information of the short text and the vector matrix of the knowledge information or of the keywords to obtain the knowledge-information or keyword vector; and perform feature extraction on the vector and the vector matrix with a convolutional neural network to obtain the short text classification result. The method solves the problem in the prior art that short texts lack contextual semantics and therefore cannot be classified accurately, and improves the accuracy of text classification.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a short text classification method and apparatus, and an electronic device.
Background
In recent years, with the development of deep learning, models such as the convolutional neural network (CNN) and the recurrent neural network (RNN) have been widely used in text classification and have achieved good results on longer texts. However, traditional deep neural networks face a huge challenge on short texts because of data sparsity and ambiguity. To address these problems, current work focuses on acquiring more implicit information from short texts in order to understand them. Text representation models mainly comprise explicit representation and implicit representation. Explicit representation creates effective features based on part-of-speech tagging, knowledge bases, and the like, which people can easily understand subjectively; however, explicit representation usually treats each feature independently and ignores the contextual information of the short text. Implicit representation maps each word into a high-dimensional vector and represents the text information with a word-vector matrix, so that a neural network model can conveniently learn the information contained in the text. However, implicit representation may fail to capture some entity information in the text. For example, in "Ant will release new products", an implicit representation does not treat "Ant" as an entity and represents it as an ordinary word, but "Ant" as the name of a sports brand may affect the tendency of the classification.
Model structures that integrate explicit and implicit text representations have been proposed in the past, but several disadvantages remain. First, when conceptualizing text information, the corresponding weight information is obtained from a large knowledge base and integrated into the neural model, but this weight information is static and independent of the text information. Second, the keyword information of the text is often ignored, especially in texts where little knowledge information can be acquired, as in binary sentiment classification tasks.
Disclosure of Invention
The invention aims to provide a short text classification method, a short text classification device, and electronic equipment that solve the problem in the prior art that short texts lack contextual semantics and therefore cannot be classified accurately.
The technical purpose of the invention is realized by the following technical scheme:
in a first aspect, the present invention provides a short text classification method, including the following steps:
determining knowledge information and key words of the short text;
embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords;
processing the short text vector matrix by adopting a bidirectional memory network layer to obtain semantic information of the short text;
attention calculation is carried out on semantic information of the short text and a vector matrix of knowledge information or a vector matrix of keywords to obtain a vector of the knowledge information or the keywords;
and performing feature extraction on the vector and the vector matrix by using a convolutional neural network to obtain a short text classification result.
The method addresses the lack of short-text contextual semantics in prior-art classification methods. The invention expands the representation of the short text by determining its knowledge information and keywords. Because existing classification methods embed knowledge information only statically, without regard to the contextual semantics of the short text, a context-based self-attention mechanism is proposed that selectively embeds knowledge information according to the context information. In addition, existing classification methods usually ignore the effect of insufficient knowledge information, so the invention uses a convolutional neural network to extract features from the keyword and knowledge-information vectors and from the semantic information of the short text, and aggregates the knowledge-information and keyword features into a final classification, yielding finer-grained classification results for the short text and improving classification accuracy.
Further, entity recognition is carried out on the short text to obtain an entity set of the short text, and the entity set is recognized to determine knowledge information of the short text.
Further, the short text, the knowledge information and the keywords are input into a neural network model embedding layer, and the word vector model is adopted to pre-train the short text, the knowledge information and the keywords in the embedding layer to obtain vector representation of the short text, the knowledge information and the keywords.
Further, the vector representations of the short text and the knowledge information are spliced in a superior sub-network of the neural network model to obtain a vector matrix of the knowledge information;
the short text and the keywords are pre-trained by adopting a convolutional neural network to obtain character-level vector representations of the short text and the keywords, and the character-level vector representations of the short text and the keywords are spliced in a sub-network at the lower level of a neural network model to obtain a vector matrix of the keywords at the character level.
Further, the short text vector matrix and the character-level short text vector matrix are input to a bidirectional memory network layer for processing, and semantic information of the short text context of the upper and lower sub-networks is respectively obtained.
Furthermore, in the upper-level sub-network, attention calculation is carried out on semantic information and knowledge information of the short text context to obtain a self-attention result of the knowledge information, products of the self-attention result of the knowledge information and the semantic information are calculated, and each product is spliced to obtain a vector of the knowledge information;
in a next-level sub-network, attention calculation is carried out on semantic information of the short text context and the keywords to obtain self-attention results of the keywords, products of the self-attention results of the keywords and the semantic information are calculated, and each product is spliced to obtain vectors of the keywords.
Further, the calculation formula of the attention calculation is y_i = softmax(a1 · tanh(a2[c_i; p] + b2)), where y_i represents the weight of the knowledge information or keyword with respect to the short text, tanh denotes the hyperbolic tangent function, softmax normalizes the self-attention result, a2 denotes a weight matrix, a1 denotes a weight vector, b2 denotes an offset vector, p denotes an intermediate result, and c_i represents the i-th knowledge vector in the superior sub-network and the i-th keyword vector in the subordinate sub-network.
Further, splicing the vector of the knowledge information and the vector matrix of the short text in the superior sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the superior sub-network to obtain a classification result of the superior sub-network;
splicing the vector of the keyword and the vector matrix of the keyword in the lower sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the lower sub-network to obtain a classification result of the lower sub-network;
and carrying out combined classification on the classification result of the superior sub-network and the classification result of the subordinate sub-network to obtain the classification result of the short text.
In a second aspect, the invention provides a short text classification device based on keywords and knowledge information, which is used for realizing the classification method provided in the first aspect, and comprises a determining unit, a splicing unit, a processing unit, a calculating unit and a classifying unit;
the determining unit is used for determining knowledge information and keywords of the short text;
the splicing unit is used for embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords;
the processing unit is used for processing the short text vector matrix by adopting a two-way memory network layer to obtain semantic information of the short text;
the computing unit is used for carrying out attention computing on the semantic information of the short text and the vector matrix of the knowledge information or the vector matrix of the keyword to obtain the vector of the knowledge information or the keyword;
and the classification unit is used for extracting the characteristics of the vector and the vector matrix by using a convolutional neural network to obtain a short text classification result.
In a third aspect, the present invention provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the classification method as provided in the first aspect when executing the computer program.
Compared with the prior art, the invention has the following beneficial effects:
the invention firstly conceptualizes the short text to obtain knowledge information and extracts keywords of the short text, provides a concept of an upper and lower two-stage sub-network, and trains the text information and the knowledge information to generate a vector matrix by using a pre-trained word vector model in the upper sub-network. Then, an attention mechanism based on the context of the short text is introduced to measure the importance degree of knowledge information to the short text; and embedding the measured knowledge information and semantic information into a two-dimensional convolution network to capture features and finally classifying. In the lower network, inspired by character level embedding, the text and the keywords are embedded by using the character level embedding to obtain different granularity characteristic information, then the upper sub-network and the lower sub-network are kept consistent in subsequent operation, and finally the classification results of the upper sub-network and the lower sub-network are aggregated and classified to obtain the final text classification result.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network model structure according to an embodiment of the present invention;
FIG. 3 is a block diagram of a frame of an apparatus according to an embodiment of the present invention;
fig. 4 is a block diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and the accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not used as limiting the present invention.
It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly connected to the other element. When an element is referred to as being "connected to" another element, it can be directly or indirectly connected to the other element.
It will be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like, as used herein, refer to an orientation or positional relationship indicated in the drawings that is solely for the purpose of facilitating the description and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and is therefore not to be construed as limiting the invention.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Example one
The method provided by the embodiment can be applied to the classification of short texts with less than 15 characters and can also be applied to the classification of short texts with 15-25 characters, and the problem that the short text context semantic is lacked in the short text classification method in the prior art is solved, so that the classification of the short texts is more accurate.
As shown in fig. 1, the first embodiment provides a short text classification method, which includes the following steps:
step S10, determining knowledge information and keywords of the short text.
Specifically, the short text information is conceptualized. This is accomplished using an existing general knowledge base (e.g., Yago, Freebase, or Probase). This embodiment uses the Probase knowledge base, because the information it contains is more extensive, so more concept information can be mined from the short text. The entity set E of the short text is obtained through the entity-recognition network interface provided by Probase. Then, for each entity e in E, its concept information is acquired from the existing knowledge base using the isA relation as the standard. For example, for the short text "Yahoo fixes two flaws in mail system", the entity set E = {Yahoo, mail} is obtained through the entity-recognition network interface of Probase, and conceptualizing the isA relations selected for the Yahoo entity yields the concept set C = {search engine company, application service}.
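The Probase network interface itself is not detailed here, so the sketch below substitutes a hypothetical in-memory isA table and naive token matching, purely so the entity-to-concept flow can be followed end to end; all names are illustrative assumptions.

```python
# Toy isA knowledge base (entity -> set of concepts); a stand-in for Probase.
ISA_KB = {
    "yahoo": {"search engine company", "application service"},
    "mail": {"application service"},
}

def recognize_entities(text, kb):
    """Naive entity spotting: any knowledge-base key appearing in the text."""
    tokens = text.lower().split()
    return [t for t in tokens if t in kb]

def conceptualize(text, kb):
    """Entity set E -> concept set C via isA relations."""
    concepts = set()
    for entity in recognize_entities(text, kb):
        concepts |= kb[entity]  # union of the entity's isA concepts
    return concepts
```

With the example text, `conceptualize("Yahoo fixes two flaws in mail system", ISA_KB)` yields the concept set {search engine company, application service}, mirroring the C described above.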
Keywords are then extracted from the short text. In this embodiment, the keyword set K of a text is obtained through the YAKE keyword extraction algorithm, an unsupervised algorithm whose features include word capitalization, word position, word frequency, context relationships, and the frequency with which a word appears in sentences. For example, for the short text "Yahoo fixes two flaws in mail system", the keyword extraction algorithm yields K = {Yahoo, fixes, flaws}.
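A much-simplified stand-in for YAKE, using only two of the listed signals (word frequency and word position); the real algorithm also scores casing, context relations, and sentence spread, and its scores are lower-is-better. The stop list and scoring rule here are assumptions for illustration.

```python
from collections import Counter

def simple_keywords(text, top_k=3):
    """Rough unsupervised keyword scoring: frequent, early words score higher."""
    stop = {"in", "the", "a", "an", "of", "to", "two"}  # tiny illustrative stop list
    words = [w.lower().strip(".,") for w in text.split()]
    words = [w for w in words if w not in stop]
    freq = Counter(words)
    # First-occurrence position of each word (earlier assignments win via reversal).
    first_pos = {w: i for i, w in reversed(list(enumerate(words)))}
    scores = {w: freq[w] / (1 + first_pos[w]) for w in freq}
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]
```

On the example sentence this toy scorer happens to reproduce the keyword set given above.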
And step S20, embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords.
Specifically, as shown in fig. 2, unlike other short text classification methods that expand the text features using only the knowledge information of the short text, the embodiment of the present application also uses the keyword information of the text, embedding it into the subordinate sub-network for classification. The short text, the knowledge information, and the keywords are input into the embedding layers of the superior and subordinate sub-networks of the neural network model and spliced to obtain the vector matrices of the short text.
And step S30, processing the short text vector matrix by using a two-way memory network layer to obtain the semantic information of the short text.
Specifically, the operations of the superior and subordinate sub-networks are the same, so the superior sub-network is described as an example. The short-text word vector matrix W_w = {W_1, W_2, ..., W_n} obtained by the input unit is fed into an LSTM network to obtain the contextual semantic information of the short text in the superior sub-network; similarly, character-level contextual semantic information of the short text is obtained in the subordinate sub-network.
Step S40, the semantic information of the short text and the vector matrix of the knowledge information or the vector matrix of the keyword are attentively calculated to obtain the vector of the knowledge information or the keyword.
Specifically, the knowledge information and the keywords of the short text can supplement the feature information of the short text, and the determination of the class label of the short text is facilitated. In the superior sub-network, the short text semantic information and the knowledge information are coded for attention calculation. In the next-level sub-network, attention calculation is carried out on the short text semantic information and the keyword codes, a context-dependent attention mechanism is provided, and the weight of the concept or the keyword is calculated according to the semantic information contained in the context of the short text.
And step S50, extracting the features of the vector and the vector matrix by using a convolutional neural network to obtain a short text classification result.
In particular, the convolutional neural network (CNN) can extract more feature information from short texts. The semantic information of the short text is spliced with the knowledge-information vector as the input of the superior sub-network, and with the keyword vector as the input of the subordinate sub-network. Each input is then processed by a convolutional neural network through convolution, pooling, and classification to obtain the classification results of the superior and subordinate sub-networks, and these results are aggregated to obtain the final short text classification result.
According to the technical scheme, the short text classification method expands the representation of the short text by determining its knowledge information and keywords. Because existing classification methods embed knowledge information only statically, ignoring the contextual semantics of the short text, a context-based self-attention mechanism is proposed to selectively embed knowledge information according to the context information. In addition, existing classification methods often ignore the effect of insufficient knowledge information, so the method uses a convolutional neural network to extract features from the keyword and knowledge-information vectors and from the semantic information of the short text, and aggregates the knowledge-information and keyword features to obtain the final short text classification result, generating finer-grained classifications and improving accuracy.
A description is given below of a possible implementation manner of each step of the short text classification method provided in the first embodiment of the present application.
On the basis of the first embodiment, in a further embodiment of the present application, entity recognition is performed on the short text to obtain an entity set of the short text, and the entity set is recognized to determine knowledge information of the short text.
Specifically, how to determine the knowledge information of the short text is already described in step S10, and is not described here.
Based on the above embodiment, in a further embodiment of the present application, the short text, the knowledge information and the keywords are input into the neural network model embedding layer, and the word vector model is used to pre-train the short text, the knowledge information and the keywords in the embedding layer, so as to obtain the vector representation of the short text, the knowledge information and the keywords.
Specifically, in the embedding layer of the superior sub-network, words and concepts are embedded into a high-dimensional vector space. Word vectors pre-trained with the Word2vec model are used to obtain the vector representation of each word. W_w and W_c denote the embedded representations of the words and of the knowledge information, respectively. The concrete formula is as follows:
It should be noted that one short text contains a plurality of words; for example, "Zhang San went to the orchard today and planted many fruit trees" contains words such as "orchard", "fruit trees", and "planted". The splicing operation is used throughout the embodiments of this application; m and n denote the maximum numbers of words and of knowledge-information items, respectively, where w_i is the vector representation of the i-th word and c_i is the vector representation of the i-th knowledge-information item. The vector representation W_w of the short text and the vector representation W_c of the knowledge information are finally obtained through the splicing operation. If the vectors of the text or knowledge information are not long enough, they are padded with 0.
In the subordinate sub-network, a standard convolutional neural network (CNN) is used to obtain the character-level vector representation of the i-th word and the character-level vector representation of the i-th keyword, where t and v are the maximum numbers of words and keywords, respectively. Through the same splicing operation as in the superior sub-network, the character-level vector matrices E_w of the short text and E_k of the keyword set are obtained.
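The look-up, splice, and zero-pad step of the embedding layer can be sketched as follows; the lookup table and dimensions are hypothetical, standing in for pre-trained Word2vec (word-level) or CNN (character-level) embeddings.

```python
import numpy as np

def embed_and_pad(tokens, lookup, max_len, dim):
    """Look up each token's vector, splice the rows, and zero-pad to max_len."""
    rows = [lookup.get(t, np.zeros(dim)) for t in tokens[:max_len]]
    rows += [np.zeros(dim)] * (max_len - len(rows))  # pad with 0 as described
    return np.stack(rows)  # shape: (max_len, dim)
```

For instance, a matrix such as W_w is then simply `embed_and_pad(words, word2vec_table, n, m)` under these assumed shapes.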
On the basis of the first embodiment, in a further embodiment of the present application, the vector representations of the short text and the knowledge information are spliced in a superior subnetwork of the neural network model to obtain a vector matrix of the knowledge information;
the short text and the keywords are pre-trained by adopting a convolutional neural network to obtain character-level vector representations of the short text and the keywords, and the character-level vector representations of the short text and the keywords are spliced in a sub-network at the lower level of a neural network model to obtain a vector matrix of the keywords at the character level.
Specifically, how to obtain the vector matrix of the short text, the knowledge information and the keyword is described in the implementation of the above embodiment, and is not described here.
Based on the first embodiment, in a further embodiment of the present application, the short text vector matrix and the vector matrix of the character-level short text are both input to the bidirectional memory network layer for processing, so as to obtain semantic information of the short text contexts of the upper and lower subnetworks respectively.
Specifically, the superior and subordinate sub-networks perform the same operations on the obtained vector matrices, so the superior sub-network is taken as an example. The word vector matrix W_w = {W_1, W_2, ..., W_n} obtained by the input unit is input to the LSTM network to obtain the contextual semantic information of the text. The forward LSTM reads the input in normal order (W_1 to W_n), as in formula (4), and the backward LSTM reads it in reverse order (W_n to W_1), as in formula (5):
where h_t represents the neuron output at time t and w_i represents the i-th short text vector. At each time t, the forward output and the backward output are combined to obtain the final h_t, as in formula (6). H_sup denotes the semantic representation of the superior sub-network, i.e., H_sup = {h_1, h_2, ..., h_t}. With the same operation as in the superior sub-network, E_w = {E_1, E_2, ..., E_t} is input to the LSTM network in the subordinate sub-network to obtain its semantic representation H_sub.
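As a rough illustration of the bidirectional processing of formulas (4)-(6), the sketch below runs a simplified tanh recurrent cell (standing in for an LSTM, which additionally has gates and a cell state) forward and backward over the input and splices the two hidden states at each time step; all dimensions and weights are assumptions.

```python
import numpy as np

def simple_rnn(X, Wx, Wh, reverse=False):
    """Tanh recurrent cell over the rows of X; one direction of the network."""
    h = np.zeros(Wh.shape[0])
    out = []
    steps = reversed(range(len(X))) if reverse else range(len(X))
    for t in steps:
        h = np.tanh(X[t] @ Wx + h @ Wh)
        out.append(h)
    if reverse:
        out.reverse()  # re-align the backward pass with forward time order
    return np.stack(out)

def bidirectional(X, Wx, Wh):
    """h_t = [forward_t ; backward_t], in the spirit of formula (6)."""
    fwd = simple_rnn(X, Wx, Wh)
    bwd = simple_rnn(X, Wx, Wh, reverse=True)
    return np.concatenate([fwd, bwd], axis=1)  # shape: (n, 2r)
```

Stacking the per-step outputs gives a matrix playing the role of H_sup (or H_sub for character-level input).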
In the subordinate sub-network, the calculation is the same as in the superior sub-network: the Q, K, and V vectors are all equal to H_sub, and the final E_k is obtained through the same operations as in the superior sub-network.
On the basis of the first embodiment, in a further embodiment of the present application, in the upper-level sub-network, attention calculation is performed on semantic information and knowledge information of a short text context to obtain a self-attention result of the knowledge information, products of the self-attention result and the semantic information of the knowledge information are calculated, and each product is spliced to obtain a vector of the knowledge information;
in a next-level sub-network, attention calculation is carried out on semantic information of the short text context and the keywords to obtain self-attention results of the keywords, products of the self-attention results of the keywords and the semantic information are calculated, and each product is spliced to obtain vectors of the keywords.
Specifically, since the subordinate sub-network performs the same calculation as the superior sub-network, the superior sub-network is explained as an example. First, the scaled dot-product attention mechanism is used to capture the word-word dependencies within the sentence and to learn its internal structure. Given a query matrix Q, a key matrix K, and a value matrix V, where Q, K, and V have the same value and are all equal to H_sup; 2r denotes the scaling factor, where r is the number of neurons of the superior sub-network. The result A obtained by this calculation is subjected to a max-pooling operation, formula (8), so that the word dependencies of the short text are represented by the maximum in each dimension. The specific formula is as follows:
p=maxpool(A) (8)
after calculating p, we propose a context-based attention calculation for calculating the importance of knowledge information to text in the upper sub-network, and the specific formula is as follows:
y_i = softmax(a1 · tanh(a2[c_i; p] + b2))    (9)
y_i represents the weight of a concept with respect to the text; a larger y_i means that this concept/keyword is more important to the short text. tanh is the hyperbolic tangent function, and the softmax function normalizes the attention result to the range [0, 1]. a2 denotes a weight matrix and a1 denotes a weight vector, where R is the vector space and d_r is a hyper-parameter; b2 denotes an offset vector. Finally, each calculated weight y_i is multiplied by the corresponding concept vector, and the products are spliced to obtain the final W_c, as in formula (10).
In a further embodiment of the present application, based on the first embodiment, the calculation formula of the attention calculation is y_i = softmax(a1 · tanh(a2[c_i; p] + b2)), where y_i represents the weight of the knowledge information or keyword with respect to the short text, tanh denotes the hyperbolic tangent function, softmax normalizes the self-attention result, a2 denotes a weight matrix, a1 denotes a weight vector, b2 denotes an offset vector, p denotes an intermediate result, and c_i represents the i-th knowledge vector in the superior sub-network and the i-th keyword vector in the subordinate sub-network.
Specifically, the previous embodiment has already explained how to perform the attention calculation, so it is not repeated here.
On the basis of the first embodiment, in a further embodiment of the present application, the vector of the knowledge information and the vector matrix of the short text are spliced in the upper-level sub-network, feature extraction is performed on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and the feature vector is classified through a full connection layer of the upper-level sub-network to obtain a classification result of the upper-level sub-network;
splicing the vector of the keyword and the vector matrix of the keyword in the lower sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the lower sub-network to obtain a classification result of the lower sub-network;
and carrying out combined classification on the classification result of the superior sub-network and the classification result of the subordinate sub-network to obtain the classification result of the short text.
Specifically, in the upper sub-network, we splice the semantic information H_sup of the short text with W_c as the input W_sup; in the lower sub-network, we splice H_sub with E_k as the input W_sub. The corresponding formulas are as follows:
wherein m denotes the word-vector dimension, n_c/n_k denote the number of concepts/keywords, and R is a vector space representation. Then, a CNN model is used to perform convolution, pooling and classification operations on the upper and lower sub-networks respectively.
First, convolution kernels with a width fixed to m and different heights h are used to perform convolution operations on W_sup and W_sub respectively, so as to extract features of the short text and generate a set of feature vectors v_i. The generated feature vectors [v_1; …; v_i] are then activated by the relu activation function. The specific formula is as follows:
S_sup = relu(w · v_i + b) (13)
wherein w is a weight matrix with the same dimensions as v_i, and b represents an offset vector. The S_sub of the lower sub-network is obtained by the same operation.
In the pooling layer, max pooling is used: the maximum value in a certain region is taken as the representative output, and a fixed-length vector T_i is extracted from the feature map. This improves the generalization capability of the neural network model and reduces the number of network parameters. After CNN pooling, we introduce a fully connected softmax(·) layer to classify the upper and lower levels separately. Finally, the classification results of the upper and lower sub-networks are jointly classified to obtain the final classification result Output, as shown in fig. 2. The specific formula is as follows:
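The convolution, max-pooling and joint classification stages can be sketched as follows. This is an illustrative NumPy version only: all sizes are hypothetical, and since the exact joint-classification formula is not reproduced in this text, the combination step is shown here as a simple average of the two softmax distributions, which is an assumption.

```python
import numpy as np

def conv_relu_maxpool(W, kernel, b):
    # slide an (h, m) kernel of fixed width m down the (n, m) input matrix
    h = kernel.shape[0]
    v = np.array([np.sum(W[i:i + h] * kernel) + b
                  for i in range(W.shape[0] - h + 1)])
    s = np.maximum(v, 0.0)   # relu activation, as in formula (13)
    return s.max()           # max pooling: one feature per kernel

def fc_softmax(features, Wfc, bfc):
    # fully connected softmax(.) layer producing class probabilities
    z = Wfc @ features + bfc
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(2)
n, m, n_classes = 10, 4, 3                      # hypothetical sizes
W_sup = rng.normal(size=(n, m))                 # spliced upper-network input
W_sub = rng.normal(size=(n, m))                 # spliced lower-network input
kernels = [rng.normal(size=(h, m)) for h in (2, 3, 4)]  # heights h = 2, 3, 4
feats_sup = np.array([conv_relu_maxpool(W_sup, k, 0.1) for k in kernels])
feats_sub = np.array([conv_relu_maxpool(W_sub, k, 0.1) for k in kernels])
Wfc = rng.normal(size=(n_classes, len(kernels)))
bfc = rng.normal(size=n_classes)
out_sup = fc_softmax(feats_sup, Wfc, bfc)
out_sub = fc_softmax(feats_sub, Wfc, bfc)
output = (out_sup + out_sub) / 2                # assumed joint combination
```

Each kernel height captures n-gram-like features of a different span; max pooling then keeps one scalar per kernel, giving a fixed-length feature vector regardless of text length.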
by combining the above technical solutions, as shown in fig. 2, fig. 2 is a proposed short text network classification model, and the model is generally composed of an upper level network and a lower level network, and each of the upper level network and the lower level network includes four units. The method comprises the steps that four units are arranged in a superior network, short texts are conceptualized by using an external knowledge base (base), corresponding word vector matrixes are generated by using pre-trained word vectors, and the vector matrixes of the texts are input into an LSTM network to obtain semantic representation of the texts. Thirdly, the semantic representation and the concept word vector matrix are subjected to a dynamic attention mechanism to obtain a knowledge information vector. And finally, connecting the semantic representation with the vector of the knowledge information together to complete classification through a CNN network. In a lower-level network, keywords in a short text are obtained by using a Yake keyword automatic extraction algorithm, corresponding vector representation is generated through character-level features, other units in the lower-level network are consistent with a higher-level sub-network, and knowledge information is replaced by keyword information. And finally, combining the classification results of the upper and lower networks together, and acquiring the probability of each class by using an output layer.
Example two
Based on the same concept, the second embodiment provides a short text classification device, which can be applied to a computer and other electronic devices to execute the short text classification method described in the first embodiment. Fig. 3 shows a structural block diagram of the short text classification device provided in the second embodiment of the present application, comprising a determining unit 110, a splicing unit 120, a processing unit 130, a calculating unit 140 and a classifying unit 150;
the determining unit 110 is configured to determine knowledge information and keywords of a short text;
the splicing unit 120 is configured to embed the short text, the knowledge information, and the keyword into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information, and the keyword;
the processing unit 130 is configured to process the short text vector matrix by using a two-way memory network layer to obtain semantic information of the short text;
the calculating unit 140 is configured to perform attention calculation on the semantic information of the short text and the vector matrix of the knowledge information or the vector matrix of the keyword to obtain a vector of the knowledge information or the keyword;
the classification unit 150 is configured to perform feature extraction on the vector and the vector matrix by using a convolutional neural network to obtain a short text classification result.
According to the above technical scheme, the short text classification device of the second embodiment of the present application expands the representation range of the short text by determining the knowledge information and keywords of the short text. However, existing classification methods embed knowledge information only statically and pay no attention to the semantic information of the short text's context; a context-based self-attention mechanism is therefore provided, which selectively embeds knowledge information through the context information. In addition, conventional classification methods often ignore the influence of insufficient knowledge information, so this method uses a convolutional neural network to extract feature information from the keyword and knowledge-information vectors and the semantic information of the short text, and performs aggregated classification on the features of the knowledge information and the keywords to obtain the final short text classification result, generating a more detailed classification result for the short text from different granularities so as to improve classification accuracy.
Optionally, the determining unit 110 is further configured to perform entity identification on the short text, obtain an entity set of the short text, and identify the entity set to determine knowledge information of the short text.
Optionally, the splicing unit 120 is further configured to input the short text, the knowledge information, and the keyword into the neural network model embedding layer, and pre-train the short text, the knowledge information, and the keyword in the embedding layer by using a word vector model to obtain a vector representation of the short text, the knowledge information, and the keyword.
Optionally, the splicing unit 120 includes a first splicing unit and a second splicing unit, where the first splicing unit is configured to splice the vector representations of the short text and the knowledge information in a superior subnetwork of the neural network model to obtain a vector matrix of the knowledge information;
the second splicing unit is used for pre-training the short text and the keywords by adopting a convolutional neural network to obtain character-level vector representations of the short text and the keywords, splicing the character-level vector representations of the short text and the keywords in a sub-network of a lower level of the neural network model to obtain a vector matrix of the keywords at a character level.
Optionally, the calculating unit 140 is configured to input the short text vector matrix and the vector matrix of the character-level short text into the two-way memory network layer for processing, and obtain semantic information of the short text contexts of the upper and lower subnetworks respectively.
Optionally, the calculating unit 140 is further configured to perform attention calculation on the semantic information and the knowledge information of the short text context in the upper-level sub-network to obtain a self-attention result of the knowledge information, calculate a product of the self-attention result of the knowledge information and the semantic information, and splice each product to obtain a vector of the knowledge information;
in a next-level sub-network, attention calculation is carried out on semantic information and keywords of the short text context to obtain self-attention results of the keywords, products of the self-attention results and the semantic information of the keywords are calculated, and each product is spliced to obtain vectors of the keywords.
Optionally, the formula of the attention calculation is y_i = softmax(a_1(tanh(a_2[c_i; p] + b_2))); wherein y_i represents the weight of the knowledge information or keywords with respect to the short text, tanh represents the hyperbolic tangent function, softmax represents normalization of the self-attention result, a_2 represents a weight matrix, a_1 represents a weight vector, b_2 represents an offset vector, p represents an intermediate result, W represents a vector, and c_i represents the i-th knowledge vector in the upper sub-network and the i-th keyword vector in the lower sub-network.
Optionally, the vector of the knowledge information and the vector matrix of the short text are spliced in the superior subnetwork, the feature vector is obtained by performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network, and the feature vector is classified through a full connection layer of the superior subnetwork to obtain a classification result of the superior subnetwork;
splicing the vector of the keyword and the vector matrix of the keyword in the lower sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolution neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the lower sub-network to obtain a classification result of the lower sub-network;
and carrying out combined classification on the classification result of the superior sub-network and the classification result of the subordinate sub-network to obtain the classification result of the short text.
The feasible implementation manners of each unit of the short text classification apparatus provided in the second embodiment of the present application are all described in the first embodiment of the short text classification method, and therefore will not be described here.
EXAMPLE III
Based on the same concept, as shown in fig. 4, a third embodiment of the present application provides an electronic device, which includes a memory 330, a processor 310, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the classification method provided in the first embodiment when executing the computer program.
Fig. 4 is a schematic entity structure diagram of an electronic device according to a third embodiment of the present invention, and as shown in fig. 4, the electronic device may include: a processor (processor)310, a communication interface (communication interface)320, a memory (memory)330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may invoke a computer program stored on the memory 330 and executable on the processor 310 to perform the text classification methods provided by the various embodiments described above, including, for example: determining knowledge information and key words of the short text; embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords; processing the short text vector matrix by adopting a bidirectional memory network layer to obtain semantic information of the short text; performing attention calculation on the semantic information of the short text, the vector matrix of the knowledge information and the vector matrix of the keywords to obtain the knowledge information or the vectors of the keywords of the short text; and performing feature extraction on the vector and semantic information of the short text by using a convolutional neural network to obtain a short text classification result.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (8)
1. A short text classification method is characterized by comprising the following steps:
determining knowledge information and key words of the short text;
embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords, which specifically comprises the following steps: splicing the vector representations of the short text and the knowledge information in a superior subnetwork of the neural network model to obtain a vector matrix of the knowledge information; pre-training the short text and the keywords by adopting a convolutional neural network to obtain character-level vector representations of the short text and the keywords, and splicing the character-level vector representations of the short text and the keywords in a lower-level sub-network of a neural network model to obtain a vector matrix of the character-level keywords;
processing the short text vector matrix by adopting a bidirectional memory network layer to obtain semantic information of the short text;
attention calculation is carried out on semantic information of the short text and a vector matrix of knowledge information or a vector matrix of keywords to obtain a vector of the knowledge information or the keywords;
the method comprises the following steps of utilizing a convolutional neural network to carry out feature extraction on vectors and a vector matrix to obtain a short text classification result, and specifically comprising the following steps: splicing the vector of the knowledge information and the vector matrix of the short text in a superior subnetwork, extracting the characteristics of the spliced matrix through a two-dimensional convolutional neural network to obtain a characteristic vector, and classifying the characteristic vector through a full-connection layer of the superior subnetwork to obtain a classification result of the superior subnetwork;
splicing the vector of the keyword and the vector matrix of the keyword in the lower sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the lower sub-network to obtain a classification result of the lower sub-network;
and carrying out combined classification on the classification result of the superior sub-network and the classification result of the subordinate sub-network to obtain the classification result of the short text.
2. The short text classification method according to claim 1, wherein the determining knowledge information of the short text specifically comprises: and carrying out entity recognition on the short text to obtain an entity set of the short text, and recognizing the entity set to determine knowledge information of the short text.
3. The short text classification method according to claim 2, wherein the short text, the knowledge information and the keyword are embedded in a vector space for stitching to obtain a vector matrix of the short text, the knowledge information and the keyword, and the method further comprises the following steps: inputting the short text, the knowledge information and the keywords into a neural network model embedding layer, and pre-training the short text, the knowledge information and the keywords of the embedding layer by adopting a word vector model to obtain vector representation of the short text, the knowledge information and the keywords.
4. The method for classifying short texts according to claim 1, wherein the processing of the short text vector matrix by the bidirectional memory network layer to obtain the semantic information of the short texts specifically comprises:
and inputting the short text vector matrix and the character-level short text vector matrix into a bidirectional memory network layer for processing, and respectively obtaining semantic information of the short text context of the upper and lower sub-networks.
5. The method for classifying short texts according to claim 1, wherein performing attention calculation on the semantic information of the short texts and the vector matrix of the knowledge information or the vector matrix of the keyword to obtain the vector of the knowledge information or the keyword specifically comprises:
in a superior subnetwork, performing attention calculation on semantic information and knowledge information of a short text context to obtain a self-attention result of the knowledge information, calculating products of the self-attention result of the knowledge information and the semantic information, and splicing each product to obtain a vector of the knowledge information;
in a next-level sub-network, attention calculation is carried out on semantic information of the short text context and the keywords to obtain self-attention results of the keywords, products of the self-attention results of the keywords and the semantic information are calculated, and each product is spliced to obtain vectors of the keywords.
6. The short text classification method according to claim 5,
the calculation formula of the attention calculation is:wherein the weight of the knowledge information or the keyword to the short text is represented, tanhwhich represents a function of the hyperbolic tangent,softmaxindicating that the self-attention results are normalized,a matrix of weights is represented by a matrix of weights,a vector of weights is represented by a vector of weights,which represents the offset vector of the offset vector,pthe intermediate result is shown to be,indicating in the upper subnetwork thatiThe sum of knowledge vectors representing the second in the next subnetworkiA key vector.
7. A short text classification device is characterized by comprising a determining unit, a splicing unit, a processing unit, a calculating unit and a classifying unit;
the determining unit is used for determining knowledge information and keywords of the short text;
the splicing unit is used for embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords, and specifically comprises the following steps: splicing the vector representations of the short text and the knowledge information in a superior subnetwork of the neural network model to obtain a vector matrix of the knowledge information; pre-training the short text and the keywords by adopting a convolutional neural network to obtain character-level vector representations of the short text and the keywords, and splicing the character-level vector representations of the short text and the keywords in a lower-level sub-network of a neural network model to obtain a vector matrix of the character-level keywords;
the processing unit is used for processing the short text vector matrix by adopting a two-way memory network layer to obtain semantic information of the short text;
the computing unit is used for carrying out attention computing on the semantic information of the short text and the vector matrix of the knowledge information or the vector matrix of the keyword to obtain the vector of the knowledge information or the keyword;
the classification unit is used for extracting features of the vectors and the vector matrix by using a convolutional neural network to obtain a short text classification result, and specifically comprises the following steps: splicing the vector of the knowledge information and the vector matrix of the short text in a superior subnetwork, extracting the characteristics of the spliced matrix through a two-dimensional convolutional neural network to obtain a characteristic vector, and classifying the characteristic vector through a full-connection layer of the superior subnetwork to obtain a classification result of the superior subnetwork;
splicing the vector of the keyword and the vector matrix of the keyword in the lower sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the lower sub-network to obtain a classification result of the lower sub-network;
and carrying out combined classification on the classification result of the superior sub-network and the classification result of the subordinate sub-network to obtain the classification result of the short text.
8. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the classification method according to any one of claims 1 to 6 when executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111326798.5A CN113987188B (en) | 2021-11-10 | 2021-11-10 | Short text classification method and device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113987188A CN113987188A (en) | 2022-01-28 |
CN113987188B true CN113987188B (en) | 2022-07-08 |
Family
ID=79747702
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111326798.5A Active CN113987188B (en) | 2021-11-10 | 2021-11-10 | Short text classification method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113987188B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115617990B (en) * | 2022-09-28 | 2023-09-05 | 浙江大学 | Power equipment defect short text classification method and system based on deep learning algorithm |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104834747A (en) * | 2015-05-25 | 2015-08-12 | 中国科学院自动化研究所 | Short text classification method based on convolution neutral network |
KR20180112590A (en) * | 2017-04-04 | 2018-10-12 | 한국전자통신연구원 | System and method for generating multimedia knowledge base |
CN109710761A (en) * | 2018-12-21 | 2019-05-03 | 中国标准化研究院 | The sentiment analysis method of two-way LSTM model based on attention enhancing |
CN110321562A (en) * | 2019-06-28 | 2019-10-11 | 广州探迹科技有限公司 | A kind of short text matching process and device based on BERT |
CN111460142A (en) * | 2020-03-06 | 2020-07-28 | 南京邮电大学 | Short text classification method and system based on self-attention convolutional neural network |
CN113515632A (en) * | 2021-06-30 | 2021-10-19 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Text classification method based on graph path knowledge extraction |
Non-Patent Citations (1)
Title |
---|
Research on Short Text Classification Methods Integrating Topic Models and Word Vectors; Shao Yunfei; China Excellent Master's Theses Full-text Database; 2020-02-15; pp. 25-41 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wen et al. | Ensemble of deep neural networks with probability-based fusion for facial expression recognition | |
KR102071582B1 (en) | Method and apparatus for classifying a class to which a sentence belongs by using deep neural network | |
CN111079639B (en) | Method, device, equipment and storage medium for constructing garbage image classification model | |
Shao et al. | Feature learning for image classification via multiobjective genetic programming | |
CN110796199B (en) | Image processing method and device and electronic medical equipment | |
CN111444344B (en) | Entity classification method, entity classification device, computer equipment and storage medium | |
CN111177383B (en) | Text entity relation automatic classification method integrating text grammar structure and semantic information | |
Chen et al. | Recursive context routing for object detection | |
CN111259144A (en) | Multi-model fusion text matching method, device, equipment and storage medium | |
CN112711953A (en) | Text multi-label classification method and system based on attention mechanism and GCN | |
CN112905795A (en) | Text intention classification method, device and readable medium | |
CN110968725B (en) | Image content description information generation method, electronic device and storage medium | |
CN113051914A (en) | Enterprise hidden label extraction method and device based on multi-feature dynamic portrait | |
CN107122492A (en) | Lyric generation method and device based on picture content | |
CN112100377A (en) | Text classification method and device, computer equipment and storage medium | |
CN114647713A (en) | Knowledge graph question-answering method, device and storage medium based on virtual confrontation | |
CN113987188B (en) | Short text classification method and device and electronic equipment | |
Li et al. | Spatial-temporal dynamic hand gesture recognition via hybrid deep learning model | |
Ghayoumi et al. | Local sensitive hashing (LSH) and convolutional neural networks (CNNs) for object recognition | |
CN113806580A (en) | Cross-modal Hash retrieval method based on hierarchical semantic structure | |
CN116186594B (en) | Method for realizing intelligent detection of environment change trend based on decision network combined with big data | |
CN113761188A (en) | Text label determination method and device, computer equipment and storage medium | |
CN113536784A (en) | Text processing method and device, computer equipment and storage medium | |
CN111680132A (en) | Noise filtering and automatic classifying method for internet text information | |
CN111401069A (en) | Intention recognition method and intention recognition device for conversation text and terminal |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |