CN113987188B - Short text classification method and device and electronic equipment - Google Patents

Short text classification method and device and electronic equipment

Info

Publication number
CN113987188B
CN113987188B (application CN202111326798.5A)
Authority
CN
China
Prior art keywords
short text
vector
knowledge information
keywords
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111326798.5A
Other languages
Chinese (zh)
Other versions
CN113987188A (en)
Inventor
夏书银 (Xia Shuyin)
唐祚 (Tang Zuo)
张勇 (Zhang Yong)
付京成 (Fu Jingcheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202111326798.5A
Publication of CN113987188A
Application granted
Publication of CN113987188B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention discloses a short text classification method, a short text classification device and electronic equipment, relating to the technical field of data processing. The technical scheme is as follows: determine the knowledge information and keywords of the short text; embed the short text, the knowledge information and the keywords into a vector space and splice them to obtain vector matrices of the short text, the knowledge information and the keywords; process the short text vector matrix with a bidirectional memory network layer to obtain the semantic information of the short text; perform attention calculation on the semantic information of the short text and the vector matrix of the knowledge information or of the keywords to obtain the vector of the knowledge information or of the keywords; and perform feature extraction on the vectors and vector matrices with a convolutional neural network to obtain the short text classification result. The method solves the problem in the prior art that text classification cannot be performed accurately when the contextual semantics of the short text are missing, and improves the accuracy of text classification.

Description

Short text classification method and device and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a short text classification method and apparatus, and an electronic device.
Background
In recent years, with the development of deep learning, models such as the convolutional neural network (CNN) and the recurrent neural network (RNN) have been widely used in text classification and have achieved good results on longer texts. However, traditional deep neural networks face a huge challenge on short texts because of the sparsity and ambiguity of the data. To address data sparsity and ambiguity, current work focuses on acquiring more implicit information from short texts in order to understand them. Text representation models mainly comprise explicit and implicit representations. Explicit representation creates effective features based on part-of-speech tagging, knowledge bases and the like, which people can easily understand subjectively; however, explicit representation usually treats each feature in isolation and ignores the contextual information of the short text. Implicit representation maps each word into a high-dimensional vector and expresses the text information with a word-vector matrix, which makes it convenient for a neural network model to learn the information contained in the text. However, implicit representation may fail to capture entity information in the text. For example, in {Ant will release new products}, an implicit representation does not treat Ant as an entity but represents it as an ordinary word, whereas Ant, as the name of a sports brand, may affect the classification tendency.
Model structures that integrate explicit and implicit text representations have been proposed before, but several shortcomings remain. First, when conceptualizing text information, the corresponding weight information is obtained from a large knowledge base and integrated into the neural model, but this weight information is static and independent of the text information. Second, the keyword information of the text is often ignored, especially in texts where little knowledge information can be acquired, as in the binary sentiment classification task.
Disclosure of Invention
The invention aims to provide a short text classification method, a short text classification device and electronic equipment that solve the problem in prior-art short text classification methods that texts cannot be classified accurately when the contextual semantics of the short text are missing.
The technical purpose of the invention is realized by the following technical scheme:
in a first aspect, the present invention provides a short text classification method, including the following steps:
determining knowledge information and key words of the short text;
embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords;
processing the short text vector matrix by adopting a bidirectional memory network layer to obtain semantic information of the short text;
attention calculation is carried out on semantic information of the short text and a vector matrix of knowledge information or a vector matrix of keywords to obtain a vector of the knowledge information or the keywords;
and performing feature extraction on the vector and the vector matrix by using a convolutional neural network to obtain a short text classification result.
The method addresses the lack of short text contextual semantics in prior-art short text classification methods. The invention therefore expands the representation range of the short text by determining its knowledge information and keywords. Because existing classification methods embed knowledge information only statically, without regard to the semantic information of the short text's context, a context-based self-attention mechanism is proposed that selectively embeds the knowledge information according to the contextual information. In addition, existing classification methods usually ignore the effect of insufficient knowledge information, so the invention proposes using a convolutional neural network to extract feature information from the keyword and knowledge-information vectors and from the semantic information of the short text, and then aggregating and classifying the knowledge-information and keyword features to obtain the final short text classification result. This yields finer-grained classification of the short text at different granularities and improves classification accuracy.
Further, entity recognition is carried out on the short text to obtain an entity set of the short text, and the entity set is recognized to determine knowledge information of the short text.
Further, the short text, the knowledge information and the keywords are input into a neural network model embedding layer, and the word vector model is adopted to pre-train the short text, the knowledge information and the keywords in the embedding layer to obtain vector representation of the short text, the knowledge information and the keywords.
Further, the vector representations of the short text and the knowledge information are spliced in a superior sub-network of the neural network model to obtain a vector matrix of the knowledge information;
the short text and the keywords are pre-trained by adopting a convolutional neural network to obtain character-level vector representations of the short text and the keywords, and the character-level vector representations of the short text and the keywords are spliced in a sub-network at the lower level of a neural network model to obtain a vector matrix of the keywords at the character level.
Further, the short text vector matrix and the character-level short text vector matrix are input to a bidirectional memory network layer for processing, and semantic information of the short text context of the upper and lower sub-networks is respectively obtained.
Furthermore, in the upper-level sub-network, attention calculation is carried out on semantic information and knowledge information of the short text context to obtain a self-attention result of the knowledge information, products of the self-attention result of the knowledge information and the semantic information are calculated, and each product is spliced to obtain a vector of the knowledge information;
in a next-level sub-network, attention calculation is carried out on semantic information of the short text context and the keywords to obtain self-attention results of the keywords, products of the self-attention results of the keywords and the semantic information are calculated, and each product is spliced to obtain vectors of the keywords.
Further, the calculation formula of the attention calculation is $y_i = \mathrm{softmax}(a_1(\tanh(a_2[c_i; p] + b_2)))$, wherein $y_i$ represents the weight of the knowledge information or the keyword with respect to the short text, $\tanh$ represents the hyperbolic tangent function, softmax represents normalizing the self-attention result, $a_2$ represents a weight matrix, $a_1$ represents a weight vector, $b_2$ represents an offset vector, $p$ represents an intermediate result, and $c_i$
Further, splicing the vector of the knowledge information and the vector matrix of the short text in the superior sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the superior sub-network to obtain a classification result of the superior sub-network;
splicing the vector of the keywords and the character-level vector matrix of the short text in the lower sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the lower sub-network to obtain a classification result of the lower sub-network;
and carrying out combined classification on the classification result of the superior sub-network and the classification result of the subordinate sub-network to obtain the classification result of the short text.
In a second aspect, the invention provides a short text classification device based on keywords and knowledge information, which is used for realizing the classification method provided in the first aspect, and comprises a determining unit, a splicing unit, a processing unit, a calculating unit and a classifying unit;
the determining unit is used for determining knowledge information and keywords of the short text;
the splicing unit is used for embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords;
the processing unit is used for processing the short text vector matrix by adopting a two-way memory network layer to obtain semantic information of the short text;
the computing unit is used for carrying out attention computing on the semantic information of the short text and the vector matrix of the knowledge information or the vector matrix of the keyword to obtain the vector of the knowledge information or the keyword;
and the classification unit is used for extracting the characteristics of the vector and the vector matrix by using a convolutional neural network to obtain a short text classification result.
In a third aspect, the present invention provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the classification method as provided in the first aspect when executing the computer program.
Compared with the prior art, the invention has the following beneficial effects:
the invention firstly conceptualizes the short text to obtain knowledge information and extracts keywords of the short text, provides a concept of an upper and lower two-stage sub-network, and trains the text information and the knowledge information to generate a vector matrix by using a pre-trained word vector model in the upper sub-network. Then, an attention mechanism based on the context of the short text is introduced to measure the importance degree of knowledge information to the short text; and embedding the measured knowledge information and semantic information into a two-dimensional convolution network to capture features and finally classifying. In the lower network, inspired by character level embedding, the text and the keywords are embedded by using the character level embedding to obtain different granularity characteristic information, then the upper sub-network and the lower sub-network are kept consistent in subsequent operation, and finally the classification results of the upper sub-network and the lower sub-network are aggregated and classified to obtain the final text classification result.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network model structure according to an embodiment of the present invention;
FIG. 3 is a block diagram of a frame of an apparatus according to an embodiment of the present invention;
fig. 4 is a block diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and the accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not used as limiting the present invention.
It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly connected to the other element. When an element is referred to as being "connected to" another element, it can be directly or indirectly connected to the other element.
It will be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like, as used herein, refer to an orientation or positional relationship indicated in the drawings that is solely for the purpose of facilitating the description and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and is therefore not to be construed as limiting the invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Example one
The method provided by this embodiment can be applied to the classification of short texts with fewer than 15 characters as well as short texts with 15-25 characters. It solves the problem that prior-art short text classification methods lack the contextual semantics of the short text, making the classification of short texts more accurate.
As shown in fig. 1, the first embodiment provides a short text classification method, which includes the following steps:
step S10, determining knowledge information and keywords of the short text.
Specifically, the short text information is conceptualized. This is accomplished using existing general-purpose knowledge bases (e.g., Yago, Freebase, and Probase). The Probase knowledge base is used here because the information it contains is more extensive, so more concept information can be mined from the short text. The entity set E of the short text is obtained through the entity-recognition network interface provided by Probase; then, for each entity e in E, its concept information is acquired from the existing knowledge base using the isA relation as the criterion. For example, for the short text "Yahoo fixes two flaws in mail system", the entity set E = {Yahoo mail} is obtained through the entity-recognition network interface of Probase, and conceptualizing the isA relations selected for the Yahoo entity yields the concept set C = {search engine company, application service}.
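As an illustrative sketch of such a conceptualization lookup: the Microsoft Concept Graph service that once exposed Probase has been retired, so the endpoint, parameters and response shape below are assumptions for illustration, not the interface used by this embodiment.

```python
import requests

def conceptualize(entity, top_k=5):
    """Query a Probase-style isA service for the concepts of one entity.

    The URL and JSON shape are assumptions modeled on the former
    Microsoft Concept Graph demo API; substitute the endpoint of
    whatever knowledge base is actually available.
    """
    resp = requests.get(
        "https://concept.research.microsoft.com/api/Concept/ScoreByProb",
        params={"instance": entity, "topK": top_k},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"search engine company": 0.31, ...}

# For E = {"Yahoo mail"}, this kind of lookup yields concepts such as
# "search engine company" and "application service".
```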
Keywords are extracted from the short text. In this embodiment, the keyword set K of the text is obtained through the Yake keyword extraction algorithm, an unsupervised keyword extraction algorithm whose features include word capitalization, word position, word frequency, contextual relations, and how often a word appears in sentences. For example, for the short text "Yahoo fixes two flaws in mail system", the keyword extraction algorithm yields K = {Yahoo, fixes, flaws}.
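The open-source yake package implements this extractor; a minimal usage sketch follows (the parameter values are illustrative, not the embodiment's settings):

```python
import yake

# n=1 keeps unigram keywords; top=3 keeps the three best-scored candidates.
extractor = yake.KeywordExtractor(lan="en", n=1, top=3)
scored = extractor.extract_keywords("Yahoo fixes two flaws in mail system")
# yake returns (keyword, score) pairs, where a lower score means more relevant.
K = [kw for kw, score in scored]
```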
Step S20, embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain the vector matrices of the short text, the knowledge information and the keywords.
Specifically, as shown in fig. 2, unlike other short text classification methods that use only the knowledge information of the short text to expand the text features, the embodiment of the present application additionally uses the keyword information of the text and embeds it into the lower-level sub-network for classification. It proposes inputting the short text, the knowledge information and the keywords into the embedding layers of the upper-level and lower-level sub-networks of the neural network model for splicing, to obtain the vector matrices of the short text.
Step S30, processing the short text vector matrix by using a bidirectional memory network layer to obtain the semantic information of the short text.
Specifically, the operation of the upper-level sub-network coincides with that of the lower-level sub-network, so the upper-level sub-network is described as the example. The word vector matrix of the short text obtained by the input unit, $W_w = \{W_1, W_2, \ldots, W_n\}$, is input into the LSTM network to obtain the contextual semantic information of the short text in the upper-level sub-network; similarly, character-level contextual semantic information of the short text is obtained in the lower-level sub-network.
Step S40, performing attention calculation on the semantic information of the short text and the vector matrix of the knowledge information or of the keywords to obtain the vector of the knowledge information or of the keywords.
Specifically, the knowledge information and the keywords of the short text supplement its feature information and help determine the class label of the short text. In the upper-level sub-network, attention calculation is performed on the short text semantic information and the knowledge-information encoding; in the lower-level sub-network, attention calculation is performed on the short text semantic information and the keyword encoding. A context-dependent attention mechanism is proposed that calculates the weight of a concept or keyword according to the semantic information contained in the context of the short text.
Step S50, performing feature extraction on the vectors and vector matrices by using a convolutional neural network to obtain the short text classification result.
Specifically, a convolutional neural network (CNN) can extract more feature information from short texts. The semantic information of the short text and the vector of the knowledge information are spliced in the upper-level sub-network as its input, and the semantic information of the short text and the vector of the keywords are spliced in the lower-level sub-network as its input. The inputs of the two sub-networks are then convolved, pooled and classified by the convolutional neural network to obtain the classification results of the upper-level and lower-level sub-networks, and these results are aggregated and classified to obtain the final short text classification result.
According to the technical scheme above, the short text classification method expands the representation range of the short text by determining its knowledge information and keywords. Because existing classification methods embed knowledge information only statically, without regard to the semantic information of the short text's context, a context-based self-attention mechanism is proposed that selectively embeds the knowledge information according to the contextual information. In addition, existing classification methods often ignore the effect of insufficient knowledge information, so the method uses a convolutional neural network to extract feature information from the keyword and knowledge-information vectors and from the semantic information of the short text, and aggregates and classifies the knowledge-information and keyword features to obtain the final short text classification result, producing finer classification results for the short text at different granularities and improving classification accuracy.
A description is given below of a possible implementation manner of each step of the short text classification method provided in the first embodiment of the present application.
On the basis of the first embodiment, in a further embodiment of the present application, entity recognition is performed on the short text to obtain an entity set of the short text, and the entity set is recognized to determine knowledge information of the short text.
Specifically, how to determine the knowledge information of the short text is already described in step S10, and is not described here.
Based on the above embodiment, in a further embodiment of the present application, the short text, the knowledge information and the keywords are input into the neural network model embedding layer, and the word vector model is used to pre-train the short text, the knowledge information and the keywords in the embedding layer, so as to obtain the vector representation of the short text, the knowledge information and the keywords.
Specifically, in the embedding layer of the upper-level sub-network, the words and concepts are embedded into a high-dimensional vector space. Word vectors pre-trained with the Word2vec model are used here to obtain the vector representation of each word. $W_w$ and $W_c$ denote the embedded representations of the words and of the knowledge information, respectively. The concrete formulas are as follows:

$W_w = w_1 \oplus w_2 \oplus \cdots \oplus w_m$  (1)

$W_c = c_1 \oplus c_2 \oplus \cdots \oplus c_n$  (2)

It should be explained that one short text contains several words; for example, "Zhang San went to an orchard today and planted many fruit trees" contains words such as orchard, fruit tree and plant. $\oplus$ denotes the splicing operation in all embodiments of this application; $m$ and $n$ denote the maximum numbers of words and of pieces of knowledge information respectively, $w_i$ is the vector representation of the $i$-th word, and $c_i$ is the vector representation of the $i$-th piece of knowledge information. The vector representation $W_w$ of the short text and the vector representation $W_c$ of the knowledge information are finally obtained through the splicing operation. If the vector length of the text or knowledge information is insufficient, it is padded with 0.
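A minimal sketch of this embed-and-pad step, assuming gensim KeyedVectors for the pre-trained Word2vec vectors (the file name, dimension and maximum lengths are illustrative assumptions):

```python
import numpy as np
from gensim.models import KeyedVectors

# Pre-trained Word2vec vectors; the path is a placeholder.
wv = KeyedVectors.load_word2vec_format("word2vec.bin", binary=True)

def embed_and_pad(tokens, max_len, dim=300):
    """Stack each token's vector row by row and zero-pad to max_len,
    mirroring the splicing in formulas (1) and (2)."""
    mat = np.zeros((max_len, dim), dtype=np.float32)
    for i, tok in enumerate(tokens[:max_len]):
        if tok in wv:               # out-of-vocabulary tokens stay zero
            mat[i] = wv[tok]
    return mat

W_w = embed_and_pad(["Yahoo", "fixes", "two", "flaws"], max_len=25)  # words
W_c = embed_and_pad(["company", "service"], max_len=5)               # concept terms
```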
In the lower-level sub-network, a standard convolutional neural network (CNN) is used to obtain the character-level vector representation $e^w_i$ of the $i$-th word and the character-level vector representation $e^k_i$ of the $i$-th keyword, where $t$ and $v$ are the maximum numbers of words and keywords respectively. The vector matrices $E_w$ and $E_k$ of the character-level short text and of the keyword set are obtained through the same splicing operation as in the upper-level sub-network:

$E_w = e^w_1 \oplus e^w_2 \oplus \cdots \oplus e^w_t$  (3)

$E_k = e^k_1 \oplus e^k_2 \oplus \cdots \oplus e^k_v$
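A sketch of such character-level embedding in PyTorch, with one convolution and max-over-time pooling per word; the alphabet size, channel widths and kernel size are illustrative assumptions:

```python
import torch
import torch.nn as nn

class CharCNNEmbedder(nn.Module):
    """Embed characters, convolve, then max-pool over the character axis
    so each word (or keyword) becomes one fixed-size vector."""
    def __init__(self, n_chars=128, char_dim=16, out_dim=50, kernel=3):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, out_dim, kernel_size=kernel, padding=1)

    def forward(self, char_ids):          # (n_words, max_word_len) int tensor
        x = self.char_emb(char_ids)       # (n_words, max_word_len, char_dim)
        x = self.conv(x.transpose(1, 2))  # (n_words, out_dim, max_word_len)
        return x.max(dim=2).values        # (n_words, out_dim)

# Splicing the rows for the text's words gives E_w; the keywords' rows give E_k.
```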
On the basis of the first embodiment, in a further embodiment of the present application, the vector representations of the short text and the knowledge information are spliced in a superior subnetwork of the neural network model to obtain a vector matrix of the knowledge information;
the short text and the keywords are pre-trained by adopting a convolutional neural network to obtain character-level vector representations of the short text and the keywords, and the character-level vector representations of the short text and the keywords are spliced in a sub-network at the lower level of a neural network model to obtain a vector matrix of the keywords at the character level.
Specifically, how to obtain the vector matrix of the short text, the knowledge information and the keyword is described in the implementation of the above embodiment, and is not described here.
Based on the first embodiment, in a further embodiment of the present application, the short text vector matrix and the vector matrix of the character-level short text are both input to the bidirectional memory network layer for processing, so as to obtain semantic information of the short text contexts of the upper and lower subnetworks respectively.
Specifically, the upper-level and lower-level sub-networks perform the same operations on their vector matrices, so the upper-level sub-network is taken as the example. The word vector matrix obtained from the input unit, $W_w = \{W_1, W_2, \ldots, W_n\}$, is input into the LSTM network to obtain the contextual semantic information of the text. The forward LSTM reads the input in the normal order ($W_1 \sim W_n$), as in formula (4), and the backward LSTM reads it in reverse order ($W_n \sim W_1$), as in formula (5):

$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}(w_i)$  (4)

$\overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}(w_i)$  (5)

$h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$  (6)

where $h_t$ represents the neuron output at time $t$ and $w_i$ represents the $i$-th short text vector. Each forward output $\overrightarrow{h_t}$ at time $t$ is combined with the corresponding backward output $\overleftarrow{h_t}$ to obtain the final $h_t$, as in formula (6). $H_{sup}$ denotes the semantic representation of the upper-level sub-network, i.e., $H_{sup} = \{h_1, h_2, \ldots, h_t\}$. Using the same procedure as in the upper-level sub-network, $E_w = \{E_1, E_2, \ldots, E_t\}$ is input into the LSTM network in the lower-level sub-network to obtain its semantic representation $H_{sub}$.
In the lower-level sub-network the calculation is the same as in the upper-level sub-network: the Q, K and V vectors there are all equal to $H_{sub}$, and the final $E_k$ is obtained through the same operations as in the upper-level sub-network.
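In PyTorch, nn.LSTM with bidirectional=True realizes formulas (4)-(6) directly, concatenating the forward and backward hidden states at every time step; a sketch with an assumed hidden size r:

```python
import torch
import torch.nn as nn

r = 128                                  # neurons per direction (assumed)
bilstm = nn.LSTM(input_size=300, hidden_size=r,
                 batch_first=True, bidirectional=True)

W_w = torch.randn(1, 25, 300)            # (batch, words, embedding dim)
H_sup, _ = bilstm(W_w)                   # (1, 25, 2r); each step is [forward; backward]
```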
On the basis of the first embodiment, in a further embodiment of the present application, in the upper-level sub-network, attention calculation is performed on semantic information and knowledge information of a short text context to obtain a self-attention result of the knowledge information, products of the self-attention result and the semantic information of the knowledge information are calculated, and each product is spliced to obtain a vector of the knowledge information;
in a next-level sub-network, attention calculation is carried out on semantic information of the short text context and the keywords to obtain self-attention results of the keywords, products of the self-attention results of the keywords and the semantic information are calculated, and each product is spliced to obtain vectors of the keywords.
Specifically, since the lower-level sub-network performs the same calculation as the upper-level sub-network, the upper-level sub-network is explained as the example. First, a scaled dot-product attention mechanism is used to capture the word-word dependencies within a sentence and to learn the internal structure of the sentence. Given a query matrix Q, a key matrix K and a value matrix V, where Q, K and V are three matrices of the same value, all equal to $H_{sup}$; $2r$ denotes the scaling dimension and $r$ denotes the number of neurons of the upper-level sub-network. The result A of this calculation is subjected to a max-pooling operation, formula (8), so that the word dependencies of the short text are represented by the maximum value in each dimension. The specific formulas are as follows:

$A = \mathrm{softmax}\!\left(\dfrac{QK^{T}}{\sqrt{2r}}\right)V$  (7)

$p = \mathrm{maxpool}(A)$  (8)
after calculating p, we propose a context-based attention calculation for calculating the importance of knowledge information to text in the upper sub-network, and the specific formula is as follows:
yi=softmax(a1(tanh(a2[ci;p]+b2))) (9)
Figure BDA0003347190030000082
yi represents the weight of a concept to text, and a larger y represents a greater importance of this concept/keyword to short text. tanh is a hyperbolic tangent function, normalizing the attention result to [0,1 ] using a softmax function]Within the range of (a).
Figure BDA0003347190030000083
A matrix of weights is represented by a matrix of weights,
Figure BDA0003347190030000084
representing weight vectors, R being a vector space representation, drRepresents a hyper-parameter, b2Representing an offset vector. Finally, the calculated weight yiMultiplication by
Figure BDA0003347190030000085
And splicing to obtain final WcAs in the above formula (10).
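A sketch of formulas (7)-(10) in PyTorch, taking $H_{sup}$ as an (n, 2r) matrix and the concepts C as an (n_c, 2r) matrix; the parameter shapes (a2 of size d_r x 4r, a1 and b2 of size d_r) follow from [c_i; p] being 4r-dimensional and are otherwise assumptions:

```python
import torch
import torch.nn.functional as F

def context_attention(H_sup, C, a1, a2, b2):
    """H_sup: (n, 2r) sentence encoding; C: (n_c, 2r) concept vectors;
    a2: (d_r, 4r), a1: (d_r,), b2: (d_r,). Returns the spliced W_c."""
    d = H_sup.size(-1)                                         # d = 2r
    A = F.softmax(H_sup @ H_sup.T / d ** 0.5, dim=-1) @ H_sup  # (7): Q = K = V = H_sup
    p = A.max(dim=0).values                                    # (8): max-pool -> (2r,)
    scores = torch.stack(
        [a1 @ torch.tanh(a2 @ torch.cat([c, p]) + b2) for c in C])
    y = F.softmax(scores, dim=0)                               # (9): one weight per concept
    return torch.cat([y[i] * C[i] for i in range(C.size(0))])  # (10): weighted splice
```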
In a further embodiment of the present application, based on the first embodiment, the calculation formula of the attention calculation is $y_i = \mathrm{softmax}(a_1(\tanh(a_2[c_i; p] + b_2)))$, wherein $y_i$ represents the weight of the knowledge information or the keyword with respect to the short text, $\tanh$ represents the hyperbolic tangent function, softmax represents normalizing the self-attention result, $a_2$ represents a weight matrix, $a_1$ represents a weight vector, $b_2$ represents an offset vector, $p$ represents an intermediate result, and $c_i$ represents the $i$-th knowledge vector in the upper-level sub-network and the $i$-th keyword vector in the lower-level sub-network.
Specifically, the previous embodiment has already explained how to perform the attention calculation, and therefore is not described here.
On the basis of the first embodiment, in a further embodiment of the present application, the vector of the knowledge information and the vector matrix of the short text are spliced in the upper-level sub-network, feature extraction is performed on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and the feature vector is classified through a full connection layer of the upper-level sub-network to obtain a classification result of the upper-level sub-network;
splicing the vector of the keywords and the character-level vector matrix of the short text in the lower sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the lower sub-network to obtain a classification result of the lower sub-network;
and carrying out combined classification on the classification result of the superior sub-network and the classification result of the subordinate sub-network to obtain the classification result of the short text.
Specifically, in the upper-level sub-network, the semantic information $H_{sup}$ and $W_c$ are spliced as the input $W_{sup}$; in the lower-level sub-network, $H_{sub}$ and $E_k$ are spliced as the input $W_{sub}$. The corresponding formulas are as follows:

$W_{sup} = H_{sup} \oplus W_c$  (11)

$W_{sub} = H_{sub} \oplus E_k$  (12)

where the inputs lie in the vector space $\mathbb{R}$, $m$ denotes the word-vector dimension, and $n_c$/$n_k$ denote the numbers of concepts/keywords. A CNN model is then used to perform convolution, pooling and classification on the upper-level and lower-level sub-networks respectively.

First, convolution kernels with the width fixed to $m$ and different heights $h$ are used to convolve $W_{sup}$ and $W_{sub}$ respectively, so as to extract the features of the short text and generate a set of feature vectors $v_i$. The generated feature vectors $[v_1; \ldots; v_i]$ are activated through the ReLU activation function. The specific formula is as follows:

$S_{sup} = \mathrm{relu}(w \cdot v_i + b)$  (13)

where $w$ is a weight matrix with the same dimensions as $v_i$ and $b$ represents an offset vector. $S_{sub}$ of the lower-level sub-network is obtained by the same operation.

In the pooling layer, max pooling is used: the maximum value in a region is taken as the representative output, and a fixed-length vector $T_i$ is extracted from the feature map. This improves the generalization ability of the neural network model and reduces the network parameters. After CNN pooling, a fully connected softmax(·) layer is introduced to classify the upper-level and lower-level sub-networks separately. Finally, the classification results of the two sub-networks are jointly classified to obtain the final classification result Output, as shown in fig. 2 and formula (14).
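A sketch of the convolution-pooling-classification stage for one sub-network in PyTorch, in the spirit of formulas (11)-(13); the kernel heights, filter count and class count are illustrative assumptions, and the final comment shows only one possible way to combine the two sub-network outputs, since the exact joint-classification formula (14) is not reproduced here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubNetClassifier(nn.Module):
    """Width-m kernels of several heights, ReLU, max-pooling over time,
    then a fully connected layer producing per-class scores."""
    def __init__(self, m=300, heights=(2, 3, 4), n_filters=100, n_classes=2):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(1, n_filters, kernel_size=(h, m)) for h in heights)
        self.fc = nn.Linear(n_filters * len(heights), n_classes)

    def forward(self, W):                   # W: (batch, rows, m), e.g. W_sup
        x = W.unsqueeze(1)                  # add a channel dimension
        feats = [F.relu(conv(x)).squeeze(3).max(dim=2).values
                 for conv in self.convs]    # one pooled vector per kernel height
        return self.fc(torch.cat(feats, dim=1))

# One possible joint classification of the two sub-networks (assumed form):
# probs = F.softmax(upper_logits + lower_logits, dim=1)
```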
by combining the above technical solutions, as shown in fig. 2, fig. 2 is a proposed short text network classification model, and the model is generally composed of an upper level network and a lower level network, and each of the upper level network and the lower level network includes four units. The method comprises the steps that four units are arranged in a superior network, short texts are conceptualized by using an external knowledge base (base), corresponding word vector matrixes are generated by using pre-trained word vectors, and the vector matrixes of the texts are input into an LSTM network to obtain semantic representation of the texts. Thirdly, the semantic representation and the concept word vector matrix are subjected to a dynamic attention mechanism to obtain a knowledge information vector. And finally, connecting the semantic representation with the vector of the knowledge information together to complete classification through a CNN network. In a lower-level network, keywords in a short text are obtained by using a Yake keyword automatic extraction algorithm, corresponding vector representation is generated through character-level features, other units in the lower-level network are consistent with a higher-level sub-network, and knowledge information is replaced by keyword information. And finally, combining the classification results of the upper and lower networks together, and acquiring the probability of each class by using an output layer.
Example two
Based on the same concept, the second embodiment provides a short text classification device, which can be applied to a computer and some other electronic devices to execute the short text classification method described in the first embodiment, as shown in fig. 3, which shows a structural block diagram of the short text classification device provided in the second embodiment of the present application, and includes a determining unit 110, a splicing unit 120, a processing unit 130, a calculating unit 140, and a classifying unit 150;
the determining unit 110 is configured to determine knowledge information and keywords of a short text;
the splicing unit 120 is configured to embed the short text, the knowledge information, and the keyword into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information, and the keyword;
the processing unit 130 is configured to process the short text vector matrix by using a two-way memory network layer to obtain semantic information of the short text;
the calculating unit 140 is configured to perform attention calculation on the semantic information of the short text and the vector matrix of the knowledge information or the vector matrix of the keyword to obtain a vector of the knowledge information or the keyword;
the classification unit 150 is configured to perform feature extraction on the vector and the vector matrix by using a convolutional neural network to obtain a short text classification result.
According to the technical scheme above, the short text classification device of the second embodiment of the application expands the representation range of the short text by determining its knowledge information and keywords. Because existing classification methods embed knowledge information only statically, without regard to the semantic information of the short text's context, a context-based self-attention mechanism is proposed that selectively embeds the knowledge information according to the contextual information. In addition, existing classification methods often ignore the effect of insufficient knowledge information, so the device uses a convolutional neural network to extract feature information from the keyword and knowledge-information vectors and from the semantic information of the short text, and aggregates and classifies the knowledge-information and keyword features to obtain the final short text classification result, producing finer classification results for the short text at different granularities and improving classification accuracy.
Optionally, the determining unit 110 is further configured to perform entity identification on the short text, obtain an entity set of the short text, and identify the entity set to determine knowledge information of the short text.
Optionally, the splicing unit 120 is further configured to input the short text, the knowledge information, and the keyword into the neural network model embedding layer, and pre-train the short text, the knowledge information, and the keyword in the embedding layer by using a word vector model to obtain a vector representation of the short text, the knowledge information, and the keyword.
Optionally, the splicing unit 120 includes a first splicing unit and a second splicing unit, where the first splicing unit is configured to splice the vector representations of the short text and the knowledge information in a superior subnetwork of the neural network model to obtain a vector matrix of the knowledge information;
the second splicing unit is used for pre-training the short text and the keywords by adopting a convolutional neural network to obtain character-level vector representations of the short text and the keywords, splicing the character-level vector representations of the short text and the keywords in a sub-network of a lower level of the neural network model to obtain a vector matrix of the keywords at a character level.
Optionally, the calculating unit 140 is configured to input the short text vector matrix and the vector matrix of the character-level short text into the two-way memory network layer for processing, and obtain semantic information of the short text contexts of the upper and lower subnetworks respectively.
Optionally, the calculating unit 140 is further configured to perform attention calculation on the semantic information and the knowledge information of the short text context in the upper-level sub-network to obtain a self-attention result of the knowledge information, calculate a product of the self-attention result of the knowledge information and the semantic information, and splice each product to obtain a vector of the knowledge information;
in a next-level sub-network, attention calculation is carried out on semantic information and keywords of the short text context to obtain self-attention results of the keywords, products of the self-attention results and the semantic information of the keywords are calculated, and each product is spliced to obtain vectors of the keywords.
Optionally, the formula of the attention calculation is $y_i = \mathrm{softmax}(a_1(\tanh(a_2[c_i; p] + b_2)))$, wherein $y_i$ represents the weight of the knowledge information or the keyword with respect to the short text, $\tanh$ represents the hyperbolic tangent function, softmax represents normalizing the self-attention result, $a_2$ represents a weight matrix, $a_1$ represents a weight vector, $b_2$ represents an offset vector, $p$ represents an intermediate result, and $c_i$ represents the $i$-th knowledge vector in the upper-level sub-network and the $i$-th keyword vector in the lower-level sub-network.
Optionally, the vector of the knowledge information and the vector matrix of the short text are spliced in the superior subnetwork, the feature vector is obtained by performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network, and the feature vector is classified through a full connection layer of the superior subnetwork to obtain a classification result of the superior subnetwork;
splicing the vector of the keywords and the character-level vector matrix of the short text in the lower sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the lower sub-network to obtain a classification result of the lower sub-network;
and carrying out combined classification on the classification result of the superior sub-network and the classification result of the subordinate sub-network to obtain the classification result of the short text.
The feasible implementation manners of each unit of the short text classification apparatus provided in the second embodiment of the present application are all described in the first embodiment of the short text classification method, and therefore will not be described here.
Example three
Based on the same concept, as shown in fig. 4, a third embodiment of the present application provides an electronic device, which includes a memory 330, a processor 310, and a computer program stored in the memory and executable on the processor; the processor implements the steps of the classification method provided in the first embodiment when executing the computer program.
Fig. 4 is a schematic diagram of the physical structure of an electronic device according to the third embodiment of the present invention. As shown in fig. 4, the electronic device may include: a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may invoke a computer program stored in the memory 330 and executable on the processor 310 to perform the text classification method provided by the embodiments described above, including, for example: determining knowledge information and keywords of the short text; embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain vector matrices of the short text, the knowledge information and the keywords; processing the short text vector matrix with a bidirectional memory network layer to obtain the semantic information of the short text; performing attention calculation on the semantic information of the short text and the vector matrix of the knowledge information or of the keywords to obtain the vector of the knowledge information or of the keywords; and performing feature extraction on the vectors and the semantic information of the short text using a convolutional neural network to obtain the short text classification result.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A short text classification method is characterized by comprising the following steps:
determining knowledge information and key words of the short text;
embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords, which specifically comprises the following steps: splicing the vector representations of the short text and the knowledge information in a superior subnetwork of the neural network model to obtain a vector matrix of the knowledge information; pre-training the short text and the keywords by adopting a convolutional neural network to obtain character-level vector representations of the short text and the keywords, and splicing the character-level vector representations of the short text and the keywords in a lower-level sub-network of a neural network model to obtain a vector matrix of the character-level keywords;
processing the short text vector matrix by adopting a bidirectional memory network layer to obtain semantic information of the short text;
attention calculation is carried out on semantic information of the short text and a vector matrix of knowledge information or a vector matrix of keywords to obtain a vector of the knowledge information or the keywords;
the method comprises the following steps of utilizing a convolutional neural network to carry out feature extraction on vectors and a vector matrix to obtain a short text classification result, and specifically comprising the following steps: splicing the vector of the knowledge information and the vector matrix of the short text in a superior subnetwork, extracting the characteristics of the spliced matrix through a two-dimensional convolutional neural network to obtain a characteristic vector, and classifying the characteristic vector through a full-connection layer of the superior subnetwork to obtain a classification result of the superior subnetwork;
splicing the vector of the keywords and the character-level vector matrix of the short text in the lower sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the lower sub-network to obtain a classification result of the lower sub-network;
and carrying out combined classification on the classification result of the superior sub-network and the classification result of the subordinate sub-network to obtain the classification result of the short text.
2. The short text classification method according to claim 1, wherein the determining knowledge information of the short text specifically comprises: and carrying out entity recognition on the short text to obtain an entity set of the short text, and recognizing the entity set to determine knowledge information of the short text.
3. The short text classification method according to claim 2, wherein the short text, the knowledge information and the keyword are embedded in a vector space for stitching to obtain a vector matrix of the short text, the knowledge information and the keyword, and the method further comprises the following steps: inputting the short text, the knowledge information and the keywords into a neural network model embedding layer, and pre-training the short text, the knowledge information and the keywords of the embedding layer by adopting a word vector model to obtain vector representation of the short text, the knowledge information and the keywords.
4. The method for classifying short texts according to claim 1, wherein the processing of the short text vector matrix by the bidirectional memory network layer to obtain the semantic information of the short texts specifically comprises:
and inputting the short text vector matrix and the character-level short text vector matrix into a bidirectional memory network layer for processing, and respectively obtaining semantic information of the short text context of the upper and lower sub-networks.
5. The method for classifying short texts according to claim 1, wherein performing attention calculation on the semantic information of the short texts and the vector matrix of the knowledge information or the vector matrix of the keyword to obtain the vector of the knowledge information or the keyword specifically comprises:
in a superior subnetwork, performing attention calculation on semantic information and knowledge information of a short text context to obtain a self-attention result of the knowledge information, calculating products of the self-attention result of the knowledge information and the semantic information, and splicing each product to obtain a vector of the knowledge information;
in a next-level sub-network, attention calculation is carried out on semantic information of the short text context and the keywords to obtain self-attention results of the keywords, products of the self-attention results of the keywords and the semantic information are calculated, and each product is spliced to obtain vectors of the keywords.
6. The short text classification method according to claim 5,
the calculation formula of the attention calculation is:

$y_i = \mathrm{softmax}(a_1(\tanh(a_2[c_i; p] + b_2)))$

wherein $y_i$ represents the weight of the knowledge information or the keyword with respect to the short text, $\tanh$ represents the hyperbolic tangent function, softmax indicates that the self-attention result is normalized, $a_2$ represents a weight matrix, $a_1$ represents a weight vector, $b_2$ represents the offset vector, $p$ represents the intermediate result, and $c_i$ represents the $i$-th knowledge vector in the upper sub-network and the $i$-th keyword vector in the lower sub-network.
7. A short text classification device is characterized by comprising a determining unit, a splicing unit, a processing unit, a calculating unit and a classifying unit;
the determining unit is used for determining knowledge information and keywords of the short text;
the splicing unit is used for embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords, and specifically comprises the following steps: splicing the vector representations of the short text and the knowledge information in a superior subnetwork of the neural network model to obtain a vector matrix of the knowledge information; pre-training the short text and the keywords by adopting a convolutional neural network to obtain character-level vector representations of the short text and the keywords, and splicing the character-level vector representations of the short text and the keywords in a lower-level sub-network of a neural network model to obtain a vector matrix of the character-level keywords;
the processing unit is used for processing the short text vector matrix by adopting a two-way memory network layer to obtain semantic information of the short text;
the computing unit is used for carrying out attention computing on the semantic information of the short text and the vector matrix of the knowledge information or the vector matrix of the keyword to obtain the vector of the knowledge information or the keyword;
the classification unit is used for extracting features of the vectors and the vector matrix by using a convolutional neural network to obtain a short text classification result, and specifically comprises the following steps: splicing the vector of the knowledge information and the vector matrix of the short text in a superior subnetwork, extracting the characteristics of the spliced matrix through a two-dimensional convolutional neural network to obtain a characteristic vector, and classifying the characteristic vector through a full-connection layer of the superior subnetwork to obtain a classification result of the superior subnetwork;
splicing the vector of the keywords and the character-level vector matrix of the short text in the lower sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the lower sub-network to obtain a classification result of the lower sub-network;
and carrying out combined classification on the classification result of the superior sub-network and the classification result of the subordinate sub-network to obtain the classification result of the short text.
8. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the classification method according to any one of claims 1 to 6 when executing the computer program.
CN202111326798.5A 2021-11-10 2021-11-10 Short text classification method and device and electronic equipment Active CN113987188B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111326798.5A CN113987188B (en) 2021-11-10 2021-11-10 Short text classification method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111326798.5A CN113987188B (en) 2021-11-10 2021-11-10 Short text classification method and device and electronic equipment

Publications (2)

Publication Number / Publication Date
CN113987188A / 2022-01-28
CN113987188B / 2022-07-08

Family

ID=79747702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111326798.5A Active CN113987188B (en) 2021-11-10 2021-11-10 Short text classification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113987188B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115617990B (en) * 2022-09-28 2023-09-05 浙江大学 Power equipment defect short text classification method and system based on deep learning algorithm


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834747A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Short text classification method based on convolution neutral network
KR20180112590A (en) * 2017-04-04 2018-10-12 한국전자통신연구원 System and method for generating multimedia knowledge base
CN109710761A (en) * 2018-12-21 2019-05-03 中国标准化研究院 The sentiment analysis method of two-way LSTM model based on attention enhancing
CN110321562A (en) * 2019-06-28 2019-10-11 广州探迹科技有限公司 A kind of short text matching process and device based on BERT
CN111460142A (en) * 2020-03-06 2020-07-28 南京邮电大学 Short text classification method and system based on self-attention convolutional neural network
CN113515632A (en) * 2021-06-30 2021-10-19 西南电子技术研究所(中国电子科技集团公司第十研究所) Text classification method based on graph path knowledge extraction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Short Text Classification Methods Fusing Topic Models and Word Vectors; Shao Yunfei; China Masters' Theses Full-text Database; 2020-02-15; pp. 25-41 *

Also Published As

Publication number / Publication date
CN113987188A / 2022-01-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant