CN112667782A - Text classification method, device, equipment and storage medium - Google Patents

Text classification method, device, equipment and storage medium

Info

Publication number: CN112667782A
Authority: CN (China)
Prior art keywords: text, phrase, classified, vectors, vector
Legal status: Withdrawn
Application number: CN202110005508.0A
Other languages: Chinese (zh)
Inventors: 王硕, 周星杰, 杨康, 徐成国
Current Assignee: Shanghai Minglue Artificial Intelligence Group Co Ltd
Original Assignee: Shanghai Minglue Artificial Intelligence Group Co Ltd
Application filed by Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority: CN202110005508.0A, filed 2021-01-04; published as CN112667782A on 2021-04-16; withdrawn after publication

Abstract

The application provides a text classification method, apparatus, device, and storage medium, relating to the technical field of natural language processing. The method comprises the following steps: converting at least one word in the text to be classified into at least one word vector; inputting each word vector into a phrase attention submodel of a pre-trained text classification model to obtain a plurality of phrase vectors corresponding to the text to be classified; inputting the phrase vectors and each candidate category in a hierarchical label structure corresponding to the text to be classified into a label attention submodel of the text classification model to obtain a plurality of feature vectors of the text to be classified; and obtaining a classification result of the text to be classified based on the feature vectors, where the classification result represents the category of the text to be classified. Applying the method and device can improve the precision of classifying the text to be classified.

Description

Text classification method, device, equipment and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a text classification method, apparatus, device, and storage medium.
Background
The text classification problem is an important research direction in the field of natural language processing, with applications in fields such as sentiment analysis and information retrieval. Hierarchical multi-label text classification is an important approach to the text classification problem and has attracted researchers' attention in recent years.
At present, hierarchical multi-label text classification methods typically assume that labels are mutually independent, convert the task into a set of binary classification problems, and construct a plurality of binary classifiers to classify the hierarchical multi-label text.
However, complex dependency relationships exist between labels in the hierarchical multi-label text classification task, so classifying hierarchical multi-label text in the prior-art manner reduces text classification precision.
Disclosure of Invention
An object of the present application is to provide a method, an apparatus, a device and a storage medium for text classification, which can improve the accuracy of text classification.
In order to achieve the above purpose, the technical solutions adopted in the embodiments of the present application are as follows:
in a first aspect, an embodiment of the present application provides a text classification method, where the method includes:
converting at least one word in the text to be classified into at least one word vector respectively;
inputting each word vector into a phrase attention sub-model in a text classification model obtained by pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified;
inputting the phrase vectors and candidate categories in a hierarchical label structure corresponding to the text to be classified into a label attention sub-model in the text classification model to obtain a plurality of feature vectors of the text to be classified;
and obtaining a classification result of the text to be classified based on each feature vector of the text to be classified, wherein the classification result is used for representing the category of the text to be classified.
Optionally, the tag attention submodel includes: a graph convolution layer and a label attention layer;
the step of inputting the plurality of phrase vectors and each candidate category in the hierarchical label structure corresponding to the text to be classified into the label attention submodel in the text classification model to obtain a plurality of feature vectors of the text to be classified includes:
and inputting the plurality of phrase vectors and each candidate category vector in a hierarchical label structure corresponding to the text to be classified into the label attention layer, and obtaining a plurality of feature vectors of the text to be classified by the label attention layer according to the plurality of phrase vectors and each candidate category vector output by the graph convolution layer.
Optionally, the obtaining a plurality of feature vectors of the text to be classified according to the plurality of phrase vectors and the candidate category vectors output by the graph convolution layer includes:
determining a weight of each of the candidate categories relative to each of the phrases based on each of the candidate category vectors and each of the phrase vectors;
and obtaining a plurality of feature vectors of the text to be classified according to the weight of each candidate category relative to each phrase and each phrase vector.
Optionally, before the inputting the plurality of phrase vectors and each candidate category vector in the hierarchical tag structure corresponding to the text to be classified into the tag attention layer, and the tag attention layer obtaining a plurality of feature vectors of the text to be classified according to the plurality of phrase vectors and each candidate category vector output by the graph convolution layer, the method further includes:
and carrying out node aggregation processing on each candidate category in the hierarchical label structure by the graph convolution layer to obtain each candidate category vector.
Optionally, the phrase attention submodel includes: a convolutional layer, a bidirectional long short-term memory layer, and a phrase attention layer;
optionally, the inputting each word vector into a phrase attention submodel in a text classification model obtained through pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified includes:
inputting each word vector into the convolutional layer to obtain phrase sequence characteristics;
inputting the phrase sequence features into the bidirectional long short-term memory layer, and extracting phrase context semantic features to obtain phrase semantic features containing the context semantics;
and inputting the phrase semantic features into the phrase attention layer to obtain a plurality of phrase vectors corresponding to the text to be classified.
Optionally, the inputting the phrase semantic features into the phrase attention layer to obtain a plurality of phrase vectors corresponding to the text to be classified includes:
determining a representation vector according to the phrase semantic features;
determining scores of the semantic features of the phrases according to the expression vectors;
and obtaining a plurality of phrase vectors corresponding to the text to be classified according to the scores of the phrase semantic features and the phrase semantic features.
Optionally, the converting at least one word in the text to be classified into at least one word vector respectively includes:
and inputting at least one word in the text to be classified into an embedded model obtained by pre-training to obtain a word vector corresponding to each word.
In a second aspect, an embodiment of the present application further provides a text classification apparatus, where the apparatus includes:
the conversion module is used for converting at least one word in the text to be classified into at least one word vector respectively;
the first input module is used for inputting each word vector into a phrase attention sub-model in a text classification model obtained by pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified;
the second input module is used for inputting the plurality of phrase vectors and each candidate category in the hierarchical label structure corresponding to the text to be classified into the label attention submodel in the text classification model to obtain a plurality of feature vectors of the text to be classified;
and the determining module is used for obtaining a classification result of the text to be classified based on each feature vector of the text to be classified, and the classification result is used for representing the category of the text to be classified.
Optionally, the tag attention submodel includes: a graph convolution layer and a label attention layer;
correspondingly, the second input module is specifically configured to input the plurality of phrase vectors and each candidate category vector in the hierarchical tag structure corresponding to the text to be classified into the tag attention layer, and the tag attention layer obtains a plurality of feature vectors of the text to be classified according to the plurality of phrase vectors and each candidate category vector output by the graph convolution layer.
Optionally, the second input module is further specifically configured to determine, based on each candidate category vector and each phrase vector, a weight of each candidate category with respect to each phrase; and obtaining a plurality of feature vectors of the text to be classified according to the weight of each candidate category relative to each phrase and each phrase vector.
Optionally, the apparatus further comprises:
and the processing module is used for carrying out node aggregation processing on each candidate category in the hierarchical label structure by the graph convolution layer to obtain each candidate category vector.
Optionally, the phrase attention submodel includes: a convolutional layer, a bidirectional long short-term memory layer, and a phrase attention layer;
correspondingly, the first input module is specifically configured to input each word vector into the convolutional layer to obtain phrase sequence features; input the phrase sequence features into the bidirectional long short-term memory layer and extract phrase context semantic features to obtain phrase semantic features containing the context semantics; and input the phrase semantic features into the phrase attention layer to obtain a plurality of phrase vectors corresponding to the text to be classified.
Optionally, the first input module is further specifically configured to determine a representation vector according to the phrase semantic features; determining scores of the semantic features of the phrases according to the expression vectors; and obtaining a plurality of phrase vectors corresponding to the text to be classified according to the scores of the phrase semantic features and the phrase semantic features.
Optionally, the conversion module is specifically configured to input at least one word in the text to be classified into an embedding model obtained through pre-training, so as to obtain a word vector corresponding to each word.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, when the electronic device runs, the processor communicates with the storage medium through the bus, and the processor executes the machine-readable instructions to execute the steps of the text classification method of the first aspect.
In a fourth aspect, the present application provides a storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the text classification method of the first aspect.
The beneficial effect of this application is:
the embodiment of the application provides a text classification method, a text classification device, text classification equipment and a storage medium, wherein the method comprises the following steps: converting at least one word in the text to be classified into at least one word vector respectively; inputting each word vector into a phrase attention sub-model in a text classification model obtained by pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified; inputting a plurality of phrase vectors and each candidate category in a hierarchical label structure corresponding to the text to be classified into a label attention sub-model in the text classification model to obtain a plurality of feature vectors of the text to be classified; and obtaining a classification result of the text to be classified based on each feature vector of the text to be classified, wherein the classification result is used for representing the category of the text to be classified. By adopting the text classification method provided by the embodiment of the application, on the premise that the semantics of the text to be classified is represented by the phrase vectors, the candidate categories in the hierarchical label structure corresponding to the text to be classified are input into the label attention submodel in the text classification model, the dependency relationship and the hierarchical structure characteristics among the candidate categories are captured by using the label attention submodel, and then the phrase vectors are combined with the captured dependency relationship and the hierarchical structure characteristics among the candidate categories by using the label attention submodel to obtain the classification result of the text to be classified, so that the precision of classifying the text to be classified can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic flowchart of a text classification method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a text classification model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a hierarchical label structure provided by an embodiment of the present application;
fig. 4 is a schematic flowchart of another text classification method according to an embodiment of the present application;
fig. 5 is a schematic flowchart of another text classification method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a text classification apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Before explaining the embodiments of the present application in detail, an application scenario of the present application is described first. The application scenario may be a certain service scenario, such as academic paper classification, sentiment analysis, or news classification, which is not limited in the present application. Taking a news classification scenario as an example: a device with certain processing capability (e.g., a server) processes a text input by a user to obtain phrase vectors capable of representing the semantic information of the text, and the device obtains the category corresponding to the text based on the phrase vectors and a pre-trained text classification model. The text input by the user may correspond to a plurality of labels, and dependency relationships exist among the labels. By applying the embodiments of the present application, the category of the text input by the user, that is, the label corresponding to the text, can be determined; for example, it can be determined that the label corresponding to the text input by the user is a specific category under the sports news label, and news of that category can then be pushed to the user, so that the content desired by the user is pushed more accurately. Other application scenarios are similar.
The text classification method mentioned in the present application is exemplified below with reference to the drawings. Fig. 1 is a schematic flowchart of a text classification method according to an embodiment of the present application. As shown in fig. 1, the method may include:
s101, converting at least one word in the text to be classified into at least one word vector respectively.
The text to be classified may be a descriptive title of an article, a paper, a news article, etc.; the source of the text to be classified is not limited in the present application. The text to be classified is segmented using the jieba word segmentation tool, special symbols, stop words, and the like contained in the text to be classified are deleted, and a plurality of words contained in the text to be classified are finally obtained. Each word is vectorized by a pre-trained word vectorization model, which converts each word into a corresponding word vector; that is, each word is represented by a word vector of a preset dimension (e.g., 300).
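As a concrete illustration, this step might look as follows; this is a minimal sketch assuming a word2vec model trained in advance with gensim, where the model path and the stop-word set are hypothetical:

    import jieba
    from gensim.models import KeyedVectors

    wv = KeyedVectors.load("word2vec_300d.kv")  # hypothetical pre-trained 300-dim vectors
    stopwords = {"的", "了", "是"}              # illustrative stop-word set

    def text_to_word_vectors(text):
        # Segment the text to be classified with jieba, then drop special
        # symbols and stop words, as described in S101.
        words = [w for w in jieba.lcut(text) if w.strip() and w not in stopwords]
        # Convert each remaining word into its word vector of preset dimension.
        return [wv[w] for w in words if w in wv]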
And S102, inputting each word vector into a phrase attention sub-model in a text classification model obtained through pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified.
The text classification model may include a phrase attention submodel and a tag attention submodel, as shown in fig. 2. The phrase attention submodel before training and the label attention submodel before training may be trained as a whole to obtain the text classification model, or the phrase attention submodel before training and the label attention submodel before training may be trained respectively to obtain the phrase attention submodel and the label attention submodel, and the text classification model may be obtained by combining the phrase attention submodel and the label attention submodel, which is not limited in this application.
The phrase attention submodel can convert the obtained text features of the text to be classified represented by the word vectors into the text features of the text to be classified represented by a plurality of phrase vectors. That is to say, the phrase attention submodel may capture phrase features formed by a plurality of words in the text to be classified, so as to improve the accuracy of obtaining the text features of the text to be classified.
S103, inputting the plurality of phrase vectors and each candidate category in the hierarchical label structure corresponding to the text to be classified into the label attention submodel in the text classification model to obtain a plurality of feature vectors of the text to be classified.
And S104, obtaining a classification result of the text to be classified based on each feature vector of the text to be classified.
The hierarchical label structure may be as shown in fig. 3. Each candidate category in the hierarchical label structure corresponds to a node in fig. 3, and the label corresponding to a certain node may include the label information corresponding to the other nodes along its path; for example, the sports news label in fig. 3 covers the sports news content under the news category. The hierarchical label structure is related to the application scenario corresponding to the text to be classified: if the text to be classified belongs to a news article, the corresponding hierarchical label structure may be as shown in fig. 3; if the text to be classified belongs to a paper, the labels corresponding to the nodes in the hierarchical label structure may include a chemistry label, a biology label, and a physics label, where the physics label may include an optics label, a plasma label, and the like. That is, the hierarchical label structure may differ across application scenarios, as the sketch below illustrates.
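A hierarchical label structure like fig. 3 can be stored as parent-child edges and turned into the adjacency matrix later consumed by the graph convolution layer; the label names here are hypothetical examples, not taken from the patent's figure:

    import torch

    # Hypothetical hierarchy: parent label -> list of child labels
    hierarchy = {"news": ["sports news", "finance news"],
                 "sports news": ["basketball news", "football news"]}
    labels = sorted({p for p in hierarchy} | {c for cs in hierarchy.values() for c in cs})
    idx = {name: i for i, name in enumerate(labels)}

    adj = torch.zeros(len(labels), len(labels))
    for parent, children in hierarchy.items():
        for child in children:
            adj[idx[parent], idx[child]] = 1.0  # connected nodes are neighbors of each other
            adj[idx[child], idx[parent]] = 1.0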
Each candidate category in the hierarchical tag structure may be encoded in advance to obtain an initial vector corresponding to each candidate category, where the initial vector of a candidate category characterizes that category's own features. The initial vector of each candidate category is input into the tag attention submodel, which can capture the dependency relationships of the candidate categories in the hierarchical tag structure to obtain the target vector of each candidate category; the target vector of a candidate category represents the combination of its own features with the features of the other candidate categories that have connection relationships with it.
Through the label attention submodel, phrase vectors corresponding to the text to be classified can be combined with target vectors of candidate categories, and the corresponding relation between the text to be classified and the candidate categories can be obtained and can be characterized by feature vectors. And finally, inputting each feature vector of the text to be classified into a full connection layer in the label attention submodel, and outputting the category corresponding to the text to be classified by the full connection layer.
To sum up, in the text classification method provided by the present application, on the premise that the semantics of the text to be classified are represented by phrase vectors, the candidate categories in the hierarchical label structure corresponding to the text to be classified are input into the label attention submodel of the text classification model. The label attention submodel captures the dependency relationships and hierarchical structure characteristics between the candidate categories and then combines the phrase vectors with these captured relationships to obtain the classification result of the text to be classified, so the precision of classifying the text to be classified can be improved.
Optionally, the tag attention submodel in fig. 2 may include: a graph convolution layer and a label attention layer. The inputting of the plurality of phrase vectors and each candidate category in the hierarchical label structure corresponding to the text to be classified into the label attention submodel in the text classification model to obtain a plurality of feature vectors of the text to be classified includes:
inputting a plurality of phrase vectors and each candidate category vector in a hierarchical label structure corresponding to the text to be classified into the label attention layer, and obtaining a plurality of feature vectors of the text to be classified by the label attention layer according to the plurality of phrase vectors and each candidate category vector output by the graph convolution layer.
The initial vector of each candidate category in the hierarchical label structure can be obtained according to a word embedding matrix in a pre-trained word vector conversion model, the initial vector of each candidate category is input into the graph volume layer, and the graph volume layer outputs each candidate category vector, wherein each candidate category vector is equivalent to the target vector of each candidate category mentioned above. And respectively inputting each phrase vector and each candidate category vector into the label attention layer, wherein the label attention layer can obtain the interactive characteristics of the text to be classified and each candidate category vector through an attention mechanism, and the interactive characteristics are used for representing a plurality of characteristic vectors of the text to be classified.
Optionally, the tag attention submodel in fig. 2 may further include an output layer, and the output layer may output a classification result of the text to be classified based on each feature vector of the text to be classified.
Fig. 4 is a flowchart illustrating another text classification method according to an embodiment of the present application. As shown in fig. 4, optionally, the obtaining a plurality of feature vectors of the text to be classified according to the plurality of phrase vectors and the candidate category vectors output by the graph convolution layer includes:
s401, determining the weight of each candidate category relative to each phrase based on each candidate category vector and each phrase vector.
The label attention layer parameter $w_k$ in the text classification model obtained by pre-training satisfies the preset training stop condition. Each candidate category vector $u_k^{(I)}$ output by the last layer of the $I$-layer graph convolution layer is input into a fully connected layer in the label attention layer. Specifically, the number of fully connected layers in the label attention layer may be multiple, the same as the number of candidate categories; that is, each candidate category vector may be input into the fully connected layer with its corresponding label attention layer parameter $w_k$, and the candidate category feature output by each fully connected layer is:

$$g_k = w_k u_k^{(I)} + b_k \qquad (1)$$

where $b_k$ denotes the bias term of the label attention layer and is a constant; $k$ takes values from 1 to $P$, where $P$ denotes the number of candidate category vectors.

After the fully connected layers in the label attention layer output the candidate category features $g_k$, each $g_k$ is processed with each phrase vector $v_i$ as follows:

$$\beta_{ki} = \frac{\exp\bigl(g_k^{\top} v_i\bigr)}{\sum_{i'=1}^{n} \exp\bigl(g_k^{\top} v_{i'}\bigr)} \qquad (2)$$

where $i$ takes values from 1 to $n$ and $n$ denotes the number of phrase vectors; the weight $\beta_{ki}$ of each candidate category relative to each phrase is obtained through formula (2).
For example, if $P = 2$, i.e. there are 2 candidate category vectors, and $n = 3$, i.e. there are 3 phrase vectors, the weights of the first candidate category vector relative to the first, second, and third phrase vectors are $\beta_{11}$, $\beta_{12}$, $\beta_{13}$, and the weights of the second candidate category vector relative to the first, second, and third phrase vectors are $\beta_{21}$, $\beta_{22}$, $\beta_{23}$.
S402, obtaining a plurality of feature vectors of the text to be classified according to the weight of each candidate category relative to each phrase and each phrase vector.
The weight $\beta_{ki}$ of each candidate category relative to each phrase is multiplied with each phrase vector $v_i$ and summed to obtain the plurality of feature vectors $o_k$ corresponding to the text to be classified; the number of feature vectors is the same as the number $P$ of candidate category vectors. Specifically, this can be realized by the following formula:

$$o_k = \sum_{i=1}^{n} \beta_{ki}\, v_i \qquad (3)$$

Continuing with the above example: the weight $\beta_{11}$ of the first candidate category vector relative to the first phrase vector is multiplied by the first phrase vector $v_1$; to this is added the weight $\beta_{12}$ relative to the second phrase vector multiplied by the second phrase vector $v_2$, and the weight $\beta_{13}$ relative to the third phrase vector multiplied by the third phrase vector $v_3$, finally giving the first feature vector $o_1$ of the text to be classified. The second candidate category vector is handled similarly, giving the second feature vector $o_2$ of the text to be classified.
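The computation of formulas (1) to (3) can be summarized in a short PyTorch sketch; this is a minimal illustration under assumed tensor shapes, not the patent's implementation, and the per-category fully connected weights are stacked into a single tensor for brevity:

    import torch
    import torch.nn.functional as F

    def label_attention(u, v, w, b):
        # u: (P, d) candidate category vectors from the last graph convolution layer
        # v: (n, d) phrase vectors of the text to be classified
        # w: (P, d, d) per-category fully connected weights; b: (P, d) bias terms
        g = torch.einsum("pde,pe->pd", w, u) + b  # formula (1): candidate category features g_k
        beta = F.softmax(g @ v.T, dim=1)          # formula (2): weights beta_ki over phrases
        return beta @ v                           # formula (3): feature vectors o_k, shape (P, d)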
Optionally, before the step of inputting the phrase vectors and the candidate category vectors in the hierarchical tag structure corresponding to the text to be classified into the tag attention layer, and obtaining the feature vectors of the text to be classified by the tag attention layer according to the phrase vectors and the candidate category vectors output by the graph convolution layer, the method further includes: and carrying out node aggregation processing on each candidate category in the hierarchical label structure by the graph convolution layer to obtain each candidate category vector.
The graph convolution layer is a graph convolutional network (GCN), through which each candidate category in the hierarchical label structure can be represented by a vector. Specifically, as shown in fig. 3, nodes (candidate categories) having a connection relationship in the hierarchical label structure are neighbor nodes of each other. The graph convolution layer aggregates the features of a central node with the features of the neighbor nodes connected to it to obtain the features of the central node, which are characterized by the candidate category vector corresponding to that node. The graph convolution layer can thereby capture the hierarchical and dependency relationships between nodes and express the candidate category vector corresponding to each node more accurately.
When the graph convolution layer has multiple layers (e.g., $I$ layers), the next layer ($l+1$) is updated based on the candidate category vectors of the previous layer ($l$). The specific processing procedure is:

$$h_i^{(l+1)} = \sigma\Bigl(\sum_{j \in N_i} c_{ij}\, w^{(l)} h_j^{(l)} + b^{(l)}\Bigr) \qquad (4)$$

where $h_i^{(l+1)}$ denotes the candidate category vector output by node $i$ at layer $l+1$, $c_{ij}$ is a normalization factor, $N_i$ denotes all neighbor nodes of node $i$, $w^{(l)}$ is the weight matrix parameter of the $l$-th layer, $b^{(l)}$ is the bias of the $l$-th layer, and $\sigma$ denotes a nonlinear activation.
Fig. 5 is a flowchart illustrating another text classification method according to an embodiment of the present application. As shown in fig. 5, the phrase attention submodel in fig. 2 may optionally include: convolutional layers, two-way long-short term memory layers, and phrase attention layers; the above inputting each word vector into the phrase attention submodel in the text classification model obtained by pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified includes:
s501, inputting each word vector into the convolutional layer to obtain phrase sequence characteristics.
The convolutional layer may comprise a plurality of filters of scale $h$. The word vectors $x_i$ are concatenated to obtain a word vector sequence, which is input into each filter of the convolutional layer; a filter of scale $h$ convolves the word vector sequence according to a preset stride, where the stride may be 1, 2, etc.
The filter with the scale h performs convolution on the word vector sequence according to a preset step length to obtain the phrase sequence characteristics by the following process:
$$c_{ij} = f\bigl(w_j \cdot x_{i:i+h-1} + b_j\bigr)$$

where $c_{ij}, b_j \in \mathbb{R}$, $w_j$ denotes the $j$-th filter, $b_j$ is the bias term corresponding to the $j$-th filter, and $f$ denotes a nonlinear function, which may be chosen as the ReLU activation function. $w_j \cdot x_{i:i+h-1}$ denotes the convolution operation of the $j$-th filter on the length-$h$ subsequence $x_{i:i+h-1}$ of the word vector sequence, and $c_{ij}$ is the resulting phrase feature value. Finally, the phrase feature values obtained by the other filters are combined to obtain a phrase sequence feature composed of a plurality of initial phrase vectors.
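In PyTorch terms, this step corresponds to a 1-D convolution over the word vector sequence; the sketch below assumes a scale of $h = 3$, 128 filters, and stride 1, all illustrative choices:

    import torch

    conv = torch.nn.Conv1d(in_channels=300, out_channels=128, kernel_size=3)  # scale h = 3
    x = torch.randn(1, 20, 300)  # a word vector sequence of 20 words, 300 dimensions each
    # Conv1d expects (batch, channels, length), so transpose before and after.
    phrase_seq = torch.relu(conv(x.transpose(1, 2))).transpose(1, 2)
    # phrase_seq: (1, 18, 128), one initial phrase vector per length-h window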
S502, inputting the phrase sequence features into the bidirectional long short-term memory layer, and extracting phrase context semantic features to obtain phrase semantic features containing context semantics.
The bidirectional long short-term memory (Bi-LSTM) layer reads the phrase sequence features from left to right, represented by the forward hidden state $\overrightarrow{h_t}$, and from right to left, represented by the backward hidden state $\overleftarrow{h_t}$. The forward vector $\overrightarrow{h_t}$ and the backward vector $\overleftarrow{h_t}$ corresponding to each initial phrase vector are then spliced:

$$h_t = \bigl[\overrightarrow{h_t};\, \overleftarrow{h_t}\bigr]$$

where $h_t$ is the phrase semantic feature containing context semantics that corresponds to an initial phrase vector, obtained through the bidirectional long short-term memory layer. That is, the bidirectional long short-term memory layer can extract phrase context features, and these phrase features represent the features of the text to be classified more accurately. The phrase semantic features are also expressed as vectors, and their number corresponds to the number of initial phrase vectors.
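PyTorch's bidirectional LSTM already concatenates the forward and backward hidden states, which matches the splicing above; the hidden size below is an illustrative assumption:

    import torch

    bilstm = torch.nn.LSTM(input_size=128, hidden_size=64,
                           batch_first=True, bidirectional=True)
    phrase_seq = torch.randn(1, 18, 128)  # phrase sequence features from step S501
    h, _ = bilstm(phrase_seq)
    # h: (1, 18, 128); each h_t is the spliced [forward; backward] phrase
    # semantic feature containing context semantics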
S503, inputting the phrase semantic features into the phrase attention layer to obtain a plurality of phrase vectors corresponding to the text to be classified.
Optionally, this includes: determining representation vectors according to the phrase semantic features; determining a score for each phrase semantic feature according to the representation vectors; and obtaining a plurality of phrase vectors corresponding to the text to be classified according to the scores of the phrase semantic features and the phrase semantic features themselves.
The phrase attention layer parameter $w_t$ is obtained by training the initial text classification model in advance. Each phrase semantic feature is input into a fully connected layer in the phrase attention layer, which outputs the representation vector $\mu_t$ corresponding to each phrase semantic feature. The specific processing of the fully connected layer is:

$$\mu_t = \tanh(w_t h_t + b_t) \qquad (5)$$

where $b_t$ denotes the bias term of the phrase attention layer and is a constant.
After the fully connected layer in the phrase attention layer outputs the representation vectors $\mu_t$, the degree of importance of one representation vector in the text to be classified relative to the other representation vectors $\mu_w$ can be solved according to the following formula, with the score $\alpha_t$ representing the degree of importance:

$$\alpha_t = \frac{\exp\bigl(\mu_t^{\top} \mu_w\bigr)}{\sum_{t'} \exp\bigl(\mu_{t'}^{\top} \mu_w\bigr)} \qquad (6)$$
Finally, each phrase semantic feature and its corresponding score are combined as follows:

$$v_t = \sum_t \alpha_t h_t \qquad (7)$$

where $v_t$ denotes a phrase vector ($v_t$ and the $v_i$ mentioned above both denote phrase vectors). It can be seen that the phrase attention layer uses an attention mechanism to assign a higher score to an important phrase, indicating that the phrase's semantics play a more important role in the semantics of the text to be classified; this reduces the interference of unimportant phrases with the semantics of the text to be classified and improves the quality of the semantic representation of the text to be classified.
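A sketch of formulas (5) to (7) follows, with $\mu_w$ implemented as a trainable context vector; that reading of "the other representative vectors" follows the standard attention formulation and is an assumption, as is the hidden dimension:

    import torch
    import torch.nn.functional as F

    class PhraseAttention(torch.nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.fc = torch.nn.Linear(dim, dim)  # w_t and b_t
            self.mu_w = torch.nn.Parameter(torch.randn(dim))

        def forward(self, h):
            # h: (n, dim) phrase semantic features from the Bi-LSTM layer
            mu = torch.tanh(self.fc(h))               # formula (5): representation vectors
            alpha = F.softmax(mu @ self.mu_w, dim=0)  # formula (6): importance scores
            return torch.sum(alpha.unsqueeze(1) * h, dim=0)  # formula (7), taken literally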
Optionally, the converting at least one word in the text to be classified into at least one word vector respectively includes: and inputting at least one word in the text to be classified into an embedded model obtained by pre-training to obtain a word vector corresponding to each word.
The embedding model in fig. 2 is a model separate from the text classification model, though it may of course also be used as part of the text classification model; this is not limited in the present application. Generally, the embedding model and the text classification model are trained separately as two independent models, and the initial embedding model can be obtained by training a word2vec model. Specifically, the word2vec model may be a skip-gram model or a continuous bag-of-words model (CBOW): in the training samples corresponding to the skip-gram model, the features are central words and the labels are the context words of those central words; in the training samples corresponding to the continuous bag-of-words model, the features are the context words of the central words and the labels are the central words. The training samples can be obtained from encyclopedia data.
Whether the skip-gram model or the continuous bag-of-words model is adopted, the embedding model is obtained once the training stop condition is satisfied, and the word vector corresponding to each word in the text to be classified can then be obtained from the word embedding matrix in the embedding model.
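For illustration, such an embedding model can be trained with gensim's word2vec implementation; the corpus, save path, and hyperparameters below are assumptions, with sg=1 selecting skip-gram and sg=0 selecting CBOW:

    from gensim.models import Word2Vec

    # sentences: an iterable of tokenized encyclopedia texts (toy stand-in shown)
    sentences = [["文本", "分类", "方法"], ["词", "向量", "模型"]]
    model = Word2Vec(sentences, vector_size=300, sg=1,  # skip-gram; use sg=0 for CBOW
                     window=5, min_count=1, epochs=5)
    model.wv.save("word2vec_300d.kv")  # the word embedding matrix used for lookup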
The process of training the initial text classification model is briefly described below. In one implementation, the initial phrase attention submodel and the initial label attention submodel in the initial text classification model are trained as a whole. A first training sample is input into the initial phrase attention submodel of the initial text classification model, where the features in the first training sample are the word vectors corresponding to a text and the label is the category corresponding to the text, that category being one of the candidate categories contained in the hierarchical label structure; meanwhile, the hierarchical label structure is input into the initial label attention submodel of the initial text classification model. The initial text classification model outputs a prediction $\hat{y}$ of the classification category to which the text belongs. Comparing the prediction $\hat{y}$ with the category $y$ corresponding to the text, cross-entropy loss is used as the loss function to adjust the above-mentioned phrase attention layer parameter $w_t$, label attention layer parameter $w_k$, and graph convolution layer parameters $w^{(l)}$; when the following loss function reaches a minimum value, training of the text classification model is complete:

$$L = -\sum_{i=1}^{N} \sum_{c=1}^{C} y_{ic} \log \hat{y}_{ic} \qquad (8)$$

where $N$ is the text length and $C$ is the number of categories of the text.
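A minimal training-loop sketch under this joint-training implementation is shown below; the model class, data loader, and hierarchy tensor are hypothetical stand-ins for the components sketched earlier, and the loss follows the reconstructed formula (8):

    import torch

    model = TextClassificationModel()  # phrase + label attention submodels (hypothetical)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for word_vecs, y in train_loader:        # y: one-hot candidate categories (assumed loader)
        y_hat = model(word_vecs, hierarchy)  # forward pass over the text and label structure
        loss = -(y * torch.log(y_hat + 1e-9)).sum()  # cross-entropy loss of formula (8)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()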
Fig. 6 is a schematic structural diagram of a text classification device according to an embodiment of the present application. As shown in fig. 6, the apparatus may include:
a conversion module 601, configured to convert at least one word in a text to be classified into at least one word vector respectively;
a first input module 602, configured to input each word vector into a phrase attention sub-model in a pre-trained text classification model, so as to obtain a plurality of phrase vectors corresponding to the text to be classified;
a second input module 603, configured to input the multiple phrase vectors and each candidate category in the hierarchical tag structure corresponding to the text to be classified into the tag attention sub-model in the text classification model, so as to obtain multiple feature vectors of the text to be classified;
the determining module 604 is configured to obtain a classification result of the text to be classified based on each feature vector of the text to be classified.
Optionally, the tag attention submodel includes: a graph convolution layer and a label attention layer;
correspondingly, the second input module 603 is specifically configured to input the plurality of phrase vectors and each candidate category vector in the hierarchical tag structure corresponding to the text to be classified into the tag attention layer, and the tag attention layer obtains a plurality of feature vectors of the text to be classified according to the plurality of phrase vectors and each candidate category vector output by the graph convolution layer.
Optionally, the second input module 603 is further specifically configured to determine, based on each candidate category vector and each phrase vector, a weight of each candidate category with respect to each phrase; and obtaining a plurality of feature vectors of the text to be classified according to the weight of each candidate category relative to each phrase and each phrase vector.
Optionally, the apparatus further comprises: a processing module, configured to perform node aggregation processing on each candidate category in the hierarchical label structure through the graph convolution layer to obtain each candidate category vector.
Optionally, the phrase attention submodel includes: a convolutional layer, a bidirectional long short-term memory layer, and a phrase attention layer;
correspondingly, the first input module 602 is specifically configured to input each word vector into the convolution layer to obtain a phrase sequence feature; inputting the phrase sequence characteristics into a bidirectional long-short term memory layer, and extracting phrase context semantic characteristics to obtain phrase semantic characteristics containing context semantics; and inputting the phrase semantic features into a phrase attention layer to obtain a plurality of phrase vectors corresponding to the text to be classified.
Optionally, the first input module 602 is further specifically configured to determine a representation vector according to the semantic feature of the phrase; determining scores of semantic features of the phrases according to the expression vectors; and obtaining a plurality of phrase vectors corresponding to the text to be classified according to the scores of the phrase semantic features and the phrase semantic features.
Optionally, the conversion module 601 is specifically configured to input at least one word in the text to be classified into a pre-trained embedding model, so as to obtain a word vector corresponding to each word.
The above-mentioned apparatus is used for executing the method provided by the foregoing embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
These modules may be one or more integrated circuits configured to implement the above methods, for example: one or more application specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field programmable gate arrays (FPGAs), among others. As another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of calling program code. As a further example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application, and as shown in fig. 7, the electronic device may include: a processor 701, a storage medium 702 and a bus 703, wherein the storage medium 702 stores machine-readable instructions executable by the processor 701, when the electronic device is operated, the processor 701 communicates with the storage medium 702 through the bus 703, and the processor 701 executes the machine-readable instructions to execute the steps of the text classification method. The specific implementation and technical effects are similar, and are not described herein again.
Optionally, the present application further provides a storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the text classification method are executed.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. Alternatively, the indirect coupling or communication connection of devices or units may be electrical, mechanical or other.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to perform some steps of the methods according to the embodiments of the present application. And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method of text classification, the method comprising:
converting at least one word in the text to be classified into at least one word vector respectively;
inputting each word vector into a phrase attention sub-model in a text classification model obtained by pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified;
inputting the phrase vectors and candidate categories in a hierarchical label structure corresponding to the text to be classified into a label attention sub-model in the text classification model to obtain a plurality of feature vectors of the text to be classified;
and obtaining a classification result of the text to be classified based on each feature vector of the text to be classified, wherein the classification result is used for representing the category of the text to be classified.
2. The method of claim 1, wherein the tag attention submodel comprises: a graph convolution layer and a label attention layer;
the step of inputting the plurality of phrase vectors and each candidate category in the hierarchical label structure corresponding to the text to be classified into the label attention submodel in the text classification model to obtain a plurality of feature vectors of the text to be classified includes:
and inputting the plurality of phrase vectors and each candidate category vector in a hierarchical label structure corresponding to the text to be classified into the label attention layer, and obtaining a plurality of feature vectors of the text to be classified by the label attention layer according to the plurality of phrase vectors and each candidate category vector output by the graph convolution layer.
3. The method of claim 2, wherein obtaining a plurality of feature vectors of the text to be classified according to the plurality of phrase vectors and candidate category vectors output by the graph convolution layer comprises:
determining a weight of each of the candidate categories relative to each of the phrases based on each of the candidate category vectors and each of the phrase vectors;
and obtaining a plurality of feature vectors of the text to be classified according to the weight of each candidate category relative to each phrase and each phrase vector.
4. The method according to claim 2, wherein before the inputting the plurality of phrase vectors and each candidate class vector in the hierarchical tag structure corresponding to the text to be classified into the tag attention layer, the tag attention layer obtains a plurality of feature vectors of the text to be classified according to the plurality of phrase vectors and each candidate class vector output by the graph convolution layer, the method further comprises:
and carrying out node aggregation processing on each candidate category in the hierarchical label structure by the graph convolution layer to obtain each candidate category vector.
5. The method of any of claims 1-4, wherein the phrase attention submodel comprises: a convolutional layer, a bidirectional long short-term memory layer, and a phrase attention layer;
inputting each word vector into a phrase attention submodel in a text classification model obtained by pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified, wherein the phrase attention submodel comprises the following steps:
inputting each word vector into the convolutional layer to obtain phrase sequence characteristics;
inputting the phrase sequence features into the bidirectional long short-term memory layer, and extracting phrase context semantic features to obtain phrase semantic features containing the context semantics;
and inputting the phrase semantic features into the phrase attention layer to obtain a plurality of phrase vectors corresponding to the text to be classified.
6. The method according to claim 5, wherein the entering the phrase semantic features into the phrase attention layer to obtain a plurality of phrase vectors corresponding to the text to be classified comprises:
determining a representation vector according to the phrase semantic features;
determining scores of the semantic features of the phrases according to the expression vectors;
and obtaining a plurality of phrase vectors corresponding to the text to be classified according to the scores of the phrase semantic features and the phrase semantic features.
7. The method according to any one of claims 1-4, wherein converting at least one word in the text to be classified into at least one word vector respectively comprises:
and inputting at least one word in the text to be classified into an embedded model obtained by pre-training to obtain a word vector corresponding to each word.
8. An apparatus for classifying text, the apparatus comprising:
the conversion module is used for converting at least one word in the text to be classified into at least one word vector respectively;
the first input module is used for inputting each word vector into a phrase attention sub-model in a text classification model obtained by pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified;
the second input module is used for inputting the plurality of phrase vectors and each candidate category in the hierarchical label structure corresponding to the text to be classified into the label attention submodel in the text classification model to obtain a plurality of feature vectors of the text to be classified;
and the determining module is used for obtaining a classification result of the text to be classified based on each feature vector of the text to be classified, and the classification result is used for representing the category of the text to be classified.
9. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the text classification method according to any one of claims 1 to 7.
10. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the text classification method according to any one of claims 1 to 7.
CN202110005508.0A (priority date 2021-01-04, filing date 2021-01-04): Text classification method, device, equipment and storage medium. Withdrawn. CN112667782A (en)

Priority Applications (1)

Application number: CN202110005508.0A; priority date: 2021-01-04; filing date: 2021-01-04; title: Text classification method, device, equipment and storage medium

Publications (1)

CN112667782A, published 2021-04-16

Family ID: 75412771

Family Applications (1): CN202110005508.0A, CN112667782A (en), priority date 2021-01-04, filing date 2021-01-04, Text classification method, device, equipment and storage medium

Country Status (1): CN, CN112667782A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177118A (en) * 2021-04-29 2021-07-27 中国邮政储蓄银行股份有限公司 Text classification model, text classification method and device
CN113312480A (en) * 2021-05-19 2021-08-27 北京邮电大学 Scientific and technological thesis level multi-label classification method and device based on graph convolution network
CN113656579A (en) * 2021-07-23 2021-11-16 北京亿欧网盟科技有限公司 Text classification method, device, equipment and medium
CN113656579B (en) * 2021-07-23 2024-01-26 北京亿欧网盟科技有限公司 Text classification method, device, equipment and medium
CN113342943A (en) * 2021-08-05 2021-09-03 北京明略软件系统有限公司 Training method and device for classification model
CN113553433A (en) * 2021-09-17 2021-10-26 平安科技(深圳)有限公司 Product classification method, device, medium and terminal equipment based on artificial intelligence
CN113553433B (en) * 2021-09-17 2022-01-07 平安科技(深圳)有限公司 Product classification method, device, medium and terminal equipment based on artificial intelligence


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20210416)