CN112667782A - Text classification method, device, equipment and storage medium - Google Patents

Text classification method, device, equipment and storage medium

Info

Publication number: CN112667782A
Authority: CN (China)
Prior art keywords: text, phrase, classified, vectors, vector
Legal status: Withdrawn
Application number: CN202110005508.0A
Other languages: Chinese (zh)
Inventors: 王硕, 周星杰, 杨康, 徐成国
Current Assignee: Shanghai Minglue Artificial Intelligence Group Co Ltd
Original Assignee: Shanghai Minglue Artificial Intelligence Group Co Ltd
Application filed by Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority: CN202110005508.0A, filed 2021-01-04; published as CN112667782A on 2021-04-16; withdrawn after publication

Abstract

The application provides a text classification method, apparatus, device, and storage medium, relating to the technical field of natural language processing. The method comprises the following steps: converting at least one word in the text to be classified into at least one word vector; inputting each word vector into a phrase attention submodel of a pre-trained text classification model to obtain a plurality of phrase vectors corresponding to the text to be classified; inputting the phrase vectors and each candidate category in a hierarchical label structure corresponding to the text to be classified into a label attention submodel of the text classification model to obtain a plurality of feature vectors of the text to be classified; and obtaining a classification result of the text to be classified based on the feature vectors, where the classification result represents the category of the text to be classified. Applying the method and device can improve the precision of classifying the text to be classified.

Description

Text classification method, device, equipment and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a text classification method, apparatus, device, and storage medium.
Background
The text classification problem is an important research direction in the field of natural language processing, with applications in fields such as sentiment analysis and information retrieval. Hierarchical multi-label text classification is an important approach to the text classification problem and has attracted researchers' attention in recent years.
At present, hierarchical multi-label text classification methods typically assume that labels are mutually independent, convert the task into a set of binary classification problems, and construct a plurality of binary classifiers to classify the hierarchical multi-label text.
However, complex dependency relationships exist between labels in the hierarchical multi-label text classification task, so classifying hierarchical multi-label text in the prior-art manner reduces text classification precision.
Disclosure of Invention
An object of the present application is to provide a method, an apparatus, a device and a storage medium for text classification, which can improve the accuracy of text classification.
In order to achieve the above purpose, the technical solutions adopted in the embodiments of the present application are as follows:
in a first aspect, an embodiment of the present application provides a text classification method, where the method includes:
converting at least one word in the text to be classified into at least one word vector respectively;
inputting each word vector into a phrase attention sub-model in a text classification model obtained by pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified;
inputting the phrase vectors and candidate categories in a hierarchical label structure corresponding to the text to be classified into a label attention sub-model in the text classification model to obtain a plurality of feature vectors of the text to be classified;
and obtaining a classification result of the text to be classified based on each feature vector of the text to be classified, wherein the classification result is used for representing the category of the text to be classified.
Optionally, the tag attention submodel includes: a graph convolution layer and a label attention layer;
the step of inputting the plurality of phrase vectors and each candidate category in the hierarchical label structure corresponding to the text to be classified into the label attention submodel in the text classification model to obtain a plurality of feature vectors of the text to be classified includes:
and inputting the plurality of phrase vectors and each candidate category vector in a hierarchical label structure corresponding to the text to be classified into the label attention layer, and obtaining a plurality of feature vectors of the text to be classified by the label attention layer according to the plurality of phrase vectors and each candidate category vector output by the graph convolution layer.
Optionally, the obtaining a plurality of feature vectors of the text to be classified according to the plurality of phrase vectors and the candidate category vectors output by the graph convolution layer includes:
determining a weight of each of the candidate categories relative to each of the phrases based on each of the candidate category vectors and each of the phrase vectors;
and obtaining a plurality of feature vectors of the text to be classified according to the weight of each candidate category relative to each phrase and each phrase vector.
Optionally, before the inputting the plurality of phrase vectors and each candidate category vector in the hierarchical tag structure corresponding to the text to be classified into the tag attention layer, and the tag attention layer obtaining a plurality of feature vectors of the text to be classified according to the plurality of phrase vectors and each candidate category vector output by the graph convolution layer, the method further includes:
and carrying out node aggregation processing on each candidate category in the hierarchical label structure by the graph convolution layer to obtain each candidate category vector.
Optionally, the phrase attention submodel includes: a convolutional layer, a bidirectional long short-term memory layer, and a phrase attention layer;
optionally, the inputting each word vector into a phrase attention submodel in a text classification model obtained through pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified includes:
inputting each word vector into the convolutional layer to obtain phrase sequence characteristics;
inputting the phrase sequence features into the bidirectional long short-term memory layer, and extracting phrase context semantic features to obtain phrase semantic features containing the context semantics;
and inputting the phrase semantic features into the phrase attention layer to obtain a plurality of phrase vectors corresponding to the text to be classified.
Optionally, the inputting the phrase semantic features into the phrase attention layer to obtain a plurality of phrase vectors corresponding to the text to be classified includes:
determining a representation vector according to the phrase semantic features;
determining scores of the semantic features of the phrases according to the expression vectors;
and obtaining a plurality of phrase vectors corresponding to the text to be classified according to the scores of the phrase semantic features and the phrase semantic features.
Optionally, the converting at least one word in the text to be classified into at least one word vector respectively includes:
and inputting at least one word in the text to be classified into an embedded model obtained by pre-training to obtain a word vector corresponding to each word.
In a second aspect, an embodiment of the present application further provides a text classification apparatus, where the apparatus includes:
the conversion module is used for converting at least one word in the text to be classified into at least one word vector respectively;
the first input module is used for inputting each word vector into a phrase attention sub-model in a text classification model obtained by pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified;
the second input module is used for inputting the plurality of phrase vectors and each candidate category in the hierarchical label structure corresponding to the text to be classified into the label attention submodel in the text classification model to obtain a plurality of feature vectors of the text to be classified;
and the determining module is used for obtaining a classification result of the text to be classified based on each feature vector of the text to be classified, and the classification result is used for representing the category of the text to be classified.
Optionally, the tag attention submodel includes: a graph convolution layer and a label attention layer;
correspondingly, the second input module is specifically configured to input the plurality of phrase vectors and each candidate category vector in the hierarchical tag structure corresponding to the text to be classified into the tag attention layer, and the tag attention layer obtains a plurality of feature vectors of the text to be classified according to the plurality of phrase vectors and each candidate category vector output by the graph convolution layer.
Optionally, the second input module is further specifically configured to determine, based on each candidate category vector and each phrase vector, a weight of each candidate category with respect to each phrase; and obtaining a plurality of feature vectors of the text to be classified according to the weight of each candidate category relative to each phrase and each phrase vector.
Optionally, the apparatus further comprises:
and the processing module is used for carrying out node aggregation processing on each candidate category in the hierarchical label structure by the graph convolution layer to obtain each candidate category vector.
Optionally, the phrase attention submodel includes: a convolutional layer, a bidirectional long short-term memory layer, and a phrase attention layer;
correspondingly, the first input module is specifically configured to input each word vector into the convolutional layer to obtain phrase sequence features; input the phrase sequence features into the bidirectional long short-term memory layer and extract phrase context semantic features to obtain phrase semantic features containing the context semantics; and input the phrase semantic features into the phrase attention layer to obtain a plurality of phrase vectors corresponding to the text to be classified.
Optionally, the first input module is further specifically configured to determine a representation vector according to the phrase semantic features; determining scores of the semantic features of the phrases according to the expression vectors; and obtaining a plurality of phrase vectors corresponding to the text to be classified according to the scores of the phrase semantic features and the phrase semantic features.
Optionally, the conversion module is specifically configured to input at least one word in the text to be classified into an embedding model obtained through pre-training, so as to obtain a word vector corresponding to each word.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, when the electronic device runs, the processor communicates with the storage medium through the bus, and the processor executes the machine-readable instructions to execute the steps of the text classification method of the first aspect.
In a fourth aspect, the present application provides a storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the text classification method of the first aspect.
The beneficial effect of this application is:
the embodiment of the application provides a text classification method, a text classification device, text classification equipment and a storage medium, wherein the method comprises the following steps: converting at least one word in the text to be classified into at least one word vector respectively; inputting each word vector into a phrase attention sub-model in a text classification model obtained by pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified; inputting a plurality of phrase vectors and each candidate category in a hierarchical label structure corresponding to the text to be classified into a label attention sub-model in the text classification model to obtain a plurality of feature vectors of the text to be classified; and obtaining a classification result of the text to be classified based on each feature vector of the text to be classified, wherein the classification result is used for representing the category of the text to be classified. By adopting the text classification method provided by the embodiment of the application, on the premise that the semantics of the text to be classified is represented by the phrase vectors, the candidate categories in the hierarchical label structure corresponding to the text to be classified are input into the label attention submodel in the text classification model, the dependency relationship and the hierarchical structure characteristics among the candidate categories are captured by using the label attention submodel, and then the phrase vectors are combined with the captured dependency relationship and the hierarchical structure characteristics among the candidate categories by using the label attention submodel to obtain the classification result of the text to be classified, so that the precision of classifying the text to be classified can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic flowchart of a text classification method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a text classification model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a hierarchical label structure provided by an embodiment of the present application;
fig. 4 is a schematic flowchart of another text classification method according to an embodiment of the present application;
fig. 5 is a schematic flowchart of another text classification method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a text classification apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Before explaining the embodiments of the present application in detail, an application scenario of the present application is described first. The application scenario may be a certain service scenario, such as academic paper classification, sentiment analysis, or news classification, which is not limited in the present application. Taking a news classification scenario as an example: a device with certain processing capability (e.g., a server) processes a text input by a user to obtain phrase vectors capable of representing the semantic information of the text, and the device obtains the category corresponding to the text based on the phrase vectors and a pre-trained text classification model. The text input by the user may correspond to a plurality of labels, and dependency relationships exist among the labels. By applying the embodiments of the present application, the category of the text input by the user, that is, the label corresponding to the text, can be determined; for example, it can be determined that the label corresponding to the text input by the user is a specific category under the sports news label, and news of that category can then be pushed to the user, so that the content desired by the user is pushed more accurately. Other application scenarios are similar.
The text classification method mentioned in the present application is exemplified below with reference to the drawings. Fig. 1 is a schematic flowchart of a text classification method according to an embodiment of the present application. As shown in fig. 1, the method may include:
s101, converting at least one word in the text to be classified into at least one word vector respectively.
The text to be classified may be a descriptive title of an article, a paper, a news article, etc.; the source of the text to be classified is not limited in the present application. The text to be classified is segmented using the jieba word segmentation tool, special symbols, stop words, and the like contained in the text to be classified are deleted, and a plurality of words contained in the text to be classified are finally obtained. Each word is vectorized by a pre-trained word vectorization model, which converts each word into a corresponding word vector; that is, each word is represented by a word vector of a preset dimension (e.g., 300).
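As a concrete illustration, this step might look as follows; this is a minimal sketch assuming a word2vec model trained in advance with gensim, where the model path and the stop-word set are hypothetical:

    import jieba
    from gensim.models import KeyedVectors

    wv = KeyedVectors.load("word2vec_300d.kv")  # hypothetical pre-trained 300-dim vectors
    stopwords = {"的", "了", "是"}              # illustrative stop-word set

    def text_to_word_vectors(text):
        # Segment the text to be classified with jieba, then drop special
        # symbols and stop words, as described in S101.
        words = [w for w in jieba.lcut(text) if w.strip() and w not in stopwords]
        # Convert each remaining word into its word vector of preset dimension.
        return [wv[w] for w in words if w in wv]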
And S102, inputting each word vector into a phrase attention sub-model in a text classification model obtained through pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified.
The text classification model may include a phrase attention submodel and a tag attention submodel, as shown in fig. 2. The phrase attention submodel before training and the label attention submodel before training may be trained as a whole to obtain the text classification model, or the phrase attention submodel before training and the label attention submodel before training may be trained respectively to obtain the phrase attention submodel and the label attention submodel, and the text classification model may be obtained by combining the phrase attention submodel and the label attention submodel, which is not limited in this application.
The phrase attention submodel can convert the obtained text features of the text to be classified represented by the word vectors into the text features of the text to be classified represented by a plurality of phrase vectors. That is to say, the phrase attention submodel may capture phrase features formed by a plurality of words in the text to be classified, so as to improve the accuracy of obtaining the text features of the text to be classified.
S103, inputting the plurality of phrase vectors and each candidate category in the hierarchical label structure corresponding to the text to be classified into the label attention submodel in the text classification model to obtain a plurality of feature vectors of the text to be classified.
And S104, obtaining a classification result of the text to be classified based on each feature vector of the text to be classified.
The hierarchical label structure may be as shown in fig. 3. Each candidate category in the hierarchical label structure corresponds to a node in fig. 3, and the label corresponding to a certain node may include the label information corresponding to the other nodes along its path; for example, the sports news label in fig. 3 covers the sports news content under the news category. The hierarchical label structure is related to the application scenario corresponding to the text to be classified: if the text to be classified belongs to a news article, the corresponding hierarchical label structure may be as shown in fig. 3; if the text to be classified belongs to a paper, the labels corresponding to the nodes in the hierarchical label structure may include a chemistry label, a biology label, and a physics label, where the physics label may include an optics label, a plasma label, and the like. That is, the hierarchical label structure may differ across application scenarios, as the sketch below illustrates.
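A hierarchical label structure like fig. 3 can be stored as parent-child edges and turned into the adjacency matrix later consumed by the graph convolution layer; the label names here are hypothetical examples, not taken from the patent's figure:

    import torch

    # Hypothetical hierarchy: parent label -> list of child labels
    hierarchy = {"news": ["sports news", "finance news"],
                 "sports news": ["basketball news", "football news"]}
    labels = sorted({p for p in hierarchy} | {c for cs in hierarchy.values() for c in cs})
    idx = {name: i for i, name in enumerate(labels)}

    adj = torch.zeros(len(labels), len(labels))
    for parent, children in hierarchy.items():
        for child in children:
            adj[idx[parent], idx[child]] = 1.0  # connected nodes are neighbors of each other
            adj[idx[child], idx[parent]] = 1.0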
Each candidate category in the hierarchical tag structure may be encoded in advance to obtain an initial vector corresponding to each candidate category, where the initial vector of a candidate category characterizes that category's own features. The initial vector of each candidate category is input into the tag attention submodel, which can capture the dependency relationships of the candidate categories in the hierarchical tag structure to obtain the target vector of each candidate category; the target vector of a candidate category represents the combination of its own features with the features of the other candidate categories that have connection relationships with it.
Through the label attention submodel, phrase vectors corresponding to the text to be classified can be combined with target vectors of candidate categories, and the corresponding relation between the text to be classified and the candidate categories can be obtained and can be characterized by feature vectors. And finally, inputting each feature vector of the text to be classified into a full connection layer in the label attention submodel, and outputting the category corresponding to the text to be classified by the full connection layer.
To sum up, in the text classification method provided by the present application, on the premise that the semantics of the text to be classified are represented by phrase vectors, the candidate categories in the hierarchical label structure corresponding to the text to be classified are input into the label attention submodel of the text classification model. The label attention submodel captures the dependency relationships and hierarchical structure characteristics between the candidate categories and then combines the phrase vectors with these captured relationships to obtain the classification result of the text to be classified, so the precision of classifying the text to be classified can be improved.
Optionally, the tag attention submodel in fig. 2 may include: a graph convolution layer and a label attention layer. The inputting of the plurality of phrase vectors and each candidate category in the hierarchical label structure corresponding to the text to be classified into the label attention submodel in the text classification model to obtain a plurality of feature vectors of the text to be classified includes:
inputting a plurality of phrase vectors and each candidate category vector in a hierarchical label structure corresponding to the text to be classified into the label attention layer, and obtaining a plurality of feature vectors of the text to be classified by the label attention layer according to the plurality of phrase vectors and each candidate category vector output by the graph convolution layer.
The initial vector of each candidate category in the hierarchical label structure can be obtained according to a word embedding matrix in a pre-trained word vector conversion model, the initial vector of each candidate category is input into the graph volume layer, and the graph volume layer outputs each candidate category vector, wherein each candidate category vector is equivalent to the target vector of each candidate category mentioned above. And respectively inputting each phrase vector and each candidate category vector into the label attention layer, wherein the label attention layer can obtain the interactive characteristics of the text to be classified and each candidate category vector through an attention mechanism, and the interactive characteristics are used for representing a plurality of characteristic vectors of the text to be classified.
Optionally, the tag attention submodel in fig. 2 may further include an output layer, and the output layer may output a classification result of the text to be classified based on each feature vector of the text to be classified.
Fig. 4 is a flowchart illustrating another text classification method according to an embodiment of the present application. As shown in fig. 4, optionally, the obtaining a plurality of feature vectors of the text to be classified according to the plurality of phrase vectors and the candidate category vectors output by the graph convolution layer includes:
s401, determining the weight of each candidate category relative to each phrase based on each candidate category vector and each phrase vector.
The label attention layer parameter $w_k$ in the text classification model obtained by pre-training satisfies the preset training stop condition. Each candidate category vector $u_k^{(I)}$ output by the last layer of the $I$-layer graph convolution layer is input into a fully connected layer in the label attention layer. Specifically, the number of fully connected layers in the label attention layer may be multiple, the same as the number of candidate categories; that is, each candidate category vector may be input into the fully connected layer with its corresponding label attention layer parameter $w_k$, and the candidate category feature output by each fully connected layer is:

$$g_k = w_k u_k^{(I)} + b_k \qquad (1)$$

where $b_k$ denotes the bias term of the label attention layer and is a constant; $k$ takes values from 1 to $P$, where $P$ denotes the number of candidate category vectors.

After the fully connected layers in the label attention layer output the candidate category features $g_k$, each $g_k$ is processed with each phrase vector $v_i$ as follows:

$$\beta_{ki} = \frac{\exp\bigl(g_k^{\top} v_i\bigr)}{\sum_{i'=1}^{n} \exp\bigl(g_k^{\top} v_{i'}\bigr)} \qquad (2)$$

where $i$ takes values from 1 to $n$ and $n$ denotes the number of phrase vectors; the weight $\beta_{ki}$ of each candidate category relative to each phrase is obtained through formula (2).
For example, if $P = 2$, i.e. there are 2 candidate category vectors, and $n = 3$, i.e. there are 3 phrase vectors, the weights of the first candidate category vector relative to the first, second, and third phrase vectors are $\beta_{11}$, $\beta_{12}$, $\beta_{13}$, and the weights of the second candidate category vector relative to the first, second, and third phrase vectors are $\beta_{21}$, $\beta_{22}$, $\beta_{23}$.
S402, obtaining a plurality of feature vectors of the text to be classified according to the weight of each candidate category relative to each phrase and each phrase vector.
The weight $\beta_{ki}$ of each candidate category relative to each phrase is multiplied with each phrase vector $v_i$ and summed to obtain the plurality of feature vectors $o_k$ corresponding to the text to be classified; the number of feature vectors is the same as the number $P$ of candidate category vectors. Specifically, this can be realized by the following formula:

$$o_k = \sum_{i=1}^{n} \beta_{ki}\, v_i \qquad (3)$$

Continuing with the above example: the weight $\beta_{11}$ of the first candidate category vector relative to the first phrase vector is multiplied by the first phrase vector $v_1$; to this is added the weight $\beta_{12}$ relative to the second phrase vector multiplied by the second phrase vector $v_2$, and the weight $\beta_{13}$ relative to the third phrase vector multiplied by the third phrase vector $v_3$, finally giving the first feature vector $o_1$ of the text to be classified. The second candidate category vector is handled similarly, giving the second feature vector $o_2$ of the text to be classified.
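The computation of formulas (1) to (3) can be summarized in a short PyTorch sketch; this is a minimal illustration under assumed tensor shapes, not the patent's implementation, and the per-category fully connected weights are stacked into a single tensor for brevity:

    import torch
    import torch.nn.functional as F

    def label_attention(u, v, w, b):
        # u: (P, d) candidate category vectors from the last graph convolution layer
        # v: (n, d) phrase vectors of the text to be classified
        # w: (P, d, d) per-category fully connected weights; b: (P, d) bias terms
        g = torch.einsum("pde,pe->pd", w, u) + b  # formula (1): candidate category features g_k
        beta = F.softmax(g @ v.T, dim=1)          # formula (2): weights beta_ki over phrases
        return beta @ v                           # formula (3): feature vectors o_k, shape (P, d)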
Optionally, before the step of inputting the phrase vectors and the candidate category vectors in the hierarchical tag structure corresponding to the text to be classified into the tag attention layer, and obtaining the feature vectors of the text to be classified by the tag attention layer according to the phrase vectors and the candidate category vectors output by the graph convolution layer, the method further includes: and carrying out node aggregation processing on each candidate category in the hierarchical label structure by the graph convolution layer to obtain each candidate category vector.
The graph convolution layer is a graph convolutional network (GCN), through which each candidate category in the hierarchical label structure can be represented by a vector. Specifically, as shown in fig. 3, nodes (candidate categories) having a connection relationship in the hierarchical label structure are neighbor nodes of each other. The graph convolution layer aggregates the features of a central node with the features of the neighbor nodes connected to it to obtain the features of the central node, which are characterized by the candidate category vector corresponding to that node. The graph convolution layer can thereby capture the hierarchical and dependency relationships between nodes and express the candidate category vector corresponding to each node more accurately.
When the graph convolution layer has multiple layers (e.g., $I$ layers), the next layer ($l+1$) is updated based on the candidate category vectors of the previous layer ($l$). The specific processing procedure is:

$$h_i^{(l+1)} = \sigma\Bigl(\sum_{j \in N_i} c_{ij}\, w^{(l)} h_j^{(l)} + b^{(l)}\Bigr) \qquad (4)$$

where $h_i^{(l+1)}$ denotes the candidate category vector output by node $i$ at layer $l+1$, $c_{ij}$ is a normalization factor, $N_i$ denotes all neighbor nodes of node $i$, $w^{(l)}$ is the weight matrix parameter of the $l$-th layer, $b^{(l)}$ is the bias of the $l$-th layer, and $\sigma$ denotes a nonlinear activation.
Fig. 5 is a flowchart illustrating another text classification method according to an embodiment of the present application. As shown in fig. 5, the phrase attention submodel in fig. 2 may optionally include: convolutional layers, two-way long-short term memory layers, and phrase attention layers; the above inputting each word vector into the phrase attention submodel in the text classification model obtained by pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified includes:
s501, inputting each word vector into the convolutional layer to obtain phrase sequence characteristics.
The convolutional layer may comprise a plurality of filters of scale $h$. The word vectors $x_i$ are concatenated to obtain a word vector sequence, which is input into each filter of the convolutional layer; a filter of scale $h$ convolves the word vector sequence according to a preset stride, where the stride may be 1, 2, etc.
The filter with the scale h performs convolution on the word vector sequence according to a preset step length to obtain the phrase sequence characteristics by the following process:
$$c_{ij} = f\bigl(w_j \cdot x_{i:i+h-1} + b_j\bigr)$$

where $c_{ij}, b_j \in \mathbb{R}$, $w_j$ denotes the $j$-th filter, $b_j$ is the bias term corresponding to the $j$-th filter, and $f$ denotes a nonlinear function, which may be chosen as the ReLU activation function. $w_j \cdot x_{i:i+h-1}$ denotes the convolution operation of the $j$-th filter on the length-$h$ subsequence $x_{i:i+h-1}$ of the word vector sequence, and $c_{ij}$ is the resulting phrase feature value. Finally, the phrase feature values obtained by the other filters are combined to obtain a phrase sequence feature composed of a plurality of initial phrase vectors.
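In PyTorch terms, this step corresponds to a 1-D convolution over the word vector sequence; the sketch below assumes a scale of $h = 3$, 128 filters, and stride 1, all illustrative choices:

    import torch

    conv = torch.nn.Conv1d(in_channels=300, out_channels=128, kernel_size=3)  # scale h = 3
    x = torch.randn(1, 20, 300)  # a word vector sequence of 20 words, 300 dimensions each
    # Conv1d expects (batch, channels, length), so transpose before and after.
    phrase_seq = torch.relu(conv(x.transpose(1, 2))).transpose(1, 2)
    # phrase_seq: (1, 18, 128), one initial phrase vector per length-h window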
S502, inputting the phrase sequence features into the bidirectional long short-term memory layer, and extracting phrase context semantic features to obtain phrase semantic features containing context semantics.
The bidirectional long short-term memory (Bi-LSTM) layer reads the phrase sequence features from left to right, represented by the forward hidden state $\overrightarrow{h_t}$, and from right to left, represented by the backward hidden state $\overleftarrow{h_t}$. The forward vector $\overrightarrow{h_t}$ and the backward vector $\overleftarrow{h_t}$ corresponding to each initial phrase vector are then spliced:

$$h_t = \bigl[\overrightarrow{h_t};\, \overleftarrow{h_t}\bigr]$$

where $h_t$ is the phrase semantic feature containing context semantics that corresponds to an initial phrase vector, obtained through the bidirectional long short-term memory layer. That is, the bidirectional long short-term memory layer can extract phrase context features, and these phrase features represent the features of the text to be classified more accurately. The phrase semantic features are also expressed as vectors, and their number corresponds to the number of initial phrase vectors.
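PyTorch's bidirectional LSTM already concatenates the forward and backward hidden states, which matches the splicing above; the hidden size below is an illustrative assumption:

    import torch

    bilstm = torch.nn.LSTM(input_size=128, hidden_size=64,
                           batch_first=True, bidirectional=True)
    phrase_seq = torch.randn(1, 18, 128)  # phrase sequence features from step S501
    h, _ = bilstm(phrase_seq)
    # h: (1, 18, 128); each h_t is the spliced [forward; backward] phrase
    # semantic feature containing context semantics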
S503, inputting the phrase semantic features into the phrase attention layer to obtain a plurality of phrase vectors corresponding to the text to be classified.
Optionally, this includes: determining representation vectors according to the phrase semantic features; determining a score for each phrase semantic feature according to the representation vectors; and obtaining a plurality of phrase vectors corresponding to the text to be classified according to the scores of the phrase semantic features and the phrase semantic features themselves.
The phrase attention layer parameter $w_t$ is obtained by training the initial text classification model in advance. Each phrase semantic feature is input into a fully connected layer in the phrase attention layer, which outputs the representation vector $\mu_t$ corresponding to each phrase semantic feature. The specific processing of the fully connected layer is:

$$\mu_t = \tanh(w_t h_t + b_t) \qquad (5)$$

where $b_t$ denotes the bias term of the phrase attention layer and is a constant.
After the fully connected layer in the phrase attention layer outputs the representation vectors $\mu_t$, the degree of importance of one representation vector in the text to be classified relative to the other representation vectors $\mu_w$ can be solved according to the following formula, with the score $\alpha_t$ representing the degree of importance:

$$\alpha_t = \frac{\exp\bigl(\mu_t^{\top} \mu_w\bigr)}{\sum_{t'} \exp\bigl(\mu_{t'}^{\top} \mu_w\bigr)} \qquad (6)$$
Finally, each phrase semantic feature and its corresponding score are combined as follows:

$$v_t = \sum_t \alpha_t h_t \qquad (7)$$

where $v_t$ denotes a phrase vector ($v_t$ and the $v_i$ mentioned above both denote phrase vectors). It can be seen that the phrase attention layer uses an attention mechanism to assign a higher score to an important phrase, indicating that the phrase's semantics play a more important role in the semantics of the text to be classified; this reduces the interference of unimportant phrases with the semantics of the text to be classified and improves the quality of the semantic representation of the text to be classified.
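A sketch of formulas (5) to (7) follows, with $\mu_w$ implemented as a trainable context vector; that reading of "the other representative vectors" follows the standard attention formulation and is an assumption, as is the hidden dimension:

    import torch
    import torch.nn.functional as F

    class PhraseAttention(torch.nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.fc = torch.nn.Linear(dim, dim)  # w_t and b_t
            self.mu_w = torch.nn.Parameter(torch.randn(dim))

        def forward(self, h):
            # h: (n, dim) phrase semantic features from the Bi-LSTM layer
            mu = torch.tanh(self.fc(h))               # formula (5): representation vectors
            alpha = F.softmax(mu @ self.mu_w, dim=0)  # formula (6): importance scores
            return torch.sum(alpha.unsqueeze(1) * h, dim=0)  # formula (7), taken literally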
Optionally, the converting at least one word in the text to be classified into at least one word vector respectively includes: and inputting at least one word in the text to be classified into an embedded model obtained by pre-training to obtain a word vector corresponding to each word.
The embedding model in fig. 2 is a model separate from the text classification model, though it may of course also be used as part of the text classification model; this is not limited in the present application. Generally, the embedding model and the text classification model are trained separately as two independent models, and the initial embedding model can be obtained by training a word2vec model. Specifically, the word2vec model may be a skip-gram model or a continuous bag-of-words model (CBOW): in the training samples corresponding to the skip-gram model, the features are central words and the labels are the context words of those central words; in the training samples corresponding to the continuous bag-of-words model, the features are the context words of the central words and the labels are the central words. The training samples can be obtained from encyclopedia data.
Whether the skip-gram model or the continuous bag-of-words model is adopted, the embedding model is obtained once the training stop condition is satisfied, and the word vector corresponding to each word in the text to be classified can then be obtained from the word embedding matrix in the embedding model.
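For illustration, such an embedding model can be trained with gensim's word2vec implementation; the corpus, save path, and hyperparameters below are assumptions, with sg=1 selecting skip-gram and sg=0 selecting CBOW:

    from gensim.models import Word2Vec

    # sentences: an iterable of tokenized encyclopedia texts (toy stand-in shown)
    sentences = [["文本", "分类", "方法"], ["词", "向量", "模型"]]
    model = Word2Vec(sentences, vector_size=300, sg=1,  # skip-gram; use sg=0 for CBOW
                     window=5, min_count=1, epochs=5)
    model.wv.save("word2vec_300d.kv")  # the word embedding matrix used for lookup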
The process of training the initial text classification model is briefly described below. In one implementation, the initial phrase attention submodel and the initial label attention submodel in the initial text classification model are trained as a whole. A first training sample is input into the initial phrase attention submodel of the initial text classification model, where the features in the first training sample are the word vectors corresponding to a text and the label is the category corresponding to the text, that category being one of the candidate categories contained in the hierarchical label structure; meanwhile, the hierarchical label structure is input into the initial label attention submodel of the initial text classification model. The initial text classification model outputs a prediction $\hat{y}$ of the classification category to which the text belongs. Comparing the prediction $\hat{y}$ with the category $y$ corresponding to the text, cross-entropy loss is used as the loss function to adjust the above-mentioned phrase attention layer parameter $w_t$, label attention layer parameter $w_k$, and graph convolution layer parameters $w^{(l)}$; when the following loss function reaches a minimum value, training of the text classification model is complete:

$$L = -\sum_{i=1}^{N} \sum_{c=1}^{C} y_{ic} \log \hat{y}_{ic} \qquad (8)$$

where $N$ is the text length and $C$ is the number of categories of the text.
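A minimal training-loop sketch under this joint-training implementation is shown below; the model class, data loader, and hierarchy tensor are hypothetical stand-ins for the components sketched earlier, and the loss follows the reconstructed formula (8):

    import torch

    model = TextClassificationModel()  # phrase + label attention submodels (hypothetical)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for word_vecs, y in train_loader:        # y: one-hot candidate categories (assumed loader)
        y_hat = model(word_vecs, hierarchy)  # forward pass over the text and label structure
        loss = -(y * torch.log(y_hat + 1e-9)).sum()  # cross-entropy loss of formula (8)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()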
Fig. 6 is a schematic structural diagram of a text classification device according to an embodiment of the present application. As shown in fig. 6, the apparatus may include:
a conversion module 601, configured to convert at least one word in a text to be classified into at least one word vector respectively;
a first input module 602, configured to input each word vector into a phrase attention sub-model in a pre-trained text classification model, so as to obtain a plurality of phrase vectors corresponding to the text to be classified;
a second input module 603, configured to input the multiple phrase vectors and each candidate category in the hierarchical tag structure corresponding to the text to be classified into the tag attention sub-model in the text classification model, so as to obtain multiple feature vectors of the text to be classified;
the determining module 604 is configured to obtain a classification result of the text to be classified based on each feature vector of the text to be classified.
Optionally, the tag attention submodel includes: a graph convolution layer and a label attention layer;
correspondingly, the second input module 603 is specifically configured to input the plurality of phrase vectors and each candidate category vector in the hierarchical tag structure corresponding to the text to be classified into the tag attention layer, and the tag attention layer obtains a plurality of feature vectors of the text to be classified according to the plurality of phrase vectors and each candidate category vector output by the graph convolution layer.
Optionally, the second input module 603 is further specifically configured to determine, based on each candidate category vector and each phrase vector, a weight of each candidate category with respect to each phrase; and obtaining a plurality of feature vectors of the text to be classified according to the weight of each candidate category relative to each phrase and each phrase vector.
Optionally, the apparatus further comprises: a processing module, configured to perform node aggregation processing on each candidate category in the hierarchical label structure through the graph convolution layer to obtain each candidate category vector.
Optionally, the phrase attention submodel includes: a convolutional layer, a bidirectional long short-term memory layer, and a phrase attention layer;
correspondingly, the first input module 602 is specifically configured to input each word vector into the convolution layer to obtain a phrase sequence feature; inputting the phrase sequence characteristics into a bidirectional long-short term memory layer, and extracting phrase context semantic characteristics to obtain phrase semantic characteristics containing context semantics; and inputting the phrase semantic features into a phrase attention layer to obtain a plurality of phrase vectors corresponding to the text to be classified.
Optionally, the first input module 602 is further specifically configured to determine a representation vector according to the semantic feature of the phrase; determining scores of semantic features of the phrases according to the expression vectors; and obtaining a plurality of phrase vectors corresponding to the text to be classified according to the scores of the phrase semantic features and the phrase semantic features.
Optionally, the conversion module 601 is specifically configured to input at least one word in the text to be classified into a pre-trained embedding model, so as to obtain a word vector corresponding to each word.
The above-mentioned apparatus is used for executing the method provided by the foregoing embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
These modules may be one or more integrated circuits configured to implement the above methods, for example: one or more application specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field programmable gate arrays (FPGAs), among others. As another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of calling program code. As a further example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application, and as shown in fig. 7, the electronic device may include: a processor 701, a storage medium 702 and a bus 703, wherein the storage medium 702 stores machine-readable instructions executable by the processor 701, when the electronic device is operated, the processor 701 communicates with the storage medium 702 through the bus 703, and the processor 701 executes the machine-readable instructions to execute the steps of the text classification method. The specific implementation and technical effects are similar, and are not described herein again.
Optionally, the present application further provides a storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the text classification method are executed.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. Alternatively, the indirect coupling or communication connection of devices or units may be electrical, mechanical or other.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to perform some steps of the methods according to the embodiments of the present application. And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method of text classification, the method comprising:
converting at least one word in the text to be classified into at least one word vector respectively;
inputting each word vector into a phrase attention sub-model in a text classification model obtained by pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified;
inputting the phrase vectors and candidate categories in a hierarchical label structure corresponding to the text to be classified into a label attention sub-model in the text classification model to obtain a plurality of feature vectors of the text to be classified;
and obtaining a classification result of the text to be classified based on each feature vector of the text to be classified, wherein the classification result is used for representing the category of the text to be classified.
2. The method of claim 1, wherein the tag attention submodel comprises: a graph convolution layer and a label attention layer;
the step of inputting the plurality of phrase vectors and each candidate category in the hierarchical label structure corresponding to the text to be classified into the label attention submodel in the text classification model to obtain a plurality of feature vectors of the text to be classified includes:
and inputting the plurality of phrase vectors and each candidate category vector in a hierarchical label structure corresponding to the text to be classified into the label attention layer, and obtaining a plurality of feature vectors of the text to be classified by the label attention layer according to the plurality of phrase vectors and each candidate category vector output by the graph convolution layer.
3. The method of claim 2, wherein obtaining a plurality of feature vectors of the text to be classified according to the plurality of phrase vectors and candidate category vectors output by the graph convolution layer comprises:
determining a weight of each of the candidate categories relative to each of the phrases based on each of the candidate category vectors and each of the phrase vectors;
and obtaining a plurality of feature vectors of the text to be classified according to the weight of each candidate category relative to each phrase and each phrase vector.
4. The method according to claim 2, wherein before the inputting the plurality of phrase vectors and each candidate class vector in the hierarchical tag structure corresponding to the text to be classified into the tag attention layer, the tag attention layer obtains a plurality of feature vectors of the text to be classified according to the plurality of phrase vectors and each candidate class vector output by the graph convolution layer, the method further comprises:
and carrying out node aggregation processing on each candidate category in the hierarchical label structure by the graph convolution layer to obtain each candidate category vector.
5. The method of any of claims 1-4, wherein the phrase attention submodel comprises: a convolutional layer, a bidirectional long short-term memory layer, and a phrase attention layer;
inputting each word vector into a phrase attention submodel in a text classification model obtained by pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified, wherein the phrase attention submodel comprises the following steps:
inputting each word vector into the convolutional layer to obtain phrase sequence characteristics;
inputting the phrase sequence features into the bidirectional long short-term memory layer, and extracting phrase context semantic features to obtain phrase semantic features containing the context semantics;
and inputting the phrase semantic features into the phrase attention layer to obtain a plurality of phrase vectors corresponding to the text to be classified.
6. The method according to claim 5, wherein the entering the phrase semantic features into the phrase attention layer to obtain a plurality of phrase vectors corresponding to the text to be classified comprises:
determining a representation vector according to the phrase semantic features;
determining scores of the semantic features of the phrases according to the expression vectors;
and obtaining a plurality of phrase vectors corresponding to the text to be classified according to the scores of the phrase semantic features and the phrase semantic features.
7. The method according to any one of claims 1-4, wherein converting at least one word in the text to be classified into at least one word vector respectively comprises:
and inputting at least one word in the text to be classified into an embedded model obtained by pre-training to obtain a word vector corresponding to each word.
8. An apparatus for classifying text, the apparatus comprising:
the conversion module is used for converting at least one word in the text to be classified into at least one word vector respectively;
the first input module is used for inputting each word vector into a phrase attention sub-model in a text classification model obtained by pre-training to obtain a plurality of phrase vectors corresponding to the text to be classified;
the second input module is used for inputting the plurality of phrase vectors and each candidate category in the hierarchical label structure corresponding to the text to be classified into the label attention submodel in the text classification model to obtain a plurality of feature vectors of the text to be classified;
and the determining module is used for obtaining a classification result of the text to be classified based on each feature vector of the text to be classified, and the classification result is used for representing the category of the text to be classified.
9. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the text classification method according to any one of claims 1 to 7.
10. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the text classification method according to any one of claims 1 to 7.
CN202110005508.0A (priority date 2021-01-04, filing date 2021-01-04): Text classification method, device, equipment and storage medium. Withdrawn. CN112667782A (en)

Priority Applications (1)

Application number: CN202110005508.0A; priority date: 2021-01-04; filing date: 2021-01-04; title: Text classification method, device, equipment and storage medium

Publications (1)

CN112667782A, published 2021-04-16

Family ID: 75412771

Family Applications (1): CN202110005508.0A, CN112667782A (en), priority date 2021-01-04, filing date 2021-01-04, Text classification method, device, equipment and storage medium

Country Status (1): CN, CN112667782A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177118A (en) * 2021-04-29 2021-07-27 中国邮政储蓄银行股份有限公司 Text classification model, text classification method and device
CN113312480A (en) * 2021-05-19 2021-08-27 北京邮电大学 Scientific and technological thesis level multi-label classification method and device based on graph convolution network
CN113656579A (en) * 2021-07-23 2021-11-16 北京亿欧网盟科技有限公司 Text classification method, device, equipment and medium
CN113656579B (en) * 2021-07-23 2024-01-26 北京亿欧网盟科技有限公司 Text classification method, device, equipment and medium
CN113342943A (en) * 2021-08-05 2021-09-03 北京明略软件系统有限公司 Training method and device for classification model
CN113553433A (en) * 2021-09-17 2021-10-26 平安科技(深圳)有限公司 Product classification method, device, medium and terminal equipment based on artificial intelligence
CN113553433B (en) * 2021-09-17 2022-01-07 平安科技(深圳)有限公司 Product classification method, device, medium and terminal equipment based on artificial intelligence


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20210416)