CN113220886A - Text classification method, text classification model training method and related equipment - Google Patents

Text classification method, text classification model training method and related equipment

Info

Publication number
CN113220886A
Authority
CN
China
Prior art keywords
text
node
graph
groups
text classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110600745.1A
Other languages
Chinese (zh)
Inventor
赵宏宇
赵国庆
蒋宁
王洪斌
吴海英
林亚臣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Xiaofei Finance Co Ltd filed Critical Mashang Xiaofei Finance Co Ltd
Priority to CN202110600745.1A
Publication of CN113220886A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The application discloses a text classification method, a text classification model training method and related equipment. The text classification method comprises the following steps: acquiring text information to be classified, wherein the text information comprises sentences and phrases; constructing a heterogeneous graph according to the sentences and the phrases, wherein nodes in the heterogeneous graph are composed of the sentences and the phrases; and inputting graph data corresponding to the heterogeneous graph into a text classification model and outputting a classification result of the text information. The classification result is determined based on N groups of node features, the N groups of node features are determined based on an N-head self-attention mechanism and the graph data, the N groups of node features correspond to the N attention heads, and N is an integer greater than 1. The method thus attends to node features from N subspaces, which improves the node feature extraction capability, raises the probability that low-frequency phrases receive attention, alleviates the feature matrix sparsity problem, and improves the accuracy of text classification.

Description

Text classification method, text classification model training method and related equipment
Technical Field
The application belongs to the technical field of natural language processing, and particularly relates to a text classification method, a text classification model training method and related equipment.
Background
Text classification is one of the most basic and important tasks in natural language processing (NLP). Its main function is to assign a sentence or a text to a category. For example, it may determine whether the news category of a text is politics, sports, military or society, whether its emotion category is positive or negative, and whether its comment category is a good, neutral or bad comment.
In practical applications, the problem of a sparse feature matrix is often encountered, that is, some words appear with low frequency yet carry important features. For example, in emotion classification, words such as "cheerful" and "intoxicated" appear less frequently than words such as "happy" and "glad", and words expressing sorrow such as "desolate" and "mournful" appear less frequently than words such as "sad" and "upset". As a result, conventional text classification algorithms achieve low text classification accuracy.
Disclosure of Invention
The embodiment of the application aims to provide a text classification method, a text classification model training method and related equipment, which are used for improving the accuracy of text classification.
In a first aspect, an embodiment of the present application provides a text classification method, where the method includes:
acquiring text information to be classified, wherein the text information comprises sentences and phrases;
constructing a heterogeneous graph according to the sentences and the phrases, wherein nodes in the heterogeneous graph are composed of the sentences and the phrases;
inputting graph data corresponding to the heterogeneous graph into a text classification model, and outputting a classification result of the text information; the classification result is determined based on N groups of node features, the N groups of node features are determined based on an N-head self-attention mechanism and the graph data, the N groups of node features correspond to the N attention heads, and N is an integer greater than 1.
Therefore, in the embodiment of the application, the text information to be classified is converted into graph data of a heterogeneous graph, so that the feature data of each phrase and each sentence in the text information is extracted, which makes it convenient to subsequently predict the node classification of the graph data based on the text classification model and obtain the classification result of the text information. Meanwhile, an N-head self-attention mechanism is adopted in the text classification model, and N graph convolution calculations can be carried out on the graph data, so that the features of the nodes in N different spaces (namely the N groups of node features) can be attended to, the probability that low-frequency phrases receive attention is improved, the feature matrix sparsity problem is alleviated, and the text classification accuracy is improved.
In a second aspect, an embodiment of the present application provides a method for training a text classification model, where the method includes:
acquiring P text samples, wherein the P text samples comprise sentences and phrases, and the sentences carry real text types;
constructing P heterogeneous graphs according to the P text samples;
performing iterative training on the text classification model according to the graph data corresponding to the P heterogeneous graphs until the loss value of a preset loss function reaches the minimum value;
the preset loss function is used for calculating a loss value between the real text type and a predicted text type of each text sample, the predicted text type is determined based on N groups of node features, the N groups of node features are determined based on an N-head self-attention mechanism and the graph data corresponding to the P heterogeneous graphs, the N groups of node features correspond to the N attention heads, and P and N are integers greater than 1.
Therefore, in the embodiment of the application, the model to be trained is iteratively trained according to the graph data corresponding to the P heterogeneous graphs, so that the predicted text type of the trained text classification model is close to the real text type, which ensures the accuracy of the prediction result of the trained text classification model. In addition, in the iterative training process, the predicted text type is obtained by performing N graph convolution calculations on the corresponding graph data based on an N-head self-attention mechanism, so that the features of the nodes in N different spaces (namely the N groups of node features) are attended to, the probability that low-frequency phrases receive attention is improved, the trained text classification model alleviates the feature matrix sparsity problem, and the accuracy of text classification is further improved.
In a third aspect, an embodiment of the present application provides a text classification apparatus, where the apparatus includes:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring text information to be classified, and the text information comprises sentences and phrases;
the first construction module is used for constructing a heterogeneous graph according to the sentences and the phrases, and nodes in the heterogeneous graph are composed of the sentences and the phrases;
the first processing module is used for inputting the graph data corresponding to the heterogeneous graph into a text classification model and outputting the classification result of the text information; the classification result is determined based on N groups of node features, the N groups of node features are determined based on an N-head self-attention mechanism and the graph data, the N groups of node features correspond to the N attention heads, and N is an integer greater than 1.
In a fourth aspect, an embodiment of the present application provides a text classification model training apparatus, where the apparatus includes:
a fifth obtaining module, configured to obtain P text samples, where each text sample in the P text samples includes a sentence and a phrase, and the sentence carries a real text type;
the second construction module is used for constructing P heterogeneous graphs according to the P text samples;
the training module is used for carrying out iterative training on a model to be trained according to the graph data corresponding to the P heterogeneous graphs;
the second processing module is used for obtaining a text classification model when the loss value of the preset loss function reaches the minimum value;
the preset loss function is used for calculating a loss value between the real text type and a predicted text type of each text sample, the predicted text type is determined based on N groups of node features, the N groups of node features are determined based on an N-head self-attention mechanism and the graph data corresponding to the P heterogeneous graphs, the N groups of node features correspond to the N attention heads, and P and N are integers greater than 1.
In a fifth aspect, the present application provides an electronic device, which includes a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method according to the first aspect, or the computer program, when executed by the processor, implements the steps of the method according to the second aspect.
In a sixth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the method according to the first aspect, or the computer program, when executed by the processor, implementing the steps of the method according to the second aspect.
Drawings
Fig. 1 is a flowchart of a text classification method provided in an embodiment of the present application;
fig. 2 is a schematic diagram of a heterogeneous graph corresponding to text information provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of a text classification model provided in an embodiment of the present application;
fig. 4 is a schematic diagram of a calculation flow of first-order neighborhood node features provided in the embodiment of the present application;
FIG. 5 is a flowchart of a text classification model training method according to an embodiment of the present application;
FIG. 6 is a block diagram of a text classification apparatus provided in an embodiment of the present application;
FIG. 7 is a block diagram of a text classification model training apparatus provided in an embodiment of the present application;
fig. 8 is a block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances, such that embodiments of the application may be practiced in sequences other than those illustrated or described herein. The terms "first", "second" and the like are generally used in a generic sense and do not limit the number of objects; for example, the first object can be one or more than one. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates that the related objects before and after it are in an "or" relationship.
The text classification method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
Referring to fig. 1, fig. 1 is a flowchart of a text classification method provided in an embodiment of the present application. As shown in fig. 1, the text classification method specifically includes the following steps:
step 101, obtaining text information to be classified, wherein the text information comprises sentences and phrases.
Specifically, the text information to be classified may include, but is not limited to, chinese text information to be classified, english text information, or other forms of text information. The text information may include sentences and phrases. The number of the sentences and phrases in the text information to be classified may be one or more. The phrase may be obtained by performing word segmentation processing on a sentence, or may be another phrase independent of the sentence, and the present application is not particularly limited.
In practical application, the text information to be classified can be acquired in a targeted manner according to different application scenes. For example, when the method is applied to a news classification scene, a certain piece of news can be acquired as text information to be classified to determine the type of the piece of news; when the method is applied to a resume classification scene, a certain resume can be acquired as text information to be classified to determine the type of the resume; when the method is applied to a mail classification scene, a certain mail can be acquired as text information to be classified to determine the type of the mail; when the method is applied to an office document classification scene, a certain office document can be acquired as text information to be classified to determine the type of the office document; when the method is applied to a user attribute mining scene of the application data of the smart phone, the user data of each application program on the smart phone can be acquired as text information to be classified to determine the user attributes and the like.
Step 102, constructing a heterogeneous graph according to the sentences and the phrases, wherein nodes in the heterogeneous graph are composed of the sentences and the phrases.
In particular, the heterogeneous graph may also be referred to as a topological graph, and includes nodes and edges. The nodes in the heterogeneous graph are composed of sentences and phrases: each sentence or each phrase corresponds to one node, and there is an edge between the node corresponding to each phrase and the node corresponding to the sentence to which the phrase belongs. For example, if the text information has V1 sentences and V2 phrases, and the V2 phrases belong to the V1 sentences and are unique, the number of nodes of the heterogeneous graph corresponding to the text information is V1 + V2, and the number of edges is V2.
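As an illustration of this construction, the following is a minimal sketch in Python with NumPy (the helper name and node ordering are assumptions of this sketch, not part of the application) of how sentence nodes, phrase nodes and their edges could be assembled into an adjacency matrix:

```python
import numpy as np

def build_hetero_graph(sentences, sentence_phrases):
    """Sketch: assemble phrase nodes, sentence nodes and an unweighted adjacency
    matrix; `sentence_phrases` maps a sentence index to its list of phrases."""
    phrases = sorted({p for ps in sentence_phrases.values() for p in ps})
    nodes = phrases + list(sentences)            # phrase nodes first, then sentence nodes
    index = {n: i for i, n in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for s_idx, ps in sentence_phrases.items():
        s_node = index[sentences[s_idx]]
        for p in ps:                             # one edge per (phrase, sentence) pair
            A[index[p], s_node] = A[s_node, index[p]] = 1.0
    return nodes, A

# V1 = 2 sentences, V2 = 3 unique phrases -> V1 + V2 = 5 nodes and V2 = 3 edges
nodes, A = build_hetero_graph(["S1", "S2"], {0: ["W1", "W2"], 1: ["W3"]})
print(len(nodes), int(A.sum() / 2))              # 5 3
```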
Step 103, inputting graph data corresponding to the heterogeneous graph into a text classification model, and outputting a classification result of the text information; the classification result is determined based on N groups of node features, the N groups of node features are determined based on an N-head self-attention mechanism and the graph data, the N groups of node features correspond to the N attention heads, and N is an integer greater than 1.
Specifically, the graph data corresponding to the heterogeneous graph includes, but is not limited to, an adjacency matrix, an initial feature matrix, and the like of the heterogeneous graph. The text classification model can be a new self-attention-based graph convolutional neural network (S-GCN for short), which can effectively classify text information without any prior information. Meanwhile, embeddings of phrases and sentences can be generated for downstream tasks. The model calculates attention scores through a self-attention mechanism, weights each node, and adopts a multi-head method to attend to different aspects of the node features, thereby improving the ability to capture the features of neighbor nodes.
The number of heads N of the self-attention mechanism may be 2, 3, 4, or any integer greater than 1, and the present application is not particularly limited. A single-head self-attention mechanism can learn the importance of each node in the heterogeneous graph. Specifically, weights can be calculated for all nodes in the heterogeneous graph through graph convolution calculation, important nodes are selected, attention to key node features is realized, and the influence of useless or noise nodes is weakened. In this embodiment, the text classification model employs an N-head self-attention mechanism, which is equivalent to N independent single-head self-attention mechanisms; each head can learn the importance of each node from a different aspect, yielding N groups of node features. The N groups of node features represent the node features attended to from N angles, which improves the probability that low-frequency vocabulary receives attention.
In this embodiment, the text information to be classified may be converted into graph data of a heterogeneous graph, so as to extract the feature data of each phrase and each sentence in the text information, which facilitates the subsequent prediction of node classification over the graph data based on the text classification model and obtaining the classification result of the text information. Meanwhile, an N-head self-attention mechanism is adopted in the text classification model, and N graph convolution calculations can be carried out on the graph data, so that the features of the nodes in N different spaces (namely the N groups of node features) can be attended to, the probability that low-frequency phrases receive attention is improved, the feature matrix sparsity problem is alleviated, and the text classification accuracy is improved.
Optionally, the heterogeneous graph includes a first node formed by a phrase and a second node formed by a sentence; the text classification method may further include the steps of:
acquiring a word vector of a first node, wherein the word vector is used for indicating the importance degree of a word group corresponding to the first node in a sentence corresponding to a second node;
determining the optimal hyper-parameter of the logistic regression algorithm according to the word vector;
obtaining a regression coefficient corresponding to the first node according to the optimal hyper-parameter;
determining the graph data according to the regression coefficients.
Specifically, the first node is a node formed by a phrase, and may be one node or a plurality of nodes; the second node is a node formed by a sentence, and may be one node or a plurality of nodes, which is not specifically limited in this application.
In an embodiment, a logistic regression (LR) algorithm may be used to calculate the regression coefficient corresponding to the first node, and the absolute value of the calculated regression coefficient is used as the weight of the edge between the first node and the second node to obtain the graph data. Specifically, the word vector of the first node in the heterogeneous graph may be obtained by a term frequency-inverse document frequency (TF-IDF) algorithm. The word vector can be used for measuring the importance of the phrase in the sentence and realizing text vectorization. The word vector is then input into a preset parameter tuning algorithm, and the hyper-parameters of the logistic regression algorithm are adjusted to determine the optimal hyper-parameters of the logistic regression algorithm. The preset parameter tuning algorithm here may include, but is not limited to, a grid search cross validation (GridSearchCV) algorithm, a random search (RandomizedSearchCV) algorithm, and the like. The regression coefficient corresponding to the first node is then obtained through the logistic regression algorithm with the optimal hyper-parameters. Finally, the absolute value of the regression coefficient is taken as the weight of the corresponding edge between the first node and the second node, and an adjacency matrix A, namely the graph data corresponding to the heterogeneous graph, is generated. It should be noted that the larger the absolute value of the regression coefficient is, the more important the node features of the corresponding node are.
It should be noted that, when a phrase corresponding to a certain first node is included in a sentence corresponding to a certain second node, an edge exists between the first node and the second node. For example, assume that the heterogeneous graph corresponding to a certain piece of text information to be analyzed is as shown in fig. 2, where the first nodes in the heterogeneous graph include W1, W2, and W3, and the second nodes include S1, S2, and S3. W1, W2, and W3 respectively represent different phrases in the text information, and S1, S2, and S3 respectively represent different sentences in the text information; the sentences S1 and S2 both include the phrase W2, and the phrases W1 and W3 are included in the sentence S3. Therefore, the word vectors of the phrase W2 in the sentences S1 and S2 and the word vectors of the phrases W1 and W3 in the sentence S3 can be obtained in the above manner, the optimal hyper-parameters of the logistic regression algorithm are determined from these 4 word vectors, and then the regression coefficients corresponding to the first nodes W1, W2 and W3 are obtained according to the optimal hyper-parameters and used as the weights of the 4 edges in the heterogeneous graph, so as to obtain the graph data.
In this embodiment, the text information is converted into the heterogeneous graph, and the graph data of the heterogeneous graph is obtained, so that graph convolution calculation can subsequently be conveniently performed on the graph data of the heterogeneous graph through the text classification model to determine the type of the text information. The text classification problem is thereby converted into a classification problem over the nodes of the heterogeneous graph, which improves the accuracy of text classification.
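As one possible realization of this edge-weighting step, the following sketch uses scikit-learn (an assumption of this sketch; the toy corpus, label assignment and C grid are illustrative, not taken from the application):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy corpus: one row per sentence, with a class label per labeled sentence.
sentences = ["W1 W2", "W2 W4", "W3 W5", "W1 W3"]
labels = [0, 0, 1, 1]

# TF-IDF word vectors indicate how important each phrase is within each sentence.
vectorizer = TfidfVectorizer(token_pattern=r"\S+")
X = vectorizer.fit_transform(sentences)                     # (num_sentences, num_phrases)

# Grid-search the logistic-regression hyper-parameters, then read the coefficients.
search = GridSearchCV(LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1.0, 10.0]}, cv=2)
search.fit(X, labels)
coef = search.best_estimator_.coef_[0]                      # one regression coefficient per phrase

# |coefficient| is used as the weight of every edge between that phrase node and
# the sentence nodes whose sentences contain the phrase.
edge_weight = dict(zip(vectorizer.get_feature_names_out(), np.abs(coef)))
print(edge_weight)
```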
Optionally, the text classification model comprises an embedding layer, a first graph convolutional neural network and a second graph convolutional neural network; the step 103 of inputting the graph data corresponding to the heterogeneous graph into the text classification model and outputting the classification result of the text information may specifically include the following steps:
inputting the graph data into the embedding layer, and outputting an initial feature matrix corresponding to each node in the heterogeneous graph;
inputting the graph data and the initial feature matrix into a first graph convolution neural network, and outputting first-order neighborhood node features, wherein the first-order neighborhood node features are determined and obtained based on N groups of node features;
and inputting the graph data and the first-order neighborhood node characteristics into a second graph convolution neural network, and outputting second-order neighborhood node characteristics, wherein the second-order neighborhood node characteristics are used for indicating a classification result.
In an embodiment, a schematic structural diagram of the text classification model is shown in fig. 3; the text classification model includes an embedding layer, a first graph convolutional neural network, and a second graph convolutional neural network. The embedding layer is used for outputting an initial feature matrix corresponding to the input graph data. The initial feature matrix defaults to an identity matrix of dimension V × V, where V is the total number of nodes in the heterogeneous graph. The first and second graph convolutional neural networks may be used for aggregation of adjacent node features, that is, for feature extraction over adjacent nodes. They differ in that the first graph convolutional neural network is a multi-head self-attention-based graph convolutional neural network that outputs first-order neighborhood node features from the input graph data and the initial feature matrix, while the second graph convolutional neural network is a single-head self-attention-based graph convolutional neural network that outputs second-order neighborhood node features from the input graph data and the first-order neighborhood node features.
In practical applications, when the order of the graph convolutional neural network is greater than 2, the effect of feature extraction is not significantly improved, and stacking many graph convolution layers easily introduces noise when processing a heterogeneous graph. Therefore, the text classification model in this embodiment may extract only the 1st-order and 2nd-order neighborhood node features. The first graph convolutional neural network aggregates the information of the nodes directly adjacent to a central node, namely the first-order neighborhood node features; the second graph convolutional neural network further aggregates the information of the neighbors of those adjacent nodes, namely the second-order neighborhood node features.
In this embodiment, the first-order neighborhood node features and the second-order neighborhood node features of each node in the heterogeneous graph can be respectively obtained through the first graph convolutional neural network and the second graph convolutional neural network, which improves the effect of node feature extraction, so that the model can predict the classification result of the text information more accurately.
Optionally, the first graph convolutional neural network includes an N-dimensional graph convolution layer, a one-dimensional graph convolution layer, and a fully connected layer; the step of inputting the graph data and the initial feature matrix into the first graph convolutional neural network and outputting the first-order neighborhood node features may specifically include the following steps:
inputting the graph data and the initial characteristic matrix into an N-dimensional graph convolution layer, and outputting N groups of node weights; the N groups of node weights are determined and obtained based on N-head self-attention mechanisms, the graph data and the initial characteristic matrix, and the N-head self-attention mechanisms are used for carrying out N times of graph convolution calculation on the graph data and the initial characteristic matrix;
inputting the graph data and the initial characteristic matrix into a one-dimensional graph convolution layer, and outputting intermediate neighborhood node characteristics;
performing weighted calculation on each node weight in the N groups of node weights and the characteristics of the intermediate neighborhood nodes respectively to determine N groups of node characteristics;
splicing the N groups of node characteristics;
inputting the spliced N groups of node features into the fully connected layer, and outputting the first-order neighborhood node features.
Referring to fig. 4, fig. 4 is a schematic diagram of a calculation flow of first-order neighborhood node features provided in the embodiment of the present application. As shown in fig. 4, the first graph convolutional neural network includes an N-dimensional graph convolution layer, a one-dimensional graph convolution layer, and a fully connected layer. The N-dimensional graph convolution layer performs graph convolution calculation on the input graph data and the initial feature matrix based on the N-head self-attention mechanism, and outputs N groups of node weights $\hat{Z}^{(l+1)} = \{\hat{z}_1^{(l+1)}, \hat{z}_2^{(l+1)}, \ldots, \hat{z}_N^{(l+1)}\}$, which are calculated as follows:

$$\hat{Z}^{(l+1)} = \mathrm{TopK}\Big(\sigma\big(\tilde{A} H^{(l)} W_{att}\big)\Big) \qquad (1)$$

wherein $\hat{Z}^{(l+1)}$ denotes the set of N groups of node weights $\hat{z}_i^{(l+1)} \in \mathbb{R}^{V}$, $i = 1, 2, \ldots, N$, at layer $l+1$; N denotes the total number of self-attention heads of the N-dimensional graph convolution layer, and V denotes the total number of nodes in the heterogeneous graph. In this embodiment, since only the first layer uses the N-dimensional graph convolution layer, $l = 0$, and $H^{(l)}$ represents the input features at layer 0, i.e. $H^{(0)} = X$, where X is the initial feature matrix corresponding to the nodes in the heterogeneous graph, $X \in \mathbb{R}^{V \times V}$, defaulting to an identity matrix. $\tilde{A}$ is the normalization of the adjacency matrix A, i.e. $\tilde{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$, where D is the degree matrix of A, $D_{ii} = \sum_{j} A_{i,j}$. $W_{att}$ is an initial parameter of the N-dimensional graph convolution layer, with dimension V × N. The TopK function selects the K nodes with the largest weight values and sets the weights of the other nodes to 0. The activation function σ may be the Tanh function, used for nonlinear stretching of the weights. The graph convolution calculation of formula (1) computes weights for all nodes in the heterogeneous graph and selects the important nodes, realizing attention to key node features and weakening the influence of useless or noisy nodes.
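A minimal NumPy sketch of formula (1) as reconstructed above (the symmetric normalization, Tanh activation and TopK behaviour follow the description; array names and the random test data are assumptions):

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (assumes no isolated nodes)."""
    d_inv_sqrt = A.sum(axis=1) ** -0.5
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def topk(weights, k):
    """Keep the k largest weights in each column (attention head), zero the rest."""
    out = np.zeros_like(weights)
    idx = np.argsort(weights, axis=0)[-k:]
    np.put_along_axis(out, idx, np.take_along_axis(weights, idx, axis=0), axis=0)
    return out

rng = np.random.default_rng(0)
V, N = 6, 4                                   # nodes, attention heads
A = rng.random((V, V))
A = (A + A.T) / 2                             # toy symmetric weighted adjacency
X = np.eye(V)                                 # initial feature matrix: identity, V x V
W_att = rng.standard_normal((V, N))           # initial parameter W_att, V x N

A_hat = normalize_adj(A)
Z_hat = topk(np.tanh(A_hat @ X @ W_att), k=2)   # formula (1): N groups of node weights, V x N
print(Z_hat.shape)                               # (6, 4)
```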
The one-dimensional graph convolution layer outputs the intermediate neighborhood node features $H^{(1)}$ from the input graph data and the initial feature matrix, calculated as follows:

$$H^{(1)} = \tilde{A} X W^{(0)} \qquad (2)$$

wherein X is the initial feature matrix corresponding to the nodes in the heterogeneous graph, $X \in \mathbb{R}^{V \times V}$, defaulting to an identity matrix; $\tilde{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ is the normalization of the adjacency matrix A, where D is the degree matrix of A, $D_{ii} = \sum_{j} A_{i,j}$; and $W^{(0)}$ is an initial parameter of the one-dimensional graph convolution layer with dimension V × F, i.e. $W^{(0)} \in \mathbb{R}^{V \times F}$, where V denotes the total number of nodes in the heterogeneous graph and F denotes the output dimension of the first graph convolutional neural network.
After the N groups of node weights $\hat{Z}^{(1)} = \{\hat{z}_1^{(1)}, \ldots, \hat{z}_N^{(1)}\}$ and the intermediate neighborhood node features $H^{(1)}$ are obtained, each of the N groups of node weights is used to weight the intermediate neighborhood node features $H^{(1)}$, yielding the N groups of node features $Z_1^{(1)}, Z_2^{(1)}, \ldots, Z_N^{(1)}$, calculated as follows:

$$Z_i^{(1)} = \hat{z}_i^{(1)} \odot H^{(1)}, \quad i = 1, 2, \ldots, N \qquad (3)$$

wherein $\odot$ denotes the inner-product (weighting) calculation, and $Z_1^{(1)}, Z_2^{(1)}, \ldots, Z_N^{(1)}$ respectively represent the node features obtained from the different attention heads.

After the N groups of node features are calculated, the spliced N groups of node features $Z_{concat}^{(1)}$ can be obtained by splicing them according to the following formula:

$$Z_{concat}^{(1)} = \mathrm{Concat}\big(Z_1^{(1)}, Z_2^{(1)}, \ldots, Z_N^{(1)}\big) \qquad (4)$$

where Concat represents the merging or splicing of the heads.

The spliced N groups of node features $Z_{concat}^{(1)}$ are input into the fully connected layer FC, and an affine transformation is carried out through the fully connected layer to restore the dimension of the node features to that of $H^{(1)}$, obtaining the first-order neighborhood node features $Z^{(1)}$.
In this embodiment, the N-dimensional graph convolution layer attends to information from different subspaces, that is, different phrase nodes (or sentence nodes) are attended to each time, which improves the node feature extraction capability. The different subspaces here mainly result from the calculation of formula (1): the N columns of differently initialized parameters in $W_{att}$ form N attention heads at different angles, from which the N groups of node weights are obtained, so that there is a certain probability that some low-frequency words receive attention. Meanwhile, the first graph convolutional neural network has low computational complexity, ensuring that the whole model is mainly based on matrix calculation.
Optionally, the step of inputting the spliced N groups of node features into the fully-connected layer and outputting the first-order neighborhood node features may include the following steps:
performing affine transformation on the spliced N groups of node characteristics;
residual error calculation is carried out on the N groups of node characteristics after affine transformation and the intermediate neighborhood node characteristics;
and determining the N groups of node characteristics after residual error calculation as first-order neighborhood node characteristics.
Specifically, the first-order neighborhood node features $Z^{(1)}$ may be calculated by the following formula:

$$Z^{(1)} = \mathrm{ReLU}\big(Z_{concat}^{(1)} W_{fc}\big) + H^{(1)} \qquad (5)$$

wherein ReLU is the activation function, and $W_{fc}$ is the initial parameter of the fully connected layer with dimension NF × F, i.e. $W_{fc} \in \mathbb{R}^{NF \times F}$, where N represents the total number of self-attention heads of the N-dimensional graph convolution layer and F represents the output dimension of the first graph convolutional neural network.

In formula (5), the spliced N groups of node features $Z_{concat}^{(1)}$ obtained by formula (4) are affine-transformed by $W_{fc}$ so that their dimension is converted back to the original size, i.e. the dimension of $H^{(1)}$. Meanwhile, the residual calculation in formula (5) fuses the selected important node features with the unselected node features, so that the information of non-important nodes is retained and not lost.
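Putting formulas (1) to (5) together, the first graph convolutional neural network could be sketched as follows (a PyTorch sketch under the stated assumptions; class and parameter names are hypothetical, and the weighting in formula (3) is interpreted here as row-wise scaling of H(1) by each head's node weights):

```python
import torch
import torch.nn as nn

class FirstGCNLayer(nn.Module):
    """Sketch of formulas (1)-(5): N-head node weights, one-dimensional graph
    convolution, per-head weighting, concatenation, affine transform, residual."""
    def __init__(self, V, F, N, keep_ratio=0.02):
        super().__init__()
        self.W_att = nn.Parameter(torch.randn(V, N))     # formula (1): V x N
        self.W0 = nn.Parameter(torch.randn(V, F))        # formula (2): V x F
        self.fc = nn.Linear(N * F, F, bias=False)        # formula (5): NF x F
        self.N, self.k = N, max(1, int(keep_ratio * V))

    def forward(self, A_hat, X):
        # formula (1): N groups of node weights with Tanh + TopK
        scores = torch.tanh(A_hat @ X @ self.W_att)                    # V x N
        kept = torch.zeros_like(scores)
        idx = scores.topk(self.k, dim=0).indices
        kept.scatter_(0, idx, scores.gather(0, idx))
        # formula (2): intermediate neighborhood features H1
        H1 = A_hat @ X @ self.W0                                       # V x F
        # formula (3): weight H1 by each head's node weights
        heads = [kept[:, i:i + 1] * H1 for i in range(self.N)]         # N tensors of V x F
        # formula (4): concatenate the N groups of node features
        Z_cat = torch.cat(heads, dim=1)                                # V x (N*F)
        # formula (5): affine transform back to F dimensions + residual with H1
        return torch.relu(self.fc(Z_cat)) + H1                         # V x F

V, F, N = 6, 8, 4
layer = FirstGCNLayer(V, F, N)
Z1 = layer(torch.rand(V, V), torch.eye(V))
print(Z1.shape)   # torch.Size([6, 8])
```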
It should be noted that, after the first-order neighborhood node features $Z^{(1)}$ are obtained through the first graph convolutional neural network, the first-order neighborhood node features $Z^{(1)}$ and the adjacency matrix A may be input into the second graph convolutional neural network to obtain the second-order neighborhood node features $Z^{(2)}$. The calculation formula adopted is as follows:

$$Z^{(2)} = \mathrm{softmax}\big(\tilde{A} Z^{(1)} W^{(1)}\big) \qquad (6)$$

wherein softmax is the activation function, and $W^{(1)}$ is the initial parameter of the second graph convolutional neural network with dimension F × E, i.e. $W^{(1)} \in \mathbb{R}^{F \times E}$, where F denotes the output dimension of the first graph convolutional neural network and E denotes the output dimension of the second graph convolutional neural network.
In this embodiment, nodes are selected according to their importance, and the learning of important nodes is strengthened. In addition, owing to the multi-head method, a rare word is generally assigned a large weight with a certain probability, that is, words with a low frequency of occurrence also receive attention, and the influence of the feature sparsity problem is weakened.
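A corresponding sketch of the single-head second layer of formula (6) (again PyTorch, with hypothetical names); each row of its output gives a predicted class distribution for the corresponding node, so the predicted text type of a sentence node is the argmax of its row:

```python
import torch
import torch.nn as nn

class SecondGCNLayer(nn.Module):
    """Sketch of formula (6): single-head graph convolution mapping first-order
    features Z1 (V x F) to per-node class probabilities Z2 (V x E) via softmax."""
    def __init__(self, F, E):
        super().__init__()
        self.W1 = nn.Parameter(torch.randn(F, E))   # F x E

    def forward(self, A_hat, Z1):
        return torch.softmax(A_hat @ Z1 @ self.W1, dim=1)

E = 2
Z2 = SecondGCNLayer(F=8, E=E)(torch.rand(6, 6), torch.rand(6, 8))
print(Z2.argmax(dim=1))   # predicted class index for each node
```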
Optionally, after the step 101 of acquiring the text information to be classified, the method further includes the following steps:
carrying out data cleaning on the text information;
performing word segmentation processing on the text information after data cleaning;
acquiring a target phrase in the text information after word segmentation, wherein the target phrase is a phrase with a word frequency larger than a preset threshold value in the text information;
the step 102 of constructing the heterogeneous graph according to the sentences and the phrases includes: constructing the heterogeneous graph according to the sentences and the target phrases.
Specifically, data cleaning processes the various kinds of dirty data in the text information to obtain standard, clean and continuous data; for example, garbled characters, erroneous text, and the like in the text information are removed. Word segmentation refers to the process of dividing a sentence consisting of continuous characters into independent phrases according to certain rules; it can be implemented based on a dictionary word segmentation algorithm or a statistics-based machine learning algorithm. After word segmentation, stop words and phrases whose word frequency is less than or equal to a preset threshold can be removed, and the remaining phrases are used as the target phrases to participate in the construction of the heterogeneous graph. This helps further reduce the computation of the model and improve the prediction efficiency of text classification.
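A possible preprocessing sketch in Python (jieba is used here only as one example of a dictionary-based segmenter, and the stop-word list and frequency threshold are placeholders, not values from the application):

```python
import re
from collections import Counter

import jieba   # one possible dictionary-based Chinese word segmenter (assumption)

STOP_WORDS = {"的", "了", "是"}          # illustrative stop-word list
MIN_FREQ = 2                             # illustrative word-frequency threshold

def clean(text):
    """Data cleaning: drop garbled/non-text characters, collapse whitespace."""
    text = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9，。！？,.!? ]", "", text)
    return re.sub(r"\s+", " ", text).strip()

def target_phrases(sentences):
    """Segment each cleaned sentence and keep phrases above the frequency threshold."""
    tokens = [w for s in sentences for w in jieba.lcut(clean(s)) if w not in STOP_WORDS]
    freq = Counter(tokens)
    return {w for w, c in freq.items() if c > MIN_FREQ}
```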
The embodiment of the application provides a text classification model training method. Referring to fig. 5, fig. 5 is a flowchart of a text classification model training method provided in the embodiment of the present application. The method may comprise the steps of:
step 501, obtaining P text samples, wherein each text sample in the P text samples comprises a sentence and a phrase, and the sentence carries a real text type;
step 502, constructing P heterogeneous graphs according to the P text samples;
step 503, performing iterative training on the model to be trained according to the graph data corresponding to the P heterogeneous graphs;
step 504, obtaining a text classification model under the condition that the loss value of a preset loss function reaches the minimum value; the preset loss function is used for calculating a loss value between the real text type and a predicted text type of each text sample, the predicted text type is determined based on N groups of node features, the N groups of node features are determined based on an N-head self-attention mechanism and the graph data corresponding to the P heterogeneous graphs, the N groups of node features correspond to the N attention heads, and P and N are integers greater than 1.
Before the text classification model is used to classify text information to be classified, the model to be trained needs to be trained to obtain the text classification model. Specifically, P text samples can be obtained, and each sentence in each text sample is labeled with its real text type by manual annotation. Moreover, P heterogeneous graphs may be constructed based on the P text samples; the specific construction method of the heterogeneous graphs is described in detail in the above embodiments and is not repeated here. The graph data corresponding to the P heterogeneous graphs are then input into the model to be trained for iterative training, and in the iterative training process the N groups of node features are determined based on the N-head self-attention mechanism and the graph data corresponding to the heterogeneous graphs, so as to obtain the predicted text type corresponding to the graph data. Therefore, the loss value between the real text type and the predicted text type after each iteration is calculated based on the preset loss function, and whether the convergence condition of the text classification model is met is judged.
Specifically, the preset loss function may include, but is not limited to, a 0-1 loss function, a perceptron loss function, a squared loss function, a hinge loss function, a logarithmic loss function, and the like. In one embodiment, the loss function may be defined using the classical cross entropy, calculated as follows:

$$\mathcal{L} = -\sum_{d \in D_{train}} \sum_{f=1}^{E} Y_{df} \ln Z_{df} + \mu \lVert \theta \rVert^{2} \qquad (7)$$

wherein $D_{train}$ represents the set of labeled sentence nodes in the text samples, $Y_{df}$ represents the real text type of a sentence node, $Z_{df}$ is the predicted text type, E denotes the output dimension of the second graph convolutional neural network, and $f = 1, 2, \ldots, E$. μ is a tuning parameter and θ generally refers to the initial parameters. Experiments show that adjusting μ can improve the performance of the model, but the model is sensitive to the choice of μ, so it defaults to 0.
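The loss of formula (7) could be written, under the reconstruction above, roughly as follows (PyTorch sketch; the regularization term and variable names are assumptions):

```python
import torch

def s_gcn_loss(Z2, Y, labeled_idx, params, mu=0.0):
    """Sketch of formula (7): cross entropy over the labeled sentence nodes plus an
    optional mu-weighted L2 term over the initial parameters (mu defaults to 0)."""
    log_probs = torch.log(Z2[labeled_idx] + 1e-12)          # Z2 rows are already softmaxed
    cross_entropy = -(Y[labeled_idx] * log_probs).sum()
    l2 = sum((p ** 2).sum() for p in params)
    return cross_entropy + mu * l2

# Example: 2 labeled sentence nodes, E = 2 classes, Y one-hot over the classes.
Z2 = torch.softmax(torch.rand(6, 2), dim=1)
Y = torch.zeros(6, 2)
Y[3, 0] = Y[4, 1] = 1.0
print(s_gcn_loss(Z2, Y, torch.tensor([3, 4]), params=[torch.rand(4, 2)]))
```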
According to the embodiment of the application, iterative training can be performed on the model to be trained according to the graph data corresponding to the P heterogeneous graphs, so that the predicted text type of the trained text classification model is close to the real text type, which guarantees the accuracy of the prediction result of the trained text classification model. In addition, in the iterative training process, the predicted text type is obtained by performing N graph convolution calculations on the corresponding graph data based on an N-head self-attention mechanism, so that the features of the nodes in N different spaces (namely the N groups of node features) are attended to, the probability that low-frequency phrases receive attention is improved, the trained text classification model alleviates the feature matrix sparsity problem, and the accuracy of text classification is further improved.
Optionally, the text classification model training method specifically includes the following steps:
determining the parameter value of the initial parameter of the model to be trained after each iterative training;
in the step 504, obtaining the text classification model when the loss value of the preset loss function reaches the minimum value specifically includes the following steps:
and under the condition that the loss value of the preset loss function reaches the minimum value, determining the parameter value obtained by the current iteration as the parameter value of the initial parameter to obtain a text classification model.
Specifically, the initial parameters may be the initial parameters $W_{att}$, $W^{(0)}$, $W_{fc}$, $W^{(1)}$ and the like used in the above formulas (1) to (6).
In the process of iteratively training the model to be trained with the text samples, a preset tuning algorithm is required to continuously adjust and optimize the parameter values of the initial parameters $W_{att}$, $W^{(0)}$, $W_{fc}$, $W^{(1)}$ in the model to be trained. The preset tuning algorithm may be an adaptive moment estimation (Adam) algorithm or another tuning algorithm. When the loss value of the preset loss function reaches the minimum value, the parameter values obtained by the current iteration are determined, according to the preset tuning algorithm, as the target parameter values of the initial parameters, so as to obtain the text classification model. When the text classification model is subsequently used to classify text information to be classified, the text information to be classified can be predicted based on the target parameter values of the initial parameters and the input graph data, so as to obtain the classification result of the text information.
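For illustration, a simplified end-to-end training loop with Adam might look like the following (a single-head stand-in for S-GCN, not the full model; the shapes, labels and node indices are illustrative, while the learning rate 0.005 matches the experimental setting below):

```python
import torch
import torch.nn as nn

class TinySGCN(nn.Module):
    """Single-head simplification of S-GCN, used only to illustrate the training loop."""
    def __init__(self, V, F, E):
        super().__init__()
        self.W0 = nn.Parameter(torch.randn(V, F) * 0.1)   # first-layer parameter
        self.W1 = nn.Parameter(torch.randn(F, E) * 0.1)   # second-layer parameter

    def forward(self, A_hat, X):
        H1 = torch.relu(A_hat @ X @ self.W0)
        return A_hat @ H1 @ self.W1                       # class logits per node

V, F, E = 6, 8, 2
A_hat, X = torch.rand(V, V), torch.eye(V)
labeled_idx = torch.tensor([3, 4])       # indices of the labeled sentence nodes
labels = torch.tensor([0, 1])            # their real text types

model = TinySGCN(V, F, E)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)   # Adam tuning algorithm

for epoch in range(200):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(A_hat, X)[labeled_idx], labels)
    loss.backward()
    optimizer.step()
# In practice, the parameter values at the minimum loss are kept as the trained model.
print(loss.item())
```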
In actual use, four data sets may be employed to verify the validity of the text classification model S-GCN in this application: one short-text data set, Bank, which reflects the behavior of self-loan users, and three public short-text data sets, MR, R8, and Ohsumed.
First, comparison experiments are carried out on the Bank data set, in which the text classification model in the application is compared with popular text classification and word embedding methods such as the long short-term memory network (LSTM) and FastText. In the experiments, the total amount of data is 2000, of which 200 are evaluation samples and 200 are unlabeled samples. The main parameters of the algorithm are set as follows: the output dimension F of the first graph convolutional neural network is 200, the output dimension E of the second graph convolutional neural network is 2, the learning rate is 0.005, the number of self-attention heads N is 8, and TopK retains 2% of the nodes. The results of the experiments are shown in the following table:
No.  Model          Accuracy  F1 value  AUC index
1    Embedding+FC   0.7150    0.7273    0.7150
2    LSTM+FC        0.7350    0.7464    0.7350
3    Attention+FC   0.7400    0.7373    0.7400
4    FastText       0.7300    0.7428    0.7300
5    Text-GCN       0.7487    0.7524    0.7488
6    S-GCN          0.7950    0.8000    0.7950
Table 1
As can be seen from the evaluation results in Table 1, model 1 does not consider the dependency between adjacent words and therefore performs worst. Models 2 and 3 take the word-to-word dependency into account and perform better than model 1. Model 4 can achieve good results in a shorter time. Model 5 works better than the previous algorithms owing to the nature of the graph structure. In contrast, the text classification model 6 in the application can well extract relatively rare and highly discriminative node features and has a stronger feature extraction capability. In the experiments, the text classification model in the application performs well under all indexes, which proves the effectiveness of the model.
Second, the text classification model S-GCN in the application and the existing Text-GCN model are applied to the MR, R8 and Ohsumed public data sets respectively to solve the text classification problem. For fairness, the text classification model S-GCN in the present application uses the same graph construction method as the existing Text-GCN model. The results of the experiments are shown in the following table:
No.  Model          Accuracy  F1 value  AUC index
1    Embedding+FC   0.7150    0.7273    0.7150
2    LSTM+FC        0.7350    0.7464    0.7350
3    Attention+FC   0.7400    0.7373    0.7400
4    FastText       0.7300    0.7428    0.7300
5    Text-GCN       0.7487    0.7524    0.7488
6    S-GCN          0.7950    0.8000    0.7950
Table 2
Through the above experiments, the text classification model S-GCN in the application can obtain better results on public data sets with word order. In the text classification task, traditional methods often encounter problems such as sparse features, which affect their performance to a certain extent. The present application proposes a graph neural network to solve the above problems. First, sentences and phrases are used as nodes to construct a heterogeneous graph, and the edges between sentence and phrase nodes are constructed by calculating regression coefficients, thereby building a graph for the corpus. A self-attention-based graph convolutional neural network is then proposed for graph representation learning and task learning on this graph: the network realizes attention to important nodes through a self-attention mechanism, adopts a multi-head method to improve the feature extraction capability, and finally realizes text classification by predicting the attributes of sentence nodes. The text classification model S-GCN in the application is a semi-supervised learning method, needs no prior information, has a simple model structure and high computational efficiency, and, after completing the text classification training task, can output the graph embeddings of sentences and words for downstream tasks.
Referring to fig. 6, fig. 6 is a structural diagram of a text classification apparatus provided in an embodiment of the present application, and as shown in fig. 6, the apparatus 600 includes:
a first obtaining module 601, configured to obtain text information to be classified, where the text information includes sentences and phrases;
a first construction module 602, configured to construct a heterogeneous graph according to the sentences and the phrases, where nodes in the heterogeneous graph are formed by the sentences and the phrases;
the first processing module 603 is configured to input the graph data corresponding to the heterogeneous graph into the text classification model, and output a classification result of the text information; the classification result is determined based on N groups of node features, the N groups of node features are determined based on an N-head self-attention mechanism and the graph data, the N groups of node features correspond to the N attention heads, and N is an integer greater than 1.
Optionally, the heterogeneous graph includes a first node formed by a phrase and a second node formed by a sentence; the apparatus 600 further comprises:
the second acquisition module is used for acquiring a word vector of the first node, and the word vector is used for indicating the importance degree of the phrase corresponding to the first node in the sentence corresponding to the second node;
the first determining module is used for determining the optimal hyper-parameter of the logistic regression algorithm according to the word vector;
the third obtaining module is used for obtaining a regression coefficient corresponding to the first node according to the optimal hyper-parameter;
and the second determination module is used for determining the graph data according to the regression coefficient.
Optionally, the text classification model comprises an embedding layer, a first atlas neural network and a second atlas neural network; the first processing module 603 includes:
the first processing submodule is used for inputting the graph data into the embedding layer and outputting an initial feature matrix corresponding to each node in the heterogeneous graph;
the second processing submodule is used for inputting the graph data and the initial feature matrix into the first graph convolution neural network and outputting first-order neighborhood node features, and the first-order neighborhood node features are determined and obtained based on the N groups of node features;
and the third processing submodule is used for inputting the graph data and the first-order neighborhood node characteristics into a second graph convolution neural network and outputting second-order neighborhood node characteristics, and the second-order neighborhood node characteristics are used for indicating a classification result.
Optionally, the first graph convolutional neural network includes an N-dimensional graph convolution layer, a one-dimensional graph convolution layer, and a fully connected layer; the second processing submodule includes:
the first processing unit is used for inputting the graph data and the initial characteristic matrix into the N-dimensional graph convolution layer and outputting N groups of node weights; the N groups of node weights are determined and obtained based on N-head self-attention mechanisms, the graph data and the initial characteristic matrix, and the N-head self-attention mechanisms are used for carrying out N times of graph convolution calculation on the graph data and the initial characteristic matrix;
the second processing unit is used for inputting the graph data and the initial characteristic matrix into the one-dimensional graph convolution layer and outputting the characteristics of the intermediate neighborhood nodes;
the determining unit is used for respectively carrying out weighted calculation on each node weight in the N groups of node weights and the characteristics of the intermediate neighborhood nodes to determine the N groups of node characteristics;
the splicing unit is used for splicing the N groups of node characteristics;
and the third processing unit is used for inputting the spliced N groups of node features into the fully connected layer and outputting the first-order neighborhood node features.
Optionally, the third processing unit is specifically configured to:
performing affine transformation on the spliced N groups of node characteristics;
residual error calculation is carried out on the N groups of node characteristics after affine transformation and the intermediate neighborhood node characteristics;
and determining the N groups of node characteristics after residual error calculation as first-order neighborhood node characteristics.
Optionally, the apparatus 600 further comprises:
the data cleaning module is used for cleaning data of the text information;
the word segmentation processing module is used for carrying out word segmentation processing on the text information after the data cleaning;
the fourth obtaining module is used for obtaining a target phrase in the text information after word segmentation processing, wherein the target phrase is a phrase in the text information, and the word frequency of the phrase is greater than a preset threshold value;
the first building module 602 is further configured to build the heterogeneous graph according to the sentences and the target phrases.
The text classification apparatus 600 can convert text information to be classified into graph data of a heterogeneous graph, so that the feature data of each phrase and each sentence in the text information can be extracted, which facilitates the subsequent prediction of node classification over the graph data based on the text classification model and obtaining the classification result of the text information. Meanwhile, an N-head self-attention mechanism is adopted in the text classification model, and N graph convolution calculations can be carried out on the graph data, so that the features of the nodes in N different spaces (namely the N groups of node features) can be attended to, the probability that low-frequency phrases receive attention is improved, the feature matrix sparsity problem is alleviated, and the text classification accuracy is improved.
Referring to fig. 7, fig. 7 is a block diagram of a text classification model training apparatus according to an embodiment of the present invention. As shown in fig. 7, the text classification model training apparatus 700 includes:
a fifth obtaining module 701, configured to obtain P text samples, where each text sample in the P text samples includes a sentence and a phrase, and the sentence carries a real text type;
a second constructing module 702, configured to construct P heterogeneous graphs according to the P text samples;
the training module 703 is configured to perform iterative training on the model to be trained according to the graph data corresponding to the P heterogeneous graphs;
a second processing module 704, configured to obtain a text classification model when a loss value of the preset loss function reaches a minimum value;
the preset loss function is used for calculating a loss value between the real text type and a predicted text type of each text sample; the predicted text type is determined based on N groups of node features, the N groups of node features are determined based on N-head self-attention mechanisms and the graph data corresponding to the P heterogeneous graphs, the N groups of node features correspond to the N-head attention mechanisms, and both P and N are integers greater than 1.
Optionally, the text classification model training apparatus 700 further includes:
the determining submodule is used for determining the parameter value of the initial parameter of the model to be trained after each iterative training;
the second processing module 704 is further configured to determine a parameter value obtained by current iteration as a target parameter value of the initial parameter under the condition that the loss value of the preset loss function reaches the minimum value, so as to obtain a text classification model.
The text classification model training device 700 of this embodiment may perform iterative training on the model to be trained according to the graph data corresponding to the P heterogeneous graphs, which ensures that the predicted text type of the trained text classification model is close to the real text type and thereby ensures the accuracy of the prediction results of the trained text classification model. In addition, during the iterative training, the predicted text type is obtained by performing N graph convolution calculations on the corresponding graph data based on the N-head self-attention mechanism, so that the features of the nodes in N different spaces (namely, N groups of node features) can be attended to, which improves the attention probability of low-frequency phrases, allows the trained text classification model to reduce the feature matrix sparsity problem, and further improves the accuracy of text classification.
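For illustration, the iterative training performed by the training module 703 and the parameter selection performed by the second processing module 704 could look like the following sketch, assuming a PyTorch model with the signature model(adj, feats), cross-entropy as the preset loss function, and an Adam optimizer; all of these are assumptions of the sketch rather than requirements of this embodiment.

```python
import copy
import torch
import torch.nn.functional as F

def train_text_classifier(model, graphs, labels, epochs=100, lr=1e-3):
    """Iterate over the graph data of the P heterogeneous graphs and keep the
    parameter values reached when the loss value is smallest."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_loss, best_state = float("inf"), None
    for _ in range(epochs):
        total = 0.0
        for (adj, feats), y in zip(graphs, labels):       # P text samples / heterogeneous graphs
            optimizer.zero_grad()
            logits = model(adj, feats)                    # predicted text type for each sentence node
            loss = F.cross_entropy(logits, y)             # loss between real and predicted text type
            loss.backward()
            optimizer.step()
            total += loss.item()
        if total < best_loss:                             # loss value reaches its minimum so far
            best_loss = total
            best_state = copy.deepcopy(model.state_dict())  # target parameter values
    if best_state is not None:
        model.load_state_dict(best_state)                 # trained text classification model
    return model
```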
Referring to fig. 8, fig. 8 is a structural diagram of an electronic device provided in an embodiment of the present invention. As shown in fig. 8, the electronic device 800 includes a processor 801, a memory 802, and a computer program stored in the memory 802 and executable on the processor 801, where the processor 801 and the memory 802 are coupled through a bus interface 803. When executed by the processor 801, the computer program implements the processes of the above text classification method embodiment or the processes of the above text classification model training method embodiment, and can achieve the same technical effects, which are not repeated here to avoid repetition.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the above-mentioned text classification method embodiment or each process of the above-mentioned text classification model training method embodiment, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here. The computer-readable storage medium includes, for example, a Read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (12)

1. A method of text classification, the method comprising:
acquiring text information to be classified, wherein the text information comprises sentences and phrases;
constructing a heterogeneous graph according to the sentences and the phrases, wherein nodes in the heterogeneous graph are composed of the sentences and the phrases;
inputting graph data corresponding to the heterogeneous graph into a text classification model, and outputting a classification result of the text information; the classification result is determined and obtained based on N groups of node features, the N groups of node features are determined and obtained based on N-head self-attention mechanisms and the graph data, the N groups of node features correspond to the N-head attention mechanisms, and N is an integer greater than 1.
2. The method of claim 1, wherein the heterogeneous graph comprises a first node formed by the phrase and a second node formed by the sentence; the method further comprises the following steps:
obtaining a word vector of the first node, wherein the word vector is used for indicating the importance degree of the word group corresponding to the first node in the sentence corresponding to the second node;
determining an optimal hyper-parameter of a logistic regression algorithm according to the word vector;
obtaining a regression coefficient corresponding to the first node according to the optimal hyper-parameter;
determining the graph data from the regression coefficients.
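A minimal sketch of the graph-data construction described in claim 2, assuming TF-IDF values as the word vectors and scikit-learn's LogisticRegressionCV as the logistic regression algorithm whose cross-validation selects the optimal hyper-parameter; the function name, parameters and the use of averaged absolute coefficients as edge weights are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegressionCV

def phrase_edge_weights(sentences, labels):
    """Word vectors (importance of each phrase in each sentence), an optimal
    logistic-regression hyper-parameter found from those vectors, and the
    resulting regression coefficients used to weight phrase-node edges."""
    vectorizer = TfidfVectorizer()
    word_vectors = vectorizer.fit_transform(sentences)       # word vector of each first node
    clf = LogisticRegressionCV(Cs=10, cv=3, max_iter=1000)   # cross-validated optimal hyper-parameter C
    clf.fit(word_vectors, labels)
    coefs = np.abs(clf.coef_).mean(axis=0)                   # regression coefficient per phrase
    return dict(zip(vectorizer.get_feature_names_out(), coefs))
```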
3. The method of claim 1, wherein the text classification model comprises an embedding layer, a first graph convolution neural network, and a second graph convolution neural network; the step of inputting the graph data corresponding to the heterogeneous graph into a text classification model and outputting the classification result of the text information includes:
inputting the graph data into the embedding layer, and outputting an initial feature matrix corresponding to each node in the heterogeneous graph;
inputting the graph data and the initial feature matrix into the first graph convolution neural network, and outputting first-order neighborhood node features, wherein the first-order neighborhood node features are determined and obtained based on the N groups of node features;
and inputting the graph data and the first-order neighborhood node characteristics into the second graph convolution neural network, and outputting second-order neighborhood node characteristics, wherein the second-order neighborhood node characteristics are used for indicating the classification result.
4. The method of claim 3, wherein the first graph convolution neural network comprises an N-dimensional graph convolution layer, a one-dimensional graph convolution layer, and a fully connected layer; inputting the graph data and the initial feature matrix into the first graph convolution neural network, and outputting a first-order neighborhood node feature, including:
inputting the graph data and the initial feature matrix into the N-dimensional graph convolution layer, and outputting N groups of node weights; the N groups of node weights are determined based on the N-head self-attention mechanism, the graph data and the initial feature matrix, and the N-head self-attention mechanism is used for carrying out N times of graph convolution calculation on the graph data and the initial feature matrix;
inputting the graph data and the initial characteristic matrix into the one-dimensional graph convolution layer and outputting intermediate neighborhood node characteristics;
performing weighted calculation on each node weight in the N groups of node weights and the intermediate neighborhood node characteristics respectively to determine N groups of node characteristics;
splicing the N groups of node characteristics;
inputting the spliced N groups of node characteristics into the full-connection layer, and outputting the first-order neighborhood node characteristics.
5. The method of claim 4, wherein the inputting the N sets of node features after the stitching into the fully-connected layer and outputting the first-order neighborhood node features comprises:
performing affine transformation on the spliced N groups of node characteristics;
residual error calculation is carried out on the N groups of node characteristics after affine transformation and the intermediate neighborhood node characteristics;
and determining the N groups of node characteristics after residual error calculation as the first-order neighborhood node characteristics.
6. The method of claim 1, wherein after obtaining the text information to be classified, the method further comprises:
performing data cleaning on the text information;
performing word segmentation processing on the text information after data cleaning;
acquiring a target phrase in the text information after word segmentation, wherein the target phrase is a phrase with a word frequency larger than a preset threshold value in the text information;
constructing a heterogeneous graph according to the sentence and the phrase, wherein the constructing the heterogeneous graph comprises the following steps: and constructing a heterogeneous graph according to the sentence and the target phrase.
7. A method for training a text classification model, the method comprising:
acquiring P text samples, wherein each text sample in the P text samples comprises a sentence and a phrase, and the sentence carries a real text type;
constructing P heterogeneous graphs according to the P text samples;
performing iterative training on a model to be trained according to the graph data corresponding to the P heterogeneous graphs;
under the condition that the loss value of the preset loss function reaches the minimum value, a text classification model is obtained;
the preset loss function is used for calculating a loss value between the real text type and a predicted text type of each text sample, the predicted text type is determined and obtained based on N groups of node characteristics, the N groups of node characteristics are determined and obtained based on N-head self-attention mechanisms and graph data corresponding to the P heterogeneous graphs, the N groups of node characteristics correspond to the N-head attention mechanisms, and both P and N are integers greater than 1.
8. The method of claim 7, further comprising:
determining the parameter value of the initial parameter of the model to be trained after each iterative training;
under the condition that the loss value of the preset loss function reaches the minimum value, obtaining a text classification model, including:
and under the condition that the loss value of the preset loss function reaches the minimum value, determining the parameter value obtained by current iteration as the target parameter value of the initial parameter to obtain a text classification model.
9. An apparatus for classifying text, the apparatus comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring text information to be classified, and the text information comprises sentences and phrases;
the first construction module is used for constructing a heterogeneous graph according to the sentences and the phrases, and nodes in the heterogeneous graph are composed of the sentences and the phrases;
the first processing module is used for inputting the graph data corresponding to the heterogeneous graph into a text classification model and outputting the classification result of the text information; the classification result is determined and obtained based on N groups of node features, the N groups of node features are determined and obtained based on N-head self-attention mechanisms and the graph data, the N groups of node features correspond to the N-head attention mechanisms, and N is an integer greater than 1.
10. An apparatus for training a text classification model, the apparatus comprising:
a fifth obtaining module, configured to obtain P text samples, where each text sample in the P text samples includes a sentence and a phrase, and the sentence carries a real text type;
the second construction module is used for constructing P heterogeneous graphs according to the P text samples;
the training module is used for carrying out iterative training on a model to be trained according to the graph data corresponding to the P heterogeneous graphs;
the second processing module is used for obtaining a text classification model when the loss value of the preset loss function reaches the minimum value;
the preset loss function is used for calculating a loss value between the real text type and a predicted text type of each text sample, the predicted text type is determined and obtained based on N groups of node characteristics, the N groups of node characteristics are determined and obtained based on N-head self-attention mechanisms and graph data corresponding to the P heterogeneous graphs, the N groups of node characteristics correspond to the N-head attention mechanisms, and P and N are integers greater than 1.
11. An electronic device comprising a processor, a memory and a computer program stored on the memory and being executable on the processor, the computer program, when executed by the processor, implementing the steps of the text classification method according to any one of claims 1 to 6 or the computer program, when executed by the processor, implementing the steps of the text classification model training method according to any one of claims 7 to 8.
12. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps of the text classification method according to one of the claims 1 to 6, or which computer program, when being executed by the processor, carries out the steps of the text classification model training method according to one of the claims 7 to 8.
CN202110600745.1A 2021-05-31 2021-05-31 Text classification method, text classification model training method and related equipment Withdrawn CN113220886A (en)

Priority Applications (1)

Application Number: CN202110600745.1A; Priority Date / Filing Date: 2021-05-31; Title: Text classification method, text classification model training method and related equipment

Publications (1)

Publication Number: CN113220886A (en); Publication Date: 2021-08-06

Family ID: 77082024

Country Status (1): CN 113220886 A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553433A (en) * 2021-09-17 2021-10-26 平安科技(深圳)有限公司 Product classification method, device, medium and terminal equipment based on artificial intelligence
CN113553433B (en) * 2021-09-17 2022-01-07 平安科技(深圳)有限公司 Product classification method, device, medium and terminal equipment based on artificial intelligence
CN113987152A (en) * 2021-11-01 2022-01-28 北京欧拉认知智能科技有限公司 Knowledge graph extraction method, system, electronic equipment and medium
CN113987152B (en) * 2021-11-01 2022-08-12 北京欧拉认知智能科技有限公司 Knowledge graph extraction method, system, electronic equipment and medium
CN114332872A (en) * 2022-03-14 2022-04-12 四川国路安数据技术有限公司 Contract document fault-tolerant information extraction method based on graph attention network
CN114692780A (en) * 2022-04-19 2022-07-01 北京百度网讯科技有限公司 Entity information classification method, classification model training method, device and electronic equipment
CN114741429A (en) * 2022-04-20 2022-07-12 西安电子科技大学 Web API (application program interface) associated pattern mining method based on graph neural network
CN114817538A (en) * 2022-04-26 2022-07-29 马上消费金融股份有限公司 Training method of text classification model, text classification method and related equipment
CN114817538B (en) * 2022-04-26 2023-08-08 马上消费金融股份有限公司 Training method of text classification model, text classification method and related equipment
CN114996434A (en) * 2022-08-08 2022-09-02 深圳前海环融联易信息科技服务有限公司 Information extraction method and device, storage medium and computer equipment
CN116562168A (en) * 2023-06-09 2023-08-08 岳阳融盛实业有限公司 Electric power informatization data mining system and method based on deep learning

Similar Documents

Publication Publication Date Title
CN113220886A (en) Text classification method, text classification model training method and related equipment
US20220014807A1 (en) Method, apparatus, device and medium for generating captioning information of multimedia data
CN110807154A (en) Recommendation method and system based on hybrid deep learning model
CN107085581A (en) Short text classification method and device
CN111783474A (en) Comment text viewpoint information processing method and device and storage medium
CN111914185B (en) Text emotion analysis method in social network based on graph attention network
CN110188195B (en) Text intention recognition method, device and equipment based on deep learning
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN110879938A (en) Text emotion classification method, device, equipment and storage medium
CN109189889B (en) Bullet screen recognition model establishing method, device, server and medium
CN114780831A (en) Sequence recommendation method and system based on Transformer
CN112685539A (en) Text classification model training method and device based on multi-task fusion
CN114528374A (en) Movie comment emotion classification method and device based on graph neural network
CN112215629B (en) Multi-target advertisement generating system and method based on construction countermeasure sample
CN114036298B (en) Node classification method based on graph convolution neural network and word vector
CN111709225A (en) Event cause and effect relationship judging method and device and computer readable storage medium
CN114119191A (en) Wind control method, overdue prediction method, model training method and related equipment
CN112529637B (en) Service demand dynamic prediction method and system based on context awareness
CN114722896A (en) News topic discovery method fusing neighbor topic map
CN114494809A (en) Feature extraction model optimization method and device and electronic equipment
CN113178189A (en) Information classification method and device and information classification model training method and device
WO2021159099A1 (en) Searching for normalization-activation layer architectures
CN113094504A (en) Self-adaptive text classification method and device based on automatic machine learning
CN111611379A (en) Text information classification method, device, equipment and readable storage medium
CN112463964A (en) Text classification and model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210806