CN114817538B - Training method of text classification model, text classification method and related equipment
- Publication number: CN114817538B
- Application number: CN202210443637.2A
- Authority: CN (China)
- Prior art keywords: document, information, nodes, classification, word
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The application discloses a training method for a text classification model, a text classification method, and related devices. The training method comprises the following steps: performing text classification processing on predicted document samples through a pre-training classification network based on feature information of a first relationship graph to obtain first classification reference information, wherein the first relationship graph comprises document nodes corresponding to training document samples, document nodes corresponding to the predicted document samples, word nodes corresponding to the words contained in each document sample, and connecting edges between the nodes; determining semantic guidance information of the first relationship graph through a graph neural network based on the feature information and structural information of the first relationship graph, and performing text classification processing on the predicted document samples based on the semantic guidance information, the feature information, the structural information, and the class labels corresponding to the training document samples to obtain second classification reference information; and optimizing network parameters of the graph neural network based on the first classification reference information, the second classification reference information, and the class labels corresponding to the predicted document samples.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a training method for a text classification model, a text classification method, and related devices.
Background
The text classification task is a fundamental task in the field of natural language processing (Natural Language Processing, NLP) and is widely applied in e-commerce, finance, and other businesses.
At present, conventional text classification relies on models built on architectures such as convolutional neural networks (Convolutional Neural Network, CNN) and recurrent neural networks (Recurrent Neural Network, RNN); such a model is trained on a corpus so that it acquires the ability to perform classification tasks. However, these models perform poorly: they converge slowly during training, and the trained text classification model cannot classify text quickly and accurately.
Therefore, a solution capable of improving the convergence speed and recognition accuracy of the text classification model is needed.
Disclosure of Invention
The embodiment of the application provides a training method of a text classification model, a text classification method and related equipment, which are used for improving the convergence speed and the recognition accuracy of the text classification model.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical scheme:
in a first aspect, an embodiment of the present application provides a training method for a text classification model, including:
based on a document sample set, acquiring structural information and feature information of a first relationship graph; the document sample set comprises training document samples and predicted document samples, the first relationship graph comprises a plurality of nodes and connecting edges between the nodes, and the plurality of nodes comprise document nodes corresponding to the training document samples, document nodes corresponding to the predicted document samples, and word nodes corresponding to the words contained in the document sample set;
performing text classification processing on the predicted document sample based on the characteristic information through a pre-training classification network in a text classification model to obtain first classification reference information of the predicted document sample;
determining semantic guidance information of the first relation graph based on the feature information and the structural information through a graph neural network in the text classification model;
performing text classification processing on the predicted document sample through the graph neural network based on the semantic guidance information, the characteristic information, the structural information and the class label corresponding to the training document sample to obtain second classification reference information of the predicted document sample;
and optimizing network parameters of the graph neural network based on the first classification reference information, the second classification reference information, and the class labels corresponding to the predicted document samples.
It can be seen that, in the embodiment of the present application, the document samples and the words they contain are treated as nodes in a relationship graph, and the associations between them are represented by connecting edges between the nodes, so that the text classification task is cast as a node classification task.
On this basis, a text classification model architecture comprising a pre-training classification network and a graph neural network is adopted. The pre-training classification network learns the feature information of the relationship graph to perform text classification processing on the predicted document samples. The graph neural network learns the feature information and structural information of the relationship graph, making full use of the associations between nodes, the node representations of the nodes, and the class labels corresponding to the training document samples to classify the predicted document samples. The classification reference information obtained by the two networks, together with the class labels corresponding to the predicted document samples, is then used to optimize the network parameters of the graph neural network, so that the learning capabilities and classification reference information of the two networks are fully fused, further improving the performance of the text classification model.
In addition, when the graph neural network performs text classification processing, semantic guidance information of the relationship graph is determined based on the feature information and structural information of the relationship graph, and the predicted document samples are then classified based on the feature information, the structural information, the semantic guidance information, and the class labels corresponding to the training document samples. Because the semantic guidance information reflects the semantics of the document samples and the words they contain, it provides semantic guidance for the classification of the predicted document samples: the graph neural network can infer the node representations of the document nodes corresponding to the predicted document samples from the node representations of the document nodes corresponding to the training document samples, and can focus on important node representations rich in semantic information, so that it converges quickly, improving the convergence speed of the text classification model.
In a second aspect, an embodiment of the present application provides a text classification method, including:
based on the target document set, obtaining structural information and characteristic information of the second relation graph; the target document set comprises a document to be classified and a classified document, the second relation graph comprises a plurality of nodes and connecting edges between the nodes, and the nodes comprise document nodes corresponding to the document to be classified, document nodes corresponding to the classified document and word nodes corresponding to words contained in the documents in the target document set;
performing text classification processing on the document to be classified based on the feature information through a pre-training classification network in a text classification model to obtain first classification reference information of the document to be classified;
determining semantic guidance information of the second relationship graph based on the feature information and the structural information through a graph neural network in the text classification model;
performing text classification processing on the document to be classified through a graph neural network in the text classification model based on the semantic guidance information, the feature information, the structural information, and the category to which the classified document belongs, to obtain second classification reference information of the document to be classified;
and determining the category of the document to be classified based on the first classification reference information and the second classification reference information of the document to be classified.
It can be seen that, in the embodiment of the present application, the document to be classified, the classified documents, and the words contained in each document are treated as nodes in a relationship graph, with the associations between them represented by connecting edges, so that the text classification task is cast as a node classification task. Because the relationship graph carries richer information than the document itself, performing the classification task based on the feature information and structural information of the relationship graph allows the text classification model to acquire richer knowledge, which helps improve classification accuracy. On this basis, a text classification model architecture comprising a pre-training classification network and a graph neural network is adopted: the pre-training classification network learns the feature information of the relationship graph to classify the document to be classified, while the graph neural network learns the feature information and structural information of the relationship graph, making full use of the associations between nodes, the node representations of the nodes, and the categories to which the classified documents belong. The classification reference information obtained by the two networks is then combined to determine the category to which the document to be classified belongs, fully fusing the learning capabilities and prediction results of the two networks and further improving classification accuracy. In addition, when the graph neural network performs text classification processing, semantic guidance information of the relationship graph is determined based on its feature information and structural information, and the document to be classified is then classified based on the feature information, the structural information, the semantic guidance information, and the categories to which the classified documents belong. Because the semantic guidance information reflects the semantics of each document and the words it contains, it provides semantic guidance for classifying the document to be classified: the graph neural network can infer the node representations of unclassified nodes from the node representations of classified nodes in the relationship graph, and can focus on important node representations rich in semantic information, so that classification results are obtained quickly and text classification efficiency is improved.
In a third aspect, an embodiment of the present application provides a training device for a text classification model, including:
the acquisition unit is used for acquiring structural information and feature information of the first relationship graph based on the document sample set; the document sample set comprises training document samples and predicted document samples, the first relationship graph comprises a plurality of nodes and connecting edges between the nodes, the plurality of nodes comprise document nodes corresponding to the training document samples, document nodes corresponding to the predicted document samples, and word nodes corresponding to the words contained in the document sample set, the structural information is used to represent the edge weights corresponding to the connecting edges in the first relationship graph, and the feature information comprises the node representations of the nodes in the first relationship graph;
the classification unit is used for carrying out text classification processing on the predicted document sample based on the characteristic information through a pre-training classification network in the text classification model to obtain first classification reference information of the predicted document sample;
a semantic processing unit for determining semantic guidance information of the first relationship graph based on the feature information and the structure information through a graph neural network in the text classification model;
the classification unit is further used for performing text classification processing on the predicted document sample based on the semantic guidance information, the feature information, the structural information, and the class label corresponding to the training document sample, through the graph neural network in the text classification model, to obtain second classification reference information of the predicted document sample;
and the optimizing unit is used for optimizing the network parameters of the graph neural network based on the first classification reference information, the second classification reference information, and the class labels corresponding to the predicted document samples.
In a fourth aspect, an embodiment of the present application provides a text classification apparatus, including:
the acquisition unit is used for acquiring structural information and characteristic information of the second relation graph based on the target document set; the target document set comprises a document to be classified and a classified document, the second relation graph comprises a plurality of nodes and connecting edges between the nodes, and the nodes comprise document nodes corresponding to the document to be classified, document nodes corresponding to the classified document and word nodes corresponding to words contained in the documents in the target document set;
the classification unit is used for carrying out text classification processing on the document to be classified based on the characteristic information through a pre-training classification network in the text classification model to obtain first classification reference information of the document to be classified;
a semantic processing unit, used for determining semantic guidance information of the second relationship graph based on the feature information and the structural information through a graph neural network in the text classification model;
the classification unit is further used for performing text classification processing on the document to be classified based on the semantic guidance information, the feature information, the structural information, and the category to which the classified document belongs, through the graph neural network in the text classification model, to obtain second classification reference information of the document to be classified;
the classification unit is used for determining the category to which the document to be classified belongs based on the first classification reference information and the second classification reference information of the document to be classified.
In a fifth aspect, embodiments of the present application provide an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method according to the first or second aspect.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium storing instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the method of the first or second aspect.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of a training method of a text classification model according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a first relationship graph according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a text classification model according to an embodiment of the present application;
FIG. 4 is a diagram illustrating performance comparisons of different text classification models;
FIG. 5 is a flow chart of a text classification method according to an embodiment of the present application;
FIG. 6 is a flow chart of a text classification method according to another embodiment of the present application;
FIG. 7 is a schematic structural diagram of a training device for text classification models according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a text classification device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the purposes, technical solutions, and advantages of the present application clearer, the technical solutions of the present application will be described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments that can be obtained by one of ordinary skill in the art without undue burden from the present disclosure are within the scope of the present disclosure.
The terms "first," "second," and the like in the description and in the claims, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein. Furthermore, in the present specification and claims, "and/or" means at least one of the connected objects, and the character "/" generally means a relationship in which the associated object is an "or" before and after.
Because the number of training document samples labeled with class labels is limited, a text classification model based on a CNN or RNN architecture cannot be guaranteed to learn enough useful feature information from the document samples, which affects its classification accuracy. To address this, the embodiments of the present application treat each document sample and the words it contains as nodes in a relationship graph and represent the associations between them by connecting edges, so that the text classification task becomes a node classification task. On this basis, a text classification model architecture comprising a pre-training classification network and a graph neural network is adopted. The pre-training classification network learns the feature information of the relationship graph to perform text classification processing on the predicted document samples; the graph neural network learns the feature information and structural information of the relationship graph, making full use of the associations between nodes, the node representations of the nodes, and the class labels corresponding to the training document samples. The classification reference information obtained by the two networks, together with the class labels corresponding to the predicted document samples, is then used to optimize the network parameters of the graph neural network, fully fusing the learning capabilities and classification reference information of the two networks and further improving the performance of the text classification model. In addition, when the graph neural network performs text classification processing, semantic guidance information of the relationship graph is determined based on its feature information and structural information, and the predicted document samples are then classified based on the feature information, the structural information, the semantic guidance information, and the class labels corresponding to the training document samples. Because the semantic guidance information reflects the semantics of the document samples and the words they contain, it provides semantic guidance for classifying the predicted document samples: the graph neural network can infer the node representations of the document nodes corresponding to the predicted document samples from the node representations of the document nodes corresponding to the training document samples, and can focus on important node representations rich in semantic information, so that it converges quickly, improving the convergence speed of the text classification model.
Based on the above training method for the text classification model, the embodiment of the application further provides a text classification method, with which text can be classified quickly and accurately using the trained text classification model.
It should be understood that the training method and the text classification method for the text classification model provided in the embodiments of the present application may be performed by an electronic device or software installed in the electronic device. The electronic devices referred to herein may include terminal devices such as smartphones, tablet computers, notebook computers, desktop computers, intelligent voice interaction devices, intelligent home appliances, smart watches, vehicle terminals, aircraft, etc.; alternatively, the electronic device may further include a server, such as an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides a cloud computing service.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a training method of a text classification model according to an embodiment of the present application is provided, and the method may include the following steps:
s102, based on the document sample set, acquiring structural information and characteristic information of the first relation diagram.
Wherein the first relationship graph includes a plurality of nodes and connecting edges between the nodes. Specifically, the plurality of nodes include document nodes corresponding to the training document samples, document nodes corresponding to the predicted document samples, and word nodes corresponding to the words contained in the document samples in the document sample set. A connecting edge between two nodes indicates that an association exists between them. The connecting edges in the first relationship graph include first-type connecting edges, which represent associations between word nodes, and second-type connecting edges, which represent associations between document nodes and word nodes.
In the embodiment of the present application, the first relationship graph may be established in any suitable manner, selected according to actual needs, which is not limited here. In an optional implementation, to accurately reflect the associations between different document samples and between document samples and words, the training method may further include, before S102, establishing the first relationship graph based on the training document samples and the predicted document samples in the document sample set, as follows. First, a word set corresponding to the document sample set is determined based on the words contained in the document sample set, i.e., the words in the training document samples and the words in the predicted document samples. Then, for each word in the word set, a word node is created, and a first-type connecting edge is created between different word nodes that satisfy a first edge-creation condition: if two words appear together in the same document sample, their word nodes satisfy the first edge-creation condition; if they never appear in the same document sample, their word nodes do not. Next, for each document sample in the document sample set (whether a training document sample or a predicted document sample), a document node is created, and a second-type connecting edge is created between each document node and word node that satisfy a second edge-creation condition: if a document sample contains a word, the corresponding document node and word node satisfy the second edge-creation condition; if it does not contain the word, they do not. Thus, a first-type connecting edge indicates that the words corresponding to the connected word nodes are associated, and a second-type connecting edge indicates a containment relationship between the document sample corresponding to the connected document node and the word corresponding to the word node.
For example, suppose the document sample set includes three document samples: document sample 1 contains word A; document sample 2 contains words B and C; and document sample 3 contains words C and D. The word set corresponding to the document sample set is then {A, B, C, D}, and by the above implementation the first relationship graph shown in FIG. 2 can be established, in which dashed lines between word nodes represent first-type connecting edges and solid lines between word nodes and document nodes represent second-type connecting edges. It should be noted that in the embodiment of the present application a document may include, for example but not limited to, sentences, paragraphs, and the like. A minimal sketch of this graph construction is given below.
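The following is a minimal, illustrative sketch of this edge-creation procedure, not the patent's reference implementation; the helper names and the toy document set are assumptions chosen to mirror the example above.

```python
from itertools import combinations

def build_relationship_graph(documents):
    """Build a heterogeneous text graph: one node per document, one per word.

    Edges follow the two edge-creation conditions described above:
      - first-type  (word, word): the two words co-occur in some document
      - second-type (doc, word):  the document contains the word
    """
    doc_nodes = [f"doc{i + 1}" for i in range(len(documents))]
    word_set = sorted({w for doc in documents for w in doc})

    first_type_edges = set()
    second_type_edges = set()
    for i, doc in enumerate(documents):
        for w in set(doc):
            second_type_edges.add((doc_nodes[i], w))   # containment relation
        for w1, w2 in combinations(sorted(set(doc)), 2):
            first_type_edges.add((w1, w2))             # co-occurrence relation

    return doc_nodes, word_set, first_type_edges, second_type_edges

# Toy set mirroring the example: doc1 = {A}, doc2 = {B, C}, doc3 = {C, D}
docs = [["A"], ["B", "C"], ["C", "D"]]
nodes, words, ww_edges, dw_edges = build_relationship_graph(docs)
print(words)      # ['A', 'B', 'C', 'D']
print(ww_edges)   # {('B', 'C'), ('C', 'D')}
print(dw_edges)   # document-word containment edges
```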
In this embodiment of the present application, the structural information of the first relationship graph refers to information reflecting the structural features of the graph (such as the associations between the document samples or words corresponding to the nodes), and may specifically include the edge weight corresponding to each connecting edge, covering both the first-type and the second-type connecting edges. The edge weight of a connecting edge reflects the degree of association between the nodes it connects: a larger edge weight indicates a closer association.
In an alternative implementation, consider that documents of the same category usually contain the same words; for example, documents with positive sentiment usually contain positive words such as "good" and "like". So that the structural information of the first relationship graph can accurately reflect the associations between document samples and words and between different words, allowing the text classification model to learn the differences between documents of different categories, the commonalities between documents of the same category, and the influence of a document's words on its category, the acquisition of the structural information in S102 may be implemented as follows:
(1) For a first-type connecting edge between word nodes: based on the probability that a first word and a second word in the word set each appear in the document sample set and the probability that the two words appear in the same document sample, determine the pointwise mutual information (PMI) between the first word and the second word; then determine the edge weight of the first-type connecting edge between the word node corresponding to the first word and the word node corresponding to the second word based on this pointwise mutual information.
Specifically, the probability of each word appearing in the document sample set may be determined as the ratio between the number of times the word appears in the document samples and the total number of words contained in the word set corresponding to the document sample set. The probability that the first word and the second word appear in the same document sample may be determined as the ratio between the number of times the two words appear in the same document sample and the total number of words in that word set. The pointwise mutual information between the first word and the second word can then be used directly as the edge weight of the first-type connecting edge between their word nodes.
The pointwise mutual information between two words represents the correlation between them: the larger it is, the more correlated the two words are. It may be determined in a manner commonly used in the art, for example

PMI(i, j) = log( p(i, j) / (p(i) * p(j)) )

where PMI(i, j) denotes the pointwise mutual information between the first word and the second word, p(i) denotes the probability that the first word appears in the document sample set, p(j) denotes the probability that the second word appears in the document sample set, and p(i, j) denotes the probability that the two words appear in the same document sample.
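As an illustration only (the function and variable names are assumptions, not from the patent), the PMI edge weights could be estimated from document-level co-occurrence counts as follows:

```python
import math
from itertools import combinations

def pmi_edge_weights(documents):
    """Compute PMI(i, j) for every pair of words that co-occur in a document.

    Probabilities are estimated from document-level co-occurrence counts,
    one simple reading of the estimation described above.
    """
    n_docs = len(documents)
    word_count = {}   # number of documents containing each word
    pair_count = {}   # number of documents containing both words
    for doc in documents:
        words = set(doc)
        for w in words:
            word_count[w] = word_count.get(w, 0) + 1
        for w1, w2 in combinations(sorted(words), 2):
            pair_count[(w1, w2)] = pair_count.get((w1, w2), 0) + 1

    weights = {}
    for (w1, w2), c in pair_count.items():
        p_i = word_count[w1] / n_docs
        p_j = word_count[w2] / n_docs
        p_ij = c / n_docs
        pmi = math.log(p_ij / (p_i * p_j))
        if pmi > 0:  # keep only positively correlated pairs as edges
            weights[(w1, w2)] = pmi
    return weights
```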
(2) For a second-type connecting edge between a word node and a document node: based on the probability that a target word in the word set appears in the target document sample and the number of document samples in the document sample set that contain the target word, determine the importance of the target word to the target document sample, and determine the edge weight of the second-type connecting edge between the word node corresponding to the target word and the document node corresponding to the target document sample based on that importance.
In particular, the importance of a word to a document may be expressed by its term frequency-inverse document frequency (Term Frequency-Inverse Document Frequency, TF-IDF). The term frequency (TF) of a word is the frequency with which the word appears in the document; the inverse document frequency (IDF) reflects how common the word is and, in the embodiment of the present application, may be determined based on the number of document samples containing the word and the total number of document samples in the document sample set. The product of the term frequency and the inverse document frequency then gives the word's TF-IDF. Finally, the importance of the target word to the target document sample can be used as the edge weight of the second-type connecting edge between the word node corresponding to the target word and the document node corresponding to the target document sample.
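A minimal sketch of this weighting, assuming the standard TF-IDF formulation with a logarithmic IDF (the exact IDF variant is not specified in this section):

```python
import math

def tfidf_edge_weight(word, doc, documents):
    """Edge weight for a (document, word) second-type connecting edge.

    tf  = frequency of the word within this document
    idf = log(total documents / documents containing the word)
    """
    tf = doc.count(word) / len(doc)
    df = sum(1 for d in documents if word in d)
    idf = math.log(len(documents) / df)
    return tf * idf

docs = [["A"], ["B", "C"], ["C", "D"]]
print(tfidf_edge_weight("C", docs[1], docs))  # weight of the doc2-C edge
```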
In this embodiment of the present application, the feature information of the first relationship graph refers to information for reflecting features of each node in the first relationship graph, and may specifically include node representations of each node in the first relationship graph. The node representation of the document node refers to a representation vector of the document feature of the document sample corresponding to the document node, and the node representation of the word node refers to a representation vector of the word feature of the word corresponding to the word node. The node representations of the nodes in the first relationship graph may be obtained by various technical means in the art, which is not limited in this embodiment of the present application.
S104, performing text classification processing on the predicted document sample based on the feature information of the first relationship graph through a pre-training classification network in the text classification model to obtain first classification reference information of the predicted document sample.
As shown in FIG. 3, the text classification model in the embodiment of the present application includes a pre-training classification network, i.e., a pre-trained network with text classification capability. Because the feature information of the first relationship graph reflects the features of the document samples and the words they contain, the pre-training classification network performs text classification processing on the predicted document sample based on this feature information to obtain the first classification reference information of the predicted document sample, where the first classification reference information may include a feature vector representing the classification result of the predicted document sample.
In this embodiment of the present application, the pre-training classification network may have any suitable structure, set according to actual needs, which is not limited here. In an alternative implementation, to increase its recognition accuracy, the pre-training classification network may include a language representation layer and a fully connected layer, as shown in FIG. 3.
The language representation layer is used to perform embedding processing on the feature information of the first relationship graph to obtain an embedding vector for each node in the graph. So that the embedding vectors fed into the fully connected layer contain rich semantic information, that is, so that the embedding vector of a word node better reflects the meaning of the corresponding word within its document sample and the embedding vector of a document node better reflects the true intent of the corresponding document sample, the language representation layer may adopt a pre-trained language model such as BERT (Bidirectional Encoder Representations from Transformers), which uses a Transformer-based bidirectional encoder architecture. When embedding the node representation of a document sample or word, BERT considers not only the document sample or word itself but also its context, so the resulting embedding vector carries richer semantic information, which helps the fully connected layer accurately identify the category to which the document sample belongs.
So that the pre-training classification network can perform classification recognition on large-scale sets of documents, the language representation layer may be configured to embed only the node representations of the document nodes in the first relationship graph and set the embedding vectors of the word nodes to zero vectors, that is,

X = [ X_doc ; 0 ] ∈ R^((n_doc + n_word) × d)

where X denotes the embedding matrix output by the language representation layer, n_doc denotes the number of document nodes in the first relationship graph, n_word denotes the number of word nodes, d denotes the dimension of the embedding vectors, X_doc denotes the embedding vectors of the document nodes, and 0 denotes the (zero) embedding vectors of the word nodes.
The fully connected layer serves as the output layer of the pre-training classification network and classifies the document samples based on the embedding vectors of the nodes in the first relationship graph to obtain the first classification reference information, namely

Z^(l) = Bert(X^(l)) = X^(l) W^(l)

where Z^(l) denotes the first classification reference information of the document samples, l denotes the layer index within the pre-training classification network, X^(l) denotes the feature vectors of the nodes in the first relationship graph, W^(l) ∈ R^(C×E) denotes the network parameters of the fully connected layer, C denotes the dimension of the node feature vectors, E denotes the dimension of the first classification reference information, R denotes the set of real numbers, and Bert denotes the fully connected layer. Thus, the fully connected layer acts as a fine-tuning head for the language representation layer, enabling the embedding vectors output by the language representation layer to be used for downstream tasks, such as the text classification task in the embodiment of the present application.
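For illustration, a minimal PyTorch-style sketch of such a pre-training classification network (a BERT language representation layer plus a fully connected output layer) might look as follows; the class and parameter names are assumptions, and the sketch follows the zero-vector convention for word nodes described above:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class PretrainedClassificationNetwork(nn.Module):
    """Language representation layer (BERT) + fully connected output layer."""

    def __init__(self, num_classes, n_word_nodes, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)  # language representation layer
        hidden = self.bert.config.hidden_size
        self.fc = nn.Linear(hidden, num_classes)           # fully connected layer
        self.n_word_nodes = n_word_nodes

    def forward(self, input_ids, attention_mask):
        # Embed document nodes only; the [CLS] vector represents each document.
        doc_emb = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state[:, 0]
        # Word nodes get zero embedding vectors, as in the formula above.
        word_emb = torch.zeros(self.n_word_nodes, doc_emb.size(1),
                               device=doc_emb.device)
        x = torch.cat([doc_emb, word_emb], dim=0)
        # First classification reference information for the document nodes.
        z_bert = self.fc(doc_emb)
        return x, z_bert
```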
S106, determining semantic guidance information of the first relationship graph based on the feature information and the structural information of the first relationship graph through a graph neural network in the text classification model.
As shown in FIG. 3, the text classification model in the embodiment of the application further includes a graph neural network, which performs the classification task using the topology of the graph and the content information of its nodes. In the embodiment of the present application, the structural information and feature information of the first relationship graph and the class labels corresponding to the training document samples are input into the graph neural network, so that it can learn the associations between the nodes of the first relationship graph and the node representations reflecting node features, master the associations between different document samples, the associations between document samples and words, and the features of the document samples and words, and then use the information about the training document samples to perform text classification processing on the predicted document samples, obtaining the second classification reference information.
In the embodiment of the application, to improve classification accuracy, the graph neural network may be a graph convolutional network (Graph Convolutional Network, GCN). A graph convolutional network includes convolution layers, for example graph convolution layers. A graph convolution layer performs a convolution operation on graph data: it learns a function mapping and uses it, together with the features of each node and of its neighboring nodes, to generate new features for each node, thereby propagating features between the nodes of the graph and allowing the categories of unclassified nodes to be identified from the features of classified nodes. In the embodiment of the application, for each node in the relationship graph, the convolution layer updates the node's features using the features of its neighboring nodes, so that the category to which a predicted document sample belongs can be identified using the node representations of the document nodes corresponding to the training document samples and the class labels corresponding to those samples.
Specifically, a convolution layer in the graph neural network can be expressed by the following formulas (1) and (2):

H^(l+1) = GCN(A, X^(l)) = σ( Ã X^(l) W^(l) )    (1)

A_ij = PMI(i, j) if i and j are word nodes connected by a first-type connecting edge; A_ij = TF-IDF(i, j) if i is a word node and j is a document node (or vice versa) connected by a second-type connecting edge; A_ij = 1 if i = j; A_ij = 0 otherwise    (2)

where H^(l+1) denotes the output of the convolution layer, l denotes the layer index of the convolution layer, GCN denotes the convolution operation, A denotes the adjacency matrix corresponding to the structural information of the first relationship graph, whose normalized form is Ã = D^(-1/2) A D^(-1/2), D denotes the degree matrix of the adjacency matrix, PMI(i, j) denotes the edge weight of the first-type connecting edge between the word nodes corresponding to the i-th and j-th words, TF-IDF(i, j) denotes the edge weight of the second-type connecting edge between the word node corresponding to the i-th word and the document node corresponding to the j-th document sample, X^(l) ∈ R^(V×C) denotes the input data of the convolution layer, V denotes the number of nodes in the first relationship graph, C denotes the dimension of the input data, W^(l) ∈ R^(C×F) denotes the network parameters of the convolution layer, F denotes the dimension of the output data, R denotes the set of real numbers, and σ denotes the activation function.
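A minimal sketch of formula (1) in PyTorch, assuming a precomputed normalized adjacency matrix; the names are illustrative:

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One graph convolution layer: H = sigma(A_hat @ X @ W), formula (1)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)  # W^(l)
        self.act = nn.ReLU()                                  # sigma

    def forward(self, a_hat, x):
        # a_hat: normalized adjacency D^(-1/2) A D^(-1/2), shape (V, V)
        # x:     node features X^(l), shape (V, C)
        return self.act(a_hat @ self.weight(x))
```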
Considering that the structural information and feature information of the relationship graph involve a large volume of data, the graph neural network has a great deal to learn, which lowers the convergence speed and affects the training efficiency of the text classification model. For this reason, the embodiment of the present application introduces semantic guidance information.
Specifically, the semantic guidance information of the first relationship graph expresses the semantic importance of each node to the predicted document samples, so that the convolution layer focuses on important node features rich in semantic information when propagating features between nodes, improving the convergence speed of the graph neural network, that is, the convergence speed of the text classification model and its training efficiency.
In an alternative implementation, the semantic guidance information of the first relationship graph may be determined by introducing a node-importance scoring mechanism into the graph neural network, in which case the semantic guidance information includes an importance score for each node in the first relationship graph. As shown in FIG. 3, the graph neural network may include a node scoring layer for determining the importance score of each node from the structural information and feature information of the first relationship graph based on a self-attention mechanism. It will be appreciated that an attention mechanism (Attention) can screen a small amount of important information out of a large amount of information and focus on it while ignoring the mostly unimportant remainder, and that the self-attention mechanism (Self-Attention), a variant of the attention mechanism, reduces the dependence on external information and better captures the internal relevance of data or features. Capturing the associations between nodes and the features of the nodes in the first relationship graph through self-attention therefore evaluates the importance of each node more accurately and objectively, i.e., the resulting importance scores are more accurate and objective, which helps the second convolution layer focus on the features of the important nodes in the first relationship graph and thus helps the whole graph neural network converge quickly during training.
More specifically, to ensure that the second convolution layer can focus on the features of important nodes and ignore the features of most unimportant nodes, further increasing the convergence speed of the graph neural network, S106 may be implemented as follows: the node scoring layer determines an attention score for each node in the first relationship graph from the structural information and feature information based on the self-attention mechanism; nodes whose attention scores satisfy a preset score condition are selected from the first relationship graph; a nonlinear transformation is applied to the attention scores of the selected nodes to obtain their importance scores; and the importance scores of the unselected nodes are set to a preset score.
The preset score may be set according to actual needs; for example, to ignore the features of unimportant nodes, it may be set to 0. The preset score condition may likewise be set according to actual needs; for example, it may require that the attention score rank among the top K.
For example, the importance scores of the nodes may be determined by the following formula (3):

S^(l+1) = tanh( topK( Attention(A, X^(l)) ) ),  Attention(A, X^(l)) = Ã X^(l) W_s^(l)    (3)

where S^(l+1) denotes the importance scores of the nodes, topK denotes selecting the K nodes with the highest attention scores (the scores of unselected nodes are set to the preset score), A denotes the adjacency matrix corresponding to the structural information of the first relationship graph, whose normalized form is Ã = D^(-1/2) A D^(-1/2), D denotes the degree matrix of the adjacency matrix, W_s^(l) denotes the network parameters of the node scoring layer, l denotes the layer index of the node scoring layer, X^(l) denotes the input data of the node scoring layer, Attention denotes the self-attention mechanism, and Attention(A, X^(l)) denotes the attention scores of the nodes.
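An illustrative sketch of such a node scoring layer; the tanh nonlinearity and the GCN-style attention projection are assumptions consistent with formula (3) as reconstructed above:

```python
import torch
import torch.nn as nn

class NodeScoringLayer(nn.Module):
    """Importance scores via self-attention over the graph, formula (3)."""

    def __init__(self, in_dim, k):
        super().__init__()
        self.w_s = nn.Linear(in_dim, 1, bias=False)  # W_s^(l)
        self.k = k

    def forward(self, a_hat, x):
        # Attention scores for all V nodes: A_hat @ X @ W_s, shape (V,)
        attn = (a_hat @ self.w_s(x)).squeeze(-1)
        # Keep the top-K scores; unselected nodes get the preset score 0.
        scores = torch.zeros_like(attn)
        topk = torch.topk(attn, self.k).indices
        scores[topk] = torch.tanh(attn[topk])  # nonlinear transformation
        return scores
```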
S108, performing text classification processing on the predicted document sample through the graph neural network in the text classification model based on the semantic guidance information, feature information, and structural information of the first relationship graph and the class labels corresponding to the training document samples, to obtain second classification reference information of the predicted document sample.
As shown in FIG. 3, the graph neural network may further include a first convolution layer and a second convolution layer, where the first convolution layer is connected to the second convolution layer. Accordingly, S108 may be specifically implemented as:
Step A1: using the first convolution layer, perform convolution processing on the structural information and feature information of the first relationship graph, based on the class labels corresponding to the training document samples, to obtain a first convolution result.
Specifically, the first convolution result may be expressed as H^(1) = GCN(A, X^(0)), where H^(1) denotes the first convolution result, GCN denotes the convolution operation, A denotes the adjacency matrix corresponding to the structural information of the first relationship graph, and X^(0) comprises the class labels corresponding to the training document samples and the feature information of the first relationship graph.
Step A2: perform fusion processing on the first convolution result and the semantic guidance information of the first relationship graph to obtain a fused convolution result.
Specifically, to enhance the important node features rich in semantic information so that they carry even richer information, a candidate convolution result for each node may be determined as the product of the node's importance score and the first convolution result; the candidate convolution results of the nodes are then fused with the first convolution result to obtain the fused convolution result of each node in the first relationship graph.
More specifically, the fused convolution result of each node can be determined by the following formula (4):

Z^(1) = ( H^(1) + H^(1) * S^(1) ) / 2    (4)

where Z^(1) denotes the fused convolution result of the nodes, S^(1) = Attention(A, X^(0)) denotes the importance scores of the nodes, Attention denotes the attention mechanism, A denotes the adjacency matrix corresponding to the structural information of the first relationship graph, X^(0) comprises the class labels corresponding to the training document samples and the feature information of the first relationship graph, H^(1) denotes the first convolution result of the nodes, H^(1) * S^(1) denotes the candidate convolution results of the nodes, and * denotes element-wise multiplication.
Step A3, inputting the fusion convolution result into the second convolution layer for convolution processing to obtain the second classification reference information.
The second classification reference information may be determined by formula (5):

Z^{(2)} = GCN(A, Z^{(1)})    (5)

where Z^{(2)} denotes the second classification reference information, Z^{(1)} denotes the fusion convolution result of the nodes, A denotes the adjacency matrix corresponding to the structural information of the first relation graph, and GCN denotes the graph convolution operation.
It can be understood that convolving the structural information and the characteristic information of the first relation graph with the first convolution layer, based on the class labels corresponding to the training document samples, is equivalent to updating the node representation of each node using the node representations of its neighbor nodes and the structural information of the first relation graph. Fusing the first convolution result with the semantic guidance information of the first relation graph to obtain the fusion convolution result is equivalent to introducing the importance score of each node into its node representation. Inputting the fusion convolution result into the second convolution layer for further convolution is equivalent to updating the node representations after the importance scores have been introduced, so that the graph neural network focuses on the representations of important nodes in the first relation graph when performing text classification processing, and the whole graph neural network converges quickly during training.
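Putting steps A1 to A3 together, the forward pass of this semantic-guided graph network might look like the following sketch, reusing the NodeScoreLayer sketched above. Plain dense normalized-adjacency convolutions are an assumption standing in for whatever GCN variant the embodiment actually uses.

```python
import torch
import torch.nn as nn

class SemanticGuidedGCN(nn.Module):
    """Two GCN layers with node-score-based fusion in between (steps A1-A3).

    A sketch under assumptions; layer sizes and the ReLU activation are
    illustrative choices.
    """

    def __init__(self, in_dim: int, hid_dim: int, num_classes: int, k: int):
        super().__init__()
        self.conv1 = nn.Linear(in_dim, hid_dim)       # first convolution layer
        self.conv2 = nn.Linear(hid_dim, num_classes)  # second convolution layer
        self.score = NodeScoreLayer(in_dim, k)        # node score layer (sketched earlier)

    def forward(self, a_norm: torch.Tensor, x0: torch.Tensor) -> torch.Tensor:
        h1 = torch.relu(self.conv1(a_norm @ x0))      # A1: H^{(1)} = GCN(A, X^{(0)})
        s1 = self.score(a_norm, x0)                   # S^{(1)} = Attention(A, X^{(0)})
        z1 = (h1 + h1 * s1.unsqueeze(-1)) / 2         # A2: fusion of formula (4)
        return self.conv2(a_norm @ z1)                # A3: second classification reference info
```

A design note: because the importance scores only reweight H^{(1)} rather than replace it, nodes outside the topK selection (score 0) still contribute through the averaged term, so unselected nodes keep propagating information.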
S110, optimizing network parameters of the graph neural network based on the first classification reference information, the second classification reference information and the class labels corresponding to the predicted document samples.
The network parameters of the graph neural network may include, but are not limited to, the number of neurons in each network layer of the graph neural network, the connection relationships and connection edge weights between neurons in different network layers, and the biases corresponding to the neurons in each network layer.
In the embodiment of the application, the difference between the classification reference information of a predicted document sample and its class label reflects the classification accuracy of the text classification model. Since the first classification reference information and the second classification reference information are obtained by different networks in the text classification model that learn and classify based on different information of the relation graph, combining them fully fuses the learning capabilities of the two networks, which improves the performance of the text classification model and enables it to classify and identify text accurately. Therefore, the first classification reference information and the second classification reference information of the predicted document sample can be combined with the class label corresponding to the predicted document sample to optimize the network parameters of the graph neural network.
Considering that the pre-training classification network and the graph neural network process data differently and differ in network size, directly combining the classification reference information they output can affect the convergence speed of the text classification model; in addition, the graph neural network operates on the whole relation graph, whereas the pre-training classification network may not be able to load the features of all nodes in the relation graph at once. Therefore, in an alternative implementation, S110 may be implemented as follows:
S1101, acquiring a first weight corresponding to the pre-training classification network and a second weight corresponding to the graph neural network.
S1102, multiplying the first classification reference information of the document sample by the first weight, multiplying the second classification reference information by the second weight, and fusing the multiplication results.
For example, S1102 may be specifically implemented by formula (6):

Z = (1 - λ) · Bert(X^{(0)}) + λ · GCN(A, Z^{(1)})    (6)

where Z denotes the result of the fusion processing; Bert(X^{(0)}) denotes the first classification reference information of the document samples, with Bert denoting the pre-training classification network and X^{(0)} denoting its input data; GCN(A, Z^{(1)}) denotes the second classification reference information of the document samples, with GCN denoting the graph convolution operation, A denoting the adjacency matrix corresponding to the structural information of the first relation graph, and Z^{(1)} denoting the input of the second convolution layer; Z ∈ R^{V×E}, where V denotes the number of nodes in the first relation graph and E denotes the dimension of the classification reference information; and 1-λ denotes the first weight and λ denotes the second weight, both of which can be set according to actual needs and are not limited in this embodiment.
S1103, based on the result of the fusion processing, a prediction category of the predicted document sample is determined.
Specifically, the category corresponding to the maximum classification probability indicated in the result after the fusion processing may be determined as the predicted category of the predicted document sample.
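For instance, S1102 and S1103 together reduce to a few tensor operations. In the sketch below, bert_logits and gcn_logits are assumed names for the two networks' raw outputs over the document samples; applying softmax before interpolation and the value λ = 0.7 are illustrative assumptions, not values fixed by this embodiment.

```python
import torch.nn.functional as F

lam = 0.7                                  # second weight; (1 - lam) is the first weight
z_bert = F.softmax(bert_logits, dim=-1)    # first classification reference information
z_gcn = F.softmax(gcn_logits, dim=-1)      # second classification reference information
z = (1 - lam) * z_bert + lam * z_gcn       # formula (6): result of the fusion processing
pred = z.argmax(dim=-1)                    # S1103: category with maximum classification probability
```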
S1104, determining the prediction loss of the graph neural network based on the prediction type of the prediction document sample, the type label corresponding to the prediction document sample and the preset loss function corresponding to the graph neural network.
Wherein the prediction loss is used to represent the deviation between the predicted category of the predicted document sample and the category label of the predicted document sample.
In practical application, the preset loss function may be set according to actual needs, which is not limited in the embodiment of the present application.
S1105, optimizing network parameters of the graph neural network based on the predicted loss of the graph neural network.
Illustratively, the back-propagation (BP) algorithm may be used, together with the prediction loss, to determine the prediction loss contributed by each network layer in the graph neural network; the network parameters of each network layer are then adjusted layer by layer with the goal of reducing the prediction loss.
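Continuing the names from the previous sketch, S1104 and S1105 amount to a standard loss-and-step sequence: here z is the fused result of formula (6), and model, predict_mask, and labels are assumed names for the graph neural network, a boolean mask selecting the document nodes of the predicted document samples, and their class labels. Negative log-likelihood over the fused probabilities (i.e., cross-entropy) is assumed as the preset loss function, and the optimizer and learning rate are illustrative.

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # illustrative optimizer and lr

log_probs = torch.log(z[predict_mask] + 1e-9)        # fused probabilities of predicted samples
loss = F.nll_loss(log_probs, labels[predict_mask])   # S1104: prediction loss vs. class labels
optimizer.zero_grad()
loss.backward()    # back propagation attributes the prediction loss to each network layer
optimizer.step()   # S1105: adjust network parameters to reduce the prediction loss
```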
The embodiment of the present application herein shows a specific implementation of S110 described above. Of course, it should be understood that S110 may be implemented in other manners, which are not limited in this embodiment of the present application.
It should be noted that the above-mentioned process is only one adjustment process. In practical applications, multiple adjustments may be required, and thus S102 to S110 may be repeatedly performed multiple times until a preset training stop condition is met, thereby obtaining a final neural network. The preset training stop condition may be set according to actual needs, for example, at least one condition that the predicted loss is smaller than a preset loss threshold, the graph neural network converges, the adjustment frequency reaches the preset frequency, etc., which is not limited in the embodiment of the present application.
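The repeated execution of S102 to S110 under the preset training stop conditions can be sketched as a loop; train_one_epoch is a hypothetical helper running one pass of S102 to S110, and the threshold, epoch cap, and patience values are illustrative assumptions.

```python
best_loss, bad_epochs = float("inf"), 0
loss_threshold, max_epochs, patience = 1e-3, 200, 10  # illustrative stop-condition values

for epoch in range(max_epochs):              # cap the number of adjustments at a preset count
    loss = train_one_epoch(model, graph)     # hypothetical helper: one pass of S102-S110
    if loss < loss_threshold:                # prediction loss below the preset loss threshold
        break
    if loss < best_loss - 1e-5:              # still improving
        best_loss, bad_epochs = loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:           # treat a plateau as convergence
            break
```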
After the final graph neural network is obtained, the text classification model of the embodiment of the present application, the existing Bert model, and the existing BertGCN model are verified on the MR (Movie Review), R8, and R52 document test sets, yielding the average classification accuracy of each model shown in Table 1 below and the variation of each model's classification accuracy with the number of iterations shown in fig. 4, where the abscissa in fig. 4 represents the number of iterations (epochs) and the ordinate represents the classification accuracy (accuracy) of the model.
TABLE 1
Based on the above Table 1 and fig. 4, it can be seen that the semantic guidance information obtained during training by the training method of the embodiment of the present application guides model training well, so the trained text classification model performs better. Specifically, compared with the existing Bert and BertGCN models, the text classification model trained in the embodiment of the present application achieves higher average classification accuracy on all three document test sets. Secondly, on the MR and R8 document test sets, the text classification model of the embodiment of the present application converges better than other models (such as the existing BertGCN) within 10 steps (i.e., 10 iterations); on the R52 document test set, it converges better than other models (such as the existing BertGCN) within 60 steps. In addition, when a larger number of iterations is set, the text classification model of the embodiment of the present application has converged by the time the early-stopping constraint halts training, unlike other models (such as the existing BertGCN).
According to the training method for the text classification model provided by the embodiment of the present application, document samples and the words they contain are used as nodes in a relation graph, and the association relationships between nodes are represented by the connecting edges between them, so that the text classification task is treated as a node classification task; because the relation graph contains richer information than the document samples themselves, training the text classification model with the relation graph lets the model learn richer knowledge and improves its performance. On this basis, a text classification model architecture comprising a pre-training classification network and a graph neural network is adopted: the pre-training classification network learns the feature information of the relation graph to classify the predicted document samples, while the graph neural network learns both the feature information and the structural information of the relation graph, making full use of the association relationships between nodes, the node representations, and the class labels corresponding to the training document samples to classify the predicted document samples; the classification reference information obtained by the two networks is then combined with the class labels corresponding to the predicted document samples to optimize the network parameters of the graph neural network, fully fusing the learning capabilities and classification reference information of the two networks and further improving the performance of the text classification model. In addition, when the graph neural network performs text classification, the semantic guidance information of the relation graph is determined from its feature information and structural information, and the graph neural network then classifies the predicted document samples based on the feature information, structural information, and semantic guidance information of the relation graph and the class labels corresponding to the training document samples. Since the semantic guidance information reflects the semantics of the document samples and the words they contain, it provides semantic guidance for the task of classifying the predicted document samples: the graph neural network can infer the node representations of the document nodes corresponding to the predicted document samples from those of the document nodes corresponding to the training document samples, and can focus on important node representations rich in semantic information, so it converges quickly, which improves the convergence speed of the text classification model.
The above embodiment introduces a training method for text classification models, by which text classification models for different application scenarios can be trained; the class labels of the document sample set used for model training can be selected according to the actual application scenario. Application scenarios to which the training method provided in the embodiment of the present application applies may include, for example but not limited to, e-commerce, finance, and related businesses: recognition of positive and negative product reviews, spam comment detection, detection of sensitive or illegal speech, fraudulent short message recognition, purchase intention recognition, financial news classification, and so on. Taking the recognition of positive and negative product reviews as an example, the document samples may be historical product reviews, and the class label corresponding to a document sample represents its emotional tendency, i.e., whether the document sample is a positive or a negative review.
Based on the training method of the text classification model shown in the above embodiment of the present application, the text classification model obtained by training can be used to execute the text classification task. The text classification model-based application process is described in detail below. The embodiment of the application also provides a text classification method which can be used for classifying and identifying the documents to be classified based on the text classification model trained by the method shown in fig. 1. Fig. 5 is a flowchart of a document classification method according to an embodiment of the present application, where the method may include the following steps:
S502, based on the target document set, acquiring structural information and characteristic information of the second relation diagram.
Wherein the target document set includes documents to be classified and classified documents. In practical applications, the classified documents in the target document set may be document samples labeled with category labels.
The second relation graph comprises a plurality of nodes and connecting edges between the nodes, the nodes comprise document nodes corresponding to the documents to be classified, document nodes corresponding to the classified documents and word nodes corresponding to words contained in the documents in the target document set, the structure information is used for representing edge weights corresponding to the connecting edges in the second relation graph, and the characteristic information comprises node representations of the nodes in the second relation graph. It should be noted that, the method for establishing the second relationship diagram is similar to the method for establishing the first relationship diagram, and the description of the process for establishing the first relationship diagram can be referred to in the foregoing, which is not repeated here.
The implementation of S502 is similar to that of S102 in the embodiment shown in fig. 1, and the foregoing description of S102 in the embodiment shown in fig. 1 may be specifically referred to, which is not repeated here.
And S504, performing text classification processing on the document to be classified based on the feature information through a pre-training classification network in the text classification model to obtain first classification reference information of the document to be classified.
The implementation of S504 is similar to that of S104 in the embodiment shown in fig. 1, and the foregoing description of S104 in the embodiment shown in fig. 1 may be referred to specifically, and will not be repeated here.
S506, determining semantic guidance information of the second relation graph based on the feature information and the structure information through the graph neural network in the text classification model.
For example, as shown in fig. 6, o1 to o4 respectively represent the node representations of the word nodes corresponding to words included in the target document set, and e1 to e3 respectively represent the node representations of document nodes, where e1 and e2 represent the node representations of document nodes corresponding to classified documents, e3 represents the node representation of the document node corresponding to the unclassified document, the class label of document node e1 is C1=1 (i.e., the classified document corresponding to that node belongs to the first class), and the class label of document node e2 is C2=2 (i.e., the classified document corresponding to that node belongs to the second class). The characteristic information of the second relation graph is input into the graph neural network, and the characteristic information is propagated among nodes through the convolution layers of the graph neural network to obtain the fusion convolution result of each node in the second relation graph, where O1 to O4 respectively represent the fusion convolution results of the word nodes and E1 to E3 respectively represent the fusion convolution results of the document nodes; further, the second convolution layer performs classification and recognition on the document to be classified based on the fusion convolution result of the document node corresponding to it, obtaining the second classification reference information C3 of the document to be classified.
The implementation of S506 is similar to that of S106 in the embodiment shown in fig. 1, and the foregoing description of S106 in the embodiment shown in fig. 1 may be referred to specifically, and will not be repeated here.
S508, performing text classification processing on the document to be classified through the graph neural network in the text classification model, based on the semantic guidance information, feature information, and structural information of the second relation graph and the categories to which the classified documents belong, to obtain second classification reference information of the document to be classified.
The implementation of S508 is similar to that of S108 in the embodiment shown in fig. 1, and the foregoing description of S108 in the embodiment shown in fig. 1 may be referred to specifically, and will not be repeated here.
S510, determining the category to which the document to be classified belongs based on the first classification reference information and the second classification reference information of the document to be classified.
Because the first classification reference information and the second classification reference information of the document to be classified are obtained by different networks in the text classification model performing classification and recognition based on different data of the second relation graph, combining them fully fuses the classification and recognition capabilities of the two networks and improves the accuracy of the classification prediction result; therefore, the category to which the document to be classified belongs can be determined by combining the first classification reference information and the second classification reference information of the document to be classified.
Specifically, the first classification reference information and the second classification reference information of the document to be classified can be weighted and summed based on preset weights corresponding to the first classification reference information and the second classification reference information, so as to obtain final classification reference information of the document to be classified, and then the category corresponding to the maximum classification probability indicated by the final classification reference information is determined as the category to which the document to be classified belongs.
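Concretely, S510 mirrors the training-time fusion of formula (6); the helper below is an illustrative sketch, with the preset weights expressed through the single assumed parameter lam.

```python
import torch

def classify(ref_bert: torch.Tensor, ref_gcn: torch.Tensor, lam: float = 0.7) -> int:
    """S510: weighted sum of the two classification reference vectors of the
    document to be classified, then the category with the maximum
    classification probability."""
    final = (1 - lam) * ref_bert + lam * ref_gcn   # preset weights (1 - lam) and lam
    return int(final.argmax())                     # category the document belongs to
```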
According to the text classification method provided by the embodiment of the present application, the document to be classified, the classified documents, and the words they contain are used as nodes in a relation graph, and the association relationships between nodes are represented by the connecting edges between them, so that the text classification task is treated as a node classification task; because the relation graph contains richer information than the documents themselves, executing the text classification task with the text classification model based on the feature information and structural information of the relation graph lets the model acquire richer knowledge, which improves text classification accuracy. On this basis, a text classification model architecture comprising a pre-training classification network and a graph neural network is adopted: the pre-training classification network learns the feature information of the relation graph to classify the document to be classified, while the graph neural network learns both the feature information and the structural information of the relation graph, making full use of the association relationships between nodes, the node representations, and the categories to which the classified documents belong; the classification reference information obtained by the two networks is then combined to determine the category to which the document to be classified belongs, fully fusing the learning capabilities and prediction results of the two networks and further improving text classification accuracy. In addition, when the graph neural network performs text classification, the semantic guidance information of the relation graph is determined from its feature information and structural information, and the graph neural network then classifies the unclassified document based on the feature information, structural information, and semantic guidance information and the categories to which the classified documents belong. Since the semantic guidance information reflects the semantics of each document and the words it contains, it provides semantic guidance for the task of classifying the document to be classified: the graph neural network can infer the node representations of unclassified nodes from those of classified nodes in the relation graph, and can focus on important node representations rich in semantic information, so classification results are obtained quickly, which improves text classification efficiency.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In addition, corresponding to the training method of the text classification model shown in fig. 1, the embodiment of the application also provides a training device of the text classification model. Referring to fig. 7, a schematic structural diagram of a training device 700 for a text classification model according to an embodiment of the present application is provided, where the device 700 includes:
an obtaining unit 710, configured to obtain structural information and feature information of the first relationship graph based on the document sample set; the document sample set comprises a training document sample and a predicting document sample, the first relation graph comprises a plurality of nodes and connecting edges among the nodes, and the nodes comprise document nodes corresponding to the training document sample, document nodes corresponding to the predicting document sample and word nodes corresponding to words contained in the document sample set;
The classification unit 720 is configured to perform text classification processing on the predicted document sample based on the feature information through a pre-training classification network in the text classification model, so as to obtain first classification reference information of the predicted document sample;
a semantic processing unit 730 for determining semantic guidance information of the first relationship graph based on the feature information and the structure information through a graph neural network in the text classification model;
the classification unit 720 is further configured to perform text classification processing on the predicted document sample through the graph neural network in the text classification model, based on the semantic guidance information, the feature information, the structural information, and the class labels corresponding to the training document samples, to obtain second classification reference information of the predicted document sample;
and an optimizing unit 740, configured to optimize network parameters of the graph neural network based on the first classification reference information, the second classification reference information, and the class labels corresponding to the predicted document samples.
Optionally, the graph neural network includes a node score layer, and the semantic guidance information includes importance scores of all nodes in the first relationship graph;
The semantic processing unit is specifically configured to:
determining the attention score of each node in the first relation diagram by using the node scoring layer based on a self-attention mechanism through the structure information and the characteristic information;
selecting nodes with attention scores meeting preset score conditions from the first relation diagram, and performing nonlinear transformation processing on the attention scores of the selected nodes to obtain importance scores of the selected nodes;
and setting importance scores of unselected nodes in the first relation diagram as preset scores.
Optionally, the graph neural network further comprises a first convolution layer and a second convolution layer, and the first convolution layer is connected with the second convolution layer;
the classifying unit is specifically used for:
carrying out convolution processing on the structural information and the characteristic information by using the first convolution layer based on the class label corresponding to the training document sample to obtain a first convolution result;
performing fusion processing on the first convolution result and the semantic guidance information to obtain a fusion convolution result;
and inputting the fusion convolution result to the second convolution layer for convolution processing to obtain second classification reference information.
Optionally, the optimizing unit is specifically configured to:
acquiring a first weight corresponding to the pre-training classification network and a second weight corresponding to the graph neural network;
multiplying the first classification reference information by the first weight, multiplying the second classification reference information by the second weight, and fusing the multiplication result;
determining the prediction category of the prediction document sample according to the fusion processing result;
determining a predicted loss based on a predicted category of the predicted document sample, a category label corresponding to the predicted document sample and a preset loss function corresponding to the graph neural network;
and optimizing network parameters of the graph neural network based on the predicted loss.
Optionally, the connection edges between the nodes in the first relationship graph include a first type connection edge and a second type connection edge;
the apparatus 700 further comprises:
a creating unit configured to create a first relationship graph based on a training document sample and a prediction document sample in a document sample set before the obtaining unit obtains structure information and feature information of the first relationship graph based on the document sample set;
The creating a first relationship graph based on training document samples and predicted document samples in the document sample set includes:
determining a word set corresponding to the document sample set based on words contained in the document sample set; the words contained in the document sample set comprise words in the training document sample and words in the prediction document sample;
for each word in the word set, creating a word node corresponding to the word, and creating a first-type connecting edge between different word nodes that satisfy a first edge-creation condition;
and for each document sample in the document sample set, creating a document node corresponding to the document sample, and creating a second-type connecting edge between document nodes and word nodes that satisfy a second edge-creation condition.
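A minimal construction sketch follows. Whitespace tokenization and "co-occurring in the same document sample" as the first edge-creation condition are illustrative simplifications, not the conditions fixed by this embodiment, and train_docs / predict_docs are assumed lists of document strings.

```python
from itertools import combinations

docs = train_docs + predict_docs   # training and predicted document samples (assumed given)
vocab = sorted({w for d in docs for w in d.split()})

doc_nodes = [f"doc_{i}" for i in range(len(docs))]
word_nodes = [f"word_{w}" for w in vocab]

# Second-type edges: a document node and the nodes of the words it contains
doc_word_edges = [(f"doc_{i}", f"word_{w}")
                  for i, d in enumerate(docs) for w in set(d.split())]

# First-type edges: word nodes satisfying the (assumed) co-occurrence condition
word_word_edges = {tuple(sorted((f"word_{u}", f"word_{v}")))
                   for d in docs for u, v in combinations(set(d.split()), 2)}
```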
Optionally, the edge weights corresponding to the connecting edges in the first relation graph include the edge weights of the first type of connecting edges and the edge weights of the second type of connecting edges;
the acquisition unit acquires structure information of a first relationship diagram, including:
determining point mutual information between the first word and the second word based on the probability that any first word and the second word in the word set each appear in the document sample set and the probability that the first word and the second word appear in the same document sample;
determining, based on the point mutual information, the edge weight corresponding to the first-type connecting edge between the word node corresponding to the first word and the word node corresponding to the second word;
determining the importance degree of any target word in the word set on the target document sample based on the frequency of occurrence of the target word in the target document sample and the number of the document samples containing the target word in the sample set;
and determining the edge weight corresponding to the second type connecting edge between the word node corresponding to the target word and the document node corresponding to the target document sample according to the importance degree.
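The two edge-weight computations can be sketched as plain functions; the logarithm base, the clipping of negative PMI values, and the +1 smoothing in the IDF term are conventional choices assumed here rather than details taken from the embodiment.

```python
import math

def word_word_weight(p_i: float, p_j: float, p_ij: float) -> float:
    """Point mutual information between two words, from their individual
    occurrence probabilities and the probability of appearing in the same
    document sample; used as the first-type connecting edge's weight."""
    if p_ij <= 0:
        return 0.0
    pmi = math.log(p_ij / (p_i * p_j))
    return max(pmi, 0.0)   # keep only positive associations (assumed convention)

def doc_word_weight(tf: int, num_docs: int, df: int) -> float:
    """TF-IDF-style importance of a target word to a target document sample,
    used as the second-type connecting edge's weight; tf is the word's
    frequency in the document, df the number of documents containing it."""
    return tf * math.log(num_docs / (1 + df))
```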
Optionally, the pre-training classification network comprises:
the language representation layer is used for carrying out embedding processing on the characteristic information to obtain embedded vectors of all nodes in the first relation diagram;
and the full-connection layer is used for classifying and identifying the document sample based on the embedded vector of each node in the first relation diagram to obtain first classification reference information of the document sample.
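Under the assumption that the language representation layer is a BERT-style encoder (via the Hugging Face transformers library) and that the [CLS] vector serves as each document node's embedded vector, the pre-training classification network can be sketched as:

```python
import torch.nn as nn
from transformers import AutoModel

class PretrainedClassifier(nn.Module):
    """Language representation layer + fully connected layer (a sketch)."""

    def __init__(self, num_classes: int, name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)  # language representation layer
        self.fc = nn.Linear(self.encoder.config.hidden_size, num_classes)  # fully connected layer

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] embedding as the node's embedded vector
        return self.fc(cls)                # first classification reference information (logits)
```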
Obviously, the training device for a text classification model provided in the embodiment of the present application can be used as an execution subject of the training method for a text classification model shown in fig. 1, for example, in the training method for a text classification model shown in fig. 1, step S102 may be executed by the acquiring unit in the training device shown in fig. 7, step S104 is executed by the classifying unit, step S106 is executed by the semantic processing unit, step S108 is executed by the classifying unit, and step S110 is executed by the optimizing unit.
According to another embodiment of the present application, each unit in the training device of the text classification model shown in fig. 7 may be separately or completely combined into one or several additional units, or some unit(s) thereof may be further split into a plurality of units with smaller functions to form the training device, which may achieve the same operation without affecting the implementation of the technical effects of the embodiments of the present application. The above units are divided based on logic functions, and in practical applications, the functions of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the present application, the training device based on the text classification model may also include other units, and in practical applications, these functions may also be implemented with assistance of other units, and may be implemented by cooperation of multiple units.
According to another embodiment of the present application, the training apparatus shown in fig. 7 may be constructed by running a computer program (including program code) capable of executing the steps involved in the corresponding method shown in fig. 1 on a general-purpose computing device, such as a computer, that includes processing elements such as a Central Processing Unit (CPU) and storage elements such as a random access storage medium (RAM) and a read-only storage medium (ROM), thereby implementing the training method of the text classification model of the embodiment of the present application. The computer program may be recorded on, for example, a computer-readable storage medium, and loaded into and run in an electronic device through the computer-readable storage medium.
According to the training device for the text classification model provided by the embodiment of the present application, document samples and the words they contain are used as nodes in a relation graph, and the association relationships between nodes are represented by the connecting edges between them, so that the text classification task is treated as a node classification task. On this basis, a text classification model architecture comprising a pre-training classification network and a graph neural network is adopted: the pre-training classification network learns the feature information of the relation graph to classify the predicted document samples, while the graph neural network learns both the feature information and the structural information of the relation graph, making full use of the association relationships between nodes, the node representations, and the class labels corresponding to the training document samples to classify the predicted document samples; the classification reference information obtained by the two networks is then combined with the class labels corresponding to the predicted document samples to optimize the network parameters of the graph neural network, fully fusing the learning capabilities and classification reference information of the two networks and further improving the performance of the text classification model. In addition, when the graph neural network performs text classification, the semantic guidance information of the relation graph is determined from its feature information and structural information, and the graph neural network then classifies the predicted document samples based on the feature information, structural information, and semantic guidance information of the relation graph and the class labels corresponding to the training document samples. Since the semantic guidance information reflects the semantics of the document samples and the words they contain, it provides semantic guidance for the task of classifying the predicted document samples: the graph neural network can infer the node representations of the document nodes corresponding to the predicted document samples from those of the document nodes corresponding to the training document samples, and can focus on important node representations rich in semantic information, so it converges quickly, which improves the convergence speed of the text classification model.
In addition, corresponding to the text classification method shown in fig. 5, the embodiment of the application also provides a text classification device. Referring to fig. 8, a schematic structural diagram of a text classification device 800 according to an embodiment of the present application is provided, where the device 800 includes:
an obtaining unit 810, configured to obtain structural information and feature information of the second relationship graph based on the target document set; the target document set comprises a document to be classified and a classified document, the second relation graph comprises a plurality of nodes and connecting edges between the nodes, and the nodes comprise document nodes corresponding to the document to be classified, document nodes corresponding to the classified document and word nodes corresponding to words contained in the documents in the target document set;
the classification unit 820 is configured to perform text classification processing on the document to be classified based on the feature information through a pre-training classification network in a text classification model, so as to obtain first classification reference information of the document to be classified;
a semantic processing unit 830, configured to determine semantic guidance information of the second relationship graph based on the feature information and the structure information through a graph neural network in the text classification model;
The classification unit 820 is configured to perform text classification processing on the document to be classified based on the semantic guidance information, the feature information, the structure information, and the category to which the classified document belongs through a graph neural network in the text classification model, so as to obtain second classification reference information of the document to be classified;
and the classification unit 820 is used for determining the category to which the document to be classified belongs based on the first classification reference information and the second classification reference information of the document to be classified.
It is obvious that the text classification device provided in the embodiment of the present application can be used as an execution subject of the text classification method shown in fig. 5, for example, step S502 in the text classification method shown in fig. 5 may be executed by the obtaining unit in the classification device shown in fig. 8, steps S504, S508 and S510 are executed by the classification unit, and step S506 is executed by the semantic processing unit.
According to another embodiment of the present application, each unit in the text classification apparatus shown in fig. 8 may be separately or completely combined into one or several additional units, or some unit(s) thereof may be further split into a plurality of units with smaller functions, which may achieve the same operation without affecting the implementation of the technical effects of the embodiments of the present application. The above units are divided based on logic functions, and in practical applications, the functions of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the present application, the text-based classification apparatus may also include other units, and in actual practice, these functions may also be implemented with assistance from other units, and may be implemented by cooperation of multiple units.
According to another embodiment of the present application, the classification apparatus shown in fig. 8 may be constructed by running a computer program (including program code) capable of executing the steps involved in the corresponding method shown in fig. 5 on a general-purpose computing device, such as a computer, that includes processing elements such as a Central Processing Unit (CPU) and storage elements such as a random access storage medium (RAM) and a read-only storage medium (ROM), thereby implementing the text classification method of the embodiment of the present application. The computer program may be recorded on, for example, a computer-readable storage medium, and loaded into and run in an electronic device through the computer-readable storage medium.
According to the text classification device provided by the embodiment of the present application, the document to be classified, the classified documents, and the words they contain are used as nodes in a relation graph, and the association relationships between nodes are represented by the connecting edges between them, so that the text classification task is treated as a node classification task. On this basis, a text classification model architecture comprising a pre-training classification network and a graph neural network is adopted: the pre-training classification network learns the feature information of the relation graph to classify the document to be classified, while the graph neural network learns both the feature information and the structural information of the relation graph, making full use of the association relationships between nodes, the node representations, and the categories to which the classified documents belong; the classification reference information obtained by the two networks is then combined to determine the category to which the document to be classified belongs, fully fusing the learning capabilities and prediction results of the two networks and further improving text classification accuracy. In addition, when the graph neural network performs text classification, the semantic guidance information of the relation graph is determined from its feature information and structural information, and the graph neural network then classifies the unclassified document based on the feature information, structural information, and semantic guidance information and the categories to which the classified documents belong. Since the semantic guidance information reflects the semantics of each document and the words it contains, it provides semantic guidance for the task of classifying the document to be classified: the graph neural network can infer the node representations of unclassified nodes from those of classified nodes in the relation graph, and can focus on important node representations rich in semantic information, so classification results are obtained quickly, which improves text classification efficiency.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 9, at the hardware level, the electronic device includes a processor, and optionally an internal bus, a network interface, and a memory. The Memory may include a Memory, such as a Random-Access Memory (RAM), and may further include a non-volatile Memory (non-volatile Memory), such as at least 1 disk Memory. Of course, the electronic device may also include hardware required for other services.
The processor, network interface, and memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus, among others. Buses may be classified as address buses, data buses, control buses, and so on. For ease of illustration, only one bidirectional arrow is shown in fig. 9, but this does not mean that there is only one bus or only one type of bus.
And a memory for storing a computer program. In particular, the computer program may comprise program code comprising computer operating instructions. The memory may include memory and non-volatile storage and provide instructions and data to the processor.
The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the training device of the text classification model on a logic level.
In one embodiment, a processor executes a program stored in a memory and is specifically configured to perform the following operations:
based on the document sample set, acquiring structural information and characteristic information of the first relation graph; the document sample set comprises a training document sample and a predicting document sample, the first relation graph comprises a plurality of nodes and connecting edges among the nodes, and the nodes comprise document nodes corresponding to the training document sample, document nodes corresponding to the predicting document sample and word nodes corresponding to words contained in the document sample set;
performing text classification processing on the predicted document sample based on the characteristic information through a pre-training classification network in a text classification model to obtain first classification reference information of the predicted document sample;
determining semantic guidance information of the first relation graph based on the feature information and the structural information through a graph neural network in the text classification model;
performing text classification processing on the predicted document sample through the graph neural network based on the semantic guidance information, the characteristic information, the structural information and the class label corresponding to the training document sample to obtain second classification reference information of the predicted document sample;
And optimizing network parameters of the graph neural network based on the first classification reference information, the second classification reference information and the class labels corresponding to the predicted document samples.
Alternatively, the processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the text classification device on a logic level. The processor is used for executing the programs stored in the memory and is specifically used for executing the following operations:
based on the target document set, obtaining structural information and characteristic information of the second relation graph; the target document set comprises a document to be classified and a classified document, the second relation graph comprises a plurality of nodes and connecting edges between the nodes, and the nodes comprise document nodes corresponding to the document to be classified, document nodes corresponding to the classified document and word nodes corresponding to words contained in the documents in the target document set;
performing text classification processing on the document to be classified based on the characteristic information through a pre-training classification network in a text classification model to obtain first classification reference information of the document to be classified;
determining semantic guidance information of the second relationship graph based on the feature information and the structural information through a graph neural network in the text classification model;
Performing text classification processing on the document to be classified through a graphic neural network in the text classification model based on the semantic guidance information, the characteristic information, the structural information and the category to which the classified document belongs to, so as to obtain second classification reference information of the document to be classified;
and determining the category of the document to be classified based on the first classification reference information and the second classification reference information of the document to be classified.
The method performed by the training device of the text classification model disclosed in the embodiment shown in fig. 1 of the present application, or the method performed by the text classification device disclosed in the embodiment shown in fig. 5 of the present application, may be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above methods may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and so on. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may further execute the method of fig. 1 and implement the function of the training device of the text classification model in the embodiment shown in fig. 1, or the electronic device may further execute the method shown in fig. 5 and implement the function of the text classification device in the embodiment shown in fig. 5, which is not described herein.
Of course, other implementations, such as a logic device or a combination of software and hardware, are not excluded for the electronic device of the present application; that is, the execution subject of the above processing flow is not limited to individual logic units, and may also be hardware or a logic device.
The present embodiments also provide a computer readable storage medium storing one or more computer programs, the one or more computer programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, enable the portable electronic device to perform the method of the embodiment shown in fig. 1, and in particular to perform the operations of:
based on the document sample set, acquiring structural information and characteristic information of the first relation graph; the document sample set comprises a training document sample and a predicting document sample, the first relation graph comprises a plurality of nodes and connecting edges among the nodes, and the nodes comprise document nodes corresponding to the training document sample, document nodes corresponding to the predicting document sample and word nodes corresponding to words contained in the document sample set;
Performing text classification processing on the predicted document sample based on the characteristic information through a pre-training classification network in a text classification model to obtain first classification reference information of the predicted document sample;
determining semantic guidance information of the first relation graph based on the feature information and the structural information through a graph neural network in the text classification model;
performing text classification processing on the predicted document sample through the graph neural network based on the semantic guidance information, the characteristic information, the structural information and the class label corresponding to the training document sample to obtain second classification reference information of the predicted document sample;
and optimizing network parameters of the graph neural network based on the first classification reference information, the second classification reference information and the class labels corresponding to the predicted document samples.
The instructions, when executed by a portable electronic device comprising a plurality of applications, enable the portable electronic device to perform the method of the embodiment shown in fig. 5, and in particular to:
based on the target document set, obtaining structural information and characteristic information of the second relation graph; the target document set comprises a document to be classified and a classified document, the second relation graph comprises a plurality of nodes and connecting edges between the nodes, and the nodes comprise document nodes corresponding to the document to be classified, document nodes corresponding to the classified document and word nodes corresponding to words contained in the documents in the target document set;
Performing text classification processing on the document to be classified based on the characteristic information through a pre-training classification network in a text classification model to obtain first classification reference information of the document to be classified;
determining semantic guidance information of the second relationship graph based on the feature information and the structural information through a graph neural network in the text classification model;
performing text classification processing on the document to be classified through a graphic neural network in the text classification model based on the semantic guidance information, the characteristic information, the structural information and the category to which the classified document belongs to, so as to obtain second classification reference information of the document to be classified;
and determining the category of the document to be classified based on the first classification reference information and the second classification reference information of the document to be classified.
In summary, the foregoing description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant details, reference may be made to the corresponding parts of the description of the method embodiments.
Claims (12)
1. A method for training a text classification model, comprising:
acquiring structural information and feature information of a first relationship graph based on a document sample set; the document sample set comprises a training document sample and a predicted document sample, the first relationship graph comprises a plurality of nodes and connecting edges among the nodes, and the nodes comprise document nodes corresponding to the training document sample, document nodes corresponding to the predicted document sample, and word nodes corresponding to words contained in the document sample set;
performing text classification processing on the predicted document sample based on the feature information through a pre-trained classification network in a text classification model to obtain first classification reference information of the predicted document sample;
determining semantic guidance information of the first relationship graph based on the feature information and the structural information through a graph neural network in the text classification model;
performing text classification processing on the predicted document sample through the graph neural network based on the semantic guidance information, the feature information, the structural information and the class label corresponding to the training document sample to obtain second classification reference information of the predicted document sample;
and optimizing network parameters of the graph neural network based on the first classification reference information, the second classification reference information and the class label corresponding to the predicted document sample.
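The following is a minimal PyTorch sketch of one training step implied by claim 1. The module interfaces (`gnn.semantic_guidance`, the four-argument `gnn(...)` call), the equal branch weights, and the use of cross-entropy as the loss are assumptions made for illustration; only the graph neural network's parameters are updated, mirroring the claim.

```python
import torch
import torch.nn.functional as F

def training_step(gnn, pretrain_net, features, adj, labels,
                  train_mask, pred_mask, optimizer, alpha=0.5, beta=0.5):
    """One optimization step over the document-word relationship graph.

    features/adj carry the feature and structural information; labels
    holds class labels for all document nodes; train_mask/pred_mask
    select training vs. predicted document samples.
    """
    optimizer.zero_grad()
    # First classification reference information (pre-trained branch, kept frozen).
    with torch.no_grad():
        first_ref = pretrain_net(features)
    # Semantic guidance information, then the GNN branch's classification.
    guidance = gnn.semantic_guidance(features, adj)          # assumed interface
    train_labels = torch.where(train_mask, labels, torch.zeros_like(labels))
    second_ref = gnn(features, adj, guidance, train_labels)  # assumed interface
    # Fuse the two branches; compute the loss on predicted document samples.
    fused = alpha * first_ref + beta * second_ref
    loss = F.cross_entropy(fused[pred_mask], labels[pred_mask])
    loss.backward()
    optimizer.step()
    return loss.item()
```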
2. The method of claim 1, wherein the graph neural network in the text classification model comprises a node scoring layer, and the semantic guidance information comprises importance scores of respective nodes in the first relationship graph; the determining, by the graph neural network in the text classification model, semantic guidance information of the first relationship graph based on the feature information and the structural information comprises:
determining, by the node scoring layer, an attention score of each node in the first relationship graph from the structural information and the feature information based on a self-attention mechanism;
selecting, from the first relationship graph, nodes whose attention scores meet a preset score condition, and performing nonlinear transformation processing on the attention scores of the selected nodes to obtain importance scores of the selected nodes;
and setting the importance scores of the unselected nodes in the first relationship graph to a preset score.
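As one possible reading of claim 2 (in the spirit of self-attention graph pooling), the sketch below scores nodes with a learned linear projection over neighborhood-aggregated features, treats top-k selection as the preset score condition, applies tanh as the nonlinear transformation, and uses zero as the preset score for unselected nodes; each of these concrete choices is an assumption.

```python
import torch

def node_importance_scores(h, adj, w_score, keep_ratio=0.5):
    """h: (N, d) node features; adj: (N, N) normalized adjacency
    (structural information); w_score: (d, 1) learnable scorer."""
    attn = (adj @ h @ w_score).squeeze(-1)           # attention score per node
    k = max(1, int(keep_ratio * h.size(0)))
    topk = torch.topk(attn, k)                       # preset score condition (assumed: top-k)
    scores = torch.zeros_like(attn)                  # preset score (0) for unselected nodes
    scores[topk.indices] = torch.tanh(topk.values)   # nonlinear transform of selected scores
    return scores

# Example with 5 random nodes.
h = torch.randn(5, 4); adj = torch.eye(5); w = torch.randn(4, 1)
print(node_importance_scores(h, adj, w))
```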
3. The method of claim 2, wherein the graph neural network in the text classification model further comprises a first convolution layer and a second convolution layer, the first convolution layer being connected to the second convolution layer; the performing, by the graph neural network in the text classification model, text classification processing on the predicted document sample based on the semantic guidance information, the feature information, the structural information and the class label corresponding to the training document sample to obtain second classification reference information of the predicted document sample comprises:
performing, by the first convolution layer, convolution processing on the structural information and the feature information based on the class label corresponding to the training document sample to obtain a first convolution result;
performing fusion processing on the first convolution result and the semantic guidance information to obtain a fused convolution result;
and inputting the fused convolution result to the second convolution layer for convolution processing to obtain the second classification reference information.
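A compact sketch of this two-convolution-layer structure, assuming a GCN-style propagation rule and elementwise scaling by the importance scores as the fusion operation; the label conditioning of the first layer is omitted here for brevity.

```python
import torch
import torch.nn.functional as F

def gnn_forward(x, adj_norm, w1, w2, guidance):
    """Two graph convolutions with semantic-guidance fusion in between.

    x:        (N, d) node features; adj_norm: (N, N) normalized adjacency
    w1, w2:   weights of the first and second convolution layers
    guidance: (N,) importance scores from the node scoring layer
    """
    h1 = F.relu(adj_norm @ x @ w1)        # first convolution result
    fused = h1 * guidance.unsqueeze(-1)   # fusion with the semantic guidance
    return adj_norm @ fused @ w2          # second convolution -> class scores
```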
4. The method of claim 1, wherein the optimizing network parameters of the graph neural network based on the first classification reference information, the second classification reference information and the class label corresponding to the predicted document sample comprises:
acquiring a first weight corresponding to the pre-trained classification network and a second weight corresponding to the graph neural network;
multiplying the first classification reference information by the first weight, multiplying the second classification reference information by the second weight, and performing fusion processing on the multiplication results;
determining a predicted category of the predicted document sample according to the fusion processing result;
determining a prediction loss based on the predicted category of the predicted document sample, the class label corresponding to the predicted document sample, and a preset loss function corresponding to the graph neural network;
and optimizing the network parameters of the graph neural network based on the prediction loss.
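In NumPy terms, the weighted fusion and loss of claim 4 might look as follows; the concrete weight values and the softmax cross-entropy standing in for the preset loss function are assumptions.

```python
import numpy as np

def prediction_loss(first_ref, second_ref, labels, w1=0.6, w2=0.4):
    """Fuse the weighted branch outputs, pick predicted categories, and
    compute a cross-entropy loss over the predicted document samples."""
    fused = w1 * np.asarray(first_ref) + w2 * np.asarray(second_ref)
    z = np.exp(fused - fused.max(axis=1, keepdims=True))
    probs = z / z.sum(axis=1, keepdims=True)   # softmax over classes
    pred = probs.argmax(axis=1)                # predicted categories
    labels = np.asarray(labels)
    loss = -np.log(probs[np.arange(labels.size), labels] + 1e-12).mean()
    return pred, loss
```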
5. The method of claim 1, wherein the connecting edges between the nodes in the first relationship graph comprise first-type connecting edges and second-type connecting edges, and the method further comprises, before the acquiring structural information and feature information of the first relationship graph based on the document sample set:
creating the first relationship graph based on the training document sample and the predicted document sample in the document sample set;
wherein the creating the first relationship graph based on the training document sample and the predicted document sample in the document sample set comprises:
determining a word set corresponding to the document sample set based on the words contained in the document sample set, the words contained in the document sample set comprising words in the training document sample and words in the predicted document sample;
creating, for each word in the word set, a word node corresponding to the word, and creating a first-type connecting edge between different word nodes meeting a first edge-creation condition;
and creating, for the document samples in the document sample set, document nodes corresponding to the training document sample and document nodes corresponding to the predicted document sample, and creating a second-type connecting edge between a document node and a word node meeting a second edge-creation condition.
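A self-contained sketch of the graph construction in claim 5. The concrete edge-creation conditions are assumptions: word-word (first-type) edges for co-occurrence within a sliding window, and document-word (second-type) edges whenever a word occurs in a document.

```python
from itertools import combinations

def build_graph(docs, window_size=3):
    """docs: list of token lists (training plus predicted samples).
    Returns node id maps and the two edge sets."""
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}             # word nodes
    doc_id = {d: len(vocab) + d for d in range(len(docs))}    # document nodes

    word_word, doc_word = set(), set()
    for d, doc in enumerate(docs):
        for w in set(doc):
            doc_word.add((doc_id[d], word_id[w]))             # second-type edge
        for i in range(max(1, len(doc) - window_size + 1)):
            for a, b in combinations(set(doc[i:i + window_size]), 2):
                word_word.add(tuple(sorted((word_id[a], word_id[b]))))  # first-type edge
    return word_id, doc_id, word_word, doc_word

# Example: two tiny documents.
print(build_graph([["graph", "text", "class"], ["text", "class"]]))
```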
6. The method of claim 5, wherein the edge weights corresponding to the connecting edges in the first relationship graph comprise edge weights of the first-type connecting edges and edge weights of the second-type connecting edges, and the acquiring structural information of the first relationship graph comprises:
determining, for any first word and second word in the word set, pointwise mutual information between the first word and the second word based on the probabilities that the first word and the second word respectively appear in the document sample set and the probability that the first word and the second word appear in the same document sample;
determining, based on the pointwise mutual information, the edge weight corresponding to the first-type connecting edge between the word node corresponding to the first word and the word node corresponding to the second word;
determining, for any target word in the word set, an importance degree of the target word to a target document sample based on the frequency with which the target word occurs in the target document sample and the number of document samples in the document sample set that contain the target word;
and determining, according to the importance degree, the edge weight corresponding to the second-type connecting edge between the word node corresponding to the target word and the document node corresponding to the target document sample.
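These two weightings correspond to the familiar PMI and TF-IDF statistics used in graph-based text classification (e.g., TextGCN-style graphs); the exact formulas below, including the clipping of negative PMI to zero, are assumptions about the claimed determining steps.

```python
import math

def pmi_weight(p_i, p_j, p_ij):
    """First-type edge weight from pointwise mutual information between
    two words; negative PMI is clipped to 0 (a common convention)."""
    if p_ij == 0:
        return 0.0
    return max(0.0, math.log(p_ij / (p_i * p_j)))

def tfidf_weight(tf, num_docs, docs_with_word):
    """Second-type edge weight from a word's importance to a document:
    term frequency scaled by inverse document frequency."""
    return tf * math.log(num_docs / (1 + docs_with_word))
```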
7. The method according to any one of claims 1 to 6, wherein the pre-trained classification network comprises:
a language representation layer for performing embedding processing on the feature information to obtain embedded vectors of the nodes in the first relationship graph;
and a fully connected layer for classifying the document samples based on the embedded vectors of the nodes in the first relationship graph to obtain first classification reference information of the document samples.
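A minimal PyTorch sketch of a network with this two-layer shape; using `nn.Embedding` over node ids as the language representation layer is a stand-in assumption, since the claim does not fix a concrete representation architecture.

```python
import torch
import torch.nn as nn

class PretrainedClassifier(nn.Module):
    """Language representation layer followed by a fully connected layer."""

    def __init__(self, num_nodes, emb_dim, num_classes):
        super().__init__()
        self.represent = nn.Embedding(num_nodes, emb_dim)  # representation layer
        self.classify = nn.Linear(emb_dim, num_classes)    # fully connected layer

    def forward(self, node_ids):
        emb = self.represent(node_ids)   # embedded vectors of the nodes
        return self.classify(emb)        # first classification reference information

# Example: scores for 4 document nodes over 3 classes.
net = PretrainedClassifier(num_nodes=10, emb_dim=8, num_classes=3)
print(net(torch.arange(4)).shape)  # torch.Size([4, 3])
```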
8. A method of text classification, comprising:
acquiring structural information and feature information of a second relationship graph based on a target document set; the target document set comprises a document to be classified and a classified document, the second relationship graph comprises a plurality of nodes and connecting edges between the nodes, and the nodes comprise a document node corresponding to the document to be classified, a document node corresponding to the classified document, and word nodes corresponding to words contained in the target document set;
performing text classification processing on the document to be classified based on the feature information through a pre-trained classification network in a text classification model to obtain first classification reference information of the document to be classified;
determining semantic guidance information of the second relationship graph based on the feature information and the structural information through a graph neural network in the text classification model;
performing text classification processing on the document to be classified through the graph neural network in the text classification model based on the semantic guidance information, the feature information, the structural information and the category to which the classified document belongs, to obtain second classification reference information of the document to be classified;
and determining the category of the document to be classified based on the first classification reference information and the second classification reference information of the document to be classified.
9. A training device for a text classification model, comprising:
an acquisition unit for acquiring structural information and feature information of a first relationship graph based on a document sample set; the document sample set comprises a training document sample and a predicted document sample, the first relationship graph comprises a plurality of nodes and connecting edges among the nodes, and the nodes comprise document nodes corresponding to the training document sample, document nodes corresponding to the predicted document sample, and word nodes corresponding to words contained in the document sample set;
a classification unit for performing text classification processing on the predicted document sample based on the feature information through a pre-trained classification network to obtain first classification reference information of the predicted document sample;
a semantic processing unit for determining semantic guidance information of the first relationship graph based on the feature information and the structural information through a graph neural network;
the classification unit being further used for performing text classification processing on the predicted document sample through the graph neural network based on the semantic guidance information, the feature information, the structural information and the class label corresponding to the training document sample to obtain second classification reference information of the predicted document sample;
and an optimization unit for optimizing network parameters of the graph neural network based on the first classification reference information, the second classification reference information and the class label corresponding to the predicted document sample.
10. A text classification device, comprising:
an acquisition unit for acquiring structural information and feature information of a second relationship graph based on a target document set; the target document set comprises a document to be classified and a classified document, the second relationship graph comprises a plurality of nodes and connecting edges between the nodes, and the nodes comprise a document node corresponding to the document to be classified, a document node corresponding to the classified document, and word nodes corresponding to words contained in the target document set;
a classification unit for performing text classification processing on the document to be classified based on the feature information through a pre-trained classification network in a text classification model to obtain first classification reference information of the document to be classified;
a semantic processing unit for determining semantic guidance information of the second relationship graph based on the feature information and the structural information through a graph neural network in the text classification model;
the classification unit being further used for performing text classification processing on the document to be classified through the graph neural network in the text classification model based on the semantic guidance information, the feature information, the structural information and the category to which the classified document belongs, to obtain second classification reference information of the document to be classified;
and the classification unit being further used for determining the category to which the document to be classified belongs based on the first classification reference information and the second classification reference information of the document to be classified.
11. An electronic device, comprising:
a processor;
a memory for storing the processor-executable computer program;
wherein the processor is configured to execute the computer program to implement the method of any one of claims 1 to 8.
12. A computer-readable storage medium, characterized in that a computer program in the computer-readable storage medium, when executed by a processor of an electronic device, enables the electronic device to perform the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210443637.2A CN114817538B (en) | 2022-04-26 | 2022-04-26 | Training method of text classification model, text classification method and related equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114817538A CN114817538A (en) | 2022-07-29 |
CN114817538B true CN114817538B (en) | 2023-08-08 |
Family
ID=82508474
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210443637.2A Active CN114817538B (en) | 2022-04-26 | 2022-04-26 | Training method of text classification model, text classification method and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114817538B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115578739A (en) * | 2022-09-16 | 2023-01-06 | 上海来也伯特网络科技有限公司 | Training method and device for realizing IA classification model by combining RPA and AI |
CN116932765B (en) * | 2023-09-15 | 2023-12-08 | 中汽信息科技(天津)有限公司 | Patent text multi-stage classification method and equipment based on graphic neural network |
CN117496542B (en) * | 2023-12-29 | 2024-03-15 | 恒生电子股份有限公司 | Document information extraction method, device, electronic equipment and storage medium |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110134786A (en) * | 2019-05-14 | 2019-08-16 | 南京大学 | A kind of short text classification method based on theme term vector and convolutional neural networks |
WO2021057133A1 (en) * | 2019-09-24 | 2021-04-01 | 北京国双科技有限公司 | Method for training document classification model, and related apparatus |
WO2021068339A1 (en) * | 2019-10-11 | 2021-04-15 | 平安科技(深圳)有限公司 | Text classification method and device, and computer readable storage medium |
CN110929029A (en) * | 2019-11-04 | 2020-03-27 | 中国科学院信息工程研究所 | Text classification method and system based on graph convolution neural network |
WO2021169347A1 (en) * | 2020-02-25 | 2021-09-02 | 华为技术有限公司 | Method and device for extracting text keywords |
CN111737474A (en) * | 2020-07-17 | 2020-10-02 | 支付宝(杭州)信息技术有限公司 | Method and device for training business model and determining text classification category |
US11216620B1 (en) * | 2020-07-17 | 2022-01-04 | Alipay (Hangzhou) Information Technology Co., Ltd. | Methods and apparatuses for training service model and determining text classification category |
CN113220886A (en) * | 2021-05-31 | 2021-08-06 | 马上消费金融股份有限公司 | Text classification method, text classification model training method and related equipment |
CN114218389A (en) * | 2021-12-21 | 2022-03-22 | 一拓通信集团股份有限公司 | Long text classification method in chemical preparation field based on graph neural network |
Non-Patent Citations (1)
Title |
---|
Few-shot text classification based on a dual-channel graph neural network; Wang Yanggang et al.; Journal of Chinese Information Processing; 89-97 *
Legal Events
Date | Code | Title | Description
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||