CN114817538A - Training method of text classification model, text classification method and related equipment


Info

Publication number: CN114817538A
Application number: CN202210443637.2A
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN114817538B (granted)
Inventors: 赵宏宇 (Zhao Hongyu), 蒋宁 (Jiang Ning), 王洪斌 (Wang Hongbin), 吴海英 (Wu Haiying)
Applicant and current assignee: Mashang Xiaofei Finance Co., Ltd.
Legal status: Granted; active

Classifications

    • G06F16/353: Information retrieval of unstructured textual data; clustering; classification into predefined classes
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/045: Computing arrangements based on biological models; neural networks; combinations of networks


Abstract

The application discloses a training method for a text classification model, a text classification method, and related equipment. The training method comprises the following steps: performing text classification processing on predicted document samples through a pre-training classification network based on the feature information of a first relational graph to obtain first classification reference information, wherein the first relational graph comprises document nodes corresponding to training document samples, document nodes corresponding to the predicted document samples, word nodes corresponding to words contained in the document samples, and connecting edges between the nodes; determining, through a graph neural network, semantic guidance information of the first relational graph based on the feature information and structure information of the first relational graph, and performing text classification processing on the predicted document samples based on the semantic guidance information, the feature information, the structure information, and the class labels corresponding to the training document samples to obtain second classification reference information; and optimizing the network parameters of the graph neural network based on the first classification reference information, the second classification reference information, and the class labels corresponding to the predicted document samples.

Description

Training method of text classification model, text classification method and related equipment
Technical Field
The application relates to the technical field of computers, in particular to a method for training a text classification model, a method for classifying texts and related equipment.
Background
The text classification task is a basic task in the field of Natural Language Processing (NLP), and can be widely applied to related businesses such as electronic commerce and finance.
At present, traditional text classification relies on text classification models built on architectures such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), which are trained on a corpus so as to acquire the capability of executing classification tasks. However, the performance of such text classification models is poor: their convergence rate is slow during training, and the trained models cannot perform text classification quickly and accurately.
Therefore, a solution capable of improving the convergence rate and the recognition accuracy of the text classification model is needed.
Disclosure of Invention
The embodiment of the application provides a training method and a text classification method of a text classification model and related equipment, which are used for improving the convergence rate and the recognition accuracy of the text classification model.
In order to achieve the above purpose, the following technical solutions are adopted in the embodiments of the present application:
in a first aspect, an embodiment of the present application provides a method for training a text classification model, including:
acquiring structure information and feature information of a first relational graph based on a document sample set; the document sample set comprises training document samples and predicted document samples, the first relational graph comprises a plurality of nodes and connecting edges between the nodes, and the plurality of nodes comprise document nodes corresponding to the training document samples, document nodes corresponding to the predicted document samples, and word nodes corresponding to words contained in the document samples in the document sample set;
performing text classification processing on the predicted document sample based on the characteristic information through a pre-training classification network in a text classification model to obtain first classification reference information of the predicted document sample;
determining, by a graph neural network in the text classification model, semantic guidance information for the first relationship graph based on the feature information and the structure information;
performing text classification processing on the predicted document sample through the graph neural network based on the semantic guidance information, the feature information, the structure information and the class label corresponding to the training document sample to obtain second classification reference information of the predicted document sample;
and optimizing the network parameters of the graph neural network based on the first classification reference information, the second classification reference information and the class labels corresponding to the prediction document samples.
It can be seen that in the embodiment of the application, the document samples and the words they contain are used as nodes in a relational graph, the association relationships between them are represented by connecting edges, and the text classification task is thereby treated as a node classification task.
On this basis, a text classification model architecture comprising a pre-training classification network and a graph neural network is adopted. The pre-training classification network learns the feature information of the relational graph to perform text classification processing on the predicted document samples. The graph neural network learns both the feature information and the structure information of the relational graph, making full use of the association relationships between nodes, the node representations, and the class labels corresponding to the training document samples, and likewise performs text classification processing on the predicted document samples. The classification reference information obtained by the two networks is then combined with the class labels corresponding to the predicted document samples to optimize the network parameters of the graph neural network, so that the learning capabilities and classification reference information of the two networks are fully fused and the performance of the text classification model is further improved.
In addition, when the graph neural network performs text classification processing, semantic guidance information of the relational graph is first determined based on the feature information and structure information of the relational graph, and text classification processing is then performed based on the feature information, the structure information, the semantic guidance information, and the class labels corresponding to the training document samples. Since the semantic guidance information reflects the respective semantics of the document samples and the words they contain, it provides semantic guidance for the task of classifying the predicted document samples. The graph neural network can therefore infer the node representations of the document nodes corresponding to the predicted document samples from those corresponding to the training document samples, and can focus on important node representations rich in semantic information, so the convergence speed of the text classification model is improved.
In a second aspect, an embodiment of the present application provides a text classification method, including:
acquiring structure information and feature information of a second relational graph based on a target document set; the target document set comprises documents to be classified and classified documents, the second relational graph comprises a plurality of nodes and connecting edges between the nodes, and the plurality of nodes comprise document nodes corresponding to the documents to be classified, document nodes corresponding to the classified documents, and word nodes corresponding to words contained in the documents in the target document set;
performing text classification processing on the document to be classified based on the characteristic information through a pre-training classification network in a text classification model to obtain first classification reference information of the document to be classified;
determining, by a graph neural network in the text classification model, semantic guidance information for the second relationship graph based on the feature information and the structural information;
performing text classification processing on the documents to be classified through a graph neural network in the text classification model based on the semantic guidance information, the feature information, the structure information and the classes to which the classified documents belong to obtain second classification reference information of the documents to be classified;
and determining the category of the document to be classified based on the first classification reference information and the second classification reference information of the document to be classified.
It can be seen that in the embodiment of the application, the documents to be classified, the classified documents, and the words they contain are used as nodes in a relational graph, the association relationships between them are represented by connecting edges, and the text classification task is thereby treated as a node classification task. Because the relational graph contains richer information than the documents themselves, executing the text classification task with a text classification model based on the feature information and structure information of the relational graph allows the model to acquire richer knowledge, which is beneficial to the text classification accuracy. On this basis, a text classification model architecture comprising a pre-training classification network and a graph neural network is adopted: the pre-training classification network learns the feature information of the relational graph to classify the documents to be classified; the graph neural network learns the feature information and structure information of the relational graph, making full use of the association relationships between nodes, the node representations, and the classes to which the classified documents belong; the classification reference information obtained by the two networks is then combined to determine the class to which a document to be classified belongs, so that the learning capabilities and prediction results of the two networks are fully fused and the text classification accuracy is further improved. In addition, when the graph neural network performs text classification processing, semantic guidance information of the relational graph is first determined based on the feature information and structure information of the relational graph, and the unclassified documents are then classified based on the feature information, the structure information, the semantic guidance information, and the classes of the classified documents. Since the semantic guidance information reflects the respective semantics of each document and the words it contains, it provides semantic guidance for classifying the documents to be classified. The graph neural network can therefore infer the node representations of the unclassified nodes from those of the classified nodes, focus on important node representations rich in semantic information, and obtain classification results quickly, that is, the text classification efficiency is improved.
In a third aspect, an embodiment of the present application provides a training apparatus for a text classification model, including:
the acquiring unit is used for acquiring the structural information and the characteristic information of the first relational graph based on the document sample set; the document sample set comprises training document samples and prediction document samples, the first relational graph comprises a plurality of nodes and connecting edges among the nodes, the nodes comprise document nodes corresponding to the training document samples, document nodes corresponding to the prediction document samples and word nodes corresponding to words contained in the document sample set, the structural information is used for representing edge weights corresponding to the connecting edges in the first relational graph, and the characteristic information comprises node representations of the nodes in the first relational graph;
the classification unit is used for performing text classification processing on the predicted document sample based on the characteristic information through a pre-training classification network in a text classification model to obtain first classification reference information of the predicted document sample;
a semantic processing unit, configured to determine semantic guidance information of the first relation graph based on the feature information and the structure information through a graph neural network in the text classification model;
the classification unit is configured to perform text classification processing on the predicted document sample through a graph neural network in the text classification model based on the semantic guidance information, the feature information, the structure information, and a class label corresponding to the training document sample to obtain second classification reference information of the predicted document sample;
and the optimization unit is used for optimizing the network parameters of the graph neural network based on the first classification reference information, the second classification reference information and the class labels corresponding to the prediction document samples.
In a fourth aspect, an embodiment of the present application provides a text classification apparatus, including:
an acquisition unit configured to acquire structural information and feature information of the second relational graph based on the target document set; the target document set comprises a document to be classified and a classified document, the second relational graph comprises a plurality of nodes and connecting edges among the nodes, and the plurality of nodes comprise document nodes corresponding to the document to be classified, document nodes corresponding to the classified document and word nodes corresponding to words contained in the document in the target document set;
the classification unit is used for performing text classification processing on the documents to be classified based on the characteristic information through a pre-training classification network in a text classification model to obtain first classification reference information of the documents to be classified;
a semantic processing unit, configured to determine semantic guidance information of the second relation graph based on the feature information and the structure information through a graph neural network in the text classification model;
the classification unit is used for performing text classification processing on the documents to be classified through a graph neural network in the text classification model based on the semantic guidance information, the feature information, the structure information and the class to which the classified documents belong to obtain second classification reference information of the documents to be classified;
the classification unit is used for determining the class of the document to be classified based on the first classification reference information and the second classification reference information of the document to be classified.
In a fifth aspect, an embodiment of the present application provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of the first or second aspect.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium, where instructions, when executed by a processor of an electronic device, enable the electronic device to perform the method according to the first aspect or the second aspect.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic flowchart of a method for training a text classification model according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a first relationship diagram provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a text classification model according to an embodiment of the present application;
FIG. 4 is a graph illustrating comparison of performance of different text classification models;
fig. 5 is a flowchart illustrating a text classification method according to an embodiment of the present application;
fig. 6 is a flowchart illustrating a text classification method according to another embodiment of the present application;
fig. 7 is a schematic structural diagram of a training apparatus for a text classification model according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a text classification apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and the like in the description and claims are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the application can operate in sequences other than those illustrated or described herein. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the preceding and following objects.
Due to the limited number of training document samples labeled with class labels, it cannot be guaranteed that a text classification model based on a CNN or RNN architecture learns enough useful feature information from the document samples. The embodiment of the application therefore provides a training method for a text classification model that constructs a relational graph from a document sample set: the document samples and the words they contain are used as nodes in the graph, the association relationships between them are represented by connecting edges, and the text classification task is treated as a node classification task. Because the relational graph contains richer information than the individual document samples themselves, training the text classification model with the relational graph allows the model to learn richer knowledge and improves its performance. On this basis, a text classification model architecture comprising a pre-training classification network and a graph neural network is adopted. The pre-training classification network learns the feature information of the relational graph to perform text classification processing on the predicted document samples. The graph neural network learns the feature information and structure information of the relational graph, making full use of the association relationships between nodes, the node representations, and the class labels corresponding to the training document samples, and likewise classifies the predicted document samples. The classification reference information obtained by the two networks is then combined with the class labels corresponding to the predicted document samples to optimize the network parameters of the graph neural network, so that the learning capabilities and classification reference information of the two networks are fully fused and the performance of the text classification model is further improved. In addition, when the graph neural network performs text classification processing, semantic guidance information of the relational graph is first determined based on the feature information and structure information of the relational graph, and text classification is then performed based on the feature information, the structure information, the semantic guidance information, and the class labels corresponding to the training document samples. Since the semantic guidance information reflects the respective semantics of the document samples and the words they contain, it provides semantic guidance for classifying the predicted document samples. The graph neural network can therefore infer the node representations of the document nodes corresponding to the predicted document samples from those corresponding to the training document samples, and focus on important node representations rich in semantic information, so the convergence speed of the text classification model is improved.
Based on the above training method, the embodiment of the application also provides a text classification method, with which texts can be classified quickly and accurately using the trained text classification model.
It should be understood that the training method and the text classification method of the text classification model provided in the embodiments of the present application may be executed by an electronic device or software installed in the electronic device. The electronic device referred to herein may include a terminal device, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, an intelligent voice interaction device, an intelligent household appliance, an intelligent watch, a vehicle-mounted terminal, an aircraft, or the like; alternatively, the electronic device may further include a server, such as an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, and a cloud server providing a cloud computing service.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, a flow chart of a training method of a text classification model according to an embodiment of the present application is schematically shown, where the method includes the following steps:
s102, acquiring the structural information and the characteristic information of the first relational graph based on the document sample set.
Wherein the first relational graph comprises a plurality of nodes and connecting edges between the nodes. Specifically, the plurality of nodes include document nodes corresponding to the training document samples, document nodes corresponding to the predicted document samples, and word nodes corresponding to the words contained in the document samples in the document sample set. A connecting edge between two nodes indicates that an association relationship exists between them. The connecting edges in the first relational graph include first-class connecting edges and second-class connecting edges: a first-class connecting edge represents an association relationship between word nodes, and a second-class connecting edge represents an association relationship between a document node and a word node.
In the embodiment of the present application, the first relational graph may be established in any appropriate manner, selected according to actual needs; this is not limited in the embodiment of the present application. In an optional implementation, to accurately reflect the association relationships between different document samples and between document samples and words, the first relational graph may be established based on the training document samples and predicted document samples in the document sample set. Specifically, before S102, the training method according to the embodiment of the present application further includes: determining a word set corresponding to the document sample set based on the words contained in it, where these words include the words in the training document samples and the words in the predicted document samples; then, for each word in the word set, creating a corresponding word node, and creating a first-class connecting edge between different word nodes that satisfy a first edge-creation condition, where two word nodes satisfy the first edge-creation condition if their two corresponding words appear in the same document sample, and do not satisfy it otherwise; and, for the document samples in the document sample set, creating document nodes corresponding to the training document samples and document nodes corresponding to the predicted document samples, and creating a second-class connecting edge between each document node and word node that satisfy a second edge-creation condition, where a document node and a word node satisfy the second edge-creation condition if the document sample (a training document sample or a predicted document sample) contains the word, and do not satisfy it otherwise. In this way, a first-class connecting edge represents an association relationship between the words corresponding to the connected word nodes, and a second-class connecting edge represents a containment relationship (containing and being contained) between the document sample corresponding to the connected document node and the word corresponding to the word node.
For example, suppose the document sample set includes three document samples: document sample 1 contains word A, document sample 2 contains words B and C, and document sample 3 contains words C and D. The word set corresponding to the document sample set is then {A, B, C, D}, and according to the foregoing implementation a first relational graph as shown in fig. 2 may be established, where a dotted connection line between word nodes represents a first-class connecting edge and a solid connection line between a word node and a document node represents a second-class connecting edge. It should be noted that in the embodiment of the present application, a document may include, for example but not limited to, a sentence, a paragraph, and the like. A minimal construction sketch of this graph follows.
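The following Python sketch illustrates the two edge-creation conditions on the example above. The function name and node-labeling scheme are illustrative assumptions, not taken from the patent.

```python
from itertools import combinations

def build_relation_graph(doc_samples):
    """doc_samples: list of tokenized document samples, e.g. [["A"], ["B", "C"], ["C", "D"]]."""
    vocab = sorted({w for doc in doc_samples for w in doc})
    nodes = [f"doc_{i}" for i in range(len(doc_samples))] + [f"word_{w}" for w in vocab]

    edges = set()
    for i, doc in enumerate(doc_samples):
        # Second-class edges: a document node and a word node are connected
        # when the document sample contains the word.
        for w in set(doc):
            edges.add((f"doc_{i}", f"word_{w}"))
        # First-class edges: two word nodes are connected when the two words
        # appear together in the same document sample.
        for u, v in combinations(sorted(set(doc)), 2):
            edges.add((f"word_{u}", f"word_{v}"))
    return nodes, edges

# Reproduces the example of fig. 2: samples {A}, {B, C}, {C, D}.
nodes, edges = build_relation_graph([["A"], ["B", "C"], ["C", "D"]])
```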
In this embodiment of the application, the structure information of the first relational graph refers to information reflecting the structural characteristics of the first relational graph (for example, the association relationships between the document samples or words corresponding to the nodes), and may specifically include the edge weight corresponding to each connecting edge, covering both first-class and second-class connecting edges. The edge weight of a connecting edge reflects the degree of association between the nodes it connects: a larger edge weight indicates a tighter association.
In an optional implementation, considering that documents of the same category generally contain the same words (for example, documents with positive sentiment generally contain words such as "good" and "like"), the structure information of the first relational graph should accurately reflect the association relationships between document samples and words and between different words, so that the text classification model can learn the differences between documents of different categories, the commonality between documents of the same category, and the influence of the words in a document on its category. To this end, obtaining the structure information of the first relational graph in S102 may be implemented as follows:
(1) For a first-class connecting edge between word nodes, pointwise mutual information (PMI) between a first word and a second word in the word set is determined based on the probabilities that the first word and the second word each appear in the document sample set and the probability that they appear in the same document sample; the edge weight of the first-class connecting edge between the word node corresponding to the first word and the word node corresponding to the second word is then determined based on the pointwise mutual information.
Specifically, the probability of a word appearing in the document sample set may be determined as the ratio between the number of times the word appears in the document samples and the total number of words contained in the word set corresponding to the document sample set. The probability that the first word and the second word appear in the same document sample may be determined as the ratio between the number of times they appear in the same document sample and that same total. Further, the pointwise mutual information between the first word and the second word may be used directly as the edge weight of the first-class connecting edge between their word nodes.
The pointwise mutual information between two words indicates the correlation between them: the larger the pointwise mutual information, the stronger the correlation. It can be determined in a manner commonly used in the art, for example:

$$\mathrm{PMI}(i,j)=\log\frac{p(i,j)}{p(i)\,p(j)}$$

where PMI(i, j) denotes the pointwise mutual information between the first word i and the second word j, p(i) denotes the probability of the first word appearing in the document sample set, p(j) denotes the probability of the second word appearing in the document sample set, and p(i, j) denotes the probability of the first word and the second word appearing in the same document sample.
(2) For a second-class connecting edge between a word node and a document node, the importance of a target word to a target document sample is determined based on the frequency with which the target word appears in the target document sample and the number of document samples in the document sample set that contain the target word; the edge weight of the second-class connecting edge between the word node corresponding to the target word and the document node corresponding to the target document sample is then determined based on this importance.
Specifically, the importance of a word to a document can be expressed by its Term Frequency-Inverse Document Frequency (TF-IDF). The term frequency (TF) of a word is the frequency with which the word appears in the document; the inverse document frequency (IDF) reflects how common the word is and, in embodiments of the present application, may be determined based on the number of document samples containing the word and the total number of document samples in the document sample set; the product of the two is the word's TF-IDF. Further, the importance of the target word to the target document sample may be used directly as the edge weight of the second-class connecting edge between the word node corresponding to the target word and the document node corresponding to the target document sample. Both edge-weight computations are sketched below.
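A minimal sketch of the two edge-weight computations, assuming co-occurrence probabilities are estimated as document fractions (the text describes ratios over the word-set total, which would only change the normalization); the helper names are illustrative.

```python
import math

def pmi_weight(i, j, doc_samples):
    """First-class (word-word) edge weight: pointwise mutual information."""
    n = len(doc_samples)
    p_i = sum(1 for d in doc_samples if i in d) / n
    p_j = sum(1 for d in doc_samples if j in d) / n
    p_ij = sum(1 for d in doc_samples if i in d and j in d) / n
    # PMI(i, j) = log(p(i, j) / (p(i) p(j))); no co-occurrence means no edge.
    return math.log(p_ij / (p_i * p_j)) if p_ij > 0 else 0.0

def tfidf_weight(word, doc, doc_samples):
    """Second-class (document-word) edge weight: term frequency-inverse document frequency."""
    tf = doc.count(word) / len(doc)                # frequency of the word in the target document
    df = sum(1 for d in doc_samples if word in d)  # number of document samples containing the word
    idf = math.log(len(doc_samples) / (1 + df))    # inverse document frequency
    return tf * idf
```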
In this embodiment of the application, the feature information of the first relational graph refers to information for reflecting features of each node in the first relational graph, and may specifically include node representations of each node in the first relational graph. The node representation of the document node refers to a representation vector of the document features of the document sample corresponding to the document node, and the node representation of the word node refers to a representation vector of the word features of the words corresponding to the word node. The node representation of each node in the first relational graph may be obtained by various technical means in the art, which is not limited in this embodiment of the present application.
And S104, performing text classification processing on the predicted document sample through a pre-training classification network in the text classification model based on the characteristic information of the first relational graph to obtain first classification reference information of the predicted document sample.
As shown in fig. 3, the text classification model in the embodiment of the present application includes a pre-trained classification network, where the pre-trained classification network refers to a network that is pre-trained and has text classification capability. Because the feature information of the first relational graph reflects the respective features of the document sample and the words contained in the document sample, the pre-training classification network performs text classification processing on the predicted document sample based on the feature information of the first relational graph to obtain first classification reference information of the predicted document sample. Wherein the first classification reference information may include a feature vector representing a classification result of the prediction document sample.
In the embodiment of the present application, the pre-training classification network may have any appropriate structure, set according to actual needs; this is not limited in the embodiment of the present application. In an optional implementation, to improve the recognition accuracy of the pre-training classification network, as shown in fig. 3, the pre-training classification network may include a language representation layer and a fully connected layer.
The language representation layer is used for performing embedding processing on the feature information of the first relational graph to obtain an embedded vector for each node in the first relational graph. In order that the embedded vectors input to the fully connected layer contain rich semantic information, i.e., that the embedded vector of a word node better reflects the meaning of its word within the document samples and the embedded vector of a document node better reflects the real intention of its document sample, the language representation layer can adopt a pre-trained language model such as BERT, which uses a Transformer-based bidirectional encoder architecture. When embedding the node representation of the node corresponding to a document sample or word, BERT considers not only the document sample or word itself but also its context, so the resulting embedded vectors carry richer semantic information, which helps the fully connected layer accurately identify the category to which a document sample belongs.
In order to enable the pre-training classification network to classify and identify large-scale documents, the language representation layer can embed the node representation of each document node in the first relational graph to obtain the corresponding embedded vector, while the embedded vectors of the word nodes in the first relational graph are set to zero vectors, namely:

$$X=\begin{pmatrix}X_{\mathrm{doc}}\\ 0\end{pmatrix}\in\mathbb{R}^{(n_{\mathrm{doc}}+n_{\mathrm{word}})\times d}$$

where X denotes the embedded vectors output by the language representation layer, $n_{\mathrm{doc}}$ denotes the number of document nodes in the first relational graph, $n_{\mathrm{word}}$ denotes the number of word nodes in the first relational graph, d denotes the dimension of the embedded vectors, $X_{\mathrm{doc}}$ denotes the embedded vectors of the document nodes, and 0 denotes the (zero) embedded vectors of the word nodes.
The fully connected layer serves as the output layer of the pre-training classification network and classifies the document samples based on the embedded vectors of the nodes in the first relational graph to obtain the first classification reference information of the document samples, namely:

$$Z_{\mathrm{Bert}}^{(l)}=\mathrm{Bert}\left(X^{(l)}W^{(l)}\right),\quad W^{(l)}\in\mathbb{R}^{C\times E}$$

where $Z_{\mathrm{Bert}}^{(l)}$ denotes the first classification reference information of the document samples; l denotes the layer index in the pre-training classification network, i.e., the l-th layer; $X^{(l)}$ denotes the feature vectors of the nodes in the first relational graph; $W^{(l)}$ denotes the network parameters of the fully connected layer; C denotes the dimension of the node feature vectors; E denotes the dimension of the first classification reference information; $\mathbb{R}$ denotes the set of real numbers; and Bert denotes the mapping implemented by the fully connected layer. In this way, the fully connected layer fine-tunes the language representation layer so that the embedded vectors it outputs can serve downstream tasks, such as the text classification task in the embodiment of the present application. A minimal sketch of this network follows.
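A minimal sketch of the pre-training classification network: a BERT language representation layer that embeds only the document nodes (word-node rows are zero vectors, X = [X_doc; 0]) and a fully connected output layer. It assumes the HuggingFace transformers package as the encoder; the patent does not prescribe a specific BERT implementation.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class PretrainedClassifier(nn.Module):
    def __init__(self, num_classes, hidden=768):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")  # language representation layer
        self.fc = nn.Linear(hidden, num_classes)                    # fully connected output layer

    def forward(self, input_ids, attention_mask, n_word_nodes):
        # Embedded vectors of the document nodes (pooled [CLS] output per document sample).
        x_doc = self.bert(input_ids, attention_mask=attention_mask).pooler_output
        # Embedded vectors of the word nodes are set to zero: X = [X_doc; 0].
        x_word = torch.zeros(n_word_nodes, x_doc.size(1), device=x_doc.device)
        x = torch.cat([x_doc, x_word], dim=0)
        # First classification reference information of the document samples.
        z_bert = self.fc(x_doc)
        return x, z_bert
```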
And S106, determining semantic guidance information of the first relational graph based on the feature information and the structural information of the first relational graph through a graph neural network in the text classification model.
As shown in fig. 3, the text classification model in the embodiment of the present application further includes a graph neural network, which performs classification and identification through the network topology and node content information of a graph. In the embodiment of the application, the structure information and feature information of the first relational graph and the class labels corresponding to the training document samples are input into the graph neural network, so that the graph neural network can learn, based on those class labels, the association relationships between the nodes in the first relational graph and the node representations reflecting the node features. It thereby grasps the association relationships between different document samples, the association relationships between document samples and words, and the respective features of the document samples and words, and can then use the relevant information of the training document samples to perform text classification processing on the predicted document samples and obtain their second classification reference information.
In the embodiment of the present application, to improve the classification accuracy, the graph neural network may be a Graph Convolutional Network (GCN). A graph convolutional network includes convolutional layers that perform convolution operations on graph data. Each convolutional layer learns a function mapping: for each node in the graph, it generates a new feature for the node from the node's own feature and the features of its neighboring nodes, thereby propagating features between nodes so that the classes of unclassified nodes can be identified from the features of classified nodes. In the embodiment of the application, for each node in the relational graph, the convolutional layer updates the node's feature with the features of its neighbors to obtain a new feature, thereby implementing feature propagation between the nodes, and identifies the category to which a predicted document sample belongs using the node representations of the document nodes corresponding to the training document samples and the class labels corresponding to those samples.
Specifically, the convolutional layers in the graph neural network can be expressed as the following formulas (1) and (2):

$$H^{(l+1)}=\mathrm{GCN}\left(A,X^{(l)}\right)=\sigma\left(\tilde{A}X^{(l)}W^{(l)}\right)\tag{1}$$

$$A_{ij}=\begin{cases}\mathrm{PMI}(i,j),&i,j\ \text{are word nodes}\\ \text{TF-IDF}(i,j),&\text{one of }i,j\ \text{is a word node and the other a document node}\\ 1,&i=j\\ 0,&\text{otherwise}\end{cases}\tag{2}$$

where $H^{(l+1)}$ denotes the output of the convolutional layer; l denotes the layer index, i.e., the l-th convolutional layer; GCN denotes the convolution operation; A denotes the adjacency matrix corresponding to the structure information of the first relational graph, whose normalized form is $\tilde{A}=D^{-\frac{1}{2}}AD^{-\frac{1}{2}}$; D denotes the degree matrix of the adjacency matrix; PMI(i, j) denotes the edge weight of the first-class connecting edge between the word nodes corresponding to the i-th word and the j-th word; TF-IDF(i, j) denotes the edge weight of the second-class connecting edge between the word node corresponding to the i-th word and the document node corresponding to the j-th document sample; $X^{(l)}$ denotes the input of the convolutional layer, $X\in\mathbb{R}^{V\times C}$, where C denotes the dimension of the input; $W^{(l)}$ denotes the network parameters of the convolutional layer, $W\in\mathbb{R}^{C\times F}$, where F denotes the dimension of the output; $\mathbb{R}$ denotes the set of real numbers; and σ denotes the activation function.
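A sketch of one convolutional layer implementing formula (1); a_norm is the normalized adjacency matrix $D^{-1/2}AD^{-1/2}$ built from the PMI and TF-IDF edge weights of formula (2), and ReLU stands in for the unspecified activation σ.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """H^{(l+1)} = sigma(A_norm X^{(l)} W^{(l)}), formula (1)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(in_dim, out_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, a_norm, x):
        # Propagate node features over the weighted edges, then transform them.
        return torch.relu(a_norm @ x @ self.weight)
```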
The structure information and feature information of the relational graph involve a large amount of data, so the graph neural network has a great deal to learn, which slows convergence and affects the training efficiency of the text classification model. Therefore, semantic guidance information of the first relational graph can be determined by the graph neural network based on the feature information and structure information of the first relational graph. The semantic guidance information reflects the respective semantics of the document samples and the words they contain, and can therefore provide semantic guidance for the task of classifying the predicted document samples: the graph neural network can infer the node representations of the document nodes corresponding to the predicted document samples from those corresponding to the training document samples, and can focus on important node representations rich in semantic information, so the convergence speed of the text classification model is improved.
Specifically, the semantic guidance information of the first relational graph can represent the semantic importance, to the predicted document samples, of the node representation of each node in the first relational graph, so that the convolutional layers focus on important node features rich in semantic information when propagating features between nodes. This increases the convergence speed of the graph neural network, that is, the convergence speed of the text classification model, and improves its training efficiency.
In an optional implementation, the semantic guidance information of the first relational graph can be determined by introducing a node-importance scoring mechanism into the graph neural network, with the semantic guidance information comprising the importance scores of the nodes in the first relational graph. As shown in fig. 3, the graph neural network may include a node scoring layer for determining the importance scores of the nodes from the structure information and feature information of the first relational graph based on a self-attention mechanism. It can be understood that an attention mechanism can screen out a small amount of important information from a large amount of information and focus on it while neglecting the mostly unimportant remainder. The self-attention mechanism is a variant of the attention mechanism that reduces the dependence on external information and is better at capturing the internal correlations of data or features. Capturing the correlations between nodes and node features in the first relational graph through self-attention therefore evaluates the importance of each node more accurately and objectively, i.e., the obtained importance scores are more accurate and objective, which helps the second convolutional layer focus on the features of the important nodes and facilitates rapid convergence of the whole graph neural network during training.
More specifically, to ensure that the second convolutional layer focuses on the features of the important nodes and ignores those of the mostly unimportant nodes, further improving the convergence speed of the graph neural network, S106 may be implemented as: determining the attention score of each node in the first relational graph from the structure information and feature information using the node scoring layer based on the self-attention mechanism; selecting the nodes whose attention scores satisfy a preset score condition; performing a nonlinear transformation on the attention scores of the selected nodes to obtain their importance scores; and setting the importance scores of the unselected nodes to a preset score.
The preset score can be set according to actual needs; for example, to ignore the features of unimportant nodes, it can be set to 0. The preset score condition can likewise be set according to actual needs; for example, it may be that the attention score ranks among the top K.
For example, the importance scores of the nodes may be determined by the following formula (3):

$$S^{(l+1)}=\sigma\left(\mathrm{TopK}\left(\mathrm{Attention}\left(A,X^{(l)}\right)\right)\right)\tag{3}$$

where $S^{(l+1)}$ denotes the importance scores of the nodes; σ denotes the nonlinear transformation applied to the selected nodes; TopK denotes selecting the K nodes with the highest attention scores; A denotes the adjacency matrix corresponding to the structure information of the first relational graph, whose normalized form is $\tilde{A}=D^{-\frac{1}{2}}AD^{-\frac{1}{2}}$; D denotes the degree matrix of the adjacency matrix; $W_s^{(l)}$ denotes the network parameters of the node scoring layer; l denotes the layer index of the node scoring layer; $X^{(l)}$ denotes the input of the node scoring layer; Attention denotes the self-attention mechanism; and $\mathrm{Attention}(A,X^{(l)})$ denotes the attention scores of the nodes, for example $\mathrm{Attention}(A,X^{(l)})=\tilde{A}X^{(l)}W_s^{(l)}$.
And S108, performing text classification processing on the predicted document sample through the graph neural network in the text classification model based on the semantic guidance information, the feature information and the structure information of the first relational graph and the class labels corresponding to the training document samples, to obtain second classification reference information of the predicted document sample.
As shown in fig. 3, the graph neural network may further include a first convolutional layer and a second convolutional layer, wherein the first convolutional layer is connected with the second convolutional layer. Accordingly, the above S108 may be specifically implemented as:
step A1, performing convolution processing on the structural information and the characteristic information of the first relational graph by using the first convolution layer based on the class label corresponding to the training document sample to obtain a first convolution result.
Specifically, the first convolution result may be expressed as: h (1) =GCN(A,X (0) ) Wherein H is (1) Representing a first convolution result, GCN representing convolution processing, A representing an adjacency matrix corresponding to structure information of the first relational graph, X (0) The method comprises the class labels corresponding to the training document samples and the characteristic information of the first relational graph.
And step A2, performing fusion processing on the first convolution result and the semantic guidance information of the first relational graph to obtain a fusion convolution result.
To enhance the important node features rich in semantic information so that they contain richer information, a candidate convolution result for each node can be determined based on the product of the node's importance score and the first convolution result; the candidate convolution result and the first convolution result of each node in the first relational graph are then fused to obtain the fused convolution result of each node.
More specifically, the fused convolution result of each node can be determined by the following formula (4):

$$Z^{(1)}=\left(H^{(1)}+H^{(1)}\odot S^{(1)}\right)/2\tag{4}$$

where $Z^{(1)}$ denotes the fused convolution result of the nodes; $S^{(1)}$ denotes the importance scores of the nodes, $S^{(1)}=\mathrm{Attention}(A,X^{(0)})$; Attention denotes the self-attention mechanism; A denotes the adjacency matrix corresponding to the structure information of the first relational graph; $X^{(0)}$ comprises the class labels corresponding to the training document samples and the feature information of the first relational graph; $H^{(1)}$ denotes the first convolution result of the nodes; and $H^{(1)}\odot S^{(1)}$ denotes the candidate convolution result of the nodes, with ⊙ denoting element-wise multiplication.
And step A3, inputting the fusion convolution result into the second convolution layer for convolution processing to obtain second classification reference information.
The second classification reference information may be determined by formula (5):

Ẑ_GCN = GCN(A, Z^(1))    (5)

where Ẑ_GCN denotes the second classification reference information, Z^(1) denotes the fused convolution result of the nodes, A denotes the adjacency matrix corresponding to the structural information of the first relational graph, and GCN denotes the graph convolution operation.
It can be understood that using the first convolutional layer to perform convolution processing on the structure information and the feature information of the first relational graph based on the class labels corresponding to the training document samples is equivalent to updating the node representation of each node using the node representations of its neighbor nodes in the first relational graph and the structural information of the first relational graph. Fusing the first convolution result with the semantic guidance information of the first relational graph to obtain the fused convolution result then introduces the importance score of each node into its node representation. Finally, inputting the fused convolution result into the second convolutional layer for convolution processing to obtain the second classification reference information is equivalent to updating the node representations after the importance scores have been introduced. In this way, the graph neural network can focus on the representations of the important nodes in the first relational graph when performing text classification processing, which facilitates fast convergence of the whole graph neural network during training.
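As a minimal sketch of steps A1 to A3, the following PyTorch code chains the two graph convolutions around the fusion of formula (4). The ReLU after the first convolution and the softmax over the output are assumptions; only the GCN operations, the averaging fusion of formula (4), and the second convolution of formula (5) are stated by this embodiment.

```python
import torch
import torch.nn.functional as F

def gcn_layer(A_hat, X, W):
    # One graph convolution: propagate features over A_hat, then project
    return A_hat @ X @ W

def graph_branch_forward(A_hat, X0, W1, W2, S1):
    """Steps A1-A3: first convolution, fusion with the importance
    scores (formula (4)), second convolution (formula (5))."""
    # Step A1: H1 = GCN(A, X0); the ReLU is an assumption
    H1 = F.relu(gcn_layer(A_hat, X0, W1))

    # Step A2: Z1 = (H1 + H1 * S1) / 2, "*" being element-wise
    Z1 = (H1 + H1 * S1.unsqueeze(-1)) / 2

    # Step A3: the second convolution yields the second classification
    # reference information; the softmax over classes is an assumption
    Z_gcn = gcn_layer(A_hat, Z1, W2)
    return F.softmax(Z_gcn, dim=-1)
```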
And S110, optimizing network parameters of the graph neural network based on the first classification reference information, the second classification reference information and the class labels corresponding to the prediction document samples.
The network parameters of the graph neural network may include, for example, but are not limited to: the number of neurons contained in each network layer of the graph neural network, the connection relationships and connection edge weights between the neurons of different network layers, and the biases corresponding to the neurons in each network layer.
In the embodiment of the application, the deviation between the classification reference information of the predicted document sample and its class label reflects the classification accuracy of the text classification model. Moreover, the first classification reference information and the second classification reference information are obtained by different networks in the text classification model learning and classifying based on different information of the relational graph. Therefore, in order to fully fuse the learning abilities of the two networks, improve the performance of the text classification model, and enable the text classification model to accurately classify and identify texts, the first classification reference information and the second classification reference information of the predicted document sample can be combined with the class label corresponding to the predicted document sample to optimize the network parameters of the graph neural network.
Considering that the pre-trained classification network and the graph neural network differ in data processing mode and network size, directly combining the classification reference information output by the two networks can affect the convergence speed of the text classification model. In addition, the graph neural network operates on the whole relational graph, whereas the pre-trained classification network may not be able to load the features of all nodes in the relational graph at one time. For this reason, in an alternative implementation, the above S110 may be implemented as:
s1101, acquiring a first weight corresponding to the pre-training classification network and a second weight corresponding to the graph neural network.
And S1102, multiplying the first classification reference information of the document sample by the first weight, multiplying the second classification reference information by the second weight, and fusing the multiplication results.
For example, the above S1102 may be specifically realized by formula (6):

Z = (1 − λ) · Ẑ_Bert + λ · Ẑ_GCN    (6)

where Z denotes the result of the fusion processing; Ẑ_Bert denotes the first classification reference information of the document sample, Ẑ_Bert = Bert(X^(0)), Bert denotes the pre-trained classification network, and X^(0) denotes the input data of the pre-trained classification network; Ẑ_GCN denotes the second classification reference information of the document sample, Ẑ_GCN = GCN(A, Z^(1)), GCN denotes the graph convolution operation, A denotes the adjacency matrix corresponding to the structure information of the first relational graph, and Z^(1) denotes the input of the second convolutional layer; Z ∈ R^(V×E), where V denotes the number of nodes in the first relational graph and E denotes the dimension of the classification reference information; 1 − λ denotes the first weight and λ denotes the second weight. The first weight and the second weight may be set according to actual needs, which is not limited in the embodiment of the present application.
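For illustration, formula (6) reduces to a one-line sketch; the default value of lam (playing the role of λ) is illustrative only, since the embodiment leaves the weights open:

```python
def fuse_reference_info(Z_bert, Z_gcn, lam=0.7):
    """Formula (6): Z = (1 - lam) * Z_bert + lam * Z_gcn.
    lam = 0.7 is an illustrative default, not a value fixed
    by this embodiment."""
    return (1 - lam) * Z_bert + lam * Z_gcn
```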
S1103, determining the prediction category of the predicted document sample based on the result of the fusion processing.
Specifically, the category corresponding to the maximum classification probability indicated in the result after the fusion processing may be determined as the prediction category of the predicted document sample.
And S1104, determining the predicted loss of the graph neural network based on the predicted category of the predicted document sample, the category label corresponding to the predicted document sample and the preset loss function corresponding to the graph neural network.
Wherein the prediction loss is used to represent a deviation between a prediction category of the prediction document sample and a category label of the prediction document sample.
In practical application, the preset loss function may be set according to actual needs, which is not limited in the embodiment of the present application.
S1105, optimizing the network parameters of the graph neural network based on the prediction loss of the graph neural network.
For example, a Back Propagation (BP) algorithm can be used with the prediction loss to determine the loss contributed by each network layer in the graph neural network; the network parameters of each network layer are then adjusted layer by layer with the goal of reducing the prediction loss.
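For illustration, one adjustment round of S1104 and S1105 can be sketched as follows, assuming cross-entropy as the preset loss function, Adam as the optimizer, and a model that returns unnormalized scores; all three are illustrative choices, since the embodiment leaves the loss function open.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, A_hat, X0, predict_idx, labels):
    """One adjustment round: forward pass, prediction loss on the
    predicted document samples, backpropagation (BP), parameter update."""
    optimizer.zero_grad()
    logits = model(A_hat, X0)            # fused classification reference info
    # Preset loss (assumed cross-entropy) on the predicted document nodes
    loss = F.cross_entropy(logits[predict_idx], labels)
    loss.backward()                      # BP distributes the loss over layers
    optimizer.step()                     # adjust parameters to reduce the loss
    return loss.item()

# Repeated until a preset training stop condition is met, for example:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# for epoch in range(max_epochs):
#     loss = train_step(model, optimizer, A_hat, X0, predict_idx, labels)
#     if loss < loss_threshold:
#         break
```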
The embodiment of the present application shows a specific implementation manner of the foregoing S110. Of course, it should be understood that S110 may also be implemented in other manners, and this is not limited in this embodiment of the application.
It should be noted that the above process is only a single adjustment. In practical applications, multiple adjustments may be required, so the above steps S102 to S110 may be repeated until a preset training stop condition is satisfied, thereby obtaining the final graph neural network. The preset training stop condition may be set according to actual needs, for example, at least one of: the prediction loss being smaller than a preset loss threshold, the graph neural network converging, and the number of adjustments reaching a preset number; the preset training stop condition is not limited in the embodiment of the present application.
After the final graph neural network is obtained, the text classification model of the embodiment of the present application, the existing Bert model, and the existing BertGCN are verified on an MR (Movie Review) document test set, an R8 document test set and an R52 document test set, yielding the average classification accuracy of each model shown in Table 1 below and the change of each model's classification accuracy with the number of iterations shown in fig. 4, where the abscissa in fig. 4 represents the number of iterations (epochs) and the ordinate represents the classification accuracy of the model (accuracy).
TABLE 1: average classification accuracy of each model on the MR, R8 and R52 document test sets (the table content is provided as an image in the original publication).
Based on Table 1 and fig. 4, it can be seen that the semantic guidance information obtained during training by the training method of the embodiment of the application has a good guidance effect on model training, and the trained text classification model performs better. Specifically, compared with the existing Bert model and BertGCN model, the text classification model trained in the embodiment of the application achieves a higher average classification accuracy on all three document test sets. Secondly, for the MR document test set and the R8 document test set, within 10 steps (i.e., 10 iterations) the text classification model of the embodiment of the present application converges better than other models (such as the existing BertGCN); for the R52 document test set, it converges better within 60 steps. In addition, when the set number of iterations is large, the text classification model of the embodiment of the application has already stopped under the early-stopping constraint while other models (such as the existing BertGCN) have not yet converged.
According to the training method of the text classification model, the document samples and the words contained in the document samples are respectively used as nodes in a relational graph, and the association relations among the nodes are represented by the connecting edges between them, so that the text classification task is treated as a node classification task. On this basis, a text classification model architecture comprising a pre-trained classification network and a graph neural network is adopted: the pre-trained classification network learns the feature information of the relational graph to perform text classification processing on the predicted document samples, while the graph neural network learns the feature information and the structure information of the relational graph, making full use of the association relations among the nodes, the node representations, and the class labels corresponding to the training document samples to perform text classification processing on the predicted document samples. The classification reference information obtained by the pre-trained classification network and the graph neural network respectively is then combined with the class labels corresponding to the predicted document samples to optimize the network parameters of the graph neural network, so that the learning abilities and classification reference information of the two networks can be fully fused, further improving the performance of the text classification model. In addition, when the graph neural network performs text classification processing, the semantic guidance information of the relational graph is first determined based on the feature information and the structure information, and the graph neural network then performs text classification processing based on the feature information, the structure information, the semantic guidance information, and the class labels corresponding to the training document samples. Since the semantic guidance information can reflect the respective semantics of the document samples and of the words they contain, it can provide semantic guidance for the text classification processing task on the predicted document samples. The graph neural network can therefore not only realize text classification by reasoning over the node representations of the document nodes corresponding to the training document samples in the relational graph, but also focus on the representations of important nodes rich in semantic information, so that the convergence speed of the text classification model can be effectively improved.
The above embodiment introduces a training method for a text classification model. With the above method, text classification models for different application scenarios can be trained, and the class labels of the document sample set used for model training can be selected according to the actual application scenario. Application scenarios to which the above training method provided in the embodiment of the present application is applicable may include, but are not limited to, e-commerce, finance and other related services, for example: positive and negative product review identification, spam comment detection, sensitive and illegal statement detection, fraudulent short message identification, purchasing tendency identification, financial news classification, and the like. Taking the identification of positive and negative product reviews as an example, the document samples may be historical product reviews, and the class label corresponding to a document sample represents its emotional tendency, i.e., whether it is a positive review or a negative review.
Based on the training method of the text classification model shown in the above embodiment of the present application, the trained text classification model can be used to execute text classification tasks. The application process based on the text classification model is explained in detail below. The embodiment of the application further provides a text classification method, which can classify and identify documents to be classified based on a text classification model trained by the method shown in fig. 1. Fig. 5 is a flowchart illustrating a text classification method according to an embodiment of the present application, and the method may include the following steps:
S502, acquiring structure information and feature information of the second relational graph based on the target document set.
The target document set comprises documents to be classified and classified documents. In practical applications, the classified documents in the target document set may be document samples labeled with category labels.
The second relational graph comprises a plurality of nodes and connecting edges among the nodes, the plurality of nodes comprise document nodes corresponding to the documents to be classified, document nodes corresponding to the classified documents and word nodes corresponding to words contained in the documents in the target document set, the structural information is used for representing edge weights corresponding to the connecting edges in the second relational graph, and the characteristic information comprises node representations of the nodes in the second relational graph. It should be noted that, the method for establishing the second relationship diagram is similar to the method for establishing the first relationship diagram, and reference may be specifically made to the description of the process for establishing the first relationship diagram, and details are not repeated here.
The implementation of the above S502 is similar to the implementation of S102 in the embodiment shown in fig. 1, and reference may be specifically made to the description of S102 in the embodiment shown in fig. 1, which is not repeated herein.
S504, performing text classification processing on the documents to be classified based on the characteristic information through a pre-training classification network in the text classification model to obtain first classification reference information of the documents to be classified.
The implementation manner of the above-mentioned S504 is similar to the implementation manner of the S104 in the embodiment shown in fig. 1, and reference may be specifically made to the description of the S104 in the embodiment shown in fig. 1, which is not described herein again.
S506, determining semantic guidance information of the second relational graph based on the characteristic information and the structural information through a graph neural network in the text classification model.
For example, as shown in fig. 6, o1 to o4 respectively represent the node representations of the word nodes corresponding to the words contained in the target document set, and e1 to e3 respectively represent the node representations of the document nodes, where e1 and e2 are the node representations of the document nodes corresponding to the classified documents, and e3 is the node representation of the document node corresponding to an unclassified document; the class label C1 of document node e1 is 1 (i.e., the classified document corresponding to this node belongs to the first class), and the class label C2 of document node e2 is 2 (i.e., the classified document corresponding to this node belongs to the second class). The feature information of the second relational graph is input into the graph neural network, and after the feature information has been propagated among the nodes through the convolutional layers of the graph neural network, a fused convolution result is obtained for each node in the second relational graph, where O1 to O4 respectively represent the fused convolution results of the word nodes and E1 to E3 respectively represent the fused convolution results of the document nodes. Further, the second convolutional layer classifies and identifies the document to be classified based on the fused convolution result of the node to be classified, thereby obtaining the second classification reference information C3 of the document to be classified.
The implementation manner of the above S506 is similar to the implementation manner of S106 in the embodiment shown in fig. 1, and reference may be specifically made to the description of S106 in the embodiment shown in fig. 1, which is not repeated herein.
And S508, performing text classification processing on the documents to be classified through a graph neural network in the text classification model based on the semantic guidance information, the characteristic information and the structural information of the second relational graph and the category to which the classified documents belong to obtain second classification reference information of the documents to be classified.
The implementation manner of the above S508 is similar to the implementation manner of S108 in the embodiment shown in fig. 1, and reference may be specifically made to the description of S108 in the embodiment shown in fig. 1, which is not repeated herein.
S510, determining the category of the document to be classified based on the first classification reference information and the second classification reference information of the document to be classified.
Because the first classification reference information and the second classification reference information of the document to be classified are obtained by different networks in the text classification model performing classification and identification based on different data of the second relational graph, in order to fully fuse the classification and identification capabilities of the two networks and improve the accuracy of the classification prediction result, the first classification reference information and the second classification reference information of the document to be classified can be combined to determine the category to which the document to be classified belongs.
Specifically, based on the respective preset weights corresponding to the first classification reference information and the second classification reference information, the first classification reference information and the second classification reference information of the document to be classified are subjected to weighted summation to obtain final classification reference information of the document to be classified, and then the category corresponding to the maximum classification probability indicated by the final classification reference information is determined as the category to which the document to be classified belongs.
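For illustration, S510 can be sketched in a few lines, with w1 and w2 standing in for the preset weights; their values here are illustrative only.

```python
import torch

def classify(Z_bert, Z_gcn, w1=0.3, w2=0.7):
    """S510 sketch: weighted summation of the first and second
    classification reference information, then the category with the
    maximum indicated classification probability is selected."""
    Z = w1 * Z_bert + w2 * Z_gcn          # final classification reference info
    return Z.argmax(dim=-1)               # predicted category per document
```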
According to the text classification method provided by the embodiment of the application, the documents to be classified, the classified documents, and the words contained in the documents are respectively used as nodes in a relational graph, and the association relations among the nodes are represented by the connecting edges between them, so that the text classification task is treated as a node classification task. Since the relational graph contains richer information than the documents alone, having the text classification model execute the text classification task based on the feature information and the structure information of the relational graph allows the model to acquire richer knowledge, which is beneficial to improving the text classification accuracy. On this basis, a text classification model architecture comprising a pre-trained classification network and a graph neural network is adopted: the pre-trained classification network learns the feature information of the relational graph to classify the documents to be classified, while the graph neural network learns the feature information and the structure information of the relational graph, making full use of the association relations among the nodes, the node representations, and the classes to which the classified documents belong. The classification reference information obtained by the two networks is then combined to determine the class to which a document to be classified belongs, so that the learning abilities and prediction results of the two networks can be fully fused, further improving the text classification accuracy. In addition, when the graph neural network performs text classification processing, the semantic guidance information of the relational graph is first determined based on the feature information and the structure information, and the graph neural network then performs text classification processing on the unclassified documents based on the feature information, the structure information, the semantic guidance information, and the classes to which the classified documents belong. Since the semantic guidance information can reflect the respective semantics of each document and of the words it contains, it can provide semantic guidance for the text classification processing task on the documents to be classified. The graph neural network can therefore not only realize text classification by inferring the node representations of the unclassified nodes in the relational graph from the node representations of the classified nodes, but also focus on the representations of important nodes rich in semantic information, and can obtain classification results quickly, that is, the text classification efficiency is improved.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In addition, corresponding to the above method for training the text classification model shown in fig. 1, an embodiment of the present application further provides a device for training the text classification model. Referring to fig. 7, a schematic structural diagram of an apparatus 700 for training a text classification model according to an embodiment of the present application is provided, where the apparatus 700 includes:
an obtaining unit 710, configured to obtain structure information and feature information of the first relation graph based on the document sample set; the document sample set comprises training document samples and prediction document samples, the first relational graph comprises a plurality of nodes and connecting edges among the nodes, and the nodes comprise document nodes corresponding to the training document samples, document nodes corresponding to the prediction document samples and word nodes corresponding to words contained in the document sample set;
the classification unit 720 is configured to perform text classification processing on the predicted document sample based on the feature information through a pre-training classification network in a text classification model to obtain first classification reference information of the predicted document sample;
a semantic processing unit 730, configured to determine semantic guidance information of the first relation graph based on the feature information and the structure information through a graph neural network in the text classification model;
the classification unit 720 is configured to perform text classification processing on the predicted document sample through a graph neural network in the text classification model based on the semantic guidance information, the feature information, the structure information, and a class label corresponding to the training document sample, so as to obtain second classification reference information of the predicted document sample;
an optimizing unit 740, configured to optimize network parameters of the graph neural network based on the first classification reference information, the second classification reference information, and the class labels corresponding to the prediction document samples.
Optionally, the graph neural network comprises a node score layer, and the semantic guidance information comprises importance scores of the nodes in the first relational graph;
the semantic processing unit is specifically configured to:
determining the attention score of each node in the first relational graph through the structural information and the characteristic information by utilizing the node scoring layer based on a self-attention mechanism;
selecting nodes with attention scores meeting preset score conditions from the first relational graph, and carrying out nonlinear transformation processing on the attention scores of the selected nodes to obtain importance scores of the selected nodes;
and setting the importance scores of the unselected nodes in the first relational graph as preset scores.
Optionally, the graph neural network further comprises a first convolutional layer and a second convolutional layer, the first convolutional layer and the second convolutional layer are connected;
the classification unit is specifically configured to:
performing convolution processing on the structural information and the characteristic information by using the first convolution layer based on the class label corresponding to the training document sample to obtain a first convolution result;
performing fusion processing on the first convolution result and the semantic guidance information to obtain a fusion convolution result;
and inputting the fusion convolution result into the second convolution layer for convolution processing to obtain second classification reference information.
Optionally, the optimization unit is specifically configured to:
acquiring a first weight corresponding to the pre-training classification network and a second weight corresponding to the graph neural network;
multiplying the first classification reference information by the first weight, multiplying the second classification reference information by the second weight, and fusing the results of the multiplication;
determining the prediction category of the prediction document sample according to the result of the fusion processing;
determining a prediction loss based on the prediction category of the prediction document sample, the category label corresponding to the prediction document sample and a preset loss function corresponding to the graph neural network;
optimizing network parameters of the graph neural network based on the predicted loss.
Optionally, the connection edges between the nodes in the first relationship graph include a first type of connection edge and a second type of connection edge;
the apparatus 700 further comprises:
a creating unit configured to create a first relational graph based on a training document sample and a prediction document sample in a document sample set before the acquiring unit acquires structure information and feature information of the first relational graph based on the document sample set;
creating a first relationship graph based on training document samples and predicted document samples in the document sample set, comprising:
determining a word set corresponding to the document sample set based on words contained in the document sample set; the words contained in the document sample set comprise words in the training document sample and words in the prediction document sample;
for each word in the word set, creating a word node corresponding to the word, and creating a first type of connecting edge between different word nodes that satisfy a first edge-creation condition;
and for the document samples in the document sample set, creating document nodes corresponding to the document samples, and creating a second type of connecting edge between document nodes and word nodes that satisfy a second edge-creation condition.
Optionally, the edge weight corresponding to each connection edge in the first relationship graph includes an edge weight of a first type of connection edge and an edge weight of a second type of connection edge;
the acquiring unit acquires structure information of a first relational graph, including:
determining the pointwise mutual information between a first word and a second word in the word set based on the probability that each of the first word and the second word appears in the document sample set and the probability that the first word and the second word appear in the same document sample;
determining the edge weight corresponding to the first type of connecting edge between the word node corresponding to the first word and the word node corresponding to the second word based on the pointwise mutual information;
determining the importance degree of any target word in the word set to a target document sample based on the occurrence frequency of the target word in the target document sample and the number of document samples containing the target word in the sample set;
and determining the edge weight corresponding to the second connecting edge between the word node corresponding to the target word and the document node corresponding to the target document sample according to the importance degree.
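For illustration, the two edge-weight computations performed by the obtaining unit can be sketched as follows. Co-occurrence is counted at document granularity, as described above; the natural logarithm and the exact TF-IDF variant are assumptions.

```python
import math
from collections import Counter

def edge_weight_functions(docs):
    """Sketch of the first-type (word-word, PMI-based) and second-type
    (word-document, TF-IDF-based) edge weights for the relational graph."""
    n_docs = len(docs)
    doc_sets = [set(d) for d in docs]
    df = Counter(w for ds in doc_sets for w in ds)   # document frequency

    def pmi(w1, w2):
        # PMI(w1, w2) = log( p(w1, w2) / (p(w1) * p(w2)) )
        co = sum(1 for ds in doc_sets if w1 in ds and w2 in ds)
        if co == 0:
            return 0.0
        return math.log((co / n_docs) /
                        ((df[w1] / n_docs) * (df[w2] / n_docs)))

    def tf_idf(word, doc):
        # Importance degree of `word` for `doc`: term frequency times
        # inverse document frequency (a common TF-IDF variant; a word
        # appearing in every document receives weight 0)
        tf = doc.count(word) / len(doc)
        idf = math.log(n_docs / df[word])
        return tf * idf

    return pmi, tf_idf

# Usage with tokenized documents (illustrative data):
pmi, tf_idf = edge_weight_functions([["good", "movie"],
                                     ["bad", "movie", "plot"]])
w_word_word = pmi("good", "movie")                      # first-type edge
w_word_doc = tf_idf("movie", ["bad", "movie", "plot"])  # second-type edge
```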
Optionally, the pre-trained classification network includes:
the language representation layer is used for embedding the characteristic information to obtain an embedded vector of each node in the first relational graph;
and the full connection layer is used for carrying out classification and identification on the document sample based on the embedded vector of each node in the first relational graph to obtain first classification reference information of the document sample.
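For illustration, the two layers of the pre-trained classification network can be sketched as follows. The embedding encoder below is only a placeholder marking where a pre-trained language model such as Bert would serve as the language representation layer, and the mean pooling is likewise an assumption.

```python
import torch
import torch.nn as nn

class PretrainedClassifier(nn.Module):
    """Sketch: language representation layer + full connection layer.
    The embedding encoder is a placeholder for a pre-trained language
    model (e.g. Bert) used as the representation layer."""
    def __init__(self, vocab_size, hidden_dim, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)  # placeholder encoder
        self.fc = nn.Linear(hidden_dim, num_classes)       # full connection layer

    def forward(self, token_ids):
        # Embedding processing: one vector per node (mean pooling assumed)
        vec = self.embed(token_ids).mean(dim=1)
        # Classification over the embedded vector yields the first
        # classification reference information
        return self.fc(vec)
```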
Obviously, the training apparatus for text classification models provided in the embodiments of the present application can be used as the execution subject of the training method for text classification models shown in fig. 1, for example, step S102 in the training method for text classification models shown in fig. 1 can be executed by the obtaining unit in the training apparatus shown in fig. 7, step S104 by the classifying unit, step S106 by the semantic processing unit, step S108 by the classifying unit, and step S110 by the optimizing unit.
According to another embodiment of the present application, the units in the training apparatus of the text classification model shown in fig. 7 may be respectively or entirely combined into one or several other units, or some unit(s) thereof may be further split into multiple functionally smaller units, which can achieve the same operation without affecting the technical effect of the embodiment of the present application. The units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the training apparatus of the text classification model may also include other units; in practical applications, these functions may also be implemented with the assistance of other units and may be implemented by the cooperation of multiple units.
According to another embodiment of the present application, the training apparatus shown in fig. 7 may be constructed, and the training method of the text classification model according to the embodiment of the present application may be implemented, by running a computer program (including program code) capable of executing the steps involved in the corresponding method shown in fig. 1 on a general-purpose computing device, such as a computer, that includes processing elements such as a Central Processing Unit (CPU), a random access storage medium (RAM) and a read-only storage medium (ROM), as well as storage elements. The computer program may be, for example, recorded on a computer-readable storage medium, and loaded into and executed by an electronic device via that computer-readable storage medium.
According to the training apparatus for the text classification model, the document samples and the words contained in the document samples are respectively used as nodes in a relational graph, and the association relations among the nodes are represented by the connecting edges between them, so that the text classification task is treated as a node classification task. Since the relational graph contains more information than the document samples alone, training the text classification model with the relational graph allows the model to learn richer knowledge, which is beneficial to improving its performance. On this basis, a text classification model architecture comprising a pre-trained classification network and a graph neural network is adopted: the pre-trained classification network learns the feature information of the relational graph to perform text classification processing on the predicted document samples, while the graph neural network learns the feature information and the structure information of the relational graph, making full use of the association relations among the nodes, the node representations, and the class labels corresponding to the training document samples. The classification reference information obtained by the two networks is then combined with the class labels corresponding to the predicted document samples to optimize the network parameters of the graph neural network, so that the learning abilities and classification reference information of the two networks can be fully fused, further improving the performance of the text classification model. In addition, when the graph neural network performs text classification processing, the semantic guidance information of the relational graph is first determined based on the feature information and the structure information, and the graph neural network then performs text classification processing based on the feature information, the structure information, the semantic guidance information, and the class labels corresponding to the training document samples. Since the semantic guidance information can reflect the respective semantics of the document samples and of the words they contain, it can provide semantic guidance for the text classification processing task on the predicted document samples. The graph neural network can therefore not only realize text classification by reasoning over the node representations of the document nodes in the relational graph, but also focus on the representations of important nodes rich in semantic information, so that the convergence speed of the text classification model can be effectively improved.
In addition, corresponding to the text classification method shown in fig. 5, an embodiment of the present application further provides a text classification device. Referring to fig. 8, a schematic structural diagram of a text classification apparatus 800 according to an embodiment of the present application is provided, where the apparatus 800 includes:
an obtaining unit 810, configured to obtain structure information and feature information of the second relationship graph based on the target document set; the target document set comprises a document to be classified and a classified document, the second relational graph comprises a plurality of nodes and connecting edges among the nodes, and the plurality of nodes comprise document nodes corresponding to the document to be classified, document nodes corresponding to the classified document and word nodes corresponding to words contained in the document in the target document set;
a classification unit 820, configured to perform text classification processing on the document to be classified based on the feature information through a pre-training classification network in a text classification model, so as to obtain first classification reference information of the document to be classified;
a semantic processing unit 830, configured to determine semantic guidance information of the second relation graph based on the feature information and the structure information through a graph neural network in the text classification model;
a classification unit 820, configured to perform text classification processing on the document to be classified through a graph neural network in the text classification model based on the semantic guidance information, the feature information, the structure information, and the category to which the classified document belongs, so as to obtain second classification reference information of the document to be classified;
a classifying unit 820, configured to determine a category to which the to-be-classified document belongs based on the first classification reference information and the second classification reference information of the to-be-classified document.
Obviously, the text classification device provided in the embodiment of the present application can be used as the execution subject of the text classification method shown in fig. 5, for example, step S502 in the text classification method shown in fig. 5 can be executed by the obtaining unit in the classification device shown in fig. 8, step S504, step S508, and step S510 are executed by the classification unit, and step S506 is executed by the semantic processing unit.
According to another embodiment of the present application, the units in the text classification apparatus shown in fig. 8 may be respectively or entirely combined into one or several other units, or some unit(s) thereof may be further split into multiple functionally smaller units, which can achieve the same operation without affecting the technical effect of the embodiment of the present application. The units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the text classification apparatus may also include other units; in practical applications, these functions may also be implemented with the assistance of other units and may be implemented by the cooperation of multiple units.
According to another embodiment of the present application, the classification apparatus shown in fig. 8 may be constructed, and the text classification method according to the embodiment of the present application may be implemented, by running a computer program (including program code) capable of executing the steps involved in the corresponding method shown in fig. 5 on a general-purpose computing device, such as a computer, that includes processing elements such as a Central Processing Unit (CPU), a random access storage medium (RAM) and a read-only storage medium (ROM), as well as storage elements. The computer program may be, for example, recorded on a computer-readable storage medium, and loaded into and executed by an electronic device via that computer-readable storage medium.
According to the text classification apparatus provided by the embodiment of the application, the documents to be classified, the classified documents, and the words contained in the documents are respectively used as nodes in a relational graph, and the association relations among the nodes are represented by the connecting edges between them, so that the text classification task is treated as a node classification task. Since the relational graph contains richer information than the documents alone, having the text classification model execute the text classification task based on the feature information and the structure information of the relational graph allows the model to acquire richer knowledge, which is beneficial to improving the text classification accuracy. On this basis, a text classification model architecture comprising a pre-trained classification network and a graph neural network is adopted: the pre-trained classification network learns the feature information of the relational graph to classify the documents to be classified, while the graph neural network learns the feature information and the structure information of the relational graph, making full use of the association relations among the nodes, the node representations, and the classes to which the classified documents belong. The classification reference information obtained by the two networks is then combined to determine the class to which a document to be classified belongs, so that the learning abilities and prediction results of the two networks can be fully fused, further improving the text classification accuracy. In addition, when the graph neural network performs text classification processing, the semantic guidance information of the relational graph is first determined based on the feature information and the structure information, and the graph neural network then performs text classification processing on the unclassified documents based on the feature information, the structure information, the semantic guidance information, and the classes to which the classified documents belong. Since the semantic guidance information can reflect the respective semantics of each document and of the words it contains, it can provide semantic guidance for the text classification processing task on the documents to be classified. The graph neural network can therefore not only realize text classification by inferring the node representations of the unclassified nodes in the relational graph from the node representations of the classified nodes, but also focus on the representations of important nodes rich in semantic information, and can obtain classification results quickly, that is, the text classification efficiency is improved.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 9, at the hardware level, the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a Random-Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 9, but this does not indicate only one bus or one type of bus.
A memory is provided for storing a computer program. In particular, the computer program may comprise program code, and the program code comprises computer operating instructions. The memory may include internal memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads a corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the training device of the text classification model on a logic level.
In one embodiment, the processor executes the program stored in the memory and is specifically configured to perform the following operations:
acquiring structural information and characteristic information of the first relational graph based on the document sample set; the document sample set comprises training document samples and prediction document samples, the first relational graph comprises a plurality of nodes and connecting edges among the nodes, and the nodes comprise document nodes corresponding to the training document samples, document nodes corresponding to the prediction document samples and word nodes corresponding to words contained in the document samples in the document sample set;
performing text classification processing on the predicted document sample based on the characteristic information through a pre-training classification network in a text classification model to obtain first classification reference information of the predicted document sample;
determining, by a graph neural network in the text classification model, semantic guidance information for the first relationship graph based on the feature information and the structure information;
performing text classification processing on the predicted document sample through the graph neural network based on the semantic guidance information, the feature information, the structure information and the class label corresponding to the training document sample to obtain second classification reference information of the predicted document sample;
and optimizing the network parameters of the graph neural network based on the first classification reference information, the second classification reference information and the class labels corresponding to the prediction document samples.
Or the processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the text classification device on the logic level. The processor is used for executing the program stored in the memory and is specifically used for executing the following operations:
acquiring structural information and characteristic information of the second relational graph based on the target document set; the target document set comprises a document to be classified and a classified document, the second relational graph comprises a plurality of nodes and connecting edges among the nodes, and the plurality of nodes comprise document nodes corresponding to the document to be classified, document nodes corresponding to the classified document and word nodes corresponding to words contained in the document in the target document set;
performing text classification processing on the document to be classified based on the characteristic information through a pre-training classification network in a text classification model to obtain first classification reference information of the document to be classified;
determining, by a graph neural network in the text classification model, semantic guidance information for the second relationship graph based on the feature information and the structural information;
performing text classification processing on the documents to be classified through a graph neural network in the text classification model based on the semantic guidance information, the feature information, the structure information and the classes to which the classified documents belong to obtain second classification reference information of the documents to be classified;
and determining the category of the document to be classified based on the first classification reference information and the second classification reference information of the document to be classified.
The method performed by the training apparatus of the text classification model disclosed in the embodiment of fig. 1 of the present application, or the method performed by the text classification apparatus disclosed in the embodiment of fig. 5 of the present application, may be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above methods may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or executed by such processors. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be directly executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.
The electronic device may further execute the method shown in fig. 1 and implement the functions of the training apparatus for text classification models in the embodiment shown in fig. 1, or the electronic device may further execute the method shown in fig. 5 and implement the functions of the text classification apparatus in the embodiment shown in fig. 5, which is not described herein again in this embodiment of the present application.
Of course, besides the software implementation, the electronic device of the present application does not exclude other implementations, such as a logic device or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or a logic device.
Embodiments of the present application also provide a computer-readable storage medium storing one or more computer programs, where the one or more computer programs include instructions, which, when executed by a portable electronic device including a plurality of application programs, can cause the portable electronic device to perform the method of the embodiment shown in fig. 1, and is specifically configured to perform the following operations:
acquiring structural information and characteristic information of the first relational graph based on the document sample set; the document sample set comprises training document samples and prediction document samples, the first relational graph comprises a plurality of nodes and connecting edges among the nodes, and the nodes comprise document nodes corresponding to the training document samples, document nodes corresponding to the prediction document samples and word nodes corresponding to words contained in the document samples in the document sample set;
performing text classification processing on the predicted document sample based on the characteristic information through a pre-training classification network in a text classification model to obtain first classification reference information of the predicted document sample;
determining, by a graph neural network in the text classification model, semantic guidance information for the first relationship graph based on the feature information and the structure information;
performing text classification processing on the predicted document sample through the graph neural network based on the semantic guidance information, the feature information, the structure information and the class label corresponding to the training document sample to obtain second classification reference information of the predicted document sample;
and optimizing the network parameters of the graph neural network based on the first classification reference information, the second classification reference information and the class labels corresponding to the prediction document samples.
The instructions, when executed by a portable electronic device comprising a plurality of application programs, are capable of causing the portable electronic device to perform the method of the embodiment shown in fig. 5, and in particular to perform the following operations:
acquiring structural information and characteristic information of the second relational graph based on the target document set; the target document set comprises a document to be classified and a classified document, the second relational graph comprises a plurality of nodes and connecting edges among the nodes, and the plurality of nodes comprise document nodes corresponding to the document to be classified, document nodes corresponding to the classified document and word nodes corresponding to words contained in the document in the target document set;
performing text classification processing on the document to be classified based on the characteristic information through a pre-training classification network in a text classification model to obtain first classification reference information of the document to be classified;
determining, by a graph neural network in the text classification model, semantic guidance information for the second relationship graph based on the feature information and the structural information;
performing text classification processing on the documents to be classified through a graph neural network in the text classification model based on the semantic guidance information, the feature information, the structure information and the classes to which the classified documents belong to obtain second classification reference information of the documents to be classified;
and determining the category of the document to be classified based on the first classification reference information and the second classification reference information of the document to be classified.
The above description is merely a preferred embodiment of the present application and is not intended to limit the protection scope of the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, reference may be made to the corresponding description of the method embodiment.

Claims (12)

1. A training method of a text classification model, characterized by comprising:
acquiring structure information and feature information of a first relational graph based on a document sample set; the document sample set comprises training document samples and prediction document samples, the first relational graph comprises a plurality of nodes and connecting edges among the nodes, and the plurality of nodes comprise document nodes corresponding to the training document samples, document nodes corresponding to the prediction document samples, and word nodes corresponding to words contained in the document sample set;
performing text classification processing on the prediction document sample based on the feature information through a pre-training classification network in a text classification model to obtain first classification reference information of the prediction document sample;
determining, by a graph neural network in the text classification model, semantic guidance information of the first relational graph based on the feature information and the structure information;
performing text classification processing on the prediction document sample through the graph neural network based on the semantic guidance information, the feature information, the structure information, and the class label corresponding to the training document sample to obtain second classification reference information of the prediction document sample;
and optimizing the network parameters of the graph neural network based on the first classification reference information, the second classification reference information, and the class labels corresponding to the prediction document samples.
2. The method according to claim 1, wherein the graph neural network in the text classification model comprises a node scoring layer, and the semantic guidance information comprises importance scores of the respective nodes in the first relational graph; the determining, by the graph neural network in the text classification model, semantic guidance information of the first relational graph based on the feature information and the structure information comprises:
determining, by the node scoring layer based on a self-attention mechanism, an attention score of each node in the first relational graph from the structure information and the feature information;
selecting, from the first relational graph, nodes whose attention scores meet a preset score condition, and performing nonlinear transformation processing on the attention scores of the selected nodes to obtain importance scores of the selected nodes;
and setting the importance scores of the unselected nodes in the first relational graph as preset scores.
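For illustration, a minimal sketch of the node-scoring step of claim 2. A single linear projection plays the self-attention scoring function, top-k selection plays the preset score condition, tanh is the assumed nonlinear transformation, and 0.0 the preset score; all of these choices are assumptions for the sketch, not the claimed design.

import torch
import torch.nn as nn

class NodeScoringLayer(nn.Module):
    def __init__(self, feat_dim, keep_ratio=0.5, preset_score=0.0):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)   # toy self-attention projection
        self.keep_ratio = keep_ratio
        self.preset_score = preset_score

    def forward(self, adj, x):
        # attention score of each node from structure and feature information
        attn_scores = (adj @ self.attn(x)).squeeze(-1)
        k = max(1, int(self.keep_ratio * x.size(0)))
        selected = torch.topk(attn_scores, k).indices        # nodes meeting the score condition
        importance = torch.full_like(attn_scores, self.preset_score)
        importance[selected] = torch.tanh(attn_scores[selected])  # nonlinear transformation
        return importance

scores = NodeScoringLayer(16)(torch.eye(10), torch.randn(10, 16))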
3. The method of claim 2, wherein the graph neural network in the text classification model further comprises a first convolutional layer and a second convolutional layer, the first convolutional layer being connected to the second convolutional layer; the performing, by the graph neural network in the text classification model, text classification processing on the prediction document sample based on the semantic guidance information, the feature information, the structure information, and the class label corresponding to the training document sample to obtain second classification reference information of the prediction document sample comprises:
performing convolution processing on the structure information and the feature information by using the first convolutional layer based on the class label corresponding to the training document sample to obtain a first convolution result;
performing fusion processing on the first convolution result and the semantic guidance information to obtain a fusion convolution result;
and inputting the fusion convolution result into the second convolutional layer for convolution processing to obtain the second classification reference information.
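A minimal sketch of the two-layer structure of claim 3, assuming simple dense layers with adjacency multiplication as the graph convolutions and an elementwise product as the fusion of the first convolution result with the semantic guidance information; the class labels of the training document samples would enter through the training loss rather than the forward pass.

import torch
import torch.nn as nn

class GuidedGCN(nn.Module):
    def __init__(self, feat_dim, hidden_dim, n_classes):
        super().__init__()
        self.conv1 = nn.Linear(feat_dim, hidden_dim)   # first convolutional layer
        self.conv2 = nn.Linear(hidden_dim, n_classes)  # second convolutional layer

    def forward(self, adj, x, importance):
        h1 = torch.relu(adj @ self.conv1(x))      # first convolution result
        fused = h1 * importance.unsqueeze(-1)     # fusion with semantic guidance (assumed elementwise)
        return adj @ self.conv2(fused)            # second classification reference information

out = GuidedGCN(16, 8, 3)(torch.eye(10), torch.randn(10, 16), torch.rand(10))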
4. The method of claim 1, wherein optimizing the network parameters of the graph neural network based on the first classification reference information, the second classification reference information and the class labels corresponding to the prediction document samples comprises:
acquiring a first weight corresponding to the pre-training classification network and a second weight corresponding to the graph neural network;
multiplying the first classification reference information by the first weight, multiplying the second classification reference information by the second weight, and fusing the results of the multiplication;
determining the prediction category of the prediction document sample according to the result of the fusion processing;
determining a prediction loss based on the prediction category of the prediction document sample, the category label corresponding to the prediction document sample and a preset loss function corresponding to the graph neural network;
optimizing network parameters of the graph neural network based on the predicted loss.
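A sketch of the weighted fusion and loss computation of claim 4. The example weights w1/w2, fusion by summation of the weighted terms, and cross-entropy as the preset loss function are assumptions for illustration.

import torch
import torch.nn.functional as F

def fuse_and_compute_loss(first_ref, second_ref, pred_idx, pred_labels, w1=0.4, w2=0.6):
    # multiply each classification reference information by its weight and
    # fuse the products by summation (assumed fusion)
    fused = w1 * first_ref + w2 * second_ref
    pred_category = fused[pred_idx].argmax(dim=-1)        # prediction category
    loss = F.cross_entropy(fused[pred_idx], pred_labels)  # assumed preset loss function
    return pred_category, loss

category, loss = fuse_and_compute_loss(torch.randn(10, 3), torch.randn(10, 3),
                                       torch.tensor([7, 8, 9]), torch.tensor([0, 2, 1]))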
5. The method of claim 1, wherein the connecting edges between the nodes in the first relational graph comprise a first-type connecting edge and a second-type connecting edge, and before the acquiring structure information and feature information of the first relational graph based on the document sample set, the method further comprises:
creating the first relational graph based on the training document samples and the prediction document samples in the document sample set;
the creating the first relational graph based on the training document samples and the prediction document samples in the document sample set comprises:
determining a word set corresponding to the document sample set based on words contained in the document sample set; the words contained in the document sample set comprise words in the training document sample and words in the prediction document sample;
for each word in the word set, creating a word node corresponding to the word, and creating a first-type connecting edge between different word nodes meeting a first edge-creation condition;
and for the document samples in the document sample set, creating document nodes corresponding to the training document samples and document nodes corresponding to the prediction document samples, and creating a second-type connecting edge between a document node and a word node meeting a second edge-creation condition.
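A toy sketch of this graph construction. Whitespace tokenization, sliding-window co-occurrence as the first edge-creation condition, and word containment as the second edge-creation condition are assumed choices (TextGCN-style builds commonly use them, but the claim does not fix them).

def build_relational_graph(train_docs, pred_docs, window=2):
    docs = train_docs + pred_docs
    vocab = sorted({w for d in docs for w in d.split()})
    nodes = [("word", w) for w in vocab] + [("doc", i) for i in range(len(docs))]
    edges = set()
    for d in docs:
        toks = d.split()
        for i in range(len(toks)):
            for j in range(i + 1, min(i + window + 1, len(toks))):
                if toks[i] != toks[j]:  # first-type edge between word nodes
                    edges.add(tuple(sorted((("word", toks[i]), ("word", toks[j])))))
    for i, d in enumerate(docs):
        for w in set(d.split()):        # second-type edge between document and word nodes
            edges.add((("doc", i), ("word", w)))
    return nodes, edges

nodes, edges = build_relational_graph(["graph neural network"], ["text classification model"])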
6. The method according to claim 5, wherein the edge weights corresponding to the connecting edges in the first relational graph include an edge weight of the first-type connecting edge and an edge weight of the second-type connecting edge, and acquiring the structure information of the first relational graph comprises:
determining point-wise mutual information between a first word and a second word in the word set based on the probabilities that the first word and the second word respectively appear in the document sample set and the probability that the first word and the second word appear in the same document sample;
determining, based on the point-wise mutual information, an edge weight corresponding to the first-type connecting edge between the word node corresponding to the first word and the word node corresponding to the second word;
determining the importance degree of any target word in the word set to a target document sample based on the occurrence frequency of the target word in the target document sample and the number of document samples containing the target word in the document sample set;
and determining, according to the importance degree, an edge weight corresponding to the second-type connecting edge between the word node corresponding to the target word and the document node corresponding to the target document sample.
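For illustration, the two edge-weight computations sketched with the standard PMI and TF-IDF closed forms. Claim 6 names the inputs but no exact formulas, so these formulas are assumptions.

import math

def pmi_edge_weight(p_i, p_j, p_ij):
    # point-wise mutual information of a word pair; weight of the
    # first-type connecting edge (standard PMI formula, assumed)
    return math.log(p_ij / (p_i * p_j)) if p_ij > 0 else 0.0

def tfidf_edge_weight(tf, n_docs, df):
    # importance of a target word for a target document from its occurrence
    # frequency and document frequency; weight of the second-type
    # connecting edge (standard TF-IDF form, assumed)
    return tf * math.log(n_docs / (1 + df))

print(pmi_edge_weight(0.1, 0.2, 0.05))   # log(2.5) ~ 0.916
print(tfidf_edge_weight(3, 100, 9))      # 3 * log(10) ~ 6.908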
7. The method of any one of claims 1 to 6, wherein the pre-training classification network comprises:
a language representation layer, configured to perform embedding processing on the feature information to obtain an embedded vector of each node in the first relational graph;
and a fully connected layer, configured to classify the document samples based on the embedded vectors of the nodes in the first relational graph to obtain the first classification reference information of the document samples.
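A minimal stand-in for such a pre-training classification network, assuming a plain embedding table as the language representation layer; a practical system might use a pre-trained language model here, which the claim does not specify.

import torch
import torch.nn as nn

class PretrainClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, n_classes):
        super().__init__()
        self.repr_layer = nn.Embedding(vocab_size, embed_dim)  # language representation layer
        self.fc = nn.Linear(embed_dim, n_classes)              # fully connected layer

    def forward(self, node_ids):
        embedded = self.repr_layer(node_ids)   # embedded vector of each node
        return self.fc(embedded)               # first classification reference information

logits = PretrainClassifier(1000, 64, 3)(torch.arange(10))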
8. A method of text classification, comprising:
acquiring structure information and feature information of a second relational graph based on a target document set; the target document set comprises a document to be classified and classified documents, the second relational graph comprises a plurality of nodes and connecting edges among the nodes, and the plurality of nodes comprise a document node corresponding to the document to be classified, document nodes corresponding to the classified documents, and word nodes corresponding to words contained in the target document set;
performing text classification processing on the document to be classified based on the feature information through a pre-training classification network in a text classification model to obtain first classification reference information of the document to be classified;
determining, by a graph neural network in the text classification model, semantic guidance information of the second relational graph based on the feature information and the structure information;
performing text classification processing on the document to be classified through the graph neural network based on the semantic guidance information, the feature information, the structure information, and the classes to which the classified documents belong, to obtain second classification reference information of the document to be classified;
and determining the category of the document to be classified based on the first classification reference information and the second classification reference information of the document to be classified.
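A sketch of the final category determination of claim 8, assuming the same weighted fusion as in training and an argmax over the fused reference information; the weights are illustrative assumptions.

import torch

def determine_category(first_ref, second_ref, doc_idx, w1=0.4, w2=0.6):
    # fuse the two classification reference informations (assumed weights)
    # and take the highest-scoring class as the document's category
    fused = w1 * first_ref + w2 * second_ref
    return fused[doc_idx].argmax(dim=-1)

category = determine_category(torch.randn(5, 3), torch.randn(5, 3), torch.tensor([4]))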
9. An apparatus for training a text classification model, comprising:
an acquiring unit, configured to acquire structure information and feature information of a first relational graph based on a document sample set; the document sample set comprises training document samples and prediction document samples, the first relational graph comprises a plurality of nodes and connecting edges among the nodes, and the plurality of nodes comprise document nodes corresponding to the training document samples, document nodes corresponding to the prediction document samples, and word nodes corresponding to words contained in the document sample set;
a classification unit, configured to perform text classification processing on the prediction document sample based on the feature information through a pre-training classification network to obtain first classification reference information of the prediction document sample;
a semantic processing unit, configured to determine, through a graph neural network, semantic guidance information of the first relational graph based on the feature information and the structure information;
wherein the classification unit is further configured to perform text classification processing on the prediction document sample through the graph neural network based on the semantic guidance information, the feature information, the structure information, and the class label corresponding to the training document sample to obtain second classification reference information of the prediction document sample;
and an optimization unit, configured to optimize the network parameters of the graph neural network based on the first classification reference information, the second classification reference information, and the class labels corresponding to the prediction document samples.
10. A text classification apparatus, comprising:
an acquisition unit, configured to acquire structure information and feature information of a second relational graph based on a target document set; the target document set comprises a document to be classified and classified documents, the second relational graph comprises a plurality of nodes and connecting edges among the nodes, and the plurality of nodes comprise a document node corresponding to the document to be classified, document nodes corresponding to the classified documents, and word nodes corresponding to words contained in the target document set;
a classification unit, configured to perform text classification processing on the document to be classified based on the feature information through a pre-training classification network in a text classification model to obtain first classification reference information of the document to be classified;
a semantic processing unit, configured to determine, through a graph neural network in the text classification model, semantic guidance information of the second relational graph based on the feature information and the structure information;
wherein the classification unit is further configured to perform text classification processing on the document to be classified through the graph neural network based on the semantic guidance information, the feature information, the structure information, and the classes to which the classified documents belong, to obtain second classification reference information of the document to be classified;
and the classification unit is further configured to determine the category of the document to be classified based on the first classification reference information and the second classification reference information of the document to be classified.
11. An electronic device, comprising:
a processor;
a memory for storing a computer program executable by the processor;
wherein the processor is configured to execute the computer program to implement the method of any one of claims 1 to 8.
12. A computer-readable storage medium storing a computer program which, when executed by a processor of an electronic device, enables the electronic device to perform the method of any one of claims 1 to 8.
CN202210443637.2A 2022-04-26 2022-04-26 Training method of text classification model, text classification method and related equipment Active CN114817538B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210443637.2A CN114817538B (en) 2022-04-26 2022-04-26 Training method of text classification model, text classification method and related equipment

Publications (2)

Publication Number Publication Date
CN114817538A (en) 2022-07-29
CN114817538B CN114817538B (en) 2023-08-08

Family

ID=82508474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210443637.2A Active CN114817538B (en) 2022-04-26 2022-04-26 Training method of text classification model, text classification method and related equipment

Country Status (1)

Country Link
CN (1) CN114817538B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110134786A (en) * 2019-05-14 2019-08-16 南京大学 A kind of short text classification method based on theme term vector and convolutional neural networks
WO2021057133A1 (en) * 2019-09-24 2021-04-01 北京国双科技有限公司 Method for training document classification model, and related apparatus
WO2021068339A1 (en) * 2019-10-11 2021-04-15 平安科技(深圳)有限公司 Text classification method and device, and computer readable storage medium
CN110929029A (en) * 2019-11-04 2020-03-27 中国科学院信息工程研究所 Text classification method and system based on graph convolution neural network
WO2021169347A1 (en) * 2020-02-25 2021-09-02 华为技术有限公司 Method and device for extracting text keywords
CN111737474A (en) * 2020-07-17 2020-10-02 支付宝(杭州)信息技术有限公司 Method and device for training business model and determining text classification category
US11216620B1 (en) * 2020-07-17 2022-01-04 Alipay (Hangzhou) Information Technology Co., Ltd. Methods and apparatuses for training service model and determining text classification category
CN113220886A (en) * 2021-05-31 2021-08-06 马上消费金融股份有限公司 Text classification method, text classification model training method and related equipment
CN114218389A (en) * 2021-12-21 2022-03-22 一拓通信集团股份有限公司 Long text classification method in chemical preparation field based on graph neural network

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
MASOUD MALEKZADEH et al.: "Review of Graph Neural Network in Text Classification", 2021 IEEE 12th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), pages 84-90 *
SHUNXIN XIAO et al.: "Graph neural networks in node classification: survey and evaluation", Machine Vision and Applications, pages 1-19 *
ZHAOYANG DENG et al.: "Text Classification with Attention Gated Graph Neural Network", Cognitive Computation, page 1464 *
噜噜噜: "Text Classification via Graph Neural Networks" (in Chinese), page 1, retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/343624775> *
尹丽英 et al.: "Research on Chinese Text Classification Based on Semantic Network Community Division" (in Chinese), Library and Information Service, no. 19, pages 124-128 *
杜思佳 et al.: "Research Progress of Text Classification Based on Deep Learning" (in Chinese), Chinese Journal of Network and Information Security, no. 04, pages 1-13 *
王阳刚 et al.: "Few-Shot Text Classification Based on a Dual-Channel Graph Neural Network" (in Chinese), Journal of Chinese Information Processing, pages 89-97 *
贺同泽: "Research on Expert Recommendation and Text Classification Based on Heterogeneous Network Representation Learning" (in Chinese), China Masters' Theses Full-text Database, Information Science and Technology, no. 1, pages 138-3505 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024055864A1 (en) * 2022-09-16 2024-03-21 北京来也网络科技有限公司 Training method and apparatus for implementing ia classification model using rpa and ai
CN116932765A (en) * 2023-09-15 2023-10-24 中汽信息科技(天津)有限公司 Patent text multi-stage classification method and equipment based on graphic neural network
CN116932765B (en) * 2023-09-15 2023-12-08 中汽信息科技(天津)有限公司 Patent text multi-stage classification method and equipment based on graphic neural network
CN117496542A (en) * 2023-12-29 2024-02-02 恒生电子股份有限公司 Document information extraction method, device, electronic equipment and storage medium
CN117496542B (en) * 2023-12-29 2024-03-15 恒生电子股份有限公司 Document information extraction method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114817538B (en) 2023-08-08

Similar Documents

Publication Publication Date Title
CN109471938B (en) Text classification method and terminal
CN110309283B (en) Answer determination method and device for intelligent question answering
CN110278175B (en) Graph structure model training and garbage account identification method, device and equipment
Bhardwaj et al. Sentiment analysis for Indian stock market prediction using Sensex and nifty
CN114817538B (en) Training method of text classification model, text classification method and related equipment
WO2023065211A1 (en) Information acquisition method and apparatus
CN112671886B (en) Information pushing method based on edge calculation and artificial intelligence and big data server
CN112214652B (en) Message generation method, device and equipment
EP4060517A1 (en) System and method for designing artificial intelligence (ai) based hierarchical multi-conversation system
CN113221555A (en) Keyword identification method, device and equipment based on multitask model
CN115129885A (en) Entity chain pointing method, device, equipment and storage medium
CN114912513A (en) Model training method, information identification method and device
CN114647739A (en) Entity chain finger method, device, electronic equipment and storage medium
Kumar et al. Sentiment analysis using bi-directional recurrent neural network for Telugu movies
CN112115258A (en) User credit evaluation method, device, server and storage medium
CN117035695B (en) Information early warning method and device, readable storage medium and electronic equipment
CN116610804B (en) Text recall method and system for improving recognition of small sample category
US11860824B2 (en) Graphical user interface for display of real-time feedback data changes
US20230351170A1 (en) Automated processing of feedback data to identify real-time changes
CN117493550A (en) Training method of text classification model, text classification method and device
CN116306650A (en) Model training method, multi-label qualitative word recognition method and related equipment
CN116150369A (en) Model training method, long text multi-label classification method and related equipment
CN114970545A (en) Model training and semantic similarity determining method and device
CN116304053A (en) Training method of emotion classification model, public opinion monitoring method and related equipment
CN116975275A (en) Multilingual text classification model training method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant