CN107526785B - Text classification method and device

Text classification method and device

Info

Publication number
CN107526785B
CN107526785B
Authority
CN
China
Prior art keywords
text
nodes
node
classified
training
Prior art date
Legal status
Active
Application number
CN201710642105.0A
Other languages
Chinese (zh)
Other versions
CN107526785A (en)
Inventor
彭浩
李建欣
何雨
刘垚鹏
包梦蛟
宋阳秋
杨强
Current Assignee
Guangzhou HKUST Fok Ying Tung Research Institute
Original Assignee
Guangzhou HKUST Fok Ying Tung Research Institute
Priority date
Filing date
Publication date
Application filed by Guangzhou HKUST Fok Ying Tung Research Institute filed Critical Guangzhou HKUST Fok Ying Tung Research Institute
Priority to CN201710642105.0A
Publication of CN107526785A
Application granted
Publication of CN107526785B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The text classification method and device receive a plurality of training texts of known classes, preprocess the training texts, construct a graph structure for the training texts from word co-occurrence relations, and train the parameters of a convolutional neural network through a back propagation algorithm according to the graph structures of the training texts, obtaining a trained convolutional neural network. An input text to be classified is then received and preprocessed, a graph structure of the text to be classified is constructed from word co-occurrence relations, and the class of the text to be classified is predicted by the trained convolutional neural network according to that graph structure. This technical scheme applies a convolutional neural network to the problem of text classification and improves the accuracy and reliability of text classification.

Description

Text classification method and device
Technical Field
The invention relates to the field of machine learning, in particular to a text classification method and device.
Background
A convolutional neural network is an artificial neural network with deep learning capability, designed on the principles of the visual neural mechanism of primates. Hubel and Wiesel proposed a visual structure model based on the visual cortex of the cat in 1962 and first introduced the concept of receptive fields. However, with the advent of simpler and more efficient linear classifiers such as support vector machines, and with the local-minimum problem ubiquitous in the non-convex cost functions of deep structures, neural network research fell into a low tide lasting roughly two decades. Hinton et al. then proposed an unsupervised greedy layer-wise training method based on the Deep Belief Network (DBN) to address the optimization problems associated with deep structures.
Conventional convolutional neural networks are generally used for image classification; applying the convolutional neural network to text classification, so as to improve the accuracy and reliability of text classification, is an urgent need.
Disclosure of Invention
The embodiment of the invention aims to provide a text classification method and a text classification device, which remedy the prior art's lack of convolutional-neural-network-based text classification and improve the accuracy and reliability of text classification.
In order to achieve the above object, an embodiment of the present invention provides a text classification method, including:
receiving training texts of a plurality of known categories, preprocessing the training texts, constructing a graph structure of the training texts by adopting a word co-occurrence relation, and training parameters of the convolutional neural network through a back propagation algorithm according to the graph structure of the training texts to obtain the trained convolutional neural network; the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one full-connection layer and at least one classification output layer; in the graph structure of the training text, nodes correspond to words in the training text one by one;
receiving an input text to be classified, preprocessing the text to be classified, and constructing a graph structure of the text to be classified by adopting a word co-occurrence relation; in the graph structure of the text to be classified, nodes correspond to words in the text to be classified one by one;
and predicting the category of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified.
Compared with the prior art, the text classification method disclosed by the invention receives a plurality of training texts of known classes, preprocesses them, constructs their graph structures from word co-occurrence relations, and trains the parameters of a convolutional neural network through a back propagation algorithm according to those graph structures to obtain a trained convolutional neural network; it then receives an input text to be classified, preprocesses it, constructs its graph structure from word co-occurrence relations, and predicts the class of the text to be classified through the trained convolutional neural network according to that graph structure. This technical scheme applies a convolutional neural network to the problem of text classification and improves the accuracy and reliability of text classification.
As an improvement of the above scheme, the preprocessing of the training text or the text to be classified specifically includes:
after word segmentation processing is carried out on the training text or the text to be classified, noise and stop words are removed, and the word stems of all words in the text are extracted; wherein the noise comprises punctuation marks and numbers, and the stop words comprise pronouns, conjunctions, prepositions and articles;
constructing a graph structure of the training text or the text to be classified by adopting a co-occurrence relation of words, which specifically comprises the following steps:
traversing the training text or the text to be classified with a sliding window of preset size, and constructing an edge between any two words whenever the two words lie in the sliding window at the same time, wherein the edge points from the earlier word to the later word. Removing words that are semantically weak and carry no practical meaning is necessary for highlighting the theme of the text and classifying it accurately.
As an improvement of the above scheme, predicting the category of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified specifically includes:
constructing a plurality of subgraphs according to the graph structure of the text to be classified, carrying out normalization processing on each subgraph, and obtaining word vector representation of each node in each subgraph as the input of a convolutional neural network;
and predicting the category of the text to be classified according to the output result of the classification output layer. Each subgraph carries different semantic information, and the corresponding word groups may lie far apart in an N-gram model of the text; for text classification, high-level abstract features can be obtained by extracting features from the different subgraphs and then fusing them, so that an accurate text classification result is obtained.
As an improvement of the above scheme, constructing a plurality of subgraphs according to the graph structure of the text to be classified and normalizing each subgraph comprise the steps of:
extracting the nodes of the graph structure of the text to be classified and sorting them by contribution value, wherein the contribution value is determined, in order of priority, by the degree of each node, the frequency in the text of the word corresponding to the node, and the co-occurrence rate of the node with its neighborhood nodes;
selecting the top N nodes of this ordering as key nodes, taking each key node as a root node, constructing subgraphs through a breadth-first search algorithm, and normalizing each subgraph; wherein each subgraph includes at least k nodes, N > 0, k > 0.
As an improvement of the above scheme, the constructing of subgraphs by a breadth-first search algorithm, and the normalization processing of each subgraph specifically comprises:
acquiring adjacent nodes of the root node, and if the number of the adjacent nodes of the root node is greater than k-1, constructing a subgraph by using the root node, the adjacent nodes of the root node and the edges of the root node and the adjacent nodes;
if the number of the adjacent nodes of the root node is less than k-1, acquiring secondary adjacent nodes of the root node step by step until the total number of the acquired adjacent nodes and secondary adjacent nodes is greater than or equal to k or the secondary adjacent nodes cannot be acquired continuously, and constructing a subgraph according to the root node, the adjacent nodes and the secondary adjacent nodes of the root node, edges of the root node and the adjacent nodes, edges of the adjacent nodes and the secondary adjacent nodes of the root node and edges between the secondary adjacent nodes; wherein the secondary neighboring node is a node indirectly connected to the root node;
constructing a spanning tree from the subgraph, and ordering the nodes of the spanning tree from the shallow layers to the deep layers by a breadth-first algorithm;
within the same layer, ordering the neighboring nodes according to the size of their contribution values;
when the subgraph has more than k nodes, retaining the first k nodes of the spanning-tree ordering, thereby completing the normalization of the subgraph;
when the subgraph has fewer than k nodes, adding dummy nodes to the subgraph until the number of its nodes equals k, thereby completing the normalization of the subgraph; wherein a dummy node is not connected with any node of the original subgraph.
As an improvement of the above scheme, the training of the parameters of the convolutional neural network through a back propagation algorithm to obtain the trained convolutional neural network specifically includes:
initializing parameters of the convolutional neural network, and carrying out forward propagation on the training texts of the known types through the convolutional neural network to obtain output results;
and performing back propagation according to the output result and the error marked by the training text, distributing the error to each layer in the convolutional neural network to obtain error data of each layer, and correcting the parameters of the convolutional neural network according to the error data of each layer.
As an improvement of the above scheme, performing back propagation according to the output result and the label error of the training text, distributing the error to each layer of the convolutional neural network to obtain per-layer error data, and correcting the parameters of the convolutional neural network according to that error data specifically includes:
constructing a loss function according to the output result and the known class of the training text, and acquiring a residual error of any neural node in the convolutional neural network according to the loss function; wherein the loss function is:
J = H + C_λ(w)
wherein H is a cross-entropy term and C_λ(w) is a regularization term that prevents overfitting;
carrying out recursive operation according to the residual error of the node to update the parameter of each neural node; wherein the parameters of the neural node include neural network weights and biases.
As an improvement of the above scheme, when the classification output layer includes a plurality of classifiers, the cross entropy term is specifically:
H = -∑_{m=1}^{M} ∑_{k1=1}^{K} [ y_{m,k1} log p_{m,k1} + (1 - y_{m,k1}) log(1 - p_{m,k1}) ]
wherein M is the number of training texts of known classes, K is the number of classifiers of the classification output layer, y_{m,k1} is the binary label of whether training text d_m belongs to class k1, and p_{m,k1} is the probability that the convolutional neural network predicts class k1 for training text d_m.
As an improvement of the above, the method further comprises the steps of:
constructing a hierarchical relationship according to the different categories, and obtaining the regularization term at the fully connected layer through the following formula to update the parameters of the fully connected layer:
C_λ(w) = (λ/2) ∑_{(p,c)} ||w_p - w_c||²
wherein w_p is the weight of the parent category in the hierarchical relationship, w_c is the weight of the child category, and the sum runs over all parent-child pairs (p, c) in the hierarchy.
As an improvement of the scheme, the word vector of each node in the graph structure of the text is represented by a word2vec model or a GloVe model.
As an improvement of the scheme, the pooling layer downsamples the feature matrix using mean pooling, max pooling or stochastic pooling.
As an improvement of the above scheme, dropout is adopted in the fully connected layer to randomly clear the activation output value of the fully connected layer at a preset ratio.
As an improvement of the above solution, the convolutional layer adopts the rectified linear unit (ReLU) as its activation function, and the activation function of the fully connected layer is a Sigmoid, tanh(x) or softplus function.
As an improvement of the scheme, the classifier is a softmax classifier.
As an improvement of the above scheme, the classifier includes several Sigmoid functions.
As an improvement of the above, the method further comprises the steps of:
constructing a tree structure according to the hierarchical relationship among the different categories, dividing the tree structure into a plurality of subtrees, and training the fully connected layer with the subtree as the unit; the tree structure comprises a plurality of category nodes and category edges, the category nodes correspond to categories, and each category edge points from a category of one level to a category of the next level; wherein dividing the tree structure into a plurality of subtrees specifically comprises:
traversing from any category node by a depth-first, pre-order traversal, and when the number of leaf nodes traversed equals a preset threshold, splitting that category node and the other traversed category nodes off into one subtree;
treating each such subtree as a single category (leaf) node, traversing again from any category node by the same depth-first, pre-order traversal, and when the number of leaf nodes traversed equals the preset threshold, splitting that category node and the other traversed category nodes off into one subtree.
An embodiment of the present invention further provides a text classification apparatus, including:
the training module is used for receiving training texts of a plurality of known categories, preprocessing the training texts, constructing a graph structure of the training texts by adopting word co-occurrence relation, and training parameters of the convolutional neural network through a back propagation algorithm according to the graph structure of the training texts to obtain the trained convolutional neural network; the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one full-connection layer and at least one classification output layer; in the graph structure of the training text, nodes correspond to words in the training text;
the to-be-classified text receiving module is used for receiving an input text to be classified, preprocessing it, and constructing a graph structure of the text to be classified by adopting word co-occurrence relations; in the graph structure of the text to be classified, nodes correspond to the words in the text to be classified;
and the class prediction module is used for predicting the class of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified.
The text classification device disclosed by the invention receives a plurality of training texts of known classes through the training module, preprocesses them, constructs their graph structures from word co-occurrence relations, and trains the parameters of a convolutional neural network through a back propagation algorithm according to those graph structures to obtain a trained convolutional neural network. The to-be-classified text receiving module then receives an input text to be classified, preprocesses it, and constructs its graph structure from word co-occurrence relations, after which the category prediction module predicts the category of the text to be classified through the trained convolutional neural network according to that graph structure. This technical scheme applies a convolutional neural network to the problem of text classification and improves the accuracy and reliability of text classification.
Drawings
Fig. 1 is a schematic flowchart of a text classification method provided in embodiment 1 of the present invention.
FIG. 2 is a schematic diagram of a graph structure for constructing text in accordance with the present invention.
Fig. 3 shows the working process of the convolutional neural network of the present invention.
FIG. 4 shows the working process of subgraph construction and normalization for the graph structure of a text in the present invention.
FIG. 5 is a schematic diagram of the parent-child node relationships of the categories of the present invention.
FIG. 6 is a diagram illustrating the partitioning of categories into training units according to the present invention.
Fig. 7 is a schematic structural diagram of a text classification apparatus according to embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, which is a schematic flow chart of a text classification method provided in embodiment 1 of the present invention, including the steps of:
s1, receiving training texts of a plurality of known types, preprocessing the training texts, constructing a graph structure of the training texts by adopting word co-occurrence relation, and training parameters of the convolutional neural network through a back propagation algorithm according to the graph structure of the training texts to obtain the trained convolutional neural network; the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one full-connection layer and at least one classification output layer; in the graph structure of the training text, nodes correspond to words in the training text one by one;
It should be noted that the back propagation (BP) process is a bottom-up process, belongs to the supervised learning algorithms, and is suitable for training feed-forward neural networks. In this step, training the parameters of the convolutional neural network through a back propagation algorithm to obtain the trained convolutional neural network specifically includes:
initializing parameters of the convolutional neural network, and forward-propagating the training texts of known classes through the convolutional neural network to obtain output results; preferably, the parameters of the convolutional neural network can be initialized with a robust weight initialization method, drawing the weights from a zero-mean Gaussian distribution.
And performing back propagation according to the output result and the error marked by the training text, distributing the error to each layer in the convolutional neural network to obtain error data of each layer, and correcting the parameters of the convolutional neural network according to the error data of each layer.
In the embodiment of the invention, a convolutional neural network is adopted for text classification; it can operate directly on the original input data, and the training method extracts and discovers the best features by adjusting trainable parameters. The input layer acts directly on the original input data; for the graph structure of a text, the input data is a word vector representation of the text. The convolutional layer, also called the feature extraction layer, applies convolution kernels (also called filters), which are characterized by their size, number and stride. The number of convolution kernels determines the number of feature maps obtained by convolving the output of the upper layer; the more feature maps are extracted, the larger the feature space the network can represent and the stronger its learning ability. However, too many convolution kernels increase the complexity of the network, the number of parameters and the computational cost, and easily cause overfitting, so the number of kernels must be chosen according to the size of the data set in the specific application. The size of a convolution kernel determines the size of the feature map, and its stride determines the step and the number of extracted features. The pooling layer, also called the subsampling layer, mainly aims to reduce the amount of data to be processed and accelerate network training while retaining the useful information. The more convolutional and pooling layers there are, the more abstract are the features that can be extracted on top of the previous layers. Further, the pooling layer downsamples the feature matrix by mean pooling, max pooling or stochastic pooling. Mean pooling averages the feature points in a neighborhood, and max pooling takes their maximum. Stochastic pooling proceeds as follows: first the sum of the matrix to be pooled is computed, then the proportion of each element in that sum is taken as its sampling probability, and an element is sampled according to these probabilities, so larger elements are sampled with higher probability.
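As an illustration of the pooling variants above, the following is a minimal NumPy sketch of stochastic pooling on a single window (the function name and the 2×2 window are illustrative assumptions, and non-negative activations, e.g. after ReLU, are assumed):

    import numpy as np

    def stochastic_pool(window):
        # each element's proportion of the window sum is its sampling probability
        w = np.asarray(window, dtype=float).ravel()
        p = w / w.sum()                     # larger elements get higher probability
        return np.random.choice(w, p=p)     # sample one activation from the window

    window = [[1.0, 3.0], [2.0, 6.0]]
    mean_val = np.mean(window)              # mean pooling: average of the window
    max_val = np.max(window)                # max pooling: maximum of the window
    rand_val = stochastic_pool(window)      # stochastic pooling: 6.0 with prob. 0.5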
S2, receiving an input text to be classified, preprocessing the text to be classified, and constructing a graph structure of the text to be classified by adopting a word co-occurrence relation; in the graph structure of the text to be classified, nodes correspond to words in the text to be classified one by one;
the preprocessing of the training text or the text to be classified specifically comprises the following steps:
after word segmentation processing is carried out on the training text or the text to be classified, noise and stop words are removed, and the word stems of all words in the text are extracted; wherein the noise comprises punctuation marks and numbers, and the stop words comprise pronouns, conjunctions, prepositions and articles. Specifically, since noise and stop words have no practical meaning and carry little information, they contribute little to distinguishing texts (apart from their role in language models and sentiment analysis), and therefore need to be removed in text classification. In practical applications, a stop word list is usually established in advance, and each word obtained by segmentation is then matched against the list: if the word appears in the list, it is a stop word and is deleted; if not, it is retained. Besides stop words proper, there are many semantically vague adverbs, numerals and directional words, such as "in", "one" and "very", which contribute little to the content of the text; removing such semantically weak words with no practical meaning is necessary for highlighting the subject of the text and classifying it accurately.
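A minimal sketch of this preprocessing pipeline, assuming English text and the NLTK library (the patent names no toolkit; the tokenization regex, stop-word list and Porter stemmer are illustrative choices):

    import re
    from nltk.corpus import stopwords       # requires nltk.download("stopwords")
    from nltk.stem import PorterStemmer

    def preprocess(text):
        stop = set(stopwords.words("english"))          # pronouns, prepositions, ...
        stemmer = PorterStemmer()
        tokens = re.findall(r"[a-z]+", text.lower())    # drops punctuation and numbers
        return [stemmer.stem(t) for t in tokens if t not in stop]

    print(preprocess("The 2 clubs fitted well in England!"))
    # e.g. ['club', 'fit', 'well', 'england']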
In addition, a graph structure of the training text or the text to be classified is constructed by adopting a co-occurrence relation of words, and the method specifically comprises the following steps:
traversing the training text or the text to be classified with a sliding window of preset size, and constructing an edge between any two words whenever the two words lie in the sliding window at the same time, wherein the edge points from the earlier word to the later word. Constructing a graph structure for the text also allows classification to be combined with existing graph convolutional neural network methods, which gives high accuracy and model parameters that are easy to train. A graph is a data structure composed of a finite, non-empty set of vertices and the edges between them, usually denoted G = (V, E, W), where G is the graph, V is the set of vertices of G, E is the set of edges of G, and W is the set of weights of the vertices and edges. That is, a graph structure consists of nodes, directed edges connecting the nodes, and weights representing the importance of the nodes and edges. Building the graph structure of a text from co-occurrence relations captures the sequential relations (context) of the words without losing the key information of the text, which is conducive to an accurate classification result. As shown in fig. 2, the sliding window has size 3, and one edge is constructed for every co-occurrence. Of course, the edges between every pair of words may also be merged into one, with the weight of the edge proportional to the co-occurrence rate of the two words, or proportional to the similarity of the two words the edge connects.
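A minimal sketch of the sliding-window graph construction, assuming a list of preprocessed tokens and window size 3 as in fig. 2 (representing the directed edges as a weight dictionary corresponds to the merged-edge variant mentioned above):

    from collections import defaultdict

    def build_graph(tokens, window=3):
        # one directed edge (earlier word -> later word) per co-occurrence;
        # repeated co-occurrences accumulate as the weight of a merged edge
        edges = defaultdict(int)
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + window, len(tokens))):
                edges[(tokens[i], tokens[j])] += 1
        return set(tokens), dict(edges)     # nodes correspond one-to-one to words

    nodes, edges = build_graph(["club", "fit", "well", "club", "england"])
    # edges include ('club', 'fit'), ('club', 'well'), ('fit', 'well'), ...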
And S3, predicting the category of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified.
The convolutional layer preferably uses the rectified linear unit (ReLU) as its activation function: when the input value is less than or equal to zero, the output is forced to zero; when the input value is greater than zero, it is kept unchanged. This brings moderate sparsity to the trained network, can greatly reduce training time, improves the performance of the network, and is closer to the nature of biological neuron activation, namely the firing principle of a neuron signal.
Preferably, the classification output layer is a softmax classifier. The softmax is expanded on the basis of logistic regression, can solve the problem of multi-classification, and preferably calculates the distribution probability of each class through the following formula:
P(y = i) = exp(s_i) / ∑_{j=1}^{n} exp(s_j)
wherein s_i represents the output value of the i-th neuron of the softmax classifier, s_i = F · η, where F is the word vector of the key nodes of a given training text, η is the corresponding weight, and n is the number of categories to be classified.
Preferably, the classification output layer comprises several Sigmoid classifiers. The Sigmoid function is a commonly used S-shaped nonlinear activation function, given by f(x) = 1 / (1 + e^(-x)), whose effect is to compress a real number into the interval (0, 1). During learning it pushes important features toward the middle region and unimportant features toward the two side regions, consistent with the synapses of neurons in neurology.
Preferably, since the loss function is non-convex and has no analytic solution, it is solved numerically by an optimization algorithm; the optimization algorithm can be stochastic gradient descent, an adaptive gradient algorithm, or Nesterov's accelerated gradient algorithm. When the training data set has too few samples or the data are over-trained, overfitting occurs; to strengthen the generalization ability of the network and prevent overfitting, the main methods include data augmentation, weight decay, dropout, DropConnect and the like. Dropout keeps part of the network inactive during model training, i.e. it sets a node's output value to zero with a certain probability, and the weights corresponding to that node are not updated during back propagation; DropConnect instead zeroes the input weights of a neural node with a certain probability. Both dropout and DropConnect can reduce network overfitting, suppress the classification error rate of the network, and improve its performance. Preferably, dropout can be adopted in the fully connected layer so that the activation output values of the fully connected layer are randomly zeroed at a preset ratio.
When the method is specifically implemented, a training text of a known class is received, a graph structure is constructed based on the training text, parameters of the convolutional neural network are trained through a back propagation algorithm to obtain the trained convolutional neural network, then the input text to be classified is received, the graph structure of the text to be classified is constructed, the class of the text to be classified is predicted through the trained convolutional neural network according to the graph structure of the text to be classified, the convolutional neural network for processing natural images is applied to text classification, the accuracy of text classification can be improved, and the method is fast and effective.
In recognition tasks on natural images, a common strategy of convolutional neural networks is to divide the image into a number of sub-regions and then extract features from each. Preferably, in this scheme, a plurality of subgraphs are likewise constructed from the graph structure of the text as the input of the convolutional neural network. Then, predicting the category of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified in step S3 specifically includes:
constructing a plurality of subgraphs according to the graph structure of the text to be classified, carrying out normalization processing on each subgraph, and obtaining word vector representation of each node in each subgraph as the input of a convolutional neural network;
predicting the category of the text to be classified according to the output result of the classification output layer;
the convolution layer is used for receiving a matrix input by the previous layer to carry out convolution operation to generate a characteristic matrix, and the pooling layer carries out down-sampling operation to generate a characteristic mapping matrix by receiving the characteristic matrix output by the previous convolution layer; the full connection layer is used for performing characteristic weighting operation on the characteristic mapping matrix output by the previous pooling layer and outputting an attribute characteristic matrix; the classification output layer is used for receiving the attribute feature matrix output by the last full connection layer to obtain the output result of the classification output layer, and predicting the category of the text to be classified according to the output result of the classification output layer.
Each subgraph carries different semantic information, and the corresponding word groups may lie far apart in an N-gram model of the text; for text classification, high-level abstract features can be obtained by extracting features from the different subgraphs and then fusing them, yielding an accurate text classification result. Moreover, this accords with the local-perception characteristic of artificial neural networks, is more efficient, greatly reduces the number of training parameters, and has a great advantage in training speed.
Preferably, constructing a plurality of subgraphs according to the graph structure of the text to be classified and normalizing each subgraph comprise the steps of (see the sketch after this list):
extracting the nodes of the graph structure of the text to be classified and sorting them by contribution value, wherein the contribution value is determined, in order of priority, by the degree of each node, the frequency in the text of the word corresponding to the node, and the co-occurrence rate of the node with its neighborhood nodes;
selecting the top N nodes of this ordering as key nodes, taking each key node as a root node, constructing subgraphs through a breadth-first search algorithm, and normalizing each subgraph; wherein each subgraph includes at least k nodes, N > 0, k > 0.
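A minimal sketch of the key-node selection, reusing the edge dictionary of the earlier graph sketch; ordering by the tuple (degree, word frequency, neighborhood co-occurrence) mirrors the stated priority order, while the exact co-occurrence statistic is an illustrative assumption:

    from collections import Counter

    def key_nodes(tokens, edges, top_n):
        tf = Counter(tokens)                                  # word frequency in the text
        degree, cooc = Counter(), Counter()
        for (u, v), w in edges.items():
            degree[u] += 1; degree[v] += 1                    # node degree
            cooc[u] += w; cooc[v] += w                        # co-occurrence with neighbors
        contribution = lambda n: (degree[n], tf[n], cooc[n])  # stated priority order
        return sorted(set(tokens), key=contribution, reverse=True)[:top_n]

The top N nodes returned here serve as the root nodes of the breadth-first subgraph construction described below.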
The subgraphs constructed by the breadth-first search algorithm retain as much of the information of the original text as possible, including keywords and context information, which benefits the final classification result; they also reduce the amount of data the computer must process, lowering the time complexity, so the approach is fast and effective. Further, when each subgraph is used as input to the neural network, a vector representation of each word is needed, the so-called word vector or word embedding: a word of natural language is digitized and represented by a vector of continuous numbers. One of the simplest word vector representations is the one-hot representation: the length of the vector is the size of the dictionary, only one of its positions is 1, and the position of the 1 is the position of the word in the dictionary. Another method is the distributed word vector representation, a representation of word semantics obtained through model training. Preferably, this scheme can adopt a word2vec model or a GloVe model to represent word vectors. word2vec comprises two models, the CBOW (Continuous Bag-of-Words) model and the Skip-gram model: CBOW is a feed-forward-network-like model that predicts the probability distribution of the target word given its context, while Skip-gram predicts the probabilities of the context words given the target word. Both define an objective function, and an optimization method is then used to find the optimal parameters of the model, yielding the word vector representations. The word2vec model gives a compact vector representation of words in which distances in vector space represent the similarity of text semantics, taking word order and contextual semantic information into account.
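A minimal sketch of training 50-dimensional word vectors with word2vec, assuming the gensim library (the patent specifies neither a toolkit nor hyperparameters; sg=1 selects the Skip-gram model, sg=0 the CBOW model):

    from gensim.models import Word2Vec

    sentences = [["club", "fit", "england"],        # preprocessed token lists
                 ["club", "goal", "england"]]
    model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)
    vec = model.wv["club"]                          # 50-dimensional vector of one node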
For convenience of explanation, take N subgraphs, each with 5 nodes and word vector dimension 50, as the input of the convolutional neural network. As shown in fig. 3, the N×5×50 input is convolved with 64 convolution kernels of size 5×50 to obtain an N×64 feature matrix, and the max-pooling downsampling of the next pooling layer yields an N/2×64 feature mapping matrix, i.e. N/2 subgraphs; the N/2 subgraphs are then further convolved with 128 convolution kernels of size 5×1 to obtain a 64×128 matrix, and the max-pooling downsampling of a pooling layer yields a 32×128 feature mapping matrix; next, convolution with 256 kernels of size 5×1 at stride 3 yields a 10×256 matrix, and the max-pooling downsampling of the pooling layer yields a 5×256 feature mapping matrix. As shown in the figure, there are three fully connected layers, each with a dropout parameter of 0.5; the classification output layer comprises K classifiers and outputs K classification results.
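The following is a loose PyTorch sketch of a comparable layer stack (the framework, class name and simplified dimensions are assumptions for illustration, not the patent's implementation):

    import torch
    import torch.nn as nn

    class GraphTextCNN(nn.Module):
        def __init__(self, k=5, d=50, num_classes=10):
            super().__init__()
            # one kernel spans a whole k x d subgraph; stride k steps to the next one
            self.conv1 = nn.Conv2d(1, 64, kernel_size=(k, d), stride=(k, 1))
            self.pool1 = nn.MaxPool1d(2)                    # N subgraphs -> N/2
            self.conv2 = nn.Conv1d(64, 128, kernel_size=5)
            self.pool2 = nn.MaxPool1d(2)
            self.conv3 = nn.Conv1d(128, 256, kernel_size=5, stride=3)
            self.pool3 = nn.AdaptiveMaxPool1d(5)            # -> 5 x 256 feature map
            self.fc = nn.Sequential(                        # three fully connected layers
                nn.Linear(5 * 256, 512), nn.ReLU(), nn.Dropout(0.5),
                nn.Linear(512, 512), nn.ReLU(), nn.Dropout(0.5),
                nn.Linear(512, num_classes),                # K classification outputs
            )

        def forward(self, x):                               # x: (batch, 1, N*k, d)
            h = torch.relu(self.conv1(x)).squeeze(3)        # (batch, 64, N)
            h = self.pool1(h)
            h = self.pool2(torch.relu(self.conv2(h)))
            h = self.pool3(torch.relu(self.conv3(h)))
            return self.fc(h.flatten(1))                    # per-class scores

    logits = GraphTextCNN()(torch.randn(2, 1, 32 * 5, 50))  # batch of 2, N = 32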
Preferably, the constructing sub-graphs by the breadth-first search algorithm, and the normalizing each sub-graph specifically comprises:
acquiring adjacent nodes of the root node, and if the number of the adjacent nodes of the root node is greater than k-1, constructing a subgraph by using the root node, the adjacent nodes of the root node and the edges of the root node and the adjacent nodes;
if the number of the adjacent nodes of the root node is less than k-1, acquiring secondary adjacent nodes of the root node step by step until the total number of the acquired adjacent nodes and secondary adjacent nodes is greater than or equal to k or the secondary adjacent nodes cannot be acquired continuously, and constructing a subgraph according to the root node, the adjacent nodes and the secondary adjacent nodes of the root node, edges of the root node and the adjacent nodes, edges of the adjacent nodes and the secondary adjacent nodes of the root node and edges between the secondary adjacent nodes; wherein the secondary neighboring node is a node indirectly connected to the root node;
constructing a spanning tree from the subgraph, and ordering the nodes of the spanning tree from the shallow layers to the deep layers by a breadth-first algorithm;
within the same layer, ordering the neighboring nodes according to the size of their contribution values;
when the subgraph has more than k nodes, retaining the first k nodes of the spanning-tree ordering, thereby completing the normalization of the subgraph;
when the subgraph has fewer than k nodes, adding dummy nodes to the subgraph until the number of its nodes equals k, thereby completing the normalization of the subgraph; wherein a dummy node is not connected with any node of the original subgraph (a sketch of this procedure follows this list).
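A minimal sketch of this subgraph construction and normalization, assuming an adjacency-list view of the text graph and the contribution ordering from the key-node sketch (the dummy-node token is an illustrative assumption):

    from collections import deque

    def normalized_subgraph(adj, root, k, contribution):
        # breadth-first expansion from the root: neighbors first, then secondary
        # neighbors, until the connected component is exhausted
        order, seen, queue = [], {root}, deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            # within one layer, visit neighbors by descending contribution value
            for nb in sorted(adj.get(node, ()), key=contribution, reverse=True):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(order) > k:
            order = order[:k]           # keep the first k nodes of the BFS order
        while len(order) < k:
            order.append("<dummy>")     # pad with dummies unconnected to the rest
        return order

    adj = {"club": ["fit", "england"], "fit": ["club"], "england": ["club"]}
    print(normalized_subgraph(adj, "club", 5, lambda n: len(adj.get(n, ()))))
    # ['club', 'fit', 'england', '<dummy>', '<dummy>']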
As shown in fig. 4, which gives a concrete illustration of subgraph construction and normalization, the key nodes obtained according to the contribution values are "gold scale", "england", "club", "fit", "high", "great", "unit" and "true"; each key node is used as a root node and traversed by a breadth-first search strategy to obtain a plurality of subgraphs with different semantics, and each subgraph is normalized and used as input to the neural network for per-subgraph feature extraction and fusion.
The normalization of the subgraphs facilitates the subsequent feature extraction and feature mapping processes.
Preferably, the training of the parameters of the convolutional neural network through the back propagation algorithm in step S1 specifically includes:
initializing parameters of the convolutional neural network, and carrying out forward propagation on the training texts of the known types through the convolutional neural network to obtain output results;
constructing a loss function according to the output result and the known class of the training text, and acquiring a residual error of any neural node in the convolutional neural network according to the loss function; wherein the loss function is:
J = H + C_λ(w)
wherein H is a cross-entropy term and C_λ(w) is a regularization term that prevents overfitting;
then, carrying out recursive operation according to the residual error of the node to update the parameter of each neural node; wherein the parameters of the neural node include neural network weights and biases.
Further, when the classification output layer includes a plurality of classifiers, the cross entropy term is specifically:
H = -∑_{m=1}^{M} ∑_{k1=1}^{K} [ y_{m,k1} log p_{m,k1} + (1 - y_{m,k1}) log(1 - p_{m,k1}) ]
wherein M is the number of training texts of known classes, K is the number of classifiers of the classification output layer, y_{m,k1} is the binary label of whether training text d_m belongs to class k1, and p_{m,k1} is the probability that the convolutional neural network predicts class k1 for training text d_m.
For a large number of label categories, some categories stand in a parent-child relationship, wherein the classifier features (parameters) of a child category inherit the classifier features (parameters) of its parent category. In order to reduce the learning parameters, a hierarchical relationship is constructed according to the different categories, and the regularization term is obtained at the fully connected layer through the following formula to update the parameters of the fully connected layer:
C_λ(w) = (λ/2) ∑_{(p,c)} ||w_p - w_c||²
wherein w_p is the weight of the parent category in the hierarchical relationship, w_c is the weight of the child category, and the sum runs over all parent-child pairs (p, c) in the hierarchy.
Through these steps, introducing the dependency relationships among the categories can greatly improve classification performance, and when a child node has little training data, the corresponding parameters can be adjusted through the training data of its parent node. As a way of simplifying data processing, the hierarchical relationship of the categories promotes similarity between the parameters of related categories. As shown in FIG. 5, "computing" is taken as a parent node of "intellectual intersection", and their parameters can be considered similar.
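A minimal sketch of the loss J = H + C_λ(w) with this hierarchical regularization, assuming PyTorch, one binary (Sigmoid) classifier per category, and a list of parent-child index pairs into the fully connected layer's weight matrix (all names are illustrative):

    import torch
    import torch.nn.functional as F

    def loss_J(logits, labels, class_weights, parent_child_pairs, lam=1e-3):
        # H: cross entropy summed over M training texts and K classifiers
        H = F.binary_cross_entropy_with_logits(logits, labels, reduction="sum")
        # C_lambda(w): pull each child category's weights toward its parent's
        C = sum((class_weights[p] - class_weights[c]).pow(2).sum()
                for p, c in parent_child_pairs)
        return H + 0.5 * lam * C

    W = torch.randn(4, 8, requires_grad=True)         # K=4 category weight vectors
    labels = torch.randint(0, 2, (3, 4)).float()      # M=3 texts, binary labels
    J = loss_J(torch.randn(3, 4), labels, W, [(0, 1), (0, 2)])
    J.backward()                                      # gradients for back propagation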
In order to speed up the training process, on the basis of the above embodiment, the text classification method further includes the steps of:
constructing a tree structure according to the hierarchical relationship among the different categories, dividing the tree structure into a plurality of subtrees, and training the fully connected layer with the subtree as the unit; the tree structure comprises a plurality of nodes and edges, the nodes correspond to categories, and each edge points from a category of one level to a category of the next level; wherein dividing the tree structure into a plurality of subtrees specifically comprises:
traversing from any node by a depth-first, pre-order traversal, and when the number of leaf nodes traversed equals a preset threshold, splitting that node and the other traversed nodes off into one subtree;
treating each such subtree as a single leaf node, traversing again from any node by the same depth-first, pre-order traversal, and when the number of leaf nodes traversed equals the preset threshold, splitting that node and the other traversed nodes off into one subtree (a code sketch follows the example below).
As shown in fig. 6, the preset threshold is 5: when node A is traversed and the number of traversed leaf nodes reaches 5, subtree (i) can be split off as one training unit; node F has only 4 leaf nodes, so the preset threshold of 5 can be met by merging node E with node F.
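A minimal sketch of this partitioning, assuming each category node stores a list of its children (the data layout and names are illustrative); it cuts the pre-order traversal into units of a given number of leaves, and any remainder forms the last unit, as when node F's four leaves are merged with node E in fig. 6:

    from dataclasses import dataclass, field

    @dataclass
    class Cat:
        name: str
        children: list = field(default_factory=list)

    def split_units(root, threshold):
        units, current, leaves = [], [], 0
        stack = [root]
        while stack:                             # depth-first pre-order traversal
            node = stack.pop()
            current.append(node.name)
            stack.extend(reversed(node.children))
            if not node.children:
                leaves += 1
                if leaves == threshold:          # close one training unit
                    units.append(current)
                    current, leaves = [], 0
        if current:
            units.append(current)                # remainder becomes the last unit
        return units

    tree = Cat("A", [Cat("B", [Cat("b1"), Cat("b2")]), Cat("C")])
    print(split_units(tree, 2))    # [['A', 'B', 'b1', 'b2'], ['C']]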
Through this scheme, the categories are divided into blocks and each block is trained separately, enabling a computer to handle large-scale category prediction problems. In practical applications, upper-level nodes, such as nodes B, E and F in the figure, can be trained first, and the child nodes and leaf nodes of these nodes are then trained through a recursive algorithm. The recursive distributed learning algorithm provided by this scheme realizes large-scale learning for classification tasks such as text classification or image classification, overcomes the limitation of the prior art to small-scale learning, and has important practical significance.
Referring to fig. 7, which is a schematic structural diagram of a text classification apparatus provided in embodiment 2 of the present invention, including:
the training module 101 is configured to receive training texts of a plurality of known categories, pre-process the training texts, construct a graph structure of the training texts by using a word co-occurrence relationship, and train parameters of the convolutional neural network through a back propagation algorithm according to the graph structure of the training texts to obtain the trained convolutional neural network; the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one full-connection layer and at least one classification output layer; in the graph structure of the training text, nodes correspond to words in the training text;
the text to be classified receiving module 102 is configured to receive an input text to be classified, preprocess the text to be classified, and construct a graph structure of the text to be classified by using a co-occurrence relationship of words; in the graph structure of the text to be classified, nodes correspond to words in the text to be classified;
and the class prediction module 103 is configured to predict the class of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified.
The word vector of each node in the graph structure of the training text or the text to be classified is represented by a word2vec model or a GloVe model; the pooling layer downsamples the feature matrix using mean pooling, max pooling or stochastic pooling.
Preferably, the preprocessing of the training text or the text to be classified specifically includes:
after word segmentation processing is carried out on the training text or the text to be classified, noise and stop words are removed, and the word stems of all words in the text are extracted; wherein the noise comprises punctuation marks and numbers, and the stop words comprise pronouns, conjunctions, prepositions and articles;
constructing a graph structure of the training text or the text to be classified by adopting a co-occurrence relation of words, which specifically comprises the following steps:
traversing the training text or the text to be classified with a sliding window of preset size, and constructing an edge between any two words whenever the two words lie in the sliding window at the same time, wherein the edge points from the earlier word to the later word.
Further, according to the graph structure of the text to be classified, predicting the category of the text to be classified through the trained convolutional neural network specifically comprises:
constructing a plurality of subgraphs according to the graph structure of the text to be classified, and normalizing each subgraph;
acquiring word vector representation of each node in each subgraph as input of a convolutional neural network, performing convolutional operation through a convolutional layer to generate a feature matrix, and performing down-sampling processing on the feature matrix through a pooling layer to obtain a feature mapping matrix;
if the next layer of the pooling layer is a convolution layer, performing convolution operation and down-sampling processing on the feature mapping matrix output by the previous layer, and outputting the feature mapping matrix;
if the next layer of the pooling layer is a full-connection layer, performing feature weighting operation on the feature mapping matrix output by the previous layer, and outputting an attribute feature matrix;
if the next layer of the full connection layer is the full connection layer, continuously performing feature weighting operation on the attribute feature matrix output by the previous layer to output the attribute feature matrix;
if the next layer of the full connection layer is a classification output layer, obtaining an output result of the classification output layer according to the attribute feature matrix output by the previous layer, and predicting the category of the text to be classified according to the output result of the classification output layer.
Wherein, the construction of a plurality of subgraphs according to the graph structure of the text to be classified comprises the following steps:
extracting the nodes of the graph structure of the text to be classified and sorting them by contribution value, wherein the contribution value is determined, in order of priority, by the degree of each node, the frequency in the text of the word corresponding to the node, and the co-occurrence rate of the node with its neighborhood nodes;
selecting the top N nodes of this ordering as key nodes, taking each key node as a root node, constructing subgraphs through a breadth-first search algorithm, and normalizing each subgraph; wherein each subgraph includes at least k nodes, N > 0, k > 0.
Further, constructing sub-graphs through a breadth-first search algorithm, and performing normalization processing on each sub-graph specifically comprises:
acquiring adjacent nodes of the root node, and if the number of the adjacent nodes of the root node is greater than k-1, constructing a subgraph by using the root node, the adjacent nodes of the root node and the edges of the root node and the adjacent nodes;
if the number of the adjacent nodes of the root node is less than k-1, acquiring secondary adjacent nodes of the root node step by step until the total number of the acquired adjacent nodes and secondary adjacent nodes is greater than or equal to k or the secondary adjacent nodes cannot be acquired continuously, and constructing a subgraph according to the root node, the adjacent nodes and the secondary adjacent nodes of the root node, edges of the root node and the adjacent nodes, edges of the adjacent nodes and the secondary adjacent nodes of the root node and edges between the secondary adjacent nodes; wherein the secondary neighboring node is a node indirectly connected to the root node;
constructing a spanning tree from the subgraph, and ordering the nodes of the spanning tree from the shallow layers to the deep layers by a breadth-first algorithm;
within the same layer, ordering the neighboring nodes according to the size of their contribution values;
when the subgraph has more than k nodes, retaining the first k nodes of the spanning-tree ordering, thereby completing the normalization of the subgraph;
when the subgraph has fewer than k nodes, adding dummy nodes to the subgraph until the number of its nodes equals k, thereby completing the normalization of the subgraph; wherein a dummy node is not connected with any node of the original subgraph.
Preferably, the training the parameter of the convolutional neural network through a back propagation algorithm, and the obtaining of the trained convolutional neural network specifically includes:
initializing parameters of the convolutional neural network, and carrying out forward propagation on the training texts of the known types through the convolutional neural network to obtain output results;
and performing back propagation according to the output result and the error marked by the training text, distributing the error to each layer in the convolutional neural network to obtain error data of each layer, and correcting the parameters of the convolutional neural network according to the error data of each layer.
Preferably, performing back propagation according to the output result and the label error of the training text, distributing the error to each layer of the convolutional neural network to obtain per-layer error data, and correcting the parameters of the convolutional neural network according to that error data specifically includes:
constructing a loss function according to the output result and the known class of the training text, and acquiring a residual error of any neural node in the convolutional neural network according to the loss function; wherein the loss function is:
J=H+Cλ(w)
wherein H is a cross entropy term and C λ (w) is a regularization term that prevents overfitting;
carrying out recursive operation according to the residual error of the node to update the parameter of each neural node; wherein the parameters of the neural node include neural network weights and biases.
Preferably, when the classification output layer includes a plurality of classifiers, the cross-entropy term is specifically:
H = -Σ (m = 1..M) Σ (k1 = 1..K) [ y(dm, k1)·log p(dm, k1) + (1 - y(dm, k1))·log(1 - p(dm, k1)) ]
where M is the number of training texts of known categories, K is the number of classifiers in the classification output layer, y(dm, k1) is the binary label indicating whether training text dm belongs to class k1, and p(dm, k1) is the probability, predicted by the convolutional neural network, that training text dm belongs to class k1.
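Assuming the standard per-classifier binary cross-entropy that the binary labels and predicted probabilities above suggest, the term can be computed as in the following sketch; the clipping constant eps is a numerical-stability assumption.

import numpy as np

def cross_entropy_term(y: np.ndarray, p: np.ndarray) -> float:
    # y[m, k]: binary label of training text d_m for class k (shape M x K)
    # p[m, k]: predicted probability that d_m belongs to class k
    eps = 1e-12                       # guard against log(0)
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))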
Preferably, the training module 101 is further configured to construct a hierarchical relationship among the different categories and to obtain the regularization term at the fully-connected layer through the following formula, so as to update the parameters of the fully-connected layer:
C_λ(w) = λ · Σ over all parent-child pairs (p, c) of (1/2)·||w(p) - w(c)||^2
where w(p) is the weight of the parent category in the hierarchical relationship and w(c) is the weight of the child category in the hierarchical relationship.
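One plausible reading of this term, sketched below, is a recursive regularizer that pulls each child category's weight vector toward its parent's, so that categories adjacent in the hierarchy share parameters; the dictionary layout (weights keyed by category, children mapping each parent to its child categories) is an assumption for illustration.

import numpy as np

def recursive_regularization(weights: dict, children: dict, lam: float) -> float:
    # Sum (1/2) * ||w_parent - w_child||^2 over every parent-child edge
    # of the category hierarchy, scaled by the regularization strength lam.
    penalty = 0.0
    for parent, kids in children.items():
        for child in kids:
            diff = weights[parent] - weights[child]
            penalty += 0.5 * float(np.dot(diff, diff))
    return lam * penalty

Under these assumptions, the total loss of the description would be assembled as J = cross_entropy_term(y, p) + recursive_regularization(weights, children, lam).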
Preferably, dropout is adopted at the fully-connected layer so that the activation output values of the fully-connected layer are randomly zeroed at a preset ratio.
Preferably, the convolutional layer adopts a rectified linear unit (ReLU) as its activation function, and the activation function of the fully-connected layer is a Sigmoid function, a tanh(x) function or a softplus function.
In a preferred embodiment, the classification output layer is a softmax classifier.
In a preferred embodiment, the classification output layer comprises several Sigmoid functions.
In a preferred embodiment, the training module 101 is further configured to construct a tree structure according to the hierarchical relationship among the different categories, partition the tree structure into a plurality of subtrees, and train the fully-connected layer subtree by subtree; the tree structure comprises a plurality of nodes and edges, the nodes correspond to the categories, and each edge points from a category at one level to the category at the next level. The step of partitioning the tree structure into a plurality of subtrees is specifically as follows:
traversing from any node by a depth-first, pre-order traversal, and when the number of leaf nodes obtained by the traversal equals a preset threshold, partitioning that node and the other traversed nodes into one subtree;
then, taking each obtained subtree as a single leaf node, traversing again from any node by the depth-first, pre-order traversal, and when the number of leaf nodes obtained by the traversal equals the preset threshold, partitioning that node and the other traversed nodes into one subtree.
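A rough Python sketch of one pass of this partitioning, assuming the category tree is stored as a parent-to-children dictionary and that a cut is made whenever the running leaf count reaches the threshold; the second pass of the description would rerun the same function after collapsing each returned subtree into a single leaf node.

def split_into_subtrees(tree: dict, root, threshold: int) -> list:
    # Depth-first, pre-order traversal; each time the accumulated number of
    # leaf nodes reaches the threshold, the nodes traversed so far are cut
    # off as one subtree.
    subtrees, current, leaves = [], [], 0

    def preorder(node):
        nonlocal current, leaves
        current.append(node)
        kids = tree.get(node, [])
        if not kids:
            leaves += 1
            if leaves == threshold:
                subtrees.append(list(current))
                current, leaves = [], 0
        for child in kids:
            preorder(child)

    preorder(root)
    if current:                        # remainder forms the final subtree
        subtrees.append(list(current))
    return subtrees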
For the specific implementation and working principle of the text classification apparatus provided by the embodiment of the present invention, reference may be made to the above detailed description of the text classification method, which is not repeated here.
To sum up, the embodiments of the present invention disclose a text classification method and apparatus. After receiving a plurality of training texts of known categories and preprocessing them, the method constructs the graph structures of the training texts from the word co-occurrence relation and trains the parameters of a convolutional neural network through a back propagation algorithm according to those graph structures, obtaining the trained convolutional neural network. It then receives an input text to be classified, preprocesses it, constructs its graph structure from the word co-occurrence relation, and predicts the category of the text to be classified with the trained convolutional neural network according to that graph structure. By applying a convolutional neural network to the text classification problem, this technical scheme improves the accuracy and reliability of text classification.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (15)

1. A method of text classification, comprising the steps of:
receiving a plurality of training texts of known categories, preprocessing the training texts, constructing the graph structures of the training texts by using the word co-occurrence relation, and training the parameters of a convolutional neural network through a back propagation algorithm according to the graph structures of the training texts to obtain the trained convolutional neural network; wherein the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one fully-connected layer and at least one classification output layer; and in the graph structure of each training text, nodes correspond to the words in the training text one by one;
receiving an input text to be classified, preprocessing the text to be classified, and constructing a graph structure of the text to be classified by adopting a word co-occurrence relation; in the graph structure of the text to be classified, nodes correspond to words in the text to be classified one by one;
extracting the nodes of the graph structure of the text to be classified according to the graph structure of the text to be classified, ranking the nodes by the size of their contribution values, selecting the top-N ranked nodes as key nodes, taking each key node as a root node, constructing subgraphs through a breadth-first search algorithm, normalizing each subgraph, obtaining the word vector representation of each node in each subgraph as the input of the convolutional neural network, and predicting the category of the text to be classified according to the output result of the classification output layer; wherein the contribution value is determined, in order, by the degree of each node, the word frequency of the corresponding word in the text, and the co-occurrence rate between the node and its neighborhood nodes, and each subgraph includes at least k nodes, N > 0, k > 0.
2. The text classification method according to claim 1, wherein the preprocessing of the training text or the text to be classified is specifically:
after word segmentation is carried out on the training text or the text to be classified, noise and stop words of the training text or the text to be classified are removed, and the word stems of all words in the training text or the text to be classified are extracted; wherein the noise comprises punctuation marks and numbers, and the stop words comprise pronouns, conjunctions, prepositions and articles;
the method for constructing the graph structure of the training text or the text to be classified by adopting the co-occurrence relation of the words specifically comprises the following steps:
traversing the training text or the text to be classified with a sliding window of preset size, and constructing an edge between any two words whenever the two words are located in the sliding window at the same time, wherein the edge points from the earlier word to the later word.
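For illustration (outside the claim language proper), a sketch of this sliding-window construction with networkx; the window size of 3 is an assumed default, not something the claim prescribes.

import networkx as nx

def cooccurrence_graph(words: list, window: int = 3) -> nx.DiGraph:
    # One node per distinct word; whenever two words fall inside the same
    # sliding window, add an edge pointing from the earlier word to the later.
    g = nx.DiGraph()
    g.add_nodes_from(words)
    for i in range(len(words)):
        for j in range(i + 1, min(i + window, len(words))):
            g.add_edge(words[i], words[j])
    return g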
3. The text classification method according to claim 1, wherein constructing the subgraphs through a breadth-first search algorithm and normalizing each subgraph specifically comprises:
acquiring the adjacent nodes of the root node; if the number of adjacent nodes of the root node is greater than k-1, constructing the subgraph from the root node, its adjacent nodes, and the edges between the root node and those adjacent nodes;
if the number of adjacent nodes of the root node is less than k-1, acquiring the secondary adjacent nodes of the root node level by level until the total number of acquired adjacent and secondary adjacent nodes is greater than or equal to k or no further secondary adjacent nodes can be acquired, and constructing the subgraph from the root node, its adjacent and secondary adjacent nodes, the edges between the root node and its adjacent nodes, the edges between the adjacent nodes and the secondary adjacent nodes, and the edges among the secondary adjacent nodes; wherein a secondary adjacent node is a node indirectly connected to the root node;
constructing a spanning tree from the subgraph, and ordering the nodes of the spanning tree from the shallow layers to the deep layers with a breadth-first algorithm;
within the same layer, ordering the adjacent nodes of the root node by the size of their contribution values;
when the number of nodes in the subgraph is greater than k, retaining the top-k nodes of this ordering in the spanning tree, thereby completing the normalization of the subgraph;
when the number of nodes in the subgraph is less than k, adding dummy nodes to the subgraph until the number of nodes in the subgraph equals k, thereby completing the normalization of the subgraph; wherein a dummy node is not connected to any node of the original subgraph.
4. The text classification method according to claim 1, wherein training the parameters of the convolutional neural network through the back propagation algorithm to obtain the trained convolutional neural network specifically comprises:
initializing the parameters of the convolutional neural network, and forward-propagating the training texts of known categories through the convolutional neural network to obtain output results;
performing back propagation according to the error between the output results and the labels of the training texts, distributing the error to each layer of the convolutional neural network to obtain per-layer error data, and correcting the parameters of the convolutional neural network according to the per-layer error data.
5. The text classification method according to claim 4, wherein performing back propagation according to the error between the output results and the labels of the training texts, distributing the error to each layer of the convolutional neural network to obtain per-layer error data, and correcting the parameters of the convolutional neural network according to the per-layer error data specifically comprises:
constructing a loss function from the output results and the known categories of the training texts, and obtaining the residual of any neural node in the convolutional neural network from the loss function; wherein the loss function is:
J = H + C_λ(w)
where H is a cross-entropy term and C_λ(w) is a regularization term that prevents overfitting;
performing a recursive operation on the residuals of the nodes to update the parameters of each neural node; wherein the parameters of a neural node include the neural network weights and biases.
6. The text classification method according to claim 5, wherein, when the classification output layer comprises a plurality of classifiers, the cross-entropy term is specifically:
H = -Σ (m = 1..M) Σ (k1 = 1..K) [ y(dm, k1)·log p(dm, k1) + (1 - y(dm, k1))·log(1 - p(dm, k1)) ]
where M is the number of training texts of known categories, K is the number of classifiers in the classification output layer, y(dm, k1) is the binary label indicating whether training text dm belongs to class k1, and p(dm, k1) is the probability, predicted by the convolutional neural network, that training text dm belongs to class k1.
7. The text classification method according to claim 6, further comprising the step of:
constructing a hierarchical relationship according to the different categories, and obtaining a regularization term at the fully-connected layer through the following formula to update the parameters of the fully-connected layer:
C_λ(w) = λ · Σ over all parent-child pairs (p, c) of (1/2)·||w(p) - w(c)||^2
where w(p) is the weight of the parent category in the hierarchical relationship and w(c) is the weight of the child category in the hierarchical relationship.
8. The text classification method according to claim 1, wherein the word vector representation of each node in the graph structure of the training text or the text to be classified is obtained by a word2vec model or a GloVe model.
9. The text classification method according to claim 1, wherein the pooling layer down-samples the feature matrix using an average pooling, max pooling or stochastic pooling method.
10. The text classification method according to claim 1, wherein dropout is adopted at the fully-connected layer so that the activation output values of the fully-connected layer are randomly zeroed at a preset ratio.
11. The text classification method according to claim 4, wherein the convolutional layer adopts a rectified linear unit (ReLU) as its activation function, and the activation function of the fully-connected layer is a Sigmoid function, a tanh(x) function or a softplus function.
12. The text classification method of claim 1, wherein the classification output layer is a softmax classifier.
13. The text classification method of claim 1, wherein the classification output layer comprises a plurality of Sigmoid functions.
14. The text classification method according to claim 1, further comprising the steps of:
constructing a tree structure according to the hierarchical relationship among the different categories, partitioning the tree structure into a plurality of subtrees, and training the fully-connected layer subtree by subtree; wherein the tree structure comprises a plurality of category nodes and category edges, the category nodes correspond to the categories, and each category edge points from a category at one level to the category at the next level; wherein the step of partitioning the tree structure into a plurality of subtrees is specifically:
traversing from any category node by a depth-first, pre-order traversal, and when the number of leaf nodes obtained by the traversal equals a preset threshold, partitioning that category node and the other traversed category nodes into one subtree;
then, taking each obtained subtree as a category leaf node, traversing again from any category node by the depth-first, pre-order traversal, and when the number of leaf nodes obtained by the traversal equals the preset threshold, partitioning that category node and the other traversed category nodes into one subtree.
15. A text classification apparatus, comprising:
the training module is used for receiving a plurality of training texts of known categories, preprocessing the training texts, constructing the graph structures of the training texts by using the word co-occurrence relation, and training the parameters of a convolutional neural network through a back propagation algorithm according to the graph structures of the training texts to obtain the trained convolutional neural network; wherein the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one fully-connected layer and at least one classification output layer; and in the graph structure of each training text, nodes correspond to the words in the training text one by one;
the text-to-be-classified receiving module is used for receiving an input text to be classified and, after preprocessing the text to be classified, constructing a graph structure of the text to be classified by using the word co-occurrence relation; wherein, in the graph structure of the text to be classified, nodes correspond to the words in the text to be classified one by one;
the category prediction module is used for extracting the nodes of the graph structure of the text to be classified according to the graph structure of the text to be classified, ranking the nodes by the size of their contribution values, selecting the top-N ranked nodes as key nodes, taking each key node as a root node, constructing subgraphs through a breadth-first search algorithm, normalizing each subgraph, obtaining the word vector representation of each node in each subgraph as the input of the convolutional neural network, and predicting the category of the text to be classified according to the output result of the classification output layer; wherein the contribution value is determined, in order, by the degree of each node, the word frequency of the corresponding word in the text, and the co-occurrence rate between the node and its neighborhood nodes, and each subgraph includes at least k nodes, N > 0, k > 0.
CN201710642105.0A 2017-07-31 2017-07-31 Text classification method and device Active CN107526785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710642105.0A CN107526785B (en) 2017-07-31 2017-07-31 Text classification method and device

Publications (2)

Publication Number Publication Date
CN107526785A CN107526785A (en) 2017-12-29
CN107526785B true CN107526785B (en) 2020-07-17

Family

ID=60680376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710642105.0A Active CN107526785B (en) 2017-07-31 2017-07-31 Text classification method and device

Country Status (1)

Country Link
CN (1) CN107526785B (en)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190236135A1 (en) * 2018-01-30 2019-08-01 Accenture Global Solutions Limited Cross-lingual text classification
CN110309293A (en) * 2018-02-13 2019-10-08 北京京东尚科信息技术有限公司 Text recommended method and device
CN108595497B (en) * 2018-03-16 2019-09-27 北京达佳互联信息技术有限公司 Data screening method, apparatus and terminal
CN108875779A (en) * 2018-05-07 2018-11-23 深圳市恒扬数据股份有限公司 Training method, device and the terminal device of neural network
CN108804404B (en) * 2018-05-29 2022-04-15 周宇 Character text processing method and device
CN108920455A (en) * 2018-06-13 2018-11-30 北京信息科技大学 A kind of Chinese automatically generates the automatic evaluation method of text
CN110826377A (en) * 2018-08-13 2020-02-21 珠海格力电器股份有限公司 Material sorting method and device
CN110913354A (en) * 2018-09-17 2020-03-24 阿里巴巴集团控股有限公司 Short message classification method and device and electronic equipment
CN109543029B (en) * 2018-09-27 2023-07-25 平安科技(深圳)有限公司 Text classification method, device, medium and equipment based on convolutional neural network
CN109471944B (en) * 2018-11-12 2021-07-16 中山大学 Training method and device of text classification model and readable storage medium
CN109710755A (en) * 2018-11-22 2019-05-03 合肥联宝信息技术有限公司 Training BP neural network model method and device and the method and apparatus that text classification is carried out based on BP neural network
CN109739979A (en) * 2018-12-11 2019-05-10 中科恒运股份有限公司 Tuning method, tuning device and the terminal of neural network
CN109726285A (en) * 2018-12-18 2019-05-07 广州多益网络股份有限公司 A kind of file classification method, device, storage medium and terminal device
CN109740482A (en) * 2018-12-26 2019-05-10 北京科技大学 A kind of image text recognition methods and device
CN109599096B (en) * 2019-01-25 2021-12-07 科大讯飞股份有限公司 Data screening method and device
CN110019653B (en) * 2019-04-08 2021-07-02 北京航空航天大学 Social content representation method and system fusing text and tag network
CN110704626B (en) * 2019-09-30 2022-07-22 北京邮电大学 Short text classification method and device
CN111241294B (en) * 2019-12-31 2023-05-26 中国地质大学(武汉) Relationship extraction method of graph convolution network based on dependency analysis and keywords
CN111145906B (en) * 2019-12-31 2024-04-30 清华大学 Project judging method, related device and readable storage medium
CN111291823B (en) * 2020-02-24 2023-08-18 腾讯科技(深圳)有限公司 Fusion method and device of classification model, electronic equipment and storage medium
CN113642697B (en) * 2020-04-27 2024-02-20 郑州芯兰德网络科技有限公司 Distributed multi-level graph network training method and system
CN111598093B (en) * 2020-05-25 2024-05-14 深圳前海微众银行股份有限公司 Method, device, equipment and medium for generating structured information of characters in picture
CN111538870B (en) * 2020-07-07 2020-12-18 北京百度网讯科技有限公司 Text expression method and device, electronic equipment and readable storage medium
CN112329669B (en) * 2020-11-11 2021-11-16 国网黑龙江省电力有限公司电力科学研究院 Electronic file management method
CN112380344B (en) * 2020-11-19 2023-08-22 平安科技(深圳)有限公司 Text classification method, topic generation method, device, equipment and medium
CN112597764B (en) * 2020-12-23 2023-07-25 青岛海尔科技有限公司 Text classification method and device, storage medium and electronic device
CN113094549A (en) * 2021-06-10 2021-07-09 智者四海(北京)技术有限公司 Video classification method and device, electronic equipment and storage medium
CN113886438B (en) * 2021-12-08 2022-03-15 济宁景泽信息科技有限公司 Artificial intelligence-based achievement transfer transformation data screening method
CN116244738B (en) * 2022-12-30 2024-05-28 浙江御安信息技术有限公司 Sensitive information detection method based on graph neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101741611A (en) * 2009-12-03 2010-06-16 哈尔滨工业大学 MLkP/CR algorithm-based undirected graph dividing method
CN103473380A (en) * 2013-09-30 2013-12-25 南京大学 Computer text sentiment classification method
CN105740349A (en) * 2016-01-25 2016-07-06 重庆邮电大学 Sentiment classification method capable of combining Doc2vce with convolutional neural network
CN106991132A (en) * 2017-03-08 2017-07-28 南京信息工程大学 A kind of figure sorting technique reconstructed based on atlas with kernel of graph dimensionality reduction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Text Classification Based on the Word2Vec Language Model and Graph Kernel Design; Yuan Yanhong; China Master's Theses Full-text Database, Information Science and Technology; 2017-02-15; pp. I138-4649 *

Also Published As

Publication number Publication date
CN107526785A (en) 2017-12-29

Similar Documents

Publication Publication Date Title
CN107526785B (en) Text classification method and device
CN109271522B (en) Comment emotion classification method and system based on deep hybrid model transfer learning
CN111079639B (en) Method, device, equipment and storage medium for constructing garbage image classification model
CN112966691B (en) Multi-scale text detection method and device based on semantic segmentation and electronic equipment
CN111125358B (en) Text classification method based on hypergraph
CN110222634B (en) Human body posture recognition method based on convolutional neural network
CN110110323B (en) Text emotion classification method and device and computer readable storage medium
CN107909115B (en) Image Chinese subtitle generating method
CN109993100B (en) Method for realizing facial expression recognition based on deep feature clustering
CN110969020A (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN110188195B (en) Text intention recognition method, device and equipment based on deep learning
CN107683469A (en) A kind of product classification method and device based on deep learning
CN112949647B (en) Three-dimensional scene description method and device, electronic equipment and storage medium
CN109002755B (en) Age estimation model construction method and estimation method based on face image
CN111475622A (en) Text classification method, device, terminal and storage medium
CN113220876B (en) Multi-label classification method and system for English text
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
CN116152554A (en) Knowledge-guided small sample image recognition system
CN115456043A (en) Classification model processing method, intent recognition method, device and computer equipment
Al-Hmouz et al. Enhanced numeral recognition for handwritten multi-language numerals using fuzzy set-based decision mechanism
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN112527959B (en) News classification method based on pooling convolution embedding and attention distribution neural network
CN116884067B (en) Micro-expression recognition method based on improved implicit semantic data enhancement
CN113204640A (en) Text classification method based on attention mechanism
CN112380919A (en) Vehicle category statistical method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant