CN107526785A - Text classification method and device

Info

Publication number
CN107526785A
Authority
CN
China
Prior art keywords
text
classified
node
training
layer
Prior art date
Legal status
Granted
Application number
CN201710642105.0A
Other languages
Chinese (zh)
Other versions
CN107526785B (en)
Inventor
彭浩
李建欣
何雨
刘垚鹏
包梦蛟
宋阳秋
杨强
Current Assignee
Guangzhou HKUST Fok Ying Tung Research Institute
Original Assignee
Guangzhou HKUST Fok Ying Tung Research Institute
Application filed by Guangzhou HKUST Fok Ying Tung Research Institute
Priority to CN201710642105.0A
Publication of CN107526785A
Application granted
Publication of CN107526785B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Abstract

The text classification method and device disclosed by the invention receive training texts of a plurality of known categories, preprocess the training texts, construct a graph structure of each training text from the co-occurrence relationships of its words, and train the parameters of a convolutional neural network through a back-propagation algorithm according to the graph structures of the training texts to obtain the trained convolutional neural network; an input text to be classified is then received and preprocessed, the graph structure of the text to be classified is constructed from the co-occurrence relationships of its words, and the category of the text to be classified is predicted by the trained convolutional neural network according to that graph structure. The technical scheme applies convolutional neural networks to the problem of text classification and improves the accuracy and reliability of text classification.

Description

Text classification method and device
Technical Field
The invention relates to the field of machine learning, in particular to a text classification method and device.
Background
The convolutional neural network is an artificial neural network with deep learning ability, designed according to the visual neural mechanism of primates. Hubel and Wiesel proposed a model of the visual structure based on the visual cortex of cats in 1962 and first introduced the concept of receptive fields. However, with the advent of simpler and more efficient linear classifiers such as support vector machines, and because of the local minima that pervade the non-convex cost functions of deep structures, neural network research fell into a trough for roughly two decades. Hinton et al. later proposed an unsupervised greedy layer-wise training method based on the Deep Belief Network (DBN) to solve the optimization problems associated with deep structures.
Conventional convolutional neural networks are generally used for image classification; there is an urgent need to apply them to text classification so as to improve the accuracy and reliability of text classification.
Disclosure of Invention
The embodiment of the invention aims to provide a text classification method and a text classification device, which remedy the lack of convolutional-neural-network-based text classification in the prior art and improve the accuracy and reliability of text classification.
In order to achieve the above object, an embodiment of the present invention provides a text classification method, including:
receiving training texts of a plurality of known categories, preprocessing the training texts, constructing the graph structure of each training text from the co-occurrence relationships of words, and training the parameters of the convolutional neural network through a back-propagation algorithm according to the graph structures of the training texts to obtain the trained convolutional neural network; wherein the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one fully-connected layer and at least one classification output layer; in the graph structure of a training text, nodes correspond one-to-one to the words in the training text;
receiving an input text to be classified, preprocessing the text to be classified, and constructing the graph structure of the text to be classified from the co-occurrence relationships of words; in the graph structure of the text to be classified, nodes correspond one-to-one to the words in the text to be classified;
and predicting the category of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified.
Compared with the prior art, the text classification method disclosed by the invention receives training texts of a plurality of known categories, preprocesses them, constructs their graph structures from the co-occurrence relationships of words, and trains the parameters of the convolutional neural network through a back-propagation algorithm according to those graph structures to obtain the trained convolutional neural network; it then receives an input text to be classified, preprocesses it, constructs its graph structure from the co-occurrence relationships of words, and predicts its category through the trained convolutional neural network according to that graph structure. The scheme applies convolutional neural networks to the problem of text classification and improves the accuracy and reliability of text classification.
As an improvement of the above scheme, the preprocessing of the training text or the text to be classified specifically includes:
after word segmentation is performed on the training text or the text to be classified, the noise and stop words of the text are removed, and the stem of each word in the text is extracted; the noise comprises punctuation marks and numbers, and the stop words comprise pronouns, conjunctions, prepositions and articles;
constructing the graph structure of the training text or the text to be classified from the co-occurrence relationships of words specifically comprises:
traversing the training text or the text to be classified with a sliding window of preset size, and constructing an edge between any two words that fall within the sliding window at the same time, the edge pointing from the earlier word to the later word. Removing such semantically weak words with no practical meaning is necessary for highlighting the subject of the text and classifying it accurately.
As an improvement of the above scheme, predicting the category of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified specifically comprises:
constructing a plurality of subgraphs according to the graph structure of the text to be classified, and carrying out normalization processing on each subgraph to obtain the word vector representation of each node in each subgraph as the input of a convolutional neural network;
and predicting the category of the text to be classified according to the output result of the classification output layer. The subgraphs carry different semantic information and may lie far apart in an N-gram model of the text; for text classification, high-level abstract features can be obtained by extracting the features of the different subgraphs and then fusing them, so that an accurate text classification result is obtained.
As an improvement of the above scheme, constructing a plurality of subgraphs according to the graph structure of the text to be classified and normalizing each subgraph comprises the steps of:
extracting the nodes of the graph structure of the text to be classified and sorting the nodes by contribution value; wherein the contribution value is determined, in order, by the degree of each node, the frequency in the text of the word corresponding to the node, and the co-occurrence of the node with its neighbouring nodes;
selecting the top N nodes of the sorted order as key nodes, taking each key node as a root node, constructing subgraphs through a breadth-first search algorithm, and normalizing each subgraph; wherein each subgraph comprises at least k nodes, N > 0, k > 0.
As an improvement of the above scheme, constructing the subgraphs through the breadth-first search algorithm and normalizing each subgraph specifically comprises:
acquiring the adjacent nodes of the root node, and if the number of adjacent nodes of the root node is greater than k-1, constructing a subgraph from the root node, its adjacent nodes, and the edges between the root node and the adjacent nodes;
if the number of adjacent nodes of the root node is less than k-1, acquiring secondary adjacent nodes of the root node level by level until the total number of acquired adjacent and secondary adjacent nodes is greater than or equal to k or no further secondary adjacent nodes can be acquired, and constructing a subgraph from the root node, its adjacent and secondary adjacent nodes, the edges between the root node and the adjacent nodes, the edges between the adjacent nodes and the secondary adjacent nodes, and the edges between the secondary adjacent nodes; wherein a secondary adjacent node is a node indirectly connected with the root node;
constructing a spanning tree from the subgraph, and ordering the nodes of the spanning tree from shallow layers to deep layers by a breadth-first algorithm;
within the same layer, ordering the adjacent nodes of the root node by contribution value;
when the number of nodes in the subgraph is greater than k, retaining the top k nodes of the ordering in the spanning tree, thereby completing the normalization of the subgraph;
when the number of nodes in the subgraph is less than k, adding several dummy nodes to the subgraph so that the number of nodes equals k, thereby completing the normalization of the subgraph; wherein the dummy nodes are not connected with any node in the original subgraph.
As an improvement of the above scheme, training the parameters of the convolutional neural network through a back-propagation algorithm to obtain the trained convolutional neural network specifically comprises:
initializing the parameters of the convolutional neural network, and propagating the training texts of known categories forward through the convolutional neural network to obtain output results;
and performing back propagation according to the error between the output result and the label of the training text, distributing the error to each layer in the convolutional neural network to obtain the error data of each layer, and correcting the parameters of the convolutional neural network according to the error data of each layer.
As an improvement of the foregoing solution, performing back propagation according to the error between the output result and the label of the training text, distributing the error to each layer in the convolutional neural network to obtain the error data of each layer, and correcting the parameters of the convolutional neural network according to the error data of each layer specifically comprises:
constructing a loss function according to the output result and the known class of the training text, and acquiring the residual of any neural node in the convolutional neural network according to the loss function; wherein the loss function is:
J = H + C_λ(w)
wherein H is a cross-entropy term and C_λ(w) is a regularization term that prevents overfitting;
performing a recursive operation on the residuals of the nodes to update the parameters of each neural node; wherein the parameters of a neural node include the neural network weights and biases.
As an improvement of the above scheme, when the classification output layer includes a plurality of classifiers, the cross-entropy term is specifically:
H = -Σ_{m=1..M} Σ_{k=1..K} [ l_k(d_m)·log P_k(d_m) + (1 - l_k(d_m))·log(1 - P_k(d_m)) ]
wherein M is the number of training texts of known classes, K is the number of classifiers of the classification output layer, l_k(d_m) is the binary label indicating whether training text d_m belongs to class k, and P_k(d_m) is the probability that training text d_m belongs to class k as predicted by the convolutional neural network.
As an improvement of the above scheme, the method further comprises the steps of:
constructing a hierarchical relationship among the different categories, and acquiring a regularization term at the fully-connected layer through the following formula to update the parameters of the fully-connected layer:
C_λ(w) = (λ/2) · Σ_c ||w_c - w_p(c)||²
wherein w_p(c) is the weight of the parent category in the hierarchical relationship, and w_c is the weight of the child category in the hierarchical relationship.
As an improvement of the scheme, the word vector of each node in the graph structure of a text is obtained with a word2vec model or a GloVe model.
As an improvement of the scheme, the pooling layer performs down-sampling on the feature matrix by an average pooling method, a maximum pooling method or a stochastic pooling method.
As an improvement of the above scheme, dropout is adopted in the fully-connected layer so that the activation output value of the fully-connected layer is randomly cleared at a preset ratio.
As an improvement of the above solution, the convolutional layer adopts a linear correction function as an activation function of the convolutional layer, and the activation function of the fully-connected layer is a Sigmoid function, a tanh (x) function or a softplus function.
As an improvement of the scheme, the classifier is a softmax classifier.
As an improvement of the above scheme, the classifier includes several Sigmoid functions.
As an improvement of the above, the method further comprises the steps of:
constructing a tree structure according to the hierarchical relationship among the different categories, dividing the tree structure into a plurality of subtrees, and training the fully-connected layer with the subtrees as units; the tree structure comprises a plurality of category nodes and category edges, the category nodes correspond to the categories, and each category edge points from a category of the upper level to a category of the next level; wherein dividing the tree structure into a plurality of subtrees is specifically:
traversing from any category node by a depth-first pre-order traversal method, and when the number of leaf nodes obtained by the traversal equals a preset threshold, dividing that category node and the other traversed category nodes into a subtree;
and then, taking each divided subtree as a leaf node, traversing from any category node by the depth-first pre-order traversal method, and if the number of leaf nodes obtained by the traversal equals the preset threshold, dividing that category node and the other traversed category nodes into a subtree.
An embodiment of the present invention further provides a text classification apparatus, including:
the training module is used for receiving training texts of a plurality of known categories, preprocessing the training texts, constructing the graph structure of each training text from the co-occurrence relationships of words, and training the parameters of the convolutional neural network through a back-propagation algorithm according to the graph structures of the training texts to obtain the trained convolutional neural network; the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one fully-connected layer and at least one classification output layer; in the graph structure of a training text, nodes correspond to the words in the training text;
the text-to-be-classified receiving module is used for receiving an input text to be classified, preprocessing it, and constructing the graph structure of the text to be classified from the co-occurrence relationships of words; in the graph structure of the text to be classified, nodes correspond to the words in the text to be classified;
and the class prediction module is used for predicting the class of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified.
The text classification device receives training texts of a plurality of known categories through the training module, constructs their graph structures from the co-occurrence relationships of words after preprocessing, and trains the parameters of the convolutional neural network through a back-propagation algorithm according to those graph structures to obtain the trained convolutional neural network; it receives an input text to be classified through the text-to-be-classified receiving module and constructs its graph structure from the co-occurrence relationships of words after preprocessing; and it predicts the category of the text to be classified through the trained convolutional neural network according to that graph structure by means of the category prediction module. The device applies convolutional neural networks to the problem of text classification and improves the accuracy and reliability of text classification.
Drawings
Fig. 1 is a schematic flowchart of a text classification method provided in embodiment 1 of the present invention.
FIG. 2 is a schematic diagram of the graph structure constructed for a text in the present invention.
Fig. 3 shows the working process of the convolutional neural network of the present invention.
FIG. 4 shows the working process of subgraph construction and normalization on the graph structure of a text in the present invention.
FIG. 5 is a schematic diagram of the parent-child relationship between category nodes in the present invention.
FIG. 6 is a schematic diagram of the division of categories into units in the present invention.
Fig. 7 is a schematic structural diagram of a text classification apparatus according to embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any inventive step, are within the scope of the present invention.
Referring to fig. 1, a schematic flow chart of the text classification method provided in embodiment 1 of the present invention, the method comprises the steps of:
S1, receiving training texts of a plurality of known categories, preprocessing the training texts, constructing the graph structure of each training text from the co-occurrence relationships of words, and training the parameters of the convolutional neural network through a back-propagation algorithm according to the graph structures of the training texts to obtain the trained convolutional neural network; the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one fully-connected layer and at least one classification output layer; in the graph structure of a training text, nodes correspond one-to-one to the words in the training text;
It should be noted that back propagation (BP) is a supervised learning algorithm suitable for training feed-forward neural networks. In this step, the parameters of the convolutional neural network are trained through the back-propagation algorithm, and the trained convolutional neural network is obtained specifically as follows:
initializing the parameters of the convolutional neural network, and propagating the training texts of known categories forward through the convolutional neural network to obtain output results; preferably, the parameters of the convolutional neural network are initialized with a robust weight initialization method so that they obey a zero-mean Gaussian distribution;
and performing back propagation according to the error between the output results and the labels of the training texts, distributing the error to each layer in the convolutional neural network to obtain the error data of each layer, and correcting the parameters of the convolutional neural network according to the error data of each layer.
In the embodiment of the invention, a convolutional neural network is adopted for text classification; it can act directly on the original input data, and the training method extracts and discovers the best features by adjusting trainable parameters. The input layer acts directly on the original input data, which, for the graph structure of a text, is the word vector representation of the text. The convolutional layer is also called the feature extraction layer, and its convolution kernels (also called filters) are characterized by their size, number and stride. The number of convolution kernels determines the number of feature maps obtained by convolving and filtering the upper layer; the more feature maps are extracted, the larger the feature space the network can represent and the stronger its learning ability. However, too many convolution kernels increase the complexity of the network, the number of parameters and the computational cost, and easily cause overfitting; the number of convolution kernels therefore needs to be determined according to the size of the data set of the particular application. The size of a convolution kernel determines the size of the feature map, and its stride determines the step and the number of extracted features. The pooling layer, also called the down-sampling layer, mainly aims to reduce the amount of data to be processed and accelerate network training while retaining useful information. The more convolutional and pooling layers there are, the more abstract are the features that can be extracted on the basis of the previous layers. Further, the pooling layer performs down-sampling on the feature matrix by an average pooling method, a maximum pooling method or a stochastic pooling method. Average pooling takes the mean of the feature points in a neighbourhood, and maximum pooling takes their maximum. Stochastic pooling proceeds as follows: first compute the sum of the matrix to be pooled, then compute the proportion of each element in that sum, and sample randomly with these proportions as probabilities, so that elements with large values also have a high probability of being sampled.
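As a minimal numpy sketch of the three pooling variants described above (the function names, the non-overlapping window scheme and the assumption of non-negative activations are ours, not the patent's):

```python
import numpy as np

def average_pool(window: np.ndarray) -> float:
    # mean of the feature points in the neighbourhood
    return float(window.mean())

def max_pool(window: np.ndarray) -> float:
    # maximum of the feature points in the neighbourhood
    return float(window.max())

def stochastic_pool(window: np.ndarray, rng=np.random.default_rng()) -> float:
    # each element is sampled with probability proportional to its share of the
    # window sum, so large activations are sampled more often; assumes
    # non-negative activations (e.g. after ReLU)
    flat = window.ravel()
    return float(rng.choice(flat, p=flat / flat.sum()))

def pool2d(feature_matrix: np.ndarray, size: int, op=max_pool) -> np.ndarray:
    # non-overlapping size x size down-sampling of a feature matrix
    h, w = feature_matrix.shape
    out = np.empty((h // size, w // size))
    for i in range(h // size):
        for j in range(w // size):
            out[i, j] = op(feature_matrix[i*size:(i+1)*size, j*size:(j+1)*size])
    return out
```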
S2, receiving an input text to be classified, preprocessing the text to be classified, and constructing the graph structure of the text to be classified from the co-occurrence relationships of words; in the graph structure of the text to be classified, nodes correspond one-to-one to the words in the text to be classified;
the preprocessing of the training text or the text to be classified specifically comprises the following steps:
after word segmentation is performed on the training text or the text to be classified, the noise and stop words of the text are removed, and the stem of each word in the text is extracted; wherein the noise comprises punctuation marks and numbers, and the stop words comprise pronouns, conjunctions, prepositions and articles. Specifically, noise and stop words have no practical meaning and carry hardly any information; apart from their role in language modelling and sentiment analysis, their power to discriminate between texts is weak, so they need to be removed for text classification. In practical applications, a stop word list is usually built in advance, and each word obtained by word segmentation is matched against it: if the word exists in the list, it is a stop word and is deleted; if not, it is retained. Besides the stop words, there are also many semantically weak adverbs, numerals and locality words, such as "in", "one" and "very", which contribute little to the content of the text; removing such words with no practical meaning is likewise necessary for highlighting the subject of the text and classifying it accurately.
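A minimal sketch of this preprocessing, assuming NLTK's tokenizer, English stop word list and Porter stemmer as stand-ins for the components the patent leaves unspecified:

```python
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# requires: nltk.download('punkt'); nltk.download('stopwords')
_stop = set(stopwords.words('english'))   # pronouns, conjunctions, prepositions, articles, ...
_stemmer = PorterStemmer()

def preprocess(text: str) -> list[str]:
    tokens = word_tokenize(text.lower())                        # word segmentation
    tokens = [t for t in tokens if re.fullmatch(r'[a-z]+', t)]  # drop punctuation and numbers (noise)
    tokens = [t for t in tokens if t not in _stop]              # drop stop words
    return [_stemmer.stem(t) for t in tokens]                   # keep word stems
```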
In addition, the graph structure of the training text or the text to be classified is constructed from the co-occurrence relationships of words, specifically as follows:
traversing the training text or the text to be classified with a sliding window of preset size, and constructing an edge between any two words that fall within the sliding window at the same time, the edge pointing from the earlier word to the later word. Constructing a graph structure of the text makes it possible to classify the text with existing graph convolutional network methods, with high accuracy and easily trained model parameters. A graph is a data structure composed of a finite, non-empty set of vertices and a set of edges between the vertices, usually denoted G = (V, E, W), where G denotes the graph, V is the set of vertices of G, E is the set of edges of G, and W is the set of weights of the vertices and edges. That is, a graph structure consists of nodes, directed edges connecting the nodes, and weights representing the importance of the nodes and edges. Because the graph structure of the text is built from co-occurrences, the sequential (contextual) relationships of the words are captured and the key information of the text is not lost, so an accurate classification result can be obtained. As shown in fig. 2, the sliding window has size 3, and one edge is constructed for each co-occurrence. Of course, the number of edges between each pair of words can also be reduced to one, with the weight of the edge proportional to the co-occurrence count of the two words it connects, or proportional to the similarity of the two words.
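A sketch of this sliding-window construction, using networkx for the directed graph; the window size of 3 follows fig. 2, and the choice to collapse parallel edges into one weighted edge follows the last sentence above:

```python
import networkx as nx

def build_text_graph(tokens: list[str], window: int = 3) -> nx.DiGraph:
    """Directed co-occurrence graph: one node per distinct word, an edge from
    the earlier to the later word whenever both fall inside one window."""
    g = nx.DiGraph()
    for i, w in enumerate(tokens):
        g.add_node(w)
        g.nodes[w]['tf'] = g.nodes[w].get('tf', 0) + 1   # word frequency, used for contribution values
        for j in range(i + 1, min(i + window, len(tokens))):
            u, v = w, tokens[j]
            if u != v:
                old = g.get_edge_data(u, v, {'weight': 0})['weight']
                g.add_edge(u, v, weight=old + 1)         # weight = co-occurrence count
    return g
```

With the preprocessing sketch above, g = build_text_graph(preprocess(raw_text)) yields the graph that the subgraph-selection stage below consumes.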
And S3, predicting the category of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified.
The activation function in a convolutional neural network determines how the features of activated neurons are retained and mapped through a non-linear function; it determines how the neural network processes data and influences its learning ability. If the activation function of a neuron were a linear function, only low-level features could be learned. Preferably, the convolutional layer employs a linear correction function as its activation function. The specific formula of the linear correction function (ReLU) is f(x) = max(0, x): when the input value is less than or equal to zero the output is forced to zero, and when the input value is greater than zero it is kept unchanged. This brings suitable sparsity to the trained network, greatly reduces training time and improves network performance; it is also closer to the essence of biological neuron activation and conforms to the principle of neural signal excitation. Further, the activation function of the fully-connected layer is a Sigmoid function, a tanh(x) function or a softplus function.
Preferably, the classification output layer is a softmax classifier. Softmax is an extension of logistic regression and can solve multi-class problems. Preferably, the distribution probability of each class is calculated by the following formula:
P(i) = e^{s_i} / Σ_{j=1..n} e^{s_j}
wherein s_i represents the output value of the i-th neuron of the softmax classifier, s_i = F·η_i, where F is the word vector of a key node of a certain training text, η_i is the corresponding weight, and n is the number of classes to be classified.
Preferably, the classification output layer comprises several Sigmoid classifiers. The Sigmoid function is a commonly used S-shaped non-linear activation function with the specific formula f(x) = 1/(1 + e^(-x)); its effect is to compress a real number into the interval (0, 1). During learning, important features are pushed towards the central region and non-important features towards the two side regions, which is consistent with the behaviour of neuronal synapses in neurology.
Preferably, since the loss function is non-convex and has no analytic solution, it can only be solved by an optimization algorithm; the optimization algorithm can be a stochastic gradient descent algorithm, an adaptive gradient algorithm or the Nesterov accelerated gradient algorithm. When the training data set has insufficient samples or the data is over-trained, overfitting occurs; to enhance the generalization ability of the network and prevent overfitting, the main methods include data augmentation, weight decay, dropout and dropconnect. Dropout keeps part of the network inactive during model training, i.e. it sets the output value of a node to zero with a certain probability, and the weights corresponding to that node are not updated during back propagation; dropconnect instead clears the input weights of a neural node with a certain probability. Both dropout and dropconnect can reduce network overfitting, suppress the classification error rate of the network and improve its performance. Preferably, dropout is adopted in the fully-connected layer so that its activation output values are randomly cleared at a preset ratio.
In specific implementation, training texts of known categories are received, graph structures are constructed from them, and the parameters of the convolutional neural network are trained through the back-propagation algorithm to obtain the trained convolutional neural network; then the input text to be classified is received, its graph structure is constructed, and its category is predicted through the trained convolutional neural network according to that graph structure. Applying a convolutional neural network of the kind used for natural images to text classification can improve the accuracy of text classification, and is fast and effective.
When convolutional neural networks process natural images for recognition, a common technique is to first divide the image into several patches and then extract features from each patch separately. Preferably, in the present scheme, several subgraphs constructed from the graph structure of the text serve as the input of the convolutional neural network; predicting the category of the text to be classified through the trained convolutional neural network according to the graph structure constructed in step S2 then specifically comprises:
constructing a plurality of subgraphs according to the graph structure of the text to be classified, and carrying out normalization processing on each subgraph to obtain the word vector representation of each node in each subgraph as the input of a convolutional neural network;
predicting the category of the text to be classified according to the output result of the classification output layer;
the convolution layer is used for receiving a matrix input by the previous layer to carry out convolution operation to generate a characteristic matrix, and the pooling layer carries out down-sampling operation to generate a characteristic mapping matrix by receiving the characteristic matrix output by the previous convolution layer; the full connection layer is used for performing characteristic weighting operation on the characteristic mapping matrix output by the previous pooling layer and outputting an attribute characteristic matrix; the classification output layer is used for receiving the attribute feature matrix output by the last full connection layer to obtain the output result of the classification output layer, and predicting the category of the text to be classified according to the output result of the classification output layer.
For each sub-image, the sub-images have different semantic information, the distance between the sub-images in the N-gram model of the text is long, and for text classification, high-level abstract features can be obtained by extracting the features of the different sub-images and then fusing the features, so that an accurate text classification result is obtained. On the other hand, the method is more in line with the characteristic of artificial neural network local perception, has higher efficiency, can greatly reduce the training parameters, and has great advantage in the training speed.
Preferably, constructing a plurality of subgraphs according to the graph structure of the text to be classified and normalizing each subgraph comprises the steps of:
extracting the nodes of the graph structure of the text to be classified and sorting the nodes by contribution value; wherein the contribution value is determined, in order, by the degree of each node, the frequency in the text of the word corresponding to the node, and the co-occurrence of the node with its neighbouring nodes;
selecting the top N nodes of the sorted order as key nodes, taking each key node as a root node, constructing subgraphs through a breadth-first search algorithm, and normalizing each subgraph; wherein each subgraph comprises at least k nodes, N > 0, k > 0. A sketch of this key-node selection follows.
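A sketch of the key-node selection on the networkx graph built earlier; the exact composition of the contribution value is an assumption here, following the order the patent lists (degree first, then word frequency, then co-occurrence weight with neighbouring nodes):

```python
import networkx as nx

def contribution(g: nx.DiGraph, v) -> tuple:
    # sort key: degree, then word frequency, then total co-occurrence weight
    cooc = (sum(d.get('weight', 1) for *_, d in g.in_edges(v, data=True))
            + sum(d.get('weight', 1) for *_, d in g.out_edges(v, data=True)))
    return (g.degree(v), g.nodes[v].get('tf', 0), cooc)

def key_nodes(g: nx.DiGraph, n: int) -> list:
    # the top-N nodes of the contribution ordering become the subgraph roots
    return sorted(g.nodes, key=lambda v: contribution(g, v), reverse=True)[:n]
```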
Constructing the subgraphs through the breadth-first search algorithm retains the information of the original text, including keywords and context information, to the greatest extent, which benefits the final classification result; it also reduces the amount of data the computer must process and lowers the time complexity, making the method fast and effective. Furthermore, when each subgraph is used as the input of the neural network, a vector representation of each word, i.e. a word vector, must be obtained; this word embedding turns the symbolic words of natural language into continuous numeric vectors. One of the simplest word vector representations is the one-hot representation, whose length is the size of the dictionary and in which only one digit is 1, at the position of the word in the dictionary. Another representation is the distributed word vector representation (Distributed Representation), a representation of word semantics obtained through model training. Preferably, the word vectors in this scheme are obtained with a word2vec model or a GloVe model. word2vec comprises two models, the CBOW (Continuous Bag-of-Words) model and the Skip-gram model: CBOW is a feed-forward-like neural model that predicts the probability distribution of the target word given its context, while the Skip-gram model predicts the context given the target word. Each defines an objective function, and an optimization method is then used to find the optimal parameters of the model, from which the word vector representation is obtained. The word2vec model simplifies the vector representation of words; distances in the vector space can express the similarity of textual semantics, and word order and contextual semantic information are taken into account.
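A sketch of obtaining the 50-dimensional word vectors with gensim's word2vec implementation (gensim 4 API; every parameter value here is an illustrative assumption, since the patent fixes none of them):

```python
from gensim.models import Word2Vec

def train_word_vectors(corpus: list[list[str]], dim: int = 50) -> Word2Vec:
    # corpus: a list of token lists, e.g. outputs of preprocess() above
    return Word2Vec(
        sentences=corpus,
        vector_size=dim,   # word vector dimension (50 matches the fig. 3 walkthrough)
        window=5,          # context window
        min_count=1,       # keep every word, since every graph node needs a vector
        sg=1,              # 1 = Skip-gram, 0 = CBOW
    )

# model.wv['word'] then yields the vector attached to a node's word
```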
For convenience of explanation, N subgraphs with 5 nodes each and a word vector dimension of 50 are taken as the input of the convolutional neural network. As shown in fig. 3, the N×5×50 input is first convolved with 64 convolution kernels of size 5×50 to obtain an N×64 feature matrix, and the maximum down-sampling of the following pooling layer yields an N/2×64 feature mapping matrix, i.e. N/2 subgraph features; the N/2 subgraph features are then further convolved with 128 convolution kernels of size 5×1 to obtain a 64×128 matrix, and the maximum down-sampling of the pooling layer yields a 32×128 feature mapping matrix; a further convolution with 256 kernels of size 5×1 at stride 3 gives a 10×256 matrix, and the maximum down-sampling of the pooling layer yields a 5×256 feature mapping matrix. As shown in the figure, there are three fully-connected layers, each with a dropout parameter of 0.5, and the classification output layer comprises K classifiers and outputs K classification results.
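A loose PyTorch sketch of this pipeline. The kernel counts, kernel sizes, stride and dropout rate follow the fig. 3 walkthrough above; the pooling strides, the fully-connected widths (512 and 256) and the adaptive pooling used to fix the flattened size are assumptions made to obtain a runnable model:

```python
import torch
import torch.nn as nn

class GraphTextCNN(nn.Module):
    def __init__(self, n_classes: int, node_dim: int = 50, nodes_per_subgraph: int = 5):
        super().__init__()
        # one feature per subgraph per filter: each kernel spans a whole 5x50 subgraph
        self.conv1 = nn.Conv2d(1, 64, kernel_size=(nodes_per_subgraph, node_dim),
                               stride=(nodes_per_subgraph, 1))
        self.conv2 = nn.Conv1d(64, 128, kernel_size=5)
        self.conv3 = nn.Conv1d(128, 256, kernel_size=5, stride=3)
        self.pool = nn.MaxPool1d(2)              # maximum down-sampling
        self.flat = nn.AdaptiveMaxPool1d(5)      # fixes the FC input at 5*256
        self.fc = nn.Sequential(                 # three fully-connected layers, dropout 0.5
            nn.Linear(5 * 256, 512), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(512, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, n_classes),           # K classifiers in the output layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, 5, 50), N normalized subgraphs of 5 nodes x 50-dim vectors;
        # N should be roughly 28 or more so every conv/pool stage has enough length
        b, n, k, d = x.shape
        x = x.reshape(b, 1, n * k, d)              # stack the subgraphs into one "image"
        x = torch.relu(self.conv1(x)).squeeze(-1)  # -> (b, 64, N)
        x = self.pool(x)                           # -> (b, 64, N/2)
        x = self.pool(torch.relu(self.conv2(x)))   # 5x1 convolution + max pooling
        x = self.pool(torch.relu(self.conv3(x)))   # stride-3 convolution + max pooling
        x = self.flat(x).flatten(1)                # -> (b, 5*256)
        return self.fc(x)                          # sigmoid/softmax is applied in the loss
```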
Preferably, constructing the subgraphs through the breadth-first search algorithm and normalizing each subgraph specifically comprises:
acquiring the adjacent nodes of the root node, and if the number of adjacent nodes of the root node is greater than k-1, constructing a subgraph from the root node, its adjacent nodes, and the edges between the root node and the adjacent nodes;
if the number of adjacent nodes of the root node is less than k-1, acquiring secondary adjacent nodes of the root node level by level until the total number of acquired adjacent and secondary adjacent nodes is greater than or equal to k or no further secondary adjacent nodes can be acquired, and constructing a subgraph from the root node, its adjacent and secondary adjacent nodes, the edges between the root node and the adjacent nodes, the edges between the adjacent nodes and the secondary adjacent nodes, and the edges between the secondary adjacent nodes; wherein a secondary adjacent node is a node indirectly connected with the root node;
constructing a spanning tree from the subgraph, and ordering the nodes of the spanning tree from shallow layers to deep layers by a breadth-first algorithm;
within the same layer, ordering the adjacent nodes of the root node by contribution value;
when the number of nodes in the subgraph is greater than k, retaining the top k nodes of the ordering in the spanning tree, thereby completing the normalization of the subgraph;
when the number of nodes in the subgraph is less than k, adding several dummy nodes to the subgraph so that the number of nodes equals k, thereby completing the normalization of the subgraph; wherein the dummy nodes are not connected with any node in the original subgraph.
As shown in fig. 4, a specific schematic diagram of subgraph construction and normalization, the key nodes obtained from the contribution values are "gold scale", "england", "club", "fit", "high", "great", "unit" and "true"; a breadth-first search traversal is performed with each key node as root node to obtain several subgraphs with different semantics, and each subgraph is normalized so that its features can be extracted and fused as the input of the neural network.
The normalization of the subgraphs facilitates the subsequent feature extraction and feature mapping processes. A sketch of this construction and normalization follows.
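A sketch of one subgraph's construction and normalization under the steps above, in pure Python over the networkx graph (contribution is the sort key from the key-node sketch; the BFS-layer expansion ordered by contribution stands in for the spanning-tree ordering):

```python
import networkx as nx

def normalized_subgraph(g: nx.DiGraph, root, contribution, k: int = 5) -> list:
    """Expand breadth-first from the root until at least k nodes are gathered
    (or none remain), order them shallow-to-deep with contribution breaking
    ties inside a layer, then truncate to k or pad with dummy nodes."""
    undirected = g.to_undirected(as_view=True)
    layers, seen = [[root]], {root}
    while sum(len(l) for l in layers) < k and layers[-1]:
        nxt = sorted({u for v in layers[-1] for u in undirected.neighbors(v)} - seen,
                     key=lambda u: contribution(g, u), reverse=True)
        seen.update(nxt)
        layers.append(nxt)
    order = [v for layer in layers for v in layer]          # shallow to deep
    if len(order) >= k:
        return order[:k]                                    # keep the top-k of the ordering
    return order + [f'<dummy{i}>' for i in range(k - len(order))]  # pad; dummies stay unconnected
```

Each of the k node names is then replaced by its word vector, giving the 5x50 block per subgraph that enters the network.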
Preferably, training the parameters of the convolutional neural network through the back-propagation algorithm in step S1 specifically comprises:
initializing the parameters of the convolutional neural network, and propagating the training texts of known categories forward through the convolutional neural network to obtain output results;
constructing a loss function according to the output results and the known categories of the training texts, and acquiring the residual of any neural node in the convolutional neural network according to the loss function; wherein the loss function is:
J = H + C_λ(w)
wherein H is a cross-entropy term and C_λ(w) is a regularization term that prevents overfitting;
then performing a recursive operation on the residuals of the nodes to update the parameters of each neural node; wherein the parameters of a neural node include the neural network weights and biases.
Further, when the classification output layer includes a plurality of classifiers, the cross-entropy term is specifically:
H = -Σ_{m=1..M} Σ_{k=1..K} [ l_k(d_m)·log P_k(d_m) + (1 - l_k(d_m))·log(1 - P_k(d_m)) ]
wherein M is the number of training texts of known classes, K is the number of classifiers of the classification output layer, l_k(d_m) is the binary label indicating whether training text d_m belongs to class k, and P_k(d_m) is the probability that training text d_m belongs to class k as predicted by the convolutional neural network.
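A numpy sketch of the loss under the reconstruction above: the binary cross-entropy over the K classifier outputs plus a regularization term (a plain L2 term here; λ and the epsilon guard are assumed, and the hierarchical variant appears below):

```python
import numpy as np

def loss_J(P: np.ndarray, L: np.ndarray, weights, lam: float = 1e-4) -> float:
    """J = H + C_lambda(w).
    P: (M, K) predicted probabilities P_k(d_m); L: (M, K) binary labels l_k(d_m);
    weights: iterable of weight arrays to regularize."""
    eps = 1e-12                                  # numerical guard for the logs
    H = -np.sum(L * np.log(P + eps) + (1 - L) * np.log(1 - P + eps))
    C = lam / 2 * sum(np.sum(w ** 2) for w in weights)
    return float(H + C)
```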
For a large number of label categories, some categories stand in a parent-child relationship, in which the classifier features (parameters) of a child category inherit those of its parent category. In order to reduce the learning parameters, a hierarchical relationship is constructed among the different categories, and the regularization term at the fully-connected layer is obtained by the following formula to update the parameters of the fully-connected layer:
C_λ(w) = (λ/2) · Σ_c ||w_c - w_p(c)||²
wherein w_p(c) is the weight of the parent category in the hierarchical relationship, and w_c is the weight of the child category in the hierarchical relationship.
Through these steps, introducing the dependency relationships among the categories can greatly improve classification performance, and when a child node has little training data, its parameters can be adjusted through the training data of its parent node. As a way of simplifying data processing, the hierarchical relationship of the categories promotes the similarity of the parameters of related categories. As shown in fig. 5, "computing" serves as a parent node of "artificial intelligence", and their parameters can be considered similar. A sketch of this regularization follows.
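A sketch of the recursive-regularization reading above (the parent map, λ and the per-category weight-vector representation are assumptions); the gradient shows how a data-poor child category is pulled toward its parent:

```python
import numpy as np

def hierarchical_reg(W: dict, parent: dict, lam: float = 1e-3):
    """W: category -> weight vector of its classifier in the fully-connected
    layer; parent: category -> parent category (roots absent).
    Returns C_lambda(w) and its gradient per category."""
    value = 0.0
    grad = {c: np.zeros_like(w) for c, w in W.items()}
    for c, p in parent.items():
        diff = W[c] - W[p]
        value += lam / 2 * float(diff @ diff)
        grad[c] += lam * diff        # pulls the child toward its parent
        grad[p] -= lam * diff        # and the parent toward its children
    return value, grad
```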
In order to speed up the training process, on the basis of the above embodiment, the text classification method further includes the steps of:
constructing a tree structure according to the hierarchical relationship among the different categories, dividing the tree structure into a plurality of subtrees, and training the fully-connected layer with the subtrees as units; the tree structure comprises a plurality of nodes and edges, the nodes correspond to the categories, and each edge points from a category of the upper level to a category of the next level; wherein dividing the tree structure into a plurality of subtrees specifically comprises:
traversing from any node by a depth-first pre-order traversal method, and when the number of leaf nodes obtained by the traversal equals a preset threshold, dividing that node and the other traversed nodes into a subtree;
and then, taking each divided subtree as a leaf node, traversing from any node by the depth-first pre-order traversal method, and if the number of leaf nodes obtained equals the preset threshold, dividing that node and the other traversed nodes into a subtree.
As shown in fig. 6, the preset threshold is 5: when node A is traversed and the number of traversed leaf nodes reaches 5, subtree (i) can be split off as a training unit; and if node F has only 4 leaf nodes, the preset threshold of 5 can be met by combining node E with node F.
By dividing the categories into blocks in this way and training each block separately, large-scale category prediction problems become tractable for a computer; in practical applications, the upper-level nodes, such as nodes B, E and F in the figure, can be trained first, and the child and leaf nodes of those nodes are then trained through a recursive algorithm. The recursive distributed learning algorithm provided by this scheme realizes large-scale learning for classification tasks such as text classification or image classification, overcomes the limitation of the prior art to small-scale learning, and has important practical significance. A sketch of the subtree division follows.
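A sketch of the threshold-based pre-order division (the node/children representation and the threshold value are illustrative; the second round that treats each cut subtree as a single leaf is applied by re-running the same routine on the reduced tree):

```python
def split_subtrees(children: dict, root, threshold: int = 5) -> list:
    """Pre-order DFS over a category tree (children: node -> list of child
    nodes). Whenever the accumulated leaf count reaches the threshold, the
    nodes visited so far are cut off as one subtree (one training unit)."""
    units, current, leaves = [], [], 0
    stack = [root]
    while stack:
        v = stack.pop()
        current.append(v)
        kids = children.get(v, [])
        if not kids:
            leaves += 1
            if leaves == threshold:          # enough leaves: cut a training unit
                units.append(current)
                current, leaves = [], 0
        else:
            stack.extend(reversed(kids))     # pre-order: leftmost child first
    if current:
        units.append(current)                # remainder (cf. merging nodes E and F)
    return units
```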
Referring to fig. 7, which is a schematic structural diagram of a text classification apparatus provided in embodiment 2 of the present invention, including:
the training module 101 is configured to receive training texts of a plurality of known categories, preprocess the training texts, construct a graph structure of the training texts by using a word co-linear relationship, train parameters of the convolutional neural network by using a back propagation algorithm according to the graph structure of the training texts, and obtain the trained convolutional neural network; the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one full-connection layer and at least one classification output layer; in the graph structure of the training text, nodes correspond to words in the training text;
the text to be classified receiving module 102 is configured to receive an input text to be classified, preprocess the text to be classified, and construct a graph structure of the text to be classified by using a collinear relationship of words; in the graph structure of the text to be classified, nodes correspond to words in the text to be classified;
and the class prediction module 103 is configured to predict the class of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified.
The word vector of each node in the graph structure of the training text or the text to be classified is obtained with a word2vec model or a GloVe model, and the pooling layer performs down-sampling on the feature matrix by an average pooling method, a maximum pooling method or a stochastic pooling method.
Preferably, the preprocessing of the training text or the text to be classified specifically includes:
after word segmentation is performed on the training text or the text to be classified, the noise and stop words of the text are removed, and the stem of each word in the text is extracted; wherein the noise comprises punctuation marks and numbers, and the stop words comprise pronouns, conjunctions, prepositions and articles;
constructing the graph structure of the training text or the text to be classified from the co-occurrence relationships of words specifically comprises:
traversing the training text or the text to be classified with a sliding window of preset size, and constructing an edge between any two words that fall within the sliding window at the same time, the edge pointing from the earlier word to the later word.
Further, predicting the category of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified specifically comprises:
constructing a plurality of subgraphs according to the graph structure of the text to be classified, and normalizing each subgraph;
acquiring word vector representation of each node in each subgraph as input of a convolutional neural network, performing convolutional operation through a convolutional layer to generate a feature matrix, and performing down-sampling processing on the feature matrix through a pooling layer to obtain a feature mapping matrix;
if the next layer of the pooling layer is a convolution layer, performing convolution operation and down-sampling processing on the feature mapping matrix output by the previous layer, and outputting the feature mapping matrix;
if the next layer of the pooling layer is a full-connection layer, performing feature weighting operation on the feature mapping matrix output by the previous layer, and outputting an attribute feature matrix;
if the next layer of the full connection layer is the full connection layer, continuously performing feature weighting operation on the attribute feature matrix output by the previous layer to output the attribute feature matrix;
if the next layer of the full connection layer is a classification output layer, obtaining an output result of the classification output layer according to the attribute feature matrix output by the previous layer, and predicting the category of the text to be classified according to the output result of the classification output layer.
Wherein, the construction of a plurality of subgraphs according to the graph structure of the text to be classified comprises the following steps:
extracting the nodes of the graph structure of the text to be classified and sorting the nodes by contribution value; wherein the contribution value is determined, in order, by the degree of each node, the frequency in the text of the word corresponding to the node, and the co-occurrence of the node with its neighbouring nodes;
selecting the top N nodes of the sorted order as key nodes, taking each key node as a root node, constructing subgraphs through a breadth-first search algorithm, and normalizing each subgraph; wherein each subgraph comprises at least k nodes, N > 0, k > 0.
Further, constructing the subgraphs through the breadth-first search algorithm and normalizing each subgraph specifically comprises:
acquiring the adjacent nodes of the root node, and if the number of adjacent nodes of the root node is greater than k-1, constructing a subgraph from the root node, its adjacent nodes, and the edges between the root node and the adjacent nodes;
if the number of adjacent nodes of the root node is less than k-1, acquiring secondary adjacent nodes of the root node level by level until the total number of acquired adjacent and secondary adjacent nodes is greater than or equal to k or no further secondary adjacent nodes can be acquired, and constructing a subgraph from the root node, its adjacent and secondary adjacent nodes, the edges between the root node and the adjacent nodes, the edges between the adjacent nodes and the secondary adjacent nodes, and the edges between the secondary adjacent nodes; wherein a secondary adjacent node is a node indirectly connected with the root node;
constructing a spanning tree from the subgraph, and ordering the nodes of the spanning tree from shallow layers to deep layers by a breadth-first algorithm;
within the same layer, ordering the adjacent nodes of the root node by contribution value;
when the number of nodes in the subgraph is greater than k, retaining the top k nodes of the ordering in the spanning tree, thereby completing the normalization of the subgraph;
when the number of nodes in the subgraph is less than k, adding several dummy nodes to the subgraph so that the number of nodes equals k, thereby completing the normalization of the subgraph; wherein the dummy nodes are not connected with any node in the original subgraph.
Preferably, training the parameters of the convolutional neural network through a back-propagation algorithm to obtain the trained convolutional neural network specifically comprises:
initializing parameters of the convolutional neural network, and carrying out forward propagation on the training texts of the known types through the convolutional neural network to obtain output results;
and performing back propagation according to the error between the output result and the label of the training text, distributing the error to each layer in the convolutional neural network to obtain the error data of each layer, and correcting the parameters of the convolutional neural network according to the error data of each layer.
Preferably, performing back propagation according to the error between the output result and the label of the training text, distributing the error to each layer in the convolutional neural network to obtain the error data of each layer, and correcting the parameters of the convolutional neural network according to the error data of each layer specifically comprises:
constructing a loss function according to the output result and the known class of the training text, and acquiring the residual of any neural node in the convolutional neural network according to the loss function; wherein the loss function is:
J = H + C_λ(w)
wherein H is a cross-entropy term and C_λ(w) is a regularization term that prevents overfitting;
performing a recursive operation on the residuals of the nodes to update the parameters of each neural node; wherein the parameters of a neural node include the neural network weights and biases.
Preferably, when the classification output layer includes a plurality of classifiers, the cross-entropy term is specifically:
H = -Σ_{m=1..M} Σ_{k=1..K} [ l_k(d_m)·log P_k(d_m) + (1 - l_k(d_m))·log(1 - P_k(d_m)) ]
wherein M is the number of training texts of known classes, K is the number of classifiers of the classification output layer, l_k(d_m) is the binary label indicating whether training text d_m belongs to class k, and P_k(d_m) is the probability that training text d_m belongs to class k as predicted by the convolutional neural network.
Preferably, the training module 101 is further configured to construct a hierarchical relationship according to the different categories, and to obtain a regularization term at the fully-connected layer through the following formula so as to update the parameters of the fully-connected layer:

Cλ(w) = (λ/2) · Σ_(p,c) ||w_p - w_c||²

wherein w_p is the weight of the parent category in the hierarchical relationship, w_c is the weight of the corresponding child category, and the sum runs over all parent-child category pairs of the hierarchy.
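A minimal numpy sketch of this regularization term, assuming weights maps each category to its fully-connected-layer weight vector and hierarchy lists the parent-child category pairs (both names are illustrative):

    import numpy as np

    def recursive_regularization(weights, hierarchy, lam):
        # penalize child categories whose weights drift far from their parent's
        return 0.5 * lam * sum(
            np.sum((weights[p] - weights[c]) ** 2) for p, c in hierarchy
        )

Adding this term to the cross-entropy loss ties the fully-connected-layer parameters of related categories together, so that sparser child categories can borrow strength from their parents.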
Preferably, dropout is adopted at the fully-connected layer so that the activation output values of the fully-connected layer are randomly zeroed at a preset ratio.
Preferably, the convolutional layer adopts a linear rectification function (ReLU) as its activation function, and the activation function of the fully-connected layer is a Sigmoid function, a tanh(x) function or a softplus function.
In a preferred embodiment, the classification output layer is a softmax classifier.
In a preferred embodiment, the classification output layer comprises several Sigmoid functions.
In a preferred embodiment, the training module 101 is further configured to construct a tree structure according to the hierarchical relationships among the different categories, partition the tree structure into a plurality of subtrees, and train the fully-connected layer with the subtrees as units; the tree structure comprises a plurality of nodes and edges, the nodes correspond to the categories, and the edges point from a category of one level to the categories of the next level. The step of partitioning the tree structure into a plurality of subtrees is specifically as follows:
traversing from any node by a depth-first, pre-order traversal method, and when the number of leaf nodes obtained by the traversal equals a preset threshold, splitting that node and the other traversed nodes off as one subtree;
then, taking each finished subtree as a leaf node, traversing again from any node by the depth-first, pre-order traversal method, and if the number of leaf nodes obtained equals the preset threshold, splitting that node and the other traversed nodes off as one subtree.
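A simplified sketch of this partitioning (the children mapping and the exact splitting policy are assumptions; the description above leaves some freedom in how a finished subtree is re-attached as a leaf):

    def partition(children, root, threshold):
        # children: dict mapping each category node to the list of its child nodes
        subtrees, current, leaves = [], [], 0

        def dfs(node):
            nonlocal current, leaves
            current.append(node)                # pre-order: visit the node first
            if not children.get(node):          # a leaf category
                leaves += 1
            else:
                for child in children[node]:
                    dfs(child)
            if leaves >= threshold:             # enough leaves: split off the traversed nodes
                subtrees.append(current)
                current, leaves = [], 0         # the finished subtree now acts as one leaf

        dfs(root)
        if current:                             # the remainder forms the last subtree
            subtrees.append(current)
        return subtrees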
For the specific implementation process and working principle of the text classification device provided by the embodiment of the present invention, reference may be made to the specific description of the text classification method above; the details are not repeated here.
To sum up, the embodiments of the invention disclose a text classification method and device. A plurality of training texts of known categories are received; after the training texts are preprocessed, a graph structure of the training texts is constructed using the collinear relationships of words, and the parameters of a convolutional neural network are trained through a back-propagation algorithm according to that graph structure to obtain a trained convolutional neural network. An input text to be classified is then received; after it is preprocessed, its graph structure is constructed using the collinear relationships of words, and its category is predicted through the trained convolutional neural network according to that graph structure. The technical scheme thereby solves the problem of performing text classification with a convolutional neural network and improves the accuracy and reliability of text classification.
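As an illustration of the word "collinear" (co-occurrence) relationship used to build these graph structures, a minimal sketch assuming the text has already been preprocessed into a word sequence and the sliding window has a preset size w:

    from collections import defaultdict

    def build_word_graph(words, w=3):
        edges = defaultdict(set)            # adjacency: word -> set of words it points to
        for i, src in enumerate(words):
            for dst in words[i + 1:i + w]:  # every later word sharing a window with src
                if dst != src:
                    edges[src].add(dst)     # the edge points from the earlier word to the later one
        return edges

For example, build_word_graph(["text", "classification", "with", "graph", "structures"], w=3) links "text" to "classification" and "with", matching the window-based edge rule recited in claim 2.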
While the foregoing is directed to the preferred embodiment of the present invention, it will be appreciated by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (17)

1. A method of text classification, comprising the steps of:
receiving training texts of a plurality of known categories, preprocessing the training texts, constructing a graph structure of the training texts by using the collinear relationships of words, and training the parameters of a convolutional neural network through a back-propagation algorithm according to the graph structure of the training texts to obtain a trained convolutional neural network; the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one fully-connected layer and at least one classification output layer; in the graph structure of the training texts, the nodes correspond one-to-one to the words in the training texts;
receiving an input text to be classified, preprocessing the text to be classified, and constructing a graph structure of the text to be classified by using the collinear relationships of words; in the graph structure of the text to be classified, the nodes correspond one-to-one to the words in the text to be classified;
and predicting the category of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified.
2. The text classification method according to claim 1, wherein the preprocessing of the training text or the text to be classified is specifically:
after word segmentation is performed on the training text or the text to be classified, removing the noise and stop words of the training text or the text to be classified, and extracting the stems of all words in the training text or the text to be classified; wherein the noise comprises punctuation marks and numbers, and the stop words comprise pronouns, conjunctions, prepositions and articles;
the constructing of the graph structure of the training text or the text to be classified by using the collinear relationship of words specifically comprises the following steps:
traversing the training text or the text to be classified with a sliding window of preset size, and constructing an edge between any two words whenever the two words fall within the sliding window at the same time, wherein the edge points from the earlier word to the later word.
3. The method for classifying texts according to claim 1, wherein the predicting the classes of the texts to be classified through the trained convolutional neural network according to the graph structure of the texts to be classified specifically comprises:
constructing a plurality of subgraphs according to the graph structure of the text to be classified, and carrying out normalization processing on each subgraph to obtain the word vector representation of each node in each subgraph as the input of a convolutional neural network;
and predicting the category of the text to be classified according to the output result of the classification output layer.
4. The text classification method according to claim 3, wherein the step of constructing a plurality of subgraphs according to the graph structure of the text to be classified and normalizing each subgraph comprises the steps of:
extracting the nodes of the graph structure of the text to be classified, and ordering the nodes according to their contribution values; wherein the contribution value is determined, in that order, by the degree of each node, the frequency of the node's word in the text, and the collinearity of the node with its neighborhood nodes;
selecting the top N nodes of the ordering as key nodes, taking each key node as a root node, constructing subgraphs through a breadth-first search algorithm, and normalizing each subgraph; wherein each subgraph comprises at least k nodes, N > 0, k > 0.
5. The text classification method according to claim 4, wherein the constructing of subgraphs through the breadth-first search algorithm and the normalizing of each subgraph are specifically:
acquiring the adjacent nodes of the root node, and if the number of the adjacent nodes of the root node is greater than k-1, constructing a subgraph from the root node, the adjacent nodes of the root node and the edges between the root node and the adjacent nodes;
if the number of the adjacent nodes of the root node is less than k-1, acquiring secondary adjacent nodes of the root node step by step until the total number of the acquired adjacent nodes and secondary adjacent nodes is greater than or equal to k or no further secondary adjacent nodes can be acquired, and constructing a subgraph from the root node, the adjacent nodes and secondary adjacent nodes of the root node, the edges between the root node and the adjacent nodes, the edges between the adjacent nodes and the secondary adjacent nodes of the root node, and the edges between the secondary adjacent nodes; wherein a secondary adjacent node is a node indirectly connected to the root node;
constructing a spanning tree from the subgraph, and ordering the nodes of the spanning tree from the shallow layers to the deep layers by a breadth-first algorithm;
within the same layer, ordering the nodes according to the size of their contribution values;
when the number of nodes in the subgraph is greater than k, retaining the top k nodes of the spanning-tree ordering, thereby completing the normalization of the subgraph;
when the number of nodes in the subgraph is less than k, adding a number of dummy nodes to the subgraph so that the number of nodes in the subgraph equals k, thereby completing the normalization of the subgraph; wherein the dummy nodes are not connected to any node in the original subgraph.
6. The text classification method according to claim 1, wherein the training of the parameters of the convolutional neural network through the back-propagation algorithm to obtain the trained convolutional neural network specifically comprises:
initializing the parameters of the convolutional neural network, and propagating the training texts of known categories forward through the convolutional neural network to obtain output results;
performing back propagation according to the error between the output results and the labels of the training texts, distributing the error to each layer in the convolutional neural network to obtain the error data of each layer, and correcting the parameters of the convolutional neural network according to the error data of each layer.
7. The text classification method according to claim 6, wherein the back propagation of the error according to the output results and the labels of the training texts, the distribution of the error to each layer in the convolutional neural network to obtain the error data of each layer, and the correction of the parameters of the convolutional neural network according to the error data of each layer are specifically:
constructing a loss function according to the output result and the known class of the training text, and acquiring a residual error of any neural node in the convolutional neural network according to the loss function; wherein the loss function is:
J = H + Cλ(w)
wherein H is a cross-entropy term and Cλ(w) is a regularization term that prevents overfitting;
performing a recursive operation according to the residual errors of the nodes to update the parameters of each neural node; wherein the parameters of a neural node comprise the neural-network weights and biases.
8. The text classification method according to claim 7, wherein when the classification output layer comprises a plurality of classifiers, the cross-entropy term is specifically:

H = -Σ_{m=1..M} Σ_{k=1..K} [ l_k(d_m)·log P_k(d_m) + (1 - l_k(d_m))·log(1 - P_k(d_m)) ]

wherein M is the number of training texts of known classes, K is the number of classifiers of the classification output layer, l_k(d_m) is the binary label indicating whether training text d_m belongs to class k, and P_k(d_m) is the probability of class k predicted for training text d_m by the convolutional neural network.
9. The text classification method according to claim 8, further comprising the step of:
constructing a hierarchical relationship according to the different categories, and obtaining a regularization term at the fully-connected layer through the following formula to update the parameters of the fully-connected layer:

Cλ(w) = (λ/2) · Σ_(p,c) ||w_p - w_c||²

wherein w_p is the weight of the parent category in the hierarchical relationship, w_c is the weight of the corresponding child category, and the sum runs over all parent-child category pairs of the hierarchy.
10. The text classification method according to claim 1, wherein the word vector of each node in the graph structure of the training text or the text to be classified is obtained through a word2vec model or a GloVe model.
11. The method for classifying text according to claim 1, wherein the pooling layer down-samples the feature matrix using an average pooling, a maximum pooling, or a random pooling method.
12. The text classification method according to claim 1, characterized in that dropout is adopted at the fully-connected layer so that the activation output values of the fully-connected layer are randomly cleared at a preset ratio.
13. The text classification method according to claim 6, wherein the convolutional layer adopts a linear rectification function (ReLU) as its activation function, and the activation function of the fully-connected layer is a Sigmoid function, a tanh(x) function or a softplus function.
14. The text classification method according to claim 1, wherein the classification output layer is a softmax classifier.
15. The text classification method according to claim 1, wherein the classification output layer comprises a number of Sigmoid functions.
16. The text classification method according to claim 1, characterized in that the method further comprises the steps of:
constructing a tree structure according to the hierarchical relationships among the different categories, partitioning the tree structure into a plurality of subtrees, and training the fully-connected layer with the subtrees as units; the tree structure comprises a plurality of category nodes and category edges, the category nodes correspond to the categories, and the category edges point from a category of one level to the categories of the next level; wherein the step of partitioning the tree structure into a plurality of subtrees specifically comprises:
traversing from any category node by a depth-first, pre-order traversal method, and when the number of leaf nodes obtained by the traversal equals a preset threshold, splitting that category node and the other traversed category nodes off as one subtree;
then, taking each finished subtree as a category node and leaf node, traversing again from any category node by the depth-first, pre-order traversal method, and if the number of leaf nodes obtained by the traversal equals the preset threshold, splitting that category node and the other traversed category nodes off as one subtree.
17. A text classification apparatus, comprising:
the training module is used for receiving training texts of a plurality of known categories, preprocessing the training texts, constructing a graph structure of the training texts by using the collinear relationships of words, and training the parameters of a convolutional neural network through a back-propagation algorithm according to the graph structure of the training texts to obtain a trained convolutional neural network; the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one fully-connected layer and at least one classification output layer; in the graph structure of the training texts, the nodes correspond to the words in the training texts;
the system comprises a to-be-classified text receiving module, a to-be-classified text receiving module and a to-be-classified text preprocessing module, wherein the to-be-classified text receiving module is used for receiving an input to-be-classified text, and constructing a graph structure of the to-be-classified text by adopting a collinear relation of words after preprocessing the to-be-classified text; in the graph structure of the text to be classified, nodes correspond to words in the text to be classified;
and the class prediction module is used for predicting the class of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified.
CN201710642105.0A 2017-07-31 2017-07-31 Text classification method and device Active CN107526785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710642105.0A CN107526785B (en) 2017-07-31 2017-07-31 Text classification method and device

Publications (2)

Publication Number Publication Date
CN107526785A true CN107526785A (en) 2017-12-29
CN107526785B CN107526785B (en) 2020-07-17

Family

ID=60680376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710642105.0A Active CN107526785B (en) 2017-07-31 2017-07-31 Text classification method and device

Country Status (1)

Country Link
CN (1) CN107526785B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101741611A (en) * 2009-12-03 2010-06-16 哈尔滨工业大学 MLkP/CR algorithm-based undirected graph dividing method
CN103473380A (en) * 2013-09-30 2013-12-25 南京大学 Computer text sentiment classification method
CN105740349A (en) * 2016-01-25 2016-07-06 重庆邮电大学 Sentiment classification method capable of combining Doc2vce with convolutional neural network
CN106991132A (en) * 2017-03-08 2017-07-28 南京信息工程大学 A kind of figure sorting technique reconstructed based on atlas with kernel of graph dimensionality reduction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
袁艳红 (Yuan Yanhong): "Research on Text Classification Based on the Word2Vec Language Model and Graph Kernel Design", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110096588A (en) * 2018-01-30 2019-08-06 埃森哲环球解决方案有限公司 Across language text classification
CN110309293A (en) * 2018-02-13 2019-10-08 北京京东尚科信息技术有限公司 Text recommended method and device
CN108595497B (en) * 2018-03-16 2019-09-27 北京达佳互联信息技术有限公司 Data screening method, apparatus and terminal
CN108595497A (en) * 2018-03-16 2018-09-28 北京达佳互联信息技术有限公司 Data screening method, apparatus and terminal
CN108875779A (en) * 2018-05-07 2018-11-23 深圳市恒扬数据股份有限公司 Training method, device and the terminal device of neural network
CN108804404B (en) * 2018-05-29 2022-04-15 周宇 Character text processing method and device
CN108804404A (en) * 2018-05-29 2018-11-13 周宇 Character text processing method and processing device
CN108920455A (en) * 2018-06-13 2018-11-30 北京信息科技大学 A kind of Chinese automatically generates the automatic evaluation method of text
CN110826377A (en) * 2018-08-13 2020-02-21 珠海格力电器股份有限公司 Material sorting method and device
CN110913354A (en) * 2018-09-17 2020-03-24 阿里巴巴集团控股有限公司 Short message classification method and device and electronic equipment
CN109543029A (en) * 2018-09-27 2019-03-29 平安科技(深圳)有限公司 File classification method, device, medium and equipment based on convolutional neural networks
CN109543029B (en) * 2018-09-27 2023-07-25 平安科技(深圳)有限公司 Text classification method, device, medium and equipment based on convolutional neural network
CN109471944B (en) * 2018-11-12 2021-07-16 中山大学 Training method and device of text classification model and readable storage medium
CN109471944A (en) * 2018-11-12 2019-03-15 中山大学 Training method, device and the readable storage medium storing program for executing of textual classification model
CN109710755A (en) * 2018-11-22 2019-05-03 合肥联宝信息技术有限公司 Training BP neural network model method and device and the method and apparatus that text classification is carried out based on BP neural network
CN109739979A (en) * 2018-12-11 2019-05-10 中科恒运股份有限公司 Tuning method, tuning device and the terminal of neural network
CN109726285A (en) * 2018-12-18 2019-05-07 广州多益网络股份有限公司 A kind of file classification method, device, storage medium and terminal device
CN109740482A (en) * 2018-12-26 2019-05-10 北京科技大学 A kind of image text recognition methods and device
CN109599096B (en) * 2019-01-25 2021-12-07 科大讯飞股份有限公司 Data screening method and device
CN109599096A (en) * 2019-01-25 2019-04-09 科大讯飞股份有限公司 A kind of data screening method and device
CN110019653A (en) * 2019-04-08 2019-07-16 北京航空航天大学 A kind of the social content characterizing method and system of fusing text and label network
CN110019653B (en) * 2019-04-08 2021-07-02 北京航空航天大学 Social content representation method and system fusing text and tag network
CN110704626B (en) * 2019-09-30 2022-07-22 北京邮电大学 Short text classification method and device
CN110704626A (en) * 2019-09-30 2020-01-17 北京邮电大学 Short text classification method and device
CN111145906A (en) * 2019-12-31 2020-05-12 清华大学 Item determination method, related device and readable storage medium
CN111241294B (en) * 2019-12-31 2023-05-26 中国地质大学(武汉) Relationship extraction method of graph convolution network based on dependency analysis and keywords
CN111145906B (en) * 2019-12-31 2024-04-30 清华大学 Project judging method, related device and readable storage medium
CN111241294A (en) * 2019-12-31 2020-06-05 中国地质大学(武汉) Graph convolution network relation extraction method based on dependency analysis and key words
CN111291823A (en) * 2020-02-24 2020-06-16 腾讯科技(深圳)有限公司 Fusion method and device of classification models, electronic equipment and storage medium
CN111291823B (en) * 2020-02-24 2023-08-18 腾讯科技(深圳)有限公司 Fusion method and device of classification model, electronic equipment and storage medium
CN113642697A (en) * 2020-04-27 2021-11-12 郑州芯兰德网络科技有限公司 Distributed multi-level graph network training method and system
CN113642697B (en) * 2020-04-27 2024-02-20 郑州芯兰德网络科技有限公司 Distributed multi-level graph network training method and system
CN111598093A (en) * 2020-05-25 2020-08-28 深圳前海微众银行股份有限公司 Method, device, equipment and medium for generating structured information of characters in picture
CN111538870A (en) * 2020-07-07 2020-08-14 北京百度网讯科技有限公司 Text expression method and device, electronic equipment and readable storage medium
CN111538870B (en) * 2020-07-07 2020-12-18 北京百度网讯科技有限公司 Text expression method and device, electronic equipment and readable storage medium
CN112329669B (en) * 2020-11-11 2021-11-16 国网黑龙江省电力有限公司电力科学研究院 Electronic file management method
CN112329669A (en) * 2020-11-11 2021-02-05 孙立业 Electronic file management method
WO2022105123A1 (en) * 2020-11-19 2022-05-27 平安科技(深圳)有限公司 Text classification method, topic generation method, apparatus, device, and medium
CN112597764A (en) * 2020-12-23 2021-04-02 青岛海尔科技有限公司 Text classification method and device, storage medium and electronic device
CN113094549A (en) * 2021-06-10 2021-07-09 智者四海(北京)技术有限公司 Video classification method and device, electronic equipment and storage medium
CN113886438A (en) * 2021-12-08 2022-01-04 济宁景泽信息科技有限公司 Artificial intelligence-based achievement transfer transformation data screening method
CN113886438B (en) * 2021-12-08 2022-03-15 济宁景泽信息科技有限公司 Artificial intelligence-based achievement transfer transformation data screening method
CN116244738A (en) * 2022-12-30 2023-06-09 浙江御安信息技术有限公司 Sensitive information detection method based on graph neural network

Also Published As

Publication number Publication date
CN107526785B (en) 2020-07-17

Similar Documents

Publication Publication Date Title
CN107526785B (en) Text classification method and device
CN109271522B (en) Comment emotion classification method and system based on deep hybrid model transfer learning
CN108334605B (en) Text classification method and device, computer equipment and storage medium
CN112163426B (en) Relationship extraction method based on combination of attention mechanism and graph long-time memory neural network
Thoma Analysis and optimization of convolutional neural network architectures
CN110222634B (en) Human body posture recognition method based on convolutional neural network
CN112085012B (en) Project name and category identification method and device
CN112966691B (en) Multi-scale text detection method and device based on semantic segmentation and electronic equipment
CN109993100B (en) Method for realizing facial expression recognition based on deep feature clustering
CN110188195B (en) Text intention recognition method, device and equipment based on deep learning
CN110969020A (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN107683469A (en) A kind of product classification method and device based on deep learning
CN112949647B (en) Three-dimensional scene description method and device, electronic equipment and storage medium
CN110263174B (en) Topic category analysis method based on focus attention
CN107909115A (en) A kind of image Chinese subtitle generation method
CN113673482B (en) Cell antinuclear antibody fluorescence recognition method and system based on dynamic label distribution
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
Wu et al. Optimized deep learning framework for water distribution data-driven modeling
CN113220876A (en) Multi-label classification method and system for English text
CN114529903A (en) Text refinement network
CN114743037A (en) Deep medical image clustering method based on multi-scale structure learning
CN116152554A (en) Knowledge-guided small sample image recognition system
US20220121949A1 (en) Personalized neural network pruning
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN114741473B (en) Event extraction method based on multi-task learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant