CN107526785B - Text classification method and device

Text classification method and device

Info

Publication number
CN107526785B
CN107526785B
Authority
CN
China
Prior art keywords
text
nodes
node
classified
training
Prior art date
Legal status
Active
Application number
CN201710642105.0A
Other languages
Chinese (zh)
Other versions
CN107526785A (en)
Inventor
彭浩
李建欣
何雨
刘垚鹏
包梦蛟
宋阳秋
杨强
Current Assignee
Guangzhou HKUST Fok Ying Tung Research Institute
Original Assignee
Guangzhou HKUST Fok Ying Tung Research Institute
Priority date
Filing date
Publication date
Application filed by Guangzhou HKUST Fok Ying Tung Research Institute filed Critical Guangzhou HKUST Fok Ying Tung Research Institute
Priority to CN201710642105.0A
Publication of CN107526785A
Application granted
Publication of CN107526785B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The text classification method and device receive a plurality of training texts of known classes, preprocess the training texts, construct a graph structure for the training texts from word co-occurrence relations, and train the parameters of a convolutional neural network through a back propagation algorithm according to the graph structures of the training texts, obtaining a trained convolutional neural network. An input text to be classified is then received and preprocessed, a graph structure of the text to be classified is constructed from word co-occurrence relations, and the class of the text to be classified is predicted by the trained convolutional neural network according to that graph structure. This technical scheme applies a convolutional neural network to the problem of text classification and improves the accuracy and reliability of text classification.

Description

Text classification method and device
Technical Field
The invention relates to the field of machine learning, in particular to a text classification method and device.
Background
A convolutional neural network is an artificial neural network with deep learning capability, designed on the principles of the visual neural mechanism of primates. Hubel and Wiesel proposed a visual structure model based on the visual cortex of the cat in 1962 and first introduced the concept of receptive fields. However, with the advent of simpler and more efficient linear classifiers such as support vector machines, and with the local-minimum problem ubiquitous in the non-convex cost functions of deep structures, neural network research fell into a low tide lasting roughly two decades. Hinton et al. then proposed an unsupervised greedy layer-wise training method based on the Deep Belief Network (DBN) to address the optimization problems associated with deep structures.
Conventional convolutional neural networks are generally used for image classification; applying the convolutional neural network to text classification, so as to improve the accuracy and reliability of text classification, is an urgent need.
Disclosure of Invention
The embodiment of the invention aims to provide a text classification method and a text classification device, which remedy the prior art's lack of convolutional-neural-network-based text classification and improve the accuracy and reliability of text classification.
In order to achieve the above object, an embodiment of the present invention provides a text classification method, including:
receiving training texts of a plurality of known categories, preprocessing the training texts, constructing a graph structure of the training texts by adopting a word co-occurrence relation, and training parameters of the convolutional neural network through a back propagation algorithm according to the graph structure of the training texts to obtain the trained convolutional neural network; the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one full-connection layer and at least one classification output layer; in the graph structure of the training text, nodes correspond to words in the training text one by one;
receiving an input text to be classified, preprocessing the text to be classified, and constructing a graph structure of the text to be classified by adopting a word co-occurrence relation; in the graph structure of the text to be classified, nodes correspond to words in the text to be classified one by one;
and predicting the category of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified.
Compared with the prior art, the text classification method disclosed by the invention receives a plurality of training texts of known classes, preprocesses them, constructs their graph structures from word co-occurrence relations, and trains the parameters of a convolutional neural network through a back propagation algorithm according to those graph structures to obtain a trained convolutional neural network; it then receives an input text to be classified, preprocesses it, constructs its graph structure from word co-occurrence relations, and predicts the class of the text to be classified through the trained convolutional neural network according to that graph structure. This technical scheme applies a convolutional neural network to the problem of text classification and improves the accuracy and reliability of text classification.
As an improvement of the above scheme, the preprocessing of the training text or the text to be classified specifically includes:
after word segmentation processing is carried out on the training text or the text to be classified, noise and stop words are removed, and the word stems of all words in the text are extracted; wherein the noise comprises punctuation marks and numbers, and the stop words comprise pronouns, conjunctions, prepositions and articles;
constructing a graph structure of the training text or the text to be classified by adopting a co-occurrence relation of words, which specifically comprises the following steps:
traversing the training text or the text to be classified with a sliding window of preset size, and constructing an edge between any two words whenever the two words lie in the sliding window at the same time, wherein the edge points from the earlier word to the later word. Removing words that are semantically weak and carry no practical meaning is necessary for highlighting the theme of the text and classifying it accurately.
As an improvement of the above scheme, predicting the category of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified specifically includes:
constructing a plurality of subgraphs according to the graph structure of the text to be classified, carrying out normalization processing on each subgraph, and obtaining word vector representation of each node in each subgraph as the input of a convolutional neural network;
and predicting the category of the text to be classified according to the output result of the classification output layer. Each subgraph carries different semantic information, and the corresponding word groups may lie far apart in an N-gram model of the text; for text classification, high-level abstract features can be obtained by extracting features from the different subgraphs and then fusing them, so that an accurate text classification result is obtained.
As an improvement of the above scheme, constructing a plurality of subgraphs according to the graph structure of the text to be classified and normalizing each subgraph comprise the steps of:
extracting the nodes of the graph structure of the text to be classified and sorting them by contribution value, wherein the contribution value is determined, in order of priority, by the degree of each node, the frequency in the text of the word corresponding to the node, and the co-occurrence rate of the node with its neighborhood nodes;
selecting the top N nodes of this ordering as key nodes, taking each key node as a root node, constructing subgraphs through a breadth-first search algorithm, and normalizing each subgraph; wherein each subgraph includes at least k nodes, N > 0, k > 0.
As an improvement of the above scheme, the constructing of subgraphs by a breadth-first search algorithm, and the normalization processing of each subgraph specifically comprises:
acquiring adjacent nodes of the root node, and if the number of the adjacent nodes of the root node is greater than k-1, constructing a subgraph by using the root node, the adjacent nodes of the root node and the edges of the root node and the adjacent nodes;
if the number of the adjacent nodes of the root node is less than k-1, acquiring secondary adjacent nodes of the root node step by step until the total number of the acquired adjacent nodes and secondary adjacent nodes is greater than or equal to k or the secondary adjacent nodes cannot be acquired continuously, and constructing a subgraph according to the root node, the adjacent nodes and the secondary adjacent nodes of the root node, edges of the root node and the adjacent nodes, edges of the adjacent nodes and the secondary adjacent nodes of the root node and edges between the secondary adjacent nodes; wherein the secondary neighboring node is a node indirectly connected to the root node;
constructing a spanning tree from the subgraph, and ordering the nodes of the spanning tree from the shallow layers to the deep layers by a breadth-first algorithm;
within the same layer, ordering the neighboring nodes according to the size of their contribution values;
when the subgraph has more than k nodes, retaining the first k nodes of the spanning-tree ordering, thereby completing the normalization of the subgraph;
when the subgraph has fewer than k nodes, adding dummy nodes to the subgraph until the number of its nodes equals k, thereby completing the normalization of the subgraph; wherein a dummy node is not connected with any node of the original subgraph.
As an improvement of the above scheme, the training of the parameters of the convolutional neural network through a back propagation algorithm to obtain the trained convolutional neural network specifically includes:
initializing parameters of the convolutional neural network, and carrying out forward propagation on the training texts of the known types through the convolutional neural network to obtain output results;
and performing back propagation according to the output result and the error marked by the training text, distributing the error to each layer in the convolutional neural network to obtain error data of each layer, and correcting the parameters of the convolutional neural network according to the error data of each layer.
As an improvement of the above scheme, performing back propagation according to the output result and the label error of the training text, distributing the error to each layer of the convolutional neural network to obtain per-layer error data, and correcting the parameters of the convolutional neural network according to that error data specifically includes:
constructing a loss function according to the output result and the known class of the training text, and acquiring a residual error of any neural node in the convolutional neural network according to the loss function; wherein the loss function is:
J = H + C_λ(w)
wherein H is a cross-entropy term and C_λ(w) is a regularization term that prevents overfitting;
carrying out recursive operation according to the residual error of the node to update the parameter of each neural node; wherein the parameters of the neural node include neural network weights and biases.
As an improvement of the above scheme, when the classification output layer includes a plurality of classifiers, the cross entropy term is specifically:
H = -∑_{m=1}^{M} ∑_{k1=1}^{K} [ y_{m,k1} log p_{m,k1} + (1 - y_{m,k1}) log(1 - p_{m,k1}) ]
wherein M is the number of training texts of known classes, K is the number of classifiers of the classification output layer, y_{m,k1} is the binary label of whether training text d_m belongs to class k1, and p_{m,k1} is the probability that the convolutional neural network predicts class k1 for training text d_m.
As an improvement of the above, the method further comprises the steps of:
constructing a hierarchical relationship according to the different categories, and obtaining the regularization term at the fully connected layer through the following formula to update the parameters of the fully connected layer:
C_λ(w) = (λ/2) ∑_{(p,c)} ||w_p - w_c||²
wherein w_p is the weight of the parent category in the hierarchical relationship, w_c is the weight of the child category, and the sum runs over all parent-child pairs (p, c) in the hierarchy.
As an improvement of the scheme, the word vector of each node in the graph structure of the text is represented by a word2vec model or a GloVe model.
As an improvement of the scheme, the pooling layer downsamples the feature matrix using mean pooling, max pooling or stochastic pooling.
As an improvement of the above scheme, dropout is adopted in the fully connected layer to randomly clear the activation output value of the fully connected layer at a preset ratio.
As an improvement of the above solution, the convolutional layer adopts the rectified linear unit (ReLU) as its activation function, and the activation function of the fully connected layer is a Sigmoid, tanh(x) or softplus function.
As an improvement of the scheme, the classifier is a softmax classifier.
As an improvement of the above scheme, the classifier includes several Sigmoid functions.
As an improvement of the above, the method further comprises the steps of:
constructing a tree structure according to the hierarchical relationship among the different categories, dividing the tree structure into a plurality of subtrees, and training the fully connected layer with the subtree as the unit; the tree structure comprises a plurality of category nodes and category edges, the category nodes correspond to categories, and each category edge points from a category of one level to a category of the next level; wherein dividing the tree structure into a plurality of subtrees specifically comprises:
traversing from any category node by a depth-first, pre-order traversal, and when the number of leaf nodes traversed equals a preset threshold, splitting that category node and the other traversed category nodes off into one subtree;
treating each such subtree as a single category (leaf) node, traversing again from any category node by the same depth-first, pre-order traversal, and when the number of leaf nodes traversed equals the preset threshold, splitting that category node and the other traversed category nodes off into one subtree.
An embodiment of the present invention further provides a text classification apparatus, including:
the training module is used for receiving training texts of a plurality of known categories, preprocessing the training texts, constructing a graph structure of the training texts by adopting word co-occurrence relation, and training parameters of the convolutional neural network through a back propagation algorithm according to the graph structure of the training texts to obtain the trained convolutional neural network; the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one full-connection layer and at least one classification output layer; in the graph structure of the training text, nodes correspond to words in the training text;
the to-be-classified text receiving module is used for receiving an input text to be classified, preprocessing it, and constructing a graph structure of the text to be classified by adopting word co-occurrence relations; in the graph structure of the text to be classified, nodes correspond to the words in the text to be classified;
and the class prediction module is used for predicting the class of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified.
The text classification device disclosed by the invention receives a plurality of training texts of known classes through the training module, preprocesses them, constructs their graph structures from word co-occurrence relations, and trains the parameters of a convolutional neural network through a back propagation algorithm according to those graph structures to obtain a trained convolutional neural network. The to-be-classified text receiving module then receives an input text to be classified, preprocesses it, and constructs its graph structure from word co-occurrence relations, after which the category prediction module predicts the category of the text to be classified through the trained convolutional neural network according to that graph structure. This technical scheme applies a convolutional neural network to the problem of text classification and improves the accuracy and reliability of text classification.
Drawings
Fig. 1 is a schematic flowchart of a text classification method provided in embodiment 1 of the present invention.
FIG. 2 is a schematic diagram of a graph structure for constructing text in accordance with the present invention.
Fig. 3 shows the working process of the convolutional neural network of the present invention.
FIG. 4 shows the working process of subgraph construction and normalization for the graph structure of a text in the present invention.
FIG. 5 is a schematic diagram of the parent-child node relationships of the categories of the present invention.
FIG. 6 is a diagram illustrating the partitioning of categories into training units according to the present invention.
Fig. 7 is a schematic structural diagram of a text classification apparatus according to embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, which is a schematic flow chart of a text classification method provided in embodiment 1 of the present invention, including the steps of:
s1, receiving training texts of a plurality of known types, preprocessing the training texts, constructing a graph structure of the training texts by adopting word co-occurrence relation, and training parameters of the convolutional neural network through a back propagation algorithm according to the graph structure of the training texts to obtain the trained convolutional neural network; the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one full-connection layer and at least one classification output layer; in the graph structure of the training text, nodes correspond to words in the training text one by one;
It should be noted that the back propagation (BP) process is a bottom-up process, belongs to the supervised learning algorithms, and is suitable for training feed-forward neural networks. In this step, training the parameters of the convolutional neural network through a back propagation algorithm to obtain the trained convolutional neural network specifically includes:
initializing parameters of the convolutional neural network, and forward-propagating the training texts of known classes through the convolutional neural network to obtain output results; preferably, the parameters of the convolutional neural network can be initialized with a robust weight initialization method, drawing the weights from a zero-mean Gaussian distribution.
And performing back propagation according to the output result and the error marked by the training text, distributing the error to each layer in the convolutional neural network to obtain error data of each layer, and correcting the parameters of the convolutional neural network according to the error data of each layer.
In the embodiment of the invention, a convolutional neural network is adopted for text classification; it can operate directly on the original input data, and the training method extracts and discovers the best features by adjusting trainable parameters. The input layer acts directly on the original input data; for the graph structure of a text, the input data is a word vector representation of the text. The convolutional layer, also called the feature extraction layer, applies convolution kernels (also called filters), which are characterized by their size, number and stride. The number of convolution kernels determines the number of feature maps obtained by convolving the output of the upper layer; the more feature maps are extracted, the larger the feature space the network can represent and the stronger its learning ability. However, too many convolution kernels increase the complexity of the network, the number of parameters and the computational cost, and easily cause overfitting, so the number of kernels must be chosen according to the size of the data set in the specific application. The size of a convolution kernel determines the size of the feature map, and its stride determines the step and the number of extracted features. The pooling layer, also called the subsampling layer, mainly aims to reduce the amount of data to be processed and accelerate network training while retaining the useful information. The more convolutional and pooling layers there are, the more abstract are the features that can be extracted on top of the previous layers. Further, the pooling layer downsamples the feature matrix by mean pooling, max pooling or stochastic pooling. Mean pooling averages the feature points in a neighborhood, and max pooling takes their maximum. Stochastic pooling proceeds as follows: first the sum of the matrix to be pooled is computed, then the proportion of each element in that sum is taken as its sampling probability, and an element is sampled according to these probabilities, so larger elements are sampled with higher probability.
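As an illustration of the pooling variants above, the following is a minimal NumPy sketch of stochastic pooling on a single window (the function name and the 2×2 window are illustrative assumptions, and non-negative activations, e.g. after ReLU, are assumed):

    import numpy as np

    def stochastic_pool(window):
        # each element's proportion of the window sum is its sampling probability
        w = np.asarray(window, dtype=float).ravel()
        p = w / w.sum()                     # larger elements get higher probability
        return np.random.choice(w, p=p)     # sample one activation from the window

    window = [[1.0, 3.0], [2.0, 6.0]]
    mean_val = np.mean(window)              # mean pooling: average of the window
    max_val = np.max(window)                # max pooling: maximum of the window
    rand_val = stochastic_pool(window)      # stochastic pooling: 6.0 with prob. 0.5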
S2, receiving an input text to be classified, preprocessing the text to be classified, and constructing a graph structure of the text to be classified by adopting a word co-occurrence relation; in the graph structure of the text to be classified, nodes correspond to words in the text to be classified one by one;
the preprocessing of the training text or the text to be classified specifically comprises the following steps:
after word segmentation processing is carried out on the training text or the text to be classified, noise and stop words are removed, and the word stems of all words in the text are extracted; wherein the noise comprises punctuation marks and numbers, and the stop words comprise pronouns, conjunctions, prepositions and articles. Specifically, since noise and stop words have no practical meaning and carry little information, they contribute little to distinguishing texts (apart from their role in language models and sentiment analysis), and therefore need to be removed in text classification. In practical applications, a stop word list is usually established in advance, and each word obtained by segmentation is then matched against the list: if the word appears in the list, it is a stop word and is deleted; if not, it is retained. Besides stop words proper, there are many semantically vague adverbs, numerals and directional words, such as "in", "one" and "very", which contribute little to the content of the text; removing such semantically weak words with no practical meaning is necessary for highlighting the subject of the text and classifying it accurately.
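A minimal sketch of this preprocessing pipeline, assuming English text and the NLTK library (the patent names no toolkit; the tokenization regex, stop-word list and Porter stemmer are illustrative choices):

    import re
    from nltk.corpus import stopwords       # requires nltk.download("stopwords")
    from nltk.stem import PorterStemmer

    def preprocess(text):
        stop = set(stopwords.words("english"))          # pronouns, prepositions, ...
        stemmer = PorterStemmer()
        tokens = re.findall(r"[a-z]+", text.lower())    # drops punctuation and numbers
        return [stemmer.stem(t) for t in tokens if t not in stop]

    print(preprocess("The 2 clubs fitted well in England!"))
    # e.g. ['club', 'fit', 'well', 'england']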
In addition, a graph structure of the training text or the text to be classified is constructed by adopting a co-occurrence relation of words, and the method specifically comprises the following steps:
traversing the training text or the text to be classified with a sliding window of preset size, and constructing an edge between any two words whenever the two words lie in the sliding window at the same time, wherein the edge points from the earlier word to the later word. Constructing a graph structure for the text also allows classification to be combined with existing graph convolutional neural network methods, which gives high accuracy and model parameters that are easy to train. A graph is a data structure composed of a finite, non-empty set of vertices and the edges between them, usually denoted G = (V, E, W), where G is the graph, V is the set of vertices of G, E is the set of edges of G, and W is the set of weights of the vertices and edges. That is, a graph structure consists of nodes, directed edges connecting the nodes, and weights representing the importance of the nodes and edges. Building the graph structure of a text from co-occurrence relations captures the sequential relations (context) of the words without losing the key information of the text, which is conducive to an accurate classification result. As shown in fig. 2, the sliding window has size 3, and one edge is constructed for every co-occurrence. Of course, the edges between every pair of words may also be merged into one, with the weight of the edge proportional to the co-occurrence rate of the two words, or proportional to the similarity of the two words the edge connects.
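A minimal sketch of the sliding-window graph construction, assuming a list of preprocessed tokens and window size 3 as in fig. 2 (representing the directed edges as a weight dictionary corresponds to the merged-edge variant mentioned above):

    from collections import defaultdict

    def build_graph(tokens, window=3):
        # one directed edge (earlier word -> later word) per co-occurrence;
        # repeated co-occurrences accumulate as the weight of a merged edge
        edges = defaultdict(int)
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + window, len(tokens))):
                edges[(tokens[i], tokens[j])] += 1
        return set(tokens), dict(edges)     # nodes correspond one-to-one to words

    nodes, edges = build_graph(["club", "fit", "well", "club", "england"])
    # edges include ('club', 'fit'), ('club', 'well'), ('fit', 'well'), ...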
And S3, predicting the category of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified.
The convolutional layer preferably uses the rectified linear unit (ReLU) as its activation function: when the input value is less than or equal to zero, the output is forced to zero; when the input value is greater than zero, it is kept unchanged. This brings moderate sparsity to the trained network, can greatly reduce training time, improves the performance of the network, and is closer to the nature of biological neuron activation, namely the firing principle of a neuron signal.
Preferably, the classification output layer is a softmax classifier. The softmax is expanded on the basis of logistic regression, can solve the problem of multi-classification, and preferably calculates the distribution probability of each class through the following formula:
P(y = i) = exp(s_i) / ∑_{j=1}^{n} exp(s_j)
wherein s_i represents the output value of the i-th neuron of the softmax classifier, s_i = F · η, where F is the word vector of the key nodes of a given training text, η is the corresponding weight, and n is the number of categories to be classified.
Preferably, the classification output layer comprises several Sigmoid classifiers. The Sigmoid function is a commonly used S-shaped nonlinear activation function, given by f(x) = 1 / (1 + e^(-x)), whose effect is to compress a real number into the interval (0, 1). During learning it pushes important features toward the middle region and unimportant features toward the two side regions, consistent with the synapses of neurons in neurology.
Preferably, since the loss function is non-convex and has no analytic solution, it is solved numerically by an optimization algorithm; the optimization algorithm can be stochastic gradient descent, an adaptive gradient algorithm, or Nesterov's accelerated gradient algorithm. When the training data set has too few samples or the data are over-trained, overfitting occurs; to strengthen the generalization ability of the network and prevent overfitting, the main methods include data augmentation, weight decay, dropout, DropConnect and the like. Dropout keeps part of the network inactive during model training, i.e. it sets a node's output value to zero with a certain probability, and the weights corresponding to that node are not updated during back propagation; DropConnect instead zeroes the input weights of a neural node with a certain probability. Both dropout and DropConnect can reduce network overfitting, suppress the classification error rate of the network, and improve its performance. Preferably, dropout can be adopted in the fully connected layer so that the activation output values of the fully connected layer are randomly zeroed at a preset ratio.
When the method is specifically implemented, a training text of a known class is received, a graph structure is constructed based on the training text, parameters of the convolutional neural network are trained through a back propagation algorithm to obtain the trained convolutional neural network, then the input text to be classified is received, the graph structure of the text to be classified is constructed, the class of the text to be classified is predicted through the trained convolutional neural network according to the graph structure of the text to be classified, the convolutional neural network for processing natural images is applied to text classification, the accuracy of text classification can be improved, and the method is fast and effective.
In recognition tasks on natural images, a common strategy of convolutional neural networks is to divide the image into a number of sub-regions and then extract features from each. Preferably, in this scheme, a plurality of subgraphs are likewise constructed from the graph structure of the text as the input of the convolutional neural network. Then, predicting the category of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified in step S3 specifically includes:
constructing a plurality of subgraphs according to the graph structure of the text to be classified, carrying out normalization processing on each subgraph, and obtaining word vector representation of each node in each subgraph as the input of a convolutional neural network;
predicting the category of the text to be classified according to the output result of the classification output layer;
the convolution layer is used for receiving a matrix input by the previous layer to carry out convolution operation to generate a characteristic matrix, and the pooling layer carries out down-sampling operation to generate a characteristic mapping matrix by receiving the characteristic matrix output by the previous convolution layer; the full connection layer is used for performing characteristic weighting operation on the characteristic mapping matrix output by the previous pooling layer and outputting an attribute characteristic matrix; the classification output layer is used for receiving the attribute feature matrix output by the last full connection layer to obtain the output result of the classification output layer, and predicting the category of the text to be classified according to the output result of the classification output layer.
Each subgraph carries different semantic information, and the corresponding word groups may lie far apart in an N-gram model of the text; for text classification, high-level abstract features can be obtained by extracting features from the different subgraphs and then fusing them, yielding an accurate text classification result. Moreover, this accords with the local-perception characteristic of artificial neural networks, is more efficient, greatly reduces the number of training parameters, and has a great advantage in training speed.
Preferably, constructing a plurality of subgraphs according to the graph structure of the text to be classified and normalizing each subgraph comprise the steps of (see the sketch after this list):
extracting the nodes of the graph structure of the text to be classified and sorting them by contribution value, wherein the contribution value is determined, in order of priority, by the degree of each node, the frequency in the text of the word corresponding to the node, and the co-occurrence rate of the node with its neighborhood nodes;
selecting the top N nodes of this ordering as key nodes, taking each key node as a root node, constructing subgraphs through a breadth-first search algorithm, and normalizing each subgraph; wherein each subgraph includes at least k nodes, N > 0, k > 0.
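A minimal sketch of the key-node selection, reusing the edge dictionary of the earlier graph sketch; ordering by the tuple (degree, word frequency, neighborhood co-occurrence) mirrors the stated priority order, while the exact co-occurrence statistic is an illustrative assumption:

    from collections import Counter

    def key_nodes(tokens, edges, top_n):
        tf = Counter(tokens)                                  # word frequency in the text
        degree, cooc = Counter(), Counter()
        for (u, v), w in edges.items():
            degree[u] += 1; degree[v] += 1                    # node degree
            cooc[u] += w; cooc[v] += w                        # co-occurrence with neighbors
        contribution = lambda n: (degree[n], tf[n], cooc[n])  # stated priority order
        return sorted(set(tokens), key=contribution, reverse=True)[:top_n]

The top N nodes returned here serve as the root nodes of the breadth-first subgraph construction described below.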
The subgraphs constructed by the breadth-first search algorithm retain as much of the information of the original text as possible, including keywords and context information, which benefits the final classification result; they also reduce the amount of data the computer must process, lowering the time complexity, so the approach is fast and effective. Further, when each subgraph is used as input to the neural network, a vector representation of each word is needed, the so-called word vector or word embedding: a word of natural language is digitized and represented by a vector of continuous numbers. One of the simplest word vector representations is the one-hot representation: the length of the vector is the size of the dictionary, only one of its positions is 1, and the position of the 1 is the position of the word in the dictionary. Another method is the distributed word vector representation, a representation of word semantics obtained through model training. Preferably, this scheme can adopt a word2vec model or a GloVe model to represent word vectors. word2vec comprises two models, the CBOW (Continuous Bag-of-Words) model and the Skip-gram model: CBOW is a feed-forward-network-like model that predicts the probability distribution of the target word given its context, while Skip-gram predicts the probabilities of the context words given the target word. Both define an objective function, and an optimization method is then used to find the optimal parameters of the model, yielding the word vector representations. The word2vec model gives a compact vector representation of words in which distances in vector space represent the similarity of text semantics, taking word order and contextual semantic information into account.
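A minimal sketch of training 50-dimensional word vectors with word2vec, assuming the gensim library (the patent specifies neither a toolkit nor hyperparameters; sg=1 selects the Skip-gram model, sg=0 the CBOW model):

    from gensim.models import Word2Vec

    sentences = [["club", "fit", "england"],        # preprocessed token lists
                 ["club", "goal", "england"]]
    model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)
    vec = model.wv["club"]                          # 50-dimensional vector of one node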
For convenience of explanation, take N subgraphs, each with 5 nodes and word vector dimension 50, as the input of the convolutional neural network. As shown in fig. 3, the N×5×50 input is convolved with 64 convolution kernels of size 5×50 to obtain an N×64 feature matrix, and the max-pooling downsampling of the next pooling layer yields an N/2×64 feature mapping matrix, i.e. N/2 subgraphs; the N/2 subgraphs are then further convolved with 128 convolution kernels of size 5×1 to obtain a 64×128 matrix, and the max-pooling downsampling of a pooling layer yields a 32×128 feature mapping matrix; next, convolution with 256 kernels of size 5×1 at stride 3 yields a 10×256 matrix, and the max-pooling downsampling of the pooling layer yields a 5×256 feature mapping matrix. As shown in the figure, there are three fully connected layers, each with a dropout parameter of 0.5; the classification output layer comprises K classifiers and outputs K classification results.
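The following is a loose PyTorch sketch of a comparable layer stack (the framework, class name and simplified dimensions are assumptions for illustration, not the patent's implementation):

    import torch
    import torch.nn as nn

    class GraphTextCNN(nn.Module):
        def __init__(self, k=5, d=50, num_classes=10):
            super().__init__()
            # one kernel spans a whole k x d subgraph; stride k steps to the next one
            self.conv1 = nn.Conv2d(1, 64, kernel_size=(k, d), stride=(k, 1))
            self.pool1 = nn.MaxPool1d(2)                    # N subgraphs -> N/2
            self.conv2 = nn.Conv1d(64, 128, kernel_size=5)
            self.pool2 = nn.MaxPool1d(2)
            self.conv3 = nn.Conv1d(128, 256, kernel_size=5, stride=3)
            self.pool3 = nn.AdaptiveMaxPool1d(5)            # -> 5 x 256 feature map
            self.fc = nn.Sequential(                        # three fully connected layers
                nn.Linear(5 * 256, 512), nn.ReLU(), nn.Dropout(0.5),
                nn.Linear(512, 512), nn.ReLU(), nn.Dropout(0.5),
                nn.Linear(512, num_classes),                # K classification outputs
            )

        def forward(self, x):                               # x: (batch, 1, N*k, d)
            h = torch.relu(self.conv1(x)).squeeze(3)        # (batch, 64, N)
            h = self.pool1(h)
            h = self.pool2(torch.relu(self.conv2(h)))
            h = self.pool3(torch.relu(self.conv3(h)))
            return self.fc(h.flatten(1))                    # per-class scores

    logits = GraphTextCNN()(torch.randn(2, 1, 32 * 5, 50))  # batch of 2, N = 32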
Preferably, the constructing sub-graphs by the breadth-first search algorithm, and the normalizing each sub-graph specifically comprises:
acquiring adjacent nodes of the root node, and if the number of the adjacent nodes of the root node is greater than k-1, constructing a subgraph by using the root node, the adjacent nodes of the root node and the edges of the root node and the adjacent nodes;
if the number of the adjacent nodes of the root node is less than k-1, acquiring secondary adjacent nodes of the root node step by step until the total number of the acquired adjacent nodes and secondary adjacent nodes is greater than or equal to k or the secondary adjacent nodes cannot be acquired continuously, and constructing a subgraph according to the root node, the adjacent nodes and the secondary adjacent nodes of the root node, edges of the root node and the adjacent nodes, edges of the adjacent nodes and the secondary adjacent nodes of the root node and edges between the secondary adjacent nodes; wherein the secondary neighboring node is a node indirectly connected to the root node;
constructing a spanning tree from the subgraph, and ordering the nodes of the spanning tree from the shallow layers to the deep layers by a breadth-first algorithm;
within the same layer, ordering the neighboring nodes according to the size of their contribution values;
when the subgraph has more than k nodes, retaining the first k nodes of the spanning-tree ordering, thereby completing the normalization of the subgraph;
when the subgraph has fewer than k nodes, adding dummy nodes to the subgraph until the number of its nodes equals k, thereby completing the normalization of the subgraph; wherein a dummy node is not connected with any node of the original subgraph (a sketch of this procedure follows this list).
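A minimal sketch of this subgraph construction and normalization, assuming an adjacency-list view of the text graph and the contribution ordering from the key-node sketch (the dummy-node token is an illustrative assumption):

    from collections import deque

    def normalized_subgraph(adj, root, k, contribution):
        # breadth-first expansion from the root: neighbors first, then secondary
        # neighbors, until the connected component is exhausted
        order, seen, queue = [], {root}, deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            # within one layer, visit neighbors by descending contribution value
            for nb in sorted(adj.get(node, ()), key=contribution, reverse=True):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(order) > k:
            order = order[:k]           # keep the first k nodes of the BFS order
        while len(order) < k:
            order.append("<dummy>")     # pad with dummies unconnected to the rest
        return order

    adj = {"club": ["fit", "england"], "fit": ["club"], "england": ["club"]}
    print(normalized_subgraph(adj, "club", 5, lambda n: len(adj.get(n, ()))))
    # ['club', 'fit', 'england', '<dummy>', '<dummy>']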
As shown in fig. 4, which gives a concrete illustration of subgraph construction and normalization, the key nodes obtained according to the contribution values are "gold scale", "england", "club", "fit", "high", "great", "unit" and "true"; each key node is used as a root node and traversed by a breadth-first search strategy to obtain a plurality of subgraphs with different semantics, and each subgraph is normalized and used as input to the neural network for per-subgraph feature extraction and fusion.
The normalization of the subgraphs facilitates the subsequent feature extraction and feature mapping processes.
Preferably, the training of the parameters of the convolutional neural network through the back propagation algorithm in step S1 specifically includes:
initializing parameters of the convolutional neural network, and carrying out forward propagation on the training texts of the known types through the convolutional neural network to obtain output results;
constructing a loss function according to the output result and the known class of the training text, and acquiring a residual error of any neural node in the convolutional neural network according to the loss function; wherein the loss function is:
J = H + C_λ(w)
wherein H is a cross-entropy term and C_λ(w) is a regularization term that prevents overfitting;
then, carrying out recursive operation according to the residual error of the node to update the parameter of each neural node; wherein the parameters of the neural node include neural network weights and biases.
Further, when the classification output layer includes a plurality of classifiers, the cross entropy term is specifically:
H = -∑_{m=1}^{M} ∑_{k1=1}^{K} [ y_{m,k1} log p_{m,k1} + (1 - y_{m,k1}) log(1 - p_{m,k1}) ]
wherein M is the number of training texts of known classes, K is the number of classifiers of the classification output layer, y_{m,k1} is the binary label of whether training text d_m belongs to class k1, and p_{m,k1} is the probability that the convolutional neural network predicts class k1 for training text d_m.
For a large number of label categories, some categories stand in a parent-child relationship, wherein the classifier features (parameters) of a child category inherit the classifier features (parameters) of its parent category. In order to reduce the learning parameters, a hierarchical relationship is constructed according to the different categories, and the regularization term is obtained at the fully connected layer through the following formula to update the parameters of the fully connected layer:
C_λ(w) = (λ/2) ∑_{(p,c)} ||w_p - w_c||²
wherein w_p is the weight of the parent category in the hierarchical relationship, w_c is the weight of the child category, and the sum runs over all parent-child pairs (p, c) in the hierarchy.
Through these steps, introducing the dependency relationships among the categories can greatly improve classification performance, and when a child node has little training data, the corresponding parameters can be adjusted through the training data of its parent node. As a way of simplifying data processing, the hierarchical relationship of the categories promotes similarity between the parameters of related categories. As shown in FIG. 5, "computing" is taken as a parent node of "intellectual intersection", and their parameters can be considered similar.
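A minimal sketch of the loss J = H + C_λ(w) with this hierarchical regularization, assuming PyTorch, one binary (Sigmoid) classifier per category, and a list of parent-child index pairs into the fully connected layer's weight matrix (all names are illustrative):

    import torch
    import torch.nn.functional as F

    def loss_J(logits, labels, class_weights, parent_child_pairs, lam=1e-3):
        # H: cross entropy summed over M training texts and K classifiers
        H = F.binary_cross_entropy_with_logits(logits, labels, reduction="sum")
        # C_lambda(w): pull each child category's weights toward its parent's
        C = sum((class_weights[p] - class_weights[c]).pow(2).sum()
                for p, c in parent_child_pairs)
        return H + 0.5 * lam * C

    W = torch.randn(4, 8, requires_grad=True)         # K=4 category weight vectors
    labels = torch.randint(0, 2, (3, 4)).float()      # M=3 texts, binary labels
    J = loss_J(torch.randn(3, 4), labels, W, [(0, 1), (0, 2)])
    J.backward()                                      # gradients for back propagation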
In order to speed up the training process, on the basis of the above embodiment, the text classification method further includes the steps of:
constructing a tree structure according to the hierarchical relationship among the different categories, dividing the tree structure into a plurality of subtrees, and training the fully connected layer with the subtree as the unit; the tree structure comprises a plurality of nodes and edges, the nodes correspond to categories, and each edge points from a category of one level to a category of the next level; wherein dividing the tree structure into a plurality of subtrees specifically comprises:
traversing from any node by a depth-first, pre-order traversal, and when the number of leaf nodes traversed equals a preset threshold, splitting that node and the other traversed nodes off into one subtree;
treating each such subtree as a single leaf node, traversing again from any node by the same depth-first, pre-order traversal, and when the number of leaf nodes traversed equals the preset threshold, splitting that node and the other traversed nodes off into one subtree (a code sketch follows the example below).
As shown in fig. 6, the preset threshold is 5: when node A is traversed and the number of traversed leaf nodes reaches 5, subtree (i) can be split off as one training unit; node F has only 4 leaf nodes, so the preset threshold of 5 can be met by merging node E with node F.
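A minimal sketch of this partitioning, assuming each category node stores a list of its children (the data layout and names are illustrative); it cuts the pre-order traversal into units of a given number of leaves, and any remainder forms the last unit, as when node F's four leaves are merged with node E in fig. 6:

    from dataclasses import dataclass, field

    @dataclass
    class Cat:
        name: str
        children: list = field(default_factory=list)

    def split_units(root, threshold):
        units, current, leaves = [], [], 0
        stack = [root]
        while stack:                             # depth-first pre-order traversal
            node = stack.pop()
            current.append(node.name)
            stack.extend(reversed(node.children))
            if not node.children:
                leaves += 1
                if leaves == threshold:          # close one training unit
                    units.append(current)
                    current, leaves = [], 0
        if current:
            units.append(current)                # remainder becomes the last unit
        return units

    tree = Cat("A", [Cat("B", [Cat("b1"), Cat("b2")]), Cat("C")])
    print(split_units(tree, 2))    # [['A', 'B', 'b1', 'b2'], ['C']]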
Through this scheme, the categories are divided into blocks and each block is trained separately, enabling a computer to handle large-scale category prediction problems. In practical applications, upper-level nodes, such as nodes B, E and F in the figure, can be trained first, and the child nodes and leaf nodes of these nodes are then trained through a recursive algorithm. The recursive distributed learning algorithm provided by this scheme realizes large-scale learning for classification tasks such as text classification or image classification, overcomes the limitation of the prior art to small-scale learning, and has important practical significance.
Referring to fig. 7, which is a schematic structural diagram of a text classification apparatus provided in embodiment 2 of the present invention, including:
the training module 101 is configured to receive training texts of a plurality of known categories, pre-process the training texts, construct a graph structure of the training texts by using a word co-occurrence relationship, and train parameters of the convolutional neural network through a back propagation algorithm according to the graph structure of the training texts to obtain the trained convolutional neural network; the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one full-connection layer and at least one classification output layer; in the graph structure of the training text, nodes correspond to words in the training text;
the text to be classified receiving module 102 is configured to receive an input text to be classified, preprocess the text to be classified, and construct a graph structure of the text to be classified by using a co-occurrence relationship of words; in the graph structure of the text to be classified, nodes correspond to words in the text to be classified;
and the class prediction module 103 is configured to predict the class of the text to be classified through the trained convolutional neural network according to the graph structure of the text to be classified.
The word vector of each node in the graph structure of the training text or the text to be classified is represented by a word2vec model or a GloVe model; the pooling layer downsamples the feature matrix using mean pooling, max pooling or stochastic pooling.
Preferably, the preprocessing of the training text or the text to be classified specifically includes:
after word segmentation processing is carried out on the training text or the text to be classified, noise and stop words are removed, and the word stems of all words in the text are extracted; wherein the noise comprises punctuation marks and numbers, and the stop words comprise pronouns, conjunctions, prepositions and articles;
constructing a graph structure of the training text or the text to be classified by adopting a co-occurrence relation of words, which specifically comprises the following steps:
traversing the training text or the text to be classified with a sliding window of preset size, and constructing an edge between any two words whenever the two words lie in the sliding window at the same time, wherein the edge points from the earlier word to the later word.
Further, according to the graph structure of the text to be classified, predicting the category of the text to be classified through the trained convolutional neural network specifically comprises:
constructing a plurality of subgraphs according to the graph structure of the text to be classified, and normalizing each subgraph;
acquiring word vector representation of each node in each subgraph as input of a convolutional neural network, performing convolutional operation through a convolutional layer to generate a feature matrix, and performing down-sampling processing on the feature matrix through a pooling layer to obtain a feature mapping matrix;
if the next layer of the pooling layer is a convolution layer, performing convolution operation and down-sampling processing on the feature mapping matrix output by the previous layer, and outputting the feature mapping matrix;
if the next layer of the pooling layer is a full-connection layer, performing feature weighting operation on the feature mapping matrix output by the previous layer, and outputting an attribute feature matrix;
if the next layer of the full connection layer is the full connection layer, continuously performing feature weighting operation on the attribute feature matrix output by the previous layer to output the attribute feature matrix;
if the next layer of the full connection layer is a classification output layer, obtaining an output result of the classification output layer according to the attribute feature matrix output by the previous layer, and predicting the category of the text to be classified according to the output result of the classification output layer.
Wherein, the construction of a plurality of subgraphs according to the graph structure of the text to be classified comprises the following steps:
extracting the nodes of the graph structure of the text to be classified and sorting them by contribution value, wherein the contribution value is determined, in order of priority, by the degree of each node, the frequency in the text of the word corresponding to the node, and the co-occurrence rate of the node with its neighborhood nodes;
selecting the top N nodes of this ordering as key nodes, taking each key node as a root node, constructing subgraphs through a breadth-first search algorithm, and normalizing each subgraph; wherein each subgraph includes at least k nodes, N > 0, k > 0.
Further, constructing sub-graphs through a breadth-first search algorithm, and performing normalization processing on each sub-graph specifically comprises:
acquiring adjacent nodes of the root node, and if the number of the adjacent nodes of the root node is greater than k-1, constructing a subgraph by using the root node, the adjacent nodes of the root node and the edges of the root node and the adjacent nodes;
if the number of the adjacent nodes of the root node is less than k-1, acquiring secondary adjacent nodes of the root node step by step until the total number of the acquired adjacent nodes and secondary adjacent nodes is greater than or equal to k or the secondary adjacent nodes cannot be acquired continuously, and constructing a subgraph according to the root node, the adjacent nodes and the secondary adjacent nodes of the root node, edges of the root node and the adjacent nodes, edges of the adjacent nodes and the secondary adjacent nodes of the root node and edges between the secondary adjacent nodes; wherein the secondary neighboring node is a node indirectly connected to the root node;
constructing a spanning tree from the subgraph, and ordering the nodes of the spanning tree from the shallow layers to the deep layers by a breadth-first algorithm;
within the same layer, ordering the neighboring nodes according to the size of their contribution values;
when the subgraph has more than k nodes, retaining the first k nodes of the spanning-tree ordering, thereby completing the normalization of the subgraph;
when the subgraph has fewer than k nodes, adding dummy nodes to the subgraph until the number of its nodes equals k, thereby completing the normalization of the subgraph; wherein a dummy node is not connected with any node of the original subgraph.
Preferably, the training the parameter of the convolutional neural network through a back propagation algorithm, and the obtaining of the trained convolutional neural network specifically includes:
initializing parameters of the convolutional neural network, and carrying out forward propagation on the training texts of the known types through the convolutional neural network to obtain output results;
and performing back propagation according to the output result and the error marked by the training text, distributing the error to each layer in the convolutional neural network to obtain error data of each layer, and correcting the parameters of the convolutional neural network according to the error data of each layer.
Preferably, performing back propagation according to the output result and the label error of the training text, distributing the error to each layer of the convolutional neural network to obtain per-layer error data, and correcting the parameters of the convolutional neural network according to that error data specifically includes:
constructing a loss function according to the output result and the known class of the training text, and acquiring a residual error of any neural node in the convolutional neural network according to the loss function; wherein the loss function is:
J=H+Cλ(w)
wherein H is a cross entropy term and C λ (w) is a regularization term that prevents overfitting;
carrying out recursive operation according to the residual error of the node to update the parameter of each neural node; wherein the parameters of the neural node include neural network weights and biases.
Preferably, when the classification output layer includes a plurality of classifiers, the cross-entropy term is specifically:
H = -Σ (m = 1..M) Σ (k1 = 1..K) [ y(dm, k1)·log p(dm, k1) + (1 - y(dm, k1))·log(1 - p(dm, k1)) ]
where M is the number of training texts of known categories, K is the number of classifiers in the classification output layer, y(dm, k1) is the binary label indicating whether training text dm belongs to class k1, and p(dm, k1) is the probability, predicted by the convolutional neural network, that training text dm belongs to class k1.
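Assuming the standard per-classifier binary cross-entropy that the binary labels and predicted probabilities above suggest, the term can be computed as in the following sketch; the clipping constant eps is a numerical-stability assumption.

import numpy as np

def cross_entropy_term(y: np.ndarray, p: np.ndarray) -> float:
    # y[m, k]: binary label of training text d_m for class k (shape M x K)
    # p[m, k]: predicted probability that d_m belongs to class k
    eps = 1e-12                       # guard against log(0)
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))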
Preferably, the training module 101 is further configured to construct a hierarchical relationship among the different categories and to obtain the regularization term at the fully-connected layer through the following formula, so as to update the parameters of the fully-connected layer:
C_λ(w) = λ · Σ over all parent-child pairs (p, c) of (1/2)·||w(p) - w(c)||^2
where w(p) is the weight of the parent category in the hierarchical relationship and w(c) is the weight of the child category in the hierarchical relationship.
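One plausible reading of this term, sketched below, is a recursive regularizer that pulls each child category's weight vector toward its parent's, so that categories adjacent in the hierarchy share parameters; the dictionary layout (weights keyed by category, children mapping each parent to its child categories) is an assumption for illustration.

import numpy as np

def recursive_regularization(weights: dict, children: dict, lam: float) -> float:
    # Sum (1/2) * ||w_parent - w_child||^2 over every parent-child edge
    # of the category hierarchy, scaled by the regularization strength lam.
    penalty = 0.0
    for parent, kids in children.items():
        for child in kids:
            diff = weights[parent] - weights[child]
            penalty += 0.5 * float(np.dot(diff, diff))
    return lam * penalty

Under these assumptions, the total loss of the description would be assembled as J = cross_entropy_term(y, p) + recursive_regularization(weights, children, lam).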
Preferably, dropout is adopted at the fully-connected layer so that the activation output values of the fully-connected layer are randomly zeroed at a preset ratio.
Preferably, the convolutional layer adopts a rectified linear unit (ReLU) as its activation function, and the activation function of the fully-connected layer is a Sigmoid function, a tanh(x) function or a softplus function.
In a preferred embodiment, the classification output layer is a softmax classifier.
In a preferred embodiment, the classification output layer comprises several Sigmoid functions.
In a preferred embodiment, the training module 101 is further configured to construct a tree structure according to the hierarchical relationship among the different categories, partition the tree structure into a plurality of subtrees, and train the fully-connected layer subtree by subtree; the tree structure comprises a plurality of nodes and edges, the nodes correspond to the categories, and each edge points from a category at one level to the category at the next level. The step of partitioning the tree structure into a plurality of subtrees is specifically as follows:
traversing from any node by a depth-first, pre-order traversal, and when the number of leaf nodes obtained by the traversal equals a preset threshold, partitioning that node and the other traversed nodes into one subtree;
then, taking each obtained subtree as a single leaf node, traversing again from any node by the depth-first, pre-order traversal, and when the number of leaf nodes obtained by the traversal equals the preset threshold, partitioning that node and the other traversed nodes into one subtree.
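A rough Python sketch of one pass of this partitioning, assuming the category tree is stored as a parent-to-children dictionary and that a cut is made whenever the running leaf count reaches the threshold; the second pass of the description would rerun the same function after collapsing each returned subtree into a single leaf node.

def split_into_subtrees(tree: dict, root, threshold: int) -> list:
    # Depth-first, pre-order traversal; each time the accumulated number of
    # leaf nodes reaches the threshold, the nodes traversed so far are cut
    # off as one subtree.
    subtrees, current, leaves = [], [], 0

    def preorder(node):
        nonlocal current, leaves
        current.append(node)
        kids = tree.get(node, [])
        if not kids:
            leaves += 1
            if leaves == threshold:
                subtrees.append(list(current))
                current, leaves = [], 0
        for child in kids:
            preorder(child)

    preorder(root)
    if current:                        # remainder forms the final subtree
        subtrees.append(list(current))
    return subtrees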
For the specific implementation and working principle of the text classification apparatus provided by the embodiment of the present invention, reference may be made to the above detailed description of the text classification method, which is not repeated here.
To sum up, the embodiments of the present invention disclose a text classification method and apparatus. After receiving a plurality of training texts of known categories and preprocessing them, the method constructs the graph structures of the training texts from the word co-occurrence relation and trains the parameters of a convolutional neural network through a back propagation algorithm according to those graph structures, obtaining the trained convolutional neural network. It then receives an input text to be classified, preprocesses it, constructs its graph structure from the word co-occurrence relation, and predicts the category of the text to be classified with the trained convolutional neural network according to that graph structure. By applying a convolutional neural network to the text classification problem, this technical scheme improves the accuracy and reliability of text classification.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (15)

1. A method of text classification, comprising the steps of:
receiving a plurality of training texts of known categories, preprocessing the training texts, constructing the graph structures of the training texts by using the word co-occurrence relation, and training the parameters of a convolutional neural network through a back propagation algorithm according to the graph structures of the training texts to obtain the trained convolutional neural network; wherein the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one fully-connected layer and at least one classification output layer; and in the graph structure of each training text, nodes correspond to the words in the training text one by one;
receiving an input text to be classified, preprocessing the text to be classified, and constructing a graph structure of the text to be classified by adopting a word co-occurrence relation; in the graph structure of the text to be classified, nodes correspond to words in the text to be classified one by one;
extracting the nodes of the graph structure of the text to be classified according to the graph structure of the text to be classified, ranking the nodes by the size of their contribution values, selecting the top-N ranked nodes as key nodes, taking each key node as a root node, constructing subgraphs through a breadth-first search algorithm, normalizing each subgraph, obtaining the word vector representation of each node in each subgraph as the input of the convolutional neural network, and predicting the category of the text to be classified according to the output result of the classification output layer; wherein the contribution value is determined, in order, by the degree of each node, the word frequency of the corresponding word in the text, and the co-occurrence rate between the node and its neighborhood nodes, and each subgraph includes at least k nodes, N > 0, k > 0.
2. The text classification method according to claim 1, wherein the preprocessing of the training text or the text to be classified is specifically:
after word segmentation is carried out on the training text or the text to be classified, noise and stop words of the training text or the text to be classified are removed, and the word stems of all words in the training text or the text to be classified are extracted; wherein the noise comprises punctuation marks and numbers, and the stop words comprise pronouns, conjunctions, prepositions and articles;
the method for constructing the graph structure of the training text or the text to be classified by adopting the co-occurrence relation of the words specifically comprises the following steps:
traversing the training text or the text to be classified with a sliding window of preset size, and constructing an edge between any two words whenever the two words are located in the sliding window at the same time, wherein the edge points from the earlier word to the later word.
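For illustration (outside the claim language proper), a sketch of this sliding-window construction with networkx; the window size of 3 is an assumed default, not something the claim prescribes.

import networkx as nx

def cooccurrence_graph(words: list, window: int = 3) -> nx.DiGraph:
    # One node per distinct word; whenever two words fall inside the same
    # sliding window, add an edge pointing from the earlier word to the later.
    g = nx.DiGraph()
    g.add_nodes_from(words)
    for i in range(len(words)):
        for j in range(i + 1, min(i + window, len(words))):
            g.add_edge(words[i], words[j])
    return g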
3. The text classification method according to claim 1, wherein constructing the subgraphs through a breadth-first search algorithm and normalizing each subgraph specifically comprises:
acquiring the adjacent nodes of the root node; if the number of adjacent nodes of the root node is greater than k-1, constructing the subgraph from the root node, its adjacent nodes, and the edges between the root node and those adjacent nodes;
if the number of adjacent nodes of the root node is less than k-1, acquiring the secondary adjacent nodes of the root node level by level until the total number of acquired adjacent and secondary adjacent nodes is greater than or equal to k or no further secondary adjacent nodes can be acquired, and constructing the subgraph from the root node, its adjacent and secondary adjacent nodes, the edges between the root node and its adjacent nodes, the edges between the adjacent nodes and the secondary adjacent nodes, and the edges among the secondary adjacent nodes; wherein a secondary adjacent node is a node indirectly connected to the root node;
constructing a spanning tree from the subgraph, and ordering the nodes of the spanning tree from the shallow layers to the deep layers with a breadth-first algorithm;
within the same layer, ordering the adjacent nodes of the root node by the size of their contribution values;
when the number of nodes in the subgraph is greater than k, retaining the top-k nodes of this ordering in the spanning tree, thereby completing the normalization of the subgraph;
when the number of nodes in the subgraph is less than k, adding dummy nodes to the subgraph until the number of nodes in the subgraph equals k, thereby completing the normalization of the subgraph; wherein a dummy node is not connected to any node of the original subgraph.
4. The text classification method according to claim 1, wherein training the parameters of the convolutional neural network through the back propagation algorithm to obtain the trained convolutional neural network specifically comprises:
initializing the parameters of the convolutional neural network, and forward-propagating the training texts of known categories through the convolutional neural network to obtain output results;
performing back propagation according to the error between the output results and the labels of the training texts, distributing the error to each layer of the convolutional neural network to obtain per-layer error data, and correcting the parameters of the convolutional neural network according to the per-layer error data.
5. The text classification method according to claim 4, wherein performing back propagation according to the error between the output results and the labels of the training texts, distributing the error to each layer of the convolutional neural network to obtain per-layer error data, and correcting the parameters of the convolutional neural network according to the per-layer error data specifically comprises:
constructing a loss function from the output results and the known categories of the training texts, and obtaining the residual of any neural node in the convolutional neural network from the loss function; wherein the loss function is:
J = H + C_λ(w)
where H is a cross-entropy term and C_λ(w) is a regularization term that prevents overfitting;
performing a recursive operation on the residuals of the nodes to update the parameters of each neural node; wherein the parameters of a neural node include the neural network weights and biases.
6. The text classification method according to claim 5, wherein, when the classification output layer comprises a plurality of classifiers, the cross-entropy term is specifically:
H = -Σ (m = 1..M) Σ (k1 = 1..K) [ y(dm, k1)·log p(dm, k1) + (1 - y(dm, k1))·log(1 - p(dm, k1)) ]
where M is the number of training texts of known categories, K is the number of classifiers in the classification output layer, y(dm, k1) is the binary label indicating whether training text dm belongs to class k1, and p(dm, k1) is the probability, predicted by the convolutional neural network, that training text dm belongs to class k1.
7. The text classification method according to claim 6, further comprising the step of:
constructing a hierarchical relationship according to the different categories, and obtaining a regularization term at the fully-connected layer through the following formula to update the parameters of the fully-connected layer:
C_λ(w) = λ · Σ over all parent-child pairs (p, c) of (1/2)·||w(p) - w(c)||^2
where w(p) is the weight of the parent category in the hierarchical relationship and w(c) is the weight of the child category in the hierarchical relationship.
8. The text classification method according to claim 1, wherein the word vector representation of each node in the graph structure of the training text or the text to be classified is obtained by a word2vec model or a GloVe model.
9. The text classification method according to claim 1, wherein the pooling layer down-samples the feature matrix using an average pooling, max pooling or stochastic pooling method.
10. The text classification method according to claim 1, wherein dropout is adopted at the fully-connected layer so that the activation output values of the fully-connected layer are randomly zeroed at a preset ratio.
11. The text classification method according to claim 4, wherein the convolutional layer adopts a rectified linear unit (ReLU) as its activation function, and the activation function of the fully-connected layer is a Sigmoid function, a tanh(x) function or a softplus function.
12. The text classification method of claim 1, wherein the classification output layer is a softmax classifier.
13. The text classification method of claim 1, wherein the classification output layer comprises a plurality of Sigmoid functions.
14. The text classification method according to claim 1, further comprising the steps of:
constructing a tree structure according to the hierarchical relationship among the different categories, partitioning the tree structure into a plurality of subtrees, and training the fully-connected layer subtree by subtree; wherein the tree structure comprises a plurality of category nodes and category edges, the category nodes correspond to the categories, and each category edge points from a category at one level to the category at the next level; wherein the step of partitioning the tree structure into a plurality of subtrees is specifically:
traversing from any category node by a depth-first, pre-order traversal, and when the number of leaf nodes obtained by the traversal equals a preset threshold, partitioning that category node and the other traversed category nodes into one subtree;
then, taking each obtained subtree as a category leaf node, traversing again from any category node by the depth-first, pre-order traversal, and when the number of leaf nodes obtained by the traversal equals the preset threshold, partitioning that category node and the other traversed category nodes into one subtree.
15. A text classification apparatus, comprising:
the training module is used for receiving a plurality of training texts of known categories, preprocessing the training texts, constructing the graph structures of the training texts by using the word co-occurrence relation, and training the parameters of a convolutional neural network through a back propagation algorithm according to the graph structures of the training texts to obtain the trained convolutional neural network; wherein the convolutional neural network comprises at least one convolutional layer, at least one pooling layer, at least one fully-connected layer and at least one classification output layer; and in the graph structure of each training text, nodes correspond to the words in the training text one by one;
the text-to-be-classified receiving module is used for receiving an input text to be classified and, after preprocessing the text to be classified, constructing a graph structure of the text to be classified by using the word co-occurrence relation; wherein, in the graph structure of the text to be classified, nodes correspond to the words in the text to be classified one by one;
the category prediction module is used for extracting the nodes of the graph structure of the text to be classified according to the graph structure of the text to be classified, ranking the nodes by the size of their contribution values, selecting the top-N ranked nodes as key nodes, taking each key node as a root node, constructing subgraphs through a breadth-first search algorithm, normalizing each subgraph, obtaining the word vector representation of each node in each subgraph as the input of the convolutional neural network, and predicting the category of the text to be classified according to the output result of the classification output layer; wherein the contribution value is determined, in order, by the degree of each node, the word frequency of the corresponding word in the text, and the co-occurrence rate between the node and its neighborhood nodes, and each subgraph includes at least k nodes, N > 0, k > 0.
CN201710642105.0A 2017-07-31 2017-07-31 Text classification method and device Active CN107526785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710642105.0A CN107526785B (en) 2017-07-31 2017-07-31 Text classification method and device

Publications (2)

Publication Number Publication Date
CN107526785A CN107526785A (en) 2017-12-29
CN107526785B true CN107526785B (en) 2020-07-17

Family

ID=60680376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710642105.0A Active CN107526785B (en) 2017-07-31 2017-07-31 Text classification method and device

Country Status (1)

Country Link
CN (1) CN107526785B (en)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190236135A1 (en) * 2018-01-30 2019-08-01 Accenture Global Solutions Limited Cross-lingual text classification
CN110309293A (en) * 2018-02-13 2019-10-08 北京京东尚科信息技术有限公司 Text recommended method and device
CN108595497B (en) * 2018-03-16 2019-09-27 北京达佳互联信息技术有限公司 Data screening method, apparatus and terminal
CN108875779A (en) * 2018-05-07 2018-11-23 深圳市恒扬数据股份有限公司 Training method, device and the terminal device of neural network
CN108804404B (en) * 2018-05-29 2022-04-15 周宇 Character text processing method and device
CN108920455A (en) * 2018-06-13 2018-11-30 北京信息科技大学 A kind of Chinese automatically generates the automatic evaluation method of text
CN110826377A (en) * 2018-08-13 2020-02-21 珠海格力电器股份有限公司 Material sorting method and device
CN110913354A (en) * 2018-09-17 2020-03-24 阿里巴巴集团控股有限公司 Short message classification method and device and electronic equipment
CN109543029B (en) * 2018-09-27 2023-07-25 平安科技(深圳)有限公司 Text classification method, device, medium and equipment based on convolutional neural network
CN109471944B (en) * 2018-11-12 2021-07-16 中山大学 Training method and device of text classification model and readable storage medium
CN109710755A (en) * 2018-11-22 2019-05-03 合肥联宝信息技术有限公司 Training BP neural network model method and device and the method and apparatus that text classification is carried out based on BP neural network
CN109739979A (en) * 2018-12-11 2019-05-10 中科恒运股份有限公司 Tuning method, tuning device and the terminal of neural network
CN109726285A (en) * 2018-12-18 2019-05-07 广州多益网络股份有限公司 A kind of file classification method, device, storage medium and terminal device
CN109740482A (en) * 2018-12-26 2019-05-10 北京科技大学 A kind of image text recognition methods and device
CN109599096B (en) * 2019-01-25 2021-12-07 科大讯飞股份有限公司 Data screening method and device
CN110019653B (en) * 2019-04-08 2021-07-02 北京航空航天大学 Social content representation method and system fusing text and tag network
CN110704626B (en) * 2019-09-30 2022-07-22 北京邮电大学 Short text classification method and device
CN111241294B (en) * 2019-12-31 2023-05-26 中国地质大学(武汉) Relationship extraction method of graph convolution network based on dependency analysis and keywords
CN111145906B (en) * 2019-12-31 2024-04-30 清华大学 Project judging method, related device and readable storage medium
CN111291823B (en) * 2020-02-24 2023-08-18 腾讯科技(深圳)有限公司 Fusion method and device of classification model, electronic equipment and storage medium
CN113642697B (en) * 2020-04-27 2024-02-20 郑州芯兰德网络科技有限公司 Distributed multi-level graph network training method and system
CN111598093B (en) * 2020-05-25 2024-05-14 深圳前海微众银行股份有限公司 Method, device, equipment and medium for generating structured information of characters in picture
CN111538870B (en) * 2020-07-07 2020-12-18 北京百度网讯科技有限公司 Text expression method and device, electronic equipment and readable storage medium
CN112329669B (en) * 2020-11-11 2021-11-16 国网黑龙江省电力有限公司电力科学研究院 Electronic file management method
CN112380344B (en) * 2020-11-19 2023-08-22 平安科技(深圳)有限公司 Text classification method, topic generation method, device, equipment and medium
CN112597764B (en) * 2020-12-23 2023-07-25 青岛海尔科技有限公司 Text classification method and device, storage medium and electronic device
CN113094549A (en) * 2021-06-10 2021-07-09 智者四海(北京)技术有限公司 Video classification method and device, electronic equipment and storage medium
CN113886438B (en) * 2021-12-08 2022-03-15 济宁景泽信息科技有限公司 Artificial intelligence-based achievement transfer transformation data screening method
CN116244738B (en) * 2022-12-30 2024-05-28 浙江御安信息技术有限公司 Sensitive information detection method based on graph neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101741611A (en) * 2009-12-03 2010-06-16 哈尔滨工业大学 MLkP/CR algorithm-based undirected graph dividing method
CN103473380A (en) * 2013-09-30 2013-12-25 南京大学 Computer text sentiment classification method
CN105740349A (en) * 2016-01-25 2016-07-06 重庆邮电大学 Sentiment classification method capable of combining Doc2vce with convolutional neural network
CN106991132A (en) * 2017-03-08 2017-07-28 南京信息工程大学 A kind of figure sorting technique reconstructed based on atlas with kernel of graph dimensionality reduction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Text Classification Based on the Word2Vec Language Model and Graph Kernel Design; Yuan Yanhong; China Master's Theses Full-text Database, Information Science and Technology; 2017-02-15; pp. I138-4649 *

Also Published As

Publication number Publication date
CN107526785A (en) 2017-12-29

Similar Documents

Publication Publication Date Title
CN107526785B (en) Text classification method and device
CN109271522B (en) Comment emotion classification method and system based on deep hybrid model transfer learning
CN111079639B (en) Method, device, equipment and storage medium for constructing garbage image classification model
CN112966691B (en) Multi-scale text detection method and device based on semantic segmentation and electronic equipment
CN111125358B (en) Text classification method based on hypergraph
CN110222634B (en) Human body posture recognition method based on convolutional neural network
CN110110323B (en) Text emotion classification method and device and computer readable storage medium
CN107909115B (en) Image Chinese subtitle generating method
CN109993100B (en) Method for realizing facial expression recognition based on deep feature clustering
CN110969020A (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN110188195B (en) Text intention recognition method, device and equipment based on deep learning
CN107683469A (en) A kind of product classification method and device based on deep learning
CN112949647B (en) Three-dimensional scene description method and device, electronic equipment and storage medium
CN109002755B (en) Age estimation model construction method and estimation method based on face image
CN111475622A (en) Text classification method, device, terminal and storage medium
CN113220876B (en) Multi-label classification method and system for English text
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
CN116152554A (en) Knowledge-guided small sample image recognition system
CN115456043A (en) Classification model processing method, intent recognition method, device and computer equipment
Al-Hmouz et al. Enhanced numeral recognition for handwritten multi-language numerals using fuzzy set-based decision mechanism
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN112527959B (en) News classification method based on pooling convolution embedding and attention distribution neural network
CN116884067B (en) Micro-expression recognition method based on improved implicit semantic data enhancement
CN113204640A (en) Text classification method based on attention mechanism
CN112380919A (en) Vehicle category statistical method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant