Word vector model based on mutual information and text classification method based on CNN
Technical field
The present invention relates to the field of text classification within natural language processing, and in particular to a word vector model based on mutual information and a text classification method based on a CNN (convolutional neural network).
Background art
With the development of Internet technology, the volume of data on the World Wide Web grows daily, and a large share of it is text. These data touch every sector of society, and in the face of text data of such enormous scale, how to classify them rationally has become an important research problem. Rational, mechanized classification of text can help people solve many problems, for example spam filtering and the detection of deceptive information. In recent years, the representation of text has become central to text classification: a reasonable text representation yields accurate semantic information about the text.
1. Word vector technical background
In natural language representation, the vectorized representation of words is an important basic technique. The traditional word vector method creates a dictionary and numbers each word in order, i.e. one-hot encoding. This representation cannot capture the semantic similarity between words and easily suffers from the curse of dimensionality. For this reason, Hinton [1] proposed the distributed representation of word vectors in 1986. This method represents each word with a vector of fixed dimension and expresses the semantic distance between words as the distance between their vectors; while reducing dimensionality, it also bridges the semantic gap between words, so that semantic relations between words are better described. As research deepened, Bengio proposed a neural network language model that obtains word vectors at the same time: the network performs unsupervised learning to capture the contextual relations between words, and the word vectors are trained together with the network model as parameters. Although effective, Bengio's model is computationally expensive. To reduce the computational complexity, Mikolov proposed an improved model on this basis, word2vec, which obtains better results while reducing the model's complexity from n × V to n × log₂V, making the training of large-scale word vectors far more efficient.
Although Mikolov achieved good results in word representation, many scholars continued to probe deeper into language representation. Among them, the GloVe model proposed by Pennington refines the objective function of the word2vec model and introduces global statistical information, the co-occurrence matrix, into word vector training, achieving better results than word2vec in multiple experiments.
To remedy the deficiencies of the GloVe model in semantic capture and in the statistics of the co-occurrence matrix, this method introduces pointwise mutual information on top of that model, constructs a global pointwise mutual information (PMI) matrix, and trains on it to obtain the final word vectors. Experiments on several semantics-related data sets show that word vectors based on the global PMI matrix express semantic relations better. The main contributions here are: 1. the global PMI matrix is introduced into word vector computation, making the statistical information behind the word vectors more accurate; 2. the objective function of the GloVe model is improved so that the truncation operation is eliminated, which significantly reduces the computational cost of model training.
2. Classification method technical background
Common text classification techniques divide into methods based on a sentiment dictionary, methods based on machine learning, and methods based on deep learning. Their main characteristics are as follows:
1) Text classification based on a sentiment dictionary
Methods based on a sentiment dictionary use existing semantic dictionary resources to construct a domain lexicon, then compare the text against it to find the positive and negative sentiment words it contains, assigning positive and negative integer values as sentiment scores. At the same time, the influence of special part-of-speech rules and syntactic structures on sentiment judgment must be considered, such as negation, progressive sentences, and adversative sentences.
Text classification based on a sentiment dictionary is easy to implement, but it requires a fairly large sentiment dictionary, and it is a linear model of limited capacity.
2) Text classification based on machine learning
Text classification based on machine learning hinges on three elements: feature selection, feature-weight quantification, and the classification model. Feature selection mainly uses methods based on information gain or document frequency. Feature weighting mainly uses term frequency, inverse document frequency, TF-IDF, entropy weighting, and so on. Classifier models include naive Bayes, k-nearest neighbors, support vector machines, neural networks, decision trees, etc. Three of these classifier models are described below.
(a) The support vector machine method
Classification problems may be linearly inseparable in a low-dimensional space. The support vector machine method maps a linearly inseparable sample set into a high-dimensional feature space in which it becomes linearly separable, and finds the optimal hyperplane in that space that maximizes the margin between classes, thereby classifying the samples.
However, support vector machines also have some disadvantages:
First, the support vector machine algorithm is difficult to apply to large-scale training samples, because it solves a quadratic program, and solving that quadratic program involves computations on an m × m matrix, where m is the number of samples. When m is very large, storing and operating on this matrix consumes a great deal of machine memory and running time.
Second, solving multi-class problems with support vector machines is difficult: the classical support vector machine algorithm only provides two-class classification.
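As an illustration only (not part of the claimed method), the first drawback can be sidestepped for linear SVMs by training with stochastic sub-gradient descent on the primal hinge loss (Pegasos-style) instead of solving the m × m quadratic program of the dual; the toy data, step-size schedule, and omission of the bias term are assumptions made for this sketch.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=300):
    """Binary linear SVM (labels +1/-1) trained by stochastic sub-gradient
    descent on the regularized hinge loss lam/2*|w|^2 + max(0, 1 - y*(w.x)).
    No m x m matrix is ever formed, so memory stays O(d)."""
    random.seed(0)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        random.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decaying step size
            score = sum(wj * xj for wj, xj in zip(w, X[i]))
            if y[i] * score < 1:                       # point violates the margin
                w = [wj - eta * (lam * wj - y[i] * xj) for wj, xj in zip(w, X[i])]
            else:
                w = [wj * (1 - eta * lam) for wj in w]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy data, linearly separable through the origin (so no bias term is needed)
X = [[-1.0, -1.2], [-0.8, -1.0], [1.0, 1.1], [1.2, 0.9]]
y = [-1, -1, 1, 1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])   # recovers the training labels
```

The second drawback is usually handled by reduction: training l one-vs-rest binary machines and taking the class with the largest score.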
(b) The decision tree method
A decision tree is a tree structure usable for classification, composed of nodes and branches. Decision tree learning essentially summarizes a set of classification rules from the training data. A decision tree algorithm is usually a recursive process of selecting the optimal feature and splitting the training set by that feature, so that each sub-data-set obtains a best classification.
However, decision trees also have some disadvantages:
First, decision tree algorithms overfit very easily, leading to weak generalization;
Second, a small change in the samples can cause a drastic change in the tree structure;
Third, some more complicated relationships, such as exclusive-or, are difficult for a decision tree to learn.
(c) The neural network method
Neural network methods can also solve nonlinear classification problems. A neural network model generally consists of a large number of interconnected neurons, each processing the weighted inputs of its adjacent neurons through an activation function. The model is trained on large amounts of data, adjusting the weights by optimizing the value of a cost function so that classification performance is best, and finally classifying the samples.
However, such (shallow) neural networks also have some disadvantages:
First, when facing big data, features must be extracted from the raw data by hand as inputs, whereas deep learning can select features of the raw data automatically;
Second, to approximate complicated functions more accurately, the number of hidden layers must be increased, which easily produces vanishing or exploding gradients;
Third, they cannot handle time-series data (such as audio or text), because the network contains no notion of time.
3) Text classification based on deep learning
Deep learning refers to deep neural network models, generally network structures with three or more layers.
Some deep learning models also remedy the deficiencies of the plain neural network model. For example, "weight sharing" in convolutional neural networks (CNN) greatly reduces the number of trained parameters, while recurrent neural networks (RNN) and long short-term memory networks (LSTM) can handle time-series data. Two deep learning models are described below.
(a) The RNN method
A recurrent neural network is a neural network whose node connections form directed cycles. In the network structure this appears as the hidden layers of multiple simple neural networks joined end to end along a time sequence.
However, during model training the backward propagation of errors in an RNN suffers from vanishing or exploding gradients, so the model cannot establish dependencies over long time sequences; it can only capture dependencies over short ones.
(b) The LSTM method
The LSTM model was proposed to overcome the RNN's inability to establish long-time-sequence dependencies. Compared with the RNN, the main difference is the internal structure of the hidden layer. The hidden layer of a standard RNN has only one activation function, whereas the hidden layer of an LSTM has a more complex structure, additionally containing three gates: an input gate, a forget gate, and an output gate.
However, the LSTM only avoids the RNN's vanishing gradients; it cannot counter the exploding-gradient problem.
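For reference, the three gates mentioned above take the following form in the textbook LSTM cell (standard notation, not specific to the present method; σ is the logistic sigmoid and ⊙ the element-wise product):

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The additive update of the cell state c_t is what lets gradients flow across many time steps without vanishing; their magnitude can still grow, which is why gradient explosion remains possible.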
Summary of the invention
It is of the existing technology above-mentioned insufficient it is an object of the invention to overcome, the term vector based on mutual information is provided
Model and file classification method based on CNN.
The object of the present invention is achieved through the following technical solutions.
A Chinese comment text classification method based on a global mutual-information word vector model, comprising: (S1) training a word vector model by the global word vector method based on pointwise mutual information;
(S2) determining the word vector matrix of the text according to the trained word vector model;
(S3) extracting features from the word vector matrix through a convolutional neural network (CNN) and training a classification model;
(S4) extracting the features of an input text according to the trained word vector model and the CNN feature extraction model;
(S5) from the text features obtained by the CNN feature extraction model, computing the mapping distance between the text and the preset categories by the softmax and cross-entropy methods, and taking the nearest category as the category of the text.
Further, step (S1) specifically includes:
(1) inputting the Chinese Wikipedia data set and preprocessing the data by removing punctuation marks and spaces;
(2) applying a Chinese word segmentation tool to the data set obtained in step (1), converting the corpus data into word sequences;
(3) computing word frequency statistics over the words obtained in step (2) and saving the results to hard disk in the format "word \t frequency";
(4) computing co-occurrence statistics over the words obtained in step (2): the corpus is traversed with the set window size to obtain the co-occurrence count of every two words within the window, and the results are saved to hard disk as triples in the format "word 1 \t word 2 \t co-occurrence count";
(5) randomly shuffling the triples obtained in step (4) and storing the shuffled triples on hard disk in the same format "word 1 \t word 2 \t co-occurrence count";
(6) randomly initializing a word vector for every word appearing in step (2) and keeping the vectors in memory, where the program can conveniently read and modify them;
(7) completely traversing the triples obtained in step (5) and adjusting the word vectors by gradient descent according to the objective function
J = Σ_{i,k∈V} (w_i^T w̃_k + b_i + b̃_k − PMI(i,k))²
where w_i and w̃_k are the word vectors of the center word and the context word respectively, b_i and b̃_k are their bias terms, PMI(i,k) is the pointwise mutual information of the pair, and V denotes the vocabulary;
(8) repeating step (7) iteratively until the result converges, which yields the word vectors based on mutual information; the word vectors in memory are then saved to hard disk in the format "word \t word vector".
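The counting in steps (3) and (4) above can be sketched as follows. This is a minimal in-memory illustration; the toy token list and window size are assumptions, and the actual method streams the counts to hard disk as described.

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Count word frequencies and symmetric within-window co-occurrences."""
    freq = Counter(tokens)
    co = Counter()
    for i, w in enumerate(tokens):
        # every other token within `window` positions of token i
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                co[(w, tokens[j])] += 1
    return freq, co

tokens = ["我", "喜欢", "自然", "语言", "处理"]
freq, co = cooccurrence_counts(tokens, window=1)
print(freq["自然"])          # 1
print(co[("自然", "语言")])  # 1
```

Because every ordered pair inside the window is counted, the resulting table is symmetric, matching the "word 1 \t word 2 \t co-occurrence count" triples of step (4).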
Further, in steps (5) to (8), for two words w_i and w_j that possess similar contexts, the relationship between w_i and w_j can be embodied through their relationship with a third word w̃_k. Modeling the relationship between w_i and w_j gives:
F(w_i, w_j, w̃_k) = P_ik / P_jk
In this equation, w_i and w_j denote the two center words with similar contexts, w̃_k is a context word vector, and P_ik is the probability that w_i and w̃_k occur together.
The ratio on the right of the equation is the model output and represents the relationship between the words to be predicted. Keeping the output of this initial model unchanged, the input of the initial model is simplified in order to establish an optimizable objective function. Considering that the vector space has an inherently linear structure, the input is restricted to depend only on the difference of the two center word vectors, giving:
F(w_i − w_j, w̃_k) = P_ik / P_jk
Since the right side of the equation is a scalar, the input vectors could be turned into a scalar by some complicated linear or nonlinear transformation, but that would increase the complexity of the model and spoil its linear structure. To avoid this, the vector operation is taken to be a dot product, which portrays the relationship between the two words:
F((w_i − w_j)^T w̃_k) = P_ik / P_jk        (1)
To turn the left side of the equation into a ratio as well, combined with the continuity condition, the general solution of the functional equation on the left side is F(x) = e^{ax}; considering that the norms of the word vectors can be normalized, we directly take F(x) = e^x, so that:
F((w_i − w_j)^T w̃_k) = e^{w_i^T w̃_k} / e^{w_j^T w̃_k}        (2)
Equating numerator with numerator and denominator with denominator in formulas (1) and (2) gives:
e^{w_i^T w̃_k} = P_ik
that is:
w_i^T w̃_k = log P_ik
and further, absorbing the terms that do not depend on the pair into bias terms:
w_i^T w̃_k + b_i + b̃_k = log X_ik
The objective function of the GloVe model is accordingly:
J = Σ_{i,k∈V} f(X_ik) (w_i^T w̃_k + b_i + b̃_k − log X_ik)²
where X_ik is the co-occurrence count and f(·) is GloVe's truncated weighting function. This method modifies that objective by replacing the co-occurrence statistic X_ik with the pointwise mutual information PMI(i,k) = log( P_ik / (P_i P_k) ) and dropping the truncation function, so that the final objective function of the word vector model is:
J = Σ_{i,k∈V} (w_i^T w̃_k + b_i + b̃_k − PMI(i,k))²
Training by gradient descent yields the word vector w_i of the target word; repeating the above operation for each word yields the word vectors of all words.
Further, step (S2) specifically includes:
From the word vector data stored by the mutual-information global word vector model, the word vector w ∈ R^{d×1} corresponding to each word in the training corpus is found by matching (a word vector is a one-dimensional vector of length d), and these word vectors are combined in the order of the original sentence (w_1, w_2, w_3, …, w_s) into the sentence matrix S_0 (S_0 ∈ R^{d×s}), where d is the dimension of the word vectors and s is the word count of the longest sentence in the corpus, i.e. the sentence length.
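Assembling S_0 amounts to an embedding lookup plus zero-padding of sentences shorter than s; a minimal sketch follows (the toy vocabulary and the choices d = 3, s = 4 are assumptions):

```python
def sentence_matrix(words, embeddings, d, s):
    """Stack the word vectors of a sentence column-wise into a d x s matrix,
    zero-padding sentences shorter than s."""
    cols = [embeddings[w] for w in words[:s]]
    cols += [[0.0] * d] * (s - len(cols))          # "zero padding"
    # transpose: row r of the matrix holds dimension r of every word
    return [[col[r] for col in cols] for r in range(d)]

emb = {"深度": [0.1, 0.2, 0.3], "学习": [0.4, 0.5, 0.6]}
S0 = sentence_matrix(["深度", "学习"], emb, d=3, s=4)
print(len(S0), len(S0[0]))   # 3 4
print(S0[0])                 # [0.1, 0.4, 0.0, 0.0]
```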
Further, step (S3) specifically includes:
(1) representing the classification labels in the Chinese Wikipedia data set as Boolean vectors y_j ∈ Y (j ∈ (1, 2, …, l)), where y_j denotes the vector representation of class j for text i, Y is the set of classes, l is the total number of classes, and the dimension of the vector y_j is l;
(2) performing convolution operations on the obtained sentence matrix S_0 (S_0 ∈ R^{d×s}) with convolution kernels of preferred size 3 × 2, obtaining the feature matrix S_1;
(3) applying a 2 × 2 max-pooling operation to the obtained feature matrix S_1, extracting the maximum value of each 2 × 2 block and reassembling them into a new feature matrix S_2;
(4) repeating step (2) and step (3) until the feature matrix S_n (where n denotes the total number of convolution and pooling operations performed) contains only l numbers;
(5) unrolling the feature matrix S_n into a one-dimensional vector y⁻ of length l, then computing the probability value ŷ_i of text i in each dimension by the softmax function, and finally computing by the cross-entropy function the distance d_i between ŷ_i and the correct category ŷ*_i (i.e. the category vector of the class the text itself belongs to);
(6) accumulating the d_i and carrying out model training with the objective of minimizing Σ_{i=1}^{M} d_i (where M denotes the number of texts in the entire training corpus), and saving the parameters of the resulting CNN classification model.
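The convolution and pooling of steps (2) and (3) can be sketched on plain nested lists. This is a minimal illustration only; a real implementation would use an ML framework, and the single 3 × 2 kernel with zero bias is an assumption.

```python
def conv2d(S, W, b=0.0):
    """Valid convolution of matrix S with kernel W, plus bias and ReLU."""
    kh, kw = len(W), len(W[0])
    out = []
    for i in range(len(S) - kh + 1):
        row = []
        for j in range(len(S[0]) - kw + 1):
            v = b + sum(W[a][c] * S[i + a][j + c]
                        for a in range(kh) for c in range(kw))
            row.append(max(0.0, v))      # activation f(.) taken as ReLU
        out.append(row)
    return out

def maxpool2x2(S):
    """2 x 2 max pooling with stride 2."""
    return [[max(S[i][j], S[i][j + 1], S[i + 1][j], S[i + 1][j + 1])
             for j in range(0, len(S[0]) - 1, 2)]
            for i in range(0, len(S) - 1, 2)]

S0 = [[1.0, 0.0, 2.0, 1.0],
      [0.0, 1.0, 0.0, 2.0],
      [1.0, 2.0, 1.0, 0.0],
      [2.0, 0.0, 0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # a 3 x 2 kernel
S1 = conv2d(S0, W)
S2 = maxpool2x2(S1)
print(len(S1), len(S1[0]))   # 2 3  (valid convolution shrinks 4x4 to 2x3)
print(len(S2), len(S2[0]))   # 1 1
```

Each convolution/pooling round shrinks the feature matrix, which is how repeating the two steps eventually reduces it to l numbers.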
Further, the sentence matrix S_0 is convolved with the kernel parameters (W, b) to obtain the feature matrix S_1:
S_1 = f(S_0 · W + b)        (3)
where W is the convolution kernel parameter matrix, b is the bias vector, and f(·) denotes the activation function. Based on the obtained feature matrix S_1, pooling is computed by the max-pooling method:
S_2 = downsample(S_1)        (4)
where downsample(·) denotes the pooling function. By repeated application of formulas (3) and (4), the final feature matrix S_n is obtained and unrolled into the one-dimensional vector y⁻ of length l; the probability value vector ŷ_i of the text over the classes is computed by the softmax function, and the distance d_i between ŷ_i and ŷ*_i is computed by the cross-entropy function:
ŷ_ik = e^{y⁻_ik} / Σ_{k′=1}^{l} e^{y⁻_ik′}
d_i = − Σ_{k=1}^{l} ŷ*_ik log ŷ_ik
where y⁻_ik denotes the k-th (1 ≤ k ≤ l) value of the one-dimensional vector y⁻, and ŷ_ik denotes the probability of text i in the k-th dimension of the category vector. The final objective function is:
Loss = d_i
This method computes the parameters (W, b) by gradient descent so as to minimize the loss, and the final (W, b) are saved as model parameters for use when classifying texts.
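The softmax and cross-entropy computations above can be sketched directly (a minimal illustration; the three-class scores are made up):

```python
import math

def softmax(scores):
    """Normalize raw scores into a probability vector (max-shifted for stability)."""
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    z = sum(exps)
    return [e / z for e in exps]

def cross_entropy(p, target):
    """Distance d_i = -sum_k target_k * log(p_k) against a one-hot label."""
    return -sum(t * math.log(q) for t, q in zip(target, p))

y_bar = [2.0, 0.5, -1.0]         # unrolled feature vector, l = 3 classes
p = softmax(y_bar)
d = cross_entropy(p, [1, 0, 0])  # true class is the first one
print(round(sum(p), 6))                  # 1.0
print(d < cross_entropy(p, [0, 0, 1]))   # True: nearer to the correct class
```

The cross-entropy value against the correct one-hot label is smaller than against a wrong one, which is exactly why minimizing it (and, at prediction time, picking the smallest distance) works.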
Further, step (S5) specifically includes:
(1) preprocessing the text to be classified in the same way as the training samples, computing its sentence matrix S′_0 (S′_0 ∈ R^{d×s}), and performing convolution operations with the convolution kernels of preferred size 3 × 2, obtaining the feature matrix S′_1;
(2) applying the 2 × 2 max-pooling operation to the obtained feature matrix S′_1, extracting the maximum value of each 2 × 2 block and reassembling them into a new feature matrix S′_2;
(3) repeating the convolution and pooling operations until the feature matrix S′_n likewise contains only l numbers;
(4) unrolling the feature matrix S′_n into a one-dimensional vector y′⁻ of length l, then computing the probability value vector ŷ′ of the text in each class by the softmax function, and finally computing by the cross-entropy function the distance d′_j between ŷ′ and each class label y_j;
(5) finally, among the obtained l distances d′_j, finding the class corresponding to the smallest distance, which is the class label of the text: label′(j) = min(d′_j)
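Step (5) simply picks the class whose label vector is nearest in cross-entropy; a minimal sketch (the distance values are made up):

```python
def nearest_class(distances):
    """Return the index j of the smallest distance d'_j, i.e. the predicted class."""
    return min(range(len(distances)), key=lambda j: distances[j])

d_prime = [1.8, 0.2, 2.5]   # cross-entropy distance to each of l = 3 class labels
print(nearest_class(d_prime))   # 1
```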
Compared with the prior art, the present invention has the following advantages and technical effects:
This method fully extracts the semantic information and local features of the text context, introducing the deep-learning convolutional neural network (CNN) method on top of the original classification methods. The CNN was originally applied to feature extraction in image processing, and its excellent ability to extract local information makes it well suited as a feature extraction method for text word vector matrices. The method first remedies the shortcomings of GloVe word vectors in semantic capture and in the statistics of the co-occurrence matrix: the global pointwise mutual information matrix is introduced into the word vector computation, making the statistical information of the word vectors more accurate, while the objective function of the GloVe model is improved and the truncation operation removed, reducing the complexity of model training. In addition, combined with the CNN feature extraction model, the key information of a text can be found effectively and the meaning of the text determined accurately. The present invention can accurately mine the classification characteristics of texts, is suitable for text classification in various fields, and has great practical value.
Description of the drawings
Fig. 1 is the flow chart of the word vector model based on mutual information and the text classification method based on CNN.
Fig. 2 is the flow chart of the word vector training method based on mutual information.
Fig. 3 is the training flow chart of the text classification model based on CNN.
Specific embodiment
The solution of the present invention has been explained in sufficient detail in the summary above. The specific implementation of the invention is described in detail below in conjunction with the drawings and specific embodiments, but the implementation of the invention is not limited thereto. It should be noted that any process or symbol not specially detailed below can be understood or realized by those skilled in the art with reference to the prior art; for example, conventional parameters such as w and b in the CNN can be understood by reference to existing CNN theory and are not repeated below.
Referring to Fig. 1, the word vector model based on mutual information and the text classification method based on CNN in this example comprise:
(S1) training a word vector model by the global word vector method based on pointwise mutual information;
(S2) determining the word vector matrix of the text according to the trained word vector model;
(S3) extracting features from the word vector matrix through a convolutional neural network (CNN) and training a classification model;
(S4) extracting the features of an input text according to the trained word vector model and the CNN feature extraction model;
(S5) from the text features obtained by the CNN feature extraction model, computing the mapping distance between the text and the preset categories by the softmax and cross-entropy methods, and taking the nearest category as the category of the text.
1. Training word vectors based on pointwise mutual information
For a language model that trains word vectors from statistical information, the key to training is how to portray the relationships between words with comprehensive and accurate information. The present invention therefore improves the GloVe model. Through derivation, it is found that the pointwise mutual information matrix between words portrays the statistical relationships between words better. As in Fig. 2, the specific technical solution is as follows:
The word vector training method based on mutual information comprises the following steps:
(1) inputting the Chinese Wikipedia data set and preprocessing the data by removing punctuation marks and spaces;
(2) applying a Chinese word segmentation tool to the data set obtained in step (1), converting the corpus data into word sequences;
(3) computing word frequency statistics over the words obtained in step (2) and saving the results to hard disk in the format "word \t frequency";
(4) computing co-occurrence statistics over the words obtained in step (2): the corpus is traversed with the preset window size to obtain the co-occurrence count of every two words within the window, and the results are saved to hard disk as triples in the format "word 1 \t word 2 \t co-occurrence count";
(5) randomly shuffling the triples obtained in step (4) and storing the shuffled triples on hard disk in the same format "word 1 \t word 2 \t co-occurrence count";
(6) randomly initializing a word vector for every word appearing in step (2) and keeping the vectors in memory, where the program can conveniently read and modify them;
(7) completely traversing the triples obtained in step (5) and adjusting the word vectors by gradient descent according to the objective function
J = Σ_{i,k∈V} (w_i^T w̃_k + b_i + b̃_k − PMI(i,k))²
where w_i and w̃_k are the word vectors of the center word and the context word, b_i and b̃_k their bias terms, and V the vocabulary;
(8) repeating step (7) iteratively until the result converges, which yields the word vectors based on mutual information; the word vectors in memory are then saved to hard disk in the format "word \t word vector".
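The pointwise mutual information entering the objective of step (7) can be computed directly from the word-frequency and co-occurrence statistics saved in steps (3) and (4); a minimal sketch with made-up counts:

```python
import math

def pmi(co_count, freq_i, freq_k, total_pairs, total_words):
    """PMI(i, k) = log( P_ik / (P_i * P_k) ), computed from raw counts."""
    p_ik = co_count / total_pairs     # joint probability of the pair
    p_i = freq_i / total_words        # marginal probability of word i
    p_k = freq_k / total_words        # marginal probability of word k
    return math.log(p_ik / (p_i * p_k))

# Toy counts: the pair is seen 8 times out of 100 windowed pairs;
# each word is seen 10 times out of 100 tokens.
val = pmi(8, 10, 10, 100, 100)
print(val > 0)   # True: the pair co-occurs more often than chance
```

Because the marginal probabilities sit in the denominator, a pair of merely frequent words gets no credit unless it co-occurs more than chance predicts, which is the claimed robustness to high-frequency words.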
In the above word vector training method based on mutual information, in steps (5) to (8), the method rests on the hypothesis that "for two words w_i and w_j that possess similar contexts, the relationship between w_i and w_j can be embodied through their relationship with a third word w̃_k", and the relationship between w_i and w_j is modeled as:
F(w_i, w_j, w̃_k) = P_ik / P_jk
The ratio on the right of the equation is the model output and represents the relationship between the words to be predicted. Keeping the model output unchanged, the model input is simplified in order to establish an optimizable objective function. Considering that the vector space has an inherently linear structure, the input function form is restricted to depend only on the difference of the two center word vectors, giving:
F(w_i − w_j, w̃_k) = P_ik / P_jk
Since the right side of the equation is a scalar, the input vectors could be converted into a scalar by some complicated linear or nonlinear transformation, but that would increase the complexity of the model and spoil its linear structure. To avoid this, the vector operation is performed as a dot product, portraying the relationship between the two words:
F((w_i − w_j)^T w̃_k) = P_ik / P_jk        (1)
To convert the left side of the equation into a ratio as well, combined with the continuity condition, the general solution of the functional equation on the left side is F(x) = e^{ax}; considering that the norms of the word vectors can be normalized, we may as well take F(x) = e^x directly, so that:
F((w_i − w_j)^T w̃_k) = e^{w_i^T w̃_k} / e^{w_j^T w̃_k}        (2)
Letting the numerators and denominators in formulas (1) and (2) be equal to each other gives:
e^{w_i^T w̃_k} = P_ik
that is:
w_i^T w̃_k = log P_ik
and further, absorbing the terms independent of the pair into bias terms:
w_i^T w̃_k + b_i + b̃_k = log X_ik
The objective function of the GloVe model is accordingly modified by replacing the co-occurrence count X_ik in the original objective with the pointwise mutual information PMI(i,k).
The objective function of the GloVe model:
J = Σ_{i,k∈V} f(X_ik) (w_i^T w̃_k + b_i + b̃_k − log X_ik)²
The objective function of this method:
J = Σ_{i,k∈V} (w_i^T w̃_k + b_i + b̃_k − PMI(i,k))²
From the above formulas it can be seen that the pointwise mutual information in this model takes the occurrence probabilities of the two individual words into account in its denominator, so it is not disturbed by high-frequency words and portrays the relationships between words better, making the trained word vectors reflect word meaning more accurately. In addition, comparing the objective functions of this model and of the GloVe model, the objective function defined by this method is clearly simpler in form and eliminates the truncation-function operation, so the amount of computation is effectively reduced.
After the objective function is determined, the model can carry out the word vector training process. First, by traversing the corpus, the pointwise mutual information (PMI) matrix between words is obtained statistically. The model then traverses the values in the PMI matrix while training the word vectors by gradient descent. Through continual iteration, the correct word vector representations are finally obtained.
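One step of this traversal, i.e. a single gradient-descent update of J = (w_i·w̃_k + b_i + b̃_k − PMI(i,k))² for one word pair, can be sketched as follows (a minimal illustration; the vector dimension, learning rate, and toy PMI value are assumptions):

```python
def sgd_step(w, w_ctx, b, b_ctx, pmi, lr=0.05):
    """One gradient-descent update of (w.w_ctx + b + b_ctx - pmi)^2
    for a single (center word, context word) pair."""
    dot = sum(a * c for a, c in zip(w, w_ctx))
    err = dot + b + b_ctx - pmi          # residual inside the squared term
    grad = 2.0 * err
    new_w     = [a - lr * grad * c for a, c in zip(w, w_ctx)]
    new_w_ctx = [c - lr * grad * a for a, c in zip(w, w_ctx)]
    return new_w, new_w_ctx, b - lr * grad, b_ctx - lr * grad

w, wc = [0.1, -0.2], [0.3, 0.4]
b = bc = 0.0
for _ in range(200):
    w, wc, b, bc = sgd_step(w, wc, b, bc, pmi=1.5)
# after repeated updates the residual shrinks toward zero
err = sum(a * c for a, c in zip(w, wc)) + b + bc - 1.5
print(abs(err) < 1e-3)   # True
```

The full training loop would apply this update once per shuffled triple per epoch, until the accumulated loss converges.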
Finally, through the above steps, each text in the data set is represented as a word vector matrix of size s × d, where s denotes the word count of the longest text in the text set S to be classified and d denotes the dimension of each word vector; texts shorter than s are completed by a "zero padding" operation.
2. Establishing the text classification model based on CNN
Based on the text representation matrices obtained above, this method trains a classification model on the collected labeled corpus; referring to Fig. 3, the specific steps are as follows:
(1) the classification labels in the data set (such as the Tan Songbo data set) are vectorized as ŷ*_i, where ŷ*_i denotes the vector representation of text i belonging to class j (the dimension of the vector ŷ*_i is l) and Y is the set of classes;
(2) based on the obtained sentence matrix S_0, this method performs convolution operations with convolution kernels of preferred size 3 × 2, obtaining the feature matrix S_1;
(3) a 2 × 2 max-pooling operation is applied to the obtained feature matrix S_1, extracting the maximum value of each 2 × 2 block and reassembling them into a new feature matrix S_2;
(4) step (2) and step (3) are repeated until the feature matrix S_n contains only l values;
(5) the feature matrix S_n is unrolled into a one-dimensional vector y⁻ of length l, then the probability value vector ŷ of the text in each class is computed by the softmax function, and finally the distance d_i between ŷ and ŷ* is computed by the cross-entropy function;
(6) the d_i are accumulated, model training is carried out with the objective of minimizing Σ d_i, and the model parameters are saved. In Fig. 3, n is the iteration count and N is the set maximum number of iterations.
In the training of the above CNN classification model, the sentence matrix S_0 is convolved with the kernel parameters (W, b) to obtain the feature matrix S_1:
S_1 = f(S_0 · W + b)        (3)
where f(·) denotes the activation function. Based on the obtained feature matrix S_1, pooling is computed by the max-pooling method:
S_2 = downsample(S_1)        (4)
where downsample(·) denotes the pooling function. By repeated application of formulas (3) and (4), the final feature matrix S_n is obtained and unrolled into the one-dimensional vector y⁻ of length l; the probability value vector ŷ of the text in each class is computed by the softmax function, and the distance d_i between ŷ and ŷ* is computed by the cross-entropy function:
ŷ_ik = e^{y⁻_ik} / Σ_{k′=1}^{l} e^{y⁻_ik′}
d_i = − Σ_{k=1}^{l} ŷ*_ik log ŷ_ik
where y⁻_ik denotes the k-th (1 ≤ k ≤ l) value of the one-dimensional vector y⁻ and ŷ_ik denotes the probability of text i in the k-th dimension of the category vector. The final objective function is:
Loss = d_i
This method computes the parameters (W, b) by gradient descent so as to minimize the loss.
3. Text classification
Based on the obtained word vector model and CNN classification model, texts can be classified; the detailed process is as follows:
(1) the sentence matrix S_0 is computed from the text to be classified, and this method performs convolution operations with the convolution kernels of preferred size 3 × 2, obtaining the feature matrix S_1;
(2) a 2 × 2 max-pooling operation is applied to the obtained feature matrix S_1, extracting the maximum value of each 2 × 2 block and reassembling them into a new feature matrix S_2;
(3) step (1) and step (2) are repeated until the feature matrix S_n contains only l values;
(4) the feature matrix S_n is unrolled into a one-dimensional vector y⁻ of length l, then the probability value vector ŷ of the text in each class is computed by the softmax function, and finally the distance d_j between ŷ and each label y_j is computed by the cross-entropy function;
(5) based on the obtained distances d_j, the class with the smallest distance is selected as the category label of the text by minimizing:
Label = min(d_j)
As stated above, this method fully extracts the semantic information and local features of the text context: the global pointwise mutual information matrix makes the statistical information of the word vectors more accurate, the improved GloVe objective without the truncation operation reduces training complexity, and the CNN feature extraction model effectively finds the key information of a text and accurately determines its meaning.