CN108388654A - Sentiment classification method based on turning sentence semantic block division mechanism - Google Patents

Sentiment classification method based on turning sentence semantic block division mechanism

Info

Publication number
CN108388654A
CN108388654A
Authority
CN
China
Prior art keywords
vector
training
sample
term vector
convolution kernel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810171490.XA
Other languages
Chinese (zh)
Other versions
CN108388654B (en)
Inventor
张玉红
王勤勤
李玉玲
李培培
胡学钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN201810171490.XA
Publication of CN108388654A
Application granted
Publication of CN108388654B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/355 - Class or cluster creation or modification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/353 - Classification into predefined classes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a sentiment classification method based on a semantic block division mechanism for turning (adversative) sentences. The steps include: 1. using a known word-vector dictionary, represent every sample in the training set and test set as a word-vector matrix; 2. choose suitable convolution kernels to convolve the word-vector matrix and extract feature-map vectors, achieving dimensionality reduction; 3. build an adversative-word dictionary and, by querying the position of the adversative word in the sample, divide the extracted feature maps semantically and extract the most important information in each division block to form the final feature space; 4. train a classifier on the final feature space and classify the samples in the test set. Based on the constructed adversative-word dictionary, the invention divides sentences into semantic blocks, obtains the important semantic information of each segment while taking the positional structure of the sentence into account, and thereby improves the accuracy of text sentiment classification.

Description

Sentiment classification method based on turning sentence semantic block division mechanism
Technical field
The invention belongs to the field of sentiment classification in natural language processing, and in particular provides effective sentiment classification for expressions that contain multiple sentiments, such as turning (adversative) sentences that first praise and then criticize, or first criticize and then praise.
Background art
With the rapid development of the Internet, text messages that use the network as a communication medium have drawn increasing attention from enterprises, institutions, and individuals. Through the network information, government departments can understand public intent; enterprises can learn users' opinions of their products from product reviews and improve product quality; and consumers can guide their purchasing behavior by reading product reviews. However, a large number of new comments appear online every day, and a comment may start with a positive attitude but, after other factors are considered, turn negative; that is, attitudes may rise then fall, fall then rise, or change repeatedly. For example, social networks (domestic Tencent/Sina Weibo and Renren, foreign Facebook and Twitter, etc.) generate large amounts of user data every day, containing text in which many people express personal views on events. One comment on Tencent News about the "little yellow bike death case" reads: "Compensation in the spirit of humanitarianism is understandable, but holding the bike-sharing provider responsible is extremely illogical. Moreover, as the victim, you yourself are also at fault; if you had not misused a shared bicycle, the present situation would not have arisen." It first takes the definite position that compensation should be given to the victim, but then turns to express that the victim also needs to bear responsibility. Likewise, shopping platforms (JD.com, Suning, Tmall, etc.) generate massive amounts of online-purchase review information every day. A review by a JD.com phone buyer reads: "The sound is too small, and phone calls are a struggle; even at maximum volume the effect is not obvious. But the network speed is fast and the appearance is brand new. Overall a favorable review." It first states disadvantages and then expresses satisfaction. Besides sentiment words that reveal polarity, these real-life comments contain adversative words and carry both positive and negative sentiment at once. This characteristic makes text sentiment classification more complicated and poses severe challenges to traditional data-mining algorithms and existing machine-learning methods:
Challenge one: traditional unsupervised methods based on sentiment dictionaries parse the sentiment polarity of the words in a sentence from the dictionary and determine the overall sentiment orientation of the sentence by simply summing those polarities. Because they do not distinguish the importance of individual words, it is clearly difficult for them to obtain good results.
Challenge two: machine-learning methods for text sentiment-orientation analysis (k-nearest neighbors, support vector machine (SVM), Bayes, etc.) have the following main problems: 1) with the traditional bag-of-words representation, the text vectors are high-dimensional and sparse, which is unfavorable to model training; 2) they consider only the syntactic structure of features and ignore their semantic information, causing semantic mismatches in the feature-mapping result and preventing a good representation of document semantics.
Challenge three: existing deep-learning methods can learn sentence features; typical network structures include the recurrent neural network (RNN) and the convolutional neural network (CNN). Both models represent the feature space with word vectors, extract sentence features by semantic composition, and finally use a classifier to decide sentiment polarity. Compared with RNN models, a CNN has fewer parameters, captures the semantic features of text better, and has far lower time complexity. However, a traditional CNN used for sentiment analysis ignores the structure of the sentence: its max-pooling step extracts a single maximum value from the sentence features according to importance alone and makes no distinction about sentence structure, so the method performs poorly on turning sentences.
Summary of the invention
To solve the three challenges above, the present invention provides a sentiment classification method based on a turning-sentence semantic block division mechanism. Based on a constructed adversative-word dictionary, it segments the feature-mapping space to divide a sentence into semantic blocks, obtains the important semantic information of each segment while considering the positional structure of the sentence, and thereby improves the accuracy of text sentiment-orientation analysis.
To achieve the above goal, the present invention adopts the following technical scheme:
The sentiment classification method based on a turning-sentence semantic block division mechanism of the present invention is characterized in that it is carried out as follows:
Step 1: word-vector representation of the samples in the training set and test set
Step 1.1: build the word-vector dictionary D
External corpora are obtained from the network and trained to obtain the word-vector dictionary D, which is used to look up the word vectors of the words in the training and test sets; the dimension of the word vectors is set to |V|.
Step 1.2: represent the samples in the training and test sets as word vectors
Obtain |I| comment texts to form the training set DS = {s1, s2, …, si, …, s|I|} and |J| comment texts to form the test set DT = {t1, t2, …, tj, …, t|J|}, where si and tj denote the i-th training sample in DS and the j-th test sample in DT respectively, with si = {w1^i, w2^i, …, wm^i, …, wM^i}, wm^i denoting the m-th word of the i-th training sample si, and tj = {w1^j, w2^j, …, wn^j, …, wN^j}, wn^j denoting the n-th word of the j-th test sample tj; i = 1, 2, …, |I|; m = 1, 2, …, M; j = 1, 2, …, |J|; n = 1, 2, …, N.
According to the word-vector dictionary D, look up the word vector xm^i of the m-th word wm^i of the i-th training sample si in DS, obtaining the word-vector matrix Si = [x1^i; x2^i; …; xM^i] of si, an M × |V| matrix.
Similarly obtain the word-vector matrix Tj = [x1^j; x2^j; …; xN^j] of the j-th test sample tj in DT, where xn^j denotes the word vector of the n-th word wn^j of tj.
Step 2: set convolution kernels and perform the convolution calculation
Step 2.1: set up convolution kernels of K different sizes, denoted {W1, W2, …, Wk, …, WK}, where Wk = {Wk^1, …, Wk^g, …, Wk^G} denotes the set of the k-th-size convolution kernels of height hk and width nk, each kernel being an hk × nk matrix; Wk^g denotes the g-th convolution kernel in the k-th-size set and is randomly initialized.
Step 2.2: with the g-th convolution kernel Wk^g as a sliding window, use formula (1) to convolve Wk^g with the segment Si[τ : τ+hk−1] of the word-vector matrix Si of the i-th training sample si covered by the window, obtaining the τ-th value cτ^{k,g} of the single-feature map vector Ci^{k,g} = [c1^{k,g}, c2^{k,g}, …, c_{M−hk+1}^{k,g}]; convolving the kernels of all K sizes with the word-vector matrix Si then yields the feature-map vector Ci of the i-th training sample si:
cτ^{k,g} = σ(Wk^g ⊗ Si[τ : τ+hk−1] + bc)    (1)
In formula (1), Si[τ : τ+hk−1] denotes the sub-matrix of rows τ to τ+hk−1 covered by the current sliding window, bc denotes a bias vector, σ(·) is the activation function, Ci^{k,g} is a 1 × (M−hk+1) vector, and 1 ≤ τ ≤ M−hk+1.
Step 3: build the turning-word dictionary ZD, look up adversative words in the i-th training sample si of the training set DS, segment the feature-map vector Ci according to the adversative-word position, and extract the single most important feature from each segment, so that several segments yield several features.
Step 3.1: build the turning-word dictionary ZD and, according to ZD, check whether the i-th training sample si in DS contains an adversative word; if si contains the L-th adversative word zL of ZD, record the position of zL in si as p.
Step 3.2: according to the size of the g-th convolution kernel Wk^g, obtain the position p − hk + 1 of the adversative word zL in the single-feature map vector Ci^{k,g} of the word-vector matrix Si of si, and take it as the division point.
Step 3.3: according to the division point, divide the single-feature map vector Ci^{k,g} of the feature-map vector Ci into two segments of single-feature map vectors, Ci^{k,g,1} and Ci^{k,g,2}.
Step 3.4: apply max pooling to the two segments Ci^{k,g,1} and Ci^{k,g,2}, obtaining the maximum of each segment, m1^{k,g} and m2^{k,g}, which form the maximum pair (m1^{k,g}, m2^{k,g}) of the g-th convolution kernel of the k-th size; K × G maximum pairs are obtained in this way.
Step 3.5: concatenate the K × G maximum pairs, thereby obtaining the final feature representation space Fi of the i-th training sample si.
Step 4: build the classification model based on the feature representation space Fi
Step 4.1: set a zeroing vector r according to a Bernoulli distribution; r is a vector of 0s and 1s of the same dimension as the feature representation space Fi.
Step 4.2: build the softmax classifier O for the training set DS using formula (2):
O(si) = f(Wo · (r ∘ Fi) + bo)    (2)
In formula (2), f(·) is the activation function, Wo is a weight parameter, and bo is another bias vector.
Step 4.3: optimize the loss function by gradient descent so as to train the softmax classifier O(·), obtaining the optimized softmax classifier O*(·) as the classification model.
Step 5: use the classification model O*(·) to perform sentiment classification on the j-th test sample tj in the test set DT, obtaining the probability of each sentiment category, and take the sentiment category with the maximum probability as the final sentiment classification result.
Compared with the prior art, the beneficial effects of the present invention are:
1. To capture richer text features, the present invention performs the convolution operation with multiple kernels of different sizes, obtaining multi-source feature-map vectors and thereby extracting higher-quality features.
2. The present invention segments sentences based on adversative-word positions and selects the most important feature from each segment. This overcomes the dimensionality curse and sparsity of conventional methods and their neglect of text semantics, which lower classification precision; it also overcomes the failure of deep-learning methods to account for the turning phenomenon in sentences, where extracting features by importance alone loses important feature information. The accuracy of text sentiment classification is therefore improved.
3. The present invention applies Bernoulli-distributed zeroing when building the final classification model, which effectively prevents the model from overfitting and gives it better generalization ability.
4. The present invention is oriented to practical applications. For example, the sentiment orientation of users' views on events in social networks can help government departments discover and track public-opinion trends in time; the sentiment orientation of online shoppers' product reviews can provide prediction and early warning for merchants and consumers, offering suggestions for merchants' sales and service-quality strategies and recommendations for consumers' purchasing behavior.
Description of the drawings
Fig. 1 is a flow chart of the sentiment classification method of the present invention;
Fig. 2 is a schematic diagram of the convolution operation performed with multiple kernels;
Fig. 3 is a schematic diagram of semantic segmentation based on adversative-word positions;
Fig. 4 is a schematic diagram of building the classification model with only part of the features of the feature space.
Detailed description of the embodiments
In this embodiment, as shown in Fig. 1, a sentiment classification method based on a turning-sentence semantic block division mechanism is carried out as follows:
Step 1: word-vector representation of the samples in the training set and test set
Step 1.1: build the word-vector dictionary D
External corpora are obtained from the network and trained to obtain the word-vector dictionary D, used to look up the word vectors of the words in the training and test sets. This embodiment is based on the GoogleNews corpus (about 100 billion words); the corpus is trained with the word2vec tool published by Google, and the resulting word-vector file GoogleNews-vectors-negative300.bin serves as the word-vector dictionary D. The word-vector dimension is |V| = 300 in this embodiment.
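For illustration only, the following minimal Python sketch (using the publicly available gensim library) shows how the word-vector dictionary D might be loaded and a tokenized sample turned into its M × |V| word-vector matrix; the file path, the toy sentence, and the zero-vector fallback for out-of-vocabulary words are assumptions, not details fixed by the patent.

    import numpy as np
    from gensim.models import KeyedVectors

    # Load the pre-trained GoogleNews word2vec vectors (|V| = 300) as dictionary D.
    D = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True)

    def word_vector_matrix(words, dim=300):
        """Stack the word vectors of a tokenized sample into an M x |V| matrix.
        Words absent from D fall back to a zero vector (an assumption here)."""
        rows = [D[w] if w in D else np.zeros(dim, dtype=np.float32) for w in words]
        return np.vstack(rows)  # shape: (M, 300)

    S_i = word_vector_matrix("the phone looks great but the speaker is weak".split())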
Step 1.2: represent the samples in the training and test sets as word vectors
Obtain |I| comment texts to form the training set DS = {s1, s2, …, si, …, s|I|} and |J| comment texts to form the test set DT = {t1, t2, …, tj, …, t|J|}, where si and tj denote the i-th training sample in DS and the j-th test sample in DT respectively, with si = {w1^i, w2^i, …, wm^i, …, wM^i}, wm^i denoting the m-th word of the i-th training sample si, and tj = {w1^j, w2^j, …, wn^j, …, wN^j}, wn^j denoting the n-th word of the j-th test sample tj; i = 1, 2, …, |I|; m = 1, 2, …, M; j = 1, 2, …, |J|; n = 1, 2, …, N.
According to the word-vector dictionary D, look up the word vector xm^i of the m-th word wm^i of the i-th training sample si in DS, obtaining the word-vector matrix Si = [x1^i; x2^i; …; xM^i] of si, an M × |V| matrix.
Similarly obtain the word-vector matrix Tj = [x1^j; x2^j; …; xN^j] of the j-th test sample tj in DT, where xn^j denotes the word vector of the n-th word wn^j of tj.
Step 2: set convolution kernels and perform the convolution calculation
Step 2.1: set up convolution kernels of K different sizes, denoted {W1, W2, …, Wk, …, WK}, where Wk = {Wk^1, …, Wk^g, …, Wk^G} denotes the set of the k-th-size convolution kernels of height hk and width nk, each kernel being an hk × nk matrix; Wk^g denotes the g-th convolution kernel in the k-th-size set and is randomly initialized.
Step 2.2: with the g-th convolution kernel Wk^g as a sliding window, use formula (1) to convolve Wk^g with the segment Si[τ : τ+hk−1] of the word-vector matrix Si covered by the window, obtaining the τ-th value cτ^{k,g} of the single-feature map vector Ci^{k,g} = [c1^{k,g}, c2^{k,g}, …, c_{M−hk+1}^{k,g}]; convolving the kernels of all K sizes with the word-vector matrix Si of the i-th training sample si then yields the feature-map vector Ci:
cτ^{k,g} = σ(Wk^g ⊗ Si[τ : τ+hk−1] + bc)    (1)
In formula (1), Si[τ : τ+hk−1] denotes the sub-matrix of rows τ to τ+hk−1 covered by the current sliding window, bc denotes a bias vector, σ(·) is the ReLU activation function, Ci^{k,g} is a 1 × (M−hk+1) vector, and 1 ≤ τ ≤ M−hk+1. The concrete operation of the convolution ⊗ is Wk^g ⊗ Si[τ : τ+hk−1] = Σe Σf Wk^g[e, f] · Si[τ : τ+hk−1][e, f], where Wk^g[e, f] and Si[τ : τ+hk−1][e, f] denote the elements in row e, column f of the g-th convolution kernel Wk^g and of the segment Si[τ : τ+hk−1], respectively.
Fig. 2 shows a schematic diagram of the convolution operation with kernels of three sizes: 2 × 300, 3 × 300, and 4 × 300, i.e., hk = 2, 3, 4, with n1 = n2 = n3 = |V| = 300. There are 100 kernels of each size, i.e., G = 100. In Fig. 2, the 3 × 100 kernels are each convolved with the word-vector matrix Si of the i-th training sample si.
When the kernel Wk^g is of size 2 × 300, the single-feature map vector obtained after convolution is an (M−2+1)-dimensional vector; the 100 kernels of size 2 × 300 yield 100 such (M−2+1)-dimensional vectors.
When the kernel Wk^g is of size 3 × 300, the single-feature map vector obtained after convolution is an (M−3+1)-dimensional vector; the 100 kernels of size 3 × 300 yield 100 such (M−3+1)-dimensional vectors.
When the kernel Wk^g is of size 4 × 300, the single-feature map vector obtained after convolution is an (M−4+1)-dimensional vector; the 100 kernels of size 4 × 300 yield 100 such (M−4+1)-dimensional vectors.
Convolving the 3 × 100 kernels in Fig. 2 with the word-vector matrix Si of the i-th training sample si thus yields 3 × 100 single-feature map vectors, which together form the feature-map vector Ci of Si.
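A minimal numpy sketch of step 2.2, assuming ReLU as σ(·) per the embodiment: each kernel of height hk slides over the word-vector matrix, and every window position contributes one value of the single-feature map vector via formula (1). The kernel count is cut to G = 2 per size only to keep the example small.

    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def single_feature_map(W_kg, b_c, S_i):
        """Slide kernel W_kg (h_k x |V|) over S_i (M x |V|) as in formula (1)."""
        h_k = W_kg.shape[0]
        M = S_i.shape[0]
        return np.array([relu(np.sum(W_kg * S_i[tau:tau + h_k]) + b_c)
                         for tau in range(M - h_k + 1)])  # length M - h_k + 1

    rng = np.random.default_rng(0)
    M, V, G = 10, 300, 2  # the embodiment uses G = 100 kernels per size
    S_i = rng.standard_normal((M, V))
    kernels = {h: rng.standard_normal((G, h, V)) * 0.1 for h in (2, 3, 4)}
    C_i = {h: [single_feature_map(W_kg, 0.0, S_i) for W_kg in kernels[h]]
           for h in kernels}  # single-feature map vectors, grouped by height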
Step 3: build the turning-word dictionary ZD, look up adversative words in the i-th training sample si of the training set DS, segment the feature-map vector Ci according to the adversative-word position, and extract the single most important feature from each segment, so that several segments yield several features.
Step 3.1: build the turning-word dictionary ZD and, according to ZD, check whether the i-th training sample si in DS contains an adversative word; if si contains the L-th adversative word zL of ZD, record the position of zL in si as p.
In this embodiment, the adversative words published by Smart Words (http://www.smart-words.org/linking-words/transition-words.html) and MSU (https://msu.edu/user/jdowell/135/transw.html) are combined to construct a dictionary of 179 adversative words in total. The adversative-word dictionary ZD is shown in Table 1.
Table 1. Adversative-word dictionary ZD
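As a sketch, the lookup of step 3.1 reduces to a membership test over the tokenized sample; the three words below merely stand in for the 179-word dictionary of Table 1 and are not the patent's full list.

    # Tiny stand-in for the 179-word adversative dictionary ZD of Table 1.
    ZD = {"but", "however", "nevertheless"}

    def adversative_position(words):
        """Return the (1-based) position p of the first adversative word, else None."""
        for p, w in enumerate(words, start=1):
            if w.lower() in ZD:
                return p
        return None

    p = adversative_position("the phone looks great but the speaker is weak".split())  # p = 5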
Step 3.2: according to the size of the g-th convolution kernel Wk^g, obtain the position p − hk + 1 of the adversative word zL in the single-feature map vector Ci^{k,g} of the word-vector matrix Si of the i-th training sample si, and take it as the division point.
Step 3.3: according to the division point, divide the single-feature map vector Ci^{k,g} of the feature-map vector Ci into two segments of single-feature map vectors, Ci^{k,g,1} and Ci^{k,g,2}.
Step 3.4: apply max pooling to the two segments Ci^{k,g,1} and Ci^{k,g,2}, obtaining the maximum of each segment, m1^{k,g} and m2^{k,g}, which form the maximum pair (m1^{k,g}, m2^{k,g}) of the g-th convolution kernel of the k-th size; K × G maximum pairs are obtained in this way.
Step 3.5: concatenate the K × G maximum pairs, thereby obtaining the final feature representation space Fi of the i-th training sample si.
Fig. 3 gives a concrete example of semantic division based on adversative-word position, illustrating the adversative-based max-pooling process with the 100 kernels of size 3 × 300. The sample in Fig. 3 contains the adversative word "but" at position p = 10. With a kernel of size 3 × 300, the division point of "but" in the single-feature map vector Ci^{k,g} is 10 − 3 + 1, so Ci^{k,g} is divided into two segments Ci^{k,g,1} and Ci^{k,g,2}; the maximum of each segment is found, giving one maximum pair. The 100 kernels of size 3 × 300 then yield 100 maximum pairs, which constitute the final feature representation space Fi.
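The following self-contained numpy sketch illustrates steps 3.2 through 3.5 for one kernel size; the boundary guard that keeps both segments non-empty is an added assumption the patent does not spell out.

    import numpy as np

    def piecewise_max_pool(c, p, h_k):
        """Split the single-feature map c at the division point p - h_k + 1
        and max-pool each segment (steps 3.2 to 3.4)."""
        d = p - h_k + 1  # division point within the feature map
        d = max(1, min(d, len(c) - 1))  # assumed guard: both segments non-empty
        return np.array([c[:d].max(), c[d:].max()])

    def feature_space(C_i, p):
        """Step 3.5: concatenate all maximum pairs into F_i (length 2 * K * G)."""
        pairs = [piecewise_max_pool(c, p, h_k)
                 for h_k, maps in C_i.items() for c in maps]
        return np.concatenate(pairs)

    rng = np.random.default_rng(1)
    C_i = {3: [rng.standard_normal(8) for _ in range(2)]}  # toy maps, h_k = 3
    F_i = feature_space(C_i, p=5)  # adversative word at position 5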
Step 4: build the classification model based on the feature representation space Fi
Step 4.1: to prevent the overfitting that may arise from training the classifier in a fully connected manner, the elements of the feature representation space Fi are randomly set to 0 in proportion ρ according to a Bernoulli distribution, and only the non-zero elements participate in classifier construction. The concrete operation is shown in Fig. 4: a zeroing vector r is set according to a Bernoulli distribution, r being a vector of 0s and 1s of the same dimension as Fi, and r ∘ Fi is used to set part of the elements of Fi to 0.
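A sketch of the Bernoulli zeroing of step 4.1, which corresponds to standard dropout; the zeroing proportion ρ = 0.5 is an assumed value the patent does not fix.

    import numpy as np

    rng = np.random.default_rng(42)

    def bernoulli_zeroing(F_i, rho=0.5):
        """Step 4.1: r has i.i.d. 0/1 Bernoulli(1 - rho) entries, and r * F_i
        zeroes a proportion rho of the feature space during training."""
        r = rng.binomial(1, 1.0 - rho, size=F_i.shape)
        return r * F_i

    F_i = rng.standard_normal(12)
    F_masked = bernoulli_zeroing(F_i)  # only non-zero entries train the classifier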
Step 4.2: build the softmax classifier O for the training set DS using formula (2):
O(si) = f(Wo · (r ∘ Fi) + bo)    (2)
In formula (2), f(·) is the activation function, for which a sigmoid or tanh function may be used; Wo is a randomly initialized weight parameter, and bo is another bias vector, initialized to the 0 vector.
With the classifier of formula (2), the probability that the i-th training sample si belongs to category l can be calculated as formula (3):
P(l | si) = exp(O*(si)_l) / Σ_{l'=1}^{|l|} exp(O*(si)_{l'})    (3)
In formula (3), O*(si)_l denotes the l-th element of the vector O*(si), and |l| denotes the total number of categories.
Step 4.3: optimize the log-likelihood loss function by gradient descent so as to train the softmax classifier O(·), obtaining the optimized softmax classifier O*(·) as the classification model.
In this embodiment, the log-likelihood loss function is expressed as formula (4):
L = − Σ_{i=1}^{|I|} log P(l_i | si)    (4)
where l_i denotes the true category of the i-th training sample si.
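A self-contained sketch of formulas (2) through (4) as reconstructed above, taking f = tanh (one of the two activations the embodiment allows); the class count, dimensions, and integer label encoding are illustrative assumptions.

    import numpy as np

    def classifier(F, W_o, b_o):
        """Formula (2): O(s_i) = f(W_o (r ∘ F_i) + b_o), with f = tanh here."""
        return np.tanh(W_o @ F + b_o)

    def softmax(o):
        """Formula (3): category probabilities from the classifier output."""
        e = np.exp(o - o.max())
        return e / e.sum()

    def log_loss(probs, label):
        """Formula (4): negative log-likelihood of the true category."""
        return -np.log(probs[label])

    rng = np.random.default_rng(0)
    num_classes, dim = 2, 12
    W_o = rng.standard_normal((num_classes, dim)) * 0.1  # random initialization
    b_o = np.zeros(num_classes)  # initialized to the 0 vector, as in step 4.2
    F = rng.standard_normal(dim)
    probs = softmax(classifier(F, W_o, b_o))
    loss = log_loss(probs, label=1)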
The concrete steps of the gradient-descent method are as follows:
Step 4.3.1: update the weight parameter Wo and the bias vector bo according to formula (5) and formula (6):
Wo := Wo − η · ∂L/∂Wo    (5)
bo := bo − η · ∂L/∂bo    (6)
where η is the learning rate;
Step 4.3.2: put the maximum pair (m1^{k,g}, m2^{k,g}) of the g-th convolution kernel of the k-th size back into its home positions in the single-feature map vector Ci^{k,g}, and set the remaining positions of Ci^{k,g} to 0;
Step 4.3.3: update the g-th convolution kernel Wk^g and the bias vector bc according to formula (7) and formula (8), in which rot180(·) denotes rotating a matrix by 180°;
Step 4.3.4: return to Step 2; Steps 2 through 4 are executed iteratively E times in total.
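A minimal sketch of step 4.3.1 under the plain gradient-descent reading of formulas (5) and (6) given above; the learning rate η = 0.01 is an assumed value.

    import numpy as np

    def sgd_step(W_o, b_o, dL_dW_o, dL_db_o, eta=0.01):
        """Step 4.3.1: gradient-descent updates of Wo (formula (5)) and
        bo (formula (6)) with learning rate eta."""
        return W_o - eta * dL_dW_o, b_o - eta * dL_db_o

    W_o, b_o = np.zeros((2, 12)), np.zeros(2)
    W_o, b_o = sgd_step(W_o, b_o, np.ones((2, 12)), np.ones(2))  # one update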
Step 5: use the classification model O*(·) to perform sentiment classification on the j-th test sample tj in the test set DT, obtaining the probability of each sentiment category, and take the sentiment category with the maximum probability as the final sentiment classification result.
For the j-th test sample tj in the test set DT, the final feature space Fj is obtained by the same method as for the training samples and substituted into the classifier O*(·); the probability that tj belongs to category l is expressed as formula (9):
P(l | tj) = exp(O*(tj)_l) / Σ_{l'=1}^{|l|} exp(O*(tj)_{l'})    (9)
In formula (9), O*(tj)_l denotes the l-th element of the vector O*(tj), and |l| denotes the total number of categories.
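A self-contained sketch of step 5, under the same tanh/softmax assumptions as the training sketch: the trained classifier is applied to the feature space F_j of a test sample and the category of maximum probability is returned.

    import numpy as np

    def classify(F_j, W_o, b_o):
        """Step 5: sentiment category with maximum probability for sample t_j."""
        o = np.tanh(W_o @ F_j + b_o)  # formula (2), f = tanh assumed
        e = np.exp(o - o.max())
        probs = e / e.sum()  # formula (9)
        return int(np.argmax(probs)), probs

    rng = np.random.default_rng(1)
    W_o, b_o = rng.standard_normal((2, 12)) * 0.1, np.zeros(2)
    category, probs = classify(rng.standard_normal(12), W_o, b_o)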

Claims (1)

1. A sentiment classification method based on a turning-sentence semantic block division mechanism, characterized in that it is carried out as follows:
Step 1: word-vector representation of the samples in the training set and test set
Step 1.1: build the word-vector dictionary D
External corpora are obtained from the network and trained to obtain the word-vector dictionary D, which is used to look up the word vectors of the words in the training and test sets; the dimension of the word vectors is set to |V|;
Step 1.2: represent the samples in the training and test sets as word vectors
Obtain |I| comment texts to form the training set DS = {s1, s2, …, si, …, s|I|} and |J| comment texts to form the test set DT = {t1, t2, …, tj, …, t|J|}, where si and tj denote the i-th training sample in DS and the j-th test sample in DT respectively, with si = {w1^i, w2^i, …, wm^i, …, wM^i}, wm^i denoting the m-th word of the i-th training sample si, and tj = {w1^j, w2^j, …, wn^j, …, wN^j}, wn^j denoting the n-th word of the j-th test sample tj; i = 1, 2, …, |I|; m = 1, 2, …, M; j = 1, 2, …, |J|; n = 1, 2, …, N;
According to the word-vector dictionary D, look up the word vector xm^i of the m-th word wm^i of the i-th training sample si in DS, obtaining the word-vector matrix Si = [x1^i; x2^i; …; xM^i] of si, an M × |V| matrix;
Similarly obtain the word-vector matrix Tj = [x1^j; x2^j; …; xN^j] of the j-th test sample tj in DT, where xn^j denotes the word vector of the n-th word wn^j of tj;
Step 2: set convolution kernels and perform the convolution calculation
Step 2.1: set up convolution kernels of K different sizes, denoted {W1, W2, …, Wk, …, WK}, where Wk = {Wk^1, …, Wk^g, …, Wk^G} denotes the set of the k-th-size convolution kernels of height hk and width nk, each kernel being an hk × nk matrix; Wk^g denotes the g-th convolution kernel in the k-th-size set and is randomly initialized;
Step 2.2: with the g-th convolution kernel Wk^g as a sliding window, use formula (1) to convolve Wk^g with the segment Si[τ : τ+hk−1] of the word-vector matrix Si of the i-th training sample si covered by the window, obtaining the τ-th value cτ^{k,g} of the single-feature map vector Ci^{k,g} = [c1^{k,g}, c2^{k,g}, …, c_{M−hk+1}^{k,g}]; then convolve the kernels of all K sizes with the word-vector matrix Si to obtain the feature-map vector Ci of si:
cτ^{k,g} = σ(Wk^g ⊗ Si[τ : τ+hk−1] + bc)    (1)
In formula (1), Si[τ : τ+hk−1] denotes the sub-matrix of rows τ to τ+hk−1 covered by the current sliding window, bc denotes a bias vector, σ(·) is the activation function, Ci^{k,g} is a 1 × (M−hk+1) vector, and 1 ≤ τ ≤ M−hk+1;
Step 3: build the turning-word dictionary ZD, look up adversative words in the i-th training sample si of the training set DS, segment the feature-map vector Ci according to the adversative-word position, and extract the single most important feature from each segment, so that several segments yield several features;
Step 3.1: build the turning-word dictionary ZD and, according to ZD, check whether the i-th training sample si in DS contains an adversative word; if si contains the L-th adversative word zL of ZD, record the position of zL in si as p;
Step 3.2: according to the size of the g-th convolution kernel Wk^g, obtain the position p − hk + 1 of the adversative word zL in the single-feature map vector Ci^{k,g} of the word-vector matrix Si of si, and take it as the division point;
Step 3.3: according to the division point, divide the single-feature map vector Ci^{k,g} of the feature-map vector Ci into two segments of single-feature map vectors, Ci^{k,g,1} and Ci^{k,g,2};
Step 3.4: apply max pooling to the two segments Ci^{k,g,1} and Ci^{k,g,2}, obtaining the maximum of each segment, m1^{k,g} and m2^{k,g}, which form the maximum pair (m1^{k,g}, m2^{k,g}) of the g-th convolution kernel of the k-th size, so that K × G maximum pairs are obtained;
Step 3.5: concatenate the K × G maximum pairs, thereby obtaining the final feature representation space Fi of the i-th training sample si;
Step 4: build the classification model based on the feature representation space Fi
Step 4.1: set a zeroing vector r according to a Bernoulli distribution; r is a vector of 0s and 1s of the same dimension as the feature representation space Fi;
Step 4.2: build the softmax classifier O for the training set DS using formula (2):
O(si) = f(Wo · (r ∘ Fi) + bo)    (2)
In formula (2), f(·) is the activation function, Wo is a weight parameter, and bo is another bias vector;
Step 4.3: optimize the loss function by gradient descent so as to train the softmax classifier O(·), obtaining the optimized softmax classifier O*(·) as the classification model;
Step 5: use the classification model O*(·) to perform sentiment classification on the j-th test sample tj in the test set DT, obtaining the probability of each sentiment category, and take the sentiment category with the maximum probability as the final sentiment classification result.
CN201810171490.XA 2018-03-01 2018-03-01 Sentiment classification method based on turning sentence semantic block division mechanism Active CN108388654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810171490.XA CN108388654B (en) 2018-03-01 2018-03-01 Sentiment classification method based on turning sentence semantic block division mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810171490.XA CN108388654B (en) 2018-03-01 2018-03-01 Sentiment classification method based on turning sentence semantic block division mechanism

Publications (2)

Publication Number Publication Date
CN108388654A 2018-08-10
CN108388654B CN108388654B (en) 2020-03-17

Family

ID=63069615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810171490.XA Active CN108388654B (en) 2018-03-01 2018-03-01 Sentiment classification method based on turning sentence semantic block division mechanism

Country Status (1)

Country Link
CN (1) CN108388654B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120253792A1 (en) * 2011-03-30 2012-10-04 Nec Laboratories America, Inc. Sentiment Classification Based on Supervised Latent N-Gram Analysis
CN104035992A (en) * 2014-06-10 2014-09-10 复旦大学 Method and system for processing text semantics by utilizing image processing technology and semantic vector space
CN104731770A (en) * 2015-03-23 2015-06-24 中国科学技术大学苏州研究院 Chinese microblog emotion analysis method based on rules and statistical model
KR101652486B1 (en) * 2015-04-05 2016-08-30 주식회사 큐버 Sentiment communication system based on multiple multimodal agents
CN107608956A * 2017-09-05 2018-01-19 广东石油化工学院 Reader emotion distribution prediction algorithm based on CNN-GRNN

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YANWEN WU等: "Building Chinese Sentiment Lexicon Based on HowNet", 《ADVANCED MATERIALS RESEARCH》 *
YUHONG ZHANG等: "Sentiment Classification Based on Piecewise Pooling Convolutional Neural Network", 《TECH SCIENCE PRESS CMC》 *
邸鹏 (DI Peng) et al.: "Text sentiment orientation analysis based on adversative sentence patterns" (基于转折句式的文本情感倾向性分析), 《计算机工程与设计》 (Computer Engineering and Design) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271632A * 2018-09-14 2019-01-25 重庆邂智科技有限公司 A supervised word-vector learning method
CN111611375A (en) * 2019-07-03 2020-09-01 北京航空航天大学 Text emotion classification method based on deep learning and turning relation
CN110377739A * 2019-07-19 2019-10-25 出门问问(苏州)信息科技有限公司 Text sentiment classification method, readable storage medium, and electronic device
CN110377740A * 2019-07-22 2019-10-25 腾讯科技(深圳)有限公司 Sentiment polarity analysis method and apparatus, electronic device, and storage medium
CN110765769A * 2019-08-27 2020-02-07 电子科技大学 Entity-attribute-dependent sentiment analysis method based on clause features
CN110765769B (en) * 2019-08-27 2023-05-02 电子科技大学 Clause feature-based entity attribute dependency emotion analysis method
CN113806542A (en) * 2021-09-18 2021-12-17 上海幻电信息科技有限公司 Text analysis method and system
CN113806542B (en) * 2021-09-18 2024-05-17 上海幻电信息科技有限公司 Text analysis method and system

Also Published As

Publication number Publication date
CN108388654B (en) 2020-03-17

Similar Documents

Publication Publication Date Title
CN108388654A Sentiment classification method based on turning sentence semantic block division mechanism
CN109145112B (en) Commodity comment classification method based on global information attention mechanism
CN109271522B (en) Comment emotion classification method and system based on deep hybrid model transfer learning
Hota et al. KNN classifier based approach for multi-class sentiment analysis of twitter data
CN107391483A Product-review sentiment classification method based on convolutional neural networks
Rei et al. Grasping the finer point: A supervised similarity network for metaphor detection
CN108763326B (en) Emotion analysis model construction method of convolutional neural network based on feature diversification
CN108446271B (en) Text emotion analysis method of convolutional neural network based on Chinese character component characteristics
Basari et al. Opinion mining of movie review using hybrid method of support vector machine and particle swarm optimization
Ruangkanokmas et al. Deep belief networks with feature selection for sentiment classification
CN108427670A Sentiment analysis method based on context word vectors and deep learning
Mamgain et al. Sentiment analysis of top colleges in India using Twitter data
US11762990B2 (en) Unstructured text classification
CN109299268A Text sentiment analysis method based on a dual-channel model
CN110263257B (en) Deep learning based recommendation method for processing multi-source heterogeneous data
CN107908715A Microblog sentiment polarity discrimination method based on Adaboost and weighted classifier fusion
CN110543242A (en) expression input method based on BERT technology and device thereof
CN107515855A Microblog sentiment analysis method and system combining emoticons
CN110457562A Food-safety event classification method and device based on a neural network model
Jiang et al. Detecting hate speech from tweets for sentiment analysis
CN111814453A (en) Fine-grained emotion analysis method based on BiLSTM-TextCNN
Sunarya et al. Comparison of accuracy between convolutional neural networks and Naïve Bayes Classifiers in sentiment analysis on Twitter
CN110321918A Method for sentiment analysis and image labeling in a microblog-based public-opinion robot system
CN109062958B (en) Primary school composition automatic classification method based on TextRank and convolutional neural network
Huang A CNN model for SMS spam detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant