CN106980650A - Sentiment-enhanced word embedding learning method for Twitter opinion classification - Google Patents

Sentiment-enhanced word embedding learning method for Twitter opinion classification

Info

Publication number
CN106980650A
Authority
CN
China
Prior art keywords
word
tweet
network
layer
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710117139.8A
Other languages
Chinese (zh)
Inventor
熊蜀峰
吕琼帅
李玮瑶
彭伟国
王魁祎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pingdingshan University
Original Assignee
Pingdingshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pingdingshan University filed Critical Pingdingshan University
Priority to CN201710117139.8A
Publication of CN106980650A
Pending legal-status Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention provides a sentiment-enhanced word embedding learning method for Twitter opinion classification, in the field of computer technology. While jointly modeling word-level n-grams and polarity information, the method models not only the sentiment polarity at the tweet (document) level but also integrates word-level sentiment information, and the word-level inputs naturally become the tweet-level input after convolution. Experimental results on standard datasets show that, when the learned word embeddings are used in Twitter opinion polarity classification tasks, the method outperforms current methods of the same kind.

Description

Sentiment-enhanced word embedding learning method for Twitter opinion classification
Technical field
The present invention relates to the field of computer technology, and in particular to a sentiment-enhanced word embedding learning method for Twitter opinion classification.
Background
Twitter is one of the largest microblogging sites on the Internet and has become one of the most important sources of online opinions and moods. Owing to its massive, diverse, and steadily growing user base, the opinion information contained in Twitter has been successfully applied to many tasks, such as stock market prediction, monitoring of political VIPs, and inferring public sentiment about public events. Efficient recognition of positive, negative, and neutral opinions is therefore fundamental to these applications.
Scholars have proposed a variety of methods to improve Twitter opinion analysis. In particular, the recent development of deep neural networks has demonstrated the importance of text representation learning (word-level, sentence-level, and document-level representations) for natural language processing tasks. Traditional word embedding methods mainly model syntactic context. Building on this, Tang et al. proposed Sentiment-Specific Word Embedding (SSWE) learning for Twitter opinion classification, targeting the phenomenon that two words of opposite opinion polarity can share similar contexts, so that embeddings learned from context alone cannot distinguish their polarity. On top of SSWE, Ren et al. further proposed the Topic and Sentiment-enriched Word Embedding (TSWE) model, which learns topic-enhanced word embeddings in order to additionally account for the ambiguity of sentiment words.
However, existing work learns sentiment-related word embeddings using only the overall opinion polarity label of the tweet, while many classical sentiment lexicons remain unused. Moreover, these works add distant-supervised tweet polarity labels on top of conventional methods to learn sentiment-related embeddings, but traditional word embedding learning is a local context model, whereas tweet polarity labels are global, document-level information. To use tweet polarity labels, sentiment word embedding methods assume that every word in an opinion context window affects the polarity of that context, and that the polarity of the local context agrees with the global polarity of the tweet. In other words, they assign the tweet-level global polarity directly to the local context without any correction. On the other hand, the word polarity obtained from a lexicon is still highly useful information for opinion polarity classification. A unified word embedding learning framework that can exploit polarity labels at multiple levels (word level and tweet level) simultaneously is therefore the key to solving this problem, and building such a framework is itself a challenge.
In practice, the SSWE model combines multiple learning objectives in a single function, targeting syntax learning and sentiment polarity learning respectively. The TSWE model further encodes topic, sentiment, and syntactic information into the optimization objective of a neural network. In most cases, however, multiple objectives, for example word sentiment polarity and tweet opinion polarity, cannot be optimized directly within one unified framework. Although multi-level sentiment representation can be viewed as multi-task learning, unlike standard multi-task learning it has two different inputs: (1) a word with its context, and (2) the whole tweet. The two inputs correspond to word sentiment polarity and tweet opinion polarity respectively, whereas multi-task deep learning methods usually take only a single document (tweet) as input. Existing multi-task deep learning frameworks therefore cannot handle this problem directly.
Summary of the invention
Embodiments of the invention provide a sentiment-enhanced word embedding learning method for Twitter opinion classification, to solve the problems in the prior art.
A sentiment-enhanced word embedding learning method for Twitter opinion classification, the method comprising:
An input tweet D contains n context windows c. Each context window c is fed into a shared unit whose word embedding dimension is d and whose hidden layer dimension is h. Each shared unit comprises a word embedding layer and a linear layer, and each context window c contains t words. After the context window c is fed into the word embedding layer, the output is:

x_{i+1:i+t} = x_{i+1} ⊕ x_{i+2} ⊕ ... ⊕ x_{i+t}

where x_{i+1}, x_{i+2}, ..., x_{i+t} (i ≤ n) denote the t words of the i-th context window c. The output x_{i+1:i+t} of the word embedding layer is fed into the linear layer to obtain the window embedding vector:

e_i = f(W_1^1 * x_{i+1:i+t} + b_1^1)

where f(·) denotes a linear function, W_1^1 ∈ R^{(t*d)×h} and b_1^1 are the parameters of the linear layer, R is the real number space whose superscript gives the dimensionality, * denotes numerical multiplication, and × denotes the stacking of dimensions;

The window embedding vector e_i output by the shared unit passes through the activation layer of the left sub-network, a_1 = hTanh(e_i); a_1 then undergoes two linear transformations to yield the n-gram prediction score and the word-level sentiment prediction score:

f_ngm = W_2^1 * a_1
f_ws = W_3^1 * a_1

where f_ngm is the n-gram prediction score, f_ws is the word-level sentiment prediction score, and W_2^1 and W_3^1 are the parameters of the left sub-network;

During model training, the context window c and its variant c̃ are fed into the left sub-network, so the word-level loss function is computed as:

loss_1(c, c̃) = α * loss_ngm(c, c̃) + (1 - α) * loss_ws(c)
loss_ngm(c, c̃) = max(0, 1 - f_ngm(c) + f_ngm(c̃))
loss_ws(c) = max(0, 1 - φ(0) f_ws^(0)(c) + φ(1) f_ws^(1)(c))

where α is a linear interpolation weight and φ(·) is the opinion polarity indicator function of the centre word of the context window c:

φ(j) = 1 if y[j] = 1; -1 if y[j] = 0

where y is the gold sentiment label of the word, represented as a 2-dimensional vector: negative polarity is [1, 0] and positive polarity is [0, 1];
The word embedding vectors e_1, e_2, ..., e_i, ..., e_n output by the n shared units are fed into the right sub-network. The set formed by these vectors is denoted e. Three pooling methods, max-pooling, average-pooling, and min-pooling, are applied to e to obtain the fixed-dimension features max(e), avg(e), and min(e) respectively, which are then processed by a linear layer:

a_2 = W_1^2 * [max(e) ⊕ avg(e) ⊕ min(e)] + b_2^2

where W_1^2 ∈ R^{t*h×h} and b_2^2 are the parameters of the linear layer. The tweet opinion polarity predicted by the softmax layer is:

f_ds = softmax(a_2)

The tweet-level loss function is therefore:

loss_2(D) = -Σ_{k∈{0,1}} g_k(D) log f_k^ds

where g(·) is the gold-standard sentiment distribution of the tweet over [positive, negative];
The overall score of the final optimization objective is computed from the word-level loss function and the tweet-level loss function:

loss = β * loss_1(c, c̃) + (1 - β) * loss_2(D)

where β is the weighting coefficient that balances the word level and the tweet level;

Using the overall score of the final optimization objective as the training objective, the left and right sub-networks are trained on lexicon resources and a large-scale distant-supervised tweet corpus to obtain the revised word embedding vectors;

A supervised learning algorithm is then used to perform polarity classification on tweets represented with the revised word embeddings.
Preferably, the left and right sub-networks are trained with stochastic gradient descent, and mini-batches are used to speed up training.
Preferably, two different batch sizes are used when training the left and right sub-networks: the main batch size applies to the tweet level, while the secondary batch size equals the number of context windows in each tweet.
Preferably, the polarity classification task is completed with a neural network classifier, specifically: first, a convolutional layer with multiple filters processes the input tweet; then a max-pooling layer takes the maximum of each convolution kernel as a feature; the next layer is a fully connected hidden layer with ReLU activation, mainly used to learn an implicit feature representation; the last layer is a fully connected layer whose 2-dimensional output predicts the positive/negative polarity distribution using softmax.
The beneficial effects of the present invention are: while jointly modeling word-level n-grams and polarity information, the method models not only the sentiment polarity at the tweet (document) level but also integrates word-level sentiment information, and the word-level inputs naturally become the tweet-level input after convolution. When the learned word embeddings are used in Twitter opinion polarity classification tasks, experimental results on standard datasets show that the method outperforms current methods of the same kind.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a sentiment-enhanced word embedding learning method for Twitter opinion classification provided by an embodiment of the present invention;
Fig. 2 shows the Macro-F1 curves of the MSWE (multi-level sentiment-enriched word embedding) method for different values of β.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, the invention discloses a sentiment-enhanced word embedding learning method for Twitter opinion classification, the method comprising:
Step 100: preprocess the input tweet. Specifically, the input tweet D contains n context windows c. Each context window c is fed into a shared unit whose word embedding dimension is d and whose hidden layer dimension is h. Each shared unit comprises a word embedding layer and a linear layer, and each context window c contains t words. After the context window c is fed into the word embedding layer, the output is:

x_{i+1:i+t} = x_{i+1} ⊕ x_{i+2} ⊕ ... ⊕ x_{i+t}

where x_{i+1}, x_{i+2}, ..., x_{i+t} (i ≤ n) denote the t words of the i-th context window c. The output x_{i+1:i+t} of the word embedding layer is fed into the linear layer to obtain the window embedding vector:

e_i = f(W_1^1 * x_{i+1:i+t} + b_1^1)

where f(·) denotes a linear function, W_1^1 ∈ R^{(t*d)×h} and b_1^1 are the parameters of the linear layer, R is the real number space whose superscript gives the dimensionality, * denotes numerical multiplication, and × denotes the stacking of dimensions.
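For concreteness, the shared unit can be sketched as follows. This is a minimal PyTorch sketch, not the patent's implementation: the class name SharedUnit is ours, and the default sizes follow the values given later in this embodiment (t = 3, d = 50, h = 20).

```python
import torch
import torch.nn as nn

class SharedUnit(nn.Module):
    """Shared unit: a word embedding layer followed by a linear layer, mapping
    a context window of t word indices to one h-dimensional vector e_i."""
    def __init__(self, vocab_size, d=50, t=3, h=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)   # word embedding layer
        self.linear = nn.Linear(t * d, h)          # W_1^1 in R^{(t*d)xh} and b_1^1

    def forward(self, windows):
        # windows: LongTensor of shape (n, t) holding the word indices of n windows
        x = self.embed(windows)                    # (n, t, d)
        x = x.view(windows.size(0), -1)            # concatenation x_{i+1} ⊕ ... ⊕ x_{i+t}
        return self.linear(x)                      # window embeddings e_i, shape (n, h)
```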
Step 110: compute the word-level loss. Specifically, the window embedding vector e_i passes through the activation layer of the left sub-network, a_1 = hTanh(e_i); a_1 then undergoes two linear transformations to yield the n-gram prediction score and the word-level sentiment prediction score:

f_ngm = W_2^1 * a_1
f_ws = W_3^1 * a_1

where f_ngm is the n-gram prediction score, f_ws is the word-level sentiment prediction score, and W_2^1 and W_3^1 are the parameters of the left sub-network.

During model training, the context window c and its variant c̃ are fed into the left sub-network, so the word-level loss function is computed as:

loss_1(c, c̃) = α * loss_ngm(c, c̃) + (1 - α) * loss_ws(c)
loss_ngm(c, c̃) = max(0, 1 - f_ngm(c) + f_ngm(c̃))
loss_ws(c) = max(0, 1 - φ(0) f_ws^(0)(c) + φ(1) f_ws^(1)(c))

where α is a linear interpolation weight and φ(·) is the opinion polarity indicator function of the centre word of the context window c:

φ(j) = 1 if y[j] = 1; -1 if y[j] = 0

where y is the gold sentiment label of the word, represented as a 2-dimensional vector: negative polarity is [1, 0] and positive polarity is [0, 1]. It is worth noting that if the centre word of c is not a sentiment word, only the n-gram prediction score is optimized. The variant c̃ is obtained by replacing the centre word of c with a different word.
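A minimal sketch of the left sub-network and this word-level loss, under the same assumptions as the SharedUnit sketch above (PyTorch; the batched signature, the mask argument for non-sentiment centre words, and the function names are ours):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LeftSubNetwork(nn.Module):
    """Left sub-network: n-gram score f_ngm and 2-dim sentiment score f_ws."""
    def __init__(self, h=20):
        super().__init__()
        self.w_ngm = nn.Linear(h, 1, bias=False)   # W_2^1
        self.w_ws = nn.Linear(h, 2, bias=False)    # W_3^1

    def forward(self, e):
        a1 = F.hardtanh(e)                         # activation layer a_1 = hTanh(e_i)
        return self.w_ngm(a1), self.w_ws(a1)

def word_level_loss(left, e_c, e_cc, y, mask, alpha=0.5):
    """loss_1 over a batch of n windows.

    e_c, e_cc: (n, h) shared-unit outputs for windows c and corrupted variants c~
    y:         (n, 2) centre-word labels, [1, 0] = negative, [0, 1] = positive
    mask:      (n,)   1.0 where the centre word is a sentiment word, else 0.0
    """
    f_ngm_c, f_ws_c = left(e_c)
    f_ngm_cc, _ = left(e_cc)
    # hinge ranking loss: the genuine window must outscore its corrupted variant
    loss_ngm = torch.clamp(1 - f_ngm_c + f_ngm_cc, min=0).squeeze(-1)
    phi = 2.0 * y.float() - 1.0                    # indicator phi(j): +1 or -1 per dimension
    loss_ws = torch.clamp(
        1 - phi[:, 0] * f_ws_c[:, 0] + phi[:, 1] * f_ws_c[:, 1], min=0)
    # windows whose centre word is not a sentiment word optimise the n-gram score only
    return torch.where(mask > 0,
                       alpha * loss_ngm + (1 - alpha) * loss_ws,
                       loss_ngm).mean()
```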
Step 120: compute the tweet-level loss. Specifically, the word embedding vectors e_1, e_2, ..., e_i, ..., e_n output by the n shared units are fed into the right sub-network. The set formed by these vectors is denoted e. Three pooling methods, max-pooling, average-pooling, and min-pooling, are applied to e to obtain the fixed-dimension features max(e), avg(e), and min(e) respectively, which are then processed by a linear layer:

a_2 = W_1^2 * [max(e) ⊕ avg(e) ⊕ min(e)] + b_2^2

where W_1^2 ∈ R^{t*h×h} and b_2^2 are the parameters of the linear layer. The tweet opinion polarity predicted by the softmax layer is:

f_ds = softmax(a_2)

The tweet-level loss function is therefore:

loss_2(D) = -Σ_{k∈{0,1}} g_k(D) log f_k^ds

where g(·) is the gold-standard sentiment distribution of the tweet over [positive, negative].
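A matching sketch of the right sub-network (PyTorch, same assumptions as above; giving the linear layer a 2-unit output is our simplification, since its result feeds a softmax over the two polarities):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RightSubNetwork(nn.Module):
    """Right sub-network: pools the n window embeddings of one tweet and
    predicts its opinion polarity distribution f_ds."""
    def __init__(self, h=20):
        super().__init__()
        self.linear = nn.Linear(3 * h, 2)   # W_1^2 and b_2^2 over [max ⊕ avg ⊕ min]

    def forward(self, e):
        # e: (n, h), the window embeddings e_1, ..., e_n of a single tweet
        pooled = torch.cat([e.max(dim=0).values, e.mean(dim=0), e.min(dim=0).values])
        return F.log_softmax(self.linear(pooled), dim=-1)   # log f_ds

def tweet_level_loss(log_fds, g):
    # loss_2(D) = -sum_k g_k(D) log f_k^ds; g is the gold [positive, negative] distribution
    return -(g * log_fds).sum()
```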
Step 130: compute the overall score of the final optimization objective from the loss functions obtained in Step 110 and Step 120:

loss = β * loss_1(c, c̃) + (1 - β) * loss_2(D)

where β is the weighting coefficient that balances the word level and the tweet level.
Step 140: using the overall score computed in Step 130 as the training objective, train the left and right sub-networks on lexicon resources and a large-scale distant-supervised tweet corpus to obtain the revised word embedding vectors. The distant-supervised tweets are selected by positive hashtags (such as #happy, #joy, and #happyness), negative hashtags (such as #sadness, #angry, and #frustrated), and emoticons such as :) :-) :D and :( :-(. The tweet corpus used here was crawled from March 1, 2015 to April 30, 2015. After segmenting each tweet and applying preprocessing steps such as removing user mentions, URLs, duplicate content, spam, and non-English text, 5 million tweets of each polarity were finally obtained.
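The hashtag/emoticon selection can be illustrated with a small labelling sketch (the marker sets, the regular expression, and the function name distant_label are our own; we read the garbled emoticon list as :) :-) :D positive and :( :-( negative):

```python
import re

# Seed markers following the examples above (assumed reading of the emoticon list).
POS_MARKERS = {"#happy", "#joy", "#happyness", ":)", ":-)", ":d"}
NEG_MARKERS = {"#sadness", "#angry", "#frustrated", ":(", ":-("}

MARKER_RE = re.compile(r"#\w+|:-?[()d]", re.IGNORECASE)

def distant_label(tweet):
    """Return 'positive', 'negative', or None (discard) for a raw tweet."""
    found = {m.lower() for m in MARKER_RE.findall(tweet)}
    pos, neg = bool(found & POS_MARKERS), bool(found & NEG_MARKERS)
    if pos == neg:                 # no marker, or conflicting markers: discard
        return None
    return "positive" if pos else "negative"

print(distant_label("great day at the beach #happy :)"))   # -> positive
```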
This embodiment optimizes the training objective with stochastic gradient descent and uses mini-batches to speed up training. However, for word-level prediction, a large number of valid words in a tweet require computing the n-gram and polarity losses, whereas tweet-level prediction yields only one loss value. Moreover, computing the tweet-level polarity loss requires first computing the shared linear transformation for every window. The usual mini-batch scheme therefore cannot be applied directly, so the present invention uses two different batch sizes when training the two sub-networks. The main batch size applies to the tweet level, while the secondary batch size equals the number of context windows in each tweet. In other words, the word-level batch size is variable while the tweet-level batch size is fixed. In training, this embodiment empirically sets the context window size to 3, the word embedding dimension to 50, the hidden layer dimension to 20, the main batch size to 32, and the learning rate to 0.01.
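Under the assumptions of the previous sketches, the two-batch-size scheme might look as follows (the tweet fields windows, corrupted_windows, word_labels, word_mask, and gold_dist are hypothetical containers for the data described above; the optimizer would be plain SGD, e.g. torch.optim.SGD with lr=0.01):

```python
def train_epoch(shared, left, right, tweets, optimizer, alpha=0.5, beta=0.8,
                main_batch_size=32):
    """One epoch with two batch sizes: the main batch iterates over tweets, while
    the word-level batch is the variable number of context windows per tweet."""
    for start in range(0, len(tweets), main_batch_size):
        optimizer.zero_grad()
        batch_loss = 0.0
        for tweet in tweets[start:start + main_batch_size]:
            e = shared(tweet.windows)               # (n, h): every window of the tweet
            e_cc = shared(tweet.corrupted_windows)  # variants with replaced centre words
            loss1 = word_level_loss(left, e, e_cc, tweet.word_labels,
                                    tweet.word_mask, alpha)
            loss2 = tweet_level_loss(right(e), tweet.gold_dist)
            batch_loss = batch_loss + beta * loss1 + (1 - beta) * loss2
        batch_loss.backward()
        optimizer.step()                            # one SGD update per main batch
```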
Step 150: use a supervised learning algorithm, such as an SVM (support vector machine), to perform polarity classification on tweets represented with the revised word embeddings.
Specifically, this embodiment uses a neural network classifier for the classification task. First, a convolutional layer with multiple filters processes the input tweet. Then a max-pooling layer takes the maximum of each convolution kernel as a feature. The next layer is a fully connected hidden layer with ReLU activation, mainly used to learn an implicit feature representation. The last layer is a fully connected layer whose 2-dimensional output predicts the positive/negative polarity distribution using softmax. Dropout regularization is applied to the input layer and the hidden layer to avoid overfitting. The classifier is trained with back-propagation, and parameters are updated with AdaGrad.
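A minimal PyTorch sketch of this classifier (the class name and the default filter and hidden sizes are our assumptions; the actual hyperparameter values are those of Table 2):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNClassifier(nn.Module):
    """Polarity classifier of Step 150: one convolutional layer with multiple
    filters, max-pooling per kernel, a ReLU hidden layer, and a 2-way softmax."""
    def __init__(self, embeddings, n_filters=100, kernel_size=3, hidden=100,
                 d1=0.5, d2=0.5):
        super().__init__()
        self.embed = nn.Embedding.from_pretrained(embeddings)  # learned word vectors
        self.conv = nn.Conv1d(embeddings.size(1), n_filters, kernel_size)
        self.drop_in = nn.Dropout(d1)      # dropout on the input layer
        self.drop_hidden = nn.Dropout(d2)  # dropout on the hidden layer
        self.fc1 = nn.Linear(n_filters, hidden)
        self.fc2 = nn.Linear(hidden, 2)

    def forward(self, tokens):
        # tokens: (batch, len) word indices of a tweet
        x = self.drop_in(self.embed(tokens)).transpose(1, 2)  # (batch, d, len)
        x = F.relu(self.conv(x)).max(dim=2).values            # max per convolution kernel
        x = self.drop_hidden(F.relu(self.fc1(x)))             # implicit feature representation
        return F.log_softmax(self.fc2(x), dim=-1)             # positive/negative distribution
```

Training would then use, for example, torch.optim.Adagrad(model.parameters(), lr=0.01) for the AdaGrad updates mentioned above.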
Experimental evaluation
1. Datasets and settings
Experiments were run on the following two datasets: 1) SemEval2013, a standard Twitter polarity classification test set; 2) CST (Context-Sensitive Twitter), the latest Twitter polarity classification dataset provided by Ren et al., who crawled basic opinion tweets together with their contexts to evaluate their model, which uses contextual information as auxiliary training material. Since the method of the present invention does not consider contextual information for now, only the basic opinion tweets are used in the experiments, without the contexts. Table 1 lists the details of each dataset; the evaluation metric is the Macro-F1 over the positive and negative classes.
Table 1: Dataset statistics (CV denotes 10-fold cross-validation)
Different parameters need to be set carefully for different tasks; for ease of comparison, this embodiment adopts a unified setting. The hyperparameter values were obtained by manual tuning on the SemEval2013 development set. The final model includes 7 hyperparameters, divided into network parameters (word embedding dimension D, hidden layer dimension H, convolution kernel size S, and number of convolution kernels N) and training parameters (input layer dropout probability d1, hidden layer dropout probability d2, and learning rate η). Table 2 lists all of the values.
Table 2: Hyperparameter values of the final model
2. Comparative experiments
The method of the present invention is compared with a series of state-of-the-art methods; the experimental results are shown in Table 3. All methods can be divided into two classes: traditional classifiers combined with different features, and neural network classifiers. The first class includes:
DistSuper + uni/bi/tri-gram: a LibLinear classifier using the lexicon model trained on the distant-supervision corpus;
SVM + uni/bi/tri-gram: an SVM classifier using n-gram features;
SVM + C&W: an SVM classifier using C&W word embedding features;
SVM + Word2vec: an SVM classifier using Word2vec word embedding features;
NBSVM: a classifier combining Naive Bayes and an NB-enhanced SVM;
RAE: a recursive autoencoder using word vectors pre-trained on Wikipedia;
NRC: the top system of the SemEval 2013 Twitter polarity classification task, built mainly on complex features combining various sentiment lexicons with hand-designed features;
SSWE: an SVM classifier using SSWE word embedding features.
Among these, SSWE achieves the best performance of its kind, using word embedding features that incorporate n-grams and sentiment information. The NRC system is second only to SSWE because it uses sentiment lexicons and many complex hand-designed features. Since no sentiment information is exploited explicitly, the classification results with C&W and Word2vec features are comparatively poor.
The second class includes:
TSWE: a neural network classifier using topic- and sentiment-enhanced word embedding features;
CNNM-Local: a neural network classifier exploiting contextual auxiliary training resources.
Neural network classifiers can naturally use word embeddings for classification. TSWE and CNNM-Local obtain the best previous performance by using information beyond sentiment, whereas the method of the present invention outperforms both while using sentiment information only. The proposed method and NRC both employ sentiment lexicon information and achieve good performance, which also shows that sentiment lexicons remain an effective resource for Twitter opinion polarity classification.
Table 3: Experimental comparison results
Model SemEval2013 CST
DistSuper+uni/bi/tri-gram 63.84 -
SVM+uni/bi/tri-gram 75.06 77.42
SVM+C&W 75.89 -
SVM+Word2vec 76.31 -
NBSVM 75.28 -
RAE 75.12 -
NRC (top system in SemEval) 84.73 80.24
SSWE 84.98 80.68
TSWE 85.34 -
CNNM-Local - 80.90
MSWE (our model) 85.75 81.34
3. Influence of the parameter β
β is the coefficient that balances the two types of information. Its value is tuned on the SemEval2013 development set. For the other factor α, this embodiment uses the value 0.5. Fig. 2 shows the Macro-F1 curve of the MSWE model on the SemEval2013 development set. As can be seen, performance is best at β = 0.8, where word-level information receives the higher weight. At β = 1 the model degenerates to SSWE, the only difference being that the sentiment labels during training come from the lexicon; β = 0 means only tweet-level sentiment information is used. The worst results occur at β = 0, which shows that n-gram information is an indispensable feature for Twitter polarity classification. Based on these tuning results, this embodiment selects β = 0.8 as the final experimental setting.
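The tuning procedure can be summarized by a short sweep sketch (train_mswe, predict, dev_tweets, and y_dev are placeholders for the training, classification, and development data described above):

```python
from sklearn.metrics import f1_score

results = {}
for beta in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
    embeddings = train_mswe(beta=beta)            # train the two sub-networks (placeholder)
    y_pred = predict(embeddings, dev_tweets)      # classify the development set (placeholder)
    results[beta] = f1_score(y_dev, y_pred, average="macro")
best_beta = max(results, key=results.get)         # beta = 0.8 in the reported runs
```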
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical memory) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, may make further changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (4)

1. A sentiment-enhanced word embedding learning method for Twitter opinion classification, characterized in that the method comprises:
An input tweet D contains n context windows c; each context window c is fed into a shared unit whose word embedding dimension is d and whose hidden layer dimension is h; each shared unit comprises a word embedding layer and a linear layer; each context window c contains t words; after the context window c is fed into the word embedding layer, the output is:

x_{i+1:i+t} = x_{i+1} ⊕ x_{i+2} ⊕ ... ⊕ x_{i+t}

where x_{i+1}, x_{i+2}, ..., x_{i+t} (i ≤ n) denote the t words of the i-th context window c; the output x_{i+1:i+t} of the word embedding layer is fed into the linear layer to obtain the window embedding vector:

e_i = f(W_1^1 * x_{i+1:i+t} + b_1^1)

where f(·) denotes a linear function, W_1^1 ∈ R^{(t*d)×h} and b_1^1 are the parameters of the linear layer, R is the real number space whose superscript gives the dimensionality, * denotes numerical multiplication, and × denotes the stacking of dimensions;
the window embedding vector e_i output by the shared unit passes through the activation layer of the left sub-network, a_1 = hTanh(e_i); a_1 then undergoes two linear transformations to yield the n-gram prediction score and the word-level sentiment prediction score:

f_ngm = W_2^1 * a_1
f_ws = W_3^1 * a_1

where f_ngm is the n-gram prediction score, f_ws is the word-level sentiment prediction score, and W_2^1 and W_3^1 are the parameters of the left sub-network;
during model training, the context window c and its variant c̃ are fed into the left sub-network, so the word-level loss function is computed as:

loss_1(c, c̃) = α * loss_ngm(c, c̃) + (1 - α) * loss_ws(c)
loss_ngm(c, c̃) = max(0, 1 - f_ngm(c) + f_ngm(c̃))
loss_ws(c) = max(0, 1 - φ(0) f_ws^(0)(c) + φ(1) f_ws^(1)(c))

where α is a linear interpolation weight and φ(·) is the opinion polarity indicator function of the centre word of the context window c:

φ(j) = 1 if y[j] = 1; -1 if y[j] = 0

where y is the gold sentiment label of the word, represented as a 2-dimensional vector: negative polarity is [1, 0] and positive polarity is [0, 1];
the word embedding vectors e_1, e_2, ..., e_i, ..., e_n output by the n shared units are fed into the right sub-network; the set formed by these vectors is denoted e; max-pooling, average-pooling, and min-pooling are applied to e to obtain the fixed-dimension features max(e), avg(e), and min(e) respectively, which are then processed by a linear layer:

a_2 = W_1^2 * [max(e) ⊕ avg(e) ⊕ min(e)] + b_2^2

where W_1^2 ∈ R^{t*h×h} and b_2^2 are the parameters of the linear layer; the tweet opinion polarity predicted by the softmax layer is:

f_ds = softmax(a_2)

the tweet-level loss function is therefore:

loss_2(D) = -Σ_{k∈{0,1}} g_k(D) log f_k^ds

where g(·) is the gold-standard sentiment distribution of the tweet over [positive, negative];
the overall score of the final optimization objective is computed from the word-level loss function and the tweet-level loss function:

loss = β * loss_1(c, c̃) + (1 - β) * loss_2(D)

where β is the weighting coefficient that balances the word level and the tweet level;
using the overall score of the final optimization objective as the training objective, the left and right sub-networks are trained on lexicon resources and a large-scale distant-supervised tweet corpus to obtain revised word embedding vectors;
a supervised learning algorithm is used to perform polarity classification on tweets represented with the revised word embeddings.
2. The method of claim 1, characterized in that the left and right sub-networks are trained with stochastic gradient descent, and mini-batches are used to speed up training.
3. The method of claim 2, characterized in that two different batch sizes are used when training the left and right sub-networks: the main batch size applies to the tweet level, while the secondary batch size equals the number of context windows in each tweet.
4. The method of claim 1, characterized in that the polarity classification task is completed with a neural network classifier, specifically: first, a convolutional layer with multiple filters processes the input tweet; then a max-pooling layer takes the maximum of each convolution kernel as a feature; the next layer is a fully connected hidden layer with ReLU activation, mainly used to learn an implicit feature representation; the last layer is a fully connected layer whose 2-dimensional output predicts the positive/negative polarity distribution using softmax.
CN201710117139.8A 2017-03-01 2017-03-01 Sentiment-enhanced word embedding learning method for Twitter opinion classification Pending CN106980650A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710117139.8A CN106980650A (en) 2017-03-01 2017-03-01 Sentiment-enhanced word embedding learning method for Twitter opinion classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710117139.8A CN106980650A (en) 2017-03-01 2017-03-01 Sentiment-enhanced word embedding learning method for Twitter opinion classification

Publications (1)

Publication Number Publication Date
CN106980650A true CN106980650A (en) 2017-07-25

Family

ID=59338217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710117139.8A Pending CN106980650A (en) Sentiment-enhanced word embedding learning method for Twitter opinion classification

Country Status (1)

Country Link
CN (1) CN106980650A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107436942A (en) * 2017-07-28 2017-12-05 广州市香港科大霍英东研究院 Word embedding grammar, system, terminal device and storage medium based on social media
CN108256583A (en) * 2018-01-25 2018-07-06 北京东方科诺科技发展有限公司 A kind of multi-tag classification learning method based on coupling learning
CN108920586A (en) * 2018-06-26 2018-11-30 北京工业大学 A kind of short text classification method based on depth nerve mapping support vector machines
CN109308354A (en) * 2018-09-17 2019-02-05 北京神州泰岳软件股份有限公司 The training method and device of word incorporation model
CN109800295A (en) * 2019-01-11 2019-05-24 南京信息工程大学 The emotion session generation method being distributed based on sentiment dictionary and Word probability
CN110059190A (en) * 2019-04-18 2019-07-26 东南大学 A kind of user's real-time point of view detection method based on social media content and structure
CN110852111A (en) * 2018-08-03 2020-02-28 天津大学 Method capable of simultaneously filtering irrelevant comments and carrying out sentiment classification on relevant comments
CN111291181A (en) * 2018-12-10 2020-06-16 百度(美国)有限责任公司 Representation learning for input classification via topic sparse autoencoder and entity embedding
CN112800229A (en) * 2021-02-05 2021-05-14 昆明理工大学 Knowledge graph embedding-based semi-supervised aspect-level emotion analysis method for case-involved field
US11227120B2 (en) 2019-05-02 2022-01-18 King Fahd University Of Petroleum And Minerals Open domain targeted sentiment classification using semisupervised dynamic generation of feature attributes

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9262438B2 (en) * 2013-08-06 2016-02-16 International Business Machines Corporation Geotagging unstructured text
CN105809186A (en) * 2016-02-25 2016-07-27 中国科学院声学研究所 Emotion classification method and system
CN106446147A (en) * 2016-09-20 2017-02-22 天津大学 Emotion analysis method based on structuring features

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9262438B2 (en) * 2013-08-06 2016-02-16 International Business Machines Corporation Geotagging unstructured text
CN105809186A (en) * 2016-02-25 2016-07-27 中国科学院声学研究所 Emotion classification method and system
CN106446147A (en) * 2016-09-20 2017-02-22 天津大学 Emotion analysis method based on structuring features

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHUFENG XIONG: "Improving Twitter Sentiment Classification via Multi-Level Sentiment-Enriched Word Embeddings", https://arxiv.org/pdf/1611.00126.pdf *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107436942A (en) * 2017-07-28 2017-12-05 广州市香港科大霍英东研究院 Word embedding grammar, system, terminal device and storage medium based on social media
CN108256583A (en) * 2018-01-25 2018-07-06 北京东方科诺科技发展有限公司 A kind of multi-tag classification learning method based on coupling learning
CN108920586A (en) * 2018-06-26 2018-11-30 北京工业大学 A kind of short text classification method based on depth nerve mapping support vector machines
CN110852111A (en) * 2018-08-03 2020-02-28 天津大学 Method capable of simultaneously filtering irrelevant comments and carrying out sentiment classification on relevant comments
CN109308354A (en) * 2018-09-17 2019-02-05 北京神州泰岳软件股份有限公司 The training method and device of word incorporation model
CN111291181A (en) * 2018-12-10 2020-06-16 百度(美国)有限责任公司 Representation learning for input classification via topic sparse autoencoder and entity embedding
CN111291181B (en) * 2018-12-10 2023-09-26 百度(美国)有限责任公司 Representation learning for input classification via topic sparse self-encoder and entity embedding
CN109800295A (en) * 2019-01-11 2019-05-24 南京信息工程大学 The emotion session generation method being distributed based on sentiment dictionary and Word probability
CN110059190A (en) * 2019-04-18 2019-07-26 东南大学 A kind of user's real-time point of view detection method based on social media content and structure
US11227120B2 (en) 2019-05-02 2022-01-18 King Fahd University Of Petroleum And Minerals Open domain targeted sentiment classification using semisupervised dynamic generation of feature attributes
CN112800229A (en) * 2021-02-05 2021-05-14 昆明理工大学 Knowledge graph embedding-based semi-supervised aspect-level emotion analysis method for case-involved field
CN112800229B (en) * 2021-02-05 2022-12-20 昆明理工大学 Knowledge graph embedding-based semi-supervised aspect-level emotion analysis method for case-involved field

Similar Documents

Publication Publication Date Title
CN106980650A (en) Sentiment-enhanced word embedding learning method for Twitter opinion classification
Xiong et al. Towards Twitter sentiment classification by multi-level sentiment-enriched word embeddings
CN105843781B (en) For improving the method and system of the matrix sort of buffer efficiency
CN109597997A (en) Based on comment entity, aspect grade sensibility classification method and device and its model training
CN109992779B (en) Emotion analysis method, device, equipment and storage medium based on CNN
CN109558487A (en) Document Classification Method based on the more attention networks of hierarchy
Peng et al. Accelerating minibatch stochastic gradient descent using typicality sampling
CN107832400A (en) A kind of method that location-based LSTM and CNN conjunctive models carry out relation classification
CN108133038A (en) A kind of entity level emotional semantic classification system and method based on dynamic memory network
CN111143576A (en) Event-oriented dynamic knowledge graph construction method and device
CN110287479A (en) Name entity recognition method, electronic device and storage medium
CN104598611B (en) The method and system being ranked up to search entry
CN106055549A (en) Concept Analysis Operations Utilizing Accelerators
CN106126507A (en) A kind of based on character-coded degree of depth nerve interpretation method and system
CN109325231A (en) A kind of method that multi task model generates term vector
CN110083702A (en) A kind of aspect rank text emotion conversion method based on multi-task learning
CN106909931A (en) A kind of feature generation method for machine learning model, device and electronic equipment
CN113434688B (en) Data processing method and device for public opinion classification model training
CN107679225A (en) A kind of reply generation method based on keyword
CN108470061A (en) A kind of emotional semantic classification system for visual angle grade text
CN111858898A (en) Text processing method and device based on artificial intelligence and electronic equipment
CN108364066B (en) Artificial neural network chip and its application method based on N-GRAM and WFST model
CN108920446A (en) A kind of processing method of Engineering document
Zhao et al. Synchronously improving multi-user English translation ability by using AI
Zarzour et al. Sentiment analysis based on deep learning methods for explainable recommendations with reviews

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170725