CN107766331A - Method for automatically calibrating the emotion values of words - Google Patents
 Publication number
 CN107766331A (application CN201711105704.5A)
 Authority
 CN
 China
 Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
 Pending
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F40/00—Handling natural language data
 G06F40/30—Semantic analysis
Abstract
The invention discloses a method for automatically calibrating the emotion values of words. Word vectors are trained for all words in a dictionary, and a small number of words are initialized by hand labelling; these are denoted seed words v_j. The cosine of the angle between the word vector of each seed word v_j and that of each word to be calibrated v_i is computed, giving the similarity between each seed word v_j and each word to be calibrated v_i. Taking the seed words v_j and the words to be calibrated v_i as nodes, and the similarities between them as edge weights, a weighted graph model is built, and the valence value val_vi and arousal value aro_vi of each word to be calibrated v_i are then predicted. The invention avoids extensive manual calibration and predicts emotion values with high accuracy, solving the problems of low prediction accuracy and restricted applicability of prior-art emotion-value prediction methods.
Description
Technical field
The invention belongs to the field of natural language processing and relates to a method for automatically calibrating the emotion values of words.
Background technology
The present invention is a method that assists the automatic creation of sentiment dictionaries; an excellent sentiment dictionary underpins the effectiveness of many sentiment-analysis applications. Because creating a sentiment dictionary manually requires a great deal of manpower, material resources and time, creating sentiment dictionaries automatically has become the inevitable choice.
Sentiment analysis involves many difficult, interrelated problems. As a rule, these problems require automatically detecting the emotion value of a text and identifying whether the text is positive, negative, or without sentiment orientation. Moreover, with the rapid development of online social networking services, anyone can express a view on some matter, or a like or dislike of some thing, through a microblog post, a tweet, or a social feed entry. Such emotional information is an important resource and business opportunity for service providers and advertisers, and major social-network companies have set up dedicated teams to analyse and mine the emotional trends and user opinions in the microblogs and tweets that users publish. Improving sentiment-analysis technology, and with it the accuracy and efficiency of analysing emotional information, has therefore become an important direction of development. On the other hand, besides excellent analysis algorithms, what is more crucial and fundamental is a generally accepted emotion representation that can accurately express and distinguish the various kinds of emotion. In emotional-information analysis there are in general two kinds of emotion representation: discrete categorical representations and continuous dimensional representations. A typical example of the former is the binary representation, which simply divides emotion into the two categories positive and negative; another is the set of eight basic emotion categories proposed by Robert Plutchik: joy, sadness, anger, fear, disgust, surprise, trust and anticipation. Sentiment-analysis applications based on this kind of representation, such as spam detection and opinion extraction and identification, have high practical value and good development prospects.
Continuous dimensional emotion representations have likewise attracted a great deal of attention in recent years. Compared with categorical representations, dimensional representations allow sentiment analysis to be more precise and fine-grained, because dimensional sentiment analysis represents the emotion expressed by a word through multi-dimensional emotion values, so the emotion of each word can be represented by a unique coordinate, even for words expressing similar emotions. For example, in the two-dimensional VA (Valence-Arousal) emotion space shown in Fig. 1, the V values of the words "happy" and "overjoyed", which represent the degree of positivity or negativity, are very close, while their A values, which represent emotional intensity, differ considerably.
Under this kind of dimensional emotion space, the task of sentiment analysis is usually to calibrate the emotion values of words, sentences or texts, and the calibration of sentence and text emotion values is generally based on the emotion values of words. Calibrating an excellent sentiment dictionary is therefore essential. Broadly, there are two kinds of method for this task. The first is manual calibration: typically, different annotators assign an emotion value to each word according to their own judgement, and the individual results are then combined by some mathematical procedure to yield the word's final emotion value. Doing so requires a great deal of manpower and time and is very costly. The second is automatic calibration of emotion values by computer using machine-learning methods. For this task, however, the machine-learning algorithms used are generally supervised, which requires a pre-established sentiment dictionary to provide the training set.
The currently most widely recognised sentiment dictionaries are: the English ANEW (Affective Norms for English Words), containing 1034 English words labelled with VA values; and the Chinese CAW (Chinese Affective Words) and CVAW (Chinese Valence-Arousal Words), containing 162 and 1653 VA-labelled words respectively; other excellent sentiment dictionaries are not enumerated here. These sentiment dictionaries are the basis for automatically labelling sentiment dictionaries.
Around 2011, Wei, Malandrakis et al. used several regression-based methods for emotion-value prediction. These methods typically take a sentiment dictionary as the training set (its entries serving as seed words), train a corresponding model, and then use the model to calibrate the emotion values of words that have not yet been calibrated (unseen words). The prediction accuracy of such emotion-value prediction methods is low.
Summary of the invention
The object of the present invention is to provide a method for automatically calibrating the emotion values of words that avoids extensive manual calibration and predicts emotion values with high accuracy, solving the problems of low prediction accuracy and restricted applicability of prior-art emotion-value prediction methods.
The technical solution adopted by the present invention is a method for automatically calibrating the emotion values of words, carried out in the following steps:
Step 1, train word vectors for all words in the dictionary and initialize a small number of words by hand labelling; these are denoted seed words v_j, and the remaining words are the words to be calibrated v_i. Each initialized seed word v_j has valence value val_vj and arousal value aro_vj.
Step 2, using the word2vec tool, compute the cosine of the angle between the word vector of each seed word v_j and that of each word to be calibrated v_i, obtaining the similarity between each seed word v_j and each word to be calibrated v_i.
Step 3, taking the seed words v_j and the words to be calibrated v_i as nodes, and the similarities between them as edge weights, build the weighted graph model.
Step 4, predict the valence value val_vi and arousal value aro_vi of each word to be calibrated v_i.
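As an illustration of steps 1 to 3, the sketch below computes cosine similarities between toy three-dimensional vectors standing in for word2vec output, and collects them as edge weights of the weighted graph. The words, vectors, seed VA values and scale are invented for illustration only.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy word vectors standing in for trained word2vec output.
vectors = {
    "happy":    [0.9, 0.1, 0.3],
    "joyful":   [0.8, 0.2, 0.4],
    "terrible": [-0.7, 0.5, 0.1],
}

# Hand-labelled seed words with (valence, arousal) on the 1-9 scale.
seeds = {"happy": (8.0, 6.5)}
to_calibrate = [w for w in vectors if w not in seeds]

# Edge weights of the weighted graph: similarity between each word
# to be calibrated and each seed word.
weights = {(w, s): cosine(vectors[w], vectors[s])
           for w in to_calibrate for s in seeds}
```

A word close to a seed in vector space receives a heavier edge, so in step 4 it draws its VA values more strongly from that seed.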
The invention is further characterised in that, in step 4, the valence value val_vi of a word to be calibrated v_i is predicted by iterating formula (3) repeatedly until convergence:

    val_vi^t = (1 - α) · val_vi^(t-1) + α · Σ_{v_j} Sim(v_i, v_j) · val_vj / Σ_{v_j} Sim(v_i, v_j)    (3)

where α is the decay factor (confidence level), with value between 0 and 1; the initial value val_vi^0 is a random number between 1 and 9; Sim(v_i, v_j) is the similarity between the word to be calibrated v_i and the seed word v_j; val_vj is the valence value of the initialized seed word; t is the iteration step; val_vi^t is the valence value of the word to be calibrated at step t, and val_vi^(t-1) is its valence value at step t-1.
Further, in step 4, the arousal value aro_vi of a word to be calibrated is predicted by iterating formula (4) repeatedly until convergence:

    aro_vi^t = (1 - α) · aro_vi^(t-1) + α · Σ_{v_j} Sim(v_i, v_j) · aro_vj / Σ_{v_j} Sim(v_i, v_j)    (4)

where α is the decay factor (confidence level), with value between 0 and 1; the initial value is a random number between 1 and 9; Sim(v_i, v_j) is the similarity between the word to be calibrated v_i and the seed word v_j; aro_vj is the arousal value of the initialized seed word; t is the iteration step; aro_vi^t is the arousal value of the word to be calibrated at step t, and aro_vi^(t-1) is its arousal value at step t-1.
Further, in step 4, the valence value val_vi and arousal value aro_vi of the words to be calibrated are predicted by a matrix-operation method, specifically: the valence values of all words to be calibrated and all seed words are represented by a vector V, and their arousal values by a vector A,

    V = (val_v1, val_v2, ..., val_vN)^T,    A = (aro_v1, aro_v2, ..., aro_vN)^T.

The similarities between all words to be calibrated and seed words form the adjacency matrix S, whose entry in row i and column j is Sim(v_i, v_j), the similarity between the word to be calibrated v_i and the seed word v_j, 1 ≤ i ≤ N, 1 ≤ j ≤ N.
Set the vector I = (1, 1, ..., 1)^T and the vector D = (d_1, d_2, ..., d_N)^T, where d_i takes the value of the decay factor α (confidence level, with value between 0 and 1) for a word to be calibrated and 0 for a seed word, so that the seed values remain constant during iteration. Suppose vectors X = (x_1, x_2, ..., x_N)^T and Y = (y_1, y_2, ..., y_N)^T; then the element-wise operations are M(X, Y) = (x_1×y_1, x_2×y_2, ..., x_N×y_N)^T and U(X, Y) = (x_1/y_1, x_2/y_2, ..., x_N/y_N)^T.
Formula (5) computes, at iteration step t, the valence vector V_t and arousal vector A_t of all vocabulary, including the seed words and the words to be calibrated:

    V_t = M[(I − D)^T, V_{t−1}] + M[D^T, U(S·V_{t−1}, S×I)],
    A_t = M[(I − D)^T, A_{t−1}] + M[D^T, U(S·A_{t−1}, S×I)]    (5)

where V_{t−1} is the valence vector and A_{t−1} the arousal vector of all vocabulary, including the seed words v_j and the words to be calibrated v_i, at step t−1.
After the iteration converges, the valence value val_vi of a word to be calibrated v_i is the value of the i-th dimension of the valence vector V_t, and its arousal value aro_vi is the value of the i-th dimension of the arousal vector A_t.
The beneficial effects of the invention are as follows. Considering the connections between words in a language, the invention proposes a weighted graph model that improves the PageRank algorithm for predicting the VA emotion values of words, applying the improved PageRank algorithm to the prediction of continuous VA emotion values. The difference from the PageRank algorithm is that the similarity between each seed word and each word to be calibrated serves as the weight of the edge between the two nodes in the weighted graph model; in this way words with higher similarity contribute more to the emotion-value calibration of a word to be calibrated, yielding an algorithm and model better suited to labelling word emotion values and improving the accuracy of the prediction results. Moreover, the weighted graph model of the invention can be used to create higher-quality sentiment dictionaries.
The method of the invention for automatically calibrating word emotion values has low labour cost and high prediction accuracy and can be applied in more fields.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is the two-dimensional valence-arousal emotion space of words.
Fig. 2 is the relation model of seed words and words to be calibrated.
Fig. 3 compares the iteration results of the PageRank algorithm and the weighted graph model of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the invention without creative effort fall within the scope of protection of the invention.
Google's PageRank algorithm computes the importance of web pages and ranks them accordingly; its purpose is to recommend to the user the pages most relevant to the search query. The core ideas of PageRank are two: first, if a page is linked to by many other pages, the page is important, i.e. its PageRank value is relatively high; second, if a page with a very high PageRank value links to other pages, the PageRank values of the linked pages are raised accordingly.
Following the principle of PageRank, another class of emotion-classification methods has been proposed. These methods calibrate emotion values by judging the closeness of the "relationship" between a word to be calibrated and the seed words: when two words have a close "relationship", their emotion categories should also be similar, and words with uncalibrated emotion categories are classified accordingly. The core of such methods is label propagation on a graph (Rao and Ravichandran, 2009; Hassan, 2011) and the PageRank algorithm (Esuli, 2007). At present, however, these methods are mainly used for classification into emotion categories.
The present invention improves the PageRank algorithm in the following two respects so that it can be used for emotion-value (VA) prediction. First, the Rank score is converted into concrete VA values, which comprise a valence value and an arousal value (see Fig. 1). Second, whereas the PageRank algorithm treats the weights of the edges between a word to be calibrated and its adjacent seed words as equal, the invention uses the similarity between the word to be calibrated and each adjacent seed word as the weight of the edge between the two nodes in the weighted graph model. In this way adjacent seed words with higher similarity to the word to be calibrated contribute more, and carry more weight, in calibrating its emotion value; that is, both the relation between the word to be calibrated and each adjacent seed word and the similarity to each adjacent seed word are taken into account simultaneously.
The method for automatically calibrating the emotion values of words is carried out in the following steps:
Step 1, train word vectors for all words in the dictionary and initialize a small number of words by hand labelling; these are denoted seed words v_j, and the remaining words are the words to be calibrated v_i. Each initialized seed word v_j has valence value val_vj and arousal value aro_vj.
Step 2, using the word2vec tool, compute the cosine of the angle between the word vector of each seed word v_j and that of each word to be calibrated v_i, obtaining the similarity between each seed word v_j and each word to be calibrated v_i.
Step 3, taking the seed words v_j and the words to be calibrated v_i as nodes, and the similarities between them as edge weights, build the weighted graph model.
Step 4, predict the valence value val_vi and arousal value aro_vi of each word to be calibrated v_i.
According to link-analysis theory, the relation model of seed words and words to be calibrated is as shown in Fig. 2. The emotion value of a word to be calibrated can be calibrated from the VA values of the most similar words among its adjacent seed words. The weight (i.e. similarity) of an edge between a word to be calibrated and an adjacent seed word is computed with the word2vec tool provided by Google: the cosine of the angle between the word vectors of the two words is taken as their similarity. The mathematical expression of the graph model is as follows: an undirected graph G = (V, E), where V is the set of words, i.e. the nodes of the graph, and E is the set of undirected edges between nodes. Each edge e in E carries the similarity between a word to be calibrated v_i and a seed word v_j (v_i, v_j ∈ V, 1 ≤ i, j ≤ n, i ≠ j). For each word to be calibrated v_i, its set of adjacent seed words is denoted N(v_i) = {v_j | (v_i, v_j) ∈ E}. The Valence (valence value) and Arousal (arousal value) of a word to be calibrated v_i are denoted val_vi and aro_vi respectively. The valence value val_vi of a word to be calibrated v_i is computed as in formula (1):

    val_vi = Σ_{v_j ∈ N(v_i)} Sim(v_i, v_j) · val_vj / Σ_{v_j ∈ N(v_i)} Sim(v_i, v_j)    (1)

The arousal value aro_vi of a word to be calibrated v_i is computed according to formula (2):

    aro_vi = Σ_{v_j ∈ N(v_i)} Sim(v_i, v_j) · aro_vj / Σ_{v_j ∈ N(v_i)} Sim(v_i, v_j)    (2)

where val_vj is the valence value of the seed word v_j and Sim(v_i, v_j) is the similarity between the word to be calibrated v_i and the seed word v_j. α is the decay factor (confidence level), with value between 0 and 1; it limits the influence of the rank-sink phenomenon of the PageRank algorithm, which can prevent the result from converging to a unique value. At the start, the VA values of each word to be calibrated v_i are random numbers, initialized to any number between 1 and 9, and are then updated by repeatedly iterating formulas (3) and (4) until convergence:

    val_vi^t = (1 − α) · val_vi^(t−1) + α · Σ_{v_j ∈ N(v_i)} Sim(v_i, v_j) · val_vj / Σ_{v_j ∈ N(v_i)} Sim(v_i, v_j)    (3)

    aro_vi^t = (1 − α) · aro_vi^(t−1) + α · Σ_{v_j ∈ N(v_i)} Sim(v_i, v_j) · aro_vj / Σ_{v_j ∈ N(v_i)} Sim(v_i, v_j)    (4)

where t is the iteration step; val_vi^t and aro_vi^t are the valence and arousal values of the word to be calibrated v_i at step t, and likewise val_vi^(t−1) and aro_vi^(t−1) at step t−1.
It is worth noting that the VA values of the seed words are constants in every iteration step and do not change. On this basis, the VA values of each word to be calibrated are obtained by propagation through multiple iterations in Fig. 2 until convergence.
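The propagation just described can be sketched in a few lines. This is a minimal illustration of the per-word iterative update, assuming each word to be calibrated blends its previous value with the similarity-weighted average of the fixed seed values; the words, similarities and decay factor below are invented.

```python
# Seed words with fixed valence values on the 1-9 scale (toy data).
seeds = {"happy": 8.0, "sad": 2.0}
# Edge weights: similarity of each word to be calibrated to each seed.
sims = {("glad", "happy"): 0.9, ("glad", "sad"): 0.2,
        ("gloomy", "happy"): 0.1, ("gloomy", "sad"): 0.8}
to_calibrate = ["glad", "gloomy"]

alpha = 0.85                            # decay factor in (0, 1)
val = {w: 5.0 for w in to_calibrate}    # initial values, anywhere in [1, 9]

for _ in range(50):                     # iterate until (approximate) convergence
    new_val = {}
    for w in to_calibrate:
        num = sum(sims[(w, s)] * seeds[s] for s in seeds)
        den = sum(sims[(w, s)] for s in seeds)
        # blend previous value with the similarity-weighted seed average
        new_val[w] = (1 - alpha) * val[w] + alpha * num / den
    val = new_val
```

Because the seed values never change, the iteration contracts toward the weighted average regardless of the initial assignment, matching the observation that the converged result is independent of the initial values.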
To improve the efficiency of the iteration, formulas (3) and (4) can be recast as matrix operations. The valence values (valence) and arousal values (arousal) of all words to be calibrated and all seed words are gathered into the two vectors

    V = (val_v1, val_v2, ..., val_vN)^T and A = (aro_v1, aro_v2, ..., aro_vN)^T.

The similarities between all words to be calibrated and seed words form the adjacency matrix

    S = [ Sim(v_i, v_j) ],  i ≠ j, 1 ≤ i ≤ N, 1 ≤ j ≤ N,

where Sim(v_i, v_j) is the similarity between the word to be calibrated v_i and the seed word v_j. Further set the two vectors I = (1, 1, 1, ..., 1)^T and D = (d_1, d_2, ..., d_N)^T, where d_i takes the value of the decay factor α of formulas (3) and (4) for a word to be calibrated and 0 for a seed word, so that the seed values remain constant. Two operations are defined: supposing vectors X = (x_1, x_2, ..., x_N)^T and Y = (y_1, y_2, ..., y_N)^T, the element-wise product is M(X, Y) = (x_1×y_1, x_2×y_2, ..., x_N×y_N)^T and the element-wise quotient is U(X, Y) = (x_1/y_1, x_2/y_2, ..., x_N/y_N)^T.
Formula (5) computes, at iteration step t, the valence vector V_t and arousal vector A_t of all vocabulary, including the seed words and the words to be calibrated:

    V_t = M[(I − D)^T, V_{t−1}] + M[D^T, U(S·V_{t−1}, S×I)],
    A_t = M[(I − D)^T, A_{t−1}] + M[D^T, U(S·A_{t−1}, S×I)]    (5)

where V_{t−1} is the valence vector and A_{t−1} the arousal vector of all vocabulary, including the seed words v_j and the words to be calibrated v_i, at step t−1.
After the iteration converges, the valence value val_vi of a word to be calibrated v_i is read off from the valence vector V_t by index, and the arousal value aro_vi from the arousal vector A_t: val_vi is the value of the i-th dimension of V_t, and aro_vi the value of the i-th dimension of A_t. For example, the valence value of the word to be calibrated v_2 is the value of the second dimension of V_t, and the arousal value of the word to be calibrated v_3 is the value of the third dimension of A_t.
With formula (5), the prediction of VA values can be iterated to convergence more efficiently.
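The matrix form of formula (5) reduces to element-wise operations on vectors. A small sketch, assuming d_i = 0 for the seed entries (keeping them fixed) and d_i = α for the words to be calibrated; the 4-word vocabulary and all numbers are toy values.

```python
import numpy as np

# Vocabulary of 4 words: indices 0-1 are seeds, 2-3 are to be calibrated.
alpha = 0.85
sim = np.array([
    [1.0, 0.3, 0.9, 0.1],
    [0.3, 1.0, 0.2, 0.8],
    [0.9, 0.2, 1.0, 0.2],
    [0.1, 0.8, 0.2, 1.0],
])
d = np.array([0.0, 0.0, alpha, alpha])  # d_i = 0 keeps seed values fixed
V = np.array([8.0, 2.0, 5.0, 5.0])      # seeds fixed; others start arbitrarily

for _ in range(200):
    # Formula (5): V_t = (1 - d)*V_{t-1} + d * (S @ V_{t-1}) / (S @ 1)
    V = (1 - d) * V + d * (sim @ V) / sim.sum(axis=1)
```

The same loop applied to the arousal vector A gives A_t; each update is a matrix-vector product instead of a per-word double loop.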
The calibration results of the invention are tested with the ANEW English sentiment dictionary and the CAW Chinese sentiment dictionary. In the experiments, cross-validation is used: the data set is divided into several equal parts, five in our experiments. In each run, four parts serve as seed words and the remaining part as the words to be calibrated; this cycle is run five times, and the final result is the average of the individual results.
Meanwhile we are provided with two groups of contrast experiments；The method that first group of VA value using homing method is predicted, including line
Property regression algorithm and kernel function regression algorithm.For first group of two methods, plant similarity between subword and they
VA values will be trained as training set data, and remaining word is then used as test data.Second group is two kinds and is based on graph model
Method, be PageRank algorithms and weight map model algorithm proposed by the present invention respectively.
Evaluation indexes:
The performance of the different methods is mainly measured by the following three indexes.
Root mean square error (RMSE):

    RMSE = sqrt( (1/n) Σ_{i=1}^{n} (A_i − P_i)^2 )

Mean absolute error (MAE):

    MAE = (1/n) Σ_{i=1}^{n} |A_i − P_i|

Mean absolute percentage error (MAPE):

    MAPE = (100%/n) Σ_{i=1}^{n} |(A_i − P_i) / A_i|

where A_i is the actual VA value of a word to be calibrated and P_i is its predicted VA value. If all three indexes of a method are relatively low, the method's predictions are close to the actual values and the method performs well.
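The three indexes follow directly from their definitions; A and P below are invented actual/predicted values for illustration.

```python
import math

def rmse(actual, pred):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    """Mean absolute percentage error (as a fraction, not a percent)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

A = [6.0, 2.0, 8.0]   # toy actual VA values of words to be calibrated
P = [5.5, 2.5, 7.0]   # toy predicted VA values
```

RMSE penalises large individual errors more than MAE does, which is why it is used as the reference when comparing iteration behaviour.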
Experimental results and analysis:
Both the PageRank algorithm and the weighted graph algorithm require iterative computation. The same number of iteration steps is set for the two algorithms for comparison, and the effect of different iteration counts on the current data set is explored, with the root-mean-square error as the reference; the iteration behaviour of the two algorithms is shown in Fig. 3.
As can be seen from Fig. 3, both algorithms essentially stabilise around the 10th iteration. The root-mean-square errors of the valence (Valence) and arousal (Arousal) values predicted by the weighted graph algorithm of the invention are lower, so its predictions are closer to the actual values. The final converged result of each word is independent of the initial values and the decay factor.
Linear regression, kernel-function regression, the PageRank algorithm and the weighted graph algorithm of the invention were each tested; the results for each index are shown in Table 1, with the iteration count of the PageRank algorithm and the weighted graph model of the invention both set to 50.
Table 1: index results of the four algorithms
As Table 1 shows, the graph-based methods, namely the PageRank algorithm and the weighted graph algorithm of the invention, are generally better than the linear-regression and kernel-function algorithms. On the MAPE (mean absolute percentage error) index the weighted graph algorithm of the invention outperforms the other three algorithms: on average about 4% lower than the PageRank algorithm, about 8% lower than the kernel-function regression and about 7% lower than linear regression. The weighted graph algorithm outperforms the other algorithms because it simultaneously considers the relations between multiple nodes and the weights between them. In addition, the results show that predicting Arousal values is harder than predicting Valence values.
Linear regression algorithm:
When the VA values of words to be calibrated are predicted by automatic or semi-automatic methods, a small number of emotion words are generally labelled by hand as seed words, and methods for automatically computing the semantic similarity between words are then used to find, in large amounts of text, the words to be calibrated that are similar to the seed words.
Linear regression can therefore be used to establish a "similarity - VA value" correspondence between words, after which the similarity between a word to be labelled and a seed word is used to predict the VA values of the word to be labelled. Since every seed word has labelled VA values, it suffices to compute the similarity of each seed word to the other seed words in order to establish, for each seed word, a "similarity - Valence (or Arousal) value" regression model. The following formula is the Valence regression model of the seed word seed_j:

    val_seedi = b · Sim(seed_i, seed_j) + a

where b is the regression coefficient and a is the intercept, Sim(seed_i, seed_j) is the semantic similarity between the seed words seed_i and seed_j, and val_seedi is the Valence value of seed_i. Hence, substituting the similarities of seed_j to the other seed words together with the Valence (or Arousal) values of those seed words, training yields the Valence (or Arousal) regression model of seed_j.
To predict the VA values of a validated word to be labelled with this regression model, only the similarity between the word and seed_j need be input. For example, suppose a word to be labelled unseen_i has a similarity to seed_j that exceeds the threshold and passes validation; unseen_i is then selected as an emotion word, and its similarity to seed_j is input into the regression model of seed_j to predict the VA values of unseen_i:

    val_unseeni = b · Sim(unseen_i, seed_j) + a

If the similarity of the word to be labelled exceeds the thresholds of more than one seed word, the regression model trained for the seed word with the greatest similarity can be used.
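A sketch of one seed word's "similarity - Valence" model, fitting b and a by ordinary least squares and then predicting an unseen word from its similarity; the similarity/valence pairs and the query similarity are invented.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b*x + a for one seed word's
    (similarity, Valence) training pairs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return b, a

# Similarities of other seed words to seed_j, and their Valence values.
sims = [0.2, 0.5, 0.8]
vals = [3.0, 5.0, 7.0]
b, a = fit_line(sims, vals)

# Predict an unseen word whose similarity to seed_j is 0.65.
predicted = b * 0.65 + a
```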
Kernel-function regression algorithm:
On this basis, another method introduces a kernel function into the conventional linear regression model to transform the similarity linearly or non-linearly; the regression model is defined as

    val_unseeni = Σ_j f(Sim(unseen_i, seed_j)) · val_seedj / Σ_j f(Sim(unseen_i, seed_j))

where f is the kernel function, which can be a linear function or a non-linear function such as the square root or the logarithm. Based on this model, the VA values of a word to be labelled, val_unseeni, can be predicted from the VA values val_seedj of all seed words similar to it and the semantic similarities Sim(unseen_i, seed_j) between them. The final test results show that the best prediction is obtained when the kernel function is linear.
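A hedged sketch of the kernel-weighted prediction: the patent's exact regression form is not fully recoverable from this text, so the code below follows a standard kernel-weighted average over the seed words, with f transforming the similarities. All numbers are illustrative.

```python
import math

def kernel_predict(sims, seed_vals, f=math.sqrt):
    """Kernel-weighted prediction of an unseen word's VA value from its
    similarities to the seed words. f may be linear, square-root or
    logarithmic per the text; the weighted-average form is an assumption."""
    weights = [f(s) for s in sims]
    return sum(w * v for w, v in zip(weights, seed_vals)) / sum(weights)

sims = [0.81, 0.04]   # similarity of the unseen word to two seed words
vals = [8.0, 2.0]     # the seed words' Valence values
pred_sqrt = kernel_predict(sims, vals, f=math.sqrt)
pred_linear = kernel_predict(sims, vals, f=lambda s: s)
```

A more sharply peaked f concentrates the prediction on the closest seeds, which is one way to read the finding that a linear kernel worked best here.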
PageRank algorithm:
Some publications have also proposed predicting the VA values of words with the PageRank algorithm. Such methods generally take the seed words and the words to be labelled as nodes and the relations between words as edges, describe the vocabulary and similarities as a graph model, and solve it with the PageRank algorithm. In this method, the VA values of each node (word) to be labelled are updated according to formula (12):

    val_wi^t = e + α · Σ_{w_j ∈ Nei(w_i)} val_wj^(t−1) / |Nei(w_i)|

where w_i and w_j denote a word to be labelled and a seed word respectively, Nei(w_i) is the set of all adjacent nodes (words) connected to node (word) w_i by an edge, val_wi^t is the Valence prediction of the word to be labelled at the t-th iteration, e is a constant, and α is the attenuation coefficient of the PageRank algorithm, usually with value in (0, 1). Based on formula (12), the VA values of candidate words can be predicted from the VA values of the seed words through the iterative computation of PageRank.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit its scope. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (4)
1. A method for automatically calibrating the emotion values of words, characterised in that it is carried out in the following steps:
Step 1, train word vectors for all words in the dictionary and initialize a small number of words by hand labelling, denoting them seed words v_j; the remaining words are the words to be calibrated v_i; each initialized seed word v_j has valence value val_vj and arousal value aro_vj.
Step 2, using the word2vec tool, compute the cosine of the angle between the word vector of each seed word v_j and that of each word to be calibrated v_i, obtaining the similarity between each seed word v_j and each word to be calibrated v_i.
Step 3, taking the seed words v_j and the words to be calibrated v_i as nodes, and the similarities between them as edge weights, build the weighted graph model.
Step 4, predict the valence value val_vi and arousal value aro_vi of each word to be calibrated v_i.
2. The method for automatically calibrating the emotion values of words according to claim 1, characterised in that in step 4 the mood value val_{vi} of a word to be calibrated v_i is predicted by iterating formula (3) until convergence:
val_{vi}^t = (1-α)·val_{vi}^{t-1} + α·(Σ_j Sim(v_i, v_j)·val_{vj}) / (Σ_j Sim(v_i, v_j))   (3)
where α is a decay factor or confidence level taking a value between 0 and 1; the words to be calibrated are initialised with random values between 1 and 9; Sim(v_i, v_j) denotes the similarity between the word to be calibrated v_i and the seed word v_j; val_{vj} denotes the mood value of an initialised seed word; t denotes the iteration step; val_{vi}^t denotes the mood value of the word to be calibrated at iteration t; and val_{vi}^{t-1} denotes the mood value of the word to be calibrated at iteration t-1.
3. The method for automatically calibrating the emotion values of words according to claim 1, characterised in that in step 4 the excitation value aro_{vi} of a word to be calibrated is predicted by iterating formula (4) until convergence:
aro_{vi}^t = (1-α)·aro_{vi}^{t-1} + α·(Σ_j Sim(v_i, v_j)·aro_{vj}) / (Σ_j Sim(v_i, v_j))   (4)
where α is a decay factor or confidence level taking a value between 0 and 1; the words to be calibrated are initialised with random values between 1 and 9; Sim(v_i, v_j) denotes the similarity between the word to be calibrated v_i and the seed word v_j; aro_{vj} denotes the excitation value of an initialised seed word; t denotes the iteration step; aro_{vi}^t denotes the excitation value of the word to be calibrated at iteration t; and aro_{vi}^{t-1} denotes the excitation value of the word to be calibrated at iteration t-1.
4. The method for automatically calibrating the emotion values of words according to claim 1, characterised in that in step 4 the mood value val_{vi} and excitation value aro_{vi} of the words to be calibrated v_i are predicted using matrix operations, specifically: represent the mood values of all words to be calibrated and seed words by a vector V, and the excitation values of all words to be calibrated and seed words by a vector A. The similarities between all words to be calibrated and seed words form the adjacency matrix S:
S = [ Sim(v_1, v_1)  ...  Sim(v_1, v_j)  ...  Sim(v_1, v_N) ]
    [      ...                ...                ...        ]
    [ Sim(v_i, v_1)  ...  Sim(v_i, v_j)  ...  Sim(v_i, v_N) ]
    [      ...                ...                ...        ]
    [ Sim(v_N, v_1)  ...  Sim(v_N, v_j)  ...  Sim(v_N, v_N) ]
where Sim(v_i, v_j) denotes the similarity between word v_i and word v_j, 1 ≤ i ≤ N, 1 ≤ j ≤ N. Set vector I = (1, 1, ..., 1)^T and vector D = (d_1, d_2, ..., d_N)^T, where each d_i = α and α is a decay factor or confidence level taking a value between 0 and 1. Suppose vector X = (x_1, x_2, ..., x_N)^T and vector Y = (y_1, y_2, ..., y_N)^T; then define the element-wise operations M(X, Y) = (x_1×y_1, x_2×y_2, ..., x_N×y_N)^T and U(X, Y) = (x_1/y_1, x_2/y_2, ..., x_N/y_N)^T. The mood value vector V_t and excitation value vector A_t at iteration t of all words, including the seed words and the words to be calibrated, are computed by formula (5):
V_t = M[(I-D)^T, V_{t-1}] + M[D^T, U(S·V_{t-1}, S·I)],
A_t = M[(I-D)^T, A_{t-1}] + M[D^T, U(S·A_{t-1}, S·I)]   (5)
where V_{t-1} denotes the mood value vector at iteration t-1 of all words including the seed words v_j and the words to be calibrated v_i, and A_{t-1} denotes the excitation value vector at iteration t-1 of all words including the seed words v_j and the words to be calibrated v_i. After the iteration converges, the mood value val_{vi} of a word to be calibrated v_i is the i-th component of the mood value vector V_t, and the excitation value aro_{vi} of a word to be calibrated v_i is the i-th component of the excitation value vector A_t.
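As an illustrative sketch only (assuming numpy; the names `iterate_va` and `seed_mask` are hypothetical), the matrix iteration of formula (5) — with M as element-wise product, U as element-wise division, and d_i = α — can be written for one value vector (mood or excitation) as:

```python
# Matrix-form sketch of formula (5): V_t = M[(I-D), V_{t-1}] + M[D, U(S·V_{t-1}, S·I)],
# where M is element-wise product, U is element-wise division, and d_i = alpha.

import numpy as np

def iterate_va(S, V0, alpha=0.85, seed_mask=None, iters=100, tol=1e-8):
    """Propagate one value vector (mood or excitation) in matrix form.

    S:         (N, N) similarity (adjacency) matrix, S[i, j] = Sim(v_i, v_j)
    V0:        (N,) initial values (seed values for seeds, neutral elsewhere)
    seed_mask: boolean (N,) array, True where the value is a fixed seed
    """
    N = S.shape[0]
    I = np.ones(N)                 # vector I = (1, ..., 1)^T
    D = np.full(N, alpha)          # vector D = (d_1, ..., d_N)^T with d_i = alpha
    row_sum = S @ I                # S·I: per-row similarity mass
    V = V0.astype(float).copy()
    for _ in range(iters):
        avg = (S @ V) / row_sum            # U(S·V, S·I): weighted neighbour average
        V_new = (I - D) * V + D * avg      # M[(I-D), V] + M[D, avg]
        if seed_mask is not None:
            V_new[seed_mask] = V0[seed_mask]   # keep manually labelled seeds fixed
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V
```

Each step blends a word's previous value, weighted 1-α, with the similarity-weighted average of its neighbours' values, weighted α; re-imposing the seed values after each step keeps the manually labelled anchors fixed.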
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title 

CN201711105704.5A CN107766331A (en)  2017-11-10  2017-11-10  The method that automatic Calibration is carried out to word emotion value 
Publications (1)
Publication Number  Publication Date 

CN107766331A true CN107766331A (en)  2018-03-06 
Family
ID=61273802
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

CN201711105704.5A Pending CN107766331A (en)  2017-11-10  2017-11-10  The method that automatic Calibration is carried out to word emotion value 
Country Status (1)
Country  Link 

CN (1)  CN107766331A (en) 
Patent Citations (8)
Publication number  Priority date  Publication date  Assignee  Title 

KR20130022165A (en) *  2011-08-25  2013-03-06  Sungkyunkwan University Research & Business Foundation  System for prediction of emotional response based on responsibility values of user groups having similar emotions, and method thereof 
CN103985000A (en) *  2014-06-05  2014-08-13  Wuhan University  Medium- and long-term typical daily load curve prediction method based on functional non-parametric regression 
CN104732203A (en) *  2015-03-05  2015-06-24  Institute of Software, Chinese Academy of Sciences  Emotion recognition and tracking method based on video information 
CN106598935A (en) *  2015-10-16  2017-04-26  Beijing Gridsum Technology Co., Ltd.  Method and apparatus for determining the emotional tendency of a document 
CN106156004A (en) *  2016-07-04  2016-11-23  Communication University of China  Sentiment analysis system and method for film review information based on word vectors 
CN106547866A (en) *  2016-10-24  2017-03-29  Xi'an University of Posts and Telecommunications  Fine-grained sentiment classification method based on a random co-occurrence network of emotion words 
CN106708953A (en) *  2016-11-28  2017-05-24  Xidian University  Discrete particle swarm optimization based local community detection collaborative filtering recommendation method 
CN106951514A (en) *  2017-03-17  2017-07-14  Hefei University of Technology  Automobile sales forecasting method considering brand emotion 
NonPatent Citations (1)
Title 

Wang Jin: "Research on Sentiment Analysis Methods for Chinese Text Based on the Valence-Arousal Space", China Doctoral Dissertations Full-text Database, Information Science and Technology Series * 
Cited By (8)
Publication number  Priority date  Publication date  Assignee  Title 

CN108491393A (en) *  2018-03-29  2018-09-04  国信优易数据有限公司  Method and device for determining the emotional intensity of emotion words 
CN108491393B (en) *  2018-03-29  2022-05-20  国信优易数据股份有限公司  Method and device for determining the emotional intensity of emotion words 
CN108563635A (en) *  2018-04-04  2018-09-21  Beijing Institute of Technology  Fast sentiment dictionary construction method based on an emotion wheel model 
CN108595679A (en) *  2018-05-02  2018-09-28  Wuhan Douyu Network Technology Co., Ltd.  Label determination method, apparatus, terminal and storage medium 
CN109800804A (en) *  2019-01-10  2019-05-24  South China University of Technology  Method and system for autonomous conversion of image emotion 
CN109933793A (en) *  2019-03-15  2019-06-25  Tencent Technology (Shenzhen) Co., Ltd.  Text polarity recognition method, apparatus, device and readable storage medium 
CN109933793B (en) *  2019-03-15  2023-01-06  Tencent Technology (Shenzhen) Co., Ltd.  Text polarity recognition method, device and equipment and readable storage medium 
CN113326694A (en) *  2021-05-18  2021-08-31  Xihua University  Implicit emotion dictionary generation method based on emotion propagation 
Similar Documents
Publication  Publication Date  Title 

Wang et al.  Deep learning for aspect-based sentiment analysis  
CN107766331A (en)  The method that automatic Calibration is carried out to word emotion value  
Mahmoudi et al.  Deep neural networks understand investors better  
Stojanovski et al.  Twitter sentiment analysis using deep convolutional neural network  
CN110245229A (en)  Deep learning topic sentiment classification method based on data augmentation  
CN108804612B (en)  Text emotion classification method based on a dual neural network model  
Chang et al.  Research on detection methods based on Doc2vec abnormal comments  
CN110929034A (en)  Fine-grained emotion classification method for commodity comments based on improved LSTM  
CN107315738A (en)  Innovation degree assessment method for text information  
CN108984775B (en)  Public opinion monitoring method and system based on commodity comments  
Fan et al.  Multi-task neural learning architecture for end-to-end identification of helpful reviews  
CN109299258A (en)  Public sentiment event detection method, device and equipment  
Kang et al.  Deep recurrent convolutional networks for inferring user interests from social media  
Swathi et al.  An optimal deep learning-based LSTM for stock price prediction using twitter sentiment analysis  
CN110134934A (en)  Text emotion analysis method and device  
Talpada et al.  An analysis on use of deep learning and lexical-semantic based sentiment analysis method on twitter data to understand the demographic trend of telemedicine  
CN111241271B (en)  Text emotion classification method and device and electronic equipment  
CN111325018A (en)  Domain dictionary construction method based on web retrieval and new word discovery  
Chen et al.  Content-based influence modeling for opinion behavior prediction  
Thomas et al.  Sentimental analysis using recurrent neural network  
Mitroi et al.  Sentiment analysis using topic-document embeddings  
Zhang et al.  Bidirectional long short-term memory for sentiment analysis of Chinese product reviews  
CN113392209A (en)  Text clustering method based on artificial intelligence, related equipment and storage medium  
Lee  Document vectorization method using network information of words  
CN112084333B (en)  Social user generation method based on emotional tendency analysis 
Legal Events
Date  Code  Title  Description 

PB01  Publication  
SE01  Entry into force of request for substantive examination  
RJ01  Rejection of invention patent application after publication 
Application publication date: 2018-03-06