CN107766331A - Method for automatically calibrating word emotion values - Google Patents

Method for automatically calibrating word emotion values

Publication number
CN107766331A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711105704.5A
Other languages
Chinese (zh)
Inventor
王津
彭博
张学杰
张骥先
杨旭涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University YNU
Priority to CN201711105704.5A
Publication of CN107766331A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Abstract

The invention discloses a method for automatically calibrating word emotion values. Word vectors are trained for all words in a dictionary, and a small number of words are initialized by manual labelling and denoted seed words v_j. The cosine of the angle between the word vectors of each seed word v_j and each word to be calibrated v_i is computed to obtain the similarity between them. A weighted graph model is built with the seed words v_j and the words to be calibrated v_i as nodes and the similarity between a seed word v_j and a word to be calibrated v_i as the weight of the connecting edge. The valence (mood value) val_{v_i} and the arousal (excitation value) aro_{v_i} of each word to be calibrated v_i are then predicted. The invention avoids large-scale manual calibration and predicts emotion values with high accuracy, solving the problem that prior-art emotion value prediction methods have low accuracy and restricted applicability.

Description

Method for automatically calibrating word emotion values
Technical field
The invention belongs to the technical field of natural language processing and relates to a method for automatically calibrating word emotion values.
Background
The invention is a method that assists the automatic creation of sentiment dictionaries; a good sentiment dictionary underpins the effectiveness of many sentiment analysis applications. Because creating a sentiment dictionary manually currently requires a great deal of manpower, material resources, and time, creating sentiment dictionaries automatically has become the inevitable choice.
"Sentiment analysis" involves many difficult problems that are related to one another. Typically, these problems require automatically detecting the emotion value of a text and identifying whether the text is positive, negative, or without sentiment orientation. Moreover, with the rapid development of online social networking services, anyone can post a microblog entry, a tweet, or a post in their circle of friends to express an opinion on some matter, or a like or dislike of it. This emotion information is an important resource and business opportunity for service providers and advertisers, and the major social network companies have set up dedicated departments to analyze and mine the sentiment trends and user opinions in the microblog entries and tweets that users post. How to improve sentiment analysis technology, and thereby the accuracy and efficiency of analyzing emotion information, has become an important direction of development. Besides good analysis algorithms, what is more critical and fundamental is a generally accepted emotion representation that accurately expresses the differences between kinds of emotion. In emotion information analysis there are generally two kinds of emotion representation: discrete categorical representation and continuous dimensional representation. Typical examples of the former are the binary representation, which simply divides emotion into positive and negative, and the eight basic emotions proposed by Robert Plutchik: joy, sadness, anger, fear, disgust, surprise, trust, and anticipation. Sentiment analysis applications based on this representation, such as spam detection, opinion extraction, and opinion identification, have high practical value and good development prospects.
Continuous dimensional emotion representation has also attracted considerable attention in recent years. Compared with categorical representation, dimensional representation allows sentiment analysis to be more accurate and fine-grained. Because dimensional sentiment analysis represents the emotion expressed by a word with multidimensional emotion values, the emotion of each word can be represented by a unique coordinate, even for words that express similar emotions. For example, in the two-dimensional VA (Valence-Arousal; mood value-excitation value) emotional space shown in Fig. 1, the V values of "happy" and "ecstatic", which represent the degree of positivity or negativity, are nearly the same, whereas their A values, which represent emotional intensity, differ considerably.
In such a dimensional emotional space, the task of sentiment analysis is usually to calibrate the emotion values of words, sentences, or texts. The calibration of sentences and texts is in turn generally based on the emotion values of words, so a well-calibrated sentiment dictionary is essential. Broadly, there are two ways to accomplish this task. One is manual calibration: different annotators each assign an emotion value to a word according to their own judgment, and the annotators' results are then processed mathematically to obtain the word's final emotion value. This requires a large amount of manpower and time and is therefore costly. The other is to calibrate emotion values automatically by computer using machine learning. However, the machine learning algorithms used for this task are generally supervised, which requires a previously established sentiment dictionary to provide the training set.
Widely recognized sentiment dictionaries currently include the English ANEW (Affective Norms for English Words), which contains 1034 English words annotated with VA values, and the Chinese CAW (Chinese Affective Words) and CVAW (Chinese Valence-Arousal Words), which contain 162 and 1653 Chinese words annotated with VA values respectively. Other good sentiment dictionaries are not listed here. These sentiment dictionaries are the basis for annotating sentiment dictionaries automatically.
Around 2011, Wei et al. and Malandrakis et al. used several regression-based methods to predict emotion values. These methods generally use a sentiment dictionary as the training set (seed words) to train a model, and then use the model to calibrate the words that have no emotion value yet (unseen words). The prediction accuracy of such methods is low.
Summary of the invention
To address the above problems, the invention provides a method for automatically calibrating word emotion values that avoids large-scale manual calibration and predicts emotion values with high accuracy, solving the problem that prior-art emotion value prediction methods have low accuracy and restricted applicability.
The technical solution adopted by the invention is a method for automatically calibrating word emotion values, carried out according to the following steps:
Step 1: train word vectors for all words in the dictionary and initialize a small number of words by manual labelling. These are denoted seed words v_j; the remaining words are the words to be calibrated v_i. The valence of an initialized seed word v_j is val_{v_j} and its arousal is aro_{v_j}.
Step 2: use the word2vec tool to compute the cosine of the angle between the word vectors of each seed word v_j and each word to be calibrated v_i, which gives the similarity between each seed word v_j and each word to be calibrated v_i.
Step 3: build a weighted graph model with the seed words v_j and the words to be calibrated v_i as nodes and the similarity between a seed word v_j and a word to be calibrated v_i as the weight of the connecting edge.
Step 4: predict the valence val_{v_i} and the arousal aro_{v_i} of each word to be calibrated v_i.
The invention is further characterized in that, in step 4, the valence val_{v_i} of the word to be calibrated v_i is predicted by iterating formula (3) until convergence:
$$\mathrm{val}_{v_i}^{t} = (1-\alpha)\,\mathrm{val}_{v_i}^{t-1} + \alpha\,\frac{\sum_{v_j \in N(v_i)} \mathrm{Sim}(v_i, v_j)\,\mathrm{val}_{v_j}}{\sum_{v_j \in N(v_i)} \mathrm{Sim}(v_i, v_j)} \tag{3}$$
where α is the decay factor (or confidence), with a value between 0 and 1; the initial value of val_{v_i} is a random number between 1 and 9; N(v_i) is the set of seed words adjacent to v_i; Sim(v_i, v_j) is the similarity between the word to be calibrated v_i and the seed word v_j; val_{v_j} is the valence of the initialized seed word; t is the iteration step; and val_{v_i}^{t} and val_{v_i}^{t-1} are the valence of the word to be calibrated at iteration steps t and t-1.
Further, in step 4, the arousal aro_{v_i} of the word to be calibrated is predicted by iterating formula (4) until convergence:
$$\mathrm{aro}_{v_i}^{t} = (1-\alpha)\,\mathrm{aro}_{v_i}^{t-1} + \alpha\,\frac{\sum_{v_j \in N(v_i)} \mathrm{Sim}(v_i, v_j)\,\mathrm{aro}_{v_j}}{\sum_{v_j \in N(v_i)} \mathrm{Sim}(v_i, v_j)} \tag{4}$$
where α is the decay factor (or confidence), with a value between 0 and 1; the initial value of aro_{v_i} is a random number between 1 and 9; N(v_i) is the set of seed words adjacent to v_i; Sim(v_i, v_j) is the similarity between the word to be calibrated v_i and the seed word v_j; aro_{v_j} is the arousal of the initialized seed word; t is the iteration step; and aro_{v_i}^{t} and aro_{v_i}^{t-1} are the arousal of the word to be calibrated at iteration steps t and t-1.
Further, in step 4, the valence val_{v_i} and the arousal aro_{v_i} of the word to be calibrated v_i are predicted by matrix operations, specifically as follows. The valences of all words to be calibrated and seed words are represented by the vector V, and their arousals by the vector A:
$$V = (\mathrm{val}_{v_1}, \mathrm{val}_{v_2}, \ldots, \mathrm{val}_{v_N})^T, \qquad A = (\mathrm{aro}_{v_1}, \mathrm{aro}_{v_2}, \ldots, \mathrm{aro}_{v_N})^T$$
The similarities between all words to be calibrated and seed words form the adjacency matrix S:
$$S = \begin{pmatrix} 0 & \mathrm{Sim}(v_1, v_2) & \cdots & \mathrm{Sim}(v_1, v_N) \\ \mathrm{Sim}(v_2, v_1) & 0 & \cdots & \mathrm{Sim}(v_2, v_N) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Sim}(v_N, v_1) & \mathrm{Sim}(v_N, v_2) & \cdots & 0 \end{pmatrix}$$
where Sim(v_i, v_j) is the similarity between the word to be calibrated v_i and the seed word v_j, with i ≠ j, 1 ≤ i ≤ N, and 1 ≤ j ≤ N.
Let the vector I = (1, 1, ..., 1)^T and the vector D = (d_1, d_2, ..., d_N)^T, where d_i = α if v_i is a word to be calibrated and d_i = 0 if v_i is a seed word, and α is the decay factor (or confidence), with a value between 0 and 1. For vectors X = (x_1, x_2, ..., x_N)^T and Y = (y_1, y_2, ..., y_N)^T, define the element-wise product M(X, Y) = (x_1 × y_1, x_2 × y_2, ..., x_N × y_N)^T and the element-wise quotient U(X, Y) = (x_1/y_1, x_2/y_2, ..., x_N/y_N)^T.
Formula (5) gives the valence vector V_t and the arousal vector A_t of all words, seed words and words to be calibrated alike, at iteration step t:
$$V_t = M[(I-D)^T, V_{t-1}] + M[D^T, U(S V_{t-1}, S \times I)],$$
$$A_t = M[(I-D)^T, A_{t-1}] + M[D^T, U(S A_{t-1}, S \times I)] \tag{5}$$
where V_{t-1} is the valence vector and A_{t-1} is the arousal vector of all words, including the seed words v_j and the words to be calibrated v_i, at iteration step t-1.
After the iteration converges, the valence val_{v_i} of the word to be calibrated v_i is the i-th component of the valence vector V_t, and its arousal aro_{v_i} is the i-th component of the arousal vector A_t.
The beneficial effects of the invention are as follows. The invention takes into account the relations between words in a language and proposes a weighted graph model that improves the PageRank algorithm to predict the VA emotion values of words, so that the improved algorithm can be applied to continuous VA value prediction. The invention differs from the PageRank algorithm in that it uses the similarity between each seed word and each word to be calibrated as the weight of the edge between the two nodes in the weighted graph model. As a result, words with higher similarity contribute more to the calibration of a word to be calibrated, which yields an algorithm and model better suited to word emotion value annotation and improves the accuracy of the prediction. The weighted graph model of the invention can also be used to create higher-quality sentiment dictionaries.
The method of the invention has a low labor cost and high prediction accuracy, and can be applied in more fields.
Brief description of the drawings
To illustrate the embodiments of the invention or the prior-art solutions more clearly, the drawings needed for describing them are briefly introduced below. The drawings described here are only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is the two-dimensional valence-arousal emotional space of words.
Fig. 2 is the relation model of seed words and words to be calibrated.
Fig. 3 compares the iteration results of the PageRank algorithm and the weighted graph model of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
The Google PageRank algorithm computes the importance of web pages and ranks them accordingly; its purpose is to recommend to the user the pages most relevant to the user's search. The PageRank algorithm rests on two core ideas. First, if a web page is linked to by many other pages, the page is important, so its PageRank value is relatively high. Second, if a page with a very high PageRank value links to other pages, the PageRank values of the linked pages rise accordingly.
Based on the principle of the PageRank algorithm, another approach to sentiment classification has been proposed. It assigns emotion labels by judging how close the "relation" between a word to be calibrated and the seed words is: when two words are closely "related", their emotion categories should also be similar, and words without an emotion category are classified on that basis. The core of this approach is to apply label propagation (Rao and Ravichandran, 2009; Hassan, 2011) and the PageRank algorithm (Esuli, 2007) on a graph. At present, however, this approach is mainly used for dividing words into emotion categories.
The invention improves the PageRank algorithm in two respects so that it can be used for predicting emotion values (VA). First, the rank scores are converted into concrete VA values; the VA values comprise valence and arousal, see Fig. 1. Second, whereas the PageRank algorithm treats the edges between a word to be calibrated and its adjacent seed words as having equal weight, the invention uses the similarity between the word to be calibrated and each adjacent seed word as the weight of the edge between the two nodes in the weighted graph model. An adjacent seed word that is more similar to the word to be calibrated therefore contributes more to its emotion value and carries a larger weight. In other words, the method considers both the relation between the word to be calibrated and each adjacent seed word and the similarity between them.
The method for automatically calibrating word emotion values is carried out according to the following steps:
Step 1: train word vectors for all words in the dictionary and initialize a small number of words by manual labelling. These are denoted seed words v_j; the remaining words are the words to be calibrated v_i. The valence of an initialized seed word v_j is val_{v_j} and its arousal is aro_{v_j}.
Step 2: use the word2vec tool to compute the cosine of the angle between the word vectors of each seed word v_j and each word to be calibrated v_i, which gives the similarity between each seed word v_j and each word to be calibrated v_i.
Step 3: build a weighted graph model with the seed words v_j and the words to be calibrated v_i as nodes and the similarity between a seed word v_j and a word to be calibrated v_i as the weight of the connecting edge.
Step 4: predict the valence val_{v_i} and the arousal aro_{v_i} of each word to be calibrated v_i.
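Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `cosine_similarity` and `build_weighted_graph`, the dictionary layout of the graph, and the toy two-dimensional vectors are all hypothetical, and in practice the vectors would come from a trained word2vec model rather than being written by hand.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_weighted_graph(vectors, seeds):
    """Return {word_to_calibrate: {seed_word: similarity}}.

    `vectors` maps each word to its embedding and `seeds` is the set of
    manually labelled words; both are hypothetical inputs for this sketch.
    """
    graph = {}
    for word, vec in vectors.items():
        if word in seeds:
            continue  # seed words are only used as neighbours
        graph[word] = {s: cosine_similarity(vec, vectors[s]) for s in seeds}
    return graph


# Toy vectors chosen by hand (not trained embeddings).
toy_vectors = {
    "happy": [1.0, 0.0],
    "sad": [0.0, 1.0],
    "glad": [0.8, 0.6],
}
toy_graph = build_weighted_graph(toy_vectors, seeds={"happy", "sad"})
```

Storing the graph as a nested dictionary keeps only the edges between a word to be calibrated and its seed words, which is all that step 4 needs.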
According to link analysis theory, the relation between seed words and words to be calibrated is modelled as shown in Fig. 2. The emotion value of a word to be calibrated can be calibrated from the VA values of the most similar words among its adjacent seed words. The weight (i.e. the similarity) of the edge between a word to be calibrated and an adjacent seed word is computed with the word2vec tool provided by Google: the cosine of the angle between the word vectors of the two words is taken as their similarity. The graph model is expressed mathematically as follows. In the undirected graph G = (V, E), V is the set of words, i.e. the nodes of the graph, and E is the set of undirected edges between nodes. Each edge e in E represents the similarity between a word to be calibrated v_i and a seed word v_j (v_i, v_j ∈ V; 1 ≤ i, j ≤ n; i ≠ j). For each word to be calibrated v_i, its set of adjacent seed words is N(v_i) = {v_j | (v_i, v_j) ∈ E}. The valence and the arousal of the word to be calibrated v_i are denoted val_{v_i} and aro_{v_i} respectively. The valence val_{v_i} is computed by formula (1):
$$\mathrm{val}_{v_i} = (1-\alpha)\,\mathrm{val}_{v_i} + \alpha\,\frac{\sum_{v_j \in N(v_i)} \mathrm{Sim}(v_i, v_j)\,\mathrm{val}_{v_j}}{\sum_{v_j \in N(v_i)} \mathrm{Sim}(v_i, v_j)} \tag{1}$$
The arousal aro_{v_i} of the word to be calibrated v_i is computed by formula (2):
$$\mathrm{aro}_{v_i} = (1-\alpha)\,\mathrm{aro}_{v_i} + \alpha\,\frac{\sum_{v_j \in N(v_i)} \mathrm{Sim}(v_i, v_j)\,\mathrm{aro}_{v_j}}{\sum_{v_j \in N(v_i)} \mathrm{Sim}(v_i, v_j)} \tag{2}$$
where val_{v_j} and aro_{v_j} are the valence and the arousal of the seed word v_j, and Sim(v_i, v_j) is the similarity between the word to be calibrated v_i and the seed word v_j. α is the decay factor (or confidence), with a value between 0 and 1; it limits the effect of the rank sink problem of the PageRank algorithm, which can prevent the result from converging to a unique value. Initially, the VA values of the word to be calibrated v_i are random numbers between 1 and 9; they are then updated iteratively by formulas (3) and (4) until convergence:
$$\mathrm{val}_{v_i}^{t} = (1-\alpha)\,\mathrm{val}_{v_i}^{t-1} + \alpha\,\frac{\sum_{v_j \in N(v_i)} \mathrm{Sim}(v_i, v_j)\,\mathrm{val}_{v_j}}{\sum_{v_j \in N(v_i)} \mathrm{Sim}(v_i, v_j)} \tag{3}$$
$$\mathrm{aro}_{v_i}^{t} = (1-\alpha)\,\mathrm{aro}_{v_i}^{t-1} + \alpha\,\frac{\sum_{v_j \in N(v_i)} \mathrm{Sim}(v_i, v_j)\,\mathrm{aro}_{v_j}}{\sum_{v_j \in N(v_i)} \mathrm{Sim}(v_i, v_j)} \tag{4}$$
where t is the iteration step; val_{v_i}^{t} and aro_{v_i}^{t} are the valence and the arousal of the word to be calibrated v_i at iteration step t, and likewise val_{v_i}^{t-1} and aro_{v_i}^{t-1} are its valence and arousal at iteration step t-1.
Note that the VA values of the seed words are constants at every iteration step and do not change. On this basis, the VA values of each word to be calibrated are obtained by propagating values through the graph in Fig. 2 over repeated iterations until convergence.
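The iterative update described above can be sketched as follows. The sketch assumes the update has the form "new value = (1 − α) × old value + α × similarity-weighted average of the seed values", starts every word to be calibrated at a random value between 1 and 9, and keeps the seed values fixed. The function name `propagate`, the fixed step count, and the toy numbers are assumptions made for illustration.

```python
import random


def propagate(graph, seed_values, alpha=0.5, steps=50, rng=None):
    """Iteratively estimate one emotion dimension (valence or arousal).

    graph: {word_to_calibrate: {seed_word: similarity}}
    seed_values: {seed_word: manually labelled value}, never modified here
    """
    rng = rng or random.Random(0)
    # Words to be calibrated start from a random value between 1 and 9.
    values = {word: rng.uniform(1.0, 9.0) for word in graph}
    for _ in range(steps):
        for word, neighbours in graph.items():
            total_sim = sum(neighbours.values())
            weighted = sum(sim * seed_values[s] for s, sim in neighbours.items())
            values[word] = (1 - alpha) * values[word] + alpha * weighted / total_sim
    return values


# Toy example: one word to be calibrated with two seed neighbours.
graph = {"glad": {"happy": 0.8, "sad": 0.6}}
seed_valence = {"happy": 8.0, "sad": 2.0}
predicted = propagate(graph, seed_valence, alpha=0.5, steps=50)
```

In this toy case the estimate settles at the similarity-weighted average of the two seed values, (0.8 × 8 + 0.6 × 2) / 1.4, whatever the random starting point.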
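To make the iteration more efficient, formulas (3) and (4) can be recast as matrix operations. The valences and the arousals of all words to be calibrated and seed words are merged into the following two vectors:
$$V = (\mathrm{val}_{v_1}, \mathrm{val}_{v_2}, \ldots, \mathrm{val}_{v_N})^T, \qquad A = (\mathrm{aro}_{v_1}, \mathrm{aro}_{v_2}, \ldots, \mathrm{aro}_{v_N})^T$$
The similarities between all words to be calibrated and seed words form the adjacency matrix:
$$S = \begin{pmatrix} 0 & \mathrm{Sim}(v_1, v_2) & \cdots & \mathrm{Sim}(v_1, v_N) \\ \mathrm{Sim}(v_2, v_1) & 0 & \cdots & \mathrm{Sim}(v_2, v_N) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Sim}(v_N, v_1) & \mathrm{Sim}(v_N, v_2) & \cdots & 0 \end{pmatrix}$$
where Sim(v_i, v_j) is the similarity between the word to be calibrated v_i and the seed word v_j, with i ≠ j, 1 ≤ i ≤ N, and 1 ≤ j ≤ N. Two further vectors are given, I = (1, 1, 1, ..., 1)^T and D = (d_1, d_2, ..., d_N)^T, where d_i = α if v_i is a word to be calibrated and d_i = 0 if v_i is a seed word, and α is the decay factor of formulas (3) and (4). Two operations are also defined: for vectors X = (x_1, x_2, ..., x_N)^T and Y = (y_1, y_2, ..., y_N)^T, the element-wise product M(X, Y) = (x_1 × y_1, x_2 × y_2, ..., x_N × y_N)^T and the element-wise quotient U(X, Y) = (x_1/y_1, x_2/y_2, ..., x_N/y_N)^T.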
Formula (5) gives the valence vector V_t and the arousal vector A_t of all words, seed words and words to be calibrated alike, at iteration step t:
$$V_t = M[(I-D)^T, V_{t-1}] + M[D^T, U(S V_{t-1}, S \times I)],$$
$$A_t = M[(I-D)^T, A_{t-1}] + M[D^T, U(S A_{t-1}, S \times I)] \tag{5}$$
where V_{t-1} is the valence vector and A_{t-1} is the arousal vector of all words, including the seed words v_j and the words to be calibrated v_i, at iteration step t-1. After the iteration converges, the valence val_{v_i} of the word to be calibrated v_i is read from the valence vector V_t by index, and its arousal aro_{v_i} from the arousal vector A_t by index. That is, val_{v_i} is the i-th component of V_t and aro_{v_i} is the i-th component of A_t. For example, the valence of the word to be calibrated v_2 is the second component of V_t, and the arousal of the word to be calibrated v_3 is the third component of A_t.
With formula (5), the VA value prediction can be iterated to convergence more efficiently.
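One step of the matrix form can be sketched with plain Python lists, using the element-wise operations M and U exactly as defined in the text. The sketch rests on one labelled assumption: the entries of D are taken to be α for words to be calibrated and 0 for seed words, which is what keeps the seed values constant. The helper names `mat_vec` and `step` and the toy matrix are hypothetical.

```python
def mat_vec(S, x):
    """Matrix-vector product S x for a matrix stored as a list of rows."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]


def M(x, y):
    """Element-wise product of two vectors."""
    return [a * b for a, b in zip(x, y)]


def U(x, y):
    """Element-wise quotient of two vectors."""
    return [a / b for a, b in zip(x, y)]


def step(S, D, V):
    """One iteration: V_t = M(I - D, V_{t-1}) + M(D, U(S V_{t-1}, S I))."""
    ones = [1.0] * len(V)
    keep = [1.0 - d for d in D]
    neighbour_avg = U(mat_vec(S, V), mat_vec(S, ones))
    return [a + b for a, b in zip(M(keep, V), M(D, neighbour_avg))]


# Word order: happy (seed), sad (seed), glad (to be calibrated).
S = [
    [0.0, 0.0, 0.8],
    [0.0, 0.0, 0.6],
    [0.8, 0.6, 0.0],
]
alpha = 0.5
D = [0.0, 0.0, alpha]  # assumption: 0 for seed words, alpha otherwise
V = [8.0, 2.0, 5.0]    # seed valences, then an arbitrary starting value
for _ in range(50):
    V = step(S, D, V)
```

With NumPy available, `mat_vec`, `M`, and `U` would reduce to `S @ V`, `x * y`, and `x / y`; the list version is used here only to keep the sketch free of dependencies.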
The calibration results of the invention were evaluated on the English sentiment dictionary ANEW and the Chinese sentiment dictionary CAW, using cross-validation. Specifically, the data set is divided equally into several parts, 5 in the experiments. In each run, 4 of the parts serve as seed words and the remaining part as words to be calibrated. The experiment is repeated 5 times in this way, and the final result is the average over the runs.
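The cross-validation split can be sketched as below. This is a minimal illustration: the function name `five_fold_splits` is hypothetical, the words are assigned to parts by striding through the list, and whether the words were shuffled before splitting is not stated in the text.

```python
def five_fold_splits(words, k=5):
    """Yield (seed_words, words_to_calibrate) pairs, one per fold."""
    folds = [words[i::k] for i in range(k)]  # k parts of near-equal size
    for held_out in range(k):
        unseen = folds[held_out]
        seeds = [w for i, fold in enumerate(folds) if i != held_out for w in fold]
        yield seeds, unseen


words = [f"w{i}" for i in range(10)]
splits = list(five_fold_splits(words))
```

Each run would then predict VA values for the held-out part from the other four parts, and the per-run metrics would be averaged.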
Two groups of comparison methods were also set up. The first group predicts VA values by regression and comprises a linear regression algorithm and a kernel regression algorithm. For both methods in this group, the similarities between seed words and the VA values of the seed words are used as training data, and the remaining words are used as test data. The second group comprises two graph-based methods: the PageRank algorithm and the weighted graph algorithm proposed by the invention.
Evaluation metrics:
The performance of the different methods is measured mainly by the following three metrics:
Root mean square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(A_i - P_i)^2}$$
Mean absolute error (MAE):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|A_i - P_i\right|$$
Mean absolute percentage error (MAPE):
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{A_i - P_i}{A_i}\right| \times 100\%$$
where A_i is the actual VA value of a word to be calibrated, P_i is its predicted VA value, and n is the number of words to be calibrated. If a method scores low on all three metrics, its predictions are close to the actual values and the method performs well.
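The three metrics can be computed as in the following sketch, which uses their standard definitions. The function names and the two-element toy lists are illustrative only.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


actual = [2.0, 4.0]
predicted = [1.0, 5.0]
scores = (rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

Because VA ratings in the text range from 1 to 9, the division by the actual value in MAPE is well defined.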
Experimental results and analysis:
Both the PageRank algorithm and the weighted graph algorithm require iteration. The two algorithms were given the same numbers of iteration steps for comparison, and the effect of different numbers of steps on the current data set was explored with RMSE as the reference. The iteration behaviour of the two algorithms is shown in Fig. 3.
As Fig. 3 shows, both algorithms essentially stabilize at around the 10th iteration. The weighted graph algorithm of the invention gives a lower RMSE for both valence and arousal, so its predictions are closer to the actual values. The final converged value of each word is independent of the initial value and of the decay factor.
Table 1 gives the results on each metric for linear regression, kernel regression, the PageRank algorithm, and the weighted graph algorithm of the invention. The number of iterations for both the PageRank algorithm and the weighted graph model was set to 50.
Table 1. Results on each metric for the four algorithms
As Table 1 shows, the PageRank algorithm and the weighted graph algorithm of the invention are generally better than the linear regression and kernel regression algorithms. On MAPE (mean absolute percentage error), the weighted graph algorithm outperforms the other three algorithms: on average its MAPE is about 4% lower than that of the PageRank algorithm, about 8% lower than that of kernel regression, and 7% lower than that of linear regression. The weighted graph algorithm performs better than the other algorithms because it considers both the relations between multiple nodes and the weights between them. The results also show that predicting arousal is harder than predicting valence.
Linear regression algorithm:
When the VA values of words to be calibrated are predicted automatically or semi-automatically, a small number of emotion words are usually labelled manually as seed words, and a method that computes the semantic similarity between words is then used to mark automatically, from a large amount of text, the words to be calibrated that are similar to the seed words.
Linear regression can therefore be used to establish a "similarity to VA value" correspondence for words, and the similarity between a word to be marked and a seed word is then used to predict the VA values of the word to be marked. Because every seed word has labelled VA values, it is only necessary to compute the similarity between each seed word and the other seed words in order to build a "similarity to valence (or arousal)" regression model for each seed word. The valence regression model of the seed word seed_j is:
$$\mathrm{val}_{seed_i} = a + b \times \mathrm{Sim}(seed_i, seed_j)$$
where b is the regression coefficient and a is the intercept, Sim(seed_i, seed_j) is the semantic similarity between the seed words seed_i and seed_j, and val_{seed_i} is the valence of seed_i. Thus, substituting the similarities between seed_j and the other seed words together with the valence (or arousal) values of those seed words trains the valence (or arousal) regression model of seed_j.
To predict the VA values of words to be marked that have passed verification with this regression model, it is only necessary to input the similarity between these words and seed_j. For example, if the similarity between a word to be marked unseen_i and seed_j exceeds the threshold and the word passes verification, unseen_i is selected as an emotion word, and its similarity to seed_j can be input into the regression model of seed_j to predict the VA values of unseen_i, as in the following formula:
$$\mathrm{val}_{unseen_i} = a + b \times \mathrm{Sim}(unseen_i, seed_j)$$
If the similarity between a word to be marked and the seed words exceeds the threshold for more than one seed word, the regression model trained for the seed word with the highest similarity is used.
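The per-seed regression baseline can be sketched with an ordinary least-squares fit of valence against similarity. The helper names `fit_line` and `predict` and the toy numbers are hypothetical; the text does not say which fitting procedure was used, so least squares is an assumption.

```python
def fit_line(similarities, values):
    """Least-squares fit of value = a + b * similarity; returns (a, b)."""
    n = len(similarities)
    mean_x = sum(similarities) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in similarities)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(similarities, values))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, similarity):
    """Predicted valence of a word with the given similarity to seed_j."""
    return a + b * similarity


# Similarities of three other seed words to seed_j, and their valences (toy data).
sims_to_seed_j = [0.2, 0.5, 0.8]
valences = [3.0, 4.5, 6.0]
a, b = fit_line(sims_to_seed_j, valences)
estimate = predict(a, b, 0.6)  # word to be marked with similarity 0.6 to seed_j
```

The same fit would be repeated with arousal values to obtain the arousal model of seed_j.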
Kernel regression algorithm:
On this basis, another method introduces a kernel function into the conventional linear regression model in order to apply a linear or nonlinear transformation to the similarity before it enters the regression.
Here f denotes the kernel function, which can be a linear function or a nonlinear function such as the square root or the logarithm. With this model, the VA values of a word to be marked, val_{unseen_i}, can be predicted from the VA values val_{seed_j} of all seed words similar to it and the semantic similarities Sim(unseen_i, seed_j) between them. The experimental results show that the best prediction is obtained when the kernel function is linear.
PageRank algorithm:
Some publications also propose predicting the VA values of words with the PageRank algorithm. Such methods usually describe the vocabulary and the similarities as a graph model, with seed words and words to be marked as nodes and the relations between words as edges, and solve it with the PageRank algorithm. In this method, the VA values of each node (word) to be marked are updated iteratively according to formula (12).
In formula (12), w_i and w_j denote a word to be marked and a seed word respectively, and Nei(w_i) is the set of all neighbouring nodes (words) connected to node (word) w_i by an edge. val_{w_i}^{t} is the valence predicted for the word to be marked at the t-th iteration, e is a constant, and α is the decay coefficient of the PageRank algorithm, usually taken in (0, 1). Based on formula (12), the VA values of the candidate words can be predicted from the VA values of the seed words through PageRank iteration.
The foregoing describes only preferred embodiments of the invention and is not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention falls within the scope of protection of the invention.

Claims (4)

  1. A method for automatically calibrating word emotion values, characterized by comprising the following steps:
    Step 1: train word vectors for all words in the dictionary; initialize a small number of words by manual labeling and denote them seed words v_j; the remaining words are words to be calibrated v_i; each initialized seed word v_j has a mood value val_{v_j} and an excitation value aro_{v_j};
    Step 2: use the word2vec tool to compute the cosine of the angle between the word vector of each seed word v_j and the word vector of each word to be calibrated v_i, thereby obtaining the similarity between each seed word v_j and each word to be calibrated v_i;
    Step 3: build a weighted graph model with the seed words v_j and the words to be calibrated v_i as nodes and the similarity between a seed word v_j and a word to be calibrated v_i as the weight of the connecting edge;
    Step 4: predict the mood value val_{v_i} and the excitation value aro_{v_i} of each word to be calibrated v_i.
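As a non-limiting illustration, steps 2 and 3 may be sketched in Python as follows. The sketch assumes that the word vectors are already available as plain lists of floats. The function names cosine_similarity and build_weighted_graph and the toy two-dimensional vectors are hypothetical; they are not part of word2vec or of the original disclosure.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def build_weighted_graph(seed_vectors, unseen_vectors):
    """Return edge weights as {word_to_calibrate: {seed_word: similarity}}."""
    return {
        w_i: {
            w_j: cosine_similarity(vec_i, vec_j)
            for w_j, vec_j in seed_vectors.items()
        }
        for w_i, vec_i in unseen_vectors.items()
    }


# Toy vectors made up for illustration; they are not trained embeddings.
seed_vectors = {"happy": [1.0, 0.0], "sad": [0.0, 1.0]}
unseen_vectors = {"glad": [3.0, 4.0]}
graph = build_weighted_graph(seed_vectors, unseen_vectors)
```

In this toy case the edge from "glad" to "happy" has weight 3/5 and the edge from "glad" to "sad" has weight 4/5. With real embeddings the similarities may be negative, which a practical implementation would need to handle before using them as edge weights.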
  2. The method for automatically calibrating word emotion values according to claim 1, characterized in that, in step 4, the mood value val_{v_i} of the word to be calibrated v_i is predicted by iteratively updating it with formula (3) until convergence:
    where α is the decay factor or confidence level, with a value between 0 and 1; the random number takes a value between 1 and 9; Sim(v_i, v_j) denotes the similarity between the word to be calibrated v_i and the seed word v_j; val_{v_j} denotes the mood value of the initialized seed word; t denotes the iteration step; and the remaining two terms denote the mood value of the word to be calibrated at iteration step t and at iteration step t-1, respectively.
  3. The method for automatically calibrating word emotion values according to claim 1, characterized in that, in step 4, the excitation value aro_{v_i} of the word to be calibrated is predicted by iteratively updating it with formula (4) until convergence:
    where α is the decay factor or confidence level, with a value between 0 and 1; the random number takes a value between 1 and 9; Sim(v_i, v_j) denotes the similarity between the word to be calibrated v_i and the seed word v_j; aro_{v_j} denotes the excitation value of the initialized seed word; t denotes the iteration step; and the remaining two terms denote the excitation value of the word to be calibrated at iteration step t and at iteration step t-1, respectively.
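Formulas (3) and (4) are not reproduced in this text, so the following Python sketch rests on stated assumptions rather than on the formulas themselves: the per-word update is taken to be the element-wise form implied by the matrix formula (5) of claim 4, the neighbors of a word to be calibrated are taken to be seed words whose values stay fixed, and the random number in the range 1-9 is taken to be the initial value of the word to be calibrated. The function name propagate_value and its parameters are hypothetical. The same routine serves for mood values and excitation values.

```python
import random


def propagate_value(similarities, seed_values, alpha=0.5,
                    tol=1e-6, max_iter=1000, rng=None):
    """Iterate x_t = (1 - alpha) * x_{t-1} + alpha * weighted_seed_average.

    similarities: {seed_word: Sim(v_i, v_j)} for one word to be calibrated.
    seed_values:  {seed_word: manually labeled value}, kept fixed (assumption).
    """
    rng = rng or random.Random(0)
    total_sim = sum(similarities.values())
    if total_sim == 0:
        raise ValueError("word has no usable similarity to any seed word")
    # Seed values never change, so the similarity-weighted average is constant.
    seed_average = sum(
        sim * seed_values[w] for w, sim in similarities.items()
    ) / total_sim
    value = rng.uniform(1.0, 9.0)  # assumed random initial value in 1-9
    for _ in range(max_iter):
        new_value = (1 - alpha) * value + alpha * seed_average
        if abs(new_value - value) < tol:
            return new_value
        value = new_value
    return value


# Toy similarities and seed mood values, made up for illustration.
predicted = propagate_value({"happy": 0.6, "sad": 0.8},
                            {"happy": 8.0, "sad": 2.0})
```

Under these assumptions the iteration is a contraction with factor (1 - α), so the result approaches the similarity-weighted average of the seed values, here (0.6 × 8 + 0.8 × 2) / 1.4, regardless of the random starting point.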
  4. The method for automatically calibrating word emotion values according to claim 1, characterized in that, in step 4, the mood value val_{v_i} and the excitation value aro_{v_i} of the word to be calibrated v_i are predicted by matrix operations, specifically: the mood values of all words to be calibrated and seed words are represented by a vector V, and the excitation values of all words to be calibrated and seed words are represented by a vector A; the similarities between all words to be calibrated and seed words form an adjacency matrix S,
    \[
    S = \begin{bmatrix}
    \mathrm{Sim}(v_1, v_1) & \cdots & \mathrm{Sim}(v_1, v_j) & \cdots & \mathrm{Sim}(v_1, v_N) \\
    \vdots & & \vdots & & \vdots \\
    \mathrm{Sim}(v_i, v_1) & \cdots & \mathrm{Sim}(v_i, v_j) & \cdots & \mathrm{Sim}(v_i, v_N) \\
    \vdots & & \vdots & & \vdots \\
    \mathrm{Sim}(v_N, v_1) & \cdots & \mathrm{Sim}(v_N, v_j) & \cdots & \mathrm{Sim}(v_N, v_N)
    \end{bmatrix};
    \]
    where Sim(v_i, v_j) denotes the similarity between the word to be calibrated v_i and the seed word v_j, 1 ≤ i < N, 1 ≤ j < N;
    Let vector I = (1, 1, ..., 1)^T and vector D = (d_1, d_2, ..., d_N)^T, where α is the decay factor or confidence level, with a value between 0 and 1. Given vectors X = (x_1, x_2, ..., x_N)^T and Y = (y_1, y_2, ..., y_N)^T, define the operation M(X, Y) = (x_1 × y_1, x_2 × y_2, ..., x_N × y_N)^T and the operation U(X, Y) = (x_1/y_1, x_2/y_2, ..., x_N/y_N)^T;
    The mood value vector V_t and the excitation value vector A_t of all words, including seed words and words to be calibrated, at iteration step t are computed with formula (5):
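    \[
    V_t = M\left[(I - D)^T,\; V_{t-1}\right] + M\left[D^T,\; U(S V_{t-1},\; S \times I)\right],
    \]
    \[
    A_t = M\left[(I - D)^T,\; A_{t-1}\right] + M\left[D^T,\; U(S A_{t-1},\; S \times I)\right] \tag{5}
    \]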
    where V_{t-1} denotes the mood value vector of all words, including seed words v_j and words to be calibrated v_i, at iteration step t-1, and A_{t-1} denotes the excitation value vector of all words, including seed words v_j and words to be calibrated v_i, at iteration step t-1;
    After the iteration converges, the mood value val_{v_i} of the word to be calibrated v_i is the i-th component of the mood value vector V_t, and the excitation value aro_{v_i} of the word to be calibrated v_i is the i-th component of the excitation value vector A_t.
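The matrix iteration of formula (5) may be sketched in Python as follows, using plain lists. The claims do not state how the components of D relate to α, so the sketch assumes d_i = α for words to be calibrated and d_i = 0 for seed words, which keeps the manually labeled values fixed. The function names and the toy numbers are hypothetical, and the row sums of S are assumed to be nonzero.

```python
def hadamard(x, y):
    """M(X, Y): element-wise product."""
    return [a * b for a, b in zip(x, y)]


def elementwise_div(x, y):
    """U(X, Y): element-wise quotient (y is assumed to have no zeros)."""
    return [a / b for a, b in zip(x, y)]


def mat_vec(S, x):
    """Matrix-vector product S x for a list-of-rows matrix."""
    return [sum(s * v for s, v in zip(row, x)) for row in S]


def iterate_values(S, values, D, tol=1e-9, max_iter=10000):
    """Apply V_t = M[(I - D), V_{t-1}] + M[D, U(S V_{t-1}, S I)] to convergence."""
    ones = [1.0] * len(values)
    one_minus_d = [1.0 - d for d in D]
    row_sums = mat_vec(S, ones)  # S x I
    for _ in range(max_iter):
        neighbor_avg = elementwise_div(mat_vec(S, values), row_sums)
        new_values = [
            kept + pulled
            for kept, pulled in zip(hadamard(one_minus_d, values),
                                    hadamard(D, neighbor_avg))
        ]
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return new_values
        values = new_values
    return values


# Toy example: words 0 and 1 are seed words, word 2 is to be calibrated.
S = [[1.0, 0.0, 0.6],
     [0.0, 1.0, 0.8],
     [0.6, 0.8, 1.0]]
alpha = 0.5
D = [0.0, 0.0, alpha]        # assumption: seed words are never updated
initial = [8.0, 2.0, 5.0]    # seed mood values, then an initial guess
final = iterate_values(S, initial, D)
```

The same function can be applied to the excitation value vector A by passing excitation values instead of mood values. In the toy example the seed entries stay at 8.0 and 2.0, and the third entry settles at the fixed point of x = (0.6 × 8 + 0.8 × 2 + x) / 2.4.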
CN201711105704.5A 2017-11-10 2017-11-10 The method that automatic Calibration is carried out to word emotion value Pending CN107766331A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711105704.5A CN107766331A (en) 2017-11-10 2017-11-10 The method that automatic Calibration is carried out to word emotion value

Publications (1)

Publication Number Publication Date
CN107766331A true CN107766331A (en) 2018-03-06

Family

ID=61273802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711105704.5A Pending CN107766331A (en) 2017-11-10 2017-11-10 The method that automatic Calibration is carried out to word emotion value

Country Status (1)

Country Link
CN (1) CN107766331A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180306)