CN105117428A - Web comment sentiment analysis method based on word alignment model - Google Patents

Web comment sentiment analysis method based on word alignment model

Publication number
CN105117428A
Authority
CN
China
Prior art keywords
word
candidate
evaluation object
comment
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510471154.3A
Other languages
Chinese (zh)
Other versions
CN105117428B (en)
Inventor
程红蓉
唐明霜
蔡腾远
郭彦伟
张锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201510471154.3A
Publication of CN105117428A
Application granted
Publication of CN105117428B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The present invention belongs to the field of sentiment analysis in natural language processing and discloses a Web comment sentiment analysis method based on a word alignment model. The method comprises: obtaining comment information from web pages and preprocessing its content; obtaining candidate emotion words and candidate evaluation object words from the comments with an improved machine translation model; then using the emotional relationship between emotion words and evaluation object words, together with a characteristic index of each word, to extract the emotion words and evaluation objects from the candidate word lists; and finally determining the sentiment orientation of each emotion word toward its evaluation object with an effective multi-class regression model. The method was tested on comment data sets from several categories and obtained good experimental results.

Description

A web comment sentiment analysis method based on a word alignment model
Technical field
The present invention relates to the fields of the Internet, natural language processing and machine learning, and specifically to a method for implementing web comment sentiment analysis based on a word alignment model.
Background art
With the arrival of Web 2.0 and the rapid development of the mobile Internet, the amount of information on the Internet is growing explosively. According to the report "Information & Communication Technology in 2014" issued by the International Telecommunication Union (ITU), the number of Internet users worldwide would reach about 3 billion by the end of 2014. Most users have also changed from consumers of online information into producers of it, so both the volume of Web content and the traffic to it are increasing rapidly. Sentiment analysis analyzes and mines information on the Internet, such as news, blog posts, product reviews, e-mail and forum posts.
With the standardization and development of e-commerce, more and more users shop online. When shopping online, users cannot examine the product itself, so they tend to read product reviews before deciding whether to buy. Manufacturers and e-commerce companies that want to judge future sales trends from the reputation of a brand are no longer limited to questionnaires or follow-up telephone calls: they can obtain the market feedback they want directly from online product reviews. In addition, recommending products to users on the basis of sentiment analysis of product reviews is also a very widely used application. How to effectively mine deep sentiment information from massive numbers of product reviews has therefore become an urgent need in many industries, and sentiment analysis (also called opinion mining) of online comments has naturally become a current research hotspot.
The main tasks of sentiment analysis are orientation information extraction and orientation classification. Orientation information extraction extracts the elements related to sentiment orientation at the word, sentence or document level; in recent years finer-grained work has also addressed the extraction of evaluation objects (opinion targets, also called product features). Hu and Liu (Hu et al., 2004) observed that evaluation objects are usually the nouns or noun phrases that reviewers mention often, so they used association rule mining to extract nouns or noun phrases with a minimum support of 1% as frequent evaluation objects. They then extracted the adjectives in sentences containing an evaluation object as emotion words (opinion words), and finally used the extracted frequent evaluation objects and emotion words together to extract infrequent evaluation objects. Popescu and Etzioni (Popescu et al., 2005) improved on the method of Hu and Liu: they first defined a series of part-whole relation indicator words (meronymy discriminators) for each product category, and then computed the pointwise mutual information (PMI) between these discriminators and a noun to estimate how likely the noun is to be an evaluation object. Qiu et al. (Qiu et al., 2011) extracted evaluation objects with the Double Propagation algorithm, which is based on semantic relations. Liu et al. (Liu et al., 2012) were the first to apply word-based statistical machine translation to sentiment analysis, jointly extracting evaluation objects and emotion words.
Sentiment orientation classification covers different granularities, such as words, phrases, sentences and documents, and is in every case a text classification problem. It is mainly performed with supervised learning, where the most widely used classifiers include naive Bayes, support vector machines, maximum entropy models, k-nearest neighbors and conditional random fields; methods based on semi-supervised and unsupervised learning are also widely used. Although semi-supervised and unsupervised methods are simpler to implement than supervised ones, the semantic similarity between emotion words is difficult to compute and their final classification accuracy is lower than that of supervised learning, so most current sentiment orientation classification still uses supervised methods.
Summary of the invention
Against this background, the present invention proposes a sentiment analysis method for product reviews. Its purpose is to give users a reference for purchasing decisions and to give manufacturers and e-commerce companies effective feedback, so that they can improve their products or recommend products to users.
The core of the proposed method is the extraction of emotion words and evaluation object words. Traditional methods extract them either from the emotional relationship (opinion relation) between emotion words and evaluation objects alone, or from the characteristics of the words themselves alone. The present invention combines the emotional relationship with the characteristics of the words to extract emotion words and evaluation objects jointly, which yields a higher extraction accuracy. Finally, a supervised machine learning method classifies the emotion words, which determines the sentiment orientation of the emotion word corresponding to each evaluation object. The specific steps are as follows:
1. Data preprocessing
The data used by the present invention are product reviews captured from the network by a crawler. Such data are often written in non-standard formats. To reduce the effect on text analysis and sentiment classification, the data are first preprocessed, for example by removing empty lines, spaces, repeated punctuation and web page tags. The processed text is then segmented into words and tagged with parts of speech by a word segmentation tool, and finally the segmented text is split into sentences at punctuation marks (comma, full stop, exclamation mark).
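The cleaning and sentence-splitting part of this step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `clean_comment` and `split_sentences` and the regular expressions are assumptions, and word segmentation and part-of-speech tagging are omitted because they depend on an external tool.

```python
import re

# All patterns below are illustrative assumptions, not taken from the patent.
TAG_RE = re.compile(r"<[^>]+>")                      # web page tags such as <br/>
REPEAT_PUNCT_RE = re.compile(r"([,，.。!！?？~])\1+")  # runs of one repeated punctuation mark
SPLIT_RE = re.compile(r"[,，.。!！]+")                 # comma, full stop, exclamation mark


def clean_comment(raw):
    """Remove tags, repeated punctuation, spaces and empty lines from one comment."""
    text = TAG_RE.sub("", raw)
    text = REPEAT_PUNCT_RE.sub(r"\1", text)
    # Chinese text has no word spaces, so all whitespace can be dropped.
    return re.sub(r"\s+", "", text)


def split_sentences(text):
    """Split cleaned text into short sentences at the listed punctuation marks."""
    return [s for s in SPLIT_RE.split(text) if s]


# "The room is very cozy!!! <br/> Feels like coming home..."
raw = "房间很温馨！！！<br/>  感觉像回到家。。。"
print(split_sentences(clean_comment(raw)))  # ['房间很温馨', '感觉像回到家']
```

Splitting on the ASCII full stop would also break decimal numbers, so a production version would probably need a more careful pattern.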
2. Extracting candidate emotion words and evaluation objects
Candidate extraction rests on the following assumption: all nouns and noun phrases are candidate evaluation objects, and all adjectives and verbs are candidate emotion words. This assumption has been widely used in earlier sentiment analysis work and has proved effective. Under it, extracting emotion words and evaluation objects becomes the task of extracting (adjective/verb, noun/noun phrase) word pairs from the text. The present invention extracts these pairs by turning the word-based translation model (WTM, Word-Based Translation Model) into a monolingual word alignment model. The modification is as follows: a noun or noun phrase may align only to an adjective or verb, or to NULL; an adjective or verb may align only to a noun or noun phrase, or to NULL; and words of every other part of speech align to themselves. The text prepared in step 1 is copied to produce a parallel corpus, and the two identical corpora are the input to the model. For a sentence $S = \{\omega_1, \omega_2, \ldots, \omega_n\}$ of $n$ words in the corpus, the word alignment $A = \{(j, a_j) \mid j \in [1, n]\}$ is obtained from the following probability:
$$P(A \mid S) \propto \prod_{k=1}^{n} n(\phi_k \mid \omega_k) \prod_{j=1}^{n} t(\omega_j \mid \omega_{a_j})\, d(j \mid a_j, n) \tag{1}$$
Here $t(\omega_j \mid \omega_{a_j})$ models the co-occurrence of the noun/noun phrase (adjective/verb) at position $j$ and the adjective/verb (noun/noun phrase) at position $a_j$ in the same sentence: the more often an adjective/verb and a noun/noun phrase occur together in the corpus, the larger $t(\omega_j \mid \omega_{a_j})$ is. $d(j \mid a_j, n)$ models word position and is the probability that the word at position $a_j$ aligns to the word at position $j$. $n(\phi_k \mid \omega_k)$ models the fertility of the word alignment, which covers one-to-many alignments; $\phi_k$ is the number of words aligned to $\omega_k$. The word pairs are obtained by maximizing this probability.
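The part-of-speech constraint on alignments can be illustrated with a small sketch that lists, for each position in a tagged sentence, the positions it is allowed to align to. The tag names (`n`, `a`, `v`), the `NULL` marker and the function name `allowed_targets` are assumptions made for this example; the sketch only enumerates the permitted alignments and does not estimate the probabilities in formula (1).

```python
NOUN_TAGS = {"n"}          # assumed tag for nouns / noun phrases
OPINION_TAGS = {"a", "v"}  # assumed tags for adjectives / verbs
NULL = -1                  # assumed marker for "aligned to NULL"


def allowed_targets(tags, j):
    """Return the positions a_j that the word at position j may align to."""
    if tags[j] in NOUN_TAGS:
        partners = OPINION_TAGS
    elif tags[j] in OPINION_TAGS:
        partners = NOUN_TAGS
    else:
        return [j]  # other parts of speech align to themselves
    return [i for i, tag in enumerate(tags) if tag in partners] + [NULL]


# room/n very/d warm/a
tags = ["n", "d", "a"]
print([allowed_targets(tags, j) for j in range(len(tags))])  # [[2, -1], [1], [0, -1]]
```

Restricting the search space this way means the alignment model only has to score noun-to-opinion-word links, which is what makes a translation model usable on a single language.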
3. Obtaining accurate emotion words and evaluation objects
1) Computing the relation value between emotion words and evaluation objects
Once all word pairs have been obtained by the above method, the alignment probability between nouns/noun phrases and adjectives/verbs can be computed as follows:
$$P(\omega_o \mid \omega_t) = \frac{\mathrm{count}(\omega_o, \omega_t)}{\mathrm{count}(\omega_t)} \tag{2}$$
Here $\omega_o$ is an adjective/verb, $\omega_t$ is a noun/noun phrase, and $P(\omega_o \mid \omega_t)$ is the alignment probability from the noun/noun phrase to the adjective/verb; the alignment probability $P(\omega_t \mid \omega_o)$ from the adjective/verb to the noun/noun phrase is obtained in the same way. The alignment probabilities express the potential emotional relationship between an emotion word and an evaluation object, denoted Association:
$$\mathrm{Association}(\omega_o, \omega_t) = \bigl(\lambda \cdot P(\omega_t \mid \omega_o) + (1 - \lambda) \cdot P(\omega_o \mid \omega_t)\bigr)^{-1} \tag{3}$$
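Formulas (2) and (3) can be sketched in a few lines. The sketch assumes that the counts are taken over the extracted word pairs themselves, and it implements formula (3) as printed, including the exponent -1; the function names and the example pairs are hypothetical.

```python
from collections import Counter


def alignment_probs(pairs):
    """pairs: list of (emotion_word, evaluation_object) alignments.

    Returns two dicts keyed by (emotion_word, evaluation_object):
    P(o | t) as in formula (2), and P(t | o) computed the same way.
    Assumption: counts are taken over the aligned pairs only.
    """
    pair_count = Counter(pairs)
    o_count = Counter(o for o, _ in pairs)
    t_count = Counter(t for _, t in pairs)
    p_o_given_t = {(o, t): c / t_count[t] for (o, t), c in pair_count.items()}
    p_t_given_o = {(o, t): c / o_count[o] for (o, t), c in pair_count.items()}
    return p_o_given_t, p_t_given_o


def association(p_t_given_o, p_o_given_t, lam=0.5):
    """Formula (3) as printed: inverse of the weighted sum of both probabilities."""
    return 1.0 / (lam * p_t_given_o + (1 - lam) * p_o_given_t)


pairs = [("warm", "room"), ("warm", "room"), ("clean", "room"), ("warm", "bed")]
p_o_t, p_t_o = alignment_probs(pairs)
key = ("warm", "room")
print(p_o_t[key], p_t_o[key], association(p_t_o[key], p_o_t[key]))
```

The value of `lam` (0.5 here) is a placeholder; the text does not state which value is used.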
As described above, the present invention extracts emotion words and evaluation objects by combining the potential emotional relationship with a characteristic index of the word itself (denoted Indicator). Because emotion words and evaluation objects have different Indicators, different methods are used to compute them.
2) Computing the Indicator of evaluation objects
Nouns and noun phrases are the candidate evaluation objects. Such words are domain-specific, and a candidate that occurs frequently and is evenly distributed in the corpus is more likely to be an evaluation object. Based on these two points, the present invention uses m corpora unrelated to the domain of the comment corpus and combines document frequency with information entropy to compute the Indicator of an evaluation object.
Assume that each comment $d_j$ in the comment corpus $C = \{d_1, d_2, \ldots, d_n\}$ is independent. The more evenly a candidate evaluation object word $t_i$ is distributed over the corpus, the larger its information entropy, so the entropy reflects the distribution of the word well. The information entropy is computed as:
$$IE(t_i) = -\sum_{j=1}^{n} p(d_j, t_i) \log p(d_j, t_i) \tag{4}$$
Here $p(d_j, t_i)$ is the probability that candidate word $t_i$ occurs in comment $d_j$, computed as:
$$p(d_j, t_i) = \frac{tf_{ij}}{\sum_{j=1}^{n} tf_{ij}} \tag{5}$$
Here $tf_{ij}$ is the term frequency of candidate word $t_i$ in the $j$-th comment. If $t_i$ occurs in only one comment, then $p(d_j, t_i) = 1$ and $\log p(d_j, t_i) = 0$, so $IE(t_i) = 0$. Because $IE(t_i)$ must not be 0 in the subsequent calculation, a very small constant $\epsilon = 0.0001$ is added to the denominator of formula (5), which gives:
$$p(d_j, t_i) = \frac{tf_{ij}}{\sum_{j=1}^{n} tf_{ij} + \epsilon} \tag{6}$$
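The entropy of formula (4) with the smoothed probability of formula (6) can be sketched as below. The natural logarithm is an assumption, since the text does not state the base, and the function name is hypothetical.

```python
import math

EPS = 0.0001  # smoothing constant given in the text


def information_entropy(tf_per_comment, eps=EPS):
    """Entropy of one candidate word t_i over the comments.

    tf_per_comment holds tf_ij, the term frequency of t_i in each comment d_j.
    Assumption: natural logarithm; comments with tf_ij = 0 contribute nothing.
    """
    total = sum(tf_per_comment) + eps
    entropy = 0.0
    for tf in tf_per_comment:
        if tf > 0:
            p = tf / total
            entropy -= p * math.log(p)
    return entropy


print(information_entropy([1, 1, 1, 1]))  # evenly distributed: close to log(4)
print(information_entropy([4, 0, 0, 0]))  # one comment only: small but not zero
```

The second call shows the purpose of the constant: without it, a word that occurs in a single comment would get an entropy of exactly 0.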
Entropy favors high-frequency words, but high-frequency words may include common general nouns such as "people" and "thing", and the low-frequency words that are passed over may include evaluation objects. To compensate, the present invention uses m corpora that are unrelated to the domain of the comment corpus but of the same size, together with document frequency. $Ds(t_i)_j$ denotes the distribution information of candidate word $t_i$ with respect to the $j$-th domain-unrelated corpus:
$$Ds(t_i)_j = \begin{cases} \alpha \times \log(1 + df_{in}) & \text{if } df_{out\_j} = 0 \\ \dfrac{\log(1 + df_{in})}{\log(1 + df_{out\_j})} & \text{otherwise} \end{cases} \tag{7}$$
Here $df_{in}$ is the document frequency of candidate word $t_i$ in the comment corpus, and $df_{out\_j}$ is the document frequency of $t_i$ in the $j$-th domain-unrelated corpus. When $df_{out\_j} = 0$, $t_i$ is very likely a domain-specific evaluation object word, so the parameter $\alpha$ is set greater than 1 to increase the influence of document frequency on $Ds(t_i)_j$.
Finally, the Indicator of an evaluation object is:
$$I(t_i) = \overline{Ds(t_i)} \times IE(t_i) \tag{8}$$
where $\overline{Ds(t_i)} = \frac{1}{m} \sum_{j=1}^{m} Ds(t_i)_j$.
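Formulas (7) and (8) translate directly into code. In this sketch the value `alpha=2.0` is a placeholder (the text only requires a value greater than 1), the natural logarithm is assumed, and the function names are hypothetical; the entropy argument is the value of $IE(t_i)$ computed separately.

```python
import math


def domain_score(df_in, df_out, alpha=2.0):
    """Ds(t_i)_j from formula (7) for one domain-unrelated corpus.

    df_in:  document frequency of the word in the comment corpus
    df_out: document frequency of the word in the j-th unrelated corpus
    alpha:  placeholder value; the text only says it is greater than 1
    """
    base = math.log(1 + df_in)
    if df_out == 0:
        return alpha * base
    return base / math.log(1 + df_out)


def target_indicator(df_in, df_out_list, entropy, alpha=2.0):
    """I(t_i) from formula (8): mean of Ds over the m corpora, times IE(t_i)."""
    mean_ds = sum(domain_score(df_in, d, alpha) for d in df_out_list) / len(df_out_list)
    return mean_ds * entropy


# A word seen in 9 comments, absent from one unrelated corpus, seen 9 times in another.
print(target_indicator(9, [0, 9], entropy=1.0))
```

A word that never appears outside the comment domain is boosted by `alpha`, while a word that is equally common elsewhere gets a score near 1.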
3) Computing the Indicator of emotion words
The candidate emotion words are adjectives and verbs. Most of them are not tied to a domain, for example "good", "dislike" and "like". A small number are domain-specific, for example "delicious" in food and drink comments and "exciting" in film comments. The present invention combines document frequency with the distribution of the word to compute the Indicator of a candidate emotion word:
$$I(o_i) = \log(1 + df_i) \times D_i \tag{9}$$
Here $df_i$ is the document frequency of candidate word $o_i$ in the comment corpus and $D_i$ represents the distribution of the candidate word; $tf_{ij}$ is the term frequency of $o_i$ in the $j$-th comment, and the average term frequency of $o_i$ is taken over all comments.
Steps 1) to 3) yield the emotional relationship value Association between emotion words and evaluation objects and the Indicator of each word. Association and Indicator are combined into a single parameter for screening candidate words, called the energy value (Energy) of a candidate word. Candidate words whose energy value exceeds a threshold are selected as the final emotion words (or evaluation object words). The present invention models this as a bipartite graph and computes the energy values of emotion words and evaluation objects with a random walk with restart (Random Walk with Restart) algorithm:
$$\begin{aligned} E(t) &= \lambda \times R \times E(o) + (1 - \lambda) \times I_t \\ E(o) &= \lambda \times R \times E(t) + (1 - \lambda) \times I_o \end{aligned} \tag{10}$$
Here $E(t)$ and $E(o)$ are the energy values of the evaluation object words and the emotion words respectively, and $R$ is the relation matrix, where $R_{ij}$ is the Association weight between the $i$-th candidate evaluation object word and the $j$-th candidate emotion word. $I_t$ is the vector of Indicators of the candidate evaluation objects, each element computed by formula (8). $I_o$ is the vector of Indicators of the candidate emotion words, each element computed by formula (9). $\lambda \in [0, 1]$ is a balancing parameter.
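The iteration in formula (10) can be sketched in plain Python. Several points are assumptions rather than statements from the text: the second equation uses the transpose of R so that the dimensions agree when the two candidate lists differ in length, R is assumed to be scaled so that the iteration converges, and the values of `lam`, `tol` and the threshold are placeholders.

```python
def random_walk_energy(R, I_t, I_o, lam=0.3, tol=1e-9, max_iter=1000):
    """Iterate formula (10) until the energy values stop changing.

    R[i][j]: weight between evaluation object i and emotion word j
             (assumed scaled so that the iteration converges)
    I_t, I_o: Indicator vectors of evaluation objects and emotion words
    """
    n_t, n_o = len(I_t), len(I_o)
    E_t, E_o = list(I_t), list(I_o)
    for _ in range(max_iter):
        new_t = [lam * sum(R[i][j] * E_o[j] for j in range(n_o)) + (1 - lam) * I_t[i]
                 for i in range(n_t)]
        # Assumption: the update for E(o) uses the transpose of R.
        new_o = [lam * sum(R[i][j] * E_t[i] for i in range(n_t)) + (1 - lam) * I_o[j]
                 for j in range(n_o)]
        delta = max(abs(a - b) for a, b in zip(new_t + new_o, E_t + E_o))
        E_t, E_o = new_t, new_o
        if delta < tol:
            break
    return E_t, E_o


def select_words(words, energies, threshold):
    """Keep the candidate words whose energy value exceeds the threshold."""
    return [w for w, e in zip(words, energies) if e > threshold]


R = [[0.5, 0.5], [0.2, 0.8]]
E_t, E_o = random_walk_energy(R, I_t=[1.0, 0.5], I_o=[0.2, 0.9])
print(select_words(["room", "thing"], E_t, threshold=0.8))
```

The restart term `(1 - lam) * I` keeps each word anchored to its own Indicator, while the first term lets strongly associated partners raise its energy.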
4. Sentiment polarity classification of emotion words
The method finally determines the sentiment orientation of the emotion word corresponding to each evaluation object. The present invention classifies emotion words with Softmax, an effective multi-class regression model. The sentiment orientation of an emotion word falls into one of three classes (positive, neutral, negative), represented in the regression model by 3, 2 and 1 respectively. For the given data, a word vector model first converts each emotion word into a feature vector suitable for Softmax. The input to the Softmax regression model is $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)})\}$, where $y^{(i)} \in \{1, 2, 3\}$ is the class and the input feature $x^{(i)} \in \mathbb{R}^{n+1}$, that is, the feature vector $x$ has dimension $n+1$. The model parameters are trained with 5-fold cross-validation, and the class with the highest output probability is the predicted class. As an example, consider the hotel comment "The hotel room is very warm; it feels just like coming home". The extracted emotion word and evaluation object pair is (room, warm). Softmax assigns "warm" to class 3, so the evaluation of "room" is positive, written (room, +).
Brief description of the drawings
Fig. 1 is the overall framework of the web comment sentiment analysis method based on a word alignment model according to the present invention;
Fig. 2 is the processing flow of the web comment sentiment analysis method based on a word alignment model according to the present invention;
Fig. 3 is the model used to extract emotion words and evaluation objects in the web comment sentiment analysis method based on a word alignment model according to the present invention;
Fig. 4 is the flow chart for determining the sentiment class of emotion words in the web comment sentiment analysis method based on a word alignment model according to the present invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below with reference to the accompanying drawings and specific examples. The embodiments described with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it.
The present invention is a web comment sentiment analysis method based on a word alignment model, which mainly performs sentiment analysis on product reviews on the Internet. As shown in Fig. 1 and Fig. 2, it comprises the following steps:
S1. Obtain review information from the Web. The data in this embodiment are camera reviews, book reviews, hotel reviews and food and drink reviews captured by a crawler from JD.com, Dangdang.com, ctrip.com and Dianping respectively. The size of each data set is shown in Table 1.
Table 1. Comment data sets
Domain            Comments   Comment sentences
Camera            17052      63574
Book              9473       21630
Hotel             2331       7365
Food and drink    35519      346832
S2. Preprocess the data
Data captured from the Web are usually non-standard, so web page tags, repeated punctuation marks and similar noise are removed first. The text is then segmented into words with NLPIR, the word segmentation system of the Chinese Academy of Sciences, which yields corpus C. Each comment is then split into sentences at commas, full stops and exclamation marks, which yields corpus C1.
S3. Obtain candidate emotion words and evaluation objects from the review information, as follows:
1) Copy the text corpus C1 obtained in step S2 to produce a parallel corpus C2.
2) Modify the word-based machine translation model as follows: a noun or noun phrase may align only to an adjective or verb, or to NULL; an adjective or verb may align only to a noun or noun phrase, or to NULL; words of every other part of speech align to themselves.
3) Input the data sets C1 and C2 from 1) into the modified model from 2), which finally yields (noun/noun phrase, adjective/verb/NULL) or (adjective/verb, noun/noun phrase/NULL) word pairs.
S4. Extract accurate word pairs from the candidate word pairs
1) With all word pairs in the comment corpus obtained in S3, compute the emotional relationship Association of each word pair according to formula (3).
2) Because candidate evaluation object words are domain-specific, this embodiment uses 5 corpora D1, D2, D3, D4 and D5 that are unrelated to the domain of the comment corpus and of the same size as C1, and then combines information entropy with document frequency to compute the Indicator of each candidate evaluation object according to formula (8).
3) Most emotion words are not domain-specific, for example "good", "like" and "ugly"; only a few are, for example "delicious" in food and drink comments. On this basis, the embodiment combines the document frequency of a word with its distribution to obtain the Indicator of each candidate emotion word according to formula (9).
4) Steps 1), 2) and 3) yield the emotional relationship value Association of the candidate words and their respective Indicator values. To model these two factors, this embodiment constructs a bipartite graph $G(V, E, R)$, shown in Fig. 3, where $v_t \in V$ and $v_o \in V$; $v_t$ is a candidate evaluation object word and $v_o$ is a candidate emotion word. $E$ is the set of edges between vertices: there is an edge between $v_t$ and $v_o$ when they have an emotional relationship. $R$ is the set of edge weights, each element being the Association computed in 1). The energy value (Energy) of each candidate word is then computed iteratively with the random walk with restart (RWR) algorithm according to formula (10), and the candidate words whose energy value exceeds a threshold are selected as the final emotion words and evaluation object words.
Evaluation: the embodiment uses precision, recall and F1 as evaluation metrics. After manual verification, the precision, recall and F1 on the four data sets are as follows:
Table 2. Experimental results
Data set                  Precision   Recall   F1
Book reviews              0.64        0.92     0.75
Camera reviews            0.60        0.76     0.67
Hotel reviews             0.70        0.87     0.77
Food and drink reviews    0.63        0.85     0.72
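For reference, F1 is the harmonic mean of precision and recall. The short sketch below uses a hypothetical function name and recomputes F1 from the rounded precision and recall of two rows; values recomputed this way can differ in the last digit from figures that were calculated before rounding.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.64, 0.92), 2))  # book reviews row: 0.75
print(round(f1_score(0.60, 0.76), 2))  # camera reviews row: 0.67
```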
S5. Sentiment orientation
The embodiment uses Softmax, an effective multi-class regression model, to determine the sentiment polarity of emotion words. The polarity of an emotion word is positive, neutral or negative, represented by the numbers 3, 2 and 1 respectively. Before the Softmax model is trained, the sentiment class of each emotion word is annotated manually. In this embodiment, three annotators first label the emotion words independently; where their labels differ, the three discuss the case and agree on the final label. A word vector model then converts each emotion word into an n-dimensional feature vector; the word vector model in this embodiment is word2vec. Finally, the predicted class of each emotion word is obtained. The experimental flow is shown in Fig. 4.
The embodiments above are described by way of example so that researchers in the art can understand the present invention, but the invention is not limited to the scope of the specific embodiments. For a person of ordinary skill in the art, modifications, changes, substitutions and variations made to these embodiments without departing from the principle and spirit of the invention all fall within the scope protected by the claims.

Claims (3)

1. A web comment sentiment analysis method based on a word alignment model, comprising the following steps:
Step 1: remove repeated punctuation marks, web page tags and similar noise from comment data captured from the Internet, then perform word segmentation and part-of-speech tagging on the data, and split the tagged data into short sentences at commas, full stops and exclamation marks.
Step 2: modify the word-based machine translation model, so that this bilingual translation model is applied as a monolingual word alignment model to extract candidate emotion word and evaluation object word pairs.
Step 3: combine the emotional relationship between emotion words and evaluation object words with the characteristics of the words themselves to extract accurate emotion words and evaluation objects from the candidate word pairs.
Step 4: determine the sentiment polarity of the emotion words with an effective multi-class regression model.
2. The method according to claim 1, characterized in that, in step 2, the corpus split into sentences in step 1 is copied to generate another parallel corpus, and the two identical corpora are used as the input corpora of the monolingual alignment model.
3. The method according to claim 1 or 2, characterized in that step 3 further comprises the following steps:
3.1 Compute the emotional relationship value between emotion words and evaluation objects from all the word pairs obtained.
3.2 Compute the word characteristic value Indicator of candidate evaluation object words
Because candidate evaluation object words are domain-specific, the present invention uses m corpora that are unrelated to the domain of the comment corpus but of the same size as the original corpus, and then combines information entropy with document frequency to compute the Indicator of each candidate evaluation object.
If a word in the candidate evaluation object list has a high term frequency and is evenly distributed over the corpus, it is more likely to be an evaluation object. In the comment corpus C, each comment is treated as an independent category, and C contains n comments, $C = \{d_1, d_2, \ldots, d_n\}$. The more evenly a candidate evaluation object word $t_i$ is distributed over the corpus, the larger its information entropy. Expressed as a formula:
$$IE(t_i) = -\sum_{j=1}^{n} p(d_j, t_i) \log p(d_j, t_i) \tag{1}$$
In the above formula, $p(d_j, t_i)$ is the probability that candidate word $t_i$ occurs in comment $d_j$, computed as:
$$p(d_j, t_i) = \frac{tf_{ij}}{\sum_{j=1}^{n} tf_{ij}} \tag{2}$$
In the above formula, $tf_{ij}$ is the term frequency of candidate word $t_i$ in the $j$-th comment. If $t_i$ occurs in only one comment, then $p(d_j, t_i) = 1$ and $\log p(d_j, t_i) = 0$, so $IE(t_i) = 0$. Because $IE(t_i)$ must not be 0 in the subsequent calculation, a very small constant $\epsilon = 0.0001$ is added to the denominator of formula (2), which gives:
$$p(d_j, t_i) = \frac{tf_{ij}}{\sum_{j=1}^{n} tf_{ij} + \epsilon} \tag{3}$$
Information entropy favors high-frequency words, but high-frequency words may include common general nouns, and the low-frequency words that are passed over may include evaluation objects. The present invention therefore uses m corpora unrelated to the domain of the comment corpus, together with document frequency:
$$Ds(t_i)_j = \begin{cases} \alpha \times \log(1 + df_{in}) & \text{if } df_{out\_j} = 0 \\ \dfrac{\log(1 + df_{in})}{\log(1 + df_{out\_j})} & \text{otherwise} \end{cases} \tag{4}$$
In the above formula, $df_{in}$ is the document frequency of candidate word $t_i$ in the comment corpus, and $df_{out\_j}$ is the document frequency of $t_i$ in the $j$-th domain-unrelated corpus. When $df_{out\_j} = 0$, $t_i$ is very likely a domain-specific evaluation object word, so $\alpha$ is a parameter greater than 1.
From the above, the Indicator of an evaluation object is:
$$I(t_i) = \overline{Ds(t_i)} \times IE(t_i) \tag{5}$$
where $\overline{Ds(t_i)} = \frac{1}{m} \sum_{j=1}^{m} Ds(t_i)_j$.
The word characteristic value Indicator of 3.3 calculated candidate emotion word
Most emotion word does not have field correlativity, as: " good ", " disliking ", " liking " etc.Small part emotion word has field singularity, as: " good to eat " in food and drink comment.The present invention is in conjunction with the Indicator of document frequency and word distribution proportion calculated candidate emotion word.Computing formula is as follows:
$$I(o_i) = \log(1 + df_i) \times D_i \tag{6}$$
In the formula above, $df_i$ is the document frequency of candidate word $o_i$ in the comment corpus, and $D_i$ represents the distribution of the candidate word. $tf_{ij}$ is the word frequency of candidate word $o_i$ in the $j$-th comment, and the average word frequency of $o_i$ is taken over all comments.
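A short sketch of formula (6) follows. It assumes comments are already tokenized into lists of words and that the distribution term $D_i$ is supplied by the caller, since its defining expression is not available in this text. The function names and the natural-log base are assumptions.

```python
import math


def document_frequency(word, comments):
    """Count the comments (lists of tokens) that contain the word at least once."""
    return sum(1 for tokens in comments if word in tokens)


def emotion_indicator(df_i, d_i):
    """I(o_i) = log(1 + df_i) * D_i, following formula (6).

    d_i is the distribution term D_i, passed in precomputed.
    """
    return math.log(1 + df_i) * d_i


# Hypothetical tokenized comments.
comments = [["food", "good"], ["good", "good"], ["bad"]]
df_good = document_frequency("good", comments)
```

Note that `document_frequency` counts a comment once even when the word repeats inside it, which is what distinguishes document frequency from word frequency.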
3.4 Obtain accurate emotion words and evaluation objects
To model the two factors above, the present invention constructs a bipartite graph and then uses a random walk algorithm to iteratively compute the energy values of the emotion words and evaluation objects. Words in the candidate list whose energy value exceeds a given threshold are selected as the final emotion words (or evaluation object words). The specific formulas are as follows:
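$$
\begin{aligned}
E(t) &= \lambda \times R \times E(o) + (1-\lambda) \times I_t \\
E(o) &= \lambda \times R \times E(t) + (1-\lambda) \times I_o
\end{aligned} \tag{7}
$$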
Here $E(t)$ and $E(o)$ denote the energy values of the evaluation object words and the emotion words, respectively. $R$ is the relation matrix, where $R_{ij}$ is the association weight between the $i$-th candidate evaluation object word and the $j$-th candidate emotion word. $I_t$ is the vector of Indicator values of the candidate evaluation objects, each element computed by formula (5). $I_o$ is the vector of Indicator values of the candidate emotion words, each element computed by formula (6). $\lambda \in [0,1]$ is a balancing parameter that weights the energy propagated through the graph against each word's own Indicator.
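The iteration in formula (7) can be sketched in plain Python as below. Several details are assumptions not stated in the text: the energies are initialized to the Indicator vectors, the second update multiplies by the transpose of $R$ so that the dimensions agree when the two candidate lists differ in length, iteration stops when the largest change falls below a tolerance, and the values of `lam`, `tol`, the threshold, and the sample words are purely illustrative.

```python
def random_walk_energy(R, I_t, I_o, lam=0.3, max_iters=100, tol=1e-9):
    """Iterate formula (7) and return the energy vectors (E_t, E_o).

    R[i][j] -- association weight between evaluation object i and emotion word j
    I_t     -- Indicator values of candidate evaluation objects, formula (5)
    I_o     -- Indicator values of candidate emotion words, formula (6)
    """
    n_t, n_o = len(I_t), len(I_o)
    E_t, E_o = list(I_t), list(I_o)  # assumption: start from the Indicators
    for _ in range(max_iters):
        # E(t) = lam * R * E(o) + (1 - lam) * I_t
        new_t = [
            lam * sum(R[i][j] * E_o[j] for j in range(n_o)) + (1 - lam) * I_t[i]
            for i in range(n_t)
        ]
        # E(o) = lam * R^T * E(t) + (1 - lam) * I_o  (transpose assumed)
        new_o = [
            lam * sum(R[i][j] * E_t[i] for i in range(n_t)) + (1 - lam) * I_o[j]
            for j in range(n_o)
        ]
        delta = max(abs(a - b) for a, b in zip(new_t + new_o, E_t + E_o))
        E_t, E_o = new_t, new_o
        if delta < tol:
            break
    return E_t, E_o


def select_above(words, energies, threshold):
    """Keep the candidate words whose energy exceeds the threshold."""
    return [w for w, e in zip(words, energies) if e > threshold]


# Hypothetical 2x2 example: two candidate evaluation objects, two emotion words.
R = [[0.8, 0.2], [0.1, 0.9]]
E_t, E_o = random_walk_energy(R, I_t=[0.5, 0.2], I_o=[0.4, 0.1])
print(select_above(["screen", "price"], E_t, threshold=0.4))
```

With a small λ the result stays close to the Indicator values, while a larger λ lets strongly associated word pairs reinforce each other. The sketch converges only when λ times the largest singular value of $R$ is below 1, which holds for the sample matrix.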
CN201510471154.3A 2015-08-04 2015-08-04 Web comment sentiment analysis method based on word alignment model Active CN105117428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510471154.3A CN105117428B (en) 2015-08-04 2015-08-04 A kind of web comment sentiment analysis method based on word alignment model

Publications (2)

Publication Number Publication Date
CN105117428A true CN105117428A (en) 2015-12-02
CN105117428B CN105117428B (en) 2018-12-04

Family

ID=54665418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510471154.3A Active CN105117428B (en) 2015-08-04 2015-08-04 Web comment sentiment analysis method based on word alignment model

Country Status (1)

Country Link
CN (1) CN105117428B (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106257455A (en) * 2016-07-08 2016-12-28 闽江学院 A kind of Bootstrapping algorithm based on dependence template extraction viewpoint evaluation object
CN106528611A (en) * 2016-09-28 2017-03-22 西南交通大学 Analysis method based on internet comment data
CN107038609A (en) * 2017-04-24 2017-08-11 广州华企联信息科技有限公司 A kind of Method of Commodity Recommendation and system based on deep learning
CN107153641A (en) * 2017-05-08 2017-09-12 北京百度网讯科技有限公司 Comment information determines method, device, server and storage medium
CN107220238A (en) * 2017-05-24 2017-09-29 电子科技大学 A kind of text object abstracting method based on Mixed Weibull distribution
CN107544959A (en) * 2017-08-28 2018-01-05 北京奇艺世纪科技有限公司 The extracting method and device of a kind of evaluation object
CN107767195A (en) * 2016-08-16 2018-03-06 阿里巴巴集团控股有限公司 The display systems and displaying of description information, generation method and electronic equipment
CN108255808A (en) * 2017-12-29 2018-07-06 东软集团股份有限公司 The method, apparatus and storage medium and electronic equipment that text divides
CN108509421A (en) * 2018-04-04 2018-09-07 郑州大学 Text sentiment classification method based on random walk and Rough Decision confidence level
CN108763224A (en) * 2016-06-28 2018-11-06 大连民族大学 The interpretation method of the multi-lingual machine translation subsystem of comment information
CN108763214A (en) * 2018-05-30 2018-11-06 河海大学 A kind of sentiment dictionary method for auto constructing for comment on commodity
CN109033433A (en) * 2018-08-13 2018-12-18 中国地质大学(武汉) A kind of comment data sensibility classification method and system based on convolutional neural networks
CN109684641A (en) * 2018-12-26 2019-04-26 广东工业大学 A kind of data extraction device, method, electronic equipment and storage medium
CN110008477A (en) * 2019-04-15 2019-07-12 江西财经大学 A kind of Chinese Affective Evaluation unit abstracting method
CN110309308A (en) * 2019-06-27 2019-10-08 北京金山安全软件有限公司 Text information classification method and device and electronic equipment
CN110377739A (en) * 2019-07-19 2019-10-25 出门问问(苏州)信息科技有限公司 Text sentiment classification method, readable storage medium storing program for executing and electronic equipment
CN110569497A (en) * 2018-06-06 2019-12-13 淡江大学 Opinion vocabulary expansion system and opinion vocabulary expansion method
CN110738056A (en) * 2018-07-03 2020-01-31 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN110825876A (en) * 2019-11-07 2020-02-21 上海德拓信息技术股份有限公司 Movie comment viewpoint emotion tendency analysis method
CN111079404A (en) * 2019-11-14 2020-04-28 联想(北京)有限公司 Data analysis method, device and storage medium
CN111125312A (en) * 2019-12-24 2020-05-08 深圳视界信息技术有限公司 Text labeling method and system
CN111353308A (en) * 2018-12-20 2020-06-30 北京深知无限人工智能研究院有限公司 Named entity recognition method, device, server and storage medium
CN111858886A (en) * 2020-07-13 2020-10-30 北京航空航天大学 Object and viewpoint extraction system for airport comments
CN113011182A (en) * 2019-12-19 2021-06-22 北京多点在线科技有限公司 Method, device and storage medium for labeling target object
CN107862343B (en) * 2017-11-28 2021-07-13 南京理工大学 Commodity comment attribute level emotion classification method based on rules and neural network
CN114398911A (en) * 2022-01-24 2022-04-26 平安科技(深圳)有限公司 Emotion analysis method and device, computer equipment and storage medium
CN117172248A (en) * 2023-11-03 2023-12-05 翼方健数(北京)信息科技有限公司 Text data labeling method, system and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008106665A2 (en) * 2007-03-01 2008-09-04 Microsoft Corporation Graph-based search leveraging sentiment analysis of user comments
CN103617245A (en) * 2013-11-27 2014-03-05 苏州大学 Bilingual sentiment classification method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
KANG LIU ET AL: "Co-Extracting Opinion Targets and Opinion Words from Online Reviews Based on the Word Alignment Model", 《IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING》 *
杜嘉忠 等: "网络商品评价的特征-情感词本体构建与情感分析方法研究", 《情报分析与研究》 *
许力波: "产品评价对象与情感词搭配关系的抽取", 《中国优秀硕士学位论文全文数据库》 *
许力波: "产品评价对象与情感词搭配关系的抽取", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763224A (en) * 2016-06-28 2018-11-06 大连民族大学 The interpretation method of the multi-lingual machine translation subsystem of comment information
CN106257455A (en) * 2016-07-08 2016-12-28 闽江学院 A kind of Bootstrapping algorithm based on dependence template extraction viewpoint evaluation object
CN107767195A (en) * 2016-08-16 2018-03-06 阿里巴巴集团控股有限公司 The display systems and displaying of description information, generation method and electronic equipment
CN106528611A (en) * 2016-09-28 2017-03-22 西南交通大学 Analysis method based on internet comment data
CN107038609A (en) * 2017-04-24 2017-08-11 广州华企联信息科技有限公司 A kind of Method of Commodity Recommendation and system based on deep learning
CN107153641A (en) * 2017-05-08 2017-09-12 北京百度网讯科技有限公司 Comment information determines method, device, server and storage medium
CN107220238A (en) * 2017-05-24 2017-09-29 电子科技大学 A kind of text object abstracting method based on Mixed Weibull distribution
CN107544959A (en) * 2017-08-28 2018-01-05 北京奇艺世纪科技有限公司 The extracting method and device of a kind of evaluation object
CN107862343B (en) * 2017-11-28 2021-07-13 南京理工大学 Commodity comment attribute level emotion classification method based on rules and neural network
CN108255808A (en) * 2017-12-29 2018-07-06 东软集团股份有限公司 The method, apparatus and storage medium and electronic equipment that text divides
CN108255808B (en) * 2017-12-29 2021-10-22 东软集团股份有限公司 Text division method and device, storage medium and electronic equipment
CN108509421B (en) * 2018-04-04 2021-09-28 郑州大学 Text emotion classification method based on random walk and rough decision confidence
CN108509421A (en) * 2018-04-04 2018-09-07 郑州大学 Text sentiment classification method based on random walk and Rough Decision confidence level
CN108763214A (en) * 2018-05-30 2018-11-06 河海大学 A kind of sentiment dictionary method for auto constructing for comment on commodity
CN108763214B (en) * 2018-05-30 2021-09-24 河海大学 Automatic construction method of emotion dictionary for commodity comments
CN110569497A (en) * 2018-06-06 2019-12-13 淡江大学 Opinion vocabulary expansion system and opinion vocabulary expansion method
CN110738056B (en) * 2018-07-03 2023-12-19 百度在线网络技术(北京)有限公司 Method and device for generating information
CN110738056A (en) * 2018-07-03 2020-01-31 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN109033433B (en) * 2018-08-13 2020-09-29 中国地质大学(武汉) Comment data emotion classification method and system based on convolutional neural network
CN109033433A (en) * 2018-08-13 2018-12-18 中国地质大学(武汉) A kind of comment data sensibility classification method and system based on convolutional neural networks
CN111353308A (en) * 2018-12-20 2020-06-30 北京深知无限人工智能研究院有限公司 Named entity recognition method, device, server and storage medium
CN109684641B (en) * 2018-12-26 2023-04-07 广东工业大学 Data extraction device and method, electronic equipment and storage medium
CN109684641A (en) * 2018-12-26 2019-04-26 广东工业大学 A kind of data extraction device, method, electronic equipment and storage medium
CN110008477A (en) * 2019-04-15 2019-07-12 江西财经大学 A kind of Chinese Affective Evaluation unit abstracting method
CN110309308A (en) * 2019-06-27 2019-10-08 北京金山安全软件有限公司 Text information classification method and device and electronic equipment
CN110377739A (en) * 2019-07-19 2019-10-25 出门问问(苏州)信息科技有限公司 Text sentiment classification method, readable storage medium storing program for executing and electronic equipment
CN110825876A (en) * 2019-11-07 2020-02-21 上海德拓信息技术股份有限公司 Movie comment viewpoint emotion tendency analysis method
CN111079404A (en) * 2019-11-14 2020-04-28 联想(北京)有限公司 Data analysis method, device and storage medium
CN113011182A (en) * 2019-12-19 2021-06-22 北京多点在线科技有限公司 Method, device and storage medium for labeling target object
CN113011182B (en) * 2019-12-19 2023-10-03 北京多点在线科技有限公司 Method, device and storage medium for labeling target object
CN111125312A (en) * 2019-12-24 2020-05-08 深圳视界信息技术有限公司 Text labeling method and system
CN111858886B (en) * 2020-07-13 2022-05-31 北京航空航天大学 Object and viewpoint extraction system for airport comments
CN111858886A (en) * 2020-07-13 2020-10-30 北京航空航天大学 Object and viewpoint extraction system for airport comments
CN114398911A (en) * 2022-01-24 2022-04-26 平安科技(深圳)有限公司 Emotion analysis method and device, computer equipment and storage medium
CN117172248A (en) * 2023-11-03 2023-12-05 翼方健数(北京)信息科技有限公司 Text data labeling method, system and medium
CN117172248B (en) * 2023-11-03 2024-01-30 翼方健数(北京)信息科技有限公司 Text data labeling method, system and medium

Also Published As

Publication number Publication date
CN105117428B (en) 2018-12-04

Similar Documents

Publication Publication Date Title
CN105117428A (en) Web comment sentiment analysis method based on word alignment model
Çalı et al. Improved decisions for marketing, supply and purchasing: Mining big data through an integration of sentiment analysis and intuitionistic fuzzy multi criteria assessment
Mendoza et al. Extractive single-document summarization based on genetic operators and guided local search
Hu et al. Unsupervised sentiment analysis with emotional signals
CN108681557B (en) Short text topic discovery method and system based on self-expansion representation and similar bidirectional constraint
CN109344240B (en) Data processing method, server and electronic equipment
Shi et al. Enhanced customer requirement classification for product design using big data and improved Kano model
CN107038184B (en) A kind of news recommended method based on layering latent variable model
Bigorra et al. Aspect-based Kano categorization
CN103455487A (en) Extracting method and device for search term
WO2013118435A1 (en) Semantic similarity level computation method, system and program
KR102370729B1 (en) Sentence writing system
Zhu et al. A neural translating general hyperplane for knowledge graph embedding
Lu et al. Graph-based collaborative filtering with mlp
Lu et al. Joint semantic similarity assessment with raw corpus and structured ontology for semantic-oriented service discovery
Ramshankar et al. A novel recommendation system enabled by adaptive fuzzy aided sentiment classification for E-commerce sector using black hole-based grey wolf optimization
Li et al. A novel locality-sensitive hashing relational graph matching network for semantic textual similarity measurement
Xu et al. Latent interest and topic mining on user-item bipartite networks
CN105205075A (en) Named entity set extension method based on synergetic self-extension and query suggestion method
D’Addio et al. Combining different metadata views for better recommendation accuracy
Gan et al. CDMF: a deep learning model based on convolutional and dense-layer matrix factorization for context-aware recommendation
Rizal et al. Sentiment analysis for opinion IESM product with recurrent neural network approach based on long short term memory
Mishra et al. Evaluating Performance of Machine Learning Techniques used in Opinion Mining
Nguyen et al. A variational autoencoder mixture model for online behavior recommendation
Sivaramakrishnan et al. Validating effective resume based on employer’s interest with recommendation system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant