CN101782898A - Method for analyzing tendentiousness of affective words - Google Patents

Publication number
CN101782898A
Authority
CN
China
Prior art keywords
emotion
speech
emotion speech
tendentiousness
seed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010133149
Other languages
Chinese (zh)
Inventor
蒋喻新
张勇东
郭俊波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN 201010133149
Publication of CN101782898A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for analyzing the tendentiousness of affective words, comprising the following steps: crawling star-rated reviews from the network; extracting the affective words in the reviews; and obtaining the tendentiousness of each affective word as a weighted sum of three values: the tendentiousness calculated from the quantized star ratings, the tendentiousness obtained with the PMI-IR algorithm from a constructed set of seed affective words, and the tendentiousness calculated from conjunction relations. The method reduces the workload of building a sentiment analysis system; using star-rated reviews from the network reduces the influence of human subjectivity on the labeling of affective words; and combining several algorithms in a weighted sum reduces the impact of poorly chosen seed affective words and of corpus quality on the calculated tendentiousness.

Description

A method for analyzing tendentiousness of affective words
Technical field
The present invention relates to techniques for classifying affective words and, more specifically, to a method for analyzing the tendentiousness of affective words.
Background art
With the rapid development and wide adoption of Web 2.0 technology, more and more user-generated content (UGC) has appeared on the network: posts on BBS systems, users' reviews of goods or sellers in online stores, online videos and their comments, and blog and microblog systems that let network users freely express their own ideas. As Internet access becomes more convenient and mobile Internet access more widespread, user-generated content is expanding rapidly, and interest in this information is growing in many quarters. For example, a manufacturer may wish to analyze buyers' reviews of a product to learn what people expect of it and what its merits and defects are, so as to improve and promote the product and increase its sales. By analyzing the sentiment tendency of film reviews, one can learn how well a film is received by its audience. A government can analyze the sentiment tendency of comments on the network to understand the views the public holds on a policy or event, and thus implement the policy or handle the event better.
For calculating the tendentiousness of affective words, a classic approach is Turney's PMI-IR (Pointwise Mutual Information and Information Retrieval) algorithm. This algorithm requires a very large corpus and obtains the tendentiousness of an affective word by calculating the correlation between that word and positive and negative seed affective words. The algorithm depends heavily on the choice of seed affective words and on the quality of the corpus, and a large-scale sentiment-bearing corpus is difficult to obtain. In addition, the algorithm does not consider the context in which an affective word occurs; it simply calculates from the co-occurrence frequency of words.
In addition, there are methods that use synonyms to calculate the tendentiousness of affective words. Such a method requires a synonym dictionary in which the synonym set of each word is sorted by closeness of association. When calculating synonym relatedness, one must take into account that, after a certain number of hops, a word may share a meaning with both of two antonyms. As a result, calculating the sentiment tendency of a word requires both a very strict selection of seed affective words and a synonym dictionary that meets the above requirement. Moreover, because the tendentiousness of some affective words depends on context, the ordering of the synonym sets in the dictionary may favor calculating a word's sentiment tendency in some fields while yielding a wrong tendency for the same word in other fields.
Summary of the invention
To overcome the defects of existing methods for calculating the tendentiousness of affective words, namely dictionaries that are difficult to obtain, poor accuracy and the influence of human factors, the present invention proposes a method for analyzing sentiment tendency.
According to one aspect of the present invention, a method for analyzing tendentiousness of affective words is proposed, comprising:
Step 10), crawling star-rated reviews from the network;
Step 20), extracting the affective words in the reviews;
Step 30), obtaining the tendentiousness of each affective word as a weighted sum of the tendentiousness calculated by quantizing the star ratings, the tendentiousness obtained with the PMI-IR algorithm based on a constructed set of seed affective words, and the tendentiousness calculated based on conjunction relations.
The method also comprises: Step 40), classifying the list of affective words with their tendentiousness values, wherein an affective word whose tendentiousness value is greater than a threshold is taken as a positive affective word, an affective word whose tendentiousness value is less than a threshold is taken as a negative affective word, and the others are taken as neutral, thereby obtaining an affective word dictionary.
Wherein, step 10) further comprises: setting different extraction templates and rules according to the layout of the web page content, and crawling the reviews and their corresponding star ratings from the web pages.
Wherein, step 10) comprises:
Step 110), customizing, per website, an extraction template for reviews and star ratings according to the source of the reviews;
Step 120), downloading the content of the whole web page;
Step 130), according to the extraction template and the source code of the web page content, extracting the reviews and corresponding star ratings from the web page content, forming the review corpus.
Wherein, in step 20), adjectives, verbs, adverbs or nouns in the reviews are extracted as affective words.
Wherein, step 20) further comprises:
Step 210), defining affective words and dividing them into positive, negative and neutral;
Step 220), segmenting the reviews into words;
Step 230), pairing every affective word in a segmented review with the star rating of that review, forming a list of (affective word, star rating) pairs.
Wherein, step 30) further comprises:
Step 310), quantizing the star ratings and calculating the tendentiousness of the affective words in the pair list;
Step 320), building a set of seed affective words;
Step 330), based on the seed affective word set, calculating the tendentiousness of the affective words in the pair list with the PMI-IR algorithm;
Step 340), creating conjunction processing rules and, based on the seed affective word set, iteratively calculating the tendentiousness of affective words that co-occur with seed affective words;
Step 350), taking a weighted sum of the above three tendentiousness values.
Wherein, step 310) further comprises: averaging all tendentiousness values of an affective word and taking the mean as the tendentiousness of that affective word.
Wherein, step 320) comprises: selecting strong affective words that are not influenced by context or field as seed affective words; for a sentiment analysis system aimed at a specific field, the strong affective words of that field are selected as seed affective words.
According to a further aspect of the invention, a method for constructing an affective word dictionary is provided, comprising:
Step 10), crawling star-rated reviews from the network;
Step 20), extracting the affective words in the reviews;
Step 30), obtaining the tendentiousness of each affective word as a weighted sum of the tendentiousness calculated by quantizing the star ratings, the tendentiousness obtained with the PMI-IR algorithm based on a constructed set of seed affective words, and the tendentiousness calculated based on conjunction relations;
Step 40), classifying the list of affective words with their tendentiousness values and outputting an affective word dictionary.
Wherein, step 40) further comprises: classifying the list of affective words with their tendentiousness values, wherein an affective word whose tendentiousness value is greater than a threshold is taken as a positive affective word, an affective word whose tendentiousness value is less than a threshold is taken as a negative affective word, and the others are taken as neutral, thereby obtaining the affective word dictionary.
By using the method of the present invention, manual annotation of the sentiment corpus is reduced, which greatly lightens the workload of building a sentiment analysis system. Using star-rated reviews from the network minimizes the influence of annotators' subjectivity on the labeling of affective words. Combining several algorithms and taking a weighted sum of the tendentiousness values reduces the impact of poorly chosen seed affective words and of corpus quality on the calculation. High-quality seed affective words and sentiment corpora can further improve the accuracy of the calculation. The disclosed method can therefore build an affective word dictionary quickly and accurately, speeding up the construction of a sentiment analysis system.
Description of drawings
Fig. 1 is a flowchart of a method for analyzing the tendentiousness of affective words according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the pattern of the information extraction template according to an embodiment of the invention.
Embodiment
The method for analyzing tendentiousness of affective words provided by the invention is described in detail below with reference to the drawings and specific embodiments.
The present invention proposes a method for analyzing tendentiousness of affective words that is independent of any particular application. The method can be used in or ported to many specific fields, including but not limited to: sentiment analysis of online video; sentiment analysis of news, blogs, microblogs and their comments; sentiment analysis of film reviews; and sentiment analysis of product reviews.
With reference to Fig. 1, a method for analyzing tendentiousness of affective words according to an embodiment of the invention is illustrated and described. As shown in Fig. 1, the method comprises: step 10), crawling star-rated reviews from the network; step 20), extracting the affective words in the reviews; step 30), obtaining the tendentiousness of each affective word as a weighted sum of the tendentiousness calculated by quantizing the star ratings, the tendentiousness obtained with the PMI-IR algorithm based on a constructed set of seed affective words, and the tendentiousness calculated based on conjunction relations. The invention further comprises step 40), classifying the list of affective words with their tendentiousness values and outputting an affective word dictionary. That is, an affective word dictionary can be obtained from the tendentiousness analysis results of the affective words.
With further reference to Fig. 1, each step of the method is described in detail. In step 10), different extraction templates and rules are set according to the layout of the web page content, and the reviews and their corresponding star ratings are crawled from the web pages. This comprises: step 110), customizing, per website, an extraction template for reviews and star ratings according to the source of the reviews; step 120), downloading the content of the whole web page (for example, the page source); step 130), according to the extraction template and the page source, extracting the review text and the corresponding star ratings from the web page content downloaded in step 120), forming the review corpus. The concrete operations are as follows:
In step 110), reviews from different sources, such as online reviews on Douban, reviews on Mtime, or reviews on other websites that carry such star ratings, can serve as the data source. Most websites generate their pages from templates, so pages with identical or similar semantic content have identical or similar HTML syntactic structure. The regular expressions for information extraction are built by extracting the syntactic structure shared by all pages of a website that contain reviews and star ratings.
In one specific example, an information extraction template is formulated for each website from which reviews and star ratings are to be extracted; the pattern of the information extraction template is shown in Fig. 2. Further, the open-source tool HTMLParser is used to analyze the HTML text of the page, and the regular expressions for information extraction are formulated.
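As a rough illustration of template-based extraction, the following Python sketch applies one regular expression to one page. The HTML markup, the class names and the `data-star` attribute are hypothetical and do not come from any real website or from Fig. 2; the patent's own example uses HTMLParser, whereas this sketch uses only the standard `re` module.

```python
import re

# Hypothetical page source; a real site needs its own template.
SAMPLE_HTML = """
<div class="review"><span class="stars" data-star="5"></span><p class="text">A great film.</p></div>
<div class="review"><span class="stars" data-star="1"></span><p class="text">Vulgar and bad.</p></div>
"""

# One extraction template is modeled as one regular expression with named groups.
REVIEW_TEMPLATE = re.compile(
    r'<div class="review">.*?data-star="(?P<star>[1-5])".*?'
    r'<p class="text">(?P<text>.*?)</p>',
    re.DOTALL,
)


def extract_reviews(html, template=REVIEW_TEMPLATE):
    """Return (review_text, star) pairs found in one downloaded page."""
    return [
        (match.group("text").strip(), int(match.group("star")))
        for match in template.finditer(html)
    ]


corpus = extract_reviews(SAMPLE_HTML)
```

Each website would get its own compiled pattern, and the pairs collected from all pages would make up the review corpus.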
In step 20), the adjectives in the review sentences are extracted as affective words. Emotion can be expressed with adjectives, verbs, adverbs or nouns, but most emotional expression still relies on adjectives. To simplify the structure of the sentiment analysis, adjectives are used as the example here, but the method can also consider the verbs, nouns or adverbs in the review sentences.
Step 20) comprises: step 210), defining affective words; step 220), segmenting the reviews into words; step 230), associating the affective words with the star ratings; and step 240), obtaining the list of (affective word, star rating) pairs.
In step 210), emotion is generally defined in several categories such as joy, anger, sorrow and happiness. The method of the present invention divides emotion into positive, negative and neutral. For example, "great", "amiable" and "fine" are regarded as positive affective words; "grief", "vulgar" and "bad" are regarded as negative affective words; and words that belong to neither group are regarded as neutral affective words. In one embodiment, a sentiment score interval [-1, 1] and a threshold t are defined. For an affective word w with sentiment score v: if |v| <= t, w is regarded as a neutral affective word; if v > t, w is regarded as a positive affective word; otherwise w is a negative affective word. The score interval is the range of the score v, and the threshold is a value greater than 0 within that interval; other definitions of the score interval are possible. The scores obtained can be normalized so that they fall within [-1, 1]. The threshold is chosen mainly by experience, generally a value of about 0.15 to 0.20.
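The three-way rule above can be written as a small Python function. This is only a sketch: the function name and the default threshold of 0.15 (the lower end of the empirical range mentioned in the text) are choices made for this example.

```python
def classify(v, t=0.15):
    """Label a sentiment score v in [-1, 1] using threshold t > 0."""
    if abs(v) <= t:
        return "neutral"
    return "positive" if v > t else "negative"


labels = [classify(v) for v in (0.43, 0.13, -0.25)]
```

With these inputs the scores 0.43, 0.13 and -0.25 fall into the positive, neutral and negative classes respectively.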
In step 220), the reviews are segmented with the open-source version of ICTCLAS, a word segmentation system with part-of-speech tagging developed by the Institute of Computing Technology, Chinese Academy of Sciences. The segmentation speed and accuracy of this system satisfy the needs of the present invention, and its open-source version can be downloaded freely from the network.
In step 230), based on the segmented content and its corresponding star rating, all affective words in the segmented review text are extracted and paired with the star rating of that review, forming the pairs {<w_i, p> | i = 1, ..., n}, where n is the number of affective words contained in the review text and p is the star rating of the review. These pairs constitute a pair list; see, for example, Table 1.
Table 1
Affective word    Tendentiousness value
Ugly              -0.9047619047619047
Bored             -0.25
Curious           0.1282051282051283
Actively          0.4285714285714286
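The pairing of step 230) can be sketched in Python as follows. The sketch assumes the review has already been segmented and part-of-speech tagged (the text uses ICTCLAS for this); the tag "a" for adjectives and the sample tokens are assumptions made for illustration.

```python
def word_star_pairs(tagged_tokens, star, affective_tags=("a",)):
    """Pair every affective word of one review with that review's star rating.

    tagged_tokens: (word, pos_tag) tuples from a segmenter (assumed input).
    affective_tags: POS tags treated as affective; "a" = adjective (assumed tag set).
    """
    return [(word, star) for word, tag in tagged_tokens if tag in affective_tags]


tagged = [("plot", "n"), ("boring", "a"), ("and", "c"), ("ugly", "a")]
pairs = word_star_pairs(tagged, 2)
```

Passing a wider `affective_tags` tuple would also admit verbs, nouns or adverbs, as the text allows.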
In step 30), calculating the tendentiousness of the affective words comprises: step 310), quantizing the star ratings; step 320), calculating the tendentiousness of the affective words in the pair list; step 330), building the seed affective word set; step 340), calculating the tendentiousness of the affective words in the pair list with the PMI-IR algorithm; step 350), creating the conjunction processing rules; step 360), iteratively calculating the tendentiousness of affective words that co-occur with seed affective words; step 370), taking a weighted sum of the three calculated tendentiousness values.
In step 310), most star-rated reviews on the network today use a 5-star scale, with a minimum of 1 star and a maximum of 5 stars. These star ratings are mapped to the 5 values in [-2, 2]: negative numbers represent negative reviews, positive numbers represent positive reviews, and 0 is regarded as a neutral review. There are of course other rating schemes: some websites use direct scores, and Taobao, for example, runs from stars to diamonds. Such scores always have an order from low to high, so the interval from low to high can be divided into several grades, each mapped to a numerical value. In this way, all reviews have in effect been manually annotated. Compared with a corpus annotated by a few people, one advantage is that these so-called "annotations" are made by a large number of network users, which greatly reduces the annotation bias that arises when there are few annotators.
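A minimal Python sketch of the star quantization follows. It assumes a linear mapping centered on the middle of the scale, so that 1 to 5 stars become -2 to 2; the text fixes only the target interval and the meaning of the signs, so the exact mapping and the function name are assumptions.

```python
def quantize_star(star, lowest=1, highest=5):
    """Map a star rating onto a scale centered on 0 (1..5 stars -> -2..2)."""
    if not lowest <= star <= highest:
        raise ValueError(f"star rating {star} outside [{lowest}, {highest}]")
    midpoint = (lowest + highest) / 2
    return star - midpoint


quantized = [quantize_star(s) for s in range(1, 6)]
```

Other ordered schemes could be handled the same way by first ranking their grades from low to high and then passing the rank and the number of grades.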
In step 320), the tendentiousness of an affective word is calculated by averaging the tendentiousness values of all occurrences of the word and taking the mean as the tendentiousness value of the word. Suppose all annotated values of affective word w are V_1, V_2, ..., V_n; then the sentiment tendency value SO of w is:
$$SO(w) = \frac{\sum_{i=1}^{n} V_i}{n}$$
Step 320) yields the sentiment tendency values of all affective words in the affective word list. However, because network data is arbitrary, a star rating is the overall score a user gives in a review and does not mean the review contains only one viewpoint: two diametrically opposed viewpoints may coexist in one review, which may cause the tendentiousness of an affective word to be misjudged. To reduce this risk, the following two algorithms are added to further determine the tendentiousness of the affective words.
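The averaging of step 320) can be sketched in Python as below. The input is assumed to be a list of (affective word, quantized star value) pairs; the sample words and values are made up for illustration.

```python
from collections import defaultdict


def average_tendentiousness(pairs):
    """Average the quantized star values observed for each affective word."""
    values = defaultdict(list)
    for word, value in pairs:
        values[word].append(value)
    return {word: sum(vs) / len(vs) for word, vs in values.items()}


sample_pairs = [("ugly", -2), ("ugly", -1), ("active", 1), ("active", 2), ("active", 0)]
so_star = average_tendentiousness(sample_pairs)
```

Here "ugly" averages to -1.5 over two occurrences and "active" to 1.0 over three.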
In step 330), the selection of seed affective words can be influenced by the field. Take the word "long": in the film field it may not be a positive word, but for some goods "long" is a positive word. To make the invention as field-independent as possible, the seed affective words selected are strong affective words that are, as far as possible, not influenced by context or field. For a sentiment analysis system aimed at a specific field, the strong affective words of that field can be selected as seed affective words. Empirically, 10 positive and 10 negative seed affective words are generally selected, which takes both accuracy and computing speed into account.
In step 340), based on the seed affective word set obtained in step 330), the tendentiousness of the affective words in the affective word list is calculated with the PMI-IR algorithm. The very large sentiment corpus that the PMI-IR algorithm would otherwise need is not required; instead, the tendentiousness is calculated with the help of a search engine. The formulas of the algorithm are as follows:
$$I(w, w_i) = \log\left(\frac{p(w, w_i)}{p(w)\, p(w_i)}\right)$$
$$SO(w) = \sum_{i=1}^{n} I(w, P_i) - \sum_{i=1}^{n} I(w, N_i)$$
Here I(w, w_i) denotes the mutual information between affective word w and affective word w_i, p(w, w_i) denotes the co-occurrence probability of the two words, P_i denotes a positive seed affective word, N_i denotes a negative seed affective word, and SO(w) denotes the sentiment tendency value of affective word w. Because the co-occurrence frequency of two words is obtained through a search engine, the total number of words is taken to be the same in every query. The number of hits (the number of search results) is therefore used as the co-occurrence probability of the two words, which simplifies the computation.
For an affective word w, if the absolute value of SO(w) is less than the threshold, w is regarded as a neutral word; if SO(w) is greater than the threshold, w is regarded as positive; otherwise, w is a negative affective word (the threshold is the one described above). In this way, the PMI-IR algorithm gives the tendentiousness of all affective words in the affective word list.
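The hit-count version of the formulas can be sketched in Python as follows. No real search engine is called: `fake_hits` and its counts are invented stand-ins for a query function, and the small constant `eps`, added so that zero hits do not produce log(0), is an assumption of this sketch rather than something stated in the text.

```python
import math

# Made-up hit counts standing in for search-engine result numbers.
FAKE_HITS = {
    ("good",): 1000, ("bad",): 1000, ("lovely",): 200,
    ("good", "lovely"): 80, ("bad", "lovely"): 5,
}


def fake_hits(*terms):
    """Hypothetical query function: hits for pages containing all terms."""
    return FAKE_HITS.get(tuple(sorted(terms)), 0)


def pmi(hits, w, seed, eps=0.01):
    """I(w, seed) with hit counts used in place of probabilities."""
    return math.log((hits(w, seed) + eps) / ((hits(w) + eps) * (hits(seed) + eps)))


def so_pmi(hits, w, positive_seeds, negative_seeds):
    """SO(w) = sum of I(w, P_i) minus sum of I(w, N_i)."""
    return (sum(pmi(hits, w, p) for p in positive_seeds)
            - sum(pmi(hits, w, n) for n in negative_seeds))


so_lovely = so_pmi(fake_hits, "lovely", ["good"], ["bad"])
```

With these counts "lovely" co-occurs far more often with the positive seed than with the negative one, so its SO value is positive.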
In step 350), the present invention considers two sets of conjunctions: coordinating conjunctions and adversative conjunctions. Coordinating conjunctions include words such as "and" and "with"; the present invention also treats progressive conjunctions, such as "and", as coordinating conjunctions. Affective words connected by a coordinating conjunction are considered to have the same sentiment tendency. Affective words connected by an adversative conjunction, such as "but", "yet" or "still", are considered to have opposite sentiment tendencies. Using the seed affective words selected in step 330), the coordinating and adversative conjunctions are applied repeatedly over the sentiment corpus in an iterative loop, expanding the affective word dictionary until it no longer grows. In this way, the tendentiousness of some or all of the affective words in the affective word list can be obtained. For affective words in the list that are not reached through the conjunction relations, the tendentiousness value is taken to be 0. The seed affective word set can be enlarged as needed for different application fields. Because this step works on the existing sentiment corpus and does not need to fetch data from the network in real time, its speed is not noticeably affected by a moderate increase in the number of seed affective words.
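The iterative expansion can be sketched in Python as below. The sketch assumes the corpus has already been reduced to (word, relation, word) links, with the relation being either "coordinate" or "adversative"; the link format, the sample words and the rule that the first label reaching a word wins are all assumptions, since the text does not say how conflicting evidence is resolved.

```python
def propagate(seeds, links):
    """Spread seed polarities (+1 / -1) over conjunction links until no growth.

    seeds: {word: +1 or -1}
    links: (word_a, relation, word_b), relation in {"coordinate", "adversative"}.
    """
    polarity = dict(seeds)
    changed = True
    while changed:  # loop until the dictionary stops growing
        changed = False
        for a, relation, b in links:
            sign = 1 if relation == "coordinate" else -1
            for src, dst in ((a, b), (b, a)):
                if src in polarity and dst not in polarity:
                    polarity[dst] = sign * polarity[src]
                    changed = True
    return polarity


links = [
    ("good", "coordinate", "lovely"),
    ("lovely", "adversative", "dull"),
    ("dull", "coordinate", "tedious"),
]
expanded = propagate({"good": 1}, links)
# Words never reached through a conjunction keep the value 0.
so_conj = {w: expanded.get(w, 0) for w in ["lovely", "dull", "tedious", "curious"]}
```

The loop terminates because each pass either labels at least one new word or stops.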
In step 370), the tendentiousness values of the affective words in the lists obtained by the above three methods are normalized and then combined in a weighted sum.
$$SO'(w) = \frac{so(w) - \min(so(w_i))}{\max(so(w_i)) - \min(so(w_i))}$$
Here so(w) denotes the sentiment tendency value of affective word w, min(so(w_i)) denotes the smallest sentiment tendency value among all affective words, and max(so(w_i)) denotes the largest. SO'(w) denotes the tendentiousness value of w after normalization, and its sign remains the same as the original sign. After normalization, each affective word has three tendentiousness values (the values obtained for this word by the three algorithms above). These are combined in a weighted sum to give the final sentiment tendency value of the affective word:
$$SO''(w) = \sum_{i=1}^{n} \alpha_i \, SO'_i(w)$$
SO''(w) denotes the tendentiousness value of affective word w after the weighted sum; α_i denotes the weight coefficient (the weights are chosen mainly by experience and can differ with the quality and the field of the corpus being processed); SO'_i(w) denotes the normalized tendentiousness value of w in list i; and n = 3 here.
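The normalization and weighted sum can be sketched in Python as follows. The sketch applies the min-max formula exactly as written; the text also says the sign of each value is kept but does not specify how that combines with the formula, so sign handling is not modeled here. The sample scores and the weights 0.5, 0.3 and 0.2 are invented for illustration.

```python
def min_max_normalize(scores):
    """SO'(w) = (so(w) - min) / (max - min) over one score list."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # degenerate list: all values equal (handling is an assumption)
        return {w: 0.0 for w in scores}
    return {w: (v - lo) / (hi - lo) for w, v in scores.items()}


def weighted_sum(score_lists, weights):
    """SO''(w) = sum over i of alpha_i * SO'_i(w); all lists share the same words."""
    normalized = [min_max_normalize(scores) for scores in score_lists]
    return {
        w: sum(alpha * norm[w] for alpha, norm in zip(weights, normalized))
        for w in score_lists[0]
    }


by_star = {"ugly": -2.0, "curious": 0.0, "lovely": 2.0}
by_pmi = {"ugly": -1.0, "curious": 1.0, "lovely": 3.0}
by_conj = {"ugly": -1, "curious": 0, "lovely": 1}
final = weighted_sum([by_star, by_pmi, by_conj], weights=(0.5, 0.3, 0.2))
```

In this toy input every list ranks the three words identically, so the combined values are 0, 0.5 and 1 regardless of the weights, provided they sum to 1.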
In step 40), the list of affective words with their sentiment tendency values is classified. A threshold is set: an affective word whose tendentiousness value is greater than the threshold is taken as a positive affective word, an affective word whose tendentiousness value is less than the threshold is taken as a negative affective word, and the others are taken as neutral, forming an affective word dictionary built without supervision.
Finally, it should be noted that the above embodiments only describe the technical solution of the present invention and do not limit it. In application, the invention can be extended to other modifications, variations, applications and embodiments, all of which are considered to fall within the spirit and teaching of the invention.

Claims (13)

1. A method for analyzing tendentiousness of affective words, comprising:
Step 10), crawling star-rated reviews from the network;
Step 20), extracting the affective words in the reviews;
Step 30), obtaining the tendentiousness of each affective word as a weighted sum of the tendentiousness calculated by quantizing the star ratings, the tendentiousness obtained with the PMI-IR algorithm based on a constructed set of seed affective words, and the tendentiousness calculated based on conjunction relations.
2. The method of claim 1, further comprising:
Step 40), classifying the list of affective words with their tendentiousness values, wherein an affective word whose tendentiousness value is greater than a threshold is taken as a positive affective word, an affective word whose tendentiousness value is less than a threshold is taken as a negative affective word, and the others are taken as neutral, thereby obtaining an affective word dictionary.
3. The method of claim 1, wherein step 10) further comprises: setting different extraction templates and rules according to the layout of the web page content, and crawling the reviews and their corresponding star ratings from the web pages.
4. The method of claim 3, wherein step 10) comprises:
Step 110), customizing, per website, an extraction template for reviews and star ratings according to the source of the reviews;
Step 120), downloading the content of the whole web page;
Step 130), according to the extraction template and the source code of the web page content, extracting the reviews and corresponding star ratings from the web page content, forming the review corpus.
5. The method of claim 1, wherein, in step 20), adjectives, verbs, adverbs or nouns in the reviews are extracted as affective words.
6. The method of claim 5, wherein step 20) further comprises:
Step 210), defining affective words and dividing them into positive, negative and neutral;
Step 220), segmenting the reviews into words;
Step 230), pairing every affective word in a segmented review with the star rating of that review, forming a list of (affective word, star rating) pairs.
7. The method of claim 6, wherein step 30) further comprises:
Step 310), quantizing the star ratings and calculating the tendentiousness of the affective words in the pair list;
Step 320), building a set of seed affective words;
Step 330), based on the seed affective word set, calculating the tendentiousness of the affective words in the pair list with the PMI-IR algorithm;
Step 340), creating conjunction processing rules and, based on the seed affective word set, iteratively calculating the tendentiousness of affective words that co-occur with seed affective words;
Step 350), taking a weighted sum of the above three tendentiousness values.
8. The method of claim 7, wherein step 310) further comprises: averaging all tendentiousness values of an affective word and taking the mean as the tendentiousness of that affective word.
9. The method of claim 7, wherein step 320) comprises: selecting strong affective words that are not influenced by context or field as seed affective words; for a sentiment analysis system aimed at a specific field, the strong affective words of that field are selected as seed affective words.
10. the described method of claim 7, wherein, step 330) comprising: according to described seed emotion set of words,, utilize the Search Results that obtains of search engine, calculate the tendentiousness of emotion speech, promptly based on the PMI-IR algorithm
I ( w , w i ) = log ( p ( w , w i ) p ( w ) * p ( w i ) )
SO ( w ) = Σ i = 1 n I ( w , P i ) - Σ i = 1 n I ( w , N i )
where $I(w, w_i)$ denotes the mutual information between affective word $w$ and affective word $w_i$, $p(w, w_i)$ denotes the probability that the two words co-occur, $P_i$ denotes a positive seed affective word, $N_i$ denotes a negative seed affective word, and $SO(w)$ denotes the affective tendentiousness value of affective word $w$.
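The two formulas can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: probabilities are estimated as hit counts divided by a total document count, the counts are supplied in a hypothetical `hits` table instead of coming from a real search engine, and a small smoothing constant `eps` (not in the claim) avoids taking the logarithm of zero.

```python
import math


def mutual_information(hits_joint, hits_w, hits_wi, total, eps=0.01):
    """I(w, w_i) = log(p(w, w_i) / (p(w) * p(w_i))) from hit counts."""
    p_joint = (hits_joint + eps) / total
    p_w = (hits_w + eps) / total
    p_wi = (hits_wi + eps) / total
    return math.log(p_joint / (p_w * p_wi))


def semantic_orientation(word, pos_seeds, neg_seeds, hits, total):
    """SO(w): sum of I over positive seeds minus sum over negative seeds."""
    def info(seed):
        joint = hits.get(frozenset((word, seed)), 0)
        return mutual_information(joint, hits[word], hits[seed], total)

    return sum(info(p) for p in pos_seeds) - sum(info(n) for n in neg_seeds)


# Hypothetical hit counts: single words keyed by string,
# co-occurrences keyed by a frozenset of the two words.
hits = {
    "superb": 1000, "good": 5000, "bad": 4000,
    frozenset(("superb", "good")): 400,
    frozenset(("superb", "bad")): 20,
}
so_superb = semantic_orientation("superb", ["good"], ["bad"], hits, 1_000_000)
print(so_superb)  # positive, since "superb" co-occurs mostly with "good"
```

With these counts the word co-occurs with the positive seed far more often than with the negative seed relative to chance, so the resulting value is positive.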
11. The method of claim 7, wherein step 340) further comprises: according to the seed affective word set, iterating over the sentiment corpus using coordinating conjunctions and adversative conjunctions to expand the affective word dictionary, thereby obtaining the tendentiousness of some or all of the affective words in the affective word list.
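One plausible reading of this iterative expansion is sketched below in Python. It assumes that a coordinating conjunction (such as "and") links words of the same polarity and that an adversative conjunction (such as "but") links words of opposite polarity. The triple format, the `max_rounds` limit, and all sample words are hypothetical.

```python
def propagate_by_conjunction(seed_polarity, linked_pairs, max_rounds=10):
    """Spread polarity from seed words across conjunction links.

    seed_polarity: dict mapping seed word to +1 or -1.
    linked_pairs:  list of (word_a, kind, word_b) found in the corpus,
                   where kind is "coordinating" or "adversative".
    """
    polarity = dict(seed_polarity)
    for _ in range(max_rounds):
        added = False
        for word_a, kind, word_b in linked_pairs:
            sign = 1 if kind == "coordinating" else -1
            for known, unknown in ((word_a, word_b), (word_b, word_a)):
                if known in polarity and unknown not in polarity:
                    polarity[unknown] = sign * polarity[known]
                    added = True
        if not added:  # stop once an iteration labels no new word
            break
    return polarity


links = [
    ("noisy", "coordinating", "flimsy"),
    ("sturdy", "adversative", "noisy"),
    ("good", "coordinating", "sturdy"),
]
expanded = propagate_by_conjunction({"good": 1, "bad": -1}, links)
print(expanded)
```

Words that never connect to a seed through any chain of conjunctions stay unlabeled, which matches the claim's wording that the step covers some or all of the affective words.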
12. A method for constructing an affective word dictionary, comprising:
Step 10): crawling review information with star ratings from the network;
Step 20): extracting the affective words from the review information;
Step 30): obtaining the tendentiousness of the affective words as a weighted sum of the tendentiousness calculated from the quantized star ratings, the tendentiousness obtained by applying the PMI-IR algorithm with the constructed seed affective word base, and the tendentiousness calculated from conjunction properties;
Step 40): classifying the list of affective words with their tendentiousness and outputting the affective word dictionary.
13. The method of claim 12, wherein step 40) further comprises: classifying the list of affective words with their tendentiousness, wherein an affective word whose tendentiousness value is greater than a threshold is taken as a positive affective word, an affective word whose tendentiousness value is less than a threshold is taken as a negative affective word, and the remaining affective words are taken as neutral, thereby obtaining the affective word dictionary.
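The classification of step 40) can be sketched as below. The claim only speaks of "a threshold", so using two separate cut-offs with the values 0.2 and -0.2 is an assumption, as are the function names and the sample scores.

```python
def classify(score, pos_threshold=0.2, neg_threshold=-0.2):
    """Label a tendentiousness value as positive, negative, or neutral."""
    if score > pos_threshold:
        return "positive"
    if score < neg_threshold:
        return "negative"
    return "neutral"


def build_dictionary(scores, pos_threshold=0.2, neg_threshold=-0.2):
    """Group affective words by the label of their combined score."""
    dictionary = {"positive": [], "negative": [], "neutral": []}
    for word, score in scores.items():
        label = classify(score, pos_threshold, neg_threshold)
        dictionary[label].append(word)
    return dictionary


# Hypothetical combined tendentiousness values.
lexicon = build_dictionary({"great": 0.74, "poor": -0.68, "okay": 0.04})
print(lexicon)
```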
CN 201010133149 2010-03-25 2010-03-25 Method for analyzing tendentiousness of affective words Pending CN101782898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010133149 CN101782898A (en) 2010-03-25 2010-03-25 Method for analyzing tendentiousness of affective words

Publications (1)

Publication Number Publication Date
CN101782898A (en) 2010-07-21

Family

ID=42522898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010133149 Pending CN101782898A (en) 2010-03-25 2010-03-25 Method for analyzing tendentiousness of affective words

Country Status (1)

Country Link
CN (1) CN101782898A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7181693B1 (en) * 2000-03-17 2007-02-20 Gateway Inc. Affective control of information systems
JP2007226460A (en) * 2006-02-22 2007-09-06 Just Syst Corp Data processor and data processing method
CN101639824A (en) * 2009-08-27 2010-02-03 北京理工大学 Text filtering method based on emotional orientation analysis against malicious information

Cited By (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102279890A (en) * 2011-09-02 2011-12-14 苏州大学 Sentiment word extracting and collecting method based on micro blog
CN103688256A (en) * 2012-01-20 2014-03-26 华为技术有限公司 Method, device and system for determining video quality parameter based on comment
CN102646128A (en) * 2012-03-06 2012-08-22 北京航空航天大学 Method for labeling word properties of emotional words based on extensible markup language (XML)
CN102855276A (en) * 2012-07-20 2013-01-02 北京大学 Method for judging polarity of comment text and application of method
CN103577452A (en) * 2012-07-31 2014-02-12 国际商业机器公司 Website server and method and device for enriching content of website
CN103593334A (en) * 2012-08-15 2014-02-19 中国电信股份有限公司 Method and system for judging emotional degree of text
CN103593334B (en) * 2012-08-15 2017-07-28 中国电信股份有限公司 A kind of method and system for being used to judge emotional degree of text
CN103885994A (en) * 2012-12-24 2014-06-25 腾讯科技(深圳)有限公司 Method and device for comparing products
CN103885994B (en) * 2012-12-24 2018-11-23 腾讯科技(深圳)有限公司 A kind of product control methods and device
CN103077207A (en) * 2012-12-28 2013-05-01 深圳先进技术研究院 Method and system for analyzing microblog happiness index
CN103077207B (en) * 2012-12-28 2016-09-07 深圳先进技术研究院 A kind of microblogging happy index analysis method and system
CN103150367A (en) * 2013-03-07 2013-06-12 宁波成电泰克电子信息技术发展有限公司 Method for analyzing emotional tendency of Chinese microblogs
CN103150367B (en) * 2013-03-07 2016-01-20 宁波成电泰克电子信息技术发展有限公司 A kind of Sentiment orientation analytical approach of Chinese microblogging
CN103559174A (en) * 2013-09-30 2014-02-05 东软集团股份有限公司 Semantic emotion classification characteristic value extraction method and system
CN103559174B (en) * 2013-09-30 2016-03-09 东软集团股份有限公司 Semantic emotion classification characteristic value extraction and system
CN104090864B (en) * 2014-06-09 2018-02-06 合肥工业大学 A kind of sentiment dictionary is established and affection computation method
CN104090864A (en) * 2014-06-09 2014-10-08 合肥工业大学 Emotion dictionary building and emotion calculation method
CN104199875A (en) * 2014-08-20 2014-12-10 百度在线网络技术(北京)有限公司 Search recommending method and device
CN104199875B (en) * 2014-08-20 2017-10-27 百度在线网络技术(北京)有限公司 Method and device is recommended in one kind search
CN104731770A (en) * 2015-03-23 2015-06-24 中国科学技术大学苏州研究院 Chinese microblog emotion analysis method based on rules and statistical model
CN105243054B (en) * 2015-09-23 2017-12-29 中国传媒大学 A kind of TV programme satisfaction subjective evaluation method and construction system
CN105243054A (en) * 2015-09-23 2016-01-13 中国传媒大学 Television program satisfaction subjective evaluation method and construction system
CN106598938B (en) * 2015-10-16 2019-12-10 北京国双科技有限公司 Method and device for determining document emotion tendentiousness
CN106598938A (en) * 2015-10-16 2017-04-26 北京国双科技有限公司 Method and device for determining emotion tendencies of documents
CN105279148B (en) * 2015-10-19 2018-05-11 昆明理工大学 A kind of APP software users comment on uniformity determination methods
CN105279148A (en) * 2015-10-19 2016-01-27 昆明理工大学 User review consistency judgment method of APP (Application) software
CN105868185A (en) * 2016-05-16 2016-08-17 南京邮电大学 Part-of-speech-tagging-based dictionary construction method applied in shopping comment emotion analysis
CN106021433A (en) * 2016-05-16 2016-10-12 北京百分点信息科技有限公司 Public praise analysis method and apparatus for product review data
CN106021433B (en) * 2016-05-16 2019-05-10 北京百分点信息科技有限公司 A kind of the public praise analysis method and device of comment on commodity data
CN106022878A (en) * 2016-05-19 2016-10-12 华南理工大学 Community comment emotion tendency analysis-based mobile phone game ranking list construction method
CN106202032B (en) * 2016-06-24 2018-08-28 广州数说故事信息科技有限公司 A kind of sentiment analysis method and its system towards microblogging short text
CN106202032A (en) * 2016-06-24 2016-12-07 广州数说故事信息科技有限公司 A kind of sentiment analysis method towards microblogging short text and system thereof
CN106610955A (en) * 2016-12-13 2017-05-03 成都数联铭品科技有限公司 Dictionary-based multi-dimensional emotion analysis method
CN107688630B (en) * 2017-08-21 2020-05-22 北京工业大学 Semantic-based weakly supervised microblog multi-emotion dictionary expansion method
CN107688630A (en) * 2017-08-21 2018-02-13 北京工业大学 A kind of more sentiment dictionary extending methods of Weakly supervised microblogging based on semanteme
CN107908782A (en) * 2017-12-06 2018-04-13 陕西识代运筹信息科技股份有限公司 A kind of data processing method and device based on sentiment analysis
CN108197106A (en) * 2017-12-29 2018-06-22 深圳市中易科技有限责任公司 A kind of product competition analysis method based on deep learning, apparatus and system
CN108197106B (en) * 2017-12-29 2021-07-13 深圳市中易科技有限责任公司 Product competition analysis method, device and system based on deep learning
CN110110217A (en) * 2018-02-02 2019-08-09 优视科技有限公司 The emotional orientation analysis and information recommendation method and device of a kind of pair of information
CN108848300A (en) * 2018-05-08 2018-11-20 百度在线网络技术(北京)有限公司 Method and apparatus for output information
CN108958710A (en) * 2018-07-05 2018-12-07 北方工业大学 Method for extracting covariance correlation of project progress based on emotional factors
CN108958710B (en) * 2018-07-05 2021-07-16 北方工业大学 Method for extracting covariance correlation of project progress based on emotional factors
CN109657045A (en) * 2018-12-20 2019-04-19 东软集团股份有限公司 A kind of method, apparatus, storage medium and processor obtaining vocabulary emotional value
CN109657045B (en) * 2018-12-20 2021-01-05 东软集团股份有限公司 Method and device for acquiring vocabulary emotion value, storage medium and processor
CN109977396A (en) * 2019-02-18 2019-07-05 深圳壹账通智能科技有限公司 Emotion identification method, device, computer equipment and the computer storage medium of corpus participle
CN110781662A (en) * 2019-10-21 2020-02-11 腾讯科技(深圳)有限公司 Method for determining point-to-point mutual information and related equipment
CN110781662B (en) * 2019-10-21 2022-02-01 腾讯科技(深圳)有限公司 Method for determining point-to-point mutual information and related equipment
CN111310455A (en) * 2020-02-11 2020-06-19 安徽理工大学 New emotion word polarity calculation method for online shopping comments
CN115659961A (en) * 2022-11-01 2023-01-31 广东美云智数科技有限公司 Method, apparatus and computer storage medium for extracting text viewpoints
CN115659961B (en) * 2022-11-01 2023-08-04 美云智数科技有限公司 Method, apparatus and computer storage medium for extracting text views
CN116189064A (en) * 2023-04-26 2023-05-30 中国科学技术大学 Barrage emotion analysis method and system based on joint model
CN116189064B (en) * 2023-04-26 2023-08-29 中国科学技术大学 Barrage emotion analysis method and system based on joint model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20100721