CN104142913A - Distinguishing method and distinguishing system for polarities of words and expressions - Google Patents


Info

Publication number
CN104142913A
Authority
CN
China
Prior art keywords
word
neologisms
polarity
words
corpus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310165049.8A
Other languages
Chinese (zh)
Inventor
张磊
张玄
尚磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date


Abstract

The invention relates to natural language processing and discloses a method and a system for discriminating the polarity of words, with the aim of discriminating the sentiment polarity of new words. Taking a time-sensitive corpus as the basis, a group of words having a certain co-occurrence rate with the new word is obtained; the polarity of each word in the group is analyzed and aggregated to obtain the polarity of the group, from which the sentiment orientation of the new word is comprehensively determined. The technique can be applied to opinion mining, product evaluation and similar tasks, so that relevant information (such as commodities, friends and news) can be recommended. The invention also provides a system implementing the functions of the method.

Description

Method and system for discriminating word polarity
Technical field
The present invention relates to the discrimination of word polarity, and in particular to judging the sentiment orientation of new words. More specifically, the present invention relates to a method and a system for detecting the sentiment orientation of new words.
Background art
With the growing popularity of SNS (Social Networking Services), people publish all kinds of opinions about people, events and products on platforms such as microblogs and forums. To process this information effectively and discover people's attitudes and opinions, text sentiment analysis is required.
However, in today's era of personalization, new words are constantly being coined and some old words are given new meanings; at the same time, outbreaks of online events cause many proper nouns, such as personal names, to emerge and take on strong emotional coloring. How to detect the sentiment orientation of these new words, or of old words with new meanings, has therefore become a practical problem in tracking public opinion.
To address this problem, the prior art includes PMI-based methods for computing word sentiment polarity: 1. Patent document: "A sentiment dictionary construction method and system" [201210138364.7]; 2. Paper: "Word sentiment polarity calculation based on HowNet and PMI", Computer Engineering, 2012.08.
These known techniques mainly use PMI (pointwise mutual information) to determine the polarity of a word. The method first selects a number of benchmark words, some commendatory and some derogatory. The commendatory or derogatory tendency of a new word is then determined by computing its co-occurrence probability with these benchmark words in a corpus. Suppose the commendatory benchmark words are WordSet1 = {commendatory_1, commendatory_2, ..., commendatory_n} and the derogatory benchmark words are WordSet2 = {derogatory_1, derogatory_2, ..., derogatory_n}. For a word Word, the PMI-based word polarity SO_PMI(Word) is:
$$\mathrm{SO\_PMI}(Word) = \sum_{i=1}^{n} \mathrm{PMI}(Word, commendatory_i) - \sum_{i=1}^{n} \mathrm{PMI}(Word, derogatory_i)$$
PMI is computed as:
$$\mathrm{PMI}(Word_1, Word_2) = \log \frac{P(Word_1 \,\&\, Word_2)}{P(Word_1)\, P(Word_2)}$$
where P(Word_1) is the probability that Word_1 occurs on its own in the corpus, P(Word_2) is the probability that Word_2 occurs on its own in the corpus, and P(Word_1 & Word_2) is the probability that Word_1 and Word_2 occur together in the corpus.
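The PMI-based baseline described above can be sketched in a few lines. This is only an illustration under stated assumptions: probabilities are estimated from sentence-level presence, the logarithm is taken to base 2, and an undefined PMI (no co-occurrence at all) is treated as contributing 0. The function names `pmi` and `so_pmi` are hypothetical.

```python
import math

def pmi(word1, word2, sentences):
    """PMI estimated from sentence-level presence; None when it is undefined."""
    n = len(sentences)
    c1 = sum(1 for s in sentences if word1 in s)
    c2 = sum(1 for s in sentences if word2 in s)
    c12 = sum(1 for s in sentences if word1 in s and word2 in s)
    if n == 0 or c1 == 0 or c2 == 0 or c12 == 0:
        return None  # the words never co-occur, so the ratio cannot be used
    return math.log2((c12 / n) / ((c1 / n) * (c2 / n)))

def so_pmi(word, positive_seeds, negative_seeds, sentences):
    """Sum of PMI with positive seeds minus sum of PMI with negative seeds."""
    # Assumption: an undefined PMI contributes 0 to the sum.
    pos = sum(pmi(word, seed, sentences) or 0.0 for seed in positive_seeds)
    neg = sum(pmi(word, seed, sentences) or 0.0 for seed in negative_seeds)
    return pos - neg

# Each sentence is a set of already segmented tokens.
sentences = [{"a", "good"}, {"a", "good"}, {"b", "bad"}, {"c"}]
score_seen = so_pmi("a", ["good"], ["bad"], sentences)
score_unseen = so_pmi("brand_new", ["good"], ["bad"], sentences)
```

A word that never appears next to any benchmark word ends up with a score of 0, which is the failure mode for freshly coined words that the following section discusses.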
Prior art documents
Patent document 1: "A sentiment dictionary construction method and system" [201210138364.7]
Non-patent document 1: "Word sentiment polarity calculation based on HowNet and PMI", Computer Engineering, 2012.08
Problem to be solved by the invention
However, the existing known techniques have the following problems.
First, for new words, especially words that burst onto the network suddenly, the probability of occurrence in an older corpus is very small or even zero. When co-occurrence probabilities are computed, the PMI may then come out as 0, so the discrimination fails.
Second, on the network new words often appear alongside other new words, which complement and explain one another. For example, the word with the highest co-occurrence with the new word "shanzhai" (knockoff; negative) is "leiren" (shocking; negative), which is itself a new word. In the known techniques the ordinary words with high co-occurrence are comparatively scarce, which easily leads to misjudgment. As another example, because of irony and self-mockery, the word with the highest co-occurrence with "diaosi" (a self-deprecating term for a loser) is "gao fu shuai" (tall, rich and handsome), so the known techniques would yield exactly the opposite polarity.
Third, names of people and places and other proper nouns arising from trending online events also carry the sentiment of public opinion behind them, for example "house sister", "cousin" and "peekaboo"; the known techniques may assign them a polarity completely opposite to that of the online event. A noun such as "cousin", which denotes a kinship relation, has no polarity in itself, but in the online event it serves as negative slang.
All of the above problems lead to misjudgments and missed judgments. The word sentiment orientation obtained directly by the prior art is therefore inaccurate.
Summary of the invention
The present invention is proposed in view of the deficiencies of the prior art. To address the problems of the known techniques described above, the invention uses a time-sensitive corpus as its basis, obtains a group of words (which may include other co-occurring new words) that have a certain co-occurrence rate with the new word, analyzes the polarity of each word in the group, and aggregates these into the polarity of the group, thereby comprehensively determining the sentiment orientation of the new word.
The word polarity discrimination method according to one aspect of the present invention comprises the following steps.
Corpus construction step: network text information within a certain period of time is selected as the corpus.
New word acquisition step: the new word to be discriminated is obtained; that is, the corpus resources are segmented sentence by sentence and new words are obtained by an existing method such as information gain (IG).
Co-occurrence word collection step: based on the corpus, the words whose co-occurrence rate with the new word is greater than a threshold α are obtained and formed into a word set. Specifically, the words in the corpus whose co-occurrence rate with the new word exceeds the threshold are selected to form a co-occurrence word set, which stores each co-occurring word and its co-occurrence frequency as key-value pairs.
Word polarity discrimination step: the polarity of each word in the word set is judged first. Each word in the set is selected in turn and checked to see whether it is a new word. If it is not, its polarity is obtained directly from the sentiment dictionary. If it is, its polarity must itself be determined by the method of the invention. To prevent an endless loop during this processing, in which the initially processed new word and the new word currently being processed iterate on each other, an iteration controller is provided to detect this situation; once the iteration controller is triggered, the new word being processed is deleted, which removes its influence on the polarity of the original new word. Once the polarity of each word in the set has been judged, the sum of the co-occurrence degrees of all positive words and the sum of the co-occurrence degrees of all negative words are computed, and the relationship between the positive and negative words is weighed to obtain the polarity of the word set, which is taken as the polarity of the new word.
Sentiment dictionary maintenance step: the discriminated new word and its sentiment polarity are added to the sentiment dictionary, thereby expanding it.
The word polarity discrimination system according to one aspect of the present invention comprises: a corpus construction module, which collects network text information within a certain time window and assembles it into a corpus; a new word acquisition module, which segments the sentences in the corpus and finds new words using the prior art; a co-occurrence word collection module, which computes the co-occurrence rate between the words in the corpus and the new word and selects the words whose co-occurrence rate is greater than a threshold to form a word set; a word polarity discrimination module, which discriminates the polarity of each word in the word set and then obtains the polarity of the new word according to an aggregate formula, using the sentiment dictionary during discrimination to judge whether each word in the set is present in the dictionary, i.e. whether it is a new word: if the word is present, its polarity is obtained directly from the dictionary, and if it is not, the word is processed iteratively according to the corresponding steps of the method of the invention to obtain its polarity; a sentiment dictionary update module, which adds the new word obtained and its polarity to the dictionary; and a user interface module, which receives the user's system settings and displays the new word and its polarity.
Effects of the invention
According to the present invention, the polarity of a new word can be determined jointly by a group of words that co-occur with it at a high rate, which avoids the problem of insufficient reference words. The sentiment dictionary can also be improved and updated continuously and automatically. Based on the result, the sentiment orientation of users in sentences containing the new word can be judged, enabling tasks such as recommending and evaluating related information (commodities, friends, news, and so on).
Brief description of the drawings
Fig. 1 is a block diagram of the new-word sentiment polarity discrimination system.
Fig. 2 is a flowchart of the new-word sentiment polarity discrimination method.
Fig. 3 is a flowchart of the word-set polarity discrimination method.
Embodiments
An embodiment is disclosed below and the present invention is described in more detail with reference to the drawings, but the present invention is not limited to this embodiment.
As shown in Fig. 1, the word polarity discrimination system of this embodiment (also called a word sentiment orientation quantification system) comprises: a corpus construction module 101, a new word acquisition module 102, a co-occurrence word collection module 103, a word polarity discrimination module 104, a sentiment dictionary update module 105 and a user interface module 106.
The corpus construction module 101 collects network text information within a certain time window. If the user has entered in advance the new word to be detected, the texts are filtered using the new word as a keyword, and the resulting texts are assembled into a corpus. The time window may be in units such as days or months, or a time unit set by the user. Content can be fetched through APIs provided by websites, or texts can be collected with a web crawler. The collected network text information is saved to local storage as the corpus.
The new word acquisition module 102 segments the sentences of the corpus and finds new words using the prior art. Sentences are segmented with suitable software such as ICTCLAS; the latest version of ICTCLAS includes a new word discovery function whose results can be used directly. Alternatively, new word discovery can be carried out independently. A common method is information gain: after segmentation, the co-occurrence probability of adjacent characters or words is computed, the information gain is derived from it, and this determines whether the candidate is a new word. One or more new words may be found; when there are several, they are simply discriminated one by one according to the corresponding steps of the method of the invention.
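The text does not spell out the information gain calculation, so the sketch below swaps in a simpler stand-in: it scores adjacent token pairs by PMI and proposes the merged pair as a candidate new word when the score is high and the merged form is not already known. The function name `candidate_new_words` and the thresholds `min_count` and `min_score` are hypothetical, and this is not the information gain method itself.

```python
import math
from collections import Counter

def candidate_new_words(segmented_sentences, known_words, min_count=2, min_score=1.0):
    """Propose merged adjacent tokens whose adjacency PMI is at least min_score."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in segmented_sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs only
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    candidates = {}
    for (a, b), count in bigrams.items():
        merged = a + b
        if count < min_count or merged in known_words:
            continue
        p_pair = count / n_bi
        p_a, p_b = unigrams[a] / n_uni, unigrams[b] / n_uni
        score = math.log2(p_pair / (p_a * p_b))
        if score >= min_score:
            candidates[merged] = score
    return candidates

# Toy input: the segmenter split an unknown word into "biao" + "ge".
sentences = [["biao", "ge", "laughs"], ["biao", "ge", "again"], ["nice", "day", "today"]]
found = candidate_new_words(sentences, known_words=set())
```

In practice the segmenter's own new word discovery, which the text mentions, would usually replace a hand-rolled score like this one.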
The co-occurrence word collection module 103 takes the segmentation results produced by the new word acquisition module 102 for the sentences in the corpus, computes the co-occurrence rate of each of these words with the new word, and selects the words whose co-occurrence rate is greater than a threshold to form a word set. The threshold α is an empirical value that can be determined through a number of experiments. The co-occurrence rate is expressed in terms of the following quantities: P(Word_1) is the probability that Word_1 occurs on its own in the corpus, P(Word_2) is the probability that Word_2 occurs on its own in the corpus, and P(Word_1 & Word_2) is the probability that Word_1 and Word_2 occur together in the corpus, that is, within the same sentence.
The word polarity discrimination module 104 discriminates the polarity of each word in the word set and then obtains the polarity of the new word according to an aggregate formula. In detail, the sentiment dictionary is used to judge, word by word, whether each word in the set is present in the dictionary, i.e. whether it is a new word. If the word is present, its polarity is obtained directly from the dictionary; if it is absent, the word is itself a new word and must be processed iteratively according to the corresponding steps of the method of the invention to obtain its polarity. The aggregate formula is:
$$S = \frac{Pos\{\sum f\} - Neg\{\sum f\}}{Pos\{\sum f\} + Neg\{\sum f\} + Neu\{\sum f\}}$$
where S is the final result, Pos{∑f} is the sum of the co-occurrence rates of all positive words, Neg{∑f} is the sum of the co-occurrence rates of all negative words, and Neu{∑f} is the sum of the co-occurrence rates of all neutral words. To ensure the accuracy of the discrimination, a threshold β is introduced into the calculation. This is an empirical value: when the absolute value of the difference between the positive and negative co-occurrence rates is not greater than β, the result is set to neutral. In particular, when β is 0, the polarity of the new word is decided directly by whichever of the positive and negative classes has the larger co-occurrence rate.
The sentiment dictionary update module 105 saves the new word and its polarity, as obtained by the word polarity discrimination module 104, to the sentiment dictionary. The sentiment dictionary is both the basis for distinguishing new words from old ones and the source from which word polarities are retrieved.
The user interface module 106 mainly provides the interface through which the user configures the system and enters input, and on which the new word and its polarity are displayed.
The flow of the word polarity discrimination method (word sentiment orientation quantification method) of this embodiment is described below.
Fig. 2 is a flowchart of the new-word sentiment polarity discrimination method and Fig. 3 is a flowchart of the word-set polarity discrimination method; the description refers to both.
(1) First, it is judged whether the user has directly entered a new word. If so, that word is detected directly; otherwise the system must find new words automatically (step S100 of Fig. 2). When the user directly enters a new word, it is necessary to check whether the word is already in the sentiment dictionary; if it is, its state is set to 1, which corresponds to detecting an old word with a new meaning.
Table 1 below is a concrete example of the sentiment dictionary. Column 1 records the word, column 2 records the polarity of the word's sentiment, and column 3 records its state (1 if the word was entered directly by the user as a new word, otherwise 0 by default). The present invention is of course not limited to this example.
[Table 1] Sentiment dictionary
Word          Polarity   State
Good          Positive   0
Amiable       Positive   0
Succinct      Positive   0
...
Sorrow        Negative   0
Bad           Negative   0
Profound      Negative   0
Embarrassing  Negative   1
...
I             Neutral    0
Cousin        Neutral    1
...
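One way to hold such a dictionary in memory is a mapping from word to a small record carrying polarity and state. The sketch below is an assumption about representation only; the names `Entry`, `sentiment_dict` and `mark_user_input` are hypothetical and the entries are a subset of Table 1.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    polarity: str   # "positive", "negative" or "neutral"
    state: int = 0  # 1 = entered by the user for re-evaluation (old word, new meaning)

sentiment_dict = {
    "good": Entry("positive"),
    "bad": Entry("negative"),
    "embarrassing": Entry("negative", 1),
    "cousin": Entry("neutral", 1),
}

def mark_user_input(word, dictionary):
    """Set state to 1 if a user-entered word already exists; report whether it did."""
    entry = dictionary.get(word)
    if entry is None:
        return False  # a genuinely new word: nothing to mark yet
    entry.state = 1
    return True

was_known = mark_user_input("good", sentiment_dict)
was_unknown = mark_user_input("Zhou so-and-so", sentiment_dict)
```

Keeping the state next to the polarity lets later steps treat a flagged old word exactly like a word that is missing from the dictionary.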
(2) Network text information is collected, a corpus is constructed and the text is segmented.
The collection time window is set by the user, for example in units such as hours, days or weeks. Texts can be obtained from network media such as SNS and news websites; they can be collected through APIs provided by the websites or with a custom crawler program.
The concrete steps are as follows:
(2.1) If it is judged in step S100 that the user has not entered a new word (that is, new words must be detected automatically), all texts in the time window are collected (step S101 of Fig. 2).
(2.2) If it is judged in step S100 that the new word was entered directly by the user, the texts in the time window are filtered using the new word as a keyword, and the texts containing the new word are collected (step S102 of Fig. 2).
The corpus is Corpus = {Message_1, Message_2, ..., Message_N}, where N is a natural number representing the number of texts in the corpus.
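Steps (2.1) and (2.2) amount to a time-window filter with an optional keyword filter. A minimal sketch follows, assuming the collected messages are already available as (timestamp, text) pairs; fetching them through a site API or crawler is left out, and the function name `build_corpus` is hypothetical.

```python
from datetime import datetime

def build_corpus(messages, start, end, keyword=None):
    """Keep texts inside [start, end]; if a keyword is given, keep only texts containing it."""
    corpus = []
    for timestamp, text in messages:
        if not (start <= timestamp <= end):
            continue  # outside the time window
        if keyword is not None and keyword not in text:
            continue  # step (2.2): the user supplied the new word
        corpus.append(text)
    return corpus

messages = [
    (datetime(2012, 9, 2), "cousin Yang laughed"),
    (datetime(2012, 9, 3), "nice weather"),
    (datetime(2012, 10, 1), "cousin again"),
]
window = (datetime(2012, 9, 1), datetime(2012, 9, 8))
full_corpus = build_corpus(messages, *window)                        # step (2.1)
filtered_corpus = build_corpus(messages, *window, keyword="cousin")  # step (2.2)
```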
(2.3) Then, each text in the corpus is segmented sentence by sentence to obtain a segmented word set. An existing tool such as ICTCLAS can be used for segmentation. If new words must be captured automatically, the latest version of ICTCLAS (the 2013 version) can be used, or new words can be found by a method such as information gain. Note that during segmentation a stop-word dictionary is used to filter out words that recur in large numbers and carry no practical meaning, such as "this", "that", "you" and "I", to simplify subsequent processing. Afterwards, the sentiment dictionary is consulted to judge, word by word, whether each word in the segmented word set needs polarity detection. A word that is not in the dictionary needs detection. A word that is in the dictionary with a "state" of 1 also needs detection.
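The two checks at the end of step (2.3) can be sketched as follows. Segmentation itself is assumed to have been done already by an external tool, so the input is a list of tokens; the dictionary is represented as a mapping from word to a (polarity, state) tuple, and the names `STOP_WORDS`, `filter_stop_words` and `needs_detection` are hypothetical.

```python
STOP_WORDS = {"this", "that", "you", "I"}

def filter_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop frequent words that carry no practical meaning."""
    return [t for t in tokens if t not in stop_words]

def needs_detection(word, dictionary):
    """A word needs polarity detection if it is absent or flagged with state 1."""
    entry = dictionary.get(word)
    return entry is None or entry[1] == 1

dictionary = {"good": ("positive", 0), "cousin": ("neutral", 1)}
tokens = filter_stop_words(["I", "saw", "this", "cousin"])
to_detect = [t for t in tokens if needs_detection(t, dictionary)]
```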
(3) Then, the words in the corpus whose co-occurrence rate is higher than the threshold α are found and formed into a word set (step S103 of Fig. 2). The method is as follows: using the segmentation results obtained for the sentences of the corpus in step (2), the co-occurrence rate of each of these words with the new word is computed, and the words whose co-occurrence rate is greater than the threshold are selected to form the word set. The threshold α is an empirical value that can be determined through a number of experiments. The co-occurrence rate is expressed in terms of the following quantities: P(Word_1) is the probability that Word_1 occurs on its own in the corpus, P(Word_2) is the probability that Word_2 occurs on its own in the corpus, and P(Word_1 & Word_2) is the probability that Word_1 and Word_2 occur together in the corpus (that is, within the same sentence). The word set finally formed is Coword_list = {(w_1, f_1), (w_2, f_2), ..., (w_n, f_n)}, where w_i is a co-occurring word and f_i is its co-occurrence rate.
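A rough sketch of building Coword_list is given below. It assumes the co-occurrence rate is simply the number of sentences in which a word appears together with the new word, which is how the worked examples later in the description report it; a probability-based rate could be substituted. The function name `collect_cowords` is hypothetical.

```python
from collections import Counter

def collect_cowords(new_word, segmented_sentences, alpha):
    """Return {co-word: sentence-level co-occurrence count} for counts above alpha."""
    counts = Counter()
    for tokens in segmented_sentences:
        unique = set(tokens)  # count each word once per sentence
        if new_word in unique:
            counts.update(unique - {new_word})
    return {word: f for word, f in counts.items() if f > alpha}

sentences = [
    ["cousin", "corrupt official", "Yang"],
    ["cousin", "corrupt official"],
    ["cousin", "accident"],
    ["weather", "accident"],
]
coword_list = collect_cowords("cousin", sentences, alpha=1)
all_cowords = collect_cowords("cousin", sentences, alpha=0)
```

Storing the result as a key-value mapping mirrors the (w_i, f_i) pairs of Coword_list.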
(4) Then, the words in Coword_list are extracted and the orientation, i.e. the polarity, of each word in Coword_list is determined (step S104 of Fig. 2).
The concrete steps for determining the polarity of each word in Coword_list are described here with reference to Fig. 3:
(4.1) The words W_i in Coword_list are extracted one by one, and each W_i is judged to see whether it is a new word, that is, whether it is present in the sentiment dictionary (step S401 of Fig. 3).
(4.2) If W_i is in the sentiment dictionary, its polarity is obtained directly and the word is placed in the set for the corresponding polarity (step S404 of Fig. 3).
(4.3) If W_i is not in the sentiment dictionary, then W_i is itself a new word and must also be discriminated according to the method of the invention. To prevent an endless loop, that is, new words iterating on one another, an iteration controller is provided in the program. The iteration controller stores the new word handled by the main process and the new word handled by the iterative process in the form of a Word-Pair, i.e. <Nword, W_i> (step S402 of Fig. 3).
(4.4) If the word W_i being processed is found to be already in a Word-Pair, W_i is deleted from the word set (step S405 of Fig. 3).
(4.5) If no such word pair exists, the main process is started iteratively for this word from step S103 (step S403 of Fig. 3). To prevent too many iterations, an upper limit on the iteration depth can also be set, for example at most 7 levels; iteration stops when the limit is reached.
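Steps (4.1) to (4.5) can be read as a depth-limited recursion with a loop guard. The sketch below is one possible reading, not the patented implementation: instead of an explicit Word-Pair store it tracks the words currently being judged, and it drops a co-word as soon as that co-word's own co-occurrence set leads back to one of them, which mirrors how "Yang" is handled in Example 1. For brevity the final decision uses only the sign of the positive minus negative sums (β = 0). The name `judge_polarity` and the data layout are hypothetical.

```python
def judge_polarity(word, cowords_of, dictionary, in_progress=frozenset(),
                   depth=0, max_depth=7):
    """cowords_of: word -> {co-word: rate}; dictionary: settled word -> polarity."""
    in_progress = in_progress | {word}
    totals = {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    for coword, rate in cowords_of.get(word, {}).items():
        if coword in dictionary:
            polarity = dictionary[coword]                    # S404: look it up
        elif depth >= max_depth or in_progress & set(cowords_of.get(coword, {})):
            continue                                         # S405: drop the co-word
        else:
            polarity = judge_polarity(coword, cowords_of, dictionary,
                                      in_progress, depth + 1, max_depth)  # S403
        totals[polarity] += rate
    diff = totals["positive"] - totals["negative"]
    return "positive" if diff > 0 else "negative" if diff < 0 else "neutral"

cowords_of = {
    "cousin": {"corrupt official": 3, "Yang": 3, "corruption": 2,
               "traffic accident": 1, "accident": 1},
    "Yang": {"cousin": 3, "chief": 2},
}
dictionary = {"corrupt official": "negative", "corruption": "negative",
              "traffic accident": "neutral", "accident": "neutral"}
result = judge_polarity("cousin", cowords_of, dictionary)
```

Passing the guard set down as an immutable frozenset keeps sibling branches of the recursion from interfering with each other.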
(5) Next, returning to the flowchart of Fig. 2, how the polarity of the new word is determined is explained.
After the positive word set, the negative word set and the neutral word set have been obtained, the polarity of the new word is determined according to the following aggregate formula (step S105 of Fig. 2):
$$S = \frac{Pos\{\sum f\} - Neg\{\sum f\}}{Pos\{\sum f\} + Neg\{\sum f\} + Neu\{\sum f\}}$$
where S is the final result, Pos{∑f} is the sum of the co-occurrence rates of all positive words, Neg{∑f} is the sum of the co-occurrence rates of all negative words, and Neu{∑f} is the sum of the co-occurrence rates of all neutral words. To ensure the accuracy of the discrimination, a threshold β is introduced into the calculation. This is an empirical value: when the absolute value of the difference between the positive and negative co-occurrence rates is not greater than β, the result is set to neutral. In particular, when β is 0, the polarity of the new word is decided directly by whichever of the positive and negative classes has the larger co-occurrence rate.
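The aggregation in step (5) can be sketched as below. The form S = (Pos - Neg) / (Pos + Neg + Neu) is an assumption inferred from the figures used in Example 1, and the decision rule (neutral when |S| <= β, positive when S > β, negative when S < -β) follows the wording of claim 2. The function name `aggregate_polarity` is hypothetical.

```python
def aggregate_polarity(pos_sum, neg_sum, neu_sum, beta=0.0):
    """Return (S, label) from the summed co-occurrence rates of each polarity class."""
    total = pos_sum + neg_sum + neu_sum
    if total == 0:
        return 0.0, "neutral"  # empty word set: nothing to decide on
    s = (pos_sum - neg_sum) / total
    if s > beta:
        label = "positive"
    elif s < -beta:
        label = "negative"
    else:
        label = "neutral"      # |S| <= beta
    return s, label

s_value, label = aggregate_polarity(pos_sum=0, neg_sum=5, neu_sum=2)
_, balanced = aggregate_polarity(pos_sum=1, neg_sum=1, neu_sum=0)
_, cautious = aggregate_polarity(pos_sum=0, neg_sum=5, neu_sum=2, beta=0.8)
```

Raising β makes the classifier more conservative: the same sums that give a negative label at β = 0 fall back to neutral once β exceeds |S|.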
(6) Finally, the discrimination result is returned and the detected new word is added to the sentiment dictionary (step S106 of Fig. 2).
The embodiment is further elaborated below with reference to concrete examples. Note that the following examples are given only for convenience of explanation, and the present invention is not limited to them.
(Example 1)
Example 1 is a concrete example in which the user actively enters the sentiment word to be detected.
(Example 1-1) Microblog user A enters the network new word "cousin" and sets the time window to the week "2012-9-1 to 2012-9-8". (Note: during this period microblogs extensively discussed a certain Director Yang (real name withheld; the same applies below), who was seen smiling at the scene of a car accident, which angered netizens; it was subsequently dug up that he owned a large number of luxury watches, and he was suspected of corruption.)
1. The sentiment dictionary is searched; "cousin" is found to exist already (an old word with a new meaning), so its state field is set to 1 (see Table 1).
2. All microblog posts in the time window are collected. They are filtered by the keyword "cousin", and the posts containing the keyword are used to build the corpus.
A few example posts:
A. "For some reason I keep thinking of 'cousin' Yang. He may be the most tragic corrupt official in Chinese history: legend has it that one world-famous laugh was enough to laugh himself into prison. This may well be the miracle of miracles in China today!"
B. "One glance tells you this guy is a corrupt official; just looking at him is sickening. You can picture 'cousin' Yang and a certain Lei (note: the central figure of a nude photo scandal) together without any Photoshop."
C. "After Director Yang's apparent grin at the scene of a terrible car accident enraged netizens and a crowd-sourced search exposed him as 'cousin', some thought he was protected well enough to get through, but he could not withstand the netizens' relentless pursuit and finally became the republic's first corrupt official to fall because of a 'laugh'."
3. The corpus content is segmented sentence by sentence, and the co-occurrence rate with "cousin" is computed.
4. For convenience of explanation, the threshold α is taken here to mean keeping the five words with the highest co-occurrence, which gives:
Cousin = {(corrupt official, 3), (Yang, 3), (corruption, 2), (traffic accident, 1), (accident, 1)}
5. The polarity of each word in this set is judged. It is easily seen that the negative word set is {corrupt official, corruption}, the positive word set is ∅ (empty), and the neutral word set is {traffic accident, accident}.
When "Yang" is processed, it is not present in the sentiment dictionary, so it must also be discriminated as a new word. The co-occurrence word set found for "Yang" is Yang = {(cousin, f1), (chief, f2), ...}. This forms the word pair <cousin, Yang>, so the program deletes "Yang".
6. The polarity of the "cousin" word set is calculated. For simplicity, the threshold β is set to 0. It is easily seen that S = (0 - (3 + 2)) / (3 + 2 + 1 + 1) = -5/7 < 0, so "cousin" is a negative word.
7. The result is returned to the user and displayed, and the sentiment dictionary is updated to mark "cousin" as negative.
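Written out term by term, and assuming the aggregate has the form S = (Pos - Neg) / (Pos + Neg + Neu) that the figures in step 6 imply, the calculation for "cousin" runs as follows. "Yang" does not appear in any sum because it was deleted from the word set in step 5.

```latex
\begin{aligned}
Pos\{\textstyle\sum f\} &= 0 \\
Neg\{\textstyle\sum f\} &= 3 + 2 = 5 \\
Neu\{\textstyle\sum f\} &= 1 + 1 = 2 \\
S &= \frac{0 - 5}{0 + 5 + 2} = -\frac{5}{7} \approx -0.714
\end{aligned}
```

With β = 0 the result satisfies S < -β, so the word is classified as negative.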
(Example 2)
Example 2 is an example in which the user does not enter a keyword, and the system must identify the keyword automatically and obtain its polarity.
(Example 2) Microblog user B sets the time window to "2013-3-6 to 2013-3-7". (Note: at this time microblogs were extensively discussing the incident in which a car was stolen somewhere and the baby inside it was killed.)
The relevant microblog posts include the following:
A. "Zhou so-and-so (real name withheld; the same applies below) stole the car at around 7 o'clock and murdered the child at around 8. Utterly inhuman; he must get the death penalty. Some people still swear that this is hype, which leaves one speechless."
B. "Sometimes animals are far better than people. Take Zhou so-and-so: call you a beast, and even the beasts would object."
C. "Heartless Zhou so-and-so murdered the little one. After the heartache comes regret! He was only 2 months old! How could you bring yourself to do it? How can your conscience be at peace?"
1. By collecting the texts in this time window, the content above (among other content) is obtained and forms the corpus.
2. The new word "Zhou so-and-so" is found using the prior art; this word is not present in the sentiment dictionary.
3. The words that co-occur with "Zhou so-and-so" are retrieved from the corpus. As in the previous example, α is set to keep the top five. The resulting word set is: Zhou so-and-so = {(murder, 2), (beast, 2), (utterly inhuman, 1), (heartless, 1), (steal, 1)}
4. The polarity of each word in the set is detected. It is easily seen that the negative word set is {murder, beast, utterly inhuman, heartless, steal} and that the neutral and positive word sets are empty. The polarity of "Zhou so-and-so" is therefore negative.
5. The result is returned to the user and displayed, and "Zhou so-and-so" is added to the sentiment dictionary with its polarity set to negative.
Embodiments and concrete examples of the present invention have been described above, but the present invention is not limited to them. Various modifications and variations of the above embodiments that a person skilled in the art can conceive, made without departing from the gist of the invention, that is, the meaning of the wording of the claims, are also included in the present invention.
By adopting the method and system of the invention, the polarity of a new word can be determined jointly by a group of words that co-occur with it at a high rate, which avoids the problem of insufficient reference words. The sentiment dictionary can also be improved and updated continuously and automatically. Based on the result, the sentiment orientation of users in sentences containing the new word can be judged, enabling tasks such as recommending and evaluating related information (commodities, friends, news, and so on).

Claims (9)

1. A method for discriminating word polarity, characterized by comprising:
a corpus construction step of selecting network text information within a certain period of time as a corpus;
a new word acquisition step of obtaining a new word as the object of discrimination;
a co-occurrence word collection step of obtaining, based on the corpus, the words whose co-occurrence rate with the new word is greater than a threshold α and forming them into a word set; and
a word polarity discrimination step of determining in turn the polarity of each word in the word set and thereby discriminating the polarity of the new word.
2. the method for discrimination of word polarity according to claim 1, is characterized in that,
If the emotion propensity value that S is neologisms, β is threshold value, in described word polarity discriminating step, ,, when ∣ S ∣≤β, differentiates the polarity of these neologisms for neutral, when S > β, the polarity of differentiating these neologisms is positivity, and when S < ﹣ β, the polarity of differentiating these neologisms is negativity, be that S and β meet following relational expression
Wherein, Pos{ ∑ f} is the co-occurrence rate sum of all positivity words, and Neg{ ∑ f} is the co-occurrence rate sum of all negativity words, and Neu{ ∑ f} is the co-occurrence rate sum of all neutral words.
3. the method for discrimination of word polarity according to claim 1, is characterized in that,
In described neologisms obtaining step, by user, neologisms have directly been inputted as differentiating object.
4. the method for discrimination of word polarity according to claim 3, is characterized in that,
In co-occurrence word obtaining step, the neologisms of having inputted according to user extract the language material resource that comprises these neologisms from corpus, obtain being greater than the word of threshold alpha and forming set of words with the co-occurrence rate of these neologisms.
5. the method for discrimination of word polarity according to claim 1, is characterized in that,
In described neologisms obtaining step, from described corpus, extract as the neologisms of differentiating object.
6. the method for discrimination of word polarity according to claim 1, is characterized in that,
In described word polarity discriminating step, for each word in described set of words, determine whether other neologisms, if not obtain the polarity of this word from sentiment dictionary, if other neologisms, these other neologisms are carried out to iterative processing, and using the form storage of Word-Pair as the neologisms of differentiation object and other neologisms in described set of words, if the word W just processing ialready in Word-Pair, by W ifrom set of words, delete.
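The iterative handling of nested new words in claim 6 can be sketched as a recursion guarded by the stored Word-Pairs. This is one possible reading of the claim, not a confirmed implementation: a word counts as "another new word" when it is absent from the sentiment dictionary, and a word that already appears in any stored Word-Pair is dropped from the word set, which prevents endless mutual recursion. The scoring formula is the same assumed normalized difference, since the source does not reproduce it. The names `resolve_polarity` and `word_sets` are hypothetical.

```python
def resolve_polarity(target, word_sets, lexicon, beta, pairs):
    """Return "positive", "negative" or "neutral" for `target`.

    word_sets: {word: {co-occurring word: rate}}, assumed precomputed
    lexicon:   sentiment dictionary, {word: polarity label}
    pairs:     list of stored Word-Pairs, extended in place
    """
    sums = {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    for word, rate in word_sets.get(target, {}).items():
        if word in lexicon:
            # Known word: read its polarity from the sentiment dictionary.
            label = lexicon[word]
        elif any(word in pair for pair in pairs):
            # Already part of a Word-Pair: delete it from the word set.
            continue
        else:
            # Another new word: record the Word-Pair, then process it in turn.
            pairs.append((target, word))
            label = resolve_polarity(word, word_sets, lexicon, beta, pairs)
        sums[label] += rate
    total = sum(sums.values())
    # ASSUMED formula for S; the claims omit the actual expression.
    s = (sums["positive"] - sums["negative"]) / total if total else 0.0
    if s > beta:
        return "positive"
    if s < -beta:
        return "negative"
    return "neutral"


# Two hypothetical new words that co-occur with each other.
lexicon = {"great": "positive", "awful": "negative"}
word_sets = {
    "geili": {"great": 0.8, "niubi": 0.6},
    "niubi": {"great": 0.7, "geili": 0.6},
}
pairs = []
label = resolve_polarity("geili", word_sets, lexicon, beta=0.2, pairs=pairs)
```

Here "niubi" is resolved from its own word set before it contributes to "geili", and the stored pair stops "niubi" from recursing back into "geili".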
7. the method for discrimination of word polarity according to claim 1, is characterized in that,
Described method of discrimination also comprises that sentiment dictionary safeguards step, and neologisms and the polarity thereof in word polarity discriminating step, differentiated are increased in sentiment dictionary.
8. A system for distinguishing word polarity, characterized by having:
a corpus construction module that obtains network text information from within a certain time period as a corpus;
a new-word acquisition module that obtains a new word as the distinguishing target;
a co-occurrence word collection module that, based on the corpus, finds the words whose co-occurrence rate with the new word is greater than a threshold α and forms a word set from them; and
a word polarity distinguishing module that determines, in turn, the polarity of each word in the word set and thereby distinguishes the polarity of the new word.
9. The system for distinguishing word polarity according to claim 8, characterized in that,
it further has:
a sentiment dictionary update module that adds the obtained new word and its polarity to the sentiment dictionary, and a user interface module that receives the settings the user makes to the system and displays the new word and its polarity.
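The modules of claims 8 and 9 could be wired together roughly as in the sketch below. It is a hedged illustration only: the class name `PolaritySystem`, the `(timestamp, tokens)` post format, the document-level co-occurrence rate, and the formula for S are all assumptions, and the user interface module is reduced to a returned report string.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class PolaritySystem:
    lexicon: dict                      # sentiment dictionary {word: label}
    alpha: float = 0.5                 # co-occurrence threshold
    beta: float = 0.2                  # polarity threshold
    corpus: list = field(default_factory=list)

    def build_corpus(self, posts, now, window):
        """Corpus construction module: keep posts no older than `window`."""
        self.corpus = [set(tokens) for stamp, tokens in posts
                       if timedelta(0) <= now - stamp <= window]

    def collect(self, new_word):
        """Co-occurrence word collection module (assumed rate definition)."""
        hits = [tokens for tokens in self.corpus if new_word in tokens]
        counts = Counter(w for tokens in hits for w in tokens if w != new_word)
        return {w: c / len(hits) for w, c in counts.items()
                if c / len(hits) > self.alpha}

    def judge(self, word_set):
        """Word polarity distinguishing module (assumed formula for S)."""
        sums = {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
        for word, rate in word_set.items():
            if word in self.lexicon:
                sums[self.lexicon[word]] += rate
        total = sum(sums.values())
        s = (sums["positive"] - sums["negative"]) / total if total else 0.0
        if s > self.beta:
            return "positive"
        return "negative" if s < -self.beta else "neutral"

    def run(self, new_word):
        """Judge a new word, update the dictionary, return a display string."""
        label = self.judge(self.collect(new_word))
        self.lexicon[new_word] = label     # sentiment dictionary update module
        return f"{new_word}: {label}"      # stand-in for the user interface


system = PolaritySystem(lexicon={"great": "positive", "awful": "negative"})
posts = [
    (datetime(2013, 5, 6), ["geili", "great"]),
    (datetime(2013, 5, 5), ["geili", "great", "movie"]),
    (datetime(2012, 1, 1), ["geili", "awful"]),   # outside the time window
]
system.build_corpus(posts, now=datetime(2013, 5, 7), window=timedelta(days=30))
report = system.run("geili")
```

Because the judged word is written back into the dictionary, a later run can treat it as a known word when it appears in another new word's word set.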
CN201310165049.8A 2013-05-07 2013-05-07 Distinguishing method and distinguishing system for polarities of words and expressions Pending CN104142913A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310165049.8A CN104142913A (en) 2013-05-07 2013-05-07 Distinguishing method and distinguishing system for polarities of words and expressions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310165049.8A CN104142913A (en) 2013-05-07 2013-05-07 Distinguishing method and distinguishing system for polarities of words and expressions

Publications (1)

Publication Number Publication Date
CN104142913A true CN104142913A (en) 2014-11-12

Family

ID=51852089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310165049.8A Pending CN104142913A (en) 2013-05-07 2013-05-07 Distinguishing method and distinguishing system for polarities of words and expressions

Country Status (1)

Country Link
CN (1) CN104142913A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104408035A (en) * 2014-12-15 2015-03-11 北京国双科技有限公司 Word emotion type analysis method and device
CN104408035B (en) * 2014-12-15 2018-04-03 北京国双科技有限公司 The analysis method and device of word affective style
CN105138510A (en) * 2015-08-10 2015-12-09 昆明理工大学 Microblog-based neologism emotional tendency judgment method
CN105138510B (en) * 2015-08-10 2018-05-25 昆明理工大学 A kind of neologisms Sentiment orientation determination method based on microblogging
CN106649250B (en) * 2015-10-29 2019-08-02 北京国双科技有限公司 A kind of recognition methods of emotion neologisms and device
CN106649250A (en) * 2015-10-29 2017-05-10 北京国双科技有限公司 Method and device for identifying emotional new words
CN107341496A (en) * 2016-05-03 2017-11-10 株式会社理光 A kind of word analysis method and device
CN106874363A (en) * 2016-12-30 2017-06-20 北京光年无限科技有限公司 The multi-modal output intent and device of intelligent robot
CN108804512A (en) * 2018-04-20 2018-11-13 平安科技(深圳)有限公司 Generating means, method and the computer readable storage medium of textual classification model
CN109408798A (en) * 2018-07-27 2019-03-01 昆明理工大学 A kind of word Sentiment orientation determination method
CN109408798B (en) * 2018-07-27 2021-09-14 昆明理工大学 Word emotional tendency judgment method
CN109885687A (en) * 2018-12-29 2019-06-14 深兰科技(上海)有限公司 A kind of sentiment analysis method, apparatus, electronic equipment and the storage medium of text
CN111400439A (en) * 2020-02-26 2020-07-10 平安科技(深圳)有限公司 Network bad data monitoring method and device and storage medium
CN115269852A (en) * 2022-08-08 2022-11-01 浙江浙蕨科技有限公司 Public opinion analysis method, system and storage medium

Similar Documents

Publication Publication Date Title
CN104142913A (en) Distinguishing method and distinguishing system for polarities of words and expressions
Ofoghi et al. Towards early discovery of salient health threats: A social media emotion classification technique
EP3086239A1 (en) Scenario generation device and computer program therefor
CN103023714B (en) The liveness of topic Network Based and cluster topology analytical system and method
US10095685B2 (en) Phrase pair collecting apparatus and computer program therefor
CN107077486A (en) Affective Evaluation system and method
WO2016085409A1 (en) A method and system for sentiment classification and emotion classification
EP2562659A1 (en) Data mapping acceleration
CN104615608A (en) Data mining processing system and method
US10430717B2 (en) Complex predicate template collecting apparatus and computer program therefor
Sharma et al. Detecting hate speech and insults on social commentary using nlp and machine learning
CN106649334B (en) Processing method and device of associated word set
CN108062402B (en) Event timeline mining method and system
AU2018411565B2 (en) System and methods for generating an enhanced output of relevant content to facilitate content analysis
CN104346408A (en) Method and equipment for labeling network user
Chu et al. Identifying key target audiences for public health campaigns: Leveraging machine learning in the case of hookah tobacco smoking
CN103246728A (en) Emergency detection method based on document lexical feature variations
Kumar et al. Battling fake news: A survey on mitigation techniques and identification
Menezes et al. Building a massive corpus for named entity recognition using free open data sources
Nguyen et al. Evaluating marijuana-related tweets on Twitter
CN106484746B (en) Website conversion event analysis method and device
Intxaurrondo et al. Diamonds in the rough: Event extraction from imperfect microblog data
CN112926308A (en) Method, apparatus, device, storage medium and program product for matching text
CN110232160B (en) Method and device for detecting interest point transition event and storage medium
KR102328234B1 (en) System and method for detecting local event by analyzing relevant documents in social network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20141112