CN105138510A - Microblog-based neologism emotional tendency judgment method
- Publication number: CN105138510A
- Application number: CN201510485811.XA
- Authority: CN (China)
- Prior art keywords: word, neologisms, co-occurrence, words, net
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention relates to a microblog-based method for judging the sentiment orientation of neologisms and belongs to the field of natural language processing. The method comprises the following steps: segmenting a microblog corpus with a Chinese word segmentation tool; splitting the segmented corpus into blocks, using the stop words in the segmentation result as split points; combining adjacent word strings in each block pairwise, counting the frequency of each combined string, and taking the strings whose frequency exceeds a threshold as neologism candidate strings; filtering the candidate strings according to Chinese word-formation rules and an adjacent-variety (adjacent change number) rule to obtain neologisms; calculating the similarity between co-occurrence words and HowNet sentiment words using the HowNet sentiment dictionary; calculating the relevance between the neologisms and their co-occurrence words; constructing a graph model; obtaining the sentiment polarity distribution of the neologisms with a label propagation algorithm; and obtaining the sentiment orientation of the neologisms by constructing a linear classifier. Judging the sentiment orientation of neologisms helps bloggers express their views better and lets users grasp a blogger's sentiment orientation accurately.
Description
Technical field
The present invention relates to a microblog-based method for judging the sentiment orientation of neologisms and belongs to the field of natural language processing.
Background art
Large numbers of sentiment-bearing neologisms are emerging on microblogs. They play an important role in everyday communication, let people express opinions and emotions more richly and completely, and reflect social trends and news events. In natural language processing, recognizing sentiment neologisms has long been a difficult problem, and it has important applications in Chinese word segmentation, information retrieval, question answering systems, and similar tasks.
Current methods for recognizing the sentiment polarity of words first select words with a strong sentiment orientation as benchmark words and then determine the polarity of a target word from the strength of its association with the benchmark words. Turney P.D. et al. used the PMI-IR method, in which pointwise mutual information represents the association strength between the target word and the benchmark words. Wang Suge et al. used the PMI method to calculate the association strength between a word (together with its synonyms) and the sets of commendatory and derogatory benchmark words, and judged the polarity of the word from the difference between the two strengths. Li Dun et al. assumed that co-occurring words share the same sentiment polarity; using the polarity sememes "good" and "bad" in HowNet, they calculated the polarity similarity between the senses of a word and those of the benchmark words, and from it the polarity value of the word. Yao Tianfang, Wan Changxuan et al. introduced conjunctions ("but", "and", etc.) when calculating word association strength and used the part of speech and syntactic structure of a word in its context to calculate its dynamic (modified) polarity, which improved the accuracy of the polarity calculation.
Recognizing the sentiment polarity of neologisms raises two problems. First, neologisms lack prior knowledge about part of speech and semantics, so external resources such as HowNet cannot be used directly to judge their polarity. Second, because the numbers of neologisms and benchmark words are both relatively limited, calculating relevance only against benchmark words leads to a serious data sparsity problem. The microblog-corpus-based method proposed by the present invention therefore considers, when calculating the polarity of a neologism, not only the benchmark words associated with it but also the influence of non-benchmark words and other neologisms that carry a sentiment orientation.
Summary of the invention
The invention provides a microblog-based method for judging the sentiment orientation of neologisms, which solves the problem that sentiment neologisms in a microblog corpus currently cannot be recognized automatically.
The technical solution of the present invention is a microblog-based method for judging the sentiment orientation of neologisms. The microblog corpus is segmented with a Chinese word segmentation tool, and the segmented corpus is split into blocks at the stop words in the segmentation result. Adjacent word strings in each block are combined pairwise, the frequency of each combined string is counted, and strings whose frequency exceeds a threshold are taken as neologism candidate strings. The candidate strings are filtered according to Chinese word-formation rules and the adjacent-variety (adjacent change number) rule to obtain neologisms. The HowNet sentiment dictionary is then used to calculate the word similarity between co-occurrence words and HowNet sentiment words, and the relevance between each neologism and its co-occurrence words is calculated. A graph model is built with the neologisms and their co-occurrence words as nodes, an edge between each neologism and each of its co-occurrence words, and the relevance as the edge weight. A label propagation algorithm yields the sentiment polarity distribution of the neologisms, and a linear classifier finally gives their sentiment orientation.
The concrete steps of the method are as follows:
Step1, segment the microblog corpus with a Chinese word segmentation tool;
Step2, split the segmented corpus into blocks at the stop words in the segmentation result, combine adjacent word strings in each block pairwise, count the frequency of each combined string, and take the strings whose frequency exceeds the threshold T as neologism candidate strings;
Step3, filter the neologism candidate strings according to Chinese word-formation rules and the adjacent-variety rule to obtain neologisms;
Step4, use the HowNet sentiment dictionary to calculate the word similarity between co-occurrence words and HowNet sentiment words:
Step4.1, find the non-stop words that co-occur with the neologisms in the microblog corpus and take them as co-occurrence words;
Step4.2, use the HowNet sentiment dictionary to calculate the word similarity between the co-occurrence words and the HowNet sentiment words, expressed as follows:
In the formula, $Sim(s_i, p_j)$ denotes the word similarity between co-occurrence word $s_i$ and HowNet sentiment word $p_j$; $i$ and $j$ are the indices of any two words; $m$ and $n$ are the numbers of senses of $s_i$ and $p_j$ respectively; the sense symbols in the formula denote the $m$-th sense of $s_i$ and the $n$-th sense of $p_j$; and $P$ denotes the set of HowNet sentiment words;
In the formula, the left-hand side denotes the sense similarity between co-occurrence word $s_i$ and HowNet sentiment word $p_j$; $n_1$ and $n_2$ are the numbers of attributes in the two senses; the weight term is the weight of the attributes at different positions in the sense definition; $l$ is a variable ranging from 1 to $n_1$ and $f$ is a variable ranging from 1 to $n_2$; and the sememe symbols denote the sememes of the two senses;
In the formula, the left-hand side denotes the sememe similarity between co-occurrence word $s_i$ and HowNet sentiment word $p_j$; $d$ is the path distance between the two sememes in the sememe hierarchy; and $\alpha$ is an adjustable parameter;
Step5, calculate the relevance between each neologism and its co-occurrence words:
In the formula, $i$ and $j$ are the indices of any two words; $R$ is the size of a user-defined window; $r$ is a positive number less than or equal to $R$ that gives the distance between the two words within the window; $w_{ij}$ denotes the relevance between neologism $v_i$ and its co-occurrence word $v_j$; $N(v_i, r, v_j)$ is the number of times neologism $v_i$ and its co-occurrence word $v_j$ co-occur at distance $r$ within a window of size $R$ in the relevant document set ($r < R$); and $C(v_i, v_j) = R - r + 1$ is the co-occurrence strength between the two words;
Step6, build the graph model: take the neologisms and their co-occurrence words as nodes, create an edge between each neologism and each of its co-occurrence words, and use the relevance from Step5 as the edge weight;
Step7, use the word similarities to calculate the polarity distribution of co-occurrence word $s_i$ over the polarity label $y$:
In the formula, $RK$ is a threshold; $s_i(y)$ denotes the polarity distribution of the co-occurrence word; $i$ and $j$ are the indices of any two words; $Sim$ denotes word similarity; the positive-word and negative-word symbols denote the positively and negatively oriented words among the HowNet sentiment words; and $count$ denotes the number of words;
Step8, determine the sentiment orientation of the neologisms with the label propagation algorithm;
Step8.1, obtain the polarity distribution of the neologisms; the objective function is as follows:
In the polarity recognition of neologisms, it is assumed that the higher the relevance between two words, the more similar their polarity distributions. The objective function is built on this idea: minimizing the objective function $C$ yields the polarity distribution of neologism node $v_i$.
Wherein:
In the formula, $i$ and $j$ are the indices of any two words; $q_i(y)$ denotes the polarity distribution of neologism node $v_i$; $s_i(y)$ denotes the polarity distribution of co-occurrence word $v_j$; $\gamma$ and $\lambda$ are user-defined parameters; $V_t$ denotes the set of co-occurrence words; and $K(V_t)$ denotes the set of K nearest-neighbor words of the co-occurrence words $V_t$;
Step8.2, denote the polarity distribution obtained for a neologism as $Q_n$ and build a linear classifier to obtain its sentiment orientation. When the probability that the neologism is commendatory minus the probability that it is derogatory is greater than the threshold $RT$, the sentiment orientation is 1 (commendatory word); when the absolute value of that difference is less than $RT$, the sentiment orientation is 0 (neutral word); when the probability that the neologism is derogatory minus the probability that it is commendatory is greater than $RT$, the sentiment orientation is -1 (derogatory word):

$$
\text{orientation}(Q_n) =
\begin{cases}
1, & Q_n(y=1) - Q_n(y=-1) > RT \\
0, & \lvert Q_n(y=1) - Q_n(y=-1) \rvert < RT \\
-1, & Q_n(y=-1) - Q_n(y=1) > RT
\end{cases}
$$

In the formula, $Q_n(y=1)$ denotes the probability that $Q_n$ is commendatory, $Q_n(y=-1)$ denotes the probability that $Q_n$ is derogatory, and $RT$ is a threshold.
The working principle of the present invention is:
Step 01) First obtain the microblog neologisms:
Segment the microblog corpus with a Chinese word segmentation tool. Split the segmented corpus into blocks at the stop words in the segmentation result, combine adjacent word strings in each block pairwise, count the frequency of each combined string, and take the strings whose frequency exceeds the threshold as neologism candidate strings. Filter the candidate strings according to Chinese word-formation rules and the adjacent-variety rule to obtain neologisms;
Step 02) Extract the sentiment words from HowNet and find the words that co-occur with the neologisms in the microblog corpus. Calculate the similarity between the senses of these words and the senses of the HowNet sentiment words from the similarity of their sememes, and then obtain the word similarity from the sense similarity. Calculate the probability distribution of the polarity of each co-occurrence word, and select the co-occurrence words with a strong sentiment orientation, together with the neologisms, as nodes. Calculate the relevance between nodes from the co-occurrence strength between a neologism and its co-occurrence words and other neologisms, use it as the weight of the edge between the nodes, and build the graph model;
Step 03) Use label propagation to pass the known sentiment polarity distributions to the neologisms that are strongly related to them, and thereby calculate the unknown polarity distributions of the neologisms. Build a linear classifier over the distribution of each neologism in the commendatory/derogatory polarity space to recognize its sentiment polarity.
The beneficial effects of the present invention are as follows. Large numbers of neologisms appear as microblogs develop rapidly. Judging the sentiment orientation of these neologisms not only helps bloggers express their own views well but also lets users grasp a blogger's sentiment orientation accurately. Judging the sentiment orientation of these neologisms promptly is also of great significance for word segmentation in Chinese information processing, public opinion analysis, and related tasks.
Brief description of the drawings
Fig. 1 is a flowchart of the present invention.
Detailed description of the embodiments
Embodiment 1: as shown in Fig. 1, a microblog-based method for judging the sentiment orientation of neologisms. The microblog corpus is segmented with a Chinese word segmentation tool, and the segmented corpus is split into blocks at the stop words in the segmentation result. Adjacent word strings in each block are combined pairwise, the frequency of each combined string is counted, and strings whose frequency exceeds a threshold are taken as neologism candidate strings. The candidate strings are filtered according to Chinese word-formation rules and the adjacent-variety rule to obtain neologisms. The HowNet sentiment dictionary is then used to calculate the word similarity between co-occurrence words and HowNet sentiment words, and the relevance between each neologism and its co-occurrence words is calculated. A graph model is built with the neologisms and their co-occurrence words as nodes, an edge between each neologism and each of its co-occurrence words, and the relevance as the edge weight. A label propagation algorithm yields the sentiment polarity distribution of the neologisms, and a linear classifier finally gives their sentiment orientation.
Embodiment 2: as shown in Fig. 1, a microblog-based method for judging the sentiment orientation of neologisms. The microblog corpus is segmented with a Chinese word segmentation tool, and the segmented corpus is split into blocks at the stop words in the segmentation result. Adjacent word strings in each block are combined pairwise, the frequency of each combined string is counted, and strings whose frequency exceeds a threshold are taken as neologism candidate strings. The candidate strings are filtered according to Chinese word-formation rules and the adjacent-variety rule to obtain neologisms. The HowNet sentiment dictionary is then used to calculate the word similarity between co-occurrence words and HowNet sentiment words, and the relevance between each neologism and its co-occurrence words is calculated. A graph model is built with the neologisms and their co-occurrence words as nodes, an edge between each neologism and each of its co-occurrence words, and the relevance as the edge weight. A label propagation algorithm yields the sentiment polarity distribution of the neologisms, and a linear classifier finally gives their sentiment orientation.
Embodiment 3: as shown in Fig. 1, a microblog-based method for judging the sentiment orientation of neologisms. The microblog corpus is segmented with a Chinese word segmentation tool, and the segmented corpus is split into blocks at the stop words in the segmentation result. Adjacent word strings in each block are combined pairwise, the frequency of each combined string is counted, and strings whose frequency exceeds a threshold are taken as neologism candidate strings. The candidate strings are filtered according to Chinese word-formation rules and the adjacent-variety rule to obtain neologisms. The HowNet sentiment dictionary is then used to calculate the word similarity between co-occurrence words and HowNet sentiment words, and the relevance between each neologism and its co-occurrence words is calculated. A graph model is built with the neologisms and their co-occurrence words as nodes, an edge between each neologism and each of its co-occurrence words, and the relevance as the edge weight. A label propagation algorithm yields the sentiment polarity distribution of the neologisms, and a linear classifier finally gives their sentiment orientation.
The concrete steps of the method are as follows:
Step1, segment the microblog corpus with the Chinese Academy of Sciences word segmentation tool; the input here is 10 million microblog posts and the output is the segmented microblog corpus;
Step2, obtain the neologism candidate strings; the input here is the segmented microblog corpus and the output is the neologism candidate strings:
Split the segmented corpus into blocks at the stop words in the segmentation result, combine adjacent word strings in each block pairwise (the combinations are shown in Table 1), count the frequency of each combined string, and take the strings whose frequency exceeds the threshold T=8 as neologism candidate strings;
Table 1. Word combinations between stop words
Table 1 shows how the word strings between adjacent stop words in the segmented corpus are combined pairwise.
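Step2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `candidate_strings`, the list-of-token-lists input, and the restriction to two-word combinations are all hypothetical choices, since the contents of Table 1 are not available here.

```python
from collections import Counter

def candidate_strings(segmented_posts, stop_words, threshold=8):
    """Count adjacent word pairs inside stop-word-delimited blocks.

    segmented_posts: iterable of token lists (hypothetical input format).
    Returns {combined_string: frequency} for frequencies above `threshold`.
    """
    counts = Counter()
    for tokens in segmented_posts:
        block = []
        # The trailing None is a sentinel that flushes the last block.
        for tok in list(tokens) + [None]:
            if tok is None or tok in stop_words:
                # Combine each pair of adjacent tokens within the block.
                counts.update(a + b for a, b in zip(block, block[1:]))
                block = []
            else:
                block.append(tok)
    return {s: c for s, c in counts.items() if c > threshold}
```

Pairs never span a stop word, so a stop word between two tokens prevents them from being combined; `threshold=8` mirrors the value T=8 used in this embodiment.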
Step3, obtain the neologisms; the input here is the neologism candidate strings and the output is the neologisms:
First filter the candidate strings according to Chinese word-formation rules, then filter them with the adjacent-variety rule to obtain the neologisms; the generation process and the change in the number of words are shown in Table 2:
Table 2. Neologism generation process and change in number
Table 2 shows how the number of words changes at each stage of the neologism discovery process.
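The text does not spell out the adjacent-variety rule, so the following is only a sketch of one common reading: count the distinct words seen immediately to the left and to the right of a candidate, and keep candidates for which the smaller count is large enough. The function name, the two-token candidate assumption, and the treatment of post boundaries as a single marker are all assumptions made for illustration.

```python
def adjacent_variety(candidate, segmented_posts):
    """Return min(#distinct left neighbours, #distinct right neighbours).

    Assumes the candidate is the concatenation of two adjacent tokens.
    Post boundaries are represented by the markers "<s>" and "</s>".
    """
    left, right = set(), set()
    for tokens in segmented_posts:
        for i in range(len(tokens) - 1):
            if tokens[i] + tokens[i + 1] == candidate:
                left.add(tokens[i - 1] if i > 0 else "<s>")
                right.add(tokens[i + 2] if i + 2 < len(tokens) else "</s>")
    return min(len(left), len(right))
```

A candidate that always appears with the same neighbour on one side is likely a fragment of a longer string, which is why a low value argues for discarding it; the cut-off value itself would have to be chosen experimentally.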
Step4, use the HowNet sentiment dictionary to calculate the word similarity between co-occurrence words and HowNet sentiment words:
Step4.1, find the non-stop words that co-occur with the neologisms in the microblog corpus and take them as co-occurrence words;
Step4.2, use the HowNet sentiment dictionary to calculate the word similarity between the co-occurrence words and the HowNet sentiment words, expressed as follows:
In the formula, $Sim(s_i, p_j)$ denotes the word similarity between co-occurrence word $s_i$ and HowNet sentiment word $p_j$; $i$ and $j$ are the indices of any two words; $m$ and $n$ are the numbers of senses of $s_i$ and $p_j$ respectively; the sense symbols in the formula denote the $m$-th sense of $s_i$ and the $n$-th sense of $p_j$; and $P$ denotes the set of HowNet sentiment words;
In the formula, the left-hand side denotes the sense similarity between co-occurrence word $s_i$ and HowNet sentiment word $p_j$; $n_1$ and $n_2$ are the numbers of attributes in the two senses; the weight term is the weight of the attributes at different positions in the sense definition; $l$ is a variable ranging from 1 to $n_1$ and $f$ is a variable ranging from 1 to $n_2$; and the sememe symbols denote the sememes of the two senses;
In the formula, the left-hand side denotes the sememe similarity between co-occurrence word $s_i$ and HowNet sentiment word $p_j$; $d$ is the path distance between the two sememes in the sememe hierarchy; and $\alpha$ is an adjustable parameter whose value is 1.6;
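The formulas of Step4.2 are not reproduced in this text. The sketch below therefore assumes the forms widely used for HowNet-based similarity: sememe similarity $\alpha / (d + \alpha)$, and word similarity taken as the maximum over all sense pairs. Both forms, and the function names, are assumptions; the attribute-weighted sense similarity is omitted because its weights are not given.

```python
def sememe_similarity(d, alpha=1.6):
    """Assumed sememe similarity: alpha / (d + alpha).

    d is the path distance between two sememes in the hierarchy;
    alpha = 1.6 is the value stated in this embodiment.
    """
    return alpha / (d + alpha)

def word_similarity(sense_sims):
    """Assumed word similarity: best score over all sense pairs.

    sense_sims[a][b] is the similarity between sense a of the
    co-occurrence word and sense b of the HowNet sentiment word.
    """
    return max(max(row) for row in sense_sims)
```

Under this assumed form, identical sememes (distance 0) score 1, and the score falls to one half when the distance equals $\alpha$.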
Step5, calculate the relevance between each neologism and its co-occurrence words:
In the formula, $i$ and $j$ are the indices of any two words; $R$ is the size of a user-defined window, set to $R=5$ on the basis of experimental analysis; $r$ is a positive number less than or equal to $R$ that gives the distance between the two words within the window; $w_{ij}$ denotes the relevance between neologism $v_i$ and its co-occurrence word $v_j$; $N(v_i, r, v_j)$ is the number of times neologism $v_i$ and its co-occurrence word $v_j$ co-occur at distance $r$ within a window of size $R$ in the relevant document set ($r < R$); and $C(v_i, v_j) = R - r + 1$ is the co-occurrence strength between the two words;
Step6, build the graph model: take the neologisms and their co-occurrence words as nodes, create an edge between each neologism and each of its co-occurrence words, and use the relevance from Step5 as the edge weight;
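Steps 5 and 6 can be illustrated as follows. Because the relevance formula itself is missing from this text, the sketch assumes $w_{ij} = \sum_{r=1}^{R} N(v_i, r, v_j)\,(R - r + 1)$, that is, every co-occurrence at distance $r$ contributes its strength $R - r + 1$. That form, the function name, and the assumption that neologisms already appear as single tokens with stop words removed are all hypothetical.

```python
from collections import defaultdict

def relevance(segmented_posts, new_words, window=5):
    """Assumed edge weights: sum of (window - r + 1) over co-occurrences.

    Returns {(new_word, co_word): weight}, which can serve directly as
    the weighted edge list of the graph model in Step6.
    """
    weights = defaultdict(float)
    for tokens in segmented_posts:
        for i, tok in enumerate(tokens):
            if tok not in new_words:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                r = abs(i - j)  # distance between the two words
                if r > 0:
                    weights[(tok, tokens[j])] += window - r + 1
    return dict(weights)
```

Closer co-occurrences weigh more: with `window=5`, an adjacent word adds 5 to the edge weight while a word five positions away adds 1.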
Step7, use the word similarities to calculate the polarity distribution of co-occurrence word $s_i$ over the polarity label $y$:
In the formula, $RK$ is a threshold, set to $RK=0.25$ on the basis of experimental analysis; $s_i(y)$ denotes the polarity distribution of the co-occurrence word; $i$ and $j$ are the indices of any two words; $Sim$ denotes word similarity; the positive-word and negative-word symbols denote the positively and negatively oriented words among the HowNet sentiment words; and $count$ denotes the number of words;
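One plausible reading of Step7, given that the formula is missing, is to count how many positive and how many negative HowNet sentiment words have a similarity above $RK$ to the co-occurrence word and to normalise the two counts. The sketch below implements that reading; the normalisation, the uniform fallback when no word passes the threshold, and the function name are assumptions.

```python
def polarity_distribution(sims_to_positive, sims_to_negative, rk=0.25):
    """Assumed polarity distribution of one co-occurrence word.

    sims_to_positive / sims_to_negative: similarities between the word
    and each positive / negative HowNet sentiment word.
    Returns {1: P(commendatory), -1: P(derogatory)}.
    """
    pos = sum(1 for s in sims_to_positive if s > rk)
    neg = sum(1 for s in sims_to_negative if s > rk)
    total = pos + neg
    if total == 0:
        # No sentiment word is similar enough: fall back to uniform.
        return {1: 0.5, -1: 0.5}
    return {1: pos / total, -1: neg / total}
```

Co-occurrence words whose distribution is strongly skewed toward one polarity are the ones that act as labelled seeds in the propagation step.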
Step8, determine the sentiment orientation of the neologisms with the label propagation algorithm;
Step8.1, obtain the polarity distribution of the neologisms; the objective function is as follows:
In the polarity recognition of neologisms, it is assumed that the higher the relevance between two words, the more similar their polarity distributions. The objective function is built on this idea: minimizing the objective function $C$ yields the polarity distribution of neologism node $v_i$.
Wherein:
In the formula, $i$ and $j$ are the indices of any two words; $q_i(y)$ denotes the polarity distribution of word node $v_i$; $s_i(y)$ denotes the polarity distribution of co-occurrence word $v_j$; $\gamma$ and $\lambda$ are user-defined parameters with values 1 and 0.1 respectively; $V_t$ denotes the set of co-occurrence words; and $K(V_t)$ denotes the set of K nearest-neighbor words of the co-occurrence words $V_t$, where K is 3;
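The objective function of Step8.1 is not reproduced in this text, so the sketch below substitutes a generic graph-based label propagation objective of the same flavour: a seed-fit term, a smoothness term weighted by $\gamma$ over edges, and a term weighted by $\lambda$ pulling toward the uniform distribution. Setting the gradient to zero gives the Jacobi-style update used in the loop. The objective, the function name, and the data layout are assumptions; restricting each node to its K nearest neighbours is left to the caller.

```python
def propagate(edges, seeds, gamma=1.0, lam=0.1, iters=50):
    """Jacobi-style label propagation on a weighted graph.

    edges: {node: {neighbour: weight}}, symmetric, every neighbour a key.
    seeds: {node: (p_commendatory, p_derogatory)} for labelled nodes.
    Returns {node: (p_commendatory, p_derogatory)} for every node.
    """
    uniform = (0.5, 0.5)
    q = {n: seeds.get(n, uniform) for n in edges}
    for _ in range(iters):
        new_q = {}
        for n, nbrs in edges.items():
            delta = 1.0 if n in seeds else 0.0  # seed-fit term applies?
            denom = delta + gamma * sum(nbrs.values()) + lam
            new_q[n] = tuple(
                (delta * seeds.get(n, uniform)[k]
                 + gamma * sum(w * q[m][k] for m, w in nbrs.items())
                 + lam * uniform[k]) / denom
                for k in range(2)
            )
        q = new_q
    return q
```

Each update is a weighted average of probability distributions, so every node's two values keep summing to 1; the defaults mirror the values $\gamma = 1$ and $\lambda = 0.1$ given in this embodiment.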
Step8.2, denote the polarity distribution obtained for a neologism as $Q_n$ and build a linear classifier to obtain its sentiment orientation:

$$
\text{orientation}(Q_n) =
\begin{cases}
1, & Q_n(y=1) - Q_n(y=-1) > RT \\
0, & \lvert Q_n(y=1) - Q_n(y=-1) \rvert < RT \\
-1, & Q_n(y=-1) - Q_n(y=1) > RT
\end{cases}
$$

In the formula, $Q_n(y=1)$ denotes the probability that $Q_n$ is commendatory, $Q_n(y=-1)$ denotes the probability that $Q_n$ is derogatory, and $RT$ is a threshold, set to $RT=0.6$ on the basis of experimental analysis. As the formula shows, when the probability that the neologism is commendatory minus the probability that it is derogatory is greater than $RT$, the sentiment orientation is 1 (commendatory word); when the absolute value of that difference is less than $RT$, the sentiment orientation is 0 (neutral word); when the probability that the neologism is derogatory minus the probability that it is commendatory is greater than $RT$, the sentiment orientation is -1 (derogatory word).
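The threshold rule of Step8.2 translates directly into code. The function name is hypothetical, and one detail is an assumption: the text does not say what happens when the difference equals $RT$ exactly, so this sketch treats that boundary case as neutral.

```python
def sentiment_orientation(p_commendatory, p_derogatory, rt=0.6):
    """Map a polarity distribution to 1, 0 or -1 using threshold rt."""
    diff = p_commendatory - p_derogatory
    if diff > rt:
        return 1    # commendatory word
    if diff < -rt:
        return -1   # derogatory word
    return 0        # neutral word (includes the boundary |diff| == rt)
```

With `rt=0.6`, a neologism needs a clear majority of its probability mass on one polarity before it is labelled commendatory or derogatory; for example, a distribution of (0.6, 0.4) stays neutral.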
Table 3: Selected new words
Polarity | New words |
---|---|
Positive | To power, Bai Fumei, sprout god, milk tea younger sister, man god |
Neutral | Private chat, little wind child, have drip, thief, do face, plunge into the commercial sea, cigarette friend |
Negative | To send out gram, overtax one's nerves, barrier set by the devil, brain deficiency, your younger sister, salty pig hand |
Table 3 lists some of the sentiment-bearing new words obtained in the experiment. Their sentiment orientation falls into three classes: positive, neutral, and negative.
Specific embodiments of the present invention have been explained in detail above with reference to the accompanying drawings. The invention is not limited to these embodiments, however; within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the concept of the invention.
Claims (2)
1. A microblog-based method for determining the sentiment orientation of new words, characterized in that: the microblog corpus is segmented with a Chinese word segmentation tool; the segmented corpus is divided into blocks, using the stop words in the segmentation result as split points; adjacent word strings within each block are combined pairwise, the frequency of each combined string is counted, and strings whose frequency exceeds a threshold are taken as new-word candidate strings; the candidate strings are filtered according to Chinese word-formation rules and an adjacent-variety-count rule to obtain the new words; the HowNet sentiment dictionary is then used to compute the word similarity between co-occurrence words and HowNet sentiment words; the degree of correlation between each new word and its co-occurrence words is computed; a graph model is built with the new words and their co-occurrence words as nodes, an edge between each new word and each of its co-occurrence words, and the degree of correlation as the edge weight; a label propagation algorithm is used to obtain the sentiment polarity distribution of the new words; and finally a linear classifier is built to obtain the sentiment orientation of the new words.
2. The microblog-based method for determining the sentiment orientation of new words according to claim 1, characterized in that the method comprises the following steps:
Step 1: Segment the microblog corpus with a Chinese word segmentation tool;
Step 2: Divide the segmented corpus into blocks, using the stop words in the segmentation result as split points; combine adjacent word strings within each block pairwise; count the frequency of each combined string; and take the strings whose frequency exceeds a threshold T as new-word candidate strings;
Step 3: Filter the new-word candidate strings according to Chinese word-formation rules and an adjacent-variety-count rule to obtain the new words;
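As an illustration of Step 2, the following minimal sketch splits an already segmented token stream at stop words, joins adjacent tokens pairwise inside each block, and keeps the combinations whose frequency exceeds the threshold T. The function name, the parameter names, and the toy data are hypothetical; segmentation (Step 1) and the rule-based filtering of Step 3 are not shown.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def candidate_strings(tokens: Iterable[str], stop_words: Set[str], threshold: int) -> Dict[str, int]:
    """Return pairwise-combined strings whose frequency is above `threshold`."""
    counts: Counter = Counter()
    block: List[str] = []

    def flush() -> None:
        # Combine each pair of adjacent tokens in the current block.
        for left, right in zip(block, block[1:]):
            counts[left + right] += 1
        block.clear()

    for tok in tokens:
        if tok in stop_words:
            flush()  # a stop word closes the current block
        else:
            block.append(tok)
    flush()  # close the last block
    return {s: c for s, c in counts.items() if c > threshold}
```

Because stop words act as block boundaries, no candidate string ever spans a stop word, which keeps obviously meaningless combinations out of the candidate set.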
Step 4: Using the HowNet sentiment dictionary, compute the word similarity between the co-occurrence words and the HowNet sentiment words:
Step 4.1: Find the non-stop words that co-occur with the new words in the microblog corpus and take them as co-occurrence words;
Step 4.2: Using the HowNet sentiment dictionary, compute the word similarity between each co-occurrence word and each HowNet sentiment word, where:
Sim(s_i, p_j) is the word similarity between co-occurrence word s_i and HowNet sentiment word p_j; i and j are the indices of any two words; m and n are the numbers of senses of s_i and p_j respectively, the formula ranging over the m-th sense of s_i and the n-th sense of p_j; and P is the set of HowNet sentiment words;
For the sense similarity between s_i and p_j: n_1 and n_2 are the numbers of attributes in the two senses being compared; each attribute carries a weight that depends on its position in the sense definition; l is a variable ranging from 1 to n_1 and f is a variable ranging from 1 to n_2; and the attributes compared are the sememes of the two senses;
For the sememe similarity between s_i and p_j: d is the path distance between the two sememes in the sememe hierarchy, and α is an adjustable parameter;
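The formulas of Step 4.2 are not reproduced in this text, so the sketch below rests on two stated assumptions: that the sememe similarity takes the form α / (d + α), which is a form commonly used with HowNet, and that the word similarity is the maximum similarity over all sense pairs. The function names, the default α = 1.6, and the `sense_sim` callback are illustrative only; the weighted attribute-level sense similarity is left to the caller.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

Sense = TypeVar("Sense")


def sememe_similarity(d: float, alpha: float = 1.6) -> float:
    """Assumed form alpha / (d + alpha): 1.0 at distance 0, falling as d grows."""
    return alpha / (d + alpha)


def word_similarity(
    senses_a: Sequence[Sense],
    senses_b: Sequence[Sense],
    sense_sim: Callable[[Sense, Sense], float],
) -> float:
    """Assumed aggregation: best similarity over all pairs of senses."""
    return max(
        (sense_sim(a, b) for a, b in product(senses_a, senses_b)),
        default=0.0,  # a word without senses is similar to nothing
    )
```

Taking the maximum over sense pairs means that a co-occurrence word counts as similar to a sentiment word as soon as any one of its senses is close to any sense of that sentiment word.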
Step 5: Compute the degree of correlation between each new word and its co-occurrence words, where: i and j are the indices of any two words; R is the size of a user-defined window; r is a positive number less than or equal to R that gives the distance between the two words within the window R; w_ij is the degree of correlation between new word v_i and its co-occurrence word v_j; N(v_i, r, v_j) is the number of times new word v_i and its co-occurrence word v_j co-occur at distance r within the window R in the relevant document set (r < R); and C(v_i, v_j) = R - r + 1 is the co-occurrence strength between the two words;
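The formula for w_ij is not reproduced in this text. A plausible reading of the definitions in Step 5, used here purely as an assumption, is that every co-occurrence at distance r contributes its strength R - r + 1, so that w_ij is the sum of N(v_i, r, v_j) · (R - r + 1) over r. The function and parameter names are hypothetical, and the sketch accepts r ≤ R although the text also states r < R.

```python
from typing import Iterable, Sequence


def correlation(docs: Iterable[Sequence[str]], new_word: str, co_word: str, window: int) -> int:
    """Sum the co-occurrence strength (window - r + 1) over all co-occurrences."""
    weight = 0
    for tokens in docs:
        positions_new = [k for k, t in enumerate(tokens) if t == new_word]
        positions_co = [k for k, t in enumerate(tokens) if t == co_word]
        for a in positions_new:
            for b in positions_co:
                r = abs(a - b)  # token distance between the two words
                if 1 <= r <= window:
                    weight += window - r + 1  # closer pairs count more
    return weight
```

Under this reading, two words that frequently appear side by side receive a much larger weight than two words that merely share a window.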
Step 6: Build a graph model with the new words and their co-occurrence words as nodes, an edge between each new word and each of its co-occurrence words, and the degree of correlation from Step 5 as the edge weight;
Step 7: Use word similarity to compute the polarity distribution of co-occurrence word s_i over y, where: RK is a threshold; s_i(y) is the polarity distribution of the co-occurrence word; i and j are the indices of any two words; Sim is the word similarity; the HowNet sentiment words are divided into words with a positive orientation and words with a negative orientation; and count denotes a number of words;
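The formula of Step 7 is not reproduced in this text. One way to read its definitions, offered only as an assumption, is to count the positive and the negative HowNet sentiment words whose similarity to the co-occurrence word exceeds RK and to normalize the two counts into a distribution. All names below are hypothetical.

```python
from typing import Dict, Iterable


def cooccurrence_polarity(
    sims_to_positive: Iterable[float],
    sims_to_negative: Iterable[float],
    rk: float,
) -> Dict[int, float]:
    """Return {1: P(positive), -1: P(negative)} for one co-occurrence word."""
    n_pos = sum(1 for s in sims_to_positive if s > rk)  # count over positive words
    n_neg = sum(1 for s in sims_to_negative if s > rk)  # count over negative words
    total = n_pos + n_neg
    if total == 0:
        # No sentiment word is similar enough: no polarity evidence (assumption).
        return {1: 0.0, -1: 0.0}
    return {1: n_pos / total, -1: n_neg / total}
```

The resulting distributions serve as the labels of the co-occurrence nodes from which the polarity of the new words is later derived.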
Step 8: Determine the sentiment orientation of the new words with a label propagation algorithm;
Step 8.1: Obtain the polarity distribution of the new words by minimizing an objective function, in which: i and j are the indices of any two words; q_i(y) is the polarity distribution of new-word node v_i; s_i(y) is the polarity distribution of co-occurrence word v_j; γ and λ are user-defined parameters; V_t is the set of co-occurrence words; and K(V_t) is the set of K nearest-neighbor words of the co-occurrence words V_t;
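The objective function of Step 8.1 is not reproduced in this text, so the following is not the patent's optimization. It is a simplified stand-in that only illustrates the underlying idea: a new word takes the correlation-weighted average of the polarity distributions of its K most strongly correlated co-occurrence words. The parameters γ and λ are not modelled, and all names are hypothetical.

```python
from typing import Dict


def propagate_polarity(
    neighbor_weights: Dict[str, float],
    neighbor_polarity: Dict[str, Dict[int, float]],
    k: int = 3,
) -> Dict[int, float]:
    """Weighted average of the polarity of the k strongest co-occurrence words."""
    # Keep the k neighbors with the largest degree of correlation.
    top = sorted(neighbor_weights.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(w for _, w in top)
    if total == 0:
        return {1: 0.0, -1: 0.0}  # no usable neighbors (assumption)
    return {
        y: sum(w * neighbor_polarity[word].get(y, 0.0) for word, w in top) / total
        for y in (1, -1)
    }
```

A strongly correlated positive neighbor therefore pulls the new word towards the positive class more than a weakly correlated negative one, in line with the assumption that higher correlation implies more similar polarity.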
Step 8.2: Denote the resulting polarity distribution of a new word by Q_n, and build a linear classifier to obtain the sentiment orientation of the new word: when the probability that the new word is positive minus the probability that it is negative exceeds a threshold RT, the sentiment orientation is 1 and the word is positive; when the absolute value of that difference is less than RT, the sentiment orientation is 0 and the word is neutral; when the probability that the new word is negative minus the probability that it is positive exceeds RT, the sentiment orientation is -1 and the word is negative;
Here, Q_n(y=1) is the probability that Q_n is positive, Q_n(y=-1) is the probability that Q_n is negative, and RT is a threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510485811.XA CN105138510B (en) | 2015-08-10 | 2015-08-10 | A kind of neologisms Sentiment orientation determination method based on microblogging |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105138510A (en) | 2015-12-09 |
CN105138510B (en) | 2018-05-25 |
Family
ID=54723861
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510485811.XA Active CN105138510B (en) | 2015-08-10 | 2015-08-10 | A kind of neologisms Sentiment orientation determination method based on microblogging |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105138510B (en) |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105760439A (en) * | 2016-02-02 | 2016-07-13 | 西安交通大学 | Figure cooccurrence relation graph establishing method based on specific behavior cooccurrence network |
CN105786991A (en) * | 2016-02-18 | 2016-07-20 | 中国科学院自动化研究所 | Chinese emotion new word recognition method and system in combination with user emotion expression ways |
CN106202048A (en) * | 2016-07-15 | 2016-12-07 | 合肥指南针电子科技有限责任公司 | A kind of public sentiment monitoring system |
CN106294845A (en) * | 2016-08-19 | 2017-01-04 | 清华大学 | The many emotions sorting technique extracted based on weight study and multiple features and device |
CN106502984A (en) * | 2016-10-19 | 2017-03-15 | 上海智臻智能网络科技股份有限公司 | A kind of method and device of field new word discovery |
CN107145524A (en) * | 2017-04-12 | 2017-09-08 | 清华大学 | Suicide risk checking method and system based on microblogging and Fuzzy Cognitive Map |
CN107169142A (en) * | 2017-06-15 | 2017-09-15 | 厦门快商通科技股份有限公司 | A kind of document sentiment analysis system and method automatically updated |
CN107291686A (en) * | 2016-04-13 | 2017-10-24 | 北京大学 | The discrimination method of emotion identification and the identification system of emotion identification |
CN107341496A (en) * | 2016-05-03 | 2017-11-10 | 株式会社理光 | A kind of word analysis method and device |
CN107862089A (en) * | 2017-12-02 | 2018-03-30 | 北京工业大学 | A kind of tag extraction method based on perception data |
CN108268439A (en) * | 2016-12-30 | 2018-07-10 | 北京国双科技有限公司 | The processing method and processing device of text emotion |
CN108427668A (en) * | 2018-01-23 | 2018-08-21 | 山东汇贸电子口岸有限公司 | A kind of generation method of Chinese semantic base neologisms |
CN108595679A (en) * | 2018-05-02 | 2018-09-28 | 武汉斗鱼网络科技有限公司 | A kind of label determines method, apparatus, terminal and storage medium |
CN108681564A (en) * | 2018-04-28 | 2018-10-19 | 北京京东尚科信息技术有限公司 | The determination method, apparatus and computer readable storage medium of keyword and answer |
CN109408798A (en) * | 2018-07-27 | 2019-03-01 | 昆明理工大学 | A kind of word Sentiment orientation determination method |
CN110245345A (en) * | 2018-03-08 | 2019-09-17 | 普天信息技术有限公司 | Participle processing method and device suitable for network neologisms |
CN110472047A (en) * | 2019-07-15 | 2019-11-19 | 昆明理工大学 | A kind of Chinese of multiple features fusion gets over news viewpoint sentence abstracting method |
CN110619073A (en) * | 2019-08-30 | 2019-12-27 | 北京影谱科技股份有限公司 | Method and device for constructing video subtitle network expression dictionary based on Apriori algorithm |
CN111221962A (en) * | 2019-11-18 | 2020-06-02 | 重庆邮电大学 | Text emotion analysis method based on new word expansion and complex sentence pattern expansion |
CN111310455A (en) * | 2020-02-11 | 2020-06-19 | 安徽理工大学 | New emotion word polarity calculation method for online shopping comments |
CN112115260A (en) * | 2020-07-17 | 2020-12-22 | 网娱互动科技(北京)股份有限公司 | Method for automatically calculating Chinese word classification |
CN112232077A (en) * | 2020-09-30 | 2021-01-15 | 和美(深圳)信息技术股份有限公司 | New word discovery method, system, equipment and medium based on graph embedding |
CN112446210A (en) * | 2020-11-27 | 2021-03-05 | 广州三七互娱科技有限公司 | User gender prediction method and device and electronic equipment |
CN112529627A (en) * | 2020-12-16 | 2021-03-19 | 中国联合网络通信集团有限公司 | Method and device for extracting implicit attribute of commodity, computer equipment and storage medium |
CN112860907A (en) * | 2021-04-27 | 2021-05-28 | 华南师范大学 | Emotion classification method and equipment |
CN113076490A (en) * | 2021-04-25 | 2021-07-06 | 昆明理工大学 | Case-related microblog object-level emotion classification method based on mixed node graph |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6188977B1 (en) * | 1997-12-26 | 2001-02-13 | Canon Kabushiki Kaisha | Natural language processing apparatus and method for converting word notation grammar description data |
CN103544246A (en) * | 2013-10-10 | 2014-01-29 | 清华大学 | Method and system for constructing multi-emotion dictionary for internet |
CN103559233A (en) * | 2012-10-29 | 2014-02-05 | 中国人民解放军国防科学技术大学 | Extraction method for network new words in microblogs and microblog emotion analysis method and system |
CN104142913A (en) * | 2013-05-07 | 2014-11-12 | 株式会社日立制作所 | Distinguishing method and distinguishing system for polarities of words and expressions |
Non-Patent Citations (4)
Title |
---|
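Michael Speriosu et al.: "Twitter Polarity Classification with Label Propagation over Lexical Links and the Follower Graph", Proceedings of EMNLP 2011, Conference on Empirical Methods in Natural Language Processing * |
Tang Bo et al.: "Microblog new word discovery and sentiment orientation judgment analysis", Journal of Shandong University (Natural Science) * |
Li Shoushan et al.: "A method for constructing a Chinese sentiment lexicon based on bilingual information and the label propagation algorithm", Journal of Chinese Information Processing * |
Li Dun et al.: "Research on word orientation identification based on semantic analysis", Pattern Recognition and Artificial Intelligence * |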
Cited By (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105760439B (en) * | 2016-02-02 | 2018-12-07 | 西安交通大学 | A kind of personage's cooccurrence relation map construction method based on specific behavior co-occurrence network |
CN105760439A (en) * | 2016-02-02 | 2016-07-13 | 西安交通大学 | Figure cooccurrence relation graph establishing method based on specific behavior cooccurrence network |
CN105786991A (en) * | 2016-02-18 | 2016-07-20 | 中国科学院自动化研究所 | Chinese emotion new word recognition method and system in combination with user emotion expression ways |
CN105786991B (en) * | 2016-02-18 | 2019-03-15 | 中国科学院自动化研究所 | In conjunction with the Chinese emotion new word identification method and system of user feeling expression way |
CN107291686B (en) * | 2016-04-13 | 2020-10-16 | 北京大学 | Method and system for identifying emotion identification |
CN107291686A (en) * | 2016-04-13 | 2017-10-24 | 北京大学 | The discrimination method of emotion identification and the identification system of emotion identification |
CN107341496A (en) * | 2016-05-03 | 2017-11-10 | 株式会社理光 | A kind of word analysis method and device |
CN106202048A (en) * | 2016-07-15 | 2016-12-07 | 合肥指南针电子科技有限责任公司 | A kind of public sentiment monitoring system |
CN106294845A (en) * | 2016-08-19 | 2017-01-04 | 清华大学 | The many emotions sorting technique extracted based on weight study and multiple features and device |
CN106294845B (en) * | 2016-08-19 | 2019-08-09 | 清华大学 | The susceptible thread classification method and device extracted based on weight study and multiple features |
CN106502984B (en) * | 2016-10-19 | 2019-05-24 | 上海智臻智能网络科技股份有限公司 | A kind of method and device of field new word discovery |
CN106502984A (en) * | 2016-10-19 | 2017-03-15 | 上海智臻智能网络科技股份有限公司 | A kind of method and device of field new word discovery |
CN108268439B (en) * | 2016-12-30 | 2021-09-07 | 北京国双科技有限公司 | Text emotion processing method and device |
CN108268439A (en) * | 2016-12-30 | 2018-07-10 | 北京国双科技有限公司 | The processing method and processing device of text emotion |
CN107145524A (en) * | 2017-04-12 | 2017-09-08 | 清华大学 | Suicide risk checking method and system based on microblogging and Fuzzy Cognitive Map |
CN107169142A (en) * | 2017-06-15 | 2017-09-15 | 厦门快商通科技股份有限公司 | A kind of document sentiment analysis system and method automatically updated |
CN107862089B (en) * | 2017-12-02 | 2020-03-13 | 北京工业大学 | Label extraction method based on perception data |
CN107862089A (en) * | 2017-12-02 | 2018-03-30 | 北京工业大学 | A kind of tag extraction method based on perception data |
CN108427668A (en) * | 2018-01-23 | 2018-08-21 | 山东汇贸电子口岸有限公司 | A kind of generation method of Chinese semantic base neologisms |
CN110245345A (en) * | 2018-03-08 | 2019-09-17 | 普天信息技术有限公司 | Participle processing method and device suitable for network neologisms |
CN108681564A (en) * | 2018-04-28 | 2018-10-19 | 北京京东尚科信息技术有限公司 | The determination method, apparatus and computer readable storage medium of keyword and answer |
CN108681564B (en) * | 2018-04-28 | 2021-06-29 | 北京京东尚科信息技术有限公司 | Keyword and answer determination method, device and computer readable storage medium |
CN108595679A (en) * | 2018-05-02 | 2018-09-28 | 武汉斗鱼网络科技有限公司 | A kind of label determines method, apparatus, terminal and storage medium |
CN109408798A (en) * | 2018-07-27 | 2019-03-01 | 昆明理工大学 | A kind of word Sentiment orientation determination method |
CN109408798B (en) * | 2018-07-27 | 2021-09-14 | 昆明理工大学 | Word emotional tendency judgment method |
CN110472047A (en) * | 2019-07-15 | 2019-11-19 | 昆明理工大学 | A kind of Chinese of multiple features fusion gets over news viewpoint sentence abstracting method |
CN110472047B (en) * | 2019-07-15 | 2022-12-13 | 昆明理工大学 | Multi-feature fusion Chinese-Yue news viewpoint sentence extraction method |
CN110619073B (en) * | 2019-08-30 | 2022-04-22 | 北京影谱科技股份有限公司 | Method and device for constructing video subtitle network expression dictionary based on Apriori algorithm |
CN110619073A (en) * | 2019-08-30 | 2019-12-27 | 北京影谱科技股份有限公司 | Method and device for constructing video subtitle network expression dictionary based on Apriori algorithm |
CN111221962A (en) * | 2019-11-18 | 2020-06-02 | 重庆邮电大学 | Text emotion analysis method based on new word expansion and complex sentence pattern expansion |
CN111221962B (en) * | 2019-11-18 | 2023-05-26 | 重庆邮电大学 | Text emotion analysis method based on new word expansion and complex sentence pattern expansion |
CN111310455A (en) * | 2020-02-11 | 2020-06-19 | 安徽理工大学 | New emotion word polarity calculation method for online shopping comments |
CN112115260A (en) * | 2020-07-17 | 2020-12-22 | 网娱互动科技(北京)股份有限公司 | Method for automatically calculating Chinese word classification |
CN112232077A (en) * | 2020-09-30 | 2021-01-15 | 和美(深圳)信息技术股份有限公司 | New word discovery method, system, equipment and medium based on graph embedding |
CN112446210A (en) * | 2020-11-27 | 2021-03-05 | 广州三七互娱科技有限公司 | User gender prediction method and device and electronic equipment |
CN112446210B (en) * | 2020-11-27 | 2024-01-09 | 广州三七互娱科技有限公司 | User gender prediction method and device and electronic equipment |
CN112529627A (en) * | 2020-12-16 | 2021-03-19 | 中国联合网络通信集团有限公司 | Method and device for extracting implicit attribute of commodity, computer equipment and storage medium |
CN112529627B (en) * | 2020-12-16 | 2023-06-13 | 中国联合网络通信集团有限公司 | Method and device for extracting implicit attribute of commodity, computer equipment and storage medium |
CN113076490A (en) * | 2021-04-25 | 2021-07-06 | 昆明理工大学 | Case-related microblog object-level emotion classification method based on mixed node graph |
CN112860907B (en) * | 2021-04-27 | 2021-06-29 | 华南师范大学 | Emotion classification method and equipment |
CN112860907A (en) * | 2021-04-27 | 2021-05-28 | 华南师范大学 | Emotion classification method and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN105138510B (en) | 2018-05-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | Inventor after: Yan Xin, Zhou Chao, Yu Zhengtao, Hong Xudong, Xu Guangyi, Fu Yunfa. Inventor before: Yan Xin, Zhou Chao, Yu Zhengtao, Hong Xudong, Fu Yunfa |
GR01 | Patent grant | ||