CN108038166A - A Chinese microblog sentiment analysis method based on the subjective and objective orientation of terms - Google Patents
- Publication number
- CN108038166A (application CN201711279503.7A)
- Authority
- CN
- China
- Prior art keywords
- sentiment
- orientation
- term
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a Chinese microblog sentiment analysis method based on the subjective and objective orientation of terms. The steps are as follows: (1) obtain the target microblog dataset to be analyzed; (2) preprocess each microblog post by word segmentation, part-of-speech tagging, stop-word filtering and similar operations, and merge each sentiment word with a directly preceding negation word; (3) introduce sentiment prior knowledge and orientation prior knowledge into the preprocessed microblog data; (4) sample the orientation, sentiment and topic label of each term with a Gibbs sampling algorithm; (5) compute the joint orientation-sentiment distribution of each post; (6) compute the final sentiment polarity probability distribution of each post and thereby determine its sentiment polarity. For microblog data, the method introduces the concept of the subjective and objective orientation of terms (orientation for short) and jointly models the relations among orientation, sentiment and topic with Gibbs sampling. The method is simple and practical and can significantly improve microblog sentiment classification performance.
Description
Technical field
The present invention relates to a sentiment analysis method for Chinese microblogs. Specifically, for a microblog dataset it introduces the concept of term orientation together with sentiment prior knowledge and orientation prior knowledge, jointly samples the orientation, sentiment and topic of each term with a Gibbs sampling algorithm based on the relations among these three, then computes the joint orientation-sentiment distribution of each post and its final sentiment probability distribution, and thereby determines the sentiment polarity of the post. It is a Chinese microblog sentiment analysis method based on the subjective and objective orientation of terms.
Background art
In recent years, with the rapid development of Internet technology, social media platforms have emerged in quick succession, and more and more people use social media such as microblogs to express their emotions or opinions; massive numbers of microblog posts are produced and spread every day. Compared with traditional long texts, microblog short texts are brief, colloquial, non-standard and sparse in features, so how to mine sentiment or opinion knowledge from them effectively has become an important research direction.
There are currently two main classes of methods for microblog sentiment analysis: lexicon-based methods and machine-learning-based methods. Lexicon-based methods mainly use the sentiment words in a sentiment lexicon and determine the sentiment polarity or intensity of a text by keyword matching; their main drawback is that they rely too heavily on surface features. Machine-learning-based methods are further divided into fully supervised, weakly supervised and unsupervised approaches. Fully supervised methods first train a sentiment classifier on a large manually labeled dataset and then apply the trained classifier to further sentiment classification; the manual labeling is particularly time-consuming and laborious. Weakly supervised methods mainly use noisy labels, such as the emoticons in social network text, as the sentiment labels of the text and then train a classifier in the fully supervised way, but the noise in these labels also degrades classifier performance. Unsupervised methods require no training set and mostly guide sentiment analysis by using sentiment words as sentiment prior knowledge.
Recent studies suggest that the sentiment of a text depends on its topic, and on this basis many joint sentiment-topic models have appeared. These unsupervised methods first construct a reasonable term generation model based on the relation between sentiment and topic, then jointly sample the sentiment and topic of each term with Gibbs sampling, compute the sentiment distribution of the text, and take the sentiment category with the highest probability as the sentiment category of the text.
The above methods, which analyze microblog sentiment based on the relation between sentiment and topic, have the following shortcomings:
(1) they consider only the dependence between sentiment and topic and ignore the influence of orientation on sentiment;
(2) when applied to microblogs, they cannot make good use of emoticons, the most typical sentiment feature of microblogs;
(3) because they do not consider orientation, they cannot use the orientation prior knowledge carried by emoticons and by the parts of speech of terms.
Summary of the invention
The object of the present invention is to address the shortcomings of current Chinese microblog sentiment analysis by providing a Chinese microblog sentiment analysis method based on the subjective and objective orientation of terms. The method introduces the concept of term orientation together with sentiment prior knowledge and orientation prior knowledge, jointly samples orientation, sentiment and topic with a Gibbs sampling algorithm based on the relations among them, then computes the joint orientation-sentiment distribution of each microblog post and its final sentiment probability distribution, and thereby determines the sentiment polarity of the post.
To achieve the above object, the concept of the invention is as follows: obtain and preprocess the target microblog dataset to be analyzed; introduce sentiment prior knowledge and orientation prior knowledge, and sample the orientation, sentiment and topic labels of each term with a Gibbs sampling algorithm; compute the joint orientation-sentiment distribution of each post; compute the final sentiment polarity probability distribution of each post and thereby determine its sentiment polarity.
Based on the above inventive concept, the present invention adopts the following technical solution:
A Chinese microblog sentiment analysis method based on the subjective and objective orientation of terms, characterized by comprising the following steps:
Step 1: obtain the target microblog dataset to be analyzed;
Step 2: preprocess each microblog post by word segmentation, part-of-speech tagging, stop-word filtering and similar operations, and merge each sentiment word with a directly preceding negation word;
Step 3: introduce sentiment prior knowledge and orientation prior knowledge into the preprocessed microblog data. The sentiment prior knowledge comprises sentiment words and emoticons. The orientation of a term is either subjective or objective: a subjective term tends to express subjective sentiment, whereas an objective term tends to describe objective things. The method uses emoticons as subjective orientation prior knowledge, and time words, place words and pronouns as objective orientation prior knowledge. The sentiment and orientation prior knowledge are introduced as follows:
Step 3a: construct an empty S×V sentiment transfer matrix λ, a K×V orientation transfer matrix η, a K×S×T×V matrix β and a final prior matrix F(β, η, λ), where S, T, K and V are the number of sentiment labels, the number of topics, the number of orientation labels and the number of distinct terms in the dataset, respectively;
Step 3b: initialize all elements of $\eta_{K\times V}$ and $\lambda_{S\times V}$ to 1;
Step 3c: for each term w ∈ {1, ..., V}, each orientation label c ∈ {1, ..., K} and each sentiment label l ∈ {1, ..., S}: if w is orientation prior knowledge, the element $\eta_{c,w}$ of $\eta_{K\times V}$ is updated as
$$\eta_{c,w} = \begin{cases} 1, & c = K(w) \\ 0, & c \neq K(w) \end{cases}$$
and if w is sentiment prior knowledge, the element $\lambda_{l,w}$ of $\lambda_{S\times V}$ is updated as
$$\lambda_{l,w} = \begin{cases} 1, & l = S(w) \\ 0, & l \neq S(w) \end{cases}$$
where K(w) is the orientation label of w and S(w) is the sentiment label of w;
Step 3d: for each term w ∈ {1, ..., V}, each sentiment label l ∈ {1, ..., S}, each orientation label c ∈ {1, ..., K} and each topic z ∈ {1, ..., T}, the final prior $F_{c,l,z,w}(\beta,\eta,\lambda)$ is
$$F_{c,l,z,w}(\beta,\eta,\lambda) = \eta_{c,w}\cdot\beta_{c,l,z,w}\cdot\lambda_{l,w}$$
Step 4: based on the preprocessed microblog data and the prior knowledge, sample the orientation, sentiment and topic labels of each term with a Gibbs sampling algorithm. For the term $w_i$ at each position $i$ in the dataset, the orientation label $c_i$, sentiment label $l_i$ and topic label $z_i$ are sampled from
$$P(c_i=k, l_i=s, z_i=t \mid w, c^{-i}, l^{-i}, z^{-i}, \varepsilon, \gamma, \alpha, \beta, \eta, \lambda) \propto \frac{N_{d,k}^{-i}+\varepsilon}{N_{d}^{-i}+K\varepsilon} \cdot \frac{N_{d,k,s}^{-i}+\gamma}{N_{d,k}^{-i}+S\gamma} \cdot \frac{N_{d,k,s,t}^{-i}+\alpha_{k,s,t}}{N_{d,k,s}^{-i}+\sum_{t'=1}^{T}\alpha_{k,s,t'}} \cdot \frac{N_{k,s,t,w_i}^{-i}+F_{k,s,t,w_i}}{N_{k,s,t}^{-i}+\sum_{v=1}^{V}F_{k,s,t,v}}$$
where $N_d$ is the number of terms in the post $d$ containing $w_i$; $N_{d,k}$ is the number of terms in $d$ with orientation $k$; $N_{d,k,s}$ is the number of terms in $d$ with orientation $k$ and sentiment $s$; $N_{d,k,s,t}$ is the number of terms in $d$ with orientation $k$, sentiment $s$ and topic $t$; $N_{k,s,t}$ is the number of terms in the dataset with orientation $k$, sentiment $s$ and topic $t$; and $N_{k,s,t,w}$ is the number of occurrences of term $w$ in the dataset with orientation $k$, sentiment $s$ and topic $t$. In addition, $\varepsilon$ and $\gamma$ are the prior counts of the orientation and sentiment labels and are set empirically; $\alpha$ is the prior count of the topics and is learned by maximum likelihood estimation; and the superscript $-i$ means that the current term is excluded;
Step 5: after a certain number of sampling iterations, compute the joint orientation-sentiment distribution of each microblog post; for post $d$ it is
$$\pi_{d,k,s} = \frac{N_{d,k,s}+\gamma}{N_{d,k}+S\gamma}$$
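Step 6: compute the final sentiment probability distribution of each post and take the sentiment polarity with the highest probability as the sentiment polarity of the post; for post $d$ it is
$$\Pi_{d,s} = W_{sub}\cdot\frac{N_{d,k_{sub},s}+\gamma}{N_{d,k_{sub}}+S\gamma} + W_{obj}\cdot\frac{N_{d,k_{obj},s}+\gamma}{N_{d,k_{obj}}+S\gamma}, \quad W_{sub} > W_{obj} > 0$$
where $k_{sub}$ and $k_{obj}$ are the subjective and objective orientation labels and $W_{sub}$ and $W_{obj}$ are their weights.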
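The Step 4 sampling distribution can be evaluated factor by factor from the count tables. The following is a minimal, illustrative Python sketch, not the patent's implementation: it assumes 0-based indices, counts stored as nested lists that already exclude the current term (the $-i$ counts), and a hypothetical function name `conditional_scores`. It returns the unnormalized weight of every (orientation, sentiment, topic) triple, with one variable per fraction in the formula.

```python
def conditional_scores(w, n_d, n_dk, n_dks, n_dkst, n_kst, n_kstw,
                       F, alpha, eps, gamma):
    """Unnormalised P(c_i=k, l_i=s, z_i=t | ...) for every triple (k, s, t).

    All counts must already exclude the current term (the "-i" counts):
      n_d                        number of terms in post d
      n_dk[k], n_dks[k][s],
      n_dkst[k][s][t]            counts inside post d
      n_kst[k][s][t],
      n_kstw[k][s][t][w]         counts over the whole dataset
    F[k][s][t][w] is the final prior and alpha[k][s][t] the topic prior count.
    """
    K, S, T = len(alpha), len(alpha[0]), len(alpha[0][0])
    scores = {}
    for k in range(K):
        p_orient = (n_dk[k] + eps) / (n_d + K * eps)
        for s in range(S):
            p_sent = (n_dks[k][s] + gamma) / (n_dk[k] + S * gamma)
            alpha_sum = sum(alpha[k][s])
            for t in range(T):
                p_topic = (n_dkst[k][s][t] + alpha[k][s][t]) / (n_dks[k][s] + alpha_sum)
                p_word = ((n_kstw[k][s][t][w] + F[k][s][t][w])
                          / (n_kst[k][s][t] + sum(F[k][s][t])))
                scores[(k, s, t)] = p_orient * p_sent * p_topic * p_word
    return scores


# Toy setting: 2 orientations, 2 sentiments, 1 topic, 2 terms, all counts zero.
K, S, T, V = 2, 2, 1, 2
F = [[[[0.1] * V for _ in range(T)] for _ in range(S)] for _ in range(K)]
F[0][0][0][1] = F[0][1][0][1] = 0.0  # the prior bars term 1 from orientation 0
alpha = [[[1.0] * T for _ in range(S)] for _ in range(K)]
zero3 = [[[0] * T for _ in range(S)] for _ in range(K)]
zero4 = [[[[0] * V for _ in range(T)] for _ in range(S)] for _ in range(K)]
scores = conditional_scores(w=1, n_d=0, n_dk=[0] * K, n_dks=[[0] * S for _ in range(K)],
                            n_dkst=zero3, n_kst=zero3, n_kstw=zero4,
                            F=F, alpha=alpha, eps=0.5, gamma=0.5)
print(scores[(0, 0, 0)], scores[(1, 0, 0)])  # 0.0 0.125
```

Because the last factor multiplies by the prior $F$, a term whose prior is zero under some orientation gets zero weight for every triple with that orientation; this is how the prior knowledge constrains sampling.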
Compared with the prior art, the Chinese microblog sentiment analysis method based on the subjective and objective orientation of terms according to the present invention has the following prominent features and advantages. First, it treats sentiment, orientation and topic in microblog text as interrelated and therefore analyzes the semantics of the text more deeply. Second, by introducing both sentiment prior knowledge and orientation prior knowledge, it makes full use of emoticons and also exploits the part-of-speech tags of terms, so that the sentiment-related features of the text are used to a greater extent. Third, it exploits the influence of orientation on sentiment by jointly sampling sentiment, orientation and topic, which makes the final sentiment classification more accurate.
Brief description of the drawings
Fig. 1 is a flow chart of the Chinese microblog sentiment analysis method based on the subjective and objective orientation of terms according to the present invention.
Embodiment
The embodiments of the present invention are described further below with reference to the accompanying drawing.
The method and principle of the present invention are introduced first:
A Chinese microblog sentiment analysis method based on the subjective and objective orientation of terms, characterized in that its specific steps are as follows:
(1) obtain the target microblog dataset to be analyzed;
(2) preprocess each microblog post by word segmentation, part-of-speech tagging, stop-word filtering and similar operations, and merge each sentiment word with a directly preceding negation word;
(3) introduce sentiment prior knowledge and orientation prior knowledge into the preprocessed microblog data;
(4) based on the preprocessed microblog data and the prior knowledge, sample the orientation, sentiment and topic labels of each term with a Gibbs sampling algorithm;
(5) after a certain number of sampling iterations, compute the joint orientation-sentiment distribution of each post;
(6) compute the final sentiment probability distribution of each post and take the sentiment polarity with the highest probability as the sentiment polarity of the post.
In step (2), once a sentiment word and a negation word have been merged into a new word, the new word no longer carries the sentiment attribute that the sentiment word had before.
In step (3), the sentiment prior knowledge comprises sentiment words and emoticons. The orientation of a term is either subjective or objective: a subjective term tends to express subjective sentiment, whereas an objective term tends to describe objective things. The method uses emoticons as subjective orientation prior knowledge, and time words, place words and pronouns as objective orientation prior knowledge.
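Steps (2) and (3) can be illustrated with a short preprocessing sketch. It assumes segmentation and tagging have already been done (the embodiments use NLPIR for that); the negation list, the sentiment lexicon, the tag names and the bracket test for emoticons are illustrative assumptions, not details given in the patent, and `merge_negations` and `orientation_priors` are hypothetical helper names.

```python
# Illustrative word lists: the patent does not publish its negation list,
# sentiment lexicon or tag set, so all three are assumptions.
NEGATIONS = {"不", "没", "没有"}            # "not", "did not", "have not"
SENTIMENT_WORDS = {"开心", "喜欢", "难过"}  # "happy", "like", "sad"
OBJECTIVE_TAGS = {"time word", "place word", "pronoun"}


def merge_negations(tokens, tags):
    """Merge each sentiment word with a negation word directly before it.

    The merged term (e.g. "不开心", "not happy") is a new term and does not
    inherit the sentiment attribute of the original sentiment word.
    """
    out_tokens, out_tags = [], []
    i = 0
    while i < len(tokens):
        if (tokens[i] in NEGATIONS and i + 1 < len(tokens)
                and tokens[i + 1] in SENTIMENT_WORDS):
            out_tokens.append(tokens[i] + tokens[i + 1])
            out_tags.append(tags[i + 1])  # assumption: keep the sentiment word's tag
            i += 2
        else:
            out_tokens.append(tokens[i])
            out_tags.append(tags[i])
            i += 1
    return out_tokens, out_tags


def orientation_priors(tokens, tags):
    """Return 'subjective', 'objective' or None (no prior) for each term."""
    priors = []
    for token, tag in zip(tokens, tags):
        if token.startswith("[") and token.endswith("]"):  # Weibo emoticon, e.g. "[haha]"
            priors.append("subjective")
        elif tag in OBJECTIVE_TAGS:
            priors.append("objective")
        else:
            priors.append(None)
    return priors


# English glosses of the Embodiment 2 term sequence and its tags.
tokens = ["today", "buy", "new", "mobile phone", "good", "happy", "[haha]"]
tags = ["time word", "verb", "adjective", "noun", "adjective", "prefix", "character string"]
print(orientation_priors(tokens, tags))
# ['objective', None, None, None, None, None, 'subjective']
```

Merging before prior assignment matters: the merged term is a new vocabulary entry, so it is not looked up as sentiment prior knowledge under the original word's polarity.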
The sentiment and orientation prior knowledge are introduced as follows:
(3a) Construct an empty S×V sentiment transfer matrix λ, a K×V orientation transfer matrix η, a K×S×T×V matrix β and a final prior matrix F(β, η, λ), where S, T, K and V are the number of sentiment labels, the number of topics, the number of orientation labels and the number of distinct terms in the dataset, respectively;
(3b) Initialize all elements of $\eta_{K\times V}$ and $\lambda_{S\times V}$ to 1;
(3c) For each term w ∈ {1, ..., V}, each orientation label c ∈ {1, ..., K} and each sentiment label l ∈ {1, ..., S}: if w is orientation prior knowledge, the element $\eta_{c,w}$ of $\eta_{K\times V}$ is updated as
$$\eta_{c,w} = \begin{cases} 1, & c = K(w) \\ 0, & c \neq K(w) \end{cases}$$
and if w is sentiment prior knowledge, the element $\lambda_{l,w}$ of $\lambda_{S\times V}$ is updated as
$$\lambda_{l,w} = \begin{cases} 1, & l = S(w) \\ 0, & l \neq S(w) \end{cases}$$
where K(w) is the orientation label of w and S(w) is the sentiment label of w;
(3d) For each term w ∈ {1, ..., V}, each sentiment label l ∈ {1, ..., S}, each orientation label c ∈ {1, ..., K} and each topic z ∈ {1, ..., T}, the final prior $F_{c,l,z,w}(\beta,\eta,\lambda)$ is
$$F_{c,l,z,w}(\beta,\eta,\lambda) = \eta_{c,w}\cdot\beta_{c,l,z,w}\cdot\lambda_{l,w}$$
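Steps (3a) to (3d) map onto a few nested loops. The sketch below is one possible Python rendering, not the patent's code. It assumes 0-based ids, that a prior term gets a one-hot column (η_{c,w} = 1 only for c = K(w), and likewise for λ, consistent with the η and λ columns shown in Embodiment 2), and a single value for every element of β (Embodiment 2 uses 0.1). The name `build_priors` is hypothetical, and plain lists are used only to stay dependency-free.

```python
def build_priors(V, K, S, T, orientation_prior, sentiment_prior, beta=0.1):
    """Return eta (K x V), lam (S x V) and F (K x S x T x V) as nested lists.

    orientation_prior maps a term id w to its orientation label K(w);
    sentiment_prior maps w to its sentiment label S(w). Ids and labels are
    0-based. beta is the single value used for every element of beta.
    """
    # (3b) every element of eta and lam starts at 1
    eta = [[1.0] * V for _ in range(K)]
    lam = [[1.0] * V for _ in range(S)]
    # (3c) a prior term keeps weight only on its own label (one-hot column)
    for w, label in orientation_prior.items():
        for c in range(K):
            eta[c][w] = 1.0 if c == label else 0.0
    for w, label in sentiment_prior.items():
        for l in range(S):
            lam[l][w] = 1.0 if l == label else 0.0
    # (3d) F[c][l][z][w] = eta[c][w] * beta * lam[l][w]
    F = [[[[eta[c][w] * beta * lam[l][w] for w in range(V)]
           for _ in range(T)]
          for l in range(S)]
         for c in range(K)]
    return eta, lam, F


# Toy vocabulary of 6 terms; term 5 is a subjective (label 1), positive (label 0) emoticon.
eta, lam, F = build_priors(V=6, K=2, S=2, T=3,
                           orientation_prior={5: 1}, sentiment_prior={5: 0})
print([eta[c][5] for c in range(2)], [lam[l][5] for l in range(2)])
# [0.0, 1.0] [1.0, 0.0]
```

Terms without prior knowledge keep η = λ = 1, so their F entries equal β and every label remains possible for them.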
In step (4), the orientation label $c_i$, sentiment label $l_i$ and topic label $z_i$ of the term $w_i$ at each position $i$ in the dataset are sampled from
$$P(c_i=k, l_i=s, z_i=t \mid w, c^{-i}, l^{-i}, z^{-i}, \varepsilon, \gamma, \alpha, \beta, \eta, \lambda) \propto \frac{N_{d,k}^{-i}+\varepsilon}{N_{d}^{-i}+K\varepsilon} \cdot \frac{N_{d,k,s}^{-i}+\gamma}{N_{d,k}^{-i}+S\gamma} \cdot \frac{N_{d,k,s,t}^{-i}+\alpha_{k,s,t}}{N_{d,k,s}^{-i}+\sum_{t'=1}^{T}\alpha_{k,s,t'}} \cdot \frac{N_{k,s,t,w_i}^{-i}+F_{k,s,t,w_i}}{N_{k,s,t}^{-i}+\sum_{v=1}^{V}F_{k,s,t,v}}$$
where $N_d$ is the number of terms in the post $d$ containing $w_i$; $N_{d,k}$ is the number of terms in $d$ with orientation $k$; $N_{d,k,s}$ is the number of terms in $d$ with orientation $k$ and sentiment $s$; $N_{d,k,s,t}$ is the number of terms in $d$ with orientation $k$, sentiment $s$ and topic $t$; $N_{k,s,t}$ is the number of terms in the dataset with orientation $k$, sentiment $s$ and topic $t$; and $N_{k,s,t,w}$ is the number of occurrences of term $w$ in the dataset with orientation $k$, sentiment $s$ and topic $t$. In addition, $\varepsilon$ and $\gamma$ are the prior counts of the orientation and sentiment labels and are set empirically; $\alpha$ is the prior count of the topics and is learned by maximum likelihood estimation; and the superscript $-i$ means that the current term is excluded.
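Using the step (4) formula inside a sampler requires count bookkeeping: each term's current labels are removed from the counts (which produces the $-i$ counts), a new (orientation, sentiment, topic) triple is drawn in proportion to the formula, and the counts are restored. The following is a minimal collapsed Gibbs sketch under stated assumptions: posts are lists of 0-based term ids, α is held fixed (the maximum-likelihood update of α is omitted), initial labels are drawn at random among the triples the prior allows, and `gibbs_sample` is a hypothetical name.

```python
import random
from collections import Counter


def gibbs_sample(docs, F, alpha, eps, gamma, iterations=200, seed=0):
    """Collapsed Gibbs sampling of (orientation, sentiment, topic) labels per term.

    docs: list of posts, each a list of 0-based term ids.
    F[k][s][t][w]: final prior; alpha[k][s][t]: topic prior counts, held fixed
    here (the maximum-likelihood update of alpha is omitted).
    Returns the labels and the per-post counters n_dk[d, k] and n_dks[d, k, s].
    """
    rng = random.Random(seed)
    K, S, T = len(alpha), len(alpha[0]), len(alpha[0][0])
    keys = [(k, s, t) for k in range(K) for s in range(S) for t in range(T)]
    f_sum = {key: sum(F[key[0]][key[1]][key[2]]) for key in keys}
    n_d, n_dk, n_dks, n_dkst, n_kst, n_kstw = (Counter() for _ in range(6))

    def update(d, w, k, s, t, delta):
        # Add (delta=1) or remove (delta=-1) one term from every count table.
        n_d[d] += delta
        n_dk[d, k] += delta
        n_dks[d, k, s] += delta
        n_dkst[d, k, s, t] += delta
        n_kst[k, s, t] += delta
        n_kstw[k, s, t, w] += delta

    # Random initial labels, restricted to triples the prior allows for the term.
    labels = []
    for d, doc in enumerate(docs):
        labels.append([])
        for w in doc:
            key = rng.choice([x for x in keys if F[x[0]][x[1]][x[2]][w] > 0])
            labels[d].append(key)
            update(d, w, *key, 1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, *labels[d][i], -1)  # counts now exclude term i
                weights = [
                    (n_dk[d, k] + eps) / (n_d[d] + K * eps)
                    * (n_dks[d, k, s] + gamma) / (n_dk[d, k] + S * gamma)
                    * (n_dkst[d, k, s, t] + alpha[k][s][t])
                    / (n_dks[d, k, s] + sum(alpha[k][s]))
                    * (n_kstw[k, s, t, w] + F[k][s][t][w])
                    / (n_kst[k, s, t] + f_sum[k, s, t])
                    for k, s, t in keys
                ]
                labels[d][i] = rng.choices(keys, weights=weights)[0]
                update(d, w, *labels[d][i], 1)
    return labels, n_dk, n_dks


# Toy run: term 3 is an emoticon allowed only orientation 1 (subjective) and sentiment 0 (positive).
K, S, T, V = 2, 2, 2, 4
F = [[[[0.0 if w == 3 and (k, s) != (1, 0) else 0.1 for w in range(V)]
       for t in range(T)] for s in range(S)] for k in range(K)]
alpha = [[[0.5] * T for _ in range(S)] for _ in range(K)]
labels, n_dk, n_dks = gibbs_sample([[0, 1, 2], [2, 3]], F, alpha, eps=0.5, gamma=0.5)
print(labels[1][1][:2])  # (1, 0)
```

`random.choices` never selects an item whose weight is zero, so the emoticon keeps the subjective, positive labels its prior allows throughout sampling.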
In step (5), the joint orientation-sentiment distribution of post $d$ is computed as
$$\pi_{d,k,s} = \frac{N_{d,k,s}+\gamma}{N_{d,k}+S\gamma}$$
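Once sampling has finished, the step (5) joint distribution is a smoothed ratio of per-post counts. A minimal sketch, assuming the counts of one post are passed as plain lists read from the final Gibbs state; `orientation_sentiment_joint` and the toy counts are hypothetical.

```python
def orientation_sentiment_joint(n_dk, n_dks, gamma):
    """pi_{d,k,s} = (N_{d,k,s} + gamma) / (N_{d,k} + S * gamma) for one post d.

    n_dk[k]: terms of post d with orientation k;
    n_dks[k][s]: terms of post d with orientation k and sentiment s.
    """
    S = len(n_dks[0])
    return [[(n_dks[k][s] + gamma) / (n_dk[k] + S * gamma) for s in range(S)]
            for k in range(len(n_dk))]


pi = orientation_sentiment_joint(n_dk=[3, 4], n_dks=[[2, 1], [4, 0]], gamma=0.5)
print(pi)  # [[0.625, 0.375], [0.9, 0.1]]
```

Each row sums to 1 because the sentiment counts under one orientation add up to that orientation's count, so row k is the smoothed sentiment distribution of the post under orientation k.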
In step (6), the final sentiment probability distribution of post $d$ is computed as
$$\Pi_{d,s} = W_{sub}\cdot\frac{N_{d,k_{sub},s}+\gamma}{N_{d,k_{sub}}+S\gamma} + W_{obj}\cdot\frac{N_{d,k_{obj},s}+\gamma}{N_{d,k_{obj}}+S\gamma}, \quad W_{sub} > W_{obj} > 0$$
where $k_{sub}$ denotes the subjective orientation label and $k_{obj}$ the objective orientation label, and $W_{sub}$ and $W_{obj}$ are the weights that the sentiment distributions under subjective and objective orientation, respectively, carry in the overall sentiment distribution. The sentiment distribution under subjective orientation is given the larger weight, i.e. $W_{sub} > W_{obj}$.
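Step (6) mixes the subjective and objective sentiment distributions with weights $W_{sub} > W_{obj} > 0$. The patent does not give the weight values, so 0.7 and 0.3 below are purely hypothetical, as are the counts and the name `final_sentiment`. Orientation index 1 is taken as subjective and 0 as objective, following the (objective, subjective) order of the η column in Embodiment 2.

```python
def final_sentiment(n_dk, n_dks, gamma, k_sub, k_obj, w_sub, w_obj):
    """Pi_{d,s} = W_sub * pi_{d,k_sub,s} + W_obj * pi_{d,k_obj,s} for one post d."""
    if not w_sub > w_obj > 0:
        raise ValueError("the method requires W_sub > W_obj > 0")
    S = len(n_dks[0])

    def pi(k, s):
        # Smoothed share of orientation-k terms that carry sentiment s.
        return (n_dks[k][s] + gamma) / (n_dk[k] + S * gamma)

    return [w_sub * pi(k_sub, s) + w_obj * pi(k_obj, s) for s in range(S)]


# Hypothetical weights and counts; the patent only states W_sub > W_obj > 0.
scores = final_sentiment([3, 4], [[2, 1], [4, 0]], gamma=0.5,
                         k_sub=1, k_obj=0, w_sub=0.7, w_obj=0.3)
polarity = max(range(len(scores)), key=scores.__getitem__)  # index of the largest Pi_{d,s}
```

Here the subjective row favors sentiment 0 strongly, and its larger weight lets it dominate the decision, which is the intended effect of $W_{sub} > W_{obj}$.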
Specific embodiments of the above method follow.
Embodiment 1: Referring to Fig. 1, the Chinese microblog sentiment analysis method based on the subjective and objective orientation of terms is characterized in that, for a microblog dataset, it introduces the concept of term orientation together with sentiment prior knowledge and orientation prior knowledge, jointly samples orientation, sentiment and topic with a Gibbs sampling algorithm based on the relations among them, and thereby determines the sentiment polarity of each microblog post.
The sentiment prior knowledge and orientation prior knowledge are introduced as follows:
(3a) Construct an empty S×V sentiment transfer matrix λ, a K×V orientation transfer matrix η, a K×S×T×V matrix β and a final prior matrix F(β, η, λ), where S, T, K and V are the number of sentiment labels, the number of topics, the number of orientation labels and the number of distinct terms in the dataset, respectively;
(3b) Initialize all elements of $\eta_{K\times V}$ and $\lambda_{S\times V}$ to 1;
(3c) For each term w ∈ {1, ..., V}, each orientation label c ∈ {1, ..., K} and each sentiment label l ∈ {1, ..., S}: if w is orientation prior knowledge, the element $\eta_{c,w}$ of $\eta_{K\times V}$ is updated as
$$\eta_{c,w} = \begin{cases} 1, & c = K(w) \\ 0, & c \neq K(w) \end{cases}$$
and if w is sentiment prior knowledge, the element $\lambda_{l,w}$ of $\lambda_{S\times V}$ is updated as
$$\lambda_{l,w} = \begin{cases} 1, & l = S(w) \\ 0, & l \neq S(w) \end{cases}$$
where K(w) is the orientation label of w and S(w) is the sentiment label of w;
(3d) For each term w ∈ {1, ..., V}, each sentiment label l ∈ {1, ..., S}, each orientation label c ∈ {1, ..., K} and each topic z ∈ {1, ..., T}, the final prior $F_{c,l,z,w}(\beta,\eta,\lambda)$ is
$$F_{c,l,z,w}(\beta,\eta,\lambda) = \eta_{c,w}\cdot\beta_{c,l,z,w}\cdot\lambda_{l,w}$$
The orientation label $c_i$, sentiment label $l_i$ and topic label $z_i$ of the term $w_i$ at each position $i$ in the dataset are sampled from
$$P(c_i=k, l_i=s, z_i=t \mid w, c^{-i}, l^{-i}, z^{-i}, \varepsilon, \gamma, \alpha, \beta, \eta, \lambda) \propto \frac{N_{d,k}^{-i}+\varepsilon}{N_{d}^{-i}+K\varepsilon} \cdot \frac{N_{d,k,s}^{-i}+\gamma}{N_{d,k}^{-i}+S\gamma} \cdot \frac{N_{d,k,s,t}^{-i}+\alpha_{k,s,t}}{N_{d,k,s}^{-i}+\sum_{t'=1}^{T}\alpha_{k,s,t'}} \cdot \frac{N_{k,s,t,w_i}^{-i}+F_{k,s,t,w_i}}{N_{k,s,t}^{-i}+\sum_{v=1}^{V}F_{k,s,t,v}}$$
where $N_d$ is the number of terms in the post $d$ containing $w_i$; $N_{d,k}$ is the number of terms in $d$ with orientation $k$; $N_{d,k,s}$ is the number of terms in $d$ with orientation $k$ and sentiment $s$; $N_{d,k,s,t}$ is the number of terms in $d$ with orientation $k$, sentiment $s$ and topic $t$; $N_{k,s,t}$ is the number of terms in the dataset with orientation $k$, sentiment $s$ and topic $t$; and $N_{k,s,t,w}$ is the number of occurrences of term $w$ in the dataset with orientation $k$, sentiment $s$ and topic $t$. In addition, $\varepsilon$ and $\gamma$ are the prior counts of the orientation and sentiment labels and are set empirically; $\alpha$ is the prior count of the topics and is learned by maximum likelihood estimation; and the superscript $-i$ means that the current term is excluded.
The joint orientation-sentiment distribution of each post $d$ is computed as
$$\pi_{d,k,s} = \frac{N_{d,k,s}+\gamma}{N_{d,k}+S\gamma}$$
The final sentiment probability distribution of post $d$ is computed as
$$\Pi_{d,s} = W_{sub}\cdot\frac{N_{d,k_{sub},s}+\gamma}{N_{d,k_{sub}}+S\gamma} + W_{obj}\cdot\frac{N_{d,k_{obj},s}+\gamma}{N_{d,k_{obj}}+S\gamma}, \quad W_{sub} > W_{obj} > 0$$
where $k_{sub}$ denotes the subjective orientation label and $k_{obj}$ the objective orientation label, and $W_{sub}$ and $W_{obj}$ are the weights that the sentiment distributions under subjective and objective orientation, respectively, carry in the overall sentiment distribution. The sentiment distribution under subjective orientation is given the larger weight, i.e. $W_{sub} > W_{obj}$.
Embodiment 2: Chinese microblog sentiment analysis based on the subjective and objective orientation of terms, applied to 3000 microblog posts crawled from the Sina Weibo website as the target dataset to be analyzed. As shown in Fig. 1, the steps of the method in this embodiment are as follows:
S1. Crawl 3000 microblog posts from the Sina Weibo website as the target dataset to be analyzed, for example the post "Bought a new phone today, so happy! [haha]";
S2. Preprocess each post by word segmentation, part-of-speech tagging, stop-word filtering and similar operations, and merge each sentiment word with a directly preceding negation word. For example, with NLPIR from the Chinese Academy of Sciences as the segmentation and part-of-speech tagging tool, the post "Bought a new phone today, so happy! [haha]" becomes the term sequence {"today", "buy", "new", "mobile phone", "good", "happy", "[haha]"} after processing, with the corresponding part-of-speech sequence {time word, verb, adjective, noun, adjective, prefix, character string};
S3. Introduce sentiment prior knowledge and orientation prior knowledge into the preprocessed microblog data. For example, the emoticon "[haha]" in the processed term sequence {"today", "buy", "new", "mobile phone", "good", "happy", "[haha]"} has index 5 in the corpus vocabulary and positive polarity. Its column $\eta_{\cdot,5}$ is therefore $[0,1]^T$, whose elements are the objective and subjective orientation priors, and its column $\lambda_{\cdot,5}$ is $[1,0]^T$, whose elements are the positive and negative polarity priors. With every element of β set to 0.1, the final prior $F_{\cdot,\cdot,z,5}$ for each topic z ∈ {1, ..., T} is
$$F_{\cdot,\cdot,z,5} = \begin{bmatrix} 0 & 0 \\ 0.1 & 0 \end{bmatrix}$$
where the rows correspond to the objective and subjective orientations and the columns to the positive and negative polarities;
S4. Sample the orientation label $c_i$, sentiment label $l_i$ and topic label $z_i$ of the term $w_i$ at each position $i$ in the dataset. For example, for the emoticon "[haha]" in the processed post $d$ ("today", "buy", "new", "mobile phone", "good", "happy", "[haha]"), the orientation, sentiment and topic labels are sampled from
$$P(c_i=k, l_i=s, z_i=t \mid w, c^{-i}, l^{-i}, z^{-i}, \varepsilon, \gamma, \alpha, \beta, \eta, \lambda) \propto \frac{N_{d,k}^{-i}+\varepsilon}{N_{d}^{-i}+K\varepsilon} \cdot \frac{N_{d,k,s}^{-i}+\gamma}{N_{d,k}^{-i}+S\gamma} \cdot \frac{N_{d,k,s,t}^{-i}+\alpha_{k,s,t}}{N_{d,k,s}^{-i}+\sum_{t'=1}^{T}\alpha_{k,s,t'}} \cdot \frac{N_{k,s,t,w_i}^{-i}+F_{k,s,t,w_i}}{N_{k,s,t}^{-i}+\sum_{v=1}^{V}F_{k,s,t,v}}$$
where $N_d$ is the number of terms in the post $d$ containing the current term $w_i$; $N_{d,k}$ is the number of terms in $d$ with orientation $k$; $N_{d,k,s}$ is the number of terms in $d$ with orientation $k$ and sentiment $s$; $N_{d,k,s,t}$ is the number of terms in $d$ with orientation $k$, sentiment $s$ and topic $t$; $N_{k,s,t}$ is the number of terms in the dataset with orientation $k$, sentiment $s$ and topic $t$; $N_{k,s,t,w}$ is the number of occurrences of term $w$ in the dataset with orientation $k$, sentiment $s$ and topic $t$; and the superscript $-i$ means that the current term is excluded. $\alpha$ is the prior count of the topics and is learned by maximum likelihood estimation. $\varepsilon$ and $\gamma$ are the prior counts of the orientation and sentiment labels and are set empirically to $\gamma = 0.1\cdot AL/S$ and $\varepsilon = 0.1\cdot AL/K$, where $AL$ is the average text length of the microblog dataset;
S5. After a certain number of sampling iterations, compute the joint orientation-sentiment distribution of each post. For example, after 1000 sampling iterations, the joint orientation-sentiment distribution of the post $d$ "Bought a new phone today, so happy! [haha]" is computed as
$$\pi_{d,k,s} = \frac{N_{d,k,s}+\gamma}{N_{d,k}+S\gamma}$$
S6. Compute the final sentiment probability distribution of each post and take the sentiment polarity with the highest probability as the sentiment polarity of the post. For example, the final sentiment probability distribution of the post $d$ "Bought a new phone today, so happy! [haha]" is computed as
$$\Pi_{d,s} = W_{sub}\cdot\frac{N_{d,k_{sub},s}+\gamma}{N_{d,k_{sub}}+S\gamma} + W_{obj}\cdot\frac{N_{d,k_{obj},s}+\gamma}{N_{d,k_{obj}}+S\gamma}$$
where $k_{sub}$ denotes the subjective orientation label and $k_{obj}$ the objective orientation label. After normalization, the probabilities that this post belongs to the positive and negative polarities are 0.893434 and 0.106566, respectively, so its sentiment polarity is judged to be positive.
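The arithmetic of Embodiment 2 can be checked with a few lines of Python. The sketch reproduces the $F_{\cdot,\cdot,z,5}$ slice for the emoticon from its η and λ columns with β = 0.1, computes γ and ε from an average post length, and normalizes a pair of Π scores. The post lengths and Π values are hypothetical placeholders (they are not the data behind the reported 0.893434 and 0.106566), measuring post length in terms is an assumption, and the helper names are illustrative.

```python
def f_slice(eta_col, lam_col, beta):
    """F[c][l] = eta[c] * beta * lam[l] for one term and one topic."""
    return [[eta_col[c] * beta * lam_col[l] for l in range(len(lam_col))]
            for c in range(len(eta_col))]


def label_prior_counts(post_lengths, S, K):
    """gamma = 0.1 * AL / S and eps = 0.1 * AL / K, AL = average post length."""
    AL = sum(post_lengths) / len(post_lengths)
    return 0.1 * AL / S, 0.1 * AL / K


def normalise(scores):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores)
    return [x / total for x in scores]


# Emoticon "[haha]", vocabulary index 5: subjective prior, positive polarity.
eta_5 = [0.0, 1.0]  # (objective, subjective)
lam_5 = [1.0, 0.0]  # (positive, negative)
print(f_slice(eta_5, lam_5, beta=0.1))  # [[0.0, 0.0], [0.1, 0.0]]

# Hypothetical post lengths (in terms), only to show the arithmetic.
gamma, eps = label_prior_counts([7, 5, 9], S=2, K=2)

# Hypothetical unnormalised Pi_{d,s} values for (positive, negative).
probs = normalise([0.6, 0.15])
polarity = ("positive", "negative")[probs.index(max(probs))]
```

Normalization is needed because $\Pi_{d,s}$ sums to $W_{sub} + W_{obj}$, which need not equal 1.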
The specific embodiments described herein merely illustrate the spirit of the invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the described embodiments, or substitute them in a similar way, without departing from the spirit of the invention or exceeding the scope of the appended claims.
Claims (1)
- A kind of 1. Chinese microblog emotional analysis method based on the subjective and objective skewed popularity of lexical item, it is characterised in that:Comprise the following steps:Step 1, obtain target microblog data collection to be analyzed;Step 2, segment every microblogging, the pre-operation such as part-of-speech tagging, stop words filtering, and to preceding connecing the emotion of negative word Word is combined operation;Step 3, to pretreated microblog data, introduce emotion priori and skewed popularity priori, emotion priori Including emotion word and emoticon;The skewed popularity of lexical item includes subjective skewed popularity and objective skewed popularity, the former refers to being partial to The subjective emotion of expression, the latter refer to being partial to describe objective things;This method uses emoticon first as subjective skewed popularity Knowledge is tested, time word, place word and pronoun are as objective skewed popularity priori;Introduce the mistake of emotion and skewed popularity priori Journey is specially:Step 3a, the β squares of skewed popularity transfer matrix η, K × S × T × V of transference matrix λ, K × V of the empty S × V of structure Battle array and final prior matrix F (β, η, λ);Wherein S, T, K, V represent emotion number, theme number, skewed popularity number sum number respectively According to the different lexical item numbers of concentration;Step 3b, ηK×VAnd λS×VElement be initialized as 1;Step 3c, l is marked for each lexical item w ∈ { 1 ..., V }, every kind of skewed popularity mark c ∈ { 1 ..., K } and every kind of emotion ∈ { 1 ..., S }, if w is skewed popularity priori, ηK×VIn element ηcwRenewal is as follows:If w is emotion priori, λS×VIn element λlwRenewal is as follows:Wherein, K (w) is the corresponding skewed popularity labels of w, and S (w) is the corresponding emotion labels of w;Step 3d, for each lexical item w ∈ { 1 ..., V }, every kind of emotion mark l ∈ { 1 ..., S }, 
every skewed-popularity (subjective/objective tendency) label c ∈ {1, ..., K} and every topic (theme) z ∈ {1, ..., T}, the final prior F_{c,l,z,w}(β, η, λ) is:

$$F_{c,l,z,w}(\beta,\eta,\lambda) = \eta_{c,w}\cdot\beta_{c,l,z,w}\cdot\lambda_{l,w}$$

Step 4: using the preprocessed microblog data and the prior, apply Gibbs sampling to sample the skewed-popularity, emotion and topic labels of every lexical item. For the lexical item w_i at each position i of the dataset, its skewed-popularity label c_i, emotion label l_i and topic label z_i are sampled as follows:

$$P(c_i=k,\, l_i=s,\, z_i=t \mid w, c^{-i}, l^{-i}, z^{-i}, \varepsilon, \gamma, \alpha, \beta, \eta, \lambda) \propto \frac{N_{d,k}^{-i}+\varepsilon}{N_{d}^{-i}+K\cdot\varepsilon}\cdot\frac{N_{d,k,s}^{-i}+\gamma}{N_{d,k}^{-i}+S\cdot\gamma}\cdot\frac{N_{d,k,s,t}^{-i}+\alpha_{k,s,t}}{N_{d,k,s}^{-i}+\sum_{t'=1}^{T}\alpha_{k,s,t'}}\cdot\frac{N_{k,s,t,w_i}^{-i}+F_{k,s,t,w_i}}{N_{k,s,t}^{-i}+\sum_{v=1}^{V}F_{k,s,t,v}}$$

where N_d is the number of lexical items in the text d that contains w_i; N_{d,k} is the number of lexical items in d with skewed popularity k; N_{d,k,s} is the number of lexical items in d with skewed popularity k and emotion s; N_{d,k,s,t} is the number of lexical items in d with skewed popularity k, emotion s and topic t; N_{k,s,t} is the number of lexical items in the whole dataset with skewed popularity k, emotion s and topic t; and N_{k,s,t,w} is the number of occurrences of lexical item w in the dataset with skewed popularity k, emotion s and topic t. In addition, ε and γ are the prior counts of the skewed-popularity and emotion labels and are set empirically; α is the prior count of the topics and is learned by maximum likelihood estimation; and the superscript -i means that the current lexical item is excluded from the count.

Step 5: after a given number of sampling iterations, compute the joint skewed-popularity/emotion distribution of every microblog. For microblog d:

$$\pi_{d,k,s} = \frac{N_{d,k,s}+\gamma}{N_{d,k}+S\cdot\gamma}$$

Step 6: compute the final emotion probability distribution of every microblog and take the emotion polarity with the highest probability as the polarity of the microblog. For microblog d, with k_sub and k_obj denoting the subjective and objective skewed-popularity labels:

$$\Pi_{d,s} = W_{\mathrm{sub}}\cdot\frac{N_{d,k_{\mathrm{sub}},s}+\gamma}{N_{d,k_{\mathrm{sub}}}+S\cdot\gamma} + W_{\mathrm{obj}}\cdot\frac{N_{d,k_{\mathrm{obj}},s}+\gamma}{N_{d,k_{\mathrm{obj}}}+S\cdot\gamma}, \qquad W_{\mathrm{sub}} > W_{\mathrm{obj}} > 0.$$
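Steps 3 to 6 can be sketched in pure Python as below. This is a minimal illustration under stated assumptions, not the applicant's implementation: the priors η, β, λ and α are small dense nested lists supplied by the caller, labels are initialised uniformly at random, the subjective and objective labels are assumed to be indices 0 and 1, and W_sub = 0.7, W_obj = 0.3 are placeholder weights (the claim only requires W_sub > W_obj > 0). All names (`build_prior`, `JointCounts`, `conditional`, `gibbs_sweep`, `run`, `pi`, `final_polarity`) are hypothetical.

```python
import random
from collections import Counter


def build_prior(eta, beta, lam):
    """Step 3: F[k][s][t][v] = eta[k][v] * beta[k][s][t][v] * lam[s][v].

    Assumed layout: eta is K x V, beta is K x S x T x V, lam is S x V."""
    K, S, T, V = len(beta), len(beta[0]), len(beta[0][0]), len(beta[0][0][0])
    return [[[[eta[k][v] * beta[k][s][t][v] * lam[s][v] for v in range(V)]
              for t in range(T)] for s in range(S)] for k in range(K)]


class JointCounts:
    """Sparse tables N_d, N_{d,k}, N_{d,k,s}, N_{d,k,s,t}, N_{k,s,t}, N_{k,s,t,w}."""

    def __init__(self):
        self.c = Counter()

    def update(self, d, k, s, t, w, delta):
        for key in (("d", d), ("dk", d, k), ("dks", d, k, s), ("dkst", d, k, s, t),
                    ("kst", k, s, t), ("kstw", k, s, t, w)):
            self.c[key] += delta


def conditional(counts, d, w, K, S, T, eps, gamma, alpha, F):
    """Unnormalised P(c_i=k, l_i=s, z_i=t | rest) for every (k, s, t).

    `counts` must already exclude the current token (the "-i" counts)."""
    n = counts.c
    weights = {}
    for k in range(K):
        p_k = (n[("dk", d, k)] + eps) / (n[("d", d)] + K * eps)
        for s in range(S):
            p_s = (n[("dks", d, k, s)] + gamma) / (n[("dk", d, k)] + S * gamma)
            for t in range(T):
                p_t = (n[("dkst", d, k, s, t)] + alpha[k][s][t]) / (
                    n[("dks", d, k, s)] + sum(alpha[k][s]))
                p_w = (n[("kstw", k, s, t, w)] + F[k][s][t][w]) / (
                    n[("kst", k, s, t)] + sum(F[k][s][t]))
                weights[(k, s, t)] = p_k * p_s * p_t * p_w
    return weights


def gibbs_sweep(docs, labels, counts, rng, hp):
    """Step 4: resample (c_i, l_i, z_i) jointly, once for every token."""
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k, s, t = labels[d][i]
            counts.update(d, k, s, t, w, -1)  # remove token i -> "-i" counts
            weights = conditional(counts, d, w, **hp)
            keys = list(weights)
            k, s, t = rng.choices(keys, weights=[weights[x] for x in keys])[0]
            labels[d][i] = (k, s, t)
            counts.update(d, k, s, t, w, +1)  # add token i back with new labels


def run(docs, K, S, T, eps, gamma, alpha, F, iters=50, seed=0):
    """Random initialisation (an assumption) followed by `iters` Gibbs sweeps."""
    rng = random.Random(seed)
    counts, labels = JointCounts(), []
    for d, doc in enumerate(docs):
        labels.append([])
        for w in doc:
            k, s, t = rng.randrange(K), rng.randrange(S), rng.randrange(T)
            labels[d].append((k, s, t))
            counts.update(d, k, s, t, w, +1)
    hp = dict(K=K, S=S, T=T, eps=eps, gamma=gamma, alpha=alpha, F=F)
    for _ in range(iters):
        gibbs_sweep(docs, labels, counts, rng, hp)
    return counts


def pi(counts, d, k, s, S, gamma):
    """Step 5: pi_{d,k,s} = (N_{d,k,s} + gamma) / (N_{d,k} + S * gamma)."""
    return (counts.c[("dks", d, k, s)] + gamma) / (counts.c[("dk", d, k)] + S * gamma)


# Placeholder label indices and weights (assumption; the claim only needs w_sub > w_obj > 0).
def final_polarity(counts, d, S, gamma, k_sub=0, k_obj=1, w_sub=0.7, w_obj=0.3):
    """Step 6: Pi_{d,s} = w_sub * pi_{d,k_sub,s} + w_obj * pi_{d,k_obj,s}; argmax over s."""
    assert w_sub > w_obj > 0
    scores = [w_sub * pi(counts, d, k_sub, s, S, gamma)
              + w_obj * pi(counts, d, k_obj, s, S, gamma) for s in range(S)]
    return max(range(S), key=scores.__getitem__), scores
```

Because each π_{d,k,·} sums to 1 over the S emotions, choosing W_sub + W_obj = 1 (as the placeholder defaults do) also makes Π_{d,·} a proper distribution; the claim itself does not state this constraint. For brevity the sketch recomputes the Σα and ΣF denominators for every token, which a practical implementation would cache.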
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711279503.7A (CN108038166A) | 2017-12-06 | 2017-12-06 | A kind of Chinese microblog emotional analysis method based on the subjective and objective skewed popularity of lexical item |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108038166A | 2018-05-15 |
Family ID: 62095663
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201711279503.7A (CN108038166A) | Pending | 2017-12-06 | 2017-12-06 | A kind of Chinese microblog emotional analysis method based on the subjective and objective skewed popularity of lexical item |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108038166A |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100257117A1 (en) * | 2009-04-03 | 2010-10-07 | Bulloons.Com Ltd. | Predictions based on analysis of online electronic messages |
CN102663046A (en) * | 2012-03-29 | 2012-09-12 | 中国科学院自动化研究所 | Sentiment analysis method oriented to micro-blog short text |
CN103150367A (en) * | 2013-03-07 | 2013-06-12 | 宁波成电泰克电子信息技术发展有限公司 | Method for analyzing emotional tendency of Chinese microblogs |
CN103995853A (en) * | 2014-05-12 | 2014-08-20 | 中国科学院计算技术研究所 | Multi-language emotional data processing and classifying method and system based on key sentences |
CN104679825A (en) * | 2015-01-06 | 2015-06-03 | 中国农业大学 | Web text-based acquiring and screening method of seismic macroscopic anomaly information |
2017-12-06: application CN201711279503.7A filed (publication CN108038166A); status: Pending
Non-Patent Citations (2)
Title |
---|
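WAN X et al.: "Using bilingual knowledge and ensemble techniques for unsupervised Chinese sentiment analysis", Proceedings of the Conference on Empirical Methods in Natural Language Processing * |
张志琳 (Zhang Zhilin) et al.: "基于多样化特征的中文微博情感分类方法研究" (Research on Chinese microblog sentiment classification methods based on diversified features), 中文信息学报 (Journal of Chinese Information Processing) * |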
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866087A (en) * | 2019-08-12 | 2020-03-06 | 上海大学 | Entity-oriented text emotion analysis method based on topic model |
CN110866087B (en) * | 2019-08-12 | 2023-11-17 | 上海大学 | Entity-oriented text emotion analysis method based on topic model |
US11966702B1 (en) * | 2020-08-17 | 2024-04-23 | Alphavu, Llc | System and method for sentiment and misinformation analysis of digital conversations |
CN112989033A (en) * | 2020-12-03 | 2021-06-18 | 昆明理工大学 | Microblog emotion classification method based on emotion category description |
CN113723084A (en) * | 2021-07-26 | 2021-11-30 | 内蒙古工业大学 | Mongolian text emotion analysis method fusing priori knowledge |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Dahou et al. | Word embeddings and convolutional neural network for arabic sentiment classification | |
CN108763326B (en) | Emotion analysis model construction method of convolutional neural network based on feature diversification | |
CN110472003B (en) | Social network text emotion fine-grained classification method based on graph convolution network | |
CN107038480A (en) | A kind of text sentiment classification method based on convolutional neural networks | |
CN109670039B (en) | Semi-supervised e-commerce comment emotion analysis method based on three-part graph and cluster analysis | |
CN103150367B (en) | A kind of Sentiment orientation analytical approach of Chinese microblogging | |
CN108108433A (en) | A kind of rule-based and the data network integration sentiment analysis method | |
CN108984530A (en) | A kind of detection method and detection system of network sensitive content | |
CN109299268A (en) | A kind of text emotion analysis method based on dual channel model | |
CN106407236B (en) | A kind of emotion tendency detection method towards comment data | |
CN107609132A (en) | One kind is based on Ontology storehouse Chinese text sentiment analysis method | |
CN108763216A (en) | A kind of text emotion analysis method based on Chinese data collection | |
Soliman et al. | Sentiment analysis of Arabic slang comments on facebook | |
CN107247703A (en) | Microblog emotional analysis method based on convolutional neural networks and integrated study | |
CN107203511A (en) | A kind of network text name entity recognition method based on neutral net probability disambiguation | |
CN106980609A (en) | A kind of name entity recognition method of the condition random field of word-based vector representation | |
CN104778256B (en) | A kind of the quick of field question answering system consulting can increment clustering method | |
CN107862087A (en) | Sentiment analysis method, apparatus and storage medium based on big data and deep learning | |
CN109446404A (en) | A kind of the feeling polarities analysis method and device of network public-opinion | |
CN106202372A (en) | A kind of method of network text information emotional semantic classification | |
CN107688630B (en) | Semantic-based weakly supervised microbo multi-emotion dictionary expansion method | |
CN108038166A (en) | A kind of Chinese microblog emotional analysis method based on the subjective and objective skewed popularity of lexical item | |
CN107305539A (en) | A kind of text tendency analysis method based on Word2Vec network sentiment new word discoveries | |
CN104268160A (en) | Evaluation object extraction method based on domain dictionary and semantic roles | |
CN106202584A (en) | A kind of microblog emotional based on standard dictionary and semantic rule analyzes method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180515 |