CN108038166A - A Chinese microblog emotion analysis method based on the subjective and objective skewed popularity of lexical items - Google Patents


Info

Publication number
CN108038166A
Authority
CN
China
Prior art keywords
emotion
skewed popularity
lexical item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711279503.7A
Other languages
Chinese (zh)
Inventor
刘进
郭峻材
陈雪
崔晓晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201711279503.7A
Publication of CN108038166A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a Chinese microblog emotion analysis method based on the subjective and objective skewed popularity of lexical items. The steps are as follows: (1) obtain the target microblog data set to be analyzed; (2) preprocess every microblog by word segmentation, part-of-speech tagging, and stop-word filtering, and combine each emotion word with the negation word that precedes it; (3) introduce emotion prior knowledge and skewed popularity prior knowledge into the preprocessed microblog data; (4) sample the skewed popularity, emotion, and theme label of each lexical item using the Gibbs sampling algorithm; (5) calculate the joint skewed popularity and emotion distribution variable of every microblog; (6) calculate the final emotion polarity probability distribution of every microblog and from it determine the emotion polarity of the microblog. For microblog data, the method proposes the concept of the subjective and objective skewed popularity of a lexical item (skewed popularity for short) and jointly models the relation among skewed popularity, emotion, and theme using the Gibbs algorithm. The method is simple and practical and can significantly improve microblog emotion classification performance.

Description

A Chinese microblog emotion analysis method based on the subjective and objective skewed popularity of lexical items
Technical field
The present invention relates to an emotion analysis method for Chinese microblogs. Specifically, for a microblog data set, it proposes the concept of the skewed popularity of a lexical item, introduces emotion prior knowledge and skewed popularity prior knowledge, and jointly samples skewed popularity, emotion, and theme with the Gibbs sampling algorithm based on the relation among the three. It then calculates the joint skewed popularity and emotion distribution variable of every microblog, calculates the final emotion probability distribution of every microblog, and from it determines the emotion polarity of the microblog.
Background art
In recent years, with the rapid development of Internet technology, social media platforms have emerged rapidly. People increasingly use social media such as microblogs to express their emotions or opinions, and a massive number of microblogs are produced and spread every day. Compared with traditional long text, microblog short text is brief, colloquial, non-standard, and feature-sparse. How to effectively mine emotion or opinion knowledge from it has become an important research direction.
At present there are two main classes of microblog emotion analysis methods: methods based on sentiment dictionaries and methods based on machine learning. Dictionary-based methods match the emotion words of a sentiment dictionary as keywords and thereby determine the emotion polarity or intensity of a text; their major shortcoming is that they depend too heavily on surface features. Machine-learning methods are further divided into fully supervised, weakly supervised, and unsupervised methods. Fully supervised methods first train an emotion classifier on a large manually labeled data set and then apply the trained classifier to further emotion classification; the manual labeling is especially time-consuming and laborious. Weakly supervised methods mainly use noisy labels, such as the emoticons in social network text, as the emotion labels of the text and then train a classifier with a fully supervised method, but the noise in the labels also affects classifier performance. Unsupervised methods need no training set and mostly use emotion words as emotion prior knowledge to guide the analysis.
Recent studies suggest that the emotion of a text depends on its theme, and many joint models of emotion and theme have been built on this idea. These unsupervised methods first construct a reasonable lexical item generation model from the relation between emotion and theme, then jointly sample the emotion and theme of each lexical item with the Gibbs sampling method, calculate the emotion distribution of the text, and take the emotion category with the highest probability as the emotion category of the text.
The above methods, which analyze microblog emotion based on the relation between emotion and theme, have the following shortcomings:
(1) they assume only that emotion and theme are dependent and do not consider the influence of skewed popularity on emotion;
(2) when applied to microblogs, they cannot make good use of emoticons, the most typical emotion feature;
(3) because they do not consider skewed popularity, they cannot use the skewed popularity prior knowledge contained in emoticons and in the parts of speech of lexical items.
Summary of the invention
The object of the invention is to address the shortcomings of current Chinese microblog emotion analysis by providing a Chinese microblog emotion analysis method based on the subjective and objective skewed popularity of lexical items. The method proposes the concept of the skewed popularity of a lexical item, introduces emotion prior knowledge and skewed popularity prior knowledge, and jointly samples skewed popularity, emotion, and theme with the Gibbs sampling algorithm based on the relation among the three. It then calculates the joint skewed popularity and emotion distribution variable of every microblog, calculates the final emotion probability distribution of every microblog, and from it determines the emotion polarity of the microblog.
To achieve the above object, the concept of the invention is as follows: obtain the target microblog data set to be analyzed and preprocess it; introduce emotion prior knowledge and skewed popularity prior knowledge, and sample the skewed popularity, emotion, and theme label of each lexical item using the Gibbs sampling algorithm; calculate the joint skewed popularity and emotion distribution variable of every microblog; calculate the final emotion polarity probability distribution of every microblog and from it determine the emotion polarity of the microblog.
Based on the above inventive concept, the present invention adopts the following technical solution:
A Chinese microblog emotion analysis method based on the subjective and objective skewed popularity of lexical items, characterized in that it comprises the following steps:
Step 1: obtain the target microblog data set to be analyzed;
Step 2: preprocess every microblog by word segmentation, part-of-speech tagging, and stop-word filtering, and combine each emotion word with the negation word that precedes it;
Step 3: introduce emotion prior knowledge and skewed popularity prior knowledge into the preprocessed microblog data. Emotion prior knowledge includes emotion words and emoticons. The skewed popularity of a lexical item is either subjective or objective: the former means the lexical item tends to express subjective emotion, the latter that it tends to describe objective things. The method uses emoticons as subjective skewed popularity prior knowledge, and time words, place words, and pronouns as objective skewed popularity prior knowledge. The process of introducing the emotion and skewed popularity prior knowledge is as follows:
Step 3a: construct an empty S × V emotion transfer matrix λ, a K × V skewed popularity transfer matrix η, a K × S × T × V matrix β, and the final prior matrix F(β, η, λ), where S, T, K, and V denote the number of emotions, the number of themes, the number of skewed popularity types, and the number of distinct lexical items in the data set, respectively;
Step 3b: initialize all elements of η (K × V) and λ (S × V) to 1;
Step 3c: for each lexical item w ∈ {1, ..., V}, each skewed popularity label c ∈ {1, ..., K}, and each emotion label l ∈ {1, ..., S}, if w is skewed popularity prior knowledge, the element $\eta_{c,w}$ of η is updated as follows:
If w is emotion prior knowledge, the element $\lambda_{l,w}$ of λ is updated as follows:
where K(w) is the skewed popularity label corresponding to w and S(w) is the emotion label corresponding to w;
Step 3d: for each lexical item w ∈ {1, ..., V}, each emotion label l ∈ {1, ..., S}, each skewed popularity label c ∈ {1, ..., K}, and each theme z ∈ {1, ..., T}, the final prior $F_{c,l,z,w}(\beta, \eta, \lambda)$ is:
$$F_{c,l,z,w}(\beta, \eta, \lambda) = \eta_{c,w} \cdot \beta_{c,l,z,w} \cdot \lambda_{l,w}$$
Step 4: based on the preprocessed microblog data and the prior knowledge, sample the skewed popularity, emotion, and theme label of each lexical item using the Gibbs sampling algorithm. For the lexical item $w_i$ at each position i in the data set, the skewed popularity label $c_i$, emotion label $l_i$, and theme label $z_i$ are sampled as follows:
$$P(c_i=k,\ l_i=s,\ z_i=t \mid w, c^{-i}, l^{-i}, z^{-i}, \varepsilon, \gamma, \alpha, \beta, \eta, \lambda) \propto \frac{N_{d,k}^{-i}+\varepsilon}{N_{d}^{-i}+K\cdot\varepsilon}\cdot\frac{N_{d,k,s}^{-i}+\gamma}{N_{d,k}^{-i}+S\cdot\gamma}\cdot\frac{N_{d,k,s,t}^{-i}+\alpha_{k,s,t}}{N_{d,k,s}^{-i}+\sum_{t}^{T}\alpha_{k,s,t}}\cdot\frac{N_{k,s,t,w_i}^{-i}+F_{k,s,t,w_i}}{N_{k,s,t}^{-i}+\sum_{v}^{V}F_{k,s,t,v}}$$
where $N_d$ is the number of lexical items in the text d that contains $w_i$; $N_{d,k}$ is the number of lexical items in text d that belong to skewed popularity k; $N_{d,k,s}$ is the number of lexical items in text d that belong to skewed popularity k and emotion s; $N_{d,k,s,t}$ is the number of lexical items in text d that belong to skewed popularity k, emotion s, and theme t; $N_{k,s,t}$ is the number of lexical items in the data set that belong to skewed popularity k, emotion s, and theme t; and $N_{k,s,t,w}$ is the number of occurrences of lexical item w in the data set that belong to skewed popularity k, emotion s, and theme t. In addition, ε and γ are the prior counts of the skewed popularity and emotion labels and are set empirically; α is the prior count of themes and is learned by maximum likelihood estimation; the superscript −i means that the current lexical item is excluded;
Step 5: after a certain number of sampling iterations, calculate the joint skewed popularity and emotion distribution variable of every microblog. The joint skewed popularity and emotion distribution variable of microblog d is calculated as follows:
Step 6: calculate the final emotion probability distribution of every microblog and select the emotion polarity with the highest probability as the emotion polarity of the microblog. The final emotion probability distribution of microblog d is calculated as follows:
Compared with the prior art, the Chinese microblog emotion analysis method of the present invention, based on the subjective and objective skewed popularity of lexical items, has the following outstanding features and advantages. First, it treats emotion, skewed popularity, and theme in microblog text as interrelated and therefore analyzes the semantics of the text in greater depth. Second, it introduces emotion prior knowledge and skewed popularity prior knowledge together, so it not only makes full use of emoticons but also incorporates the part-of-speech tags of lexical items, using the emotion-related features of the text to a greater extent. Third, it exploits the influence of skewed popularity on emotion by jointly sampling emotion, skewed popularity, and theme, which makes the final emotion classification more accurate.
Brief description of the drawings
Fig. 1 is a flow chart of the Chinese microblog emotion analysis method based on the subjective and objective skewed popularity of lexical items according to the present invention.
Detailed description
The embodiments of the present invention are further described below with reference to the drawing.
The principle of the method is introduced first:
A Chinese microblog emotion analysis method based on the subjective and objective skewed popularity of lexical items, characterized in that its specific steps are as follows:
(1) obtain the target microblog data set to be analyzed;
(2) preprocess every microblog by word segmentation, part-of-speech tagging, and stop-word filtering, and combine each emotion word with the negation word that precedes it;
(3) introduce emotion prior knowledge and skewed popularity prior knowledge into the preprocessed microblog data;
(4) based on the preprocessed microblog data and the prior knowledge, sample the skewed popularity, emotion, and theme label of each lexical item using the Gibbs sampling algorithm;
(5) after a certain number of sampling iterations, calculate the joint skewed popularity and emotion distribution variable of every microblog;
(6) calculate the final emotion probability distribution of every microblog and select the emotion polarity with the highest probability as the emotion polarity of the microblog.
In step (2), once an emotion word and a negation word have been combined into a new word, the new word no longer carries the emotion attribute that the emotion word had before.
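The combination operation in step (2) can be sketched as follows. This is only an illustrative sketch: the word lists `NEGATION_WORDS` and `EMOTION_WORDS` and the function name `merge_negated_emotion_words` are hypothetical and do not come from the patent, which does not specify the lexicons or the merging procedure. The sketch assumes the input is an already segmented token list.

```python
from typing import Iterable, List, Optional, Set

# Hypothetical word lists for illustration only; a real system would load
# a negation-word list and a sentiment dictionary.
NEGATION_WORDS: Set[str] = {"不", "没", "没有", "别"}
EMOTION_WORDS: Set[str] = {"开心", "喜欢", "满意", "难过"}


def merge_negated_emotion_words(tokens: Iterable[str],
                                negations: Set[str] = NEGATION_WORDS,
                                emotions: Set[str] = EMOTION_WORDS) -> List[str]:
    """Merge each negation word with the emotion word that directly follows it."""
    merged: List[str] = []
    pending: Optional[str] = None  # negation word waiting for an emotion word
    for tok in tokens:
        if pending is not None:
            if tok in emotions:
                # New lexical item, e.g. "不" + "开心" -> "不开心".
                merged.append(pending + tok)
                pending = None
                continue
            merged.append(pending)  # negation not followed by an emotion word
            pending = None
        if tok in negations:
            pending = tok
        else:
            merged.append(tok)
    if pending is not None:
        merged.append(pending)
    return merged
```

Because the merged word (for example "不开心") is a new lexical item that is not in the emotion word list, it receives no emotion prior, which matches the remark that the new word no longer carries the earlier emotion attribute.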
In step (3), emotion prior knowledge includes emotion words and emoticons. The skewed popularity of a lexical item is either subjective or objective: the former means the lexical item tends to express subjective emotion, the latter that it tends to describe objective things. The method uses emoticons as subjective skewed popularity prior knowledge, and time words, place words, and pronouns as objective skewed popularity prior knowledge. The process of introducing the emotion and skewed popularity prior knowledge is as follows:
(3a) construct an empty S × V emotion transfer matrix λ, a K × V skewed popularity transfer matrix η, a K × S × T × V matrix β, and the final prior matrix F(β, η, λ), where S, T, K, and V denote the number of emotions, the number of themes, the number of skewed popularity types, and the number of distinct lexical items in the data set, respectively;
(3b) initialize all elements of η (K × V) and λ (S × V) to 1;
(3c) for each lexical item w ∈ {1, ..., V}, each skewed popularity label c ∈ {1, ..., K}, and each emotion label l ∈ {1, ..., S}, if w is skewed popularity prior knowledge, the element $\eta_{c,w}$ of η is updated as follows:
If w is emotion prior knowledge, the element $\lambda_{l,w}$ of λ is updated as follows:
where K(w) is the skewed popularity label corresponding to w and S(w) is the emotion label corresponding to w;
(3d) for each lexical item w ∈ {1, ..., V}, each emotion label l ∈ {1, ..., S}, each skewed popularity label c ∈ {1, ..., K}, and each theme z ∈ {1, ..., T}, the final prior $F_{c,l,z,w}(\beta, \eta, \lambda)$ is:
$$F_{c,l,z,w}(\beta, \eta, \lambda) = \eta_{c,w} \cdot \beta_{c,l,z,w} \cdot \lambda_{l,w}$$
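Steps (3a) to (3d) can be sketched in a few lines of Python. The update formulas for η and λ are not reproduced in this text, so the sketch assumes a one-hot rule (1 for the label given by the prior, 0 for the other labels); that rule is an assumption, chosen because it agrees with the emoticon example in Embodiment two (η column [0, 1], λ column [1, 0]). The function name `build_final_prior`, the 0-based indices, and the dictionary layout of F are illustrative choices, not part of the patent.

```python
from itertools import product


def build_final_prior(K, S, T, V, bias_prior, emotion_prior, beta_value=0.1):
    """Return (eta, lam, F) for the prior construction of steps (3a)-(3d).

    bias_prior:    dict {lexical item index: skewed popularity label K(w)}
    emotion_prior: dict {lexical item index: emotion label S(w)}
    beta_value:    constant used for every element of beta (0.1 in Embodiment two)
    """
    # (3a)/(3b): eta is K x V, lam is S x V, all elements initialized to 1.
    eta = [[1.0] * V for _ in range(K)]
    lam = [[1.0] * V for _ in range(S)]

    # (3c): ASSUMED one-hot update for lexical items that carry prior knowledge.
    for w, label in bias_prior.items():
        for c in range(K):
            eta[c][w] = 1.0 if c == label else 0.0
    for w, label in emotion_prior.items():
        for l in range(S):
            lam[l][w] = 1.0 if l == label else 0.0

    # (3d): F[c, l, z, w] = eta[c][w] * beta[c, l, z, w] * lam[l][w].
    F = {}
    for c, l, z, w in product(range(K), range(S), range(T), range(V)):
        F[c, l, z, w] = eta[c][w] * beta_value * lam[l][w]
    return eta, lam, F
```

With skewed popularity labels 0 = objective and 1 = subjective, emotion labels 0 = positive and 1 = negative, and an emoticon at index 5 marked as subjective and positive, the only nonzero prior entries for that emoticon are F[1, 0, z, 5] = 0.1 for every theme z, while lexical items without prior knowledge keep 0.1 everywhere.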
In step (4), for the lexical item $w_i$ at each position i in the data set, the skewed popularity label $c_i$, emotion label $l_i$, and theme label $z_i$ are sampled as follows:
$$P(c_i=k,\ l_i=s,\ z_i=t \mid w, c^{-i}, l^{-i}, z^{-i}, \varepsilon, \gamma, \alpha, \beta, \eta, \lambda) \propto \frac{N_{d,k}^{-i}+\varepsilon}{N_{d}^{-i}+K\cdot\varepsilon}\cdot\frac{N_{d,k,s}^{-i}+\gamma}{N_{d,k}^{-i}+S\cdot\gamma}\cdot\frac{N_{d,k,s,t}^{-i}+\alpha_{k,s,t}}{N_{d,k,s}^{-i}+\sum_{t}^{T}\alpha_{k,s,t}}\cdot\frac{N_{k,s,t,w_i}^{-i}+F_{k,s,t,w_i}}{N_{k,s,t}^{-i}+\sum_{v}^{V}F_{k,s,t,v}}$$
where $N_d$ is the number of lexical items in the text d that contains $w_i$; $N_{d,k}$ is the number of lexical items in text d that belong to skewed popularity k; $N_{d,k,s}$ is the number of lexical items in text d that belong to skewed popularity k and emotion s; $N_{d,k,s,t}$ is the number of lexical items in text d that belong to skewed popularity k, emotion s, and theme t; $N_{k,s,t}$ is the number of lexical items in the data set that belong to skewed popularity k, emotion s, and theme t; and $N_{k,s,t,w}$ is the number of occurrences of lexical item w in the data set that belong to skewed popularity k, emotion s, and theme t. In addition, ε and γ are the prior counts of the skewed popularity and emotion labels and are set empirically; α is the prior count of themes and is learned by maximum likelihood estimation; the superscript −i means that the current lexical item is excluded.
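One draw of step (4) can be sketched as follows, using the four-factor conditional probability given in the claims. The count container `n`, the helper names `conditional_weight`, `sample_labels`, and `empty_counts`, and the 0-based label indices are assumptions made for illustration; the patent does not prescribe a data layout. The counts passed in are expected to already exclude the current lexical item (the −i counts).

```python
import random
from collections import Counter
from itertools import product


def conditional_weight(k, s, t, w, n, K, S, T, V, eps, gamma, alpha, F):
    """Unnormalized weight of assigning labels (k, s, t) to lexical item w.

    n holds the "-i" counts (current lexical item already removed):
      n["d"]              lexical items in the microblog
      n["dk"][k]          ... with skewed popularity k
      n["dks"][k, s]      ... with skewed popularity k and emotion s
      n["dkst"][k, s, t]  ... with skewed popularity k, emotion s, theme t
      n["kst"][k, s, t], n["kstw"][k, s, t, w]  counts over the data set
    alpha[k, s, t] is the theme prior; F[k, s, t, w] is the final prior.
    """
    alpha_sum = sum(alpha[k, s, tt] for tt in range(T))
    f_sum = sum(F[k, s, t, v] for v in range(V))
    bias = (n["dk"][k] + eps) / (n["d"] + K * eps)
    emotion = (n["dks"][k, s] + gamma) / (n["dk"][k] + S * gamma)
    theme = (n["dkst"][k, s, t] + alpha[k, s, t]) / (n["dks"][k, s] + alpha_sum)
    word = (n["kstw"][k, s, t, w] + F[k, s, t, w]) / (n["kst"][k, s, t] + f_sum)
    return bias * emotion * theme * word


def sample_labels(w, n, K, S, T, V, eps, gamma, alpha, F, rng=random):
    """Draw (skewed popularity, emotion, theme) labels for lexical item w."""
    labels = list(product(range(K), range(S), range(T)))
    weights = [conditional_weight(k, s, t, w, n, K, S, T, V, eps, gamma, alpha, F)
               for k, s, t in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


def empty_counts():
    """Count tables that return 0 for label combinations not yet seen."""
    return {"d": 0, "dk": Counter(), "dks": Counter(), "dkst": Counter(),
            "kst": Counter(), "kstw": Counter()}
```

A full sampler would wrap `sample_labels` in a loop that decrements the counts for the current lexical item, draws new labels, and increments the counts again. Note that a lexical item whose final prior F is zero for some label combination, and which has never been assigned that combination, gets weight zero there, so the prior knowledge steers the draw.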
In step (5), the joint skewed popularity and emotion distribution variable of microblog d is calculated as follows:
In step (6), the final emotion probability distribution of microblog d is calculated as follows:
where $k_{sub}$ denotes the subjective skewed popularity label, $k_{obj}$ denotes the objective skewed popularity label, and $W_{sub}$ and $W_{obj}$ denote the weights that the emotion distributions under subjective and objective skewed popularity, respectively, carry in the overall emotion distribution. Naturally, the emotion distribution under subjective skewed popularity carries the larger weight.
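The formulas for steps (5) and (6) are not reproduced in this text, so the following sketch is a guess at their general shape rather than the patent's definition. It assumes that the joint variable is the smoothed share of emotion s among the lexical items of skewed popularity k, and that the final distribution is a normalized weighted sum of the subjective and objective rows with weights $W_{sub}$ and $W_{obj}$. The function names and the weight values 0.7 and 0.3 are hypothetical; the source states only that the subjective weight is the larger one.

```python
def joint_distribution(n_dks, n_dk, S, gamma):
    """ASSUMED form: pi[k][s] = (N_{d,k,s} + gamma) / (N_{d,k} + S * gamma)."""
    return {
        k: [(n_dks.get((k, s), 0) + gamma) / (n_dk[k] + S * gamma)
            for s in range(S)]
        for k in n_dk
    }


def final_emotion_distribution(pi, k_sub, k_obj, w_sub=0.7, w_obj=0.3):
    """ASSUMED form: normalized weighted sum of the two emotion distributions."""
    S = len(pi[k_sub])
    raw = [w_sub * pi[k_sub][s] + w_obj * pi[k_obj][s] for s in range(S)]
    total = sum(raw)
    return [r / total for r in raw]


def most_probable_emotion(dist):
    """Index of the emotion polarity with the highest probability."""
    return max(range(len(dist)), key=dist.__getitem__)
```

For example, with per-microblog counts `n_dk = {0: 4, 1: 3}` (0 = objective, 1 = subjective) and `n_dks = {(1, 0): 3, (0, 0): 2, (0, 1): 2}`, the subjective row dominates the result, and emotion 0 is selected.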
Specific embodiments of the above method are described below.
Embodiment one: Referring to Fig. 1, the Chinese microblog emotion analysis method based on the subjective and objective skewed popularity of lexical items is characterized in that, for a microblog data set, it proposes the concept of the skewed popularity of a lexical item, introduces emotion prior knowledge and skewed popularity prior knowledge, jointly samples skewed popularity, emotion, and theme with the Gibbs sampling algorithm based on the relation among the three, and from the result determines the emotion polarity of each microblog.
The process of introducing the emotion prior knowledge and skewed popularity prior knowledge is as follows:
(3a) construct an empty S × V emotion transfer matrix λ, a K × V skewed popularity transfer matrix η, a K × S × T × V matrix β, and the final prior matrix F(β, η, λ), where S, T, K, and V denote the number of emotions, the number of themes, the number of skewed popularity types, and the number of distinct lexical items in the data set, respectively;
(3b) initialize all elements of η (K × V) and λ (S × V) to 1;
(3c) for each lexical item w ∈ {1, ..., V}, each skewed popularity label c ∈ {1, ..., K}, and each emotion label l ∈ {1, ..., S}, if w is skewed popularity prior knowledge, the element $\eta_{c,w}$ of η is updated as follows:
If w is emotion prior knowledge, the element $\lambda_{l,w}$ of λ is updated as follows:
where K(w) is the skewed popularity label corresponding to w and S(w) is the emotion label corresponding to w;
(3d) for each lexical item w ∈ {1, ..., V}, each emotion label l ∈ {1, ..., S}, each skewed popularity label c ∈ {1, ..., K}, and each theme z ∈ {1, ..., T}, the final prior $F_{c,l,z,w}(\beta, \eta, \lambda)$ is:
$$F_{c,l,z,w}(\beta, \eta, \lambda) = \eta_{c,w} \cdot \beta_{c,l,z,w} \cdot \lambda_{l,w}$$
For the lexical item $w_i$ at each position i in the data set, the sampling formula for the skewed popularity label $c_i$, emotion label $l_i$, and theme label $z_i$ is as follows:
$$P(c_i=k,\ l_i=s,\ z_i=t \mid w, c^{-i}, l^{-i}, z^{-i}, \varepsilon, \gamma, \alpha, \beta, \eta, \lambda) \propto \frac{N_{d,k}^{-i}+\varepsilon}{N_{d}^{-i}+K\cdot\varepsilon}\cdot\frac{N_{d,k,s}^{-i}+\gamma}{N_{d,k}^{-i}+S\cdot\gamma}\cdot\frac{N_{d,k,s,t}^{-i}+\alpha_{k,s,t}}{N_{d,k,s}^{-i}+\sum_{t}^{T}\alpha_{k,s,t}}\cdot\frac{N_{k,s,t,w_i}^{-i}+F_{k,s,t,w_i}}{N_{k,s,t}^{-i}+\sum_{v}^{V}F_{k,s,t,v}}$$
where $N_d$ is the number of lexical items in the text d that contains $w_i$; $N_{d,k}$ is the number of lexical items in text d that belong to skewed popularity k; $N_{d,k,s}$ is the number of lexical items in text d that belong to skewed popularity k and emotion s; $N_{d,k,s,t}$ is the number of lexical items in text d that belong to skewed popularity k, emotion s, and theme t; $N_{k,s,t}$ is the number of lexical items in the data set that belong to skewed popularity k, emotion s, and theme t; and $N_{k,s,t,w}$ is the number of occurrences of lexical item w in the data set that belong to skewed popularity k, emotion s, and theme t. In addition, ε and γ are the prior counts of the skewed popularity and emotion labels and are set empirically; α is the prior count of themes and is learned by maximum likelihood estimation; the superscript −i means that the current lexical item is excluded.
The joint skewed popularity and emotion distribution variable of every microblog d is calculated as follows:
The final emotion probability distribution of microblog d is calculated as follows:
where $k_{sub}$ denotes the subjective skewed popularity label, $k_{obj}$ denotes the objective skewed popularity label, and $W_{sub}$ and $W_{obj}$ denote the weights that the emotion distributions under subjective and objective skewed popularity, respectively, carry in the overall emotion distribution. Naturally, the emotion distribution under subjective skewed popularity carries the larger weight.
Embodiment two: In this embodiment of the Chinese microblog emotion analysis method based on the subjective and objective skewed popularity of lexical items, 3000 microblogs are crawled from the Sina Weibo website as the target data set to be analyzed. As shown in Fig. 1, the steps of the method in this embodiment are as follows:
S1. Crawl 3000 microblogs from the Sina Weibo website as the target data set to be analyzed, for example the microblog "Today I bought a new mobile phone, so happy! [heartily]";
S2. Preprocess every microblog by word segmentation, part-of-speech tagging, and stop-word filtering, and combine each emotion word with the negation word that precedes it. For example, using NLPIR from the Chinese Academy of Sciences as the segmentation and part-of-speech tagging tool, the microblog "Today I bought a new mobile phone, so happy! [heartily]" becomes the lexical item sequence { "today", "bought", "new", "mobile phone", "good", "happy", "[heartily]" } after processing, and the corresponding part-of-speech sequence is { time word, verb, adjective, noun, adjective, prefix, character string };
S3. Introduce emotion prior knowledge and skewed popularity prior knowledge into the preprocessed microblog data. For example, consider the emoticon "[heartily]" in the processed lexical item sequence { "today", "bought", "new", "mobile phone", "good", "happy", "[heartily]" }. Its index in the corpus lexicon is 5 and its polarity is positive, so its corresponding $\eta_{\cdot,5}$ is $[0,1]^T$, whose elements are the objective and subjective skewed popularity priors, respectively, and its corresponding $\lambda_{\cdot,5}$ is $[1,0]^T$, whose elements are the positive and negative polarity priors, respectively. With all elements of β set to 0.1, for each theme z ∈ {1, ..., T} the final prior $F_{\cdot,\cdot,z,5}$ is:
S4. For the lexical item $w_i$ at each position i in the data set, sample the skewed popularity label $c_i$, emotion label $l_i$, and theme label $z_i$. For example, for the emoticon "[heartily]" in the processed microblog d "Today I bought a new mobile phone, so happy! [heartily]", the sampling formula for its skewed popularity, emotion, and theme labels is as follows:
$$P(c_i=k,\ l_i=s,\ z_i=t \mid w, c^{-i}, l^{-i}, z^{-i}, \varepsilon, \gamma, \alpha, \beta, \eta, \lambda) \propto \frac{N_{d,k}^{-i}+\varepsilon}{N_{d}^{-i}+K\cdot\varepsilon}\cdot\frac{N_{d,k,s}^{-i}+\gamma}{N_{d,k}^{-i}+S\cdot\gamma}\cdot\frac{N_{d,k,s,t}^{-i}+\alpha_{k,s,t}}{N_{d,k,s}^{-i}+\sum_{t}^{T}\alpha_{k,s,t}}\cdot\frac{N_{k,s,t,w_i}^{-i}+F_{k,s,t,w_i}}{N_{k,s,t}^{-i}+\sum_{v}^{V}F_{k,s,t,v}}$$
where $N_d$ is the number of lexical items in the text d that contains the current lexical item $w_i$; $N_{d,k}$ is the number of lexical items in text d that belong to skewed popularity k; $N_{d,k,s}$ is the number of lexical items in text d that belong to skewed popularity k and emotion s; $N_{d,k,s,t}$ is the number of lexical items in text d that belong to skewed popularity k, emotion s, and theme t; $N_{k,s,t}$ is the number of lexical items in the data set that belong to skewed popularity k, emotion s, and theme t; $N_{k,s,t,w}$ is the number of occurrences of lexical item w in the data set that belong to skewed popularity k, emotion s, and theme t; and the superscript −i means that the current lexical item is excluded. α is the prior count of themes and is learned by maximum likelihood estimation. In addition, ε and γ are the prior counts of the skewed popularity and emotion labels and are set empirically to γ = 0.1 × AL / S and ε = 0.1 × AL / K, where AL denotes the average text length of the microblog data set;
S5. After a certain number of sampling iterations, calculate the joint skewed popularity and emotion distribution variable of every microblog. For example, after 1000 sampling iterations, the joint skewed popularity and emotion distribution variable of the microblog d "Today I bought a new mobile phone, so happy! [heartily]" is calculated as follows:
S6. Calculate the final emotion probability distribution of every microblog and select the emotion polarity with the highest probability as the emotion polarity of the microblog. For example, the final emotion probability distribution of the microblog d "Today I bought a new mobile phone, so happy! [heartily]" is calculated as follows:
where $k_{sub}$ denotes the subjective skewed popularity label and $k_{obj}$ denotes the objective skewed popularity label. After normalization, the probabilities that this microblog belongs to the positive and negative emotion polarities are 0.893434 and 0.106566, respectively, so the emotion polarity of this microblog is judged to be positive.
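The empirical settings in S4 and the final decision in S6 can be sketched as follows. The function names `empirical_priors` and `pick_polarity`, the `scale` parameter, and the sample text lengths are hypothetical; only the relations γ = 0.1 × AL / S and ε = 0.1 × AL / K and the two probabilities 0.893434 and 0.106566 come from the embodiment.

```python
def empirical_priors(doc_lengths, K, S, scale=0.1):
    """Return (eps, gamma) from the average text length AL of the data set."""
    avg_len = sum(doc_lengths) / len(doc_lengths)  # AL
    eps = scale * avg_len / K      # prior count for skewed popularity labels
    gamma = scale * avg_len / S    # prior count for emotion labels
    return eps, gamma


def pick_polarity(probs, labels=("positive", "negative")):
    """Return the label whose probability is highest."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]


# Probabilities reported for the example microblog in S6.
print(pick_polarity([0.893434, 0.106566]))
```

Tying ε and γ to the average text length keeps the smoothing proportional to how many lexical items a typical microblog contributes, which is one plausible reading of why the embodiment scales them by AL.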
The specific embodiments described herein merely illustrate the spirit of the invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims (1)

  1. A Chinese microblog emotion analysis method based on the subjective and objective skewed popularity of lexical items, characterized in that it comprises the following steps:
    Step 1: obtain the target microblog data set to be analyzed;
    Step 2: preprocess every microblog by word segmentation, part-of-speech tagging, and stop-word filtering, and combine each emotion word with the negation word that precedes it;
    Step 3: introduce emotion prior knowledge and skewed popularity prior knowledge into the preprocessed microblog data, wherein emotion prior knowledge includes emotion words and emoticons; the skewed popularity of a lexical item is either subjective or objective, the former meaning that the lexical item tends to express subjective emotion and the latter that it tends to describe objective things; the method uses emoticons as subjective skewed popularity prior knowledge, and time words, place words, and pronouns as objective skewed popularity prior knowledge; the process of introducing the emotion and skewed popularity prior knowledge is as follows:
    Step 3a: construct an empty S × V emotion transfer matrix λ, a K × V skewed popularity transfer matrix η, a K × S × T × V matrix β, and the final prior matrix F(β, η, λ), where S, T, K, and V denote the number of emotions, the number of themes, the number of skewed popularity types, and the number of distinct lexical items in the data set, respectively;
    Step 3b: initialize all elements of η (K × V) and λ (S × V) to 1;
    Step 3c: for each lexical item w ∈ {1, ..., V}, each skewed popularity label c ∈ {1, ..., K}, and each emotion label l ∈ {1, ..., S}, if w is skewed popularity prior knowledge, the element $\eta_{c,w}$ of η is updated as follows:
    If w is emotion prior knowledge, the element $\lambda_{l,w}$ of λ is updated as follows:
    where K(w) is the skewed popularity label corresponding to w and S(w) is the emotion label corresponding to w;
    Step 3d: for each lexical item w ∈ {1, ..., V}, each emotion label l ∈ {1, ..., S}, each skewed popularity label c ∈ {1, ..., K}, and each theme z ∈ {1, ..., T}, the final prior $F_{c,l,z,w}(\beta, \eta, \lambda)$ is:
    $$F_{c,l,z,w}(\beta, \eta, \lambda) = \eta_{c,w} \cdot \beta_{c,l,z,w} \cdot \lambda_{l,w}$$
    Step 4: based on the preprocessed microblog data and the prior knowledge, sample the skewed popularity, emotion, and theme label of each lexical item using the Gibbs sampling algorithm; for the lexical item $w_i$ at each position i in the data set, the skewed popularity label $c_i$, emotion label $l_i$, and theme label $z_i$ are sampled as follows:
    $$P(c_i=k,\ l_i=s,\ z_i=t \mid w, c^{-i}, l^{-i}, z^{-i}, \varepsilon, \gamma, \alpha, \beta, \eta, \lambda) \propto \frac{N_{d,k}^{-i}+\varepsilon}{N_{d}^{-i}+K\cdot\varepsilon}\cdot\frac{N_{d,k,s}^{-i}+\gamma}{N_{d,k}^{-i}+S\cdot\gamma}\cdot\frac{N_{d,k,s,t}^{-i}+\alpha_{k,s,t}}{N_{d,k,s}^{-i}+\sum_{t}^{T}\alpha_{k,s,t}}\cdot\frac{N_{k,s,t,w_i}^{-i}+F_{k,s,t,w_i}}{N_{k,s,t}^{-i}+\sum_{v}^{V}F_{k,s,t,v}}$$
    where $N_d$ is the number of lexical items in the text $d$ that contains $w_i$; $N_{d,k}$ is the number of lexical items in text $d$ with skewed popularity $k$; $N_{d,k,s}$ is the number of lexical items in text $d$ with skewed popularity $k$ and emotion $s$; $N_{d,k,s,t}$ is the number of lexical items in text $d$ with skewed popularity $k$, emotion $s$, and topic $t$; $N_{k,s,t}$ is the number of lexical items in the data set with skewed popularity $k$, emotion $s$, and topic $t$; and $N_{k,s,t,w}$ is the number of occurrences of lexical item $w$ in the data set with skewed popularity $k$, emotion $s$, and topic $t$. In addition, $\varepsilon$ and $\gamma$ are the prior counts of the skewed popularity and emotion labels, respectively, and are set empirically; $\alpha$ is the prior count of the topics and is learned by maximum likelihood estimation; the superscript $-i$ indicates that the current lexical item is excluded from the count;
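The sampling formula of Step 4 can be sketched in Python with NumPy as follows. This is only an illustrative sketch, not the patented implementation. The function names and the array layout are hypothetical. `F` is assumed to be an array of per-word prior counts of shape (K, S, T, V), where K, S, T, and V are taken to be the numbers of skewed popularity classes, emotions, topics, and vocabulary entries. All counts are assumed to already exclude the current lexical item.

```python
import numpy as np


def conditional_weights(w, n_d, n_dk, n_dks, n_dkst, n_kst, n_kstw,
                        eps, gamma, alpha, F):
    """Unnormalized P(c_i=k, l_i=s, z_i=t | ...) for one lexical item.

    Returns an array of shape (K, S, T). Every count must already exclude
    the current item (the "-i" superscript in the formula).
      n_d    : scalar, items in document d
      n_dk   : (K,), n_dks : (K, S), n_dkst : (K, S, T)   document-level counts
      n_kst  : (K, S, T), n_kstw : (K, S, T, V)           corpus-level counts
      alpha  : (K, S, T) topic prior counts
      F      : (K, S, T, V) word prior counts (assumed layout)
    """
    K, S = n_dks.shape
    # Factor 1: skewed popularity k given the document.
    term_c = (n_dk + eps) / (n_d + K * eps)                      # (K,)
    # Factor 2: emotion s given the document and k.
    term_l = (n_dks + gamma) / (n_dk[:, None] + S * gamma)       # (K, S)
    # Factor 3: topic t given the document, k and s.
    term_z = (n_dkst + alpha) / (
        n_dks[:, :, None] + alpha.sum(axis=2, keepdims=True))    # (K, S, T)
    # Factor 4: word w given k, s and t.
    term_w = (n_kstw[..., w] + F[..., w]) / (n_kst + F.sum(axis=3))
    return term_c[:, None, None] * term_l[:, :, None] * term_z * term_w


def sample_labels(weights, rng):
    """Draw one (k, s, t) triple in proportion to the given weights."""
    p = weights.ravel() / weights.sum()
    flat_index = rng.choice(p.size, p=p)
    return np.unravel_index(flat_index, weights.shape)
```

In a full Gibbs sweep, the counts for the current lexical item would be decremented before calling `conditional_weights` and incremented again for the newly sampled triple; that bookkeeping is omitted here.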
    Step 5, after a certain number of sampling iterations, compute the joint distribution variable of skewed popularity and emotion for every microblog. For microblog $d$, this variable is computed as follows:
    $$
    \pi_{d,k,s} = \frac{N_{d,k,s} + \gamma}{N_{d,k} + S \cdot \gamma}
    $$
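As a minimal sketch of Step 5, the joint distribution variable can be computed from a per-microblog count matrix. The function name and the (K, S) array layout are hypothetical. The sketch obtains $N_{d,k}$ by summing $N_{d,k,s}$ over the emotions, which follows from the count definitions if every lexical item carries exactly one emotion label.

```python
import numpy as np


def joint_distribution(n_dks, gamma):
    """Compute pi[d, k, s] for one microblog d.

    n_dks : array-like of shape (K, S); entry [k, s] counts the lexical
            items of the microblog labelled with skewed popularity k and
            emotion s.
    gamma : prior count of the emotion labels.
    """
    n_dks = np.asarray(n_dks, dtype=float)
    S = n_dks.shape[1]
    # N_{d,k} is the row total of N_{d,k,s}.
    n_dk = n_dks.sum(axis=1, keepdims=True)
    return (n_dks + gamma) / (n_dk + S * gamma)
```

With this smoothing, each row of the result sums to 1, so every skewed popularity class gets a proper distribution over emotions even when it has no lexical items in the microblog.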
    Step 6, compute the final emotion probability distribution of every microblog and select the emotion polarity with the highest probability as the emotion polarity of the microblog. For microblog $d$, the final emotion probability distribution is computed as follows:
    $$
    \Pi_{d,s} = W_{sub} \cdot \frac{N_{d,k_{sub},s} + \gamma}{N_{d,k_{sub}} + S \cdot \gamma} + W_{obj} \cdot \frac{N_{d,k_{obj},s} + \gamma}{N_{d,k_{obj}} + S \cdot \gamma}, \quad W_{sub} > W_{obj} > 0.
    $$
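The weighted combination in Step 6 can be illustrated with the short sketch below. It is a sketch under stated assumptions rather than the claimed implementation: the function names, the row indices `k_sub=0` and `k_obj=1`, and the example weights in the usage note are hypothetical, since the source only requires $W_{sub} > W_{obj} > 0$.

```python
import numpy as np


def final_emotion_distribution(n_dks, gamma, w_sub, w_obj, k_sub=0, k_obj=1):
    """Compute Pi[d, s] for one microblog d.

    n_dks : array-like of shape (K, S) with the counts N_{d,k,s}.
    gamma : prior count of the emotion labels.
    w_sub, w_obj : weights of the subjective and objective parts.
    k_sub, k_obj : row indices of the two classes (assumed ordering).
    """
    if not (w_sub > w_obj > 0):
        raise ValueError("the method requires W_sub > W_obj > 0")
    n_dks = np.asarray(n_dks, dtype=float)
    S = n_dks.shape[1]
    # Smoothed emotion distribution within each skewed popularity class.
    pi = (n_dks + gamma) / (n_dks.sum(axis=1, keepdims=True) + S * gamma)
    return w_sub * pi[k_sub] + w_obj * pi[k_obj]


def predict_polarity(n_dks, gamma, w_sub, w_obj):
    """Return the index of the emotion with the highest final probability."""
    scores = final_emotion_distribution(n_dks, gamma, w_sub, w_obj)
    return int(np.argmax(scores))
```

For example, `predict_polarity([[3, 1, 0], [0, 2, 2]], 0.5, 0.7, 0.3)` weights the emotions of subjective lexical items more heavily than those of objective ones before taking the maximum. What each emotion index stands for depends on how the labels were encoded.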
CN201711279503.7A 2017-12-06 2017-12-06 A kind of Chinese microblog emotional analysis method based on the subjective and objective skewed popularity of lexical item Pending CN108038166A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711279503.7A CN108038166A (en) 2017-12-06 2017-12-06 A kind of Chinese microblog emotional analysis method based on the subjective and objective skewed popularity of lexical item

Publications (1)

Publication Number Publication Date
CN108038166A true CN108038166A (en) 2018-05-15

Family

ID=62095663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711279503.7A Pending CN108038166A (en) 2017-12-06 2017-12-06 A kind of Chinese microblog emotional analysis method based on the subjective and objective skewed popularity of lexical item

Country Status (1)

Country Link
CN (1) CN108038166A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100257117A1 (en) * 2009-04-03 2010-10-07 Bulloons.Com Ltd. Predictions based on analysis of online electronic messages
CN102663046A (en) * 2012-03-29 2012-09-12 中国科学院自动化研究所 Sentiment analysis method oriented to micro-blog short text
CN103150367A (en) * 2013-03-07 2013-06-12 宁波成电泰克电子信息技术发展有限公司 Method for analyzing emotional tendency of Chinese microblogs
CN103995853A (en) * 2014-05-12 2014-08-20 中国科学院计算技术研究所 Multi-language emotional data processing and classifying method and system based on key sentences
CN104679825A (en) * 2015-01-06 2015-06-03 中国农业大学 Web text-based acquiring and screening method of seismic macroscopic anomaly information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
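WAN X et al.: "Using bilingual knowledge and ensemble techniques for unsupervised Chinese sentiment analysis", 《Proceedings of the Conference on Empirical Methods in Natural Language Processing》 *
ZHANG Zhilin et al.: "基于多样化特征的中文微博情感分类方法研究" (Research on a Chinese microblog sentiment classification method based on diversified features), 《中文信息学报》 (Journal of Chinese Information Processing) *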

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866087A (en) * 2019-08-12 2020-03-06 上海大学 Entity-oriented text emotion analysis method based on topic model
CN110866087B (en) * 2019-08-12 2023-11-17 上海大学 Entity-oriented text emotion analysis method based on topic model
US11966702B1 (en) * 2020-08-17 2024-04-23 Alphavu, Llc System and method for sentiment and misinformation analysis of digital conversations
CN112989033A (en) * 2020-12-03 2021-06-18 昆明理工大学 Microblog emotion classification method based on emotion category description
CN113723084A (en) * 2021-07-26 2021-11-30 内蒙古工业大学 Mongolian text emotion analysis method fusing priori knowledge

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2018-05-15