CN109376239B - Specific emotion dictionary generation method for Chinese microblog emotion classification - Google Patents

Info

Publication number
CN109376239B
Authority
CN
China
Prior art keywords
emotion
units
score
microblog
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811145088.0A
Other languages
Chinese (zh)
Other versions
CN109376239A (en
Inventor
赵传君
王素格
李德玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi University
Original Assignee
Shanxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi University filed Critical Shanxi University
Priority to CN201811145088.0A priority Critical patent/CN109376239B/en
Publication of CN109376239A publication Critical patent/CN109376239A/en
Application granted granted Critical
Publication of CN109376239B publication Critical patent/CN109376239B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention discloses a method for generating a specific emotion dictionary for Chinese microblog emotion classification. In its final step, an emotion propagation algorithm propagates emotion from the labeled seed emotion unit set to the unlabeled emotion units, yielding emotion scores for all emotion words in all emotion units. The result is a microblog-specific emotion dictionary containing both explicit and implicit emotional features, according to which the emotions of the microblog corpus are classified. Compared with representative methods of the same kind, the method achieves higher overall accuracy and stability, can effectively construct a domain-specific emotion dictionary, and accurately extracts explicit and implicit emotional features.

Description

Specific emotion dictionary generation method for Chinese microblog emotion classification
Technical Field
The invention relates to the field of computer social media text sentiment analysis, and provides a method for generating a specific sentiment dictionary for Chinese microblog sentiment classification.
Background
Constructing an emotion dictionary is a basic and important part of the emotion analysis task, since words and phrases are the essential elements that express positive or negative emotion. Microblogs have become a popular means of communication on the Internet: individual users can freely, conveniently and instantly express their opinions on products and public events through media such as Sina Weibo. Because microblogs are extremely short yet rich in vocabulary, their vector space representation is very sparse, so dictionary-based methods are better suited to microblog emotion analysis. However, different domains use different emotion vocabularies, and the same word may express different opinions in different domains, which leads to diverse emotional expression and to semantic variability of emotion words. Because the emotion of a word or phrase often depends on the specific domain, and manually labeling emotion words for a specific domain on top of a general emotion dictionary is time-consuming and laborious, a general emotion dictionary cannot classify domain-specific emotion well and cannot meet the needs of domain-specific emotion classification.
Linguistic-rule-based, corpus-based and dictionary-based methods have been proposed in the prior art to construct domain-specific emotion dictionaries automatically. However, microblog grammar takes many forms and users do not follow grammatical rules when writing microblogs, so rule-based methods cannot cover all microblog cases; corpus-based methods depend heavily on the size of the corpus; and dictionary-based methods are tied to the quality of the general emotion dictionary. None of the three is therefore well suited to constructing an emotion dictionary for the microblog domain.
In addition, conventional emotion analysis research has focused on explicit emotional features (Explicit Sentiment Features) such as "goodness", "dislike" and "like". The defining trait of these features is the presence of a clear emotional indicator: emotion words, phrases and idioms that directly express emotion toward an entity or aspect are referred to as explicit emotional features. In practice, many users express emotion implicitly and indirectly, through paraphrase or statements of fact. Implicit emotional features (Implicit Sentiment Features) are features that express positive or negative emotion without explicit emotional indicators; they often state a fact or express an emotion indirectly. Four microblogs containing the implicit emotional features "cutworm", "flood and wilderness", "navy" and "five-mao special effect" are shown in FIG. 3. Identifying implicit emotional features remains a challenging problem precisely because they carry no emotional indicator at all.
Disclosure of Invention
The invention aims to construct a microblog-specific emotion dictionary by extracting explicit and implicit emotional features, and to classify the emotion of microblogs according to the emotion polarity (positive, negative or neutral) of those features.
To this end, addressing both the construction of a microblog-specific emotion vocabulary and the implicit emotional features mentioned above, the invention provides a method for generating a specific emotion dictionary for Chinese microblog emotion classification, comprising the following steps:
S1, preprocessing the microblog corpus D = {d_1, d_2, …, d_l}, extracting a plurality of emotion units T_i through lexical analysis and syntactic analysis, and taking the emotion units T_i together as the emotion unit set T = {T_1, T_2, …, T_n}, where i and n are positive integers and 1 ≤ i ≤ n; each emotion unit T_i is defined in terms of N, a negation indicator, D, a degree adverb, E, an evaluation word, and P, an emotion polarity;
S2, constructing an emotion propagation graph G = (V, E, W) based on the emotion unit set T, where V is the set of emotion units, E is the set of edges and W is the weight matrix between emotion units; calculating the standard centrality H(T_i) of each emotion unit T_i; sorting the emotion units T_i in descending order of H(T_i) and selecting the first M as the seed emotion unit set T_s; and labeling the seed emotion units with emotion labels using a general emotion dictionary and manual annotation, where M < n, M/n ≥ 20%, and M and n are positive integers;
S3, applying an emotion propagation algorithm to propagate emotion from the labeled seed emotion unit set T_s to the n - M unlabeled emotion units, and obtaining the emotion score of each of the n - M emotion units;
S4, obtaining the emotion score of each emotion word in each emotion unit from the emotion score of that unit, thereby obtaining a microblog-specific emotion dictionary L_spe containing explicit and implicit emotional features;
S5, classifying the emotions of the microblog corpus according to the microblog-specific emotion dictionary L_spe.
In the method for generating a specific emotion dictionary for Chinese microblog emotion classification according to the embodiment of the invention, the microblog corpus is first preprocessed and a plurality of emotion units are extracted. An emotion propagation graph is then constructed from the emotion units and the standard centrality of each emotion unit is calculated. The emotion units are sorted by standard centrality, the first M are selected as seed emotion units, and the seed emotion units are labeled using a general emotion dictionary and manual annotation. Finally, an emotion propagation algorithm propagates emotion from the labeled seed emotion unit set to the unlabeled emotion units, and the emotion scores of all emotion words in all emotion units are obtained. This yields a microblog-specific emotion dictionary containing explicit and implicit emotional features, according to which the emotions of the microblog corpus are classified.
According to an embodiment of the present invention, the step S1 includes:
S11, using rule-based methods to filter links, stop words, repeated words and other noise out of the microblog data set D = {d_1, d_2, …, d_l};
S12, performing lexical analysis on D = {d_1, d_2, …, d_l} with a part-of-speech tagging tool and syntactic analysis with a dependency parsing tool;
S13, extracting the adjectives, verbs, adverbs and nouns in D = {d_1, d_2, …, d_l} as candidate emotional features W = {w_1, w_2, …, w_n} and filtering out low-frequency words;
S14, extracting the negation modification relations and degree modification relations from the dependency parsing result;
S15, taking the extracted candidate emotional features, negation modification relations and degree modification relations as emotion units T_i, the emotion units together forming the emotion unit set T = {T_1, T_2, …, T_n}.
According to one embodiment of the invention, the standard centrality is
Figure GDA0003029256900000031
where
Figure GDA0003029256900000032
n is the total number of emotion units in the corpus, n_i is the degree of T_i in graph G, hits(T_i) is the number of occurrences of T_i in the corpus, hits(T_j) is the number of occurrences of T_j in the corpus, and hits(T_i, T_j) is the number of times T_i and T_j occur in the same window under the local and social contexts; the relationship matrix among the emotion units is denoted P_ij.
According to an embodiment of the present invention, step S3 further includes:
S31, denoting the initial emotion score vector of the emotion units as score(T):
score(T) = [score(T_1), score(T_2), …, score(T_n)]
and normalizing score(T):
Figure GDA0003029256900000033
Figure GDA0003029256900000034
where T_pos is the set of positive emotion units and T_neg is the set of negative emotion units;
S32, pruning graph G by removing its weaker edges, wherein the k largest values in each row of the matrix P' are retained and the remaining values are set to 0, so that the k most strongly connected units of each emotion unit are taken as its emotion neighbors,
Figure GDA0003029256900000035
where P' is a probability transition matrix;
S33, defining the probability transition matrix of emotion propagation as:
Figure GDA0003029256900000041
where β ∈ [0,1] is an adaptive parameter, A is a matrix in which some rows are 1/n and the remaining rows are 0, and J is a matrix whose elements are all 1/n; matrix A is added to ensure that no all-zero rows remain in the matrix P';
S34, defining the process of emotion label propagation as:
Figure GDA0003029256900000042
where
Figure GDA0003029256900000043
is the emotion score of T_i at iteration t + 1, α ∈ [0,1] is a weight parameter,
Figure GDA0003029256900000044
is row i of the matrix
Figure GDA0003029256900000045
and score(T^t) is the emotion score vector of T at iteration t; in each iteration,
Figure GDA0003029256900000046
is computed in the order i = 1, …, n, and each time a new
Figure GDA0003029256900000047
is obtained,
Figure GDA0003029256900000048
is updated accordingly;
S35, when the iteration stops, normalizing score(T) according to:
Figure GDA0003029256900000049
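The pruning in step S32 can be illustrated with a short sketch. It assumes the matrix is held as a plain list of rows; the function name `prune_top_k` and the sample values are hypothetical, and ties are broken by column position, a detail the text does not specify.

```python
def prune_top_k(matrix, k):
    """Keep the k largest entries in each row and set the rest to 0."""
    pruned = []
    for row in matrix:
        # Column indices of the k largest values in this row
        # (equal values keep their left-to-right order).
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = set(order[:k])
        pruned.append([v if j in keep else 0.0 for j, v in enumerate(row)])
    return pruned

# Hypothetical 4-unit matrix; each row holds one unit's link strengths.
P = [
    [0.0, 0.5, 0.2, 0.3],
    [0.4, 0.0, 0.1, 0.5],
    [0.3, 0.3, 0.0, 0.4],
    [0.1, 0.6, 0.3, 0.0],
]
P_pruned = prune_top_k(P, k=2)
```

After pruning, each unit keeps only its k strongest links, which become its emotion neighbors during propagation.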
According to an embodiment of the present invention, step S4 further includes:
S41, calculating the emotion score of each emotional feature from the emotion scores of the emotion units:
Figure GDA00030292569000000410
where n(w_i) is the frequency of the word w_i in the corpus, score(N) is the negation score in emotion unit T_i, and score(D) is the degree score in emotion unit T_i;
S42, obtaining the microblog-specific emotion dictionary L_spe containing explicit and implicit emotional features.
According to an embodiment of the present invention, step S5 further includes:
applying the microblog-specific emotion dictionary L_spe to emotion classification, wherein explicit and implicit emotional features and the 20 semantic synthesis rules of the Chinese microblog domain are taken into account; the emotion score score(d_i) of microblog d_i is obtained by summing over all of its emotion units, and the final emotion polarity of the microblog is determined from this score.
Compared with the prior art, the invention has the following beneficial effects: (1) the invention provides a new emotion propagation framework over emotion units that can generate a microblog-specific emotion dictionary from a Chinese microblog data set, the generated dictionary containing both explicit emotional features and implicit noun or noun-phrase emotional features; (2) social relationships, topic features and local context information are used to construct an emotion propagation graph and its adjacency matrix, and the emotion propagation algorithm propagates emotion from labeled units to unlabeled units, so that emotion label propagation yields emotion scores for both explicit and implicit emotional features; (3) the proposed framework is verified on two microblog data sets, UCI and Weibo. The experimental results show that the method generates a high-quality microblog-specific emotion dictionary, achieves better results on the emotion classification task, and improves emotion classification accuracy.
Drawings
FIG. 1 is a flowchart of a method for generating a specific emotion dictionary for Chinese microblog emotion classification according to an embodiment of the invention.
FIG. 2 is a framework diagram for generating a Chinese domain-specific emotion dictionary that fuses social relationships and local context.
FIG. 3 shows four microblogs containing the implicit emotional features "cutworm", "flood and wilderness", "navy" and "five-mao special effect".
FIG. 4 shows the processing of three microblogs under the emotion unit context propagation framework.
FIG. 5 is a diagram of the local context and social context of a target emotion unit in a microblog.
FIG. 6 is a diagram of Sina Weibo user relationships and the microblogs the users posted.
FIG. 7 shows the consistency rate of the emotion dictionaries obtained under 5-fold cross validation on the UCI data set.
FIG. 8 shows the consistency rate of the emotion dictionaries obtained under 5-fold cross validation on the Weibo data set.
FIG. 9 is a word cloud of noun emotional features on the UCI data set.
FIG. 10 is a word cloud of noun emotional features on the Weibo data set.
FIG. 11 shows the emotion classification results (positive, negative and neutral) on the UCI data set.
FIG. 12 shows the emotion classification results (positive and negative) on the Weibo data set.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in FIGS. 1 to 12, the framework of the invention is divided into the following five steps, each building on the previous one, whose results are finally combined. The learning process mainly comprises the following steps:
S1, preprocessing the microblog corpus D = {d_1, d_2, …, d_l}, extracting a plurality of emotion units T_i through lexical analysis and syntactic analysis, and taking the emotion units T_i together as the emotion unit set T = {T_1, T_2, …, T_n}, where i and n are positive integers and 1 ≤ i ≤ n; each emotion unit T_i is defined in terms of N, a negation indicator, D, a degree adverb, E, an evaluation word, and P, an emotion polarity.
Step S1 includes:
S11, using rule-based methods to filter links, stop words, repeated words and other noise out of the microblog data set D = {d_1, d_2, …, d_l};
S12, performing lexical analysis on D = {d_1, d_2, …, d_l} with a part-of-speech tagging tool and syntactic analysis with a dependency parsing tool;
S13, extracting the adjectives, verbs, adverbs and nouns in D = {d_1, d_2, …, d_l} as candidate emotional features W = {w_1, w_2, …, w_n} and filtering out low-frequency words;
S14, extracting the negation modification relations and degree modification relations from the dependency parsing result;
S15, taking the extracted candidate emotional features, negation modification relations and degree modification relations as emotion units T_i, the emotion units together forming the emotion unit set T = {T_1, T_2, …, T_n}.
In the preprocessing stage, links, stop words, repeated words and other noise in the microblogs are filtered out first. Lexical and syntactic analyses are performed using the part-of-speech tagging and dependency parsing tools of the Harbin Institute of Technology Language Technology Platform (LTP). For the special characters in a microblog, the topic labels marked by #, the user relationships marked by @, emoticons and the like are extracted first. Words or phrases that may express emotion are defined as candidate explicit and implicit emotional features. Lexical analysis provides part-of-speech (POS) tags, topic features, user relationships in the social network, and the like. Adjectives, verbs, adverbs and nouns are extracted as candidate emotional features, and low-frequency words are filtered out. Dependency parsing reveals the syntactic structure of a sentence by analyzing the dependencies between its linguistic elements; the negation modification relations and degree modification relations are extracted from the dependency parsing result. The microblog emotion data is denoted D = {d_1, d_2, …, d_l}, and W = {w_1, w_2, …, w_n} is the set of potential emotional features, where w_k is an emotion word or phrase and 1 ≤ k ≤ n. N is a negation indicator such as "no", "not" or "none". D is a degree adverb such as "completely" or "equivalently".
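The rule-based part of this preprocessing can be sketched with regular expressions. This is a minimal illustration, not the patented implementation: the patterns, the function name `preprocess` and the sample microblog (including its URL) are assumptions, and part-of-speech tagging and dependency parsing with LTP are not shown.

```python
import re

URL_RE = re.compile(r"https?://\S+")
TOPIC_RE = re.compile(r"#([^#]+)#")   # topic label enclosed by a pair of '#'
MENTION_RE = re.compile(r"@(\w+)")    # user relation marked by '@'

def preprocess(text):
    """Return (clean_text, topics, mentions) for one microblog."""
    text = URL_RE.sub("", text)                        # drop links
    topics = [t.strip() for t in TOPIC_RE.findall(text)]
    mentions = MENTION_RE.findall(text)
    text = TOPIC_RE.sub(" ", text)                     # remove the tags once recorded
    text = MENTION_RE.sub(" ", text)
    clean = " ".join(text.split())                     # collapse leftover whitespace
    return clean, topics, mentions

clean, topics, mentions = preprocess(
    "@UserA the sound is great #good voice# http://t.cn/abc"
)
```

The topic and mention lists are kept separately because the later graph construction uses them as social context, while the cleaned text goes on to lexical and syntactic analysis.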
The emotion unit set is denoted T = {T_1, T_2, …, T_n}, where T_i is an emotion unit and 1 ≤ i ≤ n. T_s is the seed emotion unit set, with |T_s| = M. Degree adverbs and negation indicators affect the intensity and polarity of the emotion. For example, as shown in FIG. 4, in the emotion unit (not, very, good, negative), "good" is a positive emotion word, "not" reverses its polarity, and "very" strengthens the resulting negative emotion.
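One way to hold an emotion unit in code is a small immutable record with the four components N, D, E and P. The class name, field names and string labels below are illustrative assumptions, and the FIG. 4 example is read here as the unit (not, very, good, negative).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EmotionUnit:
    negation: Optional[str]   # N: negation indicator, e.g. "not"; None if absent
    degree: Optional[str]     # D: degree adverb, e.g. "very"; None if absent
    evaluation: str           # E: evaluation word, e.g. "good"
    polarity: Optional[str]   # P: emotion polarity; None while the unit is unlabeled

# A labeled unit and an unlabeled one (an implicit feature awaiting propagation).
unit = EmotionUnit(negation="not", degree="very", evaluation="good", polarity="negative")
unlabeled = EmotionUnit(negation=None, degree=None, evaluation="cutworm", polarity=None)
```

Freezing the dataclass makes units hashable, so they can serve directly as graph nodes or dictionary keys.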
S2, constructing an emotion propagation graph G = (V, E, W) based on the emotion unit set T, where V is the set of emotion units, E is the set of edges and W is the weight matrix between emotion units; calculating the standard centrality H(T_i) of each emotion unit T_i; sorting the emotion units T_i in descending order of H(T_i) and selecting the first M as the seed emotion unit set T_s; and labeling the seed emotion units with emotion labels using a general emotion dictionary and manual annotation, where M < n, M/n ≥ 20%, and M and n are positive integers.
Note that the local context and the social relationship context are used to construct the connection graph between emotion units. The local context of an emotion unit comprises the other emotion units in related microblogs, namely the comments on the microblog and the other microblogs posted by the same user. The social relationship context of an emotion unit comprises forwarding and reply information, topic features, user relationship information and the like.
FIG. 5 shows the local context and social relationship context of a target emotion unit T_i in microblog d. T_j and T_k are the local context of T_i. The social relationship context of T_i includes d_1, d_2 and d_3, where d_1 is a replied-to or forwarded microblog, d_2 is a microblog sharing a topic with T_i, and d_3 is a microblog having a user relationship with d.
Constructing the emotion propagation graph requires determining the neighbor relation between two emotion units, so the propagation window is defined as the n emotion units preceding and the n emotion units following a specific unit T_i. Two words are in a co-occurrence relationship when they appear within the same window. Co-occurrence relationships and semantic rules can be used to infer the emotion polarity of an unlabeled emotion unit. If two emotion units appear in the same window with high probability, their mutual information is also high.
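The window-based co-occurrence counts can be sketched as follows. The sketch assumes each microblog context has already been reduced to an ordered list of emotion units (represented here by their evaluation words); the function name `count_hits` and the sample data are hypothetical.

```python
from collections import Counter

def count_hits(sequences, window):
    """Count unit occurrences and within-window co-occurrences."""
    hits = Counter()        # hits(T_i): occurrences of each unit
    pair_hits = Counter()   # hits(T_i, T_j): co-occurrences, keyed by unordered pair
    for seq in sequences:
        hits.update(seq)
        for i, unit in enumerate(seq):
            # Looking only forward counts each co-occurring pair of positions once,
            # while still covering units both before and after a given unit.
            for other in seq[i + 1 : i + 1 + window]:
                if other != unit:
                    pair_hits[frozenset((unit, other))] += 1
    return hits, pair_hits

sequences = [["good", "praise", "cheap"], ["good", "cheap"], ["praise", "slow"]]
hits, pair_hits = count_hits(sequences, window=1)
```

With a window of 1, "good" and "cheap" co-occur only in the second sequence, since "praise" separates them in the first.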
For example, FIG. 6 shows user relationships on Sina Weibo together with the microblogs the users posted and their emotion polarities. Different users are linked by directed following relationships: for instance, User A follows User B, but User B does not follow User A. The label "@ User A" indicates that the related context is addressed to User A. A topic label enclosed by a pair of "#" marks the topic of the microblog; for example, from the microblog "# popular glow # which is actually a cutworm" it can be inferred that the microblog is about # popular glow #. In addition, different users may discuss the same topic, such as # Volkswagen # and # Revere Sound #. In FIG. 6, in the microblog posted by User D, "unparalleled Security, complimentary detail Process!", both "unparalleled" and "complimentary" express a positive emotional tendency.
Besides the co-occurrence relationship, the semantic relationship between emotion units can be judged directly from conjunctions. Emotion units joined by a coordinating conjunction are more likely to have the same emotion polarity and similar emotion intensity. For example, in the microblog "super bass, like sounds of nature # refocusing good sound #" posted by User C, the emotion units (super bass) and (sounds of nature), joined by "like", both express a positive emotional tendency. Two emotion units joined by an adversative conjunction have opposite emotion polarities. For example, in the microblog "experience is not very good, but I "porridge"!" posted by User A, the two clauses linked by "but" express opposite emotional tendencies.
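The conjunction heuristic can be written as a small rule. The conjunction lists and the function name are hypothetical, polarities are encoded as +1 and -1, and the rule is applied deterministically here even though the text describes same-polarity links only as more probable.

```python
COORDINATING = {"and", "like"}      # assumed markers of same-polarity links
ADVERSATIVE = {"but", "however"}    # assumed markers of opposite-polarity links

def infer_polarity(known_polarity, conjunction):
    """Infer the polarity (+1 or -1) of the unit on the other side of a conjunction."""
    if conjunction in COORDINATING:
        return known_polarity       # same polarity across a coordinating conjunction
    if conjunction in ADVERSATIVE:
        return -known_polarity      # polarity flips across an adversative conjunction
    return None                     # no rule applies

# "experience is not very good, but ...": the first clause is negative (-1),
# so the clause after "but" is inferred to be positive.
after_but = infer_polarity(-1, "but")
```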
In addition to conjunctions, the linguistic semantic rules in Table 1 can be used to determine semantic relationships between emotion units. Rules 1-11 are emotion unit synthesis rules based on part of speech. Rules 12-17 are six rules for emotion synthesis in conditional clauses. Beyond these semantic rules, the subjunctive mood, interrogative sentences, and irony or sarcasm must also be considered; these three sentence types, which require special handling, are covered by rules 18-20. The subjunctive mood indicates that what the speaker says is not a fact but a hypothesis, wish, suspicion or guess. An interrogative sentence may contain emotional indicators, yet the sentence itself expresses no emotion. Irony or sarcasm reverses the emotion polarity of the microblog.
Table 1 The 20 emotion semantic synthesis rules for the Chinese microblog domain
Figure GDA0003029256900000081
After the relationship between two emotion units is obtained, the edge between T_i and T_j is defined as p_ij = PMI(T_i, T_j):
Figure GDA0003029256900000082
where n is the total number of emotion units in the corpus, hits(T_i) is the number of occurrences of T_i in the corpus, hits(T_j) is the number of occurrences of T_j in the corpus, and hits(T_i, T_j) is the number of times T_i and T_j occur in the same window in the local and social contexts. The relationship matrix among the emotion units is denoted P_{n×n}.
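The exact PMI expression sits in a figure that is not reproduced here. As a hedged illustration, the sketch below assumes the textbook count-based form PMI(T_i, T_j) = log2(n · hits(T_i, T_j) / (hits(T_i) · hits(T_j))); the logarithm base and the zero value for pairs that never co-occur are assumptions, not details confirmed by the source.

```python
import math

def pmi(hits_i, hits_j, hits_ij, n):
    """Count-based pointwise mutual information between two emotion units."""
    if hits_ij == 0:
        return 0.0  # assumption: no edge when the units never share a window
    return math.log2(n * hits_ij / (hits_i * hits_j))

# Hypothetical counts: two units seen 4 times each, twice in the same window,
# in a corpus of 16 emotion units.
edge = pmi(hits_i=4, hits_j=4, hits_ij=2, n=16)
```

Under this form, units that co-occur more often than their individual frequencies would predict receive a positive edge weight.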
Furthermore, to measure the connectivity of the nodes (emotion units) in the emotion propagation graph G, the probability that T_i is connected to T_j through connection p_ij is defined as follows:
Figure GDA0003029256900000091
where n_i is the number of nodes connected to T_i. After the connection probabilities are computed, the standard centrality H(T_i) of emotion unit T_i is defined as follows:
Figure GDA0003029256900000092
where n is the total number of emotion units and n_i is the degree of T_i in graph G. A node with higher standard centrality is more important in the network, so standard centrality is used to measure the connectivity of nodes in graph G.
The higher H(T_i) is, the higher the connectivity of T_i, and vice versa. A higher H(T_i) indicates that T_i contributes more to the propagation of emotion labels. Therefore all T_i are sorted by H(T_i) and the first M emotion units are selected as the seed emotion unit set T_s, so that the seed emotion units provide a reliable emotion source for the emotion propagation graph G. The seed emotion unit selection algorithm is shown in Table 2.
Table 2 Seed emotion unit selection algorithm
Figure GDA0003029256900000093
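The ranking step of seed selection can be sketched independently of how H(T_i) is computed. The sketch below takes precomputed centrality values as input (the values shown are made up) and picks the smallest M that satisfies M/n ≥ 20% while keeping M < n; the function name and the integer-percent parameter are illustrative choices.

```python
def select_seeds(centrality, min_percent=20):
    """Return the M highest-centrality units, M being the smallest count with M/n >= min_percent%."""
    n = len(centrality)
    m = (n * min_percent + 99) // 100   # integer ceiling of n * min_percent / 100
    m = max(0, min(m, n - 1))           # enforce M < n
    ranked = sorted(centrality, key=centrality.get, reverse=True)
    return ranked[:m]

# Hypothetical centrality values H(T_i) for five emotion units.
centrality = {"good": 0.9, "cutworm": 0.4, "praise": 0.7, "slow": 0.2, "navy": 0.6}
seeds = select_seeds(centrality)            # n = 5, so M = 1
more_seeds = select_seeds(centrality, 40)   # M = 2
```

The selected units would then be labeled with the general emotion dictionary and checked manually before propagation.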
After the seed emotion units are obtained, they are labeled with emotion labels using a general emotion dictionary and manual calibration. Existing emotion dictionaries do not contain internet slang, so popular internet words such as "mountain village", "male" and "mulberry heart" are added to the general emotion dictionary. Manual calibration means checking the labels of the seed emotion units and correcting inaccurate ones.
S3, applying an emotion propagation algorithm to propagate emotion from the labeled seed emotion unit set T_s to the n - M unlabeled emotion units, and obtaining the emotion score of each of the n - M emotion units.
Step S3 further includes:
S31, denoting the initial emotion score vector of the emotion units as score(T):
score(T) = [score(T_1), score(T_2), …, score(T_n)]
and normalizing score(T):
Figure GDA0003029256900000101
Figure GDA0003029256900000102
where T_pos is the set of positive emotion units and T_neg is the set of negative emotion units;
S32, pruning graph G by removing its weaker edges, wherein the k largest values in each row of the matrix P' are retained and the remaining values are set to 0, so that the k most strongly connected units of each emotion unit are taken as its emotion neighbors,
Figure GDA0003029256900000103
where P' is a probability transition matrix;
S33, defining the probability transition matrix of emotion propagation as:
Figure GDA0003029256900000104
where β ∈ [0,1] is an adaptive parameter, A is a matrix in which some rows are 1/n and the remaining rows are 0, and J is a matrix whose elements are all 1/n; matrix A is added to ensure that no all-zero rows remain in the matrix P';
S34, defining the process of emotion label propagation as:
Figure GDA0003029256900000105
where
Figure GDA0003029256900000106
is the emotion score of T_i at iteration t + 1, α ∈ [0,1] is a weight parameter,
Figure GDA0003029256900000107
is row i of the matrix
Figure GDA0003029256900000108
and score(T^t) is the emotion score vector of T at iteration t; in each iteration,
Figure GDA0003029256900000109
is computed in the order i = 1, …, n, and each time a new
Figure GDA00030292569000001010
is obtained,
Figure GDA00030292569000001011
is updated accordingly;
S35, when the iteration stops, normalizing score(T) according to:
Figure GDA0003029256900000111
Table 3 Emotion propagation algorithm
Figure GDA0003029256900000112
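The update formula of step S34 is contained in figures that are not reproduced here, so the following is only a sketch under stated assumptions. It assumes a common label-propagation form, score_i ← α · (row i of the transition matrix) · score + (1 − α) · (initial score of T_i), and applies it in place in the order i = 1, …, n so that each new score is used immediately, as the text describes. The matrix, the seed values and the function name are hypothetical, and the normalization of S31 and S35 is omitted.

```python
def propagate(P, seed_scores, alpha=0.5, iterations=50):
    """In-place (Gauss-Seidel style) propagation of emotion scores.

    P           -- transition matrix as a list of rows (assumed row-normalized)
    seed_scores -- initial scores: +1 / -1 for labeled seeds, 0 for unlabeled units
    """
    n = len(seed_scores)
    score = list(seed_scores)
    for _ in range(iterations):
        for i in range(n):  # sweep in the order i = 1, ..., n
            neighbor = sum(P[i][j] * score[j] for j in range(n))
            # The new score is written back immediately, so units later in the
            # same sweep already see it.
            score[i] = alpha * neighbor + (1 - alpha) * seed_scores[i]
    return score

# Unit 0 is a positive seed, unit 1 a negative seed, unit 2 is unlabeled and
# linked more strongly to unit 0 than to unit 1.
P = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.75, 0.25, 0.0],
]
scores = propagate(P, seed_scores=[1.0, -1.0, 0.0])
```

In this toy graph the unlabeled unit ends up with a small positive score because its stronger link leads to the positive seed.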
S4, obtaining the emotion score of each emotion word in each emotion unit from the emotion score of that unit, thereby obtaining a microblog-specific emotion dictionary L_spe containing explicit and implicit emotional features.
Step S4 further includes:
S41, calculating the emotion score of each emotional feature from the emotion scores of the emotion units:
Figure GDA0003029256900000113
where n(w_i) is the frequency of the word w_i in the corpus, score(N) is the negation score in emotion unit T_i, and score(D) is the degree score in emotion unit T_i;
S42, obtaining the microblog-specific emotion dictionary L_spe containing explicit and implicit emotional features.
S5, classifying the emotions of the microblog corpus according to the microblog-specific emotion dictionary L_spe.
Step S5 further includes: applying the microblog-specific emotion dictionary L_spe to emotion classification, wherein explicit and implicit emotional features and the 20 semantic synthesis rules of the Chinese microblog domain are taken into account; the emotion score score(d_i) of microblog d_i is obtained by summing over all of its emotion units, and the final emotion polarity of the microblog is determined from this score.
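The final summation in step S5 can be sketched as below. The sketch assumes the unit scores of a microblog have already been computed and adjusted by the semantic synthesis rules, which are not modeled here; the zero threshold separating the three polarities and the function name are illustrative assumptions.

```python
def classify(unit_scores, threshold=0.0):
    """Sum the emotion unit scores of one microblog and map the total to a polarity."""
    total = sum(unit_scores)
    if total > threshold:
        return "positive"
    if total < -threshold:
        return "negative"
    return "neutral"

# Hypothetical unit scores for three microblogs.
labels = [classify([0.6, -0.2]), classify([-0.5, 0.1]), classify([])]
```

A microblog with no scored emotion units sums to zero and is therefore treated as neutral in this sketch.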
Table 4 Parameter settings for the three emotion dictionaries (HowNet, NTUSD and SXU) on the UCI and Weibo data sets
Figure GDA0003029256900000121
Several parameters must be set during model learning. Table 4 lists the parameter set of the emotion unit context propagation framework, Φ = {H, M, k, T, α, β}. H is the window size in the seed emotion unit selection algorithm, which determines the context of a specific unit. M is the number of seed emotion units. k is the number of emotion neighbors retained for each emotion unit in the emotion propagation graph, which affects the number of iterations and the stability of the computation. T is the number of iterations. The convergence of score(T) depends directly on the emotion propagation matrix
Figure GDA0003029256900000122
α is the update rate and β is the weight parameter. Various parameter combinations were tested to find the optimal combination, shown in Table 4.
Fig. 7 and 8 show the consistency rates of the emotion dictionaries obtained by 5-fold cross-validation. The emotional polarity and strength of the explicit and implicit emotional features are obtained through emotion label propagation. The consistency rate is the percentage of emotional features whose polarities agree between two emotion dictionaries. Because the emotion dictionaries generated from different data sets differ in size, the consistency-rate confusion matrix is asymmetric. The emotion dictionaries generated from the 5-fold cross-validation data sets are combined into the final emotion dictionary; specifically, a voting method determines the final emotional tendency of each emotional feature.
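The consistency rate and the voting step can be sketched as follows. The text does not give the denominator of the rate or the tie-breaking rule, so this sketch assumes the rate is taken relative to the size of the first dictionary (which makes it asymmetric, as described) and that tied votes are dropped. The function names and toy dictionaries are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def consistency_rate(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Share of entries in `a` that also occur in `b` with the same polarity."""
    if not a:
        return 0.0
    agree = sum(1 for word, pol in a.items() if b.get(word) == pol)
    return agree / len(a)


def vote(lexicons: List[Dict[str, int]]) -> Dict[str, int]:
    """Majority vote over per-fold dictionaries; polarity is +1 or -1."""
    ballots: Dict[str, Counter] = {}
    for lex in lexicons:
        for word, pol in lex.items():
            ballots.setdefault(word, Counter())[pol] += 1
    merged: Dict[str, int] = {}
    for word, counts in ballots.items():
        pos, neg = counts[1], counts[-1]
        if pos != neg:                     # assumed: ties are left out
            merged[word] = 1 if pos > neg else -1
    return merged


# Toy dictionaries of different sizes; the entries are made up.
dict_a = {"x": 1, "y": -1, "z": 1, "w": 1}
dict_b = {"x": 1, "y": 1}
rate_ab = consistency_rate(dict_a, dict_b)   # relative to the 4 entries of dict_a
rate_ba = consistency_rate(dict_b, dict_a)   # relative to the 2 entries of dict_b
final = vote([{"x": 1}, {"x": 1}, {"x": -1, "y": -1}, {"y": 1}])
```

Because the two toy dictionaries differ in size, rate_ab and rate_ba differ, which is the asymmetry the paragraph above refers to.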
Table 5. Parts of speech and their percentages in the generated emotion dictionaries
Figure GDA0003029256900000131
Table 5 gives details of the emotion dictionaries generated from the UCI and Weibo data sets. As Table 5 shows, the UCI emotion dictionary contains 11428 emotional features, of which 8665 are explicit and 2763 are implicit. The Weibo emotion dictionary contains 24415 emotional features, of which 20189 are explicit and 4226 are implicit. Most entries in general emotion dictionaries are verbs and adjectives, whereas noun emotional features account for 2763 (24.2%) and 4226 (17.5%) of the generated UCI and Weibo microblog-specific emotion dictionaries, respectively. This indicates that implicit emotional features are the main emotional indicators in microblog emotional expression. A small number of adverbs, such as "cluttered," "happy," and "parsimonious," also express emotional tendencies: adverb emotional features account for 185 (1.6%) of the UCI emotion dictionary and 308 (1.3%) of the Weibo emotion dictionary. In both emotion dictionaries, negative emotional features outnumber positive ones: negative and positive features account for 6322 (55.2%) and 5106 (44.8%) of the UCI emotion dictionary, and 12935 (53.0%) and 11480 (47.0%) of the Weibo emotion dictionary, respectively.
Tables 6 and 7 show the emotion classification results of the proposed method and the baseline systems on the UCI and Weibo data sets, and Fig. 11 and 12 compare the overall accuracy. The accuracy of the SUCPF method is higher than that of the other five baseline methods. For example, the SUCPF method improved by 7.1%, 4.8%, 5.9%, 5.4%, 4.5%, and 6.2% under the SXU dictionary on the UCI data set, and by 14.5%, 2.9%, 4.2%, 6.4%, 5.3%, and 5.1% under the SXU dictionary on the Weibo data set. This indicates that SUCPF yields better emotion dictionaries and emotion classification results, mainly because the proposed SUCPF framework is a semi-supervised framework that makes use of existing emotion dictionaries and manual annotation. The SUCPF method uses local context, topic features, and contextual relationships to extract explicit and implicit emotional features.
Table 6. Emotion classification results under three general emotion dictionaries on the UCI data set
Figure GDA0003029256900000141
Table 7. Emotion classification results under three general emotion dictionaries on the Weibo data set
Figure GDA0003029256900000151
Fig. 9 and 10 show 50 noun emotional features extracted from the two microblog data sets. These words strongly imply positive or negative emotions and opinions. Different microblog data sets share some implicit emotional features, such as 'bad eggs', 'cheating', and 'mushrooms'. The present invention can also discover new emotional features, including explicit ones such as "bad eggs", "Shenjing disease" and "weak people", and implicit ones such as "fire fighters", "tide men" and "industry sickness". This indicates that users of social media such as microblogs tend to use implicit features to express their sentiment about products, services and the like indirectly. The invention adapts well across domains and can identify both explicit and implicit emotional features.
In conclusion, compared with similar representative methods, the invention constructs a microblog-specific emotion dictionary containing explicit and implicit emotional features using a strategy that combines rules and statistics. It achieves higher overall accuracy and stability, can effectively construct domain-specific emotion dictionaries, and can accurately extract explicit and implicit emotional features.
The accompanying drawings and the detailed description are included to provide a further understanding of the invention. The method of the present invention is not limited to the examples described in the specific embodiments, and other embodiments derived from the method and idea of the present invention by those skilled in the art also belong to the technical innovation scope of the present invention. This summary should not be construed to limit the present invention.

Claims (3)

1. A specific emotion dictionary generation method for Chinese microblog emotion classification is characterized by comprising the following steps of:
S1, preprocessing the microblog corpus D = {d_1, d_2, …, d_l}, extracting a plurality of emotion units T_i through lexical analysis and syntactic analysis, and taking the plurality of emotion units T_i as an emotion unit set T = {T_1, T_2, …, T_n}, wherein i and n are positive integers, 1 ≤ i ≤ n, and each emotion unit T_i is defined by the elements N, D, E and P, wherein N is a negation indicator, D is a degree adverb, E is an evaluation word, and P is an emotional polarity;
S2, constructing an emotion propagation graph G(V, E, W) based on the emotion unit set T, wherein V is the set of emotion units, E is the set of edges, and W is the weight matrix between emotion units; calculating the standard centrality H(T_i) of each emotion unit T_i; sorting the plurality of emotion units T_i according to the standard centrality H(T_i) and selecting the first M as a seed emotion unit set T_s; and labeling the seed emotion units with emotion labels using a general emotion dictionary and manual annotation, wherein M < n, M/n ≥ 20%, and M and n are positive integers;
S3, applying an emotion propagation algorithm to propagate emotion from the labeled seed emotion unit set T_s to the n−M unlabeled emotion units, and obtaining the emotion score of each of the n−M emotion units;
step S3 further includes:
S31, denoting the initial emotion score vector of the emotion units as score(T):
score(T) = [score(T_1), score(T_2), …, score(T_n)]
normalizing score(T):
Figure FDA0003029256890000011
Figure FDA0003029256890000012
wherein T_pos is the set of positive emotion units and T_neg is the set of negative emotion units;
S32, pruning the graph G by removing edges with smaller weights, wherein the k largest values in each row of the matrix P' are retained and the rest are set to 0, so that the k highest-weighted units of each emotion unit are taken as its emotion neighbors,
Figure FDA0003029256890000021
wherein P' is a probability transition matrix;
Figure FDA0003029256890000022
wherein
Figure FDA0003029256890000023
n is the total number of emotion units in the corpus, n_i is the degree of T_i in graph G, hits(T_i) is the number of occurrences of T_i in the corpus, hits(T_j) is the number of occurrences of T_j in the corpus, hits(T_i, T_j) is the frequency with which T_i and T_j occur in the same window under the local and social context, and the relationship matrix among the emotion units is denoted P_ij;
S33, defining the probability transition matrix of emotion propagation as follows:
Figure FDA0003029256890000024
wherein β ∈ [0,1] is an adaptive parameter; A is a matrix in which some rows are 1/n and the remaining rows are 0, added to ensure that the matrix P' has no all-zero rows; and J is a matrix whose elements are all 1/n;
S34, the process of emotion label propagation is defined as follows:
Figure FDA0003029256890000025
wherein score(T_i^{t+1}) is the emotion score of T_i at iteration t+1, α ∈ [0,1] is a weight parameter,
Figure FDA0003029256890000026
is the i-th row of the matrix
Figure FDA0003029256890000027
and score(T^t) is the emotion score vector of T at iteration t; in each iteration, score(T_i^{t+1}) is computed in the order i = 1, …, n, and each time a new
Figure FDA0003029256890000028
is obtained, score(T_i^{t+1}) is updated;
S35, when the iteration stops, normalizing score(T) according to:
Figure FDA0003029256890000029
S4, obtaining the emotion score of each emotion word in each emotion unit from the emotion score of that emotion unit, so as to obtain a microblog-specific emotion dictionary L_spe containing explicit and implicit emotional features;
Step S4 further includes:
S41, calculating the emotion scores of the emotional features from the emotion scores of the emotion units:
Figure FDA0003029256890000031
wherein n(w_i) is the frequency of the word w_i in the corpus, score(N) is the negation score in emotion unit T_i, and score(D) is the degree score in emotion unit T_i;
S42, obtaining the microblog-specific emotion dictionary L_spe containing explicit and implicit emotional features;
S5, classifying the emotions of the microblog corpus according to the microblog-specific emotion dictionary L_spe;
the explicit emotional characteristics refer to emotional words, phrases and idioms which directly express emotion on entities or aspects, and the implicit emotional characteristics refer to characteristics which express positive or negative emotion without obvious emotional indicators.
2. The method for generating the specific emotion dictionary for Chinese microblog emotion classification according to claim 1, wherein the step S1 includes:
S11, using a rule-based method to filter out links, stop words, repeated words and noise information from the microblog data set D = {d_1, d_2, …, d_l};
S12, performing lexical analysis on D = {d_1, d_2, …, d_l} using a part-of-speech tagging tool, and performing syntactic analysis on D = {d_1, d_2, …, d_l} using a dependency parsing tool;
S13, extracting the adjectives, verbs, adverbs and nouns in D = {d_1, d_2, …, d_l} as candidate emotional features W = {w_1, w_2, …, w_n}, and filtering out low-frequency words;
S14, extracting the negation modification relations and degree modification relations from the dependency parsing result;
S15, taking the extracted candidate emotional features, negation modification relations and degree modification relations as emotion units T_i, a plurality of the emotion units forming the emotion unit set T = {T_1, T_2, …, T_n}.
3. The method for generating the specific emotion dictionary for Chinese microblog emotion classification according to claim 1, wherein the step S5 further comprises:
applying the microblog-specific emotion dictionary L_spe to emotion classification, taking into account explicit and implicit emotional features and 20 semantic synthesis rules for the Chinese microblog domain, wherein the emotion score score(d_i) of a microblog d_i is obtained by summing over all of its emotion units, and the final emotion polarity is determined from the microblog's emotion score.
CN201811145088.0A 2018-09-29 2018-09-29 Specific emotion dictionary generation method for Chinese microblog emotion classification Active CN109376239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811145088.0A CN109376239B (en) 2018-09-29 2018-09-29 Specific emotion dictionary generation method for Chinese microblog emotion classification

Publications (2)

Publication Number Publication Date
CN109376239A CN109376239A (en) 2019-02-22
CN109376239B CN109376239B (en) 2021-07-30

Family

ID=65402516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811145088.0A Active CN109376239B (en) 2018-09-29 2018-09-29 Specific emotion dictionary generation method for Chinese microblog emotion classification

Country Status (1)

Country Link
CN (1) CN109376239B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083726B (en) * 2019-03-11 2021-10-22 北京比速信息科技有限公司 Destination image perception method based on UGC picture data
CN110032646B (en) * 2019-05-08 2022-12-30 山西财经大学 Cross-domain text emotion classification method based on multi-source domain adaptive joint learning
CN110489553B (en) * 2019-07-26 2022-07-05 湖南大学 Multi-source information fusion-based emotion classification method
CN110489522B (en) * 2019-07-26 2022-04-12 湖南大学 Emotional dictionary construction method based on user score
CN111523300B (en) * 2020-04-14 2021-03-05 北京精准沟通传媒科技股份有限公司 Vehicle comprehensive evaluation method and device and electronic equipment
CN112000804B (en) * 2020-08-18 2022-08-02 安徽理工大学 Microblog hot topic user group emotion tendentiousness analysis method
CN112632272B (en) * 2020-10-20 2022-07-19 浙江工业大学 Microblog emotion classification method and system based on syntactic analysis
CN113326694B (en) * 2021-05-18 2022-09-30 西华大学 Implicit emotion dictionary generation method based on emotion propagation
CN115080689A (en) * 2022-06-15 2022-09-20 昆明理工大学 Label association fused hidden space data enhanced multi-label text classification method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544242A (en) * 2013-09-29 2014-01-29 广东工业大学 Microblog-oriented emotion entity searching system
WO2015053607A1 (en) * 2013-10-10 2015-04-16 Mimos Berhad System and method for semantic-level sentiment analysis of text
CN106372058A (en) * 2016-08-29 2017-02-01 中译语通科技(北京)有限公司 Short text emotion factor extraction method and device based on deep learning
CN106503049A (en) * 2016-09-22 2017-03-15 南京理工大学 A kind of microblog emotional sorting technique for merging multiple affection resources based on SVM
CN106547842A (en) * 2016-10-14 2017-03-29 华东师范大学 A kind of method that location-based emotion is visualized on virtual earth platform
US9633007B1 (en) * 2016-03-24 2017-04-25 Xerox Corporation Loose term-centric representation for term classification in aspect-based sentiment analysis
CN106776554A (en) * 2016-12-09 2017-05-31 厦门大学 A kind of microblog emotional Forecasting Methodology based on the study of multi-modal hypergraph

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110320387A1 (en) * 2010-06-28 2011-12-29 International Business Machines Corporation Graph-based transfer learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Large scale and parallel sentiment analysis based on Label Propagation in Twitter Data;Yibing Yang等;《2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications》;20180906;第1791-1798页 *
Semi-supervised polarity lexicon induction;Delip Rao等;《EACL "09: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics》;20090331;第675–682页 *
基于分类的微博新情感词抽取方法和特征分析;刘德喜等;《计算机学报》;20180731;第1574-1597页 *
基于情感分析的评论挖掘技术研究;雷小惠;《中国优秀硕士学位论文全文数据库 信息科技辑》;20180415;第I138-3747页 *
王志昊.情感分类特征选择方法研究.《中国优秀硕士学位论文全文数据库 信息科技辑》.2014, *

Also Published As

Publication number Publication date
CN109376239A (en) 2019-02-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant