CN109376239B - Specific emotion dictionary generation method for Chinese microblog emotion classification - Google Patents
- Publication number
- CN109376239B (application CN201811145088.0A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- units
- score
- microblog
- dictionary
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention discloses a method for generating a specific emotion dictionary for Chinese microblog emotion classification. The microblog corpus is preprocessed and emotion units are extracted, an emotion propagation graph is constructed over the emotion units, and the units with the highest standard degree centrality are selected as seed emotion units and labeled using a general emotion dictionary and manual annotation. Finally, an emotion propagation algorithm propagates emotion from the labeled seed emotion unit set to the unlabeled emotion units, yielding emotion labels for all emotion words in all emotion units. The result is a microblog-specific emotion dictionary containing explicit and implicit emotional features, which is used to classify the emotion of the microblog corpus. Compared with similar representative methods, the method achieves higher overall accuracy and stability, effectively constructs a domain-specific emotion dictionary, and accurately extracts explicit and implicit emotional features.
Description
Technical Field
The invention relates to the field of sentiment analysis of social media text and provides a method for generating a specific emotion dictionary for Chinese microblog sentiment classification.
Background
Constructing an emotion dictionary is a fundamental part of the emotion analysis task, because words and phrases are the basic elements that express positive or negative emotion. Microblogs have become a popular means of communication on the Internet: individual users can freely, conveniently and instantly express their opinions on products and public events through platforms such as Sina Weibo. Because microblogs are extremely short and lexically rich, their vector-space representations are very sparse, so dictionary-based methods are better suited to microblog emotion analysis. However, different domains use different emotion vocabularies, and the same word may express different views in different domains, which leads to diverse emotion expression and semantic variability of emotion words. Because the emotion of a word or phrase often depends on the domain, and manually labeling domain emotion words starting from a general emotion dictionary is time-consuming and labor-intensive, a general emotion dictionary cannot classify domain-specific emotion well and cannot meet the requirements of domain-specific emotion classification.
Linguistic-rule-based, corpus-based and dictionary-based methods have been proposed in the prior art for automatically constructing domain-specific emotion dictionaries. However, microblog grammar takes many forms and users often ignore grammar rules when writing microblogs, so rule-based methods cannot cover all microblog cases; corpus-based methods depend heavily on the size of the corpus; and dictionary-based methods depend closely on the quality of the general emotion dictionary. None of the three is therefore well suited to constructing an emotion dictionary for the microblog domain.
In addition, conventional emotion analysis research has focused on explicit emotional features (Explicit Sentiment Features) such as "goodness", "dislike" and "like". Their defining element is a clear emotional indicator: words, phrases and idioms that directly express emotion toward an entity or aspect are called explicit emotional features. In practice, however, many users express an implicit, indirect emotion through figurative language or statements of fact. Implicit emotional features (Implicit Sentiment Features) are features that express positive or negative emotion without any explicit emotional indicator; they often state a fact or express an emotion indirectly. Fig. 3 shows four microblogs containing the implicit emotional features "cutworm", "flood and wilderness", "navy" and "five-mao special effect". Identifying implicit emotional features has long been a challenging problem because they carry no emotional indicator at all.
Disclosure of Invention
The invention aims to construct a microblog-specific emotion dictionary by extracting explicit and implicit emotional features, and to classify microblog emotion according to the emotion polarity (positive, negative or neutral) of those features.
To this end, addressing the construction of microblog-specific emotion vocabularies and the implicit emotional features described above, the invention provides a method for generating a specific emotion dictionary for Chinese microblog emotion classification, comprising the following steps:
S1: Preprocess the microblog corpus $D=\{d_1,d_2,\dots,d_l\}$, extract emotion units $T_i$ through lexical and syntactic analysis, and collect them into the emotion unit set $T=\{T_1,T_2,\dots,T_n\}$, where $i$ and $n$ are positive integers with $1\le i\le n$. Each emotion unit is defined as $T_i=(N,D,E,P)$, where $N$ is a negation indicator, $D$ is a degree adverb, $E$ is an evaluation word and $P$ is the emotion polarity;
S2: Construct an emotion propagation graph $G=(V,E,W)$ from the emotion unit set $T$, where $V$ is the set of emotion units, $E$ is the set of edges and $W$ is the weight matrix between emotion units. Compute the standard degree centrality $H(T_i)$ of each emotion unit $T_i$, sort the units in descending order of $H(T_i)$, select the top $M$ as the seed emotion unit set $T_s$, and label the seed emotion units using a general emotion dictionary and manual annotation, where $M$ and $n$ are positive integers, $M<n$ and $M/n\ge 20\%$;
S3: Apply an emotion propagation algorithm to propagate emotion from the labeled seed emotion unit set $T_s$ to the $n-M$ unlabeled emotion units, and obtain the emotion score of each of these $n-M$ units;
S4: From the emotion score of each emotion unit, obtain the emotion score of each emotion word in the unit, yielding a microblog-specific emotion dictionary $L_{spe}$ containing explicit and implicit emotional features;
S5: Classify the emotion of the microblog corpus according to the microblog-specific emotion dictionary $L_{spe}$.
In the method provided by the embodiment of the invention, the microblog corpus is first preprocessed and emotion units are extracted. An emotion propagation graph is then built from the emotion units, the standard degree centrality of each unit is computed, the units are ranked by centrality, and the top M are selected as seed emotion units, which are labeled using a general emotion dictionary and manual annotation. Finally, an emotion propagation algorithm propagates emotion from the labeled seed emotion unit set to the unlabeled emotion units, yielding emotion labels for all emotion words in all emotion units and thus a microblog-specific emotion dictionary containing explicit and implicit emotional features, which is used to classify the emotion of the microblog corpus.
According to an embodiment of the present invention, the step S1 includes:
S11: Use rule-based methods to filter links, stop words, repeated words and other noise out of the microblog data set $D=\{d_1,d_2,\dots,d_l\}$;
S12: Perform lexical analysis on $D$ with a part-of-speech tagging tool and syntactic analysis with a dependency parsing tool;
S13: Extract the adjectives, verbs, adverbs and nouns in $D$ as candidate emotional features $W=\{w_1,w_2,\dots,w_n\}$, and filter out low-frequency words;
S14: Extract the negation modification relations and degree modification relations from the dependency parsing results;
S15: Take the extracted candidate emotional features, together with their negation and degree modification relations, as emotion units $T_i$; the emotion units form the emotion unit set $T=\{T_1,T_2,\dots,T_n\}$.
In step S2, $n$ is the total number of emotion units in the corpus, $n_i$ is the degree of $T_i$ in graph $G$, $hits(T_i)$ and $hits(T_j)$ are the numbers of occurrences of $T_i$ and $T_j$ in the corpus, $hits(T_i,T_j)$ is the number of times $T_i$ and $T_j$ occur in the same window in the local and social contexts, and the relationship matrix between emotion units is denoted $P=[p_{ij}]$.
According to an embodiment of the present invention, step S3 further includes:
S31: Denote the initial emotion score vector of the emotion units as $score(T)$:
$score(T)=[score(T_1),score(T_2),\dots,score(T_n)]$
and normalize $score(T)$:
where $T_{pos}$ is the set of positive emotion units and $T_{neg}$ is the set of negative emotion units;
S32: Prune graph $G$ by removing its weaker edges: keep the $k$ largest values in each row of the matrix $P'$ and set the rest to 0, so that the $k$ strongest connections of each emotion unit become its emotion neighbors, where $P'$ is the probability transition matrix;
S33: Define the probability transition matrix of emotion propagation as follows:
where $\beta\in[0,1]$ is an adaptive parameter, $A$ is a matrix in which some rows have all entries equal to $1/n$ and the remaining rows are 0, and $J$ is a matrix whose elements are all $1/n$. Matrix $A$ is added to ensure that $P'$ has no all-zero rows;
S34: Define the emotion label propagation process as follows:
where $score^{t+1}(T_i)$ is the emotion score of $T_i$ at iteration $t+1$, $\alpha\in[0,1]$ is a weight parameter, the matrix row used is row $i$ of the transition matrix defined in S33, and $score(T^t)$ is the emotion score vector of $T$ at iteration $t$. In each iteration, $score^{t+1}(T_i)$ is computed in the order $i=1,\dots,n$, and each newly obtained value immediately updates $score(T^t)$;
S35: When the iteration stops, normalize $score(T)$ as follows:
According to an embodiment of the present invention, step S4 further includes:
S41: Compute the emotion score of each emotional feature from the emotion scores of the emotion units:
where $n(w_i)$ is the frequency of word $w_i$ in the corpus, $score(N)$ is the negation score in emotion unit $T_i$, and $score(D)$ is the degree score in emotion unit $T_i$.
S42: Obtain the microblog-specific emotion dictionary $L_{spe}$ containing explicit and implicit emotional features.
According to an embodiment of the present invention, step S5 further includes:
The microblog-specific emotion dictionary $L_{spe}$ is applied to emotion classification, taking into account explicit and implicit emotional features and the 20 semantic composition rules for the Chinese microblog domain: the emotion score $score(d_i)$ of microblog $d_i$ is obtained by summing the scores of all of its emotion units, and the final emotion polarity of the microblog is determined from this score.
Compared with the prior art, the invention has the following beneficial effects: (1) it provides a new emotion propagation framework over emotion units that generates a microblog-specific emotion dictionary from a Chinese microblog data set, and the generated dictionary contains both explicit emotional features and implicit noun or noun-phrase emotional features; (2) social relationships, topic features and local context information are used to construct the emotion propagation graph and its adjacency matrix, and the emotion propagation algorithm propagates emotion from labeled units to unlabeled units, so that emotion scores are obtained for both explicit and implicit emotional features; (3) the framework was validated on two microblog data sets, UCI and Weibo, and the experimental results show that it generates a high-quality microblog-specific emotion dictionary, achieves better results on the emotion classification task and improves classification accuracy.
Drawings
FIG. 1 is a flowchart of a method for generating a specific emotion dictionary for Chinese microblog emotion classification according to an embodiment of the invention.
FIG. 2 is a framework diagram for generating a Chinese domain-specific emotion dictionary that fuses social relations and local context.
FIG. 3 shows microblogs containing the four implicit features "cutworm", "flood and wilderness", "navy" and "five-mao special effect".
FIG. 4 is a flowchart of processing three microblogs under the emotion unit context propagation framework.
FIG. 5 is a diagram of the local context and social context of a target emotion unit in a microblog.
FIG. 6 is a diagram of Sina Weibo user relationships and the microblogs the users post.
FIG. 7 is a chart of the agreement rates of the emotion dictionaries obtained under 5-fold cross-validation on the UCI data set.
FIG. 8 is a chart of the agreement rates of the emotion dictionaries obtained under 5-fold cross-validation on the Weibo data set.
FIG. 9 is a word cloud of noun emotional features on the UCI data set.
FIG. 10 is a word cloud of noun emotional features on the Weibo data set.
FIG. 11 is the sentiment classification results (positive, negative and neutral) under the UCI data set.
FIG. 12 is the sentiment classification results (positive and negative) under the Weibo data set.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in FIGS. 1 to 12, the framework of the invention is divided into the following five steps, which build on one another and are finally integrated. The learning process mainly comprises the following steps:
S1: Preprocess the microblog corpus $D=\{d_1,d_2,\dots,d_l\}$, extract emotion units $T_i$ through lexical and syntactic analysis, and collect them into the emotion unit set $T=\{T_1,T_2,\dots,T_n\}$, where $i$ and $n$ are positive integers with $1\le i\le n$. Each emotion unit is defined as $T_i=(N,D,E,P)$, where $N$ is a negation indicator, $D$ is a degree adverb, $E$ is an evaluation word and $P$ is the emotion polarity.
Step S1 includes: S11: Use rule-based methods to filter links, stop words, repeated words and other noise out of the microblog data set $D=\{d_1,d_2,\dots,d_l\}$;
S12: Perform lexical analysis on $D$ with a part-of-speech tagging tool and syntactic analysis with a dependency parsing tool;
S13: Extract the adjectives, verbs, adverbs and nouns in $D$ as candidate emotional features $W=\{w_1,w_2,\dots,w_n\}$, and filter out low-frequency words;
S14: Extract the negation modification relations and degree modification relations from the dependency parsing results;
S15: Take the extracted candidate emotional features, together with their negation and degree modification relations, as emotion units $T_i$; the emotion units form the emotion unit set $T=\{T_1,T_2,\dots,T_n\}$.
In the preprocessing stage, links, stop words, repeated words and other noise are first filtered out of the microblogs. Lexical and syntactic analysis is performed with the part-of-speech tagging and dependency parsing tools of the Harbin Institute of Technology Language Technology Platform (LTP). For microblog-specific symbols, topic tags marked with #, user relations marked with @, emoticons and the like are extracted first. Words or phrases that may express emotion are defined as candidate explicit and implicit emotional features. Lexical analysis yields part-of-speech (POS) tags, topic features, user relationships in the social network and so on. Adjectives, verbs, adverbs and nouns are extracted as candidate emotional features, and low-frequency words are filtered out. Dependency parsing reveals syntactic structure by analyzing the dependencies between linguistic elements, and the negation and degree modification relations are extracted from its results. The microblog emotion data set is denoted $D=\{d_1,d_2,\dots,d_l\}$, and $W=\{w_1,w_2,\dots,w_n\}$ is the set of potential emotional features, where each $w_k$ ($1\le k\le n$) is an emotion word or phrase. $N$ is a negation indicator such as "no", "not" or "none", and $D$ is a degree adverb such as "completely" or "quite".
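As a rough illustration of S13-S15, the sketch below builds (N, D, E) emotion units from already-parsed sentences. It does not call LTP; it assumes each token has been reduced to a hypothetical `Token(word, pos, head)` record with LTP-style POS tags (`a`, `v`, `d`, `n`) and a 0-based head index (-1 for the root). The negator and degree-adverb lexicons are small illustrative stand-ins, and the polarity P is left empty because it is only assigned later by propagation.

```python
from collections import Counter
from typing import List, NamedTuple, Optional

# Illustrative lexicons only; a real system would load complete word lists.
NEGATORS = {"不", "没", "没有", "无"}
DEGREE_ADVERBS = {"很", "非常", "完全", "相当"}
# Assumed LTP-style POS tags: adjective, verb, adverb, noun.
CANDIDATE_POS = {"a", "v", "d", "n"}


class Token(NamedTuple):
    word: str
    pos: str
    head: int  # 0-based index of the governing token, -1 for the root


class EmotionUnit(NamedTuple):
    negator: Optional[str]          # N
    degree: Optional[str]           # D
    word: str                       # E
    polarity: Optional[str] = None  # P, assigned later by propagation


def extract_units(sentences: List[List[Token]], min_count: int = 2) -> List[EmotionUnit]:
    """Turn candidate words plus their negation/degree modifiers into emotion units."""
    freq = Counter(tok.word for sent in sentences for tok in sent)
    units = []
    for sent in sentences:
        for i, tok in enumerate(sent):
            # Keep words of the candidate POS classes that are frequent enough.
            if tok.pos not in CANDIDATE_POS or freq[tok.word] < min_count:
                continue
            # Negators and degree adverbs are modifiers, not evaluation words.
            if tok.word in NEGATORS or tok.word in DEGREE_ADVERBS:
                continue
            # Direct dependents of the candidate word in the dependency tree.
            children = [t.word for t in sent if t.head == i]
            neg = next((w for w in children if w in NEGATORS), None)
            deg = next((w for w in children if w in DEGREE_ADVERBS), None)
            units.append(EmotionUnit(neg, deg, tok.word))
    return units


# "不 很 好" (not very good): both adverbs depend on the adjective "好".
sentence = [Token("不", "d", 2), Token("很", "d", 2), Token("好", "a", -1)]
print(extract_units([sentence], min_count=1))
```

The low-frequency filter is applied per word over the whole corpus, which mirrors "filter out low-frequency words" in S13; the threshold value is a free choice.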
The emotion unit set is denoted $T=\{T_1,T_2,\dots,T_n\}$, where $T_i$ ($1\le i\le n$) is an emotion unit, and $T_s$ is the seed emotion unit set with $|T_s|=M$. Degree adverbs and negation indicators affect the intensity and polarity of an emotion. For example, as shown in FIG. 4, in the emotion unit (not, very, good, negative), "good" is a positive emotion word, and "very" intensifies the negative emotion produced by the negation.
S2: Construct an emotion propagation graph $G=(V,E,W)$ from the emotion unit set $T$, where $V$ is the set of emotion units, $E$ is the set of edges and $W$ is the weight matrix between emotion units. Compute the standard degree centrality $H(T_i)$ of each emotion unit $T_i$, sort the units in descending order of $H(T_i)$, select the top $M$ as the seed emotion unit set $T_s$, and label the seed emotion units using a general emotion dictionary and manual annotation, where $M$ and $n$ are positive integers, $M<n$ and $M/n\ge 20\%$.
Local context and social relationships are used to build the connection graph between emotion units. The local context of an emotion unit consists of the other emotion units in related microblogs, i.e., the comments on those microblogs and other microblogs posted by the same user. The social relationship context of an emotion unit includes forwarding and reply information, topic features, user relationship information and the like.
The local context and social relationship context of a specific emotion unit $T_i$ are shown in FIG. 5, which depicts these contexts for the target emotion unit $T_i$ in microblog $d$. $T_j$ and $T_k$ form the local context of $T_i$. The social relationship context of $T_i$ includes $d_1$, $d_2$ and $d_3$, where $d_1$ is a microblog that was replied to or forwarded, $d_2$ is a microblog sharing a topic with $T_i$, and $d_3$ is a microblog with a user relationship to $d$.
Constructing the emotion propagation graph requires determining the neighbor relation between two emotion units, so the propagation window is defined as the $n$ emotion units before and the $n$ emotion units after a given unit $T_i$. Two units have a co-occurrence relationship when they appear in the same window. Co-occurrence relationships and semantic rules can be used to infer the emotion polarity of unlabeled emotion units: the more often two emotion units appear in the same window, the higher their mutual information.
For example, FIG. 6 shows user relationships on Sina Weibo together with the posted microblogs and their emotion polarities. Users are linked by directed follow relationships; for example, User A is a follower of User B, but User B does not follow User A. The tag "@User A" indicates that the context is directed at User A. A topic tag enclosed in "#...#" indicates the topic of the microblog; for example, from the microblog "#popular glow# is actually a cutworm" it can be inferred that the microblog is about #popular glow#. Different users may also discuss the same topic, such as #Volkswagen# and #Revere Sound#. In FIG. 6, in the microblog "unparalleled Security, complimentary detail Process!" posted by User D, both "unparalleled" and "complimentary" express positive emotional tendencies.
Besides co-occurrence, the semantic relationship between emotion units can be determined directly from conjunctions. Emotion units joined by coordinating conjunctions are more likely to share the same emotion polarity and similar emotion intensity. For example, in the microblog "super bass, like sounds of nature #refocusing good sound#" posted by User C, the units (super bass) and (sounds of nature), connected by "like", both express a positive emotional tendency. Conversely, two emotion units joined by an adversative conjunction have opposite emotion polarities; for example, in the microblog "The experience is not very good, but I "porridge"!" posted by User A, the two clauses linked by "but" express opposite emotional tendencies.
In addition to conjunctions, the linguistic semantic rules in Table 1 can be used to determine semantic relationships between emotion units. Rules 1-11 are part-of-speech-based composition rules for emotion units, and rules 12-17 are six rules for composing emotion in conditional clauses. Beyond these, subjunctive mood, interrogative sentences, and irony or rhetorical questions must also be considered; these three sentence types require special handling and are covered by rules 18-20. The subjunctive mood indicates that the speaker is stating not a fact but a hypothesis, wish, doubt or guess. An interrogative sentence expresses no emotion even if it contains emotional indicators. Irony or sarcasm reverses the emotion polarity of the microblog.
Table 1: 20 emotion semantic composition rules for the Chinese microblog domain
After the relationship between two emotion units is obtained, the edge between $T_i$ and $T_j$ is defined as $p_{ij}=PMI(T_i,T_j)$,
where $n$ is the total number of emotion units in the corpus, $hits(T_i)$ and $hits(T_j)$ are the numbers of occurrences of $T_i$ and $T_j$ in the corpus, and $hits(T_i,T_j)$ is the number of times $T_i$ and $T_j$ appear in the same window in both the local and social contexts. The relationship matrix between emotion units is denoted $P_{n\times n}$.
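The PMI expression itself is not reproduced in this text, so the sketch below assumes the standard form $PMI(T_i,T_j)=\log_2\frac{n\cdot hits(T_i,T_j)}{hits(T_i)\,hits(T_j)}$ built from the variables just defined. For simplicity it counts co-occurrences only within a sliding window over each microblog's sequence of emotion units (local context); the social-context co-occurrences (replies, shared topics, user relations) are omitted, and units are represented by hypothetical hashable keys such as strings.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def cooccurrence_counts(
    unit_sequences: Sequence[Sequence[Hashable]], window: int
) -> Tuple[Counter, Counter]:
    """Count unit occurrences and unordered co-occurrences within `window` positions."""
    hits: Counter = Counter()
    pair_hits: Counter = Counter()
    for seq in unit_sequences:
        hits.update(seq)
        for i, u in enumerate(seq):
            # Pair u with the next `window` units; each unordered pair is counted once.
            for v in seq[i + 1 : i + 1 + window]:
                if u != v:
                    pair_hits[frozenset((u, v))] += 1
    return hits, pair_hits


def pmi_matrix(units: List[Hashable], hits: Counter, pair_hits: Counter) -> List[List[float]]:
    """Build P = [p_ij] with p_ij = PMI(T_i, T_j); pairs that never co-occur get 0."""
    n = sum(hits.values())  # total number of emotion-unit occurrences
    size = len(units)
    P = [[0.0] * size for _ in range(size)]
    for i, ui in enumerate(units):
        for j, uj in enumerate(units):
            c = pair_hits.get(frozenset((ui, uj)), 0)
            if i != j and c > 0:
                P[i][j] = math.log2(n * c / (hits[ui] * hits[uj]))
    return P


hits, pair_hits = cooccurrence_counts([["a", "b", "c"], ["a", "b"]], window=1)
P = pmi_matrix(["a", "b", "c"], hits, pair_hits)
print(P)
```

The resulting matrix is symmetric and can contain negative values for pairs that co-occur less often than chance; a later step that needs non-negative weights would have to clip them.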
Furthermore, to measure the connectivity of nodes (emotion units) in the emotion propagation graph $G$, the connection probability $p'_{ij}$ from $T_i$ to $T_j$ is defined in terms of $n_i$, the number of nodes connected to $T_i$. After the connection probabilities are computed, the standard degree centrality $H(T_i)$ of emotion unit $T_i$ is defined as follows:
where $n$ is the total number of emotion units and $n_i$ is the degree of $T_i$ in graph $G$. A node with higher standard degree centrality is more important in the network, so standard degree centrality is used to measure the connectivity of nodes in graph $G$.
The higher $H(T_i)$, the better connected $T_i$ is and the more it contributes to emotion label propagation. Therefore, all $T_i$ are sorted by $H(T_i)$ and the top $M$ emotion units are selected as the seed emotion unit set $T_s$, so that the seed emotion units provide reliable emotion sources for the emotion propagation graph $G$. The seed emotion unit selection algorithm is shown in Table 2.
Table 2: Seed emotion unit selection algorithm
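The formula for $H(T_i)$ is not reproduced in this text. A common normalization of degree centrality that uses exactly the quantities defined above is $H(T_i)=n_i/(n-1)$; the sketch assumes that form, treats any non-zero off-diagonal entry of $P$ as an edge, and keeps the top $M=\lceil 0.2\,n\rceil$ units so that $M/n\ge 20\%$. It is an illustration of the idea, not the contents of Table 2.

```python
import math
from typing import Hashable, List, Sequence


def degree_centrality(P: Sequence[Sequence[float]]) -> List[float]:
    """Normalised degree centrality n_i / (n - 1), counting non-zero off-diagonal weights as edges."""
    n = len(P)
    if n < 2:
        return [0.0] * n
    return [
        sum(1 for j, w in enumerate(row) if j != i and w != 0) / (n - 1)
        for i, row in enumerate(P)
    ]


def select_seeds(
    units: List[Hashable], P: Sequence[Sequence[float]], ratio: float = 0.2
) -> List[Hashable]:
    """Return the top ceil(ratio * n) units by centrality; ties keep input order."""
    H = degree_centrality(P)
    m = max(1, math.ceil(ratio * len(units)))
    # sorted() is stable even with reverse=True, so equal scores keep their order.
    order = sorted(range(len(units)), key=lambda i: H[i], reverse=True)
    return [units[i] for i in order[:m]]


P = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
print(degree_centrality(P))
print(select_seeds(["x", "y", "z"], P))
```

The selected seeds would then be labeled with the general emotion dictionary and checked manually, as described next.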
After the seed emotion units are obtained, they are labeled using a general emotion dictionary and manual annotation. Because existing emotion dictionaries do not contain Internet slang, popular Internet words such as "mountain village", "male" and "mulberry heart" are added to the general emotion dictionary. Manual annotation checks the labels of the seed emotion units and corrects inaccurate ones.
S3: Apply an emotion propagation algorithm to propagate emotion from the labeled seed emotion unit set $T_s$ to the $n-M$ unlabeled emotion units, and obtain the emotion score of each of these $n-M$ units.
Step S3 further includes:
S31: Denote the initial emotion score vector of the emotion units as $score(T)$:
$score(T)=[score(T_1),score(T_2),\dots,score(T_n)]$
and normalize $score(T)$:
where $T_{pos}$ is the set of positive emotion units and $T_{neg}$ is the set of negative emotion units;
S32: Prune graph $G$ by removing its weaker edges: keep the $k$ largest values in each row of the matrix $P'$ and set the rest to 0, so that the $k$ strongest connections of each emotion unit become its emotion neighbors, where $P'$ is the probability transition matrix;
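Step S32 can be sketched with NumPy as a row-wise top-k filter. This is a minimal sketch: ties at the k-th position are broken arbitrarily by `np.argsort`, and because the text does not say whether the surviving weights are re-normalized afterwards, the function leaves that to the caller.

```python
import numpy as np


def prune_top_k(P, k: int) -> np.ndarray:
    """Keep the k largest entries of each row of P and set all other entries to 0."""
    P = np.asarray(P, dtype=float)
    pruned = np.zeros_like(P)
    if k <= 0:
        return pruned
    # Column indices of the k largest values in every row.
    idx = np.argsort(P, axis=1)[:, -k:]
    rows = np.arange(P.shape[0])[:, None]
    pruned[rows, idx] = P[rows, idx]
    return pruned


P = np.array([[0.1, 0.5, 0.4],
              [0.2, 0.3, 0.5],
              [0.0, 0.0, 0.0]])
print(prune_top_k(P, k=2))
```

The explicit `k <= 0` guard matters because the slice `[:, -0:]` would otherwise select every column.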
S33: Define the probability transition matrix of emotion propagation as follows:
where $\beta\in[0,1]$ is an adaptive parameter, $A$ is a matrix in which some rows have all entries equal to $1/n$ and the remaining rows are 0, and $J$ is a matrix whose elements are all $1/n$. Matrix $A$ is added to ensure that $P'$ has no all-zero rows;
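The S33 formula itself is missing from this text. The descriptions of $A$ (rows of $1/n$ only where needed) and $J$ (all entries $1/n$) match the dangling-node and teleportation corrections used in PageRank, so the sketch below assumes the PageRank-style combination $\beta(P'+A)+(1-\beta)J$, with $P'$ obtained by row-normalizing a non-negative pruned weight matrix. Both the formula and the row-normalization step are assumptions, not the patent's stated definition.

```python
import numpy as np


def transition_matrix(W, beta: float = 0.85) -> np.ndarray:
    """Assumed PageRank-style transition matrix beta * (P' + A) + (1 - beta) * J.

    W must be non-negative (e.g. pruned PMI weights with negatives clipped to 0).
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    row_sums = W.sum(axis=1, keepdims=True)
    # P': row-normalised weights; all-zero rows stay zero here.
    P_prime = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    # A: 1/n in every entry of the rows where P' is all zero, 0 elsewhere.
    A = np.where(row_sums > 0, 0.0, 1.0 / n) * np.ones((1, n))
    # J: all entries 1/n.
    J = np.full((n, n), 1.0 / n)
    return beta * (P_prime + A) + (1.0 - beta) * J


P_tilde = transition_matrix([[0, 1, 1], [1, 0, 0], [0, 0, 0]], beta=0.8)
print(P_tilde.sum(axis=1))  # every row sums to 1
```

Under this assumption every row of the result is a probability distribution, which is what makes the iterative propagation in S34 well behaved.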
S34: Define the emotion label propagation process as follows:
where $score^{t+1}(T_i)$ is the emotion score of $T_i$ at iteration $t+1$, $\alpha\in[0,1]$ is a weight parameter, the matrix row used is row $i$ of the transition matrix defined in S33, and $score(T^t)$ is the emotion score vector of $T$ at iteration $t$. In each iteration, $score^{t+1}(T_i)$ is computed in the order $i=1,\dots,n$, and each newly obtained value immediately updates $score(T^t)$;
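The S34 update rule is also not reproduced. A common label-propagation form that fits the listed symbols is $score^{t+1}(T_i)=\alpha\,\tilde P_i\,score(T^t)+(1-\alpha)\,score^0(T_i)$, where $\tilde P$ here names the S33 transition matrix and $score^0$ the initial (seed) scores; the sketch assumes that form. As the text describes, units are updated in the order $i=1,\dots,n$ and each new value is written back immediately, i.e. a Gauss-Seidel-style sweep.

```python
import numpy as np


def propagate(P_tilde, score0, alpha: float = 0.8, iterations: int = 20) -> np.ndarray:
    """Assumed update: score[i] <- alpha * P_tilde[i] . score + (1 - alpha) * score0[i]."""
    P_tilde = np.asarray(P_tilde, dtype=float)
    score0 = np.asarray(score0, dtype=float)
    score = score0.copy()
    for _ in range(iterations):
        for i in range(len(score)):
            # In-place update: entries j < i already hold their values from this sweep.
            score[i] = alpha * (P_tilde[i] @ score) + (1.0 - alpha) * score0[i]
    return score


# Unit 0 is a positive seed (+1); unit 1 is unlabeled (0).
print(propagate([[0, 1], [1, 0]], [1.0, 0.0], alpha=0.5, iterations=1))
```

With a row-stochastic matrix and $\alpha<1$ each sweep is a contraction, so the scores settle after enough iterations; the unlabeled unit ends up with a positive score inherited from its seed neighbor.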
S35: When the iteration stops, normalize $score(T)$ as follows:
Table 3: Emotion propagation algorithm
S4: From the emotion score of each emotion unit, obtain the emotion score of each emotion word in the unit, yielding a microblog-specific emotion dictionary $L_{spe}$ containing explicit and implicit emotional features.
Step S4 further includes:
S41: Compute the emotion score of each emotional feature from the emotion scores of the emotion units:
where $n(w_i)$ is the frequency of word $w_i$ in the corpus, $score(N)$ is the negation score in emotion unit $T_i$, and $score(D)$ is the degree score in emotion unit $T_i$.
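The S41 formula is missing as well. One reading consistent with its symbols is that each occurrence of a word contributes the score of its emotion unit with the negation and degree effects divided out, averaged over the word's $n(w_i)$ occurrences. The sketch below assumes exactly that, with a negation score of -1 and hypothetical degree weights; neither value comes from the source.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

NEGATION_SCORE = -1.0                     # assumed score(N) when a negator is present
DEGREE_SCORES = {"很": 1.5, "非常": 2.0}   # hypothetical score(D) values

Unit = Tuple[Optional[str], Optional[str], str]  # (N, D, E)


def word_scores(units: List[Unit], unit_scores: List[float]) -> Dict[str, float]:
    """Average, over a word's occurrences, of score(T_i) / (score(N) * score(D))."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Counter = Counter()
    for (neg, deg, word), s in zip(units, unit_scores):
        n_score = NEGATION_SCORE if neg else 1.0
        d_score = DEGREE_SCORES.get(deg, 1.0)  # no degree adverb -> factor 1
        totals[word] += s / (n_score * d_score)
        counts[word] += 1
    return {w: totals[w] / counts[w] for w in totals}


units = [(None, None, "好"), ("不", "很", "好")]
print(word_scores(units, [0.6, -0.3]))
```

In this toy example the negated, intensified occurrence "不很好" with score -0.3 maps back to +0.2 for "好", which is then averaged with the plain occurrence.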
S42: Obtain the microblog-specific emotion dictionary $L_{spe}$ containing explicit and implicit emotional features.
S5: Classify the emotion of the microblog corpus according to the microblog-specific emotion dictionary $L_{spe}$.
Step S5 further includes: the microblog-specific emotion dictionary $L_{spe}$ is applied to emotion classification, taking into account explicit and implicit emotional features and the 20 semantic composition rules for the Chinese microblog domain: the emotion score $score(d_i)$ of microblog $d_i$ is obtained by summing the scores of all of its emotion units, and the final emotion polarity of the microblog is determined from this score.
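A minimal sketch of S5, assuming that a unit's score is the dictionary score of its evaluation word multiplied by negation and degree factors, that $score(d_i)$ is the sum over the microblog's units, and that totals inside a band around zero count as neutral. The 20 composition rules of Table 1 are not modelled, and the negation and degree values and the toy dictionary are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

NEGATION_SCORE = -1.0                     # assumed score(N)
DEGREE_SCORES = {"很": 1.5, "非常": 2.0}   # hypothetical score(D) values

Unit = Tuple[Optional[str], Optional[str], str]  # (N, D, E)


def classify(
    units: List[Unit], lexicon: Dict[str, float], neutral_band: float = 0.0
) -> Tuple[str, float]:
    """Sum unit scores into score(d_i) and map the total to a polarity label."""
    total = 0.0
    for neg, deg, word in units:
        s = lexicon.get(word, 0.0)  # words missing from L_spe contribute nothing
        if neg:
            s *= NEGATION_SCORE
        s *= DEGREE_SCORES.get(deg, 1.0)
        total += s
    if total > neutral_band:
        return "positive", total
    if total < -neutral_band:
        return "negative", total
    return "neutral", total


lexicon = {"好": 0.4, "坑": -0.7}  # toy dictionary entries
print(classify([("不", "很", "好")], lexicon))
```

Widening `neutral_band` trades recall on weak opinions for fewer false positives or negatives; the three-way output matches the positive/negative/neutral setting used for the UCI data.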
Table 4: Parameter settings for the three emotion dictionaries HowNet, NTUSD and SXU on the UCI and Weibo data sets
Several parameters must be set during model learning. Table 4 lists the parameter set of the emotion unit context propagation framework, $\Phi=\{H,M,k,T,\alpha,\beta\}$. $H$ is the window size in the seed emotion unit selection algorithm, which determines the context of a given unit; $M$ is the number of seed emotion units; $k$ is the number of emotion neighbors retained for each unit in the emotion propagation graph, which affects the number of iterations and the stability of the computation; and $T$ is the number of iterations. The convergence of $score(T)$ depends directly on the emotion propagation transition matrix. $\alpha$ is the update rate and $\beta$ is the weight parameter. Various parameter combinations were tested, and the best combination is shown in Table 4.
FIGS. 7 and 8 show the agreement rates of the emotion dictionaries obtained under 5-fold cross-validation. The emotion polarity and intensity of the explicit and implicit emotional features are obtained through emotion label propagation. The agreement rate is the percentage of emotional features whose polarities agree between two emotion dictionaries. Because the dictionaries generated from different data folds differ in size, the agreement-rate confusion matrix is asymmetric. The emotion dictionaries generated from the five folds are combined into the final emotion dictionary, and a voting method determines the final emotional tendency of each emotional feature.
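The asymmetry follows if the agreement rate of dictionary A with dictionary B is computed over A's entries: the denominator is |A|, so swapping the dictionaries changes the result when their sizes differ. The sketch below assumes that definition and a simple majority vote for merging the fold dictionaries; its tie-breaking (first polarity seen wins) is an arbitrary choice, not taken from the source.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable

Lexicon = Dict[str, str]  # emotional feature -> polarity label


def agreement_rate(a: Lexicon, b: Lexicon) -> float:
    """Share of a's features that appear in b with the same polarity."""
    if not a:
        return 0.0
    same = sum(1 for w, p in a.items() if b.get(w) == p)
    return same / len(a)


def merge_by_vote(dicts: Iterable[Lexicon]) -> Lexicon:
    """Assign each feature its most frequent polarity across the fold dictionaries."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for d in dicts:
        for w, p in d.items():
            votes[w][p] += 1
    # most_common orders equal counts by first insertion, i.e. the polarity seen first.
    return {w: c.most_common(1)[0][0] for w, c in votes.items()}


fold_a = {"好": "pos", "坏": "neg"}
fold_b = {"好": "pos"}
print(agreement_rate(fold_a, fold_b), agreement_rate(fold_b, fold_a))
```

Computing every ordered pair of the five fold dictionaries with this function yields a 5x5 matrix that is generally not symmetric, consistent with the description above.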
Table 5: Parts of speech and their percentages in the generated emotion dictionaries
Table 5 details the emotion dictionaries generated on the UCI and Weibo data sets. The UCI emotion dictionary contains 11428 emotional features, of which 8665 are explicit and 2763 implicit; the Weibo emotion dictionary contains 24415 emotional features, of which 20189 are explicit and 4226 implicit. Whereas general emotion dictionaries consist mostly of verbs and adjectives, noun emotional features account for 2763 (24.2%) and 4226 (17.5%) of the generated UCI and Weibo microblog-specific emotion dictionaries, respectively. This indicates that implicit emotional features are major emotional indicators in microblog emotion expression. A small number of adverbs, such as "cluttered", "happy" and "parsimonious", also express emotional tendencies: adverb features account for 185 (1.6%) of the UCI emotion dictionary and 308 (1.3%) of the Weibo emotion dictionary. In both dictionaries, negative emotional features outnumber positive ones: negative and positive features account for 6322 (55.2%) and 5106 (44.8%) of the UCI emotion dictionary and 12935 (53.0%) and 11480 (47.0%) of the Weibo emotion dictionary, respectively.
Tables 6 and 7 show the emotion classification results of the proposed method and the baseline systems on the UCI and Weibo data sets, and FIGS. 11 and 12 compare their overall accuracy. The accuracy of the SUCPF method is higher than that of the other five baseline methods. For example, with the SXU dictionary, SUCPF improves accuracy by 7.1%, 4.8%, 5.9%, 5.4%, 4.5% and 6.2% on the UCI data set and by 14.5%, 2.9%, 4.2%, 6.4%, 5.3% and 5.1% on the Weibo data set, respectively. This indicates that SUCPF produces a better emotion dictionary and better emotion classification results, mainly because SUCPF is a semi-supervised framework that exploits existing emotion dictionaries and manual annotations, and it uses local context, topic features and context relationships to extract explicit and implicit emotional features.
Table 6. Emotion classification results with three general emotion dictionaries on the UCI data set
Table 7. Emotion classification results with three general emotion dictionaries on the Weibo data set
Figs. 9 and 10 show 50 nominal emotion features extracted from the two microblog data sets. These expressions strongly imply positive or negative emotions and opinions. The two data sets share some implicit emotion features, such as "bad eggs", "cheating" and "mushrooms". The invention can also discover new emotion features, including explicit ones such as "bad eggs", "shenjingbing" ("psycho") and "weak people", and implicit ones such as "fire fighters", "tide men" and "industry sickness". This indicates that users of social media such as microblogs habitually use implicit features to express their sentiment about products, services and the like indirectly. The invention therefore adapts well to the domain and can identify both explicit and implicit emotion features.
In conclusion, the invention combines rules and statistics to construct a microblog-specific emotion dictionary containing both explicit and implicit emotion features. Compared with similar representative methods, it achieves higher overall accuracy and greater stability, effectively constructs a domain-specific emotion dictionary, and accurately extracts explicit and implicit emotion features.
The accompanying drawings and the detailed description are provided for a further understanding of the invention. The method of the invention is not limited to the examples described in the specific embodiments; other embodiments derived by those skilled in the art from the method and idea of the invention also fall within its scope of technical innovation. This summary should not be construed as limiting the invention.
Claims (3)
1. A specific emotion dictionary generation method for Chinese microblog emotion classification, characterized by comprising the following steps:
S1: preprocess the microblog corpus $D=\{d_1,d_2,\ldots,d_l\}$, extract a plurality of emotion units $T_i$ through lexical analysis and syntactic analysis, and take these emotion units as the emotion unit set $T=\{T_1,T_2,\ldots,T_n\}$, where i and n are positive integers with $1\le i\le n$; each emotion unit $T_i$ is composed of N, D, E and P, where N is a negation indicator, D is a degree adverb, E is an evaluation word and P is the emotional polarity;
S2: construct an emotion propagation graph $G=(V,E,W)$ based on the emotion unit set T, where V is the set of emotion units, E is the set of edges and W is the weight matrix between emotion units; calculate the standard centrality $H(T_i)$ of each emotion unit $T_i$, sort the emotion units by $H(T_i)$, select the top M as the seed emotion unit set $T_s$, and label the seed emotion units using a general emotion dictionary and manual annotation, where $M<n$, $M/n\ge 20\%$, and M and n are positive integers;
S3: apply an emotion propagation algorithm to propagate emotion from the labeled seed emotion unit set $T_s$ to the n−M unlabeled emotion units, and obtain the emotion score of each of these n−M emotion units;
Step S3 further includes:
S31: denote the initial emotion score vector of the emotion units as score(T):
$$\mathrm{score}(T)=[\mathrm{score}(T_1),\mathrm{score}(T_2),\ldots,\mathrm{score}(T_n)]$$
and normalize score(T):
where $T_{pos}$ is the set of positive emotion units and $T_{neg}$ is the set of negative emotion units;
S32: prune graph G by removing its weaker edges: keep the k largest values in each row of the matrix P′ and set the rest to 0, so that the k most strongly connected units of each emotion unit are taken as its emotion neighbors, where P′ is a probability transition matrix;
where n is the total number of emotion units in the corpus, $n_i$ is the degree of $T_i$ in graph G, $hits(T_i)$ and $hits(T_j)$ are the numbers of occurrences of $T_i$ and $T_j$ in the corpus, $hits(T_i,T_j)$ is the frequency with which $T_i$ and $T_j$ occur in the same window under the local and social context, and the relationship matrix among the emotion units is denoted $P_{ij}$;
S33: define the probability transition matrix of emotion propagation as follows:
where $\beta\in[0,1]$ is an adaptive parameter; A is a matrix whose rows corresponding to all-zero rows of P′ are 1/n and whose other rows are 0, added to ensure that P′ has no all-zero rows; and J is a matrix whose elements are all 1/n;
S34: the emotion label propagation process is defined as follows:
where $\mathrm{score}(T_i^{t+1})$ is the emotion score of $T_i$ at iteration t+1, $\alpha\in[0,1]$ is a weight parameter, the row vector in the formula is the i-th row of the probability transition matrix, and $\mathrm{score}(T^t)$ is the emotion score vector of T at iteration t. In each iteration, $\mathrm{score}(T_i^{t+1})$ is computed in the order $i=1,\ldots,n$, and each newly computed value immediately replaces the corresponding entry of the score vector;
S35: when the iteration stops, normalize score(T) as in S31;
S4: obtain the emotion score of each emotion word in each emotion unit from the emotion score of that unit, yielding a microblog-specific emotion dictionary $L_{spe}$ containing explicit and implicit emotion features;
Step S4 further includes:
S41: calculate the emotion score of each emotion feature from the emotion scores of the emotion units:
where $n(w_i)$ is the frequency of word $w_i$ in the corpus, score(N) is the negation score in emotion unit $T_i$, and score(D) is the degree score in emotion unit $T_i$;
S42: obtain the microblog-specific emotion dictionary $L_{spe}$ containing explicit and implicit emotion features;
S5: classify the emotions of the microblog corpus according to the microblog-specific emotion dictionary $L_{spe}$;
wherein explicit emotion features are emotion words, phrases and idioms that directly express emotion toward entities or aspects, and implicit emotion features are features that express positive or negative emotion without an obvious emotion indicator.
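Steps S2 and S3 can be illustrated with a minimal NumPy sketch. The text does not reproduce the formulas for the standard centrality, $P_{ij}$, the S33 transition matrix, the S34 update or the normalization, so the following are assumptions: $H(T_i)$ is read as normalized degree centrality; the S33 matrix is built in the PageRank ("Google matrix") style as β(P̂ + A) + (1 − β)J, where P̂ is the row-normalized pruned matrix; the S34 update is taken as α times the i-th matrix row applied to the current scores plus (1 − α) times the initial score; seeds are labeled +1 or −1; and S35 rescales by the largest absolute score. All function names are hypothetical.

```python
import math

import numpy as np


def select_seeds(W: np.ndarray, ratio: float = 0.2) -> np.ndarray:
    """S2 sketch: rank units by normalized degree centrality, keep the top M."""
    n = W.shape[0]
    adj = (W > 0).astype(float)
    np.fill_diagonal(adj, 0.0)                      # ignore self-loops
    h = adj.sum(axis=1) / (n - 1)                   # assumed H(T_i) = deg(T_i) / (n - 1)
    m = min(max(1, math.ceil(ratio * n)), n - 1)    # M/n >= ratio and M < n
    return np.argsort(-h, kind="stable")[:m]        # ties keep the original order


def prune_top_k(P: np.ndarray, k: int) -> np.ndarray:
    """S32: keep the k largest entries of each row and set the rest to 0."""
    n = P.shape[1]
    if k <= 0:
        return np.zeros(P.shape)
    if k >= n:
        return P.astype(float)
    # after argpartition, the last k columns hold the indices of the k largest values
    idx = np.argpartition(P, n - k, axis=1)[:, n - k:]
    rows = np.arange(P.shape[0])[:, None]
    pruned = np.zeros(P.shape)
    pruned[rows, idx] = P[rows, idx]
    return pruned


def transition_matrix(P_pruned: np.ndarray, beta: float = 0.85) -> np.ndarray:
    """S33 sketch (assumed PageRank-style form): beta * (P_hat + A) + (1 - beta) * J."""
    n = P_pruned.shape[0]
    row_sums = P_pruned.sum(axis=1, keepdims=True)
    # row-normalize non-empty rows; all-zero rows stay zero here
    P_hat = np.divide(P_pruned, row_sums, out=np.zeros(P_pruned.shape), where=row_sums > 0)
    A = np.where(row_sums > 0, 0.0, 1.0 / n) * np.ones((n, n))  # 1/n only in empty rows
    J = np.full((n, n), 1.0 / n)
    return beta * (P_hat + A) + (1.0 - beta) * J


def propagate(P2: np.ndarray, seeds: dict, alpha: float = 0.8,
              tol: float = 1e-6, max_iter: int = 100) -> np.ndarray:
    """S31, S34, S35 sketch: assumed update alpha * P2[i] @ score + (1 - alpha) * score0[i]."""
    n = P2.shape[0]
    score0 = np.zeros(n)
    for i, label in seeds.items():                  # +1 positive seed, -1 negative seed
        score0[i] = label
    score = score0.copy()
    for _ in range(max_iter):
        previous = score.copy()
        for i in range(n):                          # in-place: later rows see the new score[i]
            score[i] = alpha * P2[i] @ score + (1.0 - alpha) * score0[i]
        if np.abs(score - previous).max() < tol:
            break
    peak = np.abs(score).max()                      # assumed normalization to [-1, 1]
    return score / peak if peak > 0 else score
```

In use, the steps would chain as `propagate(transition_matrix(prune_top_k(P, k)), seeds)`, with `seeds` mapping the indices returned by `select_seeds` to their +1/−1 labels. The in-place sweep follows the i = 1, …, n ordering described in S34; because every row of the assumed matrix sums to 1, the sweep converges for α < 1.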
2. The specific emotion dictionary generation method for Chinese microblog emotion classification according to claim 1, wherein step S1 includes:
S11: use rule-based methods to filter links, stop words, repeated words and noise out of the microblog data set $D=\{d_1,d_2,\ldots,d_l\}$;
S12: perform lexical analysis on D with a part-of-speech tagging tool, and syntactic analysis on D with a dependency parsing tool;
S13: extract the adjectives, verbs, adverbs and nouns in D as candidate emotion features $W=\{w_1,w_2,\ldots,w_n\}$, and filter out low-frequency words;
S14: extract the negation-modification and degree-modification relations from the dependency parsing results;
S15: take the extracted candidate emotion features, together with their negation-modification and degree-modification relations, as emotion units $T_i$; the emotion units form the emotion unit set $T=\{T_1,T_2,\ldots,T_n\}$.
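Steps S11 and S13 to S15 can be sketched as follows, assuming the posts are already word-segmented, POS-tagged and dependency-parsed (S12 is left to whichever tagger and parser are used). The POS tag set (a, v, d, n), the "ADV" dependency label, the @-mention noise rule, the negation and degree lexicons, the `min_count` threshold and all function names are hypothetical choices for illustration, not part of the claims.

```python
import re
from collections import Counter
from typing import NamedTuple

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\S+")           # hypothetical noise rule: @user mentions
NEGATIONS = {"不", "没有"}                   # hypothetical negation lexicon
DEGREES = {"很", "非常"}                     # hypothetical degree-adverb lexicon
CANDIDATE_POS = {"a", "v", "d", "n"}        # adjective, verb, adverb, noun


class Token(NamedTuple):
    word: str
    pos: str    # POS tag produced in S12
    head: int   # index of the governing token, -1 for the root
    rel: str    # dependency label; "ADV" (adverbial) is an assumed label scheme


def clean_post(text: str, stop_words: set[str]) -> list[str]:
    """S11 sketch: drop links, mentions and stop words, collapse immediate repeats.
    Assumes the post is already word-segmented with spaces."""
    text = MENTION_RE.sub(" ", URL_RE.sub(" ", text))
    cleaned: list[str] = []
    for tok in text.split():
        if tok in stop_words:
            continue
        if not cleaned or cleaned[-1] != tok:   # "很 很" -> "很"
            cleaned.append(tok)
    return cleaned


def extract_units(sentences: list[list[Token]], min_count: int = 2):
    """S13-S15 sketch: build (negation, degree, evaluation word) units."""
    freq = Counter(t.word for s in sentences for t in s if t.pos in CANDIDATE_POS)
    units = []
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok.pos not in CANDIDATE_POS or freq[tok.word] < min_count:
                continue                        # S13: not a candidate, or too rare
            if tok.word in NEGATIONS or tok.word in DEGREES:
                continue                        # modifiers attach to units instead
            neg = deg = None
            for mod in sent:                    # S14: modifiers governed by tok
                if mod.head == i and mod.rel == "ADV":
                    if mod.word in NEGATIONS:
                        neg = mod.word
                    elif mod.word in DEGREES:
                        deg = mod.word
            units.append((neg, deg, tok.word))  # S15: one emotion unit
    return units
```

The emotional polarity P of each unit is not set here; in the claims it comes from seed labeling (S2) and propagation (S3).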
3. The specific emotion dictionary generation method for Chinese microblog emotion classification according to claim 1, wherein step S5 further comprises:
applying the microblog-specific emotion dictionary $L_{spe}$ to emotion classification, taking into account the explicit and implicit emotion features and 20 semantic composition rules for the Chinese microblog domain; the emotion score $\mathrm{score}(d_i)$ of microblog $d_i$ is obtained by summing the scores of all its emotion units, and the final emotion polarity is determined according to the emotion mark of the microblog.
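The scoring side (S41 and claim 3) can be sketched as follows. The S41 formula is not reproduced in the text, so `build_lexicon` uses a hypothetical reading: each unit score is divided by score(N)·score(D) (−1 for a negation, a degree weight for the adverb) to recover the word's own score, and the results are averaged over the units containing the word, standing in for $n(w_i)$. `classify_post` sums unit scores as claim 3 describes but does not implement the 20 semantic composition rules; the lexicons, the weights and the "neutral" label for a zero score are assumptions.

```python
from collections import defaultdict

DEGREE_WEIGHTS = {"很": 1.5, "非常": 2.0}   # hypothetical degree weights, score(D)
NEGATIONS = {"不", "没有"}                  # hypothetical negation lexicon


def modifier_factor(neg, deg) -> float:
    """score(N) * score(D) for one unit: -1 if negated, times the degree weight."""
    factor = -1.0 if neg in NEGATIONS else 1.0
    return factor * DEGREE_WEIGHTS.get(deg, 1.0)


def build_lexicon(scored_units) -> dict:
    """S41 sketch: average each word's de-modified unit scores.
    scored_units: iterable of ((neg, deg, word), unit_score) pairs."""
    totals, counts = defaultdict(float), defaultdict(int)
    for (neg, deg, word), unit_score in scored_units:
        totals[word] += unit_score / modifier_factor(neg, deg)
        counts[word] += 1
    return {w: totals[w] / counts[w] for w in totals}


def classify_post(units, lexicon: dict):
    """Claim 3 sketch: score(d_i) is the sum of its unit scores; the sign gives polarity."""
    total = sum(modifier_factor(neg, deg) * lexicon.get(word, 0.0)
                for neg, deg, word in units)
    if total > 0:
        return total, "positive"
    if total < 0:
        return total, "negative"
    return total, "neutral"
```

For instance, with a lexicon score of 0.5 for "好", the units "很好" and "不好" contribute 0.75 and −0.5, so the post sums to 0.25 and is labeled positive.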
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811145088.0A CN109376239B (en) | 2018-09-29 | 2018-09-29 | Specific emotion dictionary generation method for Chinese microblog emotion classification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109376239A CN109376239A (en) | 2019-02-22 |
CN109376239B CN109376239B (en) | 2021-07-30 |
Family
ID=65402516
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811145088.0A Active CN109376239B (en) | 2018-09-29 | 2018-09-29 | Specific emotion dictionary generation method for Chinese microblog emotion classification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109376239B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110083726B (en) * | 2019-03-11 | 2021-10-22 | 北京比速信息科技有限公司 | Destination image perception method based on UGC picture data |
CN110032646B (en) * | 2019-05-08 | 2022-12-30 | 山西财经大学 | Cross-domain text emotion classification method based on multi-source domain adaptive joint learning |
CN110489553B (en) * | 2019-07-26 | 2022-07-05 | 湖南大学 | Multi-source information fusion-based emotion classification method |
CN110489522B (en) * | 2019-07-26 | 2022-04-12 | 湖南大学 | Emotional dictionary construction method based on user score |
CN111523300B (en) * | 2020-04-14 | 2021-03-05 | 北京精准沟通传媒科技股份有限公司 | Vehicle comprehensive evaluation method and device and electronic equipment |
CN112000804B (en) * | 2020-08-18 | 2022-08-02 | 安徽理工大学 | Microblog hot topic user group emotion tendentiousness analysis method |
CN112632272B (en) * | 2020-10-20 | 2022-07-19 | 浙江工业大学 | Microblog emotion classification method and system based on syntactic analysis |
CN113326694B (en) * | 2021-05-18 | 2022-09-30 | 西华大学 | Implicit emotion dictionary generation method based on emotion propagation |
CN115080689A (en) * | 2022-06-15 | 2022-09-20 | 昆明理工大学 | Label association fused hidden space data enhanced multi-label text classification method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103544242A (en) * | 2013-09-29 | 2014-01-29 | 广东工业大学 | Microblog-oriented emotion entity searching system |
WO2015053607A1 (en) * | 2013-10-10 | 2015-04-16 | Mimos Berhad | System and method for semantic-level sentiment analysis of text |
CN106372058A (en) * | 2016-08-29 | 2017-02-01 | 中译语通科技(北京)有限公司 | Short text emotion factor extraction method and device based on deep learning |
CN106503049A (en) * | 2016-09-22 | 2017-03-15 | 南京理工大学 | A kind of microblog emotional sorting technique for merging multiple affection resources based on SVM |
CN106547842A (en) * | 2016-10-14 | 2017-03-29 | 华东师范大学 | A kind of method that location-based emotion is visualized on virtual earth platform |
US9633007B1 (en) * | 2016-03-24 | 2017-04-25 | Xerox Corporation | Loose term-centric representation for term classification in aspect-based sentiment analysis |
CN106776554A (en) * | 2016-12-09 | 2017-05-31 | 厦门大学 | A kind of microblog emotional Forecasting Methodology based on the study of multi-modal hypergraph |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110320387A1 (en) * | 2010-06-28 | 2011-12-29 | International Business Machines Corporation | Graph-based transfer learning |
Non-Patent Citations (5)
Title |
---|
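Large scale and parallel sentiment analysis based on Label Propagation in Twitter Data; Yibing Yang et al.; 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications; 2018-09-06; pp. 1791-1798 * |
Semi-supervised polarity lexicon induction; Delip Rao et al.; EACL '09: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics; 2009-03-31; pp. 675-682 * |
Classification-based method for extracting new emotion words from microblogs and feature analysis (基于分类的微博新情感词抽取方法和特征分析); Liu Dexi et al.; Chinese Journal of Computers (计算机学报); 2018-07-31; pp. 1574-1597 * |
Research on review mining technology based on sentiment analysis (基于情感分析的评论挖掘技术研究); Lei Xiaohui; China Master's Theses Full-text Database, Information Science and Technology; 2018-04-15; I138-3747 * |
Research on feature selection methods for sentiment classification (情感分类特征选择方法研究); Wang Zhihao; China Master's Theses Full-text Database, Information Science and Technology; 2014 * |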
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |