CN109376239B

CN109376239B - Specific emotion dictionary generation method for Chinese microblog emotion classification

Info

Publication number: CN109376239B
Application number: CN201811145088.0A
Authority: CN
Inventors: 赵传君; 王素格; 李德玉
Original assignee: Shanxi University
Current assignee: Shanxi University
Priority date: 2018-09-29
Filing date: 2018-09-29
Publication date: 2021-07-30
Anticipated expiration: 2038-09-29
Also published as: CN109376239A

Abstract

The invention discloses a method for generating a specific emotion dictionary for Chinese microblog emotion classification. The microblog corpus is preprocessed and emotion units are extracted; an emotion propagation graph is built over the emotion units, the standard centrality of each unit is computed, and the top-ranked units are selected as seed emotion units and labeled using a general emotion dictionary and manual annotation. Finally, an emotion propagation algorithm propagates emotion from the labeled seed emotion unit set to the unlabeled emotion units and obtains the emotion labels of all emotion words in all emotion units. The result is a microblog-specific emotion dictionary containing both explicit and implicit emotional features, which is used to classify the emotions of the microblog corpus. Compared with similar representative methods, the method achieves higher overall accuracy and greater stability, effectively constructs a domain-specific emotion dictionary, and accurately extracts explicit and implicit emotional features.

Description

Specific emotion dictionary generation method for Chinese microblog emotion classification

Technical Field

The invention relates to the field of computer social media text sentiment analysis, and provides a method for generating a specific sentiment dictionary for Chinese microblog sentiment classification.

Background

The construction of emotion dictionaries is a basic and important part of sentiment analysis, because words and phrases are the essential elements that express positive or negative emotion. Microblogs have become a popular mode of communication on the Internet: individual users can freely, conveniently, and instantly express their opinions on products and public events through media such as Sina Weibo. Because microblogs are extremely short and lexically rich, their vector space representations are very sparse, which makes dictionary-based methods better suited to microblog sentiment analysis. However, different domains use different emotion vocabularies, and the same word may express different views in different domains, which leads to diverse emotional expression and semantic variability of emotion words. Because the emotion of a word or phrase often depends on the domain, and manually labeling domain emotion words against a general emotion dictionary is time-consuming and labor-intensive, a general emotion dictionary cannot classify domain-specific emotion well and does not meet the requirements of domain-specific emotion classification.

Prior work has proposed linguistic-rule-based, corpus-based, and dictionary-based methods for automatically constructing domain-specific emotion dictionaries. However, microblog grammar takes many forms and users often ignore grammatical rules when writing posts, so rule-based methods cannot cover all microblog cases; corpus-based methods depend heavily on the size of the corpus; and dictionary-based methods depend closely on the quality of the general emotion dictionary. None of the three is therefore well suited to building an emotion dictionary for the microblog domain.

In addition, conventional sentiment analysis research has focused on explicit emotional features (Explicit Sentiment Features) such as "good", "dislike", and "like". Their defining property is the presence of a clear emotional indicator: emotion words, phrases, and idioms that directly express emotion toward an entity or aspect are called explicit emotional features. In practice, however, many users express emotion indirectly through figurative language or statements of fact. Implicit emotional features (Implicit Sentiment Features) are features that express positive or negative emotion without an explicit emotional indicator; they typically state a fact or express an emotion indirectly. Fig. 3 shows four microblogs containing the implicit emotional features "cutworm", "flood and wilderness", "navy", and "five-mao special effect". Identifying implicit emotional features remains challenging precisely because they carry no emotional indicator at all.

Disclosure of Invention

The invention aims to construct a microblog-specific emotion dictionary by extracting explicit and implicit emotional features, and to classify microblog emotion according to the emotion polarity (positive, negative, or neutral) of these features.

To this end, targeting both the construction of microblog-specific emotion vocabularies and the implicit emotional features described above, the invention provides a method for generating a specific emotion dictionary for Chinese microblog emotion classification, comprising the following steps:

S1, preprocess the microblog corpus D = {d₁, d₂, …, d_l}, extract emotion units T_i through lexical and syntactic analysis, and take these emotion units as the emotion unit set T = {T₁, T₂, …, T_n}, where i and n are positive integers and 1 ≤ i ≤ n. Each emotion unit is defined as T_i = (N, D, E, P), where N is a negation indicator, D is a degree adverb, E is an evaluation word, and P is the emotion polarity;

S2, construct an emotion propagation graph G = (V, E, W) based on the emotion unit set T, where V is the set of emotion units, E is the set of edges, and W is the weight matrix between emotion units; compute the standard centrality H(T_i) of each emotion unit T_i, sort the emotion units in descending order of H(T_i), select the top M as the seed emotion unit set T^s, and label the seed emotion units using a general emotion dictionary and manual annotation, where M and n are positive integers, M < n, and M/n ≥ 20%;

S3, use an emotion propagation algorithm to propagate emotion from the labeled seed emotion unit set T^s to the n - M unlabeled emotion units, and obtain the emotion score of each of these n - M emotion units;

S4, derive the emotion score of each emotion word from the emotion scores of the emotion units, yielding a microblog-specific emotion dictionary L_spe that contains explicit and implicit emotional features;

S5, classify the emotions of the microblog corpus according to the microblog-specific emotion dictionary L_spe.

In the method provided by this embodiment, the microblog corpus is preprocessed and emotion units are extracted; an emotion propagation graph is constructed from the emotion units and the standard centrality of each unit is computed; the units are sorted by standard centrality, and the top M are selected as seed emotion units and labeled using a general emotion dictionary and manual annotation. Finally, an emotion propagation algorithm propagates emotion from the labeled seed emotion unit set to the unlabeled emotion units and obtains the emotion labels of all emotion words in all emotion units, yielding a microblog-specific emotion dictionary that contains explicit and implicit emotional features. The emotions of the microblog corpus are then classified according to this dictionary.

According to an embodiment of the present invention, step S1 includes:

S11, use rule-based filtering to remove links, stop words, repeated words, and other noise from the microblog data set D = {d₁, d₂, …, d_l};

S12, perform lexical analysis on D with a part-of-speech tagging tool and syntactic analysis on D with a dependency parsing tool;

S13, extract the adjectives, verbs, adverbs, and nouns in D as candidate emotional features W = {w₁, w₂, …, w_n}, and filter out low-frequency words;

S14, extract the negation modification relations and degree modification relations from the dependency parsing results;

S15, take the extracted candidate emotional features, together with their negation and degree modification relations, as emotion units T_i; the emotion units form the emotion unit set T = {T₁, T₂, …, T_n}.
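Steps S13 to S15 can be illustrated with a small Python sketch that builds emotion units T_i = (N, D, E, P) from an already-parsed sentence. The input format (parallel word and POS lists plus one (head, relation) arc per word), the LTP-style tags `a`/`v`/`d`/`n` and relation label `ADV`, and the tiny negation and degree word lists are all assumptions for illustration, not the patent's resources; a real pipeline would take them from the parser output and full lexicons.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical lexicons for illustration only; the patent's actual lists are not given.
NEGATIONS = {"不", "没", "没有"}
DEGREE_ADVERBS = {"很", "非常", "完全", "相当"}
CANDIDATE_POS = {"a", "v", "d", "n"}  # adjective, verb, adverb, noun (LTP-style tags, assumed)


@dataclass
class EmotionUnit:
    negation: Optional[str]         # N: negation indicator modifying E, if any
    degree: Optional[str]           # D: degree adverb modifying E, if any
    evaluation: str                 # E: candidate emotion (evaluation) word
    polarity: Optional[str] = None  # P: assigned later by labeling or propagation


def extract_units(words: List[str], pos: List[str],
                  arcs: List[Tuple[int, str]], frequent: Set[str]) -> List[EmotionUnit]:
    """arcs[j] = (head index of word j, relation label); head -1 marks the root."""
    units = []
    for i, (word, tag) in enumerate(zip(words, pos)):
        # S13: keep adjectives/verbs/adverbs/nouns that survived low-frequency filtering,
        # but do not treat negation or degree words themselves as candidates.
        if tag not in CANDIDATE_POS or word not in frequent:
            continue
        if word in NEGATIONS or word in DEGREE_ADVERBS:
            continue
        neg = deg = None
        # S14: look for negation / degree modifiers attached to word i by an adverbial arc.
        for j, (head, rel) in enumerate(arcs):
            if head == i and rel == "ADV":
                if words[j] in NEGATIONS:
                    neg = words[j]
                elif words[j] in DEGREE_ADVERBS:
                    deg = words[j]
        units.append(EmotionUnit(neg, deg, word))  # S15: one unit per candidate word
    return units


# "服务不很好" (the service is not very good), parsed by hand for this example.
units = extract_units(
    words=["服务", "不", "很", "好"],
    pos=["n", "d", "d", "a"],
    arcs=[(3, "SBV"), (3, "ADV"), (3, "ADV"), (-1, "HED")],
    frequent={"服务", "好"},
)
```

Keeping the polarity field empty at extraction time mirrors the method's flow, where P is only filled in by seed labeling (S2) or propagation (S3).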

According to one embodiment of the invention, the standard centrality H(T_i) is computed on the emotion propagation graph G, where:

n is the total number of emotion units in the corpus, n_i is the degree of T_i in graph G, hits(T_i) and hits(T_j) are the numbers of occurrences of T_i and T_j in the corpus, hits(T_i, T_j) is the number of times T_i and T_j occur in the same window in the local and social contexts, and the relationship matrix among the emotion units is denoted P_ij.

According to an embodiment of the present invention, step S3 further includes:

S31, denote the initial emotion score vector of the emotion units as score(T):

score(T) = [score(T₁), score(T₂), …, score(T_n)]

and normalize score(T), where T_pos is the set of positive emotion units and T_neg is the set of negative emotion units;

S32, prune graph G by removing its weaker edges: in each row of the probability transition matrix P', keep the k largest values and set the rest to 0, so that the k most strongly connected units of each emotion unit become its emotion neighbors;

S33, define the probability transition matrix of emotion propagation, where β ∈ [0, 1] is an adaptive parameter, A is a matrix in which some rows have all entries equal to 1/n and the remaining rows are 0, and J is a matrix whose elements are all 1/n. Matrix A is added to ensure that the matrix P' has no all-zero rows;
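Steps S32 and S33 can be sketched in Python as follows. The S33 formula itself is not reproduced in this text, so the combination used here, β·(P' + A) + (1 - β)·J, is an assumption modeled on the PageRank "Google matrix", which matches the roles described for A (filling all-zero rows with 1/n) and J (all entries 1/n). Renormalizing each pruned row so that it sums to 1 is also an assumption.

```python
from typing import List

Matrix = List[List[float]]


def prune_top_k(p: Matrix, k: int) -> Matrix:
    """S32: keep the k largest entries in each row of P' and set the rest to 0."""
    pruned = []
    for row in p:
        keep = set(sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k])
        pruned.append([v if j in keep and v > 0 else 0.0 for j, v in enumerate(row)])
    return pruned


def propagation_matrix(p: Matrix, beta: float) -> Matrix:
    """S33 (assumed PageRank-style form): beta * (P' + A) + (1 - beta) * J."""
    n = len(p)
    out = []
    for row in p:
        total = sum(row)
        if total == 0:
            base = [1.0 / n] * n             # row of A: an all-zero row becomes uniform
        else:
            base = [v / total for v in row]  # assumed renormalisation after pruning
        out.append([beta * b + (1 - beta) / n for b in base])  # every entry of J is 1/n
    return out


p_prime = [
    [0.0, 0.6, 0.3, 0.1],
    [0.0, 0.0, 0.0, 0.0],   # isolated unit: A fills this row
    [0.5, 0.5, 0.0, 0.0],
    [0.2, 0.2, 0.6, 0.0],
]
m = propagation_matrix(prune_top_k(p_prime, k=2), beta=0.85)
```

In this toy example the isolated second unit receives a uniform row, and every row of the result sums to 1, so the matrix is a valid transition matrix for the propagation step.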

S34, in the emotion label propagation process, score(T_i^{t+1}) denotes the emotion score of T_i at iteration t + 1, α ∈ [0, 1] is a weight parameter, the i-th row of the emotion propagation matrix defined in S33 weights the contributions of the other emotion units to T_i, and score(T^t) is the emotion score vector of T at iteration t. In each iteration, score(T_i^{t+1}) is computed in the order i = 1, …, n, and each newly obtained score(T_i^{t+1}) is used immediately to update score(T);

S35, when the iteration stops, normalize score(T).
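A minimal Python sketch of S31 to S35 follows. Because the equations are not reproduced in this text, the update formula and both normalizations are assumptions: S31 here gives positive seeds +1/|T_pos| and negative seeds -1/|T_neg|, S34 uses score_i ← α·(row i)·score + (1 - α)·score⁰_i with in-place updates in the order i = 1, …, n (a Gauss-Seidel sweep, matching the "use each new value immediately" description), and S35 divides by the largest absolute score.

```python
from typing import Dict, List


def initial_scores(n: int, seeds: Dict[int, int]) -> List[float]:
    """S31 (assumed normalisation): positive seeds get +1/|T_pos|, negative seeds
    get -1/|T_neg|, and unlabeled units start at 0."""
    n_pos = sum(1 for label in seeds.values() if label > 0)
    n_neg = sum(1 for label in seeds.values() if label < 0)
    score = [0.0] * n
    for i, label in seeds.items():
        if label > 0:
            score[i] = 1.0 / n_pos
        elif label < 0:
            score[i] = -1.0 / n_neg
    return score


def propagate(m: List[List[float]], score0: List[float], alpha: float,
              max_iter: int = 100, tol: float = 1e-8) -> List[float]:
    """S34 (assumed update): score_i <- alpha * (row i of m) . score + (1 - alpha) * score0_i.
    Units are updated in the order i = 1..n and each new value is used immediately
    (a Gauss-Seidel sweep); iteration stops after max_iter sweeps or once every
    change within a sweep is below tol."""
    score = list(score0)
    for _ in range(max_iter):
        delta = 0.0
        for i, row in enumerate(m):
            new = alpha * sum(w * s for w, s in zip(row, score)) + (1 - alpha) * score0[i]
            delta = max(delta, abs(new - score[i]))
            score[i] = new  # in-place update, as described in S34
        if delta < tol:
            break
    # S35 (assumed normalisation): divide by the largest magnitude so scores lie in [-1, 1]
    peak = max(abs(s) for s in score) or 1.0
    return [s / peak for s in score]


# Three units in a chain: unit 0 is a positive seed, unit 2 a negative seed.
chain = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
scores = propagate(chain, initial_scores(3, {0: +1, 2: -1}), alpha=0.5)
```

With symmetric positive and negative seeds, the unlabeled middle unit settles near 0, which is the behavior one would expect from a sound propagation scheme. The (1 - α) term keeps each seed anchored to its initial label.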

According to an embodiment of the present invention, step S4 further includes:

S41, compute the emotion score of each emotional feature from the emotion scores of the emotion units, where n(w_i) is the frequency of word w_i in the corpus, score(N) is the negation score in emotion unit T_i, and score(D) is the degree score in emotion unit T_i;

S42, obtain the microblog-specific emotion dictionary L_spe containing explicit and implicit emotional features.
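The S41 formula is not reproduced here, so the following Python sketch assumes that a word's score is the average, over its occurrences in emotion units, of the unit score with the negation and degree effects divided back out. The modifier values (-1 for any negation, 1.5 for 很, 2.0 for 非常) are hypothetical, and n(w_i) is approximated by the number of emotion units containing the word rather than its corpus frequency.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

NEGATION_SCORE = -1.0                     # hypothetical score(N) when a negation is present
DEGREE_SCORES = {"很": 1.5, "非常": 2.0}  # hypothetical score(D) values; default 1.0

Unit = Tuple[Optional[str], Optional[str], str, float]  # (N, D, E, score(T_i))


def word_scores(units: Iterable[Unit]) -> Dict[str, float]:
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for neg, deg, word, unit_score in units:
        s_n = NEGATION_SCORE if neg else 1.0
        s_d = DEGREE_SCORES.get(deg, 1.0)
        totals[word] += unit_score / (s_n * s_d)  # undo the negation flip and degree scaling
        counts[word] += 1                          # stands in for n(w_i)
    return {w: totals[w] / counts[w] for w in totals}


lexicon = word_scores([
    (None, None, "好", 0.6),     # "good"
    ("不", "很", "好", -0.9),    # "not very good"
    (None, None, "水军", -0.4),  # "navy" (paid posters), an implicit feature
])
```

Dividing out the modifiers lets a negated occurrence ("not very good", unit score -0.9) still count as evidence that "好" itself is positive.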

According to an embodiment of the present invention, step S5 further includes:

The microblog-specific emotion dictionary L_spe is applied to sentiment classification, taking into account the explicit and implicit emotional features and the 20 semantic composition rules of the Chinese microblog domain. The sentiment score score(d_i) of microblog d_i is the sum of the scores of all its emotion units, and the final emotion polarity of the microblog is determined from this score.
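As a rough sketch of S5, the following Python function recomposes each emotion unit's score from the L_spe entry of its evaluation word and its negation and degree modifiers, sums the unit scores into score(d_i), and maps the sign to a polarity. The modifier values, the neutral band `eps`, and scoring unknown words as 0 are assumptions, and the 20 semantic composition rules are not modeled here.

```python
from typing import Dict, List, Optional, Tuple

NEGATION_SCORE = -1.0                     # hypothetical score(N) when a negation is present
DEGREE_SCORES = {"很": 1.5, "非常": 2.0}  # hypothetical score(D) values; default 1.0


def classify(units: List[Tuple[Optional[str], Optional[str], str]],
             lexicon: Dict[str, float], eps: float = 1e-6) -> str:
    """score(d_i) = sum of its unit scores; each unit score is the L_spe score of the
    evaluation word scaled by score(N) and score(D) (assumed composition)."""
    total = 0.0
    for neg, deg, word in units:
        s_n = NEGATION_SCORE if neg else 1.0
        total += s_n * DEGREE_SCORES.get(deg, 1.0) * lexicon.get(word, 0.0)
    if total > eps:
        return "positive"
    if total < -eps:
        return "negative"
    return "neutral"


L_spe = {"好": 0.6, "水军": -0.4}          # toy dictionary entries
print(classify([("不", "很", "好")], L_spe))  # negative
```

The three-way output supports the UCI setting (positive, negative, and neutral); for the two-class Weibo setting the neutral band could simply be dropped.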

Compared with the prior art, the invention has the following beneficial effects: (1) it provides a new emotion-unit propagation framework that generates a microblog-specific emotion dictionary from Chinese microblog data sets, and the generated dictionary contains both explicit emotional features and implicit noun or noun-phrase emotional features; (2) social relationships, topic features, and local context information are used to construct the emotion propagation graph and its adjacency matrix, and the emotion propagation algorithm propagates emotion from labeled units to unlabeled units, so that emotion scores are obtained for both explicit and implicit emotional features; (3) the framework is validated on two microblog data sets, UCI and Weibo, and the experimental results show that it generates a high-quality microblog-specific emotion dictionary, achieves better results on the emotion classification task, and improves classification accuracy.

Drawings

FIG. 1 is a flowchart of a method for generating a specific emotion dictionary for Chinese microblog emotion classification according to an embodiment of the invention.

FIG. 2 is a diagram of the framework for generating a Chinese domain-specific emotion dictionary that fuses social relationships and local contexts.

Fig. 3 shows four microblogs containing the implicit features "cutworm", "flood and wilderness", "navy", and "five-mao special effect".

FIG. 4 is a flowchart showing how three microblogs are processed under the emotion-unit context propagation framework.

FIG. 5 is a diagram of the local context and social context of a target emotion unit in a microblog.

FIG. 6 is a diagram of the relationships among Sina Weibo users and the microblogs they publish.

Fig. 7 is a chart of the consistency rates of the emotion dictionaries obtained under 5-fold cross-validation on the UCI data set.

FIG. 8 is a chart of the consistency rates of the emotion dictionaries obtained under 5-fold cross-validation on the Weibo data set.

Fig. 9 is a word cloud of nominal emotional features on the UCI data set.

FIG. 10 is a word cloud of nominal emotional features on the Weibo data set.

FIG. 11 shows the sentiment classification results (positive, negative, and neutral) on the UCI data set.

FIG. 12 shows the sentiment classification results (positive and negative) on the Weibo data set.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

As shown in Figs. 1 to 12, the framework of the invention consists of the following five steps, which build on one another layer by layer and are finally fused:

S1, preprocess the microblog corpus D = {d₁, d₂, …, d_l}, extract emotion units T_i through lexical and syntactic analysis, and take these emotion units as the emotion unit set T = {T₁, T₂, …, T_n}, where i and n are positive integers and 1 ≤ i ≤ n. Each emotion unit is defined as T_i = (N, D, E, P), where N is a negation indicator, D is a degree adverb, E is an evaluation word, and P is the emotion polarity.

Step S1 includes: S11, use rule-based filtering to remove links, stop words, repeated words, and other noise from the microblog data set D = {d₁, d₂, …, d_l};

In the preprocessing stage, links, stop words, repeated words, and other noise are first filtered out of the microblogs. Lexical and syntactic analyses are performed with the part-of-speech tagging and dependency parsing tools of the Harbin Institute of Technology (HIT) Language Technology Platform (LTP). For microblog-specific elements, topic labels marked with #, user relations marked with @, emoticons, and similar information are extracted first. Words or phrases that may express emotion are defined as candidate explicit and implicit emotional features. Lexical analysis yields part-of-speech (POS) tags, topic features, user relationships in the social network, and similar information. Adjectives, verbs, adverbs, and nouns are extracted as candidate emotional features, and low-frequency words are filtered out. Dependency parsing reveals syntactic structure by analyzing the dependencies between linguistic elements, and the negation and degree modification relations are extracted from its results. In addition, the microblog emotion data is denoted D = {d₁, d₂, …, d_l}, and W = {w₁, w₂, …, w_n} is the set of potential emotional features, where w_k (1 ≤ k ≤ n) is an emotion word or phrase. N is a negation indicator such as "no", "not", or "none", and D is a degree adverb such as "completely" or "quite".
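The filtering and extraction of microblog-specific elements described above can be approximated with regular expressions. This Python sketch is only an illustration under assumptions: the patterns for links, #topic# labels, @mentions, and bracketed emoticon codes such as [哈哈] are hypothetical, and stop-word removal is left to the stage after word segmentation (for example with LTP), which is not shown.

```python
import re
from typing import Dict, List, Union

URL_RE = re.compile(r"https?://\S+")
TOPIC_RE = re.compile(r"#([^#]+)#")          # topic label wrapped in a pair of '#'
MENTION_RE = re.compile(r"@([\w\-]+)")       # user mention
EMOTICON_RE = re.compile(r"\[([^\[\]]+)\]")  # assumed bracketed emoticon code, e.g. [哈哈]
REPEAT_RE = re.compile(r"(.)\1{2,}")         # a character repeated three or more times


def preprocess(post: str) -> Dict[str, Union[str, List[str]]]:
    text = URL_RE.sub(" ", post)              # drop links before looking for '#'
    topics = TOPIC_RE.findall(text)
    mentions = MENTION_RE.findall(text)
    emoticons = EMOTICON_RE.findall(text)
    for pattern in (TOPIC_RE, MENTION_RE, EMOTICON_RE):
        text = pattern.sub(" ", text)
    text = REPEAT_RE.sub(r"\1", text)         # collapse floods such as "好好好好"
    text = " ".join(text.split())             # normalise whitespace
    return {"text": text, "topics": topics, "mentions": mentions, "emoticons": emoticons}


result = preprocess("#好声音# @UserA 超重低音，如同天籁[哈哈] http://t.cn/abc 好好好好")
```

The extracted topics and mentions are kept rather than discarded because the method later uses them to build the social relationship context of each emotion unit.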

The emotion unit set is denoted T = {T₁, T₂, …, T_n}, where T_i (1 ≤ i ≤ n) is an emotion unit, and T^s is the set of seed emotion units, with |T^s| = M. Degree adverbs and negation indicators affect the intensity and polarity of emotion. For example, in the emotion unit (not, good, negative) shown in FIG. 4, "good" is a positive emotion word, the negation reverses its polarity, and "very" intensifies the resulting negative emotion.

S2, construct an emotion propagation graph G = (V, E, W) based on the emotion unit set T, where V is the set of emotion units, E is the set of edges, and W is the weight matrix between emotion units; compute the standard centrality H(T_i) of each emotion unit T_i, sort the emotion units in descending order of H(T_i), select the top M as the seed emotion unit set T^s, and label the seed emotion units using a general emotion dictionary and manual annotation, where M and n are positive integers, M < n, and M/n ≥ 20%.

It should be noted that the local context and social relationships are used to construct a connection graph between emotion units. The local context of an emotion unit comprises the other emotion units in related microblogs, namely the comments on the microblog and other microblogs published by the same user. The social relationship context of an emotion unit comprises forwarding and reply information, topic features, user relationship information, and the like.

The local context and social relationship context of a specific emotion unit T_i in microblog d are shown in FIG. 5. T_j and T_k belong to the local context of T_i. The social relationship context of T_i includes d₁, d₂, and d₃, where d₁ is a microblog that was replied to or forwarded, d₂ is a microblog sharing a topic with T_i, and d₃ is a microblog linked to d by a user relationship.

Constructing the emotion propagation graph requires determining the neighbor relation between pairs of emotion units, so the propagation window is defined as the n emotion units before and the n emotion units after a specific unit T_i. A co-occurrence relationship means that two words appear within the same window. Co-occurrence relationships and semantic rules can be used to infer the emotion polarity of an unlabeled emotion unit: if two emotion units are likely to appear in the same window, their mutual information is also high.

For example, FIG. 6 shows user relationships on Sina Weibo together with the posted microblogs and their emotion polarities. Users are linked by directed following relationships; for example, User A is a follower of User B, but User B does not follow User A. The label "@User A" indicates that the related context is directed at User A. A topic label enclosed in a pair of "#" characters indicates the topic of a microblog; for example, from the microblog "#popular glow# which is actually a cutworm", it can be inferred that the microblog is about #popular glow#. In addition, different users may discuss the same topic, such as #Volkswagen# and #Revere Sound#. In FIG. 6, in the microblog "Unparalleled security, complimentary detail process!" posted by User D, both "unparalleled" and "praise" express positive emotional tendencies.

Besides co-occurrence, the semantic relationship between emotion units can be determined directly from conjunctions. Emotion units connected by parallel conjunctions are likely to have the same polarity and similar intensity. For example, in the microblog "Super bass, like the sounds of nature #refocusing good sound#" posted by User C, the units (-super bass) and (-sounds of nature), connected by "like", both express positive emotional tendencies. Conversely, two emotion units connected by an adversative conjunction have opposite polarities. For example, in the microblog "The experience is not very good, but I 'porridge'!" posted by User A, the two clauses linked by "but" express opposite emotional tendencies.
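The conjunction heuristic can be written as a small lookup. The connective lists below are hypothetical examples (such as 如同, "like", and 但是, "but"), not the patent's lexicon; the function encodes only the two rules stated above.

```python
from typing import Optional

PARALLEL = {"和", "而且", "并且", "如同"}     # hypothetical parallel connectives
ADVERSATIVE = {"但", "但是", "可是", "然而"}  # hypothetical adversative connectives


def infer_polarity(known: int, connective: str) -> Optional[int]:
    """Given the polarity (+1 or -1) of one emotion unit and the connective linking it
    to a second unit, return the inferred polarity of the second unit, or None."""
    if connective in PARALLEL:
        return known   # parallel conjunction: same polarity
    if connective in ADVERSATIVE:
        return -known  # adversative conjunction: opposite polarity
    return None        # no rule applies; fall back to co-occurrence statistics
```

Returning None rather than guessing keeps the rule-based signal separate from the statistical co-occurrence signal, which can then fill in the remaining cases.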

In addition to conjunctions, the linguistic semantic rules in Table 1 can be used to determine semantic relationships between emotion units. Rules 1-11 are emotion composition rules organized by part of speech, and rules 12-17 are six rules for composing the emotion of conditional clauses. Beyond these, the subjunctive mood, interrogative sentences, and irony or rhetorical questions require special handling; these three sentence types are covered by rules 18-20. The subjunctive mood indicates that what the speaker says is not a fact but a hypothesis, wish, doubt, or guess. Although an interrogative sentence may contain emotional indicators, it does not express any emotion. Irony or sarcasm reverses the emotion polarity of a microblog.
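Rules 18-20 can be sketched as a post-processing step on a sentence score. Table 1 is not reproduced here, so this Python sketch is hypothetical: it detects questions and subjunctive clauses with small cue-word lists, zeroes questions as described above, also treats subjunctive clauses as carrying no emotion (an assumption), and flips the sign for irony, which is passed in as a flag because irony detection is outside the scope of the sketch.

```python
QUESTION_CUES = ("吗", "？", "?")                  # hypothetical interrogative cues
SUBJUNCTIVE_CUES = ("如果", "要是", "假如", "但愿")  # hypothetical subjunctive cues


def apply_sentence_rules(sentence: str, score: float, ironic: bool = False) -> float:
    if any(cue in sentence for cue in QUESTION_CUES):
        return 0.0     # interrogative: expresses no emotion
    if any(cue in sentence for cue in SUBJUNCTIVE_CUES):
        return 0.0     # subjunctive: hypothesis or wish, assumed neutral here
    if ironic:
        return -score  # irony or sarcasm: reverse the polarity
    return score
```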

Table 1 The 20 emotion semantic composition rules for the Chinese microblog domain

After the relationship between two emotion units is obtained, the edge between T_i and T_j is defined as p_ij = PMI(T_i, T_j),

where n is the total number of emotion units in the corpus, hits(T_i) and hits(T_j) are the numbers of occurrences of T_i and T_j in the corpus, and hits(T_i, T_j) is the number of times T_i and T_j occur in the same window in both the local and social contexts. The relationship matrix among the emotion units is denoted P_{n×n}.
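The PMI formula itself does not survive in this text. The following Python sketch assumes the standard pointwise mutual information estimate, PMI(T_i, T_j) = log(n·hits(T_i, T_j) / (hits(T_i)·hits(T_j))), with n taken as the total number of emotion-unit occurrences, and counts window co-occurrences over sequences of emotion-unit ids. The input format (one sequence per microblog plus its local and social context) is an assumption.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cooccurrence(contexts: List[List[str]], window: int) -> Tuple[Counter, Counter]:
    """Count unit occurrences (hits) and unordered pairs that appear within `window`
    positions of each other (hits(T_i, T_j)). Each context is a sequence of emotion-unit
    ids from a microblog together with its local and social context (assumed format)."""
    hits: Counter = Counter()
    pair_hits: Counter = Counter()
    for seq in contexts:
        hits.update(seq)
        for i, a in enumerate(seq):
            for b in seq[i + 1:i + 1 + window]:  # looking ahead covers both directions
                if a != b:
                    pair_hits[frozenset((a, b))] += 1
    return hits, pair_hits


def pmi_edges(hits: Counter, pair_hits: Counter) -> Dict[Tuple[str, str], float]:
    """Assumed standard PMI: p_ij = log(n * hits(T_i, T_j) / (hits(T_i) * hits(T_j)))."""
    n = sum(hits.values())
    edges = {}
    for pair, c in pair_hits.items():
        a, b = sorted(pair)
        edges[(a, b)] = math.log(n * c / (hits[a] * hits[b]))
    return edges


hits, pair_hits = cooccurrence([["A", "B", "C"], ["A", "B"]], window=1)
edges = pmi_edges(hits, pair_hits)
```

Only pairs that actually co-occur get an edge, so the resulting P_{n×n} is sparse, which suits the later top-k pruning.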

Furthermore, to measure the connectivity of nodes (emotion units) in the emotion propagation graph G, the connection probability p'_ij from T_i to T_j is defined, where n_i is the number of nodes connected to T_i. After the connection probabilities are computed, the standard centrality H(T_i) of emotion unit T_i is defined, where n is the total number of emotion units and n_i is the degree of T_i in graph G. A node with higher standard centrality is more important in the network, so standard centrality is used to measure node connectivity in graph G.

The higher H(T_i) is, the more strongly connected T_i is, and the more T_i contributes to the propagation of emotion labels. Therefore, all T_i are sorted by H(T_i), and the top M emotion units are selected as the seed emotion unit set T^s, so that the seed emotion units provide reliable emotion sources for the emotion propagation graph G. The seed emotion unit selection algorithm is shown in Table 2.
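The centrality-based seed selection can be sketched in Python as follows. Both formulas are assumptions, since neither survives in this text: p'_ij is taken as p_ij divided by the total positive weight over the n_i neighbors of T_i, and H(T_i) as the textbook normalized degree centrality n_i / (n - 1). The seed count M is the smallest integer with M/n ≥ 20%, capped below n.

```python
import math
from typing import List


def connection_probabilities(p: List[List[float]]) -> List[List[float]]:
    """Assumed form of p'_ij: p_ij divided by the total weight over the n_i
    neighbours of T_i; only positive weights are treated as edges."""
    out = []
    for row in p:
        total = sum(v for v in row if v > 0)
        out.append([v / total if v > 0 else 0.0 for v in row])
    return out


def standard_centrality(p: List[List[float]]) -> List[float]:
    """Assumed textbook normalised degree centrality: H(T_i) = n_i / (n - 1)."""
    n = len(p)
    return [sum(1 for j, v in enumerate(row) if j != i and v > 0) / (n - 1)
            for i, row in enumerate(p)]


def select_seeds(h: List[float], ratio: float = 0.2) -> List[int]:
    """Sort units by H(T_i) in descending order and keep the top M, with M/n >= ratio and M < n."""
    n = len(h)
    m = min(n - 1, max(1, math.ceil(ratio * n)))
    return sorted(range(n), key=lambda i: h[i], reverse=True)[:m]


star = [
    [0, 1, 1, 1, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
h = standard_centrality(star)
seeds = select_seeds(h, ratio=0.4)
```

In this toy graph the hub unit 0 has the highest centrality and is always chosen first, which reflects the intuition that well-connected seeds spread labels to more of the graph.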

Table 2 Seed emotion unit selection algorithm

After the seed emotion units are obtained, they are labeled using a general emotion dictionary and manual calibration. Because existing emotion dictionaries do not contain internet slang, popular internet words such as "mountain village", "male", and "mulberry heart" are added to the general emotion dictionary. Manual calibration checks the labels of the seed emotion units and corrects inaccurate ones.

S3, use an emotion propagation algorithm to propagate emotion from the labeled seed emotion unit set T^s to the n - M unlabeled emotion units, and obtain the emotion score of each of these n - M emotion units.

Step S3 further includes:

S31, denote the initial emotion score vector of the emotion units as score(T):

score(T) = [score(T₁), score(T₂), …, score(T_n)]

and normalize score(T);

S32 and S33, prune graph G and construct the probability transition matrix of emotion propagation from the probability transition matrix P', as described above;

S34, propagate emotion labels: in each iteration, score(T_i^{t+1}) is computed in the order i = 1, …, n from the i-th row of the emotion propagation matrix, and each newly obtained value is used immediately to update score(T);

S35, when the iteration stops, normalize score(T).

Table 3 Emotion propagation algorithm

S4, derive the emotion score of each emotion word from the emotion scores of the emotion units, yielding a microblog-specific emotion dictionary L_spe that contains explicit and implicit emotional features.

Step S4 further includes steps S41 and S42 as described above.

Step S5 further includes: the microblog-specific emotion dictionary L_spe is applied to sentiment classification, taking into account the explicit and implicit emotional features and the 20 semantic composition rules of the Chinese microblog domain. The sentiment score score(d_i) of microblog d_i is the sum of the scores of all its emotion units, and the final emotion polarity of the microblog is determined from this score.

Table 4 Parameter settings for the three emotion dictionaries (HowNet, NTUSD, and SXU) on the UCI and Weibo data sets

It should be noted that several parameters must be set during model learning. Table 4 lists the parameter set of the emotion-unit context propagation framework, Φ = {H, M, k, T, α, β}. H is the window size in the seed emotion unit selection algorithm, which determines the context of a specific unit. M is the number of seed emotion units. k is the number of emotion neighbors retained for each unit in the emotion propagation graph, which affects the number of iterations and the stability of the computation. T is the number of iterations; the convergence of score(T) depends directly on the emotion propagation matrix. α is the update rate and β is the weight parameter. Various parameter combinations were tested to find the optimal combination shown in Table 4.

Figs. 7 and 8 show the consistency rates of the emotion dictionaries obtained by 5-fold cross-validation. The emotion polarity and strength of explicit and implicit emotional features are obtained through emotion label propagation. The consistency rate is the percentage of emotional features whose polarities agree between two emotion dictionaries. Because the dictionaries generated from different folds differ in size, the consistency-rate matrix is asymmetric. The emotion dictionaries generated from the five cross-validation folds are combined into the final emotion dictionary; specifically, the invention uses voting to determine the final emotional tendency of each emotional feature.
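The consistency rate and the voting merge can be sketched in Python as follows. Dividing by the size of the first dictionary is an assumption, chosen because it reproduces the asymmetry noted above when the fold dictionaries differ in size; dropping features whose votes tie is likewise an assumed tie rule.

```python
from collections import Counter
from typing import Dict, List


def consistency_rate(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Share of the features in dictionary `a` that appear in `b` with the same polarity.
    Dividing by |a| (an assumption) makes rate(a, b) differ from rate(b, a) when the
    dictionaries differ in size, so the matrix of rates is asymmetric."""
    if not a:
        return 0.0
    agree = sum(1 for word, pol in a.items() if b.get(word) == pol)
    return agree / len(a)


def vote(folds: List[Dict[str, int]]) -> Dict[str, int]:
    """Merge the fold dictionaries: each feature keeps its majority polarity (+1 / -1).
    Features whose votes tie are dropped, which is an assumed tie rule."""
    ballots: Dict[str, Counter] = {}
    for d in folds:
        for word, pol in d.items():
            ballots.setdefault(word, Counter())[pol] += 1
    merged = {}
    for word, counts in ballots.items():
        (top, n_top), *rest = counts.most_common(2)
        if not rest or rest[0][1] < n_top:
            merged[word] = top
    return merged
```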

Table 5 Parts of speech and their percentages in the generated emotion dictionaries

Table 5 shows details of the emotion dictionaries generated on the UCI and Weibo data sets. The UCI emotion dictionary contains 11428 emotional features, of which 8665 are explicit and 2763 are implicit. The Weibo emotion dictionary contains 24415 emotional features, of which 20189 are explicit and 4226 are implicit. Whereas general emotion dictionaries consist mostly of verbs and adjectives, nominal emotional features account for 2763 (24.2%) and 4226 (17.3%) entries of the generated UCI and Weibo microblog-specific emotion dictionaries, respectively. This indicates that implicit emotional features are major emotional indicators in microblog emotional expression. A small number of adverbs, such as "cluttered", "happy", and "parsimonious", also express emotional tendencies: adverbial emotional features account for 185 (1.6%) entries in the UCI dictionary and 308 (1.3%) in the Weibo dictionary. In both dictionaries, negative emotional features outnumber positive ones: negative and positive features account for 6322 (55.3%) and 5106 (44.7%) entries in the UCI dictionary, and 12935 (53.0%) and 11480 (47.0%) in the Weibo dictionary, respectively.
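A quick arithmetic check recomputes each share from the counts reported in the paragraph above and confirms that the explicit/implicit and negative/positive splits add up to each dictionary's total.

```python
# Counts as reported in the text for the generated dictionaries.
counts = {
    "UCI": {"total": 11428, "explicit": 8665, "implicit": 2763, "adverb": 185,
            "negative": 6322, "positive": 5106},
    "Weibo": {"total": 24415, "explicit": 20189, "implicit": 4226, "adverb": 308,
              "negative": 12935, "positive": 11480},
}


def shares(c: dict) -> dict:
    """Percentage of the dictionary total, rounded to one decimal place."""
    return {k: round(100 * v / c["total"], 1) for k, v in c.items() if k != "total"}


for name, c in counts.items():
    assert c["explicit"] + c["implicit"] == c["total"]
    assert c["negative"] + c["positive"] == c["total"]
    print(name, shares(c))
```

Recomputed this way, the Weibo nominal share is 4226/24415 ≈ 17.3%, and the UCI negative and positive shares are 55.3% and 44.7%.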

Tables 6 and 7 show the emotion classification results of the proposed method and the baseline systems on the UCI and Weibo data sets, and FIGS. 11 and 12 compare overall accuracy. The SUCPF method improves accuracy over the other five baseline methods. For example, with the SXU dictionary, SUCPF improves accuracy by 7.1%, 4.8%, 5.9%, 5.4%, 4.5%, and 6.2% on the UCI data set, and by 14.5%, 2.9%, 4.2%, 6.4%, 5.3%, and 5.1% on the Weibo data set. This indicates that SUCPF yields a better emotion dictionary and better classification results, mainly because SUCPF is a semi-supervised framework that exploits existing emotion dictionaries and manual annotations, and it uses local context, topic features, and context relationships to extract explicit and implicit emotional features.

Table 6 Emotion classification results with three general emotion dictionaries on the UCI data set

Table 7 Emotion classification results with three general emotion dictionaries on the Weibo data set

Figs. 9 and 10 show 50 nominal emotional features extracted from the two microblog data sets. These expressions strongly imply positive or negative emotions and views. The two data sets share some implicit emotional features, such as "bad eggs", "cheating", and "mushrooms". The invention can also discover new emotional features, including explicit ones such as "bad eggs", "Shenjing disease", and "weak people", and implicit ones such as "fire fighters", "tide men", and "industry sickness". This indicates that users of social media such as microblogs habitually use implicit features to express their sentiment toward products, services, and the like indirectly. The invention therefore shows good domain adaptability and the ability to identify both explicit and implicit emotional features.

In conclusion, by combining rules and statistics, the invention constructs a microblog-specific emotion dictionary containing explicit and implicit emotional features. Compared with similar representative methods, it achieves higher overall accuracy and stability, effectively constructs domain-specific emotion dictionaries, and accurately extracts explicit and implicit emotional features.

The accompanying drawings and the detailed description are included to provide a further understanding of the invention. The method of the present invention is not limited to the examples described in the specific embodiments, and other embodiments derived from the method and idea of the present invention by those skilled in the art also belong to the technical innovation scope of the present invention. This summary should not be construed to limit the present invention.

Claims

1. A specific emotion dictionary generation method for Chinese microblog emotion classification is characterized by comprising the following steps of:

S2, constructing an emotion propagation graph G = (V, E, W) based on the emotion unit set T, where V is the set of emotion units, E is the set of edges, and W is the weight matrix between emotion units; computing the standard centrality H(T_i) of each emotion unit T_i, sorting the emotion units in descending order of H(T_i), selecting the top M as the seed emotion unit set T^s, and labeling the seed emotion units using a general emotion dictionary and manual annotation, where M and n are positive integers, M < n, and M/n ≥ 20%;

Step S3 further includes:

S31, denoting the initial emotion score vector of the emotion units as score(T):

score(T) = [score(T₁), score(T₂), …, score(T_n)]

and normalizing score(T);

where P' is a probability transition matrix;

wherein the content of the first and second substances,

n is the total number of emotion units in the corpus, n_iIs T_iDegrees, hits (T) in graph G_i) Is T in corpus_iNumber of occurrences, hits (T)_j) Is T in corpus_jNumber of occurrences, hits (T)_i,T_j) Is T_iAnd T_jThe frequency of occurrence in the same window under the local and social context, and the relationship matrix among the emotion units is marked as P_ij；

where β ∈ [0, 1] is an adaptive parameter, A is a matrix in which some rows have all entries equal to 1/n and the remaining rows are 0, added to ensure that the matrix P' has no all-zero rows, and J is a matrix whose elements are all 1/n;

S34, in the emotion label propagation process, score(T_i^{t+1}) is the emotion score of T_i at iteration t + 1, α ∈ [0, 1] is a weight parameter, the i-th row of the emotion propagation matrix weights the contributions of the other emotion units to T_i, and score(T^t) is the emotion score vector of T at iteration t; in each iteration, score(T_i^{t+1}) is computed in the order i = 1, …, n, and each newly obtained score(T_i^{t+1}) is used immediately to update score(T);

S35, when the iteration stops, normalizing score(T);

Step S4 further includes:

where n(w_i) is the frequency of word w_i in the corpus, score(N) is the negation score in emotion unit T_i, and score(D) is the degree score in emotion unit T_i;

S42, obtaining a microblog-specific emotion dictionary L_spe containing explicit and implicit emotional features;

S5, classifying the emotions of the microblog corpus according to the microblog-specific emotion dictionary L_spe;

the explicit emotional features are emotion words, phrases, and idioms that directly express emotion toward entities or aspects, and the implicit emotional features are features that express positive or negative emotion without explicit emotional indicators.

2. The method for generating a specific emotion dictionary for Chinese microblog emotion classification according to claim 1, wherein step S1 includes:

S15, taking the extracted candidate emotional features, together with their negation and degree modification relations, as emotion units T_i, the emotion units forming the emotion unit set T = {T₁, T₂, …, T_n}.

3. The method for generating a specific emotion dictionary for Chinese microblog emotion classification according to claim 1, wherein step S5 further includes:

applying the microblog-specific emotion dictionary L_spe to sentiment classification, taking into account the explicit and implicit emotional features and the 20 semantic composition rules of the Chinese microblog domain, wherein the sentiment score score(d_i) of microblog d_i is the sum of the scores of all its emotion units, and the final emotion polarity of the microblog is determined from this score.