Disclosure of Invention
The invention aims to summarize the text elements involved in emotion analysis into a six-layer hierarchy, in which each layer has a representation method and a formula relating its emotion value to the layer above. On this basis, the weight of an unknown emotion word can be obtained by combining the emotional tendencies of the layers above and below it. Only existing emotion dictionary resources are required; no other information, such as the semantics of the unknown emotion word, is needed.
The core idea of the invention is as follows. The emotion analysis task can be subdivided down to the dimension of characters, and the emotion value of a chapter can be obtained by working from the bottom layer upward. The emotion value of an unknown emotion word can be obtained from the emotion values of the layers above and below it: the emotion values of the lower layer are obtained directly from an existing emotion dictionary, while the emotion values of the upper layer are obtained by means of syntax and sentence patterns. The emotion weights of the unknown emotion word derived from the two layers are then combined by weighting to give its emotion value.
The construction method of the emotion dictionary based on the emotion hierarchical system comprises the steps of constructing the emotion hierarchical system and constructing the emotion dictionary;
the emotion hierarchy to be constructed is the six-layer emotion hierarchy shown in Table 1;
TABLE 1 Emotion analysis hierarchy levels and method of representing same
wherein $L$ denotes a character; $W_N$ denotes a negative word; $W_D$ denotes a degree word; $W_S$ denotes a simple emotion word, also called a meta emotion word, i.e. an emotion word without negative words or degree words; $W_R$ denotes an associated word; $C_S$ denotes a composite emotion word, i.e. an emotion word including negative words and degree words; $S_S$ denotes a single sentence, i.e. a sentence containing no associated words and no sentence-separating punctuation; $M_S$ denotes a compound sentence, i.e. several single sentences connected by associated words or punctuation marks; and $Q$ denotes a chapter;
step 1, constructing an emotion hierarchy, which specifically comprises the following substeps:
step 1.1, segmenting the chapters in the corpus to be analyzed into compound sentences at periods, exclamation marks and question marks;
step 1.2, segmenting the compound sentence segmented in the step 1.1 into single sentences through associated words and commas;
step 1.3, performing word segmentation on each single sentence obtained in step 1.2 with a word segmentation tool to obtain composite emotion words;
step 1.4, removing the degree words and negative words from the composite emotion words obtained in step 1.3 to obtain meta emotion words;
step 1.5, downloading a known emotion dictionary, and marking the emotion weight of the meta emotion words among its commendatory words as 1 and the emotion weight of the meta emotion words among its derogatory words as -1;
step 1.6, segmenting the meta emotion words obtained in step 1.4 into single characters;
step 1.7, obtaining the emotion weight of each composite emotion word through formula (1), based on the meta emotion words obtained in step 1.4 and their emotion weights:
$$\mathrm{score}(C_S) = \mathrm{weight}(W_S) \cdot \beta \qquad (1)$$
wherein $\mathrm{score}(C_S)$ is the emotion weight of the composite emotion word, $W_S$ denotes the meta emotion word, $\mathrm{weight}(W_S)$ is the weight of the meta emotion word, and $\beta$ is a weight calculated by formula (2):
wherein $\mathrm{weight}(W_D)$ is the degree word weight, and $\varepsilon$ is a scale factor with $0 < \varepsilon < 1$;
step 1.8, obtaining the emotion value of the corresponding single sentence through a formula (3) based on the emotion weight of the composite emotion word obtained in the step 1.7:
wherein $\mathrm{Score}(S_S)$ denotes the emotion value of the single sentence $S_S$, and $n$ is the number of composite emotion words $C_S$ contained in the single sentence;
step 1.9, obtaining the emotion value of the corresponding compound sentence through formula (4), based on the emotion values of the single sentences obtained in step 1.8;
wherein $W_{Ri}$ is the associated word corresponding to the single sentence $S_{Si}$, $S_{Si}$ is the $i$-th single sentence $S_S$, and $\mathrm{weight}(W_{Ri})$ is the weight of the associated word $W_{Ri}$, as shown in Table 2:
TABLE 2 weights of different types of associated words at different positions
Step 2, constructing an emotion dictionary, comprising the following substeps:
step 2.1, obtaining the emotional tendency of the unknown emotional words through the characters in the constructed emotional hierarchy, and specifically comprising the following substeps:
step 2.1A, setting the emotion weight of the commendatory words in the emotion dictionary downloaded in step 1.5 to 1 and the emotion weight of the derogatory words to -1, and splitting the composite emotion words of the emotion dictionary into meta emotion words;
step 2.1B, calculating the weight of the meta-emotion words by a formula (5);
wherein β is a weight, and β is calculated by formula (2);
step 2.1C, marking words which are not in the emotion dictionary downloaded in step 1.5 in the composite emotion words obtained in step 1.3 as unknown emotion words;
step 2.1D, judging one by one whether each unknown emotion word obtained in step 2.1C is a composite emotion word; if so, splitting it into simple emotion words and jumping to step 2.1E; otherwise the unknown emotion word is itself a simple emotion word, and jumping to step 2.1E;
step 2.1E, obtaining the emotional tendency of each constituent character $L$ through formula (6), based on the unknown simple emotion words obtained in step 2.1C:
wherein the emotional tendency $P(W_S^{*}, L_i)$ is the probability that $W_S^{*}$ and $L_i$ occur together in the corpus; $P(W_S^{*})$ denotes the probability that $W_S^{*}$ occurs in the corpus and is calculated by formula (7); $\mathrm{Freq}(W_S^{*}, L_i)$ denotes the number of times $W_S^{*}$ and $L_i$ occur together in the corpus; $\mathrm{Freq}(W_S^{*})$ denotes the number of times $W_S^{*}$ occurs in the corpus; and $\sigma$ is a very large number, chosen as the total number of words in the corpus;
wherein $W_S^{*}$ denotes a known emotion word;
representing the number of words in the corpus;
step 2.1F, calculating the positive emotion weight of the unknown simple emotion word through formula (8), according to the emotional tendencies of its characters obtained in step 2.1E:
wherein $P(W_S^{\#})_{pos}$ is the positive emotion weight of the unknown simple emotion word, $W_S^{\#}$ denotes the unknown simple emotion word, $W_{S,pos}^{*}$ denotes the commendatory words in the emotion dictionary downloaded in step 1.5, $L_0, \ldots, L_n$ are the characters composing $W_S^{\#}$, and $P(W_S^{\#} \mid W_{S,pos}^{*})$ is the positive emotion weight of the unknown emotion word given the emotion dictionary downloaded in step 1.5;
step 2.1G, calculating the negative emotion weight of the unknown simple emotion word through formula (9), according to the emotional tendencies of its characters obtained in step 2.1E:
wherein $P(W_S^{\#})_{neg}$ is the negative emotion weight of the unknown simple emotion word, $W_S^{\#}$ denotes the unknown simple emotion word, $W_{S,neg}^{*}$ denotes the derogatory words in the emotion dictionary of step 2.1A, and $P(W_S^{\#} \mid W_{S,neg}^{*})$ is the negative emotion weight of the unknown emotion word given the downloaded emotion dictionary;
step 2.1H, subtracting the negative emotion weight obtained in step 2.1G from the positive emotion weight obtained in step 2.1F to obtain the emotion weight of the unknown simple emotion word, as shown in formula (10):
$$P(W_S^{\#}) = P(W_S^{\#})_{pos} - P(W_S^{\#})_{neg} \qquad (10)$$
wherein $P(W_S^{\#})_{pos}$ is the positive weight of the unknown simple emotion word, and $P(W_S^{\#})_{neg}$ is the negative weight of the unknown simple emotion word;
step 2.1I, obtaining the weight of the composite emotion word corresponding to the unknown simple emotion word through formula (11), based on the weight of the unknown simple emotion word obtained in step 2.1H:
$$\mathrm{score}(C_S^{\#}) = P(W_S^{\#}) \cdot \beta \qquad (11)$$
wherein $\mathrm{score}(C_S^{\#})$ is the emotion weight of the composite emotion word, $W_S^{\#}$ denotes the unknown simple emotion word, $P(W_S^{\#})$ is the weight of the unknown simple emotion word obtained in step 2.1H, and $\beta$ is a weight calculated by formula (12):
wherein $\mathrm{weight}(W_D)$ is the degree word weight, and $\varepsilon$ is a scale factor with $0 < \varepsilon < 1$;
step 2.1J, recording the composite emotion words obtained in step 2.1I together with their weights, thereby constructing an emotion dictionary $D_L$;
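Steps 2.1H to 2.1J can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: formulas (8), (9) and (12) are not reproduced in this text, so the positive weight, the negative weight and β are taken as given inputs, and all function names and sample values are hypothetical.

```python
def unknown_word_weight(p_pos: float, p_neg: float) -> float:
    """Formula (10): positive weight minus negative weight."""
    return p_pos - p_neg


def composite_weight(simple_weight: float, beta: float) -> float:
    """Formula (11): weight of the simple emotion word scaled by beta.

    beta comes from formula (12), which is not reproduced here,
    so the caller supplies it.
    """
    return simple_weight * beta


def build_character_dictionary(entries):
    """Step 2.1J: record each composite emotion word with its weight.

    entries: iterable of (composite_word, p_pos, p_neg, beta) tuples
    (a hypothetical input layout chosen for this sketch).
    """
    return {
        word: composite_weight(unknown_word_weight(p_pos, p_neg), beta)
        for word, p_pos, p_neg, beta in entries
    }


# Illustrative values only.
d_l = build_character_dictionary([("word_a", 0.25, 0.75, 2.0)])
print(d_l)
```

Keeping formulas (10) and (11) as separate functions mirrors the step structure, so either one can be swapped out once the missing formulas are available.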
step 2.2, constructing an emotion dictionary based on the context, specifically comprising the following substeps:
step 2.2A, for each single sentence whose emotion value is 0 in the emotion analysis hierarchy obtained in step 1, calculating its emotion value through the formulas in Table 3;
TABLE 3 Formulas for deriving the emotion value of an unknown-emotion single sentence in different types of compound sentences
wherein $\alpha$ is a weight factor with $\alpha > 1$, and the formulas refer to the single sentence whose emotion value is currently to be calculated, the sentence preceding it, and the sentence following it;
step 2.2B, obtaining the emotion weight of each composite emotion word in the single sentence through formula (13), based on the single sentence emotion value obtained in step 2.2A;
wherein $N$ is the number of composite emotion words in the single sentence, and $\mathrm{score}(C_{Si})$ is the emotion value of the composite emotion word $C_S$ in the $j$-th single sentence $S_{Sj}$;
step 2.2C, based on the weight of the composite emotional word obtained in the step 2.2B, obtaining the weight of the composite emotional word in the whole corpus through a formula (14);
wherein J is the frequency of the composite emotional words appearing in the corpus;
step 2.2D, obtaining the positive tendency degree of the composite emotion word through formula (15):
wherein $P(C_S)_{pos}$ denotes the positive tendency degree of the composite emotion word, and $\mathrm{Pos}(S_S)$ denotes a single sentence whose emotion value is greater than 0;
step 2.2E, obtaining the negative tendency degree of the composite emotion word through formula (16);
wherein $P(C_S)_{neg}$ denotes the negative tendency degree of the composite emotion word, and $\mathrm{Neg}(S_S)$ denotes a single sentence whose emotion value is less than 0;
step 2.2F, based on the positive tendency degree obtained in step 2.2D and the negative tendency degree obtained in step 2.2E, selecting the composite emotion words with $P(C_S)_{pos} > \delta$ as positive emotion words and the composite emotion words with $P(C_S)_{neg} > \delta$ as negative emotion words, and storing them in an emotion dictionary $D_C$, wherein $0 < \delta < 1$;
step 2.2G, recording in the emotion dictionary $D_C$ obtained in step 2.2F the weights of its words, as obtained in step 2.2C;
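Steps 2.2D to 2.2F can be sketched in Python as follows. Formulas (15) and (16) are not reproduced in this text; the sketch assumes that the positive (negative) tendency degree is the fraction of single sentences containing the word whose emotion value is greater (less) than 0, which is only one possible reading of the definitions of Pos and Neg. The function names and the default threshold of 0.8 are illustrative.

```python
def tendency_degrees(sentence_scores):
    """Return (positive, negative) tendency degrees of one composite word.

    sentence_scores: emotion values of the single sentences that
    contain the word. Assumed reading of formulas (15) and (16):
    share of those sentences with value > 0 and with value < 0.
    """
    total = len(sentence_scores)
    if total == 0:
        return 0.0, 0.0
    pos = sum(1 for s in sentence_scores if s > 0) / total
    neg = sum(1 for s in sentence_scores if s < 0) / total
    return pos, neg


def select_polarity(sentence_scores, delta=0.8):
    """Step 2.2F: keep the word only if one tendency exceeds delta."""
    pos, neg = tendency_degrees(sentence_scores)
    if pos > delta:
        return "positive"
    if neg > delta:
        return "negative"
    return None  # the word is not added to D_C


# Nine negative sentences and one positive sentence (made-up data).
print(select_polarity([-1.0] * 9 + [1.0]))
```

With a threshold below 0.5 both degrees could exceed it at once; this sketch then reports the positive label first, a choice the source does not specify.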
step 2.3, fusing the dictionaries constructed in step 2.1 and step 2.2, specifically comprising the following substeps:
step 2.3A, storing the emotion words of the emotion dictionary $D_L$ obtained in step 2.1J and the emotion words of the emotion dictionary $D_C$ obtained in step 2.2G in a new emotion dictionary $D_A$;
step 2.3B, calculating the emotion weight of each emotion word in $D_A$ through formula (17):
$$\mathrm{weight}(C_{D_A}) = \lambda \, \mathrm{weight}(C_{D_L}) + (1 - \lambda) \, \mathrm{weight}(C_{D_C}) \qquad (17)$$
wherein $\lambda$ is a weight with $0 < \lambda < 1$, $\mathrm{weight}$ denotes the weight of an emotion word in a dictionary, $C_{D_A}$ denotes an emotion word in $D_A$, $C_{D_L}$ denotes an emotion word in $D_L$, and $C_{D_C}$ denotes an emotion word in $D_C$;
step 2.3C, recording the emotion weights of the emotion words obtained in step 2.3B in the emotion dictionary $D_A$;
step 2.3D, comparing $D_A$ with the emotion dictionary downloaded in step 1.5: if all words in $D_A$ with weight greater than 0 are among the commendatory words of the downloaded emotion dictionary and all words with weight less than 0 are among its derogatory words, the construction of the emotion dictionary is finished; otherwise, taking $D_A$ as the new emotion dictionary, marking the emotion weight of the meta emotion words with weight greater than 0 in the new emotion dictionary as 1 and the emotion weight of the meta emotion words with weight less than 0 as -1, and jumping to step 1.6.
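The fusion of step 2.3 can be sketched in Python as follows. Formula (17) is taken from the text; how to treat a word that appears in only one of the two dictionaries is not stated, so this sketch assumes such a word contributes weight 0 from the dictionary that lacks it. The function names, the default λ of 0.5 and the sample entries are hypothetical.

```python
def fuse_dictionaries(d_l, d_c, lam=0.5):
    """Steps 2.3A to 2.3C: merge D_L and D_C into D_A with formula (17).

    Assumption: a word missing from one dictionary has weight 0 there.
    """
    words = set(d_l) | set(d_c)
    return {
        w: lam * d_l.get(w, 0.0) + (1 - lam) * d_c.get(w, 0.0)
        for w in words
    }


def is_finished(d_a, commendatory, derogatory):
    """Step 2.3D stop test.

    True when every positive-weight word is a known commendatory word
    and every negative-weight word is a known derogatory word.
    """
    return all(
        (weight <= 0 or word in commendatory)
        and (weight >= 0 or word in derogatory)
        for word, weight in d_a.items()
    )


# Illustrative dictionaries only.
d_a = fuse_dictionaries({"a": 1.0, "b": -0.5}, {"a": 0.5, "c": -1.0})
print(d_a)
print(is_finished(d_a, {"a"}, {"b", "c"}))
```

When `is_finished` returns False, the method as described resets the weights in the fused dictionary to 1 or -1 and repeats from step 1.6; that outer loop is omitted here.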
Advantageous effects
Compared with the prior art, the emotion dictionary construction method based on the emotion hierarchical system has the following beneficial effects:
1. the method obtains the emotion weights of unknown emotion words, such as new words from the network, using only known emotion dictionary resources, and the resulting emotion dictionary is more accurate and comprehensive;
2. the six-layer emotion analysis hierarchy constructed by the method covers the text elements of the whole emotion analysis, and each layer has a representation method and an emotion weight formula relating it to the layer above, so that an emotion analysis task can be subdivided down to the dimension of characters and the efficiency of emotion analysis is improved;
3. the emotion dictionary constructed by the method can be applied directly as a feature in an emotion analysis task, so that the accuracy of emotion analysis can be improved.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail by the following embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
The embodiment describes a specific implementation of the method for automatically constructing the emotion dictionary based on the hierarchical structure.
The emotion analysis hierarchy constructed by the invention is shown in Fig. 1, the implementation schematic diagram of this example is shown in Fig. 2, the flow chart of character-based emotion dictionary construction is shown in Fig. 3, and the flow chart of context-based emotion dictionary construction is shown in Fig. 4. The invention belongs to the field of emotion analysis and can obtain a more accurate and comprehensive emotion dictionary suited to the current corpus.
With the method provided by the invention, the corpus is divided into a six-layer hierarchy, which can improve the efficiency of emotion analysis; the emotion dictionary obtained by the method can also be used directly in the emotion analysis task and can improve its accuracy.
Table 4 associated word list
For the corpus to be analyzed, firstly, a six-layer emotion analysis hierarchy is constructed according to the step 1 introduced in the invention.
In the corpus, all data are in chapter format. In step 1.1 the chapters are segmented into compound sentences at punctuation marks such as periods, question marks and exclamation marks. In step 1.2 each compound sentence is divided into single sentences at commas and at the associated words listed in Table 4. In step 1.3 each single sentence is segmented into words using a word segmentation tool such as jieba or NLPIR. At the word level, composite emotion words containing negative words and degree words are reduced to meta emotion words such as "like", "offensive" and "happy": step 1.4 removes the negative words and degree words from the composite emotion words, and the weights of the resulting meta emotion words are obtained in step 1.5 by downloading an emotion dictionary such as HowNet, the Tsinghua University (Li Jun) dictionary or the National Taiwan University NTUSD. In step 1.6 the characters of each meta emotion word are separated, for example "like" (喜欢) into "喜" and "欢", so that the meta emotion words are segmented down to the dimension of characters.
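A minimal Python sketch of the segmentation in steps 1.1, 1.2 and 1.6 follows. It uses only regular expressions and leaves out the word segmentation of step 1.3, for which the text names jieba and NLPIR. The `ASSOCIATED_WORDS` list, the function names and the sample sentence are illustrative assumptions, since Table 4 is not reproduced here.

```python
import re

# Hypothetical associated words; the real list is Table 4 of the source.
ASSOCIATED_WORDS = ["但是", "而且", "所以"]


def split_compound_sentences(chapter: str) -> list[str]:
    """Step 1.1: split a chapter at periods, exclamation and question marks."""
    parts = re.split(r"[。！？.!?]", chapter)
    return [p.strip() for p in parts if p.strip()]


def split_single_sentences(compound: str) -> list[str]:
    """Step 1.2: split a compound sentence at commas and associated words.

    Note: this sketch discards the associated words themselves; a fuller
    version would keep them, because their weights are needed later.
    """
    pattern = "|".join(["[，,]"] + [re.escape(w) for w in ASSOCIATED_WORDS])
    parts = re.split(pattern, compound)
    return [p.strip() for p in parts if p.strip()]


def split_characters(meta_word: str) -> list[str]:
    """Step 1.6: split a meta emotion word into its characters."""
    return list(meta_word)


# "I like this film, but the ending is terrible. What do you think?"
chapter = "我喜欢这部电影，但是结尾很糟糕。你觉得呢？"
for compound in split_compound_sentences(chapter):
    print(split_single_sentences(compound))
print(split_characters("喜欢"))
```

The nested lists returned here correspond to the chapter, compound sentence and single sentence layers; the word layers would be filled in by the segmentation tool.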
In emotion analysis a single character carries no emotion value, so emotion values are calculated upward starting from the word layer. The emotion weight of a composite emotion word is obtained from its meta emotion word: for example, if the weight of "like" is 1, the weight of "dislike" is -1, as calculated through formula (1). The emotion value of a single sentence is then obtained in step 1.8 and the emotion value of a compound sentence in step 1.9, so that the six-layer emotion hierarchy is constructed for the whole corpus.
Unknown emotion words such as "waste blue" and " " are handled as follows. First, the emotion value is calculated by the method of step 2.1. Taking "waste blue" as an example, the probability of the character "waste" occurring together with the positive emotion words of the known emotion dictionary gives the positive emotional tendency of "waste"; the positive emotional tendency of "blue" is obtained in the same way, and from the two the positive emotional tendency of "waste blue" is obtained. The negative emotional tendency of "waste blue" is obtained by the same method, and subtracting the negative tendency from the positive tendency gives the emotion weight of "waste blue" under the method of step 2.1.
Next, the emotion value is calculated by the method of step 2.2. Again taking "waste blue" as an example, the emotion value of each single sentence containing "waste blue" is obtained first. In "I dislike waste blue", the polarity of the sentence is negative because of "dislike"; assuming its weight is -1, the emotion weight of "waste blue" in this sentence is -1/2 = -0.5. In another sentence, "pests, and waste blue is also the same", the polarity of the latter clause "waste blue is also the same" is derived from the former clause and is likewise negative; if its weight is -1, the emotion weight of "waste blue" in that clause is -1. The weights of "waste blue" in all the sentences in which it occurs are then averaged to give its weight in the corpus. Whether to add the word to the new dictionary is decided by whether the probability of its appearing in positive-leaning or negative-leaning sentences exceeds a threshold, such as 0.8.
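The arithmetic of the step 2.2 example can be sketched in Python. Formulas (13) and (14) are not reproduced in this text, so the sketch assumes, from the worked numbers alone, that a sentence's emotion value is shared equally among its composite emotion words and that the corpus weight is the plain average over the sentences containing the word. The function names are hypothetical.

```python
def word_weight_in_sentence(sentence_score: float, n_emotion_words: int) -> float:
    """Assumed reading of formula (13): equal share of the sentence value."""
    return sentence_score / n_emotion_words


def word_weight_in_corpus(per_sentence_weights: list[float]) -> float:
    """Assumed reading of formula (14): average over all occurrences."""
    return sum(per_sentence_weights) / len(per_sentence_weights)


# "I dislike waste blue": sentence value -1, two composite emotion words.
w1 = word_weight_in_sentence(-1.0, 2)
# Second clause: sentence value -1, "waste blue" is the only emotion word.
w2 = word_weight_in_sentence(-1.0, 1)

print(w1, w2, word_weight_in_corpus([w1, w2]))
```

Under these assumptions the two occurrences give -0.5 and -1, and their average, -0.75, would be the weight of "waste blue" in a corpus consisting of just these two sentences.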
After this processing, the corpus is divided into the six-layer hierarchy and an emotion dictionary suited to the corpus is obtained. During emotion analysis, existing emotion analysis methods can work directly on the six-layer hierarchy and calculate emotion weights from the obtained emotion dictionary, which can improve the efficiency and accuracy of emotion analysis.
While the foregoing is directed to the preferred embodiment of the present invention, it is not intended that the invention be limited to the embodiment and the drawings disclosed herein. Equivalents and modifications may be made without departing from the spirit of the disclosure, which is to be considered as within the scope of the invention.