CN110991167B - Emotion dictionary construction method based on emotion hierarchy system - Google Patents

Publication number: CN110991167B
Authority: CN (China)
Prior art keywords: emotion, words, weight, emotional, word
Legal status: Active (as listed by Google Patents; not a legal conclusion)
Application number: CN201911233518.9A
Other languages: Chinese (zh)
Other versions: CN110991167A (en)
Inventors: 张宝华, 张华平, 商建云
Current Assignee: LINGJOIN ZHONGKE SOFTWARE (BEIJING) Co.,Ltd.
Original Assignee: Beijing Institute of Technology BIT
Application filed by: Beijing Institute of Technology BIT
Priority: CN201911233518.9A
Publications: CN110991167A (application), CN110991167B (grant)

Abstract

The invention relates to an emotion dictionary construction method based on an emotion hierarchy system, and belongs to the field of emotion analysis. The method comprises the following steps. Step one: splitting the corpus according to the emotion hierarchy system and extracting unknown emotion words. Step two: calculating the weight of each unknown emotion word from its constituent characters, and constructing a character-based emotion dictionary. Step three: inferring, from the compound sentence, the emotion value of each single sentence whose emotion value is unknown, calculating the weight of the unknown emotion words from this context, and constructing a context-based emotion dictionary. Step four: fusing the emotion dictionaries obtained in steps two and three. Step five: iterating the calculation with the new emotion dictionary until no new emotion words are found. The method divides the corpus into a six-layer hierarchy and then calculates weights by the character-composition and context methods to obtain a more accurate and comprehensive emotion dictionary. Applying the resulting emotion dictionary to emotion analysis tasks can improve the efficiency and accuracy of emotion analysis.

Description

Emotion dictionary construction method based on emotion hierarchy system
Technical Field
The invention relates to an emotion dictionary construction method based on an emotion hierarchy system, and belongs to the field of emotion analysis.
Background
With the rapid development of the internet, the number of online interactive platforms keeps growing, and more and more users express their opinions online. On microblogs and forums in particular, current hot news is readily commented on and forwarded in large volume. Sentiment analysis is an indispensable technical means of quickly extracting the viewpoints in these comments and the attitudes of the public.
Current emotion analysis techniques fall into two main categories: traditional methods based on syntactic rules and emotion dictionaries, and methods based on machine learning and deep learning. Rule-based methods focus on sentences: each sentence is split into emotion words, and the emotion analysis of the sentence is completed by applying an existing emotion dictionary and grammar rules under certain conditions. Depending on the analysis method, some approaches give only the overall positive or negative polarity of a sentence, while others give a specific score; specific rules can also be set for a target word in order to analyze the emotion toward that word. Deep-learning methods treat emotion analysis as a text classification task. Their output is the positive or negative polarity of the text, and the most common settings are binary and three-class classification.
Whichever method is used, a comprehensive emotion dictionary suited to the domain can greatly improve the accuracy of emotion analysis. Many methods exist for constructing an emotion dictionary, such as PMI, SO-PMI and word2vec-based similarity. They obtain the weight of a candidate emotion word from the semantic information of known and candidate emotion words, using synonyms, antonyms or similarity. However, new words such as 'waste blue' appear on the internet frequently and in large numbers. Because their semantic information is difficult to obtain, existing methods cannot assign them emotion weights, even though such words are important for emotion analysis.
Therefore, on the basis of a thorough study of the elements involved in emotion analysis, the method constructs a six-layer emotion analysis hierarchy from characters to chapters, in which each layer has a representation in terms of the layer above it and a representation of its emotion value. On this basis, the method provides an automatic emotion dictionary construction method based on character composition and context: the emotion weight of an unknown emotion word is calculated from the emotional tendency of its constituent characters and from the emotion value of its context. The weight of a candidate emotion word can thus be obtained from existing emotion dictionary resources alone, without semantic information about the candidate word. New words on the network are thereby given corresponding weights, which improves the accuracy of emotion analysis on network text.
Disclosure of Invention
The invention aims first to organize the text elements involved in emotion analysis into a six-layer hierarchy, in which each layer has a representation and an emotion formula relating it to the layer above. On this basis, the weight of an unknown emotion word can be obtained by combining the emotional tendencies of the layers above and below it. Only existing emotion dictionary resources are needed; no other information, such as the semantics of the unknown emotion word, is required.
The core idea of the invention is as follows: the emotion analysis task can be subdivided down to the character level, and the emotion value of a chapter can be obtained layer by layer from the bottom up. For an unknown emotion word, the emotion values of the layers below and above it can both be obtained: the lower-layer values come directly from the existing emotion dictionary, and the upper-layer values are obtained by means of syntax and sentence patterns. The emotion weight of the unknown emotion word is then derived from these lower-layer and upper-layer values, which are combined by weighting.
The emotion dictionary construction method based on an emotion hierarchy system comprises constructing an emotion hierarchy system and constructing an emotion dictionary;
the emotion hierarchy system is the six-layer emotion hierarchy system shown in Table 1;
Table 1. Levels of the emotion analysis hierarchy and their representations
[Table 1: image in the original publication; not reproduced]
Here $L$ denotes a character; $W_N$ denotes a negation word; $W_D$ denotes a degree word; $W_S$ denotes a simple emotion word, also called a meta-emotion word, i.e. an emotion word without negation words or degree words; $W_R$ denotes an associated word (connective); $C_S$ denotes a composite emotion word, i.e. an emotion word together with its negation words and degree words; $S_S$ denotes a single sentence, i.e. a sentence containing no associated words and no sentence-separating punctuation; $M_S$ denotes a compound sentence, i.e. several single sentences connected by associated words or punctuation marks; and $Q$ denotes a chapter;
step 1, constructing the emotion hierarchy, which specifically comprises the following substeps:
step 1.1, segmenting the chapters in the corpus to be analyzed into compound sentences at periods, exclamation marks and question marks;
step 1.2, segmenting the compound sentences obtained in step 1.1 into single sentences at associated words and commas;
step 1.3, segmenting each single sentence obtained in step 1.2 into words with a word segmentation tool to obtain composite emotion words;
step 1.4, extracting the degree words and negation words from the composite emotion words obtained in step 1.3 to obtain meta-emotion words;
step 1.5, downloading a known emotion dictionary, and marking the emotion weight of meta-emotion words among its commendatory words as 1 and the emotion weight of meta-emotion words among its derogatory words as -1;
step 1.6, segmenting the meta-emotion words obtained in step 1.4 into single characters;
step 1.7, obtaining the emotion weight of the corresponding composite emotion word through formula (1), based on the meta-emotion words obtained in step 1.4 and the emotion weights marked in step 1.5:
$\mathrm{score}(C_S) = \mathrm{weight}(W_S) \cdot \beta$ (1)
wherein $\mathrm{score}(C_S)$ is the emotion weight of the composite emotion word, $W_S$ denotes the meta-emotion word, $\mathrm{weight}(W_S)$ is the meta-emotion word weight, and $\beta$ is a weighting coefficient calculated by formula (2):
[Formula (2): image in the original publication; not reproduced]
wherein $\mathrm{weight}(W_D)$ is the degree word weight, and $\varepsilon$ is a scale factor with $0 < \varepsilon < 1$;
step 1.8, obtaining the emotion value of the corresponding single sentence through formula (3), based on the emotion weights of the composite emotion words obtained in step 1.7:
[Formula (3): image in the original publication; not reproduced]
wherein $\mathrm{Score}(S_S)$ is the emotion value of the single sentence $S_S$, and $n$ is the number of composite emotion words $C_S$ contained in the single sentence;
step 1.9, obtaining the emotion value of the corresponding compound sentence through formula (4), based on the single-sentence emotion values obtained in step 1.8:
[Formula (4): image in the original publication; not reproduced]
wherein $W_{Ri}$ is the associated word of the corresponding single sentence $S_{Si}$, $S_{Si}$ is the i-th single sentence $S_S$, and $\mathrm{weight}(W_{Ri})$ is the weight of the associated word $W_{Ri}$, determined by its type and position as shown in Table 2:
Table 2. Weights of different types of associated words at different positions
[Table 2: image in the original publication; not reproduced]
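The segmentation in steps 1.1 and 1.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the punctuation sets and the `ASSOCIATED_WORDS` list are placeholders (the associated word list of Table 4 is not reproduced in this text), and the function names are hypothetical.

```python
import re

# Assumed punctuation sets; the patent names periods, exclamation marks,
# question marks (step 1.1) and commas (step 1.2).
SENTENCE_END = "。！？.!?"
CLAUSE_MARKS = "，,"
# Hypothetical sample of associated words (connectives); Table 4 is not reproduced.
ASSOCIATED_WORDS = ("但是", "因为", "所以")


def split_chapter(chapter):
    """Step 1.1: split a chapter into compound sentences at end punctuation."""
    parts = re.split(f"[{re.escape(SENTENCE_END)}]", chapter)
    return [p.strip() for p in parts if p.strip()]


def split_compound(compound, associated_words=ASSOCIATED_WORDS):
    """Step 1.2: split a compound sentence into single sentences at
    commas and associated words."""
    alternatives = [f"[{re.escape(CLAUSE_MARKS)}]"]
    alternatives += [re.escape(w) for w in associated_words]
    parts = re.split("|".join(alternatives), compound)
    return [p.strip() for p in parts if p.strip()]
```

This sketch discards the associated word at each split point. A fuller version would keep it alongside each single sentence, since formula (4) weights each single sentence by its associated word.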
Step 2, constructing an emotion dictionary, comprising the following substeps:
step 2.1, obtaining the emotional tendency of unknown emotion words from their constituent characters in the constructed emotion hierarchy, specifically comprising the following substeps:
step 2.1A, setting the emotion weight of the commendatory words in the emotion dictionary downloaded in step 1.5 to 1 and the emotion weight of the derogatory words to -1, and splitting the composite emotion words of the emotion dictionary into meta-emotion words;
step 2.1B, calculating the weight of the meta-emotion words by formula (5):
[Formula (5): image in the original publication; not reproduced]
wherein $\beta$ is a weighting coefficient calculated by formula (2);
step 2.1C, marking as unknown emotion words those composite emotion words obtained in step 1.3 that are not in the emotion dictionary downloaded in step 1.5;
step 2.1D, judging one by one whether each unknown emotion word obtained in step 2.1C is a composite emotion word; if so, splitting it into simple emotion words and proceeding to step 2.1E; otherwise, the unknown emotion word is itself a simple emotion word, and proceeding to step 2.1E;
step 2.1E, obtaining the emotional tendency of each constituent character $L$ through formula (6), based on the unknown simple emotion words obtained in step 2.1D:
[Formula (6): image in the original publication; not reproduced]
wherein the emotional tendency $P(W_s^{*}, L_i)$ is the probability that $W_s^{*}$ and $L_i$ occur together in the corpus; $P(W_s^{*})$ is the probability that $W_s^{*}$ occurs in the corpus, calculated by formula (7); $\mathrm{Freq}(W_s^{*}, L_i)$ is the number of times $W_s^{*}$ and $L_i$ occur together in the corpus; $\mathrm{Freq}(W_s^{*})$ is the number of times $W_s^{*}$ occurs in the corpus; and $\sigma$ is a very large number, chosen as the total number of words in the corpus;
[Formula (7): image in the original publication; not reproduced]
wherein $W_s^{*}$ denotes a known emotion word, and the remaining symbol in formula (7) (shown as an image in the original publication; not reproduced) denotes the number of words in the corpus;
step 2.1F, calculating the positive emotion weight of the unknown simple emotion word through formula (8), according to the emotional tendencies of the characters obtained in step 2.1E:
[Formula (8): image in the original publication; not reproduced]
wherein $P(W_s^{\#})_{pos}$ is the positive emotion weight of the unknown simple emotion word, $W_s^{\#}$ denotes the unknown simple emotion word, $W_{s,pos}^{*}$ denotes a commendatory word in the emotion dictionary downloaded in step 1.5, $L_0, \dots, L_n$ are the constituent characters of $W_s^{\#}$, and $P(W_s^{\#} \mid W_{s,pos}^{*})$ is the positive emotion weight of the unknown emotion word conditioned on the emotion dictionary downloaded in step 1.5;
step 2.1G, calculating the negative emotion weight of the unknown simple emotion word through formula (9), according to the emotional tendencies of the characters obtained in step 2.1E:
[Formula (9): image in the original publication; not reproduced]
wherein $P(W_s^{\#})_{neg}$ is the negative emotion weight of the unknown simple emotion word, $W_s^{\#}$ denotes the unknown simple emotion word, $W_{s,neg}^{*}$ denotes a derogatory word in the emotion dictionary downloaded in step 1.5, and $P(W_s^{\#} \mid W_{s,neg}^{*})$ is the negative emotion weight of the unknown emotion word conditioned on the downloaded emotion dictionary;
step 2.1H, subtracting the negative emotion weight output in step 2.1G from the positive emotion weight output in step 2.1F to obtain the emotion weight of the unknown simple emotion word, as shown in formula (10):
$P(W_s^{\#}) = P(W_s^{\#})_{pos} - P(W_s^{\#})_{neg}$ (10)
wherein $P(W_s^{\#})_{pos}$ is the positive weight of the unknown simple emotion word and $P(W_s^{\#})_{neg}$ is its negative weight;
step 2.1I, obtaining the weight of the composite emotion word corresponding to the unknown simple emotion word through formula (11), based on the weight of the unknown simple emotion word obtained in step 2.1H:
$\mathrm{score}(C_s^{\#}) = P(W_s^{\#}) \cdot \beta$ (11)
wherein $\mathrm{score}(C_s^{\#})$ is the emotion weight of the composite emotion word, $W_s^{\#}$ denotes the unknown simple emotion word, $P(W_s^{\#})$ is the unknown simple emotion word weight obtained in step 2.1H, and $\beta$ is a weighting coefficient calculated by formula (12):
[Formula (12): image in the original publication; not reproduced]
wherein $\mathrm{weight}(W_D)$ is the degree word weight, and $\varepsilon$ is a scale factor with $0 < \varepsilon < 1$;
step 2.1J, storing the composite emotion words obtained in step 2.1I together with their weights to construct the emotion dictionary $D_L$;
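The flow of steps 2.1E to 2.1I can be sketched in Python as follows. Only formulas (10) and (11) are taken directly from the text. Formulas (6) to (9) are images that are not reproduced here, so `char_tendency` and `polarity_weight` are simplified stand-ins chosen for illustration (the share of known words containing a character, averaged over the characters of the unknown word) and should not be read as the patent's actual formulas. All function names are hypothetical, and `beta` is supplied by the caller because formula (12) is likewise not reproduced.

```python
def char_tendency(char, known_words):
    """Stand-in for formula (6): share of the known words that contain `char`."""
    if not known_words:
        return 0.0
    return sum(1 for w in known_words if char in w) / len(known_words)


def polarity_weight(unknown_word, known_words):
    """Stand-in for formulas (8)/(9): mean tendency over the word's characters."""
    chars = list(unknown_word)
    if not chars:
        return 0.0
    return sum(char_tendency(c, known_words) for c in chars) / len(chars)


def emotion_weight(unknown_word, pos_words, neg_words):
    """Formula (10): positive weight minus negative weight."""
    pos = polarity_weight(unknown_word, pos_words)
    neg = polarity_weight(unknown_word, neg_words)
    return pos - neg


def composite_weight(simple_weight, beta):
    """Formula (11): score(C_s#) = P(W_s#) * beta, with beta given by the caller."""
    return simple_weight * beta
```

For example, with commendatory words `["喜欢", "欢乐"]` and derogatory words `["讨厌", "厌恶"]`, an unknown word built only from characters seen in commendatory words receives a positive weight, and one built only from characters seen in derogatory words receives a negative weight.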
step 2.2, constructing an emotion dictionary based on context, specifically comprising the following substeps:
step 2.2A, calculating, according to Table 3, the emotion value of each single sentence whose emotion value is 0 in the emotion analysis hierarchy obtained in step 1;
Table 3. Formulas for inferring the emotion value of an unknown-emotion single sentence from different types of compound sentences
[Table 3: image in the original publication; not reproduced]
wherein $\alpha$ is a weight factor with $\alpha > 1$; the three symbols used in Table 3 (shown as images in the original publication; not reproduced) denote, respectively, the single sentence whose emotion value is currently being calculated, the single sentence preceding it, and the single sentence following it;
step 2.2B, obtaining the emotion weight of each composite emotion word in the single sentence through formula (13), based on the single-sentence emotion value obtained in step 2.2A:
[Formula (13): image in the original publication; not reproduced]
wherein $N$ is the number of composite emotion words in the single sentence, and $\mathrm{score}(C_{si})$ is the emotion value of the composite emotion word $C_s$ in the j-th single sentence $S_{Sj}$;
step 2.2C, obtaining the weight of the composite emotion word in the whole corpus through formula (14), based on the weights of the composite emotion word obtained in step 2.2B:
[Formula (14): image in the original publication; not reproduced]
wherein $J$ is the number of times the composite emotion word appears in the corpus;
step 2.2D, obtaining the degree of positive tendency of the composite emotion word through formula (15):
[Formula (15): image in the original publication; not reproduced]
wherein $P(C_s)_{pos}$ is the degree of positive tendency of the composite emotion word, and $\mathrm{Pos}(S_S)$ denotes a single sentence whose emotion value is greater than 0;
step 2.2E, obtaining the degree of negative tendency of the composite emotion word through formula (16):
[Formula (16): image in the original publication; not reproduced]
wherein $P(C_s)_{neg}$ is the degree of negative tendency of the composite emotion word, and $\mathrm{Neg}(S_S)$ denotes a single sentence whose emotion value is less than 0;
step 2.2F, based on the degree of positive tendency obtained in step 2.2D and the degree of negative tendency obtained in step 2.2E, selecting the composite emotion words with $P(C_s)_{pos} > \delta$ as positive emotion words and those with $P(C_s)_{neg} > \delta$ as negative emotion words, and storing them in the emotion dictionary $D_C$, wherein $0 < \delta < 1$;
step 2.2G, recording, for the words in the emotion dictionary $D_C$ obtained in step 2.2F, the weights of the composite emotion words obtained in step 2.2C;
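Steps 2.2B to 2.2G can be sketched in Python as follows. Formulas (13) to (16) are images that are not reproduced in this text, so the sketch relies on the worked example in the embodiment as an assumption: a single sentence's emotion value is shared equally among its N composite emotion words, the corpus weight is the mean over the J occurrences, and the positive or negative tendency is the share of occurrences in sentences with a positive or negative emotion value. The function name and the input format are hypothetical.

```python
from collections import defaultdict


def build_context_dictionary(sentences, delta=0.8):
    """Build a context-based dictionary D_C.

    `sentences` is a list of (sentence_emotion_value, composite_words) pairs.
    Returns {word: corpus_weight} for words whose positive or negative
    tendency exceeds `delta` (0 < delta < 1).
    """
    observations = defaultdict(list)  # word -> [(share, sentence_value), ...]
    for value, words in sentences:
        if not words:
            continue
        share = value / len(words)  # assumed reading of formula (13)
        for word in words:
            observations[word].append((share, value))

    dictionary = {}
    for word, obs in observations.items():
        count = len(obs)  # J: occurrences of the word in the corpus
        weight = sum(share for share, _ in obs) / count  # assumed formula (14)
        pos = sum(1 for _, v in obs if v > 0) / count    # assumed formula (15)
        neg = sum(1 for _, v in obs if v < 0) / count    # assumed formula (16)
        if pos > delta or neg > delta:                   # step 2.2F
            dictionary[word] = weight                    # step 2.2G
    return dictionary
```

With the two sentences from the embodiment, `[(-1.0, ["dislike", "waste blue"]), (-1.0, ["waste blue"])]`, the word 'waste blue' receives the shares -0.5 and -1, so its corpus weight is their mean.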
step 2.3, fusing the dictionaries constructed in step 2.1 and step 2.2, specifically comprising the following substeps:
step 2.3A, storing the emotion words in the emotion dictionary $D_L$ obtained in step 2.1J and the emotion words in the emotion dictionary $D_C$ obtained in step 2.2G in a new emotion dictionary $D_A$;
step 2.3B, calculating the emotion weight of each emotion word in $D_A$ by formula (17):
$\mathrm{weight}(C_{D_A}) = \lambda\,\mathrm{weight}(C_{D_L}) + (1-\lambda)\,\mathrm{weight}(C_{D_C})$ (17)
wherein $\lambda$ is a weighting coefficient with $0 < \lambda < 1$, $\mathrm{weight}$ denotes the weight of an emotion word in a dictionary, $C_{D_A}$ denotes an emotion word in $D_A$, $C_{D_L}$ denotes an emotion word in $D_L$, and $C_{D_C}$ denotes an emotion word in $D_C$;
step 2.3C, recording the emotion weights of the emotion words obtained in step 2.3B in the emotion dictionary $D_A$;
step 2.3D, comparing $D_A$ with the emotion dictionary downloaded in step 1.5: if all words in $D_A$ with weight greater than 0 are among the commendatory words of the emotion dictionary downloaded in step 1.5 and all words with weight less than 0 are among its derogatory words, the construction of the emotion dictionary is finished; otherwise, taking $D_A$ as the new emotion dictionary, marking the emotion weight of the meta-emotion words with weight greater than 0 in the new emotion dictionary as 1 and that of the meta-emotion words with weight less than 0 as -1, and returning to step 1.6.
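The fusion and stopping test of step 2.3 can be sketched in Python as follows. Formula (17) is implemented as written. Two points are assumptions, since the text does not specify them: a word missing from one of the two dictionaries is treated as having weight 0 there, and words whose fused weight is exactly 0 are dropped when relabeling. The function names are hypothetical.

```python
def fuse_dictionaries(d_l, d_c, lam=0.5):
    """Formula (17): weight(D_A) = lam * weight(D_L) + (1 - lam) * weight(D_C)."""
    if not 0 < lam < 1:
        raise ValueError("lam must satisfy 0 < lam < 1")
    fused = {}
    for word in d_l.keys() | d_c.keys():
        # Assumption: a word absent from one dictionary contributes weight 0.
        fused[word] = lam * d_l.get(word, 0.0) + (1 - lam) * d_c.get(word, 0.0)
    return fused


def is_converged(d_a, commendatory, derogatory):
    """Step 2.3D: stop when every positive word is already a known commendatory
    word and every negative word is already a known derogatory word."""
    for word, weight in d_a.items():
        if weight > 0 and word not in commendatory:
            return False
        if weight < 0 and word not in derogatory:
            return False
    return True


def relabel(d_a):
    """Step 2.3D: mark weights above 0 as 1 and below 0 as -1 for the next pass."""
    return {w: (1 if v > 0 else -1) for w, v in d_a.items() if v != 0}
```

A driver loop would call `fuse_dictionaries`, test `is_converged` against the current commendatory and derogatory word sets, and, if the test fails, pass the result of `relabel` back in as the known dictionary for the next iteration.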
Advantageous effects
Compared with the prior art, the emotion dictionary construction method based on an emotion hierarchy system has the following beneficial effects:
1. the method obtains the emotion weight of unknown emotion words, such as new network words, using only known emotion dictionary resources, and the resulting emotion dictionary is more accurate and comprehensive;
2. the six-layer emotion analysis hierarchy constructed by the method covers all the text elements of emotion analysis, and each layer has a representation and an emotion weight formula relating it to the layer above, so that an emotion analysis task can be subdivided down to the character level, which improves the efficiency of emotion analysis;
3. the emotion dictionary constructed by the method can be used directly as a feature in an emotion analysis task, which can improve the accuracy of emotion analysis.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
FIG. 1 shows the emotion analysis hierarchy constructed in step 1 of the emotion dictionary construction method based on a hierarchical structure;
FIG. 2 is a flowchart of emotion unit construction in substep 2.3 of step 2 of the emotion dictionary construction method based on a hierarchical structure;
FIG. 3 is a flowchart of the character-based emotion dictionary construction in substep 2.1 of step 2 of the emotion dictionary construction method based on a hierarchical structure;
FIG. 4 is a flowchart of the context-based emotion dictionary construction in substep 2.2 of step 2 of the emotion dictionary construction method based on a hierarchical structure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail by the following embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
This embodiment describes a specific implementation of the method for automatically constructing an emotion dictionary based on a hierarchical structure.
The emotion analysis hierarchy constructed by the invention is shown in FIG. 1, the implementation schematic of this example in FIG. 2, the flowchart of character-based emotion dictionary construction in FIG. 3, and the flowchart of context-based emotion dictionary construction in FIG. 4. The invention belongs to the field of emotion analysis and yields a more accurate and comprehensive emotion dictionary suited to the current corpus.
With the method provided by the invention, the corpus is divided into a six-layer hierarchy, which can improve the efficiency of emotion analysis. The emotion dictionary obtained by the method can also be used directly in emotion analysis tasks and can improve their accuracy.
Table 4. Associated word list
[Table 4: image in the original publication; not reproduced]
For the corpus to be analyzed, a six-layer emotion analysis hierarchy is first constructed according to step 1 of the invention.
In the corpus, all data are in chapter format. Step 1.1 segments the chapters into compound sentences at punctuation marks such as periods, question marks and exclamation marks. Step 1.2 then divides each compound sentence into single sentences at the associated words listed in Table 4 and at commas. Each single sentence can be segmented into words with a word segmentation tool such as jieba or NLPIR. At the word level, composite emotion words that contain negation words and degree words are reduced to meta-emotion words such as 'like', 'offensive' and 'happy': step 1.4 removes the negation words and degree words from the composite emotion words to obtain the meta-emotion words, whose weights can be obtained by downloading emotion dictionaries such as HowNet, the Tsinghua University Li Jun dictionary and the National Taiwan University NTUSD. Step 1.6 then separates the meta-emotion words into characters; for example, 'like' (喜欢) is separated into '喜' and '欢', so that the meta-emotion words are segmented down to the character level.
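The reduction of a composite emotion word to its meta-emotion word (step 1.4) and the split into characters (step 1.6) can be sketched in Python as follows. The sketch assumes the single sentence has already been tokenized, for example by a word segmentation tool. The `NEGATION_WORDS` and `DEGREE_WORDS` sets are small hypothetical samples rather than the lists used by the invention, and the function names are illustrative.

```python
# Hypothetical sample word lists; a real system would load fuller lexicons.
NEGATION_WORDS = {"不", "没"}
DEGREE_WORDS = {"很", "非常", "太"}


def split_composite(tokens):
    """Step 1.4: separate negation words and degree words from the remaining
    tokens, which are treated as meta-emotion words."""
    negations = [t for t in tokens if t in NEGATION_WORDS]
    degrees = [t for t in tokens if t in DEGREE_WORDS]
    meta = [t for t in tokens
            if t not in NEGATION_WORDS and t not in DEGREE_WORDS]
    return negations, degrees, meta


def to_characters(meta_word):
    """Step 1.6: split a meta-emotion word into its single characters."""
    return list(meta_word)
```

For instance, the token sequence `["非常", "不", "喜欢"]` yields one negation word, one degree word and the meta-emotion word `"喜欢"`, which `to_characters` then splits into `"喜"` and `"欢"`.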
In emotion analysis, a single character carries no emotion value of its own, so emotion values are calculated upward starting from the word level. The emotion weight of a composite emotion word is obtained from its meta-emotion word: for example, with the weight of 'like' being 1 and the weight of 'dislike' being -1, the weight of the composite emotion word is calculated through formula (1). The emotion value of a single sentence is then obtained through step 1.8, and the emotion value of a compound sentence through step 1.9, so that the six-layer emotion hierarchy is constructed for the whole corpus.
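Formula (1) can be illustrated with a short Python sketch. The coefficient β comes from formula (2), which is an image not reproduced in this text, so here it is simply passed in by the caller. Treating a word that is missing from the dictionary as having weight 0 is an assumption made for this illustration, and the function name and sample dictionary are hypothetical.

```python
def composite_score(meta_word, beta, dictionary):
    """Formula (1): score(C_S) = weight(W_S) * beta.

    `dictionary` maps meta-emotion words to +1 or -1 (step 1.5).
    Assumption: words absent from the dictionary score 0 (unknown).
    """
    return dictionary.get(meta_word, 0) * beta


# Hypothetical seed dictionary following the weights named in the text.
seed = {"like": 1, "dislike": -1}
like_score = composite_score("like", 1.5, seed)        # meta word scaled by beta
dislike_score = composite_score("dislike", 1.5, seed)  # same beta, opposite sign
```

Keeping β as an explicit argument makes it easy to substitute the real formula (2) once the degree word weight, negation handling and scale factor ε are available.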
When an unknown emotion word such as 'waste blue' is encountered, its emotion value is first calculated by the method of step 2.1. Taking 'waste blue' as an example: the probability that 'waste' occurs together with the positive emotion words of the known emotion dictionary gives the positive emotional tendency of 'waste'; the positive emotional tendency of 'blue' is obtained in the same way, and together they give the positive emotional tendency of 'waste blue'. The negative emotional tendency of 'waste blue' is obtained by the same method, and subtracting the negative tendency from the positive tendency gives the emotion weight of 'waste blue' under the method of step 2.1. The emotion value is then calculated by the method of step 2.2. Again taking 'waste blue' as an example, the emotion value of each single sentence containing 'waste blue' is obtained first. For 'I dislike waste blue', the polarity of the sentence is negative because of 'dislike'; assuming its emotion value is -1, the emotion weight of 'waste blue' in the sentence is -1/2 = -0.5. In another compound sentence, roughly '[...] are pests, and waste blue is the same', the polarity of the latter clause, 'waste blue is the same', can be inferred to be negative from the former clause; assuming its emotion value is -1, the emotion weight of 'waste blue' in that sentence is -1. The weights of 'waste blue' in all the sentences that contain it are then averaged to give the weight of 'waste blue' in the corpus. Whether the word is added to the new dictionary is decided by whether the proportion of positive-leaning or negative-leaning sentences in which it appears exceeds a threshold, such as 0.8.
Processing the corpus in this way divides it into a six-layer hierarchy and yields an emotion dictionary suited to the corpus. During emotion analysis, existing emotion analysis methods can work directly on the six-layer hierarchy and calculate emotion weights from the resulting emotion dictionary, which can improve the efficiency and accuracy of emotion analysis.
While the foregoing is directed to the preferred embodiment of the present invention, it is not intended that the invention be limited to the embodiment and the drawings disclosed herein. Equivalents and modifications may be made without departing from the spirit of the disclosure, which is to be considered as within the scope of the invention.

Claims (3)

1. An emotion dictionary construction method based on an emotion hierarchy system, characterized by comprising: constructing an emotion hierarchy system and constructing an emotion dictionary;
the emotion hierarchy system is the six-layer emotion hierarchy system shown in Table 1;
Table 1. Levels of the emotion analysis hierarchy and their representations
[Table 1: image in the original publication; not reproduced]
wherein $L$ denotes a character; $W_N$ denotes a negation word; $W_D$ denotes a degree word; $W_S$ denotes a simple emotion word, also called a meta-emotion word, i.e. an emotion word without negation words or degree words; $W_R$ denotes an associated word (connective); $C_S$ denotes a composite emotion word, i.e. an emotion word together with its negation words and degree words; $S_S$ denotes a single sentence, i.e. a sentence containing no associated words and no sentence-separating punctuation; $M_S$ denotes a compound sentence, i.e. several single sentences connected by associated words or punctuation marks; and $Q$ denotes a chapter;
step 1, constructing the emotion hierarchy, which specifically comprises the following substeps:
step 1.1, segmenting the chapters in the corpus to be analyzed into compound sentences at periods, exclamation marks and question marks;
step 1.2, segmenting the compound sentences obtained in step 1.1 into single sentences at associated words and commas;
step 1.3, segmenting each single sentence obtained in step 1.2 into words with a word segmentation tool to obtain composite emotion words;
step 1.4, extracting the degree words and negation words from the composite emotion words obtained in step 1.3 to obtain meta-emotion words;
step 1.5, downloading a known emotion dictionary, and marking the emotion weight of meta-emotion words among its commendatory words as 1 and the emotion weight of meta-emotion words among its derogatory words as -1;
step 1.6, segmenting the meta-emotion words obtained in step 1.4 into single characters;
step 1.7, obtaining the emotion weight of the corresponding composite emotion word through formula (1), based on the meta-emotion words obtained in step 1.4 and the emotion weights marked in step 1.5:
$\mathrm{score}(C_S) = \mathrm{weight}(W_S) \cdot \beta$ (1)
wherein $\mathrm{score}(C_S)$ is the emotion weight of the composite emotion word, $W_S$ denotes the meta-emotion word, $\mathrm{weight}(W_S)$ is the meta-emotion word weight, and $\beta$ is a weighting coefficient calculated by formula (2):
[Formula (2): image in the original publication; not reproduced]
wherein $\mathrm{weight}(W_D)$ is the degree word weight, and $\varepsilon$ is a scale factor;
step 1.8, obtaining the emotion value of the corresponding single sentence through formula (3), based on the emotion weights of the composite emotion words obtained in step 1.7:
[Formula (3): image in the original publication; not reproduced]
wherein $\mathrm{Score}(S_S)$ is the emotion value of the single sentence $S_S$, and $n$ is the number of composite emotion words $C_S$ contained in the single sentence;
step 1.9, obtaining the emotion value of the corresponding compound sentence through formula (4), based on the single-sentence emotion values obtained in step 1.8:
[Formula (4): image in the original publication; not reproduced]
wherein $W_{Ri}$ is the associated word of the corresponding single sentence $S_{Si}$, $S_{Si}$ is the i-th single sentence $S_S$, and $\mathrm{weight}(W_{Ri})$ is the weight of the associated word $W_{Ri}$, determined by the associated word $W_{Ri}$;
step 2, constructing an emotion dictionary, comprising the following substeps:
step 2.1, obtaining the emotional tendency of the unknown emotional words through the characters in the constructed emotional hierarchy, and specifically comprising the following substeps:
step 2.1A, setting the emotion weight of the commend in the emotion dictionary downloaded in the step 1.4 as 1, setting the emotion weight of the derogatory word as-1, and splitting the composite emotion word of the emotion dictionary into meta-emotion words;
step 2.1B, calculating the weight of the meta-emotion words by a formula (5);
Figure FDA0003217734400000031
wherein β is a weight, and β is calculated by formula (2);
step 2.1C, among the composite emotion words obtained in step 1.3, marking those that are not in the emotion dictionary downloaded in step 1.5 as unknown emotion words;
step 2.1D, judging one by one whether the unknown emotion words obtained in step 2.1C are composite emotion words; if so, splitting the unknown emotion word into simple emotion words and jumping to step 2.1E; otherwise, the unknown emotion word is itself a simple emotion word, and jumping to step 2.1E;
step 2.1E, obtaining the emotional tendency of each constituent character L_i through formula (6), based on the unknown simple emotion words obtained in step 2.1C:
Figure FDA0003217734400000032
wherein the emotional tendency P(W_s^*, L_i) is the probability that W_s^* and L_i occur together in the corpus; P(W_s^*) is the probability that W_s^* occurs in the corpus and is calculated by formula (7); Freq(W_s^*, L_i) is the number of times W_s^* and L_i occur together in the corpus; Freq(W_s^*) is the number of times W_s^* occurs in the corpus; and σ is a very large number, chosen as the total number of words in the corpus;
Figure FDA0003217734400000033
wherein W_s^* denotes a known emotion word;
Figure FDA0003217734400000034
representing the number of words in the corpus;
step 2.1F, according to the emotion tendency of the character obtained in step 2.1E, calculating the positive emotion weight of the character to the unknown simple emotion word through a formula (8):
Figure FDA0003217734400000041
wherein P(W_s^#)_pos is the positive emotion weight of the unknown simple emotion word; W_s^# denotes the unknown simple emotion word; W_s^{*pos} denotes the commendatory words in the emotion dictionary downloaded in step 1.5; L_0 … L_n are the constituent characters of W_s^#; and P(W_s^# | W_s^{*pos}) is the positive emotion weight of the unknown emotion word given the emotion dictionary downloaded in step 1.5;
step 2.1G calculates the negative emotion weight of the unknown simple emotion word according to the emotion tendency of the character obtained in step 2.1E by the formula (9):
Figure FDA0003217734400000042
wherein P(W_s^#)_neg is the negative emotion weight of the unknown simple emotion word; W_s^# denotes the unknown simple emotion word; W_s^{*neg} denotes the derogatory words in the downloaded emotion dictionary of step 2.1A; and P(W_s^# | W_s^{*neg}) is the negative emotion weight of the unknown emotion word given the downloaded emotion dictionary;
step 2.1H, subtracting the negative emotion weight output by step 2.1G from the positive emotion weight output by step 2.1F to obtain the emotion weight of the unknown simple emotion word, as shown in formula (10):
P(W_s^#) = P(W_s^#)_pos - P(W_s^#)_neg (10)
wherein P(W_s^#)_pos is the positive weight of the unknown simple emotion word and P(W_s^#)_neg is the negative weight of the unknown simple emotion word;
step 2.1I, obtaining the weight of the composite emotion word corresponding to the unknown simple emotion word through formula (11), based on the weight of the unknown simple emotion word obtained in step 2.1H:
score(C_s^#) = P(W_s^#) * β (11)
wherein score(C_s^#) is the emotion weight of the composite emotion word, W_s^# denotes the unknown simple emotion word, P(W_s^#) is the weight of the unknown simple emotion word obtained in step 2.1H, and β is a weight calculated by formula (12):
Figure FDA0003217734400000051
wherein weight(W_D) is the degree word weight and ε is a scale factor, 0 < ε < 1;
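Formulas (10) and (11) can be chained as in the following Python sketch: the negative weight is subtracted from the positive weight, and the result is scaled by β. Formulas (8), (9) and (12) are available here only as images, so the positive weight, negative weight and β are treated as given inputs. The function names and sample values are hypothetical.

```python
def simple_word_weight(pos_weight: float, neg_weight: float) -> float:
    """Formula (10): P(W_s^#) = P(W_s^#)_pos - P(W_s^#)_neg."""
    return pos_weight - neg_weight


def unknown_compound_score(pos_weight: float, neg_weight: float, beta: float) -> float:
    """Formula (11): score(C_s^#) = P(W_s^#) * beta.

    beta is assumed to come from formula (12), which is not
    reproduced in this text.
    """
    return simple_word_weight(pos_weight, neg_weight) * beta


# Hypothetical inputs: positive weight 0.75, negative weight 0.25, beta 2.0.
example_score = unknown_compound_score(0.75, 0.25, 2.0)
```

A positive result indicates a word leaning commendatory and a negative result a word leaning derogatory, matching the sign convention of step 2.1A.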
step 2.1J, storing the composite emotion words obtained in step 2.1I together with their corresponding weights, thereby constructing the emotion dictionary D_L;
step 2.2, constructing an emotion dictionary based on the context, specifically comprising the following substeps:
step 2.2A, calculating, through the formulas in Table 3, the emotion value of each single sentence whose emotion value is 0 in the emotion analysis hierarchy system obtained in step 1;
Table 3: Formulas for inferring the emotion value of an unknown-emotion single sentence from different types of compound sentences
Figure FDA0003217734400000052
Figure FDA0003217734400000061
wherein α is a weight factor, α > 1;
Figure FDA0003217734400000062
is the single sentence whose emotion value is currently to be calculated;
Figure FDA0003217734400000063
is the sentence preceding
Figure FDA0003217734400000064
; and
Figure FDA0003217734400000065
is the sentence following
Figure FDA0003217734400000066
;
step 2.2B, obtaining the emotion weight of the composite emotion words in the single sentence through a formula (13) based on the single sentence emotion value obtained in the step 2.2A;
Figure FDA0003217734400000067
wherein N is the number of composite emotion words in the single sentence, and score(C_si) is the emotion value of the composite emotion word C_s in the j-th single sentence S_Sj;
step 2.2C, based on the weight of the composite emotional word obtained in the step 2.2B, obtaining the weight of the composite emotional word in the whole corpus through a formula (14);
Figure FDA0003217734400000068
wherein J is the frequency of the composite emotional words appearing in the corpus;
step 2.2D, obtaining the positive tendency degree of the composite emotion word through formula (15):
Figure FDA0003217734400000069
wherein P(C_s)_pos is the positive tendency degree of the composite emotion word, and Pos(S_S) denotes a single sentence whose emotion value is greater than 0;
step 2.2E, obtaining the negative tendency degree of the composite emotional word through a formula (16);
Figure FDA00032177344000000610
wherein P(C_s)_neg is the negative tendency degree of the composite emotion word, and Neg(S_S) denotes a single sentence whose emotion value is less than 0;
step 2.2F, based on the positive tendency degree obtained in step 2.2D and the negative tendency degree obtained in step 2.2E, selecting the composite emotion words with P(C_s)_pos > δ as positive emotion words and the composite emotion words with P(C_s)_neg > δ as negative emotion words, and storing them in the emotion dictionary D_C, wherein 0 < δ < 1;
step 2.2G, according to the weights of the composite emotion words obtained in step 2.2C, recording the weights of the words in the emotion dictionary D_C obtained in step 2.2F;
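The threshold selection of steps 2.2F and 2.2G can be sketched in Python as follows. The tendency degrees of formulas (15) and (16) and the corpus-level weights of formula (14) are taken as precomputed inputs, because those formulas are available here only as images. The function name, the data layout, and the handling of a word whose positive and negative tendency both exceed δ (the positive check wins) are assumptions; the source does not address that case.

```python
from typing import Dict, Tuple


def build_context_dictionary(
    tendencies: Dict[str, Tuple[float, float]],  # word -> (P_pos, P_neg), assumed precomputed
    weights: Dict[str, float],                   # word -> corpus-level weight from step 2.2C
    delta: float,
) -> Dict[str, Tuple[str, float]]:
    """Keep words whose tendency degree exceeds delta and record their weights in D_C."""
    if not 0 < delta < 1:
        raise ValueError("delta must satisfy 0 < delta < 1")
    d_c: Dict[str, Tuple[str, float]] = {}
    for word, (p_pos, p_neg) in tendencies.items():
        if p_pos > delta:      # assumption: positive check takes precedence
            d_c[word] = ("pos", weights[word])
        elif p_neg > delta:
            d_c[word] = ("neg", weights[word])
    return d_c


# Hypothetical data: word "c" passes neither threshold and is left out.
d_c = build_context_dictionary(
    {"a": (0.8, 0.1), "b": (0.2, 0.7), "c": (0.3, 0.3)},
    {"a": 1.2, "b": -0.9, "c": 0.1},
    delta=0.5,
)
```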
step 2.3, fusing the emotion dictionaries constructed in step 2.1 and step 2.2, specifically comprising the following substeps:
step 2.3A, storing the emotion words in the emotion dictionary D_L obtained in step 2.1J and the emotion words in the emotion dictionary D_C obtained in step 2.2G in a new emotion dictionary D_A;
step 2.3B, calculating the emotion weights of the emotion words in D_A through formula (17):
weight(C_DA) = λ * weight(C_DL) + (1 - λ) * weight(C_DC) (17)
wherein λ is a weight, 0 < λ < 1; weight denotes the weight of an emotion word in a dictionary; C_DA denotes an emotion word in D_A; C_DL denotes an emotion word in D_L; and C_DC denotes an emotion word in D_C;
step 2.3C, recording the emotion weights of the emotion words obtained in step 2.3B in the emotion dictionary D_A;
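A minimal Python sketch of the fusion in formula (17) is given below. The source does not say how to treat a word that appears in only one of D_L and D_C, so this sketch assumes a missing weight of 0. That choice, the function name `fuse_dictionaries`, and the sample entries are assumptions for illustration.

```python
from typing import Dict


def fuse_dictionaries(d_l: Dict[str, float], d_c: Dict[str, float], lam: float) -> Dict[str, float]:
    """Formula (17): weight(C_DA) = lam * weight(C_DL) + (1 - lam) * weight(C_DC)."""
    if not 0 < lam < 1:
        raise ValueError("lambda must satisfy 0 < lambda < 1")
    d_a: Dict[str, float] = {}
    # D_A holds every word from either dictionary (step 2.3A).
    for word in d_l.keys() | d_c.keys():
        # Assumption: a word missing from one dictionary contributes weight 0 there.
        d_a[word] = lam * d_l.get(word, 0.0) + (1 - lam) * d_c.get(word, 0.0)
    return d_a


# Hypothetical dictionaries fused with lambda = 0.5.
d_a = fuse_dictionaries({"x": 1.0, "y": -0.5}, {"x": 0.5, "z": 1.0}, lam=0.5)
```

A larger λ gives more influence to the character-based dictionary D_L, and a smaller λ favours the context-based dictionary D_C.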
step 2.3D, comparing D_A with the emotion dictionary downloaded in step 1.5; if all the words in D_A with weight greater than 0 are among the commendatory words of the emotion dictionary downloaded in step 1.5 and all the words with weight less than 0 are among the derogatory words of that dictionary, the construction of the emotion dictionary is finished; otherwise, taking D_A as the new emotion dictionary, marking the emotion weight of the meta emotion words with weight greater than 0 in the new emotion dictionary as 1 and the emotion weight of the meta emotion words with weight less than 0 as -1, and jumping to step 1.6.
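The termination test and weight reset of step 2.3D can be sketched in Python as follows. The reference dictionary is modelled as two sets of words, and words whose fused weight is exactly 0 are left at 0. Both choices, like the function names and sample data, are assumptions, since the source does not specify them.

```python
from typing import Dict, Set


def is_finished(d_a: Dict[str, float], commendatory: Set[str], derogatory: Set[str]) -> bool:
    """True when every positive word is already commendatory and every negative word derogatory."""
    return all(
        (weight <= 0 or word in commendatory) and (weight >= 0 or word in derogatory)
        for word, weight in d_a.items()
    )


def reset_polarity(d_a: Dict[str, float]) -> Dict[str, int]:
    """Mark weights above 0 as 1 and below 0 as -1 before the next iteration."""
    # Assumption: a weight of exactly 0 is kept as 0.
    return {word: (1 if w > 0 else -1 if w < 0 else 0) for word, w in d_a.items()}


# Hypothetical data: "new" is positive but not yet in the reference dictionary,
# so another iteration starting from step 1.6 would be needed.
d_a = {"good": 0.8, "bad": -0.6, "new": 0.3}
finished = is_finished(d_a, {"good"}, {"bad"})
next_dictionary = reset_polarity(d_a)
```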
2. The emotion dictionary construction method based on the emotion hierarchy system as claimed in claim 1, wherein: in step 1.7, the value range of ε is 0 < ε < 1.
3. The emotion dictionary construction method based on the emotion hierarchy system as claimed in claim 1, wherein: in step 1.9, the weight of the associated word W_Ri is determined by the type of the associated word W_Ri, as shown in Table 2:
Table 2: Weights of different types of associated words at different positions
Figure FDA0003217734400000071
Figure FDA0003217734400000081
CN201911233518.9A 2019-12-05 2019-12-05 Emotion dictionary construction method based on emotion hierarchy system Active CN110991167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911233518.9A CN110991167B (en) 2019-12-05 2019-12-05 Emotion dictionary construction method based on emotion hierarchy system

Publications (2)

Publication Number Publication Date
CN110991167A CN110991167A (en) 2020-04-10
CN110991167B true CN110991167B (en) 2021-10-08

Family

ID=70090406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911233518.9A Active CN110991167B (en) 2019-12-05 2019-12-05 Emotion dictionary construction method based on emotion hierarchy system

Country Status (1)

Country Link
CN (1) CN110991167B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663139A (en) * 2012-05-07 2012-09-12 苏州大学 Method and system for constructing emotional dictionary
CN107203520A (en) * 2016-03-16 2017-09-26 中国科学院上海高等研究院 The method for building up of hotel's sentiment dictionary, the sentiment analysis method and system of comment
CN109947951A (en) * 2019-03-19 2019-06-28 北京师范大学 A kind of automatically updated emotion dictionary construction method for financial text analyzing
CN110263321A (en) * 2019-05-06 2019-09-20 成都数联铭品科技有限公司 A kind of sentiment dictionary construction method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9213687B2 (en) * 2009-03-23 2015-12-15 Lawrence Au Compassion, variety and cohesion for methods of text analytics, writing, search, user interfaces
CN106202200B (en) * 2016-06-28 2019-09-27 昆明理工大学 A kind of emotion tendentiousness of text classification method based on fixed theme

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
An Improved Method to Building a Score Lexicon for Chinese Sentiment Analysis;Haiping Zhang et al.;《2012 Eighth International Conference on Semantics, Knowledge and Grids》;20121022;第241-244页 *
基于标签传播的评教文本情感词典构建;麻孟越 等;《内蒙古大学学报(自然科学版)》;20190531;第50卷(第3期);第324-330页 *

Also Published As

Publication number Publication date
CN110991167A (en) 2020-04-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right
Effective date of registration: 20220812
Address after: Room 325, 3rd Floor, No. 19, Madian East Road, Haidian District, Beijing 100088
Patentee after: Beijing Digital Ocean Times Analysis Technology Co., Ltd.
Address before: 100081 No. 5 South Main Street, Haidian District, Beijing, Zhongguancun
Patentee before: BEIJING INSTITUTE OF TECHNOLOGY
TR01 Transfer of patent right
Effective date of registration: 20230621
Address after: Room 07, 13th Floor, Building 5, Yard A2, West Third Ring North Road, Haidian District, Beijing, 100089
Patentee after: LINGJOIN ZHONGKE SOFTWARE (BEIJING) Co.,Ltd.
Address before: Room 325, 3rd Floor, No. 19, Madian East Road, Haidian District, Beijing 100088
Patentee before: Beijing Digital Ocean Times Analysis Technology Co.,Ltd.