CN110633367A - Seven-emotion classification method based on emotion dictionary and microblog text data - Google Patents

Info

Publication number
CN110633367A
Authority
CN
China
Legal status
Pending
Application number
CN201910862263.6A
Other languages
Chinese (zh)
Inventor
肖乐
轩辕敏峥
段梦诗
Current Assignee
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date
2019-09-12
Filing date
2019-09-12
Publication date
2019-12-31
Application filed by Henan University of Technology
Priority to CN201910862263.6A
Publication of CN110633367A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a seven-emotion classification method based on an emotion dictionary and microblog text data. The method comprises the following steps: first, 37,033 pieces of text data are collected from the microblog platform with a Python crawler, using "grain safety" as the keyword. Meanwhile, a seven-emotion dictionary is built from second-order synonyms (synonyms of synonyms) of seed keywords such as like, anger, joy, and fear, and by adding new words to the HowNet negative-emotion dictionary. The stop-word dictionary, the negation dictionary, and the degree dictionary are then prepared separately. Finally, relative scores are calculated for the processed text data using each dictionary, and the text is classified accordingly. By classifying seven emotions with an emotion dictionary, the invention overcomes the limitation of existing methods that distinguish only positive, negative, and neutral, and its output can serve as a training set for deep-learning emotion classification, reducing the need for manual labeling.

Description

Seven-emotion classification method based on emotion dictionary and microblog text data
Technical Field
The invention belongs to the field of text sentiment analysis, and specifically relates to a seven-emotion classification method based on an emotion dictionary and microblog text data.
Background
Sentiment analysis is an important application of Natural Language Processing (NLP). It can be used to quickly gauge people's attitudes toward a trending event or a product or service, and to guide improvements in public-opinion services. There are two main approaches to emotion classification: emotion dictionaries and machine learning.
An emotion dictionary is built from emotion knowledge and then used to classify text. Although it requires a large amount of up-front preparation, it is widely used because it applies broadly and runs quickly. As early as 1998, Whissel asked subjects to describe various terms with 5 words and established the first emotion dictionary. Over the following two decades, many scholars improved on this work, and Whissel also revised the dictionary to better fit the requirements of natural language. Because emotion dictionaries first appeared outside China, English dictionaries have accumulated abundant resources, and Li Shoushan et al. built the earliest Chinese emotion dictionary by translating English ones.
These early emotion dictionaries were very basic: they contained few words, had low coverage of emotion words, struggled to recognize synonyms, and relied mainly on manual statistics. Later scholars improved emotion dictionaries in various ways. Yang et al. computed the emotional tendency values of basic emotion words from a number of emotion seed words and their co-occurrence counts on a search engine, and used these to build an emotion dictionary. Wangzao et al. used microblog data to build a new-word dictionary and added emoticons as an auxiliary feature, further expanding existing emotion resources. Wang Wenbo et al. discussed the possibility of creating training data sets automatically. Bharat Gaind et al. used synonyms of six basic emotions to build an emotion dictionary.
Emotion dictionaries apply broadly and run quickly, but their accuracy is often not high enough. Machine learning methods can be trained on specific data sets to reach higher accuracy, so they are widely used. Pang et al. used SVM, maximum entropy, and naive Bayes for binary classification of movie reviews with good results; in later work they built opinion-oriented information retrieval systems based on machine learning. In Chinese sentiment analysis, Jun Li et al. compared common machine learning methods on Chinese hotel reviews and concluded that naive Bayes performed best.
With the development of deep learning, more and more scholars study emotion classification based on deep learning methods. Paredes et al. proposed a deep-learning sentiment analysis model based on a convolutional neural network and word2vec. Xiao et al. proposed a Chinese emotion classification method based on sentence-level convolutional control blocks. Hassan et al. jointly trained a CNN and an RNN, combining convolutional layers with long short-term memory (LSTM) networks to capture long-term dependencies. All of these methods achieve high accuracy, but require long training times.
Among machine learning methods, supervised learning achieves higher accuracy but requires manual labeling, which is very time-consuming; semi-supervised learning, developed to reduce labeling to a small amount, has not performed well. To address this, a large vocabulary is manually labeled to form an emotion dictionary, and the texts classified with this dictionary are processed and then used as a training set for machine-learning-based emotion classification.
Summary of the Invention
Current emotion classification, whether it uses an emotion dictionary, machine learning, or deep learning, generally divides emotion into only two categories, positive and negative; some scholars add a neutral category, but this still cannot describe complex emotions well. To solve this problem, and inspired by (and adapting) the seven emotions of traditional Chinese culture, this method divides emotions into seven categories: like (calm), anger, grief, joy, love, aversion, and fear, in order to describe emotions more accurately. Traditional research also lacks a grading of emotion intensity, so here each emotion category is graded into three levels (low, medium, high). The seven categories also subsume the traditional classification: love and aversion represent positive and negative attitudes respectively, like (calm) represents neutrality, and the other four categories express the speaker's personal feelings rather than a simple evaluation, which extends sentiment analysis from product reviews to text in general. The specific flow of the method is as follows:
1. A seven-emotion classification method based on an emotion dictionary and microblog text data, characterized by comprising the following steps:
step (A), preprocessing the original text data;
step (B), emotion word detection;
step (C), grammatical person detection;
step (D), negation word and degree word detection;
step (E), emotion score calculation.
2. The seven-emotion classification method based on an emotion dictionary and microblog text data according to claim 1, wherein in step (A), preprocessing the original text data, the data preprocessing stage has two main tasks: first, the acquired microblog data set is cleaned by deleting useless information such as formatting characters, timestamps, and user names, and is arranged into the required form; second, word segmentation and stop-word removal are performed: the text is segmented with the jieba library in Python, and words that have no influence on emotion, such as punctuation marks, conjunctions, and meaningless words, are then removed by consulting a stop-word dictionary.
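A minimal sketch of the two preprocessing tasks, assuming each post arrives as a raw string. The regular expressions for links, user names (`@...`), and timestamps are hypothetical stand-ins for the "useless information" the claim mentions, and the stop-word set is supplied by the caller. The segmenter is a parameter: whitespace splitting keeps the sketch dependency-free, while for Chinese posts one would pass jieba's `jieba.lcut`.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical patterns for "useless information" in a scraped microblog post.
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\S+")
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}(?:\s+\d{2}:\d{2}(?::\d{2})?)?")


def clean_post(raw: str) -> str:
    """Task 1: drop links, user names and timestamps, then collapse whitespace."""
    text = URL.sub(" ", raw)
    text = MENTION.sub(" ", text)
    text = TIMESTAMP.sub(" ", text)
    return " ".join(text.split())


def preprocess(
    raw: str,
    stopwords: Iterable[str],
    segment: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Task 2: segment the cleaned post, then remove stop words and blank tokens."""
    stop = set(stopwords)
    return [tok for tok in segment(clean_post(raw)) if tok.strip() and tok not in stop]


# For Chinese posts (requires the third-party jieba package):
#   import jieba
#   preprocess(raw_post, stopwords, segment=jieba.lcut)
```

Filtering blank tokens matters when a real segmenter is plugged in, because jieba returns spaces as separate tokens.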
3. The seven-emotion classification method based on an emotion dictionary and microblog text data according to claim 1, wherein in step (B), emotion word detection, part of the emotion dictionary is shown in Table 1;
TABLE 1 Partial example of the emotion dictionary
Word or phrase | Emotion category | Emotion weight (intensity)
Love | Love | 3
Complaints | Aversion | 2
Very happy | Joy | 3
Heavy-hearted | Grief | 1
Worried | Fear | 2
Glace piece | Anger | 2
After each text is processed, a list of words is obtained. This list is traversed and compared against the emotion dictionary to detect the emotion category and weight of each emotion word; if several emotion words are detected, each is recorded separately. If none of the six emotions is detected, the text is classified as like (calm) with a score of 100.
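The traversal in step (B) can be sketched as a dictionary lookup over the segmented tokens. The entries below are hypothetical English stand-ins for Table 1 (the real dictionary holds Chinese words), and the category labels are the seven classes named in this document; the calm fallback encodes the rule that a text with no emotion word is classified as like (calm) with a score of 100.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical excerpt in the spirit of Table 1: word -> (emotion category, intensity 1..3).
EMOTION_DICT: Dict[str, Tuple[str, int]] = {
    "love": ("love", 3),
    "complain": ("aversion", 2),
    "delighted": ("joy", 3),
    "heavy-hearted": ("grief", 1),
    "worried": ("fear", 2),
}

Hit = Tuple[int, str, str, int]  # (token position, word, emotion, intensity)


def detect_emotion_words(
    tokens: List[str], emotion_dict: Dict[str, Tuple[str, int]] = EMOTION_DICT
) -> List[Hit]:
    """Traverse the tokens and record every emotion word separately, in order."""
    hits: List[Hit] = []
    for position, token in enumerate(tokens):
        entry = emotion_dict.get(token)
        if entry is not None:
            emotion, intensity = entry
            hits.append((position, token, emotion, intensity))
    return hits


def calm_result_if_no_hits(hits: List[Hit]) -> Optional[Dict[str, float]]:
    """A text with no emotion word at all is classified as like (calm) = 100."""
    return {"like": 100.0} if not hits else None
```

Keeping the token position in each hit lets the later steps look at neighbouring words (pronouns, negations, degree words).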
4. The seven-emotion classification method based on an emotion dictionary and microblog text data according to claim 1, wherein in step (C), grammatical person detection: the grammatical person of a text also affects the intensity of its emotion. The third person (third-person pronouns, specific names, and sentences with no personal subject) is the most objective, and the first person (I, we, etc.) is the most subjective. The classification algorithm first detects the grammatical person of the subject to which each emotion word belongs and assigns a weight accordingly; the person weight values P are shown in Table 2.
TABLE 2 Person weight values
Grammatical person | Person weight value P
First person | 5
Second person | 2
Third person | 1
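Table 2 reduces to a lookup once the grammatical person of an emotion word's subject is known. The text does not say how the subject is found, so the sketch below assumes a simple heuristic: scan a few tokens to the left of the emotion word for a personal pronoun, and fall back to third person (names or no explicit subject). The pronoun sets and the `window` size are hypothetical.

```python
from typing import List

PERSON_WEIGHT = {1: 5, 2: 2, 3: 1}  # Table 2: first, second, third person -> P

# Hypothetical English stand-ins for the pronoun lists (the real ones are Chinese).
FIRST_PERSON = {"i", "me", "we", "us"}
SECOND_PERSON = {"you"}


def grammatical_person(tokens: List[str], position: int, window: int = 5) -> int:
    """Heuristic: nearest pronoun within `window` tokens before the emotion word."""
    for tok in reversed(tokens[max(0, position - window):position]):
        word = tok.lower()
        if word in FIRST_PERSON:
            return 1
        if word in SECOND_PERSON:
            return 2
    return 3  # third-person pronouns, specific names, or no subject at all


def person_weight(tokens: List[str], position: int) -> int:
    """Person weight P for the emotion word at `position`."""
    return PERSON_WEIGHT[grammatical_person(tokens, position)]
```

Defaulting to third person means an emotion word with no detectable subject gets the most objective, lowest weight.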
5. The seven-emotion classification method based on an emotion dictionary and microblog text data according to claim 1, wherein in step (D), negation word and degree word detection: two kinds of words in the text data can change an emotional state or its value, namely negation words and degree words. A negation word reverses the emotion, turning it into the antonym of the original emotion category: under negation, love and aversion swap, grief and joy swap, and anger and fear both map to like (calm). A degree word changes the intensity of an emotion word; combining the degree word's intensity with the emotion word's own intensity gives a new emotion weight value Q, as shown in Table 3.
TABLE 3 Emotion weight values
Emotion word intensity | Degree word intensity | Emotion weight value Q
3 (high) | none | 6
3 | 3 | 8
3 | 2 | 6
3 | 1 | 4
2 (medium) | none | 4
2 | 3 | 6
2 | 2 | 4
2 | 1 | 2
1 (low) | none | 2
1 | 3 | 4
1 | 2 | 2
1 | 1 | 1
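Step (D) combines two small lookups: the antonym mapping applied under negation and Table 3 for Q. The sketch encodes Table 3 as given, with `None` meaning no degree word, and the negation mapping as described in claim 5. Leaving like (calm) unchanged under negation, treating an odd number of negation words as a negation, looking only two tokens to the left, and the example negation and degree word lists are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Table 3: (emotion-word intensity, degree-word intensity or None) -> Q
Q_TABLE: Dict[Tuple[int, Optional[int]], int] = {
    (3, None): 6, (3, 3): 8, (3, 2): 6, (3, 1): 4,
    (2, None): 4, (2, 3): 6, (2, 2): 4, (2, 1): 2,
    (1, None): 2, (1, 3): 4, (1, 2): 2, (1, 1): 1,
}

# Antonym mapping applied under negation (claim 5).
NEGATION_MAP = {
    "love": "aversion", "aversion": "love",
    "grief": "joy", "joy": "grief",
    "anger": "like", "fear": "like",
}

# Hypothetical English stand-ins for the negation and degree dictionaries.
NEGATIONS = {"not", "never", "no"}
DEGREES = {"extremely": 3, "very": 2, "slightly": 1}


def modifiers(tokens: List[str], position: int, window: int = 2) -> Tuple[bool, Optional[int]]:
    """Return (negated, degree intensity) from the tokens just left of an emotion word."""
    context = [t.lower() for t in tokens[max(0, position - window):position]]
    negated = sum(t in NEGATIONS for t in context) % 2 == 1  # double negation cancels
    degree = next((DEGREES[t] for t in reversed(context) if t in DEGREES), None)
    return negated, degree


def apply_negation(emotion: str, negated: bool) -> str:
    """Swap to the antonym category when negated; like (calm) stays unchanged."""
    return NEGATION_MAP.get(emotion, emotion) if negated else emotion


def emotion_weight(intensity: int, degree: Optional[int]) -> int:
    """Look up the emotion weight value Q in Table 3."""
    return Q_TABLE[(intensity, degree)]
```

Encoding Table 3 as a dictionary rather than a formula keeps its irregular entries (for example low intensity with a weak degree word giving 1) exactly as tabulated.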
6. The seven-emotion classification method based on an emotion dictionary and microblog text data according to claim 1, wherein in step (E), emotion score calculation: once the preceding steps are complete, the score of each emotion category is calculated. The absolute score of each individual emotion is computed from the emotion words, degree words, negation words, and grammatical person, as shown in formula (1); the relative score of each emotion within a text is then derived from these weighted scores, as shown in formula (2). Here E_i (i = 1, 2, …, 7) denotes an emotion (like (calm), anger, grief, joy, love, aversion, and fear, respectively), S(E_i) is the absolute score of that emotion, R(E_i) is its relative score, P is the person weight value, and Q is the emotion weight value;
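$$S(E_i) = \sum_{w \in \mathrm{Hits}(E_i)} P_w \, Q_w \qquad (1)$$

$$R(E_i) = \begin{cases} 100, & i = 1 \text{ and } S(E_j) = 0 \text{ for all } j = 2, \dots, 7 \\ 0, & i \neq 1 \text{ and } S(E_j) = 0 \text{ for all } j = 2, \dots, 7 \\ 100 \cdot \dfrac{S(E_i)}{\sum_{j=1}^{7} S(E_j)}, & \text{otherwise} \end{cases} \qquad (2)$$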
Hits(E_i) in formula (1) denotes all words in the text that belong to emotion E_i in the emotion dictionary; formula (2) states that when the absolute scores of the other six emotions are all 0, R(E_1) = 100 and the text is classified as calm.
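A sketch of step (E) under the reading that formula (1) sums P × Q over every hit of an emotion and formula (2) normalizes the absolute scores to percentages, with the calm special case. Because the original formulas are published only as images, this reading is an assumption, checked only against the proportions shown in Table 4. The category order is E_1 to E_7 with E_1 = like (calm), and scores are rounded to two decimals as in Table 4.

```python
from typing import Dict, Iterable, Tuple

EMOTIONS = ["like", "anger", "grief", "joy", "love", "aversion", "fear"]  # E1..E7


def absolute_scores(weighted_hits: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Formula (1) as read here: S(E_i) = sum of P * Q over the hits of E_i."""
    scores = {e: 0.0 for e in EMOTIONS}
    for emotion, p, q in weighted_hits:
        scores[emotion] += p * q
    return scores


def relative_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Formula (2) as read here: percentages, with calm = 100 when E2..E7 are all 0."""
    if all(scores[e] == 0 for e in EMOTIONS[1:]):
        return {e: (100.0 if e == "like" else 0.0) for e in EMOTIONS}
    total = sum(scores.values())
    return {e: round(100.0 * scores[e] / total, 2) for e in EMOTIONS}
```

For example, one grief hit with P = 1, Q = 2 and one aversion hit with P = 1, Q = 3 give relative scores of 40 and 60, the same proportions as the first row of Table 4; these hits are illustrative and not taken from the data set.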
Description of the Drawings
As shown in the accompanying drawings, FIG. 1 is a flowchart of the seven-emotion classification algorithm, FIG. 2 is the emotion-category antonym chart, FIG. 3 is the single-emotion distribution chart, and FIG. 4 is the multi-emotion statistics chart.
Detailed Description of Embodiments
The experiment requires a text data set crawled from the microblog platform, the seven-emotion dictionary, a stop-word dictionary, a negation dictionary, a degree dictionary, and so on. The emotion classification of each sentence in the data set is calculated and tallied: a bar chart is drawn by counting all texts that contain a given emotion, and a pie chart is drawn by counting the single-emotion texts (texts with only one emotion).
After scoring, a seven-column array is obtained in which the seven numbers of each row are the text's relative scores on the seven emotions, as in the excerpt in Table 4. If one item scores 100, the text is a single-emotion text; if several items score between 0 and 100, it is a multi-emotion text. Separating out single-emotion texts in this way makes them better suited as a training set for machine learning.
TABLE 4 Example relative score values
Like (calm) | Anger | Grief | Joy | Love | Aversion | Fear
0 | 0 | 40 | 0 | 0 | 60 | 0
100 | 0 | 0 | 0 | 0 | 0 | 0
0 | 85.71 | 0 | 0 | 14.29 | 0 | 0
0 | 0 | 0 | 0 | 100 | 0 | 0
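Selecting single-emotion texts as training data amounts to keeping the rows in which one column reaches 100. A minimal sketch, assuming score rows are ordered like the columns of Table 4; the function names are hypothetical, and the test rows are the four rows of the table.

```python
from typing import List, Optional, Sequence, Tuple

EMOTIONS = ["like", "anger", "grief", "joy", "love", "aversion", "fear"]  # Table 4 column order


def single_emotion_label(row: Sequence[float]) -> Optional[str]:
    """Return the emotion whose relative score is 100, or None for a multi-emotion text."""
    for emotion, score in zip(EMOTIONS, row):
        if abs(score - 100.0) < 1e-9:
            return emotion
    return None


def build_training_set(
    texts: Sequence[str], rows: Sequence[Sequence[float]]
) -> List[Tuple[str, str]]:
    """Keep only single-emotion texts, each paired with its emotion label."""
    pairs = []
    for text, row in zip(texts, rows):
        label = single_emotion_label(row)
        if label is not None:
            pairs.append((text, label))
    return pairs
```

The tolerance in the comparison guards against scores such as 99.99999 produced by floating-point normalization.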
Counting all single-emotion texts gives, for each emotion, the number of texts containing only that emotion; counting every emotion that appears in any text gives the total number of occurrences of each emotion. Detailed data are shown in Table 5.
TABLE 5 Number of occurrences of each emotion
[Table 5 image]
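The two counts behind Table 5 can be sketched with `collections.Counter`, assuming an emotion counts as occurring in a text when its relative score is above zero, and a text counts as single-emotion when exactly one emotion occurs. The rows used in the test are the Table 4 examples, not the real data set.

```python
from collections import Counter
from typing import Sequence, Tuple

EMOTIONS = ["like", "anger", "grief", "joy", "love", "aversion", "fear"]  # Table 4 column order


def tally(rows: Sequence[Sequence[float]]) -> Tuple[Counter, Counter]:
    """Return (single-emotion text counts, per-emotion occurrence counts)."""
    single, occurrences = Counter(), Counter()
    for row in rows:
        present = [e for e, score in zip(EMOTIONS, row) if score > 0]
        occurrences.update(present)
        if len(present) == 1:
            single[present[0]] += 1
    return single, occurrences
```

The occurrence counter feeds the bar chart of all emotions and the single-emotion counter feeds the pie chart described in the embodiment.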
As Table 5 shows, among the 37,033 posts on grain safety, like (calm) accounts for most of the data, followed by love. This indicates that netizens' assessments of grain safety are increasingly fair and lean positive, reflecting the public's confidence in China's grain safety. However, aversion and other emotions also appear in considerable numbers; these deserve attention in public-opinion work, so that the public's concerns and worries can be identified and addressed.
In addition, three of the seven emotion categories clearly dominate: like, love, and aversion, which correspond to neutral, positive, and negative in three-class sentiment classification. Three-class classification can therefore be seen as an approximation of seven-class classification and yields relatively close results. Future research may expand to more and finer-grained emotion categories, so that machines understand language, and thus people, better, advancing artificial intelligence.

Claims (6)

1. A seven-emotion classification method based on an emotion dictionary and microblog text data, characterized by comprising the following steps:
step (A), preprocessing the original text data;
step (B), emotion word detection;
step (C), grammatical person detection;
step (D), negation word and degree word detection;
step (E), emotion score calculation.
2. The seven-emotion classification method based on an emotion dictionary and microblog text data according to claim 1, wherein in step (A), preprocessing the original text data, the data preprocessing stage has two main tasks: first, the acquired microblog data set is cleaned by deleting useless information such as formatting characters, timestamps, and user names, and is arranged into the required form; second, word segmentation and stop-word removal are performed: the text is segmented with the jieba library in Python, and words that have no influence on emotion, such as punctuation marks, conjunctions, and meaningless words, are then removed by consulting a stop-word dictionary.
3. The seven-emotion classification method based on an emotion dictionary and microblog text data according to claim 1, wherein in step (B), emotion word detection, part of the emotion dictionary is shown in Table 1;
TABLE 1 Partial example of the emotion dictionary
Word or phrase | Emotion category | Emotion weight (intensity)
Love | Love | 3
Complaints | Aversion | 2
Very happy | Joy | 3
Heavy-hearted | Grief | 1
Worried | Fear | 2
Glace piece | Anger | 2
After each text is processed, a list of words is obtained. This list is traversed and compared against the emotion dictionary to detect the emotion category and weight of each emotion word; if several emotion words are detected, each is recorded separately. If none of the six emotions is detected, the text is classified as like (calm) with a score of 100.
4. The seven-emotion classification method based on an emotion dictionary and microblog text data according to claim 1, wherein in step (C), grammatical person detection: the grammatical person of a text also affects the intensity of its emotion. The third person (third-person pronouns, specific names, and sentences with no personal subject) is the most objective, and the first person (I, we, etc.) is the most subjective. The classification algorithm first detects the grammatical person of the subject to which each emotion word belongs and assigns a weight accordingly; the person weight values P are shown in Table 2.
TABLE 2 Person weight values
Grammatical person | Person weight value P
First person | 5
Second person | 2
Third person | 1
5. The seven-emotion classification method based on an emotion dictionary and microblog text data according to claim 1, wherein in step (D), negation word and degree word detection: two kinds of words in the text data can change an emotional state or its value, namely negation words and degree words. A negation word reverses the emotion, turning it into the antonym of the original emotion category: under negation, love and aversion swap, grief and joy swap, and anger and fear both map to like (calm). A degree word changes the intensity of an emotion word; combining the degree word's intensity with the emotion word's own intensity gives a new emotion weight value Q, as shown in Table 3.
TABLE 3 Emotion weight values
Emotion word intensity | Degree word intensity | Emotion weight value Q
3 (high) | none | 6
3 | 3 | 8
3 | 2 | 6
3 | 1 | 4
2 (medium) | none | 4
2 | 3 | 6
2 | 2 | 4
2 | 1 | 2
1 (low) | none | 2
1 | 3 | 4
1 | 2 | 2
1 | 1 | 1
6. The seven-emotion classification method based on an emotion dictionary and microblog text data according to claim 1, wherein in step (E), emotion score calculation: once the preceding steps are complete, the score of each emotion category is calculated. The absolute score of each individual emotion is computed from the emotion words, degree words, negation words, and grammatical person, as shown in formula (1); the relative score of each emotion within a text is then derived from these weighted scores, as shown in formula (2). Here E_i (i = 1, 2, …, 7) denotes an emotion (like (calm), anger, grief, joy, love, aversion, and fear, respectively), S(E_i) is the absolute score of that emotion, R(E_i) is its relative score, P is the person weight value, and Q is the emotion weight value;
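$$S(E_i) = \sum_{w \in \mathrm{Hits}(E_i)} P_w \, Q_w \qquad (1)$$

$$R(E_i) = \begin{cases} 100, & i = 1 \text{ and } S(E_j) = 0 \text{ for all } j = 2, \dots, 7 \\ 0, & i \neq 1 \text{ and } S(E_j) = 0 \text{ for all } j = 2, \dots, 7 \\ 100 \cdot \dfrac{S(E_i)}{\sum_{j=1}^{7} S(E_j)}, & \text{otherwise} \end{cases} \qquad (2)$$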
Hits(E_i) in formula (1) denotes all words in the text that belong to emotion E_i in the emotion dictionary; formula (2) states that when the absolute scores of the other six emotions are all 0, R(E_1) = 100 and the text is classified as calm.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910862263.6A CN110633367A (en) 2019-09-12 2019-09-12 Seven-emotion classification method based on emotion dictionary and microblog text data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910862263.6A CN110633367A (en) 2019-09-12 2019-09-12 Seven-emotion classification method based on emotion dictionary and microblog text data

Publications (1)

Publication Number Publication Date
CN110633367A true CN110633367A (en) 2019-12-31

Family

ID=68972605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910862263.6A Pending CN110633367A (en) 2019-09-12 2019-09-12 Seven-emotion classification method based on emotion dictionary and microblog text data

Country Status (1)

Country Link
CN (1) CN110633367A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191614A (en) * 2020-01-02 2020-05-22 中国建设银行股份有限公司 Document classification method and device
CN111241286A (en) * 2020-01-16 2020-06-05 东方红卫星移动通信有限公司 Short text emotion fine classification method based on mixed classifier
CN112000804A (en) * 2020-08-18 2020-11-27 安徽理工大学 Microblog hot topic user group emotion tendentiousness analysis method
CN112507115A (en) * 2020-12-07 2021-03-16 重庆邮电大学 Method and device for classifying emotion words in barrage text and storage medium
CN113609842A (en) * 2021-08-17 2021-11-05 四川轻化工大学 Method for obtaining scenic spot comment data and travel experience evaluation

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191614A (en) * 2020-01-02 2020-05-22 中国建设银行股份有限公司 Document classification method and device
CN111191614B (en) * 2020-01-02 2023-08-29 中国建设银行股份有限公司 Document classification method and device
CN111241286A (en) * 2020-01-16 2020-06-05 东方红卫星移动通信有限公司 Short text emotion fine classification method based on mixed classifier
CN112000804A (en) * 2020-08-18 2020-11-27 安徽理工大学 Microblog hot topic user group emotion tendentiousness analysis method
CN112000804B (en) * 2020-08-18 2022-08-02 安徽理工大学 Microblog hot topic user group emotion tendentiousness analysis method
CN112507115A (en) * 2020-12-07 2021-03-16 重庆邮电大学 Method and device for classifying emotion words in barrage text and storage medium
CN112507115B (en) * 2020-12-07 2023-02-03 重庆邮电大学 Method and device for classifying emotion words in barrage text and storage medium
CN113609842A (en) * 2021-08-17 2021-11-05 四川轻化工大学 Method for obtaining scenic spot comment data and travel experience evaluation

Similar Documents

Publication Publication Date Title
CN110633367A (en) Seven-emotion classification method based on emotion dictionary and microblog text data
CN106599032B (en) Text event extraction method combining sparse coding and structure sensing machine
CN109829166B (en) People and host customer opinion mining method based on character-level convolutional neural network
CN111767741A (en) Text emotion analysis method based on deep learning and TFIDF algorithm
CN108763213A (en) Theme feature text key word extracting method
CN106202372A (en) A kind of method of network text information emotional semantic classification
CN110705247B (en) Based on x2-C text similarity calculation method
Ying et al. Improving multi-label emotion classification by integrating both general and domain-specific knowledge
Zhou et al. Sentiment analysis of text based on CNN and bi-directional LSTM model
CN108073571B (en) Multi-language text quality evaluation method and system and intelligent text processing system
CN106446147A (en) Emotion analysis method based on structuring features
CN110175221A (en) Utilize the refuse messages recognition methods of term vector combination machine learning
CN111339772B (en) Russian text emotion analysis method, electronic device and storage medium
CN107818173B (en) Vector space model-based Chinese false comment filtering method
Krchnavy et al. Sentiment analysis of social network posts in Slovak language
Khalid et al. Topic detection from conversational dialogue corpus with parallel dirichlet allocation model and elbow method
CN113934835B (en) Retrieval type reply dialogue method and system combining keywords and semantic understanding representation
Bilbao-Jayo et al. Improving political discourse analysis on twitter with context analysis
Hasnat et al. Understanding sarcasm from reddit texts using supervised algorithms
CN114090756B (en) Intelligent processing method, equipment and storage medium for public opinion information
Bouarara Sentiment analysis using machine learning algorithms and text mining to detect symptoms of mental difficulties over social media
CN110569495A (en) Emotional tendency classification method and device based on user comments and storage medium
Fuji et al. Emotion analysis on social big data
CN113254590A (en) Chinese text emotion classification method based on multi-core double-layer convolutional neural network
Maskat et al. Categorization of malay social media text and normalization of spelling variations and vowel-less words

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191231

WD01 Invention patent application deemed withdrawn after publication