CN112364646A - Sentence comment emotion polarity analysis method considering modifiers - Google Patents

Sentence comment emotion polarity analysis method considering modifiers

Info

Publication number
CN112364646A
Authority
CN
China
Legal status
Pending
Application number
CN202011293192.1A
Other languages
Chinese (zh)
Inventor
徐勇
李晓宇
苏发桂
吕锡志
李宇琪
Current Assignee
Anhui University of Finance and Economics
Original Assignee
Anhui University of Finance and Economics
Application filed by Anhui University of Finance and Economics
Priority to CN202011293192.1A
Publication of CN112364646A
Legal status: Pending (current)

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data > G06F40/20 Natural language analysis:
        • G06F40/279 Recognition of textual entities > G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
        • G06F40/205 Parsing > G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
        • G06F40/237 Lexical tools > G06F40/242 Dictionaries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a sentence comment emotion polarity analysis method considering modifiers, which addresses the insufficient accuracy of existing emotion polarity calculations. Because differences in users' language habits can make emotion values vary too widely to be comparable, the emotion polarity value of the emotion phrase is normalized to the range [-5, 5]. Experimental results show that the calculated sentence emotion polarity values reflect user emotion at a finer granularity and fall within a more reasonable range.

Description

Sentence comment emotion polarity analysis method considering modifiers
Technical Field
The invention relates to the field of sentence comment sentiment analysis, in particular to a sentence comment sentiment polarity analysis method considering modifiers.
Background
With the rise and spread of Web 2.0, more and more information is actively published by Web users on Web 2.0 platforms; this information is collectively referred to as User-Generated Content (UGC). In recent years, UGC has been widely used in e-commerce, tourism and other fields. For example, an e-commerce platform offers consumers information about product characteristics (such as material and size) together with a review function; consumers who have bought a product can describe their shopping experience on the platform, and these UGC shopping experiences are a valuable decision reference for later buyers. However, as the number of consumers grows, the number of product reviews increases sharply and the practicality of review systems gradually weakens: most consumers are unwilling to spend a lot of time reading thousands of reviews to learn a product's attributes, so although review information is abundant, potential consumers obtain less and less useful information from it. Establishing a convenient and effective UGC evaluation system for platforms such as e-commerce and tourism is therefore of real practical importance for helping users understand products and services quickly and accurately.
Emotion is a direct way for people to express attitudes and is consistent with their feelings and intentions; emotions can be broadly divided into positive and negative. The emotion recognition problem was first raised in Minsky's The Society of Mind. To quantify emotion accurately, Picard defined affective computing in her 1997 book Affective Computing, proposing that computers can understand emotion as humans do and perform emotion recognition, emotion representation, intensity calculation and so on; this promoted progress in text analysis, speech recognition, face recognition and related fields.
At present, the main tasks of emotion analysis of user-generated content can be roughly divided into polarity analysis (Emotion Analysis) and opinion orientation analysis (Sentiment Analysis). By process, emotion analysis can be divided into emotion element extraction, emotion information classification, and emotion retrieval and summarization.
UGC sentiment analysis also has clear practical value: it can help users make sound purchasing decisions, help platforms monitor and predict public opinion, and help enterprises protect and improve their reputation. In finance, sentiment analysis of comment text is also used to predict stock trends. Emotion analysis research still faces open problems, such as the sources of emotion, feature identification, the influence of propagation mechanisms and the quantification of external factors, including how to represent the sources and essential features of emotion. In the emotion analysis stage, one study used fuzzy statistical methods to determine index weights and comprehensive-evaluation membership degrees and proposed a UGC fuzzy comprehensive evaluation (FCE) model. Also aiming at a comprehensive evaluation of product reviews, Raghupathi et al. proposed a more accurate overall sentiment rating algorithm that, starting from single-text analysis, evaluates the leaves of a word tree with an influence-based linguistic dictionary. With the trend toward personalization, concepts such as user profiling, recommender systems and online public opinion analysis have increasingly been brought into UGC research. The mainstream techniques of text emotion analysis are summarized below:
Method based on emotion dictionary/semantics
A good polarity dictionary can effectively improve emotion classification results and is an essential tool for analyzing user emotion. For English, the General Inquirer, SentiWordNet and Opinion Lexicon dictionaries are relatively popular. The General Inquirer is the earliest English emotion dictionary; it contains two categories, positive and negative emotion words, derived from the Harvard IV-4 dictionary and the Lasswell dictionary. SentiWordNet is an upgraded version of the English semantic knowledge base WordNet: it further expands the WordNet dictionary and provides emotion words, their corresponding emotion scores, and synonyms and antonyms for some words.
Research on Chinese emotion analysis started later, and because of the complexity of Chinese, Chinese dictionaries differ from English ones and lack complete semantic resources. The widely used HowNet emotion dictionary released by Dong Zhendong et al. provides emotion words in both English and Chinese, as well as 214 degree adverbs and 38 proposition words. The DUTIR affective lexicon ontology released by Dalian University of Technology describes emotion words from multiple angles: it labels their part of speech, assigns corresponding intensity values, and subdivides them into 7 major classes and 21 minor classes, with three polarities: neutral, commendatory and derogatory. NTUSD (National Taiwan University Sentiment Dictionary), released by National Taiwan University, is notable mainly for its extensive expansion of derogatory words. With the continued development of the internet, the emergence of emotion data such as emoticons has increased the number of annotated English corpora.
Because existing Chinese dictionaries are small in scale and their vocabulary is formal, they are not well suited to analyzing online text, and they must be supplemented promptly as online vocabulary evolves. Xu Hualin proposed a new-word discovery method that screens out new words in text according to the maximum confidence of composite new words. Xu et al. observed that some polysemous emotion words can be positive, negative or neutral, so a single part-of-speech polarity cannot express them accurately, which to some extent reduces the accuracy of text sentiment analysis; they therefore built an extended emotion dictionary containing basic emotion words and polysemous emotion words. Wu et al. constructed a Chinese lexicon based on social cognition, argued that emotional tendency and user opinion must be clarified before emotion values are calculated, and studied the limitations of methods based on traditional machine learning. To build an adaptive emotion lexicon that improves emotion polarity classification in microblogs, Keshavarz et al. generated an emotion dictionary from microblog text with a genetic algorithm in order to find the optimal emotion vocabulary.
Method based on machine learning
Emotion analysis based on Machine Learning (ML) is more intelligent than dictionary-based analysis: it converts text into numerical representations, feeds them into a model, and classifies the text with a specific classification algorithm that combines mathematical concepts with the machine's ability to learn from data.
As for applications of machine learning, given the large volume of online opinions generated every day in the hotel industry, Martinez-Torres et al. argued that, to preserve users' trust in online reviews, automated tools based on machine learning are needed to distinguish positive and negative deceptive reviews from non-deceptive ones. Luo et al. verified experimentally that a robust classification algorithm combining a Support Vector Machine (SVM) with a Fuzzy Domain Ontology (FDO) outperforms traditional classification algorithms such as naive Bayes (NB) and SVM in predicting the usefulness of online reviews. Alfrjani et al. proposed a hybrid method combining a semantic knowledge base with machine learning, which mines opinions at the domain-feature level and classifies overall opinions on a multi-point scale. The method uses a new semantic-knowledge-base approach to analyze a set of reviews at the domain-feature level and generates structured information that associates the expressed opinions with specific domain features.
Method based on deep learning
Deep Learning (DL) is a research branch of machine learning and a continuing evolution of it, moving progressively toward artificial intelligence. By model type, deep learning mainly includes Convolutional Neural Networks (CNNs), multi-layer autoencoder neural networks, and the further optimized Deep Belief Networks (DBNs).
Zheng et al. took both short-term and long-term context dependencies into account and proposed a Chinese emotion classification model based on the Convolution Control Block (CCB) to classify Chinese sentences as positive or negative. Since deceptive reviews and truthful reviews are written, respectively, by authors without and with actual experience of the goods or services purchased online, the two kinds of review should carry different contextual information; Zhang et al. therefore proposed a deep learning approach to text representation, a deep context representation of word vectors (DC-Word), for identifying fraudulent reviews. Lee et al. proposed a new unified product ranking method based on online product reviews. Unlike existing methods, it uses deep learning to extract high-level latent opinion representations that capture the most semantic information during learning.
In summary, current emotion analysis research on user-generated content usually isolates the content from the user who produced it, even though every online comment is inseparable from its author; how to combine the two and determine the emotional preferences a UGC author expresses remains a difficulty. Existing UGC emotion analysis targets UGC of one specific form, such as text emotion analysis, speech emotion analysis, or even emotion analysis in face recognition. Because UGC is a diverse information carrier and users express emotion in many ways, multimodal mixed emotion analysis is an important research topic.
Disclosure of Invention
To address the insufficient accuracy of emotion polarity calculation for existing user-generated content, the invention provides a sentence comment emotion polarity analysis method considering modifiers.
The invention adopts the following technical scheme:
A sentence comment emotion polarity analysis method considering modifiers comprises the following steps:
Step 1: Comment text preprocessing: segment the comment text into words and delete stop words, punctuation marks and spaces;
Step 2: Improve the HowNet dictionary;
Step 3: Extract features from the preprocessed comment text based on the improved HowNet dictionary;
Step 4: Emotion phrase identification: emotion words, modifying adverbs and negation words form emotion phrases;
Step 5: Emotion phrase polarity calculation: obtain the emotion value of each emotion word from the improved HowNet dictionary; the emotion polarity value of a positive emotion word is 1 and that of a negative emotion word is -1;
obtain the weights of the modifying adverbs and negation words and multiply them by the emotion value of the emotion word to obtain the emotion polarity value of the emotion phrase; when n modifying adverbs are present, normalize by raising the absolute value of this product to the power 1/n, and give the result the same sign as the original product;
the emotion polarity value PS of the emotion phrase is calculated as:
$$PS = \operatorname{sgn}(P)\,\lvert P\rvert^{1/n}, \qquad P = S\prod_{i=1}^{m} w_{n,i}\prod_{k=1}^{n} w_{a,k}$$
where w_{n,i} is the value assigned to the i-th negation word and w_{a,k} is the weight of the k-th modifying adverb; m and n are the numbers of negation words and modifying adverbs, respectively; S is the emotion polarity value of the emotion word, equal to 1 or -1;
Step 6: Sentence emotion polarity calculation:
the emotion polarity value of the sentence is calculated by the following formula:
[Formula image: sentence emotion polarity value computed from PS_i and j]
where PS_i is the emotion polarity value of the i-th emotion phrase and j is the number of emotion phrases in the sentence.
Preferably, step 1 specifically comprises: preprocessing consisting of word segmentation and deletion of punctuation marks, stop words, line breaks and spaces; the preprocessed comment text is stored as a list in which each element is a single word stored as a string.
Preferably, step 2 specifically comprises:
supplementing the HowNet dictionary with emotion words, modifying adverbs and negation words, after which the HowNet dictionary is divided into three parts: an emotion word dictionary, a modifying adverb dictionary and a negation word dictionary;
the emotion word dictionary contains positive and negative emotion words; the emotion polarity value of a positive emotion word is 1 and that of a negative emotion word is -1;
the modifying adverb dictionary uses 6 weight values depending on the word: 2, 1.75, 1.5, 1.25, 0.5 and 0.25;
each negation word in the negation word dictionary is assigned the value -1.
Preferably, the feature extraction of step 3 specifically includes:
a) determining the number of positive emotion words;
b) determining the number of negative emotion words;
c) determining the number and positions of negation words;
d) determining the number and positions of modifying adverbs.
Preferably, the emotion polarity value of the sentence lies in the range [-5, 5].
The invention has the beneficial effects that:
The invention provides a sentence comment emotion polarity analysis method considering modifiers that analyzes emotion along two dimensions, polarity direction and polarity intensity. It first determines, from the emotion words, the polarity direction the user expresses in a sentence, and then, following dependency syntax theory, calculates how strongly modifiers and negation words influence the polarity to obtain the degree of deviation of the emotional tendency, thereby characterizing the user's emotion in the comment text at a finer granularity. Because differences in users' language habits can make emotion values vary too widely to be comparable, the emotion polarity value of the emotion phrase is normalized to the range [-5, 5]. Experimental results show that the calculated sentence emotion polarity values reflect user emotion at a finer granularity and fall within a more reasonable range.
Drawings
FIG. 1 shows the preprocessing of a single UGC comment in Example 1.
FIG. 2 shows the emotion polarity values calculated by the SP method in Example 1.
FIG. 3 shows the emotion polarity values calculated by the QCSP method in Example 1.
FIG. 4 shows the emotion polarity value calculated by the method of the invention in Example 1.
FIG. 5 compares the emotion polarity values calculated by the three methods for 50 comment UGC texts.
FIG. 6 shows the emotion polarity values calculated by the SP method for the original text of comment 1 in Example 2.
FIG. 7 shows the emotion polarity values calculated by the QCSP method for the original text of comment 1 in Example 2.
FIG. 8 shows the emotion polarity value calculated by the method of the invention for the original text of comment 1 in Example 2.
FIG. 9 compares the emotion polarity values calculated by the three methods for 1000 comment UGC texts.
Detailed Description
The embodiments of the invention are described below with reference to the accompanying drawings.
With reference to FIGS. 1 to 9, the sentence comment emotion polarity analysis method considering modifiers identifies emotion words to analyze the polarity tendency users express in comment texts, and then calculates how strongly modifying adverbs and negation words influence that polarity to obtain the degree of deviation of the emotional tendency, yielding finer-grained emotion polarity values.
The method comprises the following steps:
Step 1: Comment text preprocessing: segment the comment text into words and delete stop words, punctuation marks and spaces.
Preprocessing consists of word segmentation and deletion of punctuation marks, stop words, line breaks, spaces and the like; the preprocessed comment text is stored as a List in which each element is a single word stored as a string.
FIG. 1 shows the original UGC, the UGC after word segmentation, and the content after preprocessing is complete.
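The preprocessing step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the comment has already been split into words by a Chinese word segmenter (the text does not name one), the stop-word list is hypothetical, and the English tokens stand in for the Chinese words. Punctuation is detected with the standard `unicodedata` module so that full-width Chinese marks such as "，" and "。" are removed as well.

```python
import unicodedata
from typing import List

# Hypothetical stop-word list; the patent does not publish the one it uses.
STOP_WORDS = {"the", "it", "and"}


def is_punctuation(token: str) -> bool:
    """True if every character in the token is a Unicode punctuation mark."""
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def preprocess(tokens: List[str]) -> List[str]:
    """Remove whitespace, line breaks, punctuation and stop words.

    `tokens` is assumed to be the output of a word segmenter; the order
    of the remaining words is preserved and each is kept as a string.
    """
    words = []
    for tok in tokens:
        tok = tok.strip()  # drops surrounding spaces and line breaks
        if not tok:
            continue  # the token was pure whitespace
        if is_punctuation(tok) or tok in STOP_WORDS:
            continue
        words.append(tok)
    return words


print(preprocess(["appearance", "quite", "nice-looking", "，", "\n", "and", "satisfied", "。"]))
# ['appearance', 'quite', 'nice-looking', 'satisfied']
```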
Step 2: Improve the HowNet dictionary.
The HowNet dictionary is supplemented with emotion words, modifying adverbs and negation words, after which it is divided into three parts: an emotion word dictionary, a modifying adverb dictionary and a negation word dictionary.
Example entries from the emotion word dictionary are shown in Table 1:
TABLE 1 Example entries from the emotion word dictionary [table image not reproduced]
The emotion word dictionary contains 7176 positive emotion words and 12062 negative emotion words; the emotion polarity value of a positive emotion word is 1 and that of a negative emotion word is -1.
The modifying adverb dictionary contains 219 modifying adverbs with 6 weight values depending on the word: 2, 1.75, 1.5, 1.25, 0.5 and 0.25, as shown in Table 2:
TABLE 2 Example entries from the degree adverb dictionary [table image not reproduced]
The negation word dictionary contains 58 negation words, each assigned the value -1, as shown in Table 3:
TABLE 3 Example entries from the negation word dictionary [table image not reproduced]
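One possible in-memory layout for the three dictionaries is sketched below in Python, under stated assumptions: the entries are a handful of English glosses from Example 1 plus hypothetical extras, not the actual dictionary contents; the weight 1.25 for "quite" is an assumption (it is the value implied by the QCSP total in Example 1); and the `lookup` helper is hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical excerpts of the three dictionaries from Step 2 (English glosses).
# The full dictionaries hold 7176 positive and 12062 negative emotion words,
# 219 modifying adverbs and 58 negation words.
EMOTION_WORDS = {
    "nice-looking": 1, "smooth": 1, "good": 1, "satisfied": 1,  # positive: +1
    "bad": -1, "slow": -1,  # negative: -1 (hypothetical entries)
}
ADVERB_WEIGHT_LEVELS = (2.0, 1.75, 1.5, 1.25, 0.5, 0.25)  # the six documented levels
ADVERB_WEIGHTS = {
    "very": 1.75,     # weight stated in Example 1
    "quite": 1.25,    # assumption: implied by the QCSP total in Example 1
    "slightly": 0.5,  # hypothetical entry
}
NEGATION_WORDS = {"not": -1, "no": -1}  # every negation word is assigned -1

# Every adverb weight must be one of the documented levels.
assert all(w in ADVERB_WEIGHT_LEVELS for w in ADVERB_WEIGHTS.values())


def lookup(word: str) -> Tuple[str, Optional[float]]:
    """Return which dictionary a word belongs to and its value, if any."""
    if word in EMOTION_WORDS:
        return "emotion", EMOTION_WORDS[word]
    if word in ADVERB_WEIGHTS:
        return "adverb", ADVERB_WEIGHTS[word]
    if word in NEGATION_WORDS:
        return "negation", NEGATION_WORDS[word]
    return "other", None
```

Keeping the three dictionaries separate mirrors the three-way split described above and lets each later step query only the dictionary it needs.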
Step 3: Feature extraction from the preprocessed comment text based on the improved HowNet dictionary comprises:
a) determining the number of positive emotion words;
b) determining the number of negative emotion words;
c) determining the number and positions of negation words;
d) determining the number and positions of modifying adverbs.
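The four features of Step 3 could be extracted along the following lines. The dictionary excerpts and the function name `extract_features` are hypothetical; positions are 0-based indexes into the preprocessed word list, and the number of negation words or adverbs is simply the length of the corresponding position list.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical dictionary excerpts; see Step 2 for the real sizes.
EMOTION_WORDS = {"nice-looking": 1, "smooth": 1, "good": 1, "satisfied": 1, "bad": -1}
ADVERB_WEIGHTS = {"very": 1.75, "quite": 1.25}
NEGATION_WORDS = {"not", "no"}


def extract_features(words: List[str]) -> Dict[str, object]:
    """Compute the four Step 3 features for one preprocessed comment."""
    polarity_counts = Counter(EMOTION_WORDS[w] for w in words if w in EMOTION_WORDS)
    return {
        "n_positive": polarity_counts[1],   # a) number of positive emotion words
        "n_negative": polarity_counts[-1],  # b) number of negative emotion words
        # c) 0-based positions of negation words
        "negation_positions": [i for i, w in enumerate(words) if w in NEGATION_WORDS],
        # d) 0-based positions of modifying adverbs
        "adverb_positions": [i for i, w in enumerate(words) if w in ADVERB_WEIGHTS],
    }
```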
Step 4: Emotion phrase identification: emotion words, modifying adverbs and negation words form emotion phrases.
To improve the accuracy of dictionary-based emotion analysis, the dependency relationships among modifying adverbs, negation words and emotion words in UGC must also be taken into account; if the dependency-syntax phenomena in the context are ignored, dictionary-based emotion analysis produces large errors.
To reduce the emotion calculation deviation caused by Chinese semantics, Chinese Dependency Parsing (DP) can be introduced into the emotion polarity value calculation; it reveals the syntactic structure of a language unit by analyzing the dependency relationships between its components. The basic task of syntactic analysis is to determine the syntactic structure of a sentence or the dependency relationships between its words; this is not the final goal of a natural language processing task, but it is often a key step toward that goal. For simplicity, the invention only considers the dependency between an emotion word and the degree adverbs located before it; for negation words, only their effect on emotion polarity is considered, not the effect of their position on emotion intensity.
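The text relies on dependency relations to attach degree adverbs to the emotion word that follows them. The sketch below swaps in a much simpler rule as an explicit assumption, not the patent's parser: every negation word or modifying adverb seen since the previous emotion word is attached to the next emotion word. `EmotionPhrase` and `identify_phrases` are hypothetical names, and the dictionaries are small illustrative excerpts.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical dictionary excerpts (English glosses).
EMOTION_WORDS = {"nice-looking": 1, "smooth": 1, "good": 1, "satisfied": 1, "bad": -1}
ADVERB_WEIGHTS = {"very": 1.75, "quite": 1.25}
NEGATION_WORDS = {"not": -1, "no": -1}


@dataclass
class EmotionPhrase:
    emotion_word: str
    s: int  # polarity of the emotion word: +1 or -1
    adverb_weights: List[float] = field(default_factory=list)  # w_a values
    negation_weights: List[int] = field(default_factory=list)  # w_n values (-1 each)


def identify_phrases(words: List[str]) -> List[EmotionPhrase]:
    """Attach preceding modifiers to each emotion word.

    Simplifying assumption (not a dependency parser): every modifying adverb
    or negation word seen since the previous emotion word belongs to the
    next emotion word; modifiers after the last emotion word are dropped.
    """
    phrases: List[EmotionPhrase] = []
    adverbs: List[float] = []
    negations: List[int] = []
    for w in words:
        if w in ADVERB_WEIGHTS:
            adverbs.append(ADVERB_WEIGHTS[w])
        elif w in NEGATION_WORDS:
            negations.append(NEGATION_WORDS[w])
        elif w in EMOTION_WORDS:
            phrases.append(EmotionPhrase(w, EMOTION_WORDS[w], adverbs, negations))
            adverbs, negations = [], []  # start collecting for the next phrase
    return phrases
```

A real implementation would replace the window rule with the head-dependent arcs produced by a Chinese dependency parser.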
Step 5: Emotion phrase polarity calculation: obtain the emotion value of each emotion word from the improved HowNet dictionary; the emotion polarity value of a positive emotion word is 1 and that of a negative emotion word is -1;
obtain the weights of the modifying adverbs and negation words and multiply them by the emotion value of the emotion word to obtain the emotion polarity value of the emotion phrase; when n modifying adverbs are present, normalize by raising the absolute value of this product to the power 1/n, and give the result the same sign as the original product;
as noted above, only emotion phrases with a simple structure are considered, whose general form is (negation word) | (modifying adverb) + emotion word; the emotion polarity value PS of the emotion phrase is calculated as:
$$PS = \operatorname{sgn}(P)\,\lvert P\rvert^{1/n}, \qquad P = S\prod_{i=1}^{m} w_{n,i}\prod_{k=1}^{n} w_{a,k}$$
where w_{n,i} is the value assigned to the i-th negation word and w_{a,k} is the weight of the k-th modifying adverb; m and n are the numbers of negation words and modifying adverbs, respectively; S is the emotion polarity value of the emotion word, equal to 1 or -1.
Modifying adverb weights are assigned uniformly by intensity in steps of 0.25 within the range [0.25, 2]; see Table 2.
When calculating emotion phrase polarity values, a positive emotion word whose modifiers include a negation word is counted in the negative emotion set; likewise, a negative emotion word whose modifiers include a negation word is counted in the positive emotion set. For both positive and negative emotion words, the polarity of the emotion phrase flips each time a negation word occurs in the phrase.
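Step 5 and the negation rule above can be written as a small Python function. It follows the formula for PS as described in the text: the product of S, the negation weights and the adverb weights, with its absolute value raised to the power 1/n and its sign kept, so every negation flips the sign. The text does not say what happens when a phrase has no modifying adverb (n = 0); returning the plain product in that case is an assumption, and `phrase_polarity` is a hypothetical name.

```python
import math
from typing import Sequence


def phrase_polarity(s: int, negation_weights: Sequence[float],
                    adverb_weights: Sequence[float]) -> float:
    """Emotion polarity value PS of one emotion phrase (Step 5).

    s: +1 or -1 from the emotion word dictionary.
    negation_weights: one -1 per negation word, so each negation flips the sign.
    adverb_weights: the w_a values; with n of them, |product| is raised to the
    power 1/n and the sign of the product is kept.
    """
    product = s * math.prod(negation_weights) * math.prod(adverb_weights)
    n = len(adverb_weights)
    if n == 0:
        # Assumption: the text only covers phrases with adverbs, so a phrase
        # without adverbs keeps its plain product (+1 or -1).
        return float(product)
    return math.copysign(abs(product) ** (1 / n), product)


print(phrase_polarity(1, [], [1.75]))          # "very satisfied"
print(phrase_polarity(1, [-1], [1.75, 1.75]))  # negated phrase with two adverbs
```

Because |S times the product of the w_n| is always 1, the magnitude of PS equals the geometric mean of the adverb weights, so it stays within [0.25, 2] however many adverbs are stacked.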
Step 6: Sentence emotion polarity calculation:
the emotion polarity value of the sentence is calculated by the following formula:
[Formula image: sentence emotion polarity value computed from PS_i and j]
where PS_i is the emotion polarity value of the i-th emotion phrase and j is the number of emotion phrases in the sentence.
The sentence emotion polarity value lies in the range [-5, 5].
Because the weight of a single modifying adverb lies in [0.25, 2], and to prevent excessively large emotion values when many modifying adverbs are used, the emotion phrase polarity value is computed by taking the 1/n power and averaging, which moderates the influence of the degree adverbs in the phrase on its emotion polarity value. Since user ratings on current e-commerce platforms commonly use a five-level scale, the UGC emotion polarity value is adjusted to the range [-5, 5] when it is calculated.
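The sentence formula itself appears only as an image in the source, so the sketch below assumes one reading that is consistent with the stated [-5, 5] range and with the numbers in Example 1: average the phrase values PS_i over the j phrases and rescale by 5/2. The factor follows from the phrase formula, since the 1/n-power step keeps each |PS_i| at or below the maximum adverb weight of 2. Treat this as a reconstruction under that assumption, not the published formula; `sentence_polarity` is a hypothetical name.

```python
from typing import Sequence


def sentence_polarity(phrase_values: Sequence[float],
                      max_phrase_abs: float = 2.0,
                      scale_max: float = 5.0) -> float:
    """Sentence emotion polarity value in [-5, 5] under an assumed formula.

    Assumed reading of Step 6: average the phrase values PS_i over the j
    phrases, then rescale from [-max_phrase_abs, max_phrase_abs] to
    [-scale_max, scale_max]. Adverb weights are at most 2 and the 1/n power
    keeps |PS_i| <= 2, hence the default factor 5 / 2.
    """
    j = len(phrase_values)
    if j == 0:
        return 0.0  # assumption: no emotion phrase means a neutral sentence
    mean = sum(phrase_values) / j
    return mean * scale_max / max_phrase_abs


# Phrase values for Example 1; 1.25 for "quite" is inferred, not stated.
print(sentence_polarity([1.25, 1.75, 1.75, 1.75]))  # 4.0625
```

With the Example 1 phrase values the mean is 1.625 and the scaled value is 4.0625, which agrees with the 4.062 reported for the method; the 1.25 weight is the value implied by the reported QCSP total of 6.5.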
Example 1
The UGC data for this experiment were crawled from an e-commerce platform using a Python crawler. The UGC consists of comments users posted about their purchases; the data are text containing punctuation, characters and whitespace. This example uses a single UGC comment.
The single UGC comment reads: "Package received; it looks quite nice; after using it for a while it runs very smoothly; the logistics and service are very good; overall very satisfied."
FIG. 1 shows the preprocessing of this comment.
This example contains 4 positive emotion words: "nice-looking", "smooth", "good" and "satisfied". Each emotion word has one modifying adverb: "nice-looking" is modified by "quite", while "smooth", "good" and "satisfied" are each modified by "very", with weight 1.75.
With the sentence comment emotion polarity analysis method considering modifiers, the final sentence emotion polarity value is 4.062, as shown in FIG. 4.
The following compares three methods: SP, a simple sentence-level UGC emotion polarity calculation that ignores modifiers; QCSP, a sentence-level UGC emotion polarity calculation that considers modifiers but does not normalize degree adverbs; and the emotion polarity analysis method of the invention.
SP is an emotional tendency analysis method based on simple term frequency (TF): it only counts how often the emotion feature items of a sentence occur in the positive and negative emotion categories, judges the sentence's emotion type by which category occurs more often, and from this derives the numbers of positive and negative items and the sentence's emotion polarity value. Because comment UGC consists largely of short texts, the emotion words in each comment sentence are used as its feature items.
QCSP multiplies the weights of all modifiers directly by the emotion value of the emotion word to obtain the polarity value of the emotion phrase, and then sums the polarity values of all emotion phrases in the sentence to obtain the sentence's emotion polarity value.
The SP method is calculated as:
$$SP = \sum_{i} S_i$$
where S_i is the emotion polarity value of the i-th emotion word. The QCSP method is calculated as:
$$QCSP = \sum_{i=1}^{j} PS'_i, \qquad PS'_i = S\prod_{l=1}^{m} w_{n,l}\prod_{k=1}^{n} w_{a,k}$$
where PS'_i is the emotion polarity value of the i-th emotion phrase calculated by the QCSP method, S is the emotion polarity value of that phrase's emotion word, and the products run over the phrase's negation words and modifying adverbs.
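The two baseline formulas can be sketched as below and checked against the values reported for Example 1. The phrase tuples encode the four phrases of Example 1, none of which contains a negation word; the weight 1.25 for "quite" is an assumption inferred from the reported QCSP total of 6.5 (6.5 - 3 x 1.75 = 1.25), not a value stated in the text. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One emotion phrase: (S, negation weights, adverb weights).
Phrase = Tuple[int, List[float], List[float]]


def sp_score(emotion_values: Sequence[int]) -> int:
    """SP baseline: the sum of the emotion word polarity values S_i."""
    return sum(emotion_values)


def qcsp_phrase(s: int, negation_weights: Sequence[float],
                adverb_weights: Sequence[float]) -> float:
    """QCSP phrase value PS': S times every modifier weight, no normalization."""
    return s * math.prod(negation_weights) * math.prod(adverb_weights)


def qcsp_score(phrases: Sequence[Phrase]) -> float:
    """QCSP sentence value: the sum of PS' over all emotion phrases."""
    return sum(qcsp_phrase(s, neg, adv) for s, neg, adv in phrases)


# Example 1: four positive phrases, no negation words.
# The 1.25 weight for "quite" is an assumption; 1.75 for "very" is stated.
example_1: List[Phrase] = [(1, [], [1.25]), (1, [], [1.75]), (1, [], [1.75]), (1, [], [1.75])]
print(sp_score([s for s, _, _ in example_1]))  # 4
print(qcsp_score(example_1))                   # 6.5
```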
As FIG. 2 shows, under the SP method the single UGC of this example contains 4 positive emotion words, "nice-looking", "smooth", "good" and "satisfied", so the emotion polarity value of the example sentence is 4, i.e. the UGC is positive.
As FIG. 3 shows, under the QCSP method the positive emotion score of the UGC is 6.5, the negative emotion score is 0, and the final emotion polarity value is 6.5.
FIG. 4 shows the emotion polarity value of 4.062 calculated using the method of the invention.
This value is larger than the SP result of 4 because the SP method only counts the emotion words in the UGC sentence and ignores how the modifying adverbs and negation words in the sentence affect the expressed strength of those emotion words.
The value of 4.062 is smaller than the QCSP value of 6.5. The QCSP method applies the weights of the modifying adverbs and negation words directly to the emotion polarity values of the emotion words, and does not account for the fact that, owing to different expression habits, different users use different numbers of modifying adverbs to express the same meaning. The method of the invention considers the influence of modifying adverbs on the emotion polarity value and also handles the case in which adverbs with the same meaning repeatedly modify the same emotion word: it applies a (1/n)-th power, averaging and normalization to the QCSP-style emotion polarity values, so that adverbs repeated because of a user's expression habits do not inflate the result. The results of Example 1 indicate that the emotion polarity value calculated by the invention is a more useful reference.
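As a worked illustration of the (1/n)-th power, assume a hypothetical phrase such as "not very very good": one negation word ($w_n = -1$, $m = 1$), two adverbs of weight 1.5 (a level listed in claim 3, so $n = 2$) and a positive emotion word ($S = 1$). Following the phrase rule in claim 1 (product of the weights, then the 1/n power of the absolute value with the original sign kept):

```latex
\begin{aligned}
PS'_{\mathrm{QCSP}} &= w_n^{m}\prod_{k=1}^{n} w_{a,k}\, S = (-1)^{1}\cdot 1.5\cdot 1.5\cdot 1 = -2.25,\\
PS &= \operatorname{sgn}(-2.25)\,\lvert -2.25\rvert^{1/2} = -1.5 .
\end{aligned}
```

With a single adverb of weight 1.5 the same rule gives $\lvert -1.5\rvert^{1/1} = 1.5$ in magnitude, so repeating an adverb of the same strength does not increase the phrase's magnitude under this rule, whereas the plain QCSP product grows from 1.5 to 2.25.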
Example 2
A Python crawler was used to collect 1000 comment UGC texts from an e-commerce platform, which were stored as text files.
First, for 50 of the comment UGC texts, the UGC emotion polarity values were calculated with the method of the invention, the QCSP method and the SP method, and the results were analyzed.
FIG. 5 compares the emotion polarity values of the 50 comment UGC texts under the three calculation methods.
When the emotion polarity value of the comment UGC is larger than 0, namely the positive emotion score is larger than the negative emotion score, the overall emotional tendency of the single comment is positive. When the emotion polarity value of the comment UGC is smaller than 0, namely the positive emotion score is smaller than the negative emotion score, the overall emotional tendency of the single comment is negative.
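The sign rule above is simple enough to state directly in code. In this sketch the function name `tendency` is hypothetical, and the label for a value of exactly 0 is an assumption, since the text only defines the positive and negative cases.

```python
def tendency(polarity: float) -> str:
    """Map a comment's emotion polarity value to its overall emotional tendency."""
    if polarity > 0:  # positive emotion score exceeds negative emotion score
        return "positive"
    if polarity < 0:  # negative emotion score exceeds positive emotion score
        return "negative"
    return "neutral"  # assumption: the source does not say how 0 is labeled


if __name__ == "__main__":
    print(tendency(6.5))     # → positive
    print(tendency(-2.292))  # → negative
```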
For example, the original text of the 1st comment is: "Overall it's okay, but there are a few drawbacks: 1) the bathroom smells bad; 2) transport is inconvenient: you have to cross the bridge to the other side of the street to get a taxi; 3) the restaurant is expensive."
As shown in FIGs. 6 to 8, the emotion polarity scores of the 1st comment under the SP method, the QCSP method and the method of the invention are 1, -1.5 and -2.292, respectively.
For the same comment UGC, the three methods give different emotion polarity results. Although all three identify the same positive and negative emotion words, the SP method only counts them and ignores the modifying adverbs in front of the emotion words, and therefore obtains a positive value of 1, indicating positive polarity.
Both the QCSP method and the method of the invention give negative values, indicating negative emotion, because they take into account how modifying adverbs affect the expressed intensity of the emotion words, which yields more accurate emotion polarity values.
The absolute value calculated by the invention being larger than that of the QCSP method does not mean that the invention judges the comment's negative emotion to be stronger. Rather, by normalizing its results, the invention maps the emotion polarity values of all comment UGC into the common range [-5, 5], whether a comment is long or short, which reduces the error that users' concise or verbose expression habits introduce into the emotion polarity value.
Therefore, emotion polarity calculation based on an emotion dictionary that takes modifying adverbs into account can determine the emotional tendency of a UGC text, and the magnitude of the value indicates the relative emotional intensity of different UGCs. This gives a better measure of the user emotion contained in the comment UGC that users publish.
Second, emotion polarity values were calculated and analyzed for all 1000 UGC texts.
Fig. 9 is a comparison of emotion polarity values of three calculation methods of 1000 comment UGC texts.
Table 4 Statistics of the emotion polarity values of the 1000 comment UGC texts
Method | Maximum | Minimum | Mean | Variance
SP | 28 | -7 | 1.25 | 6.42
QCSP | 52.5 | -12 | 1.68 | 13.34
The invention | 5 | -5 | 0.69 | 3.83
As can be seen from FIG. 9 and Table 4, the emotion polarity values obtained by the SP method, the QCSP method and the method of the invention lie in the ranges [-7, 28], [-12, 52.5] and [-5, 5], respectively. The boundary values produced by the SP and QCSP methods clearly change significantly from one comment UGC set to another, so the same emotion polarity value from either method may express different emotional intensities in UGC corpora from different domains. For example, take the value 2: if the range obtained from a domain-A UGC corpus is [x, 2], then 2 represents the strongest positive emotion; if the range obtained from a domain-B UGC corpus is [2, x], then 2 represents the weakest positive emotion. The information conveyed is therefore ambiguous and confusing. The values calculated by the method of the invention stay within [-5, 5] and do not fluctuate widely with the UGC, so the method provides a better reference value.
It should be understood that the above description does not limit the invention, which is not restricted to the above examples; those skilled in the art may make modifications, alterations, additions or substitutions within the spirit and scope of the invention.
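The range argument above can be made concrete by locating a value within the observed range of each method. The min-max position used here is an illustration chosen for this sketch, not part of the patent; the function name `relative_position` is hypothetical, and the ranges come from Table 4.

```python
def relative_position(value: float, low: float, high: float) -> float:
    """Position of `value` inside the observed range [low, high]:
    0.0 at the most negative end, 1.0 at the most positive end."""
    if high == low:
        raise ValueError("degenerate range: low and high are equal")
    return (value - low) / (high - low)


if __name__ == "__main__":
    # Observed ranges from Table 4
    ranges = {"SP": (-7, 28), "QCSP": (-12, 52.5), "invention": (-5, 5)}
    for name, (lo, hi) in ranges.items():
        print(name, round(relative_position(2, lo, hi), 3))
    # The [x, 2] vs [2, x] example: the same value 2 at opposite ends
    print(relative_position(2, 0, 2), relative_position(2, 2, 10))  # → 1.0 0.0
```

For the value 2 this gives about 0.257 (SP), 0.217 (QCSP) and 0.7 (invention). The first two shift whenever a new corpus changes the observed range, while the fixed [-5, 5] range keeps the position of 2 constant.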

Claims (5)

1. A sentence comment emotion polarity analysis method considering modifiers is characterized by comprising the following steps:
Step 1: Preprocess the comment text: segment the comment text into words and delete stop words, punctuation marks and spaces;
Step 2: Improve the HowNet dictionary;
Step 3: Extract features from the preprocessed comment text based on the improved HowNet dictionary;
Step 4: Identify emotion phrases: emotion words, modifying adverbs and negation words form emotion phrases;
Step 5: Calculate the emotion polarity value of each emotion phrase: obtain the emotion value of the emotion word from the improved HowNet dictionary, where the emotion polarity value of a positive emotion word is 1 and that of a negative emotion word is -1;
obtain the weights of the modifying adverbs and negation words, and take the product of these weights and the emotion value of the emotion word as the emotion polarity value of the emotion phrase; when there are n modifying adverbs, normalize the absolute value of the phrase's emotion polarity value by raising it to the power 1/n, and give the result the same sign as the original value;
the calculation formula of the emotion polarity value PS of the emotion phrase is as follows:
$$PS = \operatorname{sgn}\!\Big(w_n^{m}\prod_{k=1}^{n} w_{a,k}\,S\Big)\cdot\Big|\,w_n^{m}\prod_{k=1}^{n} w_{a,k}\,S\,\Big|^{1/n}$$
where $w_n$ is the value assigned to a negation word and $w_{a,k}$ is the weight of the $k$-th modifying adverb; $m$ and $n$ are the numbers of negation words and modifying adverbs, respectively; $S$ is the emotion polarity value of the emotion word, and $S$ is 1 or -1;
Step 6: Calculate the sentence emotion polarity value:
the calculation formula of the emotion polarity value of the sentence is as follows:
Figure FDA0002784440660000012
where $PS_i$ is the emotion polarity value of the $i$-th emotion phrase and $j$ is the number of emotion phrases in the sentence.
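Steps 4 and 5 of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the three mini-dictionaries use hypothetical English stand-in entries (the real improved HowNet dictionary is Chinese), pending negation words and adverbs are attached to the next emotion word, and when there are no adverbs ($n = 0$) the plain product is returned because the claim only defines the 1/n power when adverbs exist. The sentence-level formula of step 6 is given only as an image and is not reproduced here.

```python
import math

# Hypothetical stand-ins for the improved HowNet dictionary (claim 3 values).
EMOTION = {"good": 1, "fluent": 1, "expensive": -1, "inconvenient": -1}
ADVERB = {"extremely": 2.0, "very": 1.5, "slightly": 0.5}
NEGATION = {"not": -1.0}


def identify_phrases(tokens):
    """Step 4 (sketch): attach preceding negation words and adverbs
    to the next emotion word, producing (negations, adverbs, S) tuples."""
    phrases, negs, advs = [], [], []
    for tok in tokens:
        if tok in NEGATION:
            negs.append(NEGATION[tok])
        elif tok in ADVERB:
            advs.append(ADVERB[tok])
        elif tok in EMOTION:
            phrases.append((negs, advs, EMOTION[tok]))
            negs, advs = [], []  # start collecting modifiers for the next phrase
        # assumption: other tokens neither join nor break a pending phrase
    return phrases


def phrase_value(negs, advs, s):
    """Step 5: product of all weights and S, then |x| ** (1/n) with the sign kept."""
    raw = math.prod(negs) * math.prod(advs) * s
    n = len(advs)
    if n == 0:
        return raw  # assumption: no 1/n power when there are no adverbs
    return math.copysign(abs(raw) ** (1.0 / n), raw)


if __name__ == "__main__":
    phrases = identify_phrases(["very", "very", "good", "not", "fluent"])
    print([phrase_value(*p) for p in phrases])  # → [1.5, -1.0]
```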
2. The method for analyzing emotion polarity of sentence comments in consideration of modifiers according to claim 1, wherein step 1 specifically comprises: the preprocessing includes word segmentation and the deletion of punctuation marks, stop words, line-feed characters and space characters, and the preprocessed comment text is stored as a list in which each element is a single word stored as a string.
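The preprocessing of claim 2 can be sketched as a filter over already segmented words. Assumptions: the input is the output of a Chinese word segmenter (not shown), the stop-word list is a hypothetical placeholder, and the punctuation set combines ASCII punctuation with a few common full-width Chinese marks.

```python
import string

STOP_WORDS = {"the", "a", "is"}  # hypothetical placeholder stop-word list
PUNCTUATION = set(string.punctuation) | set("，。！？；：、（）“”")


def preprocess(tokens):
    """Drop stop words, punctuation, line feeds and spaces; return a list
    whose elements are single words stored as str (claim 2 sketch)."""
    result = []
    for tok in tokens:
        word = tok.strip()  # removes spaces and line-feed characters
        if not word or word in STOP_WORDS:
            continue
        if all(ch in PUNCTUATION for ch in word):
            continue  # token consists only of punctuation
        result.append(word)
    return result


if __name__ == "__main__":
    print(preprocess(["the", "room", " ", "is", "clean", "!", "\n", "，"]))
    # → ['room', 'clean']
```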
3. The method for analyzing emotion polarity of sentence comments in consideration of modifiers according to claim 1, wherein the step 2 specifically comprises:
the HowNet dictionary is supplemented with emotion words, modifying adverbs and negation words; after this supplementation it is divided into three categories: an emotion word dictionary, a modifying adverb dictionary and a negation word dictionary;
the emotion word dictionary comprises positive emotion words and negative emotion words; the emotion polarity value of a positive emotion word is 1, and that of a negative emotion word is -1;
the modifying adverb dictionary assigns one of 6 weight values according to the adverb: 2, 1.75, 1.5, 1.25, 0.5 and 0.25;
the value assigned to a negation word in the negation word dictionary is -1.
4. The method for analyzing emotion polarity of sentence comments taking modifiers into account as claimed in claim 1, wherein the step 3 of feature extraction specifically comprises:
a) determining the number of positive emotion words;
b) determining the number of negative emotion words;
c) determining the number and positions of negation words;
d) determining the number and positions of modifying adverbs.
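The four features of claim 4 can be extracted in a single pass over the preprocessed word list. In this sketch the dictionaries are plain `dict`s passed in by the caller, positions are 0-based token indices (an assumption; the claim does not define the position format), and the name `extract_features` is hypothetical.

```python
def extract_features(tokens, emotion, adverbs, negations):
    """Claim 4 sketch: counts of positive/negative emotion words, plus the
    counts and 0-based token positions of negation words and adverbs."""
    neg_pos = [i for i, t in enumerate(tokens) if t in negations]
    adv_pos = [i for i, t in enumerate(tokens) if t in adverbs]
    return {
        "positive": sum(1 for t in tokens if emotion.get(t) == 1),
        "negative": sum(1 for t in tokens if emotion.get(t) == -1),
        "negation_count": len(neg_pos),
        "negation_positions": neg_pos,
        "adverb_count": len(adv_pos),
        "adverb_positions": adv_pos,
    }


if __name__ == "__main__":
    feats = extract_features(
        ["very", "good", "not", "fluent", "expensive"],
        emotion={"good": 1, "fluent": 1, "expensive": -1},
        adverbs={"very": 1.5},
        negations={"not": -1.0},
    )
    print(feats["positive"], feats["negative"], feats["negation_positions"])
    # → 2 1 [2]
```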
5. The method of claim 1, wherein the value range of the emotion polarity value of the sentence is [-5, 5].
CN202011293192.1A 2020-11-18 2020-11-18 Sentence comment emotion polarity analysis method considering modifiers Pending CN112364646A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011293192.1A CN112364646A (en) 2020-11-18 2020-11-18 Sentence comment emotion polarity analysis method considering modifiers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011293192.1A CN112364646A (en) 2020-11-18 2020-11-18 Sentence comment emotion polarity analysis method considering modifiers

Publications (1)

Publication Number Publication Date
CN112364646A true CN112364646A (en) 2021-02-12

Family

ID=74532508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011293192.1A Pending CN112364646A (en) 2020-11-18 2020-11-18 Sentence comment emotion polarity analysis method considering modifiers

Country Status (1)

Country Link
CN (1) CN112364646A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128198A (en) * 2021-04-23 2021-07-16 广州酷狗计算机科技有限公司 (Guangzhou Kugou Computer Technology Co., Ltd.) Feedback information processing method and device, computer equipment and storage medium
CN113688620A (en) * 2021-08-26 2021-11-23 北京阅神智能科技有限公司 (Beijing Yueshen Intelligent Technology Co., Ltd.) Article emotion analysis method and device
CN117851688A (en) * 2024-03-06 2024-04-09 成都理工大学 (Chengdu University of Technology) Personalized recommendation method based on deep learning and user comment content

Citations (1)

Publication number Priority date Publication date Assignee Title
CN110598219A (en) * 2019-10-23 2019-12-20 安徽理工大学 (Anhui University of Science and Technology) Emotion analysis method for Douban movie reviews

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN110598219A (en) * 2019-10-23 2019-12-20 安徽理工大学 (Anhui University of Science and Technology) Emotion analysis method for Douban movie reviews

Non-Patent Citations (1)

Title
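武雅利 (Wu Yali): "基于情感词典的用户生成内容个性化情感分析" (Personalized sentiment analysis of user-generated content based on an emotion dictionary), Wanfang: HTTPS://D.WANFANGDATA.COM.CN/THESIS/CHJUAGVZAXNOZXDTMJAYMZAXMTISCUQWMJA1MDI3NROICZLXZWXKB20%3D *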

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN113128198A (en) * 2021-04-23 2021-07-16 广州酷狗计算机科技有限公司 (Guangzhou Kugou Computer Technology Co., Ltd.) Feedback information processing method and device, computer equipment and storage medium
CN113688620A (en) * 2021-08-26 2021-11-23 北京阅神智能科技有限公司 (Beijing Yueshen Intelligent Technology Co., Ltd.) Article emotion analysis method and device
CN113688620B (en) * 2021-08-26 2024-03-22 北京阅神智能科技有限公司 (Beijing Yueshen Intelligent Technology Co., Ltd.) Article emotion analysis method and device
CN117851688A (en) * 2024-03-06 2024-04-09 成都理工大学 (Chengdu University of Technology) Personalized recommendation method based on deep learning and user comment content
CN117851688B (en) * 2024-03-06 2024-05-03 成都理工大学 (Chengdu University of Technology) Personalized recommendation method based on deep learning and user comment content

Similar Documents

Publication Publication Date Title
Bonta et al. A comprehensive study on lexicon based approaches for sentiment analysis
CN111767741B (en) Text emotion analysis method based on deep learning and TFIDF algorithm
Xu et al. Hierarchical emotion classification and emotion component analysis on Chinese micro-blog posts
CN112364646A (en) Sentence comment emotion polarity analysis method considering modifiers
CN111666480A (en) False comment identification method based on rolling type collaborative training
CN112861541B (en) Commodity comment sentiment analysis method based on multi-feature fusion
KR101851788B1 (en) Apparatus and method for updating dictionary of text sentimental analysis
Kaushik et al. A study on sentiment analysis: methods and tools
CN107357860A (en) A kind of personal share mood assemblage method based on news data
Dubey et al. Extended opinion lexicon and ML-based sentiment analysis of tweets: a novel approach towards accurate classifier
Sarkar Sentiment polarity detection in Bengali tweets using LSTM recurrent neural networks
Rao et al. Detection of sarcasm on amazon product reviews using machine learning algorithms under sentiment analysis
Chinnalagu et al. Context-based sentiment analysis on customer reviews using machine learning linear models
Ghosh et al. A comparative study of different classification techniques for sentiment analysis
Imani et al. Aspect extraction and classification for sentiment analysis in drug reviews
Fasha et al. Opinion mining using sentiment analysis: a case study of readers’ response on long Litt Woon’s the way through the woods in goodreads
Anjali et al. A novel sentiment classification of product reviews using Levenshtein distance
Ahmad et al. Ranking system for opinion mining of features from review documents
Šandor et al. Sarcasm detection in online comments using machine learning
KR101851795B1 (en) Apparatus and method for update of emotion dictionary using domain-specific terminology
Niu et al. Sentiment analysis and contrastive experiments of long news texts
Fouadi et al. Applications of deep learning in Arabic sentiment analysis: research perspective
KR101851794B1 (en) Apparatus and Method for Generating Emotion Scores for Target Phrases
CN115906824A (en) Text fine-grained emotion analysis method, system, medium and computing equipment
Modak et al. A study on sentiment analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210212