CN112711941A - Emotional score analysis processing method based on emotional dictionary entity - Google Patents

Info

Publication number
CN112711941A
Authority
CN
China
Prior art keywords
entity
emotional
entities
emotion
score
Prior art date
2021-01-08
Legal status
Granted
Application number
CN202110021645.3A
Other languages
Chinese (zh)
Other versions
CN112711941B (en)
Inventor
张娴
王盼盼
周庆勇
Current Assignee
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date
2021-01-08
Filing date
2021-01-08
Publication date
2021-04-27
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202110021645.3A (granted as CN112711941B)
Publication of CN112711941A
Application granted
Publication of CN112711941B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method for analyzing and processing emotion scores based on emotion dictionary entities. It belongs to the field of natural language processing and comprises six steps: 1) preparing dictionaries; 2) defining the entity structure; 3) building an entity comparator; 4) traversing the text to be analyzed with the defined entities to generate all candidate entities; 5) screening the candidate entities; and 6) calculating the emotion score. The method builds entities from four dictionaries (emotion words, degree adverbs, negation words and punctuation marks) and applies fine-grained processing when traversing entities, which reduces errors.

Description

Emotional score analysis processing method based on emotional dictionary entity
Technical Field
The invention relates to the field of natural language processing, in particular to an emotion score analysis processing method based on an emotion dictionary entity.
Background
What is sentiment analysis? Briefly, it is the process of analyzing, processing, summarizing and reasoning about subjective, emotionally colored text. The internet (for example blogs, forums, and review and social services such as Dianping and Meituan) generates a large amount of valuable review information about people, events, products and so on. These comments express people's emotional colorings and tendencies, such as joy, anger, sorrow and happiness, or criticism and praise. By browsing such subjective comments, potential users can learn public opinion about a given event or product. Progress in this area has been driven by the rapid growth of online social media such as product reviews, forum discussions and microblogs. Since the early 2000s, sentiment analysis has become one of the most active research areas in natural language processing (NLP) and is widely studied in data mining, Web mining, text mining and information retrieval. At present, emotional orientation is mainly analyzed either by text classification or by dictionary-based methods. Classification methods require training samples to be labeled manually, which consumes labor and resources; dictionary-based methods either consider only a single dictionary (the emotion dictionary) or introduce errors when searching for emotion words.
Disclosure of Invention
To solve these technical problems, the invention provides an emotion score analysis processing method based on emotion dictionary entities. It applies fine-grained processing when traversing entities, which reduces errors, and it aims to analyze and score unstructured emotional text data using text processing and statistical methods.
The technical scheme of the invention is as follows:
An analysis processing method for emotion scores based on emotion dictionary entities
comprises six steps:
1) preparing dictionaries;
2) defining the entity structure;
3) building an entity comparator;
4) traversing the text to be analyzed according to the defined entities to generate all candidate entities;
5) screening the candidate entities;
6) calculating the emotion score.
Further,
four dictionaries must first be prepared: emotion words, degree adverbs, negation words and punctuation marks.
Depending on requirements, the four dictionaries come from a general dictionary or from an industry-specific custom dictionary.
Positive emotion words are assigned positive scores (the stronger the emotion, the higher the score), and negative emotion words are assigned negative scores (the stronger the emotion, the lower the score). Each degree adverb in the degree adverb dictionary is assigned a score according to the strength it expresses; in general, the higher the degree a word denotes, the larger its score. The negation dictionary is a simple list of negation words, and the punctuation dictionary contains the marks commonly used to break sentences or segments.
Further,
the entity structure comprises an entity name, a start index, an end index, an entity type and an entity length; the entity type is one of emotion word, degree adverb, negation word or punctuation mark.
Further,
an entity comparator is built. Given two entities, entity one and entity two: if the starting position of entity one is greater than that of entity two, return 1; if it is smaller, return -1; if the two starting positions are equal, compare the lengths of the two entities, returning 1 if entity one is longer than entity two and -1 otherwise.
Further,
to generate candidate entities for a given text to be analyzed, the four dictionaries are traversed in turn. Whenever a dictionary word appears in the text, a corresponding entity is constructed from it and added to the candidate entity list. Once all four dictionaries have been traversed, all candidate entities have been generated; they are then sorted with the defined comparator, so that the candidate entity list is ordered by starting position.
Further,
when screening entities, the candidate entity list is scanned iteratively. If subsequent entities share the starting index of the current entity, the longest of them is taken as the entity for that index, and the next entity kept must start after the end index of that longest entity. Any entity whose starting index is smaller than the end index of the previously kept entity is skipped, and the next entity is examined. The result is the required final entity list.
Further,
the final entity list is traversed. If the current entity is not an emotion entity, it is skipped. If it is an emotion entity, its position is used as an index to search backward (toward the beginning of the text) for the nearest preceding emotion entity or punctuation entity, and the number of emotion entities is recorded.
The emotion score of the current emotion entity is then calculated. Its initial weight is the score of the emotion word. The negation entities and degree adverb entities that appear between the boundary position found above and the emotion entity are collected, excluding the degree adverb + degree adverb + emotion word case, and the score of the emotion entity is: (degree adverb entity score)^(number of degree adverb entities) × (-1)^(number of negation entities) × initial weight.
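Written as a formula, with d the degree adverb score, n_d the number of degree adverb entities, n_n the number of negation entities and w_0 the initial weight, the per-entity score and a worked example look as follows. The numbers are hypothetical (not taken from the source): an emotion word scored 2, preceded in the same clause by one degree adverb scored 2 and one negation word.

```latex
s = d^{\,n_d} \cdot (-1)^{\,n_n} \cdot w_0,
\qquad d = 2,\; n_d = 1,\; n_n = 1,\; w_0 = 2
\;\Rightarrow\; s = 2^{1} \cdot (-1)^{1} \cdot 2 = -4
```

A single negation flips the sign, a second one would flip it back, and each degree adverb scales the magnitude.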
All emotion entities are traversed and their scores are summed to obtain the emotion score of the text to be analyzed; if normalization is required, the sum can be divided by the number of emotion entities.
The advantages of the invention are as follows:
1. The invention is not limited to a specific field or scenario; the emotional text to be analyzed can come from fields such as news, product reviews and public opinion analysis;
2. Text analysis usually performs word segmentation first, which introduces segmentation errors. This method performs no word segmentation or similar preprocessing on the text to be analyzed, which improves accuracy;
3. The method uses four customizable dictionaries, including punctuation entities that delimit sentences or paragraphs, which improves the accuracy of entity search; it also finds the modifier entities that modify each emotion entity and adjusts its weight accordingly.
Drawings
FIG. 1 is a schematic workflow diagram of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The invention provides a method for analyzing and processing emotion scores based on emotion dictionary entities. It is implemented mainly by the following technical scheme, which comprises these steps:
1. Dictionary preparation
First, four dictionaries are prepared: emotion words, degree adverbs, negation words and punctuation marks. Depending on requirements, they can come from general dictionaries or from industry-specific custom dictionaries. Each word in the emotion word dictionary is assigned a score expressing the strength of the emotion: positive emotion words generally receive positive scores (the stronger the emotion, the higher the score), and negative emotion words receive negative scores (the stronger the emotion, the lower the score). Each degree adverb in the degree adverb dictionary is assigned a score according to the strength it expresses; in general, the higher the degree a word denotes, the larger its score. The negation dictionary is a simple list of negation words, and the punctuation dictionary contains the marks commonly used to break sentences or segments.
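The dictionary layout can be sketched in Python as below. The `word<TAB>score` line format, the helper names `load_scored` and `load_plain`, and every sample entry and score are hypothetical; the source specifies neither a storage format nor concrete values. Degree adverb scores are treated here as multipliers (below 1 weakens, above 1 strengthens), which is an assumption chosen to fit the multiplicative score formula of step 6.

```python
from typing import Dict, Iterable, Set


def load_scored(lines: Iterable[str]) -> Dict[str, float]:
    """Parse hypothetical 'word<TAB>score' lines into a word -> score map."""
    scores: Dict[str, float] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        word, score = line.split("\t")
        scores[word] = float(score)
    return scores


def load_plain(lines: Iterable[str]) -> Set[str]:
    """Parse one entry per line (negation words, punctuation marks)."""
    return {raw.strip() for raw in lines if raw.strip()}


# Hypothetical sample data standing in for the four dictionary files.
# Emotion words: sign = polarity, magnitude = strength.
# 好 good, 满意 satisfied, 差 poor, 失望 disappointed
emotion_words = load_scored(["好\t1", "满意\t2", "差\t-1", "失望\t-2"])
# Degree adverbs (assumed multipliers): 有点 a little, 很 very, 非常 extremely
degree_adverbs = load_scored(["有点\t0.5", "很\t1.5", "非常\t2"])
# Negation words: 不 not, 没有 have not
negation_words = load_plain(["不", "没有"])
punctuation = load_plain(["，", "。", "！", "？"])
```

Keeping the negation and punctuation dictionaries as plain sets reflects the text's description of them as simple word lists with no scores.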
2. Defining the structure of an entity
The entity structure comprises an entity name, a start index, an end index, an entity type and an entity length; the entity type is one of emotion word, degree adverb, negation word or punctuation mark. The subsequent calculation steps use these entity attributes.
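One possible Python representation of this structure is a frozen dataclass. The class and field names, the `EntityType` enum and the choice of an inclusive end index are assumptions made for illustration; the inclusive end is suggested by the screening rule, which requires the next entity to start after the previous entity's end index.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    """The four entity types named in the text."""
    EMOTION = "emotion"
    DEGREE = "degree"
    NEGATION = "negation"
    PUNCTUATION = "punctuation"


@dataclass(frozen=True)
class Entity:
    name: str          # the dictionary word matched in the text
    start: int         # index of the first character of the match
    end: int           # index of the last character (inclusive, an assumption)
    etype: EntityType  # which of the four dictionaries the word came from
    length: int        # number of characters in the match

    @classmethod
    def from_match(cls, name: str, start: int, etype: EntityType) -> "Entity":
        """Build an entity from a word and the position where it was found."""
        return cls(name, start, start + len(name) - 1, etype, len(name))


# Usage: the degree adverb 非常 ("extremely") found at character 3 of some text.
e = Entity.from_match("非常", 3, EntityType.DEGREE)
```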
3. Building an entity comparator
For example, given two entities, entity one and entity two: if the starting position of entity one is greater than that of entity two, return 1; if it is smaller, return -1; if the two starting positions are equal, compare the lengths of the two entities, returning 1 if entity one is longer than entity two and -1 otherwise.
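The comparator translates directly into a Python `cmp`-style function used through the standard `functools.cmp_to_key`. This sketch uses a minimal stand-in `Entity` class (hypothetical names) and deviates in one respect: it returns 0 for an exact tie (same start, same length), where the text returns -1, because a comparator that answers -1 for both `(a, b)` and `(b, a)` is inconsistent.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import List


@dataclass(frozen=True)
class Entity:  # minimal stand-in for the step-2 entity structure
    name: str
    start: int
    end: int      # inclusive end index
    etype: str
    length: int


def compare_entities(one: Entity, two: Entity) -> int:
    """Order by starting position, then by length (shorter first)."""
    if one.start > two.start:
        return 1
    if one.start < two.start:
        return -1
    # Equal starting positions: compare lengths.
    if one.length > two.length:
        return 1
    if one.length < two.length:
        return -1
    return 0  # exact tie; the text returns -1, 0 keeps the ordering consistent


def sort_entities(entities: List[Entity]) -> List[Entity]:
    return sorted(entities, key=cmp_to_key(compare_entities))


# Usage with hypothetical entities from the text 不很好 ("not very good").
a = Entity("不", 0, 0, "negation", 1)
b = Entity("很", 1, 1, "degree", 1)
c = Entity("很好", 1, 2, "emotion", 2)
ordered = sort_entities([c, b, a])
```

The same order can be obtained more idiomatically with `sorted(entities, key=lambda e: (e.start, e.length))`.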
4. Generating candidate entities
Given a text to be analyzed, the four dictionaries are traversed in turn. Whenever a dictionary word appears in the text, a corresponding entity is constructed from it and added to the candidate entity list. Once all four dictionaries have been traversed, all candidate entities have been generated; they are sorted with the defined comparator, so that the candidate entity list is ordered by starting position.
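Candidate generation can be sketched as plain substring search with no word segmentation. Assumptions in this sketch: the dictionaries are passed as a hypothetical mapping from entity type to a collection of words, every occurrence (including overlapping ones) becomes a candidate, and sorting uses the key `(start, length)`, which reproduces the order of the step-3 comparator.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class Entity:  # minimal stand-in for the step-2 entity structure
    name: str
    start: int
    end: int      # inclusive end index
    etype: str
    length: int


def find_all(text: str, word: str) -> Iterator[int]:
    """Yield every start index of word in text, including overlapping hits."""
    pos = text.find(word)
    while pos != -1:
        yield pos
        pos = text.find(word, pos + 1)


def generate_candidates(text: str,
                        dictionaries: Dict[str, Iterable[str]]) -> List[Entity]:
    candidates: List[Entity] = []
    for etype, words in dictionaries.items():  # traverse the four dictionaries
        for word in words:
            if not word:
                continue  # an empty entry would match everywhere
            for start in find_all(text, word):
                candidates.append(
                    Entity(word, start, start + len(word) - 1, etype, len(word)))
    # Same order as the step-3 comparator: start position, then length.
    candidates.sort(key=lambda e: (e.start, e.length))
    return candidates


# Usage: 非常不满意。 ("extremely dissatisfied."), with hypothetical dictionaries.
dictionaries = {
    "emotion": ["满意", "不满意"],   # satisfied, dissatisfied
    "degree": ["非常"],              # extremely
    "negation": ["不"],              # not
    "punctuation": ["。"],
}
cands = generate_candidates("非常不满意。", dictionaries)
```

Both 满意 and the longer 不满意 are produced here; resolving such overlaps is left to the screening step.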
5. Screening candidate entities
The candidate entity list is scanned iteratively. If subsequent entities share the starting index of the current entity, the longest of them is taken as the entity for that index. The next entity kept must start after the end index of that longest entity; any entity whose starting index is smaller than the end index of the previously kept entity is skipped, and the next entity is examined. The result is the required final entity list.
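The screening rule amounts to a greedy left-to-right longest match over the sorted candidates. In this sketch (hypothetical names, minimal `Entity` stand-in), an entity is kept only if it starts strictly after the end index of the previously kept entity; treating "start equals previous end" as an overlap is an assumption, since the text only states the "greater than" condition for keeping.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entity:  # minimal stand-in for the step-2 entity structure
    name: str
    start: int
    end: int      # inclusive end index
    etype: str
    length: int


def screen_candidates(candidates: List[Entity]) -> List[Entity]:
    """Keep the longest entity per start index and drop overlapping ones.

    `candidates` must already be sorted by start position (step 4).
    """
    kept: List[Entity] = []
    i = 0
    while i < len(candidates):
        # Gather all candidates sharing this start index; keep the longest.
        longest = candidates[i]
        j = i + 1
        while j < len(candidates) and candidates[j].start == longest.start:
            if candidates[j].length > longest.length:
                longest = candidates[j]
            j += 1
        # Keep it only if it begins after the previously kept entity ends.
        if not kept or longest.start > kept[-1].end:
            kept.append(longest)
        i = j
    return kept


# Usage: sorted candidates for 非常不满意。 ("extremely dissatisfied.").
cands = [
    Entity("非常", 0, 1, "degree", 2),
    Entity("不", 2, 2, "negation", 1),
    Entity("不满意", 2, 4, "emotion", 3),
    Entity("满意", 3, 4, "emotion", 2),
    Entity("。", 5, 5, "punctuation", 1),
]
final = screen_candidates(cands)
```

With this rule, the emotion word 不满意 ("dissatisfied") wins over the negation 不 plus 满意, so the negation is not counted twice.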
6. Calculating an emotion score
The final entity list generated in the previous step is traversed. If the current entity is not an emotion entity, it is skipped. If it is an emotion entity, its position is used as an index to search backward (toward the beginning of the text) for the nearest preceding emotion entity or punctuation entity, and the number of emotion entities is recorded. The emotion score of the current emotion entity is then calculated: its initial weight is the score of the emotion word; the negation entities and degree adverb entities between that boundary position and the emotion entity are collected, excluding the degree adverb + degree adverb case; and the score of the emotion entity is (degree adverb entity score)^(number of degree adverb entities) × (-1)^(number of negation entities) × initial weight. All emotion entities are traversed and their scores summed to obtain the emotion score of the text to be analyzed. If normalization is required, the sum can be divided by the number of emotion entities.
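The scoring step can be sketched as follows. Assumptions, none taken from the source: the function and parameter names are hypothetical; each degree adverb multiplies the weight by its own score, which equals d^(n_d) when the same adverb repeats and generalizes it when different adverbs appear; and the special degree adverb + degree adverb combination is not treated separately.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entity:  # minimal stand-in for the step-2 entity structure
    name: str
    start: int
    end: int      # inclusive end index
    etype: str    # "emotion", "degree", "negation" or "punctuation"
    length: int


def score_text(entities: List[Entity], emotion_scores: Dict[str, float],
               degree_scores: Dict[str, float], normalize: bool = False) -> float:
    total = 0.0
    count = 0  # number of emotion entities, used for normalization
    for idx, ent in enumerate(entities):
        if ent.etype != "emotion":
            continue  # only emotion entities carry a score
        count += 1
        # Walk back to the nearest emotion or punctuation entity; it bounds
        # the modifiers that belong to the current emotion word.
        boundary = idx - 1
        while boundary >= 0 and entities[boundary].etype not in ("emotion", "punctuation"):
            boundary -= 1
        weight = emotion_scores[ent.name]  # initial weight
        negations = 0
        for mod in entities[boundary + 1:idx]:
            if mod.etype == "negation":
                negations += 1
            elif mod.etype == "degree":
                weight *= degree_scores[mod.name]
        total += weight * (-1) ** negations
    if normalize and count:
        return total / count
    return total


# Usage: final entities for 非常满意，不好。 ("extremely satisfied, not good.").
ents = [
    Entity("非常", 0, 1, "degree", 2),
    Entity("满意", 2, 3, "emotion", 2),
    Entity("，", 4, 4, "punctuation", 1),
    Entity("不", 5, 5, "negation", 1),
    Entity("好", 6, 6, "emotion", 1),
    Entity("。", 7, 7, "punctuation", 1),
]
emotions = {"满意": 2.0, "好": 1.0}   # hypothetical scores
degrees = {"非常": 2.0}               # hypothetical score
```

Here 满意 scores 2 × 2 = 4 and 好 scores 1 × (-1) = -1, so the text totals 3.0, or 1.5 after dividing by the two emotion entities.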
The invention can be adjusted to actual requirements. For example, the contents of the four dictionaries can be customized and specific details personalized: the definition of emotion words may differ between industries, which can be handled by modifying the emotion dictionary. Because the method considers combinations of the four dictionaries, weights can be assigned to different combinations; for example, if a user wants to emphasize the degree adverb + degree adverb + emotion word combination, a corresponding weight can be assigned to it. The method therefore has broad applicability and extensibility.
The method does not perform word segmentation, filtering or similar operations on the text to be analyzed, which avoids errors caused by inaccurate segmentation. Candidate entities are generated by entity traversal and then screened according to the designed rules so that only the final entities are retained, which improves accuracy. Finally, the emotion score of the text to be analyzed is calculated and can be standardized or normalized, and the user can define emotion grades as needed.
The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for analyzing and processing emotion scores based on emotion dictionary entities, characterized in that it
comprises six steps:
1) preparing dictionaries;
2) defining an entity structure;
3) building an entity comparator;
4) traversing the text to be analyzed according to the defined entities to generate all candidate entities;
5) screening the candidate entities;
6) calculating an emotion score.
2. The method of claim 1, wherein
four dictionaries, of emotion words, degree adverbs, negation words and punctuation marks, are prepared first.
3. The method of claim 2, wherein
depending on requirements, the four dictionaries come from a general dictionary or from an industry-specific custom dictionary;
positive emotion words are assigned positive scores (the stronger the emotion, the higher the score), and negative emotion words are assigned negative scores (the stronger the emotion, the lower the score); each degree adverb in the degree adverb dictionary is assigned a score according to the strength it expresses, and in general the higher the degree a word denotes, the larger its score; the negation dictionary is a simple list of negation words; and the punctuation dictionary contains the marks commonly used to break sentences or segments.
4. The method of claim 1, wherein
the entity structure comprises an entity name, a start index, an end index, an entity type and an entity length, the entity type being one of emotion word, degree adverb, negation word or punctuation mark.
5. The method of claim 1, wherein
an entity comparator is built such that, given two entities, entity one and entity two: if the starting position of entity one is greater than that of entity two, 1 is returned; if it is smaller, -1 is returned; if the two starting positions are equal, the lengths are compared, and 1 is returned if entity one is longer than entity two and -1 otherwise.
6. The method of claim 1, wherein
to generate candidate entities for a given text to be analyzed, the four dictionaries are traversed in turn; whenever a dictionary word appears in the text, a corresponding entity is constructed from it and added to the candidate entity list; after all four dictionaries have been traversed, all candidate entities have been generated and are sorted with the defined comparator, so that the candidate entity list is ordered by starting position.
7. The method of claim 6, wherein
when screening entities, the candidate entity list is scanned iteratively; if subsequent entities share the starting index of the current entity, the longest of them is taken as the entity for that index, and the next entity kept must start after the end index of that longest entity; any entity whose starting index is smaller than the end index of the previously kept entity is skipped and the next entity is examined, finally yielding the required entity list.
8. The method of claim 7, wherein
the final entity list is traversed; if the current entity is not an emotion entity, it is skipped; if it is an emotion entity, its position is used as an index to search backward for the nearest preceding emotion entity or punctuation entity, and the number of emotion entities is recorded.
9. The method of claim 8, wherein
the emotion score of the current emotion entity is calculated as follows: the initial weight of the emotion entity is the score of the emotion word; the negation entities and degree adverb entities that appear between the boundary position and the emotion entity are collected, excluding the degree adverb + degree adverb + emotion word case; and the score of the emotion entity is (degree adverb entity score)^(number of degree adverb entities) × (-1)^(number of negation entities) × initial weight.
10. The method of claim 9, wherein
all emotion entities are traversed and their scores summed to obtain the emotion score of the text to be analyzed; if normalization is required, the sum can be divided by the number of emotion entities.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110021645.3A CN112711941B (en) 2021-01-08 2021-01-08 Emotional score analysis processing method based on emotional dictionary entity

Publications (2)

Publication Number Publication Date
CN112711941A 2021-04-27
CN112711941B 2022-12-27

Family

ID=75548493

Country Status (1)

Country Link
CN (1) CN112711941B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130226928A1 (en) * 2012-02-23 2013-08-29 Palo Alto Research Center Incorporated System And Method For Mapping Text Phrases To Geographical Locations
CN105320960A (en) * 2015-10-14 2016-02-10 北京航空航天大学 Voting based classification method for cross-language subjective and objective sentiments
CN106407235A (en) * 2015-08-03 2017-02-15 北京众荟信息技术有限公司 A semantic dictionary establishing method based on comment data
CN106610990A (en) * 2015-10-22 2017-05-03 北京国双科技有限公司 Emotional tendency analysis method and apparatus
CN106897274A (en) * 2017-01-09 2017-06-27 北京众荟信息技术股份有限公司 Method is repeated in a kind of comment across languages
CN107656917A (en) * 2016-07-26 2018-02-02 深圳联友科技有限公司 A Chinese sentiment analysis method and system
CN107862087A (en) * 2017-12-01 2018-03-30 广州简亦迅信息科技有限公司 Sentiment analysis method, apparatus and storage medium based on big data and deep learning
CN110362679A (en) * 2019-06-05 2019-10-22 北京大学(天津滨海)新一代信息技术研究院 A sentiment-dictionary-based sentiment classification method and system for financial-domain comments
CN110399603A (en) * 2018-04-25 2019-11-01 北京中润普达信息技术有限公司 A text processing method and system based on sense-group division
CN111027322A (en) * 2019-12-13 2020-04-17 新华智云科技有限公司 Sentiment dictionary-based sentiment analysis method for fine-grained entities in financial news
CN111612339A (en) * 2020-05-21 2020-09-01 中国标准化研究院 Big data-based online commodity emotional tendency analysis method
CN111985223A (en) * 2020-08-25 2020-11-24 武汉长江通信产业集团股份有限公司 Emotion calculation method based on combination of long and short memory networks and emotion dictionaries

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
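杨奎 et al., "Sentiment Orientation Analysis Based on the Sentiment Dictionary Method", 《计算机时代》 (Computer Era) *
闫晓东 et al., "Sentiment Classification of Tibetan Text Sentences Based on a Sentiment Dictionary", 《中文信息学报》 (Journal of Chinese Information Processing) *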

Similar Documents

Publication Publication Date Title
CN107862343A (en) The rule-based and comment on commodity property level sensibility classification method of neutral net
Styawati et al. Sentiment analysis on online transportation reviews using Word2Vec text embedding model feature extraction and support vector machine (SVM) algorithm
Sharma et al. Comparative Analysis of Online Fashion Retailers Using Customer Sentiment Analysis on Twitter
CN109101478B (en) Aspect-level emotion analysis method for E-commerce comment text
US20160299955A1 (en) Text mining system and tool
CN108596637B (en) Automatic E-commerce service problem discovery system
CN104199845B (en) Line Evaluation based on agent model discusses sensibility classification method
CN112183056A (en) Context-dependent multi-classification emotion analysis method and system based on CNN-BilSTM framework
Biradar et al. Machine learning tool for exploring sentiment analysis on twitter data
CN112287197B (en) Method for detecting sarcasm of case-related microblog comments described by dynamic memory cases
CN104346326A (en) Method and device for determining emotional characteristics of emotional texts
CN113312922A (en) Improved chapter-level triple information extraction method
CN112749283A (en) Entity relationship joint extraction method for legal field
Shi et al. A Word2vec model for sentiment analysis of weibo
Hase Automated content analysis
CN114547303A (en) Text multi-feature classification method and device based on Bert-LSTM
Chumwatana COMMENT ANALYSIS FOR PRODUCT AND SERVICE SATISFACTION FROM THAI CUSTOMERS'REVIEW IN SOCIAL NETWORK
CN112711941B (en) Emotional score analysis processing method based on emotional dictionary entity
Jayasekara et al. Opinion mining of customer reviews: feature and smiley based approach
CN115238709A (en) Method, system and equipment for analyzing sentiment of policy announcement network comments
Munnes et al. Examining sentiment in complex texts. A comparison of different computational approaches
Alorini et al. Machine learning enabled sentiment index estimation using social media big data
CN115018255A (en) Tourist attraction evaluation information quality validity analysis method based on integrated learning data mining technology
CN114942991A (en) Emotion classification model construction method based on metaphor recognition
CN114254620A (en) Policy analysis method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant