CN110399481B - Method and device for screening emotional entity words - Google Patents


Info

Publication number
CN110399481B
Authority
CN
China
Prior art keywords
emotion
words
entity
ape
sentence
Prior art date
2019-06-06
Legal status
Active
Application number
CN201910491200.4A
Other languages
Chinese (zh)
Other versions
CN110399481A (en)
Inventor
杨志明
Current Assignee
Ideepwise Artificial Intelligence Robot Technology Beijing Co ltd
Original Assignee
Ideepwise Artificial Intelligence Robot Technology Beijing Co ltd
Priority date
2019-06-06
Filing date
2019-06-06
Publication date
2022-04-12
2019-06-06 Application filed by Ideepwise Artificial Intelligence Robot Technology Beijing Co ltd
2019-06-06 Priority to CN201910491200.4A
2019-11-01 Publication of CN110399481A
Application granted
2022-04-12 Publication of CN110399481B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for screening emotional entity words. The method comprises: traversing each sentence of a candidate text and selecting, in each sentence, the emotional entity word with the largest weight index as a candidate emotional entity word, where an emotional entity word is the combination of an emotion object word and an emotion word in the sentence; and counting the occurrence frequency of each distinct word among all the candidate emotional entity words, sorting the distinct words in non-decreasing order of occurrence frequency, and taking a preset number of candidate emotional entity words at the front of the sorted order as standby emotional entity words. Because the standby emotional entity words of the candidate text are generated automatically without manual participation, processing efficiency is improved and errors caused by manual processing are avoided.

Description

Method and device for screening emotional entity words
Technical Field
The invention relates to the field of computers, in particular to a method and a device for screening emotional entity words.
Background
With the development of the Internet and social media, a large amount of text is available online, including Wikipedia entries, academic articles, news reports and after-sales service comments, and this text contains a great deal of valuable information. Existing text classification techniques can roughly extract specific information from such text.
Sentiment computation is a text classification technique: for example, sentiment analysis of after-sales comments reveals how satisfied consumers are with a product or service. At present, sentiment computation most commonly classifies or scores a given sentence using the keywords in an emotion dictionary.
The most important step in emotion-dictionary-based sentiment computation is building the emotion dictionary, including its emotional entity words and their emotion categories, and this is currently done manually. Manual construction is not only labor-intensive but also error-prone.
Disclosure of Invention
In view of the above, the invention provides a method and a device for screening emotional entity words, which address the problem that the emotional entity words in existing emotion dictionaries have to be constructed manually.
The invention provides a method for screening emotional entity words, comprising the following steps:
traversing each sentence of a candidate text, and selecting, in each sentence, the emotional entity word with the largest weight index as a candidate emotional entity word, where an emotional entity word is the combination of an emotion object word and an emotion word in a sentence; and
counting the occurrence frequency of each distinct word among all the candidate emotional entity words, sorting the distinct words in non-decreasing order of occurrence frequency, and taking a preset number of candidate emotional entity words at the front of the sorted order as standby emotional entity words.
The present invention also provides a non-transitory computer-readable storage medium storing instructions which, when executed by a processor, cause the processor to perform the steps of the above method for screening emotional entity words.
The invention also provides a device for screening emotional entity words, comprising a processor and the above non-transitory computer-readable storage medium.
The method of the invention screens the standby emotional entity words of the candidate text sentence by sentence, based on occurrence frequency and weight index, and can thereby ensure that the resulting standby emotional entity words are popular and important ones.
The method can run automatically without manual participation, which improves processing efficiency and avoids errors caused by manual processing.
Drawings
FIG. 1 is a flowchart of the method for screening emotional entity words according to the invention;
FIG. 2 is a flowchart of generating the emotional entity words of a sentence and their weight indexes according to the invention;
FIG. 3 is a structural diagram of the apparatus for screening emotional entity words according to the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the invention provides a method for screening emotional entity words, comprising the following steps:
S10: traverse each sentence of the candidate text and select, in each sentence, the emotional entity word with the largest weight index as a candidate emotional entity word, where an emotional entity word is the combination of an emotion object word and an emotion word in the sentence;
S20: count the occurrence frequency of each distinct word among all the candidate emotional entity words, sort the distinct words in non-decreasing order of occurrence frequency, and take a preset number of candidate emotional entity words at the front of the sorted order as standby emotional entity words.
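A minimal Python sketch of S10 and S20 follows, assuming the weight index of every emotional entity word in each sentence has already been computed (S31 to S36). The helper name `screen_standby_words`, the "object:emotion" string format and the sample data are hypothetical, and "different words" is read as distinct candidate emotional entity words. One further assumption: the text specifies a non-decreasing sort, yet it also states that the retained words are the popular ones, so the sketch places the most frequent candidates at the front of the ranking.

```python
from collections import Counter
from typing import Dict, List


def screen_standby_words(
    sentence_weights: List[Dict[str, float]], preset_number: int
) -> List[str]:
    """S10 + S20 over precomputed weight indexes (hypothetical helper).

    sentence_weights holds one dict per sentence, mapping each emotional
    entity word ("object:emotion") to its weight index.
    """
    # S10: a sentence's candidate is its entity word with the largest weight.
    candidates = [max(w, key=w.get) for w in sentence_weights if w]
    # S20: occurrence frequency of each distinct candidate across all sentences.
    counts = Counter(candidates)
    # Assumption: the "front" of the ranking holds the most frequent candidates.
    # most_common sorts by count, highest first; ties keep first-seen order.
    return [word for word, _ in counts.most_common(preset_number)]


# Invented weights for five sentences, for illustration only.
sentences = [
    {"appearance:imposing": 1.8, "interior:inferior": 0.9},
    {"appearance:imposing": 1.2, "price:high": 0.7},
    {"service:slow": 0.6, "appearance:ugly": 0.4},
    {"service:slow": 1.1},
    {"appearance:imposing": 0.5, "service:slow": 0.3},
]
print(screen_standby_words(sentences, 2))  # ['appearance:imposing', 'service:slow']
```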
Take the example sentence "The most satisfying point is that not only is the appearance imposing, but the interior and configuration are also in no way inferior." Its emotional entity words and their weight indexes can be generated by the method shown in FIG. 2.
S31: perform word segmentation and part-of-speech tagging on the sentence.
For example, word segmentation and part-of-speech tagging can be performed on the example sentence with jieba or HanLP. Using HanLP, the token list of the example sentence is: [most/d, satisfied/v, 的/ude1, point/m, is/vshi, not only/c, appearance/n, imposing/an, interior/nz, configuration/vn, also/d, at all/d, inferior/a].
Here d denotes an adverb, v a verb, ude1 the particle "的" (or "底"), m a numeral, vshi the verb "是" ("to be"), c a conjunction, n a noun, an a nominal adjective (an adjective used as a noun), nz another proper noun, vn a nominal verb (a verb used as a noun), and a an adjective.
S32: extract the emotion object words and the emotion words from the segmented sentence.
The emotion object words include words tagged as: nouns (n), nominal verbs (vn), biological names (nb), animal names (nba), animal classes and orders (nbc), plant names (nbp), foods (nf), noun morphemes (ng), medicine, disease and other health-related nouns (nh), diseases (nhd), drugs (nhm), noun idioms (nl), item names (nm), chemical names (nmc), work-related nouns (nn), occupations (nnd), place names (ns), transliterated place names (nsf), banks (ntcb), hotels (ntch), hospitals (nth), and other proper nouns (nz).
The emotion words include words tagged as: adjectives (a), adverbial adjectives (ad), adjective morphemes (ag), adjective idioms (al), and nominal adjectives (an).
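The S32 filter can be sketched as a tag lookup over the S31 token list. This is an illustrative sketch, not the patented implementation: the tag sets are transcribed from the two lists above, the tokens are English glosses standing in for the Chinese words, and the helper name `extract_by_tags` is hypothetical. Positions are 0-based indexes into the token list.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, POS tag)

# Tag sets transcribed from the S32 lists (HanLP-style tags).
OBJECT_TAGS: Set[str] = {
    "n", "vn", "nb", "nba", "nbc", "nbp", "nf", "ng", "nh", "nhd", "nhm",
    "nl", "nm", "nmc", "nn", "nnd", "ns", "nsf", "ntcb", "ntch", "nth", "nz",
}
EMOTION_TAGS: Set[str] = {"a", "ad", "ag", "al", "an"}


def extract_by_tags(tokens: List[Token], tags: Set[str]) -> Tuple[List[str], List[int]]:
    """Return the words whose tag is in `tags` and their 0-based positions."""
    hits = [(i, word) for i, (word, tag) in enumerate(tokens) if tag in tags]
    return [word for _, word in hits], [i for i, _ in hits]


# The S31 token list, with English glosses in place of the Chinese words.
tokens: List[Token] = [
    ("most", "d"), ("satisfied", "v"), ("的", "ude1"), ("point", "m"),
    ("is", "vshi"), ("not only", "c"), ("appearance", "n"), ("imposing", "an"),
    ("interior", "nz"), ("configuration", "vn"), ("also", "d"),
    ("at all", "d"), ("inferior", "a"),
]
objects, object_positions = extract_by_tags(tokens, OBJECT_TAGS)
emotions, emotion_positions = extract_by_tags(tokens, EMOTION_TAGS)
print(objects, object_positions)    # ['appearance', 'interior', 'configuration'] [6, 8, 9]
print(emotions, emotion_positions)  # ['imposing', 'inferior'] [7, 12]
```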
From the token list of the example sentence one obtains:
emotion object word list and corresponding position list: [appearance, interior, configuration] <- [6, 8, 9]
emotion word list and corresponding position list: [imposing, inferior] <- [7, 12]
Taking the emotion object word "appearance" as an example, its position 6 means that it is element 6 of the token list, counting from 0.
S33: combine each word in the emotion object word list with each word in the emotion word list to form emotional entity words, and calculate the ape of each emotional entity word.
ape is proportional to the absolute value of the difference between a first position and a second position, where the first position is the position in the sentence of the emotional entity word's emotion object word, and the second position is the position in the sentence of its emotion word.
For the emotion object word list and emotion word list above, the following emotional entity words are obtained:
[appearance, interior, configuration] × [imposing, inferior] = [appearance:imposing, interior:imposing, configuration:imposing, appearance:inferior, interior:inferior, configuration:inferior]
Further, ape is calculated as follows:
when the second position is less than the first position, ape = (first position - second position) × preset value;
when the second position is greater than or equal to the first position, ape = second position - first position.
The preset value is greater than 1. With a preset value of 3, the apes of the emotional entity words of the example sentence are [1, 3, 6, 6, 4, 3].
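A sketch of S33 under the rule above: every emotion word is paired with every emotion object word, and ape penalises an emotion word that precedes its object by the preset factor. The function name `ape`, the default preset value of 3 (taken from the worked example) and the "object:emotion" labels are assumptions for illustration; positions are the 0-based S32 positions of the example sentence.

```python
from itertools import product
from typing import List, Tuple


def ape(first: int, second: int, preset: float = 3) -> float:
    """ape of one emotional entity word.

    first  -- position of the emotion object word in the sentence
    second -- position of the emotion word in the sentence
    preset -- penalty factor (> 1) applied when the emotion word comes first
    """
    if second < first:
        return (first - second) * preset
    return second - first


# (word, position) pairs from S32 for the example sentence.
objects = [("appearance", 6), ("interior", 8), ("configuration", 9)]
emotions = [("imposing", 7), ("inferior", 12)]

# Pair every emotion word with every object word; iterating emotion words in
# the outer loop reproduces the order of the S33 list.
entity_apes: List[Tuple[str, float]] = [
    (f"{obj}:{emo}", ape(obj_pos, emo_pos))
    for (emo, emo_pos), (obj, obj_pos) in product(emotions, objects)
]
print([a for _, a in entity_apes])  # [1, 3, 6, 6, 4, 3]
```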
S34: determine ape_max and ape_min.
ape_max is the maximum, and ape_min the minimum, of the apes of all emotional entity words in the sentence containing the emotional entity word.
For the apes of the example sentence, ape_max = 6 and ape_min = 1.
S35: calculate the relevance index c of each emotional entity word.
The relevance index c is the relevance between the emotion object word and the emotion word of the emotional entity word.
One way to calculate c is c = cosine(emotion object word vector, emotion word vector), where cosine takes two vectors as input and outputs the cosine of the angle between them. Words can be converted to word vectors with word2vec.
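The relevance index c of S35 is an ordinary cosine similarity. The sketch below computes it in plain Python; the 3-dimensional vectors are made up, whereas real vectors would come from a trained word2vec model (for instance via gensim, whose `KeyedVectors.similarity` returns the same cosine). Returning 0.0 for a zero vector is an assumption, since the text does not cover that case.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between vectors u and v.

    Returns 0.0 when either vector is all zeros (an assumption; the text does
    not cover this case).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional vectors standing in for word2vec output.
vectors: Dict[str, List[float]] = {
    "appearance": [0.8, 0.1, 0.3],
    "imposing": [0.7, 0.2, 0.4],
}
c = cosine(vectors["appearance"], vectors["imposing"])
print(round(c, 3))
```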
S36: calculate the weight index weight of each emotional entity word.
weight = c × w, where w is a correction parameter. The correction parameter may be a fixed value, or it may be:
w = (1.0 + ape_max - ape_min) / (1.0 + ape - ape_min);
By this formula, the farther apart the two words are, the larger ape is and the smaller w and weight are. The emotional entity word that occurs naturally in the sentence, such as "appearance:imposing", corresponds to the smallest ape and therefore to the largest correction parameter.
The weight index of each emotional entity word in a sentence can be obtained through S31 to S36.
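S34 and S36 can be sketched together for one sentence: ape_max and ape_min are taken over the sentence's apes, w follows the formula above, and weight = c × w. Because ape ≥ ape_min, the denominator is at least 1.0, so no division by zero can occur. The helper names are hypothetical and the relevance indexes are placeholders (all equal), so in this run only the correction parameter decides which emotional entity word becomes the sentence's candidate in S10.

```python
from typing import List, Sequence


def correction_parameters(apes: Sequence[float]) -> List[float]:
    """w = (1.0 + ape_max - ape_min) / (1.0 + ape - ape_min) for each ape of one sentence."""
    ape_max, ape_min = max(apes), min(apes)  # S34
    return [(1.0 + ape_max - ape_min) / (1.0 + a - ape_min) for a in apes]


def weight_indexes(apes: Sequence[float], relevances: Sequence[float]) -> List[float]:
    """weight = c * w, pairing each relevance index c with its correction parameter w."""
    if len(apes) != len(relevances):
        raise ValueError("need one relevance index per ape")
    return [c * w for c, w in zip(relevances, correction_parameters(apes))]


names = ["appearance:imposing", "interior:imposing", "configuration:imposing",
         "appearance:inferior", "interior:inferior", "configuration:inferior"]
apes = [1, 3, 6, 6, 4, 3]
print(correction_parameters(apes))  # [6.0, 2.0, 1.0, 1.0, 1.5, 2.0]

# Placeholder relevance indexes (all equal), so only w decides the ranking here.
weights = weight_indexes(apes, [0.5] * len(apes))
best = names[weights.index(max(weights))]  # S10: largest weight index wins
print(best)  # appearance:imposing
```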
Before S31, the candidate text is split into sentences. Sentence splitting can be done by recognizing sentence-ending punctuation marks or with sentence-segmentation software.
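A naive punctuation-based splitter for this pre-processing step might look like the sketch below. The set of sentence-ending marks is an assumption: full-width 。！？ and ASCII ! and ? are used, while the ASCII period is left out so that decimals such as "3.5" are not split; dedicated sentence-segmentation software would handle such cases more robustly. The function name `split_sentences` is hypothetical.

```python
import re
from typing import List

# Split right after a sentence-ending mark: full-width 。！？ or ASCII ! and ?.
# The ASCII period is deliberately excluded so decimals such as "3.5" survive;
# this punctuation set is an assumption, not taken from the text.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[。！？!?])\s*")


def split_sentences(text: str) -> List[str]:
    """Naive punctuation-based sentence splitter; empty pieces are dropped."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]


print(split_sentences("Great look! Is the interior good? Yes, it is."))
```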
The present invention also provides a non-transitory computer-readable storage medium storing instructions which, when executed by a processor, cause the processor to perform the steps of the above method for screening emotional entity words.
The invention also provides a device for screening emotional entity words, comprising a processor and the above non-transitory computer-readable storage medium.
As shown in FIG. 3, the apparatus includes:
a traversing module, which traverses each sentence of the candidate text and selects, in each sentence, the emotional entity word with the largest weight index as a candidate emotional entity word, where an emotional entity word is the combination of an emotion object word and an emotion word in a sentence;
a sorting and screening module, which counts the occurrence frequency of each distinct word among all the candidate emotional entity words, sorts the distinct words in non-decreasing order of occurrence frequency, and takes a preset number of candidate emotional entity words at the front of the sorted order as standby emotional entity words.
The weight index is proportional to the relevance index of the emotional entity word, and the relevance index is the relevance between the emotion object word and the emotion word of the emotional entity word.
Optionally, the relevance index is the cosine of the angle between the emotion object word vector and the emotion word vector of the emotional entity word.
Further, the weight index is also proportional to the correction parameter (1.0 + ape_max - ape_min) / (1.0 + ape - ape_min);
ape is proportional to the absolute value of the difference between a first position and a second position, where the first position is the position in the sentence of the emotional entity word's emotion object word, and the second position is the position in the sentence of its emotion word;
ape_max is the maximum, and ape_min the minimum, of the apes of all emotional entity words in the sentence containing the emotional entity word.
Specifically, when the second position is less than the first position, ape = (first position - second position) × preset value; when the second position is greater than or equal to the first position, ape = second position - first position; the preset value is greater than 1.
The apparatus embodiment for screening emotional entity words is based on the same principle as the method embodiment, so the two descriptions can be read with reference to each other.
The above description is only exemplary of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (6)

1. A method for screening emotional entity words, characterized by comprising the following steps:
traversing each sentence of a candidate text, and selecting, in each sentence, the emotional entity word with the largest weight index as a candidate emotional entity word, wherein an emotional entity word is the combination of an emotion object word and an emotion word in a sentence;
counting the occurrence frequency of each distinct word among all candidate emotional entity words, sorting the distinct words in non-descending order of occurrence frequency, and taking a preset number of candidate emotional entity words at the front of the sorted order as standby emotional entity words;
wherein the weight index is proportional to a correction parameter (1.0 + ape_max - ape_min) / (1.0 + ape - ape_min);
ape is proportional to the absolute value of the difference between a first position and a second position, wherein the first position is the position in the sentence of the emotion object word of the emotional entity word, and the second position is the position in the sentence of the emotion word of the emotional entity word;
ape_max is the maximum of the apes of all emotional entity words in the sentence containing the emotional entity word, and ape_min is the minimum of the apes of all emotional entity words in that sentence;
when the second position is less than the first position, ape = (first position - second position) × a preset value, and when the second position is greater than or equal to the first position, ape = second position - first position, the preset value being greater than 1.
2. The method of claim 1, wherein the weight index is proportional to a relevance index of the emotional entity word, the relevance index being the relevance between the emotion object word and the emotion word of the emotional entity word.
3. The method of claim 2, wherein the relevance index is the cosine of the angle between the emotion object word vector and the emotion word vector of the emotional entity word.
4. A non-transitory computer-readable storage medium storing instructions which, when executed by a processor, cause the processor to perform the steps of the method for screening emotional entity words according to any one of claims 1 to 3.
5. An apparatus for screening emotional entity words, comprising a processor and the non-transitory computer-readable storage medium of claim 4.
6. A device for screening emotional entity words, characterized by comprising:
a traversing module, configured to traverse each sentence of a candidate text and select, in each sentence, the emotional entity word with the largest weight index as a candidate emotional entity word, wherein an emotional entity word is the combination of an emotion object word and an emotion word in a sentence;
a sorting and screening module, configured to count the occurrence frequency of each distinct word among all candidate emotional entity words, sort the distinct words in non-descending order of occurrence frequency, and take a preset number of candidate emotional entity words at the front of the sorted order as standby emotional entity words;
wherein the weight index is proportional to a correction parameter (1.0 + ape_max - ape_min) / (1.0 + ape - ape_min);
ape is proportional to the absolute value of the difference between a first position and a second position, wherein the first position is the position in the sentence of the emotion object word of the emotional entity word, and the second position is the position in the sentence of the emotion word of the emotional entity word;
ape_max is the maximum of the apes of all emotional entity words in the sentence containing the emotional entity word, and ape_min is the minimum of the apes of all emotional entity words in that sentence;
when the second position is less than the first position, ape = (first position - second position) × a preset value, and when the second position is greater than or equal to the first position, ape = second position - first position, the preset value being greater than 1.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910491200.4A CN110399481B (en) 2019-06-06 2019-06-06 Method and device for screening emotional entity words

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910491200.4A CN110399481B (en) 2019-06-06 2019-06-06 Method and device for screening emotional entity words

Publications (2)

Publication Number Publication Date
CN110399481A (en) 2019-11-01
CN110399481B (en) 2022-04-12

Family

ID=68323139

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910491200.4A Active CN110399481B (en) 2019-06-06 2019-06-06 Method and device for screening emotional entity words

Country Status (1)

Country Link
CN (1) CN110399481B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103646088A (en) * 2013-12-13 2014-03-19 合肥工业大学 Product comment fine-grained emotional element extraction method based on CRFs and SVM
CN108241682A (en) * 2016-12-26 2018-07-03 北京国双科技有限公司 Method and device for determining text emotion
CN108363784A (en) * 2018-01-20 2018-08-03 西北工业大学 Public opinion trend estimation method based on text machine learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100917784B1 (en) * 2007-12-24 2009-09-21 한성주 Method and system for retrieving information of collective emotion based on comments about content


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
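面向商品评价的情感要素提取 (Extraction of emotion elements for product reviews); 冯仓龙, 白宇, 蔡东风; 《沈阳航空航天大学学报》 (Journal of Shenyang Aerospace University); 2016-12-30; Vol. 33, No. 6, pp. 71-76 *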

Also Published As

Publication number Publication date
CN110399481A (en) 2019-11-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant