CN111104515A - Emotional word text information classification method - Google Patents

Publication number
CN111104515A
Authority
CN
China
Prior art keywords
words
text
emotional
word
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911341489.8A
Other languages
Chinese (zh)
Inventor
李春燕
苏航
李松和
武传涛
刘瑞欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANDONG ZHONGZHI ELECTRONICS CO Ltd
Original Assignee
SHANDONG ZHONGZHI ELECTRONICS CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANDONG ZHONGZHI ELECTRONICS CO Ltd
Priority to CN201911341489.8A
Publication of CN111104515A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification

Abstract

The invention discloses a method for classifying text information based on emotion words, comprising the following steps: acquiring text information; inputting the words of the text; determining parts of speech, whereby the words of the text are divided into emotion words, negation words and degree adverbs; scoring each emotion word by looking it up in an emotion word dictionary to obtain score 1; checking whether degree adverbs appear before or after the emotion word to obtain a new emotion word score, score 2; checking the words before and after the emotion word again to obtain emotion word score 3; outputting the final score of the text, which is the sum of the final scores of all emotion words; and assigning the text to a category according to its final score. The invention aims to establish a text information classification model that scores the emotional tone of a text and thereby divides texts into three categories: positive, neutral and negative.

Description

Emotional word text information classification method
Technical Field
The invention relates to the technical field of text information classification, in particular to a method for classifying emotional word text information.
Background
Emotion classification is a typical problem in the field of natural language processing (NLP): given a segment of text (which may be a sentence or an article), determine whether the emotion it expresses is positive, negative or neutral.
The emotion classification problem is studied extensively in both academia and industry. Using an emotion dictionary is one approach to solving it. First, a set of emotion words, such as positive emotion words and negative emotion words, is defined manually; the emotion class of an input text is then determined by counting the proportions of positive and negative emotion words in it.
However, this approach has low accuracy on samples whose scores have a small absolute value, and its class boundaries are not clear enough.
Moreover, when the emotional tone of a text is analyzed, the context and the relationships between sentences are not considered, so special texts such as ironic texts are easily misclassified.
Disclosure of Invention
The invention aims to solve the above problems in the prior art.
The technical scheme adopted by the invention to solve these technical problems is as follows:
A method for classifying emotional word text information, comprising the following steps:
acquiring text information;
inputting the words of the text;
determining parts of speech, whereby the words of the text are divided into emotion words, negation words and degree adverbs;
scoring each emotion word by looking it up in an emotion word dictionary to obtain score 1;
checking whether degree adverbs appear before or after the emotion word;
obtaining a new emotion word score, score 2;
checking the words before and after the emotion word again to obtain emotion word score 3;
outputting the final score of the text, which is the sum of the final scores of all emotion words;
and assigning the text to a category according to its final score.
Further, the method also comprises a text dictionary. An emotion dictionary, a negation word dictionary and a degree adverb dictionary are established, and the words in the word list of each text object are classified against them to generate the emotion word dictionary, negation word dictionary and degree adverb dictionary of that text object.
Further, the emotion dictionary comprises positive emotion words and negative emotion words; the degree adverbs and the emotion words have scores, and the negation words have no scores.
Further, a text data set is established and each text object is labeled manually; the emotion classes of the texts are positive, neutral and negative, labeled 1, 0 and -1 respectively. Each text object is segmented with the Jieba word segmenter, and stop words are removed according to a stop word dictionary to obtain the word list of each text.
Further, the scope of the check before and after each emotion word is defined as follows: the negation words and degree adverbs lying between two emotion words, together with the latter emotion word, form an emotion phrase, and the sum of the scores of all emotion phrases is the emotion polarity score of the text. The formula is as follows:
[Formula image BDA0002332387630000031 not reproduced]
where a_i is the number of negation words in the i-th emotion phrase, b_i is the product of the weights of all degree adverbs in the phrase, and c_i is the score of the emotion word in the phrase.
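The formula itself survives in this text only as an image reference. One reconstruction that is consistent with the symbol definitions, offered as an assumption rather than as the published formula, lets each negation word flip the sign of the phrase and lets the degree adverbs scale it (n, the number of emotion phrases in the text, is a symbol introduced here for illustration):

```latex
\mathrm{Score} = \sum_{i=1}^{n} (-1)^{a_i} \, b_i \, c_i
```

Under this reading, a phrase with no degree adverbs has b_i = 1 (an empty product), and a phrase with an even number of negation words keeps the sign of its emotion word.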
The invention has the following beneficial effects: the technical scheme scores the emotional tone of texts and thereby divides them into three categories (positive, neutral and negative); the results can be compared with the manual labels of the text objects in the text data set; and the dictionaries can be supplemented at any time with updated open-source dictionaries.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The following drawings illustrate only some embodiments of the present invention and should therefore not be considered as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 is a flowchart of the classification process according to a first embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further specifically described below by way of specific examples in conjunction with the accompanying drawings.
In the first embodiment (see FIG. 1), the invention aims to establish an emotion word text information classification method that scores the emotional tone of a text and thereby classifies it into one of three categories: positive, neutral or negative.
1. Establishing a dictionary
An emotion dictionary (comprising positive emotion words and negative emotion words), a negation word dictionary and a degree adverb dictionary are established. The degree adverbs and the emotion words have scores, and the negation words have no scores; the dictionary format is shown in Table 1. (All three dictionaries are open-source dictionaries published online.)
[Table image BDA0002332387630000041 not reproduced]
Table 1. Dictionary format
2. Text segmentation
A text data set is established and each text object is labeled manually. The emotion classes of the texts are positive, neutral and negative, labeled 1, 0 and -1 respectively.
Each text object is segmented with the Jieba word segmenter, and stop words are removed according to a stop word dictionary to obtain the word list of each text. (The stop word dictionary used here is an open-source dictionary published online.)
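The segmentation step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_word_list` and its `segment` parameter are hypothetical, the default tokenizer simply splits on whitespace so that the example runs without third-party packages, and the English sample sentence and stop words are made up.

```python
from typing import Callable, Iterable, List


def build_word_list(
    text: str,
    stop_words: Iterable[str],
    segment: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Segment a text and drop stop words, preserving word order.

    `segment` is any callable that turns a string into a list of tokens.
    The whitespace default is an assumption made for this sketch only.
    """
    stops = set(stop_words)
    return [tok for tok in segment(text) if tok.strip() and tok not in stops]


# Illustrative data only; not taken from the patent or any real dictionary.
words = build_word_list("the film is not very good", stop_words={"the", "is"})
print(words)  # ['film', 'not', 'very', 'good']
```

For Chinese text, a segmenter such as Jieba's `jieba.lcut` could be passed as `segment` in place of the whitespace default.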
3. Generating a text dictionary
The words in the word list of each text object are classified to generate the emotion word dictionary, negation word dictionary and degree adverb dictionary of that text object.
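As a rough sketch of this step, the global dictionaries can be intersected with a text's word list to produce the three per-text dictionaries. The layout below (word-to-score mappings for emotion words and degree adverbs, a plain set for negation words) is an assumption, since Table 1 is not reproduced in the text; the function name and all sample words and scores are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_text_dictionaries(
    words: List[str],
    emotion_scores: Dict[str, float],
    degree_weights: Dict[str, float],
    negation_words: Iterable[str],
) -> Tuple[Dict[str, float], Dict[str, float], Set[str]]:
    """Split a text's word list into per-text emotion, degree and negation dictionaries.

    Words found in none of the global dictionaries are ignored. A word listed in
    more than one global dictionary would appear in each matching per-text one.
    """
    negations = set(negation_words)
    text_emotion = {w: emotion_scores[w] for w in words if w in emotion_scores}
    text_degree = {w: degree_weights[w] for w in words if w in degree_weights}
    text_negation = {w for w in words if w in negations}
    return text_emotion, text_degree, text_negation


# Made-up global dictionaries, for illustration only.
EMOTION = {"good": 2.0, "terrible": -3.0}
DEGREE = {"very": 1.5, "slightly": 0.5}
NEGATION = {"not", "never"}

emotion, degree, negation = build_text_dictionaries(
    ["film", "not", "very", "good"], EMOTION, DEGREE, NEGATION
)
print(emotion, degree, negation)
```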
4. Calculating the emotion polarity score
The negation words and degree adverbs lying between two emotion words, together with the latter emotion word, form an emotion phrase, and the sum of the scores of all emotion phrases is the emotion polarity score of the text. The formula is as follows:
[Formula image BDA0002332387630000051 not reproduced]
where a_i is the number of negation words in the i-th emotion phrase, b_i is the product of the weights of all degree adverbs in the phrase, and c_i is the score of the emotion word in the phrase.
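The phrase grouping and scoring can be sketched in a single pass over the word list. Because the formula is only available as an image, the sketch assumes that each phrase contributes (-1)^a_i × b_i × c_i, which is one reading consistent with the symbol definitions; the function name, the dictionary layout and every word and score below are hypothetical.

```python
from typing import Dict, Iterable, List


def polarity_score(
    words: List[str],
    emotion_scores: Dict[str, float],
    degree_weights: Dict[str, float],
    negation_words: Iterable[str],
) -> float:
    """Sum the scores of all emotion phrases in a word list.

    Negation words and degree adverbs are accumulated until the next emotion
    word, which closes the phrase. Modifiers after the last emotion word
    belong to no phrase and are ignored.
    """
    negations = set(negation_words)
    total = 0.0
    neg_count = 0    # a_i: negation words seen in the current phrase
    degree = 1.0     # b_i: product of degree adverb weights in the current phrase
    for word in words:
        if word in negations:
            neg_count += 1
        elif word in degree_weights:
            degree *= degree_weights[word]
        elif word in emotion_scores:
            # Assumed phrase score: (-1)^a_i * b_i * c_i
            total += (-1) ** neg_count * degree * emotion_scores[word]
            neg_count, degree = 0, 1.0
    return total


# Made-up dictionaries and sentence, for illustration only.
EMOTION = {"good": 2.0, "terrible": -3.0}
DEGREE = {"very": 1.5}
NEGATION = {"not"}

score = polarity_score(
    ["film", "not", "very", "good", "ending", "terrible"], EMOTION, DEGREE, NEGATION
)
print(score)  # -6.0: "not very good" gives -3.0 and "terrible" gives -3.0
```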
5. Determining the classification range
The score of each text object is used to classify its emotion as positive, negative or neutral.
Inspection of the scores shows that a score of 0 is not a reasonable boundary between positive and negative. The classification problem is therefore recast as an optimization problem: find the upper and lower limits of the neutral score range for which the resulting classification agrees most closely with the manual labels. Once this range is obtained, the classification criterion can be applied to other texts.
The feasible regions are determined from the sample scores, for example by taking the range of a reasonable percentage of the middle segment of the sorted scores; here the feasible region of the lower limit is (-2, 4) and that of the upper limit is (-1, 6). The objective function is the classification accuracy. Whenever a new pair of upper and lower limits yields a higher accuracy than the current pair, the limits are updated.
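The text does not say how the feasible regions are searched. One simple possibility, shown here purely as an assumption, is an exhaustive grid search that keeps the pair of limits with the highest accuracy and updates only on a strict improvement. The function name, the 0.1 step size, the treatment of boundary scores as neutral and the sample data are all hypothetical.

```python
from typing import List, Sequence, Tuple


def search_neutral_band(
    scores: Sequence[float],
    labels: Sequence[int],
    lower_range: Tuple[float, float] = (-2.0, 4.0),
    upper_range: Tuple[float, float] = (-1.0, 6.0),
    step: float = 0.1,
) -> Tuple[float, float, float]:
    """Grid-search the neutral band (lower, upper) that maximizes accuracy.

    Labels use 1, 0 and -1 for positive, neutral and negative.
    Returns (best_lower, best_upper, best_accuracy).
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")

    def accuracy(lower: float, upper: float) -> float:
        hits = 0
        for s, y in zip(scores, labels):
            predicted = -1 if s < lower else (1 if s > upper else 0)
            hits += predicted == y
        return hits / len(scores)

    def grid(bounds: Tuple[float, float]) -> List[float]:
        # Build grid points from integer multiples of the step to avoid float drift.
        count = int(round((bounds[1] - bounds[0]) / step))
        return [round(bounds[0] + k * step, 10) for k in range(count + 1)]

    best = (lower_range[0], upper_range[0], -1.0)
    for lower in grid(lower_range):
        for upper in grid(upper_range):
            if upper < lower:
                continue
            acc = accuracy(lower, upper)
            if acc > best[2]:  # update only when the new limits are strictly better
                best = (lower, upper, acc)
    return best


# Made-up scores and manual labels, for illustration only.
sample_scores = [-5.0, -2.5, 0.0, 1.0, 3.0, 5.0, 7.5]
sample_labels = [-1, -1, 0, 0, 0, 1, 1]
lower, upper, acc = search_neutral_band(sample_scores, sample_labels)
print(lower, upper, acc)
```

Because ties are not replaced, the search returns the first pair of limits that reaches the best accuracy; a different tie-breaking rule would give different but equally accurate limits.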
The search yields an upper neutral limit of 3.7 points and a lower neutral limit of -1 point, with a classification accuracy of 86.24%. That is, a text with a score below -1 is emotionally negative; a text with a score between -1 and 3.7 is emotionally neutral; and a text with a score above 3.7 is emotionally positive.
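The resulting decision rule is small enough to state directly in code. The limits -1 and 3.7 come from the text; the function name is hypothetical, and treating scores exactly equal to a limit as neutral is an assumption, since the text does not specify the boundary cases.

```python
def classify(score: float, lower: float = -1.0, upper: float = 3.7) -> int:
    """Map a polarity score to -1 (negative), 0 (neutral) or 1 (positive).

    Scores equal to either limit are treated as neutral (an assumption).
    """
    if score < lower:
        return -1
    if score > upper:
        return 1
    return 0


for s in (-6.0, 0.0, 3.7, 4.2):
    print(s, classify(s))
```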
The method depends to some extent on the emotion dictionary. At a later stage, the emotion dictionary can be optimized with the naive Bayes machine learning algorithm using samples from different platforms, making the method better suited to judging text emotion on a specific platform.
The above-described embodiments are only preferred embodiments of the present invention, and are not intended to limit the present invention in any way, and other variations and modifications may be made without departing from the spirit of the invention as set forth in the claims.

Claims (5)

1. A method for classifying emotional word text information, characterized by comprising the following steps:
acquiring text information;
inputting the words of the text;
determining parts of speech, whereby the words of the text are divided into emotion words, negation words and degree adverbs;
scoring each emotion word by looking it up in an emotion word dictionary to obtain score 1;
checking whether degree adverbs appear before or after the emotion word;
obtaining a new emotion word score, score 2;
checking the words before and after the emotion word again to obtain emotion word score 3;
outputting the final score of the text, which is the sum of the final scores of all emotion words;
and assigning the text to a category according to its final score.
2. The emotion word text information classification method of claim 1, characterized in that: the method also comprises a text dictionary, wherein an emotion dictionary, a negation word dictionary and a degree adverb dictionary are established, and the words in the word list of each text object are classified to generate the emotion word dictionary, the negation word dictionary and the degree adverb dictionary of the text object.
3. The emotion word text information classification method of claim 2, characterized in that: the emotion dictionary comprises positive emotion words and negative emotion words; the degree adverbs and the emotion words have scores, and the negation words have no scores.
4. The emotion word text information classification method of claim 3, characterized in that: the method also comprises the steps of establishing a text data set, manually labeling each text object, dividing the emotion classes of the texts into positive, neutral and negative, labeled 1, 0 and -1 respectively, segmenting each text object into words, and removing stop words according to a stop word dictionary to obtain the word list of each text.
5. The emotion word text information classification method according to any one of claims 1 to 4, characterized in that: the scope of the check before and after each emotion word is such that the negation words and degree adverbs lying between two emotion words, together with the latter emotion word, form an emotion phrase, and the sum of the scores of all emotion phrases is the emotion polarity score of the text. The formula is as follows:
[Formula image FDA0002332387620000021 not reproduced]
where a_i is the number of negation words in the i-th emotion phrase, b_i is the product of the weights of all degree adverbs in the phrase, and c_i is the score of the emotion word in the phrase.
CN201911341489.8A 2019-12-24 2019-12-24 Emotional word text information classification method Pending CN111104515A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911341489.8A CN111104515A (en) 2019-12-24 2019-12-24 Emotional word text information classification method

Publications (1)

Publication Number Publication Date
CN111104515A (en) 2020-05-05

Family

ID=70423969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911341489.8A Pending CN111104515A (en) 2019-12-24 2019-12-24 Emotional word text information classification method

Country Status (1)

Country Link
CN (1) CN111104515A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929863A (en) * 2012-11-06 2013-02-13 苏州两江科技有限公司 Method for intelligently analyzing Chinese character emotional tendency through computer
CN105138506A (en) * 2015-07-09 2015-12-09 天云融创数据科技(北京)有限公司 Financial text sentiment analysis method
CN106503049A (en) * 2016-09-22 2017-03-15 南京理工大学 A kind of microblog emotional sorting technique for merging multiple affection resources based on SVM
CN106598944A (en) * 2016-11-25 2017-04-26 中国民航大学 Civil aviation security public opinion emotion analysis method
CN107656917A (en) * 2016-07-26 2018-02-02 深圳联友科技有限公司 A kind of Chinese sentiment analysis method and system
CN110362679A (en) * 2019-06-05 2019-10-22 北京大学(天津滨海)新一代信息技术研究院 A kind of financial field comment sensibility classification method and system based on sentiment dictionary
CN110598219A (en) * 2019-10-23 2019-12-20 安徽理工大学 Emotion analysis method for Douban movie reviews

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767399A (en) * 2020-06-30 2020-10-13 平安国际智慧城市科技股份有限公司 Emotion classifier construction method, device, equipment and medium based on unbalanced text set
CN112668330A (en) * 2020-12-31 2021-04-16 北京大米科技有限公司 Data processing method and device, readable storage medium and electronic equipment
CN112668330B (en) * 2020-12-31 2024-01-26 北京大米科技有限公司 Data processing method and device, readable storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN109002473B (en) Emotion analysis method based on word vectors and parts of speech
Maia et al. Finsslx: A sentiment analysis model for the financial domain using text simplification
CN108563638B (en) Microblog emotion analysis method based on topic identification and integrated learning
CN106776538A (en) The information extracting method of enterprise's noncanonical format document
Rizki et al. Comparison of stemming algorithms on Indonesian text processing
CN107038249A (en) Network public sentiment information sensibility classification method based on dictionary
CN107168956B (en) Chinese chapter structure analysis method and system based on pipeline
CN110096587B (en) Attention mechanism-based LSTM-CNN word embedded fine-grained emotion classification model
CN110674296B (en) Information abstract extraction method and system based on key words
CN104142912A (en) Accurate corpus category marking method and device
CN111104515A (en) Emotional word text information classification method
CN107818173B (en) Vector space model-based Chinese false comment filtering method
Burlot et al. Word representations in factored neural machine translation
CN107451116B (en) Statistical analysis method for mobile application endogenous big data
CN109062977A (en) A kind of automatic question answering text matching technique, automatic question-answering method and system based on semantic similarity
Fauziah et al. Lexicon based sentiment analysis in Indonesia languages: A systematic literature review
CN112989848B (en) Training method for neural machine translation model of field adaptive medical literature
CN113032559B (en) Language model fine tuning method for low-resource adhesive language text classification
CN111191029B (en) AC construction method based on supervised learning and text classification
CN113065350A (en) Biomedical text word sense disambiguation method based on attention neural network
CN112632272A (en) Microblog emotion classification method and system based on syntactic analysis
WO2020199590A1 (en) Mood detection analysis method and related device
CN110708619A (en) Word vector training method and device for intelligent equipment
CN110362682A (en) A kind of entity coreference resolution method based on statistical machine learning algorithm
CN111898375B (en) Automatic detection and division method for article discussion data based on word vector sentence chain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200505