CN111104515A - Emotional word text information classification method - Google Patents
- Publication number
- CN111104515A (application number CN201911341489.8A)
- Authority
- CN
- China
- Prior art keywords
- words
- text
- emotional
- word
- emotion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Abstract
The invention discloses a method for classifying emotional word text information, comprising the following steps: acquiring text information; inputting the words of the text; judging part of speech, whereby the words in the text are divided into emotional words, negative words and degree adverbs; scoring each emotional word by looking it up in an emotional word dictionary to obtain score 1; judging whether a degree adverb occurs before or after the emotional word and obtaining a new emotional word score, score 2; judging the words before and after the emotional word again to obtain emotional word score 3; outputting the final text score, which is the sum of the final scores of all the emotional words; and dividing the texts into categories according to their final scores. The invention aims to establish a text information classification model that scores the emotional coloring of a text and thereby divides texts into three categories: positive, neutral and negative.
Description
Technical Field
The invention relates to the technical field of text information classification, in particular to a method for classifying emotional word text information.
Background
Emotion classification is a typical problem in the field of Natural Language Processing (NLP): given a segment of text (which may be a sentence or an article), determine whether the emotion it expresses is positive, negative or neutral.
Sentiment classification is studied extensively in both academia and industry. One approach is to use an emotion dictionary: a set of emotion words, such as positive emotion words and negative emotion words, is first defined manually, and the emotion class of an input text is then determined by counting the proportions of positive and negative emotion words it contains.
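The proportion-counting baseline described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `classify_by_proportion`, the tiny English word sets and the 0.5 cut-off are hypothetical and do not come from the patent.

```python
def classify_by_proportion(words, positive_words, negative_words):
    """Label a tokenized text by comparing its positive and negative emotion word counts."""
    pos = sum(1 for w in words if w in positive_words)
    neg = sum(1 for w in words if w in negative_words)
    if pos + neg == 0:
        return 0  # no emotion words found: treat the text as neutral
    pos_ratio = pos / (pos + neg)
    if pos_ratio > 0.5:
        return 1   # mostly positive emotion words
    if pos_ratio < 0.5:
        return -1  # mostly negative emotion words
    return 0       # evenly balanced


# Hypothetical word sets, for illustration only.
POSITIVE = {"good", "great", "happy"}
NEGATIVE = {"bad", "awful", "sad"}

label = classify_by_proportion(
    ["the", "film", "was", "great", "but", "sad", "and", "awful"],
    POSITIVE,
    NEGATIVE,
)
```

Because this baseline only counts words, it ignores negation and degree adverbs, which is the gap the scoring scheme in the rest of the document addresses.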
However, with this approach the judgment accuracy for samples whose scores have a low absolute value is not high, and the classification is not clear enough.
In addition, when the emotional coloring of a text is analyzed, the context and the relationships between sentences are not considered, so special texts such as ironic text are easily misjudged.
Disclosure of Invention
The invention aims to solve the problems in the prior art.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A method for classifying emotional word text information comprises the following steps:
acquiring text information;
inputting the words of the text;
judging part of speech, wherein the words in the text are divided into emotional words, negative words and degree adverbs;
scoring the emotional words, wherein each emotional word is looked up in an emotional word dictionary to obtain score 1;
judging whether a degree adverb occurs before or after the emotional word;
obtaining a new emotional word score, score 2;
judging the words before and after the emotional word again to obtain emotional word score 3;
outputting the final text score, which is the sum of the final scores of all the emotional words;
and dividing the texts into categories according to their final scores.
Further, the method also comprises a text dictionary: an emotion dictionary, a negative word dictionary and a degree adverb dictionary are established, and the words in the word list of each text object are classified to generate the emotion word dictionary, the negative word dictionary and the degree adverb dictionary of that text object.
Further, the emotion dictionary comprises positive emotion words and negative emotion words; the degree adverbs and the emotion words have scores, and the negative words have no scores.
Further, a text data set is established and each text object is labeled manually, the emotion classification of a text being divided into 3 types (positive, neutral and negative), labeled 1, 0 and -1 respectively; each text object is segmented into words using the Jieba word segmenter, and stop words are removed according to a stop word dictionary to obtain a word list for each text.
Further, the range used when scoring the words before and after an emotional word is as follows: the negative words and degree adverbs between two emotional words, together with the latter emotional word, form an emotional phrase, and the sum of the scores of all the emotional phrases is the emotion polarity score of the text. The formula is as follows:
where $a_i$ is the number of negative words in the $i$-th emotional phrase, $b_i$ is the product of the weights of all degree adverbs in the phrase, and $c_i$ is the score of the emotional word.
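The formula itself is not reproduced in this text. One reading that is consistent with the three symbol definitions is given here as an assumption, not as the patent's verbatim formula; the symbols $S$ (text polarity score) and $n$ (number of emotional phrases) are introduced only for illustration.

```latex
\[
S = \sum_{i=1}^{n} (-1)^{a_i} \, b_i \, c_i
\]
```

Under this reading, each negative word flips the sign of the phrase and each degree adverb scales it. For example, a phrase with one negative word, one degree adverb of hypothetical weight 2 and an emotional word of hypothetical score 1 would contribute $(-1)^{1} \cdot 2 \cdot 1 = -2$.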
The beneficial effects of the invention are as follows: in implementation, the technical scheme scores the emotional coloring of texts and thereby divides them into three categories (positive, neutral and negative); the results can be compared with the labels of the text objects in the text data set, and updated open-source dictionaries can be added at any time.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly described below. The following drawings illustrate only some embodiments of the present invention and should therefore not be regarded as limiting its scope; those skilled in the art can obtain other related drawings from them without inventive effort.
FIG. 1 is a flowchart of the classification process according to a first embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further specifically described below by way of specific examples in conjunction with the accompanying drawings.
In the first embodiment, referring to FIG. 1, the present invention aims to establish an emotional word text information classification method that scores the emotional coloring of a text and thereby classifies texts into three categories: positive, neutral and negative.
1. Establishing a dictionary
An emotion dictionary (comprising positive emotion words and negative emotion words), a negative word (negation word) dictionary and a degree adverb dictionary are established. The degree adverbs and the emotion words have scores, and the negative words have no scores; the dictionary format is shown in Table 1. (All 3 dictionaries are open-source dictionaries published on the Internet.)
Table 1. Dictionary format
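The contents of Table 1 are not reproduced in this text, so the following Python sketch shows only one plausible in-memory layout for the three dictionaries. The entries, the scores, the sign convention (negative emotion words carrying negative scores) and the helper `lookup` are all hypothetical.

```python
# Hypothetical entries; real entries would come from the open-source dictionaries.
emotion_dict = {"good": 1.0, "excellent": 2.0, "bad": -1.0, "terrible": -2.0}
degree_adverb_dict = {"very": 2.0, "slightly": 0.5}  # weights that scale an emotion score
negative_word_dict = {"not", "never"}                # negative words carry no score


def lookup(word):
    """Return (kind, score) for a word; score is None for words that carry no score."""
    if word in emotion_dict:
        return "emotion", emotion_dict[word]
    if word in degree_adverb_dict:
        return "degree", degree_adverb_dict[word]
    if word in negative_word_dict:
        return "negative", None
    return "other", None
```

A mapping is used for the scored dictionaries and a set for the negative words because only membership matters for the latter.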
2. Text segmentation
A text data set is established and each text object is labeled manually. The emotion classification of a text is divided into 3 types (positive, neutral and negative), labeled 1, 0 and -1 respectively.
Each text object is segmented into words using the Jieba word segmenter, and stop words are removed according to a stop word dictionary to obtain a word list for each text. (The stop word dictionary used here is an open-source dictionary published on the Internet.)
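The segmentation and stop-word step can be sketched as below. The function `build_word_list`, the stop word set and the English sample are hypothetical. Whitespace splitting stands in for a real segmenter so that the sketch runs without third-party packages; for Chinese text a segmenter function such as `jieba.lcut` could be passed as `segment` instead, assuming the jieba package is installed.

```python
def build_word_list(text, stop_words, segment=str.split):
    """Segment a text and drop stop words, preserving the original word order."""
    return [w for w in segment(text) if w and w not in stop_words]


# Hypothetical stop words and a manually labeled sample (1, 0, -1 as in the text).
STOP_WORDS = {"the", "is", "a"}
dataset = [("the plot is not very good", -1)]

word_lists = [(build_word_list(text, STOP_WORDS), label) for text, label in dataset]
```

Word order is preserved because the later scoring step depends on which modifiers precede each emotional word.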
3. Generating a text dictionary
The words in the word list of each text object are classified to generate the emotion word dictionary, the negative word dictionary and the degree adverb dictionary of that text object.
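A minimal sketch of this classification step follows. Keying each per-text dictionary by word position is an assumption made here so that word order survives for later scoring; the function name and the sample dictionaries are hypothetical.

```python
def classify_words(word_list, emotion_dict, negative_words, degree_dict):
    """Split one text's word list into three per-text dictionaries keyed by position."""
    emotion, negative, degree = {}, {}, {}
    for pos, word in enumerate(word_list):
        if word in emotion_dict:
            emotion[pos] = emotion_dict[word]   # position -> emotion score
        elif word in negative_words:
            negative[pos] = word                # position -> word (no score)
        elif word in degree_dict:
            degree[pos] = degree_dict[word]     # position -> adverb weight
    return emotion, negative, degree


# Hypothetical dictionaries and word list, for illustration only.
emotion, negative, degree = classify_words(
    ["plot", "not", "very", "good"],
    emotion_dict={"good": 1.0, "bad": -1.0},
    negative_words={"not"},
    degree_dict={"very": 2.0},
)
```

Words that appear in none of the three dictionaries, such as "plot" here, are simply skipped.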
4. Calculating the sentiment polarity score
The negative words and degree adverbs between two emotional words, together with the latter emotional word, form an emotional phrase, and the sum of the scores of all the emotional phrases is the emotion polarity score of the text. The formula is as follows:
where $a_i$ is the number of negative words in the $i$-th emotional phrase, $b_i$ is the product of the weights of all degree adverbs in the phrase, and $c_i$ is the score of the emotional word.
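The phrase-based scoring can be sketched in Python as below. Since the formula is not reproduced in this text, the sketch assumes each phrase contributes `(-1) ** a_i * b_i * c_i`, which matches the symbol definitions but is not confirmed by the source. The function name and sample dictionaries are hypothetical, and modifiers that follow the last emotional word are ignored.

```python
def polarity_score(word_list, emotion_dict, negative_words, degree_dict):
    """Sum the scores of all emotional phrases in a segmented text."""
    total = 0.0
    negations = 0          # a_i: negative words seen since the previous emotional word
    degree_product = 1.0   # b_i: product of degree adverb weights in the current phrase
    for word in word_list:
        if word in negative_words:
            negations += 1
        elif word in degree_dict:
            degree_product *= degree_dict[word]
        elif word in emotion_dict:
            # c_i is the emotional word's own score; the phrase ends here.
            total += (-1) ** negations * degree_product * emotion_dict[word]
            negations, degree_product = 0, 1.0
    return total


# Hypothetical dictionaries and text: "not very good" -> -2.0, "slightly bad" -> -0.5.
score = polarity_score(
    ["plot", "not", "very", "good", "ending", "slightly", "bad"],
    emotion_dict={"good": 1.0, "bad": -1.0},
    negative_words={"not"},
    degree_dict={"very": 2.0, "slightly": 0.5},
)
```

Resetting the counters after each emotional word is what confines every negative word and degree adverb to the phrase that ends at the next emotional word.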
5. Determining the classification range
The scores of the text objects are used to classify the emotion of each text as positive, negative or neutral.
Inspection of the scores shows that a score of 0 is not a reasonable boundary between positive and negative, so the classification problem is recast as an optimization problem: find the upper and lower limits of the neutral score range that give the highest accuracy when the resulting classification is compared with the manual labels. Once this range has been obtained, the classification criterion can be applied to other texts.
The feasible region is determined from the sample scores, for example as the range covered by a reasonable percentage of the middle portion of the sorted scores; here the feasible region of the lower limit is (-2, 4) and the feasible region of the upper limit is (-1, 6). The objective function is the classification accuracy. If a new pair of upper and lower limits gives higher accuracy than the old pair, the limits are updated.
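The text states the objective (classification accuracy) and the update rule (keep a new pair of limits only if it is more accurate) but not the search procedure. The sketch below assumes an exhaustive grid search with a hypothetical step of 0.1, includes the interval endpoints for simplicity, and uses made-up scores and labels; all function names are illustrative.

```python
def classify(score, lower, upper):
    """Map a polarity score to -1, 0 or 1 using a neutral band [lower, upper]."""
    if score < lower:
        return -1
    if score > upper:
        return 1
    return 0


def accuracy(scores, labels, lower, upper):
    """Fraction of texts whose predicted class matches the manual label."""
    hits = sum(classify(s, lower, upper) == y for s, y in zip(scores, labels))
    return hits / len(labels)


def grid(start, stop, step):
    """Evenly spaced candidate values from start to stop inclusive."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n + 1)]


def search_neutral_band(scores, labels, lower_range=(-2.0, 4.0),
                        upper_range=(-1.0, 6.0), step=0.1):
    """Grid-search the neutral limits, keeping a pair only if it is more accurate."""
    best_lower, best_upper, best_acc = None, None, -1.0
    for lower in grid(*lower_range, step):
        for upper in grid(*upper_range, step):
            if upper < lower:
                continue  # the neutral band must not be inverted
            acc = accuracy(scores, labels, lower, upper)
            if acc > best_acc:
                best_lower, best_upper, best_acc = lower, upper, acc
    return best_lower, best_upper, best_acc


# Made-up scores and manual labels, for illustration only.
scores = [-3.0, -1.5, 0.0, 2.0, 5.0, 6.5]
labels = [-1, -1, 0, 0, 1, 1]
best_lower, best_upper, best_acc = search_neutral_band(scores, labels)
```

A grid search is the simplest choice that respects the stated update rule; a finer step or a different optimizer could be substituted without changing the objective.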
The upper neutral limit was found to be 3.7 and the lower neutral limit -1, giving a classification accuracy of 86.24%. That is, a text with a score below -1 is emotionally negative; a text with a score between -1 and 3.7 is emotionally neutral; and a text with a score above 3.7 is emotionally positive.
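Applied to new texts, the reported limits amount to a simple threshold rule. In the sketch below the two limits are taken from the text, while the function name, the constant names and the treatment of scores exactly equal to a limit as neutral are assumptions.

```python
NEUTRAL_LOWER = -1.0  # lower neutral limit reported in the text
NEUTRAL_UPPER = 3.7   # upper neutral limit reported in the text


def classify_text(score, lower=NEUTRAL_LOWER, upper=NEUTRAL_UPPER):
    """Return -1 (negative), 0 (neutral) or 1 (positive) for a polarity score."""
    if score < lower:
        return -1
    if score > upper:
        return 1
    return 0  # scores on either limit are treated as neutral (an assumption)


predictions = [classify_text(s) for s in (-2.5, -1.0, 0.0, 3.7, 4.2)]
```

The returned values follow the 1, 0 and -1 labels used for the manually marked data set.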
The method depends to some extent on the emotion dictionary. At a later stage, the emotion dictionary can be optimized with the naive Bayes algorithm from machine learning, using samples from different platforms, so that the method is better suited to judging text emotion on a specific platform.
The above-described embodiments are only preferred embodiments of the present invention, and are not intended to limit the present invention in any way, and other variations and modifications may be made without departing from the spirit of the invention as set forth in the claims.
Claims (5)
1. A method for classifying emotional word text information, characterized by comprising the following steps:
acquiring text information;
inputting the words of the text;
judging part of speech, wherein the words in the text are divided into emotional words, negative words and degree adverbs;
scoring the emotional words, wherein each emotional word is looked up in an emotional word dictionary to obtain score 1;
judging whether a degree adverb occurs before or after the emotional word;
obtaining a new emotional word score, score 2;
judging the words before and after the emotional word again to obtain emotional word score 3;
outputting the final text score, which is the sum of the final scores of all the emotional words;
and dividing the texts into categories according to their final scores.
2. The emotion word text information classification method of claim 1, characterized in that: the method also comprises a text dictionary, wherein an emotion dictionary, a negative word dictionary and a degree adverb dictionary are established, and the words in the word list of each text object are classified to generate the emotion word dictionary, the negative word dictionary and the degree adverb dictionary of that text object.
3. The emotion word text information classification method of claim 2, characterized in that: the emotion dictionary comprises positive emotion words and negative emotion words; the degree adverbs and the emotion words have scores, and the negative words have no scores.
4. The emotion word text information classification method of claim 3, characterized in that: the method also comprises establishing a text data set, manually labeling each text object, dividing the emotion classification of a text into 3 types (positive, neutral and negative), labeled 1, 0 and -1 respectively, segmenting each text object with the Jieba word segmenter, and removing stop words according to a stop word dictionary to obtain a word list for each text.
5. The emotion word text information classification method according to any one of claims 1 to 4, characterized in that: the range used when scoring the words before and after an emotional word is as follows: the negative words and degree adverbs between two emotional words, together with the latter emotional word, form an emotional phrase, and the sum of the scores of all the emotional phrases is the emotion polarity score of the text. The formula is as follows:
where $a_i$ is the number of negative words in the $i$-th emotional phrase, $b_i$ is the product of the weights of all degree adverbs in the phrase, and $c_i$ is the score of the emotional word.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911341489.8A CN111104515A (en) | 2019-12-24 | 2019-12-24 | Emotional word text information classification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911341489.8A CN111104515A (en) | 2019-12-24 | 2019-12-24 | Emotional word text information classification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111104515A (en) | 2020-05-05 |
Family
ID=70423969
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911341489.8A Pending CN111104515A (en) | 2019-12-24 | 2019-12-24 | Emotional word text information classification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111104515A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111767399A (en) * | 2020-06-30 | 2020-10-13 | 平安国际智慧城市科技股份有限公司 | Emotion classifier construction method, device, equipment and medium based on unbalanced text set |
CN112668330A (en) * | 2020-12-31 | 2021-04-16 | 北京大米科技有限公司 | Data processing method and device, readable storage medium and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102929863A (en) * | 2012-11-06 | 2013-02-13 | 苏州两江科技有限公司 | Method for intelligently analyzing Chinese character emotional tendency through computer |
CN105138506A (en) * | 2015-07-09 | 2015-12-09 | 天云融创数据科技(北京)有限公司 | Financial text sentiment analysis method |
CN106503049A (en) * | 2016-09-22 | 2017-03-15 | 南京理工大学 | A kind of microblog emotional sorting technique for merging multiple affection resources based on SVM |
CN106598944A (en) * | 2016-11-25 | 2017-04-26 | 中国民航大学 | Civil aviation security public opinion emotion analysis method |
CN107656917A (en) * | 2016-07-26 | 2018-02-02 | 深圳联友科技有限公司 | A kind of Chinese sentiment analysis method and system |
CN110362679A (en) * | 2019-06-05 | 2019-10-22 | 北京大学(天津滨海)新一代信息技术研究院 | A kind of financial field comment sensibility classification method and system based on sentiment dictionary |
CN110598219A (en) * | 2019-10-23 | 2019-12-20 | 安徽理工大学 | Emotion analysis method for broad-bean-net movie comment |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102929863A (en) * | 2012-11-06 | 2013-02-13 | 苏州两江科技有限公司 | Method for intelligently analyzing Chinese character emotional tendency through computer |
CN105138506A (en) * | 2015-07-09 | 2015-12-09 | 天云融创数据科技(北京)有限公司 | Financial text sentiment analysis method |
CN107656917A (en) * | 2016-07-26 | 2018-02-02 | 深圳联友科技有限公司 | A kind of Chinese sentiment analysis method and system |
CN106503049A (en) * | 2016-09-22 | 2017-03-15 | 南京理工大学 | A kind of microblog emotional sorting technique for merging multiple affection resources based on SVM |
CN106598944A (en) * | 2016-11-25 | 2017-04-26 | 中国民航大学 | Civil aviation security public opinion emotion analysis method |
CN110362679A (en) * | 2019-06-05 | 2019-10-22 | 北京大学(天津滨海)新一代信息技术研究院 | A kind of financial field comment sensibility classification method and system based on sentiment dictionary |
CN110598219A (en) * | 2019-10-23 | 2019-12-20 | 安徽理工大学 | Emotion analysis method for broad-bean-net movie comment |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111767399A (en) * | 2020-06-30 | 2020-10-13 | 平安国际智慧城市科技股份有限公司 | Emotion classifier construction method, device, equipment and medium based on unbalanced text set |
CN112668330A (en) * | 2020-12-31 | 2021-04-16 | 北京大米科技有限公司 | Data processing method and device, readable storage medium and electronic equipment |
CN112668330B (en) * | 2020-12-31 | 2024-01-26 | 北京大米科技有限公司 | Data processing method and device, readable storage medium and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109002473B (en) | Emotion analysis method based on word vectors and parts of speech | |
Maia et al. | Finsslx: A sentiment analysis model for the financial domain using text simplification | |
CN108563638B (en) | Microblog emotion analysis method based on topic identification and integrated learning | |
CN106776538A (en) | The information extracting method of enterprise's noncanonical format document | |
Rizki et al. | Comparison of stemming algorithms on Indonesian text processing | |
CN107038249A (en) | Network public sentiment information sensibility classification method based on dictionary | |
CN107168956B (en) | Chinese chapter structure analysis method and system based on pipeline | |
CN110096587B (en) | Attention mechanism-based LSTM-CNN word embedded fine-grained emotion classification model | |
CN110674296B (en) | Information abstract extraction method and system based on key words | |
CN104142912A (en) | Accurate corpus category marking method and device | |
CN111104515A (en) | Emotional word text information classification method | |
CN107818173B (en) | Vector space model-based Chinese false comment filtering method | |
Burlot et al. | Word representations in factored neural machine translation | |
CN107451116B (en) | Statistical analysis method for mobile application endogenous big data | |
CN109062977A (en) | A kind of automatic question answering text matching technique, automatic question-answering method and system based on semantic similarity | |
Fauziah et al. | Lexicon based sentiment analysis in Indonesia languages: A systematic literature review | |
CN112989848B (en) | Training method for neural machine translation model of field adaptive medical literature | |
CN113032559B (en) | Language model fine tuning method for low-resource adhesive language text classification | |
CN111191029B (en) | AC construction method based on supervised learning and text classification | |
CN113065350A (en) | Biomedical text word sense disambiguation method based on attention neural network | |
CN112632272A (en) | Microblog emotion classification method and system based on syntactic analysis | |
WO2020199590A1 (en) | Mood detection analysis method and related device | |
CN110708619A (en) | Word vector training method and device for intelligent equipment | |
CN110362682A (en) | A kind of entity coreference resolution method based on statistical machine learning algorithm | |
CN111898375B (en) | Automatic detection and division method for article discussion data based on word vector sentence chain |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20200505 |