CN111008527A - Emotion analysis system and method - Google Patents


Publication number
CN111008527A
Authority
CN
China
Prior art keywords
emotion
text
text data
emotion analysis
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911310436.XA
Other languages
Chinese (zh)
Inventor
陈泽勇
张治同
姚松
张莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Dippmann Information Technology Co Ltd
Original Assignee
Chengdu Dippmann Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Dippmann Information Technology Co Ltd filed Critical Chengdu Dippmann Information Technology Co Ltd
Priority to CN201911310436.XA priority Critical patent/CN111008527A/en
Publication of CN111008527A publication Critical patent/CN111008527A/en
Pending legal-status Critical Current


Abstract

The invention discloses an emotion analysis system and method. The method comprises the following steps: reading a text data file needing emotion analysis, preprocessing the text data file, performing text emotion calculation analysis, and outputting emotion analysis results. The method can perform text sentiment classification operation on a given text file data set, directly generate three sentiment labels of specified types, namely neutral, positive and negative, and support documents of two languages, namely Chinese and English.

Description

Emotion analysis system and method
Technical Field
The invention belongs to the field of data processing, and particularly relates to an emotion analysis system and method.
Background
Computer and network technology are developing rapidly, and the Internet plays an important role in people's daily lives. The Internet carries a large amount of text data, such as microblog posts and articles on news websites. This text carries positive, neutral and negative subjective sentiment. Analyzing that sentiment allows public opinion to be monitored and managed effectively. The analysis can be performed manually, but the workload is too large and new text data cannot be processed in real time.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing an emotion analysis system and method that can perform text emotion classification on a given text file data set.
An emotion analysis method, comprising:
Inputting text data: reading a text data file that requires emotion analysis.
Preprocessing text data, including word segmentation and stop-word removal; if the document is in English, the word segmentation step is skipped; word embedding is performed in word2vector or BERT vector mode.
Performing text emotion calculation and analysis using a Conv-GRNN algorithm or an LSTM-GRNN algorithm, comprising sentence expression and text expression.
Sentence expression: word embedding is performed, converting each word into a 200-dimensional word vector; three convolution kernels with widths of 1, 2 and 3 are used to mine unigram, bigram and trigram features in the sentence; the output of the linear layer is fed to a mean pooling layer to obtain a fixed-length vector; a tanh activation function is added to introduce nonlinearity, and the results of the three convolutions are then averaged and output.
Text expression: a GatedRNN is used, and the output vector of its last hidden layer serves as the feature representation for emotion classification. The GatedRNN is computed as follows:
$$i_t = \mathrm{sigmoid}(W_i \cdot [h_{t-1}; s_t] + b_i)$$
$$f_t = \mathrm{sigmoid}(W_f \cdot [h_{t-1}; s_t] + b_f)$$
$$g_t = \tanh(W_r \cdot [h_{t-1}; s_t] + b_r)$$
$$h_t = \tanh(i_t \odot [h_{t-1}; s_t] + b_i)$$
The mean of the GatedRNN hidden-layer outputs may further be used to integrate historical information.
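By way of illustration only, the gated recurrence above can be sketched in pure Python. Taken literally, the fourth line appears dimensionally inconsistent (i_t has the size of the hidden state, whereas [h_{t-1}; s_t] is the longer concatenated vector), so this sketch substitutes the commonly used gated update h_t = tanh(i_t ⊙ g_t + f_t ⊙ h_{t-1}). That substitution is an assumption and is not the formula stated in this document. The function names and the layout of `params` are hypothetical.

```python
import math


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _affine(W, x, b):
    """Return W.x + b, where W is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b_k for row, b_k in zip(W, b)]


def gated_rnn_step(h_prev, s_t, params):
    """One recurrence step over the concatenated input [h_{t-1}; s_t]."""
    x = h_prev + s_t  # list concatenation stands for [h_{t-1}; s_t]
    i_t = [_sigmoid(v) for v in _affine(params["Wi"], x, params["bi"])]
    f_t = [_sigmoid(v) for v in _affine(params["Wf"], x, params["bf"])]
    g_t = [math.tanh(v) for v in _affine(params["Wr"], x, params["br"])]
    # ASSUMPTION: standard gated update, substituted for the fourth line as printed.
    return [math.tanh(i * g + f * h) for i, g, f, h in zip(i_t, g_t, f_t, h_prev)]


def run_gated_rnn(sentence_vectors, params, hidden_size, use_mean=False):
    """Return the last hidden state, or the mean of all hidden states."""
    h = [0.0] * hidden_size
    states = []
    for s_t in sentence_vectors:
        h = gated_rnn_step(h, s_t, params)
        states.append(h)
    if use_mean and states:
        return [sum(col) / len(states) for col in zip(*states)]
    return h
```

With `use_mean=True` the function returns the average of all hidden states rather than only the last one, which corresponds to the optional integration of historical information.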
Emotion classification: each text sample participating in training is manually labeled with a positive, neutral or negative emotion label; the labeled samples are then used for training, and training stops once the F1 score reaches 80% after a number of iterations. The trained model is then verified on test samples prepared in advance; if the F1 score on the test samples also reaches 80%, the trained model meets the requirements and is used to perform emotion classification on the specified text data.
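The 80% acceptance check can be illustrated with a short sketch. The document does not say how F1 is aggregated over the three labels; macro-averaging is assumed here, and the function names are hypothetical.

```python
LABELS = ("positive", "neutral", "negative")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Average the per-class F1 scores over the given labels (macro-F1, an assumption)."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def meets_target(y_true, y_pred, target=0.80):
    """True when the F1 score reaches the 80% threshold named in the text."""
    return macro_f1(y_true, y_pred) >= target
```

The same check would be applied twice: once on the training samples to decide when to stop iterating, and once on the held-out test samples to accept the model.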
Outputting the emotion analysis results.
An emotion analysis system comprises a text data input module, a text data preprocessing module, a text emotion algorithm module and an emotion analysis result output module; the text data input module is used for reading a text data file which needs emotion analysis; the text data preprocessing module is used for preprocessing the loaded text data; the text emotion algorithm module is used for calculating and judging the emotion type of the text data; and the emotion analysis result output module is used for outputting emotion analysis results.
The text data preprocessing module comprises a word2vector processing module and a BERT processing module.
The text emotion algorithm module comprises a Conv-GRNN algorithm module and an LSTM-GRNN algorithm module.
The emotion analysis result is in EXCEL format, and each text document is assigned one emotion label: neutral, positive or negative.
The beneficial effects of the invention are as follows: the method can perform text emotion classification on a given Chinese or English text file data set and directly generate, for each specified document, one of three emotion labels: neutral, positive or negative.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a web services system architecture diagram.
Detailed Description
In order to more clearly understand the technical features, objects, and effects of the present invention, embodiments of the present invention will now be described with reference to the accompanying drawings.
Embodiment one:
An emotion analysis method, comprising:
Inputting text data: reading a text data file that requires emotion analysis.
Preprocessing text data, including word segmentation and stop-word removal; if the document is in English, the word segmentation step is skipped. Word embedding is performed in word2vector mode. The document vector is computed directly from the text with stop words removed: the document vector is the average of its sentence vectors, and each sentence vector is the average of its word vectors.
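The averaging scheme described above (word vectors averaged into sentence vectors, sentence vectors averaged into a document vector) can be sketched as follows for English input. The stop-word list, the sentence splitter and the toy embedding table are hypothetical stand-ins; a trained word2vector table would be used in practice, and Chinese input would first pass through a word segmenter, which is omitted here.

```python
import re

STOP_WORDS = {"the", "a", "is", "of"}  # hypothetical stop-word list


def tokenize_english(sentence):
    """Lowercase and split on non-letters; English needs no word segmentation step."""
    return [tok for tok in re.split(r"[^a-z']+", sentence.lower()) if tok]


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    return [tok for tok in tokens if tok not in stop_words]


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors; None if the list is empty."""
    if not vectors:
        return None
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def document_vector(text, embeddings):
    """Average word vectors per sentence, then average the sentence vectors."""
    sentence_vectors = []
    for sentence in re.split(r"[.!?]+", text):
        tokens = remove_stop_words(tokenize_english(sentence))
        word_vectors = [embeddings[t] for t in tokens if t in embeddings]
        vec = mean_vector(word_vectors)
        if vec is not None:
            sentence_vectors.append(vec)
    return mean_vector(sentence_vectors)
```

Words missing from the embedding table are skipped in this sketch; how out-of-vocabulary words are handled is not stated in the document.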
Performing text emotion calculation and analysis using a Conv-GRNN algorithm, comprising sentence expression and text expression.
Sentence expression: word embedding is performed, converting each word into a 200-dimensional word vector; three convolution kernels with widths of 1, 2 and 3 are used to mine unigram, bigram and trigram features in the sentence; the output of the linear layer is fed to a mean pooling layer to obtain a fixed-length vector; a tanh activation function is added to introduce nonlinearity, and the results of the three convolutions are then averaged and output.
Text expression: a GatedRNN is used, and the output vector of its last hidden layer serves as the feature representation for emotion classification. The GatedRNN is computed as follows:
$$i_t = \mathrm{sigmoid}(W_i \cdot [h_{t-1}; s_t] + b_i)$$
$$f_t = \mathrm{sigmoid}(W_f \cdot [h_{t-1}; s_t] + b_f)$$
$$g_t = \tanh(W_r \cdot [h_{t-1}; s_t] + b_r)$$
$$h_t = \tanh(i_t \odot [h_{t-1}; s_t] + b_i)$$
The mean of the GatedRNN hidden-layer outputs may further be used to integrate historical information.
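The sentence-expression step of this embodiment can be sketched as below: one linear filter per window width, mean pooling over window positions, a tanh non-linearity, and a final average over the filters. The toy dimensions stand in for the 200-dimensional vectors named in the text, the function names are hypothetical, and skipping a filter when the sentence is shorter than its window is an assumption.

```python
import math


def conv_filter(word_vectors, W, b, width):
    """Apply a linear layer to each window of `width` consecutive word vectors."""
    outputs = []
    for start in range(len(word_vectors) - width + 1):
        window = [v for vec in word_vectors[start:start + width] for v in vec]
        outputs.append([sum(w * x for w, x in zip(row, window)) + b_k
                        for row, b_k in zip(W, b)])
    return outputs


def sentence_vector(word_vectors, filters):
    """filters maps a window width (1, 2 or 3) to its (W, b) pair."""
    pooled = []
    for width, (W, b) in filters.items():
        outputs = conv_filter(word_vectors, W, b, width)
        if not outputs:  # ASSUMPTION: skip filters wider than the sentence
            continue
        mean = [sum(col) / len(outputs) for col in zip(*outputs)]  # mean pooling
        pooled.append([math.tanh(v) for v in mean])  # non-linearity
    if not pooled:
        return None
    # Average the results of the filters that applied.
    return [sum(col) / len(pooled) for col in zip(*pooled)]
```

Each row of `W` for a width-k filter has k times the word-vector length, because the k word vectors in a window are concatenated before the linear layer is applied.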
Emotion classification: each text sample participating in training is manually labeled with a positive, neutral or negative emotion label; the labeled samples are then used for training, and training stops once the F1 score reaches 80% after a number of iterations. The trained model is then verified on test samples prepared in advance; if the F1 score on the test samples also reaches 80%, the trained model meets the requirements and is exported as a file. In practical applications the model file can be used directly by other business programs or software to perform emotion classification on the specified text data.
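Exporting the accepted model as a file, and loading it from another program, might look like the following sketch. The document does not specify a file format; JSON, the `format_version` field and the function names are assumptions made for illustration.

```python
import json


def export_model(params, path):
    """Write model parameters (nested lists of floats) to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"format_version": 1, "params": params}, fh)


def load_model(path):
    """Read the parameters back, as a separate consuming program would."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["params"]
```

A plain-text format keeps the file readable by programs written in other languages, which fits the stated goal of reuse by other business software.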
Outputting the emotion analysis results, which are stored in a map structure.
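A minimal sketch of the map structure follows, with one entry per document. The results are written as CSV, which spreadsheet software can open; this is a stand-in for the EXCEL output mentioned in this document, since writing a native workbook would need a third-party library. The column names and the function name are hypothetical.

```python
import csv
import io


def write_results(results, stream):
    """Write a {document name: label} map as CSV rows with a header."""
    allowed = {"neutral", "positive", "negative"}
    writer = csv.writer(stream)
    writer.writerow(["document", "emotion"])
    for name, label in results.items():
        if label not in allowed:
            raise ValueError(f"unexpected label {label!r} for {name!r}")
        writer.writerow([name, label])


# Usage: the map keeps insertion order, so rows follow the input order.
buffer = io.StringIO()
write_results({"a.txt": "positive", "b.txt": "neutral"}, buffer)
```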
An emotion analysis system comprises a text data input module, a text data preprocessing module, a text emotion algorithm module and an emotion analysis result output module; the text data input module is used for reading a text data file which needs emotion analysis; the text data preprocessing module is used for preprocessing the loaded text data; the text emotion algorithm module is used for calculating and judging the emotion type of the text data; and the emotion analysis result output module is used for outputting emotion analysis results.
The text data preprocessing module adopts a word2vector processing module.
The text emotion algorithm module adopts a Conv-GRNN algorithm module.
The emotion analysis result is in EXCEL format, and each text document is assigned one emotion label: neutral, positive or negative.
The system further comprises an error processing module for displaying error information, including the error time, error level, error cause and error location. When the text data set is too large and memory overflows, all data in the system is rolled back to its state before the error occurred. Logging is enabled by default in the tool; a log module manages the logs, and log files are stored in the same root directory as the tool.
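The error record and the rollback behaviour can be sketched as follows. The four fields mirror the error time, level, cause and location named above; the logger name, the function name and the copy-then-commit strategy are assumptions, since the document does not say how the rollback is implemented.

```python
import copy
import logging
from dataclasses import dataclass
from datetime import datetime

logger = logging.getLogger("emotion_tool")  # hypothetical logger name


@dataclass
class ErrorInfo:
    time: datetime
    level: str
    cause: str
    location: str


def run_with_rollback(state, operation, location):
    """Run operation(state) on a working copy; keep the original state on MemoryError."""
    working = copy.deepcopy(state)
    try:
        operation(working)
    except MemoryError as exc:
        info = ErrorInfo(datetime.now(), "ERROR", str(exc) or "memory overflow", location)
        logger.error("%s [%s] %s at %s",
                     info.time.isoformat(), info.level, info.cause, info.location)
        return state, info  # rolled back: the pre-error state is returned
    return working, None
```

Copying the whole state is the simplest way to guarantee a clean rollback, but it doubles memory use, so a real tool handling very large data sets would more likely checkpoint to disk. Sending the log to a file in the tool's root directory would be a matter of configuring a file handler on the logger.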
Embodiment two:
An emotion analysis method, comprising:
Inputting text data: reading a text data file that requires emotion analysis.
Preprocessing text data, including word segmentation and stop-word removal; if the document is in English, the word segmentation step is skipped. Word embedding is performed in BERT vector mode, directly using the vector representation of each word or phrase obtained by training on a specific corpus with the attention mechanism and the Transformer mechanism.
Performing text emotion calculation and analysis using an LSTM-GRNN algorithm, comprising sentence expression and text expression.
Sentence expression: word embedding is performed, converting each word into a 200-dimensional word vector; three convolution kernels with widths of 1, 2 and 3 are used to mine unigram, bigram and trigram features in the sentence; the output of the linear layer is fed to a mean pooling layer to obtain a fixed-length vector; a tanh activation function is added to introduce nonlinearity, and the results of the three convolutions are then averaged and output.
Text expression: a GatedRNN is used, and the output vector of its last hidden layer serves as the feature representation for emotion classification. The GatedRNN is computed as follows:
$$i_t = \mathrm{sigmoid}(W_i \cdot [h_{t-1}; s_t] + b_i)$$
$$f_t = \mathrm{sigmoid}(W_f \cdot [h_{t-1}; s_t] + b_f)$$
$$g_t = \tanh(W_r \cdot [h_{t-1}; s_t] + b_r)$$
$$h_t = \tanh(i_t \odot [h_{t-1}; s_t] + b_i)$$
The mean of the GatedRNN hidden-layer outputs may further be used to integrate historical information.
Emotion classification: each text sample participating in training is manually labeled with a positive, neutral or negative emotion label; the labeled samples are then used for training, and training stops once the F1 score reaches 80% after a number of iterations. The trained model is then verified on test samples prepared in advance; if the F1 score on the test samples also reaches 80%, the trained model meets the requirements and is exported as a file. In practical applications the model file can be used directly by other business programs or software to perform emotion classification on the specified text data.
Outputting the emotion analysis results, which are stored in a map structure.
An emotion analysis system comprises a text data input module, a text data preprocessing module, a text emotion algorithm module and an emotion analysis result output module; the text data input module is used for reading a text data file which needs emotion analysis; the text data preprocessing module is used for preprocessing the loaded text data; the text emotion algorithm module is used for calculating and judging the emotion type of the text data; and the emotion analysis result output module is used for outputting emotion analysis results.
The text data preprocessing module adopts a BERT processing module, and a small web service system is embedded in the system. Because the BERT model is large and its running time is relatively long, blocking can occur if several users use it at the same time. A small web service system is therefore built with a Docker container and embedded in the overall system so that the BERT model can be used by multiple users simultaneously; RESTful web services are provided over HTTP through a web service layer at the Docker container level.
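The idea of exposing the embedding model behind an HTTP endpoint can be illustrated with Python's standard library. This is only a sketch: `embed_text` is a hypothetical stand-in that returns a fake vector instead of calling BERT, and the `/embed` path, the port and the JSON shapes are assumptions. Packaging the service in a Docker container is not shown.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def embed_text(text):
    """Hypothetical stand-in for the BERT model; returns a fake 2-dimensional vector."""
    return [float(len(text)), float(len(text.split()))]


def handle_embed(body):
    """Map a JSON request body to (status code, JSON response bytes)."""
    try:
        text = json.loads(body)["text"]
        if not isinstance(text, str):
            raise TypeError("text must be a string")
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected a JSON object with a string 'text'"}).encode("utf-8")
    return 200, json.dumps({"vector": embed_text(text)}).encode("utf-8")


class EmbedHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/embed":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_embed(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Handle each request on its own thread so one slow call does not block the others."""
    ThreadingHTTPServer(("0.0.0.0", port), EmbedHandler).serve_forever()
```

Keeping the request logic in `handle_embed`, separate from the HTTP handler, lets it be exercised without opening a socket. Threads alone do not speed up a compute-bound model, so serving many users would in practice also rely on running several containers.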
The text emotion algorithm module adopts an LSTM-GRNN algorithm module.
The emotion analysis result is in EXCEL format, and each text document is assigned one emotion label: neutral, positive or negative.
The system further comprises an error processing module for displaying error information, including the error time, error level, error cause and error location. When the text data set is too large and memory overflows, all data in the system is rolled back to its state before the error occurred. Logging is enabled by default in the tool; a log module manages the logs, and log files are stored in the same root directory as the tool.
The method can perform text emotion classification operation on the given Chinese and English text file data sets, and directly generate three emotion labels of specified types, namely neutral, positive and negative emotion labels for the specified documents.
The foregoing shows and describes the general principles and broad features of the present invention and advantages thereof. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (17)

1. An emotion analysis method, comprising the steps of:
S1: inputting text data: reading a text data file that requires emotion analysis;
S2: text data preprocessing, comprising the following substeps:
S21: if the text data is a Chinese document, performing word segmentation on the text and then proceeding to step S22; if the text data is an English document, proceeding directly to step S22;
S22: removing stop words;
S3: text emotion calculation and analysis;
step S3 comprising the following substeps:
S31: sentence expression, comprising the following substeps:
S311: performing word embedding, converting words into word vectors of embedding dimension d;
S312: mining unigram, bigram and trigram features in the sentence using three convolution kernels with widths of 1, 2 and 3;
S313: feeding the output of the linear layer to a mean pooling layer to obtain a fixed-length vector;
S314: adding a tanh activation function to introduce nonlinearity, then averaging the results of the three convolutions and outputting the average;
S32: text expression: a GatedRNN is used, the output vector of its last hidden layer serving as the feature representation for emotion classification, the GatedRNN being computed as follows:
$$i_t = \mathrm{sigmoid}(W_i \cdot [h_{t-1}; s_t] + b_i)$$
$$f_t = \mathrm{sigmoid}(W_f \cdot [h_{t-1}; s_t] + b_f)$$
$$g_t = \tanh(W_r \cdot [h_{t-1}; s_t] + b_r)$$
$$h_t = \tanh(i_t \odot [h_{t-1}; s_t] + b_i)$$;
S33: emotion classification;
S4: outputting the emotion analysis results.
2. The emotion analysis method of claim 1, wherein in step S2, word embedding is performed in the text data preprocessing by using a word2vector mode.
3. The emotion analysis method of claim 1, wherein in step S2, word embedding is performed in the text data preprocessing by using a BERT vector mode.
4. The emotion analysis method of claim 1, wherein in step S3, the text emotion calculation analysis adopts a Conv-GRNN algorithm.
5. The emotion analysis method of claim 1, wherein in step S3, the text emotion calculation analysis adopts LSTM-GRNN algorithm.
6. The emotion analysis method of claim 1, wherein d = 200 in step S311.
7. The emotion analysis method of claim 1, wherein step S32 further comprises using the mean of the GatedRNN hidden-layer outputs to integrate historical information.
8. The emotion analysis method according to claim 1, wherein the emotion classification in step S33 is specifically performed by:
S331: manually labeling each text sample participating in training with a positive emotion label, a neutral emotion label or a negative emotion label;
S332: using the labeled samples for training, and stopping training once the F1 score reaches 80% after a number of iterations;
S333: verifying the model trained in step S332 on test samples prepared in advance, wherein the trained model meets the requirements if the F1 score on the test samples also reaches 80%;
S334: performing emotion classification on the specified text data using the model verified in step S333.
9. An emotion analysis system, characterized by comprising a text data input module, a text data preprocessing module, a text emotion algorithm module and an emotion analysis result output module;
the text data input module is used for reading a text data file which needs emotion analysis;
the text data preprocessing module is used for preprocessing the loaded text data;
the text emotion algorithm module is used for calculating and judging the emotion type of the text data;
and the emotion analysis result output module is used for outputting emotion analysis results.
10. An emotion analysis system as claimed in claim 9, wherein the text data pre-processing module comprises a word2vector processing module.
11. An emotion analysis system as claimed in claim 9, wherein the text data preprocessing module comprises a BERT processing module.
12. An emotion analysis system as claimed in claim 9, wherein the text emotion algorithm module comprises a Conv-GRNN algorithm module.
13. An emotion analysis system as claimed in claim 9, wherein the text emotion algorithm module comprises the LSTM-GRNN algorithm module.
14. An emotion analysis system as claimed in claim 9, wherein the emotion analysis result is in EXCEL format.
15. An emotion analysis system as claimed in claim 11, further comprising a web service system built with Docker container technology, wherein RESTful web services are provided over HTTP through a web service layer at the Docker container level.
16. An emotion analysis system as claimed in any one of claims 9 to 15, further comprising an error handling module, wherein the error handling module is configured to display error information, and when a memory overflow occurs due to an excessively large text data set, all data in the system is rolled back to a pre-error state.
17. An emotion analysis system as claimed in claim 16, wherein the error information includes an error time, an error level, an error cause and an error location.
CN201911310436.XA 2019-12-18 2019-12-18 Emotion analysis system and method Pending CN111008527A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911310436.XA CN111008527A (en) 2019-12-18 2019-12-18 Emotion analysis system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911310436.XA CN111008527A (en) 2019-12-18 2019-12-18 Emotion analysis system and method

Publications (1)

Publication Number Publication Date
CN111008527A true CN111008527A (en) 2020-04-14

Family

ID=70116534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911310436.XA Pending CN111008527A (en) 2019-12-18 2019-12-18 Emotion analysis system and method

Country Status (1)

Country Link
CN (1) CN111008527A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150120282A1 (en) * 2013-10-30 2015-04-30 Lenovo (Singapore) Pte. Ltd. Preserving emotion of user input
CN106776581A (en) * 2017-02-21 2017-05-31 浙江工商大学 Subjective texts sentiment analysis method based on deep learning
US20180124242A1 (en) * 2016-11-02 2018-05-03 International Business Machines Corporation System and Method for Monitoring and Visualizing Emotions in Call Center Dialogs by Call Center Supervisors
CN109858009A (en) * 2017-11-30 2019-06-07 财团法人资讯工业策进会 Device, method and its computer storage medium of control instruction are generated according to text
CN109933795A (en) * 2019-03-19 2019-06-25 上海交通大学 Based on context-emotion term vector text emotion analysis system
CN110134765A (en) * 2019-05-05 2019-08-16 杭州师范大学 A kind of dining room user comment analysis system and method based on sentiment analysis
CN110232109A (en) * 2019-05-17 2019-09-13 深圳市兴海物联科技有限公司 A kind of Internet public opinion analysis method and system
CN110502753A (en) * 2019-08-23 2019-11-26 昆明理工大学 A kind of deep learning sentiment analysis model and its analysis method based on semantically enhancement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郑亚楠: ""基于LSTM的汉语语义角色标注研究"", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200414