CN109165298B - Text emotion analysis system capable of achieving automatic upgrading and resisting noise - Google Patents

Info

Publication number
CN109165298B
Authority
CN
China
Prior art keywords
industry
algorithm model
emotion
module
learning algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810930606.3A
Other languages
Chinese (zh)
Other versions
CN109165298A (en
Inventor
陈福
刘洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Wujie Data Technology Co ltd
Original Assignee
Shanghai Wujie Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Abstract

An autonomously upgraded and noise-resistant text emotion analysis system relates to the technical field of text emotion analysis and comprises a user side, a background side and a text emotion judgment system. The text emotion judgment system comprises a medium classification module, an industry classification module, a medium engine group, an industry engine group and a rule learning engine group. According to the judgment result data, the rule learning engine group counts the accuracy with which each emotion judgment path judges the emotional tendency of the text content, and matches each text with the emotion judgment path of highest accuracy. Meanwhile, the rule learning engine group trains the existing deep learning algorithm model or machine learning algorithm model online to form a new deep learning algorithm model or machine learning algorithm model, and compares the new model with the existing model to realize iterative upgrade. The application provides a text emotion analysis system with self-learning ability, adaptability to its environment and strong anti-interference ability, improving accuracy while maintaining efficiency.

Description

Independently-upgraded and anti-noise text emotion analysis system
Technical Field
The invention relates to the technical field of text emotion analysis, in particular to an autonomously upgraded and noise-resistant text emotion analysis system.
Background
Accurate analysis and judgment of customer emotion is a goal that merchants pursue tirelessly. With the massive growth of Internet text data, analyzing the data manually is no longer feasible, so machine learning methods have been introduced one after another to analyze, by machine, the emotion and information expressed in long or short texts, in the hope of accurately judging and grasping user emotion.
Currently, numerous such technologies have emerged: some are semantics-based and others statistics-based; some are supervised, others unsupervised or semi-supervised; some are based on a traditional SVM or random forest algorithm, others on deep learning; some handle only short text and others only long text. Judging from what has been publicly disclosed, however, the performance of these techniques is not satisfactory. For example, we measured the accuracy of Baidu's public short-text sentiment analysis engine at only about 75%. That is, compared with machine AI technology in the field of video recognition, current technology for recognizing the emotional tendency of text by machine has much lower accuracy on Internet text: it does not even exceed 80% and falls far short of manual judgment.
Analysis shows that the main factors limiting current text emotion analysis are as follows:
1. existing word segmentation technology can introduce words that are irrelevant to the article or even ambiguous, and because words are the source of article feature extraction, they are the basis of all machine learning algorithms;
2. the same word often carries different emotional meaning in different types of articles and in articles from different fields;
3. the Internet is synonymous with change: new words emerge continuously, and a word may take on different meanings in similar scenes as time passes;
4. although machine learning algorithms are adopted, the algorithm model is usually trained by people before it goes into the online production environment, and it cannot autonomously learn from and adapt to the complex Internet environment during operation.
In conclusion, the Internet contains too much interference. The machine learning algorithms currently in use can predict the emotion of an article (albeit inaccurately) but lack the ability to adapt and learn autonomously, that is, they lack a mechanism for resisting noise, so the accuracy of current machine-learning text emotion judgment technology is not high.
Disclosure of Invention
In order to overcome at least one of the defects in the prior art, a text emotion analysis system with self-learning capability, adaptability to its environment and strong anti-interference capability is provided, which improves accuracy while maintaining efficiency.
In order to achieve the technical effects, the specific technical scheme of the invention is as follows:
an autonomously upgraded and anti-noise text emotion analysis system comprises a user side, a background side and a text emotion judgment system; the text emotion judging system comprises a medium classification module, an industry classification module, a medium engine group, an industry engine group and a rule learning engine group;
the medium classification module acquires the text content to be subjected to emotion analysis and judges whether the text content comes from a medium; if so, it sends the text content to the medium feature dictionary of the corresponding medium type, and otherwise it does not send the text content to the medium feature dictionary; the medium types comprise comments, news, blogs, WeChat and microblogs, and the medium feature dictionary correspondingly comprises a comment feature dictionary, a news feature dictionary, a blog feature dictionary, a WeChat feature dictionary and a microblog feature dictionary;
the medium feature dictionary receives the text content to be subjected to emotion analysis of the corresponding medium type, and the words to be subjected to emotion analysis are generated by a word segmentation module; all the generated words are sent to the M medium feature extraction modules in a medium feature extraction module group, namely a first, a second, a third, up to an Mth medium feature extraction module, where M is an integer; each medium feature extraction module sends the feature vector it extracts to the N medium feature selection modules in a medium feature selection module group, namely a first, a second, up to an Nth medium feature selection module, where N is an integer; and each medium feature selection module sends the feature vector it selects to the medium engine group;
the medium engine group comprises a medium deep learning engine group and a medium machine learning engine group, the medium deep learning engine group comprises Q medium deep learning engines for realizing emotion judgment based on a deep learning algorithm model, Q is an integer, the medium machine learning engine group comprises S medium machine learning engines for realizing emotion judgment based on a machine learning algorithm model, and S is an integer, the medium engine group calculates the received feature vector sent by the medium feature selection module based on a corresponding algorithm model, calculates emotion analysis result data of each vocabulary to be subjected to emotion analysis, and sends the calculated emotion analysis result data to the medium emotion tendency judgment module;
the medium emotional tendency judgment module is used for judging whether each emotional analysis result data is correct or not and sending the judgment result data to the rule learning engine group;
the industry classification module acquires text content to be subjected to emotion analysis, judges whether the text content belongs to an industry field or not, and sends the text content to an industry feature dictionary of the corresponding industry field if the text content belongs to the industry field, otherwise, does not send the text content to the industry feature dictionary; the industry fields comprise catering, electronics, automobiles, communication and clothing, and the industry feature dictionary correspondingly comprises a catering field feature dictionary, an electronic field feature dictionary, an automobile field feature dictionary, a communication field feature dictionary and a clothing field feature dictionary;
the industry feature dictionary receives the text content to be subjected to emotion analysis of the corresponding industry field, and the words to be subjected to emotion analysis are generated by a word segmentation module; all the generated words are sent to the X industry feature extraction modules in an industry feature extraction module group, namely a first, a second, a third, up to an Xth industry feature extraction module, where X is an integer; each industry feature extraction module sends the feature vector it extracts to the Y industry feature selection modules in an industry feature selection module group, namely a first, a second, up to a Yth industry feature selection module, where Y is an integer; and each industry feature selection module sends the feature vector it selects to the industry engine group;
the industry engine group comprises an industry deep learning engine group and an industry machine learning engine group, the industry deep learning engine group comprises U industry deep learning engines for realizing emotion judgment based on deep learning algorithm models, U is an integer, the industry machine learning engine group comprises V industry machine learning engines for realizing emotion judgment based on machine learning algorithm models, V is an integer, the industry engine group calculates the received feature vectors sent by the industry feature selection module based on corresponding algorithm models, calculates emotion analysis result data of the vocabularies to be subjected to emotion analysis, and the industry engine group sends the calculated emotion analysis result data to an industry emotion tendency judgment module;
the industry emotional tendency judgment module is used for judging whether each emotional analysis result data is correct or not and sending the judgment result data to the rule learning engine group;
the rule learning engine group counts, according to the received judgment result data sent by the medium emotional tendency judgment module and the industry emotional tendency judgment module, the accuracy with which each of the M × N × (Q + S) and/or X × Y × (U + V) emotion judgment paths judges the emotional tendency of the text content to be subjected to emotion analysis, and matches texts of different medium types and industry fields with the emotion judgment path of highest accuracy; meanwhile, the rule learning engine group trains the existing deep learning algorithm model or machine learning algorithm model online according to the known judgment result data to form a new deep learning algorithm model or machine learning algorithm model, and the new model is added into the medium engine group or the industry engine group and compared in quality with the existing deep learning algorithm model or machine learning algorithm model, so that the iterative upgrade of the deep learning algorithm model or the machine learning algorithm model is realized.
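The path count for one branch can be illustrated with a minimal sketch: every combination of one feature extraction module, one feature selection module and one engine forms a path, so a branch with M extractors, N selectors and Q + S engines has M × N × (Q + S) paths. The function name and the module labels below are hypothetical and serve only as placeholders.

```python
from itertools import product


def enumerate_paths(extractors, selectors, deep_engines, ml_engines):
    """Return every (extractor, selector, engine) combination for one branch."""
    engines = list(deep_engines) + list(ml_engines)
    return list(product(extractors, selectors, engines))


# Hypothetical medium branch with M = 3, N = 2, Q = 2, S = 1.
media_paths = enumerate_paths(
    ["ext1", "ext2", "ext3"],
    ["sel1", "sel2"],
    ["dl1", "dl2"],
    ["ml1"],
)
print(len(media_paths))  # 3 * 2 * (2 + 1) = 18
```

The industry branch would be enumerated the same way with X, Y, U and V.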
Further, when the same vocabulary to be emotion analyzed from the text content is simultaneously sent to the medium classification module and the industry classification module and is judged by the medium emotion tendency judgment module and the industry emotion tendency judgment module, the rule learning engine group adopts the following steps:
S1, the rule learning engine group acquires the judgment result data of the medium emotional tendency judgment module and of the industry emotional tendency judgment module respectively;
S2, judging whether the two judgment result data are consistent; if so, sending the judgment result data to a user at the user side, who can label the text online, and forming user marking data based on the labeled text; if not, performing step S3;
S3, the rule learning engine group notifies an administrator at the background side;
S4, forming administrator marking data based on the text labeled online by the administrator;
S5, the administrator judges whether the industry to which the text content belongs is correct; if so, putting the text into an industry correct text training library and a medium error text training library; if not, performing step S6;
S6, putting the text into an industry error text training library and a medium correct text training library.
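Steps S1 to S6 can be sketched as a small routing function. This is only an illustration under stated assumptions: the class and function names are hypothetical, labels are plain strings, and the check in S5 is approximated by treating the industry result as correct when it matches the administrator's label, which the text does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingLibraries:
    """The four text training libraries named in S5 and S6."""
    industry_correct: list = field(default_factory=list)
    industry_error: list = field(default_factory=list)
    media_correct: list = field(default_factory=list)
    media_error: list = field(default_factory=list)


def reconcile(text, media_label, industry_label, admin_label, libs):
    """Follow S1-S6 for one text judged by both branches.

    Returns "user" when the two results agree (S2) and "administrator" otherwise.
    """
    # S2: consistent results are sent to the user, who may label the text.
    if media_label == industry_label:
        return "user"
    # S3/S4: inconsistent results are escalated; admin_label is the administrator's mark.
    # S5 (assumption): the industry result is correct when it matches that mark.
    if industry_label == admin_label:
        libs.industry_correct.append(text)
        libs.media_error.append(text)
    else:
        # S6: otherwise the medium result is treated as the correct one.
        libs.industry_error.append(text)
        libs.media_correct.append(text)
    return "administrator"
```

The four libraries then provide the labeled samples used when the models are retrained.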
Further, the new deep learning algorithm model or the machine learning algorithm model is compared with the existing deep learning algorithm model or the machine learning algorithm model in terms of quality, and the specific steps for realizing the iterative upgrade of the deep learning algorithm model or the machine learning algorithm model are as follows:
a. constructing a new training test sample, wherein the new training test sample consists of user marking data, administrator marking data and training test sample data extracted from new text contents;
b. training an existing deep learning algorithm model or machine learning algorithm model with the new training test sample to form a new deep learning algorithm model or machine learning algorithm model capable of emotion judgment, and at the same time supplementing the new words and new meanings of old words recognized during training to the corresponding medium feature dictionary or industry feature dictionary;
c. verifying whether the accuracy with which the new deep learning algorithm model or machine learning algorithm model judges the emotional tendency of the new training test sample reaches 85%; if so, adding the new model into the medium engine group or the industry engine group and performing step e; if not, performing step d;
d. abandoning the iteration;
e. comparing the new deep learning algorithm model or machine learning algorithm model with the existing one; if the new model is better, retaining the new model and deleting the existing model; if not, performing step f;
f. retaining the existing deep learning algorithm model or machine learning algorithm model and deleting the new one;
g. repeating steps a to f.
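One pass through steps a to f can be sketched as follows. The sketch makes several assumptions that the text leaves open: a model is any callable mapping a text to a label, `train` is a hypothetical caller-supplied function, and "better" in the comparison step is taken to mean strictly higher accuracy on the same samples. Only the 85% threshold comes from the text.

```python
def accuracy(model, samples):
    """Fraction of (text, label) samples that the model labels correctly."""
    correct = sum(1 for text, label in samples if model(text) == label)
    return correct / len(samples)


def upgrade_once(existing_model, train, samples, threshold=0.85):
    """Run one iteration and return the model that stays in the engine group."""
    new_model = train(existing_model, samples)   # step b: train on the new samples
    new_acc = accuracy(new_model, samples)       # step c: check against the threshold
    if new_acc < threshold:
        return existing_model                    # step d: abandon this iteration
    # Steps e/f (assumption): keep whichever model scores higher on the samples.
    if new_acc > accuracy(existing_model, samples):
        return new_model
    return existing_model


# Toy usage: the old model always answers "pos"; the trained one memorizes the samples.
samples = [("good", "pos"), ("bad", "neg"), ("great", "pos"), ("awful", "neg")]
old = lambda text: "pos"


def toy_train(model, data):
    lookup = dict(data)
    return lambda text: lookup.get(text, model(text))


kept = upgrade_once(old, toy_train, samples)
print(kept("bad"))  # neg
```

Step g corresponds to calling `upgrade_once` again whenever a new training test sample has been assembled.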
Furthermore, a single emotion judgment path comprises the feature extraction module, the feature selection module and the emotion judgment algorithm model on that path. The feature extraction module extracts features from the word segments produced by running the original text through the word segmentation module, and the extracted feature vectors are selected and corrected by the feature selection module and then passed to the emotion judgment algorithm model for training, forming a new emotion judgment algorithm model.
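The three stages of a single path can be sketched as a chain of functions. This is a deliberately simplified illustration, not the trained model the text describes: bag-of-words counting stands in for feature extraction, dictionary filtering for feature selection, and a fixed sum of hypothetical dictionary weights for the emotion judgment algorithm model.

```python
from collections import Counter


def extract_features(tokens):
    """Feature extraction: bag-of-words counts (one hypothetical choice)."""
    return Counter(tokens)


def select_features(features, dictionary):
    """Feature selection: keep only words present in the feature dictionary."""
    return {word: count for word, count in features.items() if word in dictionary}


def judge(features, dictionary):
    """Emotion judgment: sign of the summed dictionary weights."""
    score = sum(dictionary[word] * count for word, count in features.items())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def run_path(tokens, dictionary):
    """Apply extraction, selection and judgment in order to segmented text."""
    return judge(select_features(extract_features(tokens), dictionary), dictionary)


# Hypothetical catering-field dictionary with illustrative weights.
catering_dictionary = {"delicious": 1, "slow": -1, "cold": -1}
print(run_path(["the", "soup", "was", "cold", "and", "slow"], catering_dictionary))  # negative
```

Swapping any one of the three functions for another implementation yields a different path, which is what makes the paths comparable by accuracy.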
Further, in the operation process of the system, the rule learning engine group is further used for recognizing new words and new meanings of old words and supplementing the obtained new words and new meanings of the old words to corresponding medium feature dictionaries or industry feature dictionaries according to the judgment result data sent by the medium emotional tendency judgment module and the industry emotional tendency judgment module.
Further, the machine learning algorithm model comprises a decision tree algorithm model, a regression algorithm model, a clustering algorithm model and an artificial neural network algorithm model.
According to the technical scheme, a medium classification module and an industry classification module are provided. On one hand, different medium feature dictionaries are constructed for different medium types such as comments, news, blogs, microblogs and WeChat; on the other hand, different industry feature dictionaries are constructed for different industry fields such as catering, electronics, clothing, automobiles and communication. Appropriate feature dictionaries, feature extraction modes, feature vector representations, algorithm models and engines are then selected according to the medium type and the industry field to obtain a more accurate judgment of emotional tendency. In the invention, adapting to the industry and adapting to the medium are two mutually independent emotion judgment routes: a text belongs to a particular medium and industry, and feature extraction and emotional tendency identification are carried out along the two routes on the basis of the text's medium and industry characteristics. In addition, the invention introduces a "rule learning engine group", which realizes the following functions: 1) during system operation, counting the accuracy with which each emotion judgment path judges text emotional tendency, and finding the most suitable emotion judgment path for texts of different medium types and industry fields; 2) during system operation, training algorithm models online according to known judgment result data, adding each new algorithm model into an engine group, and letting it compete with the other algorithm models. The autonomous learning in the invention is mainly realized in the rule learning engine group and is divided into two levels:
1) Autonomous optimal selection: selecting an emotion judgment path with the highest prejudgment accuracy rate by counting judgment result data of each emotion judgment path;
2) Self-iterative upgrade: by collecting error correction information from users and administrators, the old model is trained at regular intervals to generate a new model, and the new model is then tested and used to upgrade and replace the old model.
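The first level, autonomous optimal selection, amounts to keeping running accuracy statistics per path and picking the best one for each medium type or industry field. A minimal sketch follows, in which the class name, the string path identifiers and the "context" key are all hypothetical.

```python
from collections import defaultdict


class PathStats:
    """Running accuracy per (context, path); context is a medium type or industry field."""

    def __init__(self):
        # Maps (context, path) to [number correct, number judged].
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, context, path, was_correct):
        """Store one judgment result reported for a path."""
        entry = self._counts[(context, path)]
        entry[0] += int(was_correct)
        entry[1] += 1

    def best_path(self, context):
        """Return the path with the highest observed accuracy, or None if unseen."""
        candidates = {
            path: correct / total
            for (ctx, path), (correct, total) in self._counts.items()
            if ctx == context
        }
        return max(candidates, key=candidates.get) if candidates else None


stats = PathStats()
stats.record("news", "path_a", True)
stats.record("news", "path_a", False)
stats.record("news", "path_b", True)
print(stats.best_path("news"))  # path_b
```

A production system would presumably also require a minimum number of judgments before trusting a path's accuracy; that refinement is omitted here.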
Compared with the prior art, the invention has the following advantages:
1. emotion judgment based on media classification and industry classification;
2. the defect that Internet text introduces a large amount of noise over time and across contexts is overcome;
3. suitable training and test samples are obtained from the mass of Internet data through good user interaction;
4. the optimal emotion judgment path is selected and is automatically and continuously iterated and upgraded, and the high accuracy of emotion judgment is guaranteed.
Drawings
The present invention will be described in further detail with reference to the following detailed description and the accompanying drawings.
FIG. 1 is a general framework of the present invention;
FIG. 2 is a partial block diagram of the present invention;
FIG. 3 is another partial block diagram of the present invention;
FIG. 4 is a flowchart of a method employed by the rule learning engine set when the same vocabulary to be analyzed for emotion is simultaneously transmitted to the media classification module and the industry classification module;
FIG. 5 is a flowchart of a method for implementing iterative upgrade of a deep learning algorithm model or a machine learning algorithm model in the present invention;
FIG. 6 is a schematic diagram of the generation and use of an emotion judgment path in the present invention;
wherein: 1. user side; 2. background side; 3. text emotion judgment system; 4. medium classification module; 5. industry classification module; 6. medium engine group; 61. medium deep learning engine group; 62. medium machine learning engine group; 7. industry engine group; 71. industry deep learning engine group; 72. industry machine learning engine group; 8. rule learning engine group; 9. medium feature dictionary; 10. medium feature extraction module group; 11. medium feature selection module group; 12. medium emotional tendency judgment module; 13. industry feature dictionary; 14. industry feature extraction module group; 15. industry feature selection module group; 16. industry emotional tendency judgment module; 17. emotion judgment path.
Detailed Description
To make the objects, technical solutions and advantages of the present embodiments more clear, the technical solutions in the present embodiments will be clearly and completely described below with reference to the drawings in the present embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the scope of the present protection.
Furthermore, the terms "first", "second", "Mth", "Xth", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first", "second", "Mth", "Xth", etc. may explicitly or implicitly include one or more of the features.
In the present invention, unless otherwise explicitly stated or limited, the terms "mounted," "connected," "fixed," and the like are to be construed broadly, e.g., as being fixedly connected, detachably connected, or integrated; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or may be connected through the use of two elements or the interaction of two elements. The specific meanings of the above terms in the present invention can be understood according to specific situations by those of ordinary skill in the art.
Examples
As shown in fig. 1, an autonomously upgraded and anti-noise text emotion analyzing system includes a user side 1, a background side 2, and a text emotion determining system 3; the text sentiment judgment system 3 comprises a medium classification module 4, an industry classification module 5, a medium engine group 6, an industry engine group 7 and a rule learning engine group 8.
The main classification module 3 acquires the text content to be subjected to emotion analysis and sends it to the medium classification module 4 and/or the industry classification module 5 according to whether the text content comes from a medium or has an industry to which it belongs: if the text content only comes from a medium, it is sent to the medium classification module 4; if it only has an industry to which it belongs, it is sent to the industry classification module 5; and if it both comes from a medium and has an industry to which it belongs, it is sent to the medium classification module 4 and the industry classification module 5 at the same time.
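The dispatch rule in this paragraph can be sketched as a small function. It assumes, purely for illustration, that the medium type and the industry field of a text have already been detected and are passed in as strings; the function name, the set names and the string labels are hypothetical, while the listed medium types and industry fields come from the description.

```python
MEDIA_TYPES = {"comment", "news", "blog", "wechat", "microblog"}
INDUSTRIES = {"catering", "electronics", "automobile", "communication", "clothing"}


def dispatch(media_type=None, industry=None):
    """Return which classification modules should receive the text."""
    targets = []
    if media_type in MEDIA_TYPES:
        targets.append("media_classification_module")
    if industry in INDUSTRIES:
        targets.append("industry_classification_module")
    return targets


print(dispatch("news", "catering"))
print(dispatch(industry="clothing"))
```

A text recognized on both dimensions goes to both modules, which is the case handled later by comparing the two judgment results.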
As shown in fig. 2, the media classification module 4 obtains the text content to be emotion analyzed, determines whether the text content is from a media, and sends the text content to the media feature dictionary 9 of the corresponding media type if the text content is from the media, otherwise, does not send the text content to the media feature dictionary 9; the media types comprise comments, news, blogs, weChat and microblog, and the media feature dictionary 9 correspondingly comprises a comment feature dictionary, a news feature dictionary, a blog feature dictionary, a WeChat feature dictionary and a microblog feature dictionary.
As shown in fig. 2, the medium feature dictionary 9 receives the text content to be subjected to emotion analysis of the corresponding medium type, and the words to be subjected to emotion analysis are generated by a word segmentation module (not shown in the figure); the word segmentation module and its word segmentation technology are prior art. All the generated words are sent to the M medium feature extraction modules in the medium feature extraction module group 10, namely a first, a second, a third, up to an Mth medium feature extraction module, where M is an integer. Each medium feature extraction module sends the feature vector it extracts to the N medium feature selection modules in the medium feature selection module group 11, namely a first, a second, up to an Nth medium feature selection module, where N is an integer, and each medium feature selection module sends the feature vector it selects to the medium engine group 6. In the invention, the first to Mth medium feature extraction modules, and the first to Xth industry feature extraction modules mentioned below, are so named only to distinguish them; in essence each is still a feature extraction module, they may adopt the same or different feature extraction technologies, and those technologies are prior art and are not detailed herein. Similarly, the medium feature selection modules and the industry feature selection modules are in essence still feature selection modules, they may adopt the same or different feature selection technologies, and those technologies are prior art.
As shown in fig. 2, the medium engine group 6 includes a medium deep learning engine group 61 and a medium machine learning engine group 62, the medium deep learning engine group 61 includes Q medium deep learning engines for implementing emotion judgment based on a deep learning algorithm model, Q is an integer, the medium machine learning engine group 62 includes S medium machine learning engines for implementing emotion judgment based on a machine learning algorithm model, S is an integer, the medium engine group 6 calculates the received feature vector sent by the medium feature selection module based on a corresponding algorithm model, calculates emotion analysis result data of each vocabulary to be emotion analyzed, and the medium engine group 6 sends the calculated emotion analysis result data to the medium emotion tendency judgment module 12; the machine learning algorithm model comprises a decision tree algorithm model, a regression algorithm model, a clustering algorithm model and an artificial neural network algorithm model.
It is noted here that, in the invention, the medium deep learning engine group 61 and the industry deep learning engine group 71, the medium machine learning engine group 62 and the industry machine learning engine group 72, the medium deep learning engine and the industry deep learning engine, and the medium machine learning engine and the industry machine learning engine are so named only to distinguish them; in essence they are still deep learning engines and machine learning engines, and groups 61 and 71, and groups 62 and 72, may adopt deep learning engines and machine learning engines of the same or different technologies. A deep learning engine group is a combination of several deep learning engines that perform emotion judgment based on deep learning algorithm models; because there are several deep learning algorithm models, the combination contains different deep learning engines. Likewise, a machine learning engine group is a combination of several machine learning engines that perform emotion judgment based on machine learning algorithm models; because there are several machine learning algorithm models, the combination contains different machine learning engines. Each emotion judgment path requires one deep learning engine or one machine learning engine.
The medium emotional tendency judgment module 12 is configured to judge whether each piece of emotion analysis result data is correct, and send the judgment result data to the rule learning engine group 8.
As shown in fig. 3, the industry classification module 5 acquires text content to be emotion analyzed, determines whether there is an affiliated industry field, and sends the text content to the industry feature dictionary 13 of the corresponding industry field if the text content is an affiliated industry field, otherwise, does not send the text content to the industry feature dictionary 13; the industry field comprises catering, electronics, automobiles, communication and clothing, and the industry feature dictionary 13 correspondingly comprises a catering field feature dictionary, an electronics field feature dictionary, an automobile field feature dictionary, a communication field feature dictionary and a clothing field feature dictionary.
As shown in fig. 3, the industry feature dictionary 13 receives text content to be emotion analyzed in a corresponding industry field, words to be emotion analyzed are generated through a word segmentation module, the generated words to be emotion analyzed are all sent to X industry feature extraction modules in an industry feature extraction module group 14, the industry feature extraction modules include a first industry feature extraction module, a second industry feature extraction module, a third industry feature extraction module to an X-th industry feature extraction module, X is an integer, each industry feature extraction module sends a feature vector extracted by each industry feature extraction module to Y industry feature selection modules in an industry feature selection module group 15, each industry feature selection module includes a first industry feature selection module, a second industry feature selection module to a Y-th industry feature selection module, Y is an integer, and each industry feature selection module sends a feature vector selected by each industry feature selection module to the industry engine group 7.
As shown in fig. 3, the industry engine group 7 comprises an industry deep learning engine group 71 and an industry machine learning engine group 72. The industry deep learning engine group 71 comprises U industry deep learning engines that implement emotion judgment based on a deep learning algorithm model, where U is an integer; the industry machine learning engine group 72 comprises V industry machine learning engines that implement emotion judgment based on a machine learning algorithm model, where V is an integer. Based on the corresponding algorithm model, the industry engine group 7 processes the feature vectors received from the industry feature selection modules, calculates emotion analysis result data for each vocabulary item to be emotion analyzed, and sends the calculated emotion analysis result data to the industry emotional tendency judgment module 16.
The industry emotional tendency judgment module 16 is configured to judge whether each piece of emotion analysis result data is correct, and send the judgment result data to the rule learning engine group 8.
The rule learning engine group 8 uses the judgment result data received from the medium emotional tendency judgment module 12 and the industry emotional tendency judgment module 16 to count the accuracy with which each of the M × N × (Q + S) and/or X × Y × (U + V) emotion judgment paths 17 judges the emotional tendency of the text content to be emotion analyzed, and thereby obtains, for each medium type and industry field, the emotion judgment path 17 with the highest accuracy for matching texts. Fig. 6 shows the generation and use of one emotion judgment path in the system, taking a regression algorithm model as an example. A single emotion judgment path comprises a feature extraction module, a feature selection module and an emotion judgment algorithm model. The feature extraction module extracts features from the segmented words that the word segmentation module produces from the original text; the extracted feature vector is selected and corrected by the feature selection module and then passed to the emotion judgment algorithm model for training, forming a new emotion judgment algorithm model.
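The path counting and accuracy statistics described above can be illustrated with a minimal Python sketch. The module names (`tfidf`, `chi2`, `cnn` and so on), the record layout `(category, path, is_correct)` and both function names are hypothetical and chosen only for illustration; the patent does not specify how the statistics are stored.

```python
from collections import defaultdict
from itertools import product


def enumerate_paths(extractors, selectors, engines):
    """Every (extractor, selector, engine) triple is one emotion judgment path."""
    return list(product(extractors, selectors, engines))


def best_path_per_category(judgments):
    """Return {category: (path, accuracy)} for the most accurate path per category.

    judgments is an iterable of (category, path, is_correct) records, where
    category stands for a medium type or an industry field.
    """
    counts = defaultdict(lambda: [0, 0])  # (category, path) -> [correct, total]
    for category, path, is_correct in judgments:
        tally = counts[(category, path)]
        tally[0] += int(is_correct)
        tally[1] += 1
    best = {}
    for (category, path), (correct, total) in counts.items():
        accuracy = correct / total
        if category not in best or accuracy > best[category][1]:
            best[category] = (path, accuracy)
    return best


# Hypothetical sizes: M = 2 extractors, N = 2 selectors, Q + S = 3 engines.
paths = enumerate_paths(["tfidf", "ngram"], ["chi2", "mi"], ["cnn", "tree", "logreg"])
print(len(paths))  # 2 * 2 * 3 = 12

p0, p1 = paths[0], paths[1]
records = [
    ("news", p0, True),
    ("news", p0, False),
    ("news", p1, True),
    ("blog", p0, True),
]
best = best_path_per_category(records)
```

In this sketch the best path is chosen purely by observed accuracy; a real deployment would probably also require a minimum number of judged samples per path before trusting the figure.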
The rule learning engine group 8 trains the existing deep learning algorithm model or machine learning algorithm model online according to the known judgment result data to form a new deep learning algorithm model or machine learning algorithm model, adds the new model to the medium engine group 6 or the industry engine group 7, and compares its quality with that of the existing deep learning algorithm model or machine learning algorithm model, thereby realizing the iterative upgrade of the deep learning algorithm model or machine learning algorithm model. Taking a decision tree algorithm model as an example, the online training process of the algorithm model is as follows:
1) Collecting all manually marked sample data including marking data of a user and marking data of an administrator, and splitting the sample data into a training set and a testing set according to the proportion of 2;
2) And calculating the information entropy of the feature vocabulary A in the training set D according to the following formula:
$$H(A) = -\sum_{i=1}^{n} p_i \log p_i$$
where P(X = A) = p_i, i = 1, 2, 3, ..., n, and p_i is the probability of the feature vocabulary A;
3) The information gain G (D, A) of the feature vocabulary A to the training set D is calculated according to the following formula:
G(D,A)=H(D)-H(D|A)
where H(D) is the empirical entropy of the training set D, and H(D|A) is the empirical conditional entropy of the training set D given the feature vocabulary A;
4) Using the information gain G(D, A) calculated in the previous step, generating a new decision tree algorithm model with a decision tree algorithm such as ID3 or CART, based on the training set D and a set E (an array, each element of which is a threshold value);
5) Testing the newly generated decision tree by using a test set, and deploying the decision tree to a machine learning engine group when the accuracy is higher than 85%;
6) In the production environment, the newly generated decision tree algorithm model competes with the old decision tree algorithm model, and the better of the two is retained; it also competes with other algorithm models, such as random forests, to win the right to decide the emotion of more texts.
This process is done completely online. It is conceivable that as the sample data increases, each algorithm model will fit the real production environment more and more, and their determination accuracy will be higher and higher.
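Steps 2) to 5) above can be sketched in Python as follows. This is only an illustration under stated assumptions: each sample is modeled as a set of words, the feature A is the presence or absence of one word, the logarithm is taken to base 2, and `epsilon` stands in for one threshold from the set E. None of these choices, nor the function names, are specified in the patent.

```python
import math
from collections import Counter


def entropy(labels):
    """Empirical entropy H(D) = -sum(p_i * log2(p_i)) over label frequencies."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(samples, labels, word):
    """G(D, A) = H(D) - H(D|A), where A is the presence or absence of `word`."""
    with_word = [y for x, y in zip(samples, labels) if word in x]
    without = [y for x, y in zip(samples, labels) if word not in x]
    conditional = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_word, without)
        if part
    )
    return entropy(labels) - conditional


def best_split_word(samples, labels, vocabulary, epsilon=0.0):
    """ID3-style choice: the word with the largest gain, or None if gain <= epsilon."""
    gains = {w: information_gain(samples, labels, w) for w in vocabulary}
    word = max(gains, key=gains.get)
    return word if gains[word] > epsilon else None


def should_deploy(accuracy, threshold=0.85):
    """Step 5: deploy only when the test accuracy is higher than the threshold."""
    return accuracy > threshold


# Toy training set D: four segmented texts with manually marked labels.
samples = [{"great", "food"}, {"great", "phone"}, {"bad", "food"}, {"bad", "phone"}]
labels = ["pos", "pos", "neg", "neg"]
root = best_split_word(samples, labels, ["great", "food"])
print(root)  # great
```

Here the word "great" separates the two classes perfectly, so its gain equals H(D), while "food" appears equally in both classes and has zero gain. A full ID3 or CART implementation would apply this selection recursively to build the tree.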
As shown in fig. 5, the new deep learning algorithm model or machine learning algorithm model is compared with the existing deep learning algorithm model or machine learning algorithm model in terms of quality, and the specific steps for realizing the iterative upgrade of the deep learning algorithm model or machine learning algorithm model are as follows:
a. constructing a new training test sample, wherein the new training test sample consists of user marking data, administrator marking data and training test sample data extracted from new text contents;
b. training the existing deep learning algorithm model or machine learning algorithm model by using the new training test sample to form a new deep learning algorithm model or machine learning algorithm model capable of realizing emotion judgment, and simultaneously supplementing the new words and the new meanings of old words recognized in the training process to the corresponding medium feature dictionary 9 or industry feature dictionary 13;
c. verifying whether the judgment accuracy rate of the new deep learning algorithm model or the machine learning algorithm model on the emotional tendency of the new training test sample reaches 85%, and if the judgment accuracy rate reaches the standard, adding the new deep learning algorithm model or the machine learning algorithm model into the medium engine group 6 or the industry engine group 7; if the standard is not met, performing step d;
d. abandoning the iteration;
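e. judging whether the new deep learning algorithm model or machine learning algorithm model is better than the existing deep learning algorithm model or machine learning algorithm model; if so, retaining the new deep learning algorithm model or machine learning algorithm model and deleting the existing deep learning algorithm model or machine learning algorithm model at the same time, and if not, performing step f;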
f. the existing deep learning algorithm model or machine learning algorithm model is reserved, and the new deep learning algorithm model or machine learning algorithm model is deleted;
g. and repeating the steps a, b, c, d, e and f.
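The cycle in steps a to g can be sketched in Python as a simple comparison loop. The `Model` class, both function names and the accuracy figures are hypothetical. The sketch also assumes that "better" in step e means a strictly higher judgment accuracy and that ties keep the existing model; the text does not state the comparison criterion.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    accuracy: float  # judgment accuracy measured on the new training test sample


def upgrade_once(existing: Model, candidate: Model, threshold: float = 0.85) -> Model:
    """Run steps c to f once and return the model that stays in the engine group."""
    # Steps c and d: a candidate that does not reach the threshold is abandoned.
    if candidate.accuracy < threshold:
        return existing
    # Steps e and f: keep whichever model judges more accurately.
    # Assumption: strictly higher accuracy wins; ties keep the existing model.
    return candidate if candidate.accuracy > existing.accuracy else existing


def run_iterations(initial: Model, candidates, threshold: float = 0.85) -> Model:
    """Step g: repeat the cycle for each newly trained candidate model."""
    current = initial
    for candidate in candidates:
        current = upgrade_once(current, candidate, threshold)
    return current


champion = run_iterations(
    Model("tree-v1", 0.86),
    [Model("tree-v2", 0.80), Model("tree-v3", 0.90), Model("tree-v4", 0.88)],
)
print(champion.name)  # tree-v3
```

Because a candidate replaces the incumbent only when it is strictly better, the retained model's accuracy never decreases from one iteration to the next.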
In the invention, the iterative upgrade of new and old algorithm models is carried out in the production environment through a survival-of-the-fittest mechanism, which ensures that each emotion judgment path only moves forward; in other words, the emotion analysis system becomes steadily smarter and its emotion judgments become more accurate.
The massive data of the Internet guarantees the training and test sample data that the method requires. Because the method is implemented on Spark, the approach can run on a parallel computing cluster over massive data, can effectively resist the equally massive noise on the Internet, and yields an emotion analysis system capable of evolving online by itself.
As shown in fig. 4, when the same vocabulary to be emotion analyzed from the text content is simultaneously sent to the medium classification module 4 and the industry classification module 5 and judged by the medium emotion tendency judgment module 12 and the industry emotion tendency judgment module 16, the rule learning engine group 8 adopts the following steps:
S1, the rule learning engine group 8 respectively acquires the judgment result data of the medium emotional tendency judgment module 12 and the industry emotional tendency judgment module 16;
S2, judging whether the two pieces of judgment result data are consistent; if so, sending the judgment result data to a user at the user side 1, who labels the text online, and forming user marking data based on the labeled text; if not, performing step S3;
S3, the rule learning engine group 8 notifies an administrator at the background end 2;
S4, forming administrator marking data based on the text labeled online by the administrator;
S5, the administrator judges whether the industry to which the text content belongs is correct; if so, putting the text into an industry correct text training library and a medium error text training library; if not, performing step S6;
S6, putting the text into an industry type error text training library and a medium type correct text training library.
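Steps S1 to S6 amount to a small routing procedure, sketched below in Python. The function name, the callback parameters `user_label` and `admin_review`, and the library keys are hypothetical stand-ins for the user side, the background end and the training libraries; they are not taken from the patent.

```python
from collections import defaultdict


def route_judgment(text, medium_result, industry_result, user_label, admin_review, libraries):
    """Apply steps S1 to S6 to one text and return which party labeled it."""
    # S2: consistent results go to the user, whose label becomes user marking data.
    if medium_result == industry_result:
        libraries["user_marked"].append((text, user_label(text)))
        return "user"
    # S3 and S4: on disagreement the administrator is notified and labels the text.
    label, industry_is_correct = admin_review(text)
    libraries["admin_marked"].append((text, label))
    # S5 and S6: file the text according to which side the administrator confirmed.
    if industry_is_correct:
        libraries["industry_correct"].append(text)
        libraries["medium_error"].append(text)
    else:
        libraries["industry_error"].append(text)
        libraries["medium_correct"].append(text)
    return "administrator"


libraries = defaultdict(list)
first = route_judgment(
    "The noodles were cold", "negative", "negative",
    lambda t: "negative", lambda t: ("negative", True), libraries,
)
second = route_judgment(
    "Battery lasts forever", "positive", "negative",
    lambda t: "positive", lambda t: ("positive", True), libraries,
)
print(first, second)  # user administrator
```

The two marked-data lists then feed the new training test sample, while the four correct and error libraries record which side of the system misjudged each text.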
In addition, in the operation process of the system, the rule learning engine group 8 is further configured to recognize new words and new meanings of old words according to the determination result data sent by the medium emotional tendency determination module 12 and the industry emotional tendency determination module 16, and supplement the obtained new words and new meanings of old words to the corresponding medium feature dictionary 9 or industry feature dictionary 13.
The above description is provided by way of example only to aid understanding of the present invention, and is not intended to limit the present invention. For a person skilled in the art, several simple deductions, modifications or substitutions may be made according to the present idea.

Claims (6)

1. An autonomously upgraded and anti-noise text emotion analysis system is characterized by comprising a user side, a background side and a text emotion judgment system; the text emotion judging system comprises a medium classification module, an industry classification module, a medium engine group, an industry engine group and a rule learning engine group;
the media classification module acquires text content to be subjected to emotion analysis, judges whether the text content is from a media or not, and sends the text content to a media feature dictionary of a corresponding media type if the text content is from the media, or does not send the text content to the media feature dictionary if the text content is not from the media; the media type comprises comments, news, blogs, WeChat and microblogs, and the media feature dictionary correspondingly comprises a comment feature dictionary, a news feature dictionary, a blog feature dictionary, a WeChat feature dictionary and a microblog feature dictionary;
the media feature dictionary receives text content to be subjected to emotion analysis of the corresponding media type, words to be subjected to emotion analysis are generated through a word segmentation module, the generated words to be subjected to emotion analysis are all sent to M media feature extraction modules in a media feature extraction module group, the media feature extraction modules comprise a first media feature extraction module, a second media feature extraction module, a third media feature extraction module to an Mth media feature extraction module, M is an integer, each media feature extraction module sends the feature vector extracted by the media feature extraction module to N media feature selection modules in a media feature selection module group, the media feature selection modules comprise a first media feature selection module, a second media feature selection module to an Nth media feature selection module, N is an integer, and each media feature selection module sends the feature vector selected by the media feature selection module to the medium engine group;
the medium engine group comprises a medium deep learning engine group and a medium machine learning engine group, the medium deep learning engine group comprises Q medium deep learning engines for realizing emotion judgment based on a deep learning algorithm model, Q is an integer, the medium machine learning engine group comprises S medium machine learning engines for realizing emotion judgment based on a machine learning algorithm model, and S is an integer, the medium engine group calculates the received feature vector sent by the medium feature selection module based on the corresponding algorithm model, calculates emotion analysis result data of each vocabulary to be subjected to emotion analysis, and sends the calculated emotion analysis result data to the medium emotion tendency judgment module;
the medium emotional tendency judgment module is used for judging whether each piece of emotional analysis result data is correct or not and sending the judgment result data to the rule learning engine group;
the industry classification module acquires text content to be subjected to emotion analysis, judges whether the text content belongs to an industry field or not, and sends the text content to an industry feature dictionary of the corresponding industry field if the text content belongs to the industry field, otherwise, does not send the text content to the industry feature dictionary; the industry fields comprise catering, electronics, automobiles, communication and clothing, and the industry feature dictionary correspondingly comprises a catering field feature dictionary, an electronic field feature dictionary, an automobile field feature dictionary, a communication field feature dictionary and a clothing field feature dictionary;
the industry feature dictionary receives text content to be subjected to emotion analysis in the corresponding industry field, vocabularies to be subjected to emotion analysis are generated through a word segmentation module, the generated vocabularies to be subjected to emotion analysis are all sent to X industry feature extraction modules in an industry feature extraction module group, the industry feature extraction modules comprise a first industry feature extraction module, a second industry feature extraction module, a third industry feature extraction module to an Xth industry feature extraction module, X is an integer, each industry feature extraction module sends the feature vector extracted by the industry feature extraction module to Y industry feature selection modules in an industry feature selection module group, the industry feature selection modules comprise a first industry feature selection module, a second industry feature selection module to a Yth industry feature selection module, Y is an integer, and each industry feature selection module sends the feature vector selected by the industry feature selection module to the industry engine group;
the industry engine group comprises an industry deep learning engine group and an industry machine learning engine group, the industry deep learning engine group comprises U industry deep learning engines for realizing emotion judgment based on a deep learning algorithm model, U is an integer, the industry machine learning engine group comprises V industry machine learning engines for realizing emotion judgment based on a machine learning algorithm model, and V is an integer, the industry engine group calculates the received feature vector sent by the industry feature selection module based on a corresponding algorithm model, calculates emotion analysis result data of each vocabulary to be subjected to emotion analysis, and sends the calculated emotion analysis result data to an industry emotion tendency judgment module;
the industry emotional tendency judgment module is used for judging whether each emotional analysis result data is correct or not and sending the judgment result data to the rule learning engine group;
the rule learning engine group counts, according to the received judgment result data sent by the medium emotional tendency judgment module and the industry emotional tendency judgment module, the accuracy with which the M × N × (Q + S) and/or X × Y × (U + V) emotion judgment paths judge the emotional tendency of the text content to be subjected to emotion analysis, and obtains the emotion judgment paths with the highest matching accuracy for texts in different medium types and industry fields; meanwhile, the rule learning engine group trains the existing deep learning algorithm model or machine learning algorithm model online according to the known judgment result data to form a new deep learning algorithm model or machine learning algorithm model, and the new deep learning algorithm model or machine learning algorithm model is added into the medium engine group or the industry engine group to be compared in quality with the existing deep learning algorithm model or machine learning algorithm model, so that the iterative upgrade of the deep learning algorithm model or the machine learning algorithm model is realized.
2. The system of claim 1, wherein when the same vocabulary to be emotion analyzed from the text content is simultaneously transmitted to the media classification module and the industry classification module and judged by the media emotion tendency judgment module and the industry emotion tendency judgment module, the rule learning engine set comprises the following steps:
S1, the rule learning engine group respectively acquires the judgment result data of the medium emotional tendency judgment module and the industry emotional tendency judgment module;
S2, judging whether the two pieces of judgment result data are consistent; if so, sending the judgment result data to a user at the user side, who labels the text online, and forming user marking data based on the labeled text; if not, performing step S3;
S3, the rule learning engine group notifies an administrator at the background end;
S4, forming administrator marking data based on the text labeled online by the administrator;
S5, the administrator judges whether the industry to which the text content belongs is correct; if so, putting the text into an industry correct text training library and a medium error text training library; if not, performing step S6;
S6, putting the text into an industry type error text training library and a medium type correct text training library.
3. The system for autonomously upgrading and anti-noise text sentiment analysis of claim 2, wherein the new deep learning algorithm model or the machine learning algorithm model is compared with the existing deep learning algorithm model or the machine learning algorithm model in terms of quality, and the steps of the iterative upgrading of the deep learning algorithm model or the machine learning algorithm model are as follows:
a. constructing a new training test sample, wherein the new training test sample consists of user marking data, administrator marking data and training test sample data extracted from new text contents;
b. training the existing deep learning algorithm model or machine learning algorithm model by using the new training test sample to form a new deep learning algorithm model or machine learning algorithm model capable of realizing emotion judgment, and simultaneously supplementing the new words and the new meanings of old words recognized in the training process to the corresponding medium feature dictionary or industry feature dictionary;
c. verifying whether the judgment accuracy rate of the new deep learning algorithm model or the machine learning algorithm model on the emotional tendency of the new training test sample reaches 85%, and if the judgment accuracy rate reaches the standard, adding the new deep learning algorithm model or the machine learning algorithm model into a medium engine group or an industry engine group; if the standard is not met, performing step d;
d. abandoning the iteration;
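e. judging whether the new deep learning algorithm model or machine learning algorithm model is better than the existing deep learning algorithm model or machine learning algorithm model; if so, retaining the new deep learning algorithm model or machine learning algorithm model and deleting the existing deep learning algorithm model or machine learning algorithm model at the same time, and if not, performing step f;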
f. the existing deep learning algorithm model or machine learning algorithm model is reserved, and the new deep learning algorithm model or machine learning algorithm model is deleted;
g. and repeating the steps a, b, c, d, e and f.
4. The system of claim 1, wherein a single emotion judgment path comprises a feature extraction module, a feature selection module and an emotion judgment algorithm model, the feature extraction module performs feature extraction on the segmented words that the word segmentation module produces from the original text by word segmentation technology, and the extracted feature vectors are selected and corrected by the feature selection module and then transmitted to the emotion judgment algorithm model for training to form a new emotion judgment algorithm model.
5. An autonomously upgraded and noise-resistant text emotion analysis system as claimed in any one of claims 1 to 4, wherein during the operation of the system, the rule learning engine group is further used for recognizing new words and new meanings of old words according to the judgment result data sent by the medium emotional tendency judgment module and the industry emotional tendency judgment module, and supplementing the obtained new words and new meanings of old words to the corresponding media feature dictionary or industry feature dictionary.
6. An autonomously upgraded and noise immune text emotion analysis system according to claim 5, wherein said machine learning algorithm model includes a decision tree algorithm model, a regression algorithm model, a clustering algorithm model, an artificial neural network algorithm model.
CN201810930606.3A 2018-08-15 2018-08-15 Text emotion analysis system capable of achieving automatic upgrading and resisting noise Active CN109165298B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810930606.3A CN109165298B (en) 2018-08-15 2018-08-15 Text emotion analysis system capable of achieving automatic upgrading and resisting noise

Publications (2)

Publication Number Publication Date
CN109165298A CN109165298A (en) 2019-01-08
CN109165298B true CN109165298B (en) 2022-11-15

Family

ID=64895868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810930606.3A Active CN109165298B (en) 2018-08-15 2018-08-15 Text emotion analysis system capable of achieving automatic upgrading and resisting noise

Country Status (1)

Country Link
CN (1) CN109165298B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021056127A1 (en) * 2019-09-23 2021-04-01 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for analyzing sentiment
CN110888983B (en) * 2019-11-26 2022-07-15 厦门市美亚柏科信息股份有限公司 Positive and negative emotion analysis method, terminal equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103123620A (en) * 2012-12-11 2013-05-29 中国互联网新闻中心 Web text sentiment analysis method based on propositional logic
CN104281694A (en) * 2014-10-13 2015-01-14 安徽华贞信息科技有限公司 Analysis system of emotional tendency of text
CN105335352A (en) * 2015-11-30 2016-02-17 武汉大学 Entity identification method based on Weibo emotion
CN107291696A (en) * 2017-06-28 2017-10-24 达而观信息科技(上海)有限公司 A kind of comment word sentiment analysis method and system based on deep learning
CN107945033A (en) * 2017-11-14 2018-04-20 李勇 A kind of analysis method of network public-opinion, system and relevant apparatus

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8412530B2 (en) * 2010-02-21 2013-04-02 Nice Systems Ltd. Method and apparatus for detection of sentiment in automated transcriptions
WO2014183089A1 (en) * 2013-05-09 2014-11-13 Metavana, Inc. Hybrid human machine learning system and method
US20150199609A1 (en) * 2013-12-20 2015-07-16 Xurmo Technologies Pvt. Ltd Self-learning system for determining the sentiment conveyed by an input text
US10664759B2 (en) * 2014-10-23 2020-05-26 Fair Isaac Corporation Dynamic business rule creation using scored sentiments

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
一种分层多算法集成的微博情感分类方法;左荣欣;《电子世界》;20140915(第17期);全文 *
基于网络舆情倾向性分析的机器学习方法研究;汪淳等;《智能计算机与应用》;20170428(第02期);全文 *
面向互联网评论情感分析的中文主观性自动判别方法研究;叶强等;《信息系统学报》;20071115(第01期);全文 *

Also Published As

Publication number Publication date
CN109165298A (en) 2019-01-08

Similar Documents

Publication Publication Date Title
Shenoy et al. Multilogue-net: A context aware rnn for multi-modal emotion detection and sentiment analysis in conversation
CN107832663B (en) Multi-modal emotion analysis method based on quantum theory
CN107608956B (en) Reader emotion distribution prediction algorithm based on CNN-GRNN
US20190371299A1 (en) Question Answering Method and Apparatus
CN105069072B (en) Hybrid subscriber score information based on sentiment analysis recommends method and its recommendation apparatus
CN109670039B (en) Semi-supervised e-commerce comment emotion analysis method based on three-part graph and cluster analysis
CN107862087B (en) Emotion analysis method and device based on big data and deep learning and storage medium
CN110059183B (en) Automobile industry user viewpoint emotion classification method based on big data
CN107833059B (en) Service quality evaluation method and system for customer service
CN108304479B (en) Quick density clustering double-layer network recommendation method based on graph structure filtering
CN113434628B (en) Comment text confidence detection method based on feature level and propagation relation network
CN108009297B (en) Text emotion analysis method and system based on natural language processing
CN107103093B (en) Short text recommendation method and device based on user behavior and emotion analysis
CN111447574B (en) Short message classification method, device, system and storage medium
CN109165298B (en) Text emotion analysis system capable of achieving automatic upgrading and resisting noise
CN115309860B (en) False news detection method based on pseudo twin network
CN107766560B (en) Method and system for evaluating customer service flow
CN113255366A (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN116091836A (en) Multi-mode visual language understanding and positioning method, device, terminal and medium
Tellamekala et al. COLD fusion: Calibrated and ordinal latent distribution fusion for uncertainty-aware multimodal emotion recognition
CN113486174B (en) Model training, reading understanding method and device, electronic equipment and storage medium
CN111611375B (en) Text emotion classification method based on deep learning and turning relation
CN111523311B (en) Search intention recognition method and device
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
TWI734085B (en) Dialogue system using intention detection ensemble learning and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220927

Address after: 201100 5th and 6th floor, 380 Xinsong Road, Minhang District, Shanghai

Applicant after: Shanghai WuJie Data Technology Co.,Ltd.

Address before: Room 1449, No. 4999, Zhongchun Road, Minhang District, Shanghai, 201100

Applicant before: SHANGHAI WENJUN INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant