CN110837561A - Text analysis method, text analysis device and storage medium - Google Patents

Publication number
CN110837561A
Authority
CN
China
Prior art keywords
text
tendency
emotion
emotional
emotional tendency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911126259.XA
Other languages
Chinese (zh)
Inventor
陈汝龙
戴敏
陈誉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Long Mobile Network Technology Co Ltd
Original Assignee
Suzhou Long Mobile Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Long Mobile Network Technology Co Ltd filed Critical Suzhou Long Mobile Network Technology Co Ltd
Priority to CN201911126259.XA priority Critical patent/CN110837561A/en
Publication of CN110837561A publication Critical patent/CN110837561A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data

Abstract

The invention discloses a text analysis method comprising: analyzing a text with a plurality of emotional tendency analysis models to obtain a plurality of emotional tendency results for the text; and combining the plurality of emotional tendency results according to the weights of the respective emotional tendency analysis models to obtain the final emotional tendency result of the text. Compared with the prior art, the text analysis method provided by the invention analyzes the text with a plurality of emotional tendency models and combines the analysis results by weight, so that the emotional tendency of the text can be analyzed stably and with high accuracy, and the result matches a typical reader's first impression of the sentiment of a news item.

Description

Text analysis method, text analysis device and storage medium
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a text analysis method, a text analysis device, and a storage medium.
Background
Emotional tendency analysis examines the subjective information in text content, mines the viewpoints and attitudes the text expresses, and qualitatively determines whether the attitude conveyed by the text is a positive emotion, a negative emotion, or a neutral emotion.
A common approach to text emotion analysis is to vectorize the text and then train a model on the resulting vectors; typical algorithms include SVM, Random Forest, XGBoost, AdaBoost, and GBDT.
However, when these models are used for emotional tendency analysis, the accuracy of the result is low for types of text that are not covered by the data set used to train the model.
Disclosure of Invention
The invention aims to provide a text analysis method, a text analysis device, and a storage medium.
In order to achieve one of the above objects, an embodiment of the present invention provides a method for analyzing a text, including:
analyzing the text with a plurality of emotional tendency analysis models to obtain a plurality of emotional tendency results for the text;
and combining the plurality of emotional tendency results according to the weights of the respective emotional tendency analysis models to obtain the final emotional tendency result of the text.
As a further improvement of an embodiment of the present invention, the weight of each emotional tendency analysis model is determined by the simulated test accuracy of that model.
As a further improvement of an embodiment of the present invention, the weights of the plurality of emotional tendency analysis models are adjusted according to the actual test accuracy of each emotional tendency analysis model.
As a further improvement of an embodiment of the present invention, the title and the body of the text are each analyzed with a plurality of emotional tendency analysis models, so as to obtain a plurality of emotional tendency results for the title and a plurality of emotional tendency results for the body;
the plurality of emotional tendency results for the title and the plurality of emotional tendency results for the body are each combined according to the weights of the respective emotional tendency analysis models to obtain a final emotional tendency result of the title and a final emotional tendency result of the body;
and the final emotional tendency result of the title and the final emotional tendency result of the body are combined according to the emotional tendency weights of the title and the body to obtain the emotional tendency result of the text.
As a further improvement of an embodiment of the present invention, a weight of the emotional tendency of the title is greater than a weight of the emotional tendency of the body.
As a further improvement of an embodiment of the present invention, the ratio of the weight of the emotional tendency of the title to the weight of the emotional tendency of the body is 7:3.
As a further improvement of an embodiment of the present invention, when the length of the body exceeds a predetermined length threshold, the weight of the body is adjusted downward while the weight of the title is adjusted upward.
As a further improvement of an embodiment of the present invention, when the length of the body exceeds a predetermined length threshold, the part of the body exceeding the predetermined length threshold is discarded, and only the emotional tendency of the remaining part of the body is analyzed.
In order to achieve one of the above objects, an embodiment of the present invention provides an electronic device, which includes a memory and a processor, wherein the memory stores a computer program operable on the processor, and the processor implements the steps in the text analysis method according to any one of the above items when executing the program.
To achieve one of the above objects, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps in the text analysis method according to any one of the above items.
Compared with the prior art, the text analysis method provided by the invention analyzes the text with a plurality of emotional tendency models and combines the analysis results by weight, so that the emotional tendency of the text can be analyzed stably and with high accuracy, and the result matches a typical reader's first impression of the sentiment of a news item.
Drawings
Fig. 1 is a flowchart illustrating a text analysis method according to a first embodiment of the present invention.
Fig. 2 is a flowchart illustrating a text analysis method according to a second embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to specific embodiments shown in the drawings. These embodiments are not intended to limit the present invention, and structural, methodological, or functional changes made by those skilled in the art according to these embodiments are included in the scope of the present invention.
As shown in Fig. 1, the text analysis method according to the first embodiment of the present invention is mainly used for emotional tendency analysis of a text, and the method includes:
Step S110: analyzing the text with a plurality of emotional tendency analysis models to obtain a plurality of emotional tendency results for the text.
Given the same data set, different emotion analysis models learn from different angles, so the result obtained from a single model is unstable: its accuracy is high for one type of text but low for another.
The text is therefore analyzed with a plurality of (at least two) emotional tendency analysis models to obtain a plurality of emotional tendency results for the text. Analyzing the text from multiple angles improves the stability and accuracy of the analysis result.
In a preferred embodiment, the invention uses five emotional tendency analysis models, namely SVM, Random Forest, XGBoost, AdaBoost, and GBDT, to perform emotional tendency analysis on the text and obtain five emotional tendency results for the text. SVM (Support Vector Machine) is a binary classification model. In the field of machine learning it is a supervised learning model typically used for pattern recognition, classification, and regression analysis. Its main idea is to find a hyperplane in space that separates the data samples while maximizing the margin, that is, the distance from the hyperplane to the nearest samples.
Random Forest (RF) is an algorithm that combines multiple trees following the idea of ensemble learning. Its basic unit is the decision tree, and it belongs to a major branch of machine learning, namely ensemble learning.
XGBoost (eXtreme Gradient Boosting) is a gradient boosting algorithm built on residual decision trees. Its basic idea is to add trees to the model one at a time, so that each added CART decision tree improves the overall result (reduces the objective function). A plurality of decision trees (individually weak classifiers) form a combined classifier, and each leaf node is given a certain weight.
AdaBoost (Adaptive Boosting) is an iterative algorithm whose core idea is to train different classifiers (weak classifiers) on the same training set and then combine these weak classifiers into a stronger final classifier (strong classifier).
GBDT (Gradient Boosting Decision Tree) is a decision tree model based on the ensemble idea and is essentially based on residual learning.
Step S120: combining the plurality of emotional tendency results according to the weights of the respective emotional tendency analysis models to obtain the final emotional tendency result of the text.
Each emotional tendency analysis model has a weight. Combining the plurality of emotional tendency results according to the weights of the models means computing the weighted average of those results, which gives the final emotional tendency result of the text.
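The weighted average in step S120 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `combine_model_results`, the model names, and the sample scores and weights are all hypothetical, and each result is assumed to be a (negative value, positive value) pair.

```python
from typing import Dict, Tuple

Score = Tuple[float, float]  # (negative value, positive value)


def combine_model_results(results: Dict[str, Score],
                          weights: Dict[str, float]) -> Score:
    """Return the weighted average of per-model (negative, positive) scores."""
    total = sum(weights[name] for name in results)
    if total <= 0:
        raise ValueError("model weights must sum to a positive number")
    neg = sum(weights[name] * results[name][0] for name in results) / total
    pos = sum(weights[name] * results[name][1] for name in results) / total
    return neg, pos


# Hypothetical inputs: two models with illustrative scores and weights.
results = {"model_a": (0.8, 0.2), "model_b": (0.4, 0.6)}
weights = {"model_a": 3.0, "model_b": 1.0}
final = combine_model_results(results, weights)
print(final)
```

Dividing by the sum of the weights means the weights do not have to be normalized beforehand, and the combined negative and positive values still sum to 1 whenever every input pair does.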
In this text analysis method, the text is analyzed with a plurality of emotional tendency models and the analysis results are combined by weight, so that the emotional tendency of the text can be analyzed stably and with high accuracy, and the result matches a typical reader's first impression of the sentiment of a news item.
In a preferred embodiment, the weight of each emotional tendency analysis model is determined by its test accuracy (simulated test accuracy) on the same test set. For example, when five models are tested on the same test set, a model with high test accuracy receives a high weight, whereas a model with low test accuracy receives a low weight.
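One simple way to turn test accuracy into weights is to make each weight proportional to the accuracy. The description only states that higher accuracy gives a higher weight, so the proportional rule below is an assumption, and the accuracy figures are invented for illustration.

```python
from typing import Dict


def weights_from_accuracy(accuracy: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-model test accuracies so that the weights sum to 1.

    Assumption: weight is proportional to accuracy. The source only
    requires that a more accurate model gets a larger weight.
    """
    total = sum(accuracy.values())
    if total <= 0:
        raise ValueError("accuracies must sum to a positive number")
    return {name: acc / total for name, acc in accuracy.items()}


# Hypothetical accuracies measured on one shared test set.
accuracy = {"svm": 0.80, "rf": 0.90, "xgboost": 0.85,
            "adaboost": 0.70, "gbdt": 0.75}
weights = weights_from_accuracy(accuracy)
print(weights)
```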
In addition, when the method is used to analyze the emotional tendency of texts, the weights of the plurality of emotional tendency analysis models can be adjusted according to the actual test accuracy of each model. When the actual test accuracy of model A is high, the weight of model A is adjusted upward appropriately; when the actual test accuracy of model B is low, the weight of model B is adjusted downward appropriately.
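The description does not say how large the adjustment should be. The sketch below assumes one possible rule: each weight is moved a fraction `step` of the way toward a target proportional to the accuracy observed in actual use, and the weights are then renormalized. The function name, the `step` parameter, and the sample values are hypothetical.

```python
from typing import Dict


def adjust_weights(weights: Dict[str, float],
                   observed_accuracy: Dict[str, float],
                   step: float = 0.2) -> Dict[str, float]:
    """Nudge model weights toward targets derived from observed accuracy.

    Assumption: linear blending with a fixed step size. Models that
    perform better than their current weight implies move up, and
    models that perform worse move down.
    """
    total_acc = sum(observed_accuracy[name] for name in weights)
    target = {name: observed_accuracy[name] / total_acc for name in weights}
    blended = {name: (1.0 - step) * weights[name] + step * target[name]
               for name in weights}
    norm = sum(blended.values())
    return {name: w / norm for name, w in blended.items()}


# Hypothetical starting weights and accuracies observed in production.
current = {"model_a": 0.5, "model_b": 0.5}
observed = {"model_a": 0.9, "model_b": 0.6}
updated = adjust_weights(current, observed)
print(updated)
```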
In this embodiment, it is also possible to analyze the emotion tendencies of texts in combination with the second embodiment described later.
As shown in Fig. 2, the text analysis method according to the second embodiment of the present invention performs emotional tendency analysis separately on the title and the body of the text, and then combines the emotional tendency results of the title and the body according to weights. Specifically, the method comprises the following steps:
Step S210: analyzing the emotional tendency of the title of the text to obtain an emotional tendency result of the title.
A single model may be used to analyze the title of the text and obtain the emotional tendency result of the title. Alternatively, the method of the first embodiment may be used: the title is analyzed with multiple models, and the weighted average of the individual emotional tendency results is then computed using the weight of each model to obtain the emotional tendency result of the title.
Step S220: analyzing the emotional tendency of the body of the text to obtain an emotional tendency result of the body.
As with the title, a single model may be used to analyze the body of the text and obtain the emotional tendency result of the body. Alternatively, the method of the first embodiment may be used: the body is analyzed with multiple models, and the weighted average of the individual emotional tendency results is then computed using the weight of each model to obtain the emotional tendency result of the body.
Step S230: combining the emotional tendency result of the title and the emotional tendency result of the body according to the emotional tendency weights of the title and the body to obtain the emotional tendency result of the text.
That is, the weighted average of the emotional tendency result of the title and the emotional tendency result of the body is computed to obtain the emotional tendency result of the text.
By analyzing the emotional tendency of the title and the body separately, the second embodiment makes the analysis result more stable and accurate.
In a preferred embodiment, the weight of the emotional tendency of the title is greater than the weight of the emotional tendency of the body. Specifically, the ratio of the weight of the emotional tendency of the title to the weight of the emotional tendency of the body is 7:3.
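Step S230 with the preferred 7:3 ratio can be sketched as below. The function name and the sample scores are hypothetical; each score is assumed to be a (negative value, positive value) pair.

```python
from typing import Tuple

Score = Tuple[float, float]  # (negative value, positive value)

TITLE_WEIGHT = 0.7  # preferred title weight (ratio 7:3)
BODY_WEIGHT = 0.3   # preferred body weight


def combine_title_and_body(title: Score, body: Score,
                           title_weight: float = TITLE_WEIGHT,
                           body_weight: float = BODY_WEIGHT) -> Score:
    """Weighted average of the title score and the body score."""
    total = title_weight + body_weight
    neg = (title_weight * title[0] + body_weight * body[0]) / total
    pos = (title_weight * title[1] + body_weight * body[1]) / total
    return neg, pos


# Hypothetical scores: a mostly positive title and a mildly negative body.
overall = combine_title_and_body(title=(0.2, 0.8), body=(0.6, 0.4))
print(overall)
```

Because the title carries the larger weight, the combined result here stays on the positive side even though the body leans negative.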
In one embodiment, some texts have a long body that states many things, which naturally influences the emotional tendency result of the whole text. For such texts, the weight of the emotional tendency of the title is preferably increased appropriately and the weight of the emotional tendency of the body reduced, so as to improve the accuracy of the emotional tendency result of the text. In addition, when the body is long, for example when its length exceeds a predetermined length threshold, the latter part of the article, that is, the part of the body exceeding the predetermined length threshold, can be discarded, so as to improve the accuracy of the emotional tendency result of the body.
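The two ways of handling a long body can be sketched as below. The description gives neither the threshold nor the adjusted weights, so the character-based threshold, the 0.8/0.2 weights, and both function names are assumptions made only for illustration.

```python
from typing import Tuple


def truncate_body(body: str, max_chars: int) -> str:
    """Discard the part of the body beyond the length threshold."""
    return body[:max_chars]


def title_body_weights(body_length: int, max_chars: int,
                       base: Tuple[float, float] = (0.7, 0.3),
                       long_body: Tuple[float, float] = (0.8, 0.2)
                       ) -> Tuple[float, float]:
    """Return (title weight, body weight).

    Assumption: a body longer than the threshold switches to a fixed
    pair with a higher title weight. The actual amount of adjustment
    is not specified in the source.
    """
    return long_body if body_length > max_chars else base


# Hypothetical threshold of 4 characters, just to keep the example small.
short_enough = truncate_body("abcdefghij", max_chars=4)
weights_long = title_body_weights(body_length=10, max_chars=4)
weights_short = title_body_weights(body_length=3, max_chars=4)
print(short_enough, weights_long, weights_short)
```

The two measures are presented in the description as separate options, so an implementation might apply either one or both.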
In a specific embodiment, when emotional tendency analysis is performed on a text, the title and the body of the text are analyzed separately, a weight is set for the emotional tendency result of the title and for that of the body, and the two results are combined according to these weights to obtain the emotional tendency result of the text.
In addition, when the title or the body is analyzed, five models are used, namely SVM, Random Forest, XGBoost, AdaBoost, and GBDT. The weight of each model is determined from its test accuracy (simulated test accuracy) on the same test set, and the results of the five models are combined according to these weights to obtain the final emotional tendency result of the title or the body.
It should be noted that an emotional tendency result consists of a positive-direction value and a negative-direction value (also called the positive value and the negative value). Both are numbers between 0 and 1, and their sum is 1. When the positive value falls in the first, second, or third interval, the emotional tendency result of the text is judged to be negative emotion, neutral emotion, or positive emotion, respectively. When the negative value falls in the first, second, or third interval, the emotional tendency result of the text is judged to be positive emotion, neutral emotion, or negative emotion, respectively. In the present invention, the first, second, and third intervals are preferably [0, 0.4), [0.4, 0.6], and (0.6, 1], respectively.
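The interval rule can be written as a small function of the positive value. This is a sketch with a hypothetical function name; it assumes that the boundary values 0.4 and 0.6 both belong to the neutral interval.

```python
def classify_by_positive_value(pos: float) -> str:
    """Map a positive value in [0, 1] to an emotion label.

    Assumed intervals: below 0.4 is negative, 0.4 to 0.6 inclusive is
    neutral, above 0.6 is positive.
    """
    if not 0.0 <= pos <= 1.0:
        raise ValueError("positive value must lie between 0 and 1")
    if pos < 0.4:
        return "negative"
    if pos <= 0.6:
        return "neutral"
    return "positive"


for value in (0.1, 0.4, 0.5, 0.6, 0.9):
    print(value, classify_by_positive_value(value))
```

Since the positive and negative values sum to 1, applying the mirrored rule to the negative value gives the same label, apart from floating-point rounding exactly at the boundaries.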
In another specific embodiment, the emotional tendency results of the five models for the title and the body of a piece of news are as follows (in each pair the negative value comes first and the positive value second):
the emotional tendency results of the five models for the body are:
rf:[[0.66 0.34]]
xgboost:[[0.8003988 0.19960119]]
svm:[[0.86228466 0.13771534]]
adaboost:[[0.49986316 0.50013684]]
gbdt:[[0.84524205 0.15475795]]
the final result calculated from the weights is:
FINAL_SCORE:(0.7381552884608918,0.261844714668352)
the emotional tendency results for the title for the five models are:
rf_title:[[0.87 0.13]]
xgboost_title:[[0.8829204 0.11707961]]
svm_title:[[0.28065291 0.71934709]]
adaboost_title:[[0.50293054 0.49706946]]
gbdt_title:[[0.87352908 0.12647092]]
the final result calculated from the weights is:
FINAL_SCORE:(0.6999926527045315,0.3000073457308467)
according to the emotional tendency weights of the title and the body, the calculated emotional tendency result of the news is:
"neg" (negative direction): 0.711441, "pos" (positive direction): 0.288559
Therefore, the emotional tendency result of the news is negative emotion.
In yet another specific embodiment, the emotional tendency results of the five models for the title and the body of a piece of news are as follows (in each pair the negative value comes first and the positive value second):
the emotional tendency results of the five models for the body are:
rf:[[0.02 0.98]]
xgboost:[[0.01483941 0.9851606]]
svm:[[6.39660492e-04 9.99360340e-01]]
adaboost:[[0.48290936 0.51709064]]
gbdt:[[0.01755406 0.98244594]]
the final result calculated from the weights is:
FINAL_SCORE:(0.10759540624364658,0.8924045937563534)
the emotional tendency results for the title for the five models are:
rf_title:[[0.13 0.87]]
xgboost_title:[[0.15893984 0.84106016]]
svm_title:[[0.01168385 0.98831615]]
adaboost_title:[[0.48711506 0.51288494]]
gbdt_title:[[0.14882678 0.85117322]]
the final result calculated from the weights is:
FINAL_SCORE:(0.19209332726134196,0.8079066727386581)
according to the emotional tendency weights of the title and the body, the calculated emotional tendency result of the news is:
"neg" (negative direction): 0.166744, "pos" (positive direction): 0.833256
Therefore, the emotional tendency result of the news is positive emotion.
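The title and body combination in the two examples above can be checked with a few lines of code. The FINAL_SCORE pairs are copied from the examples and the 7:3 title-to-body ratio is the preferred ratio from the description. The per-model weights behind each FINAL_SCORE are not given, so that step is not reproduced. The function names and the handling of the 0.4 and 0.6 boundaries are assumptions.

```python
from typing import Tuple

Score = Tuple[float, float]  # (negative value, positive value)


def combine(title: Score, body: Score,
            title_weight: float = 0.7, body_weight: float = 0.3) -> Score:
    """Weighted average of title and body scores (title:body = 7:3)."""
    total = title_weight + body_weight
    neg = (title_weight * title[0] + body_weight * body[0]) / total
    pos = (title_weight * title[1] + body_weight * body[1]) / total
    return neg, pos


def label(score: Score) -> str:
    """Label from the positive value; 0.4 and 0.6 count as neutral."""
    pos = score[1]
    if pos < 0.4:
        return "negative"
    if pos <= 0.6:
        return "neutral"
    return "positive"


# FINAL_SCORE pairs copied from the two examples above.
news_1 = combine(title=(0.6999926527045315, 0.3000073457308467),
                 body=(0.7381552884608918, 0.261844714668352))
news_2 = combine(title=(0.19209332726134196, 0.8079066727386581),
                 body=(0.10759540624364658, 0.8924045937563534))

for name, score in (("news_1", news_1), ("news_2", news_2)):
    print(name, f"neg={score[0]:.6f}", f"pos={score[1]:.6f}", label(score))
```

Rounded to six decimals, the computed pairs agree with the reported values 0.711441 / 0.288559 and 0.166744 / 0.833256, which is consistent with a 7:3 ratio having been used in both examples.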
The two specific embodiments above show that a single model may judge a certain type of text (news) inaccurately, but combining the results of multiple models corrects such errors and yields a correct emotional tendency result that matches a typical reader's first impression of the sentiment of the news.
Analyzing the emotional tendency of texts (news) makes it possible to understand netizens' opinions on policies and current events, users' comments on commodities or sellers in online stores, business competition, and so on. It can also be used to monitor the recent development of a company: for example, news about the company over a recent period is collected and analyzed for emotional tendency. If positive news predominates, the company is developing well; conversely, if negative news predominates, the company has recently run into trouble.
The invention further provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and the processor, when executing the program, implements the steps of the text analysis method according to any one of the above technical solutions.
The present invention also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the text analysis method according to any one of the above technical solutions.
It should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution. The description is written this way only for clarity; those skilled in the art should take the description as a whole, and the technical solutions in the embodiments may be combined appropriately to form other embodiments understandable to those skilled in the art.
The detailed description above is only a specific description of feasible embodiments of the present invention and is not intended to limit its scope; equivalent embodiments or modifications made without departing from the technical spirit of the present invention shall be included in the scope of the present invention.

Claims (10)

1. A method for analyzing text, the method comprising:
analyzing the text with a plurality of emotional tendency analysis models to obtain a plurality of emotional tendency results for the text;
and combining the plurality of emotional tendency results according to the weights of the respective emotional tendency analysis models to obtain the final emotional tendency result of the text.
2. The method of analyzing text according to claim 1, wherein:
the weight of each emotional tendency analysis model is determined according to the simulated test accuracy of that emotional tendency analysis model.
3. The method of analyzing text according to claim 1, wherein:
the weights of the plurality of emotional tendency analysis models are adjusted according to the actual test accuracy of each emotional tendency analysis model.
4. The method of analyzing text according to claim 1, wherein:
the title and the body of the text are each analyzed with a plurality of emotional tendency analysis models to obtain a plurality of emotional tendency results for the title and a plurality of emotional tendency results for the body;
the plurality of emotional tendency results for the title and the plurality of emotional tendency results for the body are each combined according to the weights of the respective emotional tendency analysis models to obtain a final emotional tendency result of the title and a final emotional tendency result of the body;
and the final emotional tendency result of the title and the final emotional tendency result of the body are combined according to the emotional tendency weights of the title and the body to obtain the emotional tendency result of the text.
5. The method of analyzing text according to claim 4, wherein:
the weight of the emotional tendency of the title is greater than the weight of the emotional tendency of the body.
6. The method of analyzing text according to claim 5, wherein:
the ratio of the weight of the emotional tendency of the title to the weight of the emotional tendency of the body is 7:3.
7. The method of analyzing text according to claim 4, wherein:
when the length of the body exceeds a preset length threshold, the weight of the body is adjusted downward, and the weight of the title is adjusted upward at the same time.
8. The method of analyzing text according to claim 4, wherein:
when the length of the body exceeds a preset length threshold, the part of the body exceeding the preset length threshold is discarded, and only the emotional tendency of the remaining part of the body is analyzed.
9. An electronic device comprising a memory and a processor, said memory storing a computer program operable on said processor, wherein said processor implements the steps in the method of analyzing text according to any of claims 1-8 when executing said program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for analyzing a text according to any one of claims 1 to 8.
CN201911126259.XA 2019-11-18 2019-11-18 Text analysis method, text analysis device and storage medium Withdrawn CN110837561A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911126259.XA CN110837561A (en) 2019-11-18 2019-11-18 Text analysis method, text analysis device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911126259.XA CN110837561A (en) 2019-11-18 2019-11-18 Text analysis method, text analysis device and storage medium

Publications (1)

Publication Number Publication Date
CN110837561A true CN110837561A (en) 2020-02-25

Family

ID=69576763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911126259.XA Withdrawn CN110837561A (en) 2019-11-18 2019-11-18 Text analysis method, text analysis device and storage medium

Country Status (1)

Country Link
CN (1) CN110837561A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221534A (en) * 2021-05-25 2021-08-06 深圳和锐网络科技有限公司 Text emotion analysis method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918499A (en) * 2019-01-14 2019-06-21 平安科技(深圳)有限公司 A kind of file classification method, device, computer equipment and storage medium
CN110059183A (en) * 2019-03-22 2019-07-26 重庆邮电大学 A kind of automobile industry User Perspective sensibility classification method based on big data
CN110287405A (en) * 2019-05-21 2019-09-27 百度在线网络技术(北京)有限公司 The method, apparatus and storage medium of sentiment analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李彤等: "基于模型集成的突发事件舆情分析与趋势预测研究", 《系统工程理论与实践》 *

Similar Documents

Publication Publication Date Title
CN107067025B (en) Text data automatic labeling method based on active learning
US20190205704A1 (en) Method for Training Model and Information Recommendation System
CN111159404B (en) Text classification method and device
CN110717023B (en) Method and device for classifying interview answer text, electronic equipment and storage medium
CN108845988B (en) Entity identification method, device, equipment and computer readable storage medium
CN105144239A (en) Image processing device, program, and image processing method
CN109993057A (en) Method for recognizing semantics, device, equipment and computer readable storage medium
CN105279148B (en) A kind of APP software users comment on uniformity determination methods
CN109992781B (en) Text feature processing method and device and storage medium
CN108804577B (en) Method for estimating interest degree of information tag
CN109214444B (en) Game anti-addiction determination system and method based on twin neural network and GMM
CN109508374A (en) Text data Novel semi-supervised based on genetic algorithm
CN109740530A (en) Extracting method, device, equipment and the computer readable storage medium of video-frequency band
CN112527958A (en) User behavior tendency identification method, device, equipment and storage medium
CN114492423A (en) False comment detection method, system and medium based on feature fusion and screening
CN108733652A (en) The test method of film review emotional orientation analysis based on machine learning
CN106874255A (en) Method and device for rule matching
CN108536671B (en) Method and system for recognizing emotion index of text data
CN110837561A (en) Text analysis method, text analysis device and storage medium
Mountassir et al. Some methods to address the problem of unbalanced sentiment classification in an arabic context
CN109214275B (en) Vulgar picture identification method based on deep learning
CN111651590A (en) Data processing method and device, electronic equipment and storage medium
CN115577109A (en) Text classification method and device, electronic equipment and storage medium
CN114896398A (en) Text classification system and method based on feature selection
CN114091446A (en) Method and device for generating text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200225
