WO2020082734A1

WO2020082734A1 - Text emotion recognition method and apparatus, electronic device, and computer non-volatile readable storage medium

Info

Publication number: WO2020082734A1
Application number: PCT/CN2019/089166
Authority: WO
Inventors: 方豪; 马骏; 王少军
Original assignee: 平安科技（深圳）有限公司
Priority date: 2018-10-24
Filing date: 2019-05-30
Publication date: 2020-04-30
Also published as: CN109344257A

Abstract

The present application relates to the technical field of artificial intelligence, and provides a text emotion recognition method and apparatus, an electronic device, and a computer non-volatile readable storage medium. The method comprises: obtaining a sample text set, the sample text set comprising multiple sample texts and emotion classification labels corresponding to the multiple sample texts; carrying out correction calculation on an initial cost according to the number distribution of the emotion classification labels in the sample text set to obtain a correction cost; training a lifting algorithm learning model by means of the sample text set and the correction cost to obtain a text emotion recognition model; and recognizing a text to be recognized by means of the text emotion recognition model to obtain the emotion recognition result of said text. By means of the present application, the recognition accuracy and balance of the text of different emotional categories can be improved, the recognition effect is improved, and the method has strong applicability. (FIG. 3)

Description

Text emotion recognition method, device, electronic equipment and computer non-volatile readable storage medium

This application requires the priority of the Chinese patent application 201811244553.6 filed on October 24, 2018, with the application name "Text Emotion Recognition Method, Device, Electronic Equipment, and Computer Non-Volatile Readable Storage Medium", which is incorporated herein by reference All content is merged here.

Technical field

The present application relates to the field of artificial intelligence technology, and in particular, to a text emotion recognition method, device, electronic device, and computer non-volatile readable storage medium.

Background technique

With the development of computer technology, more and more Internet companies are committed to improving service quality by analyzing big data. Among them, emotional recognition of text is an important task, such as emotional recognition of service evaluations made by users, emotional recognition and classification of Internet articles, etc., so as to better understand user demands or achieve precise positioning of text With beneficial effects such as recommendation.

Most existing text emotion recognition methods use conventional machine learning models and rely on sample texts of specific corpora to train the model. However, the inventor of the present application realized that in many corpora, the sample texts of different emotions have a problem of imbalanced proportions. For example, in the scene of recognizing the emotions of e-commerce consumers on product evaluation, the number of positive evaluations is usually Far more than the number of negative evaluations, resulting in an uneven proportion of sample text, the accuracy of the machine learning model trained to recognize positive emotional text will be higher than the accuracy of identifying negative emotional text, which affects the effect of text emotional recognition.

Summary of the invention

In order to solve the above technical problems, an object of the present application is to provide a text emotion recognition method, device, electronic device, and computer non-volatile readable storage medium.

Among them, the technical solutions adopted in this application are:

On the one hand, a text sentiment recognition method is characterized by comprising: acquiring a sample text set, the sample text set includes a plurality of sample texts and an emotion classification label corresponding to each of the sample texts; according to the The number distribution of sentiment classification labels corrects the initial cost to obtain a modified cost; through the sample text set and the modified cost, a lifting algorithm learning model is trained to obtain a text emotion recognition model; the text emotion recognition model is used to recognize Recognize the text to obtain the emotion recognition result of the text to be recognized.

On the other hand, a text sentiment recognition device is characterized by comprising: a sample acquisition module for acquiring a sample text set, the sample text set including a plurality of sample texts and emotion classification tags corresponding to each of the sample texts; The cost correction module is used for correcting and calculating the initial cost according to the number distribution of the sentiment classification tags in the sample text set, to obtain the correction cost. The model acquisition module is used to train a lifting algorithm learning model through the sample text set and the correction cost to obtain a text emotion recognition model. The target recognition module is used for recognizing the text to be recognized through the text emotion recognition model to obtain the emotion recognition result of the text to be recognized.

On the other hand, a text emotion recognition device includes a processor and a memory, and the memory stores computer-readable instructions, which are implemented by the processor to implement the text emotion recognition method as described above .

On the other hand, a computer non-volatile readable storage medium has stored thereon a computer program, which when executed by a processor implements the text emotion recognition method as described above.

In the above technical solution, a text emotion recognition model is trained and obtained based on the acquired sample text set and the modified cost weights obtained based on the number distribution of different emotion sample texts, and then emotion recognition is performed on the text to be recognized through the text emotion recognition model. On the one hand, the initial cost is corrected according to the number distribution of sample texts of different emotions, so that the correction cost can balance the deviation of the number of sample texts of different emotions, which can improve the accuracy and balance of the recognition rate of different emotion texts by the text emotion recognition model. Improve the effect of text emotion recognition; on the other hand, when training the lifting algorithm to learn the model, by modifying the cost to guide the model's preference, it can strengthen the focus on the sample text with higher correction cost, thereby speeding up the training process and achieve more Good training effect; on the other hand, in this embodiment, there are no special requirements for the corpus of the application scene, and the correction cost can be adjusted to meet the needs of different scenarios, making the text emotion recognition method of this embodiment has a strong applicability.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present application.

BRIEF DESCRIPTION

The drawings here are incorporated into the specification and constitute a part of the specification, show embodiments consistent with the present application, and are used together with the specification to explain the principles of the present application.

FIG. 1 schematically shows a flowchart of a text emotion recognition method in this exemplary embodiment;

FIG. 2 schematically shows a sub-flow diagram of a text emotion recognition method in this exemplary embodiment;

FIG. 3 schematically shows a sub-flow diagram of another text emotion recognition method in this exemplary embodiment;

FIG. 4 schematically shows a structural block diagram of a text emotion recognition device in this exemplary embodiment;

5 shows a block diagram of an electronic device for implementing the above method according to an exemplary embodiment;

FIG. 6 shows a schematic diagram of a computer non-volatile readable storage medium for implementing the above method according to an exemplary embodiment.

Through the above drawings, clear embodiments of the present application have been shown, which will be described in more detail later. These drawings and text descriptions are not intended to limit the scope of the present application in any way, but by referring to specific embodiments The concept of the present application will be explained to those skilled in the art.

detailed description

The exemplary embodiments will be explained in detail here, examples of which are shown in the drawings. When the following description refers to the accompanying drawings, unless otherwise indicated, the same numerals in different drawings represent the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of devices and methods consistent with some aspects of the application as detailed in the appended claims.

Example embodiments will now be described more fully with reference to the drawings. However, the example embodiments can be implemented in various forms and should not be construed as being limited to the examples set forth herein; on the contrary, providing these embodiments makes the disclosure more comprehensive and complete, and fully conveys the concept of the example embodiments For those skilled in the art. The described features, structures, or characteristics may be combined in one or more embodiments in any suitable manner.

Exemplary embodiments of the present disclosure first provide a text emotion recognition method, where text generally refers to information in text form. In this embodiment, speech information can also be converted into text by a specific tool for emotion recognition; emotion recognition may be Classify and judge the sentiment states conveyed by the text, such as whether the sentiment of the text is positive or negative, commendatory or derogatory, etc.

The following further describes the exemplary embodiment with reference to FIG. 1. As shown in FIG. 1, the text emotion recognition method may include the following steps S110-S140:

Step S110: Acquire a sample text set, where the sample text set includes a plurality of sample texts and emotion classification tags corresponding to the sample texts.

The sample text may be text extracted from the corpus of a specific application scenario, and may generally cover various types of text in the corpus. According to the needs of text emotion recognition in this application scenario, the sample text can be labeled with emotion classification to obtain the emotion classification label. For example, in the scenario of identifying the emotions of e-commerce consumers on product evaluation, it is usually necessary to classify emotions as positive and negative , You can extract a large number of sample texts from the evaluation text and mark them as positive or negative emotional texts one by one; for example, when identifying the personal dynamic emotions of social network users, you usually need to classify the emotions as "happy" and "frustrated" "," Anger "," sadness "and other categories, for the sample text" weather is too good ", you can label its emotion classification label as" happy ", for the sample text" really bad luck today ", you can label its emotion classification label For "frustrated" etc. In this embodiment, the specific content of the emotion classification label is not particularly limited.

In step S120, the initial cost is corrected and calculated according to the number distribution of the sentiment classification tags in the sample text set to obtain the modified cost.

Among them, cost is a concept in cost-sensitive learning, reflecting the severity of consequences caused by misidentification. The initial cost may be a parameter determined from the application scenario and considering the cost of misrecognizing the sentiment of the text. In the same application scenario, the initial cost of incorrect recognition of texts of different emotion types is usually different; in different application scenarios, the initial cost of incorrect recognition of text of the same emotion type may also be different. For example, when using a positive evaluation system to evaluate agent customer service personnel, they generally pay more attention to the positive emotional evaluations given by customers to encourage and praise excellent customer service personnel. In this scenario, the positive emotional text is mistakenly identified as the initial negative emotional text The cost is high, and the initial cost of misrecognizing negative emotional text as positive emotional text is low; when evaluating e-commerce products, usually pay more attention to the negative emotional evaluation given by consumers to improve product quality. In this scenario, The initial cost of misidentifying negative emotional text as positive emotional text is higher, and the initial cost of misidentifying positive emotional text as negative emotional text is lower.

In the sample text set, the distribution of the number of sentiment classification labels reflects the imbalance of the sample texts of different emotions, which can be quantitatively expressed by one or more indicators such as the ratio, variance, or standard deviation between the sample texts of different emotions, for example: In a sample text set, there are 80,000 “positive” sentiment classification tags and 20,000 “negative” sentiment classification tags, then the number distribution of sentiment classification tags in the sample set can be 4: 1; or in the sample text set, the number The distribution reflects that the "positive" sentiment classification tags account for 4/5 of the total sentiment classification tags, and the "negative" sentiment classification tags account for 1/5 of the total sentiment classification tags. In multi-classification scenarios, variance or standard deviation is usually used to represent the number distribution of sentiment classification tags. This embodiment is not particularly limited.

According to the above-mentioned distribution of the number of sentiment classification tags, the initial cost of texts of different sentiment types can be corrected and calculated through a specific function or formula, and combined with the desired correction direction, the correction cost can be obtained. For example, if the proportion of positive sample text is low or the number is small, the initial cost of positive emotional text can be modified to have a higher cost weight. If the proportion of negative sample text is low or the number is small, the initial cost of negative emotional text can be modified to make it have a higher cost weight.

Step S130: Train a lifting algorithm learning model through the sample text set and the correction cost to obtain a text emotion recognition model.

The lifting algorithm learning model can be applied to the scenario of improving the accuracy of the weak classification algorithm. In this embodiment, the lifting algorithm learning model can set different sampling weights for sample texts with different accuracy rates, so that the model pays more attention to the correction cost. High sample text. The lifting algorithm learning model may include a variety of models, for example, gradient lifting decision tree model, Adaboost model or Xgboost model.

The training process can include: lifting the algorithm learning model to take the sample text as input, output the sentiment classification result of the sample text, and compare the sentiment classification result with the sentiment classification label; and then calculate the comparison result by correcting the cost to obtain the accuracy of the model recognition Rate; by iteratively adjusting the parameters of the model until the accuracy rate reaches a certain standard, it can be considered that the training is complete. The trained learning model of the lifting algorithm is the text emotion recognition model.

Step S140: Recognize the text to be recognized through the text emotion recognition model to obtain the emotion recognition result of the text to be recognized.

The text emotion recognition model completed through the above training can recognize the text to be recognized, and the emotion recognition result is the emotion classification result of the text to be recognized. For example, the emotion recognition result may be a positive emotion text or a negative emotion text.

Based on the above description, in this exemplary embodiment, based on the acquired sample text set and the modified cost weights obtained based on the number distribution of different emotion sample texts, a text emotion recognition model is trained and obtained, and then the recognized text is treated through the text emotion recognition model Perform emotion recognition. On the one hand, the initial cost is corrected according to the number distribution of sample texts of different emotions, so that the correction cost can balance the deviation of the number of sample texts of different emotions, which can improve the accuracy and balance of the recognition rate of different emotion texts by the text emotion recognition model. Improve the effect of text emotion recognition; on the other hand, when training the lifting algorithm to learn the model, by modifying the cost to guide the model's preference, it can strengthen the focus on the sample text with higher correction cost, thereby speeding up the training process and achieve more Good training effect; on the other hand, in this embodiment, there are no special requirements for the corpus of the application scene, and the correction cost can be adjusted to meet the needs of different scenarios, making the text emotion recognition method of this embodiment has a strong applicability.

In an exemplary embodiment, the sentiment classification label may include positive sentiment text and negative sentiment text. Step S120 can be implemented by the following steps:

The initial costs cost ₁₀ and cost _{01 are} obtained. Cost ₁₀ is the initial cost of mistaken positive emotion text as negative emotion text, and cost ₀₁ is the initial cost of mistaken negative emotion text as positive emotion text.

The number of positive emotional text Q1 and the number of negative emotional text Q0 in the sample text set are counted.

Modify the initial cost by the following formula to obtain the modified cost:

Among them, R ₁₀ is the sample deviation ratio, costm ₁₀ is the correction cost of mistaken positive emotion text as negative emotion text, costm ₀₁ is the correction cost of mistaken negative emotion text as positive emotion text, and a is the exponential parameter.

According to the above analysis, sample texts with different sentiment classifications in the sample text set have different initial costs and correction costs. When the emotion classification labels are positive emotion text and negative emotion text, "0" can be used to indicate negative emotions, and "1" can be used to indicate positive emotions. The obtained initial costs cost ₁₀ and cost ₀₁ can respectively represent the initial cost of misrepresenting positive emotional text as negative emotional text and the initial cost of misrepresenting negative emotional text as positive emotional text.

Based on the number of positive emotional text Q1 and the number of negative emotional text Q0 in the sample text set, the correction cost can be calculated by formula (1), formula (2) and formula (3), a is the index parameter, reflecting the degree of correction, the greater a , Indicates the higher the degree of correction; generally 0 <a≤1, the value of a can be set according to experience and actual use.

For example, if the number of positive sentiment texts Q1 = 80000, the number of negative sentiment texts Q0 = 20000, set a = 1/2, according to the formula calculation can get R ₁₀ = 4, substituted into formula (2) and formula (3) , Available costm10 = 0.5cost10, costm01 = 2cost01. It can be seen that after correction calculation, the correction cost of positive emotional text is lower than its initial cost, and the correction cost of negative emotional text is higher than its initial cost.

In other embodiments, the initial cost can also be corrected by calculating the deviation ratio of the sample texts of different sentiment classifications. For example: in the sample text set, the number of negative emotion texts is Q0, the number of positive emotion texts is Q1, and the deviation ratio of negative emotions can be:

The cost can be modified by the formula: costm ₁₀ = cost ₁₀ · R ₀ ,

To calculate.

For example, if the number of positive sentiment texts Q1 = 80000, the number of negative sentiment texts Q0 = 20000, and R ₀ = 0.4, then substitute the formula costm ₁₀ = cost ₁₀ · R ₀ and the formula

Adjust the initial cost to obtain costm10 = 0.4cost10 and costm01 = 2.5cost01. After the correction calculation, the correction cost of positive emotional text can also be lower than its initial cost, and the correction cost of negative emotional text is higher than its initial cost.

In an exemplary embodiment, referring to FIG. 2, step S130 may include the following steps:

In step S201, the sample text set is divided into a training subset T and a verification subset D, D = {x ₁ , x ₂ … x _m }.

In step S202, the training subset T is used to train the lifting algorithm to learn the model.

In step S203, the emotion recognition result f (xi) of each sample text xi in the verification subset D is obtained by improving the algorithm learning model.

Step S204, calculating the error rate of the algorithm learning model according to formula (4):

Step S205, if the error rate is lower than the learning threshold, it is determined that the training model of the lifting algorithm learning is completed, and the trained learning model of the lifting algorithm is determined as a text emotion recognition model.

Where m is the number of sample texts in the verification subset, i ∈ [1, m]; E is the error rate of the algorithm learning model, D + is the positive emotion sample text subset of verification subset D, and D- is the verification subset D is the negative sentiment sample text subset, y _i is the sentiment classification label of the sample text xi.

In step S201, the sample text set can be directly divided into two mutually exclusive sets, one of which is used as a training subset and the other is used as a verification subset. After training the model, it is used to evaluate its verification error. As an estimate of the generalization error. Assuming that the sample text set contains 100,000 sample texts, taking 8/2 samples, it can be divided into a subset containing 80,000 training sample texts, namely training subset T, and a subset containing 20,000 verification sample texts, namely Verification subset D, D = {x ₁ , x ₂ … x _m }, x1, x2, etc. represent the sample text in D. Among them, the distribution ratio of the training subset and the verification subset can be determined according to needs, and no specific limitation is made here.

The lifting algorithm learning model can take the training subset as input, output the sentiment classification result of the sample text in the training subset, adjust the model parameters, continue training the model, and then can verify whether the model meets the requirements by verifying the subset, by formula (4) The calculation improves the error rate of the algorithm learning model. In formula (4), Ⅱ (·) is the indicator function, and the values in brackets are 1 and 0 when true and false, respectively. For each sample text xi in D, if the model outputs the result f (xi ) Is the same as the sentiment classification label yi, the error index of xi is 0; if the output of the model is different from the sentiment classification label, the error index of xi is costm10 (when xi is positive sample text) or costm01 (when xi is negative Sample text); taking the arithmetic average of the error indices of all sample texts in D, the error rate E of the model can be obtained. The lower the value of the error rate E, the better the effect of improving the algorithm learning model training.

In the model training, a learning threshold judgment mechanism can be set to judge whether the error rate of the improved algorithm learning model is within an acceptable range. If the calculated error rate is lower than the learning threshold, it is judged that the model training is completed to obtain a text emotion recognition model; if the calculated error rate is equal to or higher than the learning threshold, it cannot pass the verification, and the model can continue to be trained. The learning threshold can be set according to experience or actual usage, and this embodiment does not limit its specific value.

In an exemplary embodiment, the text emotion recognition method may further include the following steps:

According to formula (5) and formula (6), calculate the positive sample error rate E + and negative sample error rate E- of the improved algorithm learning model:

According to formula (7), calculate the error rate ratio of the improved algorithm learning model:

If the error rate ratio is within a preset range, continue to detect whether the error rate is below the learning threshold.

Where s is the number of positive sentiment sample texts in the verification subset D, that is, the number of sample texts in D +, and v is the number of negative sentiment sample texts in the verification subset D, namely the number of sample texts in D-, m = s + v.

Considering the difference between the positive sample and negative sample error rates, the positive sample error rate E + and the negative sample error rate E- of the improved algorithm learning model can be calculated according to formulas (5) and (6), respectively. The positive sample error rate E + is The positive sample text subset D + verification is used to increase the error rate of the algorithm learning model, that is, the error rate for positive sample text recognition; the negative sample error rate E- is the negative sample text subset D- verification to improve the algorithm learning model error rate, That is, the error rate for negative sample text recognition. Then the error rate calculated by the above formula (4) is the error rate for the overall recognition of positive sample text and negative sample text.

In an exemplary embodiment, after calculating E + and E-, the formula can also be used

The error rate of the sample text subset D verification lifting algorithm learning model is calculated, which is consistent with the error rate calculated by the above formula (4).

According to formula (7), the error rate ratio A of the lifting algorithm model can be calculated. A reflects the imbalance of the error rate of the model for different emotion sample text recognition. When A is 1, it means that the positive sample error rate E + is equal to the negative sample error rate E-. At this time, the error rate of the model for the positive sample text and the negative sample text recognition is balanced; when A and 1 are too different, whether it is greater than 1 Or less than 1, it means that the model has a high degree of unevenness in the recognition rate of positive sample text and negative sample text, and the training has not met the requirements. This embodiment means that before determining whether the error rate of the learning model of the lifting algorithm meets the requirement, first determine whether the error rate of the model identifying different emotion sample texts is balanced, and if the balance reaches the requirement, continue to determine whether the error rate meets the requirement.

According to the acceptable degree of error rate imbalance in the application scenario, a preset range can be set to measure whether the error rate balance meets the requirements. When the error rate ratio is within the preset range, it means that the balance reaches the requirements and can be continued Determine whether the error rate reaches the standard of learning threshold. For example, the preset range can be set to [0.5, 2], when the error rate of positive emotion samples is twice the error rate of negative emotion samples, the calculated error rate ratio A = 0.5, when the error rate of negative emotion samples is positive emotion samples When the error rate is twice, the calculated error rate ratio A = 2, which is within the preset range, indicates that this degree of imbalance can be accepted, and continues to detect whether the error rate is below the learning threshold.

In other embodiments, B = | lgA | can also be used to quantitatively express the degree of unbalanced error rate of the lifting algorithm learning model for different emotion sample text recognition. When B = 0, it means complete equilibrium. The larger the B, the more balanced it is. Poor, so you can set a threshold on B to measure whether the model's error rate balance meets the requirements.

Further, if the error rate ratio is not within the preset range, the learning model of the lifting algorithm needs to be further trained. In an exemplary embodiment, the text emotion recognition method further includes the following steps:

If the error rate ratio is not within the preset range, the training subset T is used again to train the lifting algorithm to learn the model.

Recalculate the proportion of error rates that improve the algorithm learning model by the following formula:

It is checked again whether the error rate ratio is within the preset range.

For example, if the positive sample error rate E + calculated in equations (5) and (6) is greater than the negative sample error rate E-, the error rate ratio A is greater than 1. In order to improve the error rate balance of the algorithm learning model , You can train the model again, and calculate E- and E + again by formula (8) and formula (9). In formula (8) and formula (9), if the A calculated in the previous verification is greater than 1, E + is improved by multiplying by A, and E- is reduced by multiplying by 1 / A, that is, training in this round If E + and E- are not greatly improved, the ratio A will continue to increase, so the training process of the model will be accelerated and the training effect will be improved. Through the above process, the error rate of the model can be balanced faster.

FIG. 3 shows a flow chart of a text emotion recognition model training in this exemplary embodiment. The sample deviation ratio is calculated for the sample text set, and the correction cost is calculated according to the sample deviation ratio to train the lifting algorithm to learn the model; and then calculate The error rate ratio and error rate of model training, and judge accordingly; if it is judged that the error rate ratio is not within the preset range, you can return to the model training step to continue training to improve the algorithm to learn the model, if the error rate ratio is judged to be preset Within the range, you can continue to judge whether the error rate is lower than the learning threshold; further, if you judge that the error rate is equal to or higher than the learning threshold, you can return to the model training step to continue training to improve the algorithm to learn the model, if you judge the error rate is low Based on the learning threshold, it can be considered that the model training is completed and a text emotion recognition model is obtained.

In an exemplary embodiment, the emotion classification tags may include: level 1 positive emotion text, level 2 positive emotion text, ..., level n positive emotion text and level 1 negative emotion text, level 2 negative emotion text, ..., N-level negative emotion text, n is an integer greater than 1.

Among them, the emotions of the sample text can be classified into positive emotions and negative emotions. Further, positive emotions and negative emotions can be divided into level 1 positive emotion text, level 2 positive emotion text, ..., level n positive emotion text and Level 1 negative emotion text, level 2 negative emotion text, ..., level n negative emotion text. You can determine the sentiment classification level by identifying keywords or keywords. For example, sample text with a keyword of "good" can be labeled with a sentiment classification label of level 1 positive sentiment text. Keywords include "very" and "good" The sample text of can be marked with a sentiment classification label of level 2 positive sentiment text and so on. In addition, the sentiment classification tags may also include neutral sentiment text, etc., which is not specifically limited here.

Exemplary embodiments of the present disclosure also provide a text emotion recognition device. Referring to FIG. 4, the apparatus 400 may include a sample acquisition module 410, a cost correction module 420, a model acquisition module 430, and a target recognition module 440. Among them, the sample acquisition module 410 is used to acquire a sample text set, the sample text set includes multiple sample texts and the sentiment classification tags corresponding to each sample text; the cost correction module 420 is used to determine the initial value based on the number of sentiment classification tags in the sample text set. The cost is corrected and calculated to obtain the modified cost; the model acquisition module 430 is used to train a lifting algorithm to learn the model through the sample text set and the modified cost, and the text emotion recognition model is obtained; the target recognition module 440 is used to treat the recognized text through the text emotion recognition model Recognize and get the emotion recognition result of the text to be recognized.

In an exemplary embodiment, the sentiment classification label includes positive sentiment text and negative sentiment text; the cost correction module may include: an initial cost acquisition unit, used to obtain initial costs cost ₁₀ and cost ₀₁ , cost ₁₀ is the positive sentiment text error Think of the initial cost of negative emotion text, cost ₀₁ is the initial cost of mistaken negative emotion text as positive emotion text; text statistics unit, used to count the number of positive emotion text Q1 and the number of negative emotion text Q0 in the sample text set; cost correction unit , Used to modify the initial cost by the following formula to obtain the modified cost:

In an exemplary embodiment, the model acquisition module may include: a division unit for dividing the sample text set into a training subset T and a verification subset D, D = {x ₁ , x ₂ … x _m }; the training unit , Used to train the lifting algorithm to learn the model using the training subset T; the verification unit, to obtain the sentiment recognition result f (xi) of each sample text xi in the verification subset D through the lifting algorithm learning model; Formula (4) is calculated to improve the error rate of the algorithm learning model:

Judgment unit, used to determine that the lifting algorithm learning model training is completed when the error rate is lower than the learning threshold, and determine the trained lifting algorithm learning model as a text emotion recognition model; where m is the number of sample text in the verification subset, i ∈ [1, m]; E is the error rate of the learning model of the algorithm, D + is the positive emotion sample text subset of the verification subset D, D- is the negative emotion sample text subset of the verification subset D, and y _i is the sample Sentiment classification label for text xi.

In an exemplary embodiment, the calculation unit may also be used to calculate the positive sample error rate E + and the negative sample error rate E- of the boosting algorithm learning model according to equations (5) and (6), respectively:

And, it is used to calculate the error rate ratio of the improved algorithm learning model according to formula (7):

The judgment unit can also be used to continue to detect whether the error rate is lower than the learning threshold when the error rate ratio is within a preset range. Where, s is the number of positive emotion sample texts in the verification subset D, v is the number of negative emotion sample texts in the verification subset D, and m = s + v.

In an exemplary embodiment, the training unit can also be used to train the lifting algorithm to learn the model again using the training subset T if the error rate ratio is not within the preset range; the computing unit can also be used to recalculate the lifting algorithm by the following formula The error rate ratio of the learning model:

The judgment unit can also be used to detect again whether the error rate ratio is within a preset range.

In an exemplary embodiment, the emotion classification tags may include level 1 positive emotion text, level 2 positive emotion text, ..., level n positive emotion text and level 1 negative emotion text, level 2 negative emotion text, ..., n Grade negative emotion text, n is an integer greater than 1.

In an exemplary embodiment, the lifting algorithm learning model may include a gradient lifting decision tree model, an Adaboost model, or an Xgboost model.

The specific details of the above modules / units have been described in detail in the corresponding method embodiments, so they will not be repeated here.

Exemplary embodiments of the present disclosure also provide an electronic device capable of implementing the above method.

Those skilled in the art can understand that various aspects of the present disclosure can be implemented as a system, method, or program product. Therefore, various aspects of the present disclosure may be specifically implemented in the form of: a complete hardware implementation, a complete software implementation (including firmware, microcode, etc.), or a combination of hardware and software implementations, which may be collectively referred to herein as "Circuit", "Module" or "System".

The electronic device 500 according to this exemplary embodiment of the present disclosure is described below with reference to FIG. 5. The electronic device 500 shown in FIG. 5 is only an example, and should not bring any limitation to the functions and use scope of the embodiments of the present disclosure.

As shown in FIG. 5, the electronic device 500 is represented in the form of a general-purpose computing device. The components of the electronic device 500 may include, but are not limited to: the at least one processing unit 510, the at least one storage unit 520, a bus 530 connecting different system components (including the storage unit 520 and the processing unit 510), and the display unit 540.

Wherein, the storage unit stores the program code, and the program code may be executed by the processing unit 510, so that the processing unit 510 executes the steps according to various exemplary embodiments of the present disclosure described in the "Exemplary Method" section of this specification. For example, the processing unit 510 may execute steps S110 to S140 shown in FIG. 1, or may execute steps S201 to S205 shown in FIG. 2, and so on.

The storage unit 520 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 521 and / or a cache storage unit 522, and may further include a read-only storage unit (ROM) 523.

The storage unit 520 may further include a program / utility tool 524 having a set of (at least one) program modules 525. Such program modules 525 include but are not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples or some combination may include an implementation of the network environment.

The bus 530 may represent one or more of several types of bus structures, including a storage unit bus or a storage unit controller, a peripheral bus, a graphics acceleration port, a processing unit, or a local area using any of a variety of bus structures bus.

The electronic device 500 may also communicate with one or more external devices 700 (eg, keyboard, pointing device, Bluetooth device, etc.), and may also communicate with one or more devices that enable a user to interact with the electronic device 500, and / or This enables the electronic device 500 to communicate with any device (eg, router, modem, etc.) that communicates with one or more other computing devices. Such communication may be performed through an input / output (I / O) interface 550. Moreover, the electronic device 500 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and / or a public network, such as the Internet) through a network adapter 560. As shown, the network adapter 560 communicates with other modules of the electronic device 500 through the bus 530. It should be understood that although not shown in the figure, other hardware and / or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device driver, redundant processing unit, external disk drive array, RAID system, tape drive And data backup storage system.

Through the description of the above embodiments, those skilled in the art can easily understand that the example embodiments described herein can be implemented by software, or can be implemented by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, U disk, mobile hard disk, etc.) or on a network , Including several instructions to cause a computing device (which may be a personal computer, server, terminal device, or network device, etc.) to perform the method according to an exemplary embodiment of the present disclosure.

Exemplary embodiments of the present disclosure also provide a computer non-volatile readable storage medium on which is stored a program product capable of implementing the above method of this specification. In some possible implementations, various aspects of the present disclosure may also be implemented in the form of a program product, which includes program code, and when the program product runs on the terminal device, the program code is used to cause the terminal device to execute The steps according to various exemplary embodiments of the present disclosure described in the "Exemplary Method" section.

Referring to FIG. 6, a program product 600 for implementing the above method according to an exemplary embodiment of the present disclosure is described, which can adopt a portable compact disk read-only memory (CD-ROM) and include a program code, and can be used in a terminal Devices, such as personal computers. However, the program product of the present disclosure is not limited thereto, and in this document, the readable storage medium may be any tangible medium containing or storing a program, which may be used by or in combination with an instruction execution system, apparatus, or device. The program product may use any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination of the above. More specific examples of readable storage media (non-exhaustive list) include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing. The computer-readable signal medium may include a data signal that is transmitted in baseband or as part of a carrier wave, in which readable program code is carried. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. The readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.

It should be understood that the present disclosure is not limited to the precise structure that has been described above and shown in the drawings, and that various modifications and changes can be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

A text sentiment recognition method, which includes:

Acquiring a sample text set, the sample text set including a plurality of sample texts and emotion classification tags corresponding to the sample texts;

Modify the initial cost according to the number distribution of the sentiment classification tags in the sample text set to obtain the modified cost;

Training a lifting algorithm learning model through the sample text set and the modified cost to obtain a text emotion recognition model;

Recognize the text to be recognized through the text emotion recognition model to obtain the emotion recognition result of the text to be recognized.
The method according to claim 1, wherein the sentiment classification tags include positive sentiment text and negative sentiment text;

The modifying the initial cost according to the number distribution of the sentiment classification tags in the sample text set, and obtaining the modified cost includes:

Obtain initial costs cost 10 and cost 01 , cost 10 is the initial cost of mistaken positive emotion text as negative emotion text, cost 01 is the initial cost of mistaken negative emotion text as positive emotion text;

Count the number of positive emotional text Q1 and the number of negative emotional text Q0 in the sample text set;

Modify the initial cost by the following formula to obtain the modified cost:

Among them, R 10 is the sample deviation ratio, costm 10 is the correction cost of mistaken positive emotion text as negative emotion text, costm 01 is the correction cost of mistaken negative emotion text as positive emotion text, and a is the exponential parameter.
The method according to claim 1, wherein the training a lifting algorithm learning model through the sample text set and the correction cost to obtain a text emotion recognition model includes:

Divide the sample text set into a training subset T and a verification subset D, D = {x 1 , x 2 … x m };

Training the lifting algorithm learning model using the training subset T;

Acquiring the emotion recognition result f (xi) of each sample text xi in the verification subset D through the lifting algorithm learning model;

Calculate the error rate of the improved algorithm learning model according to the following formula:

If the error rate is lower than the learning threshold, it is determined that the lifting algorithm learning model training is completed, and the trained training algorithm learning model is determined as the text emotion recognition model;

Where m is the number of sample texts in the verification subset, i∈ [1, m]; E is the error rate of the learning model of the lifting algorithm, and D + is the positive emotion sample text subset of the verification subset D D- is a negative emotion sample text subset of the verification subset D, and y i is an emotion classification label of the sample text xi.
The method of claim 3, further comprising:

The positive sample error rate E + and negative sample error rate E- of the lifting algorithm learning model are calculated according to the following two formulas:

Calculate the error rate ratio of the learning model of the lifting algorithm according to the following formula:

If the error rate ratio is within a preset range, continue to detect whether the error rate is lower than the learning threshold.

Where s is the number of positive sentiment sample texts in the verification subset D, v is the number of negative sentiment sample texts in the verification subset D, and m = s + v.
The method according to claim 4, wherein the method further comprises:

If the error rate ratio is not within the preset range, the training subset T is used again to train the lifting algorithm learning model;

Recalculate the error rate ratio of the learning model of the lifting algorithm by the following formula:

It is checked again whether the error rate ratio is within the preset range.
The method according to claim 1, wherein the sentiment classification tags include level 1 positive emotion text, level 2 positive emotion text, ..., level n positive emotion text and level 1 negative emotion text, level 2 negative emotion text Text, ..., n-level negative emotional text, n is an integer greater than 1.
The method according to claim 1, wherein the learning model of the lifting algorithm comprises a gradient lifting decision tree model, an Adaboost model or an Xgboost model.
A text emotion recognition device, characterized in that it includes:

A sample acquisition module, configured to acquire a sample text set, the sample text set including a plurality of sample texts and emotion classification tags corresponding to the sample texts;

The cost correction module is used for correcting and calculating the initial cost according to the number distribution of the sentiment classification tags in the sample text set, to obtain the correction cost.

A model acquisition module is used to train a lifting algorithm learning model through the sample text set and the correction cost to obtain a text emotion recognition model.

The target recognition module is used for recognizing the text to be recognized through the text emotion recognition model to obtain the emotion recognition result of the text to be recognized.
The apparatus according to claim 8, wherein the sentiment classification label includes positive sentiment text and negative sentiment text, and the cost correction module includes:

The initial cost acquisition unit is used to obtain initial costs cost 10 and cost 01 , cost 10 is the initial cost of mistaken positive emotion text as negative emotion text, and cost 01 is the initial cost of mistaken negative emotion text as positive emotion text;

A text statistics unit, used to count the number Q1 of positive emotional text and the number Q0 of negative emotional text in the sample text set;

The cost correction unit is used to modify and calculate the initial cost by the following formula to obtain the modified cost:

Among them, R 10 is the sample deviation ratio, costm 10 is the correction cost of mistaken positive emotion text as negative emotion text, costm 01 is the correction cost of mistaken negative emotion text as positive emotion text, and a is the exponential parameter.
The apparatus according to claim 8, the model acquisition module comprising:

A dividing unit for dividing the sample text set into a training subset T and a verification subset D, D = {x 1 , x 2 … x m };

A training unit for training the lifting algorithm learning model using the training subset T;

A verification unit, for acquiring the emotion recognition result f (xi) of each sample text xi in the verification subset D through the lifting algorithm learning model;

The calculation unit is used to calculate the error rate of the learning model of the lifting algorithm according to the following formula:

The judging unit is used to judge that the lifting algorithm learning model training is completed if the error rate is lower than the learning threshold, and determine the trained training algorithm learning model as the text emotion recognition model;

Where m is the number of sample texts in the verification subset, i ∈ [1, m]; E is the error rate of the learning model of the lifting algorithm, and D + is the positive emotion sample text subset of the verification subset D, D- is a negative emotion sample text subset of the verification subset D, and y i is an emotion classification label of the sample text xi.
The device according to claim 10, further comprising:

The calculation unit is also used to calculate the positive sample error rate E + and the negative sample error rate E- of the boosting algorithm learning model according to the following formulas:

And calculate the error rate ratio of the learning model of the lifting algorithm according to the following formula:

The judging unit is also used to continue to detect whether the error rate is lower than the learning threshold if the error rate ratio is within a preset range.

Where s is the number of positive sentiment sample texts in the verification subset D, v is the number of negative sentiment sample texts in the verification subset D, and m = s + v.
The device according to claim 11, further comprising:

The training unit is further used to train the lifting algorithm learning model again using the training subset T if the error rate ratio is not within the preset range;

The calculation unit is also used to recalculate the error rate ratio of the learning model of the lifting algorithm by the following formula:

The judging unit is also used to detect again whether the error rate ratio is within the preset range.
A text emotion recognition device, characterized in that it includes a processor and a memory, and the memory stores computer-readable instructions. When the computer-readable instructions are executed by the processor, the processor is used to perform the following processing :

Acquiring a sample text set, the sample text set including a plurality of sample texts and emotion classification tags corresponding to the sample texts;

Modify the initial cost according to the number distribution of the sentiment classification tags in the sample text set to obtain the modified cost;

Training a lifting algorithm learning model through the sample text set and the modified cost to obtain a text emotion recognition model;

Recognize the text to be recognized through the text emotion recognition model to obtain the emotion recognition result of the text to be recognized.
The device according to claim 13, wherein the sentiment classification tags include positive sentiment text and negative sentiment text;

The modifying the initial cost according to the number distribution of the sentiment classification tags in the sample text set, and obtaining the modified cost includes:

Obtain initial costs cost 10 and cost 01 , cost 10 is the initial cost of mistaken positive emotion text as negative emotion text, cost 01 is the initial cost of mistaken negative emotion text as positive emotion text;

Count the number of positive emotional text Q1 and the number of negative emotional text Q0 in the sample text set;

Modify the initial cost by the following formula to obtain the modified cost:

Among them, R 10 is the sample deviation ratio, costm 10 is the correction cost of mistaken positive emotion text as negative emotion text, costm 01 is the correction cost of mistaken negative emotion text as positive emotion text, and a is the exponential parameter.
The apparatus according to claim 13, wherein the training model for training a lifting algorithm through the sample text set and the correction cost is used to obtain a text emotion recognition model, and the processor is configured to perform the following processing:

Divide the sample text set into a training subset T and a verification subset D, D = {x 1 , x 2 … x m };

Training the lifting algorithm learning model using the training subset T;

Acquiring the emotion recognition result f (xi) of each sample text xi in the verification subset D through the lifting algorithm learning model;

Calculate the error rate of the improved algorithm learning model according to the following formula:

If the error rate is lower than the learning threshold, it is determined that the lifting algorithm learning model training is completed, and the trained training algorithm learning model is determined as the text emotion recognition model;

Where m is the number of sample texts in the verification subset, i ∈ [1, m]; E is the error rate of the learning model of the lifting algorithm, and D + is the positive emotion sample text subset of the verification subset D, D- is a negative emotion sample text subset of the verification subset D, and y i is an emotion classification label of the sample text xi.
The apparatus according to claim 15, wherein the processor is further configured to perform the following processing:

Calculate the positive sample error rate E + and negative sample error rate E- of the lifting algorithm learning model according to the following formulas:

Calculate the error rate ratio of the learning model of the lifting algorithm according to formula (7):

If the error rate ratio is within a preset range, continue to detect whether the error rate is lower than the learning threshold.

Where s is the number of positive sentiment sample texts in the verification subset D, v is the number of negative sentiment sample texts in the verification subset D, and m = s + v.
The apparatus according to claim 16, wherein the processor is further configured to perform the following steps:

If the error rate ratio is not within the preset range, the training subset T is used again to train the lifting algorithm learning model;

Recalculate the error rate ratio of the learning model of the lifting algorithm by the following formula:

It is checked again whether the error rate ratio is within the preset range.
A computer non-volatile readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the processor is used to perform the following steps:

Acquiring a sample text set, the sample text set including a plurality of sample texts and emotion classification tags corresponding to the sample texts;

Modify the initial cost according to the number distribution of the sentiment classification tags in the sample text set to obtain the modified cost;

Training a lifting algorithm learning model through the sample text set and the modified cost to obtain a text emotion recognition model;

Recognize the text to be recognized through the text emotion recognition model to obtain the emotion recognition result of the text to be recognized.
The computer non-volatile readable storage medium according to claim 18, wherein the emotion classification label includes positive emotional text and negative emotional text;

The modifying the initial cost according to the number distribution of the sentiment classification tags in the sample text set, and obtaining the modified cost includes:

Obtain initial costs cost 10 and cost 01 , cost 10 is the initial cost of mistaken positive emotion text as negative emotion text, cost 01 is the initial cost of mistaken negative emotion text as positive emotion text;

Count the number of positive emotional text Q1 and the number of negative emotional text Q0 in the sample text set;

Modify the initial cost by the following formula to obtain the modified cost:

Among them, R 10 is the sample deviation ratio, costm 10 is the correction cost of mistaken positive emotion text as negative emotion text, costm 01 is the correction cost of mistaken negative emotion text as positive emotion text, and a is the exponential parameter.
The computer non-volatile readable storage medium according to claim 18, characterized in that the training model of a lifting algorithm is trained through the sample text set and the correction cost to obtain a text emotion recognition model, and The device is used to perform the following steps:

Divide the sample text set into a training subset T and a verification subset D, D = {x 1 , x 2 … x m };

Training the lifting algorithm learning model using the training subset T;

Acquiring the emotion recognition result f (xi) of each sample text xi in the verification subset D through the lifting algorithm learning model;

Calculate the error rate of the improved algorithm learning model according to the following formula:

If the error rate is lower than the learning threshold, it is determined that the lifting algorithm learning model training is completed, and the trained training algorithm learning model is determined as the text emotion recognition model;

Where m is the number of sample texts in the verification subset, i ∈ [1, m]; E is the error rate of the learning model of the lifting algorithm, and D + is the positive emotion sample text subset of the verification subset D, D- is a negative emotion sample text subset of the verification subset D, and y i is an emotion classification label of the sample text xi.
The computer non-volatile storage medium according to claim 20, wherein when the computer program is executed by the processor, the processor is further caused to perform the following processing:

Calculate the positive sample error rate E + and negative sample error rate E- of the lifting algorithm learning model according to the following formulas:

Calculate the error rate ratio of the learning model of the lifting algorithm according to formula (7):

If the error rate ratio is within a preset range, continue to detect whether the error rate is lower than the learning threshold.

Where s is the number of positive sentiment sample texts in the verification subset D, v is the number of negative sentiment sample texts in the verification subset D, and m = s + v.
The computer non-volatile storage medium according to claim 21, wherein when the computer program is executed by the processor, the processor is further caused to perform the following processing:

If the error rate ratio is not within the preset range, the training subset T is used again to train the lifting algorithm learning model;

Recalculate the error rate ratio of the learning model of the lifting algorithm by the following formula:

It is checked again whether the error rate ratio is within the preset range.