CN108595568B

CN108595568B - Text emotion classification method based on great irrelevant multiple logistic regression

Info

Publication number: CN108595568B
Application number: CN201810332338.5A
Authority: CN
Inventors: 雷大江; 张红宇; 陈浩; 张莉萍; 吴渝; 杨杰; 程克非
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2018-04-13
Filing date: 2018-04-13
Publication date: 2022-05-17
Anticipated expiration: 2038-04-13
Also published as: CN108595568A

Abstract

The invention provides a text emotion classification method based on maximally uncorrelated multiple logistic regression, comprising the following steps: acquiring text data and preprocessing it; on the basis of the cost function of a first model, obtaining the cost function of a second model by introducing a penalty term on correlated parameters; feeding the preprocessed training data into the derivative of the cost function of the second model and solving to obtain the second model, where the first model is a multiple logistic regression model and the second model is a maximally uncorrelated multiple logistic regression model; and feeding the preprocessed data to be predicted into the second model to obtain the emotion category of each text entry in the data to be predicted. Adding the uncorrelated constraint term makes the method more robust to redundant data; it reduces the complexity of the traditional multiple logistic regression model and gives stronger generalization capability, so that the text entries in the acquired target text data can be classified accurately.

Description

Text emotion classification method based on great irrelevant multiple logistic regression

Technical Field

The invention relates to the field of machine learning, and in particular to a text emotion classification method based on maximally uncorrelated multiple logistic regression.

Background

Classification is a key part of machine learning and data mining, with wide application in image recognition, drug development, speech recognition, handwriting recognition and similar areas. It is a supervised learning problem: identifying which class a new instance belongs to on the basis of a known training set. For a classification algorithm, nonlinear classification capability and extensibility to multi-class problems are both important.

A support vector machine (SVM) is a classical binary classifier that uses the hinge loss and finds the optimal boundary between classes by solving a constrained quadratic optimization problem. Compared with other algorithms, its important advantage is that it can perform both linear and nonlinear classification by using different kernel functions. However, SVMs are very limited in multi-class classification because they rely on a one-versus-one scheme, and despite many efforts to extend SVMs to the multi-class case, these extensions still have drawbacks. For example, the one-versus-rest decision method is strongly affected by class imbalance in the data set. Another important issue is that the same instance may be assigned to several classes. Many methods have been proposed to address these problems, but they all bring other adverse effects, such as reduced efficiency. In addition, the output of an SVM is a purely binary decision and does not support probabilistic output. The numerical output of an SVM on one task is not comparable with its numerical output on another task. Furthermore, compared with a confidence-based classifier, such unbounded values make it difficult for the end user to interpret what lies behind them.

Logistic regression (LR) is one of the important classification methods. Standard logistic regression uses the logistic loss and classifies by a coefficient-weighted linear combination of the input variables. Through a nonlinear mapping, logistic regression greatly reduces the weight of points far from the classification plane and increases the weight of the data points most relevant to classification. Unlike a support vector machine, it can give an estimate of the class distribution for a given input, and it also has a great advantage in model training time. The logistic regression model is relatively simple and easy to understand, and it is convenient to implement for large-scale linear classification. Furthermore, standard logistic regression is more easily extended to multi-class classification than a support vector machine. Several improved logistic regression algorithms, such as sparse logistic regression and weighted logistic regression, perform well in their respective fields.

However, logistic regression can only be applied to two-class problems and cannot be applied directly to multi-class problems (k > 2 classes). Two kinds of extension are generally used to solve multi-class problems with logistic regression. The first builds k independent binary classifiers; each classifier marks the samples of one class as positive and the samples of all other classes as negative. For a given test sample, each classifier yields the probability that the sample belongs to its class, so multi-class classification can be performed by taking the class with the maximum probability. The second is multiple logistic regression (MLR), a generalization of the logistic regression model to the multi-class problem. Which method to choose generally depends on whether the classes are mutually exclusive. In multi-class problems the classes are usually mutually exclusive, so multiple logistic regression generally gives better results than independent logistic regression classifiers. Moreover, multiple logistic regression only needs to be trained once, so it runs faster.

In the field of computer information processing, a text data set usually contains a large amount of shared information, which greatly increases the complexity of recognition and the recognition error. Although multiple logistic regression trains a group of parameters for each category to calculate the corresponding probability, it does not consider whether the parameter groups are correlated with one another. A text emotion classification method based on maximally uncorrelated multiple logistic regression is therefore of practical significance.

Disclosure of Invention

In order to solve the above technical problem, the invention provides a text emotion classification method based on maximally uncorrelated multiple logistic regression, which comprises the following steps:

acquiring text data and preprocessing the text data; the text data comprises training data and data to be predicted; the data to be predicted comprises a plurality of text entries;

on the basis of the cost function of a first model, obtaining the cost function of a second model by introducing a penalty term on correlated parameters;

inputting the training data obtained by preprocessing into the derivative function of the cost function of the second model, and solving to obtain the second model; the first model is a multiple logistic regression model, and the second model is a maximally uncorrelated multiple logistic regression model;

and inputting the data to be predicted obtained by preprocessing into the second model to obtain the emotion category to which each text entry in the data to be predicted belongs.

Further, the step of inputting the preprocessed data to be predicted into the second model to obtain the emotion category to which each text entry in the data to be predicted belongs includes:

inputting each text entry in the data to be predicted obtained through preprocessing into the second model to obtain the text emotion category probability of each text entry;

setting a classification threshold;

when the text emotion category probability of the text entry is larger than the classification threshold value, judging that the text entry belongs to a first emotion category;

and when the text emotion category probability of the text entry is less than or equal to the classification threshold value, judging that the text entry belongs to a second emotion category.

Further, obtaining the cost function of the second model by introducing a penalty term on correlated parameters on the basis of the cost function of the first model includes:

obtaining the negative log-likelihood function of the model parameters of the first model;

acquiring the uncorrelated constraint term;

and introducing the uncorrelated constraint term into the cost function of the first model to obtain the cost function of the second model.

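Further, the first model is:

$$p(y = j \mid x; \theta) = \frac{e^{\theta_j^T x}}{\sum_{l=1}^{k} e^{\theta_l^T x}}, \quad j = 1, \dots, k$$

where $\theta = (\theta_1, \dots, \theta_k)$ collects the parameter vectors of the k classes.

The negative log-likelihood function of the parameter θ of the first model is:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y_i = j\} \log \frac{e^{\theta_j^T x_i}}{\sum_{l=1}^{k} e^{\theta_l^T x_i}}$$

The negative log-likelihood function is the cost function of the first model, where m is the number of independent samples and $1\{\cdot\}$ is the indicator function.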

Further, the irrelevant constraint term is:

The uncorrelated constraint term is the penalty term on correlated parameters, where θ_i and θ_j are any two different groups of parameters.

The cost function of the second model is:

Further, the derivative function of the cost function of the second model is:

The text emotion classification method based on maximally uncorrelated multiple logistic regression has the following technical effects:

On the basis of the traditional multiple logistic regression model, the cost function of the maximally uncorrelated multiple logistic regression model is obtained by introducing a penalty term on correlated parameters (the uncorrelated constraint term), and the maximally uncorrelated multiple logistic regression model is obtained from the derivative of this cost function. Adding the uncorrelated constraint term makes the method more robust to redundant data; the complexity of the traditional multiple logistic regression model is reduced, and the resulting new classification model (the maximally uncorrelated multiple logistic regression model) has stronger generalization capability, so that the text entries in the acquired target text data can be classified accurately.

Drawings

In order to illustrate the embodiments of the present invention, or the technical solutions and advantages of the prior art, more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a text emotion classification method based on maximally uncorrelated multiple logistic regression according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating an example of target text data provided by an embodiment of the present invention;

FIG. 3 is a flowchart of a method for determining the emotion category to which each text entry belongs according to an embodiment of the present invention;

FIG. 4 is a flowchart of a method for obtaining the cost function of the second model according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating the magnitudes of the MLR and UMLR parameter norms on the MNIST data set provided in an embodiment of the present invention;

FIG. 6 is a diagram illustrating the magnitudes of the MLR and UMLR parameter norms on the COIL20 data set provided in an embodiment of the present invention;

FIG. 7 is a diagram illustrating the magnitudes of the MLR and UMLR parameter norms on the ORL data set provided in an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be noted that the prior-art logistic regression (LR) algorithm and the l₂-constrained multiple logistic regression (RMLR) algorithm have certain shortcomings in classification applications; an improved algorithm, maximally uncorrelated multiple logistic regression, is therefore proposed.

Logistic Regression (LR) algorithm:

For logistic regression, assume a data set $\mathcal{D} = \{(x_i, y_i)\}$, $i = 1, \dots, m$, with $x_i \in \mathbb{R}^D$ and $y_i \in \{0, 1\}$. The input vector is $x = (x^{(1)}, \dots, x^{(D)})$ and the class label $y$ is binary: $y = 0$ or $y = 1$. Logistic regression (LR) is based on the following probabilistic model:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

where

$$g(z) = \frac{1}{1 + e^{-z}}$$

is called the logistic function or sigmoid function.

For the binary problem, assume that $y$ takes the value 0 or 1 and follows a Bernoulli distribution; then:

$$p(y = 1 \mid x; \theta) = h_\theta(x)$$

$$p(y = 0 \mid x; \theta) = 1 - h_\theta(x)$$

The two formulas can be combined as follows:

$$p(y \mid x; \theta) = h_\theta(x)^{y} \left(1 - h_\theta(x)\right)^{1 - y} \tag{2}$$

where $y \in \{0, 1\}$. Assuming that the m samples are independent, the likelihood function of the parameter θ can be written as:

$$L(\theta) = \prod_{i=1}^{m} p(y_i \mid x_i; \theta) = \prod_{i=1}^{m} h_\theta(x_i)^{y_i} \left(1 - h_\theta(x_i)\right)^{1 - y_i}$$

The log-likelihood function can be expressed as:

$$l(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log\left(1 - h_\theta(x_i)\right) \right]$$

The optimal θ can be obtained by maximizing $l(\theta)$. Usually one sets

$$J(\theta) = -\frac{1}{m} l(\theta)$$

to obtain the loss function corresponding to $l(\theta)$, and solves for the optimal θ by minimizing this loss function. However, logistic regression can only deal with two-class problems and cannot be applied directly to multi-class problems.
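The logistic model and its loss can be illustrated with a minimal sketch. It assumes the loss is the averaged negative log-likelihood and minimizes it with plain gradient descent; the toy data, the learning rate `eta` and all function names are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """h_theta(x) = g(theta^T x), the modelled probability that y = 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def loss(theta, X, y):
    """Averaged negative log-likelihood, assumed here as the loss derived from l(theta)."""
    total = 0.0
    for xi, yi in zip(X, y):
        h = predict_proba(theta, xi)
        total += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return -total / len(X)

def gradient_step(theta, X, y, eta):
    """One gradient-descent step; the gradient is (1/m) * sum((h - y) * x)."""
    m = len(X)
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = predict_proba(theta, xi) - yi
        for d, xd in enumerate(xi):
            grad[d] += err * xd / m
    return [t - eta * g for t, g in zip(theta, grad)]

# Toy data (hypothetical): the first feature is a constant bias input of 1.0.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
theta = [0.0, 0.0]
initial_loss = loss(theta, X, y)
for _ in range(200):
    theta = gradient_step(theta, X, y, eta=0.5)
final_loss = loss(theta, X, y)
```

With all parameters at zero every sample gets probability 0.5, so the initial loss equals log 2; the loss then falls as the parameters move toward a separating direction.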

l₂-constrained multiple logistic regression (RMLR) algorithm:

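Because traditional logistic regression cannot handle multi-class problems, multiple logistic regression (MLR) adapts logistic regression to the multi-class case by modifying its cost function.

Suppose the data set is $\mathcal{D} = \{(x_i, y_i)\}$, $i = 1, \dots, m$, with $x_i \in \mathbb{R}^D$ and $y_i \in \{1, \dots, k\}$ ($k > 2$), and the input vector is $x = (x^{(1)}, \dots, x^{(D)})$. Multiple logistic regression (MLR) is based on the following probabilistic model:

$$p(y = j \mid x; \theta) = \frac{e^{\theta_j^T x}}{\sum_{l=1}^{k} e^{\theta_l^T x}}, \quad j = 1, \dots, k$$

where $\theta = (\theta_1, \dots, \theta_k)$ collects the parameter vectors of the k classes, and the hypothesis function $h_\theta(x)$ is the vector of these k probabilities.

The cost function is:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y_i = j\} \log \frac{e^{\theta_j^T x_i}}{\sum_{l=1}^{k} e^{\theta_l^T x_i}}$$

However, multiple logistic regression has an unusual property: its set of parameters is "redundant". Suppose a fixed vector ψ is subtracted from each parameter vector $\theta_j$, so that every $\theta_j$ becomes $\theta_j - \psi$ ($j = 1, \dots, k$). The hypothesis function then becomes:

$$p(y = j \mid x; \theta) = \frac{e^{(\theta_j - \psi)^T x}}{\sum_{l=1}^{k} e^{(\theta_l - \psi)^T x}} = \frac{e^{\theta_j^T x} e^{-\psi^T x}}{\sum_{l=1}^{k} e^{\theta_l^T x} e^{-\psi^T x}} = \frac{e^{\theta_j^T x}}{\sum_{l=1}^{k} e^{\theta_l^T x}}$$

This shows that subtracting ψ from $\theta_j$ does not affect the prediction of the hypothesis function at all; that is, the multiple logistic regression model described above has redundant parameters.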
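The redundancy described above can be checked numerically. The following is a minimal sketch with hypothetical parameter values: it computes the class probabilities before and after subtracting the same vector ψ from every parameter vector.

```python
import math

def softmax_probs(thetas, x):
    """p(y = j | x; theta) = exp(theta_j^T x) / sum_l exp(theta_l^T x)."""
    scores = [sum(t * xi for t, xi in zip(theta_j, x)) for theta_j in thetas]
    top = max(scores)  # subtracting the largest score avoids overflow
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

thetas = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]  # hypothetical parameters, k = 3
psi = [2.0, -3.0]                                # arbitrary fixed vector
shifted = [[t - p for t, p in zip(theta_j, psi)] for theta_j in thetas]

x = [1.0, 2.0]
original = softmax_probs(thetas, x)
after_shift = softmax_probs(shifted, x)
```

`original` and `after_shift` agree up to floating-point rounding, because the common factor exp(-ψᵀx) cancels between numerator and denominator.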

To address the over-parameterization of the multiple logistic regression model, the l₂-constrained multiple logistic regression (RMLR) algorithm modifies the cost function by adding a weight decay term that penalizes excessively large parameter values. This turns the cost function into a strictly convex function and thus guarantees a unique solution. The cost function is:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y_i = j\} \log \frac{e^{\theta_j^T x_i}}{\sum_{l=1}^{k} e^{\theta_l^T x_i}} + \frac{\lambda}{2} \sum_{j=1}^{k} \|\theta_j\|_2^2$$

where m is the number of samples, k is the number of classes, $1\{\cdot\}$ is the indicator function and $\lambda > 0$ is the weight of the decay term.

The Hessian matrix then becomes invertible, and because the cost function is convex, an optimization algorithm is guaranteed to converge to the global optimum. Although the RMLR algorithm alleviates overfitting to some extent, it performs poorly on data sets with redundant information.
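Why a weight decay term removes the ambiguity can be seen in a small sketch. It assumes the data term is the averaged negative log-likelihood and the decay term is (λ/2)·Σ‖θ_j‖² (the factor 1/2 is an assumed convention); the data, parameters and names are hypothetical.

```python
import math

def nll(thetas, X, y):
    """Averaged negative log-likelihood of multiple logistic regression."""
    total = 0.0
    for xi, yi in zip(X, y):
        scores = [sum(t * v for t, v in zip(th, xi)) for th in thetas]
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total += scores[yi] - log_norm  # log p(y = yi | xi; theta)
    return -total / len(X)

def l2_penalty(thetas, lam):
    """Weight decay term (lam / 2) * sum_j ||theta_j||^2 (assumed convention)."""
    return 0.5 * lam * sum(t * t for th in thetas for t in th)

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [0, 1, 2]  # class indices 0..k-1, following Python convention
thetas = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
psi = [3.0, 3.0]
shifted = [[t - p for t, p in zip(th, psi)] for th in thetas]
lam = 0.1

data_term_gap = abs(nll(thetas, X, y) - nll(shifted, X, y))
penalty_original = l2_penalty(thetas, lam)
penalty_shifted = l2_penalty(shifted, lam)
```

The data term cannot tell the two parameter sets apart, while the penalty differs, so the penalized cost prefers exactly one of the otherwise equivalent solutions.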

Based on the above analysis, a maximally uncorrelated multiple logistic regression model is further proposed. Specifically, this embodiment provides a text emotion classification method based on maximally uncorrelated multiple logistic regression. As shown in FIG. 1, the method includes:

S101, acquiring text data and preprocessing the text data; the text data comprises training data and data to be predicted; the data to be predicted comprises a plurality of text entries;

For example, review text written by consumers after visiting a store is read; the text data consists of these consumer reviews. As shown in FIG. 2, the first column is the text label column, where 0 represents a positive comment and 1 represents a negative comment, and the second column is the consumer review column. Because the original text data contains a large amount of noise, it is not suitable for direct training and must be preprocessed accordingly. Preprocessing the review text data set specifically comprises:

in step S103 and step S106, the method for preprocessing the RDD includes:

acquiring the separator characters in the text comment sentence to be processed, and replacing them with empty strings;

acquiring special character strings, numbers and the like in the comment sentence, and replacing them with empty strings;

acquiring words that express a vague tone in the comment sentence, and converting them into words with absolute meaning, so that vague expressions become absolute expressions;

adding a custom dictionary, and adding to it the nouns that occur frequently in the text comment sentences to be processed;

performing word segmentation on the processed comment sentences, and filtering out the stop words;

and converting the words of the segmented comment sentences into vectors to generate word vectors.

Specifically, the method for preprocessing the text to be processed comprises the following steps:

Text in the comments that begins and ends with "#" is matched with the function re.compile('#([^#]*)#') and replaced with an empty string; re is Python's regular expression module, and its functions can be called directly to perform regular-expression matching on strings.

Special character strings, numbers and the like in the comment are matched with the function re.compile(u'[^\u4e00-\u9fa5|a-zA-Z]+') and replaced with empty strings;

Replacing the comment text with a function flash. Vague expressions are converted into absolute expressions; for example, "what is not so" is replaced with "bad" and "not special" with "not".

A custom dictionary is added: nouns that occur frequently in the text data set are added to the dictionary as new words, which improves segmentation accuracy. Adding scene-specific nouns to the custom dictionary lets word segmentation be completed more efficiently and accurately.

The comment is segmented with the function jieba.cut(), and the stop words in the comment are filtered out. A stop word is a word that does not help the text classification target, such as 'of', 'on', 'o', etc.; different scenes have different stop word lists, and the stop words in the text are deleted according to the corresponding list.

The comment data set that has completed the word segmentation is converted into a word2vec model using the function generic.
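The cleaning and stop-word steps above can be sketched with Python's standard `re` module alone. The two patterns, the stop-word list and the function name `preprocess` are assumptions made for illustration, not the patterns of the original implementation; spaces are kept so that the default whitespace tokenizer works on the English example, and for Chinese text a word segmenter such as `jieba.cut` would be passed as `tokenize` instead.

```python
import re

# Assumed patterns, written for illustration only.
HASHTAG = re.compile(r'#[^#]*#')                    # text wrapped in a pair of '#'
NON_TEXT = re.compile(r'[^\u4e00-\u9fa5a-zA-Z ]+')  # not CJK, not a Latin letter, not a space
STOP_WORDS = {'the', 'of', 'on', 'a'}               # hypothetical stop-word list

def preprocess(comment, tokenize=str.split, stop_words=STOP_WORDS):
    """Clean one comment and return its tokens with stop words removed."""
    text = HASHTAG.sub('', comment)  # drop '#...#' segments
    text = NON_TEXT.sub('', text)    # drop digits, punctuation and other symbols
    return [w for w in tokenize(text) if w.lower() not in stop_words]

tokens = preprocess('#weekend deal# The food was great, 10/10 on taste!!')
```

The tokenizer is a parameter so that the cleaning logic stays independent of the segmentation library in use.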

For a comment, a word vector of each word is generated by using a word2vec model, and the word vectors of all the words in the comment are averaged according to dimensions to obtain word vector representation of the comment. Assume that the comment dataset contains n non-identical words. If a sentence contains m words, the word vector of each word of the sentence is shown as equation (81):

the word vector of the sentence is as shown in equation (82):

where

and repeating the step of generating the word vector corresponding to each word by using the word2vec model, thereby obtaining the word vector representation of the whole comment data set.
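The dimension-wise averaging of word vectors can be sketched as follows. The small dictionary stands in for a trained word2vec model and its values are hypothetical; skipping words that have no vector is a choice made in this sketch, not something the description specifies.

```python
def sentence_vector(words, word_vectors):
    """Average the vectors of the words in one comment, dimension by dimension."""
    vectors = [word_vectors[w] for w in words if w in word_vectors]
    if not vectors:
        raise ValueError('no word of the comment has a vector')
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

# Hypothetical 3-dimensional word vectors standing in for a trained word2vec model.
word_vectors = {
    'food':  [0.2, 0.4, -0.6],
    'great': [0.6, 0.0, 0.2],
    'taste': [0.1, 0.5, 0.1],
}
vec = sentence_vector(['food', 'great', 'taste'], word_vectors)
```

Applying `sentence_vector` to every comment yields one fixed-length feature vector per comment, which is the input format the classifier expects.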

S102, on the basis of the cost function of the first model, obtaining the cost function of a second model by introducing a penalty term on correlated parameters;

S103, inputting the training data obtained through preprocessing into the derivative function of the cost function of the second model, and solving to obtain the second model; the first model is a multiple logistic regression model, and the second model is a maximally uncorrelated multiple logistic regression model;

and S104, inputting the data to be predicted obtained through preprocessing into the second model to obtain the emotion category of each text entry in the data to be predicted.

That is, a penalty term on correlated parameters is introduced to establish the maximally uncorrelated multiple logistic regression model; the processed, formatted data is then input into the model to predict the emotion category of each review text.

Specifically, in step S104, inputting the preprocessed data to be predicted into the second model to obtain the emotion category to which each text entry in the data to be predicted belongs includes, as shown in FIG. 3:

S104a, inputting each text entry in the data to be predicted obtained through preprocessing into the second model to obtain the text emotion category probability of each text entry;

S104b, setting a classification threshold;

Preferably, for the binary classification case of multiple logistic regression, the classification threshold is 0.5.

S104c, when the text emotion category probability of the text entry is greater than the classification threshold, judging that the text entry belongs to a first emotion category;

S104d, when the text emotion category probability of the text entry is less than or equal to the classification threshold, judging that the text entry belongs to a second emotion category.

For example, the classification threshold is set to 0.5. When the class probability that the model computes for a sample is greater than 0.5, the comment is marked as 1 and treated as a positive comment; when the probability is less than or equal to 0.5, the comment is marked as 0 and treated as a negative comment.
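The threshold rule of steps S104b to S104d can be sketched in a few lines. The probabilities are hypothetical model outputs, and the category names 'first' and 'second' simply mirror the wording of the steps.

```python
def classify(probability, threshold=0.5):
    """Return 'first' if probability > threshold, otherwise 'second'."""
    return 'first' if probability > threshold else 'second'

probabilities = [0.91, 0.50, 0.12]  # hypothetical model outputs
labels = [classify(p) for p in probabilities]
```

A probability exactly equal to the threshold falls into the second category, matching the "less than or equal to" condition of step S104d.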

In the field of computer information processing, a data set usually contains a large amount of shared information, which greatly increases the complexity of recognition and the recognition error. Although multiple logistic regression trains k groups of parameters to calculate the corresponding probability for each category, it does not consider whether the k groups of parameters are correlated. Suppose the parameters $(\theta_1, \theta_2, \dots, \theta_k)$ are a minimum point of the cost function and any parameter $\theta_i$ can be linearly represented by the other $\theta_j$ ($j \ne i$), i.e.

$$\theta_i = \lambda_0 + \sum_{j \ne i} \lambda_j \theta_j \tag{9}$$

This indicates that the parameters of different classes are correlated. Although $l_2$ regularization constrains the elements within each group of parameters, it does not consider correlation between the parameters of different classes, so it classifies poorly on data sets with much redundant information. For any two different groups of parameters $\theta_i$ and $\theta_j$, according to the basic inequality:

where the maximum value is attained if and only if $\theta_i = \theta_j$.

If $\theta_i$ and $\theta_j$ are correlated, i.e. $\theta_i = \lambda_0 + \lambda_j \theta_j$, then

the value is large, so an uncorrelated constraint term is added:

This constraint penalizes correlated parameters so that as many uncorrelated, discriminative features as possible are retained. And because

the cost function is obtained as:

To apply an optimization algorithm, the derivative of $J(\theta)$ is computed as follows:

According to this derivation, the uncorrelated parameters θ can be obtained quickly by the gradient descent algorithm and its improved variants.
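The role of the basic inequality invoked above can be illustrated numerically. The inequality itself is not reproduced in this text; the sketch assumes one standard form, θ_i·θ_j ≤ (‖θ_i‖² + ‖θ_j‖²)/2, whose gap equals ‖θ_i − θ_j‖²/2 and is therefore zero exactly when the two vectors coincide. The vectors and function names are hypothetical, and this is not claimed to be the exact constraint term of the method.

```python
def dot(a, b):
    """Inner product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))

def bound_gap(a, b):
    """(||a||^2 + ||b||^2) / 2 - a.b, which equals ||a - b||^2 / 2 >= 0."""
    return 0.5 * (dot(a, a) + dot(b, b)) - dot(a, b)

theta_i = [1.0, 2.0, -1.0]
same = bound_gap(theta_i, theta_i)                 # identical parameter groups
orthogonal = bound_gap(theta_i, [2.0, -1.0, 0.0])  # an orthogonal direction
```

For identical parameter groups the inner product reaches its upper bound (gap 0), while for orthogonal groups the inner product is 0 and the gap is as large as the bound itself, which is why a large inner product signals correlated parameters.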

Specifically, in this embodiment, obtaining the cost function of the second model in step S102 by introducing a penalty term on correlated parameters on the basis of the cost function of the first model includes, as shown in FIG. 4:

S102a, obtaining the negative log-likelihood function of the model parameters of the first model;

S102b, acquiring the uncorrelated constraint term;

S102c, introducing the uncorrelated constraint term into the cost function of the first model to obtain the cost function of the second model.

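Correspondingly, the first model is:

$$p(y = j \mid x; \theta) = \frac{e^{\theta_j^T x}}{\sum_{l=1}^{k} e^{\theta_l^T x}}, \quad j = 1, \dots, k$$

where $\theta = (\theta_1, \dots, \theta_k)$ collects the parameter vectors of the k classes.

The negative log-likelihood function of the parameter θ of the first model is:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y_i = j\} \log \frac{e^{\theta_j^T x_i}}{\sum_{l=1}^{k} e^{\theta_l^T x_i}}$$

The negative log-likelihood function is the cost function of the first model, where m is the number of independent samples and $1\{\cdot\}$ is the indicator function. Further, the uncorrelated constraint term is:

The uncorrelated constraint term is the penalty term on correlated parameters, where $\theta_i$ and $\theta_j$ are any two different groups of parameters; the cost function of the second model is:

Further, the derivative function of the cost function of the second model is: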

In view of the above, the algorithm comprises the following steps:

Input: training set D = {(x₁, y₁), (x₂, y₂), …, (x_m, y_m)};

Process:

Initialize λ, η, Θ

While the stopping criterion is not satisfied do:

For j = 1, 2, …, k:

Θ = L-BFGS(Loss, dΘ)

Output: regression coefficients Θ
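The overall shape of this training loop can be sketched in runnable form under two stated substitutions: plain gradient descent replaces L-BFGS, and the only penalty used is an l₂ weight decay term (λ/2)·Σ‖θ_j‖², because the formula of the uncorrelated constraint term is not reproduced in this text. The data, hyperparameter values and function names are hypothetical.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grad(thetas, X, y, lam):
    """Averaged negative log-likelihood plus (lam/2)*sum ||theta_j||^2, with its gradient."""
    m, k, n = len(X), len(thetas), len(X[0])
    loss = 0.5 * lam * sum(t * t for th in thetas for t in th)
    grad = [[lam * t for t in th] for th in thetas]
    # A further penalty (such as an uncorrelated constraint term) would add
    # its value to `loss` and its derivative to `grad` at this point.
    for xi, yi in zip(X, y):
        p = softmax([sum(t * v for t, v in zip(th, xi)) for th in thetas])
        loss -= math.log(p[yi]) / m
        for j in range(k):
            err = p[j] - (1.0 if j == yi else 0.0)
            for d in range(n):
                grad[j][d] += err * xi[d] / m
    return loss, grad

def train(X, y, k, lam=0.01, eta=0.3, max_iter=500, tol=1e-6):
    thetas = [[0.0] * len(X[0]) for _ in range(k)]  # initialize Theta
    history = []
    for _ in range(max_iter):
        loss, grad = loss_and_grad(thetas, X, y, lam)
        history.append(loss)
        if len(history) > 1 and abs(history[-2] - loss) < tol:  # stopping criterion
            break
        for j in range(k):  # update each class's parameter vector
            thetas[j] = [t - eta * g for t, g in zip(thetas[j], grad[j])]
    return thetas, history

def predict(thetas, x):
    p = softmax([sum(t * v for t, v in zip(th, x)) for th in thetas])
    return p.index(max(p))

# Toy data (hypothetical): a bias input of 1.0 followed by two features, k = 3 classes.
X = [[1.0, 2.0, 0.0], [1.0, 0.0, 2.0], [1.0, -2.0, -2.0],
     [1.0, 1.5, 0.2], [1.0, 0.1, 1.8], [1.0, -1.5, -1.7]]
y = [0, 1, 2, 0, 1, 2]
thetas, history = train(X, y, k=3)
```

Swapping the update rule for a quasi-Newton routine would leave `loss_and_grad` unchanged, since such optimizers also consume only the loss value and its gradient.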

Further, a convergence analysis of the maximally uncorrelated multiple logistic regression algorithm is performed.

From the loss function of maximally uncorrelated multiple logistic regression:

the following is obtained:

J(θ) is a strictly convex function because its second derivative is always greater than 0.

The convergence of the algorithm can be verified using the online learning framework for algorithm analysis together with the convergence analysis of the Adam algorithm.

Further, the maximally uncorrelated multiple logistic regression (UMLR) algorithm proposed by the present invention was evaluated. The experiments focus on two questions: classification accuracy and execution speed. The classification algorithms used for comparison include multiple logistic regression with weight decay, support vector machines, and parameter-uncorrelated multiple logistic regression. The experiments use artificial data sets with different degrees of correlation and four real data sets (MNIST, COIL20, GT and ORL), and the results are verified by cross-validation.
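How a fold-based verification splits the samples can be sketched as follows. The number of folds is not stated here, so `n_folds` is a hypothetical parameter; the sketch also uses contiguous folds, whereas in practice the sample order would normally be shuffled first.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) for each of n_folds contiguous folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, n_folds=3))
```

Each sample appears in exactly one test fold, so averaging the per-fold recognition rates uses every sample for testing once.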

(1) Normalization

Suppose $\Phi(x)_{\min}$ and $\Phi(x)_{\max}$ are respectively the minimum and maximum values in the data set. For a sample $\Phi(x)$, the normalization is:

$$\Phi(x)_{\text{norm}} = \frac{\Phi(x) - \Phi(x)_{\min}}{\Phi(x)_{\max} - \Phi(x)_{\min}}$$

Normalization converts dimensional quantities into dimensionless ones, which solves the problem of unbalanced contributions from the data.
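Normalization by the minimum and maximum of a data set can be sketched as follows, assuming the usual min-max form (value − min) / (max − min); the function name and the handling of a constant feature are choices made for this illustration.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1] via (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: map every value to 0.0 (a choice made here)
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

normalized = min_max_normalize([2.0, 4.0, 10.0])
```

After this step every feature lies in the same range, so no feature dominates the linear scores merely because of its unit of measurement.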

(2) Experimental results on an artificial data set

To verify the validity of the algorithm on linearly dependent data, an artificial data set was generated as follows: the intra-class correlation is greater than 0.9, and the inter-class similarities are 0.5, 0.6, 0.7, 0.8 and 0.9 respectively.

The sample size and data dimension are (m, n) = (5000, 1000), with 5 classes in total and 1000 samples per class.

The following compares the recognition rates of the maximally uncorrelated multiple logistic regression algorithm and the l₂-constrained multiple logistic regression algorithm on data with different degrees of correlation.

TABLE 1 recognition rates of MLR, UMLR different correlation data sets

(3) Experimental results on MNIST and COIL20 data sets

The MNIST data set is widely used in the field of pattern recognition. It contains 10 categories corresponding to the handwritten digits 0-9, with 5000 pictures per category. The COIL20 data set has 20 different categories with 72 pictures in each category.

TABLE 2 recognition rates of SVM, MLR, UMLR on the MNIST and COIL20 data sets

The above table shows the accuracy of the three algorithms on the two data sets. FIG. 5 is a diagram illustrating the magnitudes of the MLR and UMLR parameter norms on the MNIST data set; FIG. 6 is a diagram illustrating the magnitudes of the MLR and UMLR parameter norms on the COIL20 data set. In FIG. 5 and FIG. 6, the left side shows the histogram of UMLR parameter norms for the corresponding data set and the right side shows the histogram of MLR parameter norms for the corresponding data set.

(4) Experimental results on the GT and ORL datasets

The GT data set has 50 categories, each containing 15 pictures. The ORL dataset contains 20 categories, each containing 10 pictures.

TABLE 3 recognition rates of SVM, MLR, UMLR against GT, ORL data sets

FIG. 7 is a diagram illustrating the magnitudes of the MLR and UMLR parameter norms on the ORL data set; the left side of FIG. 7 shows the histogram of UMLR parameter norms and the right side shows the histogram of MLR parameter norms for that data set.

(5) Analysis of Experimental results

The experimental results show that maximally uncorrelated multiple logistic regression achieves higher classification accuracy than the l₂-constrained multiple logistic regression algorithm and the support vector machine algorithm. The effect is especially pronounced on data sets with high inter-class correlation, which shows that the method is highly robust to redundant data. Its converged parameters are smaller than those of l₂-constrained multiple logistic regression, which generally indicates stronger generalization capability.

As this analysis of the experimental results suggests, classification, as an important branch of pattern recognition and data mining, has an increasingly wide range of applications and is gradually becoming a core technology of systems for public security and criminal investigation, electronic payment, medical treatment and other fields.

The invention provides a maximally uncorrelated multiple logistic regression model, which constructs a novel classifier on the basis of the multiple logistic regression model. The experimental results show that the method has advantages over traditional classification algorithms in classification accuracy and robustness, and that the trained model is more interpretable than methods such as support vector machines and naive Bayes.

In summary, the text emotion classification method based on the largely irrelevant multiple logistic regression provided by the invention has the technical effects that:

It should be noted that: the sequence of the above embodiments of the present invention is only for description, and does not represent the advantages or disadvantages of the embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A text emotion classification method based on maximally uncorrelated multiple logistic regression, characterized by comprising the following steps:

on the basis of the cost function of the first model, obtaining the cost function of the second model by introducing a penalty term on correlated parameters, which comprises: acquiring the uncorrelated constraint term;

inputting the data to be predicted obtained by preprocessing into the second model to obtain the emotion category to which each text entry in the data to be predicted belongs;

wherein the first model is:

wherein

The irrelevant constraint term is:

the uncorrelated constraint term is the penalty term on correlated parameters, wherein θ_i and θ_j are any two different groups of parameters.

2. The method of claim 1, wherein the inputting the preprocessed data to be predicted into the second model to obtain an emotion category to which each text entry in the data to be predicted belongs comprises:

setting a classification threshold;

3. The method according to claim 1 or 2, wherein the obtaining the cost function of the second model by introducing a relevant parameter penalty term on the basis of the cost function of the first model further comprises:

4. The method of claim 3, wherein the negative log-likelihood function for the parameter θ of the first model is:

5. The method of claim 4, wherein the cost function of the second model is:

6. The method of claim 5, wherein the derivative function of the cost function of the second model is: