CN110929030A - Text abstract and emotion classification combined training method - Google Patents

Text abstract and emotion classification combined training method

Info

Publication number
CN110929030A
Authority
CN
China
Prior art keywords
text
abstract
training
vector
emotion classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911080385.6A
Other languages
Chinese (zh)
Other versions
CN110929030B (en)
Inventor
高建彬
潘慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201911080385.6A priority Critical patent/CN110929030B/en
Publication of CN110929030A publication Critical patent/CN110929030A/en
Application granted granted Critical
Publication of CN110929030B publication Critical patent/CN110929030B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a text abstract and emotion classification combined training method, which is realized with a combined text abstract and emotion classification model and specifically comprises the following steps: preprocessing the text and constructing a training set vocabulary; constructing a text abstract model and pre-training it on the text summarization task; and adding an emotion classification layer on top of the text abstract model to build a hierarchical end-to-end model, then jointly training the emotion classification and text summarization tasks. Through the joint training of the two tasks, the method improves the content consistency between the generated abstract and the input text, the generated abstract better preserves the emotional information of the input text, and because the summarization task extracts the key information of the input text, the emotion prediction becomes more accurate.

Description

Text abstract and emotion classification combined training method
Technical Field
The invention relates to text summarization and emotion classification methods in the field of natural language processing, and in particular to a joint training method for text abstracting and emotion classification.
Background
With the explosive growth of text information in recent years, people are exposed to massive amounts of text every day, such as news, microblogs, blogs, reports and papers. Text summarization has broad application scenarios: most directly, it can be used to generate news headlines, paper keywords and abstracts; more broadly, it can also be applied to result optimization for search engines such as Google and Baidu, and any task that requires extracting key information from a text and condensing it into a refined expression can be addressed with automatic text summarization technology. The mainstream methods of text summarization fall into two categories: extractive and abstractive (generative). Extractive methods select representative text segments from the original document set to form the summary; depending on how the input text is segmented, these segments can be sentences, clauses, paragraphs or phrases of the whole document. Abstractive methods are based on deep learning: they adopt a Sequence-to-Sequence framework with an Attention mechanism and generate, for a given input text, a summary containing its key content. Compared with extractive methods, abstractive methods are more complex, but the resulting summaries are more refined and concise, and the expression is more fluent.
Emotion classification assigns an emotion label to a text in order to determine the attitude or opinion expressed in it. It is also known as opinion mining, i.e., extracting the opinions or attitudes of the speaker. Emotion classification methods can be unsupervised or supervised; unsupervised approaches include emotion dictionaries, parsing, syntactic patterns and the like. Supervised approaches traditionally combine classical machine learning methods (such as support vector machines, maximum entropy and naive Bayes) with feature engineering; with the development of deep learning, deep models combining recurrent neural networks (RNN), convolutional neural networks (CNN) and the Attention mechanism have performed well on emotion classification tasks.
Both the text summarization task and emotion classification aim to capture the main idea of a text. A text summary describes the text in a more concrete way using words and sentences, whereas emotion classification summarizes the text in a more abstract way using a label. In previous research work, the text summarization and emotion classification tasks were trained with separate models, so a joint representation of the two tasks could not be learned across the models.
Disclosure of Invention
To address these problems, the invention combines the Attention mechanism in deep learning with a hierarchical end-to-end model framework to jointly train the text summarization and emotion classification tasks, so as to improve the learning effect of both tasks at the same time.
The invention provides a text abstract and emotion classification combined training method, which comprises the following specific steps:
Step 1: preprocess the text. Train language models such as Word2Vec, GloVe, or the more recent ELMo and BERT on large-scale data such as Chinese Wikipedia to obtain a word vector matrix, and compute fixed-length vector representations of Chinese words so that the text can later be expressed as vectors. Construct a suitable training set (each text sample must carry both an abstract label and an emotion category label), perform Chinese word segmentation and part-of-speech extraction on the texts in the training set, and build the training set vocabulary.
Step 2: using the word vector matrix obtained in step 1, represent each segmented training text as a fixed-length vector and feed it to the model as input. Pre-train the text summarization task: construct the text abstract model and, using a large-scale text summarization data set, update the network parameters with a gradient descent algorithm until the loss function of the text abstract model converges.
Step 3: jointly train the text summarization and emotion classification tasks. Initialize with the parameters obtained from the text abstract model training in step 2, add an emotion classification layer on top of the text abstract model, construct a joint loss function over the text summarization and emotion classification tasks, and finally train the whole network end to end with a gradient descent algorithm until the joint loss function converges.
This hierarchy establishes a tight relationship between the text summary and the emotion classification, so the two tasks can promote each other. After the text has been compressed by the text summarization layer, the emotion classifier can more easily predict the emotion label of the condensed text. In addition, by adding a convolutional gating module, the text summarization layer can also learn the importance distribution of the input text, obtain a weight distribution over the words of the input text, and remove redundant or misleading information that would harm emotion prediction. Conversely, the emotion classification task provides an additional supervision signal for the text summarization task and guides the summarization component to capture the emotional tendency of the source text, and joint training of the two tasks improves the content consistency between the generated summary and the input text. Pre-training on a large-scale text summarization data set initializes part of the parameters of the text summarization network, which accelerates network convergence and improves the learning effect. Joint training of the emotion analysis and text summarization tasks with the hierarchical end-to-end model framework raises the level of both text summarization and emotion classification: the generated abstract better preserves the emotional information of the input text, which is convenient for commercial applications, and because the summarization task extracts the key information of the input text, emotion prediction becomes more accurate.
Drawings
FIG. 1 shows the convolutional gating unit of the Encoder part
FIG. 2 is a schematic diagram of a text summarization and emotion classification combined model of the present invention
FIG. 3 is a diagram of the working effect of the text summarization and emotion classification combined model of the present invention
Detailed Description
All of the features disclosed in this application, or all of the steps in any method or process disclosed, may be combined in any combination, except combinations where mutually exclusive features or steps are present. Any feature disclosed in this application (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.
In the invention, a hierarchical end-to-end model is designed to jointly train the emotion classification and text summarization tasks. The hierarchical end-to-end model comprises a text summarization layer and an emotion classification layer: the text summarization layer compresses the source text into a short sentence to generate the text abstract, and the emotion classification layer further summarizes the generated text abstract into an emotion category.
This hierarchy establishes a tight relationship between the text summary and the emotion classification, so the two tasks can promote each other. After the text has been compressed by the text summarization layer, the emotion classifier can more easily predict the emotion label of the condensed text. In addition, a convolutional gating module is added. Referring to FIG. 1, after the output of the recurrent neural network (RNN) encoder, a one-dimensional convolution is realized with an Inception-like structure, where k denotes the convolution kernel size; this module uses a convolutional neural network (CNN) to improve the semantic representation of the RNN output and strengthen its connection with the context. The text summarization layer can thereby learn the importance distribution of the input text, obtain a weight distribution over the words of the input text, and remove redundant or misleading information that would harm emotion prediction. Conversely, the emotion classification task provides an additional supervision signal for the text summarization task and guides the summarization component to capture the emotional tendency of the source text, and joint training of the two tasks improves the content consistency between the generated summary and the input text.
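By way of illustration only, a minimal PyTorch sketch of such an Inception-like convolutional gating unit is given below. The kernel sizes, the sigmoid gate and the class and parameter names are assumptions made for the example; the invention only specifies parallel one-dimensional convolutions of several kernel sizes k applied to the RNN encoder output.

```python
import torch
import torch.nn as nn

class ConvGatingUnit(nn.Module):
    """Illustrative Inception-style 1-D convolutional gate over RNN outputs.

    Several kernel sizes are applied in parallel to the encoder hidden states
    and merged into a gate in (0, 1) that re-weights the original features
    (kernel sizes and the sigmoid gate are assumptions for this sketch).
    """
    def __init__(self, hidden_dim, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(hidden_dim, hidden_dim, k, padding=k // 2)
            for k in kernel_sizes
        ])
        self.proj = nn.Linear(hidden_dim * len(kernel_sizes), hidden_dim)

    def forward(self, rnn_out):                 # rnn_out: (batch, seq_len, hidden_dim)
        x = rnn_out.transpose(1, 2)             # Conv1d expects (batch, channels, seq_len)
        feats = [conv(x) for conv in self.convs]
        merged = torch.cat(feats, dim=1).transpose(1, 2)  # (batch, seq_len, H * n_kernels)
        gate = torch.sigmoid(self.proj(merged))           # per-word importance weights
        return rnn_out * gate                             # gated semantic representation
```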
The method of the invention is realized with the text abstract and emotion classification combined model shown in FIG. 2, which comprises: a pre-training module (implemented with models such as Word2Vec, GloVe, ELMo and BERT), a text abstract module, and a hierarchical end-to-end joint training module.
The pre-training module pre-processes the original texts in the training set; the text abstract module generates the text abstract; and the hierarchical end-to-end module combines the emotion classification task with the text summarization task to produce an emotion classification category adapted to the text abstract.
The text abstract module is implemented as a generative (abstractive) text summarization model combined with an Attention mechanism, and consists of an encoding layer (Encoder) and a decoding layer (Decoder). The Encoder adopts a bidirectional recurrent neural network Bi-LSTM and optimizes the hidden-state semantic representation with a convolutional gating unit and a self-Attention mechanism; the Decoder adopts a Pointer-Generator mechanism to generate the text abstract.
The Attention mechanism is widely used in natural language tasks and can be regarded as an automatic weighting. It is roughly defined as follows: given a set of vectors (the values) and a query vector, the Attention mechanism computes a weighted sum of the values according to the query; the query and each value are vectors of the same dimension, and any one of the score functions below maps such a pair to a number. The following calculation formulas (dot, general, concat and perceptron) are currently mainstream:

f(m_t, m_s) = m_t^T · m_s  (dot)
f(m_t, m_s) = m_t^T · W_a · m_s  (general)
f(m_t, m_s) = v_a^T · tanh(W_a [m_t; m_s])  (concat)
f(m_t, m_s) = v_a^T · tanh(W_a m_t + U_a m_s)  (perceptron)

a_ts = exp(f(m_t, m_s)) / Σ_{s'=1..n} exp(f(m_t, m_s'))

The Attention mechanism is the channel connecting the encoding layer (Encoder) and the decoding layer (Decoder). Since the hidden state of every recurrent neural network (RNN) unit is preserved in the Encoder, let m_s denote the hidden state of the s-th Encoder time step. For each time step of the Decoder, the hidden state of the current step is computed; let m_t denote the hidden state of the t-th time step. The weight of the s-th word of the input text on the encoding side at the t-th time step can then be computed with one of the formulas above, denoted f(m_t, m_s), and a softmax over s yields the final weight a_ts. Here dot, general, concat and perceptron are the four currently mainstream weight calculation formulas, n denotes the total number of words of the input text on the encoding side, W_a denotes a parameter matrix, v_a a parameter vector, and U_a, like W_a, is a parameter matrix; these parameters are updated by back-propagation during training.
If the query is contained in the set of values, this is a self-Attention (Self-Attention) mechanism. The invention introduces a Self-Attention module into the Encoder part to extract the key information of the input text.
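The four score functions named above can be illustrated with the following sketch. It assumes the standard dot / general / concat / perceptron formulations and illustrative parameter shapes (W_a of shape H × H, or H × 2H in concat mode), and is not taken verbatim from the invention.

```python
import torch

def attention_scores(m_t, m_s, W_a=None, U_a=None, v_a=None, mode="dot"):
    """Score f(m_t, m_s) between a decoder state m_t and an encoder state m_s.

    m_t, m_s : 1-D tensors of dimension H
    W_a      : (H, H) parameter matrix, or (H, 2H) in "concat" mode
    U_a      : (H, H) parameter matrix; v_a: (H,) parameter vector
    The four modes mirror the formulas referred to in the text (assumed to be
    the standard dot / general / concat / perceptron variants).
    """
    if mode == "dot":
        return torch.dot(m_t, m_s)
    if mode == "general":
        return torch.dot(m_t, W_a @ m_s)
    if mode == "concat":
        return torch.dot(v_a, torch.tanh(W_a @ torch.cat([m_t, m_s])))
    if mode == "perceptron":
        return torch.dot(v_a, torch.tanh(W_a @ m_t + U_a @ m_s))
    raise ValueError(mode)

# a_ts is then the softmax over s of f(m_t, m_s); the context vector is the
# weighted sum over s of a_ts * m_s.
# Example: m_t, m_s = torch.randn(8), torch.randn(8); attention_scores(m_t, m_s)
```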
The Pointer-Generator mechanism effectively alleviates the problem of out-of-vocabulary (OOV) and low-frequency words in the generated summary. At each Decoder time step, the network automatically learns the probability of pointing (Pointer) versus generating (Generator), defined as:

P_gen = σ(w_h*^T h*_t + w_s^T s_t + w_x^T x_t + b_ptr)

where h*_t is the context vector (the Attention-weighted combination of the Encoder hidden states at time t), s_t is the hidden state of the Decoder part at time t, x_t is the Decoder input at time t, σ denotes the sigmoid function, which maps the value into the range 0-1, and the scalar P_gen ∈ [0, 1] denotes the generator probability, with 1 − P_gen the pointer probability. When P_gen is biased towards 1, a word is generated from the vocabulary as usual; when it is biased towards 0, a corresponding word is instead sampled from the Attention probability distribution a_t (where a_ts denotes the probability of the s-th input word and a_t is the probability vector over the whole input text). The finally generated word w is determined by:

P(w) = P_gen · P_vocab(w) + (1 − P_gen) · Σ_{i: w_i = w} a_t^i

where w ranges over the entire vocabulary of the training set, w_i denotes the i-th word of the input text, a_t^i denotes the Attention weight of the i-th input word at the t-th time step of the Decoder (the summation collects the Attention weights of all input positions whose word equals w), P_vocab(w) denotes the vocabulary probability distribution when the model outputs words in generator mode, the summation term denotes the vocabulary probability distribution when words are output in pointer mode, and P(w) denotes the total vocabulary probability distribution, which determines the finally generated word.
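A hedged sketch of this mixture is given below; the function names and tensor shapes are illustrative, and the extension of the vocabulary with source OOV words is omitted for brevity.

```python
import torch

def generation_prob(context, s_t, x_t, w_h, w_s, w_x, b):
    """P_gen = sigmoid(w_h*·h*_t + w_s·s_t + w_x·x_t + b_ptr), as described above.

    context: (batch, H) context vector h*_t; s_t: (batch, H); x_t: (batch, E)
    w_h, w_s, w_x: parameter vectors of matching size; b: scalar bias.
    Returns a (batch, 1) generator probability.
    """
    return torch.sigmoid(context @ w_h + s_t @ w_s + x_t @ w_x + b).unsqueeze(-1)

def pointer_generator_dist(p_vocab, attn, src_ids, p_gen):
    """Combine generator and pointer distributions for one decoder step.

    p_vocab : (batch, vocab_size)  generator distribution P_vocab(w)
    attn    : (batch, src_len)     attention weights a_t over the source words
    src_ids : (batch, src_len)     vocabulary ids of the source words (long)
    p_gen   : (batch, 1)           generation probability from generation_prob
    Returns P(w) = P_gen * P_vocab(w) + (1 - P_gen) * sum_{i: w_i = w} a_t^i.
    """
    copy_dist = torch.zeros_like(p_vocab).scatter_add_(1, src_ids, attn)
    return p_gen * p_vocab + (1.0 - p_gen) * copy_dist
```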
The invention provides a text abstract and emotion classification combined training method, which comprises the following steps:
step 1, preprocessing original texts in a training set
(1.1) Pre-process the text with the pre-training module: train models such as Word2Vec, GloVe, or the more recent ELMo and BERT on large-scale data such as Chinese Wikipedia to obtain a word vector matrix, and compute fixed-length vector representations of Chinese words so that the text can later be expressed as vectors.
(1.2) Construct a suitable training set (each text sample in the training set must carry both an abstract label and an emotion category label), perform Chinese word segmentation and part-of-speech extraction on each text sample, and build the training set vocabulary. Then initialize the embedding layer of the text abstract and emotion classification combined model with the word vector matrix obtained by pre-training, and splice the part-of-speech embedding vector of each text sample in the training set with its word embedding vector. Assuming the training set vocabulary size is L, the dimension of the word embedding vector is E1 and the dimension of the part-of-speech embedding vector is E2, the resulting matrix representation of the training-set input text has dimension L × (E1 + E2). The number of rows of the matrix equals the number of words in the whole training set vocabulary and each row represents one word; through this matrix, each text sample in the training set can be represented as a fixed-length input text vector.
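As an illustration of step (1.2), a minimal sketch of such an embedding layer follows; the class name, the pos_vocab_size argument and the initialisation details are assumptions, the essential point being the concatenation of an E1-dimensional word embedding with an E2-dimensional part-of-speech embedding.

```python
import torch
import torch.nn as nn

class WordPosEmbedding(nn.Module):
    """Illustrative embedding layer: concatenate word and part-of-speech embeddings.

    vocab_size L, word dimension E1 and POS dimension E2 as in the text; the
    pretrained word-vector matrix (e.g. from Word2Vec / GloVe / ELMo / BERT)
    initialises the word embeddings (pos_vocab_size is an assumed extra input).
    """
    def __init__(self, vocab_size, pos_vocab_size, e1, e2, pretrained=None):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, e1)
        self.pos_emb = nn.Embedding(pos_vocab_size, e2)
        if pretrained is not None:                       # (vocab_size, e1) matrix
            self.word_emb.weight.data.copy_(pretrained)

    def forward(self, word_ids, pos_ids):                # both: (batch, seq_len)
        return torch.cat([self.word_emb(word_ids),
                          self.pos_emb(pos_ids)], dim=-1)  # (batch, seq_len, E1 + E2)
```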
Step 2, performing first-stage training on a large-scale text summarization data set to obtain initial network parameters
(2.1) Pre-train the text summarization task and construct the text abstract module, which is implemented as a generative text summarization model. Feed the fixed-length input text vector obtained in step 1 into the Encoder part of the text abstract module and encode it with a bidirectional recurrent neural network Bi-LSTM. Assuming the output-layer vector dimension of the Bi-LSTM is H, this yields the initial features of the fixed-length input text vector, of size L × H, i.e. a weighted text vector. The initial features are then fed in turn into the convolutional gating module and the self-Attention module to obtain the weight distribution of each word in the fixed-length input text vector (one weight vector per input word), also of size L × H. This weight distribution is applied to the initial features, adjusting them and filtering out the invalid information of the fixed-length input text vector, which yields its final vector representation.
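A simplified sketch of this Encoder step follows. For brevity, a self-Attention layer followed by a single linear sigmoid gate stands in for the combination of the convolutional gating module (sketched above) and the self-Attention module; all names and dimensions are illustrative assumptions, and the hidden dimension is assumed to be even.

```python
import torch
import torch.nn as nn

class SummaryEncoder(nn.Module):
    """Illustrative Encoder of step (2.1): Bi-LSTM initial features, then a
    learned per-word weight distribution re-weights the L x H features to
    filter out uninformative words."""
    def __init__(self, emb_dim, hidden_dim):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden_dim // 2, bidirectional=True,
                              batch_first=True)
        self.self_attn = nn.MultiheadAttention(hidden_dim, num_heads=1,
                                               batch_first=True)
        self.gate = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, emb):                        # emb: (batch, L, E1 + E2)
        initial, _ = self.bilstm(emb)              # initial features, (batch, L, H)
        attn_out, _ = self.self_attn(initial, initial, initial)
        weights = torch.sigmoid(self.gate(attn_out))   # per-word weight distribution
        return initial * weights                   # filtered final representation
```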
(2.2) Initialize the Decoder part with the features of the last time step of the Encoder part, train the weight distribution of the text summarization task with the Attention computation described above, and generate the text abstract with the Pointer-Generator algorithm. Assuming the length of the text abstract is L', the output features of the LSTM module of the Decoder part, of size L' × H, are obtained; these output features form the text vector. The network parameters are updated according to the following loss function.
L_s = − Σ_t log p(y_t | x)
Here y_t denotes the true label (the reference-summary word) at step t, x denotes the input of the Encoder part, i.e. the reference abstract corresponding to the text sample in the training set, and p(y_t | x) denotes the conditional probability, i.e. the probability of generating the word at step t given the text. The network parameters are repeatedly updated with a gradient descent algorithm until the loss function L_s converges.
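A hedged sketch of this pre-training loop is shown below; the optimizer, learning rate and epoch count are illustrative choices, and the model is assumed to return per-step log-probabilities over the vocabulary.

```python
import torch

def pretrain_summarizer(model, loader, epochs=10, lr=1e-3):
    """Sketch of the step-2 pre-training loop: minimise L_s = -sum_t log p(y_t | x)
    by gradient descent until convergence. `model(src, ref)` is assumed to
    return (batch, L', vocab) log-probabilities for the reference words."""
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for src, ref in loader:                     # input text ids, reference abstract ids
            log_probs = model(src, ref)             # (batch, L', vocab)
            nll = -log_probs.gather(-1, ref.unsqueeze(-1)).squeeze(-1)  # -log p(y_t | x)
            loss = nll.sum(dim=1).mean()            # L_s averaged over the batch
            optim.zero_grad()
            loss.backward()
            optim.step()
```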
Step 3, performing joint training of the text summarization and emotion classification tasks with the hierarchical end-to-end model
(3.1) Add an emotion classification layer on top of the generative text abstract model of step 2 to construct the hierarchical end-to-end model, initialize all model parameters except the emotion classification layer with the network parameters obtained by training in step 2, and construct the joint loss function L of the text summarization and emotion classification tasks. Splice the initial features of size L × H obtained in step (2.1) with the output features of size L' × H obtained in step (2.2) that are learned for the emotion classification task, then apply a max-pooling operation to obtain an emotion vector of dimension H. Finally, update the network parameters by gradient descent through the hierarchical end-to-end model and a cross-entropy loss function, i.e. train the whole hierarchical end-to-end model end to end until the joint loss function L converges. The joint loss function is defined as follows:
L = L_s + λL_c
L_s = − Σ_t log p(y_t | x)
L_c = − log p(l | x)
Here L_s denotes the loss of the text summarization task, computed with cross entropy, and L_c denotes the loss of the emotion classification task; y_t and l denote the true labels of the text abstract and of the emotion category respectively, x denotes the input of the Encoder part (the reference abstract corresponding to the text sample in the training set, used to compute the loss), and λ is a hyper-parameter that balances the two types of loss and is preset according to the actual situation.
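The joint objective and the emotion classification layer described in step (3.1) can be sketched as follows; the value of λ, the class names and the assumption that the decoder returns logits are illustrative choices, not prescribed by the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def joint_loss(summary_logits, summary_targets, emotion_logits, emotion_label,
               lam=0.5, pad_id=0):
    """L = L_s + lambda * L_c as defined above (lambda value is illustrative).

    summary_logits : (batch, L', vocab)   decoder outputs per time step
    summary_targets: (batch, L')          reference-summary word ids y_t
    emotion_logits : (batch, n_classes)   output of the emotion classification layer
    emotion_label  : (batch,)             gold emotion category l
    """
    l_s = F.cross_entropy(summary_logits.transpose(1, 2), summary_targets,
                          ignore_index=pad_id)            # -sum_t log p(y_t | x)
    l_c = F.cross_entropy(emotion_logits, emotion_label)  # -log p(l | x)
    return l_s + lam * l_c

class EmotionHead(nn.Module):
    """Illustrative emotion classification layer: splice the L x H encoder
    features with the L' x H decoder features along the sequence axis,
    max-pool to an H-dimensional emotion vector, then classify."""
    def __init__(self, hidden_dim, n_classes):
        super().__init__()
        self.fc = nn.Linear(hidden_dim, n_classes)

    def forward(self, enc_feats, dec_feats):   # (batch, L, H), (batch, L', H)
        feats = torch.cat([enc_feats, dec_feats], dim=1)   # (batch, L + L', H)
        pooled, _ = feats.max(dim=1)                        # H-dim emotion vector
        return self.fc(pooled)                              # emotion category logits
```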
(3.2) After training of the hierarchical end-to-end model is finished, it can be applied directly to the text summarization and emotion classification tasks.
For a specific task, such as user comments, customer service conversations or news, the domain data set of that task can be used for embedding pre-training and text summarization pre-training according to steps 1-3, after which the hierarchical end-to-end model is used to jointly train the emotion classification and text summarization tasks, which achieves a better effect.
The text abstract module builds on the general Encoder-Decoder architecture for natural language processing and the classical PointerNet model, with the following improvements: a convolutional gating unit and a Self-Attention mechanism are added to the Encoder part, and by weighting the output vector of each time step of the input text in the Encoder part, key words of the input text receive larger weights, so that the text vector learned in the Encoder part contains more of the key information of the input text. Moreover, since a joint data set for text summarization and emotion classification is harder to construct, step 2 uses a large-scale text summarization data set to train only the text summarization task and obtain the initial parameters of the generative text summarization model, so that the training process of step 3 converges faster.
In step 3, an emotion classification layer is added to the Decoder part on top of the structure and model parameters of the generative text summarization model of step 2, the hierarchical end-to-end model is constructed, and the text summarization and emotion classification tasks are trained jointly. The two tasks of the Decoder part each perform Attention computation over the output of the Encoder part. Because the joint loss function of the two tasks is adopted, the generated abstract better covers the words of the input text that carry emotional information, and because the text abstract extracts the key information, the hierarchical end-to-end model predicts the emotion category more accurately.
Referring to FIG. 3, the working effect is illustrated on a test-set text sample ("today sunny").
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature or any novel combination of features disclosed in this specification and any novel method or process steps or any novel combination of features disclosed.

Claims (4)

1. A text abstract and emotion classification combined training method, characterized in that the method is realized with a text abstract and emotion classification combined model comprising: a pre-training module, a text abstract module and a hierarchical end-to-end joint training module; the pre-training module pre-processes the original texts in a training set; the text abstract module generates a text abstract; and the hierarchical end-to-end joint training module combines the emotion classification and text summarization tasks to generate an emotion classification category adapted to the generated text abstract;
the method specifically comprises the following steps:
step 1, preprocessing original texts in a training set
(1.1) pre-processing the text with the pre-training module: training Word2Vec, GloVe, ELMo and BERT models on large-scale Chinese Wikipedia data to obtain a word vector matrix, and computing fixed-length vector representations of Chinese words so that the text can conveniently be expressed as vectors at a later stage;
(1.2) constructing a suitable training set in which each text sample simultaneously carries a reference abstract and an emotion category label, performing Chinese word segmentation and part-of-speech extraction on each text sample in the training set, and building a training set vocabulary; then initializing the embedding layer of the text abstract and emotion classification combined model with the word vector matrix obtained in step (1.1), and splicing the part-of-speech embedding vector of each text sample in the training set with its word embedding vector; assuming the vocabulary size of the training set is L, the dimension of the word embedding vector is E1 and the dimension of the part-of-speech embedding vector is E2, the matrix representation of the training-set input text finally obtained has dimension L × (E1 + E2); the number of rows of the matrix equals the number of words in the whole training set vocabulary and each row represents one word; through the matrix of the training-set input text, each text sample in the training set can be represented as a fixed-length input text vector;
step 2, performing first-stage training on a large-scale text summarization data set to obtain initial network parameters
(2.1) pre-training the text summarization task and constructing a text abstract module, wherein the text abstract module is implemented by a generative text summarization model which is combined with an Attention mechanism and comprises an encoding layer (Encoder) part and a decoding layer (Decoder) part; the Encoder part adopts a bidirectional recurrent neural network Bi-LSTM and optimizes the hidden-state semantic representation with a convolutional gating unit and a self-Attention mechanism, and the Decoder part adopts a Pointer-Generator mechanism to generate the text abstract;
the method specifically comprises the following steps: inputting the fixed-length input text vector obtained in step 1 into the Encoder part of the generative text summarization model and encoding it with the bidirectional recurrent neural network Bi-LSTM; assuming the output-layer vector dimension of the Bi-LSTM is H, initial features of the fixed-length input text vector, of size L × H, are obtained; the initial features are then fed in turn into the convolutional gating module and the self-Attention module of the Encoder part to obtain the weight distribution of each word in the fixed-length input text vector, also of size L × H; this weight distribution is applied to the initial features, adjusting them and filtering out the invalid information of the fixed-length input text vector, which yields the final vector representation of each text sample in the training set;
(2.2) initializing the Decoder part of the generative text summarization model with the features of the last time step of the Encoder part, training the weight distribution of the text summarization task with the Attention mechanism computation, and generating the text abstract with the Pointer-Generator algorithm; assuming the length of the text abstract is L', the output features of the LSTM module of the Decoder part, of size L' × H, are obtained; updating the network parameters of the generative text summarization model according to the following loss function:
L_s = − Σ_t log p(y_t | x)
wherein y_t denotes the true label at time t in the reference abstract, x denotes the reference abstract corresponding to the text sample in the training set input to the Encoder part, and p(y_t | x) denotes the conditional probability; the network parameters of the generative text summarization model are repeatedly updated with a gradient descent algorithm until the loss function L_s converges;
step 3, performing joint training of the text summarization and emotion classification tasks with a hierarchical end-to-end model
(3.1) adding an emotion classification layer on top of the generative text summarization model of step 2 to construct a hierarchical end-to-end joint training module, wherein the hierarchical end-to-end joint training module is realized with a hierarchical end-to-end model; initializing the parameters of the text abstract and emotion classification combined model outside the emotion classification layer with the network parameters of the generative text summarization model obtained by training in step 2, and constructing a joint loss function L of the text abstract and emotion classification combined model; splicing the initial features obtained in step (2.1) with the output features learned for the emotion classification task obtained in step (2.2), then applying a max-pooling operation to obtain an emotion vector of dimension H; finally updating the network parameters of the text abstract and emotion classification combined model by gradient descent through the hierarchical end-to-end model and a cross-entropy loss function, i.e. training the whole hierarchical end-to-end model end to end until the joint loss function L converges, wherein the joint loss function L is defined as follows:
L = L_s + λL_c
L_s = − Σ_t log p(y_t | x)
L_c = − log p(l | x)
wherein L_s denotes the loss of the text summarization task, computed with cross entropy, L_c denotes the loss of the emotion classification task, y_t denotes the true label at time t in the reference abstract, l denotes the true label of the emotion category, x denotes the reference abstract corresponding to the text sample in the training set input to the Encoder part, and λ is a hyper-parameter used to balance the two types of loss, preset according to the actual situation;
(3.2) obtaining the trained text abstract and emotion classification combined model after training of the hierarchical end-to-end model is finished, wherein the trained text abstract and emotion classification combined model is directly applied to the text summarization and emotion classification tasks.
2. The text abstract and emotion classification combined training method of claim 1, wherein the Attention mechanism in step (2.2) is specifically: the Attention mechanism is an automatic weighting defined as follows: given a set of vectors (the values) and a query vector, the Attention mechanism computes a weighted sum of the values according to the query, the query and each value being two vectors of the same dimension, and any one of the following formulas f(m_t, m_s) yields a number:
f(m_t, m_s) = m_t^T · m_s  (dot)
f(m_t, m_s) = m_t^T · W_a · m_s  (general)
f(m_t, m_s) = v_a^T · tanh(W_a [m_t; m_s])  (concat)
f(m_t, m_s) = v_a^T · tanh(W_a m_t + U_a m_s)  (perceptron)
the Attention mechanism is the channel connecting the encoding layer (Encoder) and the decoding layer (Decoder); since the hidden state of each recurrent neural network RNN unit is maintained in the Encoder part, assuming the hidden state of the s-th time step of the Encoder part is m_s, the hidden state of the current time step is computed for each time step of the Decoder part; assuming the hidden state of the t-th time step is m_t, the weight of the s-th word of the input text of the Encoder part corresponding to the t-th time step can be computed and recorded as f(m_t, m_s), and a softmax computation yields the final weight a_ts, where a_ts denotes the probability of a word of the input text in the training set; dot, general, concat and perceptron denote the four currently mainstream weight calculation formulas, n denotes the total number of words of the input text of the Encoder part, W_a denotes a parameter matrix, v_a denotes a parameter vector, U_a, like W_a, is a parameter matrix, and these parameters are updated by back propagation during training;
if the query is contained in the set of values, this is a self-Attention (Self-Attention) mechanism; a Self-Attention module can learn the key information and structural features of the input text, and the Self-Attention module is introduced into the Encoder part to extract the key information of the input text.
3. The text abstract and emotion classification combined training method of claim 2, wherein in step (2.2) the Pointer-Generator algorithm is used to solve the problem of out-of-vocabulary (OOV) and low-frequency words in the generated summary, and the probability of pointing (Pointer) versus generating (Generator) is learned automatically at each time step of the Decoder part, defined as follows:
P_gen = σ(w_h*^T h*_t + w_s^T s_t + w_x^T x_t + b_ptr)
wherein h*_t is the context vector, i.e. the Attention-weighted combination of the hidden states of the Encoder part at each time t, s_t is the hidden state of the Decoder part at time t, x_t is the input at time t, σ denotes the sigmoid function, which maps the value into the range 0-1, the scalar P_gen ∈ [0, 1] denotes the generator probability and 1 − P_gen denotes the pointer probability; when P_gen is biased towards 1, a word is generated from the vocabulary as usual, and when it is biased towards 0, a corresponding word is sampled from the Attention probability distribution a_t, where a_t denotes the probability vector over the whole input text of the training set; the finally generated word w is defined as follows:
P(w) = P_gen · P_vocab(w) + (1 − P_gen) · Σ_{i: w_i = w} a_t^i
wherein w ranges over the entire vocabulary of the training set, w_i denotes the i-th word of the input text, a_t^i denotes the Attention weight of the i-th word at the t-th time step of the Decoder, P_vocab(w) denotes the vocabulary probability distribution when the model outputs words in generator mode, the summation term denotes the vocabulary probability distribution when words are output in pointer mode, and P(w) denotes the total vocabulary probability distribution, which determines the finally generated word.
4. The text abstract and emotion classification combined training method of any one of claims 1-3, wherein the training set is a data set from the user comment, customer service conversation or news domain.
CN201911080385.6A 2019-11-07 2019-11-07 Text abstract and emotion classification combined training method Active CN110929030B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911080385.6A CN110929030B (en) 2019-11-07 2019-11-07 Text abstract and emotion classification combined training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911080385.6A CN110929030B (en) 2019-11-07 2019-11-07 Text abstract and emotion classification combined training method

Publications (2)

Publication Number Publication Date
CN110929030A true CN110929030A (en) 2020-03-27
CN110929030B CN110929030B (en) 2022-05-03

Family

ID=69852497

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911080385.6A Active CN110929030B (en) 2019-11-07 2019-11-07 Text abstract and emotion classification combined training method

Country Status (1)

Country Link
CN (1) CN110929030B (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159416A (en) * 2020-04-02 2020-05-15 腾讯科技(深圳)有限公司 Language task model training method and device, electronic equipment and storage medium
CN111475640A (en) * 2020-04-03 2020-07-31 支付宝(杭州)信息技术有限公司 Text emotion recognition method and device based on emotion abstract
CN111563373A (en) * 2020-04-13 2020-08-21 中南大学 Attribute-level emotion classification method for focused attribute-related text
CN111626041A (en) * 2020-05-07 2020-09-04 杭州东信北邮信息技术有限公司 Music comment generation method based on deep learning
CN111639176A (en) * 2020-05-29 2020-09-08 厦门大学 Real-time event summarization method based on consistency monitoring
CN111897949A (en) * 2020-07-28 2020-11-06 北京工业大学 Guided text abstract generation method based on Transformer
CN111931496A (en) * 2020-07-08 2020-11-13 广东工业大学 Text style conversion system and method based on recurrent neural network model
CN112579739A (en) * 2020-12-23 2021-03-30 合肥工业大学 Reading understanding method based on ELMo embedding and gating self-attention mechanism
CN113111663A (en) * 2021-04-28 2021-07-13 东南大学 Abstract generation method fusing key information
CN113221560A (en) * 2021-05-31 2021-08-06 平安科技(深圳)有限公司 Personality trait and emotion prediction method, personality trait and emotion prediction device, computer device, and medium
CN113282710A (en) * 2021-06-01 2021-08-20 平安国际智慧城市科技股份有限公司 Training method and device of text relation extraction model and computer equipment
CN113380418A (en) * 2021-06-22 2021-09-10 浙江工业大学 System for analyzing and identifying depression through dialog text
CN113468318A (en) * 2020-03-31 2021-10-01 中国电信股份有限公司 Automatic abstract generation method and device and computer readable storage medium
CN113761204A (en) * 2021-09-06 2021-12-07 南京大学 Emoji text emotion analysis method and system based on deep learning
CN113849634A (en) * 2021-03-01 2021-12-28 天翼智慧家庭科技有限公司 Method for improving interpretability of depth model recommendation scheme
CN114255044A (en) * 2020-09-11 2022-03-29 四川大学 Intelligent customer service technology based on cross-media analysis
CN114691858A (en) * 2022-03-15 2022-07-01 电子科技大学 Improved UNILM abstract generation method
CN116432605A (en) * 2023-06-14 2023-07-14 山东大学 Composition comment generation method and device integrating priori knowledge
WO2023173537A1 (en) * 2022-03-17 2023-09-21 平安科技(深圳)有限公司 Text sentiment analysis method and apparatus, device and storage medium
CN117633239A (en) * 2024-01-23 2024-03-01 中国科学技术大学 End-to-end face emotion recognition method combining combined category grammar

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190050443A1 (en) * 2017-08-11 2019-02-14 International Business Machines Corporation Method and system for improving training data understanding in natural language processing
CN109992775A (en) * 2019-03-25 2019-07-09 浙江大学 A kind of text snippet generation method based on high-level semantics

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190050443A1 (en) * 2017-08-11 2019-02-14 International Business Machines Corporation Method and system for improving training data understanding in natural language processing
CN109992775A (en) * 2019-03-25 2019-07-09 浙江大学 A kind of text snippet generation method based on high-level semantics

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHUMING MA et al.: "A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification", Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18) *
李雅昆: "Chinese Word Segmentation and Punctuation Prediction Based on Improved Multi-layer BLSTM", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468318A (en) * 2020-03-31 2021-10-01 中国电信股份有限公司 Automatic abstract generation method and device and computer readable storage medium
CN111159416B (en) * 2020-04-02 2020-07-17 腾讯科技(深圳)有限公司 Language task model training method and device, electronic equipment and storage medium
CN111159416A (en) * 2020-04-02 2020-05-15 腾讯科技(深圳)有限公司 Language task model training method and device, electronic equipment and storage medium
CN111475640A (en) * 2020-04-03 2020-07-31 支付宝(杭州)信息技术有限公司 Text emotion recognition method and device based on emotion abstract
CN111563373A (en) * 2020-04-13 2020-08-21 中南大学 Attribute-level emotion classification method for focused attribute-related text
CN111563373B (en) * 2020-04-13 2023-08-18 中南大学 Attribute-level emotion classification method for focused attribute-related text
CN111626041A (en) * 2020-05-07 2020-09-04 杭州东信北邮信息技术有限公司 Music comment generation method based on deep learning
CN111626041B (en) * 2020-05-07 2023-09-15 新讯数字科技(杭州)有限公司 Music comment generation method based on deep learning
CN111639176A (en) * 2020-05-29 2020-09-08 厦门大学 Real-time event summarization method based on consistency monitoring
CN111639176B (en) * 2020-05-29 2022-07-01 厦门大学 Real-time event summarization method based on consistency monitoring
CN111931496A (en) * 2020-07-08 2020-11-13 广东工业大学 Text style conversion system and method based on recurrent neural network model
CN111897949B (en) * 2020-07-28 2021-10-26 北京工业大学 Guided text abstract generation method based on Transformer
CN111897949A (en) * 2020-07-28 2020-11-06 北京工业大学 Guided text abstract generation method based on Transformer
CN114255044A (en) * 2020-09-11 2022-03-29 四川大学 Intelligent customer service technology based on cross-media analysis
CN112579739A (en) * 2020-12-23 2021-03-30 合肥工业大学 Reading understanding method based on ELMo embedding and gating self-attention mechanism
CN113849634A (en) * 2021-03-01 2021-12-28 天翼智慧家庭科技有限公司 Method for improving interpretability of depth model recommendation scheme
CN113849634B (en) * 2021-03-01 2024-04-16 天翼视联科技有限公司 Method for improving interpretability of depth model recommendation scheme
CN113111663A (en) * 2021-04-28 2021-07-13 东南大学 Abstract generation method fusing key information
CN113221560A (en) * 2021-05-31 2021-08-06 平安科技(深圳)有限公司 Personality trait and emotion prediction method, personality trait and emotion prediction device, computer device, and medium
CN113282710A (en) * 2021-06-01 2021-08-20 平安国际智慧城市科技股份有限公司 Training method and device of text relation extraction model and computer equipment
CN113380418A (en) * 2021-06-22 2021-09-10 浙江工业大学 System for analyzing and identifying depression through dialog text
CN113761204B (en) * 2021-09-06 2023-07-28 南京大学 Emoji text emotion analysis method and system based on deep learning
CN113761204A (en) * 2021-09-06 2021-12-07 南京大学 Emoji text emotion analysis method and system based on deep learning
CN114691858B (en) * 2022-03-15 2023-10-03 电子科技大学 Improved UNILM digest generation method
CN114691858A (en) * 2022-03-15 2022-07-01 电子科技大学 Improved UNILM abstract generation method
WO2023173537A1 (en) * 2022-03-17 2023-09-21 平安科技(深圳)有限公司 Text sentiment analysis method and apparatus, device and storage medium
CN116432605B (en) * 2023-06-14 2023-09-22 山东大学 Composition comment generation method and device integrating priori knowledge
CN116432605A (en) * 2023-06-14 2023-07-14 山东大学 Composition comment generation method and device integrating priori knowledge
CN117633239A (en) * 2024-01-23 2024-03-01 中国科学技术大学 End-to-end face emotion recognition method combining combined category grammar
CN117633239B (en) * 2024-01-23 2024-05-17 中国科学技术大学 End-to-end face emotion recognition method combining combined category grammar

Also Published As

Publication number Publication date
CN110929030B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
CN110929030B (en) Text abstract and emotion classification combined training method
CN109472024B (en) Text classification method based on bidirectional circulation attention neural network
CN108897857B (en) Chinese text subject sentence generating method facing field
CN110532557B (en) Unsupervised text similarity calculation method
CN113239700A (en) Text semantic matching device, system, method and storage medium for improving BERT
CN110532554A (en) Chinese abstract generation method, system and storage medium
CN109325112A (en) A kind of across language sentiment analysis method and apparatus based on emoji
CN117151220B (en) Entity link and relationship based extraction industry knowledge base system and method
CN113343683A (en) Chinese new word discovery method and device integrating self-encoder and countertraining
CN110442880B (en) Translation method, device and storage medium for machine translation
CN113705238B (en) Method and system for analyzing aspect level emotion based on BERT and aspect feature positioning model
CN115392252A (en) Entity identification method integrating self-attention and hierarchical residual error memory network
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN111159405B (en) Irony detection method based on background knowledge
CN113094502A (en) Multi-granularity takeaway user comment sentiment analysis method
CN115374270A (en) Legal text abstract generation method based on graph neural network
CN113934835B (en) Retrieval type reply dialogue method and system combining keywords and semantic understanding representation
CN114169447B (en) Event detection method based on self-attention convolution bidirectional gating cyclic unit network
Paria et al. A neural architecture mimicking humans end-to-end for natural language inference
CN114048314A (en) Natural language steganalysis method
CN116757195B (en) Implicit emotion recognition method based on prompt learning
CN116522165B (en) Public opinion text matching system and method based on twin structure
CN116562275B (en) Automatic text summarization method combined with entity attribute diagram
CN115577111A (en) Text classification method based on self-attention mechanism
CN115840815A (en) Automatic abstract generation method based on pointer key information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant