CN110929030A - Text abstract and emotion classification combined training method - Google Patents

Text abstract and emotion classification combined training method

Info

Publication number
CN110929030A
Authority
CN
China
Prior art keywords
text
abstract
training
vector
emotion classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911080385.6A
Other languages
Chinese (zh)
Other versions
CN110929030B (en)
Inventor
高建彬
潘慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201911080385.6A priority Critical patent/CN110929030B/en
Publication of CN110929030A publication Critical patent/CN110929030A/en
Application granted granted Critical
Publication of CN110929030B publication Critical patent/CN110929030B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a text abstract and emotion classification combined training method, which is realized with a combined text abstract and emotion classification model and specifically comprises the following steps: preprocessing the text and constructing a training set vocabulary; constructing a text abstract model and pre-training it on the text summarization task; and adding an emotion classification layer on top of the text abstract model to build a hierarchical end-to-end model, then jointly training the emotion classification and text summarization tasks. Through the joint training of the two tasks, the method improves the content consistency between the generated abstract and the input text, the generated abstract better preserves the emotional information of the input text, and because the summarization task extracts the key information of the input text, the emotion prediction becomes more accurate.

Description

Text abstract and emotion classification combined training method
Technical Field
The invention relates to text summarization and emotion classification methods in the field of natural language processing, and in particular to a joint training method for text abstracting and emotion classification.
Background
With the explosive growth of text information in recent years, people are exposed to massive amounts of text every day, such as news, microblogs, blogs, reports and papers. Text summarization has broad application scenarios: most directly, it can be used to generate news headlines, paper keywords and abstracts; more broadly, it can also be applied to result optimization for search engines such as Google and Baidu, and any task that requires extracting key information from a text and condensing it into a refined expression can be addressed with automatic text summarization technology. The mainstream methods of text summarization fall into two categories: extractive and abstractive (generative). Extractive methods select representative text segments from the original document set to form the summary; depending on how the input text is segmented, these segments can be sentences, clauses, paragraphs or phrases of the whole document. Abstractive methods are based on deep learning: they adopt a Sequence-to-Sequence framework with an Attention mechanism and generate, for a given input text, a summary containing its key content. Compared with extractive methods, abstractive methods are more complex, but the resulting summaries are more refined and concise, and the expression is more fluent.
Emotion classification assigns an emotion label to a text in order to determine the attitude or opinion expressed in it. It is also known as opinion mining, i.e., extracting the opinions or attitudes of the speaker. Emotion classification methods can be unsupervised or supervised; unsupervised approaches include emotion dictionaries, parsing, syntactic patterns and the like. Supervised approaches traditionally combine classical machine learning methods (such as support vector machines, maximum entropy and naive Bayes) with feature engineering; with the development of deep learning, deep models combining recurrent neural networks (RNN), convolutional neural networks (CNN) and the Attention mechanism have performed well on emotion classification tasks.
Both the text summarization task and emotion classification aim to capture the main idea of a text. A text summary describes the text in a more concrete way using words and sentences, whereas emotion classification summarizes the text in a more abstract way using a label. In previous research work, the text summarization and emotion classification tasks were trained with separate models, so a joint representation of the two tasks could not be learned across the models.
Disclosure of Invention
To address these problems, the invention combines the Attention mechanism in deep learning with a hierarchical end-to-end model framework to jointly train the text summarization and emotion classification tasks, so as to improve the learning effect of both tasks at the same time.
The invention provides a text abstract and emotion classification combined training method, which comprises the following specific steps:
Step 1: preprocess the text. Train language models such as Word2Vec, GloVe, or the more recent ELMo and BERT on large-scale data such as Chinese Wikipedia to obtain a word vector matrix, and compute fixed-length vector representations of Chinese words so that the text can later be expressed as vectors. Construct a suitable training set (each text sample must carry both an abstract label and an emotion category label), perform Chinese word segmentation and part-of-speech extraction on the texts in the training set, and build the training set vocabulary.
Step 2: using the word vector matrix obtained in step 1, represent each segmented training text as a fixed-length vector and feed it to the model as input. Pre-train the text summarization task: construct the text abstract model and, using a large-scale text summarization data set, update the network parameters with a gradient descent algorithm until the loss function of the text abstract model converges.
Step 3: jointly train the text summarization and emotion classification tasks. Initialize with the parameters obtained from the text abstract model training in step 2, add an emotion classification layer on top of the text abstract model, construct a joint loss function over the text summarization and emotion classification tasks, and finally train the whole network end to end with a gradient descent algorithm until the joint loss function converges.
This hierarchy establishes a tight relationship between the text summary and the emotion classification, so the two tasks can promote each other. After the text has been compressed by the text summarization layer, the emotion classifier can more easily predict the emotion label of the condensed text. In addition, by adding a convolutional gating module, the text summarization layer can also learn the importance distribution of the input text, obtain a weight distribution over the words of the input text, and remove redundant or misleading information that would harm emotion prediction. Conversely, the emotion classification task provides an additional supervision signal for the text summarization task and guides the summarization component to capture the emotional tendency of the source text, and joint training of the two tasks improves the content consistency between the generated summary and the input text. Pre-training on a large-scale text summarization data set initializes part of the parameters of the text summarization network, which accelerates network convergence and improves the learning effect. Joint training of the emotion analysis and text summarization tasks with the hierarchical end-to-end model framework raises the level of both text summarization and emotion classification: the generated abstract better preserves the emotional information of the input text, which is convenient for commercial applications, and because the summarization task extracts the key information of the input text, emotion prediction becomes more accurate.
Drawings
FIG. 1 shows the convolutional gating unit of the Encoder part
FIG. 2 is a schematic diagram of a text summarization and emotion classification combined model of the present invention
FIG. 3 is a diagram of the working effect of the text summarization and emotion classification combined model of the present invention
Detailed Description
All of the features disclosed in this application, or all of the steps in any method or process disclosed, may be combined in any combination, except combinations where mutually exclusive features or steps are present. Any feature disclosed in this application (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.
In the invention, a hierarchical end-to-end model is designed to jointly train the emotion classification and text summarization tasks. The hierarchical end-to-end model comprises a text summarization layer and an emotion classification layer: the text summarization layer compresses the source text into a short sentence to generate the text abstract, and the emotion classification layer further summarizes the generated text abstract into an emotion category.
This hierarchy establishes a tight relationship between the text summary and the emotion classification, so the two tasks can promote each other. After the text has been compressed by the text summarization layer, the emotion classifier can more easily predict the emotion label of the condensed text. In addition, a convolutional gating module is added. Referring to FIG. 1, after the output of the recurrent neural network (RNN) encoder, a one-dimensional convolution is realized with an Inception-like structure, where k denotes the convolution kernel size; this module uses a convolutional neural network (CNN) to improve the semantic representation of the RNN output and strengthen its connection with the context. The text summarization layer can thereby learn the importance distribution of the input text, obtain a weight distribution over the words of the input text, and remove redundant or misleading information that would harm emotion prediction. Conversely, the emotion classification task provides an additional supervision signal for the text summarization task and guides the summarization component to capture the emotional tendency of the source text, and joint training of the two tasks improves the content consistency between the generated summary and the input text.
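By way of illustration only, a minimal PyTorch sketch of such an Inception-like convolutional gating unit is given below. The kernel sizes, the sigmoid gate and the class and parameter names are assumptions made for the example; the invention only specifies parallel one-dimensional convolutions of several kernel sizes k applied to the RNN encoder output.

```python
import torch
import torch.nn as nn

class ConvGatingUnit(nn.Module):
    """Illustrative Inception-style 1-D convolutional gate over RNN outputs.

    Several kernel sizes are applied in parallel to the encoder hidden states
    and merged into a gate in (0, 1) that re-weights the original features
    (kernel sizes and the sigmoid gate are assumptions for this sketch).
    """
    def __init__(self, hidden_dim, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(hidden_dim, hidden_dim, k, padding=k // 2)
            for k in kernel_sizes
        ])
        self.proj = nn.Linear(hidden_dim * len(kernel_sizes), hidden_dim)

    def forward(self, rnn_out):                 # rnn_out: (batch, seq_len, hidden_dim)
        x = rnn_out.transpose(1, 2)             # Conv1d expects (batch, channels, seq_len)
        feats = [conv(x) for conv in self.convs]
        merged = torch.cat(feats, dim=1).transpose(1, 2)  # (batch, seq_len, H * n_kernels)
        gate = torch.sigmoid(self.proj(merged))           # per-word importance weights
        return rnn_out * gate                             # gated semantic representation
```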
The method of the invention is realized with the text abstract and emotion classification combined model shown in FIG. 2, which comprises: a pre-training module (implemented with models such as Word2Vec, GloVe, ELMo and BERT), a text abstract module, and a hierarchical end-to-end joint training module.
The pre-training module pre-processes the original texts in the training set; the text abstract module generates the text abstract; and the hierarchical end-to-end module combines the emotion classification task with the text summarization task to produce an emotion classification category adapted to the text abstract.
The text abstract module is implemented as a generative (abstractive) text summarization model combined with an Attention mechanism, and consists of an encoding layer (Encoder) and a decoding layer (Decoder). The Encoder adopts a bidirectional recurrent neural network Bi-LSTM and optimizes the hidden-state semantic representation with a convolutional gating unit and a self-Attention mechanism; the Decoder adopts a Pointer-Generator mechanism to generate the text abstract.
The Attention mechanism is widely used in natural language tasks and can be regarded as an automatic weighting. It is roughly defined as follows: given a set of vectors (the values) and a query vector, the Attention mechanism computes a weighted sum of the values according to the query; the query and each value are vectors of the same dimension, and any one of the score functions below maps such a pair to a number. The following calculation formulas (dot, general, concat and perceptron) are currently mainstream:

f(m_t, m_s) = m_t^T · m_s  (dot)
f(m_t, m_s) = m_t^T · W_a · m_s  (general)
f(m_t, m_s) = v_a^T · tanh(W_a [m_t; m_s])  (concat)
f(m_t, m_s) = v_a^T · tanh(W_a m_t + U_a m_s)  (perceptron)

a_ts = exp(f(m_t, m_s)) / Σ_{s'=1..n} exp(f(m_t, m_s'))

The Attention mechanism is the channel connecting the encoding layer (Encoder) and the decoding layer (Decoder). Since the hidden state of every recurrent neural network (RNN) unit is preserved in the Encoder, let m_s denote the hidden state of the s-th Encoder time step. For each time step of the Decoder, the hidden state of the current step is computed; let m_t denote the hidden state of the t-th time step. The weight of the s-th word of the input text on the encoding side at the t-th time step can then be computed with one of the formulas above, denoted f(m_t, m_s), and a softmax over s yields the final weight a_ts. Here dot, general, concat and perceptron are the four currently mainstream weight calculation formulas, n denotes the total number of words of the input text on the encoding side, W_a denotes a parameter matrix, v_a a parameter vector, and U_a, like W_a, is a parameter matrix; these parameters are updated by back-propagation during training.
If the query is contained in the set of values, this is a self-Attention (Self-Attention) mechanism. The invention introduces a Self-Attention module into the Encoder part to extract the key information of the input text.
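The four score functions named above can be illustrated with the following sketch. It assumes the standard dot / general / concat / perceptron formulations and illustrative parameter shapes (W_a of shape H × H, or H × 2H in concat mode), and is not taken verbatim from the invention.

```python
import torch

def attention_scores(m_t, m_s, W_a=None, U_a=None, v_a=None, mode="dot"):
    """Score f(m_t, m_s) between a decoder state m_t and an encoder state m_s.

    m_t, m_s : 1-D tensors of dimension H
    W_a      : (H, H) parameter matrix, or (H, 2H) in "concat" mode
    U_a      : (H, H) parameter matrix; v_a: (H,) parameter vector
    The four modes mirror the formulas referred to in the text (assumed to be
    the standard dot / general / concat / perceptron variants).
    """
    if mode == "dot":
        return torch.dot(m_t, m_s)
    if mode == "general":
        return torch.dot(m_t, W_a @ m_s)
    if mode == "concat":
        return torch.dot(v_a, torch.tanh(W_a @ torch.cat([m_t, m_s])))
    if mode == "perceptron":
        return torch.dot(v_a, torch.tanh(W_a @ m_t + U_a @ m_s))
    raise ValueError(mode)

# a_ts is then the softmax over s of f(m_t, m_s); the context vector is the
# weighted sum over s of a_ts * m_s.
# Example: m_t, m_s = torch.randn(8), torch.randn(8); attention_scores(m_t, m_s)
```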
The Pointer-Generator mechanism effectively alleviates the problem of out-of-vocabulary (OOV) and low-frequency words in the generated summary. At each Decoder time step, the network automatically learns the probability of pointing (Pointer) versus generating (Generator), defined as:

P_gen = σ(w_h*^T h*_t + w_s^T s_t + w_x^T x_t + b_ptr)

where h*_t is the context vector (the Attention-weighted combination of the Encoder hidden states at time t), s_t is the hidden state of the Decoder part at time t, x_t is the Decoder input at time t, σ denotes the sigmoid function, which maps the value into the range 0-1, and the scalar P_gen ∈ [0, 1] denotes the generator probability, with 1 − P_gen the pointer probability. When P_gen is biased towards 1, a word is generated from the vocabulary as usual; when it is biased towards 0, a corresponding word is instead sampled from the Attention probability distribution a_t (where a_ts denotes the probability of the s-th input word and a_t is the probability vector over the whole input text). The finally generated word w is determined by:

P(w) = P_gen · P_vocab(w) + (1 − P_gen) · Σ_{i: w_i = w} a_t^i

where w ranges over the entire vocabulary of the training set, w_i denotes the i-th word of the input text, a_t^i denotes the Attention weight of the i-th input word at the t-th time step of the Decoder (the summation collects the Attention weights of all input positions whose word equals w), P_vocab(w) denotes the vocabulary probability distribution when the model outputs words in generator mode, the summation term denotes the vocabulary probability distribution when words are output in pointer mode, and P(w) denotes the total vocabulary probability distribution, which determines the finally generated word.
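A hedged sketch of this mixture is given below; the function names and tensor shapes are illustrative, and the extension of the vocabulary with source OOV words is omitted for brevity.

```python
import torch

def generation_prob(context, s_t, x_t, w_h, w_s, w_x, b):
    """P_gen = sigmoid(w_h*·h*_t + w_s·s_t + w_x·x_t + b_ptr), as described above.

    context: (batch, H) context vector h*_t; s_t: (batch, H); x_t: (batch, E)
    w_h, w_s, w_x: parameter vectors of matching size; b: scalar bias.
    Returns a (batch, 1) generator probability.
    """
    return torch.sigmoid(context @ w_h + s_t @ w_s + x_t @ w_x + b).unsqueeze(-1)

def pointer_generator_dist(p_vocab, attn, src_ids, p_gen):
    """Combine generator and pointer distributions for one decoder step.

    p_vocab : (batch, vocab_size)  generator distribution P_vocab(w)
    attn    : (batch, src_len)     attention weights a_t over the source words
    src_ids : (batch, src_len)     vocabulary ids of the source words (long)
    p_gen   : (batch, 1)           generation probability from generation_prob
    Returns P(w) = P_gen * P_vocab(w) + (1 - P_gen) * sum_{i: w_i = w} a_t^i.
    """
    copy_dist = torch.zeros_like(p_vocab).scatter_add_(1, src_ids, attn)
    return p_gen * p_vocab + (1.0 - p_gen) * copy_dist
```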
The invention provides a text abstract and emotion classification combined training method, which comprises the following steps:
step 1, preprocessing original texts in a training set
(1.1) Pre-process the text with the pre-training module: train models such as Word2Vec, GloVe, or the more recent ELMo and BERT on large-scale data such as Chinese Wikipedia to obtain a word vector matrix, and compute fixed-length vector representations of Chinese words so that the text can later be expressed as vectors.
(1.2) Construct a suitable training set (each text sample in the training set must carry both an abstract label and an emotion category label), perform Chinese word segmentation and part-of-speech extraction on each text sample, and build the training set vocabulary. Then initialize the embedding layer of the text abstract and emotion classification combined model with the word vector matrix obtained by pre-training, and splice the part-of-speech embedding vector of each text sample in the training set with its word embedding vector. Assuming the training set vocabulary size is L, the dimension of the word embedding vector is E1 and the dimension of the part-of-speech embedding vector is E2, the resulting matrix representation of the training-set input text has dimension L × (E1 + E2). The number of rows of the matrix equals the number of words in the whole training set vocabulary and each row represents one word; through this matrix, each text sample in the training set can be represented as a fixed-length input text vector.
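As an illustration of step (1.2), a minimal sketch of such an embedding layer follows; the class name, the pos_vocab_size argument and the initialisation details are assumptions, the essential point being the concatenation of an E1-dimensional word embedding with an E2-dimensional part-of-speech embedding.

```python
import torch
import torch.nn as nn

class WordPosEmbedding(nn.Module):
    """Illustrative embedding layer: concatenate word and part-of-speech embeddings.

    vocab_size L, word dimension E1 and POS dimension E2 as in the text; the
    pretrained word-vector matrix (e.g. from Word2Vec / GloVe / ELMo / BERT)
    initialises the word embeddings (pos_vocab_size is an assumed extra input).
    """
    def __init__(self, vocab_size, pos_vocab_size, e1, e2, pretrained=None):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, e1)
        self.pos_emb = nn.Embedding(pos_vocab_size, e2)
        if pretrained is not None:                       # (vocab_size, e1) matrix
            self.word_emb.weight.data.copy_(pretrained)

    def forward(self, word_ids, pos_ids):                # both: (batch, seq_len)
        return torch.cat([self.word_emb(word_ids),
                          self.pos_emb(pos_ids)], dim=-1)  # (batch, seq_len, E1 + E2)
```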
Step 2, performing first-stage training on a large-scale text summarization data set to obtain initial network parameters
(2.1) Pre-train the text summarization task and construct the text abstract module, which is implemented as a generative text summarization model. Feed the fixed-length input text vector obtained in step 1 into the Encoder part of the text abstract module and encode it with a bidirectional recurrent neural network Bi-LSTM. Assuming the output-layer vector dimension of the Bi-LSTM is H, this yields the initial features of the fixed-length input text vector, of size L × H, i.e. a weighted text vector. The initial features are then fed in turn into the convolutional gating module and the self-Attention module to obtain the weight distribution of each word in the fixed-length input text vector (one weight vector per input word), also of size L × H. This weight distribution is applied to the initial features, adjusting them and filtering out the invalid information of the fixed-length input text vector, which yields its final vector representation.
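A simplified sketch of this Encoder step follows. For brevity, a self-Attention layer followed by a single linear sigmoid gate stands in for the combination of the convolutional gating module (sketched above) and the self-Attention module; all names and dimensions are illustrative assumptions, and the hidden dimension is assumed to be even.

```python
import torch
import torch.nn as nn

class SummaryEncoder(nn.Module):
    """Illustrative Encoder of step (2.1): Bi-LSTM initial features, then a
    learned per-word weight distribution re-weights the L x H features to
    filter out uninformative words."""
    def __init__(self, emb_dim, hidden_dim):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden_dim // 2, bidirectional=True,
                              batch_first=True)
        self.self_attn = nn.MultiheadAttention(hidden_dim, num_heads=1,
                                               batch_first=True)
        self.gate = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, emb):                        # emb: (batch, L, E1 + E2)
        initial, _ = self.bilstm(emb)              # initial features, (batch, L, H)
        attn_out, _ = self.self_attn(initial, initial, initial)
        weights = torch.sigmoid(self.gate(attn_out))   # per-word weight distribution
        return initial * weights                   # filtered final representation
```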
(2.2) Initialize the Decoder part with the features of the last time step of the Encoder part, train the weight distribution of the text summarization task with the Attention computation described above, and generate the text abstract with the Pointer-Generator algorithm. Assuming the length of the text abstract is L', the output features of the LSTM module of the Decoder part, of size L' × H, are obtained; these output features form the text vector. The network parameters are updated according to the following loss function.
L_s = − Σ_t log p(y_t | x)
Here y_t denotes the true label (the reference-summary word) at step t, x denotes the input of the Encoder part, i.e. the reference abstract corresponding to the text sample in the training set, and p(y_t | x) denotes the conditional probability, i.e. the probability of generating the word at step t given the text. The network parameters are repeatedly updated with a gradient descent algorithm until the loss function L_s converges.
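A hedged sketch of this pre-training loop is shown below; the optimizer, learning rate and epoch count are illustrative choices, and the model is assumed to return per-step log-probabilities over the vocabulary.

```python
import torch

def pretrain_summarizer(model, loader, epochs=10, lr=1e-3):
    """Sketch of the step-2 pre-training loop: minimise L_s = -sum_t log p(y_t | x)
    by gradient descent until convergence. `model(src, ref)` is assumed to
    return (batch, L', vocab) log-probabilities for the reference words."""
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for src, ref in loader:                     # input text ids, reference abstract ids
            log_probs = model(src, ref)             # (batch, L', vocab)
            nll = -log_probs.gather(-1, ref.unsqueeze(-1)).squeeze(-1)  # -log p(y_t | x)
            loss = nll.sum(dim=1).mean()            # L_s averaged over the batch
            optim.zero_grad()
            loss.backward()
            optim.step()
```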
Step 3, performing joint training of the text summarization and emotion classification tasks with the hierarchical end-to-end model
(3.1) Add an emotion classification layer on top of the generative text abstract model of step 2 to construct the hierarchical end-to-end model, initialize all model parameters except the emotion classification layer with the network parameters obtained by training in step 2, and construct the joint loss function L of the text summarization and emotion classification tasks. Splice the initial features of size L × H obtained in step (2.1) with the output features of size L' × H obtained in step (2.2) that are learned for the emotion classification task, then apply a max-pooling operation to obtain an emotion vector of dimension H. Finally, update the network parameters by gradient descent through the hierarchical end-to-end model and a cross-entropy loss function, i.e. train the whole hierarchical end-to-end model end to end until the joint loss function L converges. The joint loss function is defined as follows:
L = L_s + λL_c
L_s = − Σ_t log p(y_t | x)
L_c = − log p(l | x)
Here L_s denotes the loss of the text summarization task, computed with cross entropy, and L_c denotes the loss of the emotion classification task; y_t and l denote the true labels of the text abstract and of the emotion category respectively, x denotes the input of the Encoder part (the reference abstract corresponding to the text sample in the training set, used to compute the loss), and λ is a hyper-parameter that balances the two types of loss and is preset according to the actual situation.
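The joint objective and the emotion classification layer described in step (3.1) can be sketched as follows; the value of λ, the class names and the assumption that the decoder returns logits are illustrative choices, not prescribed by the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def joint_loss(summary_logits, summary_targets, emotion_logits, emotion_label,
               lam=0.5, pad_id=0):
    """L = L_s + lambda * L_c as defined above (lambda value is illustrative).

    summary_logits : (batch, L', vocab)   decoder outputs per time step
    summary_targets: (batch, L')          reference-summary word ids y_t
    emotion_logits : (batch, n_classes)   output of the emotion classification layer
    emotion_label  : (batch,)             gold emotion category l
    """
    l_s = F.cross_entropy(summary_logits.transpose(1, 2), summary_targets,
                          ignore_index=pad_id)            # -sum_t log p(y_t | x)
    l_c = F.cross_entropy(emotion_logits, emotion_label)  # -log p(l | x)
    return l_s + lam * l_c

class EmotionHead(nn.Module):
    """Illustrative emotion classification layer: splice the L x H encoder
    features with the L' x H decoder features along the sequence axis,
    max-pool to an H-dimensional emotion vector, then classify."""
    def __init__(self, hidden_dim, n_classes):
        super().__init__()
        self.fc = nn.Linear(hidden_dim, n_classes)

    def forward(self, enc_feats, dec_feats):   # (batch, L, H), (batch, L', H)
        feats = torch.cat([enc_feats, dec_feats], dim=1)   # (batch, L + L', H)
        pooled, _ = feats.max(dim=1)                        # H-dim emotion vector
        return self.fc(pooled)                              # emotion category logits
```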
(3.2) After training of the hierarchical end-to-end model is finished, it can be applied directly to the text summarization and emotion classification tasks.
For a specific task, such as user comments, customer service conversations or news, the domain data set of that task can be used for embedding pre-training and text summarization pre-training according to steps 1-3, after which the hierarchical end-to-end model is used to jointly train the emotion classification and text summarization tasks, which achieves a better effect.
The text abstract module builds on the general Encoder-Decoder architecture for natural language processing and the classical PointerNet model, with the following improvements: a convolutional gating unit and a Self-Attention mechanism are added to the Encoder part, and by weighting the output vector of each time step of the input text in the Encoder part, key words of the input text receive larger weights, so that the text vector learned in the Encoder part contains more of the key information of the input text. Moreover, since a joint data set for text summarization and emotion classification is harder to construct, step 2 uses a large-scale text summarization data set to train only the text summarization task and obtain the initial parameters of the generative text summarization model, so that the training process of step 3 converges faster.
In step 3, an emotion classification layer is added to the Decoder part on top of the structure and model parameters of the generative text summarization model of step 2, the hierarchical end-to-end model is constructed, and the text summarization and emotion classification tasks are trained jointly. The two tasks of the Decoder part each perform Attention computation over the output of the Encoder part. Because the joint loss function of the two tasks is adopted, the generated abstract better covers the words of the input text that carry emotional information, and because the text abstract extracts the key information, the hierarchical end-to-end model predicts the emotion category more accurately.
Referring to FIG. 3, the working effect is illustrated on a test-set text sample ("today sunny").
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature or any novel combination of features disclosed in this specification and any novel method or process steps or any novel combination of features disclosed.

Claims (4)

1. A text abstract and emotion classification combined training method, characterized in that the method is realized with a text abstract and emotion classification combined model comprising: a pre-training module, a text abstract module and a hierarchical end-to-end joint training module; the pre-training module pre-processes the original texts in a training set; the text abstract module generates a text abstract; and the hierarchical end-to-end joint training module combines the emotion classification and text summarization tasks to generate an emotion classification category adapted to the generated text abstract;
the method specifically comprises the following steps:
step 1, preprocessing original texts in a training set
(1.1) pre-processing the text with the pre-training module: training Word2Vec, GloVe, ELMo and BERT models on large-scale Chinese Wikipedia data to obtain a word vector matrix, and computing fixed-length vector representations of Chinese words so that the text can conveniently be expressed as vectors at a later stage;
(1.2) constructing a suitable training set in which each text sample simultaneously carries a reference abstract and an emotion category label, performing Chinese word segmentation and part-of-speech extraction on each text sample in the training set, and building a training set vocabulary; then initializing the embedding layer of the text abstract and emotion classification combined model with the word vector matrix obtained in step (1.1), and splicing the part-of-speech embedding vector of each text sample in the training set with its word embedding vector; assuming the vocabulary size of the training set is L, the dimension of the word embedding vector is E1 and the dimension of the part-of-speech embedding vector is E2, the matrix representation of the training-set input text finally obtained has dimension L × (E1 + E2); the number of rows of the matrix equals the number of words in the whole training set vocabulary and each row represents one word; through the matrix of the training-set input text, each text sample in the training set can be represented as a fixed-length input text vector;
step 2, performing first-stage training on a large-scale text summarization data set to obtain initial network parameters
(2.1) pre-training the text summarization task and constructing a text abstract module, wherein the text abstract module is implemented by a generative text summarization model which is combined with an Attention mechanism and comprises an encoding layer (Encoder) part and a decoding layer (Decoder) part; the Encoder part adopts a bidirectional recurrent neural network Bi-LSTM and optimizes the hidden-state semantic representation with a convolutional gating unit and a self-Attention mechanism, and the Decoder part adopts a Pointer-Generator mechanism to generate the text abstract;
the method specifically comprises the following steps: inputting the fixed-length input text vector obtained in step 1 into the Encoder part of the generative text summarization model and encoding it with the bidirectional recurrent neural network Bi-LSTM; assuming the output-layer vector dimension of the Bi-LSTM is H, initial features of the fixed-length input text vector, of size L × H, are obtained; the initial features are then fed in turn into the convolutional gating module and the self-Attention module of the Encoder part to obtain the weight distribution of each word in the fixed-length input text vector, also of size L × H; this weight distribution is applied to the initial features, adjusting them and filtering out the invalid information of the fixed-length input text vector, which yields the final vector representation of each text sample in the training set;
(2.2) initializing the Decoder part of the generative text summarization model with the features of the last time step of the Encoder part, training the weight distribution of the text summarization task with the Attention mechanism computation, and generating the text abstract with the Pointer-Generator algorithm; assuming the length of the text abstract is L', the output features of the LSTM module of the Decoder part, of size L' × H, are obtained; updating the network parameters of the generative text summarization model according to the following loss function:
L_s = − Σ_t log p(y_t | x)
wherein y_t denotes the true label at time t in the reference abstract, x denotes the reference abstract corresponding to the text sample in the training set input to the Encoder part, and p(y_t | x) denotes the conditional probability; the network parameters of the generative text summarization model are repeatedly updated with a gradient descent algorithm until the loss function L_s converges;
step 3, performing joint training of the text summarization and emotion classification tasks with a hierarchical end-to-end model
(3.1) adding an emotion classification layer on top of the generative text summarization model of step 2 to construct a hierarchical end-to-end joint training module, wherein the hierarchical end-to-end joint training module is realized with a hierarchical end-to-end model; initializing the parameters of the text abstract and emotion classification combined model outside the emotion classification layer with the network parameters of the generative text summarization model obtained by training in step 2, and constructing a joint loss function L of the text abstract and emotion classification combined model; splicing the initial features obtained in step (2.1) with the output features learned for the emotion classification task obtained in step (2.2), then applying a max-pooling operation to obtain an emotion vector of dimension H; finally updating the network parameters of the text abstract and emotion classification combined model by gradient descent through the hierarchical end-to-end model and a cross-entropy loss function, i.e. training the whole hierarchical end-to-end model end to end until the joint loss function L converges, wherein the joint loss function L is defined as follows:
L = L_s + λL_c
L_s = − Σ_t log p(y_t | x)
L_c = − log p(l | x)
wherein L_s denotes the loss of the text summarization task, computed with cross entropy, L_c denotes the loss of the emotion classification task, y_t denotes the true label at time t in the reference abstract, l denotes the true label of the emotion category, x denotes the reference abstract corresponding to the text sample in the training set input to the Encoder part, and λ is a hyper-parameter used to balance the two types of loss, preset according to the actual situation;
(3.2) obtaining the trained text abstract and emotion classification combined model after training of the hierarchical end-to-end model is finished, wherein the trained text abstract and emotion classification combined model is directly applied to the text summarization and emotion classification tasks.
2. The text abstract and emotion classification combined training method of claim 1, wherein the Attention mechanism in step (2.2) is specifically: the Attention mechanism is an automatic weighting defined as follows: given a set of vectors (the values) and a query vector, the Attention mechanism computes a weighted sum of the values according to the query, the query and each value being two vectors of the same dimension, and any one of the following formulas f(m_t, m_s) yields a number:
f(m_t, m_s) = m_t^T · m_s  (dot)
f(m_t, m_s) = m_t^T · W_a · m_s  (general)
f(m_t, m_s) = v_a^T · tanh(W_a [m_t; m_s])  (concat)
f(m_t, m_s) = v_a^T · tanh(W_a m_t + U_a m_s)  (perceptron)
the Attention mechanism is the channel connecting the encoding layer (Encoder) and the decoding layer (Decoder); since the hidden state of each recurrent neural network RNN unit is maintained in the Encoder part, assuming the hidden state of the s-th time step of the Encoder part is m_s, the hidden state of the current time step is computed for each time step of the Decoder part; assuming the hidden state of the t-th time step is m_t, the weight of the s-th word of the input text of the Encoder part corresponding to the t-th time step can be computed and recorded as f(m_t, m_s), and a softmax computation yields the final weight a_ts, where a_ts denotes the probability of a word of the input text in the training set; dot, general, concat and perceptron denote the four currently mainstream weight calculation formulas, n denotes the total number of words of the input text of the Encoder part, W_a denotes a parameter matrix, v_a denotes a parameter vector, U_a, like W_a, is a parameter matrix, and these parameters are updated by back propagation during training;
if the query is contained in the set of values, this is a self-Attention (Self-Attention) mechanism; a Self-Attention module can learn the key information and structural features of the input text, and the Self-Attention module is introduced into the Encoder part to extract the key information of the input text.
3. The text abstract and emotion classification combined training method of claim 2, wherein in step (2.2) the Pointer-Generator algorithm is used to solve the problem of out-of-vocabulary (OOV) and low-frequency words in the generated summary, and the probability of pointing (Pointer) versus generating (Generator) is learned automatically at each time step of the Decoder part, defined as follows:
P_gen = σ(w_h*^T h*_t + w_s^T s_t + w_x^T x_t + b_ptr)
wherein h*_t is the context vector, i.e. the Attention-weighted combination of the hidden states of the Encoder part at each time t, s_t is the hidden state of the Decoder part at time t, x_t is the input at time t, σ denotes the sigmoid function, which maps the value into the range 0-1, the scalar P_gen ∈ [0, 1] denotes the generator probability and 1 − P_gen denotes the pointer probability; when P_gen is biased towards 1, a word is generated from the vocabulary as usual, and when it is biased towards 0, a corresponding word is sampled from the Attention probability distribution a_t, where a_t denotes the probability vector over the whole input text of the training set; the finally generated word w is defined as follows:
P(w) = P_gen · P_vocab(w) + (1 − P_gen) · Σ_{i: w_i = w} a_t^i
wherein w ranges over the entire vocabulary of the training set, w_i denotes the i-th word of the input text, a_t^i denotes the Attention weight of the i-th word at the t-th time step of the Decoder, P_vocab(w) denotes the vocabulary probability distribution when the model outputs words in generator mode, the summation term denotes the vocabulary probability distribution when words are output in pointer mode, and P(w) denotes the total vocabulary probability distribution, which determines the finally generated word.
4. The text abstract and emotion classification combined training method of any one of claims 1-3, wherein the training set is a data set from the user comment, customer service conversation or news domain.
CN201911080385.6A 2019-11-07 2019-11-07 Text abstract and emotion classification combined training method Active CN110929030B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911080385.6A CN110929030B (en) 2019-11-07 2019-11-07 Text abstract and emotion classification combined training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911080385.6A CN110929030B (en) 2019-11-07 2019-11-07 Text abstract and emotion classification combined training method

Publications (2)

Publication Number Publication Date
CN110929030A true CN110929030A (en) 2020-03-27
CN110929030B CN110929030B (en) 2022-05-03

Family

ID=69852497

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911080385.6A Active CN110929030B (en) 2019-11-07 2019-11-07 Text abstract and emotion classification combined training method

Country Status (1)

Country Link
CN (1) CN110929030B (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159416A (en) * 2020-04-02 2020-05-15 腾讯科技(深圳)有限公司 Language task model training method and device, electronic equipment and storage medium
CN111475640A (en) * 2020-04-03 2020-07-31 支付宝(杭州)信息技术有限公司 Text emotion recognition method and device based on emotion abstract
CN111563373A (en) * 2020-04-13 2020-08-21 中南大学 Attribute-level emotion classification method for focused attribute-related text
CN111626041A (en) * 2020-05-07 2020-09-04 杭州东信北邮信息技术有限公司 Music comment generation method based on deep learning
CN111639176A (en) * 2020-05-29 2020-09-08 厦门大学 Real-time event summarization method based on consistency monitoring
CN111897949A (en) * 2020-07-28 2020-11-06 北京工业大学 Guided text abstract generation method based on Transformer
CN111931496A (en) * 2020-07-08 2020-11-13 广东工业大学 Text style conversion system and method based on recurrent neural network model
CN112579739A (en) * 2020-12-23 2021-03-30 合肥工业大学 Reading understanding method based on ELMo embedding and gating self-attention mechanism
CN113111663A (en) * 2021-04-28 2021-07-13 东南大学 Abstract generation method fusing key information
CN113221560A (en) * 2021-05-31 2021-08-06 平安科技(深圳)有限公司 Personality trait and emotion prediction method, personality trait and emotion prediction device, computer device, and medium
CN113282710A (en) * 2021-06-01 2021-08-20 平安国际智慧城市科技股份有限公司 Training method and device of text relation extraction model and computer equipment
CN113380418A (en) * 2021-06-22 2021-09-10 浙江工业大学 System for analyzing and identifying depression through dialog text
CN113468318A (en) * 2020-03-31 2021-10-01 中国电信股份有限公司 Automatic abstract generation method and device and computer readable storage medium
CN113761204A (en) * 2021-09-06 2021-12-07 南京大学 Emoji text emotion analysis method and system based on deep learning
CN113849634A (en) * 2021-03-01 2021-12-28 天翼智慧家庭科技有限公司 Method for improving interpretability of depth model recommendation scheme
CN114255044A (en) * 2020-09-11 2022-03-29 四川大学 Intelligent customer service technology based on cross-media analysis
CN114691858A (en) * 2022-03-15 2022-07-01 电子科技大学 Improved UNILM abstract generation method
CN116432605A (en) * 2023-06-14 2023-07-14 山东大学 Composition comment generation method and device integrating priori knowledge
WO2023173537A1 (en) * 2022-03-17 2023-09-21 平安科技(深圳)有限公司 Text sentiment analysis method and apparatus, device and storage medium
CN117633239A (en) * 2024-01-23 2024-03-01 中国科学技术大学 End-to-end face emotion recognition method combining combined category grammar

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190050443A1 (en) * 2017-08-11 2019-02-14 International Business Machines Corporation Method and system for improving training data understanding in natural language processing
CN109992775A (en) * 2019-03-25 2019-07-09 浙江大学 A kind of text snippet generation method based on high-level semantics

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190050443A1 (en) * 2017-08-11 2019-02-14 International Business Machines Corporation Method and system for improving training data understanding in natural language processing
CN109992775A (en) * 2019-03-25 2019-07-09 浙江大学 A kind of text snippet generation method based on high-level semantics

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHUMING MA et al.: "A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification", Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18) *
李雅昆: "Chinese Word Segmentation and Punctuation Prediction Based on Improved Multi-layer BLSTM", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468318A (en) * 2020-03-31 2021-10-01 中国电信股份有限公司 Automatic abstract generation method and device and computer readable storage medium
CN111159416B (en) * 2020-04-02 2020-07-17 腾讯科技(深圳)有限公司 Language task model training method and device, electronic equipment and storage medium
CN111159416A (en) * 2020-04-02 2020-05-15 腾讯科技(深圳)有限公司 Language task model training method and device, electronic equipment and storage medium
CN111475640A (en) * 2020-04-03 2020-07-31 支付宝(杭州)信息技术有限公司 Text emotion recognition method and device based on emotion abstract
CN111563373A (en) * 2020-04-13 2020-08-21 中南大学 Attribute-level emotion classification method for focused attribute-related text
CN111563373B (en) * 2020-04-13 2023-08-18 中南大学 Attribute-level emotion classification method for focused attribute-related text
CN111626041A (en) * 2020-05-07 2020-09-04 杭州东信北邮信息技术有限公司 Music comment generation method based on deep learning
CN111626041B (en) * 2020-05-07 2023-09-15 新讯数字科技(杭州)有限公司 Music comment generation method based on deep learning
CN111639176A (en) * 2020-05-29 2020-09-08 厦门大学 Real-time event summarization method based on consistency monitoring
CN111639176B (en) * 2020-05-29 2022-07-01 厦门大学 Real-time event summarization method based on consistency monitoring
CN111931496A (en) * 2020-07-08 2020-11-13 广东工业大学 Text style conversion system and method based on recurrent neural network model
CN111897949B (en) * 2020-07-28 2021-10-26 北京工业大学 Guided text abstract generation method based on Transformer
CN111897949A (en) * 2020-07-28 2020-11-06 北京工业大学 Guided text abstract generation method based on Transformer
CN114255044A (en) * 2020-09-11 2022-03-29 四川大学 Intelligent customer service technology based on cross-media analysis
CN112579739A (en) * 2020-12-23 2021-03-30 合肥工业大学 Reading understanding method based on ELMo embedding and gating self-attention mechanism
CN113849634A (en) * 2021-03-01 2021-12-28 天翼智慧家庭科技有限公司 Method for improving interpretability of depth model recommendation scheme
CN113849634B (en) * 2021-03-01 2024-04-16 天翼视联科技有限公司 Method for improving interpretability of depth model recommendation scheme
CN113111663A (en) * 2021-04-28 2021-07-13 东南大学 Abstract generation method fusing key information
CN113221560A (en) * 2021-05-31 2021-08-06 平安科技(深圳)有限公司 Personality trait and emotion prediction method, personality trait and emotion prediction device, computer device, and medium
CN113282710A (en) * 2021-06-01 2021-08-20 平安国际智慧城市科技股份有限公司 Training method and device of text relation extraction model and computer equipment
CN113380418A (en) * 2021-06-22 2021-09-10 浙江工业大学 System for analyzing and identifying depression through dialog text
CN113761204B (en) * 2021-09-06 2023-07-28 南京大学 Emoji text emotion analysis method and system based on deep learning
CN113761204A (en) * 2021-09-06 2021-12-07 南京大学 Emoji text emotion analysis method and system based on deep learning
CN114691858B (en) * 2022-03-15 2023-10-03 电子科技大学 Improved UNILM digest generation method
CN114691858A (en) * 2022-03-15 2022-07-01 电子科技大学 Improved UNILM abstract generation method
WO2023173537A1 (en) * 2022-03-17 2023-09-21 平安科技(深圳)有限公司 Text sentiment analysis method and apparatus, device and storage medium
CN116432605B (en) * 2023-06-14 2023-09-22 山东大学 Composition comment generation method and device integrating priori knowledge
CN116432605A (en) * 2023-06-14 2023-07-14 山东大学 Composition comment generation method and device integrating priori knowledge
CN117633239A (en) * 2024-01-23 2024-03-01 中国科学技术大学 End-to-end face emotion recognition method combining combined category grammar
CN117633239B (en) * 2024-01-23 2024-05-17 中国科学技术大学 End-to-end face emotion recognition method combining combined category grammar

Also Published As

Publication number Publication date
CN110929030B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
CN110929030B (en) Text abstract and emotion classification combined training method
CN109472024B (en) Text classification method based on bidirectional circulation attention neural network
CN108897857B (en) Chinese text subject sentence generating method facing field
CN110532557B (en) Unsupervised text similarity calculation method
CN113239700A (en) Text semantic matching device, system, method and storage medium for improving BERT
CN110532554A (en) Chinese abstract generation method, system and storage medium
CN109325112A (en) A kind of across language sentiment analysis method and apparatus based on emoji
CN117151220B (en) Entity link and relationship based extraction industry knowledge base system and method
CN113343683A (en) Chinese new word discovery method and device integrating self-encoder and countertraining
CN110442880B (en) Translation method, device and storage medium for machine translation
CN113705238B (en) Method and system for analyzing aspect level emotion based on BERT and aspect feature positioning model
CN115392252A (en) Entity identification method integrating self-attention and hierarchical residual error memory network
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN111159405B (en) Irony detection method based on background knowledge
CN113094502A (en) Multi-granularity takeaway user comment sentiment analysis method
CN115374270A (en) Legal text abstract generation method based on graph neural network
CN113934835B (en) Retrieval type reply dialogue method and system combining keywords and semantic understanding representation
CN114169447B (en) Event detection method based on self-attention convolution bidirectional gating cyclic unit network
Paria et al. A neural architecture mimicking humans end-to-end for natural language inference
CN114048314A (en) Natural language steganalysis method
CN116757195B (en) Implicit emotion recognition method based on prompt learning
CN116522165B (en) Public opinion text matching system and method based on twin structure
CN116562275B (en) Automatic text summarization method combined with entity attribute diagram
CN115577111A (en) Text classification method based on self-attention mechanism
CN115840815A (en) Automatic abstract generation method based on pointer key information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant