CN110825869A - Text summary generation method of a variational generative decoder based on a copy mechanism - Google Patents

Text summary generation method of a variational generative decoder based on a copy mechanism

Info

Publication number
CN110825869A
Authority
CN
China
Prior art keywords: hidden layer, variable, layer, text, state
Prior art date
Legal status: Pending
Application number
CN201910872440.9A
Other languages
Chinese (zh)
Inventor
黄晓
滕蔚
林嘉良
保延翔
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201910872440.9A
Publication of CN110825869A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides a text summary generation method using a variational generative decoder based on a copy mechanism. Following the idea of the variational auto-encoder (VAE), the method generates hidden variables that contain latent sentence-structure information and makes full use of the latent features of a sentence, so that the generated summary is better structured and more readable. A copy mechanism is introduced in the copy part: the attention-weighted context semantic vector serves as the copied content, which alleviates the word-repetition problem common to conventional sequence-to-sequence frameworks. The generation part and the copy part act together, making the generated summary more accurate.

Description

Text summary generation method of a variational generative decoder based on a copy mechanism
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a text summary generation method of a variational generative decoder based on a copy mechanism.
Background
The goal of automatic summarization is to generate a concise summary from a given piece of text. There are two main approaches: extractive and abstractive. Extractive summarization selects sentences or phrases from the original text to form the summary; abstractive summarization generates new sentences that do not appear in the original text as the summary.
Conventional neural abstractive summarization methods are generally attention-based sequence-to-sequence models. Such models can extract shallow semantic information from the original text and generate simple summaries, but the generated summaries often contain repeated words, and the latent structural information of the summary is usually ignored during generation, which greatly limits summary quality.
Disclosure of Invention
The invention provides a text summary generation method of a variational generative decoder based on a copy mechanism, which can generate a text summary with a clear sentence structure and accurate content.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
a text abstract generating method of a variation generation decoder based on a copying mechanism comprises the following steps:
s1: mapping the input text content into a semantic sequence by encoding;
s2: decoding the semantic sequence obtained in the step S1;
s3: optimizing the decoded output of step S2 results in a text summary of the input text.
Further, the specific process of step S1 is:

The input text content $X = \{x_1, x_2, \ldots, x_T\}$ is mapped into the corresponding word vectors, which are fed as input into the bidirectional gated recurrent unit (GRU) of the encoder to obtain the hidden layer states of the input text in the encoding stage. The bidirectional GRU encodes in both directions: the forward pass runs from $x_1$ to $x_T$, and the backward pass runs from $x_T$ to $x_1$. The resulting hidden layer states are:

$\overrightarrow{h_i} = \overrightarrow{\mathrm{GRU}}(x_i, \overrightarrow{h}_{i-1})$

$\overleftarrow{h_i} = \overleftarrow{\mathrm{GRU}}(x_i, \overleftarrow{h}_{i+1})$

The hidden layer states of the two directions are concatenated to obtain the hidden layer state of the input text in the encoding stage:

$h_i = [\overrightarrow{h_i}; \overleftarrow{h_i}]$

Further, the decoding process in step S2 includes latent sentence-structure modeling based on the VAE and copy-information modeling based on the copy mechanism. The latent sentence-structure modeling based on the VAE is as follows:

The hidden layer states of the decoding stage are calculated by two layers of GRUs. The first-layer hidden layer state $h_t^{d_1}$ is calculated as follows:

$h_t^{d_1} = \mathrm{GRU}_1(y_{t-1}, h_{t-1}^{d_1})$

where $y_{t-1}$ is the output at the previous time step and $h_{t-1}^{d_1}$ is the first-layer hidden layer state at the previous time step.

The first-layer hidden layer state at the current time step $h_t^{d_1}$ and all hidden layer states of the encoding stage $h_i$ are used to calculate the attention weight distribution. The context semantic vector $c_t$ is the weighted sum of the hidden layer states of the input text content, and the attention weight $a_{ti}$ is calculated as follows:

$e_{ti} = v_a^{\top}\tanh(W_a h_t^{d_1} + U_a h_i + b_a)$

$a_{ti} = \dfrac{\exp(e_{ti})}{\sum_{k=1}^{T}\exp(e_{tk})}$

$c_t = \sum_{i=1}^{T} a_{ti} h_i$

where the attention weight measures the relevance between the output at time $t$ and the content of the text, $W_a$ and $U_a$ are weight matrices, and $b_a$ is a bias.

The second-layer hidden layer state $h_t^{d_2}$ introduces the attention mechanism into the decoding stage, so $c_t$ is added to its calculation:

$h_t^{d_2} = \mathrm{GRU}_2([y_{t-1}; c_t], h_{t-1}^{d_2})$

where $y_{t-1}$ is the output at the previous time step and $h_{t-1}^{d_2}$ is the second-layer hidden layer state at the previous time step.
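To make the two-layer decoding step concrete, the following is a minimal PyTorch sketch of one decoding step with additive attention. The class name, dimensions, and the concatenation of the previous output embedding with $c_t$ as the second-layer input are illustrative assumptions rather than details fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerAttnDecoderStep(nn.Module):
    """One decoding step: the first-layer GRU state drives additive attention over the
    encoder states, and the second-layer GRU consumes the previous output embedding
    together with the context vector c_t."""
    def __init__(self, emb_dim=256, dec_dim=256, enc_dim=512, attn_dim=256):
        super().__init__()
        self.gru1 = nn.GRUCell(emb_dim, dec_dim)
        self.gru2 = nn.GRUCell(emb_dim + enc_dim, dec_dim)
        self.W_a = nn.Linear(dec_dim, attn_dim, bias=False)
        self.U_a = nn.Linear(enc_dim, attn_dim, bias=True)   # its bias plays the role of b_a
        self.v_a = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, y_prev_emb, h1_prev, h2_prev, enc_states):
        # enc_states: (batch, T, enc_dim) encoder hidden states h_i
        h1 = self.gru1(y_prev_emb, h1_prev)                        # first-layer state h_t^{d1}
        scores = self.v_a(torch.tanh(self.W_a(h1).unsqueeze(1) + self.U_a(enc_states)))
        a_t = F.softmax(scores.squeeze(-1), dim=-1)                # attention weights a_{ti}
        c_t = torch.bmm(a_t.unsqueeze(1), enc_states).squeeze(1)   # context vector c_t
        h2 = self.gru2(torch.cat([y_prev_emb, c_t], dim=-1), h2_prev)  # second-layer state h_t^{d2}
        return h1, h2, c_t, a_t
```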
In sentence modeling, to account for the sequential nature of generation, the output variables $y_{<t}$ and the latent structure variables $z_{<t}$ before time $t$ are mapped to a posterior distribution $q_\phi(z_t \mid y_{<t}, z_{<t})$, which is used to approximate the true posterior distribution $p_\theta(z_t \mid y_{<t}, z_{<t})$; both distributions are assumed to be Gaussian, and $z$ is sampled from $q_\phi(z_t \mid y_{<t}, z_{<t})$. Since the sampling operation is not differentiable, the reparameterization trick is introduced so that back-propagation during training can proceed, and a new sample is obtained through a change of parameters. The reparameterization formula is:

$z = \mu + \sigma\varepsilon$

where $\varepsilon \sim N(0, I)$ is a noise variable, and the Gaussian parameters $\mu$ and $\sigma$ are the variational mean and standard deviation, respectively.

The sampled latent variable $z$ contains the latent sentence-structure information of the target summary. The latent structure variable $z$ and the second-layer hidden layer state $h_t^{d_2}$ are jointly mapped to a new hidden layer state that serves as the generation-part variable:

$h_t^{gen} = \tanh(W_g z_t + U_g h_t^{d_2} + b_g)$

where $\tanh(\cdot)$ is the activation function, $W_g$ and $U_g$ are weight matrices, and $b_g$ is a bias.
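The latent-variable step can be sketched as follows. Here the recognition network conditions on the current decoder state as a stand-in for $(y_{<t}, z_{<t})$, and the layer names and dimensions are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

class LatentGenerationPart(nn.Module):
    """Samples z with the reparameterization trick and maps [z, h_t^{d2}] to the
    generation-part hidden state h_t^{gen}."""
    def __init__(self, dec_dim=256, latent_dim=64):
        super().__init__()
        # recognition network producing the variational mean and log-variance
        self.to_mu = nn.Linear(dec_dim, latent_dim)
        self.to_logvar = nn.Linear(dec_dim, latent_dim)
        self.W_g = nn.Linear(latent_dim, dec_dim, bias=False)
        self.U_g = nn.Linear(dec_dim, dec_dim, bias=True)  # its bias plays the role of b_g

    def forward(self, h2):
        mu = self.to_mu(h2)
        logvar = self.to_logvar(h2)
        sigma = torch.exp(0.5 * logvar)
        eps = torch.randn_like(sigma)                     # epsilon ~ N(0, I)
        z = mu + sigma * eps                              # reparameterization: z = mu + sigma * eps
        h_gen = torch.tanh(self.W_g(z) + self.U_g(h2))    # generation-part state h_t^{gen}
        return z, h_gen, mu, logvar
```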
Further, the copy-information modeling based on the copy mechanism in step S2 is as follows:

To address the repetition problem in summary generation, a copy mechanism is introduced in the decoding stage, and the context semantic vector $c_t$ is used as the copy vector. A control gate $g_{switch}$ is calculated from the copy vector $c_t$, the second-layer hidden layer state $h_t^{d_2}$ and the latent structure variable $z_t$:

$g_{switch} = \mathrm{sigmoid}(W_s c_t + U_s h_t^{d_2} + V_s z_t + b_{switch})$

where $\mathrm{sigmoid}(\cdot)$ is the activation function, $W_s$, $U_s$ and $V_s$ are weight matrices, and $b_{switch}$ is a bias.

$g_{switch}$ assigns different weights to the generation-part variable and the copy-part variable: $g_{switch}$ is the weight of the generation part, and $(1 - g_{switch})$ is the weight of the copy part. Using $c_t$ as the copy-part variable ensures that every final hidden layer state contains a different degree of copied information, which reinforces the attention mechanism once more and reduces the problem of repeated words.

The final hidden layer variable is calculated as follows:

$h_t = g_{switch}\, h_t^{gen} + (1 - g_{switch})\, c_t$

The final hidden layer variable $h_t$ is passed through a softmax(·) function to obtain the probability distribution of the target word $y_t$, as shown in the following formula:

$P(y_t \mid y_{<t}, X) = \mathrm{softmax}(W_o h_t + b_o)$

where $W_o$ is a weight matrix and $b_o$ is a bias.
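A minimal sketch of the copy gate and output layer is given below. Using a single linear layer over the concatenated inputs is equivalent to the separate weight matrices above, and the projection of $c_t$ to the decoder dimension is an implementation assumption (the patent uses $c_t$ directly, which presumes matching dimensions).

```python
import torch
import torch.nn as nn

class CopySwitchOutput(nn.Module):
    """Mixes the generation-part state and the copy vector c_t through a learned gate
    g_switch, then projects the final hidden state to a vocabulary distribution."""
    def __init__(self, dec_dim=256, enc_dim=512, latent_dim=64, vocab_size=4000):
        super().__init__()
        self.gate = nn.Linear(enc_dim + dec_dim + latent_dim, dec_dim)
        self.copy_proj = nn.Linear(enc_dim, dec_dim, bias=False)  # maps c_t to the decoder size
        self.out = nn.Linear(dec_dim, vocab_size)

    def forward(self, c_t, h_gen, h2, z):
        # control gate g_switch in (0, 1)
        g = torch.sigmoid(self.gate(torch.cat([c_t, h2, z], dim=-1)))
        # final hidden state: weighted mix of the generation part and the copy part
        h = g * h_gen + (1.0 - g) * self.copy_proj(c_t)
        # log-probabilities over the target vocabulary for y_t
        return torch.log_softmax(self.out(h), dim=-1)
```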
Further, the specific process of step S3 is:

The network is optimized with a loss function consisting of two parts: the negative log-likelihood of the generated summary and the variational lower bound; the log term in the variational lower bound is combined with the log-likelihood of the generated summary. The loss function is:

$\mathcal{L}(\theta,\phi) = -\sum_{n=1}^{N}\sum_{t=1}^{T}\Big[\log p_\theta\big(y_t^{(n)} \mid y_{<t}^{(n)}, X^{(n)}\big) - D_{KL}\big(q_\phi(z_t \mid y_{<t}, z_{<t}) \,\|\, p_\theta(z_t \mid y_{<t}, z_{<t})\big)\Big]$

where $\{X^{(n)}\}$ denotes the training text content, $\{y^{(n)}\}$ denotes the generated target summary, $D_{KL}[\cdot]$ is the KL divergence, $N$ is the number of samples, and $T$ is the length of the output sequence.
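The loss can be sketched as follows; the standard-normal prior used in the closed-form KL term is a simplifying assumption, since the patent's prior $p_\theta(z_t \mid y_{<t}, z_{<t})$ is itself learned.

```python
import torch
import torch.nn.functional as F

def summarization_loss(log_probs, targets, mu, logvar):
    """Negative log-likelihood of the target summary plus the KL term of the variational lower bound.

    log_probs: (batch, T, vocab) log-probabilities produced by the decoder
    targets:   (batch, T) target token ids
    mu, logvar: variational mean and log-variance of q_phi(z_t | y_<t, z_<t)
    """
    nll = F.nll_loss(log_probs.reshape(-1, log_probs.size(-1)),
                     targets.reshape(-1), reduction="sum")
    # closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions and time steps
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return nll + kl
```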
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention provides a text abstract generating method of a variation generation decoder based on a copying mechanism. The method is based on an attention sequence-to-sequence framework and combines a Variational Auto-Encoder (VAE) and a replication mechanism to extract and fully utilize effective information contained in sentences. The encoder of the method is the same as the traditional encoder, and a bidirectional Gated cycle Unit (GRU) is used as a basic sequence modeling Unit. Different from the traditional decoder, the decoder of the method has three parts, wherein the first part is a GRU decoding part, two layers of GRUs are adopted, the GRU of the first layer is used for calculating the attention mechanism, the GRU of the second layer is used for introducing the attention mechanism, and the state of the hidden layer of the decoder is obtained through calculation. And the second part is a VAE part and is used for modeling latent variables, extracting the structure information of the latent sentences by utilizing the VAE and generating the hidden variables containing the target abstract latent structure information. And mapping the hidden variable and the hidden layer state of the decoder into a new variable as the variable of the generation part. The third part is a copy part, and a semantic vector containing the context of the attention mechanism is used as a variable of the copy part. And in the final output part of the model, the variables of the generating part and the variables of the copying part act together to decode and output, and a text abstract with clear sentence structure and accurate content is generated.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a data preprocessing diagram;
FIG. 3 is a diagram of the word segmentation scheme;
FIG. 4 is a schematic diagram of the VAE;
FIG. 5 is a diagram of the latent structure analysis of a summary;
FIG. 6 is an example of a generated summary.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in FIG. 1, the text summary generation method of a variational generative decoder based on a copy mechanism takes a text sequence X = {x_1, x_2, ..., x_T} as input, and the goal is to generate a summary Y = {y_1, y_2, ..., y_T} of the corresponding text sequence. As shown in FIG. 1, the whole framework is divided into two parts, an encoder and a decoder. The encoder encodes the input sequence with a bidirectional GRU. The decoder consists of three parts. The first part is the GRU decoding part, which uses two GRU layers: the first-layer GRU computes the attention mechanism, and the second-layer GRU incorporates the attention mechanism and yields the decoder hidden layer state. The second part is the VAE part, which models latent variables: the VAE extracts latent sentence-structure information and generates hidden variables containing the latent structure information of the target summary. The hidden variable and the decoder hidden layer state are mapped to a new variable that serves as the generation-part variable. The third part is the copy part, which uses the attention-based context semantic vector as the copy-part variable. At the final output of the model, the generation-part variable and the copy-part variable act together to decode and output the target summary. The Large-scale Chinese Short Text Summarization (LCSTS) dataset, built from Sina Weibo, is used as an example.
First, the data are preprocessed as shown in FIG. 2, and segmentation is performed in the manner of FIG. 3 to construct the dictionaries. The text content and the target summary are segmented separately, and a dictionary is built for each. To avoid errors caused by incorrect word segmentation, the text is segmented at the character level to build the dictionary. The entries of the dictionary are then mapped to randomly initialized word vectors. A dictionary of the text content and a dictionary of the target summary are thus obtained.
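A character-level dictionary as described above could be built as in the following sketch; the special tokens and the minimum-count threshold are assumptions for illustration.

```python
from collections import Counter

def build_char_vocab(texts, min_count=1, specials=("<pad>", "<unk>", "<s>", "</s>")):
    """Builds a character-level dictionary mapping each character to an integer id."""
    counts = Counter(ch for text in texts for ch in text)
    vocab = {tok: i for i, tok in enumerate(specials)}
    for ch, c in counts.most_common():
        if c >= min_count and ch not in vocab:
            vocab[ch] = len(vocab)
    return vocab

def to_ids(text, vocab):
    """Converts a text string into a list of ids, falling back to <unk>."""
    return [vocab.get(ch, vocab["<unk>"]) for ch in text]
```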
Then, the input text content $X = \{x_1, x_2, \ldots, x_T\}$ is converted into id form; the ids are used to index the dictionary and look up the word vector corresponding to each character, and the word vectors are fed as input into the bidirectional GRU for encoding to obtain the hidden layer states of the encoding stage. The bidirectional GRU encodes in both directions: the forward pass runs from $x_1$ to $x_T$, and the backward pass runs from $x_T$ to $x_1$. The resulting hidden layer states are:

$\overrightarrow{h_i} = \overrightarrow{\mathrm{GRU}}(x_i, \overrightarrow{h}_{i-1})$

$\overleftarrow{h_i} = \overleftarrow{\mathrm{GRU}}(x_i, \overleftarrow{h}_{i+1})$

The hidden layer states of the two directions are concatenated to obtain the hidden layer state of the encoding stage:

$h_i = [\overrightarrow{h_i}; \overleftarrow{h_i}]$
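The encoding step can be sketched in PyTorch as follows; the embedding and hidden dimensions, batching convention, and class name are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class BiGRUEncoder(nn.Module):
    """Bidirectional GRU encoder: maps token ids to per-position hidden states h_i,
    each the concatenation of the forward and backward GRU states."""
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, x):
        # x: (batch, T) token ids
        emb = self.embedding(x)            # (batch, T, emb_dim)
        outputs, _ = self.gru(emb)         # (batch, T, 2 * hidden_dim)
        return outputs
```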
the hidden layer state of the decoding stage is next calculated. The hidden layer state of the decoding stage is computed by the two layers GRU. First layer hidden layer state
Figure BDA0002203238370000064
The calculation method of (c) is as follows:
wherein, yt-1Is the output of the last moment in time,is the first layer hidden layer state at the last moment.
Then using the first layer hidden layer state at the current moment
Figure BDA0002203238370000067
And all hidden layer states of the encoding stage
Figure BDA0002203238370000068
To calculate the attention weight distribution. The attention weight is calculated as follows:
Figure BDA0002203238370000071
wherein the content of the first and second substances,
Figure BDA0002203238370000072
and
Figure BDA0002203238370000073
is a weight matrix; baIs an offset.
Context semantic vector ctWeighted sum of hidden layer states representing input text content:
Figure BDA0002203238370000074
the second layer hides the layer state, taking into account the attention mechanism introduced in the decoding stage, ctAdding intoIn the calculation of (2):
Figure BDA0002203238370000076
wherein, yt-1Is the output of the last moment in time,
Figure BDA0002203238370000077
is the second layer hidden layer state at the previous time.
Next, the latent structure of the sentence is modeled, and the latent structure in the sentence is extracted as shown in FIG. 5. A posterior distribution $q_\phi(z_t \mid y_{<t}, z_{<t})$ is used to approximate the true posterior distribution $p_\theta(z_t \mid y_{<t}, z_{<t})$; both are assumed to be Gaussian, and $z$ is sampled from $q_\phi(z_t \mid y_{<t}, z_{<t})$. Since the sampling operation is not differentiable, the reparameterization trick is introduced so that back-propagation during training can proceed, and a new sample is obtained through a change of parameters, as shown in FIG. 5. The reparameterization formula is:

$z = \mu + \sigma\varepsilon$

where $\varepsilon \sim N(0, I)$ is a noise variable, and the Gaussian parameters $\mu$ and $\sigma$ are the variational mean and standard deviation, respectively.

The sampled latent variable $z$ contains the latent sentence-structure information of the target summary. The latent structure variable $z$ and the second-layer hidden layer state $h_t^{d_2}$ are jointly mapped to a new hidden layer state that serves as the generation-part variable:

$h_t^{gen} = \tanh(W_g z_t + U_g h_t^{d_2} + b_g)$

where $\tanh(\cdot)$ is the activation function, $W_g$ and $U_g$ are weight matrices, and $b_g$ is a bias.
This is followed by modeling of the copy variables. A copy mechanism is introduced at the decoding stage, and the context semantic vector $c_t$ is used as the copy vector. A control gate $g_{switch}$ is calculated from the copy vector $c_t$, the second-layer hidden layer state $h_t^{d_2}$ and the latent structure variable $z_t$:

$g_{switch} = \mathrm{sigmoid}(W_s c_t + U_s h_t^{d_2} + V_s z_t + b_{switch})$

where $\mathrm{sigmoid}(\cdot)$ is the activation function, $W_s$, $U_s$ and $V_s$ are weight matrices, and $b_{switch}$ is a bias.

$g_{switch}$ assigns different weights to the generation-part variable and the copy-part variable: $g_{switch}$ is the weight of the generation part, and $(1 - g_{switch})$ is the weight of the copy part. Using $c_t$ as the copy-part variable ensures that every final hidden layer state contains a different degree of copied information, which reinforces the attention mechanism once more and alleviates the problem of repeated words.

The final hidden layer variable is therefore calculated as follows:

$h_t = g_{switch}\, h_t^{gen} + (1 - g_{switch})\, c_t$

Finally, the final hidden layer variable $h_t$ is passed through a softmax(·) function to obtain the probability distribution of the target word $y_t$, as shown in the following formula:

$P(y_t \mid y_{<t}, X) = \mathrm{softmax}(W_o h_t + b_o)$

where $W_o$ is a weight matrix and $b_o$ is a bias.
The network is optimized with a loss function consisting of two parts: the negative log-likelihood of the generated summary and the variational lower bound. For convenience of representation, the log term in the variational lower bound is combined with the log-likelihood of the generated summary. The loss function is:

$\mathcal{L}(\theta,\phi) = -\sum_{n=1}^{N}\sum_{t=1}^{T}\Big[\log p_\theta\big(y_t^{(n)} \mid y_{<t}^{(n)}, X^{(n)}\big) - D_{KL}\big(q_\phi(z_t \mid y_{<t}, z_{<t}) \,\|\, p_\theta(z_t \mid y_{<t}, z_{<t})\big)\Big]$

where $\{X^{(n)}\}$ denotes the training text content, $\{y^{(n)}\}$ denotes the generated target summary, $D_{KL}[\cdot]$ is the KL divergence, $N$ is the number of samples, and $T$ is the length of the output sequence. FIG. 6 illustrates the summary generation result of the embodiment.
The method of this embodiment is based on an attention sequence-to-sequence framework and combines the VAE with a copy mechanism to extract and fully exploit the effective information contained in sentences. The encoder is the same as a conventional encoder and uses a bidirectional GRU as the basic sequence-modeling unit. Unlike a conventional decoder, the decoder of this method has three parts. The first part is the GRU decoding part, which uses two GRU layers: the first-layer GRU computes the attention mechanism, and the second-layer GRU incorporates the attention mechanism and yields the decoder hidden layer state. The second part is the VAE part, which models latent variables: the VAE extracts latent sentence-structure information and generates hidden variables containing the latent structure information of the target summary. The hidden variable and the decoder hidden layer state are mapped to a new variable that serves as the generation-part variable. The third part is the copy part, which uses the attention-based context semantic vector as the copy-part variable. At the final output of the model, the generation-part variable and the copy-part variable act together to decode and produce a text summary with a clear sentence structure and accurate content.
The traditional sequence-to-sequence model does not introduce a generative model and does not fully extract the sentence information of the summary, so the latent sentence-structure information cannot be used and the quality of the generated summary is limited. In contrast, the present method adds a latent sentence-structure modeling part to the attention-based sequence-to-sequence framework and introduces the VAE as a generative model to model the latent structure variables, improving the structure and readability of the generated summary.
Meanwhile, the invention introduces a copy mechanism in the decoding part and divides the output of the method into a generation part and a copy part. The hidden variable generated by the VAE and the decoder hidden layer state are combined and mapped into a new hidden variable as the generation part, the attention-based context semantic vector is used as the copy part, and the two parts are combined to decode and output. This not only ensures that the deep semantic information of the sentences is used, but also alleviates the repetition problem common to sequence-to-sequence frameworks, so that summary generation proceeds in an orderly fashion with high quality.
In addition, the GRU decoding part of the decoder adopts two GRU layers: the first-layer GRU computes the attention mechanism, and the second-layer GRU incorporates the attention mechanism to obtain the decoder hidden layer state, combining the variational idea with the copy idea and thereby constructing a variational generative decoder based on the copy mechanism. The method has been verified on the LCSTS dataset and shown to outperform common baseline models.
The same or similar reference numerals correspond to the same or similar parts;
the positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (5)

1. A text summary generation method of a variational generative decoder based on a copy mechanism, characterized by comprising the following steps:
s1: mapping the input text content into a semantic sequence by encoding;
s2: decoding the semantic sequence obtained in the step S1;
s3: optimizing the decoded output of step S2 results in a text summary of the input text.
2. The text summary generation method of a variational generative decoder based on a copy mechanism according to claim 1, wherein the specific process of step S1 is:

the input text content $X = \{x_1, x_2, \ldots, x_T\}$ is mapped into the corresponding word vectors, which are fed as input into the bidirectional gated recurrent unit (GRU) of the encoder to obtain the hidden layer states of the input text in the encoding stage, wherein the bidirectional GRU encodes in both the forward and backward directions, the forward pass running from $x_1$ to $x_T$ and the backward pass running from $x_T$ to $x_1$, and the resulting hidden layer states are:

$\overrightarrow{h_i} = \overrightarrow{\mathrm{GRU}}(x_i, \overrightarrow{h}_{i-1})$

$\overleftarrow{h_i} = \overleftarrow{\mathrm{GRU}}(x_i, \overleftarrow{h}_{i+1})$

the hidden layer states of the two directions are concatenated to obtain the hidden layer state of the input text in the encoding stage:

$h_i = [\overrightarrow{h_i}; \overleftarrow{h_i}]$
3. The text summary generation method of a variational generative decoder based on a copy mechanism according to claim 2, wherein the decoding process in step S2 includes latent sentence-structure modeling based on the VAE and copy-information modeling based on the copy mechanism, and the latent sentence-structure modeling based on the VAE is:

the hidden layer states of the decoding stage are calculated by two layers of GRUs, and the first-layer hidden layer state $h_t^{d_1}$ is calculated as follows:

$h_t^{d_1} = \mathrm{GRU}_1(y_{t-1}, h_{t-1}^{d_1})$

where $y_{t-1}$ is the output at the previous time step and $h_{t-1}^{d_1}$ is the first-layer hidden layer state at the previous time step;

the first-layer hidden layer state at the current time step $h_t^{d_1}$ and all hidden layer states of the encoding stage $h_i$ are used to calculate the attention weight distribution, the context semantic vector $c_t$ is the weighted sum of the hidden layer states of the input text content, and the attention weight $a_{ti}$ is calculated as follows:

$e_{ti} = v_a^{\top}\tanh(W_a h_t^{d_1} + U_a h_i + b_a)$

$a_{ti} = \dfrac{\exp(e_{ti})}{\sum_{k=1}^{T}\exp(e_{tk})}$

$c_t = \sum_{i=1}^{T} a_{ti} h_i$

where the attention weight measures the relevance between the output at time t and the content of the text, $h_i$ is the hidden layer state of the encoding stage at time i, $W_a$ and $U_a$ are weight matrices, and $b_a$ is a bias;

the second-layer hidden layer state $h_t^{d_2}$ introduces the attention mechanism into the decoding stage, so $c_t$ is added to its calculation:

$h_t^{d_2} = \mathrm{GRU}_2([y_{t-1}; c_t], h_{t-1}^{d_2})$

where $y_{t-1}$ is the output at the previous time step, $h_{t-1}^{d_2}$ is the second-layer hidden layer state at the previous time step, and $c_t$ is the context semantic vector;

in sentence modeling, to account for the sequential nature of generation, the output variables $y_{<t}$ and the latent structure variables $z_{<t}$ before time t are mapped to a posterior distribution $q_\phi(z_t \mid y_{<t}, z_{<t})$, which is used to approximate the true posterior distribution $p_\theta(z_t \mid y_{<t}, z_{<t})$; both distributions are assumed to be Gaussian, and $z$ is sampled from $q_\phi(z_t \mid y_{<t}, z_{<t})$; since the sampling operation is not differentiable, the reparameterization trick is introduced so that back-propagation during training can proceed, and a new sample is obtained through a change of parameters, the reparameterization formula being:

$z = \mu + \sigma\varepsilon$

where $\varepsilon \sim N(0, I)$ is a noise variable, and the Gaussian parameters $\mu$ and $\sigma$ are the variational mean and standard deviation, respectively;

the sampled latent variable $z$ contains the latent sentence-structure information of the target summary, and the latent structure variable $z$ and the second-layer hidden layer state $h_t^{d_2}$ are jointly mapped to a new hidden layer state that serves as the generation-part variable:

$h_t^{gen} = \tanh(W_g z_t + U_g h_t^{d_2} + b_g)$

where $\tanh(\cdot)$ is the activation function, $z$ is the latent structure variable, $h_t^{d_2}$ is the second-layer hidden layer state, $W_g$ and $U_g$ are weight matrices, and $b_g$ is a bias.
4. The text summary generation method of a variational generative decoder based on a copy mechanism according to claim 3, wherein the copy-information modeling based on the copy mechanism in step S2 is:

in order to solve the repetition problem in summary generation, a copy mechanism is introduced in the decoding stage, the context semantic vector $c_t$ is used as the copy vector, and a control gate $g_{switch}$ is calculated from the copy vector $c_t$, the second-layer hidden layer state $h_t^{d_2}$ and the latent structure variable $z_t$:

$g_{switch} = \mathrm{sigmoid}(W_s c_t + U_s h_t^{d_2} + V_s z_t + b_{switch})$

where $\mathrm{sigmoid}(\cdot)$ is the activation function, $c_t$ is the context semantic vector, $h_t^{d_2}$ is the second-layer hidden layer state, $W_s$, $U_s$ and $V_s$ are weight matrices, and $b_{switch}$ is a bias;

$g_{switch}$ assigns different weights to the generation-part variable and the copy-part variable: $g_{switch}$ is the weight of the generation part, and $(1 - g_{switch})$ is the weight of the copy part; using $c_t$ as the copy-part variable ensures that every final hidden layer state contains a different degree of copied information, which reinforces the attention mechanism once more and reduces the problem of repeated words;

the final hidden layer variable is calculated as follows:

$h_t = g_{switch}\, h_t^{gen} + (1 - g_{switch})\, c_t$

where $h_t^{gen}$ is the generation-part hidden state and $c_t$ is the context semantic vector; the final hidden layer variable $h_t$ is passed through a softmax(·) function to obtain the probability distribution of the target word $y_t$, as shown in the following formula:

$P(y_t \mid y_{<t}, X) = \mathrm{softmax}(W_o h_t + b_o)$

where $W_o$ is a weight matrix, $b_o$ is a bias, and $h_t$ is the final hidden layer variable.
5. The text summary generation method of a variational generative decoder based on a copy mechanism according to claim 4, wherein the specific process of step S3 is:

the network is optimized with a loss function consisting of two parts: the negative log-likelihood of the generated summary and the variational lower bound, where the log term in the variational lower bound is combined with the log-likelihood of the generated summary; the loss function is:

$\mathcal{L}(\theta,\phi) = -\sum_{n=1}^{N}\sum_{t=1}^{T}\Big[\log p_\theta\big(y_t^{(n)} \mid y_{<t}^{(n)}, X^{(n)}\big) - D_{KL}\big(q_\phi(z_t \mid y_{<t}, z_{<t}) \,\|\, p_\theta(z_t \mid y_{<t}, z_{<t})\big)\Big]$

where $\{X^{(n)}\}$ denotes the training text content, $\{y^{(n)}\}$ denotes the generated target summary, $D_{KL}[\cdot]$ is the KL divergence, $N$ is the number of samples, and $T$ is the length of the output sequence.
CN201910872440.9A 2019-09-16 2019-09-16 Text abstract generating method of variation generation decoder based on copying mechanism Pending CN110825869A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910872440.9A CN110825869A (en) 2019-09-16 2019-09-16 Text abstract generating method of variation generation decoder based on copying mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910872440.9A CN110825869A (en) 2019-09-16 2019-09-16 Text abstract generating method of variation generation decoder based on copying mechanism

Publications (1)

Publication Number Publication Date
CN110825869A true CN110825869A (en) 2020-02-21

Family

ID=69548131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910872440.9A Pending CN110825869A (en) 2019-09-16 2019-09-16 Text abstract generating method of variation generation decoder based on copying mechanism

Country Status (1)

Country Link
CN (1) CN110825869A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113626614A (en) * 2021-08-19 2021-11-09 车智互联(北京)科技有限公司 Method, device, equipment and storage medium for constructing information text generation model

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032638A (en) * 2019-04-19 2019-07-19 中山大学 A kind of production abstract extraction method based on coder-decoder

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032638A (en) * 2019-04-19 2019-07-19 中山大学 A kind of production abstract extraction method based on coder-decoder

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113626614A (en) * 2021-08-19 2021-11-09 车智互联(北京)科技有限公司 Method, device, equipment and storage medium for constructing information text generation model
CN113626614B (en) * 2021-08-19 2023-10-20 车智互联(北京)科技有限公司 Method, device, equipment and storage medium for constructing information text generation model

Similar Documents

Publication Publication Date Title
CN110598221B (en) Method for improving translation quality of Mongolian Chinese by constructing Mongolian Chinese parallel corpus by using generated confrontation network
CN112115687B (en) Method for generating problem by combining triplet and entity type in knowledge base
CN110795556A (en) Abstract generation method based on fine-grained plug-in decoding
CN107967262A (en) A kind of neutral net covers Chinese machine translation method
CN112765345A (en) Text abstract automatic generation method and system fusing pre-training model
CN113158665A (en) Method for generating text abstract and generating bidirectional corpus-based improved dialog text
CN110032638B (en) Encoder-decoder-based generative abstract extraction method
CN111708877B (en) Text abstract generation method based on key information selection and variational potential variable modeling
CN108845994B (en) Neural machine translation system using external information and training method of translation system
CN111666756B (en) Sequence model text abstract generation method based on theme fusion
CN111125333B (en) Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism
CN111160452A (en) Multi-modal network rumor detection method based on pre-training language model
CN111581374A (en) Text abstract obtaining method and device and electronic equipment
CN110084297B (en) Image semantic alignment system for small samples
CN109992775A (en) A kind of text snippet generation method based on high-level semantics
CN113657123A (en) Mongolian aspect level emotion analysis method based on target template guidance and relation head coding
CN108763230B (en) Neural machine translation method using external information
CN113723103A (en) Chinese medical named entity and part-of-speech combined learning method integrating multi-source knowledge
CN115719072A (en) Chapter-level neural machine translation method and system based on mask mechanism
CN112380882B (en) Mongolian Chinese neural machine translation method with error correction function
CN116720531B (en) Mongolian neural machine translation method based on source language syntax dependency and quantization matrix
CN110825869A (en) Text abstract generating method of variation generation decoder based on copying mechanism
CN111191023B (en) Automatic generation method, device and system for topic labels
CN112464673B (en) Language meaning understanding method for fusing meaning original information
CN112989845B (en) Chapter-level neural machine translation method and system based on routing algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200221