CN116501863A - Text abstract generation method and device based on natural language processing - Google Patents
Text abstract generation method and device based on natural language processing
- Publication number
- CN116501863A (application number CN202310793395.4A)
- Authority
- CN
- China
- Prior art keywords
- text
- vector
- probability
- attention
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to the technical field of automatic abstract generation, and discloses a text abstract generation method and device based on natural language processing. The method comprises the following steps: splitting a text into a query sequence, a key sequence and a value sequence by using a linear layer in a pre-constructed Fastformer model; pre-encoding the query sequence, the key sequence and the value sequence based on the Fastformer model to obtain word embedding vectors; importing the word embedding vectors into a pre-constructed pointer generation network model to obtain an attention probability distribution; optimizing the attention probability distribution through a pre-constructed coverage vector; constructing an abstract vocabulary probability model according to the optimized attention probability distribution and calculating the abstract vocabulary probability of each target word with it; and copying or constructing words from the text based on the abstract vocabulary probability model to generate a text abstract. The invention can solve the problem that the traditional Sequence-to-Sequence model interprets the original text inaccurately, thereby generating a higher-quality and more accurate text abstract, and has important value in natural language processing fields such as information extraction and text understanding.
Description
Technical Field
The present invention relates to the field of automatic abstract generation technology, and in particular, to a method and apparatus for generating a text abstract based on natural language processing, an electronic device, and a computer readable storage medium.
Background
With the rapid development of the internet, the amount of information on the network keeps growing, and how to extract key information from massive information has become a research hotspot. The text abstract generation task extracts the most important information from a text, and mainly forms an abstract by selecting words that clearly express the text's meaning or by re-describing its sentences.
Current text abstract generation methods mainly adopt the conventional Sequence-to-Sequence model, which extracts contextual information well, generates text freely, and is highly flexible. However, the conventional Sequence-to-Sequence model does not understand the original text very accurately, so the generated text abstract summarizes the original text poorly.
Disclosure of Invention
The invention provides a text abstract generation method, a device and a computer readable storage medium based on natural language processing, which mainly aim to solve the problem that the traditional Sequence-to-Sequence model is inaccurate in understanding an original text, so that the generated text abstract has poor summarization effect on the original text.
In order to achieve the above object, the present invention provides a text abstract generating method based on natural language processing, including:
Splitting a preset text by utilizing a linear layer in a pre-constructed Fastformer model to obtain a query sequence, a key sequence and a value sequence, wherein the text consists of target words;
pre-coding the query sequence, the key sequence and the value sequence based on a Fastformer model to obtain word embedding vectors;
importing the word embedded vector into a pre-constructed pointer generation network model to obtain attention probability distribution, wherein the pointer generation network model consists of an encoder and a decoder;
optimizing the attention probability distribution according to the pre-constructed coverage vector to obtain an optimized attention probability distribution;
constructing a summary vocabulary probability model according to the optimized attention probability distribution, wherein the summary vocabulary probability model is as follows:
P(w) = P_gen · P_vocab(w) + P_copy · Σ_{i: w_i = w} a_i^t

wherein w denotes the tag of a target word, P(w) denotes the abstract vocabulary probability of the target word whose tag is w, P_copy denotes the copied vocabulary probability of the target word whose tag is w, P_gen denotes the generated word probability of the target word whose tag is w, P_vocab(w) denotes the probability of the word whose tag is w under the decoder vocabulary distribution, a_i^t denotes the optimized attention probability distribution at time t of the i-th target word whose tag is w (the sum running over the n target-word positions of the text), and n denotes the number of target words in the text;
And copying or constructing words from the text based on the abstract word probability model to obtain a text abstract.
Optionally, the pre-encoding the query sequence, the key sequence and the value sequence based on the Fastformer model to obtain a word embedding vector, including:
converting the query sequence according to an additive attention mechanism in a Fastformer model to obtain a global query vector;
performing interactive modeling on the global query vector and a key sequence based on a pre-constructed element level multiplication to obtain a key matrix;
converting the key matrix according to the additive attention mechanism to obtain a global key vector;
performing interactive modeling on the global key vector and the value sequence based on the element level multiplication to obtain a global attention matrix;
and adding the global attention matrix with the query sequence to obtain a word embedding vector.
Alternatively, the global query vector and the global key vector may be calculated by the following formula:
wherein,,representing global query vectors,/->Representing global key vectors,/->And->Indicate->Person and->A query sequence->And->Representing the%>Person and->Vector(s) >Representing the total number of query sequences, +.>Representing the number of vectors in the key matrix, +.>And->Representing a learnable parameter vector,/->Is->Transposed matrix of>Is->Transposed matrix of>Representing the hidden layer dimension size of the text.
Optionally, the importing the word embedding vector into a pre-constructed pointer to generate a network model to obtain the attention probability distribution includes:
importing the word embedding vector into the encoder to perform feature extraction to obtain an encoder hiding state vector;
importing the word embedded vector into the decoder to perform feature extraction to obtain a decoder hidden state vector;
obtaining an attention probability distribution of the decoder hidden state vector to an encoder hidden state vector according to a pre-constructed attention generation formula, wherein the attention generation formula is represented by:
wherein,,representation->Attention probability distribution of time of day->、/>、/>And->Representing a learnable parameter->Representing encoder hidden state vector,/->Representation->The decoder conceals the state vector at time of day.
Optionally, the optimizing the attention probability distribution according to the pre-constructed coverage vector, to obtain an optimized attention probability distribution, includes:
Summing the attention probability distribution according to a pre-constructed summation formula to obtain a coverage vector, wherein the summation formula is as follows:
wherein,,representation->Time coverage vector->;
And optimizing the attention probability distribution based on the coverage vector to obtain the optimized attention probability distribution.
Alternatively, the optimized attention probability distribution may be calculated by the following equation:
wherein,,representation->Optimized attention probability distribution of time instant +.>Representing the learnable parameters.
Optionally, the constructing a summary vocabulary probability model according to the optimized attention probability distribution includes:
based on the optimized attention probability distribution, a context vector is calculated according to a pre-constructed weighted summation formula, wherein the weighted summation formula is represented by:
wherein,,representation->Context vector at time of day;
calculating the probability of duplicate vocabulary and the probability of generating words according to the context vector and the hidden state vector of the decoder;
and constructing a summary vocabulary probability model based on the duplication vocabulary probability and the generated word probability.
Alternatively, the duplicate vocabulary probability and the generated word probability may be calculated by the following equation:
Wherein,,the sign label is +>The lexical probability of duplication of the target word of +.>The sign label is +>Word probability of generation of target word of +.>Representing the activation function Sigmoid function.
Optionally, the loss function employed by the pointer generation network model is as follows:
wherein,,representing the +.>Loss value at time of day->Representation->Tag of target word at moment +.>Representing the target word +.>Digest vocabulary probability>Representing the super parameter.
In order to solve the above problems, the present invention further provides a device for generating a text abstract based on natural language processing, the device comprising:
the text receiving module is used for splitting a preset text by utilizing a linear layer in a preset Fastformer model to obtain a query sequence, a key sequence and a value sequence, wherein the text consists of target words;
the pre-coding module is used for pre-coding the query sequence, the key sequence and the value sequence based on the Fastformer model to obtain word embedding vectors;
the optimized attention probability distribution module is used for guiding the word embedded vector into a pre-constructed pointer generation network model to obtain attention probability distribution, wherein the pointer generation network model consists of an encoder and a decoder; optimizing the attention probability distribution according to the pre-constructed coverage vector to obtain an optimized attention probability distribution;
The abstract vocabulary probability module is used for constructing an abstract vocabulary probability model according to the optimized attention probability distribution, wherein the abstract vocabulary probability model is as follows:
P(w) = P_gen · P_vocab(w) + P_copy · Σ_{i: w_i = w} a_i^t

wherein w denotes the tag of a target word, P(w) denotes the abstract vocabulary probability of the target word whose tag is w, P_copy denotes the copied vocabulary probability of the target word whose tag is w, P_gen denotes the generated word probability of the target word whose tag is w, P_vocab(w) denotes the probability of the word whose tag is w under the decoder vocabulary distribution, a_i^t denotes the optimized attention probability distribution at time t of the i-th target word whose tag is w (the sum running over the n target-word positions of the text), and n denotes the number of target words in the text;
and the text abstract generating module is used for copying or constructing words from the text based on the abstract word probability model to obtain a text abstract.
In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:
at least one processor;
and a memory communicatively coupled to the at least one processor;
the memory stores instructions executable by the at least one processor to implement the text abstract generation method based on natural language processing.
In order to solve the above-mentioned problems, the present invention also provides a computer-readable storage medium having stored therein at least one instruction that is executed by a processor in an electronic device to implement the above-mentioned text digest generation method based on natural language processing.
In the present invention, a preset text is first split by using a linear layer in a pre-constructed Fastformer model to obtain a query sequence, a key sequence and a value sequence; the query sequence, the key sequence and the value sequence are pre-encoded based on the Fastformer model to obtain word embedding vectors; the word embedding vectors are imported into a pre-constructed pointer generation network model, which consists of an encoder and a decoder, to obtain an attention probability distribution; and the attention probability distribution is optimized according to a pre-constructed coverage vector to obtain an optimized attention probability distribution. An abstract vocabulary probability model is then constructed according to the optimized attention probability distribution, and the abstract vocabulary probability of each target word is calculated with it. Because the abstract vocabulary probability model is built from both the generated word probability and the copied vocabulary probability, target words can be copied directly from the text. Therefore, the text abstract generation method, apparatus, electronic device and computer readable storage medium based on natural language processing can solve the problem that the traditional Sequence-to-Sequence model understands the original text inaccurately, so that the generated text abstract summarizes the original text poorly.
Drawings
FIG. 1 is a flowchart of a text abstract generation method based on natural language processing according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of a text abstract generating device based on natural language processing according to an embodiment of the invention;
fig. 3 is a schematic structural diagram of an electronic device for implementing the text abstract generation method based on natural language processing according to an embodiment of the invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the application provides a text abstract generation method based on natural language processing. The execution subject of the text abstract generation method based on natural language processing includes, but is not limited to, at least one of a server, a terminal and the like, which can be configured to execute the method provided by the embodiment of the application. In other words, the text digest generation method based on natural language processing may be performed by software or hardware installed in a terminal device or a server device. The service end includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
Example 1:
referring to fig. 1, a flow chart of a text abstract generation method based on natural language processing according to an embodiment of the invention is shown. In this embodiment, the text abstract generating method based on natural language processing includes:
s1, splitting the pre-constructed text by utilizing a linear layer in a pre-constructed Fastformer model to obtain a query sequence, a key sequence and a value sequence, wherein the text consists of target words.
It is to be appreciated that the Fastformer model is a Transformer model based on additive attention, which has strong text understanding capability and can realize context modeling with linear complexity. The text pre-constructed in the embodiment of the invention is input in matrix form and is composed of target words, and the number of target words may be one or more. For example, if the preset text is "a pair of large eyes", the target words are "a pair", "large" and "eyes", etc.
It should be explained that the complexity of the attention computation in a text abstract generation model is quadratic in the sequence length of the text, so a very long input text sequence consumes a large amount of computing resources. The Fastformer model constructed in the embodiment of the invention therefore adopts a multi-head additive attention mechanism: in each attention head, a linear layer splits the pre-constructed text matrix into three different sequences, namely the query sequence, the key sequence and the value sequence, each represented in matrix form.
Illustratively, suppose Xiao Zhang wants to extract a text abstract of an article. The article is first input into the pre-constructed Fastformer model and represented as a matrix E = [e_1, e_2, e_3, ..., e_n], i.e., a series of vectors. The Fastformer model adopted in the embodiment of the invention has three independent linear layers, which convert the input matrix E into a query sequence Q = [q_1, q_2, q_3, ..., q_n], a key sequence K = [k_1, k_2, k_3, ..., k_n] and a value sequence V = [v_1, v_2, v_3, ..., v_n], thereby saving computing resources in the subsequent attention calculation.
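As an illustrative sketch of this splitting step only (the class name, dimensions and PyTorch framing below are assumptions for illustration, not part of the claimed method), the three independent linear layers can be written as follows:

```python
import torch
import torch.nn as nn

class QKVSplitter(nn.Module):
    """Illustrative sketch: three independent linear layers that map the
    input text matrix E into a query, a key and a value sequence."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.to_query = nn.Linear(hidden_dim, hidden_dim)
        self.to_key = nn.Linear(hidden_dim, hidden_dim)
        self.to_value = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, E: torch.Tensor):
        # E: (sequence_length, hidden_dim), one row per target word
        Q = self.to_query(E)   # query sequence
        K = self.to_key(E)     # key sequence
        V = self.to_value(E)   # value sequence
        return Q, K, V

# Usage: split a text of 6 target words with an assumed hidden dimension of 64
E = torch.randn(6, 64)
Q, K, V = QKVSplitter(64)(E)
```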
S2, pre-coding the query sequence, the key sequence and the value sequence based on a Fastformer model to obtain a word embedding vector.
In detail, the pre-coding the query sequence, the key sequence and the value sequence based on the Fastformer model to obtain a word embedding vector comprises the following steps:
converting the query sequence according to an additive attention mechanism in a Fastformer model to obtain a global query vector;
performing interactive modeling on the global query vector and a key sequence based on a pre-constructed element level multiplication to obtain a key matrix;
converting the key matrix according to the additive attention mechanism to obtain a global key vector;
Performing interactive modeling on the global key vector and the value sequence based on the element level multiplication to obtain a global attention matrix;
and adding the global attention matrix with the query sequence to obtain a word embedding vector.
It should be explained that, in order to fully extract the features of the text, the embodiment of the invention uses the Fastformer model as the precoder of the pointer generation network model, so that the model can fully understand the context information.
It should be appreciated that an attention mechanism enables a deep neural network to selectively focus on certain related items while ignoring others. There are generally two types of attention calculation: additive attention and multiplicative attention. The embodiment of the invention adopts an additive attention mechanism, which is based on a linear combination of a query vector and a context vector.
It should be emphasized that the embodiment of the present invention compresses the context information of the query matrix by using an additive attention mechanism, and summarizes the compressed context information into a vector, which is called a global query vector.
It should be explained that element-level multiplication is a mathematical operation, meaning multiplication of elements of corresponding rows and columns, which is an efficient way of modeling between two vectors. The complexity of attention calculation is reduced from the square of the sequence length N to N by adopting element-level multiplication, so that the consumption of calculation resources is greatly reduced, and the context information of the text can be effectively captured.
Illustratively, an element-level multiplication is carried out between the global query vector q and each vector k_i in the key sequence, i.e., interactive modeling is performed between the two, which can be expressed as u_i = q * k_i, thereby obtaining the key matrix u = [u_1, u_2, ..., u_n].
Notably, like the global query vector, the global key vector is also derived by an additive attention mechanism, which summarizes the key matrix into a single vector containing global context information.
It should be understood that, like the key matrix, the global attention matrix is also obtained by element-level multiplication: interactive modeling is performed between each vector of the value sequence and the global key vector to obtain the global attention matrix.
In detail, the global query vector and the global key vector may be calculated by the following formula:
wherein,,representing global query vectors,/->Representing global key vectors,/->And->Indicate->Person and->A query sequence->And->Representing the%>Person and->Vector(s)>Representing the total number of query sequences, +.>Representing the number of vectors in the key matrix, +.>And->Representing a learnable parameter vector,/- >Is->Transposed matrix of>Is->Transposed matrix of>Representing the hidden layer dimension size of the text.
S3, importing the word embedded vector into a pre-constructed pointer generation network model to obtain the attention probability distribution, wherein the pointer generation network model consists of an encoder and a decoder.
It should be explained that the pointer generation network model is a generative text abstract model constructed on the basis of the Sequence-to-Sequence model. The pointer generation network model consists of an encoder and a decoder, wherein the encoder adopts a single-layer bidirectional long short-term memory (LSTM) neural network and the decoder adopts a unidirectional LSTM neural network.
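A minimal sketch of this encoder/decoder backbone, assuming PyTorch and illustrative dimensions, is given below:

```python
import torch
import torch.nn as nn

class PGNEncoderDecoder(nn.Module):
    """Illustrative sketch: single-layer bidirectional LSTM encoder and
    unidirectional LSTM decoder of the pointer generation network."""
    def __init__(self, d: int):
        super().__init__()
        self.encoder = nn.LSTM(d, d, num_layers=1, bidirectional=True)
        self.decoder = nn.LSTM(2 * d, 2 * d, num_layers=1, bidirectional=False)

    def forward(self, embeddings, decoder_inputs):
        # embeddings: (src_len, 1, d) word embedding vectors from the precoder
        # decoder_inputs: (tgt_len, 1, 2d) inputs to the decoder at each step
        encoder_states, _ = self.encoder(embeddings)       # (src_len, 1, 2d) encoder hidden state vectors
        decoder_states, _ = self.decoder(decoder_inputs)   # (tgt_len, 1, 2d) decoder hidden state vectors
        return encoder_states, decoder_states
```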
In detail, the step of importing the word embedding vector into a pre-constructed pointer to generate a network model to obtain an attention probability distribution includes:
importing the word embedding vector into the encoder to perform feature extraction to obtain an encoder hiding state vector;
importing the word embedded vector into the decoder to perform feature extraction to obtain a decoder hidden state vector;
obtaining an attention probability distribution of the decoder hidden state vector to an encoder hidden state vector according to a pre-constructed attention generation formula, wherein the attention generation formula is represented by:
e_i^t = v^T · tanh(W_h · h_i + W_s · s_t + b_attn),  a^t = softmax(e^t)

wherein a^t denotes the attention probability distribution at time t, v, W_h, W_s and b_attn denote learnable parameters, h_i denotes the encoder hidden state vector, and s_t denotes the decoder hidden state vector at time t.
It should be understood that, in the embodiment of the present invention, the attention probability distribution reflects the similarity between the decoder hidden state vector and the encoder hidden state vector: the greater the similarity between the two, the larger the attention probability. The extracted decoder hidden state vector and encoder hidden state vector capture features such as word meaning and content, and if these features are nearly the same, the attention probability is high.
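A minimal sketch of the attention computation described by the formula above (tensor shapes and parameter names are illustrative assumptions) is:

```python
import torch
import torch.nn as nn

class PointerAttention(nn.Module):
    """Illustrative sketch of the attention probability distribution of the
    decoder hidden state over the encoder hidden states."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.W_h = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.W_s = nn.Linear(hidden_dim, hidden_dim, bias=True)  # bias plays the role of b_attn
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, encoder_states, decoder_state):
        # encoder_states: (src_len, hidden_dim); decoder_state: (hidden_dim,)
        energy = self.v(torch.tanh(self.W_h(encoder_states) + self.W_s(decoder_state)))
        return torch.softmax(energy.squeeze(-1), dim=0)  # attention probability distribution a^t
```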
In detail, the loss function employed by the pointer generation network model is as follows:
loss_t = -log P(w*_t) + λ · Σ_i min(a_i^t, c_i^t)

wherein loss_t denotes the loss value of the pointer generation network model at time t, w*_t denotes the tag of the target word at time t, P(w*_t) denotes the abstract vocabulary probability of the target word w*_t, and λ denotes a hyperparameter weighting the coverage penalty term.
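The training loss can be sketched as follows. The coverage vector c^t used here is the one constructed in step S4 below, and the min-based coverage penalty is an assumed form chosen to be consistent with the repetition penalty described there:

```python
import torch

def pointer_generator_loss(p_target, attention, coverage, lam=1.0):
    """Illustrative sketch of the loss at one decoding step.
    p_target: abstract vocabulary probability P(w*_t) of the reference target word.
    attention, coverage: a^t and c^t over the source positions (c^t is built in step S4).
    lam: the hyperparameter weighting the coverage penalty (assumed form)."""
    nll = -torch.log(p_target + 1e-12)                           # negative log-likelihood term
    coverage_penalty = torch.minimum(attention, coverage).sum()  # penalizes repeatedly attended positions
    return nll + lam * coverage_penalty
```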
And S4, optimizing the attention probability distribution according to the pre-constructed coverage vector to obtain an optimized attention probability distribution.
It should be explained that, since the pointer generation network model is built on the basis of the Sequence-to-Sequence model, it tends to repeat itself, with the drawback that the generated text abstract contains too many repeated words. Therefore, the embodiment of the invention adopts a coverage vector to alleviate the word repetition problem in the text abstract and to penalize repeatedly attended positions in the pointer generation network model.
It should be appreciated that the coverage vector is the sum of the attention probability distributions over previous time steps. The attention probability distribution is optimized through the coverage vector to obtain the optimized attention probability distribution, which ensures that the current attention distribution takes the attention decisions of all previous time steps into account, so that the attention mechanism avoids repeatedly attending to the same position and generating repeated content.
In detail, the optimizing the attention probability distribution according to the pre-constructed coverage vector to obtain an optimized attention probability distribution includes:
summing the attention probability distributions over all previous decoding time steps according to a pre-constructed summation formula to obtain the coverage vector, wherein the summation formula is as follows:

c^t = Σ_{t'=0}^{t-1} a^{t'}

wherein c^t denotes the coverage vector at time t;
and optimizing the attention probability distribution based on the coverage vector to obtain the optimized attention probability distribution.
In detail, the optimized attention probability distribution may be calculated by the following formula:
wherein,,representation->Optimized attention probability distribution of time instant +.>Representing the learnable parameters.
S5, constructing a summary vocabulary probability model according to the optimized attention probability distribution.
In detail, the abstract vocabulary probability model is as follows:
P(w) = P_gen · P_vocab(w) + P_copy · Σ_{i: w_i = w} a_i^t

wherein w denotes the tag of a target word, P(w) denotes the abstract vocabulary probability of the target word whose tag is w, P_copy denotes the copied vocabulary probability of the target word whose tag is w, P_gen denotes the generated word probability of the target word whose tag is w, P_vocab(w) denotes the probability of the word whose tag is w under the decoder vocabulary distribution, a_i^t denotes the optimized attention probability distribution at time t of the i-th target word whose tag is w (the sum running over the n target-word positions of the text), and n denotes the number of target words in the text.
It should be explained that the copied vocabulary probability corresponds to the distribution of the target word whose tag is w over the text, i.e., the sum of its optimized attention probability distribution at the positions where it appears, while the generated word probability corresponds to the probability distribution of the target word whose tag is w over the vocabulary. In the abstract vocabulary probability model, the generation probability acts as a soft switch: the probability distributions of the target word over the vocabulary and over the text are weighted and averaged to obtain the abstract vocabulary probability, which determines whether the word whose tag is w is constructed from the vocabulary or copied from the text according to the attention distribution.
It should be appreciated that when a target word lies outside the vocabulary, its probability under the vocabulary distribution is zero, and the text abstract copies the word from the text according to the resulting abstract vocabulary probability, i.e., according to the size of the optimized attention distribution of the target word. Likewise, if the target word does not appear in the text, the attention-based copy term is zero, and the text abstract selects words from the vocabulary according to the generated word probability to construct the abstract.
In detail, the constructing a summary vocabulary probability model according to the optimized attention probability distribution includes:
based on the optimized attention probability distribution, a context vector is calculated according to a pre-constructed weighted summation formula, wherein the weighted summation formula is represented by:
wherein,,representation->Context vector at time of day;
calculating the probability of duplicate vocabulary and the probability of generating words according to the context vector and the hidden state vector of the decoder;
and constructing a summary vocabulary probability model based on the duplication vocabulary probability and the generated word probability.
In detail, the duplicate vocabulary probability and the generated word probability may be calculated by the following formula:
wherein,,the sign label is +>The lexical probability of duplication of the target word of +.>The sign label is +>Word probability of generation of target word of +.>Representing the activation function Sigmoid function.
And S6, copying or constructing words from the text based on the abstract word probability model to obtain a text abstract.
It should be explained that a purely generative abstract model rewrites the original text: it is allowed to generate new words instead of composing the abstract only from phrases of the original text, so such a model cannot copy words directly from the text. In the embodiment of the invention, the pointer generation network fuses the extractive and generative methods, giving the model the ability to copy words from the text. By constructing the abstract vocabulary probability model, target words can be copied from the text or generated from the vocabulary based on the generated word probability and the copied vocabulary probability, thereby obtaining the text abstract.
In the embodiment of the invention, the preset text is first split by using the linear layer in the pre-constructed Fastformer model to obtain the query sequence, the key sequence and the value sequence; the query sequence, the key sequence and the value sequence are pre-encoded based on the Fastformer model to obtain the word embedding vectors; the word embedding vectors are imported into the pre-constructed pointer generation network model, which consists of an encoder and a decoder, to obtain the attention probability distribution; the attention probability distribution is optimized according to the pre-constructed coverage vector to obtain the optimized attention probability distribution; the abstract vocabulary probability model is constructed according to the optimized attention probability distribution; and the abstract vocabulary probability of each target word is calculated with the abstract vocabulary probability model, so that target words can be copied directly from the text. Therefore, the text abstract generation method, apparatus, electronic device and computer readable storage medium based on natural language processing can solve the problem that the traditional Sequence-to-Sequence model understands the original text inaccurately, so that the generated text abstract summarizes the original text poorly.
Example 2:
fig. 2 is a functional block diagram of a text abstract generation apparatus based on natural language processing according to an embodiment of the invention.
The apparatus 100 for generating a text abstract based on natural language processing according to the present invention may be installed in an electronic device. Depending on the implemented functions, the device for generating a text abstract based on natural language processing 100 may include a text receiving module 101, a pre-encoding module 102, an optimized attention probability distribution module 103, an abstract vocabulary probability module 104, and a text abstract generating module 105. The module of the invention, which may also be referred to as a unit, refers to a series of computer program segments, which are stored in the memory of the electronic device, capable of being executed by the processor of the electronic device and of performing a fixed function.
The text receiving module 101 is configured to split a preset text by using a linear layer in a pre-constructed Fastformer model to obtain a query sequence, a key sequence and a value sequence, where the text is composed of target words;
the pre-coding module 102 is configured to pre-code the query sequence, the key sequence, and the value sequence based on a Fastformer model to obtain a word embedding vector;
The optimized attention probability distribution module 103 is configured to import the word embedding vector into a pre-constructed pointer generation network model to obtain an attention probability distribution, where the pointer generation network model is composed of an encoder and a decoder; optimizing the attention probability distribution according to the pre-constructed coverage vector to obtain an optimized attention probability distribution;
the abstract vocabulary probability module 104 is configured to construct an abstract vocabulary probability model according to the optimized attention probability distribution, where the abstract vocabulary probability model is:
P(w) = P_gen · P_vocab(w) + P_copy · Σ_{i: w_i = w} a_i^t

wherein w denotes the tag of a target word, P(w) denotes the abstract vocabulary probability of the target word whose tag is w, P_copy denotes the copied vocabulary probability of the target word whose tag is w, P_gen denotes the generated word probability of the target word whose tag is w, P_vocab(w) denotes the probability of the word whose tag is w under the decoder vocabulary distribution, a_i^t denotes the optimized attention probability distribution at time t of the i-th target word whose tag is w (the sum running over the n target-word positions of the text), and n denotes the number of target words in the text;
the text abstract generating module 105 is configured to copy or construct words from the text based on the abstract word probability model, so as to obtain a text abstract.
In detail, the modules in the device 100 for generating a text abstract based on natural language processing in the embodiment of the invention adopt the same technical means as the method for generating a text abstract based on natural language processing in fig. 1, and can generate the same technical effects, which are not described herein.
Example 3:
fig. 3 is a schematic structural diagram of an electronic device for implementing a text abstract generation method based on natural language processing according to an embodiment of the invention.
The electronic device 1 may comprise a processor 10, a memory 11, a bus 12 and a communication interface 13, and may further comprise a computer program stored in the memory 11 and executable on the processor 10, such as a generation program of a text excerpt based on natural language processing.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as code of a generation program of a text digest based on natural language processing, but also for temporarily storing data that has been output or is to be output.
The processor 10 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects respective components of the entire electronic device using various interfaces and lines, executes or executes programs or modules (e.g., a text digest generation program based on natural language processing, etc.) stored in the memory 11, and invokes data stored in the memory 11 to perform various functions of the electronic device 1 and process data.
The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 3 shows only an electronic device with components, it being understood by a person skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or may combine certain components, or may be arranged in different components.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
Further, the electronic device 1 may also comprise a network interface, optionally the network interface may comprise a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a Display, an input unit, such as a Keyboard (Keyboard), or a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only and are not limited to this configuration in the scope of the patent application.
The generation program of the text excerpt based on natural language processing stored in the memory 11 in the electronic device 1 is a combination of a plurality of instructions, which when executed in the processor 10, can implement:
splitting a preset text by utilizing a linear layer in a pre-constructed Fastformer model to obtain a query sequence, a key sequence and a value sequence, wherein the text consists of target words;
Pre-coding the query sequence, the key sequence and the value sequence based on a Fastformer model to obtain word embedding vectors;
importing the word embedded vector into a pre-constructed pointer generation network model to obtain attention probability distribution, wherein the pointer generation network model consists of an encoder and a decoder;
optimizing the attention probability distribution according to the pre-constructed coverage vector to obtain an optimized attention probability distribution;
constructing a summary vocabulary probability model according to the optimized attention probability distribution, wherein the summary vocabulary probability model is as follows:
P(w) = P_gen · P_vocab(w) + P_copy · Σ_{i: w_i = w} a_i^t

wherein w denotes the tag of a target word, P(w) denotes the abstract vocabulary probability of the target word whose tag is w, P_copy denotes the copied vocabulary probability of the target word whose tag is w, P_gen denotes the generated word probability of the target word whose tag is w, P_vocab(w) denotes the probability of the word whose tag is w under the decoder vocabulary distribution, a_i^t denotes the optimized attention probability distribution at time t of the i-th target word whose tag is w (the sum running over the n target-word positions of the text), and n denotes the number of target words in the text;
and copying or constructing words from the text based on the abstract word probability model to obtain a text abstract.
Specifically, the specific implementation method of the above instruction by the processor 10 may refer to descriptions of related steps in the corresponding embodiments of fig. 1 to 2, which are not repeated herein.
Further, the modules/units integrated in the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. The computer readable storage medium may be volatile or nonvolatile. For example, the computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM).
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, can implement:
splitting a preset text by utilizing a linear layer in a pre-constructed Fastformer model to obtain a query sequence, a key sequence and a value sequence, wherein the text consists of target words;
pre-coding the query sequence, the key sequence and the value sequence based on a Fastformer model to obtain word embedding vectors;
importing the word embedded vector into a pre-constructed pointer generation network model to obtain attention probability distribution, wherein the pointer generation network model consists of an encoder and a decoder;
Optimizing the attention probability distribution according to the pre-constructed coverage vector to obtain an optimized attention probability distribution;
constructing a summary vocabulary probability model according to the optimized attention probability distribution, wherein the summary vocabulary probability model is as follows:
P(w) = P_gen · P_vocab(w) + P_copy · Σ_{i: w_i = w} a_i^t

wherein w denotes the tag of a target word, P(w) denotes the abstract vocabulary probability of the target word whose tag is w, P_copy denotes the copied vocabulary probability of the target word whose tag is w, P_gen denotes the generated word probability of the target word whose tag is w, P_vocab(w) denotes the probability of the word whose tag is w under the decoder vocabulary distribution, a_i^t denotes the optimized attention probability distribution at time t of the i-th target word whose tag is w (the sum running over the n target-word positions of the text), and n denotes the number of target words in the text;
and copying or constructing words from the text based on the abstract word probability model to obtain a text abstract.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.
Claims (10)
1. A method for generating a text abstract based on natural language processing, the method comprising:
splitting a preset text by utilizing a linear layer in a pre-constructed Fastformer model to obtain a query sequence, a key sequence and a value sequence, wherein the text consists of target words;
Pre-coding the query sequence, the key sequence and the value sequence based on a Fastformer model to obtain word embedding vectors;
importing the word embedded vector into a pre-constructed pointer generation network model to obtain attention probability distribution, wherein the pointer generation network model consists of an encoder and a decoder;
optimizing the attention probability distribution according to the pre-constructed coverage vector to obtain an optimized attention probability distribution;
constructing a summary vocabulary probability model according to the optimized attention probability distribution, wherein the summary vocabulary probability model is as follows:
P(w) = P_gen · P_vocab(w) + P_copy · Σ_{i: w_i = w} a_i^t

wherein w denotes the tag of a target word, P(w) denotes the abstract vocabulary probability of the target word whose tag is w, P_copy denotes the copied vocabulary probability of the target word whose tag is w, P_gen denotes the generated word probability of the target word whose tag is w, P_vocab(w) denotes the probability of the word whose tag is w under the decoder vocabulary distribution, a_i^t denotes the optimized attention probability distribution at time t of the i-th target word whose tag is w (the sum running over the n target-word positions of the text), and n denotes the number of target words in the text;
and copying or constructing words from the text based on the abstract word probability model to obtain a text abstract.
2. The method for generating a text abstract based on natural language processing according to claim 1, wherein the pre-coding the query sequence, the key sequence and the value sequence based on the Fastformer model to obtain word embedding vectors comprises:
Converting the query sequence according to an additive attention mechanism in a Fastformer model to obtain a global query vector;
performing interactive modeling on the global query vector and a key sequence based on a pre-constructed element level multiplication to obtain a key matrix;
converting the key matrix according to the additive attention mechanism to obtain a global key vector;
performing interactive modeling on the global key vector and the value sequence based on the element level multiplication to obtain a global attention matrix;
and adding the global attention matrix with the query sequence to obtain a word embedding vector.
3. The method for generating a text abstract based on natural language processing of claim 2, wherein the global query vector and the global key vector are calculated by:
α_i = exp(w_q^T · q_i / √d) / Σ_{j=1}^{N} exp(w_q^T · q_j / √d),  q = Σ_{i=1}^{N} α_i · q_i

β_i = exp(w_k^T · u_i / √d) / Σ_{j=1}^{M} exp(w_k^T · u_j / √d),  k = Σ_{i=1}^{M} β_i · u_i

wherein q denotes the global query vector, k denotes the global key vector, q_i and q_j denote the i-th and j-th vectors in the query sequence, u_i and u_j denote the i-th and j-th vectors in the key matrix, N denotes the total number of vectors in the query sequence, M denotes the number of vectors in the key matrix, w_q and w_k denote learnable parameter vectors, w_q^T and w_k^T denote their transposes, and d denotes the hidden layer dimension size of the text.
4. The method for generating a text abstract based on natural language processing of claim 2, wherein said importing said word embedding vector into a pre-constructed pointer generation network model, obtaining an attention probability distribution, comprises:
importing the word embedding vector into the encoder to perform feature extraction to obtain an encoder hiding state vector;
importing the word embedded vector into the decoder to perform feature extraction to obtain a decoder hidden state vector;
obtaining an attention probability distribution of the decoder hidden state vector to an encoder hidden state vector according to a pre-constructed attention generation formula, wherein the attention generation formula is represented by:
e_i^t = v^T · tanh(W_h · h_i + W_s · s_t + b_attn),  a^t = softmax(e^t)

wherein a^t denotes the attention probability distribution at time t, v, W_h, W_s and b_attn denote learnable parameters, h_i denotes the encoder hidden state vector, and s_t denotes the decoder hidden state vector at time t.
5. The method for generating a text abstract based on natural language processing of claim 4, wherein optimizing the attention probability distribution according to a pre-constructed coverage vector to obtain an optimized attention probability distribution comprises:
Summing the attention probability distribution according to a pre-constructed summation formula to obtain a coverage vector, wherein the summation formula is as follows:
c^t = Σ_{t'=0}^{t-1} a^{t'}

wherein c^t denotes the coverage vector at time t, i.e. the sum of the attention probability distributions of all previous decoding time steps;
And optimizing the attention probability distribution based on the coverage vector to obtain the optimized attention probability distribution.
6. A method of generating a text excerpt based on natural language processing as recited in claim 5, wherein the optimized attention probability distribution is calculated by:
e_i^t = v^T · tanh(W_h · h_i + W_s · s_t + w_c · c_i^t + b_attn),  a^t = softmax(e^t)

wherein a^t denotes the optimized attention probability distribution at time t and w_c denotes a learnable parameter.
7. The method for generating a text abstract based on natural language processing of claim 6, wherein said constructing an abstract vocabulary probability model from said optimized attention probability distribution comprises:
based on the optimized attention probability distribution, a context vector is calculated according to a pre-constructed weighted summation formula, wherein the weighted summation formula is represented by:
h*_t = Σ_i a_i^t · h_i

wherein h*_t denotes the context vector at time t;
calculating the probability of duplicate vocabulary and the probability of generating words according to the context vector and the hidden state vector of the decoder;
and constructing a summary vocabulary probability model based on the duplication vocabulary probability and the generated word probability.
8. The method for generating a text abstract based on natural language processing of claim 7 wherein said replica vocabulary probability and said generated word probability are calculated by:
P_gen = σ(w_h*^T · h*_t + w_s^T · s_t + b_ptr),  P_copy = 1 - P_gen

wherein P_copy denotes the copied vocabulary probability of the target word whose tag is w, P_gen denotes the generated word probability of the target word whose tag is w, w_h*, w_s and b_ptr denote learnable parameters, and σ denotes the Sigmoid activation function.
9. The method for generating a text abstract based on natural language processing of claim 4 wherein a penalty function used by said pointer generation network model is as follows:
loss_t = -log P(w*_t) + λ · Σ_i min(a_i^t, c_i^t)

wherein loss_t denotes the loss value of the pointer generation network model at time t, w*_t denotes the tag of the target word at time t, P(w*_t) denotes the abstract vocabulary probability of the target word w*_t, and λ denotes a hyperparameter weighting the coverage penalty term.
10. A text abstract generation apparatus based on natural language processing, the apparatus comprising:
the text receiving module is used for splitting a preset text by utilizing a linear layer in a preset Fastformer model to obtain a query sequence, a key sequence and a value sequence, wherein the text consists of target words;
the pre-coding module is used for pre-coding the query sequence, the key sequence and the value sequence based on the Fastformer model to obtain word embedding vectors;
the optimized attention probability distribution module is used for importing the word embedding vector into a pre-constructed pointer generation network model to obtain an attention probability distribution, wherein the pointer generation network model consists of an encoder and a decoder; and optimizing the attention probability distribution according to the pre-constructed coverage vector to obtain an optimized attention probability distribution;
the abstract vocabulary probability module is used for constructing an abstract vocabulary probability model according to the optimized attention probability distribution, wherein the abstract vocabulary probability model is as follows:
$$P(w) = p_{gen}(w) + p_{copy}(w) \sum_{i:\, w_i = w} a_i^t$$

wherein $w$ represents the label of a target word, $P(w)$ represents the abstract vocabulary probability of the target word labeled $w$, $p_{copy}(w)$ represents the copy probability of the target word labeled $w$, $p_{gen}(w)$ represents the generation probability of the target word labeled $w$, $a_i^t$ represents the optimized attention probability distribution at time $t$ of the $i$-th target word whose label is $w$, and $N$ represents the number of target words in the text;
and the text abstract generating module is used for copying words from the text or generating words based on the abstract vocabulary probability model to obtain a text abstract.
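Finally, an illustrative combination of the generation and copy branches into an abstract vocabulary probability as in claim 10: attention mass is scattered back onto the source tokens and mixed with a vocabulary distribution by the switch probability. The use of an explicit vocabulary distribution `p_vocab` is an assumption; the claim's own formula may fold it into the generation term.

```python
import numpy as np

def final_distribution(p_vocab, a_t, source_ids, vocab_size, p_gen):
    """Mix the generation distribution with the copy distribution.

    p_vocab : (V,) softmax over the fixed vocabulary
    a_t : (T_enc,) optimized attention over source positions
    source_ids : (T_enc,) vocabulary id of each source token
    p_gen : scalar switch probability (the copy probability is 1 - p_gen)
    """
    p_final = p_gen * p_vocab
    copy_dist = np.zeros(vocab_size)
    np.add.at(copy_dist, source_ids, a_t)    # scatter attention onto source tokens
    return p_final + (1.0 - p_gen) * copy_dist

# toy usage: the summary word is then copied or generated, e.g. by argmax/sampling
p = final_distribution(np.full(10, 0.1), np.array([0.5, 0.3, 0.2]),
                       np.array([2, 2, 7]), 10, p_gen=0.6)
print(p.sum())   # ~1.0
```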
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310793395.4A CN116501863A (en) | 2023-06-30 | 2023-06-30 | Text abstract generation method and device based on natural language processing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116501863A true CN116501863A (en) | 2023-07-28 |
Family
ID=87323513
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310793395.4A Pending CN116501863A (en) | 2023-06-30 | 2023-06-30 | Text abstract generation method and device based on natural language processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116501863A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190371307A1 (en) * | 2018-05-31 | 2019-12-05 | Robert Bosch Gmbh | Slot Filling in Spoken Language Understanding with Joint Pointer and Attention |
CN110390103A (en) * | 2019-07-23 | 2019-10-29 | 中国民航大学 | Short text auto-abstracting method and system based on Dual-encoder |
CN114757177A (en) * | 2022-03-11 | 2022-07-15 | 重庆邮电大学 | Text summarization method for generating network based on BART fusion pointer |
CN115526149A (en) * | 2022-10-21 | 2022-12-27 | 重庆邮电大学 | Text summarization method for fusing double attention and generating confrontation network |
Non-Patent Citations (1)
Title |
---|
胡清丰 (HU Qingfeng) et al.: "Chinese Dialogue Text Summarization Model Based on Pointer-Generator Network" (基于指针生成网络的中文对话文本摘要模型), 《计算机系统应用》 (Computer Systems & Applications), pages 224-232 *
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20230728