CN114254175A - Method for extracting generative abstract of power policy file - Google Patents

Method for extracting generative abstract of power policy file

Info

Publication number
CN114254175A
CN114254175A (application CN202111550623.2A)
Authority
CN
China
Prior art keywords
attention
model
decoder
encoder
abstract
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111550623.2A
Other languages
Chinese (zh)
Inventor
郑福康
陈正飞
王嘉豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Power Supply Bureau Co Ltd
Original Assignee
Shenzhen Power Supply Bureau Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Power Supply Bureau Co Ltd filed Critical Shenzhen Power Supply Bureau Co Ltd
Priority to CN202111550623.2A priority Critical patent/CN114254175A/en
Publication of CN114254175A publication Critical patent/CN114254175A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for extracting a generative summary of a power policy file, comprising the following steps: step S10, acquiring an electronic document of the power policy file using web-crawler technology; step S11, performing word segmentation on the electronic document, forming initial embedding data with a word vector model, and inputting the initial embedding data into a pre-trained summary generation model; step S12, adding positional encoding to the bottom-layer embeddings of the encoder and the decoder; and step S13, automatically generating the summary content using the generation probability of a pointer-generator network, obtained by concatenating the decoder outputs at the current and previous time steps with the attention distribution. The invention can improve both the efficiency and the accuracy of summary generation.

Description

Method for extracting generative abstract of power policy file
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a method for extracting a generative (abstractive) automatic summary of a power policy file.
Background
For power supply enterprises, strengthening electricity price management is an important guarantee for realizing sales revenue and improving profitability. Strictly implementing national electricity price policies and regulations and standardizing the order of electricity price management are of great significance for ensuring national industrial-policy regulation, saving energy, and protecting the economic interests of both the power supplier and the power consumer. Electricity price policies therefore need to be tracked in a timely manner, so that reasonable electricity marketing strategies can be formulated and the development of power enterprises promoted.
The rise of artificial intelligence and deep learning is rapidly changing ingrained habits in daily work and life, and automatic summarization based on deep learning can bring this technical expertise to the field of electricity price policy management. Electricity price policy information is generally published on authoritative national websites, from which electronic policy documents can be obtained. To help managers quickly grasp the key content of an electricity price policy text, the key information in the electronic document needs to be extracted and a summary document automatically generated from it, helping policy makers obtain the relevant information more efficiently.
In the prior art, automatic summarization is realized mainly through extractive summarization and abstractive (generative) summarization. Extractive methods typically use the TextRank ranking algorithm and are widely applied in industry thanks to their simplicity and efficiency. However, extractive summarization mainly considers word frequency, carries little semantic information, and cannot build complete semantic information across text paragraphs.
Abstractive text summarization is mainly realized with deep neural networks. The basic framework is the Sequence-to-Sequence (Seq2Seq) model proposed by the Google Brain team in 2014, which adopts an encoder-decoder architecture. In the classical form, both the encoder and the decoder consist of several layers of RNN/LSTM units: the encoder encodes the source text into a vector, and the decoder extracts information from that vector, acquires the semantics, and generates the summary.
However, when existing automatic summarization techniques are applied to texts with a fixed writing format, such as scientific papers and policy documents, the summarization quality is still insufficient; the main problems are out-of-vocabulary (OOV) words, repetitive or incoherent text, and long-distance dependencies.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a method for extracting a generative summary of a power policy document that solves the above problems and improves the efficiency and accuracy of summary generation.
To solve the above technical problem, in one aspect of the present invention there is provided a method for extracting a generative summary of a power policy file, comprising the following steps:
step S10, acquiring an electronic document of the power policy file from a specified website using web-crawler technology;
step S11, performing word segmentation on the electronic document, forming initial embedding data with a word vector model, and inputting the initial embedding data into a pre-trained summary generation model;
wherein the summary generation model employs an encoder-decoder framework in which an attention-based bidirectional Transformer model is used as the language representation model;
the encoder consists of two sublayers, a multi-head attention layer and a fully connected feed-forward neural network layer, connected by residual connections followed by layer normalization; the decoder comprises at least a masked multi-head attention layer, a multi-head attention layer and a fully connected feed-forward neural network layer, with the sublayers likewise residually connected and layer-normalized;
step S12, adding positional encoding to the bottom-layer embeddings of the encoder and the decoder; and
step S13, obtaining the generation probability of a pointer-generator network (Pointer-Generator Network) by concatenating the decoder outputs at the current and previous time steps with the attention distribution, and, according to the generation probability, either copying content from the source text of the electronic document or generating the corresponding summary content from the attention.
Preferably, the method further comprises:
constructing a summary generation model in advance and training it to obtain the trained summary generation model.
Preferably, constructing the summary generation model in advance and training it to obtain the trained summary generation model further comprises:
constructing a summary generation model employing an encoder-decoder framework, wherein an attention-based bidirectional Transformer model is used in both the encoder and the decoder;
counting all words in the training corpus, generating a dictionary file, and forming a training set; and
initially embedding the dictionary file of the training set into the encoder of the summary generation model through the word vector model, and training the summary generation model to finally obtain the trained summary generation model.
Preferably, in step S13, the generation probability is obtained in the pointer-generator network by:
calculating, for each word-embedding input, the attention product with the decoder output, normalizing to obtain the weights, and taking the weighted sum to obtain the attention score $e_i$:

$e_i = v^{T} \tanh(W_h h_i + W_s s_t + W_c c_t)$

then applying a softmax operation to the attention scores to obtain the attention distribution:

$a_i = \mathrm{softmax}(e_i) = \dfrac{\exp(e_i)}{\sum_{k} \exp(e_k)}$

calculating the content vector $c_i$ and the vocabulary distribution $P_{vocab}$ by multiplying the attention with the encoder hidden states:

$c_i = \sum_{i} a_i h_i$

$P_{vocab} = \mathrm{softmax}(L(s_t, c_i))$

wherein $h_i$ denotes the encoder hidden state of the i-th word, $c_i$ the content vector, and $s_i$ the decoder hidden state of the i-th word;

calculating the generation probability $p_{gen}$ by:

$p_{gen} = \sigma(W_c' c_i + W_h' h_i + W_x x_t + b_{ptr})$

and obtaining the final vocabulary probability distribution by combining the vocabulary distribution and the attention distribution:

$P(w) = p_{gen}\, P_{vocab}(w) + (1 - p_{gen}) \sum_{i:\, w_i = w} a_i$

wherein $P_{vocab}$ is the vocabulary distribution and $a_i$ is the attention distribution.
Preferably, the pointer-generator network is a hierarchical pointer-generator network.
The implementation of the invention has the following beneficial effects:
the invention provides a method for extracting a generative abstract of a power policy file, which adopts a Seq2Seq framework integrated with an attention mechanism as a basic model for generating the abstract, and adds a pointer to generate a network at the same time, so that words are directly copied from a source document to solve the OOV problem;
and then combining a hierarchical structure of the policy document, and adding language model modeling language segment information of a language segment level (section level) on the basis of a pointer generation network. In the technology of language model modeling language segment information, the invention abandons the traditional RNN and LSTM structures, introduces a bidirectional converter model as a language representation model in a Seq2Seq framework integrated with an attention mechanism, and effectively solves the problem of long-distance dependence. The invention designs an improved attention mechanism to solve the problems of incoherent irrelevant content and repeated sentences in long texts.
The invention designs an automatic abstract identification method suitable for the long text aiming at the characteristics of the electricity price policy text that the long text and the writing format are relatively fixed, and integrates the special format characteristics of the automatic abstract identification method. The efficiency and the accuracy of the abstract extraction process can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is within the scope of the present invention for those skilled in the art to obtain other drawings based on the drawings without inventive exercise.
Fig. 1 is a schematic main flow chart of an embodiment of the method for extracting a generative summary of a power policy file according to the present invention;
Fig. 2 is a schematic diagram of the hierarchical pointer-generator network referred to in Fig. 1.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
Fig. 1 is a main flow diagram illustrating an embodiment of the method for extracting a generative summary of a power policy file according to the present invention. Referring also to Fig. 2, in this embodiment the method includes the following steps:
step S10, acquiring an electronic document of the power policy file from a specified website using web-crawler technology;
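By way of illustration, the following is a minimal sketch of the crawling in step S10. The URL and the CSS selector are hypothetical placeholders, and the patent does not name a crawling framework; requests with BeautifulSoup is merely one common choice.

```python
import requests
from bs4 import BeautifulSoup

def fetch_policy_document(url: str) -> str:
    """Download one policy page and return its plain text."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    # Government policy pages are often GBK/GB2312-encoded; let requests guess.
    resp.encoding = resp.apparent_encoding
    soup = BeautifulSoup(resp.text, "html.parser")
    # Hypothetical selector for the article body; adjust per site.
    body = soup.select_one("div.article-content") or soup
    return body.get_text(separator="\n", strip=True)

if __name__ == "__main__":
    text = fetch_policy_document("https://example.gov.cn/policy/12345.html")
    print(text[:500])
```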
step S11, performing word segmentation on the electronic document, forming initial embedding data with a word vector model, and inputting the initial embedding data into a pre-trained summary generation model;
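The segmentation and embedding of step S11 can be sketched as follows, assuming jieba for Chinese word segmentation and a gensim Word2Vec model as the word vector model; both choices are assumptions, since the patent only requires word segmentation and a word vector model.

```python
import jieba
import numpy as np
from gensim.models import Word2Vec

def build_word_vectors(segmented_sentences, dim=128):
    """Train a simple Word2Vec word vector model on pre-segmented sentences."""
    return Word2Vec(sentences=segmented_sentences, vector_size=dim,
                    window=5, min_count=1, workers=4)

def embed_document(text, w2v):
    """Segment a document and stack the vectors of in-vocabulary tokens."""
    tokens = jieba.lcut(text)
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.stack(vecs)  # shape: (num_tokens, dim), the initial embedding data
```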
wherein the summary generation model employs an encoder-decoder framework in which an attention-based bidirectional Transformer model is used as the language representation model;
the encoder consists of two sublayers, a multi-head attention layer and a fully connected feed-forward neural network layer, connected by residual connections followed by layer normalization; the decoder comprises at least a masked multi-head attention layer, a multi-head attention layer and a fully connected feed-forward neural network layer, with the sublayers likewise residually connected and layer-normalized;
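The encoder sublayer structure just described can be sketched in PyTorch as follows: one multi-head attention sublayer and one fully connected feed-forward sublayer, each wrapped in a residual connection followed by layer normalization. The hyper-parameters (d_model=512, 8 heads, d_ff=2048) are assumptions, not values given in the patent.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, pad_mask=None):
        # Sublayer 1: multi-head self-attention, residual connection, layer norm.
        a, _ = self.attn(x, x, x, key_padding_mask=pad_mask)
        x = self.norm1(x + self.drop(a))
        # Sublayer 2: position-wise feed-forward, residual connection, layer norm.
        return self.norm2(x + self.drop(self.ffn(x)))
```

A decoder layer would add a masked multi-head attention sublayer before these two, as described above.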
step S12, adding positional encoding to the bottom-layer embeddings of the encoder and the decoder;
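Step S12 can be realized, for example, with the sinusoidal positional encoding of the original Transformer; whether sinusoidal or learned position embeddings are intended is not stated in the patent, so the sinusoidal form (with an even d_model) is an assumption.

```python
import math
import torch

def positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)  # even dimensions
    pe[:, 1::2] = torch.cos(pos * div)  # odd dimensions
    return pe

# Usage: x = token_embeddings + positional_encoding(seq_len, d_model)
```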
step S13, obtaining the generation probability of a pointer-generator network (Pointer-Generator Network) by concatenating the decoder outputs at the current and previous time steps with the attention distribution, and, according to the generation probability, either copying content from the source text of the electronic document or generating the corresponding summary content from the attention. Specifically, if the word to be decoded is not in the vocabulary distribution, it is copied using the multi-head attention distribution; if it is in the vocabulary distribution, the distribution over the vocabulary is used to generate it.
It is understood that, in a specific example of the present invention, the method further comprises:
constructing a summary generation model in advance and training it to obtain the trained summary generation model.
In one example, constructing the summary generation model in advance and training it further includes:
constructing a summary generation model employing an encoder-decoder framework, wherein an attention-based bidirectional Transformer model is used in both the encoder and the decoder;
counting all words in the training corpus, generating a dictionary file, and forming a training set; and
initially embedding the dictionary file of the training set into the encoder of the summary generation model through the word vector model, and training the summary generation model to finally obtain the trained summary generation model.
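The dictionary-building step can be sketched as follows; the file format (one word per line, ordered by frequency) is an assumption, as the patent only requires counting all words and generating a dictionary file.

```python
from collections import Counter
import jieba

def build_dictionary(documents, path="vocab.txt", min_count=1):
    """Count every word in the training corpus and write a word->index dictionary."""
    counter = Counter()
    for doc in documents:
        counter.update(jieba.lcut(doc))
    vocab = [w for w, c in counter.most_common() if c >= min_count]
    with open(path, "w", encoding="utf-8") as f:
        for word in vocab:
            f.write(word + "\n")
    return {w: i for i, w in enumerate(vocab)}
```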
Specifically, in step S13, the generation probability is obtained in the pointer-generator network as follows:
calculating, for each word-embedding input, the attention product with the decoder output, normalizing to obtain the weights, and taking the weighted sum to obtain the attention score $e_i$:

$e_i = v^{T} \tanh(W_h h_i + W_s s_t + W_c c_t)$

then applying a softmax operation to the attention scores to obtain the attention distribution:

$a_i = \mathrm{softmax}(e_i) = \dfrac{\exp(e_i)}{\sum_{k} \exp(e_k)}$

calculating the content vector $c_i$ and the vocabulary distribution $P_{vocab}$ by multiplying the attention with the encoder hidden states:

$c_i = \sum_{i} a_i h_i$

$P_{vocab} = \mathrm{softmax}(L(s_t, c_i))$

where $h_i$ denotes the encoder hidden state of the i-th word, $c_i$ the content vector, and $s_i$ the decoder hidden state of the i-th word;

calculating the generation probability $p_{gen}$ by:

$p_{gen} = \sigma(W_c' c_i + W_h' h_i + W_x x_t + b_{ptr})$

and obtaining the final vocabulary probability distribution by combining the vocabulary distribution and the attention distribution:

$P(w) = p_{gen}\, P_{vocab}(w) + (1 - p_{gen}) \sum_{i:\, w_i = w} a_i$

where $P_{vocab}$ is the vocabulary distribution and $a_i$ is the attention distribution.
In one example of the present invention, the pointer-generator network is a hierarchical pointer-generator network.
For better understanding, each aspect of the present invention is further described below.
First, in the embodiments provided herein, a Sequence-to-Sequence (Seq2Seq) framework with an attention mechanism is adopted, which initially uses an RNN as the encoder and the decoder. A pointer-generator network (Pointer-Generator Network) is added at the same time, and the initial word embeddings obtained from a pre-trained word vector model are used as the input of the model.
It can be understood that this differs from a plain attention-based Seq2Seq framework, in which the decoder generates a vocabulary distribution $P_{vocab}$ through the softmax function: the pointer-generator network additionally performs an attention computation over the words of the source document at the decoder stage, thereby producing an attention distribution. Concretely, the pointer-generator network computes the attention product of each word-embedding input with the decoder output, normalizes to obtain the weights, and takes the weighted sum to obtain the attention score $e_i$:
$e_i = v^{T} \tanh(W_h h_i + W_s s_t + W_c c_t)$

A softmax operation is then applied to the attention scores to obtain the attention distribution:

$a_i = \mathrm{softmax}(e_i) = \dfrac{\exp(e_i)}{\sum_{k} \exp(e_k)}$

The content vector $c_i$ and the vocabulary distribution $P_{vocab}$ are calculated by multiplying the attention with the encoder hidden states:

$c_i = \sum_{i} a_i h_i$

$P_{vocab} = \mathrm{softmax}(L(s_t, c_i))$

where $h_i$ denotes the encoder hidden state of the i-th word, $c_i$ the content vector, and $s_i$ the decoder hidden state of the i-th word.

The generation probability $p_{gen}$ is calculated by:

$p_{gen} = \sigma(W_c' c_i + W_h' h_i + W_x x_t + b_{ptr})$

The final vocabulary probability distribution is obtained by combining the vocabulary distribution and the attention distribution:

$P(w) = p_{gen}\, P_{vocab}(w) + (1 - p_{gen}) \sum_{i:\, w_i = w} a_i$

where $P_{vocab}$ is the vocabulary distribution and $a_i$ is the attention distribution. $p_{gen}$ can be viewed as a soft switch that controls whether to copy words from the input sequence or to generate new words: for an out-of-vocabulary word, $P_{vocab}(w) = 0$, so the word can only be obtained by copying; conversely, a word that does not appear in the input text can only be generated from the vocabulary by the model.

The word probabilities obtained in this way form the final output result.
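A minimal sketch of this mixing step, directly following the formulas above: compute $p_{gen}$ from the content vector, hidden state and decoder input, then blend the vocabulary distribution with the attention distribution scattered back onto the source-word ids. The tensor shapes and the extended-vocabulary handling of OOV words are assumptions consistent with the pointer-generator literature, not details fixed by the patent.

```python
import torch
import torch.nn as nn

class PointerMixer(nn.Module):
    def __init__(self, hidden_dim, emb_dim):
        super().__init__()
        self.w_c = nn.Linear(hidden_dim, 1, bias=False)  # W_c'
        self.w_h = nn.Linear(hidden_dim, 1, bias=False)  # W_h'
        self.w_x = nn.Linear(emb_dim, 1, bias=True)      # W_x, bias plays b_ptr

    def forward(self, c_i, h_i, x_t, p_vocab, attn, src_ids, ext_vocab_size):
        # p_gen = sigma(W_c' c_i + W_h' h_i + W_x x_t + b_ptr)
        p_gen = torch.sigmoid(self.w_c(c_i) + self.w_h(h_i) + self.w_x(x_t))
        # P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum over {i: w_i = w} of a_i
        p_final = torch.zeros(p_vocab.size(0), ext_vocab_size,
                              device=p_vocab.device)
        p_final[:, :p_vocab.size(1)] = p_gen * p_vocab
        p_final.scatter_add_(1, src_ids, (1.0 - p_gen) * attn)
        return p_final
```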
Secondly, the invention adopts a pointer-generator network with a hierarchical structure.
Since electricity price policy articles are usually structured, they are organized into sections. When summarizing, people generally extract information from the different sections and then condense it. The invention therefore adds section-level language modeling of passage information on top of the prior art.
Fig. 2 shows a schematic diagram of the hierarchical pointer-generator network employed by the present invention.
For the encoder:

The lowest-layer word-level RNN generates a representation of each section, where the superscript (s) denotes the section level, (t) the decoding step, (e) the encoder and (d) the decoder, and the subscript i indexes the word while j indexes the section:

$h_{(j,i)}^{(e)} = \mathrm{RNN}^{(e)}\big(h_{(j,i-1)}^{(e)},\, x_{(j,i)}\big)$

where $x_{(j,i)}$ denotes the word-embedding vector of the i-th word of the j-th section.

The section-level RNN then generates a representation of the document from the word-level outputs:

$h_{j}^{(s)} = \mathrm{RNN}^{(s)}\big(h_{j-1}^{(s)},\, h_{(j,\cdot)}^{(e)}\big)$

where $h_{j}^{(s)}$ denotes the hidden state of the j-th section.
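A sketch of this hierarchical encoder, with GRU cells standing in for the unspecified RNN: the word-level RNN produces $h_{(j,i)}^{(e)}$ within each section, and the section-level RNN consumes one summary state per section to produce $h_{j}^{(s)}$.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, emb_dim=128, hidden=256):
        super().__init__()
        self.word_rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        self.sect_rnn = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, sections):
        # sections: list of (1, num_words_j, emb_dim) tensors, one per section.
        word_states, sect_inputs = [], []
        for sec in sections:
            out, h_n = self.word_rnn(sec)   # h_(j,i) for every word i
            word_states.append(out)
            sect_inputs.append(h_n[-1])     # final state summarizes section j
        sect_seq = torch.stack(sect_inputs, dim=1)  # (1, num_sections, hidden)
        sect_states, _ = self.sect_rnn(sect_seq)    # h_j^(s)
        return word_states, sect_states
```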
For the decoder:

Specifically, the context vector is equipped with section-level information: the attention is first summed within each section and then summed over all the sections,

$c_t = \sum_{j} \sum_{i} a_{(j,i)}^{(t)}\, h_{(j,i)}^{(e)}$

where $h_{(j,i)}^{(e)}$ denotes the encoder hidden state of the i-th word of the j-th section, $a_{(j,i)}^{(t)}$ the corresponding attention weight, and $c_t$ the content vector.

The newly introduced variable is the section-level attention score:

$e_{j}^{(s,t)} = v_s^{T} \tanh\big(W_s h_{j}^{(s)} + W_d\, s_{t-1}^{(d)}\big)$

where $h_{j}^{(s)}$ denotes the encoder hidden state of the j-th section and $s_{t-1}^{(d)}$ the decoder hidden state at time t-1.
In general, the word-level attention coefficient is calculated as follows,

$e_{(j,i)}^{(t)} = v^{T} \tanh\big(W_h h_{(j,i)}^{(e)} + W_d\, s_{t-1}^{(d)}\big)$

where $h_{(j,i)}^{(e)}$ denotes the encoder hidden state of the i-th word of the j-th section and $s_{t-1}^{(d)}$ the decoder hidden state at time t-1.

The coverage vector accumulates the attention distributions of all previous decoding steps:

$cov^{(t)} = \sum_{t'=0}^{t-1} a^{(t')}$

The final probability is calculated as:

$P_{vocab} = \mathrm{softmax}\big(L(s_{t}^{(d)},\, c_t)\big)$

where $s_{t}^{(d)}$ denotes the decoder hidden state at time t and $c_t$ is the context vector.
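A sketch of the hierarchical attention above: word-level scores are weighted by the section-level attention of their section and jointly normalized, and the context vector is the attention-weighted sum over all word states. The exact normalization order is an assumption consistent with "first summing within a section, then over all sections".

```python
import torch

def hierarchical_attention(word_scores, sect_scores, word_states):
    """word_scores: list of (1, num_words_j); sect_scores: (1, num_sections);
    word_states: list of (1, num_words_j, hidden)."""
    sect_attn = torch.softmax(sect_scores, dim=-1)            # a_j^(s,t)
    # Weight each section's word distribution by its section attention,
    # then renormalize jointly over all (j, i) pairs.
    scaled = [sect_attn[:, j:j + 1] * torch.softmax(word_scores[j], dim=-1)
              for j in range(len(word_scores))]
    flat = torch.cat(scaled, dim=-1)
    attn = flat / flat.sum(dim=-1, keepdim=True)              # a_(j,i)^(t)
    states = torch.cat(word_states, dim=1)                    # (1, total, hidden)
    c_t = torch.bmm(attn.unsqueeze(1), states).squeeze(1)     # context vector
    return attn, c_t
```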
Third, in the present invention, the RNN model in the encoder-decoder framework is replaced with a bidirectional Transformer model.
The bidirectional Transformer model can effectively solve the problems of long-distance dependency and parallel computation. It is built on self-attention layers and is divided into an encoder part and a decoder part, and this model structure can be combined with the model structures of the prior art.
At the encoder side, the input word-embedding vector and the corresponding elements of the position-embedding vector are added element-wise, so that the model can learn more information about word positions and ultimately distinguish words at different positions in a sentence. The result is input to the self-attention layer, the attention coefficients are computed, and finally a vector Z is output to the next encoder layer.
$Z = \mathrm{softmax}\!\left(\dfrac{Q K^{T}}{\sqrt{d_k}}\right) V$
where the input to the attention mechanism is the query Q, and the key-value pair (K, V) stores the context. For self-attention, Q, K and V are all derived from the same text, so attention is computed from the similarity of the text with itself. The results are concatenated by the multi-head attention mechanism, which allows the current word to express richer relationships with the other words.
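The formula above corresponds to the standard scaled dot-product attention, sketched here; the projection of the input into Q, K and V is left to the caller.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # Z = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```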
Then an addition (Add) layer performs element-wise addition of the corresponding elements, and Layer Normalization is applied to stabilize training. A feed-forward neural network (Feed Forward) then maps the attention matrix Z into a higher-dimensional space, a ReLU performs the nonlinear operation, and the result is finally projected back to the same dimensionality as Z.
After six identical encoder layers, a vector R is finally output, representing the complete encoded representation of the source sequence. The vector R is transformed into the two vectors K and V, the key-value pair (K, V) that stores the context; these are used in the computation of the encoder-decoder attention layer in the decoder part, thereby integrating the information of the encoder and the decoder.
In the decoder part, the processing before the Linear layer is the same as in the encoder. Because the decoder carries out the prediction process, a Linear operation is performed in the Linear layer to expand the dimensionality (the vector dimension becomes the length of the dictionary); the final probability distribution over the whole dictionary is obtained through softmax normalization, and the index with the maximum probability is selected to obtain the corresponding generated word.
This word is then fed back as the input for the next prediction, and so on, until the sentence-end token <EOS> is generated, at which point the decoder part ends.
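The decoding loop just described can be sketched as follows. `model.encode`, `model.decode` and the token ids are hypothetical stand-ins for the trained summary generation model, since the patent does not specify an interface.

```python
import torch

def greedy_decode(model, src, bos_id, eos_id, max_len=200):
    memory = model.encode(src)                    # the vector R from the encoder
    ys = torch.tensor([[bos_id]], dtype=torch.long)
    for _ in range(max_len):
        logits = model.decode(ys, memory)         # (1, current_len, dict_size)
        next_id = logits[0, -1].softmax(-1).argmax().item()
        ys = torch.cat([ys, torch.tensor([[next_id]])], dim=1)
        if next_id == eos_id:                     # stop at the <EOS> token
            break
    return ys.squeeze(0).tolist()
```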
It can be appreciated that the method adopted by the present invention solves the OOV problem in policy documents by using the attention-based Seq2Seq framework with an improved pointer-generator network. Compared with the traditional attention-based Seq2Seq framework, the pointer-generator network can copy words directly from the source document, which is very effective for producing OOV words. Since policy articles are usually structured into sections, and people typically extract information from the different sections before summarizing, the invention adds section-level language modeling of passage information to the basic pointer-generator network, yielding the hierarchical pointer-generator network designed here.
Meanwhile, a bidirectional Transformer model is introduced into the attention-based Seq2Seq framework as the language representation model, addressing the tendency of existing attention mechanisms to produce repetitive and incoherent long texts. The summary generation model designed by the invention does not over-attend to any one part of the input and therefore does not generate repeated sentences. First, the attention layer at the encoder computes weights for every word of the input, so the generated content covers the original text rather than focusing on a single passage. The attention layer at the decoder likewise computes weights over the words already generated, which avoids producing duplicate content. After attention has been applied at the encoder and the decoder respectively, the two are concatenated and decoded to generate the next word, so the generation of repeated sentences is avoided.
The implementation of the invention has the following beneficial effects:
the invention provides a method for extracting a generative abstract of a power policy file, which adopts a Seq2Seq framework integrated with an attention mechanism as a basic model for generating the abstract, and adds a pointer to generate a network at the same time, so that words are directly copied from a source document to solve the OOV problem;
and then combining a hierarchical structure of the policy document, and adding language model modeling language segment information of a language segment level (section level) on the basis of a pointer generation network. In the technology of language model modeling language segment information, the invention abandons the traditional RNN and LSTM structures, introduces a bidirectional converter model as a language representation model in a Seq2Seq framework integrated with an attention mechanism, and effectively solves the problem of long-distance dependence. The invention designs an improved attention mechanism to solve the problems of incoherent irrelevant content and repeated sentences in long texts.
The invention designs an automatic abstract identification method suitable for the long text aiming at the characteristics of the electricity price policy text that the long text and the writing format are relatively fixed, and integrates the special format characteristics of the automatic abstract identification method. The efficiency and the accuracy of the abstract extraction process can be improved.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (5)

1. A method for extracting a generative abstract of a power policy document, characterized by comprising the following steps:
step S10, acquiring an electronic document of the power policy file from a specified website using web-crawler technology;
step S11, performing word segmentation on the electronic document, forming initial embedding data with a word vector model, and inputting the initial embedding data into a pre-trained summary generation model;
wherein the summary generation model employs an encoder-decoder framework in which an attention-based bidirectional Transformer model is used as the language representation model;
the encoder consists of two sublayers, a multi-head attention layer and a fully connected feed-forward neural network layer, connected by residual connections followed by layer normalization; the decoder comprises at least a masked multi-head attention layer, a multi-head attention layer and a fully connected feed-forward neural network layer, with the sublayers likewise residually connected and layer-normalized;
step S12, adding positional encoding to the bottom-layer embeddings of the encoder and the decoder; and
step S13, obtaining the generation probability of a pointer-generator network (Pointer-Generator Network) by concatenating the decoder outputs at the current and previous time steps with the attention distribution, and, according to the generation probability, either copying content from the source text of the electronic document or generating the corresponding summary content from the attention.
2. The method of claim 1, further comprising:
constructing a summary generation model in advance and training it to obtain the trained summary generation model.
3. The method of claim 2, wherein constructing the summary generation model in advance and training it to obtain the trained summary generation model further comprises:
constructing a summary generation model employing an encoder-decoder framework, wherein an attention-based bidirectional Transformer model is used in both the encoder and the decoder;
counting all words in the training corpus, generating a dictionary file, and forming a training set; and
initially embedding the dictionary file of the training set into the encoder of the summary generation model through the word vector model, and training the summary generation model to finally obtain the trained summary generation model.
4. The method of claim 1, wherein in step S13, the generation probability is obtained in the pointer-generator network by:
calculating, for each word-embedding input, the attention product with the decoder output, normalizing to obtain the weights, and taking the weighted sum to obtain the attention score $e_i$:

$e_i = v^{T} \tanh(W_h h_i + W_s s_t + W_c c_t)$

then applying a softmax operation to the attention scores to obtain the attention distribution:

$a_i = \mathrm{softmax}(e_i) = \dfrac{\exp(e_i)}{\sum_{k} \exp(e_k)}$

calculating the content vector $c_i$ and the vocabulary distribution $P_{vocab}$ by multiplying the attention with the encoder hidden states:

$c_i = \sum_{i} a_i h_i$

$P_{vocab} = \mathrm{softmax}(L(s_t, c_i))$

wherein $h_i$ denotes the encoder hidden state of the i-th word, $c_i$ the content vector, and $s_i$ the decoder hidden state of the i-th word;

calculating the generation probability $p_{gen}$ by:

$p_{gen} = \sigma(W_c' c_i + W_h' h_i + W_x x_t + b_{ptr})$

and obtaining the final vocabulary probability distribution by combining the vocabulary distribution and the attention distribution:

$P(w) = p_{gen}\, P_{vocab}(w) + (1 - p_{gen}) \sum_{i:\, w_i = w} a_i$

wherein $P_{vocab}$ is the vocabulary distribution and $a_i$ is the attention distribution.
5. The method of any of claims 1 to 4, wherein the pointer-generator network is a hierarchical pointer-generator network.
CN202111550623.2A 2021-12-17 2021-12-17 Method for extracting generative abstract of power policy file Pending CN114254175A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111550623.2A CN114254175A (en) 2021-12-17 2021-12-17 Method for extracting generative abstract of power policy file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111550623.2A CN114254175A (en) 2021-12-17 2021-12-17 Method for extracting generative abstract of power policy file

Publications (1)

Publication Number Publication Date
CN114254175A true CN114254175A (en) 2022-03-29

Family

ID=80795597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111550623.2A Pending CN114254175A (en) 2021-12-17 2021-12-17 Method for extracting generative abstract of power policy file

Country Status (1)

Country Link
CN (1) CN114254175A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116933785A (en) * 2023-06-30 2023-10-24 国网湖北省电力有限公司武汉供电公司 Transformer-based electronic file abstract generation method, system and medium

Similar Documents

Publication Publication Date Title
CN110134771B (en) Implementation method of multi-attention-machine-based fusion network question-answering system
US11741109B2 (en) Dialogue system, a method of obtaining a response from a dialogue system, and a method of training a dialogue system
US11210306B2 (en) Dialogue system, a method of obtaining a response from a dialogue system, and a method of training a dialogue system
CN113158665B (en) Method for improving dialog text generation based on text abstract generation and bidirectional corpus generation
CN111460092B (en) Multi-document-based automatic complex problem solving method
CN110020438A (en) Enterprise or tissue Chinese entity disambiguation method and device based on recognition sequence
CN110413768B (en) Automatic generation method of article titles
CN109992775B (en) Text abstract generation method based on high-level semantics
CN112765345A (en) Text abstract automatic generation method and system fusing pre-training model
CN111666756B (en) Sequence model text abstract generation method based on theme fusion
CN112417901A (en) Non-autoregressive Mongolian machine translation method based on look-around decoding and vocabulary attention
CN110807324A (en) Video entity identification method based on IDCNN-crf and knowledge graph
CN112818698A (en) Fine-grained user comment sentiment analysis method based on dual-channel model
CN115062140A (en) Method for generating abstract of BERT SUM and PGN fused supply chain ecological district length document
CN115600581B (en) Controlled text generation method using syntactic information
CN114139497A (en) Text abstract extraction method based on BERTSUM model
CN114218928A (en) Abstract text summarization method based on graph knowledge and theme perception
CN113239666A (en) Text similarity calculation method and system
CN112417138A (en) Short text automatic summarization method combining pointer generation type and self-attention mechanism
CN115048511A (en) Bert-based passport layout analysis method
Qiu et al. Text summarization based on multi-head self-attention mechanism and pointer network
Wang et al. Vector-to-sequence models for sentence analogies
CN114281982B (en) Book propaganda abstract generation method and system adopting multi-mode fusion technology
CN116663578A (en) Neural machine translation method based on strategy gradient method improvement
CN114254175A (en) Method for extracting generative abstract of power policy file

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination