CN115496061B - Construction method of neural network title generation model
- Publication number: CN115496061B
- Application number: CN202211213861.9A
- Authority: CN (China)
- Prior art keywords: word, calculated, node, layer, neural network
- Legal status: Active
Classifications
- G06F40/258—Heading extraction; Automatic titling; Numbering
- G06F40/186—Templates
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/30—Semantic analysis
- G06N3/08—Learning methods
Abstract
The invention relates to the technical field of natural language processing, and in particular to a method for constructing a neural network title generation model. The neural network title generation model comprises: a rich-information word vector layer; an inter-node interaction attention layer; an intra-node interaction attention layer; a node selection layer; and a decoding layer. The invention creatively proposes using the sampled outputs of a basic neural network model as soft templates to assist neural network title generation. With this improvement, calls to an external information retrieval library and manually designed data cleaning rules can both be avoided. The method can ensure the conciseness and coherence of the generated title without requiring any training data.
Description
Technical Field
The invention relates to the technical field of natural language processing, in particular to a method for constructing a neural network title generation model.
Background
With the rapid development of deep learning in the field of natural language processing, end-to-end neural network title generation has entered a brand-new stage of development. Using a large neural network, an end-to-end title generation system learns a mapping from an input article to its title and generates a corresponding title for a document word by word, without additional linguistic knowledge or extra manual annotation.
Despite this significant success, neural network title generation models still face problems such as losing important information and generating repeated or extraneous words. How to help a neural network title generation system avoid these problems has therefore attracted a great deal of attention.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a method for constructing a neural network title generation model that can ensure the conciseness and coherence of the generated title without requiring any training data.
In order to solve the technical problems, the invention adopts the following technical scheme:
A method for constructing a neural network title generation model, in which the model comprises:
a rich information word vector layer;
an inter-node interaction attention layer;
an intra-node interaction attention layer;
a node selection layer;
a decoding layer;
end-to-end training.
Preferably, the method for constructing the rich-information word vector layer comprises:
S1: selecting a news document X containing n words, its corresponding title Y containing l words, and K templates {T_1, ..., T_K} sampled from a basic neural network title generation model, where the k-th template contains m_k words;
S2: pairing the document with each template, and treating each document-template pair (X, T_k) as a node;
S3: obtaining a word representation for each word in all nodes using a pre-trained language model.
Across a variety of natural language processing tasks, pre-trained language models have proven able to efficiently produce context-dependent word representations rich in semantic and syntactic information.
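As an illustration of steps S1-S3, the following is a minimal sketch of extracting rich-information word vectors for one document-template node, assuming a HuggingFace BERT-style encoder; the checkpoint name and the sentence-pair encoding scheme are illustrative assumptions, not specified by the patent:
```python
# Sketch: encode one document-template node with a pre-trained LM and
# collect one contextual vector per token (the "rich information word
# vector layer"). The model choice is an assumption for illustration.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
encoder = AutoModel.from_pretrained("bert-base-chinese")

def node_word_vectors(document: str, template: str) -> torch.Tensor:
    # A node is the pair (document, template); sentence-pair encoding
    # keeps both in one context so their representations interact.
    inputs = tokenizer(document, template, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        outputs = encoder(**inputs)
    return outputs.last_hidden_state.squeeze(0)  # (num_tokens, hidden_size)

vecs = node_word_vectors("document text ...", "sampled template title ...")
print(vecs.shape)
```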
Preferably, the method for constructing the inter-node interaction attention layer comprises the following steps:
First, the interactive attention weight relating the initial representation of the i-th word of node p to the j-th word of node q is calculated by formula (2).
Then, the representation of the i-th word of node p aggregates information from the other nodes, calculated by formula (3).
Further, following the formulas above, the vector representation of node p as it relates to node q is calculated by formula (4).
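The images holding formulas (2)-(4) are not reproduced in this text; as a hedged sketch only, a standard bilinear-attention reading of the description would be (all symbols here are assumptions, with h_i^p the rich-information vector of the i-th word in node p):
```latex
% Hedged reconstruction, not the patent's original formulas (2)-(4).
\begin{align}
\alpha_{ij}^{pq} &= \operatorname{softmax}_j\!\left(\mathbf{h}_i^{p\top}\mathbf{W}_a\,\mathbf{h}_j^{q}\right) \tag{2}\\
\tilde{\mathbf{h}}_i^{p} &= \sum_{q\neq p}\sum_{j}\alpha_{ij}^{pq}\,\mathbf{h}_j^{q} \tag{3}\\
\mathbf{u}^{pq} &= \operatorname{ReLU}\!\left(\mathbf{W}_u\big[\mathbf{h}_i^{p};\tilde{\mathbf{h}}_i^{p}\big]+\mathbf{b}_u\right) \tag{4}
\end{align}
```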
different nodes constructed using different templates contain unique information, and semantic interactions between the different nodes will help the model better capture important information. The fully connected inter-node interaction attention layer is intended to implement this idea.
Preferably, the method for constructing the intra-node interaction attention layer comprises the following steps:
S1: calculating a document-template matching matrix M^k indicating the degree of matching between the source document and the template in the k-th node, where each element M^k_{ij} is calculated from the rich-information word vectors of the i-th document word and the j-th template word, through formula (5);
S2: obtaining the attention score of each source-document word relative to the template words, and the attention score of each template word relative to the source-document words, through formulas (6) and (7), respectively;
S3: the correlation vectors of the source article and the soft template are calculated by formula (8) and formula (9), respectively.
Preferably, the node selection layer is intended to control the contribution of each node: the gate value g_k inside the final fine-grained representation of the k-th node is calculated by formula (10);
where ⊙ denotes element-wise multiplication and [;] denotes concatenation. Given the node selection attention score g_k, the final fine-grained representation of the i-th word of the k-th node is calculated by formula (11).
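A hedged sketch of formulas (10)-(11), assuming a per-node sigmoid gate over concatenated node features (u^k and c^k pooled from the two attention layers; the gating form is an assumption):
```latex
% Hedged reconstruction, not the patent's original formulas (10)-(11).
\begin{align}
g^{k} &= \sigma\!\left(\mathbf{w}_g^{\top}\big[\mathbf{u}^{k};\mathbf{c}^{k}\big]+b_g\right) \tag{10}\\
\hat{\mathbf{h}}_i^{k} &= g^{k}\odot\mathbf{h}_i^{k} \tag{11}
\end{align}
```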
Preferably, a Transformer decoder is selected to decode the output title word by word; the conditional probability of the t-th word of the output title is calculated by formula (13);
where z_t comes from the target representation matrix produced by the decoder, L denotes the number of decoder layers, and FFN denotes a feed-forward neural network.
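A hedged sketch of formula (13) under this reading, with z_t^L the top-decoder-layer representation at position t (the exact notation is an assumption):
```latex
% Hedged reconstruction, not the patent's original formula (13).
\begin{equation}
P\left(y_t \mid y_{<t}, X\right) = \operatorname{softmax}\!\left(\mathrm{FFN}\!\left(\mathbf{z}_t^{L}\right)\right) \tag{13}
\end{equation}
```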
compared with the prior art, the invention has the beneficial effects that:
The method can ensure the conciseness and coherence of the generated title without requiring any training data. However, manually creating all templates is impractical: the work is not only labor-intensive but also demands extensive domain knowledge. In the deep learning setting, improved template-based title generation methods instead use the titles of specific articles in a training set as templates to guide generation. While this avoids manually creating templates, it requires a carefully designed process for retrieving those articles. The retrieval module relies on the standard information retrieval library Apache Lucene, and calling the library itself requires background knowledge. Moreover, before retrieving against an article, article-specific information such as dates and lead sentences must be removed to eliminate its influence on article matching. A small set of candidate documents is then retrieved for the cleaned article, and their corresponding titles are looked up in the training set to serve as soft templates. The invention instead creatively proposes using the sampled outputs of a basic neural network model as soft templates to assist neural network title generation. With this improvement, calls to an external information retrieval library and manually designed data cleaning rules are both avoided.
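The core proposal, sampling soft templates from a basic title model instead of retrieving them, can be sketched as follows; the checkpoint and decoding parameters are illustrative assumptions (in practice the base model would be a title-generation checkpoint):
```python
# Sketch: draw K diverse candidate titles from a basic seq2seq model
# and use them as soft templates, replacing Lucene retrieval and
# hand-written cleaning rules. The checkpoint name is an assumption.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")  # assumed base model
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

def sample_soft_templates(document: str, k: int = 4) -> list[str]:
    inputs = tokenizer(document, return_tensors="pt", truncation=True)
    outputs = model.generate(
        **inputs,
        do_sample=True,          # sampling rather than beam search gives
        top_p=0.9,               # diverse templates from one base model
        num_return_sequences=k,
        max_new_tokens=32,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```
Each sampled title is then paired with the source document to form the nodes described above.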
Drawings
FIG. 1 is a flow chart of the present invention.
Description of the embodiments
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are obviously only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
As shown in fig. 1, in a method for constructing a neural network title generation model, the neural network title generation model comprises the following components (a high-level skeleton of their assembly is sketched after this list):
a rich information word vector layer;
an inter-node interaction attention layer;
an intra-node interaction attention layer;
a node selection layer;
a decoding layer;
end-to-end training.
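A high-level sketch of how these five components might fit together, assuming PyTorch; every module name, dimension, and pooling choice below is an illustrative assumption rather than the patent's reference implementation:
```python
import torch
import torch.nn as nn

class TitleGenerationModel(nn.Module):
    """Skeleton mirroring the five listed components (shapes assumed)."""

    def __init__(self, hidden: int = 768, vocab: int = 30000):
        super().__init__()
        self.inter_node_attn = nn.MultiheadAttention(hidden, 8, batch_first=True)
        self.intra_node_attn = nn.MultiheadAttention(hidden, 8, batch_first=True)
        self.node_gate = nn.Sequential(nn.Linear(2 * hidden, 1), nn.Sigmoid())
        layer = nn.TransformerDecoderLayer(hidden, 8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.out = nn.Linear(hidden, vocab)  # plays the role of FFN in formula (13)

    def forward(self, nodes: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        # nodes: (K, n, hidden) rich-information word vectors, one row per
        # document-template node; tgt: (1, T, hidden) title embeddings.
        K, n, h = nodes.shape
        flat = nodes.reshape(1, K * n, h)
        # Inter-node interaction: words attend across all nodes.
        inter, _ = self.inter_node_attn(flat, flat, flat)
        inter = inter.reshape(K, n, h)
        # Intra-node interaction: words attend within their own node.
        intra, _ = self.intra_node_attn(inter, inter, inter)
        # Node selection: sigmoid gate per node from pooled features.
        pooled = torch.cat([inter.mean(dim=1), intra.mean(dim=1)], dim=-1)
        gate = self.node_gate(pooled).unsqueeze(-1)        # (K, 1, 1)
        memory = (gate * intra).reshape(1, K * n, h)
        # Decoding layer: generate the title over the gated node memory.
        return self.out(self.decoder(tgt, memory))         # (1, T, vocab)
```
End-to-end training would then minimize cross-entropy between these logits and the reference title, matching the end-to-end training item above.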
The construction method of the rich-information word vector layer comprises the following steps:
S1: selecting a news document X containing n words, its corresponding title Y containing l words, and K templates {T_1, ..., T_K} sampled from a basic neural network title generation model, where the k-th template contains m_k words;
S2: pairing the document with each template, and treating each document-template pair (X, T_k) as a node;
s3: obtaining word representations corresponding to each word in all nodes by adopting a pre-trained language model;
The construction method of the inter-node interaction attention layer comprises the following steps:
First, the interactive attention weight relating the initial representation of the i-th word of node p to the j-th word of node q is calculated by formula (2).
Then, the representation of the i-th word of node p aggregates information from the other nodes, calculated by formula (3).
Further, following the formulas above, the vector representation of node p as it relates to node q is calculated by formula (4).
The construction method of the intra-node interaction attention layer comprises the following steps:
S1: calculating a document-template matching matrix M^k indicating the degree of matching between the source document and the template in the k-th node, where each element M^k_{ij} is calculated from the rich-information word vectors of the i-th document word and the j-th template word, through formula (5);
S2: obtaining the attention score of each source-document word relative to the template words, and the attention score of each template word relative to the source-document words, through formulas (6) and (7), respectively;
S3: the correlation vectors of the source article and the soft template are calculated by formula (8) and formula (9), respectively.
The node selection layer is intended to control the contribution of each node: the gate value g_k inside the final fine-grained representation of the k-th node is calculated by formula (10);
where ⊙ denotes element-wise multiplication and [;] denotes concatenation. Given the node selection attention score g_k, the final fine-grained representation of the i-th word of the k-th node is calculated by formula (11).
A Transformer decoder is selected to decode the output title word by word; the conditional probability of the t-th word of the output title is calculated by formula (13);
where z_t comes from the target representation matrix produced by the decoder, L denotes the number of decoder layers, and FFN denotes a feed-forward neural network.
The preferred embodiments of the present invention have been described in detail above, but the invention is not limited to these embodiments. Various changes may be made within the knowledge of a person skilled in the art without departing from the spirit of the invention, and all such changes fall within the protection scope of the invention.
Claims (2)
1. A method for constructing a neural network title generation model, characterized in that the model comprises:
a rich information word vector layer;
an inter-node interaction attention layer;
an intra-node interaction attention layer;
a node selection layer;
a decoding layer;
end-to-end training;
the construction method of the rich-information word vector layer comprises the following steps:
S1: selecting a news document X containing n words, its corresponding title Y containing l words, and K templates {T_1, ..., T_K} sampled from a basic neural network title generation model, where the k-th template contains m_k words;
S2: pairing the document with each template, and treating each document-template pair (X, T_k) as a node;
s3: obtaining word representations corresponding to each word in all nodes by adopting a pre-trained language model;
The construction method of the inter-node interaction attention layer comprises the following steps:
first, the interactive attention weight relating the initial representation of the i-th word of node p to the j-th word of node q is calculated by formula (2);
then, the representation of the i-th word of node p aggregates information from the other nodes, calculated by formula (3);
further, following the formulas above, the vector representation of node p as it relates to node q is calculated by formula (4);
the construction method of the intra-node interaction attention layer comprises the following steps:
S1: calculating a document-template matching matrix M^k indicating the degree of matching between the source document and the template in the k-th node, where each element M^k_{ij} is calculated from the rich-information word vectors of the i-th document word and the j-th template word, through formula (5);
S2: obtaining the attention score of each source-document word relative to the template words, and the attention score of each template word relative to the source-document words, through formulas (6) and (7), respectively;
s3: the correlation vectors of the source article and the soft template are calculated by equation (8) and equation (9), respectively:
the node selection layer is intended to control the contribution of each node: the gate value g_k inside the final fine-grained representation of the k-th node is calculated by formula (10);
where ⊙ denotes element-wise multiplication and [;] denotes concatenation; given the node selection attention score g_k, the final fine-grained representation of the i-th word of the k-th node is calculated by formula (11).
2. The method for constructing a neural network title generation model according to claim 1, characterized in that: a Transformer decoder is selected to decode the output title word by word, and the conditional probability of the t-th word of the output title is calculated by formula (13);
where z_t comes from the target representation matrix produced by the decoder, L denotes the number of decoder layers, and FFN denotes a feed-forward neural network.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211213861.9A | 2022-09-30 | 2022-09-30 | Construction method of neural network title generation model |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN115496061A | 2022-12-20 |
| CN115496061B | 2023-06-20 |
Family
ID=84471478
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202211213861.9A | Construction method of neural network title generation model | 2022-09-30 | 2022-09-30 |
Country Status (1)
| Country | Link |
|---|---|
| CN | CN115496061B (en) |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |