CN115496061B - Construction method of neural network title generation model - Google Patents

Construction method of neural network title generation model

Info

Publication number
CN115496061B
CN115496061B (application CN202211213861.9A)
Authority
CN
China
Prior art keywords
word
calculated
node
layer
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211213861.9A
Other languages
Chinese (zh)
Other versions
CN115496061A (en)
Inventor
阿雅娜
卜范玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Finance and Economics
Original Assignee
Inner Mongolia University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Finance and Economics
Priority to CN202211213861.9A
Publication of CN115496061A
Application granted
Publication of CN115496061B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of natural language processing, and in particular to a method for constructing a neural network title generation model. The neural network title generation model includes: a rich information word vector layer; an inter-node interaction attention layer; an intra-node interaction attention layer; a node selection layer; and a decoding layer. The invention creatively proposes to use the sampling results generated by a basic neural network model as soft templates to assist neural network title generation. With this improvement, calling an additional information retrieval library and manually designing data cleansing rules can both be avoided. The method can ensure the conciseness and coherence of the summary without requiring any training data.

Description

Construction method of neural network title generation model
Technical Field
The invention relates to the technical field of natural language processing, in particular to a method for constructing a neural network title generation model.
Background
With the rapid development of deep learning in natural language processing, end-to-end neural network title generation has entered a brand-new stage of development. Through a large neural network, an end-to-end title generation system learns a mapping from the input article to the title and generates a corresponding title for a document word by word, without additional linguistic knowledge or extra manual labels.
Despite this significant success, neural network title generation models still face problems such as losing important information and generating duplicate or extraneous words. How to help a neural network title generation system avoid these problems has therefore attracted a great deal of attention.
Disclosure of Invention
To solve the above technical problems, the invention provides a method for constructing a neural network title generation model, which can ensure the conciseness and coherence of the summary without requiring any training data.
To this end, the invention adopts the following technical solution:
A method for constructing a neural network title generation model, where the model includes:
a rich information word vector layer;
an inter-node interaction attention layer;
an intra-node interaction attention layer;
a node selection layer;
a decoding layer;
and end-to-end training.
Preferably, the method for constructing the rich information word vector layer includes:
S1: selecting a news document and its corresponding title, together with the templates generated by sampling from the basic neural network title generation model, where the document, the title, and each template contain a given number of words;
S2: pairing the document with each template, and treating every document-template pair as a node;
S3: obtaining the word representation of every word in all nodes with a pre-trained language model;
S4: computing the rich information word vector by formula (1), in which "[CLS]" and "[SEP]" denote the special classification and separator tokens added to the input.
The pre-trained language model has the ability to efficiently generate contextually relevant word representations rich in semantic and syntactic information over a variety of natural language processing tasks.
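As an illustration of this step (not part of the patent text), the following Python sketch encodes one document-template pair as a node with a pre-trained language model; the checkpoint name, the length limit, and the helper function are assumptions introduced for the example.

```python
# Minimal sketch of the rich information word vector layer.
# Assumptions (not from the patent): the checkpoint "bert-base-chinese",
# the 512-token limit, and the encode_node helper are illustrative only.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
encoder = AutoModel.from_pretrained("bert-base-chinese")

def encode_node(document: str, template: str) -> torch.Tensor:
    """Encode one document-template pair (one "node") into rich word vectors.

    Passing the document and the template as a sentence pair makes the
    tokenizer insert "[CLS]" at the start and "[SEP]" between and after the
    two segments, the special tokens referred to in formula (1).
    """
    inputs = tokenizer(document, template, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        outputs = encoder(**inputs)
    # (sequence_length, hidden_size) contextual representations for this node
    return outputs.last_hidden_state.squeeze(0)

document = "..."            # the source news document
templates = ["...", "..."]  # titles sampled from the basic title generation model
nodes = [encode_node(document, t) for t in templates]
```

Each returned matrix then serves as the rich information word vectors of one node.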
Preferably, the method for constructing the inter-node interaction attention layer comprises the following steps:
First, the interaction attention weight between the initial representation of each word in one node and the words of another node is calculated by formula (2), in which a learnable weight matrix is used.
Then, the representation of each word of a node is updated by aggregating information from the other nodes, as calculated by formula (3).
Further, according to the above formulas, the node-dependent vector representation between every pair of nodes is built, as calculated by formula (4).
Different nodes constructed from different templates contain unique information, and semantic interaction between the nodes helps the model capture important information better. The fully connected inter-node interaction attention layer is intended to implement this idea.
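The sketch below suggests one way such a fully connected inter-node interaction attention layer could look in Python. Because formulas (2) to (4) are given only as images in the patent, the bilinear scoring and residual aggregation used here are illustrative assumptions rather than the patented formulas.

```python
# Illustrative inter-node interaction attention; the bilinear score and the
# residual update below are assumptions standing in for formulas (2)-(4).
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterNodeAttention(nn.Module):
    """Let every word of a node attend to the words of all other nodes."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # learnable weight matrix used when scoring word pairs across nodes
        self.w = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, nodes: list) -> list:
        # nodes: list of (len_i, hidden) word representation matrices
        updated = []
        for i, node_i in enumerate(nodes):
            others = [n for j, n in enumerate(nodes) if j != i]
            context = torch.cat(others, dim=0)        # words of all other nodes
            scores = self.w(node_i) @ context.T       # cross-node attention weights
            weights = F.softmax(scores, dim=-1)
            aggregated = weights @ context            # information gathered from other nodes
            updated.append(node_i + aggregated)       # node-aware word representations
        return updated
```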
Preferably, the method for constructing the intra-node interaction attention layer comprises the following steps:
S1: calculating a document-template matching matrix that indicates the degree of matching between the source document and the template within each node; each element of the matrix is computed from the rich information word vectors of a document word and a template word by formula (5), in which a learnable weight matrix is used;
S2: obtaining the attention scores of the source document words with respect to the template words and the attention scores of the template words with respect to the source document words by formulas (6) and (7), respectively;
S3: calculating the correlation vectors of the source article and the soft template by formulas (8) and (9), respectively.
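A minimal Python sketch of the intra-node interaction attention described above follows; since the exact formulas (5) to (9) appear only as images, the bilinear matching matrix and the row- and column-wise softmax below are stand-ins chosen for illustration.

```python
# Illustrative intra-node (document-template) interaction attention; the
# bilinear matching matrix and the two softmax directions stand in for
# formulas (5)-(9) and are assumptions, not the patented formulas.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntraNodeAttention(nn.Module):
    """Match the source document against the soft template inside one node."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # learnable weight matrix of the document-template matching matrix
        self.w = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, doc: torch.Tensor, tpl: torch.Tensor):
        # doc: (len_doc, hidden) document word vectors, tpl: (len_tpl, hidden)
        match = self.w(doc) @ tpl.T                 # document-template matching matrix
        doc_to_tpl = F.softmax(match, dim=-1)       # document word vs. template words
        tpl_to_doc = F.softmax(match.T, dim=-1)     # template word vs. document words
        doc_corr = doc_to_tpl @ tpl                 # correlation vector of the source article
        tpl_corr = tpl_to_doc @ doc                 # correlation vector of the soft template
        return doc_corr, tpl_corr
```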
Preferably, the node selection layer is intended to control the contribution of each node. The final fine-grained node representation of a node is calculated by formula (10), in which element-wise multiplication and a concatenation operation are used; given the node selection attention score, the final fine-grained representation of each word of a node is calculated by formula (11); according to the above formulas, the representation of each node is then calculated by formula (12).
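The following Python sketch illustrates the node selection idea under stated assumptions; the specific gating, fusion, and pooling choices are not taken from the patent, whose formulas (10) to (12) are given only as images.

```python
# Hedged sketch of the node selection layer: a scalar gate per word controls
# how much of the fused word/correlation information enters the final node
# representation. The sigmoid gate, element-wise fusion, and mean pooling are
# assumptions standing in for formulas (10)-(12).
import torch
import torch.nn as nn

class NodeSelection(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        # scores the concatenation (cascade) of a word vector and its correlation vector
        self.score = nn.Linear(2 * hidden_size, 1)

    def forward(self, words: torch.Tensor, corr: torch.Tensor) -> torch.Tensor:
        # words, corr: (seq_len, hidden) for one node
        fused = torch.cat([words, corr], dim=-1)     # cascade operation
        beta = torch.sigmoid(self.score(fused))      # node selection attention score
        fine = beta * (words * corr)                 # element multiplication, gated per word
        return fine.mean(dim=0)                      # pooled final node representation
```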
Preferably, a Transformer decoder is selected to decode the output title word by word. The conditional probability of each word of the output title is calculated by formula (13), in which the target-side hidden state comes from the target representation matrix of the final decoder layer (the superscript denoting the number of decoder layers) and is passed through a feed-forward neural network; the target representation matrix is defined by formula (14), which involves a layer normalization operation, and the intermediate representation it uses is calculated by formula (15).
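As a hedged sketch of the decoding layer, the Python snippet below wires a standard Transformer decoder to the node representations; it approximates the roles that formulas (13) to (15) assign to the target representation matrix, the layer normalization operation, and the feed-forward network, without reproducing those formulas.

```python
# Illustrative decoding layer: a standard Transformer decoder whose layer-
# normalized output is projected by a feed-forward layer into next-word
# log-probabilities. Layer count, head count, and the projection are assumed
# values, not ones fixed by formulas (13)-(15).
import torch
import torch.nn as nn

class TitleDecoder(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int, num_layers: int = 6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        layer = nn.TransformerDecoderLayer(d_model=hidden_size, nhead=8,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.norm = nn.LayerNorm(hidden_size)           # layer normalization operation
        self.out = nn.Linear(hidden_size, vocab_size)   # feed-forward output projection

    def forward(self, prev_words: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # prev_words: (batch, t) ids of the title words generated so far
        # memory: (batch, src_len, hidden) node representations from the encoder side
        tgt = self.embed(prev_words)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        hidden = self.decoder(tgt, memory, tgt_mask=mask)
        hidden = self.norm(hidden)
        # log conditional probability of the next title word
        return torch.log_softmax(self.out(hidden[:, -1]), dim=-1)
```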
Compared with the prior art, the invention has the following beneficial effects:
The method can ensure the conciseness and coherence of the summary without requiring any training data. However, manually creating all templates is impractical, since doing so demands not only intensive labor but also a great deal of domain knowledge. In the deep learning setting, an improved template-based title generation method provides similar guidance for summaries by using the summaries of particular articles in a training set as templates. Although this approach avoids manual template creation, the retrieval of those particular articles must be carefully designed. The retrieval module relies on the standard information retrieval library Apache Lucene, and calling this library itself requires some background knowledge. Furthermore, before an article can be used for retrieval, article-specific information such as dates and lead sentences must be removed so that it does not bias article matching. A small set of candidate documents is then retrieved for the cleaned article, and the corresponding titles are taken from the training set as soft templates. In contrast, the invention creatively proposes to use the sampling results generated by the basic neural network model as soft templates to assist neural network title generation. With this improvement, calling an additional information retrieval library and manually designing data cleansing rules can both be avoided.
Drawings
FIG. 1 is a flow chart of the present invention.
Description of the embodiments
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 1, in a method for constructing a neural network title generation model, the model includes:
a rich information word vector layer;
an inter-node interaction attention layer;
an intra-node interaction attention layer;
a node selection layer;
a decoding layer;
and end-to-end training.
The construction method of the rich information word vector layer comprises the following steps:
S1: selecting a news document and its corresponding title, together with the templates generated by sampling from the basic neural network title generation model, where the document, the title, and each template contain a given number of words;
S2: pairing the document with each template, and treating every document-template pair as a node;
S3: obtaining the word representation of every word in all nodes with a pre-trained language model;
S4: computing the rich information word vector by formula (1), in which "[CLS]" and "[SEP]" denote the special classification and separator tokens added to the input.
The construction method of the inter-node interaction attention layer comprises the following steps:
First, the interaction attention weight between the initial representation of each word in one node and the words of another node is calculated by formula (2), in which a learnable weight matrix is used.
Then, the representation of each word of a node is updated by aggregating information from the other nodes, as calculated by formula (3).
Further, according to the above formulas, the node-dependent vector representation between every pair of nodes is built, as calculated by formula (4).
The construction method of the intra-node interaction attention layer comprises the following steps:
S1: calculating a document-template matching matrix that indicates the degree of matching between the source document and the template within each node; each element of the matrix is computed from the rich information word vectors of a document word and a template word by formula (5), in which a learnable weight matrix is used;
S2: obtaining the attention scores of the source document words with respect to the template words and the attention scores of the template words with respect to the source document words by formulas (6) and (7), respectively;
S3: calculating the correlation vectors of the source article and the soft template by formulas (8) and (9), respectively.
The node selection layer is intended to control the contribution of each node. The final fine-grained node representation of a node is calculated by formula (10), in which element-wise multiplication and a concatenation operation are used; given the node selection attention score, the final fine-grained representation of each word of a node is calculated by formula (11); according to the above formulas, the representation of each node is then calculated by formula (12).
A Transformer decoder is selected to decode the output title word by word. The conditional probability of each word of the output title is calculated by formula (13), in which the target-side hidden state comes from the target representation matrix of the final decoder layer (the superscript denoting the number of decoder layers) and is passed through a feed-forward neural network; the target representation matrix is defined by formula (14), which involves a layer normalization operation, and the intermediate representation it uses is calculated by formula (15).
The preferred embodiments of the present invention have been described in detail above, but the invention is not limited to these embodiments; various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the invention, and such changes all fall within the scope of the invention.

Claims (2)

1. A method for constructing a neural network title generation model, characterized in that the model comprises:
a rich information word vector layer;
an inter-node interaction attention layer;
an intra-node interaction attention layer;
a node selection layer;
a decoding layer;
and end-to-end training;
The construction method of the rich information word vector layer comprises the following steps:
S1: selecting a news document and its corresponding title, together with the templates generated by sampling from the basic neural network title generation model, where the document, the title, and each template contain a given number of words;
S2: pairing the document with each template, and treating every document-template pair as a node;
S3: obtaining the word representation of every word in all nodes with a pre-trained language model;
S4: computing the rich information word vector by formula (1), in which "[CLS]" and "[SEP]" denote the special classification and separator tokens added to the input;
The construction method of the inter-node interaction attention layer comprises the following steps:
first, calculating the interaction attention weight between the initial representation of each word in one node and the words of another node by formula (2), in which a learnable weight matrix is used;
then, updating the representation of each word of a node by aggregating information from the other nodes, as calculated by formula (3);
further, according to the above formulas, building the node-dependent vector representation between every pair of nodes, as calculated by formula (4);
The construction method of the intra-node interaction attention layer comprises the following steps:
S1: calculating a document-template matching matrix that indicates the degree of matching between the source document and the template within each node; each element of the matrix is computed from the rich information word vectors of a document word and a template word by formula (5), in which a learnable weight matrix is used;
S2: obtaining the attention scores of the source document words with respect to the template words and the attention scores of the template words with respect to the source document words by formulas (6) and (7), respectively;
S3: calculating the correlation vectors of the source article and the soft template by formulas (8) and (9), respectively;
The node selection layer is intended to control the contribution of each node: the final fine-grained node representation of a node is calculated by formula (10), in which element-wise multiplication and a concatenation operation are used; given the node selection attention score, the final fine-grained representation of each word of a node is calculated by formula (11); and, according to the above formulas, the representation of each node is calculated by formula (12).
2. The method for constructing a neural network title generation model as claimed in claim 1, wherein: a Transformer decoder is selected to decode the output title word by word; the conditional probability of each word of the output title is calculated by formula (13), in which the target-side hidden state comes from the target representation matrix of the final decoder layer (the superscript denoting the number of decoder layers) and is passed through a feed-forward neural network; the target representation matrix is defined by formula (14), which involves a layer normalization operation; and the intermediate representation it uses is calculated by formula (15).
CN202211213861.9A 2022-09-30 2022-09-30 Construction method of neural network title generation model Active CN115496061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211213861.9A CN115496061B (en) 2022-09-30 2022-09-30 Construction method of neural network title generation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211213861.9A CN115496061B (en) 2022-09-30 2022-09-30 Construction method of neural network title generation model

Publications (2)

Publication Number Publication Date
CN115496061A CN115496061A (en) 2022-12-20
CN115496061B true CN115496061B (en) 2023-06-20

Family

ID=84471478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211213861.9A Active CN115496061B (en) 2022-09-30 2022-09-30 Construction method of neural network title generation model

Country Status (1)

Country Link
CN (1) CN115496061B (en)

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6674900B1 (en) * 2000-03-29 2004-01-06 Matsushita Electric Industrial Co., Ltd. Method for extracting titles from digital images
CN106502985B (en) * 2016-10-20 2020-01-31 清华大学 neural network modeling method and device for generating titles
US10474709B2 (en) * 2017-04-14 2019-11-12 Salesforce.Com, Inc. Deep reinforced model for abstractive summarization
CN108984524A (en) * 2018-07-05 2018-12-11 北京理工大学 A kind of title generation method based on variation neural network topic model
CN110956041A (en) * 2019-11-27 2020-04-03 重庆邮电大学 Depth learning-based co-purchase recombination bulletin summarization method
CN113407708A (en) * 2020-03-17 2021-09-17 阿里巴巴集团控股有限公司 Feed generation method, information recommendation method, device and equipment
CN112560456B (en) * 2020-11-03 2024-04-09 重庆安石泽太科技有限公司 Method and system for generating generated abstract based on improved neural network
CN114020900B (en) * 2021-11-16 2024-03-26 桂林电子科技大学 Chart English abstract generating method based on fusion space position attention mechanism
CN114218928A (en) * 2021-12-30 2022-03-22 杭州电子科技大学 Abstract text summarization method based on graph knowledge and theme perception
CN115019142B (en) * 2022-06-14 2024-03-29 辽宁工业大学 Image title generation method and system based on fusion characteristics and electronic equipment

Also Published As

Publication number Publication date
CN115496061A (en) 2022-12-20

Similar Documents

Publication Publication Date Title
Eisenstein Introduction to natural language processing
Zhou et al. KdConv: A Chinese multi-domain dialogue dataset towards multi-turn knowledge-driven conversation
CN109086408B (en) Text generation method and device, electronic equipment and computer readable medium
Chakrabarty et al. MERMAID: Metaphor generation with symbolism and discriminative decoding
CN107944027B (en) Method and system for creating semantic key index
Liao et al. Improving readability for automatic speech recognition transcription
Evain et al. Task agnostic and task specific self-supervised learning from speech with lebenchmark
CN108153864A (en) Method based on neural network generation text snippet
Zhang et al. Effective subword segmentation for text comprehension
Wang et al. TEDT: Transformer-based encoding–decoding translation network for multimodal sentiment analysis
Shen et al. Compose like humans: Jointly improving the coherence and novelty for modern chinese poetry generation
Xu et al. A comprehensive survey of automated audio captioning
Pei et al. S2SPMN: A simple and effective framework for response generation with relevant information
CN115358289A (en) Text generation algorithm fusing multi-type knowledge base and inference technology
Shang et al. Entity resolution in open-domain conversations
Zhu Machine reading comprehension: algorithms and practice
Li et al. Semi-supervised Domain Adaptation for Dependency Parsing via Improved Contextualized Word Representations
Wei et al. KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion
Khan et al. A clustering framework for lexical normalization of Roman Urdu
Rizou et al. Efficient intent classification and entity recognition for university administrative services employing deep learning models
CN115496061B (en) Construction method of neural network title generation model
Chang et al. Singability-enhanced lyric generator with music style transfer
Bao et al. AEG: Argumentative essay generation via a dual-decoder model with content planning
JP2023071785A (en) Acoustic signal search device, acoustic signal search method, data search device, data search method and program
Ni et al. Masked siamese prompt tuning for few-shot natural language understanding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant