CN115496061B - Construction method of neural network title generation model - Google Patents

Construction method of neural network title generation model

Info

Publication number
CN115496061B
CN115496061B (application CN202211213861.9A)
Authority
CN
China
Prior art keywords
word
calculated
node
layer
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211213861.9A
Other languages
Chinese (zh)
Other versions
CN115496061A (en)
Inventor
阿雅娜
卜范玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Finance and Economics
Original Assignee
Inner Mongolia University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Finance and Economics
Priority to CN202211213861.9A
Publication of CN115496061A
Application granted
Publication of CN115496061B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of natural language processing, and in particular to a method for constructing a neural network title generation model. The neural network title generation model includes: a rich information word vector layer; an inter-node interaction attention layer; an intra-node interaction attention layer; a node selection layer; and a decoding layer. The invention creatively proposes to use the sampling results generated by a basic neural network model as soft templates to assist neural network title generation. With this improvement, calling an additional information retrieval library and manually designing data cleansing rules can both be avoided. The method can ensure the conciseness and coherence of the summary without requiring any training data.

Description

Construction method of neural network title generation model
Technical Field
The invention relates to the technical field of natural language processing, in particular to a method for constructing a neural network title generation model.
Background
With the rapid development of deep learning in natural language processing, end-to-end neural network title generation has entered a brand-new stage of development. Through a large neural network, an end-to-end title generation system learns a mapping from the input article to the title and generates a corresponding title for a document word by word, without additional linguistic knowledge or extra manual labels.
Despite this significant success, neural network title generation models still face problems such as losing important information and generating duplicate or extraneous words. How to help a neural network title generation system avoid these problems has therefore attracted a great deal of attention.
Disclosure of Invention
To solve the above technical problems, the invention provides a method for constructing a neural network title generation model, which can ensure the conciseness and coherence of the summary without requiring any training data.
To this end, the invention adopts the following technical solution:
A method for constructing a neural network title generation model, where the model includes:
a rich information word vector layer;
an inter-node interaction attention layer;
an intra-node interaction attention layer;
a node selection layer;
a decoding layer;
and end-to-end training.
Preferably, the method for constructing the rich information word vector layer includes:
S1: selecting a news document and its corresponding title, together with the templates generated by sampling from the basic neural network title generation model, where the document, the title, and each template contain a given number of words;
S2: pairing the document with each template, and treating every document-template pair as a node;
S3: obtaining the word representation of every word in all nodes with a pre-trained language model;
S4: computing the rich information word vector by formula (1), in which "[CLS]" and "[SEP]" denote the special classification and separator tokens added to the input.
The pre-trained language model has the ability to efficiently generate contextually relevant word representations rich in semantic and syntactic information over a variety of natural language processing tasks.
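As an illustration of this step (not part of the patent text), the following Python sketch encodes one document-template pair as a node with a pre-trained language model; the checkpoint name, the length limit, and the helper function are assumptions introduced for the example.

```python
# Minimal sketch of the rich information word vector layer.
# Assumptions (not from the patent): the checkpoint "bert-base-chinese",
# the 512-token limit, and the encode_node helper are illustrative only.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
encoder = AutoModel.from_pretrained("bert-base-chinese")

def encode_node(document: str, template: str) -> torch.Tensor:
    """Encode one document-template pair (one "node") into rich word vectors.

    Passing the document and the template as a sentence pair makes the
    tokenizer insert "[CLS]" at the start and "[SEP]" between and after the
    two segments, the special tokens referred to in formula (1).
    """
    inputs = tokenizer(document, template, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        outputs = encoder(**inputs)
    # (sequence_length, hidden_size) contextual representations for this node
    return outputs.last_hidden_state.squeeze(0)

document = "..."            # the source news document
templates = ["...", "..."]  # titles sampled from the basic title generation model
nodes = [encode_node(document, t) for t in templates]
```

Each returned matrix then serves as the rich information word vectors of one node.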
Preferably, the method for constructing the inter-node interaction attention layer comprises the following steps:
First, the interaction attention weight between the initial representation of each word in one node and the words of another node is calculated by formula (2), in which a learnable weight matrix is used.
Then, the representation of each word of a node is updated by aggregating information from the other nodes, as calculated by formula (3).
Further, according to the above formulas, the node-dependent vector representation between every pair of nodes is built, as calculated by formula (4).
Different nodes constructed from different templates contain unique information, and semantic interaction between the nodes helps the model capture important information better. The fully connected inter-node interaction attention layer is intended to implement this idea.
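The sketch below suggests one way such a fully connected inter-node interaction attention layer could look in Python. Because formulas (2) to (4) are given only as images in the patent, the bilinear scoring and residual aggregation used here are illustrative assumptions rather than the patented formulas.

```python
# Illustrative inter-node interaction attention; the bilinear score and the
# residual update below are assumptions standing in for formulas (2)-(4).
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterNodeAttention(nn.Module):
    """Let every word of a node attend to the words of all other nodes."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # learnable weight matrix used when scoring word pairs across nodes
        self.w = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, nodes: list) -> list:
        # nodes: list of (len_i, hidden) word representation matrices
        updated = []
        for i, node_i in enumerate(nodes):
            others = [n for j, n in enumerate(nodes) if j != i]
            context = torch.cat(others, dim=0)        # words of all other nodes
            scores = self.w(node_i) @ context.T       # cross-node attention weights
            weights = F.softmax(scores, dim=-1)
            aggregated = weights @ context            # information gathered from other nodes
            updated.append(node_i + aggregated)       # node-aware word representations
        return updated
```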
Preferably, the method for constructing the intra-node interaction attention layer comprises the following steps:
S1: calculating a document-template matching matrix that indicates the degree of matching between the source document and the template within each node; each element of the matrix is computed from the rich information word vectors of a document word and a template word by formula (5), in which a learnable weight matrix is used;
S2: obtaining the attention scores of the source document words with respect to the template words and the attention scores of the template words with respect to the source document words by formulas (6) and (7), respectively;
S3: calculating the correlation vectors of the source article and the soft template by formulas (8) and (9), respectively.
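A minimal Python sketch of the intra-node interaction attention described above follows; since the exact formulas (5) to (9) appear only as images, the bilinear matching matrix and the row- and column-wise softmax below are stand-ins chosen for illustration.

```python
# Illustrative intra-node (document-template) interaction attention; the
# bilinear matching matrix and the two softmax directions stand in for
# formulas (5)-(9) and are assumptions, not the patented formulas.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntraNodeAttention(nn.Module):
    """Match the source document against the soft template inside one node."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # learnable weight matrix of the document-template matching matrix
        self.w = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, doc: torch.Tensor, tpl: torch.Tensor):
        # doc: (len_doc, hidden) document word vectors, tpl: (len_tpl, hidden)
        match = self.w(doc) @ tpl.T                 # document-template matching matrix
        doc_to_tpl = F.softmax(match, dim=-1)       # document word vs. template words
        tpl_to_doc = F.softmax(match.T, dim=-1)     # template word vs. document words
        doc_corr = doc_to_tpl @ tpl                 # correlation vector of the source article
        tpl_corr = tpl_to_doc @ doc                 # correlation vector of the soft template
        return doc_corr, tpl_corr
```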
Preferably, the node selection layer is intended to control the contribution of each node. The final fine-grained node representation of a node is calculated by formula (10), in which element-wise multiplication and a concatenation operation are used; given the node selection attention score, the final fine-grained representation of each word of a node is calculated by formula (11); according to the above formulas, the representation of each node is then calculated by formula (12).
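The following Python sketch illustrates the node selection idea under stated assumptions; the specific gating, fusion, and pooling choices are not taken from the patent, whose formulas (10) to (12) are given only as images.

```python
# Hedged sketch of the node selection layer: a scalar gate per word controls
# how much of the fused word/correlation information enters the final node
# representation. The sigmoid gate, element-wise fusion, and mean pooling are
# assumptions standing in for formulas (10)-(12).
import torch
import torch.nn as nn

class NodeSelection(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        # scores the concatenation (cascade) of a word vector and its correlation vector
        self.score = nn.Linear(2 * hidden_size, 1)

    def forward(self, words: torch.Tensor, corr: torch.Tensor) -> torch.Tensor:
        # words, corr: (seq_len, hidden) for one node
        fused = torch.cat([words, corr], dim=-1)     # cascade operation
        beta = torch.sigmoid(self.score(fused))      # node selection attention score
        fine = beta * (words * corr)                 # element multiplication, gated per word
        return fine.mean(dim=0)                      # pooled final node representation
```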
Preferably, a Transformer decoder is selected to decode the output title word by word. The conditional probability of each word of the output title is calculated by formula (13), in which the target-side hidden state comes from the target representation matrix of the final decoder layer (the superscript denoting the number of decoder layers) and is passed through a feed-forward neural network; the target representation matrix is defined by formula (14), which involves a layer normalization operation, and the intermediate representation it uses is calculated by formula (15).
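As a hedged sketch of the decoding layer, the Python snippet below wires a standard Transformer decoder to the node representations; it approximates the roles that formulas (13) to (15) assign to the target representation matrix, the layer normalization operation, and the feed-forward network, without reproducing those formulas.

```python
# Illustrative decoding layer: a standard Transformer decoder whose layer-
# normalized output is projected by a feed-forward layer into next-word
# log-probabilities. Layer count, head count, and the projection are assumed
# values, not ones fixed by formulas (13)-(15).
import torch
import torch.nn as nn

class TitleDecoder(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int, num_layers: int = 6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        layer = nn.TransformerDecoderLayer(d_model=hidden_size, nhead=8,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.norm = nn.LayerNorm(hidden_size)           # layer normalization operation
        self.out = nn.Linear(hidden_size, vocab_size)   # feed-forward output projection

    def forward(self, prev_words: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # prev_words: (batch, t) ids of the title words generated so far
        # memory: (batch, src_len, hidden) node representations from the encoder side
        tgt = self.embed(prev_words)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        hidden = self.decoder(tgt, memory, tgt_mask=mask)
        hidden = self.norm(hidden)
        # log conditional probability of the next title word
        return torch.log_softmax(self.out(hidden[:, -1]), dim=-1)
```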
Compared with the prior art, the invention has the following beneficial effects:
The method can ensure the conciseness and coherence of the summary without requiring any training data. However, manually creating all templates is impractical, since doing so demands not only intensive labor but also a great deal of domain knowledge. In the deep learning setting, an improved template-based title generation method provides similar guidance for summaries by using the summaries of particular articles in a training set as templates. Although this approach avoids manual template creation, the retrieval of those particular articles must be carefully designed. The retrieval module relies on the standard information retrieval library Apache Lucene, and calling this library itself requires some background knowledge. Furthermore, before an article can be used for retrieval, article-specific information such as dates and lead sentences must be removed so that it does not bias article matching. A small set of candidate documents is then retrieved for the cleaned article, and the corresponding titles are taken from the training set as soft templates. In contrast, the invention creatively proposes to use the sampling results generated by the basic neural network model as soft templates to assist neural network title generation. With this improvement, calling an additional information retrieval library and manually designing data cleansing rules can both be avoided.
Drawings
FIG. 1 is a flow chart of the present invention.
Description of the embodiments
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 1, in a method for constructing a neural network title generation model, the model includes:
a rich information word vector layer;
an inter-node interaction attention layer;
an intra-node interaction attention layer;
a node selection layer;
a decoding layer;
and end-to-end training.
The construction method of the rich information word vector layer comprises the following steps:
S1: selecting a news document and its corresponding title, together with the templates generated by sampling from the basic neural network title generation model, where the document, the title, and each template contain a given number of words;
S2: pairing the document with each template, and treating every document-template pair as a node;
S3: obtaining the word representation of every word in all nodes with a pre-trained language model;
S4: computing the rich information word vector by formula (1), in which "[CLS]" and "[SEP]" denote the special classification and separator tokens added to the input.
The construction method of the inter-node interaction attention layer comprises the following steps:
First, the interaction attention weight between the initial representation of each word in one node and the words of another node is calculated by formula (2), in which a learnable weight matrix is used.
Then, the representation of each word of a node is updated by aggregating information from the other nodes, as calculated by formula (3).
Further, according to the above formulas, the node-dependent vector representation between every pair of nodes is built, as calculated by formula (4).
The construction method of the intra-node interaction attention layer comprises the following steps:
S1: calculating a document-template matching matrix that indicates the degree of matching between the source document and the template within each node; each element of the matrix is computed from the rich information word vectors of a document word and a template word by formula (5), in which a learnable weight matrix is used;
S2: obtaining the attention scores of the source document words with respect to the template words and the attention scores of the template words with respect to the source document words by formulas (6) and (7), respectively;
S3: calculating the correlation vectors of the source article and the soft template by formulas (8) and (9), respectively.
The node selection layer is intended to control the contribution of each node. The final fine-grained node representation of a node is calculated by formula (10), in which element-wise multiplication and a concatenation operation are used; given the node selection attention score, the final fine-grained representation of each word of a node is calculated by formula (11); according to the above formulas, the representation of each node is then calculated by formula (12).
A Transformer decoder is selected to decode the output title word by word. The conditional probability of each word of the output title is calculated by formula (13), in which the target-side hidden state comes from the target representation matrix of the final decoder layer (the superscript denoting the number of decoder layers) and is passed through a feed-forward neural network; the target representation matrix is defined by formula (14), which involves a layer normalization operation, and the intermediate representation it uses is calculated by formula (15).
The preferred embodiments of the present invention have been described in detail above, but the invention is not limited to these embodiments; various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the invention, and such changes all fall within the scope of the invention.

Claims (2)

1. A method for constructing a neural network title generation model, characterized in that the model comprises:
a rich information word vector layer;
an inter-node interaction attention layer;
an intra-node interaction attention layer;
a node selection layer;
a decoding layer;
and end-to-end training;
The construction method of the rich information word vector layer comprises the following steps:
S1: selecting a news document and its corresponding title, together with the templates generated by sampling from the basic neural network title generation model, where the document, the title, and each template contain a given number of words;
S2: pairing the document with each template, and treating every document-template pair as a node;
S3: obtaining the word representation of every word in all nodes with a pre-trained language model;
S4: computing the rich information word vector by formula (1), in which "[CLS]" and "[SEP]" denote the special classification and separator tokens added to the input;
The construction method of the inter-node interaction attention layer comprises the following steps:
first, calculating the interaction attention weight between the initial representation of each word in one node and the words of another node by formula (2), in which a learnable weight matrix is used;
then, updating the representation of each word of a node by aggregating information from the other nodes, as calculated by formula (3);
further, according to the above formulas, building the node-dependent vector representation between every pair of nodes, as calculated by formula (4);
The construction method of the intra-node interaction attention layer comprises the following steps:
S1: calculating a document-template matching matrix that indicates the degree of matching between the source document and the template within each node; each element of the matrix is computed from the rich information word vectors of a document word and a template word by formula (5), in which a learnable weight matrix is used;
S2: obtaining the attention scores of the source document words with respect to the template words and the attention scores of the template words with respect to the source document words by formulas (6) and (7), respectively;
S3: calculating the correlation vectors of the source article and the soft template by formulas (8) and (9), respectively;
The node selection layer is intended to control the contribution of each node: the final fine-grained node representation of a node is calculated by formula (10), in which element-wise multiplication and a concatenation operation are used; given the node selection attention score, the final fine-grained representation of each word of a node is calculated by formula (11); and, according to the above formulas, the representation of each node is calculated by formula (12).
2. The method for constructing a neural network title generation model as claimed in claim 1, wherein: a Transformer decoder is selected to decode the output title word by word; the conditional probability of each word of the output title is calculated by formula (13), in which the target-side hidden state comes from the target representation matrix of the final decoder layer (the superscript denoting the number of decoder layers) and is passed through a feed-forward neural network; the target representation matrix is defined by formula (14), which involves a layer normalization operation; and the intermediate representation it uses is calculated by formula (15).
CN202211213861.9A 2022-09-30 2022-09-30 Construction method of neural network title generation model Active CN115496061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211213861.9A CN115496061B (en) 2022-09-30 2022-09-30 Construction method of neural network title generation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211213861.9A CN115496061B (en) 2022-09-30 2022-09-30 Construction method of neural network title generation model

Publications (2)

Publication Number Publication Date
CN115496061A CN115496061A (en) 2022-12-20
CN115496061B true CN115496061B (en) 2023-06-20

Family

ID=84471478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211213861.9A Active CN115496061B (en) 2022-09-30 2022-09-30 Construction method of neural network title generation model

Country Status (1)

Country Link
CN (1) CN115496061B (en)

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6674900B1 (en) * 2000-03-29 2004-01-06 Matsushita Electric Industrial Co., Ltd. Method for extracting titles from digital images
CN106502985B (en) * 2016-10-20 2020-01-31 清华大学 neural network modeling method and device for generating titles
US10474709B2 (en) * 2017-04-14 2019-11-12 Salesforce.Com, Inc. Deep reinforced model for abstractive summarization
CN108984524A (en) * 2018-07-05 2018-12-11 北京理工大学 A kind of title generation method based on variation neural network topic model
CN110956041A (en) * 2019-11-27 2020-04-03 重庆邮电大学 Depth learning-based co-purchase recombination bulletin summarization method
CN113407708A (en) * 2020-03-17 2021-09-17 阿里巴巴集团控股有限公司 Feed generation method, information recommendation method, device and equipment
CN112560456B (en) * 2020-11-03 2024-04-09 重庆安石泽太科技有限公司 Method and system for generating generated abstract based on improved neural network
CN114020900B (en) * 2021-11-16 2024-03-26 桂林电子科技大学 Chart English abstract generating method based on fusion space position attention mechanism
CN114218928A (en) * 2021-12-30 2022-03-22 杭州电子科技大学 Abstract text summarization method based on graph knowledge and theme perception
CN115019142B (en) * 2022-06-14 2024-03-29 辽宁工业大学 Image title generation method and system based on fusion characteristics and electronic equipment

Also Published As

Publication number Publication date
CN115496061A (en) 2022-12-20

Similar Documents

Publication Publication Date Title
Eisenstein Introduction to natural language processing
Zhou et al. KdConv: A Chinese multi-domain dialogue dataset towards multi-turn knowledge-driven conversation
CN109086408B (en) Text generation method and device, electronic equipment and computer readable medium
Chakrabarty et al. MERMAID: Metaphor generation with symbolism and discriminative decoding
CN107944027B (en) Method and system for creating semantic key index
Liao et al. Improving readability for automatic speech recognition transcription
Evain et al. Task agnostic and task specific self-supervised learning from speech with lebenchmark
CN108153864A (en) Method based on neural network generation text snippet
Zhang et al. Effective subword segmentation for text comprehension
Wang et al. TEDT: Transformer-based encoding–decoding translation network for multimodal sentiment analysis
Shen et al. Compose like humans: Jointly improving the coherence and novelty for modern chinese poetry generation
Xu et al. A comprehensive survey of automated audio captioning
Pei et al. S2SPMN: A simple and effective framework for response generation with relevant information
CN115358289A (en) Text generation algorithm fusing multi-type knowledge base and inference technology
Shang et al. Entity resolution in open-domain conversations
Zhu Machine reading comprehension: algorithms and practice
Li et al. Semi-supervised Domain Adaptation for Dependency Parsing via Improved Contextualized Word Representations
Wei et al. KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion
Khan et al. A clustering framework for lexical normalization of Roman Urdu
Rizou et al. Efficient intent classification and entity recognition for university administrative services employing deep learning models
CN115496061B (en) Construction method of neural network title generation model
Chang et al. Singability-enhanced lyric generator with music style transfer
Bao et al. AEG: Argumentative essay generation via a dual-decoder model with content planning
JP2023071785A (en) Acoustic signal search device, acoustic signal search method, data search device, data search method and program
Ni et al. Masked siamese prompt tuning for few-shot natural language understanding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant