CN115577118B - Text generation method based on mixed grouping ordering and dynamic entity memory planning - Google Patents

Text generation method based on mixed grouping ordering and dynamic entity memory planning

Info

Publication number
CN115577118B
Authority
CN
China
Prior art keywords
entity
graph
sub
sequence
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211216143.7A
Other languages
Chinese (zh)
Other versions
CN115577118A (en)
Inventor
荣欢
孙圣杰
马廷淮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202211216143.7A priority Critical patent/CN115577118B/en
Publication of CN115577118A publication Critical patent/CN115577118A/en
Application granted granted Critical
Publication of CN115577118B publication Critical patent/CN115577118B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3346Query execution using probabilistic model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text generation method based on mixed grouping ordering and dynamic entity memory planning, which aims to automatically convert input structured data into readable text describing the data. In the grouping stage, the invention selects sub-graph groups through a length control module and a sub-graph observation module and orders the data by group; in the static planning stage it generates a static node content plan, achieving order both within and between groups; at each time step it dynamically decides, on the basis of the static plan and according to a memory network, which data to output next; and with three-level reconstruction the decoder is guided from multiple angles to capture the essential features of the input. The invention introduces a finer-grained grouping mechanism to bridge the gap between structured data and unstructured text; dynamic content planning is further combined with a memory network, strengthening semantic coherence; and a three-level reconstruction mechanism is introduced to capture the intrinsic feature dependencies between input and output at different levels.

Description

Text generation method based on mixed grouping ordering and dynamic entity memory planning
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a text generation method suited to the problem of converting input structured data into readable text describing the data.
Background
Text generation is an important topic in the field of natural language processing. Real-world data may appear in different forms under different circumstances, and some forms, such as knowledge graphs, are difficult for people outside the professional field to understand. Converting such data into readable text by hand costs a great deal of time and effort, whereas the Data-to-text task aims at automatically converting the input structured data into readable text describing these data.
Reiter [1] summarized text generation systems and argued that they can be divided into three relatively independent modules: (1) content planning, i.e. selecting which data records or data fields to describe; (2) sentence planning, i.e. determining the order of the selected data records or data fields in the sentence; (3) surface realization, i.e. generating the actual text based on the result of the sentence plan. Intuitively, content planning decides what to say, sentence planning decides in what order to say it, and surface realization decides how to say it. This has essentially become the paradigm of text generation systems, and in recent years more and more end-to-end models have added content selection and content planning modules to improve performance. Puduppully et al. [2] proposed a neural architecture that divides the generation task into a content selection and planning stage and a surface realization stage: given a set of data records, a content plan is first generated highlighting which information should be mentioned and in what order, then the document is generated based on that content plan, and a copy mechanism is added to strengthen the decoder. Chen et al. [3] proposed a text generation model based on dynamic content planning, which adjusts the plan dynamically according to the already generated text, and added a reconstruction mechanism to push the decoder to capture the essential features the encoder intends to express. Puduppully et al. [4] dynamically update the entity representations according to the generation process and an entity memory, capturing entity transitions between sentences, increasing inter-sentence coherence and selecting the content to be described more appropriately.
Although the surface realization stage can generate fluent text, problems of information loss, duplication or hallucination still occur, so the idea of grouping is widely used to align entities with the descriptive text and mitigate such problems. Lin et al. [5] add separators to the plan for fine-grained segmentation, which facilitates long text generation. Shen et al. [6] group the entity data so that each portion corresponds to a segment of the target text, allowing the description of a designated group of entities to be generated without attending to the whole input. Xu et al. [7] order and aggregate the input triple data so as to align it with the output descriptive text, generating the description sentence by sentence.
Building on the above understanding, the invention further strengthens the content planning and sentence planning parts. A finer-grained grouping mechanism is introduced, together with a matching static plan generation strategy; a memory network is further combined with entity transitions to grasp how the focus of description shifts between sentences; and reconstruction from multiple angles ensures that the several stages capture the essential features between the input and the output.
Reference is made to:
[1] Reiter E. An architecture for data-to-text systems[C]//Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07). 2007: 97-104.
[2] Puduppully R, Dong L, Lapata M. Data-to-text generation with content selection and planning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 6908-6915.
[3] Chen K, Li F, Hu B, et al. Neural data-to-text generation with dynamic content planning[J]. Knowledge-Based Systems, 2021, 215: 106610.
[4] Puduppully R, Dong L, Lapata M. Data-to-text generation with entity modeling[J]. arXiv preprint arXiv:1906.03221, 2019.
[5] Lin X, Cui S, Zhao Z, et al. GGP: A Graph-based Grouping Planner for Explicit Control of Long Text Generation[C]//Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021: 3253-3257.
[6] Shen X, Chang E, Su H, et al. Neural data-to-text generation via jointly learning the segmentation and correspondence[J]. arXiv preprint arXiv:2005.01096, 2020.
[7] Xu X, Dušek O, Rieser V, et al. AGGGEN: Ordering and Aggregating while Generating[J]. arXiv preprint arXiv:2106.05580, 2021.
Disclosure of Invention
The purpose of the invention: to address the structural gap that arises when converting structured data into linear readable text. Existing models adopt plan-ahead methods to bridge this gap, but traditional planning methods rely on a single recurrent neural network, which is simplistic and not fine-grained enough, and they all plan first and then realize the text, without adjusting the plan in combination with the text generation process. The invention addresses these problems.
The technical scheme is as follows: in order to achieve the purpose of the invention, the technical scheme adopted by the invention is as follows:
a text generation method based on mixed grouping ordering and dynamic entity memory planning is characterized in that the entity representation is updated by utilizing information of a generation process and entity transfer memory based on static planning, the static planning is corrected, and finally a decoder is promoted to obtain more accurate important information from an encoder through three-level reconstruction. The method specifically comprises the following steps:
step 1) taking a structured data set for which a corresponding text needs to be generated as the model input, wherein the data are expressed in the form of a table or a knowledge graph; converting the obtained data into a bipartite graph and embedding it using a graph attention mechanism;
step 2) grouping and ordering the data vectors obtained in step 1) through the grouping stage; the grouping stage comprises two modules: a length control module and a sub-graph observation module; the length control module acts at each generation step, combining the information of the already generated sub-graph sequence and mapping it to a probability distribution, from which the number of triples LC of the sub-graph to be generated at the next time step is selected, so that only sub-graphs containing LC triples may be chosen at that time step; if LC = -1 is selected, the grouping stage ends and the method proceeds to step 5);
step 3) using the sub-graph length LC obtained in step 2) to control the selection space of the sub-graphs; the sub-graph observation mechanism obtains the representation of each candidate sub-graph through a self-attention mechanism over all nodes in the sub-graph, and applies an attention mechanism with the sub-graph and node information of the previously generated sub-graph sequence, thereby producing the probability of selecting each sub-graph;
step 4) selecting a sub-graph according to the probability distribution obtained in step 3), then updating the node representations in all sub-graphs with the hidden state of the current step of the recurrent neural network, i.e. updating the representations of all sub-graphs, and returning to step 2); if LC = -1 is selected in step 2), a final sub-graph sequence is obtained, in which each sub-graph is a subset of the input structured data set;
step 5) the static content planning stage selects and generates the entity sequence SP, using the global node representation V_global as the initialization state of the recurrent neural network; the selection space of each step is the corresponding sub-graph in the sequence of step 4); when the special sub-graph end mark <EOG> is generated, the next input of the recurrent neural network is the representation of the current sub-graph, and the next selection space is obtained according to the sub-graph sequence of step 4); when the sub-graph sequence has been fully traversed, the final static content planning (SP) entity sequence is obtained;
step 6) encoding the SP entity sequence obtained in step 5) with a bidirectional gated recurrent network to obtain the hidden entity representations e_1, …, e_n of the SP sequence, where n denotes the total number of entities in the SP sequence; passing the hidden SP representations to the generation stage and to the entity memory module;
step 7) the entity memory module stores the hidden SP entity representations as its initial memory content; the hidden state d_{t-1} of the generation-stage recurrent neural network is used to update the entity memory u_{t,k}, and the entity memory u_{t,k} is multiplied with d_{t-1} to obtain the memory weight ψ_{t,k}, where t denotes the t-th time step and k the k-th entity;
step 8) obtaining the attention scores a_1, …, a_n by applying an attention mechanism between the hidden state d_{t-1} of the generation-stage recurrent neural network and e_1, …, e_n, and multiplying the attention score a_{t,k} with the corresponding entity memory u_{t,k} to obtain the entity context vector S_{t,k};
step 9) summing the entity context vectors S_{t,k} weighted by the memory weights ψ_{t,k} to obtain the context vector q_t, which serves as the input of the pointer generation decoder; a graph structure enhancement mechanism is adopted to enhance the pointer decoder and generate the translation text corresponding to the structured data;
step 10) adopting three-level reconstruction so that the decoder fully acquires the information contained in the encoder: the static content plan SP is reconstructed from the translation text, the sub-graph sequence of the grouping stage is reconstructed from the static content planning sequence, and the decoding result of the pointer generation decoder is restored to a bipartite graph representation.
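For orientation, the following Python sketch shows one way the ten steps above might be wired together. Every stage here is a trivial stand-in (random vectors, fixed-size grouping, invented toy triples); only the data flow follows the description, and none of it should be read as the patented implementation.

```python
"""Toy data-flow sketch of steps 1)-10); all stage bodies are placeholders."""
from typing import Dict, List, Tuple
import torch

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

def build_bipartite_graph(triples: List[Triple]) -> Dict[str, List[str]]:
    # Step 1): each relation becomes its own node; a <GLOBAL> node observes the whole graph.
    adj: Dict[str, List[str]] = {"<GLOBAL>": []}
    for h, r, t in triples:
        rel = f"rel:{r}"
        for n in (h, rel, t):
            adj.setdefault(n, [])
        adj[h].append(rel)
        adj[rel] += [h, t]
        adj[t].append(rel)
    adj["<GLOBAL>"] = [n for n in adj if n != "<GLOBAL>"]
    return adj

def embed_nodes(adj: Dict[str, List[str]], dim: int = 8):
    # Stand-in for the graph attention embedding of step 1).
    return {n: torch.randn(dim) for n in adj}

def grouping_stage(triples: List[Triple], group_size: int = 2):
    # Stand-in for steps 2)-4): here simply fixed-size groups instead of learned grouping.
    return [triples[i:i + group_size] for i in range(0, len(triples), group_size)]

def static_planning(groups):
    # Stand-in for step 5): list the entities in group order.
    return [e for g in groups for (h, _, t) in g for e in (h, t)]

def generate(sp: List[str]) -> str:
    # Stand-in for steps 6)-9): just verbalise the plan.
    return " ; ".join(sp)

triples = [("Ada Lovelace", "field", "mathematics"), ("Ada Lovelace", "born", "London")]
adj = build_bipartite_graph(triples)
emb = embed_nodes(adj)
groups = grouping_stage(triples)
sp = static_planning(groups)
print(generate(sp))  # step 10) (three-level reconstruction) is a training-time loss, omitted here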
Further, in step 1) the data are expressed in the form of a table or a knowledge graph, wherein in the table the structured data exist in the form of records and in the knowledge graph they exist in the form of triples;
the knowledge graph serves as structured input data, and each triple is formed by a head entity, a relation and a tail entity; the obtained data are converted into a bipartite graph, i.e. the relation in each triple is represented as a node, and a global node is added to capture the structural information of the whole graph; all nodes are embedded using a graph attention mechanism.
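The description above does not fix the internals of the graph attention embedding, so the following is a minimal single-head graph-attention layer over the bipartite adjacency, written only as an illustrative stand-in; the class name TinyGAT, the shapes and the toy adjacency matrix are assumptions.

```python
# Minimal single-head graph-attention embedding over the bipartite graph.
# Assumed inputs: node feature matrix x (N x d_in) and adjacency matrix adj (N x N)
# that already contains the relation nodes and the extra global node.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGAT(nn.Module):
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.proj = nn.Linear(d_in, d_out, bias=False)
        self.attn = nn.Linear(2 * d_out, 1, bias=False)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = self.proj(x)                                        # (N, d_out)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = F.leaky_relu(self.attn(pairs).squeeze(-1))     # (N, N) pairwise scores
        scores = scores.masked_fill(adj == 0, float("-inf"))    # attend only to neighbours
        alpha = torch.softmax(scores, dim=-1)
        return torch.relu(alpha @ h)                            # updated node representations

# toy usage: 4 nodes (head, relation, tail, global), global node connected to all others
adj = torch.tensor([[0, 1, 0, 1],
                    [1, 0, 1, 1],
                    [0, 1, 0, 1],
                    [1, 1, 1, 0]], dtype=torch.float)
x = torch.randn(4, 16)
emb = TinyGAT(16, 32)(x, adj)   # emb[3] plays the role of V_global
print(emb.shape)                # torch.Size([4, 32])
```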
Further, the SP entity sequence is encoded by the bidirectional gated recurrent network Bi-GRU to obtain the hidden entity representations e_1, …, e_n of the SP sequence, fusing the order information of SP into the entity embeddings;
the hidden state d_{t-1} of the decoding recurrent neural network RNN of the generation stage is used to update each entity memory u_{t,k} in the memory network, where t denotes the t-th time step and k the k-th entity; specifically:
u_{-1,k} = W·e_k    (5)
γ_t = softmax(W·d_{t-1} + b_γ)    (6)
δ_{t,k} = γ_t ⊙ softmax(W·d_{t-1} + b_d + W·u_{t-1,k} + b_u)    (7)
ũ_{t,k} = W·d_{t-1} + b    (8)
u_{t,k} = (1 - δ_{t,k}) ⊙ u_{t-1,k} + δ_{t,k} ⊙ ũ_{t,k}    (9)
ψ_{t,k} = softmax_k(d_{t-1}·W·u_{t,k})    (10)
First, equation (5) initializes the memory of each entity with the entity representation e_k, denoted u_{-1,k}. In equation (6), γ_t is a gate that decides whether to modify, determined by the hidden state d_{t-1} of the previous time step of the generation stage, i.e. by the information of the already generated text. In equation (7), δ_{t,k} indicates the extent to which modification is required, determined by d_{t-1} and the entity memory u_{t-1,k}. In equation (8), ũ_{t,k} denotes the content of the modification to the entity memory. Finally, equation (9) updates the entity memory u_{t-1,k} of the previous moment with the modification content ũ_{t,k} to obtain the entity memory u_{t,k} of the current time step; equation (10) applies an attention mechanism to u_{t,k} and d_{t-1} to obtain the attention weight ψ_{t,k} of the memory module. W, b_γ, b_d, b_u are parameters.
Further, in step 8) the attention scores a_1, …, a_n are obtained by applying an attention mechanism between the hidden state d_{t-1} of the generation-stage recurrent neural network and e_1, …, e_n, and the attention score a_{t,k} is multiplied with the corresponding entity hidden representation to obtain the entity context vector S_{t,k}; specifically:
a_{t,k} = softmax_k(d_{t-1}·W·e_k)    (11)
S_{t,k} = a_{t,k}·e_k    (12)
Equation (11) applies an attention mechanism between the hidden state d_{t-1} of the generation-stage recurrent neural network and the hidden entity representations e_1, …, e_n to obtain the attention scores a_1, …, a_n; equation (12) multiplies the attention score a_{t,k} with the entity hidden representation e_k to obtain the entity context vector S_{t,k}, where t denotes the t-th time step and k the k-th entity;
q_t = Σ_k ψ_{t,k}·S_{t,k}    (13)
Equation (13) sums the entity context vectors S_{t,k} weighted by the memory-module attention weights ψ_{t,k} to obtain the context vector q_t of the current time step t, which serves as the input to the pointer generation network.
The beneficial effects are that: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
the invention relates to a text generation method which combines deep reinforcement learning and is realized while planning, and is used for the condition that a readable text is required to be automatically output given structured input data. In planning, not only the importance of the input data per se is considered, but also the information of the generated text and the memory of the past physical changes are considered, and fine granularity grouping is further carried out, so that the generated planning is consistent with the golden planning as much as possible.
Meanwhile, three-level reconstruction is adopted to capture the essential characteristics between input and output at different levels. The static plan is reconstructed from the generated text so that the generated text stays essentially consistent with the static plan, ensuring that the dynamic plan only fine-tunes the static plan according to the generated information; the order of the selected entities within the groups is reconstructed from the static planning sequence to ensure that the static plan still preserves the inter-group order; and the decoding result of the pointer generation decoder is restored to a bipartite graph representation, a reconstruction at the level of vector representations that makes the final decoding result reflect the essential features of the input.
Drawings
FIG. 1 is a flow chart of a text generation algorithm of the present invention;
FIG. 2 is a block diagram of a text generation algorithm of the present invention;
FIG. 3 is a flow chart of the grouping stage;
FIG. 4 is a block diagram of the grouping stage;
Fig. 5 is a diagram of the sub-graph selection process in the grouping stage.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
The text generation algorithm based on mixed grouping ordering and dynamic entity memory planning according to the invention is further described in detail with reference to the flow chart and an implementation case.
The text generation algorithm is improved by adopting mixed grouping ordering and dynamic entity memory planning, so that the planning performance is improved, and a coherent text is generated. The flow of the method is shown in fig. 1, the algorithm structure is shown in fig. 2, and the method comprises the following steps:
Step 10: the input structured data are converted into a bipartite graph representation, i.e. the relation in each triple is represented as a node and a global node is added to capture the structural information of the whole graph, and the nodes are embedded through a graph attention mechanism (GAT).
Step 20, inter-group ordering is performed by the grouping stage. The grouping phase comprises two modules: the length control module is used for controlling the selection space of the subgraph according to the length, and the subgraph observation module is used for selecting the subgraph by combining the generated subgraph sequence information in the selection space designated by the length control.
Step 30: the static planning stage generates the entity sequence SP, achieving order both within and between groups, as shown in formulas (1)-(4). The static planning stage adopts a recurrent neural network and aims to generate a node sequence that plans in advance the content and order of the text to be generated.
If LC = -1 then d_0 = V_global    (1)
The return value LC of the length control module ranges over [-1, Max Length], where Max Length is the total number of triples in the input structured data; LC determines the selection space of the sub-graph at the next time step. As shown in formula (1), when the return value LC of the length control module in the grouping stage is -1, the grouping stage ends, the static planning stage begins, and the global node representation V_global is used as the initialization state of the recurrent neural network.
gate_z = σ(W·[x_{k,z} ; d_{t-1}] + b)    (2)
c_{k,z} = gate_z ⊙ x_{k,z}    (3)
P(node_{k,z}) = softmax_z(W·c_{k,z} + b)    (4)
The selection space of the nodes is limited by the group: in the content planning stage only the nodes of the current group, or the special mark <EOG>, may be selected, where <EOG> indicates that the sub-graph currently serving as the selection space has been exhausted. Equation (2) computes the gate gate_z from the representation x_{k,z} of the z-th node of the k-th sub-graph G_k and the hidden state d_{t-1} of the previous time step of the recurrent network; gate_z measures the degree of association between the node itself and the static plan, where z denotes the z-th node. As shown in formula (3), multiplying gate_z with the node representation x_{k,z} yields the node context representation c_{k,z}, which judges the importance of the node in conjunction with the already generated SP sequence. Finally, formula (4) computes from c_{k,z} the probability of selecting node_{k,z} at the current step, i.e. it measures the relevance between each node in the sub-graph and the node sequence generated so far; a node is then selected, and at each time step the recurrent neural network takes the representation of the node selected in the previous step as its input.
When the SP element generated at the previous time step is the special symbol <EOG>, i.e. the sub-graph G_k currently serving as the selection space has been exhausted, the next sub-graph G_{k+1} is selected according to the sub-graph sequence generated in the grouping stage, and the vector representation of the previous sub-graph is input to the recurrent neural network; the vector representation of a sub-graph is obtained by average pooling over all of its node representations. The operations of formulas (1)-(4) are then repeated for sub-graph G_{k+1}. When the traversal of the sub-graph sequence obtained in the grouping stage is complete, the static content planning stage has obtained the final SP entity sequence.
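One selection step of the static planning stage, following formulas (1)-(4), might be sketched as below; the concrete parameterisation of the gate and of the scoring layer, and the use of a GRU cell for the planning RNN, are assumptions.

```python
# One decoding step of the static content planning stage (formulas (1)-(4)).
# node_repr: (m, d) representations of the m nodes of the current sub-graph G_k;
# d_prev: previous hidden state of the planning RNN (d_0 = V_global, formula (1)).
import torch
import torch.nn as nn

class PlanStep(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.rnn = nn.GRUCell(d, d)
        self.gate = nn.Linear(2 * d, d)      # formula (2): gate from node repr and d_{t-1}
        self.score = nn.Linear(d, 1)         # formula (4): selection score from the context

    def forward(self, node_repr: torch.Tensor, d_prev: torch.Tensor):
        m = node_repr.size(0)
        gate = torch.sigmoid(self.gate(
            torch.cat([node_repr, d_prev.expand(m, -1)], dim=-1)))        # (m, d)
        context = gate * node_repr                                        # formula (3)
        probs = torch.softmax(self.score(context).squeeze(-1), dim=0)     # formula (4)
        picked = int(probs.argmax())
        d_next = self.rnn(node_repr[picked].unsqueeze(0),
                          d_prev.unsqueeze(0)).squeeze(0)  # selected node feeds the RNN
        return picked, probs, d_next

d = 32
step = PlanStep(d)
v_global = torch.randn(d)              # formula (1): d_0 = V_global when LC = -1
nodes_of_Gk = torch.randn(5, d)        # current sub-graph (an <EOG> row could be appended)
picked, probs, d1 = step(nodes_of_Gk, v_global)
print(picked, probs.shape)
```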
Step 40: the SP entity sequence is encoded by the bidirectional gated recurrent network Bi-GRU to obtain the hidden entity representations, (e_1, e_2, …, e_n) = Bi-GRU(SP_1, SP_2, …, SP_n), fusing the order information of SP into the entity embeddings.
Step 50: the information of the already generated text is combined with the entity memories stored in the entity memory network to obtain the context vector q_t, as shown in formulas (5)-(13).
u_{-1,k} = W·e_k    (5)
γ_t = softmax(W·d_{t-1} + b_γ)    (6)
δ_{t,k} = γ_t ⊙ softmax(W·d_{t-1} + b_d + W·u_{t-1,k} + b_u)    (7)
ũ_{t,k} = W·d_{t-1} + b    (8)
u_{t,k} = (1 - δ_{t,k}) ⊙ u_{t-1,k} + δ_{t,k} ⊙ ũ_{t,k}    (9)
ψ_{t,k} = softmax_k(d_{t-1}·W·u_{t,k})    (10)
Further, the hidden state d_{t-1} of the decoding recurrent neural network RNN of the generation stage is used to update each entity memory u_{t,k} in the memory network, where t denotes the t-th time step and k the k-th entity. First, equation (5) initializes the memory of each entity with the entity representation e_k, denoted u_{-1,k}. In equation (6), γ_t is a gate that decides whether to modify, determined by the hidden state d_{t-1} of the previous time step of the generation stage, i.e. by the information of the already generated text. In equation (7), δ_{t,k} indicates the extent to which modification is required, determined by d_{t-1} and the entity memory u_{t-1,k}. In equation (8), ũ_{t,k} denotes the content of the modification to the entity memory. Finally, equation (9) updates the entity memory u_{t-1,k} of the previous moment with the modification content ũ_{t,k} to obtain the entity memory u_{t,k} of the current time step. Equation (10) applies an attention mechanism to u_{t,k} and d_{t-1} to obtain the attention weight ψ_{t,k} of the memory module.
a_{t,k} = softmax_k(d_{t-1}·W·e_k)    (11)
S_{t,k} = a_{t,k}·e_k    (12)
Further, equation (11) applies an attention mechanism between the hidden state d_{t-1} of the generation-stage recurrent neural network and the hidden entity representations e_1, …, e_n to obtain the attention scores a_1, …, a_n; equation (12) multiplies the attention score a_{t,k} with the entity hidden representation e_k to obtain the entity context vector S_{t,k}, where t denotes the t-th time step and k the k-th entity.
q_t = Σ_k ψ_{t,k}·S_{t,k}    (13)
Equation (13) sums the entity context vectors S_{t,k} weighted by the memory-module attention weights ψ_{t,k} to obtain the context vector q_t of the current time step t, which serves as the input to the pointer generation network.
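The entity memory and attention computations of formulas (5)-(13) can be sketched as follows; the concrete forms chosen for (8), (10) and (11) are assumptions reconstructed from the surrounding description, and all layer names and shapes are illustrative.

```python
# Entity memory update and memory-weighted context (formulas (5)-(13)).
# e: (n, d) hidden entity representations of the SP sequence from the Bi-GRU;
# d_prev: (d,) hidden state d_{t-1} of the generation-stage RNN.
import torch
import torch.nn as nn

class EntityMemory(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.init = nn.Linear(d, d, bias=False)   # formula (5)
        self.w_gamma = nn.Linear(d, d)            # formula (6)
        self.w_d = nn.Linear(d, d)                # formula (7)
        self.w_u = nn.Linear(d, d)
        self.w_mod = nn.Linear(d, d)              # formula (8), assumed form
        self.w_psi = nn.Linear(d, d, bias=False)  # formula (10), bilinear attention, assumed
        self.w_att = nn.Linear(d, d, bias=False)  # formula (11), assumed

    def initialise(self, e: torch.Tensor) -> torch.Tensor:
        return self.init(e)                                         # u_{-1,k} = W·e_k

    def step(self, u_prev: torch.Tensor, e: torch.Tensor, d_prev: torch.Tensor):
        gamma = torch.softmax(self.w_gamma(d_prev), dim=-1)         # formula (6)
        delta = gamma * torch.softmax(self.w_d(d_prev) + self.w_u(u_prev), dim=-1)  # (7)
        u_tilde = self.w_mod(d_prev)                                # formula (8)
        u = (1 - delta) * u_prev + delta * u_tilde                  # formula (9)
        psi = torch.softmax(self.w_psi(u) @ d_prev, dim=0)          # formula (10): memory weights
        a = torch.softmax(self.w_att(e) @ d_prev, dim=0)            # formula (11): scores over e_1..e_n
        S = a.unsqueeze(-1) * e                                     # formula (12)
        q = (psi.unsqueeze(-1) * S).sum(dim=0)                      # formula (13): context q_t
        return u, q

d, n = 32, 6
sp_emb = torch.randn(1, n, d)
bigru = nn.GRU(d, d // 2, bidirectional=True, batch_first=True)
e, _ = bigru(sp_emb)                        # step 40: e_1..e_n, shape (1, n, d)
e = e.squeeze(0)
mem = EntityMemory(d)
u = mem.initialise(e)
u, q_t = mem.step(u, e, torch.randn(d))     # one generation time step
print(q_t.shape)                            # torch.Size([32])
```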
Step 60: q_t serves as an input to the pointer generation network, which is enhanced with a graph structure enhancement mechanism (Graph Structure Enhancement Mechanism) to generate the text, as shown in formulas (14)-(19).
d̃_t = tanh(W·[d_t ; q_t])    (14)
Equation (14) computes the context vector d̃_t of the generation hidden state from the hidden state d_t of the generation module and the context vector q_t.
P_gen = softmax(W·d̃_t + b)    (15)
P_copy(k) = ψ_{t,k}    (16)
P̃_copy = GSE(P_copy, Bipartite)    (17)
Equation (15) projects the context vector d̃_t of the generation-stage hidden state into a probability distribution of the same length as the vocabulary. Equation (16) takes the attention weight over an entity in the memory network as its copy probability. Equation (17) adopts the existing graph structure enhancement mechanism (Graph Structure Enhancement Mechanism), denoted GSE, to enhance the copy probability given by the pointer generation network by means of the structure of the graph.
θ = Sigmoid(W·d_t + b_d)    (18)
P_final = θ·P̃_copy + (1 - θ)·P_gen    (19)
A soft switch combines the generation probability and the copy probability. Equation (18) computes θ, the probability used to select between the copy and generate modes. Equation (19) softly combines the generation probability and the graph-structure-enhanced copy probability to obtain the final probability distribution.
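A sketch of the copy/generate combination of formulas (14)-(19) follows; the graph structure enhancement of (17) is represented only by an optional reweighting placeholder, the direction of the θ gate in (19) is an assumption, and the mapping of plan entities to vocabulary ids is illustrative.

```python
# Soft combination of generate and copy distributions (formulas (14)-(19)).
# d_t: decoder hidden state; q_t: context vector from the entity memory module;
# psi: (n,) copy weights over the n plan entities; entity_to_vocab: (n,) vocab ids.
import torch
import torch.nn as nn

class CopyGenerate(nn.Module):
    def __init__(self, d: int, vocab: int):
        super().__init__()
        self.fuse = nn.Linear(2 * d, d)       # formula (14)
        self.vocab = nn.Linear(d, vocab)      # formula (15)
        self.switch = nn.Linear(d, 1)         # formula (18)

    def forward(self, d_t, q_t, psi, entity_to_vocab, adj_boost=None):
        d_tilde = torch.tanh(self.fuse(torch.cat([d_t, q_t], dim=-1)))    # formula (14)
        p_gen = torch.softmax(self.vocab(d_tilde), dim=-1)                # formula (15)
        p_copy = psi                                                      # formula (16)
        if adj_boost is not None:                                         # formula (17):
            p_copy = p_copy * adj_boost                                   # placeholder for the
            p_copy = p_copy / p_copy.sum()                                # graph enhancement
        theta = torch.sigmoid(self.switch(d_t))                           # formula (18)
        p_final = (1 - theta) * p_gen                                     # formula (19)
        p_final = p_final.index_add(0, entity_to_vocab, theta * p_copy)   # scatter copy mass
        return p_final

d, vocab, n = 32, 100, 6
model = CopyGenerate(d, vocab)
p = model(torch.randn(d), torch.randn(d),
          torch.softmax(torch.randn(n), dim=0),
          torch.randint(0, vocab, (n,)))
print(p.shape, float(p.sum()))   # distribution over the vocabulary, sums to 1
```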
The model is constructed as a pipeline, divided into a grouping stage, a static planning stage, an entity memory stage and a pointer generation decoding stage. The static planning gold standard is obtained by running an existing information extraction system and comparing the reference text in the sample with the input structured data. Since dynamic content planning has no explicit gold standard, the memory module parameters are updated through the generation loss function. The reference text in the sample and the obtained static planning gold standard are compared with the generated text and the generated static plan to obtain the loss functions.
L_gen = -Σ_{t=1}^{T} log P(ŷ_t = y*_t)    (20)
L_lm = L_gen + γ·P̄,  P̄ = (1/T)·Σ_{t=1}^{T} P(ŷ_t = y*_t)    (21)
Equation (20) is the negative log-likelihood loss of the generated text, which makes the generated text agree as closely as possible with the reference text given in the sample, where y*_t denotes the reference token and t the t-th time step. Equation (21) adds a regularization term to the loss function, where T denotes the length of the generated text, i.e. the total number of time steps, P̄ is the mean of the per-step probabilities, and γ is a hyperparameter.
L_sp = -Σ_{z=1}^{|SP|} log P(SP_z = node*_z)    (22)
Equation (22) is the negative log-likelihood of generating the SP sequence, which maximizes the probability of generating the static planning gold standard; |SP| denotes the sequence length of the static plan and node*_z denotes the z-th node of the static content planning gold standard.
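The generation and planning losses of formulas (20)-(22) might be computed as below; how the probability-mean regulariser of (21) enters the loss is an assumption.

```python
# Training losses for text generation and static planning (formulas (20)-(22)).
# probs: (T, vocab) predicted distributions; targets: (T,) reference token ids;
# sp_probs: (|SP|, num_nodes) node distributions; sp_targets: gold plan node ids.
import torch

def generation_loss(probs, targets, gamma=0.1):
    p_ref = probs[torch.arange(targets.size(0)), targets]      # P(y*_t) at each step
    nll = -torch.log(p_ref + 1e-12).sum()                      # formula (20)
    p_bar = p_ref.mean()                                       # formula (21): mean step probability
    return nll + gamma * p_bar                                 # regularised loss (assumed combination)

def planning_loss(sp_probs, sp_targets):
    p_ref = sp_probs[torch.arange(sp_targets.size(0)), sp_targets]
    return -torch.log(p_ref + 1e-12).sum()                     # formula (22)

T, V, S, N = 7, 50, 4, 10
l_lm = generation_loss(torch.softmax(torch.randn(T, V), -1), torch.randint(0, V, (T,)))
l_sp = planning_loss(torch.softmax(torch.randn(S, N), -1), torch.randint(0, N, (S,)))
print(float(l_lm), float(l_sp))
```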
Step 70: three-level reconstruction is adopted: the static plan is reconstructed from the generated translation text, the group sequence is reconstructed from the static planning sequence, and the decoding result of the pointer generation decoder is restored to a bipartite graph representation.
P_rec1(SP = node_z) = Softmax(W·h_t + b)    (23)
Further, the first level of reconstruction uses a recurrent neural network to reconstruct the static plan from the translation text, i.e. the static plan SP is extracted from the embedded representations of the decoded vocabulary. The hidden state vector of the pointer generation decoder at its last time step initializes the hidden state of this recurrent neural network, defined as h_0. A context vector of the translated text is computed by applying an attention mechanism between the hidden state and all vocabulary embedded representations of the pointer generation decoder, and serves as the input of the recurrent neural network. Equation (23) computes the probability of selecting a node, where h_t is the hidden state output by the recurrent network at time t and W, b are parameters. Thus, the loss function of the reconstruction from generated text to static plan can be defined as:
L_rec1 = -Σ_{z=1}^{|SP|} log P_rec1(SP = node_z) + γ·P̄_rec1    (24)
P̄_rec1 = (1/|SP|)·Σ_{z=1}^{|SP|} P_rec1(SP = node_z)    (25)
where |SP| denotes the sequence length of the generated static plan, node_z denotes the z-th node in the SP entity sequence generated by the static planning stage, and P̄_rec1, the mean of the probabilities over the time steps of the reconstruction-1 stage, is used for regularization of the loss function, with γ a hyperparameter. The loss function L_rec1 aims to extract the preceding static plan as faithfully as possible from the generated descriptive text.
Further, the second level of reconstruction recovers, from the static planning sequence, the order in which the selected entities appear in the groups, i.e. it reverts from the static plan to the group sequence numbers, with the aim of preserving the inter-group order established in the grouping stage. The second-level reconstruction adopts a structure similar to the first level: the static planning sequence is fed in through an attention mechanism, and a recurrent neural network generates the corresponding group sequence. The loss function of the second-level reconstruction can be defined as:
L_rec2 = -Σ_{k=1}^{|G|} log P_rec2(G_k) + γ·P̄_rec2    (26)
where |G| denotes the length of the group sequence, G_k denotes the k-th sub-graph in the sub-graph sequence generated by the grouping stage, and P̄_rec2, the mean of the probabilities over the time steps of the reconstruction-2 stage, is computed in the same way as in formula (25), with γ a hyperparameter.
Further, the first- and second-level reconstructions, i.e. reconstructing the static plan from the generated text and reconstructing the group sequence from the static plan, are index-based reconstructions. The third-level reconstruction, i.e. restoring the decoding result of the pointer generation network to the bipartite graph representation, is based on vector representations.
L_rec3 = KL((m_1, m_2, …, m_|V|) ‖ GAT_CUBE(Bipartite))    (27)
In equation (27), the decoding result of the pointer generation network is encoded and decoded again to obtain the codes m_1, m_2, …, m_|V| of the 1st to |V|-th nodes; L_rec3 requires the decoded result to be consistent with the embedded representation of the bipartite graph after graph attention (GAT) encoding. The reconstruction is performed at the level of vector representations, so the KL divergence is used as the loss function.
L_TOTAL = λ_1·L_sp + λ_2·L_lm + λ_3·L_rec1 + λ_4·L_rec2 + λ_5·L_rec3    (28)
Finally, the model loss function is defined as equation (28), a combination of the static planning loss, the text generation loss and the three reconstruction losses; λ_1 to λ_5 are hyperparameters.
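The three reconstruction losses and the total objective of formulas (23)-(28) can be sketched as follows; the per-node distributions compared in (27) and the λ values are illustrative assumptions.

```python
# Three-level reconstruction losses and the total objective (formulas (23)-(28)).
# rec1/rec2 follow the same negative log-likelihood pattern as the planning loss;
# rec3 compares re-encoded decoder outputs with the GAT embedding via KL divergence.
import torch
import torch.nn.functional as F

def rec_nll(probs, targets, gamma=0.1):                # pattern of formulas (24)-(26)
    p_ref = probs[torch.arange(targets.size(0)), targets]
    return -torch.log(p_ref + 1e-12).sum() + gamma * p_ref.mean()

def rec3_kl(m, gat_emb):                               # formula (27)
    return F.kl_div(F.log_softmax(m, dim=-1),
                    F.softmax(gat_emb, dim=-1), reduction="batchmean")

def total_loss(l_sp, l_lm, l_rec1, l_rec2, l_rec3,
               lambdas=(1.0, 1.0, 0.5, 0.5, 0.5)):     # formula (28)
    terms = (l_sp, l_lm, l_rec1, l_rec2, l_rec3)
    return sum(lam * t for lam, t in zip(lambdas, terms))

l_rec1 = rec_nll(torch.softmax(torch.randn(4, 10), -1), torch.randint(0, 10, (4,)))
l_rec2 = rec_nll(torch.softmax(torch.randn(3, 6), -1), torch.randint(0, 6, (3,)))
l_rec3 = rec3_kl(torch.randn(9, 32), torch.randn(9, 32))
print(float(total_loss(torch.tensor(1.0), torch.tensor(2.0), l_rec1, l_rec2, l_rec3)))
```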
As shown in fig. 3, the grouping stage proceeds as follows:
step 101, the Length control module selects the Length LC of the sub-image to be generated by combining the information of the generated sub-image sequence, the range of the LC is [ -1, max Length]Max Length is the total number of triples in the input structure data and is used for determining the space for selecting the subgraph in the next time step. The method comprises the steps of carrying out a first treatment on the surface of the If LC is-1, the packet phase terminates.
G_{t-1} denotes the representation of the sub-graph selected in the previous step and is used to update the length control memory vector L.
L_t = γ·L_{t-1} + (1 - γ)·(W·G_{t-1} + b)    (29)
P_LC = Softmax(W_LC·L_t + b_LC)    (30)
Formula (29) updates the length control memory vector L according to the generated sequence; formula (30) projects the vector L into a probability distribution of length Max Length + 1, and the number of triples, defined as LC, is selected according to this distribution. The sub-graph length limits the selection space for generating the sub-graph at the current step: sub-graph observation may only choose among sub-graphs containing LC triples. Here γ is a hyperparameter and t denotes the t-th time step.
V_global ← GAT(Bipartite)    (31)
LC = Sigmoid(W·V_global + b_LC)    (32)
As shown in the block diagram of fig. 4, the first time step lacks generated sub-graph sequence information, so formula (31) initializes with the global node of the bipartite graph.
LC_onehot = OneHot(LC)    (33)
As shown in fig. 5, the set of all possible sub-graphs can be organized as a three-dimensional tensor, called the Cube. Formula (33) converts the selected LC into the one-hot vector LC_onehot; selecting the sub-graph space of a given length can be understood intuitively, following fig. 5, as selecting the corresponding page of the Cube.
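A sketch of the length control module following formulas (29)-(33) is given below; the convex-combination update assumed for (29) and the encoding of LC = -1 as slot 0 are assumptions, and the Cube tensor is a toy stand-in.

```python
# Length control module of the grouping stage (formulas (29)-(33)).
# L_prev: length-control memory vector; g_prev: representation of the previously
# selected sub-graph; "cube" stacks, per length, the candidate sub-graph representations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LengthControl(nn.Module):
    def __init__(self, d: int, max_length: int, gamma: float = 0.5):
        super().__init__()
        self.gamma = gamma
        self.update = nn.Linear(d, d)                    # formula (29), assumed form
        self.to_lengths = nn.Linear(d, max_length + 1)   # formula (30): slot 0 encodes LC = -1
        self.max_length = max_length

    def forward(self, L_prev, g_prev):
        L_t = self.gamma * L_prev + (1 - self.gamma) * self.update(g_prev)  # formula (29)
        p_lc = torch.softmax(self.to_lengths(L_t), dim=-1)                  # formula (30)
        idx = int(p_lc.argmax())
        lc = -1 if idx == 0 else idx
        return L_t, p_lc, lc

d, max_len, n_candidates = 32, 4, 3
ctrl = LengthControl(d, max_len)
cube = torch.randn(max_len, n_candidates, d)   # formula (33): pages indexed by sub-graph length
v_global = torch.randn(d)                      # formulas (31)-(32): first step starts from V_global
L_t, p_lc, lc = ctrl(torch.zeros(d), v_global)
if lc == -1:
    print("grouping stage ends")               # hand over to the static planning stage
else:
    onehot = F.one_hot(torch.tensor(lc - 1), max_len).float()   # formula (33)
    page = (onehot.view(-1, 1, 1) * cube).sum(dim=0)            # candidates with LC triples
    print(lc, page.shape)
```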
Step 102: the selection space is controlled according to the generated sub-graph length LC; sub-graph observation then applies an attention mechanism between the representation of each candidate sub-graph and the sub-graph and node information of the already generated sub-graph sequence, producing the probability of selecting each sub-graph.
g_{t,i} = (1/|G_i|)·Σ_j x_{t,i,j}    (34)
Formula (34) average-pools the node representations in a sub-graph to obtain the sub-graph representation; since the node representations are updated at every moment, the sub-graph representations differ over time. The subscript t denotes the t-th time step, i the i-th sub-graph, and j the j-th node in the sub-graph. Attention mechanisms are then applied at the sub-graph level and at the node level respectively.
β_{t,i,k} = softmax_k(g_{t,i}·W·G_k)    (35)
c^G_{t,i} = tanh(W·[g_{t,i} ; Σ_k β_{t,i,k}·G_k])    (36)
η_{t,i,z} = softmax_z(c^G_{t,i}·W·x_{k,z})    (37)
c^N_{t,i} = tanh(W·[c^G_{t,i} ; Σ_z η_{t,i,z}·x_{k,z}])    (38)
P(G_i) = Softmax_i(W_G·c^N_{t,i} + b_G)    (39)
Equation (35) computes attention scores between the candidate sub-graph and the previously selected sub-graphs, where g_{t,i} is the representation of candidate sub-graph i at time t and G_k is the embedded representation of the sub-graph selected at an earlier time k. Equation (36) fuses the candidate sub-graph's own information g_{t,i} with the previously selected sub-graph information G_k through the attention mechanism, completing the sub-graph-level attention and yielding the sub-graph context vector c^G_{t,i}. Equation (37) computes attention scores between the sub-graph context vector c^G_{t,i} and all nodes of the previously selected sub-graphs, where x_{k,z} denotes the z-th node of the selected sub-graph k. Equation (38) fuses the candidate sub-graph context vector with the information of all nodes in the previously selected sub-graphs through the attention mechanism, completing the node-level attention and yielding the node-level context vector c^N_{t,i}. Finally, formula (39) computes from the node context vector c^N_{t,i} the probability of selecting each sub-graph, where W_G and b_G are grouping-stage parameters. As shown in fig. 5, this can intuitively be understood as selecting a sub-graph from the Cube page of the corresponding length.
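The sub-graph observation scoring of formulas (34)-(39) might look as follows; the concatenate-and-tanh fusions used for (36) and (38) are assumed forms, and the shapes are illustrative.

```python
# Sub-graph observation scoring (formulas (34)-(39)). cand_nodes: node representations
# of one candidate sub-graph; prev_graphs / prev_nodes: representations of the already
# selected sub-graphs and of their nodes.
import torch
import torch.nn as nn

class SubGraphObserver(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.att_g = nn.Linear(d, d, bias=False)   # formula (35)
        self.fuse_g = nn.Linear(2 * d, d)          # formula (36)
        self.att_n = nn.Linear(d, d, bias=False)   # formula (37)
        self.fuse_n = nn.Linear(2 * d, d)          # formula (38)
        self.score = nn.Linear(d, 1)               # formula (39): W_G, b_G

    def forward(self, cand_nodes, prev_graphs, prev_nodes):
        g = cand_nodes.mean(dim=0)                                           # formula (34)
        a_g = torch.softmax(self.att_g(prev_graphs) @ g, dim=0)              # formula (35)
        c_g = torch.tanh(self.fuse_g(
            torch.cat([g, (a_g.unsqueeze(-1) * prev_graphs).sum(0)])))       # formula (36)
        a_n = torch.softmax(self.att_n(prev_nodes) @ c_g, dim=0)             # formula (37)
        c_n = torch.tanh(self.fuse_n(
            torch.cat([c_g, (a_n.unsqueeze(-1) * prev_nodes).sum(0)])))      # formula (38)
        return self.score(c_n)                         # unnormalised score for formula (39)

d = 32
obs = SubGraphObserver(d)
candidates = [torch.randn(3, d), torch.randn(3, d)]    # sub-graphs of the chosen length LC
prev_graphs, prev_nodes = torch.randn(2, d), torch.randn(7, d)
scores = torch.stack([obs(c, prev_graphs, prev_nodes).squeeze(-1) for c in candidates])
print(torch.softmax(scores, dim=0))                    # formula (39): probability of each sub-graph
```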
Step 103, updating node representations in all sub-graphs after selecting a certain sub-graph, namely updating the representations of all sub-graphs.
u = σ(W_update·G_t + b_update)    (40)
gate_v = σ(W_gate·[x_{t-1,v} ; G_t] + b_gate)    (41)
x_{t,v} = gate_v ⊙ x_{t-1,v} + (1 - gate_v) ⊙ u    (42)
After each sub-graph selection step, the representations of all nodes are updated with the information of the selected sub-graph, so that even if a sub-graph is selected repeatedly its representation differs. Formula (40) computes the update content u from the representation of the sub-graph G_t selected in the previous step, where W_update and b_update are update parameters. Formula (41) computes a gate that balances the retention of previous information against the update information from the newly added sub-graph, where W_gate and b_gate are gating parameters. Formula (42) updates the representation of each node, where t denotes the t-th time step and v the v-th node, so the node representations differ at every time step.
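The node update of formulas (40)-(42) can be sketched as below; the gate input and the interpolation form are assumptions consistent with the description.

```python
# Node representation update after a sub-graph is selected (formulas (40)-(42)).
# x: (N, d) all node representations; g_t: representation of the selected sub-graph G_t.
import torch
import torch.nn as nn

class NodeUpdater(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.w_update = nn.Linear(d, d)        # formula (40): W_update, b_update
        self.w_gate = nn.Linear(2 * d, d)      # formula (41): W_gate, b_gate

    def forward(self, x: torch.Tensor, g_t: torch.Tensor) -> torch.Tensor:
        u = torch.sigmoid(self.w_update(g_t))                              # formula (40)
        gate = torch.sigmoid(self.w_gate(
            torch.cat([x, g_t.expand(x.size(0), -1)], dim=-1)))            # formula (41)
        return gate * x + (1 - gate) * u                                   # formula (42)

x = torch.randn(8, 32)
g_t = torch.randn(32)
x_new = NodeUpdater(32)(x, g_t)
print(x_new.shape)   # every node representation changes after each selection
```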
Step 104, repeating the operations from step 101 to step 103 after updating all node representations until the LC value generated in step 101 is-1, and ending the grouping stage to obtain a final sub-graph sequence.
The grouping stage cannot extract a gold standard from the samples. Therefore, the whole model is first warmed up: in the initial stage the model is unfamiliar with the data, and the weight distribution is continuously corrected by training with a smaller learning rate. Once the model is sufficiently familiar with the data, all parameters of the subsequent modules are fixed and the weights of the grouping stage are adjusted using the loss functions of the subsequent modules. Method 1: fix the parameters of the static planning module, feed the grouping results of the grouping stage into the static planning stage to generate SP, and compare it with the golden static plan given by the dataset to obtain a loss function. Method 2: fix all parameters from the static planning module to the pointer generation network module, feed in the grouping result, output the generated text, and compare it with the reference text given by the sample to obtain a loss function. The parameters of the grouping stage are updated through methods 1 and 2.

Claims (4)

1. A text generation method based on mixed grouping ordering and dynamic entity memory planning is characterized in that: the method comprises the following steps:
step 1) taking a structured data set for which a corresponding text needs to be generated as the model input, wherein the data are expressed in the form of a table or a knowledge graph; converting the obtained data into a bipartite graph and embedding it using a graph attention mechanism;
step 2) grouping and ordering the data vectors obtained in step 1) through the grouping stage; the grouping stage comprises two modules: a length control module and a sub-graph observation module; the length control module acts at each generation step, combining the information of the already generated sub-graph sequence and mapping it to a probability distribution, from which the number of triples LC of the sub-graph to be generated at the next time step is selected, so that only sub-graphs containing LC triples may be chosen at that time step; if LC = -1 is selected, the grouping stage ends and the method proceeds to step 5);
step 3) using the sub-graph length LC obtained in step 2) to control the selection space of the sub-graphs; the sub-graph observation mechanism obtains the representation of each candidate sub-graph through a self-attention mechanism over all nodes in the sub-graph, and applies an attention mechanism with the sub-graph and node information of the previously generated sub-graph sequence, thereby producing the probability of selecting each sub-graph;
step 4) selecting a sub-graph according to the probability distribution obtained in step 3), then updating the node representations in all sub-graphs with the hidden state of the current step of the recurrent neural network, i.e. updating the representations of all sub-graphs, and returning to step 2); if LC = -1 is selected in step 2), a final sub-graph sequence is obtained, in which each sub-graph is a subset of the input structured data set;
step 5) the static content planning stage selects and generates the entity sequence SP, using the global node representation V_global as the initialization state of the recurrent neural network; the selection space of each step is the corresponding sub-graph in the sequence of step 4); when the special sub-graph end mark <EOG> is generated, the next input of the recurrent neural network is the representation of the current sub-graph, and the next selection space is obtained according to the sub-graph sequence of step 4); when the sub-graph sequence has been fully traversed, the final static content planning (SP) entity sequence is obtained;
step 6) encoding the SP entity sequence obtained in step 5) with a bidirectional gated recurrent network to obtain the hidden entity representations e_1, …, e_n of the SP sequence, where n denotes the total number of entities in the SP sequence; passing the hidden SP representations to the generation stage and to the entity memory module;
step 7) the entity memory module stores the hidden SP entity representations as its initial memory content; the hidden state d_{t-1} of the generation-stage recurrent neural network is used to update the entity memory u_{t,k}, and the entity memory u_{t,k} is multiplied with d_{t-1} to obtain the memory weight ψ_{t,k}, where t denotes the t-th time step and k the k-th entity;
step 8) obtaining the attention scores a_1, …, a_n by applying an attention mechanism between the hidden state d_{t-1} of the generation-stage recurrent neural network and e_1, …, e_n, and multiplying the attention score a_{t,k} with the corresponding entity memory u_{t,k} to obtain the entity context vector S_{t,k};
step 9) summing the entity context vectors S_{t,k} weighted by the memory weights ψ_{t,k} to obtain the context vector q_t, which serves as the input of the pointer generation decoder; a graph structure enhancement mechanism is adopted to enhance the pointer decoder and generate the translation text corresponding to the structured data;
step 10) adopting three-level reconstruction so that the decoder fully acquires the information contained in the encoder: the static content plan SP is reconstructed from the translation text, the sub-graph sequence of the grouping stage is reconstructed from the static content planning sequence, and the decoding result of the pointer generation decoder is restored to a bipartite graph representation.
2. The text generation method according to claim 1, characterized in that: in step 1) the data are expressed in the form of a table or a knowledge graph, wherein in the table the structured data exist in the form of records and in the knowledge graph they exist in the form of triples;
the knowledge graph serves as structured input data, and each triple is formed by a head entity, a relation and a tail entity; the obtained data are converted into a bipartite graph, i.e. the relation in each triple is represented as a node, and a global node is added to capture the structural information of the whole graph; all nodes are embedded using a graph attention mechanism.
3. The text generation method according to claim 2, characterized in that: the SP entity sequence is encoded by the bidirectional gated recurrent network Bi-GRU to obtain the hidden entity representations e_1, …, e_n of the SP sequence, fusing the order information of SP into the entity embeddings;
the hidden state d_{t-1} of the decoding recurrent neural network RNN of the generation stage is used to update each entity memory u_{t,k} in the memory network, where t denotes the t-th time step and k the k-th entity; specifically:
u_{-1,k} = W·e_k    (5)
γ_t = softmax(W·d_{t-1} + b_γ)    (6)
δ_{t,k} = γ_t ⊙ softmax(W·d_{t-1} + b_d + W·u_{t-1,k} + b_u)    (7)
ũ_{t,k} = W·d_{t-1} + b    (8)
u_{t,k} = (1 - δ_{t,k}) ⊙ u_{t-1,k} + δ_{t,k} ⊙ ũ_{t,k}    (9)
ψ_{t,k} = softmax_k(d_{t-1}·W·u_{t,k})    (10)
First, equation (5) initializes the memory of each entity with the entity representation e_k, denoted u_{-1,k}. In equation (6), γ_t is a gate that decides whether to modify, determined by the hidden state d_{t-1} of the previous time step of the generation stage, i.e. by the information of the already generated text. In equation (7), δ_{t,k} indicates the extent to which modification is required, determined by d_{t-1} and the entity memory u_{t-1,k}. In equation (8), ũ_{t,k} denotes the content of the modification to the entity memory. Finally, equation (9) updates the entity memory u_{t-1,k} of the previous moment with the modification content ũ_{t,k} to obtain the entity memory u_{t,k} of the current time step; equation (10) applies an attention mechanism to u_{t,k} and d_{t-1} to obtain the attention weight ψ_{t,k} of the memory module. W, b_γ, b_d, b_u are parameters.
4. The text generation method according to claim 3, characterized in that: in step 8) the attention scores a_1, …, a_n are obtained by applying an attention mechanism between the hidden state d_{t-1} of the generation-stage recurrent neural network and e_1, …, e_n, and the attention score a_{t,k} is multiplied with the corresponding entity memory u_{t,k} to obtain the entity context vector S_{t,k}; specifically:
a_{t,k} = softmax_k(d_{t-1}·W·e_k)    (11)
S_{t,k} = a_{t,k}·e_k    (12)
Equation (11) applies an attention mechanism between the hidden state d_{t-1} of the generation-stage recurrent neural network and the hidden entity representations e_1, …, e_n to obtain the attention scores a_1, …, a_n; equation (12) multiplies the attention score a_{t,k} with the entity hidden representation e_k to obtain the entity context vector S_{t,k}, where t denotes the t-th time step and k the k-th entity;
q_t = Σ_k ψ_{t,k}·S_{t,k}    (13)
Equation (13) sums the entity context vectors S_{t,k} weighted by the memory-module attention weights ψ_{t,k} to obtain the context vector q_t of the current time step t, which serves as the input to the pointer generation network.
CN202211216143.7A 2022-09-30 2022-09-30 Text generation method based on mixed grouping ordering and dynamic entity memory planning Active CN115577118B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211216143.7A CN115577118B (en) 2022-09-30 2022-09-30 Text generation method based on mixed grouping ordering and dynamic entity memory planning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211216143.7A CN115577118B (en) 2022-09-30 2022-09-30 Text generation method based on mixed grouping ordering and dynamic entity memory planning

Publications (2)

Publication Number Publication Date
CN115577118A CN115577118A (en) 2023-01-06
CN115577118B true CN115577118B (en) 2023-05-30

Family

ID=84582422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211216143.7A Active CN115577118B (en) 2022-09-30 2022-09-30 Text generation method based on mixed grouping ordering and dynamic entity memory planning

Country Status (1)

Country Link
CN (1) CN115577118B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111078866A (en) * 2019-12-30 2020-04-28 华南理工大学 Chinese text abstract generation method based on sequence-to-sequence model
US11010666B1 (en) * 2017-10-24 2021-05-18 Tunnel Technologies Inc. Systems and methods for generation and use of tensor networks
CN113360655A (en) * 2021-06-25 2021-09-07 中国电子科技集团公司第二十八研究所 Track point classification and text generation method based on sequence annotation
CN113657115A (en) * 2021-07-21 2021-11-16 内蒙古工业大学 Multi-modal Mongolian emotion analysis method based on ironic recognition and fine-grained feature fusion
CN114048350A (en) * 2021-11-08 2022-02-15 湖南大学 Text-video retrieval method based on fine-grained cross-modal alignment model

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10474709B2 (en) * 2017-04-14 2019-11-12 Salesforce.Com, Inc. Deep reinforced model for abstractive summarization
EP3598339A1 (en) * 2018-07-19 2020-01-22 Tata Consultancy Services Limited Systems and methods for end-to-end handwritten text recognition using neural networks
CA3081242A1 (en) * 2019-05-22 2020-11-22 Royal Bank Of Canada System and method for controllable machine text generation architecture
CN110795556B (en) * 2019-11-01 2023-04-18 中山大学 Abstract generation method based on fine-grained plug-in decoding
US11481418B2 (en) * 2020-01-02 2022-10-25 International Business Machines Corporation Natural question generation via reinforcement learning based graph-to-sequence model


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Research on Key Technologies of Text Sentiment Prediction with Sentiment Enhancement and Sentiment Fusion; 荣欢; China Doctoral Dissertations Full-text Database, Information Science and Technology; full text *
A User-granularity Personalized Social Text Generation Model; 高永兵, 高军甜; Journal of Computer Applications; full text *
A Ground-truth-independent Text Summarization Model for Coherence Reinforcement; 马廷淮; Journal of Frontiers of Computer Science and Technology; full text *

Also Published As

Publication number Publication date
CN115577118A (en) 2023-01-06

Similar Documents

Publication Publication Date Title
Yu et al. PICK: processing key information extraction from documents using improved graph learning-convolutional networks
CN108415977B (en) Deep neural network and reinforcement learning-based generative machine reading understanding method
CN111538848B (en) Knowledge representation learning method integrating multi-source information
Chen et al. Delving deeper into the decoder for video captioning
CN111966820B (en) Method and system for constructing and extracting generative abstract model
CN111985205A (en) Aspect level emotion classification model
CN111062214B (en) Integrated entity linking method and system based on deep learning
CN112417092A (en) Intelligent text automatic generation system based on deep learning and implementation method thereof
CN115391563B (en) Knowledge graph link prediction method based on multi-source heterogeneous data fusion
CN115510236A (en) Chapter-level event detection method based on information fusion and data enhancement
CN114663962A (en) Lip-shaped synchronous face forgery generation method and system based on image completion
CN116579347A (en) Comment text emotion analysis method, system, equipment and medium based on dynamic semantic feature fusion
CN113641854B (en) Method and system for converting text into video
CN115630649A (en) Medical Chinese named entity recognition method based on generative model
CN110347853A (en) A kind of image hash code generation method based on Recognition with Recurrent Neural Network
CN115577118B (en) Text generation method based on mixed grouping ordering and dynamic entity memory planning
CN114880527B (en) Multi-modal knowledge graph representation method based on multi-prediction task
CN111444328A (en) Natural language automatic prediction inference method with interpretation generation
CN116340569A (en) Semi-supervised short video classification method based on semantic consistency
CN112069777B (en) Two-stage data-to-text generation method based on skeleton
CN112580370B (en) Mongolian nerve machine translation method integrating semantic knowledge
CN114972959A (en) Remote sensing image retrieval method for sample generation and in-class sequencing loss in deep learning
Wei et al. Stack-vs: Stacked visual-semantic attention for image caption generation
CN114780725A (en) Text classification algorithm based on deep clustering
CN113297385A (en) Multi-label text classification model and classification method based on improved GraphRNN

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant