CN115577118B - Text generation method based on mixed grouping ordering and dynamic entity memory planning - Google Patents

Text generation method based on mixed grouping ordering and dynamic entity memory planning

Info

Publication number
CN115577118B
Authority
CN
China
Prior art keywords
entity
graph
sub
sequence
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211216143.7A
Other languages
Chinese (zh)
Other versions
CN115577118A (en)
Inventor
荣欢
孙圣杰
马廷淮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202211216143.7A priority Critical patent/CN115577118B/en
Publication of CN115577118A publication Critical patent/CN115577118A/en
Application granted granted Critical
Publication of CN115577118B publication Critical patent/CN115577118B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3346Query execution using probabilistic model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text generation method based on mixed grouping ordering and dynamic entity memory planning, which aims to automatically convert input structured data into readable text describing the data. In the grouping stage, the invention selects sub-graph groups through a length control module and a sub-graph observation module and orders the data by group; in the static planning stage it generates a static node content plan, achieving order both within and between groups; at each time step it dynamically decides, on the basis of the static plan and according to a memory network, which data to output next; and with three-level reconstruction the decoder is guided from multiple angles to capture the essential features of the input. The invention introduces a finer-grained grouping mechanism to bridge the gap between structured data and unstructured text; dynamic content planning is further combined with a memory network, strengthening semantic coherence; and a three-level reconstruction mechanism is introduced to capture the intrinsic feature dependencies between input and output at different levels.

Description

Text generation method based on mixed grouping ordering and dynamic entity memory planning
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a text generation method suited to the problem of converting input structured data into readable text describing the data.
Background
Text generation is an important topic in the field of natural language processing. Real-world data may appear in different forms under different circumstances, and some forms, such as knowledge graphs, are difficult for people outside the professional field to understand. Converting such data into readable text by hand costs a great deal of time and effort, whereas the Data-to-text task aims at automatically converting the input structured data into readable text describing these data.
Reiter [1] summarized text generation systems and argued that they can be divided into three relatively independent modules: (1) content planning, i.e. selecting which data records or data fields to describe; (2) sentence planning, i.e. determining the order of the selected data records or data fields in the sentence; (3) surface realization, i.e. generating the actual text based on the result of the sentence plan. Intuitively, content planning decides what to say, sentence planning decides in what order to say it, and surface realization decides how to say it. This has essentially become the paradigm of text generation systems, and in recent years more and more end-to-end models have added content selection and content planning modules to improve performance. Puduppully et al. [2] proposed a neural architecture that divides the generation task into a content selection and planning stage and a surface realization stage: given a set of data records, a content plan is first generated highlighting which information should be mentioned and in what order, then the document is generated based on that content plan, and a copy mechanism is added to strengthen the decoder. Chen et al. [3] proposed a text generation model based on dynamic content planning, which adjusts the plan dynamically according to the already generated text, and added a reconstruction mechanism to push the decoder to capture the essential features the encoder intends to express. Puduppully et al. [4] dynamically update the entity representations according to the generation process and an entity memory, capturing entity transitions between sentences, increasing inter-sentence coherence and selecting the content to be described more appropriately.
Although the surface realization stage can generate fluent text, problems of information loss, duplication or hallucination still occur, so the idea of grouping is widely used to align entities with the descriptive text and mitigate such problems. Lin et al. [5] add separators to the plan for fine-grained segmentation, which facilitates long text generation. Shen et al. [6] group the entity data so that each portion corresponds to a segment of the target text, allowing the description of a designated group of entities to be generated without attending to the whole input. Xu et al. [7] order and aggregate the input triple data so as to align it with the output descriptive text, generating the description sentence by sentence.
Building on the above understanding, the invention further strengthens the content planning and sentence planning parts. A finer-grained grouping mechanism is introduced, together with a matching static plan generation strategy; a memory network is further combined with entity transitions to grasp how the focus of description shifts between sentences; and reconstruction from multiple angles ensures that the several stages capture the essential features between the input and the output.
Reference is made to:
[1] Reiter E. An architecture for data-to-text systems[C]//Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07). 2007: 97-104.
[2] Puduppully R, Dong L, Lapata M. Data-to-text generation with content selection and planning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 6908-6915.
[3] Chen K, Li F, Hu B, et al. Neural data-to-text generation with dynamic content planning[J]. Knowledge-Based Systems, 2021, 215: 106610.
[4] Puduppully R, Dong L, Lapata M. Data-to-text generation with entity modeling[J]. arXiv preprint arXiv:1906.03221, 2019.
[5] Lin X, Cui S, Zhao Z, et al. GGP: A Graph-based Grouping Planner for Explicit Control of Long Text Generation[C]//Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021: 3253-3257.
[6] Shen X, Chang E, Su H, et al. Neural data-to-text generation via jointly learning the segmentation and correspondence[J]. arXiv preprint arXiv:2005.01096, 2020.
[7] Xu X, Dušek O, Rieser V, et al. AGGGEN: Ordering and Aggregating while Generating[J]. arXiv preprint arXiv:2106.05580, 2021.
Disclosure of Invention
The purpose of the invention: to address the structural gap that arises when converting structured data into linear readable text. Existing models adopt plan-ahead methods to bridge this gap, but traditional planning methods rely on a single recurrent neural network, which is simplistic and not fine-grained enough, and they all plan first and then realize the text, without adjusting the plan in combination with the text generation process. The invention addresses these problems.
The technical scheme is as follows: in order to achieve the purpose of the invention, the technical scheme adopted by the invention is as follows:
a text generation method based on mixed grouping ordering and dynamic entity memory planning is characterized in that the entity representation is updated by utilizing information of a generation process and entity transfer memory based on static planning, the static planning is corrected, and finally a decoder is promoted to obtain more accurate important information from an encoder through three-level reconstruction. The method specifically comprises the following steps:
step 1) taking a structured data set for which a corresponding text needs to be generated as the model input, wherein the data are expressed in the form of a table or a knowledge graph; converting the obtained data into a bipartite graph and embedding it using a graph attention mechanism;
step 2) grouping and ordering the data vectors obtained in step 1) through the grouping stage; the grouping stage comprises two modules: a length control module and a sub-graph observation module; the length control module acts at each generation step, combining the information of the already generated sub-graph sequence and mapping it to a probability distribution, from which the number of triples LC of the sub-graph to be generated at the next time step is selected, so that only sub-graphs containing LC triples may be chosen at that time step; if LC = -1 is selected, the grouping stage ends and the method proceeds to step 5);
step 3) using the sub-graph length LC obtained in step 2) to control the selection space of the sub-graphs; the sub-graph observation mechanism obtains the representation of each candidate sub-graph through a self-attention mechanism over all nodes in the sub-graph, and applies an attention mechanism with the sub-graph and node information of the previously generated sub-graph sequence, thereby producing the probability of selecting each sub-graph;
step 4) selecting a sub-graph according to the probability distribution obtained in step 3), then updating the node representations in all sub-graphs with the hidden state of the current step of the recurrent neural network, i.e. updating the representations of all sub-graphs, and returning to step 2); if LC = -1 is selected in step 2), a final sub-graph sequence is obtained, in which each sub-graph is a subset of the input structured data set;
step 5) the static content planning stage selects and generates the entity sequence SP, using the global node representation V_global as the initialization state of the recurrent neural network; the selection space of each step is the corresponding sub-graph in the sequence of step 4); when the special sub-graph end mark <EOG> is generated, the next input of the recurrent neural network is the representation of the current sub-graph, and the next selection space is obtained according to the sub-graph sequence of step 4); when the sub-graph sequence has been fully traversed, the final static content planning (SP) entity sequence is obtained;
step 6) encoding the SP entity sequence obtained in step 5) with a bidirectional gated recurrent network to obtain the hidden entity representations e_1, …, e_n of the SP sequence, where n denotes the total number of entities in the SP sequence; passing the hidden SP representations to the generation stage and to the entity memory module;
step 7) the entity memory module stores the hidden SP entity representations as its initial memory content; the hidden state d_{t-1} of the generation-stage recurrent neural network is used to update the entity memory u_{t,k}, and the entity memory u_{t,k} is multiplied with d_{t-1} to obtain the memory weight ψ_{t,k}, where t denotes the t-th time step and k the k-th entity;
step 8) obtaining the attention scores a_1, …, a_n by applying an attention mechanism between the hidden state d_{t-1} of the generation-stage recurrent neural network and e_1, …, e_n, and multiplying the attention score a_{t,k} with the corresponding entity memory u_{t,k} to obtain the entity context vector S_{t,k};
step 9) summing the entity context vectors S_{t,k} weighted by the memory weights ψ_{t,k} to obtain the context vector q_t, which serves as the input of the pointer generation decoder; a graph structure enhancement mechanism is adopted to enhance the pointer decoder and generate the translation text corresponding to the structured data;
step 10) adopting three-level reconstruction so that the decoder fully acquires the information contained in the encoder: the static content plan SP is reconstructed from the translation text, the sub-graph sequence of the grouping stage is reconstructed from the static content planning sequence, and the decoding result of the pointer generation decoder is restored to a bipartite graph representation.
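For orientation, the following Python sketch shows one way the ten steps above might be wired together. Every stage here is a trivial stand-in (random vectors, fixed-size grouping, invented toy triples); only the data flow follows the description, and none of it should be read as the patented implementation.

```python
"""Toy data-flow sketch of steps 1)-10); all stage bodies are placeholders."""
from typing import Dict, List, Tuple
import torch

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

def build_bipartite_graph(triples: List[Triple]) -> Dict[str, List[str]]:
    # Step 1): each relation becomes its own node; a <GLOBAL> node observes the whole graph.
    adj: Dict[str, List[str]] = {"<GLOBAL>": []}
    for h, r, t in triples:
        rel = f"rel:{r}"
        for n in (h, rel, t):
            adj.setdefault(n, [])
        adj[h].append(rel)
        adj[rel] += [h, t]
        adj[t].append(rel)
    adj["<GLOBAL>"] = [n for n in adj if n != "<GLOBAL>"]
    return adj

def embed_nodes(adj: Dict[str, List[str]], dim: int = 8):
    # Stand-in for the graph attention embedding of step 1).
    return {n: torch.randn(dim) for n in adj}

def grouping_stage(triples: List[Triple], group_size: int = 2):
    # Stand-in for steps 2)-4): here simply fixed-size groups instead of learned grouping.
    return [triples[i:i + group_size] for i in range(0, len(triples), group_size)]

def static_planning(groups):
    # Stand-in for step 5): list the entities in group order.
    return [e for g in groups for (h, _, t) in g for e in (h, t)]

def generate(sp: List[str]) -> str:
    # Stand-in for steps 6)-9): just verbalise the plan.
    return " ; ".join(sp)

triples = [("Ada Lovelace", "field", "mathematics"), ("Ada Lovelace", "born", "London")]
adj = build_bipartite_graph(triples)
emb = embed_nodes(adj)
groups = grouping_stage(triples)
sp = static_planning(groups)
print(generate(sp))  # step 10) (three-level reconstruction) is a training-time loss, omitted here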
Further, in step 1) the data are expressed in the form of a table or a knowledge graph, wherein in the table the structured data exist in the form of records and in the knowledge graph they exist in the form of triples;
the knowledge graph serves as structured input data, and each triple is formed by a head entity, a relation and a tail entity; the obtained data are converted into a bipartite graph, i.e. the relation in each triple is represented as a node, and a global node is added to capture the structural information of the whole graph; all nodes are embedded using a graph attention mechanism.
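The description above does not fix the internals of the graph attention embedding, so the following is a minimal single-head graph-attention layer over the bipartite adjacency, written only as an illustrative stand-in; the class name TinyGAT, the shapes and the toy adjacency matrix are assumptions.

```python
# Minimal single-head graph-attention embedding over the bipartite graph.
# Assumed inputs: node feature matrix x (N x d_in) and adjacency matrix adj (N x N)
# that already contains the relation nodes and the extra global node.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGAT(nn.Module):
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.proj = nn.Linear(d_in, d_out, bias=False)
        self.attn = nn.Linear(2 * d_out, 1, bias=False)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = self.proj(x)                                        # (N, d_out)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = F.leaky_relu(self.attn(pairs).squeeze(-1))     # (N, N) pairwise scores
        scores = scores.masked_fill(adj == 0, float("-inf"))    # attend only to neighbours
        alpha = torch.softmax(scores, dim=-1)
        return torch.relu(alpha @ h)                            # updated node representations

# toy usage: 4 nodes (head, relation, tail, global), global node connected to all others
adj = torch.tensor([[0, 1, 0, 1],
                    [1, 0, 1, 1],
                    [0, 1, 0, 1],
                    [1, 1, 1, 0]], dtype=torch.float)
x = torch.randn(4, 16)
emb = TinyGAT(16, 32)(x, adj)   # emb[3] plays the role of V_global
print(emb.shape)                # torch.Size([4, 32])
```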
Further, the SP entity sequence is encoded by the bidirectional gated recurrent network Bi-GRU to obtain the hidden entity representations e_1, …, e_n of the SP sequence, fusing the order information of SP into the entity embeddings;
the hidden state d_{t-1} of the decoding recurrent neural network RNN of the generation stage is used to update each entity memory u_{t,k} in the memory network, where t denotes the t-th time step and k the k-th entity; specifically:
u_{-1,k} = W·e_k    (5)
γ_t = softmax(W·d_{t-1} + b_γ)    (6)
δ_{t,k} = γ_t ⊙ softmax(W·d_{t-1} + b_d + W·u_{t-1,k} + b_u)    (7)
ũ_{t,k} = W·d_{t-1} + b    (8)
u_{t,k} = (1 - δ_{t,k}) ⊙ u_{t-1,k} + δ_{t,k} ⊙ ũ_{t,k}    (9)
ψ_{t,k} = softmax_k(d_{t-1}·W·u_{t,k})    (10)
First, equation (5) initializes the memory of each entity with the entity representation e_k, denoted u_{-1,k}. In equation (6), γ_t is a gate that decides whether to modify, determined by the hidden state d_{t-1} of the previous time step of the generation stage, i.e. by the information of the already generated text. In equation (7), δ_{t,k} indicates the extent to which modification is required, determined by d_{t-1} and the entity memory u_{t-1,k}. In equation (8), ũ_{t,k} denotes the content of the modification to the entity memory. Finally, equation (9) updates the entity memory u_{t-1,k} of the previous moment with the modification content ũ_{t,k} to obtain the entity memory u_{t,k} of the current time step; equation (10) applies an attention mechanism to u_{t,k} and d_{t-1} to obtain the attention weight ψ_{t,k} of the memory module. W, b_γ, b_d, b_u are parameters.
Further, in step 8) the attention scores a_1, …, a_n are obtained by applying an attention mechanism between the hidden state d_{t-1} of the generation-stage recurrent neural network and e_1, …, e_n, and the attention score a_{t,k} is multiplied with the corresponding entity hidden representation to obtain the entity context vector S_{t,k}; specifically:
a_{t,k} = softmax_k(d_{t-1}·W·e_k)    (11)
S_{t,k} = a_{t,k}·e_k    (12)
Equation (11) applies an attention mechanism between the hidden state d_{t-1} of the generation-stage recurrent neural network and the hidden entity representations e_1, …, e_n to obtain the attention scores a_1, …, a_n; equation (12) multiplies the attention score a_{t,k} with the entity hidden representation e_k to obtain the entity context vector S_{t,k}, where t denotes the t-th time step and k the k-th entity;
q_t = Σ_k ψ_{t,k}·S_{t,k}    (13)
Equation (13) sums the entity context vectors S_{t,k} weighted by the memory-module attention weights ψ_{t,k} to obtain the context vector q_t of the current time step t, which serves as the input to the pointer generation network.
The beneficial effects are that: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
the invention relates to a text generation method which combines deep reinforcement learning and is realized while planning, and is used for the condition that a readable text is required to be automatically output given structured input data. In planning, not only the importance of the input data per se is considered, but also the information of the generated text and the memory of the past physical changes are considered, and fine granularity grouping is further carried out, so that the generated planning is consistent with the golden planning as much as possible.
Meanwhile, three-level reconstruction is adopted to capture the essential characteristics between input and output at different levels. The static plan is reconstructed from the generated text so that the generated text stays essentially consistent with the static plan, ensuring that the dynamic plan only fine-tunes the static plan according to the generated information; the order of the selected entities within the groups is reconstructed from the static planning sequence to ensure that the static plan still preserves the inter-group order; and the decoding result of the pointer generation decoder is restored to a bipartite graph representation, a reconstruction at the level of vector representations that makes the final decoding result reflect the essential features of the input.
Drawings
FIG. 1 is a flow chart of a text generation algorithm of the present invention;
FIG. 2 is a block diagram of a text generation algorithm of the present invention;
FIG. 3 is a flow chart of the grouping stage;
FIG. 4 is a block diagram of the grouping stage;
Fig. 5 is a diagram of the sub-graph selection process in the grouping stage.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
The text generation algorithm based on mixed grouping ordering and dynamic entity memory planning according to the invention is further described in detail with reference to the flow chart and an implementation case.
The text generation algorithm is improved by adopting mixed grouping ordering and dynamic entity memory planning, so that the planning performance is improved, and a coherent text is generated. The flow of the method is shown in fig. 1, the algorithm structure is shown in fig. 2, and the method comprises the following steps:
Step 10: the input structured data are converted into a bipartite graph representation, i.e. the relation in each triple is represented as a node and a global node is added to capture the structural information of the whole graph, and the nodes are embedded through a graph attention mechanism (GAT).
Step 20, inter-group ordering is performed by the grouping stage. The grouping phase comprises two modules: the length control module is used for controlling the selection space of the subgraph according to the length, and the subgraph observation module is used for selecting the subgraph by combining the generated subgraph sequence information in the selection space designated by the length control.
Step 30: the static planning stage generates the entity sequence SP, achieving order both within and between groups, as shown in formulas (1)-(4). The static planning stage adopts a recurrent neural network and aims to generate a node sequence that plans in advance the content and order of the text to be generated.
If LC = -1 then d_0 = V_global    (1)
The return value LC of the length control module ranges over [-1, Max Length], where Max Length is the total number of triples in the input structured data; LC determines the selection space of the sub-graph at the next time step. As shown in formula (1), when the return value LC of the length control module in the grouping stage is -1, the grouping stage ends, the static planning stage begins, and the global node representation V_global is used as the initialization state of the recurrent neural network.
gate_z = σ(W·[x_{k,z} ; d_{t-1}] + b)    (2)
c_{k,z} = gate_z ⊙ x_{k,z}    (3)
P(node_{k,z}) = softmax_z(W·c_{k,z} + b)    (4)
The selection space of the nodes is limited by the group: in the content planning stage only the nodes of the current group, or the special mark <EOG>, may be selected, where <EOG> indicates that the sub-graph currently serving as the selection space has been exhausted. Equation (2) computes the gate gate_z from the representation x_{k,z} of the z-th node of the k-th sub-graph G_k and the hidden state d_{t-1} of the previous time step of the recurrent network; gate_z measures the degree of association between the node itself and the static plan, where z denotes the z-th node. As shown in formula (3), multiplying gate_z with the node representation x_{k,z} yields the node context representation c_{k,z}, which judges the importance of the node in conjunction with the already generated SP sequence. Finally, formula (4) computes from c_{k,z} the probability of selecting node_{k,z} at the current step, i.e. it measures the relevance between each node in the sub-graph and the node sequence generated so far; a node is then selected, and at each time step the recurrent neural network takes the representation of the node selected in the previous step as its input.
When the SP element generated at the previous time step is the special symbol <EOG>, i.e. the sub-graph G_k currently serving as the selection space has been exhausted, the next sub-graph G_{k+1} is selected according to the sub-graph sequence generated in the grouping stage, and the vector representation of the previous sub-graph is input to the recurrent neural network; the vector representation of a sub-graph is obtained by average pooling over all of its node representations. The operations of formulas (1)-(4) are then repeated for sub-graph G_{k+1}. When the traversal of the sub-graph sequence obtained in the grouping stage is complete, the static content planning stage has obtained the final SP entity sequence.
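One selection step of the static planning stage, following formulas (1)-(4), might be sketched as below; the concrete parameterisation of the gate and of the scoring layer, and the use of a GRU cell for the planning RNN, are assumptions.

```python
# One decoding step of the static content planning stage (formulas (1)-(4)).
# node_repr: (m, d) representations of the m nodes of the current sub-graph G_k;
# d_prev: previous hidden state of the planning RNN (d_0 = V_global, formula (1)).
import torch
import torch.nn as nn

class PlanStep(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.rnn = nn.GRUCell(d, d)
        self.gate = nn.Linear(2 * d, d)      # formula (2): gate from node repr and d_{t-1}
        self.score = nn.Linear(d, 1)         # formula (4): selection score from the context

    def forward(self, node_repr: torch.Tensor, d_prev: torch.Tensor):
        m = node_repr.size(0)
        gate = torch.sigmoid(self.gate(
            torch.cat([node_repr, d_prev.expand(m, -1)], dim=-1)))        # (m, d)
        context = gate * node_repr                                        # formula (3)
        probs = torch.softmax(self.score(context).squeeze(-1), dim=0)     # formula (4)
        picked = int(probs.argmax())
        d_next = self.rnn(node_repr[picked].unsqueeze(0),
                          d_prev.unsqueeze(0)).squeeze(0)  # selected node feeds the RNN
        return picked, probs, d_next

d = 32
step = PlanStep(d)
v_global = torch.randn(d)              # formula (1): d_0 = V_global when LC = -1
nodes_of_Gk = torch.randn(5, d)        # current sub-graph (an <EOG> row could be appended)
picked, probs, d1 = step(nodes_of_Gk, v_global)
print(picked, probs.shape)
```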
Step 40: the SP entity sequence is encoded by the bidirectional gated recurrent network Bi-GRU to obtain the hidden entity representations, (e_1, e_2, …, e_n) = Bi-GRU(SP_1, SP_2, …, SP_n), fusing the order information of SP into the entity embeddings.
Step 50: the information of the already generated text is combined with the entity memories stored in the entity memory network to obtain the context vector q_t, as shown in formulas (5)-(13).
u_{-1,k} = W·e_k    (5)
γ_t = softmax(W·d_{t-1} + b_γ)    (6)
δ_{t,k} = γ_t ⊙ softmax(W·d_{t-1} + b_d + W·u_{t-1,k} + b_u)    (7)
ũ_{t,k} = W·d_{t-1} + b    (8)
u_{t,k} = (1 - δ_{t,k}) ⊙ u_{t-1,k} + δ_{t,k} ⊙ ũ_{t,k}    (9)
ψ_{t,k} = softmax_k(d_{t-1}·W·u_{t,k})    (10)
Further, the hidden state d_{t-1} of the decoding recurrent neural network RNN of the generation stage is used to update each entity memory u_{t,k} in the memory network, where t denotes the t-th time step and k the k-th entity. First, equation (5) initializes the memory of each entity with the entity representation e_k, denoted u_{-1,k}. In equation (6), γ_t is a gate that decides whether to modify, determined by the hidden state d_{t-1} of the previous time step of the generation stage, i.e. by the information of the already generated text. In equation (7), δ_{t,k} indicates the extent to which modification is required, determined by d_{t-1} and the entity memory u_{t-1,k}. In equation (8), ũ_{t,k} denotes the content of the modification to the entity memory. Finally, equation (9) updates the entity memory u_{t-1,k} of the previous moment with the modification content ũ_{t,k} to obtain the entity memory u_{t,k} of the current time step. Equation (10) applies an attention mechanism to u_{t,k} and d_{t-1} to obtain the attention weight ψ_{t,k} of the memory module.
a_{t,k} = softmax_k(d_{t-1}·W·e_k)    (11)
S_{t,k} = a_{t,k}·e_k    (12)
Further, equation (11) applies an attention mechanism between the hidden state d_{t-1} of the generation-stage recurrent neural network and the hidden entity representations e_1, …, e_n to obtain the attention scores a_1, …, a_n; equation (12) multiplies the attention score a_{t,k} with the entity hidden representation e_k to obtain the entity context vector S_{t,k}, where t denotes the t-th time step and k the k-th entity.
q_t = Σ_k ψ_{t,k}·S_{t,k}    (13)
Equation (13) sums the entity context vectors S_{t,k} weighted by the memory-module attention weights ψ_{t,k} to obtain the context vector q_t of the current time step t, which serves as the input to the pointer generation network.
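The entity memory and attention computations of formulas (5)-(13) can be sketched as follows; the concrete forms chosen for (8), (10) and (11) are assumptions reconstructed from the surrounding description, and all layer names and shapes are illustrative.

```python
# Entity memory update and memory-weighted context (formulas (5)-(13)).
# e: (n, d) hidden entity representations of the SP sequence from the Bi-GRU;
# d_prev: (d,) hidden state d_{t-1} of the generation-stage RNN.
import torch
import torch.nn as nn

class EntityMemory(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.init = nn.Linear(d, d, bias=False)   # formula (5)
        self.w_gamma = nn.Linear(d, d)            # formula (6)
        self.w_d = nn.Linear(d, d)                # formula (7)
        self.w_u = nn.Linear(d, d)
        self.w_mod = nn.Linear(d, d)              # formula (8), assumed form
        self.w_psi = nn.Linear(d, d, bias=False)  # formula (10), bilinear attention, assumed
        self.w_att = nn.Linear(d, d, bias=False)  # formula (11), assumed

    def initialise(self, e: torch.Tensor) -> torch.Tensor:
        return self.init(e)                                         # u_{-1,k} = W·e_k

    def step(self, u_prev: torch.Tensor, e: torch.Tensor, d_prev: torch.Tensor):
        gamma = torch.softmax(self.w_gamma(d_prev), dim=-1)         # formula (6)
        delta = gamma * torch.softmax(self.w_d(d_prev) + self.w_u(u_prev), dim=-1)  # (7)
        u_tilde = self.w_mod(d_prev)                                # formula (8)
        u = (1 - delta) * u_prev + delta * u_tilde                  # formula (9)
        psi = torch.softmax(self.w_psi(u) @ d_prev, dim=0)          # formula (10): memory weights
        a = torch.softmax(self.w_att(e) @ d_prev, dim=0)            # formula (11): scores over e_1..e_n
        S = a.unsqueeze(-1) * e                                     # formula (12)
        q = (psi.unsqueeze(-1) * S).sum(dim=0)                      # formula (13): context q_t
        return u, q

d, n = 32, 6
sp_emb = torch.randn(1, n, d)
bigru = nn.GRU(d, d // 2, bidirectional=True, batch_first=True)
e, _ = bigru(sp_emb)                        # step 40: e_1..e_n, shape (1, n, d)
e = e.squeeze(0)
mem = EntityMemory(d)
u = mem.initialise(e)
u, q_t = mem.step(u, e, torch.randn(d))     # one generation time step
print(q_t.shape)                            # torch.Size([32])
```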
Step 60: q_t serves as an input to the pointer generation network, which is enhanced with a graph structure enhancement mechanism (Graph Structure Enhancement Mechanism) to generate the text, as shown in formulas (14)-(19).
d̃_t = tanh(W·[d_t ; q_t])    (14)
Equation (14) computes the context vector d̃_t of the generation hidden state from the hidden state d_t of the generation module and the context vector q_t.
P_gen = softmax(W·d̃_t + b)    (15)
P_copy(k) = ψ_{t,k}    (16)
P̃_copy = GSE(P_copy, Bipartite)    (17)
Equation (15) projects the context vector d̃_t of the generation-stage hidden state into a probability distribution of the same length as the vocabulary. Equation (16) takes the attention weight over an entity in the memory network as its copy probability. Equation (17) adopts the existing graph structure enhancement mechanism (Graph Structure Enhancement Mechanism), denoted GSE, to enhance the copy probability given by the pointer generation network by means of the structure of the graph.
θ = Sigmoid(W·d_t + b_d)    (18)
P_final = θ·P̃_copy + (1 - θ)·P_gen    (19)
A soft switch combines the generation probability and the copy probability. Equation (18) computes θ, the probability used to select between the copy and generate modes. Equation (19) softly combines the generation probability and the graph-structure-enhanced copy probability to obtain the final probability distribution.
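A sketch of the copy/generate combination of formulas (14)-(19) follows; the graph structure enhancement of (17) is represented only by an optional reweighting placeholder, the direction of the θ gate in (19) is an assumption, and the mapping of plan entities to vocabulary ids is illustrative.

```python
# Soft combination of generate and copy distributions (formulas (14)-(19)).
# d_t: decoder hidden state; q_t: context vector from the entity memory module;
# psi: (n,) copy weights over the n plan entities; entity_to_vocab: (n,) vocab ids.
import torch
import torch.nn as nn

class CopyGenerate(nn.Module):
    def __init__(self, d: int, vocab: int):
        super().__init__()
        self.fuse = nn.Linear(2 * d, d)       # formula (14)
        self.vocab = nn.Linear(d, vocab)      # formula (15)
        self.switch = nn.Linear(d, 1)         # formula (18)

    def forward(self, d_t, q_t, psi, entity_to_vocab, adj_boost=None):
        d_tilde = torch.tanh(self.fuse(torch.cat([d_t, q_t], dim=-1)))    # formula (14)
        p_gen = torch.softmax(self.vocab(d_tilde), dim=-1)                # formula (15)
        p_copy = psi                                                      # formula (16)
        if adj_boost is not None:                                         # formula (17):
            p_copy = p_copy * adj_boost                                   # placeholder for the
            p_copy = p_copy / p_copy.sum()                                # graph enhancement
        theta = torch.sigmoid(self.switch(d_t))                           # formula (18)
        p_final = (1 - theta) * p_gen                                     # formula (19)
        p_final = p_final.index_add(0, entity_to_vocab, theta * p_copy)   # scatter copy mass
        return p_final

d, vocab, n = 32, 100, 6
model = CopyGenerate(d, vocab)
p = model(torch.randn(d), torch.randn(d),
          torch.softmax(torch.randn(n), dim=0),
          torch.randint(0, vocab, (n,)))
print(p.shape, float(p.sum()))   # distribution over the vocabulary, sums to 1
```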
The model is constructed as a pipeline, divided into a grouping stage, a static planning stage, an entity memory stage and a pointer generation decoding stage. The static planning gold standard is obtained by running an existing information extraction system and comparing the reference text in the sample with the input structured data. Since dynamic content planning has no explicit gold standard, the memory module parameters are updated through the generation loss function. The reference text in the sample and the obtained static planning gold standard are compared with the generated text and the generated static plan to obtain the loss functions.
L_gen = -Σ_{t=1}^{T} log P(ŷ_t = y*_t)    (20)
L_lm = L_gen + γ·P̄,  P̄ = (1/T)·Σ_{t=1}^{T} P(ŷ_t = y*_t)    (21)
Equation (20) is the negative log-likelihood loss of the generated text, which makes the generated text agree as closely as possible with the reference text given in the sample, where y*_t denotes the reference token and t the t-th time step. Equation (21) adds a regularization term to the loss function, where T denotes the length of the generated text, i.e. the total number of time steps, P̄ is the mean of the per-step probabilities, and γ is a hyperparameter.
L_sp = -Σ_{z=1}^{|SP|} log P(SP_z = node*_z)    (22)
Equation (22) is the negative log-likelihood of generating the SP sequence, which maximizes the probability of generating the static planning gold standard; |SP| denotes the sequence length of the static plan and node*_z denotes the z-th node of the static content planning gold standard.
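The generation and planning losses of formulas (20)-(22) might be computed as below; how the probability-mean regulariser of (21) enters the loss is an assumption.

```python
# Training losses for text generation and static planning (formulas (20)-(22)).
# probs: (T, vocab) predicted distributions; targets: (T,) reference token ids;
# sp_probs: (|SP|, num_nodes) node distributions; sp_targets: gold plan node ids.
import torch

def generation_loss(probs, targets, gamma=0.1):
    p_ref = probs[torch.arange(targets.size(0)), targets]      # P(y*_t) at each step
    nll = -torch.log(p_ref + 1e-12).sum()                      # formula (20)
    p_bar = p_ref.mean()                                       # formula (21): mean step probability
    return nll + gamma * p_bar                                 # regularised loss (assumed combination)

def planning_loss(sp_probs, sp_targets):
    p_ref = sp_probs[torch.arange(sp_targets.size(0)), sp_targets]
    return -torch.log(p_ref + 1e-12).sum()                     # formula (22)

T, V, S, N = 7, 50, 4, 10
l_lm = generation_loss(torch.softmax(torch.randn(T, V), -1), torch.randint(0, V, (T,)))
l_sp = planning_loss(torch.softmax(torch.randn(S, N), -1), torch.randint(0, N, (S,)))
print(float(l_lm), float(l_sp))
```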
Step 70: three-level reconstruction is adopted: the static plan is reconstructed from the generated translation text, the group sequence is reconstructed from the static planning sequence, and the decoding result of the pointer generation decoder is restored to a bipartite graph representation.
P_rec1(SP = node_z) = Softmax(W·h_t + b)    (23)
Further, the first level of reconstruction uses a recurrent neural network to reconstruct the static plan from the translation text, i.e. the static plan SP is extracted from the embedded representations of the decoded vocabulary. The hidden state vector of the pointer generation decoder at its last time step initializes the hidden state of this recurrent neural network, defined as h_0. A context vector of the translated text is computed by applying an attention mechanism between the hidden state and all vocabulary embedded representations of the pointer generation decoder, and serves as the input of the recurrent neural network. Equation (23) computes the probability of selecting a node, where h_t is the hidden state output by the recurrent network at time t and W, b are parameters. Thus, the loss function of the reconstruction from generated text to static plan can be defined as:
L_rec1 = -Σ_{z=1}^{|SP|} log P_rec1(SP = node_z) + γ·P̄_rec1    (24)
P̄_rec1 = (1/|SP|)·Σ_{z=1}^{|SP|} P_rec1(SP = node_z)    (25)
where |SP| denotes the sequence length of the generated static plan, node_z denotes the z-th node in the SP entity sequence generated by the static planning stage, and P̄_rec1, the mean of the probabilities over the time steps of the reconstruction-1 stage, is used for regularization of the loss function, with γ a hyperparameter. The loss function L_rec1 aims to extract the preceding static plan as faithfully as possible from the generated descriptive text.
Further, the second level of reconstruction recovers, from the static planning sequence, the order in which the selected entities appear in the groups, i.e. it reverts from the static plan to the group sequence numbers, with the aim of preserving the inter-group order established in the grouping stage. The second-level reconstruction adopts a structure similar to the first level: the static planning sequence is fed in through an attention mechanism, and a recurrent neural network generates the corresponding group sequence. The loss function of the second-level reconstruction can be defined as:
L_rec2 = -Σ_{k=1}^{|G|} log P_rec2(G_k) + γ·P̄_rec2    (26)
where |G| denotes the length of the group sequence, G_k denotes the k-th sub-graph in the sub-graph sequence generated by the grouping stage, and P̄_rec2, the mean of the probabilities over the time steps of the reconstruction-2 stage, is computed in the same way as in formula (25), with γ a hyperparameter.
Further, the first- and second-level reconstructions, i.e. reconstructing the static plan from the generated text and reconstructing the group sequence from the static plan, are index-based reconstructions. The third-level reconstruction, i.e. restoring the decoding result of the pointer generation network to the bipartite graph representation, is based on vector representations.
L_rec3 = KL((m_1, m_2, …, m_|V|) ‖ GAT_CUBE(Bipartite))    (27)
In equation (27), the decoding result of the pointer generation network is encoded and decoded again to obtain the codes m_1, m_2, …, m_|V| of the 1st to |V|-th nodes; L_rec3 requires the decoded result to be consistent with the embedded representation of the bipartite graph after graph attention (GAT) encoding. The reconstruction is performed at the level of vector representations, so the KL divergence is used as the loss function.
L_TOTAL = λ_1·L_sp + λ_2·L_lm + λ_3·L_rec1 + λ_4·L_rec2 + λ_5·L_rec3    (28)
Finally, the model loss function is defined as equation (28), a combination of the static planning loss, the text generation loss and the three reconstruction losses; λ_1 to λ_5 are hyperparameters.
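The three reconstruction losses and the total objective of formulas (23)-(28) can be sketched as follows; the per-node distributions compared in (27) and the λ values are illustrative assumptions.

```python
# Three-level reconstruction losses and the total objective (formulas (23)-(28)).
# rec1/rec2 follow the same negative log-likelihood pattern as the planning loss;
# rec3 compares re-encoded decoder outputs with the GAT embedding via KL divergence.
import torch
import torch.nn.functional as F

def rec_nll(probs, targets, gamma=0.1):                # pattern of formulas (24)-(26)
    p_ref = probs[torch.arange(targets.size(0)), targets]
    return -torch.log(p_ref + 1e-12).sum() + gamma * p_ref.mean()

def rec3_kl(m, gat_emb):                               # formula (27)
    return F.kl_div(F.log_softmax(m, dim=-1),
                    F.softmax(gat_emb, dim=-1), reduction="batchmean")

def total_loss(l_sp, l_lm, l_rec1, l_rec2, l_rec3,
               lambdas=(1.0, 1.0, 0.5, 0.5, 0.5)):     # formula (28)
    terms = (l_sp, l_lm, l_rec1, l_rec2, l_rec3)
    return sum(lam * t for lam, t in zip(lambdas, terms))

l_rec1 = rec_nll(torch.softmax(torch.randn(4, 10), -1), torch.randint(0, 10, (4,)))
l_rec2 = rec_nll(torch.softmax(torch.randn(3, 6), -1), torch.randint(0, 6, (3,)))
l_rec3 = rec3_kl(torch.randn(9, 32), torch.randn(9, 32))
print(float(total_loss(torch.tensor(1.0), torch.tensor(2.0), l_rec1, l_rec2, l_rec3)))
```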
As shown in fig. 3, the grouping stage proceeds as follows:
step 101, the Length control module selects the Length LC of the sub-image to be generated by combining the information of the generated sub-image sequence, the range of the LC is [ -1, max Length]Max Length is the total number of triples in the input structure data and is used for determining the space for selecting the subgraph in the next time step. The method comprises the steps of carrying out a first treatment on the surface of the If LC is-1, the packet phase terminates.
G_{t-1} denotes the representation of the sub-graph selected in the previous step and is used to update the length control memory vector L.
L_t = γ·L_{t-1} + (1 - γ)·(W·G_{t-1} + b)    (29)
P_LC = Softmax(W_LC·L_t + b_LC)    (30)
Formula (29) updates the length control memory vector L according to the generated sequence; formula (30) projects the vector L into a probability distribution of length Max Length + 1, and the number of triples, defined as LC, is selected according to this distribution. The sub-graph length limits the selection space for generating the sub-graph at the current step: sub-graph observation may only choose among sub-graphs containing LC triples. Here γ is a hyperparameter and t denotes the t-th time step.
V_global ← GAT(Bipartite)    (31)
LC = Sigmoid(W·V_global + b_LC)    (32)
As shown in the block diagram of fig. 4, the first time step lacks generated sub-graph sequence information, so formula (31) initializes with the global node of the bipartite graph.
LC_onehot = OneHot(LC)    (33)
As shown in fig. 5, the set of all possible sub-graphs can be organized as a three-dimensional tensor, called the Cube. Formula (33) converts the selected LC into the one-hot vector LC_onehot; selecting the sub-graph space of a given length can be understood intuitively, following fig. 5, as selecting the corresponding page of the Cube.
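A sketch of the length control module following formulas (29)-(33) is given below; the convex-combination update assumed for (29) and the encoding of LC = -1 as slot 0 are assumptions, and the Cube tensor is a toy stand-in.

```python
# Length control module of the grouping stage (formulas (29)-(33)).
# L_prev: length-control memory vector; g_prev: representation of the previously
# selected sub-graph; "cube" stacks, per length, the candidate sub-graph representations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LengthControl(nn.Module):
    def __init__(self, d: int, max_length: int, gamma: float = 0.5):
        super().__init__()
        self.gamma = gamma
        self.update = nn.Linear(d, d)                    # formula (29), assumed form
        self.to_lengths = nn.Linear(d, max_length + 1)   # formula (30): slot 0 encodes LC = -1
        self.max_length = max_length

    def forward(self, L_prev, g_prev):
        L_t = self.gamma * L_prev + (1 - self.gamma) * self.update(g_prev)  # formula (29)
        p_lc = torch.softmax(self.to_lengths(L_t), dim=-1)                  # formula (30)
        idx = int(p_lc.argmax())
        lc = -1 if idx == 0 else idx
        return L_t, p_lc, lc

d, max_len, n_candidates = 32, 4, 3
ctrl = LengthControl(d, max_len)
cube = torch.randn(max_len, n_candidates, d)   # formula (33): pages indexed by sub-graph length
v_global = torch.randn(d)                      # formulas (31)-(32): first step starts from V_global
L_t, p_lc, lc = ctrl(torch.zeros(d), v_global)
if lc == -1:
    print("grouping stage ends")               # hand over to the static planning stage
else:
    onehot = F.one_hot(torch.tensor(lc - 1), max_len).float()   # formula (33)
    page = (onehot.view(-1, 1, 1) * cube).sum(dim=0)            # candidates with LC triples
    print(lc, page.shape)
```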
Step 102: the selection space is controlled according to the generated sub-graph length LC; sub-graph observation then applies an attention mechanism between the representation of each candidate sub-graph and the sub-graph and node information of the already generated sub-graph sequence, producing the probability of selecting each sub-graph.
g_{t,i} = (1/|G_i|)·Σ_j x_{t,i,j}    (34)
Formula (34) average-pools the node representations in a sub-graph to obtain the sub-graph representation; since the node representations are updated at every moment, the sub-graph representations differ over time. The subscript t denotes the t-th time step, i the i-th sub-graph, and j the j-th node in the sub-graph. Attention mechanisms are then applied at the sub-graph level and at the node level respectively.
β_{t,i,k} = softmax_k(g_{t,i}·W·G_k)    (35)
c^G_{t,i} = tanh(W·[g_{t,i} ; Σ_k β_{t,i,k}·G_k])    (36)
η_{t,i,z} = softmax_z(c^G_{t,i}·W·x_{k,z})    (37)
c^N_{t,i} = tanh(W·[c^G_{t,i} ; Σ_z η_{t,i,z}·x_{k,z}])    (38)
P(G_i) = Softmax_i(W_G·c^N_{t,i} + b_G)    (39)
Equation (35) computes attention scores between the candidate sub-graph and the previously selected sub-graphs, where g_{t,i} is the representation of candidate sub-graph i at time t and G_k is the embedded representation of the sub-graph selected at an earlier time k. Equation (36) fuses the candidate sub-graph's own information g_{t,i} with the previously selected sub-graph information G_k through the attention mechanism, completing the sub-graph-level attention and yielding the sub-graph context vector c^G_{t,i}. Equation (37) computes attention scores between the sub-graph context vector c^G_{t,i} and all nodes of the previously selected sub-graphs, where x_{k,z} denotes the z-th node of the selected sub-graph k. Equation (38) fuses the candidate sub-graph context vector with the information of all nodes in the previously selected sub-graphs through the attention mechanism, completing the node-level attention and yielding the node-level context vector c^N_{t,i}. Finally, formula (39) computes from the node context vector c^N_{t,i} the probability of selecting each sub-graph, where W_G and b_G are grouping-stage parameters. As shown in fig. 5, this can intuitively be understood as selecting a sub-graph from the Cube page of the corresponding length.
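The sub-graph observation scoring of formulas (34)-(39) might look as follows; the concatenate-and-tanh fusions used for (36) and (38) are assumed forms, and the shapes are illustrative.

```python
# Sub-graph observation scoring (formulas (34)-(39)). cand_nodes: node representations
# of one candidate sub-graph; prev_graphs / prev_nodes: representations of the already
# selected sub-graphs and of their nodes.
import torch
import torch.nn as nn

class SubGraphObserver(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.att_g = nn.Linear(d, d, bias=False)   # formula (35)
        self.fuse_g = nn.Linear(2 * d, d)          # formula (36)
        self.att_n = nn.Linear(d, d, bias=False)   # formula (37)
        self.fuse_n = nn.Linear(2 * d, d)          # formula (38)
        self.score = nn.Linear(d, 1)               # formula (39): W_G, b_G

    def forward(self, cand_nodes, prev_graphs, prev_nodes):
        g = cand_nodes.mean(dim=0)                                           # formula (34)
        a_g = torch.softmax(self.att_g(prev_graphs) @ g, dim=0)              # formula (35)
        c_g = torch.tanh(self.fuse_g(
            torch.cat([g, (a_g.unsqueeze(-1) * prev_graphs).sum(0)])))       # formula (36)
        a_n = torch.softmax(self.att_n(prev_nodes) @ c_g, dim=0)             # formula (37)
        c_n = torch.tanh(self.fuse_n(
            torch.cat([c_g, (a_n.unsqueeze(-1) * prev_nodes).sum(0)])))      # formula (38)
        return self.score(c_n)                         # unnormalised score for formula (39)

d = 32
obs = SubGraphObserver(d)
candidates = [torch.randn(3, d), torch.randn(3, d)]    # sub-graphs of the chosen length LC
prev_graphs, prev_nodes = torch.randn(2, d), torch.randn(7, d)
scores = torch.stack([obs(c, prev_graphs, prev_nodes).squeeze(-1) for c in candidates])
print(torch.softmax(scores, dim=0))                    # formula (39): probability of each sub-graph
```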
Step 103, updating node representations in all sub-graphs after selecting a certain sub-graph, namely updating the representations of all sub-graphs.
u = σ(W_update·G_t + b_update)    (40)
gate_v = σ(W_gate·[x_{t-1,v} ; G_t] + b_gate)    (41)
x_{t,v} = gate_v ⊙ x_{t-1,v} + (1 - gate_v) ⊙ u    (42)
After each sub-graph selection step, the representations of all nodes are updated with the information of the selected sub-graph, so that even if a sub-graph is selected repeatedly its representation differs. Formula (40) computes the update content u from the representation of the sub-graph G_t selected in the previous step, where W_update and b_update are update parameters. Formula (41) computes a gate that balances the retention of previous information against the update information from the newly added sub-graph, where W_gate and b_gate are gating parameters. Formula (42) updates the representation of each node, where t denotes the t-th time step and v the v-th node, so the node representations differ at every time step.
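The node update of formulas (40)-(42) can be sketched as below; the gate input and the interpolation form are assumptions consistent with the description.

```python
# Node representation update after a sub-graph is selected (formulas (40)-(42)).
# x: (N, d) all node representations; g_t: representation of the selected sub-graph G_t.
import torch
import torch.nn as nn

class NodeUpdater(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.w_update = nn.Linear(d, d)        # formula (40): W_update, b_update
        self.w_gate = nn.Linear(2 * d, d)      # formula (41): W_gate, b_gate

    def forward(self, x: torch.Tensor, g_t: torch.Tensor) -> torch.Tensor:
        u = torch.sigmoid(self.w_update(g_t))                              # formula (40)
        gate = torch.sigmoid(self.w_gate(
            torch.cat([x, g_t.expand(x.size(0), -1)], dim=-1)))            # formula (41)
        return gate * x + (1 - gate) * u                                   # formula (42)

x = torch.randn(8, 32)
g_t = torch.randn(32)
x_new = NodeUpdater(32)(x, g_t)
print(x_new.shape)   # every node representation changes after each selection
```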
Step 104, repeating the operations from step 101 to step 103 after updating all node representations until the LC value generated in step 101 is-1, and ending the grouping stage to obtain a final sub-graph sequence.
The grouping stage cannot extract a gold standard from the samples. Therefore, the whole model is first warmed up: in the initial stage the model is unfamiliar with the data, and the weight distribution is continuously corrected by training with a smaller learning rate. Once the model is sufficiently familiar with the data, all parameters of the subsequent modules are fixed and the weights of the grouping stage are adjusted using the loss functions of the subsequent modules. Method 1: fix the parameters of the static planning module, feed the grouping results of the grouping stage into the static planning stage to generate SP, and compare it with the golden static plan given by the dataset to obtain a loss function. Method 2: fix all parameters from the static planning module to the pointer generation network module, feed in the grouping result, output the generated text, and compare it with the reference text given by the sample to obtain a loss function. The parameters of the grouping stage are updated through methods 1 and 2.

Claims (4)

1. A text generation method based on mixed grouping ordering and dynamic entity memory planning is characterized in that: the method comprises the following steps:
step 1) taking a structured data set for which a corresponding text needs to be generated as the model input, wherein the data are expressed in the form of a table or a knowledge graph; converting the obtained data into a bipartite graph and embedding it using a graph attention mechanism;
step 2) grouping and ordering the data vectors obtained in step 1) through the grouping stage; the grouping stage comprises two modules: a length control module and a sub-graph observation module; the length control module acts at each generation step, combining the information of the already generated sub-graph sequence and mapping it to a probability distribution, from which the number of triples LC of the sub-graph to be generated at the next time step is selected, so that only sub-graphs containing LC triples may be chosen at that time step; if LC = -1 is selected, the grouping stage ends and the method proceeds to step 5);
step 3) using the sub-graph length LC obtained in step 2) to control the selection space of the sub-graphs; the sub-graph observation mechanism obtains the representation of each candidate sub-graph through a self-attention mechanism over all nodes in the sub-graph, and applies an attention mechanism with the sub-graph and node information of the previously generated sub-graph sequence, thereby producing the probability of selecting each sub-graph;
step 4) selecting a sub-graph according to the probability distribution obtained in step 3), then updating the node representations in all sub-graphs with the hidden state of the current step of the recurrent neural network, i.e. updating the representations of all sub-graphs, and returning to step 2); if LC = -1 is selected in step 2), a final sub-graph sequence is obtained, in which each sub-graph is a subset of the input structured data set;
step 5) the static content planning stage selects and generates the entity sequence SP, using the global node representation V_global as the initialization state of the recurrent neural network; the selection space of each step is the corresponding sub-graph in the sequence of step 4); when the special sub-graph end mark <EOG> is generated, the next input of the recurrent neural network is the representation of the current sub-graph, and the next selection space is obtained according to the sub-graph sequence of step 4); when the sub-graph sequence has been fully traversed, the final static content planning (SP) entity sequence is obtained;
step 6) encoding the SP entity sequence obtained in step 5) with a bidirectional gated recurrent network to obtain the hidden entity representations e_1, …, e_n of the SP sequence, where n denotes the total number of entities in the SP sequence; passing the hidden SP representations to the generation stage and to the entity memory module;
step 7) the entity memory module stores the hidden SP entity representations as its initial memory content; the hidden state d_{t-1} of the generation-stage recurrent neural network is used to update the entity memory u_{t,k}, and the entity memory u_{t,k} is multiplied with d_{t-1} to obtain the memory weight ψ_{t,k}, where t denotes the t-th time step and k the k-th entity;
step 8) obtaining the attention scores a_1, …, a_n by applying an attention mechanism between the hidden state d_{t-1} of the generation-stage recurrent neural network and e_1, …, e_n, and multiplying the attention score a_{t,k} with the corresponding entity memory u_{t,k} to obtain the entity context vector S_{t,k};
step 9) summing the entity context vectors S_{t,k} weighted by the memory weights ψ_{t,k} to obtain the context vector q_t, which serves as the input of the pointer generation decoder; a graph structure enhancement mechanism is adopted to enhance the pointer decoder and generate the translation text corresponding to the structured data;
step 10) adopting three-level reconstruction so that the decoder fully acquires the information contained in the encoder: the static content plan SP is reconstructed from the translation text, the sub-graph sequence of the grouping stage is reconstructed from the static content planning sequence, and the decoding result of the pointer generation decoder is restored to a bipartite graph representation.
2. The text generation method according to claim 1, characterized in that: in step 1) the data are expressed in the form of a table or a knowledge graph, wherein in the table the structured data exist in the form of records and in the knowledge graph they exist in the form of triples;
the knowledge graph serves as structured input data, and each triple is formed by a head entity, a relation and a tail entity; the obtained data are converted into a bipartite graph, i.e. the relation in each triple is represented as a node, and a global node is added to capture the structural information of the whole graph; all nodes are embedded using a graph attention mechanism.
3. The text generation method according to claim 2, characterized in that: the SP entity sequence is encoded by the bidirectional gated recurrent network Bi-GRU to obtain the hidden entity representations e_1, …, e_n of the SP sequence, fusing the order information of SP into the entity embeddings;
the hidden state d_{t-1} of the decoding recurrent neural network RNN of the generation stage is used to update each entity memory u_{t,k} in the memory network, where t denotes the t-th time step and k the k-th entity; specifically:
u_{-1,k} = W·e_k    (5)
γ_t = softmax(W·d_{t-1} + b_γ)    (6)
δ_{t,k} = γ_t ⊙ softmax(W·d_{t-1} + b_d + W·u_{t-1,k} + b_u)    (7)
ũ_{t,k} = W·d_{t-1} + b    (8)
u_{t,k} = (1 - δ_{t,k}) ⊙ u_{t-1,k} + δ_{t,k} ⊙ ũ_{t,k}    (9)
ψ_{t,k} = softmax_k(d_{t-1}·W·u_{t,k})    (10)
First, equation (5) initializes the memory of each entity with the entity representation e_k, denoted u_{-1,k}. In equation (6), γ_t is a gate that decides whether to modify, determined by the hidden state d_{t-1} of the previous time step of the generation stage, i.e. by the information of the already generated text. In equation (7), δ_{t,k} indicates the extent to which modification is required, determined by d_{t-1} and the entity memory u_{t-1,k}. In equation (8), ũ_{t,k} denotes the content of the modification to the entity memory. Finally, equation (9) updates the entity memory u_{t-1,k} of the previous moment with the modification content ũ_{t,k} to obtain the entity memory u_{t,k} of the current time step; equation (10) applies an attention mechanism to u_{t,k} and d_{t-1} to obtain the attention weight ψ_{t,k} of the memory module. W, b_γ, b_d, b_u are parameters.
4. The text generation method according to claim 3, characterized in that: in step 8) the attention scores a_1, …, a_n are obtained by applying an attention mechanism between the hidden state d_{t-1} of the generation-stage recurrent neural network and e_1, …, e_n, and the attention score a_{t,k} is multiplied with the corresponding entity memory u_{t,k} to obtain the entity context vector S_{t,k}; specifically:
a_{t,k} = softmax_k(d_{t-1}·W·e_k)    (11)
S_{t,k} = a_{t,k}·e_k    (12)
Equation (11) applies an attention mechanism between the hidden state d_{t-1} of the generation-stage recurrent neural network and the hidden entity representations e_1, …, e_n to obtain the attention scores a_1, …, a_n; equation (12) multiplies the attention score a_{t,k} with the entity hidden representation e_k to obtain the entity context vector S_{t,k}, where t denotes the t-th time step and k the k-th entity;
q_t = Σ_k ψ_{t,k}·S_{t,k}    (13)
Equation (13) sums the entity context vectors S_{t,k} weighted by the memory-module attention weights ψ_{t,k} to obtain the context vector q_t of the current time step t, which serves as the input to the pointer generation network.
CN202211216143.7A 2022-09-30 2022-09-30 Text generation method based on mixed grouping ordering and dynamic entity memory planning Active CN115577118B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211216143.7A CN115577118B (en) 2022-09-30 2022-09-30 Text generation method based on mixed grouping ordering and dynamic entity memory planning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211216143.7A CN115577118B (en) 2022-09-30 2022-09-30 Text generation method based on mixed grouping ordering and dynamic entity memory planning

Publications (2)

Publication Number Publication Date
CN115577118A CN115577118A (en) 2023-01-06
CN115577118B true CN115577118B (en) 2023-05-30

Family

ID=84582422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211216143.7A Active CN115577118B (en) 2022-09-30 2022-09-30 Text generation method based on mixed grouping ordering and dynamic entity memory planning

Country Status (1)

Country Link
CN (1) CN115577118B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111078866A (en) * 2019-12-30 2020-04-28 华南理工大学 Chinese text abstract generation method based on sequence-to-sequence model
US11010666B1 (en) * 2017-10-24 2021-05-18 Tunnel Technologies Inc. Systems and methods for generation and use of tensor networks
CN113360655A (en) * 2021-06-25 2021-09-07 中国电子科技集团公司第二十八研究所 Track point classification and text generation method based on sequence annotation
CN113657115A (en) * 2021-07-21 2021-11-16 内蒙古工业大学 Multi-modal Mongolian emotion analysis method based on ironic recognition and fine-grained feature fusion
CN114048350A (en) * 2021-11-08 2022-02-15 湖南大学 Text-video retrieval method based on fine-grained cross-modal alignment model

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10474709B2 (en) * 2017-04-14 2019-11-12 Salesforce.Com, Inc. Deep reinforced model for abstractive summarization
EP3598339A1 (en) * 2018-07-19 2020-01-22 Tata Consultancy Services Limited Systems and methods for end-to-end handwritten text recognition using neural networks
CA3081242A1 (en) * 2019-05-22 2020-11-22 Royal Bank Of Canada System and method for controllable machine text generation architecture
CN110795556B (en) * 2019-11-01 2023-04-18 中山大学 Abstract generation method based on fine-grained plug-in decoding
US11481418B2 (en) * 2020-01-02 2022-10-25 International Business Machines Corporation Natural question generation via reinforcement learning based graph-to-sequence model


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Research on Key Technologies of Text Sentiment Prediction with Sentiment Enhancement and Sentiment Fusion; 荣欢; China Doctoral Dissertations Full-text Database, Information Science and Technology; full text *
A User-granularity Personalized Social Text Generation Model; 高永兵, 高军甜; Journal of Computer Applications; full text *
A Ground-truth-independent Text Summarization Model for Coherence Reinforcement; 马廷淮; Journal of Frontiers of Computer Science and Technology; full text *

Also Published As

Publication number Publication date
CN115577118A (en) 2023-01-06

Similar Documents

Publication Publication Date Title
Yu et al. PICK: processing key information extraction from documents using improved graph learning-convolutional networks
CN108415977B (en) Deep neural network and reinforcement learning-based generative machine reading understanding method
CN111538848B (en) Knowledge representation learning method integrating multi-source information
Chen et al. Delving deeper into the decoder for video captioning
CN111966820B (en) Method and system for constructing and extracting generative abstract model
CN111985205A (en) Aspect level emotion classification model
CN111062214B (en) Integrated entity linking method and system based on deep learning
CN112417092A (en) Intelligent text automatic generation system based on deep learning and implementation method thereof
CN115391563B (en) Knowledge graph link prediction method based on multi-source heterogeneous data fusion
CN115510236A (en) Chapter-level event detection method based on information fusion and data enhancement
CN114663962A (en) Lip-shaped synchronous face forgery generation method and system based on image completion
CN116579347A (en) Comment text emotion analysis method, system, equipment and medium based on dynamic semantic feature fusion
CN113641854B (en) Method and system for converting text into video
CN115630649A (en) Medical Chinese named entity recognition method based on generative model
CN110347853A (en) A kind of image hash code generation method based on Recognition with Recurrent Neural Network
CN115577118B (en) Text generation method based on mixed grouping ordering and dynamic entity memory planning
CN114880527B (en) Multi-modal knowledge graph representation method based on multi-prediction task
CN111444328A (en) Natural language automatic prediction inference method with interpretation generation
CN116340569A (en) Semi-supervised short video classification method based on semantic consistency
CN112069777B (en) Two-stage data-to-text generation method based on skeleton
CN112580370B (en) Mongolian nerve machine translation method integrating semantic knowledge
CN114972959A (en) Remote sensing image retrieval method for sample generation and in-class sequencing loss in deep learning
Wei et al. Stack-vs: Stacked visual-semantic attention for image caption generation
CN114780725A (en) Text classification algorithm based on deep clustering
CN113297385A (en) Multi-label text classification model and classification method based on improved GraphRNN

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant