CN115017178B - Training method and device for data-to-text generation model - Google Patents

Publication number: CN115017178B
Application number: CN202210589921.0A
Authority: CN (China)
Legal status: Active
Other versions: CN115017178A
Inventors: 耿瑞莹, 石翔, 黎槟华, 孙健, 李永彬
Assignee: Alibaba China Co Ltd
Application filed by: Alibaba China Co Ltd

Classifications

  • G06F16/2433: Query languages
  • G06F16/258: Data format conversion from or to a database
  • G06F16/284: Relational databases
  • G06N3/084: Backpropagation, e.g. using gradient descent


Abstract

An embodiment of the present application provides a training method and device for a data-to-text generation model. The training method includes the following steps: acquiring first training data, the first training data including first structured data and a target text corresponding to the first structured data; acquiring a predicted text output by a first preset neural network model after the first structured data is input into the first preset neural network model; acquiring a first loss value between the predicted text and the target text; acquiring a second loss value between predicted structured data and the first structured data, and determining a target loss value according to the first loss value and the second loss value; adjusting parameters of the first preset neural network model according to the target loss value to obtain a target neural network model; and sending the target neural network model to a target device. The technical solution provided by the present application can improve the semantic fidelity of the generated text to the input structured data.

Description

Training Method and Device for a Data-to-Text Generation Model

Technical Field

The present application relates to the field of artificial intelligence, and in particular to a training method and device for a data-to-text generation model.

Background Art

The data-to-text generation task is one of the important research tasks in text generation; its goal is to automatically generate relevant descriptive text from input structured data.

Currently, text is usually generated from structured data with an end-to-end generative model. Such an end-to-end model can be regarded as a black box: given the graph structure corresponding to the structured data (the input data), it generates text information corresponding to that input.

However, when data-to-text generation is performed with an end-to-end generative model, the semantic fidelity of the generated text to the input data is low. How to improve the semantic fidelity of the generated text to the input structured data has therefore become a pressing technical problem.

Summary of the Invention

The present application provides a training method and device for a data-to-text generation model that can improve the semantic fidelity of the generated text to the input structured data.

In a first aspect, the present application provides a training method for a data-to-text generation model, including: obtaining first training data, the first training data including first structured data and a target text corresponding to the first structured data; obtaining a predicted text output by a first preset neural network model after the first structured data is input into the first preset neural network model; obtaining a first loss value between the predicted text and the target text; obtaining a second loss value between predicted structured data and the first structured data, the predicted structured data being the structured data obtained by converting the predicted text with a preset conversion algorithm, the preset conversion algorithm being used to convert text information into structured data; determining a target loss value according to the first loss value and the second loss value; adjusting parameters of the first preset neural network model according to the target loss value to obtain a target neural network model; and sending the target neural network model to a target device.

In this embodiment, the parameters of the first preset neural network model are adjusted based on the target loss value; that is, the first loss value and the second loss value are optimized jointly when adjusting the parameters of the preset network model.

It should be understood that the first loss value reflects the degree of deviation between the current predicted text and the target text, while the second loss value reflects the fidelity of the predicted text to the first structured data. Compared with a method that adjusts the parameters of the first preset neural network model by considering only the deviation between the current predicted text and the target text, this embodiment adds the task of optimizing the fidelity of the predicted text to the first structured data. Therefore, in the process of adjusting the parameters of the first preset neural network model based on the target loss value, the semantic fidelity of the text predicted by the trained target neural network model to the input structured data is improved.
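The dual-loss objective described above can be sketched as follows. This is an illustrative toy only: the patent does not fix concrete loss functions, a tokenizer, or the preset conversion algorithm, so the cross-entropy text loss, the "key=value" parser standing in for the conversion algorithm, and the field-mismatch structure loss below are all assumptions.

```python
import numpy as np

def first_loss(pred_token_probs, target_token_ids):
    """Cross-entropy between the predicted text tokens and the target text."""
    picked = pred_token_probs[np.arange(len(target_token_ids)), target_token_ids]
    return float(-np.mean(np.log(picked + 1e-12)))

def text_to_structured(text):
    """Stand-in for the preset conversion algorithm (text -> structured data):
    a trivial 'key=value;...' parser, purely for illustration."""
    return dict(pair.split("=", 1) for pair in text.split(";") if "=" in pair)

def second_loss(predicted_text, first_structured):
    """Fraction of input fields that the predicted text fails to reproduce."""
    predicted_structured = text_to_structured(predicted_text)
    missed = sum(1 for k, v in first_structured.items()
                 if predicted_structured.get(k) != v)
    return missed / len(first_structured)

def target_loss(pred_token_probs, target_token_ids, predicted_text, first_structured):
    # the patent leaves the combination open; an unweighted sum is one option
    return first_loss(pred_token_probs, target_token_ids) + \
           second_loss(predicted_text, first_structured)
```

In a real system the second loss would be computed on model outputs and backpropagated together with the first loss, so that faithfulness violations directly push the parameters.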

With reference to the first aspect, in a possible implementation, the target device is a second server.

With reference to the first aspect, in a possible implementation, the target device is a terminal device.

With reference to the first aspect, in a possible implementation, the method further includes: receiving the first preset neural network model sent by the target device.

With reference to the first aspect, in a possible implementation, the first training data further includes a target order of each piece of data in the first structured data, the target order being the order in which the text corresponding to each piece of data is arranged in the target text. Correspondingly, the method further includes: obtaining a predicted order of each piece of data in the first structured data output by the first preset neural network model after the first structured data is input into the first preset neural network model, the predicted order being the order in which the text corresponding to each piece of data is arranged in the predicted text; and determining a third loss value according to the predicted order and the target order. Correspondingly, determining the target loss value according to the first loss value and the second loss value includes: determining the target loss value according to the first loss value, the second loss value and the third loss value.

In this implementation, a training task that optimizes the loss value between the predicted position and the target position of each piece of data in the first structured data is additionally introduced, which also improves the fluency of the text output by the model after the parameters of the first preset neural network model have been adjusted.

With reference to the first aspect, in a possible implementation, the target loss value is equal to the sum of the first loss value, the second loss value and the third loss value.
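A minimal sketch of the third (ordering) loss. The patent does not give a formula for comparing the predicted order with the target order; counting pairwise inversions, as below, is one simple assumption.

```python
def third_loss(predicted_order, target_order):
    """Count pairwise inversions: pairs of data items whose relative order in
    the predicted text differs from their relative order in the target text."""
    position = {item: i for i, item in enumerate(predicted_order)}
    inversions = 0
    for i in range(len(target_order)):
        for j in range(i + 1, len(target_order)):
            if position[target_order[i]] > position[target_order[j]]:
                inversions += 1
    return inversions
```

A perfectly ordered prediction yields 0, and the loss grows with every pair of items emitted in the wrong relative order, so minimizing it encourages the model to mention the data in the target sequence.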

With reference to the first aspect, in a possible implementation, the method further includes: obtaining M1 pieces of second structured data, the M1 pieces of second structured data including at least two of the following types of data: table data, structured query language (SQL) data, and logic data, where M1 is a positive integer greater than 1; preprocessing the M1 pieces of second structured data to obtain M1 pieces of second training data in one-to-one correspondence with the M1 pieces of second structured data, where the second training data corresponding to the j-th piece of second structured data includes the j-th piece of second structured data and the target text corresponding to the j-th piece of second structured data, j being a positive integer ranging from 1 to M1; and training a second preset neural network model with the M1 pieces of second training data to obtain the first preset neural network model.

In this implementation, the first preset neural network model is obtained by training the second preset neural network model on training data corresponding to various kinds of structured data. It can be understood that, because the first preset neural network model obtained in this way has been trained on training data corresponding to various kinds of structured data, it already has a partial ability to generate text from structured data.

With reference to the first aspect, in a possible implementation, the second preset neural network model includes N1 encoders and N2 decoders, the N1 encoders being used to obtain a feature vector of each piece of the M1 pieces of structured data, and the N2 decoders being used to predict, based on the feature vector of each piece of structured data, the predicted text corresponding to that piece of structured data, where N1 and N2 are positive integers greater than 1.
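The encoder/decoder split can be pictured with the following schematic data flow. It is not the patent's architecture: the residual mean-pooling "encoder layer" and tanh "decoder layer" below are placeholders for real attention blocks, used only to show N1 stacked encoders producing feature vectors and N2 stacked decoders consuming them.

```python
import numpy as np

def encoder_layer(h):
    # placeholder for a real self-attention + feed-forward block
    return h + h.mean(axis=0, keepdims=True)

def encode(x, n1):
    """Run N1 stacked encoder layers over the linearised structured data."""
    h = x
    for _ in range(n1):
        h = encoder_layer(h)
    return h  # feature vectors of the structured data

def decode(features, n2):
    """Run N2 stacked decoder layers conditioned on the encoder features."""
    h = features.mean(axis=0)
    for _ in range(n2):
        h = np.tanh(h)  # placeholder for a real decoder block
    return h  # would be projected to token logits in a real model

tokens = np.random.rand(3, 8)      # 3 linearised input items, hidden size 8
features = encode(tokens, n1=2)
state = decode(features, n2=2)
```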

In a second aspect, the present application provides a training device for a data-to-text generation model, including: an acquisition module, configured to acquire first training data, the first training data including first structured data and a target text corresponding to the first structured data; the acquisition module being further configured to acquire a predicted text output by a first preset neural network model after the first structured data is input into the first preset neural network model; the acquisition module being further configured to acquire a first loss value between the predicted text and the target text; the acquisition module being further configured to acquire a second loss value between predicted structured data and the first structured data, the predicted structured data being the structured data obtained by converting the predicted text with a preset conversion algorithm, the preset conversion algorithm being used to convert text information into structured data; and a processing module, configured to determine a target loss value according to the first loss value and the second loss value, the processing module being further configured to adjust parameters of the first preset neural network model according to the target loss value to obtain a target neural network model, and to send the target neural network model to a target device.

With reference to the second aspect, in a possible implementation, the target device is a second server.

With reference to the second aspect, in a possible implementation, the target device is a terminal device.

With reference to the second aspect, in a possible implementation, the processing module is further configured to receive the first preset neural network model sent by the target device.

With reference to the second aspect, in a possible implementation, the first training data further includes a target order of each piece of data in the first structured data, the target order being the order in which the text corresponding to each piece of data is arranged in the target text. Correspondingly, the acquisition module is further configured to acquire a predicted order of each piece of data in the first structured data output by the first preset neural network model after the first structured data is input into the first preset neural network model, the predicted order being the order in which the text corresponding to each piece of data is arranged in the predicted text; and the processing module is further configured to determine a third loss value according to the predicted order and the target order, and to determine a target loss value according to the first loss value, the second loss value and the third loss value.

With reference to the second aspect, in a possible implementation, the target loss value is equal to the sum of the first loss value, the second loss value and the third loss value.

With reference to the second aspect, in a possible implementation, the acquisition module is further configured to acquire M1 pieces of second structured data, the M1 pieces of second structured data including at least two of the following types of data: table data, structured query language (SQL) data, and logic data, where M1 is a positive integer greater than 1; and the processing module is further configured to preprocess the M1 pieces of second structured data to obtain M1 pieces of second training data in one-to-one correspondence with the M1 pieces of second structured data, where the second training data corresponding to the j-th piece of second structured data includes the j-th piece of second structured data and the target text corresponding to the j-th piece of second structured data, j being a positive integer ranging from 1 to M1, and to train a second preset neural network model with the M1 pieces of second training data to obtain the first preset neural network model.

With reference to the second aspect, in a possible implementation, the second preset neural network model includes N1 encoders and N2 decoders, the N1 encoders being used to obtain a feature vector of each piece of the M1 pieces of structured data, and the N2 decoders being used to predict, based on the feature vector of each piece of structured data, the predicted text corresponding to that piece of structured data, where N1 and N2 are positive integers greater than 1.

In a third aspect, a training device for a data-to-text generation model is provided, including a processor configured to call a computer program from a memory; when the computer program is executed, the processor executes the method described in the first aspect or any possible implementation of the first aspect.

In a fourth aspect, a communication device is provided, the communication device including the device described in the second aspect or any possible implementation thereof.

In a fifth aspect, a computer-readable storage medium is provided for storing a computer program, the computer program including code for executing the method described in the first aspect or any possible implementation of the first aspect.

Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of an application scenario provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of a training method for a data-to-text generation model provided by an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a graph-to-text pre-training model provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of converting first structured data into a graph structure according to an embodiment of the present application;

FIG. 5 is a schematic diagram of converting first structured data into a graph structure according to another embodiment of the present application;

FIG. 6 is a schematic structural diagram of a training method for a data-to-text generation model provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an encoder provided by an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a training device for a data-to-text generation model provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a training device for a data-to-text generation model provided by another embodiment of the present application.

Detailed Description

To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.

To facilitate understanding, several terms involved in the embodiments of the present application are first introduced.

1. Text Generation

Text generation is a very important but challenging task in natural language processing. It aims to turn input data (for example, sequences and keywords) into reasonable and readable natural-language text. Representative applications include dialogue systems, text summarization and machine translation.

2. Neural Networks

Artificial intelligence (AI) comprises the theories, methods, technologies and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.

A key technology of artificial intelligence today is the neural network (NN). Neural networks simulate the connections between nerve cells in the human brain, widely interconnecting a large number of simple processing units (called neurons) to form a complex network system.

A simple neural network contains three kinds of layers: an input layer, an output layer and hidden layers (also called intermediate layers); each connection between layers corresponds to a weight (whose value is called a weight or parameter). Neural networks perform well in fields such as computer vision and natural language processing because a training algorithm adjusts the weights so that the network's predictions become as accurate as possible.
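The three-layer structure just described (input layer, hidden layer, output layer, with a weight on every connection and a nonlinearity in between) can be made concrete with a tiny forward pass; the ReLU nonlinearity and the sizes below are arbitrary choices for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, W1, b1, W2, b2):
    hidden = relu(x @ W1 + b1)   # input layer -> hidden layer
    return hidden @ W2 + b2      # hidden layer -> output layer

x = np.array([1.0, 2.0])
W1 = np.array([[1.0, 0.0], [0.0, 1.0]])   # weights: the trainable parameters
W2 = np.array([[1.0], [1.0]])
y = forward(x, W1, np.zeros(2), W2, np.zeros(1))
```

Training adjusts W1, b1, W2 and b2 so that y approaches the label for each sample.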

Training a neural network generally involves two computation steps: forward computation and backward computation. In the forward computation, the input values and the parameters are combined and then passed through a nonlinear function to produce output values; an output value either serves as the final output of the network or is fed into subsequent, similar computations. The deviation between the network's output and the actual label of the corresponding sample is measured by a loss function, expressed as a function f(x, w) of the input sample x and the network parameters w. To minimize the loss function, the parameters w must be adjusted continually, and the backward computation obtains the update for w. In gradient-descent-based algorithms, the backward computation starts from the last layer of the network and computes the partial derivatives of the loss function with respect to the parameters of each layer, finally obtaining the partial derivatives of all parameters, called the gradient. In each iteration, the parameters w are updated by a step of size η in the direction opposite to the gradient to obtain the new parameters w, which completes one training step. The update process is expressed as follows:

w_{t+1} = w_t − η · (1/|B_t|) · Σ_{x ∈ B_t} ∂f(x, w_t)/∂w

where w_t denotes the parameters used in the t-th iteration, w_{t+1} denotes the updated parameters, η is called the learning rate, and B_t denotes the set of samples input in the t-th iteration.
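The parameter update just described amounts to one line of code per iteration. The quadratic toy loss and the numerical values below are illustrative only.

```python
import numpy as np

def sgd_step(w, grad, lr):
    """One iteration: w_{t+1} = w_t - eta * gradient."""
    return w - lr * grad

# toy loss f(w) = ||w||^2 / 2, whose gradient with respect to w is w itself
w = np.array([4.0, -2.0])
for _ in range(3):
    w = sgd_step(w, grad=w, lr=0.5)   # each step halves the parameters
```

With lr = 0.5 and this loss, three steps shrink [4.0, -2.0] to [0.5, -0.25], illustrating how repeated updates drive the loss toward its minimum.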

训练神经网络的过程也就是对神经元对应的权重进行学习的过程,其最终目的是得到训练好的神经网络的每一层神经元对应的权重。The process of training a neural network is the process of learning the weights corresponding to neurons. Its ultimate goal is to obtain the weights corresponding to each layer of neurons in the trained neural network.

近几年来,随着人工智能技术的飞速发展,文本生成成为自然语言处理中的一项重要技术。应用者可以利用既定信息与文本生成模型生成满足特定目标的文本序列。文本生成模型的应用场景非常丰富,例如阅读理解、人机对话或者智能写作等等。其中,在文本生成技术中,数据到文本的生成任务是文本生成技术的重要研究任务之一,其目标是根据输入的结构化数据自动生成相关的描述性文本。其中,结构化数据例如是表格(Table)数据、结构化查询语言(structured query language,SQL)数据、逻辑(Logic)数据等等。在此说明的是,有关Table数据、SQL数据和Logic数据的概念可以参考相关技术中的介绍,此处不再赘述。In recent years, with the rapid development of artificial intelligence technology, text generation has become an important technology in natural language processing. Applicants can use established information and text generation models to generate text sequences that meet specific goals. The application scenarios of text generation models are very rich, such as reading comprehension, human-computer dialogue, or intelligent writing, etc. Among them, in text generation technology, the task of generating data from text is one of the important research tasks of text generation technology, and its goal is to automatically generate relevant descriptive text based on the input structured data. Among them, structured data is, for example, table data, structured query language (SQL) data, logic data, etc. It should be noted here that the concepts of Table data, SQL data, and Logic data can be referred to the introduction in the relevant technology, and will not be repeated here.

示例性地,图1为本申请一个实施例提供的应用场景的结构性示意图。如图1所示,在该应用场景中,训练服务器101可以对结构化数据库中的结构化数据使用分类算法进行训练,可以得到训练好的模型,之后,训练服务器101可以将该训练好的模型发送给目标设备102,以使得目标设备102可以在接收到新的样本数据时,使用该训练好的模型生成新的样本数据对应的文本信息。For example, Figure 1 is a structural schematic diagram of an application scenario provided by an embodiment of the present application. As shown in Figure 1, in the application scenario, the training server 101 can use a classification algorithm to train the structured data in the structured database to obtain a trained model, and then the training server 101 can send the trained model to the target device 102, so that the target device 102 can use the trained model to generate text information corresponding to the new sample data when receiving the new sample data.

在此说明的是,本申请实施例对该应用场景中的具体生成任务类型不做限定。例如,该文本生成系统可以完成基于表格(Table)数据到文本(Text)的生成任务类型,也称为Table-to-Text的生成;或者,可以完成基于结构化查询语言(structured querylanguage,SQL)数据到Text的生成任务类型,也称为SQL-to-Text的生成;或者,可以完成基于逻辑(Logic)数据到文本(Text)的生成任务类型,也称为Logic-to-Text的生成;或者,也可以完成基于回复自然语言生成(ResponseNLG)数据到文本(Text)的生成任务类型。It is noted that the specific generation task type in the application scenario is not limited in the embodiment of the present application. For example, the text generation system can complete the generation task type based on table (Table) data to text (Text), also known as Table-to-Text generation; or, it can complete the generation task type based on structured query language (structured query language, SQL) data to Text, also known as SQL-to-Text generation; or, it can complete the generation task type based on logic (Logic) data to text (Text), also known as Logic-to-Text generation; or, it can also complete the generation task type based on response natural language generation (ResponseNLG) data to text (Text).

通常,在图1所示的应用场景中,训练服务器101在基于结构化数据库中的结构化数据进行训练时,使用的预设模型为基于自回归的生成模型,例如使用GPT模型或T5模型将数据生成文本。其中,GPT模型或T5模型可以认为是一个黑盒子模型,该黑盒子模型在给定输入数据后,可以生成与该数据对应的文本信息。Typically, in the application scenario shown in FIG1 , when the training server 101 performs training based on structured data in a structured database, the preset model used is a generative model based on autoregression, such as using a GPT model or a T5 model to generate text from data. The GPT model or the T5 model can be considered as a black box model, which can generate text information corresponding to the data after given input data.

然而,在基于GPT模型或T5模型从数据生成文本时,生成的文本和原本的输入数据之间通常会产生语义偏移,导致生成的文本的准确度不高。However, when generating text from data based on the GPT model or the T5 model, there is usually a semantic deviation between the generated text and the original input data, resulting in low accuracy of the generated text.

In view of this, an embodiment of the present application provides a training method and device for a data-to-text generation model. In the training method provided by the present application, when the preset neural network model is trained, additional training tasks are introduced besides the original training task; these additional tasks make the text output by the preset neural network model adhere more closely to the input structured data, so that when the trained neural network model performs prediction on real-time structured data, the semantic fidelity between the generated text and that real-time structured data is improved.

The technical solution of the present application, and how it solves the above technical problems, are described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and identical or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present application are described below with reference to the accompanying drawings.

FIG. 2 is a flow diagram of a training method for a data-to-text generation model according to an embodiment of the present application. As shown in FIG. 2, the method of this embodiment may include S201, S202, S203, S204, S205, S206, and S207, and may be executed by the training server 101 shown in FIG. 1.

It should be noted that in this embodiment the training server 101 is also referred to as the first server.

S201: Acquire first training data, where the first training data includes first structured data and a target text corresponding to the first structured data.

It should be understood that the first step in training any neural network model is to obtain the samples used for training, i.e., the training data.

In this embodiment, the first training data includes the first structured data and the target text corresponding to the first structured data.

It should be noted that the embodiments of the present application do not limit the specific type of neural network model to be trained. For example, the model to be trained may be a table-to-text model, i.e., a neural network model for Table-to-Text; an SQL-to-Text model, i.e., a neural network model that generates text from structured query language (SQL) data; a Logic-to-Text model, i.e., a neural network model that generates text from logic (Logic) data; or a model that generates text from response natural language generation (ResponseNLG) data.

It should be understood that when the neural network model to be trained is a Table-to-Text model, the first structured data included in the first training data of this embodiment refers to Table data; when it is an SQL-to-Text model, the first structured data refers to SQL data; when it is a Logic-to-Text model, the first structured data refers to Logic data; and when the model is to be trained for the task of generating text from ResponseNLG data, the first structured data refers to response natural language generation data, which includes both Table data and SQL data.

In this embodiment, the first training data includes the target text corresponding to the first structured data. It should be understood that the target text indicates the target value that the neural network to be trained is expected to output once the first structured data is input; that is, it can be regarded as the ideal text information to be output.

S202: Obtain the predicted text output by a first preset neural network model after the first structured data is input into it.

It should be noted that this embodiment does not limit the specific form of the first preset neural network model. For example, it may be a model that has already been trained on a large data set, or a newly designed model. It should also be understood that the first preset neural network model in this embodiment refers to a model capable of outputting text information.

In general, after data is input into a neural network, the model outputs a predicted value corresponding to that data. Therefore, in this embodiment, when the first structured data is input into the first preset neural network model, the model outputs the predicted text.

It should be noted that, in the field of text generation, in order to quickly obtain a model for a specific task type, a model that has already been trained on a large data set (also called a pre-trained model) is usually used as the base model. It should be understood that when a pre-trained model is used as the base model, the training data for the specific task must be made consistent with the input format of the base model.

In one implementation, the pre-trained model used is the graph-to-text model shown in FIG. 3, also called a Graph-to-Text model. As shown in FIG. 3, the Graph-to-Text model includes an encoder and a decoder. The encoder obtains the feature vector C corresponding to a piece of graph-structured data, and the decoder predicts, based on the feature vector C, the predicted text corresponding to that structured data.

It should be understood that if the pre-trained model used in this embodiment is a Graph-to-Text model, the first structured data must be preprocessed so that it is converted into a graph structure.

For example, when the first structured data is the SQL query "SELECT 定增年度, 增发目的 WHERE 最新价格 < '10'", FIG. 4 is a schematic diagram of converting the first structured data into a graph structure according to an embodiment of the present application. As shown in FIG. 4, the first node of the graph structure is the root node; the root node is connected to a Select node and a Where node; the Select node is connected to two "AGG:none" nodes, one of which is connected to the 定增年度 node and the other to the 增发目的 node; the Where node is connected to an "Op:<" node, and the "Op:<" node is connected to the 最新价格 node and to the node 10.
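The SQL-to-graph conversion just described can be expressed as a short sketch. This is an illustrative sketch only: the function name, the `AGG:none#<column>` suffix (used here to keep the two AGG nodes distinct), and the `(nodes, edges)` representation are assumptions, not part of the claimed embodiment.

```python
# Hypothetical sketch of the SQL-to-graph preprocessing; node labels follow
# FIG. 4, while all identifiers and the data layout are illustrative.
def sql_to_graph(select_cols, where_conds):
    """Convert a simple SQL query into (nodes, edges) as in FIG. 4."""
    nodes = ["Root", "Select", "Where"]
    edges = [("Root", "Select"), ("Root", "Where")]
    for col in select_cols:
        agg = f"AGG:none#{col}"          # one AGG:none node per selected column
        nodes += [agg, col]
        edges += [("Select", agg), (agg, col)]
    for col, op, val in where_conds:
        op_node = f"Op:{op}"
        nodes += [op_node, col, str(val)]
        edges += [("Where", op_node), (op_node, col), (op_node, str(val))]
    return nodes, edges

# SELECT 定增年度, 增发目的 WHERE 最新价格 < '10'
nodes, edges = sql_to_graph(["定增年度", "增发目的"], [("最新价格", "<", "10")])
```

Applied to the example query, this reproduces the topology of FIG. 4: two AGG:none branches under Select and one "Op:<" branch under Where.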

For example, when the first structured data is the Table data shown in Table 1:

Table 1: Table data

定增年度 (fixed-increase year) | 增发目的 (purpose of additional issuance)
2016 | 融资其他资产 (financing other assets)
2016 | 项目融资 (project financing)

FIG. 5 is a schematic diagram of converting the first structured data into a graph structure according to another embodiment of the present application. As shown in FIG. 5, the converted graph structure includes five nodes: the node 定增年度 is connected to the node 2016; the node 增发目的 is connected to the nodes 融资其他资产 and 项目融资; the node 2016 is connected to the nodes 融资其他资产 and 项目融资; and the node 融资其他资产 is connected to the node 项目融资.

It should be noted that the graph structures shown in FIG. 4 and FIG. 5 are merely examples and do not limit the present application.
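The Table-to-graph conversion of FIG. 5 can be sketched in the same spirit. The edge-construction rules below (header-to-value links, links between values in the same row, and links between values in the same column, with duplicate cells merged into one node) are an assumed reading that happens to reproduce the edges described above; the embodiment does not fix these rules.

```python
# Illustrative sketch of the Table-to-graph preprocessing (FIG. 5); the
# specific edge-construction policy is an assumption, not the claimed method.
from itertools import combinations

def table_to_graph(headers, rows):
    """Build an undirected edge set: header-value, within-row, and
    within-column links; duplicate cell values share a single node."""
    edges = set()
    for row in rows:
        for h, v in zip(headers, row):
            edges.add(frozenset((h, v)))          # header -> its cell value
        for a, b in combinations(row, 2):
            edges.add(frozenset((a, b)))          # values co-occurring in a row
    for col in zip(*rows):
        for a, b in combinations(set(col), 2):
            edges.add(frozenset((a, b)))          # distinct values of one column
    return edges

edges = table_to_graph(
    ["定增年度", "增发目的"],
    [["2016", "融资其他资产"], ["2016", "项目融资"]],
)
```

On the Table 1 data this yields exactly the five-node graph of FIG. 5, including the 融资其他资产–项目融资 link contributed by the within-column rule.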

S203: Obtain a first loss value between the predicted text and the target text.

Here, the predicted text refers to the text information output by the first preset neural network model after the first structured data is input into it, and the target text is the ideal text that the preset neural network to be trained should output for the first structured data.

It should be understood that when training a neural network, the smaller the deviation between the target text and the predicted text, the better. Therefore, in this embodiment, after the target text and the predicted text are obtained, the first loss value between them can be calculated. It can be understood that the first loss value reflects the degree of deviation between the current predicted text and the target text.

S204: Obtain a second loss value between predicted structured data and the first structured data, where the predicted structured data is the structured data obtained by converting the predicted text with a preset conversion algorithm, and the preset conversion algorithm is used to convert text information into structured data.

It is understandable that when a neural network model generates predicted text from the first structured data, the predicted text should be as semantically faithful to the first structured data as possible; in other words, the predicted text is required to match the first structured data semantically as closely as possible.

Therefore, in this embodiment, during training, after the predicted text is obtained, it is first converted into structured data using the preset conversion algorithm, and the converted structured data is then compared with the first structured data, i.e., the second loss value between the predicted structured data and the first structured data is calculated. It can be understood that the second loss value reflects the fidelity of the predicted text to the first structured data.
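The embodiment leaves both the preset conversion algorithm and the distance measure unspecified. As a toy stand-in only, the second loss can be illustrated with a set-overlap distance between the facts recovered from the predicted text and the input facts; every name below is hypothetical.

```python
def structured_loss(predicted_structured, input_structured):
    """Toy stand-in for the second loss: 1 - Jaccard overlap between the
    fact set recovered from the predicted text and the input fact set.
    The patent does not specify the conversion algorithm or the distance."""
    pred, gold = set(predicted_structured), set(input_structured)
    if not pred and not gold:
        return 0.0
    return 1.0 - len(pred & gold) / len(pred | gold)

# identical fact sets -> zero loss; disjoint fact sets -> maximal loss
assert structured_loss([("定增年度", "2016")], [("定增年度", "2016")]) == 0.0
```

A loss of 0 means the text's recovered facts match the input exactly; any extra or missing fact increases the loss, which is the fidelity behavior S204 requires.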

S205: Determine a target loss value according to the first loss value and the second loss value.

In one possible implementation, the target loss value is the sum of the first loss value and the second loss value.

In another possible implementation, the target loss value is a weighted sum of the first loss value and the second loss value.
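Both implementations of S205 can be sketched with one helper; the weight names `w1` and `w2` are illustrative hyperparameters, and with both equal to 1 the weighted sum reduces to the plain sum of the first implementation.

```python
def target_loss(l1, l2, w1=1.0, w2=1.0):
    """Combine the first and second loss values into the target loss.
    Defaults give the plain sum; non-unit weights give the weighted sum."""
    return w1 * l1 + w2 * l2

target_loss(0.5, 0.25)  # plain sum -> 0.75
```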

S206: Adjust the parameters of the first preset neural network model according to the target loss value to obtain a target neural network model.

In one possible implementation, the first preset neural network model is a neural network model sent by the target device.

The target device is, for example, a terminal device, or a server on which the first preset neural network model is deployed (also referred to as the second server in this embodiment).

It should be noted that this embodiment does not limit the state of the parameters of the first preset neural network model. For example, the first preset neural network model may be an initialized model (i.e., its parameters can be considered not yet trained), or an already trained model (i.e., its parameters can be considered already trained).

In this embodiment, the parameters of the first preset neural network model are adjusted based on the target loss value, i.e., the first loss value and the second loss value are optimized jointly. It should be understood that the first loss value reflects the degree of deviation between the current predicted text and the target text, while the second loss value reflects the fidelity of the predicted text to the first structured data. Therefore, compared with a method that adjusts the model parameters by considering only the deviation between the predicted text and the target text, this embodiment adds the task of optimizing the fidelity of the predicted text to the first structured data; adjusting the parameters of the first preset neural network model based on the target loss value thus improves the semantic fidelity between the text predicted by the trained target neural network model and the input structured data.

S207: Send the target neural network model to the target device.

For example, the target device is a server on which the neural network model can be deployed (also referred to as the second server in this embodiment), or a terminal device on which the neural network model can be deployed.

It should be understood that when the target device is the second server, in order to generate text information with the target neural network model, the terminal device may first send the structured data for which text information is needed (e.g., new sample data) to the second server, and the second server then generates the text information for the new sample data. That is, in this scenario, the terminal device obtains the text information corresponding to the new sample data by accessing the second server.

It should be understood that when the target device is a terminal device, in order to generate text information with the target neural network model, the terminal device may input the structured data for which text information is needed (e.g., new sample data) directly into the target neural network model, which then generates the text information for the new sample data.

As an optional embodiment, on the basis of the embodiment shown in FIG. 2, in one possible implementation, the first training data in the present application may further include a target order for each piece of data in the first structured data, where the target order is the order in which the text corresponding to each piece of data is arranged in the target text. Accordingly, the training method of the present application further includes: obtaining the predicted order of each piece of data in the first structured data output by the first preset neural network model after the first structured data is input into it, where the predicted order is the order in which the text corresponding to each piece of data is arranged in the predicted text; and determining a third loss value according to the predicted order and the target order. Accordingly, determining the target loss value according to the first loss value and the second loss value includes: determining the target loss value according to the first loss value, the second loss value, and the third loss value.

It should be understood that the first structured data usually includes multiple pieces of data. For example, a piece of table data may include several attributes, such as a name, a date of birth, and an occupation.

It should also be understood that, when training a generation model from structured data to text, the generated text should be as fluent as possible, and its fluency depends on the order in which each piece of the structured data appears in the generated text. For example, for the above table data including a name, a date of birth, and an occupation, a text that describes the name first, then the date of birth, and finally the occupation is relatively fluent, whereas other orderings may produce rather confusing text.

Therefore, in this embodiment, the first training data further includes the target order of each piece of data in the first structured data, where the target order refers to the order in which the text corresponding to each piece of data is arranged in the target text; that is, the target order describes the ideal position of each piece of the first structured data in the generated text.

Then, during training, this embodiment obtains the predicted order of each piece of data in the first structured data output by the first preset neural network model after the first structured data is input into it. The predicted order can be regarded as the order, under the current parameters of the first preset neural network model, in which the text corresponding to each piece of data is arranged in the text predicted by the model (called the predicted text); that is, the predicted order describes the position information predicted by the first preset neural network model for each piece of the first structured data.

It should be understood that the smaller the deviation between the predicted order and the target order, the more fluent the generated text. Therefore, in this embodiment, after the predicted order is obtained, the deviation between the predicted order and the target order is calculated, i.e., the third loss value is determined from the predicted order and the target order. Accordingly, after the third loss value is determined, the target loss value is determined from the first loss value, the second loss value, and the third loss value, and the target loss value is then used to adjust the parameters of the first preset neural network model. Specifically, when adjusting the parameters according to the target loss value, minimizing the target loss value can be taken as the optimization objective.
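The embodiment does not fix the exact form of the third loss. As one illustration, the deviation between the predicted order and the target order can be measured by the fraction of field pairs whose relative order disagrees (a pairwise-inversion count); the function and its assumption that both lists contain the same fields are hypothetical.

```python
def order_loss(predicted_order, target_order):
    """Toy third-loss sketch: fraction of field pairs whose relative order
    in the predicted text disagrees with the target order. Assumes both
    lists contain the same fields; the real loss form is unspecified."""
    fields = list(target_order)
    pairs = disagree = 0
    for i in range(len(fields)):
        for j in range(i + 1, len(fields)):
            a, b = fields[i], fields[j]
            pairs += 1
            if (predicted_order.index(a) < predicted_order.index(b)) != \
               (target_order.index(a) < target_order.index(b)):
                disagree += 1
    return disagree / pairs if pairs else 0.0
```

For the name/birth/occupation example above, predicting "name, job, birth" against the target "name, birth, job" flips one of three pairs, giving a loss of 1/3; matching orders give 0.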

In this implementation, during the training of the first preset neural network model based on the first structured data, in addition to the training task introduced in the embodiment of FIG. 2 that optimizes the loss value between the first structured data and the predicted structured data corresponding to the predicted text, a further training task is introduced that optimizes the loss value between the predicted position and the target position of each piece of the first structured data, which can also improve the fluency of the text output by the model after its parameters are adjusted.

It is understandable that neural network models have made significant progress in text generation research. Their advantage is that they can learn the semantic mapping from input data to output text end to end, without manual feature engineering. However, neural network models often have a large number of parameters, while most text generation task data sets are very small.

In view of this, when training a model for a specific text generation task, this embodiment may first pre-train the neural network model with structured data from multiple types of text generation tasks. For example, M1 pieces of second structured data may first be obtained, where the M1 pieces of second structured data include at least two of the following data types: table data, structured query language (SQL) data, and logic data, with M1 a positive integer greater than 1. The M1 pieces of second structured data are then preprocessed to obtain M1 pieces of second training data in one-to-one correspondence with them, where the second training data corresponding to the j-th piece of second structured data includes the j-th piece of second structured data and the target text corresponding to it. Finally, the M1 pieces of second training data are used to train a second preset neural network model to obtain the first preset neural network model.

In other words, in this embodiment, the first preset neural network model is obtained by training the second preset neural network model on the training data corresponding to various kinds of structured data. It can be understood that the first preset neural network model obtained in this way has been trained on training data corresponding to various kinds of structured data, and therefore already has a partial ability to generate text from structured data.
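The construction of the mixed pre-training set described above can be sketched as follows; the field layout, task tags, and shuffling policy are illustrative assumptions rather than part of the embodiment.

```python
import random

def build_pretraining_set(table_pairs, sql_pairs, logic_pairs, seed=0):
    """Pool (structured_data, target_text) pairs from several task types
    and shuffle them into one pre-training set, per the description above.
    Tags, field layout, and shuffling policy are illustrative."""
    pool = [("table", d, t) for d, t in table_pairs]
    pool += [("sql", d, t) for d, t in sql_pairs]
    pool += [("logic", d, t) for d, t in logic_pairs]
    random.Random(seed).shuffle(pool)     # fixed seed for reproducibility
    return pool

pool = build_pretraining_set([("t1", "x1")], [("s1", "y1")], [("l1", "z1")])
```

The second preset neural network model would then be trained on `pool`, and the resulting checkpoint used as the first preset neural network model.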

For ease of understanding, a detailed implementation of the training method for a data-to-text generation model provided by an embodiment of the present application is described below.

FIG. 6 shows the training method for a data-to-text generation model under this implementation according to an embodiment of the present application. As shown in FIG. 6, the first preset model used includes an encoder and a decoder, where the encoder encodes the input structured data into a feature vector C, C is then input into the decoder, and the decoder outputs the predicted text based on C. In addition, in this embodiment, the first preset model further includes a text planning module and an optimal transport module. The text planning module predicts, for each piece of data included in the structured data, its position in the generated text, and determines the third loss value from the predicted and ideal positions of each piece of data, denoted L_planning; the optimal transport module calculates the second loss value between the structured data corresponding to the generated text and the input structured data, denoted L_OT. It should be understood that this embodiment also includes the first loss value, which is the loss value between the predicted text generated by the model and the ideal text, denoted L_LM.

In one implementation, L_OT may be taken as the sum of three terms: a term representing the semantic distance between the structured data corresponding to the generated text and all of the data in the input structured data, and two terms representing, respectively, the semantic distance between the structured data corresponding to the generated text and the columns whose column names need to appear, and the columns whose column names do not need to appear.

Then, during optimization, the target loss value to be optimized may be written as L = L_LM + L_OT + L_planning, and the parameters in the first preset model are adjusted based on this target loss value.

More specifically, for the first preset model shown in FIG. 6, since the encoder in the first preset model receives graph-structured data, the structured data should be converted into a graph structure before being input into the first preset model. For the description of converting structured data into a graph structure, reference can be made to the above embodiments of the present application, which is not repeated here.

On the basis of FIG. 6 above, the embodiments of the present application do not limit the specific structures of the encoder and the decoder. For example, the encoder in this embodiment includes N1 encoder modules. As shown in FIG. 7, the encoder is composed of N1 encoder modules, each of which includes a transformer encoder block, followed by a normalization module, followed by a graph attention network module. For the concepts of the transformer encoder block, the normalization module, and the graph attention network module, reference can be made to the related art, which is not repeated here.

FIG. 8 is a schematic structural diagram of a training device 800 for a data-to-text generation model according to an embodiment of the present application. As shown in FIG. 8, the device 800 includes an acquisition module 801 and a processing module 802.

The acquisition module 801 is configured to acquire first training data, where the first training data includes first structured data and a target text corresponding to the first structured data; the acquisition module 801 is further configured to obtain the predicted text output by a first preset neural network model after the first structured data is input into it; the acquisition module 801 is further configured to obtain a first loss value between the predicted text and the target text; and the acquisition module 801 is further configured to obtain a second loss value between predicted structured data and the first structured data, where the predicted structured data is the structured data obtained by converting the predicted text with a preset conversion algorithm, and the preset conversion algorithm is used to convert text information into structured data. The processing module 802 is configured to determine a target loss value according to the first loss value and the second loss value; the processing module 802 is further configured to adjust the parameters of the first preset neural network model according to the target loss value; and the processing module 802 is further configured to send the target neural network model to the target device.

在一种可能的实现方式中,所述目标设备为第二服务器。In a possible implementation manner, the target device is a second server.

在一种可能的实现方式中,所述目标设备为终端设备。In a possible implementation manner, the target device is a terminal device.

In a possible implementation, the processing module 802 is further configured to receive the first preset neural network model sent by the target device.

In a possible implementation, the first training data further includes a target order of each data item in the first structured data, the target order being the arrangement order, within the target text, of the text corresponding to each data item. Correspondingly, the acquisition module 801 is further configured to acquire the predicted order of each data item in the first structured data output by the first preset neural network model after the first structured data is input into it, the predicted order being the arrangement order, within the predicted text, of the text corresponding to each data item. The processing module 802 is further configured to determine a third loss value according to the predicted order and the target order, and to determine the target loss value according to the first loss value, the second loss value, and the third loss value.

In a possible implementation, the target loss value is equal to the sum of the first loss value, the second loss value, and the third loss value.
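With the order supervision added, this variant combines the three losses as a plain sum. A minimal sketch, assuming a hypothetical order loss defined as the fraction of data items placed at a different position than in the target (the patent does not specify the order-loss formula):

```python
def order_loss(predicted_order, target_order):
    """Fraction of data items whose position in the predicted text differs
    from their position in the target text (illustrative definition)."""
    assert len(predicted_order) == len(target_order)
    wrong = sum(1 for p, t in zip(predicted_order, target_order) if p != t)
    return wrong / len(target_order)

def combined_loss(first_loss, second_loss, predicted_order, target_order):
    # Target loss = first + second + third, per the equal-sum variant above.
    return first_loss + second_loss + order_loss(predicted_order, target_order)

# Two data items verbalized in swapped order incur the maximum order penalty:
assert order_loss(["temp", "city"], ["city", "temp"]) == 1.0
assert abs(combined_loss(0.1, 0.2, ["city", "temp"], ["city", "temp"]) - 0.3) < 1e-9
```

An unweighted sum is the simplest combination; a real implementation might instead weight the three terms, which this sketch does not attempt.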

In a possible implementation, the acquisition module 801 is further configured to acquire M1 pieces of second structured data, the M1 pieces of second structured data including at least two of the following data types: tabular data, SQL query data, and logical data, M1 being a positive integer greater than 1. The processing module 802 is further configured to preprocess the M1 pieces of second structured data to obtain M1 pieces of second training data in one-to-one correspondence with them, where the second training data corresponding to the j-th piece of second structured data includes the j-th piece of second structured data and the target text corresponding to it, j being a positive integer ranging from 1 to M1. The processing module 802 is further configured to train a second preset neural network model using the M1 pieces of second training data to obtain the first preset neural network model.
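The preprocessing step above can be pictured as linearizing each kind of structured input into a single tagged token sequence, so one model can be pretrained on the mixture. The serialization format and type tags below are assumptions for illustration, not the patent's actual scheme:

```python
def linearize(item):
    """Flatten one piece of second structured data into a tagged string."""
    kind = item["type"]
    if kind == "table":
        cells = " ; ".join(f"{k} : {v}" for k, v in item["rows"].items())
        return f"[TABLE] {cells}"
    if kind == "sql":
        return f"[SQL] {item['query']}"
    if kind == "logic":
        return f"[LOGIC] {item['form']}"
    raise ValueError(f"unknown structured data type: {kind}")

def build_second_training_data(structured_items, target_texts):
    """Pair each linearized item with its target text (one-to-one, j = 1..M1)."""
    assert len(structured_items) == len(target_texts)
    return [
        {"input": linearize(s), "target": t}
        for s, t in zip(structured_items, target_texts)
    ]

items = [
    {"type": "table", "rows": {"name": "Alice", "age": "30"}},
    {"type": "sql", "query": "SELECT name FROM users WHERE age > 30"},
]
pairs = build_second_training_data(items, ["Alice is 30.", "Names of users over 30."])
assert pairs[0]["input"] == "[TABLE] name : Alice ; age : 30"
```

Mixing at least two input types in pretraining, as the implementation requires, is what lets the resulting first preset model start from representations shared across tables, queries, and logical forms.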

In a possible implementation, the second preset neural network model includes N1 encoders and N2 decoders. The N1 encoders are used to obtain a feature vector for each of the M1 pieces of structured data, and the N2 decoders are used to predict, based on the feature vector of each piece of structured data, the predicted text corresponding to it. N1 and N2 are positive integers greater than 1.
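The N1-encoder / N2-decoder shape can be sketched with toy numeric layers standing in for learned encoder and decoder blocks. Everything here is a hedged illustration of the stacking pattern only, not a transformer implementation:

```python
def make_encoder_stack(n1):
    """N1 chained 'encoder' layers mapping token ids to a feature vector."""
    assert n1 > 1  # per the implementation, N1 is a positive integer > 1
    def encode(token_ids):
        features = [float(t) for t in token_ids]
        for _ in range(n1):  # each toy layer mixes a position with its left neighbor
            features = [
                (features[i - 1] if i else 0.0) + features[i]
                for i in range(len(features))
            ]
        return features
    return encode

def make_decoder_stack(n2, vocab):
    """N2 chained 'decoder' layers mapping a feature vector back to tokens."""
    assert n2 > 1
    def decode(features):
        scores = features
        for _ in range(n2):  # toy refinement in place of attention blocks
            scores = [s / 2.0 for s in scores]
        return [vocab[int(s) % len(vocab)] for s in scores]
    return decode

encode = make_encoder_stack(n1=2)
decode = make_decoder_stack(n2=2, vocab=["a", "b", "c"])
features = encode([1, 2, 3])  # the encoders' feature vector for one input
assert len(decode(features)) == 3
```

The design point illustrated is the division of labor: the encoder stack alone produces the feature vector, and the decoder stack alone turns that vector into output text.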

Fig. 9 is a schematic structural diagram of a training device 900 for a data-to-text generation model according to an embodiment of the present application. The device 900 is used to execute the method described above.

The device 900 includes a processor 910, which is used to execute computer programs or instructions stored in a memory 920, or to read data stored in the memory 920, so as to execute the methods in the method embodiments above. Optionally, there are one or more processors 910.

Optionally, as shown in Fig. 9, the device 900 further includes a memory 920 for storing computer programs, instructions, and/or data. The memory 920 may be integrated with the processor 910 or arranged separately. Optionally, there are one or more memories 920.

Optionally, as shown in Fig. 9, the device 900 further includes a communication interface 930 for receiving and/or sending signals. For example, the processor 910 is used to control the communication interface 930 to receive and/or send signals.

Optionally, the device 900 is used to implement the operations described in the method embodiments above.

For example, the processor 910 is used to execute the computer programs or instructions stored in the memory 920 to implement the relevant operations described in the method embodiments above. For example, the processor 910 may be used to: acquire first training data, the first training data including first structured data and a target text corresponding to the first structured data; acquire the predicted text output by a first preset neural network model after the first structured data is input into it; acquire a first loss value between the predicted text and the target text; acquire a second loss value between predicted structured data and the first structured data, the predicted structured data being structured data obtained by converting the predicted text based on a preset conversion algorithm, the preset conversion algorithm being used to convert text information into structured data; determine a target loss value according to the first loss value and the second loss value; adjust the parameters of the first preset neural network model according to the target loss value to obtain a target neural network model; and send the target neural network model to a target device.
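The processor's end-to-end flow — train on loss, adjust parameters, obtain a target model, then send it to a target device — can be sketched as a toy loop. The "model" here is a single scalar parameter, the loss is a plain squared error rather than the patent's combined text-and-data loss, and "sending" is a stub callback; all names are illustrative:

```python
def train_and_ship(examples, send_to_target_device, steps=50, lr=0.1):
    """Fit a one-parameter toy model y = w * x by loss-driven updates, then ship it."""
    w = 0.0  # the first preset model's only parameter
    for _ in range(steps):
        for x, y in examples:
            prediction = w * x
            # one scalar loss term stands in for the patent's target loss
            grad = 2 * (prediction - y) * x
            w -= lr * grad  # adjust parameters according to the loss value
    send_to_target_device(w)  # deliver the trained target model
    return w

shipped = []
w = train_and_ship([(1.0, 2.0), (2.0, 4.0)], shipped.append)
assert abs(w - 2.0) < 1e-3   # converged to the underlying relation y = 2x
assert shipped               # and the target device received the result
```

Separating "obtain the target model" from "send it" mirrors the method's split between the first server that trains and the target device (a second server or terminal) that deploys.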

In some examples, the target device is a second server.

In some examples, the target device is a terminal device.

In some examples, the processor 910 is further used to receive the first preset neural network model sent by the target device.

In some examples, the processor 910 is further used to: in the case where the first training data further includes a target order of each data item in the first structured data, the target order being the arrangement order, within the target text, of the text corresponding to each data item, acquire the predicted order of each data item in the first structured data output by the first preset neural network model after the first structured data is input into it, the predicted order being the arrangement order, within the predicted text, of the text corresponding to each data item; and determine a third loss value according to the predicted order and the target order. Correspondingly, determining the target loss value according to the first loss value and the second loss value includes: determining the target loss value according to the first loss value, the second loss value, and the third loss value.

In some examples, the target loss value is equal to the sum of the first loss value, the second loss value, and the third loss value.

In some examples, the processor 910 is further used to: acquire M1 pieces of second structured data, the M1 pieces of second structured data including at least two of the following data types: tabular data, SQL query data, and logical data, M1 being a positive integer greater than 1; preprocess the M1 pieces of second structured data to obtain M1 pieces of second training data in one-to-one correspondence with them, where the second training data corresponding to the j-th piece of second structured data includes the j-th piece of second structured data and the target text corresponding to it, j being a positive integer ranging from 1 to M1; and train a second preset neural network model using the M1 pieces of second training data to obtain the first preset neural network model.

In some examples, the second preset neural network model includes N1 encoders and N2 decoders. The N1 encoders are used to obtain a feature vector for each of the M1 pieces of structured data, and the N2 decoders are used to predict, based on the feature vector of each piece of structured data, the predicted text corresponding to it. N1 and N2 are positive integers greater than 1.

In the embodiments of the present application, the processor is a circuit with signal processing capability. In one implementation, the processor may be a circuit capable of reading and running instructions, such as a central processing unit (CPU), a microprocessor, a graphics processing unit (GPU, which can be understood as a kind of microprocessor), or a digital signal processor (DSP). In another implementation, the processor may implement certain functions through the logical relationships of a hardware circuit, which may be fixed or reconfigurable; for example, the processor may be a hardware circuit implemented as an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), such as a field-programmable gate array (FPGA). In a reconfigurable hardware circuit, the process of the processor loading a configuration file to configure the hardware circuit can be understood as the process of the processor loading instructions to implement the functions of some or all of the above units. In addition, the processor may be a hardware circuit designed for artificial intelligence, which can be understood as a kind of ASIC, such as a neural-network processing unit (NPU), a tensor processing unit (TPU), or a data processing unit (DPU).

It can be seen that each unit in the above device may be one or more processors (or processing circuits) configured to implement the above methods, for example a CPU, GPU, NPU, TPU, DPU, microprocessor, DSP, ASIC, or FPGA, or a combination of at least two of these processor forms.

In addition, the units in the above device may be fully or partially integrated, or implemented independently. In one implementation, these units are integrated and implemented in the form of a system-on-a-chip (SoC). The SoC may include at least one processor for implementing any of the above methods or the functions of the units of the device; the types of the at least one processor may differ, for example a CPU and an FPGA, a CPU and an artificial intelligence processor, or a CPU and a GPU.

Accordingly, an embodiment of the present application also provides a computer-readable storage medium storing a computer program or instructions. When the computer program or instructions are executed by a processor, the processor is caused to implement the steps of the method described in Fig. 2.


Accordingly, an embodiment of the present application also provides a computer program product, including a computer program or instructions. When the computer program or instructions are executed by a processor, the processor is caused to implement the steps of the method described in Fig. 2.

Claims (11)

1. A training method for a data-to-text generation model, applied to a first server, comprising:
acquiring first training data, the first training data comprising first structured data and a target text corresponding to the first structured data;
acquiring a predicted text output by a first preset neural network model after the first structured data is input into the first preset neural network model;
acquiring a first loss value between the predicted text and the target text;
acquiring a second loss value between predicted structured data and the first structured data, the predicted structured data being structured data obtained by converting the predicted text based on a preset conversion algorithm, the preset conversion algorithm being used to convert text information into structured data;
determining a target loss value according to the first loss value and the second loss value;
adjusting parameters of the first preset neural network model according to the target loss value to obtain a target neural network model; and
sending the target neural network model to a target device.

2. The method according to claim 1, wherein the target device is a second server.

3. The method according to claim 1, wherein the target device is a terminal device.

4. The method according to any one of claims 1 to 3, further comprising:
receiving the first preset neural network model sent by the target device.

5. The method according to claim 4, wherein the first training data further comprises a target order of each data item in the first structured data, the target order being the arrangement order, in the target text, of the text corresponding to each data item;
correspondingly, the method further comprises:
acquiring a predicted order of each data item in the first structured data output by the first preset neural network model after the first structured data is input into the first preset neural network model, the predicted order being the arrangement order, in the predicted text, of the text corresponding to each data item; and
determining a third loss value according to the predicted order and the target order;
correspondingly, determining the target loss value according to the first loss value and the second loss value comprises:
determining the target loss value according to the first loss value, the second loss value, and the third loss value.

6. The method according to claim 5, wherein the target loss value is equal to the sum of the first loss value, the second loss value, and the third loss value.

7. The method according to claim 6, further comprising:
acquiring M1 pieces of second structured data, the M1 pieces of second structured data comprising at least two of the following data types: tabular data, SQL query data, and logical data, M1 being a positive integer greater than 1;
preprocessing the M1 pieces of second structured data to obtain M1 pieces of second training data in one-to-one correspondence with the M1 pieces of second structured data, wherein the second training data corresponding to the j-th piece of second structured data comprises the j-th piece of second structured data and the target text corresponding to the j-th piece of second structured data, j being a positive integer ranging from 1 to M1; and
training a second preset neural network model using the M1 pieces of second training data to obtain the first preset neural network model.

8. The method according to claim 7, wherein the second preset neural network model comprises N1 encoders and N2 decoders, the N1 encoders being used to obtain a feature vector of each of the M1 pieces of structured data, the N2 decoders being used to predict, based on the feature vector of each piece of structured data, the predicted text corresponding to that piece of structured data, N1 and N2 being positive integers greater than 1.

9. A training device for a data-to-text generation model, comprising:
an acquisition module, configured to acquire first training data, the first training data comprising first structured data and a target text corresponding to the first structured data;
the acquisition module being further configured to acquire a predicted text output by a first preset neural network model after the first structured data is input into the first preset neural network model;
the acquisition module being further configured to acquire a first loss value between the predicted text and the target text;
the acquisition module being further configured to acquire a second loss value between predicted structured data and the first structured data, the predicted structured data being structured data obtained by converting the predicted text based on a preset conversion algorithm, the preset conversion algorithm being used to convert text information into structured data;
a processing module, configured to determine a target loss value according to the first loss value and the second loss value;
the processing module being further configured to adjust parameters of the first preset neural network model according to the target loss value to obtain a target neural network model; and
the processing module being further configured to send the target neural network model to a target device.

10. A training device for data-to-text generation, comprising a processor, the processor being used to call a computer program from a memory and, when the computer program is executed, to execute the method according to any one of claims 1 to 8.

11. A computer-readable storage medium for storing a computer program, the computer program comprising code for executing the method according to any one of claims 1 to 8.
CN202210589921.0A 2022-05-26 2022-05-26 Training method and device for data-to-text generation model Active CN115017178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210589921.0A CN115017178B (en) 2022-05-26 2022-05-26 Training method and device for data-to-text generation model


Publications (2)

Publication Number Publication Date
CN115017178A CN115017178A (en) 2022-09-06
CN115017178B true CN115017178B (en) 2024-10-29

Family

ID=83071680


Country Status (1)

Country Link
CN (1) CN115017178B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115499187B (en) * 2022-09-13 2024-12-31 国网智能电网研究院有限公司 API safety monitoring model training method, monitoring method, device and equipment
CN115774995B (en) * 2022-12-02 2026-02-27 东南大学 A Keyword Generation Method Based on Optimal Transmission Theory
CN115796125B (en) * 2023-02-08 2023-05-05 阿里巴巴达摩院(杭州)科技有限公司 Text generation method, model training method and device
CN116469111B (en) * 2023-06-08 2023-09-15 江西师范大学 Character generation model training method and target character generation method
CN117093874A (en) * 2023-07-05 2023-11-21 中国银行股份有限公司 Text generation methods, devices, computer equipment, media and program products
CN116603249B (en) * 2023-07-19 2023-10-03 深圳须弥云图空间科技有限公司 Training method of large language model applied to role playing reasoning game
CN118246408B (en) * 2024-05-28 2024-08-27 珠海金山办公软件有限公司 Data generation method, device, electronic device and storage medium
CN120235194B (en) * 2025-05-30 2025-08-26 数据堂(北京)科技股份有限公司 Data enhancement method for large model training

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709248A (en) * 2020-05-28 2020-09-25 北京百度网讯科技有限公司 Training method, device and electronic device for text generation model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12248556B2 (en) * 2021-06-23 2025-03-11 Intel Corporation Authenticator-integrated generative adversarial network (GAN) for secure deepfake generation


Also Published As

Publication number Publication date
CN115017178A (en) 2022-09-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant