CN115017178A - Training method and device for data-to-text generation model

Info

Publication number: CN115017178A
Authority: CN (China)
Prior art keywords: data, text, structured data, loss value, target
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202210589921.0A
Other languages: Chinese (zh)
Inventors: 耿瑞莹 (Geng Ruiying), 石翔 (Shi Xiang), 黎槟华 (Li Binhua), 孙健 (Sun Jian), 李永彬 (Li Yongbin)
Current assignee: Alibaba China Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd
Priority to CN202210589921.0A
Publication of CN115017178A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; database structures therefor; file system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/24: Querying
    • G06F 16/242: Query formulation
    • G06F 16/2433: Query languages
    • G06F 16/25: Integrating or interfacing systems involving database management systems
    • G06F 16/258: Data format conversion from or to a database
    • G06F 16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284: Relational databases
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science
  • Theoretical Computer Science
  • Databases & Information Systems
  • Physics & Mathematics
  • Data Mining & Analysis
  • General Engineering & Computer Science
  • General Physics & Mathematics
  • Mathematical Physics
  • Computational Linguistics
  • Biomedical Technology
  • Artificial Intelligence
  • Biophysics
  • Life Sciences & Earth Sciences
  • Evolutionary Computation
  • General Health & Medical Sciences
  • Molecular Biology
  • Computing Systems
  • Health & Medical Sciences
  • Software Systems
  • Machine Translation

Abstract

The embodiment of the application provides a training method and device for a data-to-text generation model. The training method comprises the following steps: acquiring first training data, wherein the first training data comprises first structured data and a target text corresponding to the first structured data; acquiring a predicted text output by a first preset neural network model after the first structured data is input into the first preset neural network model; acquiring a first loss value between the predicted text and the target text; acquiring a second loss value between predicted structured data and the first structured data, wherein the predicted structured data is obtained by converting the predicted text back into structured data; determining a target loss value according to the first loss value and the second loss value; adjusting parameters of the first preset neural network model according to the target loss value to obtain a target neural network model; and sending the target neural network model to a target device. According to the technical scheme, the semantic fidelity between the generated text and the input structured data can be improved.

Description

Training method and device for data-to-text generation model
Technical Field
The application relates to the field of artificial intelligence, in particular to a training method and device for a data-to-text generation model.
Background
The data-to-text generation task is one of the important research tasks for text generation, and the goal thereof is to automatically generate relevant descriptive text from the input structured data.
Currently, text is typically generated from structured data based on an end-to-end generative model. The end-to-end model can be regarded as a black box: given the graph structure corresponding to the structured data as input, it generates the text information corresponding to that input.
However, when generating text from data based on an end-to-end generation model, the generated text is often not semantically faithful to the input data. Therefore, how to improve the semantic fidelity between the generated text and the input structured data has become an urgent technical problem to be solved.
Disclosure of Invention
The application provides a training method and device for a data-to-text generation model, which can improve the semantic fidelity of generated texts and input structured data.
In a first aspect, the present application provides a training method for a data-to-text generation model, including: acquiring first training data, wherein the first training data comprises first structured data and a target text corresponding to the first structured data; acquiring a predicted text output by a first preset neural network model after the first structured data is input into the first preset neural network model; acquiring a first loss value between the predicted text and the target text; acquiring a second loss value between predicted structured data and the first structured data, wherein the predicted structured data is structured data obtained after the predicted text is converted based on a preset conversion algorithm, and the preset conversion algorithm is used for converting text information into structured data; determining a target loss value according to the first loss value and the second loss value; adjusting parameters of the first preset neural network model according to the target loss value to obtain a target neural network model; and sending the target neural network model to a target device.
In this embodiment, the parameters of the first preset neural network model are adjusted based on the target loss value, that is, the parameters of the preset neural network model are adjusted by optimizing the first loss value and the second loss value simultaneously.
It should be understood that the first loss value reflects the degree of deviation between the current predicted text and the target text, while the second loss value reflects the fidelity between the predicted text and the first structured data. Compared with adjusting the parameters of the first preset neural network model by considering only the deviation between the predicted text and the target text, this embodiment adds the task of optimizing the fidelity between the predicted text and the first structured data. Therefore, in the process of adjusting the parameters of the first preset neural network model based on the target loss value, the semantic fidelity between the text predicted by the trained target neural network model and the input structured data can be improved.
With reference to the first aspect, in a possible implementation manner, the target device is a second server.
With reference to the first aspect, in a possible implementation manner, the target device is a terminal device.
With reference to the first aspect, in a possible implementation manner, the method further includes: and receiving the first preset neural network model sent by the target equipment.
With reference to the first aspect, in a possible implementation manner, the first training data further includes a target sequence of each data in the first structured data, where the target sequence is an arrangement sequence of a text corresponding to each data in the target text; accordingly, the method further comprises: acquiring a prediction sequence of each data in the first structured data output by a first preset neural network model after the first structured data are input into the first preset neural network model, wherein the prediction sequence is an arrangement sequence of a text corresponding to each data in the predicted text; determining a third loss value according to the prediction sequence and the target sequence; accordingly, the determining a target loss value from the first loss value and the second loss value comprises: determining a target loss value according to the first loss value, the second loss value and the third loss value.
In this implementation, a training task for optimizing the loss value between the predicted position and the target position of each datum in the first structured data is additionally introduced, so that the fluency of the text output by the model after the parameters of the first preset neural network model are adjusted can be improved.
With reference to the first aspect, in a possible implementation manner, the target loss value is equal to a sum of the first loss value, the second loss value, and the third loss value.
With reference to the first aspect, in a possible implementation manner, the method further includes: obtaining M1 second structured data, where the M1 second structured data include at least two types of data among table data, Structured Query Language (SQL) data and logic data, and M1 is a positive integer greater than 1; preprocessing the M1 second structured data to obtain M1 second training data corresponding to the M1 second structured data one to one, where the second training data corresponding to the jth second structured data in the M1 second training data includes: the jth second structured data and a target text corresponding to the jth second structured data, j is a positive integer and j is taken from 1 to M1; and training a second preset neural network model by using the M1 second training data to obtain the first preset neural network model.
In this embodiment, the first preset neural network model is obtained by training the second preset neural network model based on training data corresponding to various kinds of structured data. It can be understood that the first preset neural network model obtained by the implementation is trained based on training data corresponding to various kinds of structured data, and therefore, the obtained first preset neural network model already has partial capability of generating text from the structured data.
With reference to the first aspect, in a possible implementation manner, the second preset neural network model includes N1 encoders and N2 decoders, where the N1 encoders are configured to obtain a feature vector of each of the M1 structured data, the N2 decoders are configured to predict a predicted text corresponding to each of the structured data based on the feature vector of each of the structured data, and N1 and N2 are positive integers greater than 1.
In a second aspect, the present application provides a training apparatus for a data-to-text generation model, comprising: an acquisition module, configured to acquire first training data, where the first training data includes first structured data and a target text corresponding to the first structured data; the obtaining module is further configured to obtain a predicted text output by a first preset neural network model after the first structured data is input into the first preset neural network model; the obtaining module is further configured to obtain a first loss value between the predicted text and the target text; the obtaining module is further configured to obtain a second loss value between predicted structured data and the first structured data, where the predicted structured data is structured data obtained after the predicted text is converted based on a preset conversion algorithm, and the preset conversion algorithm is used to convert text information into structured data; a processing module, configured to determine a target loss value according to the first loss value and the second loss value; the processing module is further configured to adjust parameters of the first preset neural network model according to the target loss value to obtain a target neural network model; and the processing module is further configured to send the target neural network model to a target device.
With reference to the second aspect, in a possible implementation manner, the target device is a second server.
With reference to the second aspect, in a possible implementation manner, the target device is a terminal device.
With reference to the second aspect, in a possible implementation manner, the processing module is further configured to: receive the first preset neural network model sent by the target device.
With reference to the second aspect, in a possible implementation manner, the first training data further includes a target sequence of each data in the first structured data, where the target sequence is an arrangement sequence of a text corresponding to each data in the target text; correspondingly, the obtaining module is further configured to: obtain a prediction sequence of each data in the first structured data output by the first preset neural network model after the first structured data are input into the first preset neural network model, where the prediction sequence is an arrangement sequence of a text corresponding to each data in the predicted text; the processing module is further configured to determine a third loss value according to the prediction sequence and the target sequence, and determine a target loss value according to the first loss value, the second loss value, and the third loss value.
With reference to the second aspect, in one possible implementation manner, the target loss value is equal to a sum of the first loss value, the second loss value, and the third loss value.
With reference to the second aspect, in a possible implementation manner, the obtaining module is further configured to: obtain M1 second structured data, where the M1 second structured data include at least two types of data among table data, Structured Query Language (SQL) data and logic data, and M1 is a positive integer greater than 1; the processing module is further configured to: preprocess the M1 second structured data to obtain M1 second training data corresponding to the M1 second structured data one to one, where the second training data corresponding to the jth second structured data in the M1 second training data includes: the jth second structured data and a target text corresponding to the jth second structured data, j is a positive integer and j is taken from 1 to M1; the processing module is further configured to: train a second preset neural network model by using the M1 second training data to obtain the first preset neural network model.
With reference to the second aspect, in a possible implementation manner, the second preset neural network model includes N1 encoders and N2 decoders, where the N1 encoders are configured to obtain a feature vector of each of the M1 structured data, the N2 decoders are configured to predict a predicted text corresponding to each of the structured data based on the feature vector of each of the structured data, and N1 and N2 are positive integers greater than 1.
In a third aspect, there is provided a training apparatus for a data-to-text generative model, comprising a processor configured to invoke a computer program from a memory, and when the computer program is executed, the processor is configured to perform the method of the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, a communication device is provided, which includes the apparatus in the second aspect or any one of the possible implementation manners.
In a fifth aspect, a computer-readable storage medium is provided for storing a computer program comprising code for performing the method of the first aspect or any one of the possible implementations of the first aspect.
Drawings
Fig. 1 is a schematic structural diagram of an application scenario provided in an embodiment of the present application;
FIG. 2 is a schematic flowchart of a training method for generating a model from data to text according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a graph-to-text-based pre-training model according to an embodiment of the present application;
FIG. 4 is a diagram illustrating a transformation of first structured data into a graph structure according to an embodiment of the present application;
FIG. 5 is a schematic diagram of converting first structured data into graph structures according to another embodiment of the present application;
FIG. 6 is a schematic structural diagram of a training method for generating a model from data to text according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an encoder according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a training apparatus for generating a model from data to text according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a training apparatus for generating a model from data to text according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only a few embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For ease of understanding, several terms referred to in the embodiments of the present application will be first introduced.
1. Text generation
Text generation is a very important but challenging task in the field of natural language processing, aiming at generating reasonable and readable natural language text from input data (e.g., sequences and keywords). Representative applications include dialog systems, text summarization and machine translation.
2. Neural network
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
A key technology of today's artificial intelligence is Neural Networks (NN). Neural networks widely interconnect a large number of simple processing units (called neurons) by simulating human brain neural cell connections, forming a complex network system.
A simple neural network comprises three kinds of layers, namely an input layer, an output layer and one or more hidden layers (also called intermediate layers); each connection between layers carries a weight (whose value is called a weight or parameter). Neural networks perform excellently in fields such as computer vision and natural language processing; the weights are adjusted through a training algorithm so that the prediction result of the neural network becomes optimal.
The training of a neural network generally involves two computational steps: a forward computation and a backward computation. In the forward computation, an input value and the parameters are combined and passed through a nonlinear function to produce an output value; this output is either the final output of the network or serves as the input to subsequent similar calculations. The deviation between the network's output value and the actual label of the corresponding sample is measured by a loss function, expressed as a function f(x, w) of an input sample x and the network parameters w. To minimize the loss function, the parameters w must be adjusted continually, and the backward computation is used to obtain the update for w. In gradient-descent-based algorithms, the backward computation starts from the last layer of the neural network and computes the partial derivatives of the loss function with respect to each layer's parameters, finally obtaining the partial derivatives of all parameters, i.e., the gradient. At each iteration, the parameters w are updated in the direction opposite to the gradient by a step size η to obtain new parameters w, which completes one training step. The update procedure is given by the following equation:
$$w_{t+1} = w_t - \eta \cdot \frac{1}{|B_t|} \sum_{x \in B_t} \nabla_w f(x, w_t)$$

where $w_t$ denotes the parameters used at the $t$-th iteration, $w_{t+1}$ denotes the updated parameters, $\eta$ is called the learning rate, and $B_t$ is the set of samples input at the $t$-th iteration.
The process of training the neural network is a process of learning the weights corresponding to the neurons, and the final purpose is to obtain the weights corresponding to each layer of neurons of the trained neural network.
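As a minimal illustration only, the update rule above can be sketched as follows in Python; the helper names and the per-sample gradient function are assumptions for illustration, not part of this application:

```python
import numpy as np

def sgd_step(w, batch, grad_fn, eta=0.01):
    """One mini-batch gradient-descent step: w_{t+1} = w_t - eta * g_t,
    where g_t averages the per-sample gradients grad_fn(x, w) over the
    batch B_t (an assumed interface for illustration)."""
    g = np.mean([grad_fn(x, w) for x in batch], axis=0)
    return w - eta * g
```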
In recent years, with the rapid development of artificial intelligence technology, text generation has become an important technology in natural language processing. A user can generate a text sequence satisfying a specific target from given information by using a text generation model. Text generation models have rich application scenarios, such as reading comprehension, man-machine conversation and intelligent writing. Within text generation technology, the data-to-text generation task is one of the important research tasks, and its goal is to automatically generate relevant descriptive text from input structured data. The structured data is, for example, Table data, Structured Query Language (SQL) data, or Logic data. For the concepts related to Table data, SQL data and Logic data, reference may be made to the description in the related art, which is not repeated here.
Exemplarily, fig. 1 is a schematic structural diagram of an application scenario provided in an embodiment of the present application. As shown in fig. 1, in the application scenario, the training server 101 may train the structured data in the structured database using a classification algorithm, so as to obtain a trained model, and then the training server 101 may send the trained model to the target device 102, so that the target device 102 may generate text information corresponding to new sample data using the trained model when receiving the new sample data.
It should be noted that the embodiment of the present application does not limit the specific generation task type in the application scenario. For example, the text generation system may complete a generation task from Table data to text, also referred to as Table-to-Text generation; or a generation task from Structured Query Language (SQL) data to text, also referred to as SQL-to-Text generation; or a generation task from Logic data to text, also referred to as Logic-to-Text generation; or a generation task from response natural language generation (ResponseNLG) data to text.
Generally, in the application scenario shown in fig. 1, when the training server 101 performs training based on the structured data in the structured database, the preset model used is an autoregressive-based generation model, for example, a GPT model or a T5 model is used to generate text from the data. The GPT model or the T5 model may be considered as a black box model that, given input data, generates text information corresponding to the data.
However, when text is generated from data based on the GPT model or the T5 model, semantic drift generally arises between the generated text and the original input data, resulting in low accuracy of the generated text.
In view of this, the present application provides a method and an apparatus for training a data-to-text generation model. According to the training method of the data-to-text generation model, when the preset neural network model is trained, besides the original training task, other training tasks are additionally introduced, wherein the additional training tasks are used for enabling the text output by the preset neural network model to be closer to the input structured data, so that the semantic fidelity of the generated text and the real-time structured data is improved when the trained neural network model predicts the real-time structured data.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
FIG. 2 is a schematic flowchart of a training method of a data-to-text generative model according to an embodiment of the present application. As shown in fig. 2, the method of this embodiment may include S201, S202, S203, S204, S205, S206, and S207, and the method may be performed by the training server 101 shown in fig. 1.
It should be noted that, in this embodiment, the training server 101 is also referred to as a first server.
S201, obtaining first training data, wherein the first training data comprises first structural data and a target text corresponding to the first structural data.
It should be understood that, if a certain neural network model is to be trained, the first step is to acquire samples, i.e., training data, for training the neural network model.
In this embodiment, the first training data includes first structured data and a target text corresponding to the first structured data.
It should be noted that the embodiment of the present application does not limit the specific type of the trained neural network model. For example, a neural network model based on the type of Table (Table) data to Text (Text) is trained, i.e., a neural network model for Table-to-Text; or training a Structured Query Language (SQL) data-to-Text type neural network model, i.e., a neural network model for SQL-to-Text; or training a Logic (Logic) data to Text (Text) type neural network model, namely the neural network model for Logic-to-Text; still alternatively, it may be a type of neural network model that is trained to reply to natural language generated (ResponseNLG) data to Text (Text).
It should be understood that when the neural network model to be trained is Table-to-Text, the first structured data included in the first training data refers to Table data; when it is SQL-to-Text, the first structured data refers to SQL data; when it is Logic-to-Text, the first structured data is Logic data; and when the task is generating text from response natural language generation (ResponseNLG) data, the first structured data refers to response natural language generation data, which includes Table data and SQL data.
In this embodiment, the first training data includes a target text corresponding to the first structured data. It should be understood that the target text is used to indicate that, after the first structured data is input to the neural network to be trained, the target value output by the neural network to be trained is desired, i.e., the target text can be considered as the ideal text information to be output.
S202, acquiring a predicted text output by a first preset neural network model after the first structured data is input into the first preset neural network model.
It should be noted that, in this embodiment, a specific form of the first predetermined neural network model is not limited. For example, the first pre-set neural network model may be a model that has been trained on some large data set, or may be a redesigned model. It should also be understood that the first preset neural network model in the present embodiment refers to a model capable of outputting text information.
In general, when certain data is input to a neural network, the neural network model may output a predicted value corresponding to the certain data. Therefore, in the present embodiment, when the first structured data is input to the first preset neural network model, the first preset neural network model may output the predicted text.
It should be noted that in the field of text generation, in order to obtain a model for a specific type quickly, some models (also called pre-trained models) that have been trained on a large data set are usually used as basic models. It will be appreciated that when using a pre-trained model as a base model, it is desirable to unify the training data used to train a particular type of model with the base model.
In one embodiment, the pre-training model used is the Graph (Graph) to Text (Text) based model shown in FIG. 3, also referred to as the Graph-to-Text model. As shown in fig. 3, the Graph-to-Text model includes an encoder and a decoder. The encoder is used for acquiring a feature vector C corresponding to certain image structure data, and the decoder is used for predicting a predicted text corresponding to the certain structured data based on the feature vector C.
It should be understood that if the pre-training model used in the present embodiment is a Graph (Graph) -to-Text (Text) -based model, also referred to as Graph-to-Text, the first structured data needs to be pre-processed to convert the first structured data into a Graph structure.
Illustratively, when the first structured data is SQL data, for example "Select offering year, offering purpose Where latest price < 10", fig. 4 is a schematic diagram for converting the first structured data into a graph structure according to an embodiment of the present application. As shown in FIG. 4, the first node of the graph structure is a root node; the root node is connected to a selection (Select) node and a condition (Where) node; the Select node is connected to two 'AGG: none' nodes, where one 'AGG: none' node is connected to the 'offering year' column node and the other 'AGG: none' node is connected to the 'offering purpose' column node; the Where node is connected to an operation 'Op: <' node, and the 'Op: <' node is connected to the 'latest price' node and the value node 10.
Exemplarily, when the first structured data is Table data shown in Table 1:
TABLE 1. Table data

    Offering year    Offering purpose
    2016             Financing other assets
    2016             Project financing
Fig. 5 is a schematic diagram of converting first structured data into a graph structure according to another embodiment of the present application. As shown in fig. 5, the converted graph structure includes 5 nodes: the column node 'offering year' is connected to the value node '2016'; the column node 'offering purpose' is connected to the value nodes 'financing other assets' and 'project financing'; the node '2016' is connected to the nodes 'financing other assets' and 'project financing'; and the nodes 'financing other assets' and 'project financing' are connected to each other.
It should be noted that the diagram structures shown in fig. 4 and 5 are merely examples, and do not limit the present application.
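As a rough sketch of such a conversion (the node labels follow the translated examples above; representing the graph as node and edge lists is an assumption for illustration, not the application's prescribed format):

```python
def sql_example_graph():
    """Builds the graph of FIG. 4 as (nodes, edges) lists; a sketch only."""
    nodes = ["Root", "Select", "Where",
             "AGG:none#1", "AGG:none#2",
             "offering year", "offering purpose",
             "Op:<", "latest price", "10"]
    edges = [("Root", "Select"), ("Root", "Where"),
             ("Select", "AGG:none#1"), ("Select", "AGG:none#2"),
             ("AGG:none#1", "offering year"),
             ("AGG:none#2", "offering purpose"),
             ("Where", "Op:<"),
             ("Op:<", "latest price"), ("Op:<", "10")]
    return nodes, edges
```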
S203, a first loss value between the predicted text and the target text is obtained.
The predicted text refers to the text information output by the first preset neural network model after the first structured data is input into it. The target text is the ideal text that the preset neural network to be trained is expected to output after the first structured data is input into it.
It should be understood that in training the neural network, the smaller the deviation between the target text and the predicted text, the better. Therefore, in this embodiment, after the target text and the predicted text are obtained, a first loss value between the predicted text and the target text may be calculated first. It will be appreciated that the first loss value reflects the degree of deviation between the current predicted text and the target text.
And S204, acquiring a second loss value between the predicted structured data and the first structured data, wherein the predicted structured data is the structured data obtained after the predicted text is converted based on a preset conversion algorithm, and the preset conversion algorithm is used for converting the text information into the structured data.
It will be appreciated that when generating the predicted text based on the first structured data using the neural network model, the predicted text should be semantically as faithful to the first structured data as possible, in other words, the predicted text is required to be semantically as close as possible to the first structured data.
Therefore, in the training process, after the predicted text is obtained, the predicted text is converted into structured data using the preset conversion algorithm, and the converted structured data is then compared with the first structured data, that is, the second loss value between the predicted structured data and the first structured data is calculated. It will be appreciated that the second loss value may reflect the fidelity between the predicted text and the first structured data.
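A minimal sketch of this comparison, assuming the structured data can be flattened into comparable (column, value) items and using an F1-style overlap as the loss (the application does not prescribe this particular form):

```python
def second_loss(predicted_text, first_structured_data, convert_text_to_data):
    """Convert the predicted text back into structured data with a preset
    conversion algorithm, then score its disagreement with the input.
    The overlap-based loss below is an illustrative assumption."""
    predicted_data = set(convert_text_to_data(predicted_text))
    target_data = set(first_structured_data)
    overlap = len(predicted_data & target_data)
    precision = overlap / max(len(predicted_data), 1)
    recall = overlap / max(len(target_data), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-8)
    return 1.0 - f1  # 0 when fully faithful
```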
And S205, determining a target loss value according to the first loss value and the second loss value.
In one possible implementation, the target loss value is a sum of the first loss value and the second loss value.
In another possible implementation, the target loss value is a weighted sum of the first loss value and the second loss value.
S206, adjusting parameters of the first preset neural network model according to the target loss value to obtain a target neural network model.
In one possible embodiment, the first predetermined neural network model is a neural network model transmitted by the target device.
The target device is, for example, a terminal device or a server (also referred to as a second server in this embodiment) that deploys a first preset neural network model.
It should be noted that, in this embodiment, the specific form of the parameter in the first preset neural network model is not limited. For example, the first predetermined neural network model is an initialization model (i.e., it can be considered that the parameters in the first predetermined neural network model have not been trained), or the first predetermined neural network model is a trained model (i.e., it can be considered that the parameters in the first predetermined neural network model have been trained).
In this embodiment, the parameters of the first preset neural network model are adjusted based on the target loss value, that is, the first loss value and the second loss value are optimized simultaneously to adjust the parameters of the preset network model. It should be understood that the first loss value reflects the degree of deviation between the current predicted text and the target text, while the second loss value reflects the fidelity between the predicted text and the first structured data. Compared with adjusting the parameters of the first preset neural network model by considering only the deviation between the predicted text and the target text, this embodiment adds the task of optimizing the fidelity between the predicted text and the first structured data. Therefore, in the process of adjusting the parameters of the first preset neural network model based on the target loss value, the semantic fidelity between the text predicted by the trained target neural network model and the input structured data can be improved.
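Putting S202 through S206 together, one training step might look like the following sketch. All names here are assumptions; in particular, how gradients flow through the text-to-data conversion is glossed over and would need a differentiable or reinforcement-style formulation in practice:

```python
import torch

def train_step(model, optimizer, first_structured_data, target_text,
               text_loss_fn, convert_text_to_data, data_loss_fn, lam=1.0):
    """One optimization step over a single training sample (sketch)."""
    predicted_text = model(first_structured_data)           # S202
    l1 = text_loss_fn(predicted_text, target_text)          # S203
    predicted_data = convert_text_to_data(predicted_text)   # S204
    l2 = data_loss_fn(predicted_data, first_structured_data)
    target_loss = l1 + lam * l2  # S205: weighted sum (lam=1 gives the plain sum)
    optimizer.zero_grad()
    target_loss.backward()                                  # S206
    optimizer.step()
    return target_loss.item()
```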
And S207, sending the target neural network model to the target equipment.
Illustratively, the target device is a server (also referred to as a second server in this embodiment) that can deploy the neural network model, or a terminal device that can deploy the neural network model.
It should be understood that when the target device is a second server, if the target neural network model is to be used to generate the text information, the terminal device may first send structured data (referred to as new sample data, for example) that needs to generate the text information to the second server, and then the second server generates the text information of the new sample data. That is, in this scenario, the terminal device may obtain the text information corresponding to the new sample data by accessing the second server.
It should be understood that when the target device is a terminal device, if the target neural network model is to be used to generate text information, the terminal device may directly input structured data (referred to as new sample data, for example) that needs to generate the text information into the target neural network model, and then generate the text information of the new sample data.
As an optional embodiment, on the basis of the embodiment shown in fig. 2, in a possible implementation manner, the first training data in the present application may further include a target sequence of each data in the first structured data, where the target sequence is an arrangement sequence of texts corresponding to each data in the target text; correspondingly, the training method of the present application further comprises: acquiring a prediction sequence of each data in the first structured data output by the first preset neural network model after the first structured data are input into the first preset neural network model, wherein the prediction sequence is an arrangement sequence of texts corresponding to each data in the predicted texts; determining a third loss value according to the prediction sequence and the target sequence; accordingly, determining a target loss value from the first loss value and the second loss value includes: and determining a target loss value according to the first loss value, the second loss value and the third loss value.
It should be understood that the first structured data typically includes a plurality of data. For example, table data may include several attributes, such as name information, birth date information and occupation information.
It should also be understood that, in general, when training a generative model from structured data to text, the generated text should be as fluent as possible, and whether the generated text is fluent is related to the order in which each datum of the structured data appears in the generated text. For example, when generating text from the above table data including name information, birth date information and occupation information, describing the name first, then the birth date, and finally the occupation reads naturally, whereas another ordering may make the generated text confusing.
Thus, in this embodiment, the target order of each of the first structured data is also included in the first training data. The target sequence refers to an arrangement sequence of texts corresponding to each data in the target text, namely, the target sequence describes an ideal position of each data in the first structured data when the text is generated.
Then, during specific training, this embodiment obtains the prediction order of each datum in the first structured data, output by the first preset neural network model after the first structured data is input into it. The prediction order can be considered as the arrangement order, under the current parameters of the first preset neural network model, of the text corresponding to each datum in the text predicted by the model (referred to as the predicted text); that is, the prediction order describes the predicted position information of each datum in the first structured data.
It should be understood that the smaller the deviation between the predicted order and the target order, the more smooth the generated text is. Therefore, in this embodiment, after the prediction order is obtained, the deviation between the prediction order and the target order is calculated, that is, the third loss value is determined according to the prediction order and the target order; correspondingly, after the third loss value is determined, the target loss value is determined according to the first loss value, the second loss value and the third loss value, and then the target loss value is used for adjusting the parameters of the first preset neural network model. Specifically, when the parameter of the first predetermined neural network model is adjusted according to the target loss value, the minimized target loss value may be used as the optimization target.
In this implementation, in the process of training the first preset neural network model based on the first structured data, in addition to the training task additionally introduced in the embodiment shown in fig. 2, which optimizes the loss value between the predicted structured data corresponding to the predicted text and the first structured data, a training task of optimizing the loss value between the predicted position and the target position of each datum in the first structured data is also introduced, so that the fluency of the text output by the model after adjusting the parameters of the first preset neural network model can be further improved.
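As an illustrative sketch of the third loss value, content planning can be treated as position classification: for each datum the model outputs a score distribution over output positions, compared against the target order with cross entropy. This concrete loss form is an assumption, not prescribed by the application:

```python
import torch
import torch.nn.functional as F

def third_loss(position_logits, target_order):
    """position_logits: [num_data, num_positions] scores predicted for
    each datum; target_order: list of target position indices. A sketch
    of one possible loss between prediction order and target order."""
    return F.cross_entropy(position_logits,
                           torch.as_tensor(target_order, dtype=torch.long))
```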
It can be appreciated that the neural network model has made significant progress in text generation research, with the advantage that the neural network model can learn semantic mappings of input data to output text end-to-end without human involvement for feature engineering. However, neural network models tend to have a large number of parameters, and most text-generating task data sets are very small.
In view of this, when training a model for a specific text generation task, this embodiment may first pre-train the neural network model using structured data from multiple types of text generation tasks. For example, M1 second structured data may be obtained first, where the M1 second structured data include at least two types of data among table data, Structured Query Language (SQL) data and logic data, and M1 is a positive integer greater than 1. Then, the obtained M1 second structured data are preprocessed to obtain M1 second training data corresponding to the M1 second structured data one to one, where the second training data corresponding to the jth second structured data in the M1 second training data includes: the jth second structured data and the target text corresponding to the jth second structured data. Finally, the second preset neural network model is trained using the M1 second training data to obtain the first preset neural network model.
In other words, in this embodiment, the first preset neural network model is obtained by training the second preset neural network model based on training data corresponding to various kinds of structured data. It can be understood that the first preset neural network model obtained by the implementation is trained based on training data corresponding to various kinds of structured data, and therefore, the obtained first preset neural network model already has partial capability of generating text from the structured data.
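A sketch of assembling the M1 pretraining pairs; the helper functions `to_graph` and `to_text` are assumed stand-ins for the preprocessing that unifies table, SQL and logic data into the model's input format:

```python
def build_second_training_data(second_structured_data, to_graph, to_text):
    """For j = 1..M1, pair the jth second structured datum (converted to
    the unified graph input format) with its target text. A sketch only."""
    training_pairs = []
    for record in second_structured_data:
        training_pairs.append((to_graph(record), to_text(record)))
    return training_pairs
```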
For ease of understanding, the following describes a detailed implementation of a training method for a data-to-text generation model provided in the embodiments of the present application.
Fig. 6 shows the training method for a data-to-text generation model in this implementation manner according to an embodiment of the present application. As shown in FIG. 6, the first preset model used comprises an encoder and a decoder: the encoder encodes the input structured data into a feature vector C, after which C is input into the decoder, and the decoder outputs the predicted text based on C. In addition, in this embodiment the first preset model further includes a text planning module and an optimal transport module. The text planning module is used to predict the position, in the generated text, of each datum included in the structured data, and to determine a third loss value, denoted L_planning, from the predicted position and the ideal position of each datum in the generated text. The optimal transport module is used to calculate a second loss value between the structured data corresponding to the generated text and the input structured data, denoted L_OT. It should be understood that this embodiment further includes a first loss value, which is the loss value between the predicted text generated by the decoder and the ideal text, denoted L_LM.
In one implementation, L_OT may be equal to the sum of two kinds of terms: one term represents the semantic distance between the structured data corresponding to the generated text and all of the data in the input structured data, while the other terms represent the semantic distances between the structured data corresponding to the generated text and, respectively, the columns whose column names need to appear in the text and the columns whose column names need not appear.
In the specific optimization, the target loss value to be optimized may be L = L_LM + L_OT + L_planning; the parameters in the first preset model are then adjusted based on this target loss value.
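The following sketch shows the shape of such a combination; the greedy nearest-token cost below is only an illustrative stand-in for a true optimal-transport distance, which would typically be solved or approximated (e.g. with Sinkhorn iterations):

```python
import torch
import torch.nn.functional as F

def ot_loss_sketch(data_embeddings, text_embeddings):
    """Match each embedded datum to its nearest text token by cosine
    distance and average the costs; a stand-in for L_OT."""
    d = F.normalize(data_embeddings, dim=-1)
    t = F.normalize(text_embeddings, dim=-1)
    cost = 1.0 - d @ t.T                  # cosine distance matrix
    return cost.min(dim=1).values.mean()  # cheapest assignment per datum

def target_loss(l_lm, l_ot, l_planning):
    return l_lm + l_ot + l_planning       # L = L_LM + L_OT + L_planning
```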
More specifically, for the first preset model shown in fig. 6, since the data received by the encoder in the first preset model is a graph structure, the structured data should be converted into the graph structure before being input to the first preset model. For the description of converting the structured data into the graph structure, reference may be made to the description in the foregoing embodiments of the present application, and details are not repeated here.
On the basis of fig. 6, the specific structures of the encoder and the decoder are not limited in the embodiments of the present application. For example, the encoder described in this embodiment includes N1 encoder modules. As shown in fig. 7, the encoder consists of N1 encoder modules, where each encoder module comprises a Transformer encoder block, followed by a regularization module, followed by a graph attention network module. The concepts related to the Transformer encoder block, the regularization module and the graph attention network module can be found in the related art and are not repeated here.
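A sketch of one such encoder module; the dimensions, LayerNorm as the regularization module, and masked multi-head attention as the graph attention network are all assumptions for illustration:

```python
import torch
from torch import nn

class EncoderModule(nn.Module):
    """One of the N1 encoder modules of fig. 7: a Transformer encoder
    block, a regularization module, then a graph attention step that
    only attends along edges of the input graph (sketch)."""

    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.transformer_block = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.graph_attention = nn.MultiheadAttention(
            d_model, n_heads, batch_first=True)

    def forward(self, x, adjacency):
        # x: [batch, num_nodes, d_model]; adjacency: [num_nodes, num_nodes]
        h = self.norm(self.transformer_block(x))
        mask = adjacency == 0  # True blocks attention between non-neighbours
        out, _ = self.graph_attention(h, h, h, attn_mask=mask)
        return out
```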
Fig. 8 is a schematic structural diagram of a training apparatus 800 for generating a model from data to text according to an embodiment of the present application. As shown in fig. 8, the apparatus 800 includes: an acquisition module 801 and a processing module 802.
The acquiring module 801 is configured to acquire first training data, where the first training data includes first structured data and a target text corresponding to the first structured data; the obtaining module 801 is further configured to obtain a predicted text output by a first preset neural network model after the first structured data is input into the first preset neural network model; the obtaining module 801 is further configured to obtain a first loss value between the predicted text and the target text; the obtaining module 801 is further configured to obtain a second loss value between the predicted structured data and the first structured data, where the predicted structured data is the structured data obtained after the predicted text is converted based on a preset conversion algorithm, and the preset conversion algorithm is used to convert text information into structured data; the processing module 802 is configured to determine a target loss value according to the first loss value and the second loss value; the processing module 802 is further configured to adjust parameters of the first preset neural network model according to the target loss value to obtain a target neural network model; and the processing module 802 is further configured to send the target neural network model to a target device.
In one possible implementation, the target device is a second server.
In one possible implementation, the target device is a terminal device.
In one possible implementation, the processing module 802 is further configured to: and receiving the first preset neural network model sent by the target equipment.
In a possible implementation manner, the first training data further includes a target sequence of each data in the first structured data, where the target sequence is an arrangement sequence of a text corresponding to each data in the target text; accordingly, the obtaining module 801 is further configured to: obtain a prediction sequence of each data in the first structured data output by the first preset neural network model after the first structured data are input into the first preset neural network model, where the prediction sequence is an arrangement sequence of a text corresponding to each data in the predicted text; the processing module 802 is further configured to determine a third loss value according to the prediction sequence and the target sequence, and determine a target loss value according to the first loss value, the second loss value, and the third loss value.
In one possible implementation, the target loss value is equal to a sum of the first loss value, the second loss value, and the third loss value.
In a possible implementation manner, the obtaining module 801 is further configured to: obtain M1 second structured data, where the M1 second structured data include at least two types of data among table data, Structured Query Language (SQL) data and logic data, and M1 is a positive integer greater than 1; the processing module 802 is further configured to: preprocess the M1 second structured data to obtain M1 second training data corresponding to the M1 second structured data one to one, where the second training data corresponding to the jth second structured data in the M1 second training data includes: the jth second structured data and a target text corresponding to the jth second structured data, j is a positive integer and j is taken from 1 to M1; the processing module 802 is further configured to: train a second preset neural network model by using the M1 second training data to obtain the first preset neural network model.
In a possible implementation manner, the second preset neural network model includes N1 encoders and N2 decoders, the N1 encoders are configured to obtain a feature vector of each of the M1 structured data, the N2 decoders are configured to predict a predicted text corresponding to each of the structured data based on the feature vector of each of the structured data, and N1 and N2 are positive integers greater than 1.
Fig. 9 is a schematic structural diagram of a training apparatus 900 for generating a model from data to text according to an embodiment of the present application. The apparatus 900 is configured to perform the method described above.
The apparatus 900 includes a processor 910, and the processor 910 is configured to execute the computer program or instructions stored in the memory 920, or read data stored in the memory 920, so as to execute the method in the above method embodiments. Optionally, the processor 910 is one or more.
Optionally, as shown in fig. 9, the apparatus 900 further comprises a memory 920, and the memory 920 is used for storing computer programs or instructions and/or data. The memory 920 may be integrated with the processor 910 or may be provided separately. Optionally, there are one or more of the memories 920.
Optionally, as shown in fig. 9, the apparatus 900 further comprises a communication interface 930, and the communication interface 930 is used for receiving and/or transmitting signals. For example, processor 910 is configured to control communication interface 930 to receive and/or transmit signals.
Optionally, the apparatus 900 is configured to implement the operations described in the above method embodiments.
For example, processor 910 is configured to execute computer programs or instructions stored in memory 920 to implement the relevant operations described in the various method embodiments above. For example, processor 910 may be configured to: acquire first training data, where the first training data includes first structured data and a target text corresponding to the first structured data; acquire a predicted text output by a first preset neural network model after the first structured data is input into the first preset neural network model; acquire a first loss value between the predicted text and the target text; acquire a second loss value between predicted structured data and the first structured data, where the predicted structured data is structured data obtained after the predicted text is converted based on a preset conversion algorithm, and the preset conversion algorithm is used to convert text information into structured data; determine a target loss value according to the first loss value and the second loss value; adjust parameters of the first preset neural network model according to the target loss value to obtain a target neural network model; and send the target neural network model to a target device.
In some examples, the target device is a second server.
In some examples, the target device is a terminal device.
In some examples, the processor 910 is further configured to receive the first preset neural network model sent by the target device.
In some examples, the processor 910 is further configured to: when the first training data further includes a target sequence of each data item in the first structured data, where the target sequence is the order in which the text corresponding to each data item is arranged in the target text, acquire a predicted sequence of each data item in the first structured data output by the first preset neural network model after the first structured data is input into the model, where the predicted sequence is the order in which the text corresponding to each data item is arranged in the predicted text; and determine a third loss value according to the predicted sequence and the target sequence. Accordingly, determining a target loss value according to the first loss value and the second loss value includes: determining the target loss value according to the first loss value, the second loss value and the third loss value.
In some examples, the target loss value is equal to a sum of the first loss value, the second loss value, and the third loss value.
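No formula is given for the third loss. One hedged illustration is a pairwise order-violation rate over the data fields, comparing the predicted sequence with the target sequence; the function below and its inputs are hypothetical.

```python
def order_loss(pred_text: str, target_text: str, fields: list[str]) -> float:
    """Sketch of the third loss: the fraction of field pairs whose
    relative order in the predicted text differs from their relative
    order in the target text (lower is better)."""
    pred_pos = [pred_text.find(f) for f in fields]
    tgt_pos = [target_text.find(f) for f in fields]
    violations, pairs = 0, 0
    for i in range(len(fields)):
        for j in range(i + 1, len(fields)):
            if -1 in (pred_pos[i], pred_pos[j], tgt_pos[i], tgt_pos[j]):
                continue  # skip fields missing from either text
            pairs += 1
            if (pred_pos[i] < pred_pos[j]) != (tgt_pos[i] < tgt_pos[j]):
                violations += 1
    return violations / pairs if pairs else 0.0

# Per the sum rule above: target_loss = first_loss + second_loss + third_loss
```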
In some examples, the processor 910 is further configured to: obtain M1 second structured data, where the M1 second structured data include at least two of the following types of data: table data, structured query language (SQL) data, and logical data, and M1 is a positive integer greater than 1; preprocess the M1 second structured data to obtain M1 second training data in one-to-one correspondence with the M1 second structured data, where the second training data corresponding to the jth second structured data among the M1 second training data include: the jth second structured data and a target text corresponding to the jth second structured data, where j is a positive integer ranging from 1 to M1; and train a second preset neural network model by using the M1 second training data to obtain the first preset neural network model.
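The preprocessing of the heterogeneous second structured data is likewise not spelled out. The sketch below shows one plausible linearization into (input, target text) pairs; the [TABLE]/[SQL]/[LOGIC] markers and the dictionary layout are assumptions, not the patent's format.

```python
def linearize(second_structured: dict) -> str:
    """Flatten one of the heterogeneous second structured data items
    (table, SQL, or logical data) into a token sequence the encoders
    can consume. The markers and layout here are assumptions."""
    kind, payload = second_structured["type"], second_structured["payload"]
    if kind == "table":
        cells = " | ".join(f"{k}: {v}" for k, v in payload.items())
        return f"[TABLE] {cells}"
    if kind == "sql":
        return f"[SQL] {payload}"    # e.g. "SELECT name FROM city WHERE ..."
    if kind == "logic":
        return f"[LOGIC] {payload}"  # e.g. "argmax(city, population)"
    raise ValueError(f"unknown structured data type: {kind}")

def build_training_pair(second_structured: dict, target_text: str) -> dict:
    # Each of the M1 second training data pairs the j-th structured
    # input with its corresponding target text.
    return {"input": linearize(second_structured), "target": target_text}
```

For example, build_training_pair({"type": "sql", "payload": "SELECT name FROM city"}, "The query returns the city names.") yields one of the M1 pretraining pairs.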
In some examples, the second preset neural network model includes N1 encoders and N2 decoders, where the N1 encoders are configured to obtain a feature vector of each of the M1 second structured data, the N2 decoders are configured to predict, based on the feature vector of each second structured data, the predicted text corresponding to that second structured data, and N1 and N2 are positive integers greater than 1.
In the embodiments of the present application, the processor is a circuit having signal processing capability. In one implementation, the processor may be a circuit having instruction reading and executing capability, such as a central processing unit (CPU), a microprocessor, a graphics processing unit (GPU, which can also be understood as a kind of microprocessor), or a digital signal processor (DSP). In another implementation, the processor may implement certain functions through the logical relationships of hardware circuits, which may be fixed or reconfigurable, for example, a processor implemented as an application-specific integrated circuit (ASIC) or a programmable logic device (PLD) such as a field-programmable gate array (FPGA). For a reconfigurable hardware circuit, the process in which the processor loads a configuration file to configure the hardware circuit may be understood as the processor loading instructions to implement the functions of some or all of the above units. Furthermore, a hardware circuit designed for artificial intelligence may also be understood as an ASIC, such as a neural-network processing unit (NPU), a tensor processing unit (TPU), or a data processing unit (DPU).
It can be seen that each unit in the above apparatus may be one or more processors (or processing circuits) configured to implement the above methods, for example a CPU, GPU, NPU, TPU, DPU, microprocessor, DSP, ASIC, FPGA, or a combination of at least two of these processor forms.
In addition, all or some of the units in the above apparatus may be integrated together or implemented independently. In one implementation, these units are integrated together and implemented in the form of a system-on-chip (SoC). The SoC may include at least one processor for implementing any of the above methods or for implementing the functions of the units of the apparatus, and the at least one processor may be of different types, for example, a CPU and an FPGA, a CPU and an artificial intelligence processor, or a CPU and a GPU.
Accordingly, embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps of the method described in fig. 2.
Accordingly, embodiments of the present application further provide a computer program product including computer programs/instructions which, when executed by a processor, cause the processor to implement the steps of the method described in fig. 2.

Claims (11)

1. A training method for a data-to-text generation model, applied to a first server, the method comprising:
acquiring first training data, wherein the first training data comprises first structured data and a target text corresponding to the first structured data;
acquiring a predicted text output by a first preset neural network model after the first structured data is input into the first preset neural network model;
acquiring a first loss value between the predicted text and the target text;
acquiring a second loss value between predicted structured data and the first structured data, wherein the predicted structured data is structured data obtained after the predicted text is converted based on a preset conversion algorithm, and the preset conversion algorithm is used for converting text information into structured data;
determining a target loss value according to the first loss value and the second loss value;
adjusting parameters of the first preset neural network model according to the target loss value to obtain a target neural network model;
and sending the target neural network model to a target device.
2. The method of claim 1, wherein the target device is a second server.
3. The method of claim 1, wherein the target device is a terminal device.
4. The method according to any one of claims 1 to 3, further comprising:
receiving the first preset neural network model sent by the target device.
5. The method according to claim 4, wherein the first training data further comprises a target sequence of each data item in the first structured data, the target sequence being the order in which the text corresponding to each data item is arranged in the target text;
accordingly, the method further comprises:
obtaining a predicted sequence of each data item in the first structured data output by the first preset neural network model after the first structured data is input into the first preset neural network model, wherein the predicted sequence is the order in which the text corresponding to each data item is arranged in the predicted text;
determining a third loss value according to the predicted sequence and the target sequence;
accordingly, the determining a target loss value according to the first loss value and the second loss value comprises:
determining a target loss value according to the first loss value, the second loss value and the third loss value.
6. The method of claim 5, wherein the target loss value is equal to a sum of the first loss value, the second loss value, and the third loss value.
7. The method of claim 6, further comprising: obtaining M1 second structured data, wherein the M1 second structured data comprise at least two of the following types of data: table data, structured query language (SQL) data, and logical data, and M1 is a positive integer greater than 1;
preprocessing the M1 second structured data to obtain M1 second training data in one-to-one correspondence with the M1 second structured data, wherein the second training data corresponding to the jth second structured data among the M1 second training data comprise: the jth second structured data and a target text corresponding to the jth second structured data, j being a positive integer ranging from 1 to M1;
and training a second preset neural network model by using the M1 second training data to obtain the first preset neural network model.
8. The method of claim 7, wherein the second preset neural network model comprises N1 encoders and N2 decoders, the N1 encoders are configured to obtain a feature vector of each of the M1 second structured data, the N2 decoders are configured to predict, based on the feature vector of each second structured data, a predicted text corresponding to that second structured data, and N1 and N2 are positive integers greater than 1.
9. An apparatus for training a data-to-text generative model, comprising:
an obtaining module, configured to acquire first training data, wherein the first training data comprises first structured data and a target text corresponding to the first structured data;
the obtaining module is further configured to obtain a predicted text output by a first preset neural network model after the first structured data is input into the first preset neural network model;
the obtaining module is further configured to obtain a first loss value between the predicted text and the target text;
the obtaining module is further configured to obtain a second loss value between predicted structured data and the first structured data, where the predicted structured data is structured data obtained after the predicted text is converted based on a preset conversion algorithm, and the preset conversion algorithm is used to convert text information into structured data;
a processing module, configured to determine a target loss value according to the first loss value and the second loss value;
the processing module is further configured to adjust parameters of the first preset neural network model according to the target loss value to obtain a target neural network model;
the processing module is further configured to send the target neural network model to a target device.
10. A training apparatus for a data-to-text generation model, comprising a processor, wherein the processor is configured to invoke a computer program from a memory and, when the computer program is executed, perform the method of any one of claims 1 to 8.
11. A computer-readable storage medium storing a computer program comprising code for performing the method of any one of claims 1 to 8.
CN202210589921.0A 2022-05-26 2022-05-26 Training method and device for data-to-text generation model Pending CN115017178A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210589921.0A CN115017178A (en) 2022-05-26 2022-05-26 Training method and device for data-to-text generation model

Publications (1)

Publication Number Publication Date
CN115017178A (en)

Family

ID=83071680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210589921.0A Pending CN115017178A (en) 2022-05-26 2022-05-26 Training method and device for data-to-text generation model

Country Status (1)

Country Link
CN (1) CN115017178A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709248A (en) * 2020-05-28 2020-09-25 北京百度网讯科技有限公司 Training method and device of text generation model and electronic equipment
US20210319090A1 (en) * 2021-06-23 2021-10-14 Intel Corporation Authenticator-integrated generative adversarial network (gan) for secure deepfake generation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI, CY et al., "Learning Hierarchical Reasoning for Text-Based Visual Question Answering", Artificial Neural Networks and Machine Learning - ICANN 2021, vol. 12893, 17 September 2021, pages 305-316, XP047607128, DOI: 10.1007/978-3-030-86365-4_25 *
XU, Xiaohong et al., "Data-to-text generation method combining Transformer model and deep neural network", Journal of Chongqing University, vol. 43, no. 07, 31 December 2019, pages 91-100 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115796125A (en) * 2023-02-08 2023-03-14 阿里巴巴达摩院(杭州)科技有限公司 Text generation method, model training method and device
CN116469111A (en) * 2023-06-08 2023-07-21 江西师范大学 Character generation model training method and target character generation method
CN116469111B (en) * 2023-06-08 2023-09-15 江西师范大学 Character generation model training method and target character generation method
CN116603249A (en) * 2023-07-19 2023-08-18 深圳须弥云图空间科技有限公司 Training method of large language model applied to role playing reasoning game
CN116603249B (en) * 2023-07-19 2023-10-03 深圳须弥云图空间科技有限公司 Training method of large language model applied to role playing reasoning game
CN118246408A (en) * 2024-05-28 2024-06-25 珠海金山办公软件有限公司 Data generation method, device, electronic equipment and storage medium
CN118246408B (en) * 2024-05-28 2024-08-27 珠海金山办公软件有限公司 Data generation method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112487182B (en) Training method of text processing model, text processing method and device
KR102180002B1 (en) Attention-based sequence transformation neural network
CN115017178A (en) Training method and device for data-to-text generation model
US20230196202A1 (en) System and method for automatic building of learning machines using learning machines
CN111797589B (en) Text processing network, neural network training method and related equipment
CN111709493B (en) Object classification method, training device, object classification equipment and storage medium
CN110659678B (en) User behavior classification method, system and storage medium
CN110781686B (en) Statement similarity calculation method and device and computer equipment
CN112528634A (en) Text error correction model training and recognition method, device, equipment and storage medium
CN111723914A (en) Neural network architecture searching method based on convolution kernel prediction
CN111368545A (en) Named entity identification method and device based on multi-task learning
CN113157919B (en) Sentence text aspect-level emotion classification method and sentence text aspect-level emotion classification system
CN108959388A (en) information generating method and device
CN113516133A (en) Multi-modal image classification method and system
CN115860100A (en) Neural network model training method and device and computing equipment
CN116881641A (en) Pre-training model adjustment method and device, storage medium and computing equipment
Qi et al. Learning low resource consumption cnn through pruning and quantization
CN114490922B (en) Natural language understanding model training method and device
CN117634459A (en) Target content generation and model training method, device, system, equipment and medium
CN116797850A (en) Class increment image classification method based on knowledge distillation and consistency regularization
CN116738983A (en) Word embedding method, device and equipment for performing financial field task processing by model
KR20210103912A (en) Method and apparatus for trining neural network, method and apparatus for processing data using neural network
JP7199121B1 (en) Improved calculation graph
CN116957007A (en) Feature quantization method, device, medium and program product for neural network training
CN114595641A (en) Method and system for solving combined optimization problem

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination