CN114579606A - Pre-training model data processing method, electronic device and computer storage medium


Info

Publication number
CN114579606A
Authority
CN
China
Prior art keywords
training
natural language
sample
question
model
Prior art date
Legal status
Granted
Application number
CN202210478807.0A
Other languages
Chinese (zh)
Other versions
CN114579606B (en)
Inventor
惠彬原
黎槟华
李永彬
孙健
Current Assignee
Alibaba Damo Institute Hangzhou Technology Co Ltd
Original Assignee
Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Damo Institute Hangzhou Technology Co Ltd filed Critical Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority to CN202210478807.0A
Publication of CN114579606A
Application granted
Publication of CN114579606B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/243 Natural language query formulation
    • G06F16/2433 Query languages
    • G06F16/2445 Data retrieval commands; View definitions
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a pre-training model data processing method, an electronic device and a computer storage medium. The pre-training model data processing method includes: acquiring training sample data, where each piece of training sample data comprises multiple rounds of table question-answer training samples, and each round of table question-answer training sample comprises a natural language query statement and corresponding database schema data; inputting the training sample data into a pre-training model for feature extraction to obtain a plurality of sample features corresponding to the multiple rounds of table question-answer training samples; and training the pre-training model based on the plurality of sample features, their corresponding positive and negative example labels, and a preset contrastive learning loss function, where the positive and negative example labels are determined according to the similarity between the database query statements corresponding to the plurality of sample features and are used to characterize whether the current sample feature is semantically related to the other sample features among the plurality of sample features.

Description

Pre-training model data processing method, electronic device and computer storage medium
Technical Field
The embodiments of the present application relate to the technical field of table question answering, and in particular to a pre-training model data processing method, an electronic device and a computer storage medium.
Background
Because their data structure is clear and easy to maintain, table/SQL databases have become the most common form of structured data across industries and an important answer source for intelligent dialog systems, search engines, and the like. Traditional table queries must be written by professionals as query statements (such as SQL statements), and this high threshold hinders large-scale application. Table question answering (also known as TableQA) converts natural language directly into SQL queries, allowing users to interact with a table database using natural language, and is therefore increasingly widely used.
A table question-answering system mainly consists of three parts: natural language understanding, dialogue management, and natural language generation. The natural language understanding part mainly runs a semantic parsing algorithm that converts a natural language question into a corresponding executable SQL statement; the dialogue management part performs multi-round state tracking and policy optimization; and the natural language generation part generates a corresponding reply according to the parsed SQL statement and the SQL execution result. At present, the natural language understanding part of a table question-answering system is usually supported by the training output of a pre-training model. A pre-training model is an application of transfer learning: model parameters independent of any specific task are obtained from large-scale data through self-supervised learning, and when a new task is to be supported, the pre-training model only needs to be fine-tuned with the labeled data of that task.
However, since most studies address the single-round TableQA problem, existing pre-training models essentially solve the single-round case. In real scenarios, a user often needs multiple rounds of queries to obtain the expected answer, so the multi-round TableQA problem is receiving more and more attention, and how to obtain a pre-training model suited to this scenario has become an urgent problem to be solved.
Disclosure of Invention
In view of the above, embodiments of the present application provide a pre-training model data processing scheme to at least partially solve the above problems.
According to a first aspect of the embodiments of the present application, there is provided a pre-training model data processing method, including: acquiring training sample data, where each piece of training sample data comprises multiple rounds of table question-answer training samples, and each round of table question-answer training sample comprises a natural language query statement and corresponding database schema data; inputting the training sample data into a pre-training model for feature extraction to obtain a plurality of sample features corresponding to the multiple rounds of table question-answer training samples; and training the pre-training model based on the positive and negative example labels respectively corresponding to the plurality of sample features and a preset contrastive learning loss function, where the positive and negative example labels are determined according to the similarity between the database query statements corresponding to the plurality of sample features and are used to characterize whether the current sample feature is semantically related to the other sample features among the plurality of sample features.
According to a second aspect of the embodiments of the present application, there is provided an electronic device, including: a processor, a memory, a communication interface and a communication bus, where the processor, the memory and the communication interface communicate with each other through the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the method according to the first aspect.
According to a third aspect of embodiments herein, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, performs the method according to the first aspect.
According to a fourth aspect of embodiments herein, there is provided a computer program product comprising computer instructions for instructing a computing device to perform operations corresponding to the method according to the first aspect.
According to the pre-training model data processing scheme provided by the embodiments of the present application, first, the training sample data comprises multiple rounds of table question-answer training samples, so that the training sample data carries table question-answer context information, and a pre-training model trained on it can learn that context. Second, when the loss is computed, the model is trained based on the positive and negative example labels respectively corresponding to the sample features, so that semantically related table question-answer training samples can be clearly distinguished and the model can better learn their semantics and interrelations. Moreover, the loss function of the pre-training model takes the form of a contrastive learning loss; in this form, the training samples most relevant to the current training sample can be found among the multiple rounds of table question-answer training samples, that is, useful training samples are retained and useless ones are excluded, achieving a deep understanding of the context. Therefore, the trained pre-training model can be effectively applied to multi-round query scenarios. After it is migrated to a table question-answering system, that system can be effectively applied to real scenarios in which a user needs multiple rounds of queries to obtain a query result.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below cover only some of the embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings.
FIG. 1 is a schematic diagram of an exemplary system to which a pre-trained model data processing method of an embodiment of the present application is applicable;
FIG. 2 is a flowchart illustrating steps of a pre-training model data processing method according to a first embodiment of the present application;
FIG. 3A is a flowchart illustrating steps of a method for processing pre-training model data according to a second embodiment of the present application;
FIG. 3B is an exemplary diagram of an input example of a pre-trained model in the embodiment shown in FIG. 3A;
FIG. 3C is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 3A;
fig. 4 is a schematic structural diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
To enable those skilled in the art to better understand the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings of the embodiments of the present application. Obviously, the described embodiments are only a part, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application shall fall within the protection scope of the embodiments of the present application.
The following further describes specific implementations of embodiments of the present application with reference to the drawings of the embodiments of the present application.
FIG. 1 is a diagram illustrating an exemplary system to which a pre-training model data processing method according to an embodiment of the present application is applied. As shown in fig. 1, the system 100 may include a server 102, a communication network 104, and/or one or more user devices 106, illustrated in fig. 1 as a plurality of user devices.
Server 102 may be any suitable server for storing information, data, programs, and/or any other suitable type of content. In some embodiments, server 102 may perform any suitable functions. For example, in some embodiments, a table question-answering system is provided in server 102 to process user-entered query requests relating to tables or databases and return query results. As an optional example, in some embodiments, a pre-training model, which may also be referred to as a table pre-training model, is further provided in server 102 and is migrated to the table question-answering system for use after its training is completed. As an optional example, in some embodiments, the pre-training model in server 102 is trained based on multiple rounds of table question-answer training samples: after the sample features corresponding to the multiple rounds of table question-answer training samples are obtained, positive and negative example labels corresponding to the sample features are determined, which characterize the semantic correlation between the current sample feature and the other sample features; the pre-training model is then trained based on the plurality of sample features, the corresponding positive and negative example labels and a preset contrastive learning loss function, so that the trained pre-training model can deeply understand the context of a query statement, mine effective information and exclude useless information.
In some embodiments, the communication network 104 may be any suitable combination of one or more wired and/or wireless networks. For example, the communication network 104 can include any one or more of the following: the internet, an intranet, a Wide Area Network (WAN), a Local Area Network (LAN), a wireless network, a Digital Subscriber Line (DSL) network, a frame relay network, an Asynchronous Transfer Mode (ATM) network, a Virtual Private Network (VPN), and/or any other suitable communication network. The user device 106 can be connected to the communication network 104 by one or more communication links (e.g., communication link 112), and the communication network 104 can be linked to the server 102 via one or more communication links (e.g., communication link 114). The communication link may be any communication link suitable for communicating data between the user device 106 and the server 102, such as a network link, a dial-up link, a wireless link, a hardwired link, any other suitable communication link, or any suitable combination of such links.
User devices 106 may include any one or more user devices with settings and interfaces for interacting with a user. In some embodiments, user devices 106 may comprise any suitable type of device. For example, in some embodiments, user device 106 may include a mobile device, a tablet computer, a laptop computer, a desktop computer, a wearable computer, a game console, a media player, a vehicle entertainment system, and/or any other suitable type of user device.
Although server 102 is illustrated as one device, in some embodiments, any suitable number of devices may be used to perform the functions performed by server 102. For example, in some embodiments, multiple devices may be used to implement the functions performed by the server 102. Alternatively, the functionality of the server 102 may be implemented using a cloud service.
Based on the above system, the embodiment of the present application provides a pre-training model data processing scheme, which is described in detail below through a plurality of embodiments.
Example one
Referring to fig. 2, a flowchart illustrating steps of a pre-training model data processing method according to a first embodiment of the present application is shown.
The pre-training model data processing method of the embodiment comprises the following steps:
step S202: and acquiring training sample data.
In a table question-answering system, schema linking (the link between the natural language query statement and the database schema data) is one of the most important parts, so modeling needs to be based on both the natural language query statement and the database schema data. Because the pre-training model in the embodiments of the present application is subsequently used in a table question-answering system, each piece of training sample data used to train it comprises multiple rounds of table question-answer training samples, and each round of table question-answer training sample comprises a natural language query statement and corresponding database schema data. During training, each round among the multiple rounds of table question-answer training samples can in turn be taken as the currently processed sample and trained in combination with the other training samples in the multi-round group to which it belongs. It should be noted that in the embodiments of the present application, "multiple rounds" means a plurality of consecutive rounds, and the rounds of table question-answer training samples may all be related, all be unrelated, or be partially related and partially unrelated. Unless otherwise specified, "a plurality of" and "multiple" refer to two or more.
Each round of table question-answer training sample comprises two different parts: the natural language query statement and the database schema data. The natural language query statement can be data corresponding to historical user query requests, obtained with authorization to use the user data; alternatively, it can be a set consisting of data corresponding to some historical user query requests together with extended data generated from that data according to certain extension rules. Correspondingly, each natural language query statement corresponds to the database schema data of the database or data table it queries. Database schema data, also called the schema data of a database, is a set of interrelated database objects characterizing information such as tables, table columns, column data types, indexes and foreign keys in the database. In the embodiments of the present application, the database schema data mainly comprises the table names, column names and values of the data table. On this basis, each group consisting of a natural language query statement and its corresponding database schema data can be used as one (round of) table question-answer training sample.
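For illustration only, the Python sketch below shows one possible way to organize a round of table question-answer training sample and a multi-round group; the class names and field names are assumptions introduced here and are not part of the embodiments.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class TableQATurn:
        """One round of a table question-answer training sample (illustrative structure)."""
        question: str                     # natural language query statement
        table_name: str                   # database schema data: table name
        columns: List[str]                # database schema data: column names
        values: List[str] = field(default_factory=list)  # database schema data: cell values
        sql: str = ""                     # pre-generated database query statement, used for labeling

    @dataclass
    class MultiTurnSample:
        """One piece of training sample data: several consecutive rounds."""
        turns: List[TableQATurn]

    sample = MultiTurnSample(turns=[
        TableQATurn(
            question="Please tell me the names of the students whose height exceeds 180",
            table_name="student",
            columns=["name", "height", "gender"],
            sql="SELECT name FROM student WHERE height > 180",
        ),
    ])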
Step S204: inputting training sample data into a pre-training model for feature extraction to obtain a plurality of sample features corresponding to the multi-round table question-answer training samples.
In the embodiments of the present application, the pre-training model may output a feature representation, that is, a sample feature, corresponding to each item in a table question-answer training sample that includes a natural language query statement and database schema data. In one possible approach, the pre-training model may be implemented based on a Transformer structure. Illustratively, the table question-answer training samples can be encoded by the Transformer Encoder to generate encoding vectors, and the Transformer Decoder then decodes the representation corresponding to each item of the table question-answer training samples.
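As a non-limiting sketch of the feature-extraction step, the following encoder-only example maps the token ids of one concatenated multi-round sample to per-token sample features with a Transformer encoder; the vocabulary size, dimensions and the omission of the decoding step mentioned above are simplifying assumptions.

    import torch
    import torch.nn as nn

    class PretrainEncoder(nn.Module):
        """Toy Transformer encoder mapping token ids to per-token sample features (illustrative)."""
        def __init__(self, vocab_size: int = 30000, d_model: int = 256, n_heads: int = 4, n_layers: int = 2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            # token_ids: (batch, seq_len) -> features: (batch, seq_len, d_model)
            return self.encoder(self.embed(token_ids))

    features = PretrainEncoder()(torch.randint(0, 30000, (1, 64)))  # one concatenated multi-round sample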
Step S206: training the pre-training model based on the positive and negative example labels respectively corresponding to the plurality of sample features and a preset contrastive learning loss function.
The positive and negative example labels are determined according to the similarity between the plurality of database query statements corresponding to the plurality of sample features. The database query statements are generated from the natural language query statements in the training samples, so positive and negative example labels among the training samples can be determined based on the similarity among the database query statements corresponding to the natural language query statements of the multiple rounds of table question-answer training samples. Because of the correspondence between the sample features produced by the pre-training model and the training samples, the positive and negative example labels among the multiple rounds of training samples can be used as the positive and negative example labels among the corresponding sample features.
The positive and negative example labels are used to characterize whether the current sample feature is semantically related to the other sample features among the plurality of sample features. For example, suppose a piece of training sample data includes three rounds of table question-answer training samples whose natural language query statements are A, B and C, and whose corresponding database query statements, i.e., SQL statements, are A', B' and C'. Taking the natural language query statement A as the reference, if the semantic similarity between A' and C' is greater than a preset similarity threshold, and the semantic similarity between A' and B' is less than that threshold, then with respect to A, C is a positive example and B is a negative example. The preset similarity threshold may be set by a person skilled in the art according to actual requirements, which is not limited by the embodiments of the present application; illustratively, it may be 0.5. Similarly, if the natural language query statement C is taken as the reference, A is a positive example and B is a negative example with respect to C. If the natural language query statement B is taken as the reference, A and C are both negative examples with respect to B.
Further, if feature extraction is performed with the pre-training model and the sample features corresponding to the table question-answer training samples to which A, B and C belong are A1, B1 and C1 respectively, then, given the positive and negative example labels above, C1 is a positive example and B1 is a negative example with respect to A1; A1 is a positive example and B1 is a negative example with respect to C1; and both A1 and C1 are negative examples with respect to B1.
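A minimal sketch of deriving the positive and negative example labels from pairwise SQL similarities and the 0.5 threshold, reproducing the A/B/C example above, is given below; the function name and data layout are assumptions.

    from typing import Dict, List, Tuple

    def assign_labels(similarities: Dict[Tuple[str, str], float],
                      turns: List[str],
                      threshold: float = 0.5) -> Dict[str, Dict[str, int]]:
        """For each turn taken as the reference, mark every other turn as positive (1) or negative (0)."""
        labels: Dict[str, Dict[str, int]] = {}
        for anchor in turns:
            labels[anchor] = {}
            for other in turns:
                if other == anchor:
                    continue
                sim = similarities.get((anchor, other), similarities.get((other, anchor), 0.0))
                labels[anchor][other] = 1 if sim > threshold else 0
        return labels

    # Reproduces the A/B/C example: A' and C' are similar, the pairs involving B' are not
    labels = assign_labels({("A", "C"): 0.8, ("A", "B"): 0.2, ("B", "C"): 0.1}, ["A", "B", "C"])
    # labels["A"] == {"B": 0, "C": 1}; labels["B"] == {"A": 0, "C": 0}; labels["C"] == {"A": 1, "B": 0}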
In addition, in the embodiments of the present application, a contrastive learning loss function is used for training the pre-training model. The contrastive learning loss function is a loss function for dimension-reduction learning; it can learn a mapping under which points of the same category that are far apart in a high-dimensional space become closer after being mapped to a low-dimensional space, while points of different categories that are close become farther apart in the low-dimensional space. As a result, in the low-dimensional space points of the same kind cluster together and points of different kinds separate. By virtue of these characteristics, applying the contrastive learning loss function to the training of the pre-training model of the embodiments of the present application brings semantically related positive examples closer and pushes semantically unrelated negative examples farther apart, so that effective information is mined, useless information is excluded, and a deep understanding of the context is achieved.
On this basis, the corresponding loss value is obtained from the sample features, the positive and negative example labels corresponding to the sample features and the preset contrastive learning loss function. The pre-training model is then trained based on the loss value until a training termination condition is reached (e.g., a preset number of training iterations is reached or the loss value falls within a preset threshold range).
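The following loop sketches, under assumed optimizer and hyper-parameter choices, how training could proceed until the termination condition (a preset number of iterations or a loss value within a preset threshold range) is reached.

    import torch

    def train(model, batches, loss_fn, max_steps: int = 10000, loss_threshold: float = 0.01):
        """Illustrative loop: stops at max_steps or when the loss falls below the threshold."""
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
        for step, (token_ids, pos_mask) in enumerate(batches):
            features = model(token_ids)          # sample features of the multi-round table question-answer samples
            loss = loss_fn(features, pos_mask)   # preset contrastive learning loss with positive/negative labels
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if step + 1 >= max_steps or loss.item() < loss_threshold:
                break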
As can be seen, according to this embodiment, first, the training sample data comprises multiple rounds of table question-answer training samples, so that the training sample data carries table question-answer context information, and a pre-training model trained on it learns that context. Second, when the loss is computed, the model is trained based on the positive and negative example labels respectively corresponding to the sample features, so that semantically related table question-answer training samples can be clearly distinguished and the model can better learn their semantics and interrelations. Moreover, the loss function of the pre-training model takes the form of a contrastive learning loss; in this form, the training samples most relevant to the current training sample can be found among the multiple rounds of table question-answer training samples, that is, useful training samples are retained and useless ones are excluded, achieving a deep understanding of the context. Therefore, the trained pre-training model can be effectively applied to multi-round query scenarios. After it is migrated to a table question-answering system, that system can be effectively applied to real scenarios in which a user needs multiple rounds of queries to obtain a query result.
Example two
Referring to fig. 3A, a flowchart illustrating steps of a method for processing pre-training model data according to a second embodiment of the present application is shown.
The pre-training model data processing method of the embodiment comprises the following steps:
step S302: and acquiring training sample data.
Each round of table question-answer training sample comprises a natural language query statement, the corresponding database schema data, and the positive and negative example labels corresponding to that round of table question-answer training sample.
For example, assume the natural language query statement is "Please tell me the names of the students whose height exceeds 180", and the corresponding database schema data is "name, height, gender". This group of data can form one round of table question-answer training sample, i.e., "Please tell me the names of the students whose height exceeds 180 / name, height, gender". Multiple similar table question-answer training samples can form multiple rounds of table question-answer training samples.
In a feasible manner, the positive and negative example labels corresponding to each round of table question-answer training sample can be obtained as follows: acquiring a plurality of database query statements pre-generated for the multiple rounds of table question-answer training samples; calculating the similarity among the plurality of database query statements according to a preset Jaccard function; and determining the positive and negative example labels respectively corresponding to the multiple rounds of table question-answer training samples according to the similarity.
The pre-generated database query statements can be produced by manual labeling or similar means.
The more similar the SQL statements corresponding to two rounds of table question-answer training samples are, the more semantically related the two rounds are. Exemplarily, assuming that the SQL statements corresponding to the natural language query statements in two rounds of table question-answer training samples are A and B respectively, the Jaccard similarity between A and B can be calculated by the following formula:

J(A, B) = |A ∩ B| / |A ∪ B|

In the above formula, A is taken as the set of all elements in SQL statement A and B as the set of all elements in SQL statement B; J(A, B) is the ratio of the size of the intersection A ∩ B to the size of the union A ∪ B. The Jaccard similarity coefficient is an index for measuring the similarity of the sets A and B, i.e., it computes the similarity between the two sets, with each element counted as either present (1) or absent (0). Alternatively, a Jaccard similarity greater than 0.5 may be set as the threshold for judging semantic relatedness.
Determining the similarity among the SQL statements via the Jaccard function, and then determining the positive and negative example labels of the corresponding table question-answer training samples from that similarity, can be applied effectively even when the data of the multiple rounds of table question-answer training samples are sparse, a situation in which the traditional cosine similarity often produces misleading results. Moreover, the Jaccard measure makes comprehensive and effective use of the information of all the data in this scenario and therefore yields more accurate results. These positive and negative example labels can subsequently be used as the positive and negative example labels of the sample features corresponding to the table question-answer training samples.
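A small sketch of the Jaccard similarity over the element sets of two SQL statements, with 0.5 as the relatedness threshold mentioned above, is given below; treating whitespace-separated tokens as the elements of an SQL statement is an assumption made here for illustration.

    def jaccard_similarity(sql_a: str, sql_b: str) -> float:
        """J(A, B) = |A intersection B| / |A union B| over the element sets of two SQL statements."""
        a, b = set(sql_a.lower().split()), set(sql_b.lower().split())
        if not a and not b:
            return 0.0
        return len(a & b) / len(a | b)

    def semantically_related(sql_a: str, sql_b: str, threshold: float = 0.5) -> bool:
        """Positive example when the Jaccard similarity exceeds the preset 0.5 threshold."""
        return jaccard_similarity(sql_a, sql_b) > threshold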
Step S304: inputting training sample data into a pre-training model for feature extraction to obtain a plurality of sample features corresponding to the multi-round table question-answer training samples.
To make it easier for the pre-training model to process the training sample data, in a feasible manner of the embodiments of the present application, the natural language query statements in the multiple rounds of table question-answer training samples can first be spliced to obtain first spliced data; the database schema data in the multiple rounds of table question-answer training samples are then spliced to obtain second spliced data; the first spliced data and the second spliced data are spliced together, and separators are inserted between the spliced natural language query statements, between the spliced database schema data, and between the natural language query statements and the database schema data; a corresponding splicing vector is generated from the spliced data with separators inserted and input into the pre-training model, and feature extraction is performed by the pre-training model to obtain the plurality of sample features corresponding to the multiple rounds of table question-answer training samples. Adding the separators keeps the natural language query statements apart from each other, the database schema data apart from each other, and the natural language query statements apart from the database schema data, which facilitates subsequent identification and processing and improves the speed and efficiency of model training.
Illustratively, the input to the pre-training model is shown in FIG. 3B, which can be simply written as "[s] first-round table question-answer training sample [/s] second-round table question-answer training sample [/s] third-round table question-answer training sample [/s] schema item 1 [/s] schema item 2 [/s] schema item 3". In one specific example, assume the first-round table question-answer training sample is "Please tell me the names of the students whose height exceeds 180 / name, height, gender"; the second-round table question-answer training sample is "Who is the director of the animation Basketball Master / animation, director, name, Basketball Master"; and the third-round table question-answer training sample is "What other animations has this director directed / animation, director, name". In each of these three example table question-answer training samples, the part before "/" is the natural language query statement and the part after "/" is the corresponding database schema data. After the three rounds of table question-answer training samples are spliced and separators are inserted, the result is "[s] Please tell me the names of the students whose height exceeds 180 [/s] Who is the director of the animation Basketball Master [/s] What other animations has this director directed [/s] name [/s] height [/s] gender [/s] animation [/s] director [/s] name [/s] Basketball Master". In this example, [s] is the start symbol and [/s] is the separator. It should be noted that using [/s] as the separator and [s] as the start symbol is only exemplary; in practical applications, those skilled in the art may adopt other forms of separators and start symbols according to actual needs, and the embodiments of the present application do not limit the specific implementation form of the separator.
After the spliced data with separators added is obtained, it is converted into the corresponding vector form, namely the splicing vector, which can then be input into the pre-training model.
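For illustration, the sketch below performs the first splicing of the questions, the second splicing of the schema items, and the separator insertion described above, producing a string of the form shown in FIG. 3B; apart from the [s]/[/s] markers taken from the example, all names are assumptions.

    from typing import List, Tuple

    def build_input(turns: List[Tuple[str, List[str]]], start: str = "[s]", sep: str = "[/s]") -> str:
        """turns: (natural language question, schema items) for each consecutive round."""
        questions = [question for question, _ in turns]                   # first splicing: the query statements
        schema_items = [item for _, schema in turns for item in schema]   # second splicing: the schema data
        pieces = questions + schema_items
        return start + " " + (" " + sep + " ").join(pieces)               # separators between all spliced elements

    text = build_input([
        ("Please tell me the names of the students whose height exceeds 180", ["name", "height", "gender"]),
        ("Who is the director of the animation Basketball Master", ["animation", "director", "name", "Basketball Master"]),
    ])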
Step S306: and determining positive and negative example labels respectively corresponding to the plurality of sample characteristics.
As described above, the positive and negative example labels are determined according to the similarity between the plurality of database query statements corresponding to the plurality of sample features. In essence, because the plurality of sample features correspond one-to-one to the multiple rounds of table question-answer training samples, and the multiple rounds of table question-answer training samples correspond one-to-one to the plurality of database query statements, the database query statements corresponding to the training samples can be regarded as the database query statements corresponding to the sample features. Furthermore, the positive and negative example labels of the training samples, determined based on the similarity of the database query statements, can be carried over directly to the plurality of sample features as the positive and negative example labels respectively corresponding to them. That is, once the positive and negative example labels respectively corresponding to the multiple rounds of table question-answer training samples are determined, they can be used directly as the positive and negative example labels of the corresponding sample features.
The positive and negative example labels are used to characterize whether the current sample feature is semantically related to the other sample features among the plurality of sample features. With the positive and negative example labels, it can be determined whether a given sample feature is a positive example or a negative example with respect to the sample feature currently being processed: a positive example means the two sample features are semantically related, and a negative example means they are not.
Step S308: training the pre-training model based on the positive and negative example labels respectively corresponding to the plurality of sample features and a preset contrastive learning loss function.
With positive and negative example labels, for a sample feature, sample features of positive examples related to its semantics and/or negative examples unrelated to its semantics can be determined simply and quickly. Further, a loss value can be obtained by combining a preset loss function, and the pre-training model is trained based on the loss value.
In one possible approach, this step can be implemented as: reducing the distance between the sample features indicated as positive examples in the positive and negative example labels, and increasing the distance between the sample features indicated as negative examples, by minimizing a preset contrastive learning loss function; and training the pre-training model according to the loss value of the minimized contrastive learning loss function.
Illustratively, the contrastive learning loss function may be implemented as the following formula:

L = - Σᵢ log( exp(sim(hᵢ, h⁺) / τ) / Σⱼ exp(sim(hᵢ, hⱼ) / τ) )

where hᵢ represents the sample feature output by the pre-training model for the i-th table question-answer training sample, with 1 ≤ i ≤ N (N being the number of rounds of the multiple rounds of table question-answer training samples); sim represents the similarity function (dot product); exp represents the exponential function; hⱼ represents the sample feature output by the pre-training model for the j-th table question-answer training sample, with 1 ≤ j ≤ N, so that the denominator sums over both the positive examples h⁺ of h and the negative examples h⁻ of h; and τ represents the temperature coefficient, used for normalization.
By minimizing the contrastive learning loss function, the distance between positive examples is reduced and the distance between negative examples is increased, so that the training samples most relevant to the current training sample are found; that is, useful training samples are retained and useless ones are excluded, achieving a deep understanding of the context.
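A PyTorch-style sketch of a contrastive learning loss of the above form is given below, using the dot product as sim and τ as the temperature; the mask-based handling of positives and negatives is an implementation assumption.

    import torch

    def contrastive_loss(features: torch.Tensor, pos_mask: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
        """features: (N, d), one vector per round; pos_mask[i, j] = 1 if round j is a positive example of round i."""
        n = features.size(0)
        sim = features @ features.t() / tau                               # dot-product similarity scaled by temperature
        eye = torch.eye(n, dtype=torch.bool, device=features.device)
        sim = sim.masked_fill(eye, float("-inf"))                         # a round is neither its own positive nor negative
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)        # log-softmax over all other rounds
        pos_mask = pos_mask.float()
        pos_count = pos_mask.sum(dim=1).clamp(min=1.0)
        loss_per_round = -(log_prob * pos_mask).sum(dim=1) / pos_count    # average over the positives of each round
        has_pos = pos_mask.sum(dim=1) > 0
        return loss_per_round[has_pos].mean()                             # only rounds that have at least one positive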
The pre-training model is trained iteratively in this way until a model training termination condition is reached, such as a preset number of training iterations or a loss value within a preset threshold range. Once the termination condition is reached, model training is considered complete. The trained pre-training model can effectively understand the meaning of each natural language query statement; in particular, for a natural language query statement containing referring expressions, it can accurately resolve the references by combining the corresponding context, accurately understand the semantics of the whole statement, and output more accurate semantic features.
After the pre-training model training is completed, subsequent migration applications may be performed. For ease of understanding, the migration process is described in this embodiment through the following step S310, but it should be understood by those skilled in the art that the process up to the training of the pre-training model in step S308 already forms a complete solution, and the following step S310 is an optional step. In practical applications, step S308 and step S310 need not be executed consecutively; those skilled in the art can migrate the trained pre-training model to a table question-answering system at any time according to actual needs.
Step S310: performing model migration from the pre-training model to the table question-answering system based on the model parameters of the trained pre-training model.
In one possible approach, this step can be implemented as: migrating the model parameters of the trained pre-training model to the natural language understanding part of the table question-answering system, thereby performing model migration from the pre-training model to the table question-answering system. The natural language understanding part comprises a part that performs semantic understanding of the natural language query statement and a part that converts the features output by the semantic understanding part into an SQL statement. Specifically, the model parameters of the pre-training model can be migrated to the semantic understanding part and combined with the subsequent feature conversion part to realize the Text-to-SQL function of the natural language understanding part.
Because the pre-training model is trained for the table question-answering task, the model parameters it has learned can be transplanted directly into the semantic understanding part of the TableQA natural language understanding component. With the migrated model parameters, combined with the part that converts features into SQL statements, the natural language understanding part can not only perform semantic parsing on multiple rounds of query statements input in natural language, but also determine the semantically associated query statements for deep context understanding, and finally convert the query statements into accurate, executable SQL statements based on the semantic information shared by the semantically associated query statements.
Exemplarily, the natural language understanding part can be realized as a text-to-SQL model, specifically in the form of a seq2seq neural network model: multiple rounds of query statements are input, the semantically associated query statements are extracted, the semantically unassociated query statements are excluded, and the SQL statements corresponding to the semantically associated query statements are finally output. The semantic understanding part can adopt a model structure such as a Transformer.
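A sketch of the parameter migration described above follows, assuming the downstream text-to-SQL model exposes its semantic understanding part as an encoder attribute; loading with strict=False is likewise an assumption about how the parameter names line up.

    import torch

    def migrate(pretrained_model: torch.nn.Module, text_to_sql_model: torch.nn.Module) -> None:
        """Copy the trained pre-training model parameters into the semantic understanding part downstream."""
        state = pretrained_model.state_dict()
        # strict=False tolerates decoder/head parameters that exist only in the downstream model
        missing, unexpected = text_to_sql_model.encoder.load_state_dict(state, strict=False)
        print("downstream parameters not covered by the pre-training model:", missing)
        print("pre-training parameters not used downstream:", unexpected)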
After the model migration is completed, the natural language understanding part of TableQA is combined with the trained dialogue management part and natural language generation part to form a complete table question-answering system that realizes the corresponding table question-answering functions.
Hereinafter, the process of table question answering performed by the above table question-answering system is schematically described with reference to FIG. 3C in conjunction with optional step S312.
Step S312: receiving multiple rounds of natural language query requests input by a user, and returning corresponding query results through the table question-answering system.
In one possible approach, this step can be implemented as: receiving a current natural language query request input by a user, acquiring a preset number of historical natural language query requests before the current natural language query request, and generating multiple rounds of natural language query requests based on the current natural language query request and the historical natural language query requests; analyzing multiple rounds of natural language query requests through a natural language understanding part of a table question-answering system, and determining the natural language query requests which have semantic correlation with the current natural language query requests in the multiple rounds of natural language query requests; and generating a corresponding database query statement for the current natural language query request according to semantic correlation between the historical natural language query request and the current natural language query request.
Illustratively, as shown in FIG. 3C, assume that the user enters three rounds of natural language query requests. The user first asks "Which animations were directed by the famous directors Ben Jones or Brandon Vietti" (first round of natural language query request); the user may then ask other questions around this table, such as "Show me the premiere time of all the animations" (second round of natural language query request); in the third round, however, the user may continue the first-round question, for example "How long are the animations of the famous directors" (third round of natural language query request). In this case, the third round of natural language query request can only be parsed correctly if it is known that "the famous directors" refers to Ben Jones or Brandon Vietti. In a conversational session comprising the above three rounds of natural language query requests, the first and third rounds are strongly semantically related, while the second round is actually redundant information with respect to the third round. Therefore, the table question-answering system needs to be able to recognize the semantically related natural language query requests above and parse them correctly to generate accurate SQL statements.
On this basis, when the user inputs the query request "How long are the animations of the famous directors", all three rounds of natural language query requests are input into the table question-answering system. The natural language understanding part (which can be realized as a seq2seq model whose parameters are migrated from the model parameters of the pre-training model) then determines and processes the semantic relevance, removes the second round of natural language query request, determines based on the relevance between the first and third rounds that "the famous directors" in the third round refers to Ben Jones or Brandon Vietti, and on that basis converts the third round of natural language query request into the SQL statement: SELECT name, duration FROM animation WHERE director = "Ben Jones" OR director = "Brandon Vietti", where the WHERE clause supplements the information behind "the famous directors" in the third round of natural language query request according to the semantic correlation between the first and third rounds of natural language query requests. Therefore, the SQL statements generated by the conversion, especially the SQL statement corresponding to the third round of natural language query request, are more accurate because they are based on that semantic correlation.
Furthermore, the natural language generation part of the table question-answering system can access the corresponding database based on the SQL statements, obtain two query results satisfying the query conditions, corresponding respectively to the first and third rounds of natural language query requests, and then generate a reply to the two rounds of query requests based on the two query results, which can be fed back to the user.
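Purely for illustration, the following sketch strings the above steps together; every object and method name (nlu, db, nlg and their calls) is a placeholder for the corresponding system part rather than an actual interface of the embodiments.

    from typing import List

    HISTORY_WINDOW = 2  # preset number of historical natural language query requests to keep (assumed value)

    def answer(current_request: str, history: List[str], nlu, db, nlg) -> str:
        """Illustrative end-to-end flow for one user turn of the table question-answering system."""
        rounds = history[-HISTORY_WINDOW:] + [current_request]
        related = nlu.select_related(rounds, current_request)   # keep only semantically related rounds
        sql = nlu.to_sql(related, current_request)              # e.g. resolve "the famous directors" from round 1
        results = db.execute(sql)                               # access the corresponding database
        return nlg.generate_reply(results)                      # natural language reply fed back to the user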
As can be seen from the above, first, the training sample data comprises multiple rounds of table question-answer training samples, so that the training sample data carries table question-answer context information, and a pre-training model trained on it learns that context. Second, when the loss is computed, the model is trained based on the positive and negative example labels respectively corresponding to the sample features, so that semantically related table question-answer training samples can be clearly distinguished and the model can better learn their semantics and interrelations. Moreover, the loss function of the pre-training model takes the form of a contrastive learning loss; in this form, the training samples most relevant to the current training sample can be found among the multiple rounds of table question-answer training samples, that is, useful training samples are retained and useless ones are excluded, achieving a deep understanding of the context. Therefore, the trained pre-training model can be effectively applied to multi-round query scenarios. After it is migrated to a table question-answering system, that system can also be effectively applied to real scenarios in which a user needs multiple rounds of queries to obtain a query result, and can convert and generate more accurate SQL statements based on the semantic correlation among the natural language query requests, thereby obtaining more accurate query results.
EXAMPLE III
Referring to fig. 4, a schematic structural diagram of an electronic device according to a third embodiment of the present application is shown, and the specific embodiment of the present application does not limit a specific implementation of the electronic device.
As shown in fig. 4, the electronic device may include: a processor (processor)402, a Communications Interface 404, a memory 406, and a Communications bus 408.
Wherein:
the processor 402, communication interface 404, and memory 406 communicate with each other via a communication bus 408.
A communication interface 404 for communicating with other electronic devices or servers.
The processor 402 is configured to execute the program 410, and may specifically execute the relevant steps in the above-described pre-training model data processing method embodiment.
In particular, program 410 may include program code comprising computer operating instructions.
The processor 402 may be a CPU, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The device comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is used for storing a program 410. The memory 406 may comprise high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
The program 410 may be specifically configured to enable the processor 402 to execute operations corresponding to the pre-training model data processing method described in any of the method embodiments.
For specific implementation of each step in the program 410, reference may be made to corresponding steps and corresponding descriptions in units in the foregoing method embodiments, and corresponding beneficial effects are provided, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
The embodiment of the present application further provides a computer program product, which includes computer instructions for instructing a computing device to execute an operation corresponding to any one of the pre-training model data processing methods in the foregoing method embodiments.
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present application may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present application.
The above-described methods according to the embodiments of the present application may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the methods described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It is understood that a computer, processor, microprocessor controller, or programmable hardware includes storage components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code which, when accessed and executed by the computer, processor, or hardware, implements the methods described herein. Further, when a general-purpose computer accesses code for implementing the methods illustrated herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the methods illustrated herein.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The above embodiments are only used for illustrating the embodiments of the present application, and not for limiting the embodiments of the present application, and those skilled in the relevant art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also belong to the scope of the embodiments of the present application, and the scope of patent protection of the embodiments of the present application should be defined by the claims.

Claims (10)

1. A pre-training model data processing method comprises the following steps:
acquiring training sample data, wherein each piece of training sample data comprises multiple rounds of table question-answer training samples, and each round of table question-answer training sample comprises a natural language query statement and corresponding database schema data;
inputting the training sample data into a pre-training model for feature extraction to obtain a plurality of sample features corresponding to a plurality of rounds of table question-answering training samples;
training the pre-training model based on positive and negative example labels respectively corresponding to the sample features and a preset contrastive learning loss function, wherein the positive and negative example labels are determined according to the similarity between the database query statements corresponding to the sample features, and the positive and negative example labels are used for characterizing whether the current sample feature is semantically related to other sample features among the plurality of sample features.
2. The method of claim 1, wherein the positive and negative example labels respectively corresponding to the plurality of sample features are determined by:
obtaining a plurality of database query statements pre-generated for the multiple rounds of table question-answering training samples;
calculating the similarity among the plurality of database query statements according to a preset Jaccard function;
determining, according to the similarity, positive and negative example labels respectively corresponding to the multiple rounds of table question-answering training samples;
and using the positive and negative example labels respectively corresponding to the multiple rounds of table question-answering training samples as the positive and negative example labels corresponding to the plurality of sample features.
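Claim 2 derives the labels from the similarity of the SQL queries pre-generated for each round. Below is a minimal sketch assuming token-set Jaccard similarity and an illustrative threshold of 0.5; the patent does not fix a particular tokenization or threshold value.

```python
# Sketch of claim 2: positive/negative pair labels from Jaccard similarity of SQL queries.

def jaccard(sql_a: str, sql_b: str) -> float:
    """Jaccard similarity between the token sets of two SQL queries."""
    tokens_a, tokens_b = set(sql_a.lower().split()), set(sql_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def label_pairs(sql_queries: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Symmetric matrix: entry [i][j] is 1 (positive pair) if rounds i and j have
    sufficiently similar SQL, else 0 (negative pair); the diagonal is left at 0."""
    n = len(sql_queries)
    return [[1 if i != j and jaccard(sql_queries[i], sql_queries[j]) >= threshold else 0
             for j in range(n)] for i in range(n)]

# Example: rounds 0 and 1 share most SQL tokens, so they form a positive pair.
labels = label_pairs([
    "SELECT name FROM singer WHERE age > 30",
    "SELECT name FROM singer WHERE age > 40",
    "SELECT COUNT(*) FROM concert",
])
```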
3. The method of claim 1, wherein the training the pre-training model based on the positive and negative example labels respectively corresponding to the plurality of sample features and a preset contrastive learning loss function comprises:
reducing the distance between the sample features indicated as positive examples by the positive and negative example labels, and increasing the distance between the sample features indicated as negative examples by the positive and negative example labels, by minimizing the preset contrastive learning loss function;
and training the pre-training model according to the loss value of the minimized contrastive learning loss function.
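A minimal sketch of a supervised contrastive objective in the spirit of claim 3: it pulls positive-pair features closer and pushes negative-pair features apart. The InfoNCE-style formulation, the cosine similarity, and the temperature of 0.1 are assumptions; the patent does not disclose the exact form of its loss.

```python
# Sketch of a claim-3-style contrastive loss (assumed formulation, not the patent's).
import torch
import torch.nn.functional as F

def contrastive_loss(features: torch.Tensor, pos_mask: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """features: (n, d) sample features; pos_mask: (n, n) 0/1 positive-pair labels."""
    z = F.normalize(features, dim=-1)                         # unit-length sample features
    sim = z @ z.t() / temperature                             # pairwise similarity logits
    n = sim.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    exp_sim = torch.exp(sim).masked_fill(self_mask, 0.0)      # denominator excludes self-pairs
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True) + 1e-12)
    pos_mask = pos_mask.float().masked_fill(self_mask, 0.0)   # never treat self as a positive
    pos_count = pos_mask.sum(dim=1).clamp(min=1.0)            # avoid division by zero
    loss = -(pos_mask * log_prob).sum(dim=1) / pos_count      # average log-prob of positives
    return loss.mean()

# Usage with the labels from the claim 2 sketch:
# loss = contrastive_loss(features, torch.tensor(labels, dtype=torch.float32))
```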
4. The method of claim 1, wherein the inputting the training sample data into a pre-training model for feature extraction comprises:
performing a first concatenation on the natural language query statements in the multiple rounds of table question-answering training samples to obtain first concatenated data, and performing a second concatenation on the database schema data in the multiple rounds of table question-answering training samples to obtain second concatenated data;
concatenating the first concatenated data and the second concatenated data, and inserting separators between the concatenated natural language query statements, between the concatenated database schema data, and between the natural language query statements and the database schema data;
and generating a corresponding concatenated vector from the concatenated data into which the separators have been inserted, inputting the concatenated vector into the pre-training model, and performing feature extraction through the pre-training model.
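A minimal sketch of the input construction in claim 4, assuming "[SEP]" as the separator token and a "|" delimiter inside each schema description; the actual separators, ordering, and example content are illustrative choices, since the claim only requires that separators be inserted between the concatenated segments.

```python
# Sketch of claim 4: concatenate NL queries and schema data with separators.
SEP = "[SEP]"

def build_input(questions: list[str], schemas: list[str]) -> str:
    """Concatenate all NL queries, then all schema descriptions, inserting
    separators between queries, between schema items, and between the two parts."""
    query_part = f" {SEP} ".join(questions)       # first concatenation
    schema_part = f" {SEP} ".join(schemas)        # second concatenation
    return f"{query_part} {SEP} {schema_part}"    # joint sequence with a separator in between

# Example for a two-round sample over a hypothetical "singer" table:
text = build_input(
    ["Which singers are older than 30?", "And what are their countries?"],
    ["singer: name | age | country"],
)
# `text` would then be tokenized into the concatenated vector fed to the pre-training model.
```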
5. The method of any one of claims 1-4, wherein the method further comprises:
performing model migration from the pre-training model to a table question-answering system based on the trained model parameters of the pre-training model.
6. The method of claim 5, wherein the performing model migration from the pre-training model to a table question-answering system based on the trained model parameters of the pre-training model comprises:
migrating the model parameters of the trained pre-training model to a natural language understanding part of the table question-answering system, thereby performing model migration from the pre-training model to the table question-answering system.
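A minimal sketch of the parameter migration in claim 6, under the assumption that both the trained pre-training model and the NLU encoder of the table question-answering system are PyTorch modules with largely compatible parameter names; the function and argument names are hypothetical.

```python
# Sketch of claim 6: copy trained parameters into the downstream NLU encoder.
import torch

def migrate(pretrained_encoder: torch.nn.Module,
            table_qa_nlu_encoder: torch.nn.Module) -> None:
    """Migrate trained parameters into the natural language understanding part
    of the table question-answering system."""
    state = pretrained_encoder.state_dict()
    # strict=False tolerates task-specific heads that exist on only one side.
    missing, unexpected = table_qa_nlu_encoder.load_state_dict(state, strict=False)
    print(f"migrated; missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```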
7. The method of claim 6, wherein the method further comprises:
receiving a current natural language query request input by a user, acquiring a preset number of historical natural language query requests preceding the current natural language query request, and generating multiple rounds of natural language query requests based on the current natural language query request and the historical natural language query requests;
analyzing the multiple rounds of natural language query requests through the natural language understanding part of the table question-answering system, and determining, among the multiple rounds of natural language query requests, the historical natural language query requests that are semantically related to the current natural language query request;
and generating a corresponding database query statement for the current natural language query request according to the semantic relevance between the historical natural language query requests and the current natural language query request.
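The inference flow of claim 7 can be sketched as below. Here `nlu_relevance` and `generate_sql` are hypothetical placeholders standing in for the table question-answering system's natural language understanding and SQL-generation components; they are not APIs disclosed by the patent.

```python
# Sketch of claim 7: build a multi-round request, keep relevant history, generate SQL.
from collections import deque
from typing import Callable

def answer(current_query: str,
           history: deque,
           k: int,
           nlu_relevance: Callable[[list, str], list],
           generate_sql: Callable[[str, list], str]) -> str:
    """Combine the current query with up to k historical queries, keep only the
    semantically relevant history, and generate a database query statement."""
    recent_history = list(history)[-k:]                       # preset number of past requests
    relevant = nlu_relevance(recent_history, current_query)   # semantically related rounds
    sql = generate_sql(current_query, relevant)               # SQL conditioned on relevant context
    history.append(current_query)                             # current query becomes history
    return sql
```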
8. An electronic device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus;
and the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the method according to any one of claims 1-7.
9. A computer storage medium having stored thereon a computer program which, when executed by a processor, carries out the method of any one of claims 1 to 7.
10. A computer program product comprising computer instructions to instruct a computing device to perform operations corresponding to the method of any of claims 1-7.
CN202210478807.0A 2022-05-05 2022-05-05 Pre-training model data processing method, electronic device and computer storage medium Active CN114579606B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210478807.0A CN114579606B (en) 2022-05-05 2022-05-05 Pre-training model data processing method, electronic device and computer storage medium

Publications (2)

Publication Number Publication Date
CN114579606A true CN114579606A (en) 2022-06-03
CN114579606B CN114579606B (en) 2022-07-29

Family

ID=81785367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210478807.0A Active CN114579606B (en) 2022-05-05 2022-05-05 Pre-training model data processing method, electronic device and computer storage medium

Country Status (1)

Country Link
CN (1) CN114579606B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020149959A1 (en) * 2019-01-18 2020-07-23 Microsoft Technology Licensing, Llc Conversion of natural language query
WO2021150313A1 (en) * 2020-01-20 2021-07-29 Microsoft Technology Licensing, Llc Contrastive learning for question answering (qa)
CN111581354A (en) * 2020-05-12 2020-08-25 金蝶软件(中国)有限公司 FAQ question similarity calculation method and system
KR20210038449A (en) * 2020-05-27 2021-04-07 베이징 바이두 넷컴 사이언스 테크놀로지 컴퍼니 리미티드 Question and answer processing, language model training method, device, equipment and storage medium
WO2022037256A1 (en) * 2020-08-21 2022-02-24 腾讯科技(深圳)有限公司 Text sentence processing method and device, computer device and storage medium
US20220101052A1 (en) * 2020-09-30 2022-03-31 International Business Machines Corporation Answering questions with artificial intelligence using tabular data
CN112818105A (en) * 2021-02-05 2021-05-18 江苏实达迪美数据处理有限公司 Multi-turn dialogue method and system fusing context information
CN112559556A (en) * 2021-02-25 2021-03-26 杭州一知智能科技有限公司 Language model pre-training method and system for table mode analysis and sequence mask
CN113254619A (en) * 2021-06-21 2021-08-13 北京沃丰时代数据科技有限公司 Automatic reply method and device for user query and electronic equipment
US20220129448A1 (en) * 2021-06-30 2022-04-28 Beijing Baidu Netcom Science Technology Co., Ltd. Intelligent dialogue method and apparatus, and storage medium
CN113986950A (en) * 2021-10-27 2022-01-28 建信金融科技有限责任公司 SQL statement processing method, device, equipment and storage medium
CN114330704A (en) * 2021-11-22 2022-04-12 腾讯科技(深圳)有限公司 Statement generation model updating method and device, computer equipment and storage medium
CN114238373A (en) * 2021-12-16 2022-03-25 中国人民银行清算总中心 Method and device for converting natural language question into structured query statement
CN114090760A (en) * 2022-01-20 2022-02-25 阿里巴巴达摩院(杭州)科技有限公司 Data processing method of table question and answer, electronic equipment and readable storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BINYUAN HUI et al.: "Improving Text-to-SQL with Schema Dependency Learning", arXiv *
WANG ZHONGLIN et al.: "Research on an FAQ Question-Answering Model Combining a Siamese-ELECTRA Network with Adversarial Training", Software Guide (软件导刊) *
WANG HAO et al.: "Personalized Dialogue Content Generation Method Based on Deep Learning", Journal of Graphics (图学学报) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117708304A (en) * 2024-02-01 2024-03-15 浙江大华技术股份有限公司 Database question-answering method, equipment and storage medium
CN117708304B (en) * 2024-02-01 2024-05-28 浙江大华技术股份有限公司 Database question-answering method, equipment and storage medium

Also Published As

Publication number Publication date
CN114579606B (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN107480143B (en) Method and system for segmenting conversation topics based on context correlation
CN112035730B (en) Semantic retrieval method and device and electronic equipment
CN111444320A (en) Text retrieval method and device, computer equipment and storage medium
CN112800170A (en) Question matching method and device and question reply method and device
CN111078837B (en) Intelligent question-answering information processing method, electronic equipment and computer readable storage medium
CN109271524B (en) Entity linking method in knowledge base question-answering system
US20230394247A1 (en) Human-machine collaborative conversation interaction system and method
CN112084307B (en) Data processing method, device, server and computer readable storage medium
US20200364216A1 (en) Method, apparatus and storage medium for updating model parameter
CN114897163A (en) Pre-training model data processing method, electronic device and computer storage medium
CN116127095A (en) Question-answering method combining sequence model and knowledge graph
CN116992007B (en) Limiting question-answering system based on question intention understanding
CN116719520B (en) Code generation method and device
CN114579606B (en) Pre-training model data processing method, electronic device and computer storage medium
CN115658846A (en) Intelligent search method and device suitable for open-source software supply chain
EP4030355A1 (en) Neural reasoning path retrieval for multi-hop text comprehension
CN111309930B (en) Medical knowledge graph entity alignment method based on representation learning
CN113705207A (en) Grammar error recognition method and device
CN114372454A (en) Text information extraction method, model training method, device and storage medium
CN114647739B (en) Entity chain finger method, device, electronic equipment and storage medium
CN115203206A (en) Data content searching method and device, computer equipment and readable storage medium
CN115238705A (en) Semantic analysis result reordering method and system
CN113722431A (en) Named entity relationship identification method and device, electronic equipment and storage medium
CN111428005A (en) Standard question and answer pair determining method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant