CN114817295A - Multi-table Text2sql model training method, system, device and medium - Google Patents

Multi-table Text2sql model training method, system, device and medium

Info

Publication number
CN114817295A
CN114817295A
Authority
CN
China
Prior art keywords
text2sql
model
training
sql statement
statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210416046.6A
Other languages
Chinese (zh)
Other versions
CN114817295B (en)
Inventor
吴粤敏 (Wu Yuemin)
舒畅 (Shu Chang)
陈又新 (Chen Youxin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202210416046.6A priority Critical patent/CN114817295B/en
Publication of CN114817295A publication Critical patent/CN114817295A/en
Application granted granted Critical
Publication of CN114817295B publication Critical patent/CN114817295B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a multi-table Text2sql model training method, system, device and medium, which can be applied to the technical field of artificial intelligence. According to the method, a single-table Text2sql model is first trained with a first natural statement and a first label SQL statement corresponding to the first natural statement, yielding a trained single-table Text2sql model. The parameters of the trained single-table Text2sql model are then transferred to a multi-table Text2sql model by means of transfer learning, and the multi-table Text2sql model after parameter migration is trained with a second natural statement and a second label SQL statement corresponding to the second natural statement. The multi-table Text2sql model can thus converge quickly during training on the basis of the single-table Text2sql model parameters, and the trained multi-table Text2sql model achieves higher statement conversion accuracy when multi-table data is queried.

Description

Multi-table Text2sql model training method, system, device and medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a multi-table Text2sql model training method, system, device and medium.
Background
The Text2SQL (Text to SQL) model converts natural language statements into SQL query statements that a database can recognize, enabling a user to interact with the database directly. Text2sql comprises single-table Text2sql and multi-table Text2sql: in single-table Text2sql, the natural language statement input by the user involves only one table in the database, while in multi-table Text2sql it involves multiple tables. In the related art, the multi-table Text2sql task is more complex and difficult than the single-table Text2sql task, so the statement conversion accuracy of multi-table Text2sql on public data sets is low, and the precision of the user's data queries is correspondingly poor.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
The embodiment of the invention provides a multi-table Text2sql model training method, system, device and medium, which can effectively improve the statement conversion accuracy of a multi-table Text2sql model and thereby the precision of the data queried by the user.
On one hand, the embodiment of the invention provides a multi-table Text2sql model training method, which comprises the following steps:
acquiring a first training data set; the first training data set comprises a first natural statement and a first label SQL statement corresponding to the first natural statement; the first natural sentence is used for acquiring first target data of a single table in a database; the first label SQL statement is used for inquiring and returning the first target data;
training a single-table Text2sql model according to the first training data set to obtain a trained single-table Text2sql model;
transferring the parameters of the trained single-table Text2sql model to a multi-table Text2sql model in a transfer learning mode;
acquiring a second training data set; the second training data set comprises a second natural statement and a second label SQL statement corresponding to the second natural statement; the second natural statement is used for acquiring second target data of a plurality of tables in the database; the second label SQL statement is used for inquiring and returning the second target data;
and training the multi-table Text2sql model after the parameter migration according to the second training data set to obtain the trained multi-table Text2sql model.
On the other hand, the embodiment of the invention provides a multi-table Text2sql model training system, which comprises the following components:
a first module for obtaining a first training data set; the first training data set comprises a first natural statement and a first label SQL statement corresponding to the first natural statement; the first natural sentence is used for acquiring first target data of a single table in a database; the first label SQL statement is used for inquiring and returning the first target data;
the second module is used for training the single-table Text2sql model according to the first training data set to obtain a trained single-table Text2sql model;
the third module is used for transferring the parameters of the trained single-table Text2sql model to the multi-table Text2sql model in a transfer learning mode;
a fourth module for obtaining a second training data set; the second training data set comprises a second natural statement and a second label SQL statement corresponding to the second natural statement; the second natural statement is used for acquiring second target data of a plurality of tables in the database; the second label SQL statement is used for inquiring and returning the second target data;
and the fifth module is used for training the multi-table Text2sql model after the parameter migration according to the second training data set to obtain the trained multi-table Text2sql model.
On the other hand, the embodiment of the invention provides a multi-table Text2sql model training device, which comprises:
at least one memory for storing a program;
at least one processor for loading the program to perform the multi-table Text2sql model training method.
In another aspect, an embodiment of the present invention provides a computer-readable storage medium, in which computer-executable instructions are stored, where the computer-executable instructions are configured to execute the multi-table Text2sql model training method.
The embodiment of the invention has the beneficial effects that: a single-table Text2sql model is trained with a first training data set comprising a first natural statement and a first label SQL statement corresponding to the first natural statement, yielding a trained single-table Text2sql model. The parameters of the trained single-table Text2sql model are then transferred to a multi-table Text2sql model by means of transfer learning, so that the multi-table Text2sql model can subsequently be trained on the basis of the single-table Text2sql model's parameters. The multi-table Text2sql model after parameter migration is then trained with a second training data set comprising a second natural statement and a second label SQL statement corresponding to the second natural statement. The multi-table Text2sql model can thus converge quickly during training on the basis of the single-table Text2sql model parameters, and the trained multi-table Text2sql model achieves higher statement conversion accuracy when querying multi-table data, thereby improving the precision of the data queried by the user.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification; they illustrate embodiments of the invention and together with the description serve to explain the principles of the invention, and are not to be construed as limiting the invention.
FIG. 1 is a schematic diagram of an implementation environment of a multi-table Text2sql model training method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a multi-table Text2sql model training method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a single-table Text2sql model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating training of a single-table Text2sql model according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a multi-table Text2sql model according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating training of a multi-table Text2sql model according to an embodiment of the present invention;
FIG. 7 is a block diagram of a multi-table Text2sql model training system according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a multi-table Text2sql model training device according to an embodiment of the present invention;
fig. 9 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that although functional blocks are partitioned in a schematic diagram of an apparatus and a logical order is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the partitioning of blocks in the apparatus or the order in the flowchart. The terms first, second and the like in the description and in the claims, and the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The database stores information of various industries, such as the course selection information, achievement information and staff turnover information of schools. An SQL statement is a query statement for a database, but for professionals outside the computer field it is difficult to write high-accuracy SQL statements for different databases and application scenarios. The Text2SQL model converts natural language into SQL query statements and can effectively reduce the difficulty non-professionals face in querying data with SQL statements. For example, when the user inputs a question such as "Where was XX born?", the Text2SQL model parses it into an SQL statement of the form Select birth_city From person Where name = "XX", executes the statement against the database, and queries and returns "XX province XX city".
In the related art, the Text2SQL model comprises a single-table Text2SQL model and a multi-table Text2SQL model, and the task execution process of the multi-table Text2SQL model is more complex and difficult than that of the single-table Text2SQL model. As a result, the accuracy of the SQL statements converted from the natural language input by the user is lower, and the accuracy of the returned query results is correspondingly lower.
Based on this, the embodiment provides a multi-table Text2sql model training method. The method of the embodiment of the application applies the parameters of a trained single-table Text2sql model to a multi-table Text2sql model by means of transfer learning, so that after the multi-table Text2sql model is trained, its statement conversion accuracy is further improved on the basis of the accuracy of the single-table Text2sql model, which improves the precision of the user's data queries.
Specific embodiments are described below with reference to the accompanying drawings:
referring to fig. 1, fig. 1 is a schematic diagram of an implementation environment of a multi-table Text2sql model training method according to an embodiment of the present application. Referring to fig. 1, the software and hardware main body of the implementation environment mainly includes an operation terminal 110 and a server 120, and the operation terminal 110 is connected with the server 120 in a communication manner. The multi-table Text2sql model training method may be separately configured to be executed by the operation terminal 110, may also be separately configured to be executed by the server 120, or may be executed based on the interaction between the operation terminal 110 and the server 120, which may be selected appropriately according to the actual application situation, and this embodiment is not limited in particular.
Specifically, the operation terminal 110 in the present application may include, but is not limited to, any one or more of a smart watch, a smart phone, a computer, a Personal Digital Assistant (PDA), an intelligent voice interaction device, an intelligent household appliance, or a vehicle-mounted terminal. The server 120 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), and big data and artificial intelligence platforms. The operation terminal 110 and the server 120 may establish a communication connection through a wireless or wired network using standard communication technologies and/or protocols. The network may be the internet or any other network, including but not limited to a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wired or wireless network, a private network, or any combination of virtual private networks.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, comprising both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Referring to fig. 2, fig. 2 is a schematic diagram of a multi-table Text2sql model training method provided in an embodiment of the present application, where the multi-table Text2sql model training method may be configured in at least one of an operation terminal or a server. Specifically, referring to fig. 2, the multi-table Text2sql model training method according to the embodiment of the present application includes, but is not limited to, steps 210 to 250:
step 210, obtaining a first training data set; the first training data set comprises a first natural statement and a first label SQL statement corresponding to the first natural statement; the first natural sentence is used for acquiring first target data of a single table in a database; the first label SQL statement is used for inquiring and returning the first target data.
In an embodiment of the present application, the data in the first training data set may be historical data in a cloud database. For example, in some embodiments, the data in the first training data set may be query log data saved when users made data queries using natural language statements before the current time. In some embodiments, the first training data set may also be a data set publicly available on a predetermined website, for example the WikiSQL data set or the NL2SQL data set. The public data sets can be acquired directly through data calls and then used to train the single-table Text2sql model of the embodiment of the application. In other embodiments, the first natural statement in the first training data set may also be natural language data collected from web pages in real time, and the first label SQL statement may be an SQL statement obtained by matching against the natural language data collected in real time. The SQL statements may include statements for pointing to content at a specific position within the database, the content itself, and the like. For example, an SQL statement may include a Select clause and a Where clause. Specifically, the Select clause provides a selection statement template for acquiring the target data from the database, and the Where clause provides a query statement template for the specific position of the target data in the database. It is understood that the Select clause may include, but is not limited to, a Select column clause, a Select num clause and a Select op clause. The Select column clause can be understood as the specific column of the target data in the single table; the Select num clause as the number of target data items selected from the single table; and the Select op clause as the way the target data is selected from the single table.
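For illustration only, a single-table training sample in a WikiSQL-style layout might look as follows; every field name and value in this sketch is an assumption for exposition, since the patent does not fix a concrete data format:

```python
# Hypothetical single-table training example in a WikiSQL-like layout.
# All field names and values are illustrative, not taken from the patent.
first_training_example = {
    "question": "What is the average salary of employees in department 3?",  # first natural statement
    "table_id": "employees",                                                 # the single table queried
    "sql": {
        "sel": 2,                # Select column clause: index of the selected column
        "agg": 5,                # Select op clause: aggregation operator, e.g. AVG
        "conds": [[1, 0, "3"]],  # Where clause: (column, operator, value) triples
    },
    # first label SQL statement, used to query and return the first target data
    "label_sql": "SELECT AVG(salary) FROM employees WHERE department = 3",
}
```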
And step 220, training the single-table Text2sql model according to the first training data set to obtain the trained single-table Text2sql model.
In the embodiment of the application, after the first training data set is obtained, it can be input into the single-table Text2sql model, so that the single-table Text2sql model is trained on the first training data set and its parameters are adjusted. Specifically, as shown in FIG. 3, the single-table Text2sql model may include, but is not limited to, a first semantic analysis layer 310 and a first classification output layer 320; the first semantic analysis layer 310 is connected to the first classification output layer 320. The first semantic analysis layer 310 may be configured to extract semantic feature data of an input natural statement, and the first classification output layer 320 may be configured to predict, from the semantic feature data, the SQL statement corresponding to it.
In the embodiment of the present application, referring to fig. 3, the first semantic analysis layer 310 may employ a BERT model, which is a multi-layer bidirectional Transformer encoder that pre-trains deep bidirectional representations by jointly conditioning on left and right context in all layers. As shown in fig. 3, the first classification output layer 320 may include a first preset number of first linear normalization layers 321, and the output result of each first linear normalization layer 321 is used to query an attribute feature of the first target data in one dimension of the single table. The attribute features include the position of the target data in the single table, such as the position features of columns, rows, numbers and the like. Specifically, the first preset number of first linear normalization layers may include, but is not limited to, 9 first linear normalization layers, each of which executes one classification subtask, and each classification subtask may be executed using a corresponding SQL clause. For example, the SQL statement may include a Select clause and a Where clause, where the Select clause may in turn include, but is not limited to, 3 clauses: a Select column clause, a Select num clause and a Select op clause, and the Where clause may include, but is not limited to, 6 clauses: a Where column clause, a Where num clause, a Where conn clause, a Where op clause, a Where value start clause and a Where value end clause. Each clause completes a corresponding classification task in a corresponding first linear normalization layer, so that the output result of each first linear normalization layer is used to query an attribute feature of the first target data in one dimension of the single table.
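As an aid to reading, the following is a minimal sketch of the structure just described, assuming a PyTorch implementation: a BERT semantic-analysis layer feeding 9 unshared linear classification heads, one per clause subtask. The checkpoint name and the per-clause class counts are assumptions, not details given by the patent.

```python
import torch.nn as nn
from transformers import BertModel

class SingleTableText2SQL(nn.Module):
    """Sketch of the single-table model: a BERT first semantic analysis
    layer plus a first classification output layer of 9 unshared heads."""

    # One classification subtask per clause named in the text above.
    CLAUSES = ["select_column", "select_num", "select_op",
               "where_column", "where_num", "where_conn",
               "where_op", "where_value_start", "where_value_end"]

    def __init__(self, num_classes):  # num_classes: dict clause name -> class count (assumed)
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-chinese")  # assumed checkpoint
        hidden = self.encoder.config.hidden_size
        # Parameters are NOT shared between the heads, as the text requires.
        self.heads = nn.ModuleDict({
            name: nn.Linear(hidden, num_classes[name]) for name in self.CLAUSES
        })

    def forward(self, input_ids, attention_mask):
        # First semantic analysis layer: extract semantic feature data.
        features = self.encoder(input_ids, attention_mask=attention_mask).pooler_output
        # First classification output layer: the same first feature data
        # is transmitted to every linear normalization layer separately.
        return {name: head(features) for name, head in self.heads.items()}
```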
In the embodiment of the application, the single-table Text2sql model needs to be trained to reach a preset statement conversion precision. In order to enable each linear normalization layer to effectively learn the differences of its corresponding subtask, the parameters are not shared between the linear normalization layers. Specifically, as shown in fig. 4, the training process of the single-table Text2sql model according to the embodiment of the present application includes, but is not limited to:
step 410, inputting the first natural sentence into the first semantic analysis layer to obtain first feature data;
in the embodiment of the application, after the first semantic analysis layer obtains the natural sentence, the semantic information of the natural sentence is analyzed, and the semantic information is converted into a vector form, so that the first classification output layer can effectively receive the semantic information in the vector form. In particular, in some embodiments, prior to the first semantic layer receiving the natural language sentence, the natural language sentence may be converted by the embedding layer into a vector form that the first semantic layer is capable of receiving.
Step 420, inputting the first characteristic data into the first classification output layer to obtain a first prediction SQL statement; the first predictive SQL statement is used for inquiring and returning data of a single table in the database.
In the embodiment of the application, after the first feature data in the form of a vector is obtained, since the first classification output layer of the single-table Text2sql model includes a plurality of linear normalization layers, and parameters of each linear normalization layer are not shared, in order to enable each linear normalization layer to effectively learn the difference between tasks, the first feature data needs to be transmitted to each linear normalization layer respectively, so as to train and adjust the parameters of each linear normalization layer.
And 430, adjusting parameters of the single-table Text2SQL model according to the first label SQL statement and the first prediction SQL statement.
In the embodiment of the present application, the accuracy of the prediction result of the single-table Text2sql model can be measured by a Loss Function. The loss function is defined on a single training sample and measures the prediction error on that sample; specifically, the loss value of a single training sample is determined from its label and the model's prediction on it. In actual training, a training data set contains many training samples, so a Cost Function is generally adopted to measure the overall error of the training data set. The cost function is defined on the whole training data set and calculates the average of the prediction errors over all training samples, which measures the prediction effect of the model better.
In the embodiment of the application, Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and the like. It specifically studies how a computer can simulate or realize human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way of endowing computers with intelligence; it is applied in all fields of artificial intelligence and generally comprises technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from instruction. For a general machine learning model, the cost function together with a regularization term measuring the complexity of the model can be used as the training objective function, and the loss value of the whole training data set can be obtained on the basis of this objective function. There are many kinds of commonly used loss functions, such as the 0-1 loss function, square loss function, absolute loss function, logarithmic loss function and cross entropy loss function, all of which can serve as the loss function of a machine learning model and are not described one by one here.
In the embodiment of the present application, taking the Cross-Entropy loss function as an example, the embodiment may determine the training loss value through the cross-entropy loss function according to the first label SQL statement and the first prediction SQL statement, and then adjust the parameters of each layer of the single-table Text2sql model through the back propagation algorithm according to the loss value. In particular, the cross entropy loss function has the advantage of fast weight updates and measures the difference between two pieces of information, for example the difference between the first label SQL statement and the first prediction SQL statement. The back propagation algorithm is in essence a mapping relationship that iterates repeatedly over two phases, excitation propagation and weight updating, until the response of the network model to the input reaches a preset target range.
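A hedged sketch of one such training step follows; how the nine per-head cross-entropy losses are combined is not stated in the patent, so summing them is an assumption:

```python
import torch.nn.functional as F

def training_step(model, batch, optimizer):
    """One training step: cross entropy per clause head, then back propagation."""
    logits = model(batch["input_ids"], batch["attention_mask"])
    # Assumed combination: sum the cross-entropy losses of all clause subtasks.
    loss = sum(F.cross_entropy(logits[name], batch["labels"][name])
               for name in logits)
    optimizer.zero_grad()
    loss.backward()   # back propagation adjusts the parameters of each layer
    optimizer.step()
    return loss.item()
```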
And step 230, transferring the parameters of the trained single-table Text2sql model to the multi-table Text2sql model in a transfer learning mode.
In the embodiment of the application, the structure of the single-table Text2sql model is the same as that of part of the units in the multi-table Text2sql model. Therefore, after the training of the single-table Text2sql model is completed, its trained parameters can be applied to the multi-table Text2sql model, so that the multi-table Text2sql model can perform subsequent training based on the parameters of the single-table Text2sql model, saving training time and improving the statement conversion precision of the multi-table Text2sql model. Specifically, the parameters of the trained single-table Text2sql model can be transferred to the multi-table Text2sql model by means of transfer learning. Transfer learning refers to the influence of one type of learning on another type of learning, or the influence of the experience obtained from one type of learning on the completion of other learning activities. In the machine learning process, transfer learning can make better use of previously labeled data to improve the training precision of a new task model. In particular, transfer learning includes, but is not limited to, feature-based transfer and shared-parameter-based transfer. Feature-based transfer finds feature representations common to the source domain and the target domain and then uses these feature representations for knowledge transfer. In this embodiment, the common feature representation of the single-table Text2sql model and the multi-table Text2sql model may be determined first, and the parameters of the single-table Text2sql model then migrated to the multi-table Text2sql model based on that common feature representation.
In some embodiments, as shown in FIG. 5, the multi-table Text2sql model includes a second semantic analysis layer 510 and a second classification output layer 520; the second semantic analysis layer 510 is connected to the second classification output layer 520. The second classification output layer 520 includes a second preset number of second linear normalization layers 521, and the output result of each second linear normalization layer 521 is used to query an attribute feature of the second target data in one dimension of the plurality of tables. Specifically, the second preset number of second linear normalization layers may include, but is not limited to, 16 second linear normalization layers, each of which executes one classification subtask, and each classification subtask may be executed using a corresponding SQL clause. For example, the SQL statement may include, but is not limited to, a Select clause, a Where clause, an Order by clause, a Group by clause, a Having clause, a Limit clause, an Intersect clause, an Except clause and a Union clause. The Select clause may in turn include, but is not limited to, 3 clauses: a Select column clause, a Select num clause and a Select op clause, and the Where clause may include, but is not limited to, 6 clauses: a Where column clause, a Where num clause, a Where conn clause, a Where op clause, a Where value start clause and a Where value end clause. Each clause completes a corresponding classification task in a corresponding second linear normalization layer, so that the output result of each second linear normalization layer is used to query an attribute feature of the second target data in one dimension of the plurality of tables. Comparing fig. 3 and fig. 5, it can be seen that the multi-table Text2sql model has 7 more linear normalization layers than the single-table Text2sql model, while the rest of its structure is the same as that of the single-table Text2sql model. The first preset number is therefore less than or equal to the second preset number, and the single-table Text2sql model parameters can be migrated into the corresponding units of the multi-table Text2sql model through feature-based transfer in this embodiment.
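A sketch of this parameter migration, under the assumption that the shared units of the two models carry identical parameter names and shapes (as when the multi-table model extends the single-table model with 7 extra heads): only the overlapping parameters are copied, and the extra multi-table heads keep their fresh initialization.

```python
def migrate_parameters(single_table_model, multi_table_model):
    """Copy every trained single-table parameter whose name and shape also
    exist in the multi-table model (the shared encoder and the 9 common
    heads); the 7 extra multi-table heads are left untouched."""
    source = single_table_model.state_dict()
    target = multi_table_model.state_dict()
    shared = {name: tensor for name, tensor in source.items()
              if name in target and tensor.shape == target[name].shape}
    target.update(shared)
    multi_table_model.load_state_dict(target)
```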
Step 240, acquiring a second training data set; the second training data set comprises a second natural statement and a second label SQL statement corresponding to the second natural statement; the second natural statement is used for acquiring second target data of a plurality of tables in the database; and the second label SQL statement is used for inquiring and returning the second target data.
In this embodiment, the data in the second training data set may be historical data in a cloud database. For example, in some embodiments, the data in the second training data set may be query log data saved when users made data queries using natural language statements before the current time, and the query log data may include the natural language statements input by the users and the label SQL statements corresponding to them. In some embodiments, the second training data set may also be a data set publicly available on a predetermined website, such as the Spider data set or the DuSQL data set. The public data sets can be acquired directly through data calls and then used to train the multi-table Text2sql model of the embodiment of the application. In other embodiments, the second natural statement in the second training data set may also be natural language data collected from web pages in real time, and the second label SQL statement may be an SQL statement matched against the natural language data collected in real time. The SQL statements may include statements for pointing to content at a specific position within the database, the content itself, and the like. For example, an SQL statement may include a Select clause and a Where clause. In this embodiment, since the second training data set is used to train the multi-table Text2sql model, which is more complex in structure than the single-table Text2sql model, the label SQL statements in the second training data set may further include an Order by clause, a Group by clause, a Having clause, a Limit clause, an Intersect clause, an Except clause and a Union clause, which provide constraint query templates across the plurality of tables corresponding to the query natural statements. In this embodiment, the Select clause provides a selection statement template for obtaining the target data from the database, and the Where clause provides a query statement template for the specific position of the target data in the database. It is understood that the Select clause may include, but is not limited to, a Select column clause, a Select num clause and a Select op clause. The Select column clause can be understood as the specific column of the target data in a table; the Select num clause as the number of target data items selected; and the Select op clause as the way the target data is selected.
And 250, training the multi-table Text2sql model after the parameters are transferred according to the second training data set to obtain the trained multi-table Text2sql model.
In this embodiment, after the parameters of the single-table Text2sql model and the second training data set are obtained, the multi-table Text2sql model configured with the parameters of the single-table Text2sql model can be trained on the second training data set and its parameters adjusted. Specifically, in order to enable each second linear normalization layer to effectively learn the differences of its corresponding subtask, the parameters are not shared between the second linear normalization layers. In this embodiment, as shown in fig. 6, the training process of the multi-table Text2sql model of this embodiment includes, but is not limited to:
and step 610, inputting the second natural sentence into the second semantic analysis layer to obtain second feature data.
In the embodiment of the application, after the natural statement is obtained, the second semantic analysis layer analyzes the semantic information of the statement and converts the semantic information into vector form, so that the second classification output layer can effectively receive the semantic information in vector form. Specifically, in some embodiments, before the second semantic analysis layer receives the natural statement, the statement may be converted by an embedding layer into a vector form that the second semantic analysis layer is capable of receiving; alternatively, when the embedding layer is arranged inside the second semantic analysis layer, the natural statement can be input directly into the second semantic analysis layer, and after the built-in embedding layer converts it into vector form, the semantic information corresponding to the natural statement can be extracted from the vector.
Step 620, inputting the second characteristic data into the second classification output layer to obtain a second prediction SQL statement; the second predictive SQL statement is used for inquiring and returning data of a plurality of tables in the database.
In this embodiment, as shown in fig. 5, the second classification output layer may include 16 linear normalization layers, each of which performs one classification subtask, and parameters between each of which are not shared. Therefore, in this embodiment, after the second feature data output by the second semantic analysis layer is obtained, the second feature data needs to be respectively transmitted to each linear normalization layer, so that each linear normalization layer separately predicts one SQL statement as a second predicted SQL statement, and the second predicted SQL statement is used to query the attribute feature of the second target data in one dimension of the tables.
And 630, adjusting the parameters of the multi-table Text2SQL model after the parameter migration according to the second label SQL statement and the second prediction SQL statement.
In the embodiment of the present application, the accuracy of the prediction result of the multi-table Text2sql model may likewise be measured by a Loss Function. The loss function is defined on a single training sample and measures the prediction error on that sample; specifically, the loss value of a single training sample is determined from its label and the model's prediction on it. In actual training, a training data set contains many training samples, so a Cost Function is generally adopted to measure the overall error of the training data set. The cost function is defined on the whole training data set and calculates the average of the prediction errors over all training samples, which measures the prediction effect of the model better. For a general machine learning model, the cost function together with a regularization term measuring the complexity of the model can be used as the training objective function, and the loss value of the whole training data set can be obtained on the basis of this objective function. There are many kinds of commonly used loss functions, such as the 0-1 loss function, square loss function, absolute loss function, logarithmic loss function and cross entropy loss function, all of which can serve as the loss function of a machine learning model and are not described one by one here.
In the embodiment of the present application, taking the cross-entropy loss function as an example, the embodiment may determine the training loss value through the cross-entropy loss function according to the second label SQL statement and the second prediction SQL statement, and then adjust the parameters of each layer of the multi-table Text2sql model through the back propagation algorithm according to the loss value. In particular, the cross entropy loss function has the advantage of fast weight updates and measures the difference between two pieces of information, for example the difference between the second label SQL statement and the second prediction SQL statement. The back propagation algorithm is in essence a mapping relationship that iterates repeatedly over two phases, excitation propagation and weight updating, until the response of the network model to the input reaches a preset target range.
In this embodiment, the multi-table Text2sql model is trained by continuously calculating the loss value between the second label SQL statement and the second prediction SQL statement and reversely adjusting the parameters of each layer of the multi-table Text2sql model according to that loss value; the training process stops once the loss value meets the preset requirement.
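Reusing the training_step sketch above, this stopping rule might be realized as follows; the loss threshold and epoch cap are illustrative values only, not taken from the patent:

```python
def train_multi_table(model, data_loader, optimizer,
                      loss_threshold=0.05, max_epochs=50):
    """Keep adjusting parameters until the mean loss meets a preset requirement."""
    for epoch in range(max_epochs):
        epoch_loss = sum(training_step(model, batch, optimizer)
                         for batch in data_loader)
        if epoch_loss / len(data_loader) < loss_threshold:  # preset requirement met
            break
    return model
```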
To sum up, in the embodiment of the application, when a multi-table Text2sql model is to be trained, data comprising a first natural statement and a first label SQL statement corresponding to the first natural statement is first obtained as a first training data set. The first training data set is input into the semantic analysis layer of the single-table Text2sql model, and feature data is extracted as first feature data. The first feature data is input into the classification output layer of the single-table Text2sql model, and an SQL statement is obtained by classification prediction as a first prediction SQL statement. A loss value is calculated from the first prediction SQL statement and the first label SQL statement, and the parameters of the single-table Text2sql model are reversely adjusted according to the calculated loss value.
After the parameters of the single-table Text2sql model are adjusted, the parameters of the single-table Text2sql model are configured into a target unit of the multi-table Text2sql model in a transfer learning mode, wherein the structure of the target unit is the same as that of the single-table Text2sql model, so that the parameters of the single-table Text2sql model can be effectively and accurately configured into the multi-table Text2sql model.
After the parameters of the single-table Text2sql model are configured in the multi-table Text2sql model, a second training data set comprising a second natural statement and a second label SQL statement corresponding to the second natural statement is obtained. The second training data set is input into the semantic analysis layer of the multi-table Text2sql model, and feature data is extracted as second feature data. The second feature data is input into the classification output layer of the multi-table Text2sql model, and an SQL statement is obtained by classification prediction as a second prediction SQL statement. A loss value is calculated from the second prediction SQL statement and the second label SQL statement, and the parameters of the multi-table Text2sql model are reversely adjusted according to the calculated loss value.
After the multi-table Text2sql model training is completed, a user inputs a natural statement through a terminal device; the multi-table Text2sql model converts the natural statement into a corresponding SQL statement, performs a multi-table query according to the SQL statement to obtain the target data, and returns the target data to the user's terminal device.
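An illustrative end-to-end query flow under these assumptions is sketched below; decode_sql, which would assemble the per-clause head predictions into an SQL string, is a hypothetical helper the patent does not spell out:

```python
import sqlite3

def answer_question(model, tokenizer, question, db_path):
    """Convert a natural statement to SQL, run the multi-table query,
    and return the target data, as described above."""
    inputs = tokenizer(question, return_tensors="pt")
    logits = model(inputs["input_ids"], inputs["attention_mask"])
    sql = decode_sql(logits)  # hypothetical: clause predictions -> SQL text
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()
```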
In conclusion, in the embodiment, the parameters of the single-table Text2sql model with higher accuracy are migrated into the multi-table Text2sql model with lower accuracy by using migration learning, so that the sentence conversion accuracy of the multi-table Text2sql model is improved; and the parameters of the single-table Text2sql model are used in the multi-table Text2sql task by using transfer learning, so that the problem of low precision of the multi-table Text2sql model caused by insufficient training data is effectively solved.
Referring to fig. 7, an embodiment of the present invention provides a multi-table Text2sql model training system, including:
a first module 710 for obtaining a first training data set; the first training data set comprises a first natural statement and a first label SQL statement corresponding to the first natural statement; the first natural sentence is used for acquiring first target data of a single table in a database; the first label SQL statement is used for inquiring and returning the first target data;
a second module 720, configured to train the single-table Text2sql model according to the first training data set, to obtain a trained single-table Text2sql model;
a third module 730, configured to transfer the parameters of the trained single-table Text2sql model to the multi-table Text2sql model in a transfer learning manner;
a fourth module 740 for obtaining a second training data set; the second training data set comprises a second natural statement and a second label SQL statement corresponding to the second natural statement; the second natural statement is used for acquiring second target data of a plurality of tables in the database; the second label SQL statement is used for inquiring and returning the second target data;
and a fifth module 750, configured to train the multi-table Text2sql model after parameter migration according to the second training data set, so as to obtain a trained multi-table Text2sql model.
The contents of the embodiments of the method of the present invention are all applicable to the embodiments of the present system, the functions specifically implemented by the embodiments of the present system are the same as those of the embodiments of the method described above, and the beneficial effects achieved by the embodiments of the present system are also the same as those achieved by the methods described above, which are not described herein again.
Referring to fig. 8, an embodiment of the present invention provides a multi-table Text2sql model training apparatus, including:
at least one memory 810 for storing programs;
at least one processor 820 configured to load the program to perform the multi-table Text2sql model training method shown in fig. 2.
In some alternative embodiments, the memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the multi-table Text2sql model training method of the above-described embodiments are stored in the memory and, when executed by the processor, perform the multi-table Text2sql model training method of the above-described embodiments, for example performing the method steps 210 to 250 in fig. 2 described above.
In some alternative embodiments, the processing device may be a computer device, and the computer device may be a server and may be a user terminal. In this embodiment, taking a computer device as a user terminal as an example, the following is specific:
as shown in fig. 9, the computer device may include RF (Radio Frequency) circuitry 910, memory 920 including one or more computer-readable storage media, an input unit 930, a display unit 940, a sensor 950, audio circuitry 960, a short-range wireless transmission module 970, a processor 980 including one or more processing cores, and a power supply 990, among other components.
RF circuit 910 may be used to receive and send signals; in particular, it receives downlink information from a base station and hands it over to the one or more processors 980 for processing, and transmits uplink data to the base station. In general, RF circuit 910 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 910 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, SMS (Short Messaging Service), etc.
Memory 920 may be used to store software programs and modules. The processor 980 executes various functional applications and data processing by running software programs and modules stored in the memory 920. The memory 920 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound recording function, an image viewing function, etc.), and the like; the storage data area may store data (such as audio data, text, etc.) created according to the use of the device, etc. Further, the memory 920 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 920 may also include a memory controller to provide the processor 980 and the input unit 930 with access to the memory 920.
The input unit 930 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, the input unit 930 may include a touch-sensitive surface 931 as well as other input devices 932. The touch-sensitive surface 931, also referred to as a touch screen or a touch pad, may collect touch operations by a user on or near the touch-sensitive surface 931 (e.g., operations by a user on or near the touch-sensitive surface 931 using a finger, a stylus, or any other suitable object or attachment) and drive the corresponding connecting device according to a predetermined program. Alternatively, the touch sensitive surface 931 may include both a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 980, and can receive and execute commands sent by the processor 980. In addition, the touch sensitive surface 931 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 930 may also include other input devices 932 in addition to the touch-sensitive surface 931. In particular, other input devices 932 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 940 may be used to display various graphic user interfaces for information input by a user or information provided to a user and control, which may be configured of graphics, text, icons, video, and any combination thereof. The Display unit 940 may include a Display panel 941, and optionally, the Display panel 941 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 931 may overlay the display panel 941, and when a touch operation is detected on or near the touch-sensitive surface 931, the touch operation is transmitted to the processor 980 to determine the type of touch event, and the processor 980 then provides a corresponding visual output on the display panel 941 according to the type of touch event.
The above described system embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Furthermore, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions for performing the multi-table Text2sql model training method shown in fig. 2.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
While the preferred embodiments of the present invention have been described in detail, it will be understood by those skilled in the art that various changes, omissions, and substitutions in form and detail may be made therein without departing from the scope of this invention.

Claims (10)

1. A multi-table Text2sql model training method is characterized by comprising the following steps:
acquiring a first training data set; the first training data set comprises a first natural statement and a first label SQL statement corresponding to the first natural statement; the first natural statement is used for acquiring first target data of a single table in a database; the first label SQL statement is used for querying and returning the first target data;
training a single-table Text2sql model according to the first training data set to obtain a trained single-table Text2sql model;
transferring the parameters of the trained single-table Text2sql model to a multi-table Text2sql model by means of transfer learning;
acquiring a second training data set; the second training data set comprises a second natural statement and a second label SQL statement corresponding to the second natural statement; the second natural statement is used for acquiring second target data of a plurality of tables in the database; the second label SQL statement is used for querying and returning the second target data;
and training the multi-table Text2sql model after the parameter migration according to the second training data set to obtain the trained multi-table Text2sql model.
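As an illustrative, non-limiting sketch, the staged procedure of claim 1 can be expressed in code. PyTorch is assumed here; the GRU encoder standing in for the semantic analysis layer, the head counts, the class count, and all names are hypothetical illustrations rather than the patented implementation.

```python
# Hypothetical sketch of the two-stage flow of claim 1 (PyTorch assumed).
import torch
import torch.nn as nn

class Text2SqlModel(nn.Module):
    """A semantic analysis layer (a GRU here, as an assumption) followed by
    a preset number of linear normalization heads (linear + log-softmax),
    each head predicting one attribute of the SQL statement."""
    def __init__(self, n_heads, emb=128, hidden=256, n_classes=32):
        super().__init__()
        self.encoder = nn.GRU(input_size=emb, hidden_size=hidden, batch_first=True)
        self.heads = nn.ModuleList(nn.Linear(hidden, n_classes) for _ in range(n_heads))

    def forward(self, question_emb):
        _, h = self.encoder(question_emb)  # feature data from the semantic analysis layer
        return [torch.log_softmax(head(h[-1]), dim=-1) for head in self.heads]

# Stage 1: train a single-table model on the first training data set.
# Stage 2: migrate its parameters into a multi-table model (more heads)
# and continue training on the second training data set.
single_model = Text2SqlModel(n_heads=4)   # first preset number (hypothetical value)
multi_model = Text2SqlModel(n_heads=7)    # second preset number > first (claim 7)
```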
2. The method for training the multi-table Text2sql model according to claim 1, wherein the transferring the parameters of the trained single-table Text2sql model to the multi-table Text2sql model by means of transfer learning comprises:
obtaining first model parameters of the trained single-table Text2sql model;
and configuring the first model parameters to a target unit in the multi-table Text2sql model, wherein the target unit and the single-table Text2sql model have the same structure.
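Continuing the hypothetical sketch above, the parameter migration of claim 2 amounts to copying the trained parameters into the structurally identical target unit:

```python
# Migrate trained single-table parameters into the multi-table model's
# target unit (identical structure); the extra multi-table heads keep
# their fresh initialization. Reuses Text2SqlModel from the claim 1 sketch.
multi_model.encoder.load_state_dict(single_model.encoder.state_dict())
for src_head, dst_head in zip(single_model.heads, multi_model.heads):
    dst_head.load_state_dict(src_head.state_dict())
```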
3. The method of claim 1, wherein the single-table Text2sql model comprises a first semantic analysis layer and a first classification output layer; the first semantic analysis layer is connected with the first classification output layer; the training of the single-table Text2sql model according to the first training data set comprises:
inputting the first natural statement into the first semantic analysis layer to obtain first feature data;
inputting the first feature data into the first classification output layer to obtain a first predicted SQL statement; the first predicted SQL statement is used for querying and returning data of a single table in the database;
and adjusting parameters of the single-table Text2sql model according to the first label SQL statement and the first predicted SQL statement.
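One possible reading of the forward pass in claim 3, again reusing the hypothetical Text2SqlModel sketch above; the toy tensor shapes are assumptions:

```python
# Forward pass of claim 3: the embedded first natural statement passes
# through the semantic analysis layer to yield first feature data, and
# each classification head emits one attribute of the predicted SQL.
question_emb = torch.randn(1, 12, 128)        # toy embedded question: [batch, seq, emb]
head_outputs = single_model(question_emb)     # one log-probability vector per head
predicted_ids = [out.argmax(dim=-1) for out in head_outputs]  # decoded attribute ids
```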
4. The method of claim 3, wherein the first classification output layer comprises a first preset number of first linear normalization layers, and the output of each first linear normalization layer is used for querying an attribute feature of the first target data in one dimension of the single table.
5. The method of claim 3, wherein the adjusting parameters of the single-table Text2sql model according to the first label SQL statement and the first predicted SQL statement comprises:
determining a training loss value through a cross entropy loss function according to the first label SQL statement and the first predicted SQL statement;
and adjusting parameters of the single-table Text2sql model through a back-propagation algorithm according to the loss value.
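In the same hypothetical sketch, the loss computation and parameter update of claim 5 could look as follows; NLLLoss paired with the log-softmax heads realizes the cross entropy of the claim, and the per-head label indices are assumed to be derived from the first label SQL statement:

```python
# One training step of claim 5: cross-entropy between each head's
# prediction and the class derived from the first label SQL statement,
# followed by a back-propagation update of the single-table model.
optimizer = torch.optim.Adam(single_model.parameters(), lr=1e-4)
loss_fn = nn.NLLLoss()                     # log-softmax + NLLLoss == cross entropy
label_ids = torch.randint(0, 32, (4, 1))   # hypothetical per-head class labels
loss = sum(loss_fn(out, label_ids[i]) for i, out in enumerate(head_outputs))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```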
6. The method of claim 4, wherein the multi-table Text2sql model comprises a second semantic analysis layer and a second classification output layer; the second semantic analysis layer is connected with the second classification output layer; the training of the multi-table Text2sql model after parameter migration according to the second training data set comprises:
inputting the second natural statement into the second semantic analysis layer to obtain second feature data;
inputting the second feature data into the second classification output layer to obtain a second predicted SQL statement; the second predicted SQL statement is used for querying and returning data of a plurality of tables in the database;
and adjusting the parameters of the multi-table Text2sql model after the parameter migration according to the second label SQL statement and the second predicted SQL statement.
7. The method of claim 6, wherein the second classification output layer comprises a second preset number of second linear normalization layers, and the output of each second linear normalization layer is used for querying an attribute feature of the second target data in one dimension of the plurality of tables; the second preset number is greater than the first preset number.
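Claims 6 and 7 mirror the single-table step for the migrated multi-table model. In the hypothetical sketch, the second preset number of heads (7 > 4 here) leaves room for multi-table attributes such as table selection or join conditions; this is our assumption, since the claims leave the head semantics open:

```python
# Multi-table training step of claim 6 on the migrated model: the second
# natural statement yields second feature data, the second classification
# output layer (7 heads here, per claim 7) predicts the SQL attributes,
# and the migrated parameters are adjusted from the loss.
second_question = torch.randn(1, 15, 128)      # toy embedded second natural statement
second_outputs = multi_model(second_question)
second_labels = torch.randint(0, 32, (7, 1))   # hypothetical labels from the second label SQL
multi_optimizer = torch.optim.Adam(multi_model.parameters(), lr=1e-4)
loss = sum(loss_fn(out, second_labels[i]) for i, out in enumerate(second_outputs))
multi_optimizer.zero_grad()
loss.backward()
multi_optimizer.step()
```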
8. A multi-table Text2sql model training system, comprising:
a first module for acquiring a first training data set; the first training data set comprises a first natural statement and a first label SQL statement corresponding to the first natural statement; the first natural statement is used for acquiring first target data of a single table in a database; the first label SQL statement is used for querying and returning the first target data;
a second module for training a single-table Text2sql model according to the first training data set to obtain a trained single-table Text2sql model;
a third module for transferring the parameters of the trained single-table Text2sql model to a multi-table Text2sql model by means of transfer learning;
a fourth module for acquiring a second training data set; the second training data set comprises a second natural statement and a second label SQL statement corresponding to the second natural statement; the second natural statement is used for acquiring second target data of a plurality of tables in the database; the second label SQL statement is used for querying and returning the second target data;
and a fifth module for training the multi-table Text2sql model after the parameter migration according to the second training data set to obtain the trained multi-table Text2sql model.
9. A multi-table Text2sql model training device is characterized by comprising:
at least one memory for storing a program;
at least one processor configured to load the program to perform the multi-table Text2sql model training method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer-executable instructions for performing the multi-table Text2sql model training method of any one of claims 1 to 7.
CN202210416046.6A 2022-04-20 2022-04-20 Multi-table Text2sql model training method, system, device and medium Active CN114817295B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210416046.6A CN114817295B (en) 2022-04-20 2022-04-20 Multi-table Text2sql model training method, system, device and medium

Publications (2)

Publication Number Publication Date
CN114817295A true CN114817295A (en) 2022-07-29
CN114817295B CN114817295B (en) 2024-04-05

Family

ID=82505781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210416046.6A Active CN114817295B (en) 2022-04-20 2022-04-20 Multi-table Text2sql model training method, system, device and medium

Country Status (1)

Country Link
CN (1) CN114817295B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200210524A1 (en) * 2018-12-28 2020-07-02 Microsoft Technology Licensing, Llc Analytical processing system supporting natural language analytic questions
CN111914551A (en) * 2020-07-29 2020-11-10 北京字节跳动网络技术有限公司 Language representation model system, pre-training method, device, equipment and medium
CN113656540A (en) * 2021-08-06 2021-11-16 北京仁科互动网络技术有限公司 BI query method, device, equipment and medium based on NL2SQL
CN114265858A (en) * 2021-12-09 2022-04-01 阿里巴巴(中国)有限公司 Method, equipment and storage medium for transferring natural language to SQL
CN114186684A (en) * 2021-12-13 2022-03-15 深圳壹账通智能科技有限公司 Multitask model training method, multitask model training system, multitask model training medium and electronic terminal

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116578652A (en) * 2023-07-13 2023-08-11 中国人民解放军国防科技大学 Multi-table associated data set backfilling system and method
CN116578652B (en) * 2023-07-13 2024-01-16 中国人民解放军国防科技大学 Multi-table associated data set backfilling system and method
CN117609470A (en) * 2023-12-08 2024-02-27 中科南京信息高铁研究院 Question-answering system based on large language model and knowledge graph, construction method thereof and intelligent data management platform
CN117609470B (en) * 2023-12-08 2024-08-09 中科南京信息高铁研究院 Question-answering system based on large language model and knowledge graph, construction method thereof and intelligent data management platform
CN117609281A (en) * 2024-01-18 2024-02-27 成都四方伟业软件股份有限公司 Text2Sql method, system, electronic equipment and storage medium
CN117609281B (en) * 2024-01-18 2024-04-05 成都四方伟业软件股份有限公司 Text2Sql method, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114817295B (en) 2024-04-05

Similar Documents

Publication Publication Date Title
CN114817295B (en) Multi-table Text2sql model training method, system, device and medium
CN110599557B (en) Image description generation method, model training method, device and storage medium
US9690776B2 (en) Contextual language understanding for multi-turn language tasks
US20210182501A1 (en) Information processing method and apparatus, and storage medium
CN111553479B (en) Model distillation method, text retrieval method and device
KR101988151B1 (en) Forecast user needs for specific contexts
US11586838B2 (en) End-to-end fuzzy entity matching
RU2708941C1 (en) Method and apparatus for recognizing segmented sentences for a human-machine intelligent question-answer system
US20210018332A1 (en) Poi name matching method, apparatus, device and storage medium
TWI741877B (en) Network model quantization method, device, and electronic apparatus
WO2023168909A1 (en) Pre-training method and model fine-tuning method for geographical pre-training model
US20230214689A1 (en) Method and apparatus for processing dialogue, electronic device, and storage medium
US12008047B2 (en) Providing an object-based response to a natural language query
US20150269136A1 (en) Context-aware re-formating of an input
CN112131261B (en) Community query method and device based on community network and computer equipment
US11822877B2 (en) Intelligent electronic signature platform
CN116935188B (en) Model training method, image recognition method, device, equipment and medium
CN117094395B (en) Method, device and computer storage medium for complementing knowledge graph
CN112749558B (en) Target content acquisition method, device, computer equipment and storage medium
CN113569572A (en) Text entity generation method, model training method and device
CN112541362A (en) Generalization processing method, device, equipment and computer storage medium
US20200356562A1 (en) Cross-asset data modeling in multi-asset databases
CN113641804A (en) Pre-training model obtaining method and device, electronic equipment and storage medium
CN113420136A (en) Dialogue method, system, electronic equipment, storage medium and program product
US20230070966A1 (en) Method for processing question, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant