CN114282497A - Method and system for converting text into SQL - Google Patents

Method and system for converting text into SQL

Info

Publication number
CN114282497A
CN114282497A (application CN202111596478.1A)
Authority
CN
China
Prior art keywords
node
syntax tree
abstract syntax
text
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111596478.1A
Other languages
Chinese (zh)
Inventor
俞凯
曹瑞升
陈露
李杰宇
许洪深
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sipic Technology Co Ltd filed Critical Sipic Technology Co Ltd
Priority to CN202111596478.1A
Publication of CN114282497A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a method for converting text into SQL. The method comprises the following steps: determining a node candidate set for generating an abstract syntax tree from the question text and the database tables and columns; taking a node randomly selected from the node candidate set as the head node of the abstract syntax tree, and inputting the head node into a decoder to obtain the action distribution of the head node; determining, based on the action distribution, the nodes into which the current node can be expanded at the next step, choosing the corresponding node selection scheme according to the node type, and selecting the next child node and its corresponding action distribution from the expandable nodes to grow the abstract syntax tree, until no further child node can be determined, yielding a final abstract syntax tree that avoids overfitting; and converting the question text into the corresponding SQL statement based on the final abstract syntax tree. An embodiment of the invention also provides a system for converting text into SQL. The embodiments alleviate the overfitting problem caused by poor compositional generalization and produce more accurate SQL statements.

Description

Method and system for converting text into SQL
Technical Field
The invention relates to the field of intelligent voice, in particular to a method and a system for converting text into SQL.
Background
A text-to-SQL (Structured Query Language) task converts a natural-language question into the corresponding SQL query, given a database schema.
Generally, such tasks adopt modular parallel decoding: different network modules, such as a classifier, a pointer network, and a sequence generation network, are designed specifically for different SQL clauses and predict them separately; the representative model is SQLNet.
Alternatively, token-based end-to-end decoding directly generates each token in the SQL, including SQL keywords (such as SELECT and FROM); the representative models are Seq2SQL and Picard, which treat SQL generation as an autoregressive sequence modeling problem in an end-to-end fashion.
A third option is grammar-based end-to-end decoding, which aims to generate a grammar spanning tree equivalent to the SQL, converting sequence generation into the generation of a structured syntax tree; after prediction, a post-processing transduction program restores the spanning tree to the original SQL. This approach can be subdivided into two types: top-down grammar-guided generation models, such as IRNet, and bottom-up grammar-guided generation models, such as SmBoP.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the related art:
Module-based parallel decoding models: different modules must be explicitly designed, along with the ways information flows and is shared among them; the parallel decoding process must handle duplicated and missing predictions across modules; and the overly detailed model design and hyperparameter tuning are difficult to migrate across tasks and datasets.
Token-based end-to-end models: because the decoding process is unconstrained, they easily generate SQL statements with invalid grammar or unreasonable semantics, and a large amount of compute is spent generating output that does not meet the requirements.
Grammar-based end-to-end models: the top-down tree generation process is determined by a predefined traversal order, which easily causes poor compositional generalization and unreasonable inference paths; the search space of the bottom-up tree generation process is too large, making the optimal spanning tree hard to find, and multiple equivalent syntax trees may even exist, so an additional reranking model is often used to score globally and return the highest-scoring tree.
Disclosure of Invention
The invention aims to at least solve the problems of poor compositional generalization and unreasonable reasoning paths in prior-art text-to-SQL conversion.
In a first aspect, an embodiment of the present invention provides a method for converting a text into an SQL, including:
determining a node candidate set for generating an abstract syntax tree from a question text and database tables and columns;
taking a node randomly selected from the node candidate set as the head node of the abstract syntax tree, and inputting the head node into a decoder to obtain the action distribution of the head node;
determining, based on the action distribution, the nodes into which the current node can be expanded at the next step, choosing the corresponding node selection scheme according to the node type, and selecting the next child node and its corresponding action distribution from the expandable nodes to grow the abstract syntax tree, until no next child node can be determined, obtaining a final abstract syntax tree that avoids overfitting;
and converting the question text into the corresponding SQL statement based on the final abstract syntax tree.
In a second aspect, an embodiment of the present invention provides a method for training a text-to-SQL model, including:
inputting a training node candidate set, determined from a training question text and database tables and columns, into the text-to-SQL model;
taking a node randomly selected from the node candidate set as the head node of the abstract syntax tree, and decoding the head node with the decoder of the text-to-SQL model to obtain the action distribution of the head node;
determining, based on the action distribution, the nodes into which the current node can be expanded at the next step, choosing the corresponding node selection scheme according to the node type, selecting the next child node and its corresponding action distribution from the expandable nodes to grow the abstract syntax tree, and feeding the next child node back into the decoder to continue expanding the tree, until no further child node can be determined, obtaining a predicted abstract syntax tree;
and training the text-to-SQL model against overfitting based on the difference between the gold abstract syntax tree corresponding to the training question text and the predicted abstract syntax tree, until the predicted abstract syntax tree approaches the gold tree.
In a third aspect, an embodiment of the present invention provides a system for converting text into SQL, including:
a node candidate set determination program module, configured to determine a node candidate set for generating an abstract syntax tree from a question text and database tables and columns;
an action distribution determination program module, configured to take a node randomly selected from the node candidate set as the head node of the abstract syntax tree and input the head node into a decoder to obtain the action distribution of the head node;
a syntax tree construction program module, configured to determine, based on the action distribution, the nodes into which the current node can be expanded at the next step, choose the corresponding node selection scheme according to the node type, and select the next child node and its corresponding action distribution from the expandable nodes to grow the abstract syntax tree, until no next child node can be determined, obtaining a final abstract syntax tree that avoids overfitting;
and a conversion program module, configured to convert the question text into the corresponding SQL statement based on the final abstract syntax tree.
In a fourth aspect, an embodiment of the present invention provides a training system for converting a text into an SQL model, including:
a training data input program module, configured to input a training node candidate set, determined from a training question text and database tables and columns, into the text-to-SQL model;
an action distribution determination program module, configured to take a node randomly selected from the node candidate set as the head node of the abstract syntax tree and decode the head node with the decoder of the text-to-SQL model to obtain the action distribution of the head node;
a syntax tree prediction program module, configured to determine, based on the action distribution, the nodes into which the current node can be expanded at the next step, choose the corresponding node selection scheme according to the node type, select the next child node and its corresponding action distribution from the expandable nodes to grow the abstract syntax tree, and feed the next child node back into the decoder to continue expanding the tree, until no further child node can be determined, obtaining a predicted abstract syntax tree;
and a training program module, configured to train the text-to-SQL model against overfitting based on the difference between the gold abstract syntax tree corresponding to the training question text and the predicted abstract syntax tree, until the predicted abstract syntax tree approaches the gold tree.
In a fifth aspect, an embodiment of the present invention provides an electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions enabling the at least one processor to perform the steps of the text-to-SQL method and the text-to-SQL model training method of any embodiment of the invention.
In a sixth aspect, an embodiment of the present invention provides a storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the text-to-SQL method and the text-to-SQL model training method of any embodiment of the invention.
The embodiments of the invention have the following beneficial effects: in decoding text to SQL, the top-down grammar decoding process is treated as a combinatorial structured set prediction problem; a certain randomness is introduced when the decoder input selects a node, and the decoder autonomously decides the current optimal action from its predicted output, thereby alleviating the overfitting problem caused by poor compositional generalization and producing more accurate SQL statements.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flowchart of a method for converting text into SQL according to an embodiment of the present invention;
FIG. 2 is a flowchart of processing typed collections at the input of the decoder according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for converting text to SQL to process a set without type at the output of a decoder according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an application of an abstract syntax tree of a method for converting text into SQL according to an embodiment of the present invention;
FIG. 5 is a flowchart of a training method for converting text into SQL model according to an embodiment of the present invention;
FIG. 6 is a data diagram of the main results of the training method for converting text into SQL model with respect to the data set Spider according to an embodiment of the present invention;
FIG. 7 is a result data diagram of the data set DuSQL of the training method for converting text into SQL model according to an embodiment of the present invention;
FIG. 8 is a diagram of the evaluation data of TS and UTS of the training method for converting text to SQL model according to an embodiment of the present invention;
FIG. 9 is a graph of the ordering of typed sets, random vs. exploration, for the training method for the text-to-SQL model according to an embodiment of the present invention;
FIG. 10 is a graph of the ordering of untyped sets, random vs. exploration, for the training method for the text-to-SQL model according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating ordered paths of TS of a training method for converting text into SQL model according to an embodiment of the present invention;
FIG. 12 is a data diagram of the result of the training method for converting text into SQL model according to an embodiment of the present invention when the TS order is fixed;
FIG. 13 is a graph of the order variation of untyped sets under the exploration scheme for the training method for the text-to-SQL model according to an embodiment of the present invention;
FIG. 14 is a mixed training data diagram of TS and UTS of a training method for converting text to SQL model according to an embodiment of the present invention;
FIG. 15 is a block diagram of a system for converting text into SQL according to an embodiment of the present invention;
FIG. 16 is a schematic structural diagram of a training system for converting text into SQL model according to an embodiment of the present invention;
fig. 17 is a schematic structural diagram of an embodiment of an electronic device for converting text into SQL according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a method for converting text into SQL according to an embodiment of the present invention, which includes the following steps:
S11: determining a node candidate set for generating an abstract syntax tree from a question text and database tables and columns;
S12: taking a node randomly selected from the node candidate set as the head node of the abstract syntax tree, and inputting the head node into a decoder to obtain the action distribution of the head node;
S13: determining, based on the action distribution, the nodes into which the current node can be expanded at the next step, choosing the corresponding node selection scheme according to the node type, and selecting the next child node and its corresponding action distribution from the expandable nodes to grow the abstract syntax tree, until no next child node can be determined, obtaining a final abstract syntax tree that avoids overfitting;
S14: converting the question text into the corresponding SQL statement based on the final abstract syntax tree.
In this embodiment, the goal is to convert the input question text into the corresponding SQL grammar spanning tree, which contains nodes with their types and the action that expands each node. Specifically, the input at each decoding step is the feature of a node, and the output is the action corresponding to that node, indicating how to expand the current input node, including the types and number of child nodes to add.
In step S11, the question text and the database tables and columns determine the node candidate set of the abstract syntax tree. Specifically, the encoder determines vector representations of all words in the question text;
the encoder determines vector representations of the database tables and columns;
and the vector representations of all words, tables, and columns form the node candidate set of the abstract syntax tree. In the present embodiment,
assume the encoder obtains the vector representations Q of all words in the question text, the vector representations T and C of all database tables and columns, and the feature memory V of all extracted candidate value sets. The output of the encoder is X = [Q; T; C], stored in the feature memory V, which is accessed during decoding to compute the attention vector.
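As a concrete illustration, the feature memory X = [Q; T; C] is simply a row-wise concatenation of the word, table, and column representations. The sketch below uses toy dimensions and random stand-ins; the actual encoder and sizes are not those of the patent:

```python
import numpy as np

# Toy stand-ins for the encoder outputs; in the method these would be
# produced by an actual encoder over the question and database schema.
d = 4                        # hidden size (illustrative)
Q = np.random.randn(5, d)    # vector representations of 5 question words
T = np.random.randn(2, d)    # vector representations of 2 database tables
C = np.random.randn(7, d)    # vector representations of 7 columns

# Encoder output X = [Q; T; C], stored as the feature memory and later
# accessed by the decoder's attention.
X = np.concatenate([Q, T, C], axis=0)
```

Because questions, tables, and columns live in one memory, the decoder's attention scores are computed against every row of X, so all candidate nodes compete in a single space.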
For step S12, the AST (Abstract Syntax Tree) is a tree representation of source code. The decoder input is a traversal of the AST nodes (when the abstract syntax tree is built step by step, it starts from a single head node), and the output is a series of corresponding actions indicating how to expand the input node, i.e., the types and number of its child nodes. As shown in FIG. 2, the input head node is sql, the root node of the abstract syntax tree, and the resulting output action ApplyRule: sql = SFW(select, from, where) indicates how to expand the input node into child nodes of the given types and number.
For step S13, continue with the head node determined in step S12:
(1) Select a frontier (head) node n_j from the current partial abstract syntax tree to expand. (Initially there is only the head node; with continued expansion, the abstract syntax tree grows step by step.)
(2) Process the node's features to obtain ñ_j.
(3) Based on the type n_j^τ of the selected head node, compute its output action distribution P(a_j | ·).
(4) Select an action a_j from the syntax tree as the current target.
(5) Apply the action a_j in symbol space to expand the abstract syntax tree, and return to step (1), until no head node can be expanded.
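The expansion loop of steps (1)-(5) can be sketched in a model-free form. Here the action for each frontier node comes from a fixed toy rule table rather than a learned action distribution, and the miniature grammar is a made-up illustration:

```python
# A minimal, model-free sketch of the top-down expansion loop.
# The "decoder" here is a fixed rule table; in the real method the action
# for each frontier node would come from the predicted action distribution.
GRAMMAR = {
    "sql":    ["select", "from", "where"],  # sql = SFW(select, from, where)
    "select": ["col_id"],
    "from":   ["tab_id"],
    "where":  [],       # no children in this toy case
    "col_id": [],       # terminal types produce no further frontier nodes
    "tab_id": [],
}

def expand(root="sql"):
    """Expand an AST depth-first, returning visited (node, children) pairs."""
    trace = []
    frontier = [root]                        # frontier nodes awaiting expansion
    while frontier:                          # repeat until no head node remains
        node = frontier.pop()                # step (1): select a frontier node
        children = GRAMMAR[node]             # step (4): the chosen action a_j
        trace.append((node, tuple(children)))
        frontier.extend(reversed(children))  # step (5): apply a_j, grow tree
    return trace
```

The loop terminates exactly when the frontier is empty, i.e., when no head node can be expanded any further.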
Continuing with FIG. 2: since expansion proceeds in three directions, select, from, and where, the corresponding node selection scheme is determined by the node's type. Nodes include typed nodes and untyped nodes.
When the node is a typed node, determining the next child node from the expandable nodes according to the corresponding node selection scheme to expand the abstract syntax tree comprises:
inferring the next child node from the paths among the expandable nodes based on a controller scheme with a predefined order, a random scheme, or an enumeration-based exploration scheme.
In this embodiment, each node in the AST is assigned a type attribute indicating its grammatical role. For example, a node of type sql represents the root of a complete SQL query. Other typed nodes, such as from, select, or where, represent finer-grained clauses. In general, nodes can be classified into non-terminal and terminal nodes according to their types. Specifically, in this task the terminal types are tab_id, col_id, and val_id.
As for the grammar rules, each rule takes the form type = RuleName(type1, type2, ...), like the example ApplyRule: sql = SFW(select, from, where) above. Here, type is the specific type of the parent non-terminal node to be expanded, and type1, type2, ... are the types of its child nodes. Only non-terminal types can appear on the left side, and RuleName must be globally unique to distinguish how the parent node (i.e., the head node of the abstract syntax tree) is expanded.
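A rule of the form type = RuleName(type1, type2, ...) might be encoded as below. The class name, fields, and the toy terminal set are illustrative assumptions, not the patent's implementation:

```python
from dataclasses import dataclass

# Illustrative encoding of a rule "type = RuleName(type1, type2, ...)".
# Only non-terminal types may appear on the left; RuleName is globally unique.
TERMINALS = {"tab_id", "col_id", "val_id"}

@dataclass(frozen=True)
class Rule:
    name: str        # globally unique rule name, e.g. "SFW"
    parent: str      # non-terminal type being expanded (left-hand side)
    children: tuple  # types of the child nodes, in order

    def __post_init__(self):
        if self.parent in TERMINALS:
            raise ValueError("only non-terminal types can be expanded")

# The running example: sql = SFW(select, from, where)
SFW = Rule("SFW", "sql", ("select", "from", "where"))
```

Making the rule immutable and uniquely named lets the decoder treat each rule as one discrete action in the APPLYRULE vocabulary.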
Each node n with type n^τ and each grammar rule r are embedded by two separate embedding functions, ψ(n^τ) and φ(r).
For node sets with different types, as shown on the left of FIG. 2, after the head node is input there are three selection schemes for the output action: following a predefined generation order, selecting randomly, or enumerating all possible node types.
As shown on the right of FIG. 2: if the controller scheme is chosen, the next child node is select, with the corresponding action ApplyRuleAction: select = SelectTwoColumns(col_id, col_id); if the random scheme is chosen, the next child node is from, with the corresponding action ApplyRuleAction: from = ThreeTables(tab_id, tab_id, tab_id); if the exploration scheme is chosen, the next child node is where, with the corresponding action ApplyRuleAction: where = OneCondition(condition). Preferably, the random scheme is used when selecting nodes from a typed set. In the same way, the selected node serves as the decoder input at the next step to expand the next node.
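The three input-side selection schemes for a typed set can be sketched as follows, assuming the canonical order select → from → where from FIG. 2; the function names and the score callback are hypothetical:

```python
import random

# Sketch of the three input-side selection schemes for a typed child set.
CANONICAL = ["select", "from", "where"]  # predefined controller order (assumed)

def controller(unexpanded):
    """Return the next type in the predefined order select -> from -> where."""
    return next(t for t in CANONICAL if t in unexpanded)

def random_pick(unexpanded, rng):
    """Randomly pick one of the remaining unexpanded types."""
    return rng.choice(sorted(unexpanded))  # sorted only for determinism

def explore(unexpanded, score):
    """Enumerate all unexpanded types and let the model's score decide."""
    return max(unexpanded, key=score)
```

The controller fixes the traversal order, the random scheme injects the order randomness described above, and exploration defers the choice to the model's own output distribution.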
In another embodiment, when the node is an untyped node, determining the next child node from the expandable nodes according to the corresponding node selection scheme to expand the abstract syntax tree comprises:
inferring the next child node from the paths among the expandable nodes based on a controller scheme with a predefined order, a random scheme, or a beam-search-based exploration scheme.
In this embodiment, for a set of untyped nodes: the input node features s_j of the different nodes in the set are identical; only the way each input node is expanded, i.e., the output action, differs. There are likewise three schemes, named analogously to those for typed sets. For the exploration scheme, all possible action choices are added to the beam, and only the best K paths so far are retained according to the model's probability scores and the beam-size constraint K, avoiding exponential explosion. Specifically, as shown in FIG. 3, the head node is input into the decoder to obtain the output action ApplyRuleAction: from = ThreeTables(tab_id, tab_id, tab_id). Guided by the gold action tree (tab_id = 2, tab_id = 3, tab_id = 4), the controller, random, and exploration schemes are applied in the same way to obtain the output action SelectTableAction: tab_id = ?. Preferably, the exploration scheme is used when selecting nodes from an untyped set. In the same way, the selected node serves as the decoder input at the next step to expand the next node. The abstract syntax tree is thus expanded, as shown in FIG. 4, to obtain the final abstract syntax tree, from which the SQL statement can be derived.
For step S14, a node tree of the traversed nodes is decoded from the final abstract syntax tree; the action tree formed by the action distributions corresponding to the nodes is obtained; and the corresponding SQL statement is determined from the action tree. The resulting abstract syntax tree contains the nodes with their types and the action expanding each node. Specifically, the input at each decoding step is the feature of a node, and the output is the action corresponding to that node, indicating how to expand the current input node, including the types and number of child nodes to add. The final abstract syntax tree is thus converted into SELECT MAX(col_id1) FROM tab_id1, and the intelligent device receiving the SQL statement performs the corresponding operation.
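A toy transduction from a final action tree back to an SQL string, mirroring the example SELECT MAX(col_id1) FROM tab_id1, might look as follows. The nested-dictionary tree shape is an illustrative assumption, not the patent's actual post-processing program:

```python
# Illustrative final action tree: sql -> select (aggregated column), from (table).
tree = {
    "sql": {
        "select": [("MAX", "col_id1")],   # (aggregator, column) pairs
        "from": ["tab_id1"],              # selected tables
    }
}

def to_sql(tree):
    """Transduce a toy action tree into an SQL string."""
    body = tree["sql"]
    cols = ", ".join(f"{agg}({col})" for agg, col in body["select"])
    tabs = ", ".join(body["from"])
    sql = f"SELECT {cols} FROM {tabs}"
    if "where" in body:                   # optional WHERE clause
        sql += " WHERE " + " AND ".join(body["where"])
    return sql
```

The real transduction program walks the full grammar, but the principle is the same: each subtree deterministically maps back to one SQL clause.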
According to this embodiment, in decoding text to SQL, the top-down grammar decoding process is treated as a combinatorial structured set prediction problem; a certain randomness is introduced when the decoder input selects a node, and the decoder autonomously decides the current optimal action from its predicted output, thereby alleviating the overfitting problem caused by poor compositional generalization and producing a more accurate SQL statement.
Fig. 5 is a flowchart of a training method for converting a text into an SQL model according to an embodiment of the present invention, which includes the following steps:
S21: inputting a training node candidate set, determined from a training question text and database tables and columns, into the text-to-SQL model;
S22: taking a node randomly selected from the node candidate set as the head node of the abstract syntax tree, and decoding the head node with the decoder of the text-to-SQL model to obtain the action distribution of the head node;
S23: determining, based on the action distribution, the nodes into which the current node can be expanded at the next step, choosing the corresponding node selection scheme according to the node type, selecting the next child node and its corresponding action distribution from the expandable nodes to grow the abstract syntax tree, and feeding the next child node back into the decoder to continue expanding the tree, until no further child node can be determined, obtaining a predicted abstract syntax tree;
S24: training the text-to-SQL model against overfitting based on the difference between the gold abstract syntax tree corresponding to the training question text and the predicted abstract syntax tree, until the predicted abstract syntax tree approaches the gold tree.
In the embodiment, the text-to-SQL model can be trained, so that a better interaction effect can be achieved when the method is applied to intelligent interaction.
For step S21, similarly, assume the encoder obtains the vector representations Q of all words in the question text, the vector representations T and C of all tables and columns in the database, and the feature memory V of all extracted candidate value sets. The output of the encoder is X = [Q; T; C], stored in the feature memory V.
For step S22, briefly, each expansion step proceeds as follows: select a head node n_j to expand; process the features of n_j to obtain ñ_j; based on the type n_j^τ of the selected head node, compute its output action distribution P(a_j | ·); select an action a_j from the syntax tree as the current target; apply the action a_j in symbol space to expand the abstract syntax tree; and return to the first step, until no head node can be expanded.
Specifically, for the decoder input, the input features at time j comprise three parts:
(1) the embedding of the previous action a_{j-1}, to be defined later (i.e., the action of the previous round), which tells the decoder how to update the AST in neural space;
(2) the previous decoder hidden state h_{j-1};
(3) the structure position embedding (SPE) s_j.
The structure position embedding (SPE) locates the frontier node n_j within the partial AST. In general, s_j is a concatenation of four vectors:
(1) the type embedding ψ(n_j^τ) of n_j;
(2) the type embedding ψ(n_{p_j}^τ) of its parent node (p_j denotes the time step at which the parent of n_j was expanded);
(3) the action embedding a_{p_j} that expanded its parent node;
(4) the decoder hidden state h_{p_j} when its parent node was the frontier node. The formula is:
s_j = [ψ(n_j^τ); ψ(n_{p_j}^τ); a_{p_j}; h_{p_j}]
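The structure position embedding as a concatenation of four vectors can be sketched numerically as follows; the dimension and the constant fill values are purely illustrative:

```python
import numpy as np

# Sketch of s_j = [psi(n_j^tau); psi(n_pj^tau); a_pj; h_pj].
# Constant vectors stand in for learned embeddings (illustrative only).
d = 4
type_emb_nj   = np.full(d, 1.0)  # psi(n_j^tau): type embedding of n_j
type_emb_par  = np.full(d, 2.0)  # psi(n_pj^tau): type embedding of the parent
action_emb    = np.full(d, 3.0)  # a_pj: action that expanded the parent
parent_hidden = np.full(d, 4.0)  # h_pj: hidden state when parent was frontier

s_j = np.concatenate([type_emb_nj, type_emb_par, action_emb, parent_hidden])
```

The concatenation means the decoder sees, in one vector, both what the node is and how and when its parent was produced.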
The decoder hidden state h_j and the attention vector h̃_j are computed as:
h_j = f_dec([a_{j-1}; s_j], h_{j-1}),
α_j = softmax(h_j^T W X),
h̃_j = Σ_i α_{j,i} X_i,
where X = [Q; T; C] is the encoder output and f_dec(·) denotes an RNN-family network. The initial state consists of:
h_0 = AttentivePooling_dec(X),
a_0 = 0,
s_0 = [ψ(sql); 0; 0; 0],
where n_0 is the root node of type sql.
For the decoder output, the prediction target at each time j is an action a_j that expands the partially generated AST. The overall training objective is:
P(a | X, V) = Π_{j=1}^{|a|} P(a_j | a_{<j}, X, V),
where a = (a_1, ..., a_{|a|}) is the sequence of actions and V is the feature memory mentioned in step S21. There are two kinds of actions: APPLYRULE, tailored to non-terminal types, and SELECTITEM, tailored to terminal types. In this task, an item may be a table, a column, or a value. For brevity, only the select-table action is discussed; the other two follow analogously.
The APPLYRULE action selects, based on the non-terminal type n_j^τ, a rule r_i from the full set of grammar rules. Specifically, the left-hand type of rule r_i must equal the type of the frontier node n_j. Given the attention vector h̃_j, the distribution of APPLYRULE actions is given by:
P(a_j = APPLYRULE[r_i] | a_{<j}, X, V) ∝ exp(φ(r_i)^T h̃_j).
The SELECTTABLE action expands a node of the terminal type tab_id by selecting a table. The probability of selecting the i-th item from the table memory T is computed as:
P(a_j = SELECTTABLE[t_i] | a_{<j}, X, V) ∝ exp(t_i^T h̃_j).
The action embedding a_j (the decoder input for the next and subsequent steps) can then be defined as a_j = φ(r_j) for an APPLYRULE action and a_j = t_j for a SELECTTABLE action, where t_j is the selected entry in T and r_j is the grammar rule selected at the current time j.
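The type-masked APPLYRULE scoring can be sketched as follows; the rule inventory, the embedding size, and the random stand-ins for φ(r) and h̃_j are illustrative:

```python
import numpy as np

# Sketch of the APPLYRULE distribution: score every grammar rule against the
# attention vector, but mask rules whose left-hand type does not match the
# frontier node's type. Embeddings are random stand-ins for phi(r).
rules = [("SFW", "sql"), ("OneCondition", "where"), ("SelectTwoColumns", "select")]
rng = np.random.default_rng(0)
phi = rng.standard_normal((len(rules), 8))   # rule embeddings phi(r)
h_tilde = rng.standard_normal(8)             # attention vector at time j

def apply_rule_dist(node_type):
    """Masked softmax over rules whose left-hand type matches node_type."""
    scores = phi @ h_tilde
    mask = np.array([parent == node_type for _, parent in rules])
    scores = np.where(mask, scores, -np.inf)  # illegal rules get zero mass
    e = np.exp(scores - scores[mask].max())
    return e / e.sum()
```

The mask is what guarantees, by construction, that only grammatically legal expansions ever receive probability, one of the advantages of grammar-based decoding noted above.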
What the steps above have not yet described, with respect to step S23, is how to determine the frontier node and the output action at each step. The method can follow a depth-first search order in the vertical direction, according to the top-down grammar: after a node n is expanded, the next step switches to one of its children; once all children are finished, the process backtracks to the parent of n. In the horizontal direction, the method attempts to adjust the priorities.
The set of child nodes is divided into two classes: typed and untyped. A typed set is a set of nodes in which each node has a different type, while in an untyped set all nodes share the same type. In fig. 2, the rule sql → SFW defines a typed set with the three fields select, from, and where, while the from rule in fig. 3 yields an untyped set of 3 nodes, each of type tab_id.
In fact, nodes of the same type attached to a parent node typically serve the same syntactic function, e.g., expressing user intent in a SELECT clause. Their order of precedence is instance-specific, depending on each training sample. The set of types, however, is more domain-general. It is assumed that each type set has a dominant canonical order across different instances.
In the GTOL (gold-tree-oriented learning) framework of the method, shown in fig. 2 and 3, the target AST is likewise built step by step during training, guided by the gold action tree. To cope with exploration and uncertainty, a beam of the K best ASTs explored so far may be retained.
For input-side handling of a typed set: given a set of nodes of different types, one frontier node (in effect, a type) must be selected as the next input. The order controller has three options, as shown in FIG. 2:
Controller: return the next unexpanded node according to a predefined order over the types, e.g., select → from → where.
Random: randomly select a type from the remaining unexpanded child nodes.
Explore: enumerate the unexpanded nodes of all different types and let the model decide according to its output distribution. This selection may be delayed for several steps, depending on the beam size K.
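The three ordering options can be sketched as follows (a minimal hypothetical interface; `model_scores` stands in for the decoder's predicted output distribution over types):

```python
import random

def pick_next_type(unexpanded, scheme, canonical_order=None,
                   model_scores=None, rng=random):
    # unexpanded: set of type names still to expand, e.g. {"select", "from"}.
    if scheme == "controller":
        # Follow a predefined order over the types,
        # e.g. select -> from -> where.
        for t in canonical_order:
            if t in unexpanded:
                return t
    if scheme == "random":
        # Uniformly pick one of the remaining unexpanded types.
        return rng.choice(sorted(unexpanded))
    if scheme == "explore":
        # Let the model decide: take the unexpanded type with the
        # highest predicted score.
        return max(unexpanded, key=lambda t: model_scores[t])
    raise ValueError("unknown scheme: %s" % scheme)
```

In the real framework the Explore branch is coupled with beam search, so the final choice may only be committed after several steps.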
For output-side handling of an untyped set: since the child nodes share the same type, there is no difference on the input side (the SPE s_j is identical), for example among the nodes of type tab_id in fig. 3. On the output side, the gold label for each child node is available during training. All choices may be added to the beam, descending into different child nodes; the paths are ranked by prediction score and the top K are retained. Other alternatives are also possible, such as random selection (Random) or a canonical order per untyped set per training example (Controller).
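The beam handling for an untyped set — branch into every child, rank by prediction score, keep the top K paths — can be sketched as follows (illustrative additive scoring; the actual framework scores whole partial ASTs):

```python
def prune_beam(paths, k):
    # paths: list of (score, partial_ast) pairs; keep the K best.
    return sorted(paths, key=lambda p: p[0], reverse=True)[:k]

def expand_untyped(beam, candidates, k):
    # Every path in the beam branches into every candidate child of the
    # untyped set; the child's score is accumulated, then the beam is
    # pruned back to size K.
    new_paths = [(score + c_score, ast + [c_name])
                 for score, ast in beam
                 for c_name, c_score in candidates]
    return prune_beam(new_paths, k)
```

With K = 1 this degenerates to greedily taking the best-scoring child at every step, which matches the beam-size-1 setting discussed in the experiments.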
In this way, the overall predicted abstract syntax tree is obtained.
For step S24, since the training data is prepared in advance and the corresponding gold abstract syntax tree is known, the model can be trained by back-propagating the error between the predicted and gold trees; other training schemes are also possible and are not limited here. Training continues until the predicted abstract syntax tree approaches the gold tree, yielding a trained text-to-SQL model.
As can be seen from this embodiment, the top-down grammar decoding process is treated as a combinatorial structured set prediction problem: a degree of randomness is introduced in the node selection on the decoder input side, and the decoder autonomously decides the current best action according to its predicted output, which alleviates the over-fitting caused by poor compositional generalization. Meanwhile, since the traversal paths of an abstract syntax tree cannot all be enumerated, the idea of beam decoding is borrowed to maintain a set of the best historical paths, enlarging the exploration space during training. More accurate predicted SQL is obtained, and the approach applies not only to text-to-SQL but can also be transferred to other structured decoding tasks.
The above embodiment is validated experimentally. The evaluation metrics are:
Exact set match without values (EM): this metric measures the equivalence of two SQL queries by comparing each component. The prediction is correct only if every fine-grained unit is correct. Ordering is ignored, e.g., SELECT col1, col2 equals SELECT col2, col1. However, EM only checks the sketch and ignores values.
Exact set and value match (EMV): on top of EM, this metric further checks the correctness of the SQL values, without actually executing the query. It may be overly strict, because different SQL queries can be semantically equivalent.
Execution accuracy (EX): this measures accuracy by comparing the execution results of the two SQL queries. To alleviate the problem of spurious programs, the predicted SQL is executed and checked across multiple databases; this setup is followed and results are reported accordingly.
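The order-insensitive comparison used by EM and the cross-database execution check of EX can be illustrated with a small sketch (sqlite3 as a stand-in backend; the benchmarks' official evaluators are more elaborate):

```python
import sqlite3

def select_columns_match(pred_cols, gold_cols):
    # EM ignores ordering, so SELECT col1, col2 equals SELECT col2, col1;
    # compare the column lists as multisets.
    return sorted(pred_cols) == sorted(gold_cols)

def execution_match(pred_sql, gold_sql, db_paths):
    # EX: the prediction counts as correct only if it returns the same
    # result set as the gold query on every database, which reduces the
    # chance that a spurious program passes by coincidence.
    for path in db_paths:
        conn = sqlite3.connect(path)
        try:
            pred_rows = sorted(conn.execute(pred_sql).fetchall())
            gold_rows = sorted(conn.execute(gold_sql).fetchall())
        finally:
            conn.close()
        if pred_rows != gold_rows:
            return False
    return True
```

Sorting the fetched rows makes the comparison insensitive to result ordering, matching the set-style comparison used by the metrics.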
For the datasets, Spider is selected, a large-scale cross-domain zero-shot text-to-SQL benchmark comprising 8659 training examples and 1034 validation examples spanning 146 databases. The test set is hidden, containing 2147 samples and 40 databases; the model is submitted to the challenge organizers for evaluation. EM and EX are reported in the experiments. DuSQL is a large-scale, practical, cross-domain Chinese dataset for the zero-shot text-to-SQL task, with 22521/2482 training/validation examples over 177 databases. Participants have access only to the inputs of the 3759 test samples covering 23 databases; the gold SQL is not available. EM and EMV are used to evaluate the model of the present method.
For the experimental configuration, a 300-dimensional word-vector model and a PLM (pre-trained language model) are used for Spider, with the framework's default beam size during training set to 4 in the word-vector setting. For evaluation, if a canonical order has been specified for each type set, it is followed directly; otherwise, the search space is enlarged using the exploration method.
The main results for the Spider dataset are shown in fig. 6. The heart marker indicates accuracy taken from the original code repository, because it is not shown on the leaderboard. The main results for the DuSQL dataset are shown in fig. 7. The present framework achieves a significant improvement on DuSQL, exceeding the state-of-the-art data augmentation method (+Aug) by 15.9% on the EMV metric. On Spider, competitive performance is achieved on the development set, lower only than the PICARD model, which uses a very large PLM.
The performance of different ordering methods for typed/untyped sets, and of their combinations, is also analyzed. For brevity, in what follows TS denotes Typed Set, UTS denotes UnTyped Set, and C/R/P denote the Controller, Random, and Probe (exploration) methods, respectively. All results of the method are averaged over at least three random seeds to reduce randomness. For time and memory considerations, GloVe word vectors are used for the Spider dataset.
For the same-order setting, the ordering methods of TS and UTS are first unified; fig. 8 shows the evaluation for TS and UTS. The cross symbol denotes evaluation on the test set; otherwise the development set is used. For the C (Controller) method, a canonical order is randomly sampled for each TS, which is tightly coupled with a grammar rule, and a canonical order is randomly sampled for each UTS of each training sample. These orders are then fixed throughout training.
As shown in FIG. 8: (1) method R, which randomly selects a particular order in each iteration during training, outperforms the other methods. (2) Fine-tuning the pre-trained model with reinforcement learning (+RL) yields only limited improvement over the baseline TS-C + UTS-C. (3) Disappointingly, the performance of method P, which leaves the CS-Seq2Set problem to the model itself, degrades significantly. To investigate why, the training curves of methods P and R are plotted in figs. 9 and 10. For R, after each training epoch the model is run in evaluation mode using method P to track order changes: the number of samples whose order has changed since the previous epoch is recorded, and the accumulated count is further split by TS and UTS. Comparing the curves leads to the following conclusions: (1) the R model converges quickly after a few epochs. (2) This is not the case for P, especially for TS. Each training sample is observed to select a customized order for the same type set, but the samples never reach a consensus. This interaction causes each training sample to oscillate among several orders, and the optimal traversal order never converges. This phenomenon may be called "order collapse".
To solve this problem, a canonical order (TS-C) is specified for each TS before training, and only the ordering method for UTS is varied. For the precedence order of each TS, a heuristic approach is used: instead of randomly sampling an order per TS, the stored TS-R + UTS-R model is re-run on the training set in evaluation mode. By counting how often each order of a TS occurs, the most frequent one is selected. Fig. 11 lists part of the canonical predicted orders.
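The frequency-based choice of a canonical order per typed set amounts to a simple count (hypothetical data layout: each observation is the order the stored model produced for one training sample):

```python
from collections import Counter

def canonical_orders(observations):
    # observations: dict mapping a typed-set identifier to the list of
    # orders (each order a tuple of type names) observed when re-running
    # the stored TS-R + UTS-R model over the training set. For each TS,
    # the most frequent order is selected as canonical.
    return {ts: Counter(orders).most_common(1)[0][0]
            for ts, orders in observations.items()}
```

The selected orders are then fixed before training, playing the role of the TS-C canonical order described above.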
The most encouraging observation, shown in fig. 12, is that UTS-P begins to outperform R once the order of each TS is fixed per sample. When the training curves are plotted again, the UTS eventually converges, with the last segment resolved by TS-C (as shown in FIG. 13). Even with a beam size of 1 during training, its performance is consistently better than UTS-R; increasing the beam size to 4 yields even better results. The irregular fluctuations on the DuSQL development set can be attributed to the small average size of its untyped sets.
A more aggressive attempt combines the advantages of TS method R and UTS method P. The results are shown in FIG. 14. Compared with the baseline TS-C + UTS-C, this combination gives the best performance so far, achieving a 4.1/4.6 EM/EX improvement on Spider and a 2.2/2.1 EM/EMV improvement on DuSQL. This is also the strategy adopted for the main results of the method.
In this method, a gold-tree-oriented learning framework is proposed, formulating the top-down grammar-based text-to-SQL task as a combinatorial structured sequence problem. Experiments show that randomly selecting nodes from typed sets as decoder input, while letting the model choose output actions for untyped sets according to its own predictions, accords with the empirical findings and improves training.
Fig. 15 is a schematic structural diagram of a system for converting text into SQL according to an embodiment of the present invention, which can execute the method for converting text into SQL according to any of the above embodiments and is configured in a terminal.
The system 10 for converting text into SQL provided by the embodiment includes: a node candidate set determination program module 11, an action distribution determination program module 12, a syntax tree construction program module 13 and a conversion program module 14.
The node candidate set determining program module 11 is configured to determine a node candidate set for generating an abstract syntax tree from the question text and the database tables and columns; the action distribution determining program module 12 is configured to determine a node randomly selected from the node candidate set as the head node of the abstract syntax tree, and input the head node to a decoder to obtain the action distribution of the head node; the syntax tree construction program module 13 is configured to determine, based on the action distribution, the multiple nodes into which the node at the current step can be expanded at the next step, judge the corresponding node selection scheme according to those nodes, determine from the expandable nodes a next-step child node and its action distribution to expand the abstract syntax tree, until no next-step child node can be determined, so as to obtain a final abstract syntax tree that avoids over-fitting; the conversion program module 14 is configured to convert the question text into the corresponding SQL statement based on the final abstract syntax tree.
The embodiment of the invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions which can execute the text-to-SQL method in any method embodiment;
as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
determining a node candidate set for generating an abstract syntax tree through a problem text and a database table and a column;
determining any node randomly selected from the node candidate set as a head node of the abstract syntax tree, and inputting the head node into a decoder to obtain the action distribution of the head node;
determining a plurality of nodes which can be expanded at the next moment of the node at the current moment based on the action distribution, judging a corresponding node selection scheme according to the nodes, determining a next-moment sub-node from the plurality of expandable nodes and corresponding action distribution to expand the abstract syntax tree until the next-moment sub-node cannot be determined, and obtaining a final abstract syntax tree which avoids over-fitting;
and converting the question text into a corresponding SQL statement based on the final abstract syntax tree.
Fig. 16 is a schematic structural diagram of a training system for converting a text into an SQL model according to an embodiment of the present invention, which can execute the training method for converting a text into an SQL model according to any of the above embodiments and is configured in a terminal.
The training system 20 for converting text into SQL model provided by the embodiment includes: a training data input program module 21, an action distribution determination program module 22, a syntax tree estimation program module 23 and a training program module 24.
The training data input program module 21 is configured to input a training node candidate set, determined from the training question text and the database tables and columns, into the text-to-SQL model; the action distribution determining program module 22 is configured to determine a node randomly selected from the node candidate set as the head node of the abstract syntax tree, and decode the head node with the decoder of the text-to-SQL model to obtain the action distribution of the head node; the syntax tree estimation program module 23 is configured to determine, based on the action distribution, the multiple expandable nodes at the current step, determine from them a next-step child node and its action distribution to expand the abstract syntax tree, and input the next-step child node to the decoder to expand the tree again, until the child node of the next-step child node cannot be determined, so as to obtain a predicted abstract syntax tree; the training program module 24 is configured to train the text-to-SQL model against over-fitting based on the difference between the gold abstract syntax tree corresponding to the training question text and the predicted abstract syntax tree, until the predicted tree approaches the gold tree.
The embodiment of the invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions which can execute the training method of the text-to-SQL model in any method embodiment;
as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
inputting a training node candidate set determined by a training problem text and a database table and columns into the text-to-SQL model;
determining any node randomly selected from the node candidate set as a head node of the abstract syntax tree, and decoding the head node by using a decoder of the text-to-SQL model to obtain action distribution of the head node;
determining a plurality of nodes which can be expanded at the next moment of the node at the current moment based on the motion distribution, judging a corresponding node selection scheme according to the nodes, determining a next-moment sub-node and corresponding motion distribution from the plurality of expandable nodes to expand the abstract syntax tree, inputting the next-moment sub-node into the decoder again to expand the abstract syntax tree until the sub-node of the next-moment sub-node cannot be determined, and obtaining a pre-estimated abstract syntax tree;
and performing over-fitting-avoiding training on the text-to-SQL model based on the difference between the real abstract syntax tree corresponding to the training problem text and the predicted abstract syntax tree until the predicted abstract syntax tree approaches the real abstract syntax tree.
As a non-volatile computer-readable storage medium, it may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. The one or more program instructions stored in the non-transitory computer-readable storage medium, when executed by a processor, perform the text-to-SQL method and the training method of the text-to-SQL model in any of the method embodiments described above.
Fig. 17 is a schematic diagram of the hardware structure of an electronic device for performing the text-to-SQL method and the training method of the text-to-SQL model according to another embodiment of the present application. As shown in fig. 17, the electronic device includes:
one or more processors 1710, and a memory 1720, with one processor 1710 being illustrated in fig. 17. The equipment of the text-to-SQL method can also comprise: an input device 1730 and an output device 1740.
The processor 1710, memory 1720, input device 1730, and output device 1740 may be connected by a bus or other means, such as being connected by a bus in fig. 17.
Memory 1720, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the text-to-SQL method and the training method of the text-to-SQL model in the embodiments of the present application. The processor 1710 executes various functional applications and data processing of the server by running nonvolatile software programs, instructions and modules stored in the memory 1720, that is, implementing the text-to-SQL method and the training method of the text-to-SQL model according to the above embodiments of the method.
The memory 1720 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data and the like. Further, the memory 1720 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 1720 may optionally include memory located remotely from the processor 1710, which may be connected to the mobile device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 1730 may receive input numeric or character information. The output device 1740 may include a display device such as a display screen.
The one or more modules are stored in the memory 1720 and, when executed by the one or more processors 1710, perform the text-to-SQL method and the training method of the text-to-SQL model in any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the steps of the text-to-SQL method and the training method of the text-to-SQL model according to any of the embodiments of the invention.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) mobile communication devices, which are characterized by mobile communication capabilities and are primarily targeted at providing voice and data communications. Such terminals include smart phones, multimedia phones, functional phones, and low-end phones, among others.
(2) The ultra-mobile personal computer equipment belongs to the category of personal computers, has calculation and processing functions and generally has the characteristic of mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as tablet computers.
(3) Portable entertainment devices such devices may display and play multimedia content. The devices comprise audio and video players, handheld game consoles, electronic books, intelligent toys and portable vehicle-mounted navigation devices.
(4) Other electronic devices with data processing capabilities.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (12)

1. A method for converting text into SQL comprises the following steps:
determining a node candidate set for generating an abstract syntax tree through a problem text and a database table and a column;
determining any node randomly selected from the node candidate set as a head node of the abstract syntax tree, and inputting the head node into a decoder to obtain the action distribution of the head node;
determining a plurality of nodes which can be expanded at the next moment of the node at the current moment based on the action distribution, judging a corresponding node selection scheme according to the nodes, determining a next-moment sub-node from the plurality of expandable nodes and corresponding action distribution to expand the abstract syntax tree until the next-moment sub-node cannot be determined, and obtaining a final abstract syntax tree which avoids over-fitting;
and converting the question text into a corresponding SQL statement based on the final abstract syntax tree.
2. The method of claim 1, wherein the node comprises: typed nodes and untyped nodes;
when the node is a node with a type, the determining, according to the node selection scheme corresponding to the node judgment, a child node at the next moment from the plurality of expandable nodes to expand the abstract syntax tree includes:
and reasoning the sub-node at the next moment from the paths in the extensible nodes based on a controller scheme or a random scheme in a preset sequence or an exploration scheme based on enumeration.
3. The method of claim 2, wherein, when the node is a type-free node, the determining a next-time child node from the plurality of expandable nodes to expand the abstract syntax tree according to the corresponding node selection scheme comprises:
and deducing a sub-node at the next moment from the paths in the extensible nodes based on a controller scheme or a random scheme in a preset sequence or an exploration scheme based on beam searching.
4. The method of claim 1, wherein the determining a set of node candidates for generating an abstract syntax tree from the question text and the database tables and columns comprises:
determining, by an encoder, vector representations of all words in the question text;
determining, by the encoder, a vector representation of the database table and a vector representation of a column;
and determining the vector representations of all the words, the vector representation of the database table and the vector representation of the column to a node candidate set of an abstract syntax tree.
5. The method of claim 2, wherein the method further comprises:
and when the node is a typed node, reasoning the sub-node at the next moment from the paths in the extensible nodes by utilizing a random scheme.
6. The method of claim 3, wherein the method further comprises:
and when the node is a non-type node, reasoning the sub-node at the next moment from paths in the extensible multiple nodes by utilizing an exploration scheme based on beam search.
7. The method of claim 1, wherein said converting the question text to a corresponding SQL statement based on the final abstract syntax tree comprises:
decoding a node tree of nodes traversed from the final abstract syntax tree;
obtaining an action tree formed by action distribution corresponding to the nodes;
and determining a corresponding SQL statement through the action tree.
8. A training method for converting text to SQL model includes:
inputting a training node candidate set determined by a training problem text and a database table and columns into the text-to-SQL model;
determining any node randomly selected from the node candidate set as a head node of the abstract syntax tree, and decoding the head node by using a decoder of the text-to-SQL model to obtain action distribution of the head node;
determining a plurality of nodes which can be expanded at the next moment of the node at the current moment based on the action distribution, judging a corresponding node selection scheme according to the nodes, determining a next-moment sub-node and corresponding action distribution from the plurality of expandable nodes to expand the abstract syntax tree, inputting the next-moment sub-node into the decoder again to expand the abstract syntax tree until the sub-node of the next-moment sub-node cannot be determined, and obtaining a predicted abstract syntax tree;
and performing over-fitting-avoiding training on the text-to-SQL model based on the difference between the real abstract syntax tree corresponding to the training problem text and the predicted abstract syntax tree until the predicted abstract syntax tree approaches the real abstract syntax tree.
9. A system for text to SQL, comprising:
a node candidate set determining program module for determining a node candidate set for generating an abstract syntax tree through a problem text, a database table and a column;
a motion distribution determining program module for determining any node randomly selected from the node candidate set as a head node of the abstract syntax tree, and inputting the head node to a decoder to obtain motion distribution of the head node;
a syntax tree construction program module, configured to determine, based on the action distribution, the multiple nodes into which the node at the current step can be expanded at the next step, judge the corresponding node selection scheme according to those nodes, determine from the expandable nodes a next-step child node and its corresponding action distribution to expand the abstract syntax tree, until no next-step child node can be determined, so as to obtain a final abstract syntax tree that avoids over-fitting;
and the conversion program module is used for converting the question text into a corresponding SQL statement based on the final abstract syntax tree.
10. A system for training a text-to-SQL model, comprising:
a training data input program module for inputting, into the text-to-SQL model, a training node candidate set determined from a training question text and the tables and columns of a database;
an action distribution determining program module for determining a node randomly selected from the node candidate set as the head node of the abstract syntax tree, and decoding the head node with the decoder of the text-to-SQL model to obtain the action distribution of the head node;
a syntax tree prediction program module for determining, based on the action distribution, a plurality of nodes to which the node at the current time can be expanded at the next time, judging the corresponding node selection scheme according to those nodes, determining from the plurality of expandable nodes the child node at the next time and its corresponding action distribution so as to expand the abstract syntax tree, and inputting that child node to the decoder to expand the abstract syntax tree again, until no child node of the next-time child node can be determined, thereby obtaining a predicted abstract syntax tree; and
a training program module for training the text-to-SQL model to avoid over-fitting based on the difference between the ground-truth abstract syntax tree corresponding to the training question text and the predicted abstract syntax tree, until the predicted abstract syntax tree approaches the ground-truth abstract syntax tree.
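The training criterion in the claim above, the difference between the predicted and ground-truth abstract syntax trees, can be illustrated with a toy tree distance. This is a hypothetical stand-in: `tree_distance` (a symmetric difference over node labels) is not the patent's actual loss function, only a sketch of what "difference between trees" could mean.

```python
from collections import Counter

def flatten(tree):
    """Yield every node label in a nested {node: [children]} tree."""
    for node, children in tree.items():
        yield node
        for child in children:
            yield from flatten(child)

def tree_distance(predicted, gold):
    """Symmetric difference of the two trees' node multisets:
    0 when the predicted tree matches the ground truth."""
    p, g = Counter(flatten(predicted)), Counter(flatten(gold))
    return sum(((p - g) + (g - p)).values())

# Ground-truth tree vs. a prediction with one wrong leaf.
gold = {"select_stmt": [{"select_clause": [{"column": []}]}]}
pred = {"select_stmt": [{"select_clause": [{"table": []}]}]}
```

Training would repeat prediction and parameter updates until this difference approaches zero, i.e. until the predicted tree approaches the ground-truth tree.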
11. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1-8.
12. A storage medium on which a computer program is stored which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN202111596478.1A 2021-12-24 2021-12-24 Method and system for converting text into SQL Pending CN114282497A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111596478.1A CN114282497A (en) 2021-12-24 2021-12-24 Method and system for converting text into SQL

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111596478.1A CN114282497A (en) 2021-12-24 2021-12-24 Method and system for converting text into SQL

Publications (1)

Publication Number Publication Date
CN114282497A true CN114282497A (en) 2022-04-05

Family

ID=80875617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111596478.1A Pending CN114282497A (en) 2021-12-24 2021-12-24 Method and system for converting text into SQL

Country Status (1)

Country Link
CN (1) CN114282497A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116108058A (en) * 2023-04-13 2023-05-12 炫彩互动网络科技有限公司 Automatic generation method of commit query language based on transition conversion system for code warehouse

Similar Documents

Publication Publication Date Title
US10956464B2 (en) Natural language question answering method and apparatus
US10769552B2 (en) Justifying passage machine learning for question and answer systems
CN108647205B (en) Fine-grained emotion analysis model construction method and device and readable storage medium
Mairesse et al. Stochastic language generation in dialogue using factored language models
US20190073357A1 (en) Hybrid classifier for assigning natural language processing (nlp) inputs to domains in real-time
US20130226846A1 (en) System and Method for Universal Translating From Natural Language Questions to Structured Queries
US9286396B2 (en) Query expansion and query-document matching using path-constrained random walks
US20140298199A1 (en) User Collaboration for Answer Generation in Question and Answer System
CN106815252A (en) A kind of searching method and equipment
US20100228742A1 (en) Categorizing Queries and Expanding Keywords with a Coreference Graph
Xiong et al. Knowledge graph question answering with semantic oriented fusion model
US10713429B2 (en) Joining web data with spreadsheet data using examples
US8473486B2 (en) Training parsers to approximately optimize NDCG
US20140317074A1 (en) Automatic Taxonomy Construction From Keywords
US20230094730A1 (en) Model training method and method for human-machine interaction
CN110851584A (en) Accurate recommendation system and method for legal provision
JP3428554B2 (en) Semantic network automatic creation device and computer readable recording medium
CN114282497A (en) Method and system for converting text into SQL
Thaiprayoon et al. Graph and centroid-based word clustering
CN110929501B (en) Text analysis method and device
CN113297854A (en) Method, device and equipment for mapping text to knowledge graph entity and storage medium
Tang et al. A dynamic answering path based fusion model for KGQA
CN112214511A (en) API recommendation method based on WTP-WCD algorithm
Simões et al. When speed has a price: Fast information extraction using approximate algorithms
JPH06274546A (en) Information quantity matching degree calculation system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination