CN114780577A - SQL statement generation method, device, equipment and storage medium - Google Patents

SQL statement generation method, device, equipment and storage medium

Info

Publication number
CN114780577A
Authority
CN
China
Prior art keywords
candidate value
sql statement
database
value set
preset
Prior art date
Legal status
Pending
Application number
CN202210344221.5A
Other languages
Chinese (zh)
Inventor
孙翀
朱柏尔
帖军
尹帆
郑禄
王江晴
徐科
雷炜豪
Current Assignee
South Central Minzu University
Original Assignee
South Central University for Nationalities
Priority date
Filing date
Publication date
Application filed by South Central University for Nationalities
Priority to CN202210344221.5A
Publication of CN114780577A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/243 Natural language query formulation
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of data processing and discloses a method, a device, equipment and a storage medium for generating an SQL statement. The method comprises acquiring a natural language query question, a database schema and database contents input by a user, and then inputting the natural language query question, the database schema and the database contents into a preset SLSQL model to generate an SQL statement. In the prior art, Text2SQL converts a user's natural language input into executable SQL statements, but the generated statements perform poorly on complex questions. Compared with the prior art, the method makes better use of the database contents and generates complex and nested SQL statements through the preset SLSQL model.

Description

SQL statement generation method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating an SQL statement.
Background
In recent years, with the rapid development of the mobile internet and artificial intelligence industries, data assets have become an indispensable part of enterprise and personal life, and technologies for mining and analyzing data have become a focus of attention in the big data era. However, mainstream data analysis software today still requires users to operate it through professional database commands (such as SQL statements), which undoubtedly raises the cost of using data analysis tools. Automatic Text2SQL technology has emerged in response. As a natural language processing technology, it aims to parse a user's natural language input into the corresponding SQL query statement, giving data analysis tools artificial intelligence and natural language understanding capabilities. However, owing to the complexity of user intentions and the diversity of databases in real scenarios, few Text2SQL systems are currently available on the market, and they often perform poorly in cross-domain settings with complex questions. How to generate SQL statements accurately has therefore become an urgent problem.
The above is only for the purpose of assisting understanding of the technical solution of the present invention, and does not represent an admission that the above is the prior art.
Disclosure of Invention
The main object of the invention is to provide a method, a device, equipment and a storage medium for generating an SQL statement, so as to solve the technical problem of how to generate SQL statements accurately.
In order to achieve the above object, the present invention provides a method for generating an SQL statement, including the following steps:
acquiring a natural language query question input by a user, a database schema and database content;
and inputting the natural language query question, the database mode and the database content into a preset SLSQL model to generate an SQL statement.
Optionally, the preset SLESQL model includes a preset entity recognition model, a preset coding model and a preset decoding model;
the step of inputting the natural language query question, the database schema and the database content into a preset SLSQL model to generate an SQL statement specifically includes:
inputting the natural language query question, the database schema and the database content into the preset entity recognition model to obtain a candidate value set;
inputting the candidate value set into the preset coding model to obtain a coded candidate value set;
and inputting the coded candidate value set to the preset decoding model to generate an SQL statement.
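The three-stage flow above can be sketched as a composition of three callables. This is a minimal sketch under stated assumptions: `Text2SQLInput`, `generate_sql` and the toy stubs are hypothetical names, and the stubs merely stand in for the trained entity recognition, coding and decoding models, which the source does not specify at code level.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Text2SQLInput:
    """Hypothetical container for the three inputs named in the method."""
    question: str                                # natural language query question
    schema: Dict[str, List[str]]                 # table name -> column names
    contents: Dict[Tuple[str, str], List[str]]   # (table, column) -> stored values


def generate_sql(
    inp: Text2SQLInput,
    recognize: Callable[[str, Dict, Dict], Any],  # stands in for the entity recognition model
    encode: Callable[[Any], Any],                 # stands in for the coding model
    decode: Callable[[Any], str],                 # stands in for the decoding model
) -> str:
    candidates = recognize(inp.question, inp.schema, inp.contents)  # candidate value set
    encoded = encode(candidates)                                     # encoded candidate value set
    return decode(encoded)                                           # SQL statement


# Toy stubs for illustration only; they are not the trained models.
def toy_recognize(question, schema, contents):
    # Keep (table, column, value) triples whose value appears verbatim in the question.
    return [(t, c, v) for (t, c), vals in contents.items() for v in vals if v in question]


def toy_decode(encoded):
    table, column, value = encoded[0]
    return f"SELECT * FROM {table} WHERE {column} = '{value}'"


example = Text2SQLInput(
    question="Show flights from Chicago",
    schema={"flight": ["origin", "destination", "price"]},
    contents={("flight", "origin"): ["Chicago", "Boston"]},
)
sql = generate_sql(example, toy_recognize, lambda c: c, toy_decode)
print(sql)  # SELECT * FROM flight WHERE origin = 'Chicago'
```

Passing the three stages as parameters keeps the pipeline independent of how each model is actually built.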
Optionally, the step of inputting the natural language query question, the database schema, and the database content into the preset entity recognition model to obtain a candidate value set specifically includes:
inputting the natural language query question into the preset entity recognition model to obtain a candidate value set;
determining a candidate value from the set of candidate values, the database content, and the database schema;
determining a set of candidate values from the candidate values and the database schema.
Optionally, the step of determining a candidate value according to the candidate value set, the database content, and the database schema specifically includes:
determining a target set of candidate values from the set of candidate values, the database content, and the database schema;
and verifying the target candidate value set according to the number of candidate values in the target candidate value set, the number of values in the database content, and a preset similarity threshold, to obtain the candidate values.
Optionally, the step of determining a target candidate value set according to the candidate value set, the database content, and the database schema specifically includes:
acquiring the word-embedding text distance between each candidate value in the candidate value set and the database content;
when the word-embedding text distance meets a preset text distance condition, obtaining an initial candidate value set;
acquiring the edit distance between each initial candidate value in the initial candidate value set and the database schema;
and when the edit distance meets a preset edit distance condition, obtaining a target candidate value set.
Optionally, the step of inputting the candidate value set into the preset coding model to obtain a coded candidate value set specifically includes:
obtaining a matching type corresponding to the candidate value set;
marking the candidate value set according to the matching type to obtain a marked candidate value set;
and inputting the marked candidate value set into the preset coding model to obtain a coded candidate value set.
Optionally, the preset decoding model includes: a skeleton decoder and a detail decoder;
the step of inputting the encoded candidate value set to the preset decoding model to generate an SQL statement specifically includes:
inputting the encoded candidate value set into the skeleton decoder to obtain a statement skeleton;
and inputting the statement skeleton and the encoded candidate value set into the detail decoder to obtain the SQL statement.
In addition, to achieve the above object, the present invention further provides an SQL statement generating apparatus, including:
the information acquisition module is configured to acquire a natural language query question, a database schema and database content input by a user;
and the SQL statement generation module is configured to input the natural language query question, the database schema and the database content into a preset SLSQL model to generate an SQL statement.
In addition, to achieve the above object, the present invention further provides an SQL statement generating device, which includes a memory, a processor, and an SQL statement generating program stored in the memory and executable on the processor, the SQL statement generating program being configured to implement the steps of the SQL statement generation method described above.
In addition, to achieve the above object, the present invention further provides a storage medium storing an SQL statement generating program which, when executed by a processor, implements the steps of the SQL statement generation method described above.
The method acquires a natural language query question, a database schema and database contents input by a user, and then inputs them into a preset SLSQL model to generate an SQL statement. In the prior art, Text2SQL converts a user's natural language input into executable SQL statements, but the generated statements perform poorly on complex questions. Compared with the prior art, the method makes better use of the database contents and generates complex and nested SQL statements through the preset SLSQL model.
Drawings
FIG. 1 is a schematic structural diagram of an SQL statement generating device in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a first embodiment of the SQL statement generation method according to the present invention;
FIG. 3 is a schematic flowchart of a second embodiment of the SQL statement generation method according to the present invention;
FIG. 4 is a structural block diagram of a first embodiment of the SQL statement generating apparatus according to the present invention.
The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an SQL statement generating device in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the SQL statement generation device may include a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as disk storage. Optionally, the memory 1005 may also be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the structure shown in fig. 1 does not limit the SQL statement generation device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in fig. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module, and an SQL statement generation program.
In the SQL statement generation device shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The processor 1001 and the memory 1005 may be disposed in the SQL statement generation device, which invokes, through the processor 1001, the SQL statement generation program stored in the memory 1005 and executes the SQL statement generation method provided in the embodiments of the present invention.
Based on the above SQL statement generation device, an embodiment of the present invention provides an SQL statement generation method, and referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of the SQL statement generation method according to the present invention.
In this embodiment, the SQL statement generation method includes the following steps:
step S10: acquiring a natural language query question input by a user, a database schema and database content;
it should be noted that the execution subject of this embodiment may be a computing service device with data processing, network communication and program running functions, such as a mobile phone, a tablet computer or a personal computer, or another electronic device or SQL statement generation device capable of implementing the above functions. This embodiment and the following embodiments are described using the SQL statement generation device as an example.
It is understood that a natural language query question is a query question entered by a user, for example: "Show origin and destination for flights with price higher than 300". In this embodiment, natural language query questions are written in English.
It should be understood that the database schema may include the rows and columns of the database, while the database content refers to the values stored in the database.
Step S20: inputting the natural language query question, the database schema and the database content into a preset SLESQL model to generate an SQL statement.
It should be noted that the preset SLESQL model is a model configured in advance that can generate SQL statements.
In this embodiment, the natural language query question, the database schema, and the database content input by the user are acquired and then input into the preset SLESQL model to generate the SQL statement. In the prior art, Text2SQL converts a user's natural language input into executable SQL statements, but the generated statements perform poorly on complex questions. Compared with the prior art, the method makes better use of the database contents and generates complex and nested SQL statements through the preset SLESQL model.
Referring to fig. 3, fig. 3 is a schematic flowchart of a method for generating an SQL statement according to a second embodiment of the present invention.
Based on the first embodiment described above, in the present embodiment, the step S20 includes:
step S201: inputting the natural language query question, the database schema and the database content into the preset entity recognition model to obtain a candidate value set;
it should be noted that the preset SLESQL model in this embodiment may include a preset entity recognition model, a preset coding model, and a preset decoding model, each of which is configured in advance.
It will be appreciated that the candidate value set may include the candidate values associated with the natural language query question, together with the rows and columns corresponding to those candidate values.
Further, in order to determine the candidate value set accurately, in this embodiment, step S201 includes: inputting the natural language query question into the preset entity recognition model to obtain a candidate value set; determining candidate values according to the candidate value set, the database content, and the database schema; and determining the candidate value set according to the candidate values and the database schema.
It is understood that the preset entity recognition model in this embodiment may be a TENER model, which adapts the original Transformer to the named entity recognition (NER) task and uses a modified Transformer encoder to model both character-level and word-level features.
It should be understood that this embodiment also uses some simple but effective heuristics to extract candidate values: 1) content in quotation marks, for example: "Which head's name has the substring 'Ha'?"; 2) capitalized terms, for example: "Show all flight numbers with Airbus A340-300 planes"; 3) single letters, for example: "When is the hire date of employees whose names do not include the letter M?"
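The three heuristics can be approximated with regular expressions and a token scan. This is a rough sketch, not the embodiment's implementation: the function name `heuristic_candidates` and the exact patterns are assumptions, and they will miss many phrasings.

```python
import re
from typing import List


def heuristic_candidates(question: str) -> List[str]:
    """Extract candidate values with three simple heuristics (illustrative sketch)."""
    candidates: List[str] = []

    # 1) Content in quotation marks. The look-arounds skip apostrophes inside
    #    words such as "head's".
    candidates += re.findall(r"(?<!\w)['\"]([^'\"]+)['\"](?!\w)", question)

    # 2) Capitalized terms: runs of consecutive tokens that start with an
    #    uppercase letter, ignoring the sentence-initial word.
    run: List[str] = []
    for i, token in enumerate(question.split()):
        word = token.strip("?,.!")
        if i > 0 and word[:1].isupper():
            run.append(word)
        elif run:
            candidates.append(" ".join(run))
            run = []
    if run:
        candidates.append(" ".join(run))

    # 3) Single letters, as in "... the letter M".
    candidates += re.findall(r"\bletter\s+([A-Za-z])\b", question)

    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(candidates))


print(heuristic_candidates("Which head's name has the substring 'Ha'?"))            # ['Ha']
print(heuristic_candidates("Show all flight numbers with Airbus A340-300 planes"))   # ['Airbus A340-300']
```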
In a specific implementation, the natural language query question is input into the preset entity recognition model to obtain a candidate value set containing a plurality of candidate values, where a candidate value is a value related to an entity in the natural language query question. For example, if the natural language query question contains the entity 'Xiaoming', the candidate value set may include candidate values such as 'Mingyu' and 'Mingfeng'.
Further, in order to determine the candidate values accurately, in this embodiment, the step of determining candidate values according to the candidate value set, the database content, and the database schema specifically includes: determining a target candidate value set according to the candidate value set, the database content, and the database schema; and verifying the target candidate value set according to the number of candidate values in the target candidate value set, the number of values in the database content, and a preset similarity threshold, to obtain the candidate values.
It should be noted that the target candidate value set is the set obtained by screening all candidate values in the candidate value set.
It will be appreciated that, after the target candidate value set is determined, the number of candidate values in it and the number of values in the database content can be obtained. For example, if the target candidate value set contains 60 candidate values and the database content contains 100 values, the ratio between them (0.6) is too large, indicating that too many candidate values have been selected; the similarity threshold then needs to be reset until the ratio falls below a preset ratio.
It should be understood that the similarity threshold may include the preset text distance condition and the preset edit distance condition; changing these two distance conditions reduces the ratio between the number of candidate values in the target candidate value set and the number of values in the database content, and this constitutes the verification process.
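The verification step can be sketched as a loop that tightens the threshold until the ratio falls below the preset ratio. This sketch collapses the two distance conditions into a single similarity score, which is a simplification; the parameter values (`threshold=0.5`, `max_ratio=0.3`, `step=0.05`) are hypothetical, since the source does not give them.

```python
from typing import Dict, List, Tuple


def verify_candidates(
    similarity: Dict[str, float],  # candidate -> similarity score in [0, 1] (assumed scale)
    num_db_values: int,            # number of values in the database content
    threshold: float = 0.5,        # hypothetical starting similarity threshold
    max_ratio: float = 0.3,        # hypothetical preset ratio
    step: float = 0.05,            # hypothetical tightening step
) -> Tuple[List[str], float]:
    """Raise the threshold until len(kept) / num_db_values < max_ratio."""
    if num_db_values <= 0:
        raise ValueError("database content must contain at least one value")
    while True:
        kept = [c for c, s in similarity.items() if s >= threshold]
        if len(kept) / num_db_values < max_ratio or threshold >= 1.0:
            return kept, threshold
        threshold = min(1.0, threshold + step)


# 60 candidates against 100 database values, mirroring the example in the text.
scores = {f"strong_{i}": 0.9 for i in range(20)}
scores.update({f"weak_{i}": 0.6 for i in range(40)})
kept, final_threshold = verify_candidates(scores, num_db_values=100)
print(len(kept))  # 20
```

The `threshold >= 1.0` guard guarantees the loop terminates even if every candidate has a perfect score.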
In a specific implementation, this embodiment may also verify the target candidate value set by exact matching instead of similarity. Exact matching has limits, however, as the following two examples show: 'List the top 3 albums of Elton John in the Billboard charts' and 'Find all albums of Elton John starting with "goodbye"'. In these cases, this embodiment would not find '3' or "goodbye" in the database contents. In the first example, the value 3 is not part of the database but is used in the SQL query to limit the results. In the second example, the term "goodbye" requires a wildcard match. Wildcard matching, however, is not suitable for validating candidate values, because it is too permissive and produces more false candidates.
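The two failure cases can be reproduced with Python's built-in `sqlite3` module on a toy table. The table and rows are made up for illustration; the point is only that exact matching misses both values, a prefix wildcard recovers "goodbye", and a loose wildcard accepts every row, which is the over-permissiveness the text warns about.

```python
import sqlite3

# Toy table; the rows are illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (title TEXT, artist TEXT)")
conn.executemany(
    "INSERT INTO album VALUES (?, ?)",
    [
        ("Goodbye Yellow Brick Road", "Elton John"),
        ("Madman Across the Water", "Elton John"),
        ("Honky Chateau", "Elton John"),
    ],
)


def exact_hits(value: str) -> int:
    """Number of titles exactly equal to the value."""
    return conn.execute("SELECT COUNT(*) FROM album WHERE title = ?", (value,)).fetchone()[0]


def wildcard_hits(pattern: str) -> int:
    """Number of titles matching a LIKE pattern (case-insensitive for ASCII in SQLite)."""
    return conn.execute("SELECT COUNT(*) FROM album WHERE title LIKE ?", (pattern,)).fetchone()[0]


print(exact_hits("3"), exact_hits("goodbye"))  # 0 0: neither value is stored verbatim
print(wildcard_hits("goodbye%"))               # 1: the prefix match finds the album
print(wildcard_hits("%a%"))                    # 3: a loose pattern accepts every row
```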
Further, in order to determine the target candidate value set accurately, in this embodiment, the step of determining the target candidate value set according to the candidate value set, the database content and the database schema specifically includes: acquiring the word-embedding text distance between each candidate value in the candidate value set and the database content; when the word-embedding text distance meets a preset text distance condition, obtaining an initial candidate value set; acquiring the edit distance between each initial candidate value in the initial candidate value set and the database schema; and when the edit distance meets a preset edit distance condition, obtaining a target candidate value set.
It should be noted that word embedding is a form of word representation in which words with similar meanings have similar representations. The word-embedding text distance is the text distance between each candidate value in the candidate value set and each database value in the database content. The preset text distance condition can be set according to the actual situation, and this embodiment does not specifically limit it.
It can be understood that, in this embodiment, all candidate values that meet the preset text distance condition can be selected from the candidate value set to form the initial candidate value set, which constitutes the first screening.
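A minimal sketch of the first screening, assuming cosine distance between embedding vectors as the "word-embedding text distance" (the source does not name the metric) and a hypothetical threshold `max_distance`. The two-dimensional vectors are made up; a real system would use pre-trained word vectors.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def initial_candidate_set(
    candidates: List[str],
    db_values: List[str],
    embed: Dict[str, Sequence[float]],
    max_distance: float = 0.2,  # hypothetical preset text distance condition
) -> List[str]:
    """First screening: keep candidates close to at least one database value."""
    return [
        c for c in candidates
        if any(cosine_distance(embed[c], embed[v]) <= max_distance for v in db_values)
    ]


# Toy embeddings for illustration only.
embed = {"NYC": [1.0, 0.1], "New York": [0.98, 0.12], "banana": [0.0, 1.0]}
print(initial_candidate_set(["NYC", "banana"], ["New York"], embed))  # ['NYC']
```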
It should be understood that the edit distance is a quantitative measure of how different two character strings (e.g., English strings) are, namely the minimum number of single-character edit operations (insertions, deletions and substitutions) required to turn one string into the other. In this embodiment, the edit distance is computed between each initial candidate value in the initial candidate value set and the rows and columns in the database schema. The preset edit distance condition can be set according to the actual situation, and this embodiment does not specifically limit it.
In a specific implementation, in this embodiment, all candidate values that meet the preset edit distance condition may be selected from the initial candidate value set to form a target candidate value set, so as to perform re-screening.
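The second screening can be sketched with the classic Levenshtein dynamic program. The threshold `max_edits=2` is a hypothetical stand-in for the preset edit distance condition, and comparing lowercased strings is an added assumption.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def target_candidate_set(initial: List[str], schema_names: List[str], max_edits: int = 2) -> List[str]:
    """Second screening: keep initial candidates within max_edits of some schema name."""
    return [
        c for c in initial
        if any(edit_distance(c.lower(), n.lower()) <= max_edits for n in schema_names)
    ]


print(edit_distance("kitten", "sitting"))  # 3
print(target_candidate_set(["prices", "destination", "banana"], ["price", "destination", "origin"]))
# ['prices', 'destination']
```

Keeping only two rows of the table reduces memory from O(len(a) * len(b)) to O(len(b)).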
In addition, this embodiment can also generate the target candidate value set in two other ways. The first way uses handcrafted heuristics, because databases tend to implement certain data types in specific, recurring ways. This embodiment currently uses the following heuristics: (1) gender values are typically implemented as a VARCHAR column containing "F" or "M"; (2) Boolean data types are typically implemented as columns containing the values 0 and 1; (3) ordinals are typically implemented as numeric columns, usually integer columns; (4) months (e.g., August) are typically part of a full date column and can be found using wildcards (e.g., 8/%). Although these simple handcrafted heuristics do not generalize to databases in every domain, a generative model can be guided to learn such patterns in a more dynamic manner. The second way uses n-grams: this embodiment applies n-grams to words that are tagged multiple times, that is, when consecutive words are tagged as candidate column names or candidate values, they are combined into an n-gram.
Specifically, the three ways may be used alone or in any combination of two or three, and this embodiment does not limit this.
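Both extra ways can be sketched in a few lines. The word lists below are hypothetical and deliberately small, the month pattern assumes a month/day/year date format like the source's `8/%` example, and `tagged_ngrams` assumes the tagging step has already produced one boolean per token.

```python
from typing import List

# Hypothetical lexicons for the handcrafted heuristics (1)-(4).
GENDER = {"female": "F", "woman": "F", "women": "F", "male": "M", "man": "M", "men": "M"}
BOOLEAN = {"true": "1", "yes": "1", "false": "0", "no": "0"}
ORDINALS = {"first": 1, "second": 2, "third": 3}
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}


def heuristic_values(word: str) -> List[str]:
    """Map a question word to database-style candidate values."""
    w = word.lower()
    if w in GENDER:
        return [GENDER[w]]         # (1) VARCHAR column holding "F"/"M"
    if w in BOOLEAN:
        return [BOOLEAN[w]]        # (2) column holding 0/1
    if w in ORDINALS:
        return [str(ORDINALS[w])]  # (3) integer column
    if w in MONTHS:
        return [f"{MONTHS[w]}/%"]  # (4) wildcard over a month/day/year date column
    return []


def tagged_ngrams(tokens: List[str], tagged: List[bool]) -> List[str]:
    """Join runs of two or more consecutive tagged tokens into n-gram candidates."""
    grams: List[str] = []
    run: List[str] = []
    for token, is_tagged in zip(tokens, tagged):
        if is_tagged:
            run.append(token)
            continue
        if len(run) > 1:
            grams.append(" ".join(run))
        run = []
    if len(run) > 1:
        grams.append(" ".join(run))
    return grams


print(heuristic_values("August"), heuristic_values("female"))  # ['8/%'] ['F']
print(tagged_ngrams(["flights", "from", "New", "York", "in", "August"],
                    [False, False, True, True, False, True]))  # ['New York']
```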
Step S202: inputting the candidate value set into the preset coding model to obtain a coded candidate value set;
further, in order to accurately obtain the encoded candidate value set, in this embodiment, the step S202 includes: obtaining a matching type corresponding to the candidate value set; marking the candidate value set according to the matching type to obtain a marked candidate value set; and inputting the marked candidate value set into the preset coding model to obtain a coded candidate value set.
It should be noted that the match type in this embodiment may include a full match class and a candidate value match class: if a table or column name completely matches a word in the natural language query question, it is classified into the full match class; if it matches only partially, it is classified into the candidate value match class.
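A minimal sketch of the marking step, assuming word-level comparison after lowercasing and splitting snake_case names; the label strings `"full_match"` and `"candidate_value_match"` are hypothetical names for the two classes described above.

```python
from typing import Dict, List, Optional


def match_type(name: str, question: str) -> Optional[str]:
    """Classify a table/column name against the question (illustrative rules)."""
    words = name.lower().replace("_", " ").split()
    tokens = [t.strip("?,.!\"'").lower() for t in question.split()]
    n = len(words)
    # Full match: the whole name occurs as a contiguous phrase in the question.
    if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
        return "full_match"
    # Partial match: at least one word of the name occurs in the question.
    if any(w in tokens for w in words):
        return "candidate_value_match"
    return None


def tag_schema(names: List[str], question: str) -> Dict[str, Optional[str]]:
    """Attach a match-type label (or None) to every schema item."""
    return {name: match_type(name, question) for name in names}


q = "Show origin and destination for flights with price higher than 300"
print(tag_schema(["origin", "price", "destination_airport", "airline"], q))
# {'origin': 'full_match', 'price': 'full_match', 'destination_airport': 'candidate_value_match', 'airline': None}
```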
It can be understood that this embodiment assigns a label to each word in the input natural language query question, with classes such as table, column, value, aggregation operation and superlative; these labels become an important basis when synthesizing the query. This embodiment also performs character-based exact matching against all words in the database schema and database contents, which provides prior knowledge to the neural network. If the neural network received only the natural language query question and the database schema, much additional information would be lost. For example, to know whether a table name or column name in the database schema is mentioned in the natural language query question, the neural network must be given hints about table names and column names. More complex relationships, such as whether "higher" refers to price or height, are established in the encoder of the neural network.
In a specific implementation, when the candidate value set is tagged, every candidate value in it is marked as belonging to the full match class or the candidate value match class, and the resulting tagged candidate value set is input into the preset coding model. The preset coding model of this embodiment is based on a pre-trained Transformer, an architecture also used in the recent Text2SQL system RAT-SQL. The Transformer architecture has been used for tasks such as machine translation and natural language generation, and more recently for entity recognition; this attention-only structure represents natural language sequences better than the classical Recurrent Neural Network (RNN). Consequently, in many natural language processing tasks, the Transformer generally yields better results than traditional neural networks. The preset coding model of this embodiment extends the IRNet encoder. The main difference is that the SLESQL encoder takes as input not only information about the database schema but also candidate values extracted from the database contents. The preset coding model of this embodiment can therefore also learn the correlation between tokens of the natural language query question and actual values in the database contents.
Step S203: and inputting the coded candidate value set to the preset decoding model to generate an SQL statement.
Further, in order to accurately generate the SQL statement, in this embodiment, the step S203 includes: inputting the encoded candidate value set into the skeleton decoder to obtain a statement skeleton; and inputting the statement skeleton and the encoded candidate value set into the detail decoder to obtain the SQL statement.
It should be noted that the preset decoding model in the present embodiment may include a skeleton decoder and a detail decoder.
In a specific implementation, the preset decoding model in this embodiment receives as input the question, table, column and value encodings from the preset coding model and outputs a synthesized SemQL query. The preset decoding model consists of an LSTM architecture and multiple pointer networks used to select tables, columns and values. This embodiment uses a grammar-based decoder (TRANX) that models the generation process of SemQL queries with an LSTM, and adopts a coarse-to-fine framework that decomposes the decoding of a SemQL query into two stages. In the first stage, the skeleton decoder outputs the skeleton of the SemQL query. In the second stage, the detail decoder fills in the missing details of the skeleton by selecting columns and tables, and combines the candidate value set with the skeleton to complete the SQL statement.
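The two-stage data flow can be illustrated with rule-based stand-ins. This is not the embodiment's decoder: the real skeleton and detail decoders are an LSTM with pointer networks producing SemQL, whereas here `skeleton_decoder` is a hypothetical keyword rule emitting SQL tokens with `<col>`/`<tab>`/`<val>` slots, and the `ranked` lists play the role of hypothetical pointer-network rankings.

```python
from typing import Dict, List


def skeleton_decoder(question: str) -> List[str]:
    """Toy stand-in for stage 1: choose a skeleton with typed slots."""
    if "higher than" in question:
        return ["SELECT", "<col>", ",", "<col>", "FROM", "<tab>",
                "WHERE", "<col>", ">", "<val>"]
    return ["SELECT", "<col>", "FROM", "<tab>"]


def detail_decoder(skeleton: List[str], ranked: Dict[str, List[str]]) -> str:
    """Toy stand-in for stage 2: fill each slot with the next-ranked candidate of that type."""
    queues = {slot: list(items) for slot, items in ranked.items()}
    out: List[str] = []
    for token in skeleton:
        if token in queues:
            if not queues[token]:
                raise ValueError(f"no candidate left for slot {token}")
            out.append(queues[token].pop(0))
        else:
            out.append(token)
    return " ".join(out).replace(" , ", ", ")


question = "Show origin and destination for flights with price higher than 300"
ranked = {"<col>": ["origin", "destination", "price"], "<tab>": ["flight"], "<val>": ["300"]}
print(detail_decoder(skeleton_decoder(question), ranked))
# SELECT origin, destination FROM flight WHERE price > 300
```

Separating the skeleton from the slot filling means the second stage only has to choose among schema items and candidate values, not produce SQL structure.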
In this embodiment, the natural language query question, the database schema and the database contents are input into the preset entity recognition model to obtain a candidate value set; the candidate value set is then input into the preset coding model to obtain an encoded candidate value set, which is input into the preset decoding model to generate an SQL statement. The candidate value set is extracted by the preset entity recognition model, and the preset encoder and preset decoder then use it to synthesize complex and nested SQL statements.
Referring to fig. 4, fig. 4 is a block diagram of the SQL statement generating apparatus according to the first embodiment of the present invention.
As shown in fig. 4, the apparatus for generating an SQL statement according to the embodiment of the present invention includes:
an information acquisition module 10, configured to acquire a natural language query question, a database schema and database content input by a user;
the SQL statement generating module 20 is configured to input the natural language query question, the database schema, and the database content into a preset SLESQL model to generate an SQL statement.
In this embodiment, the natural language query question, the database schema, and the database content input by the user are acquired and then input into the preset SLESQL model to generate an SQL statement. In the prior art, Text2SQL converts a user's natural-language input into an executable SQL statement, but the resulting SQL statements perform poorly on complex questions. By contrast, this embodiment makes better use of the database content and generates complex and nested SQL statements through the preset SLSQL model.
It should be noted that the workflow described above is merely illustrative and does not limit the scope of protection of the present invention. In practical applications, those skilled in the art may select some or all of it as needed to achieve the purpose of the solution of this embodiment; no limitation is imposed here.
In addition, for technical details not described in this embodiment, refer to the SQL statement generation method provided in any embodiment of the present invention; they are not repeated here.
Based on the first embodiment of the SQL statement generation apparatus of the present invention, a second embodiment of the SQL statement generation apparatus of the present invention is provided.
In this embodiment, the preset SLESQL model includes a preset entity recognition model, a preset coding model, and a preset decoding model. The SQL statement generation module 20 is further configured to input the natural language query question, the database schema, and the database content into the preset entity recognition model to obtain a candidate value set; input the candidate value set into the preset coding model to obtain an encoded candidate value set; and input the encoded candidate value set into the preset decoding model to generate an SQL statement.
Further, the SQL statement generation module 20 is further configured to input the natural language query question into the preset entity recognition model to obtain a candidate value set; determine candidate values according to the candidate value set, the database content, and the database schema; and determine the candidate value set according to the candidate values and the database schema.
Further, the SQL statement generation module 20 is further configured to determine a target candidate value set according to the candidate value set, the database content, and the database schema. It then verifies the target candidate value set according to the number of candidate values in the target candidate value set, the number of values in the database content, and a preset similarity threshold to obtain the candidate values.
Further, the SQL statement generation module 20 is further configured to obtain the word-embedding text distance between each candidate value in the candidate value set and the database content. When the word-embedding text distance satisfies a preset text distance condition, it obtains an initial candidate value set. It then obtains the edit distance between each initial candidate value in the initial candidate value set and the database schema. When the edit distance satisfies a preset edit distance condition, it obtains a target candidate value set.
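The two filtering passes above can be sketched as follows. The source does not specify the embedding model, the distance measures, or the "preset conditions". This sketch therefore makes several assumptions. It averages word vectors from a toy `vectors` dictionary, which stands in for pretrained embeddings. It uses cosine distance as the word-embedding text distance and Levenshtein distance as the edit distance. It treats "distance ≤ threshold" as the condition for both passes. The function names and the `max_text_dist`/`max_edit_dist` parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free when equal)
        prev = cur
    return prev[-1]


def embed(text: str, vectors: Dict[str, Sequence[float]]) -> List[float]:
    """Assumption: a text embedding is the average of its known word vectors (zeros if none)."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0  # an empty embedding is treated as maximally distant
    return 1.0 - sum(x * y for x, y in zip(u, v)) / (nu * nv)


def target_candidates(candidates: List[str], content_values: List[str], schema_names: List[str],
                      vectors: Dict[str, Sequence[float]],
                      max_text_dist: float, max_edit_dist: int) -> List[str]:
    # Pass 1 (assumed condition): embedding distance to the closest stored value <= max_text_dist.
    initial = [c for c in candidates
               if min((cosine_distance(embed(c, vectors), embed(v, vectors))
                       for v in content_values), default=math.inf) <= max_text_dist]
    # Pass 2 (assumed condition): edit distance to the closest schema name <= max_edit_dist.
    return [c for c in initial
            if min((levenshtein(c.lower(), s.lower()) for s in schema_names),
                   default=math.inf) <= max_edit_dist]
```

The cheaper semantic pass runs first, so the quadratic edit-distance computation only runs on the initial candidate set.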
Further, the SQL statement generation module 20 is further configured to obtain a matching type corresponding to the candidate value set; tag the candidate value set according to the matching type to obtain a tagged candidate value set; and input the tagged candidate value set into the preset coding model to obtain an encoded candidate value set.
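The source does not list the matching types. The sketch below assumes a simple exact/partial string match against the stored cell values. The names `MatchType`, `tag_candidates`, and `to_encoder_input`, and the bracketed tag format, are hypothetical. Each candidate is tagged by how it matches the database content, and the tagged candidates are then serialized as encoder input.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class MatchType(Enum):
    EXACT = "exact"      # candidate equals a stored cell value (case-insensitive)
    PARTIAL = "partial"  # candidate is contained in a cell value, or vice versa
    NONE = "none"        # no match against the database content


def match_type(candidate: str, cell_values: Iterable[str]) -> MatchType:
    """Assumed rule set: exact match first, then substring containment in either direction."""
    c = candidate.strip().lower()
    if not c:
        return MatchType.NONE
    cells = [v.strip().lower() for v in cell_values]
    if c in cells:
        return MatchType.EXACT
    if any(c in v or v in c for v in cells if v):
        return MatchType.PARTIAL
    return MatchType.NONE


def tag_candidates(candidates: Iterable[str], cell_values: Iterable[str]) -> List[Tuple[str, MatchType]]:
    cells = list(cell_values)  # materialize once so it can be scanned per candidate
    return [(cand, match_type(cand, cells)) for cand in candidates]


def to_encoder_input(tagged: List[Tuple[str, MatchType]]) -> str:
    # Hypothetical serialization: prefix each candidate with its tag before encoding.
    return " ".join(f"[{t.value.upper()}] {cand}" for cand, t in tagged)
```

For example, with the stored values `france` and `New York City`, the candidates `France`, `New York`, and `Mars` are tagged EXACT, PARTIAL, and NONE respectively.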
Further, the preset decoding model includes a skeleton decoder and a detail decoder. The SQL statement generation module 20 is further configured to input the encoded candidate value set into the skeleton decoder to obtain a statement skeleton, and to input the statement skeleton and the encoded candidate value set into the detail decoder to obtain the SQL statement.
For other embodiments or specific implementations of the SQL statement generation apparatus of the present invention, refer to the method embodiments above; they are not repeated here.
In addition, an embodiment of the present invention further provides a storage medium storing an SQL statement generation program which, when executed by a processor, implements the steps of the SQL statement generation method described above.
It should be noted that, herein, the terms "comprises," "comprising," and any other variants thereof are intended to cover a non-exclusive inclusion. A process, method, article, or system that comprises a list of elements therefore includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that comprises the element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention may be embodied as a software product. The software product is stored in a storage medium (e.g., ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present invention.

Claims (10)

1. An SQL statement generation method, characterized by comprising the following steps:
acquiring a natural language query question input by a user, a database schema, and database content;
and inputting the natural language query question, the database schema, and the database content into a preset SLSQL model to generate an SQL statement.
2. The SQL statement generation method according to claim 1, wherein the preset SLESQL model comprises a preset entity recognition model, a preset coding model, and a preset decoding model;
the step of inputting the natural language query question, the database schema and the database content into a preset SLSQL model to generate an SQL statement specifically includes:
inputting the natural language query question, the database schema, and the database content into the preset entity recognition model to obtain a candidate value set;
inputting the candidate value set into the preset coding model to obtain an encoded candidate value set;
and inputting the encoded candidate value set into the preset decoding model to generate an SQL statement.
3. The SQL statement generating method according to claim 2, wherein the step of inputting the natural language query question, the database schema, and the database content into the preset entity recognition model to obtain a candidate value set specifically comprises:
inputting the natural language query question into the preset entity recognition model to obtain a candidate value set;
determining a candidate value from the set of candidate values, the database content, and the database schema;
determining a set of candidate values from the candidate values and the database schema.
4. The SQL statement generation method of claim 3, wherein the step of determining a candidate value from the set of candidate values, the database content, and the database schema comprises:
determining a target candidate value set from the candidate value set, the database content, and the database schema;
and verifying the target candidate value set according to the number of candidate values in the target candidate value set, the number of values in the database content, and a preset similarity threshold to obtain the candidate values.
5. The SQL statement generation method according to claim 4, wherein the step of determining a target set of candidate values from the set of candidate values, the database content and the database schema comprises:
acquiring a word-embedding text distance between each candidate value in the candidate value set and the database content;
when the word-embedding text distance satisfies a preset text distance condition, obtaining an initial candidate value set;
acquiring an edit distance between each initial candidate value in the initial candidate value set and the database schema;
and when the edit distance satisfies a preset edit distance condition, obtaining a target candidate value set.
6. The SQL statement generation method according to claim 2, wherein the step of inputting the candidate value set into the preset coding model to obtain an encoded candidate value set specifically comprises:
obtaining a matching type corresponding to the candidate value set;
tagging the candidate value set according to the matching type to obtain a tagged candidate value set;
and inputting the tagged candidate value set into the preset coding model to obtain an encoded candidate value set.
7. The SQL statement generation method according to claim 2, wherein the preset decoding model comprises a skeleton decoder and a detail decoder;
the step of inputting the encoded candidate value set to the preset decoding model to generate an SQL statement specifically includes:
inputting the encoded candidate value set into the skeleton decoder to obtain a statement skeleton;
and inputting the statement skeleton and the encoded candidate value set into the detail decoder to obtain the SQL statement.
8. An SQL statement generation apparatus, characterized in that the SQL statement generation apparatus comprises:
the information acquisition module is configured to acquire a natural language query question input by a user, a database schema, and database content;
and the SQL statement generating module is configured to input the natural language query question, the database schema, and the database content into a preset SLSQL model to generate an SQL statement.
9. An SQL statement generation apparatus, comprising: a memory, a processor, and an SQL statement generation program stored on the memory and executable on the processor, the SQL statement generation program being configured to implement the steps of the SQL statement generation method according to any of claims 1 to 7.
10. A storage medium having stored thereon an SQL statement generation program which, when executed by a processor, implements the steps of the SQL statement generation method according to any one of claims 1 to 7.
CN202210344221.5A 2022-04-02 2022-04-02 SQL statement generation method, device, equipment and storage medium Pending CN114780577A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210344221.5A CN114780577A (en) 2022-04-02 2022-04-02 SQL statement generation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114780577A (en) 2022-07-22

Family

ID=82427232

Country Status (1)

Country Link
CN (1) CN114780577A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117591543A (en) * 2024-01-19 2024-02-23 成都工业学院 SQL sentence generation method and device for Chinese natural language
CN117591543B (en) * 2024-01-19 2024-04-02 成都工业学院 SQL sentence generation method and device for Chinese natural language

Similar Documents

Publication Publication Date Title
CN112818093B (en) Evidence document retrieval method, system and storage medium based on semantic matching
CN111191275A (en) Sensitive data identification method, system and device
CN111198948A (en) Text classification correction method, device and equipment and computer readable storage medium
CN112084381A (en) Event extraction method, system, storage medium and equipment
CN112100401B (en) Knowledge graph construction method, device, equipment and storage medium for science and technology services
CN112163424A (en) Data labeling method, device, equipment and medium
CN117743315B (en) Method for providing high-quality data for multi-mode large model system
CN110750297B (en) Python code reference information generation method based on program analysis and text analysis
US20080184107A1 (en) Method and apparatus for creating a tool for generating an index for a document
CN106030568A (en) Natural language processing system, natural language processing method, and natural language processing program
CN116070632A (en) Informal text entity tag identification method and device
CN114780582A (en) Natural answer generating system and method based on form question and answer
Liu et al. Convolutional neural networks-based locating relevant buggy code files for bug reports affected by data imbalance
CN112036189A (en) Method and system for recognizing gold semantic
CN117709465A (en) Key information extraction method based on large language model
Flynn et al. Automated template-based metadata extraction architecture
CN115510188A (en) Text keyword association method, device, equipment and storage medium
CN115374354A (en) Scientific and technological service recommendation method, device, equipment and medium based on machine learning
CN114780577A (en) SQL statement generation method, device, equipment and storage medium
Pengcheng et al. Fast Chinese calligraphic character recognition with large-scale data
CN113377844A (en) Dialogue type data fuzzy retrieval method and device facing large relational database
EP4258107A1 (en) Method and system for automated discovery of artificial intelligence and machine learning assets in an enterprise
CN112668281A (en) Automatic corpus expansion method, device, equipment and medium based on template
CN115408506B (en) NL2SQL method combining semantic analysis and semantic component matching
Rajbhoj et al. DocToModel: automated authoring of models from diverse requirements specification documents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination