CN117112590A - Method for generating a structured query language and data query device
- Publication number: CN117112590A (application number CN202310526550.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2452—Query translation
- G06F16/24522—Translation of natural language queries to structured queries
Abstract
The application provides a method and a data query device for generating structured query language (SQL) statements. According to the mapping relationships, recorded in stored historical query information, between questions historically input by users and the correct SQL statements historically generated for them, the candidate examples that best match the question currently input by the user are retrieved from the historical query records. The current question and the candidate examples are then used as the input of a large language model, and the large language model outputs the SQL statement corresponding to the current question. The method provided by the application can conveniently and accurately obtain candidate examples corresponding to the question currently input by the user, thereby improving the accuracy of the SQL statement generated for that question.
Description
Technical Field
Embodiments of the application relate to the technical field of natural language processing, and in particular to a method for generating a structured query language and a data query device.
Background
The structured query language (structured query language, SQL) is a database query and programming language for accessing data and for querying, updating, and managing relational database systems. SQL is the standardized query language for relational databases and can perform operations such as data definition, data query, and data control on a target database. When performing a query with SQL, a user generally does not need to know how the data in the database is defined and stored; knowing the structure of a table is enough to query the required information from it.
Natural language to structured query language (natural language to structured query language, NL2SQL) refers to a technique that converts natural language into an SQL query. Its purpose is to convert natural-language queries into machine-readable SQL queries so that data can be retrieved from a relational database. For example, in practical applications, a user may directly input a question; an SQL statement corresponding to the question is generated in the background, an operation request is sent to the database using that SQL statement, the database server processes the request, and the query result is returned to the user. In this process, whether the SQL statement corresponding to the user's question can be generated accurately affects the accuracy of the query result.
Therefore, how to improve the accuracy of the SQL statement generated for a question input by the user has become a technical problem to be solved.
Disclosure of Invention
The application provides a method for generating a structured query language, which can improve the accuracy of the generated SQL statement.
In a first aspect, a method for generating a structured query language is provided. The method may be performed by a data query device, by a component of the data query device (such as a chip or a circuit), or by each server or virtual machine in a cloud service system; this is not limited here.
The method comprises the following steps: storing historical query information, where the historical query information comprises a first mapping relationship between first information and a first structured query language (SQL) statement, the first information comprises a question historically input by a user, and the first SQL statement is the correct SQL statement historically generated for that historically input question; screening the historical query information according to second information to obtain candidate examples, where the second information comprises the question currently input by the user, the candidate examples comprise screened first mapping relationships, and the similarity between the first information and the second information in a screened first mapping relationship is greater than or equal to a first threshold; generating third information according to the second information and the candidate examples, where the third information comprises the second information and the candidate examples; and generating, according to a large language model and the third information, a second SQL statement corresponding to the second information, where the input of the large language model is the third information and the output of the large language model is the second SQL statement.
Specifically, the similarity between questions may be measured, for example, as follows: the question currently input by the user and the stored historically input questions may be vectorized using one-hot encoding or word embeddings, and the similarity may then be compared via their inner product, via a distance metric, via a neural network, or by other methods.
In the application, historical query information can be stored, recording the mapping relationships between questions historically input by users and the correct SQL statements historically generated for them. The candidate examples that best match the question currently input by the user are then found in the historical query records, the current question and the candidate examples are used as the input of a large language model, and the large language model outputs the SQL statement corresponding to the current question. The method provided by the application can conveniently and accurately obtain candidate examples corresponding to the question currently input by the user, thereby improving the accuracy of the SQL statement generated for that question.
Optionally, the second information further includes the table schema corresponding to the question currently input by the user; that is, the second information is specifically the current question together with its corresponding table schema. The first information then further includes the table schema corresponding to the historically input question, in which case the historical query information records the mapping relationship between the historically input question, its corresponding table schema, and the first structured query language (SQL) statement.
Specifically, when screening based on the table schema, screening can be performed by setting rules, and the user may edit various rules for filtering, for example: the table names are the same; the column names of the tables are the same; the tables have a lineage relationship; or tables with different names are associated through the same primary/foreign keys. The table schemas corresponding to historically input questions can be scored by these rules, and a record can be determined to be a candidate example when its score is greater than or equal to a certain threshold (e.g., the first threshold). As another example, user-defined rules may screen on attributes of the table schemas, such as scoring the table schemas corresponding to historically input questions based on the quality of the table, the date of the table, and so on; when the score is greater than or equal to a certain threshold (e.g., the first threshold), the record can be determined to be a candidate example.
In the application, when the historical query information is screened, screening can be performed not only on the question currently input by the user but also on the table schema corresponding to that question, which further ensures the match between the screened candidate examples and the current question and thereby ensures the accuracy of the generated SQL statement.
With reference to the first aspect, in one possible implementation, after obtaining the candidate examples, the method further includes: determining target examples among the candidate examples. Generating the third information according to the second information and the candidate examples then includes: generating the third information according to the second information and the target examples, where the third information comprises the question currently input by the user and the target examples.
In the method, the candidate examples can be further screened after they are obtained so as to obtain target examples among them. Redundant information in the candidate examples can be removed while the diversity of the target examples is preserved, which saves computing resources and improves the richness of the obtained target examples.
With reference to the first aspect, in one possible implementation, the obtained candidate examples include a plurality of candidate examples, each of which includes a screened first mapping relationship, i.e., a mapping relationship between a screened historically input question and a screened first SQL statement. Determining target examples among the candidate examples includes: screening the plurality of first SQL statements included in the plurality of first mapping relationships corresponding to the plurality of candidate examples according to the differences among them, to obtain a plurality of target SQL statements whose mutual differences are greater than or equal to a second threshold; and determining a plurality of target examples according to the target SQL statements and the screened historically input questions corresponding to them, where each of the target examples includes the mapping relationship between a screened historically input question and a target SQL statement.
In the present application, when judging the differences between the respective SQL statements, the difference between SQL statements can be determined by, for example, comparing the contents of their respective fields.
With reference to the first aspect, in one possible implementation, the obtained candidate examples include a plurality of candidate examples, each of which includes a screened first mapping relationship, i.e., a mapping relationship between a screened historically input question, the table schema corresponding to that question, and a screened first SQL statement. Determining target examples among the candidate examples includes: screening the plurality of table schemas included in the plurality of first mapping relationships corresponding to the plurality of candidate examples according to the differences among them, to obtain a plurality of target table schemas whose mutual differences are greater than or equal to a third threshold; and determining a plurality of target examples according to the target table schemas and the screened historically input questions and screened first SQL statements corresponding to them, where each of the plurality of target examples includes the mapping relationship between a screened historically input question, a target table schema, and a screened first SQL statement.
With reference to the first aspect, in one possible implementation, the obtained candidate examples include a plurality of candidate examples, each of which includes a screened first mapping relationship, i.e., a mapping relationship between a screened historically input question, the table schema corresponding to that question, and a screened first SQL statement. Determining target examples among the candidate examples includes: screening the plurality of first SQL statements included in the plurality of first mapping relationships corresponding to the plurality of candidate examples according to the differences among them, to obtain a plurality of target SQL statements whose mutual differences are greater than or equal to a second threshold; and determining a plurality of target examples according to the target SQL statements, the screened historically input questions corresponding to them, and the table schemas corresponding to those questions, where each of the plurality of target examples includes the mapping relationship between a screened historically input question, a screened table schema, and a target SQL statement.
It should be appreciated that when the candidate examples are all similar to one another, screening by difference can be understood as ultimately retaining only one or two candidate examples, which are the target examples. This process can be understood as removing redundant examples, and it reduces resource occupation.
According to the method and device of the application, after the candidate examples are obtained, they can be screened according to the differences between the SQL statements and/or the differences between the table schemas, so that the target examples better match the question currently input by the user, or better match the current question together with its corresponding table schema, which improves the accuracy of the generated SQL statement.
In a second aspect, the present application proposes a data query device for performing the method of the first aspect. In particular, the device may comprise units and/or modules, such as a transceiver unit and/or a processing unit, for performing the method of the first aspect.
In a third aspect, a data query device is provided, the device comprising: at least one processor configured to execute the computer program or instructions stored in a memory to perform the method of the first aspect. Optionally, the device further comprises the memory for storing the computer program or instructions. Optionally, the device further comprises a communication interface through which the processor reads the computer program or instructions stored in the memory.
In one implementation, the device is a data query device for implementing the method of generating SQL statements.
In another implementation, the device is a chip, a system-on-chip, or a circuit for implementing the method of generating SQL statements.
In a fourth aspect, the present application provides a processor comprising: input circuit, output circuit and processing circuit. The processing circuit is configured to receive a signal via the input circuit and transmit a signal via the output circuit, such that the processor performs the method of the first aspect.
In a specific implementation process, the processor may be one or more chips, the input circuit may be an input pin, the output circuit may be an output pin, and the processing circuit may be a transistor, a gate circuit, a flip-flop, various logic circuits, and the like. The input signal received by the input circuit may be received and input by, for example and without limitation, a transceiver, the output signal output by the output circuit may be output to and transmitted by, for example and without limitation, a transmitter, and the input circuit and the output circuit may be the same circuit, which functions as the input circuit and the output circuit, respectively, at different times. The embodiment of the application does not limit the specific implementation modes of the processor and various circuits.
Operations such as transmitting and acquiring/receiving related to the processor may, unless otherwise stated or contradicted by the actual function or inherent logic in the related description, be understood as outputting, receiving, and inputting operations of the processor, or as transmitting and receiving operations performed by a radio-frequency circuit and an antenna; the present application is not limited in this respect.
In a fifth aspect, a processing device is provided that includes a processor and a memory. The processor is configured to read instructions stored in the memory and to receive signals via the transceiver and to transmit signals via the transmitter to perform the method of the first aspect described above.
Optionally, the processor is one or more, and the memory is one or more.
Alternatively, the memory may be integrated with the processor or the memory may be separate from the processor.
In a specific implementation process, the memory may be a non-transient (non-transitory) memory, for example, a Read Only Memory (ROM), which may be integrated on the same chip as the processor, or may be separately disposed on different chips.
It should be appreciated that related data interaction processes, for example transmitting indication information, may be processes in which the processor outputs the indication information, and receiving capability information may be a process in which the processor receives the input capability information. Specifically, data output by the processor may be output to the transmitter, and input data received by the processor may come from the transceiver. The transmitter and the receiver may be collectively referred to as a transceiver.
The processing device in the fifth aspect described above may be one or more chips. The processor in the processing device may be implemented in hardware or in software. When implemented in hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general-purpose processor, implemented by reading software code stored in a memory, which may be integrated in the processor, or may reside outside the processor, and exist separately.
In a sixth aspect, a cluster of computing devices is provided, comprising at least one computing device, each computing device comprising a processor and a memory; the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method of the first aspect or any one of the possible implementations of the first aspect.
In the alternative, the processor may be a general purpose processor, and may be implemented in hardware or in software. When implemented in hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general-purpose processor, implemented by reading software code stored in a memory, which may be integrated in the processor, or may reside outside the processor, and exist separately.
In a seventh aspect, a computer-readable storage medium is provided, storing program code to be executed by a device, the program code comprising instructions for performing the method of the first aspect described above.
In an eighth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.
In a ninth aspect, there is provided a chip system comprising a processor for calling and running a computer program from a memory, such that a device in which the chip system is installed performs the method of the first aspect described above.
Drawings
Fig. 1 is a schematic diagram of a data query device architecture to which the present application is applicable.
FIG. 2 is a schematic flow chart of a method 200 of generating SQL statements provided by the application.
FIG. 3 is another illustrative flow chart for generating SQL statements provided by the application.
Fig. 4 is a schematic block diagram of a data querying device 100 provided by the present application.
Fig. 5 is a schematic block diagram of a data querying device 200 provided by the present application.
Fig. 6 is a schematic architecture diagram of a computing device cluster according to an embodiment of the present application.
Fig. 7 is a schematic diagram of a connection between computing devices 600A and 600B via a network provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
In order to facilitate understanding of the technical solution of the present application, the following description will simply refer to individual technical terms related to the present application.
1. Large language model (large language model, LLM)
A large language model is a natural language processing model built on deep learning technology that takes text as input and predicts missing parts of that text. Large language models are typically trained on large amounts of text data, such as news, blogs, and Wikipedia articles from the internet. These models can generate natural-language text, answer natural-language questions, and perform tasks such as machine translation and speech recognition, and are widely applied in the language processing field. Currently, some large technology companies such as Google and Microsoft have built very powerful large language models, which have very strong language understanding and generation capabilities, can be used for various language processing tasks, and show excellent performance in many fields. The term "large language model" as used herein also covers other machine learning models that may appear in the future.
2. Chat generative pre-trained transformer model
The chat generative pre-trained transformer model is one of the most advanced natural language processing (natural language processing, NLP) models at present. It is a machine-learning model based on a language model that can be used to generate natural-language text; it has strong language understanding capability and can acquire strong text understanding and generation capability by learning from different data sets. LLMs represented by the chat generative pre-trained transformer model can assist humans in completing various tasks, thereby reducing people's workload and improving production efficiency. In various fields of natural language processing, the efficiency and accuracy of the chat generative pre-trained transformer model are widely recognized. For example, it may be used in fields such as machine translation, automatic summarization, text error correction, and speech generation.
3. Structured query language (structured query language, SQL)
SQL is a database language with functions such as data manipulation and data definition. The language is interactive and provides great convenience to users, and a database management system should make full use of SQL to improve the working quality and efficiency of a computer application system. SQL statements can be used to manipulate a database, for example: create items (create), query content (read), update content (update), and delete items (delete). Typically, the general format of an SQL statement includes the following clauses: "select", "from", "where", "group by", "order by".
4. Natural language to structured query language (natural language to structured query language, NL2 SQL)
NL2SQL is a technique that converts a user's natural-language sentence into an executable SQL statement, and is of great significance for improving the interaction between users and databases. NL2SQL is essentially a subtask in the field of semantic parsing: it converts the user's natural-language sentence into a canonical semantic representation that can be understood and executed by a computer. The application fields of NL2SQL mainly include: (1) intelligent search engines: with NL2SQL technology, data in a relational database can be queried using the user's natural language, improving the intelligence of the search engine; (2) chat robots: NL2SQL technology enables intelligent chat robots that automatically recognize natural-language queries input by users and retrieve relevant information from a database; (3) database query: NL2SQL technology can support a database management system, allowing users to query data in the database using natural language and improving query efficiency.
Given the development of today's large language models, it is considered that large language models can be applied in the NL2SQL field. The most common usage scenario of a large language model is to feed back a query result based on a question input by the user. In order to make the output of the large language model more comprehensive and better meet user expectations, the chat generative pre-trained transformer model adopts an in-context learning mode, guiding the model to generate reasonable content from suitable contextual associations. For example, when inputting a question, the user may additionally input some examples to the large language model, thereby guiding the large model. For instance, a user may input the following to the large language model: please translate "I like to eat chocolate", for example: apple, banana. The language model learns from the examples input by the user (e.g., apple, banana) that the task is to translate Chinese into English, and can therefore perform the Chinese-to-English translation quickly and efficiently. Likewise, for an NL2SQL task, if some examples can be provided, the resulting SQL statement may be more comprehensive or accurate. In other words, writing a suitable example affects the accuracy of the generated SQL statement, and thus directly affects the accuracy of the query result obtained for the question input by the user.
However, a single data development engineer is likely to face an entire data lake. The data in the data lake originates from different storage systems, e.g., object storage, data warehouses, and databases, and from different business systems, e.g., customer relationship management (customer relationship management, CRM) systems, enterprise resource planning (enterprise resource planning, ERP) systems, and operation and maintenance systems. In such a complex system, it takes the engineer a lot of time to think of an example corresponding to the currently input question, and the example they input is not necessarily the one that best matches the current question, so the finally generated SQL statement is not necessarily accurate. In other words, how to make the SQL statement generated by the large language model more accurate has become the technical problem to be solved.
In view of this, the present application proposes that historical query information can be stored, recording the mapping relationships between questions historically input by users and the correct SQL statements historically generated for them. The candidate examples that best match the question currently input by the user are then found in the historical query records, the current question and the candidate examples are used as the input of a large language model, and the large language model outputs the SQL statement corresponding to the current question. The method provided by the application can conveniently and accurately obtain candidate examples corresponding to the question currently input by the user, thereby improving the accuracy of the SQL statement generated for that question.
Fig. 1 is a schematic architecture diagram of a data query device to which the present application is applicable. As shown in Fig. 1, text content input by a user, or text content converted from speech, is input into an NL2SQL model through an interaction layer; the NL2SQL model uses the SQL statement corresponding to the text content to query the database and feeds the SQL execution result back to the interaction layer.
Fig. 2 is a schematic flowchart of a method 200 for generating an SQL statement provided by the present application. The technical solution provided in the present application may be implemented by each server (e.g., virtual machine) in a cloud service architecture, or by an ordinary data query device (e.g., a data query engine). As shown in Fig. 2, the method includes:
Step 201: store historical query information, where the historical query information includes a first mapping relationship between first information and a first structured query language (SQL) statement.
In the application, the first information includes questions historically input by users, and the first SQL statement can be understood as the correct SQL statement historically generated for the historically input question. The historical query information can then be understood as: the mapping relationship between a historically input question and the correct SQL statement historically generated for it. For example, the historical query information stored in this manner has the format <key_1: question historically input by the user, value: correct SQL statement historically generated>. By way of example, the first mapping relationship may be understood as the correspondence shown in Table 1 below.
Table 1

| First mapping relationship | First information | First SQL statement |
| --- | --- | --- |
| Mapping relationship #1 | Question #1 historically input by a user | Historically generated correct SQL statement #1 |
| Mapping relationship #2 | Question #2 historically input by a user | Historically generated correct SQL statement #2 |
| Mapping relationship #3 | Question #3 historically input by a user | Historically generated correct SQL statement #3 |
| Mapping relationship #4 | Question #4 historically input by a user | Historically generated correct SQL statement #4 |
Optionally, in some implementations, the first information further includes the table schema corresponding to the historically input question. The historical query information can then be understood as: the mapping relationship between the historically input question, the table schema (table schema) corresponding to that question, and the correct SQL statement historically generated. For example, the historical query information stored in this manner has the format <key_1: question historically input by the user, key_2: table schema corresponding to that question, value: correct SQL statement historically generated>. The first mapping relationship in this case may be understood as the correspondence shown in Table 2 below, for example.
Table 2

| First mapping relationship | First information | First SQL statement |
| --- | --- | --- |
| Mapping relationship #5 | Historically input question #5; table schema #1 | Historically generated correct SQL statement #A |
| Mapping relationship #6 | Historically input question #6; table schema #2 | Historically generated correct SQL statement #B |
| Mapping relationship #7 | Historically input question #7; table schema #3 | Historically generated correct SQL statement #C |
| Mapping relationship #8 | Historically input question #8; table schema #4 | Historically generated correct SQL statement #D |
In the present application, a "table schema" may be understood as the attributes of a table, the names of its columns, and the like.
In the application, the first SQL statement includes SQL statements previously generated by the large language model and executed directly, as well as SQL statements that were generated by the large language model, then manually modified, and finally executed.
Alternatively, "mapping relationship" and "correspondence" herein may also be expressed as "association relationship". It should be understood that, in the embodiments of the present application, a "mapping relationship" may be stored or recorded by means of a functional relationship, a table, or the like.
Step 202, screening the historical query information according to the second information to obtain candidate examples.
In the application, the second information includes the question currently input by the user, and the candidate examples include screened first mapping relationships. The similarity between the first information and the second information in a screened first mapping relationship is greater than or equal to the first threshold.
In one implementation, when the second information includes only the question currently input by the user and the first information includes the historically input question, the similarity between the first information and the second information can be specifically understood as: the similarity between the question currently input by the user and a stored historically input question. This implementation can also be understood as screening the first mapping relationships based on question similarity to obtain the screened first mapping relationships.
Specifically, the similarity between questions may be measured, for example, as follows: the question currently input by the user and the stored historically input questions may be vectorized using one-hot encoding or word embeddings, and the similarity may then be compared via their inner product, via a distance metric, via a neural network, or by other methods.
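As a non-limiting illustration of this step, the sketch below assumes a simple bag-of-words vectorization compared by cosine similarity; a word-embedding encoder, a distance metric, or a neural network could equally be used, as described above. The function and field names are assumptions continued from the sketch above.

```python
import math
from collections import Counter

def vectorize(question: str) -> Counter:
    # Toy bag-of-words vectorization; a word-embedding encoder could be used instead.
    return Counter(question.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def screen_by_question(records, current_question: str, first_threshold: float):
    # Keep the mapping relationships whose historical question is similar enough
    # (>= first threshold) to the question currently input by the user.
    cur = vectorize(current_question)
    return [
        rec for rec in records
        if cosine_similarity(vectorize(rec["question"]), cur) >= first_threshold
    ]
```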
For example, the candidate examples finally determined by screening on question similarity are mapping relationship #1, mapping relationship #2, mapping relationship #3, and mapping relationship #4 in Table 1.
In another implementation, when the second information includes the question currently input by the user and the table schema corresponding to that question, the first information includes a historically input question and the table schema corresponding to it. In this case, the similarity between the first information and the second information can be specifically understood as: the similarity between the question currently input by the user and the stored historically input question, together with the similarity between the table schema corresponding to the current question and the table schema corresponding to the historically input question. In this implementation, screening can be performed not only on question similarity but also on the table schemas corresponding to the questions, to obtain the screened first mapping relationships. It should be understood that screening can also be done first on table schema similarity and then on question similarity; the order of the two is not limited.
Specifically, when screening based on the table schema, screening can be performed by setting rules, and the user may edit various rules for filtering, for example: the table names are the same; the column names of the tables are the same; the tables have a lineage relationship; or tables with different names are associated through the same primary/foreign keys. The table schemas corresponding to historically input questions can be scored by these rules, and a record can be determined to be a candidate example when its score is greater than or equal to a certain threshold (e.g., the first threshold). As another example, user-defined rules may screen on attributes of the table schemas, such as scoring the table schemas corresponding to historically input questions based on the quality of the table, the date of the table, and so on; when the score is greater than or equal to a certain threshold (e.g., the first threshold), the record can be determined to be a candidate example. A sketch of such rule-based scoring is given after the next paragraph.
"Tables whose names differ but whose primary/foreign-key associations are the same" can be understood as follows: although the table names differ, each table contains certain information, and through that information the related information can be found and joined into the same table.
For example, the candidate examples determined by screening on question similarity are mapping relationship #5, mapping relationship #6, mapping relationship #7, and mapping relationship #8 in Table 2; the candidate examples are then screened again based on the similarity of the table schemas, and the finally determined candidate examples are mapping relationship #5, mapping relationship #6, and mapping relationship #7 in Table 2.
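The sketch below illustrates rule-based schema scoring under stated assumptions: the concrete rules and weights are placeholders for illustration, since the application leaves the rules user-editable, and the schemas are assumed to be given as plain text.

```python
def score_table_schema(hist_schema: str, current_schema: str) -> int:
    # Hypothetical rule-based scoring of a historical table schema against the
    # table schema corresponding to the current question. Real rules (same table
    # name, same column names, lineage, primary/foreign-key association, table
    # quality, table date, ...) would be edited by the user.
    score = 0
    hist_tokens = set(hist_schema.lower().split())
    cur_tokens = set(current_schema.lower().split())
    if hist_tokens == cur_tokens:
        score += 3                               # e.g. identical schema text
    score += len(hist_tokens & cur_tokens)       # e.g. shared table/column names
    return score

def screen_by_schema(records, current_schema: str, first_threshold: int):
    # Keep records whose rule-based schema score reaches the threshold.
    return [
        rec for rec in records
        if rec.get("table_schema")
        and score_table_schema(rec["table_schema"], current_schema) >= first_threshold
    ]
```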
Step 203: generate third information according to the second information and the candidate examples, where the third information includes the second information and the candidate examples.
As an example, assume the question currently input by the user is: "How many blue sports suits are in the warehouse?". By screening the historical records, the candidate example obtained is: "How many red cars are there?" — "A table called cars with columns id and color" — "SELECT * FROM cars WHERE color = 'red'". The third information can thus be understood as: "How many blue sports suits are in the warehouse?" together with the example "How many red cars are there?" — "A table called cars with columns id and color" — "SELECT * FROM cars WHERE color = 'red'".
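A minimal sketch of how the second information and the candidate examples might be stitched into the third information (a prompt) is shown below; the prompt wording and field names are illustrative assumptions, not a prescribed format.

```python
def build_third_information(current_question: str, examples) -> str:
    # Stitch the current question together with the screened examples into a
    # single prompt (the "third information") for the large language model.
    lines = [f"Question: {current_question}", "Examples:"]
    for ex in examples:
        lines.append(f"- Q: {ex['question']}")
        if ex.get("table_schema"):
            lines.append(f"  Schema: {ex['table_schema']}")
        lines.append(f"  SQL: {ex['sql']}")
    return "\n".join(lines)
```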
So that the finally obtained examples can cover various dimensions or aspects, and so that redundant examples are removed (which can be understood as keeping only one of several mutually similar candidate examples), the candidate examples may be further screened to obtain target examples. Thus, in one possible implementation, after obtaining the candidate examples, the method further includes: determining target examples among the candidate examples. In other words, the application also proposes that the candidate examples may be further screened to obtain target examples. In this case, generating the third information according to the second information and the candidate examples includes: generating the third information according to the second information and the target examples, where the third information includes the question currently input by the user and the target examples. The respective implementations of screening the candidate examples to obtain the target examples in the present application are described below.
Mode 1:
Assume the candidate examples include a plurality of candidate examples, each of which includes a screened first mapping relationship, i.e., a mapping relationship between a screened historically input question and a screened first SQL statement. For example, assume the candidate examples are mapping relationship #1 to mapping relationship #4 in Table 1 above.
The plurality of first SQL statements included in the plurality of first mapping relationships corresponding to the plurality of candidate examples may then be screened according to the differences among them, to obtain a plurality of target SQL statements whose mutual differences are greater than or equal to a second threshold. For example, mapping relationship #1 to mapping relationship #4 in the candidate examples correspond respectively to historically generated correct SQL statement #1 to historically generated correct SQL statement #4. The differences between these historically generated correct SQL statements may be compared, and the statements whose mutual differences are greater than or equal to a certain threshold (e.g., the second threshold) are finally selected. For example, if the two statements, historically generated correct SQL statement #1 and historically generated correct SQL statement #2, are found to differ greatly, the target examples can be determined to be mapping relationship #1 and mapping relationship #2.
In the present application, when judging the differences between the respective SQL statements, the difference between SQL statements can be determined by, for example, comparing the contents of their respective clauses (fields).
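The sketch below shows one possible way to compare SQL statements clause by clause; the clause list and the difference measure are assumptions for illustration and not the only comparison contemplated by the application.

```python
import re

SQL_CLAUSES = ["select", "from", "where", "group by", "order by"]

def split_clauses(sql: str) -> dict:
    # Very rough split of an SQL statement into its main clauses so that their
    # contents can be compared clause by clause (field by field).
    sql = sql.lower()
    positions = sorted(
        (m.start(), kw) for kw in SQL_CLAUSES for m in re.finditer(re.escape(kw), sql)
    )
    clauses = {}
    for i, (start, kw) in enumerate(positions):
        end = positions[i + 1][0] if i + 1 < len(positions) else len(sql)
        clauses[kw] = sql[start + len(kw):end].strip()
    return clauses

def sql_difference(sql_a: str, sql_b: str) -> float:
    # Fraction of clauses whose contents differ; one possible difference measure.
    a, b = split_clauses(sql_a), split_clauses(sql_b)
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(a.get(k) != b.get(k) for k in keys) / len(keys)
```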
Mode 2:
Assume the candidate examples include a plurality of candidate examples, each of which includes a screened first mapping relationship, i.e., a mapping relationship between a screened historically input question, the table schema corresponding to that question, and a screened first SQL statement. For example, assume the candidate examples are mapping relationship #5 to mapping relationship #7 in Table 2 above.
The plurality of first SQL statements included in the plurality of first mapping relationships corresponding to the plurality of candidate examples may then be screened according to the differences among them, to obtain a plurality of target SQL statements whose mutual differences are greater than or equal to the second threshold. For example, mapping relationship #5 to mapping relationship #7 in the candidate examples correspond respectively to historically generated correct SQL statement #A to historically generated correct SQL statement #C. The differences between these historically generated correct SQL statements may be compared, and the statements whose mutual differences are greater than or equal to a certain threshold (e.g., the second threshold) are finally selected. For example, if historically generated correct SQL statement #A and historically generated correct SQL statement #B are found to differ greatly, the target examples can be determined to be mapping relationship #5 and mapping relationship #6.
Mode 3:
Assume the candidate examples include a plurality of candidate examples, each of which includes a screened first mapping relationship, i.e., a mapping relationship between a screened historically input question, the table schema corresponding to that question, and a screened first SQL statement. For example, assume the candidate examples are mapping relationship #5 to mapping relationship #7 in Table 2 above.
The plurality of table schemas included in the plurality of first mapping relationships corresponding to the plurality of candidate examples may then be screened according to the differences among them, to obtain a plurality of target table schemas whose mutual differences are greater than or equal to a third threshold. For example, mapping relationship #5 to mapping relationship #7 in the candidate examples correspond respectively to table schema #1 to table schema #3. The differences between these table schemas may be compared, and the table schemas whose mutual differences are greater than or equal to a certain threshold (e.g., the third threshold) are finally selected. For example, if table schema #1 and table schema #3 are found to differ greatly through this screening, the target examples can be determined to be mapping relationship #5 and mapping relationship #7.
In the present application, when judging the differences between the table schemas in the respective candidate examples, the target table schemas can be determined by, for example, comparing the table names, column names, and so on of the respective tables.
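A possible sketch of such a table schema comparison, under the assumption that the schemas are given as text and compared by the overlap of the table and column names they mention:

```python
def schema_difference(schema_a: str, schema_b: str) -> float:
    # Compare two table schemas by the table/column names they mention,
    # e.g. one minus the Jaccard overlap of their name tokens.
    tokens_a, tokens_b = set(schema_a.lower().split()), set(schema_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(union)
```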
Of course, Mode 2 and Mode 3 may also be performed together; that is, screening can be done both on the differences between the first SQL statements and on the differences between the table schemas, and the target examples determined by combining the results of the two aspects, for example, obtaining mapping relationship #6 and mapping relationship #7 as the target examples by combining Mode 2 and Mode 3.
In the method, the candidate examples can be further screened after they are obtained so as to obtain target examples among them. Redundant information in the candidate examples can be removed while the diversity of the target examples is preserved, which saves computing resources and improves the richness of the obtained target examples.
It should be appreciated that when the candidate examples are all similar to one another, screening by difference can be understood as ultimately retaining only one or two candidate examples, which are the target examples. This process can be understood as removing redundant examples, and it reduces resource occupation.
As can be seen from the above description, the present application can determine candidate examples through the similarity between questions, or through the similarity between questions together with the similarity between table schemas, and when determining target examples, can use the differences between the first SQL statements and/or the differences between the table schemas in the candidate examples. Through multiple rounds of screening, the finally selected examples best match the question currently input by the user, which further ensures the accuracy of the generated SQL statement.
Step 204: generate, according to the large language model and the third information, a second SQL statement corresponding to the second information.
In the application, the input of the large language model is the question currently input by the user together with the target examples, and the output of the large language model is the SQL statement corresponding to the current question. For example, the input to the large language model is "How many blue sports suits are in the warehouse? Example: How many red cars are there? — A table called cars with columns id and color — SELECT * FROM cars WHERE color = 'red'", and the output of the large language model is "A table called tracksuit with columns id and color — SELECT * FROM tracksuit WHERE color = 'blue'".
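A minimal sketch of this step is shown below; llm_generate stands for any callable that forwards a prompt to the deployed large language model (local or on a third-party server) and returns its text output, and no particular vendor API is assumed.

```python
def generate_sql(third_information: str, llm_generate) -> str:
    # Send the stitched prompt (third information) to the large language model
    # and return the SQL statement it outputs.
    instruction = (
        "Given the question and the examples below, output only the SQL "
        "statement for the question.\n\n"
    )
    return llm_generate(instruction + third_information).strip()
```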
It should be noted that the large language model in the present application may be deployed on a third-party server or integrated on the same server; this is not limited.
Fig. 3 is another schematic flowchart of generating an SQL statement according to the present application. As shown in Fig. 3: Step 301, store historical query information; the format of the stored historical query information may be <key_1, value> or <key_1, key_2, value>. Step 302, screen the historical query information based on table schema rules. For example, the records whose table schema has the same table name as the table schema corresponding to the question currently input by the user may be selected from the historical query information; or the records whose table names differ but whose column attributes are the same as those of the table schema corresponding to the current question; or the records whose tables, although named differently, are associated through the same primary/foreign keys as the table schema corresponding to the current question. Step 303, screen the historical query information based on question similarity. Step 304, screen based on the differences between SQL statements and/or the differences between table schemas. Step 305, stitch the question currently input by the user together with the examples obtained by the screening (e.g., stitch them into a prompt). Step 306, call the large language model to output the SQL statement corresponding to the question currently input by the user.
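Assembling the hypothetical helpers sketched earlier in this description, an illustrative end-to-end pipeline mirroring steps 301 to 306 might look as follows; the thresholds, function names, and ordering are assumptions for illustration only.

```python
def nl2sql_with_examples(history_store, current_question, current_schema,
                         llm_generate, first_threshold=0.5, second_threshold=0.5):
    # Illustrative pipeline: screen by schema rules, screen by question
    # similarity, keep only mutually different examples, stitch the prompt,
    # and call the large language model.
    candidates = screen_by_schema(history_store, current_schema, 1)  # rule-score threshold (illustrative)
    candidates = screen_by_question(candidates, current_question, first_threshold)
    targets = []
    for rec in candidates:
        # Keep an example only if it differs enough from those already kept.
        if all(sql_difference(rec["sql"], kept["sql"]) >= second_threshold for kept in targets):
            targets.append(rec)
    prompt = build_third_information(current_question, targets)
    return generate_sql(prompt, llm_generate)
```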
Based on the above scheme, the application screens examples step by step from the stored historical query information and finally obtains the examples that match the question currently input by the user; the current question and the screened examples are used as the input of the large language model, and the large language model outputs the SQL statement corresponding to the current question. The method ensures that the obtained examples match the user's current question extremely well, without requiring a developer to input examples manually; it guarantees the accuracy of the obtained examples, simplifies the work of data developers, and improves the efficiency of generating SQL statements.
It will be appreciated that the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
Those of skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiments of the application may divide the computing device into functional modules according to the above method examples; for example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware or as a software functional module. It should be noted that the division of modules in the embodiments of the present application is schematic and is merely a logical function division; other division manners may be used in actual implementation. The following description takes dividing each functional module according to its corresponding function as an example.
Fig. 4 is a schematic block diagram of a data query device 100 provided in an embodiment of the present application. As shown, the apparatus 100 may include: a storage module 110, a screening module 120, and a processing module 130.
The above modules are configured to perform the steps of the method of generating SQL statements described above, which are not repeated here.
It should also be appreciated that the data querying device 100 herein is embodied in the form of a functional unit. The term "unit" herein may refer to an application specific integrated circuit (application specific integrated circuit, ASIC), an electronic circuit, a processor (e.g., a shared, dedicated, or group processor, etc.) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functionality.
The data query device 100 in each of the above embodiments has the function of implementing the corresponding steps of the method 200 described above. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software comprises one or more modules corresponding to the functions; for example, the screening module or the like may be replaced by a processor, which separately performs the transceiving operations and the related processing operations in the respective method embodiments. Furthermore, such a module may be a processing circuit.
It should be noted that the device in FIG. 4 may be the data query device in the foregoing method embodiments, or may be a chip or a chip system of a computing device, for example a system on chip (SoC). The processing unit may be an integrated processor, a microprocessor, or an integrated circuit on the chip. This is not limited herein.
Fig. 5 is a schematic block diagram of another data querying device 200 provided by an embodiment of the present application. As shown, the apparatus 200 includes: at least one processor 220. The processor 220 is coupled to the memory for executing instructions stored in the memory to transmit signals and/or receive signals. Optionally, the device 200 further comprises a memory 230 for storing instructions. Optionally, the device 200 further comprises a transceiver 210, and the processor 220 controls the transceiver 210 to transmit signals and/or to receive signals.
It should be appreciated that the processor 220 and the memory 230 may be combined into a single processing device, and that the processor 220 is configured to execute program code stored in the memory 230 to perform the functions described above. In particular implementations, the memory 230 may also be integrated into the processor 220 or may be separate from the processor 220.
It should also be appreciated that the transceiver 210 may include a receiver and a transmitter. The transceiver may further include one or more antennas. The transceiver 210 may be a communication interface or an interface circuit.
Specifically, the processor 220 in the device 200 may correspond to the screening module 120 and the processing module 130 in the data query device 100.
As an option, the data querying device 200 is configured to implement the steps in the above method 200 embodiment.
For example, the processor 220 is configured to execute computer programs or instructions stored in the memory 230 to implement the various steps in the method 200 above.
Fig. 6 is a schematic architecture diagram of a computing device cluster according to an embodiment of the present application. The cluster of computing devices includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop, notebook, or smart phone. As shown in fig. 6, the cluster of computing devices includes at least one computing device 600. The same instructions for performing the method of generating SQL statements described in the above embodiments may be stored in memory 630 in one or more computing devices 600 in a computing device cluster.
In some possible implementations, some instructions for performing the method of generating an SQL statement described in the above embodiments may also be stored in the memory 630 of one or more computing devices 600 in the computing device cluster, respectively. In other words, a combination of one or more computing devices 600 may collectively execute instructions for performing the method of generating an SQL statement described by the above embodiments.
It should be noted that the memories 630 in different computing devices 600 in the computing device cluster may store different instructions, each for performing part of the functions of the data query device described above. That is, the instructions stored in the memories 630 of the different computing devices 600 may implement the functions of one or more of the storage module 110, the screening module 120, and the processing module 130.
Alternatively, the memories 630 in different computing devices 600 in the computing device cluster may respectively store different instructions for performing part of the functions of the data query device 100 or the data query device 200 described above. That is, the instructions stored in the memories 630 of the different computing devices 600 may implement the functions of one or more of the storage module 110, the screening module 120, and the processing module 130.
In some possible implementations, one or more computing devices in the computing device cluster may be connected through a network, where the network may be a wide area network, a local area network, or the like. FIG. 7 shows one possible implementation. FIG. 7 is a schematic diagram of computing devices 600A and 600B connected via a network according to an embodiment of the present application. The two computing devices 600A and 600B are connected through the network; specifically, each computing device connects to the network through its communication interface. In this possible implementation, instructions for performing the functions of the screening module 120 and the processing module 130 are stored in the memory 630 of the computing device 600A, while instructions for performing the functions of the storage module 110 are stored in the memory 630 of the computing device 600B.
The connection manner between the computing devices shown in FIG. 7 takes into account that the method of generating SQL provided by the present application requires storing a large amount of historical query information, so the functions implemented by the storage module 110 are assigned to the computing device 600B.
It should be appreciated that the functionality of computing device 600A shown in fig. 7 may also be performed by multiple computing devices 600. Likewise, the functionality of computing device 600B may also be performed by multiple computing devices 600.
According to the method provided by the embodiment of the present application, the present application further provides a computer program product comprising computer program code which, when run on a computer, causes the computer to perform the steps in the embodiment of the method 200 described above.
According to the method provided by the embodiment of the present application, the present application further provides a computer readable medium storing a program code, which when run on a computer, causes the computer to perform the steps in the embodiment of the method 200 described above.
For explanations of the relevant content and the beneficial effects of any of the devices described above, reference may be made to the corresponding method embodiments provided above, which are not repeated here.
It should be understood that the specific processes of each transceiver and processor to execute the corresponding steps are described in detail in the above method embodiments, and are not described herein for brevity.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided herein.
It should be noted that the processor in the embodiments of the present application may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method embodiments may be implemented by integrated logic circuits of hardware in a processor or by instructions in software form. The processor may be a general purpose processor, a digital signal processor (digital signal processor, DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (field-programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers, etc., as is well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
In the above embodiments, the implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (digital subscriber line, DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a high-density digital video disc (digital video disc, DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
In the above-described respective device embodiments, the respective steps are performed by respective modules or units, for example, the steps of receiving or transmitting in the method embodiments are performed by a transceiver unit (transceiver), and other steps than transmitting and receiving may be performed by a processing unit (processor). Reference may be made to corresponding method embodiments for the function of a specific unit. Wherein the processor may be one or more.
As used in this specification, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between 2 or more computers. Furthermore, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from two components interacting with one another in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not described in detail herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interface, indirect coupling or communication connection of devices or units, electrical, mechanical, or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application essentially, or the part thereof contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disc.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (17)
1. A method of generating a structured query language, comprising:
storing historical query information, wherein the historical query information comprises a first mapping relationship between first information and a first structured query language (SQL) statement, the first information comprises a question historically input by a user, and the first SQL statement is a correct SQL statement historically generated for the question historically input by the user;
screening the historical query information according to second information to obtain candidate examples, wherein the second information comprises a question currently input by the user, the candidate examples comprise the filtered first mapping relationship, and a similarity between the first information and the second information in the filtered first mapping relationship is greater than or equal to a first threshold value;
generating third information according to the second information and the candidate examples, wherein the third information comprises the second information and the candidate examples;
and generating a second SQL statement corresponding to the second information according to the large language model and the third information, wherein the input of the large language model is the third information, and the output of the large language model is the second SQL statement.
2. The method of claim 1, wherein the second information further comprises a table schema corresponding to the question currently input by the user, and the first information further comprises a table schema corresponding to the question historically input by the user.
3. The method according to claim 1 or 2, wherein after the candidate examples are obtained, the method further comprises:
determining a target example among the candidate examples;
and the generating third information according to the second information and the candidate examples, wherein the third information comprises the question currently input by the user and the candidate examples, comprises:
generating the third information according to the second information and the target example, wherein the third information comprises the question currently input by the user and the target example.
4. The method of claim 3, wherein the obtained candidate examples comprise a plurality of candidate examples, and each candidate example in the plurality of candidate examples comprises the filtered first mapping relationship between the filtered question historically input by the user and the filtered first SQL statement,
and the determining a target example among the candidate examples comprises:
screening the plurality of first SQL statements according to the degree of difference among the plurality of first SQL statements comprised in the plurality of first mapping relationships corresponding to the plurality of candidate examples, to obtain a plurality of target SQL statements, wherein the degree of difference among the plurality of target SQL statements is greater than or equal to a second threshold value;
determining a plurality of target examples according to the plurality of target SQL statements and the filtered questions historically input by the user corresponding to the plurality of target SQL statements;
wherein each target example in the plurality of target examples comprises a mapping relationship between the filtered question historically input by the user and the target SQL statement.
5. The method of claim 3, wherein the obtained candidate examples comprise a plurality of candidate examples, each candidate example in the plurality of candidate examples comprises the filtered first mapping relationship, and the filtered first mapping relationship is a mapping relationship among the filtered question historically input by the user, the table schema corresponding to the filtered question, and the filtered first SQL statement,
and the determining a target example among the candidate examples comprises:
screening the plurality of table schemas according to the degree of difference among the plurality of table schemas comprised in the plurality of first mapping relationships corresponding to the plurality of candidate examples, to obtain a plurality of target table schemas, wherein the degree of difference among the plurality of target table schemas is greater than or equal to a third threshold value;
determining a plurality of target examples according to the plurality of target table schemas, and the filtered questions historically input by the user and the filtered first SQL statements corresponding to the plurality of target table schemas;
wherein each target example in the plurality of target examples comprises a mapping relationship among the filtered question historically input by the user, the target table schema, and the filtered first SQL statement.
6. The method of claim 3, wherein the obtained candidate examples comprise a plurality of candidate examples, each candidate example in the plurality of candidate examples comprises the filtered first mapping relationship, and the filtered first mapping relationship is a mapping relationship among the filtered question historically input by the user, the table schema corresponding to the filtered question, and the filtered first SQL statement,
and the determining a target example according to the candidate examples comprises:
screening the plurality of first SQL statements according to the degree of difference among the plurality of first SQL statements comprised in the plurality of first mapping relationships corresponding to the plurality of candidate examples, to obtain a plurality of target SQL statements, wherein the degree of difference among the plurality of target SQL statements is greater than or equal to a second threshold value;
determining a plurality of target examples according to the plurality of target SQL statements, the filtered questions historically input by the user corresponding to the plurality of target SQL statements, and the table schemas corresponding to the filtered questions historically input by the user;
wherein each target example in the plurality of target examples comprises a mapping relationship among the filtered question historically input by the user, the filtered table schema, and the target SQL statement.
7. A data query device, comprising: a storage module, a screening module, and a processing module, wherein
the storage module is configured to store historical query information, wherein the historical query information comprises a first mapping relationship between first information and a first structured query language (SQL) statement, the first information comprises a question historically input by a user, and the first SQL statement is a correct SQL statement historically generated for the question historically input by the user;
the screening module is configured to screen the historical query information according to second information to obtain candidate examples, wherein the second information comprises a question currently input by the user, the candidate examples comprise the filtered first mapping relationship, and a similarity between the first information and the second information in the filtered first mapping relationship is greater than or equal to a first threshold value;
the processing module is configured to generate third information according to the second information and the candidate examples, wherein the third information comprises the second information and the candidate examples;
and the processing module is further configured to generate a second SQL statement corresponding to the second information according to the large language model and the third information, wherein the input of the large language model is the third information, and the output of the large language model is the second SQL statement.
8. The data query device of claim 7, wherein the second information further comprises a table schema corresponding to the question currently input by the user, and the first information further comprises a table schema corresponding to the question historically input by the user.
9. The data query device according to claim 7 or 8, wherein
the processing module is further configured to determine a target example among the candidate examples;
and that the processing module is configured to generate third information according to the second information and the candidate examples, the third information comprising the question currently input by the user and the candidate examples, comprises:
the processing module is configured to generate the third information according to the second information and the target example, wherein the third information comprises the question currently input by the user and the target example.
10. The data query device of claim 9, wherein the obtained candidate examples comprise a plurality of candidate examples, and each candidate example in the plurality of candidate examples comprises the filtered first mapping relationship between the filtered question historically input by the user and the filtered first SQL statement,
and that the processing module is configured to determine a target example among the candidate examples comprises:
the processing module is configured to screen the plurality of first SQL statements according to the degree of difference among the plurality of first SQL statements comprised in the plurality of first mapping relationships corresponding to the plurality of candidate examples, to obtain a plurality of target SQL statements, wherein the degree of difference among the plurality of target SQL statements is greater than or equal to a second threshold value;
the processing module is configured to determine a plurality of target examples according to the plurality of target SQL statements and the filtered questions historically input by the user corresponding to the plurality of target SQL statements;
wherein each target example in the plurality of target examples comprises a mapping relationship between the filtered question historically input by the user and the target SQL statement.
11. The data query device of claim 9, wherein the obtained candidate examples comprise a plurality of candidate examples, each candidate example in the plurality of candidate examples comprises the filtered first mapping relationship, and the filtered first mapping relationship is a mapping relationship among the filtered question historically input by the user, the table schema corresponding to the filtered question, and the filtered first SQL statement,
and that the processing module is configured to determine a target example among the candidate examples comprises:
the processing module is configured to screen the plurality of table schemas according to the degree of difference among the plurality of table schemas comprised in the plurality of first mapping relationships corresponding to the plurality of candidate examples, to obtain a plurality of target table schemas, wherein the degree of difference among the plurality of target table schemas is greater than or equal to a third threshold value;
the processing module is configured to determine a plurality of target examples according to the plurality of target table schemas, and the filtered questions historically input by the user and the filtered first SQL statements corresponding to the plurality of target table schemas;
wherein each target example in the plurality of target examples comprises a mapping relationship among the filtered question historically input by the user, the target table schema, and the filtered first SQL statement.
12. The data query device of claim 9, wherein the obtained candidate examples comprise a plurality of candidate examples, each candidate example in the plurality of candidate examples comprises the filtered first mapping relationship, and the filtered first mapping relationship is a mapping relationship among the filtered question historically input by the user, the table schema corresponding to the filtered question, and the filtered first SQL statement,
and that the processing module is configured to determine a target example according to the candidate examples comprises:
the processing module is configured to screen the plurality of first SQL statements according to the degree of difference among the plurality of first SQL statements comprised in the plurality of first mapping relationships corresponding to the plurality of candidate examples, to obtain a plurality of target SQL statements, wherein the degree of difference among the plurality of target SQL statements is greater than or equal to a second threshold value;
the processing module is configured to determine a plurality of target examples according to the plurality of target SQL statements, the filtered questions historically input by the user corresponding to the plurality of target SQL statements, and the table schemas corresponding to the filtered questions historically input by the user;
wherein each target example in the plurality of target examples comprises a mapping relationship among the filtered question historically input by the user, the filtered table schema, and the target SQL statement.
13. The data query device of any one of claims 7 to 12, wherein the data query device is deployed on a cloud server of an artificial intelligence (AI) application.
14. A computing device comprising a processor and a memory, the processor executing instructions in the memory to cause the processor to perform the method of any of claims 1-6.
15. A cluster of computing devices, comprising at least one computing device, each computing device comprising a processor and a memory;
the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method of any one of claims 1 to 6.
16. A computer program product containing instructions that, when executed by a cluster of computing devices, cause the cluster of computing devices to perform the method of any of claims 1 to 6.
17. A computer readable storage medium comprising computer program instructions which, when executed by a cluster of computing devices, perform the method of any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310526550.6A CN117112590A (en) | 2023-05-10 | 2023-05-10 | Method for generating structural query language and data query equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310526550.6A CN117112590A (en) | 2023-05-10 | 2023-05-10 | Method for generating structural query language and data query equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117112590A true CN117112590A (en) | 2023-11-24 |
Family
ID=88802722
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310526550.6A Pending CN117112590A (en) | 2023-05-10 | 2023-05-10 | Method for generating structural query language and data query equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117112590A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117973255A (en) * | 2024-01-05 | 2024-05-03 | 西南石油大学 | Inversion method for detecting combustible gas leakage in oil and gas production area |
CN118093632A (en) * | 2024-04-17 | 2024-05-28 | 商飞智能技术有限公司 | Graph database query method and device based on large language model and graph structure |
CN118227656A (en) * | 2024-05-24 | 2024-06-21 | 浙江大学 | Query method and device based on data lake |
CN118227742A (en) * | 2024-05-24 | 2024-06-21 | 浙江口碑网络技术有限公司 | Data trend analysis method, device, equipment, storage medium and program product |
CN118227656B (en) * | 2024-05-24 | 2024-08-13 | 浙江大学 | Query method and device based on data lake |
CN118377783A (en) * | 2024-06-27 | 2024-07-23 | 卓世智星(青田)元宇宙科技有限公司 | SQL sentence generation method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11790006B2 (en) | Natural language question answering systems | |
US20220382752A1 (en) | Mapping Natural Language To Queries Using A Query Grammar | |
US11995073B2 (en) | One-shot learning for text-to-SQL | |
CN117112590A (en) | Method for generating structural query language and data query equipment | |
CN112437917B (en) | Natural language interface for databases using autonomous agents and thesaurus | |
US20200081899A1 (en) | Automated database schema matching | |
US20170116260A1 (en) | Using a dimensional data model for transforming a natural language query to a structured language query | |
CN103377239B (en) | Method and device for calculating similarity between texts | |
US20190317965A1 (en) | Methods and apparatus to facilitate generation of database queries | |
US20230177078A1 (en) | Conversational Database Analysis | |
CN110597844B (en) | Unified access method for heterogeneous database data and related equipment | |
CN110659282B (en) | Data route construction method, device, computer equipment and storage medium | |
WO2021047373A1 (en) | Big data-based column data processing method, apparatus, and medium | |
KR20200094074A (en) | Method, apparatus, device and storage medium for managing index | |
CN114756607A (en) | Parameter configuration method and device | |
WO2012164738A1 (en) | Database management system, device, and method | |
CN112527796B (en) | Data table processing method and device and computer readable storage medium | |
WO2021135103A1 (en) | Method and apparatus for semantic analysis, computer device, and storage medium | |
US10877998B2 (en) | Highly atomized segmented and interrogatable data systems (HASIDS) | |
CN112989011B (en) | Data query method, data query device and electronic equipment | |
CN111625579A (en) | Information processing method, device and system | |
CN112612945B (en) | Classification searching method and device | |
CN118210809B (en) | Object definition method, system, equipment and medium based on ER information | |
CN118568128A (en) | Method, apparatus, device and readable medium for query statement generation | |
CN118152423A (en) | Intelligent query method, intelligent query device, electronic equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||