CN114625748A

CN114625748A - SQL query statement generation method and device, electronic equipment and readable storage medium

Info

Publication number: CN114625748A
Application number: CN202110443856.6A
Authority: CN
Inventors: 刘志鹏; 郭峰; 高杨
Original assignee: Asiainfo Technology Nanjing Co ltd
Current assignee: Asiainfo Technology Nanjing Co ltd
Priority date: 2021-04-23
Filing date: 2021-04-23
Publication date: 2022-06-14

Abstract

The embodiments of the application provide a method and a device for generating an SQL query statement, electronic equipment and a readable storage medium, and relate to the technical field of language processing. The method comprises the following steps: acquiring a query statement, performing word segmentation processing on the query statement, determining at least one column name to be queried according to the word segmentation result, searching a pre-constructed knowledge graph for the table name associated with each column name to be queried, and determining a target table name to be queried from the table names associated with the column names, which improves the efficiency and accuracy of determining the target table name; and determining the dependency relationship of each participle in the word segmentation result and, for each target participle that belongs to a column name, determining the preset condition in the SQL query statement according to the dependency relationship of that target participle, wherein the preset condition comprises one or a combination of a grouping condition, an ordering condition and a limiting condition. This overcomes the inability of other schemes to process grouping conditions, ordering conditions and the like.

Description

SQL query statement generation method and device, electronic equipment and readable storage medium

Technical Field

The present application relates to the field of language processing technologies, and in particular, to a method and an apparatus for generating an SQL query statement, an electronic device, and a computer-readable storage medium.

Background

Enterprises hold huge volumes of data, and each enterprise has its own database. Searching, analyzing and processing data all require retrieving data from the database. The data in a database are stored in tables, and different types of data are stored in different tables, so one database usually contains multiple tables. Data are retrieved through SQL statements, but the threshold for writing SQL queries is high for non-professionals, so how to convert a natural-language query statement into an SQL statement has become an important problem to be solved.

At present, a natural-language query statement is converted into an SQL statement by the X-SQL model (reinforce schema representation with context, in which the structural schema representation is enhanced with contextual information from the text). X-SQL can convert a query statement into an SQL statement, but it targets only English scenarios and extracts the information implied in Chinese with low accuracy; in addition, X-SQL cannot locate the table to be queried in a database and does not support operations such as grouping and sorting of query results.

Disclosure of Invention

The application provides a method and a device for generating an SQL query statement, an electronic device and a computer readable storage medium, which can solve the problems that fuzzy time cannot be identified, a data table to be queried in a database cannot be located and the like in the prior art. The technical scheme is as follows:

in one aspect of the present application, a method for generating an SQL query statement is provided, where the method includes:

acquiring a query statement, wherein the query statement comprises a task to be queried described by a natural language;

performing word segmentation processing on the query sentence, and determining at least one column name to be queried according to a word segmentation result;

searching a table name associated with each column name to be queried in a pre-constructed knowledge graph, and determining a target table name to be queried from the table names associated with the column names;

determining the dependency relationship of each participle in the word segmentation result and, for each target participle in the word segmentation result that belongs to a column name, determining a preset condition in the SQL query statement according to the dependency relationship of the target participle;

generating an SQL query statement according to at least one column name to be queried, a target table name and a preset condition;

wherein the knowledge graph stores the association relationship between the table names and the column names in the database.

In one possible implementation manner, determining at least one column name to be queried according to the word segmentation result includes:

and inputting the word segmentation result into a pre-trained SQL generation model to obtain subtasks of the tasks to be queried, which are output by the SQL generation model, and determining at least one column name to be queried according to the subtasks.

In one possible implementation manner, the task to be queried comprises fuzzy time information;

obtaining subtasks of the task to be queried output by the SQL generation model comprises the following steps:

replacing fuzzy time information in the word segmentation result of the query sentence with a preset fuzzy word to obtain a new word segmentation result;

inputting the new word segmentation result into the SQL generating model to obtain a new subtask output by the SQL generating model;

determining at least one column name to be queried according to the word segmentation result, wherein the method comprises the following steps:

acquiring current time information;

inputting the current time information and the fuzzy time information into a pre-constructed time matching template to obtain a first SQL clause which is output by the time matching template and contains accurate time information;

and replacing the preset fuzzy word contained in the new subtask with the first SQL clause to obtain the subtask corresponding to the query statement.
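The fuzzy-time handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the placeholder `[TIME]`, the phrase table `FUZZY_MONTHS`, the column name `date`, and the representation of the subtask as a plain string are all hypothetical, and the SQL generation model itself is not modeled.

```python
from datetime import date

PLACEHOLDER = "[TIME]"  # hypothetical preset fuzzy word

# Hypothetical fuzzy expressions mapped to a look-back length in months.
FUZZY_MONTHS = {"last half year": 6, "last six months": 6, "last quarter": 3}


def mask_fuzzy_time(tokens):
    """Replace the first fuzzy time token with the placeholder.

    Returns (new_tokens, fuzzy_token); fuzzy_token is None if nothing matched.
    """
    out, fuzzy = [], None
    for tok in tokens:
        if fuzzy is None and tok in FUZZY_MONTHS:
            fuzzy = tok
            out.append(PLACEHOLDER)
        else:
            out.append(tok)
    return out, fuzzy


def months_back(today, months):
    """Date `months` months before `today`; the day is clamped to 28 so it always exists."""
    total = today.year * 12 + (today.month - 1) - months
    return date(total // 12, total % 12 + 1, min(today.day, 28))


def time_clause(today, fuzzy, column="date"):
    """Stand-in for the time matching template: current time + fuzzy time -> exact clause."""
    start = months_back(today, FUZZY_MONTHS[fuzzy])
    return f"{column} BETWEEN '{start.isoformat()}' AND '{today.isoformat()}'"


def resolve(subtask, today, fuzzy):
    """Replace the placeholder in the model's subtask with the exact time clause."""
    return subtask.replace(PLACEHOLDER, time_clause(today, fuzzy))


tokens, fuzzy = mask_fuzzy_time(["last half year", "each", "subsidiary", "sales"])
clause = resolve("WHERE [TIME]", date(2021, 4, 23), fuzzy)
```

Masking the fuzzy expression before the model sees it means the model only has to learn where a time condition belongs, while the exact dates are computed deterministically from the current time.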

In one possible implementation manner, before the table name associated with each column name to be queried is searched in the pre-constructed knowledge graph, the method further includes a step of constructing the knowledge graph, which includes:

traversing at least one table name in the database, and creating a table name entity corresponding to the at least one table name;

traversing at least one column name of a table in a database, and creating a column name entity corresponding to the column name;

and creating an association relationship between the table name entity and the column name entity, and storing the association relationship between the table name entity and the column name entity in the knowledge graph.
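The construction steps above can be sketched against an SQLite database using only the Python standard library. This is a simplified illustration: the knowledge graph is reduced to a dictionary in which each key is a column name entity, each value is the set of table name entities it is associated with, and each (column, table) pair stands for one edge. The tables `performance` and `sales_table` and their columns are hypothetical.

```python
import sqlite3
from collections import defaultdict


def build_knowledge_graph(conn):
    """Return {column_name: set(table_names)} built by traversing the schema."""
    graph = defaultdict(set)
    # Traverse the table names in the database.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        for col in conn.execute(f'PRAGMA table_info("{table}")'):
            graph[col[1]].add(table)  # edge: column name entity -> table name entity
    return dict(graph)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE performance (subsidiary TEXT, sales REAL, date TEXT)")
conn.execute("CREATE TABLE sales_table (subsidiary TEXT, date TEXT)")
graph = build_knowledge_graph(conn)
```

A production system would more likely store the entities and edges in a graph database; the dictionary form is chosen here only to keep the sketch self-contained.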

In one possible implementation manner, determining a target table name to be queried from the table names associated with the column names includes:

determining common table names in the table names associated with the column names;

if the number of common table names is one, determining the common table name as the target table name to be queried;

and if the number of common table names exceeds one, presenting the common table names to the user, receiving the user's selection in response, and determining the table name selected by the user as the target table name to be queried.

In one possible implementation, determining the preset condition in the SQL query statement according to the dependency relationship of the target participle includes:

constructing a dependency syntax tree of the sentence to be queried according to the dependency relationship of each participle, wherein the dependency syntax tree comprises nodes and edges; the nodes are participles in the query sentence; the edges are the dependency relationship among the nodes;

creating a grouping entity recognition model, wherein the grouping entity recognition model is used for recognizing a grouping entity, and a grouping entity is a participle whose entity type denotes grouping;

creating a sequencing entity recognition model, wherein the sequencing entity recognition model is used for recognizing a sequencing entity, and a sequencing entity is a participle whose entity type denotes ordering;

and creating a limiting entity recognition model, wherein the limiting entity recognition model is used for recognizing a limiting entity, and a limiting entity is a participle whose entity type denotes quantity.

In one possible implementation, determining the preset condition in the SQL statement according to the dependency relationship of the target word segmentation includes:

traversing nodes in the dependency syntax tree, if the currently traversed nodes are target participles, searching child nodes of the target participles, and determining entity types of the child nodes;

if the entity type of the child node is a grouping entity, determining that the preset conditions in the SQL statement comprise grouping conditions;

if the entity type of the child node is a sequencing entity, determining that the preset conditions in the SQL statement comprise sequencing conditions;

and if the entity type of the child node is a limiting entity, determining that the preset condition in the SQL statement comprises a limiting condition.
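The traversal above can be sketched as follows. This is an illustrative sketch only: the three trained recognition models are replaced by hypothetical keyword sets, and the example tree is hand-built rather than produced by a dependency parser, so its shape is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)


# Keyword lookups standing in for the three recognition models (assumption).
GROUPING = {"each", "every", "per"}
SEQUENCING = {"highest", "lowest", "top"}
LIMITING = {"3", "5", "10"}


def entity_type(word):
    """Return 'grouping', 'sequencing', 'limiting', or None for a participle."""
    if word in GROUPING:
        return "grouping"
    if word in SEQUENCING:
        return "sequencing"
    if word in LIMITING:
        return "limiting"
    return None


def preset_conditions(root, target_words):
    """Traverse the tree; for every target participle, classify its child nodes."""
    found = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.word in target_words:
            for child in node.children:
                kind = entity_type(child.word)
                if kind:
                    found.setdefault(kind, []).append((node.word, child.word))
        stack.extend(node.children)
    return found


# Hypothetical tree: "sales" governs "highest", "3" and "subsidiary"; "subsidiary" governs "each".
tree = Node("sales", [Node("highest"), Node("3"), Node("subsidiary", [Node("each")])])
conditions = preset_conditions(tree, {"sales", "subsidiary"})
```

Each key in `conditions` indicates that the corresponding clause (group by, order by or limit) should appear in the generated statement, and the recorded pairs show which column name the condition attaches to.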

In one possible implementation, determining the entity type of the child node includes:

and inputting the participles corresponding to the child nodes into at least one of the grouping entity recognition model, the sequencing entity recognition model and the limiting entity recognition model, and determining the entity types of the participles as the entity types of the child nodes.

In one possible implementation manner, generating an SQL query statement according to at least one column name to be queried, a target table name, and a preset condition includes:

inserting at least one column name to be queried, a target table name and a preset condition into an SQL query statement template to generate an SQL query statement;

the SQL query statement template comprises placeholders corresponding to at least one column name to be queried, a target table name and a preset condition respectively.
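A template with placeholders of this kind can be sketched with Python's `string.Template`. The placeholder names and the helper `render_sql` are hypothetical; the embodiment does not specify the concrete template syntax.

```python
from string import Template

# One placeholder per component; optional clauses render as empty strings.
SQL_TEMPLATE = Template(
    "SELECT ${columns} FROM ${table}${where}${group_by}${order_by}${limit}")


def render_sql(columns, table, where=None, group_by=None, order_by=None, limit=None):
    """Insert column names, table name and preset conditions into the template."""
    parts = {
        "columns": ", ".join(columns),
        "table": table,
        "where": f" WHERE {where}" if where else "",
        "group_by": f" GROUP BY {', '.join(group_by)}" if group_by else "",
        "order_by": f" ORDER BY {order_by}" if order_by else "",
        "limit": f" LIMIT {int(limit)}" if limit is not None else "",
    }
    return SQL_TEMPLATE.substitute(parts)


sql = render_sql(
    ["subsidiary", "AVG(sales)"],
    "performance",
    where="date >= '2020-10-23'",
    group_by=["subsidiary"],
    order_by="AVG(sales) DESC",
    limit=3,
)
```

Because the values are interpolated as text, a real system would need to check column and table names against the schema (for example against the knowledge graph) before rendering, to avoid producing invalid or unsafe SQL.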

In a second aspect of the present application, there is provided an apparatus for generating an SQL query statement, the apparatus including:

an acquisition module, used for acquiring a query statement, wherein the query statement comprises a task to be queried described in natural language;

the column name determining module is used for performing word segmentation processing on the query sentence and determining at least one column name to be queried according to a word segmentation result;

the target table name determining module is used for searching a table name associated with each column name to be inquired in a pre-constructed knowledge graph and determining the target table name to be inquired from the table names associated with the column names;

the preset condition determining module is used for determining the dependency relationship of each participle in the word segmentation result and, for each target participle in the word segmentation result that belongs to a column name, determining the preset condition in the SQL query statement according to the dependency relationship of the target participle;

the generating module is used for generating an SQL query statement according to at least one column name to be queried, a target table name and a preset condition;

the knowledge graph stores the association relationship between the table names and the column names of the data tables in the database.

In one possible implementation, the column name determining module is used for: inputting the word segmentation result into a pre-trained SQL generation model to obtain the subtasks of the task to be queried output by the SQL generation model, and determining at least one column name to be queried according to the subtasks.

In one possible implementation manner, the task to be queried comprises fuzzy time information; the method for obtaining the subtasks of the task to be queried output by the SQL generation model comprises the following steps:

replacing fuzzy time information in the word segmentation result of the query sentence with preset fuzzy words to obtain a new word segmentation result;

acquiring current time information;

In one possible implementation manner, the column name determining module further includes a knowledge graph constructing sub-module:

In one possible implementation, the target table name determining module includes:

In one possible implementation manner, the preset condition determining module includes:

the dependency syntax tree construction sub-module is used for constructing a dependency syntax tree of the query sentence according to the dependency relationship of each participle, and the dependency syntax tree comprises nodes and edges; the nodes are participles in the query sentence; the edges are the dependency relationship among the nodes;

the grouping entity recognition model building sub-module is used for creating a grouping entity recognition model for recognizing a grouping entity, and a grouping entity is a participle whose entity type denotes grouping;

the sequencing entity recognition model building sub-module is used for creating a sequencing entity recognition model for recognizing a sequencing entity, and a sequencing entity is a participle whose entity type denotes ordering;

and the limiting entity recognition model building sub-module is used for creating a limiting entity recognition model for recognizing a limiting entity, and a limiting entity is a participle whose entity type denotes quantity.

In one possible implementation manner, the preset condition determining module further includes:

In one possible implementation, the generating module includes:

In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the method as provided in the first aspect are implemented.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the method as provided in the first aspect.

In a fifth aspect, an embodiment of the present invention provides a computer program, where the computer program includes computer instructions stored in a computer-readable storage medium, and when a processor of a computer device reads the computer instructions from the computer-readable storage medium, the processor executes the computer instructions, so that the computer device executes the steps of implementing the method provided in the first aspect.

The beneficial effects of the technical solution provided by this application are as follows: according to the method, the device, the electronic equipment and the storage medium for generating an SQL query statement, a query statement is acquired and subjected to word segmentation, at least one column name to be queried is determined according to the word segmentation result, the table name associated with each column name to be queried is searched in a pre-constructed knowledge graph, and a target table name to be queried is determined from the table names associated with the column names; the dependency relationship of each participle in the word segmentation result is determined and, for each target participle that belongs to a column name, the preset condition in the SQL query statement is determined according to the dependency relationship of that target participle, wherein the preset condition comprises one or a combination of a grouping condition, an ordering condition and a limiting condition. This overcomes the inability of other schemes to process grouping conditions, ordering conditions and the like, and improves the accuracy and practicability of SQL query statement generation.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.

FIG. 1 is a schematic flowchart of a method for generating an SQL query statement according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of obtaining the subtasks of a task to be queried output by an SQL generation model according to another embodiment of the present application;

FIG. 3 is a schematic flow chart of construction of a knowledge graph according to another embodiment of the present application;

FIG. 4 is a schematic diagram of a knowledge graph corresponding to a performance table and a sales table, as provided in another embodiment of the present application;

FIG. 5 is a flowchart illustrating the determination of a target table name to be queried according to another embodiment of the present application;

FIG. 6 is a flowchart illustrating the steps of building a dependency syntax tree, a grouping entity recognition model, a sequencing entity recognition model, and a limiting entity recognition model according to another embodiment of the present application;

FIG. 7 is a schematic flow chart illustrating the obtaining of preset conditions according to another embodiment of the present disclosure;

FIG. 8 is a diagram illustrating a generated dependency syntax tree according to an embodiment of the present application;

FIG. 9 is a diagram illustrating a dependency syntax tree after determining entity types corresponding to child nodes of a target participle according to an embodiment of the present application;

FIG. 10 is a diagram illustrating another generated dependency syntax tree provided by an embodiment of the present application;

FIG. 11 is a diagram illustrating another example of a dependency syntax tree after determining entity types corresponding to child nodes of a target participle according to the present application;

FIG. 12 is a schematic structural diagram of an apparatus for generating an SQL query statement according to an embodiment of the present application;

FIG. 13 is a schematic structural diagram of an electronic device for the method for generating an SQL query statement according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.

As used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

The terms referred to in this application will first be introduced and explained:

SQL (Structured Query Language) is a special-purpose programming language: a database query and programming language used to access data and to query, update and manage relational database systems. SQL is a high-level, non-procedural programming language that allows users to work on high-level data structures. It does not require users to specify how data are stored or to know the specific storage mode, so database systems with completely different underlying structures can use the same SQL as the interface for data input and management. SQL statements can be nested, which gives the language great flexibility and powerful functionality. SQL can independently complete all activities in the database life cycle, including defining relational schemas, entering data, building the database, querying, updating, maintaining and restructuring the database, and database security control, which provides a good environment for developing database application systems. After the database is put into operation, the schema can be modified gradually as required without affecting the operation of the database, so the system has good scalability.

The most prominent, central part of the SQL language is its query function. SQL query statements retrieve data already present in the database in a particular combination, under a particular conditional expression, or in a particular order, and are written using select statements. The basic format of a query in SQL is a query block composed of a select clause, a from clause and a where clause:

select column name

from table name

where query qualifier

Here, select specifies which columns are to be viewed, from specifies which data tables the data come from, and where specifies which rows are to be viewed, i.e., the query qualifier. In addition to the select clause, the from clause and the where clause, an SQL query statement may include a group by clause representing the grouping condition, an order by clause representing the ordering condition, and a limit clause representing the limiting condition.
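As a minimal sketch of how these clauses combine, the following Python snippet runs one query against an in-memory SQLite database. The table `performance`, its columns and all rows are hypothetical illustration data, not taken from the embodiment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE performance (subsidiary TEXT, sales REAL, date TEXT)")
conn.executemany(
    "INSERT INTO performance VALUES (?, ?, ?)",
    [
        ("North", 100.0, "2021-01-15"),
        ("North", 300.0, "2021-02-15"),
        ("South", 500.0, "2021-02-20"),
        ("East", 50.0, "2021-03-01"),
        ("East", 900.0, "2020-01-01"),  # filtered out by the where clause
    ],
)

query = """
SELECT subsidiary, AVG(sales) AS avg_sales   -- select: columns to view
FROM performance                             -- from: source table
WHERE date >= '2021-01-01'                   -- where: which rows to view
GROUP BY subsidiary                          -- grouping condition
ORDER BY avg_sales DESC                      -- ordering condition
LIMIT 2                                      -- limiting condition
"""
rows = conn.execute(query).fetchall()
# rows == [("South", 500.0), ("North", 200.0)]
```

The where clause filters rows before grouping, the average is then computed per subsidiary, and the ordering and limit are applied to the grouped result.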

A knowledge graph (Knowledge Graph) is, in the field of library and information science, a series of graphs that display the relationship between the development process and the structure of knowledge. It describes knowledge resources and their carriers using visualization technology, and mines, analyzes, constructs, draws and displays knowledge and the interconnections among them.

Syntactic analysis is one of the key techniques in natural language processing; its basic task is to determine the syntactic structure of a sentence or the dependencies between the words in the sentence. Syntactic analysis covers two main aspects. The first is determining the grammar system of a language, i.e., giving a formal definition of the grammatical structure of legal sentences in the language. The second is the syntactic analysis technique itself, i.e., automatically deriving the syntactic structure of a sentence according to a given grammar system and analyzing the syntactic units contained in the sentence and the relations between them.

The method, the device, the electronic device and the computer-readable storage medium for generating the SQL query statement aim to solve the technical problems in the prior art.

The following describes the technical solution of the present application and how to solve the above technical problems in detail by specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

Converting natural language into SQL (Structured Query Language) statements is one of the core technologies of augmented analytics. Augmented analytics refers to data analysis and BI (Business Intelligence) functions based on machine learning, and includes modules such as intelligent data discovery, augmented data preparation and augmented data analysis.

In the actual operating scenarios of enterprises in China, business data are mostly stored in tables; data are mostly searched, analyzed and processed through a database, and the database is accessed through SQL statements. In a data-only question-answering scenario, an SQL query statement is generated from the natural language input by a user, and the result data of running the SQL are displayed visually. This process is called NL2SQL (Natural Language to SQL, i.e., converting natural language into SQL statements). NL2SQL can be adapted quickly to an enterprise database and makes data retrieval and text analysis simple and convenient, which helps an enterprise interact freely with its database and effectively unlocks the knowledge value of the enterprise database. In actual 2B (enterprise-oriented) scenarios, the enterprise customers that make heavy use of databases are mostly high-net-worth users such as finance and insurance companies, financing enterprises and large companies, so deploying the technology provides stronger support for product profitability.

With the continuous increase of business complexity, fixed-indicator queries can no longer keep up with business development. Business personnel often need to query data through SQL at any time and place, but the threshold for writing SQL queries is high for non-professionals, and they hope to query the required data in natural language. Existing methods for converting natural language into SQL query statements turn the task into slot filling and then use deep learning to train multiple sub-models, one to fill each slot; an example is the X-SQL model (reinforce schema representation with context). The X-SQL model can convert query statements into SQL statements, but it targets only English scenarios and extracts the information implied in Chinese grammar with low accuracy. In addition, the X-SQL model can only query a single specified table, cannot process fuzzy time, and does not support ordering conditions, grouping conditions, limiting conditions and the like.

The embodiment of the application provides a method for generating an SQL query statement, as shown in fig. 1, the method includes:

step S101, obtaining a query statement, wherein the query statement comprises a task to be queried described by a natural language.

The query statement is a statement in natural language entered by a user. The user enters it in order to obtain a query result, so it carries a query task; that is, it is a natural-language statement containing a task to be queried. For example, the query statement "search sales of subsidiaries in the last half year" is expressed in natural language. Most users are non-professionals who cannot necessarily write SQL query statements, so, to improve the universality and applicability of the query, the query statement entered by the user is a statement described in natural language.

Step S102, performing word segmentation processing on the query sentence, and determining at least one column name to be queried according to the word segmentation result.

The query statement is expressed in natural language and usually has an implicit query meaning. To obtain a more accurate query result, the query statement needs to be segmented into words (participles); word segmentation splits the sentence according to parts of speech and sentence structure, much like punctuating a sentence. The embodiment of the application does not limit the word segmentation method or tool used.

Specifically, the embodiment of the application uses jieba (a word segmentation tool) to segment the query statement; segmenting the query statement "total sales of month 2" yields the word segmentation result "month 2", "of", "total", "sales".
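To illustrate what word segmentation produces without depending on a third-party package, the sketch below substitutes a toy forward-maximum-matching segmenter for jieba; it is not jieba's algorithm. The Chinese string is an assumed rendering of the example query, and the vocabulary `VOCAB` is hypothetical.

```python
def segment(text, vocab):
    """Forward maximum matching: at each position take the longest vocabulary word.

    Falls back to a single character when no vocabulary word matches.
    """
    max_len = max(map(len, vocab))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Assumed vocabulary: "month 2", "of", "total", "sales".
VOCAB = {"2月份", "的", "总", "销售额"}
tokens = segment("2月份的总销售额", VOCAB)
# tokens == ["2月份", "的", "总", "销售额"]
```

A dictionary-based matcher like this fails on words outside its vocabulary, which is one reason a dedicated segmentation tool is used in practice.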

After the word segmentation result is obtained, the column names to be queried and the where clause are obtained from it. A database contains a large amount of data, all stored in tables; a table is composed of multiple columns, and each column (also called a field) represents one type of data. Different columns have different column names, which indicate the types of data stored. For example, among the word segmentation results "month 2", "of", "total", "sales", the participle "sales" is a column name. The target table name to be queried can then be obtained from the column names; the detailed process is described in step S103.

As an alternative embodiment, before the word segmentation processing is performed, the query statement may be matched against a pre-established stop-word lexicon, and words in the query statement that are unrelated to the query task are removed.

The stop-word lexicon is used to remove words in the query statement that are irrelevant to the actual query. Stop words irrelevant to the actual query, such as "please help me", "help me calculate", "search", "query", "please inquire" and "please help me inquire", can be added to the stop-word lexicon. Matching the query statement against the stop-word lexicon removes the stop words it contains and reduces interference with the actual query.

Specifically, the query statement "please help me to search the total sales of month 2 this year" is matched against the stop-word lexicon; the matched "please help me to search" is a stop word, and the statement obtained by removing the stop word is "total sales of month 2".
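Stop-word removal of this kind can be sketched as a longest-first phrase replacement. The list `STOP_PHRASES` and the function name are hypothetical, and the English phrases stand in for the original stop words.

```python
STOP_PHRASES = [
    "please help me to search",
    "please help me",
    "help me calculate",
    "please inquire",
    "search",
    "query",
]


def strip_stop_phrases(sentence, stop_phrases=STOP_PHRASES):
    """Remove stop phrases, longest first, then normalise whitespace."""
    cleaned = sentence
    # Longest first, so "please help me to search" wins over "please help me".
    for phrase in sorted(stop_phrases, key=len, reverse=True):
        cleaned = cleaned.replace(phrase, " ")
    return " ".join(cleaned.split())


result = strip_stop_phrases("please help me to search the total sales of month 2")
# result == "the total sales of month 2"
```

Plain substring replacement can also hit stop phrases embedded in longer words, so a real implementation would match on word or participle boundaries instead.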

Step S103, searching a pre-constructed knowledge graph for the table name associated with each column name to be queried, and determining the target table name to be queried from the table names associated with the column names; wherein the knowledge graph stores the association relationship between the table names and the column names in the database.

In the prior art, there are two main ways of determining the target table name. One is to directly specify the table name; this is very limiting in practice, because only an explicitly named table can be queried and the target table cannot be selected intelligently from the database according to the natural language. The other is to add the table name to the SQL generation model for training, which increases the complexity of the model samples, the number of training samples and the training time, and gives lower accuracy in determining the target table name under the same sample conditions. The embodiment of the application instead searches for the table names associated with the column names by constructing a knowledge graph between table names and column names.

A knowledge graph is essentially a semantic network: based on a graph data structure, it stores knowledge in the form of a graph and returns processed and inferred knowledge to the user. It is composed of nodes, which represent entities in the real world, and edges, which represent relationships between entities. The knowledge graph in the embodiment of the application stores the association relationship between table names and column names. Its nodes are table name entities and column name entities, with the table name entities at the center and the column name entities at the periphery, and each edge points from a column name entity to a table name entity to indicate that the column name is associated with the table name. One table name can be associated with multiple column names, and one column name can be associated with multiple table names. Inputting the column names to be queried into the knowledge graph, which stores the association relationship between the table names and column names of the database, yields the table names associated with each column name. The common table names, that is, the table names that appear among the table names associated with every column name to be queried, are then determined. If there is only one common table name, it is determined as the target table name; if there is more than one, the common table names are presented to the user so that the user selects one of them, and the table name selected by the user is determined as the target table name.

Specifically, for the query sentence "average sales of each subsidiary in the last six months", the word segmentation result is "last six months", "each", "subsidiary", "of", "average", "sales". "Subsidiary" and "sales" both belong to column names and are both target participles; "last six months" is a time, belongs to the column name "date", and is also a target participle. "Subsidiary", "sales" and "date" are input into the knowledge graph: the table names associated with "subsidiary" are "performance table" and "sales table", the table name associated with "sales" is "performance table", and the table names associated with "date" are "performance table" and "sales table". The common table name is therefore "performance table"; since it is unique, "performance table" is determined as the target table name.

In the embodiment of the application, a knowledge graph storing the association relationship between the table names and the column names in the database is constructed. By inputting the column names to be queried into the knowledge graph, the table names associated with each column name are obtained and the target table name to be queried is determined from them, which improves the efficiency and accuracy of determining the target table name.

Step S104, determining the dependency relationship of each participle in the participle result, and determining the preset condition in the SQL query statement according to the dependency relationship of the target participle belonging to a column name in the participle result.

In addition to the column names to be queried, the target table name and the where clause, an SQL query statement may also contain a grouping condition, a sorting condition and a limiting condition, which correspond to the group by clause, the order by clause and the limit clause respectively. Other existing schemes cannot recognize these clauses, which limits the applicability and generalization capability of the conversion into SQL query statements.

In the embodiment of the application, after word segmentation is performed on the query sentence, the dependency relationship among the participles in the word segmentation result is determined. In a dependency relationship, the core participle dominates the participles of the other components: the core participle is a participle that is not dominated by any other participle, and every dominated participle depends on the core participle through some dependency relationship. For example, in "red apple", "apple" is the core participle and "red" is the dominated participle.

The dependency relationships include the core relation (head, abbreviated as "HED"), the attributive relation (attribute, abbreviated as "ATT", also called the centering relation), the left adjunct relation (left adjunct, abbreviated as "LAD"), the right adjunct relation (right adjunct, abbreviated as "RAD") and the subject-verb relation (subject-verb, abbreviated as "SBV"). The core relation marks the core of the whole sentence. The attributive relation usually holds between a modifier and the word it modifies; in "red apple", "red" modifies "apple", so there is an attributive relation between "apple" and "red". The left adjunct relation usually attaches a conjunction such as "and" or "or" to the word that follows it; in "flowers and grass", "and" is a left adjunct of "grass". The right adjunct relation usually attaches a suffix or auxiliary word to the word that precedes it; in "children", the plural suffix is a right adjunct of the noun before it. The subject-verb relation is the relation between the subject and the predicate. The dependency relationship of each participle in the word segmentation result is determined by performing syntactic analysis on the query sentence.

For each target participle belonging to a column name in the word segmentation result, the preset condition in the SQL query statement is determined according to the dependency relationship of the target participle, where the preset condition includes one or more of a grouping condition, a sorting condition and a limiting condition.

Specifically, for the query sentence "average sales of each subsidiary in the last six months", the word segmentation result is "last six months", "each", "subsidiary", "of", "average", "sales". "Subsidiary" and "sales" both belong to column names and are both target participles; "last six months" is a time, belongs to the column name "date", and is also a target participle. The dependency relationships between "sales" and each of "last six months", "average" and "subsidiary" are attributive (centering) relations; the dependency relationships between "subsidiary" and "each" and between "subsidiary" and "of" are an attributive relation and a right adjunct relation respectively. "Each" represents grouping and corresponds to the grouping condition, and "each" has an attributive relation with "subsidiary"; the grouping condition is therefore determined to be "group by subsidiary".

In the embodiment of the application, the dependency relationship among the participles is determined. For a target participle belonging to a column name, if the participles that have a dependency relationship with it include a word representing grouping, sorting or quantity, the corresponding grouping condition, sorting condition or limiting condition is determined to apply to that column name, namely "group by target participle", "order by target participle" or, for the limiting condition, "limit m", where m is any positive integer. This improves the applicability and accuracy of the conversion into SQL query statements.

Step S105, generating an SQL query statement according to at least one column name to be queried, the target table name and a preset condition.

After the column names to be queried, the target table name and the preset condition are obtained, they are filled into an SQL query statement template to generate the SQL query statement. The template contains placeholders, that is, reserved positions, which correspond respectively to the at least one column name to be queried, the target table name and the preset condition; the SQL query statement is generated by filling each item into its corresponding placeholder.

Continuing the above example, the column names to be queried determined in the above steps are "subsidiary" and "sales", the target table name is "performance table", the preset condition includes the grouping condition "group by subsidiary", and the where condition is "date is last six months". The SQL query statement template is:

select placeholder

from placeholder

where placeholder

placeholder

Inserting the column names to be queried, the target table name, the where condition and the preset condition into the SQL query statement template yields the SQL query statement: select subsidiary, sales from performance table where date is last six months group by subsidiary.
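The placeholder filling described above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not the implementation of the embodiment: the template string, the function name `fill_template` and the English renderings of the column and table names are hypothetical, and the where condition is still the fuzzy expression from the example.

```python
from string import Template

# Hypothetical template: one placeholder each for the column names, the
# target table name, the where condition and the preset conditions.
SQL_TEMPLATE = Template("select $columns from $table where $where $preset")


def fill_template(columns, table, where, preset_conditions):
    """Fill the four placeholders and return the SQL query statement."""
    return SQL_TEMPLATE.substitute(
        columns=", ".join(columns),
        table=table,
        where=where,
        preset=" ".join(preset_conditions),
    ).strip()  # drop the trailing space left by an empty preset condition


sql = fill_template(
    columns=["subsidiary", "sales"],
    table="performance table",
    where="date is last six months",
    preset_conditions=["group by subsidiary"],
)
print(sql)
```

Keeping the preset conditions as a list lets the same template serve statements with no grouping, sorting or limiting clause at all.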

In the embodiment of the application, a query sentence is acquired and word segmentation is performed on it; at least one column name to be queried is determined according to the word segmentation result; the table names associated with each column name to be queried are searched in a pre-constructed knowledge graph, and the target table name to be queried is determined from them, which improves the efficiency and accuracy of determining the target table name. The dependency relationship of each participle in the word segmentation result is then determined, and the preset condition in the SQL query statement, which includes one or more of a grouping condition, a sorting condition and a limiting condition, is determined according to the dependency relationship of the target participle. This overcomes the defect that other schemes cannot process grouping conditions, sorting conditions and the like, and improves the accuracy and practicability of SQL query statement generation.

The embodiment of the present application provides a possible implementation manner, in which determining at least one column name to be queried according to the word segmentation result includes:

The word segmentation result is input into a pre-trained SQL generation model to obtain the subtasks of the task to be queried that are output by the SQL generation model. The subtasks are constituent elements of the SQL statement to be generated. There are 8 subtasks: Select-Number (number of query columns), Select-Column (query column names), Select-Aggregation (aggregation function of the query), Where-Number (number of query conditions), Where-Column (column names in the query conditions), Where-Operator (operators in the query conditions), Where-Value (values corresponding to the column names in the query conditions) and Where-Condition-Operator (logical operator of the query conditions). Select-Number represents the predicted number of columns to be queried; Select-Column represents the predicted column names to be queried; Select-Aggregation represents the aggregation function corresponding to the i-th column, including summation (SUM), averaging (AVG), maximum (MAX) and minimum (MIN); Where-Number represents the predicted number of conditions; Where-Column represents the predicted condition fields, that is, the column names associated with the conditions; Where-Operator represents the condition operator corresponding to each predicted condition field, such as >, <, = and <>; Where-Value represents the value corresponding to each predicted condition field, each value corresponding to one field in Where-Column; Where-Condition-Operator represents the predicted logical operator, including "and" and "or".

The SQL generation model is used to predict the values of the 8 subtasks. The 8 subtasks are correlated with one another, so whether a prediction is correct can be judged from the correlation between the subtasks, and during sample training the prediction result can be adjusted through this correlation. For example, "Select-Number: 2" indicates that the number of columns to be queried is two, so Select-Column should be "Select-Column: column name 1, column name 2"; if only one column name exists in Select-Column, the Select-Number prediction is wrong. For another example, "Where-Number: 2" indicates that Where-Column contains two condition fields, Where-Operator contains two operators and Where-Value contains two values; whether a prediction is wrong can be judged through these relations between the subtasks.

The general flow by which the SQL generation model processes a query sentence is as follows: the original question and the field names are first encoded to obtain a uniform encoding dimension, which ensures the accuracy of model construction; [CXT] is then manually added in front of the question to extract global information; the 8 subtask predictions are then performed. For example, the query sentence "the name and class of girls older than 12" is input into the SQL generation model, and the subtasks output by the SQL generation model are:

Select-Number: 2

Select-Column: name, class

Select-Aggregation: none

Where-Number: 2

Where-Column: age, gender

Where-Operator: >, =

Where-Value: 12, girl

Where-Condition-Operator: and

Here "Select-Number: 2" indicates that 2 column names are to be queried, and "Select-Column: name, class" indicates that the column names to be queried are name and class, which agrees with Select-Number. "Select-Aggregation: none" means that there is no aggregation function. "Where-Number: 2" indicates that there are two where conditions joined by an "and" relation, which corresponds to "Where-Condition-Operator: and". "Where-Column: age, gender", "Where-Operator: >, =" and "Where-Value: 12, girl" are likewise correlated and together mean "age > 12 and gender = girl". The subtasks can therefore be corrected through the relations among the 8 subtasks to obtain the correct subtasks.

The embodiment of the present application provides a possible implementation manner. Fig. 2 exemplarily shows a schematic flow chart of obtaining the subtasks of the task to be queried output by the SQL generation model in another embodiment of the present application. As shown in the figure, the task to be queried includes fuzzy time information, and obtaining the subtasks of the task to be queried output by the SQL generation model includes the following steps:

Step S201, replacing the fuzzy time information in the word segmentation result of the query sentence with a preset fuzzy word to obtain a new word segmentation result.

When an SQL query statement is generated from a query sentence, identifying fuzzy time information is an important and difficult problem. In the Chinese context, a fuzzy time expression such as "last six months" or "last quarter" often has to be converted into a conditional clause using in or between, rather than into the form "where time = 20201030" that existing methods recognize. If an existing method is used in a scenario with fuzzy time, the generated result is an incorrect conditional clause such as "where time = last 6 months", so the generated SQL cannot run. The influence of fuzzy time on the generation of the SQL query statement therefore needs to be removed.

In the process of training the SQL generation model, in order to reduce the complexity of training, the fuzzy time in each query sentence sample is replaced with a preset fuzzy word. To keep the vector length of the query sentences consistent during sample training, the preset fuzzy word may be "XXXX year XX month XX day", where X is any digit; this is equivalent to inserting a time placeholder into the query sentence in place of the original fuzzy time information. For example, the query sentence is "query the average sales of each subsidiary in the last six months". After the stop word "query" is removed, the word segmentation result is "last six months", "each", "subsidiary", "of", "average", "sales". Replacing the fuzzy time "last six months" with the preset fuzzy word "0000 year 00 month 00 day" gives the new word segmentation result "0000 year 00 month 00 day", "each", "subsidiary", "of", "average", "sales".
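The replacement in step S201 can be sketched as below. This is a hedged illustration: the small lexicon `FUZZY_TIME_WORDS` stands in for whatever fuzzy time recognition the embodiment uses, and the function name and English token renderings are assumptions.

```python
PRESET_FUZZY_WORD = "0000 year 00 month 00 day"

# Hypothetical lexicon of fuzzy time expressions (an assumption; the source
# does not say how fuzzy time participles are detected).
FUZZY_TIME_WORDS = {"last six months", "last half year", "last quarter"}


def replace_fuzzy_time(tokens):
    """Replace fuzzy time participles with the preset fuzzy word.

    Returns the new word segmentation result and the fuzzy time
    information that was removed, so it can be resolved later.
    """
    new_tokens, fuzzy_times = [], []
    for token in tokens:
        if token in FUZZY_TIME_WORDS:
            fuzzy_times.append(token)
            new_tokens.append(PRESET_FUZZY_WORD)
        else:
            new_tokens.append(token)
    return new_tokens, fuzzy_times


tokens = ["last six months", "each", "subsidiary", "of", "average", "sales"]
new_tokens, fuzzy_times = replace_fuzzy_time(tokens)
print(new_tokens)
print(fuzzy_times)
```

Returning the removed expressions alongside the new tokens keeps the fuzzy time available for the later conversion into an accurate SQL clause.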

Step S202, inputting the new word segmentation result into the SQL generation model to obtain the new subtasks output by the SQL generation model.

After the new word segmentation result is obtained, it is input into the SQL generation model. Because the fuzzy time information has been replaced with the preset fuzzy word, the Where-Value subtask in the output takes the form "XXXX year XX month XX day". For example, when "average sales of each subsidiary in 0000 year 00 month 00 day" is input into the SQL generation model, the new subtasks output by the SQL generation model are:

Select-Number: 2

Select-Column: subsidiary, sales

Select-Aggregation: AVG

Where-Number: 1

Where-Column: date

Where-Operator: =

Where-Value: 0000 year 00 month 00 day

Where-Condition-Operator: none

The Where-Value subtask contains the preset fuzzy word "0000 year 00 month 00 day".

Step S203, acquiring the current time information.

Step S204, inputting the current time information and the fuzzy time information into a pre-constructed time matching template to obtain a first SQL clause which is output by the time matching template and contains accurate time information.

The current time information is the current system time, for example "September 10, 2020". The fuzzy time information is an imprecise expression such as "last six months", "last half year" or "last quarter". If such expressions are converted into the SQL statement directly, they become "time = last six months", "time = last half year", "time = last quarter" and so on, which makes the generated SQL statement unable to run. The fuzzy time therefore needs to be converted, in combination with the current time, into the corresponding accurate time, namely "time between 202004 and 202009", "time between 202004 and 202009" and "time in (202007, 202008, 202009)" respectively.

Because the Where-Value subtask contains the preset fuzzy word, a query result still cannot be obtained and further processing is needed: the preset fuzzy word is replaced with a first SQL clause that contains accurate time information and with which a query result can be obtained.

Step S204 is preceded by: inputting the fuzzy time information into a synonym matching model to obtain a fuzzy time entity which is output by the synonym matching model and has the same meaning as the fuzzy time information;

The synonym matching model merges a plurality of similar fuzzy time expressions into one fuzzy time entity that represents the fuzzy time. For example, similar fuzzy times such as "last six months", "last half year" and "last 6 months" can be merged into the fuzzy time entity "last 6 months"; inputting "last half year" into the synonym matching model yields the matched fuzzy time entity "last 6 months", which reduces the complexity of subsequent processing.

The time matching template of the embodiment of the application receives the fuzzy time entity and converts it into an SQL clause containing accurate time. The time matching template consists of two parts: a matching sub-template and an output sub-template. The matching sub-template identifies the time type of the fuzzy time entity and covers a plurality of time types; for example, it identifies "last 6 months" as a month time type, "last 2 years" as a year time type, and "first quarter" as a quarter time type. In addition, the time matching template of the embodiment of the present application can identify fuzzy time entities of any time type, including but not limited to the year time entity type, the month time entity type, the quarter time entity type and the day time entity type.

Specifically, the fuzzy time information is "last six months". It is input into the synonym matching model, and the fuzzy time entity obtained is "last 6 months". The current time information is "September 2020". The fuzzy time entity and the current time information are input into the time matching template. The matching sub-template identifies "last 6 months" as the month time entity type and extracts the time value "6", which is input into the output sub-template for the month time entity type. The output sub-template combines the time value with the current time "September 2020", converts it into the specific time period from 202004 to 202009, and outputs the first SQL clause "between 202004 and 202009", which corresponds to "last 6 months" and contains accurate time.

In another embodiment of the present application, the fuzzy time entity is "quarter 1". The fuzzy time entity is input into the time matching template; the matching sub-template identifies "quarter 1" as the quarter time entity type, extracts "1" and inputs it into the output sub-template for the quarter time entity type. The output sub-template combines the time value with the current time information "2020", converts it into the specific time period "202001, 202002, 202003", and outputs the first SQL clause "in (202001, 202002, 202003)", which corresponds to "quarter 1" and contains accurate time.

In another embodiment of the present application, the fuzzy time entity is "last 5 days" and the current time information is "September 10, 2020". The fuzzy time entity and the current time information are input into the time matching template; the matching sub-template identifies "last 5 days" as the day time entity type, extracts "5" and inputs it into the output sub-template for the day time entity type. The output sub-template combines the time value with the current time information "September 10, 2020", converts it into the accurate time period from 20200906 to 20200910, and outputs the first SQL clause "between 20200906 and 20200910" containing the accurate time.
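The three conversions above can be reproduced with a small sketch of the time matching template. Everything named here is an assumption for illustration: the regular expressions play the role of the matching sub-template, the helper functions play the role of the output sub-templates, and the English entity strings ("last 6 months", "quarter 1", "last 5 days") are renderings of the Chinese originals.

```python
import re
from datetime import date, timedelta


def _shift_month(year, month, delta):
    """Return (year, month) shifted by delta months."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def _months(n, today):
    # Last n months, counting the current month as the final one.
    start_y, start_m = _shift_month(today.year, today.month, -(n - 1))
    return f"between {start_y}{start_m:02d} and {today.year}{today.month:02d}"


def _quarter(q, today):
    first = 3 * (q - 1) + 1
    months = ", ".join(f"{today.year}{m:02d}" for m in range(first, first + 3))
    return f"in ({months})"


def _days(n, today):
    start = today - timedelta(days=n - 1)
    return f"between {start:%Y%m%d} and {today:%Y%m%d}"


# Matching sub-template (pattern) paired with its output sub-template.
TIME_TEMPLATES = [
    (re.compile(r"last (\d+) months"), _months),
    (re.compile(r"quarter (\d+)"), _quarter),
    (re.compile(r"last (\d+) days"), _days),
]


def to_sql_clause(fuzzy_time_entity, today):
    """Convert a fuzzy time entity into an SQL clause with accurate time."""
    for pattern, output in TIME_TEMPLATES:
        match = pattern.fullmatch(fuzzy_time_entity)
        if match:
            return output(int(match.group(1)), today)
    raise ValueError(f"unsupported fuzzy time entity: {fuzzy_time_entity}")


today = date(2020, 9, 10)
for entity in ("last 6 months", "quarter 1", "last 5 days"):
    print(entity, "->", to_sql_clause(entity, today))
```

Adding a further time type, such as years, would only require one more pattern and output function in `TIME_TEMPLATES`.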

Step S205, replace the preset fuzzy word included in the new subtask with the first SQL clause to obtain a subtask corresponding to the query statement.

After the first SQL clause containing accurate time information is obtained, the preset fuzzy word contained in the new subtask is replaced with the first SQL clause to obtain the subtask corresponding to the query sentence, so that the Where-Value subtask no longer contains the preset fuzzy word but contains the first SQL clause.
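Step S205 amounts to a substitution inside the Where-Value subtask. The sketch below assumes the subtasks are held in a dictionary with list-valued entries; that layout and the function name `fill_time_clause` are hypothetical.

```python
PRESET_FUZZY_WORD = "0000 year 00 month 00 day"


def fill_time_clause(subtasks, first_sql_clause):
    """Return a copy of the subtasks in which the preset fuzzy word in
    Where-Value is replaced with the first SQL clause."""
    result = dict(subtasks)
    result["Where-Value"] = [
        first_sql_clause if value == PRESET_FUZZY_WORD else value
        for value in subtasks["Where-Value"]
    ]
    return result


new_subtasks = {
    "Where-Number": 1,
    "Where-Column": ["date"],
    "Where-Operator": ["="],
    "Where-Value": [PRESET_FUZZY_WORD],
}
final_subtasks = fill_time_clause(new_subtasks, "between 202004 and 202009")
print(final_subtasks["Where-Value"])
```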

In the embodiment of the application, before the SQL generation model is trained, the fuzzy time information in the query sentence is replaced with the preset fuzzy word to obtain a query sentence containing the preset fuzzy word. The word segmentation result of that query sentence is input into the SQL generation model to obtain the new subtasks output by the model; the fuzzy time entity is converted by the time matching template into a first SQL clause containing accurate time; and the preset fuzzy word contained in the new subtasks is replaced with the first SQL clause. This removes the influence of fuzzy time on subsequent training, reduces the complexity of the training samples of the SQL generation model, and improves training efficiency.

The embodiment of the present application provides a possible implementation manner. Fig. 3 exemplarily shows a schematic diagram of the construction process of the knowledge graph in another embodiment of the present application. Before the table names associated with each column name to be queried are searched in the pre-constructed knowledge graph, the method further includes:

Step S301, traversing at least one table name in the database, and creating a table name entity corresponding to the at least one table name.

All tables TABLE(1) … TABLE(i) … TABLE(n) in the database are traversed, and a table name entity (:TABLE {name: "table name i"}) is created for the i-th table, where TABLE(i) denotes the i-th table and (:TABLE {name: "table name i"}) is the table name entity created for it. For example, if the 3rd table is the "sales" table, the corresponding table name entity is (:TABLE {name: "sales"}).

Step S302, traversing at least one column name of the table in the database, and creating a column name entity corresponding to the column name.

All columns COLUMN(1) … COLUMN(j) … COLUMN(m) of TABLE(i) are traversed, and a column name entity (:TABLE_COLUMN {name: "column name j"}) is created for each column, where COLUMN(j) denotes the j-th column of the i-th table and (:TABLE_COLUMN {name: "column name j"}) is the column name entity created for it. For example, the 3rd table is the "sales" table and its 2nd column is "region", so the corresponding column name entity is (:TABLE_COLUMN {name: "region"}).

Step S303, an incidence relation between the table name entity and the column name entity is created, and the incidence relation between the table name entity and the column name entity is stored in a knowledge graph.

Specifically, the association [:IS_COLUMN_OF_TABLE] between the table name entity (:TABLE {name: "table name i"}) and each of its column name entities (:TABLE_COLUMN {name: "column name 1"}) … (:TABLE_COLUMN {name: "column name m"}) is created, forming triples of the form (:TABLE {name: "table name i"}) <-[:IS_COLUMN_OF_TABLE]- (:TABLE_COLUMN {name: "column name j"}). Each association is stored in the knowledge graph, that is, the knowledge graph stores a number of such triples. Continuing the above example, the constructed triple is (:TABLE {name: "sales"}) <-[:IS_COLUMN_OF_TABLE]- (:TABLE_COLUMN {name: "region"}).

Specifically, the knowledge graph corresponding to the performance table and the sales table is shown in fig. 4, which exemplarily shows a schematic diagram of the knowledge graph corresponding to the performance table and the sales table provided in another embodiment of the present application. The performance table has 6 columns: subsidiary, sales amount, traffic type, date, area and remarks. The sales table has 4 columns: subsidiary, sales amount, area and date. Each arrow points from a column name to a table name and indicates the association relationship between the two, that is, (:TABLE {name: "table name i"}) <-[:IS_COLUMN_OF_TABLE]- (:TABLE_COLUMN {name: "column name j"}).

Data in the database is stored in tables, and each table contains a plurality of columns. To make it convenient to look up the tables in the database and the columns they contain, the association relationship between the table names and the column names is created and stored in the knowledge graph, so that when a column name is known, the table names associated with it can be found by inputting the column name into the knowledge graph.
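Steps S301 to S303 can be sketched without a graph database by storing the triples in a plain list. This is a simplified stand-in, not the storage used by the embodiment; the schema dictionary is a small hypothetical subset chosen for illustration, and the function names are assumptions.

```python
REL = "IS_COLUMN_OF_TABLE"


def build_knowledge_graph(schema):
    """Build (column name, relation, table name) triples from a schema.

    schema maps each table name to the list of its column names.
    """
    triples = []
    for table, columns in schema.items():         # S301: traverse tables
        for column in columns:                    # S302: traverse columns
            triples.append((column, REL, table))  # S303: store association
    return triples


def tables_of(triples, column):
    """Return the set of table names associated with a column name."""
    return {t for c, rel, t in triples if c == column and rel == REL}


# Hypothetical schema used only for this example.
schema = {
    "performance table": ["subsidiary", "sales", "date"],
    "sales table": ["subsidiary", "region", "date"],
}
triples = build_knowledge_graph(schema)
print(tables_of(triples, "subsidiary"))
print(tables_of(triples, "sales"))
```

Each triple points from the column name to the table name, matching the edge direction described for the knowledge graph.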

The embodiment of the present application provides a possible implementation manner. Fig. 5 exemplarily shows a schematic flow chart of determining the target table name to be queried in another embodiment of the present application, where determining the target table name to be queried from the table names associated with the column names includes:

Step S501, determining the common table name from the table names associated with the respective column names.

In the previous steps, the SQL generation model predicts Select-Column and Where-Column, which together contain all the column information in the statement; the table names associated with these column names therefore need to be found.

As the above example shows, each table name is associated with more than one column name, and a column name may be associated with more than one table name. Therefore, once the column names are known, each column name is input into the knowledge graph to obtain the table names associated with it, and the table names common to all the column names are determined by searching.

Step S502, if the number of common table names is one, determining the common table name as the target table name to be queried.

If there is exactly one common table name, it is unique and is determined as the target table name to be queried.

Step S503, if the number of common table names exceeds one, presenting the common table names to the user, receiving the user's selection, and determining the table name selected by the user as the target table name to be queried.

If a plurality of common table names are found, the natural language description of the query sentence is ambiguous and a unique table name cannot be located. In this case, the common table names are presented to the user, the user is prompted to select the table to be queried, the user's selection is received, and the table name selected by the user is determined as the target table name to be queried.

Continuing the above example, the determined column names are "subsidiary", "sales" and "date", and the reasoning process in the knowledge graph is as follows:

the table names having an IS_COLUMN_OF_TABLE relation with "subsidiary" are computed as {performance table, sales table};

the table names having an IS_COLUMN_OF_TABLE relation with "sales" are computed as {performance table};

the table names having an IS_COLUMN_OF_TABLE relation with "date" are computed as {performance table, sales table};

the table that appears in all three sets {performance table, sales table}, {performance table} and {performance table, sales table} is computed as "performance table", so the target table of the SQL statement is deduced to be the performance table.
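The reasoning above is a set intersection followed by the case distinction of steps S502 and S503. A minimal sketch, assuming the per-column lookups are already available as a dictionary and that the user's choice is supplied through a callback; the names `find_target_table` and `choose`, and the error raised when no common table exists, are assumptions not stated in the source.

```python
def find_target_table(column_to_tables, columns, choose=None):
    """Determine the target table name from the tables shared by all columns.

    column_to_tables: mapping from column name to associated table names.
    choose: callback that receives the candidate list and returns the
            user's selection (used only when several tables are common).
    """
    table_sets = [set(column_to_tables[c]) for c in columns]
    common = set.intersection(*table_sets)         # S501
    if len(common) == 1:                           # S502
        return next(iter(common))
    if len(common) > 1:                            # S503
        if choose is None:
            raise ValueError("several common tables; user selection needed")
        return choose(sorted(common))
    raise LookupError("no table contains all the requested columns")


column_to_tables = {
    "subsidiary": {"performance table", "sales table"},
    "sales": {"performance table"},
    "date": {"performance table", "sales table"},
}
target = find_target_table(column_to_tables, ["subsidiary", "sales", "date"])
print(target)
```

With only "subsidiary" and "date" as inputs, both tables remain candidates and the `choose` callback decides between them.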

In the embodiment of the application, the column names are input into the knowledge graph to determine the table names associated with them, and the target table name is then determined. No prediction by a model trained on a large number of samples is needed, which improves the efficiency and accuracy of locating the target table name.

The embodiment of the present application provides a possible implementation manner. Fig. 6 exemplarily shows a schematic flow chart of constructing the dependency syntax tree and creating the grouping entity recognition model, the sorting entity recognition model and the limiting entity recognition model in another embodiment of the present application, where determining the preset condition in the SQL query statement according to the dependency relationship of the target participle includes:

Step S601, constructing a dependency syntax tree of the query sentence according to the dependency relationship of each participle, wherein the dependency syntax tree comprises nodes and edges; the nodes are participles in the query sentence; the edges are dependency relationships between the nodes.

The dependency syntax tree is the tree obtained by parsing the query sentence according to the dependency relationships. It includes nodes and edges: the nodes are the participles in the query sentence, and the edges are the dependency relationships between the nodes, including the core relation, the attributive (centering) relation, the left adjunct relation, the right adjunct relation, the subject-verb relation and the like.

Step S602, a grouping entity recognition model is created, the grouping entity recognition model is used for recognizing grouping entities, and the grouping entities are participles with entity types representing grouping.

Creating a grouping recognition model for recognizing a grouping entity, the grouping entity representing grouped words for an entity type, a corpus including "each", "per", and "each" and the like representing grouped words when creating the grouping entity, inputting a participle into the grouping entity recognition model, and the grouping entity recognition model recognizing whether the participle is a grouping entity.

Step S603, a sequencing entity recognition model is created and used for recognizing a sequencing entity, and the sequencing entity is a participle with an entity type representing sequencing.

Creating a sequencing entity recognition model for recognizing a sequencing entity, wherein the sequencing entity is a word of which the entity type represents sequencing, training corpora used for creating the sequencing entity recognition model comprise words of which the sequence is represented by 'highest', 'lowest', 'maximum' and 'minimum', and the like, the input of a grouping to the sequencing entity is a must-fill model, and the sequencing entity recognition model can recognize whether the participle is the sequencing entity.

Step S604, a restriction entity recognition model is created, the restriction entity recognition model is used for recognizing a restriction entity, and the restriction entity is a word segmentation with the entity type as the representation number.

A limiting entity recognition model is created for recognizing quantity entities, where a quantity entity is a participle whose entity type represents quantity. The training corpus used to create the limiting entity recognition model includes expressions that limit a quantity, formed by a number n followed by a measure word, such as "n parts", "n strips", and "n pairs".

A possible implementation manner is provided in an embodiment of the present application, as shown in fig. 7, a schematic flow chart of obtaining a preset condition in another embodiment of the present application, where determining the preset condition in an SQL statement according to a dependency relationship of a target word segmentation includes:

Step S701, traversing nodes in the dependency syntax tree, if the currently traversed node is a target participle, searching child nodes of the target participle, and determining entity types of the child nodes.

After the dependency syntax tree corresponding to the query statement is created, the dependency syntax tree needs to be traversed to determine the entity type of each traversed node. First, each traversed node is compared with the column names in the subtask Select-Column and the subtask Where-Column. If the participle corresponding to the traversed node is a column name, the participle is determined to be a target participle, and the child nodes of the target participle are searched to determine their entity types, that is, whether each child node is a grouping entity, a sequencing entity, or a quantity entity.

Step S702, if the entity type of the child node is a grouping entity, determining that the preset condition in the SQL statement includes a grouping condition.

If the entity type of the child node is a grouping entity, the preset condition in the SQL statement includes the grouping condition group by. For example, if the target participle is "subsidiary" and the entity type of its child node "each" is found to be a grouping entity, the grouping condition clause corresponding to the grouping condition in the SQL statement is determined to be "group by subsidiary".

Step S703, if the entity type of the child node is a sort entity, determining that the preset condition in the SQL statement includes a sort condition.

If the entity type of the child node is a sequencing entity, the preset condition in the SQL statement includes the sequencing condition order by. For example, if the target participle is "sales amount" and the entity type of its child node "highest" is found to be a sequencing entity, the sequencing condition clause corresponding to the sequencing condition in the SQL statement is determined to be "order by sales amount".

In step S704, if the entity type of the child node is a quantity entity, it is determined that the preset condition in the SQL statement includes a limitation condition.

If the entity type of the child node is a quantity entity, the preset condition in the SQL statement includes the limit condition limit. For example, if the target participle is "region" and the entity type of its child node "3" is found to be a quantity entity, the limit condition clause corresponding to the limit condition in the SQL statement is determined to be "limit 3".
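Steps S701 to S704 can be sketched as a single tree traversal. This is an illustration under stated assumptions, not the implementation of the application: the three entity recognition models are replaced by hypothetical keyword sets and a digit check, and the names `Node`, `entity_type`, and `preset_conditions` are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    word: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical stand-ins for the trained entity recognition models.
GROUPING_WORDS = {"each", "per"}
SEQUENCING_WORDS = {"highest", "lowest", "maximum", "minimum"}


def entity_type(word: str) -> str:
    """Return the entity type of a participle (keyword lookup, not a model)."""
    if word in GROUPING_WORDS:
        return "grouping"
    if word in SEQUENCING_WORDS:
        return "sequencing"
    if word.isdigit():
        return "quantity"
    return "other"


def preset_conditions(root: Node, column_names: Set[str]) -> Dict[str, str]:
    """Traverse the tree and derive clauses from children of target participles."""
    clauses: Dict[str, str] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if node.word not in column_names:
            continue  # only target participles (column names) are inspected
        for child in node.children:
            kind = entity_type(child.word)
            if kind == "grouping":        # step S702
                clauses["group by"] = f"group by {node.word}"
            elif kind == "sequencing":    # step S703
                clauses["order by"] = f"order by {node.word}"
            elif kind == "quantity":      # step S704
                clauses["limit"] = f"limit {child.word}"
    return clauses


tree = Node("ROOT", [
    Node("region", [
        Node("3"),
        Node("sales amount", [Node("highest"), Node("of")]),
    ]),
])
result = preset_conditions(tree, {"region", "sales amount"})
```

Here `result` holds the clauses "limit 3" and "order by sales amount". The sketch emits no sort direction because the text above gives the clause only as "order by sales amount".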

In addition, if a child node of the target participle is a participle representing an aggregation function, the column name in the generated SQL query statement becomes "aggregation function name (column name)". The aggregation function names include SUM, AVG, MAX, and MIN. When the participle expresses a sum, such as "sum" or "total", the aggregation function is determined to be "SUM"; when the participle expresses an average, such as "mean" or "average", the aggregation function is determined to be "AVG"; when the participle expresses a maximum, such as "maximum value", the aggregation function is determined to be "MAX"; when the participle expresses a minimum, such as "minimum value", the aggregation function is determined to be "MIN".

Specifically, if a child node of the target participle "sales" is the participle "average", which represents an aggregation function, the column name in the generated SQL query statement becomes "AVG(sales)".
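The aggregation rule can be illustrated with a small lookup table. The keyword table below is a hypothetical reduction that contains only the example words named in the text, and the function names are assumptions introduced for this sketch.

```python
from typing import List, Optional

# Hypothetical keyword table built from the example words in the text.
AGGREGATE_KEYWORDS = {
    "sum": "SUM",
    "total": "SUM",
    "mean": "AVG",
    "average": "AVG",
    "maximum value": "MAX",
    "minimum value": "MIN",
}


def aggregate_for(word: str) -> Optional[str]:
    """Return the aggregation function name for a participle, if any."""
    return AGGREGATE_KEYWORDS.get(word.lower())


def select_expression(column: str, child_words: List[str]) -> str:
    """Wrap the column name when one of its child participles is an aggregate."""
    for word in child_words:
        func = aggregate_for(word)
        if func is not None:
            return f"{func}({column})"
    return column


wrapped = select_expression("sales", ["last 6 months", "average", "subsidiary"])
plain = select_expression("region", ["3"])
```

With these inputs, `wrapped` is "AVG(sales)" and `plain` stays "region", since "3" is not an aggregation word.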

The embodiment of the present application provides a possible implementation manner, and determining an entity type of a child node includes:

The participle corresponding to a child node of the target participle is input into the grouping entity recognition model. If the participle is a grouping entity, the child node is determined to be a grouping entity, the next node after the child node is traversed, and the entity type of that node is determined, until all nodes are traversed. If the participle is not a grouping entity, it is input into the sequencing entity recognition model; if the participle is a sequencing entity, the child node is determined to be a sequencing entity, and traversal continues with the next node in the same way. If the participle is not a sequencing entity, it is input into the limiting entity recognition model; if the participle is a quantity entity, the child node is determined to be a quantity entity, and traversal continues with the next node in the same way. If the participle is not a quantity entity either, it is determined to be an entity of another type. Entities of other types are words that have no influence on the generation of the SQL query statement, such as the auxiliary words "of", "have", and "o". In addition, the embodiment of the application does not limit the order in which the participle is input into the grouping entity recognition model, the sequencing entity recognition model, and the limiting entity recognition model; any order of input falls within the protection scope of the application.
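The cascade described above can be sketched as an ordered list of recognizers that is tried until one accepts the participle. The recognizer functions here are hypothetical keyword checks standing in for the trained models, and the label strings are chosen for this example only.

```python
from typing import Callable, List, Tuple

Recognizer = Callable[[str], bool]


# Hypothetical stand-ins for the three entity recognition models.
def is_grouping(word: str) -> bool:
    return word in {"each", "per"}


def is_sequencing(word: str) -> bool:
    return word in {"highest", "lowest", "maximum", "minimum"}


def is_quantity(word: str) -> bool:
    return word.isdigit()


CASCADE: List[Tuple[str, Recognizer]] = [
    ("grouping entity", is_grouping),
    ("sequencing entity", is_sequencing),
    ("quantity entity", is_quantity),
]


def classify(word: str, cascade: List[Tuple[str, Recognizer]] = CASCADE) -> str:
    """Try each recognizer in turn; fall back to 'other' when none matches."""
    for label, recognizer in cascade:
        if recognizer(word):
            return label
    return "other"


labels = [classify(w) for w in ["each", "highest", "3", "of"]]
```

Because the order of the models is not limited, the cascade is passed as a parameter: calling `classify("3", list(reversed(CASCADE)))` gives the same label as the default order for these non-overlapping keyword sets.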

Specifically, in an embodiment of the present application, the word segmentation result is "last 6 months", "each", "subsidiary", "of", "average", and "sales". The dependency syntax tree generated from the part of speech of each participle and the dependency relationships between the participles is shown in fig. 8. The root node of the dependency syntax tree is the fixed word ROOT. The child node of the root node is "sales", whose part of speech is noun, and the two are in a core relationship. The 3 child nodes of "sales" are, from left to right, "last 6 months", "average", and "subsidiary"; their parts of speech are noun, status word, and noun, and their dependency relationships are all centering relationships. Neither "last 6 months" nor "average" has child nodes. "Subsidiary" has two child nodes, "each" and "of" from left to right, whose parts of speech are pronoun and auxiliary word, and whose dependency relationships are a centering relationship and a right additional relationship, respectively. The word at the start of an arrow is the core word, which governs the other component; the word the arrow points to is the dominated word, which is governed by the core word.

The above dependency syntax tree is traversed to determine whether the participle of each node is a column name and, for each target participle belonging to a column name, to determine the entity types of its child nodes. Fig. 9 exemplarily shows the dependency syntax tree after the entity types of the child nodes of the target participles have been determined according to another embodiment of the present application. "Sales" is determined to be the column name "sales", "subsidiary" the column name "subsidiary", and "last 6 months" the column name "date"; "average" is the aggregation function "AVG"; "each" is a grouping entity corresponding to the grouping condition "group by"; "of" is an auxiliary word, belongs to the other entity types, and carries no meaning for the query. Because "average" is a child node of "sales", the column name "sales" of the subtask Select-Column becomes "AVG(sales)". Because "each" is a child node of "subsidiary", the grouping condition clause "group by subsidiary" is added.
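The worked example of fig. 8 and fig. 9 can be reproduced with a short sketch. The tree is written as nested `(word, children)` tuples, and the keyword sets are hypothetical stand-ins for the recognition models. The handling of "last 6 months" as a where condition on the column "date" is deliberately left out of this sketch.

```python
from typing import List, Tuple

# Dependency syntax tree of fig. 8 as nested (word, children) tuples.
TREE = ("ROOT", [
    ("sales", [
        ("last 6 months", []),
        ("average", []),
        ("subsidiary", [("each", []), ("of", [])]),
    ]),
])

COLUMNS = {"sales", "subsidiary"}   # target participles (column names)
AGGREGATES = {"average": "AVG"}     # hypothetical aggregate keywords
GROUPING = {"each"}                 # hypothetical grouping keywords


def analyse(tree) -> Tuple[List[str], List[str]]:
    """Return (aggregated select items, group-by clauses) for the tree."""
    aggregated: List[str] = []
    group_by: List[str] = []

    def visit(node) -> None:
        word, children = node
        if word in COLUMNS:
            child_words = [child[0] for child in children]
            func = next((AGGREGATES[w] for w in child_words if w in AGGREGATES), None)
            if func is not None:
                aggregated.append(f"{func}({word})")
            if any(w in GROUPING for w in child_words):
                group_by.append(f"group by {word}")
        for child in children:
            visit(child)

    visit(tree)
    return aggregated, group_by


aggregated, group_by = analyse(TREE)
```

For this tree, `aggregated` contains "AVG(sales)" and `group_by` contains "group by subsidiary", matching the result described for fig. 9.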

In an embodiment of the present application, the word segmentation result is "highest", "sales", "3", "region", and "of". The dependency syntax tree constructed from the part of speech of each participle and the dependency relationships between the participles is shown in fig. 10. The root node of the dependency syntax tree is the fixed word ROOT. The child node of the root node is "region", whose part of speech is noun, and the two are in a core relationship. The 2 child nodes of "region" are "3" and "sales" from left to right; their parts of speech are numeral and noun, respectively, and their dependency relationships are both centering relationships. The two child nodes of "sales" are "highest" and "of" from left to right; their parts of speech are adjective and auxiliary word, and the corresponding dependency relationships are a predicate relationship and a right additional relationship, respectively. The word at the start of an arrow is the core word, which governs the other component; the word the arrow points to is the dominated word, which is governed by the core word.

The dependency syntax tree is traversed to determine whether the participle of each node is a column name and, for each target participle belonging to a column name, to determine the entity types of its child nodes, as shown in fig. 11. "Region" is determined to be the column name "region", and "sales" the column name "sales amount". "3" is a numeral and a quantity entity corresponding to the limit condition "limit"; because "3" is a child node of the target participle "region", the limit condition clause is determined to be "limit 3". "Highest" is a sequencing entity corresponding to the sequencing condition "order by"; because "highest" is a child node of the target participle "sales amount", the resulting sequencing condition clause is "order by sales amount". "Of" is an auxiliary word, belongs to the other entity types, and carries no meaning for the query.

The embodiment of the application deduces clauses such as 'group by', 'order by', 'limit' and the like through syntactic analysis based on the construction of the dependency syntax tree, the grouping entity recognition model, the ordering entity recognition model and the limiting entity recognition model, makes up the defect that other schemes cannot process grouping conditions, ordering conditions and limiting conditions, and improves the accuracy and the applicability of the generation of the SQL query statement.

The embodiment of the present application provides a possible implementation manner, and the generating of an SQL query statement according to at least one column name to be queried, a target table name, and a preset condition includes:

A placeholder is a position reserved in the SQL query statement template for inserting a column name to be queried, the target table name, or a preset condition. The format of the SQL query statement is as follows:

select placeholder

from placeholder

where placeholder

placeholder

After the column names to be queried, the target table name, the where condition, and the preset condition are determined, they are respectively inserted into the corresponding placeholders in the SQL query statement template.

The order in which the column names to be queried, the target table name, the where condition, and the preset condition are inserted into the SQL query statement template is not limited. Each element may be inserted into its placeholder as soon as it is obtained, or all elements may be inserted after every component has been determined.
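Filling the template can be sketched as follows. The function `build_query`, its parameters, and the table name `sales_table` are hypothetical. Dropping the where line and the preset-condition line when they are empty is an assumption of this sketch; the source does not state how empty placeholders are handled.

```python
from typing import List, Optional


def build_query(
    columns: List[str],
    table: str,
    where: Optional[str] = None,
    preset: Optional[List[str]] = None,
) -> str:
    """Fill the select / from / where / preset-condition placeholders."""
    lines = [f"select {', '.join(columns)}", f"from {table}"]
    if where:
        lines.append(f"where {where}")
    if preset:
        # Preset conditions: group by, order by, and limit clauses, in that order.
        lines.append(" ".join(preset))
    return "\n".join(lines)


query = build_query(
    ["subsidiary", "AVG(sales)"],
    "sales_table",  # hypothetical target table name
    preset=["group by subsidiary"],
)
```

The sketch builds the statement by string formatting only, to mirror the placeholder template. In a real system, literal values in the where condition would normally be passed as bound parameters rather than concatenated into the text.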

An embodiment of the present application provides a device for generating an SQL query statement, and as shown in fig. 12, the SQL generating device 120 may include:

an obtaining module 121, configured to obtain a query statement, where the query statement includes a task to be queried described in a natural language;

a column name determining module 122, configured to perform word segmentation on the query statement, and determine at least one column name to be queried according to a word segmentation result;

a target table name determining module 123, configured to search a table name associated with each column name to be queried in a pre-constructed knowledge graph, and determine a target table name to be queried from the table names associated with the column names; wherein, the incidence relation between the data table name and the column name in the database is stored in the knowledge map.

The preset condition determining module 124 is configured to determine a dependency relationship of each participle in the participle result, and determine a preset condition in the SQL query statement according to the dependency relationship of the target participle for the target participle belonging to the column name in the participle result;

a generating module 125, configured to generate an SQL query statement according to at least one column name to be queried, a target table name, and a preset condition;

the apparatus for generating an SQL query statement in this embodiment may execute the method for generating an SQL query statement in the foregoing embodiment of the present application, and the implementation principles thereof are similar, and are not described herein again.
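As an illustration of what the target table name determining module 123 does, the sketch below reduces the knowledge graph to a mapping from column names to the tables that contain them, and takes the table names common to all queried columns as candidates, following the "common table names" wording of the claims. The mapping contents and all names in the code are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical knowledge graph reduced to column name -> associated table names.
KNOWLEDGE_GRAPH: Dict[str, Set[str]] = {
    "sales": {"orders", "monthly_report"},
    "subsidiary": {"orders", "org"},
    "date": {"orders", "monthly_report"},
}


def common_tables(columns: List[str], graph: Dict[str, Set[str]] = KNOWLEDGE_GRAPH) -> List[str]:
    """Return the table names associated with every queried column, sorted."""
    table_sets = [graph.get(column, set()) for column in columns]
    if not table_sets:
        return []
    return sorted(set.intersection(*table_sets))


candidates = common_tables(["sales", "subsidiary", "date"])
ambiguous = common_tables(["sales", "date"])
```

When exactly one candidate remains, as in `candidates`, it can be used directly as the target table name. When several remain, as in `ambiguous`, the claims describe presenting them to the user for selection.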

On the basis of the foregoing embodiments, as an optional embodiment, the column name determining module is configured to: input the word segmentation result into a pre-trained SQL generation model to obtain the subtasks of the task to be queried output by the SQL generation model, and determine at least one column name to be queried according to the subtasks.

On the basis of the above embodiments, as an optional embodiment, the task to be queried includes fuzzy time information; the method for obtaining the subtasks of the task to be queried output by the SQL generation model comprises the following steps:

acquiring current time information;

On the basis of the above embodiments, as an optional embodiment, the column name determining module further includes a knowledge graph constructing sub-module:

On the basis of the foregoing embodiments, as an optional embodiment, the target table name determining module includes:

On the basis of the foregoing embodiments, as an optional embodiment, the preset condition determining module includes:

the grouping entity identification model building submodule is used for identifying a grouping entity, and the grouping entity is a word segmentation of which the entity type is a grouping expression;

On the basis of the foregoing embodiments, as an optional embodiment, the preset condition determining module further includes:

On the basis of the foregoing embodiments, as an alternative embodiment, the determining the entity type of the child node includes:

On the basis of the foregoing embodiments, as an optional embodiment, the generating module includes:

An embodiment of the present application provides an electronic device, including: a memory and a processor; at least one program stored in the memory for execution by the processor, which when executed by the processor, implements:

In an alternative embodiment, an electronic device is provided. As shown in fig. 13, the electronic device 4000 comprises a processor 4001 and a memory 4003. The processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, which may be used for data interaction between the electronic device and other electronic devices, such as transmitting and/or receiving data. In practical applications, the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiment of the present application.

The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 4001 may also be a combination that performs a computational function, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 13, but this is not intended to represent only one bus or type of bus.

The Memory 4003 may be a ROM (Read Only Memory) or other types of static storage devices that can store static information and instructions, a RAM (Random Access Memory) or other types of dynamic storage devices that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic Disc storage medium or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.

The memory 4003 is used for storing application program codes (computer programs) for executing the present scheme, and is controlled by the processor 4001 to execute. Processor 4001 is configured to execute application code stored in memory 4003 to implement what is shown in the foregoing method embodiments.

The present application provides a computer-readable storage medium on which a computer program is stored; when the computer program runs on a computer, the computer is enabled to execute the corresponding content in the foregoing method embodiments. Compared with the prior art, a query statement is obtained and segmented, at least one column name to be queried is determined according to the word segmentation result, the table names associated with each column name to be queried are searched in a pre-constructed knowledge graph, and the target table name to be queried is determined from the table names associated with the column names, which greatly improves the efficiency and accuracy of determining the target table name. In addition, the dependency relationship of each participle in the word segmentation result is determined, and the preset condition in the SQL query statement is determined according to the dependency relationship of each target participle belonging to a column name. The preset condition includes one or a combination of a grouping condition, a sorting condition, and a limiting condition, which overcomes the inability of other schemes to process grouping conditions, sorting conditions, and the like, and improves the accuracy and practicability of SQL query statement generation.

The embodiment of the present application provides a computer program that includes computer instructions stored in a computer-readable storage medium. When a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, the computer device executes the content shown in the foregoing method embodiments. Compared with the prior art, a query statement is obtained and segmented, at least one column name to be queried is determined according to the word segmentation result, the table names associated with each column name to be queried are searched in a pre-constructed knowledge graph, and the target table name to be queried is determined from the table names associated with the column names, which greatly improves the efficiency and accuracy of determining the target table name. In addition, the dependency relationship of each participle in the word segmentation result is determined, and the preset condition in the SQL query statement is determined according to the dependency relationship of each target participle belonging to a column name. The preset condition includes one or a combination of a grouping condition, a sorting condition, and a limiting condition, which overcomes the inability of other schemes to process grouping conditions, sorting conditions, and the like, and improves the accuracy and practicability of SQL query statement generation.

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily performed at the same time and may be performed at different times; they are not necessarily performed in sequence and may be performed in turn or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.

The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for generating an SQL query statement is characterized by comprising the following steps:

determining the dependency relationship of each participle in the participle result, and determining a preset condition in the SQL query statement according to the dependency relationship of the target participle in the participle result, wherein the target participle belongs to the column name;

generating an SQL query statement according to the at least one column name to be queried, the target table name and the preset condition;

wherein, the incidence relation between the table name and the column name in the database is stored in the knowledge graph.

2. The method for generating an SQL query statement according to claim 1, wherein the determining at least one column name to be queried according to the word segmentation result comprises:

and inputting the word segmentation result into a pre-trained SQL generating model to obtain subtasks of the tasks to be queried, which are output by the SQL generating model, and determining at least one column name to be queried according to the subtasks.

3. The method for generating an SQL query statement according to claim 2, wherein the task to be queried includes fuzzy time information;

the obtaining of the subtasks of the task to be queried output by the SQL generating model comprises:

the determining at least one column name to be queried according to the word segmentation result comprises:

acquiring current time information;

4. The method for generating an SQL query statement according to claim 1, wherein before the searching a table name associated with each column name to be queried in a pre-constructed knowledge graph, the method further comprises constructing the knowledge graph:

traversing at least one table name in a database, and creating a table name entity corresponding to the at least one table name;

and creating an incidence relation between the table name entity and the column name entity, and storing the incidence relation between the table name entity and the column name entity into a knowledge graph.

5. The method for generating an SQL query statement according to claim 1, wherein the determining a target table name to be queried from table names associated with column names comprises:

if the number of the common table names is one, determining the common table names as the target table names to be inquired;

and if the number of the common table names exceeds one, presenting the common table names to a user to receive response selection of the user, and determining the table names selected by the user as the target table names to be inquired.

6. The method for generating an SQL query statement according to claim 1, wherein the determining the preset condition in the SQL query statement according to the dependency relationship of the target word segmentation comprises:

constructing a dependency syntax tree of the query statement according to the dependency relationship of each participle, wherein the dependency syntax tree comprises nodes and edges; the nodes are participles in the query sentence; the edges are the dependency relationship among the nodes;

creating a grouping entity identification model, wherein the grouping entity identification model is used for identifying a grouping entity, and the grouping entity is a participle with an entity type representing grouping;

creating a sequencing entity recognition model, wherein the sequencing entity recognition model is used for recognizing a sequencing entity, and the sequencing entity is a participle with an entity type representing sequencing;

and creating a limiting entity recognition model, wherein the limiting entity recognition model is used for recognizing a limiting entity, and the limiting entity is a participle whose entity type represents quantity.

7. The method for generating an SQL query statement according to claim 6, wherein the determining the preset condition in the SQL statement according to the dependency relationship of the target participle includes:

8. The method of generating an SQL query statement according to claim 7, wherein the determining the entity type of the child node comprises:

and inputting the participles corresponding to the child nodes into at least one of a grouping entity recognition model, a sequencing entity recognition model and a limiting entity recognition model, and determining the entity types of the participles as the entity types of the child nodes.

9. The method for generating an SQL query statement according to claim 1, wherein the generating an SQL query statement according to the at least one column name to be queried, the target table name, and the preset condition comprises:

inserting the at least one column name to be queried, the target table name and the preset condition into an SQL query statement template to generate an SQL query statement;

the SQL query statement template comprises placeholders corresponding to the at least one column name to be queried, the target table name and the preset condition respectively.

10. An apparatus for generating an SQL query statement, comprising:

the target table name determining module is used for searching a table name associated with each column name to be inquired in a pre-constructed knowledge graph and determining a target table name to be inquired from the table names associated with the column names;

the preset condition determining module is used for determining the dependency relationship of each participle in the participle result, and for a target participle belonging to a column name in the participle result, determining a preset condition in the SQL query statement according to the dependency relationship of the target participle;

the generating module is used for generating an SQL query statement according to the at least one column name to be queried, the target table name and the preset condition;

and the knowledge graph stores the association relationship between the table names and the column names of the data tables in the database.

11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for generating an SQL query statement according to any one of claims 1 to 9 when executing the program.

12. A computer-readable storage medium storing computer instructions for causing a computer to perform the steps of the method for generating an SQL query statement according to any one of claims 1 to 9.

13. A computer program, characterized in that the computer program comprises computer instructions stored in a computer-readable storage medium, which, when read by a processor of a computer device from the computer-readable storage medium, cause the processor to execute the computer instructions, so that the computer device performs the steps of the method for generating an SQL query statement according to any one of claims 1-9.