CN109947794B

CN109947794B - Interactive natural language query conversion method

Info

Publication number: CN109947794B
Application number: CN201910129037.7A
Authority: CN
Inventors: 王梅; 陈德华; 潘乔; 李继云; 王丽敏
Original assignee: Donghua University
Current assignee: Donghua University
Priority date: 2019-02-21
Filing date: 2019-02-21
Publication date: 2023-09-01
Anticipated expiration: 2039-02-21
Also published as: CN109947794A

Abstract

The invention provides an interactive natural language query conversion method comprising the following steps: semantic analysis; node mapping; pattern matching; predefinition of function operations; query interaction; and result interaction. The invention addresses two problems in big-data-oriented applications: the difficulty non-professional users have in querying and using databases, and the semantic gap between the ambiguity and abstraction of natural language and the precision and determinacy of structured query language. It analyzes the natural language query description input by the user, establishes the correspondence between semantic units and database tables and fields, generates a basic query, and adds function operations on that basis to obtain the final query. By combining a traditional natural language query interface with interactive querying, ordinary users can query the database through natural language descriptions while the user's query intention is captured more accurately. The defined interaction functions and result feedback mechanism improve the accuracy and efficiency of complex query conversion.

Description

Interactive natural language query conversion method

Technical Field

The invention relates to a method for interactively converting natural language query into structured query language SQL, belonging to the technical field of big data processing.

Background

As data volumes and data openness continue to grow, improving the ability of the public, especially non-professionals, to use data and providing friendly, convenient query and search services has become a key problem to be solved. Research on this problem is of great significance for advancing data openness and big data development in China.

Currently, the standard query language for relational data is the Structured Query Language (SQL). Although SQL can express a user's query intention precisely, writing SQL statements requires the user to be proficient in the query language and familiar with the structure of the tables in the database.

Even professional users find complex query language grammar difficult to master, let alone ordinary application-oriented users. Natural language, on the other hand, is the natural way for people to express their needs. If natural language queries could be converted into SQL automatically, users' query requirements could be met conveniently. However, the ambiguity and abstraction of natural language and the precision and determinacy of structured query language pose significant challenges for accurate query conversion. On the one hand, natural language query descriptions tend to be concise and ambiguous, whereas SQL is translated into a query execution plan and must contain precise, specific information. The two are strongly unequal semantically, so direct one-to-one conversion has poor accuracy, and complex query intentions, such as those containing sub-queries, are particularly difficult to convert directly. More importantly, SQL is suited to expressing a query intention that has already been determined, but ordinary users are not the producers of the data; without knowing the data they can hardly determine their query intention in advance, and they often need complex interactive querying to arrive at it. It is therefore necessary to design and implement interactive natural language query conversion.

Disclosure of Invention

The technical problem to be solved by the invention is: given a query intention described by the user in natural language, how to convert the natural language query into SQL statements automatically, with an interaction function and a result feedback mechanism designed into the conversion process, so as to improve the accuracy and efficiency of complex query conversion and make it easier for users who are not database professionals to access data.

In order to solve the technical problems, the technical scheme of the invention is to provide an interactive natural language query conversion method, which is characterized by comprising the following steps:

step 1, semantic analysis;

splitting a natural language query sentence input by a user into individual words, and analyzing and processing the query sentence by a natural language analysis tool to obtain a semantic dependency tree capable of expressing complex semantic relationships, wherein leaf nodes are words in the sentence, and non-leaf nodes represent semantic dependency relationships among the nodes;

step 2, node mapping;

mapping the nodes in the semantic tree to the corresponding node types, and discarding the meaningless segmentation;

step 3, pattern matching;

establishing connection among nodes completing mapping according to a database structure stored in advance, and finding out connection relation among tables;

step 4, generating a basic query from the basic query template T_bq;

Step 5, predefining function operation;

according to the syntax of SQL, defining 5 function operation types, including field selection operation, screening condition operation, grouping operation, screening operation after grouping and sorting operation;

step 6, query interaction;

the user inputs each additional operation description in natural language; mapping pairs are obtained through word segmentation and node mapping; the system converts the mapping pairs into a specific function operation according to the defined grammar rules and applies it to the preceding query to obtain an incremental query; the final query is obtained through repeated iteration;

step 7, result interaction;

the system returns the result of each added operation to the user; by inspecting the converted SQL and the result returned when that SQL is executed in the database, the user can judge, to a certain extent, the correctness of the natural language conversion, including whether the converted SQL is syntactically correct and whether it meets the user's semantic requirement; through interaction and feedback on intermediate query results, the user learns what data the current query returns and performs the next adding operation accordingly, so that the user is assisted in acquiring information progressively.

Preferably, in step 1, in order to ensure that the specific vocabulary in the database used can be correctly segmented, a corresponding auxiliary corpus is constructed according to the database, and the corpus contains the specific vocabulary which appears in the database but is segmented into more than one word by the segmentation tool.

Preferably, the step 2 specifically includes:

step 2.1: mapping the nodes in the semantic tree to the corresponding node types;

firstly, classifying the node types on the semantic tree into a selection node SN, an operation node ON, a logic node LN, a function node FN, a sorting node ODN, a grouping node GN, a post-grouping filter node HN, a table name node TN, a field name node AN and a value node VN, so that the node types correspond to the respective parts of an SQL statement;

step 2.2: creating a value-attribute inverted index for the value node;

acquiring values of fields in a database, recording fields corresponding to each value to obtain an inverted list item set, sorting according to a dictionary sequence of the values to obtain a first-level inverted index, classifying the first-level inverted index, and storing the index by using a data structure of a B+ tree;

step 2.3, node type mapping;

for the word segmentation result, firstly judging whether the word is an enumeration type node, then judging whether the word is a database table name or attribute name node, and if not, finally judging whether the word is a value node; for the value node, searching a field corresponding to the obtained value according to a value-attribute mapping method based on the inverted index, and establishing mapping from the value to the attribute; and determining the node type of each word and the data table related to query through a node mapping stage.

Preferably, the step 3 specifically includes:

step 3.1: extracting and storing the structure of the database, and defining a database schema graph DSG to represent the primary key and fields owned by each table and the primary-foreign key relations between the tables;

DSG is expressed as $DSG = (V, E)$, where $V$, the set of database concepts, consists of two parts, the table name nodes $V_T$ and the attribute nodes $V_A$; likewise, $E$ represents the relationships between concepts and includes two types, membership edges $E_M$ and primary-foreign key relationship edges $E_{FK}$; an edge in $E_M$ points from a table name in $V_T$ to a field in $V_A$ contained in that table; an edge in $E_{FK}$ points from a field in $V_A$ to a table name in $V_T$, indicating that a primary-foreign key relationship exists between the two tables and that the field is the foreign key of the pointed table;

step 3.2, obtaining the pattern graph $G_Q$ of the current query;

By default the weight of each edge of the original DSG is 0. For the set of node mapping pairs of a query: when a field appears, the weight of the membership edge pointing to that field is set to 1; when two or more tables appear, the weight of the primary-foreign key relationship edges between the tables is set to 1; if connecting two tables requires a third table as an intermediary, the corresponding connecting edges of the intermediary table are also set to 1. The pattern graph $G_Q$ corresponding to the current query is obtained from the edges with weight 1, and the basic query is generated based on this graph.

Preferably, in step 4, the template T_bq of the basic query is shown in Table 1;

Table 1 Basic query template

Select *

From table_i, …, table_k // data tables involved in the query

Where table_i.PKey = table_j.PKey

And …

And table_j.PKey = table_k.PKey // Join conditions between the selected tables

In Table 1, table_i denotes a table name in the database, and table_i.PKey denotes the join field of table_i.

According to the pattern graph $G_Q$ generated in step 3, the tables involved in the query are determined and placed after the From clause, while in the Where clause the Join relations between the tables are obtained from the primary-foreign key edges in $G_Q$, and the basic query is generated.

Preferably, the step 5 specifically includes:

step 5.1, defining the field selection operation: Select(F || Func(F), Pre_Q);

on the basis of the preceding query Pre_Q, the selected field F, or the result Func(F) of an aggregation operation on field F, is added, where Func(F) follows the aggregate functions of SQL: the summing function SUM(), the counting function COUNT(), the maximum function MAX(), the minimum function MIN() and the averaging function AVG(); the grammar rule of the select clause (SeleClause) is defined as shown in formula (4-1);

SeleClause＝SN+AN|SN+AN+FN|SN+FN+AN (4-1)

step 5.2, defining the filter condition operation: Filter(F OP value || F OP Sub_Q, Pre_Q);

the Where operation is divided into an explicit screening condition and a nested Sub-query condition, OP represents an operation symbol, and Sub_Q represents a nested Sub-query; defining a conditional clause (CondClause) grammar rule as shown in formula (4-2);

CondClause＝AN+ON+VN|ON+VN|CondClause+LN+CondClause|AN+ON+Sub_Q (4-2)

if no Where clause exists in Pre_Q, a Where clause with the newly obtained filter condition is added; if a Where clause already exists in Pre_Q, the newly obtained condition is appended directly after the existing conditions, and the logic word determines whether the conditions are joined by "and" or "or";

step 5.3, defining grouping operation: group by (F, pre_Q);

on the basis of Pre_Q, adding grouping operation, generating a Group by clause, if the Group by clause does not exist in the Pre_Q, adding a Group by () clause, otherwise, directly adding a new field into the original Group by clause; defining a group operation (GroClause) grammar rule as shown in formula (4-3):

GroClause＝AN+GN (4-3)；

step 5.4, defining the post-grouping filter operation: Having(Func(F) OP value, Pre_Q);

on the basis of Pre_Q, a post-grouping filter operation is added, and the resulting Having clause is appended after the Group by clause; since a Having clause usually contains an aggregate function and value is a specific number NUM, the grammar rule of the post-grouping filter clause (HavingClause) is defined as shown in formula (4-4):

HavingClause＝HN+FN+AN+ON+NUM (4-4)；

step 5.5, defining the sorting operation: Orderby(F || Func(F), Pre_Q);

for the sorting operation in SQL grammar, a sorting function is defined; it is triggered when the interactive operation input by the user asks for the selected results to be sorted by a certain field, and the sorting clause is placed at the end of the whole SQL statement; the grammar rule of the sorting operation (OrderClause) is defined as shown in formula (4-5):

OrderClause＝ODN+AN+ASC|DESC (4-5).

The invention provides a method for interactively converting natural language queries into Structured Query Language (SQL). It addresses the difficulty non-professional users have in querying and using databases in big-data-oriented open data sharing applications, as well as the semantic gap between the ambiguity and abstraction of natural language and the precision and determinacy of structured query language. The method analyzes the natural language query description input by the user, establishes the correspondence between semantic units and database tables and fields, generates a basic query, and adds function operations on that basis to obtain the final query. By combining a traditional natural language query interface with interactive querying, it allows ordinary users to query the database through natural language descriptions while capturing the user's query intention more accurately. The defined interaction functions and result feedback mechanism improve the accuracy and efficiency of complex query conversion.

Drawings

FIG. 1 is a DSG diagram of a database;

FIG. 2 is the pattern graph corresponding to query Q;

FIG. 3 is the template of the basic query;

FIG. 4 is an example of a progressive query;

FIG. 5 is a schematic diagram of progressive query results.

Detailed Description

The invention will be further illustrated with reference to specific examples.

This embodiment provides an interactive natural language query conversion method. The method first analyzes the natural language query description input by the user, establishes the correspondence between semantic units and database tables and fields, and generates a basic query. On the basis of the basic query, user interaction is then introduced: the user inputs an interaction function, and the system automatically generates a new query and returns the query result to the user. The user may update the query according to the query result. The overall steps are as follows:

Step 1, semantic analysis. The natural language query sentence input by the user is segmented into individual words. After the query sentence is parsed by a natural language analysis tool, a semantic dependency tree capable of expressing complex semantic relations is obtained, in which leaf nodes are the words of the sentence and non-leaf nodes represent the semantic dependency relations among the nodes. To ensure that vocabulary specific to the database in use is segmented correctly, an auxiliary corpus is constructed from the database; it contains the specific terms that appear in the database but would otherwise be split into more than one word by the word segmentation tool. For example, a medical data set contains many professional medical terms, such as thyroidectomy, renal failure and glycerol fructose injection. The attribute values of all fields in the data set are analyzed in advance, the fields containing highly specialized terms are extracted, and a dedicated medical dictionary is built as the auxiliary corpus to improve word segmentation accuracy. Take the query statement Q, "query the patients with low hemoglobin index content among hyperthyroidism patients". Once the auxiliary corpus has been established, "hyperthyroidism" is no longer split into the three words "thyroid", "function" and "hyperactivity" but is recognized as a single specialized term. After the semantic analysis of step 1, the example query Q yields: query/hyperthyroidism/patient/medium/hemoglobin/index/content/low/patient.
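The effect of the auxiliary corpus can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: it assumes a general-purpose segmenter has already produced base tokens, and the function name `merge_domain_terms`, the token list and the corpus entry are all hypothetical.

```python
def merge_domain_terms(tokens, aux_corpus):
    """Merge runs of base tokens that spell a term listed in the auxiliary corpus.

    aux_corpus maps a tuple of base tokens to the single domain term they form.
    Longer runs are tried first, so longer terms win over shorter ones.
    """
    max_len = max((len(parts) for parts in aux_corpus), default=1)
    merged, i = [], 0
    while i < len(tokens):
        # Try candidate runs of length max_len down to 2 starting at position i.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            parts = tuple(tokens[i:i + n])
            if parts in aux_corpus:
                merged.append(aux_corpus[parts])
                i += n
                break
        else:
            # No domain term starts here: keep the base token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical output of a general-purpose segmenter for query Q.
base_tokens = ["query", "thyroid", "function", "hyperactivity", "patient",
               "medium", "hemoglobin", "index", "content", "low", "patient"]
# Hypothetical auxiliary corpus entry built from the database values.
aux_corpus = {("thyroid", "function", "hyperactivity"): "hyperthyroidism"}

segmented = merge_domain_terms(base_tokens, aux_corpus)
print("/".join(segmented))
```

In practice a segmentation library with user-dictionary support would normally play this role; the greedy merge above only makes the idea concrete.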

Step 2, node mapping. Nodes in the semantic tree are mapped to their corresponding node types, and meaningless segments such as "medium" are discarded. The details are as follows:

Step 2.1: to better understand the semantic tree from the database perspective, the nodes in the semantic tree are mapped to corresponding node types. The node types on the semantic tree are first classified into a selection node SN, an operation node ON, a logic node LN, a function node FN, a sorting node ODN, a grouping node GN, a post-grouping filter node HN, a table name node TN, a field name node AN and a value node VN, so that the nodes correspond to the respective parts of an SQL statement.

Step 2.2, creating a value-attribute inverted index for the value nodes. The values of the fields in the database are acquired and the fields corresponding to each value are recorded to obtain a set of inverted list entries; the entries are sorted in lexicographic order of the values to obtain a first-level inverted index; the first-level inverted index is then classified, and the index is stored in a B+ tree data structure. For example: {hyperthyroidism -> (diagnosis.ICD name, pathology report.main diagnosis, pathology report.outpatient diagnosis), white blood cell -> (test index.index name), 3.8778 -> (test index.test result)}.
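A value-attribute inverted index of this kind might look as follows. This is a sketch under assumptions: a lexicographically sorted list searched with `bisect` stands in for the B+ tree named in the text, the classification of the first-level index is omitted, and the function names and the nested-dictionary input format are hypothetical.

```python
import bisect
from collections import defaultdict


def build_inverted_index(tables):
    """Build a value -> ["table.field", ...] index from {table: {field: [values]}}.

    Returns the lexicographically sorted list of values plus the postings map.
    """
    postings = defaultdict(set)
    for table, fields in tables.items():
        for field, values in fields.items():
            for value in values:
                postings[str(value)].add(f"{table}.{field}")
    keys = sorted(postings)
    return keys, {key: sorted(postings[key]) for key in keys}


def lookup(keys, postings, value):
    """Binary-search the sorted values and return the fields holding `value`."""
    key = str(value)
    pos = bisect.bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return postings[key]
    return []


# Toy field values taken from the example in the text.
tables = {
    "diagnosis": {"ICD name": ["hyperthyroidism"]},
    "pathology report": {
        "main diagnosis": ["hyperthyroidism"],
        "outpatient diagnosis": ["hyperthyroidism"],
    },
    "test index": {"index name": ["white blood cell"], "test result": [3.8778]},
}
keys, postings = build_inverted_index(tables)
print(lookup(keys, postings, "hyperthyroidism"))
```

A sorted array gives the same ordered lookups as a B+ tree for an index that fits in memory; the B+ tree matters once the index has to live on disk or be updated frequently.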

Step 2.3, node type mapping. For each word in the segmentation result, it is first judged whether the word is an enumerated-type node, then whether it is a database table name or attribute name node, and finally, if neither, whether it is a value node. For a value node, the field corresponding to the value is looked up using the value-attribute mapping method based on the inverted index, and a mapping from the value to the attribute is established. The node mapping stage thus determines the node type of each word and the data tables involved in the query. For example, for the segmentation result of query Q in step 1, the following mapping pairs are obtained: (1) query -> Select, (2) hyperthyroidism -> ICD name, (3) hemoglobin -> index name, (4) low -> abnormal prompt, (5) patient -> patient basic information table. Here (1) is an enumerated keyword mapping Value_SN, (2), (3) and (4) are value node mappings Value_VN, and (5) is a table name node mapping Value_TN. From the mapping result, the tables involved in the query are determined: the patient basic information table, the diagnosis table and the test index table.
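The order of checks in step 2.3 can be illustrated with a short sketch. The lookup dictionaries below are hypothetical stand-ins for the enumerated keyword list, the database catalogue and the inverted index, and treating "index" and "content" as discarded segments is an assumption made only for this example.

```python
# Hypothetical enumerated keywords: word -> (node type, SQL counterpart).
ENUM_KEYWORDS = {
    "query": ("SN", "Select"),
    "and": ("LN", "And"),
    "or": ("LN", "Or"),
}


def map_node(word, table_names, field_names, value_index):
    """Return (node_type, target) for a word, or None if the word is discarded.

    The checks run in the order given in the text: enumerated keyword,
    then table name or attribute name, then value.
    """
    if word in ENUM_KEYWORDS:
        return ENUM_KEYWORDS[word]
    if word in table_names:
        return ("TN", table_names[word])
    if word in field_names:
        return ("AN", field_names[word])
    if word in value_index:
        return ("VN", value_index[word])
    return None


# Toy catalogue and value index for the example query Q.
table_names = {"patient": "patient basic information"}
field_names = {}
value_index = {
    "hyperthyroidism": "ICD name",
    "hemoglobin": "index name",
    "low": "abnormal prompt",
}

words = "query/hyperthyroidism/patient/medium/hemoglobin/index/content/low/patient".split("/")
mapping = {}
for word in words:
    node = map_node(word, table_names, field_names, value_index)
    if node is not None:
        mapping.setdefault(word, node)  # keep the first mapping of a repeated word

for word, node in mapping.items():
    print(word, "->", node)
```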

Step 3, pattern matching. According to the pre-stored database structure, relations are established among the mapped nodes and the Join relations among the tables are found. The details are as follows:

Step 3.1, extracting and storing the structure of the database, and defining a database schema graph (DSG) to represent the primary key and fields owned by each table and the primary-foreign key relations between the tables. The DSG can be expressed as $DSG = (V, E)$, where $V$, the set of database concepts, consists of two parts, the table name nodes $V_T$ and the attribute nodes $V_A$. Likewise, $E$ represents the relationships between concepts and includes two types, membership edges $E_M$ and primary-foreign key relationship edges $E_{FK}$. An edge in $E_M$ points from a table name in $V_T$ to a field in $V_A$ contained in that table; an edge in $E_{FK}$ points from a field in $V_A$ to a table name in $V_T$, indicating that a primary-foreign key relationship exists between the two tables and that the field is the foreign key of the pointed table. FIG. 1 shows a simple database DSG (to keep the figure clear, not all fields contained in each table are shown).
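One possible in-memory representation of such a schema graph is sketched below. The class layout, method names and the two toy tables are assumptions made for illustration; the text only specifies the two node sets and the two edge types.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaGraph:
    """Database schema graph: table and attribute nodes plus two edge types."""
    tables: set = field(default_factory=set)            # table name nodes
    attributes: set = field(default_factory=set)        # attribute nodes, "table.field"
    membership: dict = field(default_factory=dict)      # (table, attribute) -> weight
    foreign_keys: dict = field(default_factory=dict)    # (attribute, table) -> weight

    def add_table(self, table, field_names):
        """Add a table node, its attribute nodes and the membership edges."""
        self.tables.add(table)
        for name in field_names:
            attribute = f"{table}.{name}"
            self.attributes.add(attribute)
            self.membership[(table, attribute)] = 0  # default weight 0

    def add_foreign_key(self, attribute, target_table):
        """Add a primary-foreign key edge from a field to the table it references."""
        if attribute not in self.attributes or target_table not in self.tables:
            raise ValueError("unknown attribute or table")
        self.foreign_keys[(attribute, target_table)] = 0  # default weight 0


# Toy schema with hypothetical fields.
dsg = SchemaGraph()
dsg.add_table("patient basic information", ["medical card number", "name"])
dsg.add_table("visit", ["medical card number", "ICD name"])
dsg.add_foreign_key("visit.medical card number", "patient basic information")
print(len(dsg.membership), len(dsg.foreign_keys))
```

Storing a weight on every edge keeps the structure ready for the next step, where edges used by a particular query are switched from 0 to 1.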

Step 3.2, obtaining the pattern graph $G_Q$ of the current query. By default the weight of each edge of the original DSG is 0. For the set of node mapping pairs of a query: when a field appears, the weight of the membership edge pointing to that field is set to 1; when two or more tables appear, the weight of the primary-foreign key relationship edges between the tables is set to 1; if connecting two tables requires a third table as an intermediary, the corresponding connecting edges of the intermediary table are also set to 1. The pattern graph $G_Q$ corresponding to the current query is obtained from the edges with weight 1, and the basic query is generated based on this graph. For example, the pattern graph $G_Q$ corresponding to query Q in step 1 is shown in FIG. 2.
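The handling of intermediary tables can be sketched with a breadth-first search over the primary-foreign key links. This is an assumed realization: the text does not say how the intermediary is found, the sketch covers only the key edges (membership edges are left out), and the function names and toy links are hypothetical.

```python
from collections import deque


def join_path(fk_edges, start, goal):
    """Shortest chain of tables from start to goal over undirected key links."""
    neighbours = {}
    for left, right in fk_edges:
        neighbours.setdefault(left, set()).add(right)
        neighbours.setdefault(right, set()).add(left)
    previous, queue = {start: None}, deque([start])
    while queue:
        table = queue.popleft()
        if table == goal:
            path = []
            while table is not None:
                path.append(table)
                table = previous[table]
            return path[::-1]
        for nxt in sorted(neighbours.get(table, ())):
            if nxt not in previous:
                previous[nxt] = table
                queue.append(nxt)
    return None  # the two tables cannot be joined


def active_key_edges(fk_edges, mentioned_tables):
    """Set weight 1 on every key edge needed to connect the mentioned tables."""
    weights = {frozenset(edge): 0 for edge in fk_edges}  # default weight 0
    tables = sorted(mentioned_tables)
    for i, left in enumerate(tables):
        for right in tables[i + 1:]:
            path = join_path(fk_edges, left, right) or []
            for a, b in zip(path, path[1:]):
                weights[frozenset((a, b))] = 1
    return {edge for edge, weight in weights.items() if weight == 1}


# Toy key links; "examination report" acts as the intermediary table.
fk_edges = [
    ("patient basic information", "visit"),
    ("visit", "examination report"),
    ("examination report", "test index"),
]
mentioned = {"patient basic information", "visit", "test index"}
active = active_key_edges(fk_edges, mentioned)
print(sorted(sorted(edge) for edge in active))
```

Although the mapping pairs mention only three tables, the edges through the intermediary table are activated as well, which is why four tables appear in the From clause of the basic query.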

Step 4, generating the basic query. The template T_bq of the basic query is shown in FIG. 3. According to the pattern graph $G_Q$ generated in step 3, the tables involved in the query can be determined and placed after the From clause, while in the Where clause the Join relations among the tables can be obtained from the primary-foreign key edges in $G_Q$, and the basic query is generated.

For example, from the basic query template and the pattern graph $G_Q$ generated for query statement Q in step 3, the corresponding basic query is obtained:

Select *

From patient table basic information, visit, examination report, test index

Where patient table basic information.medical card number = visit.medical card number

And, treating the patient, and reporting the patient

And examination report.report number = test index.report number
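Filling the basic query template from the pattern graph could be sketched as below. The function name and all table and column identifiers (`patient_info`, `card_no`, `visit_no` and so on) are hypothetical placeholders, not the names used in the patent's data set.

```python
def basic_query(tables, join_edges):
    """Fill the basic query template.

    tables: table names for the From clause.
    join_edges: (left_column, right_column) pairs, one per active
    primary-foreign key edge, for the Where clause.
    """
    lines = ["Select *", "From " + ", ".join(tables)]
    for position, (left, right) in enumerate(join_edges):
        keyword = "Where" if position == 0 else "And"
        lines.append(f"{keyword} {left} = {right}")
    return "\n".join(lines)


# Hypothetical tables and join columns.
tables = ["patient_info", "visit", "exam_report"]
join_edges = [
    ("patient_info.card_no", "visit.card_no"),
    ("visit.visit_no", "exam_report.visit_no"),
]
sql = basic_query(tables, join_edges)
print(sql)
```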

Step 5, predefining function operations. According to the syntax of SQL, five types of function operation are defined: the field selection operation, the filter condition operation, the grouping operation, the post-grouping filter operation and the sorting operation. The details are as follows:

step 5.1, defining a selection field operation: select (F||Func (F), pre_Q)

On the basis of the Pre-query pre_q (i.e., an add operation on the basis of this query), a selected field F or an aggregate operation result Func (F) on the field F is added, which adds the selected field to the "Select" word, defining a grammar rule as shown in formula 4-1.

SeleClause＝SN+AN|SN+AN+FN|SN+FN+AN (4-1)

According to the aggregate functions of SQL, the Func(F) operations are: the summing function SUM(), the counting function COUNT(), the maximum function MAX(), the minimum function MIN() and the averaging function AVG(). For example, if the user inputs "select the age of the patient", the clause "Select patient basic information.age" is obtained from the word segmentation and node mapping results.
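As an illustration of rule (4-1), the sketch below matches a sequence of typed nodes against the three SeleClause alternatives and applies the result to a preceding query held as text. The function names are hypothetical, and replacing "Select *" by the first selected field is an assumption inferred from the final query of the worked example.

```python
AGGREGATES = {"SUM", "COUNT", "MAX", "MIN", "AVG"}


def select_expression(nodes):
    """Turn (node_type, value) pairs matching rule (4-1) into a select item."""
    types = [node_type for node_type, _ in nodes]
    values = [value for _, value in nodes]
    if types == ["SN", "AN"]:
        return values[1]
    if types in (["SN", "AN", "FN"], ["SN", "FN", "AN"]):
        func = values[types.index("FN")].upper()
        if func not in AGGREGATES:
            raise ValueError("unsupported aggregate function")
        column = values[types.index("AN")]
        return f"{func}({column})"
    raise ValueError("node sequence does not match SeleClause")


def apply_select(pre_q, expression):
    """Select(F || Func(F), Pre_Q): replace '*' or extend the select list."""
    head, separator, rest = pre_q.partition("\n")
    if head.strip() in ("Select *", "Select*"):
        head = f"Select {expression}"
    else:
        head = f"{head}, {expression}"
    return head + separator + rest


pre_q = "Select *\nFrom patient basic information"
q1 = apply_select(pre_q, select_expression(
    [("SN", "select"), ("AN", "patient basic information.age")]))
q2 = apply_select(q1, select_expression(
    [("SN", "select"), ("FN", "avg"), ("AN", "patient basic information.age")]))
print(q2)
```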

Step 5.2, defining screening condition operation: filter (FOP value F OP Sub Q, pre Q)

The Where operation is divided into an explicit screening condition and a nested Sub-query condition, OP represents an operation symbol, and sub_Q represents a nested Sub-query. Grammar rules are defined as shown in formula 4-2.

CondClause＝AN+ON+VN|ON+VN|CondClause+LN+CondClause|AN+ON+Sub_Q (4-2)

If no Where clause exists in Pre_Q, a Where clause with the newly obtained filter condition is added; if a Where clause already exists in Pre_Q, the newly obtained condition is appended directly after the existing conditions, and the logic word determines whether the conditions are joined by "and" or "or". For example, if the user inputs "patients whose operation name is bilateral total thyroidectomy", the clause "Where operation." may be obtained.
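The Where-clause logic just described can be sketched in a few lines. This is a simplified illustration on a query held as text: the function name is hypothetical, the Where detection is a plain keyword search (so a nested sub-query or a string value containing the word "where" would confuse it), and the new condition is always appended at the end, which is only valid while Pre_Q has no Group by or Order by clause.

```python
import re


def apply_filter(pre_q, condition, logic="And"):
    """Filter(F OP value, Pre_Q): add a Where clause or extend the existing one."""
    if logic not in ("And", "Or"):
        raise ValueError("logic must be 'And' or 'Or'")
    if re.search(r"\bwhere\b", pre_q, flags=re.IGNORECASE):
        # A Where clause already exists: append after the existing conditions.
        return f"{pre_q}\n{logic} {condition}"
    # No Where clause yet: create one with the new condition.
    return f"{pre_q}\nWhere {condition}"


pre_q = "Select *\nFrom test index"
q1 = apply_filter(pre_q, "test index.index name = 'hemoglobin'")
q2 = apply_filter(q1, "test index.abnormal prompt = 'low'", logic="And")
print(q2)
```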

Step 5.3, defining grouping operation: group by (F, pre_Q)

On the basis of Pre_Q, adding grouping operation, generating a Group by clause, if the Group by clause does not exist in the Pre_Q, adding the Group by () clause, otherwise, directly adding a new field into the original Group by clause. Grammar rules are defined as shown in formulas 4-3.

GroClause＝AN+GN (4-3)

For example, if the user inputs "group by department", a "Group by" clause is obtained.

Step 5.4, defining the post-grouping filter operation: Having(Func(F) OP value, Pre_Q)

On the basis of Pre_Q, a post-grouping filter operation is added, and the resulting Having clause is appended after the Group by clause. Since a Having clause usually contains an aggregate function and value is a specific number NUM, the grammar rule is defined as shown in formula (4-4).

HavingClause＝HN+FN+AN+ON+NUM (4-4)

For example, if the user inputs "patients with more than 3 medications", the clause "Having count(medication name) > 3" may be obtained and appended after the Group by clause.
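The grouping and post-grouping filter operations can be sketched on a small dictionary representation of the query. The representation, the function names and the `medication` table are hypothetical, and rejecting a Having condition when no Group by clause exists is a design choice of this sketch rather than something the text prescribes.

```python
def apply_group_by(query, column):
    """Groupby(F, Pre_Q): create the Group by list or extend the existing one."""
    new_query = dict(query)
    new_query["group_by"] = list(query.get("group_by", [])) + [column]
    return new_query


def apply_having(query, func, column, op, num):
    """Having(Func(F) OP value, Pre_Q): the condition follows the Group by clause."""
    if not query.get("group_by"):
        raise ValueError("this sketch requires an existing Group by clause")
    new_query = dict(query)
    condition = f"{func.upper()}({column}) {op} {num}"
    new_query["having"] = list(query.get("having", [])) + [condition]
    return new_query


def render(query):
    """Render the dictionary form as SQL text, one clause per line."""
    parts = ["Select " + ", ".join(query["select"]),
             "From " + ", ".join(query["tables"])]
    if query.get("group_by"):
        parts.append("Group by " + ", ".join(query["group_by"]))
    if query.get("having"):
        parts.append("Having " + " And ".join(query["having"]))
    return "\n".join(parts)


# Hypothetical preceding query over a medication table.
pre_q = {"select": ["medication.medical card number"], "tables": ["medication"]}
grouped = apply_group_by(pre_q, "medication.medical card number")
filtered = apply_having(grouped, "count", "medication.medication name", ">", 3)
print(render(filtered))
```

Each operation returns a new dictionary instead of modifying Pre_Q, so every intermediate query remains available for the result interaction of step 7.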

Step 5.5, defining the sorting operation: Orderby(F || Func(F), Pre_Q)

For the sorting operation in SQL grammar, a sorting function is defined. It is triggered when the interactive operation input by the user asks for the selected results to be sorted by a certain field, and the sorting clause is placed at the end of the whole SQL statement. The grammar rule is defined as shown in formula (4-5).

OrderClause＝ODN+AN+ASC|DESC (4-5)

For example, if the user inputs "order by the patient's age in descending order", the clause "Order by patient basic information." is obtained.
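A sketch of the sorting operation on a query held as text follows. The function name is hypothetical, and extending an existing Order by clause with a further sort key is an assumption that mirrors the behaviour described for Group by; the text only states that the clause goes at the end of the statement.

```python
def apply_order_by(pre_q, column, direction="ASC"):
    """Orderby(F, Pre_Q): put the sorting clause at the end of the statement."""
    direction = direction.upper()
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be ASC or DESC")
    lines = pre_q.split("\n")
    if lines[-1].startswith("Order by "):
        # Assumed behaviour: add a further sort key to the existing clause.
        lines[-1] += f", {column} {direction}"
    else:
        lines.append(f"Order by {column} {direction}")
    return "\n".join(lines)


pre_q = "Select *\nFrom patient basic information"
ordered = apply_order_by(pre_q, "patient basic information.age", "desc")
print(ordered)
```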

Step 6, query interaction. The user inputs each additional operation description in natural language, and mapping pairs are obtained through word segmentation and node mapping. The system converts the mapping pairs into a specific function operation according to the defined grammar rules and applies it to the preceding query to obtain an incremental query; the final query is obtained through repeated iteration. For example, for the query sentence "query the names of patients with low hemoglobin content among hyperthyroidism patients", the basic query Pre_Q is obtained through step 1, step 2 and step 3:

“Select *

From patient table basic information, visit, examination report, test index

Where patient table basic information.medical card number = visit.medical card number

And, treating the patient, and reporting the patient

And examination report.report number = test index.report number”

The user then adds operation descriptions through interaction, such as "select the name of the patient", "filter ICD name is hyperthyroidism", "index name is hemoglobin" and "abnormal prompt is low". These yield the selection "Select patient table basic information.name" and the filter conditions "visit.ICD name = 'hyperthyroidism'", "test index.index name = 'hemoglobin'" and "test index.abnormal prompt = 'low'", which are added to the preceding query in turn according to the grammar rules of the operations to obtain the final query:

“Select patient table basic information.name

From patient table basic information, visit, examination report, test index

Where patient table basic information.medical card number = visit.medical card number

And, treating the patient, and reporting the patient

And examination report.report number = test index.report number

And visit.ICD name = 'hyperthyroidism'

And test index.index name = 'hemoglobin'

And test index.abnormal prompt = 'low'”
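The rule-matching part of step 6, deciding which function operation a new input corresponds to, can be sketched by checking the sequence of node types against grammar rules (4-1) to (4-5). The function names are hypothetical, and treating ASC and DESC as node types of their own is an assumption made to keep the sketch uniform.

```python
SELECT_RULES = {("SN", "AN"), ("SN", "AN", "FN"), ("SN", "FN", "AN")}
COND_BASE = {("AN", "ON", "VN"), ("ON", "VN"), ("AN", "ON", "Sub_Q")}
ORDER_RULES = {("ODN", "AN", "ASC"), ("ODN", "AN", "DESC")}


def is_cond_clause(types):
    """Rule (4-2), including the recursive CondClause+LN+CondClause form."""
    if "LN" in types:
        split = types.index("LN")
        return is_cond_clause(types[:split]) and is_cond_clause(types[split + 1:])
    return types in COND_BASE


def classify_operation(node_types):
    """Map a sequence of node types to one of the five function operations."""
    types = tuple(node_types)
    if types in SELECT_RULES:
        return "Select"
    if types == ("AN", "GN"):
        return "Groupby"
    if types == ("HN", "FN", "AN", "ON", "NUM"):
        return "Having"
    if types in ORDER_RULES:
        return "Orderby"
    if is_cond_clause(types):
        return "Filter"
    return None  # no rule matches; the input cannot be converted


# Node type sequences for inputs such as "select the name of the patient"
# and "ICD name is hyperthyroidism".
print(classify_operation(["SN", "AN"]))
print(classify_operation(["AN", "ON", "VN"]))
```

Once the operation type is known, the corresponding operation is applied to the preceding query, and the result becomes the preceding query for the next round of interaction.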

Step 7, result interaction. The system returns the result of each added operation to the user. By inspecting the converted SQL and the result returned when that SQL is executed in the database, the user can judge, to a certain extent, the correctness of the natural language conversion, including whether the converted SQL is syntactically correct and whether it meets the user's semantic requirement. Through interaction and feedback on intermediate query results, the user learns what data the current query returns and performs the next adding operation accordingly, so that the user is assisted in acquiring information progressively.

For example, in the progressive query example of FIG. 4, after the SQL statement for query Q1, "query hyperthyroidism patients with more than 2 examination records", is obtained, it can be executed in the database and the result returned. After a simple analysis of the query result, the user can continue to add operations to the query: Q2 and Q3 in FIG. 4 add the conditions "test index.index name = 'TRAB'" and "test index." respectively on the basis of Q1. FIG. 5 shows part of the results of executing Q1, Q2 and Q3 in the system; through this interaction on results, the user is better assisted in progressively obtaining the desired final query result.
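The feedback loop of step 7 can be sketched with an in-memory SQLite database. Everything here is hypothetical: the single table, its columns and rows, the three queries (in particular the condition added in the third one) and the helper `run_step` are invented for illustration and do not reproduce Q1 to Q3 of FIG. 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_index (patient TEXT, index_name TEXT, abnormal_prompt TEXT);
INSERT INTO test_index VALUES
  ('P1', 'TRAB', 'high'),
  ('P1', 'hemoglobin', 'low'),
  ('P2', 'TRAB', 'normal'),
  ('P3', 'hemoglobin', 'normal');
""")


def run_step(connection, sql):
    """Execute one intermediate query and return (sql, feedback) for the user.

    A syntactically invalid conversion raises sqlite3.Error, which is
    reported to the user in place of result rows.
    """
    try:
        rows = connection.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return sql, f"error: {exc}"
    return sql, rows


# Each query adds one condition to the previous one.
q1 = "SELECT patient, index_name FROM test_index"
q2 = q1 + " WHERE index_name = 'TRAB'"
q3 = q2 + " AND abnormal_prompt = 'high'"
results = [run_step(conn, query)[1] for query in (q1, q2, q3)]
for query, rows in zip((q1, q2, q3), results):
    print(query, "->", len(rows), "rows")
```

Showing both the SQL text and the returned rows after each step lets the user check the syntax and the semantics of the conversion before deciding on the next operation.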

While the invention has been described with respect to preferred embodiments, those skilled in the art will understand that various modifications and additions may be made without departing from the scope of the invention. Equivalent embodiments obtained by those skilled in the art through changes, modifications or variations made in light of the above disclosure, without departing from the spirit and scope of the invention, as well as any equivalent changes, modifications and variations of the above embodiments made according to the essential technology of the invention, still fall within the scope of the technical solution of the invention.

Claims

1. An interactive natural language query conversion method is characterized by comprising the following steps:

step 1, semantic analysis;

dividing a natural language query sentence input by a user into individual words, wherein the words comprise words in the sentence and meaningless words, analyzing and processing the query sentence by a natural language analysis tool to obtain a semantic dependency tree capable of expressing complex semantic relations, wherein leaf nodes are words in the sentence, and non-leaf nodes represent semantic dependency relations among the nodes;

step 2, node mapping;

step 3, pattern matching;

step 4, generating a basic query from the basic query template T_bq;

Step 5, predefining function operation;

step 6, query interaction;

step 7, result interaction;

the system returns the added condition of each operation to the user, and the user judges the correctness of the natural language conversion by checking the converted SQL and the returned result of the SQL executed in the database, including whether the converted SQL is the correct SQL conforming to the syntax and whether the converted SQL meets the semantic requirement of the user; through interaction and feedback of the intermediate query result, the user can know the data returned by the current query and perform the next adding operation according to the data, and the user is assisted to acquire information in a progressive information acquisition mode;

the step 2 specifically includes:

step 2.2: creating a value-attribute inverted index for the value node;

step 2.3, node type mapping;

2. The interactive natural language query conversion method of claim 1, wherein: in step 1, in order to ensure that the specific vocabulary in the database can be correctly segmented, a corresponding auxiliary corpus is constructed according to the database, and the corpus contains the specific vocabulary which appears in the database but is segmented into more than one word by a word segmentation tool.

3. The interactive natural language query conversion method of claim 1, wherein step 3 specifically comprises:

the DSG is expressed as a graph consisting of a node set and an edge set, wherein the node set represents the concepts of the database and consists of two parts, table name nodes and attribute nodes; likewise, the edge set represents the relationships between concepts and includes two types, membership edges (subordinate relation edges) and primary-foreign key relationship edges; a membership edge points from a table name to a field contained in that table; a primary-foreign key relationship edge points from a field to a table name and indicates that a primary-foreign key relationship exists between the two tables, the field being the foreign key of the pointed-to table;

step 3.2, obtaining the pattern graph of the current query;

by default, the weight of every edge of the original DSG is 0; for the set of node mapping pairs of a query, when a field appears, the weight of the membership (table-to-field) edge pointing to that field is set to 1; when two or more tables appear, the weight of the primary-foreign key relationship edges between those tables is set to 1; if joining two tables requires a third table as an intermediary, the corresponding connecting edges of the intermediary table are also set to 1; the pattern graph corresponding to the current query is obtained from the edges with weight 1, and the basic query is generated from this graph.
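The edge-weighting rule of step 3.2 can be sketched as below. The claim does not state how an intermediary table is found, so this example assumes a breadth-first search for the shortest chain of key edges between each pair of involved tables. The function name, the edge encoding and the sample schema are hypothetical.

```python
from collections import deque
from itertools import combinations


def query_pattern_graph(membership_edges, key_edges, mapped_tables, mapped_fields):
    """Return the DSG edges whose weight is 1 for the current query.

    membership_edges: set of (table, field) pairs, table -> field
    key_edges: set of ((table, field), referenced_table) pairs, field -> table
    """
    # Every edge of the original DSG starts with weight 0.
    weight = {("member", edge): 0 for edge in membership_edges}
    weight.update({("key", edge): 0 for edge in key_edges})

    # A mapped field activates the membership edge pointing to it.
    for table, field in mapped_fields:
        weight[("member", (table, field))] = 1

    tables = set(mapped_tables) | {table for table, _ in mapped_fields}

    # Table-level adjacency derived from the key edges (traversable both ways).
    adjacency = {}
    for edge in key_edges:
        (owner, _), referenced = edge
        adjacency.setdefault(owner, []).append((referenced, edge))
        adjacency.setdefault(referenced, []).append((owner, edge))

    for start, goal in combinations(sorted(tables), 2):
        previous = {start: None}
        queue = deque([start])
        while queue and goal not in previous:
            current = queue.popleft()
            for neighbour, edge in adjacency.get(current, []):
                if neighbour not in previous:
                    previous[neighbour] = (current, edge)
                    queue.append(neighbour)
        # Walk back from the goal; intermediary tables contribute their edges too.
        node = goal
        while previous.get(node) is not None:
            node, edge = previous[node]
            weight[("key", edge)] = 1

    return {key for key, value in weight.items() if value == 1}


# Hypothetical schema: visit links patient and doctor.
membership = {("patient", "name"), ("doctor", "name"), ("visit", "patient_id"), ("visit", "doctor_id")}
keys = {(("visit", "patient_id"), "patient"), (("visit", "doctor_id"), "doctor")}
pattern = query_pattern_graph(membership, keys, [], [("patient", "name"), ("doctor", "name")])
print(sorted(pattern))
```

Here the query mentions only `patient` and `doctor`, yet both key edges of the intermediary table `visit` receive weight 1, which is the case the claim describes.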

4. The interactive natural language query conversion method of claim 1, wherein: in step 4, the template T_bq of the basic query is as follows:

Select *

From table_i, …, table_k // data tables involved in the query

Where table_i.PKey = table_j.PKey

And …

And table_j.PKey = table_k.PKey // Join conditions between the selected tables

according to the query pattern graph generated in step 3, the From clause is determined by the tables involved in the query, while in the Where clause the Join relationships between the tables are obtained from the primary-foreign key edges in the query pattern graph, and a basic query is thereby generated.
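Filling the basic-query template can be sketched in a few lines. The function name `build_basic_query` and the sample table and column names are hypothetical; the join conditions are assumed to have been read off the primary-foreign key edges of the query pattern graph.

```python
def build_basic_query(tables, join_conditions):
    """Fill the template: Select * From <tables> Where <join> And <join> ..."""
    if not tables:
        raise ValueError("at least one table is required")
    sql = "SELECT * FROM " + ", ".join(tables)
    if join_conditions:
        # Each condition is a (left_column, right_column) pair taken from a key edge.
        joins = [f"{left} = {right}" for left, right in join_conditions]
        sql += " WHERE " + " AND ".join(joins)
    return sql


# Hypothetical tables and key columns.
basic_query = build_basic_query(
    ["patient", "visit", "doctor"],
    [("patient.id", "visit.patient_id"), ("visit.doctor_id", "doctor.id")],
)
print(basic_query)
```

A query that involves a single table has no join conditions, so the sketch omits the Where clause in that case.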

5. The interactive natural language query conversion method of claim 1, wherein step 5 specifically comprises:

step 5.1, defining a field selection operation: Select(F || Func(F), Pre_Q);

on the basis of the preceding query Pre_Q, adding the selected field F or the result Func(F) of an aggregate operation on field F, wherein Func(F) is one of the following according to the aggregate functions in SQL: the SUM() summing function, the COUNT() counting function, the MAX() maximum function, the MIN() minimum function, and the AVG() averaging function; a grammar rule is defined as shown in formula (4-1);

SeleClause = SN+AN | SN+AN+FN | SN+FN+AN (4-1)

the Where operation covers two kinds of condition, an explicit filter condition and a nested sub-query condition, where OP denotes an operator and Sub_Q denotes a nested sub-query; a grammar rule is defined as shown in formula (4-2);

CondClause = AN+ON+VN | ON+VN | CondClause+LN+CondClause | AN+ON+Sub_Q (4-2)

step 5.3, defining a grouping operation: Group by(F, Pre_Q);

on the basis of Pre_Q, a grouping operation is added to generate a Group by clause: if Pre_Q contains no Group by clause, a Group by clause is added; otherwise, the new field is appended directly to the existing Group by clause; a grammar rule is defined as shown in formula (4-3):

GroClause = AN+GN (4-3);

on the basis of Pre_Q, a post-grouping filter operation is added, and the resulting Having clause is placed after the Group by clause; since a Having clause usually carries an aggregate function, its value is taken to be a specific number NUM, and a grammar rule is defined as shown in formula (4-4):

HavingClause = HN+FN+AN+ON+NUM (4-4);

step 5.5, defining a sorting operation: Order by(F || Func(F), Pre_Q);

for the sorting operation in SQL grammar, a sorting function is defined; when the interactive operation input by the user requests that the selected results be sorted by a certain field, the sorting function is triggered; the sorting clause is placed at the end of the whole SQL statement, and a grammar rule is defined as shown in formula (4-5):

OrderClause = ODN+AN+ASC|DESC (4-5).
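The way each function operation extends the preceding query Pre_Q can be sketched as follows. The `Query` container and its `to_sql` method are hypothetical, the signatures of `where` and `having` are assumptions (this section gives signatures only for Select, Group by and Order by), and multiple conditions are simply joined with AND rather than through an explicit logic node.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Query:
    """Hypothetical container for the clauses of the query built so far (Pre_Q)."""
    tables: tuple
    select: tuple = ("*",)
    where: tuple = ()
    group_by: tuple = ()
    having: tuple = ()
    order_by: tuple = ()

    def to_sql(self):
        parts = ["SELECT " + ", ".join(self.select), "FROM " + ", ".join(self.tables)]
        if self.where:
            parts.append("WHERE " + " AND ".join(self.where))
        if self.group_by:
            parts.append("GROUP BY " + ", ".join(self.group_by))
        if self.having:
            parts.append("HAVING " + " AND ".join(self.having))
        if self.order_by:
            # The sorting clause always comes last in the statement.
            parts.append("ORDER BY " + ", ".join(self.order_by))
        return " ".join(parts)


def select(expr, pre_q):
    # The first explicit field replaces the "*" of the basic query.
    kept = () if pre_q.select == ("*",) else pre_q.select
    return replace(pre_q, select=kept + (expr,))


def where(condition, pre_q):
    return replace(pre_q, where=pre_q.where + (condition,))


def group_by(field_name, pre_q):
    # Creates the clause if absent, otherwise appends to the existing one.
    return replace(pre_q, group_by=pre_q.group_by + (field_name,))


def having(condition, pre_q):
    if not pre_q.group_by:
        raise ValueError("Having requires an existing Group by clause")
    return replace(pre_q, having=pre_q.having + (condition,))


def order_by(expr, pre_q, direction="ASC"):
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be ASC or DESC")
    return replace(pre_q, order_by=pre_q.order_by + (f"{expr} {direction}",))


# Hypothetical basic query, then one interactive operation at a time.
q = Query(tables=("patient", "visit"), where=("patient.id = visit.patient_id",))
q = select("patient.city", q)
q = select("COUNT(visit.id)", q)
q = where("visit.year = 2018", q)
q = group_by("patient.city", q)
q = having("COUNT(visit.id) > 10", q)
q = order_by("COUNT(visit.id)", q, "DESC")
final_sql = q.to_sql()
print(final_sql)
```

Because every operation returns a new query object, the intermediate SQL after each step remains available for display and execution, which fits the progressive interaction described in claim 1.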