CN112765201A - Method and device for analyzing SQL (structured query language) statement into specific field query statement - Google Patents

Method and device for analyzing SQL (structured query language) statement into specific field query statement

Info

Publication number
CN112765201A
Authority
CN
China
Prior art keywords
statement
query
sql
field
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110140201.1A
Other languages
Chinese (zh)
Inventor
刘煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Sipuleng Technology Co Ltd
Wuhan Sipuling Technology Co Ltd
Original Assignee
Wuhan Sipuling Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Sipuling Technology Co Ltd filed Critical Wuhan Sipuling Technology Co Ltd
Priority to CN202110140201.1A priority Critical patent/CN112765201A/en
Publication of CN112765201A publication Critical patent/CN112765201A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and a device for parsing an SQL statement into a domain-specific query statement. The method comprises: acquiring an SQL statement; traversing the SQL statement according to preset query words and determining the key fields corresponding to the domain-specific query statement; determining the statement structure of the domain-specific query statement according to the key fields; inputting the SQL statement into a fully trained encoding-decoding model and outputting the rule characters corresponding to the filter conditions in the SQL statement; and determining the domain-specific query statement corresponding to the SQL statement according to the statement structure and the rule characters. The method abstracts the SQL syntax structure, uses the query words in it to determine the statement structure of the domain-specific query statement, and then encodes and decodes the SQL statement with the encoding-decoding model to determine the rule characters and fill them into the statement structure. This achieves efficient and intelligent conversion from SQL statements to domain-specific query statements and reduces the development cost for technical personnel.

Description

Method and device for analyzing SQL (structured query language) statement into specific field query statement
Technical Field
The invention relates to the technical field of statement data analysis, in particular to a method and a device for analyzing SQL statements into query statements in a specific field.
Background
At present, data of all kinds is growing rapidly, which places higher demands on data retrieval. Users and developers can make more effective use of data through flexible query methods such as full-text retrieval and fuzzy queries, so the query speed required for full-text retrieval and fuzzy queries over massive data keeps rising. Traditional databases increasingly fail to meet the performance requirements of full-text retrieval, and domain-specific query statements have become a means of achieving efficient full-text retrieval and fuzzy queries. The query statement of Elasticsearch (a Lucene-based search server, abbreviated as es) stands out among them. However, the es-specific query language (abbreviated as es-dsl) has a learning cost and differs considerably from the commonly used SQL query statement.
In the prior art, SQL statements are parsed by hand-writing a parser. Because the rules are defined manually, development is difficult and slow, grammar coverage is incomplete, and full support and high accuracy are hard to achieve. SQL statements have also been converted into es-dsl using natural language processing algorithms. However, traditional natural language processing algorithms give the same weight to context and semantics, so model performance is poor and the actual conversion success rate and accuracy are low, and an end-to-end training approach cannot guarantee the correctness of the converted es-dsl. In summary, how to parse SQL statements quickly and accurately is an urgent problem to be solved.
Disclosure of Invention
In view of the above, there is a need to provide a method for parsing an SQL statement into a domain-specific query statement, so as to solve the problem of how to parse the SQL statement quickly and accurately.
The invention provides a method for analyzing SQL sentences into query sentences in a specific field, which comprises the following steps:
acquiring an SQL statement;
traversing the SQL sentences according to preset query words, and determining key fields corresponding to the query sentences in the specific field;
determining a statement structure of the specific field query statement according to the key field;
inputting the SQL statement into a coding-decoding model which is completely trained, and outputting a rule character corresponding to a filtering condition in the SQL statement;
and determining the specific field query statement corresponding to the SQL statement according to the statement structure and the rule characters.
Further, the query words include a select word and a from word, the key fields include a source field, and traversing the SQL statement according to the preset query words and determining the key fields corresponding to the domain-specific query statement includes:
traversing the SQL statement according to the select word and the from word;
determining whether a first field between the select word and the from word is empty;
if it is empty, the domain-specific query statement has no source field;
if it is not empty, determining the source field corresponding to the domain-specific query statement according to the first field.
Further, the determining the key field corresponding to the query statement in the specific field further includes:
determining a first word in the SQL statement after the from word;
and taking the first word as the table name field corresponding to the specific field query statement, and writing the table name field into a corresponding query URL.
Further, the query words include a where word, an and word, and an or word, the key fields include a filter field, and traversing the SQL statement according to the preset query words and determining the key fields corresponding to the domain-specific query statement further includes:
traversing the SQL statement according to the where word, the and word, and the or word;
determining whether the where word exists in the SQL statement;
if it does not exist, the filter field is empty;
if it exists, converting the where word into an identification word, and determining a connector according to the and word and the or word;
and determining the filter field according to the identification word and the connector.
Further, the determining, according to the key field, a statement structure of the domain-specific query statement includes: and determining the statement structure according to the source field, the table name field and the filter field.
Further, the rule character includes a column name, an operator, and a value, and the training process of the encoding-decoding model includes:
acquiring an SQL filtering conditional statement training set containing marking information, wherein the marking information comprises an actual column name, an actual operator and an actual numerical value corresponding to the SQL filtering conditional statement;
inputting the SQL filtering conditional statement training set into the coding-decoding model, and determining a prediction column name, a prediction operator and a prediction numerical value corresponding to the SQL filtering conditional statement;
determining a model accuracy according to an error between the actual column name and the predicted column name, an error between the actual operator and the predicted operator, and an error between the actual value and the predicted value;
and adjusting parameters of the coding-decoding model according to the model accuracy until a threshold condition is met, and finishing the training of the coding-decoding model.
Further, the encoding-decoding model sequentially includes a condition field encoding layer and a condition field decoding layer, wherein:
the condition field coding layer comprises a BERT network layer and is used for coding each character in the SQL filtering condition statement and determining a corresponding coding vector;
the condition field decoding layer comprises a residual network layer, a softmax classification layer, and an output layer, wherein the residual network layer is used for combining the features of any N consecutive encoding vectors and determining the feature vectors corresponding to the N encoding vectors; the softmax classification layer is used for determining, from the feature vectors, the probability that the corresponding character belongs to each category, and taking the category with the maximum probability as the category of the character, the categories including column name, operator, and value; the output layer is used for filling characters categorized as column names or values into the statement structure directly, and filling characters categorized as operators into the statement structure after translating them.
Further, the determining, according to the statement structure and the rule characters, the specific field query statement corresponding to the SQL statement includes: and sequentially filling the rule characters into the sentence structure according to the categories of the rule characters to form the complete specific field query sentence.
Further, the method for parsing the SQL statement into the query statement in the specific field further includes:
if the user judges that the finally output specific field query statement does not meet the accuracy requirement, revising the specific field query statement;
updating the coding-decoding model according to the revised domain-specific query statement.
The invention also provides a device for analyzing the SQL sentence into the query sentence in the specific field, which comprises a processor and a memory, wherein the memory is stored with a computer program, and when the computer program is executed by the processor, the method for analyzing the SQL sentence into the query sentence in the specific field is realized.
Compared with the prior art, the invention has the following beneficial effects. First, an SQL statement is acquired. The SQL statement is then traversed, and the key fields corresponding to the domain-specific query statement are determined through structural analysis of the SQL statement. Next, the key fields are used to determine the statement structure of the domain-specific query statement: by analyzing the structure of the SQL statement, that structure is converted efficiently and accurately into the corresponding structure of the domain-specific query statement, so that the two structures remain consistent. The filter conditions in the SQL statement are then encoded and decoded by the encoding-decoding model, which determines the corresponding rule characters efficiently. Finally, the rule characters are filled into the statement structure, yielding the domain-specific query statement into which the SQL statement is converted. In summary, the invention abstracts the SQL syntax structure, determines the statement structure of the domain-specific query statement from the query words, encodes and decodes the SQL statement with the encoding-decoding model, and determines the rule characters and fills them into the statement structure, thereby achieving efficient and intelligent conversion from SQL statements to domain-specific query statements and reducing the development cost for technical personnel.
Drawings
FIG. 1 is a flow chart of a method for parsing an SQL statement into a specific domain query statement according to the present invention;
FIG. 2 is a first flowchart illustrating a process of determining key fields according to the present invention;
FIG. 3 is a second flowchart illustrating a process of determining key fields according to the present invention;
FIG. 4 is a third schematic flow chart illustrating the determination of key fields according to the present invention;
FIG. 5 is a schematic flow chart of a training process provided by the present invention;
FIG. 6 is a schematic structural diagram of an encoding-decoding model provided by the present invention;
FIG. 7 is a schematic flow chart of updating the encoding-decoding model provided by the present invention.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
Example 1
The embodiment of the present invention provides a method for parsing an SQL statement into a query statement in a specific field, and with reference to fig. 1, fig. 1 is a schematic flow diagram of the method for parsing an SQL statement into a query statement in a specific field according to the present invention, where the method for parsing an SQL statement into a query statement in a specific field includes steps S1 to S5, where:
in step S1, an SQL statement is acquired;
in step S2, traversing the SQL statement according to a preset query word, and determining a key field corresponding to the query statement in the specific field;
in step S3, determining a sentence structure of the domain-specific query sentence according to the key field;
in step S4, the SQL statement is input into the fully trained coding-decoding model, and the rule characters corresponding to the filtering condition in the SQL statement are output;
in step S5, a domain-specific query statement corresponding to the SQL statement is determined based on the statement structure and the rule characters.
In the embodiment of the invention, an SQL statement is first acquired. The SQL statement is then traversed, and the key fields corresponding to the domain-specific query statement are determined through structural analysis of the SQL statement. Next, the key fields are used to determine the statement structure of the domain-specific query statement: by analyzing the structure of the SQL statement, that structure is converted efficiently and accurately into the corresponding structure of the domain-specific query statement, so that the two structures remain consistent. The filter conditions in the SQL statement are then encoded and decoded by the encoding-decoding model, which determines the corresponding rule characters efficiently. Finally, the rule characters are filled into the statement structure, yielding the domain-specific query statement into which the SQL statement is converted. It should be noted that the domain-specific query statement provided by the present invention is an es-dsl statement.
Preferably, referring to fig. 2, fig. 2 is a first schematic flow chart illustrating the determining of the key field according to the present invention, and the step S2 includes steps S21 to S24, where:
in step S21, traversing the SQL statement according to the select word and the from word;
in step S22, it is determined whether the first field between the select word and the from word is empty;
in step S23, if it is empty, the specific domain query statement does not have a source field;
in step S24, if not, a source field corresponding to the domain-specific query statement is determined according to the first field.
As a specific embodiment, the embodiment of the invention determines the field information between the select word and the from word by traversing the SQL statement, and deduces the basic structure of the es-dsl by analyzing the field and the structure of the SQL statement.
In a specific embodiment of the present invention, the inference process of the source field is as follows:
1. if the SQL statement is select from, so that the field between the select word and the from word is "" (empty), the domain-specific query statement has no source field;
2. if the SQL statement is select src_ip, dst_ip from, so that the fields between the select word and the from word are src_ip and dst_ip, the source field of the domain-specific query statement is: "_source": ["src_ip", "dst_ip"].
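The source-field inference above can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the function name infer_source is hypothetical, and treating a bare "*" the same as an empty field is an assumption made here.

```python
import re


def infer_source(sql: str):
    """Return the list for the "_source" field, or None when there is no source field."""
    # Capture whatever sits between the select word and the from word.
    match = re.search(r"\bselect\b(.*?)\bfrom\b", sql, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("statement has no select ... from pair")
    first_field = match.group(1).strip()
    # Assumption: an empty field and a bare "*" both mean "no source field".
    if first_field in ("", "*"):
        return None
    return [column.strip() for column in first_field.split(",") if column.strip()]


print(infer_source("select src_ip, dst_ip from netflow"))
```

Returning None rather than an empty list lets a caller omit the "_source" key entirely, which matches the case where the query statement has no source field.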
Preferably, referring to fig. 3, fig. 3 is a schematic diagram illustrating a second flow of determining the key field according to the present invention, and the step S2 further includes steps S25 to S26, where:
in step S25, the first word after the from word in the SQL statement is determined;
in step S26, the first word is written into the corresponding query URL as the table name field corresponding to the domain-specific query sentence.
As a specific embodiment, the embodiment of the present invention obtains the first word after the from word, i.e., the table name, by traversing the SQL statement. The table name is not written into the es-dsl; instead, it is written into the URL of the es query at query time. In actual use the table name is therefore output as a separate field, so that the table name corresponding to the domain-specific query statement is determined quickly.
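As a rough illustration of steps S25 and S26, the sketch below takes the first word after the from word as the table name and places it in a search URL. The helper names and the base address http://localhost:9200 are assumptions for the example; the /<index>/_search path is the usual Elasticsearch search endpoint, but the source does not state the exact URL layout it uses.

```python
import re


def infer_table_name(sql: str) -> str:
    """Return the first word after the from word, treated as the table name."""
    match = re.search(r"\bfrom\s+([A-Za-z_][\w\-.]*)", sql, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no table name found after the from word")
    return match.group(1)


def build_query_url(base_url: str, table_name: str) -> str:
    """Write the table name into the query URL instead of into the es-dsl body."""
    return f"{base_url.rstrip('/')}/{table_name}/_search"


table = infer_table_name("select src_ip from netflow where dst_port = 1320")
print(build_query_url("http://localhost:9200", table))
```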
Preferably, referring to fig. 4, fig. 4 is a third schematic flowchart of the process of determining the key field provided by the present invention, and the step S2 further includes a step S27 to a step S211, where:
in step S27, traversing the SQL statement according to the where word, the and word, and the or word;
in step S28, it is determined whether there is a where word in the SQL statement;
in step S29, if it does not exist, the filter field is empty;
in step S210, if it exists, the where word is converted into an identification word, and a connector is determined according to the and word and the or word;
in step S211, a filter field is determined based on the identification word and the connector.
As a specific embodiment, the embodiment of the invention identifies the filtering condition in the SQL statement according to the where word, and deduces the structure of the filtering field of the es-dsl through the analysis of the field and the structure of the SQL statement.
It should be noted that everything after the where word in the SQL statement is a filter condition. Filter conditions are highly complex, and there may be several groups of them. Therefore, when traversing the SQL statement, only the filter structure is inferred (the column names, operators, and values in the filter conditions are not inferred), and the filter condition string is extracted at the same time. The column name, operator, and value in a filter condition are abbreviated as the three placeholders col_i, opr_i, and val_i, where the subscript i marks which group of filter conditions the placeholder belongs to. The where condition in SQL corresponds to the query field in es-dsl. So that the es-dsl structure corresponds to the SQL structure, the es abstract query structure unit is designed as follows:
{"bool":{"conn_i":[{"match_phrase":{"col_i":{"opr_i":"val_i"}}}]}}
where conn_i is the connector of this filter condition. The default value of the connector is must; when two conditions are connected by or, the connectors of both filter conditions are changed to should. col_i, opr_i, and val_i are the column name, operator, and value of this filter condition. With this structure, each group of filter conditions in the SQL statement corresponds to one query structure unit, and multiple query structure units are organized in an array. The complete query array is the value of the filter field in {"query":{"bool":{"filter":[...]}}}. Constructing the es-dsl from query structure units allows the parsing to reach high accuracy.
In a specific embodiment of the present invention, the inference process for filtering fields is as follows:
1. when there is no where word in the SQL statement, the query field (i.e., the filter field) is empty. For example: if the SQL statement is select from netflow, the query field of the corresponding es-dsl is: { };
2. when there is only one filter condition in the SQL statement (only the where word, with no and word or or word acting as a condition connector), the inferred query field of the es-dsl contains one es query structure unit, and because there is only one query unit, the conn connector defaults to must;
for example: if the SQL statement is select from netflow where dst_port = 1320, the query field of the corresponding es-dsl is:
{"query":{"bool":{"filter":[{"bool":{"must":[{"match_phrase":{"col1":{"opr1":"val1"}}}]}}]}}}
the column names, operators, and values are abbreviated as placeholders, so that the correspondence among column names, operators, and values does not have to be determined while the filter structure is being parsed, which greatly improves the accuracy of the parsing result;
3. when there are multiple filter conditions in the SQL statement, i.e., there are and words or or words acting as condition connectors: when two conditions are connected by an or word, the conn connectors of both conditions use should, and when two conditions are connected by an and word, the conn connectors of both conditions use must;
for example: if the SQL statement is select from netflow where dst_port = 1320 and src_port = 80;
the query field of es-dsl is:
{"bool":{"filter":[{"bool":{"must":[{"match_phrase":{"col1":{"opr1":"val1"}}}]}},{"bool":{"must":[{"match_phrase":{"col2":{"opr2":"val2"}}}]}}]}};
for example: if the SQL statement is select from netflow where dst_port = 1320 and src_port = 80 or src_ip = 192.168.24.29;
the query field of es-dsl is:
{"bool":{"filter":[{"bool":{"must":[{"match_phrase":{"col1":{"opr1":"val1"}}}]}},{"bool":{"should":[{"match_phrase":{"col2":{"opr2":"val2"}}}]}},{"bool":{"should":[{"match_phrase":{"col3":{"opr3":"val3"}}}]}}]}}.
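The three cases above can be sketched as a small Python function that infers only the filter structure and leaves the column names, operators, and values as placeholders. The function name infer_filter_field is hypothetical, and splitting the condition string on the and/or words with a regular expression is a simplification assumed here: it would misfire on values that themselves contain " and " or " or ".

```python
import re


def infer_filter_field(sql: str) -> dict:
    """Infer the filter structure of the es-dsl, with col_i/opr_i/val_i placeholders."""
    parts = re.split(r"\bwhere\b", sql, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) < 2:
        return {}  # no where word: the filter field is empty
    # The capturing group keeps the connectors between the conditions.
    tokens = re.split(r"\s+(and|or)\s+", parts[1].strip(), flags=re.IGNORECASE)
    conditions = tokens[0::2]
    connectors = [word.lower() for word in tokens[1::2]]
    conn = ["must"] * len(conditions)  # default connector
    for k, word in enumerate(connectors):
        if word == "or":  # both conditions joined by "or" use should
            conn[k] = conn[k + 1] = "should"
    units = [
        {"bool": {c: [{"match_phrase": {f"col{i}": {f"opr{i}": f"val{i}"}}}]}}
        for i, c in enumerate(conn, start=1)
    ]
    return {"bool": {"filter": units}}


sql = "select from netflow where dst_port = 1320 and src_port = 80 or src_ip = 192.168.24.29"
print(infer_filter_field(sql))
```

For the last example statement this produces one must unit followed by two should units, one unit per filter condition.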
preferably, the step S3 specifically includes: and determining a statement structure according to the source field, the table name field and the filter field. As a specific embodiment, the embodiment of the invention acquires the source field, the table name field and the filter field in the SQL statement by traversing the SQL statement, and deduces the basic structure of the es-dsl through the acquired fields and structures.
In a specific embodiment of the present invention, the query body structure of es-dsl is obtained by inference from the query field.
For example: if the SQL statement is select src_ip, dst_ip from netflow where dst_port = 1320 and src_port = 80;
through the determination of the source field, the table name field and the filter field, the sentence structure of the query sentence in the specific field is as follows:
{"query":{"bool":{"filter":[{"bool":{"must":[{"match_phrase":
{"col1":{"opr1":"val1"}}}]}},
{"bool":{"must":[{"match_phrase":{"col2":{"opr2":"val2"}}}]}}]}},
"_source":["src_ip","dst_ip"]};
then, the column name, operator, and value in the filter field need to be identified and written into the corresponding placeholder.
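To make the assembly step concrete, the sketch below combines a source field and a list of placeholder query structure units into the statement structure shown above. The helper names are hypothetical, and the table name is deliberately left out of the body because the text says it is written into the query URL instead.

```python
import json


def placeholder_unit(i: int, connector: str = "must") -> dict:
    """One query structure unit with col_i/opr_i/val_i placeholders."""
    return {"bool": {connector: [{"match_phrase": {f"col{i}": {f"opr{i}": f"val{i}"}}}]}}


def build_statement_structure(source_fields, filter_units) -> dict:
    """Assemble the es-dsl body from the source field and the filter field."""
    body = {}
    if filter_units:
        body["query"] = {"bool": {"filter": list(filter_units)}}
    if source_fields:
        body["_source"] = list(source_fields)
    return body


structure = build_statement_structure(
    ["src_ip", "dst_ip"], [placeholder_unit(1), placeholder_unit(2)]
)
print(json.dumps(structure, separators=(",", ":")))
```

Serialized compactly, the result has the same shape as the example structure, with the placeholders still to be filled by the encoding-decoding model.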
Preferably, referring to fig. 5, fig. 5 is a schematic flowchart of a training process provided by the present invention, where the training process of the coding-decoding model includes steps S001 to S004, where:
in step S001, an SQL filtering conditional statement training set including annotation information is obtained, where the annotation information includes an actual column name, an actual operator, and an actual numerical value corresponding to the SQL filtering conditional statement;
in step S002, the SQL filtering conditional statement training set is input to the coding-decoding model, and the prediction column name, the prediction operator, and the prediction value corresponding to the SQL filtering conditional statement are determined;
in step S003, a model accuracy is determined based on an error between the actual column name and the predicted column name, an error between the actual operator and the predicted operator, and an error between the actual value and the predicted value;
in step S004, parameters of the coding-decoding model are adjusted according to the model accuracy until a threshold condition is satisfied, and training of the coding-decoding model is completed.
As a specific embodiment, the embodiment of the invention performs encoding and decoding training on the filter condition statements of the SQL statements in the training data, so that the model can effectively extract the column names, operators, and values in an SQL statement. It should be noted that an SQL statement often includes multiple filter condition statements (following the where word), each containing different column names, operators, and values, and that directly traversing the SQL statement and extracting the column names, operators, and values by rules has a high error rate. The model is therefore trained to take a filter condition statement as input and extract the column names, operators, and values in it.
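The accuracy check in steps S003 and S004 might look like the following sketch, which compares predicted and labelled column names, operators, and values slot by slot. The dictionary keys, the function names, and the 0.95 threshold are all assumptions for illustration; the source does not specify how the three errors are combined or what threshold is used.

```python
def triple_accuracy(actual, predicted) -> float:
    """Fraction of column-name/operator/value slots that were predicted correctly."""
    total = 0
    correct = 0
    for label, guess in zip(actual, predicted):
        for key in ("column", "operator", "value"):
            total += 1
            correct += label[key] == guess[key]
    return correct / total if total else 0.0


def training_finished(accuracy: float, threshold: float = 0.95) -> bool:
    """Stop adjusting parameters once accuracy exceeds the (assumed) threshold."""
    return accuracy > threshold


actual = [
    {"column": "dst_port", "operator": "=", "value": "1320"},
    {"column": "src_port", "operator": "=", "value": "80"},
]
predicted = [
    {"column": "dst_port", "operator": "=", "value": "1320"},
    {"column": "src_port", "operator": ">", "value": "80"},  # one wrong slot
]
accuracy = triple_accuracy(actual, predicted)
print(accuracy, training_finished(accuracy))
```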
Preferably, referring to fig. 6, fig. 6 is a schematic structural diagram of an encoding-decoding model provided by the present invention, where the encoding-decoding model sequentially includes a condition field encoding layer and a condition field decoding layer, where:
the condition field coding layer comprises a BERT network layer and is used for coding each character in the SQL filtering condition statement and determining a corresponding coding vector;
the condition field decoding layer comprises a residual network layer, a softmax classification layer, and an output layer, wherein the residual network layer is used for combining the features of any N consecutive encoding vectors and determining the feature vector corresponding to the Nth encoding vector; the softmax classification layer is used for determining, from the feature vector, the probability that the corresponding character belongs to each category, and taking the category with the maximum probability as the category of the character, the categories including column name, operator, and value; the output layer is used for filling characters categorized as column names or values into the statement structure directly, and filling characters categorized as operators into the statement structure after translating them.
As a specific embodiment, the embodiment of the present invention uses a pre-trained BERT model to encode each character in the SQL filter condition statement into an encoding vector, then decodes with a residual network structure to obtain the feature vector corresponding to the Nth encoding vector, and finally uses a softmax classification layer to determine the category with the maximum probability for each field.
It should be noted that, in the present invention, the training data set and the model training code are deployed on a server, and the training code is executed to perform training. After each round of training, the model output is compared with the labels in the training data and the accuracy is calculated; when the accuracy exceeds the threshold the model is retained, and otherwise training continues. The model is then evaluated on test data with manual inspection, and the accuracy is calculated again; if it is higher than the threshold the model is output, and otherwise training continues.
In a specific embodiment of the invention, a BERT network layer provides the encoding service: a pre-trained BERT network layer is called to obtain the encoding vector of each character in each SQL filter condition statement. Using a high-performance BERT network layer for vectorization improves the accuracy of parsing SQL statements into es-dsl. For a filter condition statement containing n characters, the BERT model yields a two-dimensional array of size n × 120; each row of the array is the encoding vector of the corresponding character and contains 120 feature values. The encoding vectors are denoted x_1 to x_n.
In a specific embodiment of the present invention, the decoder employs a residual network architecture. The expression of the features of any n consecutive characters is:
x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i)
wherein x islRepresenting the ith character vector, function F is a matrix multiplication operation. WiRepresenting the ith weight matrix. By multiplying n consecutive characters by a weight matrix and summing with the l-th character vector. The characteristics of n continuous characters are more effectively expressed;
The feature x_L is then passed through the softmax classification function, and the index of the value with the maximum probability is taken:
p_L = argmax(softmax(x_L))
where softmax(x_L) gives the probabilities that x_L is a column name, an operator, a value, or none of these (four attributes in total). The attribute with the maximum probability is taken as the result for this feature (argmax returns the index of the maximum element of the array) and is denoted p_L. The attribute value is 0 for "none", 1 for a column name, 2 for an operator, and 3 for a value.
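The classification step can be illustrated with a short sketch. It assumes that the feature has already been reduced to four scores, one per attribute, and it reads the label numbering as 0 = none, 1 = column name, 2 = operator, 3 = value; the names `LABELS`, `softmax`, and `classify` are introduced here for illustration only.

```python
import math
from typing import Dict, List

# Assumed label numbering, as read from the description.
LABELS: Dict[int, str] = {0: "none", 1: "column_name", 2: "operator", 3: "value"}


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores: List[float]) -> int:
    """p_L = argmax(softmax(x_L)): index of the most probable attribute."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)
```

For example, `LABELS[classify([0.0, 0.0, 5.0, 1.0])]` selects the operator attribute, because the third score dominates.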
Through the encoding and decoding operations, the model can effectively extract the column names, operators, and values in the SQL statement. The column names and values can be filled directly into the query structure obtained by structure inference, whereas the operators need to be translated as follows: "=" translates to "query", ">" translates to "gt", and "<" translates to "lt".
The complete es-dsl is obtained through structure inference and filtering condition identification.
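The operator translation can be sketched as a lookup table. The three mappings come from the text; the wrapping of each translated operator in a `match` or `range` clause is an assumption made here to produce well-formed Elasticsearch query DSL, and `translate_operator` and `build_condition` are hypothetical helper names.

```python
from typing import Any, Dict, Union

# Operator translation stated in the text: "=" -> "query", ">" -> "gt", "<" -> "lt".
OPERATOR_MAP = {"=": "query", ">": "gt", "<": "lt"}


def translate_operator(op: str) -> str:
    """Translate an SQL comparison operator into its es-dsl keyword."""
    try:
        return OPERATOR_MAP[op]
    except KeyError:
        raise ValueError(f"unsupported operator: {op!r}") from None


def build_condition(column: str, op: str, value: Union[str, int, float]) -> Dict[str, Any]:
    """Wrap one (column, operator, value) triple as an es-dsl clause.

    Assumption: "query" is placed inside a `match` clause and "gt"/"lt"
    inside a `range` clause; the source text does not specify the wrapper.
    """
    key = translate_operator(op)
    wrapper = "match" if key == "query" else "range"
    return {wrapper: {column: {key: value}}}
```

Under these assumptions, `build_condition("age", ">", 30)` yields a `range` clause on `age`, which can then be placed into the query structure obtained by structure inference.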
Preferably, step S5 specifically includes: filling the rule characters into the statement structure in sequence according to the categories of the rule characters to form a complete domain-specific query statement. As a specific embodiment, the embodiment of the invention combines the statement structure and the rule characters and fills in the statement according to the category to which each rule character belongs, thereby ensuring the accuracy of the domain-specific query statement.
Preferably, referring to Fig. 7, which is a schematic flowchart of updating the encoding-decoding model provided by the present invention, the method for parsing the SQL statement into the domain-specific query statement further includes steps S6 to S7, where:
in step S6, if the user determines that the final output domain-specific query statement does not meet the accuracy requirement, the domain-specific query statement is revised;
in step S7, the encoding-decoding model is updated according to the revised domain-specific query statement.
As a specific embodiment, the embodiment of the invention updates the encoding-decoding model in a timely manner based on the user's manual judgment, so that the model continues to be trained during use and its accuracy keeps improving.
Example 2
The embodiment of the invention provides a device for parsing an SQL (structured query language) statement into a domain-specific query statement. The device comprises a processor and a memory; the memory stores a computer program, and when the computer program is executed by the processor, the method for parsing the SQL statement into the domain-specific query statement described above is implemented.
The invention discloses a method and a device for parsing an SQL (structured query language) statement into a domain-specific query statement. First, the SQL statement is acquired. Then, the SQL statement is traversed to determine the key fields corresponding to the domain-specific query statement; structural analysis of the SQL statement allows the key fields to be determined effectively. Next, the statement structure of the domain-specific query statement is determined from the key fields; by analyzing the structure of the SQL statement, that structure is converted efficiently and accurately into the corresponding statement structure of the domain-specific query statement, which keeps the two structures consistent. Furthermore, the filtering conditions in the SQL statement are encoded and decoded with the encoding-decoding model, so that the corresponding rule characters are determined efficiently and quickly. Finally, the rule characters are filled into the statement structure, accurately yielding the domain-specific query statement into which the SQL statement is converted. It should be noted that the domain-specific query statement provided by the present invention is an es-dsl statement.
According to the technical scheme, the SQL grammar structure is abstracted, the query words in the SQL grammar structure are used to determine the statement structure of the domain-specific query statement, the SQL statement is then encoded and decoded with the encoding-decoding model, and the rule characters are determined and filled into the statement structure. This achieves efficient and intelligent conversion from SQL statements to domain-specific query statements and reduces the development cost for technical personnel.
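The structure-inference part of the scheme (using the query words select, from, where, and, and or to build the statement skeleton) might look like the sketch below. The mapping onto `_source`, the `/{table}/_search` URL, and `bool` with `must`/`should` is an assumption about how the es-dsl skeleton is laid out, and `infer_structure` is a hypothetical name. The sketch assumes whitespace-separated keywords, does not handle mixed and/or conditions, and would split incorrectly on a quoted value that itself contains " and " or " or ".

```python
import re
from typing import Any, Dict, List, Tuple


def infer_structure(sql: str) -> Tuple[str, Dict[str, Any], List[str]]:
    """Return (query URL, es-dsl skeleton, raw filter conditions)."""
    tokens = sql.strip().rstrip(";").split()
    lowered = [t.lower() for t in tokens]
    select_at = lowered.index("select")
    from_at = lowered.index("from")

    body: Dict[str, Any] = {}
    # Fields between `select` and `from`; if empty, no source field is emitted.
    fields_text = " ".join(tokens[select_at + 1:from_at])
    fields = [f.strip() for f in fields_text.split(",") if f.strip()]
    if fields:
        body["_source"] = fields

    # The first word after `from` is the table name, written into the URL.
    url = f"/{tokens[from_at + 1]}/_search"

    conditions: List[str] = []
    if "where" in lowered:
        clause = " ".join(tokens[lowered.index("where") + 1:])
        # A capturing group makes re.split keep the connectors in the result.
        parts = re.split(r"\s+(and|or)\s+", clause, flags=re.IGNORECASE)
        conditions = parts[0::2]
        connectors = {p.lower() for p in parts[1::2]}
        key = "should" if connectors == {"or"} else "must"
        body["query"] = {"bool": {key: []}}
    return url, body, conditions
```

The returned conditions are the raw filtering condition strings; in the scheme described here they would be passed to the encoding-decoding model, whose output is then filled into the empty `must` or `should` list.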
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (10)

1. A method for analyzing SQL sentences into specific field query sentences is characterized by comprising the following steps:
acquiring an SQL statement;
traversing the SQL sentences according to preset query words, and determining key fields corresponding to the query sentences in the specific field;
determining a statement structure of the specific field query statement according to the key field;
inputting the SQL statement into a coding-decoding model which is completely trained, and outputting a rule character corresponding to a filtering condition in the SQL statement;
and determining the specific field query statement corresponding to the SQL statement according to the statement structure and the rule characters.
2. The method for parsing an SQL statement into a domain-specific query statement according to claim 1, wherein the query words include a select word and a from word, the key fields include a source field, and the traversing the SQL statement according to preset query words to determine the key fields corresponding to the domain-specific query statement comprises:
traversing the SQL statement according to the select word and the from word;
determining whether a first field between the select word and the from word is empty;
if the first field is empty, the source field does not exist in the specific field query statement;
and if not, determining the source field corresponding to the specific field query statement according to the first field.
3. The method for parsing an SQL statement into a specific field query statement according to claim 2, wherein the key field includes a table name field, the traversing the SQL statement according to a preset query word, and determining the key field corresponding to the specific field query statement further includes:
determining a first word in the SQL statement after the from word;
and taking the first word as the table name field corresponding to the specific field query statement, and writing the table name field into a corresponding query URL.
4. The method for parsing an SQL statement into a specific field query statement according to claim 3, wherein the query words include a where word, an and word, and an or word, the key fields include a filter field, and the traversing the SQL statement according to preset query words to determine the key fields corresponding to the specific field query statement further includes:
traversing the SQL sentence according to the where word, the and word and the or word;
judging whether the where word is absent from the SQL sentence;
if yes, the filtering field is empty;
if not, converting the where word into an identification word, and determining a connector according to the and word and the or word;
and determining the filtering field according to the identification words and the connectors.
5. The method for parsing an SQL statement into a domain-specific query statement according to claim 4, wherein the determining a statement structure of the domain-specific query statement according to the key field comprises: determining the statement structure according to the source field, the table name field and the filter field.
6. The method for parsing an SQL statement into a domain-specific query statement according to claim 5, wherein the rule characters comprise column names, operators and numerical values, and the training process of the coding-decoding model comprises:
acquiring an SQL filtering conditional statement training set containing marking information, wherein the marking information comprises an actual column name, an actual operator and an actual numerical value corresponding to the SQL filtering conditional statement;
inputting the SQL filtering conditional statement training set into the coding-decoding model, and determining a prediction column name, a prediction operator and a prediction numerical value corresponding to the SQL filtering conditional statement;
determining a model accuracy according to an error between the actual column name and the predicted column name, an error between the actual operator and the predicted operator, and an error between the actual value and the predicted value;
and adjusting parameters of the coding-decoding model according to the model accuracy until a threshold condition is met, and finishing the training of the coding-decoding model.
7. The method for parsing an SQL statement into a domain-specific query statement according to claim 5, wherein the coding-decoding model comprises a conditional field coding layer and a conditional field decoding layer in sequence, wherein:
the condition field coding layer comprises a BERT network layer and is used for coding each character in the SQL filtering condition statement and determining a corresponding coding vector;
the conditional field decoding layer comprises a residual network layer, a softmax classification layer and an output layer, wherein the residual network layer is used for carrying out feature combination on any N consecutive coding vectors and determining the feature vector corresponding to the N coding vectors; the softmax classification layer is used for determining, according to the feature vectors, the classification probabilities of the corresponding characters for the different classes, and taking the class with the maximum classification probability as the class corresponding to the character, wherein the different classes comprise column names, operators and numerical values; the output layer is used for filling the characters classified as column names and numerical values into the statement structure, and filling the characters classified as operators into the statement structure after translation.
8. The method for parsing an SQL statement into a domain-specific query statement according to claim 1, wherein the determining the domain-specific query statement corresponding to the SQL statement according to the statement structure and the rule characters comprises: sequentially filling the rule characters into the statement structure according to the categories of the rule characters to form the complete domain-specific query statement.
9. The method for parsing an SQL statement into a domain-specific query statement according to claim 7, further comprising:
if the user judges that the finally output specific field query statement does not meet the accuracy requirement, revising the specific field query statement;
updating the coding-decoding model according to the revised domain-specific query statement.
10. An apparatus for parsing an SQL statement into a domain-specific query statement, the apparatus comprising a processor and a memory, the memory storing a computer program, the computer program, when executed by the processor, implementing the method for parsing an SQL statement into a domain-specific query statement according to any one of claims 1 to 9.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110140201.1A CN112765201A (en) 2021-02-01 2021-02-01 Method and device for analyzing SQL (structured query language) statement into specific field query statement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110140201.1A CN112765201A (en) 2021-02-01 2021-02-01 Method and device for analyzing SQL (structured query language) statement into specific field query statement

Publications (1)

Publication Number Publication Date
CN112765201A true CN112765201A (en) 2021-05-07

Family

ID=75704581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110140201.1A Pending CN112765201A (en) 2021-02-01 2021-02-01 Method and device for analyzing SQL (structured query language) statement into specific field query statement

Country Status (1)

Country Link
CN (1) CN112765201A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988785A (en) * 2021-05-10 2021-06-18 浙江大学 SQL conversion method and system based on language model coding and multitask decoding
CN114003229A (en) * 2021-09-28 2022-02-01 厦门国际银行股份有限公司 SQL code similarity analysis method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649630A (en) * 2016-12-07 2017-05-10 乐视控股(北京)有限公司 Data query method and device
CN108446289A (en) * 2017-09-26 2018-08-24 北京中安智达科技有限公司 A kind of data retrieval method for supporting heterogeneous database
US20190266271A1 (en) * 2018-02-27 2019-08-29 Elasticsearch B.V. Systems and Methods for Converting and Resolving Structured Queries as Search Queries
CN110968582A (en) * 2019-11-01 2020-04-07 苏宁云计算有限公司 Crowd generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210507
