CN111159220B - Method and apparatus for outputting structured query statement - Google Patents

Method and apparatus for outputting structured query statement

Info

Publication number
CN111159220B
Authority
CN
China
Prior art keywords
natural language
templated
structured query
sentence
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911412056.7A
Other languages
Chinese (zh)
Other versions
CN111159220A (en)
Inventor
王丽杰
杨春杰
孙珂
李婷婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201911412056.7A
Publication of CN111159220A
Application granted
Publication of CN111159220B
Legal status: Active, Current
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation
    • G06F16/24522 Translation of natural language queries to structured queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present application disclose a method and an apparatus for outputting a structured query statement. One embodiment of the method comprises: acquiring a natural language sentence to be converted; inputting the natural language sentence into a pre-trained first model to generate templated natural language corresponding to the natural language sentence; and outputting a structured query statement corresponding to the generated templated natural language according to the correspondence between templated natural language and structured query statements. This embodiment improves the accuracy of the output structured query statements.

Description

Method and apparatus for outputting structured query statement
Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular, to a method and apparatus for outputting a structured query statement.
Background
In the information age, databases are increasingly widely used as carriers of information. To query data from a database, one must master the standard Structured Query Language (SQL). However, SQL and its various grammars are difficult for many non-professionals to learn, so a method of converting natural language into structured query statements is urgently needed to allow non-professionals to use databases.
Existing methods for generating structured query statements usually either write the structured query statement manually or input a natural language query into a pre-trained machine learning model to obtain the structured query statement output by the model.
Disclosure of Invention
Embodiments of the present application provide a method and an apparatus for outputting a structured query statement.
In a first aspect, some embodiments of the present application provide a method for outputting a structured query statement, the method comprising: acquiring a natural language sentence to be converted; inputting the natural language sentence into a pre-trained first model to generate templated natural language corresponding to the natural language sentence; and outputting a structured query statement corresponding to the generated templated natural language according to the correspondence between templated natural language and structured query statements.
In some embodiments, the first model is a model trained by: acquiring a sample set, wherein the sample set comprises sample natural language sentences and sample templated natural language corresponding to the sample natural language sentences; and jointly training the first model and a second model based on the sample set, wherein the second model is used for characterizing the correspondence between templated natural language and natural language sentences.
In some embodiments, jointly training the first model and the second model based on the sample set includes: taking the sample natural language sentences as input and the sample templated natural language corresponding to the sample natural language sentences as output, and training the first model based on a pre-established loss function, wherein the loss function comprises a regularization term added so that the first model and the second model satisfy probabilistic duality.
In some embodiments, the sample set includes sample templated natural language generated by: acquiring key information based on a database associated with the structured query statement to be output, wherein the key information comprises at least one of the following: table names, field names, attributes, aggregation operators and comparison operators of the structured query language; and filling the acquired key information into a predefined natural language template to obtain the sample templated natural language.
In some embodiments, outputting the generated structured query statement corresponding to the templated natural language according to the preset correspondence between templated natural language and structured query statements includes: determining the key information included in the generated templated natural language and the corresponding natural language template; obtaining a structured query statement template corresponding to the determined natural language template according to a pre-established correspondence between natural language templates and structured query statement templates; and filling the determined key information into the obtained structured query statement template to obtain the structured query statement.
In a second aspect, some embodiments of the present application provide an apparatus for outputting a structured query statement, the apparatus comprising: an acquisition unit configured to acquire a natural language sentence to be converted; a generation unit configured to input the natural language sentence into a pre-trained first model and generate templated natural language corresponding to the natural language sentence; and an output unit configured to output a structured query statement corresponding to the generated templated natural language according to the correspondence between templated natural language and structured query statements.
In some embodiments, the apparatus further comprises a training unit, the training unit comprising: a first acquisition subunit configured to acquire a sample set including a sample natural language sentence and a sample templated natural language corresponding to the sample natural language sentence; and a training subunit configured to jointly train a first model and a second model based on the sample set, the second model being used for characterizing the correspondence between the templated natural language and the natural language sentence.
In some embodiments, the training subunit is further configured to: take the sample natural language sentences as input and the sample templated natural language corresponding to the sample natural language sentences as output, and train the first model based on a pre-established loss function, wherein the loss function comprises a regularization term added so that the first model and the second model satisfy probabilistic duality.
In some embodiments, the apparatus further comprises a sample generation unit, the sample generation unit comprising: a second obtaining subunit configured to obtain key information based on a database associated with the structured query statement to be output, the key information including at least one of: table names, field names, attributes, aggregation operators and comparison operators of the structured query language; and a first filling subunit configured to fill the acquired key information into a predefined natural language template to obtain sample templated natural language.
In some embodiments, the output unit includes: a determination subunit configured to determine key information included in the generated templated natural language and a corresponding natural language template; a third obtaining subunit configured to obtain a structured query sentence template corresponding to the determined natural language template according to a corresponding relationship between a pre-established natural language template and the structured query sentence template; and the second filling subunit is configured to fill the determined key information into the obtained structured query statement template to obtain the structured query statement.
In a third aspect, some embodiments of the present application provide an apparatus comprising: one or more processors; and a storage device having one or more programs stored thereon, which when executed by the one or more processors cause the one or more processors to implement the method as described in the first aspect.
In a fourth aspect, some embodiments of the present application provide a computer readable medium having stored thereon a computer program which when executed by a processor implements a method as described in the first aspect.
According to the method and apparatus for outputting a structured query statement provided by the embodiments of the present application, a natural language sentence to be converted is acquired; the natural language sentence is input into a pre-trained first model to generate templated natural language corresponding to the natural language sentence; and a structured query statement corresponding to the generated templated natural language is output according to the preset correspondence between templated natural language and structured query statements, thereby improving the accuracy of the output structured query statements.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings, in which:
FIG. 1 is an exemplary system architecture diagram to which some embodiments of the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a method for outputting a structured query statement according to the present application;
FIG. 3 is a schematic illustration of an application scenario of a method for outputting structured query statements according to the present application;
FIG. 4 is a flow chart of training a first model according to the present application;
FIG. 5 is a structural schematic diagram of one embodiment of an apparatus for outputting structured query statements according to the present application;
fig. 6 is a schematic diagram of a computer system suitable for use in implementing some embodiments of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the method for outputting a structured query statement or the apparatus for outputting a structured query statement of the present application may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various client applications, such as database-like applications, e-commerce-like applications, search-like applications, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with display screens, including but not limited to smartphones, tablets, laptop and desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, for example a background server providing support for applications installed on the terminal devices 101, 102, 103. The server 105 may acquire a natural language sentence to be converted; input the natural language sentence into a pre-trained first model to generate templated natural language corresponding to the natural language sentence; and output a structured query statement corresponding to the generated templated natural language according to the correspondence between templated natural language and structured query statements.
It should be noted that, the method for outputting the structured query sentence provided in the embodiment of the present application may be executed by the server 105, or may be executed by the terminal devices 101, 102, and 103, and accordingly, the device for outputting the structured query sentence may be provided in the server 105, or may be provided in the terminal devices 101, 102, and 103.
The server may be hardware or software. When the server is hardware, the server may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (e.g., to provide distributed services), or as a single software or software module. The present invention is not particularly limited herein.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for outputting a structured query statement is shown, in accordance with the present application. The method for outputting the structured query statement comprises the following steps:
step 201, a natural language sentence to be converted is obtained.
In this embodiment, an execution body of the method for outputting a structured query statement (e.g., the server or a terminal shown in fig. 1) may first acquire a natural language sentence to be converted. The natural language sentence to be converted may originate from natural language information in the form of text, images or speech entered by the user.
Step 202, inputting the natural language sentence into a pre-trained first model, and generating a templated natural language corresponding to the natural language sentence.
In this embodiment, the execution body may input the natural language sentence into a pre-trained first model and generate templated natural language corresponding to the natural language sentence. The templated natural language may be a form intermediate between natural language and SQL statements, expressed in natural language but partially templated. As an example, for the natural language sentence "which students are older than 18 years", the corresponding templated natural language may include "older than 18, give the student's name".
The first model may be used to characterize the correspondence between natural language sentences and templated natural language. The first model may include an encoder-decoder model with or without an attention mechanism, and may also include one or more neural network models. It may use a recurrent neural network (RNN), whose hidden nodes are connected to form a loop, so that the network not only learns the information at the current time step but also depends on the previous sequence information. Because of this special network structure, the RNN addresses the problem of preserving information and has unique advantages for handling time series and language text sequences. Further, one or more RNN variants, such as the Long Short-Term Memory (LSTM) network and the Gated Recurrent Unit (GRU), may be used to compose a sequence-to-sequence model. Furthermore, the first model may be trained alone or jointly with other models.
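As an illustration of such a sequence-to-sequence first model, the following is a minimal sketch assuming PyTorch and a GRU-based encoder-decoder; the class name, architecture and hyperparameters are illustrative assumptions rather than the patent's own implementation.

```python
import torch
import torch.nn as nn


class Seq2SeqFirstModel(nn.Module):
    """Maps a natural language sentence to templated natural language (both as token ids)."""

    def __init__(self, src_vocab_size, tgt_vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab_size, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encode the natural language sentence into a hidden state.
        _, hidden = self.encoder(self.src_emb(src_ids))
        # Decode the templated natural language conditioned on that state
        # (teacher forcing: the gold templated sequence is the decoder input).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), hidden)
        return self.out(dec_out)  # logits over the templated-language vocabulary


# Supervised training with a logarithmic loss would then be, e.g.:
# logits = model(src_ids, tgt_ids[:, :-1])
# loss = nn.CrossEntropyLoss()(logits.flatten(0, 1), tgt_ids[:, 1:].flatten())
```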
Step 203, outputting the generated structured query sentence corresponding to the templated natural language according to the corresponding relation between the templated natural language and the structured query sentence.
In this embodiment, the execution body may output the structured query statement corresponding to the templated natural language generated in step 202 according to the preset correspondence between templated natural language and structured query statements. The correspondence between templated natural language and structured query statements may be obtained by enumeration, by abstracting part of the content and then enumerating, or by learning with a machine learning method.
In some optional implementations of this embodiment, outputting the generated structured query statement corresponding to the templated natural language according to the preset correspondence between templated natural language and structured query statements includes: determining the key information included in the generated templated natural language and the corresponding natural language template; obtaining a structured query statement template corresponding to the determined natural language template according to a pre-established correspondence between natural language templates and structured query statement templates; and filling the determined key information into the obtained structured query statement template to obtain the structured query statement.
Here, the key information may include at least one of: a table name (table), a field name (column), an attribute value (value), an aggregation operator (AGG) and a comparison operator (OP) of the structured query language. The attributes are field attribute values. Aggregation operators of the structured query language may include min (minimum), max (maximum), count (total), sum (sum) and avg (average). Comparison operators of the structured query language may include: > (greater than), >= (greater than or equal to), < (less than), <= (less than or equal to), = (equal to), != (not equal to), etc. Here, a natural language template may be a template abstracted from templated natural language according to rules. For example, the templated natural languages "give name" and "give age", where "name" and "age" are field names, may be abstracted into the natural language template "give column". The correspondence between natural language templates and structured query statement templates may be obtained by enumeration. Likewise, a structured query statement template may be obtained by abstracting structured query statements according to rules; for example, "SELECT gender", where "gender" is a field name, may be abstracted into "SELECT column".
As an example, the correspondence between templated natural language and structured query statements may further include correspondences between parts of an SQL statement and parts of the templated natural language. For example, the templated natural language corresponding to "SELECT column" may be "give column"; the templated natural language corresponding to "SELECT AGG(column)" may be "give the AGG of column"; the templated natural language corresponding to "GROUP BY column" may be "for each column"; the templated natural language corresponding to "GROUP BY column HAVING column OP value" may be "for each column, column OP value"; the templated natural language corresponding to "ORDER BY column ASC" may be "in ascending order of column"; and the templated natural language corresponding to "ORDER BY column DESC" may be "in descending order of column".
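As an illustration of this template-based conversion, the following is a minimal sketch in which the generated templated natural language is matched against predefined natural language templates, the key information is extracted, and the corresponding structured query statement template is filled in; the concrete templates, regular expressions and table name are illustrative assumptions rather than the patent's own template set.

```python
import re

# Pre-established correspondence between natural language templates and SQL templates.
TEMPLATE_MAP = [
    (re.compile(r"^give the (?P<agg>min|max|count|sum|avg) of (?P<column>\w+)$"),
     "SELECT {agg}({column}) FROM {table}"),
    (re.compile(r"^give (?P<column>\w+)$"),
     "SELECT {column} FROM {table}"),
    (re.compile(r"^(?P<column>\w+) (?P<op>>|>=|<|<=|=|!=) (?P<value>\w+), give (?P<target>\w+)$"),
     "SELECT {target} FROM {table} WHERE {column} {op} {value}"),
]


def templated_nl_to_sql(templated_nl: str, table: str) -> str:
    """Map a templated natural language string to a structured query statement."""
    for pattern, sql_template in TEMPLATE_MAP:
        match = pattern.match(templated_nl)
        if match:
            # Fill the extracted key information into the SQL statement template.
            return sql_template.format(table=table, **match.groupdict())
    raise ValueError(f"no natural language template matches: {templated_nl!r}")


# Example usage (hypothetical schema with a table named "student"):
print(templated_nl_to_sql("age > 18, give name", "student"))
# -> SELECT name FROM student WHERE age > 18
```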
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for outputting a structured query statement according to this embodiment. In the application scenario of fig. 3, a server 301 acquires a natural language sentence 302 to be converted; inputs the natural language sentence 302 into a pre-trained first model 303 to generate templated natural language 304 corresponding to the natural language sentence 302; and outputs a structured query statement 305 corresponding to the generated templated natural language 304 according to the preset correspondence between templated natural language and structured query statements.
According to the method provided by the embodiments of the present application, a natural language sentence to be converted is acquired; the natural language sentence is input into a pre-trained first model to generate templated natural language corresponding to the natural language sentence; and a structured query statement corresponding to the generated templated natural language is output according to the preset correspondence between templated natural language and structured query statements, thereby improving the accuracy of the output structured query statements.
With further reference to fig. 4, a flow 400 of training a first model is shown. The process 400 of training the first model includes the steps of:
step 401, a sample set is obtained, wherein the sample set comprises sample natural language sentences and sample templated natural language corresponding to the sample natural language sentences.
In this embodiment, the execution body of the method for outputting a structured query statement (e.g., the server or a terminal shown in fig. 1) or another execution body may first obtain a sample set including sample natural language sentences and sample templated natural language corresponding to the sample natural language sentences.
In some alternative implementations of the present embodiment, the sample set includes sample templated natural language generated by: acquiring key information based on a database associated with the structured query statement to be output, wherein the key information comprises at least one of the following: table names, field names, attributes, aggregation operators and comparison operators of the structured query language; and filling the acquired key information into a predefined natural language template to obtain the sample templated natural language. This implementation can expand the training data and thus achieve a better training effect.
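The following is a minimal sketch of this sample-generation step; the database schema, the natural language templates and the enumeration strategy are illustrative assumptions, not the patent's own.

```python
# Key information obtained from the associated database (hypothetical schema).
TABLES = {"student": ["name", "age", "gender"]}
AGGS = ["min", "max", "count", "sum", "avg"]

# Predefined natural language templates with slots for the key information.
NL_TEMPLATES = ["give {column}", "give the {agg} of {column}"]


def generate_sample_templated_nl():
    """Yield sample templated natural language by filling key information into the templates."""
    for columns in TABLES.values():
        for column in columns:
            yield NL_TEMPLATES[0].format(column=column)
            for agg in AGGS:
                # In practice one would also filter combinations by column type.
                yield NL_TEMPLATES[1].format(column=column, agg=agg)


samples = list(generate_sample_templated_nl())
# e.g. ["give name", "give the min of name", ..., "give age", "give the avg of gender"]
```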
Step 402, jointly training a first model and a second model based on a sample set.
In this embodiment, the execution body may jointly train the first model and the second model based on the sample set obtained in step 401, where the second model is used to characterize the correspondence between templated natural language and natural language sentences. The first model and the second model have a dual relationship. Jointly training them may include taking the output of the first model as the input of the second model and the output of the second model as the input of the first model, and training the two models with a reinforcement learning method. The joint training may also be treated as a multi-objective optimization problem that minimizes the supervised loss functions while ensuring that the two dual models satisfy probabilistic duality, i.e., P(natural language sentence, templated natural language) = P(templated natural language | natural language sentence) P(natural language sentence) = P(natural language sentence | templated natural language) P(templated natural language).
In some alternative implementations of the present embodiment, jointly training the first model and the second model based on the sample set includes: taking the sample natural language sentences as input and the sample templated natural language corresponding to the sample natural language sentences as output, and training the first model based on a pre-established loss function, wherein the loss function comprises a regularization term added so that the first model and the second model satisfy probabilistic duality. The loss function of the first model may be, for example, a logarithmic loss function.
As an example, the first model employs a log-loss function, and the learning objective of generating a templated natural language from natural language sentences may be as follows:
$$\min_{\theta}\ \frac{1}{n}\sum_{i=1}^{n} -\log P\left(y_i \mid x_i; \theta\right)$$
where x is a natural language sentence, y is a templated natural language, θ is the parameter to be determined in the first model, n is the number of samples, and i ranges from 1 to n.
The second model also adopts a logarithmic loss function, and the learning objective for generating natural language sentences from templated natural language may be as follows:
$$\min_{\theta'}\ \frac{1}{n}\sum_{i=1}^{n} -\log P\left(x_i \mid y_i; \theta'\right)$$
where x is a natural language sentence, y is a templated natural language, θ' is the parameter to be determined in the second model, n is the number of samples, and i ranges from 1 to n.
The dual information between the first model and the second model may be defined as:
$$l_{\mathrm{dual}}(x, y) = \left(\log P(x) + \log P(y \mid x; \theta) - \log P(y) - \log P(x \mid y; \theta')\right)^2$$
the gradient drop of the first model after adding the dual information can be expressed as:
$$\nabla_{\theta}\ \frac{1}{m}\sum_{j=1}^{m}\left[-\log P\left(y_j \mid x_j; \theta\right) + l_{\mathrm{dual}}\left(x_j, y_j\right)\right]$$
the gradient drop of the second model after adding the dual information can be expressed as:
$$\nabla_{\theta'}\ \frac{1}{m}\sum_{j=1}^{m}\left[-\log P\left(x_j \mid y_j; \theta'\right) + l_{\mathrm{dual}}\left(x_j, y_j\right)\right]$$
where m is the number of samples used when using the gradient descent method.
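The following is a minimal sketch of one step of this duality-regularized joint training, assuming PyTorch; the log_prob interface of the two models, the language-model marginals log_p_x and log_p_y, and the regularization weight are illustrative assumptions rather than the patent's own formulation.

```python
def joint_training_step(first_model, second_model, batch, log_p_x, log_p_y,
                        opt_first, opt_second, dual_weight=0.01):
    """One joint gradient-descent step over a mini-batch of (sentence, templated NL) pairs."""
    x, y = batch  # sample natural language sentences and their templated natural language

    # Supervised log-loss terms of the two dual models.
    # `.log_prob(target, given=source)` is an assumed interface returning log P(target|source).
    log_p_y_given_x = first_model.log_prob(y, given=x)    # log P(y|x; theta)
    log_p_x_given_y = second_model.log_prob(x, given=y)   # log P(x|y; theta')
    loss_first = -log_p_y_given_x.mean()
    loss_second = -log_p_x_given_y.mean()

    # Duality regularizer: squared gap between the two factorizations of log P(x, y),
    # using marginal log-probabilities log_p_x and log_p_y from pretrained language models.
    duality_gap = (log_p_x + log_p_y_given_x) - (log_p_y + log_p_x_given_y)
    dual_reg = (duality_gap ** 2).mean()

    # Each model minimizes its own log loss plus the shared regularization term.
    opt_first.zero_grad()
    opt_second.zero_grad()
    (loss_first + loss_second + dual_weight * dual_reg).backward()
    opt_first.step()
    opt_second.step()
    return loss_first.item(), loss_second.item(), dual_reg.item()
```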
As can be seen from fig. 4, in the process 400 of the method for outputting a structured query sentence in this embodiment, the effect of model training can be improved by joint learning, so that the accuracy of outputting the structured query sentence is further improved.
With further reference to fig. 5, as an implementation of the method shown in the foregoing figures, the present application provides an embodiment of an apparatus for outputting a structured query statement, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus is specifically applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for outputting a structured query statement of the present embodiment includes: an acquisition unit 501, a generation unit 502, and an output unit 503. The acquisition unit is configured to acquire a natural language sentence to be converted; the generation unit is configured to input the natural language sentence into a pre-trained first model and generate templated natural language corresponding to the natural language sentence; and the output unit is configured to output a structured query statement corresponding to the generated templated natural language according to the correspondence between templated natural language and structured query statements.
In this embodiment, for the specific processing of the acquisition unit 501, the generation unit 502 and the output unit 503 of the apparatus 500 for outputting a structured query statement, reference may be made to step 201, step 202 and step 203 in the corresponding embodiment of fig. 2.
In some optional implementations of this embodiment, the apparatus further includes a training unit, where the training unit includes: a first acquisition subunit configured to acquire a sample set including a sample natural language sentence and a sample templated natural language corresponding to the sample natural language sentence; and a training subunit configured to jointly train a first model and a second model based on the sample set, the second model being used for characterizing the correspondence between the templated natural language and the natural language sentence.
In some optional implementations of the present embodiment, the training subunit is further configured to: take the sample natural language sentences as input and the sample templated natural language corresponding to the sample natural language sentences as output, and train the first model based on a pre-established loss function, wherein the loss function comprises a regularization term added so that the first model and the second model satisfy probabilistic duality.
In some optional implementations of the present embodiment, the apparatus further includes a sample generation unit, the sample generation unit including: a second obtaining subunit configured to obtain key information based on a database associated with the structured query statement to be output, the key information including at least one of: table names, field names, attributes, aggregation operators and comparison operators of the structured query language; and a first filling subunit configured to fill the acquired key information into a predefined natural language template to obtain sample templated natural language.
In some alternative implementations of the present embodiment, the output unit includes: a determination subunit configured to determine key information included in the generated templated natural language and a corresponding natural language template; a third obtaining subunit configured to obtain a structured query sentence template corresponding to the determined natural language template according to a corresponding relationship between a pre-established natural language template and the structured query sentence template; and the second filling subunit is configured to fill the determined key information into the obtained structured query statement template to obtain the structured query statement.
According to the apparatus provided by the embodiments of the present application, a natural language sentence to be converted is acquired; the natural language sentence is input into a pre-trained first model to generate templated natural language corresponding to the natural language sentence; and a structured query statement corresponding to the generated templated natural language is output according to the preset correspondence between templated natural language and structured query statements, thereby improving the accuracy of the output structured query statements.
Referring now to FIG. 6, there is illustrated a schematic diagram of a computer system 600 suitable for use in implementing a server or terminal of an embodiment of the present application. The server or terminal illustrated in fig. 6 is merely an example, and should not be construed as limiting the functionality and scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components may be connected to the I/O interface 605: an input portion 606 including a keyboard, mouse, etc.; an output portion 607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to the I/O interface 605 as needed. Removable media 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on drive 610 so that a computer program read therefrom is installed as needed into storage section 608.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. The above-described functions defined in the method of the present application are performed when the computer program is executed by the Central Processing Unit (CPU) 601. It should be noted that the computer readable medium described in the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by software, or may be implemented by hardware. The described units may also be provided in a processor, for example, described as: a processor includes an acquisition unit, a generation unit, and an output unit. The names of these units do not constitute a limitation on the unit itself in some cases, and the acquisition unit may also be described as "a unit configured to acquire a natural language sentence to be converted", for example.
As another aspect, the present application also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be present alone without being fitted into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquiring a natural language sentence to be converted; inputting natural language sentences into a first model trained in advance, and generating templated natural language corresponding to the natural language sentences; and outputting the generated structured query sentence corresponding to the templated natural language according to the corresponding relation between the templated natural language and the structured query sentence.
The foregoing description is only of the preferred embodiments of the present application and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the invention referred to in this application is not limited to the specific combinations of the features described above, but is intended to cover other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the invention, for example, embodiments formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (12)

1. A method for outputting a structured query statement, comprising:
acquiring a natural language sentence to be converted;
inputting the natural language sentence into a pre-trained first model, and generating templated natural language corresponding to the natural language sentence, wherein the templated natural language is a language intermediate between natural language and the structured query statement, expressed in natural language but partially templated;
and outputting the generated structured query sentence corresponding to the templated natural language according to the corresponding relation between the templated natural language and the structured query sentence.
2. The method of claim 1, wherein the first model comprises a model trained by:
acquiring a sample set, wherein the sample set comprises sample natural language sentences and sample templated natural language corresponding to the sample natural language sentences;
and training the first model and the second model in a combined mode based on the sample set, wherein the second model is used for representing the corresponding relation between the templated natural language and the natural language sentence.
3. The method of claim 2, wherein the jointly training the first model and the second model based on the sample set comprises:
taking the sample natural language sentences as input and the sample templated natural language corresponding to the sample natural language sentences as output, and training the first model based on a pre-established loss function, wherein the loss function comprises a regularization term added so that the first model and the second model satisfy probabilistic duality.
4. The method of claim 2, wherein the sample set includes sample templated natural language generated via:
obtaining key information based on a database associated with the structured query statement to be output, wherein the key information comprises at least one of the following: table names, field names, attributes, aggregation operators and comparison operators of the structured query language;
filling the obtained key information into a predefined natural language template to obtain a sample templated natural language.
5. The method according to any one of claims 1-4, wherein outputting the generated structured query statement corresponding to the templated natural language according to the preset correspondence between the templated natural language and the structured query statement, includes:
determining key information included in the generated templated natural language and a corresponding natural language template;
obtaining a structured query sentence template corresponding to the determined natural language template according to the corresponding relation between the pre-established natural language template and the structured query sentence template;
and filling the determined key information into the obtained structured query statement template to obtain the structured query statement.
6. An apparatus for outputting a structured query statement, comprising:
an acquisition unit configured to acquire a natural language sentence to be converted;
a generation unit configured to input the natural language sentence into a pre-trained first model and generate templated natural language corresponding to the natural language sentence, wherein the templated natural language is a language intermediate between natural language and the structured query statement, expressed in natural language but partially templated;
the output unit is configured to output the generated structured query sentence corresponding to the templated natural language according to the corresponding relation between the templated natural language and the structured query sentence.
7. The apparatus of claim 6, wherein the apparatus further comprises a training unit comprising:
a first acquisition subunit configured to acquire a sample set including a sample natural language sentence and a sample templated natural language corresponding to the sample natural language sentence;
and a training subunit configured to jointly train the first model and a second model based on the sample set, the second model being used for characterizing a correspondence between the templated natural language and the natural language sentence.
8. The apparatus of claim 7, wherein the training subunit is further configured to:
take the sample natural language sentences as input and the sample templated natural language corresponding to the sample natural language sentences as output, and train the first model based on a pre-established loss function, wherein the loss function comprises a regularization term added so that the first model and the second model satisfy probabilistic duality.
9. The apparatus of claim 7, wherein the apparatus further comprises a sample generation unit comprising:
a second obtaining subunit configured to obtain key information based on a database associated with the structured query statement to be output, the key information including at least one of: table names, field names, attributes, aggregation operators and comparison operators of the structured query language;
and the first filling subunit is configured to fill the acquired key information into a predefined natural language template to obtain a sample templated natural language.
10. The apparatus according to any one of claims 6-9, wherein the output unit comprises:
a determination subunit configured to determine key information included in the generated templated natural language and a corresponding natural language template;
a third obtaining subunit configured to obtain a structured query sentence template corresponding to the determined natural language template according to a corresponding relationship between a pre-established natural language template and the structured query sentence template;
and the second filling subunit is configured to fill the determined key information into the obtained structured query statement template to obtain the structured query statement.
11. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
12. A computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1-5.
CN201911412056.7A 2019-12-31 2019-12-31 Method and apparatus for outputting structured query statement Active CN111159220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911412056.7A CN111159220B (en) 2019-12-31 2019-12-31 Method and apparatus for outputting structured query statement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911412056.7A CN111159220B (en) 2019-12-31 2019-12-31 Method and apparatus for outputting structured query statement

Publications (2)

Publication Number Publication Date
CN111159220A CN111159220A (en) 2020-05-15
CN111159220B true CN111159220B (en) 2023-06-23

Family

ID=70560243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911412056.7A Active CN111159220B (en) 2019-12-31 2019-12-31 Method and apparatus for outputting structured query statement

Country Status (1)

Country Link
CN (1) CN111159220B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111651474B (en) * 2020-06-02 2023-07-25 东云睿连(武汉)计算技术有限公司 Method and system for converting natural language into structured query language
CN113032418B (en) * 2021-02-08 2022-11-11 浙江大学 Method for converting complex natural language query into SQL (structured query language) based on tree model
CN113254619A (en) * 2021-06-21 2021-08-13 北京沃丰时代数据科技有限公司 Automatic reply method and device for user query and electronic equipment
CN114461665B (en) * 2022-01-26 2023-01-24 北京百度网讯科技有限公司 Method, apparatus and computer program product for generating a statement transformation model
CN114117025B (en) * 2022-01-28 2022-05-17 阿里巴巴达摩院(杭州)科技有限公司 Information query method, device, storage medium and system
CN114168619B (en) * 2022-02-09 2022-05-10 阿里巴巴达摩院(杭州)科技有限公司 Training method and device of language conversion model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868313A (en) * 2016-03-25 2016-08-17 浙江大学 Mapping knowledge domain questioning and answering system and method based on template matching technique
WO2018081020A1 (en) * 2016-10-24 2018-05-03 Carlabs Inc. Computerized domain expert
CN109408526A (en) * 2018-10-12 2019-03-01 平安科技(深圳)有限公司 SQL statement generation method, device, computer equipment and storage medium
CN109739483A (en) * 2018-12-28 2019-05-10 北京百度网讯科技有限公司 Method and apparatus for generated statement
CN109766355A (en) * 2018-12-28 2019-05-17 上海汇付数据服务有限公司 A kind of data query method and system for supporting natural language

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10262062B2 (en) * 2015-12-21 2019-04-16 Adobe Inc. Natural language system question classifier, semantic representations, and logical form templates
CN107451153B (en) * 2016-05-31 2020-03-31 北京京东尚科信息技术有限公司 Method and device for outputting structured query statement
CN109542929B (en) * 2018-11-28 2020-11-24 山东工商学院 Voice query method and device and electronic equipment
CN110347784A (en) * 2019-05-23 2019-10-18 深圳壹账通智能科技有限公司 Report form inquiring method, device, storage medium and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868313A (en) * 2016-03-25 2016-08-17 浙江大学 Mapping knowledge domain questioning and answering system and method based on template matching technique
WO2018081020A1 (en) * 2016-10-24 2018-05-03 Carlabs Inc. Computerized domain expert
CN109408526A (en) * 2018-10-12 2019-03-01 平安科技(深圳)有限公司 SQL statement generation method, device, computer equipment and storage medium
CN109739483A (en) * 2018-12-28 2019-05-10 北京百度网讯科技有限公司 Method and apparatus for generated statement
CN109766355A (en) * 2018-12-28 2019-05-17 上海汇付数据服务有限公司 A kind of data query method and system for supporting natural language

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭富强; 鱼滨. Research on data query based on fuzzy database. 《微电子学与计算机》 (Microelectronics & Computer). 2005, 123-126. *

Also Published As

Publication number Publication date
CN111159220A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
CN111159220B (en) Method and apparatus for outputting structured query statement
KR102401942B1 (en) Method and apparatus for evaluating translation quality
CN109614111B (en) Method and apparatus for generating code
CN110969012B (en) Text error correction method and device, storage medium and electronic equipment
CN111428010B (en) Man-machine intelligent question-answering method and device
WO2022142121A1 (en) Abstract sentence extraction method and apparatus, and server and computer-readable storage medium
US20210326524A1 (en) Method, apparatus and device for quality control and storage medium
US11132996B2 (en) Method and apparatus for outputting information
US11501655B2 (en) Automated skill tagging, knowledge graph, and customized assessment and exercise generation
CN110807311A (en) Method and apparatus for generating information
CN111104796B (en) Method and device for translation
CN111008213B (en) Method and apparatus for generating language conversion model
CN110232920B (en) Voice processing method and device
CN114357195A (en) Knowledge graph-based question-answer pair generation method, device, equipment and medium
CN112582073B (en) Medical information acquisition method, device, electronic equipment and medium
CN111125154B (en) Method and apparatus for outputting structured query statement
CN109857838B (en) Method and apparatus for generating information
CN116821327A (en) Text data processing method, apparatus, device, readable storage medium and product
CN114020774A (en) Method, device and equipment for processing multiple rounds of question-answering sentences and storage medium
CN109036554B (en) Method and apparatus for generating information
CN110990528A (en) Question answering method and device and electronic equipment
CN112328751A (en) Method and device for processing text
CN112131378A (en) Method and device for identifying categories of civil problems and electronic equipment
CN111079185A (en) Database information processing method and device, storage medium and electronic equipment
CN112148751A (en) Method and device for querying data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant