CN111159220A - Method and apparatus for outputting structured query statement - Google Patents

Method and apparatus for outputting structured query statement Download PDF

Info

Publication number
CN111159220A
Authority
CN
China
Prior art keywords
natural language
structured query
templated
sample
query statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911412056.7A
Other languages
Chinese (zh)
Other versions
CN111159220B (en)
Inventor
王丽杰
杨春杰
孙珂
李婷婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201911412056.7A priority Critical patent/CN111159220B/en
Publication of CN111159220A publication Critical patent/CN111159220A/en
Application granted granted Critical
Publication of CN111159220B publication Critical patent/CN111159220B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation
    • G06F16/24522 Translation of natural language queries to structured queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the present application disclose a method and an apparatus for outputting a structured query statement. One embodiment of the method comprises: acquiring a natural language sentence to be converted; inputting the natural language sentence into a pre-trained first model to generate a templated natural language corresponding to the natural language sentence; and outputting the structured query statement corresponding to the generated templated natural language according to a preset corresponding relation between the templated natural language and the structured query statement. This embodiment improves the accuracy of outputting structured query statements.

Description

Method and apparatus for outputting structured query statement
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for outputting a structured query statement.
Background
In the information age, databases are increasingly widely used as information carriers. To query data from a database, people generally need to master the standard Structured Query Language (SQL). However, SQL and its various grammars are difficult for many non-professionals to learn, so a method for converting natural language into structured query statements is urgently needed to enable non-professionals to use databases.
Existing methods for generating a structured query statement generally involve manually writing the structured query statement, or inputting a natural language query statement into a pre-trained machine learning model to obtain the structured query statement output by the model.
Disclosure of Invention
The embodiment of the application provides a method and a device for outputting a structured query statement.
In a first aspect, some embodiments of the present application provide a method for outputting a structured query statement, the method comprising: acquiring a natural language sentence to be converted; inputting the natural language sentence into a pre-trained first model to generate a templated natural language corresponding to the natural language sentence; and outputting the structured query statement corresponding to the generated templated natural language according to a preset corresponding relation between the templated natural language and the structured query statement.
In some embodiments, the first model comprises a model trained by: acquiring a sample set, wherein the sample set comprises sample natural language sentences and sample templated natural languages corresponding to the sample natural language sentences; and jointly training a first model and a second model based on the sample set, wherein the second model is used for representing the corresponding relation between the templated natural language and the natural language sentences.
In some embodiments, jointly training the first model and the second model based on the sample set comprises: training, based on a pre-established loss function, a sample natural language sentence and the corresponding sample templated natural language as input and output respectively, to obtain the first model, wherein the loss function comprises a regularization term added to satisfy the probability duality of the first model and the second model.
In some embodiments, the sample set includes a sample templated natural language generated via the steps of: acquiring key information based on a database associated with a structured query statement to be output, wherein the key information comprises at least one of the following items: table names, field names, attributes, operational characters and operators in the structured query language; and filling the acquired key information into a predefined natural language template to obtain the sample templated natural language.
In some embodiments, outputting the structured query statement corresponding to the generated templated natural language according to the preset corresponding relationship between the templated natural language and the structured query statement includes: determining key information included in the generated templated natural language and a corresponding natural language template; acquiring a structured query statement template corresponding to the determined natural language template according to a corresponding relation between a pre-established natural language template and the structured query statement template; and filling the determined key information into the acquired structured query statement template to obtain the structured query statement.
In a second aspect, some embodiments of the present application provide an apparatus for outputting a structured query statement, the apparatus comprising: an acquisition unit configured to acquire a natural language sentence to be converted; a generating unit configured to input natural language sentences to a first model trained in advance, and generate templated natural language corresponding to the natural language sentences; and the output unit is configured to output the structured query statement corresponding to the generated templated natural language according to the preset corresponding relation between the templated natural language and the structured query statement.
In some embodiments, the apparatus further comprises a training unit comprising: a first obtaining subunit configured to obtain a sample set, the sample set including a sample natural language sentence and a sample templated natural language corresponding to the sample natural language sentence; a training subunit configured to jointly train a first model and a second model based on the sample set, the second model being used for characterizing a corresponding relationship between the templated natural language and the natural language sentence.
In some embodiments, the training subunit is further configured to: train, based on a pre-established loss function, a sample natural language sentence and the corresponding sample templated natural language as input and output respectively, to obtain the first model, wherein the loss function comprises a regularization term added to satisfy the probability duality of the first model and the second model.
In some embodiments, the apparatus further comprises a sample generation unit comprising: a second obtaining subunit configured to obtain key information based on a database associated with the structured query statement to be output, the key information including at least one of: table names, field names, attributes, operational characters and operators in the structured query language; a first filling subunit configured to fill the acquired key information into a predefined natural language template to obtain a sample templated natural language.
In some embodiments, an output unit includes: a determining subunit configured to determine key information included in the generated templated natural language and a corresponding natural language template; a third obtaining subunit, configured to obtain a structured query statement template corresponding to the determined natural language template according to a correspondence between a pre-established natural language template and the structured query statement template; and the second filling subunit is configured to fill the determined key information into the acquired structured query statement template to obtain a structured query statement.
In a third aspect, some embodiments of the present application provide an apparatus comprising: one or more processors; a storage device, on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method as described above in the first aspect.
In a fourth aspect, some embodiments of the present application provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method as described above in the first aspect.
According to the method and apparatus for outputting a structured query statement provided in the embodiments of the present application, a natural language sentence to be converted is acquired; the natural language sentence is input into a pre-trained first model to generate a templated natural language corresponding to the natural language sentence; and the structured query statement corresponding to the generated templated natural language is output according to the preset corresponding relation between the templated natural language and the structured query statement, thereby improving the accuracy of outputting structured query statements.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a diagram of an exemplary system architecture to which some embodiments of the present application may be applied;
FIG. 2 is a flow diagram for one embodiment of a method for outputting a structured query statement in accordance with the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for outputting a structured query statement according to the present application;
FIG. 4 is a flow chart for training a first model according to the present application;
FIG. 5 is a schematic diagram illustrating the structure of one embodiment of an apparatus for outputting a structured query statement in accordance with the present application;
FIG. 6 is a block diagram of a computer system suitable for use in implementing a server or terminal of some embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the present method for outputting a structured query statement or an apparatus for outputting a structured query statement may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various client applications, such as database-type applications, e-commerce-type applications, search-type applications, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. No particular limitation is imposed here.
The server 105 may be a server providing various services, for example, a background server providing support for applications installed on the terminal devices 101, 102, and 103. The server 105 may acquire a natural language sentence to be converted; input the natural language sentence into a pre-trained first model to generate a templated natural language corresponding to the natural language sentence; and output the structured query statement corresponding to the generated templated natural language according to the preset corresponding relation between the templated natural language and the structured query statement.
It should be noted that the method for outputting the structured query statement provided in the embodiment of the present application may be executed by the server 105, or may be executed by the terminal devices 101, 102, and 103, and accordingly, the apparatus for outputting the structured query statement may be disposed in the server 105, or may be disposed in the terminal devices 101, 102, and 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. No particular limitation is imposed here.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for outputting a structured query statement in accordance with the present application is shown. The method for outputting the structured query statement comprises the following steps:
step 201, obtaining a natural language sentence to be converted.
In the present embodiment, a method execution body (e.g., a server or a terminal shown in fig. 1) for outputting a structured query statement may first acquire a natural language statement to be converted. The natural language sentence to be converted may be derived from natural language information in the form of text, image, or voice input by the user.
Step 202, inputting the natural language sentence into the pre-trained first model, and generating the templated natural language corresponding to the natural language sentence.
In this embodiment, the execution body may input the natural language sentence into the pre-trained first model to generate a templated natural language corresponding to the natural language sentence. The templated natural language may be an intermediate form between natural language and an SQL statement: it is expressed in natural language, but is partially templated. As an example, if the natural language sentence is "which students are older than 18", the corresponding templated natural language may include "older than 18, give the name of the student".
The first model may be used to characterize the correspondence between natural language sentences and templated natural language. The first model may include an encoder-decoder model with or without an attention mechanism, and may also include one or more neural network models. The neural network model may be a Recurrent Neural Network (RNN) model: in the network structure of a recurrent neural network, the connections between hidden nodes form a cycle, so the model not only learns the information at the current time step but also relies on the preceding sequence information. Owing to this special network structure, the problem of preserving earlier information is addressed, and RNNs have unique advantages for processing time series and language text sequences. Further, one or more of the RNN variants Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) may also be used to compose a sequence-to-sequence model. In addition, the first model may be trained alone or jointly with other models.
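As a non-limiting illustrative sketch of how such a first model could be assembled (the use of PyTorch, the class name, the vocabulary sizes and the hidden dimensions below are assumptions of this description rather than the claimed implementation), a GRU-based encoder-decoder may be written as:

    import torch.nn as nn

    class FirstModel(nn.Module):
        # Hypothetical encoder-decoder mapping a natural language sentence
        # (token ids) to a templated natural language sequence (token ids).
        def __init__(self, src_vocab_size, tgt_vocab_size, emb_dim=128, hid_dim=256):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab_size, emb_dim)
            self.tgt_emb = nn.Embedding(tgt_vocab_size, emb_dim)
            self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
            self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
            self.out = nn.Linear(hid_dim, tgt_vocab_size)

        def forward(self, src_ids, tgt_ids):
            # Encode the natural language sentence into a final hidden state.
            _, hidden = self.encoder(self.src_emb(src_ids))
            # Decode the templated natural language conditioned on that state,
            # feeding the gold target tokens as decoder input (teacher forcing).
            dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), hidden)
            return self.out(dec_out)  # logits over the templated natural language vocabulary

A standard cross-entropy (logarithmic) loss over these logits may then serve as the training objective of the first model, as detailed in the training flow of FIG. 4.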
Step 203, outputting the generated structured query statement corresponding to the templated natural language according to the preset corresponding relationship between the templated natural language and the structured query statement.
In this embodiment, the execution body may output the structured query statement corresponding to the templated natural language generated in step 202 according to the preset corresponding relation between the templated natural language and the structured query statement. The corresponding relation between templated natural language and structured query statements may be obtained by enumeration, by enumeration after abstracting part of the content, or by learning through a machine learning method.
In some optional implementations of this embodiment, outputting the generated structured query statement corresponding to the templated natural language according to a preset correspondence between the templated natural language and the structured query statement includes: determining key information included in the generated templated natural language and a corresponding natural language template; acquiring a structured query statement template corresponding to the determined natural language template according to a corresponding relation between a pre-established natural language template and the structured query statement template; and filling the determined key information into the acquired structured query statement template to obtain the structured query statement.
Here, the key information may include at least one of the following: a table name (table), a field name (column), an attribute value (value), an aggregation operator (AGG), and a comparison operator (OP) in the structured query language. An attribute value is the value of a field attribute. Aggregation operators in the structured query language may include min (minimum), max (maximum), count, sum, and avg (average). Comparison operators in the structured query language may include > (greater than), < (less than), != (not equal to), and the like. Here, a natural language template may be abstracted from templated natural language according to rules. For example, in the templated natural languages "give name" and "give age", "name" and "age" are field names, and both can be abstracted into the natural language template "give column". The correspondence between natural language templates and structured query statement templates may be obtained by enumeration. Similarly, a structured query statement template may be abstracted from structured query statements according to rules; for example, in "SELECT gender", "gender" is a field name, and the template "SELECT column" may be abstracted.
For example, the correspondence between templated natural language and structured query statements may further include correspondences between parts of an SQL statement and parts of the templated natural language. For example, the templated natural language corresponding to "SELECT column" may be "give column", the templated natural language corresponding to "SELECT AGG(column)" may be "give the AGG of column", the templated natural language corresponding to "GROUP BY column" may be "for each column", the templated natural language corresponding to "GROUP BY column HAVING AGG(column) OP value" may be "each column whose AGG(column) is OP value", the templated natural language corresponding to "ORDER BY column ASC" may be "in ascending order of column", and the templated natural language corresponding to "ORDER BY column DESC" may be "in descending order of column".
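As a further illustrative sketch (the template inventory, the phrasing of the templated natural language, and the helper names below are assumptions of this description rather than the actual correspondence table), the rule-based mapping from templated natural language to a structured query statement can be implemented roughly as follows:

    # Hypothetical correspondence between natural language templates and
    # structured query statement templates; slots in braces hold key information.
    TEMPLATE_MAP = {
        "{col2} greater than {val}, give the {col}":
            "SELECT {col} FROM {table} WHERE {col2} > {val}",
        "give the average of {col}":
            "SELECT AVG({col}) FROM {table}",
    }

    def fill_sql(nl_template, key_info):
        # Look up the structured query statement template corresponding to the
        # natural language template and fill in the determined key information
        # (table name, field names, attribute values, operators).
        sql_template = TEMPLATE_MAP[nl_template]
        return sql_template.format(**key_info)

    # The templated natural language "older than 18, give the name of the student"
    # is abstracted to the first template, and its key information is extracted.
    sql = fill_sql(
        "{col2} greater than {val}, give the {col}",
        {"col": "name", "col2": "age", "val": 18, "table": "student"},
    )
    print(sql)  # SELECT name FROM student WHERE age > 18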
With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the method for outputting a structured query statement according to the present embodiment. In the application scenario of FIG. 3, a server 301 acquires a natural language sentence 302 to be converted, inputs the natural language sentence 302 into a pre-trained first model 303 to generate a templated natural language 304 corresponding to the natural language sentence 302, and outputs the structured query statement 305 corresponding to the generated templated natural language 304 according to the preset corresponding relation between the templated natural language and the structured query statement.
The method provided by the above embodiment of the present application acquires a natural language sentence to be converted, inputs the natural language sentence into a pre-trained first model to generate a templated natural language corresponding to the natural language sentence, and outputs the structured query statement corresponding to the generated templated natural language according to the preset corresponding relation between the templated natural language and the structured query statement, thereby improving the accuracy of outputting structured query statements.
With further reference to FIG. 4, a flow 400 of training the first model is illustrated. The process 400 for training the first model includes the following steps:
step 401, a sample set is obtained, where the sample set includes a sample natural language sentence and a sample templated natural language corresponding to the sample natural language sentence.
In this embodiment, a method execution body (e.g., a server or a terminal shown in fig. 1) or other execution body for outputting a structured query statement may first obtain a sample set, where the sample set includes a sample natural language statement and a sample templated natural language corresponding to the sample natural language statement.
In some optional implementations of this embodiment, the sample set includes a sample templated natural language generated via: acquiring key information based on a database associated with a structured query statement to be output, wherein the key information comprises at least one of the following items: table names, field names, attributes, operational characters and operators in the structured query language; and filling the acquired key information into a predefined natural language template to obtain the sample templated natural language. Through this implementation, the training data can be expanded, thereby achieving a better training effect.
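A minimal sketch of this sample-expansion step, assuming a toy schema, hand-written operators and values, and a single natural language template (all of which are hypothetical and only for illustration), may look as follows:

    import itertools

    # Key information obtained from the database associated with the target
    # structured query statement: field names, comparison operators, attribute values.
    columns = ["name", "age", "gender"]
    operators = {">": "greater than", "<": "less than", "!=": "not equal to"}
    values = [18, 20]

    # Predefined natural language template with slots for the key information.
    NL_TEMPLATE = "{col2} {op_text} {val}, give the {col}"

    def generate_sample_templated_nl():
        # Fill the acquired key information into the predefined template to
        # obtain sample templated natural language for the training set.
        samples = []
        for col, col2, (_, op_text), val in itertools.product(
                columns, columns, operators.items(), values):
            samples.append(NL_TEMPLATE.format(col=col, col2=col2, op_text=op_text, val=val))
        return samples

    print(len(generate_sample_templated_nl()))  # 3 * 3 * 3 * 2 = 54 expanded samples

Each generated sample templated natural language can then be paired with one or more sample natural language sentences to form the sample set described in step 401.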
Step 402, jointly training a first model and a second model based on a sample set.
In this embodiment, the execution body may jointly train the first model and a second model based on the sample set obtained in step 401, where the second model is used to characterize the correspondence between templated natural language and natural language sentences. Jointly training the first model and the second model may include using the output of the first model as the input of the second model and the output of the second model as the input of the first model, and training the first model and the second model based on a reinforcement learning method. The joint training may also be treated as a multi-objective optimization problem, which requires that, while supervised learning minimizes the loss functions, the two models with a dual relationship satisfy probability duality, i.e., P(natural language sentence) P(templated natural language | natural language sentence) = P(templated natural language) P(natural language sentence | templated natural language).
In some optional implementations of this embodiment, jointly training the first model and the second model based on the sample set comprises: training, based on a pre-established loss function, a sample natural language sentence and the corresponding sample templated natural language as input and output respectively, to obtain the first model, wherein the loss function comprises a regularization term added to satisfy the probability duality of the first model and the second model. The loss function of the first model may be, for example, a logarithmic loss function.
By way of example, where the first model employs a log-loss function, the learning objective of generating templated natural language from natural language statements may be as follows:
\min_{\theta_{xy}} \; \frac{1}{n} \sum_{i=1}^{n} -\log P(y_i \mid x_i; \theta_{xy})
wherein x_i is a natural language sentence, y_i is the corresponding templated natural language, θ_xy is the parameter to be determined in the first model, n is the number of samples, and i ranges from 1 to n.
The second model also employs a log-loss function, and its learning objective of generating natural language sentences from templated natural language may be as follows:
\min_{\theta_{yx}} \; \frac{1}{n} \sum_{i=1}^{n} -\log P(x_i \mid y_i; \theta_{yx})
wherein x_i is a natural language sentence, y_i is the corresponding templated natural language, θ_yx is the parameter to be determined in the second model, n is the number of samples, and i ranges from 1 to n.
The dual information between the first model and the second model may be defined as:
\ell_{dual}(x_i, y_i) = \left( \log \hat{P}(x_i) + \log P(y_i \mid x_i; \theta_{xy}) - \log \hat{P}(y_i) - \log P(x_i \mid y_i; \theta_{yx}) \right)^2
wherein \hat{P}(x_i) and \hat{P}(y_i) denote estimated marginal probabilities of the natural language sentence and of the templated natural language, respectively.
the gradient dip of the first model after adding the dual information can be expressed as:
\theta_{xy} \leftarrow \theta_{xy} - \gamma \, \nabla_{\theta_{xy}} \frac{1}{m} \sum_{j=1}^{m} \left[ -\log P(y_j \mid x_j; \theta_{xy}) + \lambda \, \ell_{dual}(x_j, y_j) \right]
the gradient dip of the second model after adding the dual information can be expressed as:
\theta_{yx} \leftarrow \theta_{yx} - \gamma \, \nabla_{\theta_{yx}} \frac{1}{m} \sum_{j=1}^{m} \left[ -\log P(x_j \mid y_j; \theta_{yx}) + \lambda \, \ell_{dual}(x_j, y_j) \right]
where m is the number of samples in the mini-batch used by the gradient descent method, γ is the learning rate, and λ is the weight of the duality regularization term.
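Under the assumption that first_model and second_model expose sentence-level log-probabilities and that lm_x and lm_y are pre-trained marginal language models for natural language sentences and for templated natural language respectively (the names and the PyTorch-style interface are hypothetical and only illustrate the idea), one mini-batch update of this dual-regularized joint training may be sketched as follows:

    def joint_training_step(first_model, second_model, lm_x, lm_y,
                            batch, optimizer, lam=0.01):
        # first_model:  log P(y | x), natural language -> templated natural language
        # second_model: log P(x | y), templated natural language -> natural language
        # lm_x, lm_y:   marginal log-probabilities log P(x) and log P(y)
        x, y = batch  # paired sample natural language sentences and templated natural language
        log_p_y_given_x = first_model.log_prob(y, given=x)
        log_p_x_given_y = second_model.log_prob(x, given=y)

        # Supervised logarithmic losses of the two models.
        loss_xy = -log_p_y_given_x.mean()
        loss_yx = -log_p_x_given_y.mean()

        # Duality regularization term: penalize violations of
        # log P(x) + log P(y|x) = log P(y) + log P(x|y).
        duality_gap = (lm_x(x) + log_p_y_given_x) - (lm_y(y) + log_p_x_given_y)
        loss_dual = (duality_gap ** 2).mean()

        loss = loss_xy + loss_yx + lam * loss_dual
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()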
As can be seen from fig. 4, in the process 400 of the method for outputting the structured query statement in the present embodiment, the effect of model training can be improved through joint learning, and the accuracy of outputting the structured query statement is further improved.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for outputting a structured query statement, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for outputting a structured query statement of the present embodiment includes: acquisition section 501, generation section 502, and output section 503. Wherein the acquisition unit is configured to acquire a natural language sentence to be converted; a generating unit configured to input natural language sentences to a first model trained in advance, and generate templated natural language corresponding to the natural language sentences; and the output unit is configured to output the structured query statement corresponding to the generated templated natural language according to the preset corresponding relation between the templated natural language and the structured query statement.
In this embodiment, for the specific processing of the acquisition unit 501, the generation unit 502, and the output unit 503 of the apparatus 500 for outputting a structured query statement, reference may be made to step 201, step 202, and step 203 in the corresponding embodiment of FIG. 2.
In some optional implementations of this embodiment, the apparatus further includes a training unit, the training unit including: a first obtaining subunit configured to obtain a sample set, the sample set including a sample natural language sentence and a sample templated natural language corresponding to the sample natural language sentence; a training subunit configured to jointly train a first model and a second model based on the sample set, the second model being used for characterizing a corresponding relationship between the templated natural language and the natural language sentence.
In some optional implementations of this embodiment, the training subunit is further configured to: train, based on a pre-established loss function, a sample natural language sentence and the corresponding sample templated natural language as input and output respectively, to obtain the first model, wherein the loss function comprises a regularization term added to satisfy the probability duality of the first model and the second model.
In some optional implementations of this embodiment, the apparatus further includes a sample generation unit, and the sample generation unit includes: a second obtaining subunit configured to obtain key information based on a database associated with the structured query statement to be output, the key information including at least one of: table names, field names, attributes, operational characters and operators in the structured query language; a first filling subunit configured to fill the acquired key information into a predefined natural language template to obtain a sample templated natural language.
In some optional implementations of this embodiment, the output unit includes: a determining subunit configured to determine key information included in the generated templated natural language and a corresponding natural language template; a third obtaining subunit, configured to obtain a structured query statement template corresponding to the determined natural language template according to a correspondence between a pre-established natural language template and the structured query statement template; and the second filling subunit is configured to fill the determined key information into the acquired structured query statement template to obtain a structured query statement.
The apparatus provided by the above embodiment of the present application acquires a natural language sentence to be converted, inputs the natural language sentence into a pre-trained first model to generate a templated natural language corresponding to the natural language sentence, and outputs the structured query statement corresponding to the generated templated natural language according to the preset corresponding relation between the templated natural language and the structured query statement, thereby improving the accuracy of outputting structured query statements.
Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing a server or terminal according to an embodiment of the present application. The server or the terminal shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components may be connected to the I/O interface 605: an input portion 606 such as a keyboard, mouse, or the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is mounted in the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 601. It should be noted that the computer readable medium described herein may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the C language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a generation unit, and an output unit. Where the names of these units do not in some cases constitute a limitation on the units themselves, for example, the acquisition unit may also be described as a "unit configured to acquire a natural language sentence to be converted".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire a natural language sentence to be converted; input the natural language sentence into a pre-trained first model to generate a templated natural language corresponding to the natural language sentence; and output the structured query statement corresponding to the generated templated natural language according to the preset corresponding relation between the templated natural language and the structured query statement.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A method for outputting a structured query statement, comprising:
acquiring natural language sentences to be converted;
inputting the natural language sentences to a pre-trained first model to generate templated natural language corresponding to the natural language sentences;
and outputting the structured query sentence corresponding to the generated templated natural language according to the preset corresponding relation between the templated natural language and the structured query sentence.
2. The method of claim 1, wherein the first model comprises a model trained by:
obtaining a sample set, wherein the sample set comprises sample natural language sentences and sample templated natural languages corresponding to the sample natural language sentences;
and jointly training the first model and a second model based on the sample set, wherein the second model is used for representing the corresponding relation between the templated natural language and the natural language sentences.
3. The method of claim 2, wherein the jointly training the first and second models based on the sample set comprises:
training, based on a pre-established loss function, a sample natural language sentence and the corresponding sample templated natural language as input and output respectively, to obtain the first model, wherein the loss function comprises a regularization term added to satisfy the probability duality of the first model and the second model.
4. The method of claim 2, wherein the sample set includes a sample templated natural language generated via:
obtaining key information based on a database associated with a structured query statement to be output, wherein the key information comprises at least one of the following: table names, field names, attributes, operational characters and operators in the structured query language;
and filling the acquired key information into a predefined natural language template to obtain the sample templated natural language.
5. The method according to any one of claims 1 to 4, wherein the outputting the structured query statement corresponding to the generated templated natural language according to the preset corresponding relationship between the templated natural language and the structured query statement comprises:
determining key information included in the generated templated natural language and a corresponding natural language template;
acquiring a structured query statement template corresponding to the determined natural language template according to a corresponding relation between a pre-established natural language template and the structured query statement template;
and filling the determined key information into the acquired structured query statement template to obtain the structured query statement.
6. An apparatus for outputting a structured query statement, comprising:
an acquisition unit configured to acquire a natural language sentence to be converted;
a generating unit configured to input the natural language sentence to a pre-trained first model, and generate a templated natural language corresponding to the natural language sentence;
and the output unit is configured to output the structured query statement corresponding to the generated templated natural language according to the preset corresponding relation between the templated natural language and the structured query statement.
7. The apparatus of claim 6, wherein the apparatus further comprises a training unit comprising:
a first obtaining subunit configured to obtain a sample set including a sample natural language sentence and a sample templated natural language corresponding to the sample natural language sentence;
a training subunit configured to jointly train the first model and a second model based on the sample set, the second model being used to characterize a correspondence of a templated natural language with a natural language sentence.
8. The apparatus of claim 7, wherein the training subunit is further configured to:
train, based on a pre-established loss function, a sample natural language sentence and the corresponding sample templated natural language as input and output respectively, to obtain the first model, wherein the loss function comprises a regularization term added to satisfy the probability duality of the first model and the second model.
9. The apparatus of claim 7, wherein the apparatus further comprises a sample generation unit comprising:
a second obtaining subunit configured to obtain key information based on a database associated with the structured query statement to be output, the key information including at least one of: table names, field names, attributes, operational characters and operators in the structured query language;
a first filling subunit configured to fill the acquired key information into a predefined natural language template to obtain a sample templated natural language.
10. The apparatus according to any one of claims 6-9, wherein the output unit comprises:
a determining subunit configured to determine key information included in the generated templated natural language and a corresponding natural language template;
a third obtaining subunit, configured to obtain a structured query statement template corresponding to the determined natural language template according to a correspondence between a pre-established natural language template and the structured query statement template;
and the second filling subunit is configured to fill the determined key information into the acquired structured query statement template to obtain a structured query statement.
11. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-5.
12. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201911412056.7A 2019-12-31 2019-12-31 Method and apparatus for outputting structured query statement Active CN111159220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911412056.7A CN111159220B (en) 2019-12-31 2019-12-31 Method and apparatus for outputting structured query statement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911412056.7A CN111159220B (en) 2019-12-31 2019-12-31 Method and apparatus for outputting structured query statement

Publications (2)

Publication Number Publication Date
CN111159220A true CN111159220A (en) 2020-05-15
CN111159220B CN111159220B (en) 2023-06-23

Family

ID=70560243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911412056.7A Active CN111159220B (en) 2019-12-31 2019-12-31 Method and apparatus for outputting structured query statement

Country Status (1)

Country Link
CN (1) CN111159220B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170177715A1 (en) * 2015-12-21 2017-06-22 Adobe Systems Incorporated Natural Language System Question Classifier, Semantic Representations, and Logical Form Templates
CN105868313A (en) * 2016-03-25 2016-08-17 浙江大学 Mapping knowledge domain questioning and answering system and method based on template matching technique
CN107451153A (en) * 2016-05-31 2017-12-08 北京京东尚科信息技术有限公司 The method and apparatus of export structure query statement
WO2018081020A1 (en) * 2016-10-24 2018-05-03 Carlabs Inc. Computerized domain expert
CN109408526A (en) * 2018-10-12 2019-03-01 平安科技(深圳)有限公司 SQL statement generation method, device, computer equipment and storage medium
CN109542929A (en) * 2018-11-28 2019-03-29 山东工商学院 Voice inquiry method, device and electronic equipment
CN109739483A (en) * 2018-12-28 2019-05-10 北京百度网讯科技有限公司 Method and apparatus for generated statement
CN109766355A (en) * 2018-12-28 2019-05-17 上海汇付数据服务有限公司 A kind of data query method and system for supporting natural language
CN110347784A (en) * 2019-05-23 2019-10-18 深圳壹账通智能科技有限公司 Report form inquiring method, device, storage medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭富强 (Guo Fuqiang); 鱼滨 (Yu Bin): "Research on Data Query Based on Fuzzy Databases", Microelectronics & Computer (《微电子学与计算机》), pages 123-126 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111651474A (en) * 2020-06-02 2020-09-11 东云睿连(武汉)计算技术有限公司 Method and system for converting natural language into structured query language
WO2021243903A1 (en) * 2020-06-02 2021-12-09 东云睿连(武汉)计算技术有限公司 Method and system for transforming natural language into structured query language
CN111651474B (en) * 2020-06-02 2023-07-25 东云睿连(武汉)计算技术有限公司 Method and system for converting natural language into structured query language
CN113032418A (en) * 2021-02-08 2021-06-25 浙江大学 Method for converting complex natural language query into SQL (structured query language) based on tree model
CN113254619A (en) * 2021-06-21 2021-08-13 北京沃丰时代数据科技有限公司 Automatic reply method and device for user query and electronic equipment
CN114461665A (en) * 2022-01-26 2022-05-10 北京百度网讯科技有限公司 Method, apparatus and computer program product for generating a statement transformation model
CN114461665B (en) * 2022-01-26 2023-01-24 北京百度网讯科技有限公司 Method, apparatus and computer program product for generating a statement transformation model
CN114117025A (en) * 2022-01-28 2022-03-01 阿里巴巴达摩院(杭州)科技有限公司 Information query method, device, storage medium and system
CN114117025B (en) * 2022-01-28 2022-05-17 阿里巴巴达摩院(杭州)科技有限公司 Information query method, device, storage medium and system
CN114168619A (en) * 2022-02-09 2022-03-11 阿里巴巴达摩院(杭州)科技有限公司 Training method and device of language conversion model

Also Published As

Publication number Publication date
CN111159220B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
CN111159220B (en) Method and apparatus for outputting structured query statement
CN109460513B (en) Method and apparatus for generating click rate prediction model
CN109614111B (en) Method and apparatus for generating code
CN110969012B (en) Text error correction method and device, storage medium and electronic equipment
CN108877782B (en) Speech recognition method and device
CN108932220A (en) article generation method and device
US11651015B2 (en) Method and apparatus for presenting information
CN110019742B (en) Method and device for processing information
CN109740167B (en) Method and apparatus for generating information
CN116127020A (en) Method for training generated large language model and searching method based on model
CN111382261B (en) Abstract generation method and device, electronic equipment and storage medium
CN111582360A (en) Method, apparatus, device and medium for labeling data
CN111008213B (en) Method and apparatus for generating language conversion model
CN111414453A (en) Structured text generation method and device, electronic equipment and computer readable storage medium
CN111104796B (en) Method and device for translation
CN110232920B (en) Voice processing method and device
CN116955561A (en) Question answering method, question answering device, electronic equipment and storage medium
CN111125154B (en) Method and apparatus for outputting structured query statement
CN114020774A (en) Method, device and equipment for processing multiple rounds of question-answering sentences and storage medium
CN114357195A (en) Knowledge graph-based question-answer pair generation method, device, equipment and medium
CN113723095A (en) Text auditing method and device, electronic equipment and computer readable medium
CN109036554B (en) Method and apparatus for generating information
CN110705308A (en) Method and device for recognizing field of voice information, storage medium and electronic equipment
CN109800438B (en) Method and apparatus for generating information
CN116821327A (en) Text data processing method, apparatus, device, readable storage medium and product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant