CN117453717A - Data query statement generation method, device, equipment and storage medium


Info

Publication number
CN117453717A
CN117453717A (application CN202311468107.4A)
Authority
CN
China
Prior art keywords
data query, sentence, original, chain of thought
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311468107.4A
Other languages
Chinese (zh)
Inventor
谭锋镭
王墨
谢俊言
夏正勋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Transwarp Technology Shanghai Co Ltd
Original Assignee
Transwarp Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Transwarp Technology Shanghai Co Ltd filed Critical Transwarp Technology Shanghai Co Ltd
Priority to CN202311468107.4A
Publication of CN117453717A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/2455 Query execution
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The embodiment of the invention discloses a data query statement generation method, a device, equipment and a storage medium. The method comprises: acquiring a process display requirement and a natural sentence to be converted; constructing a prompt instruction according to the process display requirement; inputting the natural sentence to be converted into a pre-trained data query statement generation model based on the prompt instruction, and determining and outputting, according to the model output, an intermediate generation result corresponding to the process display requirement and a target data query statement. The pre-trained data query statement generation model is a generative large language model trained on pre-constructed chains of thought, where each pre-constructed chain of thought is obtained by logically segmenting the logical plan corresponding to a data query statement. Because the logic-driven conversion process from the natural sentence to be converted to the target data query statement is made explicit, the statement conversion process becomes measurable and controllable, improving the interpretability of the output result and the transparency of the target data query statement generation process.

Description

Data query statement generation method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating a data query statement.
Background
In the current digital age, there is an increasing need to extract useful information from large data sets. Data query languages are widely used in databases for extracting, modifying and managing data; for example, Structured Query Language (SQL) is commonly used for data management in relational databases. However, writing valid data query statements can be a challenging task for users who are unfamiliar with the query language or the database structure. To simplify user interaction with databases, converting natural language into data query statements has become a research hotspot.
Large language models, deep learning models trained on large amounts of text, have been applied to generate data query statements from natural language questions provided by users, so that non-professional users can issue query requests directly in natural language without learning complex query-language grammar or database structures. Various approaches exist for converting natural language into data query statements: mapping natural language to query statements using manually formulated rules and templates; modeling and matching natural language and corresponding query statements using machine learning algorithms and training data; mapping natural language to query statements end to end using deep learning techniques; and learning query-statement generation strategies through interaction with the database using reinforcement learning algorithms.
However, each of these approaches has drawbacks. Manually formulated rules and templates require substantial manual effort and expertise, and are difficult to adapt to complex and diverse query requirements. Machine learning algorithms based on maximum entropy models, hidden Markov models, conditional random fields and the like may perform poorly in specific fields or on complex queries that lack training data. Deep learning methods, such as models based on recurrent neural networks with attention mechanisms or on transformers, can convert natural language into data query statements more flexibly and accurately through large-scale training data and parameter optimization, but they require large amounts of labeled data and computing resources, and their interpretability is relatively weak. Reinforcement-learning-based generation requires long interaction and training time and incurs high database access costs. None of these methods makes the problem-solving process from natural language to data query statement intuitive and controllable, so users find it difficult to obtain a logical explanation during actual operation, and the needs of non-professional users for data query statement generation are hard to satisfy.
Disclosure of Invention
The invention provides a data query statement generation method, a device, equipment and a storage medium, which convert natural language into data query statements through logic driving, so that the statement conversion process is measurable and controllable, the interpretability of the output result is improved, the transparency of the data query statement generation process is enhanced, and possible errors in the data query statement generation process are reduced.
In a first aspect, an embodiment of the present invention provides a method for generating a data query statement, including:
acquiring a process display requirement and a natural sentence to be converted;
constructing a prompt instruction according to the process display requirement;
inputting the natural sentence to be converted into a pre-trained data query statement generation model based on the prompt instruction, and determining and outputting, according to the model output, an intermediate generation result corresponding to the process display requirement and a target data query statement;
wherein the pre-trained data query statement generation model is a generative large language model trained on pre-constructed chains of thought, and a pre-constructed chain of thought is obtained by logically segmenting the logical plan corresponding to a data query statement.
In a second aspect, an embodiment of the present invention provides a data query statement generating device, including:
a requirement and sentence acquisition module, configured to acquire a process display requirement and a natural sentence to be converted;
an instruction construction module, configured to construct a prompt instruction according to the process display requirement;
a query statement generation module, configured to input the natural sentence to be converted into a pre-trained data query statement generation model based on the prompt instruction, and to determine and output, according to the model output, an intermediate generation result corresponding to the process display requirement and a target data query statement;
wherein the pre-trained data query statement generation model is a generative large language model trained on pre-constructed chains of thought, and a pre-constructed chain of thought is obtained by logically segmenting the logical plan corresponding to a data query statement.
In a third aspect, an embodiment of the present invention further provides a data query statement generating device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data query statement generation method provided by the embodiments of the present invention.
In a fourth aspect, embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the data query statement generation method provided by the embodiments of the present invention.
The embodiment of the invention provides a data query statement generation method, a device, equipment and a storage medium. A process display requirement and a natural sentence to be converted are acquired; a prompt instruction is constructed according to the process display requirement; the natural sentence to be converted is input into a pre-trained data query statement generation model based on the prompt instruction, and an intermediate generation result corresponding to the process display requirement and a target data query statement are determined and output according to the model output. The pre-trained data query statement generation model is a generative large language model trained on pre-constructed chains of thought, where each chain of thought is obtained by logically segmenting the logical plan corresponding to a data query statement. With this technical scheme, because the model was trained on chains of thought built by logically segmenting the logical plans of data query statements, the model can, when applied to the conversion of a natural sentence into a data query statement, generate the corresponding chain of thought to indicate the conversion logic from the natural sentence to the target data query statement. Meanwhile, a prompt instruction can be constructed in advance from the process display requirement given by the user, guiding the model to output and display information such as the chain of thought containing the logical conversion process as an intermediate generation result. The logic-driven conversion process from the natural sentence to the target data query statement is thus made explicit and controllable, the interpretability of the output result is improved, and the transparency of the target data query statement generation process is enhanced. At the same time, erroneous conversions can be located precisely by analyzing the chain of thought, so that errors arising in the conversion from the natural sentence to the target data query statement can be corrected, improving generation accuracy.
It should be understood that the description of this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for generating a data query statement according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for generating a data query statement according to a second embodiment of the present invention;
FIG. 3 is a training flowchart of a data query statement generation model according to a second embodiment of the present invention;
FIG. 4 is a flowchart illustrating determining the original chain of thought corresponding to each piece of sentence conversion data according to its original logical plan, and determining the original training sample set corresponding to the sentence conversion data set according to the sentence conversion data and the original chains of thought, according to a second embodiment of the present invention;
FIG. 5 is a flowchart illustrating determining the optimized chain of thought corresponding to each piece of sentence conversion data according to its optimized logical plan, according to a second embodiment of the present invention;
FIG. 6 is a flowchart illustrating determining the optimized training sample set corresponding to the sentence conversion data set according to the sentence conversion data, the original chains of thought, the optimized chains of thought, and the intermediate model outputs, according to a second embodiment of the present invention;
fig. 7 is a schematic structural diagram of a data query statement generating device according to a third embodiment of the present invention;
fig. 8 is a schematic structural diagram of a data query statement generating device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will be made in detail and with reference to the accompanying drawings in the embodiments of the present application, it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a method for generating a data query statement according to an embodiment of the present invention. The embodiment is applicable to converting a natural-language question posed by a user into a data query statement that can be used for data query and management in a database. The method may be performed by a data query statement generating device, which may be configured in data query statement generating equipment. Optionally, the data query statement generating equipment may be a notebook, a desktop computer, a smart tablet, or the like, which is not limited in the embodiment of the present invention.
As shown in fig. 1, the method for generating a data query statement provided by the embodiment of the invention specifically includes the following steps:
s101, acquiring a process display requirement and a natural sentence to be converted.
In this embodiment, the process display requirement is specifically understood as a requirement, proposed and input by a user, that specifies how the natural-sentence-to-data-query-statement conversion logic and the final conversion target are expected to be displayed. The natural sentence to be converted is specifically understood as a natural-language question description, proposed and input by a user, that needs to be converted into a data query statement.
Specifically, digital assistant products in the financial field often answer customer questions intelligently. A conventional intelligent question answering system can retrieve answers from a knowledge base based on the customer's question, but it often cannot answer questions whose results are available only after a database query. In this scheme, a new data query statement generation model is constructed, so that a data query statement usable for database queries can be generated from a natural sentence. A question raised by a customer can be received from outside, the natural-language customer question is determined as the natural sentence to be converted, and the user can be prompted to input a display requirement for the conversion logic from the natural sentence to the data query statement and for the final conversion target; this display requirement is determined as the process display requirement.
It can be understood that the process display requirement may include only the final display result expected by the user, without any intermediate-process display requirement. The user can be prompted according to the types of intermediate results that the data query statement generation model is able to generate, so that the user inputs a process display requirement that matches the output capability of this scheme.
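As a minimal Python sketch of this step (the category names and the validation helper are illustrative assumptions, not taken from the patent), the user's requirement can be checked against the intermediate- and target-result types the model is able to produce:

```python
# Hypothetical sketch: check a user's process display requirement against the
# result types the data query statement generation model can actually output.
# All names here are illustrative assumptions, not from the patent.

SUPPORTED_INTERMEDIATE_RESULTS = {
    "original_chain_of_thought",
    "optimized_chain_of_thought",
    "original_query_statement",
}
SUPPORTED_TARGET_RESULTS = {
    "original_query_statement",
    "optimized_query_statement",
}


def validate_display_requirement(requested):
    """Accept only requirement items this scheme's output capability covers."""
    supported = SUPPORTED_INTERMEDIATE_RESULTS | SUPPORTED_TARGET_RESULTS
    unknown = [item for item in requested if item not in supported]
    if unknown:
        raise ValueError(f"unsupported display requirements: {unknown}")
    return list(requested)
```

A rejected item can then be used to re-prompt the user with the list of supported result types.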
S102, constructing a prompt instruction according to the process display requirement.
In this embodiment, a prompt instruction is specifically understood as an artificial-intelligence prompt that uses natural language to instruct or stimulate an artificial-intelligence model to complete a specific task. Here it is an instruction, constructed according to the process display requirement, that prompts the data query statement generation model about the content to be input at different stages and the content it should output.
Specifically, the process display requirement is split to identify the user's requirement for the final data query statement and the user's requirement for displaying intermediate results during the conversion of the natural sentence into the data query statement. The obtained intermediate-result requirements and data-query-statement requirement are arranged in execution order, and the process from one requirement to the next is determined as a task. Each task is described explicitly using a preset template or according to the task goal, required operations, related conditions and the like; the task templates are written with common vocabulary, taking the context into account. The prompt instruction corresponding to the process display requirement is then obtained by combining the tasks corresponding to the templates in chronological order.
It should be clear that the constructed prompt should be straightforward, easy to understand, and relevant to the task description.
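The splitting-and-templating procedure above can be sketched as follows; the template wording and step names are assumptions for illustration only:

```python
# Hypothetical sketch of S102: split the process display requirement into
# ordered sub-requirements, render one task description per step from a preset
# template, and join them in execution order into a single prompt instruction.

TASK_TEMPLATES = {
    "chain_of_thought": (
        "Step {idx}: From the natural-language question, derive a chain of "
        "thought that lays out the query logic operation by operation."
    ),
    "query_statement": (
        "Step {idx}: From the chain of thought above, write the corresponding "
        "data query statement."
    ),
}


def build_prompt_instruction(ordered_requirements):
    """Combine per-task prompts in execution order into one instruction."""
    parts = [
        TASK_TEMPLATES[req].format(idx=i + 1)
        for i, req in enumerate(ordered_requirements)
    ]
    return "\n".join(parts)
```

For example, `build_prompt_instruction(["chain_of_thought", "query_statement"])` yields a two-step instruction that first asks for the intermediate chain of thought and then for the final query statement.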
S103, inputting the natural sentence to be converted into a pre-trained data query statement generation model based on the prompt instruction, and determining and outputting, according to the model output, an intermediate generation result corresponding to the process display requirement and a target data query statement.
The pre-trained data query statement generation model is a generative large language model trained on pre-constructed chains of thought, and a pre-constructed chain of thought is obtained by logically segmenting the logical plan corresponding to a data query statement.
In this embodiment, the data query statement generation model is understood as a generative large language model that converts a natural-language sentence input to it into a chain of thought, and converts a chain of thought input to it into a data query statement. A chain of thought (CoT) is specifically understood as an improved prompting strategy for enhancing the performance of large language models on complex reasoning tasks; it may include derivation hints between the input and the output.
Specifically, each data query statement for a database has execution logic; before it is converted into a physical execution plan, this logic can be called the logical plan corresponding to the data query statement. Realizing one logical plan requires several logical execution operations processed serially or in parallel. These operations have dependency relationships, from which the execution order of each operation and the serial or parallel execution mode can be determined. By logically segmenting the logical plan, the execution logic of the data query statement can be determined, and a chain of thought can be used to describe the derivation hints of each logical execution operation in that execution logic. Therefore, the chain of thought corresponding to a data query statement can be constructed based on its logically segmented logical plan and used as an intermediate node between natural language and the data query statement to train the data query statement generation model. The trained model then has both the ability to output a chain of thought from a natural sentence input to it and the ability to output a data query statement from a chain of thought input to it.
Since the chain of thought serving as the intermediate node contains the reasoning hints from natural sentence to data query statement, when the natural sentence to be converted is input into the pre-trained model based on the prompt instruction, not only can the target data query statement corresponding to the natural sentence be obtained, but the chain of thought containing the reasoning hint information can also be obtained as an intermediate generation result. The intermediate generation result that the model needs to output is determined according to the process display requirement, and that intermediate result is fed back into the model as input, so that the final model output is determined as the target data query statement.
According to the technical scheme of this embodiment, a process display requirement and a natural sentence to be converted are acquired; a prompt instruction is constructed according to the process display requirement; the natural sentence to be converted is input into a pre-trained data query statement generation model based on the prompt instruction, and an intermediate generation result corresponding to the process display requirement and a target data query statement are determined and output according to the model output. Because the applied model is a generative large language model trained on pre-constructed chains of thought, each obtained by logically segmenting the logical plan corresponding to a data query statement, the model can, when converting a natural sentence into a data query statement, generate the corresponding chain of thought to indicate the conversion logic to the target data query statement. Meanwhile, the prompt instruction constructed in advance from the user's process display requirement guides the model to output and display the chain of thought containing the logical conversion process as an intermediate generation result. The logic-driven conversion process from the natural sentence to the target data query statement is thus made explicit and controllable, the interpretability of the output result is improved, and the transparency of the generation process is enhanced. Moreover, erroneous conversions can be located precisely by analyzing the chain of thought, so that errors arising in the conversion to the target data query statement can be corrected, improving generation accuracy.
Example two
Fig. 2 is a flowchart of a data query statement generation method provided by a second embodiment of the present invention. The technical solution of this embodiment further optimizes the alternative solutions above. By analyzing the process display requirement and dividing it into an intermediate output requirement and a target output requirement, an intermediate prompt question template and a target prompt question template are determined for the two requirements respectively, and the templates are spliced to obtain a prompt instruction that prompts the data query statement generation model to produce output according to the user's process display requirement. The model can then generate an intermediate generation result for the input natural sentence according to the intermediate prompt question template, and obtain the target data query statement from the input intermediate generation result according to the target prompt question template. This displays the target data query statement generation process, makes the statement conversion process measurable and controllable, improves the interpretability of the output result, and enhances the transparency of the generation process.
This embodiment also provides a training method for the data query statement generation model. From an acquired sentence conversion data set, logical plans corresponding to the data query statements in the set are determined at different degrees of optimization; the logical plans are then logically segmented according to their dependency relationships to construct corresponding original chains of thought and optimized chains of thought. An original training sample set is formed from the original chains of thought and the sentence conversion data set to complete preliminary training of the initial model, so that the resulting intermediate data query statement generation model has the natural sentence -> chain of thought -> data query statement generation capability while preserving maximal logical-semantic information. Construction of the optimized training sample set is then completed based on the optimized chains of thought and the intermediate model, and the intermediate model is further trained on the optimized set. The final model therefore has the natural sentence -> chain of thought -> data query statement generation capability both with maximal logical-semantic information preserved and with optimized logical-semantic information, as well as the ability to convert between the two kinds of chain of thought under different logical-semantic information. This enriches the types of intermediate generation result the model can produce, allows the generation logic from natural sentence to data query statement to be displayed more clearly and in more detail, and, through the multi-level logical analysis contained in the chains of thought, makes it possible to locate and correct errors that may occur during generation, improving the accuracy of converting natural sentences into target data query statements.
As shown in fig. 2, a method for generating a data query statement according to a second embodiment of the present invention specifically includes the following steps:
s201, acquiring a process display requirement and a natural sentence to be converted.
S202, determining an intermediate output requirement and a target output requirement according to the process display requirement.
The intermediate output requirements at least comprise an original thinking chain output requirement, an optimized thinking chain output requirement and an original data query statement output requirement; the target output requirements include at least an original data query statement output requirement and an optimized data query statement output requirement.
In this embodiment, the original thinking chain output requirement can be specifically understood as the requirement that the data query statement generation model output, according to the input natural sentence, the corresponding non-optimized thinking chain in which the logic semantic information is retained to the greatest extent. The optimized thinking chain output requirement can be specifically understood as the requirement that the data query statement generation model output, according to the input natural sentence or original thinking chain, the corresponding optimized thinking chain whose logic semantic information has been processed by an optimization strategy such as predicate pushdown. The original data query statement output requirement can be specifically understood as the requirement that the data query statement generation model output the corresponding original data query statement according to the input original thinking chain. Each of these output requirements can be regarded as the output requirement for an intermediate result on the way from the natural language sentence to the data query statement finally required by the user, and thus each can be regarded as an intermediate output requirement. The optimized data query statement output requirement can be specifically understood as the requirement that the data query statement generation model output the corresponding optimized data query statement according to the input optimized thinking chain.
It can be understood that both the original data query statement and the optimized data query statement can serve as the data query statement finally required by the user; they merely represent the different results the data query statement generation model produces for the different thinking chains input when executing the generation task. Therefore, both the original data query statement output requirement and the optimized data query statement output requirement can serve as selectable target output requirements.
Optionally, the prompt of the intermediate output requirement and the target output requirement can be set based on the capability of the training completed data query statement generation model, so that the user can realize the configuration of the process display requirement according to the requirement.
S203, determining a corresponding middle prompt problem template according to the middle output requirement.
The intermediate prompt problem template is used for prompting the input and output targets from the original data to the intermediate output requirement.
Specifically, the original data expected to be input into the data query statement generation model is determined according to the specific intermediate output requirement, and the intermediate output the data query statement generation model is expected to produce for that input is determined; an intermediate prompt question template corresponding to the intermediate output requirement, used for prompting the input/output target from the original data to the intermediate output requirement, is then determined according to the original data, the intermediate output, the conversion target and the required operation. Optionally, the intermediate prompt question template may be preset according to the intermediate results the data query statement generation model can output, or may be determined in real time according to the intermediate output requirement given by the user each time, which is not limited in the embodiment of the present invention.
For example, assuming the intermediate output requirement is the original thinking chain output requirement, the intermediate prompt question template can be expressed as: "The natural sentence to be converted that the user wants to convert is as follows: { }. Please generate the original thinking chain containing a detailed description of each implementation step according to the natural sentence to be converted." The above intermediate prompt question template is only one example provided in the embodiment of the present invention; the specific implementation may be adapted to the actual situation, which is not limited in the embodiment of the present invention.
S204, determining a corresponding target prompt problem template according to the target output requirement.
The target prompting problem template is used for prompting the middle output corresponding to the middle output requirement to the input and output target of the target output requirement.
Specifically, the intermediate output, corresponding to the intermediate output requirement, that is input to the data query statement generation model is determined according to the specific target output requirement, together with the data query statement the user expects the data query statement generation model to finally output for that intermediate output; a target prompt question template corresponding to the target output requirement, used for prompting the input/output target from the intermediate output to the target output requirement, is then determined according to the intermediate output, the data query statement, the conversion target and the required operation. Optionally, the target prompt question template may be preset according to the data query statements the data query statement generation model can output and is expected to output, or may be determined in real time according to the target output requirement given by the user each time, which is not limited in the embodiment of the present invention.
For example, assuming that the target output requirement is the original data query statement output requirement, the target prompt question template can be expressed as: "The implementation steps are as follows: { }. Please generate, step by step, the original data query statement corresponding to the question according to the above steps." The { } may be filled with the step description information contained in the original thinking chain corresponding to the original data query statement. The above target prompt question template is only one example provided in the embodiment of the present invention; the specific implementation may be adapted to the actual situation, which is not limited in the embodiment of the present invention.
S205, splicing the intermediate prompt question template and the target prompt question template, and determining the spliced result as the prompt word instruction.
Specifically, the intermediate prompt question templates are ordered according to the execution sequence, spliced with the target prompt question template in that order, and the spliced content is determined as the prompt word instruction.
It can be understood that the user need not give an intermediate output requirement when determining the process display requirement; if only a target output requirement is given, the prompt word instruction is simply the target prompt question template.
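The template filling and splicing described in S203 to S205 can be sketched in Python. The template strings, dictionary keys and the `build_prompt_instruction` function below are illustrative assumptions, not part of the embodiment:

```python
# Hypothetical sketch of S203-S205: select hint templates for the given
# output requirements and splice them into one prompt word instruction.
# All template wording here is illustrative; "{input}" marks the slot to
# be filled at inference time.

INTERMEDIATE_TEMPLATES = {
    "original_thinking_chain": (
        "The natural sentence to be converted is as follows: {input}. "
        "Please generate the original thinking chain with a detailed "
        "description of each implementation step."
    ),
}

TARGET_TEMPLATES = {
    "original_data_query_statement": (
        "The implementation steps are as follows: {input}. Please generate, "
        "step by step, the original data query statement for the question."
    ),
}

def build_prompt_instruction(intermediate_reqs, target_req):
    """Order the intermediate hint templates by execution sequence and
    append the target hint template (the splicing of S205)."""
    parts = [INTERMEDIATE_TEMPLATES[r] for r in intermediate_reqs]
    parts.append(TARGET_TEMPLATES[target_req])
    return "\n".join(parts)

prompt = build_prompt_instruction(
    ["original_thinking_chain"], "original_data_query_statement")
```

If the user gives no intermediate output requirement, `intermediate_reqs` is empty and the instruction degenerates to the target template alone, matching the note above.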
S206, inputting the natural sentence to be converted into a pre-trained data query statement generation model according to the intermediate prompt question template, and determining the obtained model output result as the intermediate generation result.
Specifically, when there is only one intermediate prompt question template, the natural sentence to be converted is input into the pre-trained data query statement generation model according to that template, and the obtained model output result is determined as the intermediate generation result. When there are a plurality of intermediate prompt question templates, the corresponding content is input into the data query statement generation model according to the input requirement contained in each intermediate prompt question template, and each output of the data query statement generation model is determined as the intermediate generation result corresponding to that template. It can be understood that, at this time, the input corresponding to the first intermediate prompt question template should be the natural sentence to be converted, while the input corresponding to each subsequent intermediate prompt question template may be the model output produced for the preceding template.
S207, inputting the intermediate generation result into a data query statement generation model according to the target prompt problem template, and determining the obtained model output result as a target data query statement.
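The chained inference of S206 and S207 can be sketched as follows. `generate` is a stub standing in for the pre-trained data query statement generation model, and the template strings are hypothetical:

```python
# Hypothetical sketch of S206-S207: each intermediate template's model
# output feeds the next template; the first input is the natural sentence,
# and the target template consumes the last intermediate result.

def generate(prompt: str) -> str:
    """Stub for the data query statement generation model."""
    return f"<output for: {prompt}>"

def run_pipeline(natural_sentence, intermediate_templates, target_template):
    current = natural_sentence
    intermediate_results = []
    for template in intermediate_templates:          # S206, chained
        current = generate(template.format(input=current))
        intermediate_results.append(current)
    target = generate(target_template.format(input=current))  # S207
    return intermediate_results, target

mids, sql = run_pipeline(
    "How many companies completed equity incentives?",
    ["Generate the original thinking chain for: {input}"],
    "Generate the data query statement from these steps: {input}")
```

Because every intermediate result is returned alongside the target statement, the caller can display the whole generation process, which is the point of the process display requirement.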
Further, before the process display requirement and the natural sentence to be converted are acquired, training of the data query statement generation model is completed. Fig. 3 is a training flowchart of the data query statement generation model provided in the second embodiment of the present invention; as shown in fig. 3, the training specifically includes the following steps:
S301, acquiring a statement conversion data set.
The sentence conversion data set comprises at least two groups of sentence conversion data, and each group of sentence conversion data comprises natural sentences, semantic description information and data query sentences which are in one-to-one correspondence.
In this embodiment, the statement conversion data set may be specifically understood as a set of multiple groups of database table description information, query questions and data query statements having a corresponding relationship, obtained based on the scenario to be applied; each group of database table description information, query question and data query statement may serve as one group of statement conversion data in the statement conversion data set. It can be understood that the natural sentence in the statement conversion data is the query question, given in natural language, posed when a user needs to query data in the database; the semantic description information is the description information, containing semantics, of the database table required by the data query statement.
For example, taking a financial database assisted query scenario as an example, the question types that a financial digital assistant can process include table questions, information retrieval questions, content summarization questions, hotspot search questions and the like, where table questions are those whose results can only be produced by relying on database queries. When the statement conversion data set is collected, query questions and corresponding data query statements can be organized according to the table description information and data in the financial database, and each group of table description information, query question and data query statement serves as one group of statement conversion data in the statement conversion data set. Taking the equity incentive table t_inc_shop_info in the financial database as an example, each group of statement conversion data obtained after corpus arrangement can comprise: the natural sentence, i.e. the query question described in natural language, referred to as Query for short; the data query statement corresponding to the natural sentence, taking SQL query statements as an example and subsequently referred to as the SQL code; and the description information containing semantics, i.e. the semantic description information, referred to here simply as the schema. An example of one group of statement conversion data is as follows:
{ "query": "From January 1, 2022 to now, how many listed companies have successfully implemented equity incentives?",
"sql": "select count(distinct S_INFO_WINDCODE) from t_inc_shop_info where trunc(now(), 'YYYY-MM-DD') >= '2022-01-01' and progress_name = 'completed'",
"schema": "create table t_inc_shop_info (\n id,\n s_info_windcode, stock code,\n s_info_name, stock name,\n preplan_ann_date, draft announcement date,\n s_inc_first, grant date,\n s_inc_initeecpri, grant price,\n s_inc_square, total number of incentives (tens of thousands),\n inc_return_rate, equity incentive return (%),\n cit_1, industry classification level 1,\n cit_2, industry classification level 2,\n cit_3, industry classification level 3,\n s_info_province, province,\n s_info_city, city,\n wind_sec_code, corporate nature,\n progress_name, project progress,\n inc_exec_set varchar2(50), unlocking ratio of each period,\n … (further columns: lock-up time of each batch (months); incentive tool; nominal discount rate = grant price / benchmark price; actual discount rate = grant price / grant-day closing price; number of incentives in the same period counted by draft announcement day; cumulative number of incentive periods; number of incentive recipients; total number of employees; lock-up period (months/years); listing date; number of days from first listing trade to first incentive, negative values indicating that the equity incentive was implemented before listing; agency name))",
"lan": "zh" }
S302, determining an original thinking chain corresponding to each sentence conversion data according to an original logic plan corresponding to each sentence conversion data, and determining an original training sample set corresponding to the sentence conversion data set according to each sentence conversion data and each original thinking chain.
In this embodiment, the original logic plan may be specifically understood as the logic plan generated by compiling the SQL in the statement conversion data without any optimization strategy, which retains the logic semantic information in the SQL to the greatest extent. The original thinking chain may be specifically understood as a thinking chain, containing logic execution introductions at different levels, constructed after segmenting and hierarchically dividing the logic relationship chain contained in the original logic plan according to the dependency relationships. The original training sample set may be specifically understood as the sample set used for the preliminary training of the data query statement generation model, so that the model acquires the ability to determine the original thinking chain from the natural sentence and to determine the data query statement corresponding to the natural sentence from the original thinking chain.
Specifically, for each group of statement conversion data, the data query statement is compiled without optimization processing to obtain an original logic plan that retains the logic semantic information in the data query statement to the greatest extent; the original logic plan may take the form of a logic plan syntax tree. The branches of the logic plan syntax tree are split according to the dependency relationships to determine the serial/parallel relations and hierarchical relations comprising a plurality of execution operations and the operation steps corresponding to each execution operation. A description of each execution operation and operation step is then determined from these serial/parallel and hierarchical relations in combination with the semantic description information in the statement conversion data, and the descriptions are combined to form the original thinking chain corresponding to the original logic plan. Finally, the original thinking chain is associated with the statement conversion data to construct the original training sample corresponding to each group of statement conversion data, and the set formed by the original training samples is determined as the original training sample set.
Optionally, fig. 4 is a flowchart illustrating a process of determining an original thinking chain corresponding to each sentence conversion data according to an original logic plan corresponding to each sentence conversion data, and determining an original training sample set corresponding to a sentence conversion data set according to each sentence conversion data and each original thinking chain, as shown in fig. 4, which specifically includes the following steps:
s3021, determining an original logic plan corresponding to the data query statement in the statement conversion data for each group of the statement conversion data.
Specifically, the statement conversion data set is traversed, so that the data query statement contained in each group of statement conversion data can be determined, and then the original logic plan corresponding to each data query statement can be determined by compiling each data query statement without optimization strategy.
Following the above example, assume the statement conversion data set is represented as a Query-SQL data set in which each one-to-one Query-SQL-schema group serves as one group of statement conversion data. When the original training sample set is generated, the Query-SQL data set is first traversed and one original training sample is generated for each group of statement conversion data: the SQL query statement in each group of statement conversion data is extracted and may be recorded as sql_i, where i is the sample sequence number, and sql_i is then input to the query performance analysis tool provided by the database management system to generate the logic plan corresponding to sql_i. Optionally, in the embodiment of the present invention, a visualized original logic plan can be obtained using Oracle SQL Developer.
S3022, performing logic segmentation on the original logic plan according to the dependency relationship, and determining an original logic segmentation result comprising an execution operation hierarchy sequence and an operation step hierarchy sequence.
Following the above example, since the original logic plan can be represented in the form of a logic plan syntax tree containing a plurality of logic relationships between different execution operations, a relationship-link splitting method can be adopted: each execution operation is assigned to a stagei_Lj according to the branches of the logic plan syntax tree, and the serial and parallel processing relations between the execution operations are determined according to their execution dependencies, where i is the execution operation sequence number and j is the execution operation level sequence number. Execution operations with no execution dependency between them can be placed at the same execution operation level for parallel processing; otherwise they are placed at different levels for serial processing. Further, each stagei_Lj can be segmented into different operation steps stepk_Ph according to different data inputs and different data processing modes, and the serial and parallel processing relations between the operation steps are determined according to their execution dependencies, where k is the operation step sequence number and h is the operation step level sequence number. Operation steps on the same input data with the same data processing mode can be processed serially at the same operation step level, while operation steps involving different input data or different data processing modes are processed in parallel at different operation step levels. In this way, the original logic segmentation result, comprising the execution operation level sequence and the operation step level sequence, corresponding to the original logic plan is obtained.
In the embodiment of the invention, the parallel computing capacity of the multi-core processor and the distributed system can be exerted to the greatest extent by constructing the original logic segmentation result comprising the execution operation level sequence and the operation step level sequence, and the data operation efficiency is improved.
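The dependency-based level assignment of S3022 can be sketched as a topological leveling of a dependency graph. The operation names and the `deps` mapping below are hypothetical; the sketch only illustrates the rule that operations without execution dependencies between them share a level and can therefore run in parallel:

```python
# Hypothetical sketch of S3022: assign execution operations to levels
# according to their dependencies. `deps` maps each operation to the
# operations it depends on; operations in the same level are independent.

def split_into_levels(deps):
    levels, placed = [], set()
    remaining = set(deps)
    while remaining:
        # operations whose dependencies are all already placed form a level
        ready = sorted(op for op in remaining
                       if all(d in placed for d in deps[op]))
        if not ready:
            raise ValueError("cyclic dependency in the logic plan")
        levels.append(ready)
        placed.update(ready)
        remaining.difference_update(ready)
    return levels

# scan -> two independent filters (same level) -> count
deps = {"scan": [],
        "filter_date": ["scan"],
        "filter_progress": ["scan"],
        "count": ["filter_date", "filter_progress"]}
levels = split_into_levels(deps)
```

The two filters land in the same level because neither depends on the other, mirroring the parallel-processing rule stated above.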
S3023, determining an original thinking chain corresponding to the sentence conversion data according to the original logic segmentation result and the semantic description information, and determining an original training sample corresponding to the sentence conversion data according to the natural sentence, the original thinking chain and the data query sentence so as to obtain an original training sample set formed by the original training samples.
The original training samples comprise a first original training sub-sample formed by a natural sentence and an original thinking chain and a second original training sub-sample formed by the original thinking chain and a data query sentence.
Specifically, since the original logic segmentation result includes the execution operation level sequence and the operation step level sequence, which indicate the execution relations and execution structure of the corresponding data query statement during execution, an original thinking chain including step interpretations can be generated from it. To do so, the execution operations and operation steps are hierarchically encoded according to the execution operation level sequence and the operation step level sequence so as to identify the parallel and serial execution relations: execution operation level encoding and operation step level encoding are performed sequentially, and identical encodings mean that the corresponding execution operations or operation steps may be performed in parallel. After the step-by-step encoding, the semantic description information with semantic labels is used to generate a detailed description of the implementation of each execution operation and operation step, and the detailed descriptions are spliced according to the structure corresponding to the original logic segmentation result to obtain the original thinking chain corresponding to the statement conversion data.
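The level encoding and description splicing can be sketched as follows. The marker format imitates the [Lj][stage i] / <Ph><stepk> notation used in the example later in this embodiment, while the function name and the sample descriptions are hypothetical:

```python
# Hypothetical sketch of S3023: encode execution-operation and
# operation-step levels, then splice per-step descriptions into an
# original thinking chain string.

def render_thinking_chain(stages):
    """stages: list of (level_no, stage_no, title, steps), where each
    step is (step_level_no, step_no, description)."""
    lines = []
    for level, stage, title, steps in stages:
        lines.append(f"[L{level}][stage {stage}] {title}")
        for p, k, desc in steps:
            lines.append(f"<P{p}><step{k}> {desc}")
    return "\n".join(lines)

chain = render_thinking_chain([
    (1, 1, "Scan the t_inc_shop_info table.",
     [(1, 101, "Scan the table storing the equity incentive plans.")]),
    (2, 2, "Filter rows that satisfy the conditions.",
     [(1, 201, "Keep rows whose truncated date is >= '2022-01-01'."),
      (2, 202, "Keep rows whose progress_name equals 'completed'.")]),
])
```

Identical level numbers in the rendered markers signal steps that may run in parallel, which is what the encoding is meant to expose.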
Further, the original thinking chain can be used as the label of the natural sentence in the corresponding statement conversion data to construct a first original training sub-sample, and the data query statement in the statement conversion data can be used as the label of the corresponding original thinking chain to construct a second original training sub-sample; the combination of the first original training sub-sample and the second original training sub-sample is determined as the original training sample corresponding to the statement conversion data, and the set formed by the original training samples is determined as the original training sample set.
By way of example, the content generation capability of an existing large model can be combined with the tables and field information in the semantic description information schema to construct a prompt word instruction from the semantic description information schema and the data query statement SQL to the thinking chain CoT. Taking the equity incentive table t_inc_shop_info and the SQL exemplified in S301 as an example, the constructed prompt word instruction can be expressed as follows:
{ The known data table information is as follows: "". Converting it into the SQL statement "" requires multiple steps. Please clarify the content of each step, step by step, and generate the final SQL statement. In each step, the applicable SQL keywords, SQL functions, etc. must be made clear. }
The schema and the SQL in S301 can be filled into the two pairs of double quotation marks in the prompt word instruction respectively, so that the prompt word instruction guides the large model to output the specific step descriptions. After the large model outputs the thinking chain, the descriptions in it can be cleaned through manual verification or a third-party evaluation model such as METEOR or BERTScore, finally yielding the original thinking chain CoT_P corresponding to the SQL.
Following the above example, the CoT_P output by the large model can be described as:
[L1] [stage 1] Step 1: scan the t_inc_shop_info table.
<P1> <step101> Scan the related information table t_inc_shop_info storing employee option plans.
[L2] [stage 2] Step 2: filter the rows that satisfy the conditions.
<P1> <step201> Filter for data from January 1, 2022 onwards. The current date is acquired using the current-date function NOW(), then truncated with the TRUNC function into a day-granularity format; for example, the current date and time is truncated to the date level to obtain a date in the 'YYYY-MM-DD' format. The truncated date is then compared with '2022-01-01': if it is greater than or equal to '2022-01-01', the row satisfies the condition and is retained; if it is less than '2022-01-01', the row does not satisfy the condition and is discarded.
<P2> <step202> Filter out the companies that have successfully implemented the equity incentive. The progress_name field indicates the progress of the equity incentive plan: if progress_name equals 'completed', the company has successfully implemented the equity incentive, so the row satisfies the condition and is retained; otherwise the row is discarded.
[L3] [stage 3] Step 3: count the number of companies that satisfy the requirements.
<P1> <step301> Group the retained rows by the S_INFO_WINDCODE column. Then deduplicate the S_INFO_WINDCODE values in each group using the DISTINCT function to obtain the non-repeated S_INFO_WINDCODE values, count them with the COUNT function, and finally output the result using the SELECT statement.
Here, [stage 1] etc. indicate the execution operation sequence number of an execution operation in the original thinking chain; [L1], [L2] etc. indicate the execution operation level sequence numbers corresponding to different execution operations; <step101> etc. indicate the operation step sequence numbers of the different operation steps under one execution operation; and <P1> etc. indicate the operation step level sequence numbers corresponding to different operation steps under one execution operation.
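A minimal sketch of recovering the hierarchy from those markers with regular expressions, so that a downstream tool could check the parallel/serial relations; the function name and sample text are hypothetical, assuming the marker format shown above:

```python
# Hypothetical sketch: parse the [Lj] [stage i] and <Ph> <stepk> markers
# of a thinking chain back into (level, sequence-number) pairs.
import re

STAGE_RE = re.compile(r"\[L(\d+)\]\s*\[stage\s*(\d+)\]")
STEP_RE = re.compile(r"<P(\d+)>\s*<step(\d+)>")

def parse_markers(chain_text):
    stages = [(int(l), int(s)) for l, s in STAGE_RE.findall(chain_text)]
    steps = [(int(p), int(k)) for p, k in STEP_RE.findall(chain_text)]
    return stages, steps

text = ("[L1] [stage 1] Step one\n<P1> <step101> scan\n"
        "[L2] [stage 2] Step two\n<P1> <step201> filter\n"
        "<P2> <step202> filter")
stages, steps = parse_markers(text)
```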
S303, training the initial data query statement generation model through the original training sample set to obtain an intermediate data query statement generation model.
In this embodiment, the initial data query term generation model may be specifically understood as a data query term generation model that is not subjected to weight parameter adjustment.
Specifically, when training the initial data query statement generation model, two training tasks can be constructed from the original training sample set: one trains the initial data query statement generation model to generate a thinking chain description from a natural sentence, and the other trains it to generate a data query statement from an input thinking chain description.
Optionally, the initial data query statement generation model is trained with the first original training sub-sample set formed by the first original training sub-samples to obtain a staged intermediate data query statement generation model; the staged intermediate data query statement generation model is then trained with the second original training sub-sample set formed by the second original training sub-samples to obtain the intermediate data query statement generation model.
Specifically, since each first original training sub-sample takes the original thinking chain as the label of a natural sentence, the first original training sub-sample set formed by these sub-samples can be used to train the initial data query statement generation model, and the trained model is determined as the staged intermediate data query statement generation model, which can understand natural sentences and generate the corresponding original thinking chain descriptions. Next, the second original training sub-sample set, in which each data query statement serves as the label of its corresponding original thinking chain, is used to train the staged intermediate data query statement generation model, and the trained model is determined as the intermediate data query statement generation model, so that it can convert an original thinking chain description into a data query statement in addition to generating the original thinking chain description from the natural sentence.
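The construction of the two sub-sample sets can be sketched as follows; the function and variable names are hypothetical, and the actual model training is out of scope here:

```python
# Hypothetical sketch of S303's data preparation: from each
# (natural sentence, original thinking chain, data query statement)
# triple, build the two sub-sample sets used in the two training stages.

def build_subsample_sets(samples):
    first = [(nl, cot) for nl, cot, sql in samples]    # NL -> thinking chain
    second = [(cot, sql) for nl, cot, sql in samples]  # thinking chain -> SQL
    return first, second

samples = [
    ("how many companies completed equity incentives",
     "[L1][stage 1] scan the table ...",
     "select count(distinct S_INFO_WINDCODE) from t_inc_shop_info"),
]
first_set, second_set = build_subsample_sets(samples)
```

Stage one would fine-tune on `first_set` (thinking chain as the label), stage two on `second_set` (data query statement as the label), matching the sequential training described above.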
S304, determining an optimization thinking chain corresponding to each sentence conversion data according to the optimization logic plan corresponding to each sentence conversion data.
In this embodiment, the optimized logic plan may be specifically understood as the logic plan generated by compiling the SQL in the statement conversion data after processing with an optimization strategy, so that the generated logic plan optimizes the logic semantic information in the SQL. The optimized thinking chain is specifically understood as a thinking chain, containing logic execution introductions at different levels, constructed after segmenting and hierarchically dividing the logic relationship chain contained in the optimized logic plan according to the dependency relationships.
It can be understood that the generation of the optimized logic plan and the optimized thinking chain differs from the generation of the original logic plan and the original thinking chain only in whether an optimization strategy is applied to the data query statement in the statement conversion data; the processing is otherwise consistent with that in S302.
Fig. 5 is a flowchart illustrating determining an optimization thinking chain corresponding to each sentence conversion data according to an optimization logic plan corresponding to each sentence conversion data according to a second embodiment of the present invention, and as shown in fig. 5, specifically includes the following steps:
S3041, determining an optimized logic plan corresponding to the data query statement in the statement conversion data according to a preset optimized processing strategy for each group of statement conversion data.
S3042, performing logic segmentation on the optimized logic plan according to the dependency relationship, and determining an optimized logic segmentation result comprising an execution operation hierarchy sequence and an operation step hierarchy sequence.
S3043, determining an optimized thinking chain corresponding to the sentence conversion data according to the optimized logic segmentation result and the semantic description information.
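The segmentation in S3042 and the chain construction in S3043 can be sketched as follows. The plan representation (operators as integers, dependency edges as pairs) and the textual template are assumptions made for illustration only.

```python
# Sketch of dependency-based logic segmentation followed by thinking-chain
# construction. The plan encoding and the "Step N: ..." template are
# illustrative assumptions, not the patent's concrete format.
from collections import defaultdict

def segment_by_dependency(plan_edges, num_ops):
    """Group logic-plan operators into hierarchy levels: an operator's level
    is 1 + the maximum level of the operators whose output it depends on."""
    deps = defaultdict(list)
    for child, parent in plan_edges:   # `parent` consumes `child`'s output
        deps[parent].append(child)
    level = {}
    def lvl(op):
        if op not in level:
            level[op] = 1 + max((lvl(d) for d in deps[op]), default=0)
        return level[op]
    for op in range(num_ops):
        lvl(op)
    grouped = defaultdict(list)
    for op, l in level.items():
        grouped[l].append(op)
    return [sorted(grouped[l]) for l in sorted(grouped)]

def build_thinking_chain(levels, op_descriptions):
    """Render the segmentation result plus per-operator semantic
    descriptions as a multi-level thinking-chain text."""
    lines = []
    for i, ops in enumerate(levels, 1):
        for op in ops:
            lines.append(f"Step {i}: {op_descriptions[op]}")
    return "\n".join(lines)
```

For example, a plan with two table scans feeding a join feeding an aggregation yields three levels, so the chain first describes both scans, then the join, then the aggregation.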
S305, inputting each optimized thinking chain into an intermediate data query sentence generation model to obtain an intermediate model output result corresponding to each optimized thinking chain, and determining an optimized training sample set corresponding to the sentence conversion data set according to each sentence conversion data, each original thinking chain, each optimized thinking chain and each intermediate model output result.
Specifically, each optimized thinking chain is input in turn to the intermediate data query sentence generation model. Since the intermediate data query sentence generation model has the capability of determining data query sentences from thinking chains, the obtained intermediate model output result can be regarded as the data query sentence generated by the intermediate data query sentence generation model for the optimized thinking chain. At this point, the intermediate model output result can be compared with the data query statement contained in the statement conversion data, i.e. the output the model is expected to produce; whether the intermediate model output result can be adopted is determined according to the comparison result, and the optimized training sample set corresponding to the statement conversion data set is then determined according to the statement conversion data, the original thinking chains, the optimized thinking chains and the intermediate model output results.
Optionally, fig. 6 is a flowchart illustrating determining an optimized training sample set corresponding to a sentence conversion data set according to each sentence conversion data, each original thought chain, each optimized thought chain, and each intermediate model output result, and as shown in fig. 6, the flowchart specifically includes the following steps:
S3051, for each intermediate model output result, checking the intermediate model output result through the data query statement in the corresponding statement conversion data, and if the check passes, determining the intermediate model output result as the optimized data query statement corresponding to the statement conversion data.
For each intermediate model output result, the intermediate model output result and the data query statement in the corresponding statement conversion data can be compared and checked by a checker. If they are consistent, the intermediate model output result produced by the intermediate data query statement generation model can be regarded as correct; the intermediate model output result is then determined as the optimized data query statement corresponding to the statement conversion data, so as to form a correspondence between the optimized thinking chain and the optimized data query statement, which can be used to complete the optimization training of the intermediate data query statement generation model.
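The checker described above can be sketched minimally as a textual comparison. A real checker might instead execute both statements against a database and compare result sets; the normalization-then-compare approach below is an assumption of this sketch.

```python
# Hedged sketch of the checker: normalize whitespace, case, and a trailing
# semicolon, then compare the model's SQL with the expected SQL.
import re

def normalize_sql(sql: str) -> str:
    """Collapse whitespace and case so trivially different spellings of
    the same statement compare equal."""
    return re.sub(r"\s+", " ", sql.strip()).lower().rstrip(";")

def check_intermediate_output(model_output: str, expected_sql: str) -> bool:
    return normalize_sql(model_output) == normalize_sql(expected_sql)
```

Outputs that pass the check become optimized data query statements paired with their optimized thinking chains; failed outputs are discarded.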
S3052, determining an optimized training sample corresponding to the sentence conversion data according to the natural sentences, the original thinking chain, the optimized thinking chain and the optimized data query sentences in the sentence conversion data so as to obtain an optimized training sample set formed by the optimized training samples.
The optimization training samples comprise a first optimization training sub-sample formed by a natural sentence and an optimized thinking chain, a second optimization training sub-sample formed by an optimized thinking chain and an optimized data query sentence, and a third optimization training sub-sample formed by an original thinking chain and an optimized thinking chain.
Specifically, a first optimization training sub-sample is constructed by taking the optimized thinking chain as the label of the natural sentence in the corresponding sentence conversion data, so that the first optimization training sub-sample can be used to train the model's capability of generating an optimized thinking chain from a natural sentence; a second optimization training sub-sample is constructed by taking the optimized data query sentence as the label of the corresponding optimized thinking chain, so that the second optimization training sub-sample can be used to train the capability of generating an optimized data query sentence from an optimized thinking chain; and a third optimization training sub-sample is constructed by taking the optimized thinking chain as the label of the corresponding original thinking chain, so that the third optimization training sub-sample can be used to train the model's capability of converting an original thinking chain, formed from the non-optimized logic plan, into the optimized thinking chain formed from the optimized logic plan. Further, the first, second and third optimization training sub-samples are combined to determine the optimization training sample corresponding to the sentence conversion data, and the set of all optimization training samples is determined as the optimization training sample set.
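The three sub-samples above reduce to three (input, label) pairs per record, which can be sketched as follows (the function name and argument order are illustrative assumptions):

```python
# Sketch: for one record, build the three optimization training sub-samples —
# NL -> optimized chain, optimized chain -> SQL, original chain -> optimized
# chain. Names are illustrative assumptions only.
def build_optimization_subsamples(natural_sentence, original_chain,
                                  optimized_chain, optimized_sql):
    first = (natural_sentence, optimized_chain)   # generation capability
    second = (optimized_chain, optimized_sql)     # chain-to-SQL capability
    third = (original_chain, optimized_chain)     # chain-to-chain conversion
    return first, second, third
```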
S306, training the intermediate data query statement generation model through the optimized training sample set to obtain the data query statement generation model.
Specifically, training a middle data query sentence generation model through a first optimization training sub-sample set formed by each first optimization training sub-sample to obtain a first-stage data query sentence generation model; training the first-stage data query statement generation model through a second optimization training sub-sample set formed by each second optimization training sub-sample to obtain a second-stage data query statement generation model; and training the second-stage data query statement generation model through a third optimization training sub-sample set formed by all the third optimization training sub-samples to obtain the data query statement generation model.
Optionally, the optimization training of the data query statement generation model in S304-S306 is an optionally executed technical solution: whether the data query statement generation model needs to be optimized can be determined according to the verification of the intermediate model output results in S3051. If the proportion of failed verifications exceeds a preset proportion threshold, optimization training of the data query statement generation model can be started, so that its generation accuracy is improved through the optimization training. It can be understood that training on the original and optimized thinking chains may also be performed directly during the training process; embodiments of the present invention are not limited in this regard.
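The gating rule above can be sketched in a few lines; the threshold value below is an illustrative assumption, not a value fixed by the embodiment.

```python
# Sketch: start optimization training only when the verification-failure
# ratio from S3051 exceeds a preset threshold (0.2 here is an assumption).
def needs_optimization_training(check_results, failure_threshold=0.2):
    """check_results: list of booleans, True where the intermediate model
    output passed verification against the expected data query statement."""
    if not check_results:
        return False
    failure_ratio = check_results.count(False) / len(check_results)
    return failure_ratio > failure_threshold
```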
According to the technical scheme of this embodiment, the process display requirement is analyzed to obtain an intermediate output requirement and a target output requirement, and a middle prompt question template and a target prompt question template are determined accordingly. The two templates are spliced to obtain a prompt word instruction that guides the data query statement generation model to produce output according to the user's process display requirement: the model first converts the input natural statement into an intermediate generation result according to the middle prompt question template, and then obtains the target data query statement from the intermediate generation result according to the target prompt question template. In this way, the generation process of the target data query statement is displayed, the statement conversion process becomes controllable, the interpretability of the output result is improved, and the transparency of the target data query statement generation process is enhanced.
Meanwhile, this embodiment provides a training mode for the data query statement generation model. From the obtained statement conversion data set, logic plans corresponding to the data query statements are determined with different degrees of optimization, the logic plans are logically segmented according to the dependency relationships, and the corresponding original thinking chains and optimized thinking chains are constructed. An original training sample set is then formed from the original thinking chains and the statement conversion data set to complete the preliminary training of the initial data query statement generation model, so that the trained intermediate data query statement generation model has the natural statement -> thinking chain -> data query statement generation capability while retaining the maximum logical semantic information. The construction of the optimization training sample set is further completed based on the optimized thinking chains and the intermediate data query statement generation model, and the intermediate data query statement generation model is further optimized and trained through the optimization training sample set. The finally obtained data query statement generation model therefore has the natural statement -> thinking chain -> data query statement generation capability both when retaining the maximum logical semantic information and when the logical semantic information is optimized, as well as the capability of converting between the two thinking chains under different logical semantic information. This enriches the types of intermediate generation results the data query statement generation model can produce, allows the generation logic from the natural statement to the data query statement to be displayed more clearly and in more detail, and, through the multi-level logical analysis contained in the thinking chain, helps locate errors that may occur in the generation process and improves the accuracy of converting the natural statement into the target data query statement.
Example III
Fig. 7 is a schematic structural diagram of a data query sentence generating device according to a third embodiment of the present invention, where, as shown in fig. 7, the data query sentence generating device may include a requirement sentence obtaining module 41, an instruction constructing module 42, and a query sentence generating module 43.
The requirement sentence acquisition module 41 is configured to acquire a process presentation requirement and a natural sentence to be converted; an instruction construction module 42, configured to construct a prompt word instruction according to the process display requirement; the query sentence generation module 43 is configured to input a natural sentence to be converted into a pre-trained data query sentence generation model based on a prompt word instruction, and determine and output an intermediate generation result and a target data query sentence corresponding to a process display requirement according to a model output result; the pre-trained data query sentence generation model is a large generated language model obtained based on pre-built thinking chain training, and the pre-built thinking chain is a thinking chain obtained by logically segmenting a corresponding logic plan of the data query sentence.
According to the technical scheme of this embodiment, the applied data query statement generation model is a generative large language model obtained through training based on pre-constructed thinking chains, and each thinking chain is formed by logically segmenting the logic plan corresponding to a data query statement. When the model is applied to converting a natural statement to be converted into a data query statement, it can therefore generate the corresponding thinking chain to indicate the conversion logic from the natural statement to be converted to the target data query statement. Meanwhile, the construction of the prompt word instruction can be completed according to the process display requirement given by the user, guiding the data query statement generation model to output and display information such as the thinking chain containing the logical conversion process as the intermediate generation result. This makes the logic-driven conversion process from the natural statement to be converted to the target data query statement explicit and controllable, improves the interpretability of the output result, and enhances the transparency of the target data query statement generation process; at the same time, the multi-level logical analysis contained in the thinking chain helps locate errors in the generation process and improves the accuracy of converting the natural statement into the target data query statement.
Optionally, the instruction construction module 42 includes:
the demand determining unit is used for determining an intermediate output demand and a target output demand according to the process display demand;
the middle template construction unit is used for determining a corresponding middle prompt problem template according to the middle output requirement; the middle prompting problem template is used for prompting an input/output target from the original data to a middle output requirement;
the target template construction unit is used for determining a corresponding target prompt problem template according to the target output requirement; the target prompting problem template is used for prompting the middle output corresponding to the middle output requirement to the input and output target of the target output requirement;
the instruction determining unit is used for splicing the middle prompt question template and the target prompt question template to determine the prompt word instruction; the intermediate output requirements at least comprise an original thinking chain output requirement, an optimized thinking chain output requirement and an original data query statement output requirement; the target output requirements include at least an original data query statement output requirement and an optimized data query statement output requirement.
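The splicing performed by the instruction determining unit can be sketched as follows. The template wording below is wholly invented for illustration; the patent does not fix the templates' text, only that the middle and target prompt question templates are spliced into one prompt word instruction.

```python
# Sketch: splice a middle prompt question template and a target prompt
# question template into a single prompt word instruction. The template
# strings are illustrative assumptions.
MIDDLE_TEMPLATE = ("First, analyze the following request and output "
                   "an {intermediate_kind} describing the query logic "
                   "step by step.")
TARGET_TEMPLATE = ("Then, convert that {intermediate_kind} into "
                   "a {target_kind}.")

def build_prompt_word_instruction(intermediate_kind, target_kind):
    middle = MIDDLE_TEMPLATE.format(intermediate_kind=intermediate_kind)
    target = TARGET_TEMPLATE.format(intermediate_kind=intermediate_kind,
                                    target_kind=target_kind)
    return middle + "\n" + target
```

For instance, requesting an "optimized thinking chain" as the intermediate output and an "optimized data query statement" as the target output yields one instruction covering both stages.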
Optionally, the query sentence generation module 43 includes:
the middle result generation unit is used for inputting the natural sentence to be converted into a pre-trained data query sentence generation model according to the middle prompt problem template, and determining the obtained model output result as a middle generation result;
The target statement generating unit is used for inputting the intermediate generating result into the data query statement generating model according to the target prompt problem template, and determining the obtained model output result as the target data query statement.
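The two units above amount to two model calls, with the first call's output fed into the second. In this sketch, `model` is a stand-in for the pre-trained data query statement generation model; its callable interface is an assumption.

```python
# Sketch of the two-step generation: natural sentence -> intermediate
# generation result (middle template), then intermediate result -> target
# data query statement (target template). `model` is a hypothetical
# stand-in for the trained generation model.
def generate_target_statement(model, natural_sentence,
                              middle_template, target_template):
    # Step 1: intermediate result generation unit
    intermediate = model(middle_template + "\n" + natural_sentence)
    # Step 2: target statement generating unit
    target = model(target_template + "\n" + intermediate)
    return intermediate, target
```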
Optionally, the data query statement generating device further includes: the model training module is specifically used for:
acquiring a sentence conversion data set before acquiring a process presentation requirement and a natural sentence to be converted; the sentence conversion data set comprises at least two groups of sentence conversion data, wherein each group of sentence conversion data comprises natural sentences, semantic description information and data query sentences which are in one-to-one correspondence;
according to the original logic plan corresponding to each sentence conversion data, determining an original thinking chain corresponding to each sentence conversion data, and according to each sentence conversion data and each original thinking chain, determining an original training sample set corresponding to a sentence conversion data set;
training the initial data query statement generation model through the original training sample set to obtain an intermediate data query statement generation model;
determining an optimized thinking chain corresponding to each sentence conversion data according to the optimized logic plan corresponding to each sentence conversion data;
inputting each optimized thinking chain into an intermediate data query sentence generating model to obtain an intermediate model output result corresponding to each optimized thinking chain, and determining an optimized training sample set corresponding to a sentence conversion data set according to each sentence conversion data, each original thinking chain, each optimized thinking chain and each intermediate model output result;
And training the intermediate data query statement generation model through the optimized training sample set to obtain the data query statement generation model.
Optionally, determining an original thinking chain corresponding to each sentence conversion data according to an original logic plan corresponding to each sentence conversion data, and determining an original training sample set corresponding to the sentence conversion data set according to each sentence conversion data and each original thinking chain, including:
determining an original logic plan corresponding to the data query statement in the statement conversion data for each group of the statement conversion data;
performing logic segmentation on the original logic plan according to the dependency relationship, and determining an original logic segmentation result comprising an execution operation level sequence and an operation step level sequence;
determining an original thinking chain corresponding to the sentence conversion data according to the original logic segmentation result and the semantic description information, and determining an original training sample corresponding to the sentence conversion data according to the natural sentence, the original thinking chain and the data query sentence to obtain an original training sample set formed by the original training samples;
the original training samples comprise a first original training sub-sample formed by a natural sentence and an original thinking chain and a second original training sub-sample formed by the original thinking chain and a data query sentence.
Optionally, training the initial data query sentence generation model through the original training sample set to obtain an intermediate data query sentence generation model, including:
training the initial data query statement generation model through a first original training sub-sample set formed by each first original training sub-sample to obtain a stage intermediate data query statement generation model;
training the intermediate data query sentence generation model of the stage through a second original training sub-sample set formed by the second original training sub-samples to obtain the intermediate data query sentence generation model.
Optionally, determining an optimization thinking chain corresponding to each sentence conversion data according to an optimization logic plan corresponding to each sentence conversion data includes:
determining an optimized logic plan corresponding to the data query statement in the statement conversion data according to a preset optimized processing strategy aiming at each group of statement conversion data;
performing logic segmentation on the optimized logic plan according to the dependency relationship, and determining an optimized logic segmentation result comprising an execution operation level sequence and an operation step level sequence;
and determining an optimized thinking chain corresponding to the sentence conversion data according to the optimized logic segmentation result and the semantic description information.
Optionally, determining an optimized training sample set corresponding to the sentence conversion data set according to each sentence conversion data, each original thought chain, each optimized thought chain and each intermediate model output result includes:
checking the output result of each intermediate model through the data query statement in the corresponding statement conversion data, and if the output result of each intermediate model passes the checking, determining the output result of each intermediate model as an optimized data query statement corresponding to the statement conversion data;
determining an optimized training sample corresponding to the sentence conversion data according to natural sentences, an original thinking chain, an optimized thinking chain and optimized data query sentences in the sentence conversion data so as to obtain an optimized training sample set formed by all the optimized training samples;
the optimization training samples comprise a first optimization training sub-sample formed by a natural sentence and an optimized thinking chain, a second optimization training sub-sample formed by an optimized thinking chain and an optimized data query sentence, and a third optimization training sub-sample formed by an original thinking chain and an optimized thinking chain.
Optionally, training the intermediate data query sentence generation model by optimizing the training sample set to obtain the data query sentence generation model, including:
Training the intermediate data query sentence generation model through a first optimization training sub-sample set formed by each first optimization training sub-sample to obtain a first-stage data query sentence generation model;
training the first-stage data query statement generation model through a second optimization training sub-sample set formed by each second optimization training sub-sample to obtain a second-stage data query statement generation model;
and training the second-stage data query statement generation model through a third optimization training sub-sample set formed by all the third optimization training sub-samples to obtain the data query statement generation model.
The data query statement generating device provided by the embodiment of the invention can execute the data query statement generation method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the executed method.
Example IV
Fig. 8 is a schematic structural diagram of a data query statement generating device according to a fourth embodiment of the present invention. The data query statement generation device 50 may be an electronic device intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 8, the data query statement generating device 50 includes at least one processor 51, and a memory such as a Read Only Memory (ROM) 52, a Random Access Memory (RAM) 53, etc. which is communicatively connected to the at least one processor 51, wherein the memory stores a computer program executable by the at least one processor, and the processor 51 can perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 52 or the computer program loaded from the storage unit 58 into the Random Access Memory (RAM) 53. In the RAM 53, various programs and data required for the operation of the data query sentence generating device 50 can also be stored. The processor 51, the ROM 52 and the RAM 53 are connected to each other via a bus 54. An input/output (I/O) interface 55 is also connected to bus 54.
A plurality of components in the data query statement generating device 50 are connected to the I/O interface 55, including: an input unit 56 such as a keyboard, a mouse, etc.; an output unit 57 such as various types of displays, speakers, and the like; a storage unit 58 such as a magnetic disk, an optical disk, or the like; and a communication unit 59 such as a network card, modem, wireless communication transceiver, etc. The communication unit 59 allows the data query sentence generating device 50 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The processor 51 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 51 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 51 performs the various methods and processes described above, such as the data query statement generation method.
In some embodiments, the data query statement generation method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 58. In some embodiments, part or all of the computer program may be loaded and/or installed onto the data query statement generating device 50 via the ROM 52 and/or the communication unit 59. When the computer program is loaded into RAM 53 and executed by processor 51, one or more steps of the data query statement generation method described above may be performed. Alternatively, in other embodiments, the processor 51 may be configured to perform the data query statement generation method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and a server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS (Virtual Private Server) services.
It should be appreciated that the various forms of flow shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (12)

1. A method for generating a data query statement, comprising:
acquiring a process display requirement and a natural sentence to be converted;
constructing a prompt word instruction according to the process display requirement;
inputting the natural sentence to be converted into a pre-trained data query sentence generation model based on the prompt word instruction, and determining and outputting an intermediate generation result and a target data query sentence corresponding to the process display requirement according to a model output result;
wherein the pre-trained data query statement generation model is a generative large language model obtained through training based on a pre-built thinking chain, and the pre-built thinking chain is a thinking chain obtained by logically segmenting a logic plan corresponding to the data query statement.
2. The method of claim 1, wherein the constructing a hint word instruction according to the process presentation requirement comprises:
determining an intermediate output requirement and a target output requirement according to the process display requirement;
determining a corresponding intermediate prompt problem template according to the intermediate output requirement; the intermediate prompt problem template is used for prompting an input/output target from the original data to the intermediate output requirement;
determining a corresponding target prompt problem template according to the target output requirement; the target prompt problem template is used for prompting an input/output target from the intermediate output corresponding to the intermediate output requirement to the target output requirement;
splicing the intermediate prompt problem template with the target prompt problem template, and determining the spliced templates as the prompt word instruction;
the intermediate output requirements at least comprise an original thinking chain output requirement, an optimized thinking chain output requirement and an original data query statement output requirement; the target output requirements at least comprise an original data query statement output requirement and an optimized data query statement output requirement.
3. The method according to claim 2, wherein the inputting the natural sentence to be converted into a pre-trained data query sentence generation model based on the prompt word instruction, and determining and outputting an intermediate generation result and a target data query sentence corresponding to the process display requirement according to a model output result, comprises:
inputting the natural sentence to be converted into the pre-trained data query sentence generation model according to the intermediate prompt problem template, and determining the obtained model output result as an intermediate generation result;
and inputting the intermediate generation result into the data query statement generation model according to the target prompt problem template, and determining the obtained model output result as a target data query statement.
4. The method of claim 1, further comprising, before the acquiring a process display requirement and a natural sentence to be converted:
acquiring a sentence conversion data set; the sentence conversion data set comprises at least two groups of sentence conversion data, wherein each group of sentence conversion data comprises a natural sentence, semantic description information and a data query sentence in one-to-one correspondence;
determining an original thinking chain corresponding to each sentence conversion data according to an original logic plan corresponding to each sentence conversion data, and determining an original training sample set corresponding to the sentence conversion data set according to each sentence conversion data and each original thinking chain;
training an initial data query statement generation model through the original training sample set to obtain an intermediate data query statement generation model;
determining an optimized thinking chain corresponding to each sentence conversion data according to an optimized logic plan corresponding to each sentence conversion data;
inputting each optimized thinking chain into the intermediate data query sentence generation model to obtain an intermediate model output result corresponding to each optimized thinking chain, and determining an optimized training sample set corresponding to the sentence conversion data set according to each sentence conversion data, each original thinking chain, each optimized thinking chain and each intermediate model output result;
training the intermediate data query statement generation model through the optimized training sample set to obtain a data query statement generation model.
5. The method according to claim 4, wherein the determining an original thinking chain corresponding to each sentence conversion data according to an original logic plan corresponding to each sentence conversion data, and determining an original training sample set corresponding to the sentence conversion data set according to each sentence conversion data and each original thinking chain, comprises:
determining, for each group of the sentence conversion data, an original logic plan corresponding to the data query statement in the sentence conversion data;
performing logic segmentation on the original logic plan according to the dependency relationship, and determining an original logic segmentation result comprising an execution operation hierarchy sequence and an operation step hierarchy sequence;
determining an original thinking chain corresponding to the sentence conversion data according to the original logic segmentation result and the semantic description information, and determining an original training sample corresponding to the sentence conversion data according to the natural sentence, the original thinking chain and the data query sentence to obtain an original training sample set formed by the original training samples;
wherein the original training samples comprise a first original training subsample formed by the natural sentence and the original thinking chain, and a second original training subsample formed by the original thinking chain and the data query sentence.
6. The method of claim 5, wherein the training an initial data query statement generation model through the original training sample set to obtain an intermediate data query statement generation model comprises:
training an initial data query statement generation model through a first original training sub-sample set formed by the first original training sub-samples to obtain a stage intermediate data query statement generation model;
training the stage intermediate data query statement generation model through a second original training sub-sample set formed by the second original training sub-samples to obtain an intermediate data query statement generation model.
7. The method of claim 4, wherein the determining an optimized thinking chain corresponding to each of the sentence conversion data according to an optimized logic plan corresponding to each of the sentence conversion data comprises:
determining, for each group of the sentence conversion data, an optimized logic plan corresponding to the data query statement in the sentence conversion data according to a preset optimization processing strategy;
performing logic segmentation on the optimized logic plan according to the dependency relationship, and determining an optimized logic segmentation result comprising an execution operation hierarchy sequence and an operation step hierarchy sequence;
and determining an optimized thinking chain corresponding to the sentence conversion data according to the optimized logic segmentation result and the semantic description information.
8. The method according to claim 4, wherein the determining an optimized training sample set corresponding to the sentence conversion data set according to each of the sentence conversion data, each of the original thinking chains, each of the optimized thinking chains, and each of the intermediate model output results comprises:
checking each intermediate model output result against the data query statement in the corresponding sentence conversion data, and if the check passes, determining the intermediate model output result as an optimized data query statement corresponding to the sentence conversion data;
determining an optimized training sample corresponding to the sentence conversion data according to the natural sentence in the sentence conversion data, the original thinking chain, the optimized thinking chain and the optimized data query sentence, so as to obtain an optimized training sample set formed by the optimized training samples;
wherein the optimization training sample comprises a first optimization training subsample composed of the natural sentence and the optimized thinking chain, a second optimization training subsample composed of the optimized thinking chain and the optimized data query sentence, and a third optimization training subsample composed of the original thinking chain and the optimized thinking chain.
9. The method of claim 8, wherein training the intermediate data query statement generation model by the optimized training sample set to obtain a data query statement generation model comprises:
training the intermediate data query statement generation model through a first optimization training sub-sample set formed by the first optimization training sub-samples to obtain a first-stage data query statement generation model;
training the first-stage data query statement generation model through a second optimization training sub-sample set formed by the second optimization training sub-samples to obtain a second-stage data query statement generation model;
and training the second-stage data query statement generation model through a third optimization training sub-sample set formed by the third optimization training sub-samples to obtain a data query statement generation model.
10. A data query statement generation apparatus, comprising:
the demand statement acquisition module is used for acquiring a process display demand and a natural statement to be converted;
the instruction construction module is used for constructing a prompt word instruction according to the process display requirement;
the query sentence generation module is used for inputting the natural sentence to be converted into a pre-trained data query sentence generation model based on the prompt word instruction, and determining and outputting an intermediate generation result and a target data query sentence corresponding to the process display requirement according to a model output result;
wherein the pre-trained data query statement generation model is a generative large language model obtained through training based on a pre-built thinking chain, and the pre-built thinking chain is a thinking chain obtained by logically segmenting a logic plan corresponding to the data query statement.
11. A data query statement generation device, characterized by comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data query statement generation method of any one of claims 1 to 9.
12. A storage medium containing computer executable instructions which, when executed by a computer processor, are for performing the data query statement generation method of any of claims 1 to 9.
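The staged prompting flow of claims 1 to 3 (construct an intermediate and a target prompt problem template, then call the generation model twice) can be sketched in ordinary code. The following Python fragment is a hypothetical illustration only: the template wording, the `generate` helper, and the stub model are assumptions made for demonstration, not the patented implementation.

```python
# Hypothetical sketch of the two-stage prompting flow described in claims 2-3.
# The model is stubbed as a plain callable; all names are illustrative.

def build_prompt_instruction(process_requirement: dict) -> tuple[str, str]:
    """Split the process display requirement into an intermediate prompt
    problem template and a target prompt problem template."""
    intermediate_template = (
        "Given the natural-language question below, first produce: "
        + ", ".join(process_requirement["intermediate_outputs"])
    )
    target_template = (
        "From the intermediate outputs, produce: "
        + ", ".join(process_requirement["target_outputs"])
    )
    return intermediate_template, target_template

def generate(model, prompt: str, payload: str) -> str:
    # Stand-in for one call to the pre-trained generation model.
    return model(prompt + "\n" + payload)

def two_stage_query_generation(model, natural_sentence: str, requirement: dict):
    intermediate_tpl, target_tpl = build_prompt_instruction(requirement)
    # Stage 1: natural sentence -> intermediate result (e.g. a thinking chain).
    intermediate_result = generate(model, intermediate_tpl, natural_sentence)
    # Stage 2: intermediate result -> target data query statement.
    target_query = generate(model, target_tpl, intermediate_result)
    return intermediate_result, target_query
```

Both the intermediate result and the final query are returned, mirroring the claim's requirement that the intermediate generation result be output alongside the target data query statement.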
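Likewise, the logic segmentation of claims 5 and 7, which splits a logic plan into an ordered sequence of operations according to dependency relationships and attaches semantic descriptions to form a thinking chain, can be approximated with a topological sort. The plan representation, the operator names, and the `build_thinking_chain` helper below are illustrative assumptions, not the patent's actual data structures.

```python
# Hypothetical sketch of claim 5's logic segmentation: a logic plan is
# modelled as operators mapped to the operators they depend on, sorted
# into dependency order, and rendered as thinking-chain steps.
from graphlib import TopologicalSorter

def segment_logic_plan(plan: dict[str, list[str]]) -> list[str]:
    """Return the operators of a logic plan in dependency order.
    `plan` maps each operator to the operators it depends on."""
    return list(TopologicalSorter(plan).static_order())

def build_thinking_chain(plan: dict[str, list[str]],
                         descriptions: dict[str, str]) -> list[str]:
    """Attach semantic description information to each segmented step."""
    return [
        f"Step {i}: {op} - {descriptions.get(op, '')}"
        for i, op in enumerate(segment_logic_plan(plan), 1)
    ]
```

For a plan in which a projection depends on a filter that depends on a table scan, the chain enumerates scan, filter, and projection in that order, which is the kind of "execution operation hierarchy sequence" the claim describes.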
CN202311468107.4A 2023-11-06 2023-11-06 Data query statement generation method, device, equipment and storage medium Pending CN117453717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311468107.4A CN117453717A (en) 2023-11-06 2023-11-06 Data query statement generation method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117453717A (en) 2024-01-26

Family

ID=89583314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311468107.4A Pending CN117453717A (en) 2023-11-06 2023-11-06 Data query statement generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117453717A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168619A (en) * 2022-02-09 2022-03-11 阿里巴巴达摩院(杭州)科技有限公司 Training method and device of language conversion model
CN116340584A (en) * 2023-05-24 2023-06-27 杭州悦数科技有限公司 Implementation method for automatically generating complex graph database query statement service
CN116756169A (en) * 2023-05-30 2023-09-15 淘宝(中国)软件有限公司 Method and computing device for interactively generating data report
CN116821168A (en) * 2023-08-24 2023-09-29 吉奥时空信息技术股份有限公司 Improved NL2SQL method based on large language model
CN116861921A (en) * 2023-07-10 2023-10-10 厦门大学 Robot task analysis method and device based on large language model and readable medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination