CN118445300A - Query sentence rewriting method, rewriting platform, electronic device and storage medium - Google Patents
- Publication number
- CN118445300A (application CN202410439161.4A)
- Authority
- CN
- China
- Prior art keywords
- query statement
- query
- rewritten
- statement
- rewriting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a query statement rewriting method, a rewriting platform, an electronic device and a storage medium, and relates to the technical field of large models. The method comprises: acquiring a query statement to be rewritten and a task description, wherein the task description describes the task of rewriting the query statement to be rewritten; retrieving from a query statement database, based on the query statement to be rewritten, an example query statement that matches the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten; generating a statement rewriting task for the query statement to be rewritten based on the task description and the example query statement; and executing the statement rewriting task on the query statement to be rewritten to obtain a target query statement. The application solves the technical problem of low query statement rewriting efficiency in the related art.
Description
Technical Field
The application relates to the technical field of large models, in particular to a query statement rewriting method, a rewriting platform, electronic equipment and a storage medium.
Background
At present, efficiency-oriented rewriting of structured query statements is a classical problem in database optimization research. Because some query statements written by users and staff may execute inefficiently, rewriting them into equivalent query statements that execute more efficiently can greatly improve processing efficiency.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the application provide a query statement rewriting method, a rewriting platform, an electronic device and a storage medium, so as to at least solve the technical problem of low query statement rewriting efficiency in the related art.
According to an aspect of an embodiment of the present application, there is provided a query statement rewriting method, including: acquiring a query statement to be rewritten and a task description, wherein the task description describes the task of rewriting the query statement to be rewritten; retrieving from a query statement database, based on the query statement to be rewritten, an example query statement that matches the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten; generating a statement rewriting task for the query statement to be rewritten based on the task description and the example query statement; and executing the statement rewriting task on the query statement to be rewritten to obtain a target query statement.
According to another aspect of an embodiment of the present application, there is provided a query statement rewriting method, including: acquiring a query statement to be rewritten and a task description by calling a first interface, wherein the first interface comprises a first parameter, the parameter value of the first parameter comprises the query statement to be rewritten and the task description, and the task description describes the task of rewriting the query statement to be rewritten; retrieving from a query statement database, based on the query statement to be rewritten, an example query statement that matches the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten; generating a statement rewriting task for the query statement to be rewritten based on the task description and the example query statement; executing the statement rewriting task on the query statement to be rewritten to obtain a target query statement; and outputting the target query statement by calling a second interface, wherein the second interface comprises a second parameter, and the parameter value of the second parameter comprises the target query statement.
According to another aspect of an embodiment of the present application, there is provided an apparatus for rewriting a query statement, including: an acquisition module, configured to acquire a query statement to be rewritten and a task description, wherein the task description describes the task of rewriting the query statement to be rewritten; a retrieval module, configured to retrieve from a query statement database, based on the query statement to be rewritten, an example query statement that matches the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten; a generation module, configured to generate a statement rewriting task for the query statement to be rewritten based on the task description and the example query statement; and an execution module, configured to execute the statement rewriting task on the query statement to be rewritten to obtain a target query statement.
According to another aspect of an embodiment of the present application, there is provided an apparatus for rewriting a query statement, including: an acquisition module, configured to acquire a query statement to be rewritten and a task description by calling a first interface, wherein the first interface comprises a first parameter, the parameter value of the first parameter comprises the query statement to be rewritten and the task description, and the task description describes the task of rewriting the query statement to be rewritten; a retrieval module, configured to retrieve from a query statement database an example query statement that matches the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten; a generation module, configured to generate a statement rewriting task for the query statement to be rewritten based on the task description and the example query statement; an execution module, configured to execute the statement rewriting task on the query statement to be rewritten to obtain a target query statement; and an output module, configured to output the target query statement by calling a second interface, wherein the second interface comprises a second parameter, and the parameter value of the second parameter comprises the target query statement.
According to another aspect of an embodiment of the present application, there is provided a rewriting platform for a query statement, including: a retrieval node, configured to acquire a query statement to be rewritten and to retrieve from a query statement database, based on the query statement to be rewritten, an example query statement that matches the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten; a task generation node, configured to acquire a task description and to generate a statement rewriting task for the query statement to be rewritten based on the task description and the example query statement, wherein the task description describes the task of rewriting the query statement to be rewritten; and a database executor, configured to rewrite the query statement to be rewritten based on the statement rewriting task to obtain a target query statement.
According to another aspect of the embodiment of the present application, there is also provided a computer terminal including: a memory storing an executable program; and a processor for running a program, wherein the program when run performs the methods of the various embodiments of the present application.
According to another aspect of the embodiments of the present application, there is also provided a computer readable storage medium including a stored executable program, where the executable program when run controls a device in which the computer readable storage medium is located to perform the method in the embodiments of the present application.
According to another aspect of embodiments of the present application, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements the methods of the various embodiments of the application.
According to another aspect of embodiments of the present application, there is also provided a computer program product comprising a non-volatile computer readable storage medium storing a computer program which, when executed by a processor, implements the method in the various embodiments of the application.
According to another aspect of embodiments of the present application, there is also provided a computer program which, when executed by a processor, implements the methods of the various embodiments of the application.
In the embodiments of the application, a query statement to be rewritten and a task description are acquired, wherein the task description describes the task of rewriting the query statement to be rewritten; an example query statement that matches the query statement to be rewritten is retrieved from a query statement database based on the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten; a statement rewriting task for the query statement to be rewritten is generated based on the task description and the example query statement; and the statement rewriting task is executed on the query statement to be rewritten to obtain a target query statement, thereby improving the rewriting efficiency of the query statement. Notably, the example query statement that matches the query statement to be rewritten guides the rewriting process, so the statement rewriting task generated from the task description and the example query statement fits the intended meaning of the query statement to be rewritten more closely, which improves rewriting accuracy. The whole rewriting process requires no manual participation, which further improves rewriting efficiency and solves the technical problem of low query statement rewriting efficiency in the related art.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application, as claimed.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of an application scenario of a method for rewriting query statements according to an embodiment of the present application;
FIG. 2 is a flow chart of a method of rewriting a query statement according to embodiment 1 of the application;
FIG. 3 is a schematic diagram of a query statement rewriting system of a large language model in accordance with an embodiment of the application;
FIG. 4 is a schematic diagram of a contrastive learning framework in accordance with an embodiment of the present application;
FIG. 5 is a schematic diagram of a curriculum learning framework for smooth training in accordance with an embodiment of the present application;
FIG. 6 is a flow chart of a method of rewriting a query statement according to embodiment 2 of the application;
FIG. 7 is a schematic diagram of a query statement rewriting apparatus according to an embodiment of the application;
FIG. 8 is a schematic diagram of a query statement rewriting apparatus according to an embodiment of the application;
FIG. 9 is a schematic diagram of a rewrite platform for a query statement according to an embodiment of the application;
FIG. 10 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical solution provided by the application is mainly implemented with large model technology. A large model is a deep learning model with large-scale model parameters, typically hundreds of millions, billions, or even trillions or more. A large model may also be called a Foundation Model: it is pre-trained on large-scale unlabeled corpora to produce a pre-trained model with more than one hundred million parameters, adapts to a wide range of downstream tasks, and generalizes well. Examples include large language models (Large Language Model, LLM) and multi-modal pre-trained models.
It should be noted that, in practical use, a pre-trained large model can be fine-tuned with a small number of samples so that it can be applied to different tasks. For example, large models are widely used in natural language processing (Natural Language Processing, abbreviated as NLP), computer vision, speech processing and other fields. In computer vision they can be applied to tasks such as visual question answering (Visual Question Answering, abbreviated as VQA), image captioning (IC) and image generation; in natural language processing they can be applied to tasks such as text-based sentiment classification, text summarization and machine translation. Thus, the main application scenarios for large models include, but are not limited to, digital assistants, intelligent robots, search, online education, office software, electronic commerce and intelligent design. The embodiments of the application are explained by taking data processing with a large language model in a database scenario as an example.
First, some of the terms appearing in the description of the embodiments of the application are explained as follows:
Large language model (Large Language Model): a natural language processing model based on deep learning. By training on a large amount of text data, a large language model learns the structures and patterns of language and thereby acquires the ability to understand, generate and process language.
Structured query language (Structured Query Language, abbreviated as SQL) is a special purpose programming language, a database query and programming language, used to access data and query, update and manage relational database systems. Through SQL query statements, data operations on databases, including add, delete, update, query, and the like, can be implemented.
SQL statement execution efficiency (SQL query efficiency): the amount of time it takes to query the corresponding database for the desired content using a certain SQL query statement.
SQL statement rewrite (SQL rewrite): a method of improving the execution efficiency of an SQL statement, while keeping its results unchanged, by changing the format or the execution order of the statement.
Most existing methods for improving query statement rewriting revolve around database theory and logical proofs. Because they are highly complex and depend on experienced personnel, such conventional methods lack generality and flexibility and consume considerable resources. With the rise of large language models (Large Language Model), work that uses a large language model to rewrite query statements has also begun to appear. At the current stage, however, such work relies on the simplest sequence-to-sequence (Sequence to Sequence) generation flow, which is susceptible to hallucination (spurious generation) that can greatly reduce accuracy. Therefore, the application proposes to use a large model to provide a rewrite scheme to a database rewrite system, thereby guiding the database rewrite system to rewrite query statements efficiently. The flow provided by the application is more automated and more robust in application, and its experimental results improve markedly over conventional methods.
Currently, with the rapid growth in the amount of user data, popular database management systems (Database Management System, abbreviated as DBMS) find it increasingly difficult to manage and process queries. As databases and queries become more complex, executing a structured query statement may take seconds or even minutes. Efficient query processing, or query optimization, has therefore become critical in modern database systems.
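To make the definition concrete, the following is a minimal sketch of one rewrite pair, not an example taken from the application. The tables, data and both statements are illustrative assumptions; SQLite is used only because it ships with Python. The sketch shows that the two statements return the same rows on this data and makes no claim about which one runs faster.

```python
import sqlite3

# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 2, 7.5), (12, 3, 1.0), (13, 1, 2.5);
""")

# Original statement: filter through an IN subquery.
original = """
    SELECT o.id FROM orders o
    WHERE o.customer_id IN (SELECT c.id FROM customers c WHERE c.region = 'north')
    ORDER BY o.id
"""

# Rewritten statement: the same filter expressed as a join.
# customers.id is a primary key, so the join cannot duplicate order rows.
rewritten = """
    SELECT o.id FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE c.region = 'north'
    ORDER BY o.id
"""

original_rows = conn.execute(original).fetchall()
rewritten_rows = conn.execute(rewritten).fetchall()
print(original_rows == rewritten_rows)
```

Whether such a rewrite actually shortens execution depends on the database engine, its optimizer and the data, which is why the efficiency of a rewrite has to be measured rather than assumed.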
Query rewrite, a key topic in query optimization, has attracted a great deal of attention. Formally, the goal of query rewrite is to output a new query that is equivalent to the target SQL query but has a shorter execution time. In general, a good rewritten query should meet two basic requirements, executability and equivalence: executability means that the rewritten query should execute without error, and equivalence means that the rewritten query should return the same data as the original query. Executability and equivalence are only the basic criteria of a valid query rewrite. The final goal is to improve two aspects of query efficiency, namely execution efficiency and computational efficiency: execution efficiency means that executing the rewritten query should have lower latency than executing the original query, and computational efficiency means that the cost of the rewrite process should be acceptable compared with the query execution time saved.
In a database management system, the execution scheme of a query is a high-level representation of the SQL query execution plan and is the main rewriting target when improving query efficiency. Editing the order and type of operators in the execution scheme naturally satisfies the executability and equivalence criteria and is also easier to implement. Existing rule-based methods for finding better rewrite strategies follow two main directions: discovering new rewrite rules or using existing rewrite rules. Most methods that discover new rewrite rules involve complex proofs or manual operations, are effective only for certain types of queries, and are computationally expensive and user-unfriendly. Although recent methods automate part of the process of proposing new rewrite rules, they support only a limited set of operators or require the user to interact through a predefined rule language. Limited rewrites mean that execution efficiency may improve for only some queries, while human participation consumes more time and resources.
In contrast, rewriting a query with existing rules offers a simpler and more stable workflow. Such methods can draw on the many rewrite rules of existing platforms and learn to select which rules to apply. However, they use a Monte Carlo Tree Search algorithm to search for a better solution and a trained query latency estimation model to decide which rewrite rule to select, so their main problems are the computational cost of the search algorithm and the inefficient rewrites caused by possible errors in the latency estimation model.
On the other hand, with the development of large language models (Large Language Models), some "large language model for databases" projects and studies that support direct query rewrite have also appeared. The intuitive idea of these methods is to use the sequence-to-sequence generation capability of the language model to output a new rewritten query directly for an input query, without considering any rewrite rules or DBMS information. While these methods may find new rewrites that do not follow any existing rule, they are susceptible to the unresolved hallucination problem of language models, which produce plausible but erroneous output, especially for long and complex queries. Syntax or reference errors introduced during generation cause serious errors in query execution. Thus, an output query that relies solely on a large language model may violate executability and equivalence, deviating from the basic goal of query rewrite.
To remedy these deficiencies and combine the advantages of existing query rewrite techniques, the present application proposes a query statement rewrite system enhanced with a large language model, which uses the large language model to suggest rewrite schemes and applies those schemes on an existing database platform to rewrite incoming queries. Inspired by the tool learning frameworks that emerged alongside large language models, the application exploits the strong generalization and reasoning capabilities of the large language model while avoiding problems such as hallucination. Meanwhile, a database-based rewriting platform is used to ensure the executability of the query.
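The criteria above can be checked empirically. The following is a minimal sketch under stated assumptions: the function name `check_rewrite` is hypothetical, SQLite stands in for the target database, and comparing result multisets on one data set gives evidence of equivalence, not a proof. Wall-clock timings are returned so that execution efficiency can be compared by the caller.

```python
import sqlite3
import time
from collections import Counter


def check_rewrite(conn, original_sql, rewritten_sql):
    """Return (executable, equivalent, original_seconds, rewritten_seconds)."""
    start = time.perf_counter()
    original_rows = conn.execute(original_sql).fetchall()
    original_seconds = time.perf_counter() - start

    try:
        start = time.perf_counter()
        rewritten_rows = conn.execute(rewritten_sql).fetchall()
        rewritten_seconds = time.perf_counter() - start
    except sqlite3.Error:
        # Executability fails: the rewritten query does not run at all.
        return False, False, original_seconds, None

    # Compare as multisets so that row order does not matter.
    equivalent = Counter(original_rows) == Counter(rewritten_rows)
    return True, equivalent, original_seconds, rewritten_seconds


conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE t (x INTEGER); INSERT INTO t VALUES (1), (2), (3);")

ok = check_rewrite(conn, "SELECT x FROM t WHERE x + 0 > 1", "SELECT x FROM t WHERE x > 1")
bad = check_rewrite(conn, "SELECT x FROM t", "SELECT y FROM t")  # unknown column
print(ok[:2], bad[:2])
```

Computational efficiency, the fourth criterion, would additionally require timing the rewrite process itself and weighing it against the execution time saved.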
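One way to picture this division of labor is that the language model only proposes a plan, and the rewriting platform accepts only the parts of the plan it actually supports. The sketch below is an assumption made for illustration, not the application's implementation: the rule names, the `split_plan` helper and the hard-coded suggestion are all hypothetical.

```python
def split_plan(suggested_rules, supported_rules):
    """Separate a suggested plan into rules the platform supports and rules it does not."""
    accepted = [rule for rule in suggested_rules if rule in supported_rules]
    rejected = [rule for rule in suggested_rules if rule not in supported_rules]
    return accepted, rejected


# Hypothetical rule names that a rewriting platform might expose.
SUPPORTED = {"FILTER_INTO_JOIN", "AGGREGATE_PROJECT_MERGE", "SORT_REMOVE"}

# Stand-in for a language model's suggestion; the second entry is a made-up rule
# of the kind a hallucinating model might emit.
llm_suggestion = ["FILTER_INTO_JOIN", "MAGIC_SPEEDUP", "SORT_REMOVE"]

accepted, rejected = split_plan(llm_suggestion, SUPPORTED)
print(accepted, rejected)
```

Because only supported rules reach the platform, a hallucinated rule name is discarded instead of producing a malformed query.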
Embodiment 1
According to an embodiment of the present application, there is provided a method of rewriting query statements, it being noted that the steps shown in the flowchart of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order other than that shown.
Considering that the model parameters of the large model are huge and the operation resources of the mobile terminal are limited, the method for rewriting the query statement provided by the embodiment of the application can be applied to the application scenario shown in fig. 1, but is not limited to the application scenario. Fig. 1 is a schematic view of an application scenario of a query statement rewriting method according to an embodiment of the present application, in the application scenario shown in fig. 1, a large model is deployed in a server 10, and the server 10 may connect to one or more client devices 20 through a local area network connection, a wide area network connection, an internet connection, or other types of data networks, where the client devices 20 may include, but are not limited to: smart phones, tablet computers, notebook computers, palm computers, personal computers, smart home devices, vehicle-mounted devices and the like. The client device 20 can interact with a user through a graphical user interface to realize the invocation of the large model, thereby realizing the method provided by the embodiment of the application.
In an embodiment of the present application, a system formed by a client device and a server may perform the following steps. The client device generates the query statement to be rewritten and the task description. The server acquires the query statement to be rewritten and the task description; retrieves from a query statement database, based on the query statement to be rewritten, an example query statement that matches the query statement to be rewritten; generates a statement rewriting task for the query statement to be rewritten based on the task description and the example query statement; and executes the statement rewriting task on the query statement to be rewritten to obtain the target query statement. It should be noted that, where the computing resources of the client device satisfy the deployment and operating conditions of the large model, the embodiment of the present application may also be performed on the client device.
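The server-side steps can be sketched as a small pipeline. This is a minimal illustration under assumptions: `RewriteRequest`, `build_rewrite_task` and `handle_request` are hypothetical names, the task is rendered as plain text, and the retrieval and execution components are passed in as callables and replaced by stubs in the usage lines.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RewriteRequest:
    query: str             # query statement to be rewritten
    task_description: str  # describes the rewriting task


def build_rewrite_task(request: RewriteRequest, example_query: str) -> str:
    """Combine the task description and the retrieved example into one task."""
    return (
        f"Task: {request.task_description}\n"
        f"Example query: {example_query}\n"
        f"Query to rewrite: {request.query}"
    )


def handle_request(
    request: RewriteRequest,
    retrieve: Callable[[str], str],
    execute: Callable[[str, str], str],
) -> str:
    example_query = retrieve(request.query)            # retrieve the matching example
    task = build_rewrite_task(request, example_query)  # generate the rewriting task
    return execute(task, request.query)                # execute it to get the target query


# Usage with stubs in place of the real retrieval and execution components.
request = RewriteRequest(
    query="SELECT * FROM t WHERE x + 0 > 1",
    task_description="Rewrite for lower latency; keep results identical.",
)
seen_tasks = []


def stub_execute(task: str, query: str) -> str:
    seen_tasks.append(task)
    return query.replace("x + 0", "x")


target = handle_request(request, lambda q: "SELECT * FROM t WHERE y > 2", stub_execute)
print(target)
```

Passing the components in as callables keeps the sketch independent of any particular model or database executor.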
In the above-described operating environment, the present application provides a method for rewriting query statements as shown in fig. 2. Fig. 2 is a flowchart of a method of rewriting a query sentence according to embodiment 1 of the present application. As shown in fig. 2, the method may include the steps of:
Step S202, obtaining a query statement to be rewritten and task description;
The task description describes the task of rewriting the query statement to be rewritten.
The query statement to be rewritten may be a query statement in a database query that needs to be modified or improved. The original query statement may need rewriting because of performance problems, logic errors, or new conditions or functions that must be added, so that the query performs better and returns better results. The query statement to be rewritten may be a structured query statement or another type of query statement; the type of query statement is not limited here.
The query statement to be rewritten can also be any query statement used for querying the database, and the query statement is rewritten so that the rewritten query statement can better conform to the query rule of the database, thereby improving the query efficiency of the database.
The task description may be used to describe information related to rewriting the query statement to be rewritten. Optionally, the task description may specify the part of the query statement that is to be rewritten, and may also give guidance on how the query statement is to be rewritten. The specific content of the task description is not limited herein, and the task description may be determined according to the actual rewriting requirement of the query statement.
In an alternative embodiment, after the user inputs a query statement for the target database, whether the query statement needs to be rewritten can be confirmed. If the query statement needs to be rewritten, a task description of the query statement to be rewritten can be received from the user, so that the rewriting process is guided according to the task description, thereby improving the rewriting accuracy of the query statement to be rewritten.
Step S204, searching a query statement database based on the query statement to be rewritten to obtain an example query statement matched with the query statement to be rewritten;
wherein, the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between other query statements except the example query statement in the query statement database and the query statement to be rewritten.
The query statement database contains a plurality of example query statements, and the example query statements in the query statement database can be preset query statements capable of efficiently querying the target database. The example query statement may also be a query statement with a better query effect obtained by screening from the historical query record, and the generation mode of the example query statement is not limited in any way.
The example query sentence matched with the query sentence to be rewritten may be an example query sentence with a higher similarity with the query sentence to be rewritten in the query sentence database. The method can be used for guiding the rewriting process of the query statement to be rewritten by searching the example query statement matched with the query statement to be rewritten, so that the target query statement obtained by rewriting can reach the basic target of the query statement.
Because the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between other query statements except the example query statement in the query statement database and the query statement to be rewritten, the rewriting process of the query statement to be rewritten can be guided more effectively and accurately according to the example query statement, and the obtained target query statement can more accord with the original query intention.
In an alternative embodiment, the query statement database may be searched based on the query statement to be rewritten, so that an example query statement matched with the query statement to be rewritten is found in the query statement database. The example query statement may be found by traversing the query statement database, or by means of a neural network model. The specific search manner is not limited herein, and an appropriate search manner may be selected according to actual requirements.
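The traversal-based search described above can be sketched as follows. This is only a minimal illustration: the token-set Jaccard score is an assumed stand-in for whatever similarity measure is actually used (the application later obtains it from a learned model), and the names `tokenize`, `jaccard`, `retrieve_example` and the sample pool are hypothetical.

```python
import re


def tokenize(sql: str) -> set[str]:
    """Lowercase the statement and split it into a set of word-like tokens."""
    return set(re.findall(r"[a-z0-9_.]+", sql.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (assumed stand-in for a learned score)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve_example(query: str, example_pool: list[str]) -> str:
    """Traverse the pool and return the example with the highest similarity."""
    if not example_pool:
        raise ValueError("example pool is empty")
    return max(example_pool, key=lambda example: jaccard(query, example))


# Hypothetical query statement database.
pool = [
    "select a from t1 where a > 1",
    "select t1.a, t2.b from t1 inner join t2 on t1.d = t2.d",
    "select count(*) from t3 group by c",
]
best = retrieve_example("select t1.a, t2.b from t1, t2 where t1.d = t2.d", pool)
```

A linear scan like this is adequate for a small pool; a larger pool would more plausibly use embeddings and a nearest-neighbour index.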
Step S206, generating a statement rewriting task of the query statement to be rewritten based on the task description and the example query statement;
In an alternative embodiment, the data query requirement involved in the task description is first understood; then, according to the structure and grammatical characteristics of the example query statement, the query statement to be rewritten is analyzed, reconstructed or adjusted using natural language processing and semantic understanding technology, so that the statement rewriting task of the query statement to be rewritten is generated. The rewritten query statement needs to keep the semantics and data requirements of the original query statement while conforming to the grammar specification and query optimization principles of the database system, so that it can correctly extract the required data and be queried and processed efficiently in the database. Optionally, the effect of the statement rewriting task may be evaluated by comparing the query results and the performance before and after the rewriting.
In another alternative embodiment, a statement rewriting task of the query statement to be rewritten can be generated according to the task description and the example query statement. The statement rewriting task is mainly used to guide and make suggestions for the rewriting process, and rewriting is performed on the basis of the original query statement to be rewritten, so that the rewriting accuracy is improved and the rewritten query statement can approach the basic requirements to be achieved.
Step S208, executing a statement rewriting task on the query statement to be rewritten to obtain the target query statement.
The target query statement needs to satisfy two basic requirements, namely executability and equivalence. Executability means that the rewritten target query statement can execute the query action without error; equivalence means that the data obtained by querying with the rewritten target query statement should be the same as the data obtained by querying with the query statement to be rewritten.
Further, executability and equivalence are the basic standards for rewriting a query statement, while the final goal of query statement rewriting is to improve two aspects of query efficiency, namely execution efficiency and calculation efficiency. Execution efficiency means that the target query statement obtained after executing the statement rewriting task has lower latency than the query statement to be rewritten; calculation efficiency means that the cost of the rewriting process is acceptable compared with the query execution time saved.
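The two basic requirements can be checked mechanically on sample data. The sketch below uses Python's built-in `sqlite3` module with a small made-up schema and made-up rows. The helper names `is_executable` and `is_equivalent` are hypothetical, and agreement on one data set is only a necessary condition for equivalence, not a proof of it.

```python
import sqlite3
from collections import Counter


def is_executable(conn: sqlite3.Connection, sql: str) -> bool:
    """Executability: the statement runs without raising a database error."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.Error:
        return False


def is_equivalent(conn: sqlite3.Connection, original: str, rewritten: str) -> bool:
    """Equivalence on this data: both statements return the same multiset of rows."""
    rows_a = Counter(conn.execute(original).fetchall())
    rows_b = Counter(conn.execute(rewritten).fetchall())
    return rows_a == rows_b


# Toy data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table orders(o_orderkey integer, o_orderpriority text);
create table lineitem(l_orderkey integer, l_commitdate text, l_receiptdate text);
insert into orders values (1, '1-URGENT'), (2, '2-HIGH'), (3, '1-URGENT');
insert into lineitem values
  (1, '1993-05-02', '1993-05-09'),
  (2, '1993-05-09', '1993-05-02'),
  (3, '1993-05-01', '1993-05-03');
""")

original_sql = """
select o_orderpriority, count(*) from orders
where exists (select * from lineitem
              where l_orderkey = o_orderkey and l_commitdate < l_receiptdate)
group by o_orderpriority
"""
# "distinct" keeps the join from duplicating orders that have several matching line items.
rewritten_sql = """
select t.o_orderpriority, count(*) from orders as t
inner join (select distinct l_orderkey from lineitem
            where l_commitdate < l_receiptdate) as t1
  on t.o_orderkey = t1.l_orderkey
group by t.o_orderpriority
"""
```

Latency before and after rewriting could be compared in the same harness by timing the two `execute` calls, which corresponds to the execution-efficiency goal.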
In an alternative embodiment, the statement rewriting task can be executed on the query statement to be rewritten, so that the query statement to be rewritten is rewritten into the target query statement. Optionally, the statement rewriting task can be executed on the query statement to be rewritten by an executor of the database, and after the target query statement is obtained, the database is queried directly with the target query statement to obtain the required data, thereby improving the query efficiency.
Through the above steps, a query statement to be rewritten and a task description are obtained, wherein the task description is used to describe the task of rewriting the query statement to be rewritten; a query statement database is searched based on the query statement to be rewritten to obtain an example query statement matched with the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between the other query statements in the query statement database and the query statement to be rewritten; a statement rewriting task of the query statement to be rewritten is generated based on the task description and the example query statement; and the statement rewriting task is executed on the query statement to be rewritten to obtain the target query statement, thereby improving the rewriting efficiency of the query statement. It is easy to note that the rewriting process can be guided by the example query statement matched with the query statement to be rewritten, so that the statement rewriting task generated from the task description and the example query statement fits more closely the meaning the query statement to be rewritten is intended to express, which improves the rewriting accuracy. The whole rewriting process requires no manual participation, so the rewriting efficiency can be further improved, which solves the technical problem in the related art that the rewriting efficiency of query statements is low.
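Steps S202 to S208 can be sketched as a small pipeline. This is an illustrative skeleton only: the three callables stand in for the retrieval component, the large language model and the database rule executor, and every function name and stub behaviour here is hypothetical.

```python
from typing import Callable


def rewrite_query(
    query: str,
    task_description: str,
    retrieve_example: Callable[[str], str],
    generate_task: Callable[[str, str], str],
    execute_task: Callable[[str, str], str],
) -> str:
    """Run S204 -> S206 -> S208 for inputs obtained in S202."""
    example = retrieve_example(query)                # S204: matched example query
    task = generate_task(task_description, example)  # S206: statement rewriting task
    return execute_task(query, task)                 # S208: target query statement


# Stub components that only record the order in which they are called.
trace: list[str] = []


def stub_retrieve(query: str) -> str:
    trace.append("retrieve")
    return "select a from t where a > 0"


def stub_generate(description: str, example: str) -> str:
    trace.append("generate")
    return f"{description} | example: {example}"


def stub_execute(query: str, task: str) -> str:
    trace.append("execute")
    return query.strip().rstrip(";") + ";"


target = rewrite_query(
    "select a from t ", "reduce latency", stub_retrieve, stub_generate, stub_execute
)
```

Passing the components in as callables keeps the flow testable without a model or a database, and mirrors the note that the method may run on either the server or the client device.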
In the above embodiment of the present application, generating a statement rewriting task of a query statement to be rewritten based on a task description and an example query statement includes: inputting the task description and the example query statement into a large language model to guide the large language model to output the statement rewriting task.
The large language model is an artificial intelligence model based on deep learning that can understand and generate natural language text. The large language model may receive text input and then generate new text, such as a rewritten query statement, based on the input instructions or prompts. By understanding the grammar, semantics and logic of the language, the large language model may be used to generate the statement rewriting task, which in turn guides the rewriting of the query statement to be rewritten.
In an alternative embodiment, the task description and the example query statement may be input into a large language model, and the large language model may generate a rewriting scheme for the query statement to be rewritten, that is, the statement rewriting task described above, according to the instruction of the task description and with reference to the example query statement. This further improves the rewriting efficiency of the query statement to be rewritten. Because the statement rewriting task is generated under the guidance of the task description and with reference to the example query statement, the accuracy of the statement rewriting task is improved, and so is the accuracy of the resulting target query statement.
In the above embodiment of the present application, searching a query statement database based on a query statement to be rewritten to obtain an example query statement matched with the query statement to be rewritten includes: inputting the query statement to be rewritten into a contrastive learning model, so that the contrastive learning model retrieves, from the query statement database, the example query statement matched with the query statement to be rewritten.
The contrastive learning model is mainly used for retrieving an example query statement matched with a query statement to be rewritten from a query statement database.
In an alternative embodiment, the example query statement with the highest similarity to the query statement to be rewritten can be selected from the query statement database by the trained contrastive learning model, so that a reference is provided for the large language model when it generates the statement rewriting task. The rewriting method given by the statement rewriting task can then fit the original query statement more closely, which improves the rewriting effect.
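One way to assemble the model input is sketched below. The prompt layout, the section labels and the function name `build_prompt` are assumptions made for illustration and are not specified by the application. No model is called; the function only builds the text that would be sent.

```python
def build_prompt(task_description: str, example_query: str, query: str) -> str:
    """Combine the task description, one example and the input query into a prompt."""
    sections = [
        ("Task description", task_description),
        ("Example query statement", example_query),
        ("Query statement to be rewritten", query),
    ]
    # Each section is a labelled block; blank lines separate the blocks.
    body = "\n\n".join(f"### {title}\n{text.strip()}" for title, text in sections)
    return body + "\n\n### Statement rewriting task\n"


prompt = build_prompt(
    "Choose rewrite rules that reduce query latency while keeping results identical.",
    "select t1.a as A, t2.b as B from t1, t2 where t1.d = t2.d",
    "select t1.a as A, t2.b as B from t1, t2 where t1.c0 = t2.c1",
)
```

Ending the prompt with an empty "Statement rewriting task" section invites the model to complete that section, which is a common prompting convention rather than a requirement.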
In the above embodiment of the present application, the method further includes: obtaining a sample query statement to be rewritten, a positive example sample query statement of the sample query statement to be rewritten and a negative example sample query statement of the sample query statement to be rewritten, wherein the similarity between the positive example sample query statement and the sample query statement to be rewritten is larger than a first preset similarity, the similarity between the negative example sample query statement and the sample query statement to be rewritten is smaller than a second preset similarity, and the first preset similarity is larger than the second preset similarity; and updating model parameters of the contrastive learning model based on the sample query statement to be rewritten, the positive example sample query statement and the negative example sample query statement.
The positive example sample query statement may be a sample query statement having a high similarity to the sample query statement to be rewritten. The negative example sample query statement may be a sample query statement having a low similarity to the sample query statement to be rewritten.
The first preset similarity may be a preset higher similarity, and the second preset similarity may be a preset lower similarity.
In an alternative embodiment, a sample query statement to be rewritten, a positive example sample query statement of the sample query statement to be rewritten and a negative example sample query statement of the sample query statement to be rewritten may be obtained. The sample query statement to be rewritten and the positive example sample query statement may be pulled closer, so that the similarity between them becomes higher; the sample query statement to be rewritten and the negative example sample query statement may be pushed apart, so that the similarity between them becomes lower. The retrieval effect of the contrastive learning model is thereby improved, and the contrastive learning model can retrieve the example query statement that best matches the query statement to be rewritten from among a plurality of example query statements whose similarities differ only slightly.
In the above embodiment of the present application, updating model parameters of the contrastive learning model based on a sample query statement to be rewritten, a positive example sample query statement, and a negative example sample query statement includes: inputting the sample query statement to be rewritten and the positive example sample query statement into the contrastive learning model to obtain a first initial similarity, output by the contrastive learning model, between the sample query statement to be rewritten and the positive example sample query statement; inputting the sample query statement to be rewritten and the negative example sample query statement into the contrastive learning model to obtain a second initial similarity, output by the contrastive learning model, between the sample query statement to be rewritten and the negative example sample query statement; constructing a loss function based on the first initial similarity and the second initial similarity; and updating model parameters of the contrastive learning model by using the loss function.
In an alternative embodiment, the sample query statement to be rewritten and the positive example sample query statement can be input into the contrastive learning model to obtain a first initial similarity, and the sample query statement to be rewritten and the negative example sample query statement can be input into the contrastive learning model to obtain a second initial similarity. A loss function is constructed from the first initial similarity and the second initial similarity so as to raise the similarity between the sample query statement to be rewritten and the positive example sample query statement and to lower the similarity between the sample query statement to be rewritten and the negative example sample query statement. Model parameters of the contrastive learning model are updated according to the constructed loss function, so that the resulting contrastive learning model outputs a higher similarity for sample query statements that are more similar and a lower similarity for sample query statements that are less similar, and can therefore better distinguish how similar each sample query statement is to the sample query statement to be rewritten.
In the above embodiment of the present application, constructing the loss function based on the first initial similarity and the second initial similarity includes: constructing a positive loss function based on an error between the first initial similarity and a first preset similarity; constructing a negative loss function based on an error between the second initial similarity and a second preset similarity; and determining the loss function based on the positive loss function and the negative loss function.
The positive loss function is used to indicate that, when the similarity between the sample query statement to be rewritten and the positive example sample query statement is subsequently determined, that similarity should be further increased, so that the purpose of raising the similarity is achieved.
The negative loss function is used to indicate that, when the similarity between the sample query statement to be rewritten and the negative example sample query statement is subsequently determined, that similarity should be further reduced, so that the purpose of lowering the similarity is achieved.
The first preset similarity is larger than the first initial similarity, and the second preset similarity is smaller than the second initial similarity.
By constructing the positive loss function and the negative loss function separately, the similarity between sample query statements with higher similarity is further increased, and the similarity between sample query statements with lower similarity is further reduced.
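The loss construction above can be illustrated numerically. The application only states that each term is based on an error against its preset similarity and that the two terms are combined. The squared error and the plain sum used below are assumptions, as are the function names and the default preset values 0.9 and 0.1.

```python
def positive_loss(first_initial: float, first_preset: float) -> float:
    """Error between the model's positive-pair similarity and its preset target."""
    return (first_preset - first_initial) ** 2  # squared error: an assumed choice


def negative_loss(second_initial: float, second_preset: float) -> float:
    """Error between the model's negative-pair similarity and its preset target."""
    return (second_initial - second_preset) ** 2  # squared error: an assumed choice


def total_loss(
    first_initial: float,
    second_initial: float,
    first_preset: float = 0.9,   # hypothetical first preset similarity
    second_preset: float = 0.1,  # hypothetical second preset similarity
) -> float:
    """Combine both terms; an unweighted sum is assumed here."""
    return positive_loss(first_initial, first_preset) + negative_loss(
        second_initial, second_preset
    )


# A model that separates the pairs better receives a smaller loss.
weak = total_loss(first_initial=0.6, second_initial=0.4)
strong = total_loss(first_initial=0.8, second_initial=0.2)
```

Minimising this quantity raises the first initial similarity toward the first preset similarity and lowers the second initial similarity toward the second preset similarity, which matches the stated purpose of the two terms.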
In the above embodiment of the present application, obtaining a sample query statement to be rewritten, a positive example sample query statement of the sample query statement to be rewritten, and a negative example sample query statement of the sample query statement to be rewritten includes: acquiring a plurality of groups of training data and the current training stage of the contrastive learning model; determining target training data from the plurality of groups of training data based on the current training stage, wherein the target training data is at least one group of training data, among the plurality of groups of training data, that has not yet participated in the training process of the contrastive learning model; and determining the sample query statement to be rewritten, the positive example sample query statement, and the negative example sample query statement from the target training data.
In an alternative embodiment, the training data in the training data set may be grouped according to a training phase to obtain multiple sets of training data.
The current training stage may be any training stage of the contrastive learning model.
The number of groups of training data may be determined according to the training rounds; for example, one training round may use one group of training data, or one training round may use a plurality of groups of training data. The specific training mode is not limited.
The target training data may be training data that is easier for the contrastive learning model to learn in the current training stage, or training data that meets other training conditions; the target training data is not specifically limited here.
In an alternative embodiment, at least one group of training data matched with the current training stage, that is, the target training data, may be determined from the plurality of groups of training data according to the current training stage. After the current training stage has been trained using the target training data, training data other than the target training data may be used in subsequent training stages, and in the last training stage the remaining training data may be added. In this way the training of the contrastive learning model can proceed more smoothly with limited data, and the number of training rounds can be effectively reduced.
In the above embodiment of the present application, the method further includes: outputting an example query statement matched with the query statement to be rewritten; receiving an adjusted query statement corresponding to the example query statement, wherein the adjusted query statement is a statement obtained by adjusting the example query statement; and generating a statement rewriting task of the query statement to be rewritten based on the task description and the adjusted query statement.
In an alternative embodiment, an example query statement matched with the query statement to be rewritten can be output to a client of a user, so that the user can check whether the example query statement needs to be adjusted through the client, if the example query statement needs to be adjusted, the user can adjust the example query statement to obtain an adjusted query statement, and a statement rewriting task of the query statement to be rewritten is generated based on the adjusted query statement and task description, so that accuracy of the statement rewriting task is improved. If the user considers that the example query statement is accurate, the example query statement may not need to be adjusted, or the example query statement may be fine-tuned to obtain an adjusted query statement, so as to generate a statement rewriting task of the query statement to be rewritten according to the adjusted query statement and the task description.
In the above embodiment of the present application, the method further includes: outputting the statement rewriting task; receiving an adjusted rewriting task obtained by adjusting the statement rewriting task; and rewriting the query statement to be rewritten based on the adjusted rewriting task to obtain the target query statement.
In an alternative embodiment, the statement rewriting task may be output to the client of the user, so that the user can check through the client whether the statement rewriting task is accurate. If the user considers the statement rewriting task inaccurate, the statement rewriting task may be adjusted to obtain the adjusted rewriting task. If the user considers the statement rewriting task accurate, the statement rewriting task need not be adjusted, or it may be fine-tuned to obtain an adjusted rewriting task. The rewriting operation is then performed on the query statement to be rewritten according to the statement rewriting task or the adjusted rewriting task to obtain the target query statement.
The application comprises three parts. The first part is an automatic query statement rewriting system combined with a large language model. The second part uses contrastive learning to select a suitable example query statement for the large language model, so as to assist the large language model in generating a rewriting task. The third part uses curriculum learning to make the contrastive learning process smoother and more efficient with limited data, so that the large language model achieves a better effect.
Fig. 3 is a schematic diagram of a query statement rewriting system based on a large language model according to an embodiment of the present application. As shown in fig. 3, a query statement to be rewritten (Input Query) may be input; a fixed instruction (Fixed Instructions) may be selected from the system instructions (System Instructions) according to the candidate rules (Candidate Rules); and a rewriting example (One-shot Demonstration) may be selected from an example pool (Demonstration Pool). The large language model (Large Language Model) may then give a statement rewriting method according to the provided task description and rewriting example, that is, select rules (Selected Rules), for example the aggregate merge rule (AGGREGATE _process_merge). The rules selected by the large model are proposed (Propose) to a database-based rule executor (DB-Based Rule Executer), which rewrites (Rewrite) the query statement and outputs the rewritten query statement. As shown below, the statement structure of the bolded portion is successfully rewritten by the database platform using the scheme generated by the large language model.
The query statement to be rewritten may be:
select o_orderpriority, count(*) as order_count from orders
where
o_orderdate >= date '1993-05-01' and o_orderdate < date '1993-05-01' + interval '3' month
and exists (
select * from lineitem
where
l_orderkey = o_orderkey and l_commitdate < l_receiptdate
)
group by
o_orderpriority
order by
o_orderpriority;
The rewritten query statement may be:
select t.o_orderpriority, count(*) as order_count from (
select * from orders
where
o_orderdate >= date '1993-05-01' and o_orderdate < (date '1993-05-01' + interval '3' month)
) as t inner join (
select l_orderkey, TRUE as to from lineitem
where
l_commitdate < l_receiptdate
) as t1 on t.o_orderkey = t1.l_orderkey
group by
t.o_orderpriority
order by
t.o_orderpriority;
FIG. 4 is a schematic diagram of a contrastive learning framework according to an embodiment of the present application. As shown in FIG. 4, in order to improve the quality of the rewriting method given by the large language model, the rewriting system provides a rewriting example to guide the large language model in rewriting. A positive example query statement and a negative example query statement obtained through preprocessing can be paired with the query statement to be rewritten. In contrastive learning, the model is trained to increase the similarity between the query statement to be rewritten and the positive example query statement while reducing the similarity between the query statement to be rewritten and the negative example query statement. In this way, the model can be trained to select, for a given query statement, the example query statement most similar to it from the sample statement library, so that flexible and well-fitting example query statements are provided as input to the large language model and the rewriting effect is improved.
Specifically, the original queries (Original Queries) may be:
select t1.a as A, t2.b as B from t1, t2 where t1.c0 = t2.c1
select t3.a as A from t1 inner join t2 as t3 on t1.d = t2.d where t3.m = 'M' limit 100
In the contrastive queries (Contrastive Queries), the improvement demonstrations (Improve Demos) may be:
select t1.a as A, t2.b as B from t1, t2 where t1.d = t2.d
select t3.a as A from t1 inner join t2 as t3 on t1.b = t2.b where t3.m = 'N' limit 100
The regression demonstrations (Regress Demos) may be:
select t1.a as A, t2.b as B from t1 inner join t2 on t1.d = t2.d asc
select count(t3.a) from t1 inner join t5 as t3 on t1.b = t5.b where t1.m = 'N' and t5.k = 'K' limit 100
Further, a positive example and a negative example may be determined from the contrastive queries. The positive example query statement may be:
select t1.a as A, t2.b as B from t1, t2 where t1.d = t2.d
The negative example query statement may be any of:
select t3.a as A from t1 inner join t2 as t3 on t1.b = t2.b where t3.m = 'N' limit 100
select t1.a as A, t2.b as B from t1 inner join t2 on t1.d = t2.d asc
select count(t3.a) from t1 inner join t5 as t3 on t1.b = t5.b where t1.m = 'N' and t5.k = 'K' limit 100
Because the preprocessed data is limited, a curriculum learning framework can be designed to make the contrastive learning process smoother and more efficient with limited data. After N groups of training data are obtained from preprocessing, the contrastive learning model can be trained in K stages. In each stage, the N/K groups of data that are easiest for the contrastive learning model at that stage can be selected from the training data not yet selected and added to the training data of that stage to train the contrastive learning model. The process is repeated until the last stage, in which all remaining data is added for training. This makes the contrastive learning process smoother with limited data and allows fewer training rounds to be used. A learning method that combines contrastive learning and curriculum learning on the training data can thus provide flexible and well-fitting example query statements as input to the large language model, which improves the rewriting effect for the query statement to be rewritten.
FIG. 5 is a schematic diagram of a curriculum learning framework for smooth training according to an embodiment of the present application. As shown in FIG. 5, when a suitable example query statement is to be selected from the training data, the training data may be selected by combining a contrastive selector (Contrastive Selector) with a curriculum design method (Generate Curriculum). The contrastive selector may be the contrastive learning model, and in each training stage the training data corresponding to that stage may be selected, so that the trained contrastive selector can find a suitable example query statement. The large language model may then, according to the example query statement in the example manager, instruct the database executor to rewrite the query statement to be rewritten.
The curriculum design method is a method for training a machine learning model that helps the model learn and generalize more effectively by gradually increasing the difficulty and complexity of the training data. For example, suppose a language model needs to be trained to generate news headlines. With the curriculum design method, the model can first learn to generate simple news headlines, such as 'clear weather today', and the difficulty can then be gradually increased so that the model learns to generate more complex and challenging news headlines, such as 'scientists find new materials'. By gradually increasing the difficulty of the training samples, the model can better adapt to different types of news headlines, and the accuracy and diversity of the generated headlines are improved. Such a training method helps the model better understand and learn different types of data and improves its generalization ability. Through curriculum learning, the model can gradually improve its ability during training, avoiding the learning difficulty or overfitting caused by data that is too complex or too hard. The method helps the model generalize better to new data and improves the performance and stability of the model.
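The staged selection described above (N groups, K stages, N/K newly added groups per stage, with the remainder added in the last stage) can be sketched as follows. The `difficulty` callback is a hypothetical placeholder for however the current model scores how hard a group is, and the function name `curriculum_stages` is likewise an assumption.

```python
from typing import Callable, TypeVar

G = TypeVar("G")


def curriculum_stages(
    groups: list[G],
    num_stages: int,
    difficulty: Callable[[G, int], float],
) -> list[list[G]]:
    """Return the cumulative training set used in each of the K stages."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    per_stage = len(groups) // num_stages  # N/K groups added per stage
    remaining = list(groups)
    selected: list[G] = []
    stages: list[list[G]] = []
    for stage in range(num_stages):
        if stage == num_stages - 1:
            # Last stage: add all remaining data.
            chosen, remaining = remaining, []
        else:
            # Re-rank unselected groups by how easy they are for the current stage.
            remaining.sort(key=lambda group: difficulty(group, stage))
            chosen, remaining = remaining[:per_stage], remaining[per_stage:]
        selected = selected + chosen
        stages.append(list(selected))
    return stages


# Toy run: each group is just a number that doubles as its difficulty.
plan = curriculum_stages([5, 1, 4, 2, 3, 6, 7], num_stages=3,
                         difficulty=lambda group, stage: group)
```

In an actual training loop the model would be trained on each cumulative set in turn, and the difficulty scores would be recomputed with the model produced by the previous stage.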
In the application, when the curriculum design method is applied, the contrastive selector can first be trained on relatively simple training data selected from the training data, and in subsequent training stages the contrastive selector can continue to be trained with relatively difficult training data added, so that the model gradually improves its ability to select example query statements during training and the accuracy of the model is improved.
The large language model can reason on its own and propose a rewriting scheme which, compared with the time-cost estimation methods of traditional databases, fits the query statement and the specific data more closely. The rewritten query statements output by the rewriting system are significantly more efficient than those output by other rewriting systems. In addition, combining a large language model improves the generality of the rewriting system, so that query statements can be rewritten for different databases and different data. Because the large language model is not limited by time-cost estimation of the statement, it can provide more flexible rewriting schemes, and compared with traditional methods it achieves a greater efficiency improvement and better rewriting quality.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus a necessary general hardware platform, but that it may also be implemented by means of hardware. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method of the various embodiments of the present application.
Example 2
There is also provided, in accordance with an embodiment of the present application, a method of rewriting query statements, it being noted that the steps shown in the flowchart of the figures may be performed in a computer system, such as a set of computer executable instructions, and, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in an order other than this.
Fig. 6 is a flowchart of a query statement rewriting method according to embodiment 2 of the present application, and as shown in fig. 6, the method includes the steps of:
Step S602, acquiring a query statement to be rewritten and task description by calling a first interface;
The first interface comprises a first parameter, and the parameter value of the first parameter comprises the query statement to be rewritten and the task description, wherein the task description describes the task of rewriting the query statement to be rewritten.
The first interface may be an interface for performing data interaction between the cloud server and the client, and the client may transmit the query statement to be rewritten and the task description into an interface function as a first parameter of the interface function, so as to achieve the purpose of uploading the query statement to be rewritten and the task description to the cloud server.
Step S604, searching a query statement database based on the query statement to be rewritten to obtain an example query statement matched with the query statement to be rewritten;
The similarity between the example query statement and the query statement to be rewritten is greater than the similarity between other query statements except the example query statement in the query statement database and the query statement to be rewritten;
Step S606, generating a statement rewriting task of the query statement to be rewritten based on the task description and the example query statement;
Step S608, executing a statement rewriting task on the query statement to be rewritten to obtain a target query statement;
Step S610, outputting the target query statement by calling the second interface.
The second interface comprises a second parameter, and the parameter value of the second parameter comprises a target query statement.
The second interface may be an interface for performing data interaction between the cloud server and the client, where the cloud server may transmit the target query statement into the interface function as a second parameter of the interface function, so as to achieve the purpose of issuing the target query statement to the client.
Through the above steps, the query statement to be rewritten and the task description are obtained by calling the first interface, wherein the first interface comprises a first parameter, and the parameter value of the first parameter comprises the query statement to be rewritten and the task description, the task description describing the task of rewriting the query statement to be rewritten; the query statement database is retrieved based on the query statement to be rewritten to obtain an example query statement matched with the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten; a statement rewriting task of the query statement to be rewritten is generated based on the task description and the example query statement; the statement rewriting task is executed on the query statement to be rewritten to obtain a target query statement; and the target query statement is output by calling the second interface, wherein the second interface comprises a second parameter, and the parameter value of the second parameter comprises the target query statement, thereby improving the rewriting efficiency of the query statement.
It is easy to note that the example query statement matched with the query statement to be rewritten can guide the rewriting process, so that the statement rewriting task generated from the task description and the example query statement fits more closely the meaning that the query statement to be rewritten is intended to express, which improves the rewriting accuracy. Because the whole rewriting process requires no manual participation, the rewriting efficiency is further improved, which solves the technical problem in the related art that the rewriting efficiency of query statements is low.
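The flow of steps S602 to S610 can be sketched as a single request handler. This is only an illustrative sketch: the function name `handle_rewrite_request`, the dictionary keys `query`, `task_description` and `target_query`, and the three injected callables are assumptions made for the example, since the application does not specify concrete interface signatures.

```python
from typing import Callable


def handle_rewrite_request(
    first_parameter: dict,
    retrieve: Callable[[str], str],
    generate_task: Callable[[str, str], str],
    execute_task: Callable[[str, str], str],
) -> dict:
    """Run steps S602 to S610 for one request and return the second parameter."""
    # S602: the first parameter carries the query statement and the task description.
    query = first_parameter["query"]
    task_description = first_parameter["task_description"]
    # S604: look up the best-matching example query statement.
    example = retrieve(query)
    # S606: build the statement rewriting task from the description and the example.
    task = generate_task(task_description, example)
    # S608: execute the task on the query statement to be rewritten.
    target_query = execute_task(task, query)
    # S610: the second parameter carries the target query statement.
    return {"target_query": target_query}
```

Passing the retrieval, task generation and execution steps in as callables keeps the sketch independent of any particular database, model or transport.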
It should be noted that the preferred implementations in the above example are the same as those provided in Example 1 in terms of scheme, application scenario and implementation process, but are not limited to the scheme provided in Example 1.
Example 3
According to an embodiment of the present application, there is further provided a query statement rewriting apparatus for implementing the foregoing query statement rewriting method, and fig. 7 is a schematic diagram of a query statement rewriting apparatus according to an embodiment of the present application, as shown in fig. 7, where the apparatus 700 includes: an acquisition module 702, a retrieval module 704, a generation module 706, an execution module 708.
The acquisition module is used for acquiring a query statement to be rewritten and a task description, wherein the task description describes the task of rewriting the query statement to be rewritten; the retrieval module is used for retrieving the query statement database based on the query statement to be rewritten to obtain an example query statement matched with the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten; the generating module is used for generating a statement rewriting task of the query statement to be rewritten based on the task description and the example query statement; and the execution module is used for executing the statement rewriting task on the query statement to be rewritten to obtain the target query statement.
It should be noted that the above acquisition module 702, retrieval module 704, generation module 706 and execution module 708 correspond to steps S202 to S208 in Example 1; the four modules share the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of Example 1. It should be noted that the above modules or units may be hardware components, or software components stored in a memory and processed by one or more processors, and the above modules may also run as part of the apparatus in the server 10 provided in Example 1.
In the above embodiment of the present application, the generating module is further configured to input the task description and the example query statement into the large language model, so as to direct the large language model to output the statement rewriting task.
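One way to picture how the task description and the example query statement are combined into a model input is the sketch below. The prompt wording in `PROMPT_TEMPLATE`, the function names, and the `llm` callable (any function that maps a prompt string to a response string) are hypothetical; the application does not disclose a concrete prompt format or model API.

```python
from typing import Callable

# Hypothetical prompt layout; the real wording is not given in the source.
PROMPT_TEMPLATE = (
    "Task description:\n{task_description}\n\n"
    "Example query statement:\n{example_query}\n\n"
    "Write a statement rewriting task for the query statement to be rewritten."
)


def build_prompt(task_description: str, example_query: str) -> str:
    """Fill the template with the task description and the example query."""
    return PROMPT_TEMPLATE.format(
        task_description=task_description.strip(),
        example_query=example_query.strip(),
    )


def generate_rewrite_task(
    llm: Callable[[str], str], task_description: str, example_query: str
) -> str:
    """Ask the model for the statement rewriting task and return its output."""
    return llm(build_prompt(task_description, example_query))
```

Because `llm` is just a callable, the same code can be exercised with a stub function in tests and with a real model client in deployment.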
In the above embodiment of the present application, the search module is further configured to input the query sentence to be rewritten into the comparison learning model, so as to obtain an example query sentence matched with the query sentence to be rewritten from the query sentence database by using the comparison learning model.
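The retrieval step can be illustrated with a nearest-neighbour search over embeddings. This sketch assumes that the comparison learning model is exposed as an `encode` callable returning a vector for a query statement and that similarity is measured by cosine similarity; both are assumptions for illustration, as the application does not fix the similarity measure or the model interface.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_example(
    query: str,
    database: Sequence[str],
    encode: Callable[[str], Sequence[float]],
) -> str:
    """Return the database statement whose embedding is closest to the query's."""
    if not database:
        raise ValueError("query statement database is empty")
    query_vector = encode(query)
    return max(
        database,
        key=lambda statement: cosine_similarity(query_vector, encode(statement)),
    )
```

In practice the database embeddings would usually be computed once and cached rather than re-encoded on every request.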
In the above embodiment of the present application, the apparatus further includes: an update module.
The acquisition module is further used for acquiring a sample query statement to be rewritten, a positive example sample query statement of the sample query statement to be rewritten and a negative example sample query statement of the sample query statement to be rewritten, wherein the similarity between the positive example sample query statement and the sample query statement to be rewritten is larger than the first initial similarity, the similarity between the negative example sample query statement and the sample query statement to be rewritten is smaller than the second initial similarity, and the first initial similarity is larger than the second initial similarity; the updating module is further used for updating model parameters of the comparison learning model based on the sample query statement to be rewritten, the positive example sample query statement and the negative example sample query statement.
In the above embodiment of the present application, the update module is further configured to: input the sample query statement to be rewritten and the positive example sample query statement into the comparison learning model to obtain a first initial similarity, output by the comparison learning model, between the sample query statement to be rewritten and the positive example sample query statement; input the sample query statement to be rewritten and the negative example sample query statement into the comparison learning model to obtain a second initial similarity, output by the comparison learning model, between the sample query statement to be rewritten and the negative example sample query statement; construct a loss function based on the first initial similarity and the second initial similarity; and update the model parameters of the comparison learning model by using the loss function.
In the above embodiment of the present application, the update module is further configured to: construct a positive loss function based on the error between the first initial similarity and a first preset similarity; construct a negative loss function based on the error between the second initial similarity and a second preset similarity; and determine the loss function based on the positive loss function and the negative loss function.
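The construction of the loss from the two similarities can be sketched numerically. The application only states that each loss is based on an error against a preset similarity, so the squared error, the default presets of 1.0 and 0.0, the weighted sum, and the `negative_weight` parameter below are all assumptions chosen for illustration.

```python
def positive_loss(first_similarity: float, first_preset: float = 1.0) -> float:
    """Squared error between the positive-pair similarity and its preset value."""
    return (first_similarity - first_preset) ** 2


def negative_loss(second_similarity: float, second_preset: float = 0.0) -> float:
    """Squared error between the negative-pair similarity and its preset value."""
    return (second_similarity - second_preset) ** 2


def combined_loss(
    first_similarity: float,
    second_similarity: float,
    first_preset: float = 1.0,
    second_preset: float = 0.0,
    negative_weight: float = 1.0,
) -> float:
    """Combine both terms; a weighted sum is one possible choice."""
    return positive_loss(first_similarity, first_preset) + negative_weight * negative_loss(
        second_similarity, second_preset
    )
```

Under these assumptions the loss is zero exactly when the positive pair reaches the first preset similarity and the negative pair reaches the second, so minimizing it pulls positive pairs together and pushes negative pairs apart.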
In the above embodiment of the present application, the acquisition module is further configured to: obtain multiple groups of training data and the current training stage of the comparison learning model; determine target training data from the multiple groups of training data based on the current training stage, wherein the target training data is at least one group of training data, among the multiple groups, that has not yet participated in the training of the comparison learning model; and determine the sample query statement to be rewritten, the positive example sample query statement and the negative example sample query statement from the target training data.
In the above embodiment of the present application, the apparatus further includes: an output module and a receiving module.
The output module is used for outputting the example query statement matched with the query statement to be rewritten; the receiving module is used for receiving an adjustment query statement corresponding to the example query statement, wherein the adjustment query statement is a statement obtained by adjusting the example query statement; and the generating module is used for generating the statement rewriting task of the query statement to be rewritten based on the task description and the adjustment query statement.
In the above embodiment of the present application, the apparatus further includes: a rewriting module.
The output module is used for outputting a sentence rewriting task; the receiving module is used for receiving an adjustment rewriting task obtained by adjusting the sentence rewriting task; the rewriting module is used for rewriting the query statement to be rewritten based on the adjustment rewriting task to obtain the target query statement.
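The two adjustment points described above (adjusting the example query statement, then adjusting the statement rewriting task) can be sketched as optional review callbacks. The names `apply_adjustment` and `build_task_with_review`, the `Reviewer` type, and the rule that a reviewer returning `None` means "keep the proposal unchanged" are assumptions for illustration only.

```python
from typing import Callable, Optional

# A reviewer sees a proposal and returns an adjusted version, or None to accept it.
Reviewer = Callable[[str], Optional[str]]


def apply_adjustment(proposed: str, reviewer: Reviewer) -> str:
    """Output `proposed` to the reviewer and return the adjusted text, if any."""
    adjusted = reviewer(proposed)
    return proposed if adjusted is None else adjusted


def build_task_with_review(
    task_description: str,
    example_query: str,
    generate_task: Callable[[str, str], str],
    review_example: Reviewer,
    review_task: Reviewer,
) -> str:
    """Generate the rewriting task, allowing adjustment before and after generation."""
    adjusted_example = apply_adjustment(example_query, review_example)
    task = generate_task(task_description, adjusted_example)
    return apply_adjustment(task, review_task)
```

The returned task would then be handed to the rewriting module to rewrite the query statement to be rewritten.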
It should be noted that the preferred implementations in the above example are the same as those provided in Example 1 in terms of scheme, application scenario and implementation process, but are not limited to the scheme provided in Example 1.
Example 4
According to an embodiment of the present application, there is further provided a query statement rewriting apparatus for implementing the foregoing query statement rewriting method, and fig. 8 is a schematic diagram of a query statement rewriting apparatus according to an embodiment of the present application, as shown in fig. 8, where the apparatus 800 includes: an acquisition module 802, a retrieval module 804, a generation module 806, an execution module 808, and an output module 810.
The acquisition module is used for acquiring the query statement to be rewritten and the task description by calling the first interface, wherein the first interface comprises a first parameter, and the parameter value of the first parameter comprises the query statement to be rewritten and the task description, the task description describing the task of rewriting the query statement to be rewritten; the retrieval module is used for retrieving the query statement database based on the query statement to be rewritten to obtain an example query statement matched with the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten; the generating module is used for generating a statement rewriting task of the query statement to be rewritten based on the task description and the example query statement; the execution module is used for executing the statement rewriting task on the query statement to be rewritten to obtain a target query statement; and the output module is used for outputting the target query statement by calling a second interface, wherein the second interface comprises a second parameter, and the parameter value of the second parameter comprises the target query statement.
It should be noted that the above acquisition module 802, retrieval module 804, generation module 806, execution module 808 and output module 810 correspond to steps S602 to S610 in Example 2; the five modules share the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of Example 1. It should be noted that the above modules or units may be hardware components, or software components stored in a memory and processed by one or more processors, and the above modules may also run as part of the apparatus in the server 10 provided in Example 1.
It should be noted that the preferred implementations in the above example are the same as those provided in Example 1 in terms of scheme, application scenario and implementation process, but are not limited to the scheme provided in Example 1.
Example 5
There is further provided, in accordance with an embodiment of the present application, a rewrite platform for a query statement including a rewrite apparatus for a query statement, fig. 9 is a schematic diagram of a rewrite platform for a query statement according to an embodiment of the present application, as shown in fig. 9, the platform 900 including: a retrieval node 902, a task generation node 904, and a database executor 906.
The retrieval node is used for acquiring the query statement to be rewritten and retrieving the query statement database based on the query statement to be rewritten to obtain an example query statement matched with the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between other query statements except the example query statement in the query statement database and the query statement to be rewritten; the task generating node is used for acquiring task description and generating a statement rewriting task of the query statement to be rewritten based on the task description and the example query statement, wherein the task description is used for describing the task of rewriting the query statement to be rewritten; and the database executor is used for rewriting the query statement to be rewritten based on the statement rewriting task to obtain the target query statement.
Example 6
Embodiments of the present application may provide an electronic device, which may be any one of a group of electronic devices. Alternatively, in this embodiment, the electronic device may be replaced by a terminal device such as a mobile terminal.
Alternatively, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of the computer network.
In this embodiment, the electronic device may execute the program code of the method.
Optionally, fig. 10 is a block diagram of an electronic device according to an embodiment of the present application. As shown in fig. 10, the electronic device A may include: one or more (only one is shown) processors 102, a memory 104, a memory controller, and a peripheral interface, where the peripheral interface is connected to a radio frequency module, an audio module, and a display.
The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the methods and apparatuses in the embodiments of the present application, and the processor executes the software programs and modules stored in the memory, thereby performing various functional applications and data processing, that is, implementing the methods in the embodiments described above. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located with respect to the processor, which may be connected to the electronic device A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor may call the information and the application program stored in the memory through the transmission device to perform the following steps: acquiring a query statement to be rewritten and a task description, wherein the task description describes the task of rewriting the query statement to be rewritten; retrieving a query statement database based on the query statement to be rewritten to obtain an example query statement matched with the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten; generating a statement rewriting task of the query statement to be rewritten based on the task description and the example query statement; and executing the statement rewriting task on the query statement to be rewritten to obtain the target query statement.
The processor may call the information and the application program stored in the memory through the transmission device to perform the following steps: acquiring a query statement to be rewritten and task description by calling a first interface, wherein the first interface comprises a first parameter, and a parameter value of the first parameter comprises the query statement to be rewritten and the task description which is used for representing the task description for rewriting the query statement to be rewritten; searching a query sentence database based on the query sentence to be rewritten to obtain an example query sentence matched with the query sentence to be rewritten, wherein the similarity between the example query sentence and the query sentence to be rewritten is greater than the similarity between other query sentences except the example query sentence in the query sentence database and the query sentence to be rewritten; generating a statement rewriting task of a query statement to be rewritten based on the task description and the example query statement; executing a statement rewriting task on the query statement to be rewritten to obtain a target query statement; and outputting the target query statement by calling a second interface, wherein the second interface comprises a second parameter, and the parameter value of the second parameter comprises the target query statement.
By adopting the embodiment of the present application, the query statement to be rewritten and the task description are acquired, wherein the task description describes the task of rewriting the query statement to be rewritten; the query statement database is retrieved based on the query statement to be rewritten to obtain an example query statement matched with the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten; a statement rewriting task of the query statement to be rewritten is generated based on the task description and the example query statement; and the statement rewriting task is executed on the query statement to be rewritten to obtain the target query statement, thereby improving the rewriting efficiency of the query statement. It is easy to note that the example query statement matched with the query statement to be rewritten can guide the rewriting process, so that the statement rewriting task generated from the task description and the example query statement fits more closely the meaning that the query statement to be rewritten is intended to express, which improves the rewriting accuracy. Because the whole rewriting process requires no manual participation, the rewriting efficiency is further improved, which solves the technical problem in the related art that the rewriting efficiency of query statements is low.
It will be appreciated by those skilled in the art that the structure shown in fig. 10 is merely illustrative, and the electronic device may be a terminal device such as a smart phone (e.g. an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a mobile internet device (Mobile Internet Devices, MID), a PAD, etc. Fig. 10 does not limit the structure of the electronic device. For example, the electronic device A may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 10, or have a different configuration from that shown in fig. 10.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing the hardware associated with a terminal device, the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, etc.
Example 7
Embodiments of the present application also provide a computer-readable storage medium. Alternatively, in the present embodiment, the computer-readable storage medium may be used to store the program code executed by the method provided in the above embodiment.
Alternatively, in this embodiment, the storage medium may be located in any one of the electronic devices in the group of electronic devices in the computer network, or in any one of the mobile terminals in the group of mobile terminals.
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: acquiring a query statement to be rewritten and task description by calling a first interface, wherein the first interface comprises a first parameter, and a parameter value of the first parameter comprises the query statement to be rewritten and the task description which is used for representing the task description for rewriting the query statement to be rewritten; searching a query sentence database based on the query sentence to be rewritten to obtain an example query sentence matched with the query sentence to be rewritten, wherein the similarity between the example query sentence and the query sentence to be rewritten is greater than the similarity between other query sentences except the example query sentence in the query sentence database and the query sentence to be rewritten; generating a statement rewriting task of a query statement to be rewritten based on the task description and the example query statement; executing a statement rewriting task on the query statement to be rewritten to obtain a target query statement; and outputting the target query statement by calling a second interface, wherein the second interface comprises a second parameter, and the parameter value of the second parameter comprises the target query statement.
By adopting the embodiment of the present application, the query statement to be rewritten and the task description are acquired, wherein the task description describes the task of rewriting the query statement to be rewritten; the query statement database is retrieved based on the query statement to be rewritten to obtain an example query statement matched with the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten; a statement rewriting task of the query statement to be rewritten is generated based on the task description and the example query statement; and the statement rewriting task is executed on the query statement to be rewritten to obtain the target query statement, thereby improving the rewriting efficiency of the query statement. It is easy to note that the example query statement matched with the query statement to be rewritten can guide the rewriting process, so that the statement rewriting task generated from the task description and the example query statement fits more closely the meaning that the query statement to be rewritten is intended to express, which improves the rewriting accuracy. Because the whole rewriting process requires no manual participation, the rewriting efficiency is further improved, which solves the technical problem in the related art that the rewriting efficiency of query statements is low.
Example 8
Embodiments of the present application also provide a computer program product. Alternatively, in the present embodiment, the computer program product may comprise a computer program which, when executed by a processor, implements the method provided by the above embodiment.
Example 9
Embodiments of the present application also provide a computer program product. Alternatively, the computer program product may comprise a non-volatile computer readable storage medium, which may be used for storing a computer program, which when executed by a processor implements the method provided by the above embodiments.
Example 10
Embodiments of the present application also provide a computer program. Optionally, in this embodiment, the above-mentioned computer program, when executed by a processor, implements the method provided in the above-mentioned embodiment.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described apparatus embodiments are merely exemplary; for example, the division of the units is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program code.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations are also intended to fall within the scope of protection of the present application.
Claims (14)
1. A method for rewriting a query statement, comprising:
acquiring a query statement to be rewritten and a task description, wherein the task description describes a task of rewriting the query statement to be rewritten;
searching a query statement database based on the query statement to be rewritten to obtain an example query statement matching the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten;
generating a statement rewriting task for the query statement to be rewritten based on the task description and the example query statement; and
executing the statement rewriting task on the query statement to be rewritten to obtain a target query statement.
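For illustration only, the four steps of claim 1 can be sketched in Python. The names `jaccard`, `retrieve_example`, `rewrite_query`, and the `rewriter` callable are hypothetical and do not appear in the application. Token-overlap (Jaccard) similarity is an assumed stand-in for whatever similarity measure an embodiment uses, and `rewriter` stands in for the component that executes the rewriting task.

```python
from typing import Callable, Sequence


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for the real measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def retrieve_example(query: str, database: Sequence[str]) -> str:
    """Return the stored statement most similar to the query (top-1 retrieval)."""
    return max(database, key=lambda stored: jaccard(query, stored))


def rewrite_query(
    query: str,
    task_description: str,
    database: Sequence[str],
    rewriter: Callable[[str, str], str],
) -> str:
    """Retrieve an example, build the rewriting task, and execute it."""
    example = retrieve_example(query, database)
    # The task text combines the task description with the retrieved example.
    task = f"{task_description}\nExample: {example}"
    return rewriter(task, query)
```

Note that `retrieve_example` raises `ValueError` on an empty database, so a caller would need to handle that case separately.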
2. The method of claim 1, wherein generating the statement rewriting task for the query statement to be rewritten based on the task description and the example query statement comprises:
inputting the task description and the example query statement into a large language model to guide the large language model to output the statement rewriting task.
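The step recited in claim 2 could look roughly like the following sketch. The prompt layout and the names `build_task_prompt` and `generate_rewrite_task` are assumptions made for illustration; `llm` is any callable that maps a prompt string to a completion string, not a specific vendor API.

```python
from typing import Callable


def build_task_prompt(task_description: str, example_query: str) -> str:
    """Combine the task description and the retrieved example into one prompt."""
    return (
        f"Task description:\n{task_description.strip()}\n\n"
        f"Example query statement:\n{example_query.strip()}\n\n"
        "Write a concrete rewriting task for the incoming query statement."
    )


def generate_rewrite_task(
    llm: Callable[[str], str],
    task_description: str,
    example_query: str,
) -> str:
    """Ask the language model for the statement rewriting task."""
    prompt = build_task_prompt(task_description, example_query)
    # Whitespace around the model output is trimmed before use.
    return llm(prompt).strip()
```

Passing the model as a plain callable keeps the sketch independent of any particular model client and makes it easy to substitute a stub in tests.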
3. The method of claim 1, wherein searching the query statement database based on the query statement to be rewritten to obtain the example query statement matching the query statement to be rewritten comprises:
inputting the query statement to be rewritten into a contrastive learning model, so that the contrastive learning model obtains, from the query statement database, the example query statement matching the query statement to be rewritten.
4. The method of claim 3, further comprising:
obtaining a sample query statement to be rewritten, a positive example sample query statement of the sample query statement to be rewritten, and a negative example sample query statement of the sample query statement to be rewritten, wherein the similarity between the positive example sample query statement and the sample query statement to be rewritten is greater than a first preset similarity, the similarity between the negative example sample query statement and the sample query statement to be rewritten is less than a second preset similarity, and the first preset similarity is greater than the second preset similarity; and
updating model parameters of the contrastive learning model based on the sample query statement to be rewritten, the positive example sample query statement, and the negative example sample query statement.
5. The method of claim 4, wherein updating model parameters of the contrastive learning model based on the sample query statement to be rewritten, the positive example sample query statement, and the negative example sample query statement comprises:
inputting the sample query statement to be rewritten and the positive example sample query statement into the contrastive learning model to obtain a first initial similarity, output by the contrastive learning model, between the sample query statement to be rewritten and the positive example sample query statement;
inputting the sample query statement to be rewritten and the negative example sample query statement into the contrastive learning model to obtain a second initial similarity, output by the contrastive learning model, between the sample query statement to be rewritten and the negative example sample query statement;
constructing a loss function based on the first initial similarity and the second initial similarity; and
updating the model parameters of the contrastive learning model by using the loss function.
6. The method of claim 5, wherein constructing the loss function based on the first initial similarity and the second initial similarity comprises:
constructing a positive loss function based on the error between the first initial similarity and the first preset similarity;
constructing a negative loss function based on the error between the second initial similarity and the second preset similarity; and
determining the loss function based on the positive loss function and the negative loss function.
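As a minimal sketch of the loss construction in claims 5 and 6, the following assumes that each "error" is a squared difference and that the two terms are combined as a weighted sum. Neither choice is stated in the claims. The function name and the weight parameters are hypothetical.

```python
def contrastive_pair_loss(
    pos_sim: float,
    neg_sim: float,
    pos_target: float,
    neg_target: float,
    pos_weight: float = 1.0,
    neg_weight: float = 1.0,
) -> float:
    """Combine a positive and a negative loss term into one loss value.

    pos_sim / neg_sim: first / second initial similarity from the model.
    pos_target / neg_target: first / second preset similarity.
    """
    if pos_target <= neg_target:
        # Claim 4 requires the first preset similarity to exceed the second.
        raise ValueError("pos_target must be greater than neg_target")
    positive_loss = (pos_sim - pos_target) ** 2  # assumed squared error
    negative_loss = (neg_sim - neg_target) ** 2  # assumed squared error
    return pos_weight * positive_loss + neg_weight * negative_loss
```

In a real training loop the same expression would be computed on tensors in an automatic-differentiation framework so that the gradient can update the model parameters; plain floats are used here only to keep the arithmetic visible.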
7. The method of claim 4, wherein obtaining the sample query statement to be rewritten, the positive example sample query statement of the sample query statement to be rewritten, and the negative example sample query statement of the sample query statement to be rewritten comprises:
acquiring multiple groups of training data and the current training stage of the contrastive learning model;
determining target training data from the multiple groups of training data based on the current training stage, wherein the target training data is at least one group of training data, among the multiple groups, that has not yet participated in the training of the contrastive learning model; and
determining the sample query statement to be rewritten, the positive example sample query statement, and the negative example sample query statement from the target training data.
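One way to read the stage-based selection of claim 7 is sketched below. The bookkeeping structure `history` (mapping each earlier stage number to the group indices it consumed), the `groups_per_stage` parameter, and both function names are assumptions made for illustration; the claim itself only requires that the selected groups have not yet participated in training.

```python
from typing import Dict, List, Tuple

# (sample to be rewritten, positive example, negative example)
Triple = Tuple[str, str, str]


def select_target_groups(
    groups: List[List[Triple]],
    current_stage: int,
    history: Dict[int, List[int]],
    groups_per_stage: int = 1,
) -> List[int]:
    """Return indices of groups not used by any stage before current_stage."""
    used = {
        index
        for stage, indices in history.items()
        if stage < current_stage
        for index in indices
    }
    unused = [i for i in range(len(groups)) if i not in used]
    return unused[:groups_per_stage]


def sample_triples(groups: List[List[Triple]], indices: List[int]) -> List[Triple]:
    """Flatten the selected groups into training triples."""
    return [triple for i in indices for triple in groups[i]]
```

For example, with three groups and `history = {0: [0], 1: [1]}`, stage 2 would be given group 2, the only group that earlier stages have not consumed.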
8. The method according to any one of claims 1 to 7, further comprising:
outputting the example query statement matching the query statement to be rewritten;
receiving an adjusted query statement corresponding to the example query statement, wherein the adjusted query statement is a statement obtained by adjusting the example query statement; and
generating the statement rewriting task for the query statement to be rewritten based on the task description and the adjusted query statement.
9. The method according to any one of claims 1 to 7, further comprising:
outputting the statement rewriting task;
receiving an adjusted rewriting task obtained by adjusting the statement rewriting task; and
rewriting the query statement to be rewritten based on the adjusted rewriting task to obtain the target query statement.
10. A query statement rewrite platform, comprising:
a search node, configured to acquire a query statement to be rewritten and to search a query statement database based on the query statement to be rewritten to obtain an example query statement matching the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten;
a task generating node, configured to acquire a task description and to generate a statement rewriting task for the query statement to be rewritten based on the task description and the example query statement, wherein the task description describes a task of rewriting the query statement to be rewritten; and
a database executor, configured to rewrite the query statement to be rewritten based on the statement rewriting task to obtain a target query statement.
11. A method for rewriting a query statement, comprising:
acquiring a query statement to be rewritten and a task description by calling a first interface, wherein the first interface comprises a first parameter, a parameter value of the first parameter comprises the query statement to be rewritten and the task description, and the task description describes a task of rewriting the query statement to be rewritten;
searching a query statement database based on the query statement to be rewritten to obtain an example query statement matching the query statement to be rewritten, wherein the similarity between the example query statement and the query statement to be rewritten is greater than the similarity between any other query statement in the query statement database and the query statement to be rewritten;
generating a statement rewriting task for the query statement to be rewritten based on the task description and the example query statement;
executing the statement rewriting task on the query statement to be rewritten to obtain a target query statement; and
outputting the target query statement by calling a second interface, wherein the second interface comprises a second parameter, and a parameter value of the second parameter comprises the target query statement.
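The two interfaces of claim 11 can be pictured as a request and a response type. The sketch below is only one possible shape: `RewriteRequest`, `RewriteResponse`, and `handle_rewrite` are hypothetical names, and `pipeline` stands in for the retrieval, task generation, and execution steps recited above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RewriteRequest:
    """First parameter: the query statement to be rewritten and the task description."""
    query: str
    task_description: str


@dataclass(frozen=True)
class RewriteResponse:
    """Second parameter: the target query statement."""
    target_query: str


def handle_rewrite(
    request: RewriteRequest,
    pipeline: Callable[[str, str], str],
) -> RewriteResponse:
    """Run the rewriting pipeline on the request and wrap the result."""
    target = pipeline(request.query, request.task_description)
    return RewriteResponse(target_query=target)
```

Keeping the input and output as immutable value objects makes the boundary of each interface explicit, whether the call is made in-process or over a network.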
12. An electronic device, comprising:
a memory storing an executable program;
a processor for executing the program, wherein the program when run performs the method of any of claims 1 to 11.
13. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored executable program, wherein the executable program when run controls a device in which the computer readable storage medium is located to perform the method according to any one of claims 1 to 11.
14. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410439161.4A CN118445300A (en) | 2024-04-11 | 2024-04-11 | Query sentence rewriting method, rewriting platform, electronic device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410439161.4A CN118445300A (en) | 2024-04-11 | 2024-04-11 | Query sentence rewriting method, rewriting platform, electronic device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118445300A (en) | 2024-08-06 |
Family ID: 92307903
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410439161.4A Pending CN118445300A (en) | 2024-04-11 | 2024-04-11 | Query sentence rewriting method, rewriting platform, electronic device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118445300A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3671526B1 (en) | Dependency graph based natural language processing | |
US11281864B2 (en) | Dependency graph based natural language processing | |
CN106649742A (en) | Database maintenance method and device | |
US20240338414A1 (en) | Inter-document attention mechanism | |
CN118170894B (en) | Knowledge graph question-answering method, knowledge graph question-answering device and storage medium | |
CN111159381B (en) | Data searching method and device | |
CN112507089A (en) | Intelligent question-answering engine based on knowledge graph and implementation method thereof | |
CN117725183A (en) | Reordering method and device for improving retrieval performance of AI large language model | |
CN118133972B (en) | Content retrieval generation method and device based on knowledge graph and storage medium | |
CN117271558A (en) | Language query model construction method, query language acquisition method and related devices | |
CN117932086A (en) | Method and system for reducing illusion of large language model by using external knowledge base check | |
CN116974554A (en) | Code data processing method, apparatus, computer device and storage medium | |
CN118364087A (en) | Database-based retrieval enhancement and question-answering method and system | |
CN118132732A (en) | Enhanced search user question and answer method, device, computer equipment and storage medium | |
CN117851445A (en) | Large language model Text2SQL chart generation method and device | |
CN116450855A (en) | Knowledge graph-based reply generation strategy method and system for question-answering robot | |
CN115146118B (en) | Information retrieval method, device, equipment and storage medium | |
CN116049376A (en) | Method, device and system for retrieving and replying information and creating knowledge | |
CN116304347A (en) | Git command recommendation method based on crowd-sourced knowledge | |
CN118445300A (en) | Query sentence rewriting method, rewriting platform, electronic device and storage medium | |
CN115858723A (en) | Query graph generation method and system for complex knowledge base question answering | |
CN114579605A (en) | Table question-answer data processing method, electronic equipment and computer storage medium | |
CN116414940A (en) | Standard problem determining method and device and related equipment | |
Gao et al. | Modular rag: Transforming rag systems into lego-like reconfigurable frameworks | |
CN118626626B (en) | Information processing method, apparatus, device, storage medium, and computer program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||