CN116701437A

CN116701437A - Data conversion method, data conversion system, electronic device, and readable storage medium

Info

Publication number: CN116701437A
Application number: CN202310980534.4A
Authority: CN
Inventors: 吕桓雪; 李昊阳; 李剑楠; 苏鹏; 黄炎; 陈书俊
Original assignee: Shanghai Aikesheng Information Technology Co ltd
Current assignee: Shanghai Aikesheng Information Technology Co ltd
Priority date: 2023-08-07
Filing date: 2023-08-07
Publication date: 2023-09-05
Anticipated expiration: 2043-08-07
Also published as: CN116701437B

Abstract

The invention provides a data conversion method, a data conversion system, an electronic device and a readable storage medium, comprising the following steps: acquiring a user query problem; selecting keywords in the user query questions; aiming at each keyword, acquiring a keyword vector corresponding to the keyword; obtaining candidate word vectors with vector distances within a preset range from the keyword vectors in a database, and obtaining candidate fields corresponding to all the candidate word vectors; selecting a first preset number of carefully chosen fields from the candidate fields by using a large language model; for each carefully chosen field, acquiring all candidate data corresponding to the carefully chosen field from the database; selecting a second preset number of carefully selected data corresponding to the query question from the candidate data by using a large language model; and generating an output result according to the carefully selected data. The invention can combine a large language model and a vector retrieval technology, and improves the accuracy and efficiency of data conversion.

Description

Data conversion method, data conversion system, electronic device, and readable storage medium

Technical Field

The present invention relates to database technologies, and in particular, to a data conversion method, a data conversion system, an electronic device, and a readable storage medium.

Background

Natural language is recognized as the best way of interaction in many fields. There is no general model available to connect natural language and arbitrary fields. If a relational database can be linked through natural language, the user, whether or not proficient with the SQL query language, will be able to simplify much of the existing work. With the rise of deep learning technology, a great deal of work for researching a natural language connection relation type database begins to appear.

The SQL language is the primary query language for relational databases currently in use. The mapping of natural language to SQL can be considered a semantic parsing problem (Andreas, vlachos et al, 2013). Semantic parsing is a long-standing and widely studied problem in Natural Language Processing (NLP). Therefore, it has attracted considerable attention in academia and industry, particularly the conversion of natural language into SQL queries. In the current age, from the financial, electronic business to medical fields, large amounts of data are stored in relational databases. In database query processes, users typically make query requests using natural language. However, converting natural language directly into executable SQL queries is a challenging task.

Text2SQL is the conversion of queries in human language (e.g., english) into database query language (SQL). The traditional Text2SQL method has limitation in processing complex or semantically fuzzy queries through word questions and answers, so that the method cannot accurately convert into correct SQL query sentences. Therefore, a new method and system is needed to improve the accuracy and efficiency of Text2 SQL.

Disclosure of Invention

The invention aims to provide a data conversion method, a data conversion system, electronic equipment and a readable storage medium using a large language model, which are combined with the large language model and a vector retrieval technology to improve the accuracy and efficiency of data conversion and can be effectively applied to a Text2SQL scene.

In order to achieve the above object, the present invention provides a data conversion method, comprising the steps of: acquiring a user query problem; selecting keywords in the user query questions; aiming at each keyword, acquiring a keyword vector corresponding to the keyword; obtaining candidate word vectors with the distance within a preset range from the keyword vectors in a database, and obtaining candidate fields corresponding to all the candidate word vectors; selecting a first preset number of carefully chosen fields from the candidate fields by using a large language model; for each carefully chosen field, acquiring all candidate data corresponding to the carefully chosen field from the database; selecting a second preset number of carefully selected data corresponding to the query question from the candidate data by using a large language model; and generating an output result according to the carefully selected data.

Optionally, the selecting, using a large language model, a second preset number of selected data corresponding to the query question from the candidate data specifically includes: dividing all the candidate data into a plurality of data groups according to a third preset quantity; selecting a fourth preset number of selected data in each data group by using a large language model; summarizing the beneficiation data in each of the data groups to obtain the second preset number of beneficiation data.

Optionally, the candidate data and/or the data form of the carefully selected data includes a table in a string format.

Optionally, using a large language model, selecting a second preset number of selected data corresponding to the query question from the candidate data, including: the circulation steps are as follows: selecting intermediate data from the candidate data by using a large language model, and taking the intermediate data as updated candidate data; and circularly executing the circulation step for preset times until the updated candidate data quantity reaches the second preset quantity, and taking the updated candidate data at the moment as the carefully selected data.

Optionally, the generating the output result according to the carefully chosen data specifically includes: and generating SQL sentences corresponding to the query questions according to the carefully chosen data by using a large language model.

Optionally, the generating the output result according to the carefully chosen data specifically includes: and generating the output result according to a preset prompt word instruction by using a large language model.

Optionally, before each use of the large language model, all the step contents before the use of the large language model are input to the large language model as dialogue histories.

In order to achieve the above object, the present invention further provides a data conversion system, which is applied to any one of the above data conversion methods, including: the acquisition module is used for acquiring the user inquiry problem; the selecting module is used for selecting the keywords in the query questions; a processing module for performing at least one of the following steps: aiming at each keyword, acquiring a keyword vector corresponding to the keyword; obtaining candidate word vectors with the distance within a preset range from the keyword vectors in a database, and obtaining candidate fields corresponding to all the candidate word vectors; selecting a first preset number of carefully chosen fields from the candidate fields by using a large language model; for each carefully chosen field, acquiring all candidate data corresponding to the carefully chosen field from the database; selecting a second preset number of carefully selected data corresponding to the query question from the candidate data by using a large language model; and generating an output result according to the carefully selected data.

To achieve the above object, the present invention also provides an electronic device including: a memory storing a computer program; a processor communicatively coupled to the memory for executing any one of the data conversion methods described above when the computer program is invoked; and the display is in communication connection with the processor and the memory and is used for displaying a GUI interactive interface related to the data conversion method.

To achieve the above object, the present invention also provides a readable storage medium storing a computer program, characterized in that: the computer program, when executed by a processor, implements the data conversion method of any of the above.

The data conversion method, the data conversion system, the electronic equipment and the readable storage medium using the large language model provided by the invention have the following beneficial effects:

the data conversion method provided by the invention comprises the following steps: acquiring a user query problem; selecting keywords in the user query questions; aiming at each keyword, acquiring a keyword vector corresponding to the keyword; obtaining candidate word vectors with vector distances within a preset range from the keyword vectors in a database, and obtaining candidate fields corresponding to all the candidate word vectors; selecting a first preset number of carefully chosen fields from the candidate fields by using a large language model; for each carefully chosen field, acquiring all candidate data corresponding to the carefully chosen field from the database; selecting a second preset number of carefully selected data corresponding to the query question from the candidate data by using a large language model; and generating an output result according to the carefully selected data.

When the invention is used, a user can obtain an output result corresponding to a requirement format, such as a Text2SQL scene, only by using a common language habit to input a query problem. Because the combination of keywords in the common language dispatch sentence making is complex, the invention can firstly extract the keywords in the query problem according to the preset algorithm, then independently search the vector of each keyword, search the fields similar to the keywords based on the principle of word vectors, and then screen the fields through a large language model. The fields screened out at this time may still correspond to a large amount of data in the database (for example, each field may correspond to a large amount of tables in a string format, and most of these tables are actually irrelevant to the query problem), at this time, a large language model is further used to screen out the tables related to the query problem, and finally select data corresponding to the query problem is obtained and efficiently and accurately converted into SQL statements. By means of the method, the device and the system, the task of data conversion is disassembled, and the accuracy and the efficiency of data conversion, particularly the accuracy and the efficiency of Text2SQL, can be effectively improved by combining the advantages of a large language model and a vector retrieval technology.

Because the data conversion system and the data conversion method provided by the invention belong to the same invention conception, the data conversion system can disassemble the task of data conversion, and can effectively improve the accuracy and efficiency of data conversion by combining the advantages of a large language model and a vector retrieval technology.

Because the electronic equipment and the data conversion method provided by the invention belong to the same invention conception, the data conversion system can disassemble the task of data conversion, and can effectively improve the accuracy and efficiency of data conversion by combining the advantages of a large language model and a vector retrieval technology.

Because the readable storage medium and the data conversion method provided by the invention belong to the same invention conception, the data conversion system can disassemble the task of data conversion, and can effectively improve the accuracy and efficiency of data conversion by combining the advantages of a large language model and a vector retrieval technology.

Drawings

Fig. 1 is a flow chart of a data conversion method according to an embodiment of the invention.

Fig. 2 is a schematic block diagram of an electronic device according to an embodiment of the invention.

Wherein the reference numerals are as follows:

a 101-processor; 102-a communication interface; 103-memory; 104-a communication bus; 105-display.

Detailed Description

The invention will be described in further detail with reference to the drawings and the specific embodiments thereof in order to make the objects, advantages and features of the invention more apparent. It should be noted that the drawings are in a very simplified form and are not drawn to scale, merely for convenience and clarity in aiding in the description of embodiments of the invention. Furthermore, the structures shown in the drawings are often part of actual structures. In particular, the drawings are shown with different emphasis instead being placed upon illustrating the various embodiments.

It will be understood that when an element or layer is referred to as being "on" or "connected to" another element or layer, it can be directly on, connected to, or intervening elements or layers may be present. In contrast, when an element is referred to as being "directly on" …, "directly connected to" another element or layer, there are no intervening elements or layers present. Although the terms first, second, third, etc. may be used herein to describe various elements, components, regions, layers and/or sections, these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Thus, a first element, component, region, layer or section discussed below could be termed a second element, component, region, layer or section without departing from the teachings of the present invention. Spatially relative terms, such as "under … …," "below," "lower," "above … …," "upper," and the like, may be used herein for convenience of description to describe one element or feature's relationship to another element or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use and operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements or features described as "under" … … "," below "and" beneath "would then be oriented" on "other elements or features. The device may be otherwise oriented (rotated 90 degrees or other orientations) and the spatially relative descriptors used herein interpreted accordingly. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups. As used herein, the term "and/or" includes any and all combinations of the associated listed items.

The large language model (English: large Language Model, abbreviated LLM), also known as large language model, is an artificial intelligence model, intended to understand and generate human language. They train on a large amount of text data and can perform a wide range of tasks including text summarization, translation, emotion analysis, and so forth. LLMs are characterized by a large scale, containing billions of parameters, which help them learn complex patterns in linguistic data. These models are typically based on deep learning architectures, such as converters, which help them to achieve impressive performance on various NLP tasks.

Text2SQL (Structured Query Language) is a Natural Language Processing (NLP) technique that aims to convert natural language queries into executable SQL query statements. It allows users to use natural language to propose database queries and automatically convert them into SQL statements that the database can understand and execute. Text2SQL has applications in many fields, particularly in the fields of database interfaces and intelligent assistants.

As an illustrative example, in the medical health field, a medical database is assumed that contains information about patients, doctors, diagnoses, and treatments. Using Text2SQL techniques at this point, a doctor or researcher can access information in the database using natural language queries without having to learn and write complex SQL query statements. For example, a doctor may make the following query: "find patients with diabetes in the last year and list their name, age and treatment regimen. The "Text2SQL system will parse the query question and generate the appropriate SQL query statement, such as: "SELECT name, age, treatmentplan FROM patients WHERE diagnosis = 'diabetes' AND admissiondate > DATESUB (CURRENTDATE ()", INTERVAL 1 YEAR) ".

Through Text2SQL technology, doctors can easily use their own familiar natural language to make database queries without knowing the database structure and SQL grammar. This provides a convenient way for non-technical professionals to interact with the database in a more intuitive and flexible way. However, the conventional Text2SQL method has limitations in processing complex or semantically ambiguous query problems through word questions, resulting in an inability to accurately convert into correct SQL query statements. Therefore, a new method and system is needed to improve the accuracy and efficiency of Text2 SQL.

In order to achieve the above objective, the present invention provides a data conversion method, please refer to fig. 1, fig. 1 is a flow chart of a data conversion method according to an embodiment of the present invention. As shown in fig. 1, the present invention includes the steps of:

acquiring a user query problem;

selecting keywords in the user query questions;

aiming at each keyword, acquiring a keyword vector corresponding to the keyword;

obtaining candidate word vectors with vector distances within a preset range from the keyword vectors in a database, and obtaining candidate fields corresponding to all the candidate word vectors;

selecting a first preset number of carefully chosen fields from the candidate fields by using a large language model;

for each carefully chosen field, acquiring all candidate data corresponding to the carefully chosen field from the database;

selecting a second preset number of carefully selected data corresponding to the query question from the candidate data by using a large language model;

and generating an output result according to the carefully selected data.

When the invention is used, a user can obtain an output result corresponding to a requirement format, such as a Text2SQL scene, only by using a common language habit to input a query problem. Since the combination of keywords in conventional language-specific sentence-making can be complex, especially in some complex or semantically ambiguous query problems. The invention can firstly extract the keywords in the query problem according to the preset algorithm, then independently search the vector of each keyword, search the fields similar to the keywords based on the principle of word vectors, and then screen the fields through a large language model. The fields screened out at this time may still correspond to a large amount of data in the database (for example, each field may correspond to a large amount of tables in a string format, and most of these tables are actually irrelevant to the query problem), at this time, a large language model is further used to screen out the tables related to the query problem, and finally, the selected data is implemented and efficiently and accurately converted into the SQL statement. By means of the method, the device and the system, the task of data conversion is disassembled, and the accuracy and the efficiency of data conversion, particularly the accuracy and the efficiency of Text2SQL, can be effectively improved by combining the advantages of a large language model and a vector retrieval technology.

It should be understood that the vector distances include, but are not limited to, euclidean distances.

It should be noted that, the selecting the keywords in the user query question may be based on a preset screening algorithm to establish a screening logic to select, for example, the english question "What's the most popular project in github (What is the most popular item in the gitub platform)", may set to extract only the table and the scholars as keywords, for example, popular project github, but not limited thereto.

Specifically, the candidate data and/or the data format of the culled data includes a table (e.g., an SQL database table) in a string format.

For example: the choice field corresponding to "project" is "team" and the term "team" may correspond to a plurality of tables in the database, i.e., a plurality of tables may contain the field "team", one of the tables is, for example, as follows:

col : team | county | wins | years won

row 1 : greystones | wicklow | 1 | 2011

row 2 : ballymore eustace | kildare | 1 | 2010

row 3 : maynooth | kildare | 1 | 2009

it can be seen that if the english problem "What's the most popular project in github" in the above embodiment is still corresponded, the problem of this form study is the winning situation of the team, and not the team corresponding to the most popular item in the gitsub platform, it can be seen that this form does not correspond to the query problem and is filtered out by the large language model.

After the candidate data is obtained, the data volume of the candidate data is very huge, for example, various tables with huge quantities can be obtained, if the candidate data is input to a large language model at one time, the screening effect and the screening efficiency are poor, so batch input is required, and the following technical scheme is provided based on the invention: selecting a second preset number of carefully chosen data corresponding to the query question from the candidate data by using a large language model, wherein the method specifically comprises the following steps of:

dividing all the candidate data into a plurality of data groups according to a third preset quantity;

selecting a fourth preset number of selected data in each data group by using a large language model;

summarizing the beneficiation data in each of the data groups to obtain the second preset number of beneficiation data.

Preferably, a large language model is used to select a second preset number of selected data corresponding to the query question from the candidate data, which specifically includes:

the circulation steps are as follows: selecting intermediate data from the candidate data by using a large language model, and taking the intermediate data as updated candidate data;

and circularly executing the circulation step for preset times until the updated candidate data quantity reaches the second preset quantity, and taking the updated candidate data at the moment as the carefully selected data. Thus, the invention iteratively interacts with the user to gradually narrow the candidate range. Specifically, the iterative candidate vector may be obtained by introducing a multi-round dialogue mechanism, and it should be noted that the number of rounds of multi-round dialogue is not limited to this.

It should be understood that, with the large language model, the concept of prompt word (prompt) engineering should be utilized to generate an output result according to a preset prompt word instruction. In an exemplary embodiment, the template for generating the instruction of the prompting word corresponding to the output result according to the carefully chosen data is as follows:

please answer the user's questions concisely and professionally based on the database table structure represented by the following strings.

If no answer is available from this, please say "the question cannot be answered based on the known information" or "sufficient relevant information is not provided". No braiding component is allowed to be added to the answer. In addition, the answer requests use Chinese.

Database table structure: (the database table corresponding to the selected data is input therein)

Problems: (here, user query questions are entered).

It should be noted that, when introducing the multi-round dialogue mechanism, in order to make the large language model more accurately contact the context, the dialogue history should be provided for the large language model as comprehensively as possible, so the following technical scheme is provided based on the present invention: before each large language model is used, the contents of all steps before the large language model is used are input into the large language model as dialogue histories.

In summary, the invention provides an optimization scheme for task disassembly multi-round dialogue, combines a large language model, vector retrieval and prompt word engineering, compensates the defect of single question-answering by task disassembly and automatically executing multi-round dialogue in a system, better processes complex semantics, and effectively improves the accuracy and efficiency of Text2 SQL. Essentially, by utilizing task disassembly of the system and each sub-query generation strategy, finer query sentences can be provided for complex queries, and the query effect is further improved.

The inventor performs a comparison experiment with the prior art according to the technical principle of the invention, and experimental data are as follows:

experimental environment:

operating system: ubuntu 20.04.5 LTS

CPU：Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz

GPU：NVIDIA A30

Data set: spider open source dataset

In addition to the above resources, other resources are not limited.

	T5-base	T5-large	T5-3B	The invention is that
					Avg	58.12	66.63	71.76	78.98

Therefore, the invention can obviously improve the accuracy and efficiency of Text2 SQL.

In order to achieve the above object, the present invention further provides a data conversion system, which is applied to any one of the above data conversion methods, including:

the acquisition module is used for acquiring the user inquiry problem;

the selecting module is used for selecting the keywords in the query questions;

a processing module for performing at least one of the following steps:

obtaining candidate word vectors with the distance within a preset range from the keyword vectors in a database, and obtaining candidate fields corresponding to all the candidate word vectors;

and generating an output result according to the carefully selected data.

In order to achieve the above objective, the present invention further provides an electronic device, please refer to fig. 2, and fig. 2 is a block structure schematic diagram of the electronic device according to an embodiment of the present invention. As shown in fig. 2, the electronic device includes:

a memory 103 storing a computer program;

a processor 101, communicatively coupled to the memory, for executing the data conversion method of any of the above when the computer program is invoked;

and a display 105 communicatively coupled to the processor and the memory for displaying a GUI interactive interface associated with the data conversion method.

As shown in fig. 2, the electronic device further comprises a communication interface 102 and a communication bus 104, wherein the processor 101, the communication interface 102, and the memory 103 communicate with each other via the communication bus 104. The communication bus 104 may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus, or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, among others. The communication bus 104 may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, the figures are shown with only one bold line, but not with only one bus or one type of bus. The communication interface 102 is used for communication between the electronic device and other devices.

The processor 101 of the present invention may be a central processing unit (Central Processing Unit, CPU), other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, and the processor 101 is a control center of the electronic device, and connects various parts of the entire electronic device using various interfaces and lines.

The memory 103 may be used to store the computer program, and the processor 101 may implement various functions of the electronic device by running or executing the computer program stored in the memory 103 and invoking data stored in the memory 103.

The memory 103 may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.

To achieve the above object, the present invention provides a readable storage medium storing a computer program which, when executed by a processor, implements the data conversion method as set forth in any one of the above. Because the readable storage medium provided by the invention and the data conversion method described above belong to the same inventive concept, the readable storage medium provided by the invention has all the advantages of the data conversion method described above, so the beneficial effects of the readable storage medium provided by the invention are not repeated here.

The readable storage media of embodiments of the present invention may take the form of any combination of one or more computer-readable media. The readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer hard disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

It should be noted that, in the present description, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different manner from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, the description is relatively simple because of corresponding to the method disclosed in the embodiment, and the relevant points refer to the description of the method section.

It should be further noted that although the present invention has been disclosed in the preferred embodiments, the above embodiments are not intended to limit the present invention. Many possible variations and modifications of the disclosed technology can be made by anyone skilled in the art without departing from the scope of the technology, or the technology can be modified to be equivalent. Therefore, any simple modification, equivalent variation and modification of the above embodiments according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.

It should be further understood that the terms "first," "second," "third," and the like in this specification are used merely for distinguishing between various components, elements, steps, etc. in the specification and not for indicating a logical or sequential relationship between the various components, elements, steps, etc., unless otherwise indicated.

It should also be understood that the terminology described herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention. It must be noted that, as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. For example, reference to "a step" or "an apparatus" means a reference to one or more steps or apparatuses, and may include sub-steps as well as sub-apparatuses. All conjunctions used should be understood in the broadest sense. And, the word "or" should be understood as having the definition of a logical "or" rather than a logical "exclusive or" unless the context clearly indicates the contrary. Further, implementation of embodiments of the present invention may include performing selected tasks manually, automatically, or in combination.

Claims

1. A method of data conversion comprising the steps of:

acquiring a user query problem;

selecting keywords in the user query questions;

and generating an output result according to the carefully selected data.

2. The data transformation method of claim 1, wherein the selecting, using a large language model, a second predetermined number of selected data corresponding to the query question from the candidate data comprises:

3. The data conversion method of claim 1, wherein the candidate data and/or the data form of the culled data comprises a table in a string format.

4. The data transformation method of claim 1, wherein the selecting, using a large language model, a second predetermined number of selected data corresponding to the query question from the candidate data comprises:

and circularly executing the circulation step for preset times until the updated candidate data quantity reaches the second preset quantity, and taking the updated candidate data at the moment as the carefully selected data.

5. The data conversion method according to claim 1, wherein said generating said output result based on said beneficiated data, comprises:

and generating SQL sentences corresponding to the query questions according to the carefully chosen data by using a large language model.

6. The data conversion method according to claim 1, wherein said generating said output result based on said beneficiated data, comprises:

and generating the output result according to a preset prompt word instruction by using a large language model.

7. The data conversion method according to any one of claims 1 to 6, wherein before each use of a large language model, contents of all steps before the use of the large language model are input to the large language model as a dialogue history.

8. A data conversion system, characterized by being applied to the data conversion method according to any one of claims 1 to 7, comprising:

the acquisition module is used for acquiring the user inquiry problem;

the selecting module is used for selecting the keywords in the query questions;

a processing module for performing at least one of the following steps:

and generating an output result according to the carefully selected data.

9. An electronic device, the electronic device comprising:

a memory storing a computer program;

a processor communicatively coupled to said memory, said processor executing the data conversion method of any one of claims 1-7 when said computer program is invoked;

and the display is in communication connection with the processor and the memory and is used for displaying a GUI interactive interface related to the data conversion method.

10. A readable storage medium storing a computer program, characterized by: the computer program, when executed by a processor, implements the data conversion method as claimed in any one of claims 1 to 7.