CN116701437A - Data conversion method, data conversion system, electronic device, and readable storage medium - Google Patents

Data conversion method, data conversion system, electronic device, and readable storage medium Download PDF

Info

Publication number
CN116701437A
CN116701437A CN202310980534.4A CN202310980534A CN116701437A CN 116701437 A CN116701437 A CN 116701437A CN 202310980534 A CN202310980534 A CN 202310980534A CN 116701437 A CN116701437 A CN 116701437A
Authority
CN
China
Prior art keywords
data
candidate
language model
selecting
data conversion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310980534.4A
Other languages
Chinese (zh)
Other versions
CN116701437B (en
Inventor
吕桓雪
李昊阳
李剑楠
苏鹏
黄炎
陈书俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Aikesheng Information Technology Co ltd
Original Assignee
Shanghai Aikesheng Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Aikesheng Information Technology Co ltd filed Critical Shanghai Aikesheng Information Technology Co ltd
Priority to CN202310980534.4A priority Critical patent/CN116701437B/en
Publication of CN116701437A publication Critical patent/CN116701437A/en
Application granted granted Critical
Publication of CN116701437B publication Critical patent/CN116701437B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a data conversion method, a data conversion system, an electronic device and a readable storage medium, comprising the following steps: acquiring a user query problem; selecting keywords in the user query questions; aiming at each keyword, acquiring a keyword vector corresponding to the keyword; obtaining candidate word vectors with vector distances within a preset range from the keyword vectors in a database, and obtaining candidate fields corresponding to all the candidate word vectors; selecting a first preset number of carefully chosen fields from the candidate fields by using a large language model; for each carefully chosen field, acquiring all candidate data corresponding to the carefully chosen field from the database; selecting a second preset number of carefully selected data corresponding to the query question from the candidate data by using a large language model; and generating an output result according to the carefully selected data. The invention can combine a large language model and a vector retrieval technology, and improves the accuracy and efficiency of data conversion.

Description

Data conversion method, data conversion system, electronic device, and readable storage medium
Technical Field
The present invention relates to database technologies, and in particular, to a data conversion method, a data conversion system, an electronic device, and a readable storage medium.
Background
Natural language is recognized as the best way of interaction in many fields. There is no general model available to connect natural language and arbitrary fields. If a relational database can be linked through natural language, the user, whether or not proficient with the SQL query language, will be able to simplify much of the existing work. With the rise of deep learning technology, a great deal of work for researching a natural language connection relation type database begins to appear.
The SQL language is the primary query language for relational databases currently in use. The mapping of natural language to SQL can be considered a semantic parsing problem (Andreas, vlachos et al, 2013). Semantic parsing is a long-standing and widely studied problem in Natural Language Processing (NLP). Therefore, it has attracted considerable attention in academia and industry, particularly the conversion of natural language into SQL queries. In the current age, from the financial, electronic business to medical fields, large amounts of data are stored in relational databases. In database query processes, users typically make query requests using natural language. However, converting natural language directly into executable SQL queries is a challenging task.
Text2SQL is the conversion of queries in human language (e.g., english) into database query language (SQL). The traditional Text2SQL method has limitation in processing complex or semantically fuzzy queries through word questions and answers, so that the method cannot accurately convert into correct SQL query sentences. Therefore, a new method and system is needed to improve the accuracy and efficiency of Text2 SQL.
Disclosure of Invention
The invention aims to provide a data conversion method, a data conversion system, electronic equipment and a readable storage medium using a large language model, which are combined with the large language model and a vector retrieval technology to improve the accuracy and efficiency of data conversion and can be effectively applied to a Text2SQL scene.
In order to achieve the above object, the present invention provides a data conversion method, comprising the steps of: acquiring a user query problem; selecting keywords in the user query questions; aiming at each keyword, acquiring a keyword vector corresponding to the keyword; obtaining candidate word vectors with the distance within a preset range from the keyword vectors in a database, and obtaining candidate fields corresponding to all the candidate word vectors; selecting a first preset number of carefully chosen fields from the candidate fields by using a large language model; for each carefully chosen field, acquiring all candidate data corresponding to the carefully chosen field from the database; selecting a second preset number of carefully selected data corresponding to the query question from the candidate data by using a large language model; and generating an output result according to the carefully selected data.
Optionally, the selecting, using a large language model, a second preset number of selected data corresponding to the query question from the candidate data specifically includes: dividing all the candidate data into a plurality of data groups according to a third preset quantity; selecting a fourth preset number of selected data in each data group by using a large language model; summarizing the beneficiation data in each of the data groups to obtain the second preset number of beneficiation data.
Optionally, the candidate data and/or the data form of the carefully selected data includes a table in a string format.
Optionally, using a large language model, selecting a second preset number of selected data corresponding to the query question from the candidate data, including: the circulation steps are as follows: selecting intermediate data from the candidate data by using a large language model, and taking the intermediate data as updated candidate data; and circularly executing the circulation step for preset times until the updated candidate data quantity reaches the second preset quantity, and taking the updated candidate data at the moment as the carefully selected data.
Optionally, the generating the output result according to the carefully chosen data specifically includes: and generating SQL sentences corresponding to the query questions according to the carefully chosen data by using a large language model.
Optionally, the generating the output result according to the carefully chosen data specifically includes: and generating the output result according to a preset prompt word instruction by using a large language model.
Optionally, before each use of the large language model, all the step contents before the use of the large language model are input to the large language model as dialogue histories.
In order to achieve the above object, the present invention further provides a data conversion system, which is applied to any one of the above data conversion methods, including: the acquisition module is used for acquiring the user inquiry problem; the selecting module is used for selecting the keywords in the query questions; a processing module for performing at least one of the following steps: aiming at each keyword, acquiring a keyword vector corresponding to the keyword; obtaining candidate word vectors with the distance within a preset range from the keyword vectors in a database, and obtaining candidate fields corresponding to all the candidate word vectors; selecting a first preset number of carefully chosen fields from the candidate fields by using a large language model; for each carefully chosen field, acquiring all candidate data corresponding to the carefully chosen field from the database; selecting a second preset number of carefully selected data corresponding to the query question from the candidate data by using a large language model; and generating an output result according to the carefully selected data.
To achieve the above object, the present invention also provides an electronic device including: a memory storing a computer program; a processor communicatively coupled to the memory for executing any one of the data conversion methods described above when the computer program is invoked; and the display is in communication connection with the processor and the memory and is used for displaying a GUI interactive interface related to the data conversion method.
To achieve the above object, the present invention also provides a readable storage medium storing a computer program, characterized in that: the computer program, when executed by a processor, implements the data conversion method of any of the above.
The data conversion method, the data conversion system, the electronic equipment and the readable storage medium using the large language model provided by the invention have the following beneficial effects:
the data conversion method provided by the invention comprises the following steps: acquiring a user query problem; selecting keywords in the user query questions; aiming at each keyword, acquiring a keyword vector corresponding to the keyword; obtaining candidate word vectors with vector distances within a preset range from the keyword vectors in a database, and obtaining candidate fields corresponding to all the candidate word vectors; selecting a first preset number of carefully chosen fields from the candidate fields by using a large language model; for each carefully chosen field, acquiring all candidate data corresponding to the carefully chosen field from the database; selecting a second preset number of carefully selected data corresponding to the query question from the candidate data by using a large language model; and generating an output result according to the carefully selected data.
When the invention is used, a user can obtain an output result corresponding to a requirement format, such as a Text2SQL scene, only by using a common language habit to input a query problem. Because the combination of keywords in the common language dispatch sentence making is complex, the invention can firstly extract the keywords in the query problem according to the preset algorithm, then independently search the vector of each keyword, search the fields similar to the keywords based on the principle of word vectors, and then screen the fields through a large language model. The fields screened out at this time may still correspond to a large amount of data in the database (for example, each field may correspond to a large amount of tables in a string format, and most of these tables are actually irrelevant to the query problem), at this time, a large language model is further used to screen out the tables related to the query problem, and finally select data corresponding to the query problem is obtained and efficiently and accurately converted into SQL statements. By means of the method, the device and the system, the task of data conversion is disassembled, and the accuracy and the efficiency of data conversion, particularly the accuracy and the efficiency of Text2SQL, can be effectively improved by combining the advantages of a large language model and a vector retrieval technology.
Because the data conversion system and the data conversion method provided by the invention belong to the same invention conception, the data conversion system can disassemble the task of data conversion, and can effectively improve the accuracy and efficiency of data conversion by combining the advantages of a large language model and a vector retrieval technology.
Because the electronic equipment and the data conversion method provided by the invention belong to the same invention conception, the data conversion system can disassemble the task of data conversion, and can effectively improve the accuracy and efficiency of data conversion by combining the advantages of a large language model and a vector retrieval technology.
Because the readable storage medium and the data conversion method provided by the invention belong to the same invention conception, the data conversion system can disassemble the task of data conversion, and can effectively improve the accuracy and efficiency of data conversion by combining the advantages of a large language model and a vector retrieval technology.
Drawings
Fig. 1 is a flow chart of a data conversion method according to an embodiment of the invention.
Fig. 2 is a schematic block diagram of an electronic device according to an embodiment of the invention.
Wherein the reference numerals are as follows:
a 101-processor; 102-a communication interface; 103-memory; 104-a communication bus; 105-display.
Detailed Description
The invention will be described in further detail with reference to the drawings and the specific embodiments thereof in order to make the objects, advantages and features of the invention more apparent. It should be noted that the drawings are in a very simplified form and are not drawn to scale, merely for convenience and clarity in aiding in the description of embodiments of the invention. Furthermore, the structures shown in the drawings are often part of actual structures. In particular, the drawings are shown with different emphasis instead being placed upon illustrating the various embodiments.
It will be understood that when an element or layer is referred to as being "on" or "connected to" another element or layer, it can be directly on, connected to, or intervening elements or layers may be present. In contrast, when an element is referred to as being "directly on" …, "directly connected to" another element or layer, there are no intervening elements or layers present. Although the terms first, second, third, etc. may be used herein to describe various elements, components, regions, layers and/or sections, these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Thus, a first element, component, region, layer or section discussed below could be termed a second element, component, region, layer or section without departing from the teachings of the present invention. Spatially relative terms, such as "under … …," "below," "lower," "above … …," "upper," and the like, may be used herein for convenience of description to describe one element or feature's relationship to another element or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use and operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements or features described as "under" … … "," below "and" beneath "would then be oriented" on "other elements or features. The device may be otherwise oriented (rotated 90 degrees or other orientations) and the spatially relative descriptors used herein interpreted accordingly. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups. As used herein, the term "and/or" includes any and all combinations of the associated listed items.
The large language model (English: large Language Model, abbreviated LLM), also known as large language model, is an artificial intelligence model, intended to understand and generate human language. They train on a large amount of text data and can perform a wide range of tasks including text summarization, translation, emotion analysis, and so forth. LLMs are characterized by a large scale, containing billions of parameters, which help them learn complex patterns in linguistic data. These models are typically based on deep learning architectures, such as converters, which help them to achieve impressive performance on various NLP tasks.
Text2SQL (Structured Query Language) is a Natural Language Processing (NLP) technique that aims to convert natural language queries into executable SQL query statements. It allows users to use natural language to propose database queries and automatically convert them into SQL statements that the database can understand and execute. Text2SQL has applications in many fields, particularly in the fields of database interfaces and intelligent assistants.
As an illustrative example, in the medical health field, a medical database is assumed that contains information about patients, doctors, diagnoses, and treatments. Using Text2SQL techniques at this point, a doctor or researcher can access information in the database using natural language queries without having to learn and write complex SQL query statements. For example, a doctor may make the following query: "find patients with diabetes in the last year and list their name, age and treatment regimen. The "Text2SQL system will parse the query question and generate the appropriate SQL query statement, such as: "SELECT name, age, treatmentplan FROM patients WHERE diagnosis = 'diabetes' AND admissiondate > DATESUB (CURRENTDATE ()", INTERVAL 1 YEAR) ".
Through Text2SQL technology, doctors can easily use their own familiar natural language to make database queries without knowing the database structure and SQL grammar. This provides a convenient way for non-technical professionals to interact with the database in a more intuitive and flexible way. However, the conventional Text2SQL method has limitations in processing complex or semantically ambiguous query problems through word questions, resulting in an inability to accurately convert into correct SQL query statements. Therefore, a new method and system is needed to improve the accuracy and efficiency of Text2 SQL.
The invention aims to provide a data conversion method, a data conversion system, electronic equipment and a readable storage medium using a large language model, which are combined with the large language model and a vector retrieval technology to improve the accuracy and efficiency of data conversion and can be effectively applied to a Text2SQL scene.
In order to achieve the above objective, the present invention provides a data conversion method, please refer to fig. 1, fig. 1 is a flow chart of a data conversion method according to an embodiment of the present invention. As shown in fig. 1, the present invention includes the steps of:
acquiring a user query problem;
selecting keywords in the user query questions;
aiming at each keyword, acquiring a keyword vector corresponding to the keyword;
obtaining candidate word vectors with vector distances within a preset range from the keyword vectors in a database, and obtaining candidate fields corresponding to all the candidate word vectors;
selecting a first preset number of carefully chosen fields from the candidate fields by using a large language model;
for each carefully chosen field, acquiring all candidate data corresponding to the carefully chosen field from the database;
selecting a second preset number of carefully selected data corresponding to the query question from the candidate data by using a large language model;
and generating an output result according to the carefully selected data.
When the invention is used, a user can obtain an output result corresponding to a requirement format, such as a Text2SQL scene, only by using a common language habit to input a query problem. Since the combination of keywords in conventional language-specific sentence-making can be complex, especially in some complex or semantically ambiguous query problems. The invention can firstly extract the keywords in the query problem according to the preset algorithm, then independently search the vector of each keyword, search the fields similar to the keywords based on the principle of word vectors, and then screen the fields through a large language model. The fields screened out at this time may still correspond to a large amount of data in the database (for example, each field may correspond to a large amount of tables in a string format, and most of these tables are actually irrelevant to the query problem), at this time, a large language model is further used to screen out the tables related to the query problem, and finally, the selected data is implemented and efficiently and accurately converted into the SQL statement. By means of the method, the device and the system, the task of data conversion is disassembled, and the accuracy and the efficiency of data conversion, particularly the accuracy and the efficiency of Text2SQL, can be effectively improved by combining the advantages of a large language model and a vector retrieval technology.
It should be understood that the vector distances include, but are not limited to, euclidean distances.
It should be noted that, the selecting the keywords in the user query question may be based on a preset screening algorithm to establish a screening logic to select, for example, the english question "What's the most popular project in github (What is the most popular item in the gitub platform)", may set to extract only the table and the scholars as keywords, for example, popular project github, but not limited thereto.
Specifically, the candidate data and/or the data format of the culled data includes a table (e.g., an SQL database table) in a string format.
For example: the choice field corresponding to "project" is "team" and the term "team" may correspond to a plurality of tables in the database, i.e., a plurality of tables may contain the field "team", one of the tables is, for example, as follows:
col : team | county | wins | years won
row 1 : greystones | wicklow | 1 | 2011
row 2 : ballymore eustace | kildare | 1 | 2010
row 3 : maynooth | kildare | 1 | 2009
it can be seen that if the english problem "What's the most popular project in github" in the above embodiment is still corresponded, the problem of this form study is the winning situation of the team, and not the team corresponding to the most popular item in the gitsub platform, it can be seen that this form does not correspond to the query problem and is filtered out by the large language model.
After the candidate data is obtained, the data volume of the candidate data is very huge, for example, various tables with huge quantities can be obtained, if the candidate data is input to a large language model at one time, the screening effect and the screening efficiency are poor, so batch input is required, and the following technical scheme is provided based on the invention: selecting a second preset number of carefully chosen data corresponding to the query question from the candidate data by using a large language model, wherein the method specifically comprises the following steps of:
dividing all the candidate data into a plurality of data groups according to a third preset quantity;
selecting a fourth preset number of selected data in each data group by using a large language model;
summarizing the beneficiation data in each of the data groups to obtain the second preset number of beneficiation data.
Preferably, a large language model is used to select a second preset number of selected data corresponding to the query question from the candidate data, which specifically includes:
the circulation steps are as follows: selecting intermediate data from the candidate data by using a large language model, and taking the intermediate data as updated candidate data;
and circularly executing the circulation step for preset times until the updated candidate data quantity reaches the second preset quantity, and taking the updated candidate data at the moment as the carefully selected data. Thus, the invention iteratively interacts with the user to gradually narrow the candidate range. Specifically, the iterative candidate vector may be obtained by introducing a multi-round dialogue mechanism, and it should be noted that the number of rounds of multi-round dialogue is not limited to this.
It should be understood that, with the large language model, the concept of prompt word (prompt) engineering should be utilized to generate an output result according to a preset prompt word instruction. In an exemplary embodiment, the template for generating the instruction of the prompting word corresponding to the output result according to the carefully chosen data is as follows:
please answer the user's questions concisely and professionally based on the database table structure represented by the following strings.
If no answer is available from this, please say "the question cannot be answered based on the known information" or "sufficient relevant information is not provided". No braiding component is allowed to be added to the answer. In addition, the answer requests use Chinese.
Database table structure: (the database table corresponding to the selected data is input therein)
Problems: (here, user query questions are entered).
It should be noted that, when introducing the multi-round dialogue mechanism, in order to make the large language model more accurately contact the context, the dialogue history should be provided for the large language model as comprehensively as possible, so the following technical scheme is provided based on the present invention: before each large language model is used, the contents of all steps before the large language model is used are input into the large language model as dialogue histories.
In summary, the invention provides an optimization scheme for task disassembly multi-round dialogue, combines a large language model, vector retrieval and prompt word engineering, compensates the defect of single question-answering by task disassembly and automatically executing multi-round dialogue in a system, better processes complex semantics, and effectively improves the accuracy and efficiency of Text2 SQL. Essentially, by utilizing task disassembly of the system and each sub-query generation strategy, finer query sentences can be provided for complex queries, and the query effect is further improved.
The inventor performs a comparison experiment with the prior art according to the technical principle of the invention, and experimental data are as follows:
experimental environment:
operating system: ubuntu 20.04.5 LTS
CPU:Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz
GPU:NVIDIA A30
Data set: spider open source dataset
In addition to the above resources, other resources are not limited.
T5-base T5-large T5-3B The invention is that
Avg 58.12 66.63 71.76 78.98
Therefore, the invention can obviously improve the accuracy and efficiency of Text2 SQL.
In order to achieve the above object, the present invention further provides a data conversion system, which is applied to any one of the above data conversion methods, including:
the acquisition module is used for acquiring the user inquiry problem;
the selecting module is used for selecting the keywords in the query questions;
a processing module for performing at least one of the following steps:
aiming at each keyword, acquiring a keyword vector corresponding to the keyword;
obtaining candidate word vectors with the distance within a preset range from the keyword vectors in a database, and obtaining candidate fields corresponding to all the candidate word vectors;
selecting a first preset number of carefully chosen fields from the candidate fields by using a large language model;
for each carefully chosen field, acquiring all candidate data corresponding to the carefully chosen field from the database;
selecting a second preset number of carefully selected data corresponding to the query question from the candidate data by using a large language model;
and generating an output result according to the carefully selected data.
Because the data conversion system and the data conversion method provided by the invention belong to the same invention conception, the data conversion system can disassemble the task of data conversion, and can effectively improve the accuracy and efficiency of data conversion by combining the advantages of a large language model and a vector retrieval technology.
In order to achieve the above objective, the present invention further provides an electronic device, please refer to fig. 2, and fig. 2 is a block structure schematic diagram of the electronic device according to an embodiment of the present invention. As shown in fig. 2, the electronic device includes:
a memory 103 storing a computer program;
a processor 101, communicatively coupled to the memory, for executing the data conversion method of any of the above when the computer program is invoked;
and a display 105 communicatively coupled to the processor and the memory for displaying a GUI interactive interface associated with the data conversion method.
Because the electronic equipment and the data conversion method provided by the invention belong to the same invention conception, the data conversion system can disassemble the task of data conversion, and can effectively improve the accuracy and efficiency of data conversion by combining the advantages of a large language model and a vector retrieval technology.
As shown in fig. 2, the electronic device further comprises a communication interface 102 and a communication bus 104, wherein the processor 101, the communication interface 102, and the memory 103 communicate with each other via the communication bus 104. The communication bus 104 may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus, or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, among others. The communication bus 104 may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, the figures are shown with only one bold line, but not with only one bus or one type of bus. The communication interface 102 is used for communication between the electronic device and other devices.
The processor 101 of the present invention may be a central processing unit (Central Processing Unit, CPU), other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, and the processor 101 is a control center of the electronic device, and connects various parts of the entire electronic device using various interfaces and lines.
The memory 103 may be used to store the computer program, and the processor 101 may implement various functions of the electronic device by running or executing the computer program stored in the memory 103 and invoking data stored in the memory 103.
The memory 103 may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
To achieve the above object, the present invention provides a readable storage medium storing a computer program which, when executed by a processor, implements the data conversion method as set forth in any one of the above. Because the readable storage medium provided by the invention and the data conversion method described above belong to the same inventive concept, the readable storage medium provided by the invention has all the advantages of the data conversion method described above, so the beneficial effects of the readable storage medium provided by the invention are not repeated here.
The readable storage media of embodiments of the present invention may take the form of any combination of one or more computer-readable media. The readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer hard disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
It should be noted that, in the present description, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different manner from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, the description is relatively simple because of corresponding to the method disclosed in the embodiment, and the relevant points refer to the description of the method section.
It should be further noted that although the present invention has been disclosed in the preferred embodiments, the above embodiments are not intended to limit the present invention. Many possible variations and modifications of the disclosed technology can be made by anyone skilled in the art without departing from the scope of the technology, or the technology can be modified to be equivalent. Therefore, any simple modification, equivalent variation and modification of the above embodiments according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.
It should be further understood that the terms "first," "second," "third," and the like in this specification are used merely for distinguishing between various components, elements, steps, etc. in the specification and not for indicating a logical or sequential relationship between the various components, elements, steps, etc., unless otherwise indicated.
It should also be understood that the terminology described herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention. It must be noted that, as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. For example, reference to "a step" or "an apparatus" means a reference to one or more steps or apparatuses, and may include sub-steps as well as sub-apparatuses. All conjunctions used should be understood in the broadest sense. And, the word "or" should be understood as having the definition of a logical "or" rather than a logical "exclusive or" unless the context clearly indicates the contrary. Further, implementation of embodiments of the present invention may include performing selected tasks manually, automatically, or in combination.

Claims (10)

1. A method of data conversion comprising the steps of:
acquiring a user query problem;
selecting keywords in the user query questions;
aiming at each keyword, acquiring a keyword vector corresponding to the keyword;
obtaining candidate word vectors with vector distances within a preset range from the keyword vectors in a database, and obtaining candidate fields corresponding to all the candidate word vectors;
selecting a first preset number of carefully chosen fields from the candidate fields by using a large language model;
for each carefully chosen field, acquiring all candidate data corresponding to the carefully chosen field from the database;
selecting a second preset number of carefully selected data corresponding to the query question from the candidate data by using a large language model;
and generating an output result according to the carefully selected data.
2. The data transformation method of claim 1, wherein the selecting, using a large language model, a second predetermined number of selected data corresponding to the query question from the candidate data comprises:
dividing all the candidate data into a plurality of data groups according to a third preset quantity;
selecting a fourth preset number of selected data in each data group by using a large language model;
summarizing the beneficiation data in each of the data groups to obtain the second preset number of beneficiation data.
3. The data conversion method of claim 1, wherein the candidate data and/or the data form of the culled data comprises a table in a string format.
4. The data transformation method of claim 1, wherein the selecting, using a large language model, a second predetermined number of selected data corresponding to the query question from the candidate data comprises:
the circulation steps are as follows: selecting intermediate data from the candidate data by using a large language model, and taking the intermediate data as updated candidate data;
and circularly executing the circulation step for preset times until the updated candidate data quantity reaches the second preset quantity, and taking the updated candidate data at the moment as the carefully selected data.
5. The data conversion method according to claim 1, wherein said generating said output result based on said beneficiated data, comprises:
and generating SQL sentences corresponding to the query questions according to the carefully chosen data by using a large language model.
6. The data conversion method according to claim 1, wherein said generating said output result based on said beneficiated data, comprises:
and generating the output result according to a preset prompt word instruction by using a large language model.
7. The data conversion method according to any one of claims 1 to 6, wherein before each use of a large language model, contents of all steps before the use of the large language model are input to the large language model as a dialogue history.
8. A data conversion system, characterized by being applied to the data conversion method according to any one of claims 1 to 7, comprising:
the acquisition module is used for acquiring the user inquiry problem;
the selecting module is used for selecting the keywords in the query questions;
a processing module for performing at least one of the following steps:
aiming at each keyword, acquiring a keyword vector corresponding to the keyword;
obtaining candidate word vectors with the distance within a preset range from the keyword vectors in a database, and obtaining candidate fields corresponding to all the candidate word vectors;
selecting a first preset number of carefully chosen fields from the candidate fields by using a large language model;
for each carefully chosen field, acquiring all candidate data corresponding to the carefully chosen field from the database;
selecting a second preset number of carefully selected data corresponding to the query question from the candidate data by using a large language model;
and generating an output result according to the carefully selected data.
9. An electronic device, the electronic device comprising:
a memory storing a computer program;
a processor communicatively coupled to said memory, said processor executing the data conversion method of any one of claims 1-7 when said computer program is invoked;
and the display is in communication connection with the processor and the memory and is used for displaying a GUI interactive interface related to the data conversion method.
10. A readable storage medium storing a computer program, characterized by: the computer program, when executed by a processor, implements the data conversion method as claimed in any one of claims 1 to 7.
CN202310980534.4A 2023-08-07 2023-08-07 Data conversion method, data conversion system, electronic device, and readable storage medium Active CN116701437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310980534.4A CN116701437B (en) 2023-08-07 2023-08-07 Data conversion method, data conversion system, electronic device, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310980534.4A CN116701437B (en) 2023-08-07 2023-08-07 Data conversion method, data conversion system, electronic device, and readable storage medium

Publications (2)

Publication Number Publication Date
CN116701437A true CN116701437A (en) 2023-09-05
CN116701437B CN116701437B (en) 2023-10-20

Family

ID=87824277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310980534.4A Active CN116701437B (en) 2023-08-07 2023-08-07 Data conversion method, data conversion system, electronic device, and readable storage medium

Country Status (1)

Country Link
CN (1) CN116701437B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117251473A (en) * 2023-11-20 2023-12-19 摩斯智联科技有限公司 Vehicle data query analysis method, system, device and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6081774A (en) * 1997-08-22 2000-06-27 Novell, Inc. Natural language information retrieval system and method
US20160275148A1 (en) * 2015-03-20 2016-09-22 Huawei Technologies Co., Ltd. Database query method and device
CN109241259A (en) * 2018-08-24 2019-01-18 国网江苏省电力有限公司苏州供电分公司 Natural language querying method, apparatus and system based on ER model
US20210200761A1 (en) * 2019-12-31 2021-07-01 International Business Machines Corporation Natural-language database interface with automated keyword mapping and join-path inferences
CN114020768A (en) * 2021-10-13 2022-02-08 华中科技大学 Construction method and application of SQL (structured query language) statement generation model of Chinese natural language
CN114722069A (en) * 2022-04-07 2022-07-08 平安科技(深圳)有限公司 Language conversion method and device, electronic equipment and storage medium
CN115238101A (en) * 2022-09-23 2022-10-25 中国电子科技集团公司第十研究所 Multi-engine intelligent question-answering system oriented to multi-type knowledge base
CN115576984A (en) * 2022-09-13 2023-01-06 粤港澳国际供应链(广州)有限公司 Method for generating SQL (structured query language) statement and cross-database query by Chinese natural language
US11615080B1 (en) * 2020-04-03 2023-03-28 Apttus Corporation System, method, and computer program for converting a natural language query to a nested database query

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6081774A (en) * 1997-08-22 2000-06-27 Novell, Inc. Natural language information retrieval system and method
US20160275148A1 (en) * 2015-03-20 2016-09-22 Huawei Technologies Co., Ltd. Database query method and device
CN109241259A (en) * 2018-08-24 2019-01-18 国网江苏省电力有限公司苏州供电分公司 Natural language querying method, apparatus and system based on ER model
US20210200761A1 (en) * 2019-12-31 2021-07-01 International Business Machines Corporation Natural-language database interface with automated keyword mapping and join-path inferences
US11615080B1 (en) * 2020-04-03 2023-03-28 Apttus Corporation System, method, and computer program for converting a natural language query to a nested database query
CN114020768A (en) * 2021-10-13 2022-02-08 华中科技大学 Construction method and application of SQL (structured query language) statement generation model of Chinese natural language
CN114722069A (en) * 2022-04-07 2022-07-08 平安科技(深圳)有限公司 Language conversion method and device, electronic equipment and storage medium
CN115576984A (en) * 2022-09-13 2023-01-06 粤港澳国际供应链(广州)有限公司 Method for generating SQL (structured query language) statement and cross-database query by Chinese natural language
CN115238101A (en) * 2022-09-23 2022-10-25 中国电子科技集团公司第十研究所 Multi-engine intelligent question-answering system oriented to multi-type knowledge base

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117251473A (en) * 2023-11-20 2023-12-19 摩斯智联科技有限公司 Vehicle data query analysis method, system, device and storage medium
CN117251473B (en) * 2023-11-20 2024-03-15 摩斯智联科技有限公司 Vehicle data query analysis method, system, device and storage medium

Also Published As

Publication number Publication date
CN116701437B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
CN109033080B (en) Medical term standardization method and system based on probability transfer matrix
CN110059160B (en) End-to-end context-based knowledge base question-answering method and device
RU2509350C2 (en) Method for semantic processing of natural language using graphic intermediary language
CN112487202B (en) Chinese medical named entity recognition method and device fusing knowledge map and BERT
CN116701437B (en) Data conversion method, data conversion system, electronic device, and readable storage medium
US11106873B2 (en) Context-based translation retrieval via multilingual space
CN110517767B (en) Auxiliary diagnosis method, auxiliary diagnosis device, electronic equipment and storage medium
US20230205996A1 (en) Automatic Synonyms Using Word Embedding and Word Similarity Models
Dar et al. Frameworks for querying databases using natural language: a literature review
Steinkamp et al. Basic artificial intelligence techniques: natural language processing of radiology reports
Adduru et al. Towards Dataset Creation And Establishing Baselines for Sentence-level Neural Clinical Paraphrase Generation and Simplification.
Li et al. Using context information to enhance simple question answering
CN116303537A (en) Data query method and device, electronic equipment and storage medium
ZAHIDI et al. Comparative study of the most useful Arabic-supporting natural language processing and deep learning libraries
Gammack et al. Semantic knowledge management system for design documentation with heterogeneous data using machine learning
Zhekova et al. Methodology for creating natural language interfaces to information systems in a specific domain area
Bombieri et al. Surgicberta: a pre-trained language model for procedural surgical language
CN114153994A (en) Medical insurance information question-answering method and device
CN114004237A (en) Intelligent question-answering system construction method based on bladder cancer knowledge graph
Varga Domain adaptation for multilingual neural machine translation
CN116612848B (en) Method, device, equipment and storage medium for generating electronic medical record
Abdul-Kader Application Of Speech-To-Text synthesizer by using Natural Language Processing (NLP).
CN112988952B (en) Multi-level-length text vector retrieval method and device and electronic equipment
KR102642488B1 (en) Data providing device, method and computer program generating answer using artificial intelligence technology
CN112800778B (en) Intent recognition method, system and storage medium based on word string length

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant