CN117648427A - Data query method, device, computer equipment and storage medium - Google Patents

Publication number
CN117648427A
Authority
CN
China
Prior art keywords
query
data
corpus data
corpus
target
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311732140.3A
Other languages
Chinese (zh)
Inventor
林鹏程
曹睿
杨冉
鞠芳
张青南
谭珂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Life Insurance Co ltd
Original Assignee
China Life Insurance Co ltd

Abstract

The present application relates to a data query method, apparatus, computer device, storage medium and computer program product. The method comprises the following steps: responding to a data query request, and acquiring initial corpus data input by a user; word segmentation is carried out on the initial corpus data, keywords in the initial corpus data are extracted, intention recognition is carried out on the basis of the keywords, and target intention of a user is determined; searching in a vector database and a keyword index library respectively based on target intention, determining query corpus data, and constructing a query prompt based on the query corpus data; and inputting the query prompt into the query model to obtain a data query result. By adopting the method, the accuracy of the data query result can be improved.

Description

Data query method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of data processing technology, and in particular, to a data query method, apparatus, computer device, storage medium, and computer program product.
Background
With the rapid growth in the volume of information, large language models are increasingly used in the day-to-day operations of enterprises. One such application is a knowledge-base question-answering system built on a large model. In such a system, the user provides query corpus data, and relevant information is retrieved using that data.
In the conventional method, a large-model question-answering system typically encodes the corpus data input by the user to obtain a vector, performs a similarity-matching query in a vector database using that vector, and determines the target query corpus with the highest similarity to the vector. It then generates a question prompt from the target query corpus, inputs the question prompt into the large model for data processing, and outputs a feedback result.
However, in the conventional method, the query corpus determined by encoding the corpus data and applying a similarity matching algorithm is subject to a certain error, so the accuracy of the data query result is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data query method, apparatus, computer device, computer readable storage medium, and computer program product.
In a first aspect, the present application provides a data query method, the method including:
responding to a data query request, and acquiring initial corpus data input by a user;
word segmentation is carried out on the initial corpus data, keywords in the initial corpus data are extracted, intention recognition is carried out on the basis of the keywords, and target intention of the user is determined;
searching in a vector database and a keyword index library respectively based on the target intention, determining query corpus data, and constructing a query prompt based on the query corpus data;
and inputting the query prompt into a query model to obtain a data query result.
In one embodiment, the method further comprises:
acquiring training entry data; the training entry data carries a classification identifier;
performing word segmentation processing on the training entry data of each class, and selecting, from the resulting word segments, the word segments that meet a preset word frequency condition as intent word segments;
the performing intention recognition based on the keywords and determining the target intention of the user comprises:
matching the keywords with each intent word segment to determine the target intention of the user.
In one embodiment, the searching in the vector database and the keyword index database based on the target intention, respectively, to determine query corpus data includes:
coding the initial corpus data to obtain a query vector;
querying, based on the target intention, the vector database corresponding to the target intention for a corpus whose similarity to the query vector meets a similarity condition, and taking the corpus as query corpus data;
and querying, based on the target intention, the keyword index library corresponding to the target intention for the corpus in which the keywords have the highest word frequency, and taking the corpus as query corpus data.
In one embodiment, the constructing a query prompt based on the query corpus data includes:
ranking, based on the query scores corresponding to the query corpus data, the query corpus data retrieved from the vector database and the query corpus data retrieved from the keyword index library;
selecting a preset number of target query corpus data from the ranked query corpus data;
and performing splicing processing based on the target query corpus data to construct a query prompt.
In one embodiment, before the ranking of the query corpus data of the vector database query and the query corpus data of the keyword index library query based on the query scores corresponding to the query corpus data, the method further includes:
acquiring initial query scores corresponding to the query corpus data;
and respectively carrying out weighted calculation on initial query scores corresponding to the query corpus data based on the weight values of the vector database and the keyword index database, and determining the query score corresponding to each query corpus data after the weighted calculation.
In one embodiment, the method further comprises:
obtaining result feedback data corresponding to each data query result; the result feedback data comprises positive feedback data and negative feedback data;
ranking the query prompts corresponding to the data query results based on the positive feedback data and the negative feedback data, respectively, to obtain a ranking result;
determining the ranking score of each query prompt based on the ranking result, and selecting the target query prompt with the highest score;
and determining a weight value between the vector database and the keyword index library based on the proportion of the query corpus data contained in the target query prompt.
In a second aspect, the present application further provides a data query device, the device including:
the acquisition module is used for responding to the data query request and acquiring initial corpus data input by a user;
the first determining module is used for carrying out word segmentation on the initial corpus data, extracting keywords in the initial corpus data, carrying out intention recognition based on the keywords and determining the target intention of the user;
the second determining module is used for respectively searching in a vector database and a keyword index library based on the target intention, determining query corpus data and constructing a query prompt based on the query corpus data;
and the query module is used for inputting the query prompt into a query model to obtain a data query result.
In a third aspect, the present application also provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
responding to a data query request, and acquiring initial corpus data input by a user;
word segmentation is carried out on the initial corpus data, keywords in the initial corpus data are extracted, intention recognition is carried out on the basis of the keywords, and target intention of the user is determined;
searching in a vector database and a keyword index database respectively based on the target intention, determining query corpus data, and constructing a query prompt based on the query corpus data;
and inputting the query prompt into a query model to obtain a data query result.
In a fourth aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
responding to a data query request, and acquiring initial corpus data input by a user;
word segmentation is carried out on the initial corpus data, keywords in the initial corpus data are extracted, intention recognition is carried out on the basis of the keywords, and target intention of the user is determined;
searching in a vector database and a keyword index database respectively based on the target intention, determining query corpus data, and constructing a query prompt based on the query corpus data;
and inputting the query prompt into a query model to obtain a data query result.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of:
responding to a data query request, and acquiring initial corpus data input by a user;
word segmentation is carried out on the initial corpus data, keywords in the initial corpus data are extracted, intention recognition is carried out on the basis of the keywords, and target intention of the user is determined;
searching in a vector database and a keyword index database respectively based on the target intention, determining query corpus data, and constructing a query prompt based on the query corpus data;
and inputting the query prompt into a query model to obtain a data query result.
The data query method, apparatus, computer device, storage medium and computer program product described above acquire, in response to a data query request, initial corpus data input by a user; perform word segmentation on the initial corpus data, extract keywords from the initial corpus data, perform intention recognition based on the keywords, and determine the target intention of the user; search in a vector database and a keyword index library respectively based on the target intention, determine query corpus data, and construct a query prompt based on the query corpus data; and input the query prompt into a query model to obtain a data query result. In this method, intention recognition is performed on the initial corpus data sent by the user and, once the target intention of the user is determined, searches are performed separately in the vector database and the keyword index library corresponding to the target intention to determine the query corpus data, from which the query prompt is constructed. This makes the query conditions used for retrieval more accurate, so that a more accurate query prompt is input into the query model, which in turn improves the accuracy of the data query result.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to the drawings without inventive effort for a person having ordinary skill in the art.
FIG. 1 is a flow diagram of a data query method in one embodiment;
FIG. 2 is a flow diagram of the steps of constructing intent word segments and determining the target intention of the user in one embodiment;
FIG. 3 is a flow diagram of the steps of determining query corpus data in one embodiment;
FIG. 4 is a flow diagram of a method of constructing a query prompt in one embodiment;
FIG. 5 is a flow diagram of the steps of determining the query scores corresponding to the query corpus data in one embodiment;
FIG. 6 is a flow diagram of the steps of updating the weight values corresponding to the query corpus data in one embodiment;
FIG. 7 is a flow diagram of a specific example of a data query method in one embodiment;
FIG. 8 is a block diagram of a data query device in one embodiment;
FIG. 9 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in FIG. 1, a data query method is provided. This embodiment is described by taking application of the method to a terminal as an example. It is understood that the method may also be applied to a server, or to a system including a terminal and a server and implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:
Step 102, in response to a data query request, initial corpus data input by a user is acquired.
In implementation, the data query system provides the user with an input box for data queries, in which the user can enter initial corpus data to query a target question or target data. When the user initiates a data query request, the terminal responds to the request and acquires the initial corpus data input by the user.
Step 104, word segmentation is carried out on the initial corpus data, keywords in the initial corpus data are extracted, intention recognition is carried out on the basis of the keywords, and target intention of a user is determined.
In implementation, in order to improve the accuracy of the data query, the initial corpus data of the user needs to be preprocessed. The terminal therefore performs word segmentation on the initial corpus data to obtain the word segments it contains. The terminal then counts the frequency of occurrence of each word segment and determines the keywords of the initial corpus data from among the word segments. Finally, the terminal performs intention recognition on the user's query based on the keywords and determines the target intention of the user.
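The preprocessing in step 104 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: whitespace splitting stands in for a real word segmenter, and the stop-word list, the function names `segment` and `extract_keywords`, and the `top_k` parameter are all assumptions introduced here.

```python
from collections import Counter

# Hypothetical stop-word list (an assumption; the source does not mention one).
STOP_WORDS = {"the", "a", "an", "of", "for", "is", "to", "my", "what", "how"}

def segment(text: str) -> list[str]:
    """Stand-in word segmenter: lowercase, split on whitespace, trim punctuation."""
    tokens = (tok.strip(".,?!") for tok in text.lower().split())
    return [tok for tok in tokens if tok]

def extract_keywords(text: str, top_k: int = 3) -> list[str]:
    """Count word-segment frequencies and keep the top_k most frequent segments."""
    counts = Counter(tok for tok in segment(text) if tok not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_k)]

keywords = extract_keywords("What is the claim process for a policy claim?", top_k=2)
```

For Chinese input, the `segment` function would need to be replaced by a dedicated word segmentation tool; the frequency-counting step is unaffected by that choice.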
Step 106, searching in a vector database and a keyword index library respectively based on the target intention, determining query corpus data, and constructing a query prompt based on the query corpus data.
In implementation, the terminal includes a pre-constructed vector database and a pre-constructed keyword index library, both used for determining query corpus data (also called entry data). The keyword index library may be, but is not limited to, a keyword inverted index library. The query corpora in the vector database and the keyword index library are classified by intention in advance. Optionally, each intention corresponds to one vector database sub-library and one keyword index sub-library. After determining the target intention of the current user query, the terminal searches the query corpus data sets of the vector database and the keyword index library that correspond to the target intention, and determines the query corpus data. Because the query corpus data is restricted to the category of the target intention and is retrieved in two different ways from two different types of corpus stores (namely the vector database and the keyword index library), it is more accurate. The terminal then constructs a query prompt by splicing the determined query corpus data. The query prompt serves as the input data of the query model.
Step 108, inputting the query prompt into the query model to obtain the data query result.
In implementation, the terminal inputs the query prompt into the query model, and outputs a data query result through data processing of the query model. The query model may be a large language model, and is specifically used for performing a question-answer type data query. The query model has been pre-trained based on training samples.
In the data query method, initial corpus data input by a user is acquired in response to a data query request; word segmentation is performed on the initial corpus data, keywords are extracted from the initial corpus data, intention recognition is performed based on the keywords, and the target intention of the user is determined; searches are performed in a vector database and a keyword index library respectively based on the target intention, query corpus data is determined, and a query prompt is constructed based on the query corpus data; and the query prompt is input into the query model to obtain a data query result. In this method, intention recognition is performed on the initial corpus data sent by the user and, once the target intention of the user is determined, searches are performed separately in the vector database and the keyword index library corresponding to the target intention to determine the query corpus data, from which the query prompt is constructed. This makes the query conditions used for retrieval more accurate, so that a more accurate query prompt is input into the query model, which in turn improves the accuracy of the data query result.
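How steps 102 to 108 fit together can be sketched as a single function that wires the stages in order. Everything here is hypothetical scaffolding: the function name `answer_query`, the callable parameters, and the choice to append the user's text to the spliced corpora are assumptions made for illustration, and the query model is represented by an arbitrary callable rather than a real large language model.

```python
from typing import Callable

def answer_query(
    text: str,
    extract_keywords: Callable[[str], list[str]],
    recognize_intent: Callable[[list[str]], str],
    retrievers: list[Callable[[str, list[str], str], list[str]]],
    query_model: Callable[[str], str],
) -> str:
    """Run the four stages in order; each stage is supplied by the caller."""
    keywords = extract_keywords(text)        # step 104: segmentation and keywords
    intent = recognize_intent(keywords)      # step 104: intention recognition
    corpora: list[str] = []
    for retrieve in retrievers:              # step 106: search each store in turn
        corpora.extend(retrieve(text, keywords, intent))
    prompt = "\n".join(corpora + [text])     # step 106: splice into a query prompt
    return query_model(prompt)               # step 108: query the model

# Usage with stub stages (all stubs are placeholders, not real components).
result = answer_query(
    "claim status",
    extract_keywords=lambda t: t.split(),
    recognize_intent=lambda kws: "service",
    retrievers=[lambda t, k, i: [f"{i}:vec"], lambda t, k, i: [f"{i}:kw"]],
    query_model=lambda p: p.upper(),
)
```

Passing the stages in as callables keeps the sketch independent of any particular segmenter, vector store, index, or model.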
In an exemplary embodiment, so that intention recognition can be performed on the initial corpus data input by the user, rather than a purely mechanical similarity search, intent word segments used for intention recognition are constructed in advance. As shown in FIG. 2, the method further includes the following steps 202 to 206, wherein:
Step 202, obtaining training entry data.
In implementation, the terminal obtains a plurality of pieces of training entry data. The training entry data falls into a plurality of different classifications according to its content, such as sales, internal control, and service. Each piece of training entry data carries a classification identifier, which may be labeled in advance.
Step 204, performing word segmentation processing on the training entry data of each class, and selecting, from the resulting word segments, the word segments that meet a preset word frequency condition as intent word segments.
In implementation, the terminal performs word segmentation processing on the training entry data of each class, collects word frequency statistics on the resulting word segments, and selects the word segments whose word frequency ranks in the top N as the intent word segments of that class of intention. The intent word segments are used for intention recognition of the initial corpus data input by the user.
Optionally, each class of training entry data may correspond to one or more intent word segments; the embodiments of the present application do not limit the number of intent word segments corresponding to each class of training entry data.
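Steps 202 and 204 can be illustrated with a short sketch that groups labeled entries by classification and keeps the Top-N most frequent word segments per class. The function name `build_intent_words`, the sample entries, and the use of whitespace splitting in place of a real word segmenter are assumptions for illustration only.

```python
from collections import Counter

def build_intent_words(entries: list[tuple[str, str]], top_n: int = 2) -> dict[str, set[str]]:
    """Map each classification label to its top_n most frequent word segments."""
    counts: dict[str, Counter] = {}
    for label, text in entries:
        # Whitespace splitting stands in for a real word segmenter (assumption).
        counts.setdefault(label, Counter()).update(text.lower().split())
    return {
        label: {word for word, _ in counter.most_common(top_n)}
        for label, counter in counts.items()
    }

# Hypothetical training entry data: (classification identifier, entry text).
training_entries = [
    ("sales", "premium quote premium discount"),
    ("sales", "quote for new premium"),
    ("service", "claim status claim form"),
    ("service", "update address claim"),
]
intent_words = build_intent_words(training_entries)
```

In this toy data, "premium" and "quote" are the two most frequent segments in the sales class, so they become that class's intent word segments.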
After the intent word segments are determined, the specific process in step 104 of performing intention recognition based on the keywords and determining the target intention of the user is as follows:
Step 206, matching the keywords with each intent word segment to determine the target intention of the user.
In implementation, the terminal matches the keywords extracted from the initial corpus data against each intent word segment, and determines the intention whose intent word segments match the keywords to the highest degree (that is, have the largest number of matches) as the target intention of the current user.
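The matching in step 206 amounts to counting, for each intention, how many of the extracted keywords appear among that intention's word segments. A minimal sketch follows; the function name `recognize_intent`, the sample data, and the tie-breaking rule (the first intention with the highest count wins) are assumptions, since the text does not specify how ties are resolved.

```python
from typing import Optional

def recognize_intent(keywords: list[str], intent_words: dict[str, set[str]]) -> Optional[str]:
    """Return the intention whose word segments overlap most with the keywords."""
    best_intent, best_hits = None, 0
    for intent, words in intent_words.items():
        hits = len(words.intersection(keywords))  # number of matching keywords
        if hits > best_hits:                      # strict '>' keeps the first on ties
            best_intent, best_hits = intent, hits
    return best_intent                            # None when nothing matches

# Hypothetical intent word segments per classification.
intent_words = {
    "sales": {"premium", "quote"},
    "service": {"claim", "status"},
}
target = recognize_intent(["claim", "status", "premium"], intent_words)
```

Returning `None` when no keyword matches leaves it to the caller to decide on a fallback, for example searching every sub-library.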
In this embodiment, the intent word segments used for intention recognition are determined by processing the pre-classified training entry data of each class. The intent word segments enable intention recognition of the initial corpus data input by the user, so that the data query is performed based on the recognized target intention, which improves the accuracy of the data query.
In an exemplary embodiment, as shown in FIG. 3, step 106 includes steps 302 through 306. Wherein:
Step 302, the initial corpus data is encoded to obtain a query vector.
In implementation, before the vector database can be queried with the initial corpus data, the initial corpus data must be encoded and converted into a vector. The terminal therefore encodes the initial corpus data to obtain a query vector.
Step 304, based on the target intention, querying the vector database corresponding to the target intention for a corpus whose similarity to the query vector meets a similarity condition, and taking the corpus as query corpus data.
In implementation, after the target intention of the user is determined, the vector database corresponding to the target intention is determined based on the association relationship between intentions and vector databases. The terminal then queries that vector database for a corpus whose similarity to the query vector meets the similarity condition, and takes the corpus as query corpus data.
Optionally, the preset similarity condition is that the similarity between the query vector and the query corpus data in the vector database is the highest.
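Steps 302 and 304 can be sketched with an in-memory store keyed by intention. The text only requires "a similarity condition"; cosine similarity is used here as one common choice and is an assumption, as are the toy three-dimensional vectors, the `vector_db` layout, and the function names.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def search_vectors(query_vec, intent, vector_db, top_k=1):
    """Return (corpus, similarity) pairs from the sub-library of the given intent."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in vector_db.get(intent, [])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Hypothetical per-intent sub-libraries: intent -> list of (corpus, embedding).
vector_db = {
    "service": [
        ("How to file a claim", [0.9, 0.1, 0.0]),
        ("How to change an address", [0.1, 0.9, 0.2]),
    ],
    "sales": [("Premium discount rules", [0.0, 0.2, 0.9])],
}
hits = search_vectors([1.0, 0.0, 0.0], "service", vector_db)
```

With `top_k=1` this corresponds to the optional condition of taking the corpus with the highest similarity. In practice the query vector would come from the same encoder used to build the store.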
Step 306, based on the target intention, querying the keyword index library corresponding to the target intention for the corpus in which the keywords have the highest word frequency, and taking the corpus as query corpus data.
In implementation, after the target intention of the user is determined, the keyword index library corresponding to the target intention is determined based on the association relationship between intentions and keyword index libraries. The terminal then queries that keyword index library for the corpus in which the keywords have the highest word frequency, and takes the corpus as query corpus data.
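Step 306 can be illustrated with a small inverted index that records, for each word, how often it occurs in each corpus. This sketch covers a single intention's sub-library; the function names, the sample corpora, and the decision to sum the frequencies of all query keywords are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def build_inverted_index(corpora: list[str]) -> dict[str, dict[int, int]]:
    """Map each word to {corpus id: frequency of the word in that corpus}."""
    index: dict[str, dict[int, int]] = defaultdict(dict)
    for doc_id, text in enumerate(corpora):
        for word, freq in Counter(text.lower().split()).items():
            index[word][doc_id] = freq
    return index

def search_keywords(keywords, index, corpora, top_k=1):
    """Rank corpora by the summed frequency of the query keywords."""
    totals: Counter = Counter()
    for word in keywords:
        for doc_id, freq in index.get(word, {}).items():
            totals[doc_id] += freq
    return [(corpora[doc_id], score) for doc_id, score in totals.most_common(top_k)]

# Hypothetical sub-library for one intention.
corpora = ["claim form and claim status", "address change form"]
index = build_inverted_index(corpora)
hits = search_keywords(["claim", "form"], index, corpora)
```

Here the first corpus contains "claim" twice and "form" once, so it scores 3 and is returned ahead of the second corpus, which scores 1.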
In this embodiment, for the target intention of the user, corpus retrieval is performed, based on the initial corpus data of the user, in the vector database and in the keyword index library corresponding to the target intention, and the query corpus data is determined. The two different types of stores correspond to two different query modes, so the query corpus data is determined from different dimensions. This improves the accuracy of the query corpus data and, in turn, the accuracy of the data query.
In an exemplary embodiment, as shown in FIG. 4, the specific process of constructing a query prompt based on query corpus data in step 106 includes steps 401 through 403. Wherein:
Step 401, based on the query scores corresponding to the query corpus data, ranking the query corpus data retrieved from the vector database and the query corpus data retrieved from the keyword index library.
In implementation, after query corpus data is determined in the stores (including the vector database and the keyword index library), a corresponding query score may be predefined for each piece of query corpus data. The terminal then ranks the query corpus data retrieved from the vector database and the query corpus data retrieved from the keyword index library based on these query scores. The query scores may be ranked in descending order to obtain a query score sequence arranged from high to low. The embodiments of the present application do not limit the ranking rule of the query scores.
Step 402, selecting a preset number of target query corpus data from the ranked query corpus data.
In implementation, the terminal selects a preset number of target query corpus data from the ranked query corpus data. Specifically, with the query scores arranged in descending order, the query corpus data corresponding to the top N (N > 0) query scores in the resulting query score sequence is selected as the target query corpus data.
Step 403, performing splicing processing based on the target query corpus data to construct a query prompt.
In implementation, the terminal splices the target query corpus data according to a preset splicing rule to construct a query prompt. The preset splicing rule may be, but is not limited to, head-to-tail splicing of the target query corpus data according to a preset ordering rule; the embodiments of the present application do not limit the specific content of the preset splicing rule.
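Steps 401 to 403 can be sketched as merging the two result lists, sorting by query score, keeping the top N, and joining the selected corpora head to tail. The prompt template, the function name `build_prompt`, and the removal of a corpus returned by both stores are assumptions added for illustration; the text leaves the splicing rule open.

```python
def build_prompt(question, vector_hits, keyword_hits, top_n=3):
    """Merge both result lists, keep the top_n by score, and splice them."""
    merged = sorted(vector_hits + keyword_hits, key=lambda hit: hit[1], reverse=True)
    selected = []
    for text, _score in merged:
        if text not in selected:      # skip a corpus already picked (assumption)
            selected.append(text)
        if len(selected) >= top_n:
            break
    context = "\n".join(selected)     # head-to-tail splicing in ranked order
    # Hypothetical template; the source does not prescribe a prompt layout.
    return f"Reference material:\n{context}\n\nQuestion: {question}"

prompt = build_prompt(
    "How do I file a claim?",
    vector_hits=[("A", 0.9), ("B", 0.4)],
    keyword_hits=[("C", 0.7), ("A", 0.6)],
    top_n=2,
)
```

In this example the ranked order is A, C, A, B, so the two distinct top entries A and C are spliced into the prompt.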
In this embodiment, the query corpus data queried by the vector database and the keyword index database is ranked, and the target query corpus data is selected, so that the query prompt is constructed based on the target query corpus data, corpus information contained in the query prompt is enriched, and data query accuracy is improved.
In an exemplary embodiment, each piece of query corpus data retrieved from the vector database and the keyword index library corresponds to an initial query score, and a corresponding weight is set for the vector database and for the keyword index library based on their respective query accuracy. After the initial query score corresponding to each piece of query corpus data is determined, a further score calculation may therefore be performed on the query corpus data. As shown in FIG. 5, the method further includes:
Step 501, obtaining an initial query score corresponding to the query corpus data.
In implementation, the terminal obtains the initial query score corresponding to each piece of query corpus data. Specifically, the initial query score may be assigned based on the retrieval rank of the query corpus data. For example, the top-3 query corpus data retrieved from the vector database is assigned the scores a1, a2 and a3 according to the preset retrieval rank, and the top-3 query corpus data retrieved from the keyword index library is assigned the scores a4, a5 and a6 according to the preset retrieval rank.
Step 502, respectively performing weighted calculation on initial query scores corresponding to the query corpus data based on the weight values of the vector database and the keyword index database, and determining the query score corresponding to each query corpus data after the weighted calculation.
In implementation, the terminal performs a weighted calculation on the initial query scores corresponding to the query corpus data based on the weight values of the vector database and the keyword index library, and determines the query score corresponding to each piece of query corpus data after the weighted calculation. For example, continuing the example in step 501, the six pieces of query corpus data retrieved from the vector database and the keyword index library have the initial query scores a1, a2, a3, a4, a5 and a6. The weight values corresponding to the initial query scores are c1, c2, c3, c4, c5 and c6, respectively. Performing the weighted calculation on each initial query score gives the final query scores corresponding to the query corpus data as a1·c1, a2·c2, a3·c3, a4·c4, a5·c5 and a6·c6.
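The weighted calculation of steps 501 and 502 can be sketched by multiplying each rank-based initial score by the weight of the store it came from. The concrete scores (3, 2, 1), the weights 0.75 and 0.25, and the function name `weight_scores` are made-up values for illustration and do not come from the text.

```python
def weight_scores(hits_by_source, source_weights):
    """Multiply each initial query score by the weight of its source store."""
    weighted = []
    for source, hits in hits_by_source.items():
        for text, initial_score in hits:
            weighted.append((text, initial_score * source_weights[source], source))
    # Sort from the highest weighted score to the lowest (stable for ties).
    return sorted(weighted, key=lambda item: item[1], reverse=True)

# Hypothetical top-3 results per store, scored 3, 2, 1 by retrieval rank.
hits_by_source = {
    "vector": [("doc A", 3), ("doc B", 2), ("doc C", 1)],
    "keyword": [("doc D", 3), ("doc E", 2), ("doc F", 1)],
}
ranked = weight_scores(hits_by_source, {"vector": 0.75, "keyword": 0.25})
```

With these weights the top vector result scores 2.25, while the top keyword result scores 0.75 and therefore ranks below the first two vector results.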
Optionally, when the weighted calculation of the query corpus data is performed for the first time, an initial weight value corresponding to each query corpus data may be preset, and when the weighted calculation of the query corpus data of a subsequent round is performed, the initial weight value may be adjusted according to the feedback data of the data query result of the historical round, so as to determine the weight value corresponding to each query corpus data corresponding to the current round.
In this embodiment, the initial query score and the weight value corresponding to the target query corpus data are used for performing weighted calculation, determining the query score corresponding to each query corpus data, and emphasizing the influence of each query corpus data on the data query result by the weighted influence of the weight value, thereby ensuring the accuracy of the target query corpus screened based on the query score.
In an exemplary embodiment, a method for calculating a weight value corresponding to each initial query score when calculating the query score is provided, as shown in fig. 6, and the method further includes:
Step 601, obtaining result feedback data corresponding to each data query result.
The result feedback data comprises positive feedback data and negative feedback data.
In implementation, after data queries have been performed over a past period, the user may evaluate the data query results given by the query model. For example, like (thumbs-up) operations, click operations, or the number of times the user uses the data query system may serve as query result feedback, from which the query effect (accuracy) of a data query result is determined. To improve the accuracy of the data query results, the terminal may further use the result feedback data as an incentive signal for the data query process. Specifically, the terminal acquires the result feedback data corresponding to each data query result in the historical query process.
Step 602, sorting query prompts corresponding to the data query results based on the positive feedback data and the negative feedback data respectively to obtain sorting results.
In implementation, the terminal ranks the query prompts corresponding to the data query results based on the positive feedback data and the negative feedback data, respectively, to obtain a ranking result. Specifically, the quality of each data query result is determined from its result feedback, and the quality of the query prompt that produced it is inferred in turn from the quality of the data query result. The data query results are ranked accordingly, which determines the ranking result of the corresponding query prompts. For example, one specific ranking rule is as follows. Positive feedback data: the existing ranking of the query prompts is reordered in descending order in steps of 10 points. Negative feedback data: the existing ranking of the query prompts is reordered in ascending order in steps of 1 point.
Step 603, determining the ranking score of each query prompt based on the sorting result, and selecting the target query prompt with the highest score.
In implementation, the terminal determines the ranking score of each query prompt based on the sorting result, and selects the target query prompt with the highest score. Because the target query prompt is obtained by splicing a plurality of pieces of query corpus data, the score of each piece of query corpus data contained in the target query prompt can be determined accordingly.
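Steps 602 and 603 can be illustrated with a minimal sketch. The ordering rule above is stated only briefly, so the sketch adopts one possible reading as an assumption: each positive feedback event adds 10 points to a query prompt and each negative feedback event subtracts 1 point. The names `rank_prompts`, `select_target_prompt`, and the shape of `feedback_log` are hypothetical and do not come from the application.

```python
from collections import defaultdict

# Assumed point values: one reading of the "10 points" / "1 point" rule.
POSITIVE_POINTS = 10
NEGATIVE_POINTS = 1


def rank_prompts(feedback_log):
    """Return (prompt_id, score) pairs sorted from best to worst.

    feedback_log is an iterable of (prompt_id, is_positive) tuples.
    """
    scores = defaultdict(int)
    for prompt_id, is_positive in feedback_log:
        if is_positive:
            scores[prompt_id] += POSITIVE_POINTS
        else:
            scores[prompt_id] -= NEGATIVE_POINTS
    # Highest score first; ties are broken by prompt_id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def select_target_prompt(feedback_log):
    """Pick the prompt with the highest ranking score (step 603)."""
    ranking = rank_prompts(feedback_log)
    return ranking[0][0] if ranking else None


log = [
    ("prompt_a", True), ("prompt_a", False),
    ("prompt_b", False), ("prompt_b", True),
    ("prompt_c", True), ("prompt_c", True),
]
ranking = rank_prompts(log)
target = select_target_prompt(log)
```

Other point schemes or a learned ranking model could be substituted without changing the surrounding flow; only the ranking score of each query prompt is consumed by the next step.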
Step 604, determining a weight value between the vector database and the keyword index library based on the proportion of the query corpus data contained in the target query prompt.
In implementation, the terminal determines a weight value between the vector database and the keyword index library based on the proportion of the query corpus data contained in each target query prompt. As shown in Table 1 below:
TABLE 1
For example, the target query prompt corresponding to category one includes query corpus data 1, query corpus data 2 and query corpus data 3, whose scores are 5, 10 and 3, respectively. A higher score for the target query prompt indicates a better query effect, so the weight values corresponding to the pieces of query corpus data in the target query prompt may be 0.28, 0.56 and 0.16, that is, roughly each score's share of the total score of 18. The initial weight value is then updated based on the weight value determined for each piece of query corpus data, so that the query score of each piece of query corpus data is calculated more accurately.
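The arithmetic in this example can be sketched as a simple normalization, assuming the weights are each score's share of the total. Exact normalization of the scores 5, 10 and 3 gives about 0.278, 0.556 and 0.167, so the figures quoted in the text appear to be rounded so that they sum to 1. The function names and the mapping from corpus entries to their source (vector database or keyword index library) are hypothetical.

```python
def proportional_weights(scores):
    """Normalize positive scores so that the resulting weights sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive total")
    return {name: score / total for name, score in scores.items()}


def source_weights(entry_weights, source_of):
    """Aggregate per-entry weights into one weight per retrieval source."""
    totals = {}
    for name, weight in entry_weights.items():
        source = source_of[name]
        totals[source] = totals.get(source, 0.0) + weight
    return totals


scores = {"corpus_1": 5, "corpus_2": 10, "corpus_3": 3}
weights = proportional_weights(scores)

# Assumed provenance of each entry, for illustration only.
source_of = {"corpus_1": "vector", "corpus_2": "vector", "corpus_3": "keyword"}
per_source = source_weights(weights, source_of)
```

Under the assumed provenance, the vector database would receive a weight of 15/18 and the keyword index library 3/18; the actual split depends on which source each piece of query corpus data in the target query prompt came from.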
In this embodiment, the weight value corresponding to the query corpus data in the data query process is updated using the result feedback data of the data query results, so that the feedback acts as an incentive signal and the accuracy of the data query results is further improved.
In an exemplary embodiment, as shown in fig. 7, a complete flow example of a data query method is given, which includes:
Step 701, the user inputs initial corpus data (i.e. question text data);
Step 702, performing word segmentation on the initial corpus data, extracting keywords in the initial corpus data, performing intention recognition based on the keywords, and determining the target intention of the user;
Step 703, searching in the vector database and the keyword index library respectively based on the target intention to determine query corpus data, weighting the initial query scores of the query corpus data with the initial weight values, sorting the query corpus data by the weighted scores, and determining target query corpus data;
Step 704, performing splicing processing based on the target query corpus data to construct a query prompt;
Step 705, inputting the query prompt into the query model to obtain a data query result;
Step 706, obtaining result feedback data (user click action log) corresponding to each data query result;
Step 707, determining a weight value between the vector database and the keyword index library through a preset weight calculation model, the weight value being used to update the initial weight value.
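Steps 701 to 705 can be wired together as in the following sketch. It is an illustration under stated assumptions, not the implementation of the application: every stage (`segment`, `recognize_intent`, `search_vector`, `search_keyword`, `query_model`) is a hypothetical callable supplied by the caller, each search is assumed to return `(corpus_text, initial_score)` pairs, and the prompt is assumed to be the selected corpus texts followed by the question.

```python
def run_query(question, *, segment, recognize_intent, search_vector,
              search_keyword, query_model, weights, top_k=3):
    """Run one query through steps 701-705 and return (prompt, answer)."""
    keywords = segment(question)                      # step 702
    intent = recognize_intent(keywords)

    # Step 703: retrieve from both sources and weight the initial scores.
    candidates = [(text, score * weights["vector"])
                  for text, score in search_vector(question, intent)]
    candidates += [(text, score * weights["keyword"])
                   for text, score in search_keyword(keywords, intent)]
    candidates.sort(key=lambda item: item[1], reverse=True)
    target = [text for text, _ in candidates[:top_k]]

    prompt = "\n".join(target + [question])           # step 704
    return prompt, query_model(prompt)                # step 705


# Stub stages, for demonstration only.
prompt, answer = run_query(
    "how to claim",
    segment=str.split,
    recognize_intent=lambda keywords: "claims",
    search_vector=lambda question, intent: [("corpus A", 0.9), ("corpus B", 0.5)],
    search_keyword=lambda keywords, intent: [("corpus C", 0.8)],
    query_model=lambda text: "stub answer",
    weights={"vector": 0.5, "keyword": 1.0},
    top_k=2,
)
```

Steps 706 and 707 would then log user feedback for each returned answer and periodically replace the `weights` argument with the updated values.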
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in the sequence indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to the order of execution and may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of the application also provide a data query device for implementing the data query method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more data query device embodiments provided below, reference may be made to the limitations of the data query method above, which are not repeated here.
In one exemplary embodiment, as shown in fig. 8, there is provided a data query apparatus 800 comprising: an acquisition module 801, a first determination module 802, a second determination module 803, and a query module 804, wherein:
The obtaining module 801 is configured to obtain initial corpus data input by a user in response to a data query request.
The first determining module 802 is configured to perform word segmentation on the initial corpus data, extract keywords in the initial corpus data, perform intent recognition based on the keywords, and determine a target intent of the user.
The second determining module 803 is configured to search in the vector database and the keyword index library respectively based on the target intent, determine query corpus data, and construct a query prompt based on the query corpus data.
The query module 804 is configured to input a query prompt into the query model to obtain a data query result.
In an exemplary embodiment, the apparatus 800 further comprises:
The first acquisition module is used for acquiring training entry data; the training entry data carries a classification identifier.
The first selecting module is used for performing word segmentation processing on each class of training entry data, and selecting, from the resulting word segments, those meeting a preset word frequency condition as intention word segments.
The first determining module 802 is specifically configured to match the keywords with each intention word segment, and determine the target intention of the user.
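The selection of intention word segments and the subsequent matching can be sketched as follows. The sketch assumes the training entries are already segmented into tokens (a real system for Chinese text would first run a word segmentation tool), that the preset word frequency condition is a minimum count, and that the target intention is the class with the most matching keywords. All names and the threshold are hypothetical.

```python
from collections import Counter


def build_intent_words(training_entries, min_count=2):
    """Map each class to the word segments that occur at least min_count times.

    training_entries is an iterable of (class_label, segments) pairs.
    """
    counts = {}
    for label, segments in training_entries:
        counts.setdefault(label, Counter()).update(segments)
    return {label: {word for word, n in counter.items() if n >= min_count}
            for label, counter in counts.items()}


def recognize_intent(keywords, intent_words):
    """Return the class whose intention words overlap most with the keywords."""
    best_label, best_hits = None, 0
    for label in sorted(intent_words):
        hits = len(set(keywords) & intent_words[label])
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label


entries = [
    ("claims", ["claim", "settlement", "claim"]),
    ("claims", ["settlement", "documents"]),
    ("policy", ["premium", "renewal", "premium"]),
    ("policy", ["renewal", "date"]),
]
intent_words = build_intent_words(entries)
intent = recognize_intent(["how", "claim", "settlement"], intent_words)
```

When no keyword matches any intention word segment the sketch returns `None`; how such a query should be handled is not specified here.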
In an exemplary embodiment, the second determining module 803 is specifically configured to perform encoding processing on the initial corpus data to obtain a query vector.
Based on the target intention, a corpus that satisfies a similarity condition with the query vector is retrieved from the vector database corresponding to the target intention and used as query corpus data.
Based on the target intention, the corpus in which the keywords have the highest word frequency is retrieved from the keyword index library corresponding to the target intention and used as query corpus data.
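The two retrieval paths can be sketched in plain Python. The sketch assumes cosine similarity with a fixed threshold as the similarity condition and a raw occurrence count as the word frequency; neither choice is confirmed by the text. The embeddings are given directly rather than produced by an encoder, and the in-memory dictionaries stand in for the vector database and keyword index library of one target intention.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_vector(query_vec, vector_db, min_similarity=0.8):
    """Return (text, similarity) pairs at or above the threshold, best first."""
    hits = [(text, cosine(query_vec, vec)) for text, vec in vector_db.items()]
    hits = [hit for hit in hits if hit[1] >= min_similarity]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def search_keyword(keywords, keyword_index):
    """Return (text, count) for the entry where the keywords occur most often."""
    best = None
    for text, segments in keyword_index.items():
        count = sum(segments.count(word) for word in set(keywords))
        if count > 0 and (best is None or count > best[1]):
            best = (text, count)
    return best


vector_db = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0]}
keyword_index = {"doc_c": ["claim", "form", "claim"], "doc_d": ["claim", "premium"]}

vector_hits = search_vector([1.0, 0.1], vector_db)
keyword_hit = search_keyword(["claim", "form"], keyword_index)
```

A production system would typically delegate both lookups to a dedicated vector store and an inverted index; the sketch only shows what each lookup is expected to return.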
In an exemplary embodiment, the second determining module 803 is specifically configured to sort the query corpus data retrieved from the vector database and the query corpus data retrieved from the keyword index library based on the query scores corresponding to the query corpus data.
Selecting a preset number of target query corpus data from the ranked query corpus data.
And performing splicing processing based on the target query corpus data to construct a query prompt.
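The sorting, selection and splicing performed by this module can be sketched as below. The prompt template, the function name `build_prompt`, and the default of three entries are assumptions made for illustration; the text only states that a preset number of target query corpus data are selected and spliced.

```python
def build_prompt(question, scored_corpus, top_k=3):
    """Sort by query score, keep the top_k entries, and splice them into a prompt.

    scored_corpus is a list of (corpus_text, query_score) pairs drawn from
    both the vector database and the keyword index library.
    """
    ranked = sorted(scored_corpus, key=lambda item: item[1], reverse=True)
    target = [text for text, _ in ranked[:top_k]]
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(target, start=1))
    return f"Reference material:\n{context}\n\nQuestion: {question}"


prompt = build_prompt("Q?", [("a", 0.2), ("b", 0.9), ("c", 0.5)], top_k=2)
```

Numbering the spliced entries is a design choice of the sketch that makes it easier to trace an answer back to the corpus entry it drew on.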
In an exemplary embodiment, the apparatus 800 further comprises:
The second acquisition module is used for acquiring initial query scores corresponding to the query corpus data.
The weighting calculation module is used for performing weighted calculation on the initial query scores corresponding to the query corpus data based on the weight values of the vector database and the keyword index library, respectively, and determining the query score corresponding to each piece of query corpus data after the weighted calculation.
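The weighted calculation can be sketched as a per-source multiplication, assuming that each initial query score is multiplied by the weight value of the source it was retrieved from. The function name, the tuple layout and the example weights are hypothetical.

```python
def weighted_scores(vector_hits, keyword_hits, w_vector, w_keyword):
    """Scale each initial query score by the weight of its retrieval source.

    Both inputs are lists of (corpus_text, initial_score) pairs; the result
    is a list of (corpus_text, query_score, source) triples.
    """
    scored = [(text, score * w_vector, "vector") for text, score in vector_hits]
    scored += [(text, score * w_keyword, "keyword") for text, score in keyword_hits]
    return scored


scored = weighted_scores(
    vector_hits=[("doc_a", 0.8), ("doc_b", 0.6)],
    keyword_hits=[("doc_c", 0.9)],
    w_vector=0.5,
    w_keyword=1.0,
)
```

Because similarity scores and word frequency counts live on different scales, a real system would likely normalize the initial scores before weighting; that step is omitted here.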
In an exemplary embodiment, the apparatus 800 further comprises:
The third acquisition module is used for acquiring result feedback data corresponding to each data query result; the result feedback data includes positive feedback data and negative feedback data.
The sorting module is used for sorting the query prompts corresponding to the data query results based on the positive feedback data and the negative feedback data, respectively, to obtain a sorting result.
The second selecting module is used for determining the ranking score of each query prompt based on the sorting result and selecting the target query prompt with the highest score.
The third determining module is used for determining a weight value between the vector database and the keyword index library based on the proportion of the query corpus data contained in the target query prompt.
The modules in the data query device described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded, in hardware form, in a processor of the computer device or be independent of that processor, or may be stored, in software form, in a memory of the computer device, so that the processor can call them and execute the operations corresponding to each module.
In an exemplary embodiment, a computer device is provided, which may be a terminal and whose internal structure may be as shown in fig. 9. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program, when executed by the processor, implements a data query method. The display unit of the computer device is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device can be a touch layer covering the display screen, a key, a track ball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse.
It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the computer device to which the present application applies, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one exemplary embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
Responding to a data query request, and acquiring initial corpus data input by a user.
Performing word segmentation on the initial corpus data, extracting keywords in the initial corpus data, performing intention recognition based on the keywords, and determining the target intention of the user.
Searching in a vector database and a keyword index library respectively based on the target intention, determining query corpus data, and constructing a query prompt based on the query corpus data.
Inputting the query prompt into the query model to obtain a data query result.
In one embodiment, the processor when executing the computer program further performs the steps of:
Acquiring training entry data; the training entry data carries a classification identifier.
Performing word segmentation processing on each class of training entry data, and selecting, from the resulting word segments, those meeting a preset word frequency condition as intention word segments.
Performing intention recognition based on the keywords and determining the target intention of the user includes:
Matching the keywords with each intention word segment to determine the target intention of the user.
In one embodiment, the processor when executing the computer program further performs the steps of:
Encoding the initial corpus data to obtain a query vector.
Based on the target intention, retrieving a corpus that satisfies a similarity condition with the query vector from the vector database corresponding to the target intention, and using it as query corpus data.
Based on the target intention, retrieving the corpus in which the keywords have the highest word frequency from the keyword index library corresponding to the target intention, and using it as query corpus data.
In one embodiment, the processor when executing the computer program further performs the steps of:
based on the query scores corresponding to the query corpus data, sorting the query corpus data queried by the vector database and the query corpus data queried by the keyword index database;
Selecting a preset number of target query corpus data from the ranked query corpus data;
and performing splicing processing based on the target query corpus data to construct a query prompt.
In one embodiment, the processor when executing the computer program further performs the steps of:
acquiring initial query scores corresponding to the query corpus data;
and respectively carrying out weighted calculation on initial query scores corresponding to the query corpus data based on the weight values of the vector database and the keyword index database, and determining the query score corresponding to each query corpus data after the weighted calculation.
In one embodiment, the processor when executing the computer program further performs the steps of:
obtaining result feedback data corresponding to each data query result; the result feedback data comprises positive feedback data and negative feedback data;
based on the positive feedback data and the negative feedback data, respectively sequencing the query prompts corresponding to the data query results to obtain sequencing results;
determining the ranking scores of the query prompts based on the ranking results, and selecting the target query prompt with the highest score;
and determining a weight value between the vector database and the keyword index library based on the proportion of the query corpus data contained in the target query prompt.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the program, when executed, may include the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration and not limitation, RAM can take a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be, without limitation, general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above embodiments represent only a few implementations of the present application. They are described in relative detail, but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A method of querying data, the method comprising:
responding to a data query request, and acquiring initial corpus data input by a user;
word segmentation is carried out on the initial corpus data, keywords in the initial corpus data are extracted, intention recognition is carried out on the basis of the keywords, and target intention of the user is determined;
searching in a vector database and a keyword index database respectively based on the target intention, determining query corpus data, and constructing a query prompt based on the query corpus data;
and inputting the query prompt into a query model to obtain a data query result.
2. The method according to claim 1, wherein the method further comprises:
acquiring training entry data; the training entry data carries a classification identifier;
performing word segmentation processing on the training entry data of each class, and selecting, from the resulting word segments, those meeting a preset word frequency condition as intention word segments;
the performing intention recognition based on the keywords and determining the target intention of the user comprises:
matching the keywords with each intention word segment to determine the target intention of the user.
3. The method of claim 1, wherein the searching in a vector database and a keyword index library respectively based on the target intention and determining query corpus data comprises:
encoding the initial corpus data to obtain a query vector;
retrieving, based on the target intention, a corpus that satisfies a similarity condition with the query vector from the vector database corresponding to the target intention, and using the corpus as query corpus data;
and retrieving, based on the target intention, the corpus in which the keywords have the highest word frequency from the keyword index library corresponding to the target intention, and using the corpus as query corpus data.
4. The method of claim 1, wherein the constructing a query prompt based on the query corpus data comprises:
based on the query scores corresponding to the query corpus data, sequencing the query corpus data queried by the vector database and the query corpus data queried by the keyword index library;
selecting a preset number of target query corpus data from the ranked query corpus data;
and performing splicing processing based on the target query corpus data to construct a query prompt.
5. The method of claim 4, wherein prior to ranking the query corpus data of the vector database query and the query corpus data of the keyword index library query based on the query scores corresponding to the query corpus data, the method further comprises:
acquiring initial query scores corresponding to the query corpus data;
and respectively carrying out weighted calculation on initial query scores corresponding to the query corpus data based on the weight values of the vector database and the keyword index database, and determining the query score corresponding to each query corpus data after the weighted calculation.
6. The method according to claim 1 or 5, characterized in that the method further comprises:
Obtaining result feedback data corresponding to each data query result; the result feedback data comprises positive feedback data and negative feedback data;
based on the positive feedback data and the negative feedback data, respectively sequencing query prompts corresponding to the data query results to obtain sequencing results;
determining the sorting scores of the query prompts based on the sorting results, and selecting the target query prompt with the highest score;
and determining a weight value between the vector database and the keyword index library based on the proportion of the query corpus data contained in the target query prompt.
7. A data querying device, the device comprising:
the acquisition module is used for responding to the data query request and acquiring initial corpus data input by a user;
the first determining module is used for carrying out word segmentation on the initial corpus data, extracting keywords in the initial corpus data, carrying out intention recognition based on the keywords and determining the target intention of the user;
the second determining module is used for respectively searching in a vector database and a keyword index library based on the target intention, determining query corpus data and constructing a query prompt based on the query corpus data;
and the query module is used for inputting the query prompt into a query model to obtain a data query result.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202311732140.3A 2023-12-16 2023-12-16 Data query method, device, computer equipment and storage medium Pending CN117648427A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311732140.3A CN117648427A (en) 2023-12-16 2023-12-16 Data query method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311732140.3A CN117648427A (en) 2023-12-16 2023-12-16 Data query method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117648427A true CN117648427A (en) 2024-03-05

Family

ID=90049412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311732140.3A Pending CN117648427A (en) 2023-12-16 2023-12-16 Data query method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117648427A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination