CN118332008A - Answer screening method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN118332008A
CN118332008A
Authority
CN
China
Prior art keywords
answer
similarity
answers
processed
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410553114.2A
Other languages
Chinese (zh)
Inventor
柏雪
韩剑平
邓建春
艾若琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FAW Jiefang Automotive Co Ltd
Original Assignee
FAW Jiefang Automotive Co Ltd
Filing date
Publication date
Application filed by FAW Jiefang Automotive Co Ltd filed Critical FAW Jiefang Automotive Co Ltd
Publication of CN118332008A


Abstract

The application relates to an answer screening method, an answer screening apparatus, a computer device, a storage medium, and a computer program product. The method includes the following steps: acquiring a question to be processed, and extracting keywords from the question to be processed; screening, from an answer database, candidate answers whose similarity with the question to be processed meets a preset similarity condition; inputting the keywords and the candidate answers into a trained answer screening model; calculating, for each candidate answer, the target similarity between the current candidate answer and each keyword, and determining the similarity between the current candidate answer and the question to be processed based on the target similarities; and screening the target answer from the candidate answers based on the similarity between each candidate answer and the question to be processed. By adopting the method, the accuracy and efficiency of answer retrieval can be improved.

Description

Answer screening method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular, to an answer screening method, an answer screening apparatus, a computer device, a storage medium, and a computer program product.
Background
Automatically retrieving answers from large-scale document collections based on questions input by users is increasingly used in intelligent applications. When searching a large-scale document collection, text retrieval can easily take a long time, and retrieval that uses only single-stage recall produces results that are not accurate enough.
Disclosure of Invention
In view of the above technical problems, it is necessary to provide an answer screening method, an apparatus, a computer device, a computer-readable storage medium, and a computer program product capable of improving the accuracy and efficiency of answer retrieval.
In a first aspect, the present application provides an answer screening method, including:
acquiring a question to be processed, and extracting keywords from the question to be processed;
screening, from an answer database, candidate answers whose similarity with the question to be processed meets a preset similarity condition;
inputting the keywords and the candidate answers into a trained answer screening model; calculating, for each candidate answer, the target similarity between the current candidate answer and each keyword, and determining the similarity between the current candidate answer and the question to be processed based on the target similarities; and screening the target answer from the candidate answers based on the similarity between each candidate answer and the question to be processed.
In one embodiment, the calculating the target similarity between the current candidate answer and the at least two keywords includes: for each keyword, determining the target similarity between the current candidate answer and the keyword based on the frequency with which the keyword occurs in the current candidate answer, the word frequency sensitivity, and the length sensitivity; the word frequency sensitivity is related to the number of candidate answers, and the length sensitivity is related to the degree of dispersion of the text lengths of the candidate answers.
In one embodiment, the target similarity between the current candidate answer and the keyword is calculated by the following formula:

R = f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))

where R is the target similarity between the current candidate answer and the keyword, k1 is the word frequency sensitivity, b is the length sensitivity, dl is the text length of the current candidate answer, avgdl is the average text length over all candidate answers, and f is the frequency with which the keyword appears in the current candidate answer.
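As a sanity check of this formula, a minimal Python sketch of the per-keyword target similarity (the function name and example values are illustrative, not from the patent):

```python
def target_similarity(f, k1, b, dl, avgdl):
    """BM25-style term score: f is the keyword's frequency in the current
    candidate answer, k1 the word frequency sensitivity, b the length
    sensitivity, dl the answer's text length, avgdl the average text
    length over all candidate answers."""
    return f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))

# When dl == avgdl the length normalization term is neutral, so the
# score depends only on f and k1 and saturates toward k1 + 1 as f grows.
print(target_similarity(f=2, k1=1.5, b=0.75, dl=100, avgdl=100))
```

Larger k1 slows the saturation in f, so word frequency carries more weight in the score, while larger b penalizes answers longer than the average more heavily.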
In one embodiment, the determining the similarity between the current candidate answer and the question to be processed based on the target similarity includes:
determining the weight of each keyword in the at least two keywords;
and determining the similarity between the current candidate answer and the question to be processed based on the weight and the target similarity corresponding to each keyword.
In one embodiment, the similarity between the current candidate answer and the question to be processed is calculated by the following formula:

Score(Q, d) = Σ_{i=1}^{n} W_i * R_i

where Score(Q, d) is the similarity score between the candidate answer d and the question to be processed Q, n is the total number of keywords, i ranges over [1, n], W_i is the weight of the i-th keyword, and R_i is the target similarity corresponding to the i-th keyword.
In one embodiment, the determining the weight of each of the at least two keywords includes:
For each keyword, determining the weight of the keyword based on the total number of answers in the answer database and the number of answers containing the keyword in the answer database.
In one embodiment, after the target answer is screened from the candidate answers, the method further includes: inputting the question to be processed and the target answer into a large language model for processing, and obtaining a final answer to the question to be processed.
In a second aspect, the present application further provides an answer screening apparatus, including:
the acquisition module is used for acquiring the question to be processed and extracting keywords from the question to be processed;
the first screening module is used for screening, from an answer database, candidate answers whose similarity with the question to be processed meets a preset similarity condition;
the second screening module is used for inputting the keywords and the candidate answers into an answer screening model; calculating, for each candidate answer, the target similarity between the current candidate answer and each of the at least two keywords, and determining the similarity between the current candidate answer and the question to be processed based on the target similarities; and screening the target answer from the candidate answers based on the similarity between each candidate answer and the question to be processed.
In a third aspect, the present application also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the method of the embodiments when the computer program is executed by the processor.
In a fourth aspect, the present application also provides a computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method of the embodiments.
In a fifth aspect, the application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of the embodiments.
In the answer screening method, apparatus, computer device, storage medium, and computer program product, a question to be processed is acquired and keywords are extracted from it; candidate answers whose similarity with the question to be processed meets a preset similarity condition are screened from an answer database, implementing the first round of candidate-answer recall; the keywords and the candidate answers are then input into a trained answer screening model; for each candidate answer, the target similarity between the current candidate answer and each of the at least two keywords is calculated, and the similarity between the current candidate answer and the question to be processed is determined based on the target similarities; finally, the target answer is screened from the candidate answers based on the similarity between each candidate answer and the question to be processed, implementing the second round of candidate-answer recall. This multi-round recall retrieval method for question-answering scenarios yields more accurate answers and improves the accuracy of retrieval results. At the same time, by retrieving answers with the scheme of the application, the target answer can be found quickly, improving retrieval efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings required in the description of the embodiments or the related art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and that other drawings can be obtained from them by those skilled in the art without inventive effort.
FIG. 1 is a flowchart of an answer screening method in one embodiment;
FIG. 2 is a flowchart of an answer screening method in another embodiment;
FIG. 3 is a flowchart of an answer screening method in yet another embodiment;
FIG. 4 is a block diagram of an answer screening apparatus in one embodiment;
FIG. 5 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In one embodiment, as shown in fig. 1, an answer screening method is provided. The method is described here as applied to a terminal; it may be understood that the method may also be applied to a server, or to a system including the terminal and the server and implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:
Step 102, acquiring a question to be processed, and extracting keywords from the question to be processed;
The terminal acquires the question to be processed and can perform morpheme analysis on it by word segmentation, where each word is treated as a morpheme, yielding a plurality of morphemes. Using a keyword extraction algorithm from the field of natural language processing for the morpheme analysis can express the semantics of the question to be processed more accurately, and one or more morphemes can be selected from the plurality of morphemes as keywords. The keyword extraction algorithm may be a term frequency-inverse document frequency (TF-IDF) algorithm.
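A minimal, self-contained sketch of TF-IDF keyword selection over already-segmented tokens (the toy corpus, the function name, and the smoothing are illustrative assumptions, not the patent's implementation):

```python
import math
from collections import Counter

def tfidf_keywords(question_tokens, corpus_tokens, top_k=2):
    """Score each token of the question by TF-IDF against a small corpus
    and return the top_k highest-scoring tokens as keywords."""
    n_docs = len(corpus_tokens)
    tf = Counter(question_tokens)
    scored = []
    for term, count in tf.items():
        df = sum(1 for doc in corpus_tokens if term in doc)  # document frequency
        idf = math.log((n_docs + 1) / (df + 1)) + 1          # smoothed IDF
        scored.append((term, (count / len(question_tokens)) * idf))
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return [t for t, _ in scored[:top_k]]

# toy example: tokens have already been segmented
question = ["engine", "oil", "change", "interval"]
corpus = [["engine", "maintenance", "guide"],
          ["oil", "filter", "replacement"],
          ["brake", "pad", "wear"]]
print(tfidf_keywords(question, corpus, top_k=2))
```

Terms that appear in few corpus documents get a higher IDF and therefore rank higher as keywords.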
The terminal can further vectorize the question to be processed: text vectorization is performed on the question to be processed using a first embedding model in the LangChain framework, obtaining the vectorized question to be processed.
Step 104, screening, from an answer database, candidate answers whose similarity with the question to be processed meets a preset similarity condition;
The terminal sets parameters of the LangChain framework, including a similarity score threshold and the number of candidate answers. The number of candidate answers refers to how many answers are selected from the answer database to answer the question, and is generally set to 3 or 5.
The terminal calculates the similarity score between each answer in the answer database and the question to be processed. The lower the similarity score, the higher the degree of similarity between the answer and the keywords; in short, the lower the similarity score, the better. A similarity score threshold therefore needs to be set: when an answer's similarity score is smaller than the threshold, the answer is valid; otherwise it is invalid. Text recall is achieved in this way. In general, if the similarity score threshold takes values in [0,1], it is set to about 0.5.
Based on the set number of candidate answers, the terminal screens out, from the answer database, first candidate answers whose similarity scores are smaller than the similarity score threshold.
The vectorized question to be processed and the vector database are taken as input; the similarity score between the vectorized question and each answer among the first candidate answers is calculated using the vector similarity computation method provided by the LangChain framework, and the answers with similarity scores smaller than the threshold are screened out as the candidate answers.
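The first-round recall described above can be sketched as follows. The cosine-distance score (lower means more similar, matching the description above), the toy vectors, and the function names are illustrative assumptions rather than LangChain's actual internals:

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity: lower means more similar, as described above."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def first_round_recall(question_vec, answer_vecs, score_threshold=0.5, top_n=3):
    """Keep answers whose distance score is below the threshold, then
    return the indices of the top_n closest ones (first-round recall)."""
    scored = [(i, cosine_distance(question_vec, v)) for i, v in enumerate(answer_vecs)]
    valid = [(i, s) for i, s in scored if s < score_threshold]
    valid.sort(key=lambda t: t[1])
    return [i for i, _ in valid[:top_n]]

# toy vectors: answer 0 is nearly parallel to the question, answer 2 is orthogonal
q = [1.0, 0.0, 0.5]
answers = [[0.9, 0.1, 0.4], [0.5, 0.5, 0.5], [0.0, 1.0, 0.0]]
print(first_round_recall(q, answers, score_threshold=0.5, top_n=3))
```

The threshold filters out invalid answers, and the sort plus `top_n` cut enforces the configured number of candidate answers.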
Step 106, inputting the keywords and the candidate answers into a trained answer screening model; for each candidate answer, calculating the target similarity between the current candidate answer and each of the at least two keywords, and determining the similarity between the current candidate answer and the question to be processed based on the target similarities; and screening the target answer from the candidate answers based on the similarity between each candidate answer and the question to be processed.
The terminal inputs the keywords and the candidate answers into the trained answer screening model. A BM25 (Best Match 25) algorithm is preconfigured in the answer screening model; BM25 is currently the most mainstream algorithm in the field of information retrieval for calculating question-answer similarity scores.
For each candidate answer, the target similarity between the current candidate answer and each of the at least two keywords is calculated; the target similarities of the at least two keywords are then weighted and summed to obtain the similarity between the candidate answer and the question to be processed.
Based on the similarity between each candidate answer and the question to be processed, the target answers that rank highest in similarity and satisfy the set number of candidate answers are screened from the candidate answers.
In this embodiment, a question to be processed is acquired and keywords are extracted from it; candidate answers whose similarity with the question to be processed meets the preset similarity condition are screened from an answer database, implementing the first round of candidate-answer recall; the keywords and the candidate answers are then input into a trained answer screening model; for each candidate answer, the target similarity between the current candidate answer and each of the at least two keywords is calculated, and the similarity between the current candidate answer and the question to be processed is determined based on the target similarities; finally, the target answer is screened from the candidate answers based on the similarity between each candidate answer and the question to be processed, implementing the second round of candidate-answer recall. This multi-round recall retrieval method for question-answering scenarios yields more accurate answers and improves the accuracy of retrieval results. At the same time, by retrieving answers with the scheme of the application, the target answer can be found quickly, improving retrieval efficiency.
In one embodiment, the calculating the target similarity between the current candidate answer and the at least two keywords includes: for each keyword, determining the target similarity between the current candidate answer and the keyword based on the frequency with which the keyword occurs in the current candidate answer, the word frequency sensitivity, and the length sensitivity; the word frequency sensitivity is related to the number of candidate answers, and the length sensitivity is related to the degree of dispersion of the text lengths of the candidate answers.
The frequency with which a keyword occurs in the current candidate answer refers to the number of times the keyword appears in the current candidate answer.
The word frequency sensitivity is used for controlling the sensitivity degree of the BM25 algorithm to the word frequency when calculating the similarity score; the length sensitivity is used to control how sensitive the BM25 algorithm is to the length of the text when calculating the similarity score.
In one embodiment, the training process of the answer screening model is as follows: a question sample to be processed is obtained and its keywords are extracted; candidate answer samples whose similarity meets the preset similarity condition are screened from the answer database; the candidate answer samples, their labeled similarities, and the keywords of the question sample to be processed are input into an initial answer screening model, and the parameters of the initial answer screening model are adjusted until the accuracy of the adjusted model falls within a preset accuracy range, yielding the answer screening model.
The labeled similarity of a candidate answer sample refers to the manually labeled similarity between the candidate answer sample and the question sample to be processed.
In this embodiment, the text content corresponding to a plurality of answers is obtained using different types of file loaders of the LangChain framework, with the results returned as file objects; the file objects are then split using a text splitter of the LangChain framework into text blocks meeting the splitter's preset parameter requirements, and the answer database is constructed from these text blocks. The preset parameter requirements include limits on the maximum size of a text block and the maximum amount of overlap between text blocks.
The text blocks in the text database are vectorized using an embedding model in the LangChain framework to obtain vector data for the text blocks; the vector database provided by the LangChain framework is then called to store the vectors, yielding the vector database of the knowledge base.
For the question sample to be processed, its keywords are extracted and the sample is vectorized; the extraction and vectorization are similar to those used for the question to be processed, and are not repeated here.
The way candidate answer samples whose similarity meets the preset similarity condition are screened from the answer database is similar to the way candidate answers are screened, and is not repeated here.
The difference between the labeled similarity of a candidate answer sample and the similarity between that sample and the question sample to be processed as determined by the initial answer screening model is computed; based on this difference and the text-block lengths of the candidate answer samples, the word frequency sensitivity and the length sensitivity in the BM25 algorithm are adjusted.
The higher the word frequency sensitivity, the more sensitive the computation of the similarity score is to the word frequency, and the value range of the word frequency sensitivity is usually 0 to 3, and the best effect is generally considered in the range of 0.5 to 2.
When the number of candidate answers is large, it is natural to expect the BM25 algorithm to screen similar candidate answers more strictly. Therefore, in the process of re-ranking the candidate answers with the BM25 algorithm, the similarity score should be more sensitive to word frequency, using the frequency of keyword occurrences to screen similar candidate answers strictly; conversely, when the number of candidate answers is small, the BM25 similarity score should be less sensitive to word frequency. In summary, the word frequency sensitivity is positively correlated with the number of candidate answers: if the number of candidate answers is large, the word frequency sensitivity is set to a high value so that the BM25 similarity score is sensitive to word frequency and the most similar candidate answers are screened more strictly; otherwise, the word frequency sensitivity may be set to a lower value.
According to this tuning logic, a specific tuning rule used during the model training phase is exemplified as follows: the number of candidate answers ranges over [0, TN], where TN is the number of all answers in the answer database of the knowledge base, and the word frequency sensitivity ranges over [0.5, 2]. If the number of candidate answers and the word frequency sensitivity are assumed to be linearly correlated, the specific linear relationship between them can be computed from these ranges, so that the word frequency sensitivity is set according to the number of candidate answers. The above rule is only an example; any rule conforming to this tuning logic is acceptable. In this way, dynamic adjustment of the word frequency sensitivity is achieved.
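The linear mapping exemplified above can be sketched as follows; the value ranges follow the text, while the function itself is an illustrative assumption:

```python
def dynamic_k1(num_candidates, total_answers, k1_min=0.5, k1_max=2.0):
    """Map the candidate count in [0, TN] linearly onto the k1 range
    [0.5, 2.0]: more candidates -> higher k1 -> stricter use of word
    frequency when re-ranking."""
    ratio = num_candidates / total_answers if total_answers else 0.0
    return k1_min + (k1_max - k1_min) * ratio

print(dynamic_k1(num_candidates=50, total_answers=100))  # midpoint of the range
```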
a greater length sensitivity indicates that the calculation of the similarity score is more sensitive to length; the length sensitivity is usually in the range of 0 to 1, and it is generally considered that the effect is optimal in the range of 0.3 to 0.9.
For a plurality of candidate answers, if their text lengths differ greatly, the sensitivity of the BM25 similarity score to text length should be reduced, so that overly long or overly short texts do not unduly affect the similarity calculation and a more reasonable result is obtained. The length sensitivity can therefore be adjusted dynamically according to the degree of dispersion of the text lengths: if the text lengths of the candidate answers are widely dispersed, the length sensitivity may be set to a low value; otherwise, it may be set to a higher value.
According to this tuning logic, a specific tuning rule is exemplified as follows: the degree of dispersion of the candidate answers' text lengths can be judged with a box plot or the variance. If the variance is large, or the box in the box plot is tall (i.e., the data are widely dispersed), the length sensitivity can be set within [0.3, 0.5], with 0.4 suggested; if the variance is medium, or the box height is medium (i.e., the distribution is normal), the length sensitivity can be set within [0.5, 0.7], with 0.6 suggested; if the variance is small, or the box is short (i.e., the data are concentrated), the length sensitivity can be set within [0.7, 0.9], with 0.8 suggested. The above rule is only an example; any rule conforming to this tuning logic is acceptable. In this way, dynamic adjustment of the length sensitivity is achieved.
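The variance-based rule for the length sensitivity can be sketched as below. The three b values follow the suggestions in the text, but the variance cutoffs separating "large", "medium", and "small" dispersion are illustrative assumptions that the text leaves to the implementer:

```python
import statistics

def dynamic_b(text_lengths, high_var=2000.0, low_var=200.0):
    """Pick the length sensitivity b from the dispersion of candidate
    text lengths: widely dispersed -> 0.4, normal -> 0.6, concentrated
    -> 0.8. The cutoffs high_var/low_var are illustrative assumptions."""
    var = statistics.pvariance(text_lengths)
    if var >= high_var:      # data widely dispersed
        return 0.4
    if var <= low_var:       # data concentrated
        return 0.8
    return 0.6               # medium dispersion

print(dynamic_b([100, 105, 95, 102]))   # tightly clustered lengths
```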
In one embodiment, the target similarity between the current candidate answer and the keyword is calculated by the following formula:

R = f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))

where R is the target similarity between the current candidate answer and the keyword, k1 is the word frequency sensitivity, b is the length sensitivity, dl is the text length of the current candidate answer, avgdl is the average text length over all candidate answers, and f is the frequency with which the keyword appears in the current candidate answer.
In the present embodiment, for a morpheme q_i used as a keyword, the similarity score R(q_i, d) between the morpheme q_i and the text block d (TextChunk) of the current candidate answer is defined by the formula:

R(q_i, d) = [f_i * (k1 + 1) / (f_i + K)] * [qf_i * (k2 + 1) / (qf_i + k2)], with K = k1 * (1 - b + b * dl / avgdl)

where k1 and k2 are adjusting factors, f_i is the frequency with which q_i occurs in the text block, and qf_i is the frequency with which q_i occurs in the question to be processed. In most cases qf_i = 1, so the formula can be reduced to:

R(q_i, d) = f_i * (k1 + 1) / (f_i + K)
In one embodiment, the determining the similarity between the current candidate answer and the question to be processed based on the target similarity includes: determining the weight of each keyword in the at least two keywords; and determining the similarity between the current candidate answer and the question to be processed based on the weight and the target similarity corresponding to each keyword.
For each candidate answer, the similarity score of each keyword with respect to the candidate answer is calculated; the similarity scores of all keywords with respect to the candidate answer are then weighted and summed to obtain the similarity score between the current candidate answer and the question to be processed.
In one embodiment, the similarity between the current candidate answer and the question to be processed is calculated by the following formula:

Score(Q, d) = Σ_{i=1}^{n} W_i * R_i

where Score(Q, d) is the similarity score between the candidate answer d and the question to be processed Q, n is the total number of keywords, i ranges over [1, n], W_i is the weight of the i-th keyword, and R_i is the target similarity corresponding to the i-th keyword.
In one embodiment, the determining the weight of each of the at least two keywords includes: for each keyword, determining the weight of the keyword based on the total number of answers in the answer database and the number of answers containing the keyword in the answer database.
The weight of morpheme q_i is given by the following formula:

W_i = log( (N - n(q_i) + 0.5) / (n(q_i) + 0.5) )

where N is the total number of answers in the answer database and n(q_i) is the number of answers in the answer database that contain morpheme q_i.
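Putting the weight and the weighted sum together, a sketch under the standard BM25-IDF reading of the weight formula above (names and example values are illustrative):

```python
import math

def keyword_weight(total_answers, answers_with_keyword):
    """BM25-style IDF weight from the answer-database statistics:
    rarer keywords get a higher weight."""
    n, ni = total_answers, answers_with_keyword
    return math.log((n - ni + 0.5) / (ni + 0.5))

def combined_similarity(weights, target_similarities):
    """Weighted sum of per-keyword target similarities, giving the
    overall question/answer similarity score."""
    return sum(w * r for w, r in zip(weights, target_similarities))

# a rare keyword (5/100 answers) vs. a common one (40/100 answers)
w = [keyword_weight(100, 5), keyword_weight(100, 40)]
r = [1.2, 0.8]   # per-keyword target similarities for one candidate answer
print(combined_similarity(w, r))
```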
In one embodiment, after the target answer is screened from the candidate answers, the method further includes: inputting the question to be processed and the target answer into a large language model for processing, and obtaining a final answer to the question to be processed.
The question to be processed and the target answer are input into a large language model (LLM) for processing, and the large language model outputs the final answer to the question to be processed.
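A hedged sketch of handing the question and target answers to the LLM; the prompt template and its wording are placeholders, not part of the patent:

```python
def build_prompt(question, target_answers):
    """Assemble a simple prompt from the question and the retrieved
    target answers; the template wording is an illustrative assumption."""
    context = "\n".join(f"- {a}" for a in target_answers)
    return (f"Answer the question using only the reference material.\n"
            f"References:\n{context}\n"
            f"Question: {question}\nAnswer:")

prompt = build_prompt("What is the oil change interval?",
                      ["Change oil every 10,000 km.", "Use 5W-30 oil."])
print(prompt)
```

The resulting string would be passed to whatever LLM interface is in use; the model's completion is then taken as the final answer.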
Referring to fig. 2, in a possible embodiment of the present application, the model training process is performed during answer recall for the question to be processed, and the scheme of the application is as follows:
Step 202, acquiring a question to be processed, and extracting keywords from the question to be processed;
Step 204, screening, from an answer database, candidate answers whose similarity with the question to be processed meets a preset similarity condition;
Step 206, inputting the candidate answers, their labeled similarities, and the keywords into an initial answer screening model, and adjusting the parameters of the initial answer screening model to obtain the answer screening model;
Step 208, inputting the keywords and the candidate answers into the answer screening model; for each candidate answer, calculating the target similarity between the current candidate answer and each of the at least two keywords, and determining the similarity between the current candidate answer and the question to be processed based on the target similarities; and screening the target answer from the candidate answers based on the similarity between each candidate answer and the question to be processed.
Specifically, referring to fig. 3, the method includes the steps of:
Step one, acquiring the file content of a knowledge base, and performing text segmentation on the file content to obtain a text database of the knowledge base;
The knowledge base file content refers to answer files, and the text database comprises each answer text;
Step two, performing vectorization processing on the text database of the knowledge base, and storing the result into the vector database of the knowledge base;
And carrying out vectorization processing on the answer texts, wherein the obtained vector database comprises vector expression forms of the answer texts.
Step three, acquiring a user question, extracting its keywords, and vectorizing it to obtain the keywords and a question vector;
The user question is the question to be processed.
Step four, setting a similarity score threshold and the number of candidate answers.
Step five, based on the question vector and the vector database, obtaining, by a vector similarity retrieval method, candidate answers whose similarity scores with the user question satisfy the similarity score threshold;
Step six, dynamically adjusting the parameters of the initial answer screening model based on the number of candidate answers and the text lengths to obtain the answer screening model.
Step seven, performing word segmentation and preprocessing on the candidate answers to obtain a first processed text.
Step eight, inputting the keywords and the first processed text into the trained answer screening model, and retrieving from the first processed text at least one matching text that ranks highest in similarity with the user question and satisfies the number of candidate answers, as a second candidate text;
The second candidate text corresponds to the target answer.
Step nine, inputting the user question and the second candidate text into a large language model, and outputting the answer to the question.
In summary, the application optimizes the answer retrieval process: in application scenarios based on LangChain and a large language model, an answer screening model is added to re-rank the candidate answers, implementing a multi-round recall retrieval method, obtaining more accurate answers, and improving answer retrieval efficiency and accuracy.
Dynamic parameter adjustment of the BM25 algorithm: the effectiveness of BM25 is affected by both the word frequency sensitivity and the length sensitivity, and these parameters must be tuned to obtain the best performance. The application dynamically adjusts the BM25 parameters according to the different characteristics of the vector similarity search results, so that the algorithm performs at its best in different scenarios and yields more accurate and reasonable text retrieval results.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the steps are not strictly limited in execution order and may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may comprise multiple sub-steps or stages, which need not be performed at the same moment but may be performed at different times; their order of execution is not necessarily sequential, and they may be performed in turn with, or alternately with, at least part of the other steps or sub-steps.
Based on the same inventive concept, an embodiment of the application further provides an answer screening apparatus for implementing the above answer screening method. The implementation of the apparatus is similar to that described for the method, so for specific limitations in the one or more apparatus embodiments provided below, reference may be made to the limitations of the answer screening method above, which are not repeated here.
In an exemplary embodiment, as shown in fig. 4, there is provided an answer screening apparatus 400, including:
An obtaining module 401, configured to obtain a question to be processed and extract keywords from the question to be processed;
A screening module 402, configured to screen, from an answer database, candidate answers whose similarity to the question to be processed satisfies a preset similarity condition;
A filtering module 403, configured to input the keywords and the candidate answers into the answer screening model; for each candidate answer, calculate the target similarity between the current candidate answer and each of the at least two keywords, and determine the similarity between the current candidate answer and the question to be processed based on the target similarities; and screen a target answer from the candidate answers based on the similarity between each candidate answer and the question to be processed.
In one embodiment, when calculating the target similarity between the current candidate answer and each of the at least two keywords, the filtering module 403 is specifically configured to:
determine, for each keyword, the target similarity between the current candidate answer and the keyword based on the frequency of the keyword in the current candidate answer, the word frequency sensitivity, and the length sensitivity;
where the word frequency sensitivity is related to the number of candidate answers, and the length sensitivity is related to the degree of dispersion of the text lengths of the candidate answers.
In one embodiment, the target similarity between the current candidate answer and the keyword is calculated by the following formula:

R = f · (k1 + 1) / (f + k1 · (1 − b + b · dl / avgdl))

where R is the target similarity between the current candidate answer and the keyword, f is the frequency with which the keyword appears in the current candidate answer, k1 is the word frequency sensitivity, b is the length sensitivity, dl is the text length of the current candidate answer, and avgdl is the average text length over all candidate answers.
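The per-keyword formula above matches the BM25 term-score component without the IDF factor (the IDF enters later as the keyword weight). A direct transcription, with the variable names taken from the formula:

```python
def target_similarity(f, k1, b, dl, avgdl):
    """R = f*(k1+1) / (f + k1*(1 - b + b*dl/avgdl)).

    f:     frequency of the keyword in the current candidate answer
    k1:    word frequency sensitivity
    b:     length sensitivity
    dl:    text length of the current candidate answer
    avgdl: average text length over all candidate answers
    """
    return f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
```

Note that with b = 0 length normalization disappears, and with f = 0 the score is 0, as expected from the formula.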
In one embodiment, when determining the similarity between the current candidate answer and the question to be processed based on the target similarity, the filtering module 403 is specifically configured to:
determine the weight of each of the at least two keywords;
and determine the similarity between the current candidate answer and the question to be processed based on the weight and the target similarity corresponding to each keyword.
In one embodiment, the similarity between the current candidate answer and the question to be processed is calculated by the following formula:

Score(Q, d) = Σᵢ Wᵢ · Rᵢ,  i = 1, …, n

where Score(Q, d) is the similarity score between the candidate answer and the question to be processed, n is the total number of keywords, i ranges over [1, n], Wᵢ is the weight of the i-th keyword, and Rᵢ is the target similarity corresponding to the i-th keyword.
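The weighted sum above reduces to a few lines of code; a minimal sketch:

```python
def question_answer_similarity(weights, target_sims):
    """Score(Q, d) = sum over i of W_i * R_i for the n extracted keywords.

    weights:     W_i, one weight per keyword
    target_sims: R_i, the per-keyword target similarities for one answer
    """
    assert len(weights) == len(target_sims), "one weight per keyword"
    return sum(w * r for w, r in zip(weights, target_sims))
```

Ranking the candidate answers by this score and keeping the top ones yields the target answers.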
In one embodiment, when determining the weight of each of the at least two keywords, the filtering module 403 is specifically configured to:
determine, for each keyword, the weight of the keyword based on the total number of answers in the answer database and the number of answers in the database that contain the keyword.
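The weight described here — a function of the total answer count and the count of answers containing the keyword — matches the classical inverse document frequency. A hedged sketch; the +0.5 smoothing is the usual BM25 convention, an assumption rather than a value stated by the application:

```python
import math

def keyword_weight(total_answers, answers_containing):
    """IDF-style weight: rarer keywords receive larger weights.

    Uses the standard BM25 IDF with 0.5 smoothing (assumed; the
    application only says the weight depends on these two counts).
    """
    return math.log((total_answers - answers_containing + 0.5)
                    / (answers_containing + 0.5) + 1)
```

A keyword appearing in 1 of 100 answers thus outweighs one appearing in 50 of 100, which is the intended behavior: distinctive query terms dominate the score.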
In one embodiment, the apparatus further comprises a processing module. After the filtering module 403 screens the target answer from the candidate answers,
the processing module is configured to input the question to be processed and the target answer into a large language model for processing, obtaining a final answer to the question to be processed.
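The final processing step combines the question with the screened target answer before handing both to the large language model; a sketch of the prompt assembly, where `call_llm` is a hypothetical stand-in for whatever LLM client is actually used and the prompt wording is an assumption:

```python
def build_prompt(question, target_answer):
    """Combine the question to be processed with the screened target
    answer so the model can ground its final answer in that context."""
    return (
        "Answer the question using only the reference material.\n"
        f"Reference: {target_answer}\n"
        f"Question: {question}\n"
        "Answer:"
    )

def final_answer(question, target_answer, call_llm):
    # call_llm is a placeholder for the real large-language-model client.
    return call_llm(build_prompt(question, target_answer))
```

Passing the client as a parameter keeps the sketch independent of any particular model API.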
The respective modules in the answer screening apparatus described above may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one exemplary embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in fig. 5. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface, the display unit, and the input device are connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program; the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal, where the wireless mode may be realized through WIFI, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements an answer screening method. The display unit of the computer device is used to form a visual display and may be a display screen, a projection device, or a virtual reality imaging device; the display screen may be a liquid crystal display or an electronic ink display. The input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like.
It will be appreciated by those skilled in the art that the structure shown in FIG. 5 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an exemplary embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor performing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are both information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to meet the related regulations.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may include the steps of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processor referred to in the embodiments provided herein may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, or a data processing logic unit based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them that involves no contradiction should be considered within the scope of this description.
The foregoing examples represent only a few embodiments of the application and are described in detail, but they should not be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the application, all of which fall within its protection scope. Accordingly, the scope of protection of the application should be determined by the appended claims.

Claims (11)

1. An answer screening method, the method comprising:
acquiring a question to be processed, and extracting keywords from the question to be processed;
screening, from an answer database, candidate answers whose similarity to the question to be processed satisfies a preset similarity condition;
inputting the keywords and the candidate answers into a trained answer screening model; for each candidate answer, calculating the target similarity between the current candidate answer and each keyword, and determining the similarity between the current candidate answer and the question to be processed based on the target similarity; and screening a target answer from the candidate answers based on the similarity between each candidate answer and the question to be processed.
2. The method of claim 1, wherein the calculating the target similarity between the current candidate answer and the keyword comprises:
determining, for each keyword, the target similarity between the current candidate answer and the keyword based on the frequency of the keyword in the current candidate answer, the word frequency sensitivity, and the length sensitivity;
wherein the word frequency sensitivity is related to the number of candidate answers, and the length sensitivity is related to the degree of dispersion of the text lengths of the candidate answers.
3. The method of claim 2, wherein the target similarity between the current candidate answer and the keyword is calculated by the following formula:

R = f · (k1 + 1) / (f + k1 · (1 − b + b · dl / avgdl))

wherein R is the target similarity between the current candidate answer and the keyword, f is the frequency with which the keyword appears in the current candidate answer, k1 is the word frequency sensitivity, b is the length sensitivity, dl is the text length of the current candidate answer, and avgdl is the average text length over all candidate answers.
4. The method of claim 1, wherein the determining the similarity between the current candidate answer and the question to be processed based on the target similarity comprises:
determining the weight of each of at least two keywords;
and determining the similarity between the current candidate answer and the question to be processed based on the weight and the target similarity corresponding to each keyword.
5. The method of claim 4, wherein the similarity between the current candidate answer and the question to be processed is calculated by the following formula:

Score(Q, d) = Σᵢ Wᵢ · Rᵢ,  i = 1, …, n

wherein Score(Q, d) is the similarity score between the candidate answer and the question to be processed, n is the total number of keywords, i ranges over [1, n], Wᵢ is the weight of the i-th keyword, and Rᵢ is the target similarity corresponding to the i-th keyword.
6. The method of claim 4, wherein the determining the weight of each of the at least two keywords comprises:
determining, for each keyword, the weight of the keyword based on the total number of answers in the answer database and the number of answers in the answer database that contain the keyword.
7. The method of claim 1, wherein after screening the target answer from the candidate answers, the method further comprises:
inputting the question to be processed and the target answer into a large language model for processing, to obtain a final answer to the question to be processed.
8. An answer screening apparatus, said apparatus comprising:
an acquisition module, configured to acquire a question to be processed and extract keywords from the question to be processed;
a screening module, configured to screen, from an answer database, candidate answers whose similarity to the question to be processed satisfies a preset similarity condition;
a filtering module, configured to input the keywords and the candidate answers into an answer screening model; for each candidate answer, calculate the target similarity between the current candidate answer and each of the at least two keywords, and determine the similarity between the current candidate answer and the question to be processed based on the target similarity; and screen a target answer from the candidate answers based on the similarity between each candidate answer and the question to be processed.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202410553114.2A 2024-05-07 Answer screening method, device, computer equipment and storage medium Pending CN118332008A (en)

Publications (1)

Publication Number Publication Date
CN118332008A true CN118332008A (en) 2024-07-12

Similar Documents

Publication Publication Date Title
US20200356729A1 (en) Generation of text from structured data
CN112052387B (en) Content recommendation method, device and computer readable storage medium
CN113127632A (en) Text summarization method and device based on heterogeneous graph, storage medium and terminal
CN110765286A (en) Cross-media retrieval method and device, computer equipment and storage medium
CN111159431A (en) Knowledge graph-based information visualization method, device, equipment and storage medium
CN115795000A (en) Joint similarity algorithm comparison-based enclosure identification method and device
CN110737811A (en) Application classification method and device and related equipment
CN112131261A (en) Community query method and device based on community network and computer equipment
JP4143234B2 (en) Document classification apparatus, document classification method, and storage medium
CN117435685A (en) Document retrieval method, document retrieval device, computer equipment, storage medium and product
CN111950265A (en) Domain lexicon construction method and device
CN108229572B (en) Parameter optimization method and computing equipment
CN114547257B (en) Class matching method and device, computer equipment and storage medium
CN118332008A (en) Answer screening method, device, computer equipment and storage medium
CN114741489A (en) Document retrieval method, document retrieval device, storage medium and electronic equipment
CN116882408B (en) Construction method and device of transformer graph model, computer equipment and storage medium
CN115455306B (en) Push model training method, information push device and storage medium
CN113283235B (en) User label prediction method and system
US20160378857A1 (en) Object classification device and non-transitory computer readable medium
CN116361507A (en) Video retrieval model construction method, device, equipment and storage medium
CN117648427A (en) Data query method, device, computer equipment and storage medium
CN117493493A (en) Keyword definition method, keyword definition device, computer equipment and storage medium
CN116975405A (en) Search word processing method, apparatus, device, storage medium and program product
CN118170905A (en) Method, device, equipment, storage medium and product for constructing article knowledge base
CN116910604A (en) User classification method, apparatus, computer device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication