CN117056494B - Open domain question and answer method, device, electronic equipment and computer storage medium - Google Patents

Info

Publication number
CN117056494B
CN117056494B (application CN202311265498.XA)
Authority
CN
China
Prior art keywords
question
reference content
vector
answer
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311265498.XA
Other languages
Chinese (zh)
Other versions
CN117056494A (en
Inventor
陈永锐
蒋海云
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202311265498.XA priority Critical patent/CN117056494B/en
Publication of CN117056494A publication Critical patent/CN117056494A/en
Application granted granted Critical
Publication of CN117056494B publication Critical patent/CN117056494B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an open domain question answering method, device, electronic equipment and computer storage medium, relates to the technical field of artificial intelligence, and can be applied to vehicle-mounted scenarios. The method comprises the following steps: acquiring a first question, and acquiring a first output result of a large language model (LLM) according to the first question, wherein the first output result comprises an answer to the first question and reference content for the answer to the first question, and the reference content comprises an explanation of the answer to the first question and/or a reasoning step of the answer to the first question; determining at least one target candidate evidence of the first question according to the reference content of the first question and a corpus; and acquiring a second question and acquiring a second output result of the large language model LLM according to the second question, wherein the second question comprises the first question and the at least one target candidate evidence. The method improves the accuracy of open domain question answering and further improves user satisfaction.

Description

Open domain question and answer method, device, electronic equipment and computer storage medium
Technical Field
The present application relates to the field of artificial intelligence (Artificial Intelligence, AI), and in particular, to methods, apparatus, electronic devices, and computer storage media for open domain question-answering.
Background
Open-domain question answering (Open-Domain Question Answering, OpenQA) is an important task in natural language processing (Natural Language Processing, NLP) that aims to answer natural language questions based on large-scale unstructured documents. Currently, open-domain question answering is primarily based on relevance retrieval of the input natural language question in a corpus, i.e., retrieving entities and evidence related to the natural language question to assist the model in answering it. Existing approaches are still limited because of possible differences in expression between natural language questions and evidence in the corpus.
Recently, with the advent of large language models (Large Language Model, LLM), how to improve OpenQA using LLMs has become a research hotspot. An LLM is a neural network model based on deep learning that, by pre-training on a large amount of text data, can achieve excellent performance on a variety of natural language processing tasks. However, LLMs also suffer from hallucinations, so the open-domain question-answering effect is still not ideal.
Therefore, how to obtain a better question-answering effect in the open domain is an urgent problem to be solved.
Disclosure of Invention
The embodiments of the application provide a method, a device, electronic equipment and a computer storage medium for open domain question answering, which improve the accuracy of open domain question answering and further improve user satisfaction.
In a first aspect, an embodiment of the present application provides a method for open domain question answering, including:
acquiring a first question, and acquiring a first output result of a large language model LLM according to the first question,
wherein the first output result comprises an answer to the first question and reference content for the answer to the first question, and the reference content comprises an explanation of the answer to the first question and/or a reasoning step of the answer to the first question;
determining at least one target candidate evidence of the first question according to the reference content of the first question and a corpus;
acquiring a second question, and acquiring a second output result of the large language model LLM according to the second question,
wherein the second question includes the first question and the at least one target candidate evidence.
In a second aspect, an embodiment of the present application provides an apparatus for open domain question answering, including:
an acquisition unit, configured to acquire a first question and acquire a first output result of the large language model LLM according to the first question,
wherein the first output result comprises an answer to the first question and reference content for the answer to the first question, and the reference content comprises an explanation of the answer to the first question and/or a reasoning step of the answer to the first question;
a processing unit, configured to determine at least one target candidate evidence of the first question according to the reference content of the first question and a corpus;
the acquisition unit being further configured to acquire a second question and acquire a second output result of the large language model LLM according to the second question,
wherein the second question includes the first question and the at least one target candidate evidence.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a processor adapted to implement computer instructions; and
a memory storing computer instructions adapted to be loaded by the processor to perform the method of the first aspect described above.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing computer instructions that, when read and executed by a processor of a computer device, cause the computer device to perform the method of the first aspect described above.
In a fifth aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method of the first aspect described above.
According to the technical scheme, the terminal equipment first acquires a first output result of the large language model LLM according to the first question, wherein the first output result comprises an answer to the first question and reference content, and the reference content comprises an explanation of the answer to the first question and/or a reasoning step of the answer to the first question. The reference content is relevant to the question, the reference content generated by the LLM is more detailed, and the reference content is in statement form, basically consistent with the evidence in the corpus, which facilitates semantic retrieval. The terminal equipment then determines in the corpus at least one candidate evidence with high similarity to the reference content, queries the LLM again according to the at least one candidate evidence and the question, and acquires a second output result of the large language model LLM, so that the query accuracy is improved and user satisfaction is further improved.
Drawings
FIG. 1 is an alternative schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a method for open domain question-answering provided in an embodiment of the present application;
FIG. 3 is a schematic block diagram of an apparatus of an embodiment of the present application;
fig. 4 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
The present application relates to the field of AI, which is a theory, method, technique and application system that utilizes a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain an optimal result. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
Artificial intelligence technology is a comprehensive subject that relates to a wide range of fields, involving both hardware-level and software-level technologies. Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly comprise computer vision, speech processing, natural language processing, machine learning/deep learning, and other directions.
The scheme provided by the embodiments of the application relates to artificial intelligence technologies such as natural language processing. Natural language processing (Natural Language Processing, NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, i.e., the language people use daily, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graph techniques, and the like.
For a clearer understanding of embodiments of the present application, the following briefly describes artificial intelligence techniques that are referred to in the present application.
Open-domain question answering (Open-Domain Question Answering, OpenQA), an artificial intelligence technique, aims to enable a machine to answer natural language questions of any form, not just predefined questions or questions of a particular domain.
A large language model (Large Language Model, LLM), an artificial intelligence technique, uses deep learning models to learn language rules and semantic information from a large amount of text data in order to generate natural language text. LLM technology typically involves large neural network models with billions or even hundreds of billions of parameters, hence the name "large language model".
A prompt (Prompt), in natural language generation or text completion tasks, refers to a piece of text or a question entered into a model, used to guide the model to generate text meeting the context and task requirements. The prompt may be a complete sentence or paragraph, or a short question or keyword.
Named entity recognition (Named Entity Recognition, NER) aims to identify entities of interest in text, such as locations, organizations, and times. The identified entities may be used in various downstream applications, such as identification and information extraction systems based on patient records, or as features of machine learning systems for other natural language processing tasks.
Fig. 1 is an alternative schematic diagram of a system architecture 100 according to an embodiment of the present application. As shown in fig. 1, the system 100 includes a terminal device 110 and a user 120. The user 120 may access the internet through the terminal device 110. For example, the user 120 can submit questions through the software ChatGPT (Chat Generative Pre-trained Transformer) on the terminal device 110, where ChatGPT is a large language model. ChatGPT is a natural language processing tool driven by artificial intelligence technology that can carry out dialogue by understanding and learning human language, can interact according to the chat context, can genuinely chat and communicate like a human, and can even complete tasks such as writing e-mails, video scripts, copywriting, translations, code, and papers.
OpenQA is an important task in natural language processing (Natural Language Processing, NLP) that aims to answer questions in natural language based on large scale unstructured documents. Currently, open-domain questions and answers are often primarily based on relevance retrieval of an input natural language question in a corpus, i.e., retrieving entities and evidence related to the natural language question, to assist the model in answering the question.
Conventional relevance retrieval generally obtains evidence only by using entity-related information in the question, or by retrieving from the corpus according to the question's semantics. Although this approach may narrow the scope of the model's answer retrieval to some extent, because the form of a natural language question usually differs from its presentation in the corpus, the results of this manner of retrieval may be poor when the question contains multi-hop reasoning or more complex constraints, and the user experience is low.
Recently, with the advent of large language models (Large Language Model, LLM), how to improve OpenQA using LLMs has become a research hotspot. An LLM is a neural network model based on deep learning; by pre-training on a large amount of text data, LLMs have shown strong semantic understanding and knowledge storage capabilities, and can achieve excellent performance on various natural language processing tasks. For a natural language question, an LLM can give not only a refined answer, but also an explanation of the answer, or detailed reasoning steps, given an appropriate prompt. By analyzing these steps and explanations, it can be determined whether the question was answered correctly by the LLM, and the cause of any error can be interpreted. However, LLMs also suffer from hallucinations, so the open-domain question-answering effect is still not ideal.
Therefore, the application provides an open domain question-answering method, which comprises: inputting a question into an LLM, the LLM outputting an answer to the question together with an explanation of the answer or reasoning steps for it; determining candidate evidence in a corpus with high similarity to the explanation or reasoning steps; and inputting the candidate evidence and the question into the LLM again to obtain a query result. Compared with the question itself, the reasoning steps or explanation generated by the LLM are more detailed, and they are also in statement form, basically consistent with the evidence in the corpus, which facilitates semantic retrieval. Determining candidate evidence in the corpus with high similarity to the explanation or reasoning steps and querying the LLM again with the candidate evidence and the question improves the query accuracy and further improves user satisfaction.
The following describes a scheme provided in the embodiments of the present application with reference to the accompanying drawings.
Fig. 2 is a schematic flowchart of a method 200 for open domain question-answering according to an embodiment of the present application. The method 200 may be performed by any electronic device having data processing capabilities. For example, the electronic device may be implemented as a server or a computer, and the electronic device may also be an in-vehicle terminal. The following description will take an example in which the electronic device is a terminal device. As shown in fig. 2, method 200 may include steps 210 through 230.
S210, the terminal equipment acquires a first question and acquires a first output result of a large language model LLM according to the first question, wherein the first output result comprises an answer to the first question and reference content for the answer to the first question, and the reference content comprises an explanation of the answer to the first question and/or a reasoning step of the answer to the first question.
S220, the terminal equipment determines at least one target candidate evidence of the first question according to the reference content and the corpus.
S230, the terminal equipment acquires a second question and acquires a second output result of the large language model LLM according to the second question, wherein the second question comprises the first question and the at least one target candidate evidence.
In the method 200, the terminal device first obtains a first output result of the large language model LLM according to the first question, wherein the first output result comprises an answer to the first question and reference content, and the reference content comprises an explanation of the answer to the first question and/or a reasoning step of the answer to the first question. The reference content is relevant to the first question, the reference content generated by the LLM is more detailed, and the reference content also adopts statement form, basically consistent with the evidence in the corpus, which is more conducive to semantic retrieval. The terminal device then determines in the corpus at least one target candidate evidence with high similarity to the reference content, queries the LLM again according to the at least one candidate evidence and the question, and obtains a second output result of the large language model LLM, thereby improving the query accuracy and further improving user satisfaction.
For a clearer understanding of the present embodiments, the method 200 is described in the following substeps.
Optionally, in step S210, the terminal device acquires a first question and acquires a first output result of the large language model LLM according to the first question, including:
the terminal equipment receives a first question input by a user through a keyboard, and acquires a first output result of the large language model LLM according to the first question; or,
the terminal equipment acquires a first question input by a user through voice recognition, and acquires a first output result of the large language model LLM according to the first question.
Specifically, the user inputs the first question into a large language model LLM installed on the terminal device through a keyboard or a voice function of the terminal device, and obtains a first output result of the LLM.
Optionally, the first question includes indication information, where the indication information is used to indicate that the first output result output by the LLM model includes reference content of an answer to the first question.
Specifically, when the indication information indicates generating an explanation of the answer to the first question, the reference content included in the first output result is the explanation of the answer to the first question;
when the indication information indicates generating a reasoning step of the answer to the first question, the reference content included in the first output result is the reasoning step of the answer to the first question;
when the indication information indicates generating both an explanation of the answer to the first question and a reasoning step of the answer to the first question, the reference content included in the first output result comprises the explanation of the answer to the first question and the reasoning step of the answer to the first question.
The reference content is the explanation of the answer to the first question and/or the reasoning step of the answer to the first question; the reasoning steps or explanation also adopt statement form, basically consistent with the evidence in the corpus, which is more conducive to semantic retrieval. The terminal equipment then determines in the corpus at least one candidate evidence with high similarity to the explanation or reasoning step, queries the LLM again according to the at least one candidate evidence and the question, and obtains a second output result of the large language model LLM, thereby improving the query accuracy and further improving user satisfaction.
For example, the first question may be in the form of a prompt for generating an explanation, as follows:
{Question}\nYour reply should be of the format "answer: x\nexplanation: y",\nx is the most concise answer and needs to be kept as short as possible, \ny is the explanation of x.
For another example, the first question may also be in the form of a prompt for generating reasoning steps, as follows:
{Question}\n Your reply should be of the format "answer: x\nsteps:\ny1\ny2\n...\nyn\n".\nx is the most concise answer and needs to be kept as short as possible, \nyn is the n-th step that only states the facts obtained and does not describe the process.
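As an illustrative sketch (not part of the claimed embodiment), the explanation-style prompt template above can be filled in and the LLM's structured reply parsed as follows; the function names and the parsing regex are assumptions:

```python
import re

# Explanation-style prompt template from the embodiment; "{question}" is filled in.
ANSWER_EXPLANATION_TEMPLATE = (
    '{question}\nYour reply should be of the format "answer: x\\nexplanation: y".\n'
    "x is the most concise answer and needs to be kept as short as possible, "
    "y is the explanation of x."
)

def build_first_question(question: str) -> str:
    """Fill the explanation-style prompt template with a user question."""
    return ANSWER_EXPLANATION_TEMPLATE.format(question=question)

def parse_first_output(reply: str):
    """Split an LLM reply of the form 'answer: x\\nexplanation: y' into its parts."""
    match = re.search(
        r"answer:\s*(?P<answer>.*?)\s*\nexplanation:\s*(?P<explanation>.*)",
        reply, flags=re.DOTALL | re.IGNORECASE,
    )
    if match is None:
        return None, None
    return match.group("answer").strip(), match.group("explanation").strip()
```

A reply that does not follow the requested format returns `(None, None)`, which a caller could treat as a retry signal.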
Optionally, in step S220, the terminal device determines at least one target candidate evidence of the first question according to the reference content and the corpus, including:
the terminal equipment encodes the ith reference content in the reference content according to the first encoder to obtain a vector of the ith reference content, wherein i ≥ 1;
the terminal equipment determines the semantic similarity between the vector of the ith reference content and the vector of each candidate evidence in the corpus;
and the terminal equipment determines the candidate evidence whose semantic similarity meets the preset condition as the at least one target candidate evidence of the first question.
Specifically, after the terminal device obtains the vector of the ith reference content through the first encoder, the terminal device may pass the vector of the ith reference content together with the corpus D to a retriever R as input, which returns a smaller filtered text set C_i, C_i ⊆ D, i.e., the at least one candidate evidence.
For example, the retriever R is a function that takes the vector corresponding to a reasoning step s_i generated by the LLM and the corpus D as input, and returns a smaller filtered text set C_i, where C_i ⊆ D.
Optionally, the corpus is Wikipedia.
Optionally, the first encoder is a simple comparative sentence vector representation network (simple contrastive sentence embedding framework, simCSE).
In particular, the reference content is in the form of a natural language representation and needs to be encoded into a vector that can be processed by the terminal device; this process can be implemented by the first encoder. The reference content is encoded into such a vector using the SimCSE model as the first encoder. The SimCSE model maps any text passage to a d-dimensional real-valued vector.
For example, each reasoning step s_i can be encoded using the SimCSE model to obtain its vector representation v_i, as shown in formula (1):

v_i = E_s(s_i) (1)

where E_s is the encoder of the SimCSE model.
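As a toy illustration of the encoding step in formula (1), the deterministic hashed bag-of-words encoder below stands in for the SimCSE encoder E_s (the actual SimCSE encoder is a pretrained transformer producing dense sentence embeddings; everything here is an illustrative assumption):

```python
import hashlib
import math

def toy_encode(text: str, dim: int = 16) -> list:
    """Stand-in for E_s in formula (1): hash each token into a fixed-size
    bag-of-words vector, then L2-normalize so cosine similarity reduces
    to a dot product (as with normalized SimCSE embeddings)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]
```

Because the hash is deterministic, the same reasoning step always maps to the same vector, which is the property the retrieval index relies on.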
Optionally, the SimCSE model can also build an index of all paragraphs included in the reference content for retrieval.
Optionally, before the terminal device encodes the ith reference content in the reference contents according to the first encoder to obtain the vector of the ith reference content, the method further includes:
a [CLS] token is prepended to the reference content as a token representing the semantics of the whole reference content, so that the vector corresponding to [CLS] is finally used as the vector of the reference content.
[CLS] is an abbreviation of classification, and the output vector corresponding to [CLS] is a semantic feature vector that can represent the entire text. A [CLS] symbol is inserted before the text and the output vector corresponding to this symbol is used as the semantic representation of the text, because a symbol without obvious semantic information fuses the semantic information of each word in the text more "fairly" than the words already in the text.
Optionally, before the terminal device encodes the ith reference content in the reference contents according to the first encoder to obtain the vector of the ith reference content, the method further includes:
the terminal equipment identifies an entity included in the ith reference content;
the terminal equipment masks the entity included in the ith reference content through a mask to obtain updated ith reference content;
the terminal device encodes the ith reference content in the reference content according to a first encoder to obtain a vector of the ith reference content, and the method comprises the following steps:
the terminal equipment encodes the updated ith reference content according to the first encoder to obtain a vector of the ith reference content.
In particular, to prevent erroneous entities in the explanation or reasoning steps generated by the LLM from negatively affecting retrieval, NER is performed on the explanation or reasoning steps, and the entities involved are masked with a MASK token.
For example, for the reasoning step "David Lee Stenstrom portrayed Waldo the inhibitor", the entities David Lee Stenstrom and Waldo therein are converted to masks, resulting in the new reasoning step "[MASK] portrayed [MASK] the inhibitor".
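The masking step above can be sketched as follows; the function name is illustrative, and the entity spans are assumed to come from an upstream NER system (not implemented here):

```python
import re

def mask_entities(step: str, entities: list, mask_token: str = "[MASK]") -> str:
    """Replace each recognized entity span in a reasoning step with [MASK],
    so retrieval is not misled by entities the LLM may have hallucinated.
    Longer entities are replaced first so nested names are handled whole."""
    masked = step
    for entity in sorted(entities, key=len, reverse=True):
        masked = re.sub(re.escape(entity), mask_token, masked)
    return masked
```

Applied to the example in the text, the two entity spans become [MASK] while the rest of the step is preserved for semantic retrieval.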
Optionally, the determining the semantic similarity of the vector of the ith reference content to the vector of each candidate evidence in the corpus includes:
calculating the cosine similarity between the vector of the ith reference content and the vector of each candidate evidence, thereby obtaining the semantic similarity between the vector of the ith reference content and the vector of each candidate evidence in the corpus.
While more expressive models exist for measuring the similarity between the explanation/reasoning steps of the answer and each candidate evidence, such as networks with multiple layers of cross-attention, retrieval requires the similarity function to be decomposable, and most decomposable similarity functions are some transformation of the Euclidean distance. A simpler cosine function is therefore chosen in this application to calculate the cosine similarity between the vector of the ith reference content and the vector of each candidate evidence.
Optionally, the determining the semantic similarity of the vector of the ith reference content to the vector of each candidate evidence in the corpus includes:
calculating the distance between the vector of the ith reference content and the vector of each candidate evidence, thereby obtaining the semantic similarity between the vector of the ith reference content and the vector of each candidate evidence in the corpus.
Optionally, the preset condition includes: the semantic similarity is greater than or equal to a preset threshold; alternatively, the preset condition includes: the semantic similarities are sorted in descending order, and those ranked in the top k are selected.
Specifically, if the preset threshold is 0.9, candidate evidence with semantic similarity greater than or equal to 0.9 may be determined as the target candidate evidence.
In another example, k is set, the plurality of candidate evidences are sorted in descending order of semantic similarity, and the top-k candidate evidences are selected as the target candidate evidence, as shown in formula (2):

E_i = top-k_j sim(s_i, p_j) (2)

where s_i represents the ith reference content, p_j represents the jth candidate evidence, and E_i is the target candidate evidence.
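The two preset conditions (similarity threshold, and top-k selection as in formula (2)) over cosine similarity can be sketched as follows; the function names and the unified interface are illustrative assumptions:

```python
import math

def cosine(u, v) -> float:
    """Cosine similarity between two vectors; 0.0 for a zero-norm vector."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_target_evidence(ref_vec, evidence_vecs, k=None, threshold=None):
    """Return indices of candidate evidences whose cosine similarity to the
    reference-content vector meets the preset condition: either all above a
    fixed threshold, or the top-k by similarity (formula (2))."""
    sims = [(j, cosine(ref_vec, v)) for j, v in enumerate(evidence_vecs)]
    if threshold is not None:
        return [j for j, s in sims if s >= threshold]
    sims.sort(key=lambda t: t[1], reverse=True)
    return [j for j, _ in sims[:k]]
```

With the threshold condition the number of target evidences varies per question, while top-k fixes it, which matters for keeping the second question's context within the LLM's input limit.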
Optionally, the method further comprises:
fragmenting the corpus to obtain a plurality of candidate evidences;
and encoding each candidate evidence according to the second encoder to obtain a vector of each candidate evidence.
Optionally, the second encoder is SimCSE.
Specifically, the corpus is segmented into passages of length n words; each candidate evidence p_j is represented as a vector u_j, as shown in formula (3):

u_j = E_p(p_j) (3)

where E_p is the second encoder.
Optionally, the encoder SimCSE is applied to all paragraphs of the corpus to obtain vectors for all paragraphs, and index information for each paragraph is built offline using Facebook's open-source similarity search library (Facebook AI Similarity Search, FAISS). FAISS is a highly efficient open-source library for similarity search and clustering of dense vectors, which can easily scale to billions of vectors.
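As an illustrative sketch, the brute-force inner-product index below stands in for a FAISS flat index over the paragraph vectors; with FAISS itself the equivalent calls would be `faiss.IndexFlatIP(d)`, `index.add(vectors)`, and `index.search(queries, k)` (the class and its behavior here are assumptions, not the embodiment's implementation):

```python
import numpy as np

class FlatIPIndex:
    """Minimal brute-force stand-in for faiss.IndexFlatIP: stores paragraph
    vectors and returns the k vectors with the highest inner product to a
    query (equal to cosine similarity when vectors are L2-normalized)."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, vectors: np.ndarray) -> None:
        """Append a batch of paragraph vectors to the index."""
        self.vectors = np.vstack([self.vectors, vectors.astype(np.float32)])

    def search(self, query: np.ndarray, k: int):
        """Return (scores, ids) of the top-k paragraphs by inner product."""
        scores = self.vectors @ query.astype(np.float32)
        order = np.argsort(-scores)[:k]
        return scores[order], order
```

Like the offline FAISS index described above, the vectors are added once and then searched many times, one query per masked reasoning step.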
Optionally, the method 200 further includes:
the training data is constructed and the data is stored,
determining a loss function of the first encoder and a loss function of the second encoder;
determining a minimum value of a loss function of the first encoder and a minimum value of a loss function of the second encoder according to the training data respectively;
optimizing the first encoder by a back propagation algorithm based on a minimum of a loss function of the first encoder;
the second encoder is optimized by a back propagation algorithm based on a minimum of the loss function of the second encoder.
Specifically, embodiments of the present application train the first encoder and the second encoder simultaneously. The goal of training the encoders is to learn a better embedding function such that the distance between a reasoning step and its relevant evidence paragraph is smaller (i.e., their similarity is higher), while the distance between unrelated question-paragraph pairs is larger.
Let $D = \{\langle r_i, p_i^{+}, p_{i,1}^{-}, \ldots, p_{i,m}^{-} \rangle\}_{i=1}^{N}$ be the training data comprising $N$ instances. Each instance comprises an inference step $r_i$, one relevant (positive) paragraph $p_i^{+}$, and $m$ irrelevant (negative) paragraphs $p_{i,j}^{-}$. The model is optimized by minimizing a loss function, which is the negative log-likelihood of the positive paragraph, as shown in formula (4):

$L(r_i, p_i^{+}, p_{i,1}^{-}, \ldots, p_{i,m}^{-}) = -\log \dfrac{e^{\mathrm{sim}(r_i, p_i^{+})}}{e^{\mathrm{sim}(r_i, p_i^{+})} + \sum_{j=1}^{m} e^{\mathrm{sim}(r_i, p_{i,j}^{-})}}$ (4)
the first encoder and the second encoder are optimized by a back propagation algorithm based on the minimum of the loss function.
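The loss of formula (4) for a single training instance can be computed directly from the similarity scores; the example values below are assumptions for illustration:

```python
import math

def nll_loss(sim_pos, sim_negs):
    """Negative log-likelihood of the positive paragraph (formula (4)):
    -log( e^{sim+} / (e^{sim+} + sum_j e^{sim-_j}) ), i.e. cross-entropy
    over one positive and m negative paragraphs."""
    denom = math.exp(sim_pos) + sum(math.exp(s) for s in sim_negs)
    return -math.log(math.exp(sim_pos) / denom)

# One training instance: similarity of the inference step to its positive
# paragraph (0.9) and to two negative paragraphs (0.1, 0.2).
loss = nll_loss(0.9, [0.1, 0.2])
print(round(loss, 4))
```

Raising the positive similarity (or lowering the negative similarities) drives this loss toward zero, which is exactly what the back-propagation step optimises.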
In step S230, the terminal device obtains a second question, and obtains a second output result of the large language model LLM according to the second question, where the second question includes the first question and the at least one target candidate evidence.
Specifically, after the at least one target candidate evidence $E_i$ is obtained, the context for LLM reasoning is available. $E_i$ is added before the question $q$ to compose the second question, and the LLM is queried again to obtain an answer.
The second question is in the form of:
Context: \n{} \nQuestion: {question}\nYour reply should be of the format "answer: x\nexplanation: y",\nx is the most concise answer and needs to be kept as short as possible, \ny is the explanation of x.
Optionally, the terminal device obtains the second question, and obtains a second output result of the large language model LLM according to the second question, including:

the terminal device obtains the second question input by the user through a keyboard, and obtains a second output result of the large language model LLM according to the second question; or,

the terminal device generates the second question according to the first question and the at least one target candidate evidence, and obtains a second output result of the large language model LLM according to the second question.
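Composing the second question from the first question and the retrieved evidence can be sketched as follows; the function name and the exact rendering of the template are assumptions:

```python
def build_second_question(question, target_evidence):
    """Compose the second question from the first question and the retrieved
    target candidate evidence, following the prompt template shown above."""
    context = "\n".join(target_evidence)
    return (
        f"Context: \n{context} \nQuestion: {question}\n"
        'Your reply should be of the format "answer: x\\nexplanation: y",\n'
        "x is the most concise answer and needs to be kept as short as possible, \n"
        "y is the explanation of x."
    )

prompt = build_second_question(
    "Who wrote Hamlet?",
    ["Hamlet is a tragedy written by William Shakespeare around 1600."],
)
print(prompt.splitlines()[0])
```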
In the example of the present application, the terminal device first uses the LLM to generate an answer and a corresponding reasoning process for each question. The reasoning steps are evidence generated by the LLM. Such evidence may contain hallucinations, such as erroneous entities and dates. To alleviate these hallucinations, each inference step is used to search the corpus for correct evidence: the SimCSE model retrieves evidence similar to the reasoning step from the corpus by dense retrieval. In particular, the goal of the dense retriever is to index all paragraphs in a low-dimensional continuous space so that the top-k paragraphs relevant to the input can be retrieved efficiently at run time. Finally, the retrieved top-k evidence is input as context, together with the original question, into the LLM for answering, thereby improving the question-answering effect.
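The overall two-round pipeline summarized above can be sketched with stub components; `stub_llm` and `stub_retrieve` are placeholders standing in for the real LLM and the dense retriever, and are assumptions for illustration only:

```python
def answer_with_retrieval(question, llm, retrieve, k=3):
    """Two-round pipeline: (1) ask the LLM for an answer plus reasoning
    steps, (2) retrieve top-k corpus evidence for each reasoning step,
    (3) re-ask the LLM with the retrieved evidence as context.
    `llm` and `retrieve` are caller-supplied callables (stubs here)."""
    answer, reasoning_steps = llm(question)          # first round
    evidence = []
    for step in reasoning_steps:
        evidence.extend(retrieve(step, k))           # dense retrieval per step
    context = "\n".join(dict.fromkeys(evidence))     # de-duplicate, keep order
    final_answer, _ = llm(f"Context: {context}\nQuestion: {question}")
    return final_answer

# Stub LLM and retriever for illustration only.
def stub_llm(prompt):
    return ("Shakespeare", ["Hamlet was written by Shakespeare."])

def stub_retrieve(step, k):
    return ["Hamlet is a tragedy by William Shakespeare."][:k]

print(answer_with_retrieval("Who wrote Hamlet?", stub_llm, stub_retrieve))
```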
The specific embodiments of the present application have been described in detail above with reference to the accompanying drawings, but the present application is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solutions of the present application within the scope of the technical concept of the present application, and all the simple modifications belong to the protection scope of the present application. For example, the specific features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations are not described in detail. As another example, any combination of the various embodiments of the present application may be made without departing from the spirit of the present application, which should also be considered as disclosed herein.
It should be further understood that, in the various method embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present application. It is to be understood that the numbers may be interchanged where appropriate such that the described embodiments of the application may be implemented in other sequences than those illustrated or described.
Method embodiments of the present application are described in detail above in connection with fig. 1-2, and apparatus embodiments of the present application are described in detail below in connection with fig. 3 and 4.
Fig. 3 is a schematic block diagram of an apparatus 300 according to an embodiment of the present application, where the apparatus 300 may implement the functions of the terminal device in the above method. As shown in fig. 3, the apparatus 300 may include an acquisition unit 310 and a processing unit 320.
An obtaining unit 310, configured to obtain a first question, and obtain a first output result of the large language model LLM according to the first question, where the first question includes a question and answer instruction information for the question, the first output result includes an answer to the question and reference content, and the reference content is generated according to the answer instruction information of the question.
The processing unit 320 is configured to determine at least one candidate evidence of the problem according to the reference content and the corpus.
The obtaining unit 310 is further configured to obtain a second question, and obtain a second output result of the large language model LLM according to the second question, where the second question includes the question and the at least one candidate evidence.
In one embodiment, the answer instruction information includes an interpretation of the question and/or an inference step of the question, and the reference content includes an interpretation of the answer and/or an inference step of the answer.
In one embodiment, the processing unit 320 is specifically configured to:
encoding the ith reference content in the reference content according to a first encoder to obtain a vector of the ith reference content, wherein i is greater than or equal to 1;
determining semantic similarity of the vector of the ith reference content and the vector of each candidate evidence in the corpus;
and determining the candidate evidence of which the semantic similarity meets the preset condition as at least one target candidate evidence of the problem.
In one embodiment, before the encoding of the ith reference content of the reference contents according to the first encoder, the processing unit 320 is specifically configured to further:
identifying an entity included in the ith reference content;
masking the entity included in the ith reference content by a mask to obtain updated ith reference content;
the encoding the ith reference content in the reference content according to the first encoder to obtain a vector of the ith reference content comprises:
and encoding the updated ith reference content according to the first encoder to obtain a vector of the ith reference content.
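The masking step described above can be sketched as follows; entity recognition itself is assumed to be performed upstream (e.g. by an NER model), so the entity list is passed in explicitly here:

```python
import re

def mask_entities(text, entities):
    """Replace each recognised entity in the reference content with a [MASK]
    token before encoding, so the encoder is not misled by hallucinated
    entities. The entity list is assumed to come from an upstream NER step."""
    for entity in entities:
        text = re.sub(re.escape(entity), "[MASK]", text)
    return text

print(mask_entities("Einstein was born in Ulm in 1879.",
                    ["Einstein", "Ulm", "1879"]))
# [MASK] was born in [MASK] in [MASK].
```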
In one embodiment, the processing unit 320 is specifically configured to:
And calculating cosine similarity of the vector of the ith reference content and the vector of each candidate evidence, and obtaining semantic similarity of the vector of the ith reference content and the vector of each candidate evidence in the corpus.
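Cosine similarity between the reference-content vector and a candidate-evidence vector can be computed as follows (a minimal sketch):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between the vector of the i-th reference content
    and the vector of a candidate evidence: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))            # 1.0
print(round(cosine_similarity([1.0, 0.0], [0.0, 1.0]), 4))  # 0.0
```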
In one embodiment, the processing unit 320 is specifically configured to:
and calculating the distance between the vector of the ith reference content and the vector of each candidate evidence, and obtaining the semantic similarity between the vector of the ith reference content and the vector of each candidate evidence in the corpus.
In one embodiment, the preset condition includes: the semantic similarity is greater than or equal to a preset threshold. Alternatively, the preset condition includes: the semantic similarities are sorted in descending order and the candidates ranked in the top k are selected.
In one embodiment, the processing unit 320 is specifically configured to:
fragmenting the corpus to obtain a plurality of candidate evidences;
and encoding each candidate evidence according to the second encoder to obtain a vector of each candidate evidence.
In one embodiment, the first encoder and the second encoder are the simple contrastive sentence embedding network SimCSE.
In one embodiment, the processing unit 320 is specifically configured to:
constructing training data;
determining a loss function of the first encoder and the second encoder;
determining a minimum value of the loss function based on the training data;
the first encoder and the second encoder are optimized by a back propagation algorithm based on the minimum of the loss function.
It should be understood that apparatus embodiments and method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the apparatus 300 for data processing in this embodiment may correspond to the execution body for executing the method 200 in the embodiments of the present application, and the foregoing and other operations and/or functions of each module in the apparatus 300 are respectively for implementing the corresponding flow of each method in fig. 2, which are not described herein for brevity.
The apparatus and system of embodiments of the present application are described above in terms of functional modules in connection with the accompanying drawings. It should be understood that the functional module may be implemented in hardware, or may be implemented by instructions in software, or may be implemented by a combination of hardware and software modules. Specifically, each step of the method embodiments in the embodiments of the present application may be implemented by an integrated logic circuit of hardware in a processor and/or an instruction in software form, and the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented as a hardware decoding processor or implemented by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in a well-established storage medium in the art such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, and the like. The storage medium is located in a memory, and the processor reads information in the memory, and in combination with hardware, performs the steps in the above method embodiments.
Fig. 4 is a schematic block diagram of an electronic device 400 provided in an embodiment of the present application.
As shown in fig. 4, the electronic device 400 may include:
a memory 410 and a processor 420, the memory 410 being adapted to store a computer program and to transfer the program code to the processor 420. In other words, the processor 420 may call and run a computer program from the memory 410 to implement the methods in embodiments of the present application.
For example, the processor 420 can be configured to execute the steps of the various execution bodies of the method 200 described above according to instructions in the computer program.
In some embodiments of the present application, the processor 420 may include, but is not limited to:
a general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
In some embodiments of the present application, the memory 410 includes, but is not limited to:
volatile memory and/or nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be random access memory (Random Access Memory, RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
In some embodiments of the present application, the computer program may be partitioned into one or more modules that are stored in the memory 410 and executed by the processor 420 to perform the methods provided herein. The one or more modules may be a series of computer program instruction segments capable of performing the specified functions, which are used to describe the execution of the computer program in the electronic device 400.
Optionally, the electronic device 400 may further include:
a communication interface 430, the communication interface 430 being connectable to the processor 420 or the memory 410.
The processor 420 may control the communication interface 430 to communicate with other devices, and in particular, may send information or data to other devices, or obtain information or data sent by other devices. By way of example, communication interface 430 may include a transmitter and a receiver. The communication interface 430 may further include antennas, the number of which may be one or more.
It should be appreciated that the various components in the electronic device 400 are connected by a bus system that includes a power bus, a control bus, and a status signal bus in addition to a data bus.
According to an aspect of the present application, there is provided a communication device comprising a processor and a memory for storing a computer program, the processor being adapted to invoke and run the computer program stored in the memory, such that the communication device performs the method of the above method embodiments.
According to an aspect of the present application, there is provided a computer storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. Alternatively, embodiments of the present application also provide a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of the method embodiments described above.
According to another aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the computer device to perform the method of the above-described method embodiments.
In other words, when implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line (digital subscriber line, DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) connection. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (digital video disc, DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
It should be understood that in the embodiments of the present application, "B corresponding to a" means that B is associated with a. In one implementation, B may be determined from a. It should also be understood that determining B from a does not mean determining B from a alone, but may also determine B from a and/or other information.
In the description of the present application, unless otherwise indicated, "at least one" means one or more, and "a plurality" means two or more. In addition, "and/or" describes an association relationship of the association object, and indicates that there may be three relationships, for example, a and/or B may indicate: a alone, a and B together, and B alone, wherein a, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
It should be further understood that the description of the first, second, etc. in the embodiments of the present application is for purposes of illustration and distinction only, and does not represent a specific limitation on the number of devices in the embodiments of the present application, and should not constitute any limitation on the embodiments of the present application.
It should also be appreciated that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It will be appreciated that in the specific embodiments of the present application, data related to user information and the like may be involved. When the above embodiments of the present application are applied to specific products or technologies, user approval or consent is required, and the collection, use and processing of relevant data is required to comply with relevant laws and regulations and standards of the relevant countries and regions.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus, device, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed in the present application, and such changes or substitutions are covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A method of open-domain question-answering for calibrating a first output result of a large language model, comprising:
acquiring a first problem, acquiring a first output result of a large language model according to the first problem,
wherein the first output result comprises a first answer to the first question and reference content of the first answer to the first question, and the reference content comprises an explanation of the first answer to the first question and/or a reasoning step of the first answer to the first question;
determining at least one target candidate evidence of the first problem according to the reference content of the first problem and a corpus;
acquiring a second problem, acquiring a second output result of the large language model according to the second problem,
Wherein the second question comprises the first question and the at least one target candidate evidence, and the second output result comprises a second answer to the first question;
the determining at least one target candidate evidence of the first question according to the reference content of the first question and the corpus comprises:
encoding the ith reference content in the reference contents according to a first encoder to obtain a vector of the ith reference content, wherein i is greater than or equal to 1;
determining semantic similarity of the vector of the ith reference content and the vector of each candidate evidence in the corpus;
determining candidate evidences with the semantic similarity meeting a preset condition as at least one target candidate evidence of the first question;
the method further comprises the steps of:
fragmenting the corpus to obtain a plurality of candidate evidences;
and encoding each candidate evidence according to the second encoder to obtain a vector of each candidate evidence.
2. The method of claim 1, wherein the first question includes indication information indicating that the first output result output by the large language model includes reference content of an answer to the first question.
3. The method of claim 1, wherein prior to said encoding an i-th reference content of said reference contents according to a first encoder, resulting in a vector of said i-th reference content, the method further comprises:
identifying an entity included in the ith reference content;
masking the entity included in the ith reference content through a mask to obtain updated ith reference content;
the coding the ith reference content in the reference contents according to the first coder to obtain a vector of the ith reference content, comprising:
and encoding the updated ith reference content according to the first encoder to obtain a vector of the ith reference content.
4. The method of claim 1, wherein the determining the semantic similarity of the vector of the ith reference content to the vector of each candidate evidence in the corpus comprises:
and calculating cosine similarity of the vector of the ith reference content and the vector of each candidate evidence, and obtaining semantic similarity of the vector of the ith reference content and the vector of each candidate evidence in the corpus.
5. The method of claim 1, wherein the determining the semantic similarity of the vector of the ith reference content to the vector of each candidate evidence in the corpus comprises:
And calculating the distance between the vector of the ith reference content and the vector of each candidate evidence, and obtaining the semantic similarity between the vector of the ith reference content and the vector of each candidate evidence in the corpus.
6. The method of claim 1, wherein the preset condition includes: the semantic similarity is greater than or equal to a preset threshold; alternatively, the preset condition includes: the semantic similarities are sorted in order and the candidates ranked in the top k are selected.
7. The method of claim 1, wherein the first encoder and the second encoder are the simple contrastive sentence embedding network SimCSE.
8. The method according to claim 1, wherein the method further comprises:
the training data is constructed and the data is stored,
determining a loss function of the first encoder and a loss function of the second encoder;
respectively determining the minimum value of the loss function of the first encoder and the minimum value of the loss function of the second encoder according to the training data;
optimizing the first encoder by a back propagation algorithm according to a minimum value of a loss function of the first encoder;
Optimizing the second encoder by a back propagation algorithm based on a minimum of a loss function of the second encoder.
10. An apparatus for open-domain question-answering, for calibrating a first output result of a large language model, comprising:
an acquisition unit configured to acquire a first question and acquire a first output result of a large language model according to the first question,
wherein the first output result comprises a first answer to the first question and reference content of the first answer to the first question, and the reference content comprises an explanation of the first answer to the first question and/or a reasoning step of the first answer to the first question;
a processing unit, configured to determine at least one target candidate evidence of the first problem according to the reference content of the first problem and a corpus;
an acquisition unit configured to acquire a second problem and acquire a second output result of the large language model according to the second problem,
wherein the second question comprises the first question and the at least one target candidate evidence, and the second output result comprises a second answer to the first question;
The processing unit is further configured to:
encoding the ith reference content in the reference contents according to a first encoder to obtain a vector of the ith reference content, wherein i is greater than or equal to 1;
determining semantic similarity of the vector of the ith reference content and the vector of each candidate evidence in the corpus;
determining candidate evidences with the semantic similarity meeting a preset condition as at least one target candidate evidence of the first question;
the processing unit is further configured to:
fragmenting the corpus to obtain a plurality of candidate evidences;
and encoding each candidate evidence according to the second encoder to obtain a vector of each candidate evidence.
10. An electronic device comprising a processor and a memory, the memory having instructions stored therein, which when executed by the processor, cause the processor to perform the method of any of claims 1-8.
11. A computer storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any of claims 1-8.
CN202311265498.XA 2023-09-28 2023-09-28 Open domain question and answer method, device, electronic equipment and computer storage medium Active CN117056494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311265498.XA CN117056494B (en) 2023-09-28 2023-09-28 Open domain question and answer method, device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311265498.XA CN117056494B (en) 2023-09-28 2023-09-28 Open domain question and answer method, device, electronic equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN117056494A CN117056494A (en) 2023-11-14
CN117056494B true CN117056494B (en) 2024-01-23

Family

ID=88657495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311265498.XA Active CN117056494B (en) 2023-09-28 2023-09-28 Open domain question and answer method, device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN117056494B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117453899B (en) * 2023-12-26 2024-03-29 浙江智港通科技有限公司 Intelligent dialogue system and method based on large model and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695591A (en) * 2020-04-26 2020-09-22 平安科技(深圳)有限公司 AI-based interview corpus classification method, device, computer equipment and medium
CN115186073A (en) * 2022-05-31 2022-10-14 浙江华巽科技有限公司 Open domain table text question-answering method based on hybrid retrieval
CN115329054A (en) * 2022-06-01 2022-11-11 赵慧雅 Open domain question-answering system for complexity problem
CN116561278A (en) * 2023-05-05 2023-08-08 科大讯飞股份有限公司 Knowledge question-answering method, device, equipment and storage medium
WO2023161630A1 (en) * 2022-02-22 2023-08-31 Unlikely Artificial Intelligence Limited Computer implemented methods for the automated analysis or use of data, including use of a large language model
CN116737895A (en) * 2023-06-01 2023-09-12 华为技术有限公司 Data processing method and related equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230274086A1 (en) * 2021-08-24 2023-08-31 Unlikely Artificial Intelligence Limited Computer implemented methods for the automated analysis or use of data, including use of a large language model


Also Published As

Publication number Publication date
CN117056494A (en) 2023-11-14

Similar Documents

Publication Publication Date Title
CN109033068B (en) Method and device for reading and understanding based on attention mechanism and electronic equipment
Lukovnikov et al. Neural network-based question answering over knowledge graphs on word and character level
CN108875074B (en) Answer selection method and device based on cross attention neural network and electronic equipment
CN107798140B (en) Dialog system construction method, semantic controlled response method and device
CN111930942B (en) Text classification method, language model training method, device and equipment
CN114565104A (en) Language model pre-training method, result recommendation method and related device
CN117056494B (en) Open domain question and answer method, device, electronic equipment and computer storage medium
CN111599340A (en) Polyphone pronunciation prediction method and device and computer readable storage medium
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
CN113255320A (en) Entity relation extraction method and device based on syntax tree and graph attention machine mechanism
CN111144093A (en) Intelligent text processing method and device, electronic equipment and storage medium
CN112464655A (en) Word vector representation method, device and medium combining Chinese characters and pinyin
Park et al. Natural language generation using dependency tree decoding for spoken dialog systems
CN116956835B (en) Document generation method based on pre-training language model
CN117648429A (en) Question-answering method and system based on multi-mode self-adaptive search type enhanced large model
CN115129826B (en) Electric power field model pre-training method, fine tuning method, device and equipment
CN112818688B (en) Text processing method, device, equipment and storage medium
Pratheek et al. Prediction of answer keywords using char-RNN
CN114547308A (en) Text processing method and device, electronic equipment and storage medium
CN114373443A (en) Speech synthesis method and apparatus, computing device, storage medium, and program product
CN117034942B (en) Named entity recognition method, device, equipment and readable storage medium
US11914635B2 (en) Performing image search based on user input using neural networks
CN114638231B (en) Entity linking method and device and electronic equipment
CN112183114B (en) Model training and semantic integrity recognition method and device
CN117910476A (en) Neural network interpretation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant