CN117891927A - Question and answer method and device based on large language model, electronic equipment and storage medium - Google Patents

Question and answer method and device based on large language model, electronic equipment and storage medium

Info

Publication number
CN117891927A
CN117891927A (application number CN202410295644.1A)
Authority
CN
China
Prior art keywords
information
text
question
language model
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410295644.1A
Other languages
Chinese (zh)
Inventor
张宇光
姚相振
胡影
李琳
任英杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Electronics Standardization Institute
Original Assignee
China Electronics Standardization Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Electronics Standardization Institute filed Critical China Electronics Standardization Institute
Priority to CN202410295644.1A priority Critical patent/CN117891927A/en
Publication of CN117891927A publication Critical patent/CN117891927A/en
Pending legal-status Critical Current

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a question-answering method and device based on a large language model, electronic equipment and a storage medium, belonging to the technical field of natural language processing. The question-answering method comprises the following steps: converting non-text information in the multimodal question information to be replied into corresponding second text information; respectively retrieving, from a target text library, a first text result matched with the first text information in the multimodal question information to be replied and a second text result matched with the second text information, the target text library being constructed based on the first text information and the second text information; and determining, by using a preset large language model, reply information corresponding to the multimodal question information to be replied based on the first text information, the first text result and the second text result. The invention can answer multimodal question information efficiently and accurately, requires no additional training cost for the large language model, and effectively alleviates the factual hallucination problem to which large language models are prone.

Description

Question and answer method and device based on large language model, electronic equipment and storage medium
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular, to a question answering method, device, electronic device, and storage medium based on a large language model.
Background
With the continuous development of artificial intelligence technology, large language models (Large Language Model, LLM) have been widely applied to intelligent question-answering scenarios by virtue of their strong learning and generalization capabilities, and can answer questions posed by users in a timely manner.
However, in the prior art, a large language model is usually trained only on natural language and does not cover other modalities such as pictures. As a result, when the input involves other modalities such as pictures, the answer content output by the large language model is often inaccurate, and the problem of factual hallucination easily occurs.
Disclosure of Invention
The invention provides a question-answering method and device, electronic equipment and a storage medium based on a large language model, which are used for overcoming the defects in the prior art that, when the input involves other modalities such as pictures, the answer content output by the large language model is often inaccurate and the problem of factual hallucination easily occurs.
The invention provides a question-answering method based on a large language model, which comprises the following steps:
converting non-text information in the multi-mode question information to be replied into corresponding second text information;
searching a first text result matched with the first text information in the multimodal questioning information to be replied and a second text result matched with the second text information from a target text library respectively; the target text library is constructed based on the first text information and the second text information;
and determining reply information corresponding to the multimodal questioning information to be replied based on the first text information, the first text result and the second text result by using a preset large language model.
According to the question-answering method based on the large language model provided by the invention, before the first text result matched with the first text information in the multi-mode question information to be answered and the second text result matched with the second text information are respectively retrieved from the target text library, the method further comprises:
respectively cutting the first text information and the second text information to obtain a plurality of first text paragraph information corresponding to the first text information and a plurality of second text paragraph information corresponding to the second text information, and importing each of the first text paragraph information and the second text paragraph information into a preset search library;
determining target index information based on the first text paragraph information and the second text paragraph information by using the preset search library;
and generating a target text library based on the preset search library and the target index information.
According to the question-answering method based on the large language model provided by the invention, the method for determining the target index information based on the first text paragraph information and the second text paragraph information by using the preset search library comprises the following steps:
word segmentation is carried out on the first text paragraph information and the second text paragraph information respectively by adopting a word segmentation tool in the preset search library, so that word segmentation data corresponding to each first text paragraph information and word segmentation data corresponding to each second text paragraph information are obtained;
and determining target index information based on the word segmentation data corresponding to each piece of first text paragraph information, the word segmentation data corresponding to each piece of second text paragraph information and the pre-stored document information in the preset search library.
According to the question-answering method based on the large language model provided by the invention, the answer information corresponding to the multi-mode question information to be answered is determined based on the first text information, the first text result and the second text result by utilizing the preset large language model, and the method comprises the following steps:
according to a preset sequence, information splicing is carried out on the first text information, the first text result, the second text result and preset prompt information, and model input information is obtained;
and inputting the model input information into the preset large language model to obtain the reply information corresponding to the multi-mode question information to be replied, which is output by the preset large language model.
The invention also provides a question answering device based on the large language model, which comprises:
the conversion module is used for converting the non-text information in the multi-mode question information to be replied into corresponding second text information;
the retrieval module is used for respectively retrieving a first text result matched with the first text information in the multimodal questioning information to be replied and a second text result matched with the second text information from a target text library; the target text library is constructed based on the first text information and the second text information;
and the processing module is used for determining reply information corresponding to the multi-mode question information to be replied based on the first text information, the first text result and the second text result by utilizing a preset large language model.
According to the invention, the question answering device based on the large language model further comprises:
the cutting module is used for respectively cutting the first text information and the second text information to obtain a plurality of first text paragraph information corresponding to the first text information and a plurality of second text paragraph information corresponding to the second text information, and importing each of the first text paragraph information and the second text paragraph information into a preset search library;
the construction module is used for determining target index information based on the first text paragraph information and the second text paragraph information by using the preset search library;
and the generation module is used for generating a target text library based on the preset search library and the target index information.
According to the question-answering device based on the large language model provided by the invention, the construction module is specifically used for:
word segmentation is carried out on the first text paragraph information and the second text paragraph information respectively by adopting a word segmentation tool in the preset search library, so that word segmentation data corresponding to each first text paragraph information and word segmentation data corresponding to each second text paragraph information are obtained;
and determining target index information based on the word segmentation data corresponding to each piece of first text paragraph information, the word segmentation data corresponding to each piece of second text paragraph information and the pre-stored document information in the preset search library.
According to the question-answering device based on the large language model provided by the invention, the processing module is specifically used for:
according to a preset sequence, information splicing is carried out on the first text information, the first text result, the second text result and preset prompt information, and model input information is obtained;
and inputting the model input information into the preset large language model to obtain the reply information corresponding to the multi-mode question information to be replied, which is output by the preset large language model.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the question-answering method based on the large language model when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a large language model based question-answering method as described in any one of the above.
The present invention also provides a computer program product comprising a computer program which when executed by a processor implements a large language model based question-answering method as described in any one of the above.
According to the question-answering method and device, electronic equipment and storage medium based on the large language model provided by the invention, the modality information of the non-text modality in the multimodal question information to be replied is converted into corresponding text information, and the various text information is placed into a search library to construct the target text library, so that the retrieval result corresponding to the text modality and the retrieval result corresponding to the non-text modality are retrieved from the target text library, thereby making use of the non-text modality information. The various retrieval results and the question text information are then input together into the prompt of the preset large language model as references, guiding the model to generate the reply information corresponding to the multimodal question information to be replied. In this way, multimodal question information can be answered efficiently and accurately, no additional training cost is required for the large language model, and the factual hallucination problem to which large language models are prone is effectively alleviated.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a question-answering method based on a large language model provided by the invention;
FIG. 2 is a schematic diagram of the structure of the question-answering device based on the large language model provided by the invention;
fig. 3 is a schematic diagram of an entity structure of an electronic device according to the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the invention, it should be noted that, unless explicitly stated and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
The method, apparatus, electronic device and storage medium for large language model based question answering of the present invention are described below with reference to fig. 1 to 3.
FIG. 1 is a flow chart of a question-answering method based on a large language model, which is provided by the invention, as shown in FIG. 1, and comprises the following steps:
step 110, converting the non-text information in the multi-mode question information to be replied to corresponding second text information;
step 120, respectively retrieving a first text result matched with the first text information in the multimodal questioning information to be replied and a second text result matched with the second text information from the target text library; the target text library is constructed based on the first text information and the second text information;
and 130, determining reply information corresponding to the multi-mode question information to be replied based on the first text information, the first text result and the second text result by utilizing a preset large language model.
Specifically, the multimodal question information to be replied described in the embodiment of the present invention refers to the multimodal question information input by a user, where multimodal means that a plurality of modalities appear in the task, for example a combination of a text modality, a picture modality, a video modality, and the like.
That is, in an embodiment of the present invention, a user may input question information in a combined form of a text modality, a picture modality, a video modality, and the like. For example, the multimodal question information to be replied may include picture modality information of a flower image and the text modality information "What color are the flowers in the picture?".
The first text information described in the embodiment of the present invention refers to the text modality information in the multimodal question information to be replied, that is, the text of the question input by the user, for example the text modality information "What color are the flowers in the picture?".
The second text information described in the embodiment of the invention refers to text information obtained after text conversion of non-text information in the multi-mode question information to be replied. For example, a cross-modality generation model ERNIE-ViLG may be employed to translate the input picture information into corresponding text information.
The target text library described in the embodiment of the invention can be constructed by adopting an open-source Elasticsearch (ES) search library as the knowledge base for retrieval-augmented generation (Retrieval Augmented Generation, RAG), so as to acquire knowledge segments relevant to the question information input by the user and guide the large language model to generate answers. Specifically, it can be constructed from the first text information and the second text information on the basis of the ES retrieval library.
The first text result described in the embodiment of the invention refers to a knowledge segment obtained by searching and recalling in a target text library according to the first text information.
The second text result described in the embodiment of the invention refers to a knowledge segment obtained by retrieving and recalling in the target text library according to the second text information.
The preset large language model described in the embodiment of the invention refers to a large language model preset for processing multimodal question information; specifically, a GPT-series large language model, such as GPT-4, can be adopted.
In the embodiment of the present invention, in step 110, an ERNIE-ViLG model may be used to convert non-text information in the multimodal questioning information to be replied to corresponding text information, so as to obtain second text information.
For example, suppose the non-text information in the multimodal question information to be replied is at least one piece of picture data, which can be expressed as $p_1, p_2, \ldots, p_k$. The picture data are input into the ERNIE-ViLG model one by one, which can be expressed as:
model input $= \{p_1, p_2, \ldots, p_k\}$
The ERNIE-ViLG model converts the input picture data into the corresponding text modality, which can be expressed as:
$c_1 = \mathrm{ERNIE\text{-}ViLG}(p_1),\; c_2 = \mathrm{ERNIE\text{-}ViLG}(p_2),\; \ldots,\; c_k = \mathrm{ERNIE\text{-}ViLG}(p_k)$
It will be appreciated that $c_1$ represents the second text information corresponding to $p_1$, $c_2$ represents the second text information corresponding to $p_2$, ..., and $c_k$ represents the second text information corresponding to $p_k$.
Further, in the embodiment of the present invention, in step 120, the second text information and the first text information in the multimodal question information to be replied may be put into the ES retrieval library, so as to construct a target text library for retrieval and recall. Furthermore, according to the specific matching requirements, query statements can be constructed, for example by keyword matching, and exact matching, fuzzy matching, Boolean queries and the like can be performed.
In an implementation, queries may be performed using the match statements provided by the ES search library. The constructed query statements are sent to the target text library, and the first text result matched with the first text information, which can be expressed as $r_1, r_2, \ldots, r_m$, and the second text result matched with the second text information, which can be expressed as $r'_1, r'_2, \ldots, r'_o$, can be respectively retrieved from the target text library, where m and o are determined according to the actual matching result.
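The retrieval step can be sketched with the official Elasticsearch Python client as follows; the index name, field name and result count are illustrative assumptions, not values fixed by the description.

```python
# Minimal sketch of step 120: recall knowledge segments from the target text library
# with ES match queries. The index name "target_text_library", the field name "content"
# and the result count are assumptions, not values fixed by the description.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def recall(query_text, size=3):
    """Run a match query and return the texts of the matched knowledge segments."""
    resp = es.search(
        index="target_text_library",
        query={"match": {"content": query_text}},
        size=size,
    )
    return [hit["_source"]["content"] for hit in resp["hits"]["hits"]]

first_text_results = recall("What color are the flowers in the picture?")  # r_1 ... r_m
second_text_results = recall("a red flower in a garden")                   # r'_1 ... r'_o
```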
Further, in the embodiment of the present invention, in step 130, the first text information in the multimodal question information to be replied, the retrieved first text result and the retrieved second text result are concatenated as text vectors, the spliced combined vector is then input into the preset large language model for natural language processing, and the reply information corresponding to the multimodal question information to be replied is finally output.
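Step 130 can likewise be sketched as splicing the question text, the two groups of retrieved results and a prompt template into one model input and sending it to the preset large language model. The prompt wording and the use of the OpenAI chat API with a GPT-4 model are assumptions for illustration; any comparable LLM interface would serve the same role.

```python
# Minimal sketch of step 130: splice the question text, the two groups of retrieval
# results and a prompt template, then ask the preset large language model for a reply.
# Assumptions: the prompt wording and the OpenAI chat API with the "gpt-4" model name
# are illustrative; any comparable LLM interface would serve the same role.
from openai import OpenAI

client = OpenAI()

def generate_reply(first_text, first_results, second_results):
    prompt = (
        "Answer the question using the reference passages below.\n"
        f"Question: {first_text}\n"
        "References matched to the question text:\n" + "\n".join(first_results) + "\n"
        "References matched to the picture content:\n" + "\n".join(second_results) + "\n"
        "Answer:"
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Continuing the retrieval sketch above:
reply = generate_reply("What color are the flowers in the picture?",
                       first_text_results, second_text_results)
```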
According to the question-answering method and device, electronic equipment and storage medium based on the large language model, the non-text modality information in the multimodal question information to be replied is converted into corresponding text information, and the various text information is placed into the search library to construct the target text library, so that the retrieval result corresponding to the text modality and the retrieval result corresponding to the non-text modality are retrieved from the target text library, thereby making use of the non-text modality information. The various retrieval results and the question text information are then input together into the prompt of the preset large language model as references, guiding the model to generate the reply information corresponding to the multimodal question information to be replied. In this way, multimodal question information can be answered efficiently and accurately, no additional training cost is required for the large language model, and the factual hallucination problem to which large language models are prone is effectively alleviated.
Based on the foregoing embodiment, as an optional embodiment, before retrieving, from the target text library, the first text result that matches the first text information in the multimodal challenge information to be replied to, and the second text result that matches the second text information, respectively, the method further includes:
respectively cutting the first text information and the second text information to obtain a plurality of first text paragraph information corresponding to the first text information and a plurality of second text paragraph information corresponding to the second text information, and importing each of the first text paragraph information and the second text paragraph information into a preset search library;
determining target index information based on the first text paragraph information and the second text paragraph information by using a preset search library;
and generating a target text library based on the preset search library and the target index information.
Specifically, the first text paragraph information described in the embodiment of the present invention refers to single text paragraph information obtained after paragraph level cutting is performed on the first text information.
Similarly, the second text paragraph information described in the embodiment of the present invention refers to single text paragraph information obtained after paragraph level cutting is performed on the second text information.
The preset search library described in the embodiment of the invention refers to a search library preset for retrieval-augmented generation; an ES search library or a Lucene search library can be adopted.
The target index information described in the embodiment of the invention refers to list information obtained after index construction is performed on each piece of first text paragraph information and each piece of second text paragraph information and the associated knowledge document in the preset search library.
In the embodiment of the invention, after the second text information and the first text information in the multimodal questioning information to be replied are acquired, a target text library needs to be constructed. In the specific embodiment, the processing manners of the first text information and the second text information are the same, and for convenience of description, an embodiment of processing the first text information is described, and an embodiment of processing the second text information may be implemented according to the processing manner of the first text information.
Specifically, the first text information is cut at the paragraph level, and a plurality of pieces of first text paragraph information corresponding to the first text information are obtained. Assuming that the first text information is expressed as text, the paragraph information $\{d_1, d_2, \ldots, d_n\}$ can be obtained after cutting, where $d_1, \ldots, d_n$ respectively represent the n pieces of first text paragraph information obtained by cutting, and n can be determined according to the actual cutting result.
Also, in the above manner, a plurality of pieces of second text paragraph information corresponding to the second text information can be obtained. Furthermore, the preset search library may be an ES search library, and the first text paragraph information and the second text paragraph information may be respectively imported into the ES search library.
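A minimal sketch of this construction step follows, assuming a blank-line paragraph cut and a whitespace-analyzed content field; the index name, mapping and cutting rule are assumptions for illustration.

```python
# Minimal sketch: cut text into paragraph-level pieces d_1..d_n and import them into an
# ES index whose "content" field uses the whitespace analyzer mentioned below.
# The index name, mapping and blank-line cutting rule are assumptions for illustration.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def cut_paragraphs(text):
    """Paragraph-level cutting: split on blank lines and drop empty pieces."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

es.indices.create(
    index="target_text_library",
    mappings={"properties": {"content": {"type": "text", "analyzer": "whitespace"}}},
)

def import_paragraphs(paragraphs):
    for i, para in enumerate(paragraphs):
        es.index(index="target_text_library", id=i, document={"content": para})

first_text_information = "What color are the flowers in the picture?"
second_text_information = "a red flower in a garden"  # caption of the picture modality
import_paragraphs(cut_paragraphs(first_text_information)
                  + cut_paragraphs(second_text_information))
```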
Further, the ES search library is utilized, and the first text paragraph information and the second text paragraph information are processed to determine target index information.
Based on the content of the foregoing embodiment, as an optional embodiment, determining, using a preset search library, target index information based on the respective first text paragraph information and second text paragraph information includes:
word segmentation is carried out on the first text paragraph information and the second text paragraph information respectively by adopting a word segmentation tool in a preset search library, so as to obtain word segmentation data corresponding to each first text paragraph information and word segmentation data corresponding to each second text paragraph information;
and determining target index information based on the word segmentation data corresponding to each first text paragraph information, the word segmentation data corresponding to each second text paragraph information and pre-stored document information in a preset search library.
Specifically, in the embodiment of the present invention, the preset search library may employ an ES search library. The whitespace tokenizer (Whitespace Tokenizer) in the ES search library is adopted to segment each piece of first text paragraph information, so that the word segmentation data corresponding to each piece of first text paragraph information can be obtained, which can be expressed as follows:
$W_1 = \mathrm{Tokenize}(d_1),\; W_2 = \mathrm{Tokenize}(d_2),\; \ldots,\; W_n = \mathrm{Tokenize}(d_n)$
It will be appreciated that the word segmentation data corresponding to the first text paragraph information $d_1$ is $W_1$, the word segmentation data corresponding to $d_2$ is $W_2$, ..., and the word segmentation data corresponding to $d_n$ is $W_n$.
Similarly, according to the word segmentation mode, word segmentation data corresponding to each piece of second text paragraph information can be obtained.
Further, in the embodiment of the invention, an index structure is constructed for the word segmentation data corresponding to each piece of first text paragraph information and the pre-stored document information in the ES retrieval library, and an inverted index table is established. Each term corresponds to a list of document IDs, which can be expressed as follows:
$\mathrm{Index}(w) = [\mathrm{doc}_1, \mathrm{doc}_2, \ldots, \mathrm{doc}_i]$
where $w$ represents each word in the word segmentation data $W_1, W_2, \ldots, W_n$, and $\mathrm{doc}_1, \mathrm{doc}_2, \ldots, \mathrm{doc}_i$ represent the ID information of the knowledge documents pre-stored in the ES search library; i may be determined according to the number of knowledge documents actually stored.
Similarly, according to the above manner, an index structure may be constructed for the word segmentation data corresponding to each second text paragraph information and the pre-stored document information in the ES search library, and another inverted index table may be established.
In the embodiment of the invention, the target index information can be directly obtained according to the two types of inverted index tables established above.
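Inside ES this inverted index is maintained automatically by Lucene; the following pure-Python sketch merely illustrates the mapping from whitespace-segmented terms to document IDs that the description refers to, with illustrative document names.

```python
# Illustrative sketch only (ES/Lucene maintains this structure internally): build an
# inverted index mapping each whitespace-segmented term w to the document IDs doc_1..doc_i
# that contain it. Variable and document names are illustrative.
from collections import defaultdict

def build_inverted_index(documents):
    """documents: mapping of document ID -> text (text paragraphs plus pre-stored
    knowledge documents)."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for term in set(text.split()):  # whitespace segmentation, one posting per term
            index[term].append(doc_id)
    return dict(index)

inverted = build_inverted_index({
    "doc_1": "what color are the flowers in the picture",
    "doc_2": "a red flower in a garden",
})
# inverted["flowers"] -> ["doc_1"]; inverted["in"] -> ["doc_1", "doc_2"]
```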
According to the method provided by the embodiment of the invention, the text data corresponding to the multimodal information after paragraph cutting is imported into the ES retrieval library, word segmentation is further performed on each piece of text data, and the target index information is constructed. This makes the non-text modality information usable, enables the large language model to handle non-text modality information, and improves the accuracy of the answers generated by the model.
Further, in the embodiment of the present invention, the ES search library may be updated according to the word segmentation data corresponding to each first text paragraph information, the word segmentation data corresponding to each second text paragraph information, and the established target index information, so as to generate the target text library.
According to the method provided by the embodiment of the invention, the preset search library is used to perform word segmentation, index construction and other operations on each piece of text information corresponding to the multimodal question information, so that the target text library is constructed. This strengthens the retrieval-augmented generation capability of the large language model, improves its ability to process multimodal input information, and improves the accuracy of question answering.
Based on the content of the foregoing embodiment, as an optional embodiment, using a preset large language model, determining reply information corresponding to the multimodal question information to be replied based on the first text information, the first text result, and the second text result includes:
according to a preset sequence, performing information splicing on the first text information, the first text result, the second text result and the preset prompt information to obtain model input information;
and inputting the model input information into a preset large language model to obtain reply information corresponding to the multi-mode question information to be replied, which is output by the preset large language model.
Specifically, the preset Prompt information described in the embodiment of the present invention refers to preset Prompt (Prompt) template information for inputting a preset large language model.
In the embodiment of the invention, the target text library can be subdivided into a text library 1 and a text library 2, used for retrieval and recall against the first text information and the second text information respectively. Further, the first text result with the highest matching degree retrieved according to the first text information, expressed as $r_1, r_2, \ldots, r_m$, the second text result recalled according to the second text information corresponding to the picture modality, expressed as $r'_1, r'_2, \ldots, r'_o$, and the preset prompt information added for the preset large language model are spliced with the first text information in a preset order to obtain the whole model input text of the preset large language model. The first text information may be represented as $\mathrm{input} = (x_1, x_2, \ldots, x_s)$, the text vector representation corresponding to the first text information, where s may be determined based on the text vectorization result of the first text information.
The whole model input text $T$ of the preset large language model can be expressed, for example, as:
$T = \mathrm{prompt}_1 \oplus \mathrm{input} \oplus \mathrm{prompt}_2 \oplus (r_1, \ldots, r_m) \oplus \mathrm{prompt}_3 \oplus (r'_1, \ldots, r'_o) \oplus \mathrm{prompt}_4$
where $\oplus$ denotes text splicing. The vector representation of the corresponding model input information $X$ is:
$X = \big[v(\mathrm{prompt}_1);\; (x_1, \ldots, x_s);\; v(\mathrm{prompt}_2);\; v(r_1, \ldots, r_m);\; v(\mathrm{prompt}_3);\; v(r'_1, \ldots, r'_o);\; v(\mathrm{prompt}_4)\big]$
where $v(\mathrm{prompt}_1)$, $v(\mathrm{prompt}_2)$, $v(\mathrm{prompt}_3)$ and $v(\mathrm{prompt}_4)$ represent the vector representations of the respective pieces of preset prompt information.
Further, the model input information $X$ is input into the preset large language model, and a model intermediate hidden state can be obtained, which can be expressed as $H = (h_1, h_2, \ldots, h_u)$, where u represents the number of words or tokens and may be determined according to the model's processing result for $X$. Then, through a multi-layer perceptron (MLP), the hidden state is mapped to a probability distribution over the vocabulary, $P \in \mathbb{R}^{\mathrm{vocab\_size}}$, where vocab_size represents the vocabulary size and is independent of u. Then, the word with the highest probability under the Softmax activation function is output as the next word. By means of the reasoning capability of the large language model, the retrieved text is selectively utilized and understood, and the reply information corresponding to the multimodal question information to be replied is finally output. This process can be expressed as follows:
$P = \mathrm{Softmax}(\mathrm{MLP}(H)), \quad \text{next word} = \arg\max(P)$
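The generation step described above, in which the hidden states are mapped by an MLP to a vocabulary distribution and the highest-probability word is emitted after Softmax, can be illustrated with the following sketch; the tensor sizes and the greedy argmax selection are assumptions consistent with the description rather than the internals of any particular model.

```python
# Illustrative sketch of the generation step: intermediate hidden states H (u x hidden_dim)
# are mapped by an MLP to a distribution over the vocabulary, and the word with the highest
# Softmax probability is emitted as the next word. Sizes and the greedy argmax choice are
# assumptions consistent with the description, not the internals of a particular model.
import torch
import torch.nn as nn

hidden_dim, vocab_size, u = 768, 50000, 12      # assumed sizes
H = torch.randn(u, hidden_dim)                  # intermediate hidden states h_1 .. h_u

mlp = nn.Sequential(
    nn.Linear(hidden_dim, hidden_dim),
    nn.GELU(),
    nn.Linear(hidden_dim, vocab_size),
)

logits = mlp(H[-1])                             # map the last hidden state to vocabulary logits
probs = torch.softmax(logits, dim=-1)           # distribution of size vocab_size (independent of u)
next_token_id = int(torch.argmax(probs))        # word with the highest output probability
```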
according to the method provided by the embodiment of the invention, the first text information, the first text result, the second text result and the preset prompt information are subjected to information splicing, the related knowledge obtained based on multi-mode question information retrieval is integrated into the prompt information of the large language model, the large language model is guided to perform dialogue generation, high-efficiency and accurate answer content can be output, and the answer capability of the large language model for the multi-mode question information is greatly improved.
The big language model based question-answering device provided by the invention is described below, and the big language model based question-answering device described below and the big language model based question-answering method described above can be correspondingly referred to each other.
Fig. 2 is a schematic structural diagram of a question-answering device based on a large language model, as shown in fig. 2, including: the conversion module 210, the retrieval module 220 and the processing module 230 are connected in sequence.
The conversion module 210 is configured to convert the non-text information in the multimodal questioning information to be replied to corresponding second text information;
the retrieval module 220 is configured to retrieve, from the target text library, a first text result that matches with the first text information in the multimodal questioning information to be replied and a second text result that matches with the second text information, respectively; the target text library is constructed based on the first text information and the second text information;
the processing module 230 is configured to determine, using a preset large language model, response information corresponding to the multimodal question information to be responded based on the first text information, the first text result, and the second text result.
The question-answering device based on the large language model in this embodiment may be used to execute the above-mentioned question-answering method embodiment based on the large language model, and its principle and technical effects are similar, and will not be described herein again.
According to the question-answering device based on the large language model provided by the invention, the modality information of the non-text modality in the multimodal question information to be replied is converted into corresponding text information, and the various text information is placed into the search library to construct the target text library, so that the retrieval result corresponding to the text modality and the retrieval result corresponding to the non-text modality are retrieved from the target text library, thereby making use of the non-text modality information. The various retrieval results and the question text information are then input together into the prompt of the preset large language model as references, guiding the model to generate the reply information corresponding to the multimodal question information to be replied. In this way, multimodal question information can be answered efficiently and accurately, no additional training cost is required for the large language model, and the factual hallucination problem to which large language models are prone is effectively alleviated.
Based on the foregoing, as an alternative embodiment, the apparatus further includes:
the cutting module is used for respectively cutting the first text information and the second text information to obtain a plurality of first text paragraph information corresponding to the first text information and a plurality of second text paragraph information corresponding to the second text information, and importing each of the first text paragraph information and the second text paragraph information into a preset search library;
the construction module is used for determining target index information based on the first text paragraph information and the second text paragraph information by using a preset search library;
the generation module is used for generating a target text library based on the preset search library and the target index information.
Based on the foregoing embodiment, as an alternative embodiment, the building block is specifically configured to:
word segmentation is carried out on the first text paragraph information and the second text paragraph information respectively by adopting a word segmentation tool in a preset search library, so as to obtain word segmentation data corresponding to each first text paragraph information and word segmentation data corresponding to each second text paragraph information;
and determining target index information based on the word segmentation data corresponding to each first text paragraph information, the word segmentation data corresponding to each second text paragraph information and pre-stored document information in a preset search library.
Based on the foregoing embodiment, as an optional embodiment, the processing module is specifically configured to:
according to a preset sequence, performing information splicing on the first text information, the first text result, the second text result and the preset prompt information to obtain model input information;
and inputting the model input information into a preset large language model to obtain reply information corresponding to the multi-mode question information to be replied, which is output by the preset large language model.
Fig. 3 is a schematic physical structure of an electronic device according to the present invention, and as shown in fig. 3, the electronic device may include: processor 310, communication interface (Communications Interface) 320, memory 330 and communication bus 340, wherein processor 310, communication interface 320, memory 330 accomplish communication with each other through communication bus 340. Processor 310 may invoke logic instructions in memory 330 to perform the large language model based question-answering method provided by the methods described above, the method comprising: converting non-text information in the multi-mode question information to be replied into corresponding second text information; searching a first text result matched with the first text information in the multimodal questioning information to be replied and a second text result matched with the second text information from a target text library respectively; the target text library is constructed based on the first text information and the second text information; and determining reply information corresponding to the multimodal questioning information to be replied based on the first text information, the first text result and the second text result by using a preset large language model.
Further, the logic instructions in the memory 330 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, where the computer program product includes a computer program, where the computer program can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer can execute a question-answering method based on a large language model provided by the above methods, and the method includes: converting non-text information in the multi-mode question information to be replied into corresponding second text information; searching a first text result matched with the first text information in the multimodal questioning information to be replied and a second text result matched with the second text information from a target text library respectively; the target text library is constructed based on the first text information and the second text information; and determining reply information corresponding to the multimodal questioning information to be replied based on the first text information, the first text result and the second text result by using a preset large language model.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the method of large language model-based question-answering provided by the above methods, the method comprising: converting non-text information in the multi-mode question information to be replied into corresponding second text information; searching a first text result matched with the first text information in the multimodal questioning information to be replied and a second text result matched with the second text information from a target text library respectively; the target text library is constructed based on the first text information and the second text information; and determining reply information corresponding to the multimodal questioning information to be replied based on the first text information, the first text result and the second text result by using a preset large language model.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for question answering based on a large language model, comprising:
converting non-text information in the multi-mode question information to be replied into corresponding second text information;
searching a first text result matched with the first text information in the multimodal questioning information to be replied and a second text result matched with the second text information from a target text library respectively; the target text library is constructed based on the first text information and the second text information;
and determining reply information corresponding to the multimodal questioning information to be replied based on the first text information, the first text result and the second text result by using a preset large language model.
2. The large language model based question-answering method according to claim 1, wherein before retrieving first text results matching first text information in the multimodal question information to be answered and second text results matching the second text information from the target text library, respectively, the method further comprises:
respectively cutting the first text information and the second text information to obtain a plurality of first text paragraph information corresponding to the first text information and a plurality of second text paragraph information corresponding to the second text information, and importing each of the first text paragraph information and the second text paragraph information into a preset search library;
determining target index information based on the first text paragraph information and the second text paragraph information by using the preset search library;
and generating a target text library based on the preset search library and the target index information.
3. The large language model based question-answering method according to claim 2, wherein the determining target index information based on each of the first text paragraph information and the second text paragraph information using the preset search library includes:
word segmentation is carried out on the first text paragraph information and the second text paragraph information respectively by adopting a word segmentation tool in the preset search library, so that word segmentation data corresponding to each first text paragraph information and word segmentation data corresponding to each second text paragraph information are obtained;
and determining target index information based on the word segmentation data corresponding to each piece of first text paragraph information, the word segmentation data corresponding to each piece of second text paragraph information and the pre-stored document information in the preset search library.
4. The big language model based question-answering method according to any one of claims 1-3, wherein the determining, with a preset big language model, answer information corresponding to the to-be-answered multi-modal question-answering information based on the first text information, the first text result and the second text result includes:
according to a preset sequence, information splicing is carried out on the first text information, the first text result, the second text result and preset prompt information, and model input information is obtained;
and inputting the model input information into the preset large language model to obtain the reply information corresponding to the multi-mode question information to be replied, which is output by the preset large language model.
5. A large language model-based question answering apparatus, comprising:
the conversion module is used for converting the non-text information in the multi-mode question information to be replied into corresponding second text information;
the retrieval module is used for respectively retrieving a first text result matched with the first text information in the multimodal questioning information to be replied and a second text result matched with the second text information from a target text library; the target text library is constructed based on the first text information and the second text information;
and the processing module is used for determining reply information corresponding to the multi-mode question information to be replied based on the first text information, the first text result and the second text result by utilizing a preset large language model.
6. The large language model based question-answering apparatus according to claim 5, further comprising:
the cutting module is used for respectively cutting the first text information and the second text information to obtain a plurality of first text paragraph information corresponding to the first text information and a plurality of second text paragraph information corresponding to the second text information, and importing each of the first text paragraph information and the second text paragraph information into a preset search library;
the construction module is used for determining target index information based on the first text paragraph information and the second text paragraph information by using the preset search library;
and the generation module is used for generating a target text library based on the preset search library and the target index information.
7. The large language model based question-answering apparatus according to claim 6, wherein the construction module is specifically configured to:
word segmentation is carried out on the first text paragraph information and the second text paragraph information respectively by adopting a word segmentation tool in the preset search library, so that word segmentation data corresponding to each first text paragraph information and word segmentation data corresponding to each second text paragraph information are obtained;
and determining target index information based on the word segmentation data corresponding to each piece of first text paragraph information, the word segmentation data corresponding to each piece of second text paragraph information and the pre-stored document information in the preset search library.
8. The large language model based question-answering apparatus according to any one of claims 5-7, wherein the processing module is specifically configured to:
according to a preset sequence, information splicing is carried out on the first text information, the first text result, the second text result and preset prompt information, and model input information is obtained;
and inputting the model input information into the preset large language model to obtain the reply information corresponding to the multi-mode question information to be replied, which is output by the preset large language model.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the large language model based question-answering method according to any one of claims 1 to 4 when the program is executed by the processor.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the large language model based question-answering method according to any one of claims 1 to 4.
CN202410295644.1A 2024-03-15 2024-03-15 Question and answer method and device based on large language model, electronic equipment and storage medium Pending CN117891927A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410295644.1A CN117891927A (en) 2024-03-15 2024-03-15 Question and answer method and device based on large language model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410295644.1A CN117891927A (en) 2024-03-15 2024-03-15 Question and answer method and device based on large language model, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117891927A true CN117891927A (en) 2024-04-16

Family

ID=90650965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410295644.1A Pending CN117891927A (en) 2024-03-15 2024-03-15 Question and answer method and device based on large language model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117891927A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116431793A (en) * 2023-06-14 2023-07-14 华南理工大学 Visual question-answering method, device and storage medium based on knowledge generation
CN116561276A (en) * 2023-05-05 2023-08-08 科大讯飞股份有限公司 Knowledge question-answering method, device, equipment and storage medium
JP7325152B1 (en) * 2023-06-14 2023-08-14 Spiral.AI株式会社 Text generation device and text generation method
CN117010500A (en) * 2023-07-10 2023-11-07 杭州电子科技大学 Visual knowledge reasoning question-answering method based on multi-source heterogeneous knowledge joint enhancement
CN117033609A (en) * 2023-10-09 2023-11-10 腾讯科技(深圳)有限公司 Text visual question-answering method, device, computer equipment and storage medium
CN117271818A (en) * 2023-11-22 2023-12-22 鹏城实验室 Visual question-answering method, system, electronic equipment and storage medium
CN117573834A (en) * 2023-11-30 2024-02-20 北京快牛智营科技有限公司 Multi-robot dialogue method and system for software-oriented instant service platform
CN117592489A (en) * 2023-11-30 2024-02-23 北京快牛智营科技有限公司 Method and system for realizing electronic commerce commodity information interaction by using large language model
CN117609473A (en) * 2023-12-26 2024-02-27 山东新一代信息产业技术研究院有限公司 Question-answering method, system and device for water conservancy knowledge base

Similar Documents

Publication Publication Date Title
CN109284502B (en) Text similarity calculation method and device, electronic equipment and storage medium
CN112214593A (en) Question and answer processing method and device, electronic equipment and storage medium
CN111078837A (en) Intelligent question and answer information processing method, electronic equipment and computer readable storage medium
CN111310440A (en) Text error correction method, device and system
CN112287085B (en) Semantic matching method, system, equipment and storage medium
CN112925898B (en) Question-answering method and device based on artificial intelligence, server and storage medium
JP2020057359A (en) Training data generation method, training data generation apparatus, electronic device and computer-readable storage medium
CN116882372A (en) Text generation method, device, electronic equipment and storage medium
US6999934B2 (en) Method and system for processing, storing, retrieving and presenting information with an extendable interface for natural and artificial languages
CN111402864A (en) Voice processing method and electronic equipment
CN117370512A (en) Method, device, equipment and storage medium for replying to dialogue
CN116501858B (en) Text processing and data query method
CN112632956A (en) Text matching method, device, terminal and storage medium
CN117290478A (en) Knowledge graph question-answering method, device, equipment and storage medium
CN117076636A (en) Information query method, system and equipment for intelligent customer service
CN116909435A (en) Data processing method and device, electronic equipment and storage medium
CN116681088A (en) Translation system, method and storage medium based on large model
CN114970666B (en) Spoken language processing method and device, electronic equipment and storage medium
CN111401069A (en) Intention recognition method and intention recognition device for conversation text and terminal
CN117891927A (en) Question and answer method and device based on large language model, electronic equipment and storage medium
CN112988993A (en) Question answering method and computing device
CN113537263A (en) Training method and device of two-classification model and entity linking method and device
CN117035064B (en) Combined training method for retrieving enhanced language model and storage medium
CN117520523B (en) Data processing method, device, equipment and storage medium
CN116913278B (en) Voice processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination