CN116976306A - Multi-model collaboration method based on large-scale language model - Google Patents

Multi-model collaboration method based on large-scale language model

Info

Publication number
CN116976306A
CN116976306A
Authority
CN
China
Prior art keywords
model
task
models
tasks
small
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310958947.2A
Other languages
Chinese (zh)
Inventor
Li Xiang (李翔)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Zhuohuan Technology Co ltd
Original Assignee
Zhuhai Zhuohuan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Zhuohuan Technology Co ltd filed Critical Zhuhai Zhuohuan Technology Co ltd
Priority to CN202310958947.2A priority Critical patent/CN116976306A/en
Publication of CN116976306A publication Critical patent/CN116976306A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a multi-model collaboration method based on a large-scale language model, belonging to the technical field of artificial intelligence. The method comprises the steps of model preparation, task understanding and decomposition, model matching, subtask execution, and iterative task feedback.

Description

Multi-model collaboration method based on large-scale language model
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a multi-model collaboration method based on a large-scale language model.
Background
LLM: large Language Model (large-scale language model); langChain: langChain is a framework based on Large Language Models (LLMs) and is intended to provide a generic interface for the development of LLMs applications; small model: the deep learning model of non-LLM has the general parameters of less than or equal to one hundred million, can process data of various modes such as images, texts and the like, and can only process one data in a small model, so that a specific problem is solved; ebedding: mapping the original data into a semantic space, wherein the purpose is that in the semantic space, quantized values can be used for judging semantic similarity between texts;
A large-scale language model (LLM) is a model that can automatically predict the next word in natural language, such as GPT-4 or GLM. Because of their self-learning capability, LLMs can learn from large amounts of data without manual intervention and continuously improve their own performance, and they can generate high-quality natural language text, which makes them very useful for tasks such as natural language generation and machine translation. However, their current inputs and outputs are limited to text.
The closest existing scheme is as follows: LangChain is a framework based on large language models (LLMs) that aims to provide a generic interface for developing LLM applications, reduce the difficulty of application development, and help developers build complex LLM applications quickly; when indexing text data, it supports calling a third-party model to compute an embedding mapping of the text.
Current LLM technology has achieved great success, but it still has drawbacks: LLMs have difficulty processing complex information such as images and speech. In addition, some complex tasks require multiple models working in concert, which is beyond the capability of a single LLM. And although LLMs show excellent zero-shot and few-shot results, they are still not as effective as some expert models (e.g., fine-tuned models). To handle such complex AI tasks, LLMs should work in coordination with external models; however, LangChain can currently only call other language models to map text data when indexing it, and when a complex task is encountered there is no way to automatically call other non-language models to do the corresponding processing.
Disclosure of Invention
The application aims to provide a multi-model collaboration method based on a large-scale language model, which solves the problem in the prior art that, when a complex task is encountered, other non-language models cannot be automatically called to do the corresponding processing.
In order to achieve the above purpose, the present application provides the following technical solution: a multi-model collaboration method based on a large-scale language model, comprising the following steps:
S1, model preparation: prepare a large model and a model library, wherein the model library contains small deep learning models capable of processing each modality;
S2, task understanding and decomposition: after a request is received, split it into a series of structured task sequences and identify the dependency relationships and execution order among the tasks;
S3, model matching: after the task list from S2 has been resolved, match each subtask with a small model. First, obtain the text description of each small model and pass the descriptions to the LLM so that, through task setup and prompting, the LLM semantically understands each small model's capability, in particular its input and output requirements and limitations. Then select models dynamically using in-context task allocation: the user query and the tasks parsed from it are added to the prompt so that the small model best suited to each task is selected;
S4, subtask execution: for computational stability, the inference of the small models is performed on heterogeneous inference terminals; by adapting to devices of different architectures, each small model can compute quickly on its inference terminal and return the result of the corresponding subtask;
S5, task feedback iteration: the LLM judges from the returned subtask results whether the subtasks decomposed in S2 have been completed, and combines the subtask results according to the context; if the overall problem has not been solved, the decompose-and-call cycle over the small models is repeated until the LLM judges that the overall problem is solved, whereupon part of the intermediate process and the final result are returned.
Preferably, the large model comprises GPT4, GLM and VICUNA, and the small deep learning models are used to process data of various modalities such as images and text.
Preferably, in step S2, after the LLM has been prompted to understand the task, the type of the task and the type of the data format are determined semantically, the task requirements are identified from the input, and the data and scene information involved in the task are extracted; the task is then split, according to its type, into a set of inputs and outputs for task planning, where the input is the user's request and the output is the desired task sequence.
Preferably, step S2 further includes analyzing the dependency information between tasks: by understanding the logical relationships among the tasks, the execution order and resource dependencies are determined.
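As a rough illustration of such a structured task sequence, the sketch below represents subtasks with explicit dependencies and derives an execution order by topological sort. The SubTask fields and task-type names are illustrative assumptions, not a format fixed by the application.

```python
# Sketch of a structured task sequence with dependencies (field names and
# task types are illustrative, not prescribed by the application).
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+

@dataclass
class SubTask:
    task_id: str
    task_type: str                       # e.g. "ocr", "speech_recognition"
    inputs: dict                         # data and scene information from the request
    depends_on: list = field(default_factory=list)

def execution_order(tasks: list[SubTask]) -> list[str]:
    """Derive the execution sequence from inter-task dependencies."""
    graph = {t.task_id: set(t.depends_on) for t in tasks}
    return list(TopologicalSorter(graph).static_order())

tasks = [
    SubTask("t1", "ocr", {"image": "invoice.png"}),
    SubTask("t2", "semantic_analysis", {"text": "<output of t1>"}, depends_on=["t1"]),
]
print(execution_order(tasks))  # ['t1', 't2']
```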
Preferably, in step S3, because the prompt is limited in length, not all model information can be added to it; the models are therefore filtered by subtask type, the remaining small models are ranked by semantic matching degree, and the top K models are selected as candidate models for the subtask.
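A minimal sketch of this filter-then-rank candidate selection follows. The model-library fields and the score function (a stand-in for embedding-based semantic matching) are assumptions made for illustration only.

```python
# Sketch of filter-then-rank candidate selection for a subtask.
# score() stands in for embedding-based semantic matching between the
# subtask and a model description; here a toy keyword count is used.

def top_k_candidates(models, subtask_type, score, k=3):
    """models: dicts with 'name', 'task_type' and 'description' fields."""
    filtered = [m for m in models if m["task_type"] == subtask_type]  # filter by type
    ranked = sorted(filtered, key=lambda m: score(m["description"]), reverse=True)
    return ranked[:k]                                                 # keep top K

model_library = [
    {"name": "ocr-small", "task_type": "ocr", "description": "reads printed text in images"},
    {"name": "tts-small", "task_type": "speech_synthesis", "description": "synthesizes speech"},
]
candidates = top_k_candidates(model_library, "ocr",
                              score=lambda d: d.count("text"), k=1)
print([m["name"] for m in candidates])  # ['ocr-small']
```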
Compared with the prior art, the application has the beneficial effects that:
1) The application can not only call other language models to map text data, but also automatically call other non-language models according to the task, so that it can process not only textual data and tasks but also image and speech data and tasks.
2) The application enables the large language model to collaborate with other models to automatically complete specific complex tasks, solving problems that current large language models cannot solve on their own and improving task-solving efficiency.
Drawings
FIG. 1 is a schematic flow chart of the present application.
Detailed Description
The following is a clear and complete description of the technical solutions in the embodiments of the present application with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
Examples:
Referring to FIG. 1, the present application provides the following technical solution: a multi-model collaboration method based on a large-scale language model, comprising the following steps:
S1, model preparation: one part is a current mainstream large model, such as GPT4, GLM or VICUNA; the other part is a model library containing small deep learning models capable of processing various modalities. These small models generally have fewer than one hundred million parameters; as a class they can handle data of various modalities such as images and text, but a single small model can usually process only one kind of data and therefore solves one specific problem, such as OCR, speech synthesis, or speech recognition;
S2, task understanding and decomposition: after a request is received, it is split into a series of structured task sequences, and the dependency relationships and execution order among the tasks are identified. After the LLM has been prompted to understand the tasks, the type of each task and the type of the data format are determined semantically, the task requirements are identified from the input, and the data and scene information involved in the task are extracted; the task is then split, according to its type, into a set of inputs and outputs for task planning, where the input is the user's request and the output is the expected task sequence. The dependency information between tasks is also analyzed, and the execution order and resource dependencies are determined by understanding the logical relationships among the tasks;
S3, model matching: after the task list from S2 has been analyzed, subtasks are matched with small models. First, the text description of each small model is obtained and passed to the LLM so that, through task setup and prompting, the LLM semantically understands each small model's capability, in particular its input and output requirements and limitations. Models are then selected dynamically using in-context task allocation: the user query and the tasks parsed from it are added to the prompt so that the small model best suited to each task is selected. Because the prompt is limited in length, not all model information can be added to it; the models are therefore filtered by subtask type, the remaining small models are ranked by semantic matching degree, and the top K models are selected as candidate models for the subtask;
S4, subtask execution: for computational stability, the inference of the small models is performed on heterogeneous inference terminals; by adapting to devices of different architectures, each small model can compute quickly on its inference terminal and return the result of the corresponding subtask;
S5, task feedback iteration: the LLM judges from the returned subtask results whether the subtasks decomposed in S2 have been completed, and combines the subtask results according to the context; if the overall problem has not been solved, the decompose-and-call cycle over the small models is repeated until the LLM judges that the overall problem is solved, whereupon part of the intermediate process and the final result are returned.
While the fundamental and principal features of the application and advantages of the application have been shown and described, it will be apparent to those skilled in the art that the application is not limited to the details of the foregoing exemplary embodiments, but may be embodied in other specific forms without departing from the spirit or essential characteristics thereof; the present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Although embodiments of the present application have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the application, the scope of which is defined in the appended claims and their equivalents.

Claims (5)

1. A multi-model collaboration method based on a large-scale language model, characterized in that it comprises the following steps:
S1, model preparation: prepare a large model and a model library, wherein the model library contains small deep learning models capable of processing each modality;
S2, task understanding and decomposition: after a request is received, split it into a series of structured task sequences and identify the dependency relationships and execution order among the tasks;
S3, model matching: after the task list from S2 has been resolved, match each subtask with a small model. First, obtain the text description of each small model and pass the descriptions to the LLM so that, through task setup and prompting, the LLM semantically understands each small model's capability, in particular its input and output requirements and limitations. Then select models dynamically using in-context task allocation: the user query and the tasks parsed from it are added to the prompt so that the small model best suited to each task is selected;
S4, subtask execution: for computational stability, the inference of the small models is performed on heterogeneous inference terminals; by adapting to devices of different architectures, each small model can compute quickly on its inference terminal and return the result of the corresponding subtask;
S5, task feedback iteration: the LLM judges from the returned subtask results whether the subtasks decomposed in S2 have been completed, and combines the subtask results according to the context; if the overall problem has not been solved, the decompose-and-call cycle over the small models is repeated until the LLM judges that the overall problem is solved, whereupon part of the intermediate process and the final result are returned.
2. The multi-model collaboration method based on a large-scale language model of claim 1, wherein the large model comprises GPT4, GLM and VICUNA, and the small deep learning models are used to process data of various modalities such as images and text.
3. The multi-model collaboration method based on a large-scale language model of claim 1, wherein, in step S2, after the LLM has been prompted to understand the task, the type of the task and the type of the data format are determined semantically, the task requirements are identified from the input, and the data and scene information involved in the task are extracted; the task is then split, according to its type, into a set of inputs and outputs for task planning, where the input is the user's request and the output is the desired task sequence.
4. The multi-model collaboration method based on a large-scale language model of claim 3, wherein step S2 further includes analyzing the dependency information between tasks: by understanding the logical relationships among the tasks, the execution order and resource dependencies are determined.
5. The multi-model collaboration method based on a large-scale language model of claim 1, wherein, in step S3, because the prompt is limited in length, not all model information can be added to it; the models are therefore filtered by subtask type, the remaining small models are ranked by semantic matching degree, and the top K models are selected as candidate models for the subtask.
CN202310958947.2A 2023-08-01 2023-08-01 Multi-model collaboration method based on large-scale language model Pending CN116976306A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310958947.2A CN116976306A (en) 2023-08-01 2023-08-01 Multi-model collaboration method based on large-scale language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310958947.2A CN116976306A (en) 2023-08-01 2023-08-01 Multi-model collaboration method based on large-scale language model

Publications (1)

Publication Number Publication Date
CN116976306A true CN116976306A (en) 2023-10-31

Family

ID=88482791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310958947.2A Pending CN116976306A (en) 2023-08-01 2023-08-01 Multi-model collaboration method based on large-scale language model

Country Status (1)

Country Link
CN (1) CN116976306A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117196546A (en) * 2023-11-08 2023-12-08 杭州实在智能科技有限公司 RPA flow executing system and method based on page state understanding and large model driving
CN117217238A (en) * 2023-11-09 2023-12-12 成都理工大学 Intelligent interaction system and method based on large language model
CN117217238B (en) * 2023-11-09 2024-01-30 成都理工大学 Intelligent interaction system and method based on large language model
CN117633174A (en) * 2023-11-22 2024-03-01 北京万物可知技术有限公司 Voting consensus system based on multiple large model conversations
CN117420760A (en) * 2023-11-24 2024-01-19 东莞市新佰人机器人科技有限责任公司 Multi-mode control algorithm fusion method suitable for autonomous cooperation of robot
CN117311949A (en) * 2023-11-28 2023-12-29 三六零数字安全科技集团有限公司 Task security operation method, device, equipment and storage medium
CN117370638A (en) * 2023-12-08 2024-01-09 中国科学院空天信息创新研究院 Method and device for decomposing and scheduling basic model task with enhanced thought diagram prompt
CN117370638B (en) * 2023-12-08 2024-05-07 中国科学院空天信息创新研究院 Method and device for decomposing and scheduling basic model task with enhanced thought diagram prompt
CN117649129A (en) * 2023-12-25 2024-03-05 宏景科技股份有限公司 Multi-agent cooperative system and strategy method suitable for industrial digitization

Similar Documents

Publication Publication Date Title
CN116976306A (en) Multi-model collaboration method based on large-scale language model
WO2020093761A1 (en) Entity and relationship joint extraction method oriented to software bug knowledge
CN109325040B (en) FAQ question-answer library generalization method, device and equipment
CN111414380B (en) Method, equipment and storage medium for generating SQL (structured query language) sentences of Chinese database
CN111930906A (en) Knowledge graph question-answering method and device based on semantic block
CN114547072A (en) Method, system, equipment and storage medium for converting natural language query into SQL
CN113011337B (en) Chinese character library generation method and system based on deep meta learning
CN116127020A (en) Method for training generated large language model and searching method based on model
CN112766990B (en) Intelligent customer service auxiliary system and method based on multi-round dialogue improvement
CN112527986A (en) Multi-round dialog text generation method, device, equipment and storage medium
CN114238373A (en) Method and device for converting natural language question into structured query statement
CN111930912A (en) Dialogue management method, system, device and storage medium
CN111460303A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN112989829B (en) Named entity recognition method, device, equipment and storage medium
JP2023539225A (en) Selection and description of machine learning models for multidimensional datasets
CN113239698A (en) Information extraction method, device, equipment and medium based on RPA and AI
CN116680368B (en) Water conservancy knowledge question-answering method, device and medium based on Bayesian classifier
CN116186219A (en) Man-machine dialogue interaction method, system and storage medium
CN113468345B (en) Entity co-reference detection data processing system based on knowledge graph
CN114822726A (en) Construction method, analysis method, device, storage medium and computer equipment
CN114490974A (en) Automatic information reply method, device, system, electronic equipment and readable medium
CN111091011B (en) Domain prediction method, domain prediction device and electronic equipment
CN114997395A (en) Training method of text generation model, method for generating text and respective devices
CN114333795A (en) Speech recognition method and apparatus, computer readable storage medium
CN115114281A (en) Query statement generation method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination