CN116594768A

CN116594768A - Large-model-oriented universal tool collaboration and refinement learning system and method

Info

Publication number: CN116594768A
Application number: CN202310498645.1A
Authority: CN
Inventors: 刘知远; 孙茂松; 汪华东; 秦禹嘉; 胡声鼎; 严澜
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2023-05-05
Filing date: 2023-05-05
Publication date: 2023-08-15

Abstract

The invention provides a large-model-oriented universal tool collaboration and refinement learning system and method that improve a large model's ability to handle complex tasks. The system comprises a dynamic combination mechanism module for different tools, a unified interface module based on language instructions, a universal tool refinement learning module, and an execution process and result information comprehensive reasoning module. The dynamic combination mechanism module decomposes the task corresponding to a task instruction to be processed into a plurality of tool-level subtasks. The unified interface module based on language instructions enables cooperative calling among multiple types of universal tools through the universal tool interface, so that different tools complement one another functionally. The universal tool refinement learning module lets each tool fully exert its specialized skills. The execution process and result information comprehensive reasoning module performs integrated reasoning over the execution process information and the processing results of the plurality of tool-level subtasks to obtain the final answer to the task instruction. The method has notable effect when applied to intelligent question-answering scenarios.

Description

Large-model-oriented universal tool collaboration and refinement learning system and method

Technical Field

The invention relates to the technical field of natural language processing, in particular to a large-model-oriented universal tool collaboration and refinement learning system and method.

Background

In recent years, pre-trained language models (Pre-trained Language Model, PLM) have brought a series of breakthroughs to natural language processing and artificial intelligence and have become a mainstream technical paradigm of artificial intelligence. Large-scale PLMs are regarded internationally as "foundation models" for realizing general artificial intelligence and are known in China as "big models" (BM). Existing large model technology, represented by ChatGPT and GPT-4, can complete artificial intelligence tasks such as article writing, dialogue question answering, and automatic programming, but its performance is unsatisfactory in many practical application scenarios, particularly in complex task processing, interpretability, credibility, and the handling of professional knowledge and skills. When generating content, a large model readily produces text that contains knowledge errors or is unreliable, and it is markedly deficient in tasks such as symbolic knowledge reasoning and numerical calculation. In addition, because large models are trained on data collected before a specific point in time, their knowledge is fixed and cannot be updated in real time.

Disclosure of Invention

The invention provides a large-model-oriented universal tool collaboration and refinement learning system and method to address the following defects of the prior art: a large model readily generates text that contains knowledge errors or is unreliable; it is markedly deficient in tasks such as symbolic knowledge reasoning and numerical calculation; and, because large models are trained on data collected before a specific point in time, their knowledge is fixed and cannot be updated in real time. By enabling cooperative calling among multiple types of universal tools, the invention achieves functional complementarity among different tools; through refinement learning of the universal tools, it lets each tool fully exert its specialized skills. The large model's ability to handle complex tasks is improved accordingly, with notable effect in intelligent question-answering scenarios.

The invention provides a large-model-oriented universal tool collaboration and refinement learning system, which comprises: a dynamic combination mechanism module for different tools, whose input end is connected to the output end of the large model, and which is used for decomposing the task corresponding to a task instruction to be processed into a plurality of tool-level subtasks through the large model, constructing a universal tool call graph based on the plurality of tool-level subtasks and the universal tool interface, and establishing a dynamic routing mechanism for the universal tools through reinforcement learning and instruction learning; a unified interface module based on language instructions, whose input end is connected to the output end of the dynamic combination mechanism module, and which is used for calling the universal tool interface based on language instructions according to the plurality of tool-level subtasks, the universal tool call graph, and the dynamic routing mechanism; a universal tool refinement learning module, whose input end is connected to the output end of the unified interface module, and which is used for processing the plurality of tool-level subtasks with the universal tools corresponding to the language-instruction-based universal tool interface to obtain processing results of the plurality of tool-level subtasks, the universal tool refinement learning module being one or more universal tool modules that are updated and maintained in real time; and an execution process and result information comprehensive reasoning module, whose input end is connected to the output end of the universal tool refinement learning module and whose output end is connected to the input end of the large model, and which is used for performing integrated reasoning over the execution process information and the processing results of the plurality of tool-level subtasks to obtain the final answer to the task instruction.

According to the large model-oriented universal tool collaborative and refined learning system provided by the invention, the universal tool refined learning module comprises: the interactive webpage browser tool refined learning module is used for processing the to-be-processed tool-level subtasks of which the tool call types are the webpage browser according to the interactive webpage browser tool to obtain a processing result of the to-be-processed tool-level subtasks of which the tool call types are the webpage browser; the knowledge graph tool refined learning module is used for processing the to-be-processed tool-level subtasks with the tool calling type being the knowledge graph according to the knowledge graph tool to obtain a processing result of the to-be-processed tool-level subtasks with the tool calling type being the knowledge graph; and the tool interface API tool refinement learning module is used for processing the to-be-processed tool level subtask of which the tool call type is the tool interface API tool according to the API tool to obtain a processing result of the to-be-processed tool level subtask of which the tool call type is the tool interface API tool.

According to the large-model-oriented universal tool collaboration and refinement learning system provided by the invention, the interactive web browser tool refinement learning module comprises: the fact retrieval acceleration module is used for decomposing a task corresponding to a task instruction to be processed according to a tree structure to obtain a plurality of action-level subtasks after tree structure decomposition, and predicting retrieval queries of the plurality of action-level subtasks after tree structure decomposition; the fact extraction module is used for extracting fact information related to a plurality of action level subtasks in the current page text of the interactive web browser so as to obtain a processing result of a tool calling type of the to-be-processed tool level subtasks of the web browser; the current page of the interactive web browser is obtained by searching according to the search query of a plurality of action-level subtasks.

According to the large model-oriented universal tool collaborative and refined learning system provided by the invention, the interactive web browser tool refined learning module further comprises a fusion visual information module which is used for inputting text information and visual information into the large model in a structure input mode so as to generate text information retrieval query, new fact extraction and next actions of a plurality of action level subtasks.

According to the large-model-oriented universal tool collaboration and refinement learning system provided by the invention, the knowledge graph tool refinement learning module comprises: an initial instruction definition module, used for constructing a prompt description or few-shot examples of serialized atomic operations for knowledge-based tasks; an atomic operation prediction module, used for predicting serialized atomic operations with a large-model-based atomic operation prediction model, according to the prompt description or few-shot examples of the serialized atomic operations and the current state with respect to the knowledge graph; and a query language conversion and execution module, used for converting the serialized atomic operations into serialized queries in the knowledge graph query language SPARQL and executing those SPARQL queries to obtain serialized interactive query results, thereby obtaining the processing result of the to-be-processed tool-level subtask whose tool call type is the knowledge graph.
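As an illustration of the query language conversion step only, the following minimal Python sketch turns a sequence of atomic operations into one SPARQL query. The text does not enumerate the atomic operations at this point, so the two operation names used here (`start` and `follow`), the `ops_to_sparql` helper, and the `ex:` identifiers are all assumptions made for the example, not the operation set of the invention.

```python
def ops_to_sparql(ops):
    """Convert a list of (operation, argument) pairs into a SPARQL SELECT query.

    Hypothetical operations:
      ("start", entity)    -- choose the entity the walk begins from
      ("follow", relation) -- move along a relation to a new variable
    """
    if not ops or ops[0][0] != "start":
        raise ValueError("the sequence must begin with a 'start' operation")
    subject = ops[0][1]
    patterns = []
    for i, (kind, arg) in enumerate(ops[1:], start=1):
        if kind != "follow":
            raise ValueError(f"unsupported operation: {kind}")
        var = f"?x{i}"
        # Each 'follow' adds one triple pattern and makes its object the new subject.
        patterns.append(f"{subject} {arg} {var} .")
        subject = var
    if not patterns:
        raise ValueError("at least one 'follow' operation is required")
    body = "\n  ".join(patterns)
    # The ex: prefix is a placeholder namespace, not a real knowledge graph.
    return f"PREFIX ex: <http://example.org/>\nSELECT {subject} WHERE {{\n  {body}\n}}"


query = ops_to_sparql([
    ("start", "ex:Einstein"),
    ("follow", "ex:birthPlace"),
    ("follow", "ex:country"),
])
print(query)
```

Because each operation maps to one triple pattern, the intermediate variables (`?x1`, `?x2`, ...) correspond to the intermediate states of the serialized interaction; a real system would execute the query against a SPARQL endpoint, which this sketch does not do.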

According to the large-model-oriented universal tool collaboration and refinement learning system provided by the invention, the knowledge graph tool refinement learning module further comprises a reward module, used for optimizing the large-model-based atomic operation prediction model with a large-model-based reward model, according to the serialized interactive query results.

According to the large-model-oriented universal tool collaboration and refinement learning system provided by the invention, the tool interface API tool refinement learning module comprises: an API customization module, used for analyzing user requirements and collecting the set of tool interface APIs related to the task corresponding to the task instruction; an API integration module based on a unified interface, used for constructing a unified API warehouse from the tool interface API set, wherein the API warehouse integrates a plurality of API interfaces that can meet different user requirements and an API hierarchy is built; an API retrieval module, used for retrieving APIs for dynamic requirements from the unified API warehouse according to the API hierarchy mechanism to obtain an API retrieval result; and a model learning module based on API pre-training, used for performing model learning based on API pre-training according to the API retrieval result, so as to obtain the processing result of the to-be-processed tool-level subtask whose tool call type is the tool interface API tool.
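The warehouse-plus-hierarchy idea can be pictured with a small Python sketch. It assumes a two-level hierarchy (category, then API) and scores candidates by simple word overlap; the `ApiWarehouse` class, the categories, and the API names are hypothetical, and a real system would likely use a learned retriever rather than word overlap.

```python
import re


def _tokens(text):
    """Lowercase word set; underscores split names such as get_forecast."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ApiWarehouse:
    """Toy unified API warehouse with a two-level hierarchy: category -> API."""

    def __init__(self):
        self._levels = {}  # category -> {api_name: description}

    def register(self, category, name, description):
        self._levels.setdefault(category, {})[name] = description

    def retrieve(self, request, top_k=1):
        words = _tokens(request)

        def category_score(category):
            text = " ".join([category, *self._levels[category].values()])
            return len(words & _tokens(text))

        # Level 1: pick the category that shares the most words with the request.
        best = max(self._levels, key=category_score, default=None)
        if best is None or category_score(best) == 0:
            return []
        # Level 2: rank the APIs inside that category only.
        ranked = sorted(
            self._levels[best].items(),
            key=lambda kv: (-len(words & _tokens(kv[0] + " " + kv[1])), kv[0]),
        )
        return [(best, name) for name, _ in ranked[:top_k]]


warehouse = ApiWarehouse()
warehouse.register("weather", "get_forecast", "return the weather forecast for a city")
warehouse.register("weather", "get_air_quality", "return the air quality index for a city")
warehouse.register("finance", "get_stock_price", "return the latest stock price for a ticker")
result = warehouse.retrieve("What is the weather forecast for Beijing tomorrow?")
print(result)
```

Searching the category level first keeps the second-stage ranking small even when the warehouse holds many APIs, which is one plausible reason to maintain a hierarchy at all.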

According to the large-model-oriented universal tool collaboration and refinement learning system provided by the invention, the tool interface API tool refinement learning module further comprises a few-shot generalizable API tool manual learning module, used for updating and maintaining the unified API warehouse and optimizing the API hierarchy, and for performing few-shot generalizable tool manual learning on the updated warehouse and optimized hierarchy, so that on-demand retrieval can be carried out using the few-shot generalizable API tool manual.

According to the large-model-oriented universal tool collaboration and refinement learning system provided by the invention, the system further comprises a unified interface module for API tool manual understanding, used for interpreting the API tool manual to obtain API tool manual information that is easy for the large model to understand, and for calling, according to that information, the API tool required by the to-be-processed tool-level subtask whose tool call type is the tool interface API tool.

The invention also provides a large model-oriented universal tool cooperation and refinement learning method, which is applicable to the large model-oriented universal tool cooperation and refinement learning system and comprises the following steps: the dynamic combination mechanism module of different tools decomposes a task corresponding to a task instruction to be processed into a plurality of tool-level subtasks through a large model, constructs a universal tool call graph based on the plurality of tool-level subtasks and a universal tool interface, and establishes a dynamic routing mechanism of the universal tool through reinforcement learning and instruction learning; the unified interface module based on the language instruction calls a universal tool interface based on the language instruction according to a plurality of tool-level subtasks, the universal tool call graph and a dynamic routing mechanism; the universal tool refinement learning module processes a plurality of tool-level subtasks according to universal tools corresponding to the universal tool interfaces based on language instructions to obtain processing results of the plurality of tool-level subtasks; the universal tool refinement learning module is one or more universal tool modules which are updated and maintained in real time; and the comprehensive reasoning module of the execution process and the result information carries out integrated reasoning on the execution process information and the processing results of the plurality of tool-level subtasks so as to obtain a final answer of the task instruction.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a large model-oriented generic tool collaboration and refinement learning system provided by the present invention;

FIG. 2 is a schematic diagram of a general framework for large model-oriented tool learning provided by the present invention;

FIG. 3 is an overall technical framework diagram of the large model-oriented generic tool collaboration and refinement learning system provided by the present invention;

FIG. 4 is a technical framework diagram of a generic tool refinement learning module provided by the present invention;

FIG. 5 is a diagram of an interactive web browser simulation interface screenshot (left diagram) and supported set of actions (right diagram) provided by the present invention;

FIG. 6 is a frame diagram of a web browser-oriented tool refinement learning module provided by the present invention;

FIG. 7 is a technical framework diagram of a web browser tool refinement learning module provided by the invention that takes account of fact retrieval acceleration and fusion of visual information;

FIG. 8 is an example of knowledge graph retrieval based on serialized atomic operations (left: an example process record; right: an example intermediate step of a query operation);

FIG. 9 is a technical framework diagram of the knowledge-graph-oriented tool refinement learning module provided by the invention;

FIG. 10 is a technical framework diagram of a tool refinement learning module facing a tool interface API provided by the invention;

FIG. 11 is a diagram of annotation record for the unified use of various tools by the annotation platform provided by the invention;

FIG. 12 is a tool learning data annotation platform interface diagram provided by the present invention;

FIG. 13 is a flow chart of the large model oriented generic tool collaboration and refinement learning method provided by the invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Existing large model technology is unsatisfactory in many practical application scenarios, particularly in complex task processing, interpretability, and credibility. Large models still suffer from serious "hallucination" (that is, they readily generate text that contains knowledge errors, is toxic, or is unreliable), and they are markedly deficient in tasks such as symbolic knowledge reasoning and numerical calculation. In addition, because large models are trained on data collected before a specific point in time, their knowledge is fixed and cannot be updated in real time.

AI systems currently lack the efficient learning, reasoning, prediction, and planning capabilities of humans and animals. How to let machines learn actively, learn representations of the world through observation, learn to predict, and carry out long-term prediction and planning by breaking complex behaviors into sequences of lower-level behaviors is considered a major bottleneck in building autonomous machine intelligence (Autonomous Machine Intelligence).

To solve the above technical problems in the prior art, the invention draws on the cognitive behavior of humans using tools to build a large-model-oriented universal tool collaboration and refinement learning system, so that the large model learns to use various universal tools and exploits their potential to help solve complex tasks. By enabling the large model to learn to call external tools cooperatively and use them in a refined manner, the invention aims to further raise the intelligence level of current large models. The invention is developed around the overall framework of the large-model-oriented universal tool collaboration and refinement system, which comprises a dynamic combination mechanism module 1 of different tools, a unified interface module 2 based on language instructions, a universal tool refinement learning module 3, and an execution process and result information comprehensive reasoning module 4. The technical framework of the universal tool refinement learning module 3 is then described in detail, and finally a large-model-oriented universal tool collaboration and refinement learning platform is introduced, which has notable effect when applied to intelligent question-answering scenarios.

Referring to fig. 1, fig. 1 is a schematic structural diagram of a large model-oriented generic tool collaborative and refined learning system provided by the present invention.

Referring to fig. 2, fig. 2 is a general framework diagram of large-model-oriented tool learning provided by the present invention.

Referring to fig. 3, fig. 3 is an overall technical framework diagram of the large model-oriented generic tool collaboration and refinement learning system provided by the present invention.

The invention provides a large model-oriented universal tool collaboration and refinement learning system, which comprises:

a dynamic combination mechanism module 1 of different tools, whose input end is connected to the output end of the large model, and which is used for decomposing the task corresponding to a task instruction to be processed into a plurality of tool-level subtasks through the large model, constructing a universal tool call graph based on the plurality of tool-level subtasks and the universal tool interface, and establishing a dynamic routing mechanism for the universal tools through reinforcement learning and instruction learning;

a unified interface module 2 based on language instructions, whose input end is connected to the output end of the dynamic combination mechanism module 1 of different tools, and which is used for calling the universal tool interface based on language instructions according to the plurality of tool-level subtasks, the universal tool call graph, and the dynamic routing mechanism;

a universal tool refinement learning module 3, whose input end is connected to the output end of the unified interface module 2 based on language instructions, and which is used for processing the plurality of tool-level subtasks with the universal tools corresponding to the language-instruction-based universal tool interface to obtain processing results of the plurality of tool-level subtasks, the universal tool refinement learning module being one or more universal tool modules that are updated and maintained in real time;

an execution process and result information comprehensive reasoning module 4, whose input end is connected to the output end of the universal tool refinement learning module 3 and whose output end is connected to the input end of the large model, and which is used for performing integrated reasoning over the execution process information and the processing results of the plurality of tool-level subtasks to obtain the final answer to the task instruction.

Specifically, a collaborative learning system for multiple types of tools must address two questions: how to correctly call the capabilities of the various tools and execute them cooperatively to solve complex tasks, and how to comprehensively reason over, generalize, and organize the information acquired from the various tools in combination with the model's own knowledge.

Because the information in tool learning comes from multiple sources, the model must comprehensively organize the complex information it obtains and apply it to specific task scenarios. Different tools have unique advantages in solving specific tasks, and enabling a large model to master a variety of tools allows it to solve more complex tasks, improving its reasoning capability and level of intelligence. The key technical problems to be solved include: dynamic combination of different tools; robust comprehensive reasoning over execution process and result information; and multi-type tool interaction based on tool interfaces.

The system is built on two high-level tools, a knowledge graph (for example, Wikidata, queried with the SPARQL query language) and a web search engine, combined with modules such as dynamic combination and integrated reasoning. The large-model-oriented universal tool collaboration and refinement learning system is divided into a dynamic combination module 1 of different tools, a unified interface module 2 based on language instructions, a universal tool refinement learning module 3, and an execution process and result information comprehensive reasoning module 4. The specific scheme is as follows:

(1) Dynamic combination module 1 of different tools: for the dynamic combination of modules, a dynamic routing mechanism is adopted. 1) Complex tasks are decomposed into subtasks by the large model, and a call graph structure (or flow chart) among the tools is constructed. This structure, called a planning graph (Planning Graph), can be defined as a set of vertices and edges $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. A node $v \in \mathcal{V}$ may consist of a task instruction, the original question, a sub-question, a tool description, and so on; $\varepsilon \in \mathcal{E}$ is an edge between nodes, whose start node is the premise of a reasoning step and whose end node is the output of that step, that is, an intermediate conclusion or the answer. The dynamic routing mechanism of the modules is then modeled through reinforcement learning and instruction learning. 2) Graph neural network (GNN) techniques can be used to model the planning graph structure among tools; the GNN model learns the relationships and interactions among tools to determine how modules are dynamically combined, so that tool interactions can be predicted more accurately and tasks can be decomposed and combined more effectively. In addition, reinforcement learning can be used to optimize the dynamic combination of modules: by introducing a reward function and a policy network, the large model can dynamically select the optimal tool combination for different tasks. This allows the system to cope more flexibly with different task requirements and tool combinations, improving task efficiency and accuracy. Finally, in instruction-learning-based methods, the user provides instructions directly to the model, telling the model how tools should be combined to accomplish a task.
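To make the planning graph concrete, the following minimal Python sketch stores nodes and premise-to-output edges and derives an order in which the reasoning steps could be executed. The `PlanningGraph` class, the node contents, and the use of a topological sort are assumptions made for illustration; the routing described above is learned through reinforcement learning and instruction learning, which this sketch does not attempt.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class PlanningGraph:
    """Planning graph as a set of nodes and directed premise -> output edges."""
    nodes: dict = field(default_factory=dict)  # node id -> description
    edges: list = field(default_factory=list)  # (premise_id, output_id) pairs

    def add_node(self, node_id, description):
        self.nodes[node_id] = description

    def add_edge(self, premise_id, output_id):
        # An edge starts at the premise of a reasoning step and ends at its output.
        if premise_id not in self.nodes or output_id not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((premise_id, output_id))

    def execution_order(self):
        """Order nodes so every premise precedes its outputs (Kahn's algorithm)."""
        indegree = {n: 0 for n in self.nodes}
        successors = defaultdict(list)
        for src, dst in self.edges:
            successors[src].append(dst)
            indegree[dst] += 1
        ready = deque(n for n in self.nodes if indegree[n] == 0)
        order = []
        while ready:
            node = ready.popleft()
            order.append(node)
            for nxt in successors[node]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        if len(order) != len(self.nodes):
            raise ValueError("planning graph contains a cycle")
        return order


# Hypothetical decomposition of one question into two tool-level subtasks.
graph = PlanningGraph()
graph.add_node("question", "Who directed the highest-grossing film of 2010?")
graph.add_node("web_search", "sub-question for the web browser: which film grossed the most in 2010?")
graph.add_node("kg_lookup", "sub-question for the knowledge graph: who directed that film?")
graph.add_node("answer", "final answer")
graph.add_edge("question", "web_search")
graph.add_edge("web_search", "kg_lookup")
graph.add_edge("kg_lookup", "answer")
order = graph.execution_order()
print(order)
```

A fixed topological order is the simplest possible routing policy; it serves here only to show how the edge direction (premise to output) constrains which subtask may run next.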

(2) Unified interface module 2 based on language instructions: interaction between the large model and the different tools is realized as multi-type tool interaction through a unified interactive cognitive interface. On the basis of the single-tool learning models (BM-Web, BM-KG, BM-API, etc.), a collaborative learning framework is constructed in which the large model uses multiple types of tools simultaneously, and the main agent (the large model) interacts with each single-tool learning model through human language instructions. Subtasks are obtained from the dynamic combination module, tool language instructions are constructed from the tool description and the task type, and each single-tool learning model executes the customized language instruction and returns its output as feedback. This interaction is simple, allows each single-tool learning model to operate its tool in a fine-grained way, and lets each tool realize its potential; however, it depends heavily on the instruction-understanding capability of the single-tool learning models and on the task-decomposition capability of the main large model.
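The interaction pattern described here, in which the main model sends a natural-language instruction to a single-tool model and reads back text feedback, can be sketched in Python as follows. The `UnifiedInterface` class, the instruction template, and the stub model are hypothetical; the source does not specify an instruction format, and the stub merely stands in for a trained model such as BM-KG.

```python
from typing import Callable, Dict

# A single-tool learning model is modeled as: instruction text in, feedback text out.
ToolModel = Callable[[str], str]


class UnifiedInterface:
    """Routes language instructions to registered single-tool models."""

    def __init__(self):
        self._tools: Dict[str, ToolModel] = {}
        self._descriptions: Dict[str, str] = {}

    def register(self, name: str, description: str, model: ToolModel) -> None:
        self._tools[name] = model
        self._descriptions[name] = description

    def build_instruction(self, name: str, subtask: str) -> str:
        # Assumed template: tool name, tool description, then the subtask itself.
        return f"Tool: {name}\nDescription: {self._descriptions[name]}\nTask: {subtask}"

    def call(self, name: str, subtask: str) -> str:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](self.build_instruction(name, subtask))


def fake_kg_model(instruction: str) -> str:
    # Stub: echoes the task line instead of running a real knowledge graph query.
    return "[BM-KG] received " + instruction.splitlines()[-1]


interface = UnifiedInterface()
interface.register("BM-KG", "answers questions by querying a knowledge graph", fake_kg_model)
feedback = interface.call("BM-KG", "Find the birthplace of Marie Curie.")
print(feedback)
```

Because every tool sits behind the same text-in, text-out signature, adding a new tool only requires registering a name, a description, and a callable; the cost, as noted above, is that quality depends on how well each tool model understands the instruction.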

(3) Universal tool refinement learning module 3: this module addresses how to define the model-tool interaction interface, adapt the model to a specific tool, and have it understand the meaning of the tool's executable operations, so that it can output the correct decision sequence in a complex environment. Tool learning needs to be grounded (grounding) in specific tools without loss of generality. Here, the Web browser, the knowledge graph, and the tool interface API are selected as representative universal tools, and a system design is provided that enables the large model to learn and use these universal tools in a refined manner. The key components include: the tool refinement learning module BM-Web for an interactive Web browser, based on complex problem decomposition; the tool refinement learning module BM-KG, based on serialized knowledge graph queries; and the tool refinement learning module BM-API for tool interface APIs.

(4) Execution process and result information comprehensive reasoning module 4: when a large model uses multiple types of tools, the accuracy and robustness of the resulting information are critical. A series of measures is therefore required to improve the reliability and stability of the model's reasoning and execution. First, the model can be trained on existing fact-verification datasets to identify and judge the authenticity of tool feedback, which improves its accuracy and reliability. Second, during execution, feedback control and error correction mechanisms can be employed to correct errors and ignore irrelevant information. For example, if an action of the model results in a negative reward, the error correction mechanism can adjust the policy to prevent similar actions in the future, which avoids erroneous decisions as far as possible. Finally, following the self-consistency (Self-Consistency) chain-of-thought approach, multiple combined execution and reasoning processes are generated, and the final result is chosen by voting over the outputs or by a verification model. This further improves the robustness and reliability of the information and gives the model generality and adaptability.
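The voting step of the self-consistency approach can be illustrated with a short Python sketch. It assumes that several independent execution and reasoning runs have already produced candidate answers as strings; the `majority_vote` helper and its normalization (trimming and lowercasing) are illustrative choices, not part of the described system.

```python
from collections import Counter


def majority_vote(answers):
    """Return (winning_answer, share_of_votes) over normalized candidate answers."""
    normalized = [a.strip().lower() for a in answers if a and a.strip()]
    if not normalized:
        raise ValueError("no candidate answers to vote on")
    winner, count = Counter(normalized).most_common(1)[0]
    return winner, count / len(normalized)


# Candidate answers from three hypothetical runs of the tool pipeline.
winner, share = majority_vote(["Paris", "paris ", "Lyon"])
print(winner, round(share, 2))
```

The vote share can double as a rough confidence signal: a low share suggests the runs disagree and that the alternative mentioned above, selection by a verification model, may be the safer choice.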

The large-model-oriented universal tool collaboration and refinement learning system has the following innovations: 1) a novel multi-type tool collaborative learning technical framework, consisting of the dynamic combination module 1 of different tools, the unified interface module 2 based on language instructions, the universal tool refinement learning module 3, and the execution process and result information comprehensive reasoning module 4; 2) a multi-tool collaboration technical framework based on a language instruction interface.

Unless otherwise specified, the large model in this patent refers to a large-scale pre-trained language model.

In summary, the large-model-oriented universal tool collaboration and refinement learning system realizes functional complementation among different tools by realizing collaboration calling among multiple types of universal tools, fully exerts special skills of the tools by the refinement learning technology of the universal tools, accordingly improves the processing capacity of the large model on complex tasks, and has remarkable effect when being applied to intelligent question-answering scenes.

Based on the above embodiments:

Referring to fig. 4, fig. 4 is a technical framework diagram of the universal tool refinement learning module provided by the present invention.

As a preferred embodiment, the universal tool refinement learning module 3 includes: the interactive webpage browser tool refined learning module is used for processing the to-be-processed tool-level subtasks of which the tool call types are the webpage browser according to the interactive webpage browser tool to obtain a processing result of the to-be-processed tool-level subtasks of which the tool call types are the webpage browser; the knowledge graph tool refined learning module is used for processing the to-be-processed tool-level subtasks with the tool calling type being the knowledge graph according to the knowledge graph tool to obtain a processing result of the to-be-processed tool-level subtasks with the tool calling type being the knowledge graph; and the tool interface API tool refinement learning module is used for processing the to-be-processed tool level subtask of which the tool call type is the tool interface API tool according to the API tool to obtain a processing result of the to-be-processed tool level subtask of which the tool call type is the tool interface API tool.

For tool learning oriented to the interactive web browser, in order to bridge the differences between web pages, a unified visual interface is constructed from the graphical interface rendered by the browser, and a unified code-text interface is constructed from the HTML. With the interaction interface between the model and the tool defined, behavior data of humans using the browser is collected, and the pre-trained model is fine-tuned to learn and imitate human behavior so that it can use the tool intelligently. For the knowledge graph, a human-machine collaborative annotation platform is constructed to record the sequence of actions a person performs when using these two kinds of tools. A large model is trained on a certain amount of such behavior sequences, and reinforcement learning is then performed on that basis. For tools presented as APIs, a platform is built that accepts APIs as input, runs the model's API calls in an isolated environment, and presents the results in human-readable form, so that humans can work with the model to understand and call the APIs. More human behavior data is collected in this way to support the model's imitation learning.

Referring to fig. 5, fig. 5 is a diagram illustrating a simulation interface screenshot (left diagram) and a supported action set (right diagram) of an interactive web browser provided by the present invention.

Referring to fig. 6, fig. 6 is a frame diagram of a refined learning module of the web browser-oriented tool provided by the present invention.

As a preferred embodiment, the interactive web browser tool refinement learning module includes: the fact retrieval acceleration module is used for decomposing the task corresponding to the task instruction to be processed according to the tree structure to obtain a plurality of action level subtasks after the tree structure is decomposed, and predicting retrieval queries of the plurality of action level subtasks after the tree structure is decomposed; the fact extraction module is used for extracting fact information related to a plurality of action level subtasks in the current page text of the interactive web browser so as to obtain a processing result of the tool calling type of the to-be-processed tool level subtasks of the web browser; the current page of the interactive web browser is obtained by searching according to the search query of a plurality of action level subtasks.

As a preferred embodiment, the interactive web browser tool refinement learning module further comprises a fusion visual information module for inputting the text information and the visual information into the large model in a structural input mode so as to generate text information retrieval queries, new fact extraction and next actions of a plurality of action level subtasks.

Specifically, tool learning for the interactive web browser (BM-Web) proceeds as follows. A search engine interface is constructed, and the important interactable elements in the use of the search engine are defined, such as the search button, scrolling down a page, and clicking a page, together with the effect each interaction has on the current interface, such as entering the designated page after a control is clicked. Once the interaction interface between the large model and the tool has been defined, behavior data of humans using the browser is collected. In one embodiment, two groups of subjects are recruited. The first group formulates a series of requirements and gives natural-language instructions, such as "search for why the sky is blue". The second group (the executors) uses the constructed search engine interface to complete the corresponding operations according to those instructions. After the human behavior data has been collected, the pre-trained model is fine-tuned to learn and imitate human behavior so that it can use the tool intelligently. The input of the model is the state $S_t$, which consists of the original question, the current query, the action set, the window set, and the fact set. The model predicts the next decision on this basis.
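As a minimal sketch, the model input described above (original question, current query, action set, window set, fact set) could be held in a small container and flattened into a text prompt. The class name `BrowserState`, its field names, and the prompt layout are assumptions made for illustration and are not taken from the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BrowserState:
    """Hypothetical container for the model input S_t."""
    original_question: str
    current_query: str
    action_set: List[str]      # actions available on the interface
    window: List[str]          # text currently shown in the browser window
    facts: List[str] = field(default_factory=list)  # facts collected so far

    def to_prompt(self) -> str:
        """Serialize the state into a single text prompt for the model."""
        return "\n".join([
            f"Question: {self.original_question}",
            f"Query: {self.current_query}",
            "Actions: " + ", ".join(self.action_set),
            "Window: " + " | ".join(self.window),
            "Facts: " + " | ".join(self.facts),
        ])


state = BrowserState(
    original_question="Why is the sky blue?",
    current_query="why is the sky blue",
    action_set=["Search", "Scroll down", "Click page"],
    window=["Short wavelengths of sunlight are scattered more strongly."],
)
prompt = state.to_prompt()
```

Keeping the state in one structure makes it straightforward to log each time step alongside the action that a human or the model took.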

Tool learning for web browsers mainly includes four basic components:

(1) Action prediction: predicting the action probability over the action space according to the current state;

This component predicts the operation to be performed next. Taking the action Search as an example, assume that $\{x_1, \dots, x_N\}$ is the token sequence of the action name Search, where $x_i$ represents a particular token. The probability of Search can be decomposed as follows:

$$P(\text{Search} \mid S_t) = P(x_1 \mid S_t) \prod_{i=2}^{N} P(x_i \mid S_t, x_1, \dots, x_{i-1})$$

where $P(\text{Search} \mid S_t)$ represents the predicted probability of outputting the token sequence $\text{Search} = \{x_1, \dots, x_N\}$, $P(x_1 \mid S_t)$ represents the predicted probability that the first token is $x_1$ given the state $S_t$, $P(x_i \mid S_t, x_1, \dots, x_{i-1})$ represents the predicted probability that the i-th token is $x_i$ given the state $S_t$ and the preceding $i-1$ tokens, $S_t$ represents the current state at time $t$, $x_i$ represents a word or token, and $N$ represents the length of the token sequence of Search.

During inference, the action with the highest probability is executed on the interface.

(2) Query generation: generating the search query of a subtask according to the current state;

Generating the search query $Q_{t+1}$ that is submitted to a search engine (e.g., Bing) is treated as a text generation task:

$$P(Q_{t+1} \mid S_t) = P(q_1 \mid S_t) \prod_{i=2}^{|Q_{t+1}|} P(q_i \mid S_t, q_1, \dots, q_{i-1})$$

where $P(Q_{t+1} \mid S_t)$ represents the predicted probability of generating the search query $Q_{t+1}$ given the state $S_t$, $P(q_1 \mid S_t)$ represents the probability that the first generated token is $q_1$ given the state $S_t$, $P(q_i \mid S_t, q_1, \dots, q_{i-1})$ represents the predicted probability that the i-th token is $q_i$ given the state $S_t$ and the first $i-1$ tokens, $S_t$ represents the environment state at the t-th time step, $q_i$ represents the i-th token of the query, and $|Q_{t+1}|$ represents the token sequence length of the query.

(3) Fact extraction: extracting the information related to the question from the current page;

Assume that, in browsing mode, the current content of the window is the query result $W_t$. The goal of this step is to extract from $W_t$ a supporting fact $f = \{w_i, \dots, w_j\}$, where $1 \le i \le j \le |W_t|$. Rather than generating the whole of $f$, the model generates only its first and last $N_f$ characters given $S_t$, which is realized by maximizing the conditional probability

$$P([s], w_i, \dots, w_{i+N_f-1}, [e], w_{j-N_f+1}, \dots, w_j \mid S_t)$$

Here $[s]$ and $[e]$ are special characters marking the start and the end of the fact $f$, respectively. During inference, after the start and end markers have been decoded, the desired sequence is located in $W_t$ by text matching. The preset number $N_f$ may be 10.

(4) Answer generation: synthesizing an answer from the collected fact set and the original question.

The integration module is responsible for composing a series of supporting facts into a coherent answer. During training, the model is optimized to generate the answer from the original question $Q_0$ and the given supporting facts $f_1, \dots, f_N$ by maximizing $P(\text{Answer} \mid Q_0, f_1, \dots, f_N)$.
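The chain-rule decomposition used for action prediction in component (1) can be illustrated with a short sketch: each candidate action is scored by summing the log-probabilities of its tokens, and the highest-scoring action is selected. The function `toy_token_prob` and its probability table are invented stand-ins for a real language model, and the token splits are assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical model interface: returns P(token | state, prefix).
TokenProb = Callable[[str, Sequence[str], str], float]


def sequence_log_prob(state: str, tokens: List[str], token_prob: TokenProb) -> float:
    """log P(tokens | state) = sum_i log P(x_i | state, x_1 .. x_{i-1})."""
    total = 0.0
    for i, tok in enumerate(tokens):
        total += math.log(token_prob(state, tokens[:i], tok))
    return total


def select_action(state: str, actions: Dict[str, List[str]], token_prob: TokenProb) -> str:
    """Pick the action whose token sequence has the highest probability."""
    return max(actions, key=lambda name: sequence_log_prob(state, actions[name], token_prob))


def toy_token_prob(state: str, prefix: Sequence[str], token: str) -> float:
    """Toy probabilities; a real system would query the fine-tuned model."""
    table = {"Sea": 0.6, "rch": 0.9, "Scr": 0.3, "oll": 0.8}
    return table.get(token, 0.01)


actions = {"Search": ["Sea", "rch"], "Scroll": ["Scr", "oll"]}
best = select_action("S_t", actions, toy_token_prob)
search_prob = math.exp(sequence_log_prob("S_t", actions["Search"], toy_token_prob))
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow for long token sequences; the same scoring applies to query generation in component (2).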

Referring to fig. 7, fig. 7 is a technical frame diagram of a refined learning module of a web browser tool considering fact retrieval acceleration and fusion visual information provided by the present invention.

Visual information such as the layout and colors of a web page plays an important role in information retrieval when people browse the web to acquire information. This embodiment therefore further considers information retrieval enhanced by visual information and efficient text information retrieval, as follows:

Fusion visual information module: given a text input $X_{text}$ and a visual input $X_{vision}$, the probability of synthesizing the target text $Y$ from the text and visual information is:

$$P_{\mathcal{M}}(Y \mid X_{text}, X_{vision}) = \prod_{i=1}^{|Y|} P_{\mathcal{M}}(Y_i \mid X_{text}, X_{vision}, Y_1, \dots, Y_{i-1})$$

Here $\mathcal{M}$ is a multimodal pre-trained language model (the M large model), such as Multimodal-CoT or KOSMOS-1, or another pre-trained model that supports structured input, such as CPM-3 or ChatGPT. Visual information such as fonts, colors, and layout is input into the model together with the text information in a structured input mode.

$P_{\mathcal{M}}(Y \mid X_{text}, X_{vision})$ represents the predicted probability of generating $Y$ given the input text $X_{text}$ and the visual input $X_{vision}$, and $Y_i$ represents the i-th token of $Y$.
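One way to picture the structured input mode is to attach visual attributes to each text element of the page and serialize the result. The sketch below assumes a JSON layout; the `PageElement` fields (font size, color, bounding box) are hypothetical and are not specified by the source.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class PageElement:
    """Hypothetical page element carrying text plus visual attributes."""
    text: str
    font_size: int
    color: str
    bbox: List[int]  # assumed layout box: [x, y, width, height]


def build_structured_input(question: str, elements: List[PageElement]) -> str:
    """Combine the question and the annotated page elements into one JSON string."""
    payload = {"question": question, "elements": [asdict(e) for e in elements]}
    return json.dumps(payload, ensure_ascii=False)


elements = [
    PageElement("Why is the sky blue?", 24, "black", [0, 0, 600, 40]),
    PageElement("Advertisement", 10, "gray", [0, 500, 600, 20]),
]
structured = build_structured_input("why is the sky blue", elements)
```

With attributes like these, a model could in principle learn that large headings near the top of the page are more likely to carry the answer than small gray text at the bottom.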

Fact retrieval acceleration module: when information retrieval, action execution, fact acquisition, and information synthesis are performed sequentially in a chain retrieval mode, answers are returned slowly; at the same time, the history state information becomes overly long, which increases the inference cost of the model and can easily exceed its maximum input length. This module therefore decomposes the task according to a tree structure into a plurality of action-level subtasks, which can be processed in parallel in a divide-and-conquer manner.
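The tree-structured decomposition attributed to the fact retrieval acceleration module can be sketched as follows: a question is split into a tree of sub-questions, and the leaves are retrieved in parallel rather than one after another. The `SubtaskNode` type, the `toy_retrieve` stub, and the example question are assumptions; a thread pool stands in for whatever parallel execution the actual system uses.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SubtaskNode:
    """Hypothetical node of the decomposition tree."""
    question: str
    children: List["SubtaskNode"] = field(default_factory=list)


def leaves(node: SubtaskNode) -> List[SubtaskNode]:
    """Collect the leaf sub-questions in left-to-right order."""
    if not node.children:
        return [node]
    found: List[SubtaskNode] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def retrieve_parallel(root: SubtaskNode, retrieve: Callable[[str], str]) -> List[str]:
    """Run retrieval for every leaf concurrently; results keep leaf order."""
    questions = [n.question for n in leaves(root)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(retrieve, questions))


def toy_retrieve(question: str) -> str:
    """Stand-in for a search-and-extract call against the browser tool."""
    return f"fact for: {question}"


root = SubtaskNode(
    "Compare the populations of Paris and Rome",
    [SubtaskNode("Population of Paris"), SubtaskNode("Population of Rome")],
)
facts = retrieve_parallel(root, toy_retrieve)
```

Because each leaf carries only its own sub-question, the per-call history stays short, which is the input-length benefit the paragraph above describes.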

In summary, the interactive web browser tool refinement learning module has the following innovations: 1) visual information related to the web page is fused for the first time, improving web text retrieval when the BM uses a web browser; 2) to address overly long sequences in fact extraction, an efficient fact extraction module is provided that generates only the head and tail of the span; 3) a framework is provided that decomposes the question into a tree structure for divide-and-conquer parallel processing, yielding a more efficient BM-Web technology.
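The head-and-tail span extraction in innovation 2) can be illustrated with a small text-matching routine: the model emits only the first and last few characters of a fact between the markers `[s]` and `[e]`, and the full span is then recovered from the window text. The decoded string layout `[s]head[e]tail` and the helper names are assumptions for this sketch.

```python
from typing import Optional, Tuple

N_F = 10  # number of head/tail characters; the text suggests 10


def parse_markers(decoded: str) -> Tuple[str, str]:
    """Split an assumed decoded string '[s]head[e]tail' into (head, tail)."""
    body = decoded.split("[s]", 1)[1]
    head, tail = body.split("[e]", 1)
    return head, tail


def locate_fact(window: str, head: str, tail: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the span that starts with head and ends with tail."""
    start = window.find(head)
    if start == -1:
        return None
    # Allow head and tail to overlap when the fact is shorter than 2 * N_F.
    search_from = max(start, start + len(head) - len(tail))
    tail_pos = window.find(tail, search_from)
    if tail_pos == -1:
        return None
    return start, tail_pos + len(tail)


fact = "The sky is blue because air molecules scatter short wavelengths more strongly."
window = "Intro text. " + fact + " Other text."
decoded = f"[s]{fact[:N_F]}[e]{fact[-N_F:]}"

head, tail = parse_markers(decoded)
span = locate_fact(window, head, tail)
recovered = window[span[0]:span[1]] if span else None
```

The model only has to decode about 2 × N_f characters per fact regardless of the fact's length, which is where the efficiency gain comes from.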

Referring to fig. 8, fig. 8 is an example of performing knowledge graph retrieval based on an atomic serialization operation (the left graph is an example of a process record and the right graph is an example of a middle step of a query operation) provided by the present invention.

Referring to fig. 9, fig. 9 is a technical frame diagram of a knowledge graph tool-oriented fine learning module provided by the present invention.

As a preferred embodiment, the knowledge graph tool refinement learning module includes: an initial instruction definition module, used for constructing a prompt description or few-shot examples of the serialized atomic operations on the basis of the knowledge-based task; an atomic operation prediction module, used for predicting the serialized atomic operations with a large-model-based atomic operation prediction model according to the prompt description or few-shot examples of the serialized atomic operations and the current state of the knowledge graph; and a query language conversion and execution module, used for converting the serialized atomic operations into the serialized knowledge graph interactive query language SPARQL and executing the SPARQL queries to obtain serialized interactive query results, thereby obtaining a processing result of the to-be-processed tool-level subtask whose tool call type is the knowledge graph.

As a preferred embodiment, the knowledge graph tool refinement learning module further comprises a reward module, which uses a large-model-based reward model to optimize the atomic operation prediction model of the large model according to the serialized interactive query results.

Specifically, this embodiment constructs a refinement learning platform for knowledge graph tools, on which an annotator can complete complex knowledge base queries using only clicks and similar operations. The back end records the human operations and the results observed, processes them into operation sequences, and provides these sequences to the model for training. The model can subsequently be deployed back on the platform so that human annotators can help adjust its behavior sequences.

KoPL is a programming language designed for complex question reasoning. KoPL groups the knowledge in a knowledge graph into 7 types of elements (entities, concepts, attributes, relations, attribute-type facts, relation-type facts, and qualifier facts) and, on this basis, abstracts 14 knowledge operation functions (such as Find, FilterConcept, FilterStr, Relate, And, and Or) and 13 query functions (such as QueryName, QueryAttr, QueryRelation, and SelectAmong). For the construction of atomic operations, this embodiment first follows the function set of KoPL and takes these functions as the atomic operations. Most complex reasoning tasks over knowledge graphs can be realized through these atomic operations.
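To make the idea of composing atomic operations concrete, the sketch below re-implements a few KoPL-style operations over a toy in-memory graph. It is a simplified illustration, not the KoPL library's API; the function signatures, the data layout, and the fictional entities are all assumptions.

```python
from typing import Any, List, Set

# Toy knowledge graph with fictional entities (illustrative data only).
ENTITIES = {
    "e1": {"name": "Example University", "concept": "university", "attrs": {"founded": 1900}},
    "e2": {"name": "Example City", "concept": "city", "attrs": {"population": 500000}},
}
RELATIONS = [("e1", "located in", "e2")]


def find(name: str) -> Set[str]:
    """Find: entities whose name matches exactly."""
    return {eid for eid, e in ENTITIES.items() if e["name"] == name}


def relate(entities: Set[str], predicate: str) -> Set[str]:
    """Relate: follow a relation forward from the given entities."""
    return {o for s, p, o in RELATIONS if s in entities and p == predicate}


def filter_concept(entities: Set[str], concept: str) -> Set[str]:
    """FilterConcept: keep entities that belong to the concept."""
    return {eid for eid in entities if ENTITIES[eid]["concept"] == concept}


def query_name(entities: Set[str]) -> List[str]:
    """QueryName: return the names of the entities."""
    return sorted(ENTITIES[eid]["name"] for eid in entities)


def query_attr(entities: Set[str], key: str) -> List[Any]:
    """QueryAttr: return an attribute value for each entity that has it."""
    return [ENTITIES[eid]["attrs"][key] for eid in sorted(entities) if key in ENTITIES[eid]["attrs"]]


# "In which city is Example University located?" as a sequence of atomic steps.
step1 = find("Example University")
step2 = relate(step1, "located in")
step3 = filter_concept(step2, "city")
answer = query_name(step3)
```

Each intermediate result (`step1`, `step2`, `step3`) is inspectable, which is what lets an annotator or a model notice an empty result and correct the sequence before the final query.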

For a large model to learn to imitate human knowledge query behavior, knowledge query behavior data based on atomic operations must be obtained. (1) First, based on the defined atomic operations, a simulated labeling platform for knowledge queries is constructed, on which behavior data and intermediate results are labeled and recorded manually. As for the sources of questions for the labeled data, existing data is often defined on a small subset of a knowledge graph or on a specific knowledge graph, or the questions are defined strictly according to a given knowledge graph, so the way questions are phrased differs greatly from how questions are asked in practice. This embodiment considers compositional generalization and zero-shot generalization to construct high-quality question-answer data. (2) Because manual labeling based on atomic operations is relatively complex, only a small amount of data can be labeled; the labeled data can be augmented by means of question paraphrasing.

For model training, BM-KG is trained on the collected atomic-operation behavior data of knowledge graph queries using a behavior imitation learning optimization method.

(1) Initial instruction definition module. On the basis of the user question, a prompt description or few-shot examples of the atomic operations are constructed as the task language instruction, which helps the large model understand the atomic operations.

(2) Atomic operation prediction module. Acting as the Actor in imitation learning, this module predicts the atomic operation at the t-th time step according to the instruction input and the current state.

(3) Query language conversion and execution module. This module converts the atomic operation into the knowledge graph query language SPARQL; because the set of atomic operations is fixed, the conversion can be implemented with predefined templates. The query is then executed to obtain the current query result.

(4) Reward module. This module scores the generated operation sequence according to the current execution result and the action, and the score serves as the reward for optimizing the atomic operation model.
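The template-based conversion in module (3) can be sketched as a mapping from each atomic operation to a SPARQL triple pattern, with one query built per prefix of the operation sequence so that every intermediate state can be inspected. The templates, the variable naming scheme, and the `http://example.org/` IRI are assumptions; executing the queries against an endpoint is outside this sketch.

```python
from typing import List, Tuple

Operation = Tuple[str, str]  # (atomic operation name, argument)

PREFIX = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>"


def to_sparql(ops: List[Operation]) -> str:
    """Convert a sequence of atomic operations into one SPARQL query via fixed templates."""
    patterns: List[str] = []
    cur = -1          # index of the variable holding the current entity set
    select = None
    for name, arg in ops:
        if name == "Find":
            cur += 1
            patterns.append(f'?e{cur} rdfs:label "{arg}"@en .')
        elif name == "Relate":
            patterns.append(f"?e{cur} <{arg}> ?e{cur + 1} .")
            cur += 1
        elif name == "FilterConcept":
            patterns.append(f"?e{cur} a <{arg}> .")
        elif name == "QueryName":
            patterns.append(f"?e{cur} rdfs:label ?answer .")
            select = "?answer"
        else:
            raise ValueError(f"no template for atomic operation {name!r}")
    if select is None:
        select = f"?e{cur}"
    body = "\n  ".join(patterns)
    return f"{PREFIX}\nSELECT {select} WHERE {{\n  {body}\n}}"


def stepwise_queries(ops: List[Operation]) -> List[str]:
    """One query per prefix of the sequence, so each intermediate result can be checked."""
    return [to_sparql(ops[:k]) for k in range(1, len(ops) + 1)]


ops = [
    ("Find", "Example University"),
    ("Relate", "http://example.org/locatedIn"),
    ("QueryName", ""),
]
queries = stepwise_queries(ops)
```

A reward module along the lines of (4) could then score a sequence by, for example, whether each step-wise query returns a non-empty result; that scoring rule is likewise an assumption rather than something the source specifies.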

In summary, the knowledge graph tool refinement learning module has the following innovation: by designing a serialized atomic decomposition, a knowledge question-answering method based on serialized interactive knowledge graph queries is realized. The intermediate state obtained from each serialized interactive query is used for adjustment, which avoids the situation in which an error or an empty result at any intermediate step leaves the whole question without an answer.

Referring to fig. 10, fig. 10 is a technical framework diagram of the tool-oriented API tool refinement learning module provided by the present invention.

As a preferred embodiment, the tool interface API tool refinement learning module includes: an API customization module, used for analyzing and collecting, based on user requirements, the set of tool interface APIs related to the task corresponding to the task instruction; an API integration module based on a unified interface, used for constructing a unified API repository from the set of tool interface APIs, integrating a plurality of API interfaces that can meet different user requirements, and establishing an API hierarchy; an API retrieval module based on dynamic requirements, used for performing API retrieval over the unified API repository according to the API hierarchy to obtain an API retrieval result; and a model learning module based on API pre-training, used for performing model learning based on API pre-training according to the API retrieval result, so as to obtain a processing result of the to-be-processed tool-level subtask whose tool call type is the tool interface API tool.

As a preferred embodiment, the tool interface API tool refinement learning module further comprises a few-shot generalization API tool manual learning module, used for updating and maintaining the unified API repository, optimizing the API hierarchy, and performing few-shot generalization tool manual learning based on the updated repository and the optimized hierarchy, so that the API tool manual can be retrieved on demand.

Specifically, as technology develops and application scenarios become increasingly complex and diverse, many problems need to be solved with a specific API interface. For example, a system of equations such as "2x^2 = 3y - 1 and x + y = 10xy" requires a specific equation-solving API interface; making a PPT about tool learning with pictures downloaded from samples.com requires a specific picture download API interface; and converting a PDF file into a Word document requires a specific document conversion API interface. Likewise, downloading photos on a specific theme, such as glaciers or aurorae, from Instagram also requires a specific API interface.

Because of the very high flexibility and customization of these API interfaces, it is difficult to unify them by one tool. To solve this problem, the present embodiment proposes an idea of directly modeling a specific API interface. By analyzing and collecting the demands of the users, the API set required by the users is obtained, and a unified API warehouse is constructed according to the analysis result. In the warehouse, various API interfaces are integrated, so that different requirements of users can be met. We have also built a strict API hierarchy so that the model can retrieve or use each sub-API.

For each API, its description, parameters, and return values are constructed, and some examples are provided so that the model can learn the usage and purpose of the API. For general-purpose APIs, annotation data is also built so that, through training, the model can call the APIs more precisely. The model can thus conveniently call a specific API interface according to the user's requirements and solve a variety of problems. The API repository is continually updated and maintained to ensure that it contains the latest and best API interfaces, and the API hierarchy also evolves to accommodate different application scenarios and user requirements. Meanwhile, the documentation and usage guides of the APIs are improved so that users can more easily have the model call these API interfaces. The result is a flexible, efficient, and easy-to-use API platform that lets users solve a variety of problems more easily.
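A minimal sketch of such a repository is shown below: each entry stores a description, parameters, a return value, and examples, and entries are grouped under categories to form a simple hierarchy. The class names, the two-level hierarchy, and the example entries are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ApiSpec:
    """Hypothetical repository entry for one API."""
    name: str
    description: str
    parameters: Dict[str, str]   # parameter name -> description
    returns: str
    examples: List[str] = field(default_factory=list)


class ApiRepository:
    """Unified repository with an assumed two-level hierarchy: category -> API."""

    def __init__(self) -> None:
        self._tree: Dict[str, Dict[str, ApiSpec]] = {}

    def register(self, category: str, spec: ApiSpec) -> None:
        """Add or update an API under a category."""
        self._tree.setdefault(category, {})[spec.name] = spec

    def categories(self) -> List[str]:
        return sorted(self._tree)

    def list_apis(self, category: str) -> List[str]:
        return sorted(self._tree.get(category, {}))

    def get(self, category: str, name: str) -> ApiSpec:
        return self._tree[category][name]


repo = ApiRepository()
repo.register("document", ApiSpec(
    name="convert_format",
    description="Convert a document to another file format.",
    parameters={"path": "input file path", "target": "target format, e.g. pdf"},
    returns="path of the converted file",
    examples=['convert_format("slides.pptx", "pdf")'],
))
repo.register("image", ApiSpec(
    name="download_image",
    description="Search for and download a picture.",
    parameters={"topic": "search keywords"},
    returns="path of the downloaded picture",
))
```

Registering under an existing name overwrites the old entry, which is one simple way to model the continual updating and maintenance the paragraph mentions.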

For the tool interface API refinement learning module to generalize across task instructions, a zero-shot or few-shot instruction learning method for tool interface APIs must be constructed. Common tools such as knowledge graphs, web browsers, and databases usually come with rich and complete interface description documents and usage examples, so a method based on instruction data constructed from the tool manual (Tool Manual) is adopted. Because a tool may expose a very large number of APIs and examples, and the limited input length of a large model makes it difficult to supply all of them as instruction definitions, the API manual is retrieved on demand. This is implemented with retrieval techniques such as dense vector retrieval (Dense Retrieval), the differentiable search index (Differentiable Search Index, DSI), or the approximate nearest neighbor algorithm Hierarchical Navigable Small World (HNSW), and the inherent hierarchical relationship of the APIs is exploited to make retrieval efficient and accurate.
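On-demand retrieval that exploits the API hierarchy can be sketched in two stages: first choose the best-matching category, then the best-matching API inside it. To keep the example self-contained, a bag-of-words cosine similarity is swapped in for learned dense embeddings, DSI, or an HNSW index; the manual entries and function names are invented.

```python
import math
from collections import Counter
from typing import Dict, Tuple

# Hypothetical two-level API manual: category -> API name -> description.
MANUAL: Dict[str, Dict[str, str]] = {
    "document": {
        "convert_pdf_to_word": "convert a pdf file into a word document",
        "make_slides": "create a ppt slide deck from an outline",
    },
    "image": {"download_image": "search and download a picture from a website"},
    "math": {"solve_equations": "solve a system of equations"},
}


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system would use a dense encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_api(query: str) -> Tuple[str, str]:
    """Stage 1 picks a category; stage 2 picks an API within that category."""
    q = embed(query)
    category = max(MANUAL, key=lambda c: cosine(q, embed(" ".join(MANUAL[c].values()))))
    name = max(MANUAL[category], key=lambda n: cosine(q, embed(MANUAL[category][n])))
    return category, name


hit = retrieve_api("convert my pdf into a word document")
```

Searching categories first means only one category's entries are scored in the second stage, which is how the hierarchy keeps retrieval cheap as the repository grows.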

An example of how integrated API calls are made in the system is described below. The user requests: "Help me make a PPT about tool learning, with the pictures downloaded from example.com". Analysis of this request yields three APIs: an API for making the PPT, an API for searching for and downloading pictures, and an API for format conversion. These three APIs are integrated into the BM-API. For a real user demand (still taking this request as the example), the model searches the API repository to obtain the matching API functions. If a function appeared when the model was trained, the model can call the API with higher accuracy; if it did not, the model must learn how to use the API by reading the API's manual and examples. The model calls the three required APIs in turn to process the files and obtains the final tool learning PDF file.
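The flow in this example can be sketched with three stub functions chained in sequence, plus a prompt builder that includes the manual only for APIs the model did not see during training. All function names, the `SEEN_IN_TRAINING` set, the manual strings, and the call order are hypothetical; the stubs only manipulate file names.

```python
from typing import Dict, Set

# Assumption: which APIs appeared in the model's training data.
SEEN_IN_TRAINING: Set[str] = {"make_slides"}
MANUALS: Dict[str, str] = {
    "download_image": "download_image(topic) -> path of a downloaded picture",
    "convert_format": "convert_format(path, target) -> path of the converted file",
}


def build_call_prompt(api_name: str, request: str) -> str:
    """Include the manual in the prompt only when the API is unseen."""
    if api_name in SEEN_IN_TRAINING:
        return f"Call {api_name} for: {request}"
    return f"Manual: {MANUALS[api_name]}\nCall {api_name} for: {request}"


def download_image(topic: str) -> str:
    """Stub for the picture search-and-download API."""
    return f"{topic}.png"


def make_slides(title: str, image_path: str) -> str:
    """Stub for the PPT-making API."""
    return f"{title}.pptx"


def convert_format(path: str, target: str) -> str:
    """Stub for the format conversion API."""
    return path.rsplit(".", 1)[0] + "." + target


def run_pipeline(topic: str) -> str:
    """Call the three APIs in turn, feeding each output to the next call."""
    image = download_image(topic)
    deck = make_slides(topic, image)
    return convert_format(deck, "pdf")


result = run_pipeline("tool_learning")
prompt_seen = build_call_prompt("make_slides", "make a PPT about tool learning")
prompt_unseen = build_call_prompt("convert_format", "convert the PPT to PDF")
```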

In summary, the tool interface API tool refinement learning module has the following innovation: it provides an API tool learning technology that achieves few-shot and zero-shot learning based on the descriptions and examples in the API manual and, combined with efficient API retrieval, supports the calling and use of APIs at large scale. It is applicable to any tool that has an API manual.

As a preferred embodiment, the system further comprises a unified interface module based on tool API instructions, used for understanding the API tool manual to obtain API tool manual information that is easy for the large model to understand, and for calling, according to that information, the API tool required by the to-be-processed tool-level subtask whose tool call type is the tool interface API tool.

Specifically, regarding the unified interface module based on tool API instructions: because the SPARQL query language of Wikidata, web search engines, various programming languages, and the like already provide rich interfaces, the large model can call the tool API interfaces directly to operate the tools. The first step is API manual understanding: ChatGPT or rule-based techniques are used to automatically parse the API manual and extract the relevant information (such as input parameters, output formats, and example code fragments), which is converted through a defined schema into a machine-readable format that the large model can easily understand. The second step is API retrieval: an appropriate API is selected automatically according to its relevance to and compatibility with the current task. Building on the tool API module avoids constructing, for each tool, a separate tool learning model that receives language instructions, and gives better extensibility and flexibility. In model optimization, both methods use reinforcement learning from environmental feedback through interactive learning.
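The rule-based branch of API manual understanding can be sketched as a small parser that turns a plain-text manual entry into a schema dictionary. The manual layout (`API:`, `Param:`, and so on), the schema keys, and the regular expression are assumptions made for this example; real manuals would need format-specific rules or a model-based parser.

```python
import re
from typing import Any, Dict

# Hypothetical manual entry in an assumed "Key: value" layout.
MANUAL_TEXT = """\
API: convert_format
Description: Convert a document to another file format.
Param: path (str) - path of the input file
Param: target (str) - target format such as pdf
Returns: path of the converted file
Example: convert_format("slides.pptx", "pdf")
"""

# Matches "name (type) - description".
PARAM_PATTERN = re.compile(r"(\w+)\s*\((\w+)\)\s*-\s*(.+)")


def parse_manual(text: str) -> Dict[str, Any]:
    """Extract name, description, parameters, return value, and examples."""
    schema: Dict[str, Any] = {
        "name": "", "description": "", "parameters": [], "returns": "", "examples": [],
    }
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value"
        key, value = key.strip(), value.strip()
        if key == "API":
            schema["name"] = value
        elif key == "Description":
            schema["description"] = value
        elif key == "Param":
            match = PARAM_PATTERN.match(value)
            if match:
                schema["parameters"].append(
                    {"name": match.group(1), "type": match.group(2), "description": match.group(3)}
                )
        elif key == "Returns":
            schema["returns"] = value
        elif key == "Example":
            schema["examples"].append(value)
    return schema


schema = parse_manual(MANUAL_TEXT)
```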

Referring to fig. 11, fig. 11 is a labeling record chart of the labeling platform provided by the present invention for unifying various tools.

Referring to fig. 12, fig. 12 is a tool learning data labeling platform interface diagram provided by the present invention.

A tool learning platform (Tool Learning Platform, TLP) is developed on the basis of the tool learning framework. It is intended to provide an efficient, interactive learning environment in which users can easily learn and use various tools, such as web browsers, knowledge graphs, structured databases, and large model tools. The platform has the following characteristics: (1) it is a three-party interactive platform, in which tools, models, and people can learn from one another interactively; (2) the executable environment of the tools sits in the back end of the platform, so that tool calls and their feedback can be presented to the model and to people in real time; (3) the platform is extensible, and a human annotator can perform a series of actions on it, including browsing actions such as clicking and searching, or perform function calls to complete more complex tasks; (4) it is a multi-step execution environment, in which the output of each step can be read by the model in order to decide the subsequent operation; (5) recording is automatic: the operations performed by the model and by people on the platform are recorded in full and serve as real human data for subsequent model training. The results of tool calls generated by the model are presented directly to the model and to people for interactive learning, and the corrected sequences of model predictions are recorded for continued training of the model.
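Characteristics (4) and (5) can be illustrated with a minimal multi-step environment that executes a tool call, returns the observation, and records every step by either a human or the model. The class and field names, the stub tool, and the JSON Lines export format are assumptions rather than details of the actual platform.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One recorded interaction on the platform."""
    actor: str        # "human" or "model"
    action: str
    argument: str
    observation: str


class ToolEnvironment:
    """Hypothetical multi-step environment with automatic recording."""

    def __init__(self, tools: Dict[str, Callable[[str], str]]) -> None:
        self.tools = tools
        self.trajectory: List[Step] = []

    def step(self, actor: str, action: str, argument: str) -> str:
        """Execute the tool, record the step, and return the observation."""
        observation = self.tools[action](argument)
        self.trajectory.append(Step(actor, action, argument, observation))
        return observation

    def export(self) -> str:
        """Serialize the trajectory as JSON Lines for later training."""
        return "\n".join(json.dumps(asdict(s)) for s in self.trajectory)


env = ToolEnvironment({"search": lambda q: f"results for {q}"})
first = env.step("model", "search", "sky blue")
second = env.step("human", "search", "why is the sky blue")  # human correction
log = env.export()
```

Recording the model's step and the human's corrected step side by side is what would make the log usable as training data for the continued training described above.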

In summary, the tool learning platform has the following innovations: a large model-oriented multi-type tool learning platform is constructed, and the platform can provide an efficient and interactive learning environment, so that a user and BM can easily learn and use various tools.

Referring to fig. 13, fig. 13 is a flow chart of a large model-oriented general tool collaborative and refined learning method provided by the invention.

The invention also provides a large model-oriented universal tool cooperation and refinement learning method, which is applicable to the large model-oriented universal tool cooperation and refinement learning system and comprises the following steps:

1301: the dynamic combination mechanism module of different tools decomposes a task corresponding to a task instruction to be processed into a plurality of tool-level subtasks through a large model, constructs a universal tool call graph based on the plurality of tool-level subtasks and a universal tool interface, and establishes a dynamic routing mechanism of the universal tool through reinforcement learning and instruction learning;

1302: the unified interface module based on the language instruction calls a universal tool interface based on the language instruction according to a plurality of tool-level subtasks, a universal tool call graph and a dynamic routing mechanism;

1303: the universal tool refinement learning module processes the plurality of tool-level subtasks according to the universal tools corresponding to the universal tool interfaces based on the language instructions to obtain processing results of the plurality of tool-level subtasks;

1304: the universal tool refinement learning module is one or more universal tool modules which are updated and maintained in real time; and the comprehensive reasoning module of the execution process and the result information carries out integrated reasoning on the execution process information and the processing results of the plurality of tool-level subtasks so as to obtain a final answer of the task instruction.
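Steps 1301 to 1304 can be read as a decompose, route, execute, and integrate loop. The sketch below uses fixed stubs in place of the large model and the tools, and a dictionary lookup in place of the learned dynamic routing mechanism; every function name and return value is an assumption for illustration.

```python
from typing import Callable, Dict, List, Tuple

Subtask = Tuple[str, str]  # (tool call type, language instruction)


def decompose(task: str) -> List[Subtask]:
    """Step 1301 stand-in: a large model would produce the tool-level subtasks."""
    return [
        ("web_browser", f"search the web for: {task}"),
        ("knowledge_graph", f"query the graph for: {task}"),
    ]


# Stub tools behind a common call signature (instruction in, result out).
TOOLS: Dict[str, Callable[[str], str]] = {
    "web_browser": lambda instruction: f"[web result] {instruction}",
    "knowledge_graph": lambda instruction: f"[kg result] {instruction}",
    "api": lambda instruction: f"[api result] {instruction}",
}


def route(subtask: Subtask) -> Callable[[str], str]:
    """Step 1302 stand-in: table lookup instead of a learned routing policy."""
    return TOOLS[subtask[0]]


def integrate(task: str, trace: List[Tuple[Subtask, str]]) -> str:
    """Step 1304 stand-in: combine the process information and results."""
    evidence = "; ".join(result for _, result in trace)
    return f"Answer to '{task}' based on: {evidence}"


def run(task: str) -> str:
    trace: List[Tuple[Subtask, str]] = []
    for subtask in decompose(task):
        tool = route(subtask)
        trace.append((subtask, tool(subtask[1])))  # step 1303: execute the tool
    return integrate(task, trace)


final_answer = run("Why is the sky blue?")
```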

For a description of the large model-oriented universal tool collaboration and refinement learning method provided by the present invention, reference is made to the above system embodiment; the description is not repeated here.

The main innovations of the invention are as follows: 1) The invention constructs an effective universal tool collaboration and refinement learning technical framework, which comprises a dynamic combination mechanism module 1 of different tools, a unified interface module 2 based on language instructions, a universal tool refinement learning module 3 and an execution process and result information comprehensive reasoning module 4; 2) A new multi-type tool collaborative learning technical framework is provided, namely a multi-tool collaborative technical framework based on a language instruction interface and an API instruction interface; 3) Constructing a multi-type tool learning platform facing BM, wherein the platform can provide an efficient and interactive learning environment, and users and BM can easily learn and use various tools through the platform; 4) The method comprises the steps of constructing a BM tool learning technology oriented to universal tool refinement operation, comprising a tool refinement learning technology BM-Web of an interactive Web browser fusing visual information, a tool refinement learning technology BM-KG based on serialized knowledge graph query and an API tool refinement learning method BM-API supporting few sample/zero sample learning.

The universal tool collaboration and refinement learning system constructed for the large model has good practical value:

The large model tool learning system uses predefined tools to extend the task processing capabilities of the large model. Combining the execution results of the tools with the generation results of the large model can improve the accuracy of model generation; optimizing tools for a specific field enables the model to make better use of them to solve tasks, which improves the practicability and operability of the research. Compared with using only a general-purpose language model, the large model tool learning system has the following advantages: it avoids excessive dependence on memorization and strengthens the ability to update in real time; it performs better in specific fields; it provides new opportunities for previously unsolved tasks; it supports more natural human-machine interaction; it improves the interpretability and credibility of the model; it improves the robustness of the model; and it enhances the understanding of low-resource languages. The system can therefore realize new task solutions and provides an effective way to address problems such as large model "hallucination".

By constructing a universal tool collaboration and refinement learning system for the large model, the invention enables the large model to imitate the cognitive behavior of humans using tools, and constructs methods such as complex task decomposition, reasoning and planning, and interactive feedback, so that the large model can acquire autonomous cognitive learning capability.

The invention draws on and learns from the cognitive behavior of humans using tools, constructs large model tool learning for universal tools such as web browsers, knowledge graphs, and tool APIs, and carries out practical verification on application tasks such as intelligent question answering and task modeling in complex scenarios, thereby improving the ability of the large model to solve complex tasks in practical applications.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A large-model-oriented universal tool collaboration and refinement learning system, comprising:

a dynamic combination mechanism module for different tools, whose input end is connected to the output end of a large model, and which is configured to decompose, by means of the large model, the task corresponding to a task instruction to be processed into a plurality of tool-level subtasks, to construct a universal tool call graph based on the plurality of tool-level subtasks and universal tool interfaces, and to establish a dynamic routing mechanism for the universal tools through reinforcement learning and instruction learning;

a unified interface module based on language instructions, whose input end is connected to the output end of the dynamic combination mechanism module for different tools, and which is configured to call the language-instruction-based universal tool interfaces according to the plurality of tool-level subtasks, the universal tool call graph and the dynamic routing mechanism;

a universal tool refinement learning module, whose input end is connected to the output end of the unified interface module based on language instructions, and which is configured to process the plurality of tool-level subtasks using the universal tools corresponding to the language-instruction-based universal tool interfaces, so as to obtain processing results of the plurality of tool-level subtasks, the universal tool refinement learning module being one or more universal tool modules that are updated and maintained in real time; and

an execution process and result information comprehensive reasoning module, whose input end is connected to the output end of the universal tool refinement learning module and whose output end is connected to the input end of the large model, and which is configured to perform integrated reasoning over the execution process information and the processing results of the plurality of tool-level subtasks, so as to obtain a final answer to the task instruction.
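
By way of non-limiting illustration, the data flow recited in claim 1 could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `Subtask`, `topological_order` and `run_pipeline` are hypothetical, the call graph is assumed to be acyclic, a plain dictionary lookup stands in for the learned dynamic routing mechanism, and simple stub functions stand in for the large model and the tools.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Subtask:
    """One tool-level subtask (all field names are hypothetical)."""
    name: str
    tool_type: str                      # e.g. "knowledge_graph", "api"
    instruction: str                    # language instruction passed to the tool
    depends_on: List[str] = field(default_factory=list)


def topological_order(subtasks: List[Subtask]) -> List[Subtask]:
    """Order the (assumed acyclic) call graph so dependencies run first."""
    by_name = {s.name: s for s in subtasks}
    ordered: List[Subtask] = []
    seen = set()

    def visit(s: Subtask) -> None:
        if s.name in seen:
            return
        seen.add(s.name)
        for dep in s.depends_on:
            visit(by_name[dep])
        ordered.append(s)

    for s in subtasks:
        visit(s)
    return ordered


Tool = Callable[[str, Dict[str, str]], str]
Trace = List[Tuple[str, str, str]]


def run_pipeline(subtasks: List[Subtask],
                 tools: Dict[str, Tool],
                 integrate: Callable[[Trace], str]) -> str:
    """Call each tool through one uniform interface, then integrate."""
    results: Dict[str, str] = {}
    trace: Trace = []                   # execution process information
    for s in topological_order(subtasks):
        tool = tools[s.tool_type]       # static lookup stands in for learned routing
        context = {d: results[d] for d in s.depends_on}
        results[s.name] = tool(s.instruction, context)
        trace.append((s.name, s.tool_type, results[s.name]))
    return integrate(trace)             # stands in for integrated reasoning


# Stub tools and a hand-written decomposition, for illustration only.
tools: Dict[str, Tool] = {
    "knowledge_graph": lambda instr, ctx: "Paris",
    "api": lambda instr, ctx: f"weather({ctx['capital']})=sunny",
}
subtasks = [
    Subtask("weather", "api", "Get today's weather", depends_on=["capital"]),
    Subtask("capital", "knowledge_graph", "Find the capital of France"),
]
answer = run_pipeline(subtasks, tools, lambda trace: trace[-1][2])
```

In this sketch every tool shares the signature `(instruction, context) -> result`, which is one possible reading of a unified, language-instruction-based interface; the recorded `trace` plays the role of the execution process information that the final reasoning step consumes.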

2. The large-model-oriented universal tool collaboration and refinement learning system of claim 1, wherein the universal tool refinement learning module comprises:

an interactive web browser tool refinement learning module, configured to process, using an interactive web browser tool, a to-be-processed tool-level subtask whose tool call type is web browser, so as to obtain a processing result of that subtask;

a knowledge graph tool refinement learning module, configured to process, using a knowledge graph tool, a to-be-processed tool-level subtask whose tool call type is knowledge graph, so as to obtain a processing result of that subtask; and

a tool interface API tool refinement learning module, configured to process, using an API tool, a to-be-processed tool-level subtask whose tool call type is tool interface API tool, so as to obtain a processing result of that subtask.

3. The large-model-oriented universal tool collaboration and refinement learning system of claim 2, wherein the interactive web browser tool refinement learning module comprises:

a fact retrieval acceleration module, configured to decompose the task corresponding to the task instruction to be processed according to a tree structure, so as to obtain a plurality of action-level subtasks, and to predict retrieval queries for the plurality of action-level subtasks; and

a fact extraction module, configured to extract, from the text of the current page of the interactive web browser, fact information related to the plurality of action-level subtasks, so as to obtain the processing result of the to-be-processed tool-level subtask whose tool call type is web browser, the current page of the interactive web browser being obtained by searching with the retrieval queries of the plurality of action-level subtasks.
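
By way of non-limiting illustration, the tree-structured decomposition and fact extraction recited in claim 3 could be sketched as follows. The sketch rests on several assumptions: `ActionNode`, `leaf_queries` and `extract_facts` are hypothetical names, the tree is written by hand rather than predicted by the large model, each leaf question is used directly as its retrieval query, and a simple keyword overlap stands in for model-based fact extraction.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActionNode:
    """A node of the decomposition tree; leaves are action-level subtasks."""
    question: str
    children: List["ActionNode"] = field(default_factory=list)


def leaf_queries(node: ActionNode) -> List[str]:
    """Collect one retrieval query per leaf, left to right."""
    if not node.children:
        return [node.question]
    queries: List[str] = []
    for child in node.children:
        queries.extend(leaf_queries(child))
    return queries


def extract_facts(page_text: str, query: str) -> List[str]:
    """Return sentences sharing a keyword with the query.

    Keyword overlap is only a stand-in for the large model's extraction.
    """
    keywords = {w.lower() for w in query.split() if len(w) > 3}
    sentences = [s.strip() for s in page_text.split(".") if s.strip()]
    return [s for s in sentences
            if keywords & {w.lower() for w in s.split()}]


# Hand-written tree and page text, for illustration only.
root = ActionNode(
    "Where is Tsinghua located and when was it founded",
    children=[
        ActionNode("Where is Tsinghua located"),
        ActionNode("When was Tsinghua founded"),
    ],
)
page = ("Tsinghua University is located in Beijing. "
        "It was founded in 1911. Pandas eat bamboo.")
queries = leaf_queries(root)
facts = extract_facts(page, queries[0])
```

Decomposing first and querying per leaf keeps each retrieval query narrow, so the extraction step only has to match a page against one focused question at a time.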

4. The large-model-oriented universal tool collaboration and refinement learning system of claim 3, wherein the interactive web browser tool refinement learning module further comprises a visual information fusion module, configured to feed text information and visual information to the large model as structured input, so as to generate the text-information retrieval queries, the newly extracted facts and the next actions for the plurality of action-level subtasks.

5. The large-model-oriented universal tool collaboration and refinement learning system of claim 2, wherein the knowledge graph tool refinement learning module comprises:

an initial instruction definition module, configured to construct, for a knowledge-based task, a prompt description or few-shot examples of serialized atomic operations;

an atomic operation prediction module, configured to predict the serialized atomic operations with a large-model-based atomic operation prediction model, according to the prompt description or few-shot examples of the serialized atomic operations and the current state with respect to the knowledge graph; and

a query language conversion and execution module, configured to convert the serialized atomic operations into serialized queries in the knowledge graph query language SPARQL, and to execute those SPARQL queries to obtain serialized interactive query results, so as to obtain the processing result of the to-be-processed tool-level subtask whose tool call type is knowledge graph.
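
By way of non-limiting illustration, the conversion of a sequence of atomic operations into SPARQL recited in claim 5 could be sketched as follows. The claims do not define the set of atomic operations, so the sketch assumes a single hypothetical operation, `Hop`, that follows one relation from the current node; `to_sparql` and the `ex:` prefix are likewise assumptions. For brevity the whole sequence is compiled into one query, and execution against a knowledge graph endpoint is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hop:
    """Hypothetical atomic operation: follow one relation from the current node."""
    relation: str                       # prefixed name, e.g. "ex:capital"


def to_sparql(start: str, hops: List[Hop], prefixes: Dict[str, str]) -> str:
    """Compile a sequence of hops into one SPARQL SELECT query."""
    if not hops:
        raise ValueError("at least one atomic operation is required")
    header = "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in prefixes.items())
    subject = start
    patterns = []
    for i, hop in enumerate(hops):
        obj = f"?v{i}"                  # fresh variable for each hop
        patterns.append(f"  {subject} {hop.relation} {obj} .")
        subject = obj                   # the next hop starts where this one ends
    body = "\n".join(patterns)
    return f"{header}\nSELECT {subject} WHERE {{\n{body}\n}}"


# "What is the population of the capital of France?" as two hops.
query = to_sparql(
    "ex:France",
    [Hop("ex:capital"), Hop("ex:population")],
    {"ex": "http://example.org/"},
)
print(query)
```

Having the model predict a small vocabulary of atomic operations, rather than raw SPARQL text, means the query syntax is produced by deterministic code, which is one plausible motivation for the conversion step.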

6. The large-model-oriented universal tool collaboration and refinement learning system of claim 5, wherein the knowledge graph tool refinement learning module further comprises a reward module, configured to optimize the large-model-based atomic operation prediction model by means of a large-model-based reward model, according to the serialized interactive query results.

7. The large-model-oriented universal tool collaboration and refinement learning system of claim 2, wherein the tool interface API tool refinement learning module comprises:

an API customization module, configured to analyze and collect, based on user requirements, a set of tool interface APIs related to the task corresponding to the task instruction;

a unified-interface-based API integration module, configured to build a unified API repository from the set of tool interface APIs and to establish an API hierarchy, the API repository integrating a plurality of APIs that meet different user requirements;

an API retrieval module, configured to retrieve APIs on dynamic demand from the unified API repository according to the API hierarchy, so as to obtain an API retrieval result; and

an API-pre-training-based model learning module, configured to perform model learning based on API pre-training according to the API retrieval result, so as to obtain the processing result of the to-be-processed tool-level subtask whose tool call type is tool interface API tool.
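
By way of non-limiting illustration, the unified API repository and hierarchy-guided retrieval recited in claim 7 could be sketched as follows. The sketch assumes a one-level hierarchy keyed by category and scores candidates by word overlap between the stated need and each API description; `ApiEntry`, `ApiRepository` and the example API names are hypothetical, and the pre-training-based model learning step is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ApiEntry:
    """One API in the repository (all fields are hypothetical)."""
    name: str
    category: str                       # position in the assumed one-level hierarchy
    description: str


class ApiRepository:
    """Unified store of APIs, grouped by category."""

    def __init__(self) -> None:
        self._by_category: Dict[str, List[ApiEntry]] = {}

    def register(self, entry: ApiEntry) -> None:
        self._by_category.setdefault(entry.category, []).append(entry)

    def retrieve(self, category: str, need: str, top_k: int = 1) -> List[ApiEntry]:
        """Narrow by category first, then rank by word overlap with the need."""
        need_words = set(need.lower().split())
        candidates = self._by_category.get(category, [])
        ranked = sorted(
            candidates,
            key=lambda e: len(need_words & set(e.description.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]


repo = ApiRepository()
repo.register(ApiEntry("get_air_quality", "weather",
                       "return the air quality index for a city"))
repo.register(ApiEntry("get_forecast", "weather",
                       "return the weather forecast for a city"))
best = repo.retrieve("weather", "weather forecast tomorrow")
```

Filtering by hierarchy level before ranking keeps the candidate set small, so the ranking step, whatever model implements it, only compares APIs that are already plausible for the subtask.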

8. The large-model-oriented universal tool collaboration and refinement learning system of claim 7, wherein the tool interface API tool refinement learning module further comprises a few-shot generalized API tool manual learning module, configured to update and maintain the unified API repository and optimize the API hierarchy, and to perform few-shot generalized tool manual learning based on the updated and maintained unified API repository and the optimized API hierarchy, so that the few-shot generalized API tool manuals are retrieved on demand.

9. The large-model-oriented universal tool collaboration and refinement learning system according to any one of claims 1 to 8, further comprising:

a unified interface module, configured to interpret an API tool manual so as to obtain API tool manual information that is easy for the large model to understand, and to call, according to that API tool manual information, the API tool required by the to-be-processed tool-level subtask whose tool call type is tool interface API tool.

10. A large-model-oriented universal tool collaboration and refinement learning method, the method being applied to the large-model-oriented universal tool collaboration and refinement learning system of any one of claims 1 to 9, and comprising:

decomposing, by the dynamic combination mechanism module for different tools and by means of the large model, the task corresponding to a task instruction to be processed into a plurality of tool-level subtasks, constructing a universal tool call graph based on the plurality of tool-level subtasks and universal tool interfaces, and establishing a dynamic routing mechanism for the universal tools through reinforcement learning and instruction learning;

calling, by the unified interface module based on language instructions, the language-instruction-based universal tool interfaces according to the plurality of tool-level subtasks, the universal tool call graph and the dynamic routing mechanism;

processing, by the universal tool refinement learning module, the plurality of tool-level subtasks using the universal tools corresponding to the language-instruction-based universal tool interfaces, so as to obtain processing results of the plurality of tool-level subtasks, the universal tool refinement learning module being one or more universal tool modules that are updated and maintained in real time; and

performing, by the execution process and result information comprehensive reasoning module, integrated reasoning over the execution process information and the processing results of the plurality of tool-level subtasks, so as to obtain a final answer to the task instruction.