CN117575008A - Training sample generation method, model training method, knowledge question-answering method and knowledge question-answering device - Google Patents

Training sample generation method, model training method, knowledge question-answering method and knowledge question-answering device

Info

Publication number
CN117575008A
CN117575008A (application CN202311520323.9A)
Authority
CN
China
Prior art keywords
information
sample
interface
tool
answer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311520323.9A
Other languages
Chinese (zh)
Inventor
桂安春
李建
杨奕凡
刘星言
代勇
杜楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202311520323.9A priority Critical patent/CN117575008A/en
Publication of CN117575008A publication Critical patent/CN117575008A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

Embodiments of the present application provide a training sample generation method, a model training method, a knowledge question-answering method and a knowledge question-answering device, which can be applied to scenarios such as cloud technology, artificial intelligence, intelligent transportation and assisted driving. The training sample generation method comprises the following steps: for any tool interface in a pre-constructed tool library, if the similarity between the interface information of the tool interface and the sample question information is greater than or equal to a preset threshold, increasing the probability that the tool interface is selected; performing random sampling in the pre-constructed tool library based on the selection probability of each tool interface to obtain interface information of at least one tool interface, and generating a sample tool library based on that interface information; and generating training samples based on the sample question information and the sample tool library. With this method there is no need to construct a large number of training samples covering every possible knowledge question-answering scenario, so the efficiency of training sample generation can be effectively improved.

Description

Training sample generation method, model training method, knowledge question-answering method and knowledge question-answering device
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a training sample generation method, a model training method, a knowledge question-answering method and corresponding devices.
Background
In recent years, the field of artificial intelligence has made remarkable progress, particularly in natural language processing. Language models are usually trained on large amounts of text data with the aim of learning and understanding human language, enabling tasks such as intelligent dialogue, automatic completion and semantic reasoning.
In the prior art, improving model performance and the accuracy of knowledge question answering requires constructing a large number of training samples covering all possible knowledge question-answering scenarios. However, constructing training samples in this way is costly, time-consuming, and highly inefficient.
Disclosure of Invention
Embodiments of the present application provide a training sample generation method, a model training method, a knowledge question-answering method and corresponding devices to solve at least one of the above technical problems. The technical solutions are as follows:
in a first aspect, an embodiment of the present application provides a training sample generating method based on knowledge question and answer, including:
for any tool interface in a pre-constructed tool library, if the similarity between the interface information of the tool interface and the sample question information is greater than or equal to a preset threshold, increasing the probability that the tool interface is selected;
performing random sampling in the pre-constructed tool library based on the selection probability of each tool interface to obtain interface information of at least one tool interface, and generating a sample tool library based on that interface information; and
generating training samples based on the sample question information and the sample tool library.
In a second aspect, an embodiment of the present application provides a knowledge question-and-answer based model training method, including:
acquiring training samples pre-constructed by the knowledge question and answer-based training sample generation method provided by the first aspect, wherein the training samples comprise sample question information, a sample tool library, sample interface return information and sample answer information; the sample interface return information includes information returned based on invoking a tool interface in the sample tool library; the sample answer information includes information determined based on the sample interface return information;
training an initial generative language model with the training samples to obtain a target generative language model for knowledge question answering;
training the initial generative language model includes: acquiring interface information of a prediction tool interface from the sample tool library based on the sample question information; calling the prediction tool interface through the interface information to obtain prediction interface return information; determining predicted answer information corresponding to the sample question information based on at least one of the prediction interface return information and prior answer information determined from prior information; determining a target loss value based on the sample answer information and the predicted answer information; and updating the initial generative language model based on the target loss value.
In a third aspect, an embodiment of the present application provides a knowledge question-answering method, including:
acquiring target question information input by an operation object;
if the target generative language model determines that a tool interface call is required for the target question information, and a target tool interface corresponding to the target question information exists in a pre-constructed tool library: acquiring interface information of the target tool interface, calling the corresponding target tool interface through the interface information to obtain first answer information corresponding to the target question information, determining second answer information corresponding to the target question information based on prior information, and outputting target answer information corresponding to the target question information based on at least one of the first answer information and the second answer information; the target generative language model is trained on samples produced by the above knowledge question-and-answer based training sample generation method.
In a fourth aspect, an embodiment of the present application provides a training sample generating device based on knowledge question and answer, including:
a probability adjustment module, configured to, for any tool interface in a pre-constructed tool library, increase the probability that the tool interface is selected if the similarity between the interface information of the tool interface and the sample question information is greater than or equal to a preset threshold;
a random sampling module, configured to perform random sampling in the pre-constructed tool library based on the selection probability of each tool interface to obtain interface information of at least one tool interface, and to generate a sample tool library based on that interface information; and
a sample generation module, configured to generate training samples based on the sample question information and the sample tool library.
In a fifth aspect, an embodiment of the present application provides a knowledge question-and-answer based model training apparatus, including:
a sample acquisition module, configured to acquire training samples pre-constructed by the knowledge question-and-answer based training sample generating device provided in the fourth aspect, where the training samples include sample question information, a sample tool library, sample interface return information and sample answer information; the sample interface return information includes information returned by invoking a tool interface in the sample tool library; and the sample answer information includes information determined based on the sample interface return information; and
a model training module, configured to train an initial generative language model with the training samples to obtain a target generative language model for knowledge question answering.
Training the initial generative language model includes: acquiring interface information of a prediction tool interface from the sample tool library based on the sample question information; calling the prediction tool interface through the interface information to obtain prediction interface return information; determining predicted answer information corresponding to the sample question information based on at least one of the prediction interface return information and prior answer information determined from prior information; determining a target loss value based on the sample answer information and the predicted answer information; and updating the initial generative language model based on the target loss value.
In a sixth aspect, an embodiment of the present application provides a knowledge question-answering apparatus, including:
a question acquisition module, configured to acquire target question information input by an operation object; and
an answer output module, configured to, if the target generative language model determines that a tool interface call is required for the target question information, and a target tool interface corresponding to the target question information exists in a pre-constructed tool library: acquire interface information of the target tool interface, call the corresponding target tool interface through the interface information to obtain first answer information corresponding to the target question information, determine second answer information corresponding to the target question information based on prior information, and output target answer information corresponding to the target question information based on at least one of the first answer information and the second answer information; the target generative language model is trained on samples produced by the knowledge question-and-answer based training sample generating device provided in the fourth aspect.
In a seventh aspect, embodiments of the present application provide an electronic device, where the electronic device includes a memory, a processor, and a computer program stored on the memory, where the processor executes the computer program to implement the methods provided in the first aspect, the second aspect, and the third aspect.
In an eighth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the methods provided in the first, second and third aspects above.
In a ninth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements the methods provided in the first, second and third aspects above.
The technical solutions provided in the embodiments of the present application bring the following beneficial effects:
In a first aspect, an embodiment of the present application provides a knowledge question-and-answer based training sample generation method. In contrast with the prior art, which constructs training samples for all possible knowledge question-answering scenarios, this embodiment builds training samples around the tool interfaces a model can call. In the related art, different tool interfaces are provided for different kinds of content, and a corresponding tool library is pre-constructed from the callable tool interfaces. For any tool interface in the pre-constructed tool library, the similarity between the tool interface and the sample question information is considered; if the similarity is greater than or equal to a preset threshold (i.e., the similarity is high, indicating that the tool interface can help the model being trained solve the question), the probability that the tool interface is selected can be increased. Then, random sampling is performed in the constructed tool library based on the probability that each tool interface is selected, yielding interface information of at least one tool interface, and a sample tool library is generated from that interface information. Training samples are ultimately generated based on the sample question information and the sample tool library.
With this implementation, a random sampling strategy driven by the sample question information is used to acquire interface information of at least one tool interface from the pre-built tool library, generate a sample tool library, and construct training samples usable for model training. There is no need to build a large number of training samples for all possible knowledge question-answering scenarios, so the cost of constructing training samples is effectively reduced and construction efficiency is improved; at the same time, the generalization of tool interface calling is strengthened and the dependence on large numbers of training samples during model training is reduced.
In a second aspect, an embodiment of the present application provides a knowledge question-and-answer based model training method. Specifically, training samples pre-constructed by the method provided in the first aspect may be obtained, and an initial generative language model is trained with them to obtain a target generative language model for knowledge question answering. A single training sample includes sample question information, a sample tool library, sample interface return information and sample answer information; the sample tool library includes interface information of tool interfaces obtained from the pre-constructed tool library by the random sampling strategy; and the sample answer information includes information determined based on the sample interface return information. During training, the generative model can acquire interface information of a prediction tool interface from the sample tool library based on the sample question information, call the corresponding prediction tool interface through that interface information to obtain prediction interface return information, and determine predicted answer information corresponding to the sample question information based on at least one of the prediction interface return information and prior answer information determined from prior information; a loss value is then computed from the sample answer information and the predicted answer information, and the generative model is updated based on that loss value.
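As a concrete but purely illustrative reading of the loss step above, the target loss can be thought of as an average negative log-likelihood of the sample answer tokens under the model's predicted token distributions. The sketch below is written under that assumption; the patent does not fix an exact loss formulation, and the function and argument names are hypothetical.

```python
import math

def target_loss(sample_answer_tokens, predicted_token_probs):
    """Average negative log-likelihood of the sample answer under the model's
    per-step predicted distributions (an illustrative stand-in for the patent's
    'target loss value', not its exact formulation).

    sample_answer_tokens: list of ground-truth tokens from sample answer info.
    predicted_token_probs: one dict {token: probability} per answer position.
    """
    assert len(sample_answer_tokens) == len(predicted_token_probs)
    nll = 0.0
    for token, probs in zip(sample_answer_tokens, predicted_token_probs):
        # Clamp to avoid log(0) when the model assigns zero mass to the token.
        nll += -math.log(max(probs.get(token, 0.0), 1e-12))
    return nll / len(sample_answer_tokens)
```

The returned scalar would then drive a gradient update of the initial generative language model.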
With this scheme, on the premise that the model can learn to discriminate among tools, the number of training samples that effectively call tool interfaces is increased, which accelerates training and convergence. Because the sample tool library is constructed with a random sampling strategy, the trained model's generalization over tool calls is effectively enhanced and its dependence on large numbers of training samples is reduced (training samples need not be constructed for every possible knowledge question-answering scenario). The resulting target generative language model can effectively call a variety of tools, skip many complex reasoning steps when executing knowledge question-answering tasks, and generate answer information more efficiently.
In a third aspect, an embodiment of the present application provides a knowledge question-answering method using the target generative language model trained by the method provided in the second aspect. When target question information input by an operation object is obtained, the model first determines whether a tool must be invoked to answer it. If a tool is required and a target tool interface corresponding to the target question information exists in the pre-built tool library, the interface information of the target tool interface can be obtained and the corresponding target tool interface called through it, yielding first answer information corresponding to the target question information (i.e., answer information returned by the target tool interface); second answer information corresponding to the target question information is determined based on prior information; and target answer information is output based on at least one of the first answer information and the second answer information. With this scheme, the model can effectively perform tool calls, its range of application is expanded, reasoning steps required when executing knowledge question-answering tasks are saved, and answer information is generated more efficiently.
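The inference flow of the third aspect can be sketched as follows. The helpers `needs_tool`, `select_tool`, `answer_from_prior` and `compose_answer` are hypothetical names standing in for behavior a real target generative language model would exhibit through prompting or fine-tuned generation; they are not API names from the patent.

```python
def answer_question(question, model, tool_library):
    """Sketch of the knowledge question-answering flow: decide whether a tool
    call is needed, call it if a matching interface exists, fall back on the
    model's prior-knowledge answer, and compose the final target answer."""
    first_answer = None
    if model.needs_tool(question):
        interface = model.select_tool(question, tool_library)  # None if no match
        if interface is not None:
            # First answer information: what the target tool interface returns.
            first_answer = interface["call"](question)
    # Second answer information: determined from the model's prior information.
    second_answer = model.answer_from_prior(question)
    # Target answer information: based on at least one of the two.
    return model.compose_answer(question, first_answer, second_answer)
```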
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a schematic diagram of an example of a related art tool interface call;
FIG. 2 is a flowchart of a training sample generation method based on knowledge questions and answers provided in an embodiment of the present application;
FIG. 3 is a flowchart of a knowledge question-and-answer based model training method according to an embodiment of the present application;
FIG. 4 is a flowchart of a knowledge question-answering method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a model architecture according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a pre-constructed tool library according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a training sample according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a tool interface random sampling provided in an embodiment of the present application;
FIG. 9 is a graph comparing effects of an implementation of one embodiment provided in the present application;
FIG. 10 is a schematic diagram of a tool interface call for a knowledge question and answer provided in an embodiment of the present application;
fig. 11 is a schematic diagram of a training sample generating device based on knowledge questions and answers provided in an embodiment of the present application;
FIG. 12 is a schematic diagram of a knowledge question-and-answer based model training apparatus according to an embodiment of the present application;
fig. 13 is a schematic diagram of a knowledge question-answering device according to an embodiment of the present application;
Fig. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the drawings in the present application. It should be understood that the embodiments described below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application, and the technical solutions of the embodiments of the present application are not limited.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms "comprises" and "comprising", when used in this application, specify the presence of stated features, information, data, steps, operations, elements and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" indicates at least one of the items it joins; e.g., "A and/or B" may be implemented as "A", as "B", or as "A and B".
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence is thus the study of the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, pre-training model technology, operation/interaction systems, mechatronics, and the like. A pre-trained model, also called a large model or foundation model, can be fine-tuned and then widely applied to downstream tasks in all major directions of artificial intelligence. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
The embodiments of the present application involve natural language processing (NLP) technology. Natural language processing is an important direction in computer science and artificial intelligence that studies theories and methods enabling effective communication between humans and computers in natural language. It concerns natural language, the language people use in daily life, and is therefore closely related to linguistics as well as to computer science and mathematics. The pre-trained model, an important technique for model training in artificial intelligence, developed from the large language model (Large Language Model) in the NLP field; through fine-tuning, a large language model can be widely applied to downstream tasks. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question-answering robots, knowledge graph techniques, and the like.
With research and progress in artificial intelligence technology, it is being studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, digital twins, virtual humans, robots, artificial-intelligence-generated content (AIGC), conversational interaction, smart healthcare, smart customer service, and game AI. It is believed that, as technology develops, artificial intelligence will be applied in still more fields and play an increasingly important role.
Existing language models often lack the ability to directly call tool APIs (Application Program Interfaces); they mainly rely on training data to master language expression and logical relationships, rather than implementing interface calls between program code and APIs. In the related art, by presetting a specific prompt template, a language model can be given a certain level of API-calling capability. As shown in FIG. 1, all accessible tool APIs can be listed first, such as a search API (Search API), a calculation API (Calculator API), a weather API (Weather API), and the like. Then the format for calling a tool API is given (for example, "Call_API_1: Tool_Name | query -> Result_1"); finally, the question that actually needs to be solved is given (e.g., "Question: How many years earlier did sharks first appear on Earth than trees?"). Through this templated questioning, the model can complete tool calls and replies using its existing knowledge.
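A minimal reconstruction of such a templated prompt might look like the following; the wording and tool list are illustrative and not the exact template of the related art:

```python
# Tool APIs listed in the prompt, as in the FIG. 1 example.
TOOLS = ["Search API", "Calculator API", "Weather API"]

def build_prompt(question: str) -> str:
    """Assemble a prompt that lists accessible tool APIs, gives the call
    format, then states the question to be solved."""
    lines = ["You can call the following tool APIs:"]
    lines += [f"- {name}" for name in TOOLS]
    lines.append("Call format: Call_API_1: Tool_Name | query -> Result_1")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```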
Further, for an open-source language model, the model may be taught how to call an API during the fine-tuning phase by providing a dataset containing examples of API calls. For example, a set of question-answer pairs may be provided in which each answer contains a specific description of the appropriate API interface and its parameters. By fine-tuning on such a dataset, the model learns to associate questions with the correct format and parameters of the API call.
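One such fine-tuning record might take the following shape; the field names and wording are assumptions for illustration, not the format of any specific dataset:

```python
# Hypothetical question-answer pair whose answer embeds an API-call example,
# in the call format mentioned above.
example_record = {
    "question": "What will the weather be in Shenzhen tomorrow?",
    "answer": (
        "Call_API_1: Weather API | city=Shenzhen, date=tomorrow -> Result_1\n"
        "Based on Result_1, the weather in Shenzhen tomorrow is expected to be sunny."
    ),
}
```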
Although the prior art gives language models some tool-calling capability, these approaches have significant limitations. For a closed-source language model, the prompt-template method depends only on the model's built-in knowledge and capabilities and cannot truly call an external tool API to acquire knowledge. For an open-source language model, although a dataset containing API-call examples can endow the model with a certain degree of API-calling capability, several problems remain. (1) The training data problem: training a model to understand and call APIs requires a large number of questions and corresponding labeled samples of API calls. Such labeled samples are very expensive to construct, and different APIs may be called in different ways, so building a comprehensive, uniform set of training samples is difficult and costly. (2) The API generalization problem: since a model learns from its training data, the APIs it understands and can call are typically limited to those appearing in that data. If it encounters an API that did not appear in the training data, the model may be unable to make tool calls effectively.
To address at least one of the above technical problems, the embodiments of the present application provide a knowledge question-and-answer based training sample generation method. Rather than constructing a large number of training samples for all possible knowledge question-answering scenarios, the method obtains interface information of at least one tool interface from a given tool set through a random sampling strategy driven by the sample question information, and generates a sample tool library from which training samples for model training are constructed. This effectively reduces the cost of constructing training samples, improves construction efficiency, strengthens the generalization of tool interface calling, and helps reduce the dependence on large numbers of training samples in model training.
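The sampling strategy just described can be sketched as follows. The field names, the boost factor, and the similarity function are all assumptions for illustration, since the text does not fix a concrete weighting scheme; the essential idea is that interfaces similar to the sample question get a higher selection probability before weighted random sampling without replacement.

```python
import random

def build_sample_tool_library(tool_library, question, similarity_fn,
                              threshold=0.5, boost=3.0, k=4, base_weight=1.0):
    """Draw k tool interfaces from the pre-constructed tool library,
    boosting the selection weight of interfaces whose description is
    sufficiently similar to the sample question information."""
    weights = []
    for interface in tool_library:
        w = base_weight
        if similarity_fn(interface["description"], question) >= threshold:
            w *= boost  # raise the probability that a relevant tool is selected
        weights.append(w)
    # Weighted random sampling without replacement over interface indices.
    chosen = []
    pool = list(range(len(tool_library)))
    for _ in range(min(k, len(pool))):
        idx = random.choices(pool, weights=[weights[i] for i in pool], k=1)[0]
        pool.remove(idx)
        chosen.append(tool_library[idx])
    return chosen
```

The returned interfaces form the sample tool library that is paired with the sample question information to produce a training sample.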
The following description is made with respect to some terms involved in the embodiments of the present application:
language model: a probability model capable of automatically learning natural language, by learning statistical features of large-scale text data, the probability distribution of the next word or character in a given text sequence can be predicted.
Tool call: based on the language model being applied, the language model is aided in completing specific natural language processing tasks by calling existing external tools and utilizing the functions and API interfaces provided by the external tools.
Generalization ability: the language model's performance when processing new, unseen data or tools. A language model with good generalization capability should be able to make accurate calls to new tools, not merely perform well on tool sets it has already seen.
Fine tuning: using a pre-trained language model as a starting point, a small amount of tuning and training is performed on the model for a particular task to improve the performance of the model on that particular task. As in the embodiments of the present application, the pre-trained initially generated language model is trained based on a small number of training samples.
The technical solutions of the embodiments of the present application and technical effects produced by the technical solutions of the present application are described below by describing several exemplary embodiments. It should be noted that the following embodiments may be referred to, or combined with each other, and the description will not be repeated for the same terms, similar features, similar implementation steps, and the like in different embodiments.
The following describes a knowledge question and answer based training sample generation method in an embodiment of the present application.
Specifically, the execution subject of the method provided in the embodiment of the present application may be a terminal or a server; the terminal (may also be referred to as a device) may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, an intelligent voice interaction device (e.g., a smart speaker), a wearable electronic device (e.g., a smart watch), a vehicle-mounted terminal, a smart home appliance (e.g., a smart television), an AR/VR device, and the like. The server may be an independent physical server, a server cluster formed by a plurality of physical servers or a distributed system (such as a distributed cloud storage system), or a cloud server for providing cloud computing and cloud storage services.
Specifically, as shown in fig. 2, the knowledge question-and-answer based training sample generation method includes steps S101 to S103:
step S101: for any tool interface in the pre-constructed tool library, if the similarity between the interface information of the tool interface and the sample problem information is greater than or equal to a preset threshold value, the probability that the tool interface is selected is increased.
Alternatively, the pre-built tool library may be an external tool set provided by the related technology, and may specifically include tool interfaces related to the application scenario of the model to be trained. For example, for knowledge question answering applied to daily-life scenarios, the tool library may be built from external tools involved in daily life; for knowledge question answering applied to business scenarios, the tool library may be built from external tools involved in business activities. In an example, the tool library may also be constructed without limiting the types of tool interfaces included: a wide variety of tool interfaces available from related technologies may be used to build a generic pre-constructed tool library.
In a possible embodiment, as shown in fig. 6, the interface information of each tool interface included in the pre-built tool library is obtained through the following construction operations corresponding to step A1-step A4:
Step A1: generating descriptive sub-information for the content information acquired through the tool interface; the content information relates to a scene to which the question information belongs.
For example, the Description sub-information (Description) in the interface information of the weather API in fig. 6 states that the API is used to obtain the current or future weather condition of a given location; this information conveys the specific role of the API, its applicable scenarios, and the like.
Step A2: and generating function sub-information aiming at the callable function, the input parameter type and the returned parameter type of the tool interface.
Wherein the Function sub-information (Function) can list all callable functions under the tool API, together with their respective input parameter types and return parameter types; for example, the function sub-information in the interface information of the weather API in fig. 6 is "WeatherAPI.forecast_weather(location: str, days: int) -> str".
Step A3: example sub-information is generated for example questions and parameters in the tool interface corresponding to each of the functions.
The Example information (Example) may give a specific Example to each callable function of the tool API, including information such as a problem that may be involved and a parameter selection of a corresponding function.
Step A4: interface information of the tool interface is constructed based on the description sub-information, the function sub-information and the example sub-information.
As shown in fig. 6, the present application exemplarily provides a specific setting of interface information of two tool APIs, and when a data set is actually constructed, the interface information of all tool APIs may be integrated by using the form to construct a complete tool library.
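The construction operations of steps A1 to A4 can be sketched as follows. This is a minimal illustrative sketch with hypothetical field and class names; the actual storage format of the interface information is not limited by this example.

```python
# Hypothetical sketch of steps A1-A4: an interface-information record built
# from description, function, and example sub-information.
from dataclasses import dataclass


@dataclass
class ToolInterfaceInfo:
    description: str      # step A1: what the API does and its applicable scenario
    functions: list[str]  # step A2: callable functions with input/return types
    examples: list[str]   # step A3: example questions with parameter choices

    def render(self) -> str:
        """Step A4: assemble the three sub-fields into one interface entry."""
        return "\n".join([
            "Description: " + self.description,
            "Functions: " + "; ".join(self.functions),
            "Examples: " + "; ".join(self.examples),
        ])


# Example entry mirroring the weather API of fig. 6.
weather_api = ToolInterfaceInfo(
    description="Get the current or future weather conditions of a location.",
    functions=["WeatherAPI.forecast_weather(location: str, days: int) -> str"],
    examples=["Q: weather in New York tomorrow -> forecast_weather('New York', 1)"],
)
```

Integrating one such rendered entry per tool API then yields the complete pre-constructed tool library.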
Optionally, the sample question information may be question information input by an operation object as obtained in the example, or question information set according to the needs of the tool-calling training. For example, for the initially generated language model to be trained on the currently generated training sample, question information that must be accurately solved with the aid of an external tool may be set.
Optionally, before the sample tool library is generated by acquiring at least one tool interface for a single current training sample from the pre-constructed tool library, the probability of selecting the corresponding tool interface is adjusted for the similarity between the sample problem information and the interface information of the tool interface, so that the probability of selecting the tool interface capable of effectively solving the sample problem information is increased.
For any tool interface in the pre-constructed tool library, the similarity between the sample question information and the interface information of the tool interface is calculated first, for example, the similarity between the sample question information and the description sub-information, function sub-information and example sub-information in the interface information of the tool interface. The similarity may be calculated with reference to techniques provided in the related art, which this application does not limit. The calculated similarity is then compared with a preset threshold; if the similarity is greater than or equal to the preset threshold, the probability that the tool interface is selected is increased. The relationship between the magnitude of the similarity and the size of the increase can be adjusted according to the training degree of the model, which is not particularly limited in this application.
Illustratively, assume that the pre-built tool library includes 10 tool interfaces. Before step S101 is performed, each tool interface has the same probability of being selected, namely 10% (normalized to 0.1), and the preset threshold is 0.6. After step S101 adjusts the selection probabilities, if the similarity between the interface information of tool interface 3 and the sample question information is 0.7, and the similarity between the interface information of tool interface 7 and the sample question information is 0.8, the probabilities of tool interface 3 and tool interface 7 being selected may both be increased. Further, since the similarities corresponding to tool interface 3 and tool interface 7 differ, tool interface 7, with the higher similarity, may receive a larger increase than tool interface 3; for example, the selection probability of tool interface 7 is adjusted to 0.2 and that of tool interface 3 to 0.15. It should be understood that the above is merely an example of step S101, and the numerical values involved do not limit the embodiments of the present application.
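The probability adjustment of step S101 can be sketched as follows. This is an illustrative assumption: the boost rule (multiplying the uniform weight by a factor proportional to the similarity, then renormalizing) is one possible realization and stands in for whatever adjustment rule the implementation actually uses; the similarity values are stand-ins for a real similarity function.

```python
# Hypothetical sketch of step S101: raise the selection probability of tool
# interfaces whose similarity to the sample question exceeds a threshold,
# then renormalize so the probabilities still sum to 1.
def adjust_selection_probs(sims, threshold=0.6, boost=0.5):
    n = len(sims)
    # Start from a uniform distribution over the N tool interfaces.
    weights = [1.0 / n] * n
    for i, s in enumerate(sims):
        if s >= threshold:
            # Higher similarity -> larger increase (here proportional to s).
            weights[i] *= 1.0 + boost * s
    total = sum(weights)
    return [w / total for w in weights]


# 10 interfaces; interfaces 3 and 7 (0-indexed 2 and 6) exceed the threshold.
sims = [0.1, 0.2, 0.7, 0.3, 0.1, 0.2, 0.8, 0.1, 0.2, 0.3]
probs = adjust_selection_probs(sims)
```

With these values, interface 7 (similarity 0.8) ends up with a larger selection probability than interface 3 (similarity 0.7), and both exceed the probabilities of the unadjusted interfaces, matching the relative ordering in the example above.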
Step S102: and in the pre-constructed tool library, carrying out random sampling processing based on the selection probability of each tool interface to obtain interface information of at least one tool interface, and generating a sample tool library based on the interface information of the at least one tool interface.
After the selection probabilities of the tool interfaces are adjusted (interfaces whose interface-information similarity to the sample question information is below the preset threshold need no adjustment), at least one tool interface and its interface information can be obtained from the pre-constructed tool library by random sampling, so as to generate the sample tool library.
In a possible embodiment, the random sampling strategy implemented through step S101 to step S102 includes uniformly and randomly sampling tool interfaces in the pre-built tool library through a preset value interval by adopting a uniformly distributed random sampling mode of the preset value interval;
optionally, increasing the probability that the tool interface is selected in S101 includes: adjusting the sample value interval of the tool interface corresponding to the uniform distribution so that the sample value interval is smaller than a preset value interval;
optionally, in S102, performing random sampling processing in the pre-constructed tool library based on the probability that each tool interface is selected to obtain interface information of at least one tool interface includes: based on the value corresponding to each tool interface in the preset value interval or the sample value interval, performing uniform random sampling processing in the pre-constructed tool library to obtain interface information for the sample number of tool interfaces required by the initially generated language model to be trained; the sample number is related to the maximum input length supported by the initially generated language model.
Wherein, the pre-constructed tool library is assumed to be A = {a_1, a_2, ..., a_N}, where a_i represents the i-th tool API, each tool API consisting of its description sub-information (d), function sub-information (f), and example sub-information (e), and N is the total number of tool APIs.
Given the request (q) of the current object, i.e., given the sample question information, the logic of the random sampling strategy employed by the embodiments of the present application can be illustrated by the following formula (1):

p(a_i) ~ U(v*, w), if sim(<d_i, f_i, e_i>, q) > δ; otherwise p(a_i) ~ U(v, w)    (1)

wherein U(v, w) denotes a uniform distribution over the preset value interval between v and w, and U(v*, w) denotes the truncated uniform distribution corresponding to the sample value interval (i.e., satisfying v < v* < w); sim(<d, f, e>, q) calculates the similarity between the object's request q and the description d, function f and example e of the tool API. If this value is greater than the threshold δ, the probability of the corresponding tool API being selected rises; otherwise, random selection is performed according to the uniform distribution between v and w (the preset value interval). Illustratively, U(v, w) may be U(0, 1).
According to the embodiments of the present application, the sample tool library is built by this random sampling strategy, which can increase the number of training samples that effectively call tool APIs while still allowing the model trained on the generated training samples to learn the ability to discriminate among tools, thereby accelerating the training and convergence of the model.
Alternatively, the number of tool interfaces retrieved from the pre-built tool library may be a set fixed value, such as 3, 5, or 10 (merely as examples). It will be appreciated that the number of tool interfaces acquired (i.e., the number of tool interfaces included in the sample tool library) is related to the maximum input length supported by the trained model. Illustratively, as shown in FIG. 8, 3 tool APIs are sampled from the pre-built tool library to constitute the sample tool library.
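The sampling of formula (1) can be sketched as follows. This is a hedged sketch under stated assumptions: each tool API draws a score from U(v, w), APIs similar to the request instead draw from the truncated interval U(v*, w), and the k highest-scoring APIs are kept; the function name and the top-k selection rule are illustrative choices, not the definitive implementation.

```python
# Illustrative Tool-Drop-style sampling per formula (1): similar tool APIs
# draw scores from the truncated interval U(v*, w), others from U(v, w);
# the k highest-scoring APIs form the sample tool library.
import random


def tool_drop_sample(sims, k, v=0.0, w=1.0, v_star=0.5, delta=0.6, rng=None):
    rng = rng or random.Random()
    scores = []
    for i, s in enumerate(sims):
        low = v_star if s > delta else v  # truncated uniform for similar APIs
        scores.append((rng.uniform(low, w), i))
    # Keep the indices of the k highest-scoring tool APIs.
    scores.sort(reverse=True)
    return sorted(i for _, i in scores[:k])


rng = random.Random(0)
# Similarities of 5 tool APIs to the request q; APIs 1 and 3 exceed delta.
picked = tool_drop_sample([0.1, 0.8, 0.2, 0.7, 0.3], k=3, rng=rng)
```

Because similar APIs draw from the narrower, higher interval, they are selected more often across samples, yet any API can still appear, preserving the discrimination training described above.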
Step S103: training samples are generated based on the sample problem information and the sample tool library.
Alternatively, the training samples may be constructed based on the sample problem information and the sample tool library corresponding to the steps S101-S102, for example, the sample problem information and the sample tool library are formed into a sample pair.
In an example, where the generated training samples include a sample tool library, the training samples may further include: sample interface return information generated by invoking a sample tool interface in the sample tool library, sample answer information corresponding to the sample question information, and the like.
In a possible embodiment, generating training samples based on the sample problem information and the sample tool library in step S103 includes steps B1-B3:
Step B1: if a sample tool interface corresponding to the sample problem information exists in the sample tool library, sample interface return information is generated based on information obtained by calling the sample tool interface, and sample answer information is generated based on the sample interface return information.
As shown in fig. 7, fig. 7 shows an example of construction of a single training sample, where the weather API included in the sample tool library is related to sample question information, sample interface return information (e.g., including a call function, a result corresponding to a parameter, etc.) may be generated based on information obtained by calling the weather API, and then sample answer information may be generated based on the sample interface return information, e.g., information related to the sample question information obtained from the sample interface return information may be generated in a natural language form.
Step B2: if the sample tool library does not have the sample tool interface corresponding to the sample question information, the sample interface return information is configured to be a null value, and sample answer information is determined based on the null value and the real answer information.
Since the sample tool library is derived based on a random sampling strategy, there may be cases where all tool interfaces included in the sample tool library are independent of sample question information, in which case the sample interface return information may be configured to be null (without any information) when constructing the training sample, and then the sample answer information is determined based on the null and the real answer information. That is, the sample answer information is irrelevant to the sample interface return information, and the real answer information is directly used as the answer information in the training sample.
Step B3: generating a training sample based on the sample question information, the sample tool library, the sample interface return information, and the sample answer information; the training sample also includes system cues that include information describing the tasks currently required to be performed by the model trained by the training sample and the associated requirements of the model input and output.
Optionally, the specific constitution and function of each part included in the training sample constructed for the present application will be described below with reference to fig. 7:
System Prompt (System Prompt): a section of description about the current task, together with prompt words stating the related requirements on the trained model's input and output; this can be seen as a global hint under the tool-call task, helping the model trained by the training sample better understand the current task.
Sample question information: also known as the object request (User Query), includes any question input by the operation object into the language model; in a training scenario, the question may be one that the model can accurately solve only with the aid of an external tool. As in the example: "Can you give me the weather forecast for New York tomorrow, including the temperature and the probability of precipitation?"
Sample tool library: also known as the Tool Pool, includes the set of tool APIs that the model to be trained can access. The tools in the tool pool are randomly sampled from the complete tool library (the pre-constructed tool library), i.e., sampled by the Tool-Drop sampling strategy, so there may be cases where none of the tools can solve the current question. Embodiments of the present application deliberately allow this to occur, to improve the trained model's ability to discriminate valid tools.
Sample interface return information: also known as the API Call; if the trained model finds a suitable tool to handle the operation object's question, this part consists of the calling function the model has filled in and the corresponding API's returned answer; otherwise, this part is empty.
Sample answer information: also known as model Response (LLM Response), after completing the invocation (or non-invocation) of the tool API, the model needs to combine all of the knowledge currently to make the final Response to the object request. It will be appreciated that as shown in fig. 7, the sample answer information includes information determined based on sample interface return information; if the weather API in the sample tool library can solve the sample problem information, the calling function and the result corresponding to the corresponding API parameter included in the sample interface return information can be used to generate sample answer information in fig. 7; if the tool API in the sample tool library cannot solve the sample problem information, the sample interface return information is null, and the information determined based on the sample interface return information is null at the moment, and accordingly, the sample answer information is the real answer.
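The assembly of a single training sample from these five parts (steps B1 to B3) can be sketched as follows. Field names, the fallback rule, and the answer template are illustrative assumptions, not the application's fixed format.

```python
# Hypothetical sketch of steps B1-B3: assemble one training sample from the
# system prompt, user query, tool pool, API call, and LLM response.
def build_training_sample(question, tool_pool, true_answer,
                          relevant_tool=None, api_result=None):
    if relevant_tool is not None:
        # Step B1: a matching tool exists -> record the call and its result,
        # and derive the answer from the returned information.
        api_call = {"tool": relevant_tool, "result": api_result}
        answer = f"Based on {relevant_tool}: {api_result}"
    else:
        # Step B2: no matching tool -> null return info; the real answer is
        # used directly as the answer information in the training sample.
        api_call = None
        answer = true_answer
    # Step B3: the five-part training sample, including the system prompt.
    return {
        "system_prompt": "You may call tools from the tool pool to answer.",
        "user_query": question,
        "tool_pool": tool_pool,
        "api_call": api_call,    # null when no tool fits the question
        "llm_response": answer,
    }


sample = build_training_sample(
    question="Weather forecast for New York tomorrow?",
    tool_pool=["WeatherAPI", "StockAPI", "MapAPI"],
    true_answer="(ground-truth answer)",
    relevant_tool="WeatherAPI",
    api_result="Sunny, 20C, 10% precipitation",
)
```

Calling the same function with `relevant_tool=None` produces the step-B2 case: a null `api_call` with the real answer as `llm_response`.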
The following describes a knowledge question-and-answer based model training method in an embodiment of the present application.
It is contemplated that while current large language models are capable of performing many natural language processing tasks, their performance in invoking external tools remains limited. For example, these models cannot fully leverage existing knowledge bases or online external services to provide more accurate answers. Meanwhile, the complexity of the calling process for various tools means that, when the model solves complex problems requiring additional tool support, its generalization capability is limited and answer generation is inefficient. To solve these technical problems, the embodiments of the present application also provide a knowledge question-answering-based model training method.
Specifically, the execution subject of the method provided in the embodiment of the present application may be a terminal or a server; the terminal (may also be referred to as a device) may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, an intelligent voice interaction device (e.g., a smart speaker), a wearable electronic device (e.g., a smart watch), a vehicle-mounted terminal, a smart home appliance (e.g., a smart television), an AR/VR device, and the like. The server may be an independent physical server, a server cluster formed by a plurality of physical servers or a distributed system (such as a distributed cloud storage system), or a cloud server for providing cloud computing and cloud storage services.
Specifically, as shown in fig. 3, the knowledge question-and-answer based model training method includes steps S201 to S202:
step S201: acquiring training samples pre-constructed by the knowledge question and answer-based training sample generation method provided by the embodiment, wherein the training samples comprise sample question information, a sample tool library, sample interface return information and sample answer information; the sample tool library comprises interface information of tool interfaces acquired from a pre-constructed tool library based on a random sampling strategy; the sample interface return information includes information returned based on invoking a tool interface in the sample tool library; the sample answer information includes information determined based on the sample interface return information.
In the model training stage, pre-constructed training samples can be obtained directly for training. A single training sample may include sample question information (e.g., object request information, User Query), a sample tool library (e.g., Tool Pool), sample interface return information (e.g., API Call information), and sample answer information (e.g., model response information, LLM Response). The sample question information may be question information input by an operation object as obtained in the example, or question information set according to the needs of the tool-calling training; for example, for the current initially generated language model, question information that must be accurately solved with the aid of an external tool is set. The sample tool library is a set including at least one tool API obtained by sampling tool interfaces from the pre-constructed tool library; the random sampling strategy (e.g., Tool-Drop) is set to increase the number of training samples in which the model calls a tool API while still allowing the model to learn the ability to discriminate among tools, and may randomly sample, during training, the tool APIs accessible to the model (i.e., the tool APIs included in the sample tool library) based on the similarity between the sample question information and the tool APIs in the pre-constructed tool library. The sample interface return information includes the answer information obtained by calling the corresponding tool through the interface, that is, the reference information for solving the sample question that the model obtains by calling the external tool.
The sample answer information includes answer information corresponding to the sample question information, which may be answer information obtained by a question answer pair in an example dialogue, or information obtained by a model response object request (solution question) in an example.
Optionally, the training sample may also include system hints including information describing the tasks the model is currently required to perform and the associated requirements of the model input and output. For example, a specific dialogue scene and some special output specifications and other requirements can be preset for the model to be trained, and it is understood that the operation of the system prompt can be regarded as global setting; as shown in fig. 7.
S202: and training the initial generation type language model by adopting the training sample to obtain the target generation type language model for knowledge question answering.
Optionally, the initial generative language model may be a generative model such as an LLM (Large Language Model), LLaMA (Large Language Model Meta AI, an open and efficient large foundational language model), Alpaca (a data-generating model based on a self-instruct framework), or Vicuna (an open-source large language model); the specific model may be determined according to requirements, which this application does not limit. The interchangeability of any generative model also helps widen the universality and general applicability of the scheme of this application.
Alternatively, as shown in fig. 5, for training samples for training an initially generated language model, word segmentation and word embedding processing of text may be performed before the model is input.
In this embodiment of the present application, as shown in fig. 5, training is performed on a pre-trained initial generation type language model through a pre-constructed training sample, so as to fine tune the initial generation type language model, and finally obtain a target generation type language model for knowledge question-answering, so that the generalization capability of a model calling tool is effectively enhanced, and the training and convergence of the model are accelerated, so that the language model can better serve an actual application scenario.
The following describes specific content of constructing training samples in the embodiments of the present application.
Considering that the initially generated language model after pre-training already has a certain priori information (i.e. has learned a certain knowledge), the understanding of the model's tool calls needs to be learned from the constructed training samples. Specifically, in the training process of the model, the training of the initial generation type language model comprises the steps of C1-C3:
step C1: and acquiring interface information of a prediction tool interface from the sample tool library based on the sample problem information, and calling the prediction tool interface through the interface information to obtain prediction interface return information.
In the training process, the initial generation type language model can acquire interface information of an accessible prediction tool interface from a sample tool library provided in a training sample based on sample problem information, so as to call the prediction tool interface through the interface information to obtain answer information returned by the interface for the sample problem information.
Optionally, during the training process, the language model may or may not call the tool interface in the sample tool library (the case of no call may be that the model judges that the external tool is not required to be called to solve the problem, or that the model judges that the sample tool library does not include the external tool interface capable of solving the problem information of the sample).
Optionally, in step C1, the predicted interface is called through the interface information to obtain predicted interface return information, which includes step C11-step C13:
step C11: and determining a corresponding calling function from the function sub-information of the interface information based on the sample problem information.
As shown in fig. 6 and 7, different callable functions in the function sub-information of the same tool API take different parameter types, and the results obtained from the API differ accordingly; therefore, the calling function adopted when calling the prediction tool interface needs to be determined based on the sample question information that currently needs to be solved.
Step C12: corresponding parameters are determined from the example sub-information of the interface information based on the sample problem information.
As shown in fig. 6 and fig. 7, the example sub-information of the same tool API adapts to different question information with different corresponding parameters; therefore, the parameters adopted when calling the prediction tool interface need to be determined based on the sample question information that currently needs to be solved.
Step C13: calling the prediction tool interface based on the calling function and the parameter to obtain the return information of the prediction interface; and the prediction interface return information comprises the calling function and returned answer information.
Alternatively, when the actual call is performed on the prediction tool interface by calling functions and parameters, answer information (prediction interface return information) returned by the prediction tool interface can be obtained; that is, the model solves the reference information of the sample problem information with the aid of an external tool.
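Steps C11 to C13 can be sketched as follows. This is a hypothetical sketch: the keyword-matching rule and the parameter lookup merely stand in for choices that, in the embodiments, the language model itself learns to make, and the registry of callable functions is an illustrative assumption.

```python
# Hypothetical sketch of steps C11-C13: choose a calling function, fill its
# parameters from the example sub-information, and invoke the interface.
def call_prediction_interface(question, interface, registry):
    # C11: choose the callable function whose keywords appear in the question
    # (a stand-in for the model's learned choice of calling function).
    func_name = next(name for name, kws in interface["functions"].items()
                     if any(k in question.lower() for k in kws))
    # C12: take parameters from the example sub-information for that function.
    params = interface["examples"][func_name]
    # C13: actually invoke the function; the return carries the calling
    # function and its answer (the prediction interface return information).
    return {"function": func_name, "answer": registry[func_name](**params)}


interface = {
    "functions": {"forecast_weather": ["weather", "forecast"]},
    "examples": {"forecast_weather": {"location": "New York", "days": 1}},
}
registry = {
    "forecast_weather": lambda location, days: f"{location}, {days}d: sunny",
}
result = call_prediction_interface(
    "Weather forecast for New York?", interface, registry)
```

The returned dictionary mirrors the two components named in step C13: the calling function and the returned answer information.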
Step C2: and determining the prediction answer information corresponding to the sample question information based on at least one of the prediction interface return information and the prior answer information determined by the prior information.
Optionally, the initial generation type language model can give out the prior answer information of the initial generation type language model by the prior information aiming at the sample question information, and then the final output prediction answer information is comprehensively considered by combining the prediction interface return information obtained by the calling tool. In an example, the predicted answer information may be answer information in the prediction interface return information, a priori answer information, or information generated jointly by the prediction interface return information and the a priori answer information.
Optionally, in the training process of the step C1 to the step C2, interface information of a prediction tool interface is obtained from the sample tool library based on the sample question information, the prediction tool interface is called by the interface information to obtain prediction interface return information, and prediction answer information corresponding to the sample question information is determined based on at least one of the prediction interface return information and the prior answer information determined by the prior information, including the step C21 to the step C23:
step C21: and if the tool interface included in the sample tool library is irrelevant to the sample question information, the initial generation type language model acquires interface information of a prediction tool interface from the sample tool library based on the sample question information, calls the prediction tool interface through the interface information to obtain prediction interface return information, and determines that the prediction interface return information is irrelevant to the sample question information, and then outputs prediction answer information corresponding to the sample question information based on prior information.
Alternatively, since the sample tool library is a set formed by acquiring tool interfaces from the pre-constructed tool library based on a random sampling strategy, the randomness of that acquisition means there may be cases where the tool interfaces included in the sample tool library are irrelevant to the sample question information. In this case, the model can be made to learn how to discriminate the call validity of the accessible tools. In an example, during training the model may not be able to determine with certainty whether the tool interfaces in the sample tool library can actually solve the sample question information. The model may then acquire a prediction tool interface to call based on the sample question information, and, after actually calling the prediction tool interface based on its interface information, obtain the prediction interface return information (reference answer) it returns. If, after obtaining this return information, the model determines that the answer it provides is irrelevant to the sample question information (cannot solve the question), the model outputs the prediction answer information corresponding to the sample question information based on its own prior information. At this point, the model can learn that the call to the prediction tool interface was invalid.
Step C22: if the sample tool library includes at least one tool interface corresponding to the sample question information, the initial generation type language model acquires interface information of a prediction tool interface corresponding to the sample question information from the sample tool library, calls the prediction tool interface through the interface information to obtain prediction interface return information, determines prior answer information based on prior information, and outputs prediction answer information corresponding to the sample question information based on at least one of the prior answer information and the prediction interface return information.
Optionally, since the sample tool library is a set formed by acquiring tool interfaces from a pre-constructed tool library based on a random sampling strategy, the randomness of the acquisition means that at least some of the tool interfaces included in the sample tool library may be relevant to the sample question information; that is, the sample tool library may contain a mixture of relevant and irrelevant interfaces. On this basis, when the model can acquire the interface information of a prediction tool interface corresponding to the sample question information from the sample tool library, it can actually call the prediction tool interface through the interface information to obtain the prediction interface return information, determine prior answer information based on prior information, and then output prediction answer information corresponding to the sample question information based on at least one of the prior answer information and the prediction interface return information.
Step C23: if the initial generation type language model cannot acquire interface information of a prediction tool interface corresponding to the sample question information from the sample tool library, it determines that the prediction interface return information is null and outputs prediction answer information corresponding to the sample question information based on prior information.
Optionally, regardless of whether the tool interfaces included in the sample tool library are relevant to the sample question information, the initial generation type language model may fail to acquire a prediction tool interface corresponding to the sample question information from the sample tool library (even when a relevant tool interface exists in the sample tool library, the model's validity judgment of the tool interfaces may be faulty). In this case, the model may directly determine that the prediction interface return information is null (that is, a virtual call is performed on the prediction tool interface), and then output prediction answer information corresponding to the sample question information based on prior information.
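The three branches in steps C21 to C23 can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: all callables (select_interface, call_interface, is_relevant, and the two answer producers) are hypothetical stand-ins for the model's learned behaviour.

```python
def predict_answer(sample_question, sample_tool_library,
                   select_interface, call_interface, is_relevant,
                   answer_from_prior, answer_from_interface):
    """Return (predicted answer, prediction interface return info) for one sample."""
    interface = select_interface(sample_question, sample_tool_library)
    if interface is None:
        # Step C23: no candidate interface -> return info is null (virtual call),
        # answer comes from the model's own prior knowledge.
        return answer_from_prior(sample_question), None
    return_info = call_interface(interface, sample_question)  # actual call
    if not is_relevant(return_info, sample_question):
        # Step C21: the called tool turned out to be irrelevant -> fall back to
        # the prior-knowledge answer; the model learns the call was invalid.
        return answer_from_prior(sample_question), return_info
    # Step C22: a relevant tool exists -> answer from the interface return info
    # (possibly combined with the prior answer; here it is simply preferred).
    return answer_from_interface(return_info, sample_question), return_info
```

In use, the selection and relevance callables would be realized by the model itself; here any stubs suffice to trace the three branches.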
Step C3: determine a target loss value based on the sample answer information and the predicted answer information, and update the initial generation type language model based on the target loss value.
Optionally, when updating the network parameters of the model, the embodiment of the application considers the model's final output; for example, a loss value (i.e., the error between the model's predicted value and the actual value in the training phase) is determined based on the prediction answer information output by the model and the sample answer information obtained from the training sample, and the initial generation type language model is updated based on that loss value.
Optionally, determining the target loss value in step C3 based on the sample answer information and the predicted answer information includes steps C31-C33:
Step C31: for each word ranked second or later in the text sequence of the sample answer information, determine a first maximum likelihood value for that word by combining the information of all words ranked before it.
Step C32: for each word ranked second or later in the text sequence of the predicted answer information, determine a second maximum likelihood value for that word by combining the information of all words ranked before it.
Step C33: determine a loss value for each word based on the first maximum likelihood value and the second maximum likelihood value, and sum all the loss values to obtain the target loss value.
Optionally, in the training phase, model optimization is performed using a generative training paradigm; that is, when predicting the current i-th word, the information of the preceding i-1 words is used. The target loss value can be calculated by the following formula (2):

Loss = −Σ_i log P(y_i | y_1, …, y_{i−1}, S, U, T)   (2)

where the sum runs over the words generated by the model (the interface return information and the answer information), y_i is the i-th word (token) in the answer information, S is the system prompt information, U is the sample question information, T is the sample tool library, A is the sample interface return information, and L is the sample answer information.
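A minimal sketch of the loss in steps C31 to C33, assuming per-word likelihood values (each conditioned on all preceding words) are already available. The per-word combination used here, an absolute log-likelihood difference, is an illustrative stand-in for the per-word loss, which the text does not specify in closed form.

```python
import math

def target_loss(first_likelihoods, second_likelihoods):
    """first_likelihoods: P(word_i | words before i) over the sample answer (C31);
    second_likelihoods: the same quantity over the predicted answer (C32);
    returns the summed per-word loss (C33). The pairing of likelihood values
    and the |log p1 - log p2| per-word loss are illustrative assumptions."""
    return sum(abs(math.log(p1) - math.log(p2))
               for p1, p2 in zip(first_likelihoods, second_likelihoods))
```

When the two sequences assign identical likelihoods the loss is zero, matching the intuition that a perfectly matching prediction incurs no penalty.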
It will be appreciated that the system prompt information and the sample question information are given information in the training samples, while the called tool API, the interface return information, and the output answer information are what the model optimizes during the stepwise training.
The embodiment of the application provides a knowledge question-answering based model training method, in particular a fine-tuning method that enhances the generalization capability of language-model tool calling. Given a limited tool API set, a random sampling strategy is used when constructing training samples, so that the model can obtain only limited tool API information in a single training sample. The model thus learns to combine the current question information with the matching degree of the tool APIs to decide whether a tool API needs to be called, or whether to answer directly based on its existing knowledge. If the model selects the most relevant tool API from the current tool API set and provides the parameters needed to call it, the answer to the current question is obtained through the tool API service. Finally, the model combines the reference information given by the tool API with its own knowledge and gives the final answer to the current question. This scheme can effectively enhance the generalization capability of the model in tool calling, greatly reduce the dependence on large training sets, and improve the training efficiency of the model.
The following describes a knowledge question-answering method in the embodiment of the present application.
Specifically, the execution subject of the method provided in the embodiment of the present application may be a terminal or a server; the terminal (may also be referred to as a device) may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, an intelligent voice interaction device (e.g., a smart speaker), a wearable electronic device (e.g., a smart watch), a vehicle-mounted terminal, a smart home appliance (e.g., a smart television), an AR/VR device, and the like. The server may be an independent physical server, a server cluster formed by a plurality of physical servers or a distributed system (such as a distributed cloud storage system), or a cloud server for providing cloud computing and cloud storage services.
Specifically, as shown in fig. 4, the knowledge question-answering method includes S301 to S302:
Step S301: acquire target question information input by the operation object.
Optionally, if the execution body is a terminal, it may interact with the operation object through the terminal's interactive interface; for example, in response to an input operation triggered by the operation object on the interactive interface, the target question information input by the operation object can be obtained. If the execution body is a server, the terminal can acquire the target question information and upload it to the server.
Step S302: through the target generation type language model, if it is determined that a tool interface call is required for the target question information and a target tool interface corresponding to the target question information exists in the pre-constructed tool library, acquire interface information of the target tool interface, call the corresponding target tool interface through the interface information to obtain first answer information corresponding to the target question information, determine second answer information corresponding to the target question information based on prior information, and output target answer information corresponding to the target question information based on at least one of the first answer information and the second answer information. The target generation type language model is trained by the knowledge question-answering based model training method provided in the foregoing embodiment.
Optionally, as shown in fig. 10, when determining answer information through the target generation type language model, the model first determines whether to call a tool API. The basis of this determination may be whether the model's existing knowledge can solve the target question information: if it can, the model does not need to call a tool API and directly outputs the target answer information based on prior information. If it cannot, the model determines that a tool API needs to be called and further considers whether a target tool interface corresponding to the target question information exists in the pre-built tool library (that is, whether there is a tool interface that can effectively solve the target question information). If such an interface exists, the model acquires the corresponding interface information of the target tool interface and executes the call of the target tool interface based on that information (the successful API call shown in fig. 10); it then obtains the first answer information returned by the target tool interface, may determine second answer information based on prior information, and outputs the final target answer information based on at least one of the first answer information and the second answer information. If no target tool interface corresponding to the target question information exists in the pre-built tool library, the target generation type language model outputs the target answer information based on prior information.
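The inference-time decision flow described above (and shown in fig. 10) can be sketched as follows; the callables passed in are hypothetical placeholders for the target generation type language model's internal judgments and the tool-library lookup.

```python
def answer(question, tool_library, can_answer_from_prior,
           find_target_interface, call_interface, prior_answer, combine):
    """Trace the fig. 10 decision flow for one question (illustrative sketch)."""
    # 1) If existing knowledge suffices, answer directly from prior information.
    if can_answer_from_prior(question):
        return prior_answer(question)
    # 2) Otherwise look for a target tool interface in the pre-built library.
    interface = find_target_interface(question, tool_library)
    if interface is None:
        # No usable tool interface -> fall back to the prior-information answer.
        return prior_answer(question)
    # 3) Successful API call: combine the first answer (interface return)
    # with the second answer (prior) into the final target answer.
    first = call_interface(interface, question)
    second = prior_answer(question)
    return combine(first, second)
```

A `combine` that simply prefers the interface result when present reproduces the "at least one of" behaviour described above.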
For tool calling by language models, the embodiment of the application provides an efficient training framework, in particular the Tool-Drop random sampling strategy adopted when constructing training data samples, which can effectively enhance the generalization of the language model in tool calling and reduce the dependence on a large number of training samples. In addition, enhancing the model's tool-calling generalization lets the model learn to use various tools and external services to solve problems better, so that even problems the model has never encountered before may be solved effectively. Accordingly, the model can better understand and solve problems, giving the operation object a better interaction experience, which matters for user acceptance and for expanding the range of AI applications. Finally, by enabling the model to call and use various tools effectively, a large number of complicated reasoning steps can be omitted from the workflow, improving generation efficiency.
To better illustrate the methods provided by embodiments of the present application, the effects achieved by the prior art and embodiments of the present application are illustrated in conjunction with fig. 9.
The comparison results of the different methods are shown in fig. 9. The left side shows the tool-calling effect of a language model fine-tuned with an existing common training technique: for the Weather (Weather API) tool that the model has seen in the training phase, the model can correctly call the tool and reply to the operation object's question during inference. However, for tools that did not appear during the training phase (e.g., dates, stocks, maps), the model may not answer the associated questions well; that is, it either refuses to answer or generates an incorrect answer.
In contrast, after the language model is trained with the method provided by the embodiment of the application, even for tool APIs such as the Date API, Stock API, and Map API that the model never saw in the training phase, the model can select an appropriate API from the current tool API set (the pre-built tool library) to handle the operation object's request. For example, for an operation object's question ("What is Apple's stock price today?"), the model can select the Stock API, supply the parameters needed for the call, and return a correct answer based on the information the API provides.
As can be seen from the comparison result of the example experiment shown in FIG. 9, the fine tuning strategy provided by the embodiment of the present application can greatly improve the generalization of the model calling tool; meanwhile, the high-efficiency training strategy provided by the embodiment of the application can also greatly reduce the dependence on a large number of training samples, and is beneficial to reducing the cost required for constructing a data set.
It should be noted that, in the alternative embodiments of the present application, the related data (such as data related to sample question information, sample answer information, interface information, etc.) needs to be licensed or agreed upon by the user when the embodiments of the present application are applied to specific products or technologies, and the collection, use and processing of the related data needs to comply with the relevant laws and regulations and standards of the relevant countries and regions. That is, in the embodiments of the present application, if data related to the subject is involved, the data needs to be obtained through the subject authorization consent, and in compliance with relevant laws and regulations and standards of the country and region.
The embodiment of the application provides a training sample generating device based on knowledge questions and answers, as shown in fig. 11, the training sample generating device 100 based on knowledge questions and answers may include: a probability adjustment module 101, a random sampling module 102 and a sample generation module 103.
The probability adjustment module 101 is configured to increase, for any tool interface in the pre-constructed tool library, the probability that the tool interface is selected if the similarity between the interface information of the tool interface and the sample problem information is greater than or equal to a preset threshold; the random sampling module 102 is configured to perform random sampling processing in the pre-constructed tool library based on the probability that each tool interface is selected, obtain interface information of at least one tool interface, and generate a sample tool library based on the interface information of the at least one tool interface; the sample generation module 103 is configured to generate training samples based on the sample problem information and the sample tool library.
In a possible embodiment, the random sampling strategy includes uniformly and randomly sampling the tool interfaces in the pre-built tool library over a preset value interval. When configured to increase the probability that a tool interface is selected, the probability adjustment module 101 is specifically configured to: adjust the sample value interval corresponding to the tool interface under the uniform distribution so that the sample value interval is smaller than the preset value interval.
The random sampling module 102, when configured to perform random sampling processing in the pre-constructed tool library based on the probability that each tool interface is selected and obtain interface information of at least one tool interface, is specifically configured to: based on the value corresponding to each tool interface in the preset value interval or the sample value interval, perform uniform random sampling processing in the pre-constructed tool library to obtain interface information of a number of tool interfaces matching the initial generation type language model to be trained; the number of samples is related to the maximum input length supported by the initial generation type language model.
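One possible reading of this interval-based sampling, sketched under stated assumptions: each tool interface draws a uniform value in [0, 1) and is dropped when the value falls inside its drop sub-interval; shrinking that sub-interval for interfaces whose description is similar to the question raises their selection probability. All thresholds and parameter names below are illustrative, not the patent's exact scheme.

```python
import random

def build_sample_library(tool_library, question, similarity,
                         threshold=0.7, drop=0.5, drop_relevant=0.1,
                         max_tools=8, rng=random):
    """Tool-Drop style sampling sketch: keep each interface with a probability
    that is higher when it is similar to the question, then cap the result at
    the number of interfaces the model's input length supports."""
    sampled = []
    for interface in tool_library:
        # Smaller drop sub-interval for relevant interfaces (higher keep chance).
        d = drop_relevant if similarity(interface, question) >= threshold else drop
        if rng.random() >= d:
            sampled.append(interface)
    return sampled[:max_tools]
```

Passing a seeded or stubbed `rng` makes the sampling reproducible, which is useful when regenerating a training set.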
In a possible embodiment, the interface information of each tool interface included in the pre-built tool library in the probability adjustment module 101 is obtained through the following construction operations:
Generating descriptive sub-information for the content information acquired through the tool interface; the content information is related to a scene to which the problem information belongs;
generating function sub-information aiming at a function which can be called by the tool interface, an incoming parameter type and a returned parameter type;
generating example sub-information aiming at example problems and parameters corresponding to the functions in the tool interface;
interface information of the tool interface is constructed based on the description sub-information, the function sub-information and the example sub-information.
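The four construction operations above can be illustrated with a small record builder; the field names and the Weather API example are assumptions for the sketch, since the text does not prescribe a concrete schema.

```python
def build_interface_info(name, description, functions, examples):
    """description: what content the tool returns (description sub-information);
    functions: callable name -> (input parameter types, return type)
    (function sub-information); examples: example question -> example
    parameters (example sub-information)."""
    return {
        "name": name,
        "description": description,
        "functions": functions,
        "examples": examples,
    }

# Hypothetical tool used only to show the record's shape.
weather_api = build_interface_info(
    name="Weather API",
    description="Returns current weather conditions for a given city.",
    functions={"get_weather": (["city: str"], "dict")},
    examples={"What is the weather in Shenzhen today?": {"city": "Shenzhen"}},
)
```

A record like this is what the sample tool library would hold as "interface information" for each sampled tool interface.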
In a possible embodiment, the sample generation module 103 is specifically configured to, when configured to perform generating a training sample based on the sample problem information and the sample tool library:
if a sample tool interface corresponding to the sample problem information exists in the sample tool library, generating sample interface return information based on information obtained by calling the sample tool interface, and generating sample answer information based on the sample interface return information;
if the sample tool library does not have a sample tool interface corresponding to the sample question information, configuring sample interface return information as a null value, and determining sample answer information based on the null value and the real answer information;
Generating a training sample based on the sample question information, the sample tool library, the sample interface return information, and the sample answer information; the training sample also includes system cues that include information describing the tasks currently required to be performed by the model trained by the training sample and the associated requirements of the model input and output.
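The two branches of sample generation above can be sketched as follows; the callables and the dictionary layout of the resulting training sample are illustrative assumptions, not the patent's concrete format.

```python
def make_training_sample(question, sample_tool_library, system_prompt,
                         find_interface, call_interface, answer_from_return,
                         real_answer):
    """Build one training sample: interface return info comes from an actual
    call when a matching tool interface exists, otherwise it is null and the
    answer falls back to the real answer information."""
    interface = find_interface(question, sample_tool_library)
    if interface is not None:
        return_info = call_interface(interface, question)
        answer = answer_from_return(return_info, question)
    else:
        return_info = None  # sample interface return info configured as null
        answer = real_answer(question)
    return {
        "system_prompt": system_prompt,
        "question": question,
        "tool_library": sample_tool_library,
        "interface_return": return_info,
        "answer": answer,
    }
```

The `system_prompt` field carries the task description and input/output requirements mentioned above.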
The embodiment of the application provides a knowledge question-and-answer based model training device, as shown in fig. 12, the knowledge question-and-answer based model training device 200 may include: a sample acquisition module 201 and a model training module 202.
The sample obtaining module 201 is configured to obtain a training sample pre-constructed by the knowledge question-answering based training sample generating device 100; the training sample includes sample question information, a sample tool library, sample interface return information, and sample answer information. The sample interface return information includes information returned by invoking a tool interface in the sample tool library; the sample answer information includes information determined based on the sample interface return information. The model training module 202 is configured to train an initial generation type language model with the training sample to obtain a target generation type language model for knowledge question answering. When training the initial generation type language model, the model training module 202 is specifically configured to: acquire interface information of a prediction tool interface from the sample tool library based on the sample question information, call the prediction tool interface through the interface information to obtain prediction interface return information, and determine prediction answer information corresponding to the sample question information based on at least one of the prediction interface return information and prior answer information determined through prior information; and determine a target loss value based on the sample answer information and the prediction answer information, and update the initial generation type language model based on the target loss value.
In a possible embodiment, when configured to determine a target loss value based on the sample answer information and the predicted answer information, the model training module 202 is specifically configured to:
for each word ranked second or later in the text sequence of the sample answer information, determine a first maximum likelihood value for that word by combining the information of all words ranked before it;
for each word ranked second or later in the text sequence of the predicted answer information, determine a second maximum likelihood value for that word by combining the information of all words ranked before it;
and determine a loss value for each word based on the first maximum likelihood value and the second maximum likelihood value, and sum all the loss values to obtain the target loss value.
In a possible embodiment, the model training module 202 is specifically configured to, when configured to execute the predicted interface return information obtained by calling the prediction tool interface through the interface information:
determining a corresponding calling function from the function sub-information of the interface information based on the sample problem information;
determining corresponding parameters from the example sub-information of the interface information based on the sample problem information;
Calling the prediction tool interface based on the calling function and the parameter to obtain the return information of the prediction interface; and the prediction interface return information comprises the calling function and returned answer information.
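A hedged sketch of this call sequence: the calling function is picked from the function sub-information, its parameters from the example sub-information, and the prediction interface return information bundles the calling function with the returned answer. The selection callables are stand-ins for the model's learned behaviour.

```python
def call_prediction_interface(interface_info, question,
                              pick_function, pick_params, invoke):
    """interface_info is a record with 'functions' (function sub-information)
    and 'examples' (example sub-information); pick_function / pick_params
    choose the calling function and parameters for the question, and invoke
    performs the actual interface call."""
    func = pick_function(interface_info["functions"], question)
    params = pick_params(interface_info["examples"], question)
    answer = invoke(func, params)
    # Prediction interface return info: the calling function plus the
    # returned answer information.
    return {"function": func, "answer": answer}
```

In the trained model, `pick_function` and `pick_params` correspond to the model emitting a function name and its arguments rather than explicit lookups.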
In a possible embodiment, the model training module 202 is specifically configured to, when executing the obtaining, based on the sample question information, interface information of a prediction tool interface from the sample tool library, call the prediction tool interface through the interface information to obtain prediction interface return information, and determine, based on at least one of the prediction interface return information and prior answer information determined by prior information, prediction answer information corresponding to the sample question information:
if the tool interfaces included in the sample tool library are irrelevant to the sample question information, the initial generation type language model acquires interface information of a prediction tool interface from the sample tool library based on the sample question information, calls the prediction tool interface through the interface information to obtain prediction interface return information, and, upon determining that the prediction interface return information is irrelevant to the sample question information, outputs prediction answer information corresponding to the sample question information based on prior information;
If the sample tool library comprises at least one tool interface corresponding to the sample question information, and the initial generation type language model acquires interface information of a prediction tool interface corresponding to the sample question information from the sample tool library, calling the prediction tool interface through the interface information to obtain prediction interface return information, determining prior answer information based on prior information, and outputting prediction answer information corresponding to the sample question information based on at least one of the prior answer information and the prediction interface return information;
and if the initial generation type language model cannot acquire the interface information of the prediction tool interface corresponding to the sample question information from the sample tool library, determining that the prediction interface return information is null and outputting the prediction answer information corresponding to the sample question information based on prior information.
The embodiment of the application provides a knowledge question-answering device, as shown in fig. 13, the knowledge question-answering device 300 may include: a question acquisition module 301 and an answer output module 302.
The problem obtaining module 301 is configured to obtain target problem information input by the operation object; the answer output module 302 is configured to, through a target generation type language model, if it is determined that a tool interface call is required for the target question information, and a target tool interface corresponding to the target question information exists in a pre-built tool library, obtain interface information of the target tool interface, call a corresponding target tool interface through the interface information, obtain first answer information corresponding to the target question information, determine second answer information corresponding to the target question information based on prior information, and output target answer information corresponding to the target question information based on at least one of the first answer information and the second answer information; the target generation type language model is obtained through training by the knowledge question-answer based model training device provided by the embodiment.
The apparatus of the embodiments of the present application may perform the method provided by the embodiments of the present application, and implementation principles of the method are similar, and actions performed by each module in the apparatus of each embodiment of the present application correspond to steps in the method of each embodiment of the present application, and detailed functional descriptions of each module of the apparatus may be referred to in the corresponding method shown in the foregoing, which is not repeated herein.
In the present embodiment, the term "module" or "unit" refers to a computer program or a part of a computer program having a predetermined function, and works together with other relevant parts to achieve a predetermined object, and may be implemented in whole or in part by using software, hardware (such as a processing circuit or a memory), or a combination thereof. Also, a processor (or multiple processors or memories) may be used to implement one or more modules or units. Furthermore, each module or unit may be part of an overall module or unit that incorporates the functionality of the module or unit.
The modules referred to in the embodiments described in the present application may be implemented by software. The name of the module is not limited to the module itself in some cases, and for example, the sample acquisition module may be also described as a "module for acquiring a pre-constructed training sample", "first module", or the like.
The embodiment of the application provides an electronic device comprising a memory, a processor, and a computer program stored on the memory; the processor executes the computer program to implement the steps of the above training sample generation method, knowledge question-answering based model training method, and knowledge question-answering method. Compared with the related art, the following can be realized:
In a first aspect, an embodiment of the present application provides a knowledge question-answering based training sample generation method. Compared with the prior practice of building training samples for all possible knowledge question-answering scenarios, the embodiment of the present application builds training samples in combination with the tool interfaces that the model can call. In the related art, different tool interfaces are provided for different content, and a corresponding tool library is pre-built to match the callable tool interfaces. For any tool interface in the pre-built tool library, the similarity between the tool interface and the sample question information is considered; if the similarity is greater than or equal to a preset threshold (i.e., the similarity is high, indicating that the tool interface can assist the model to be trained in solving the question), the probability that the tool interface is selected can be increased. Then, random sampling is performed in the constructed tool library based on the probability that each tool interface is selected, so that interface information of at least one tool interface can be obtained, and a sample tool library is generated based on that interface information. Training samples are finally generated based on the sample question information and the sample tool library.
With this method, a random sampling strategy based on the sample question information is used to acquire the interface information of at least one tool interface from the pre-built tool library and generate a sample tool library, from which training samples usable for model training are built. There is no need to build a large number of training samples covering every possible knowledge question-answering scenario, which effectively reduces the cost of building training samples and improves the efficiency of doing so; it also effectively improves the generalization of tool interface calling and reduces the dependence of model training on large numbers of training samples.
In a second aspect, an embodiment of the present application provides a knowledge question-answering based model training method. Specifically, a training sample pre-constructed by the method of the first aspect may be obtained, and an initial generation type language model is trained with the training sample to obtain a target generation type language model for knowledge question answering. A single training sample includes sample question information, a sample tool library, sample interface return information, and sample answer information; the sample tool library includes interface information of tool interfaces obtained from a pre-built tool library based on a random sampling strategy. In the embodiment of the application, during training of the initial generation type language model, the generative model can acquire interface information of a prediction tool interface from the sample tool library based on the sample question information, call the corresponding prediction tool interface through the interface information to obtain prediction interface return information, and determine prediction answer information corresponding to the sample question information based on at least one of the prediction interface return information and prior answer information determined through prior information; a loss value is then determined based on the sample answer information and the prediction answer information, and the generative model is updated based on the loss value.
According to the implementation of the scheme, on the premise that the model can learn the ability to discriminate among tools, the number of training samples that effectively call tool interfaces is increased, thereby accelerating the training and convergence of the model. Since the sample tool library is constructed with a random sampling strategy, the generalization of the trained model with respect to tool calling can be effectively enhanced and the dependence on a large number of training samples reduced; the target generation type language model obtained by training can effectively call various tools, a large number of complex reasoning steps are saved in executing a knowledge question-answering task, and the efficiency of answer generation is improved.
In a third aspect, an embodiment of the present application provides a knowledge question-answering method that uses a target generation type language model trained by the method provided in the second aspect. When target question information input by an operation object is obtained, it is first determined whether a tool needs to be invoked to answer the target question information. If a tool is required and a target tool interface corresponding to the target question information exists in the pre-built tool library, interface information of the target tool interface can be obtained, the corresponding target tool interface is invoked through the interface information to obtain first answer information corresponding to the target question information (i.e., answer information returned by the target tool interface), second answer information corresponding to the target question information is determined based on prior information, and target answer information corresponding to the target question information is output based on at least one of the first answer information and the second answer information. Through the implementation of the scheme, the model can effectively perform tool calling, the application range of the model is expanded, the reasoning steps required to execute the knowledge question-answering task are saved, and the generation efficiency of answer information is improved.
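The inference flow of this aspect can be sketched as follows; all callables (`needs_tool`, `find_interface`, and so on) are hypothetical placeholders for decisions the target generation type language model makes internally:

```python
def answer_question(needs_tool, find_interface, call_interface, prior_answer,
                    question, tool_library):
    if needs_tool(question):
        iface = find_interface(question, tool_library)
        if iface is not None:
            first = call_interface(iface, question)   # first answer information
            second = prior_answer(question)           # second answer information
            # Output target answer based on at least one of the two.
            return first if first is not None else second
    # No tool needed (or no matching interface): answer from prior information.
    return prior_answer(question)
```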
In an alternative embodiment, an electronic device is provided. As shown in fig. 14, the electronic device 4000 includes a processor 4001 and a memory 4003, wherein the processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004, which may be used for data interaction between this electronic device and other electronic devices, such as transmitting and/or receiving data. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that implements computing functionality, e.g., a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 4002 may include a path to transfer information between the aforementioned components. Bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 14, but this does not mean that there is only one bus or one type of bus.
Memory 4003 may be, but is not limited to, a ROM (Read-Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store a computer program and that can be read by a computer.
The memory 4003 is used to store a computer program for executing the embodiments of the present application, and its execution is controlled by the processor 4001. The processor 4001 is configured to execute the computer program stored in the memory 4003 to realize the steps shown in the foregoing method embodiments.
The electronic device includes, but is not limited to, a terminal and a server.
Embodiments of the present application provide a computer readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, may implement the steps and corresponding content of the foregoing method embodiments.
The embodiments of the present application also provide a computer program product, which includes a computer program, where the computer program can implement the steps of the foregoing method embodiments and corresponding content when executed by a processor.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the present application described herein may be implemented in other sequences than those illustrated or otherwise described.
It should be understood that, although the flowcharts of the embodiments of the present application indicate the respective operation steps with arrows, the order in which these steps are implemented is not limited to the order indicated by the arrows. In some implementations of the embodiments of the present application, the steps in the flowcharts may be performed in other orders as required, unless explicitly stated herein. Furthermore, some or all of the steps in the flowcharts may include multiple sub-steps or multiple stages depending on the actual implementation scenario. Some or all of these sub-steps or stages may be performed at the same time, or each may be performed at a different time; where they are performed at different times, their execution order may be flexibly configured as required, which is not limited in the embodiments of the present application.
The foregoing is merely an optional implementation of some application scenarios of the present application. It should be noted that, for those skilled in the art, other similar implementations adopted on the basis of the technical ideas of the present application, without departing from those technical ideas, also fall within the protection scope of the embodiments of the present application.

Claims (15)

1. A knowledge question and answer based training sample generation method, comprising:
for any tool interface in the pre-constructed tool library, if the similarity between the interface information of the tool interface and the sample problem information is greater than or equal to a preset threshold value, increasing the probability that the tool interface is selected;
in the pre-constructed tool library, carrying out random sampling processing based on the selection probability of each tool interface to obtain interface information of at least one tool interface, and generating a sample tool library based on the interface information of the at least one tool interface;
training samples are generated based on the sample problem information and the sample tool library.
2. The method according to claim 1, wherein the random sampling strategy comprises uniform random sampling over a preset value interval, in which the tool interfaces in the pre-built tool library are sampled uniformly at random through the preset value interval;
Said increasing the probability that the tool interface is selected comprises: adjusting the sample value interval of the tool interface corresponding to the uniform distribution so that the sample value interval is smaller than a preset value interval;
and in the pre-constructed tool library, performing random sampling processing based on the probability that each tool interface is selected to obtain interface information of at least one tool interface, including:
based on the value corresponding to each tool interface in the preset value interval or the sample value interval, performing uniform random sampling processing in the pre-constructed tool library to obtain interface information of a number of tool interfaces corresponding to the initial generation type language model to be trained; the number of samples is related to the maximum input length supported by the initial generation type language model.
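A minimal sketch of interval-based sampling in the spirit of claim 2. The claim describes the interval adjustment only abstractly; here a wider sub-interval of [0, 1) simply stands in for a higher selection probability, and all names are illustrative:

```python
import random

def interval_sampling(tool_library, widths, k):
    # Each tool interface owns a sub-interval of [0, 1); adjusting an
    # interface's interval changes its selection probability, and a uniform
    # draw over the whole (preset) interval picks the interface whose
    # sub-interval contains it.
    total = sum(widths)
    bounds, acc = [], 0.0
    for w in widths:
        acc += w / total
        bounds.append(acc)
    chosen = []
    for _ in range(k):              # k would relate to the model's max input length
        u = random.random()         # uniform draw over the preset interval
        for iface, b in zip(tool_library, bounds):
            if u < b:
                chosen.append(iface)
                break
    return chosen
```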
3. The method according to claim 1, wherein the interface information of each tool interface included in the pre-built tool library is obtained by the following building operations:
generating descriptive sub-information for the content information acquired through the tool interface; the content information is related to a scene to which the problem information belongs;
generating function sub-information aiming at a function which can be called by the tool interface, an incoming parameter type and a returned parameter type;
Generating example sub-information aiming at example problems and parameters corresponding to the functions in the tool interface;
interface information of the tool interface is constructed based on the description sub-information, the function sub-information and the example sub-information.
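One might represent claim 3's three kinds of sub-information as a simple record; the dict layout below is purely illustrative, not a format prescribed by the patent:

```python
def build_interface_info(description, functions, examples):
    # description sub-info: what content the tool interface can return
    # function sub-info: callable functions with incoming/returned parameter types
    # example sub-info: example questions and the parameters they map to
    return {"description": description,
            "functions": functions,
            "examples": examples}
```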
4. A method according to any of claims 1-3, wherein the generating training samples based on the sample problem information and the sample tool library comprises:
if a sample tool interface corresponding to the sample problem information exists in the sample tool library, generating sample interface return information based on information obtained by calling the sample tool interface, and generating sample answer information based on the sample interface return information;
if the sample tool library does not have a sample tool interface corresponding to the sample question information, configuring sample interface return information as a null value, and determining sample answer information based on the null value and the real answer information;
generating a training sample based on the sample question information, the sample tool library, the sample interface return information, and the sample answer information; the training sample also includes system cues that include information describing the tasks currently required to be performed by the model trained by the training sample and the associated requirements of the model input and output.
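The branching in claim 4 can be sketched as follows; `find_interface` and `call_interface` are hypothetical placeholders, and in the real method the sample answer would be generated from the return information rather than copied from it verbatim:

```python
def make_training_sample(question, sample_tool_library, real_answer,
                         find_interface, call_interface, system_prompt):
    iface = find_interface(question, sample_tool_library)
    if iface is not None:
        # A matching sample tool interface exists: return info comes from
        # calling it, and the sample answer is generated from that return info.
        returned = call_interface(iface, question)
        answer = returned
    else:
        # No matching interface: return info is configured as a null value,
        # and the answer falls back to the real answer information.
        returned = None
        answer = real_answer
    return {"question": question, "tool_library": sample_tool_library,
            "interface_return": returned, "answer": answer,
            "system_prompt": system_prompt}
```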
5. A knowledge question-and-answer based model training method is characterized by comprising the following steps:
obtaining training samples pre-constructed by the method of any one of claims 1-4, the training samples comprising sample question information, sample tool library, sample interface return information, and sample answer information; the sample interface return information includes information returned based on invoking a tool interface in the sample tool library; the sample answer information includes information determined based on the sample interface return information;
training an initial generation type language model by adopting the training sample to obtain a target generation type language model for knowledge question answering;
the training of the initial generation type language model comprises the following steps: acquiring interface information of a prediction tool interface from the sample tool library based on the sample question information, calling the prediction tool interface through the interface information to obtain prediction interface return information, and determining prediction answer information corresponding to the sample question information based on at least one of the prediction interface return information and prior answer information determined through prior information; and determining a target loss value based on the sample answer information and the predicted answer information, and updating the initial generation type language model based on the target loss value.
6. The method of claim 5, wherein the determining a target loss value based on the sample answer information and the predicted answer information comprises:
for each word ordered in the second position and above in the text sequence of the sample answer information, determining a first maximum likelihood value corresponding to the word by combining all word information ordered before the word;
for each word ordered in the second position and above in the text sequence of the predicted answer information, determining a second maximum likelihood value corresponding to the word by combining all word information ordered before the word;
and determining loss values corresponding to the words based on the first maximum likelihood value and the second maximum likelihood value, and summing all the loss values to obtain a target loss value.
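A toy illustration of the word-level loss in claim 6, assuming the per-word likelihoods (each conditioned on all preceding words) have already been computed. The exact per-word loss form is not specified by the claim, so a log-likelihood difference is used here purely as an example:

```python
import math

def target_loss(sample_likelihoods, predicted_likelihoods):
    # sample_likelihoods[i] / predicted_likelihoods[i]: likelihood of the word
    # ordered in the second position and above, given all preceding words.
    total = 0.0
    for p_sample, p_pred in zip(sample_likelihoods, predicted_likelihoods):
        # Per-word loss from the first and second maximum likelihood values;
        # all per-word losses are summed into the target loss value.
        total += abs(math.log(p_sample) - math.log(p_pred))
    return total
```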
7. The method of claim 5, wherein said calling the predictive tool interface via the interface information to obtain predictive interface return information comprises:
determining a corresponding calling function from the function sub-information of the interface information based on the sample problem information;
determining corresponding parameters from the example sub-information of the interface information based on the sample problem information;
Calling the prediction tool interface based on the calling function and the parameter to obtain the return information of the prediction interface; and the prediction interface return information comprises the calling function and returned answer information.
8. The method of claim 5, wherein the obtaining interface information of a prediction tool interface from the sample tool library based on the sample question information, calling the prediction tool interface through the interface information to obtain prediction interface return information, and determining prediction answer information corresponding to the sample question information based on at least one of the prediction interface return information and a priori answer information determined by a priori information, comprises:
if the tool interface included in the sample tool library is irrelevant to the sample problem information, and the initial generation type language model acquires interface information of a prediction tool interface from the sample tool library based on the sample problem information, calls the prediction tool interface through the interface information to obtain prediction interface return information, and determines that the prediction interface return information is irrelevant to the sample problem information, then outputting prediction answer information corresponding to the sample problem information based on prior information;
If the sample tool library comprises at least one tool interface corresponding to the sample question information, and the initial generation type language model acquires interface information of a prediction tool interface corresponding to the sample question information from the sample tool library, calling the prediction tool interface through the interface information to obtain prediction interface return information, determining prior answer information based on prior information, and outputting prediction answer information corresponding to the sample question information based on at least one of the prior answer information and the prediction interface return information;
and if the initial generation type language model cannot acquire the interface information of the prediction tool interface corresponding to the sample question information from the sample tool library, determining that the prediction interface return information is null and outputting the prediction answer information corresponding to the sample question information based on prior information.
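The three cases of claim 8 can be sketched as a dispatch; every callable below is a hypothetical placeholder for behavior of the initial generation type language model:

```python
def predict_answer(question, tool_library, select_interface, call, is_relevant,
                   prior_answer, combine):
    iface = select_interface(question, tool_library)
    if iface is None:
        # Case 3: no matching interface -> return info is null,
        # answer from prior information only.
        return None, prior_answer(question)
    returned = call(iface, question)
    if not is_relevant(returned, question):
        # Case 1: return info irrelevant to the question -> answer from prior info.
        return returned, prior_answer(question)
    # Case 2: answer based on at least one of prior answer info and return info.
    return returned, combine(prior_answer(question), returned)
```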
9. A knowledge question-answering method, comprising:
acquiring target problem information input by an operation object;
if it is determined through a target generation type language model that a tool interface call is required for the target question information, and a target tool interface corresponding to the target question information exists in a pre-constructed tool library, acquiring interface information of the target tool interface, calling the corresponding target tool interface through the interface information to obtain first answer information corresponding to the target question information, determining second answer information corresponding to the target question information based on prior information, and outputting target answer information corresponding to the target question information based on at least one of the first answer information and the second answer information; the target generation type language model is trained by the method of any one of claims 5-8.
10. A knowledge question and answer based training sample generation device, comprising:
the probability adjustment module is used for aiming at any tool interface in the pre-constructed tool library, and if the similarity between the interface information of the tool interface and the sample problem information is greater than or equal to a preset threshold value, the probability of the tool interface being selected is increased;
the random sampling module is used for carrying out random sampling processing in the pre-constructed tool library based on the selected probability of each tool interface to obtain interface information of at least one tool interface, and generating a sample tool library based on the interface information of the at least one tool interface;
and the sample generation module is used for generating training samples based on the sample problem information and the sample tool library.
11. A knowledge question-and-answer based model training apparatus, comprising:
a sample acquisition module for acquiring training samples pre-constructed by the knowledge question and answer based training sample generation device of claim 10, wherein the training samples comprise sample question information, sample tool library, sample interface return information and sample answer information; the sample interface return information includes information returned based on invoking a tool interface in the sample tool library; the sample answer information includes information determined based on the sample interface return information;
The model training module is used for training an initial generation type language model by adopting the training sample so as to obtain a target generation type language model for knowledge question answering;
the training of the initial generation type language model comprises the following steps: acquiring interface information of a prediction tool interface from the sample tool library based on the sample question information, calling the prediction tool interface through the interface information to obtain prediction interface return information, and determining prediction answer information corresponding to the sample question information based on at least one of the prediction interface return information and prior answer information determined through prior information; and determining a target loss value based on the sample answer information and the predicted answer information, and updating the initial generation type language model based on the target loss value.
12. A knowledge question-answering apparatus, comprising:
the problem acquisition module is used for acquiring target problem information input by the operation object;
the answer output module is used for, if it is determined through a target generation type language model that a tool interface call is required for the target problem information and a target tool interface corresponding to the target problem information exists in a pre-constructed tool library, acquiring interface information of the target tool interface, calling the corresponding target tool interface through the interface information to obtain first answer information corresponding to the target problem information, determining second answer information corresponding to the target problem information based on prior information, and outputting target answer information corresponding to the target problem information based on at least one of the first answer information and the second answer information; the target generation type language model is trained by the knowledge question-and-answer based model training apparatus according to claim 11.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the method of any one of claims 1-9.
14. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1-9.
15. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of any of claims 1-9.
CN202311520323.9A 2023-11-14 2023-11-14 Training sample generation method, model training method, knowledge question-answering method and knowledge question-answering device Pending CN117575008A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311520323.9A CN117575008A (en) 2023-11-14 2023-11-14 Training sample generation method, model training method, knowledge question-answering method and knowledge question-answering device


Publications (1)

Publication Number Publication Date
CN117575008A true CN117575008A (en) 2024-02-20

Family

ID=89887365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311520323.9A Pending CN117575008A (en) 2023-11-14 2023-11-14 Training sample generation method, model training method, knowledge question-answering method and knowledge question-answering device

Country Status (1)

Country Link
CN (1) CN117575008A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117874211A (en) * 2024-03-13 2024-04-12 蒲惠智造科技股份有限公司 Intelligent question-answering method, system, medium and electronic equipment based on SAAS software



Legal Events

Date Code Title Description
PB01 Publication