CN113761136A - Dialogue processing method, information processing method, model training method, information processing apparatus, model training apparatus, and storage medium - Google Patents

Dialogue processing method, information processing method, model training method, information processing apparatus, model training apparatus, and storage medium

Info

Publication number
CN113761136A
CN113761136A (application CN202010489948.3A)
Authority
CN
China
Prior art keywords
reply
candidate set
conversation
model
history
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010489948.3A
Other languages
Chinese (zh)
Inventor
戴音培
孙健
唐呈光
黎航宇
李永彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010489948.3A
Publication of CN113761136A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The embodiments of the present application provide a dialogue processing method, an information processing method, a model training method, devices, and a storage medium. In some embodiments of the present application, an information processing device acquires a current dialogue history, comprising the question of the current turn and the dialogues of historical turns, together with a system reply candidate set; the system reply candidate set and the dialogue history are input into a human-machine collaboration dialogue model pre-trained with a meta-learning training method; the model determines whether a manual reply mode is adopted, and when it determines that the system reply mode is adopted, the information processing device receives a reply to the question of the current turn, selected and output from the system reply candidate set by the model. A human-machine collaboration dialogue model trained with the meta-learning training method achieves higher dialogue accuracy.

Description

Dialogue processing method, information processing method, model training method, information processing apparatus, model training apparatus, and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular to dialogue processing, information processing, and model training methods, devices, and storage media.
Background
The interaction paradigm of the artificial-intelligence era can be summarized in one word: "dialogue" — using the most natural human modes of interaction, whether voice or text, to send instructions to a machine and interact with it.
At present, dialogue models trained with little training data exhibit low dialogue accuracy.
Disclosure of Invention
Aspects of the present application provide dialogue processing, information processing, and model training methods, devices, and storage media to improve the accuracy of a human-machine collaboration dialogue model.
The embodiment of the application provides a model training method, which comprises the following steps:
obtaining a system reply content sample and a dialogue history content sample for the current dialogue;
obtaining a system reply vector for the system reply content and a dialogue state vector for the dialogue history content;
obtaining a reinforcement learning loss function and a cross-entropy loss function of the dialogue model from the system reply vector and the dialogue state vector;
obtaining a joint loss function from the reinforcement learning loss function and the cross-entropy loss function; and
training the network parameters of the model according to the joint loss function, to obtain the dialogue model.
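The joint-loss construction in the steps above can be illustrated with a small sketch. This is an assumption-laden toy, not the patented implementation: the candidate scores, target index, and scalar reward inputs are illustrative choices, with the cross-entropy term scoring the selection of the correct reply and a REINFORCE-style term standing in for the reinforcement learning loss; the two are combined additively.

```python
import math

def softmax(scores):
    # Numerically stable softmax over candidate-reply scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def joint_loss(reply_scores, target_idx, log_prob_action, reward, rl_weight=1.0):
    """Joint loss: cross-entropy over reply candidates plus a
    REINFORCE-style reinforcement-learning term (illustrative form)."""
    probs = softmax(reply_scores)
    ce_loss = -math.log(probs[target_idx])   # supervised selection error
    rl_loss = -reward * log_prob_action      # policy-gradient term
    return ce_loss + rl_weight * rl_loss
```

Setting `rl_weight=0` recovers a purely supervised objective, which makes it easy to check that picking the highest-scoring candidate as the target yields a smaller loss than picking the lowest.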
The embodiments of the present application further provide a dialogue processing method that performs dialogue processing using the above dialogue model, comprising:
receiving a question of the current dialogue;
obtaining, according to the question of the current dialogue, the system reply content and dialogue history content for the current dialogue;
generating a system reply vector for the system reply content and a dialogue state vector for the dialogue history content; and
generating the reply content for the question of the current dialogue from the system reply vector and the dialogue state vector.
An embodiment of the present application further provides an information processing method, including:
acquiring a system reply candidate set and a current dialogue history for dialogue reply, the current dialogue history comprising the user's question of the current turn and the dialogues of historical turns;
inputting the system reply candidate set and the current dialogue history into a pre-trained human-machine collaboration dialogue model, so that the human-machine collaboration dialogue model determines a reply mode for answering the question of the current turn; and
receiving a reply to the question of the current turn, selected and output from the system reply candidate set when the human-machine collaboration dialogue model determines to adopt the system reply mode;
wherein the human-machine collaboration dialogue model is obtained by training on system reply candidate set samples and dialogue history samples with a meta-learning training method.
The embodiment of the present application further provides a model training method, including:
acquiring system reply candidate set samples, dialogue history samples, and target reply samples selected from the system reply candidate set samples; and
performing classification training with a meta-learning method on the system reply candidate set samples, the dialogue history samples, and the target reply samples selected from the system reply candidate set samples, to obtain a human-machine collaboration dialogue model.
An information processing apparatus according to an embodiment of the present application includes: one or more processors and one or more memories storing computer programs;
the one or more processors being configured to execute the computer program to perform:
acquiring a system reply candidate set and a current dialogue history for dialogue reply, the current dialogue history comprising the user's question of the current turn and the dialogues of historical turns;
inputting the system reply candidate set and the dialogue history into a pre-trained human-machine collaboration dialogue model, so that the human-machine collaboration dialogue model determines a reply mode for answering the question of the current turn; and
receiving a reply to the question of the current turn, selected and output from the system reply candidate set when the human-machine collaboration dialogue model determines to adopt the system reply mode;
wherein the human-machine collaboration dialogue model is obtained by training on system reply candidate set samples and dialogue history samples with a meta-learning training method.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program that, when executed by one or more processors, causes the one or more processors to perform actions comprising:
acquiring a system reply candidate set and a current dialogue history for dialogue reply, the current dialogue history comprising the user's question of the current turn and the dialogues of historical turns;
inputting the system reply candidate set and the dialogue history into a pre-trained human-machine collaboration dialogue model, so that the human-machine collaboration dialogue model determines a reply mode for answering the question of the current turn; and
receiving a reply to the question of the current turn, selected and output from the system reply candidate set when the human-machine collaboration dialogue model determines to adopt the system reply mode;
wherein the human-machine collaboration dialogue model is obtained by training on system reply candidate set samples and dialogue history samples with a meta-learning training method.
An embodiment of the present application further provides a model training device, including: one or more processors and one or more memories storing computer programs;
the one or more processors being configured to execute the computer program to perform:
acquiring system reply candidate set samples, dialogue history samples, and target reply samples selected from the system reply candidate set samples; and
performing classification training with a meta-learning method on the system reply candidate set samples, the dialogue history samples, and the target reply samples selected from the system reply candidate set samples, to obtain a human-machine collaboration dialogue model.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program that, when executed by one or more processors, causes the one or more processors to perform actions comprising:
acquiring system reply candidate set samples, dialogue history samples, and target reply samples selected from the system reply candidate set samples; and
performing classification training with a meta-learning method on the system reply candidate set samples, the dialogue history samples, and the target reply samples selected from the system reply candidate set samples, to obtain a human-machine collaboration dialogue model.
An embodiment of the present application further provides an information processing apparatus, including:
an acquisition module, configured to acquire a system reply candidate set and a current dialogue history for dialogue reply, the current dialogue history comprising the user's question of the current turn and the dialogues of historical turns;
an input module, configured to input the system reply candidate set and the dialogue history into a pre-trained human-machine collaboration dialogue model, so that the human-machine collaboration dialogue model determines a reply mode for answering the question of the current turn; and
a receiving module, configured to receive a reply to the question of the current turn, selected and output from the system reply candidate set when the human-machine collaboration dialogue model determines to adopt the system reply mode;
wherein the human-machine collaboration dialogue model is obtained by training on system reply candidate set samples and dialogue history samples with a meta-learning training method.
The embodiment of the present application further provides a model training device, including:
an acquisition module, configured to acquire system reply candidate set samples, dialogue history samples, and target reply samples selected from the system reply candidate set samples; and
a training module, configured to perform classification training with a meta-learning method on the system reply candidate set samples, the dialogue history samples, and the target reply samples selected from the system reply candidate set samples, to obtain a human-machine collaboration dialogue model.
In some exemplary embodiments of the present application, an information processing device acquires a current dialogue history, comprising the question of the current turn and the dialogues of historical turns, together with a system reply candidate set; the system reply candidate set and the dialogue history are input into a human-machine collaboration dialogue model pre-trained with a meta-learning training method; the model determines whether a manual reply mode is adopted, and when it determines that the system reply mode is adopted, the information processing device receives a reply to the question of the current turn, selected and output from the system reply candidate set by the model. A human-machine collaboration dialogue model trained with the meta-learning training method achieves higher dialogue accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1a is a schematic structural diagram of a human-machine collaboration dialogue system 10 according to an exemplary embodiment of the present application;
Fig. 1b is a schematic diagram of the model structure of a human-machine collaboration dialogue model 20 according to an exemplary embodiment of the present application;
Fig. 1c is a schematic diagram of a model training framework according to an exemplary embodiment of the present application;
Fig. 1d is a schematic structural diagram of another human-machine collaboration dialogue system 40 according to an exemplary embodiment of the present application;
Fig. 2 is a flowchart of an information processing method according to an exemplary embodiment of the present application;
Fig. 3 is a schematic flowchart of an information processing method according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of a model training method according to an exemplary embodiment of the present application;
Fig. 5 is a flowchart of a dialogue model training method according to an exemplary embodiment of the present application;
Fig. 6 is a flowchart of a dialogue processing method according to an exemplary embodiment of the present application;
Fig. 7 is a schematic structural diagram of an information processing apparatus according to an exemplary embodiment of the present application;
Fig. 8 is a schematic structural diagram of a model training apparatus according to an exemplary embodiment of the present application;
Fig. 9 is a schematic structural diagram of an information processing device according to an exemplary embodiment of the present application;
Fig. 10 is a schematic structural diagram of a model processing device according to an exemplary embodiment of the present application;
Fig. 11 is a schematic structural diagram of a model processing device according to an exemplary embodiment of the present application;
Fig. 12 is a schematic structural diagram of an information processing device according to an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Currently, to maintain robust online performance of dialogue systems, human-machine collaboration has been introduced into dialogue systems. However, because little training data is used, such dialogue systems have low dialogue accuracy.
To address the technical problem that existing dialogue models have low dialogue accuracy, in some exemplary embodiments of the present application, an information processing device acquires a current dialogue history, comprising the question of the current turn and the dialogues of historical turns, together with a system reply candidate set; the system reply candidate set and the dialogue history are input into a human-machine collaboration dialogue model pre-trained with a meta-learning training method; the model determines whether a manual reply mode is adopted, and when it determines that the system reply mode is adopted, the information processing device receives a reply to the question of the current turn, selected and output from the system reply candidate set by the model. A human-machine collaboration dialogue model trained with the meta-learning training method achieves higher dialogue accuracy.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1a is a schematic structural diagram of a human-machine collaboration dialogue system 10 according to an exemplary embodiment of the present application. As shown in Fig. 1a, the human-machine collaboration dialogue system 10 includes a dialogue device 10a, a server 10b, and a collaboration provider device 10c.
In this embodiment, the dialogue device 10a may have computing, communication, and internet-access functions in addition to its basic service functions; this embodiment does not limit the type of the dialogue device 10a, which may be a personal computer, a mobile phone, a robot, a smart television, a smart speaker, or the like.
In this embodiment, a human-machine collaboration dialogue model is deployed on the server 10b. The embodiments of the present application do not limit the implementation form of the server 10b; for example, the server 10b may be a server device such as a conventional server, a cloud host, or a virtual center, mainly comprising a processor, a hard disk, memory, a system bus, and the like, following a general computer architecture. The server 10b may comprise one web server or a plurality of web servers.
In this embodiment, the collaboration provider device 10c is the terminal device of a human collaborating user. After the server 10b determines that the reply mode for the question of the current turn is the manual reply mode, the collaboration provider device 10c obtains a reply to the question of the current turn in response to the user's input operation. For example, the collaboration provider device 10c may include an electronic display screen through which the user interacts with the device and enters the reply for the current turn; alternatively, the collaboration provider device 10c may include a microphone and obtain the reply in response to the user speaking it.
In this embodiment, the server 10b may be connected to the dialogue device 10a and the collaboration provider device 10c wirelessly or by wire. Optionally, the server 10b may establish a communication connection with the dialogue device 10a and the collaboration provider device 10c using Wi-Fi, Bluetooth, infrared, or similar communication methods, or via a mobile network, whose network standard may be any of 2G (GSM), 2.5G (GPRS), 3G (WCDMA, TD-SCDMA, CDMA2000, UMTS), 4G (LTE), 4G+ (LTE+), WiMax, and the like.
In this embodiment, the server 10b obtains the system reply candidate set and the current dialogue history, processes them, determines a reply mode for answering the question of the current turn, and selects a reply to that question from the system reply candidate set. In an alternative embodiment, the server 10b obtains the system reply candidate set and the current dialogue history and inputs them into the human-machine collaboration dialogue model, which determines the reply mode for the question of the current turn and selects the reply from the system reply candidate set.
In the above-described embodiment, the dialogue device 10a acquires the question of the current turn in response to the user's input operation, including but not limited to the following acquisition modes.
in the first obtaining mode, an interface is displayed on an electronic display screen of the dialogue device 10a, the interface includes a question input item, and the dialogue device 10a responds to an operation of a user inputting a question of a current turn in the question input item to obtain the question of the current turn.
In the second acquisition mode, a microphone provided in the dialogue device 10a captures the question of the current turn spoken by the user.
After the dialogue device 10a acquires the question of the current turn, it sends the question to the server 10b. The server 10b retrieves the dialogues of historical turns and a system reply candidate set for dialogue reply, and inputs the system reply candidate set and the current dialogue history into the pre-trained human-machine collaboration dialogue model, which determines the reply mode for answering the question of the current turn. If the model determines to adopt the system reply mode, it selects and outputs a reply to the question of the current turn from the system reply candidate set, and the server 10b sends that reply to the dialogue device 10a. If the model determines to adopt the manual reply mode, a manual reply request is sent to the collaboration provider device 10c, which obtains a manually entered reply to the question of the current turn and sends it to the dialogue device 10a.
Fig. 1b is a schematic diagram of the model structure of a human-machine collaboration dialogue model 20 according to an exemplary embodiment of the present application. As shown in Fig. 1b, the human-machine collaboration dialogue model 20 includes a reply encoder 201, a history encoder 202, a decider 203, and a predictor 204. The system reply candidate set is input into the reply encoder 201, which vectorizes it to obtain a system reply vector; the current dialogue history is input into the history encoder 202, which vectorizes it to obtain a dialogue state vector. The system reply vector and the dialogue state vector are input into the decider 203, which determines whether to adopt the manual reply mode. If the system reply mode is to be adopted, the system reply vector and dialogue state vector are input into the predictor 204, which selects a reply to the question of the current turn from the system reply candidate set. If the manual reply mode is to be adopted, the server 10b sends a manual reply request to the collaboration provider device 10c, which, upon receiving the request, obtains a manually entered reply to the question of the current turn and sends it to the dialogue device 10a.
In some exemplary embodiments, the current dialogue history is, for example: "User: I want to listen to Taylor's songs. System: What genre do you want to listen to? User: Country music." The system reply candidate set is, for example: "1. API lookup (Taylor, country music); 2. Which star's songs do you want to listen to? 3. What genre do you want to listen to? 4. What can I help you with?" The current dialogue history and the system reply candidate set are input into the human-machine collaboration dialogue model 20: the reply encoder 201 vectorizes the system reply candidate set to obtain a system reply vector, and the history encoder 202 vectorizes the current dialogue history to obtain a dialogue state vector; both vectors are input into the decider 203, which determines whether to adopt the manual reply mode. Here the decider 203 determines that the system reply mode is adopted, the system reply vector and the dialogue state vector are input into the predictor 204, and the predictor 204 selects "1. API lookup (Taylor, country music)" from the system reply candidate set as the reply to the question of the current turn.
In the embodiments of the present application, the system reply candidate set and the current dialogue history are encoded into a system reply vector and a dialogue state vector of the same dimensionality, which speeds up the processing of the human-machine collaboration dialogue model. The two vectors are input into the decider to decide whether to adopt the manual reply mode: if the decider selects the manual reply mode, a human is asked to give the reply to the question of the current turn; if the decider selects the system reply mode, the predictor selects the reply from the system reply candidate set. A dialogue system adopting such a human-machine collaboration dialogue model achieves fast dialogue processing and high dialogue accuracy.
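The four-module flow just described can be sketched as follows. This is a toy stand-in, not the patented model: the hand-rolled bag-of-words `encode` function replaces the neural reply/history encoders, and the decider is a simple score threshold rather than a trained binary classifier.

```python
def encode(text, dim=16):
    """Stand-in encoder: bucket words into a fixed-size bag-of-words vector
    (a real implementation would use a neural reply/history encoder)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return vec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def decide_and_predict(history, candidates, threshold=0.5):
    """Decider + predictor: fall back to a manual reply when no candidate
    matches the dialogue state well enough; otherwise pick the best one."""
    state = encode(history)                               # history encoder
    scores = [dot(state, encode(c)) for c in candidates]  # reply encoder + match
    best = max(range(len(candidates)), key=scores.__getitem__)
    if scores[best] < threshold:
        return ("manual", None)                # decider: manual reply mode
    return ("system", candidates[best])        # predictor: system reply
```

For instance, with history `"country music please"` and candidates `["play country music", "hello"]`, this sketch selects the system reply `"play country music"`, while an unrelated history with no matching candidate falls back to the manual mode.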
In some exemplary embodiments, the human-machine collaboration dialogue model may be trained in advance on a large number of system reply candidate set samples, dialogue history samples, and target reply samples selected from the system reply candidate set samples. In an optional embodiment, system reply candidate set samples, dialogue history samples, and target reply samples selected from the candidate set samples are acquired, and classification training is performed on them with a meta-learning method to obtain the human-machine collaboration dialogue model.
In the above embodiment, the system reply candidate set samples, dialogue history samples, and target reply samples selected from the candidate set samples are acquired as follows. An optional embodiment constructs a plurality of meta-learning tasks, each containing the data set of one domain, and selects the system reply candidate set samples, dialogue history samples, and target reply samples from these meta-learning tasks as training samples. For example, K domains are sampled from the training data set; from each domain, M training examples are sampled as a support set and another M as a query set, forming K meta-learning tasks, from which the training samples are selected — 2·M·K training samples in total. The support set and the query set are both randomly sampled from the training data and are similarly distributed.
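The K-domain, M-sample construction above can be sketched as follows. The domain names and flat example lists are illustrative assumptions; in the patent, each example would be a (system reply candidate set, dialogue history, target reply) triple.

```python
import random

def build_meta_tasks(data_by_domain, k, m, seed=0):
    """Sample k domains; from each, draw m support and m query examples,
    forming k meta-learning tasks (2*m*k training samples in total)."""
    rng = random.Random(seed)
    domains = rng.sample(sorted(data_by_domain), k)
    tasks = []
    for d in domains:
        examples = rng.sample(data_by_domain[d], 2 * m)  # disjoint draw
        tasks.append({"domain": d,
                      "support": examples[:m],
                      "query": examples[m:]})
    return tasks
```

Because both halves come from one draw of 2·m examples per domain, each task's support and query sets are disjoint yet identically distributed, matching the description above.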
In the above embodiment, classification training is performed with a meta-learning method on the system reply candidate set samples, dialogue history samples, and target reply samples selected from the candidate set samples, to obtain the human-machine collaboration dialogue model. In one implementation, these samples serve as input parameters; the joint loss of the error in selecting the correct reply from the candidate set and of whether a manual reply is adopted serves as the objective function; and the reply encoder, history encoder, decider, and predictor are trained simultaneously to obtain the human-machine collaboration dialogue model. The embodiments of the present application use the MAML algorithm to train jointly over the meta-learning tasks and find the most suitable set of model initialization parameters, such that these parameters optimize fastest, on average, over all meta-learning tasks.
Optionally, with the joint loss of the error in selecting the correct reply from the reply candidate set and the error in deciding whether to adopt the manual reply mode as the objective function, the reply encoder, the history encoder, the decider, and the model predictor are trained simultaneously to obtain the human-computer collaboration dialogue model. In an optional embodiment, vectorization is performed on the system reply candidate set samples to obtain system reply vectors; vectorization is performed on the conversation history samples to obtain conversation state vectors; and with the system reply vectors and the conversation state vectors as input parameters and the joint loss described above as the objective function, binary classification training and multi-class classification training are performed simultaneously to obtain the trained reply encoder, history encoder, decider, and predictor. The joint loss function is obtained by adding the loss functions of the decider and the predictor; the MAML algorithm is used to optimize the joint loss function and find the optimal parameters of the human-computer collaboration dialogue model, so that, with less training data, the meta-learning training method gives the human-computer collaboration dialogue model the capability of fast learning.
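A toy sketch of this joint objective — binary cross-entropy for the decider (manual reply or not) added to multi-class cross-entropy for the predictor (which candidate is the correct reply). The function name and the probability inputs are hypothetical; a real implementation would compute these probabilities from the encoded vectors:

```python
import math

def joint_loss(p_manual, manual_label, candidate_probs, target_idx):
    """Joint objective: decider loss (binary cross-entropy on whether the
    manual reply mode should be adopted) plus predictor loss (multi-class
    cross-entropy on which candidate is the correct reply)."""
    decider_loss = -(manual_label * math.log(p_manual)
                     + (1 - manual_label) * math.log(1 - p_manual))
    predictor_loss = -math.log(candidate_probs[target_idx])
    return decider_loss + predictor_loss
```

Minimizing this sum trains the two classification heads simultaneously, matching the "add the loss functions of the decider and the predictor" description above.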
FIG. 1c is a schematic diagram of a model training framework provided in an exemplary embodiment of the present application. As shown in fig. 1c, K domains are sampled from the training data set; for each domain, M training data are sampled as a support set and another M training data are sampled as a query set, forming K meta-learning tasks, each meta-learning task including the data set of one domain. The system reply candidate set samples, the conversation history samples, and the target reply samples selected from the system reply candidate set samples are drawn from the K meta-learning tasks, for a total of 2M×K training samples. With the system reply candidate set sample, the conversation history sample, and the target reply sample selected from the system reply candidate set sample as input parameters, and the joint loss of the error in selecting the correct reply from the reply candidate set and the error in deciding whether to adopt the manual reply mode as the objective function, the reply encoder 301, the history encoder 302, the decider 303, and the model predictor 304 are trained simultaneously to obtain the human-computer collaboration dialogue model.
Fig. 1d is a schematic structural diagram of another human-computer collaboration dialogue system 40 according to an exemplary embodiment of the present application. As shown in fig. 1d, the human-computer collaboration dialogue system 40 includes a dialogue device 40a and a collaboration provider device 40b.
In this embodiment, the dialogue device 40a may have functions of computing, communication, internet access, and the like in addition to its basic service function, and the type of the dialogue device 40a is not limited in this embodiment. The dialogue device 40a may be a personal computer, a mobile phone, a robot, a smart television, a smart speaker, or the like. A human-computer collaboration dialogue model is deployed in the dialogue device 40a.
In this embodiment, the collaboration provider device 40b is a terminal device of the manual collaboration user. After the human-computer collaboration dialogue model in the dialogue device 40a determines that the reply mode for the question of the current turn is the manual reply mode, the collaboration provider device 40b obtains the reply to the question of the current turn in response to the user's input operation. For example, the collaboration provider device 40b includes an electronic display screen through which the user may interact with the collaboration provider device 40b and enter the reply for the current turn; alternatively, the collaboration provider device 40b includes a microphone and obtains the reply to the question of the current turn in response to the user speaking the reply in a voice manner.
In the present embodiment, the connection between the dialogue device 40a and the collaboration provider device 40b may be wireless or wired. Optionally, the dialogue device 40a and the collaboration provider device 40b may establish a communication connection using communication methods such as WiFi, Bluetooth, and infrared. Alternatively, the communication connection between the dialogue device 40a and the collaboration provider device 40b may be established through a mobile network. The network format of the mobile network may be any one of 2G (GSM), 2.5G (GPRS), 3G (WCDMA, TD-SCDMA, CDMA2000, UMTS), 4G (LTE), 4G+ (LTE+), WiMax, and the like.
In this embodiment, the dialogue device 40a obtains the system reply candidate set and the current dialogue history and processes them, determining a reply mode for the question of the current turn and selecting a reply to the question of the current turn from the system reply candidate set. In an alternative embodiment, the dialogue device 40a obtains the system reply candidate set and the current dialogue history and inputs them into the human-computer collaboration dialogue model, which determines the reply mode for the question of the current turn and selects a reply to the question of the current turn from the system reply candidate set.
In the above-described embodiment, the dialogue device 40a acquires the question of the current turn in response to the user's operation of inputting it, including but not limited to the following acquisition modes:
in the first acquisition mode, an interface is displayed on the electronic display screen of the dialogue device 40a; the interface includes a question input item, and the dialogue device 40a acquires the question of the current turn in response to the user entering it in the question input item.
In the second acquisition mode, the microphone provided in the dialogue device 40a collects the current round of questions spoken by the user in a voice manner.
After the dialogue device 40a acquires the question of the current turn, it retrieves the dialogues of the historical turns and the system reply candidate set for dialogue reply, and inputs the system reply candidate set and the current dialogue history into the pre-trained human-computer collaboration dialogue model, which determines the reply mode for the question of the current turn. If the human-computer collaboration dialogue model determines to adopt the system reply mode, a reply to the question of the current turn is selected from the system reply candidate set and output; if the human-computer collaboration dialogue model determines to adopt the manual reply mode, a manual reply request is sent to the collaboration provider device 40b, and the collaboration provider device 40b obtains the manually input reply to the question of the current turn and sends it to the dialogue device 40a.
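The turn-handling logic just described (decide the reply mode, then either select from the candidate set or request a manual reply) can be sketched as follows; the three callables are hypothetical stand-ins for the decider, the predictor, and the request sent to the collaboration provider device:

```python
def handle_turn(decide_manual, predict_reply, request_manual_reply,
                history, candidates):
    """One human-computer collaboration turn: the decider chooses the
    reply mode; a system reply is selected from the candidate set,
    while a manual reply is requested from the provider device."""
    if decide_manual(history, candidates):
        # manual reply mode: forward the request to the provider device
        return request_manual_reply(history)
    # system reply mode: the predictor selects from the candidate set
    return predict_reply(history, candidates)
```

Either branch returns a reply to the question of the current turn, so the caller (the dialogue device) need not know which mode was used.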
In the embodiment of the application, the system reply candidate set and the current conversation history are encoded into a system reply vector and a conversation state vector of the same dimensionality, which speeds up the processing of the human-computer collaboration dialogue model; the vectors are then input into the decider to judge whether the manual reply mode is adopted. If the decider selects the manual reply mode, a request is made for a manual reply to the question of the current turn; if the decider selects the system reply mode, the predictor is used to select the reply to the question of the current turn from the system reply candidate set. Because the conversation system adopts the human-computer collaboration dialogue model, the conversation processing speed is high and the conversation accuracy is high.
In some exemplary embodiments, the human-computer collaboration dialogue model may be trained in advance on a large number of system reply candidate set samples, conversation history samples, and target reply samples selected from the system reply candidate set samples. In an optional embodiment, a system reply candidate set sample, a conversation history sample, and a target reply sample selected from the system reply candidate set sample are obtained, and classification training is performed by a meta-learning method according to these samples to obtain the human-computer collaboration dialogue model.
In the above embodiment, a system reply candidate set sample, a conversation history sample, and a target reply sample selected from the system reply candidate set sample are obtained. In an alternative embodiment, a plurality of meta-learning tasks are constructed, each meta-learning task comprising a data set of a respective domain; and a system reply candidate set sample, a conversation history sample, and a target reply sample selected from the system reply candidate set sample are selected from the plurality of meta-learning tasks as training samples. For example, each meta-learning task contains the data set of one domain; K domains are sampled from the training data set, and for each domain M training data are sampled as a support set and another M training data are sampled as a query set, forming K meta-learning tasks; the system reply candidate set samples, conversation history samples, and target reply samples selected from the system reply candidate set samples are then drawn from the K meta-learning tasks, yielding 2M×K training samples in total. Because the support set and the query set are both randomly sampled from the same training data, their data distributions are similar.
In the above embodiment, classification training is performed by a meta-learning method according to the system reply candidate set sample, the conversation history sample, and the target reply sample selected from the system reply candidate set sample, so as to obtain the human-computer collaboration dialogue model. One implementation takes the system reply candidate set sample, the conversation history sample, and the target reply sample selected from the system reply candidate set sample as input parameters; with the joint loss of the error in selecting the correct reply from the reply candidate set and the error in deciding whether to adopt the manual reply mode as the objective function, the reply encoder, the history encoder, the decider, and the model predictor are trained simultaneously to obtain the human-computer collaboration dialogue model. The embodiment of the application uses the MAML algorithm to jointly train the meta-learning tasks and find the most suitable set of model initialization parameters, so that, on average, these parameters can be optimized fastest across all the meta-learning tasks.
Optionally, with the joint loss of the error in selecting the correct reply from the reply candidate set and the error in deciding whether to adopt the manual reply mode as the objective function, the reply encoder, the history encoder, the decider, and the model predictor are trained simultaneously to obtain the human-computer collaboration dialogue model. In an optional embodiment, vectorization is performed on the system reply candidate set samples to obtain system reply vectors; vectorization is performed on the conversation history samples to obtain conversation state vectors; and with the system reply vectors and the conversation state vectors as input parameters and the joint loss described above as the objective function, binary classification training and multi-class classification training are performed simultaneously to obtain the trained reply encoder, history encoder, decider, and predictor. The joint loss function is obtained by adding the loss functions of the decider and the predictor; the MAML algorithm is used to optimize the joint loss function and find the optimal parameters of the human-computer collaboration dialogue model, so that, with less training data, the meta-learning training method gives the human-computer collaboration dialogue model the capability of fast learning.
In the above system embodiment of the present application, the information processing device obtains a current dialogue history, including the question of the current turn and the dialogues of historical turns, and a system reply candidate set; the system reply candidate set and the dialogue history are input into a human-computer collaboration dialogue model trained in advance by a meta-learning training method; the human-computer collaboration dialogue model determines whether a manual reply mode is adopted, and when it determines that the system reply mode is adopted, the information processing device receives the reply to the question of the current turn that the model selects and outputs from the system reply candidate set. The human-computer collaboration dialogue model trained by the meta-learning training method has higher dialogue accuracy.
In addition to the human-computer collaboration dialogue system provided above, some embodiments of the present application also provide an information processing method; the information processing method provided herein may be applied to, but is not limited to, the human-computer collaboration dialogue system provided above.
Fig. 2 is a flowchart of a method of processing information according to an exemplary embodiment of the present application, where as shown in fig. 2, the method includes the following steps:
s201: acquiring a system reply candidate set and a current conversation history for conversation reply, wherein the current conversation history comprises the user's question of the current turn and the conversations of historical turns;
s202: inputting the system reply candidate set and the current conversation history into a pre-trained human-computer collaboration conversation model, for the human-computer collaboration conversation model to determine a reply mode for replying to the question of the current turn;
s203: receiving the reply to the question of the current turn that the human-computer collaboration conversation model selects and outputs from the system reply candidate set when it determines to adopt the system reply mode; the human-computer collaboration conversation model is obtained by training on system reply candidate set samples and conversation history samples with a meta-learning training method.
In this embodiment, the information processing device that is the execution subject of the method may have functions of computing, communication, internet access, and the like in addition to its basic service function, and this embodiment of the present application does not limit the type of the information processing device. The information processing device may be a personal computer, a mobile phone, a robot, a smart television, a smart speaker, a server, or the like. A server device mainly comprises a processor, a hard disk, a memory, a system bus, and the like, similar to a general-purpose computer architecture. The server may include one web server or a plurality of web servers.
In this embodiment, the collaboration provider device is a terminal device of the manual collaboration user. After the information processing device determines that the reply mode for the question of the current turn is the manual reply mode, the collaboration provider device obtains the reply to the question of the current turn in response to the user's input operation. For example, the collaboration provider device includes an electronic display screen through which the user may interact with it and enter the reply for the current turn; alternatively, the collaboration provider device includes a microphone and obtains the reply to the question of the current turn in response to the user speaking the reply in a voice manner.
In this embodiment, the information processing device acquires the system reply candidate set and the current conversation history and processes them, determining a reply mode for the question of the current turn and selecting a reply to the question of the current turn from the system reply candidate set. In an optional embodiment, the information processing device acquires the system reply candidate set and the current conversation history and inputs them into the human-computer collaboration conversation model, which determines the reply mode for the question of the current turn and selects a reply to the question of the current turn from the system reply candidate set.
In the above-described embodiment, the information processing device acquires the question of the current turn in response to the user's operation of inputting it, including but not limited to the following acquisition modes:
the method comprises the steps that an interface is displayed on an electronic display screen of the information processing equipment, the interface comprises a problem input item, and the information processing equipment responds to the operation that a user inputs the problem of the current turn in the problem input item and obtains the problem of the current turn.
In the second acquisition mode, a microphone provided on the information processing device collects the question of the current turn spoken by the user in a voice manner.
After the information processing device acquires the question of the current turn, it retrieves the dialogues of the historical turns and the system reply candidate set for dialogue reply, and inputs the system reply candidate set and the current dialogue history into the pre-trained human-computer collaboration dialogue model, which determines the reply mode for the question of the current turn. If the human-computer collaboration dialogue model determines to adopt the system reply mode, a reply to the question of the current turn is selected from the system reply candidate set and output; if it determines to adopt the manual reply mode, a manual reply request is sent to the collaboration provider device, and the collaboration provider device obtains the manually input reply to the question of the current turn and sends it to the information processing device.
Fig. 1b is a schematic diagram of a model structure of a human-computer collaboration dialogue model 20 according to an exemplary embodiment of the present application. As shown in fig. 1b, the human-computer collaboration dialogue model 20 includes a reply encoder 201, a history encoder 202, a decider 203, and a predictor 204. The system reply candidate set is input into the reply encoder 201 of the human-computer collaboration dialogue model 20, and the reply encoder vectorizes the system reply candidate set to obtain a system reply vector; the current conversation history is input into the history encoder 202, which vectorizes it to obtain a conversation state vector. The system reply vector and the conversation state vector are input into the decider 203, for the decider 203 to determine whether to adopt the manual reply mode. If the system reply mode is determined, the system reply vector and the conversation state vector are input into the predictor 204, and the predictor 204 selects a reply to the question of the current turn from the system reply candidate set; if the manual reply mode is determined, the server 10b sends a manual reply request to the collaboration provider device 10c, and after receiving the manual reply request, the collaboration provider device 10c obtains the manually input reply to the question of the current turn and sends it to the dialogue device 10a.
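The encoder-predictor part of this pipeline can be illustrated with a deliberately simplified sketch: a bag-of-words encoder stands in for both the reply encoder and the history encoder (producing vectors of the same dimensionality, as the text requires), and a dot product stands in for the predictor's scoring. All names and the vocabulary are assumptions; the patent does not specify the encoder architectures:

```python
def encode(text, vocab):
    """Toy bag-of-words encoder: replies and the dialogue history are
    both mapped into vectors of the same dimensionality (len(vocab))."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def predict_reply(history, candidates, vocab):
    """Predictor sketch: score each candidate reply vector against the
    dialogue state vector by a dot product; return the best candidate."""
    state = encode(history, vocab)
    score = lambda c: sum(a * b for a, b in zip(encode(c, vocab), state))
    return max(candidates, key=score)
```

A real decider would be a binary classifier over the same pair of vectors; here only the system-reply branch is sketched.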
In some exemplary embodiments, the current conversation history is, for example: "User side: I want to listen to Taylor's songs; System side: What type do you want to listen to?; User side: Country music." The system reply candidate set is, for example: "1, API lookup (Taylor, country music); 2, Which star's song do you want to listen to; 3, What type do you want to listen to; 4, What can I help you with". The current conversation history and the system reply candidate set are input into the human-computer collaboration dialogue model 20: the system reply candidate set is input into the reply encoder 201, which vectorizes it to obtain a system reply vector; the current conversation history is input into the history encoder 202, which vectorizes it to obtain a conversation state vector; and the system reply vector and the conversation state vector are input into the decider 203 to determine whether to adopt the manual reply mode. The decider 203 determines that the system reply mode is adopted and inputs the system reply vector and the conversation state vector into the predictor 204, which selects "1, API lookup (Taylor, country music)" from the system reply candidate set as the reply to the question of the current turn.
In the embodiment of the application, the system reply candidate set and the current conversation history are encoded into a system reply vector and a conversation state vector of the same dimensionality, which speeds up the processing of the human-computer collaboration dialogue model; the vectors are then input into the decider to judge whether the manual reply mode is adopted. If the decider selects the manual reply mode, a request is made for a manual reply to the question of the current turn; if the decider selects the system reply mode, the predictor is used to select the reply to the question of the current turn from the system reply candidate set. Because the conversation system adopts the human-computer collaboration dialogue model, the conversation processing speed is high and the conversation accuracy is high.
Based on the method embodiments of the information processing methods described in the foregoing embodiments, fig. 3 is a schematic flow chart of another information processing method provided in the embodiments of the present application. As shown in fig. 3, the method includes:
s301: acquiring a system reply candidate set and a current conversation history for conversation reply, wherein the current conversation history comprises the user's question of the current turn and the conversations of historical turns;
s302: inputting the system reply candidate set and the current conversation history into a pre-trained human-computer collaboration conversation model, which uses the decider to determine whether a manual reply mode is adopted for the question of the current turn; if yes, executing step S303; otherwise, executing step S305;
s303: sending a manual reply request to the cooperation provider device;
s304: receiving a reply of a current round of questions sent by the collaboration provider device;
s305: the human-computer collaboration dialogue model uses the predictor to select and output, from the system reply candidate set, the reply to the question of the current turn; the human-computer collaboration dialogue model is obtained by training on system reply candidate set samples and conversation history samples with a meta-learning training method.
Fig. 4 is a flowchart illustrating a model training method according to an exemplary embodiment of the present disclosure. As shown in fig. 4, the method includes:
s401: acquiring a system reply candidate set sample, a conversation history sample and a target reply sample selected from the system reply candidate set sample;
s402: performing classification training by a meta-learning method according to the system reply candidate set samples, the conversation history samples, and the target reply samples selected from the system reply candidate set samples, to obtain a human-computer collaboration conversation model.
In this embodiment, the model training device that trains the human-computer collaboration dialogue model is the owner of the model and may be a device of the user itself; for example, an enterprise user with a dialogue need may train the human-computer collaboration dialogue model on its own server. In this embodiment, the implementation form of the model training device is not limited; for example, the model training device may be a server device such as a regular server, a cloud host, or a virtual center. A server device mainly comprises a processor, a hard disk, a memory, a system bus, and the like, similar to a general-purpose computer architecture.
In some exemplary embodiments, the human-computer collaboration dialogue model may be trained in advance on a large number of system reply candidate set samples, conversation history samples, and target reply samples selected from the system reply candidate set samples. In an optional embodiment, a system reply candidate set sample, a conversation history sample, and a target reply sample selected from the system reply candidate set sample are obtained, and classification training is performed by a meta-learning method according to these samples to obtain the human-computer collaboration dialogue model.
In the above embodiment, a system reply candidate set sample, a conversation history sample, and a target reply sample selected from the system reply candidate set sample are obtained. In an alternative embodiment, a plurality of meta-learning tasks are constructed, each meta-learning task comprising a data set of a respective domain; and a system reply candidate set sample, a conversation history sample, and a target reply sample selected from the system reply candidate set sample are selected from the plurality of meta-learning tasks as training samples. For example, each meta-learning task contains the data set of one domain; K domains are sampled from the training data set, and for each domain M training data are sampled as a support set and another M training data are sampled as a query set, forming K meta-learning tasks; the system reply candidate set samples, conversation history samples, and target reply samples selected from the system reply candidate set samples are then drawn from the K meta-learning tasks, yielding 2M×K training samples in total. Because the support set and the query set are both randomly sampled from the same training data, their data distributions are similar.
In the above embodiment, classification training is performed by a meta-learning method according to the system reply candidate set sample, the conversation history sample, and the target reply sample selected from the system reply candidate set sample, so as to obtain the human-computer collaboration dialogue model. One implementation takes the system reply candidate set sample, the conversation history sample, and the target reply sample selected from the system reply candidate set sample as input parameters; with the joint loss of the error in selecting the correct reply from the reply candidate set and the error in deciding whether to adopt the manual reply mode as the objective function, the reply encoder, the history encoder, the decider, and the model predictor are trained simultaneously to obtain the human-computer collaboration dialogue model. The embodiment of the application uses the MAML algorithm to jointly train the meta-learning tasks and find the most suitable set of model initialization parameters, so that, on average, these parameters can be optimized fastest across all the meta-learning tasks.
Optionally, with the joint loss of the error in selecting the correct reply from the reply candidate set and the error in deciding whether to adopt the manual reply mode as the objective function, the reply encoder, the history encoder, the decider, and the model predictor are trained simultaneously to obtain the human-computer collaboration dialogue model. In an optional embodiment, vectorization is performed on the system reply candidate set samples to obtain system reply vectors; vectorization is performed on the conversation history samples to obtain conversation state vectors; and with the system reply vectors and the conversation state vectors as input parameters and the joint loss described above as the objective function, binary classification training and multi-class classification training are performed simultaneously to obtain the trained reply encoder, history encoder, decider, and predictor. The joint loss function is obtained by adding the loss functions of the decider and the predictor; the MAML algorithm is used to optimize the joint loss function and find the optimal parameters of the human-computer collaboration dialogue model, so that, with less training data, the meta-learning training method gives the human-computer collaboration dialogue model the capability of fast learning.
FIG. 1c is a schematic diagram of a model training framework provided in an exemplary embodiment of the present application. As shown in fig. 1c, K domains are sampled from the training data set; for each domain, M training data are sampled as a support set and another M training data are sampled as a query set, forming K meta-learning tasks, each meta-learning task including the data set of one domain. The system reply candidate set samples, the conversation history samples, and the target reply samples selected from the system reply candidate set samples are drawn from the K meta-learning tasks, for a total of 2M×K training samples. With the system reply candidate set sample, the conversation history sample, and the target reply sample selected from the system reply candidate set sample as input parameters, and the joint loss of the error in selecting the correct reply from the reply candidate set and the error in deciding whether to adopt the manual reply mode as the objective function, the reply encoder 301, the history encoder 302, the decider 303, and the model predictor 304 are trained simultaneously to obtain the human-computer collaboration dialogue model.
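The MAML-based joint training over the K tasks can be sketched as a single first-order meta-update on a toy scalar model: adapt the shared initialization on each task's support set, evaluate the gradient on the query set at the adapted parameters, and average. The quadratic loss, learning rates, and all names here are illustrative assumptions, not the patent's actual model:

```python
def maml_step(theta, tasks, grad, inner_lr=0.1, outer_lr=0.05):
    """One first-order MAML meta-update: adapt theta on each task's
    support set, then move the shared initialization along the averaged
    query-set gradients evaluated at the adapted parameters."""
    meta_grad = 0.0
    for task in tasks:
        # inner update: one gradient step on the task's support set
        adapted = theta - inner_lr * grad(theta, task["support"])
        # outer contribution: query-set gradient at the adapted point
        meta_grad += grad(adapted, task["query"])
    return theta - outer_lr * meta_grad / len(tasks)
```

Iterating this step searches for the initialization from which every task can be optimized fastest on average, which is the role the text assigns to MAML.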
It should be noted that the execution subjects of the steps of the methods provided in the above embodiments may be the same device, or different devices may be used as the execution subjects of the methods. For example, the execution subjects of steps 301 to 303 may be device a; for another example, the execution subject of steps 301 and 302 may be device a, and the execution subject of step 303 may be device B; and so on.
In addition, in some of the flows described in the above embodiments and the drawings, a plurality of operations are included in a specific order, but it should be clearly understood that the operations may be executed out of the order presented herein or in parallel, and the sequence numbers of the operations, such as 301, 302, etc., are merely used for distinguishing different operations, and the sequence numbers do not represent any execution order per se. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.
In the method embodiments of the application, the information processing device acquires a current conversation history, which includes the question of the current turn and the conversations of the history turns, together with a system reply candidate set; the system reply candidate set and the conversation history are input into a man-machine cooperation dialogue model trained in advance with a meta-learning training method; the model determines whether a manual reply mode is adopted, and when it determines that a system reply mode is adopted, the information processing device receives a reply to the question of the current turn that the model selects and outputs from the system reply candidate set. The man-machine cooperation dialogue model trained by the meta-learning training method has higher dialogue accuracy.
Fig. 5 is a flowchart illustrating a training method of a dialogue model according to an exemplary embodiment of the present application. As shown in fig. 5, the method includes:
s501: obtaining a system reply content sample and a conversation history content sample for a current session;
s502: obtaining a system reply vector of the system reply content and a conversation state vector of the conversation history content;
s503: obtaining a reinforcement learning loss function and a cross entropy loss function of the model according to the system reply vector and the dialogue state vector;
s504: obtaining a combined loss function according to the reinforcement learning loss function and the cross entropy loss function;
s505: and training the network parameters of the neural network model according to the joint loss function to obtain the dialogue model.
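Steps S503-S504 combine a reinforcement learning loss with a cross entropy loss. The text does not spell out the form of the RL term, so the sketch below assumes a simple REINFORCE-style loss for the decider's action; all names are illustrative.

```python
import numpy as np

def training_step_loss(decider_logit, action, reward, predictor_logits, target):
    """S503: RL loss for the decider and cross-entropy loss for the
    predictor; S504: their sum as the joint loss. The REINFORCE form of
    the RL term is an assumption, not taken from the text."""
    p = 1.0 / (1.0 + np.exp(-decider_logit))   # P(adopt manual reply)
    action_prob = p if action == 1 else 1.0 - p
    rl_loss = -reward * np.log(action_prob)    # REINFORCE: -R * log pi(a)
    z = np.exp(predictor_logits - np.max(predictor_logits))
    probs = z / z.sum()
    ce_loss = -np.log(probs[target])           # cross-entropy over candidates
    return rl_loss + ce_loss                   # joint loss (S504)
```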
In this embodiment, the model training device that trains the dialogue model is the owner of the dialogue model and may be a device of the user itself; for example, an enterprise user with dialogue requirements may train the dialogue model on its own server. The implementation form of the model training device is not limited in this embodiment; for example, it may be a server device such as a regular server, a cloud host or a virtual center. Such a server device mainly comprises a processor, a hard disk, a memory, a system bus and the like, following a general computer architecture.
In this embodiment, a reinforcement learning loss function and a cross entropy loss function of the model are obtained according to the system reply vector and the dialogue state vector, and a joint loss function is obtained from the reinforcement learning loss function and the cross entropy loss function.
In this embodiment, the network parameters of the model are trained according to the joint loss function to obtain the dialogue model. In an optional embodiment, the joint loss function is optimized with the MAML algorithm of the meta-learning method and the network parameters of the model are trained to obtain the dialogue model: when the value of the joint loss function reaches its minimum, the network parameters of the model are frozen, yielding the dialogue model.
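The MAML optimization mentioned above alternates an inner-loop adaptation on each task's support set with an outer-loop meta-update on its query set. A minimal first-order sketch, with a generic gradient callback standing in for the joint-loss gradient (all names are assumptions):

```python
import numpy as np

def maml_step(params, tasks, loss_grad, inner_lr=0.01, outer_lr=0.001):
    """One first-order MAML meta-update. `loss_grad(params, batch)` must
    return the gradient of the joint loss on a batch; `tasks` is a list of
    (support_batch, query_batch) pairs. Second-order terms are ignored."""
    meta_grad = np.zeros_like(params)
    for support, query in tasks:
        # Inner loop: one gradient step on the task's support set
        adapted = params - inner_lr * loss_grad(params, support)
        # Outer loop: evaluate the adapted parameters on the query set
        meta_grad += loss_grad(adapted, query)
    # Meta-update of the shared initialization
    return params - outer_lr * meta_grad / len(tasks)
```

For a toy quadratic loss whose gradient is `params - target`, one meta-step nudges the initialization toward the task target.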
In the above embodiments, system reply content samples and conversation history content samples for the current session are obtained. In an optional embodiment, training data of the meta-learning method is obtained, and a plurality of meta-learning tasks are constructed from the training data, each meta-learning task comprising a support set and a query set. For example, each meta-learning task contains the data set of one domain: K domains are sampled from the training data set, and for each domain M training examples are sampled as a support set and another M as a query set, forming K meta-learning tasks; system reply candidate set samples, conversation history samples and target reply samples selected from the system reply candidate set samples are then drawn from the K meta-learning tasks, for a total of 2*M*K training samples. The support set and the query set are randomly sampled from the same training data, so their data distributions are similar.
In the above embodiment, the support sets and query sets of one or more meta-learning tasks are used to train on the system reply candidate set and the dialogue history content; the system reply candidate set is used for generating system reply content according to the input of the current conversation, and the dialogue history content is used for providing the dialogue history of the current session.
The joint loss function is obtained by adding the loss functions of the decider and the predictor and is optimized with the MAML algorithm to find the optimal parameters of the dialogue model; by adopting the meta-learning training method, the dialogue model acquires the ability to learn quickly from little training data.
Fig. 6 is a flowchart illustrating a dialog processing method according to an exemplary embodiment of the present application. As shown in fig. 6, the method includes:
s601: receiving a question of a current conversation;
s602: obtaining, according to the question of the current conversation, system reply content and conversation history content for the current conversation;
s603: generating a system reply vector of the system reply content and a conversation state vector of the conversation history content;
s604: and generating the reply content of the question of the current conversation according to the system reply vector and the conversation state vector.
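Steps S601-S604 can be pictured as the pipeline below: encode the candidate replies and the dialogue history into vectors, let a decider choose between system and manual reply, and let a predictor score the candidates. The encoder, decider and predictor callables are placeholders for the trained networks; every name here is an assumption.

```python
import numpy as np

def answer(question, history, candidates, reply_enc, hist_enc, decider, predictor):
    """S602-S604: build system reply vectors and a dialogue state vector,
    then either hand off to a human or return the best-scoring candidate."""
    reply_vecs = np.stack([reply_enc(c) for c in candidates])  # system reply vectors
    state_vec = hist_enc(history + [question])                 # dialogue state vector
    if decider(reply_vecs, state_vec):                         # True -> manual reply
        return ("manual", None)                                # request a human agent
    scores = predictor(reply_vecs, state_vec)                  # score each candidate
    return ("system", candidates[int(np.argmax(scores))])
```

With toy encoders that pass vectors through unchanged and a dot-product predictor, the candidate most aligned with the dialogue state is returned.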
In this embodiment, the information processing device that serves as the execution subject of the method may, in addition to its basic service functions, have computing, communication, Internet access and other capabilities; this embodiment of the present application does not limit the type of the information processing device. The information processing device may be a personal computer, a mobile phone, a robot, a smart television, a smart sound box, a server and the like. A server device mainly comprises a processor, a hard disk, a memory, a system bus and the like, following a general computer architecture, and may include one web server or a plurality of web servers.
In this embodiment, the information processing device acquires a system reply candidate set and the current conversation history and processes them to determine a reply mode for replying to the question of the current conversation and to select a reply to that question from the system reply candidate set. In an optional embodiment, the information processing device acquires the system reply candidate set and the current conversation history and inputs them into the dialogue model, which determines the reply mode and selects the reply from the system reply candidate set.
In the above-described embodiment, the information processing device acquires the question of the current conversation in response to an operation of the user inputting that question, including but not limited to the following acquisition modes:
In the first acquisition mode, an interface containing a question input item is displayed on an electronic display screen of the information processing device, and the information processing device acquires the question of the current conversation in response to the user entering it in the question input item.
In the second acquisition mode, a microphone arranged on the information processing device collects the question of the current conversation uttered by the user by voice.
After the information processing device acquires the question of the current conversation, it retrieves the conversations of the history turns and a system reply candidate set for conversation reply, and inputs the system reply candidate set and the current conversation history into a dialogue model trained in advance; the dialogue model then determines the reply mode used for replying to the question of the current conversation. If the dialogue model determines that a system reply mode is adopted, a reply to the question of the current conversation is selected from the system reply candidate set and output; if the dialogue model determines that a manual reply mode is adopted, a manual reply request is sent to the collaboration provider device, which obtains the manually input reply to the question of the current conversation and sends it to the information processing device.
In the dialogue model, the system reply candidate set is input into a reply encoder, which vectorizes it to obtain system reply vectors; the current conversation history is input into a history encoder, which vectorizes it to obtain a dialogue state vector; the system reply vectors and the dialogue state vector are input into a decider, which determines whether a manual reply mode is adopted. If a system reply mode is determined, the system reply vectors and the dialogue state vector are input into a predictor, which selects the reply to the question of the current conversation from the system reply candidate set; if a manual reply mode is determined, the server sends a manual reply request to the collaboration provider device and, after the collaboration provider device receives the request, obtains the manually input reply to the question of the current conversation and sends it to the dialogue device.
In some exemplary embodiments, the current dialogue history is, for example: "user side: I want to listen to Taylor's songs; system side: what type do you want to listen to?; user side: country music." The system reply candidate set is, for example: "1, API lookup (Taylor, country music); 2, which star's songs do you want to listen to; 3, what type do you want to listen to; 4, what can I help you with". The current conversation history and the system reply candidate set are input into a dialogue model 20: the system reply candidate set is input into a reply encoder 201 in the dialogue model 20 and vectorized to obtain system reply vectors; the current conversation history is input into a history encoder 202 and vectorized to obtain a dialogue state vector; the system reply vectors and the dialogue state vector are input into the decider 203, which determines whether a manual reply mode is adopted. Here the decider 203 determines that the system reply mode is adopted and inputs the system reply vectors and the dialogue state vector into the predictor 204, which selects "1, API lookup (Taylor, country music)" from the system reply candidate set as the reply to the question of the current conversation.
In the embodiments of the application, the system reply candidate set and the current conversation history are encoded into system reply vectors and a dialogue state vector of the same dimension, which speeds up the processing of the dialogue model; these vectors are then input into the decider to judge whether a manual reply mode is adopted. If the decider selects the manual reply mode, a human is asked to provide the reply to the question of the current conversation; if the decider selects the system reply mode, the predictor selects the reply from the system reply candidate set. The dialogue system of the application thus achieves both high dialogue processing speed and high dialogue accuracy.
Fig. 7 is a schematic structural diagram of an information processing apparatus according to an exemplary embodiment of the present application, and as shown in fig. 7, the information processing apparatus includes an obtaining module 71, an input module 72, and a receiving module 73.
The obtaining module 71 is configured to obtain a system reply candidate set and a current conversation history for conversation reply, where the current conversation history includes questions of a current turn of a user and conversations of a history turn;
the input module 72 is configured to input the system reply candidate set and the current conversation history into a pre-trained human-computer collaboration conversation model, so that the human-computer collaboration conversation model determines a reply mode for replying to a current turn of the question;
a receiving module 73, configured to receive a reply to the problem of the current round, which is selected and output from the system reply candidate set when the human-computer collaboration dialogue model determines to adopt the system reply mode; the human-computer cooperation dialogue model is obtained by training a system reply candidate set sample and a dialogue history sample by adopting a meta-learning training method.
Optionally, the receiving module 73 is configured to receive a reply to the question of the current turn returned by the collaboration provider device when the human-machine collaboration dialogue model determines that the manual reply mode is adopted.
Optionally, when inputting the system reply candidate set and the current conversation history into the pre-trained human-computer collaboration dialogue model so that the model determines the reply mode for replying to the question of the current turn, and when receiving the reply that the model selects and outputs from the system reply candidate set once the system reply mode is determined, the input module 72 is specifically configured to: determine, using the decider in the human-computer collaboration dialogue model, whether a manual reply mode is adopted; and, if the system reply mode is determined, select the reply to the question of the current turn from the system reply candidate set using the predictor and output it.
Optionally, the input module 72 determines whether to use the manual reply mode by using the determiner, specifically to: inputting the system reply candidate set into a reply encoder, and performing vectorization processing on the system reply candidate set by using the reply encoder to obtain a system reply vector; inputting the current conversation history into a history encoder, and performing vectorization processing on the current conversation history by using the history encoder to obtain a conversation state vector; the system reply vector and the dialog state vector are input into a decider for the decider to determine whether to adopt a manual reply mode.
Optionally, the receiving module 73 is configured to send a manual reply request to the cooperative provider device if it is determined that the manual reply mode is adopted, so that the cooperative provider device obtains a reply to the manually input question of the current turn; receiving a reply to the question of the current turn sent by the collaboration provider device.
Fig. 8 is a schematic structural diagram of a model training apparatus according to an exemplary embodiment of the present application. As shown in fig. 8, the model training apparatus includes an acquisition module 81 and a training module 82.
The acquiring module 81 is configured to acquire a system reply candidate set sample, a conversation history sample, and a target reply sample selected from the system reply candidate set sample;
and the training module 82 is used for performing classification training by adopting a meta-learning method according to the system reply candidate set samples, the conversation history samples and the target reply samples selected from the system reply candidate set samples to obtain a man-machine cooperation conversation model.
Optionally, the obtaining module 81 is specifically configured to construct a plurality of meta-learning tasks, where each meta-learning task includes a data set of a corresponding field; and selecting a system reply candidate set sample, a conversation history sample and a target reply sample selected from the system reply candidate set sample from a plurality of meta-learning tasks as training samples.
Optionally, when performing classification training with the meta-learning method according to the system reply candidate set samples, the conversation history samples and the target reply samples selected from the system reply candidate set samples to obtain the human-computer collaboration dialogue model, the training module 82 is specifically configured to: take the system reply candidate set samples, the conversation history samples and the target reply samples selected from the system reply candidate set samples as input parameters; and train the reply encoder, the history encoder, the decider and the model predictor simultaneously, with the joint loss of the error of selecting a correct reply from the reply candidate set and of whether a manual reply is adopted as the objective function, to obtain the human-computer collaboration dialogue model.
Optionally, when training the reply encoder, the history encoder, the decider and the model predictor simultaneously with this joint loss as the objective function to obtain the human-computer collaboration dialogue model, the training module 82 is specifically configured to: vectorize the system reply candidate set samples to obtain system reply vectors; vectorize the conversation history samples to obtain dialogue state vectors; and, taking the system reply vectors and the dialogue state vectors as input parameters, perform binary classification training (whether to adopt a manual reply) and multi-class classification training (selecting the correct reply from the reply candidate set) simultaneously under the joint objective function, to obtain the trained reply encoder, history encoder, decider and predictor.
In the apparatus embodiments of the application, the information processing device acquires a current conversation history, which includes the question of the current turn and the conversations of the history turns, together with a system reply candidate set; the system reply candidate set and the conversation history are input into a human-computer collaboration dialogue model trained in advance with a meta-learning training method; the model determines whether a manual reply mode is adopted, and when it determines that a system reply mode is adopted, the information processing device receives a reply to the question of the current turn that the model selects and outputs from the system reply candidate set. The human-computer collaboration dialogue model trained by the meta-learning training method has higher dialogue accuracy.
Fig. 9 is a schematic structural diagram of an information processing apparatus according to an exemplary embodiment of the present application. As shown in fig. 9, the information processing apparatus includes a memory 901 and a processor 902. In addition, the information processing apparatus includes necessary components such as a power supply component 903 and a communication component 904.
The memory 901 is configured to store a computer program and may further be configured to store various other data supporting operations on the information processing device. Examples of such data include instructions for any application or method operating on the information processing device.
The memory 901 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
A communication component 904 for data transmission with other devices.
The processor 902, which may execute computer instructions stored in the memory 901, is configured to: acquiring a system reply candidate set and a current conversation history for conversation reply, wherein the current conversation history comprises questions of a current turn of a user and conversations of a history turn; inputting the system reply candidate set and the current conversation history into a pre-trained human-computer cooperation conversation model so that the human-computer cooperation conversation model can determine a reply mode for replying the current turn of the problem; receiving replies aiming at the problems of the current round, which are selected and output from the system reply candidate set when the human-computer cooperation dialogue model determines to adopt the system reply mode; the human-computer cooperation dialogue model is obtained by training a system reply candidate set sample and a dialogue history sample by adopting a meta-learning training method.
Optionally, the processor 902 is further configured to: and receiving a reply to the problem of the current turn, which is returned by the cooperation provider device when the man-machine cooperation dialogue model determines to adopt the manual reply mode.
Optionally, when inputting the system reply candidate set and the current conversation history into the pre-trained human-computer collaboration dialogue model so that the model determines the reply mode for replying to the question of the current turn, and when receiving the reply that the model selects and outputs from the system reply candidate set once the system reply mode is determined, the processor 902 is specifically configured to: determine, using the decider in the human-computer collaboration dialogue model, whether a manual reply mode is adopted; if the system reply mode is determined, select the reply to the question of the current turn from the system reply candidate set using the predictor; and output the reply to the question of the current turn.
Optionally, when determining whether to adopt the manual reply mode by using the determiner, the processor 902 is specifically configured to: inputting the system reply candidate set into a reply encoder, and performing vectorization processing on the system reply candidate set by using the reply encoder to obtain a system reply vector; inputting the current conversation history into a history encoder, and performing vectorization processing on the current conversation history by using the history encoder to obtain a conversation state vector; the system reply vector and the dialog state vector are input into a decider for the decider to determine whether to adopt a manual reply mode.
Optionally, the processor 902 is further configured to: if the manual reply mode is determined to be adopted, sending a manual reply request to the cooperation provider equipment so that the cooperation provider equipment can obtain the reply of the manually input current round of problems; receiving a reply to the question of the current turn sent by the collaboration provider device.
Correspondingly, an embodiment of the application also provides a computer-readable storage medium storing a computer program which, when executed by one or more processors, causes the one or more processors to perform the steps in the method embodiment of fig. 2.
Fig. 10 is a schematic structural diagram of a model processing apparatus according to an exemplary embodiment of the present application. As shown in fig. 10, the model processing apparatus includes: a memory 1001 and a processor 1002. In addition, the model processing apparatus includes necessary components such as a power component 1003 and a communication component 1004.
The memory 1001 is configured to store a computer program and may further be configured to store various other data supporting operations on the model processing device. Examples of such data include instructions for any application or method operating on the model processing device.
The memory 1001 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
A communication component 1004 for communicating data with other devices.
The processor 1002, may execute computer instructions stored in the memory 1001 for: acquiring a system reply candidate set sample, a conversation history sample and a target reply sample selected from the system reply candidate set sample; and carrying out classification training by adopting a meta-learning method according to the system reply candidate set samples, the conversation history samples and the target reply samples selected from the system reply candidate set samples to obtain a man-machine cooperation conversation model.
Optionally, when the processor 1002 obtains the system reply candidate set sample, the conversation history sample, and the target reply sample selected from the system reply candidate set sample, it is specifically configured to: constructing a plurality of meta-learning tasks, each meta-learning task comprising a data set of a corresponding domain; and selecting a system reply candidate set sample, a conversation history sample and a target reply sample selected from the system reply candidate set sample from a plurality of meta-learning tasks as training samples.
Optionally, when performing classification training with the meta-learning method according to the system reply candidate set samples, the conversation history samples and the target reply samples selected from the system reply candidate set samples to obtain the human-computer collaboration dialogue model, the processor 1002 is specifically configured to: take the system reply candidate set samples, the conversation history samples and the target reply samples selected from the system reply candidate set samples as input parameters; and train the reply encoder, the history encoder, the decider and the model predictor simultaneously, with the joint loss of the error of selecting a correct reply from the reply candidate set and of whether a manual reply is adopted as the objective function, to obtain the human-computer collaboration dialogue model.
Optionally, when training the reply encoder, the history encoder, the decider and the model predictor simultaneously with this joint loss as the objective function to obtain the human-computer collaboration dialogue model, the processor 1002 is specifically configured to: vectorize the system reply candidate set samples to obtain system reply vectors; vectorize the conversation history samples to obtain dialogue state vectors; and, taking the system reply vectors and the dialogue state vectors as input parameters, perform binary classification training (whether to adopt a manual reply) and multi-class classification training (selecting the correct reply from the reply candidate set) simultaneously under the joint objective function, to obtain the trained reply encoder, history encoder, decider and predictor.
Correspondingly, an embodiment of the application also provides a computer-readable storage medium storing a computer program which, when executed by one or more processors, causes the one or more processors to perform the steps in the method embodiment of fig. 4.
Fig. 11 is a schematic structural diagram of a model processing device according to an exemplary embodiment of the present application. As shown in fig. 11, the model processing apparatus includes: a memory 1101 and a processor 1102. In addition, the model processing device further comprises necessary components such as a power component 1103 and a communication component 1104.
The memory 1101 is configured to store a computer program and may further be configured to store various other data supporting operations on the model processing device. Examples of such data include instructions for any application or method operating on the model processing device.
The memory 1101 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
A communication component 1104 for data transmission with other devices.
The processor 1102 may execute the computer instructions stored in the memory 1101 to: obtain a system reply content sample and a conversation history content sample for the current session; obtain a system reply vector of the system reply content and a dialogue state vector of the conversation history content; obtain a reinforcement learning loss function and a cross entropy loss function of the model according to the system reply vector and the dialogue state vector; obtain a joint loss function from the reinforcement learning loss function and the cross entropy loss function; and train the network parameters of the model according to the joint loss function to obtain the dialogue model.
Optionally, when the processor 1102 trains the network parameters of the model according to the joint loss function to obtain the dialogue model, the processor is specifically configured to: and optimizing the joint loss function by using an MAML algorithm in the meta-learning method, and training network parameters of the model to obtain a dialogue model.
Optionally, the processor 1102 is further configured to: obtain training data of the meta-learning method; and construct a plurality of meta-learning tasks from the training data, each meta-learning task comprising a support set and a query set.
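For illustration only, constructing such tasks from domain-grouped training data might look like the following sketch; the dictionary layout, split sizes, and field names are our assumptions, not part of the embodiment.

```python
import random

def build_meta_tasks(domain_data, support_size=2, query_size=2, seed=0):
    """Construct one meta-learning task per domain, each with a support set
    (used for inner-loop adaptation) and a query set (used for the
    outer-loop meta-update)."""
    rng = random.Random(seed)
    tasks = []
    for domain, dialogues in sorted(domain_data.items()):
        sample = rng.sample(dialogues, support_size + query_size)
        tasks.append({
            "domain": domain,
            "support": sample[:support_size],
            "query": sample[support_size:],
        })
    return tasks
```

Because `random.sample` draws without replacement, the support and query sets of a task never share a dialogue, which keeps the outer-loop evaluation honest.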
Optionally, the processor 1102 is further configured to: train the system reply candidate set and the dialogue history content using the support sets and query sets of one or more meta-learning tasks; the system reply candidate set is used to generate system reply content from the input of the current conversation; the dialogue history content is used to provide a dialogue history for the current session.
Correspondingly, an embodiment of the present application further provides a computer-readable storage medium storing a computer program. When executed by one or more processors, the computer program causes the one or more processors to perform the steps in the method embodiment of fig. 5.
Fig. 12 is a schematic structural diagram of an information processing apparatus according to an exemplary embodiment of the present application. As shown in fig. 12, the information processing apparatus includes a memory 1201 and a processor 1202. In addition, the information processing apparatus includes necessary components such as a power supply component 1203 and a communication component 1204.
The memory 1201 stores a computer program and may also be configured to store various other data to support operations on the information processing device. Examples of such data include instructions for any application or method operating on the information processing device.
The memory 1201 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
A communication component 1204 for data transmission with other devices.
The processor 1202 may execute the computer program stored in the memory 1201 to: receive a question of a current conversation; obtain system reply content and conversation history content for the current conversation according to the question of the current conversation; generate a system reply vector of the system reply content and a conversation state vector of the conversation history content; and generate reply content for the question of the current conversation according to the system reply vector and the conversation state vector.
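The embodiment describes the encoders only at the level of vectors. A toy end-to-end sketch of the inference path, with a deterministic bag-of-bytes embedding standing in for the real reply and history encoders (the encoder, dimensions, and scoring rule are all illustrative):

```python
import numpy as np

def encode(text, dim=16):
    """Stand-in encoder: deterministic bag-of-bytes embedding, L2-normalized.
    A real system would use trained reply/history encoders here."""
    vec = np.zeros(dim)
    for i, b in enumerate(text.encode("utf-8")):
        vec[i % dim] += b
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def generate_reply(question, history, candidates):
    """Encode the dialogue state and every candidate reply, then return the
    candidate whose reply vector best matches the state vector."""
    state = encode(" ".join(history + [question]))            # conversation state vector
    scores = [float(encode(c) @ state) for c in candidates]   # system reply vectors
    return candidates[int(np.argmax(scores))]
```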
Correspondingly, an embodiment of the present application further provides a computer-readable storage medium storing a computer program. When executed by one or more processors, the computer program causes the one or more processors to perform the steps in the method embodiment of fig. 6.
The communication components of figs. 9-12 described above are configured to facilitate wired or wireless communication between the device in which the communication component is located and other devices. The device where the communication component is located can access a wireless network based on a communication standard, such as WiFi, or a 2G, 3G, 4G/LTE, or 5G mobile communication network, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power supply components of fig. 9-12 described above provide power to the various components of the device in which the power supply component is located. The power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component is located.
In the device and storage medium embodiments of the present application, an information processing device obtains a current conversation history, including the question of the current turn and the conversations of historical turns, and a system reply candidate set; the system reply candidate set and the conversation history are input into a human-computer cooperation dialogue model trained in advance with a meta-learning training method; the human-computer cooperation dialogue model determines whether a manual reply mode is adopted, and when it determines that a system reply mode is adopted, the information processing device receives a reply to the question of the current turn that the model selects and outputs from the system reply candidate set. The human-computer cooperation dialogue model trained with the meta-learning training method achieves higher dialogue accuracy.
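The choice between the system reply mode and the manual reply mode can be sketched as a threshold on the decider's confidence. The threshold value, the dot-product scoring rule, and the return convention below are our assumptions; the embodiment only specifies that a decider chooses between the two modes.

```python
import numpy as np

def decide_and_reply(state_vec, candidate_vecs, candidates, threshold=0.5):
    """Human-computer cooperation step: score each candidate reply against the
    dialogue state; if the best score clears the threshold, answer with the
    system reply, otherwise escalate the turn to a human agent."""
    scores = np.array([float(v @ state_vec) for v in candidate_vecs])
    best = int(np.argmax(scores))
    if scores[best] >= threshold:
        return "system", candidates[best]
    return "manual", None   # hand the turn to the collaboration provider
```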
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (18)

1. A method of model training, comprising:
obtaining a system reply content sample and a conversation history content sample for a current session;
obtaining a system reply vector of the system reply content and a conversation state vector of the conversation history content;
obtaining a reinforcement learning loss function and a cross entropy loss function of the model according to the system reply vector and the dialogue state vector;
obtaining a joint loss function according to the reinforcement learning loss function and the cross entropy loss function;
and training the network parameters of the model according to the joint loss function to obtain the dialogue model.
2. The method of claim 1, wherein training network parameters of a model according to the joint loss function to obtain a dialogue model comprises:
and optimizing the joint loss function by using an MAML algorithm in the meta-learning method, and training network parameters of the model to obtain a dialogue model.
3. The method of claim 2, further comprising:
obtaining training data of the meta-learning method;
and constructing a plurality of meta-learning tasks according to the training data, wherein each meta-learning task comprises a support set and a query set.
4. The method of claim 3, further comprising:
replying to candidate set and dialogue history content using the support set and query set training system of the one or more meta-learning tasks;
the system reply candidate set is used for generating system reply content according to the input of the current conversation;
the dialog history content is used to provide a dialog history for the current session.
5. A dialogue processing method that performs dialogue processing using the dialogue model of any one of claims 1 to 4, comprising:
receiving a question of a current conversation;
obtaining system reply content and conversation history content for the current conversation according to the question of the current conversation;
generating a system reply vector of the system reply content and a dialogue state vector of the dialogue historical content;
and generating reply content for the question of the current conversation according to the system reply vector and the dialogue state vector.
6. A conversation processing method, comprising:
acquiring a system reply candidate set and a current conversation history for conversation reply, wherein the current conversation history comprises the user's question in the current turn and the conversations of historical turns;
inputting the system reply candidate set and the current conversation history into a pre-trained human-computer cooperation dialogue model so that the human-computer cooperation dialogue model determines a reply mode for replying to the question of the current turn; and
receiving a reply to the question of the current turn, which is selected and output from the system reply candidate set when the human-computer cooperation dialogue model determines to adopt a system reply mode;
wherein the human-computer cooperation dialogue model is obtained by training with system reply candidate set samples and dialogue history samples using a meta-learning training method.
7. The method of claim 6, further comprising:
receiving a reply to the question of the current turn returned by the collaboration provider device when the human-computer cooperation dialogue model determines to adopt a manual reply mode.
8. The method of claim 6, wherein inputting the system reply candidate set and the current conversation history into the pre-trained human-computer cooperation dialogue model so that the human-computer cooperation dialogue model determines the reply mode for replying to the question of the current turn, and receiving the reply to the question of the current turn selected and output by the human-computer cooperation dialogue model from the system reply candidate set when the system reply mode is determined to be adopted, comprises:
determining, by using a decider in the human-computer cooperation dialogue model, whether a manual reply mode is adopted;
if the system reply mode is determined to be adopted, selecting, by using a predictor, the reply to the question of the current turn from the system reply candidate set; and
outputting the reply to the question of the current turn.
9. The method of claim 8, wherein determining, by using the decider, whether the manual reply mode is adopted comprises:
inputting the system reply candidate set into a reply encoder, and performing vectorization processing on the system reply candidate set by using the reply encoder to obtain a system reply vector;
inputting the current conversation history into a history encoder, and performing vectorization processing on the current conversation history by using the history encoder to obtain a conversation state vector;
and inputting the system reply vector and the dialogue state vector into a decider, so that the decider can determine whether a manual reply mode is adopted.
10. The method of claim 8, further comprising:
if the manual reply mode is determined to be adopted, sending a manual reply request to the collaboration provider device so that the collaboration provider device can obtain a manually input reply to the question of the current turn; and
receiving a reply to the question of the current turn sent by the collaboration provider device.
11. A method of model training, comprising:
acquiring a system reply candidate set sample, a conversation history sample and a target reply sample selected from the system reply candidate set sample;
and carrying out classification training by adopting a meta-learning method according to the system reply candidate set samples, the conversation history samples and the target reply samples selected from the system reply candidate set samples to obtain a man-machine cooperation conversation model.
12. The method of claim 11, wherein obtaining the system reply candidate set sample, the conversation history sample, and the selected target reply sample from the system reply candidate set sample comprises:
constructing a plurality of meta-learning tasks, each meta-learning task comprising a data set of a corresponding domain;
and selecting a system reply candidate set sample, a conversation history sample and a target reply sample selected from the system reply candidate set sample from the plurality of meta-learning tasks as training samples.
13. The method of claim 11, wherein the obtaining of the human-computer collaboration dialogue model by performing classification training using a meta-learning method according to the system reply candidate set samples, the dialogue history samples, and the target reply samples selected from the system reply candidate set samples comprises:
taking the system reply candidate set sample, the conversation history sample and a target reply sample selected from the system reply candidate set sample as input parameters;
and taking as an objective function the joint loss of the error of selecting the correct reply from the reply candidate set and of whether a manual reply is adopted, and simultaneously training a reply encoder, a history encoder, a decider and a predictor to obtain the human-computer cooperation dialogue model.
14. The method of claim 13, wherein taking as the objective function the joint loss of the error of selecting the correct reply from the reply candidate set and of whether a manual reply is adopted, and simultaneously training the reply encoder, the history encoder, the decider and the predictor to obtain the human-computer cooperation dialogue model, comprises:
vectorizing a system reply candidate set sample to obtain a system reply vector;
vectorizing the dialogue history sample to obtain a dialogue state vector;
and taking the system reply vector and the dialogue state vector as input parameters, and taking as the objective function the joint loss of the error of selecting the correct reply from the reply candidate set and of whether a manual reply is adopted, simultaneously performing binary-classification training and multi-classification training to obtain the trained reply encoder, history encoder, decider and predictor.
15. An information processing apparatus characterized by comprising: one or more processors and one or more memories storing computer programs;
the one or more processors to execute the computer program to:
acquiring a system reply candidate set and a current conversation history for conversation reply, wherein the current conversation history comprises the user's question in the current turn and the conversations of historical turns;
inputting the system reply candidate set and the current conversation history into a pre-trained human-computer cooperation dialogue model so that the human-computer cooperation dialogue model determines a reply mode for replying to the question of the current turn; and
receiving a reply to the question of the current turn, which is selected and output from the system reply candidate set when the human-computer cooperation dialogue model determines to adopt a system reply mode;
wherein the human-computer cooperation dialogue model is obtained by training with system reply candidate set samples and dialogue history samples using a meta-learning training method.
16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by one or more processors, causes the one or more processors to perform acts comprising:
acquiring a system reply candidate set and a current conversation history for conversation reply, wherein the current conversation history comprises the user's question in the current turn and the conversations of historical turns;
inputting the system reply candidate set and the current conversation history into a pre-trained human-computer cooperation dialogue model so that the human-computer cooperation dialogue model determines a reply mode for replying to the question of the current turn; and
receiving a reply to the question of the current turn, which is selected and output from the system reply candidate set when the human-computer cooperation dialogue model determines to adopt a system reply mode;
wherein the human-computer cooperation dialogue model is obtained by training with system reply candidate set samples and dialogue history samples using a meta-learning training method.
17. A model training apparatus, comprising: one or more processors and one or more memories storing computer programs;
the one or more processors to execute the computer program to:
acquiring a system reply candidate set sample, a conversation history sample and a target reply sample selected from the system reply candidate set sample;
and carrying out classification training by adopting a meta-learning method according to the system reply candidate set samples, the conversation history samples and the target reply samples selected from the system reply candidate set samples to obtain a man-machine cooperation conversation model.
18. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by one or more processors, causes the one or more processors to perform acts comprising:
acquiring a system reply candidate set sample, a conversation history sample and a target reply sample selected from the system reply candidate set sample;
and carrying out classification training by adopting a meta-learning method according to the system reply candidate set samples, the conversation history samples and the target reply samples selected from the system reply candidate set samples to obtain a man-machine cooperation conversation model.
CN202010489948.3A 2020-06-02 2020-06-02 Dialogue processing method, information processing method, model training method, information processing apparatus, model training apparatus, and storage medium Pending CN113761136A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010489948.3A CN113761136A (en) 2020-06-02 2020-06-02 Dialogue processing method, information processing method, model training method, information processing apparatus, model training apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010489948.3A CN113761136A (en) 2020-06-02 2020-06-02 Dialogue processing method, information processing method, model training method, information processing apparatus, model training apparatus, and storage medium

Publications (1)

Publication Number Publication Date
CN113761136A true CN113761136A (en) 2021-12-07

Family

ID=78782873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010489948.3A Pending CN113761136A (en) 2020-06-02 2020-06-02 Dialogue processing method, information processing method, model training method, information processing apparatus, model training apparatus, and storage medium

Country Status (1)

Country Link
CN (1) CN113761136A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925192A (en) * 2022-07-21 2022-08-19 北京聆心智能科技有限公司 Man-machine collaborative dialogue method, device, equipment and storage medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368524A (en) * 2017-06-07 2017-11-21 阿里巴巴集团控股有限公司 One kind dialogue generation method, device and electronic equipment
WO2018058994A1 (en) * 2016-09-30 2018-04-05 华为技术有限公司 Dialogue method, apparatus and device based on deep learning
WO2018196684A1 (en) * 2017-04-24 2018-11-01 北京京东尚科信息技术有限公司 Method and device for generating conversational robot
CN109086329A (en) * 2018-06-29 2018-12-25 出门问问信息科技有限公司 Dialogue method and device are taken turns in progress based on topic keyword guidance more
WO2019046463A1 (en) * 2017-08-29 2019-03-07 Zhoa Tiancheng System and method for defining dialog intents and building zero-shot intent recognition models
CN109460463A (en) * 2018-11-15 2019-03-12 平安科技(深圳)有限公司 Model training method, device, terminal and storage medium based on data processing
CN109977212A (en) * 2019-03-28 2019-07-05 清华大学深圳研究生院 Talk with the reply content generation method and terminal device of robot
CN110188331A (en) * 2019-06-03 2019-08-30 腾讯科技(深圳)有限公司 Model training method, conversational system evaluation method, device, equipment and storage medium
WO2019174450A1 (en) * 2018-03-15 2019-09-19 北京京东尚科信息技术有限公司 Dialogue generation method and apparatus
CN110347792A (en) * 2019-06-25 2019-10-18 腾讯科技(深圳)有限公司 Talk with generation method and device, storage medium, electronic equipment
CN110704703A (en) * 2019-09-27 2020-01-17 北京百度网讯科技有限公司 Man-machine conversation method and device
CN110874401A (en) * 2018-08-31 2020-03-10 阿里巴巴集团控股有限公司 Information processing method, model training method, device, terminal and computing equipment
CN111128175A (en) * 2020-01-19 2020-05-08 大连即时智能科技有限公司 Spoken language dialogue management method and system
CN111160514A (en) * 2020-04-01 2020-05-15 支付宝(杭州)信息技术有限公司 Conversation method and system
CN111177359A (en) * 2020-04-10 2020-05-19 支付宝(杭州)信息技术有限公司 Multi-turn dialogue method and device

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018058994A1 (en) * 2016-09-30 2018-04-05 华为技术有限公司 Dialogue method, apparatus and device based on deep learning
WO2018196684A1 (en) * 2017-04-24 2018-11-01 北京京东尚科信息技术有限公司 Method and device for generating conversational robot
CN107368524A (en) * 2017-06-07 2017-11-21 阿里巴巴集团控股有限公司 One kind dialogue generation method, device and electronic equipment
WO2019046463A1 (en) * 2017-08-29 2019-03-07 Zhoa Tiancheng System and method for defining dialog intents and building zero-shot intent recognition models
WO2019174450A1 (en) * 2018-03-15 2019-09-19 北京京东尚科信息技术有限公司 Dialogue generation method and apparatus
CN109086329A (en) * 2018-06-29 2018-12-25 出门问问信息科技有限公司 Dialogue method and device are taken turns in progress based on topic keyword guidance more
CN110874401A (en) * 2018-08-31 2020-03-10 阿里巴巴集团控股有限公司 Information processing method, model training method, device, terminal and computing equipment
CN109460463A (en) * 2018-11-15 2019-03-12 平安科技(深圳)有限公司 Model training method, device, terminal and storage medium based on data processing
CN109977212A (en) * 2019-03-28 2019-07-05 清华大学深圳研究生院 Talk with the reply content generation method and terminal device of robot
CN110188331A (en) * 2019-06-03 2019-08-30 腾讯科技(深圳)有限公司 Model training method, conversational system evaluation method, device, equipment and storage medium
CN110347792A (en) * 2019-06-25 2019-10-18 腾讯科技(深圳)有限公司 Talk with generation method and device, storage medium, electronic equipment
CN110704703A (en) * 2019-09-27 2020-01-17 北京百度网讯科技有限公司 Man-machine conversation method and device
CN111128175A (en) * 2020-01-19 2020-05-08 大连即时智能科技有限公司 Spoken language dialogue management method and system
CN111160514A (en) * 2020-04-01 2020-05-15 支付宝(杭州)信息技术有限公司 Conversation method and system
CN111177359A (en) * 2020-04-10 2020-05-19 支付宝(杭州)信息技术有限公司 Multi-turn dialogue method and device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
岳世峰; 林政; 王伟平; 孟丹: "A Survey of Intelligent Reply Systems" (智能回复系统研究综述), Journal of Cyber Security (信息安全学报), no. 01, pages 24 - 38 *
杨成彪; 吕荣荣; 吴刚: "An Intent Recognition Method for Multi-Turn Dialogue Based on Memory Networks" (一种基于记忆网络的多轮对话下的意图识别方法), Electronic Technology & Software Engineering (电子技术与软件工程), no. 10, pages 210 - 211 *
赵宇晴; 向阳: "Dialogue Generation with Deep Reinforcement Learning Based on Hierarchical Encoding" (基于分层编码的深度增强学习对话生成), Journal of Computer Applications (计算机应用), no. 10, pages 3 - 4 *
陶晓峰; 吕朋朋; 缪平; 娄保东: "Research on a Work-Order Collection Model Based on Deep Neural Networks" (基于深度神经网络的工单采集模型研究), Automation & Instrumentation (自动化与仪器仪表), no. 02, pages 45 - 48 *
黄毅; 冯俊兰; 胡珉; 吴晓婷; 杜晓宇: "Architecture and Applications of Intelligent Dialogue Systems in the 5G Era" (5G背景下的智能对话系统架构及应用), Telecom Engineering Technics and Standardization (电信工程技术与标准化), no. 01, pages 48 - 55 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925192A (en) * 2022-07-21 2022-08-19 北京聆心智能科技有限公司 Man-machine collaborative dialogue method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109616108B (en) Multi-turn dialogue interaction processing method and device, electronic equipment and storage medium
CN109145123B (en) Knowledge graph model construction method, intelligent interaction method and system and electronic equipment
US9626962B2 (en) Method and apparatus for recognizing speech, and method and apparatus for generating noise-speech recognition model
CN107623614A (en) Method and apparatus for pushed information
US9148741B2 (en) Action generation based on voice data
CN105264485A (en) Providing content on multiple devices
KR20160106075A (en) Method and device for identifying a piece of music in an audio stream
CN111159358A (en) Multi-intention recognition training and using method and device
CN112735418A (en) Voice interaction processing method and device, terminal and storage medium
CN111243604B (en) Training method for speaker recognition neural network model supporting multiple awakening words, speaker recognition method and system
CN111339282A (en) Intelligent online response method and intelligent customer service system
AU2022201193A1 (en) System and method for designing artificial intelligence (ai) based hierarchical multi-conversation system
CN113761136A (en) Dialogue processing method, information processing method, model training method, information processing apparatus, model training apparatus, and storage medium
CN116863935B (en) Speech recognition method, device, electronic equipment and computer readable medium
KR102357620B1 (en) Chatbot integration agent platform system and service method thereof
CN116052646B (en) Speech recognition method, device, storage medium and computer equipment
CN111427444B (en) Control method and device of intelligent device
CN111435411A (en) Named body type identification method and device and electronic equipment
CN113204623A (en) Question answering method and device
CN112562688A (en) Voice transcription method, device, recording pen and storage medium
CN117474084B (en) Bidirectional iteration method, equipment and medium for pre-training model and downstream sequence task
CN114610752B (en) Reply generation and model training method, device and equipment based on form question answering
WO2023019517A1 (en) Instruction recommendation method and apparatus
CN111105795B (en) Method and device for training offline voice firmware of smart home
CN114416928A (en) Text determination method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination