CN113450786A

CN113450786A - Network model obtaining method, information processing method, device and electronic equipment

Info

Publication number: CN113450786A
Application number: CN202010218120.4A
Authority: CN
Inventors: 计峰; 彭舒科; 崔少波; 陈海青
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2020-03-25
Filing date: 2020-03-25
Publication date: 2021-09-28

Abstract

The embodiments of the application provide a network model obtaining method, a network model processing method, an information processing method, a device and electronic equipment. The network model obtaining method comprises the following steps: first, obtaining original text information relating to the multi-dialog scene field; then, obtaining a plurality of strengthened decision network models according to that original text information; and finally, based on that original text information, carrying out knowledge distillation on the dialog strategies of the plurality of strengthened decision network models to obtain a target simplified decision network model. The target simplified decision network model obtained in this way can directly obtain, from original text information relating to the multi-dialog scene field, the target text information that relates to the multi-dialog scene field and is used for replying to that original text information. This solves the problems that existing methods for processing task-type dialogs in the multi-dialog scene field produce inaccurate results and involve complex decisions.

Description

Network model obtaining method, information processing method, device and electronic equipment

Technical Field

The application relates to the technical field of computers, and in particular to a network model obtaining method, two network model processing methods, corresponding devices and electronic equipment. The application also relates to an information processing method, an information processing device, electronic equipment, voice processing equipment, video processing equipment and vehicle-mounted voice processing equipment.

Background

Natural language processing is an important direction in the fields of computer science and artificial intelligence; it studies the theories and methods that enable effective communication between people and computers in natural language. Taking the application of natural language understanding in task-type dialog scenes between humans and machines as an example, current human-machine dialog systems can use natural language understanding to obtain reply information that responds to the user's intention.

In existing task-type dialog scenes, human-machine dialog in a single dialog scene field is mainly handled with natural language processing technology and neural networks, and good results have been obtained there. However, the approach used for a single dialog scene field may no longer be suitable for dialog tasks relating to the multi-dialog scene field. Specifically, if a neural network designed for a single dialog scene field is used to process task-type dialogs relating to the multi-dialog scene field, the slots must still be manually defined, and the slot space required by a dialog task relating to the multi-dialog scene field is far larger than that required by a dialog task relating to a single dialog scene field. This large slot space reduces the accuracy of the processing result and makes the decision complex.

Disclosure of Invention

The embodiment of the application provides a network model obtaining method, which is used for solving the problems that in the prior art, a processing result obtained by a task type processing method in the field of multi-conversation scenes is inaccurate and decision making is complex.

The embodiment of the application provides a network model obtaining method, which comprises the following steps:

obtaining original text information relating to the field of multi-dialog scenes;

obtaining a plurality of strengthened decision network models according to the original text information relating to the multi-dialog scene field; each strengthened decision network model in the plurality of strengthened decision network models is used for obtaining target text information relating to a single dialog scene field through dialog strategy learning according to original text information relating to that single dialog scene field; the original text information relating to the single dialog scene field is obtained by dividing the original text information relating to the multi-dialog scene field by dialog scene field;

based on the original text information relating to the multi-dialog scene field, carrying out knowledge distillation on the dialog strategies of the plurality of strengthened decision network models to obtain a target simplified decision network model; the target simplified decision network model is used for obtaining target text information relating to the multi-dialog scene field according to original text information relating to the multi-dialog scene field; the target text information is text information used for replying to the original text information in the dialog scene.

Optionally, obtaining a plurality of enhanced decision network models according to the original text information related to the multi-dialog scene field includes:

performing dialog scene field recognition on the original text information relating to the multi-dialog scene field to obtain a plurality of dialog scene field information corresponding to the original text information relating to the multi-dialog scene;

and obtaining the plurality of strengthened decision network models according to the plurality of dialogue scene field information.
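As an illustration of the dialog scene field recognition step, the following is a minimal sketch only. The application does not state how recognition is performed; the keyword table `DOMAIN_KEYWORDS`, the domain names and the function `recognize_domains` are all hypothetical, and a real system would more likely use a trained classifier.

```python
import re
from typing import Dict, List, Set

# Hypothetical keyword table: one entry per dialog scene field (domain).
DOMAIN_KEYWORDS: Dict[str, Set[str]] = {
    "flight": {"ticket", "flight", "fly", "flies"},
    "hotel": {"hotel", "room"},
}


def recognize_domains(text: str) -> List[str]:
    """Return the sorted list of domains whose keywords occur in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(d for d, keywords in DOMAIN_KEYWORDS.items() if tokens & keywords)
```

A question that mentions both an air ticket and a hotel yields two pieces of dialog scene field information, one per domain.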

Optionally, the obtaining the plurality of enhanced decision network models according to the information of the plurality of dialog scene domains includes:

and obtaining the plurality of strengthened decision network models according to the plurality of dialogue scene field information and the corresponding relation between the dialogue scene field information and the strengthened decision network models.
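The correspondence between dialog scene field information and strengthened decision network models can be pictured as a simple lookup table. The sketch below assumes that each model is exposed as a callable from text to text; `MODEL_REGISTRY`, `flight_teacher`, `hotel_teacher` and `select_models` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins for trained single-domain strengthened decision models.
def flight_teacher(text: str) -> str:
    return "flight-reply: " + text


def hotel_teacher(text: str) -> str:
    return "hotel-reply: " + text


# Correspondence between dialog scene field information and models.
MODEL_REGISTRY: Dict[str, Callable[[str], str]] = {
    "flight": flight_teacher,
    "hotel": hotel_teacher,
}


def select_models(domains: List[str]) -> List[Callable[[str], str]]:
    """Return one model per recognized domain; fail loudly on unknown domains."""
    missing = [d for d in domains if d not in MODEL_REGISTRY]
    if missing:
        raise KeyError(f"no model registered for domains: {missing}")
    return [MODEL_REGISTRY[d] for d in domains]
```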

Optionally, the target text information related to the field of single dialog scenes is obtained as follows:

encoding the original text information related to the field of the single dialog scene to obtain a dialog state vector related to the field of the single dialog scene;

carrying out dialog strategy learning on the dialog state vector relating to the single dialog scene field to obtain a dialog reply action vector relating to the single dialog scene field;

and decoding the dialog reply action vector relating to the field of the single dialog scene to obtain the target text information relating to the field of the single dialog scene.
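The three stages above (encode, dialog strategy learning, decode) can be sketched as a composition of three functions. This is an assumed structure for illustration, not the application's implementation; the class name `TeacherModel` and its field names are hypothetical, and each stage is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class TeacherModel:
    """Single-domain model built from three caller-supplied stages."""

    encode: Callable[[str], Vector]     # original text -> dialog state vector
    policy: Callable[[Vector], Vector]  # state vector -> dialog reply action vector
    decode: Callable[[Vector], str]     # action vector -> target text

    def reply(self, text: str) -> str:
        state = self.encode(text)
        action = self.policy(state)
        return self.decode(action)
```

Keeping the stages separate makes the intermediate dialog reply action vector available, which matters later because the distillation step uses both the action vectors and the output text.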

Optionally, the method further includes: obtaining manual annotation information for the original text information relating to the single dialog scene field, and vectorizing the manual annotation information to obtain a manual annotation information vector; the manual annotation information comprises database pointer information and dialog belief state information, wherein the database pointer information is used for obtaining the target text information relating to the single dialog scene field;

the encoding the original text information related to the field of the single dialog scene to obtain the dialog state vector related to the field of the single dialog scene comprises:

and carrying out vector splicing (concatenation) on the encoded original data vector relating to the single dialog scene field and the manual annotation information vector to obtain the dialog state vector relating to the single dialog scene field.
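Vector splicing here amounts to concatenation. The sketch below assumes that the annotation vector is itself the concatenation of a database pointer vector and a belief state vector; the function name `build_state_vector`, the ordering of the parts and the example dimensions are assumptions.

```python
from typing import List, Sequence


def build_state_vector(
    utterance_vec: Sequence[float],
    db_pointer_vec: Sequence[float],
    belief_state_vec: Sequence[float],
) -> List[float]:
    """Concatenate the encoded utterance with the annotation vector."""
    # Annotation vector: database pointer followed by belief state (assumed order).
    annotation_vec = list(db_pointer_vec) + list(belief_state_vec)
    return list(utterance_vec) + annotation_vec
```

The resulting dimension is the sum of the three input dimensions, so downstream layers must be sized accordingly.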

Optionally, the performing dialog strategy learning on the dialog state vector relating to the single dialog scene field to obtain a dialog reply action vector relating to the single dialog scene field includes:

taking the dialog state vector relating to the single dialog scene field as input data of a dialog strategy learning module in the strengthened decision network model, obtaining an output result of the dialog strategy learning module, and taking the output result of the dialog strategy learning module as the dialog reply action vector relating to the single dialog scene field; the dialog strategy learning module is a network model constructed based on a hyperbolic tangent (tanh) function.
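A network built on a tangent-type activation can be illustrated with a single fully connected layer that applies tanh. The application does not give the layer structure, so the single-layer form, the function name `policy_layer` and its parameters are assumptions made for this sketch.

```python
import math
from typing import List, Sequence


def policy_layer(
    state: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Compute tanh(W @ state + b), one output per weight row."""
    if len(weights) != len(bias):
        raise ValueError("weights and bias must have the same number of rows")
    action = []
    for row, b in zip(weights, bias):
        if len(row) != len(state):
            raise ValueError("each weight row must match the state dimension")
        pre_activation = sum(w * s for w, s in zip(row, state)) + b
        action.append(math.tanh(pre_activation))
    return action
```

Because tanh maps every real input into the open interval (-1, 1), each component of the resulting action vector is bounded.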

Optionally, the decoding the dialog reply action vector relating to the field of single dialog scenes to obtain the target text information relating to the field of single dialog scenes includes:

and taking the dialog reply action vector relating to the field of the single dialog scene as input data of a decoding module in the enhanced decision network model, obtaining text information output by the decoding module, and confirming the text information output by the decoding module as the target text information relating to the field of the single dialog scene.

Optionally, the method further includes: correspondingly inputting a plurality of original text messages related to the field of the single dialog scene into the plurality of reinforced decision network models respectively to obtain a plurality of target text messages related to the field of the single dialog scene;

the knowledge distillation is carried out on the conversation strategies of the multiple reinforced decision network models based on the original text information related to the multi-conversation scene field to obtain a target simplified decision network model, and the method comprises the following steps:

and based on the original text information related to the multiple conversation scene fields, carrying out knowledge distillation on the conversation strategies of the conversation strategy learning modules in the multiple reinforced decision network models and the target text information of the multiple single conversation scene fields to obtain the target simplified decision network model.

Optionally, the obtaining the target streamlined decision network model by performing knowledge distillation on the dialog strategy of the dialog strategy learning module in the multiple enhanced decision network models and the target text information of the multiple single dialog scene fields based on the original text information relating to the multiple dialog scene fields includes:

encoding the original text information related to the multi-dialog scene field to obtain a dialog state vector related to the multi-dialog scene field;

performing dialogue strategy learning on the dialogue state vectors relating to the multi-field dialogue scene, and obtaining dialogue reply action vectors relating to the multi-field dialogue scene field according to the dialogue reply action vectors relating to the single-dialogue scene field;

decoding the dialog reply action vectors relating to the multi-dialog scene field, and obtaining target text information relating to the multi-dialog scene field according to the target text information relating to the single-dialog scene field;

and obtaining the target simplified decision network model according to the original text information related to the multi-dialog scene field and the target text information related to the multi-dialog scene field.

Optionally, the encoding the original text information related to the multi-dialog scene domain to obtain a dialog state vector related to the multi-dialog scene domain includes:

obtaining a plurality of vocabulary vector information related to the field of single dialog scenes according to the original text information related to the field of the multi-dialog scenes; and obtaining a dialog state vector relating to the multi-dialog scene field according to the plurality of vocabulary vector information relating to the single-dialog scene field.
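One possible reading of this encoding step, offered as a sketch only, is that a vocabulary vector is computed per single dialog scene field and the per-field vectors are then joined into one state vector. The per-domain vocabularies in `DOMAIN_VOCAB`, the bag-of-words counting and the choice of concatenation are all assumptions.

```python
from typing import Dict, List

# Hypothetical per-domain vocabularies.
DOMAIN_VOCAB: Dict[str, List[str]] = {
    "flight": ["ticket", "fly", "cheap"],
    "hotel": ["hotel", "room", "late"],
}


def domain_vocab_vector(tokens: List[str], vocab: List[str]) -> List[float]:
    """Count how often each vocabulary word occurs in the token list."""
    return [float(tokens.count(word)) for word in vocab]


def multi_domain_state(text: str) -> List[float]:
    """Concatenate per-domain vocabulary vectors in sorted domain order."""
    tokens = text.lower().split()
    state: List[float] = []
    for domain in sorted(DOMAIN_VOCAB):
        state.extend(domain_vocab_vector(tokens, DOMAIN_VOCAB[domain]))
    return state
```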

Optionally, the obtaining the target simplified decision network model according to the original text information relating to the multi-dialog scene field and the target text information relating to the multi-dialog scene field includes:

and optimizing an initial simplified decision network model according to the original text information relating to the multi-dialog scene field and the target text information relating to the multi-dialog scene field, and taking the optimized simplified decision network model as the target simplified decision network model.
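The optimization of the initial model against the outputs of several single-domain models can be illustrated with a toy distillation loop. Everything here is an assumption made for the sketch: the student is a linear map with a scalar output, the teachers are plain functions, the loss is mean squared error on the action value, and the names `collect_teacher_targets` and `distill` are hypothetical. The application does not specify the loss or the optimizer.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def collect_teacher_targets(
    teachers: Dict[str, Callable[[Vector], float]],
    states_per_domain: Dict[str, List[Vector]],
) -> List[Tuple[Vector, float]]:
    """Pair each state with the action produced by its own domain's teacher."""
    pairs = []
    for domain, teacher in teachers.items():
        for state in states_per_domain[domain]:
            pairs.append((state, teacher(state)))
    return pairs


def distill(
    pairs: List[Tuple[Vector, float]], dim: int, lr: float = 0.1, steps: int = 500
) -> Vector:
    """Fit linear student weights to the teacher actions by gradient descent on MSE."""
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for state, target in pairs:
            err = sum(wi * si for wi, si in zip(w, state)) - target
            for i, si in enumerate(state):
                grad[i] += 2.0 * err * si / len(pairs)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w
```

In this toy setting a single student reproduces the behaviour of both teachers because each teacher only acts on its own part of the state vector.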

Optionally, the method further includes:

judging whether to stop optimizing the initial simplified decision network model according to a first difference degree between the plurality of dialog reply action vectors relating to the single dialog scene fields and the dialog reply action vector relating to the multi-dialog scene field, and a second difference degree between the plurality of pieces of target text information relating to the single dialog scene fields and the target text information relating to the multi-dialog scene field;

and if the first difference degree meets a preset first threshold condition and the second difference degree meets a preset second threshold condition, stopping optimizing the initial simplified decision network model, and taking the optimized simplified decision network model as the target simplified decision network model.
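The stopping test can be sketched as follows. The application does not define the two difference degrees or the form of the threshold conditions; this sketch assumes mean squared difference for action vectors, the fraction of non-identical replies for text, and "not greater than the threshold" as the condition. All function names are hypothetical.

```python
from typing import Sequence


def mean_squared_difference(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def first_difference(teacher_actions, student_actions) -> float:
    """Average action-vector difference over all (teacher, student) pairs."""
    if len(teacher_actions) != len(student_actions) or not teacher_actions:
        raise ValueError("action lists must be non-empty and of equal length")
    total = sum(
        mean_squared_difference(t, s) for t, s in zip(teacher_actions, student_actions)
    )
    return total / len(teacher_actions)


def second_difference(teacher_texts: Sequence[str], student_texts: Sequence[str]) -> float:
    """Fraction of replies where the student text differs from the teacher text."""
    if len(teacher_texts) != len(student_texts) or not teacher_texts:
        raise ValueError("text lists must be non-empty and of equal length")
    return sum(t != s for t, s in zip(teacher_texts, student_texts)) / len(teacher_texts)


def should_stop(teacher_actions, student_actions, teacher_texts, student_texts,
                first_threshold: float, second_threshold: float) -> bool:
    """Stop only when both difference degrees meet their threshold conditions."""
    return (
        first_difference(teacher_actions, student_actions) <= first_threshold
        and second_difference(teacher_texts, student_texts) <= second_threshold
    )
```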

The embodiment of the application provides a network model processing method, which comprises the following steps:

obtaining a plurality of original text messages related to the field of single dialog scenes;

respectively obtaining a plurality of strengthened decision network models corresponding to a plurality of dialogue scene fields;

respectively based on each reinforced decision network model, carrying out dialogue strategy learning on the corresponding original text information related to the field of single dialogue scenes;

carrying out knowledge distillation, with respect to original text information relating to the multi-dialog scene field, on the dialog strategies learned by the plurality of strengthened decision network models through the dialog strategy learning; wherein the original text information relating to the multi-dialog scene field is combined information of the plurality of pieces of original text information relating to the single dialog scene fields.

Optionally, the method further includes: obtaining a plurality of target text information which is respectively output by the plurality of strengthened decision network models after the conversation strategy learning and relates to the field of single conversation scenes;

performing knowledge distillation on the plurality of pieces of target text information relating to the single dialog scene fields with respect to the original text information relating to the multi-dialog scene field;

wherein each strengthened decision network model is used for obtaining target text information relating to a single dialog scene field through dialog strategy learning according to original text information relating to that single dialog scene field; the target text information is text information used for replying to the original text information in the dialog scene.

obtaining dialogue strategy knowledge distillation information of a plurality of strengthened decision network models corresponding to a plurality of dialogue scene fields;

obtaining a target simplified decision network model based on the original text information related to the multi-conversation scene field and the conversation strategy knowledge distillation information;

the target simplifying decision network model is used for obtaining target text information related to the multi-conversation scene field according to original text information related to the multi-conversation scene field; the target text information is text information used for replying the original text information in the dialogue scene in the field of the dialogue scene.

Optionally, the method further includes: obtaining knowledge distillation information of the plurality of pieces of target text information relating to the single dialog scene fields output by the plurality of strengthened decision network models;

and obtaining the target simplified decision network model based on the knowledge distillation information of the target text information relating to the single dialog scene fields.

The embodiment of the present application correspondingly provides a network model obtaining apparatus, including:

an original text information obtaining unit, configured to obtain original text information relating to the multi-dialog scene field;

a strengthened decision network model obtaining unit, configured to obtain a plurality of strengthened decision network models according to the original text information relating to the multi-dialog scene field; each strengthened decision network model in the plurality of strengthened decision network models is used for obtaining target text information relating to a single dialog scene field through dialog strategy learning according to original text information relating to that single dialog scene field; the original text information relating to the single dialog scene field is obtained by dividing the original text information relating to the multi-dialog scene field by dialog scene field;

a target simplified decision network model obtaining unit, configured to perform knowledge distillation on the dialog strategies of the plurality of strengthened decision network models based on the original text information relating to the multi-dialog scene field, so as to obtain a target simplified decision network model; the target simplified decision network model is used for obtaining target text information relating to the multi-dialog scene field according to original text information relating to the multi-dialog scene field; the target text information is text information used for replying to the original text information in the dialog scene.

The embodiment of the present application correspondingly provides a network model processing apparatus, including:

an original text information obtaining unit, configured to obtain a plurality of pieces of original text information relating to single dialog scene fields;

a strengthened decision network model obtaining unit, configured to respectively obtain a plurality of strengthened decision network models corresponding to a plurality of dialog scene fields;

a learning unit, configured to perform dialog strategy learning on the corresponding original text information relating to the single dialog scene field based on each strengthened decision network model;

a knowledge distillation unit, configured to perform knowledge distillation, with respect to original text information relating to the multi-dialog scene field, on the dialog strategies learned by the plurality of strengthened decision network models through the dialog strategy learning; wherein the original text information relating to the multi-dialog scene field is combined information of the plurality of pieces of original text information relating to the single dialog scene fields.

a knowledge distillation information obtaining unit, configured to obtain dialog strategy knowledge distillation information of a plurality of strengthened decision network models corresponding to a plurality of dialog scene fields;

a target simplified decision network model obtaining unit, configured to obtain a target simplified decision network model based on the original text information relating to the multi-dialog scene field and the dialog strategy knowledge distillation information.

An embodiment of the present application further provides an information processing method, including:

taking original text information relating to the multi-dialog scene field as input information of a pre-obtained target simplified decision network model to obtain target text information relating to the multi-dialog scene field; wherein the target text information relating to the multi-dialog scene field is used for replying to the original text information relating to the multi-dialog scene field in the dialog scene.

The embodiment of the present application further provides an information processing apparatus, which includes:

a target text information obtaining unit, configured to take original text information relating to the multi-dialog scene field as input information of a pre-obtained target simplified decision network model to obtain target text information relating to the multi-dialog scene field; wherein the target text information relating to the multi-dialog scene field is used for replying to the original text information relating to the multi-dialog scene field in the dialog scene.

An embodiment of the present application further provides a voice processing device, including: a voice acquisition module, a voice-to-text module, an information processing module and a response module;

the voice acquisition module is used for acquiring voice information relating to the multi-dialog scene field;

the voice-to-text module is used for converting the voice information relating to the multi-dialog scene field into original text information relating to the multi-dialog scene field;

the information processing module is used for taking the original text information relating to the multi-dialog scene field as input information of a pre-obtained target simplified decision network model to obtain target text information relating to the multi-dialog scene field; wherein the target text information relating to the multi-dialog scene field is used for replying to the original text information relating to the multi-dialog scene field in the dialog scene;

and the response module is used for converting the target text information relating to the multi-dialog scene field into target voice information and playing the target voice information.
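The module chain of the voice processing device can be sketched as a composition of three callables. This is an assumed structure only: `SpeechProcessor` and its field names are hypothetical, real speech recognition and synthesis components would replace the stand-ins, and audio is represented as raw bytes for simplicity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechProcessor:
    """Chains voice-to-text, the simplified decision model and the response module."""

    speech_to_text: Callable[[bytes], str]   # voice-to-text module
    model: Callable[[str], str]              # information processing module
    text_to_speech: Callable[[str], bytes]   # response module

    def respond(self, audio: bytes) -> bytes:
        original_text = self.speech_to_text(audio)
        target_text = self.model(original_text)
        return self.text_to_speech(target_text)
```

The video and vehicle-mounted variants described in the application differ only in the last stage, which would return video information or an action instead of audio.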

An embodiment of the present application further provides a video processing device, including: a voice acquisition module, a voice-to-text module, an information processing module and a video response module;

and the video response module is used for converting the target text information relating to the field of the multi-conversation scene into target video information and displaying the target video information.

An embodiment of the present application further provides a vehicle-mounted voice processing device, including: a vehicle-mounted voice acquisition module, a voice-to-text module, an information processing module and a response module;

the vehicle-mounted voice acquisition module is used for acquiring voice information related to the field of multi-conversation scenes;

the voice-to-text module is used for converting the voice information relating to the field of the multi-dialog scene into original text information relating to the field of the multi-dialog scene;

the information processing module is used for taking the original text information relating to the multi-conversation scene field as input information of a pre-obtained target simplification decision network model to obtain target text information relating to the multi-conversation scene field; wherein the target text information related to the multi-dialog scene field is used for responding to the original text information related to the multi-dialog scene field in the dialog scene;

and the response module is used for converting the target text information relating to the field of the multi-conversation scene into action information and executing corresponding actions.

An embodiment of the present application further provides an electronic device, including:

a processor;

a memory for storing a computer program which, when executed by the processor, performs the methods described in the above embodiments of the network model obtaining method, the network model processing method and the information processing method.

The embodiment of the present application further provides a computer storage medium, where a computer program is stored, and the computer program is run by a processor to execute the methods described in the above network model obtaining method embodiment, network model processing method embodiment, and information processing method embodiment.

Compared with the prior art, the method has the following advantages:

the embodiment of the application provides a network model obtaining method, which comprises the following steps: obtaining original text information relating to the multi-dialog scene field; obtaining a plurality of strengthened decision network models according to that original text information, wherein each strengthened decision network model is used for obtaining target text information relating to a single dialog scene field through dialog strategy learning according to original text information relating to that single dialog scene field, and the original text information relating to the single dialog scene field is obtained by dividing the original text information relating to the multi-dialog scene field by dialog scene field; and performing knowledge distillation on the dialog strategies of the plurality of strengthened decision network models based on the original text information relating to the multi-dialog scene field to obtain a target simplified decision network model, wherein the target simplified decision network model is used for obtaining target text information relating to the multi-dialog scene field according to original text information relating to the multi-dialog scene field, and the target text information is text information used for replying to the original text information in the dialog scene. With the target simplified decision network model obtained by the embodiment of the application, the target text information that relates to the multi-dialog scene field and is used for replying to the original text information relating to the multi-dialog scene field can be obtained directly from that original text information. This solves the problems that existing methods for processing task-type dialogs in the multi-dialog scene field produce inaccurate results and involve complex decisions.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them.

Fig. 1 is a flowchart of a network model obtaining method according to a first embodiment of the present application.

Fig. 2 is a block diagram of a teacher model provided in the first embodiment of the present application.

Fig. 3 is a flowchart for obtaining target text information related to the field of single dialog scenes according to the first embodiment of the present application.

Fig. 4 is a flowchart of obtaining a target simplified decision network model according to the first embodiment of the present application.

Fig. 5 is a structural diagram of a student model provided in the first embodiment of the present application.

Fig. 6 is a block diagram illustrating knowledge distillation performed by a plurality of teacher models to student models according to the first embodiment of the present application.

Fig. 7 is a schematic diagram of a network model obtaining apparatus according to a fourth embodiment of the present application.

Fig. 8 is a flowchart of an information processing method according to a seventh embodiment of the present application.

Fig. 9 is a schematic diagram of an information processing apparatus according to an eighth embodiment of the present application.

Fig. 10 is a schematic view of an application scenario of an information processing method according to a ninth embodiment of the present application.

Fig. 11 is a schematic view of an electronic device according to a twelfth embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the application can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit; the application is therefore not limited to the specific implementations disclosed below.

The application provides a network model obtaining method, two network model processing methods, corresponding devices and electronic equipment. The application also relates to an information processing method, an information processing device, electronic equipment, voice processing equipment, video processing equipment and vehicle-mounted voice processing equipment, which are described in the following embodiments.

Fig. 1 is a flowchart of a network model obtaining method according to a first embodiment of the present application, where the method includes the following steps.

Step S101: original text information relating to the field of multi-dialog scenarios is obtained.

This embodiment obtains a network model. As a prerequisite, training sample data for training the network model must first be obtained; the original text information related to the multi-dialog scene field serves as this training sample data. The network model finally obtained in this embodiment is a simplified decision network model. It can be used in the multi-dialog scene field to obtain, according to original text information related to the multi-dialog scene field, target text information related to the multi-dialog scene field that replies to that original text information.

For example, in a dialog scene between a user and a voice interaction system, suppose the user raises the following question: "I want to book a cheap air ticket from place A to place B, and to see whether there is a suitable hotel if the time is too late." On receiving the question, the voice interaction system replies to it. The question raised by the user is treated as original text information relating to the dialog scene, and the information with which the voice interaction system replies is treated as target text information relating to the dialog scene. Because the question involves both airline reservation and hotel inquiry, it is a multi-domain question within the dialog scene, that is, a question relating to the multi-dialog scene field: it involves the airline reservation field and the hotel inquiry field.

In the prior art, however, task-oriented dialog between a user and a voice interaction system is generally processed for questions relating to a single-dialog scene field. That is, the voice interaction system can reply to only one single-field question at a time: of the questions above, it can reply either to the airline reservation question or to the hotel inquiry question, but it cannot answer both well at the same time. The main reason is that, if a question relating to the multi-dialog scene field is processed directly according to the prior art, the resulting reply is inaccurate and the decision-making involved in the reply is complex.

The main purpose of this embodiment is to train a network model that can directly obtain, according to original text information related to the multi-dialog scene field, target text information related to the multi-dialog scene field for replying to that original text information. The network model obtained by training is called the simplified decision network model.

Some of the concepts involved in this embodiment are explained below. The original text information related to the multi-dialog scene field refers to question information covering multiple fields that a user puts to the voice interaction system in a task-oriented dialog scene. For example, the airline reservation and hotel inquiry described above together form a question relating to two fields.

The enhanced decision network model is a network model that can obtain, through dialog strategy learning, target text information related to the single-dialog scene field according to original text information related to the single-dialog scene field. The original text information related to the single-dialog scene field refers to question information covering a single field that a user puts to the voice interaction system in a task-oriented dialog scene; for example, the airline reservation and the hotel inquiry described above are each a question relating to a single field. The target text information related to the single-dialog scene field refers to answer information covering a single field with which the voice interaction system responds to the user in the task-oriented dialog scene. Dialog strategy learning refers to learning the decision-making mode involved in obtaining the answer to a question, and adopts dialog strategies related to natural language understanding technology.

In a task-oriented dialog scene, the decision-making mode differs from one dialog scene field to another, so each dialog scene field has its own enhanced decision network model. The enhanced decision network model can obtain relatively accurate target text information for original text information in a task-oriented single-dialog scene field. For example, when the user asks for a cheap air ticket from place A to place B, the voice interaction system can obtain reply text information such as: "Dear customer, there is a special-price ticket for only XXX yuan; the flight number is XXX."

When original text information related to the multi-dialog scene field is processed, the enhanced decision network models of the several dialog scene fields are all needed. However, the decision-making inside each enhanced decision network model is already complex, which makes decision-making across multiple enhanced decision network models even more difficult and complex.

Therefore, a simplified decision network model is needed, that is, a network model capable of obtaining target text information related to the multi-dialog scene field directly from original text information related to the multi-dialog scene field. The simplified decision network model is obtained from the plurality of enhanced decision network models by training: knowledge distillation is performed on the dialog strategies in the plurality of enhanced decision network models to obtain the target simplified decision network model. Here, the plurality of enhanced decision network models can be regarded as Teacher models and the simplified decision network model as a Student model, and knowledge distillation is performed from the Teacher models to obtain the Student model. The knowledge distillation in this part refers to strategy knowledge distillation, at the strategy knowledge level, from the plurality of Teacher models to the Student model.

Knowledge distillation is a means of transferring the knowledge gained by a complex model to a simple model. The knowledge of the complex model is generally obtained by training on a large amount of actual training data; the simple model acquires the essential part of that knowledge from the complex model, and finally obtains a knowledge processing capability similar to that of the complex model. In this embodiment, the complex model is the enhanced decision network model, namely the Teacher model, and the simple model is the simplified decision network model, namely the Student model.

Step S102: obtaining a plurality of strengthened decision network models according to original text information related to the field of multi-conversation scenes; each enhanced decision network model in the multiple enhanced decision network models is used for obtaining target text information related to the field of the single conversation scene through conversation strategy learning according to original text information related to the field of the single conversation scene; the original text information relating to the field of single dialog scenes is the original text information obtained after the field of dialog scenes of the original text information relating to the field of multi-dialog scenes is divided.

After the original text information related to the multi-dialog scene field is obtained in step S101, a plurality of enhanced decision network models is obtained according to it. Specifically, the plurality of enhanced decision network models may be obtained as follows.

First, the original text information relating to the multi-dialog scene field is subjected to dialog scene field recognition, and a plurality of dialog scene field information corresponding to the original text information relating to the multi-dialog scene are obtained.

When the dialog scene field is recognized, keyword information in the original text information related to the multi-dialog scene field can be used. For example, when "I want to book a cheap air ticket from place A to place B" is part of the original text information related to the multi-dialog scene field, "air ticket booking" can be used as keyword information to identify the air ticket booking field among the dialog scene fields. In this way, the plurality of dialog scene domain information corresponding to the original text information related to the multi-dialog scene field is obtained.

Then, a plurality of enhanced decision network models is obtained according to the plurality of dialog scene domain information.

Specifically, the plurality of enhanced decision network models may be obtained according to the plurality of dialog scene domain information and the correspondence between dialog scene domain information and enhanced decision network models.

More specifically, the correspondence between dialog scene domain information and enhanced decision network models may be obtained in advance, so that once the plurality of dialog scene domain information is obtained, the corresponding enhanced decision network models can be looked up through this correspondence.
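As a non-limiting illustration, keyword-based recognition of dialog scene fields and the subsequent lookup of enhanced decision network models might be sketched as follows. The keyword table, the field names and the strings standing in for trained models are all assumptions made for this example and are not taken from the application.

```python
# Hypothetical keyword table: field names and keywords are illustrative only.
DOMAIN_KEYWORDS = {
    "flight_booking": ("air ticket", "flight"),
    "hotel_query": ("hotel",),
}

# Hypothetical correspondence between dialog scene field information and
# enhanced decision network (Teacher) models. Strings stand in for models.
DOMAIN_TO_TEACHER = {
    "flight_booking": "teacher_flight",
    "hotel_query": "teacher_hotel",
}


def recognize_domains(text):
    """Return every field whose keywords occur in the text, in table order."""
    lowered = text.lower()
    return [
        domain
        for domain, keywords in DOMAIN_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


def select_teachers(text):
    """Look up one Teacher model per recognized dialog scene field."""
    return [DOMAIN_TO_TEACHER[domain] for domain in recognize_domains(text)]


query = ("I want to book a cheap air ticket from place A to place B, "
         "and to see whether there is a suitable hotel if the time is too late.")
print(select_teachers(query))
```

A real system would likely use a trained classifier rather than substring matching; the table lookup is shown only because the text describes a stored correspondence between field information and models.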

In step S102, the enhanced decision network model is used to obtain, through dialog strategy learning, target text information related to the single-dialog scene field according to original text information related to the single-dialog scene field. Its ability to do so depends mainly on the structure of the enhanced decision network model, namely the Teacher model, which is introduced below.

The structure of the Teacher model is shown in Fig. 2. The Teacher model includes an encoding module, a dialog strategy learning module and a decoding module. Accordingly, target text information related to the single-dialog scene field may be obtained through the Teacher model by performing the following steps, shown in Fig. 3.

Step S102-1: encoding the original text information related to the single-dialog scene field through the encoding module to obtain a dialog state vector related to the single-dialog scene field.

Specifically, in this step, the original text information related to the single-dialog scene field is input into the encoding module to obtain an original data vector related to the single-dialog scene field.

Meanwhile, before the dialog state vector related to the single-dialog scene field is obtained, manual labeling information for the original text information related to the single-dialog scene field is obtained and vectorized to obtain a manual labeling information vector. The manual labeling information comprises database pointer information and dialog confidence state information, both used for obtaining the target text information related to the single-dialog scene field. The database pointer information may be a vector that characterizes the database; in a ticket booking dialog system, for example, it represents the number of remaining air tickets: the vector [1,0,0] represents that the number of remaining air tickets is 0, the vector [0,1,0] represents that it is 1, and the vector [0,0,1] represents that it is 2 or more. The dialog confidence state information is a vector that characterizes the current dialog state, thereby providing a reference for determining the dialog action vector.

Then, the encoded original data vector related to the single-dialog scene field and the manual labeling information vector are spliced to obtain the dialog state vector related to the single-dialog scene field.

The original text information relating to the field of single dialog scenes is the original text information obtained after the field of dialog scenes of the original text information relating to the field of multi-dialog scenes is divided. After the original text information related to the multi-dialog scene domain is obtained in step S101, the dialog scene domain is directly divided, and the original text information related to the single-dialog scene domain is obtained.

In this step, the dialog state vector related to the single-dialog scene field is $s = [v^u_t; v_b; v_{kb}]$. That is, $s$ is obtained by splicing the original data vector $v^u_t$ related to the single-dialog scene field, the database pointer information vector $v_b$ used for obtaining the target text information related to the single-dialog scene field, and the dialog confidence state information vector $v_{kb}$ used for obtaining the target text information related to the single-dialog scene field.
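As a non-limiting illustration, the one-hot database pointer vector for the remaining-ticket example and the splicing of the three component vectors might be sketched as follows. The numeric values of the encoder output and of the dialog confidence state vector are placeholders assumed for this example.

```python
def database_pointer(remaining_tickets):
    """One-hot encode the remaining-ticket count: 0, 1, or 2 and more."""
    if remaining_tickets < 0:
        raise ValueError("remaining_tickets must be non-negative")
    bucket = min(remaining_tickets, 2)
    return [1 if i == bucket else 0 for i in range(3)]


def dialog_state(v_u, v_b, v_kb):
    """Splice the three component vectors into s = [v_u; v_b; v_kb]."""
    return list(v_u) + list(v_b) + list(v_kb)


v_u = [0.2, -0.1]          # assumed encoder output for the user's text
v_b = database_pointer(5)  # database pointer vector for 5 remaining tickets
v_kb = [0.7, 0.3]          # assumed dialog confidence state vector
s = dialog_state(v_u, v_b, v_kb)
print(s)
```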

Step S102-2: performing dialog strategy learning on the dialog state vector related to the single-dialog scene field to obtain a dialog reply action vector related to the single-dialog scene field.

After the dialog state vector related to the single-dialog scene field is obtained, dialog strategy learning is performed on it as follows: the dialog state vector is taken as input data of the dialog strategy learning module in the Teacher model, and the output result of the dialog strategy learning module is taken as the dialog reply action vector related to the single-dialog scene field. Specifically, in this embodiment the dialog strategy learning module is a network model constructed based on the hyperbolic tangent function. If the dialog reply action vector related to the single-dialog scene field is denoted $a^T$, it is calculated as:

$$a^T = \tanh\left(w \cdot [v^u_t; v_b; v_{kb}]\right)$$

where $w$ is a model parameter of the network model of the dialog strategy learning module.

Step S102-3: decoding the dialog reply action vector related to the single-dialog scene field to obtain target text information related to the single-dialog scene field.

After the dialog reply action vector related to the single-dialog scene field is obtained, it is decoded to obtain the target text information related to the single-dialog scene field. Specifically, the dialog reply action vector is taken as input data of the decoding module, the text information output by the decoding module is obtained, and this text information is confirmed as the target text information related to the single-dialog scene field.

Of course, when the simplified decision network model is trained, the enhanced decision network model is trained at the same time. The loss function of the enhanced decision network model is constructed in the general form of the cross-entropy function, as follows:

[Formula not reproduced in this text.]

In this loss function, the indicator function takes the value 1 when the condition in its parentheses is satisfied and 0 otherwise; this is the general form of the cross-entropy function. v denotes the set of all words that may occur, that is, the database set used for generating words. φ denotes all the parameters of the Teacher model. u denotes the original text information related to the single-dialog scene field, and s denotes the dialog state vector related to the single-dialog scene field. The loss function also refers to all the words already generated in the current reply, that is, each word in the sentence used for the reply. m denotes the vocabulary contained in the original text information related to the single-dialog scene field.
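As a non-limiting illustration of the general cross-entropy form with an indicator function, a token-level loss might be sketched as follows. This is not the exact loss function of the application; in particular, the conditioning on the original text, the dialog state vector and the previously generated words is assumed to be already folded into the predicted distributions passed in.

```python
import math


def indicator(condition):
    """Return 1 when the condition holds and 0 otherwise."""
    return 1 if condition else 0


def token_cross_entropy(reference, predicted, vocabulary):
    """Sum over positions t and words of -1[y_t == word] * log p_t(word)."""
    loss = 0.0
    for y_t, p_t in zip(reference, predicted):
        for word in vocabulary:
            loss -= indicator(y_t == word) * math.log(p_t[word])
    return loss


# Assumed toy data: a two-word vocabulary and a two-word reference reply.
vocabulary = ["yes", "no"]
reference = ["yes", "no"]
predicted = [{"yes": 0.5, "no": 0.5}, {"yes": 0.25, "no": 0.75}]
loss = token_cross_entropy(reference, predicted, vocabulary)
print(loss)
```

Only the probability assigned to the reference word at each position contributes, which is what the indicator function selects.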

Step S103: performing knowledge distillation on the dialog strategies of the plurality of enhanced decision network models based on the original text information related to the multi-dialog scene field to obtain a target simplified decision network model; the target simplified decision network model is used for obtaining target text information related to the multi-dialog scene field according to original text information related to the multi-dialog scene field; the target text information is text information used, in the dialog scene, for replying to the original text information.

After a plurality of original text messages related to the single conversation scene field are obtained according to the original text messages related to the multi conversation scene field, the original text messages related to the single conversation scene field are respectively and correspondingly input into a plurality of enhanced decision network models corresponding to the original text messages, and a plurality of target text messages related to the single conversation scene field are obtained.

In this embodiment, knowledge distillation is performed on the dialog strategies of the multiple reinforcement decision network models based on the original text information related to the multiple dialog scene fields to obtain the target simplified decision network model, which means that knowledge distillation is performed on the dialog strategies of the dialog strategy learning modules in the multiple reinforcement decision network models and the target text information in the multiple single dialog scene fields based on the original text information related to the multiple dialog scene fields to obtain the target simplified decision network model.

Specifically, in order to obtain a target simplified decision network model, knowledge distillation is carried out on conversation strategies in a plurality of Teacher (Teacher) models and target text information in a plurality of single conversation scene fields.

More specifically, performing knowledge distillation, based on the original text information related to the multi-dialog scene field, on the dialog strategies of the dialog strategy learning modules in the plurality of enhanced decision network models and on the target text information of the plurality of single-dialog scene fields to obtain the target simplified decision network model comprises the following steps, shown in Fig. 4.

Step S103-1: encoding the original text information related to the multi-dialog scene field to obtain a dialog state vector related to the multi-dialog scene field.

Specifically, the encoding may proceed as follows. First, a plurality of vocabulary vector information related to the single-dialog scene fields is obtained from the original text information related to the multi-dialog scene field. Then, the dialog state vector related to the multi-dialog scene field is obtained according to the plurality of vocabulary vector information related to the single-dialog scene fields.

When the target simplified decision network model is obtained through training, an initial simplified decision network model is first constructed and serves as the Student model. Knowledge distillation of the dialog strategies in the plurality of Teacher models and of the target text information of the plurality of single-dialog scene fields is then performed to the Student model, so that the target simplified decision network model is obtained.

Fig. 5 shows the structure of the Student model, which includes an encoding module, an environment encoding module and a decoding module. The Student model has no dialog strategy learning module; therefore, the dialog strategies in the Teacher models need to be distilled to the Student model to ensure that the strategies of the Teacher models and the Student model are consistent.

Meanwhile, to ensure that the Student model, having obtained a strategy consistent with that of the Teacher models, also generates target text information close to theirs, knowledge distillation is performed from the target text information related to the single-dialog scene fields of the plurality of Teacher models to the text information related to the multi-dialog scene field of the Student model, so as to obtain the target text information related to the multi-dialog scene field.

The encoding module is a Long Short-Term Memory (LSTM) network; the environment encoding module and the decoding module are also based on the LSTM structure. The encoding module is used to encode the original text information related to the multi-dialog scene field to obtain the dialog state vector related to the multi-dialog scene field, which can be expressed as the following formula:

[Formula not reproduced in this text.]

where $h_m$ is the dialog state vector related to the multi-dialog scene field, and $w_{tm}$ is the plurality of vocabulary vector information related to the single-dialog scene fields.

Step S103-2: performing dialog strategy learning on the dialog state vector related to the multi-dialog scene field, and obtaining a dialog reply action vector related to the multi-dialog scene field according to the plurality of dialog reply action vectors related to the single-dialog scene fields.

After the dialog state vector related to the multi-dialog scene field is obtained, knowledge distillation is performed on the plurality of dialog reply action vectors related to the single-dialog scene fields to obtain the dialog reply action vector related to the multi-dialog scene field. Obtaining the dialog reply action vector related to the multi-dialog scene field may be expressed as the following formula:

[Formula not reproduced in this text.]

where $a^S$ is the dialog reply action vector related to the multi-dialog scene field, and the other quantity in the formula is the vocabulary vector information obtained by using the environment encoding module to further abstract the vocabulary vector information related to the multi-dialog scene field.

Step S103-3: decoding the dialog reply action vector related to the multi-dialog scene field, and obtaining target text information related to the multi-dialog scene field according to the plurality of target text information related to the single-dialog scene fields.

After the dialog reply action vector related to the multi-dialog scene field is obtained, it is decoded by the decoding module, and knowledge distillation is performed from the target text information related to the single-dialog scene fields of the plurality of Teacher models to the text information related to the multi-dialog scene field of the Student model.

Step S103-4: obtaining the target simplified decision network model according to the original text information related to the multi-dialog scene field and the target text information related to the multi-dialog scene field.

Specifically, the initial simplified decision network model is optimized according to the original text information related to the multi-dialog scene field and the target text information related to the multi-dialog scene field, and the optimized simplified decision network model is taken as the target simplified decision network model.

More specifically, whether to stop optimizing the initial simplified decision network model is judged according to two quantities: a first difference degree between the plurality of dialog reply action vectors related to the single-dialog scene fields and the dialog reply action vector related to the multi-dialog scene field, and a second difference degree between the plurality of target text information related to the single-dialog scene fields and the target text information related to the multi-dialog scene field.

If the first difference degree meets a preset first threshold condition and the second difference degree meets a preset second threshold condition, the optimization of the initial simplified decision network model is stopped, and the simplified decision network model obtained by the last optimization is taken as the target simplified decision network model.
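As a non-limiting illustration, the two-condition stopping check might be sketched as follows. The text does not specify the form of the threshold conditions; this example assumes that a condition is met when the difference degree is less than or equal to its threshold.

```python
def should_stop(first_difference, second_difference,
                first_threshold, second_threshold):
    """Stop only when both difference degrees are within their thresholds.

    Assumption: "meets the threshold condition" means "<= threshold".
    """
    return (first_difference <= first_threshold
            and second_difference <= second_threshold)


# Assumed toy values for the two difference degrees and thresholds.
print(should_stop(0.01, 0.20, first_threshold=0.05, second_threshold=0.10))
print(should_stop(0.01, 0.08, first_threshold=0.05, second_threshold=0.10))
```

Requiring both conditions means training continues while either the strategies or the generated text of the Student model still differ too much from those of the Teacher models.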

The knowledge distillation of the dialog strategies in the plurality of Teacher models to the Student model is embodied in strategy knowledge distillation, and the knowledge distillation of the target text information of the plurality of Teacher models to the text information related to the multi-dialog scene field of the Student model is embodied in text knowledge distillation. The structure of knowledge distillation from the plurality of Teacher models to the Student model is shown in Fig. 6.

For strategy knowledge distillation, the first difference degree is calculated with a first loss function for the knowledge distillation of the dialog strategies in the plurality of Teacher models to the Student model, and is checked against the preset first threshold condition. The first loss function is as follows:

[Formula not reproduced in this text.]

where k is the number of Teacher models, the Teacher-side term is the dialog reply action vector related to the single-dialog scene field of the i-th Teacher model, and the Student-side term is the dialog reply action vector related to the multi-dialog scene field of the Student model. This loss function gives the degree of strategy knowledge distillation from the plurality of Teacher models to the Student model: the smaller the first loss function, the better the strategy knowledge distillation.
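As a non-limiting illustration, a strategy distillation loss over k Teacher models might be sketched as follows. The first loss function is not reproduced in this text, so the mean squared distance used here is an assumed stand-in chosen only because it shrinks as the Student action vector approaches the Teacher action vectors; it should not be read as the loss function of the application.

```python
def policy_distillation_loss(teacher_actions, student_action):
    """Average squared distance between each Teacher action and the Student action.

    Assumption: squared Euclidean distance, averaged over the k Teacher models.
    """
    k = len(teacher_actions)
    if k == 0:
        raise ValueError("at least one Teacher action vector is required")
    total = 0.0
    for a_teacher in teacher_actions:
        if len(a_teacher) != len(student_action):
            raise ValueError("action vectors must have the same length")
        total += sum((t - s) ** 2 for t, s in zip(a_teacher, student_action))
    return total / k


# Assumed toy action vectors for k = 2 Teacher models and one Student model.
teachers = [[1.0, 0.0], [0.0, 1.0]]
student = [0.5, 0.5]
print(policy_distillation_loss(teachers, student))
```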

For text knowledge distillation, the second difference degree is calculated with a second loss function for the knowledge distillation of the target text information related to the single-dialog scene fields in the plurality of Teacher models to the Student model, and is checked against the preset second threshold condition. The second loss function is as follows:

[Formula not reproduced in this text.]

This function is also a general form of the cross-entropy function, where φ represents the parameters of the Teacher model and θ represents the parameters of the Student model. This loss function gives the degree of text knowledge distillation from the plurality of Teacher models to the Student model: the smaller the second loss function, the better the text knowledge distillation.
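As a non-limiting illustration, a text distillation loss in cross-entropy form might be sketched as follows. The second loss function is not reproduced in this text; the example assumes that each Teacher model and the Student model output a probability distribution over a shared vocabulary for one reply position, and sums the cross-entropy of the Student distribution against each Teacher distribution.

```python
import math


def text_distillation_loss(teacher_distributions, student_distribution):
    """Sum over Teacher models of the cross-entropy H(teacher, student).

    Assumption: all distributions are dicts over the same vocabulary, and the
    Student assigns non-zero probability wherever a Teacher does.
    """
    loss = 0.0
    for teacher in teacher_distributions:
        for word, p_teacher in teacher.items():
            if p_teacher > 0.0:
                loss -= p_teacher * math.log(student_distribution[word])
    return loss


# Assumed toy distributions over a two-word vocabulary.
teacher_outputs = [{"ticket": 1.0, "hotel": 0.0}]
far_student = {"ticket": 0.5, "hotel": 0.5}
near_student = {"ticket": 0.9, "hotel": 0.1}
print(text_distillation_loss(teacher_outputs, far_student))
print(text_distillation_loss(teacher_outputs, near_student))
```

The loss is smaller for the Student distribution that is closer to the Teacher distribution, consistent with the statement that a smaller second loss function indicates better text knowledge distillation.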

The initial simplified decision network model is optimized through the first loss function and the second loss function, which also determine when the optimization is stopped; the simplified decision network model obtained by the last optimization is taken as the target simplified decision network model.

The embodiment of the application provides a network model obtaining method. The method first obtains original text information related to the multi-dialog scene field; then obtains a plurality of enhanced decision network models according to the original text information related to the multi-dialog scene field; and finally, based on the original text information related to the multi-dialog scene field, performs knowledge distillation on the dialog strategies of the plurality of enhanced decision network models to obtain a target simplified decision network model. The target simplified decision network model obtained in this embodiment can obtain, directly according to original text information related to the multi-dialog scene field, target text information related to the multi-dialog scene field for replying to that original text information. This solves the problems that the processing result obtained by the existing processing method for task-oriented dialog in the multi-dialog scene field is inaccurate and that the decision-making is complex.

It should be noted that the target simplified decision network model obtained in this embodiment can be applied to a scene in which target text information related to the multi-dialog scene field, for replying, is obtained according to original text information related to the multi-dialog scene field. It can also be applied to a scene in which target response information for responding to the original text information related to the multi-dialog scene field is obtained according to that original text information. For example, a voice device for human-computer interaction can not only reply to a question raised by a user, but also perform an action in response to the user's request.

In the first embodiment described above, a network model obtaining method is provided, in which the target simplified decision network model is obtained through interaction between a plurality of strengthened decision network models and a simplified decision network model. Correspondingly, the second embodiment of the present application provides a network model processing method, which describes how the target simplified decision network model of the first embodiment is obtained solely from the side of the plurality of strengthened decision network models, that is, the provider of the knowledge distillation. Since this method has been described in detail in the first embodiment, reference is made to the description of the first embodiment for relevant points, and details are not repeated here.

The embodiment provides a network model processing method, which comprises the following steps:

First, a plurality of pieces of original text information, each relating to a single dialog scene field, are obtained. Then, a plurality of strengthened decision network models corresponding to a plurality of dialog scene fields are obtained, based on the dialog scene fields to which the pieces of original text information correspond. Next, each strengthened decision network model performs dialog strategy learning on its corresponding original text information relating to the single dialog scene field. Finally, knowledge distillation of the dialog strategies that the plurality of strengthened decision network models acquired through dialog strategy learning is performed with respect to the original text information relating to the multi-dialog scene field.

In addition, a plurality of pieces of target text information relating to single dialog scene fields, output respectively by the plurality of strengthened decision network models after dialog strategy learning, can be obtained. This target text information and the dialog strategies acquired by the plurality of strengthened decision network models are then jointly distilled with respect to the original text information relating to the multi-dialog scene field.

In the present embodiment, the original text information relating to the multi-dialog scene field is a combination of a plurality of pieces of original text information relating to single dialog scene fields. Each strengthened decision network model is used for obtaining, through dialog strategy learning, target text information relating to a single dialog scene field from original text information relating to that field; the target text information is text information used for replying to the original text information in a dialog scene of the dialog scene field.
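The provider side described in this embodiment can be pictured with a minimal sketch: each single-field input is routed to the strengthened model of its field, and the resulting reply action vectors and reply texts are collected for later distillation. The field names ("hotel", "taxi"), the stand-in teacher functions and the record layout are hypothetical and are not taken from the application.

```python
def collect_distillation_records(single_domain_inputs, teachers):
    """Query the teacher of each field and gather what it hands to the student.

    single_domain_inputs: list of (domain, text) pairs.
    teachers: dict mapping a domain name to a callable
              text -> (action_vector, reply_text).
    """
    records = []
    for domain, text in single_domain_inputs:
        action_vector, reply_text = teachers[domain](text)
        records.append(
            {"domain": domain, "input": text,
             "action": action_vector, "reply": reply_text}
        )
    return records


def merge_multi_domain_input(single_domain_inputs):
    """Combine the single-field texts, in order, into one multi-field input."""
    return " ".join(text for _, text in single_domain_inputs)


# Stand-in teachers; real ones would be trained strengthened decision models.
def hotel_teacher(text):
    return [1.0, 0.0], "Which area would you like to stay in?"


def taxi_teacher(text):
    return [0.0, 1.0], "Where should the taxi pick you up?"


inputs = [("hotel", "I need a hotel."), ("taxi", "I also need a taxi.")]
teachers = {"hotel": hotel_teacher, "taxi": taxi_teacher}
records = collect_distillation_records(inputs, teachers)
multi_domain_text = merge_multi_domain_input(inputs)
```

The merged text stands for the multi-field original text information, and the records stand for the dialog strategy outputs that would be distilled against it.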

In the first embodiment described above, a network model obtaining method is provided, in which the target simplified decision network model is obtained through interaction between a plurality of strengthened decision network models and a simplified decision network model. Correspondingly, the third embodiment of the present application provides a network model processing method, which describes how the target simplified decision network model of the first embodiment is obtained solely from the side of the simplified decision network model, that is, the recipient of the knowledge distillation. Since this method has been described in detail in the first embodiment, reference is made to the description of the first embodiment for relevant points, and details are not repeated here.

First, original text information relating to the multi-dialog scene field is obtained. Then, dialog strategy knowledge distillation information of a plurality of strengthened decision network models corresponding to a plurality of dialog scene fields is obtained. Finally, a target simplified decision network model is obtained based on the original text information relating to the multi-dialog scene field and the dialog strategy knowledge distillation information.

In this embodiment, the target simplified decision network model is used for obtaining target text information relating to the multi-dialog scene field from original text information relating to the multi-dialog scene field; the target text information is text information used for replying to the original text information in a dialog scene of the dialog scene field.

In addition, in this embodiment, knowledge distillation information on a plurality of pieces of target text information relating to single dialog scene fields, output by the plurality of strengthened decision network models, may also be obtained, and the target simplified decision network model may be obtained based on this target text information knowledge distillation information as well.
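The recipient side can be illustrated with a toy training loop in which a student is fitted to reply action vectors supplied by the strengthened models. The linear student, the squared-error objective, the learning rate and the tolerance are all assumptions chosen to keep the sketch short; the application does not prescribe them.

```python
def student_forward(weights, state):
    """Toy linear student: one output component per row of weights."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def distill(samples, dim_out, dim_in, lr=0.5, tol=1e-6, max_epochs=200):
    """Fit the student to teacher action vectors by per-sample gradient steps.

    samples: list of (state_vector, teacher_action_vector) pairs.
    Returns the learned weights and the mean loss of the last epoch.
    """
    weights = [[0.0] * dim_in for _ in range(dim_out)]
    mean_loss = float("inf")
    for _ in range(max_epochs):
        total = 0.0
        for state, teacher_action in samples:
            pred = student_forward(weights, state)
            err = [p - t for p, t in zip(pred, teacher_action)]
            total += sum(e * e for e in err) / len(err)
            # Gradient step on 0.5 * ||pred - teacher||^2 with respect to weights.
            for i, e in enumerate(err):
                for j, x in enumerate(state):
                    weights[i][j] -= lr * e * x
        mean_loss = total / len(samples)
        if mean_loss <= tol:  # hypothetical stopping condition
            break
    return weights, mean_loss


# Two dialog states, each paired with the action vector of a different teacher.
samples = [([1.0, 0.0], [2.0, 0.0]), ([0.0, 1.0], [0.0, 3.0])]
weights, final_loss = distill(samples, dim_out=2, dim_in=2)
```

A real simplified decision network would be a neural encoder, policy and decoder rather than a single linear map; the loop only shows the idea of matching the student's outputs to the distilled teacher outputs until a stopping condition holds.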

In the first embodiment described above, a network model obtaining method is provided, and correspondingly, a fourth embodiment of the present application provides a network model obtaining apparatus. Fig. 7 is a schematic diagram of a network model obtaining apparatus according to a fourth embodiment of the present application. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.

A fourth embodiment of the present application correspondingly provides a network model obtaining apparatus, including:

an original text information obtaining unit 701 configured to obtain original text information relating to a multi-dialog scene field;

a strengthened decision network model obtaining unit 702, configured to obtain a plurality of strengthened decision network models according to the original text information relating to the multi-dialog scene field; each of the plurality of strengthened decision network models is used for obtaining, through dialog strategy learning, target text information relating to a single dialog scene field from original text information relating to that single dialog scene field; the original text information relating to the single dialog scene field is obtained by dividing the original text information relating to the multi-dialog scene field by dialog scene field;

a target streamlined decision network model obtaining unit 703, configured to perform knowledge distillation on the dialog strategies of the multiple enhanced decision network models based on the original text information related to the multiple dialog scene fields to obtain a target streamlined decision network model; the target simplifying decision network model is used for obtaining target text information related to the multi-conversation scene field according to original text information related to the multi-conversation scene field; the target text information is text information used for replying the original text information in the dialogue scene in the field of the dialogue scene.

Optionally, the decision-making network model obtaining unit is specifically configured to:

Optionally, the apparatus further comprises an annotation information vector obtaining unit; the annotation information vector obtaining unit is used for obtaining manual annotation information for the original text information relating to the single dialog scene field, and vectorizing the manual annotation information to obtain a manual annotation information vector; the manual annotation information comprises database pointer information and dialog confidence state information, wherein the database pointer information is used for obtaining target text information relating to the single dialog scene field;

the strengthened decision network model obtaining unit is specifically configured to: concatenate the encoded original data vector relating to the single dialog scene field with the manual annotation information vector to obtain the dialog state vector relating to the single dialog scene field.

Optionally, the strengthened decision network model obtaining unit is specifically configured to: take the dialog reply action vector relating to the single dialog scene field as input data of a decoding module in the strengthened decision network model, obtain text information output by the decoding module, and confirm the text information output by the decoding module as the target text information relating to the single dialog scene field.

Optionally, the apparatus further comprises a unit for obtaining a plurality of pieces of target text information relating to single dialog scene fields; this unit is specifically used for: inputting a plurality of pieces of original text information relating to single dialog scene fields into the corresponding strengthened decision network models respectively, to obtain a plurality of pieces of target text information relating to single dialog scene fields;

the target streamlined decision network model obtaining unit is specifically configured to:

Optionally, the target lean decision network model obtaining unit is specifically configured to: encoding the original text information related to the multi-dialog scene field to obtain a dialog state vector related to the multi-dialog scene field;

Optionally, the target lean decision network model obtaining unit is specifically configured to: obtaining a plurality of vocabulary vector information related to the field of single dialog scenes according to the original text information related to the field of the multi-dialog scenes; and obtaining a dialog state vector relating to the multi-dialog scene field according to the plurality of vocabulary vector information relating to the single-dialog scene field.

Optionally, the target simplified decision network model obtaining unit is specifically configured to: optimize an initial simplified decision network model according to the original text information relating to the multi-dialog scene field and the target text information relating to the multi-dialog scene field, and take the optimized simplified decision network model as the target simplified decision network model.

Optionally, the system further includes a determining unit, configured to determine, according to a first difference degree between the multiple dialog reply action vectors related to the single dialog scene field and the dialog reply action vectors related to the multiple dialog scene field, and a second difference degree between the multiple target text messages related to the single dialog scene field and the target text messages related to the multiple dialog scene field, whether to stop optimizing the initial lean decision network model;
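Among the units listed above, the vector splicing that produces the dialog state vector is simple enough to show directly. The sketch assumes that the manual annotation information vector is itself the concatenation of a database pointer vector and a dialog confidence state vector; the function name, argument names and the example values are hypothetical.

```python
def build_dialog_state(encoded_utterance, db_pointer, confidence_state):
    """Concatenate the encoded utterance with the manual annotation vector.

    The annotation vector is assumed here to be the database pointer vector
    followed by the dialog confidence state vector.
    """
    annotation_vector = list(db_pointer) + list(confidence_state)
    return list(encoded_utterance) + annotation_vector


# Example with a 2-dimensional encoding, a 2-dimensional database pointer
# and a 3-dimensional confidence state.
state = build_dialog_state([0.2, 0.7], [1, 0], [0, 1, 1])
```

The resulting vector would then be the input to dialog strategy learning, which outputs the dialog reply action vector.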

In the second embodiment, a network model processing method is provided, and correspondingly, a fifth embodiment of the present application provides a network model processing apparatus. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.

A fifth embodiment of the present application provides a network model processing apparatus, including:

Optionally, the system further comprises a target text information obtaining unit; the target text information obtaining unit is used for obtaining a plurality of target text information which is respectively output by the plurality of strengthened decision network models after being learned by a conversation strategy and relates to the field of single conversation scenes;

the knowledge distillation unit is further used for performing knowledge distillation on the plurality of target text information relating to the single dialog scene fields to the original text information relating to the multi-dialog scene fields;

In the third embodiment, a network model processing method is provided, and correspondingly, a sixth embodiment of the present application provides a network model processing apparatus. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.

A sixth embodiment of the present application provides a network model processing apparatus, including:

Optionally, the knowledge distillation information obtaining unit is further configured to: obtaining a plurality of target text information knowledge distillation information related to the field of single dialog scenes output by the plurality of strengthened decision network models;

the target streamlined decision network model obtaining unit is further configured to obtain a target streamlined decision network model based on the target text information knowledge distillation information relating to the single dialog scene field.

In the first embodiment described above, a network model obtaining method is provided, and correspondingly, a seventh embodiment of the present application provides an information processing method. Fig. 8 is a flowchart of an information processing method according to a seventh embodiment of the present application. Since this method embodiment substantially corresponds to the first embodiment, it is described briefly, and for the relevant points, reference may be made to the corresponding description of the first embodiment. The method embodiment described below is merely illustrative.

The present embodiment provides an information processing method, including the following steps:

Step S801: original text information relating to the multi-dialog scene field is obtained.

Step S802: the original text information relating to the multi-dialog scene field is used as input information of a pre-obtained target simplified decision network model to obtain target text information relating to the multi-dialog scene field; wherein the target text information relating to the multi-dialog scene field is used for replying to the original text information relating to the multi-dialog scene field in the dialog scene.

In the seventh embodiment described above, an information processing method is provided, and in correspondence with this, an eighth embodiment of the present application provides an information processing apparatus. Fig. 9 is a schematic diagram of an information processing apparatus according to an eighth embodiment of the present application. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.

An eighth embodiment of the present application provides an information processing apparatus, including:

an original text information obtaining unit 901, configured to obtain original text information relating to a multi-dialog scene field;

a target text information obtaining unit 902, configured to use the original text information relating to the multi-dialog scene field as input information of a pre-obtained target simplified decision network model, and obtain target text information relating to the multi-dialog scene field; wherein the target text information related to the multi-dialog scene field is used for replying the original text information related to the multi-dialog scene field in the dialog scene.

In order to facilitate understanding of the information processing method of the present application, the information processing method provided in the seventh embodiment of the present application is applied to a voice processing apparatus. Please refer to fig. 10, which is a schematic view of an application scenario of an information processing method according to a ninth embodiment of the present application.

When the information processing method of the present application is applied to a speech processing apparatus, the method shown in the schematic diagram of fig. 10 is executed in the speech processing apparatus. The speech processing apparatus, here a smart speaker, is internally provided with a voice acquisition module, a voice-to-text module, an information processing module and a response module. The voice acquisition module is used for collecting search voice information relating to the multi-dialog scene field; for example, when a user speaks search voice information to the speech processing apparatus, the voice acquisition module collects it.

The voice acquisition module then sends the search voice information relating to the multi-dialog scene field to the voice-to-text module. After receiving the search voice information, the voice-to-text module converts it into search text information relating to the multi-dialog scene field. The voice-to-text module then sends this text information to the information processing module, and the information processing module uses it as the original text information relating to the multi-dialog scene field, that is, as input information of a pre-obtained target simplified decision network model, to obtain target text information relating to the multi-dialog scene field.

After the target text information related to the multi-conversation scene field is obtained, the response module converts the target text information related to the multi-conversation scene field into target voice information and plays the target voice information.
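The module chain of this application scenario can be sketched as three pluggable stages applied to the audio collected by the voice acquisition module. The class name, the field names and the stand-in functions below are hypothetical; in a real device the stages would be a speech recognizer, the pre-obtained target simplified decision network model and a speech synthesizer.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class VoiceProcessingDevice:
    """Chains voice-to-text, information processing and response modules."""

    speech_to_text: Callable[[bytes], str]
    decision_model: Callable[[str], str]
    respond: Callable[[str], Any]

    def handle(self, search_voice: bytes) -> Any:
        # Voice-to-text module: search voice information -> original text.
        original_text = self.speech_to_text(search_voice)
        # Information processing module: original text -> target text.
        target_text = self.decision_model(original_text)
        # Response module: target text -> output (speech in a smart speaker).
        return self.respond(target_text)


# Stand-in stages used only to exercise the data flow.
device = VoiceProcessingDevice(
    speech_to_text=lambda audio: audio.decode("utf-8"),
    decision_model=lambda text: "Reply to: " + text,
    respond=lambda text: text.upper(),
)
result = device.handle(b"book a taxi")
```

Keeping the response stage pluggable reflects the fact that the same chain is reused in the later scenarios with a video response module or an action-executing response module.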

The information processing method provided by the seventh embodiment of the present application can also be applied to a video processing apparatus; the tenth embodiment of the present application describes this application scenario. The video processing apparatus of this embodiment includes a voice acquisition module, a voice-to-text module, an information processing module and a video response module. These modules are described below.

The voice acquisition module is used for collecting search voice information relating to the multi-dialog scene field. The voice-to-text module is used for converting the search voice information relating to the multi-dialog scene field into original text information relating to the multi-dialog scene field. The information processing module is used for taking the original text information relating to the multi-dialog scene field as input information of a pre-obtained target simplified decision network model to obtain target text information relating to the multi-dialog scene field; wherein the target text information relating to the multi-dialog scene field is used for replying to the original text information relating to the multi-dialog scene field in the dialog scene. The video response module is used for converting the target text information relating to the multi-dialog scene field into target video information and displaying the target video information.

The information processing method provided by the seventh embodiment of the present application can also be applied to a vehicle-mounted voice processing apparatus; the eleventh embodiment of the present application describes this application scenario. The vehicle-mounted voice processing apparatus of this embodiment includes a vehicle-mounted voice acquisition module, a voice-to-text module, an information processing module and a response module. These modules are described below.

The vehicle-mounted voice acquisition module is used for acquiring voice information related to the field of multi-conversation scenes; the voice-to-text module is used for converting the voice information relating to the field of the multi-dialog scene into original text information relating to the field of the multi-dialog scene; the information processing module is used for taking the original text information relating to the multi-conversation scene field as input information of a pre-obtained target simplification decision network model to obtain target text information relating to the multi-conversation scene field; wherein the target text information related to the multi-dialog scene field is used for responding to the original text information related to the multi-dialog scene field in the dialog scene; and the response module is used for converting the target text information relating to the field of the multi-conversation scene into action information and executing corresponding actions.
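How a response module might turn target text information into an executed action can be sketched with a dispatch table. The convention that the target text starts with an action name followed by a colon, the action name `open_window` and the helper `make_response_module` are purely hypothetical; the application does not describe the format of the action information.

```python
def make_response_module(actions):
    """Build a response function from a table of named actions.

    actions: dict mapping an action name to a zero-argument callable.
    """
    def respond(target_text):
        # Hypothetical convention: the text before the first colon names the action.
        name = target_text.split(":", 1)[0].strip()
        handler = actions.get(name)
        if handler is None:
            return None  # no matching action; nothing is executed
        return handler()

    return respond


executed = []
respond = make_response_module(
    {"open_window": lambda: executed.append("window opened") or "done"}
)
outcome = respond("open_window: as requested")
unknown = respond("play_music: jazz")
```

Returning `None` for an unrecognized action keeps the sketch from executing anything the table does not explicitly allow.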

The ninth to eleventh embodiments described above are merely examples of application scenarios, and the purpose of this application scenario example is to facilitate understanding of the information processing method of the present application, and is not to be used to limit the information processing method of the present application.

The first to third embodiments and the seventh embodiment of the present application respectively provide a network model obtaining method, two network model processing methods and an information processing method, and the twelfth embodiment of the present application provides electronic devices corresponding to the methods of the first to third embodiments and the seventh embodiment. As shown in fig. 11, a schematic diagram of the electronic device provided in the present embodiment is shown.

A twelfth embodiment of the present application provides an electronic apparatus, including:

a processor 1101;

a memory 1102 configured to store a computer program which, when executed by the processor, performs the methods described in the network model obtaining method embodiment, the network model processing method embodiments, and the information processing method embodiment.

The first to third embodiments and the seventh embodiment of the present application respectively provide a network model obtaining method, two network model processing methods and an information processing method, and the thirteenth embodiment of the present application provides computer storage media corresponding to the methods of the first to third embodiments and the seventh embodiment.

A thirteenth embodiment of the present application provides a computer storage medium, which stores a computer program that is executed by a processor to execute the methods described in the above-mentioned network model obtaining method embodiment, network model processing method embodiment, and information processing method embodiment.

Although the present application has been described with reference to the preferred embodiments, these embodiments are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the claims that follow.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

1. Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

2. As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims

1. A network model obtaining method, comprising:

2. The method of claim 1, wherein obtaining a plurality of enhanced decision network models from the original text information related to the multi-dialog scene domain comprises:

3. The method of claim 2, wherein obtaining the plurality of enhanced decision network models from the plurality of dialog context domain information comprises:

4. The method of claim 1, wherein the target text information related to the field of single dialog scenes is obtained as follows:

5. The method of claim 4, further comprising: obtaining manual annotation information for the original text information relating to the single dialog scene field, and vectorizing the manual annotation information to obtain a manual annotation information vector; the manual annotation information comprises database pointer information and dialog confidence state information, wherein the database pointer information is used for obtaining target text information relating to the single dialog scene field;

6. The method of claim 4, wherein performing dialog policy learning on the dialog state vector relating to the single-domain dialog scenario to obtain a dialog reply action vector relating to the single-domain dialog scenario domain comprises:

7. The method of claim 4, wherein the decoding the dialog reply action vector relating to the field of single dialog scenes to obtain the target text information relating to the field of single dialog scenes comprises:

8. The method of claim 6, further comprising: correspondingly inputting a plurality of original text messages related to the field of the single dialog scene into the plurality of reinforced decision network models respectively to obtain a plurality of target text messages related to the field of the single dialog scene;

9. The method according to claim 8, wherein the knowledge distillation is performed on the conversation strategy of the conversation strategy learning module in the plurality of decision-making network models and the target text information of the plurality of single conversation scene domains based on the original text information related to the multiple conversation scene domains to obtain the target reduced decision network model, comprising:

10. The method of claim 9, wherein encoding the original text information related to the multiple dialog scene domain to obtain the dialog state vector related to the multiple dialog scene domain comprises:

11. The method of claim 9, wherein obtaining the target lean decision network model according to the original text information related to the multi-dialog scenario field and the target text information related to the multi-dialog scenario field comprises:

12. The method of claim 11, further comprising:

13. A network model processing method, comprising:

14. The method of claim 13, further comprising: obtaining a plurality of target text information which is respectively output by the plurality of strengthened decision network models after the conversation strategy learning and relates to the field of single conversation scenes;

15. A network model processing method, comprising:

16. The method of claim 15, further comprising: obtaining a plurality of target text information knowledge distillation information related to the field of single dialog scenes output by the plurality of strengthened decision network models; and obtaining a target simplified decision network model based on the target text information knowledge distillation information related to the field of the single dialog scene.

17. A network model obtaining apparatus, comprising:

18. A network model processing apparatus, comprising:

19. A network model processing apparatus, comprising:

20. An information processing method characterized by comprising:

21. An information processing apparatus characterized by comprising:

22. A speech processing device, comprising: the system comprises a voice acquisition module, a voice-to-text module, an information processing module and a response module;

23. A video processing apparatus, comprising: the system comprises a voice acquisition module, a voice-to-text module, an information processing module and a video response module;

24. An in-vehicle voice processing apparatus, characterized by comprising: the system comprises a vehicle-mounted voice acquisition module, a voice-to-text module, an information processing module and a response module;

25. An electronic device, comprising:

a processor;

a memory for storing a computer program for execution by the processor to perform the method of any one of claims 1-12, 13-16, 20.

26. A computer storage medium, characterized in that the computer storage medium stores a computer program which is executed by a processor to perform the method of any one of claims 1-12, 13-16, 20.