CN112214592A - Reply dialogue scoring model training method, dialogue reply method and device

Info

Publication number
CN112214592A
CN112214592A
Authority
CN
China
Prior art keywords
reply
sample
conversation
content
dialog
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011224129.2A
Other languages
Chinese (zh)
Inventor
Wang Dong (王栋)
Zhang Weinan (张伟男)
Wang Shijin (王士进)
Liu Ting (刘挺)
Liu Quan (刘权)
Chen Zhigang (陈志刚)
Hu Guoping (胡国平)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Xunfei Internet Beijing Information Technology Co ltd
Original Assignee
Zhongke Xunfei Internet Beijing Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Xunfei Internet Beijing Information Technology Co ltd filed Critical Zhongke Xunfei Internet Beijing Information Technology Co ltd
Priority to CN202011224129.2A
Publication of CN112214592A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a reply dialogue scoring model training method, a dialogue reply method and corresponding devices. The reply dialogue scoring model training method comprises the following steps: after a dialogue training sample is obtained, generating model training data and a dialogue importance degree according to the dialogue training sample, so that the dialogue importance degree describes the information importance of the sample dialogue content in the dialogue training sample; and training the reply dialogue scoring model according to the model training data and the dialogue importance degree. Because the dialogue importance degree accurately describes the information importance of each round of sample dialogue content in the dialogue training sample, the differences in information importance between different rounds of sample dialogue content are taken into account when the reply dialogue scoring model is trained. The model can therefore understand dialogue content accurately and comprehensively, which improves its prediction accuracy and thereby enables accurate replies to the dialogue content input by a user.

Description

Reply dialogue scoring model training method, dialogue reply method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a reply dialog scoring model training method, a dialog reply method, and an apparatus thereof.
Background
With the development of artificial intelligence technology, the application range of man-machine conversation systems (also called spoken language conversation systems) is gradually expanding.
Currently, a man-machine dialog system can not only assist the user in completing certain tasks (such as searching for products or making reservations), but can also chat with the user. In other words, a human-computer dialog system replies to dialog content input by the user. However, because existing human-computer dialog systems reply to user-input dialog content with low accuracy, how to reply accurately to the dialog content input by a user is a technical problem to be solved urgently.
Disclosure of Invention
The embodiments of the present application mainly aim to provide a reply dialog scoring model training method, a dialog reply method and corresponding devices, which enable accurate replies to the dialog content input by a user.
The embodiment of the application provides a reply dialog scoring model training method, which comprises the following steps:
obtaining a dialog training sample; wherein the dialog training sample comprises M+1 rounds of sample dialog content, and M is a positive integer;
generating model training data and a dialog importance degree according to the dialog training sample; wherein the dialog importance degree is used for describing the information importance of the sample dialog content in the dialog training sample;
and training a reply dialog scoring model according to the model training data and the dialog importance degree.
The embodiment of the application also provides a dialog reply method, which comprises the following steps:
acquiring historical dialog content corresponding to a target user;
generating a candidate reply dialog corresponding to the target user according to the historical dialog content corresponding to the target user;
inputting the historical dialog content corresponding to the target user and the candidate reply dialog corresponding to the target user into a reply dialog scoring model to obtain the usage score of the candidate reply dialog output by the reply dialog scoring model;
and determining the target reply dialog corresponding to the target user according to the usage score of the candidate reply dialog.
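For illustration only (this sketch is not part of the claimed subject matter), the flow summarized above can be expressed in Python as follows; the function and parameter names are hypothetical, and the scoring model is assumed to be any callable returning a scalar usage score:

def choose_target_reply(history, candidate_replies, scoring_model):
    # Score each candidate reply dialog against the historical dialog
    # content with the reply dialog scoring model, then take the candidate
    # with the highest usage score as the target reply dialog.
    scored = [(scoring_model(history, reply), reply) for reply in candidate_replies]
    best_score, target_reply = max(scored, key=lambda pair: pair[0])
    return target_reply, best_score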
The embodiment of the present application further provides a reply dialog scoring model training device, the device includes:
the sample acquisition unit is used for acquiring a dialog training sample; wherein the dialog training sample comprises M+1 rounds of sample dialog content, and M is a positive integer;
the data generation unit is used for generating model training data and a dialog importance degree according to the dialog training sample; wherein the dialog importance degree is used for describing the information importance of the sample dialog content in the dialog training sample;
and the model training unit is used for training the reply dialog scoring model according to the model training data and the dialog importance degree.
An embodiment of the present application further provides a dialog reply device, where the dialog reply device includes:
the dialog acquisition unit is used for acquiring historical dialog content corresponding to a target user;
the reply generation unit is used for generating a candidate reply dialog corresponding to the target user according to the historical dialog content corresponding to the target user;
the probability prediction unit is used for inputting the historical dialog content corresponding to the target user and the candidate reply dialog corresponding to the target user into a reply dialog scoring model to obtain the usage score of the candidate reply dialog output by the reply dialog scoring model; wherein the reply dialog scoring model is trained by using the reply dialog scoring model training method according to any one of claims 1 to 15;
and the reply determining unit is used for determining the target reply dialog corresponding to the target user according to the usage score of the candidate reply dialog.
The embodiment of the present application further provides a reply dialog scoring model training device, where the device includes: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is used for storing one or more programs, and the one or more programs comprise instructions which, when executed by the processor, cause the processor to execute any implementation method of the reply dialogue scoring model training method provided by the embodiment of the application.
An embodiment of the present application further provides a dialog reply device, where the dialog reply device includes: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is used for storing one or more programs, and the one or more programs comprise instructions which, when executed by the processor, cause the processor to execute any implementation method of the dialog reply method provided by the embodiment of the application.
The embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is enabled to execute any implementation method of the reply dialog scoring model training method provided in the embodiment of the present application, or execute any implementation method of the dialog reply method provided in the embodiment of the present application.
Based on the technical scheme, the method has the following beneficial effects:
according to the reply dialog scoring model training method provided by the application, after a dialog training sample is obtained, model training data and dialog importance are generated according to the dialog training sample, so that the dialog importance can be used for describing the information importance of the sample dialog content in the dialog training sample; and training the reply dialogue scoring model according to the model training data and the dialogue importance. The information importance degree of the sample conversation contents in the conversation training sample can be accurately described according to the conversation importance degree, so that the information importance degree difference between different sample conversation contents is referred when the reply conversation scoring model is trained on the basis of the conversation importance degree, the reply conversation scoring model can more accurately and comprehensively understand the conversation contents, the scoring accuracy of the reply conversation scoring model can be improved, the accuracy of the target reply conversation determined on the basis of the reply conversation scoring model can be improved, and accurate reply can be favorably realized for the conversation contents input by a user.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 is a flowchart of a reply dialog scoring model training method according to an embodiment of the present disclosure;
fig. 2 is a schematic view of a dialog interaction process of a man-machine dialog according to an embodiment of the present application;
FIG. 3 is a schematic diagram of training data of a positive example provided by an embodiment of the present application;
FIG. 4 is a diagram of negative example training data provided in accordance with an embodiment of the present application;
FIG. 5 is a diagram illustrating a reply dialog scoring model provided in accordance with an embodiment of the present application;
fig. 6 is a flowchart of a dialog reply method according to an embodiment of the present application;
fig. 7 is a schematic application scenario diagram of a dialog reply method applied to a terminal device according to an embodiment of the present application;
fig. 8 is a schematic application scenario diagram of a dialog reply method applied to a server according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a reply dialog scoring model training device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a dialog reply device according to an embodiment of the present application.
Detailed Description
In research on dialog reply methods used by human-computer dialog systems, the inventor found that, in the related art, when a user inputs a dialog to be replied to, a plurality of candidate reply dialogs can be determined according to multiple rounds of previously generated dialog content corresponding to the user, and a usage score is determined for each candidate reply dialog such that it represents the degree of matching between that candidate reply dialog and the dialog to be replied to; a target reply dialog corresponding to the dialog to be replied to is then determined from all candidate reply dialogs based on their usage scores. However, how to accurately determine the usage score of a candidate reply dialog remains a technical problem to be solved urgently.
The inventor also found, in this research, that the usage scores of the candidate reply dialogs can be determined with reference to the multiple rounds of previously generated dialog content corresponding to the user. Moreover, because different rounds of generated dialog content carry different dialog information, they influence the usage score to different degrees. Therefore, in order to improve the prediction accuracy of the usage score, the usage score of each candidate reply dialog can be determined with reference to the degree of influence exerted by each round of generated dialog content.
Based on this, the embodiment of the present application provides a reply dialog scoring model training method, which includes: after a dialogue training sample is obtained, generating model training data and dialogue importance according to the dialogue training sample so that the dialogue importance can represent the information importance of sample dialogue content in the dialogue training sample; and training the reply dialogue scoring model according to the model training data and the dialogue importance.
In this way, the dialog importance degree accurately describes the information importance of the sample dialog content in the dialog training sample, so that the differences in information importance between different rounds of sample dialog content are taken into account when the reply dialog scoring model is trained on the basis of the dialog importance degree. The model can therefore understand dialog content more accurately and comprehensively, which improves its scoring accuracy and hence the accuracy of the target reply dialog determined based on it, thereby enabling accurate replies to the dialog content input by a user.
In addition, the embodiment of the present application does not limit the execution subject of the reply dialog scoring model training method, and for example, the reply dialog scoring model training method provided in the embodiment of the present application may be applied to a data processing device such as a terminal device or a server. The terminal device may be a smart phone, a computer, a Personal Digital Assistant (PDA), a tablet computer, or the like. The server may be a stand-alone server, a cluster server, or a cloud server.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Method embodiment one
Referring to fig. 1, the figure is a flowchart of a reply dialog scoring model training method according to an embodiment of the present application.
The reply dialog scoring model training method provided by the embodiments of the present application comprises the following steps S101 to S104:
s101: a conversational training sample is obtained.
A dialog training sample is a dialog corpus used to generate the training data employed in training the reply dialog scoring model. The number of dialog training samples is not limited in the embodiments of the present application.
In addition, the embodiments of the present application do not limit the manner of obtaining the dialog training sample; for example, the dialog training sample may be a dialog corpus already stored in the man-machine dialog system, a dialog corpus crawled, with legal authorization, from a preset dialog webpage (such as a microblog or forum), or a manually written or uploaded dialog corpus.
A dialog training sample is a set of dialog content, and one dialog training sample includes at least one round of dialog content. One round of dialog content refers to the content spoken by one interlocutor during one round of dialog interaction. For example, for the one-round dialog interaction in which the user says "I want to go out and play!" and the man-machine dialog system says "Great idea! Where do you want to go?", both "I want to go out and play!" and "Great idea! Where do you want to go?" are one round of dialog content.
In addition, the number of rounds of dialog content in the dialog training sample is not limited in the embodiments of the present application. For example, if the dialog training sample is generated according to the dialog corpus shown in fig. 2, the dialog training sample may include M+1 rounds of sample dialog content, and the M+1 rounds may be sorted according to generation time, specifically: the generation time of the 1st round of sample dialog content is earlier than that of the 2nd round, the generation time of the 2nd round is earlier than that of the 3rd round, … …, and the generation time of the M-th round is earlier than that of the (M+1)-th round, where M is a positive integer.
It should be noted that the generation time of the sample dialog content is not limited in the embodiments of the present application, and for example, the generation time of the sample dialog content may refer to a time when the sample dialog content is stored (or displayed) on the human-computer dialog system.
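For illustration only, a dialog training sample holding M+1 time-ordered rounds of sample dialog content might be represented as follows; the class and field names are hypothetical and not part of the embodiments:

from dataclasses import dataclass, field
from typing import List

@dataclass
class DialogTrainingSample:
    # rounds[0] is the 1st (earliest) round of sample dialog content,
    # rounds[-1] the (M+1)-th (latest) round, ordered by generation time.
    rounds: List[str] = field(default_factory=list)

    @property
    def m(self) -> int:
        # M, where the sample holds M+1 rounds of sample dialog content
        return len(self.rounds) - 1

# Example with M = 2 (three rounds, ordered by generation time)
sample = DialogTrainingSample(rounds=[
    "I want to go out and play!",            # round 1 (earliest)
    "Great idea! Where do you want to go?",  # round 2
    "Maybe somewhere near Beijing.",         # round 3, i.e. round M+1
])
assert sample.m == 2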
S102: model training data is generated from the conversational training samples.
The model training data is the training data needed when training the reply dialog scoring model and is used to simulate the dialog process, so each piece of model training data may include dialog content involved in at least one dialog interaction process. In addition, the number of pieces of model training data is not limited in the embodiments of the present application.
The model training data may be generated from the dialogue training samples, and the embodiments of the present application do not limit the implementation of generating the model training data (i.e., S102). To facilitate understanding of the generation process of the model training data, the following description is made with reference to an example.
As an example, when the dialog training sample includes M+1 rounds of sample dialog content, and the generation time of the j-th round of sample dialog content in the dialog training sample is earlier than that of the (j+1)-th round, where j is a positive integer and j ≤ M, S102 may specifically include S1021 to S1023:
S1021: generating reply reference content according to the 1st through M-th rounds of sample dialog content in the dialog training sample.
The reply reference content may include at least one round of dialog content. For example, the reply reference content may include the 1st through M-th rounds of sample dialog content shown in fig. 2.
In addition, the reply reference content and the candidate reply content described below are jointly used to generate model training data. Within a piece of model training data, the reply reference content serves as the rounds of dialog content other than the last round in the dialog process simulated by that model training data (e.g., the 1st through M-th rounds of sample dialog content in fig. 2), while the candidate reply content serves as the last round (e.g., the (M+1)-th round of sample dialog content in fig. 2).
In addition, the generation manner of the reply reference content is not limited in the embodiments of the present application; for example, the 1st through M-th rounds of sample dialog content in the dialog training sample may be directly determined as the reply reference content.
Based on the related content of S1021, in the embodiments of the present application, after the dialog training sample including M+1 rounds of sample dialog content is obtained, the 1st through M-th rounds of sample dialog content may be directly determined as the reply reference content, so that the model training data needed for training the reply dialog scoring model can subsequently be generated using the reply reference content.
S1022: and acquiring candidate reply contents corresponding to the reply reference contents.
The candidate reply content corresponding to the reply reference content is used, together with the reply reference content, to generate model training data. The number of candidate reply contents corresponding to the reply reference content is not limited in the embodiments of the present application. Each candidate reply content may include one round of dialog content; for example, one candidate reply content corresponding to the reply reference content may be the (M+1)-th round of sample dialog content shown in fig. 2.
In fact, in order to improve the scoring performance of the reply dialog scoring model, both positive example training data and negative example training data may be used when training it. To meet this requirement, the candidate reply content corresponding to the reply reference content may include one positive example reply content and at least one negative example reply content; that is, each candidate reply content is either a positive example reply content or a negative example reply content.
Positive example reply content refers to dialog content that correctly replies to the dialog content included in the reply reference content. For example, when the reply reference content includes the 1st through M-th rounds of sample dialog content shown in fig. 2, the positive example reply content may be the (M+1)-th round of sample dialog content shown in fig. 2, "… … is a bullet train from Nanjing to Beijing today".
In addition, the embodiments of the present application do not limit the generation process of the positive example reply content; for example, the positive example reply content corresponding to the reply reference content may be generated according to the (M+1)-th round of sample dialog content in the dialog training sample (for example, as shown in fig. 3, the (M+1)-th round of sample dialog content may be directly determined as the positive example reply content corresponding to the reply reference content).
Negative example reply content refers to dialog content that incorrectly replies to the dialog content included in the reply reference content. The number of negative example reply contents corresponding to the reply reference content is likewise not limited in the embodiments of the present application.
In addition, the embodiments of the present application do not limit the generation process of the negative example reply content; for example, it may be: generating negative example reply content corresponding to the reply reference content according to a preset dialog corpus, where the preset dialog corpus is a preset dialog corpus used for generating negative example reply content.
It should be noted that the preset dialog corpus is not limited in the embodiments of the present application. For example, when the number of dialog training samples is Y and the reply reference content is generated according to the 1st through M-th rounds of sample dialog content in the y-th dialog training sample, where y is a positive integer and y ≤ Y, the preset dialog corpus may include at least one dialog training sample other than the y-th one among the Y dialog training samples, so that the negative example reply content can be generated from at least one of those other dialog training samples.
It should also be noted that the embodiments of the present application do not limit the manner of generating the negative example reply content; for example, it may be generated using a negative sampling method.
Based on the related content of S1022, after the reply reference content is obtained, at least one candidate reply content corresponding to it may be generated, so that at least one piece of model training data can subsequently be generated from the reply reference content and its candidate reply contents, each piece simulating one dialog process.
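For illustration only, one simple realization of the negative sampling mentioned above is sketched below (the patent does not fix a particular sampling method, so the scheme and all names are assumptions); each dialog training sample is assumed to be a time-ordered list of its M+1 rounds of sample dialog content:

import random

def sample_negative_replies(dialog_samples, y, k=1, seed=0):
    # For the y-th dialog training sample, draw k negative example reply
    # contents by taking the last-round reply of k other training samples.
    rng = random.Random(seed)
    others = [i for i in range(len(dialog_samples)) if i != y]
    picked = rng.sample(others, k)
    return [dialog_samples[i][-1] for i in picked]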
S1023: and generating model training data according to the reply reference content and the candidate reply content corresponding to the reply reference content.
The model training data is the training data used in training the reply dialog scoring model, and each piece of model training data can be used to simulate one multi-round dialog process. In addition, each piece of model training data comprises a reply reference content and a candidate reply content, where the candidate reply content may be a positive example reply content or a negative example reply content.
In addition, to improve the scoring performance of the reply dialog scoring model, the model training data may comprise both positive example training data and negative example training data. Positive example training data refers to model training data that includes a correct reply dialog, so that it provides positive guidance during training of the reply dialog scoring model; negative example training data refers to model training data that includes an incorrect reply dialog, so that it provides negative guidance during training of the reply dialog scoring model.
To facilitate understanding of the positive and negative training data, the following description is made in conjunction with examples.
As an example, when the reply reference content corresponding to the model training data includes the 1st through M-th rounds of sample dialog content in the dialog training sample, the positive example training data may include the (M+1)-th round of sample dialog content in the dialog training sample (as shown in fig. 3), so that during training it guides the reply dialog scoring model to optimize toward giving the highest score to the (M+1)-th round of sample dialog content; the negative example training data, by contrast, should include dialog content completely different from the (M+1)-th round of sample dialog content (as shown in fig. 4), so that during training it guides the reply dialog scoring model to optimize toward giving the lowest score to the negative example reply content.
Based on the related content of S1023, in the embodiments of the present application, after the reply reference content and its corresponding candidate reply contents are acquired, the model training data may be generated from them. For example, when the reply reference content includes the 1st through M-th rounds of sample dialog content in the dialog training sample, and there are P corresponding candidate reply contents, the set consisting of the 1st round of sample dialog content, the 2nd round of sample dialog content, … …, the M-th round of sample dialog content, and the p-th candidate reply content may be determined as the p-th piece of model training data, where the p-th candidate reply content may be a positive example reply content or a negative example reply content, p is a positive integer, p ≤ P, and P is a positive integer.
Based on the related content of S102, after the dialog training sample including M+1 rounds of sample dialog content is obtained, the reply reference content may be generated according to the M rounds of sample dialog content with the earlier generation times, and the candidate reply contents corresponding to the reply reference content may be generated according to preset content (e.g., the (M+1)-th round of sample dialog content in the dialog training sample and the preset dialog corpus); the model training data is then generated according to the reply reference content and its candidate reply contents, so that the model training data can simulate a dialog process including multiple rounds of dialog interaction and can subsequently be used to train the reply dialog scoring model.
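A minimal sketch of this assembly, assuming the (reply reference content, candidate reply content, label) triple layout is one acceptable representation of model training data; all names are hypothetical:

def build_model_training_data(sample_rounds, negative_replies):
    # sample_rounds: the M+1 time-ordered rounds of sample dialog content.
    # negative_replies: negative example reply contents for this sample.
    reference = sample_rounds[:-1]      # rounds 1..M  -> reply reference content
    positive = sample_rounds[-1]        # round M+1    -> positive example reply
    data = [(reference, positive, 1)]   # positive example training data
    data += [(reference, neg, 0) for neg in negative_replies]  # negative examples
    return data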
S103: and generating the conversation importance degree according to the conversation training sample.
The dialog importance degree is used for describing the information importance of the sample dialog content in the dialog training sample; in particular, it can describe the information importance of each round of dialog content other than the last round when the last round of dialog content in the dialog training sample is being determined.
In addition, the composition of the dialog importance degree is not limited in the embodiments of the present application; for example, when the reply reference content corresponding to the model training data includes the 1st through M-th rounds of sample dialog content, the dialog importance degree may include the information importance of the 1st round of sample dialog content, the information importance of the 2nd round of sample dialog content, … …, and the information importance of the M-th round of sample dialog content.
The information importance of the t-th round of sample dialog content represents the degree of influence of the dialog information carried by the t-th round of sample dialog content in the reply dialog scoring process, where t is a positive integer and t ≤ M.
In fact, for the dialog training sample, the rounds of sample dialog content are correlated; for example, sample dialog content generated earlier may lay the groundwork for sample dialog content generated later, so that earlier rounds usually carry part of the information related to later rounds. Thus, the t-th round of sample dialog content may include partial information carried by other rounds of sample dialog content that occur later than it. In addition, the t-th round of sample dialog content may also include common sense information.
Based on this, the dialog information carried by the t-th round of sample dialog content is relatively complex. Therefore, in order to ensure that the information importance of the t-th round of sample dialog content in the reply dialog scoring process is accurately represented, an embodiment of the present application provides the following implementation for obtaining it: determining the information importance of the t-th round of sample dialog content in the dialog training sample according to the unique information of the t-th round of sample dialog content in the dialog training sample.
The unique information of the t-th round of sample dialog content refers to the dialog information carried by the t-th round that differs from the other rounds of dialog content. The embodiments of the present application do not limit what constitutes this unique information; for example, in some cases it may refer to the information remaining after removing the common sense information and/or groundwork information (i.e., information carried by other rounds of sample dialog content occurring later than the t-th round) from the t-th round of sample dialog content.
In fact, because the unique information of the t-th round of sample dialog content is difficult to express accurately and comprehensively in explicit text, extracting it is relatively complex, and so is directly determining the information importance of the t-th round of sample dialog content in the dialog training sample.
In order to simplify the determination of the information importance, the embodiments of the present application further provide an implementation for calculating the information importance of the t-th round of sample dialog content, which specifically includes steps 11 to 14:
Step 11: determining the t-th round dialog prediction content according to the (t+1)-th through M-th rounds of sample dialog content in the dialog training sample.
The t-th round dialog prediction content is obtained by performing reverse dialog prediction on the (t+1)-th through M-th rounds of sample dialog content in the dialog training sample, and it can describe the common sense information and/or groundwork information carried by the t-th round of sample dialog content (i.e., the partial information carried by other rounds of sample dialog content occurring later than the t-th round). Thus, the unique information of the t-th round of sample dialog content may be regarded as the information difference between the t-th round of sample dialog content and the t-th round dialog prediction content.
In addition, the embodiments of the present application do not limit how the t-th round dialog prediction content is obtained. For example, it may be obtained by inputting the (t+1)-th through M-th rounds of sample dialog content in the dialog training sample into a pre-constructed reverse generative dialog model and taking the content the model outputs.
The reverse generative dialog model is a generative dialog model that predicts dialog content in reverse: it predicts dialog content with an earlier occurrence time from dialog content with a later occurrence time.
In addition, the reverse generative dialog model may be constructed in advance; specifically, first training data may be generated according to a first training dialog corpus, and the reverse generative dialog model may then be trained using the first training data.
The first training dialog corpus is a dialog corpus used to generate the training data needed when training the reverse generative dialog model. The embodiments of the present application do not limit how it is obtained; for example, it may be a dialog corpus already stored in the man-machine dialog system, a dialog corpus crawled from a preset dialog webpage (such as a microblog or forum), or a manually written or uploaded dialog corpus. The first training dialog corpus and the dialog training sample may be the same or different.
The first training data is used to train the reverse generative dialog model. The number of pieces of first training data is not limited in the embodiments of the present application.
The first training data includes first input data and first label data, and the occurrence time of the dialog content in the first input data is later than that of the dialog content in the first label data. The first input data is the data input into the reverse generative dialog model during its training, and the first label data is the dialog content that the trained model should accurately predict from the first input data. For example, if the first training data is generated according to the dialog corpus shown in fig. 2, when the first input data is the (g+1)-th through (M+1)-th rounds of sample dialog content, the first label data may be the g-th round of sample dialog content, where g is a positive integer and g ≤ M.
It should be noted that the embodiments of the present application do not limit the training process of the reverse generative dialog model, which may be implemented using any existing or future training method for such a model; likewise, the model structure of the reverse generative dialog model is not limited and may be any existing or future model structure.
Based on the related content of the reverse generative dialog model above, the reverse generative dialog model trained on the first training data can predict dialog content with an earlier occurrence time from dialog content with a later occurrence time, so that the predicted earlier dialog content can carry common sense information and/or part of the information carried by the later dialog content.
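A minimal sketch of constructing the first training data, under the assumption that one (first input data, first label data) pair is built for every g from 1 to M; names are hypothetical:

def build_reverse_training_pairs(rounds):
    # rounds: the M+1 time-ordered rounds of dialog content.
    # Each pair takes the rounds after g as the first input data and
    # round g itself as the first label data to be predicted in reverse.
    pairs = []
    for g in range(len(rounds) - 1):    # 0-based g covers rounds 1..M
        pairs.append((rounds[g + 1:], rounds[g]))
    return pairs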
Based on the related content of step 11, in the embodiments of the present application, reverse prediction may be performed according to the (t+1)-th through M-th rounds of sample dialog content in the dialog training sample (for example, using the reverse generative dialog model), so as to obtain the t-th round dialog prediction content. The t-th round dialog prediction content can then accurately represent the common sense information and/or groundwork information carried by the t-th round of sample dialog content in the dialog training sample (that is, the partial information carried by other rounds of sample dialog content occurring later than the t-th round).
Step 12: inputting the t +1 th round sample conversation content to the M th round sample conversation content in the conversation training samples and the t th round conversation prediction content into a pre-constructed forward generation type conversation model to obtain the pseudo generation probability corresponding to the M +1 th round sample conversation content in the conversation training samples.
The forward generative dialog model is a generative dialog model that predicts dialog content in the forward direction: it predicts dialog content with a later occurrence time from dialog content with an earlier occurrence time.
In addition, the forward generative dialog model may be constructed in advance; specifically, second training data may be generated according to a second training dialog corpus, and the forward generative dialog model may then be trained using the second training data.
The second training dialog corpus is a dialog corpus used to generate the training data required for training the forward generative dialog model. The embodiments of the present application do not limit how it is obtained; for example, it may be a dialog corpus already stored in the man-machine dialog system, a dialog corpus crawled from a preset dialog webpage (such as a microblog or forum), or a manually written or uploaded dialog corpus. The second training dialog corpus, the first training dialog corpus, and the dialog training sample may be the same as or different from one another.
The second training data is used to train the forward generative dialog model. The number of pieces of second training data is not limited in the embodiments of the present application.
The second training data includes second input data and second label data, and the occurrence time of the dialog content in the second input data is earlier than that of the dialog content in the second label data. The second input data is the data input into the forward generative dialog model during its training, and the second label data is the dialog content that the trained forward generative dialog model should accurately predict from the second input data. For example, if the second training data is generated according to the dialog corpus shown in fig. 2, when the second input data is the g-th through (g+f)-th rounds of sample dialog content, the second label data may be the (g+f+1)-th round of sample dialog content, where g is a positive integer, f is an integer, and g+f ≤ M.
The present embodiment is not limited to the training process of the forward generating type dialogue model, and may be implemented by any existing or future training method of the forward generating type dialogue model. The present embodiment is not limited to the model structure of the forward-generating dialogue model, and may be implemented using any model structure of a forward-generating dialogue model that is present or will come into existence in the future.
Based on the related content of the forward generating type dialogue model, the forward generating type dialogue model trained based on the second training data can predict the dialogue content with a later occurrence time according to the dialogue content with an earlier occurrence time.
The pseudo generation probability corresponding to the (M+1)-th round of sample dialog content is the generation probability of the (M+1)-th round of sample dialog content as predicted by the forward generative dialog model from the (t+1)-th through M-th rounds of sample dialog content in the dialog training sample together with the t-th round dialog prediction content. This pseudo generation probability can indicate the information correlation between the t-th round dialog prediction content (i.e., the common sense information and/or groundwork information carried by the t-th round of sample dialog content in the dialog training sample) and the (M+1)-th round of sample dialog content in the dialog training sample.
The embodiments of the present application do not limit how the forward generative dialog model calculates the generation probability of the (M+1)-th round of sample dialog content; for example, the use probabilities of the respective words in the (M+1)-th round of sample dialog content predicted by the forward generative dialog model may be added to obtain its generation probability.
Based on the related content of step 12, after the t-th round dialog prediction content is obtained, it is input, together with the (t+1)-th through M-th rounds of sample dialog content in the dialog training sample, into the forward generative dialog model for forward prediction, yielding the pseudo generation probability corresponding to the (M+1)-th round of sample dialog content. This pseudo generation probability can accurately represent the information correlation between the t-th round dialog prediction content (i.e., the common sense information and/or groundwork information carried by the t-th round of sample dialog content in the dialog training sample) and the (M+1)-th round of sample dialog content, and hence the degree of influence that this dialog information exerts in the reply dialog prediction process.
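For illustration only, the generation-probability computation from the example above can be sketched as follows; the summed log-probability shown alongside is a common numerically stable variant and is an assumption here, not part of the embodiments:

import math

def generation_probability(token_probs):
    # token_probs: per-word use probabilities of the (M+1)-th round of
    # sample dialog content, as predicted by the forward generative model.
    added = sum(token_probs)                           # as described in the text
    log_added = sum(math.log(p) for p in token_probs)  # common practical variant
    return added, log_added

# e.g. a 4-word reply with per-word use probabilities from the model
print(generation_probability([0.9, 0.7, 0.8, 0.6]))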
Step 13: inputting the t-th to M-th sample conversation contents in the conversation training samples into a pre-constructed forward generating conversation model to obtain the true generating probability corresponding to the M + 1-th sample conversation content in the conversation training samples.
The true generation probability corresponding to the (M+1)-th round of sample dialog content is the generation probability of the (M+1)-th round of sample dialog content as predicted by the forward generative dialog model from the t-th through M-th rounds of sample dialog content in the dialog training sample, so that it can show the information correlation between the t-th round of sample dialog content and the (M+1)-th round of sample dialog content in the dialog training sample.
Based on the related content of step 13, the t-th through M-th rounds of sample dialog content in the dialog training sample can be directly input into the forward generative dialog model for forward prediction, yielding the true generation probability corresponding to the (M+1)-th round of sample dialog content. Because this true generation probability accurately represents the information correlation between the t-th round and the (M+1)-th round of sample dialog content, it can accurately represent the degree of influence that the dialog information carried by the t-th round of sample dialog content exerts in the reply dialog prediction process.
It should be noted that, in the embodiment of the present application, the execution order of step 12 and step 13 is not limited, and step 12 and step 13 may be executed sequentially, step 13 and step 12 may be executed sequentially, or step 12 and step 13 may be executed synchronously.
Step 14: and determining the information importance of the dialogue contents of the sample in the t-th round in the dialogue training sample according to the pseudo generation probability corresponding to the dialogue contents of the sample in the M + 1-th round in the dialogue training sample and the true generation probability corresponding to the dialogue contents of the sample in the M + 1-th round in the dialogue training sample.
The implementation of step 14 is not limited in the embodiments of the present application. For example, step 14 may specifically be: determining the information importance of the t-th round of sample dialog content in the dialog training sample according to the difference between the pseudo generation probability and the true generation probability corresponding to the (M+1)-th round of sample dialog content. As another example, step 14 may specifically be: determining the information importance of the t-th round of sample dialog content according to the ratio of the pseudo generation probability to the true generation probability corresponding to the (M+1)-th round of sample dialog content.
As can be seen, in the embodiments of the present application, because the difference between the t-th round dialog prediction content and the t-th round of sample dialog content is exactly the unique information of the t-th round of sample dialog content, the difference between the pseudo generation probability and the true generation probability corresponding to the (M+1)-th round of sample dialog content can accurately represent the information correlation between that unique information and the (M+1)-th round of sample dialog content. The information importance of the t-th round of sample dialog content in the dialog training sample can therefore be determined according to the difference (or ratio) between the pseudo generation probability and the true generation probability, so that this information importance accurately represents the degree of influence that the dialog information carried by the t-th round of sample dialog content exerts in the reply dialog prediction process.
Based on the related content of steps 11 to 14 above, in the embodiments of the present application, the degree of influence of the dialog information carried by the t-th round dialog prediction content in the reply dialog prediction process and the degree of influence of the dialog information carried by the t-th round of sample dialog content in the dialog training sample may first be calculated; the degree of influence of the unique information of the t-th round of sample dialog content is then determined from the difference between the two, which yields the information importance of the t-th round of sample dialog content. In this way, the information importance accurately represents the degree of influence that the dialog information carried by the t-th round of sample dialog content exerts in the reply dialog prediction process.
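A minimal sketch of step 14 covering both the difference and the ratio variants named above; the sign convention and the absence of any normalization are assumptions:

def information_importance(pseudo_prob, true_prob, mode="difference"):
    # pseudo_prob: pseudo generation probability (round t replaced by the
    # t-th round dialog prediction content); true_prob: true generation
    # probability (the real round t kept in the context).
    if mode == "difference":
        return true_prob - pseudo_prob   # larger gap -> more unique information
    if mode == "ratio":
        return true_prob / pseudo_prob
    raise ValueError(f"unknown mode: {mode}")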
Based on the related content in S103, after the dialog training sample is obtained, the dialog importance level corresponding to the dialog training sample may be generated according to the sample dialog content in the dialog training sample, so that the dialog importance level may accurately represent the information importance level of each sample dialog content in the dialog training sample, and the reply dialog scoring model may be trained according to the dialog importance level in the following.
It should be noted that the embodiment of the present application does not limit the execution order of S102 and S103, and S102 and S103 may be executed sequentially, S103 and S102 may be executed sequentially, or S102 and S103 may be executed simultaneously.
S104: and training the reply dialogue scoring model according to the model training data and the dialogue importance.
The reply dialog scoring model is used to calculate a usage score for the candidate reply content in the model training data.
In addition, the embodiments of the present application do not limit the model structure of the reply dialog scoring model, and any existing or future model structure capable of determining the usage score of a reply dialog (e.g., the candidate reply content described above or the candidate reply dialog described below) may be adopted. For example, as shown in fig. 5, the reply dialog scoring model may include an input layer, a vector layer, N aggregation layers, and N prediction layers. The input layer is used for passing the input data of the reply dialog scoring model to the vector layer. The vector layer is used for vectorizing the output data of the input layer. The i-th aggregation layer is used for performing aggregation processing on the output data of the (i-1)-th aggregation layer and the output data of the vector layer, where i is a positive integer and i ≤ N. The i-th prediction layer is used for determining the usage score of the reply dialog according to the output data of the i-th aggregation layer, where i is a positive integer and i ≤ N.
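For illustration only, the layered structure of fig. 5 might be sketched as follows, using PyTorch as an assumed framework; the embedding, GRU aggregation, sigmoid scoring heads, and averaging of the N usage scores are all illustrative assumptions, since this excerpt does not fix the internals of each layer:

import torch
import torch.nn as nn

class ReplyDialogScoringModel(nn.Module):
    def __init__(self, vocab_size, dim=128, n_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)            # vector layer
        self.aggregate = nn.ModuleList(
            [nn.GRU(dim, dim, batch_first=True) for _ in range(n_layers)]
        )                                                     # N aggregation layers
        self.predict = nn.ModuleList(
            [nn.Linear(dim, 1) for _ in range(n_layers)]
        )                                                     # N prediction layers

    def forward(self, token_ids):
        # token_ids: (batch, seq) -- the M rounds of dialog content plus the
        # candidate reply, concatenated, as passed through the input layer
        x = self.embed(token_ids)                             # vector layer output
        h, scores = x, []
        for agg, pred in zip(self.aggregate, self.predict):
            # the i-th aggregation layer combines the (i-1)-th aggregation
            # output with the vector-layer output
            h, _ = agg(h + x)
            scores.append(torch.sigmoid(pred(h[:, -1, :])))   # i-th usage score
        return torch.stack(scores, dim=0).mean(dim=0)         # pooled usage score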
In addition, the embodiment of the present application does not limit the training process of the reply dialog scoring model, and for convenience of understanding, the following description is made with reference to two possible implementations of S104.
In a first possible implementation, S104 may specifically include S104A1-S104A4:
S104A 1: and obtaining the predicted use scores of the candidate reply contents in the model training data according to the model training data, the dialogue importance and the reply dialogue scoring model.
The predicted use score of the candidate reply content describes the probability, as predicted by the reply dialogue scoring model, of using the candidate reply content in the model training data to reply to the reply reference content in the model training data. That is, the predicted use score of the candidate reply content characterizes the semantic matching probability, predicted by the reply dialogue scoring model, between the candidate reply content and its reply reference content in the model training data.
To facilitate understanding of S104a1, the following description is made with reference to an example.
As an example, when the reply dialog scoring model includes an input layer, a vector layer, N aggregation layers, and a prediction layer, and N is a positive integer, S104A1 may specifically include S104A11-S104A14:
S104A 11: and inputting model training data to the reply dialogue scoring model by using the input layer.
In the embodiment of the application, after the model training data is acquired, the model training data can be directly input into the reply dialogue scoring model through the input layer of the reply dialogue scoring model, so that other layers except the input layer in the reply dialogue scoring model can perform data processing on the model training data. For example, when the model training data includes the 1 st round sample dialog content, the 2 nd round sample dialog content, … …, the mth round sample dialog content, and the candidate reply content, the model training data may be input to the reply dialog scoring model using the input layer in the input manner shown in fig. 5.
In the embodiments of the present application, the input mode of the input layer is not limited; for example, the input layer may take words or phrases as input units. In addition, the number of dialog contents corresponding to the input layer is not limited in the embodiment of the present application; for example, it may be preset to a fixed value M+1 (that is, M rounds of sample dialog contents and 1 candidate reply content). Likewise, the dialog content length corresponding to the input layer is not limited; for example, the input layer may limit each dialog content to a fixed length L (that is, each dialog content includes L words), where L is a positive integer.
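As an illustration of these conventions, the sketch below pads or truncates a conversation to exactly M rounds plus one candidate reply, each of length L. Tokenization is assumed to happen upstream, and the function name and `pad_id` parameter are hypothetical.

```python
import numpy as np

def format_inputs(dialog_rounds, candidate, M, L, pad_id=0):
    # dialog_rounds: list of token-id lists, oldest first; candidate: token ids.
    rounds = list(dialog_rounds[-M:])                         # most recent M rounds
    rounds = [[] for _ in range(M - len(rounds))] + rounds    # left-pad missing rounds
    contents = rounds + [candidate]                           # M rounds + 1 candidate
    out = np.full((M + 1, L), pad_id, dtype=np.int64)
    for r, toks in enumerate(contents):
        out[r, :min(L, len(toks))] = toks[:L]                 # pad/truncate to L tokens
    return out
```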
S104A 12: and carrying out vectorization processing on the model training data by using a vector layer to obtain a training dialogue vector.
In the embodiment of the application, for the reply dialogue scoring model, the vector layer can perform vectorization processing on the model training data output by the input layer to obtain the training dialogue vector corresponding to the model training data, so that the training dialogue vector can accurately represent dialogue information carried by the model training data. For example, when the model training data includes the 1 st round sample dialog content, … …, the mth round sample dialog content, and the candidate reply content, the vector layer can perform vectorization processing on the 1 st round sample dialog content, the 2 nd round sample dialog content, … …, the mth round sample dialog content, and the candidate reply content to obtain a training dialog vector, so that the training dialog vector can include the 1 st round sample dialog vector corresponding to the 1 st round sample dialog content, the 2 nd round sample dialog vector corresponding to the 2 nd round sample dialog content, … …, the mth round sample dialog vector corresponding to the mth round sample dialog content, and the candidate reply vector corresponding to the candidate reply content.
It should be noted that the embodiments of the present application do not limit the vectorization process used in the vector layer; for example, the vector layer may use a word vector conversion method (e.g., Word2Vec), ELMo (Embeddings from Language Models), or BERT (Bidirectional Encoder Representations from Transformers), etc.
S104A 13: performing aggregation processing on the training dialogue vectors by using the ith aggregation layer and the dialogue importance degree to obtain ith layer overall aggregation data; wherein i is a positive integer, and i is not more than N.
The ith layer overall aggregated data is used for describing, as a whole, the dialogue information carried by the model training data, and is obtained by the aggregation processing of the ith aggregation layer.
In some cases, in order to improve the accuracy of the aggregation processing, an embodiment of the present application further provides an implementation manner of obtaining the i-th layer overall aggregation data, which may specifically be: and generating the ith layer of overall aggregated data according to the training dialogue vectors, the dialogue importance and the ith-1 layer of overall aggregated data.
As can be seen, for the reply dialogue scoring model, after the vector layer outputs the training dialogue vectors corresponding to the model training data, the 1 st aggregation layer may aggregate the training dialogue vectors by using the dialogue importance and the 0 th layer overall aggregation data to obtain the 1 st layer overall aggregation data; the 2 nd aggregation layer can aggregate the training dialogue vectors by using the dialogue importance and the 1 st layer overall aggregation data to obtain the 2 nd layer overall aggregation data; … … (and so on); the Nth aggregation layer can perform aggregation processing on the training dialogue vectors by using the dialogue importance and the N-1 th layer of overall aggregation data to obtain the Nth layer of overall aggregation data. Wherein the layer 0 global aggregate data is generated from the training dialogue vectors.
For example, when the training dialogue vectors corresponding to the model training data include the 1 st round sample dialogue vector, the 2 nd round sample dialogue vector, … …, the mth round sample dialogue vector, and the candidate reply vector, the 1 st round sample dialogue vector, the 2 nd round sample dialogue vector, … …, the mth round sample dialogue vector, and the candidate reply vector are vector-summed to obtain the 0 th layer overall aggregated data. For another example, a preset neural network model may be used to aggregate training dialogue vectors corresponding to model training data, so as to obtain layer 0 overall aggregated data. The preset neural network model refers to a preset neural network model for aggregating a plurality of dialogue vectors, and the preset neural network model is not limited in the embodiment of the application.
In fact, when a human replies in a dialogue, the human preferentially tries to understand the dialogue to be replied spoken by the other party (i.e., the dialogue content closest to the current time). If the dialogue to be replied cannot be fully understood, the human further interprets it in conjunction with the history dialogue closer to the dialogue to be replied, and failing that, in conjunction with the history dialogue farther from the dialogue to be replied, until the meaning expressed by the other party can finally be understood.
It can be seen that to improve the understanding of the dialog to be replied, the reply dialog scoring model may understand the multiple rounds of dialog content in a progressive manner. That is, the points of interest of the network layer in the reply dialog scoring model can gradually migrate from dialog content closer to the dialog reply time point to dialog content further away from the dialog reply time point. Based on this, the present application provides another implementation manner for acquiring the ith layer overall aggregation data, which specifically includes steps 21 to 22:
step 21: the ith layer attention profile is obtained.
The ith layer of attention distribution is used for describing the attention degree of each turn of sample conversation contents in the model training data of the ith polymerization layer, so that the ith layer of attention distribution can accurately represent the layer-by-layer migration of the conversation attention in the reply conversation scoring model. To facilitate understanding of the ith layer attention profile, the following description is made in conjunction with an example.
As an example, when the model training data includes the 1st round sample dialog content, the 2nd round sample dialog content, … …, and the Mth round sample dialog content, the ith layer attention distribution may include the ith layer attention $a_{i1}$ of the 1st round sample dialog content, the ith layer attention $a_{i2}$ of the 2nd round sample dialog content, … …, and the ith layer attention $a_{iM}$ of the Mth round sample dialog content. Here, $a_{i1}$ describes the attention degree of the ith aggregation layer to the 1st round sample dialog content; $a_{i2}$ describes the attention degree of the ith aggregation layer to the 2nd round sample dialog content; … … (and so on); $a_{iM}$ describes the attention degree of the ith aggregation layer to the Mth round sample dialog content.
In addition, the embodiment of the present application does not limit the calculation method of the ith layer attention distribution. For example, the ith layer attention $a_{ij}$ of the jth round sample dialog content may obey a distribution that satisfies a predetermined constraint, and the predetermined constraint may be: $a_{ij}$ reaches its maximum value when the time ordering parameter $M+1-j$ equals the layer-dependent center $\frac{iM}{N}$; and the larger the absolute value of the difference $\left|(M+1-j)-\frac{iM}{N}\right|$, the smaller the value of $a_{ij}$.
In the examples of the present application, the distribution obeyed by $a_{ij}$ is not limited; it may be any distribution that satisfies the above-mentioned predetermined constraint. For example, when $a_{ij}$ follows a normal distribution, $a_{ij}$ can be calculated using equation (1):

$$a_{ij}=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{\left((M+1-j)-\frac{iM}{N}\right)^{2}}{2\sigma^{2}}\right) \tag{1}$$

In the formula, $a_{ij}$ is the ith layer attention of the jth round sample dialog content; $\sigma$ is the standard deviation; $M$ is the number of sample dialog contents in the model training data; $N$ is the number of aggregation layers in the reply dialogue scoring model; $M+1-j$ is the time ordering parameter corresponding to the jth round sample dialog content; $i$ is a positive integer, $i \le N$; $j$ is a positive integer, $j \le M$. It should be noted that the time ordering parameter $M+1-j$ characterizes the time difference between the generation time of the candidate reply content and the generation time of the jth round sample dialog content.
As can be seen from the above formula (1), as the aggregation layer number i increases, the center of gravity of the layer's attention gradually shifts from dialogue content closer to the dialogue reply time point to dialogue content farther from it. That is, the attention of lower aggregation layers is focused on dialogue content closer to the dialogue reply time point, and the attention of higher aggregation layers on dialogue content farther from it. The dialogue reply time point refers to the occurrence time point of the reply dialogue; for example, for the model training data, it is the occurrence time corresponding to the candidate reply content.
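Under the normal-distribution instance of formula (1) as reconstructed above (the layer-dependent mean iM/N is an assumption recovered from the symbol legend), the migration of attention can be computed as follows:

```python
import numpy as np

def layer_attention(i, M, N, sigma=1.0):
    # a_ij for j = 1..M; the time ordering parameter M+1-j is compared with
    # the layer-dependent center i*M/N, so higher layers attend to older rounds.
    j = np.arange(1, M + 1)
    t = M + 1 - j
    return np.exp(-((t - i * M / N) ** 2) / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
```

For example, with M = 4 rounds and N = 4 aggregation layers, layer 1 peaks on round 4 (the most recent round) and layer 4 peaks on round 1 (the oldest), matching the progressive-understanding behavior described above.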
Based on the related content in step 21, in order to realize that the reply dialog scoring model understands the dialog contents in multiple rounds in a progressive manner, the attention degree of each sample dialog content in each aggregation layer may be determined according to the number of aggregation layers and the time sequencing parameter corresponding to the sample dialog content.
It should be noted that, in the embodiment of the present application, the execution time of step 21 is not limited; it only needs to be completed before step 22 is executed.
Step 22: and carrying out aggregation processing on the training conversation vectors according to the attention distribution and the conversation importance of the ith layer to obtain the integral aggregation data of the ith layer.
To facilitate understanding of step 22, the following description is made with reference to an example.
As an example, when the training dialog vector includes the 1 st round sample dialog vector, the 2 nd round sample dialog vector, … …, the mth round sample dialog vector, and the candidate reply vector, and the ith layer attention distribution includes the ith layer attention of the 1 st round sample dialog content to the ith layer attention of the mth round sample dialog content, and the dialog importance includes the information importance of the 1 st round sample dialog content to the information importance of the mth round sample dialog content, the step 22 may specifically include steps 221 to 224:
step 221: and generating the ith layer of dialogue aggregation data of the jth round of sample dialogue content according to the jth round of sample dialogue vector and the word aggregation weight of the jth round of sample dialogue content. Wherein j is a positive integer, and j is less than or equal to M.
The word aggregation weight of the jth round sample conversation content describes the influence proportion of each character/word in the jth round sample conversation content during the utterance-level aggregation processing of the jth round sample conversation vector. For example, if the jth round sample conversation content includes the 1st word, the 2nd word, … …, and the Lth word, its word aggregation weights may include the 1st word weight corresponding to the 1st word, the 2nd word weight corresponding to the 2nd word, … …, and the Lth word weight corresponding to the Lth word. It should be noted that, in different aggregation layers, the word weights corresponding to one word may be the same or different, and this is not specifically limited in this embodiment of the present application.
In addition, the embodiment of the present application does not limit the calculation method of the ith layer dialogue aggregation data of the jth sample dialogue content, and for convenience of understanding, the following description is made with reference to an example.
As an example, when the word aggregation weight of the sample conversation content in the jth round includes the ith layer word aggregation weight of the sample conversation content in the jth round, i is a positive integer, and i is not greater than N, step 221 may specifically include steps 2211 to 2212:
step 2211: and carrying out ith-layer speech level coding on the jth sample conversation vector to obtain ith-layer coding of the jth sample conversation content.
Wherein, the ith layer speech level coding refers to the speech level coding process implemented by the ith aggregation layer according to the ith layer speech level coding requirement. In addition, the embodiment of the present application is not limited to the implementation of the i-th layer speech level coding.
In the embodiment of the application, for the reply dialogue scoring model, after the ith aggregation layer receives the jth round sample conversation vector output by the vector layer, it performs ith layer utterance-level encoding on the jth round sample conversation vector according to the ith layer utterance-level encoding requirement, obtaining the ith layer encoding of the jth round sample conversation content.
Step 2212: and generating the ith layer dialogue aggregation data of the jth round of sample dialogue content according to the ith layer code of the jth round of sample dialogue content and the ith layer word aggregation weight of the jth round of sample dialogue content.
The ith layer word aggregation weight of the jth round sample conversation content is used for describing the influence proportion of each character/word in the jth round sample conversation content during the aggregation processing of the ith aggregation layer.
In addition, the embodiment of the present application does not limit the calculation process of the ith layer dialog aggregation data of the jth round sample dialog content. For example, when the ith layer encoding of the jth round sample dialog content includes the ith layer utterance-level encoding vector $I_{ijv}$ of the vth character/word in the jth round sample dialog content, and the ith layer word aggregation weight of the jth round sample dialog content includes the ith layer word aggregation weight $w_{ijv}$ of the vth character/word, where v is a positive integer, $v \le L$, and L is the number of words in the jth round sample dialog content, the ith layer dialog aggregation data of the jth round sample dialog content may be calculated using equations (2)-(3).
$$w_{ijv}=\frac{\exp\left(\mathrm{Attention}\left(I_{ijv},C_{i-1}\right)\right)}{\sum_{v'=1}^{L}\exp\left(\mathrm{Attention}\left(I_{ijv'},C_{i-1}\right)\right)} \tag{2}$$

$$U_{ij}=\sum_{v=1}^{L}w_{ijv}\,I_{ijv} \tag{3}$$

In the formulas, $U_{ij}$ is the ith layer dialog aggregation data of the jth round sample dialog content; $I_{ijv}$ is the ith layer utterance-level encoding vector of the vth character/word in the jth round sample dialog content; $w_{ijv}$ is the ith layer word aggregation weight of the vth character/word in the jth round sample dialog content; $C_{i-1}$ is the (i-1)th layer overall aggregated data; $\mathrm{Attention}(I_{ijv},C_{i-1})$ is an attention processing operation on $I_{ijv}$ and $C_{i-1}$; $L$ is the number of words in the jth round sample dialog content; $i$ is a positive integer, $i \le N$; $j$ is a positive integer, $j \le M$.

It should be noted that the embodiments of the present application do not limit the implementation of the attention processing operation; for example, it may be the inner product of two vectors, such that $\mathrm{Attention}(I_{ijv},C_{i-1})=I_{ijv}\cdot C_{i-1}$.
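A numpy sketch of the utterance-level aggregation in formulas (2)-(3), using inner-product attention as the example attention operation; the softmax normalization in (2) follows the reconstruction above and should be read as one assumed instance.

```python
import numpy as np

def utterance_aggregate(I_ij, C_prev):
    # I_ij: (L, d) i-th layer utterance-level encodings of round j's tokens;
    # C_prev: (d,) (i-1)-th layer overall aggregated data.
    scores = I_ij @ C_prev                      # Attention(I_ijv, C_{i-1}) = inner product
    w = np.exp(scores - scores.max())
    w = w / w.sum()                             # formula (2): w_ijv
    return (w[:, None] * I_ij).sum(axis=0)      # formula (3): U_ij
```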
Based on the related contents of the steps 2211 to 2212, for the reply dialogue scoring model, after the ith aggregation layer obtains the jth round sample conversation vector and the (i-1)th layer overall aggregated data output by the (i-1)th aggregation layer, it may first perform ith layer utterance-level encoding on the jth round sample conversation vector to obtain the ith layer encoding of the jth round sample conversation content; then obtain the ith layer word aggregation weight of the jth round sample conversation content according to the ith layer encoding of the jth round sample conversation content and the (i-1)th layer overall aggregated data; and finally obtain the ith layer dialog aggregation data of the jth round sample conversation content according to the ith layer encoding and the ith layer word aggregation weight of the jth round sample conversation content.
Step 222: and generating an ith layer conversation aggregation weight according to the ith layer attention distribution, the conversation importance, the ith layer conversation aggregation data of the 1 st round of sample conversation content, the ith layer conversation aggregation data of the 2 nd round of sample conversation content, … …, the ith layer conversation aggregation data of the M th round of sample conversation content and the ith layer conversation aggregation data of the M th round of sample conversation content.
And the ith layer conversation aggregation weight is used for describing the importance degree of each round of sample conversation content in the model training data in the ith layer aggregation processing. For example, if the model training data includes 1 st round of sample dialog content, 2 nd round of sample dialog content, … …, and mth round of sample dialog content, the ith layer of dialog aggregation weights may include the ith layer of aggregation weights for the 1 st round of sample dialog content, the ith layer of aggregation weights for the 2 nd round of sample dialog content, … …, and the ith layer of aggregation weights for the mth round of sample dialog content.
In fact, since the i-th layer conversation aggregation weight includes the i-th layer aggregation weight of the sample conversation contents of multiple rounds, and the calculation process of the i-th layer aggregation weight of each round of sample conversation contents is similar, for the convenience of understanding step 222, the following description will take the calculation process of the i-th layer aggregation weight of the sample conversation contents of the j-th round as an example. Wherein j is a positive integer, and j is less than or equal to M.
As an example, when the conversation importance includes the information importance of the sample conversation content of the jth round, and the ith layer attention distribution includes the ith layer attention of the sample conversation content of the jth round, the obtaining process of the ith layer aggregation weight of the sample conversation content of the jth round includes steps 31 to 32:
step 31: and generating the ith layer attention weight of the jth sample conversation content according to the ith layer conversation aggregate data and the (i-1) th layer overall aggregate data of the jth sample conversation content.
In this embodiment of the application, for the ith aggregation layer in the reply dialog scoring model, after the ith layer of dialog aggregation data and the ith-1 layer of overall aggregation data of the jth round of sample dialog content are obtained, the ith layer of attention weight of the jth round of sample dialog content may be obtained by calculation according to the formula (4) and according to the ith layer of dialog aggregation data and the ith-1 layer of overall aggregation data of the jth round of sample dialog content.
$$A_{ij}=\frac{\exp\left(\mathrm{Attention}\left(U_{ij},C_{i-1}\right)\right)}{\sum_{j'=1}^{M}\exp\left(\mathrm{Attention}\left(U_{ij'},C_{i-1}\right)\right)} \tag{4}$$

In the formula, $A_{ij}$ is the ith layer attention weight of the jth round sample conversation content; $M$ is the number of sample conversation contents in the model training data; $U_{ij}$ is the ith layer conversation aggregation data of the jth round sample conversation content; $C_{i-1}$ is the (i-1)th layer overall aggregated data; $\mathrm{Attention}(U_{ij},C_{i-1})$ is an attention processing operation on $U_{ij}$ and $C_{i-1}$; $i$ is a positive integer, $i \le N$; $j$ is a positive integer, $j \le M$.
Based on the related content in step 31, for the ith aggregation layer in the reply dialogue scoring model, after the ith layer conversation aggregation data of the jth round sample conversation content and the (i-1)th layer overall aggregated data are obtained, an attention processing operation may be performed on $U_{ij}$ and $C_{i-1}$, and the result normalized (e.g., by a softmax operation) to obtain the ith layer attention weight $A_{ij}$ of the jth round sample conversation content, so that $A_{ij}$ can subsequently be used to calculate the ith layer aggregation weight of the jth round sample conversation content.
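A sketch of formula (4), again with inner-product attention as the assumed attention operation:

```python
import numpy as np

def round_attention_weights(U_i, C_prev):
    # U_i: (M, d) i-th layer dialog aggregation data of the M rounds;
    # C_prev: (d,) (i-1)-th layer overall aggregated data.
    scores = U_i @ C_prev                       # Attention(U_ij, C_{i-1}) = inner product
    e = np.exp(scores - scores.max())
    return e / e.sum()                          # formula (4): A_ij for j = 1..M
```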
Step 32: and determining the ith layer aggregation weight of the jth sample conversation content according to the product of the attention weight of the jth sample conversation content, the information importance of the jth sample conversation content and the ith layer attention of the jth sample conversation content.
To facilitate an understanding of step 32, the following description is made in conjunction with two examples.
Example 1: as shown in formula (5), step 32 may specifically be: determining the product of the ith layer attention weight of the jth round sample conversation content, the information importance of the jth round sample conversation content, and the ith layer attention of the jth round sample conversation content as the ith layer aggregation weight of the jth round sample conversation content.

$$W_{ij}=A_{ij}\times a_{ij}\times b_{j} \tag{5}$$

In the formula, $W_{ij}$ is the ith layer aggregation weight of the jth round sample conversation content; $A_{ij}$ is the ith layer attention weight of the jth round sample conversation content; $a_{ij}$ is the ith layer attention of the jth round sample conversation content; $b_{j}$ is the information importance of the jth round sample conversation content; $i$ is a positive integer, $i \le N$; $j$ is a positive integer, $j \le M$.
Example 2: as shown in formula (6), step 32 may specifically be: determining the product of the ith layer attention weight of the jth round sample conversation content, the information importance of the jth round sample conversation content, the ith layer attention of the jth round sample conversation content, and the ith layer correction weight of the jth round sample conversation content as the ith layer aggregation weight of the jth round sample conversation content.

$$W_{ij}=A_{ij}\times a_{ij}\times b_{j}\times E_{ij} \tag{6}$$

In the formula, $W_{ij}$ is the ith layer aggregation weight of the jth round sample conversation content; $A_{ij}$ is the ith layer attention weight of the jth round sample conversation content; $a_{ij}$ is the ith layer attention of the jth round sample conversation content; $b_{j}$ is the information importance of the jth round sample conversation content; $E_{ij}$ is the ith layer correction weight of the jth round sample conversation content; $i$ is a positive integer, $i \le N$; $j$ is a positive integer, $j \le M$.

It should be noted that the embodiment of the present application does not limit the ith layer correction weight $E_{ij}$ of the jth round sample conversation content; $E_{ij}$ can be set according to the actual application scenario. For example, in some cases, $E_{ij}$ may be equal to the qth round comprehension of the jth round sample conversation content described below.
Based on the related content in step 32, for the ith aggregation layer in the reply dialogue scoring model, after the ith layer attention weight, the information importance, and the ith layer attention of the jth round sample conversation content are obtained, the ith layer aggregation weight of the jth round sample conversation content can be calculated using formula (5) or formula (6).
Based on the related contents of the steps 31 to 32, for the ith aggregation layer in the reply dialogue scoring model, after the ith layer conversation aggregation data of the jth round sample conversation content, the (i-1)th layer overall aggregated data, the information importance of the jth round sample conversation content, and the ith layer attention of the jth round sample conversation content are obtained, the ith layer attention weight of the jth round sample conversation content may first be calculated from the ith layer conversation aggregation data and the (i-1)th layer overall aggregated data. The ith layer aggregation weight of the jth round sample conversation content is then calculated from the ith layer attention weight, the information importance, and the ith layer attention of the jth round sample conversation content, so that it can subsequently be used to calculate the ith layer overall aggregated data. Here, j is a positive integer, and j ≤ M.
It should be noted that, in the embodiment of the present application, the i-th layer aggregation weight of any sample dialog content can be calculated by using the above steps 31 to 32, and for the sake of brevity, details are not described here again.
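Formulas (5) and (6) are plain element-wise products, which the following small sketch makes explicit; the optional `E` argument covers the formula (6) variant.

```python
import numpy as np

def aggregation_weights(A, a, b, E=None):
    # A: (M,) i-th layer attention weights; a: (M,) i-th layer attentions;
    # b: (M,) information importances; E: optional (M,) correction weights.
    W = np.asarray(A) * np.asarray(a) * np.asarray(b)   # formula (5)
    return W if E is None else W * np.asarray(E)        # formula (6)
```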
Step 223: and carrying out weighted summation on the ith layer of conversation aggregation data of the 1 st round of sample conversation content to the ith layer of conversation aggregation data of the M th round of sample conversation content according to the ith layer of conversation aggregation weight to obtain ith layer of historical aggregation data.
The ith layer historical aggregated data refers to the data obtained by aggregating the reply reference content in the model training data.
To facilitate understanding of step 223, the following description is made with reference to an example.
As an example, when the ith layer conversation aggregate weight includes the ith layer aggregate weight of the 1 st round of sample conversation content, the ith layer aggregate weight of the 2 nd round of sample conversation content, … …, and the ith layer aggregate weight of the mth round of sample conversation content, the ith layer historical aggregate data may be calculated by using formula (7).
$$u_{i}=\sum_{j=1}^{M}W_{ij}\,U_{ij} \tag{7}$$

In the formula, $u_{i}$ is the ith layer historical aggregated data; $W_{ij}$ is the ith layer aggregation weight of the jth round sample conversation content; $U_{ij}$ is the ith layer conversation aggregation data of the jth round sample conversation content; $M$ is the number of sample conversation contents in the model training data; $i$ is a positive integer, and $i \le N$.
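Formula (7) is a weighted sum over rounds, as in this sketch:

```python
import numpy as np

def history_aggregate(W, U):
    # W: (M,) i-th layer aggregation weights; U: (M, d) i-th layer dialog
    # aggregation data. Formula (7): u_i = sum_j W_ij * U_ij.
    return (np.asarray(W)[:, None] * np.asarray(U)).sum(axis=0)
```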
Step 224: and obtaining the ith layer of overall aggregated data according to the ith layer of historical aggregated data and the candidate reply vectors.
The embodiment of step 224 is not limited in the present application, and for convenience of understanding, the following description is made in conjunction with one possible embodiment.
In a possible implementation manner, step 224 may specifically include step 2241-step 2242:
step 2241: and generating the ith layer of dialogue aggregation data of the candidate reply content according to the candidate reply vector and the word aggregation weight of the candidate reply content.
The word aggregation weight of the candidate reply content describes the influence proportion of each character/word in the candidate reply content during the utterance-level aggregation processing of the candidate reply vector.
It should be noted that the process of generating the ith layer conversation aggregation data of the candidate reply content is similar to the process of generating the ith layer conversation aggregation data of the jth round sample conversation content, and for the sake of brevity, the description is omitted here.
Step 2242: and generating the ith layer of overall aggregated data according to the ith layer of historical aggregated data and the ith layer of dialogue aggregated data of the candidate reply content.
It should be noted that the embodiment of the present application does not limit the generation process of the i-th layer overall aggregated data in step 2242. For example, the ith layer of history aggregation data and the ith layer of dialogue aggregation data of the candidate reply content may be spliced to obtain the ith layer of overall aggregation data. For another example, the ith layer of history aggregation data and the ith layer of dialogue aggregation data of the candidate reply content may be vector-summed to obtain the ith layer of overall aggregation data.
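Both combination options named in step 2242 (splicing and vector summation) are simple to state in code; this sketch supports either, with the mode flag being an illustrative parameter.

```python
import numpy as np

def overall_aggregate(u_i, U_reply, mode="sum"):
    # u_i: (d,) i-th layer historical aggregated data; U_reply: (d,) i-th layer
    # dialog aggregation data of the candidate reply content.
    if mode == "sum":
        return u_i + U_reply                    # vector summation option
    return np.concatenate([u_i, U_reply])       # splicing (concatenation) option
```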
Based on the related content of step 224, for the ith aggregation layer in the reply dialog scoring model, after the ith layer of historical aggregation data and the candidate reply vector are obtained, the ith layer of overall aggregation data may be generated according to the ith layer of historical aggregation data and the candidate reply vector, so that the ith layer of overall aggregation data may integrally represent information carried in the dialog process simulated by the model training data.
Based on the related contents of the above steps 21 to 22, for the reply dialog scoring model, the attention points of different aggregation layers can be adjusted by means of the ith layer attention distribution, so that the reply dialog scoring model can understand the contents of multiple rounds of sample dialogs in a progressive manner, which is beneficial to improving the scoring performance of the reply dialog scoring model.
Based on the related content of S104a13, for the reply dialog score model, after the ith aggregation layer acquires the training dialog vector output by the vector layer, the ith aggregation layer may perform aggregation processing on the training dialog vector according to the dialog importance degree to obtain the ith layer overall aggregated data.
It should be noted that in the reply dialog scoring model, any aggregation layer may perform aggregation processing by using the above-mentioned S104a13, and for the sake of brevity, details are not described here again.
S104A 14: and performing prediction processing on the overall aggregation data from the layer 1 to the layer N by using the prediction layer to obtain a prediction use score of the candidate reply content.
The predicted usage score of the candidate reply content refers to the probability of replying to reply reference content in the model training data by using the candidate reply content, which is predicted by the reply dialogue scoring model.
In addition, the embodiment of the present application does not limit the manner of obtaining the predicted usage score of the candidate reply content (i.e., S104a14), and for convenience of understanding, the following description is made with reference to an example.
As an example, when the reply dialog scoring model includes N prediction layers, S104a14 may specifically be: performing prediction processing on the 1 st layer overall aggregation data by using the 1 st prediction layer to obtain the 1 st layer prediction probability of the candidate reply content; predicting the 2 nd layer overall aggregated data by using the 2 nd prediction layer to obtain the 2 nd layer prediction probability of the candidate reply content; … … (and so on); predicting the N-th layer overall aggregated data by using the N-th prediction layer to obtain the N-th layer prediction probability of the candidate reply content; and determining the predicted use score of the candidate reply content according to the layer 1 predicted probability of the candidate reply content to the layer N predicted probability of the candidate reply content.
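The text leaves open how the N per-layer prediction probabilities are combined into one score; the weighted mean below is an assumed instance, not the patent's prescribed rule.

```python
import numpy as np

def usage_score(layer_probs, weights=None):
    # layer_probs: [p_1, ..., p_N], the layer-1 to layer-N prediction
    # probabilities of the candidate reply content.
    p = np.asarray(layer_probs, dtype=float)
    w = np.ones_like(p) if weights is None else np.asarray(weights, dtype=float)
    return float((w * p).sum() / w.sum())       # assumed combination rule
```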
Based on the related content of the step S104A1, after the model training data and the dialogue importance are obtained, the current round's reply dialogue scoring model and the dialogue importance may be used to perform prediction on the model training data, obtaining the predicted use score of the candidate reply content in the model training data, so that the scoring performance of the current round's reply dialogue scoring model can then be determined according to this predicted use score.
S104A 2: judging whether a preset stop condition is reached, if so, executing S104A 4; if not, S104a3 is executed.
The preset stopping condition refers to a preset constraint condition which is required to be reached when the reply dialogue scoring model is stopped to be trained. In addition, the preset stop condition is not limited in the embodiments of the present application, for example, the preset stop condition may be that a difference between a predicted usage score of the candidate reply content in the model training data and an actual usage score of the candidate reply content in the model training data is smaller than a first threshold. As another example, the preset stop condition may be that the scoring result of the reply dialog scoring model converges. For example, the preset stop condition may be that the number of updates of the reply dialog scoring model reaches a second threshold.
In the embodiment of the application, after the current round's reply dialogue scoring model is used to determine the predicted use score of the candidate reply content in the model training data, it is judged whether the current round's reply dialogue scoring model reaches the preset stop condition. If so, the scoring performance of the reply dialogue scoring model can be considered good, so the training process can be ended and the trained reply dialogue scoring model stored or used; if not, the scoring performance can be considered poor, so the reply dialogue scoring model is updated, with the aim that the updated model has better scoring performance.
S104A 3: updating the reply dialog score model according to the predicted usage score of the candidate reply content and the actual usage score of the candidate reply content, and returning to execute S104A 1.
Wherein, the actual usage score of the candidate reply content refers to the actual usage score of the candidate reply content in the model training data. For ease of understanding, the following description is made with reference to examples.
As an example, when the model training data includes the 1st to Mth round sample dialog contents in the dialog training sample, if the candidate reply content is the (M+1)th round sample dialog content in the dialog training sample (i.e., positive-example reply content), the actual usage score of the candidate reply content may be 1; if the candidate reply content is dialog content completely different from the (M+1)th round sample dialog content in the dialog training sample (i.e., negative-example reply content), the actual usage score of the candidate reply content may be 0.
Based on the related content of S104a3, when it is determined that the reply dialog score model of the current round does not reach the preset stop condition, the reply dialog score model may be updated according to a difference between the predicted usage score of the candidate reply content and the actual usage score of the candidate reply content, so that the updated reply dialog score model has better score performance.
It should be noted that, the embodiment of the present application is not limited to the updating process of the reply dialog score model, and the updating process may be performed by using any existing or future updating method of the reply dialog score model.
S104A 4: the training process of the reply dialog scoring model is ended.
Based on the related content of the first possible implementation manner of S104, after the model training data and the dialogue importance are obtained, the reply dialogue scoring model may be trained by using the model training data and the dialogue importance. The trained reply dialogue scoring model can then accurately determine the use score of a candidate reply dialogue, so that this use score more accurately predicts the semantic matching probability between the candidate reply dialogue and the sample dialogue, thereby improving the scoring accuracy of the reply dialogue scoring model.
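The loop structure of S104A1-S104A4 can be sketched as follows; `model.predict`, `model.update`, and the data fields are hypothetical stand-ins, and the two stop tests illustrate the threshold-based variants of the preset stop condition named above.

```python
def train_scoring_model(model, data, importance, tol=1e-3, max_updates=10_000):
    for _ in range(max_updates):                          # second-threshold variant
        pred = model.predict(data, importance)            # S104A1
        if abs(pred - data.actual_usage_score) < tol:     # first-threshold variant
            break                                         # S104A2 -> S104A4
        model.update(pred, data.actual_usage_score)       # S104A3
    return model
```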
In some cases, to improve the training efficiency of the reply dialog scoring model, the focus of the reply dialog scoring model may be controlled to adjust as the reply dialog scoring model is trained. Based on this, the present application provides a second possible implementation manner of S104, which may specifically include S104B1-S104B 3:
S104B 1: and generating a pre-training model according to the reply dialogue scoring model.
In the embodiment of the application, the reply dialogue scoring model can be trained in a two-stage training mode, so that the second training process of the reply dialogue scoring model can be implemented according to the prediction loss of the first-stage training. In order to accurately execute the two training processes, a pre-training model may be generated from the reply dialogue scoring model, so that the pre-training model, acting as the reply dialogue scoring model, undergoes the first training process of the reply dialogue scoring model.
S104B 2: and training the pre-training model according to the model training data, the dialogue importance and the preset model comprehension, and determining the predicted loss value of the pre-training model as a model reference loss value when a first training stopping condition is reached.
The preset model comprehension is used for describing a preset understanding capability of the pre-training model for the multi-round dialogue interaction process in the model training data. For example, the preset model comprehension may be 1.
The first training stop condition is a constraint condition set in advance for stopping the training process of the pre-training model. In addition, the embodiment of the present application does not limit the relationship between the first training stop condition and the above preset stop condition, for example, the first training stop condition may be the same as the above preset stop condition.
The model reference loss value refers to a prediction loss value generated when the trained pre-training model performs the usage score prediction. In addition, the calculation process of the model reference loss value is not limited in the embodiment of the present application, and for example, the calculation process may be performed according to the predicted usage score of the candidate reply content and the actual usage score of the candidate reply content, which are predicted in the last round of training process of the pre-trained model.
In addition, the embodiment of the present application does not limit the training process of the pre-training model, and it may be implemented by using any implementation manner of training the reply dialogue scoring model provided above. Note that, if the above formula (6) is used in the prediction process of the pre-training model, the ith layer correction weight $E_{ij}$ of the jth round sample dialogue content in formula (6) can be determined from the preset model comprehension (e.g., the preset model comprehension may be directly determined as the ith layer correction weight $E_{ij}$ of the jth round sample dialogue content in formula (6)).
Based on the above-mentioned related contents of S104B2, for the training process of the pre-training model (i.e., the first training process of replying to the dialogue score model), the pre-training model may be trained by using the model training data, the dialogue importance and the preset model comprehension, so as to obtain the trained pre-training model and the model reference loss value thereof.
S104B 3: and training the reply dialogue scoring model according to the model training data, the dialogue importance and the model reference loss value.
In this embodiment of the application, for the second training process of the reply dialogue scoring model, the model comprehension corresponding to each round of training may be determined according to the model reference loss value, and each round of the second training process may then be implemented according to the model training data, the dialogue importance, and that round's model comprehension. To facilitate understanding of the second training process of the reply dialogue scoring model, the following description is made with reference to an example.
As an example, when the second training of the reply dialogue scoring model includes K rounds of training processes, and K is a positive integer, S104B3 may specifically include S104B31-S104B34:
S104B 31: and training the reply dialogue scoring model according to the model training data, the dialogue importance and the preset model comprehension corresponding to the 1 st round of training process to obtain the 1 st round of updated reply dialogue scoring model and the model prediction loss value corresponding to the 1 st round of training process.
The model comprehension corresponding to the 1st round training process may be preset; for example, it may be set to 1.
The updated reply dialog score model of the 1 st round refers to the updated reply dialog score model in the 1 st round of training.
The model predicted loss value corresponding to the 1 st round of training process is a model predicted loss value of the reply dialogue scoring model used in the 1 st round of training process (i.e., the reply dialogue scoring model that is not updated in the 1 st round).
S104B 32: and training the 1 st round updated reply dialogue scoring model according to the model training data, the dialogue importance and the model comprehension corresponding to the 2 nd round training process to obtain the 2 nd round updated reply dialogue scoring model and the model prediction loss value corresponding to the 2 nd round training process.
Wherein, the model comprehension corresponding to the 2 nd round of training process is determined according to the model reference loss value and the model prediction loss value corresponding to the 1 st round of training process.
The model prediction loss value corresponding to the 2 nd round training process is the model prediction loss value of the reply dialogue scoring model used in the 2 nd round training process (i.e., the reply dialogue scoring model that has undergone the 1 st round update and has not undergone the 2 nd round update).
S104B 33: and training the reply dialogue scoring model after the 2 nd round of updating according to the model training data, the dialogue importance and the model comprehension corresponding to the 3 rd round of training process to obtain the reply dialogue scoring model after the 3 rd round of updating and the model prediction loss value corresponding to the 3 rd round of training process.
… … (and so on)
S104B 34: and training the reply dialogue scoring model after the K-1 round of updating according to the model training data, the dialogue importance and the model comprehension corresponding to the K round of training to obtain the reply dialogue scoring model after the K round of updating.
Based on the above related contents of S104B31-S104B34, for the second training of the reply dialogue scoring model, the model comprehension corresponding to the (k+1)th round training process is first determined according to the model reference loss value and the model prediction loss value corresponding to the kth round training process. The reply dialogue scoring model after the kth round of updating is then trained according to the model training data, the dialogue importance, and the model comprehension corresponding to the (k+1)th round training process, yielding the reply dialogue scoring model after the (k+1)th round of updating and the model prediction loss value corresponding to the (k+1)th round training process. The latter can in turn be used to determine the model comprehension corresponding to the (k+2)th round training process, with which the reply dialogue scoring model after the (k+1)th round of updating is trained. Here, k is a positive integer.
Therefore, in the second training process of the reply dialog score model, the model comprehension can be continuously adjusted along with the change of the score performance of the reply dialog score model, so that the reply dialog score model can quickly and accurately achieve convergence.
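A sketch of the K-round second training (S104B31-S104B34), assuming the model reference loss value from S104B2 is already available; `train_round` and `derive_comprehension` are hypothetical stand-ins, the latter corresponding to the per-sample rule of steps 41 to 45 described below.

```python
def second_training(model, data, importance, ref_loss, K):
    comprehension = 1.0                          # preset value for round 1
    for k in range(1, K + 1):                    # S104B31-S104B34
        round_loss = model.train_round(data, importance, comprehension)
        comprehension = derive_comprehension(ref_loss, round_loss)
    return model
```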
It should be noted that, for the second training process of the reply dialogue scoring model, each round of training includes a prediction process for the use score and an update process for the reply dialogue scoring model. The embodiment does not limit the prediction process of the use score in each round of training; it may be implemented by any of the embodiments for predicting the use score of candidate reply content provided above. If the above formula (6) is used in the prediction process, the ith layer correction weight $E_{ij}$ of the jth round sample dialogue content in formula (6) may be determined from the qth round comprehension of the jth round sample dialogue content (e.g., the qth round comprehension of the jth round sample dialogue content may be directly determined as $E_{ij}$), where q is a positive integer and q ≤ K.
In addition, the embodiment of the present application further provides a process for obtaining the comprehension corresponding to each sample dialogue content in different training rounds. For ease of understanding, the following takes the process of obtaining the (k+1)th round comprehension of the jth round sample dialogue content as an example.
As an example, suppose the reply dialogue scoring model includes R prediction layers, the model prediction loss values corresponding to the kth round training process include the kth round prediction loss value of the 1st prediction layer through the kth round prediction loss value of the Rth prediction layer, the model reference loss values include the reference loss value of the 1st prediction layer through the reference loss value of the Rth prediction layer, and the model comprehension corresponding to the (k+1)th round training process includes the (k+1)th round comprehension of the jth round sample dialogue content, where j is a positive integer, j ≤ M, and R is a positive integer. Then the process of obtaining the (k+1)th round comprehension of the jth round sample dialogue content includes steps 41 to 45:
step 41: and obtaining the k-th round prediction performance parameter of the r-th prediction layer by making a difference between the k-th round prediction loss value of the r-th prediction layer and the reference loss value of the r-th prediction layer. Wherein R is a positive integer and R is not more than R.
Step 42: and adding the kth round prediction performance parameter of the 1 st prediction layer to the kth round prediction performance parameter of the R th prediction layer to obtain the kth round prediction performance parameter of the reply dialogue scoring model.
Step 43: and determining the integral comprehension corresponding to the jth sample conversation content according to the kth predicted performance parameter of the reply conversation scoring model and the time parameter of the jth sample conversation content. The time parameter of the content of the jth sample conversation is used for describing the influence generated by the occurrence time of the jth sample conversation.
Step 44: and determining the local comprehension corresponding to the j-th sample conversation content according to the k-th predicted performance parameter of the reply conversation scoring model and the k-th predicted performance parameter of the j-th sample conversation attention layer. Wherein, the jth sample dialog attention layer refers to a network layer (e.g., an aggregation layer and/or a prediction layer) in which the attention point focuses on the jth sample.
It should be noted that the above "network layer whose attention point is focused on the jth sample" may refer to a network layer whose attention point for the jth sample is higher than a preset attention threshold, or may refer to a network layer whose attention point for the jth sample reaches a highest value.
Step 45: and determining the (k + 1) th round understanding force of the jth sample conversation content according to the overall understanding force of the jth sample conversation content and the local understanding force of the jth sample conversation content.
It should be noted that, the number R of prediction layers in the reply dialog scoring model is not limited in the embodiments of the present application, for example, if the reply dialog scoring model includes N aggregation layers, the number R of prediction layers in the reply dialog scoring model may be N.
Based on the above related contents of steps 41 to 45, when the reply dialogue scoring model includes N aggregation layers and N prediction layers, and the rth prediction layer is used to perform prediction on the rth layer overall aggregated data output by the rth aggregation layer, the (k+1)th round comprehension of the jth round sample dialogue content may be calculated using formula (8).
[Formula (8): equation image in the original publication.]

In the formula, $c_{j(k+1)}$ is the (k+1)th round comprehension of the jth round sample dialogue content; $loss_{kr}$ is the kth round prediction loss value of the rth prediction layer; $loss_{\infty r}$ is the reference loss value of the rth prediction layer; $loss_{0r}$ is the prediction loss value of the rth prediction layer in the original reply dialogue scoring model (i.e., the reply dialogue scoring model before any training); the remaining symbols are the time parameter of the jth round sample dialogue content and the sequence number (a positive integer) of the attention layer of the jth round sample dialogue content; $j$ is a positive integer, and $j \le M$.
Based on the related content of S104B3, when the model training data, the dialogue importance and the model reference loss value are obtained, the reply dialogue scoring model may be trained for the second time by using the model training data, the dialogue importance and the model reference loss value, so that the scoring performance of the trained reply dialogue scoring model is better.
Based on the related content of the second possible implementation manner of the above S104, after the model training data is obtained, the reply dialog score model may be trained in a secondary training manner, so that the score performance of the trained reply dialog score model is better.
Based on the related contents of the above S101 to S104, in the reply dialogue scoring model training method provided by the present application, when a dialogue training sample is obtained, model training data and the dialogue importance are first generated according to the dialogue training sample, so that the dialogue importance can describe the information importance of the sample dialogue contents in the dialogue training sample; the reply dialogue scoring model is then trained according to the model training data and the dialogue importance. Because the dialogue importance accurately describes the information importance of the sample dialogue contents in the dialogue training sample, training the reply dialogue scoring model on the basis of the dialogue importance takes into account the differences in information importance between different sample dialogue contents. The reply dialogue scoring model can therefore understand the dialogue contents more accurately and comprehensively, which improves its scoring accuracy, improves the accuracy of the target reply dialogue determined on the basis of the model, and is conducive to accurate replies to the dialogue contents input by a user.
Based on the reply dialogue scoring model training method provided by the method embodiment, the embodiment of the application also provides a dialogue reply method, which is described below with reference to the accompanying drawings.
Method embodiment two
Referring to fig. 6, it is a flowchart of a dialog reply method according to an embodiment of the present application.
The dialog reply method provided by the embodiment of the application comprises S601-S604:
S601: Acquiring historical conversation content corresponding to the target user.
The target user refers to a user of the man-machine dialog system.
The historical dialogue content refers to the dialogue which is generated between the target user and the man-machine dialogue system in one dialogue process. For example, when the sample dialog contents of the 1 st round to the M th round described in fig. 2 have been generated between the target user and the human-computer dialog system, the historical dialog contents corresponding to the target user may include the sample dialog contents of the 1 st round to the M th round.
S602: and generating candidate reply dialogs corresponding to the target user according to the historical dialog contents corresponding to the target user.
The candidate reply dialog corresponding to the target user refers to reply content acquired according to historical dialog content corresponding to the target user; in addition, the number of candidate reply dialogs corresponding to the target user is not limited in the embodiment of the application.
In the embodiment of the application, after the historical conversation content corresponding to the target user is obtained, at least one candidate reply conversation may be generated according to that historical conversation content, so that the final reply content to be fed back to the target user can subsequently be selected from the at least one candidate reply conversation.
It should be noted that, in the embodiment of the present application, the generation process of the candidate reply dialog corresponding to the target user is not limited, and any existing or future method that can generate the candidate reply dialog corresponding to the target user may be used for implementation.
S603: and inputting the history conversation content corresponding to the target user and the candidate reply conversation corresponding to the target user into a reply conversation scoring model to obtain the use score of the candidate reply conversation output by the reply conversation scoring model.
The reply dialog scoring model is configured to determine, according to the model input data, the usage score of the dialog content that plays the reply dialog role in the model input data. In addition, the reply dialog scoring model can be obtained by training according to any one of the above-mentioned embodiments of the reply dialogue scoring model training method.
It should be noted that, if formula (5) or formula (6) is used in the prediction process of the reply dialog scoring model, b_j in formula (5) or (6) may be directly set to the first weight value (e.g., 1), and E_ij in formula (6) may also be directly set to the second weight value (e.g., 1).
Based on the above S603, in this embodiment of the application, after the history dialog content corresponding to the target user and the candidate reply dialog thereof are obtained, the history dialog content corresponding to the target user and the candidate reply dialog thereof may be input to the reply dialog score model, so as to obtain the usage score of the candidate reply dialog output by the reply dialog score model, so that the final reply content that can be fed back to the target user can be determined based on the usage score of the candidate reply dialog in the following.
S604: and determining the target reply dialog corresponding to the target user according to the use score of the candidate reply dialog.
In the embodiment of the application, after the usage scores of the candidate reply dialogs are obtained, the target reply dialog corresponding to the target user can be determined according to these usage scores. For example, when the target user corresponds to T candidate reply dialogs, the usage scores of the 1st through T-th candidate reply dialogs may be compared with one another, and the candidate reply dialog with the largest usage score may be determined as the target reply dialog corresponding to the target user.
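In code, this selection reduces to taking the arg-max over the usage scores, roughly as in the following illustrative sketch:

```python
def select_target_reply(candidates, scores):
    """Return the candidate reply dialog with the largest usage score."""
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]
```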
Based on the related contents of S601 to S604 above, after the historical conversation content corresponding to the target user is obtained, a candidate reply dialog corresponding to the target user is first generated according to that historical conversation content; the historical conversation content and its candidate reply dialogs are then input into the reply dialog scoring model to obtain the usage score of each candidate reply dialog output by the model; and the target reply dialog corresponding to the target user is determined according to the usage scores of the candidate reply dialogs. Because the reply dialog scoring model has good scoring performance, it can accurately determine the usage scores of the candidate reply dialogs, so that the target reply dialog determined based on these usage scores is more accurate, which improves the user's human-machine dialog experience.
In addition, the embodiments of the present application do not limit the execution subject of the dialog reply method; for example, the dialog reply method provided by the embodiments of the present application may be applied to a data processing device such as a terminal device or a server. The terminal device may be a smart phone, a computer, a Personal Digital Assistant (PDA), a tablet computer, or the like. The server may be a stand-alone server, a cluster server, or a cloud server.
In order to facilitate understanding of the technical solutions provided in the embodiments of the present application, an application scenario of the dialog reply method provided in the embodiments of the present application is exemplarily described below with reference to fig. 7 and fig. 8, respectively. Fig. 7 is an application scenario diagram of a dialog reply method applied to a terminal device according to an embodiment of the present application; fig. 8 is a schematic application scenario diagram of a dialog reply method applied to a server according to an embodiment of the present application.
In the application scenario shown in fig. 7, when the target user 701 triggers a dialog reply request on the terminal device 702, the terminal device 702 receives the dialog reply request, and performs a dialog reply to the target user 701 by executing the dialog reply method provided in the embodiment of the present application.
In the application scenario shown in fig. 8, when the target user 801 triggers a dialog reply request on the terminal device 802, the terminal device 802 receives the dialog reply request and forwards the dialog reply request to the server 803, so that the server 803 performs a dialog reply to the target user 801 by executing the dialog reply method provided by the embodiment of the present application.
Taking the process of fig. 8 as an example, the process in which the server 803 performs a dialog reply to the target user 801 may specifically be as follows: the server 803 first obtains the historical conversation content corresponding to the target user 801 and generates candidate reply conversations corresponding to the target user 801 according to that historical conversation content; then, the historical conversation content and the candidate reply conversations are input into the reply conversation scoring model to obtain the usage scores of the candidate reply conversations output by the model, and the target reply conversation corresponding to the target user 801 is determined according to these usage scores and sent to the terminal device 802, so that the terminal device 802 can feed the target reply conversation back to the target user 801 in a preset manner (such as text display, voice broadcast, and the like).
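Putting S601 to S604 together, the server-side flow of fig. 8 might be sketched as follows; history_store, generator, and scoring_model are hypothetical stand-ins for the history database, the candidate generation module, and the trained reply dialog scoring model.

```python
def handle_dialog_request(user_id, history_store, generator, scoring_model):
    """Server-side dialog reply flow sketched from the fig. 8 scenario."""
    history = history_store.get(user_id)                  # S601
    candidates = generator.generate(history)              # S602
    scores = [scoring_model.score(history, c) for c in candidates]  # S603
    best = max(range(len(candidates)), key=lambda i: scores[i])     # S604
    return candidates[best]   # target reply dialog sent back to the device
```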
It should be noted that the dialog reply method provided in the embodiment of the present application can be applied to not only the application scenario shown in fig. 7 or fig. 8, but also other application scenarios that require dialog reply, and the embodiment of the present application is not particularly limited to this.
Based on the reply dialog scoring model training method provided by the above method embodiment, an embodiment of the present application further provides a reply dialog scoring model training device, which is explained below with reference to the accompanying drawings.
Apparatus embodiment one
This device embodiment introduces a reply dialog scoring model training device; for related content, please refer to the method embodiments above.
Referring to fig. 9, the figure is a schematic structural diagram of a reply dialog scoring model training device according to an embodiment of the present application.
The training device 900 for the reply dialog scoring model provided by the embodiment of the present application includes:
a sample obtaining unit 901, configured to obtain a session training sample; wherein the dialogue training samples comprise M +1 round sample dialogue contents; m is a positive integer;
a data generating unit 902, configured to generate model training data and a dialog importance level according to the dialog training sample; the dialogue importance is used for describing the information importance of the sample dialogue contents in the dialogue training sample;
and the model training unit 903 is used for training the reply dialogue scoring model according to the model training data and the dialogue importance.
In a possible implementation manner, in order to improve the prediction accuracy of the reply dialog scoring model, the data generating unit 902 includes:
the first generation subunit is used for generating reply reference contents according to the 1 st round sample conversation contents to the M th round sample conversation contents in the conversation training samples;
the first obtaining subunit is configured to obtain candidate reply content corresponding to the reply reference content;
and the second generation subunit is used for generating model training data according to the reply reference content and the candidate reply content corresponding to the reply reference content.
In one possible implementation manner, in order to improve the prediction accuracy of the reply dialogue scoring model, the candidate reply contents comprise positive example reply contents and/or negative example reply contents;
the first obtaining subunit is specifically configured to:
generating positive example reply content corresponding to the reply reference content according to the (M+1)-th round sample conversation content in the conversation training sample;
and/or,
generating negative example reply content corresponding to the reply reference content according to a preset dialogue corpus.
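As an illustration, positive and negative candidate reply contents could be assembled as follows; uniform random sampling from the preset dialogue corpus is only one plausible way to draw negative examples, not necessarily the patented one.

```python
import random

def build_candidates(dialog_sample, corpus):
    """Build candidate reply contents for one dialogue training sample.

    dialog_sample: list of M+1 rounds of sample dialogue contents.
    corpus: a preset dialogue corpus from which negative examples are drawn.
    """
    reply_reference = dialog_sample[:-1]   # rounds 1..M form the reference
    positive = dialog_sample[-1]           # round M+1 is the positive example
    negative = random.choice(corpus)       # negative example from the corpus
    return reply_reference, positive, negative
```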
In a possible implementation manner, in order to improve the prediction accuracy of the reply dialog scoring model, the data generating unit 902 includes:
the first determining subunit is configured to determine, when the model training data includes tth sample session contents in the session training samples, and the session importance includes information importance of the tth sample session contents, and t is a positive integer and is less than or equal to M, information importance of the tth sample session contents in the session training samples according to unique information of the tth sample session contents in the session training samples.
In a possible implementation manner, in order to improve the prediction accuracy of the reply dialog scoring model, the first determining subunit includes:
the second determining subunit is configured to determine a tth wheel conversation prediction content according to the t +1 th wheel sample conversation content to the mth wheel sample conversation content in the conversation training samples; the unique information of the sample conversation content of the tth round is an information difference value of the sample conversation content of the tth round and the predicted conversation content of the tth round;
a third determining subunit, configured to input, to the conversation contents of the t +1 th round sample to the conversation contents of the M th round sample in the conversation training samples, and the conversation prediction content of the t th round into a forward generation type conversation model that is constructed in advance, to obtain a pseudo generation probability corresponding to the conversation contents of the M +1 th round sample in the conversation training samples;
a fourth determining subunit, configured to input pre-constructed forward-generated dialogue models into the t-th to M-th sample dialogue contents in the dialogue training samples, so as to obtain true generation probabilities corresponding to the M + 1-th sample dialogue contents in the dialogue training samples;
and the fifth determining subunit is configured to determine the information importance of the sample dialog content of the t-th round in the dialog training sample according to the pseudo-generation probability corresponding to the dialog content of the sample of the M + 1-th round in the dialog training sample and the true generation probability corresponding to the dialog content of the sample of the M + 1-th round in the dialog training sample.
In a possible implementation manner, in order to improve the prediction accuracy of the reply dialogue scoring model, the second determining subunit is specifically configured to:
input the (t+1)-th round sample dialogue content to the M-th round sample dialogue content in the dialogue training samples into a pre-constructed reverse generative dialogue model, to obtain the t-th round dialogue prediction content output by the reverse generative dialogue model.
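The cooperation of these subunits might be sketched as follows; the model interfaces (predict, generation_prob) and the use of a probability difference in the last step are illustrative assumptions, since the patent defines the exact combination in its own formulas.

```python
def information_importance(sample, t, forward_model, reverse_model):
    """Estimate the information importance of the t-th round (1-based)
    sample dialogue content in a dialogue training sample.

    sample: list of M+1 rounds; the last round is the reply.
    reverse_model.predict(future) guesses the missing t-th round from the
    later rounds; forward_model.generation_prob(context, reply) scores the
    (M+1)-th round reply given a context. Both interfaces are illustrative.
    """
    history, reply = sample[:-1], sample[-1]
    future = history[t:]                           # rounds t+1 .. M
    predicted_t = reverse_model.predict(future)    # t-th round prediction
    pseudo_prob = forward_model.generation_prob([predicted_t] + future, reply)
    true_prob = forward_model.generation_prob(history[t - 1:], reply)
    # The more the reply probability drops when the real round is replaced
    # by its prediction, the more unique information the round carries.
    return true_prob - pseudo_prob
```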
In a possible embodiment, in order to improve the prediction accuracy of the reply dialog scoring model, the model training unit 903 includes:
a sixth determining subunit, configured to, if the model training data includes candidate reply content, obtain a predicted usage score of the candidate reply content according to the model training data, the dialog importance, and the reply dialog score model;
and the model updating subunit is configured to update the reply dialogue scoring model according to the predicted usage score of the candidate reply content and the actual usage score of the candidate reply content, and to return to the sixth determining subunit to continue the step of obtaining the predicted usage score of the candidate reply content according to the model training data, the dialogue importance and the reply dialogue scoring model, until a preset stop condition is reached.
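A rough sketch of this predict-compare-update loop is given below; the mean-squared-error loss and the fixed step budget are illustrative stand-ins for the patent's training loss and preset stop condition.

```python
import torch

def train_scoring_model(model, samples, optimizer, max_steps=10000):
    """Iteratively update the reply dialogue scoring model.

    samples yields (model_training_data, dialog_importance, actual_score)
    triples; model(data, importance) returns the predicted usage score.
    """
    loss_fn = torch.nn.MSELoss()     # one plausible choice of training loss
    for step, (data, importance, actual) in enumerate(samples):
        predicted = model(data, importance)     # predicted usage score
        loss = loss_fn(predicted, actual)       # compare with actual score
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step + 1 >= max_steps:               # preset stop condition
            break
    return model
```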
In one possible embodiment, in order to improve the prediction accuracy of a reply dialog scoring model, where the reply dialog scoring model includes an input layer, a vector layer, N aggregation layers, and a prediction layer, and N is a positive integer, the sixth determining subunit includes:
a data input subunit, configured to input the model training data to the reply dialog scoring model by using the input layer;
the vector extraction subunit is used for carrying out vectorization processing on the model training data by utilizing the vector layer to obtain a training dialogue vector;
the data aggregation subunit is configured to perform aggregation processing on the training dialogue vector by using the ith aggregation layer and the dialogue importance degree to obtain ith layer overall aggregated data; wherein i is a positive integer, and i is not more than N;
and the data prediction subunit is used for performing prediction processing on the 1 st layer overall aggregation data to the Nth layer overall aggregation data by using the prediction layer to obtain the predicted use score of the candidate reply content.
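For orientation, a structural skeleton of this input / vector / aggregation / prediction stack is sketched below in PyTorch; the layer internals are placeholders (the real aggregation uses the attention distribution and dialogue importance detailed next), and every name is illustrative.

```python
import torch
import torch.nn as nn

class ReplyDialogScoringModel(nn.Module):
    """Skeleton of the scoring model's layer structure (illustrative)."""

    def __init__(self, vocab_size, dim, n_agg_layers):
        super().__init__()
        self.vector_layer = nn.Embedding(vocab_size, dim)   # vector layer
        self.agg_layers = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(n_agg_layers)])
        self.prediction_layer = nn.Linear(n_agg_layers * dim, 1)

    def forward(self, token_ids, dialog_importance):
        # Input layer + vector layer: token ids -> training dialogue vector.
        vec = self.vector_layer(token_ids).mean(dim=1)
        aggregates, h = [], vec
        for agg in self.agg_layers:
            # Placeholder i-th layer aggregation; the patent weighs each
            # round by attention and dialogue importance here.
            h = torch.tanh(agg(h * dialog_importance))
            aggregates.append(h)
        # Prediction on layer-1..N overall aggregated data -> usage score.
        return self.prediction_layer(torch.cat(aggregates, dim=-1))
```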
In a possible implementation manner, in order to improve the prediction accuracy of the reply dialogue scoring model, the obtaining process of the i-th layer overall aggregated data is as follows:
generating the ith layer of overall aggregated data according to the training dialogue vectors, the dialogue importance and the ith-1 layer of overall aggregated data; wherein the layer 0 global aggregate data is generated from the training dialog vector.
In a possible implementation manner, in order to improve the prediction accuracy of the reply dialog scoring model, if the model training data includes the dialog contents of the 1 st round sample to the mth round sample in the dialog training samples, the obtaining process of the i-th layer overall aggregated data is as follows:
acquiring the attention distribution of the ith layer; the ith layer attention distribution is used for describing the attention degree of the ith aggregation layer to each round of sample conversation content in the model training data;
and performing aggregation processing on the training dialogue vectors according to the attention distribution of the ith layer and the dialogue importance degree to obtain the integral aggregation data of the ith layer.
In one possible implementation manner, in order to improve the prediction accuracy of the reply dialogue scoring model, if the training dialogue vectors include the 1st round sample dialogue vector to the M-th round sample dialogue vector and the candidate reply vector, the aggregating the training dialogue vectors according to the i-th layer attention distribution and the dialogue importance to obtain the i-th layer overall aggregated data includes:
generating ith layer conversation aggregation data of the jth round of sample conversation content according to the jth round of sample conversation vector and the word aggregation weight of the jth round of sample conversation content; wherein j is a positive integer, and j is less than or equal to M;
generating an ith layer conversation aggregation weight according to the ith layer attention distribution, the conversation importance and the ith layer conversation aggregation data of the 1 st round of sample conversation contents to the ith layer conversation aggregation data of the Mth round of sample conversation contents; the ith layer of dialogue aggregation weight is used for describing the importance degree of each round of sample dialogue content in the model training data in the ith layer of aggregation processing;
weighting and summing the ith layer of conversation aggregation data of the 1 st round of sample conversation content to the ith layer of conversation aggregation data of the Mth round of sample conversation content according to the ith layer of conversation aggregation weight to obtain ith layer of historical aggregation data;
and obtaining the ith layer of overall aggregated data according to the ith layer of historical aggregated data and the candidate reply vectors.
In a possible implementation manner, in order to improve the prediction accuracy of the reply dialog scoring model, the dialog importance includes information importance of sample dialog contents of a jth round, the ith layer attention distribution includes ith layer attention of sample dialog contents of a jth round, the ith layer dialog aggregation weight includes ith layer aggregation weight of the sample dialog contents of the jth round, j is a positive integer, j is less than or equal to M, and the ith layer aggregation weight of the sample dialog contents of the jth round is obtained by:
generating the i-th layer attention weight of the j-th round sample conversation content according to the i-th layer conversation aggregated data of the j-th round sample conversation content and the (i-1)-th layer overall aggregated data;
and determining the i-th layer aggregation weight of the j-th round sample conversation content according to the product of the i-th layer attention weight of the j-th round sample conversation content, the information importance of the j-th round sample conversation content and the i-th layer attention of the j-th round sample conversation content.
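One i-th layer aggregation step, combining the word-weighted round aggregation, the attention- and importance-weighted historical aggregation, and the candidate reply vector, might look like the sketch below; the bilinear attention scorer, the softmax normalization, and the final concatenation are assumptions for illustration, since the text specifies only which quantities enter each step.

```python
import torch

def layer_aggregate(sample_vectors, word_weights, attention_i,
                    importance, prev_overall, reply_vector, attn_proj):
    """Sketch of one i-th layer aggregation (all shapes illustrative).

    sample_vectors: (M, L, dim) token vectors of the M rounds.
    word_weights:   (M, L) word aggregation weights per round.
    attention_i:    (M,) i-th layer attention over the rounds.
    importance:     (M,) information importance per round.
    prev_overall:   (dim,) (i-1)-th layer overall aggregated data.
    reply_vector:   (dim,) candidate reply vector.
    attn_proj:      torch.nn.Bilinear(dim, dim, 1) scoring module.
    """
    # i-th layer dialogue aggregated data per round (word-weighted sum).
    round_data = (sample_vectors * word_weights.unsqueeze(-1)).sum(dim=1)
    # i-th layer attention weight from round data and (i-1)-th overall data.
    attn_weight = attn_proj(round_data,
                            prev_overall.expand_as(round_data)).squeeze(-1)
    # Aggregation weight: product of attention weight, importance, attention.
    agg_weight = torch.softmax(attn_weight * importance * attention_i, dim=0)
    # i-th layer historical aggregated data: weighted sum over the M rounds.
    history = (round_data * agg_weight.unsqueeze(-1)).sum(dim=0)
    # i-th layer overall aggregated data from history + candidate reply.
    return torch.cat([history, reply_vector], dim=-1)
```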
In a possible embodiment, in order to improve the prediction accuracy of the reply dialog scoring model, the model training unit 903 includes:
the model replication sub-unit is used for generating a pre-training model according to the reply dialogue scoring model;
the primary training subunit is used for training the pre-training model according to the model training data, the dialogue importance and the preset model comprehension, and determining the prediction loss value of the pre-training model as a model reference loss value when a first training stopping condition is reached;
and the retraining subunit is used for training the reply dialogue scoring model according to the model training data, the dialogue importance and the model reference loss value.
In a possible implementation manner, in order to improve the prediction accuracy of the reply dialog scoring model, the retraining subunit is specifically configured to:
training the reply dialogue scoring model according to the model training data, the dialogue importance and the preset model comprehension corresponding to the 1 st round of training process to obtain the 1 st round of updated reply dialogue scoring model and the model prediction loss value corresponding to the 1 st round of training process;
and training the k-th round updated reply dialogue scoring model according to the model training data, the dialogue importance and the model comprehension corresponding to the (k+1)-th round training process, to obtain the (k+1)-th round updated reply dialogue scoring model and the model prediction loss value corresponding to the (k+1)-th round training process; wherein the model comprehension corresponding to the (k+1)-th round training process is determined according to the model reference loss value and the model prediction loss value corresponding to the k-th round training process; k is a positive integer.
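The two-stage, comprehension-driven retraining could be organized roughly as follows; train_round and update_comprehension are hypothetical stand-ins for one training round and the formula-(8) comprehension update.

```python
def secondary_training(model, data, importance, ref_loss, rounds,
                       initial_comprehension, train_round,
                       update_comprehension):
    """Curriculum-style retraining sketched from the retraining subunit."""
    comprehension = initial_comprehension   # preset model comprehension
    for k in range(1, rounds + 1):
        # One training round returns the updated model and its loss.
        model, pred_loss = train_round(model, data, importance, comprehension)
        # Comprehension for round k+1 from the reference and round-k losses.
        comprehension = update_comprehension(pred_loss, ref_loss)
    return model
```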
In one possible embodiment, in order to improve the prediction accuracy of the reply dialogue scoring model, when the reply dialogue scoring model includes R prediction layers, the model prediction loss value corresponding to the k-th round training process includes the k-th round prediction loss value of the 1st prediction layer to the k-th round prediction loss value of the R-th prediction layer, the model reference loss value includes the reference loss value of the 1st prediction layer to the reference loss value of the R-th prediction layer, and the model comprehension corresponding to the (k+1)-th round training process includes the (k+1)-th round comprehension of the j-th sample dialogue content, where j is a positive integer, j ≤ M, and R is a positive integer; the (k+1)-th round comprehension of the j-th sample dialogue content is obtained by:
subtracting the reference loss value of the r-th prediction layer from the k-th round prediction loss value of the r-th prediction layer to obtain the k-th round prediction performance parameter of the r-th prediction layer; wherein r is a positive integer, and r ≤ R;
adding the k-th round prediction performance parameters of the 1st prediction layer to the R-th prediction layer to obtain the k-th round prediction performance parameter of the reply dialogue scoring model;
determining the overall comprehension corresponding to the j-th sample dialogue content according to the k-th round prediction performance parameter of the reply dialogue scoring model and the time parameter of the j-th sample dialogue content;
determining the local comprehension corresponding to the j-th sample dialogue content according to the k-th round prediction performance parameter of the reply dialogue scoring model and the k-th round prediction performance parameter of the attention layer of the j-th sample dialogue content;
and determining the (k+1)-th round comprehension of the j-th sample dialogue content according to the overall comprehension of the j-th sample dialogue content and the local comprehension of the j-th sample dialogue content.
Based on the dialog reply method provided by the above method embodiment, an embodiment of the present application further provides a dialog reply device, which is explained below with reference to the accompanying drawings.
Device embodiment II
This device embodiment introduces a dialog reply device; for related content, please refer to the method embodiments above.
Referring to fig. 10, the drawing is a schematic structural diagram of a dialog reply device according to an embodiment of the present application.
The dialog reply device 1000 provided in the embodiment of the present application includes:
a dialog obtaining unit 1001, configured to obtain historical conversation content corresponding to a target user;
a reply generation unit 1002, configured to generate a candidate reply dialog corresponding to the target user according to the historical conversation content corresponding to the target user;
a probability prediction unit 1003, configured to input the historical conversation content corresponding to the target user and the candidate reply dialog corresponding to the target user into a reply dialog scoring model, and obtain the usage score of the candidate reply dialog output by the reply dialog scoring model;
a reply determining unit 1004, configured to determine, according to the usage score of the candidate reply dialog, a target reply dialog corresponding to the target user.
In a possible implementation manner, the reply dialog scoring model is trained by any implementation manner of the reply dialog scoring model training method provided in the embodiments of the present application.
Further, an embodiment of the present application further provides a reply dialog scoring model training device, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is configured to store one or more programs, the one or more programs including instructions, which when executed by the processor, cause the processor to perform any of the implementations of the reply dialog scoring model training method described above.
Further, an embodiment of the present application further provides a dialog reply device, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is used for storing one or more programs, and the one or more programs comprise instructions which, when executed by the processor, cause the processor to execute any one of the implementation methods of the dialog reply method described above.
Further, an embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed on a terminal device, the terminal device is caused to execute any implementation method of the above reply dialogue scoring model training method, or execute any implementation method of the above dialogue reply method.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
It should be noted that, in the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (17)

1. A method for training a reply dialog scoring model, the method comprising:
obtaining a dialogue training sample; wherein the dialogue training samples comprise M +1 round sample dialogue contents; m is a positive integer;
generating model training data and a dialogue importance degree according to the dialogue training sample; the dialogue importance is used for describing the information importance of the sample dialogue contents in the dialogue training sample;
and training a reply dialogue scoring model according to the model training data and the dialogue importance.
2. The method of claim 1, wherein generating model training data from the conversational training samples comprises:
generating reply reference content according to the 1 st round sample conversation content to the M round sample conversation content in the conversation training samples;
acquiring candidate reply contents corresponding to the reply reference contents;
and generating model training data according to the reply reference content and the candidate reply content corresponding to the reply reference content.
3. The method of claim 2, wherein the candidate reply content comprises positive reply content and/or negative reply content;
the obtaining of the candidate reply content corresponding to the reply reference content includes:
generating positive example reply content corresponding to the reply reference content according to the (M+1)-th round of sample conversation content in the conversation training sample;
and/or,
and generating negative example reply content corresponding to the reply reference content according to a preset dialogue corpus.
4. The method of claim 1, wherein when the model training data includes tth sample dialog contents in the dialog training samples, and the dialog importance includes information importance of the tth sample dialog contents, and t is a positive integer, and t ≦ M, the generating a dialog importance according to the dialog training samples comprises:
and determining the information importance of the t-th round sample conversation contents in the conversation training samples according to the unique information of the t-th round sample conversation contents in the conversation training samples.
5. The method of claim 4, wherein the determining the information importance of the sample dialog contents of the t-th round in the dialog training samples according to the unique information of the sample dialog contents of the t-th round in the dialog training samples comprises:
determining the prediction content of the t-th round of dialogue according to the contents of the t + 1-th round of sample dialogue to the M-th round of sample dialogue in the dialogue training samples; the unique information of the sample conversation content of the tth round is an information difference value of the sample conversation content of the tth round and the predicted conversation content of the tth round;
inputting the t +1 th round sample conversation content to the M th round sample conversation content in the conversation training samples and the t th round conversation prediction content into a pre-constructed forward generation type conversation model to obtain the pseudo generation probability corresponding to the M +1 th round sample conversation content in the conversation training samples;
inputting the t-th to M-th sample conversation contents in the conversation training samples into a pre-constructed forward generating conversation model to obtain true generating probabilities corresponding to the M + 1-th sample conversation contents in the conversation training samples;
and determining the information importance of the dialogue contents of the sample of the t round in the dialogue training sample according to the pseudo generation probability corresponding to the dialogue contents of the sample of the M +1 round in the dialogue training sample and the true generation probability corresponding to the dialogue contents of the sample of the M +1 round in the dialogue training sample.
6. The method of claim 5, wherein the determining the prediction content of the t-th round of dialogue according to the contents of the (t+1)-th round of sample dialogue to the M-th round of sample dialogue in the dialogue training samples comprises:
inputting the t +1 th round sample conversation content to the Mth round sample conversation content in the conversation training samples into a pre-constructed reverse generative conversation model to obtain the t-th round conversation prediction content output by the reverse generative conversation model.
7. The method of claim 1, wherein if the model training data includes candidate reply content, the training a reply dialog scoring model based on the model training data and the dialog importance comprises:
obtaining a predicted usage score of the candidate reply content according to the model training data, the dialogue importance and the reply dialogue scoring model;
and updating the reply dialogue scoring model according to the predicted usage score of the candidate reply content and the actual usage score of the candidate reply content, and continuously executing the step of obtaining the predicted usage score of the candidate reply content according to the model training data, the dialogue importance and the reply dialogue scoring model until a preset stop condition is reached.
8. The method of claim 7, wherein the reply dialog scoring model includes an input layer, a vector layer, N aggregation layers, and a prediction layer, and N is a positive integer, and wherein deriving the predicted usage score for the candidate reply content based on the model training data, the dialog importance, and the reply dialog scoring model comprises:
inputting the model training data to the reply dialog scoring model using the input layer;
vectorizing the model training data by using the vector layer to obtain a training dialogue vector;
performing aggregation processing on the training dialogue vectors by using the ith aggregation layer and the dialogue importance degree to obtain ith layer overall aggregation data; wherein i is a positive integer, and i is not more than N;
and performing prediction processing on the overall aggregation data from the layer 1 to the layer N by using the prediction layer to obtain a predicted use score of the candidate reply content.
9. The method according to claim 8, wherein the obtaining process of the ith layer overall aggregated data is as follows:
generating the ith layer of overall aggregated data according to the training dialogue vectors, the dialogue importance and the ith-1 layer of overall aggregated data; wherein the layer 0 global aggregate data is generated from the training dialog vector.
10. The method of claim 8, wherein if the model training data includes 1 st to M th sample session contents in the session training samples, the obtaining process of the i-th layer overall aggregated data is:
acquiring the attention distribution of the ith layer; the ith layer attention distribution is used for describing the attention degree of the ith aggregation layer to each round of sample conversation content in the model training data;
and performing aggregation processing on the training dialogue vectors according to the attention distribution of the ith layer and the dialogue importance degree to obtain the integral aggregation data of the ith layer.
11. The method of claim 10, wherein if the training dialogue vectors include 1 st to M th sample dialogue vectors and candidate reply vectors, the aggregating the training dialogue vectors according to the i-th layer attention distribution and the dialogue importance to obtain i-th layer overall aggregated data comprises:
generating ith layer conversation aggregation data of the jth round of sample conversation content according to the jth round of sample conversation vector and the word aggregation weight of the jth round of sample conversation content; wherein j is a positive integer, and j is less than or equal to M;
generating an ith layer conversation aggregation weight according to the ith layer attention distribution, the conversation importance and the ith layer conversation aggregation data of the 1 st round of sample conversation contents to the ith layer conversation aggregation data of the Mth round of sample conversation contents; the ith layer of dialogue aggregation weight is used for describing the importance degree of each round of sample dialogue content in the model training data in the ith layer of aggregation processing;
weighting and summing the ith layer of conversation aggregation data of the 1 st round of sample conversation content to the ith layer of conversation aggregation data of the Mth round of sample conversation content according to the ith layer of conversation aggregation weight to obtain ith layer of historical aggregation data;
and obtaining the ith layer of overall aggregated data according to the ith layer of historical aggregated data and the candidate reply vectors.
12. The method of claim 11, wherein the conversation importance includes information importance of sample conversation content of a jth round, the ith layer attention distribution includes ith layer attention of sample conversation content of the jth round, the ith layer conversation aggregation weight includes ith layer aggregation weight of sample conversation content of the jth round, j is a positive integer, j ≦ M, and the ith layer aggregation weight of sample conversation content of the jth round is obtained by:
generating an ith layer attention weight of the jth sample conversation content according to the ith layer conversation aggregate data and the (i-1) th layer integral aggregate data of the jth sample conversation content;
and determining the ith layer aggregation weight of the jth round sample conversation content according to the product of the ith layer attention weight of the jth round sample conversation content, the information importance of the jth round sample conversation content and the ith layer attention of the jth round sample conversation content.
13. The method of claim 1, wherein training a reply dialog scoring model based on the model training data and the dialog importance comprises:
generating a pre-training model according to the reply dialogue scoring model;
training the pre-training model according to the model training data, the dialogue importance and the preset model comprehension, and determining the predicted loss value of the pre-training model as a model reference loss value when a first training stopping condition is reached;
and training the reply dialogue scoring model according to the model training data, the dialogue importance and the model reference loss value.
14. A dialog reply method, characterized in that the method comprises:
acquiring historical conversation content corresponding to a target user;
generating a candidate reply dialog corresponding to the target user according to the historical dialog content corresponding to the target user;
inputting the historical conversation content corresponding to the target user and the candidate reply conversation corresponding to the target user into a reply conversation scoring model to obtain the use score of the candidate reply conversation output by the reply conversation scoring model;
and determining the target reply dialog corresponding to the target user according to the use score of the candidate reply dialog.
15. A reply dialog scoring model training device, the device comprising: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is to store one or more programs, the one or more programs including instructions, which when executed by the processor, cause the processor to perform the reply dialog scoring model training method of any of claims 1 to 13.
16. A conversation reply device, characterized in that the device comprises: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is to store one or more programs, the one or more programs including instructions, which when executed by the processor, cause the processor to perform the conversation reply method of claim 14.
17. A computer-readable storage medium having stored therein instructions that, when run on a terminal device, cause the terminal device to perform the reply dialog scoring model training method of any of claims 1 to 13, or to perform the dialog reply method of claim 14.
CN202011224129.2A 2020-11-05 2020-11-05 Reply dialogue scoring model training method, dialogue reply method and device Pending CN112214592A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011224129.2A CN112214592A (en) 2020-11-05 2020-11-05 Reply dialogue scoring model training method, dialogue reply method and device

Publications (1)

Publication Number Publication Date
CN112214592A true CN112214592A (en) 2021-01-12

Family

ID=74058336



Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572998A (en) * 2015-01-07 2015-04-29 北京云知声信息技术有限公司 Updating method and device of question answer sequencing model for automatic question answer system
WO2017101506A1 (en) * 2015-12-14 2017-06-22 乐视控股(北京)有限公司 Information processing method and device
CN109753568A (en) * 2018-12-27 2019-05-14 联想(北京)有限公司 A kind of processing method and electronic equipment
CN110188331A (en) * 2019-06-03 2019-08-30 腾讯科技(深圳)有限公司 Model training method, conversational system evaluation method, device, equipment and storage medium
CN110647617A (en) * 2019-09-29 2020-01-03 百度在线网络技术(北京)有限公司 Training sample construction method of dialogue guide model and model generation method
CN110765249A (en) * 2019-10-21 2020-02-07 支付宝(杭州)信息技术有限公司 Quality inspection method and device for multiple rounds of conversations in robot customer service guide conversation
CN111143535A (en) * 2019-12-27 2020-05-12 北京百度网讯科技有限公司 Method and apparatus for generating a dialogue model
CN111177359A (en) * 2020-04-10 2020-05-19 支付宝(杭州)信息技术有限公司 Multi-turn dialogue method and device
CN111695084A (en) * 2020-04-26 2020-09-22 北京奇艺世纪科技有限公司 Model generation method, credit score generation method, device, equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114969306A (en) * 2022-05-31 2022-08-30 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of medical information recommendation model
CN114999024A (en) * 2022-05-31 2022-09-02 合众新能源汽车有限公司 Method and device for collecting feedback information of vehicle user
CN114999024B (en) * 2022-05-31 2023-12-19 合众新能源汽车股份有限公司 Method and device for collecting feedback information of vehicle user
CN114969306B (en) * 2022-05-31 2024-04-05 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of medical information recommendation model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 311-2, 3 / F, building 5, east yard, No. 10, northwest Wangdong Road, Haidian District, Beijing 100094

Applicant after: iFLYTEK (Beijing) Co.,Ltd.

Address before: Room 311-2, 3 / F, building 5, east yard, No. 10, northwest Wangdong Road, Haidian District, Beijing 100094

Applicant before: Zhongke Xunfei Internet (Beijing) Information Technology Co.,Ltd.