CN115495568B - Training method and device for dialogue model, dialogue response method and device


Info

Publication number: CN115495568B (earlier publication: CN115495568A)
Application number: CN202211441290.4A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 刘红丽 (Liu Hongli), 李峰 (Li Feng)
Current and original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Legal status: Active (application granted)
Related PCT application: PCT/CN2023/086071 (WO2024103609A1)
Prior art keywords: model, professional, score, dialogue, training

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation


Abstract

The invention discloses a training method for a dialogue model, comprising the following steps: training an original dialogue model with a universal dialogue data set to obtain a universal dialogue model; acquiring a preset professional keyword group and screening the universal dialogue data set according to the professional keyword group; training the universal dialogue model with the screened initial labeled data set to obtain an initial professional dialogue model; verifying the initial professional dialogue model with a verification data set and a preset natural language processing evaluation index to obtain a verification score; judging whether the verification score is greater than a preset score threshold; and if so, determining the initial professional dialogue model as the target professional dialogue model. The method enables the trained target professional dialogue model to combine generality with professional expertise, improving the user experience. The invention also discloses a training device for the dialogue model, a dialogue response method and device, an electronic device and a computer-readable storage medium, which have corresponding technical effects.

Description

Training method and device for dialogue model, dialogue response method and device
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a method and apparatus for training a dialogue model, a method and apparatus for responding to a dialogue, an electronic device, and a computer readable storage medium.
Background
Human-machine dialogue, a fundamental application of natural language processing (NLP), has long received great attention from both academia and industry. With the development of artificial intelligence technology, generation-based dialogue models are becoming more popular; trained specifically on dialogue data, they achieve very good performance in open-domain dialogue. However, training a large dialogue model from scratch requires a large amount of multi-type dialogue data as a training corpus, which entails a relatively high cost and a long training time.
A professional human-machine dialogue system must often meet several kinds of chat needs, including casual chitchat, general-knowledge question answering, and professional question answering. For example, a medical robot not only answers professional medical questions while chatting with a patient, but also handles common-sense questions about daily life and engages in small talk to soothe the patient's emotions. The mainstream principle of current professional dialogue models is semantic matching, i.e., finding answers to users' questions in a knowledge base. Although this technology is mature, it depends too heavily on the corpus: its knowledge is one-sided, its replies are repetitive and stiff, it lacks generality and diversity, and the user experience is poor.
In summary, effectively solving the problems of conventional dialogue response methods, such as repetitive and stiff replies, lack of generality and diversity, and poor user experience, is an urgent problem for those skilled in the art.
Disclosure of Invention
The invention aims to provide a training method for a dialogue model that enables the trained target professional dialogue model to combine generality with professional expertise and improves the user experience; another object of the present invention is to provide a training apparatus for a dialogue model, a dialogue response method and apparatus, an electronic device, and a computer-readable storage medium.
In order to solve the technical problems, the invention provides the following technical scheme:
a method of training a dialog model, comprising:
training an original dialogue model by utilizing a pre-acquired universal dialogue data set to obtain a universal dialogue model;
acquiring a preset professional keyword group, carrying out data screening on the universal dialogue data set according to the professional keyword group, and determining the screened data set as an initial labeling data set;
training the universal dialogue model by using the initial annotation data set to obtain an initial professional dialogue model;
Performing verification operation on the initial professional dialogue model by using a verification data set and a preset natural language processing evaluation index to obtain a verification score;
judging whether the verification score is larger than a preset score threshold value or not;
if yes, the initial professional dialog model is determined to be a target professional dialog model.
In one embodiment of the present invention, when it is determined that the verification score is equal to or less than the preset score threshold, the method further includes:
generating corresponding response data for each sample data in a preset unlabeled pool by using the initial professional dialogue model;
respectively calculating an automatic evaluation score corresponding to each response data;
sorting the automatic evaluation scores by magnitude, and selecting a preset number of automatic evaluation scores from the lower end;
outputting labeling prompt information for labeling the response data corresponding to the selected automatic evaluation scores;
updating the initial annotation data set according to the annotation result to obtain an updated annotation data set;
training the initial professional dialog model based on the updated labeling data set to obtain an updated professional dialog model;
and performing verification operation on the updated professional dialogue model by using the verification data set to obtain a verification score, and repeatedly executing the step of judging whether the verification score is larger than a preset score threshold.
In one embodiment of the present invention, after obtaining the updated annotation data set, the method further comprises:
and carrying out updating operation on the preset unmarked pool according to the updated marked data set.
In one embodiment of the present invention, verifying the initial professional dialogue model using a verification data set and a preset natural language processing evaluation index includes:

performing the verification operation on the initial professional dialogue model in combination with the verification data set and the BLEU, ROUGE, PPL and DISTINCT indexes through the following formula:

$$score_{val} = score_{BLEU} + score_{ROUGE} + score_{PPL^{-1}} + score_{DISTINCT}$$

wherein $score_{BLEU}$ is the score of the initial professional dialogue model on the BLEU index, $score_{ROUGE}$ is the score on the ROUGE index, $score_{PPL^{-1}}$ is the score on the PPL index taken in reciprocal form, $score_{DISTINCT}$ is the score on the DISTINCT index, and $score_{val}$ is the verification score.
In one embodiment of the present invention, the method further comprises calculating the score $score_{BLEU}$ of the initial professional dialogue model on the BLEU index, the calculation process comprising:

calculating the score of the initial professional dialogue model on the BLEU index through the following formula:

$$score_{BLEU} = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right), \qquad BP = \begin{cases} 1, & l_c > l_s \\ e^{1 - l_s/l_c}, & l_c \le l_s \end{cases}$$

wherein $l_c$ is the length of the machine translation, $l_s$ is the length of the shortest reference translation, $p_n$ is the n-gram precision, $w_n$ is the n-gram weight with $\sum_{n=1}^{N} w_n = 1$, and $BP$ is a penalty factor.
In one embodiment of the present invention, the method further comprises calculating the score $score_{ROUGE}$ of the initial professional dialogue model on the ROUGE index, the calculation process comprising:

calculating the score of the initial professional dialogue model on the ROUGE index through the following formula:

$$score_{ROUGE\text{-}N} = \frac{\sum_{S \in \{\text{reference translations}\}} \sum_{gram_N \in S} Count_{match}(gram_N)}{\sum_{S \in \{\text{reference translations}\}} \sum_{gram_N \in S} Count(gram_N)}$$

wherein $\{\text{reference translations}\}$ represents the set of reference translations, $gram_N$ represents a combination of $N$ words, and $Count_{match}(gram_N)$ represents the number of N-grams co-occurring in the machine translation; the denominator counts the number of N-grams in all reference translations, and the numerator counts the number of N-grams shared by the reference translations and the machine translation.
In one embodiment of the present invention, the method further comprises calculating the score $score_{PPL}$ of the initial professional dialogue model on the PPL index, calculated as follows:

$$score_{PPL} = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{p(w_i \mid w_1, \dots, w_{i-1})}}$$

wherein $p(w_i \mid w_1, \dots, w_{i-1})$ represents the probability of predicting the $i$-th word from the preceding words, and $N$ represents the sentence length.
In one embodiment of the present invention, the method further comprises calculating the score $score_{DISTINCT}$ of the initial professional dialogue model on the DISTINCT index, the calculation process comprising:

calculating the score of the initial professional dialogue model on the DISTINCT index through the following formula:

$$score_{DISTINCT} = \frac{Count(\text{unique } ngram)}{Count(ngram)}$$

wherein $Count(\text{unique } ngram)$ represents the number of distinct n-grams in the reply, and $Count(ngram)$ represents the total number of n-grams in the reply.
In one embodiment of the present invention, before training the original dialogue model using the pre-acquired generic dialogue data set, further comprises:
and respectively filtering question and answer data and boring data in the general dialogue data set.
In one embodiment of the present invention, training an original session model using a pre-acquired generic session data set to obtain a generic session model includes:
inputting the universal dialogue data set into the original dialogue model to perform model iterative training;
obtaining a current iteration number and a loss standard deviation obtained by the current iteration training;
determining whether a model training cut-off condition is reached according to the current iteration number and the loss standard deviation;
If yes, determining the dialogue model obtained by the round of iterative training as the universal dialogue model.
In one embodiment of the present invention, determining whether a model training cutoff condition is reached based on the current iteration number and the loss standard deviation includes:
and judging whether the current iteration number is larger than a first preset value and the loss standard deviation is smaller than a second preset value.
In one embodiment of the present invention, when it is determined that the current iteration number is greater than the first preset value and the loss standard deviation is greater than or equal to the second preset value, the method further includes:
judging whether the current iteration number is larger than a third preset value or not; wherein the third preset value is greater than the first preset value;
if yes, executing the step of determining the dialogue model obtained by the round of iterative training as the universal dialogue model;
if not, inputting the universal dialogue data set into the dialogue model obtained by the iterative training of the present round to carry out the iterative training of the model, and repeatedly executing the steps of obtaining the current iteration number and the loss standard deviation obtained by the iterative training of the present round.
In one embodiment of the present invention, the data filtering the generic dialogue data set according to the professional keyword group includes:
And carrying out data screening on the universal dialogue data set according to the professional keyword group by using a DFA algorithm.
A dialog response method for use in a dialog system including a target professional dialog model as trained previously, comprising:
receiving target question voice to be responded;
generating target response voice corresponding to the target questioning voice by utilizing a target professional dialogue model obtained based on training a general dialogue model;
and outputting the target response voice.
In one embodiment of the present invention, the method further comprises:
searching a related answer from a database based on a preset search algorithm when the target professional dialogue model fails to respond to the target questioning voice;
and outputting the voice of the related answer.
A training device for a dialog model, comprising:
the universal dialogue model obtaining module is used for training the original dialogue model by utilizing the pre-obtained universal dialogue data set to obtain a universal dialogue model;
the initial labeling data set determining module is used for acquiring a preset professional keyword group, carrying out data screening on the universal dialogue data set according to the professional keyword group, and determining the screened data set as an initial labeling data set;
The initial professional dialog model obtaining module is used for training the general dialog model by utilizing the initial labeling data set to obtain an initial professional dialog model;
the verification score obtaining module is used for carrying out verification operation on the initial professional dialogue model by utilizing a verification data set and a preset natural language processing evaluation index to obtain a verification score;
the judging module is used for judging whether the verification score is larger than a preset score threshold value or not;
and the target professional dialog model determining module is used for determining the initial professional dialog model as a target professional dialog model when the verification score is larger than a preset score threshold.
A dialog response device comprising:
the questioning voice receiving module is used for receiving target questioning voice to be responded;
the response voice generation module is used for generating target response voice corresponding to the target question voice by utilizing a target professional dialogue model obtained based on training of the universal dialogue model;
and the response voice output module is used for outputting the target response voice.
An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the training method or the dialog response method of the dialog model as described above when executing the computer program.
A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of a training method or a dialog response method of a dialog model as described above.
According to the training method of the dialogue model, the original dialogue model is trained by utilizing the pre-acquired universal dialogue data set, and the universal dialogue model is obtained; acquiring a preset professional keyword group, carrying out data screening on a universal dialogue data set according to the professional keyword group, and determining the screened data set as an initial labeling data set; training the universal dialogue model by using the initial labeling data set to obtain an initial professional dialogue model; performing verification operation on the initial professional dialogue model by using the verification data set and a preset natural language processing evaluation index to obtain a verification score; judging whether the verification score is larger than a preset score threshold value or not; if yes, the initial professional dialog model is determined to be the target professional dialog model.
According to the technical scheme, the target professional dialogue model applied to a specific dialogue scene is obtained by training on the basis of a pre-trained universal dialogue model, which greatly reduces the requirements on data volume and computing power, gives the trained target professional dialogue model both generality and professional expertise, and improves the user experience.
Correspondingly, the invention also provides a training device for the dialogue model, a dialogue response method and device, an electronic device and a computer-readable storage medium corresponding to the above training method, which have the corresponding technical effects and are not repeated herein.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an implementation of a method for training a dialog model in an embodiment of the invention;
FIG. 2 is a flowchart of another implementation of a training method for a dialogue model according to an embodiment of the invention;
FIG. 3 is a flow chart illustrating an implementation of a dialogue response method according to an embodiment of the present invention;
FIG. 4 is a block diagram of a training device for a dialogue model according to an embodiment of the present invention;
FIG. 5 is a block diagram of a dialogue response apparatus according to an embodiment of the present invention;
FIG. 6 is a block diagram of an electronic device according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a specific structure of an electronic device according to this embodiment.
Detailed Description
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart showing an implementation of a method for training a dialogue model according to an embodiment of the invention, where the method may include the following steps:
s101: training the original dialogue model by utilizing the pre-acquired universal dialogue data set to obtain the universal dialogue model.
The general dialogue data sets are collected in advance from public data sets and can be divided into two categories: question answering and chitchat. The question-and-answer data may cover fields such as common sense, current affairs, mother-and-infant care, medical care, law, insurance, aviation, psychology, traditional Chinese medicine, and epidemic prevention. The chitchat data may include data sets such as microblog discussions, television drama dialogue, Tieba discussions, Douban comments, and e-commerce conversations, covering everyday topics such as history, movies, weather, entertainment, and sports.
Specific examples of building a generic dialog data set are as follows:
the entry interpretation type sample format is the title: "title", article: "text". Original corpus example { "id": "0", "url": https: /(xxx, "title": "economics", "text": "economics is a social science … …" that studies the production, distribution and consumption of products and services, after composition in the pro format: title: "economics", article: "economics is a social science … …" of research into the production, distribution and consumption of products and services.
Question-and-answer prompt format: ask: "title + desc" answer: "answer". Original corpus example: {"qid": 0, "title": "AlphaGo can only play Go; could AlphaGo write novels?", "desc": "Is there currently an intelligent robot that can engage in literary creation, and if so, what level of work can it write?", "answer": "AlphaGo can only play Go, because its design purpose, architecture, technical scheme and training data are all built around the core of playing Go ……"}. After composition in the prompt format: ask: "AlphaGo can only play Go; could AlphaGo write novels? Is there currently an intelligent robot that can engage in literary creation, and if so, what level of work can it write?" answer: "AlphaGo can only play Go, because its design purpose, architecture, technical scheme and training data are all built around the core of playing Go ……".
Reading-comprehension prompt format: context: "context" question: "query" answer: "answer". Original corpus example: {"id": "0", "context": "Treatment of cholelithiasis should be handled case by case; asymptomatic gallstones may be left untreated, but regular observation and good eating habits are required ……", "query": "What type of gallstone may be left untreated?", "answer": "asymptomatic gallstones"}. After composition in the prompt format: context: "Treatment of cholelithiasis should be handled case by case; asymptomatic gallstones may be left untreated, but regular observation and good eating habits are required ……" question: "What type of gallstone may be left untreated?" answer: "asymptomatic gallstones".
Single-round or multi-round dialogue prompt format: dialogue: "dialog1", "dialog2", "dialog3" ……. After composition in the prompt format: dialogue: "Why aren't you livestreaming? I can't see you.", "Not streaming today.", "Please, I really like you." ……
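For illustration, composing raw corpus records into the four prompt formats above can be sketched as follows; the record field names and the helper function are assumptions for this sketch, not the patent's code.

```python
# Minimal sketch (assumed field names) of composing raw corpus records
# into the prompt formats described above.

def compose_prompt(record: dict, category: str) -> str:
    """Compose one raw corpus record into its category's prompt format."""
    if category == "entry":          # entry-interpretation samples
        return f'title: "{record["title"]}", article: "{record["text"]}"'
    if category == "qa":             # question-and-answer samples
        return f'ask: "{record["title"]}{record["desc"]}" answer: "{record["answer"]}"'
    if category == "reading":        # reading-comprehension samples
        return (f'context: "{record["context"]}" '
                f'question: "{record["query"]}" answer: "{record["answer"]}"')
    if category == "dialog":         # single- or multi-round dialogue
        turns = ", ".join(f'"{t}"' for t in record["turns"])
        return f"dialogue: {turns}"
    raise ValueError(f"unknown category: {category}")

# Example usage with a Q&A corpus record like the one shown above:
sample = {"title": "AlphaGo can only play Go; could AlphaGo write novels?",
          "desc": "", "answer": "AlphaGo can only play Go, because ..."}
print(compose_prompt(sample, "qa"))
```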
Training the original dialogue model by utilizing the pre-acquired universal dialogue data set to obtain the universal dialogue model.
S102: and acquiring a preset professional keyword group, carrying out data screening on the universal dialogue data set according to the professional keyword group, and determining the screened data set as an initial labeling data set.
Professional dialogue data sets are generally labeled by experts. Although the amount of data required is much smaller than for a universal dialogue data set, labeling by experts alone is time-consuming and labor-intensive, so a professional keyword group is preset. After the original dialogue model is trained with the universal dialogue data set to obtain the universal dialogue model, the preset professional keyword group is acquired, the universal dialogue data set is screened according to the professional keyword group, and the screened data set is determined as the initial labeled data set. Screening the initial labeled data set out of the universal dialogue data set with the professional keyword group greatly improves the generation efficiency of the professional dialogue data set compared with purely manual labeling.
In a specific embodiment of the present invention, the data filtering of the generic dialogue data set according to the professional keyword group may include the following steps:
and carrying out data screening on the universal dialogue data set according to the professional keyword group by using a DFA algorithm.
When the professional dialogue data set is screened from the general dialogue data set, the DFA (deterministic finite automaton) algorithm performs the data screening according to the professional keyword group. This makes full use of the DFA algorithm's strengths in sensitive-word filtering while achieving efficient keyword matching.
The embodiment of the invention adopts the DFA algorithm to realize keyword matching, and the process of screening professional dialogue data from the universal dialogue data set can comprise the following steps:
(1) Providing professional key word groups by experts;
(2) Constructing a professional word chain table (ending with a specific character '\x00') by establishing a nested dictionary by using the professional keyword group;
(3) Traversing each group of dialogs in the universal dialog data set, using the dialogs as input traversing the professional word linked list, and if encountering a specific character \x00, describing that the group of dialogs contains professional keywords, and screening out.
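A minimal sketch of the DFA-style screening described in steps (1) to (3), with the nested-dictionary trie terminated by '\x00'; the keyword examples are illustrative.

```python
# Minimal sketch of the DFA-style keyword matcher described above.
# The terminator '\x00' marks the end of a keyword in the nested dictionary.

END = "\x00"

def build_trie(keywords):
    """Build a nested-dictionary trie from the professional keyword group."""
    trie = {}
    for word in keywords:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True  # keyword terminator
    return trie

def contains_keyword(text, trie):
    """Return True if any professional keyword occurs in the text."""
    for start in range(len(text)):
        node = trie
        for ch in text[start:]:
            if ch not in node:
                break
            node = node[ch]
            if END in node:
                return True
    return False

# Screen the general dialogue data set with the trie (illustrative keywords).
trie = build_trie(["内存", "硬盘", "status light"])
professional_set = [d for d in ["the status light is red", "nice weather today"]
                    if contains_keyword(d, trie)]
```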
Although part of the professional dialogue data can be screened out through keyword matching, the professional dialogues contained in the general dialogue data set are limited, especially for niche professions, so expert labeling is still needed. If the expert-labeled data involve privacy, desensitization (hiding private information in the dialogue such as names, mobile phone numbers and email addresses) must be added. As with the construction of the general dialogue data set, the professional dialogue data set is composed in the prompt format of Table 1.
Specific examples of building a server professional dialog dataset are as follows:
if the intelligent customer service of the server belongs to multiple rounds of conversations, the conversation content is as follows: "you good, please ask what you can help you. The status light red is a prayer wheel related to the power supply, the status light which does not affect the normal running bar of the server and the status light is a general light, the machine is lightened when a problem exists, and 4 paths of electricity are recommended to be plugged in. The 'on site is not provided with 4 paths of power supply, and the status lamp is not lightened' is not provided with a method, and the power supply strategy is brushed into double power by using instructions. "
S103: and training the universal dialogue model by using the initial annotation data set to obtain an initial professional dialogue model.
After the initial labeled data set is obtained, it is used to train the universal dialogue model to obtain the initial professional dialogue model.
S104: and performing verification operation on the initial professional dialogue model by using the verification data set and a preset natural language processing evaluation index to obtain a verification score.
After the initial professional dialogue model is obtained by training, the verification data set and the preset natural language processing evaluation index are used to perform a verification operation on the initial professional dialogue model to obtain a verification score. The verification score predicts the response performance of the initial professional dialogue model to spoken questions.
S105: and judging whether the verification score is larger than a preset score threshold, if so, executing the step S106, and if not, continuing training the initial professional dialogue model.
And presetting a score threshold, judging whether the verification score is larger than the preset score threshold after the verification operation is carried out on the initial professional dialogue model by using the verification data set and the preset natural language processing evaluation index, if so, indicating that the model is trained well, executing step S106, and if not, indicating that the initial professional dialogue model needs to be trained continuously.
S106: the initial specialized dialog model is determined as the target specialized dialog model.
When the verification score is determined to be greater than the preset score threshold, it indicates that the model has been trained well, and the initial professional dialogue model is determined as the target professional dialogue model. The target professional dialogue model and the current complete expert-labeled data set may also be output. Judging whether the professional dialogue model has finished training against a preset score threshold ensures that the trained target professional dialogue model has good answer-generation capability for spoken questions.
According to the technical scheme, the target professional dialogue model applied to a specific dialogue scene is obtained by training on the basis of a pre-trained universal dialogue model, which greatly reduces the requirements on data volume and computing power, gives the trained target professional dialogue model both generality and professional expertise, and improves the user experience.
Referring to fig. 2, fig. 2 is a flowchart of another implementation of a training method for a dialogue model according to an embodiment of the invention, where the method may include the following steps:
s201: training the original dialogue model by utilizing the pre-acquired universal dialogue data set to obtain the universal dialogue model.
In a specific embodiment of the present invention, before step S201, the training method of the session model may further include the steps of:
And filtering the question-and-answer data and the chitchat data in the general dialogue data set respectively.

After the general dialogue data set is acquired, the question-and-answer data and the chitchat data in it are filtered separately. Because the overall noise of the question-and-answer data set is relatively low, simple filtering suffices, including removing dialogues containing sensitive words, removing dialogues shorter than a preset number of words, removing dialogues whose question is identical to the answer, and removing meaningless characters from the corpus. Because the chitchat data set is relatively noisy as a whole, strict filtering is required, including removing dialogues containing sensitive words, removing dialogues shorter than a preset number of words, removing dialogues of only one sentence, removing dialogues containing no Chinese characters, removing advertisement dialogues, de-duplicating repeated dialogues, and removing meaningless characters from the corpus. Training the original dialogue model with the filtered general dialogue data set avoids interference from useless data, reduces model-training complexity, improves model-training efficiency, and improves the accuracy of the trained model.
In order to make the training effect better, the data sets can also be respectively formed according to different categories and a certain prompt format as follows:
TABLE 1

Category                        Prompt format
Entry interpretation            title: "title", article: "text"
Question and answer             ask: "title + desc" answer: "answer"
Reading comprehension           context: "context" question: "query" answer: "answer"
Single-/multi-round dialogue    dialogue: "dialog1", "dialog2", "dialog3" ……
Using a fixed prompt format reduces the subsequent processing effort.
In one embodiment of the present invention, step S201 may include the steps of:
step one: inputting the universal dialogue data set into an original dialogue model to carry out model iterative training;
step two: obtaining a current iteration number and a loss standard deviation obtained by the current iteration training;
step three: determining whether the model training cut-off condition is reached according to the current iteration number and the loss standard deviation, if so, executing the fourth step, and if not, executing the fifth step;
step four: determining a dialogue model obtained by the iterative training of the round as a general dialogue model;
step five: and (3) inputting the universal dialogue data set into a dialogue model obtained by the iterative training of the round of model iterative training, and returning to the execution step (II).
For convenience of description, the above five steps may be combined for explanation.
The process of training the original dialogue model with the universal dialogue data set to obtain the universal dialogue model may include: inputting the universal dialogue data set into the original dialogue model for model iterative training; obtaining the current iteration number and the loss standard deviation produced by the current round of iterative training; and determining whether the model-training cutoff condition is reached according to the current iteration number and the loss standard deviation. If so, the model obtained by the current training can already respond well to general questions, and the dialogue model obtained by this round of iterative training is determined as the universal dialogue model. If not, the model cannot yet respond well to general questions; the universal dialogue data set is input into the dialogue model obtained by this round of iterative training for further model iterative training, the current iteration number and loss standard deviation are obtained again, and the model is continuously optimized through multiple training iterations.
It should be noted that, the model training cut-off condition may be set and adjusted according to the actual situation, which is not limited in the embodiment of the present invention, and may be set as an upper limit of the iteration number, and may also be set as a loss threshold.
In one embodiment of the present invention, determining whether the model training cutoff condition is reached according to the current iteration number and the loss standard deviation may include the steps of:
step one: inputting the universal dialogue data set into an original dialogue model to carry out model iterative training;
step two: obtaining a current iteration number and a loss standard deviation obtained by the current iteration training;
step three: judging whether the current iteration number is larger than a first preset value and the loss standard deviation is smaller than a second preset value, if yes, executing the fourth step, and if not, executing the fifth step when the current iteration number is larger than the first preset value and the loss standard deviation is larger than or equal to the second preset value;
step four: determining a dialogue model obtained by the iterative training of the round as a general dialogue model;
step five: judging whether the current iteration number is larger than a third preset value, if so, returning to the execution step four, and if not, executing the step six;
wherein the third preset value is greater than the first preset value;
Step six: and (3) inputting the universal dialogue data set into a dialogue model obtained by the iterative training of the round of model iterative training, and returning to the execution step (II).
For convenience of description, the above six steps may be combined for explanation.
Hyperparameters are preset for model training and may include the iteration number $epoch$, the minimum number of pre-training iterations $epoch_{min}$ (i.e., the first preset value), the loss standard deviation $\sigma_{loss}$, and the standard-deviation threshold $\sigma_{threshold}$ (i.e., the second preset value), where $\sigma_{loss}$ denotes the standard deviation of the losses of the last ten iterations. After the current iteration number and the loss standard deviation of the current round of iterative training are obtained, it is judged whether the current iteration number is greater than the first preset value and the loss standard deviation is smaller than the second preset value, i.e., whether $epoch > epoch_{min}$ and $\sigma_{loss} < \sigma_{threshold}$, thereby determining whether the model-training cutoff condition has been reached. Judging the model-training stage by combining the current iteration number and the loss standard deviation ensures that a model meeting the training cutoff condition has been iterated a sufficient number of times, improving model performance.
The preset hyperparameters may also include the maximum number of pre-training iterations $epoch_{max}$ (i.e., the third preset value), which is greater than the first preset value, i.e., $epoch_{max} > epoch_{min}$. When the current iteration number is determined to be greater than the first preset value but the loss standard deviation is greater than or equal to the second preset value, it is judged whether the current iteration number is greater than the third preset value. If so, the loss is falling slowly and the model has been trained close to the global optimum, so the dialogue model obtained by this round of iterative training is determined as the universal dialogue model. If not, the universal dialogue data set is input into the dialogue model obtained by this round of iterative training to continue training, and the process repeats until the preset model-training cutoff condition is reached, thereby obtaining a universal dialogue model that responds well to general spoken questions. A sketch of this cutoff logic follows.
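A hedged sketch of the cutoff logic; the hyperparameter names epoch_min, epoch_max and sigma_threshold are notational assumptions standing in for the first, third and second preset values.

```python
import statistics

def reached_cutoff(epoch: int, recent_losses: list[float],
                   epoch_min: int, epoch_max: int,
                   sigma_threshold: float) -> bool:
    """Model-training cutoff check sketched from the description above.

    recent_losses: the losses of the last ten iterations.
    """
    sigma_loss = statistics.stdev(recent_losses)
    if epoch > epoch_min and sigma_loss < sigma_threshold:
        return True            # converged: loss is stable after enough epochs
    if epoch > epoch_max:
        return True            # hard cap: loss falls slowly, stop anyway
    return False               # keep iterating on the universal dialogue set
```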
S202: and acquiring a preset professional keyword group, carrying out data screening on the universal dialogue data set according to the professional keyword group, and determining the screened data set as an initial labeling data set.
S203: and training the universal dialogue model by using the initial annotation data set to obtain an initial professional dialogue model.
S204: and performing verification operation on the initial professional dialogue model by using the verification data set and a preset natural language processing evaluation index to obtain a verification score.
In one embodiment of the present invention, the verification operation of the initial professional dialog model using the verification data set and the preset natural language processing evaluation index may include the steps of:
the initial professional dialogue model is verified by combining the verification data set with the BLEU, ROUGE, PPL and DISTINCT indexes through the following formula:

$$score_{val} = score_{BLEU} + score_{ROUGE} + score_{PPL^{-1}} + score_{DISTINCT}$$

wherein $score_{BLEU}$ is the score of the initial professional dialogue model on the BLEU index, $score_{ROUGE}$ is its score on the ROUGE index, $score_{PPL^{-1}}$ is its score on the PPL index taken in reciprocal form (the smaller this reciprocal, the worse the model's generation), and $score_{DISTINCT}$ is its score on the DISTINCT index; $score_{val}$ is the verification score.
The performance of the model on the verification data set is thus comprehensively evaluated using the four indexes BLEU, ROUGE, PPL and DISTINCT, ensuring the fluency and diversity of the model's generation while guaranteeing the accuracy and recall of the generated answers.
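As a hedged illustration, the combination can be sketched as follows; the plain summation mirrors the formula reconstructed above, and the numeric values are purely illustrative.

```python
def verification_score(bleu: float, rouge: float,
                       ppl: float, distinct: float) -> float:
    """Combine the four index scores; PPL enters in reciprocal form so that
    lower perplexity (more fluent generation) raises the verification score."""
    return bleu + rouge + (1.0 / ppl) + distinct

# Illustrative scores computed on the verification data set:
score_val = verification_score(bleu=0.31, rouge=0.42, ppl=12.5, distinct=0.18)
print(score_val > 0.9)  # compare against the preset score threshold
```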
In one embodiment of the present invention, the training method of the dialogue model may further comprise calculating the score $score_{BLEU}$ of the initial professional dialogue model on the BLEU index, which may comprise the following steps:

calculating the score of the initial professional dialogue model on the BLEU index through the following formula:

$$score_{BLEU} = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right), \qquad BP = \begin{cases} 1, & l_c > l_s \\ e^{1 - l_s/l_c}, & l_c \le l_s \end{cases}$$
The core idea of BLEU is to compare the degree of overlap of the n-gram in the candidate translation with that in the reference translation, and the higher the degree of overlap, the higher the quality of the translation is considered. In practice, n=1 to 4 is usually taken, and then weighted average is performed.
wherein $l_c$ is the length of the machine translation, $l_s$ is the length of the shortest reference translation, $p_n$ is the n-gram precision, and $w_n$ is the n-gram weight, generally set uniformly, i.e., $w_n = 1/N$ with $\sum_{n=1}^{N} w_n = 1$. $BP$ is a penalty factor: if the length of the candidate translation is less than that of the shortest reference translation, $BP$ is less than 1. The 1-gram precision of BLEU indicates how faithful the translation is to the original, while the higher-order n-grams indicate how fluent the translation is.
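The BLEU computation above can be sketched in a minimal, self-contained form; this is an illustrative sentence-level implementation without smoothing, not the patent's code, and production evaluation would normally use an established library.

```python
import math
from collections import Counter

def bleu(candidate: list[str], references: list[list[str]], max_n: int = 4) -> float:
    """Sentence-level BLEU with uniform weights w_n = 1/N, as described above."""
    log_p_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(candidate[i:i + n])
                              for i in range(len(candidate) - n + 1))
        max_ref = Counter()  # per-n-gram maximum count over all references
        for ref in references:
            ref_ngrams = Counter(tuple(ref[i:i + n])
                                 for i in range(len(ref) - n + 1))
            for g, c in ref_ngrams.items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        p_n = clipped / total
        if p_n == 0:
            return 0.0  # no smoothing in this sketch
        log_p_sum += (1.0 / max_n) * math.log(p_n)
    l_c = len(candidate)
    l_s = min(len(r) for r in references)  # shortest reference length
    bp = 1.0 if l_c > l_s else math.exp(1 - l_s / max(l_c, 1))  # brevity penalty
    return bp * math.exp(log_p_sum)
```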
In one embodiment of the present invention, the training method of the dialogue model may further comprise calculating the score $score_{ROUGE}$ of the initial professional dialogue model on the ROUGE index, the calculation process comprising:

calculating the score of the initial professional dialogue model on the ROUGE index through the following formula:

$$score_{ROUGE\text{-}N} = \frac{\sum_{S \in \{\text{reference translations}\}} \sum_{gram_N \in S} Count_{match}(gram_N)}{\sum_{S \in \{\text{reference translations}\}} \sum_{gram_N \in S} Count(gram_N)}$$
ROUGE-N focuses on recall rather than precision: it measures how many N-grams of the reference translations appear in the output. "N" refers to the N-gram, and the calculation resembles BLEU, except that BLEU is precision-based while ROUGE is recall-based. In the formula above, $\{\text{reference translations}\}$ represents the set of reference translations, of which there may be several in practical applications; $gram_N$ represents a combination of $N$ words, and $Count_{match}(gram_N)$ represents the number of N-grams co-occurring in the machine translation. The denominator counts the number of N-grams in all reference translations, while the numerator counts the number of N-grams shared by the reference translations and the machine translation.
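A minimal sketch of the recall-based ROUGE-N computation described above; the per-reference clipping of matched counts is a common convention and an assumption here.

```python
from collections import Counter

def rouge_n(candidate: list[str], references: list[list[str]], n: int = 2) -> float:
    """Recall-oriented ROUGE-N over a set of reference translations (sketch)."""
    cand_ngrams = Counter(tuple(candidate[i:i + n])
                          for i in range(len(candidate) - n + 1))
    match, total = 0, 0
    for ref in references:
        ref_ngrams = Counter(tuple(ref[i:i + n])
                             for i in range(len(ref) - n + 1))
        total += sum(ref_ngrams.values())            # denominator: all ref N-grams
        match += sum(min(c, cand_ngrams[g])          # numerator: shared N-grams
                     for g, c in ref_ngrams.items())
    return match / total if total else 0.0
```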
In one embodiment of the present invention, the training method of the dialogue model may further comprise calculating the score $score_{PPL}$ of the initial professional dialogue model on the PPL index, calculated as follows:

$$score_{PPL} = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{p(w_i \mid w_1, \dots, w_{i-1})}}$$

wherein $p(w_i \mid w_1, \dots, w_{i-1})$ represents the probability of predicting the $i$-th word from the preceding words, and $N$ represents the sentence length.
PPL refers to the perplexity of a language model; perplexity is an indicator of how fluent a sentence is, defined as above. The smaller the PPL value, the more natural the reply generated by the model and the more fluent the sentence. Evaluating reply quality by PPL avoids cases where the model generates replies with disordered or inverted word order.
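A minimal sketch of the perplexity computation; obtaining the conditional probabilities p(w_i | w_1..w_{i-1}) from an actual language model is assumed and lies outside the snippet.

```python
import math

def perplexity(token_probs: list[float]) -> float:
    """PPL from the conditional probabilities p(w_i | w_1..w_{i-1}).

    Computed in log space for numerical stability; equivalent to the
    N-th root of the product of reciprocal probabilities.
    """
    n = len(token_probs)
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / n)

# Example: a 4-word reply with fairly high token probabilities.
print(perplexity([0.4, 0.5, 0.3, 0.6]))  # lower PPL = more fluent reply
```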
In one embodiment of the present invention, the method may further comprise calculating the score $score_{DISTINCT}$ of the initial professional dialogue model on the DISTINCT index, the calculation process comprising:

calculating the score of the initial professional dialogue model on the DISTINCT index through the following formula:

$$score_{DISTINCT} = \frac{Count(\text{unique } ngram)}{Count(ngram)}$$
The DISTINCT evaluation index judges the diversity of machine replies, i.e., whether a large number of generic and repeated replies occur. In the formula, $Count(\text{unique } ngram)$ represents the number of distinct n-grams in the reply, and $Count(ngram)$ represents the total number of n-grams in the reply. A larger $score_{DISTINCT}$ represents higher diversity in the generated replies.
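A minimal sketch of the DISTINCT (distinct-n) computation over a single reply; pooling n-grams over a batch of replies, as is also common, would be a straightforward extension.

```python
def distinct_n(reply: list[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams in a reply (distinct-n)."""
    ngrams = [tuple(reply[i:i + n]) for i in range(len(reply) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)

# A repetitive reply scores low; a varied one scores high.
print(distinct_n("i am fine i am fine".split()))              # low diversity
print(distinct_n("the server status light is red".split()))   # high diversity
```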
S205: judging whether the verification score is greater than a preset score threshold, if so, executing step S106, and if not, executing step S207.
S206: the initial specialized dialog model is determined as the target specialized dialog model.
S207: and generating corresponding response data for each sample data in the preset unlabeled pool by using the initial professional dialogue model.
When the verification score is determined to be less than or equal to the preset score threshold, the model needs further training, and the initial professional dialogue model is used to generate corresponding response data for each sample in the preset unlabeled pool.
S208: and respectively calculating the automatic evaluation scores corresponding to the response data.
After corresponding response data are generated with the initial professional dialogue model for each sample in the preset unlabeled pool, the automatic evaluation score corresponding to each piece of response data is calculated. For example, the automatic evaluation score can be calculated from the PPL index and the DISTINCT index through the following formula:

$$score_{auto} = score_{PPL^{-1}} + score_{DISTINCT}$$

thereby obtaining the automatic evaluation score corresponding to each piece of response data.
S209: and sorting the magnitude of each dynamic evaluation score, and selecting a preset number of automatic evaluation scores from the end with smaller score.
After the automatic evaluation scores corresponding to the response data are calculated, they are sorted by magnitude and a preset number of them are selected from the lower end, e.g., the lowest N scores.
S210: and outputting labeling prompt information for labeling the response data corresponding to the selected dynamic evaluation scores.
After the preset number of automatic evaluation scores are selected from the lower end, labeling prompt information for labeling the response data corresponding to the selected automatic evaluation scores is output, so as to prompt expert labeling of the response data corresponding to the lowest N scores.
S211: and updating the initial annotation data set according to the annotation result to obtain an updated annotation data set.
After the labeling prompt information for labeling the response data corresponding to the selected automatic evaluation scores is output, the labeling results are obtained, and the initial labeled data set is updated according to the labeling results to obtain an updated labeled data set, thereby effectively labeling the data for which the current professional dialogue model generates poor responses.
In a specific embodiment of the present invention, after step S211, the training method of the session model may further include the steps of:
and carrying out updating operation on the preset unmarked pool according to the updated marked data set.
After the updated marked data set is obtained, updating the preset unmarked pool according to the updated marked data set, so that the unmarked sample data in the preset unmarked pool is updated in time.
S212: training the initial professional dialog model based on the updated labeling data set to obtain an updated professional dialog model.
After the initial labeling data set is updated according to the labeling result to obtain an updated labeling data set, training the initial professional dialog model based on the updated labeling data set to obtain an updated professional dialog model.
The embodiment of the invention adopts an active-learning mode to reduce the expert-labeled sample size as much as possible while limiting the impact on model performance: the "difficult samples" that improve model performance the most are continuously selected from the preset unlabeled pool, thereby improving model performance.
S213: and (3) performing verification operation on the updated professional dialog model by using the verification data set to obtain a verification score, and returning to execute step S205.
After the initial professional dialogue model is trained on the updated labeled data set to obtain the updated professional dialogue model, the verification operation is performed on the updated professional dialogue model with the verification data set to obtain a verification score, and the step of judging whether the verification score is greater than the preset score threshold is executed again. This repeats until the calculated verification score is greater than the preset score threshold, thereby obtaining a target professional dialogue model that responds well to received spoken questions. The loop is sketched below.
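Under stated assumptions, one round of the S207 to S213 loop can be sketched as follows: model.generate, model.ppl, model.train, model.validate and expert_label are hypothetical placeholders for the patent's components, and the automatic score reuses the PPL/DISTINCT combination reconstructed in S208 together with the distinct_n sketch above.

```python
def active_learning_round(model, unlabeled_pool, labeled_set,
                          val_set, n_select, score_threshold):
    """One round of the S207-S213 loop (placeholder callables assumed)."""
    # S207-S208: generate a reply for each pool sample and auto-score it.
    scored = []
    for sample in unlabeled_pool:
        reply = model.generate(sample)
        auto = 1.0 / model.ppl(reply) + distinct_n(reply.split())
        scored.append((auto, sample, reply))
    # S209-S210: pick the n_select lowest-scoring "difficult samples".
    scored.sort(key=lambda t: t[0])
    hard = [(s, r) for _, s, r in scored[:n_select]]
    # S211: experts correct the poor replies; update labeled set and pool.
    labeled_set += [expert_label(s, r) for s, r in hard]
    for s, _ in hard:
        unlabeled_pool.remove(s)
    # S212-S213: retrain and re-verify against the preset threshold.
    model.train(labeled_set)
    return model.validate(val_set) > score_threshold   # back to S205
```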
Referring to fig. 3, fig. 3 is a flowchart showing an implementation of a dialogue response method applied to a dialogue system including a target professional dialogue model obtained by training as above, and the method may include the following steps:
S301: and receiving target question voice to be responded.
When the user needs to conduct a scene dialogue, outputting target questioning voice to a dialogue response control center, and receiving the target questioning voice to be responded by the dialogue response control center.
The dialog response control center may be a processor deployed with a dialog model.
The target question voice may be chitchat, a common-sense question, a professional question, etc.
S302: and generating target response voice corresponding to the target question voice by using a target professional dialogue model obtained based on training the universal dialogue model.
The universal dialogue model is pre-trained; for example, model training may be performed on the universal dialogue data set on the basis of a large model. The large model may be based on the Transformer structure and suited to generation tasks, such as a GPT (Generative Pre-Training) model or a BERT (Bidirectional Encoder Representations from Transformers) model. The target professional dialogue model is obtained by training on the basis of the universal dialogue model. After the target question voice to be responded to is received, the target professional dialogue model obtained by training the universal dialogue model is used to generate the target response voice corresponding to the target question voice.
Retraining on the basis of a large model greatly reduces the requirements on data volume and computing power, and the two-stage training mode gives the trained target professional dialogue model both generality and professional expertise.
S303: and outputting the target response voice.
After target response voices corresponding to the target questioning voices are generated by utilizing the target professional dialogue model obtained through training of the general dialogue model, output operation is carried out on the target response voices, and therefore response to the target questioning voices is achieved.
Because the model-training process requires more resources than the model-application process, more resources can be allocated in advance for model training and relatively fewer for model application. For example, GPUs (Graphics Processing Units) with 80 GB of memory or more can be pre-allocated for model training, and GPUs with 1 GB or more for model application.
According to the technical scheme, the target professional dialogue model applied to a specific dialogue scene is obtained by training on the basis of a pre-trained universal dialogue model, which greatly reduces the requirements on data volume and computing power, gives the trained target professional dialogue model both generality and professional expertise, and improves the user experience.
In one embodiment of the present invention, the dialog response method may further include the steps of:
step one: searching a related answer from a database based on a preset search algorithm when the target professional dialogue model fails to respond to the target questioning voice;
step two: and outputting the voice of the relevant answers.
For convenience of description, the above two steps may be combined for explanation.
The embodiment of the invention presets a fallback scheme: a professional database is constructed from the professional data set, and when the target professional dialogue model fails to respond to the target question voice, i.e., when the output of the target professional dialogue model is empty, a relevant answer is retrieved from the database based on a preset retrieval algorithm and output as voice. This optimizes the application flow of the professional dialogue model, further ensures that a user's spoken question never goes unanswered, and improves the user experience. A sketch follows.
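A minimal sketch of this fallback, under assumptions: the patent does not specify the retrieval algorithm, so a simple keyword-overlap search stands in for it, and the database is a list of question-answer records.

```python
def respond(question_text, model, database):
    """Answer with the professional dialogue model; fall back to retrieval."""
    answer = model.generate(question_text)
    if not answer:  # model output empty: response failed
        answer = retrieve_answer(question_text, database)
    return answer

def retrieve_answer(question, database):
    """Stand-in retrieval: pick the QA pair with the most word overlap."""
    q_words = set(question.split())
    best = max(database, key=lambda qa: len(q_words & set(qa["q"].split())))
    return best["a"]
```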
Corresponding to the above method embodiment, the present invention further provides a training device for a dialogue model, where the training device for a dialogue model described below and the training method for a dialogue model described above can be referred to correspondingly.
Referring to fig. 4, fig. 4 is a block diagram of a training apparatus for a dialogue model according to an embodiment of the invention, where the training apparatus for a dialogue model may include:
A universal dialogue model obtaining module 41, configured to train the original dialogue model by using the pre-obtained universal dialogue data set to obtain a universal dialogue model;
the initial labeling data set determining module 42 is configured to obtain a preset professional keyword group, perform data screening on the universal dialogue data set according to the professional keyword group, and determine the screened data set as an initial labeling data set;
an initial professional dialog model obtaining module 43, configured to train the general dialog model using the initial labeling data set to obtain an initial professional dialog model;
the verification score obtaining module 44 is configured to perform a verification operation on the initial professional dialog model by using the verification data set and a preset natural language processing evaluation index, so as to obtain a verification score;
a judging module 45, configured to judge whether the verification score is greater than a preset score threshold;
the target professional dialog model determination module 46 is configured to determine the initial professional dialog model as the target professional dialog model when the verification score is greater than the preset score threshold.
According to the technical scheme, the target professional dialogue model applied to the specific dialogue scene is obtained in advance through training based on the universal dialogue model, so that the requirements on data quantity and computing power are greatly reduced, the trained target professional dialogue model has both universality and professionalism, and the user experience is improved.
In a specific embodiment of the present invention, the training device for a dialogue model may further include:
the response data generation module is used for generating corresponding response data for each sample data in the preset unlabeled pool by utilizing the initial professional dialogue model when the verification score is smaller than or equal to the preset score threshold value;
the automatic evaluation score calculation module is used for calculating the automatic evaluation scores corresponding to the response data respectively;
the automatic evaluation score selection module is used for sorting the automatic evaluation scores by magnitude and selecting a preset number of automatic evaluation scores from the end with the smaller scores;
the labeling prompt information output module is used for outputting labeling prompt information for labeling the response data corresponding to the selected automatic evaluation scores;
the labeling data set updating module is used for updating the initial labeling data set according to the labeling result to obtain an updated labeling data set;
the professional dialog model updating module is used for training the initial professional dialog model based on the updated labeling data set to obtain an updated professional dialog model;
and the repeated execution module is used for carrying out verification operation on the updated professional dialogue model by using the verification data set to obtain a verification score, and repeatedly executing the step of judging whether the verification score is larger than a preset score threshold value.
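A sketch of the selection step of this loop, under the assumption that lower automatic scores mark the responses most in need of human labeling (all names are illustrative):

```python
# Hypothetical sketch of one selection round: score each generated response
# automatically, then take the k lowest-scoring ones for manual labeling.
def select_for_labeling(responses: list, auto_score, k: int) -> list:
    ranked = sorted(responses, key=auto_score)   # ascending: lowest scores first
    return ranked[:k]                            # preset number from the low end

# auto_score is assumed to combine the PPL and DISTINCT terms defined below.
```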
In a specific embodiment of the present invention, the training device for a dialogue model may further include:
and the unlabeled pool updating module is used for updating the preset unlabeled pool according to the updated labeled data set after the updated labeled data set is obtained.
In one embodiment of the present invention, the verification score obtaining module 44 is specifically configured to perform a verification operation on the initial professional dialog model by combining the verification data set with the BLEU index, the ROUGE index, the PPL index, and the DISTINCT index, wherein Score_BLEU is the score of the initial professional dialog model on the BLEU index, Score_ROUGE is its score on the ROUGE index, Score_PPL is its score on the PPL index (used in reciprocal form), Score_distinct is its score on the DISTINCT index, and Score_val is the resulting verification score.
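The exact combination formula is not reproduced in the published text; the sketch below therefore assumes the four terms are simply summed, with the PPL score entering in reciprocal form as stated:

```python
# Assumed combination only: the source states that the PPL score is used in
# reciprocal form but does not print the full formula, so a plain sum of the
# four terms is taken here for illustration.
def verification_score(score_bleu: float, score_rouge: float,
                       score_ppl: float, score_distinct: float) -> float:
    return score_bleu + score_rouge + 1.0 / score_ppl + score_distinct
```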
In a specific embodiment of the present invention, the training device for a dialogue model may further include:
a score calculation module on the BLEU index, for calculating the score Score_BLEU of the initial professional dialog model on the BLEU index by the following formula:
Score_BLEU = BP · exp( Σ_{n=1}^{N} W_n · log P_n ), with BP = 1 if lc > lr and BP = exp(1 − lr/lc) otherwise;
wherein lc is the length of the machine translation, lr is the length of the shortest reference translation sentence, P_n is the n-gram accuracy, W_n is the weight of the n-gram with W_n = 1/N for any n, and BP is a penalty factor.
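A self-contained sketch of this BLEU computation (standard sentence-level BLEU with uniform weights W_n = 1/N; tokenization is assumed to be done by the caller):

```python
import math
from collections import Counter

def bleu(candidate, references, max_n=4):
    """Sentence BLEU matching the formula above, with W_n = 1/N."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    log_p_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        max_ref = Counter()
        for ref in references:                 # clip counts against each reference
            for g, c in ngrams(ref, n).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        p_n = max(clipped / max(sum(cand.values()), 1), 1e-9)   # avoid log(0)
        log_p_sum += (1.0 / max_n) * math.log(p_n)
    lc, lr = len(candidate), min(len(r) for r in references)
    bp = 1.0 if lc > lr else math.exp(1 - lr / max(lc, 1))      # brevity penalty
    return bp * math.exp(log_p_sum)
```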
In a specific embodiment of the present invention, the training device for a dialogue model may further include:
a score calculation module on the ROUGE index, for calculating the score Score_ROUGE of the initial professional dialog model on the ROUGE index by the following formula:
Score_ROUGE = ( Σ_{S ∈ {reference translations}} Σ_{gram_N ∈ S} Count_match(gram_N) ) / ( Σ_{S ∈ {reference translations}} Σ_{gram_N ∈ S} Count(gram_N) );
wherein {reference translations} represents the set of reference translations, gram_N represents a combination of N words, and Count(gram_N) represents its number of occurrences; the denominator counts the number of N-grams in all reference translations, and the numerator counts the number of N-grams shared by the reference translations and the machine translation.
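A corresponding sketch of ROUGE-N recall, a minimal reading of the formula above (token lists in, score out):

```python
from collections import Counter

def rouge_n(candidate, references, n=2):
    """ROUGE-N: matched reference n-grams / total reference n-grams."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref)
        total += sum(ref_counts.values())                         # denominator
        matched += sum(min(c, cand[g]) for g, c in ref_counts.items())  # numerator
    return matched / total if total else 0.0
```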
In a specific embodiment of the present invention, the training device for a dialogue model may further include:
a score calculation module on the PPL index, for calculating the score Score_PPL of the initial professional dialog model on the PPL index by the following formula:
Score_PPL = ( ∏_{i=1}^{N} P(x_i | x_1, x_2, …, x_{i−1}) )^(−1/N);
wherein P(x_i | x_1, x_2, …, x_{i−1}) represents the probability of predicting the i-th word from the preceding words, and N represents the sentence length.
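A minimal sketch of this perplexity computation, assuming the per-token conditional log-probabilities have already been obtained from the model:

```python
import math

def perplexity(token_log_probs):
    """PPL = exp(-(1/N) * sum log P(x_i | x_1..x_{i-1})), N = sentence length."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n) if n else float("inf")
```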
In a specific embodiment of the present invention, the training device for a dialogue model may further include:
a score calculation module on the DISTINCT index, for calculating the score Score_distinct of the initial professional dialog model on the DISTINCT index by the following formula:
Score_distinct = Count(unique n-gram) / Count(word);
wherein Count(unique n-gram) represents the number of non-repeated n-grams in the reply and Count(word) represents the total number of n-gram words in the reply.
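And a sketch of the DISTINCT computation on a tokenized reply:

```python
def distinct_n(reply, n=1):
    """DISTINCT-N: non-repeated n-grams in the reply / total n-grams."""
    grams = [tuple(reply[i:i + n]) for i in range(len(reply) - n + 1)]
    return len(set(grams)) / len(grams) if grams else 0.0
```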
In a specific embodiment of the present invention, the training device for a dialogue model may further include:
and the data filtering module is used for respectively filtering the question-and-answer data and the chit-chat data in the universal dialogue data set before the original dialogue model is trained with the pre-acquired universal dialogue data set.
In one embodiment of the present invention, the generic dialog model acquisition module 41 comprises:
the iterative training sub-module is used for inputting the universal dialogue data set into the original dialogue model to carry out model iterative training;
the loss standard deviation acquisition sub-module is used for acquiring the current iteration number and the loss standard deviation obtained by the current iteration training;
the training cut-off judging sub-module is used for determining whether the model training cut-off condition is reached according to the current iteration number and the loss standard deviation;
and the general dialogue model determining submodule is used for determining the dialogue model obtained by the iterative training of the round as the general dialogue model when the model training cut-off condition is reached according to the current iteration number and the loss standard deviation.
In one embodiment of the present invention, the training cutoff determination submodule is specifically a module for determining whether the current iteration number is greater than a first preset value and the loss standard deviation is less than a second preset value.
In a specific embodiment of the present invention, the training device for a dialogue model may further include:
the iteration number statistics sub-module is used for judging whether the current iteration number is larger than a third preset value or not when the current iteration number is larger than a first preset value and the loss standard deviation is larger than or equal to a second preset value; wherein the third preset value is greater than the first preset value;
the general dialogue model determining submodule is further used for determining a dialogue model obtained by the round of iterative training as a general dialogue model when the current iteration number is larger than a third preset value;
and the iteration training sub-module is also used for inputting the universal dialogue data set into the dialogue model obtained by the current iteration training to carry out model iteration training when the current iteration number is smaller than or equal to a third preset value, and repeatedly executing the steps of obtaining the current iteration number and the loss standard deviation obtained by the current iteration training.
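A sketch of the cutoff logic these sub-modules implement (the threshold names are illustrative):

```python
# Hypothetical cutoff check: stop once the loss has stabilized after enough
# iterations, or unconditionally once a hard iteration cap is exceeded.
def training_should_stop(iteration: int, loss_std: float,
                         first_preset: int, second_preset: float,
                         third_preset: int) -> bool:
    if iteration > first_preset and loss_std < second_preset:
        return True                     # converged: loss standard deviation is small
    return iteration > third_preset     # hard cap, with third_preset > first_preset
```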
In one embodiment of the present invention, the initial annotation data set determination module 42 is specifically a module that performs data screening on the generic dialogue data set according to the professional keyword group using the DFA algorithm.
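A minimal DFA (trie-based deterministic finite automaton) sketch of this keyword screening; the keyword set and the end-of-word marker are illustrative assumptions:

```python
# Hedged sketch: build a character trie over the professional keywords and
# keep only dialogues containing at least one keyword.
def build_dfa(keywords):
    root = {}
    for word in keywords:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["#END#"] = True                    # illustrative end-of-keyword marker
    return root

def contains_keyword(text, dfa):
    for start in range(len(text)):
        node = dfa
        for ch in text[start:]:
            if ch not in node:
                break
            node = node[ch]
            if node.get("#END#"):
                return True
    return False

# Screening: filtered = [d for d in generic_dataset if contains_keyword(d, dfa)]
```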
Corresponding to the above method embodiments, the present invention further provides a dialogue response apparatus, and the dialogue response apparatus described below and the dialogue response method described above may be referred to correspondingly to each other.
Referring to fig. 5, fig. 5 is a block diagram illustrating a dialogue response apparatus according to an embodiment of the present invention, the dialogue response apparatus may include:
a question voice receiving module 51, configured to receive a target question voice to be responded;
a response voice generating module 52, configured to generate a target response voice corresponding to the target question voice using a target professional dialog model obtained based on training the universal dialog model;
the response voice output module 53 is configured to perform an output operation on the target response voice.
According to the technical scheme, the target professional dialogue model applied to the specific dialogue scene is obtained in advance through training based on the universal dialogue model, so that the requirements on data quantity and computing power are greatly reduced, the trained target professional dialogue model has both universality and professionalism, and the user experience is improved.
In one embodiment of the present invention, the dialogue response apparatus may further include:
the answer searching module is used for searching relevant answers from the database based on a preset searching algorithm when the response of the target professional dialogue model to the target questioning voice fails;
and the voice output module is used for outputting voice to the relevant answers.
Corresponding to the above method embodiment, referring to fig. 6, fig. 6 is a schematic diagram of an electronic device provided by the present invention, where the device may include:
a memory 332 for storing a computer program;
a processor 322 for implementing the steps of the training method or the dialogue response method of the dialogue model of the above-described method embodiment when executing the computer program.
Specifically, referring to fig. 7, fig. 7 is a schematic diagram of a specific structure of an electronic device according to this embodiment. The electronic device may vary considerably in configuration and performance, and may include one or more processors (central processing units, CPU) 322 and a memory 332, where the memory 332 stores one or more computer programs 342 or data 344. The memory 332 may be transient storage or persistent storage. The program stored in the memory 332 may include one or more modules (not shown), and each module may include a series of instruction operations on the data processing apparatus. Further, the processor 322 may be configured to communicate with the memory 332 and execute the series of instruction operations in the memory 332 on the electronic device 301.
The electronic device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341.
The steps in the dialog response method described above may be implemented by the structure of the electronic device.
Corresponding to the above method embodiments, the present invention also provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor, performs the steps of:
training an original dialogue model by utilizing a pre-acquired universal dialogue data set to obtain a universal dialogue model; acquiring a preset professional keyword group, carrying out data screening on a universal dialogue data set according to the professional keyword group, and determining the screened data set as an initial labeling data set; training the universal dialogue model by using the initial labeling data set to obtain an initial professional dialogue model; performing verification operation on the initial professional dialogue model by using the verification data set and a preset natural language processing evaluation index to obtain a verification score; judging whether the verification score is larger than a preset score threshold value or not; if yes, determining the initial professional dialog model as a target professional dialog model;
Or, alternatively:
receiving target question voice to be responded; generating target response voice corresponding to the target question voice by utilizing a target professional dialogue model obtained based on training the universal dialogue model; and outputting the target response voice.
The computer readable storage medium may include: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
For the description of the computer-readable storage medium provided by the present invention, refer to the above method embodiments, and the disclosure is not repeated here.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. The apparatus, the electronic device and the computer readable storage medium disclosed in the embodiments have a relatively simple description, and the relevant points refer to the description of the method section since the apparatus, the electronic device and the computer readable storage medium correspond to the method disclosed in the embodiments.
The principles and embodiments of the present invention have been described herein with reference to specific examples, but the description of the examples above is only for aiding in understanding the technical solution of the present invention and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.

Claims (18)

1. A method for training a dialog model, comprising:
training an original dialogue model by utilizing a pre-acquired universal dialogue data set to obtain a universal dialogue model;
acquiring a preset professional keyword group, carrying out data screening on the universal dialogue data set according to the professional keyword group, and determining the screened data set as an initial labeling data set;
training the universal dialogue model by using the initial annotation data set to obtain an initial professional dialogue model;
performing verification operation on the initial professional dialogue model by using a verification data set and a preset natural language processing evaluation index to obtain a verification score;
judging whether the verification score is larger than a preset score threshold value or not;
if yes, determining the initial professional dialog model as a target professional dialog model;
when the verification score is determined to be less than or equal to the preset score threshold, the method further comprises:
generating corresponding response data for each sample data in a preset unlabeled pool by using the initial professional dialogue model;
respectively calculating an automatic evaluation score corresponding to each response data;
sorting the automatic evaluation scores by magnitude, and selecting a preset number of automatic evaluation scores from the end with the smaller scores;
Outputting labeling prompt information for labeling the response data corresponding to the selected automatic evaluation scores;
updating the initial annotation data set according to the annotation result to obtain an updated annotation data set;
training the initial professional dialog model based on the updated labeling data set to obtain an updated professional dialog model;
performing verification operation on the updated professional dialogue model by using the verification data set to obtain a verification score, and repeatedly executing the step of judging whether the verification score is larger than a preset score threshold;
the automatic evaluation score corresponding to each piece of response data is calculated according to the PPL index and the DISTINCT index, wherein Score_PPL is the score on the PPL index and Score_distinct is the score on the DISTINCT index.
2. The method of training a dialog model of claim 1, further comprising, after obtaining the updated annotation dataset:
and carrying out updating operation on the preset unmarked pool according to the updated marked data set.
3. The method of claim 1, wherein validating the initial specialized dialog model using a validation data set and a preset natural language processing evaluation index comprises:
performing a verification operation on the initial professional dialog model in combination with the verification data set, the BLEU index, the ROUGE index, the PPL index, and the DISTINCT index, wherein Score_BLEU is the score of the initial professional dialog model on the BLEU index, Score_ROUGE is the score of the initial professional dialog model on the ROUGE index, Score_PPL is the score of the initial professional dialog model on the PPL index (used in reciprocal form), Score_distinct is the score of the initial professional dialog model on the DISTINCT index, and Score_val is the verification score.
4. The method of training a dialog model according to claim 3, wherein the calculation process of the score Score_BLEU of the initial professional dialog model on the BLEU index comprises:
calculating the Score_BLEU of the initial professional dialog model on the BLEU index by the following formula:
Score_BLEU = BP · exp( Σ_{n=1}^{N} W_n · log P_n ), with BP = 1 if lc > lr and BP = exp(1 − lr/lc) otherwise;
wherein lc is the length of the machine translation, lr is the length of the shortest reference translation sentence, P_n is the n-gram accuracy, W_n is the weight of the n-gram with W_n = 1/N for any n, and BP is a penalty factor.
5. The method of training a dialog model according to claim 3, wherein the calculation process of the score Score_ROUGE of the initial professional dialog model on the ROUGE index comprises:
calculating the Score_ROUGE of the initial professional dialog model on the ROUGE index by the following formula:
Score_ROUGE = ( Σ_{S ∈ {reference translations}} Σ_{gram_N ∈ S} Count_match(gram_N) ) / ( Σ_{S ∈ {reference translations}} Σ_{gram_N ∈ S} Count(gram_N) );
wherein {reference translations} represents the set of reference translations, gram_N represents a combination of N words, and Count(gram_N) represents its number of occurrences; the denominator counts the number of N-grams in all reference translations, and the numerator counts the number of N-grams shared by the reference translations and the machine translation.
6. The method of training a dialog model according to claim 3, wherein the calculation process of the score Score_PPL of the initial professional dialog model on the PPL index comprises:
calculating the Score_PPL of the initial professional dialog model on the PPL index by the following formula:
Score_PPL = ( ∏_{i=1}^{N} P(x_i | x_1, x_2, …, x_{i−1}) )^(−1/N);
wherein P(x_i | x_1, x_2, …, x_{i−1}) represents the probability of predicting the i-th word from the preceding words, and N represents the sentence length.
7. The method of training a dialog model according to claim 3, wherein the calculation process of the score Score_distinct of the initial professional dialog model on the DISTINCT index comprises:
calculating the Score_distinct of the initial professional dialog model on the DISTINCT index by the following formula:
Score_distinct = Count(unique n-gram) / Count(word);
wherein Count(unique n-gram) represents the number of non-repeated n-grams in the reply, and Count(word) represents the total number of n-gram words in the reply.
8. The method of training a dialog model of claim 1, further comprising, prior to training the original dialog model with the pre-acquired universal dialog data set:
and respectively filtering the question-and-answer data and the chit-chat data in the general dialogue data set.
9. The method for training a conversation model of claim 1 wherein training the original conversation model using the pre-acquired generic conversation data set to obtain the generic conversation model comprises:
inputting the universal dialogue data set into the original dialogue model to perform model iterative training;
obtaining a current iteration number and a loss standard deviation obtained by the current iteration training;
determining whether a model training cut-off condition is reached according to the current iteration number and the loss standard deviation;
if yes, determining the dialogue model obtained by the round of iterative training as the universal dialogue model.
10. The method of claim 9, wherein determining whether a model training cutoff condition is reached based on the current iteration number and the loss standard deviation comprises:
and judging whether the current iteration number is larger than a first preset value and the loss standard deviation is smaller than a second preset value.
11. The method of training a dialog model of claim 10, further comprising, when it is determined that the current number of iterations is greater than the first preset value and the standard deviation of loss is greater than or equal to the second preset value:
judging whether the current iteration number is larger than a third preset value or not; wherein the third preset value is greater than the first preset value;
if yes, executing the step of determining the dialogue model obtained by the round of iterative training as the universal dialogue model;
if not, inputting the universal dialogue data set into the dialogue model obtained by the iterative training of the present round to carry out the iterative training of the model, and repeatedly executing the steps of obtaining the current iteration number and the loss standard deviation obtained by the iterative training of the present round.
12. The method of claim 1, wherein data filtering the generic dialogue data set according to the specialized keyword group comprises:
And carrying out data screening on the universal dialogue data set according to the professional keyword group by using a DFA algorithm.
13. A dialog response method, applied to a dialog system comprising a target professional dialog model trained as claimed in any of claims 1 to 12, comprising:
receiving target question voice to be responded;
generating target response voice corresponding to the target questioning voice by utilizing a target professional dialogue model obtained based on training a general dialogue model;
and outputting the target response voice.
14. The dialog response method of claim 13, further comprising:
searching a related answer from a database based on a preset search algorithm when the target professional dialogue model fails to respond to the target questioning voice;
and outputting the voice of the related answer.
15. A training device for a dialog model, comprising:
the universal dialogue model obtaining module is used for training the original dialogue model by utilizing the pre-obtained universal dialogue data set to obtain a universal dialogue model;
the initial labeling data set determining module is used for acquiring a preset professional keyword group, carrying out data screening on the universal dialogue data set according to the professional keyword group, and determining the screened data set as an initial labeling data set;
The initial professional dialog model obtaining module is used for training the general dialog model by utilizing the initial labeling data set to obtain an initial professional dialog model;
the verification score obtaining module is used for carrying out verification operation on the initial professional dialogue model by utilizing a verification data set and a preset natural language processing evaluation index to obtain a verification score;
the judging module is used for judging whether the verification score is larger than a preset score threshold value or not;
the target professional dialog model determining module is used for determining the initial professional dialog model as a target professional dialog model when the verification score is greater than a preset score threshold;
the training device of the dialogue model may further include:
the response data generation module is used for generating corresponding response data for each sample data in the preset unlabeled pool by utilizing the initial professional dialogue model when the verification score is smaller than or equal to the preset score threshold value;
the automatic evaluation score calculation module is used for calculating the automatic evaluation scores corresponding to the response data respectively;
the automatic evaluation score selection module is used for sorting the automatic evaluation scores by magnitude and selecting a preset number of automatic evaluation scores from the end with the smaller scores;
the labeling prompt information output module is used for outputting labeling prompt information for labeling the response data corresponding to the selected automatic evaluation scores;
the labeling data set updating module is used for updating the initial labeling data set according to the labeling result to obtain an updated labeling data set;
the professional dialog model updating module is used for training the initial professional dialog model based on the updated labeling data set to obtain an updated professional dialog model;
the repeated execution module is used for carrying out verification operation on the updated professional dialogue model by utilizing the verification data set to obtain a verification score, and repeatedly executing the step of judging whether the verification score is larger than a preset score threshold value;
the automatic evaluation score corresponding to each piece of response data is calculated according to the PPL index and the DISTINCT index, wherein Score_PPL is the score on the PPL index and Score_distinct is the score on the DISTINCT index.
16. A dialog response device for use in a dialog system comprising a target professional dialog model trained in accordance with any of claims 1 to 12, comprising:
the questioning voice receiving module is used for receiving target questioning voice to be responded;
The response voice generation module is used for generating target response voice corresponding to the target question voice by utilizing a target professional dialogue model obtained based on training of the universal dialogue model;
and the response voice output module is used for outputting the target response voice.
17. An electronic device, comprising:
a memory for storing a computer program;
processor for implementing the steps of the training method of a dialog model according to any of claims 1 to 12 or the dialog response method according to any of claims 13 to 14 when said computer program is executed.
18. A computer readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the steps of the training method of a dialog model according to any of claims 1 to 12 or the dialog response method according to any of claims 13 to 14.
CN202211441290.4A 2022-11-17 2022-11-17 Training method and device for dialogue model, dialogue response method and device Active CN115495568B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211441290.4A CN115495568B (en) 2022-11-17 2022-11-17 Training method and device for dialogue model, dialogue response method and device
PCT/CN2023/086071 WO2024103609A1 (en) 2022-11-17 2023-04-04 Dialogue-model training method and apparatus, and dialogue response method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211441290.4A CN115495568B (en) 2022-11-17 2022-11-17 Training method and device for dialogue model, dialogue response method and device

Publications (2)

Publication Number Publication Date
CN115495568A CN115495568A (en) 2022-12-20
CN115495568B true CN115495568B (en) 2023-08-22

Family

ID=85116091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211441290.4A Active CN115495568B (en) 2022-11-17 2022-11-17 Training method and device for dialogue model, dialogue response method and device

Country Status (2)

Country Link
CN (1) CN115495568B (en)
WO (1) WO2024103609A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115495568B (en) * 2022-11-17 2023-08-22 苏州浪潮智能科技有限公司 Training method and device for dialogue model, dialogue response method and device
CN116127035B (en) * 2023-01-03 2023-12-08 北京百度网讯科技有限公司 Dialogue method, training method and training device for dialogue model
CN116432665B (en) * 2023-06-15 2023-10-10 北京中关村科金技术有限公司 Dialogue model construction method, text generation method, device, system and equipment
CN117828063B (en) * 2024-01-10 2024-05-17 广东数业智能科技有限公司 Psychological field data generation and model training method and device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897797A (en) * 2018-06-12 2018-11-27 腾讯科技(深圳)有限公司 Update training method, device, storage medium and the electronic equipment of dialog model
WO2021049199A1 (en) * 2019-09-13 2021-03-18 Mitsubishi Electric Corporation System and method for a dialogue response generation system
CN114968788A (en) * 2022-05-27 2022-08-30 浙江大学 Method, apparatus, medium, and device for automatically evaluating programming capability of artificial intelligence algorithm

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188331B (en) * 2019-06-03 2023-05-26 腾讯科技(深圳)有限公司 Model training method, dialogue system evaluation method, device, equipment and storage medium
US11561969B2 (en) * 2020-03-30 2023-01-24 Adobe Inc. Utilizing logical-form dialogue generation for multi-turn construction of paired natural language queries and query-language representations
CN115495568B (en) * 2022-11-17 2023-08-22 苏州浪潮智能科技有限公司 Training method and device for dialogue model, dialogue response method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897797A (en) * 2018-06-12 2018-11-27 腾讯科技(深圳)有限公司 Update training method, device, storage medium and the electronic equipment of dialog model
WO2021049199A1 (en) * 2019-09-13 2021-03-18 Mitsubishi Electric Corporation System and method for a dialogue response generation system
CN114968788A (en) * 2022-05-27 2022-08-30 浙江大学 Method, apparatus, medium, and device for automatically evaluating programming capability of artificial intelligence algorithm

Also Published As

Publication number Publication date
CN115495568A (en) 2022-12-20
WO2024103609A1 (en) 2024-05-23

Similar Documents

Publication Publication Date Title
CN115495568B (en) Training method and device for dialogue model, dialogue response method and device
TW202009749A (en) Human-machine dialog method, device, electronic apparatus and computer readable medium
CN111310439B (en) Intelligent semantic matching method and device based on depth feature dimension changing mechanism
CN109657054A (en) Abstraction generating method, device, server and storage medium
JP2022003520A (en) Machine learning system for digital assistant
CN109522545B (en) A kind of appraisal procedure that more wheels are talked with coherent property amount
WO2021204014A1 (en) Model training method and related apparatus
CN112633010A (en) Multi-head attention and graph convolution network-based aspect-level emotion analysis method and system
CN109902301B (en) Deep neural network-based relationship reasoning method, device and equipment
CN108846138B (en) Question classification model construction method, device and medium fusing answer information
CN110096567A (en) Selection method, system are replied in more wheels dialogue based on QA Analysis of Knowledge Bases Reasoning
CN108228576B (en) Text translation method and device
CN114722839B (en) Man-machine cooperative dialogue interaction system and method
WO2024066920A1 (en) Processing method and apparatus for dialogue in virtual scene, and electronic device, computer program product and computer storage medium
Chao et al. Emerging Technologies of Natural Language‐Enabled Chatbots: A Review and Trend Forecast Using Intelligent Ontology Extraction and Patent Analytics
CN107679225A (en) A kind of reply generation method based on keyword
CN110597968A (en) Reply selection method and device
CN114912020A (en) Multi-sub-target dialogue recommendation method based on user preference graph
CN116258147A (en) Multimode comment emotion analysis method and system based on heterogram convolution
CN116010553A (en) Viewpoint retrieval system based on two-way coding and accurate matching signals
CN114048301A (en) Satisfaction-based user simulation method and system
CN113656542A (en) Dialect recommendation method based on information retrieval and sorting
WO2023169301A1 (en) Text processing method and apparatus, and electronic device
Qin et al. Modularized Pre-training for End-to-end Task-oriented Dialogue
WO2023245523A1 (en) Method and apparatus for generating training data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant