CN115391512A - Training method, device, equipment and storage medium of dialogue language model - Google Patents

Training method, device, equipment and storage medium of dialogue language model

Info

Publication number
CN115391512A
CN115391512A
Authority
CN
China
Prior art keywords
dialogue
character
role
language model
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211049973.5A
Other languages
Chinese (zh)
Inventor
胡岩
郭林海
张琛
万化
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Shanghai Pudong Development Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Pudong Development Bank Co Ltd filed Critical Shanghai Pudong Development Bank Co Ltd
Priority to CN202211049973.5A priority Critical patent/CN115391512A/en
Publication of CN115391512A publication Critical patent/CN115391512A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/35 Clustering; Classification
    • G06F 16/353 Clustering; Classification into predefined classes
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The embodiment of the invention discloses a method, a device, equipment and a storage medium for training a dialogue language model. The method comprises the following steps: obtaining a dialogue corpus as a training sample; extracting semantic feature codes, segment feature codes and position feature codes from the dialogue corpus; inputting the semantic feature codes, the segment feature codes and the position feature codes into a dialogue language model to output embedded vectors; outputting, based on a nonlinear role classifier and the embedded vectors, a predicted role to which at least one text segment belongs; calculating a role loss relation between the actual role to which the text segment belongs in the dialogue corpus and the predicted role; and optimally training the dialogue language model according to the role loss relation. According to the technical scheme of the embodiment of the invention, the dialogue language model is trained by using information such as role features and semantic sequence features extracted from the dialogue corpus, so that the training effect of the dialogue language model is improved.

Description

Training method, device, equipment and storage medium of dialogue language model
Technical Field
The embodiment of the invention relates to the technical field of natural language processing, in particular to a method, a device, equipment and a storage medium for training a dialogue language model.
Background
With the rapid development of natural language processing, more and more pre-trained language models have emerged, and these models usually need to be trained on large-scale corpora.
Currently, mainstream pre-training language models include the ELMo model (Deep contextualized word representations), the BERT model (Bidirectional Encoder Representations from Transformers), the GPT model (Generative Pre-Training), and the like. The ELMo model considers context in both directions through a Long Short-Term Memory network (LSTM) and thereby addresses the learning of polysemous words. The core idea of the GPT model is two-stage training: a general language-modeling pre-training task followed by fine-tuning. The BERT model borrows the bidirectional encoding idea of the ELMo model and the GPT idea of using a Transformer (a feature extraction model) as the feature extractor, and adopts a training method similar to the Continuous Bag-of-Words model (CBOW).
However, existing pre-training language models are trained mainly on large-scale, standardized internet text, which is generally written language and differs greatly from the conversations that actually occur in real life, so the training results of such pre-training language models are not ideal.
Disclosure of Invention
The embodiment of the invention provides a method, a device, equipment and a storage medium for training a dialogue language model, which aim to solve the problem that the training result of the language model is not ideal.
In a first aspect, an embodiment of the present invention provides a method for training a dialogue language model, including:
obtaining dialogue corpora as training samples; the dialogue corpus comprises at least one round of dialogue of at least two roles, and language text of each role in one round of dialogue is used as a text fragment;
extracting semantic feature codes, segment feature codes and position feature codes from the dialogue corpus;
inputting the semantic feature codes, the segment feature codes and the position feature codes into a dialogue language model and outputting embedded vectors;
outputting a predicted role to which at least one text segment belongs based on a nonlinear role classifier according to the embedded vector;
calculating a role loss relation with the predicted role according to the actual role to which the text segment in the dialogue corpus belongs;
and carrying out optimization training on the dialogue language model according to the role loss relation.
In a second aspect, an embodiment of the present invention provides an apparatus for training a dialogue language model, including:
the training sample acquisition module is used for acquiring dialogue corpora as training samples; the dialogue corpus comprises at least one round of dialogue of at least two roles, and language text of each role in one round of dialogue is used as a text fragment;
the code extraction module is used for extracting semantic feature codes, segment feature codes and position feature codes from the dialogue corpus;
the embedded vector determining module is used for inputting the semantic feature codes, the segment feature codes and the position feature codes into a dialogue language model and outputting embedded vectors;
the predicted role determining module is used for outputting a predicted role to which at least one text segment belongs based on a nonlinear role classifier according to the embedded vector;
the calculation module is used for calculating a role loss relation with the predicted role according to the actual role to which the text fragment in the dialogue corpus belongs;
and the training module is used for carrying out optimization training on the dialogue language model according to the role loss relation.
In a third aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for training a dialogue language model of the first aspect described above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored, and the computer program is configured to, when executed, cause a processor to implement the method for training a dialogue language model according to the first aspect.
According to the training scheme of the dialogue language model provided by the embodiment of the invention, a dialogue corpus is obtained as a training sample, the dialogue corpus comprising at least one round of dialogue between at least two roles, with the language text of each role in one round of dialogue used as a text segment; semantic feature codes, segment feature codes and position feature codes are extracted from the dialogue corpus and input into the dialogue language model to output embedded vectors; a predicted role to which at least one text segment belongs is output based on a nonlinear role classifier according to the embedded vectors; a role loss relation is calculated between the actual role to which the text segment belongs in the dialogue corpus and the predicted role; and the dialogue language model is optimally trained according to the role loss relation. With this technical scheme, the semantic feature codes, segment feature codes and position feature codes extracted from the training sample formed by the dialogue corpus are input into the dialogue language model to output embedded vectors, the predicted roles of the text segments in the dialogue corpus are then obtained based on the nonlinear role classifier and the embedded vectors, and finally the role loss relation calculated from the actual roles and the predicted roles of the text segments is used to optimally train the dialogue language model, thereby improving the training effect of the dialogue language model.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart of a training method for a dialogue language model according to an embodiment of the invention;
FIG. 2 is a flowchart of a training method of a dialogue language model according to a second embodiment of the invention;
FIG. 3 is a schematic structural diagram of a training apparatus for a dialogue language model according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. In the description of the present invention, "a plurality" means two or more unless otherwise specified. "and/or" describes the association relationship of the associated object, indicating that there may be three relationships, for example, a and/or B, which may indicate: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of a training method for a dialogue language model according to an embodiment of the present invention. The embodiment is applicable to the case of training a dialogue language model. The method may be performed by a training apparatus for a dialogue language model, which may be implemented in hardware and/or software and may be configured in an electronic device.
As shown in fig. 1, a training method for a dialogue language model provided by an embodiment of the present invention specifically includes the following steps:
s101, obtaining dialogue corpora as training samples.
The dialogue corpus comprises at least one round of dialogue between at least two roles, and the language text of each role in one round of dialogue is used as a text segment.
in this embodiment, the dialog corpus may be understood as text corresponding to at least one pair of dialogues conducted between at least two characters. The conversation carried out among the characters is preferably spoken conversation, the conversation may have problems of language sickness, wrong words, wrong semantic sequence and the like, the problems can be utilized to train a conversation language model, the learning capacity of the conversation language model is improved, the theme of the content of the conversation corpus can be selected according to actual requirements, and for example, when the conversation language model needs to be applied to the financial field, the theme of the content of the conversation corpus can be the theme related to the financial field.
S102, extracting semantic feature codes, segment feature codes and position feature codes from the dialogue corpus.
In this embodiment, at least one round of text segments in the training sample, i.e. the dialogue corpus, may be extracted and converted into semantic feature codes, segment feature codes and position feature codes using a preset coding mode, for example the coding mode of a BERT (Bidirectional Encoder Representations from Transformers) model. The semantic feature code can be understood as follows: after the text segment is split into characters, each character is converted into a specific representation, and this representation is the semantic feature code. The segment feature code can be understood as follows: the text segments are ordered, and the code corresponding to this ordering is converted into a specific representation, which is the segment feature code. The position feature code can be understood as follows: the position of each character in the text segments is ordered, and the code corresponding to this ordering is converted into a specific representation, which is the position feature code.
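A minimal sketch of how the three kinds of feature codes could be derived from the text segments; character-level splitting, the vocabulary dictionary and the [UNK] handling are illustrative assumptions, not the patent's prescribed coding mode:

```python
def extract_feature_codes(segments, vocab):
    """Turn a list of text segments into semantic, segment and position feature codes.

    segments: list of strings, one per role turn (text segment)
    vocab:    dict mapping each character to an integer id, with an "[UNK]" entry
    """
    token_ids, segment_ids, position_ids = [], [], []
    pos = 0
    for seg_idx, text in enumerate(segments):
        for ch in text:                                      # character-level split
            token_ids.append(vocab.get(ch, vocab["[UNK]"]))  # semantic feature code
            segment_ids.append(seg_idx)                      # segment feature code: which turn
            position_ids.append(pos)                         # position feature code: global position
            pos += 1
    return token_ids, segment_ids, position_ids
```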
S103, inputting the semantic feature codes, the segment feature codes and the position feature codes into a dialogue language model and outputting embedded vectors.
In this embodiment, the semantic feature codes, segment feature codes and position feature codes obtained above may be input into the dialogue language model to obtain the embedded vectors. The function of the dialogue language model can be understood as feature extraction on the input, and the output embedded vector is a feature vector; many models can be used for such feature extraction, such as a Transformer model.
And S104, outputting the prediction role to which at least one text segment belongs based on a nonlinear role classifier according to the embedded vector.
In this embodiment, the embedded vector corresponding to at least one text segment may be input into the nonlinear role classifier to obtain the predicted role corresponding to that text segment. The function of the nonlinear role classifier can be understood as predicting the role of the text segment corresponding to the embedded vector. For example, if the dialogue corpus contains a dialogue between role a and role b, the embedded vectors of the text segments of this dialogue are input into the nonlinear role classifier, which predicts whether each text segment belongs to role a or role b.
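One possible form of the nonlinear role classifier for a two-role dialogue, in line with the sigmoid formulation given in the second embodiment; the class name and the choice of hidden size are assumptions:

```python
import torch
import torch.nn as nn

class RoleClassifier(nn.Module):
    """Predicts which of two roles a text segment belongs to,
    using an embedded vector of the segment (e.g. its first position)."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.w3 = nn.Linear(hidden_dim, 1)   # plays the part of W_3 and bias b_3

    def forward(self, segment_embedding: torch.Tensor) -> torch.Tensor:
        # probability that the segment belongs to role 1 (the other role is 1 - p)
        return torch.sigmoid(self.w3(segment_embedding)).squeeze(-1)
```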
And S105, calculating a role loss relation with the predicted role according to the actual role to which the text fragment in the dialogue corpus belongs.
In this embodiment, the actual role of the text segment corresponding to the embedded vector may be compared with the predicted role, and the role loss relation is obtained according to the comparison result. The role loss relation can take the form of a role loss function: a loss function, i.e. a loss relation, is constructed from the difference between the true value (the actual role) and the predicted value (the predicted role).
And S106, carrying out optimization training on the dialogue language model according to the role loss relation.
In this embodiment, the dialogue language model may be trained according to the value of the loss relation. Generally speaking, the smaller the value, the closer the predicted role output by the nonlinear role classifier is to the actual role, i.e. the stronger the semantic expression capability of the dialogue language model. If the value is large, it can be reduced by adjusting relevant parameters of the dialogue language model, such as weight coefficients and/or bias parameters, so as to enhance the semantic expression capability of the dialogue language model.
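Putting S101-S106 together, one optimization step could look like the sketch below; the model interfaces, the batch fields and the use of binary cross-entropy as the concrete form of the role loss relation are assumptions:

```python
import torch.nn.functional as F

def train_step(model, role_classifier, optimizer, batch):
    """One optimization step of the dialogue language model on the role-prediction task."""
    # batch holds the three feature codes of one dialogue (with a batch dimension of 1),
    # the index of the first character of each text segment, and the actual role labels.
    embeddings = model(batch["token_ids"],
                       batch["segment_ids"],
                       batch["position_ids"])[0]              # embedded vectors, (num_chars, d)

    # embedded vector at the first position of each text segment
    seg_repr = embeddings[batch["segment_first_positions"]]   # (num_segments, d)
    role_prob = role_classifier(seg_repr)                     # predicted role probability

    # role loss relation: compare the predicted role with the actual role
    role_loss = F.binary_cross_entropy(role_prob, batch["actual_roles"].float())

    optimizer.zero_grad()
    role_loss.backward()        # adjust the parameters of the dialogue language model
    optimizer.step()
    return role_loss.item()
```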
The embodiment of the invention provides a training method of a dialogue language model: a dialogue corpus is obtained as a training sample, the dialogue corpus comprising at least one round of dialogue between at least two roles, with the language text of each role in one round of dialogue used as a text segment; semantic feature codes, segment feature codes and position feature codes are extracted from the dialogue corpus and input into the dialogue language model to output embedded vectors; a predicted role to which at least one text segment belongs is output based on a nonlinear role classifier according to the embedded vectors; a role loss relation is calculated between the actual role to which the text segment belongs in the dialogue corpus and the predicted role; and the dialogue language model is optimally trained according to the role loss relation. In this technical scheme, the semantic feature codes, segment feature codes and position feature codes extracted from the training sample formed by the dialogue corpus are input into the dialogue language model to output embedded vectors, the predicted roles of the text segments in the dialogue corpus are then obtained based on the nonlinear role classifier and the embedded vectors, and finally the role loss relation calculated from the actual roles and the predicted roles of the text segments is used to optimally train the dialogue language model, so that the training effect of the dialogue language model is improved.
Example two
Fig. 2 is a flowchart of a training method of a dialogue language model according to a second embodiment of the present invention. The technical scheme of the embodiment of the invention is further optimized on the basis of the optional technical schemes, and a specific process for training the dialogue language model is given.
Optionally, before extracting the semantic feature code from the dialog corpus, the method further includes: performing mask processing on at least one set field vocabulary in the dialogue corpus according to a set mask processing strategy to update actual characters in the dialogue corpus to be mask characters; correspondingly, after the semantic feature coding, the segment feature coding and the position feature coding are input into the dialogue language model and an embedded vector is output, the method further comprises the following steps: outputting a predicted character corresponding to each character in the dialogue corpus based on a nonlinear character classifier according to the embedded vector; calculating a character loss relation according to actual characters corresponding to the characters in the dialogue corpus and the predicted characters; and carrying out optimization training on the dialogue language model according to the character loss relation. The advantage of this arrangement is that the training intensity of the dialogue language model in the set field is improved by adding the mask code in the dialogue corpus by adopting the self-supervision training mode.
Optionally, the performing optimization training on the dialogue language model according to the role loss relationship, and performing optimization training on the dialogue language model according to the character loss relationship includes: calculating a total loss relation according to the role loss relation and the character loss relation; and carrying out optimization training on the dialogue language model according to the total loss relation. The advantage of this arrangement is that when the loss relation is calculated, the element of the character loss relation is introduced, and the dialogue language model is trained more comprehensively.
Optionally, the inputting the semantic feature coding, the segment feature coding, and the position feature coding into the dialogue language model and outputting the embedded vector includes: splicing the semantic feature codes, the segment feature codes and the position feature codes into input vectors; inputting the input vector into a dialogue language model, and outputting an embedded vector; wherein the character positions of the embedded vector and the input vector correspond to each other. The advantage of this arrangement is that, by splicing, multiple feature codes are combined and input into the dialogue language model, so that the resulting embedded vector contains relatively comprehensive feature information of the dialogue corpus.
As shown in fig. 2, a training method of a dialogue language model provided by the second embodiment of the present invention specifically includes the following steps:
s201, obtaining dialogue corpora as training samples.
Specifically, the dialogue corpus can be abstractly represented as

d = {(a_1, u_1), (a_2, u_2), ..., (a_k, u_k)}

where d represents a complete multi-turn dialogue, k represents the total number of turns of the multi-turn dialogue, (a_h, u_h) denotes the h-th turn of the multi-turn dialogue d, a_h is the content of the user with role a in the h-th turn, and u_h is the content of the user with role u in the h-th turn. a_h can be further expressed as

a_h = (a_h1, a_h2, ..., a_hj, ...)

where a_hj represents the j-th word of role a in the h-th turn. N is the total number of words of the dialogue corpus,

N = n_1 + n_2 + ... + n_S

where n_i is the total number of characters in the i-th segment and S is the number of segments.
S202, performing mask processing on at least one set domain vocabulary in the dialogue corpus according to a set mask processing strategy to update actual characters in the dialogue corpus to be mask characters.
Specifically, a part of the set-domain words in the dialogue corpus may be masked, i.e. at least one set-domain word is replaced with a mask, so that a dialogue corpus containing mask characters for set-domain words is obtained. The mask processing strategy may include the mask mode, such as whole-word masking or character masking, and information such as the selection ratio for mask processing. The set domain may be chosen according to the actual situation; for example, when the dialogue language model needs to be applied to the financial field, the set domain may be the financial field.
Optionally, performing mask processing on at least one set-domain word in the dialogue corpus according to a set mask processing strategy includes: identifying and determining set-domain words from the dialogue corpus; selecting, according to the selection ratio in the set mask processing strategy, words that meet the selection ratio from the dialogue corpus as words to be replaced, wherein the probability that a set-domain word is selected as a word to be replaced is greater than the probability that a non-set-domain word is selected as a word to be replaced; and, if the words to be replaced contain set-domain words, mask-processing the characters of those set-domain words with symbols or text to form mask characters. The advantage of this arrangement is that, by using the masking technique, an appropriate number of set-domain words in the training sample are replaced with mask characters, which improves the learning ability of the dialogue language model in the set domain.
Specifically, a preset method, such as a semantic recognition algorithm, may be used to recognize set-domain words in the dialogue corpus, and then a set number of words may be selected from the dialogue corpus as the words to be masked, i.e. the words to be replaced, according to the mask processing strategy. The set number corresponds to the selection ratio in the mask processing strategy, and the probability that set-domain words are selected as words to be replaced is higher than that of other words. The proportion of the words to be replaced that are actually replaced with mask characters can be set according to the actual situation, for example according to the complexity of the dialogue corpus, and the mask characters can be symbols, text characters or the like. When the words to be replaced are selected from the set-domain words, the selection can be random and is not limited here.
For example, if the selection ratio in the set mask processing strategy is 15%, the set-domain vocabulary is financial vocabulary, and the dialogue corpus contains 1000 words, the set-domain words in the dialogue corpus may be recognized first, and words are then selected from the 1000 words according to the set mask processing strategy, with set-domain words having a higher probability of being selected and other words a lower probability. The characters of the selected words are masked with symbols or text to form mask characters: for example, 80% of the words to be replaced are replaced with the mask symbol, 10% are replaced with other text, and the remaining 10% are left unchanged. For the words that are not actually replaced, the masked position is still recorded, and the result is taken into account when calculating the loss relation. For example, if the content of role a in the h-th turn is expressed as

a_h = (a_h1, a_h2, a_h3, a_h4, ...)

then masking the 2nd and 3rd characters yields

a_h = (a_h1, [MASK], [MASK], a_h4, ...)

where [MASK] is a mask character.
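A sketch of this masking strategy follows; the 15% selection ratio and the 80/10/10 split follow the example above, while the way set-domain characters are up-weighted during selection is an illustrative assumption:

```python
import random

def mask_corpus(chars, domain_char_indices, select_ratio=0.15, domain_boost=3.0):
    """Mask characters, preferring characters that belong to set-domain vocabulary.

    chars:               list of characters of the dialogue corpus
    domain_char_indices: set of character indices covered by set-domain words
    Returns the masked character list and the positions that were selected.
    """
    n_select = max(1, int(len(chars) * select_ratio))
    # set-domain characters get a higher probability of being selected
    weights = [domain_boost if i in domain_char_indices else 1.0 for i in range(len(chars))]
    selected = sorted(set(random.choices(range(len(chars)), weights=weights, k=n_select)))

    masked = list(chars)
    for i in selected:
        r = random.random()
        if r < 0.8:
            masked[i] = "[MASK]"              # replace with the mask symbol
        elif r < 0.9:
            masked[i] = random.choice(chars)  # replace with another text character
        # else: keep the original character, but its position is still recorded
    return masked, selected
```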
S203, extracting semantic feature codes, segment feature codes and position feature codes from the dialogue corpus.
Optionally, the semantic feature codes are used to characterize the semantic feature of each character. The semantic feature code of each character is recorded as e^t_ij, and the semantic feature codes of all characters are collected in a character embedding table, recorded as E^t ∈ R^(V×d), where e denotes the feature code at one position, i denotes the segment number in the dialogue corpus, j denotes the character number within the i-th segment, the superscript t denotes semantic feature coding, d represents the embedding dimension, R represents the real vector space, and V represents the vocabulary size.
Specifically, the text segments in the dialogue corpus may be encoded to obtain the semantic feature codes and the corresponding character embedding table.
Optionally, the segment feature codes are used to characterize the segment-order feature of each text segment. The segment feature code of each text segment is recorded as e^s_i, and all segment feature codes are collected in a segment embedding table, recorded as E^s ∈ R^(S×d), where S represents the total number of text segments of the dialogue corpus and the superscript s denotes segment feature coding.
Specifically, the text segments in the dialogue corpus may be encoded to obtain the segment feature codes and the corresponding segment embedding table.
Optionally, the position feature codes are used to characterize the position feature of each character. The position feature code of each character is recorded as e^p_ij, and all position feature codes are collected in a position embedding table, recorded as E^p ∈ R^(N×d), where N represents the total number of positions of the feature codes, i.e. the total number of words of the dialogue corpus, and the superscript p denotes position feature coding.
Specifically, the text segments in the dialogue corpus may be encoded to obtain the position feature codes and the corresponding position embedding table.
And S204, splicing the semantic feature codes, the segment feature codes and the position feature codes into input vectors.
Specifically, the splicing manner may be a preset manner, such as end-to-end splicing or other combination manners, which is not limited herein.
Illustratively, as described above, if the semantic feature code is recorded as e^t_ij, the segment feature code as e^s_i, and the position feature code as e^p_ij, then the input vector e_ij is obtained by splicing e^t_ij, e^s_i and e^p_ij, where e_ij is the input vector.
And S205, inputting the input vector into the dialogue language model, and outputting an embedded vector.
Wherein the character positions of the embedded vector and the input vector correspond to each other.
Illustratively, if the dialogue language model is a Transformer model, then after the processing of the model the output embedded vector may be E_ij = transformer(e_ij), where E_ij ∈ R^d and E_ij is the embedded vector. The character positions of the embedded vectors and of the input vectors correspond to each other through the indices i, j; for example, for the input e_11, the output embedded vector is E_11.
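A minimal sketch of S203-S205 in one module: the three embedding tables E^t, E^s and E^p are looked up, the codes are spliced, and a Transformer encoder produces the embedded vectors. Splicing by concatenation followed by a linear projection is an assumption; the patent leaves the splicing manner open, and summation would also fit:

```python
import torch
import torch.nn as nn

class DialogueEncoder(nn.Module):
    """Maps the three feature codes to embedded vectors with a Transformer encoder."""

    def __init__(self, vocab_size, max_segments, max_positions, d_model=256):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, d_model)    # character embedding table E^t, V x d
        self.seg_emb = nn.Embedding(max_segments, d_model)   # segment embedding table  E^s, S x d
        self.pos_emb = nn.Embedding(max_positions, d_model)  # position embedding table E^p, N x d
        self.splice = nn.Linear(3 * d_model, d_model)        # fuse the spliced feature codes
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, token_ids, segment_ids, position_ids):
        # token_ids, segment_ids, position_ids: (batch, seq_len)
        e = torch.cat([self.char_emb(token_ids),
                       self.seg_emb(segment_ids),
                       self.pos_emb(position_ids)], dim=-1)   # input vectors e_ij
        e = self.splice(e)
        return self.encoder(e)                                # embedded vectors E_ij, (batch, seq_len, d)
```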
And S206, outputting the predicted character corresponding to each character in the dialogue corpus based on a nonlinear character classifier according to the embedded vector.
Specifically, in this embodiment, if a mask character exists in a text segment corresponding to an embedded vector, the embedded vector is input to a nonlinear character classifier, and then the output predicted character is a prediction of an actual character, not a prediction of a mask character. In the training process of the dialogue language model, the positions of the mask characters in the training sample are known information, and the positions of the mask characters can be recorded when the input characters are subjected to mask processing.
And S207, calculating a character loss relation according to the actual character corresponding to each character in the dialogue corpus and the predicted character.
Optionally, calculating the character loss relation according to the actual character corresponding to each character in the dialogue corpus and the predicted character includes: calculating the character loss relation according to the following formula:

L_1(θ, θ_1) = Σ_{k=1..M} log p^1_k(m_k = m̂_k | θ, θ_1), with p^1_k = softmax(E_ij (E^t)^T + b_1)[m̂_k]

wherein L_1() is the character loss relation; p^1_k is the probability that the predicted character is the same as the actual character; m_k is the predicted character of the k-th mask character; θ is a parameter of the dialogue language model; θ_1 is a parameter of the nonlinear character classifier; M is the number of mask-processed characters in the input sequence; S is the total number of segments of the dialogue corpus; m̂_k is the actual character of the k-th mask character, with m̂_k ∈ [1, V], i.e. the position of the actual character of a mask character takes values over the vocabulary range; softmax(E_ij (E^t)^T + b_1)[m̂_k] represents the predicted value for the masked character m̂_k; b_1 is a bias parameter of the nonlinear character classifier; (E^t)^T is the transpose of the character embedding table; E_ij is the feature code at the j-th position in the input vector of the i-th segment; and softmax() is the function of the nonlinear character classifier. The advantage of this arrangement is that the character loss relation is expressed in the form of a loss function, and the training effect of the dialogue language model can be intuitively judged from the value of the loss function.
Specifically, the expression of the character loss function means that, when the parameter of the dialogue language model is θ and the parameter of the nonlinear character classifier is θ_1, the logarithm of the probability that the predicted character and the actual character of each mask character are the same is taken and the results are accumulated. The indices used for m_k and E_ij are single-dimension position representations: the multi-dimension position representation of the dialogue corpus and of the output of the nonlinear character classifier can be converted into a single-dimension position representation by ordering the characters or outputs in a set manner without distinguishing different text segments. For example, if the 1st text segment has 10 characters, the position representation of the 1st character of the 2nd segment can be converted into the position representation of the 11th character.
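A sketch of computing the character loss at the masked positions, following the softmax(E_ij (E^t)^T + b_1) form reconstructed above; the tensor shapes and the sign convention (minimizing the negative sum of log-probabilities) are assumptions:

```python
import torch
import torch.nn.functional as F

def character_loss(embeddings, char_emb_table, b1, masked_positions, actual_char_ids):
    """Character loss relation over the mask-processed positions.

    embeddings:       (seq_len, d) embedded vectors E_ij output by the dialogue language model
    char_emb_table:   (V, d) character embedding table E^t
    b1:               (V,) bias parameter b_1 of the nonlinear character classifier
    masked_positions: LongTensor of indices of the masked characters
    actual_char_ids:  LongTensor of the actual characters at those positions
    """
    logits = embeddings[masked_positions] @ char_emb_table.T + b1   # (M, V)
    log_probs = F.log_softmax(logits, dim=-1)
    # negative sum of the log-probabilities of the actual characters
    picked = log_probs[torch.arange(masked_positions.numel()), actual_char_ids]
    return -picked.sum()
```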
And S208, outputting the prediction role to which at least one text segment belongs based on the nonlinear role classifier according to the embedded vector.
S209, calculating a role loss relation with the predicted role according to the actual role to which the text fragment in the dialogue corpus belongs.
Optionally, calculating the role loss relation according to the actual role to which each text segment in the dialogue corpus belongs and the predicted role includes: calculating the role loss relation according to the following formula:

L_2(θ, θ_2) = Σ_{i=1..S} log p^3_i(r_i = r̂_i | θ, θ_2), with p^3_i = sigmoid(W_3^T E_i1 + b_3)

wherein L_2() is the role loss relation; p^3_i is the probability that the predicted role is the same as the actual role; r_i is the predicted role of the i-th text segment; θ is a parameter of the dialogue language model; θ_2 is a parameter of the nonlinear role classifier, recorded as θ_2 = [W_3 ∈ R^(d×1), b_3], where W_3 ∈ R^(d×1) is the matrix parameter of the nonlinear role classifier and b_3 is its bias parameter; S is the total number of segments of the dialogue corpus and i is the text segment number; r̂_i is the actual role of the i-th segment, and the actual role of a text segment takes one of two different values; E_i1 is the feature code at the first position in the input vector of the i-th segment; and sigmoid() is the function of the nonlinear role classifier. The advantage of this arrangement is that the role loss relation is expressed in the form of a loss function, and the training effect of the dialogue language model can be intuitively judged from the value of the loss function.
Specifically, the expression of the role loss function means that, when the parameter of the dialogue language model is θ and the parameter of the nonlinear role classifier is θ_2, the logarithm of the probability that the predicted role and the actual role of each text segment are the same is taken and the results are accumulated.
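Correspondingly, a sketch of the role loss relation over the S text segments, with the actual roles assumed to be encoded as 0 and 1:

```python
import torch

def role_loss(first_pos_embeddings, w3, b3, actual_roles):
    """Role loss relation over the S text segments.

    first_pos_embeddings: (S, d) embedded vectors E_i1 at the first position of each segment
    w3:                   (d, 1) matrix parameter W_3 of the nonlinear role classifier
    b3:                   scalar bias parameter b_3
    actual_roles:         (S,) actual role of each segment, encoded as 0 or 1
    """
    p_role1 = torch.sigmoid(first_pos_embeddings @ w3 + b3).squeeze(-1)  # p_i^3 for role 1
    # probability that the predicted role is the same as the actual role
    p_correct = torch.where(actual_roles == 1, p_role1, 1.0 - p_role1)
    return -torch.log(p_correct).sum()
```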
S210, calculating a total loss relation according to the role loss relation and the character loss relation.
Specifically, in this embodiment, the element of the character loss relation is added, and the total loss relation can be calculated from the role loss relation and the character loss relation. The manner of calculating the total loss relation can be set as required, such as summation or weighted summation, and is not limited here.
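The total loss relation could then be, for example, a plain or weighted sum of the two loss relations; the weights below are assumptions:

```python
def total_loss(role_loss_value, char_loss_value, alpha=1.0, beta=1.0):
    """Total loss relation as a weighted sum of the role loss and the character loss."""
    return alpha * role_loss_value + beta * char_loss_value
```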
And S211, carrying out optimization training on the dialogue language model according to the total loss relation.
The second embodiment of the invention provides a training method of a dialogue language model. Set-domain words in the dialogue corpus of a training sample formed by the dialogue corpus are mask-processed; semantic feature codes, segment feature codes and position feature codes are extracted from the mask-processed dialogue corpus and input into the dialogue language model to output embedded vectors; the predicted roles and predicted characters of the text segments in the dialogue corpus are obtained based on a nonlinear role classifier and a nonlinear character classifier; the role loss relation and the character loss relation are calculated from the actual and predicted roles and from the actual and predicted characters; and the dialogue language model is optimally trained according to the two loss relations. By mask-processing the set-domain words in the dialogue corpus and having the dialogue language model predict the mask characters, the dialogue language model gains the ability to extract semantics from context information, and its semantic expression capability for set-domain words is improved. Combining the role features and the corpus features of the dialogue corpus in training further ensures the effect of the dialogue language model on dialogue data.
EXAMPLE III
Fig. 3 is a schematic structural diagram of a training apparatus for a dialogue language model according to a third embodiment of the present invention. As shown in fig. 3, the apparatus includes: a training sample obtaining module 301, an encoding extraction module 302, an embedded vector determination module 303, a predicted role determination module 304, a calculation module 305, and a training module 306, wherein:
the training sample acquisition module is used for acquiring dialogue corpora as training samples; the dialogue corpus comprises at least one round of dialogue of at least two roles, and the language text of each role in one round of dialogue is used as a text fragment;
the code extraction module is used for extracting semantic feature codes, segment feature codes and position feature codes from the dialogue corpus;
the embedded vector determining module is used for inputting the semantic feature codes, the segment feature codes and the position feature codes into a dialogue language model and outputting embedded vectors;
the predicted role determining module is used for outputting a predicted role to which at least one text segment belongs based on a nonlinear role classifier according to the embedded vector;
the calculation module is used for calculating a role loss relation with the predicted role according to the actual role to which the text fragment in the dialogue corpus belongs;
and the training module is used for carrying out optimization training on the dialogue language model according to the role loss relation.
The training device of the dialogue language model provided by the embodiment of the invention inputs semantic feature codes, fragment feature codes and position feature codes extracted from a training sample consisting of dialogue linguistic data into the dialogue language model, outputs an embedded vector, obtains a predicted role to which a text fragment in the dialogue linguistic data belongs based on a nonlinear role classifier and the embedded vector, calculates a role loss relation according to an actual role and the predicted role to which the text fragment belongs, and performs optimization training on the dialogue language model according to the relation.
Optionally, the apparatus further includes:
the mask processing module is used for performing mask processing on at least one set domain vocabulary in the dialogue corpus according to a set mask processing strategy before extracting semantic feature codes from the dialogue corpus so as to update actual characters in the dialogue corpus to be mask characters;
the predicted character output module is used for outputting the predicted characters corresponding to the characters in the dialogue corpus on the basis of a nonlinear character classifier according to the embedded vector after the semantic feature coding, the segment feature coding and the position feature coding are input into a dialogue language model and an embedded vector is output;
the calculation module is used for calculating a character loss relation according to the actual character corresponding to each character in the dialogue corpus and the predicted character;
and the training module is used for performing optimization training on the dialogue language model according to the character loss relation.
Optionally, the apparatus includes:
the calculation module is used for calculating a total loss relation according to the role loss relation and the character loss relation;
and the training module is used for carrying out optimization training on the dialogue language model according to the total loss relation.
Optionally, the mask processing module includes:
a domain vocabulary determining unit for identifying and determining the set domain vocabulary from the dialogue corpus;
the vocabulary to be replaced determining unit is used for selecting the vocabulary which accords with the selection proportion from the dialogue corpus as the vocabulary to be replaced according to the selection proportion in the set mask processing strategy, wherein the probability that the set field vocabulary is selected as the vocabulary to be replaced is greater than the probability that the non-set field vocabulary is selected as the vocabulary to be replaced;
and the mask processing unit is used for performing mask processing on one or more characters of the set field vocabulary in the vocabulary to be replaced by using symbols or texts to form mask characters if the vocabulary to be replaced contains the set field vocabulary.
Optionally, the embedded vector determining module includes:
the splicing unit is used for splicing the semantic feature codes, the segment feature codes and the position feature codes into input vectors;
an embedded vector determination unit for inputting the input vector into a dialogue language model and outputting an embedded vector; wherein the character positions of the embedded vector and the input vector correspond to each other.
Optionally, the semantic feature codes are used to characterize the semantic feature of each character; the semantic feature code of each character is recorded as e^t_ij, and the semantic feature codes of all characters are collected in a character embedding table, recorded as E^t ∈ R^(V×d), where e denotes the feature code at one position, i denotes the segment number in the dialogue corpus, j denotes the character number within the i-th segment, the superscript t denotes semantic feature coding, d represents the embedding dimension, R represents the real vector space, and V represents the vocabulary size. The segment feature codes are used to characterize the segment-order feature of each text segment; the segment feature code of each text segment is recorded as e^s_i, and all segment feature codes are collected in a segment embedding table, recorded as E^s ∈ R^(S×d), where S represents the total number of text segments of the dialogue corpus and the superscript s denotes segment feature coding. The position feature codes are used to characterize the position feature of each character; the position feature code of each character is recorded as e^p_ij, and all position feature codes are collected in a position embedding table, recorded as E^p ∈ R^(N×d), where N represents the total number of positions of the feature codes and the superscript p denotes position feature coding.
Optionally, the calculation module includes:
the first calculating unit is used for calculating the role loss relation according to the actual role to which each text segment in the dialogue corpus belongs and the predicted role according to the following formula:

L_2(θ, θ_2) = Σ_{i=1..S} log p^3_i(r_i = r̂_i | θ, θ_2), with p^3_i = sigmoid(W_3^T E_i1 + b_3)

wherein L_2() is the role loss relation; p^3_i is the probability that the predicted role is the same as the actual role; r_i is the predicted role of the i-th text segment; θ is a parameter of the dialogue language model; θ_2 is a parameter of the nonlinear role classifier, recorded as θ_2 = [W_3 ∈ R^(d×1), b_3], where W_3 ∈ R^(d×1) is the matrix parameter of the nonlinear role classifier and b_3 is its bias parameter; S is the total number of segments of the dialogue corpus and i is the segment number; r̂_i is the actual role of the i-th segment, and the actual role of a text segment takes one of two different values; E_i1 is the feature code at the first position in the input vector of the i-th segment; and sigmoid() is the function of the nonlinear role classifier.
Optionally, the calculation module includes:
a second calculating unit, configured to calculate the character loss relation according to the actual character corresponding to each character in the dialogue corpus and the predicted character according to the following formula:

L_1(θ, θ_1) = Σ_{k=1..M} log p^1_k(m_k = m̂_k | θ, θ_1), with p^1_k = softmax(E_ij (E^t)^T + b_1)[m̂_k]

wherein L_1() is the character loss relation; p^1_k is the probability that the predicted character is the same as the actual character; m_k is the predicted character value of the k-th mask character; θ is a parameter of the dialogue language model; θ_1 is a parameter of the nonlinear character classifier; M is the number of mask-processed characters in the input sequence; S is the total number of segments of the dialogue corpus; m̂_k is the actual character value of the k-th mask character, with m̂_k ∈ [1, V], i.e. the position of the actual character of a mask character takes values over the vocabulary range; softmax(E_ij (E^t)^T + b_1)[m̂_k] represents the predicted value for the masked character m̂_k; b_1 is a bias parameter of the nonlinear character classifier; (E^t)^T is the transpose of the character embedding table; E_ij is the feature code at the j-th position in the input vector of the i-th segment; and softmax() is the function of the nonlinear character classifier.
The training device of the dialogue language model provided by the embodiment of the invention can execute the training method of the dialogue language model provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example four
FIG. 4 shows a schematic block diagram of an electronic device that may be used to implement an embodiment of the invention.
As shown in fig. 4, the electronic apparatus includes a processor 40, a memory 41, an input device 42, and an output device 43; the number of the processors 40 in the electronic device may be one or more, and one processor 40 is taken as an example in fig. 4; the processor 40, the memory 41, the input device 42 and the output device 43 in the electronic apparatus may be connected by a bus or other means, and the connection by the bus is exemplified in fig. 4.
The memory 41 is used as a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the training method of the dialogue language model in the embodiment of the present invention (for example, the training sample acquisition module 301, the code extraction module 302, the embedded vector determination module 303, the predicted role determination module 304, the calculation module 305, and the training module 306 in the training apparatus of the dialogue language model). The processor 40 executes various functional applications of the device/terminal/server and data processing by running software programs, instructions and modules stored in the memory 41, that is, implements the above-described training method of the dialogue language model.
The memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 41 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 41 may further include memory located remotely from processor 40, which may be connected to devices/terminals/servers via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 42 may be used to receive input numeric or character information and to generate key signal inputs associated with a dialog corpus of the electronic device. The output device 43 may include a display device such as a display screen.
EXAMPLE five
An embodiment of the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method for training a dialogue language model, the method including:
obtaining dialogue corpora as training samples; the dialogue corpus comprises at least one round of dialogue of at least two roles, and language text of each role in one round of dialogue is used as a text fragment;
extracting semantic feature codes, segment feature codes and position feature codes from the dialogue corpus;
inputting the semantic feature codes, the segment feature codes and the position feature codes into a dialogue language model and outputting embedded vectors;
outputting a predicted role to which at least one text segment belongs based on a nonlinear role classifier according to the embedded vector;
calculating a role loss relation with the predicted role according to the actual role to which the text segment in the dialogue corpus belongs;
and carrying out optimization training on the dialogue language model according to the role loss relation.
Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in the training method of the dialog language model provided by any embodiments of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the above apparatus, each included unit and module is merely divided according to functional logic, but the division is not limited to the above as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for convenience of distinguishing them from each other, and are not used to limit the protection scope of the present invention.
It is to be noted that the foregoing description is only exemplary of the invention and that the principles of the technology may be employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (11)

1. A method for training a conversational language model, comprising:
obtaining dialogue corpora as training samples; the dialogue corpus comprises at least one round of dialogue of at least two roles, and the language text of each role in one round of dialogue is used as a text fragment;
extracting semantic feature codes, segment feature codes and position feature codes from the dialogue corpus;
inputting the semantic feature codes, the segment feature codes and the position feature codes into a dialogue language model and outputting embedded vectors;
outputting a predicted role to which at least one text segment belongs based on a nonlinear role classifier according to the embedded vector;
calculating a role loss relation with the predicted role according to the actual role to which the text segment in the dialogue corpus belongs;
and carrying out optimization training on the dialogue language model according to the role loss relation.
2. The method according to claim 1, further comprising, before extracting semantic feature codes from the dialog corpus:
performing mask processing on at least one set domain vocabulary in the dialogue corpus according to a set mask processing strategy to update actual characters in the dialogue corpus to be mask characters;
correspondingly, after the semantic feature coding, the segment feature coding and the position feature coding are input into the dialogue language model and an embedded vector is output, the method further comprises the following steps:
according to the embedded vector, outputting a predicted character corresponding to each character in the dialogue corpus based on a nonlinear character classifier;
calculating a character loss relation according to actual characters corresponding to the characters in the dialogue corpus and the predicted characters;
and carrying out optimization training on the dialogue language model according to the character loss relation.
3. The method of claim 2, wherein performing optimization training on the dialogue language model according to the role loss relation and performing optimization training on the dialogue language model according to the character loss relation comprises:
calculating a total loss relation according to the role loss relation and the character loss relation;
and carrying out optimization training on the dialogue language model according to the total loss relation.
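As a concrete reading of claim 3, the total loss relation could be formed as a weighted sum of the two loss relations; the equal default weighting below is an assumption, since the claim only requires that the total loss be calculated from both.

```python
def total_loss(role_loss, character_loss, alpha=1.0):
    # Equal weighting (alpha=1.0) is an assumption; the claim only states that the
    # total loss relation is calculated from the role and character loss relations.
    return role_loss + alpha * character_loss
```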
4. The method according to claim 2, wherein masking at least one domain-setting vocabulary in the dialog corpus according to a mask-setting policy comprises:
identifying and determining a set domain vocabulary from the dialogue corpus;
selecting, according to a selection proportion in the set mask processing strategy, vocabularies from the dialogue corpus that meet the selection proportion as vocabularies to be replaced, wherein the probability that a set domain vocabulary is selected as a vocabulary to be replaced is greater than the probability that a non-set-domain vocabulary is selected as a vocabulary to be replaced;
if the vocabularies to be replaced contain a set domain vocabulary, performing mask processing on one or more characters of the set domain vocabulary in the vocabularies to be replaced by using symbols or texts to form mask characters.
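A possible realization of the selection step of claim 4 is sketched below; the selection ratio, the boost factor for set-domain vocabulary, and the "[MASK]" symbol are assumptions, and whole tokens are masked here for simplicity although the claim also allows masking individual characters within a vocabulary item.

```python
# Sketch of the masking strategy of claim 4 (ratio, boost factor and mask symbol assumed).
import random

def mask_corpus(tokens, domain_vocab, select_ratio=0.15, domain_boost=3.0, mask_token="[MASK]"):
    """Select roughly select_ratio of the tokens, preferring set-domain vocabulary,
    and replace the selected tokens with a mask symbol."""
    weights = [domain_boost if tok in domain_vocab else 1.0 for tok in tokens]
    k = max(1, int(len(tokens) * select_ratio))
    # Weighted sampling: set-domain vocabulary is more likely to be chosen for replacement.
    chosen = set(random.choices(range(len(tokens)), weights=weights, k=k))
    return [mask_token if i in chosen else tok for i, tok in enumerate(tokens)]
```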
5. The method of claim 1, wherein inputting the semantic feature codes, the segment feature codes and the position feature codes into the dialogue language model and outputting an embedded vector comprises:
splicing the semantic feature codes, the segment feature codes and the position feature codes into input vectors;
inputting the input vector into a dialogue language model, and outputting an embedded vector; wherein the character positions of the embedded vector and the input vector correspond to each other.
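One common way to realize the splicing of claim 5, assumed here in BERT style, is an element-wise sum of the three per-character encodings, which automatically keeps the character positions of the input vector and of the embedded vector aligned; the claim itself does not prescribe this summation form.

```python
# Sketch of claim 5 (element-wise summation of the three codes is an assumption).
import torch

def build_input_vector(token_emb, segment_emb, position_emb):
    # All three tensors share the shape (seq_len, d), so the sum preserves the
    # one-to-one correspondence between character positions and output positions.
    return token_emb + segment_emb + position_emb
```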
6. The method according to any one of claims 1 to 5, wherein:
the semantic feature codes are used for representing the semantic features of all the characters, and the semantic feature codes of all the characters are recorded as
Figure FDA0003823405870000021
Adding semantic feature codes of each character into a character embedding table, and marking as E t ∈R V×d (ii) a Wherein e represents a feature code of a position, i represents a segment serial number of the dialogue corpus, j represents a character serial number in the ith segment, superscript t represents a semantic feature code, d represents the dialogue corpus, R represents a vector, and V represents a word list size;
the segment feature codes are used for representing segment sequence features of the text segments, and the segment feature codes of each text segment are recorded as
Figure FDA0003823405870000022
Adding each segment feature code into a segment embedding table, and marking as E s ∈R S×d (ii) a Wherein S represents the total text fragment number of the dialogue corpus; the superscript s represents segment feature coding;
the position feature codes are used for representing the position features of the characters, and the position feature codes of each character are recorded as
Figure FDA0003823405870000031
Adding each position feature code into a position embedding table, and recording E p ∈R N×d (ii) a Where N represents the total number of positions of the feature code and the superscript p represents the position feature code.
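Under the shapes stated in claim 6, the three tables can be read as learnable lookup tables of sizes V×d, S×d and N×d; the sketch below uses assumed PyTorch nn.Embedding modules and returns the three feature codes separately so that they can then be spliced as in claim 5.

```python
# Sketch of the embedding tables of claim 6: E_t ∈ R^{V×d}, E_s ∈ R^{S×d}, E_p ∈ R^{N×d}.
import torch
import torch.nn as nn

class DialogueEmbeddings(nn.Module):
    def __init__(self, vocab_size_V, num_segments_S, num_positions_N, d):
        super().__init__()
        self.char_table = nn.Embedding(vocab_size_V, d)         # E_t: semantic feature codes
        self.segment_table = nn.Embedding(num_segments_S, d)    # E_s: segment feature codes
        self.position_table = nn.Embedding(num_positions_N, d)  # E_p: position feature codes

    def forward(self, token_ids, segment_ids, position_ids):
        # Look up e_{ij}^t, e_i^s and e_{ij}^p for every character of every segment.
        return (self.char_table(token_ids),
                self.segment_table(segment_ids),
                self.position_table(position_ids))
```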
7. The method of claim 6, wherein calculating the role loss relation with the predicted role according to the actual role to which the text segment in the dialogue corpus belongs comprises:
calculating the role loss relation according to the actual role to which each text segment in the dialogue corpus belongs and the predicted role by the following formula:
L_2(θ, θ_2) = -(1/S) Σ_{i=1}^{S} log P_i^3
wherein: L_2() is the role loss relation; P_i^3 is the probability that the predicted role is the same as the actual role; r̂_i is the predicted role of the ith text segment; θ is a parameter of the dialogue language model; θ_2 is a parameter of the nonlinear role classifier, recorded as [W_3 ∈ R^{d×1}, b_3]; W_3 ∈ R^{d×1} is the matrix parameter of the nonlinear role classifier, and b_3 is its bias parameter; S is the total number of segments of the dialogue corpus, and i is the segment serial number; r_i is the actual role of the ith segment, and the actual role of a text segment takes one of two different values, r_i ∈ {0, 1}, so that P_i^3 = r̂_i when r_i = 1 and P_i^3 = 1 - r̂_i when r_i = 0; r̂_i = sigmoid(E_{i1} W_3 + b_3), where E_{i1} is the feature encoding of the first position in the input vector of the ith segment; sigmoid() is the function of the nonlinear role classifier.
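A numerical sketch of the reconstructed role loss of claim 7 follows; the binary-cross-entropy form is assumed for the parts that were published only as figure references.

```python
# Sketch of the role loss L_2 of claim 7 (binary cross-entropy form assumed).
import torch

def role_loss(E_first, W3, b3, actual_roles):
    """E_first: (S, d) feature codes of the first position of each of the S segments.
    W3: (d, 1) matrix parameter and b3: bias of the nonlinear role classifier.
    actual_roles: (S,) tensor of 0/1 role labels."""
    pred = torch.sigmoid(E_first @ W3 + b3).squeeze(-1)    # predicted role r̂_i per segment
    p = torch.where(actual_roles == 1, pred, 1.0 - pred)   # P_i^3: prob. prediction matches truth
    return -torch.log(p).mean()                            # L_2 = -(1/S) Σ log P_i^3
```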
8. The method according to claim 6, wherein calculating a character loss relationship with the predicted character according to the actual character corresponding to each character in the dialog corpus comprises:
calculating the character loss relation according to the actual character corresponding to each character in the dialogue corpus and the predicted character by the following formula:
L_1(θ, θ_1) = -(1/M) Σ_{k=1}^{M} log P_k^1
wherein: L_1() is the character loss relation; P_k^1 is the probability that the predicted character is the same as the actual character; x̂_k is the predicted character value of the kth mask character; θ is a parameter of the dialogue language model; θ_1 is a parameter of the nonlinear character classifier; M is the number of mask-processed characters in the input sequence; S is the total number of segments of the dialogue corpus; x_k is the actual character value of the kth mask character, and its value range is the word list range, x_k ∈ {1, …, V}; x̂_k = softmax(E_{ij} E_t^T + b_1) represents the predicted value for the masked character; b_1 is the bias parameter of the nonlinear character classifier; E_t^T is the transpose of the character embedding table; E_{ij} is the feature encoding of the jth position in the input vector of the ith segment; softmax() is the function of the nonlinear character classifier.
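For claim 8, the character loss reads as a masked-character cross-entropy whose output projection is tied to the character embedding table, as the E_t^T term suggests; the sketch below assumes that reading for the parts published only as figure references.

```python
# Sketch of the character loss L_1 of claim 8 (masked-character prediction with the
# output projection tied to the character embedding table E_t, as E_t^T suggests).
import torch
import torch.nn.functional as F

def character_loss(E_masked, E_t, b1, actual_chars):
    """E_masked: (M, d) embedded vectors at the M mask-processed positions.
    E_t: (V, d) character embedding table, reused (transposed) as the output projection.
    b1: (V,) bias of the nonlinear character classifier.
    actual_chars: (M,) word-list indices of the characters before masking."""
    logits = E_masked @ E_t.T + b1                 # softmax(E_ij · E_t^T + b_1) over the word list
    return F.cross_entropy(logits, actual_chars)   # L_1 = -(1/M) Σ log P_k^1
```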
9. An apparatus for training a conversational language model, comprising:
the training sample acquisition module is used for acquiring dialogue corpora as training samples; the dialogue corpus comprises at least one round of dialogue of at least two roles, and the language text of each role in one round of dialogue is used as a text fragment;
the code extraction module is used for extracting semantic feature codes, segment feature codes and position feature codes from the dialogue corpus;
the embedded vector determining module is used for inputting the semantic feature codes, the segment feature codes and the position feature codes into a dialogue language model and outputting embedded vectors;
the predicted role determining module is used for outputting a predicted role to which at least one text segment belongs based on a nonlinear role classifier according to the embedded vector;
the calculation module is used for calculating a role loss relation with the predicted role according to the actual role to which the text fragment in the dialogue corpus belongs;
and the training module is used for carrying out optimization training on the dialogue language model according to the role loss relation.
10. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for training a dialogue language model according to any one of claims 1-8.
11. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for training a dialogue language model according to any one of claims 1 to 8.
CN202211049973.5A 2022-08-30 2022-08-30 Training method, device, equipment and storage medium of dialogue language model Pending CN115391512A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211049973.5A CN115391512A (en) 2022-08-30 2022-08-30 Training method, device, equipment and storage medium of dialogue language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211049973.5A CN115391512A (en) 2022-08-30 2022-08-30 Training method, device, equipment and storage medium of dialogue language model

Publications (1)

Publication Number Publication Date
CN115391512A true CN115391512A (en) 2022-11-25

Family

ID=84125377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211049973.5A Pending CN115391512A (en) 2022-08-30 2022-08-30 Training method, device, equipment and storage medium of dialogue language model

Country Status (1)

Country Link
CN (1) CN115391512A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116204616A (en) * 2022-12-29 2023-06-02 南京百珏科技有限公司 Artificial intelligence question-answering method based on semantic training algorithm


Similar Documents

Publication Publication Date Title
CN111883115B (en) Voice flow quality inspection method and device
CN111108501A (en) Context-based multi-turn dialogue method, device, equipment and storage medium
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
KR20200119410A (en) System and Method for Recognizing Emotions from Korean Dialogues based on Global and Local Contextual Information
CN111737991B (en) Text sentence breaking position identification method and system, electronic equipment and storage medium
CN115019776A (en) Voice recognition model, training method thereof, voice recognition method and device
CN112767922B (en) Speech recognition method for contrast predictive coding self-supervision structure joint training
CN114360557B (en) Voice tone conversion method, model training method, device, equipment and medium
CN112434514B (en) Multi-granularity multi-channel neural network based semantic matching method and device and computer equipment
CN111027291A (en) Method and device for adding punctuation marks in text and training model and electronic equipment
CN111274412A (en) Information extraction method, information extraction model training device and storage medium
CN113553412A (en) Question and answer processing method and device, electronic equipment and storage medium
CN114420102B (en) Method and device for speech sentence-breaking, electronic equipment and storage medium
CN116341651A (en) Entity recognition model training method and device, electronic equipment and storage medium
CN115391512A (en) Training method, device, equipment and storage medium of dialogue language model
CN115132209A (en) Speech recognition method, apparatus, device and medium
CN113553847A (en) Method, device, system and storage medium for parsing address text
CN112989794A (en) Model training method and device, intelligent robot and storage medium
CN110767217A (en) Audio segmentation method, system, electronic device and storage medium
CN112818688B (en) Text processing method, device, equipment and storage medium
CN112002306B (en) Speech class recognition method and device, electronic equipment and readable storage medium
CN114818738A (en) Method and system for identifying user intention track of customer service hotline
CN113257239B (en) Voice recognition method and device, electronic equipment and storage medium
CN113378543B (en) Data analysis method, method for training data analysis model and electronic equipment
CN111899729B (en) Training method and device for voice model, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination