CN111291166A - Method and device for training language model based on Bert - Google Patents


Info

Publication number
CN111291166A
CN111291166A
Authority
CN
China
Prior art keywords
semantic
conversation
vector
model
symbol
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010384255.8A
Other languages
Chinese (zh)
Other versions
CN111291166B (en)
Inventor
刘佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010384255.8A priority Critical patent/CN111291166B/en
Publication of CN111291166A publication Critical patent/CN111291166A/en
Application granted granted Critical
Publication of CN111291166B publication Critical patent/CN111291166B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification of unstructured textual data
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods

Abstract

An embodiment of the present specification provides a method for training a Bert-based language model. The method includes: first, obtaining a historical session sample, which includes a plurality of dialogue sentences generated by a certain business session and a corresponding class label, the class label indicating whether the business purpose was achieved through that session; second, determining a semantic symbol sequence based on the plurality of dialogue sentences and inputting it into the language model to obtain an overall semantic vector; then, inputting the overall semantic vector into a business session classification model to obtain a predicted classification result; finally, adjusting the model parameters of the business session classification model and the language model based on the predicted classification result and the class label. By using the judgment of whether the purpose of the multi-turn dialogue was achieved as a training task, both the depth and the breadth of the trained language model's semantic understanding of multi-turn dialogue can be improved.

Description

Method and device for training language model based on Bert
Technical Field
The embodiment of the specification relates to the technical field of natural language processing, in particular to a method and a device for training a language model based on Bert.
Background
Language models are widely used in many fields, including machine translation, handwritten Chinese character recognition, and information retrieval. In current practice, usually only the plain text of the training corpus is considered, so the learned text semantics remain shallow.
This is especially limiting in multi-turn dialogue scenarios: if only the plain text of the multiple dialogue turns is fed into the language model as a whole during training, the trained model's semantic understanding of multi-turn dialogue is very limited and cannot meet higher requirements. For example, when the language model is applied to multi-turn dialogue interaction, a reply sentence must be generated from the dialogue content so far; if the model cannot adequately understand the semantics of that content, it cannot reliably generate replies that stay close to the dialogue topic and conform to human language habits.
Therefore, a solution is needed that enables the trained language model to accurately mine the deep semantics of multi-turn dialogue.
Disclosure of Invention
One or more embodiments of the present specification describe a method for training a Bert-based language model that enables the model to learn richer semantic representation information of multi-turn dialogue.
According to a first aspect, an embodiment of the present specification provides a method for training a Bert-based language model. The method includes: acquiring a historical session sample, wherein the historical session sample includes a plurality of dialogue sentences generated by a certain business session and a corresponding class label, the class label indicating whether the business purpose is achieved through the certain business session; determining a semantic symbol sequence based on the plurality of dialogue sentences; inputting the semantic symbol sequence into the language model to obtain an overall semantic vector; inputting the overall semantic vector into a business session classification model to obtain a predicted classification result; and adjusting model parameters of the business session classification model and the language model based on the predicted classification result and the class label.
In one embodiment, the session role of the certain service session includes a service party and a user, and the obtaining a historical session sample includes: acquiring the plurality of dialogue sentences and acquiring the service behavior data of the user after the certain service session; determining the class label according to the service behavior data based on a preset rule; and constructing the historical conversation sample based on the plurality of dialog sentences and the category labels.
In one embodiment, determining a sequence of semantic symbols based on the plurality of dialog statements comprises: adding a preset initial symbol before a first dialog statement and determining a plurality of semantic unit symbols corresponding to each dialog statement to obtain a semantic symbol sequence; wherein the semantic unit symbols are characters or words in the dialogue sentences.
In a specific embodiment, obtaining the semantic symbol sequence further includes: adding a predetermined separation symbol between every two adjacent dialogue sentences.
In a more specific embodiment, the historical conversation sample further includes a conversation role corresponding to each conversation statement, and the predetermined separation symbol belongs to a plurality of role indicators corresponding to a plurality of conversation roles; adding a predetermined separation symbol between every two adjacent dialog sentences, including: and adding a role indicator corresponding to the conversation role of each conversation statement before each conversation statement based on the corresponding relation between the conversation roles and the role indicators.
On the other hand, in a specific embodiment, inputting the semantic symbol sequence into the language model to obtain an overall semantic vector, includes: inputting the semantic symbol sequence into the language model to obtain a semantic expression vector corresponding to the position of each semantic symbol in the semantic symbol sequence; and determining the whole semantic vector based on the obtained semantic expression vector.
In a more specific embodiment, determining the overall semantic vector based on the obtained semantic representation vectors includes: taking the semantic representation vector at the position of the predetermined start symbol as the overall semantic vector; or calculating the average vector of the obtained semantic representation vectors as the overall semantic vector; or, over the obtained semantic representation vectors, determining the maximum among the element values at each vector element position to form the overall semantic vector.
In another more specific embodiment, obtaining the semantic symbol sequence further includes: from the determined semantic unit symbols corresponding to each dialogue sentence, randomly replacing a first number of semantic unit symbols with a first number of default symbols. After obtaining the semantic representation vector corresponding to the position of each semantic symbol in the semantic symbol sequence, the method further includes: inputting the semantic representation vectors corresponding to the first number of default symbol positions into a character prediction model, correspondingly obtaining a first number of predicted character results. Adjusting model parameters of the business session classification model and the language model based on the predicted classification result and the class label further includes: adjusting model parameters of the language model and the character prediction model based on the first number of semantic unit symbols and the predicted character results.
According to a second aspect, an embodiment of the present specification provides a training apparatus for a Bert-based language model, including: a sample acquisition unit configured to acquire a historical session sample, the historical session sample including a plurality of dialogue sentences generated by a certain business session and a corresponding class label, the class label indicating whether the business purpose is achieved through the certain business session; a sequence determination unit configured to determine a semantic symbol sequence based on the plurality of dialogue sentences; a vector prediction unit configured to input the semantic symbol sequence into the language model to obtain an overall semantic vector; a classification prediction unit configured to input the overall semantic vector into a business session classification model to obtain a predicted classification result; and a model parameter adjusting unit configured to adjust model parameters of the business session classification model and the language model based on the predicted classification result and the class label.
According to a third aspect, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and the processor, when executing the executable code, implements the method described in the first aspect.
In summary, in the training method and apparatus disclosed in the embodiments of the present specification, learning whether the goal of an entire dialogue was achieved is taken as a model training task, which improves the semantic comprehension ability of the Bert-based language model. In addition, the multiple dialogue sentences of a whole dialogue are separated by predetermined separators before being input into the model, which expands the number of sentences the language model can process from 2 to many. Further, by using role indicators that identify session roles as the predetermined separators, session role information is introduced, allowing the language model to learn more and better semantic information from multi-turn dialogue. In short, a language model trained in this way achieves a broader, deeper, and more effective semantic understanding of multi-turn dialogue.
Drawings
To more clearly illustrate the technical solutions of the embodiments disclosed in the present specification, the drawings used in describing the embodiments are briefly introduced below. The drawings described here are only some embodiments disclosed in the present specification; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 illustrates a schematic diagram of an implementation process for training a Bert-based language model, according to one embodiment;
FIG. 2 illustrates a flowchart of a method for training a Bert-based language model, according to one embodiment;
FIG. 3 illustrates a process diagram for joint training of a language model, a predictive classification model, and a character prediction model, according to one embodiment;
FIG. 4 shows a schematic structural diagram of a training apparatus for a Bert-based language model, according to one embodiment.
Detailed Description
Embodiments disclosed in the present specification are described below with reference to the accompanying drawings.
As mentioned above, a solution is needed that enables the trained language model to better learn and mine the semantic representation information of multi-turn dialogue. The embodiments of the present specification disclose a method for training a Bert-based language model. For ease of understanding, Bert is briefly described below.
Currently, the best deep semantic understanding model is Bert, introduced in 2018. Bert stands for Bidirectional Encoder Representations from Transformers, i.e., a bidirectional Transformer encoder. The basic building block of Bert is the Transformer encoder, yet Bert's semantic comprehension exceeds that of other models such as the plain Transformer; its pre-training process plays the decisive role in this. Note that most existing uses of Bert start from a pre-trained Bert base model (which may be pre-trained on a large general corpus, or obtained directly as an existing Bert base model) and then, for a specific application scenario, fine-tune the base model on training corpora from that scenario to obtain the model applied there.
In Bert's pre-training, two training tasks are introduced so that text semantics can be understood well. The first task is to predict words that have been removed (masked) from a sentence: some words in a sentence are masked out and the model is trained to predict them. In predicting a masked word, the model uses both the words before it and the words after it, building a bidirectional representation and thereby deepening semantic understanding. The second task takes two sentences as input and trains the model to judge whether they are adjacent in the original text, so that the model captures the semantic relationship between sentences. These two training tasks together realize Bert's pre-training and yield the pre-trained Bert base model.
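To make the usual workflow concrete, here is a minimal sketch of obtaining a pre-trained Bert base model for later fine-tuning. The patent names no library; the open-source HuggingFace transformers package and the bert-base-chinese checkpoint are assumptions used purely for illustration.

```python
# Illustrative sketch only: the patent names no library; HuggingFace
# `transformers` and the `bert-base-chinese` checkpoint are assumptions.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")  # pre-trained Bert base model

# Fine-tuning then attaches task-specific heads (e.g. the session
# classification model described below) and updates all parameters
# on in-domain dialogue corpora.
```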
When Bert is applied to a multi-turn dialogue scenario, the Bert base model needs to be fine-tuned on training corpora from that scenario to obtain the scenario's language model. In one approach, the multiple dialogue sentences of a whole dialogue are input into the Bert base model as one text, with predicting the removed words as the training task, yielding the fine-tuned language model. In such an approach the language model relies only on plain text information for semantic learning, so the semantic representation information it learns is relatively limited.
Based on this observation and analysis, the inventor proposes to exploit the particularity of multi-turn dialogue scenarios when setting the training task, so that the trained semantic model can better mine the effective semantic information of multi-turn dialogue. Specifically, the inventor observed that, in multi-turn dialogue scenarios, the dialogue is typically conducted for some purpose. For example, in a customer service scenario, a user converses with customer service in the hope of having a problem solved; in an insurance renewal scenario, an insurance salesperson converses with a user in the hope that the user renews a policy. Therefore, judging whether the purpose of the dialogue was achieved can serve as a training task that improves the language model's semantic comprehension ability.
Accordingly, the embodiments of the present specification disclose a method for training a Bert-based language model. FIG. 1 illustrates a schematic diagram of the training process according to one embodiment. As shown in FIG. 1, first, a historical session sample is obtained, which includes a plurality of dialogue sentences generated by a certain business session and a corresponding class label indicating whether the business purpose was achieved through that session. The semantic symbol sequence corresponding to the dialogue sentences is then input into the language model to obtain an overall semantic vector, which is in turn input into a business session classification model to obtain a predicted classification result. Finally, a classification loss is determined from the predicted classification result and the class label and used to adjust the model parameters of the business session classification model and the language model. In this way, using the judgment of whether the purpose of the dialogue was achieved as a training task effectively improves the trained language model's semantic understanding of multi-turn dialogue.
The following describes the implementation steps of the training method in conjunction with specific embodiments. Specifically, FIG. 2 is a flowchart of a method for training a Bert-based language model according to one embodiment; the method may be executed by any device, equipment, or server cluster with computing and processing capabilities. As shown in FIG. 2, the method comprises the following steps:
step S210, obtaining a historical conversation sample, wherein the historical conversation sample comprises a plurality of conversation sentences generated by a certain business conversation and corresponding class labels, and the class labels indicate whether the business purpose is achieved through the certain business conversation; step S220, determining a semantic symbol sequence based on the plurality of dialogue sentences; step S230, inputting the semantic symbol sequence into a language model to obtain an integral semantic vector; step S240, inputting the whole semantic vector into a business conversation classification model to obtain a prediction classification result; step S250, based on the prediction classification result and the category label, adjusting model parameters of the service session classification model and the language model.
The steps are as follows:
first, in step S210, a history session sample is obtained, which includes a plurality of dialog statements generated by a certain service session and corresponding class labels, where the class labels indicate whether a service purpose is achieved by the certain service session.
It should be understood that a business session may have multiple participants (i.e., multiple session roles), for example two, or three or more. As for dialogue sentences, under one definition each single message sent by any party is one dialogue sentence, which means two adjacent dialogue sentences may come from the same party; under another definition, several consecutive messages sent by one party (uninterrupted by messages from other parties) are merged into one dialogue sentence, which means two adjacent dialogue sentences always come from different participants. A minimal sketch of the second definition is given below.
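The following sketch merges consecutive messages from the same participant into one dialogue sentence; all names are hypothetical, not taken from the patent.

```python
def merge_messages(messages):
    """Merge consecutive messages from the same participant into a single
    dialogue sentence (the second definition above). `messages` is a
    chronological list of (role, text) pairs; names are hypothetical."""
    merged = []
    for role, text in messages:
        if merged and merged[-1][0] == role:
            merged[-1] = (role, merged[-1][1] + text)  # extend the current turn
        else:
            merged.append((role, text))                # start a new turn
    return merged

# Two consecutive user messages collapse into one dialogue sentence:
print(merge_messages([("user", "你好。"), ("user", "我想退货。"), ("agent", "好的。")]))
# [('user', '你好。我想退货。'), ('agent', '好的。')]
```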
In one embodiment, the service scenario corresponding to the certain service session is a customer service scenario, and the purpose of the customer service and the user dialog is to solve the user problem. In another embodiment, the service scenario corresponding to the certain service session is an insurance renewal scenario, the purpose of the interaction between the insurance service staff and the user is to hope the user to renew the insurance, and accordingly, the category tag may indicate whether the user is successfully renewed through the certain service session. In another embodiment, the service scenario corresponding to the certain service session is a collection scenario, the service staff has a dialog with the user for the purpose of expecting the user to pay, and accordingly, the category tag may indicate whether the user is successfully paid through the certain service session. In a further embodiment, the service scenario corresponding to the certain service session is a negotiation cooperation scenario, and the purpose of the conversation between the negotiators is to achieve cooperation, and accordingly, the category label may indicate whether the negotiators achieve cooperation through the certain service session.
Regarding obtaining the historical session sample: on the one hand, the dialogue sentences may be collected with the authorization of the session participants. In one embodiment, the plurality of dialogue sentences may be obtained from business system logs. In another embodiment, collection may be instrumented in the session terminal; for example, the session terminal may upload the dialogue sentences generated in a session in response to a session-end instruction.
On the other hand, the class label can be determined in various ways. In one embodiment, the session roles of the certain business session include a service party and a user; accordingly, the user's business behavior data after the session may be acquired and the class label determined from it in combination with a preset rule. In a specific embodiment, the service party may be a customer service agent (e.g., a human agent or a robot agent) or a salesperson (e.g., an insurance or collections salesperson). In a specific embodiment, the preset rule may state that, for a business session in the customer service scenario, if the user does not seek help again (e.g., does not initiate another service session) within a predetermined time after the session (e.g., one day or one week), the purpose of solving the user's problem is judged achieved, and otherwise not achieved. In another embodiment, the preset rule may state that, for a business session in the insurance renewal scenario, if the user renews after the session, the session's purpose is judged achieved, and otherwise not. In yet another embodiment, the preset rule may state that, for a business session in the collections scenario, if the user completes payment after the session, the purpose is judged achieved, and otherwise not. In this way, the class label can be determined by acquiring the user's business behavior data after a session and applying the preset rule; a minimal sketch of such a rule follows.
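A minimal sketch of one such preset rule, for the customer-service scenario; the seven-day window and all names are illustrative assumptions.

```python
from datetime import timedelta

def label_customer_service_session(session_end, next_session_start,
                                   window=timedelta(days=7)):
    """Hypothetical preset rule: if the user initiates no further session
    within `window` after this one, judge the business purpose (solving the
    user's problem) achieved (label 1); otherwise not achieved (label 0)."""
    if next_session_start is None or next_session_start - session_end > window:
        return 1
    return 0
```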
In another embodiment, an evaluation feature for the business session may be set up in advance, with its evaluation options determined by the type of class label, so that the class label can be determined from collected evaluation data. In a specific embodiment, in the customer service scenario, after the session ends the user is shown: "Was your question resolved? Option A: resolved; Option B: unresolved", and the user's choice is received as evaluation data for determining the class label; more specifically, if the user selects A, the business purpose is judged achieved, and if the user selects B, it is judged not achieved. In another specific embodiment, in the insurance renewal scenario, the user may be asked after the session ends: "Do you intend to renew? Option A: intend to renew; Option B: do not intend to renew." In yet another specific embodiment, in a multi-party negotiation scenario, each negotiating party may be shown after the session ends: "Are you willing to cooperate? Option A: willing to cooperate; Option B: not willing to cooperate", and the evaluation data is received for determining the class label; more specifically, if all negotiating parties select A, the business purpose is judged achieved, and otherwise not achieved.
In a further embodiment, the class label may be determined by manual labeling.
In this way, the historical session sample is obtained by collecting the dialogue sentences generated by the certain business session and determining the corresponding class label.
After the historical conversation sample is obtained, in step S220, a semantic symbol sequence is determined based on the plurality of dialog sentences.
Specifically, the semantic symbol sequence is obtained by adding a predetermined start symbol before the first dialogue sentence and determining a number of semantic unit symbols corresponding to each dialogue sentence. Note that the predetermined start symbol may be set manually, for example to [start], [CLS], or [C]. As for determining the semantic unit symbols: in one embodiment, the individual characters contained in each dialogue sentence may be taken as the semantic unit symbols; in another embodiment, each dialogue sentence may be segmented into words, and the resulting words taken as the semantic unit symbols. That is, a semantic unit symbol (token) may be a character or a word of a dialogue sentence.
In one embodiment, obtaining the semantic symbol sequence may further include adding a predetermined separation symbol between every two adjacent dialogue sentences, the separation symbol serving to delimit them. In one example, the predetermined separation symbol may be set manually, for example to [SEP] or [S]. In one possible example, the determined semantic symbol sequence is:
[CLS] Tok1_1 Tok1_2 [SEP] Tok2_1 Tok2_2 [SEP] Tok3_1 ...
where [CLS] denotes the predetermined start symbol, Tokx_x denotes a semantic unit symbol, and [SEP] denotes the predetermined separation symbol. There may be more than two [SEP] symbols, which expands the number of input sentences from two to more than two.
In another specific embodiment, the predetermined separation symbol not only separates two adjacent dialogue sentences but also identifies the session role of each dialogue sentence, thereby introducing dialogue role information into the training text and letting the language model learn more semantic information from the multi-turn dialogue. Specifically, the historical session sample further includes the session role corresponding to each dialogue sentence, and the predetermined separation symbol is drawn from a set of role indicators corresponding to the session roles. Accordingly, adding a predetermined separation symbol between every two adjacent dialogue sentences includes: based on the correspondence between session roles and role indicators, adding before each dialogue sentence the role indicator corresponding to its session role. Note that this correspondence may be preset manually. In one example, if the session roles are customer service and user, role indicators [Role1] and [Role2] may be assigned accordingly; in another example, if the session roles are three negotiating parties, role indicators [Role1], [Role2], and [Role3] may be assigned. In one possible example, the determined semantic symbol sequence is:
[C] [Role1] Tok1_1 [Role2] Tok2_1 Tok2_2 [Role1] Tok3_1 ...
where [C] denotes the predetermined start symbol, Tokx_x denotes a semantic unit symbol, and [Role1] and [Role2] denote role indicators for different session roles. In this way, session role information is introduced into the model input, letting the language model learn more semantic information from the multi-turn dialogue. A sketch of building such a sequence follows.
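The construction of such a sequence can be sketched as follows, using character-level semantic unit symbols; function names and the role mapping are illustrative assumptions.

```python
def build_symbol_sequence(dialogue, role_to_indicator=None):
    """Build the semantic symbol sequence from (role, sentence) pairs.
    With a role mapping, role indicators act as the separation symbols;
    without one, a plain [SEP] is inserted between adjacent sentences."""
    seq = ["[C]"] if role_to_indicator else ["[CLS]"]
    for i, (role, sentence) in enumerate(dialogue):
        if role_to_indicator:
            seq.append(role_to_indicator[role])   # e.g. "[Role1]"
        elif i > 0:
            seq.append("[SEP]")                   # plain separator variant
        seq.extend(sentence)                      # one symbol per character
    return seq

roles = {"user": "[Role1]", "agent": "[Role2]"}
print(build_symbol_sequence([("user", "你好"), ("agent", "在的")], roles))
# ['[C]', '[Role1]', '你', '好', '[Role2]', '在', '的']
```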
On the other hand, in the above training method, besides the dialogue-judgment task, predicting removed words may serve as another training task. Thus, in this step, part of the full set of semantic unit symbols corresponding to the dialogue sentences may be replaced by default symbols. Specifically, the method may further include: from the determined semantic unit symbols of each dialogue sentence, randomly replacing a first number of semantic unit symbols with the same number of default symbols, thereby obtaining the semantic symbol sequence. Note that the first number and the form of the default symbol may be set manually; for example, the first number may be 2 or 3, and the default symbol may be [N] or [Mask]. In addition, instead of replacing a semantic unit symbol with a default symbol, it may be replaced with a randomly drawn semantic unit symbol (for example, "sheep" is replaced with "road surface"), so that the language model performs error-correcting character prediction, which further improves its language comprehension. A sketch of this masking step follows.
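The random replacement step could be sketched as below; the 10% random-symbol rate and all names are illustrative assumptions, not values fixed by the patent.

```python
import random

SPECIAL = {"[C]", "[CLS]", "[SEP]", "[Role1]", "[Role2]", "[Role3]"}

def mask_symbols(symbols, first_number=2, vocab=None, random_rate=0.1):
    """Randomly replace `first_number` semantic unit symbols with the default
    symbol [Mask], keeping the originals as prediction targets. With `vocab`,
    a masked position may instead receive a random symbol (the
    error-correction variant). Sketch only; parameters are illustrative."""
    candidates = [i for i, s in enumerate(symbols) if s not in SPECIAL]
    targets = {}
    for i in random.sample(candidates, min(first_number, len(candidates))):
        targets[i] = symbols[i]
        if vocab and random.random() < random_rate:
            symbols[i] = random.choice(vocab)   # random replacement
        else:
            symbols[i] = "[Mask]"               # default symbol
    return symbols, targets
```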
Thus, the semantic symbol sequence can be determined from the plurality of dialog sentences. Next, in step S230, the semantic symbol sequence is input into the language model to obtain an overall semantic vector.
Specifically, after the semantic symbol sequence is input into the language model, the model encodes each semantic symbol (whether a predetermined start symbol, a predetermined separation symbol, or a semantic unit symbol) through multiple encoding passes; the encoding vector produced by the last pass serves as the semantic representation vector, also called the context representation vector, for that symbol's position. The overall semantic vector is then produced from the semantic representation vectors corresponding to the semantic symbol sequence.
In one embodiment, the semantic representation vector at the position of the predetermined start symbol may be taken as the overall semantic vector. It should be understood that the predetermined start symbol merely marks the start of the input and may be represented by any symbol; it carries no semantics of its own. However, the encoders in the language model introduce a self-attention mechanism, so the encoding vector at the predetermined start symbol's position can in fact represent the overall semantics of the other semantic symbols in the sequence. Accordingly, the semantic representation vector at that position can serve as the overall semantic vector representing the semantics of the whole semantic symbol sequence.
In another embodiment, the average vector of the semantic representation vectors may be used as the overall semantic vector; alternatively, the average may be taken over all semantic representation vectors except the one at the predetermined start symbol's position.
In yet another embodiment, the overall semantic vector may be determined from the semantic representation vectors by max pooling: for each vector element position, the maximum among the element values that the semantic representation vectors take at that position is determined, and these maxima are combined into the overall semantic vector.
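The three ways of deriving the overall semantic vector can be sketched as follows; PyTorch is an assumption, as the patent prescribes no framework.

```python
import torch

def overall_semantic_vector(reps: torch.Tensor, mode: str = "cls") -> torch.Tensor:
    """Collapse per-position semantic representation vectors, shape
    (seq_len, hidden) with position 0 being the predetermined start symbol,
    into one overall semantic vector. Illustrative sketch of the three options."""
    if mode == "cls":
        return reps[0]                  # vector at the start-symbol position
    if mode == "mean":
        return reps.mean(dim=0)         # average of all representation vectors
    if mode == "max":
        return reps.max(dim=0).values   # element-wise maximum (max pooling)
    raise ValueError(f"unknown mode: {mode}")
```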
In this way, the semantic symbol sequence is input into the language model to obtain an overall semantic vector representing the semantics of the plurality of dialogue sentences. Next, in step S240, the overall semantic vector is input into the business session classification model to obtain a predicted classification result. In one embodiment, the business session classification model may be implemented with a neural network; in a specific embodiment, it may include several feedforward neural network layers followed by a Softmax layer. In another embodiment, the classification model may be implemented with a support vector machine. Inputting the overall semantic vector into the business session classification model thus yields a predicted classification result, i.e., a prediction of whether the business purpose was achieved through the certain business session. A sketch of a feedforward classification head appears below.
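A minimal sketch of such a classification head; layer sizes and names are assumptions, as the patent specifies only feedforward layers followed by a Softmax layer.

```python
import torch.nn as nn

class SessionClassifier(nn.Module):
    """Business session classification head: feedforward layers followed by
    Softmax over two classes (purpose achieved / not achieved). Hidden size
    768 and the single hidden layer are illustrative choices."""
    def __init__(self, hidden: int = 768, num_classes: int = 2):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, overall_vector):
        return self.ff(overall_vector).softmax(dim=-1)  # predicted classification result
```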
Then, in step S250, model parameters of the business conversation classification model and the language model are adjusted based on the prediction classification result and the category label.
Specifically, a business session classification loss may be determined from the predicted classification result and the class label, and the model parameters adjusted according to this loss. In one embodiment, the business session classification loss may be a cross-entropy loss, a hinge loss, or the like. In a specific embodiment, it may be calculated by the following formula:
$$\mathcal{L}_1 = -\log\, p(c \mid x) \qquad (1)$$
In formula (1), $x$ denotes the semantic symbol sequence; $c$ denotes the category corresponding to the class label; and $p(c \mid x)$ denotes the probability, included in the predicted classification result, that the semantic symbol sequence $x$ is classified into category $c$. Here $p(c \mid x) = g(h)$ with $h = f(x)$, where $f$ denotes the semantic representation function corresponding to the language model, $h$ denotes the overall semantic vector described above, and $g$ denotes the classification function corresponding to the business session classification model.
Therefore, the model parameters of the business conversation classification model and the language model can be adjusted through the determined business conversation classification loss.
On the other hand, as noted, besides the dialogue-judgment task, predicting removed words may serve as another training task. Thus, in one embodiment, after step S230 obtains the semantic representation vector for each position in the semantic symbol sequence, the method further includes: inputting the semantic representation vectors corresponding to the first number of default symbol positions into a character prediction model, obtaining a corresponding first number of predicted character results; and, accordingly, adjusting the model parameters of the language model and the character prediction model based on the first number of semantic unit symbols and the predicted character results. Specifically, a predicted-character loss may be determined from the first number of semantic unit symbols and the predicted character results, and the model parameters adjusted according to this loss. In one embodiment, the predicted-character loss may be a cross-entropy loss, a hinge loss, or the like. In a specific embodiment, it may be calculated by the following formula:
$$\mathcal{L}_2 = -\sum_{i=1}^{n} \log\, p(t_i \mid x) \qquad (2)$$
In formula (2), $n$ denotes the first number of default symbols; $x$ denotes the semantic symbol sequence; $t_i$ denotes the semantic unit symbol replaced by the $i$-th of the $n$ default symbols; and $p(t_i \mid x)$ denotes the probability, included in the predicted character result, that the $i$-th default symbol is predicted as $t_i$. In this way, the model parameters of the language model and the character prediction model can be adjusted based on the predicted-character loss calculated by formula (2).
According to a specific embodiment, in a given training round, the loss $\mathcal{L}_1$ calculated by formula (1) may be used to adjust the model parameters of the business session classification model, the loss $\mathcal{L}_2$ calculated by formula (2) may be used to adjust the model parameters of the character prediction model, and the loss $\mathcal{L}$ calculated by the following formula (3) may be used to adjust the model parameters of the language model:
$$\mathcal{L} = \mathcal{L}_1 + \lambda\, \mathcal{L}_2 \qquad (3)$$
In formula (3), $\mathcal{L}_1$ and $\mathcal{L}_2$ are determined by formulas (1) and (2) respectively, and $\lambda$ is a hyperparameter that may be set manually, for example to 0.01 or 0.02. A sketch of one such joint update follows.
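One such joint update could be sketched as follows; tensor shapes, the use of PyTorch, and the placement of λ follow the reconstruction of formulas (1)-(3) above and are assumptions rather than a definitive implementation.

```python
import torch
import torch.nn.functional as F

def joint_losses(cls_probs, label, char_logits, char_targets, lam=0.01):
    """Compute L1 (formula 1), L2 (formula 2), and the combined L (formula 3).
    `cls_probs`: (num_classes,) softmax output of the session classifier;
    `label`: int class index; `char_logits`: (n, vocab) character-prediction
    scores at the n default symbol positions; `char_targets`: (n,) original
    symbol ids. Sketch only."""
    l1 = -torch.log(cls_probs[label])                 # -log p(c | x)
    l2 = F.cross_entropy(char_logits, char_targets)   # averaged form of L2
    return l1, l2, l1 + lam * l2                      # L = L1 + lam * L2

# L1 updates the session classifier, L2 the character predictor, and the
# combined loss L backpropagates into the Bert-based language model.
```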
In this way, parameter adjustment of the language model and the business session classification model is realized, and in some embodiments parameter adjustment of the character prediction model as well. Performing the above steps S210 to S250 completes one training iteration.
The training method is described below with reference to a specific embodiment. FIG. 3 illustrates the joint training of the language model, the session classification model, and the character prediction model according to one embodiment. As shown in FIG. 3, the dialogue sentences in an obtained historical session sample are first processed into a semantic symbol sequence 31: a predetermined start symbol [CLS] is added before the first sentence; a role indicator such as [Role1] or [Role2] is added before each dialogue sentence; the semantic unit symbols of each dialogue sentence are determined (for example, Tok1 to TokN for dialogue sentence 1 in FIG. 3, which in one possible example are the individual characters, or segmented words, of a sentence such as "May I ask how to return this item"); and some semantic unit symbols in dialogue sentence 3 are replaced with the default symbol [Mask]. The processed semantic symbol sequence 31 is then input into the language model 32, producing a semantic representation vector for each semantic symbol position. The semantic representation vector C at the position of the predetermined start symbol [CLS] is input, as the overall semantic vector, into the business session classification model 33 to obtain the predicted classification result, and the semantic representation vectors at the default symbol positions are input into the character prediction model 34 to obtain the predicted character results. Finally, the model parameters of the session classification model, the language model, and the character prediction model are adjusted based on the predicted classification result and the sample's class label, and on the predicted character results and the semantic unit symbols at the default symbol positions. This completes one training iteration.
It should be noted that one or more historical session samples may be used in a single training iteration; this is not limited. Further, by repeating the above process, multiple training iterations can be performed until a preset number of iterations is reached or the model converges, and the language model from the last iteration is then used.
The trained language model may be applied in a variety of scenarios. In one embodiment, the trained language model and business session classification model may be used together to predict whether a newly generated business session achieves its business purpose, thereby evaluating the service quality of that session. In another embodiment, the trained language model may serve as a base model for multi-turn dialogue scenarios; for example, it may be fine-tuned for a specific scenario task and then used as the task model or a component of the task model. In a specific embodiment, if the atmosphere of a multi-turn dialogue needs to be determined, such as calm, cheerful, or tense, the language model trained by the above method, together with an atmosphere classification model, may be further trained on the corresponding labeled data, yielding the language model and atmosphere classification model finally used for that dialogue classification.
In summary, in the training method disclosed in the embodiments of the present specification, learning whether the goal of an entire dialogue was achieved improves the semantic comprehension ability of the Bert-based language model. In addition, the multiple dialogue sentences of a whole dialogue are separated by predetermined separators before being input into the model, which expands the number of sentences the language model can process from 2 to many. Further, by using role indicators that identify session roles as the predetermined separators, session role information is introduced, allowing the language model to learn more and better semantic information from multi-turn dialogue. In short, a language model trained in this way achieves a broader, deeper, and more effective semantic understanding of multi-turn dialogue.
Corresponding to the training method, the embodiments of the present specification also disclose a training apparatus. Specifically, FIG. 4 is a schematic structural diagram of a training apparatus for a Bert-based language model according to one embodiment; the training apparatus may be implemented by any computing node, device, or server cluster with computing and processing capabilities. As shown in FIG. 4, the training apparatus 400 includes the following units and modules:
the sample obtaining unit 410 is configured to obtain a historical session sample, where the historical session sample includes a plurality of dialog statements generated by a certain service session and a corresponding class label, and the class label indicates whether a service purpose is achieved through the certain service session. A sequence determination unit 420 configured to determine a sequence of semantic symbols based on the plurality of dialog sentences. A vector prediction unit 430, configured to input the semantic symbol sequence into the language model, so as to obtain an overall semantic vector. And the classification prediction unit 440 is configured to input the whole semantic vector into a service session classification model to obtain a prediction classification result. A model parameter adjusting unit 450 configured to adjust model parameters of the business session classification model and the language model based on the predicted classification result and the class label.
In an embodiment, the session role of the certain service session includes a service party and a user, where the sample obtaining unit 410 is specifically configured to: acquiring the plurality of dialogue sentences and acquiring the service behavior data of the user after the certain service session; determining the class label according to the service behavior data based on a preset rule; and constructing the historical conversation sample based on the plurality of dialog sentences and the category labels.
In an embodiment, the sequence determining unit 420 specifically includes: a start symbol adding module 421 configured to add a predetermined start symbol before the first dialog sentence; a semantic unit determining module 422 configured to determine a plurality of semantic unit symbols corresponding to each dialog sentence to obtain the semantic symbol sequence; wherein the semantic unit symbols are characters or words in the dialogue sentences.
In a specific embodiment, the sequence determining unit 420 further includes: a separation symbol adding module 423 configured to add a predetermined separation symbol between each two adjacent dialogue sentences therein.
In a more specific embodiment, the historical conversation sample further includes a conversation role corresponding to each conversation statement, and the predetermined separation symbol belongs to a plurality of role indicators corresponding to a plurality of conversation roles; wherein the separation symbol adding module 423 is specifically configured to: and adding a role indicator corresponding to the conversation role of each conversation statement before each conversation statement based on the corresponding relation between the conversation roles and the role indicators.
In one embodiment, the vector prediction unit 430 specifically includes: a position vector prediction module 431, configured to input the semantic symbol sequence into the language model, to obtain a semantic representation vector corresponding to the position of each semantic symbol in the semantic symbol sequence; an overall vector determination module 432 configured to determine the overall semantic vector based on the obtained semantic representation vector.
In a specific embodiment, the overall vector determination module 432 is specifically configured to: take the semantic representation vector at the position of the predetermined start symbol as the overall semantic vector; or calculate the average vector of the obtained semantic representation vectors as the overall semantic vector; or, over the obtained semantic representation vectors, determine the maximum among the element values at each vector element position to form the overall semantic vector.
In another specific embodiment, the sequence determining unit 420 further includes: a semantic unit replacing module 424, configured to randomly replace a first number of semantic unit symbols with a first number of default symbols according to the determined plurality of semantic unit symbols corresponding to each dialog sentence; the apparatus 400 further comprises: the character prediction unit 460 is configured to input semantic expression vectors corresponding to the positions of the default characters of the first number into a character prediction model respectively, and obtain predicted character results of the first number correspondingly; wherein the model parameter tuning unit 450 is further configured to: adjusting model parameters of the language model and the character prediction model based on the first number of semantic unit symbols and the predicted character result.
In summary, in the training apparatus disclosed in the embodiments of the present specification, learning whether the goal of an entire dialogue was achieved improves the semantic comprehension ability of the Bert-based language model. In addition, the multiple dialogue sentences of a whole dialogue are separated by predetermined separators before being input into the model, which expands the number of sentences the language model can process from 2 to many. Further, by using role indicators that identify session roles as the predetermined separators, session role information is introduced, allowing the language model to learn more and better semantic information from multi-turn dialogue. In short, a language model trained in this way achieves a broader, deeper, and more effective semantic understanding of multi-turn dialogue.
As above, according to an embodiment of a further aspect, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.
There is also provided, according to an embodiment of yet another aspect, a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method described in connection with fig. 2.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments disclosed herein may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The embodiments above further describe in detail the objects, technical solutions, and advantages of the embodiments disclosed in the present specification. It should be understood that they are only specific embodiments of the present disclosure and are not intended to limit its scope; any modification, equivalent substitution, or improvement made on the basis of the technical solutions of the embodiments disclosed in the present specification shall fall within their scope.

Claims (18)

1. A training method of a language model based on Bert comprises the following steps:
acquiring a historical conversation sample, wherein the historical conversation sample comprises a plurality of conversation sentences generated by a certain business conversation and a corresponding class label, the class label indicating whether the business purpose is achieved through the certain business conversation;
determining a semantic symbol sequence based on the plurality of dialog sentences;
inputting the semantic symbol sequence into the language model to obtain a whole semantic vector;
inputting the whole semantic vector into a business session classification model to obtain a prediction classification result;
and adjusting model parameters of a business conversation classification model and the language model based on the prediction classification result and the class label.
2. The method of claim 1, wherein the session roles of the certain service session include a service party and a user, and the obtaining of the historical session sample comprises:
acquiring the plurality of dialogue sentences and acquiring the service behavior data of the user after the certain service session;
determining the class label according to the service behavior data based on a preset rule;
and constructing the historical conversation sample based on the plurality of dialog sentences and the category labels.
3. The method of claim 1, wherein determining a sequence of semantic symbols based on the plurality of dialog statements comprises:
adding a preset initial symbol before a first dialog statement and determining a plurality of semantic unit symbols corresponding to each dialog statement to obtain a semantic symbol sequence; wherein the semantic unit symbols are characters or words in the dialogue sentences.
4. The method of claim 3, wherein obtaining the sequence of semantic symbols further comprises:
adding a predetermined separation symbol between every two adjacent dialog sentences.
5. The method of claim 4, wherein the historical conversation sample further includes the session role corresponding to each dialogue sentence, and the predetermined separation symbol belongs to a plurality of role indicators corresponding to a plurality of session roles; adding the predetermined separation symbol between every two adjacent dialogue sentences comprises:
adding, before each dialogue sentence, the role indicator corresponding to the session role of that sentence, based on the correspondence between session roles and role indicators.
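To make claims 3-5 concrete, the sketch below assembles a character-level semantic symbol sequence in which each separator is the role indicator of the sentence that follows it. The marker tokens "[SRV]" and "[USR]" and the function name are hypothetical; the claims only require one indicator per session role.

```python
# Build the semantic symbol sequence of claims 3-5 for a two-role session.
ROLE_TOKENS = {"service": "[SRV]", "user": "[USR]"}  # hypothetical role indicators

def build_symbol_sequence(dialog):
    """dialog: list of (role, sentence) pairs; characters act as semantic unit symbols."""
    symbols = ["[CLS]"]                      # predetermined start symbol
    for role, sentence in dialog:
        symbols.append(ROLE_TOKENS[role])    # role indicator doubles as the separator
        symbols.extend(list(sentence))       # character-level semantic unit symbols
    return symbols

seq = build_symbol_sequence([("service", "您好"), ("user", "查询订单")])
# ['[CLS]', '[SRV]', '您', '好', '[USR]', '查', '询', '订', '单']
```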
6. The method of claim 3, wherein inputting the semantic symbol sequence into the language model to obtain the overall semantic vector comprises:
inputting the semantic symbol sequence into the language model to obtain a semantic representation vector corresponding to the position of each semantic symbol in the semantic symbol sequence;
and determining the overall semantic vector based on the obtained semantic representation vectors.
7. The method of claim 6, wherein determining the overall semantic vector based on the obtained semantic representation vectors comprises:
taking the semantic representation vector corresponding to the position of the predetermined start symbol as the overall semantic vector; or
computing the average of the obtained semantic representation vectors as the overall semantic vector; or
for each vector element position, taking the maximum among the element values of the obtained semantic representation vectors at that position, to form the overall semantic vector.
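The three alternatives of claim 7 are standard pooling operations over the per-position semantic representation vectors. A sketch, assuming the language model yields a tensor `H` of shape `(seq_len, hidden)`:

```python
import torch

def overall_vector(H: torch.Tensor, mode: str = "cls") -> torch.Tensor:
    """Reduce per-position representation vectors H to one overall semantic vector."""
    if mode == "cls":    # vector at the position of the predetermined start symbol
        return H[0]
    if mode == "mean":   # average of all representation vectors
        return H.mean(dim=0)
    if mode == "max":    # element-wise maximum at each vector element position
        return H.max(dim=0).values
    raise ValueError(f"unknown pooling mode: {mode}")
```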
8. The method of claim 6, wherein obtaining the semantic symbol sequence further comprises:
randomly replacing, among the determined semantic unit symbols corresponding to the dialogue sentences, a first number of semantic unit symbols with the same number of default symbols;
after obtaining the semantic representation vector corresponding to the position of each semantic symbol in the semantic symbol sequence, the method further comprises:
respectively inputting the semantic representation vectors corresponding to the positions of the first number of default symbols into a character prediction model, to correspondingly obtain a first number of predicted character results;
and adjusting the model parameters of the service session classification model and the language model based on the predicted classification result and the class label further comprises:
adjusting model parameters of the language model and the character prediction model based on the first number of semantic unit symbols and the predicted character results.
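Claim 8 pairs the session-level objective with masked-symbol prediction in the spirit of BERT's masked language modeling. A minimal sketch; the 15% replacement rate, the hidden size of 768, the vocabulary size of 21128, and all names are assumptions for illustration:

```python
import torch
import torch.nn as nn

def mask_symbols(input_ids: torch.Tensor, mask_token_id: int, rate: float = 0.15):
    """Randomly replace a first number of unit symbols with the default (mask) symbol.
    Simplification: special symbols such as the start symbol are not excluded here."""
    ids = input_ids.clone()
    chosen = torch.rand(ids.shape) < rate                        # the "first number" of positions
    labels = torch.where(chosen, input_ids, torch.full_like(ids, -100))
    ids[chosen] = mask_token_id
    return ids, labels

# Character prediction model: representation vector -> distribution over symbols.
char_predictor = nn.Linear(768, 21128)
mlm_loss_fn = nn.CrossEntropyLoss(ignore_index=-100)             # ignore unmasked positions

def joint_loss(cls_loss: torch.Tensor, hidden_states: torch.Tensor, mlm_labels: torch.Tensor):
    # hidden_states: (batch, seq_len, hidden) semantic representation vectors.
    mlm_logits = char_predictor(hidden_states)
    mlm_loss = mlm_loss_fn(mlm_logits.view(-1, mlm_logits.size(-1)), mlm_labels.view(-1))
    return cls_loss + mlm_loss   # tunes language model, classifier and predictor jointly
```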
9. An apparatus for training a Bert-based language model, comprising:
a sample acquisition unit configured to acquire a historical conversation sample, wherein the historical conversation sample comprises a plurality of dialogue sentences generated in a certain service session and a corresponding class label, the class label indicating whether the service purpose was achieved through the service session;
a sequence determination unit configured to determine a semantic symbol sequence based on the plurality of dialogue sentences;
a vector prediction unit configured to input the semantic symbol sequence into the language model to obtain an overall semantic vector;
a classification prediction unit configured to input the overall semantic vector into a service session classification model to obtain a predicted classification result;
and a model parameter adjusting unit configured to adjust model parameters of the service session classification model and the language model based on the predicted classification result and the class label.
10. The apparatus of claim 9, wherein the session roles of the service session include a service party and a user, and the sample acquisition unit is specifically configured to:
acquire the plurality of dialogue sentences, and acquire service behavior data of the user after the service session;
determine the class label from the service behavior data based on a preset rule;
and construct the historical conversation sample based on the plurality of dialogue sentences and the class label.
11. The apparatus of claim 9, wherein the sequence determination unit specifically comprises:
a start symbol adding module configured to add a predetermined start symbol before the first dialogue sentence;
and a semantic unit determining module configured to determine a plurality of semantic unit symbols corresponding to each dialogue sentence, to obtain the semantic symbol sequence; wherein the semantic unit symbols are characters or words in the dialogue sentences.
12. The apparatus of claim 11, wherein the sequence determination unit further comprises:
a separation symbol adding module configured to add a predetermined separation symbol between every two adjacent dialogue sentences.
13. The apparatus of claim 12, wherein the historical conversation sample further includes the session role corresponding to each dialogue sentence, and the predetermined separation symbol belongs to a plurality of role indicators corresponding to a plurality of session roles; wherein the separation symbol adding module is specifically configured to:
add, before each dialogue sentence, the role indicator corresponding to the session role of that sentence, based on the correspondence between session roles and role indicators.
14. The apparatus of claim 11, wherein the vector prediction unit specifically comprises:
a position vector prediction module configured to input the semantic symbol sequence into the language model to obtain a semantic representation vector corresponding to the position of each semantic symbol in the semantic symbol sequence;
and an overall vector determining module configured to determine the overall semantic vector based on the obtained semantic representation vectors.
15. The apparatus of claim 14, wherein the overall vector determining module is specifically configured to:
take the semantic representation vector corresponding to the position of the predetermined start symbol as the overall semantic vector; or
compute the average of the obtained semantic representation vectors as the overall semantic vector; or
for each vector element position, take the maximum among the element values of the obtained semantic representation vectors at that position, to form the overall semantic vector.
16. The apparatus of claim 14, wherein the sequence determination unit further comprises:
a semantic unit replacing module configured to randomly replace, among the determined semantic unit symbols corresponding to the dialogue sentences, a first number of semantic unit symbols with the same number of default symbols;
the apparatus further comprises:
a character prediction unit configured to respectively input the semantic representation vectors corresponding to the positions of the first number of default symbols into a character prediction model, to correspondingly obtain a first number of predicted character results;
wherein the model parameter adjusting unit is further configured to:
adjust model parameters of the language model and the character prediction model based on the first number of semantic unit symbols and the predicted character results.
17. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed in a computer, causes the computer to perform the method of any one of claims 1-8.
18. A computing device comprising a memory and a processor, wherein the memory has executable code stored therein that, when executed by the processor, implements the method of any one of claims 1-8.
CN202010384255.8A 2020-05-09 2020-05-09 Method and device for training language model based on Bert Active CN111291166B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010384255.8A CN111291166B (en) 2020-05-09 2020-05-09 Method and device for training language model based on Bert

Publications (2)

Publication Number Publication Date
CN111291166A true CN111291166A (en) 2020-06-16
CN111291166B CN111291166B (en) 2020-11-03

Family

ID=71027418

Country Status (1)

Country Link
CN (1) CN111291166B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018211408A1 (en) * 2017-05-15 2018-11-22 Thomson Reuters Global Resources Unlimited Company Neural paraphrase generator
CN110032730A (en) * 2019-02-18 2019-07-19 阿里巴巴集团控股有限公司 A kind of processing method of text data, device and equipment
CN110263141A (en) * 2019-06-25 2019-09-20 杭州微洱网络科技有限公司 A kind of customer service question answering system based on BERT
CN110704586A (en) * 2019-09-30 2020-01-17 支付宝(杭州)信息技术有限公司 Information processing method and system

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798986A (en) * 2020-07-07 2020-10-20 云知声智能科技股份有限公司 Data enhancement method and equipment
CN111798986B (en) * 2020-07-07 2023-11-03 云知声智能科技股份有限公司 Data enhancement method and device
CN111832318A (en) * 2020-07-16 2020-10-27 平安科技(深圳)有限公司 Single sentence natural language processing method and device, computer equipment and readable storage medium
CN111832318B (en) * 2020-07-16 2023-03-21 平安科技(深圳)有限公司 Single sentence natural language processing method and device, computer equipment and readable storage medium
JP7065473B2 (en) 2020-07-22 2022-05-12 ソプラ株式会社 Business construction system from conversational sentences
CN111988294B (en) * 2020-08-10 2022-04-12 中国平安人寿保险股份有限公司 User identity recognition method, device, terminal and medium based on artificial intelligence
CN111988294A (en) * 2020-08-10 2020-11-24 中国平安人寿保险股份有限公司 User identity recognition method, device, terminal and medium based on artificial intelligence
CN112084317A (en) * 2020-09-23 2020-12-15 支付宝(杭州)信息技术有限公司 Method and apparatus for pre-training a language model
CN112084317B (en) * 2020-09-23 2023-11-14 支付宝(杭州)信息技术有限公司 Method and apparatus for pre-training language model
CN112100383A (en) * 2020-11-02 2020-12-18 之江实验室 Meta-knowledge fine tuning method and platform for multitask language model
US11354499B2 (en) 2020-11-02 2022-06-07 Zhejiang Lab Meta-knowledge fine tuning method and platform for multi-task language model
CN112100383B (en) * 2020-11-02 2021-02-19 之江实验室 Meta-knowledge fine tuning method and platform for multitask language model
CN112667788A (en) * 2020-12-02 2021-04-16 中山大学 Novel BERTEXT-based multi-round dialogue natural language understanding model
CN112905869A (en) * 2021-03-26 2021-06-04 北京儒博科技有限公司 Adaptive training method and device for language model, storage medium and equipment
CN112989008A (en) * 2021-04-21 2021-06-18 上海汽车集团股份有限公司 Multi-turn dialog rewriting method and device and electronic equipment
CN113312452A (en) * 2021-06-16 2021-08-27 哈尔滨工业大学 Chapter-level text continuity classification method based on multi-task learning
CN113297366A (en) * 2021-06-22 2021-08-24 中国平安人寿保险股份有限公司 Multi-turn dialogue emotion recognition model training method, device, equipment and medium

Also Published As

Publication number Publication date
CN111291166B (en) 2020-11-03

Similar Documents

Publication Publication Date Title
CN111291166B (en) Method and device for training language model based on Bert
CN111046152B (en) Automatic FAQ question-answer pair construction method and device, computer equipment and storage medium
Rastogi et al. Multi-task learning for joint language understanding and dialogue state tracking
CN110990543A (en) Intelligent conversation generation method and device, computer equipment and computer storage medium
CN111930914B (en) Problem generation method and device, electronic equipment and computer readable storage medium
CN112257437B (en) Speech recognition error correction method, device, electronic equipment and storage medium
CN111401084A (en) Method and device for machine translation and computer readable storage medium
WO2023201975A1 (en) Difference description sentence generation method and apparatus, and device and medium
CN111581966A (en) Context feature fusion aspect level emotion classification method and device
CN112966106A (en) Text emotion recognition method, device and equipment and storage medium
US20220300718A1 (en) Method, system, electronic device and storage medium for clarification question generation
CN114757176A (en) Method for obtaining target intention recognition model and intention recognition method
CN111428448A (en) Text generation method and device, computer equipment and readable storage medium
CN112084317A (en) Method and apparatus for pre-training a language model
CN111538809A (en) Voice service quality detection method, model training method and device
CN110717027B (en) Multi-round intelligent question-answering method, system, controller and medium
CN113486174B (en) Model training, reading understanding method and device, electronic equipment and storage medium
CN110795531B (en) Intention identification method, device and storage medium
CN113705207A (en) Grammar error recognition method and device
CN112667791A (en) Latent event prediction method, device, equipment and storage medium
CN115617974B (en) Dialogue processing method, device, equipment and storage medium
CN115796141A (en) Text data enhancement method and device, electronic equipment and storage medium
CN111414732A (en) Text style conversion method and device, electronic equipment and storage medium
CN115292492A (en) Method, device and equipment for training intention classification model and storage medium
CN111428005A (en) Standard question and answer pair determining method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant