CN111460115A - Intelligent man-machine conversation model training method, model training device and electronic equipment

Info

Publication number: CN111460115A (granted as CN111460115B)
Application number: CN202010187709.2A
Authority: CN (China); other languages: Chinese (zh)
Prior art keywords: vector, model, matrix, statement, intention
Inventors: 马力, 熊为星, 庞建新, 熊友军
Applicant/Assignee: Ubtech Robotics Corp
Legal status: Active (granted)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The application provides an intelligent man-machine conversation model training method, a model training device, electronic equipment and a computer readable storage medium. The method comprises: acquiring the user statement of the current round and the system statement of the previous round, splicing them and inputting the spliced result into a BERT model to obtain a current dialogue matrix; inputting the current dialogue matrix into a first bidirectional GRU model to be trained to obtain a session semantic vector; performing a first linear transformation on the session semantic vector to obtain an intention vector and a second linear transformation to obtain a dialogue behavior vector; calculating an intention loss from the intention vector and a dialogue behavior loss from the dialogue behavior vector; and performing back propagation on the basis of the intention loss and the dialogue behavior loss to update the model parameters of each model to be trained. In this scheme, the BERT model and the GRU models are fused to encode historical memory, and the system statement of the previous round is incorporated during semantic analysis, so that a semantic analysis model with higher accuracy can be obtained.

Description

Intelligent man-machine conversation model training method, model training device and electronic equipment
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to an intelligent man-machine conversation model training method, a model training device, electronic equipment and a computer readable storage medium.
Background
With the development of technology, man-machine conversation interactive systems are increasingly widely applied. To realize automatic man-machine conversation, a computer needs to parse the intention and the dialogue behavior category contained in the text input by the user and extract the keywords therein, so as to formulate a corresponding reply strategy. In recent years, with the development of deep learning techniques and the improvement of computing power, deep learning has begun to be applied to human-computer interactive systems. However, the current representative models for multi-task semantic analysis over multi-turn conversations still suffer from problems such as low accuracy, and cannot meet practical requirements.
Disclosure of Invention
The embodiment of the application provides an intelligent man-machine conversation model training method, a model training device, electronic equipment and a computer readable storage medium, and a semantic analysis model with higher accuracy can be obtained.
In a first aspect, the present application provides a method for training an intelligent human-machine interaction model, including:
acquiring user sentences of a current round and system sentences of a previous round;
splicing the user statement and the system statement and inputting the user statement and the system statement into a BERT model to obtain a current dialogue matrix;
inputting the current dialog matrix into a first bidirectional GRU model to be trained to obtain a session semantic vector, wherein the session semantic vector is obtained by splicing a first output result of the first bidirectional GRU model in a first direction and a second output result of the first bidirectional GRU model in a second direction, an initial hidden layer of the first bidirectional GRU model is constructed based on other sentences, and the other sentences are sentences of historical turns except the system sentences;
performing first linear transformation on the session semantic vector to obtain an intention vector, and performing second linear transformation on the session semantic vector to obtain a conversation behavior vector;
calculating according to the intention vector to obtain intention loss, and calculating according to the conversation behavior vector to obtain conversation behavior loss;
and carrying out back propagation on the basis of the intention loss and the dialogue action loss, and updating the model parameters of each model to be trained.
In a second aspect, the present application provides an intelligent human-machine interaction model training device, comprising:
the sentence acquisition unit is used for acquiring user sentences of the current round and system sentences of the previous round;
the dialogue matrix acquisition unit is used for splicing the user statement and the system statement and inputting the user statement and the system statement into a BERT model to obtain a current dialogue matrix;
a session semantic vector obtaining unit, configured to input the current dialog matrix into a first bidirectional GRU model to be trained to obtain a session semantic vector, where the session semantic vector is obtained by splicing a first output result in a first direction and a second output result in a second direction based on the first bidirectional GRU model, an initial hidden layer of the first bidirectional GRU model is constructed based on other sentences, and the other sentences are sentences of a history turn except the system sentences;
the linear transformation unit is used for carrying out first linear transformation on the conversation semantic vector to obtain an intention vector and carrying out second linear transformation on the conversation semantic vector to obtain a conversation behavior vector;
the loss calculation unit is used for calculating and obtaining the intention loss according to the intention vector and calculating and obtaining the dialogue action loss according to the dialogue action vector;
and the parameter updating unit is used for carrying out back propagation on the basis of the intention loss and the dialogue action loss and updating the model parameters of each model to be trained.
In a third aspect, the present application provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by one or more processors, performs the steps of the method of the first aspect as described above.
According to the scheme, the user statements of the current round and the system statements of the previous round are obtained, the user statements of the current round and the system statements of the previous round are spliced and input into a BERT model to obtain a current dialogue matrix, the current dialogue matrix is input into a first bidirectional GRU model to be trained to obtain a dialogue semantic vector, the dialogue semantic vector is subjected to first linear transformation to obtain an intention vector, the dialogue semantic vector is subjected to second linear transformation to obtain a dialogue behavior vector, the intention loss is obtained through calculation according to the intention vector, the dialogue behavior loss is obtained through calculation according to the dialogue behavior vector, finally, back propagation is carried out on the basis of the intention loss and the dialogue behavior loss, and the model parameters of each model to be trained are updated. According to the scheme, the BERT model and the GRU model are fused to encode history memory, and the system sentences in the previous round are fused during semantic analysis, so that the semantic analysis model with higher accuracy can be obtained. It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a schematic flow chart diagram illustrating a method for training an intelligent human-machine interaction model according to an embodiment of the present application;
fig. 2 is a schematic workflow diagram of a first bidirectional GRU model and a BERT model according to an embodiment of the present application;
fig. 3 is a schematic workflow diagram of a second bidirectional GRU model and BERT model according to an embodiment of the present application;
FIG. 4 is a schematic workflow diagram of a unidirectional GRU model provided by an embodiment of the present application;
FIG. 5 is a block diagram of an intelligent human-machine interaction model training device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Example one
Referring to fig. 1, a method for training an intelligent human-machine conversation model provided in an embodiment of the present application is described below, where the method for training an intelligent human-machine conversation model in an embodiment of the present application includes:
step 101, obtaining user statements of a current round and system statements of a previous round;
In the embodiment of the application, for a group of human-computer conversations, the sentences of each turn can be numbered according to the order of the conversation turns, so as to distinguish them. In the embodiment of the present application, the character u represents a user sentence input by the tester during training, the character s represents a system sentence fed back by the intelligent question-answering system, and the subscript indicates the turn of the sentence, giving the conversation sequence (u_1, s_2, u_3, s_4, …, s_{t-1}, u_t). That is, the user statement of the current round is denoted as u_t and the system statement of the previous round is denoted as s_{t-1}. Referring to fig. 2, the semantic analysis model trained by the intelligent man-machine conversation model training method provided by the present application can ultimately recognize the intention of the user statement, the dialogue behavior category to which the user statement belongs, and the named entities contained in the user statement; the intelligent question-answering system can then generate a corresponding system statement based on these recognition results and feed it back to the user, thereby realizing man-machine conversation.
102, splicing the user statement and the system statement and inputting the user statement and the system statement into a BERT model to obtain a current dialog matrix;
In the embodiment of the present application, please refer to fig. 2. The semantic analysis model is actually composed of a plurality of models, one of which is a BERT (Bidirectional Encoder Representations from Transformers) model. It should be noted that the BERT model has been trained in advance; that is, the intelligent man-machine conversation model training method proposed in the present application does not need to train the BERT model. Considering that the previous round of conversation usually carries significant meaning for understanding the current round, and that the previous round is often ignored in the prior art, resulting in a loss of information, the embodiment of the present application takes the user statement u_t of the current round and the system statement s_{t-1} of the previous round jointly as the input of the BERT model. Specifically, the user statement u_t and the system statement s_{t-1} are spliced into one sentence and then input into the BERT model; that is, u_t and s_{t-1} are not input separately, but are spliced together and input in the form of a single sentence, so that the semantic analysis model can make more effective use of the information from the previous round during semantic analysis and information loss is avoided.
Optionally, denote the sentence obtained by concatenating the user statement u_t and the system statement s_{t-1} as the concatenation statement. The BERT model may first preprocess the concatenation statement, the preprocessing being, specifically, character-wise segmentation. That is, the BERT model may first call a tokenizer to segment the concatenation statement character by character to obtain a plurality of tokens (i.e., a token sequence); the token sequence obtained by segmentation is then fed into the BERT model, and the hidden layer corresponding to each token at the topmost layer of the BERT model is taken out, so that a matrix of a preset first size can be obtained by splicing. This matrix is the current dialogue matrix; that is, the current dialogue matrix is the matrix formed by all vectors output by the last layer of the BERT model after the concatenation statement is input into the BERT model. For example only, the first size may be [128 x 768], where "128" is the length of the token sequence after 0-vector padding; since the token sequence contains both the user statement u_t and the system statement s_{t-1}, this value is relatively large and can be adjusted according to the needs of different application scenarios. The "768" is a fixed value, namely the dimension of the hidden layer output by the BERT model.
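A minimal sketch of this step follows, assuming PyTorch and the Hugging Face transformers library. The `bert-base-chinese` checkpoint, the 128-token length, the splicing order (s_{t-1} first) and the sentence-pair form with a [SEP] token are illustrative choices rather than requirements of the patent; likewise, padding here uses standard [PAD] tokens rather than literal 0-vectors.

```python
import torch
from transformers import BertModel, BertTokenizer

# A pre-trained Chinese BERT is assumed; the patent only requires that BERT is already
# trained and is not updated by this training method.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")
bert.eval()

def current_dialog_matrix(prev_sys_utt: str, user_utt: str) -> torch.Tensor:
    """Splice s_{t-1} and u_t into one input, segment it character by character,
    pad the token sequence to 128 and return the top-layer hidden states,
    i.e. the current dialogue matrix of size [128, 768]."""
    encoded = tokenizer(prev_sys_utt, user_utt,
                        padding="max_length", max_length=128,
                        truncation=True, return_tensors="pt")
    with torch.no_grad():                        # BERT itself stays frozen
        output = bert(**encoded)
    return output.last_hidden_state.squeeze(0)   # [128, 768]

matrix = current_dialog_matrix("你想听什么歌？", "我想听周杰伦的Jay专辑")
print(matrix.shape)   # torch.Size([128, 768])
```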
Step 103, inputting the current dialog matrix into a first bidirectional GRU model to be trained to obtain a session semantic vector;
In the embodiment of the present application, please refer to fig. 2. The semantic parsing model further includes a bidirectional GRU (Gated Recurrent Unit) model, and the current dialogue matrix is input into it. Considering that the semantic analysis model may contain several bidirectional GRU models, the bidirectional GRU model in this step is referred to as the first bidirectional GRU model, and its two directions are referred to as the first direction and the second direction for distinction. The session semantic vector is thus obtained by splicing the first output result of the first bidirectional GRU model in the first direction and the second output result in the second direction. Specifically, the initial hidden layer of the first bidirectional GRU model is constructed based on the other sentences, which are the sentences of historical turns other than the system statement, i.e., the sentences u_1, s_2, u_3, s_4, …, s_{t-3}, u_{t-2}. It should be noted that, since the first bidirectional GRU model involves two directions, the initial hidden layer actually refers to the initial hidden layers in both directions, i.e., the initial hidden layer in the first direction and the initial hidden layer in the second direction; both are constructed based on the other sentences.
Specifically, since the first bidirectional GRU model relates to two directions, after the current dialog matrix is input to the first bidirectional GRU model, the output of the last hidden layer of the first bidirectional GRU model in the first direction may be used as a first output result, the output of the last hidden layer of the first bidirectional GRU model in the second direction may be used as a second output result, and then the first output result and the second output result may be spliced to obtain the session semantic vector. The session semantic vector may be considered a semantic representation of a user statement that contains context information. In fig. 2, the session semantic vector is labeled I.
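A sketch of this splicing is given below, assuming the [128, 768] current dialogue matrix from the previous sketch; the 256-dimensional GRU hidden size (giving a 512-dimensional session semantic vector) is an assumption made only for illustration.

```python
import torch
import torch.nn as nn

# Illustrative dimensions: 768 matches the BERT hidden size; 256 is an assumed GRU size.
first_bigru = nn.GRU(input_size=768, hidden_size=256,
                     bidirectional=True, batch_first=True)

def session_semantic_vector(dialog_matrix: torch.Tensor,
                            h0: torch.Tensor) -> torch.Tensor:
    """dialog_matrix: the [128, 768] current dialogue matrix; h0: [2, 1, 256] initial
    hidden layers for the two directions (built from the memory coding vector M).
    Returns the concatenation of the last hidden states of both directions."""
    _, h_n = first_bigru(dialog_matrix.unsqueeze(0), h0)    # h_n: [2, 1, 256]
    return torch.cat([h_n[0], h_n[1]], dim=-1).squeeze(0)   # [512] session semantic vector
```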
Alternatively, referring to fig. 3, the initial hidden layer of the first bidirectional GRU model may be constructed as follows:
a1, respectively obtaining statement vectors of other statements;
The other sentences are the sentences u_1, s_2, u_3, s_4, …, s_{t-3}, u_{t-2}. Here, the statement vector of each of these sentences may be obtained first, where a statement vector is the semantic representation of a sentence. Specifically, the statement vector of each sentence can be generated in the following way:
a11, performing word-based segmentation processing on other target sentences to obtain segmented sequences;
The target other sentence is any one of the other sentences; that is, the target other sentence may be u(i) or s(i), where u(i) denotes any one of u_1, u_3, …, u_{t-2} and s(i) denotes any one of s_2, s_4, …, s_{t-3}. For the target other sentence, the tokenizer of the BERT model is first called to perform character-wise segmentation, yielding the segmented sequence, i.e., a token sequence.
A12, inputting the segmented sequence into the BERT model to obtain a statement matrix of the other target statements;
The hidden layer corresponding to each token at the topmost layer of the BERT model is taken out, and a matrix of a preset second size can be obtained by splicing; this matrix is the statement matrix of the target other sentence. That is, similar to the current dialogue matrix, the statement matrix is the matrix formed by all vectors output by the last layer of the BERT model after the target other sentence is input into the BERT model. For example only, the second size may be [64 x 768], where "64" is the length of the token sequence after 0-vector padding; since the token sequence contains only the target other sentence, this value is smaller than the "128" used in the current dialogue matrix, and it can be adjusted according to the needs of different application scenarios. The "768" is a fixed value, namely the dimension of the hidden layer output by the BERT model.
And A13, inputting the statement matrix into a second bidirectional GRU model to be trained to obtain statement vectors of the other target statements.
Here, since the bidirectional GRU model in step a13 is different from the first bidirectional GRU model, the bidirectional GRU model in step a13 is referred to as a second bidirectional GRU model, and the two directions of the second bidirectional GRU model are referred to as a third direction and a fourth direction. It should be noted that the second bidirectional GRU model is also one of the models to be trained.
Specifically, since the second bidirectional GRU model involves two directions, after the statement matrix is input into the second bidirectional GRU model, the output of the last hidden layer of the second bidirectional GRU model in the third direction is taken as the third output result, the output of the last hidden layer in the fourth direction is taken as the fourth output result, and the third output result and the fourth output result are spliced to obtain the statement vector of the target other sentence. For convenience of explanation, the statement vector of the target other sentence may be recorded as m_i. After being generated, a statement vector may be stored in a preset data cache. When the conversation reaches u_t of the current round, the statement vector of each of the other sentences can be obtained from the data cache, i.e., the statement vector m_1 of u_1, the statement vector m_2 of s_2, the statement vector m_3 of u_3, the statement vector m_4 of s_4, …, the statement vector m_{t-3} of s_{t-3}, and the statement vector m_{t-2} of u_{t-2}. The statement vectors of the other sentences form the semantic representation vector group [m_1, m_2, …, m_{t-2}].
A2, inputting each statement vector into a one-way GRU model to be trained;
Referring to fig. 4, the semantic representation vector group [m_1, m_2, …, m_{t-2}] may be input into the unidirectional GRU model. It should be noted that the unidirectional GRU model is also one of the models to be trained.
A3, using the vector output by the last hidden layer of the unidirectional GRU model as a memory coding vector;
a4, constructing an initial hidden layer of the first bidirectional GRU model based on the memory coding vector.
The memory coding vector is denoted as M. In the embodiment of the present application, the memory coding vector M is used as the initial hidden layer h_0 of the first bidirectional GRU model in both directions.
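A sketch of steps A1 to A4 follows, under the same assumed dimensions as the earlier sketches; the [64, 768] statement-matrix size comes from the example in step A12, and the module names are illustrative. Both GRUs below belong to the models to be trained.

```python
import torch
import torch.nn as nn

second_bigru = nn.GRU(768, 256, bidirectional=True, batch_first=True)  # sentence encoder
memory_gru = nn.GRU(512, 256, batch_first=True)                        # unidirectional memory encoder

def sentence_vector(stmt_matrix: torch.Tensor) -> torch.Tensor:
    """stmt_matrix: [64, 768] BERT output of one historical sentence.  Returns m_i,
    the concatenation of the last hidden states of the third and fourth directions."""
    _, h_n = second_bigru(stmt_matrix.unsqueeze(0))
    return torch.cat([h_n[0], h_n[1]], dim=-1)              # [1, 512]

def initial_hidden(history_matrices) -> torch.Tensor:
    """history_matrices: statement matrices of u_1, s_2, ..., u_{t-2}.  Runs the
    unidirectional GRU over [m_1, ..., m_{t-2}] and uses its last hidden state M as
    the initial hidden layer h_0 of the first bidirectional GRU in both directions."""
    m_group = torch.stack([sentence_vector(m) for m in history_matrices], dim=1)  # [1, t-2, 512]
    _, M = memory_gru(m_group)                               # [1, 1, 256]
    return M.repeat(2, 1, 1)                                 # [2, 1, 256], shared by both directions
```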
104, performing first linear transformation on the session semantic vector to obtain an intention vector, and performing second linear transformation on the session semantic vector to obtain a conversation behavior vector;
In the embodiment of the present application, the session semantic vector may be linearly transformed and mapped to obtain an intention vector and a dialogue behavior vector. To distinguish the two linear transformation operations, the linear transformation that yields the intention vector is referred to as the first linear transformation, and the linear transformation that yields the dialogue behavior vector is referred to as the second linear transformation.
Optionally, the operation of performing the first linear transformation on the session semantic vector to obtain the intention vector may specifically include:
b1, performing linear transformation on the session semantic vector according to a preset first parameter matrix to obtain a first intermediate vector;
The number of matrix rows of the first parameter matrix is determined by the dimension of the session semantic vector, the number of matrix columns of the first parameter matrix is determined by the total number of preset intention categories, and the dimension of the first intermediate vector equals the total number of intention categories. Specifically, the intention categories are proposed in advance by the developers; there may be a plurality of intention categories such as "listen to music", "book a flight", "go to a city" and "eat food", each intention category representing one intention. Assuming that the dimension of the session semantic vector is m and that the developers have proposed n intention categories in total, the size of the first parameter matrix is m x n. Performing linear transformation on the session semantic vector according to the preset first parameter matrix specifically means multiplying the session semantic vector by the first parameter matrix to obtain the n-dimensional first intermediate vector.
B2, transforming the first intermediate vector based on a preset first activation function to obtain an intention vector.
The first activation function is, specifically, the softmax function. The first intermediate vector can be converted into a probability vector, i.e., the intention vector, by softmax. Each dimension in the intention vector represents the probability that the user statement belongs to the corresponding intention category; that is, each dimension in the intention vector is the probability predicted by the semantic analysis model for a different intention. For example, if the intention category corresponding to the first dimension of the intention vector is "listen to music", the element of the first dimension (denoted p_1) is the probability, as predicted by the semantic analysis model, that the user statement u_t expresses the intention of "listening to music".
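A sketch of the first linear transformation and softmax, assuming the 512-dimensional session semantic vector from the earlier sketches and an illustrative n = 10 intention categories; the bias term added by `nn.Linear` is an implementation assumption not mentioned in the method above.

```python
import torch
import torch.nn as nn

n_intents = 10                             # assumed number of intention categories
intent_head = nn.Linear(512, n_intents)   # plays the role of the m x n first parameter matrix

def intent_vector(session_vec: torch.Tensor) -> torch.Tensor:
    """First linear transformation followed by softmax: each dimension of the result is
    the predicted probability that the user statement belongs to one intention category."""
    first_intermediate = intent_head(session_vec)   # n-dimensional first intermediate vector
    return torch.softmax(first_intermediate, dim=-1)
```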
Optionally, the performing a second linear transformation on the session semantic vector to obtain a dialog behavior vector may specifically include:
c1, performing linear transformation on the session semantic vector according to a preset second parameter matrix to obtain a second intermediate vector;
The number of matrix rows of the second parameter matrix is determined by the dimension of the session semantic vector, the number of matrix columns of the second parameter matrix is determined by the total number of preset dialogue behavior categories, and the dimension of the second intermediate vector equals the total number of dialogue behavior categories. Specifically, similar to the intention categories, the dialogue behavior categories are proposed in advance by the developers, each representing one kind of dialogue behavior. Assuming that the dimension of the session semantic vector is m and that the developers have proposed l dialogue behavior categories in total, the size of the second parameter matrix is m x l. Performing linear transformation on the session semantic vector according to the preset second parameter matrix specifically means multiplying the session semantic vector by the second parameter matrix to obtain the l-dimensional second intermediate vector.
And C2, transforming the second intermediate vector based on a preset second activation function to obtain a dialogue action vector.
The second activation function is, specifically, the sigmoid function. The second intermediate vector can be converted by sigmoid into a vector whose every dimension lies in the range [0, 1], i.e., the dialogue behavior vector. Each dimension in the dialogue behavior vector represents the score of the corresponding dialogue behavior category for the user statement; if the score of a certain dimension is greater than 0.5, the user statement is considered to contain the dialogue behavior corresponding to that dimension, i.e., the user statement hits that dialogue behavior. It should be noted that a user statement may hit multiple dialogue behaviors, i.e., there may be multiple dimensions whose scores are greater than 0.5.
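A matching sketch of the second linear transformation and sigmoid, with an illustrative l = 8 dialogue behavior categories; as above, the bias term is an implementation assumption.

```python
import torch
import torch.nn as nn

n_acts = 8                                # assumed number of dialogue behavior categories
act_head = nn.Linear(512, n_acts)         # plays the role of the m x l second parameter matrix

def dialog_act_vector(session_vec: torch.Tensor) -> torch.Tensor:
    """Second linear transformation followed by sigmoid: every dimension lies in [0, 1],
    and a score greater than 0.5 is treated as a hit of that dialogue behavior category."""
    second_intermediate = act_head(session_vec)     # l-dimensional second intermediate vector
    return torch.sigmoid(second_intermediate)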
105, calculating according to the intention vector to obtain an intention loss, and calculating according to the conversation behavior vector to obtain a conversation behavior loss;
In the embodiment of the application, each user statement carries an intention label y_intent and a dialogue behavior label y_act, where the intention label expresses the real intention of the user statement and the dialogue behavior label expresses the real dialogue behavior of the user statement. Therefore, during the training of the semantic analysis model, the intention loss can be calculated from the intention vector and the intention label y_intent of the user statement through a cross-entropy loss function, and the dialogue behavior loss can be calculated from the dialogue behavior vector and the dialogue behavior label y_act of the user statement through a multi-label BCE loss function.
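A sketch of the two losses, assuming the softmax and sigmoid outputs from the sketches above; because the intention vector is already a probability vector, cross entropy is written here as the negative log-likelihood of its logarithm.

```python
import torch
import torch.nn.functional as F

def intent_and_act_losses(intent_probs: torch.Tensor, act_scores: torch.Tensor,
                          y_intent: torch.Tensor, y_act: torch.Tensor):
    """intent_probs: [n] softmax output; y_intent: scalar index of the true intention.
    act_scores: [l] sigmoid output; y_act: [l] multi-hot dialogue behavior label.
    Cross entropy for the intention, multi-label BCE for the dialogue behaviors."""
    intent_loss = F.nll_loss(torch.log(intent_probs).unsqueeze(0), y_intent.unsqueeze(0))
    act_loss = F.binary_cross_entropy(act_scores, y_act)
    return intent_loss, act_loss

# Example with the assumed sizes n = 10 and l = 8:
probs = torch.softmax(torch.randn(10), dim=-1)
scores = torch.sigmoid(torch.randn(8))
print(intent_and_act_losses(probs, scores, torch.tensor(3), torch.zeros(8)))
```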
And 106, performing back propagation on the basis of the intention loss and the dialogue action loss, and updating model parameters of each model to be trained.
In this embodiment, the intention loss and the dialogue behavior loss may be added to obtain the total loss of the semantic analysis model, and the model parameters of each model to be trained may be updated through back propagation of the gradients, where the models to be trained include the first bidirectional GRU model, the second bidirectional GRU model, the unidirectional GRU model and the like, which is not limited herein.
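A sketch of one update step; the optimizer is assumed to be constructed by the caller over the parameters of the models to be trained (the two bidirectional GRUs, the unidirectional GRU and the two linear heads), while the pre-trained BERT stays frozen.

```python
import torch

def update_parameters(optimizer: torch.optim.Optimizer,
                      intent_loss: torch.Tensor,
                      act_loss: torch.Tensor) -> float:
    """One back-propagation step over the summed losses of the models to be trained."""
    total_loss = intent_loss + act_loss
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```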
Optionally, the semantic analysis model may further include an LSTM-CRF model, and the intelligent man-machine conversation model training method further includes: taking the hidden layer corresponding to each character of the user statement u_t in the first bidirectional GRU model as the input of the LSTM-CRF model, so as to recognize the named entities contained in the user statement. When the CRF probability transition matrix is initialized, the probability of transitioning from an entity label of any non-matching category to the I- label of a category is forcibly constrained to 0, kept constant, and not adjusted during training; equivalently, if the CRF probability transition matrix is a log-probability matrix, the corresponding entries are set to a very small value such as -10000. In other words, for the positions in the CRF probability transition matrix that express state transitions of extremely small possibility, the values can be forcibly set to 0 or to a very small value. During training, according to the maximum likelihood estimation principle, the negative logarithm of the probability of the input entity sequence among all possible entity sequences is taken as the structured loss function, so as to obtain the structured loss of the semantic analysis model.
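A sketch of the transition-matrix constraint, assuming a BIO-style entity tag set; the tag names and the -10000 log-probability constant are illustrative.

```python
import torch

NEG_INF = -10000.0

def constrained_transitions(tags):
    """Build a [num_tags, num_tags] log-transition matrix in which moving from any label
    of a different category (or from O) to an I- label is fixed to a very small constant
    that is kept constant and excluded from training."""
    n = len(tags)
    trans = torch.zeros(n, n)
    for i, src in enumerate(tags):
        for j, dst in enumerate(tags):
            if dst.startswith("I-"):
                same_category = src.startswith(("B-", "I-")) and src[2:] == dst[2:]
                if not same_category:
                    trans[i, j] = NEG_INF
    return trans

print(constrained_transitions(["O", "B-singer", "I-singer", "B-album", "I-album"]))
```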
Optionally, in the case of a structured loss function, the total loss of the semantic analysis model should also take into account the structured loss function, so the step 105 can be expressed as:
dimension adjustment is carried out on the structured loss obtained based on the structured loss function, wherein the operation of the dimension adjustment is specifically to divide the structured loss by (100 times the sequence length of the user statement of the current round);
adding the dimension-adjusted structured loss to the intention loss and the dialogue behavior loss to obtain the total loss of the semantic analysis model;
accordingly, the above step 106 may be represented as:
and performing back propagation through the total loss, and updating the model parameters of each model to be trained.
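A sketch of the adjusted total loss, assuming the structured loss comes from the LSTM-CRF model described above; back propagation is then performed on the returned value, as in the earlier update sketch.

```python
import torch

def total_loss(intent_loss: torch.Tensor, act_loss: torch.Tensor,
               structured_loss: torch.Tensor, seq_len: int) -> torch.Tensor:
    """Divide the structured (CRF) loss by 100 times the sequence length of the current
    user statement, then add it to the intention loss and the dialogue behavior loss."""
    scaled_structured = structured_loss / (100.0 * seq_len)
    return intent_loss + act_loss + scaled_structured
```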
Optionally, the semantic analysis model may also perform training and prediction in combination with dictionary information. On the premise that dictionaries are available, in order to assist word-slot recognition with dictionary information, the embodiment of the present application may match the words in the user statement u_t against the relevant dictionaries. The specific process is as follows:
respectively constructing a hash index for each dictionary, wherein keys in the hash index are vocabularies in the dictionary;
generating a 0 matrix whose number of rows equals the number of dictionaries and whose number of columns equals the length of the user statement u_t; this matrix may be denoted as the dictionary matching matrix;
traversing, in order of word-fragment (n-gram) length from long to short (i.e., from fragments as long as the user statement u_t down to fragments of length 1), all word fragments in the sentence that have not yet been marked;
inquiring whether the traversed word fragment appears in the corresponding dictionary; if so, assigning values to the elements of the corresponding rows and columns in the dictionary matching matrix, specifically: the element whose column corresponds to the first character of the word fragment (for English, the first token, as determined by the segmentation method of the BERT model) and whose row corresponds to the matching dictionary is set to 1, and the elements whose columns correspond to the subsequent characters of the word fragment and whose row corresponds to the matching dictionary are set to -1;
extracting the column vector corresponding to each character as the encoding of that character's dictionary matching information;
When the semantic analysis model is trained, the column vector corresponding to each character can be spliced with the vector of that character in the current dialogue matrix output by BERT and input into the subsequent first bidirectional GRU network, so that the dictionary information is fused into the semantic analysis model. It should be noted that, during training, in order to preserve the model's ability to recognize entities outside the dictionaries and to avoid over-reliance on dictionary matching information, a preset proportion of the entries in the dictionaries can be randomly masked; the preset proportion may be 50% or another value set by the developers, and is not limited here. During prediction, all entries should be matched in order to ensure dictionary coverage. There are two masking modes: in the first, a preset proportion of entries in each dictionary is randomly masked when the semantic analysis model is initialized, and these entries remain masked throughout training; in the second, at the beginning of each training round, a preset proportion of entries is randomly re-selected for each dictionary and masked. That is, the first mode masks the same set of entries from beginning to end (the masked entries are fixed), while the second mode changes the masked set in every training round (the masked entries vary dynamically). In the embodiment of the present application, the first mode is preferred, so that the trained semantic analysis model has stronger learning adaptability.
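A sketch of the first (preferred) masking mode, assuming each dictionary is a plain Python set; the 50% ratio, the fixed seed and the example entries are illustrative.

```python
import random

def mask_dictionary(entries, mask_ratio=0.5, seed=0):
    """Randomly hide a fixed fraction of one dictionary's entries at model initialisation
    and keep the same masked set for the whole training run."""
    rng = random.Random(seed)
    return {e for e in entries if rng.random() >= mask_ratio}

singer_dict = {"周杰伦", "Jay", "林俊杰"}          # hypothetical dictionary contents
print(mask_dictionary(singer_dict))               # subset actually used during training
```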
To better illustrate the influence of the dictionary information on the semantic analysis model, the following description is given by using a specific example:
Assume there are 3 dictionary categories: singer, album and song. The input user statement is "我想听周杰伦的Jay专辑" ("I want to listen to Jay Chou's album Jay"). The fragment "周杰伦" hits the singer dictionary, so the element whose column corresponds to the first character of the fragment (i.e., "周") and whose row is labeled singer is assigned "1", and the elements whose columns correspond to the subsequent characters of the fragment (i.e., "杰" and "伦") and whose row is labeled singer are assigned "-1". In the case of hitting multiple dictionaries, for example the fragment "Jay" hits both singer and album, the assignments can all be made without conflict. On this basis, the following dictionary matching matrix can be obtained:
         我   想   听   周   杰   伦   的   Jay   专   辑
singer    0    0    0    1   -1   -1    0    1     0    0
album     0    0    0    0    0    0    0    1     0    0
song      0    0    0    0    0    0    0    0     0    0
According to the dictionary matching matrix, the column vector corresponding to the character "周" is [1, 0, 0]. Therefore, [1, 0, 0] can be spliced onto the vector output by the BERT model for the character "周" to form a vector of 768 + 3 dimensions, and these vectors are input together into the subsequent first bidirectional GRU model.
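A sketch of the matching procedure described above; the token list, the dictionary contents and the use of a dense tensor (instead of the hash index plus 0 matrix wording) are illustrative assumptions. Running it on the example sentence reproduces the matrix in the table above.

```python
import torch

def dictionary_match_matrix(tokens, dictionaries):
    """tokens: the user statement segmented the same way as for BERT (one Chinese character
    or one English token per element); dictionaries: an ordered mapping name -> set of
    entries.  Word fragments are tried from long to short; for a hit, the first token's
    column gets 1 and the following tokens get -1 in the row of the matching dictionary."""
    names = list(dictionaries)
    mat = torch.zeros(len(names), len(tokens))
    marked = [False] * len(tokens)
    for length in range(len(tokens), 0, -1):             # n-grams from long to short
        for start in range(len(tokens) - length + 1):
            if any(marked[start:start + length]):        # skip fragments already marked
                continue
            fragment = "".join(tokens[start:start + length])
            hit = False
            for row, name in enumerate(names):
                if fragment in dictionaries[name]:
                    mat[row, start] = 1
                    mat[row, start + 1:start + length] = -1
                    hit = True
            if hit:
                for k in range(start, start + length):
                    marked[k] = True
    return mat

tokens = ["我", "想", "听", "周", "杰", "伦", "的", "Jay", "专", "辑"]
dicts = {"singer": {"周杰伦", "Jay"}, "album": {"Jay"}, "song": set()}
print(dictionary_match_matrix(tokens, dicts))   # reproduces the matrix shown in the table
```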
The semantic analysis model provided by the scheme of the present application combines a BERT model, a plurality of bidirectional GRU models, a unidirectional GRU model and an LSTM-CRF model. By fusing the BERT model and the GRU models to encode historical memory, and by incorporating the system statement of the previous round during semantic analysis, a semantic analysis model with higher accuracy can be obtained. By constraining the probability transition matrix of the LSTM-CRF model, entity recognition sequences that violate the labeling rules can be avoided, and the entity recognition effect can be further improved in combination with dictionary information. The model can be widely applied to systems that depend on human-computer interaction, such as robots, smart speakers, intelligent customer service, voice assistants and intelligent diagnosis, and shows a significant advantage in the accuracy of processing user statements.
Example two
A second embodiment of the present application provides an intelligent human-machine interaction model training apparatus, which can be integrated in an electronic device, as shown in fig. 5, the intelligent human-machine interaction model training apparatus 500 in the embodiment of the present application includes:
a sentence obtaining unit 501, configured to obtain a user sentence in a current round and a system sentence in a previous round;
a dialog matrix obtaining unit 502, configured to splice and input the user statement and the system statement into a BERT model to obtain a current dialog matrix;
a session semantic vector obtaining unit 503, configured to input the current dialog matrix into a first bidirectional GRU model to be trained, to obtain a session semantic vector, where the session semantic vector is obtained by splicing a first output result in a first direction and a second output result in a second direction based on the first bidirectional GRU model, an initial hidden layer of the first bidirectional GRU model is constructed based on other sentences, and the other sentences are sentences of a history turn except the system sentences;
a linear transformation unit 504, configured to perform a first linear transformation on the session semantic vector to obtain an intention vector, and perform a second linear transformation on the session semantic vector to obtain a dialogue behavior vector;
a loss calculating unit 505, configured to calculate an intention loss according to the intention vector, and calculate a dialogue action loss according to the dialogue action vector;
a parameter updating unit 506, configured to perform back propagation on the basis of the intention loss and the dialogue action loss, and update model parameters of each model to be trained.
Optionally, the intelligent human-machine interaction model training device 500 further includes:
the statement vector acquisition unit is used for respectively acquiring statement vectors of other statements, wherein the statement vectors are semantic representations of the statements;
the sentence vector input unit is used for inputting each sentence vector into the unidirectional GRU model to be trained;
a memory coding vector obtaining unit, configured to use a vector output by a last hidden layer of the unidirectional GRU model as a memory coding vector;
an initial hidden layer constructing unit, configured to construct an initial hidden layer of the first bidirectional GRU model based on the memory coding vector.
Optionally, the statement vector obtaining unit includes:
the segmentation processing subunit is used for performing word-based segmentation processing on other target statements to obtain a segmented sequence, wherein the other target statements are any one of the other statements;
a statement matrix obtaining subunit, configured to input the segmented sequence into the BERT model, so as to obtain a statement matrix of the other target statements;
and a statement vector obtaining subunit, configured to input the statement matrix into a second bidirectional GRU model to be trained, so as to obtain a statement vector of the target other statement, where the statement vector of the target other statement is obtained by splicing a third output result in the third direction and a fourth output result in the fourth direction based on the second bidirectional GRU model.
Optionally, the term vector obtaining subunit is specifically configured to, after the term matrix is input to the second bidirectional GRU model, take an output of a last hidden layer of the second bidirectional GRU model in the third direction as a third output result, take an output of a last hidden layer of the second bidirectional GRU model in the fourth direction as a fourth output result, and concatenate the third output result and the fourth output result to obtain the term vector of the target other term.
Optionally, the session semantic vector obtaining unit 503 is specifically configured to, after the current dialog matrix is input to the first bidirectional GRU model, take an output of a last hidden layer of the first bidirectional GRU model in the first direction as a first output result, take an output of a last hidden layer of the first bidirectional GRU model in the second direction as a second output result, and splice the first output result and the second output result to obtain the session semantic vector.
Optionally, the linear transformation unit 504 includes:
a first linear transformation subunit, configured to perform linear transformation on the session semantic vector according to a preset first parameter matrix to obtain a first intermediate vector, where a matrix row number of the first parameter matrix is determined according to a dimension of the session semantic vector, a matrix column number of the first parameter matrix is determined according to a preset total number of intention categories, and the dimension of the first intermediate vector is the total number of the intention categories;
and the first activation subunit is configured to transform the first intermediate vector based on a preset first activation function to obtain an intention vector, where each dimension in the intention vector is used to represent a probability that the user statement belongs to each intention category.
Optionally, the linear transformation unit 504 includes:
a second linear transformation subunit, configured to perform linear transformation on the session semantic vector according to a preset second parameter matrix to obtain a second intermediate vector, where a matrix row number of the second parameter matrix is determined according to a dimension of the session semantic vector, a matrix column number of the second parameter matrix is determined according to a preset total number of dialog behavior categories, and the dimension of the second intermediate vector is the total number of the dialog behavior categories;
and the second activation subunit is configured to transform the second intermediate vector based on a preset second activation function to obtain a dialogue behavior vector, where each dimension in the dialogue behavior vector is used to represent the score of each dialogue behavior category hit by the user statement.
Optionally, the intelligent human-machine interaction model training device 500 further includes:
an LSTM-CRF input acquisition unit, for taking the hidden layer corresponding to each character of the user statement in the first bidirectional GRU model as the input of the LSTM-CRF model;
the forced constraint unit is used for forcibly constraining, when the CRF probability transition matrix is initialized, the probability of transitioning from an entity label of any non-matching category to the I- label of a category to a preset value, and keeping it constant and not adjusted during training;
and the structural loss calculating unit is used for taking the negative logarithm of the probability of the input entity sequence appearing in all possible entity sequences as a structural loss function according to the maximum likelihood estimation principle so as to obtain the structural loss of the semantic analysis model.
Optionally, the loss calculating unit 505 includes:
a structural loss adjustment subunit, configured to perform dimensional adjustment on the structural loss obtained based on the structural loss function, where the dimensional adjustment is specifically performed by dividing the structural loss by (100 × sequence length);
a total loss determining subunit, configured to add the dimension-adjusted structured loss to the intent loss and the dialogue action loss to obtain a total loss of the semantic analysis model;
accordingly, the parameter updating unit 506 is specifically configured to update the model parameters of each model to be trained by performing back propagation on the total loss.
Optionally, the intelligent human-machine interaction model training device 500 further includes:
the hash index establishing unit is used for respectively establishing a hash index for each dictionary, wherein keys in the hash index are words in the dictionaries;
a dictionary matching matrix initialization unit, for generating a 0 matrix whose number of rows equals the number of dictionaries and whose number of columns equals the length of the user statement u_t, which may be denoted as the dictionary matching matrix;
the traversal subunit is used for sequentially traversing all the word fragments which are not marked in the sentence based on the sequence of the length of the word fragments from long to short;
the query unit is used for querying whether the traversed word segments appear in the corresponding dictionary or not;
and the element assignment unit is used for assigning values to elements of corresponding rows and corresponding columns in the dictionary matching matrix if the traversed word segments appear in the corresponding dictionary, and specifically comprises the following steps: marking the column as the first character of the character fragment, marking the row as the element of the corresponding dictionary as 1, marking the column as the subsequent character of the character fragment, and marking the row as the element of the corresponding dictionary as-1;
a coding unit for extracting column vectors corresponding to the characters as codes of the character dictionary matching information;
correspondingly, the session semantic vector obtaining unit is specifically configured to splice together the column vector corresponding to each word and a vector related to the word in the current dialog matrix output by the BERT, and input the spliced column vector and the vector into a first bidirectional GRU network to be trained to obtain a session semantic vector.
Optionally, the intelligent human-machine interaction model training device 500 further includes:
and the shielding unit is used for randomly shielding entries with preset proportion in each dictionary before training begins.
The intelligent man-machine conversation model training device provided by the scheme of the present application can train a semantic analysis model that combines a BERT model, a plurality of bidirectional GRU models, a unidirectional GRU model and an LSTM-CRF model. By fusing the BERT model and the GRU models to encode historical memory and incorporating the system statement of the previous round during semantic analysis, a semantic analysis model with higher accuracy can be obtained, and entity recognition sequences that violate the labeling rules can be avoided by constraining the probability transition matrix of the LSTM-CRF model.
EXAMPLE III
Referring to fig. 6, an electronic device 6 in the embodiment of the present application includes: a memory 601, one or more processors 602 (only one shown in fig. 6), and computer programs stored on the memory 601 and executable on the processors. Wherein: the memory 601 is used for storing software programs and modules, and the processor 602 executes various functional applications and data processing by running the software programs and units stored in the memory 601, so as to acquire resources corresponding to the preset events. Specifically, the processor 602 implements the following steps by running the above-mentioned computer program stored in the memory 601:
acquiring user sentences of a current round and system sentences of a previous round;
splicing the user statement and the system statement and inputting the user statement and the system statement into a BERT model to obtain a current dialogue matrix;
inputting the current dialog matrix into a first bidirectional GRU model to be trained to obtain a session semantic vector, wherein the session semantic vector is obtained by splicing a first output result of the first bidirectional GRU model in a first direction and a second output result of the first bidirectional GRU model in a second direction, an initial hidden layer of the first bidirectional GRU model is constructed based on other sentences, and the other sentences are sentences of historical turns except the system sentences;
performing first linear transformation on the session semantic vector to obtain an intention vector, and performing second linear transformation on the session semantic vector to obtain a conversation behavior vector;
calculating according to the intention vector to obtain intention loss, and calculating according to the conversation behavior vector to obtain conversation behavior loss;
and carrying out back propagation on the basis of the intention loss and the dialogue action loss, and updating the model parameters of each model to be trained.
Assuming that the above is the first possible implementation manner, in a second possible implementation manner provided on the basis of the first possible implementation manner, the processor 602 further implements the following steps by running the above computer program stored in the memory 601:
respectively obtaining statement vectors of other statements, wherein the statement vectors are semantic representations of the statements;
inputting each statement vector into a one-way GRU model to be trained;
taking the vector output by the last hidden layer of the unidirectional GRU model as a memory coding vector;
an initial concealment layer for the first bi-directional GRU model is constructed based on the memory-encoded vector.
In a third possible implementation manner provided on the basis of the second possible implementation manner, the obtaining statement vectors of the respective other statements includes:
performing character-based segmentation processing on a target other sentence to obtain a segmented sequence, wherein the target other sentence is any one of the other sentences;
inputting the segmented sequence into the BERT model to obtain a statement matrix of the other target statements;
and inputting the statement matrix into a second bidirectional GRU model to be trained to obtain a statement vector of the target other statement, wherein the statement vector of the target other statement is obtained by splicing a third output result in a third direction and a fourth output result in a fourth direction based on the second bidirectional GRU model.
In a fourth possible implementation manner provided on the basis of the third possible implementation manner, the inputting the statement matrix into a second bidirectional GRU model to be trained to obtain a statement vector of the target other statement includes:
after the statement matrix is input into the second bidirectional GRU model, taking an output of the last hidden layer of the second bidirectional GRU model in the third direction as a third output result;
taking an output of a last hidden layer of the second bidirectional GRU model in the fourth direction as a fourth output result;
and splicing the third output result and the fourth output result to obtain the statement vector of the target other statement.
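The statement vector of a single historical statement can be sketched in the same style, assuming that the third and fourth directions are the forward and backward passes of the second bidirectional GRU and that the BERT statement matrix of the target other statement has already been computed from its character-level segmentation; the dimensions are again illustrative.

```python
import torch
import torch.nn as nn

bert_dim, hidden = 768, 256
second_gru = nn.GRU(bert_dim, hidden, batch_first=True, bidirectional=True)

stmt_matrix = torch.randn(1, 12, bert_dim)       # BERT statement matrix: [batch, number_of_characters, bert_dim]
_, h_n = second_gru(stmt_matrix)                 # h_n: [2, batch, hidden]
third_output, fourth_output = h_n[0], h_n[1]     # output of the last hidden layer in each direction
stmt_vector = torch.cat([third_output, fourth_output], dim=-1)   # statement vector: [batch, 2 * hidden]
```

The 2 * hidden = 512-dimensional statement vectors obtained in this way are exactly the inputs assumed for the memory-encoding sketch above, and the session semantic vector of the fifth possible implementation manner below is spliced from the first bidirectional GRU in the same way.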
In a fifth possible implementation manner provided on the basis of the first possible implementation manner, the second possible implementation manner, the third possible implementation manner, or the fourth possible implementation manner, the inputting the current dialog matrix into the first bidirectional GRU model to be trained to obtain a session semantic vector includes:
after the current dialog matrix is input into the first bidirectional GRU model, taking an output of the last hidden layer of the first bidirectional GRU model in the first direction as a first output result;
taking an output of a last hidden layer of the first bidirectional GRU model in the second direction as a second output result;
and splicing the first output result and the second output result to obtain the session semantic vector.
In a sixth possible implementation manner provided on the basis of the first possible implementation manner, the second possible implementation manner, the third possible implementation manner, or the fourth possible implementation manner, the processor 602, by executing the computer program stored in the memory 601, further implements the following steps:
the performing the first linear transformation on the session semantic vector to obtain an intention vector includes:
performing linear transformation on the session semantic vector according to a preset first parameter matrix to obtain a first intermediate vector, wherein the number of matrix rows of the first parameter matrix is determined according to the dimension of the session semantic vector, the number of matrix columns of the first parameter matrix is determined according to the total number of preset intention categories, and the dimension of the first intermediate vector is the total number of the intention categories;
and transforming the first intermediate vector based on a preset first activation function to obtain an intention vector, wherein each dimension in the intention vector is used for representing the probability that the user statement belongs to each intention category.
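A short sketch of this intention head under the same assumed dimensions: the first parameter matrix has as many rows as the session semantic vector has dimensions and as many columns as there are intention categories, and softmax is used here as an assumed choice of the preset first activation function so that each dimension of the intention vector becomes a class probability.

```python
import torch

dim, num_intents = 512, 10                # session-vector dimension and intention-category total (illustrative)
W1 = torch.randn(dim, num_intents)        # first parameter matrix: dim rows, num_intents columns

session_vec = torch.randn(1, dim)         # session semantic vector
first_intermediate = session_vec @ W1     # first intermediate vector: one value per intention category
intent_vec = torch.softmax(first_intermediate, dim=-1)   # probability of each intention category
```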
In a seventh possible implementation manner provided on the basis of the first possible implementation manner, the second possible implementation manner, the third possible implementation manner, or the fourth possible implementation manner, the performing the second linear transformation on the session semantic vector to obtain the conversation behavior vector includes:
performing linear transformation on the session semantic vector according to a preset second parameter matrix to obtain a second intermediate vector, wherein the number of matrix rows of the second parameter matrix is determined according to the dimension of the session semantic vector, the number of matrix columns of the second parameter matrix is determined according to the total number of preset dialogue behavior categories, and the dimension of the second intermediate vector is the total number of the dialogue behavior categories;
and transforming the second intermediate vector based on a preset second activation function to obtain a conversation behavior vector, wherein each dimension in the conversation behavior vector is used for expressing the score of each conversation behavior category hit by the user statement.
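The conversation behavior head follows the same pattern; sigmoid is used here as an assumed choice of the preset second activation function, which is consistent with each dimension being an independent score (a statement may hit several conversation behavior categories at once).

```python
import torch

dim, num_acts = 512, 6                    # session-vector dimension and conversation behavior category total (illustrative)
W2 = torch.randn(dim, num_acts)           # second parameter matrix: dim rows, num_acts columns

session_vec = torch.randn(1, dim)
second_intermediate = session_vec @ W2    # second intermediate vector: one value per conversation behavior category
act_vec = torch.sigmoid(second_intermediate)   # score of each conversation behavior category hit by the user statement
```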
It should be understood that, in the embodiment of the present application, the processor 602 may be a Central Processing Unit (CPU), and may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 601 may include a read-only memory and a random access memory, and provides instructions and data to the processor 602. A part or all of the memory 601 may also include a non-volatile random access memory. For example, the memory 601 may also store device type information.
The electronic device provided by the scheme of the application can train a semantic analysis model that fuses a BERT model, a plurality of bidirectional GRU models, a unidirectional GRU model and an LSTM-CRF model. By fusing the BERT model with a GRU model that encodes history memory, and by incorporating the system statement of the previous round during semantic analysis, a semantic analysis model with higher accuracy can be obtained; and by constraining the probability transition matrix of the LSTM-CRF model, entity recognition sequences that violate the tagging rules can be avoided.
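The constraint on the probability transition matrix mentioned above can be illustrated as follows; the BIO tag set, the masking value and the masking rule are assumptions showing one common way of forbidding invalid transitions, and the actual LSTM-CRF layer may realize the constraint differently.

```python
import torch

tags = ["O", "B-LOC", "I-LOC", "B-PER", "I-PER"]   # illustrative BIO entity tag set
n = len(tags)
transitions = torch.randn(n, n)                    # transitions[i, j]: score of moving from tag i to tag j

NEG_INF = -1e4                                     # a very low score makes a transition practically impossible
for i, prev in enumerate(tags):
    for j, curr in enumerate(tags):
        # an I- tag may only follow a B- or I- tag of the same entity type,
        # so sequences such as "O, I-LOC" or "B-PER, I-LOC" are ruled out
        if curr.startswith("I-") and prev[2:] != curr[2:]:
            transitions[i, j] = NEG_INF
```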
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned functions may be distributed as different functional units and modules according to needs, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the above-described modules or units is only one logical functional division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The integrated unit may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a separate product. Based on such understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, realizes the steps of the method embodiments described above. The computer program includes computer program code, and the computer program code may be in a source code form, an object code form, an executable file or some intermediate form. The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer-readable memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, the computer-readable storage medium does not include electrical carrier signals and telecommunication signals in accordance with legislation and patent practice.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. An intelligent human-computer dialogue model training method is characterized by comprising the following steps:
acquiring user sentences of a current round and system sentences of a previous round;
splicing the user statement and the system statement and inputting the user statement and the system statement into a BERT model to obtain a current dialogue matrix;
inputting the current dialog matrix into a first bidirectional GRU model to be trained to obtain a session semantic vector, wherein the session semantic vector is obtained by splicing a first output result of the first bidirectional GRU model in a first direction and a second output result of the first bidirectional GRU model in a second direction, an initial hidden layer of the first bidirectional GRU model is constructed based on other sentences, and the other sentences are sentences of a historical turn except the system sentences;
performing first linear transformation on the session semantic vector to obtain an intention vector, and performing second linear transformation on the session semantic vector to obtain a conversation behavior vector;
calculating according to the intention vector to obtain intention loss, and calculating according to the conversation behavior vector to obtain conversation behavior loss;
and carrying out back propagation on the basis of the intention loss and the conversation behavior loss, and updating the model parameters of each model to be trained.
2. The intelligent human-machine dialog model training method of claim 1, further comprising:
respectively obtaining statement vectors of other statements, wherein the statement vectors are semantic representations of the statements;
inputting each statement vector into a one-way GRU model to be trained;
taking a vector output by the last hidden layer of the unidirectional GRU model as a memory coding vector;
constructing an initial hidden layer of the first bidirectional GRU model based on the memory coding vector.
3. The intelligent human-machine dialogue model training method of claim 2, wherein the obtaining of the sentence vector of each of the other sentences, respectively, comprises:
performing word-based segmentation processing on other target sentences to obtain segmented sequences, wherein the other target sentences are any one of the other sentences;
inputting the segmented sequence into the BERT model to obtain a statement matrix of the other target statements;
and inputting the statement matrix into a second bidirectional GRU model to be trained to obtain statement vectors of the other target statements, wherein the statement vectors of the other target statements are obtained by splicing a third output result in a third direction and a fourth output result in a fourth direction based on the second bidirectional GRU model.
4. The intelligent human-machine dialogue model training method of claim 3, wherein the inputting the sentence matrix into a second bidirectional GRU model to be trained to obtain a sentence vector of the target other sentence comprises:
after inputting the statement matrix to the second bidirectional GRU model, taking an output of the last hidden layer of the second bidirectional GRU model in the third direction as a third output result;
taking an output of a last hidden layer of the second bidirectional GRU model in the fourth direction as a fourth output result;
and splicing the third output result and the fourth output result to obtain the statement vectors of the other target statements.
5. The intelligent human-machine conversation model training method according to any one of claims 1 to 4, wherein the inputting the current conversation matrix into the first bidirectional GRU model to be trained to obtain a session semantic vector comprises:
after inputting the current dialog matrix to the first bidirectional GRU model, taking an output of the last hidden layer of the first bidirectional GRU model in the first direction as a first output result;
taking an output of a last hidden layer of the first bidirectional GRU model in the second direction as a second output result;
and splicing the first output result and the second output result to obtain the session semantic vector.
6. The intelligent human-machine conversation model training method according to any one of claims 1 to 4, wherein the performing the first linear transformation on the session semantic vector to obtain an intention vector comprises:
performing linear transformation on the session semantic vector according to a preset first parameter matrix to obtain a first intermediate vector, wherein the number of matrix rows of the first parameter matrix is determined according to the dimension of the session semantic vector, the number of matrix columns of the first parameter matrix is determined according to the total number of preset intention categories, and the dimension of the first intermediate vector is the total number of the intention categories;
and transforming the first intermediate vector based on a preset first activation function to obtain an intention vector, wherein each dimension in the intention vector is used for representing the probability that the user statement belongs to each intention category.
7. The intelligent human-machine conversation model training method according to any one of claims 1 to 4, wherein the performing the second linear transformation on the session semantic vector to obtain a conversation behavior vector comprises:
performing linear transformation on the session semantic vector according to a preset second parameter matrix to obtain a second intermediate vector, wherein the number of matrix rows of the second parameter matrix is determined according to the dimension of the session semantic vector, the number of matrix columns of the second parameter matrix is determined according to the total number of preset dialogue behavior categories, and the dimension of the second intermediate vector is the total number of the dialogue behavior categories;
and transforming the second intermediate vector based on a preset second activation function to obtain a conversation behavior vector, wherein each dimension in the conversation behavior vector is used for expressing the score of each conversation behavior category hit by the user statement.
8. An intelligent human-machine dialogue model training device, comprising:
the sentence acquisition unit is used for acquiring user sentences of the current round and system sentences of the previous round;
the dialogue matrix acquisition unit is used for splicing the user statement and the system statement and inputting the user statement and the system statement into a BERT model to obtain a current dialogue matrix;
a session semantic vector obtaining unit, configured to input the current dialog matrix into a first bidirectional GRU model to be trained, to obtain a session semantic vector, where the session semantic vector is obtained by splicing a first output result in a first direction and a second output result in a second direction based on the first bidirectional GRU model, an initial hidden layer of the first bidirectional GRU model is constructed based on other sentences, and the other sentences are sentences of a history turn except the system sentences;
the linear transformation unit is used for carrying out first linear transformation on the conversation semantic vector to obtain an intention vector and carrying out second linear transformation on the conversation semantic vector to obtain a conversation behavior vector;
the loss calculation unit is used for calculating according to the intention vector to obtain intention loss and calculating according to the conversation behavior vector to obtain conversation behavior loss;
and the parameter updating unit is used for carrying out back propagation on the basis of the intention loss and the conversation behavior loss and updating the model parameters of each model to be trained.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202010187709.2A 2020-03-17 2020-03-17 Intelligent man-machine conversation model training method, model training device and electronic equipment Active CN111460115B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010187709.2A CN111460115B (en) 2020-03-17 2020-03-17 Intelligent man-machine conversation model training method, model training device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010187709.2A CN111460115B (en) 2020-03-17 2020-03-17 Intelligent man-machine conversation model training method, model training device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111460115A true CN111460115A (en) 2020-07-28
CN111460115B CN111460115B (en) 2023-05-26

Family

ID=71682088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010187709.2A Active CN111460115B (en) 2020-03-17 2020-03-17 Intelligent man-machine conversation model training method, model training device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111460115B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190164550A1 (en) * 2017-11-28 2019-05-30 Toyota Jidosha Kabushiki Kaisha Voice dialogue apparatus, voice dialogue method, and non-transitory computer readable media
CN109522545A (en) * 2018-10-11 2019-03-26 华东师范大学 A kind of appraisal procedure that more wheels are talked with coherent property amount
CN109885670A (en) * 2019-02-13 2019-06-14 北京航空航天大学 A kind of interaction attention coding sentiment analysis method towards topic text

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528668A (en) * 2020-11-27 2021-03-19 湖北大学 Deep emotion semantic recognition method, system, medium, computer equipment and terminal
WO2022142823A1 (en) * 2020-12-29 2022-07-07 深圳市优必选科技股份有限公司 Human-machine conversation method and apparatus, computer device, and readable storage medium
CN112579760A (en) * 2020-12-29 2021-03-30 深圳市优必选科技股份有限公司 Man-machine conversation method and device, computer equipment and readable storage medium
CN112579760B (en) * 2020-12-29 2024-01-19 深圳市优必选科技股份有限公司 Man-machine conversation method, device, computer equipment and readable storage medium
CN113158691A (en) * 2021-04-15 2021-07-23 清华大学 Dialogue method and device based on mixed knowledge management and electronic equipment
CN113158691B (en) * 2021-04-15 2023-02-28 清华大学 Dialogue method and device based on mixed knowledge management and electronic equipment
CN113139045B (en) * 2021-05-13 2023-05-05 八维(杭州)科技有限公司 Selective question-answering method based on task-driven man-machine dialogue
CN113139045A (en) * 2021-05-13 2021-07-20 八维(杭州)科技有限公司 Selective question-answering method based on task driving type man-machine conversation
CN113326365A (en) * 2021-06-24 2021-08-31 中国平安人寿保险股份有限公司 Reply statement generation method, device, equipment and storage medium
CN113326365B (en) * 2021-06-24 2023-11-07 中国平安人寿保险股份有限公司 Reply sentence generation method, device, equipment and storage medium
CN113934825A (en) * 2021-12-21 2022-01-14 北京云迹科技有限公司 Question answering method and device and electronic equipment
WO2023137903A1 (en) * 2022-01-22 2023-07-27 平安科技(深圳)有限公司 Reply statement determination method and apparatus based on rough semantics, and electronic device
CN114490994A (en) * 2022-03-28 2022-05-13 北京沃丰时代数据科技有限公司 Conversation management method and device

Also Published As

Publication number Publication date
CN111460115B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN111460115B (en) Intelligent man-machine conversation model training method, model training device and electronic equipment
CN111191016B (en) Multi-round dialogue processing method and device and computing equipment
CN108417210B (en) Word embedding language model training method, word recognition method and system
US20180349327A1 (en) Text error correction method and apparatus based on recurrent neural network of artificial intelligence
CN113239700A (en) Text semantic matching device, system, method and storage medium for improving BERT
CN112100354B (en) Man-machine conversation method, device, equipment and storage medium
KR102254612B1 (en) method and device for retelling text, server and storage medium
CN111026857B (en) Conversation state tracking method, man-machine conversation method and system
CN110888966A (en) Natural language question-answer
CN111602128A (en) Computer-implemented method and system for determining
CN110678882B (en) Method and system for selecting answer spans from electronic documents using machine learning
JP7335300B2 (en) Knowledge pre-trained model training method, apparatus and electronic equipment
CN111507088A (en) Sentence completion method, equipment and readable storage medium
CN116304748B (en) Text similarity calculation method, system, equipment and medium
CN111859940B (en) Keyword extraction method and device, electronic equipment and storage medium
CN113158687B (en) Semantic disambiguation method and device, storage medium and electronic device
CN111079418A (en) Named body recognition method and device, electronic equipment and storage medium
CN112668333A (en) Named entity recognition method and device, and computer-readable storage medium
Ostendorf Continuous-space language processing: Beyond word embeddings
CN108763202A (en) Method, apparatus, equipment and the readable storage medium storing program for executing of the sensitive text of identification
CN115033733A (en) Audio text pair generation method, electronic device and storage medium
CN112487813B (en) Named entity recognition method and system, electronic equipment and storage medium
WO2022022049A1 (en) Long difficult text sentence compression method and apparatus, computer device, and storage medium
CN113095082A (en) Method, device, computer device and computer readable storage medium for text processing based on multitask model
CN110750967B (en) Pronunciation labeling method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant