CN111460115B - Intelligent man-machine conversation model training method, model training device and electronic equipment - Google Patents

Intelligent man-machine conversation model training method, model training device and electronic equipment Download PDF

Info

Publication number
CN111460115B
CN111460115B CN202010187709.2A
Authority
CN
China
Prior art keywords
vector
model
matrix
sentences
dialogue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010187709.2A
Other languages
Chinese (zh)
Other versions
CN111460115A (en)
Inventor
马力
熊为星
庞建新
熊友军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ubtech Technology Co ltd
Original Assignee
Shenzhen Ubtech Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ubtech Technology Co ltd filed Critical Shenzhen Ubtech Technology Co ltd
Priority to CN202010187709.2A priority Critical patent/CN111460115B/en
Publication of CN111460115A publication Critical patent/CN111460115A/en
Application granted granted Critical
Publication of CN111460115B publication Critical patent/CN111460115B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The application provides an intelligent man-machine conversation model training method, an intelligent man-machine conversation model training device, electronic equipment and a computer readable storage medium. The intelligent man-machine conversation model training method comprises the following steps: acquiring a user sentence of a current round and a system sentence of a previous round, splicing the two sentences and inputting the spliced sentence into a BERT model to obtain a current dialogue matrix; inputting the current dialogue matrix into a first bidirectional GRU model to be trained to obtain a dialogue semantic vector; performing a first linear transformation on the dialogue semantic vector to obtain an intention vector, and performing a second linear transformation on the dialogue semantic vector to obtain a dialogue behavior vector; calculating an intention loss according to the intention vector, and calculating a dialogue behavior loss according to the dialogue behavior vector; and back-propagating based on the intention loss and the dialogue behavior loss, and updating the model parameters of each model to be trained. According to the scheme, the BERT model and the GRU model are fused to encode the history memory, and the system sentence of the previous round is fused during semantic analysis, so that a semantic analysis model with higher accuracy can be obtained.

Description

Intelligent man-machine conversation model training method, model training device and electronic equipment
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to an intelligent man-machine dialogue model training method, a model training device, electronic equipment and a computer readable storage medium.
Background
With the development of technology, man-machine interaction systems are increasingly widely applied. In order to realize automatic man-machine interaction, a computer needs to analyze the intention and dialogue behavior category contained in the words input by a user, extract the keywords therein, and formulate a corresponding reply strategy. In recent years, with the development of deep learning technology and the improvement of the computing power of computers, deep learning has been applied to man-machine interaction systems. However, current models for multi-task semantic parsing of multi-round conversations still suffer from problems such as low accuracy, and cannot meet people's demands.
Disclosure of Invention
The embodiment of the application provides an intelligent man-machine dialogue model training method, a model training device, electronic equipment and a computer readable storage medium, and a semantic analysis model with higher accuracy can be obtained.
In a first aspect, the present application provides a method for training an intelligent human-machine dialogue model, including:
Acquiring a user statement of a current round and a system statement of a previous round;
splicing the user sentences and the system sentences and inputting the user sentences and the system sentences into a BERT model to obtain a current dialogue matrix;
inputting the current dialogue matrix into a first bidirectional GRU model to be trained to obtain a dialogue semantic vector, wherein the dialogue semantic vector is obtained by splicing a first output result of the first bidirectional GRU model in a first direction and a second output result of the first bidirectional GRU model in a second direction, an initial hidden layer of the first bidirectional GRU model is constructed based on other sentences, and the other sentences are sentences of historical rounds except the system sentences;
performing first linear transformation on the session semantic vector to obtain an intention vector, and performing second linear transformation on the session semantic vector to obtain a dialogue action vector;
calculating according to the intention vector to obtain intention loss, and calculating according to the dialogue action vector to obtain dialogue action loss;
and carrying out back propagation based on the intention loss and the dialogue behavior loss, and updating model parameters of each model to be trained.
In a second aspect, the present application provides an intelligent human-machine conversation model training apparatus, including:
The sentence acquisition unit is used for acquiring the user sentence of the current round and the system sentence of the previous round;
the dialogue matrix acquisition unit is used for splicing the user sentences and the system sentences and inputting the user sentences and the system sentences into the BERT model to obtain a current dialogue matrix;
a session semantic vector obtaining unit, configured to input the current dialogue matrix into a first bidirectional GRU model to be trained, to obtain a session semantic vector, where the session semantic vector is obtained by splicing a first output result of the first bidirectional GRU model in a first direction and a second output result of the first bidirectional GRU model in a second direction, and an initial hidden layer of the first bidirectional GRU model is constructed based on other sentences, where the other sentences are sentences of a history round except the system sentences;
the linear transformation unit is used for performing first linear transformation on the conversation semantic vector to obtain an intention vector, and performing second linear transformation on the conversation semantic vector to obtain a conversation behavior vector;
the loss calculation unit is used for calculating the intention loss according to the intention vector and calculating the dialogue action loss according to the dialogue action vector;
and the parameter updating unit is used for carrying out back propagation based on the intention loss and the dialogue behavior loss and updating the model parameters of each model to be trained.
In a third aspect, the present application provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, performs the steps of the method of the first aspect described above.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by one or more processors, implements the steps of the method of the first aspect described above.
From the above, according to the scheme of the application, the user statement of the current round and the system statement of the previous round are firstly obtained, spliced and input into the BERT model to obtain the current dialogue matrix, then the current dialogue matrix is input into the first bidirectional GRU model to be trained to obtain the dialogue semantic vector, the first linear transformation is carried out on the dialogue semantic vector to obtain the intention vector, the second linear transformation is carried out on the dialogue semantic vector to obtain the dialogue action vector, then the intention loss is obtained according to the calculation of the intention vector, the dialogue action loss is obtained according to the calculation of the dialogue action vector, finally the back propagation is carried out based on the intention loss and the dialogue action loss, and the model parameters of each model to be trained are updated. According to the scheme, the BERT model and the GRU model are fused to encode the history memory, and the system sentences of the previous round are fused during semantic analysis, so that the semantic analysis model with higher accuracy can be obtained. It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the following description will briefly introduce the drawings that are needed in the embodiments or the description of the prior art, it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a training method for intelligent human-machine dialogue model according to an embodiment of the present application;
FIG. 2 is a schematic workflow diagram of a first bidirectional GRU model and BERT model according to one embodiment of the present application;
FIG. 3 is a schematic workflow diagram of a second bidirectional GRU model and BERT model according to one embodiment of the present application;
FIG. 4 is a schematic workflow diagram of a unidirectional GRU model provided by an embodiment of the application;
FIG. 5 is a block diagram of a training device for intelligent human-machine dialogue model according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted as "when..once" or "in response to a determination" or "in response to detection" depending on the context. Similarly, the phrase "if a determination" or "if a [ described condition or event ] is detected" may be interpreted in the context of meaning "upon determination" or "in response to determination" or "upon detection of a [ described condition or event ]" or "in response to detection of a [ described condition or event ]".
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Example 1
Referring to fig. 1, the method for training the intelligent man-machine conversation model in the embodiment of the present application includes:
step 101, obtaining user sentences of a current round and system sentences of a previous round;
in the embodiment of the present application, for a group of man-machine conversations, the sentences of each turn may be numbered according to the order of the conversational turns. In order to distinguish user input from the replies of the intelligent reply system, in the embodiment of the present application the character u is used to represent a user sentence input by a tester during training, the character s is used to represent a system sentence fed back by the intelligent question-answering system, and a subscript is used to represent the turn to which the sentence belongs, so that a conversation sequence (u_1, s_2, u_3, s_4, …, s_(t-1), u_t) is obtained. That is, the user sentence of the current round is denoted u_t, and the system sentence of the previous round is denoted s_(t-1). Referring to fig. 2, the semantic analysis model trained by the intelligent man-machine conversation model training method provided by the application can finally recognize the intention of a user sentence, the dialogue behavior category to which the user sentence belongs and the named entities contained in the user sentence, and the intelligent question-answering system can generate a corresponding system sentence based on the recognition result and feed the system sentence back to the user, thereby realizing man-machine conversation.
Step 102, splicing the user sentences and the system sentences and inputting the spliced user sentences and the system sentences into a BERT model to obtain a current dialogue matrix;
in the embodiment of the present application, referring to fig. 2, the semantic analysis model is actually composed of a plurality of models, which include a BERT (Bidirectional Encoder Representation from Transformers) model. It should be noted that the BERT model has already been trained in advance, that is, the intelligent man-machine conversation model training method proposed in the present application does not need to train the BERT model. Considering that the previous session is also quite important for understanding the current session, while the prior art often ignores it and thereby loses information, the embodiment of the present application uses the user sentence u_t of the current round and the system sentence s_(t-1) of the previous round together as the input of the BERT model. Specifically, the user sentence u_t and the system sentence s_(t-1) are spliced into one sentence and then input into the BERT model; that is, the user sentence u_t and the system sentence s_(t-1) are not used as inputs separately, but are spliced and then input in the form of a single sentence, so that the semantic analysis model can more effectively utilize the information from the previous session during semantic analysis and avoid losing that information.
Optionally, the sentence obtained by splicing the user sentence u_t and the system sentence s_(t-1) is recorded as the spliced sentence. The BERT model may pre-process the spliced sentence, the pre-processing being specifically character-based segmentation. That is, the BERT tokenizer may be called to segment the spliced sentence character by character to obtain a plurality of tokens (that is, a token sequence); the token sequence obtained by the segmentation is then put into the BERT model, and the hidden layer corresponding to each token at the top layer of the BERT model is taken out, so that a matrix with a preset first size can be spliced, and this matrix is the current dialogue matrix. That is, the current dialogue matrix is actually the matrix composed of all the vectors output by the last layer of the BERT model after the spliced sentence is input into the BERT model. By way of example only, the first size may be [128×768], wherein 128 is the length of the token sequence after 0-vector padding; since the token sequence contains both the user sentence u_t and the system sentence s_(t-1), this value is large and can be adjusted according to the needs of different application scenarios; 768 is a constant value, namely the dimension of the hidden layers output by the BERT model.
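By way of illustration only, a minimal sketch of this splicing and encoding step is given below, assuming PyTorch and the HuggingFace transformers library; the checkpoint name "bert-base-chinese", the example sentences, the splicing order and the maximum length 128 are assumptions of the sketch rather than requirements of the scheme.

```python
import torch
from transformers import BertTokenizer, BertModel

# Pre-trained BERT; it is not updated by the training method described here.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-chinese")
bert.eval()

s_prev = "您想要什么服务"        # previous-round system sentence s_(t-1) (example)
u_cur = "我想听周杰伦的Jay专辑"   # current-round user sentence u_t (example)

# Splice the two sentences into a single input and pad/truncate to length 128.
enc = tokenizer(s_prev, u_cur, padding="max_length", truncation=True,
                max_length=128, return_tensors="pt")
with torch.no_grad():
    out = bert(**enc)

# The top-layer hidden state of every token forms the current dialogue matrix,
# shape [1, 128, 768] (batch, token-sequence length, BERT hidden size).
current_dialogue_matrix = out.last_hidden_state
```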
Step 103, inputting the current dialogue matrix into a first bidirectional GRU model to be trained to obtain a dialogue semantic vector;
in the embodiment of the present application, referring to fig. 2, the semantic analysis model further includes a bidirectional GRU (Gated Recurrent Unit) model, and the current dialogue matrix is input into a bidirectional GRU model. Considering that there may be a plurality of bidirectional GRU models in the semantic analysis model, the bidirectional GRU model involved in this step is referred to as the first bidirectional GRU model, and the two directions involved in the first bidirectional GRU model are referred to as the first direction and the second direction for distinction. Thus, the session semantic vector is actually obtained by splicing the first output result of the first bidirectional GRU model in the first direction and the second output result in the second direction. Specifically, the initial hidden layer of the first bidirectional GRU model is constructed based on other sentences, which are the sentences of the historical rounds other than the system sentence, namely the sentences u_1, s_2, u_3, s_4, …, s_(t-3), u_(t-2). It should be noted that, since the first bidirectional GRU model involves two directions, the above initial hidden layer actually refers to the initial hidden layers in the two directions, namely the initial hidden layer in the first direction and the initial hidden layer in the second direction; that is, the initial hidden layer in the first direction and the initial hidden layer in the second direction are each constructed based on the other sentences described above.
Specifically, since the first bidirectional GRU model involves two directions, after the current dialogue matrix is input to the first bidirectional GRU model, the output of the last hidden layer of the first bidirectional GRU model in the first direction may be used as a first output result, and the output of the last hidden layer of the first bidirectional GRU model in the second direction may be used as a second output result, and then the first output result and the second output result may be spliced to obtain the dialogue semantic vector. The conversational semantic vector may be considered a semantic representation of a user statement containing contextual information. In fig. 2, the session semantic vector is labeled I.
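By way of example only, a minimal PyTorch sketch of this step is given below; the GRU hidden size of 256 and the zero placeholder used for the initial hidden layers (which in the scheme are built from the memory encoding vector M, see the later sketch) are assumptions of the sketch.

```python
import torch
import torch.nn as nn

first_bi_gru = nn.GRU(input_size=768, hidden_size=256,
                      batch_first=True, bidirectional=True)

current_dialogue_matrix = torch.randn(1, 128, 768)  # stands in for the BERT output above
h0 = torch.zeros(2, 1, 256)                         # placeholder initial hidden layers (both directions)

outputs, h_n = first_bi_gru(current_dialogue_matrix, h0)

# h_n[0] is the output of the last hidden layer in the first (forward) direction and
# h_n[1] the output of the last hidden layer in the second (backward) direction;
# their concatenation is the session semantic vector I (512-dimensional here).
session_semantic_vector = torch.cat([h_n[0], h_n[1]], dim=-1)  # [1, 512]
```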
Alternatively, referring to fig. 3, the initial hidden layer of the first bidirectional GRU model may be constructed as follows:
A1, respectively acquiring statement vectors of other statements;
wherein the other sentences refer to the sentences u_1, s_2, u_3, s_4, …, s_(t-3), u_(t-2). Here, the sentence vector of each of the sentences u_1, s_2, u_3, s_4, …, s_(t-3), u_(t-2) can be obtained first, wherein a sentence vector is a semantic representation of a sentence. Specifically, the sentence vector of each sentence can be generated as follows:
a11, performing word segmentation processing on other sentences of the target to obtain a segmented sequence;
wherein the target other sentence is any one of the other sentences; that is, the target other sentence may be u(i) or s(i), wherein u(i) is used to express any one of the sentences u_1, u_3, …, u_(t-2), and s(i) is used to express any one of the sentences s_2, s_4, …, s_(t-3). The BERT tokenizer is first called to perform character-based segmentation on the target other sentence, so as to obtain a segmented sequence, namely a token sequence.
A12, inputting the segmented sequence into the BERT model to obtain a statement matrix of the other statements of the target;
the hidden layer corresponding to each token at the top layer of the BERT model is taken out, so that a matrix with a preset second size can be spliced, and this matrix is the sentence matrix of the target other sentence; that is, similar to the current dialogue matrix, the sentence matrix is the matrix composed of all the vectors output by the last layer of the BERT model after the target other sentence is input into the BERT model. By way of example only, the second size may be [64×768], wherein 64 is the length of the token sequence after 0-vector padding; since this token sequence only contains the target other sentence, the value is smaller than the value 128 related to the current dialogue matrix, and it can be adjusted according to the needs of different application scenarios; 768 is a constant value, namely the dimension of the hidden layers output by the BERT model.
A13, inputting the sentence matrix into a second bidirectional GRU model to be trained, and obtaining sentence vectors of other sentences of the target.
The bidirectional GRU model related to step a13 is different from the first bidirectional GRU model, and thus, the bidirectional GRU model related to step a13 is referred to herein as a second bidirectional GRU model, and two directions related to the second bidirectional GRU model are referred to herein as a third direction and a fourth direction. It should be noted that the second two-way GRU model is also one of the models to be trained.
Specifically, since the second bidirectional GRU model involves two directions, after the sentence matrix is input into the second bidirectional GRU model, the output of the last hidden layer of the second bidirectional GRU model in the third direction may be used as the third output result, the output of the last hidden layer of the second bidirectional GRU model in the fourth direction may be used as the fourth output result, and finally the third output result and the fourth output result are spliced to obtain the sentence vector of the target other sentence. For convenience of explanation, the sentence vector of the target other sentence may be denoted as m_i. After being generated, the sentence vectors may be stored in a predetermined data cache area. When the dialogue proceeds to the user sentence u_t of the current round, the sentence vector of each of the other sentences can be obtained from the data cache area, namely the sentence vector m_1 of u_1, the sentence vector m_2 of s_2, the sentence vector m_3 of u_3, the sentence vector m_4 of s_4, …, the sentence vector m_(t-3) of s_(t-3), and the sentence vector m_(t-2) of u_(t-2). The sentence vectors of the other sentences described above may form a semantic token vector group [m_1, m_2, …, m_(t-2)].
A2, inputting each sentence vector into a unidirectional GRU model to be trained;
wherein, referring to FIG. 4, the semantic token vector group [m_1, m_2, …, m_(t-2)] can be input into the unidirectional GRU model. It should be noted that the unidirectional GRU model is also one of the models to be trained.
A3, taking a vector output by the last hidden layer of the unidirectional GRU model as a memory coding vector;
a4, constructing an initial hiding layer of the first bidirectional GRU model based on the memory coding vector.
Wherein the memory encoding vector may be denoted as M. In this embodiment, the memory encoding vector M may be used as the initial hidden layer h_0 of the first bidirectional GRU model in both directions.
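By way of example only, a minimal PyTorch sketch of steps A1 to A4 is given below; the hidden sizes, the three dummy sentence matrices standing in for the historical sentences, and the variable names are assumptions of the sketch.

```python
import torch
import torch.nn as nn

second_bi_gru = nn.GRU(input_size=768, hidden_size=256,
                       batch_first=True, bidirectional=True)
unidirectional_gru = nn.GRU(input_size=512, hidden_size=256, batch_first=True)

def sentence_vector(sentence_matrix):
    # A13: splice the last hidden layers of the third and fourth directions into m_i.
    _, h_n = second_bi_gru(sentence_matrix)
    return torch.cat([h_n[0], h_n[1]], dim=-1)               # [1, 512]

# A1: sentence vectors m_1 ... m_(t-2); each [1, 64, 768] matrix stands in for the
# BERT sentence matrix of one historical sentence (u_1, s_2, u_3, ...).
sentence_matrices = [torch.randn(1, 64, 768) for _ in range(3)]
m_vectors = [sentence_vector(x) for x in sentence_matrices]
semantic_token_vector_group = torch.stack(m_vectors, dim=1)   # [1, t-2, 512]

# A2-A3: the output of the last hidden layer of the unidirectional GRU is the
# memory encoding vector M.
_, h_last = unidirectional_gru(semantic_token_vector_group)
M = h_last[0]                                                 # [1, 256]

# A4: use M as the initial hidden layer h_0 of the first bidirectional GRU in
# both the first and the second direction.
h0 = M.unsqueeze(0).repeat(2, 1, 1)                           # [2, 1, 256]
```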
Step 104, performing a first linear transformation on the session semantic vector to obtain an intention vector, and performing a second linear transformation on the session semantic vector to obtain a dialogue behavior vector;
In the embodiment of the application, the session semantic vector can be linearly transformed and thereby mapped to the intention vector and the dialogue behavior vector. Considering that the object of both linear transformations is the session semantic vector, in order to distinguish the two linear transformation operations, the linear transformation operation used to obtain the intention vector is denoted the first linear transformation, and the linear transformation operation used to obtain the dialogue behavior vector is denoted the second linear transformation.
Optionally, the operation of performing the first linear transformation on the session semantic vector to obtain the intent vector may specifically include:
b1, linearly transforming the session semantic vector according to a preset first parameter matrix to obtain a first intermediate vector;
the number of matrix rows of the first parameter matrix is determined according to the dimension of the session semantic vector, the number of matrix columns of the first parameter matrix is determined according to the total number of preset intention categories, and the dimension of the first intermediate vector is the total number of the intention categories. Specifically, the above intention category is proposed in advance by a developer, and for example, there may be a plurality of intention categories such as "listen to music", "take a flight", "go to a city", and "eat food", each of which represents an intention. Assuming that the dimension of the session semantic vector is m, and assuming that the intention category proposed by the developer has n categories, the size of the first parameter matrix is [ m×n ]. The operation of linearly transforming the session semantic vector according to the preset first parameter matrix is specifically to multiply the session semantic vector with the first parameter matrix, so as to obtain an n-dimensional first intermediate vector.
And B2, transforming the first intermediate vector based on a preset first activation function to obtain an intention vector.
Wherein the first activation function is specifically the softmax function. The first intermediate vector can be converted into a probability vector, namely the intention vector, by softmax. Each dimension of the intention vector is used to represent the probability that the user sentence belongs to the corresponding intention category; that is, each dimension of the intention vector is the probability predicted by the semantic analysis model for a different intention. For example, if the intention category corresponding to the first dimension of the intention vector is "listen to music", the value of the first dimension (assumed to be p_1) is the probability, predicted by the semantic analysis model, that the user sentence u_t expresses the intention "listen to music".
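By way of example only, a minimal PyTorch sketch of steps B1 and B2 is given below; the session-semantic-vector dimension m = 512 and the total number of intention categories n = 10 are assumptions of the sketch.

```python
import torch
import torch.nn as nn

m, n = 512, 10
session_semantic_vector = torch.randn(1, m)            # stands in for the vector I above
first_parameter_matrix = nn.Linear(m, n, bias=False)   # first parameter matrix, size [m x n]

# B1: multiply the session semantic vector by the first parameter matrix.
first_intermediate_vector = first_parameter_matrix(session_semantic_vector)  # [1, n]

# B2: softmax converts the first intermediate vector into the intention vector,
# whose dimensions are the predicted probabilities of the n intention categories.
intent_vector = torch.softmax(first_intermediate_vector, dim=-1)
```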
Optionally, the operation of performing the second linear transformation on the session semantic vector to obtain a session behavior vector may specifically include:
c1, linearly transforming the session semantic vector according to a preset second parameter matrix to obtain a second intermediate vector;
the number of matrix rows of the second parameter matrix is determined according to the dimension of the session semantic vector, the number of matrix columns of the second parameter matrix is determined according to the total number of preset dialogue behavior categories, and the dimension of the second intermediate vector is the total number of dialogue behavior categories. Specifically, similar to the intention categories, the dialogue behavior categories are proposed in advance by the developer, and each dialogue behavior category represents one kind of dialogue behavior. Assuming that the dimension of the session semantic vector is m and that the dialogue behavior categories proposed by the developer comprise l categories, the size of the second parameter matrix is [m×l]. Linearly transforming the session semantic vector according to the preset second parameter matrix is specifically multiplying the session semantic vector by the second parameter matrix, so as to obtain an l-dimensional second intermediate vector.
And C2, transforming the second intermediate vector based on a preset second activation function to obtain a dialogue action vector.
Wherein the second activation function is specifically the sigmoid function. The second intermediate vector can be converted by sigmoid into a vector whose value in each dimension lies in the range [0, 1], namely the dialogue behavior vector. Each dimension of the dialogue behavior vector is used to indicate the score of the corresponding dialogue behavior category for the user sentence; if the score in a certain dimension is greater than 0.5, the user sentence is considered to contain the dialogue behavior corresponding to that dimension, that is, the user sentence hits that dialogue behavior. It is noted that one user sentence may hit multiple dialogue behaviors, that is, there may be multiple dimensions whose scores are greater than 0.5.
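Likewise, a minimal PyTorch sketch of steps C1 and C2 is given below; m = 512 and the total number of dialogue behavior categories l = 8 are assumptions of the sketch.

```python
import torch
import torch.nn as nn

m, l = 512, 8
session_semantic_vector = torch.randn(1, m)             # stands in for the vector I above
second_parameter_matrix = nn.Linear(m, l, bias=False)   # second parameter matrix, size [m x l]

# C1: multiply the session semantic vector by the second parameter matrix.
second_intermediate_vector = second_parameter_matrix(session_semantic_vector)  # [1, l]

# C2: sigmoid maps every dimension into [0, 1]; a score above 0.5 means the user
# sentence hits the dialogue behavior of that dimension (several may exceed 0.5).
dialogue_behavior_vector = torch.sigmoid(second_intermediate_vector)
hit_dims = (dialogue_behavior_vector > 0.5).nonzero(as_tuple=True)[1]
```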
Step 105, calculating according to the intention vector to obtain an intention loss, and calculating according to the dialogue action vector to obtain a dialogue action loss;
in the embodiment of the application, each user sentence carries an intention label y_intent and a dialogue behavior label y_act, wherein the intention label expresses the real intention of the user sentence and the dialogue behavior label expresses the real dialogue behavior of the user sentence. Therefore, during training the semantic analysis model can combine the intention vector with the intention label y_intent of the user sentence to calculate the intention loss through a cross-entropy loss function, and combine the dialogue behavior vector with the dialogue behavior label y_act of the user sentence to calculate the dialogue behavior loss through a multi-label BCE loss function.
And step 106, back-propagating based on the intention loss and the dialogue behavior loss, and updating model parameters of each model to be trained.
In this embodiment of the present application, the intention loss and the dialogue behavior loss may be added to obtain the total loss of the semantic analysis model, and the model parameters of each model to be trained may be updated through backward gradient propagation, where the models to be trained include the first bidirectional GRU model, the second bidirectional GRU model, the unidirectional GRU model, and the like, and the application is not limited herein.
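By way of example only, a minimal PyTorch sketch of steps 105 and 106 is given below; the labels and the placeholder predictions are assumptions of the sketch (in practice the predictions come from the two heads sketched above, and the gradients flow into the GRU models and the two parameter matrices).

```python
import torch
import torch.nn as nn

n, l = 10, 8
first_intermediate_vector = torch.randn(1, n, requires_grad=True)  # pre-softmax intent scores
dialogue_behavior_vector = torch.rand(1, l, requires_grad=True)    # sigmoid scores in (0, 1)

y_intent = torch.tensor([3])       # intention label (class index), assumed
y_act = torch.zeros(1, l)
y_act[0, 2] = 1.0                  # multi-label dialogue behavior label, assumed

# Intention loss: cross entropy (CrossEntropyLoss applies softmax internally,
# which corresponds to computing the loss from the intention vector).
intent_loss = nn.CrossEntropyLoss()(first_intermediate_vector, y_intent)

# Dialogue behavior loss: multi-label BCE against the behavior scores.
act_loss = nn.BCELoss()(dialogue_behavior_vector, y_act)

# Step 106: add the two losses and back-propagate; an optimizer step would then
# update the parameters of each model to be trained.
total_loss = intent_loss + act_loss
total_loss.backward()
```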
Optionally, the semantic analysis model may further include an LSTM-CRF model, and the intelligent man-machine conversation model training method further includes: taking out, from the first bidirectional GRU model, the hidden layer corresponding to each token of the user sentence u_t as the input of the LSTM-CRF model; when initializing the CRF probability transition matrix, the probability of transitioning from an entity tag of any non-relevant category to an I-prefixed tag of a given category is forced to 0, and this probability is made constant and is not adjusted during training; alternatively, if the CRF probability transition matrix is a log-probability matrix, the value is made very small, for example -10000. That is, some positions of the CRF probability transition matrix express state transitions that are almost impossible, and the values at these positions are forced to 0 or to some very small value. During training, according to the principle of maximum likelihood estimation, the negative logarithm of the probability of the input entity sequence among all possible entity sequences can be used as the structured loss function, so as to obtain the structured loss of the semantic analysis model.
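By way of illustration, a minimal sketch of the transition-matrix constraint is given below; the BIO tag set and the hand-built log-probability matrix are assumptions of the sketch (a concrete CRF layer would expose its own transition matrix), and -10000 is the small constant mentioned above.

```python
import torch

tags = ["O", "B-singer", "I-singer", "B-album", "I-album"]   # assumed BIO tag set
num_tags = len(tags)
transitions = torch.randn(num_tags, num_tags)                # log-probabilities, trainable

for i, src in enumerate(tags):
    for j, dst in enumerate(tags):
        if dst.startswith("I-"):
            category = dst[2:]
            # An I- tag may only follow a B- or I- tag of the same category; all
            # other transitions into it are forced to a very small constant value
            # and are excluded from the parameter updates during training.
            if src not in ("B-" + category, "I-" + category):
                transitions[i, j] = -10000.0
```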
Optionally, in the case where there is a structured loss function, the total loss of the semantic parsing model should also take into account the structured loss function, so the step 105 may be expressed as:
the method comprises the steps of carrying out dimension adjustment on the structured loss obtained based on the structured loss function, wherein the dimension adjustment operation is specifically that the structured loss is divided by (100 times of the sequence length of a user statement of the current round);
adding the structured loss after dimension adjustment to the intent loss and the dialogue behavior loss to obtain the total loss of the semantic analysis model;
accordingly, the step 106 may be expressed as:
and (3) back-propagating through the total loss, and updating the model parameters of each model to be trained.
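By way of example only, a minimal sketch of combining the three losses is given below; the placeholder loss values and the sequence length are assumptions of the sketch.

```python
import torch

intent_loss = torch.tensor(1.2, requires_grad=True)        # from the intention head (placeholder)
act_loss = torch.tensor(0.7, requires_grad=True)            # from the behavior head (placeholder)
structured_loss = torch.tensor(300.0, requires_grad=True)   # CRF negative log-likelihood (placeholder)
seq_len = 12                                                 # sequence length of the current user sentence

# Dimension adjustment: divide the structured loss by 100 times the sequence length,
# then add it to the intention loss and the dialogue behavior loss.
total_loss = intent_loss + act_loss + structured_loss / (100.0 * seq_len)
total_loss.backward()   # back-propagate the total loss to all models to be trained
```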
Optionally, the semantic analysis model may also be combined with dictionary information during training and prediction. On the premise that dictionaries are available, in order to use the dictionary information to assist word-slot recognition, the embodiment of the present application may match the words in the user sentence u_t against the relevant dictionaries, and the specific process is as follows:
respectively constructing hash indexes for each dictionary, wherein keys in the hash indexes are words in the dictionary;
generating a matrix whose number of rows equals the number of dictionaries and whose number of columns equals the length of the user sentence u_t; this matrix may be written as the dictionary matching matrix;
traversing, in order of word-segment (n-gram) length from long to short (that is, from segments whose length equals the length of the user sentence u_t down to segments of length 1), all word segments in the sentence that have not yet been marked;
inquiring whether a traversed word segment appears in the corresponding dictionary; if it appears, assigning values to the elements of the corresponding rows and columns of the dictionary matching matrix, specifically: the element whose column corresponds to the first character of the word segment (for English, the first token, as determined by the segmentation method of the BERT model) and whose row corresponds to the matched dictionary is marked 1, and the elements whose columns correspond to the subsequent characters of the word segment and whose row corresponds to that dictionary are marked -1;
extracting the column vector corresponding to each character as the encoding of that character's dictionary matching information;
when the semantic analysis model is trained, the column vector corresponding to each character and the vector related to that character in the current dialogue matrix output by BERT can be spliced together and input into the subsequent first bidirectional GRU network; in this way the fusion of dictionary information with the semantic analysis model is realized. It should be noted that, during training, in order to ensure the recognition capability of the semantic analysis model for entities outside the dictionary, a preset proportion of the entries in the dictionary can be randomly masked, so as to avoid the semantic analysis model relying excessively on the dictionary matching information; the preset proportion may be 50%, or another value set by the developer, and is not limited herein. During prediction, all entries need to be matched in order to ensure dictionary coverage. There are two masking modes: the first is to randomly mask a preset proportion of the entries in each dictionary when the semantic analysis model is initialized and keep them masked throughout the training process; the second is to randomly re-select a preset proportion of the entries in each dictionary for masking at the beginning of each training round. That is, the first mode masks the same batch of entries from beginning to end during training, that is, the masked entries are fixed; the second mode replaces the batch of masked entries in each training round, that is, the masked entries change dynamically. In the embodiment of the application, the first mode is preferably used for masking, so that the semantic analysis model obtained by training has stronger learning adaptability.
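By way of example only, a minimal Python sketch of the dictionary matching procedure is given below; the example dictionaries, the character-level token list and the helper name build_dictionary_matching_matrix are assumptions of the sketch. The result reproduces the dictionary matching matrix of the example that follows.

```python
def build_dictionary_matching_matrix(tokens, dictionaries):
    # tokens: character-level token list of the user sentence u_t;
    # dictionaries: mapping from dictionary name to a set of entries (hash index).
    names = list(dictionaries)
    n = len(tokens)
    matrix = [[0] * n for _ in names]      # rows: dictionaries, columns: tokens
    marked = [False] * n                   # tokens already covered by a matched segment
    # Traverse word segments (n-grams) from the full sentence length down to length 1.
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            end = start + length
            if any(marked[start:end]):
                continue                   # skip segments overlapping a marked segment
            segment = "".join(tokens[start:end])
            hit = False
            for row, name in enumerate(names):
                if segment in dictionaries[name]:         # hash-index lookup
                    matrix[row][start] = 1                # first character of the segment
                    for col in range(start + 1, end):
                        matrix[row][col] = -1             # subsequent characters
                    hit = True
            if hit:
                for col in range(start, end):
                    marked[col] = True
    return matrix

dictionaries = {
    "singer": {"周杰伦", "Jay"},
    "album": {"Jay"},
    "song": set(),
}
tokens = ["我", "想", "听", "周", "杰", "伦", "的", "Jay", "专", "辑"]
dict_matrix = build_dictionary_matching_matrix(tokens, dictionaries)
```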
In order to better explain the influence of the dictionary information on the semantic analysis model, the following description is given by way of specific examples:
assuming that there are 3 dictionary categories in total, singer, album, song respectively, and the input user sentence is "Jay album I want to listen to Zhou Jielun", wherein the word segment "Zhou Jielun" hits the dictionary of singer, the first character (i.e., "week") of the word segment is listed, and the element of the line marked singer is given "1"; the following characters of the word segment (i.e., "jetty" and "rennet"), the element labeled singer is assigned a "-1". For example, word fragment "Jay" hits both singer and album, all can be assigned without collision. Based on this, the following dictionary matching matrix can be obtained:
            我    想    听    周    杰    伦    的    Jay   专    辑
singer       0     0     0     1    -1    -1     0     1     0     0
album        0     0     0     0     0     0     0     1     0     0
song         0     0     0     0     0     0     0     0     0     0
As can be seen from the dictionary matching matrix, the column vector corresponding to the character "周" is [1, 0, 0], so [1, 0, 0] can be spliced onto the vector output by the BERT model for the character "周" to form a (768+3)-dimensional vector, which is then input into the subsequent first bidirectional GRU model together with the other fused token vectors.
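By way of example only, a minimal PyTorch sketch of this fusion step is given below; the random placeholder standing in for the BERT output vectors is an assumption of the sketch.

```python
import torch

# Dictionary matching matrix of the example (rows: singer, album, song).
dict_matrix = [
    [0, 0, 0, 1, -1, -1, 0, 1, 0, 0],
    [0, 0, 0, 0,  0,  0, 0, 1, 0, 0],
    [0, 0, 0, 0,  0,  0, 0, 0, 0, 0],
]
dict_tensor = torch.tensor(dict_matrix, dtype=torch.float32)   # [3, 10]
bert_vectors = torch.randn(10, 768)   # placeholder for the BERT vectors of the 10 tokens

# Splice each token's 768-dimensional BERT vector with its 3-dimensional dictionary
# column vector ([1, 0, 0] for the character 周), giving (768 + 3)-dimensional inputs
# for the first bidirectional GRU model to be trained.
fused = torch.cat([bert_vectors, dict_tensor.t()], dim=-1)     # [10, 771]
```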
From the above, the semantic analysis model proposed in the scheme of the present application merges the BERT model, the plurality of bidirectional GRU models, the unidirectional GRU model, and the LSTM-CRF model. By fusing the BERT model and the GRU model, relying on coding history memory and fusing the previous round of system sentences during semantic analysis, a semantic analysis model with higher accuracy can be obtained; by constraining the probability transition matrix of the LSTM-CRF model, the occurrence of entity identification sequences that violate the marking rules can be avoided. Furthermore, the semantic analysis model can be fused with dictionary information to improve the entity recognition effect. The semantic analysis model can be widely applied to systems relying on man-machine interaction, such as robots, intelligent sound boxes, intelligent customer service, voice assistants, intelligent diagnosis and the like, and has remarkable advantages in accuracy of processing user sentences.
Example two
The second embodiment of the present application provides an intelligent man-machine conversation model training device, which may be integrated in an electronic device. As shown in fig. 5, the intelligent man-machine conversation model training device 500 in the embodiment of the present application includes:
a sentence acquisition unit 501, configured to acquire a user sentence of a current round and a system sentence of a previous round;
A dialogue matrix obtaining unit 502, configured to splice the user statement and the system statement and input the spliced user statement and the system statement into a BERT model, so as to obtain a current dialogue matrix;
a session semantic vector obtaining unit 503, configured to input the current dialogue matrix into a first bidirectional GRU model to be trained, to obtain a session semantic vector, where the session semantic vector is obtained by splicing a first output result of the first bidirectional GRU model in a first direction and a second output result of the first bidirectional GRU model in a second direction, and an initial hidden layer of the first bidirectional GRU model is constructed based on other sentences, where the other sentences are sentences of a history round except the system sentences;
a linear transformation unit 504, configured to perform a first linear transformation on the session semantic vector to obtain an intent vector, and perform a second linear transformation on the session semantic vector to obtain a session behavior vector;
a loss calculation unit 505, configured to calculate an intent loss according to the intent vector, and calculate a dialogue action loss according to the dialogue action vector;
and a parameter updating unit 506, configured to update model parameters of each model to be trained based on the intent loss and the dialogue behavior loss.
Optionally, the intelligent man-machine conversation model training apparatus 500 further includes:
a sentence vector obtaining unit, configured to obtain sentence vectors of each other sentence, where the sentence vectors are semantic representations of the sentences;
the sentence vector input unit is used for inputting each sentence vector into the unidirectional GRU model to be trained;
a memory coding vector obtaining unit, configured to use a vector output by a last hidden layer of the one-way GRU model as a memory coding vector;
and the initial hidden layer construction unit is used for constructing an initial hidden layer of the first bidirectional GRU model based on the memory coding vector.
Optionally, the sentence vector obtaining unit includes:
a segmentation processing subunit, configured to perform word-based segmentation processing on other target sentences to obtain a segmented sequence, where the other target sentences are any one of the other sentences;
a sentence matrix obtaining subunit, configured to input the segmented sequence into the BERT model, to obtain a sentence matrix of the other target sentences;
and the sentence vector obtaining subunit is used for inputting the sentence matrix into a second bidirectional GRU model to be trained to obtain sentence vectors of the target other sentences, wherein the sentence vectors of the target other sentences are obtained by splicing the third output result of the second bidirectional GRU model in the third direction and the fourth output result of the second bidirectional GRU model in the fourth direction.
Optionally, the sentence vector obtaining subunit is specifically configured to, after inputting the sentence matrix to the second bidirectional GRU model, take an output of a last hidden layer of the second bidirectional GRU model in the third direction as a third output result, take an output of a last hidden layer of the second bidirectional GRU model in the fourth direction as a fourth output result, and splice the third output result and the fourth output result to obtain the sentence vector of the other sentence of the target.
Optionally, the session semantic vector obtaining unit 503 is specifically configured to, after inputting the current dialogue matrix to the first bidirectional GRU model, splice the first output result and the second output result with an output of a last hidden layer of the first bidirectional GRU model in the first direction as a first output result, and use an output of a last hidden layer of the first bidirectional GRU model in the second direction as a second output result, thereby obtaining the session semantic vector.
Optionally, the linear transformation unit 504 includes:
the first linear transformation subunit is configured to perform linear transformation on the session semantic vector according to a preset first parameter matrix to obtain a first intermediate vector, where the number of matrix rows of the first parameter matrix is determined according to the dimension of the session semantic vector, the number of matrix columns of the first parameter matrix is determined according to the total number of preset intention categories, and the dimension of the first intermediate vector is the total number of the intention categories;
The first activation subunit is configured to transform the first intermediate vector based on a preset first activation function to obtain an intent vector, where each dimension in the intent vector is used to represent a probability that the user sentence belongs to each intent category.
Optionally, the linear transformation unit 504 includes:
the second linear transformation subunit is configured to perform linear transformation on the session semantic vector according to a preset second parameter matrix to obtain a second intermediate vector, where the number of matrix rows of the second parameter matrix is determined according to the dimension of the session semantic vector, the number of matrix columns of the second parameter matrix is determined according to the total number of preset session behavior categories, and the dimension of the second intermediate vector is the total number of the session behavior categories;
and the second activation subunit is used for transforming the second intermediate vector based on a preset second activation function to obtain a dialogue behavior vector, wherein each dimension in the dialogue behavior vector is used for indicating the score of the corresponding dialogue behavior category for the user sentence.
Optionally, the intelligent man-machine conversation model training apparatus 500 further includes:
an LSTM-CRF input obtaining unit, configured to take out a hidden layer corresponding to each word of the user sentence in the first bidirectional GRU model as an input of the LSTM-CRF model;
The forced constraint unit is used for forcing, when initializing the CRF probability transition matrix, the probability of transitioning from an entity tag of any non-relevant category to an I-prefixed tag of a given category to a preset value, and making this probability a constant that is not adjusted during training;
and the structured loss calculation unit is used for taking the negative logarithm of the probability of occurrence of the input entity sequence in all possible entity sequences as a structured loss function according to the maximum likelihood estimation principle so as to obtain the structured loss of the semantic analysis model.
Alternatively, the above-described loss calculation unit 505 includes:
a structured loss adjustment subunit, configured to perform dimension adjustment on the structured loss obtained based on the structured loss function, wherein the dimension adjustment operation is specifically dividing the structured loss by 100 times the sequence length;
a total loss determination subunit, configured to add the structured loss after dimension adjustment to the intent loss and the dialogue behavior loss to obtain a total loss of the semantic analysis model;
accordingly, the parameter updating unit 506 is specifically configured to update the model parameters of each model to be trained by back-propagating the total loss.
Optionally, the intelligent man-machine conversation model training apparatus 500 further includes:
a hash index establishing unit, configured to establish a hash index for each dictionary, where keys in the hash index are words in the dictionary;
a dictionary matching matrix initializing unit, configured to generate a matrix whose number of rows equals the number of dictionaries and whose number of columns equals the length of the user sentence u_t, which matrix may be written as the dictionary matching matrix;
a traversing subunit, configured to sequentially traverse all the word segments that have not been marked in the sentence based on the order of the word segment lengths from long to short;
the inquiring unit is used for inquiring whether the traversed word segments appear in the corresponding dictionary;
the element assignment unit is configured to assign, if a traversed word segment appears in the corresponding dictionary, values to the elements of the corresponding rows and columns of the dictionary matching matrix, specifically: the element whose column corresponds to the first character of the word segment and whose row corresponds to the matched dictionary is marked 1, and the elements whose columns correspond to the subsequent characters of the word segment and whose row corresponds to that dictionary are marked -1;
the coding unit is used for taking out the column vector corresponding to each character as the encoding of that character's dictionary matching information;
correspondingly, the session semantic vector obtaining unit is specifically configured to splice a column vector corresponding to each word with a vector related to the word in a current dialogue matrix output by the BERT, and input the spliced column vector and the vector into a first bidirectional GRU network to be trained, so as to obtain a session semantic vector.
Optionally, the intelligent man-machine conversation model training apparatus 500 further includes:
and the shielding unit is used for randomly shielding the entries with preset proportions in each dictionary before training starts.
From the above, the intelligent man-machine dialogue model training device provided in the scheme of the application can train the semantic analysis model fused with the BERT model, the plurality of bidirectional GRU models, the unidirectional GRU model and the LSTM-CRF model. The semantic analysis model is used for obtaining a semantic analysis model with higher accuracy by fusing the BERT model and the GRU model to rely on the coding history memory and fusing the previous round of system sentences during semantic analysis; by constraining the probability transition matrix of the LSTM-CRF model, the occurrence of entity identification sequences that violate the marking rules can be avoided. Furthermore, the semantic analysis model can be fused with dictionary information to improve the entity recognition effect. The semantic analysis model can be widely applied to systems relying on man-machine interaction, such as robots, intelligent sound boxes, intelligent customer service, voice assistants, intelligent diagnosis and the like, and has remarkable advantages in accuracy of processing user sentences.
Example III
Referring to fig. 6, an electronic device 6 in the third embodiment of the present application includes: a memory 601, one or more processors 602 (only one shown in fig. 6) and computer programs stored on the memory 601 and executable on the processors. Wherein: the memory 601 is used for storing software programs and modules, and the processor 602 executes various functional applications and data processing by running the software programs and units stored in the memory 601 to acquire resources corresponding to the preset events. Specifically, the processor 602 implements the following steps by running the above-described computer program stored in the memory 601:
Acquiring a user statement of a current round and a system statement of a previous round;
splicing the user sentences and the system sentences and inputting the user sentences and the system sentences into a BERT model to obtain a current dialogue matrix;
inputting the current dialogue matrix into a first bidirectional GRU model to be trained to obtain a dialogue semantic vector, wherein the dialogue semantic vector is obtained by splicing a first output result of the first bidirectional GRU model in a first direction and a second output result of the first bidirectional GRU model in a second direction, an initial hidden layer of the first bidirectional GRU model is constructed based on other sentences, and the other sentences are sentences of historical rounds except the system sentences;
performing first linear transformation on the session semantic vector to obtain an intention vector, and performing second linear transformation on the session semantic vector to obtain a dialogue action vector;
calculating according to the intention vector to obtain intention loss, and calculating according to the dialogue action vector to obtain dialogue action loss;
and carrying out back propagation based on the intention loss and the dialogue behavior loss, and updating model parameters of each model to be trained.
Assuming that the above is a first possible embodiment, in a second possible embodiment provided on the basis of the first possible embodiment, the processor 602 further implements the following steps by running the above-mentioned computer program stored in the memory 601:
Respectively obtaining sentence vectors of other sentences, wherein the sentence vectors are semantic representations of the sentences;
inputting each sentence vector into a unidirectional GRU model to be trained;
taking a vector output by the last hidden layer of the unidirectional GRU model as a memory coding vector;
and constructing the initial hidden layer of the first bidirectional GRU model based on the memory coding vector (a sketch of this construction follows).
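One way to realize these steps is sketched below: the sentence vectors of the historical rounds are fed, oldest to newest, through a unidirectional GRU, its final hidden state is taken as the memory coding vector, and that vector is reshaped into the initial hidden state of the first bidirectional GRU. The dimensions and the particular way of distributing the memory code across the two directions are assumptions; the embodiments only state that the initial hidden layer is constructed based on the memory coding vector.

```python
# Sketch of the memory encoder; dimensions and the splitting of the memory
# code across the two GRU directions are assumptions.
import torch
import torch.nn as nn

sent_dim, hidden = 512, 256
memory_gru = nn.GRU(sent_dim, 2 * hidden, batch_first=True)   # unidirectional

def build_initial_hidden(history_sentence_vectors):
    """history_sentence_vectors: list of (sent_dim,) tensors, oldest round first."""
    seq = torch.stack(history_sentence_vectors).unsqueeze(0)   # (1, rounds, sent_dim)
    _, h_n = memory_gru(seq)                                   # (1, 1, 2*hidden)
    memory_vec = h_n[-1]                                       # memory coding vector, (1, 2*hidden)
    # Split the memory coding vector into the forward and backward initial
    # hidden states of the first bidirectional GRU.
    return memory_vec.view(1, 2, hidden).transpose(0, 1).contiguous()  # (2, 1, hidden)
```

The returned tensor has the (num_directions, batch, hidden) shape expected by the `h0` argument of the training-step sketch above.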
In a third possible implementation provided on the basis of the second possible implementation, the respectively acquiring a sentence vector of each of the other sentences includes:
performing word segmentation processing on a target other sentence to obtain a segmented sequence, wherein the target other sentence is any one of the other sentences;
inputting the segmented sequence into the BERT model to obtain a sentence matrix of the target other sentence;
and inputting the sentence matrix into a second bidirectional GRU model to be trained to obtain a sentence vector of the target other sentence, wherein the sentence vector of the target other sentence is obtained by splicing a third output result of the second bidirectional GRU model in a third direction and a fourth output result of the second bidirectional GRU model in a fourth direction.
In a fourth possible implementation provided on the basis of the third possible implementation, the inputting the sentence matrix into the second bidirectional GRU model to be trained to obtain the sentence vector of the target other sentence includes:
after the sentence matrix is input into the second bidirectional GRU model, taking the output of the last hidden layer of the second bidirectional GRU model in the third direction as the third output result;
taking the output of the last hidden layer of the second bidirectional GRU model in the fourth direction as the fourth output result;
and splicing the third output result and the fourth output result to obtain the sentence vector of the target other sentence (sketched below).
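A sketch of this sentence-vector extraction is given below. The word segmenter (jieba) and the output dimension are assumptions; the only structural requirements taken from the steps above are the BERT encoding of the segmented sentence and the concatenation of the last hidden outputs of the second bidirectional GRU in its two directions.

```python
# Sketch: sentence vector of one target other sentence; jieba as the assumed
# word segmenter, 512-dimensional output to match the memory-encoder sketch.
import jieba
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")
second_gru = nn.GRU(bert.config.hidden_size, 256,
                    bidirectional=True, batch_first=True)

def sentence_vector(sentence: str) -> torch.Tensor:
    words = jieba.lcut(sentence)                        # word segmentation
    enc = tokenizer(" ".join(words), return_tensors="pt")
    sentence_matrix = bert(**enc).last_hidden_state     # (1, seq_len, 768)
    _, h_n = second_gru(sentence_matrix)                 # (2, 1, 256)
    # Concatenate the last hidden outputs of the two directions
    # (the third and fourth directions in the wording above).
    return torch.cat([h_n[0], h_n[1]], dim=-1)           # (1, 512)
```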
In a fifth possible implementation provided on the basis of the first, second, third or fourth possible implementation, the inputting the current dialogue matrix into the first bidirectional GRU model to be trained to obtain a dialogue semantic vector includes:
after the current dialogue matrix is input into the first bidirectional GRU model, taking the output of the last hidden layer of the first bidirectional GRU model in the first direction as the first output result;
taking the output of the last hidden layer of the first bidirectional GRU model in the second direction as the second output result;
and splicing the first output result and the second output result to obtain the dialogue semantic vector (sketched below).
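The mechanics of taking the last hidden layer's output in each direction deserve one practical note: in PyTorch, the final hidden state of a bidirectional GRU stacks layers and directions along its first axis, so the per-direction states are recovered with the documented reshape shown below. The two-layer setting is only an illustrative generalisation; the steps above need only the single-layer case.

```python
# Sketch: recovering the per-direction final hidden states of a bidirectional GRU.
import torch
import torch.nn as nn

gru = nn.GRU(input_size=768, hidden_size=256, num_layers=2,
             bidirectional=True, batch_first=True)
x = torch.randn(1, 12, 768)                    # (batch, seq_len, input_size)
_, h_n = gru(x)                                # (num_layers * 2, batch, 256)
h_n = h_n.view(gru.num_layers, 2, x.size(0), gru.hidden_size)
first_output = h_n[-1, 0]                      # last layer, first (forward) direction
second_output = h_n[-1, 1]                     # last layer, second (backward) direction
dialogue_semantic_vec = torch.cat([first_output, second_output], dim=-1)  # (batch, 512)
```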
In a sixth possible implementation provided on the basis of the above first possible implementation, or the above second possible implementation, or the above third possible implementation, or the above fourth possible implementation, the processor 602 further implements the following steps by running the above computer program stored in the memory 601:
the performing a first linear transformation on the dialogue semantic vector to obtain an intention vector includes:
linearly transforming the dialogue semantic vector according to a preset first parameter matrix to obtain a first intermediate vector, wherein the number of matrix rows of the first parameter matrix is determined according to the dimension of the dialogue semantic vector, the number of matrix columns of the first parameter matrix is determined according to the total number of preset intention categories, and the dimension of the first intermediate vector equals the total number of intention categories;
and transforming the first intermediate vector based on a preset first activation function to obtain the intention vector, wherein each dimension of the intention vector represents the probability that the user sentence belongs to the corresponding intention category (sketched below).
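The sketch below instantiates this transformation with a softmax as the assumed first activation function (the steps above only require that each dimension become a probability); the dimensions are likewise assumptions.

```python
# Sketch: first linear transformation followed by an assumed softmax activation.
import torch

d_semantic, num_intents = 512, 10                  # assumed dimensions
W1 = torch.randn(d_semantic, num_intents)          # first parameter matrix (rows x columns)
b1 = torch.zeros(num_intents)

def intent_vector(dialogue_semantic_vec: torch.Tensor) -> torch.Tensor:
    first_intermediate = dialogue_semantic_vec @ W1 + b1     # (1, num_intents)
    # Softmax maps the intermediate vector to per-category probabilities.
    return torch.softmax(first_intermediate, dim=-1)
```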
In a seventh possible implementation provided on the basis of the first, second, third or fourth possible implementation, the performing a second linear transformation on the dialogue semantic vector to obtain a dialogue action vector includes:
linearly transforming the dialogue semantic vector according to a preset second parameter matrix to obtain a second intermediate vector, wherein the number of matrix rows of the second parameter matrix is determined according to the dimension of the dialogue semantic vector, the number of matrix columns of the second parameter matrix is determined according to the total number of preset dialogue action categories, and the dimension of the second intermediate vector equals the total number of dialogue action categories;
and transforming the second intermediate vector based on a preset second activation function to obtain the dialogue action vector, wherein each dimension of the dialogue action vector indicates the score of the corresponding dialogue action category hit by the user sentence (sketched below).
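A companion sketch for the dialogue action head is given below, with a sigmoid as the assumed second activation function, matching a multi-label reading in which each dimension independently scores one dialogue action category; again, the dimensions are assumptions.

```python
# Sketch: second linear transformation followed by an assumed sigmoid activation.
import torch

d_semantic, num_acts = 512, 20                     # assumed dimensions
W2 = torch.randn(d_semantic, num_acts)             # second parameter matrix (rows x columns)
b2 = torch.zeros(num_acts)

def dialogue_act_vector(dialogue_semantic_vec: torch.Tensor) -> torch.Tensor:
    second_intermediate = dialogue_semantic_vec @ W2 + b2    # (1, num_acts)
    return torch.sigmoid(second_intermediate)                # per-category hit scores
```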
It should be appreciated that in embodiments of the present application, the processor 602 may be a central processing unit (Central Processing Unit, CPU), and may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 601 may include a read-only memory and a random access memory, and provides instructions and data to the processor 602. Part or all of the memory 601 may also include a non-volatile random access memory. For example, the memory 601 may also store information about the device type.
From the above, the electronic device provided in the scheme of the present application can train a semantic analysis model that fuses a BERT model, a plurality of bidirectional GRU models, a unidirectional GRU model and an LSTM-CRF model. By fusing the BERT model with the GRU models, encoding the memory of historical rounds, and fusing the previous round of system sentences during semantic analysis, a semantic analysis model with higher accuracy is obtained; by constraining the probability transition matrix of the LSTM-CRF model, entity recognition sequences that violate the marking rules can be avoided. Furthermore, the semantic analysis model can fuse dictionary information to improve the entity recognition effect. The semantic analysis model can be widely applied to systems relying on man-machine interaction, such as robots, intelligent sound boxes, intelligent customer service, voice assistants and intelligent diagnosis, and has a remarkable advantage in the accuracy of processing user sentences.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not detailed or described in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or as combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the system embodiments described above are merely illustrative, e.g., the division of modules or units described above is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments may also be implemented by a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to the legislation and patent practice, the computer-readable storage medium does not include electrical carrier signals and telecommunication signals.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (7)

1. An intelligent human-machine conversation model training method is characterized by comprising the following steps:
acquiring a user sentence of a current round and a system sentence of a previous round;
splicing the user sentence and the system sentence, and inputting the spliced result into a BERT model to obtain a current dialogue matrix;
inputting the current dialogue matrix into a first bidirectional GRU model to be trained to obtain a dialogue semantic vector, wherein the dialogue semantic vector is obtained by splicing a first output result of the first bidirectional GRU model in a first direction and a second output result of the first bidirectional GRU model in a second direction, and an initial hidden layer of the first bidirectional GRU model is constructed based on other sentences which are sentences of historical rounds except the system sentences;
performing a first linear transformation on the dialogue semantic vector to obtain an intention vector, and performing a second linear transformation on the dialogue semantic vector to obtain a dialogue action vector;
calculating an intention loss according to the intention vector, and calculating a dialogue action loss according to the dialogue action vector;
performing back propagation based on the intention loss and the dialogue action loss, and updating model parameters of each model to be trained;
the step of inputting the current dialogue matrix into a first bidirectional GRU model to be trained to obtain a dialogue semantic vector comprises the following steps:
after inputting the current dialogue matrix into the first bidirectional GRU model, taking an output of a last hidden layer of the first bidirectional GRU model in the first direction as a first output result;
taking the output of the last hidden layer of the first bidirectional GRU model in the second direction as a second output result;
splicing the first output result and the second output result to obtain the dialogue semantic vector;
the performing a first linear transformation on the dialogue semantic vector to obtain an intention vector includes:
performing a linear transformation on the dialogue semantic vector according to a preset first parameter matrix to obtain a first intermediate vector, wherein the number of matrix rows of the first parameter matrix is determined according to the dimension of the dialogue semantic vector, the number of matrix columns of the first parameter matrix is determined according to the total number of preset intention categories, and the dimension of the first intermediate vector is the total number of the intention categories;
transforming the first intermediate vector based on a preset first activation function to obtain the intention vector, wherein each dimension of the intention vector is used for representing the probability that the user sentence belongs to the corresponding intention category;
the performing a second linear transformation on the dialogue semantic vector to obtain a dialogue action vector includes:
performing a linear transformation on the dialogue semantic vector according to a preset second parameter matrix to obtain a second intermediate vector, wherein the number of matrix rows of the second parameter matrix is determined according to the dimension of the dialogue semantic vector, the number of matrix columns of the second parameter matrix is determined according to the total number of preset dialogue action categories, and the dimension of the second intermediate vector is the total number of the dialogue action categories;
and transforming the second intermediate vector based on a preset second activation function to obtain the dialogue action vector, wherein each dimension of the dialogue action vector is used for representing the score of the corresponding dialogue action category hit by the user sentence.
2. The intelligent human-machine conversation model training method of claim 1, wherein the intelligent human-machine conversation model training method further comprises:
respectively acquiring a sentence vector of each of the other sentences, wherein the sentence vector is a semantic representation of the corresponding sentence;
inputting each sentence vector into a unidirectional GRU model to be trained;
taking a vector output by the last hidden layer of the unidirectional GRU model as a memory coding vector;
and constructing an initial hidden layer of the first bidirectional GRU model based on the memory coding vector.
3. The intelligent human-machine conversation model training method of claim 2, wherein the respectively acquiring a sentence vector of each of the other sentences comprises:
performing word segmentation processing on a target other sentence to obtain a segmented sequence, wherein the target other sentence is any one of the other sentences;
inputting the segmented sequence into the BERT model to obtain a sentence matrix of the target other sentence;
and inputting the sentence matrix into a second bidirectional GRU model to be trained to obtain a sentence vector of the target other sentence, wherein the sentence vector of the target other sentence is obtained by splicing a third output result of the second bidirectional GRU model in a third direction and a fourth output result of the second bidirectional GRU model in a fourth direction.
4. The intelligent human-machine conversation model training method of claim 3, wherein the inputting the sentence matrix into a second bidirectional GRU model to be trained to obtain the sentence vector of the target other sentence comprises:
after inputting the sentence matrix to the second bidirectional GRU model, taking an output of a last hidden layer of the second bidirectional GRU model in the third direction as a third output result;
taking the output of the last hidden layer of the second bidirectional GRU model in the fourth direction as a fourth output result;
and splicing the third output result and the fourth output result to obtain the sentence vector of the target other sentence.
5. An intelligent human-machine conversation model training device, comprising:
the sentence acquisition unit is used for acquiring the user sentence of the current round and the system sentence of the previous round;
the dialogue matrix acquisition unit is used for splicing the user sentence and the system sentence and inputting the spliced result into the BERT model to obtain a current dialogue matrix;
a dialogue semantic vector obtaining unit, configured to input the current dialogue matrix into a first bidirectional GRU model to be trained to obtain a dialogue semantic vector, where the dialogue semantic vector is obtained by splicing a first output result of the first bidirectional GRU model in a first direction and a second output result of the first bidirectional GRU model in a second direction, and an initial hidden layer of the first bidirectional GRU model is constructed based on other sentences, where the other sentences are sentences of historical rounds other than the system sentence;
the linear transformation unit is used for performing a first linear transformation on the dialogue semantic vector to obtain an intention vector, and performing a second linear transformation on the dialogue semantic vector to obtain a dialogue action vector;
the loss calculation unit is used for calculating the intention loss according to the intention vector and calculating the dialogue action loss according to the dialogue action vector;
the parameter updating unit is used for performing back propagation based on the intention loss and the dialogue action loss, and updating the model parameters of each model to be trained;
the dialogue semantic vector obtaining unit is specifically configured to, after inputting the current dialogue matrix into the first bidirectional GRU model, take an output of a last hidden layer of the first bidirectional GRU model in the first direction as a first output result, take an output of a last hidden layer of the first bidirectional GRU model in the second direction as a second output result, and splice the first output result and the second output result to obtain the dialogue semantic vector;
wherein the linear transformation unit includes:
the first linear transformation subunit is configured to perform a linear transformation on the dialogue semantic vector according to a preset first parameter matrix to obtain a first intermediate vector, where the number of matrix rows of the first parameter matrix is determined according to the dimension of the dialogue semantic vector, the number of matrix columns of the first parameter matrix is determined according to the total number of preset intention categories, and the dimension of the first intermediate vector is the total number of the intention categories;
the first activation subunit is used for transforming the first intermediate vector based on a preset first activation function to obtain the intention vector, wherein each dimension of the intention vector is used for representing the probability that the user sentence belongs to the corresponding intention category;
wherein the linear transformation unit includes:
the second linear transformation subunit is configured to perform a linear transformation on the dialogue semantic vector according to a preset second parameter matrix to obtain a second intermediate vector, where the number of matrix rows of the second parameter matrix is determined according to the dimension of the dialogue semantic vector, the number of matrix columns of the second parameter matrix is determined according to the total number of preset dialogue action categories, and the dimension of the second intermediate vector is the total number of the dialogue action categories;
and the second activation subunit is used for transforming the second intermediate vector based on a preset second activation function to obtain the dialogue action vector, wherein each dimension of the dialogue action vector is used for indicating the score of the corresponding dialogue action category hit by the user sentence.
6. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 4 when executing the computer program.
7. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 4.
CN202010187709.2A 2020-03-17 2020-03-17 Intelligent man-machine conversation model training method, model training device and electronic equipment Active CN111460115B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010187709.2A CN111460115B (en) 2020-03-17 2020-03-17 Intelligent man-machine conversation model training method, model training device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010187709.2A CN111460115B (en) 2020-03-17 2020-03-17 Intelligent man-machine conversation model training method, model training device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111460115A CN111460115A (en) 2020-07-28
CN111460115B true CN111460115B (en) 2023-05-26

Family

ID=71682088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010187709.2A Active CN111460115B (en) 2020-03-17 2020-03-17 Intelligent man-machine conversation model training method, model training device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111460115B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528668A (en) * 2020-11-27 2021-03-19 湖北大学 Deep emotion semantic recognition method, system, medium, computer equipment and terminal
CN112579760B (en) * 2020-12-29 2024-01-19 深圳市优必选科技股份有限公司 Man-machine conversation method, device, computer equipment and readable storage medium
CN113158691B (en) * 2021-04-15 2023-02-28 清华大学 Dialogue method and device based on mixed knowledge management and electronic equipment
CN113139045B (en) * 2021-05-13 2023-05-05 八维(杭州)科技有限公司 Selective question-answering method based on task-driven man-machine dialogue
CN113326365B (en) * 2021-06-24 2023-11-07 中国平安人寿保险股份有限公司 Reply sentence generation method, device, equipment and storage medium
CN113934825B (en) * 2021-12-21 2022-03-08 北京云迹科技有限公司 Question answering method and device and electronic equipment
CN114417891B (en) * 2022-01-22 2023-05-09 平安科技(深圳)有限公司 Reply statement determination method and device based on rough semantics and electronic equipment
CN114490994B (en) * 2022-03-28 2022-06-28 北京沃丰时代数据科技有限公司 Conversation management method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522545A * 2018-10-11 2019-03-26 华东师范大学 A method for evaluating the coherence quality of multi-turn dialogues
CN109885670A * 2019-02-13 2019-06-14 北京航空航天大学 An interactive attention encoding sentiment analysis method for topic-oriented text

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6828667B2 (en) * 2017-11-28 2021-02-10 トヨタ自動車株式会社 Voice dialogue device, voice dialogue method and program

Also Published As

Publication number Publication date
CN111460115A (en) 2020-07-28

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant