CN110674281B - Man-machine conversation and man-machine conversation model acquisition method, device and storage medium - Google Patents

Man-machine conversation and man-machine conversation model acquisition method, device and storage medium Download PDF

Info

Publication number
CN110674281B
CN110674281B (application CN201911231125.4A)
Authority
CN
China
Prior art keywords
training
vector representation
keyword
man
memory unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911231125.4A
Other languages
Chinese (zh)
Other versions
CN110674281A
Inventor
郭振
王海峰
吴华
刘占一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201911231125.4A
Publication of CN110674281A
Application granted
Publication of CN110674281B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Abstract

The application discloses a human-machine dialogue method, a human-machine dialogue model acquisition method, corresponding apparatuses, and a storage medium, relating to the field of artificial intelligence. The human-machine dialogue method may comprise the following steps: the human-machine dialogue model encodes the acquired dialogue input to obtain an encoding result, the model being obtained by training on dialogue data serving as training samples; the model determines, according to the encoding result, semantic content for constraining the reply; the model determines decoding parameters; and the model decodes the semantic content according to the decoding parameters to generate a reply to the dialogue input. Applying this scheme can improve the accuracy of the generated replies.

Description

Man-machine conversation and man-machine conversation model acquisition method, device and storage medium
Technical Field
The present application relates to the field of computer applications, and in particular to a human-machine dialogue method and apparatus, a human-machine dialogue model acquisition method and apparatus, and a storage medium in the field of artificial intelligence.
Background
Human-machine dialogue is an important subject in the field of artificial intelligence, and common human-machine dialogue models are the Sequence-to-Sequence (Seq2Seq) model and its many variants. The Seq2Seq model comprises an encoder (Encoder) and a decoder (Decoder): the encoder encodes the dialogue input (utterance), i.e., maps it into an implicit semantic space, and the decoder generates each word of the reply (response) in turn, in a recursive manner.
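As a point of reference for the improvements below, the encode-then-recursively-decode flow of a plain Seq2Seq model can be sketched as follows. This is a deliberately tiny, self-contained toy (a mean-of-embeddings stand-in for the encoder, a greedy one-step recurrence for the decoder, random weights), not the patent's model; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["<s>", "</s>", "hello", "there", "how", "are", "you"]
EMB = {w: rng.standard_normal(8) for w in VOCAB}  # toy word embeddings

def encode(utterance):
    # Encoder: map the dialogue input into the implicit semantic space.
    # A mean of word embeddings stands in for an RNN encoder here.
    return np.mean([EMB[w] for w in utterance], axis=0)

W_out = rng.standard_normal((8, len(VOCAB)))  # toy output projection

def decode(state, max_len=5):
    # Decoder: generate reply words one at a time, recursively feeding
    # the last generated word back in (greedy decoding).
    reply, word = [], "<s>"
    for _ in range(max_len):
        h = 0.5 * state + 0.5 * EMB[word]      # toy recurrence step
        word = VOCAB[int(np.argmax(h @ W_out))]
        if word == "</s>":
            break
        reply.append(word)
    return reply

reply = decode(encode(["hello", "there"]))
```

A trained model would replace the random embeddings and output matrix with learned parameters and use a proper recurrent or attention-based decoder; the point here is only the encode/recursive-decode shape that the memory units below extend.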
The training samples of a Seq2Seq model are usually real interpersonal dialogue data mined from network chat scenes; the "question-answer" pairs in the dialogue data form dialogue input-reply pairs used for model training. However, a real interpersonal dialogue is constrained by much complex background information and many rules; that is, the transition from a dialogue input to a reply holds only when all the constraints of the actual dialogue are satisfied. The Seq2Seq model attempts to model this mapping relationship merely by fitting the dialogue data, which causes problems such as inaccuracy of the generated replies.
Disclosure of Invention
In view of the above, the present application provides a method, an apparatus and a storage medium for human-machine interaction and human-machine interaction model acquisition.
A human-machine dialog method, comprising:
the man-machine conversation model encodes the acquired conversation input to obtain an encoding result; the man-machine conversation model is obtained by training according to conversation data serving as a training sample;
the human-machine dialogue model determines, according to the encoding result, semantic content for constraining the reply;
the human-machine dialogue model maps the semantic content into M weights, M being a positive integer greater than one, respectively calculates the product of each weight and the vector representation of the corresponding parameter, and adds the M products to obtain the decoding parameters, the vector representations of the parameters being obtained by training;
and the man-machine conversation model decodes the semantic content according to the decoding parameters to generate a reply aiming at the conversation input.
According to a preferred embodiment of the present application, the determining semantic content for constraining reply according to the encoding result includes:
respectively calculating the similarity between the encoding result and the vector representation of the keywords in N groups of keyword-value pairs, wherein N is a positive integer greater than one;
respectively taking the similarity corresponding to each keyword as the weight of the value corresponding to the keyword;
respectively calculating the product of the vector representation of each value and the corresponding weight, and adding the N products to obtain the semantic content;
and the vector representation corresponding to the keywords and the values is obtained by training.
A human-machine dialogue model acquisition method comprises the following steps:
obtaining dialogue data serving as a training sample;
training by using the training sample to obtain a man-machine conversation model;
the man-machine dialogue model consists of an encoder, a content memory unit, a strategy memory unit and a decoder; in the process of man-machine conversation, the encoder is used for encoding the acquired conversation input to obtain an encoding result, the content memory unit is used for determining semantic content for constraint reply according to the encoding result, the strategy memory unit is used for mapping the semantic content into M weights, M is a positive integer greater than one, products between each weight and vector representation of corresponding parameters are respectively calculated, the M products are added to obtain decoding parameters, the vector representation corresponding to the parameters is obtained through training, and the decoder is used for decoding the semantic content according to the decoding parameters to generate reply aiming at the conversation input.
According to a preferred embodiment of the present application, the training of the human-machine interaction model using the training samples includes:
training the content memory unit by using training samples in a first training sample set;
after that training is finished, training the human-machine dialogue model with training samples from a second training sample set;
wherein the number of training samples in the second set of training samples is less than the number of training samples in the first set of training samples.
According to a preferred embodiment of the present application, the training the content memorization unit by using the training samples in the first training sample set comprises:
training to obtain the vector representations of the keywords and values in N groups of keyword-value pairs in the content memory unit, N being a positive integer greater than one, so that in the human-machine dialogue process the content memory unit determines the semantic content for constraining the reply according to the encoding result and the vector representations of the keywords and values in the N groups of keyword-value pairs.
According to a preferred embodiment of the present application, the training to obtain the vector representation of the keywords and values in the N keyword-value pairs in the content memory unit includes:
respectively obtaining a positive sample and at least one negative sample during each training, wherein each positive sample comprises two vector representations U and R, U corresponding to a dialogue input in a real dialogue and R corresponding to the reply in the real dialogue, and each negative sample comprises two vector representations U" and R", U" corresponding to a dialogue input in a fake dialogue and R" corresponding to the reply in the fake dialogue;
respectively calculating the similarity between the U and the vector representation of the keywords in the N groups of keyword-value pairs, and respectively taking the similarity corresponding to each keyword as the weight of the keyword and the value corresponding to the keyword;
respectively calculating the product of the vector representation of each keyword and the corresponding weight, adding the N products to obtain an intermediate vector U ', respectively calculating the product of the vector representation of each value and the corresponding weight, and adding the N products to obtain an intermediate vector R';
and finally training, according to the principle that U' and R' should be close to U and R and far from U" and R", to obtain the vector representations of the keywords and values in the N groups of keyword-value pairs.
According to a preferred embodiment of the present application, the training the human-machine dialogue model using the training samples in the second training sample set includes: and training to obtain vector representation of M parameters in the strategy memory unit.
A human-machine dialogue apparatus, applied in a human-machine dialogue model obtained by training on dialogue data serving as training samples, comprising: an encoder, a content memory unit, a policy memory unit, and a decoder;
the encoder is used for encoding the acquired dialogue input to obtain an encoding result;
the content memory unit is used for determining semantic content for constraint reply according to the encoding result;
the policy memory unit is used for mapping the semantic content into M weights, M being a positive integer greater than one, respectively calculating the product of each weight and the vector representation of the corresponding parameter, and adding the M products to obtain the decoding parameters, the vector representations of the parameters being obtained by training;
the decoder is used for decoding the semantic content according to the decoding parameters to generate a reply to the dialogue input.
According to a preferred embodiment of the present application, the content memory unit respectively calculates the similarity between the encoding result and the vector representations of the keywords in N groups of keyword-value pairs, N being a positive integer greater than one; respectively takes the similarity corresponding to each keyword as the weight of the value corresponding to that keyword; respectively calculates the product of the vector representation of each value and the corresponding weight; and adds the N products to obtain the semantic content, the vector representations of the keywords and values both being obtained by pre-training.
A human-machine dialogue model acquisition apparatus comprising: an acquisition unit and a training unit;
the acquisition unit is used for acquiring dialogue data serving as training samples;
the training unit is used for training with the training samples to obtain a human-machine dialogue model; the human-machine dialogue model is composed of an encoder, a content memory unit, a policy memory unit, and a decoder; in the human-machine dialogue process, the encoder encodes the acquired dialogue input to obtain an encoding result; the content memory unit determines, according to the encoding result, semantic content for constraining the reply; the policy memory unit maps the semantic content into M weights, M being a positive integer greater than one, respectively calculates the product of each weight and the vector representation of the corresponding parameter, and adds the M products to obtain the decoding parameters, the vector representations of the parameters being obtained by training; and the decoder decodes the semantic content according to the decoding parameters to generate a reply to the dialogue input.
According to a preferred embodiment of the present application, the training unit is further configured to train the content memory unit with training samples from a first training sample set and, after that training is finished, to train the human-machine dialogue model with training samples from a second training sample set, the number of training samples in the second training sample set being smaller than the number of training samples in the first training sample set.
According to a preferred embodiment of the present application, the training unit trains to obtain the vector representations of the keywords and values in N groups of keyword-value pairs in the content memory unit, N being a positive integer greater than one, so that in the human-machine dialogue process the content memory unit determines the semantic content for constraining the reply according to the encoding result and the vector representations of the keywords and values in the N groups of keyword-value pairs.
According to a preferred embodiment of the present application, during each training the training unit respectively obtains a positive sample and at least one negative sample, each positive sample comprising two vector representations U and R, U corresponding to a dialogue input in a group of real dialogues and R corresponding to the reply in the real dialogues, and each negative sample comprising two vector representations U" and R", U" corresponding to a dialogue input in a group of fake dialogues and R" corresponding to the reply in the fake dialogues; respectively calculates the similarity between U and the vector representations of the keywords in the N groups of keyword-value pairs, taking the similarity corresponding to each keyword as the weight of that keyword and of the value corresponding to it; respectively calculates the product of the vector representation of each keyword and the corresponding weight and adds the N products to obtain an intermediate vector U', and respectively calculates the product of the vector representation of each value and the corresponding weight and adds the N products to obtain an intermediate vector R'; and finally trains, according to the principle that U' and R' should be close to U and R and far from U" and R", to obtain the vector representations of the keywords and values in the N groups of keyword-value pairs.
According to a preferred embodiment of the present application, the training unit uses the training samples in the second training sample set to train and obtain the vector representations of the M parameters in the policy memory unit.
An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described above.
A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method as described above.
One embodiment of the above application has the following advantages or beneficial effects. Through training, the human-machine dialogue model is given memory capability, so that after encoding the acquired dialogue input it can determine, according to the encoding result, semantic content for constraining the reply, determine corresponding decoding parameters, and decode the semantic content according to those decoding parameters to generate a reply to the dialogue input. Compared with the existing encode-and-decode-only operation, this perfects the reply-generation process and improves the accuracy, diversity, and so on of the generated replies. In addition, when training the human-machine dialogue model, the content memory unit in the model can first be trained with training samples from a first training sample set so that it memorizes a large amount of dialogue data; no decoding operation is needed in this process, so it is fast. After that training is finished, the human-machine dialogue model can be further trained with training samples from a second training sample set, the number of which is smaller than the number in the first set, thereby improving training efficiency. Other effects of the above alternatives will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a flowchart of an embodiment of a method for obtaining a human-machine interaction model according to the present application;
FIG. 2 is a flow chart of an embodiment of a human-machine dialog method described herein;
FIG. 3 is a schematic diagram illustrating a process for obtaining decoding parameters according to the present application;
FIG. 4 is a block diagram of an embodiment of a human-machine interaction device 400 according to the present application;
FIG. 5 is a schematic diagram illustrating a structure of an embodiment of a human-machine interaction model obtaining apparatus 500 according to the present application;
FIG. 6 is a block diagram of an electronic device according to the method of an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" herein generally indicates that the objects before and after it are in an "or" relationship.
Cognitive psychology shows that, in addition to a short-term working-memory module, the working mechanism of the brain includes a long-term memory module. Long-term memory falls mainly into two categories: declarative memory, which memorizes semantic content to be operated on, such as concepts, knowledge, and common sense; and procedural memory, which memorizes executable operations, such as methods and procedures.
Behind dialogue lie the understanding of, and operations on, the world; dialogue embodies humans' most intelligent skills and is inseparable from the working mechanism of the brain. On this basis, the application provides an improved human-machine dialogue model that adds memory capability to the existing human-machine dialogue models.
Fig. 1 is a flowchart of an embodiment of a method for acquiring a human-machine interaction model according to the present application. As shown in fig. 1, the following detailed implementation is included.
In 101, dialogue data is acquired as a training sample.
At 102, a human-machine dialogue model is obtained by training with training samples.
The human-machine dialogue model is composed of an encoder, a content memory unit (DMN), a policy memory unit (PMN), and a decoder.
In the human-machine dialogue process, the encoder can encode the acquired dialogue input to obtain an encoding result, the content memory unit can determine, according to the encoding result, semantic content for constraining the reply, the policy memory unit can determine the decoding parameters, and the decoder can decode the semantic content according to the decoding parameters to generate a reply to the dialogue input.
To improve training efficiency, in practical applications the content memory unit can first be trained with training samples from a first training sample set; no decoding operation is needed in this process, so it is fast. After that training is finished, the human-machine dialogue model can be further trained with training samples from a second training sample set, the number of training samples in the second set being smaller than the number in the first set.
That is, the content memory unit is first trained with large-scale training samples so that it memorizes a large amount of dialogue data and forms a highly general dialogue semantic model; then the whole human-machine dialogue model is trained with relatively small-scale training samples, which includes training the encoder, the policy memory unit, and the decoder in the model and further optimizing the content memory unit. The large-scale and small-scale training samples may overlap partially, completely, or not at all.
When the content memory unit is trained with the training samples in the first training sample set, the vector representations of the keys and values in N groups of key-value pairs in the content memory unit can be obtained through training, N being a positive integer greater than one. In this way, in the human-machine dialogue process, the content memory unit can determine the semantic content for constraining the reply according to the obtained encoding result and the vector representations of the keys and values in the N groups of key-value pairs.
The specific value of N can be determined according to actual needs; the initial values of the vector representations of the keys and values in the N groups of key-value pairs can be obtained by random initialization, and the vector representations are finally determined through training.
During each training, a positive sample and at least one negative sample can be obtained; usually only one negative sample is needed. Each positive sample can contain two vector representations U and R, U corresponding to a dialogue input in a group of real dialogues and R corresponding to the reply in that group; each negative sample can likewise contain two vector representations U" and R", U" corresponding to a dialogue input in a group of fake dialogues and R" corresponding to the reply in that group. Then, the similarity between U and the vector representation of each key in the N groups of key-value pairs can be calculated, and the similarity corresponding to each key can be taken as the weight of that key and of its corresponding value. Next, the product of the vector representation of each key and the corresponding weight can be calculated and the N products added to obtain an intermediate vector U'; likewise, the product of the vector representation of each value and the corresponding weight can be calculated and the N products added to obtain an intermediate vector R'. Finally, the vector representations of the keys and values in the N groups of key-value pairs can be trained according to the principle that U' and R' should be close to U and R and far from U" and R". How the similarity is calculated is not limited.
The dialogues corresponding to the positive samples can be real interpersonal dialogue data mined from network chat scenes, a group of real dialogues being formed by one question and one answer (i.e., one dialogue input plus the corresponding reply). A fake dialogue can be a constructed dialogue in which the dialogue input and the reply are usually mismatched; for example, the dialogue input is "How is the weather today?" and the reply is "Today is the Qixi Festival."
U may be the encoding result obtained by encoding the dialogue input in the positive sample (the encoding result is usually a vector representation), or it may be a vector representation of the dialogue-input content itself; correspondingly, R may be a vector representation of the reply content itself. U" and R" are similar and are not described in detail. How the vector representations are obtained is not limited. Preferably, the different vector representations have the same dimension.
Assuming there are 100 groups of key-value pairs, for each U the similarity between U and the vector representations of the 100 keys may be calculated, yielding 100 results, and each result may be taken as the weight of the corresponding key and value. The vector representations of the 100 keys may then be weighted and summed to obtain an intermediate vector U', and the vector representations of the 100 values weighted and summed to obtain an intermediate vector R'. By optimizing a rank loss (Rank Loss), U' and R' are drawn close to U and R and pushed away from U" and R"; when the convergence condition is reached, the obtained vector representations of the keys and values in the N groups of key-value pairs are the finally required vector representations. The content memory unit thereby acquires its memory capability: the keys memorize the dialogue inputs in the dialogue data, and the values memorize the replies.
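One training step of the content memory unit can be sketched as follows, under stated assumptions: a dot product stands in for the (deliberately unspecified) similarity, the weights are softmax-normalized, and the rank loss takes a margin form. All names and sizes are illustrative, and a real implementation would backpropagate this loss to update the key and value vectors.

```python
import numpy as np

rng = np.random.default_rng(2)
D, N = 8, 100  # vector size and number of key-value pairs (100 as in the example)

keys   = rng.standard_normal((N, D))  # randomly initialized, then trained
values = rng.standard_normal((N, D))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def read(u):
    # The weight of each key (and of its value) is the similarity
    # (dot product here, by assumption) between u and that key.
    w = softmax(keys @ u)
    return keys.T @ w, values.T @ w   # intermediate vectors U', R'

def rank_loss(U, R, U_neg, R_neg, margin=1.0):
    # U' and R' should be close to the real pair (U, R) and far from
    # the fake pair (U", R"); a margin ranking loss encodes this.
    Up, Rp = read(U)
    pos = np.dot(Up, U) + np.dot(Rp, R)
    neg = np.dot(Up, U_neg) + np.dot(Rp, R_neg)
    return max(0.0, margin - pos + neg)

U, R = rng.standard_normal(D), rng.standard_normal(D)    # real dialogue pair
U2, R2 = rng.standard_normal(D), rng.standard_normal(D)  # fake dialogue pair
loss = rank_loss(U, R, U2, R2)
```

Driving this loss toward zero over many positive/negative pairs is what leaves the keys memorizing dialogue inputs and the values memorizing replies.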
When the human-machine dialogue model is trained with the training samples in the second training sample set, the vector representations of M parameters in the policy memory unit can be obtained through training, M being a positive integer greater than one. In this way, in the human-machine dialogue process, the policy memory unit can map the obtained semantic content into M weights, respectively calculate the product of each weight and the vector representation of the corresponding parameter, and add the M products to obtain the decoding parameters.
The core function of the policy memory unit is to memorize dialogue policies; in a dialogue system, the dialogue policy corresponds to the parameters of the decoder, i.e., the decoding parameters.
For the obtained semantic content, the policy memory unit can first map it into M weights through a neural network. The specific value of M can be determined according to actual needs and equals the number of parameters in the policy memory unit; the mapping method may be preset. The semantic content is also represented by a vector, whose dimension is usually larger than M; a neural network can map the higher-dimensional semantic content to an M-dimensional vector, each element of which serves as one of the mapped weights. Further, the product of each weight and the vector representation of the corresponding parameter may be calculated, and the M products added to obtain the required decoding parameters. Preferably, the vector representations of the parameters have the same dimension. In addition, the initial values of the vector representations of the parameters may be randomly initialized.
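A minimal sketch of the policy memory unit's computation, assuming a single linear layer followed by softmax as the mapping network (the patent leaves the mapping method open) and random stand-ins for the trained vector representations; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
D, M = 16, 4   # semantic-content dimension (larger than M) and parameter count

W = rng.standard_normal((M, D))       # assumed one-layer mapping network
params = rng.standard_normal((M, D))  # vector representations of the M parameters

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decoding_parameters(semantic):
    # Map the D-dimensional semantic content to an M-dimensional vector;
    # each element serves as the weight of one parameter vector.
    weights = softmax(W @ semantic)    # p1..pM
    # Weighted sum of the M parameter vector representations:
    return params.T @ weights          # theta = sum_i p_i * theta_i

semantic = rng.standard_normal(D)      # stands in for the content memory output
theta = decoding_parameters(semantic)  # decoding parameters for the decoder
```

The weighted sum makes the decoding parameters a learned blend of a small set of memorized policy vectors, selected by the semantic content.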
Based on the above description, fig. 2 is a flowchart of an embodiment of a man-machine interaction method according to the present application. As shown in fig. 2, the following detailed implementation is included.
In 201, the man-machine conversation model encodes the acquired conversation input to obtain an encoding result; the man-machine dialogue model is obtained by training according to dialogue data serving as training samples.
At 202, the human-machine dialogue model determines semantic content for constraining the reply according to the encoding result.
In 203, the human-machine dialogue model maps the semantic content into M weights, M being a positive integer greater than one, respectively calculates the product of each weight and the vector representation of the corresponding parameter, and adds the M products to obtain the decoding parameters, the vector representations of the parameters being obtained by training.
At 204, the human-machine dialog model decodes the semantic content according to the decoding parameters, generating a reply to the dialog input.
The operation in 201 can be completed by the encoder in the human-machine dialogue model, the operation in 202 by the content memory unit, the operation in 203 by the policy memory unit, and the operation in 204 by the decoder.
The acquired dialogue input may be encoded in an existing manner to obtain the encoding result, which may be a vector representation.
The similarity between the encoding result and the vector representation of each key in the N groups of key-value pairs can be calculated, N being a positive integer greater than one; the similarity corresponding to each key can be taken as the weight of the value corresponding to that key; the product of the vector representation of each value and the corresponding weight can be calculated; and the N products can be added to obtain the required semantic content, the vector representations of the keys and values being obtained by pre-training.
Assuming there are 100 groups of key-value pairs, the similarity between the encoding result and the vector representations of the 100 keys can be calculated, yielding 100 results; each result can be taken as the weight of the corresponding value, after which the vector representations of the 100 values can be weighted and summed to obtain the required semantic content.
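The read operation just described can be sketched as follows, again assuming dot-product similarity normalized by softmax (the patent does not fix the similarity measure) and random stand-ins for the pre-trained key and value vectors; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
D, N = 8, 100  # vector size and 100 key-value pairs, as in the example above

keys   = rng.standard_normal((N, D))  # pre-trained in the real model
values = rng.standard_normal((N, D))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def semantic_content(encoding):
    # The 100 similarities (dot products here, by assumption) between the
    # encoding result and the keys become the weights of the 100 values;
    # the weighted sum of the value vectors is the semantic content.
    weights = softmax(keys @ encoding)
    return values.T @ weights

encoding = rng.standard_normal(D)  # stands in for the encoder output
semantic = semantic_content(encoding)
```

The resulting vector then constrains the reply: it is what the policy memory unit maps to weights and what the decoder ultimately decodes.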
Further, the obtained semantic content may be mapped into M weights, where M is a positive integer greater than one; the product between each weight and the vector representation of the corresponding parameter may be calculated respectively, and the M products added to obtain the decoding parameter, where the vector representation corresponding to each parameter is obtained by pre-training. FIG. 3 is a schematic diagram of the process for obtaining the decoding parameter according to the present application. As shown in FIG. 3, p1-pM represent the weights, θ1-θM represent the vector representations of the parameters, and θ represents the decoding parameter.
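The policy-memory computation can be sketched in the same style. The linear projection W used to map the semantic content to the M weights is an assumption, since the embodiment does not fix the mapping function:

```python
import numpy as np

def policy_memory(semantic_content, W, thetas):
    # map the semantic content to M weights p1..pM; a learned linear
    # projection followed by softmax is one assumed choice
    logits = W @ semantic_content
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # decoding parameter: theta = p1*theta1 + ... + pM*thetaM
    return p @ thetas

rng = np.random.default_rng(1)
M, d, k = 4, 8, 16
W = rng.standard_normal((M, d))       # assumed projection matrix
thetas = rng.standard_normal((M, k))  # vector representation of each parameter
semantic_content = rng.standard_normal(d)
theta = policy_memory(semantic_content, W, thetas)
```

The weighted sum over θ1-θM yields a single decoding parameter vector θ, matching the schematic of FIG. 3.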
Finally, the semantic content can be decoded according to the decoding parameters to generate a reply to the dialogue input; the decoding itself can be performed using existing prior-art techniques.
It is noted that while, for simplicity of explanation, the foregoing method embodiments are described as a series of acts or combination of acts, those skilled in the art will appreciate that the present application is not limited by the order of acts, as some steps may, in accordance with the present application, occur in other orders or concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In summary, by adopting the scheme of the above method embodiments, the man-machine conversation model can be given memory capability through training. After the model encodes the acquired conversation input, it can determine semantic content for constraining the reply according to the encoding result and determine suitable decoding parameters, and can then decode the semantic content according to the decoding parameters to generate a reply to the conversation input. Compared with the existing practice of only encoding and decoding, this improves the reply-generation process and thus the accuracy and diversity of the generated replies. In addition, when training the human-computer dialogue model, the content memory unit in the model can first be trained using the training samples in the first training sample set so that it memorizes a large amount of dialogue data; no decoding operation is needed in this process, so it is fast. After that training is finished, the human-computer dialogue model can be further trained using the training samples in the second training sample set, where the number of training samples in the second set is smaller than that in the first set, thereby improving training efficiency.
The above is a description of method embodiments, and the embodiments of the present application are further described below by way of apparatus embodiments.
Fig. 4 is a schematic structural diagram of a human-machine dialogue device 400 according to an embodiment of the present application. The human-machine dialogue device can be applied to a human-machine dialogue model trained from dialogue data serving as training samples. As shown in fig. 4, the device includes: an encoder 401, a content memory unit 402, a policy memory unit 403, and a decoder 404.
And the encoder 401 is configured to encode the acquired dialog input to obtain an encoding result.
A content memory unit 402, configured to determine semantic content for constraining the reply according to the encoding result.
A policy memory unit 403, configured to map the semantic content into M weights, where M is a positive integer greater than one, calculate the products between each weight and the vector representations of the corresponding parameters, and add the M products to obtain decoding parameters, where the vector representations corresponding to the parameters are obtained through training;
a decoder 404, configured to decode the semantic content according to the decoding parameters, and generate a reply for the dialog input.
The encoder 401 may encode the obtained dialog input in the existing manner, so as to obtain an encoding result, which may be a vector representation.
The content memory unit 402 may respectively calculate similarities between the encoding result and vector representations of keys in N sets of key-value pairs, where N is a positive integer greater than one, respectively take the similarity corresponding to each key as a weight of the value corresponding to the key, respectively calculate products of the vector representation of each value and the corresponding weight, and add the N products to obtain semantic content, where the vector representations corresponding to the keys and the values are both obtained by pre-training.
The policy memory unit 403 may map the semantic content into M weights, where M is a positive integer greater than one, and may calculate the product between each weight and the vector representation of the corresponding parameter, and add the M products to obtain the decoding parameter, where the vector representation corresponding to the parameter is obtained by pre-training.
The decoder 404 can decode the semantic content according to the decoding parameters and generate a reply to the dialog input; the decoding itself can be performed using existing prior-art techniques.
Fig. 5 is a schematic structural diagram of an embodiment of a human-machine dialogue model acquisition apparatus 500 according to the present application. As shown in fig. 5, the apparatus includes: an acquisition unit 501 and a training unit 502.
An obtaining unit 501 is configured to obtain dialogue data as a training sample.
A training unit 502, configured to train with the training samples to obtain a human-computer dialogue model. The man-machine dialogue model can be composed of an encoder, a content memory unit, a policy memory unit, and a decoder. In the man-machine conversation process, the encoder encodes the acquired conversation input to obtain an encoding result; the content memory unit determines semantic content for constraining the reply according to the encoding result; the policy memory unit maps the semantic content into M weights, where M is a positive integer greater than one, calculates the products between each weight and the vector representations of the corresponding parameters, and adds the M products to obtain decoding parameters, where the vector representations corresponding to the parameters are obtained through training; and the decoder decodes the semantic content according to the decoding parameters to generate a reply to the conversation input.
To improve training efficiency, the training unit 502 may first train the content memory unit using the training samples in the first training sample set, and after training, may further train the human-machine dialogue model using the training samples in the second training sample set, where the number of training samples in the second training sample set is smaller than the number of training samples in the first training sample set.
That is, the content memory unit is first trained using large-scale training samples so that it memorizes a large amount of dialogue data and forms a dialogue semantic model with strong generality; then the whole man-machine dialogue model is trained using relatively small-scale training samples, which includes training the encoder, the policy memory unit, and the decoder in the model, and further optimizing the content memory unit. The large-scale and small-scale training samples can overlap partially, overlap completely, or not overlap at all.
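The two-stage schedule described above can be sketched as follows; the update callbacks are placeholders for the real parameter updates, and the set sizes are illustrative:

```python
def train_two_stage(content_memory_step, full_model_step, first_set, second_set):
    # stage 1: only the content memory is updated on the large first
    # training sample set; no decoding is involved, so each step is cheap
    for sample in first_set:
        content_memory_step(sample)
    # stage 2: the whole model (encoder, policy memory unit, decoder,
    # plus further optimization of the content memory) is trained on
    # the smaller second training sample set
    for sample in second_set:
        full_model_step(sample)

# toy usage: counters stand in for real parameter updates
counts = {"stage1": 0, "stage2": 0}
def content_memory_update(sample):
    counts["stage1"] += 1
def full_model_update(sample):
    counts["stage2"] += 1

train_two_stage(content_memory_update, full_model_update, range(1000), range(100))
```

The second set being smaller than the first (100 vs. 1000 here) reflects the efficiency claim: the expensive full-model pass runs over far fewer samples.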
When training the content memory unit using the training samples in the first training sample set, training unit 502 may train to obtain vector representations of key and value in N sets of key-value pairs in the content memory unit, where N is a positive integer greater than one, so that in the process of human-computer interaction, semantic content for constraint reply is determined by the content memory unit according to the encoding result and the vector representations of key and value in the N sets of key-value pairs.
Specifically, the training unit 502 may obtain a positive sample and at least one negative sample in each training iteration. Each positive sample includes two vector representations U and R, where U corresponds to the dialog input in a set of real dialogs and R corresponds to the reply in those real dialogs; each negative sample includes two vector representations U″ and R″, where U″ corresponds to the dialog input in a set of false dialogs and R″ corresponds to the reply in those false dialogs. The similarity between U and the vector representation of the key in each of the N groups of key-value pairs is calculated, and the similarity corresponding to each key is used as the weight of both that key and its corresponding value. The product of the vector representation of each key and its weight is calculated and the N products are added to obtain an intermediate vector U′; likewise, the product of the vector representation of each value and its weight is calculated and the N products are added to obtain an intermediate vector R′. Finally, the vector representations of the keys and values in the N groups of key-value pairs are trained according to the principle that U′ and R′ should be close to U and R and far from U″ and R″.
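The per-iteration computation can be sketched as follows. The hinge-style loss is an assumed concrete form of the stated "close to the real pair, far from the false pair" principle, and the softmax normalization of the similarities is likewise an assumption:

```python
import numpy as np

def intermediates(U, keys, values):
    # the similarity of U to each key is used as the weight of both
    # that key and its value (dot product + softmax is assumed)
    w = keys @ U
    w = np.exp(w - w.max())
    w /= w.sum()
    U_mid = w @ keys      # intermediate vector U'
    R_mid = w @ values    # intermediate vector R'
    return U_mid, R_mid

def contrastive_loss(U, R, U_neg, R_neg, keys, values, margin=1.0):
    # pull (U', R') toward the real pair (U, R) and push it away
    # from the false pair (U'', R''); the margin form is assumed
    U_mid, R_mid = intermediates(U, keys, values)
    pos = np.linalg.norm(U_mid - U) + np.linalg.norm(R_mid - R)
    neg = np.linalg.norm(U_mid - U_neg) + np.linalg.norm(R_mid - R_neg)
    return max(0.0, margin + pos - neg)

rng = np.random.default_rng(2)
keys = rng.standard_normal((50, 8))    # N = 50 key vectors (toy sizes)
values = rng.standard_normal((50, 8))
U, R = rng.standard_normal(8), rng.standard_normal(8)
U_neg, R_neg = rng.standard_normal(8), rng.standard_normal(8)
loss = contrastive_loss(U, R, U_neg, R_neg, keys, values)
```

Minimizing this loss over the key and value vectors (e.g., by gradient descent) yields the trained representations of the N key-value pairs.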
When training the human-computer dialogue model using the training samples in the second training sample set, the training unit 502 may train to obtain the vector representations of M parameters in the policy memory unit, where M is a positive integer greater than one, so that in the human-computer dialogue process, the policy memory unit maps the obtained semantic content into M weights, calculates the products between each weight and the vector representations of the corresponding parameters, and adds the M products to obtain the decoding parameters.
For a specific work flow of the device embodiments shown in fig. 4 and fig. 5, reference is made to the related description in the foregoing method embodiments, and details are not repeated.
In summary, by adopting the scheme of the above apparatus embodiments, the man-machine conversation model can be given memory capability through training. After the model encodes the acquired conversation input, it can determine semantic content for constraining the reply according to the encoding result and determine suitable decoding parameters, and can then decode the semantic content according to the decoding parameters to generate a reply to the conversation input. Compared with the existing practice of only encoding and decoding, this improves the reply-generation process and thus the accuracy and diversity of the generated replies. In addition, when training the human-computer dialogue model, the content memory unit in the model can first be trained using the training samples in the first training sample set so that it memorizes a large amount of dialogue data; no decoding operation is needed in this process, so it is fast. After that training is finished, the human-computer dialogue model can be further trained using the training samples in the second training sample set, where the number of training samples in the second set is smaller than that in the first set, thereby improving training efficiency.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device according to the method of the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors Y01, a memory Y02, and interfaces for connecting the various components, including a high speed interface and a low speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor Y01 is taken as an example.
Memory Y02 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the man-machine conversation and man-machine conversation model acquisition methods provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the man-machine conversation and man-machine conversation model acquisition methods provided herein.
Memory Y02, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present application. The processor Y01 executes various functional applications and data processing of the server, i.e., implements the methods in the above-described method embodiments, by executing the non-transitory software programs, instructions, and modules stored in the memory Y02.
The memory Y02 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Additionally, the memory Y02 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory Y02 may optionally include memory located remotely from processor Y01, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device may further include: an input device Y03 and an output device Y04. The processor Y01, the memory Y02, the input device Y03 and the output device Y04 may be connected by a bus or in another manner, and the connection by the bus is exemplified in fig. 6.
The input device Y03 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device, such as a touch screen, keypad, mouse, track pad, touch pad, pointer, one or more mouse buttons, track ball, joystick, or other input device. The output device Y04 may include a display device, an auxiliary lighting device (e.g., LED), a tactile feedback device (e.g., vibration motor), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present application is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A method for human-computer interaction, comprising:
the man-machine conversation model encodes the acquired conversation input to obtain an encoding result; the man-machine conversation model is obtained by training according to conversation data serving as a training sample;
the man-machine conversation model determines semantic content for constraint reply according to the coding result, and the method comprises the following steps: respectively calculating the similarity between the encoding result and the vector representation of the keywords in N groups of keyword-value pairs, wherein N is a positive integer greater than one; respectively taking the similarity corresponding to each keyword as the weight of the value corresponding to the keyword; respectively calculating the product of the vector representation of each value and the corresponding weight, and adding the N products to obtain the semantic content; wherein, the keywords and the vector representation corresponding to the values are obtained by training;
the man-machine conversation model maps the semantic content into M weights, wherein M is a positive integer greater than one, products between each weight and vector representation of corresponding parameters are calculated respectively, and the M products are added to obtain decoding parameters, wherein the vector representation corresponding to the parameters is obtained by training;
and the man-machine conversation model decodes the semantic content according to the decoding parameters to generate a reply aiming at the conversation input.
2. A human-machine dialogue model acquisition method is characterized by comprising the following steps:
obtaining dialogue data serving as a training sample;
training by using the training sample to obtain a man-machine conversation model;
the man-machine dialogue model consists of an encoder, a content memory unit, a strategy memory unit and a decoder; in the process of man-machine conversation, the encoder is used for encoding the acquired conversation input to obtain an encoding result; the content memory unit is used for determining semantic content for constraint reply according to the coding result, and comprises: respectively calculating the similarity between the encoding result and the vector representation of the keywords in N groups of keyword-value pairs, wherein N is a positive integer greater than one; respectively taking the similarity corresponding to each keyword as the weight of the value corresponding to the keyword; respectively calculating the product of the vector representation of each value and the corresponding weight, and adding the N products to obtain the semantic content; wherein, the keywords and the vector representation corresponding to the values are obtained by training; the strategy memory unit is used for mapping the semantic content into M weights, wherein M is a positive integer greater than one, respectively calculating the product between each weight and the vector representation of the corresponding parameter, and adding the M products to obtain a decoding parameter, wherein the vector representation corresponding to the parameter is obtained by training; the decoder is used for decoding the semantic content according to the decoding parameters and generating a reply aiming at the dialogue input.
3. The method of claim 2,
the training by using the training sample to obtain the man-machine conversation model comprises the following steps:
training the content memory unit by using training samples in a first training sample set;
training the man-machine conversation model by using training samples in a second training sample set after training is finished;
wherein the number of training samples in the second set of training samples is less than the number of training samples in the first set of training samples.
4. The method of claim 2,
the training to obtain the vector representation of the keywords and the values in the N groups of keyword-value pairs in the content memory unit comprises the following steps:
respectively obtaining a positive sample and at least one negative sample during each training, wherein each positive sample comprises two vector representations U and R, the U corresponds to a dialog input in a real dialog, and the R corresponds to a reply in the real dialog, and each negative sample comprises two vector representations U″ and R″, the U″ corresponds to a dialog input in a false dialog, and the R″ corresponds to a reply in the false dialog;
respectively calculating the similarity between the U and the vector representation of the keywords in the N groups of keyword-value pairs, and respectively taking the similarity corresponding to each keyword as the weight of the keyword and the value corresponding to the keyword;
respectively calculating the product of the vector representation of each keyword and the corresponding weight, and adding the N products to obtain an intermediate vector U′; respectively calculating the product of the vector representation of each value and the corresponding weight, and adding the N products to obtain an intermediate vector R′;
and finally training to obtain the vector representations of the keywords and values in the N groups of keyword-value pairs according to the principle that the U′ and the R′ are close to the U and the R and far away from the U″ and the R″.
5. The method of claim 3,
the training the human-machine dialogue model with training samples of the second training sample set comprises: and training to obtain vector representation of M parameters in the strategy memory unit.
6. A human-machine interaction device, which is applied to a human-machine interaction model trained from interaction data serving as training samples, includes: an encoder, a content memory unit, a policy memory unit, and a decoder;
the encoder is used for encoding the acquired dialogue input to obtain an encoding result;
the content memory unit is used for determining semantic content for constraint reply according to the encoding result, and comprises the following steps: respectively calculating the similarity between the encoding result and the vector representation of the keywords in N groups of keyword-value pairs, wherein N is a positive integer greater than one; respectively taking the similarity corresponding to each keyword as the weight of the value corresponding to the keyword; respectively calculating the product of the vector representation of each value and the corresponding weight, and adding the N products to obtain the semantic content; wherein, the keywords and the vector representation corresponding to the values are obtained by training;
the strategy memory unit is used for mapping the semantic content into M weights, wherein M is a positive integer greater than one, respectively calculating the product between each weight and the vector representation of the corresponding parameter, and adding the M products to obtain a decoding parameter, wherein the vector representation corresponding to the parameter is obtained by training;
the decoder is used for decoding the semantic content according to the decoding parameters and generating a reply aiming at the dialogue input.
7. A human-machine interaction model acquisition apparatus, comprising: an acquisition unit and a training unit;
the acquisition unit is used for acquiring dialogue data serving as training samples;
the training unit is used for training by using the training sample to obtain a man-machine conversation model; the man-machine dialogue model consists of an encoder, a content memory unit, a strategy memory unit and a decoder; in the process of man-machine conversation, the encoder is used for encoding the acquired conversation input to obtain an encoding result; the content memory unit is used for determining semantic content for constraint reply according to the coding result, and comprises: respectively calculating the similarity between the encoding result and the vector representation of the keywords in N groups of keyword-value pairs, wherein N is a positive integer greater than one; respectively taking the similarity corresponding to each keyword as the weight of the value corresponding to the keyword; respectively calculating the product of the vector representation of each value and the corresponding weight, and adding the N products to obtain the semantic content; wherein, the keywords and the vector representation corresponding to the values are obtained by training; the strategy memory unit is used for mapping the semantic content into M weights, wherein M is a positive integer greater than one, respectively calculating the product between each weight and the vector representation of the corresponding parameter, and adding the M products to obtain a decoding parameter, wherein the vector representation corresponding to the parameter is obtained by training; the decoder is used for decoding the semantic content according to the decoding parameters and generating a reply aiming at the dialogue input.
8. The apparatus of claim 7,
the training unit is further configured to train the content memory unit using training samples in a first training sample set, and train the human-computer conversation model using training samples in a second training sample set after training, where the number of training samples in the second training sample set is smaller than the number of training samples in the first training sample set.
9. The apparatus of claim 7,
the training unit respectively obtains a positive sample and at least one negative sample during each training, wherein each positive sample comprises two vector representations U and R, the U corresponds to a dialog input in a real dialog, the R corresponds to a reply in the real dialog, and each negative sample comprises two vector representations U 'and R', the U 'corresponds to a dialog input in a false dialog, and the R' corresponds to a reply in the false dialog; respectively calculating the similarity between the U and the vector representation of the keywords in the N groups of keyword-value pairs, and respectively taking the similarity corresponding to each keyword as the weight of the keyword and the value corresponding to the keyword; respectively calculating the product of the vector representation of each keyword and the corresponding weight, adding the N products to obtain an intermediate vector U ', respectively calculating the product of the vector representation of each value and the corresponding weight, and adding the N products to obtain an intermediate vector R'; and finally training to obtain keywords and vector representation of values in the N groups of keyword-value pairs according to the principle that the U 'and the R' are close to the U and the R and far away from the U 'and the R'.
10. The apparatus of claim 8,
and the training unit uses the training samples in the second training sample set to obtain, through training, the vector representations of the M parameters in the strategy memory unit.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN201911231125.4A 2019-12-05 2019-12-05 Man-machine conversation and man-machine conversation model acquisition method, device and storage medium Active CN110674281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911231125.4A CN110674281B (en) 2019-12-05 2019-12-05 Man-machine conversation and man-machine conversation model acquisition method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110674281A CN110674281A (en) 2020-01-10
CN110674281B (en) 2020-05-29

Family

ID=69088352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911231125.4A Active CN110674281B (en) 2019-12-05 2019-12-05 Man-machine conversation and man-machine conversation model acquisition method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110674281B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241688B (en) * 2020-01-15 2023-08-25 北京百度网讯科技有限公司 Method and device for monitoring composite production process
CN111737441B (en) * 2020-08-07 2020-11-24 北京百度网讯科技有限公司 Human-computer interaction method, device and medium based on neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106448670A (en) * 2016-10-21 2017-02-22 竹间智能科技(上海)有限公司 Dialogue automatic reply system based on deep learning and reinforcement learning
CN108062388A (en) * 2017-12-15 2018-05-22 北京百度网讯科技有限公司 Interactive reply generation method and device
CN108268452A (en) * 2018-01-15 2018-07-10 东北大学 A kind of professional domain machine synchronous translation device and method based on deep learning
CN108846130A (en) * 2018-06-29 2018-11-20 北京百度网讯科技有限公司 A kind of question text generation method, device, equipment and medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10108608B2 (en) * 2014-06-12 2018-10-23 Microsoft Technology Licensing, Llc Dialog state tracking using web-style ranking and multiple language understanding engines

Similar Documents

Publication Publication Date Title
CN111079938B (en) Question-answer reading understanding model obtaining method and device, electronic equipment and storage medium
CN111177355B (en) Man-machine conversation interaction method and device based on search data and electronic equipment
CN110807331B (en) Polyphone pronunciation prediction method and device and electronic equipment
CN114612749B (en) Neural network model training method and device, electronic device and medium
CN112001180A (en) Multi-mode pre-training model acquisition method and device, electronic equipment and storage medium
CN111079945B (en) End-to-end model training method and device
JP7371317B2 (en) Content recommendation method, device, electronic device, program and storage medium
KR102630243B1 (en) method and device for predicting punctuation
CN111354370B (en) Lip shape feature prediction method and device and electronic equipment
CN111241838B (en) Semantic relation processing method, device and equipment for text entity
CN110674281B (en) Man-machine conversation and man-machine conversation model acquisition method, device and storage medium
CN112541362B (en) Generalization processing method, device, equipment and computer storage medium
CN112466280B (en) Voice interaction method and device, electronic equipment and readable storage medium
US20220068265A1 (en) Method for displaying streaming speech recognition result, electronic device, and storage medium
CN112599141A (en) Neural network vocoder training method and device, electronic equipment and storage medium
CN112506949A (en) Method and device for generating query statement of structured query language and storage medium
CN111539220B (en) Training method and device of semantic similarity model, electronic equipment and storage medium
CN111177339A (en) Dialog generation method and device, electronic equipment and storage medium
CN112232089B (en) Pre-training method, device and storage medium of semantic representation model
CN112559715B (en) Attitude identification method, device, equipment and storage medium
CN112650844A (en) Tracking method and device of conversation state, electronic equipment and storage medium
CN112270169B (en) Method and device for predicting dialogue roles, electronic equipment and storage medium
CN112529986B (en) Graph-text correlation calculation model establishment method, graph-text correlation calculation method and graph-text correlation calculation device
CN112328776A (en) Dialog generation method and device, electronic equipment and storage medium
CN111539224A (en) Pruning method and device of semantic understanding model, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant