CN112860862B - Method and device for generating intelligent agent dialogue sentences in man-machine dialogue - Google Patents

Method and device for generating intelligent agent dialogue sentences in man-machine dialogue

Info

Publication number
CN112860862B
CN112860862B CN202110133448.0A
Authority
CN
China
Prior art keywords
knowledge
dialogue
vector
scene
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110133448.0A
Other languages
Chinese (zh)
Other versions
CN112860862A (en)
Inventor
宇洋
袁彩霞
王小捷
刘咏彬
李蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110133448.0A priority Critical patent/CN112860862B/en
Publication of CN112860862A publication Critical patent/CN112860862A/en
Application granted granted Critical
Publication of CN112860862B publication Critical patent/CN112860862B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and a device for generating intelligent agent dialogue sentences in man-machine dialogue. The method comprises the following steps: extracting attribute values and scene categories in a preset knowledge base from the dialogue history data of the current man-machine dialogue by using a pre-trained natural language understanding model, wherein the knowledge base is composed of knowledge triples; screening relevant knowledge triples out of the knowledge base based on the attribute values and the scene categories to obtain a candidate knowledge subset; and generating and outputting a current response sentence for the intelligent agent by using a pre-trained dialogue generation model based on the dialogue history data and the candidate knowledge subset. By adopting the invention, man-machine dialogue in multi-task scenarios can be supported.

Description

Method and device for generating intelligent agent dialogue sentences in man-machine dialogue
Technical Field
The invention relates to an artificial intelligence technology, in particular to a method and a device for generating intelligent agent dialogue sentences in man-machine dialogue.
Background
Existing man-machine dialogue schemes are typically implemented for a specific scenario. Such scenarios can be divided into four types: chatting, question answering, recommendation, and task-based dialogue. Chatting means that the intelligent agent can chat with the user without an explicit goal; question answering means that when the user asks the intelligent agent a question, the intelligent agent can answer it; recommendation means that the intelligent agent can recommend suitable information to the user according to the knowledge base and the chat with the user; task-based dialogue means that the intelligent agent can converse with the user around a specific goal, such as helping the user buy movie tickets, book a hotel, and so on.
Because the dialogue goals of different types of scenarios differ, a man-machine dialogue scheme designed for one type of scenario can only adapt to the corresponding application scenario and is not suitable for other types of scenarios. In real life, however, the boundaries between application scenarios of man-machine dialogue are not clear. For example, a person completing a ticket-ordering task may interject greetings, error messages, and other utterances unrelated to the task, or may initiate specific service requests in a chat scenario: when chatting about a movie topic, the user may need the agent to help order a movie ticket, query an order, request a recommendation, ask a question, and so on. It is therefore desirable to provide a man-machine dialogue scheme that can serve multiple task scenarios to meet the above application requirements.
Disclosure of Invention
In view of this, the main objective of the present invention is to provide a method and an apparatus for generating an intelligent agent dialog statement in a human-computer dialog, which can support a human-computer dialog in a multitask scenario.
In order to achieve the above purpose, the embodiment of the present invention provides a technical solution:
a method for generating intelligent agent dialogue sentences in man-machine dialogue comprises the following steps:
extracting attribute values and scene categories in a preset knowledge base from conversation historical data of the current man-machine conversation by using a pre-trained natural language understanding model; wherein the knowledge base is composed of knowledge triples;
based on the attribute values and the scene categories, relevant knowledge triples are screened out from the knowledge base to obtain candidate knowledge subsets;
and generating and outputting a current response sentence for the intelligent agent by utilizing a pre-trained dialogue generation model based on the dialogue historical data and the candidate knowledge subset.
Preferably, the extracting the attribute values and the scene categories in the preset knowledge base includes:
splicing the dialogue history data with a preset special mark, and inputting the dialogue history data into an encoder of the natural language understanding model for encoding to obtain a corresponding dialogue history vector and a corresponding scene information vector;
inputting the dialogue history vector into a CRF layer of the natural language understanding model for sequence labeling to obtain the attribute values contained in the dialogue history data;
and inputting the scene information vector into a multilayer perceptron of the natural language understanding model for scene classification to obtain the scene category of the man-machine conversation.
Preferably, the screening out relevant knowledge triples from the knowledge base based on the attribute values and the scene categories to obtain candidate knowledge subsets includes:
if the scene category is chatting, traversing each attribute value, searching a knowledge triple containing the attribute value from the knowledge base, and constructing the candidate knowledge subset by using all the searched knowledge triples;
if the scene category is question answering, traversing each attribute value contained in the latest dialog in the dialog historical data, and searching a knowledge triple containing the attribute value from the knowledge base; constructing the candidate knowledge subsets by using all the found knowledge triples;
if the scene category is recommendation, combining all the primary key entity values in the attribute values pairwise, traversing each combination, determining the common attribute values of the two primary key entity values in the combination, and, for each common attribute value, searching the knowledge base for knowledge triples containing that common attribute value; constructing the candidate knowledge subset from all the found knowledge triples;
and if the scene type is a task type conversation, traversing each key entity value in the attribute values, searching a knowledge triple which contains the key entity value and is related to the current man-machine conversation task from the knowledge base, and constructing the candidate knowledge subset by using all the searched knowledge triples.
Preferably, the generating a current response sentence for the agent using a pre-trained dialogue generating model based on the dialogue history data and the candidate knowledge subset comprises:
inputting the dialogue historical data into a dialogue coder of the dialogue generating model for coding to obtain a comprehensive characterization vector C of the dialogue historical data and word vectors of all words contained in the dialogue historical data;
inputting the candidate knowledge subsets into a knowledge encoder of the dialogue generation model for encoding to obtain a comprehensive characterization vector kg of the candidate knowledge subsets and a vector representation of each knowledge triple in the candidate knowledge subsets;
generating the response sentence using a natural language generator of the dialogue generation model based on the comprehensive characterization vector C of the dialogue history data, the comprehensive characterization vector kg of the candidate knowledge subset, the word vector, and the vector representation of the knowledge triples.
Preferably, the inputting the dialogue history data into the dialogue coder of the dialogue generating model for coding comprises:
expanding the conversation history data by adding conversation role information and conversation turn information to which each word belongs in the conversation history data;
dividing the expanded dialogue historical data according to dialogue turns;
coding each turn of dialogue data obtained by the division with a sentence-level bidirectional gated recurrent unit network (BiGRU) to obtain the word vectors of all words contained in each turn of dialogue;
calculating a first dialogue vector of each dialogue by adopting a self-attention mechanism based on the word vectors of all words contained in each dialogue;
encoding the first dialogue vectors of all turns with a second, turn-level BiGRU to obtain a second dialogue vector for each turn of dialogue;
and calculating a comprehensive characterization vector C of the dialogue historical data by adopting a self-attention mechanism based on the second dialogue vector.
Preferably, the encoding the candidate knowledge subset input to the knowledge encoder of the dialog generation model comprises:
calculating an entity word vector of each knowledge triple in the candidate knowledge subset by using a TransE model;
obtaining a vector representation of each knowledge triple in the candidate knowledge subset by using a multilayer perceptron based on the entity word vector of each knowledge triple;
and obtaining a comprehensive characterization vector kg of the candidate knowledge subset by using a self-attention mechanism based on the vector representation of each knowledge triple.
Preferably, generating the response sentence using a natural language generator of the dialogue generation model includes:
splicing the vector representations of the knowledge triples with the word vectors, and writing the splicing result M into a memory network of the natural language generator; wherein M = [(h_1, ..., h_n); (k_1, ..., k_g)] = [M_1, ..., M_{n+g}], h_n represents the nth word vector, k_g represents the vector representation of the g-th knowledge triple, n represents the number of word vectors, and g represents the number of knowledge triples;
initializing the initial query vector s_0 of the GRU of the natural language generator used for decoding with the splicing result of the comprehensive characterization vector C and the comprehensive characterization vector kg;
at each decoding time t of the GRU, the GRU generates the query vector s_t of the current time t based on the query vector s_{t-1} and the generated word y_{t-1} of the previous time; computing, with an attention mechanism, the degree of correlation between the query vector s_t and each storage unit in the memory network to obtain the degree of correlation α_i^t between the query vector s_t and each word in the dialogue history data and the degree of correlation β_r^t between the query vector s_t and each knowledge triple in the candidate knowledge subset; based on the degree of correlation α_i^t, calculating a joint representation c_t of the dialogue history data by weighted summation; based on the degree of correlation β_r^t, calculating a joint representation g_t of the candidate knowledge subset by weighted summation; taking c_t as the query vector, accessing the memory network in a multi-hop manner to obtain a knowledge distribution p_ptr; taking g_t as the query vector, accessing a preset dictionary with a multilayer perceptron to obtain a dictionary distribution p_vocab; and obtaining the generated word y_t of the current time t from the knowledge distribution p_ptr and the dictionary distribution p_vocab using a gating mechanism;
the GRU generates the current response sentence for the agent based on the generated words at all times.
The embodiment of the invention also discloses a device for generating the intelligent agent dialogue sentences in the man-machine dialogue, which comprises:
the information extraction module is used for extracting attribute values and scene categories in a preset knowledge base from conversation historical data of the current man-machine conversation by utilizing a pre-trained natural language understanding model; wherein the knowledge base is composed of knowledge triples;
the knowledge screening module is used for screening out related knowledge triples from the knowledge base based on the attribute values and the scene categories to obtain candidate knowledge subsets;
and the dialogue response module is used for generating and outputting a current response sentence for the intelligent agent by utilizing a pre-trained dialogue generation model based on the dialogue historical data and the candidate knowledge subset.
The embodiment of the invention also discloses equipment for generating the intelligent agent dialogue sentences in the man-machine dialogue, which comprises a processor and a memory;
the memory stores an application program executable by the processor, and is used for enabling the processor to execute the method for generating the intelligent agent dialogue sentences in the man-machine dialogue.
A computer-readable storage medium having stored therein computer-readable instructions for executing the method for generating an agent dialog statement in a human-computer dialog as described above.
According to the technical scheme above, the scheme for generating intelligent agent dialogue sentences in man-machine dialogue provided by the embodiment of the invention extracts the attribute values of the knowledge base from the dialogue history data and identifies the dialogue scene category, selects a candidate knowledge subset related to the dialogue from the knowledge base based on the attribute values and the scene category, and then generates the current response sentence of the intelligent agent based on the candidate knowledge subset and the current dialogue history data. On the one hand, constructing the candidate knowledge subset effectively reduces the number of knowledge triples used for generating the response sentence, which reduces the computational overhead of response generation and improves sentence generation efficiency. On the other hand, screening the knowledge triples in the knowledge base based on the scene category makes the candidate knowledge subset match the current scene category, so that the generated response sentence matches the current man-machine dialogue scene; this improves the intelligence of the response sentence and the user's man-machine dialogue experience. The intelligent agent dialogue sentence generation scheme provided by the embodiment of the invention is therefore applicable to a variety of task scenarios.
Drawings
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic flow diagram of an embodiment of the present invention, and as shown in fig. 1, a method for generating an agent dialog statement in a human-computer dialog implemented by the embodiment mainly includes:
Step 101, extracting attribute values and scene categories in a preset knowledge base from the dialogue history data of the current man-machine dialogue by using a pre-trained natural language understanding model; wherein the knowledge base is composed of knowledge triples.
Here, the dialogue history data of the current man-machine dialogue consists of the dialogue sentences that have already been produced in the current man-machine dialogue. In this step, the attribute values and scene categories in the dialogue history data are identified so that, in step 102, a candidate knowledge sub-base related to the current dialogue can be constructed based on the attribute values and the scene categories; in the subsequent steps, the response sentence then only needs to be generated based on the candidate knowledge sub-base rather than the whole knowledge base, which improves the generation efficiency of the response sentence.
The dialogue history data may be represented as X = (x_1, ..., x_n), where each element corresponds to a word and n represents the number of words contained in the dialogue history data.
It should be noted that the embodiment of the present invention requires a large-scale data set usable for multi-task-scenario dialogue to be constructed in advance. The training data can be expanded on a large scale according to templates and the knowledge base. A domain-specific database is given; it contains a primary key field and several attribute fields describing the primary key. The primary key field refers to a specific entity object that uniquely identifies a record in the database, such as a movie work, a hotel, or a tourist attraction, while the attribute fields refer to the elements describing the primary key entity; for example, a movie work has the attributes "director", "actor", "show time", and so on. Each attribute has an attribute value, which is also considered an entity. The attribute name often describes the type of semantic relationship between an entity and one of its attribute values; for example, the movie work "Hawthorn Tree Love" has the attribute "director" with the attribute value "Zhang Yimou", and "director" describes the semantic relationship between "Hawthorn Tree Love" and "Zhang Yimou". Thus, an entity and one of its attribute values may be represented as a knowledge triple, each triple consisting of a head entity, a relationship, and a tail entity; for example, <Hawthorn Tree Love, director, Zhang Yimou> indicates that the relationship between the head entity "Hawthorn Tree Love" and the tail entity "Zhang Yimou" is "director". The knowledge base is composed of a large number of such knowledge triples. The relationship types between entities can be set according to the domain; for example, in the movie domain, 12 relationship types can be designed: Movie_Name, Actor, Director, Writer, Release, Genre, Language, Plot, Date, Num_Packets, Theatre_Name, and Time.
Specifically, the method for constructing the knowledge base based on the knowledge triples is known to those skilled in the art, and is not described herein again.
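For illustration only, the following Python sketch shows one possible way to organize such a triple-based knowledge base in memory; the entity names, relation names, and index structure are assumptions made for this example and are not part of the patent.

from collections import defaultdict

# A toy knowledge base: each record is a (head entity, relation, tail entity)
# triple, as described above. Entity and relation names are illustrative.
KB = [
    ("Hawthorn Tree Love", "Director", "Zhang Yimou"),
    ("Hawthorn Tree Love", "Genre", "Romance"),
    ("Hero", "Director", "Zhang Yimou"),
]

# Index the triples by every entity value they contain, so that the candidate
# knowledge subsets of step 102 can be built by simple look-ups.
by_value = defaultdict(list)
for head, rel, tail in KB:
    by_value[head].append((head, rel, tail))
    by_value[tail].append((head, rel, tail))

print(by_value["Zhang Yimou"])   # every triple whose head or tail is "Zhang Yimou"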
In one embodiment, the following method may be specifically adopted to extract the attribute values and the scene categories in the preset knowledge base:
and step 1011, after the dialogue history data is spliced with a preset special mark, inputting the dialogue history data into an encoder of the natural language understanding model for encoding, and obtaining a corresponding dialogue history vector and a corresponding scene information vector.
In this step, the dialogue history data X = (x_1, ..., x_n) is spliced with a special mark (denoted x_@), and the splicing result (x_1, ..., x_n, x_@) is encoded to obtain the vector representation H = (h_1, ..., h_n, h_@). The scene information vector h_@ obtained by encoding the special mark x_@ carries scene information related to the dialogue history data, so that scene classification can be performed in the subsequent steps based on this vector to obtain the scene category.
Specifically, the encoder may be a BiGRU, and the specific encoding method is the same as that in the prior art, and is not described herein again.
Step 1012, inputting the dialogue history vector into a Conditional Random Field (CRF) layer of the natural language understanding model for sequence labeling, so as to obtain the attribute values included in the dialogue history data.
This step is expressed by the formula: Y_logit = CRF(h_1, ..., h_n), where Y_logit denotes the sequence labeling result.
Step 1013, inputting the scene information vector into the multilayer perceptron of the natural language understanding model for scene classification to obtain the scene category of the man-machine dialogue.
This step is expressed by the formula: Sce_logit = softmax(MLP(h_@)), where Sce_logit denotes the output scene category.
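As a concrete, non-authoritative illustration of steps 1011 to 1013, the following PyTorch sketch shows a BiGRU encoder over the dialogue history plus the special mark x_@, a per-token projection standing in for the CRF sequence-labeling layer, and an MLP scene classifier over h_@; the dimensions, class counts, and the simplification of the CRF layer are assumptions made for this example.

import torch
import torch.nn as nn

class NLUModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=128, num_tags=5, num_scenes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # bidirectional GRU encoder over (x_1, ..., x_n, x_@)
        self.encoder = nn.GRU(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        # emission scores per token; the patent feeds these to a CRF layer
        self.emission = nn.Linear(2 * hid_dim, num_tags)
        # scene classifier over the scene information vector h_@
        self.scene_mlp = nn.Sequential(
            nn.Linear(2 * hid_dim, hid_dim), nn.Tanh(), nn.Linear(hid_dim, num_scenes)
        )

    def forward(self, token_ids):
        # token_ids: (batch, n + 1); the last position is the special mark x_@
        h, _ = self.encoder(self.embed(token_ids))        # (batch, n + 1, 2 * hid_dim)
        tag_logits = self.emission(h[:, :-1, :])          # Y_logit before CRF decoding
        scene_probs = torch.softmax(self.scene_mlp(h[:, -1, :]), dim=-1)  # Sce_logit
        return tag_logits, scene_probs

model = NLUModel(vocab_size=10000)
tags, scene = model(torch.randint(0, 10000, (1, 12)))     # 11 words + the special mark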
Step 102, screening relevant knowledge triples out of the knowledge base based on the attribute values and the scene categories to obtain a candidate knowledge subset.
In an embodiment, the following method may be specifically adopted to screen out relevant knowledge triples from the knowledge base based on the attribute values and the scene categories, so as to obtain a candidate knowledge subset:
a. and if the scene category is chatting, traversing each attribute value, searching a knowledge triple containing the attribute value from the knowledge base, and constructing the candidate knowledge subset by using all the searched knowledge triples.
b. If the scene category is question answering, traversing each attribute value contained in the latest dialog in the dialog historical data, and searching a knowledge triple containing the attribute value from the knowledge base; and constructing the candidate knowledge subsets by using all the found knowledge triples.
Considering that in the question-answering scenario the agent's response should answer the latest question posed by the user, the relevant knowledge triples are found based on each attribute value contained in the latest round of dialogue in the dialogue history data, so that the constructed candidate knowledge subset matches the requirements of the dialogue scenario.
c. If the scene category is recommendation, combining all the primary key entity values in the attribute values pairwise, traversing each combination, determining the common attribute values of the two primary key entity values in the combination, and, for each common attribute value, searching the knowledge base for knowledge triples containing that common attribute value; the candidate knowledge subset is constructed from all the found knowledge triples.
Considering that in a recommendation scenario the user needs to be provided with information of interest, all primary key entity values among the attribute values in the dialogue history are combined pairwise to obtain every possible pair of primary key entity values; the user's points of interest are found by looking for the common attribute values shared by the two primary key entity values, and the knowledge triples are then screened based on these common attribute values. The screened knowledge triples can therefore provide information the user is interested in and meet the task requirements of the recommendation scenario.
d. And if the scene category is task-based dialogue, traversing each key entity value in the attribute values, searching a knowledge triple which contains the key entity value and is related to the current man-machine dialogue task from the knowledge base, and constructing the candidate knowledge subset by using all the searched knowledge triples.
In consideration of the fact that the human-computer conversation in the task-based conversation scene needs to complete a predetermined task, in this case, when the knowledge triples are screened, it is necessary to ensure that the screened knowledge triples are related to the current human-computer conversation task so as to meet the task requirements of the task-based conversation scene.
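The following Python sketch, given only as an assumed illustration of step 102, implements the four screening rules above over a list of (head, relation, tail) triples; the scene labels, the triple format, and the use of a task-relation whitelist for the task-based branch are assumptions for this example.

from itertools import combinations

def build_candidate_subset(kb, attribute_values, primary_keys, scene, task_relations=()):
    """kb is a list of (head, relation, tail) triples; returns the candidate subset."""
    if scene in ("chat", "question_answering"):
        # for question answering the caller passes only the attribute values
        # found in the latest round of dialogue
        return [t for t in kb if any(v in (t[0], t[2]) for v in attribute_values)]
    if scene == "recommendation":
        subset = []
        for a, b in combinations(primary_keys, 2):        # pairwise combinations
            tails_a = {t[2] for t in kb if t[0] == a}
            tails_b = {t[2] for t in kb if t[0] == b}
            for common in tails_a & tails_b:              # common attribute values
                subset.extend(t for t in kb if t[2] == common)
        return subset
    if scene == "task":
        # keep triples containing a key entity value and related to the task,
        # approximated here by a whitelist of task-relevant relations
        return [t for t in kb
                if any(k in (t[0], t[2]) for k in primary_keys) and t[1] in task_relations]
    return []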
Step 103, generating and outputting a current response sentence for the intelligent agent by using a pre-trained dialogue generation model based on the dialogue history data and the candidate knowledge subset.
In one embodiment, the following method may be specifically adopted to generate a current response statement for the agent based on the dialogue history data and the candidate knowledge subset:
step 1031, inputting the dialogue history data into a dialogue encoder of the dialogue generating model for encoding, and obtaining a comprehensive characterization vector C of the dialogue history data and word vectors of all words contained in the dialogue history data.
In one embodiment, the dialog history data may be input to a dialog encoder of the dialog generation model to be encoded by the method described below:
and step x1, expanding the conversation history data by adding the conversation role information and the conversation turn information to which each word belongs in the conversation history data.
This step expands the dialogue history data by appending the corresponding dialogue role information and dialogue turn information to each word, i.e., the dialogue history data X = (x_1, ..., x_n) is extended to X = (c_1, ..., c_n), where c_i = (x_i, u/s, t), 1 ≤ i ≤ n, u denotes a sentence of the user, s denotes a sentence returned by the intelligent agent, and t denotes the dialogue turn. The expanded dialogue history allows the model to capture more dialogue-related information during encoding, which helps generate a reply sentence that better matches the user's sentence and thus improves the effectiveness of the reply.
And step x2, dividing the expanded dialogue historical data according to the dialogue turns.
And step x3, coding each turn of dialogue data obtained by the division with a sentence-level bidirectional gated recurrent unit network (BiGRU) to obtain the word vectors of all words contained in each turn of dialogue.
And step x4, calculating a first dialogue vector of each dialogue by adopting a self-attention mechanism based on the word vectors of all words contained in each dialogue.
And step x5, encoding the first dialogue vectors of all turns with a second, turn-level BiGRU to obtain a second dialogue vector for each turn of dialogue.
And step x6, calculating the comprehensive characterization vector C of the dialogue history data with a self-attention mechanism based on the second dialogue vectors.
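A minimal PyTorch sketch of the hierarchical encoding in steps x1-x6 follows; the hidden sizes, the pooling module, and the input format (one id tensor per turn) are illustrative assumptions rather than the patented implementation.

import torch
import torch.nn as nn

class SelfAttnPool(nn.Module):
    """Self-attention pooling: a weighted sum of the input vectors."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h):                                  # h: (length, dim)
        a = torch.softmax(self.score(h), dim=0)
        return (a * h).sum(dim=0)                          # one pooled vector

class DialogueEncoder(nn.Module):
    def __init__(self, vocab_size, emb=128, hid=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.word_gru = nn.GRU(emb, hid, bidirectional=True, batch_first=True)      # sentence level
        self.turn_gru = nn.GRU(2 * hid, hid, bidirectional=True, batch_first=True)  # turn level
        self.word_pool = SelfAttnPool(2 * hid)
        self.turn_pool = SelfAttnPool(2 * hid)

    def forward(self, turns):                              # turns: list of (1, len_i) id tensors
        word_vecs, turn_vecs = [], []
        for t in turns:
            h, _ = self.word_gru(self.embed(t))            # word vectors of one turn (step x3)
            word_vecs.append(h.squeeze(0))
            turn_vecs.append(self.word_pool(h.squeeze(0))) # first dialogue vector (step x4)
        h2, _ = self.turn_gru(torch.stack(turn_vecs).unsqueeze(0))  # second vectors (step x5)
        C = self.turn_pool(h2.squeeze(0))                  # comprehensive vector C (step x6)
        return C, torch.cat(word_vecs, dim=0)              # C and all word vectors h_1..h_n

enc = DialogueEncoder(vocab_size=10000)
C, H = enc([torch.randint(0, 10000, (1, 6)), torch.randint(0, 10000, (1, 9))])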
Step 1032, inputting the candidate knowledge subsets into a knowledge encoder of the dialogue generating model for encoding, so as to obtain a comprehensive characterization vector kg of the candidate knowledge subsets and a vector representation of each knowledge triple in the candidate knowledge subsets.
In one embodiment, this step may encode the candidate knowledge subset by inputting the candidate knowledge subset to a knowledge encoder of the dialog generation model by:
and step y1, calculating an entity word vector of each knowledge triple in the candidate knowledge subset by using a TransE model.
And y2, obtaining the vector representation of each knowledge triple in the candidate knowledge subset by utilizing a multilayer perceptron based on the entity word vector of each knowledge triple.
And y3, based on the vector representation of each knowledge triple, obtaining a comprehensive characterization vector kg of the candidate knowledge subset by using a self-attention mechanism.
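A minimal PyTorch sketch of steps y1-y3 follows; a plain embedding table stands in for pre-trained TransE vectors, and the layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class KnowledgeEncoder(nn.Module):
    def __init__(self, num_symbols, emb=100, hid=128):
        super().__init__()
        self.transe = nn.Embedding(num_symbols, emb)       # stand-in for pre-trained TransE vectors
        self.mlp = nn.Sequential(nn.Linear(3 * emb, hid), nn.ReLU(), nn.Linear(hid, hid))
        self.score = nn.Linear(hid, 1)

    def forward(self, triples):                            # triples: (g, 3) ids of head, relation, tail
        e = self.transe(triples).reshape(triples.size(0), -1)  # concatenated triple embeddings (step y1)
        k = self.mlp(e)                                    # vector of each triple (step y2)
        a = torch.softmax(self.score(k), dim=0)
        kg = (a * k).sum(dim=0)                            # comprehensive vector kg (step y3)
        return kg, k

kenc = KnowledgeEncoder(num_symbols=500)
kg, K = kenc(torch.randint(0, 500, (4, 3)))                # 4 candidate knowledge triples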
Step 1033, generating the response sentence by using a natural language generator of the dialogue generation model based on the comprehensive characterization vector C of the dialogue history data, the comprehensive characterization vector kg of the candidate knowledge subset, the word vector, and the vector representation of the knowledge triplet.
Preferably, the response sentence may be generated by the natural language generator of the dialogue generation model through dynamic interaction between a memory network (Memory Network) and a GRU, as follows:
and step z1, splicing the vector representation of the knowledge triple with the word vector, and writing a spliced result M into a memory network of the natural language generator.
Here, the vector representations of the knowledge triples, KG = [k_1, ..., k_g], and the word vectors of the words in the dialogue history, H = (h_1, ..., h_n), are spliced and written together into the memory network as its input.
Specifically, M = [(h_1, ..., h_n); (k_1, ..., k_g)] = [M_1, ..., M_{n+g}], where h_n represents the nth word vector, k_g represents the vector representation of the g-th knowledge triple, n represents the number of word vectors, and g represents the number of knowledge triples.
The memory network, i.e., the memory, supports reading and writing; it is mainly used to write the output H = (h_1, ..., h_n) of the dialogue encoder and the output KG = [k_1, ..., k_g] of the knowledge encoder into the memory so that the GRU can query them when dynamically generating a word at each time step.
Step z2, initializing the initial query vector s_0 of the GRU of the natural language generator used for decoding with the splicing result of the comprehensive characterization vector C and the comprehensive characterization vector kg.
Step z3, at each decoding time t of the GRU, the GRU generates the query vector s_t of the current time t based on the query vector s_{t-1} and the generated word y_{t-1} of the previous time, i.e., s_t = GRU(s_{t-1}, e(y_{t-1})).
Using an attention mechanism, the degree of correlation between the query vector s_t and each storage unit in the memory network is computed, yielding the degree of correlation α_i^t between s_t and each word in the dialogue history data, where i is the word index and 1 ≤ i ≤ n, and the degree of correlation β_r^t between s_t and each knowledge triple in the candidate knowledge subset, where r is the knowledge-triple index and 1 ≤ r ≤ g.
Based on the degree of correlation α_i^t, a joint representation c_t of the dialogue history data is calculated by weighted summation, i.e., c_t = Σ_i α_i^t h_i; based on the degree of correlation β_r^t, a joint representation g_t of the candidate knowledge subset is calculated by weighted summation, i.e., g_t = Σ_r β_r^t k_r.
Taking c_t as the query vector, the memory network is accessed in a multi-hop manner to obtain the knowledge distribution p_ptr, i.e., p_ptr = multihop([s_t, g_t], M); the specific method of accessing the memory network in a multi-hop manner is known to those skilled in the art and is not described here again.
Taking g_t as the query vector, a preset dictionary is accessed with a multilayer perceptron to obtain the dictionary distribution p_vocab, i.e., p_vocab = MLP([s_t, g_t], V).
Based on the knowledge distribution p_ptr and the dictionary distribution p_vocab, the generated word of the current time t is obtained using a gating mechanism.
In this step, the joint representation c_t of the dialogue history data is used as the query vector to access the memory network in a multi-hop manner and obtain the knowledge distribution p_ptr; because the memory stores the splicing result of the vector representations of the knowledge triples and the word vectors, the accuracy of the knowledge distribution p_ptr can be improved.
Step z4, the GRU generates the current response sentence for the agent based on the generated words at all times.
In this step, the system response Y = (y_1, ..., y_t, ..., y_m) is formed from the generated words selected by the GRU at all times, where y_t represents the generated word of the GRU at time t.
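For illustration, the following PyTorch sketch implements a single decoding step in the spirit of steps z1-z3; a one-hop attention read replaces the multi-hop memory access, the memory M concatenates word vectors and triple vectors, and all dimensions are assumptions for this example.

import torch
import torch.nn as nn

class GeneratorStep(nn.Module):
    def __init__(self, hid, vocab_size):
        super().__init__()
        self.gru = nn.GRUCell(hid, hid)
        self.vocab_mlp = nn.Sequential(nn.Linear(2 * hid, hid), nn.Tanh(),
                                       nn.Linear(hid, vocab_size))
        self.gate = nn.Linear(2 * hid, 1)

    def forward(self, s_prev, y_prev_emb, M, n):
        # M: (n + g, hid) memory; the first n cells are word vectors h_i,
        # the remaining g cells are knowledge-triple vectors k_r
        s_t = self.gru(y_prev_emb, s_prev)                 # query vector s_t
        rel = torch.softmax(M @ s_t.squeeze(0), dim=0)     # relevance to every memory cell
        alpha, beta = rel[:n], rel[n:]                     # alpha_i^t and beta_r^t
        c_t = (alpha.unsqueeze(1) * M[:n]).sum(dim=0)      # joint dialogue representation c_t
        g_t = (beta.unsqueeze(1) * M[n:]).sum(dim=0)       # joint knowledge representation g_t
        p_ptr = torch.softmax(M @ c_t, dim=0)              # knowledge distribution (one hop only)
        p_vocab = torch.softmax(self.vocab_mlp(torch.cat([s_t.squeeze(0), g_t])), dim=0)
        gate = torch.sigmoid(self.gate(torch.cat([c_t, g_t])))  # gate between copying and generating
        return s_t, p_ptr, p_vocab, gate

step = GeneratorStep(hid=256, vocab_size=10000)
M = torch.randn(20, 256)                                   # e.g. 14 word vectors + 6 triple vectors
s, p_ptr, p_vocab, gate = step(torch.zeros(1, 256), torch.randn(1, 256), M, n=14)
# the word y_t is then chosen from p_vocab or copied from the memory according to the gate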
Based on the above embodiment, it can be seen that the technical scheme identifies the attribute values and the dialogue scene category in the dialogue history data, screens knowledge triples related to the dialogue out of the knowledge base based on these attribute values and the scene category to construct a candidate knowledge subset, and then generates the current system response sentence based on the candidate knowledge subset and the current dialogue history data. On the one hand, constructing the candidate knowledge subset effectively reduces the number of knowledge triples used for generating the response sentence, which reduces the computational overhead of response generation and improves sentence generation efficiency. On the other hand, screening the knowledge triples in the knowledge base based on the scene category makes the candidate knowledge subset match the current scene category, so that the generated response sentence matches the current man-machine dialogue scene; this further improves the intelligence and accuracy of the response sentence and effectively improves the user's man-machine dialogue experience. The invention is therefore applicable to a variety of task scenarios.
Corresponding to the above method embodiment, the embodiment of the present invention further discloses a device for generating an intelligent agent dialog statement in a human-computer dialog, as shown in fig. 2, the device includes:
the information extraction module is used for extracting attribute values and scene categories in a preset knowledge base from conversation historical data of the current man-machine conversation by using a pre-trained natural language understanding model; wherein the knowledge base is composed of knowledge triples;
the knowledge screening module is used for screening out related knowledge triples from the knowledge base based on the attribute values and the scene categories to obtain candidate knowledge subsets;
and the dialogue response module is used for generating and outputting a current response statement for the intelligent agent by utilizing a pre-trained dialogue generation model based on the dialogue historical data and the candidate knowledge subset.
The embodiment of the invention also discloses equipment for generating the intelligent agent dialogue sentences in the man-machine dialogue, which comprises a processor and a memory; the memory stores an application program executable by the processor, and the application program is used for enabling the processor to execute the method for generating the intelligent agent dialogue statement in the man-machine dialogue.
The memory may be embodied as various storage media such as an Electrically Erasable Programmable Read Only Memory (EEPROM), a Flash memory (Flash memory), and a Programmable Read Only Memory (PROM). The processor may be implemented to include one or more central processors or one or more field programmable gate arrays, wherein the field programmable gate arrays integrate one or more central processor cores. In particular, the central processor or central processor core may be implemented as a CPU or MCU.
It should be noted that not all steps and modules in the above flows and structures are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The division of each module is only for convenience of describing adopted functional division, and in actual implementation, one module may be divided into multiple modules, and the functions of multiple modules may also be implemented by the same module, and these modules may be located in the same device or in different devices.
The hardware modules in the various embodiments may be implemented mechanically or electronically. For example, a hardware module may comprise a specially designed non-volatile circuit or logic device (e.g., a special-purpose processor such as an FPGA or an ASIC) for performing certain operations. A hardware module may also comprise programmable logic devices or circuits (e.g., including a general-purpose processor or other programmable processor) that are temporarily configured by software to perform certain operations. The implementation of the hardware module in a mechanical manner, or in a dedicated permanent circuit, or in a temporarily configured circuit (e.g., configured by software), may be determined based on cost and time considerations.
The present invention also provides a machine-readable storage medium storing instructions for causing a machine to perform a method as described herein. Specifically, a system or an apparatus equipped with a storage medium on which a software program code that realizes the functions of any of the embodiments described above is stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program code stored in the storage medium. Further, part or all of the actual operations may be performed by an operating system or the like operating on the computer by instructions based on the program code. The functions of any of the above-described embodiments may also be implemented by writing the program code read out from the storage medium to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causing a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on the instructions of the program code.
Embodiments of the storage medium used to provide the program code include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-R, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs, DVD + RWs), magnetic tapes, non-volatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or the cloud by a communication network.
"exemplary" means "serving as an example, instance, or illustration" herein, and any illustration, embodiment, or steps described as "exemplary" herein should not be construed as a preferred or advantageous alternative. For the sake of simplicity, the drawings are only schematic representations of the relevant parts of the invention, and do not represent the actual structure of the product. In addition, in order to make the drawings concise and understandable, components having the same structure or function in some of the drawings are only schematically illustrated or only labeled. In this document, "a" does not mean that the number of the relevant portions of the present invention is limited to "only one", and "a" does not mean that the number of the relevant portions of the present invention "more than one" is excluded. In this document, "upper", "lower", "front", "rear", "left", "right", "inner", "outer", and the like are used only to indicate relative positional relationships between relevant portions, and do not limit absolute positions of the relevant portions.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A method for generating intelligent agent dialogue sentences in man-machine dialogue is characterized by comprising the following steps:
extracting attribute values and scene categories in a preset knowledge base from conversation historical data of the current man-machine conversation by using a pre-trained natural language understanding model; wherein the knowledge base is composed of knowledge triples;
based on the attribute values and the scene categories, relevant knowledge triples are screened out from the knowledge base to obtain candidate knowledge subsets;
generating and outputting a current response sentence for the agent by utilizing a pre-trained dialogue generation model based on the dialogue historical data and the candidate knowledge subset;
wherein, the extracting the attribute values and the scene categories in the preset knowledge base comprises:
splicing the dialogue history data with a preset special mark, and inputting the dialogue history data into an encoder of the natural language understanding model for encoding to obtain a corresponding dialogue history vector and a corresponding scene information vector;
inputting the dialogue history vector into a CRF layer of the natural language understanding model for sequence marking to obtain the attribute value contained in the dialogue history data;
and inputting the scene information vector into a multilayer perceptron of the natural language understanding model for scene classification to obtain the scene category of the man-machine conversation.
2. The method of claim 1, wherein the screening of relevant knowledge triples from the knowledge base based on the attribute values and the scene categories to obtain candidate knowledge subsets comprises:
if the scene category is chatting, traversing each attribute value, searching a knowledge triple containing the attribute value from the knowledge base, and constructing the candidate knowledge subset by using all the searched knowledge triples;
if the scene category is question answering, traversing each attribute value contained in the latest dialog in the dialog historical data, and searching a knowledge triple containing the attribute value from the knowledge base; constructing the candidate knowledge subsets by using all searched knowledge triples;
if the scene category is recommendation, combining all the primary key entity values in the attribute values pairwise, traversing each combination, determining the common attribute values of the two primary key entity values in the combination, and, for each common attribute value, searching the knowledge base for knowledge triples containing that common attribute value; constructing the candidate knowledge subset from all the found knowledge triples;
and if the scene category is task-based dialogue, traversing each key entity value in the attribute values, searching a knowledge triple which contains the key entity value and is related to the current man-machine dialogue task from the knowledge base, and constructing the candidate knowledge subset by using all the searched knowledge triples.
3. The method of claim 1, wherein generating a current response statement for an agent using a pre-trained dialogue generation model based on the dialogue history data and the candidate knowledge subset comprises:
inputting the dialogue historical data into a dialogue encoder of the dialogue generating model for encoding to obtain a comprehensive characterization vector C of the dialogue historical data and word vectors of all words contained in the dialogue historical data;
inputting the candidate knowledge subsets into a knowledge encoder of the dialogue generation model for encoding to obtain a comprehensive characterization vector kg of the candidate knowledge subsets and a vector representation of each knowledge triple in the candidate knowledge subsets;
and generating the response sentence by utilizing a natural language generator of the dialogue generation model based on the comprehensive characterization vector C of the dialogue historical data, the comprehensive characterization vector kg of the candidate knowledge subset, the word vector and the vector representation of the knowledge triplet.
4. The method of claim 3, wherein the inputting the dialogue history data into the dialogue coder of the dialogue generating model for coding comprises:
expanding the conversation history data by adding conversation role information and conversation turn information to which each word belongs in the conversation history data;
dividing the expanded dialogue historical data according to the dialogue turns;
coding each turn of dialogue data obtained by the division with a sentence-level bidirectional gated recurrent unit network (BiGRU) to obtain the word vectors of all words contained in each turn of dialogue;
calculating a first dialogue vector of each dialogue by adopting a self-attention mechanism based on the word vectors of all words contained in each dialogue;
encoding the first dialogue vectors of all turns with a second, turn-level BiGRU to obtain a second dialogue vector for each turn of dialogue;
and calculating a comprehensive characterization vector C of the dialogue historical data by adopting a self-attention mechanism based on the second dialogue vector.
5. The method of claim 3, wherein the inputting the subset of candidate knowledge into the knowledge coder of the dialog generation model for encoding comprises:
calculating an entity word vector of each knowledge triple in the candidate knowledge subset by using a TransE model;
obtaining a vector representation of each knowledge triple in the candidate knowledge subset by using a multilayer perceptron based on the entity word vector of each knowledge triple;
and obtaining a comprehensive characterization vector kg of the candidate knowledge subset by using a self-attention mechanism based on the vector representation of each knowledge triple.
6. The method of claim 3, wherein generating the response sentence using a natural language generator of the dialog generation model comprises:
splicing the vector representations of the knowledge triples with the word vectors, and writing the splicing result M into a memory network of the natural language generator; wherein M = [(h_1, ..., h_n); (k_1, ..., k_g)] = [M_1, ..., M_{n+g}], h_n represents the nth word vector, k_g represents the vector representation of the g-th knowledge triple, n represents the number of word vectors, and g represents the number of knowledge triples;
initializing the initial query vector s_0 of the GRU of the natural language generator used for decoding with the splicing result of the comprehensive characterization vector C and the comprehensive characterization vector kg;
at each decoding time t of the GRU, the GRU generates the query vector s_t of the current time t based on the query vector s_{t-1} and the generated word y_{t-1} of the previous time; computing, with an attention mechanism, the degree of correlation between the query vector s_t and each storage unit in the memory network to obtain the degree of correlation α_i^t between the query vector s_t and each word in the dialogue history data and the degree of correlation β_r^t between the query vector s_t and each knowledge triple in the candidate knowledge subset; based on the degree of correlation α_i^t, calculating a joint representation c_t of the dialogue history data by weighted summation; based on the degree of correlation β_r^t, calculating a joint representation g_t of the candidate knowledge subset by weighted summation; taking c_t as the query vector, accessing the memory network in a multi-hop manner to obtain a knowledge distribution p_ptr; taking g_t as the query vector, accessing a preset dictionary with a multilayer perceptron to obtain a dictionary distribution p_vocab; and obtaining the generated word y_t of the current time t from the knowledge distribution p_ptr and the dictionary distribution p_vocab using a gating mechanism;
And the GRU generates a current response statement for the intelligent agent based on the generated words at all the moments.
7. An apparatus for generating dialog sentences of an agent in a human-computer dialog, comprising:
the information extraction module is used for extracting attribute values and scene categories in a preset knowledge base from conversation historical data of the current man-machine conversation by utilizing a pre-trained natural language understanding model; wherein the knowledge base is composed of knowledge triples; wherein, the extracting the attribute values and the scene categories in the preset knowledge base comprises: splicing the dialogue historical data with a preset special mark, and inputting the dialogue historical data into a coder of the natural language understanding model for coding to obtain a corresponding dialogue historical vector and a scene information vector; inputting the dialogue history vector into a CRF layer of the natural language understanding model for sequence labeling to obtain the attribute values contained in the dialogue history data; inputting the scene information vector into a multilayer perceptron of the natural language understanding model for scene classification to obtain a scene category of the man-machine conversation;
the knowledge screening module is used for screening out related knowledge triples from the knowledge base based on the attribute values and the scene categories to obtain candidate knowledge subsets;
and the dialogue response module is used for generating and outputting a current response statement for the intelligent agent by utilizing a pre-trained dialogue generation model based on the dialogue historical data and the candidate knowledge subset.
8. A device for generating intelligent agent dialogue sentences in man-machine dialogue, characterized by comprising a processor and a memory;
the memory stores an application program executable by the processor for causing the processor to execute the method for generating dialog statements in a human-computer dialog according to any one of claims 1 to 6.
9. A computer-readable storage medium having stored therein computer-readable instructions for executing the method for generating an agent dialog statement in a human-computer dialog according to any one of claims 1 to 6.
CN202110133448.0A 2021-02-01 2021-02-01 Method and device for generating intelligent agent dialogue sentences in man-machine dialogue Active CN112860862B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110133448.0A CN112860862B (en) 2021-02-01 2021-02-01 Method and device for generating intelligent agent dialogue sentences in man-machine dialogue

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110133448.0A CN112860862B (en) 2021-02-01 2021-02-01 Method and device for generating intelligent agent dialogue sentences in man-machine dialogue

Publications (2)

Publication Number Publication Date
CN112860862A CN112860862A (en) 2021-05-28
CN112860862B true CN112860862B (en) 2022-11-11

Family

ID=75987310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110133448.0A Active CN112860862B (en) 2021-02-01 2021-02-01 Method and device for generating intelligent agent dialogue sentences in man-machine dialogue

Country Status (1)

Country Link
CN (1) CN112860862B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113268577B (en) * 2021-06-04 2022-08-23 厦门快商通科技股份有限公司 Training data processing method and device based on dialogue relation and readable medium
CN113268609B (en) * 2021-06-22 2023-12-01 中国平安人寿保险股份有限公司 Knowledge graph-based dialogue content recommendation method, device, equipment and medium
CN113656566A (en) * 2021-08-18 2021-11-16 中国平安人寿保险股份有限公司 Intelligent dialogue processing method and device, computer equipment and storage medium
CN116775815B (en) * 2022-03-07 2024-04-26 腾讯科技(深圳)有限公司 Dialogue data processing method and device, electronic equipment and storage medium
CN116009827B (en) * 2023-03-28 2023-06-30 杭州实在智能科技有限公司 Intelligent generation and recommendation method and system for RPA (remote procedure association) flow and guiding course
CN116775848B (en) * 2023-08-23 2023-11-07 宁波吉利汽车研究开发有限公司 Control method, device, computing equipment and storage medium for generating dialogue information

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033223A (en) * 2018-06-29 2018-12-18 北京百度网讯科技有限公司 For method, apparatus, equipment and computer readable storage medium across type session
CN110245224A (en) * 2019-06-20 2019-09-17 网易(杭州)网络有限公司 Talk with generation method and device
CN110955675A (en) * 2019-10-30 2020-04-03 中国银联股份有限公司 Robot dialogue method, device, equipment and computer readable storage medium
CN111414465A (en) * 2020-03-16 2020-07-14 北京明略软件系统有限公司 Processing method and device in question-answering system based on knowledge graph

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10127274B2 (en) * 2016-02-08 2018-11-13 Taiger Spain Sl System and method for querying questions and answers

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033223A (en) * 2018-06-29 2018-12-18 北京百度网讯科技有限公司 For method, apparatus, equipment and computer readable storage medium across type session
CN110245224A (en) * 2019-06-20 2019-09-17 网易(杭州)网络有限公司 Talk with generation method and device
CN110955675A (en) * 2019-10-30 2020-04-03 中国银联股份有限公司 Robot dialogue method, device, equipment and computer readable storage medium
CN111414465A (en) * 2020-03-16 2020-07-14 北京明略软件系统有限公司 Processing method and device in question-answering system based on knowledge graph

Also Published As

Publication number Publication date
CN112860862A (en) 2021-05-28

Similar Documents

Publication Publication Date Title
CN112860862B (en) Method and device for generating intelligent agent dialogue sentences in man-machine dialogue
CN109582767B (en) Dialogue system processing method, device, equipment and readable storage medium
US11568000B2 (en) System and method for automatic task-oriented dialog system
CN108334487B (en) Missing semantic information completion method and device, computer equipment and storage medium
Lewis et al. Generative question answering: Learning to answer the whole question
CN112771531A (en) Global to local memory pointer network for task oriented dialog
CN109284397A (en) A kind of construction method of domain lexicon, device, equipment and storage medium
CN111309889A (en) Method and device for text processing
Zhang et al. Memory-augmented dialogue management for task-oriented dialogue systems
CN111783455B (en) Training method and device of text generation model, and text generation method and device
CN111738016A (en) Multi-intention recognition method and related equipment
WO2023201975A1 (en) Difference description sentence generation method and apparatus, and device and medium
CN110678882A (en) Selecting answer spans from electronic documents using machine learning
EP4113357A1 (en) Method and apparatus for recognizing entity, electronic device and storage medium
CN108959388B (en) Information generation method and device
CN112949758A (en) Response model training method, response method, device, equipment and storage medium
CN111651573A (en) Intelligent customer service dialogue reply generation method and device and electronic equipment
Hou et al. Inverse is better! fast and accurate prompt for few-shot slot tagging
Wang et al. A template-guided hybrid pointer network for knowledge-basedtask-oriented dialogue systems
CN113157941A (en) Service characteristic data processing method, service characteristic data processing device, text generating method, text generating device and electronic equipment
CN112380861A (en) Model training method and device and intention identification method and device
CN116186244A (en) Method for generating text abstract, method and device for training abstract generation model
Zhao et al. Learning to express in knowledge-grounded conversation
CN116304014A (en) Method for training entity type recognition model, entity type recognition method and device
CN112328774B (en) Method for realizing task type man-machine conversation task based on multiple documents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant