CN115525740A - Method and device for generating dialogue response sentence, electronic equipment and storage medium


Info

Publication number
CN115525740A
Authority
CN
China
Prior art keywords: sentence, target, replied, vector, statement
Legal status: Pending
Application number
CN202110702211.XA
Other languages
Chinese (zh)
Inventor
曹源 (Cao Yuan)
Current Assignee
TCL Technology Group Co Ltd
Original Assignee
TCL Technology Group Co Ltd
Application filed by TCL Technology Group Co Ltd filed Critical TCL Technology Group Co Ltd
Priority to CN202110702211.XA
Publication of CN115525740A

Classifications

    • G: PHYSICS; G06: COMPUTING, CALCULATING OR COUNTING; G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/3329: Natural language query formulation or dialogue systems (under G06F16/00 Information retrieval; G06F16/30 Unstructured textual data; G06F16/33 Querying; G06F16/332 Query formulation)
    • G06F16/3344: Query execution using natural language analysis (under G06F16/33 Querying; G06F16/3331 Query processing; G06F16/334 Query execution)
    • G06F40/30: Semantic analysis (under G06F40/00 Handling natural language data)

Abstract

The application provides a method and an apparatus for generating a dialogue response sentence, an electronic device, and a computer-readable storage medium. The method for generating a dialogue response sentence comprises the following steps: acquiring a sentence to be replied; acquiring a target representation vector of the sentence to be replied in a preset vector format based on the scene information in the sentence to be replied; performing feature extraction based on the target representation vector to obtain the sentence features of the sentence to be replied; and determining a target answer sentence of the sentence to be replied based on the sentence features. Because generation is based on the scene information in the sentence to be replied, the reply of the man-machine dialogue can potentially integrate aspects of human interaction such as emotion, topic, and knowledge, improving the accuracy of man-machine dialogue replies to a certain extent.

Description

Method and device for generating dialogue response sentence, electronic equipment and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a method and a device for generating a dialogue response statement, electronic equipment and a computer-readable storage medium.
Background
With the rapid development of artificial intelligence, artificial intelligence technology is applied in more and more fields, wherein man-machine conversation communication is an important expression form of artificial intelligence.
In recent years, various deep learning models have attempted to reproduce the capability of man-machine dialogue interaction. A conventional man-machine dialogue model is trained end to end on question-and-answer dialogue corpora.
However, human dialogue is not a simple linear exchange in time sequence. Existing man-machine dialogue models cannot integrate aspects of human interaction such as emotion, topic, and knowledge when generating a reply, so the replies they generate are not accurate enough.
Disclosure of Invention
The application provides a method and a device for generating a dialogue response sentence, an electronic device, and a computer-readable storage medium, and aims to solve the problem that the replies generated by existing man-machine dialogue models are not accurate enough.
In a first aspect, the present application provides a method for generating a dialog response statement, where the method includes:
acquiring a sentence to be replied;
acquiring a target expression vector of the statement to be replied according to a preset vector format based on the scene information in the statement to be replied;
performing feature extraction based on the target expression vector to obtain the sentence features of the sentence to be replied;
and determining a target answer sentence of the sentence to be replied based on the sentence characteristics.
In a second aspect, the present application provides a dialog response sentence generation device, including:
the acquisition unit is used for acquiring the sentence to be replied;
the representation unit is used for acquiring a target representation vector of the sentence to be replied in a preset vector format based on the scene information in the sentence to be replied;
the feature extraction unit is used for extracting features based on the target expression vector to obtain the sentence features of the sentence to be replied;
and the generating unit is used for determining a target answer sentence of the sentence to be replied based on the sentence characteristics.
In a third aspect, the present application further provides an electronic device, where the electronic device includes a processor and a memory, the memory stores a computer program, and the processor, when calling the computer program in the memory, executes the steps in any one of the methods for generating a dialog response statement provided in the present application.
In a fourth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, the computer program being loaded by a processor to execute the steps in the method for generating a dialog response sentence.
In the present application, the scene information in the sentence to be replied is fused into the vector representation of the sentence to be replied to obtain the target representation vector of the sentence to be replied, which enhances the quality of the vector representation and gives the sentence to be replied a richer content representation. The target answer sentence of the sentence to be replied is then determined from the target representation vector. Because of the scene information in the sentence to be replied, aspects of human interaction such as emotion, topic, and knowledge can be potentially integrated to generate the reply of the man-machine dialogue, improving the accuracy of man-machine dialogue replies to a certain extent.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of a scenario of a dialog response sentence generation system provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a method for generating a dialog response statement according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the principle framework provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of the vector splicing algorithm of the vector splicing module provided in an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a training process for a dialog generation model provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of an interaction relationship of a sample dialog provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of a dialog feature graph provided in an embodiment of the present application;
FIG. 8 is a schematic diagram of a subgraph provided in an embodiment of the present application;
FIG. 9 is a schematic illustration of a graph convolution operation provided in an embodiment of the present application;
fig. 10 is a schematic structural diagram of an embodiment of a device for generating a dialog response statement provided in an embodiment of the present application;
fig. 11 is a schematic structural diagram of an embodiment of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description of the embodiments of the present application, it should be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, features defined as "first" and "second" may explicitly or implicitly include one or more of the described features. In the description of the embodiments of the present application, "a plurality" means two or more unless specifically defined otherwise.
The following description is presented to enable any person skilled in the art to make and use the application. In the following description, details are set forth for the purpose of explanation. It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known processes have not been described in detail so as not to obscure the description of the embodiments of the present application with unnecessary detail. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed in the embodiments herein.
The execution subject of the method for generating a dialog response statement in the embodiments of the present application may be the apparatus for generating a dialog response statement provided in the embodiments of the present application, or a different type of electronic device integrating that apparatus, such as a server device, a physical host, or user equipment (UE). The apparatus for generating a dialog response statement may be implemented in hardware or software, and the UE may specifically be a terminal device such as a smart phone, tablet computer, notebook computer, palmtop computer, desktop computer, or personal digital assistant (PDA).
The electronic device may operate independently or as part of a device cluster. By applying the method for generating a dialogue response sentence provided in the embodiments of the present application, because of the scene information in the sentence to be replied, aspects of human interaction such as emotion, topic, and knowledge can be potentially integrated to generate the reply of the man-machine dialogue, improving the accuracy of man-machine dialogue replies to a certain extent.
Referring to fig. 1, fig. 1 is a schematic view of a scenario of a dialog response sentence generation system provided in an embodiment of the present application. The system for generating the dialog response sentence may include the electronic device 100, and the generating device of the dialog response sentence is integrated in the electronic device 100. For example, the electronic device may obtain a sentence to be replied; acquiring a target expression vector of the statement to be replied according to a preset vector format based on the scene information in the statement to be replied; performing feature extraction based on the target expression vector to obtain the sentence features of the sentence to be replied; and determining a target answer sentence of the sentence to be replied based on the sentence characteristics.
In addition, as shown in fig. 1, the system for generating dialog response sentences may further include a memory 200 for storing data, such as storing sentence data.
It should be noted that the scenario diagram of the dialog response statement generation system shown in fig. 1 is only an example; the dialog response statement generation system and the scenario described in the embodiments of the present application are intended to illustrate the technical solutions more clearly and do not limit the technical solutions provided herein. As those skilled in the art will appreciate, with the evolution of dialog response statement generation systems and the emergence of new service scenarios, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
In the following, a method for generating a dialog response statement provided in an embodiment of the present application is described, where an electronic device is used as an execution subject, and for simplicity and convenience of description, the execution subject is omitted in subsequent embodiments of the method, and the method for generating a dialog response statement includes: acquiring a statement to be replied; acquiring a target expression vector of the statement to be replied according to a preset vector format based on the scene information in the statement to be replied; performing feature extraction based on the target expression vector to obtain sentence features of the sentence to be replied; and determining a target answer sentence of the sentence to be replied based on the sentence characteristics.
Referring to fig. 2, fig. 2 is a schematic flowchart of a method for generating a dialog response statement according to an embodiment of the present application. It should be noted that, although a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in an order different than that shown or described herein. The generation method of the dialogue response statement comprises a step 201 to a step 204, wherein:
201. and acquiring the statement to be replied.
The sentence to be replied is a sentence that needs to be replied to in a man-machine conversation. For example, in a phone-credit recharge service application, a user sends "I recharged 100 yuan; what do I do if the recharge failed?" to an intelligent online customer service. In this man-machine conversation between the user and the intelligent online customer service, the sentence to be replied is "I recharged 100 yuan; what do I do if the recharge failed?".
In step 201, there are various ways to obtain the statement to be replied, which exemplarily include:
(1) The electronic equipment is used as interactive equipment of the man-machine conversation, sentences in the man-machine conversation are obtained in real time, and the sentences input by the user are recognized to be used as the sentences to be replied.
(2) The electronic equipment can be connected with interactive equipment of man-machine conversation, and the sentence to be replied input by the user is obtained from the interactive equipment.
202. And acquiring a target expression vector of the statement to be replied according to a preset vector format based on the scene information in the statement to be replied.
The scene information is information indicating a conversation topic, background knowledge corresponding to the conversation topic, emotion of a speaker, and the like, for example, emotion information, topic information, extension information, and the like of a sentence to be replied.
The target expression vector is the expression vector obtained after the vectorization of the statement to be replied.
The preset vector format may be various, and the corresponding target representation vector may also be various in expression form, which exemplarily includes:
1) The preset vector format is that the original sentence vector of the sentence to be replied and the emotion enhancement vector of the sentence to be replied are spliced in sequence, and the target expression vector is obtained by splicing the original sentence vector of the sentence to be replied and the emotion enhancement vector. For example, the target representation vector expression format of the to-be-replied statement is shown in table 1 below.
TABLE 1
| Original sentence vector Pm-Un | Emotion enhancement vector Ue |
The emotion enhancement vector is obtained after emotion information contained in the sentence to be replied is converted into a vector form capable of being processed by a computer program.
For example, seven main emotion categories may be preset: anger, disgust, fear, happiness, sadness, surprise, and neutrality. The emotion enhancement vector Ue may be set to a 7-bit array structure for storing the emotion information of the sentence to be replied. Each bit in the 7-bit array structure [0, 0, 0, 0, 0, 0, 0] represents, in order, anger, disgust, fear, happiness, sadness, surprise, and neutrality; a "1" at an emotion's position in the Ue array structure indicates that the emotion is present in the sentence to be replied, and a "0" indicates that it is not. For example, the emotion enhancement vector Ue = [0, 1, 0, 0, 0, 0, 0] indicates that the sentence to be replied carries the emotion information "disgust".
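As a minimal sketch of this one-hot encoding (the category order and helper name are illustrative assumptions, not fixed by the application):

```python
# Sketch of the 7-bit emotion enhancement vector Ue (category order assumed
# from the description above; the helper name is illustrative).
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutrality"]

def emotion_enhancement_vector(emotion: str) -> list:
    """One-hot 7-bit array with a 1 at the detected emotion's position."""
    ue = [0] * len(EMOTIONS)
    ue[EMOTIONS.index(emotion)] = 1
    return ue

print(emotion_enhancement_vector("disgust"))  # [0, 1, 0, 0, 0, 0, 0]
```

The topic enhancement vector Ut described later follows the same one-hot pattern over topic categories.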
In this case, step 202 may specifically include the following steps 2021A to 2023A:
2021A, obtaining the original sentence vector of the sentence to be replied.
The original sentence vector is obtained by converting the sentence to be replied into a vector form that can be processed by a computer program, for example by converting the sentence to be replied with a commonly used text vectorization tool such as word2vec, GloVe, or BERT.
2022A, identifying target emotion information of the sentence to be replied.
The target emotion information refers to the emotion of the speaker of the sentence to be replied.
Specifically, the target emotion information of the sentence to be replied can be identified from the original sentence vector by a trained emotion recognition model. For example, seven main emotion categories of anger, disgust, fear, happiness, sadness, surprise, and neutrality are preset, with corresponding category labels 0, 1, 2, 3, 4, 5, and 6. If the emotion recognition model identifies the category label of the sentence to be replied as 0 from the original sentence vector, the emotion of the speaker of the sentence to be replied is "anger".
2023A, generating a target representation vector of the sentence to be replied according to the original sentence vector, the target emotion information and the preset vector format.
Specifically, target emotion information is represented by an emotion enhancement vector Ue; and then splicing the original sentence vector Pm-Un and the emotion enhancement vector Ue according to a preset vector format to obtain a target expression vector of the sentence to be replied.
As shown in table 1, in the embodiment of the present application, the statements to be replied are uniformly processed and expressed in a preset vector format, and the emotion tags are spliced on the vector expression of the statements to be replied, so that the vector expression quality of the statements to be replied is enhanced, and richer content expression is achieved, so that subsequently predicted response statements can fuse emotion information, and the accuracy and the authenticity of the human-computer interaction reply statements are improved.
2) The preset vector format is that an original sentence vector of a sentence to be replied and a topic enhancement vector of the sentence to be replied are sequentially spliced, and the target expression vector is a vector obtained by splicing the original sentence vector and the topic enhancement vector of the sentence to be replied. For example, the target representation vector expression format of the to-be-replied statement is shown in table 2 below.
TABLE 2
| Original sentence vector Pm-Un | Topic enhancement vector Ut |
The topic enhancement vector is a vector obtained after topic information contained in the sentence to be replied is converted into a vector form capable of being processed by a computer program.
For example, four main topic categories may be preset: movie, character, eating, and unknown. The topic enhancement vector Ut may be set to a 4-bit array structure for storing the topic information of the sentence to be replied. Each bit in the 4-bit array structure [0, 0, 0, 0] represents, in order, movie, character, eating, and unknown; a "1" at a topic's position in the Ut array structure indicates that the topic is present in the sentence to be replied, and a "0" indicates that it is not. For example, the topic enhancement vector Ut = [0, 1, 0, 0] indicates that the topic of the sentence to be replied is "character".
In this case, step 202 may specifically include the following steps 2021B to 2023B:
2021B, obtaining the original sentence vector of the sentence to be replied.
Step 2021B is similar to step 2021A described above and will not be described herein again.
2022B, identifying target topic information of the sentence to be replied.
The target topic information refers to topics contained in the to-be-replied sentence.
Specifically, the target topic information of the sentence to be replied can be identified according to the original sentence vector through the trained topic identification model.
2023B, generating a target representation vector of the to-be-replied sentence according to the original sentence vector, the target topic information and the preset vector format.
Specifically, the target topic information is represented by a topic enhancement vector Ut; and then splicing the original sentence vector Pm-Un and the topic enhancement vector Ut according to a preset vector format to obtain a target expression vector of the sentence to be replied.
As shown in table 2, in the embodiment of the present application, the statements to be replied are uniformly processed and expressed in a preset vector format, and due to the fact that the tags of the topics are spliced on the vector expression of the statements to be replied, the vector expression quality of the statements to be replied can be enhanced, and richer content expression is provided, so that topic information can be fused in the subsequently predicted reply statements, and the accuracy and the authenticity of the human-computer interaction reply statements are improved.
3) The preset vector format is that the original sentence vector of the sentence to be replied, the emotion enhancement vector of the sentence to be replied, the topic enhancement vector of the sentence to be replied, and the knowledge enhancement vector of the sentence to be replied are spliced in sequence, and the target representation vector is obtained by splicing the original sentence vector, the emotion enhancement vector, the topic enhancement vector, and the knowledge enhancement vector of the sentence to be replied. For example, the target representation vector format of the sentence to be replied is shown in table 3 below.
TABLE 3
| Original sentence vector Pm-Un | Emotion enhancement vector Ue | Topic enhancement vector Ut | Knowledge enhancement vector Uk |
The knowledge enhancement vector is a vector obtained after an expanded statement of a statement to be replied is converted into a vector form which can be processed by a computer program.
In this case, step 202 may specifically include the following steps 2021C to 2024C:
2021C, obtaining the original sentence vector of the sentence to be replied.
Step 2021C is similar to step 2021A described above and will not be described herein again.
2022C, identifying target emotion information and target topic information of the sentence to be replied.
The target emotion information and the target topic information of the to-be-replied sentence identified in step 2022C may be described with reference to step 2022A and step 2022B, respectively, and are not described herein again.
2023C, acquiring target extension information of the statement to be replied according to the target topic information of the statement to be replied;
the target expansion information is obtained by performing knowledge expansion according to topics contained in the sentence to be replied to obtain background knowledge of the sentence to be replied.
For example, "Exclusive expert 2" is talking about the topic "movie", the director is "Zhang three", and the director has also taken "Du 2". Through the knowledge expansion similar to the above, the background knowledge of the sentence to be replied can be expanded, the target expansion information of the sentence to be replied is obtained, and the artificial intelligence algorithm model is facilitated to make a response closer to the theme and accurate.
There are various ways to expand the knowledge of the sentence to be replied. For example, it can be introduced manually from existing knowledge according to the topic. The execution subject in the embodiments of the present application may also query an external knowledge base such as Baidu Baike according to the topic of the sentence to be replied. The execution subject may also perform a depth-first or breadth-first search on a Knowledge Graph according to the topic of the sentence to be replied, for example returning the expanded related background knowledge in the manner adopted by ConceptNet.
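The following is a hedged sketch of such topic-based knowledge expansion. A real system would query an external knowledge base or a knowledge graph such as ConceptNet; here a plain dictionary stands in for that lookup, and all entries are illustrative:

```python
# Hypothetical knowledge-expansion lookup: a plain dict stands in for an
# external knowledge base or knowledge-graph query (entries illustrative).
KNOWLEDGE_BASE = {
    "Bomb Disposal Expert 2": ["directed by Zhang San",
                               "Zhang San also directed Dudu 2"],
}

def expand_knowledge(topic_entities):
    """Collect background facts for each topic entity found in a sentence."""
    facts = []
    for entity in topic_entities:
        facts.extend(KNOWLEDGE_BASE.get(entity, []))
    return facts

print(expand_knowledge(["Bomb Disposal Expert 2"]))
```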
2024C, generating the target expression vector according to the primitive sentence vector, the target emotion information, the target topic information, the target extension information and the preset vector format.
Specifically, first, target extension information is merged into the to-be-replied sentence to obtain a target extension sentence of the to-be-replied sentence. Then, expressing the target emotion information by an emotion enhancement vector Ue, expressing the target topic information by a topic enhancement vector Ut, and expressing the target expanded statement by a knowledge enhancement vector Uk; and finally, splicing the original sentence vector Pm-Un, the emotion enhancement vector Ue, the topic enhancement vector Ut and the knowledge enhancement vector Uk according to a preset vector format to obtain a target expression vector of the sentence to be replied.
As shown in table 3, in the embodiment of the present application, the statements to be replied are uniformly processed and expressed in a preset vector format, and because the statements to be replied are spliced with emotion and topic tags on vector expression and contain extended information of background knowledge, the vector expression quality of the statements to be replied can be enhanced, and the statements to be replied have richer content expression, so that subsequently predicted reply statements can fuse information such as emotion, topic, extended knowledge, and the like, and the accuracy and the authenticity of the human-computer interaction reply statements are improved.
To better understand how to splice to obtain the target expression vector, a specific example is shown, please refer to fig. 3 and fig. 4, and the pseudo-code principle of the vector splicing algorithm is as follows:
1. Input parameters: the original sentence vector, the target emotion information, and the target topic information.
2. Execution process: 1. check whether the original sentence vector carries target emotion information; if so, append the dimensions corresponding to the emotion enhancement vector Ue after the original sentence vector Pm-Un (a one-dimensional vector is appended in this embodiment);
2. check whether the original sentence vector Pm-Un carries target topic information; if so, append the dimensions corresponding to the topic enhancement vector Ut after the original sentence vector (only the most probable topic is kept, so a single dimension is appended on the basis of the original sentence vector);
3. according to the topic corresponding to the topic enhancement vector, query an external database or perform association matching through a knowledge graph, blend the matched knowledge into the sentence to be replied to generate a new extension sentence, vectorize the extension sentence, and record it as the knowledge enhancement vector Uk, whose dimensionality is consistent with that of the original sentence. If multiple pieces of knowledge are matched, multiple extension sentences are generated in parallel. If no knowledge matches, the Uk vector is entirely filled with 0 values;
4. check whether the generated new vector data structures are aligned, ensuring that the vector dimensions of the sentences remain consistent; if the dimensions corresponding to Ue, Ut, and Uk have no actual data, they are uniformly set to 0 values. For example, if no topic is identified in the original sentence vector, the dimension corresponding to the topic is 0; taking a 200-dimensional original sentence vector as an example, if no expandable knowledge is matched during knowledge expansion, 200 zero values are filled into the dimensions corresponding to the knowledge enhancement vector.
3. Output: the enhanced target representation vector obtained after splicing.
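A minimal sketch of the splicing pseudo-code above, assuming a 200-dimensional original sentence vector, a 7-dimensional Ue, and a 4-dimensional Ut (the exact dimensionalities are assumptions taken from the examples in this description):

```python
import numpy as np

# Sketch of the splicing pseudo-code: target vector = [Pm-Un | Ue | Ut | Uk],
# zero-filled wherever a component is absent so every spliced vector keeps
# the same dimensionality (dimensions are assumptions from the examples).
SENT_DIM, EMO_DIM, TOPIC_DIM = 200, 7, 4

def splice(sentence_vec, emotion_vec=None, topic_vec=None, knowledge_vec=None):
    ue = emotion_vec if emotion_vec is not None else np.zeros(EMO_DIM)
    ut = topic_vec if topic_vec is not None else np.zeros(TOPIC_DIM)
    # Uk keeps the original sentence dimensionality; all zeros when no
    # expandable knowledge was matched (step 3 of the pseudo-code).
    uk = knowledge_vec if knowledge_vec is not None else np.zeros(SENT_DIM)
    return np.concatenate([sentence_vec, ue, ut, uk])

x_c = splice(np.random.rand(SENT_DIM))
print(x_c.shape)  # (411,): 200 + 7 + 4 + 200
```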
As can be seen from the contents of the above steps 2021A to 2023A, steps 2021B to 2023B, and steps 2021C to 2024C, the target expression vector of the sentence to be replied can be obtained in a preset vector format based on at least one of the target emotion information, the target topic information, and the target expansion information of the sentence to be replied.
203. And performing feature extraction based on the target expression vector to obtain the sentence features of the sentence to be replied.
In order to improve the ability to extract information such as the emotion, topic, and expansion of the sentence to be replied, and thereby improve the accuracy of the target answer sentence, in some embodiments feature extraction may be performed on the target representation vector by the graph convolution module in the trained dialog generation model of the embodiments of the present application, so as to obtain the sentence features of the sentence to be replied.
In some embodiments, in order to improve the information extraction capability of the emotion, topic, extension, and the like of the sentence to be replied, a target graph convolution parameter learned by the dialogue generation model in the embodiment of the present application may also be extracted in advance, and then convolution calculation is performed based on the target graph convolution parameter and the target representation vector. In this case, step 203 may specifically include the following steps 2031 to 2033:
2031. and acquiring preset target graph convolution parameters.
Wherein the target graph convolution parameter is a weight matrix of a graph convolution module in the trained dialog generation model. Specifically, the weight matrix of the graph convolution module can be obtained from the trained dialog generation model as the target graph convolution parameter. The determination of the convolution parameters of the target graph is described in detail later, and is not described herein again for the sake of simplicity.
2032. And acquiring a preset target Laplace matrix.
The preset target Laplace matrix is determined according to a dialogue characteristic diagram constructed by sample dialogue.
In the embodiment of the present application, a preset target laplacian matrix may be determined through the dialog feature maps constructed in the following steps 501 to 506, and the preset target laplacian matrix is directly obtained in step 2032. Since the determination of the target laplacian matrix is described in detail later, the description is omitted here for the sake of simplicity.
2033. And performing convolution calculation according to the target expression vector, the target graph convolution parameters and the target Laplace matrix to obtain the sentence characteristics of the sentence to be replied.
Specifically, a matrix multiplication operation is performed among the target representation vector X-c, the target graph convolution parameter W, and the target Laplace matrix Lαβ to obtain the implicit feature vector of the sentence to be replied; the implicit feature vector of the sentence to be replied is then taken as the sentence feature of the sentence to be replied.
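A minimal sketch of this convolution step, with all shapes and values as stand-in assumptions (the real X-c, W, and Laplace matrix come from the trained model and the dialog feature graph):

```python
import numpy as np

# Sketch of the convolution in step 2033: implicit features = L @ X_c @ W
# (n nodes, d-dimensional spliced vectors, h hidden units; all assumed).
n, d, h = 5, 411, 128
L = np.random.rand(n, n)    # target Laplace matrix (stand-in)
X_c = np.random.rand(n, d)  # target representation vectors, one per node
W = np.random.rand(d, h)    # target graph-convolution weight matrix

H = L @ X_c @ W             # implicit feature vectors, shape (n, h)
print(H.shape)
```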
Further, in order to improve the quality of feature extraction for the sentence to be replied, nonlinear activation and regularization may be performed after the implicit feature vector of the sentence to be replied is obtained. In this case, step 2033 may specifically include: performing convolution calculation according to the target representation vector, the target graph convolution parameter, and the target Laplace matrix to obtain the implicit feature vector of the sentence to be replied; performing nonlinear activation and regularization on the implicit feature vector to obtain an intermediate feature vector of the sentence to be replied; and performing a weighted average of the intermediate feature vector and the target representation vector to obtain the sentence features of the sentence to be replied.
The following describes the process of performing nonlinear activation and regularization processing on the implicit feature vector by using an algorithm pseudo-code principle:
1. Input parameters: the implicit feature vector of the sentence to be replied.
2. Execution process: 1. apply the nonlinear activation σ to the implicit feature vector of the sentence to be replied;
2. apply regularization: log(σ(AttentionAggregate())), where σ represents a nonlinear activation function such as ReLU or Sigmoid, and AttentionAggregate() represents the aggregated implicit feature vector of the sentence to be replied;
3. update with the target representation vector X-c: Avg(X-c + log(σ(AttentionAggregate()))) to obtain the statement feature X-hat.
3. Output: the statement feature X-hat of the sentence to be replied after the graph convolution update operation.
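A sketch of this update, continuing the assumptions above; W is assumed square so that X-c and the aggregated features share a dimension, and log1p replaces the bare log purely to keep the sketch numerically safe:

```python
import numpy as np

# Sketch of the activation/regularization update described above.
def relu(x):
    return np.maximum(x, 0.0)

n, d = 5, 411
L = np.random.rand(n, n)
X_c = np.random.rand(n, d)
W = np.random.rand(d, d)       # square, so X_c and features share a dimension

H = L @ X_c @ W                # implicit feature vectors
reg = np.log1p(relu(H))        # nonlinear activation + log regularization
X_hat = (X_c + reg) / 2.0      # Avg(X-c + log(sigma(AttentionAggregate())))
print(X_hat.shape)             # statement features X-hat, shape (n, d)
```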
204. And determining a target answer sentence of the sentence to be replied based on the sentence characteristics.
The target answer sentence is an answer sentence of the sentence to be replied, which is obtained by predicting the sentence characteristics of the sentence to be replied.
In some embodiments, the sentence generation module in the embodiment of the present application may output the target answer sentence of the sentence to be replied according to the convolution feature of the sentence to be replied.
In some embodiments, the weight parameter of the statement generation module may be extracted from the dialog generation model trained in the embodiments of the present application; and outputting the target answer sentence of the sentence to be replied according to the convolution characteristic of the sentence to be replied based on the extracted weight parameter of the sentence generation module.
As can be seen from the above, in the embodiments of the present application, the target representation vector of the sentence to be replied is obtained by fusing the scene information in the sentence to be replied into its vector representation, which enhances the quality of the vector representation and provides a richer content representation; the target answer sentence of the sentence to be replied is then determined from the target representation vector. Based on the scene information in the sentence to be replied, aspects of human interaction such as emotion, topic, and knowledge can be potentially integrated to generate the reply of the man-machine dialogue, improving the accuracy of man-machine dialogue replies to a certain extent.
The following describes a training process of the dialog generation model in the embodiment of the present application. The dialog generation model to be trained may include a feature extraction module and a sentence generation module. The feature extraction module and the statement generation module can be set up according to actual requirements.
And the feature extraction module is used for performing feature extraction operations such as convolution, sampling and the like on the target sentence vector to obtain the target convolution feature of the original sentence.
In some embodiments, the feature extraction module may be a recurrent structure such as a recurrent neural network (RNN), a long short-term memory network (LSTM), or a gated recurrent unit (GRU).
In some embodiments, the graph convolution module provided in the embodiments of the present application may be used as the feature extraction module; the graph convolution module in the embodiments of the present application is a graph convolutional network (GCN) structure. The embodiments of the present application take the graph convolution module as the feature extraction module by way of example.
And the sentence generating module is used for performing feature coding and decoding on the basis of the target convolution feature of the original sentence and outputting a response sentence of the original sentence. In some embodiments, the statement generation module may be a conventional encoder-decoder architecture.
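As a hedged sketch of such an encoder-decoder statement generation module (layer sizes, vocabulary size, and greedy unrolled decoding are assumptions for illustration, not the application's configuration):

```python
import torch
import torch.nn as nn

# Illustrative encoder-decoder sketch: GRU encoder over the per-statement
# features, GRU decoder emitting token logits step by step (all
# hyper-parameters are assumptions).
class Seq2Seq(nn.Module):
    def __init__(self, feat_dim=411, hidden=256, vocab=30000):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)      # token logits per step

    def forward(self, sent_feats, max_len=20):
        _, h = self.encoder(sent_feats)           # encode X-hat features
        step = torch.zeros(sent_feats.size(0), 1, h.size(-1))
        logits = []
        for _ in range(max_len):                  # greedy unrolled decoding
            step, h = self.decoder(step, h)
            logits.append(self.out(step))
        return torch.cat(logits, dim=1)

model = Seq2Seq()
y = model(torch.rand(2, 7, 411))  # batch of 2 dialogs, 7 statements each
print(y.shape)                    # (2, 20, 30000)
```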
To facilitate understanding of the working principle of the dialog generation model in the embodiment of the present application, a principle framework in the embodiment of the present application is described first, and the whole training process of the dialog generation model is described in the embodiment of the present application by taking the principle framework shown in fig. 3 as an example, where fig. 3 includes:
1) Dialogue statement vectorization processing module
The task of this module is to convert each Chinese dialog sentence into a vector form that can be processed by a computer program.
2) Emotion recognition module
The task of this module is to determine and identify the emotion of the speaker contained in each dialog sentence. For example, the module identifies the emotion implied in the dialog and records the emotion label as Ue; the Ue field takes values in [0, 1, 2, 3, 4, 5, 6] for the seven main emotion categories: 0-Anger, 1-Disgust, 2-Fear, 3-Happy, 4-Sad, 5-Surprise, and 6-Neutral.
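A minimal sketch of this label scheme (the mapping function is illustrative):

```python
# Sketch of the Ue label scheme described above: the emotion recognizer
# emits an integer in [0, 6] that indexes one of the seven categories.
EMOTIONS = ("Anger", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral")

def label_to_emotion(ue: int) -> str:
    """Map a Ue category label back to its emotion name."""
    return EMOTIONS[ue]

print(label_to_emotion(5))  # Surprise
```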
3) Topic identification module
This module extracts the main topic from dialog sentences. For example, a mature LDA model may be adopted; the LDA model identifies a probability distribution over topics. For further simplification, the embodiments of the present application keep only the topic type with the maximum probability. The topic is recorded with the label Ut, an array structure in which the identified main topic is stored, in preparation for later knowledge expansion.
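A hedged sketch of LDA-based topic identification using gensim (one common choice; the application does not name a library, and tokenization of the Chinese dialog sentences is assumed to happen upstream):

```python
from gensim import corpora, models

# Sketch: fit a small LDA model, then keep only the most probable topic
# for a sentence, matching the simplification described above.
texts = [["movie", "director", "theatre"], ["restaurant", "eat", "dinner"]]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]
lda = models.LdaModel(corpus, num_topics=4, id2word=dictionary)

bow = dictionary.doc2bow(["movie", "director"])
best_topic = max(lda.get_document_topics(bow), key=lambda p: p[1])[0]
print(best_topic)  # index of the single retained topic
```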
4) Knowledge extension module
The task of this module is to expand knowledge according to the main topics contained in dialog sentences. For example, "Bomb Disposal Expert 2" relates to the topic "movie"; its director is "Zhang San", who also directed "Dudu 2". Through such knowledge expansion, the background knowledge of the dialog can be extended, which helps the artificial intelligence algorithm model make more accurate responses closer to the theme. Illustratively, knowledge expansion has two routes: first, query external knowledge bases such as Wikipedia or Baidu Baike according to the topic; second, perform a depth-first or breadth-first search on a Knowledge Graph according to the topic and return the related expanded knowledge, as in the approach adopted by ConceptNet. The expanded related knowledge is recorded in the Uk array structure. In the later vector splicing stage, the data structures of Ue, Ut, and Uk are all expanded into vector form so that they can participate in model calculation together with the sentence vector.
5) Vector splicing module
The task of this module is to splice ordinary dialog sentence vectors into dialog sentence vectors carrying emotion, topic, and knowledge expansion, which facilitates the subsequent graph construction and graph convolution operations.
6) Graph convolution module
The task of this module is to construct a dialog graph and perform a graph convolution operation on it; the convolution operation yields dialog sentence vectors with stronger expressive capability, which participate in the subsequent generation of dialog response sentences. The spliced sentence vector obtained from the vector splicing module is denoted X-c (the spliced vector); after the convolution operation is executed, the resulting new vector is denoted X-hat (the convolved vector).
7) Statement generation module
This module adopts an encoder-decoder structure: the new dialog statement vector with richer expressive capability obtained by the graph convolution module is used as the input of the encoder, and after model learning the decoder outputs a better response statement.
As shown in fig. 5, in the embodiment of the present application, the training process of the dialog generation model includes the following steps 501 to 506:
501. and acquiring a target sentence vector of an original sentence in the sample dialogue and a sample answer sentence of the original sentence.
The sample dialogue comprises a plurality of sentences, and each sentence in the sample dialogue can be used as an original sentence.
In the embodiments of the present application, all sentences uttered by two or more speakers in one round of conversation are taken as a sample dialog. An original sentence in the sample dialog refers to any sentence uttered by one of the two or more speakers in that round of conversation.
For example, a sample dialog may include the following statement set 1:
P1-U1: it is not exaggeratedly said that after seeing the bomb disposal expert 2, i have repeated for a while before leaving the seat.
P2-U1: it is really the favorite one of the movies that I watch in domestic theatres this year.
P3-U1: because the first part is just shot, where the second part can go?
P1-U2: what, do you feel unsightly? The first part is also good.
P3-U2: zhang III belongs to a director who takes a picture of what type, a horror picture, a love picture, an action picture, and all the other things but all the things are common.
P1-U3: i have no language to you, you have a bias to the director, and the picture is good looking just from the "Dudu 2".
P3-U3: i really express my view only and have no meaning of offending to you.
P1-U4: the words are not speculative, and the half sentences are really multiple.
P2-U2: good cheer, good cheer is not exactly what one movie is watched together.
P3-U4: i do not know why she so intentioned to my evaluations as if she were angry.
P2-U3: all are noisy, and we go to green tea together to eat and are good.
In the above sample dialog, pm-Un represents the nth speech uttered by the mth speaker, for example, P1-U1 represents the 1 st speech uttered by the 1 st speaker, and so on.
The sample answer sentence refers to a preferred answer sentence for the original sentence. The sample answer sentence may be obtained in a variety of ways, exemplarily including:
1) And acquiring a response statement corresponding to the original statement in the sample conversation as a sample response statement of the original statement.
For example, in the sample dialog above, if the original sentence is the 1st sentence P1-U1 spoken by the 1st speaker, the answer sentence P2-U1 corresponding to the original sentence P1-U1 in the sample dialog may be taken as the sample answer sentence of the original sentence P1-U1. Alternatively, the answer sentence P3-U1 corresponding to the original sentence P1-U1 in the sample dialog may be taken as the sample answer sentence of the original sentence P1-U1.
2) And acquiring a response sentence corresponding to the original sentence in the sample conversation, and expanding the response sentence to obtain a sentence serving as a sample response sentence of the original sentence.
For example, in the sample dialog above, if the original sentence is the 2nd sentence P1-U2 spoken by the 1st speaker, the answer sentence P3-U2 corresponding to the original sentence P1-U2 in the sample dialog may be expanded, and the obtained sentence "I'm speechless. You're biased against this director; just look at Zhang San's 'Dudu 2', it was well shot" may be taken as the sample answer sentence of the original sentence P1-U2.
The target sentence vector is an expression vector obtained by vectorizing the original sentence. The manner of "obtaining the target sentence vector of the original sentence" in step 501 is similar to the manner of "obtaining the target expression vector of the to-be-replied sentence" in step 202, and reference may be made to the description of step 202 specifically, and details are not repeated here.
502. And acquiring a dialog feature map of the sample dialog.
In some embodiments, in order to improve the feature expression capability, a dialogue feature map is constructed based on sentence interaction relations such as occurrence time sequence and response relation among sentences, and feature extraction is performed based on the dialogue feature map. In this case, step 502 may specifically include steps 5021 to 5022:
5021. and obtaining the sentence interaction relation of the sample conversation.
And the sentence interaction relationship comprises at least one of the occurrence time sequence and the response relationship among the sentences of the sample conversation.
An actual dialog includes not only simple one-to-one linear responses in time sequence (hereinafter referred to as sequential responses), but also many-to-one responses (hereinafter referred to as parallel responses) and one-to-many responses (hereinafter referred to as concurrent responses). The response relationship therefore includes sequential responses, parallel responses, and concurrent responses.
For example, the sample dialog shown in step 501 above has an interaction relationship as shown in FIG. 6.
5022. And generating a dialogue characteristic diagram of the sample dialogue according to the statement interaction relation.
In some embodiments, in order to improve the information association capability of the dialog feature map to improve the learning capability of the dialog generation model to be trained on information and further improve the feature extraction capability of subsequent sentences, the step 5022 may specifically include the following steps a1 to a2:
and a1, taking each statement in the target statement set as a graph node to obtain a graph node set of the sample conversation.
The target sentence set refers to a set of sentences of the sample conversation. For example, the target sentence set includes each original sentence in the sample dialog. As another example, the set of target statements includes each original statement in the sample conversation, as well as an expanded statement of the original statement.
The "expanded statement of the original statement" may be determined by referring to the determination manner of the "target expanded statement of the to-be-replied statement" in the above steps 2021C to 2024C, and for simplification of description, details are not repeated here.
To better understand the expanded sentence, a specific example is given. For example, after background knowledge expansion, the sample dialog in step 501 yields the following sentence set 2:
P1-U1: it is not exaggeratedly said that after seeing the expert 2 for shell dismantling, I can leave from the seat after a while. (subject: movie; emotion: surpirise)
P2-U1: it is really the favorite one of the movies that I watch in domestic theatres this year. (theme: movie; emotion: joy)
P3-U1: because the first part is just shot, where the second part can go? (theme: movie; emotion: dispust)
P1-U2: what, do you feel unsightly? The first part is also good. (subject: movie; emotion: sad)
P3-U2-1: zhang III belongs to a director who takes a picture of what type, a horror picture, a love picture, an action picture, and all the other things but all the things are common. (theme: movie; emotion: neural)
P3-U2-2: zhang III, the director of expert 2, who dismissed bullets, belongs to the director who took what type, the horror film, the love film, the action film, and what he took, but what he took was very common. ( Subject matter: a movie; emotion: neural; and (3) knowledge expansion: name of a person )
P1-U3-1: i have no language to you, you have a bias to the director, and the picture is good looking just from the "Dudu 2". ( Subject matter: a character; emotion: sad; and (3) knowledge expansion: name of movie )
P1-U3-2: i have no language to you, you have a bias to the director, and the picture is good if you take three shots, dudu 2. ( Subject matter: a character; emotion: sad; and (3) knowledge expansion: name of movie )
P3-U3: i really express my view only and have no meaning of offending to you. (subject: unknown; emotion: neural)
P1-U4: the words are not speculative, and the half sentences are really multiple. (subject: unknown; emotion: sad)
P2-U2: good cheer, good cheer is not exactly what one movie is watched together. (subject: movie; emotion: neural)
P3-U4: i do not know why she so intentioned to my evaluations as if she were angry. (subject: unknown; mood: clustrate)
P2-U3-1: all are noisy, and we go to green tea together to eat and are good. (subject: eating; emotion: happy)
P2-U3-2: people are quite noisy, and people can go to a green tea Hangzhou vegetable restaurant together to eat and are good. (subject: eating; emotion: happy; knowledge extension: restaurant name).
Wherein, P3-U2-2 is an extended statement of the original statement P3-U2, and P1-U3-2 is an extended statement of the original statement P1-U3.
In some embodiments, the dialog feature graph takes each original statement in the sample dialog as a graph node, resulting in a set of graph nodes for the sample dialog.
In some embodiments, the dialog feature graph takes each original statement in the sample dialog and the expanded statement of the original statement as graph nodes, resulting in a set of graph nodes for the sample dialog.
and a2, connecting edges of all nodes in the graph node set according to at least one of the occurrence time sequence and the response relation among all sentences of the sample conversation to obtain a conversation feature graph of the sample conversation.
For example, in step 502, the graph convolution module takes each sentence in the target sentence set as a node, and uses the sentence interaction relationships of the sample dialog (including the occurrence time sequence and the response relationships between sentences), the relationship between an original sentence and its expanded sentence, and a self-loop between the preceding and following sentences of the same speaker as the basis for connecting edges, forming the dialog feature graph shown in fig. 7.
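A minimal sketch of this graph construction with networkx, using a few nodes and edges from the sample dialog (the exact edge set is an assumption consistent with the adjacencies described for fig. 7):

```python
import networkx as nx

# Illustrative dialog feature graph: nodes are statements, edges follow
# response relations, same-speaker links, and original/expanded-statement
# links (edge set assumed from the fig. 7 adjacencies).
G = nx.Graph()
G.add_edges_from([
    ("P1-U1", "P2-U1"),     # response relation
    ("P1-U1", "P3-U1"),     # response relation
    ("P1-U1", "P1-U2"),     # same speaker, consecutive utterances
    ("P1-U1", "P1-U4"),     # same-speaker link
    ("P3-U2-1", "P3-U2-2")  # original statement and its expanded statement
])
print(G.number_of_nodes(), G.number_of_edges())  # 7 nodes, 5 edges
```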
503. And performing convolution operation according to the dialogue characteristic diagram and the target sentence vector through a diagram convolution module in the dialogue generating model to be trained to obtain the target convolution characteristic of the original sentence.
In some embodiments, a convolution operation may be performed based on a subgraph of an original sentence to obtain a target convolution feature of the original sentence, where step 503 may specifically include the following steps 5031 to 5032:
5031. and acquiring a subgraph of a graph node where the original statement is located according to the conversation feature graph.
The subgraph comprises graph nodes where the original sentences are located and first adjacent nodes of the graph nodes where the original sentences are located.
The first adjacent node refers to a node directly adjacent (one hop away) to the graph node where the original sentence is located.
Specifically, through the graph convolution module, the graph node where the original statement is located is taken as the center of the subgraph, the first adjacent node of the dialog feature graph and the graph node where the original statement is located is taken as the adjacent node of the subgraph, and the subgraph of the graph node where the original statement is located is generated. Similarly, a subgraph of each original sentence in the sample dialogue can be obtained.
For ease of understanding, a specific example is given. For example, taking the node where the original sentence is located to be node P1-U1 in fig. 7, the nodes P1-U2, P3-U1, P2-U1, and P1-U4 are directly adjacent to node P1-U1, so they are the first adjacent nodes of the node P1-U1 where the original sentence is located. A subgraph comprising the graph nodes P1-U1, P1-U2, P3-U1, P2-U1, and P1-U4, as shown in fig. 8, can then be constructed and taken as the subgraph of the node where the original sentence is located.
Similarly, a subgraph of the graph node where each original sentence in the sample conversation is located can be constructed.
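A minimal sketch of sub-graph extraction as the 1-hop ego graph of the node holding the original sentence (networkx is an illustrative choice, and the edges are assumed as above):

```python
import networkx as nx

# Sketch: the sub-graph of the node holding the original statement is its
# 1-hop ego graph in the dialog feature graph.
G = nx.Graph()
G.add_edges_from([("P1-U1", "P2-U1"), ("P1-U1", "P3-U1"),
                  ("P1-U1", "P1-U2"), ("P1-U1", "P1-U4"),
                  ("P2-U1", "P2-U2")])  # P2-U2 lies outside the 1-hop ego graph
sub = nx.ego_graph(G, "P1-U1", radius=1)
print(sorted(sub.nodes))  # ['P1-U1', 'P1-U2', 'P1-U4', 'P2-U1', 'P3-U1']
```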
5032. Carrying out a convolution operation according to the subgraph, the target sentence vector, and the preset weight parameter of the graph convolution module, through the graph convolution module, to obtain the target convolution feature.
In some embodiments, convolution operation can be directly performed based on a laplacian matrix of a subgraph of a graph node where an original sentence is located, a target sentence vector and a default weight parameter of a graph convolution module to obtain a target convolution characteristic; in this case, step 5032 may specifically include the following steps 50321A to 50324A:
50321A, computing an adjacency matrix A of the subgraph.
50322A, calculating degree matrix D of the subgraph.
50323A, calculating Laplace matrix L of the subgraph according to the adjacency matrix A and degree matrix D of the subgraph.
50324A, performing matrix multiplication operation between the target sentence vector X-c, the preset weight parameter W of the graph convolution module, and the Laplacian matrix L of the subgraph to obtain the target convolution characteristic of the original sentence.
Similarly, each original sentence and each expanded sentence in the sample dialogue is processed through steps 50321A to 50324A by the graph convolution module to obtain its corresponding target convolution feature.
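As a sketch of steps 50321A to 50324A (using the standard combinatorial Laplacian L = D − A and hypothetical dimensions for the sentence vectors and the preset weight parameter W):

```python
import numpy as np
import networkx as nx

def target_convolution(sub: nx.Graph, X: np.ndarray, W: np.ndarray) -> np.ndarray:
    A = nx.to_numpy_array(sub)        # 50321A: adjacency matrix A
    D = np.diag(A.sum(axis=1))        # 50322A: degree matrix D
    L = D - A                         # 50323A: Laplacian L = D - A
    return L @ X @ W                  # 50324A: matrix multiplication L . X . W

sub = nx.star_graph(3)                # a 4-node star like the Table 4 example
X = np.random.randn(4, 128)           # target sentence vectors (dims assumed)
W = np.random.randn(128, 64)          # preset weight parameter (dims assumed)
feat = target_convolution(sub, X, W)  # target convolution features, shape (4, 64)
```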
Referring to fig. 9, in other embodiments, attention coefficients for emotion information and topic information may also be added to the Laplacian matrix of the subgraph of the graph node where the original sentence is located. In this case, step 5032 may specifically include the following steps 50321B to 50324B:
50321B, obtaining a first Laplace matrix of the subgraph.
Specifically, through the graph convolution module, the adjacency matrix A and the degree matrix D of the subgraph are first calculated; the Laplacian matrix L of the subgraph is then calculated from A and D and taken as the first Laplacian matrix. For example, the first Laplacian matrix is shown in table 4 below, where v1, v2, v3, and v4 represent nodes in the subgraph.
TABLE 4
(Laplacian matrix L)   v1   v2   v3   v4
v1                      2   -1   -1   -1
v2                     -1    0    0    0
v3                     -1    0    0    0
v4                     -1    0    0    0
50322B, obtaining sample emotion information and sample topic information of the original sentence.
Step 50322B, "obtaining the sample emotion information and sample topic information of the original sentence", is similar to step 2022C, "identifying the target emotion information and target topic information of the sentence to be replied"; for details, refer to the description of step 2022C, which is not repeated here.
50323B, adding an attention coefficient between the first adjacent nodes of the graph node where the original sentence is located and that graph node in the first Laplacian matrix, according to a preset emotion weight coefficient, a preset topic weight coefficient, the sample emotion information, and the sample topic information, to obtain a second Laplacian matrix.
For example, suppose the preset emotion weight coefficient α = 0.2 and the preset topic weight coefficient β = 0.1, and take the first Laplacian matrix in table 4 as an example. Since node v1 and node v2 are topic-related (the β coefficient applies), the weight of edge v1-v2 is incremented by 0.1 times, giving a new weight of -1.1; similarly, node v1 and node v3 are emotion-related (the α coefficient applies), so the weight of edge v1-v3 is incremented by 0.2 times, giving a new weight of -1.2. Following this rule, a second Laplacian matrix L′ integrating the emotion attention coefficient and the topic attention coefficient is obtained, as shown in table 5 below, where v1, v2, v3, and v4 represent nodes in the subgraph.
TABLE 5
(second Laplacian matrix L′)    v1     v2     v3     v4
v1                               2   -1.1   -1.2     -1
v2                            -1.1      0      0      0
v3                            -1.2      0      0      0
v4                              -1      0      0      0
The values α = 0.2 and β = 0.1 above are merely examples; the emotion weight coefficient α and the topic weight coefficient β may be adjusted according to actual circumstances and are not limited thereto.
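A sketch of step 50323B under these example values, reading "a weight increment of 0.1 times" as multiplying the edge weight's magnitude by 1.1 (an interpretation) and starting from the Table 4 matrix:

```python
import numpy as np

def second_laplacian(L, emotion_pairs, topic_pairs, alpha=0.2, beta=0.1):
    """Add emotion/topic attention increments to edges of the first Laplacian."""
    L2 = L.astype(float).copy()
    for i, j in topic_pairs:      # topic-related edges get a beta increment
        L2[i, j] *= (1 + beta)
        L2[j, i] = L2[i, j]
    for i, j in emotion_pairs:    # emotion-related edges get an alpha increment
        L2[i, j] *= (1 + alpha)
        L2[j, i] = L2[i, j]
    return L2

L = np.array([[2, -1, -1, -1],
              [-1, 0,  0,  0],
              [-1, 0,  0,  0],
              [-1, 0,  0,  0]], dtype=float)        # Table 4
L2 = second_laplacian(L, emotion_pairs=[(0, 2)], topic_pairs=[(0, 1)])
# L2[0, 1] == -1.1 and L2[0, 2] == -1.2, matching Table 5.
```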
50324B, performing a convolution calculation by the graph convolution module according to the target sentence vector, the preset weight parameter of the graph convolution module, and the second Laplacian matrix, to obtain the target convolution feature.
Specifically, the graph convolution module performs a matrix multiplication among the target sentence vector X-c, the preset weight parameter W of the graph convolution module, and the second Laplacian matrix L′ of the subgraph, to obtain the target convolution feature of the original sentence, such as the node "New-1" shown in fig. 9.
Similarly, each original sentence and each expanded sentence in the sample dialogue is processed through steps 50321B to 50324B by the graph convolution module to obtain its corresponding target convolution feature.
Through steps 50321B to 50324B, the emotion weight coefficient and the preset topic weight coefficient are added to the Laplacian matrix of the subgraph of the graph node where the original sentence is located when the target sentence vector is convolved. Target convolution features containing the emotion and topic information of the original sentence can therefore be extracted for training, so that the graph convolution module learns such information from the sentences. The trained dialogue generation model can then integrate information such as the emotion, topics, and knowledge of human interaction when generating replies in a human-computer dialogue, improving the accuracy of the generated replies and the human-computer interaction experience.
Further, in order to improve the feature extraction quality of the graph convolution module, the graph convolution module may also perform nonlinear activation and regularization. In this case, step 50324B may further include: performing a convolution calculation through the graph convolution module according to the target sentence vector, the preset weight parameter of the graph convolution module, and the second Laplacian matrix, to obtain an implicit feature vector of the graph node where the original sentence is located; carrying out nonlinear activation and regularization processing on the implicit feature vector through the graph convolution module, to obtain an intermediate feature vector of the graph node where the original sentence is located; and carrying out a weighted average of the intermediate feature vector and the target sentence vector through the graph convolution module, to obtain the target convolution feature.
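A condensed sketch of this extended step 50324B; the tanh activation, L2 row-normalization as the regularization, and an even mixing weight are all assumptions, and W is taken square so the intermediate features and the input sentence vectors can be averaged:

```python
import numpy as np

def gcn_layer(L2, X, W, mix=0.5):
    """L2: second Laplacian; X: target sentence vectors; W: preset (square) weights."""
    H = np.tanh(L2 @ X @ W)                                    # implicit features + activation
    H = H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-8)  # regularization
    return mix * H + (1 - mix) * X                             # weighted average -> target convolution feature
```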
504. Outputting the predicted response sentence of the original sentence according to the target convolution feature through the sentence generation module in the dialogue generation model to be trained.
The predicted response sentence refers to a response sentence of an original sentence obtained by prediction.
Specifically, the target convolution characteristic is input to a statement generation module in the dialog generation model to be trained, so that the statement generation module decodes and outputs a prediction response statement of the original statement according to the target convolution characteristic.
505. Determining the training loss of the dialogue generation model to be trained according to the predicted response sentence and the sample response sentence.
The training loss of the dialog generation model can be set in various ways, for example, the training loss can be set as the generation loss of the statement generation module; as another example, the training penalty may be set as a feature extraction penalty for the graph convolution module; as another example, the training penalty may be set as a weighted sum of the feature extraction penalty of the graph convolution module and the generation penalty of the sentence generation module.
Specifically, with the sample answer sentence as a learning target, the sentence generation module may set a first loss function correspondingly, and then determine the generation loss of the sentence generation module according to the first loss function, the predicted answer sentence, and the sample answer sentence. The graph convolution module can correspondingly set a second loss function, and then determine the feature extraction loss of the graph convolution module according to the second loss function, the prediction answer sentence and the sample answer sentence.
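The three configurations can be sketched as follows; loss_fn_gen and loss_fn_feat stand in for the first and second loss functions, whose concrete forms the text leaves open:

```python
def training_loss(pred, target, loss_fn_gen, loss_fn_feat, mode="sum", w=0.5):
    gen = loss_fn_gen(pred, target)      # generation loss (sentence generation module)
    feat = loss_fn_feat(pred, target)    # feature-extraction loss (graph convolution module)
    if mode == "gen":
        return gen
    if mode == "feat":
        return feat
    return w * gen + (1 - w) * feat      # weighted sum of the two losses
```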
506. Updating the model parameters of the dialogue generation model according to the training loss until a preset training stop condition is reached, to obtain the trained dialogue generation model.
The preset training stop condition can be set according to actual requirements. For example, training may stop when the training loss is smaller than a preset value; when the training loss has essentially stopped changing, that is, when the difference between the training losses of adjacent training iterations is smaller than a preset value; or when the number of training iterations reaches the maximum number of iterations.
For example, when the training loss is set as the generation loss of the sentence generation module, the target sentence vector of the original sentence is input to the dialogue generation model to be trained, forward propagation is performed, and the generation loss of the sentence generation module is calculated from the predicted response sentence output by the dialogue generation model to be trained. And then performing backward propagation according to the generation loss of the sentence generation module, and performing optimization adjustment on the parameters of the sentence generation module so as to perform forward and backward propagation repeatedly until a preset training stopping condition is reached, thus finishing the training of the model and obtaining the trained dialogue generation model. At this point, the trained dialog-generating model may be applied to predict the answer sentence for a sentence.
For another example, when the training loss is set as the feature extraction loss of the graph convolution module, the target sentence vector of the original sentence is input to the dialogue generation model, forward propagation is performed, and the feature extraction loss of the graph convolution module is calculated from the predicted response sentence output by the dialogue generation model. Back propagation is then performed according to the feature extraction loss, the parameters of the graph convolution module are optimized and adjusted, and forward and backward propagation are repeated until the preset training stop condition is reached, completing the training and yielding the trained dialogue generation model. At this point, the trained dialogue generation model may be applied to predict the answer sentence for a given sentence.
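A training-loop sketch covering steps 505 and 506 in PyTorch-style code; the model, data loader, optimizer choice, and stopping thresholds are all assumptions:

```python
import torch

def train(model, loader, loss_fn, lr=1e-3, max_iters=10_000, tol=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prev, step = float("inf"), 0
    for x, y in loader:                # target sentence vectors, sample answer sentences
        pred = model(x)                # forward propagation
        loss = loss_fn(pred, y)        # training loss (see the configurations above)
        opt.zero_grad()
        loss.backward()                # backward propagation
        opt.step()                     # optimize and adjust model parameters
        step += 1
        # Preset stop conditions: small loss, stagnant loss, or max iterations.
        if loss.item() < tol or abs(prev - loss.item()) < tol or step >= max_iters:
            break
        prev = loss.item()
    return model
```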
In the embodiment of the application, emotion information, topic information and extension information in an original sentence are blended into vector expression of the original sentence to obtain a target sentence vector of the original sentence, so that the vector expression quality of the original sentence is enhanced, and richer content expression is achieved; the trained dialogue generating model is obtained through training of the target sentence vector of the original sentence, so that the trained dialogue generating model can learn information of emotion, topic, knowledge and the like of human interaction, the trained dialogue generating model can integrate the information of emotion, topic, knowledge and the like of human interaction to generate a reply of the human-computer dialogue, and the accuracy of the reply generated by the human-computer dialogue model is improved to a certain extent.
The determination of the target graph convolution parameters and the target Laplacian matrix is described below.
1. The target graph convolution parameters.
After the trained dialogue generation model is obtained, the target graph convolution parameter is acquired based on the graph convolution module of the trained dialogue generation model. Specifically, the weight matrix of the graph convolution module is obtained from the trained dialogue generation model and used as the target graph convolution parameter.
For example, when the dialogue generation model involves both a loss of the graph convolution module and a loss of the sentence generation module, the training loss of the dialogue generation model may be set accordingly, and step 506 may specifically include: adjusting the model parameters of the graph convolution module according to the loss of the graph convolution module; and adjusting the model parameters of the sentence generation module according to the loss of the sentence generation module, until a preset training stop condition is reached, to obtain the trained dialogue generation model.
The model parameter of the graph convolution module specifically refers to a weight parameter W preset by the graph convolution module in the steps 50324A and 50324B.
The network weight parameter W preset by the graph convolution module in steps 50324A and 50324B is adjusted continually during model training: when the model parameters of the dialogue generation model are adjusted in step 506, W is updated based on the training loss until the model converges, at which point the network weight parameter W of the graph convolution module is fixed. For ease of distinction, after the model is trained, the updated preset weight parameter of the graph convolution module is called the updated weight parameter and denoted W′. The weight matrix W′ of the graph convolution module can then be obtained from the trained dialogue generation model as the target graph convolution parameter.
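For example (assuming a PyTorch implementation whose graph convolution module exposes its weight matrix as graph_conv.weight — a hypothetical attribute path):

```python
# Read W' out of the trained model as the target graph convolution parameter.
target_graph_conv_param = trained_model.graph_conv.weight.detach().clone()
```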
2. The target Laplacian matrix.
In some embodiments, after the dialogue feature graph of the sample dialogue is obtained in step 502, the adjacency matrix A of the dialogue feature graph may be calculated by referring to steps 50321A to 50323A; the degree matrix D of the dialogue feature graph is calculated; the Laplacian matrix L of the dialogue feature graph is calculated from A and D to obtain the initial Laplacian matrix of the dialogue feature graph; and the initial Laplacian matrix of the dialogue feature graph is used as the preset target Laplacian matrix.
In some embodiments, in order to improve the attention degree of the graph convolution module to the emotion information and topic information to improve the extraction capability of the emotion information and topic information, an attention coefficient of the emotion information and topic information may also be added to the initial laplacian matrix. At this time, the "target laplacian matrix" may be determined as follows: acquiring an initial Laplace matrix of the dialogue characteristic diagram; acquiring sample emotion information and sample topic information of the original sentence; and according to a preset emotion weight coefficient, a preset topic weight coefficient, the sample emotion information and the sample topic information, adding an attention coefficient between a first adjacent node of a graph node where the original sentence is located and the graph node where the original sentence is located in the initial laplacian matrix to obtain the target laplacian matrix.
The implementation of these steps is similar to that of steps 50321B to 50323B; for details, refer to the description of steps 50321B to 50323B, which is not repeated here.
In order to better implement the method for generating a dialog response sentence in the embodiment of the present application, on the basis of the method for generating a dialog response sentence, an embodiment of the present application further provides a device for generating a dialog response sentence, as shown in fig. 10, which is a schematic structural diagram of an embodiment of the device for generating a dialog response sentence in the embodiment of the present application, and the device 1000 for generating a dialog response sentence includes:
an obtaining unit 1001 configured to obtain a statement to be replied;
a representing unit 1002, configured to obtain a target representation vector of the statement to be replied according to a preset vector format based on the scene information in the statement to be replied;
a feature extraction unit 1003, configured to perform feature extraction based on the target expression vector to obtain a statement feature of the statement to be replied;
a generating unit 1004, configured to determine a target answer sentence of the to-be-replied sentence based on the sentence characteristic.
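Putting the four units together, a minimal illustrative skeleton might look as follows; the encoder, GCN, and decoder callables are placeholders for the components described elsewhere in this document, not part of the original text:

```python
class DialogResponseGenerator:
    """Illustrative skeleton of device 1000; component callables are assumed."""
    def __init__(self, encoder, gcn, decoder):
        self.encoder, self.gcn, self.decoder = encoder, gcn, decoder

    def obtain(self, raw_text: str) -> str:   # obtaining unit 1001
        return raw_text.strip()

    def represent(self, sentence: str):       # representing unit 1002
        return self.encoder(sentence)         # -> target representation vector

    def extract(self, vector):                # feature extraction unit 1003
        return self.gcn(vector)               # -> sentence features

    def generate(self, features) -> str:      # generating unit 1004
        return self.decoder(features)         # -> target answer sentence

    def reply(self, sentence: str) -> str:
        return self.generate(self.extract(self.represent(self.obtain(sentence))))
```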
In some embodiments of the present application, the scenario information is selected from at least one of target emotion information, target topic information, and target expansion information of the to-be-replied sentence, and the representing unit 1002 is specifically configured to:
and acquiring a target expression vector of the sentence to be replied according to a preset vector format based on at least one of the target emotion information, the target topic information and the target extension information of the sentence to be replied.
In some embodiments of the present application, the scene information includes target emotion information, target topic information, and target extension information, and the representing unit 1002 is specifically configured to:
acquiring an original sentence vector of the sentence to be replied;
identifying target emotion information and target topic information of the sentence to be replied;
acquiring target extension information of the sentence to be replied according to the target topic information;
and generating the target expression vector according to the original sentence vector, the target emotion information, the target topic information, the target expansion information and the preset vector format.
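A sketch of how the representing unit might assemble the target representation vector; simple concatenation in a fixed order is an assumption, since the preset vector format is not spelled out here:

```python
import numpy as np

def target_representation(orig_vec, emotion_vec, topic_vec, expansion_vec):
    # Assumed preset format: [original | emotion | topic | expansion].
    return np.concatenate([orig_vec, emotion_vec, topic_vec, expansion_vec])
```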
In some embodiments of the present application, the feature extraction unit 1003 is specifically configured to:
acquiring preset target image convolution parameters;
acquiring a preset target Laplace matrix;
and performing convolution calculation according to the target expression vector, the target graph convolution parameters and the target Laplace matrix to obtain the sentence characteristics of the sentence to be replied.
In some embodiments of the present application, the feature extraction unit 1003 is specifically configured to:
performing convolution calculation according to the target expression vector, the target graph convolution parameters and the target Laplace matrix to obtain an implicit feature vector of the statement to be replied;
carrying out nonlinear activation and regularization processing on the implicit characteristic vector to obtain an intermediate characteristic vector of the sentence to be replied;
and carrying out weighted average according to the intermediate characteristic vector and the target expression vector to obtain the sentence characteristics of the sentence to be replied.
in some embodiments of the present application, the apparatus for generating a dialog response sentence further includes a training unit (not shown in the figure), where the training unit is specifically configured to:
obtaining a target sentence vector of an original sentence in a sample conversation and a sample answer sentence of the original sentence;
obtaining a dialogue feature map of the sample dialogue;
performing convolution operation according to the dialogue feature diagram and the target sentence vector through a diagram convolution module in a dialogue generating model to be trained to obtain the target convolution feature of the original sentence;
outputting a prediction response sentence of the original sentence according to the target convolution characteristic through a sentence generating module in a dialog generating model to be trained;
determining the training loss of the dialog generation model to be trained according to the predicted response sentence and the sample response sentence;
updating model parameters of the dialogue generating model according to the training loss until a preset training stopping condition is reached, and obtaining a trained dialogue generating model;
and acquiring the target graph convolution parameter based on the graph convolution module of the trained dialogue generating model.
In some embodiments of the present application, the training unit is specifically configured to:
acquiring a subgraph of a graph node where the original statement is located according to the conversation feature graph, wherein the subgraph comprises the graph node where the original statement is located and a first adjacent node of the graph node where the original statement is located;
and carrying out convolution operation according to the subgraph, the target sentence vector and the default weight parameter of the graph convolution module through the graph convolution module to obtain the target convolution characteristic.
In some embodiments of the present application, the training unit is specifically configured to:
obtaining a statement interaction relation of the sample conversation;
and generating a dialogue characteristic graph of the sample dialogue according to the sentence interaction relation, wherein the dialogue characteristic graph takes the original sentence as a graph node.
in some embodiments of the application, the sentence interaction relationship includes at least one of an occurrence time sequence and a response relationship between the sentences of the sample dialog, and the training unit is specifically configured to:
taking each statement in a target statement set as a graph node to obtain a graph node set of the sample dialogue, wherein the target statement set comprises an original statement and an extended statement of the original statement in the sample dialogue;
and connecting edges of all nodes in the graph node set according to at least one of the occurrence time sequence and the response relation among all statements of the sample conversation to obtain a conversation feature graph of the sample conversation.
In some embodiments of the present application, the training unit is specifically configured to:
acquiring an initial Laplace matrix of the dialogue characteristic diagram;
and obtaining the target Laplace matrix based on the initial Laplace matrix.
in some embodiments of the present application, the training unit is specifically configured to:
acquiring sample emotion information and sample topic information of the original sentence;
and according to a preset emotion weight coefficient, a preset topic weight coefficient, the sample emotion information and the sample topic information, adding an attention coefficient between a first adjacent node of a graph node where the original sentence is located and the graph node where the original sentence is located in the initial laplacian matrix to obtain the target laplacian matrix.
In some embodiments of the present application, the training unit is specifically configured to:
and acquiring a weight matrix of the graph convolution module from the trained dialog generation model, and taking the weight matrix as the target graph convolution parameter.
In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.
Since the device for generating dialogue response sentences can execute the steps of the method for generating dialogue response sentences in any embodiment corresponding to fig. 1 to 9, it can achieve the advantageous effects achievable by that method; for details, see the foregoing description, which is not repeated here.
In addition, in order to better implement the method for generating a dialogue response sentence in the embodiments of the present application, on the basis of that method, an embodiment of the present application further provides an electronic device. Referring to fig. 11, fig. 11 shows a schematic structural diagram of the electronic device in the embodiment of the present application. Specifically, the electronic device includes a processor 1101 which, when executing the computer program stored in the memory 1102, implements the steps of the method for generating a dialogue response sentence in any embodiment corresponding to fig. 1 to 9; alternatively, the processor 1101 implements the functions of the units in the embodiment corresponding to fig. 10 when executing the computer program stored in the memory 1102.
Illustratively, a computer program may be partitioned into one or more modules/units, which are stored in the memory 1102 and executed by the processor 1101 to implement embodiments of the present application. One or more modules/units may be a series of computer program instruction segments capable of performing certain functions, the instruction segments being used to describe the execution of a computer program in a computer device.
The electronic device may include, but is not limited to, the processor 1101 and the memory 1102. Those skilled in the art will appreciate that the illustration is merely an example of an electronic device and does not constitute a limitation; the device may include more or fewer components than those illustrated, combine some components, or use different components. For example, the electronic device may further include input/output devices, network access devices, a bus, and the like, with the processor 1101, the memory 1102, the input/output devices, and the network access devices connected via the bus.
The processor 1101 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the electronic device, using various interfaces and lines to connect the various parts of the overall device.
The memory 1102 may be used to store computer programs and/or modules, and the processor 1101 implements the various functions of the computer device by running or executing the computer programs and/or modules stored in the memory 1102 and calling the data stored therein. The memory 1102 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and application programs required by at least one function (such as a sound playing function or an image playing function); the data storage area may store data created according to the use of the electronic device (such as audio data or video data). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatus for generating a dialog response statement, the electronic device and the corresponding units thereof described above may refer to descriptions of a method for generating a dialog response statement in any embodiment corresponding to fig. 1 to 9, and are not described herein again in detail.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
For this reason, an embodiment of the present application provides a computer-readable storage medium, where multiple instructions are stored, and the instructions can be loaded by a processor to execute steps in a method for generating a dialog response statement in any embodiment of the present application corresponding to fig. 1 to 9, and for specific operations, reference may be made to descriptions of the method for generating a dialog response statement in any embodiment corresponding to fig. 1 to 9, which are not described herein again.
The computer-readable storage medium may include a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or the like.
Since the instructions stored in the computer-readable storage medium can execute the steps in the method for generating the dialog response statement in any embodiment of the present application, such as that shown in fig. 1 to 9, the beneficial effects that can be achieved by the method for generating the dialog response statement in any embodiment of the present application, such as that shown in fig. 1 to 9, can be achieved, which are described in detail in the foregoing description and are not repeated herein.
The method, the apparatus, the electronic device, and the computer-readable storage medium for generating a dialog response statement provided in the embodiments of the present application are described in detail above, and specific examples are applied herein to explain the principles and implementations of the present application, and the description of the embodiments is only used to help understand the method and its core idea of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (15)

1. A method for generating a dialog response sentence, the method comprising:
acquiring a statement to be replied;
acquiring a target expression vector of the statement to be replied according to a preset vector format based on the scene information in the statement to be replied;
performing feature extraction based on the target expression vector to obtain the sentence features of the sentence to be replied;
and determining a target answer sentence of the sentence to be replied based on the sentence characteristics.
2. The method of generating a dialogue response sentence according to claim 1, wherein the scene information is selected from at least one of target emotion information, target topic information, and target extension information.
3. The generation method of dialogue response sentence according to claim 1, wherein the scenario information includes target emotion information, target topic information, and target extension information;
the step of obtaining the target representation vector of the to-be-replied statement according to a preset vector format based on the scene information in the to-be-replied statement specifically includes:
acquiring an original sentence vector of the sentence to be replied;
identifying target emotion information and target topic information of the sentence to be replied;
acquiring target extension information of the sentence to be replied according to the target topic information;
and generating the target expression vector according to the original sentence vector, the target emotion information, the target topic information, the target expansion information and the preset vector format.
4. The method for generating the dialog response sentence according to any one of claims 1 to 3, wherein the performing feature extraction based on the target representation vector to obtain the sentence feature of the sentence to be replied includes:
acquiring preset target image convolution parameters;
acquiring a preset target Laplace matrix;
and performing convolution calculation according to the target expression vector, the target graph convolution parameters and the target Laplace matrix to obtain the sentence characteristics of the sentence to be replied.
5. The method for generating the dialog answer sentence according to claim 4, wherein the obtaining the sentence feature of the sentence to be replied by performing convolution calculation according to the target representation vector, the target graph convolution parameter, and the target laplacian matrix comprises:
performing convolution calculation according to the target expression vector, the target graph convolution parameters and the target Laplace matrix to obtain an implicit feature vector of the statement to be replied;
carrying out nonlinear activation and regularization processing on the implicit characteristic vector to obtain an intermediate characteristic vector of the sentence to be replied;
and carrying out weighted average according to the intermediate characteristic vector and the target expression vector to obtain the sentence characteristics of the sentence to be replied.
6. The method for generating dialogue response statements according to claim 4, wherein the obtaining of preset target graph convolution parameters further comprises:
obtaining a target sentence vector of an original sentence in a sample conversation and a sample answer sentence of the original sentence;
obtaining a dialogue characteristic diagram of the sample dialogue;
performing convolution operation according to the dialogue feature diagram and the target sentence vector through a diagram convolution module in a dialogue generating model to be trained to obtain the target convolution feature of the original sentence;
outputting a prediction response sentence of the original sentence according to the target convolution characteristic through a sentence generating module in a dialog generating model to be trained;
determining the training loss of the dialog generation model to be trained according to the predicted response sentences and the sample response sentences;
updating model parameters of the dialogue generating model according to the training loss until a preset training stopping condition is reached, and obtaining a trained dialogue generating model;
and acquiring the target graph convolution parameter based on the graph convolution module of the trained dialogue generation model.
7. The method according to claim 6, wherein the performing convolution operation according to the dialog feature map and the target sentence vector by using a map convolution module in the dialog generation model to be trained to obtain the target convolution feature of the original sentence comprises:
acquiring a subgraph of a graph node where the original statement is located according to the conversation feature graph, wherein the subgraph comprises the graph node where the original statement is located and a first adjacent node of the graph node where the original statement is located;
and carrying out convolution operation according to the subgraph, the target sentence vector and the default weight parameter of the graph convolution module through the graph convolution module to obtain the target convolution characteristic.
8. The method of generating a dialog response statement according to claim 6, wherein the step of obtaining a dialog feature map of the sample dialog comprises:
obtaining a statement interaction relation of the sample conversation;
and generating a dialogue characteristic graph of the sample dialogue according to the sentence interaction relation, wherein the dialogue characteristic graph takes the original sentence as a graph node.
9. The method according to claim 8, wherein the sentence interaction relationship includes at least one of an occurrence time sequence and a response relationship between the sentences of the sample conversation, and wherein generating the conversation feature map of the sample conversation based on the sentence interaction relationship includes:
taking each statement in a target statement set as a graph node to obtain a graph node set of the sample conversation, wherein the target statement set comprises an original statement and an extended statement of the original statement in the sample conversation;
and connecting edges of all nodes in the graph node set according to at least one of the occurrence time sequence and the response relation among all statements of the sample conversation to obtain a conversation feature graph of the sample conversation.
10. The method for generating dialog answer sentences according to claim 6, wherein the obtaining of the preset target Laplacian matrix further comprises:
acquiring an initial Laplace matrix of the dialogue characteristic diagram;
and obtaining the target Laplace matrix based on the initial Laplace matrix.
11. The method for generating a dialog response statement according to claim 10, wherein the obtaining the target laplacian matrix based on the initial laplacian matrix includes:
acquiring sample emotion information and sample topic information of the original sentence;
and according to a preset emotion weight coefficient, a preset topic weight coefficient, the sample emotion information and the sample topic information, adding an attention coefficient between a first adjacent node of a graph node where the original sentence is located and the graph node where the original sentence is located in the initial laplacian matrix to obtain the target laplacian matrix.
12. The method of generating dialogue response statements according to claim 6, wherein obtaining the target graph convolution parameters based on the graph convolution module of the trained dialogue generating model comprises:
and acquiring a weight matrix of the graph convolution module from the trained dialogue generation model, and taking the weight matrix as the target graph convolution parameter.
13. A dialog response sentence generation apparatus, characterized by comprising:
the acquisition unit is used for acquiring the sentence to be replied;
the representation unit is used for acquiring a target representation vector of the statement to be replied according to a preset vector format based on the scene information in the statement to be replied;
the feature extraction unit is used for extracting features based on the target expression vector to obtain the sentence features of the sentence to be replied;
and the generating unit is used for determining a target answer sentence of the sentence to be replied based on the sentence characteristics.
14. An electronic device, comprising a processor and a memory, wherein the memory stores a computer program, and the processor executes the steps in the generation method of a dialogue response sentence according to any one of claims 1 to 12 when calling the computer program in the memory.
15. A computer-readable storage medium, having stored thereon a computer program which is loaded by a processor to execute the steps in the generation method of a dialogue response sentence according to any one of claims 1 to 12.
CN202110702211.XA 2021-06-24 2021-06-24 Method and device for generating dialogue response sentence, electronic equipment and storage medium Pending CN115525740A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110702211.XA CN115525740A (en) 2021-06-24 2021-06-24 Method and device for generating dialogue response sentence, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110702211.XA CN115525740A (en) 2021-06-24 2021-06-24 Method and device for generating dialogue response sentence, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115525740A true CN115525740A (en) 2022-12-27

Family

ID=84693842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110702211.XA Pending CN115525740A (en) 2021-06-24 2021-06-24 Method and device for generating dialogue response sentence, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115525740A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117332072A (en) * 2023-12-01 2024-01-02 阿里云计算有限公司 Dialogue processing, voice abstract extraction and target dialogue model training method
CN117332072B (en) * 2023-12-01 2024-02-13 阿里云计算有限公司 Dialogue processing, voice abstract extraction and target dialogue model training method

Similar Documents

Publication Publication Date Title
CN110309283B (en) Answer determination method and device for intelligent question answering
CN106776544B (en) Character relation recognition method and device and word segmentation method
CN106202010B (en) Method and apparatus based on deep neural network building Law Text syntax tree
CN111783474B (en) Comment text viewpoint information processing method and device and storage medium
CN108228576B (en) Text translation method and device
CN113627447B (en) Label identification method, label identification device, computer equipment, storage medium and program product
CN110069612B (en) Reply generation method and device
CN110580516B (en) Interaction method and device based on intelligent robot
CN109978139B (en) Method, system, electronic device and storage medium for automatically generating description of picture
CN111858898A (en) Text processing method and device based on artificial intelligence and electronic equipment
CN110597968A (en) Reply selection method and device
CN110399473B (en) Method and device for determining answers to user questions
CN111402864A (en) Voice processing method and electronic equipment
CN114706945A (en) Intention recognition method and device, electronic equipment and storage medium
CN109472032A (en) A kind of determination method, apparatus, server and the storage medium of entity relationship diagram
CN115525740A (en) Method and device for generating dialogue response sentence, electronic equipment and storage medium
CN111611409B (en) Case analysis method integrated with scene knowledge and related equipment
CN113569017A (en) Model processing method and device, electronic equipment and storage medium
CN115510194A (en) Question and answer sentence retrieval method and device, electronic equipment and storage medium
CN114330701A (en) Model training method, device, computer equipment, storage medium and program product
CN110569331A (en) Context-based relevance prediction method and device and storage equipment
CN114840697B (en) Visual question-answering method and system for cloud service robot
CN113761124B (en) Training method of text coding model, information retrieval method and equipment
CN117521674B (en) Method, device, computer equipment and storage medium for generating countermeasure information
CN116955579B (en) Chat reply generation method and device based on keyword knowledge retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination