CN113868395A - Method and system for building a multi-round dialogue generative model, electronic device, and medium - Google Patents

Method and system for building a multi-round dialogue generative model, electronic device, and medium

Info

Publication number
CN113868395A
CN113868395A
Authority
CN
China
Prior art keywords
text
vector
layer
attention distribution
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111180118.3A
Other languages
Chinese (zh)
Inventor
刘伟硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN202111180118.3A priority Critical patent/CN113868395A/en
Publication of CN113868395A publication Critical patent/CN113868395A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03Data mining


Abstract

The application discloses a method, a system, an electronic device, and a medium for building a multi-round dialogue generative model. The method comprises the following steps: constructing an initial multi-round dialogue generative model from an encoding layer based on an attention mechanism and a decoding layer based on an LSTM network; processing a text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text; and performing back-propagation calculation on the initial model through the response text to update the encoding layer, obtaining the final multi-round dialogue generative model. By designing an attention mechanism for the multi-round dialogue scenario, the invention solves the problem of storing previous-round dialogue information and improves the utilization rate and mining degree of that information.

Description

Method and system for building a multi-round dialogue generative model, electronic device, and medium
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a method, a system, an electronic device, and a medium for building a multi-round dialogue generative model.
Background
In the prior art, multi-round dialogue generation models are mainly built in two ways: pipeline-based methods and deep-learning-network-based methods. A pipeline-based dialogue generation method mainly comprises three parts: natural language understanding, dialogue state management, and natural language generation. Because the overall performance of such a model is limited by each of these parts, its generalization capability is poor. A deep-learning-based multi-round dialogue generation method is mainly limited by how previous-round dialogue information is stored and used: the background information grows as the number of dialogue rounds increases, and basic factors such as the dialogue pattern and sequence length are not controlled. How to solve the storage problem of previous-round dialogue information and improve its utilization rate and mining degree has therefore become an urgent problem.
Disclosure of Invention
The embodiments of the present application provide a method, a system, an electronic device, and a medium for building a multi-round dialogue generative model, which at least solve problems such as low dialogue generation quality, low utilization rate and mining degree of previous-round dialogue information, and unreasonable storage of dialogue information.
The invention provides a method for building a multi-round dialogue generative model, comprising the following steps:
constructing an initial multi-round dialogue generating model based on an encoding layer of an attention mechanism and a decoding layer of an LSTM network;
processing a text through the coding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
and performing back-propagation calculation on the initial multi-round dialogue generative model through the response text to update the encoding layer, obtaining the final multi-round dialogue generative model.
In the above method for establishing a multi-round dialog generation model, the processing the text by the coding layer to obtain a text vector and an attention distribution vector, and the processing the text vector and the attention distribution vector by the decoding layer to obtain a response text includes:
performing feature extraction on the text to obtain text features, and performing vectorization on the text features to obtain text vectors;
the text vectors are subjected to scoring processing through the key matrix and the value matrix of the coding layer to obtain the attention distribution vectors;
splicing the text vector and the attention distribution vector to obtain a spliced vector;
and obtaining response text according to the splicing vector through a decoding layer.
In the above method for establishing a multi-round dialog generation model, the step of obtaining a concatenation vector after concatenating the text vector and the attention distribution vector includes:
and processing the text vector and the attention distribution vector based on a jump connection mode to obtain the splicing vector.
In the above method for establishing a multi-round dialog generation model, the step of obtaining the attention distribution vector after scoring the text vector by the key matrix and the value matrix of the coding layer includes:
performing product operation on the text vector according to the key matrix to obtain an operation result;
and performing product operation on the operation result according to the value matrix to obtain the attention distribution vector.
In the above method for establishing a multi-round dialog generation model, the processing the text by the coding layer to obtain a text vector and an attention distribution vector, and the processing the text vector and the attention distribution vector by the decoding layer to obtain a response text further includes:
and adjusting the dimension of the splicing vector through a multi-layer perceptron structure of the coding layer.
In the above method for establishing a multi-round dialog generating model, the step of updating the coding layer by performing back propagation calculation on the multi-round dialog generating model through the response text to obtain a final multi-round dialog generating model includes:
and after the initial multi-turn dialog generating model is subjected to back propagation calculation according to the response text to obtain a loss function value, updating model parameters according to the loss function value to obtain the final multi-turn dialog generating model.
In the above method for building a multi-round dialogue generative model, the model parameters include at least one of the key matrix and the value matrix.
The present invention also provides a multi-round dialogue generating model building system, wherein the multi-round dialogue generating model building system is suitable for the multi-round dialogue generating model building method, and comprises:
the encoding layer construction unit is used for constructing an encoding layer based on an attention mechanism, processing a text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
and the decoding layer construction unit is used for constructing a decoding layer based on an LSTM network, and updating the coding layer by performing back propagation calculation on the initial multi-turn dialogue generating model through the response text to obtain a final multi-turn dialogue generating model.
The present invention also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements any one of the above-mentioned methods for creating a multi-turn dialog generation model when executing the computer program.
The invention further provides an electronic-device-readable storage medium storing computer program instructions which, when executed by a processor, implement any one of the above-mentioned methods for building a multi-round dialogue generative model.
Compared with the prior art, in the method, system, electronic device, and medium provided by the invention, when forward-propagation calculation is performed on the text during model training, attention calculation is performed for each round of text using the value matrix and key matrix of the previous round, which contain the information of all previous dialogue rounds. During back-propagation, the information of the current round is written into the value matrix and key matrix for use in later rounds, so that previous-round dialogue information is stored simply, using only two matrices. This solves the problem that unreasonable storage of previous-round dialogue information leads to its low utilization rate and mining degree, and improves natural language processing capability.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flowchart of a multi-round dialogue generative model building method according to an embodiment of the present application;
FIG. 2 is a framework diagram of a multi-round dialogue generative model according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a multi-round dialogue generative model building system according to the present invention;
FIG. 4 is a framework diagram of an electronic device according to an embodiment of the present application.
Wherein the reference numerals are:
an encoding layer construction unit: 51;
a decoding layer construction unit: 52;
a bus: 80;
a processor: 81;
a memory: 82;
a communication interface: 83.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that such a development effort might be complex and tedious, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as a limitation of this disclosure.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
Through the design and introduction of an attention mechanism, the invention builds a model that reasonably stores and utilizes previous-round dialogue information, which has positive significance for analyzing user dialogue scenarios, building user portraits, and improving dialogue generation quality.
The present invention will be described with reference to specific examples.
Example one
The present embodiment provides a multi-round dialogue generative model building method. Referring to fig. 1 and fig. 2, fig. 1 is a flowchart of the method according to an embodiment of the present application, and fig. 2 is a framework diagram of the multi-round dialogue generative model according to an embodiment of the present application. As shown in fig. 1 and fig. 2, the method includes the following steps:
step S1: constructing an initial multi-round dialogue generating model based on an encoding layer of an attention mechanism and a decoding layer of an LSTM network;
step S2: processing the text through the coding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
step S3: performing back-propagation calculation on the initial multi-round dialogue generative model through the response text to update the encoding layer, so as to obtain the final multi-round dialogue generative model.
In one embodiment, after the initial multi-round dialogue generative model is constructed from the attention-based encoding layer and the LSTM decoding layer, the initialized model is trained as follows. First, forward-propagation calculation is performed on the text data input by the user: the text is processed through the encoding layer to obtain a text vector and an attention distribution vector, and these are processed through the decoding layer to obtain a response text. Then, back-propagation calculation is performed on the model according to the forward-propagation result: a loss function value is computed from the response text, and the model parameters are updated according to that value. The final multi-round dialogue generative model is thereby obtained.
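The round-by-round flow described in this embodiment can be sketched as follows. This is a minimal, illustrative sketch, not the patent's implementation: all names are invented, the normalization of attention scores is omitted, and the gradients are stand-in constants where a real system would back-propagate from the decoder's loss.

```python
import random

random.seed(0)

def init_matrix(n):
    # Round 1: the key and value matrices are initialized randomly.
    return [[random.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n)]

def matvec(m, v):
    # Multiply matrix m (a list of rows) by vector v.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def forward(text_vec, key_m, value_m):
    # Attention calculation: score with the key matrix, then apply the
    # value matrix (normalization omitted in this sketch).
    return matvec(value_m, matvec(key_m, text_vec))

def update(matrix, grad, lr=0.01):
    # Back-propagation step: nudge each entry against its gradient.
    return [[m - lr * g for m, g in zip(mr, gr)] for mr, gr in zip(matrix, grad)]

n = 3
key_m, value_m = init_matrix(n), init_matrix(n)
rounds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy per-round text vectors
for text_vec in rounds:
    attn = forward(text_vec, key_m, value_m)  # uses the previous round's matrices
    grad = [[1.0] * n for _ in range(n)]      # stand-in for real gradients
    key_m = update(key_m, grad)
    value_m = update(value_m, grad)
```

Because the updated matrices are carried into the next round, the two matrices alone store the accumulated dialogue history, which is the storage scheme the description emphasizes.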
In an embodiment, the step S2 of processing the text through the encoding layer to obtain the text vector and the attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain the response text includes:
performing feature extraction on the text to obtain text features, and performing vectorization on the text features to obtain text vectors;
scoring the text vector through the key matrix and the value matrix of the encoding layer to obtain the attention distribution vector;
splicing the text vector and the attention distribution vector to obtain a splicing vector;
and obtaining response text according to the splicing vector through a decoding layer.
In an embodiment, the step of obtaining the attention distribution vector after scoring the text vector by the key matrix and the value matrix of the coding layer includes:
performing product operation on the text vector according to the key matrix to obtain an operation result;
and performing product operation on the operation result according to the value matrix to obtain an attention distribution vector.
In a specific implementation, after a text is input into the encoding layer, the encoding layer performs feature extraction on the text to obtain text features and vectorizes them to obtain a text vector. The text vector is multiplied by the key matrix to obtain an operation result, and the operation result is multiplied by the value matrix; this scores the text vector, that is, it determines where the attention of the text vector should be distributed, yielding the attention distribution vector. During training through forward- and back-propagation calculation, the value matrix and key matrix are initialized randomly in the first dialogue round; after the final loss function result is obtained, the model is updated and the two matrices are optimized. In each later dialogue round, the forward-propagation calculation uses the value matrix and key matrix obtained in the previous round of training.
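The scoring step above can be sketched as below. This is an illustrative reading of the text, not the patent's code: the softmax normalization is an assumption (the description only says the text vector is "scored"), and all names and example values are invented.

```python
import math

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(scores):
    """Turn raw scores into a probability-like attention distribution."""
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_distribution(text_vec, key_matrix, value_matrix):
    scores = matvec(key_matrix, text_vec)   # product with the key matrix
    weights = softmax(scores)               # where attention is distributed
    return matvec(value_matrix, weights)    # product with the value matrix

text_vec = [1.0, 0.0, 1.0]
K = [[0.5, 0.1, 0.2], [0.3, 0.7, 0.1], [0.2, 0.2, 0.6]]
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity, for illustration
attn = attention_distribution(text_vec, K, V)
```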
In an embodiment, the step of obtaining a stitching vector after stitching the text vector and the attention distribution vector includes:
and processing the text vector and the attention distribution vector based on a jump connection mode to obtain a splicing vector.
In a specific implementation, following the skip-connection idea of residual networks, the text vector and the attention distribution vector are summed, that is, vector addition is performed, to obtain the spliced vector.
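A minimal sketch of this residual-style combination (the function name is illustrative, not from the patent):

```python
def splice(text_vec, attn_vec):
    """Skip-connection combination: element-wise sum of the two vectors."""
    if len(text_vec) != len(attn_vec):
        raise ValueError("vectors must have the same dimension")
    return [t + a for t, a in zip(text_vec, attn_vec)]

spliced = splice([1.0, 2.0, 3.0], [0.5, -0.5, 0.0])
```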
In an embodiment, the step of processing the text by the encoding layer to obtain the text vector and the attention distribution vector, and the step of processing the text vector and the attention distribution vector by the decoding layer to obtain the response text further includes:
and adjusting the dimension of the splicing vector through a multi-layer perceptron structure of the coding layer.
In a specific implementation, the dimension of the spliced vector is adjusted through the multi-layer perceptron structure of the encoding layer to match the input-vector dimension expected by the LSTM network in the decoding layer.
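The dimension adjustment can be sketched as a single linear layer; a real multi-layer perceptron would stack several such layers with nonlinearities, and the weights would be learned rather than random, as assumed in this hypothetical example:

```python
import random

random.seed(1)

def linear_project(vec, out_dim):
    """Map vec to out_dim (the LSTM's expected input size).
    Weights are random placeholders for learned parameters."""
    in_dim = len(vec)
    weights = [[random.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

adjusted = linear_project([1.0, 2.0, 3.0, 4.0], out_dim=2)
```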
In an embodiment, the step S3 of updating the encoding layer by performing a back propagation calculation on the multi-turn dialog generating model through the response text to obtain a final multi-turn dialog generating model includes:
and after the initial multi-turn dialogue generating model is subjected to back propagation calculation according to the response text to obtain a loss function value, updating the model parameters according to the loss function value to obtain a final multi-turn dialogue generating model.
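The loss-function step might look like the following. Cross-entropy on the response tokens is one common choice for generative dialogue models, but the patent does not name a specific loss, so this is an assumption; all names are invented.

```python
import math

def cross_entropy(pred_probs, target_idx):
    """Negative log-likelihood of the target response token."""
    return -math.log(pred_probs[target_idx])

def response_loss(per_token_probs, target_ids):
    """Average token loss over a response text."""
    losses = [cross_entropy(p, t) for p, t in zip(per_token_probs, target_ids)]
    return sum(losses) / len(losses)

loss = response_loss([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]], [1, 0])
```

The resulting scalar is what back-propagation differentiates to update the model parameters, including the key and value matrices.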
In an embodiment, the model parameters include at least one of a key matrix and a value matrix.
Example two
Referring to fig. 3, fig. 3 is a schematic structural diagram of a multi-round dialog generation model building system according to the present invention. As shown in fig. 3, the multi-round dialogue generating model building system according to the present invention is applicable to the above-mentioned multi-round dialogue generating model building method, and includes:
the encoding layer construction unit 51 is configured to construct an encoding layer based on an attention mechanism, process a text through the encoding layer to obtain a text vector and an attention distribution vector, and process the text vector and the attention distribution vector through the decoding layer to obtain a response text;
and a decoding layer construction unit 52 which constructs a decoding layer based on the LSTM network, and updates the encoding layer by performing back propagation calculation on the initial multi-round dialog generation model through the response text to obtain a final multi-round dialog generation model.
EXAMPLE III
Referring to fig. 4, this embodiment discloses a specific implementation of an electronic device. The electronic device may include a processor 81 and a memory 82 storing computer program instructions.
Specifically, the processor 81 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
The memory 82 may include mass storage for data or instructions. By way of example, and not limitation, the memory 82 may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory 82 may include removable or non-removable (or fixed) media, where appropriate. The memory 82 may be internal or external to the electronic device, where appropriate. In a particular embodiment, the memory 82 is non-volatile memory. In particular embodiments, the memory 82 includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate. The RAM may be Static Random-Access Memory (SRAM) or Dynamic Random-Access Memory (DRAM), where the DRAM may be Fast Page Mode DRAM (FPM DRAM), Extended Data Out DRAM (EDO DRAM), Synchronous DRAM (SDRAM), and the like.
The memory 82 may be used to store or cache various data files for processing and/or communication use, as well as possible computer program instructions executed by the processor 81.
The processor 81 implements any of the multi-round dialog-generating model building methods of the above embodiments by reading and executing computer program instructions stored in the memory 82.
In some of these embodiments, the electronic device may also include a communication interface 83 and a bus 80. As shown in fig. 4, the processor 81, the memory 82, and the communication interface 83 are connected via the bus 80 to complete communication therebetween.
The communication interface 83 is used for implementing communication between modules, devices, units, and/or equipment in the embodiments of the present application. The communication interface 83 may also be used for data communication with external components such as external devices, databases, external storage, and workstations.
The bus 80 includes hardware, software, or both, coupling the components of the electronic device to one another. The bus 80 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example, and not limitation, the bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. The bus 80 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The electronic device may be connected to a multi-round dialogue generative model building system to implement the method described in conjunction with figs. 1 and 2.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
In summary, through the storage scheme for previous-round dialogue information, the attention design, and the introduction of the residual network, the invention builds a model that reasonably stores and utilizes previous-round dialogue information. It solves problems such as low dialogue generation quality, low utilization rate and mining degree of previous-round dialogue information, and unreasonable storage of dialogue information; it analyzes the user's dialogue situation to build a user portrait, improves the utilization rate and mining degree of previous-round dialogue information, and has positive significance for improving dialogue generation quality.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent application shall be subject to the protection scope of the appended claims.

Claims (10)

1. A method for building a multi-round dialogue generative model, characterized in that the method, applied to a multi-round dialogue scenario, comprises the following steps:
constructing an initial multi-turn dialogue generative model from an attention-mechanism-based encoding layer and an LSTM-network-based decoding layer;
processing a text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
and updating the encoding layer by performing back-propagation calculation on the initial multi-turn dialogue generative model through the response text, to obtain a final multi-turn dialogue generative model.
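The architecture recited in claim 1 can be sketched as follows. This is an illustrative NumPy toy, not the patent's actual implementation: the layer sizes, the single LSTM cell, and the softmax normalization of the attention scores are all assumptions.

```python
import numpy as np

class AttentionEncoderLSTMDecoder:
    """Toy skeleton: attention-based encoding layer + LSTM-cell decoding layer."""

    def __init__(self, dim=8, hidden=8, vocab=16, seed=0):
        rng = np.random.default_rng(seed)
        # Encoding layer: key and value matrices used for attention scoring
        self.K = rng.normal(size=(dim, dim)) * 0.1
        self.V = rng.normal(size=(dim, dim)) * 0.1
        # Decoding layer: a single LSTM cell whose input is the text vector
        # concatenated with the attention distribution vector
        self.W = rng.normal(size=(4 * hidden, 2 * dim + hidden)) * 0.1
        self.b = np.zeros(4 * hidden)
        self.out = rng.normal(size=(vocab, hidden)) * 0.1

    def encode(self, x):
        scores = self.K @ x                      # score with the key matrix
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                 # softmax (an assumption)
        attn = self.V @ weights                  # weight with the value matrix
        return x, attn

    def decode_step(self, x, attn, h, c):
        sig = lambda v: 1.0 / (1.0 + np.exp(-v))
        z = self.W @ np.concatenate([x, attn, h]) + self.b
        i, f, g, o = np.split(z, 4)              # LSTM input/forget/cell/output gates
        c = sig(f) * c + sig(i) * np.tanh(g)
        h = sig(o) * np.tanh(c)
        return self.out @ h, h, c                # token logits + new state

model = AttentionEncoderLSTMDecoder()
text_vec, attn_vec = model.encode(np.ones(8))
logits, h, c = model.decode_step(text_vec, attn_vec, np.zeros(8), np.zeros(8))
print(logits.shape)
```

In a real system `decode_step` would be applied repeatedly to emit the response text token by token; the back-propagation step of claim 1 would then update `K` and `V`.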
2. The method as claimed in claim 1, wherein the step of processing the text through the encoding layer to obtain the text vector and the attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain the response text comprises:
performing feature extraction on the text to obtain text features, and vectorizing the text features to obtain the text vector;
scoring the text vector through a key matrix and a value matrix of the encoding layer to obtain the attention distribution vector;
concatenating the text vector and the attention distribution vector to obtain a concatenated vector;
and obtaining the response text through the decoding layer according to the concatenated vector.
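The four steps of claim 2 can be traced end to end in a small sketch. The hashing-trick vectorizer and the linear readout are stand-ins (the claim does not specify the feature extractor, and the real decoding layer is an LSTM); all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, vocab = 8, 16

# Step 1: feature extraction and vectorization (a hashing-trick stand-in)
def vectorize(text, dim=dim):
    v = np.zeros(dim)
    for tok in text.split():
        v[hash(tok) % dim] += 1.0
    return v / max(1.0, float(np.linalg.norm(v)))

K = rng.normal(size=(dim, dim))        # key matrix of the encoding layer
V = rng.normal(size=(dim, dim))        # value matrix of the encoding layer

text_vec = vectorize("how is the weather today")

# Step 2: score the text vector with the key/value matrices to obtain
# the attention distribution vector
scores = K @ text_vec
weights = np.exp(scores - scores.max())
weights /= weights.sum()
attn_vec = V @ weights

# Step 3: concatenate the text vector and the attention distribution vector
concat_vec = np.concatenate([text_vec, attn_vec])

# Step 4: the decoding layer maps the concatenation to response-token scores
# (a linear readout stands in for the LSTM decoder here)
readout = rng.normal(size=(vocab, 2 * dim))
response_logits = readout @ concat_vec
print(concat_vec.shape, response_logits.shape)
```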
3. The method of claim 2, wherein the step of concatenating the text vector and the attention distribution vector to obtain the concatenated vector comprises:
processing the text vector and the attention distribution vector in a skip-connection manner to obtain the concatenated vector.
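One plausible reading of the skip-connection combination in claim 3, shown against plain concatenation; the residual-add-then-concatenate form below is an assumption about how the two vectors are combined, not a detail from the patent.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])        # text vector
attn = np.array([0.1, 0.2, 0.3])     # attention distribution vector

# Plain concatenation: both signals side by side
concat = np.concatenate([x, attn])

# Skip-connection style: the original text vector also flows past the
# attention output (residual add), and the sum is concatenated with x
skip = np.concatenate([x, x + attn])
print(skip)
```

The skip path lets gradients reach the text vector directly during the back-propagation step of claim 1, which is the usual motivation for residual connections.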
4. The method as claimed in claim 2, wherein the step of scoring the text vector through the key matrix and the value matrix of the encoding layer to obtain the attention distribution vector comprises:
performing a product operation on the text vector with the key matrix to obtain an operation result;
and performing a product operation on the operation result with the value matrix to obtain the attention distribution vector.
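The two product operations of claim 4 amount to two matrix-vector products. The softmax normalization between them is a common choice in attention mechanisms but is an assumption here; the claim itself only recites the two products.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
K = rng.normal(size=(dim, dim))      # key matrix
V = rng.normal(size=(dim, dim))      # value matrix
text_vec = rng.normal(size=dim)

# First product operation: key matrix x text vector -> operation result
scores = K @ text_vec

# Normalize the scores into a distribution (softmax; assumed, not recited)
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Second product operation: value matrix x operation result
attn_vec = V @ weights
print(attn_vec.shape)
```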
5. The method as claimed in claim 2, wherein the step of processing the text through the encoding layer to obtain the text vector and the attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain the response text further comprises:
adjusting the dimension of the concatenated vector through a multi-layer perceptron structure of the encoding layer.
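The dimension adjustment of claim 5 can be sketched as a small perceptron that projects the 2*dim concatenation back down to dim, so the decoder input size stays fixed. The two-layer shape, ReLU activation, and all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
concat_vec = rng.normal(size=2 * dim)   # text vector ++ attention vector

# Two-layer perceptron: 2*dim -> dim -> dim (sizes are illustrative)
W1, b1 = rng.normal(size=(dim, 2 * dim)) * 0.1, np.zeros(dim)
W2, b2 = rng.normal(size=(dim, dim)) * 0.1, np.zeros(dim)

hidden = np.maximum(0.0, W1 @ concat_vec + b1)   # ReLU activation
adjusted = W2 @ hidden + b2                      # back to the decoder's dim
print(adjusted.shape)
```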
6. The method as claimed in claim 5, wherein the step of updating the encoding layer by performing back-propagation calculation on the initial multi-turn dialogue generative model through the response text to obtain the final multi-turn dialogue generative model comprises:
performing back-propagation calculation on the initial multi-turn dialogue generative model according to the response text to obtain a loss function value, and then updating the model parameters according to the loss function value to obtain the final multi-turn dialogue generative model.
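The loss-then-update cycle of claim 6 can be sketched for a single trainable matrix (standing in for the key/value matrices of claim 7). The cross-entropy loss, learning rate, and gradient form below are conventional assumptions, not details recited in the claim.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 6, 4
V = rng.normal(size=(vocab, dim)) * 0.1   # stand-in trainable matrix
x = np.full(dim, 0.5)                     # encoded context vector
target = 2                                # index of the gold response token

def train_step(V, lr=0.1):
    logits = V @ x
    p = np.exp(logits - logits.max())
    p /= p.sum()
    loss = -np.log(p[target])             # cross-entropy loss function value
    # Back-propagation: gradient of the loss w.r.t. V, then a gradient step
    grad = np.outer(p - np.eye(vocab)[target], x)
    return V - lr * grad, loss

losses = []
for _ in range(50):
    V, loss = train_step(V)
    losses.append(loss)
print(round(losses[0], 3), round(losses[-1], 3))
```

The loss value shrinks across updates, which is the observable effect of the parameter update the claim recites.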
7. The method of claim 6, wherein the model parameters comprise at least one of the key matrix and the value matrix.
8. A multi-turn dialogue generative model establishing system implementing the multi-turn dialogue generative model establishing method of any one of claims 1 to 7, comprising:
an encoding layer construction unit, configured to construct the encoding layer based on an attention mechanism, process a text through the encoding layer to obtain a text vector and an attention distribution vector, and process the text vector and the attention distribution vector through the decoding layer to obtain a response text;
and a decoding layer construction unit, configured to construct the decoding layer based on an LSTM network, and to update the encoding layer by performing back-propagation calculation on the initial multi-turn dialogue generative model through the response text, to obtain a final multi-turn dialogue generative model.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the multi-turn dialogue generative model establishing method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the multi-turn dialogue generative model establishing method of any one of claims 1 to 7.
CN202111180118.3A 2021-10-11 2021-10-11 Multi-round dialogue generation type model establishing method and system, electronic equipment and medium Pending CN113868395A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111180118.3A CN113868395A (en) 2021-10-11 2021-10-11 Multi-round dialogue generation type model establishing method and system, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111180118.3A CN113868395A (en) 2021-10-11 2021-10-11 Multi-round dialogue generation type model establishing method and system, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN113868395A true CN113868395A (en) 2021-12-31

Family

ID=79002470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111180118.3A Pending CN113868395A (en) 2021-10-11 2021-10-11 Multi-round dialogue generation type model establishing method and system, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN113868395A (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776578A (en) * 2017-01-03 2017-05-31 竹间智能科技(上海)有限公司 Method and device for improving the dialogue performance of a conversational system
US20170372200A1 (en) * 2016-06-23 2017-12-28 Microsoft Technology Licensing, Llc End-to-end memory networks for contextual language understanding
US20180329884A1 (en) * 2017-05-12 2018-11-15 Rsvp Technologies Inc. Neural contextual conversation learning
US20180329998A1 (en) * 2017-05-15 2018-11-15 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
EP3486842A1 (en) * 2017-11-17 2019-05-22 Digital Genius Limited Template generation for a conversational agent
CN110032633A (en) * 2019-04-17 2019-07-19 腾讯科技(深圳)有限公司 Multi-turn dialogue processing method, apparatus and device
CN110413752A (en) * 2019-07-22 2019-11-05 中国科学院自动化研究所 Multi-turn spoken language understanding method, system and device based on dialogue logic
US20200098353A1 (en) * 2018-09-28 2020-03-26 Capital One Services, Llc Adversarial learning framework for persona-based dialogue modeling
CN110929476A (en) * 2019-09-27 2020-03-27 中国人民解放军63626部队 Task type multi-round dialogue model construction method based on mixed granularity attention mechanism
US20200285705A1 (en) * 2019-03-05 2020-09-10 Salesforce.Com, Inc. Agent persona grounded chit-chat generation framework
CN112231457A (en) * 2020-10-19 2021-01-15 北京明略昭辉科技有限公司 Multi-turn dialogue generation method and device for chatting robot and chatting robot
US20210082398A1 (en) * 2019-09-13 2021-03-18 Mitsubishi Electric Research Laboratories, Inc. System and Method for a Dialogue Response Generation System
CN113239174A (en) * 2021-06-09 2021-08-10 华南师范大学 Hierarchical multi-round conversation generation method and device based on double-layer decoding
CN113342947A (en) * 2021-05-26 2021-09-03 华南师范大学 Multi-round dialog text generation method capable of sensing dialog context relative position information
US11132988B1 (en) * 2020-10-22 2021-09-28 PolyAI Limited Dialogue system, a dialogue method, and a method of training



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination