CN113868395A - Method and system for building a multi-round dialogue generative model, electronic device, and medium - Google Patents

Method and system for building a multi-round dialogue generative model, electronic device, and medium

Info

Publication number
CN113868395A
CN113868395A
Authority
CN
China
Prior art keywords
text
vector
layer
attention distribution
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111180118.3A
Other languages
Chinese (zh)
Inventor
刘伟硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN202111180118.3A priority Critical patent/CN113868395A/en
Publication of CN113868395A publication Critical patent/CN113868395A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03Data mining


Abstract

The application discloses a method, a system, an electronic device, and a medium for building a multi-round dialogue generative model. The method comprises the following steps: constructing an initial multi-round dialogue generative model from an encoding layer based on an attention mechanism and a decoding layer based on an LSTM network; processing a text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text; and performing back-propagation calculation on the initial model through the response text to update the encoding layer, obtaining the final multi-round dialogue generative model. By designing an attention mechanism for the multi-round dialogue scenario, the invention solves the problem of storing previous-round dialogue information and improves the utilization rate and mining degree of that information.

Description

Method and system for building a multi-round dialogue generative model, electronic device, and medium
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a method, a system, an electronic device, and a medium for building a multi-round dialogue generative model.
Background
In the prior art, multi-round dialogue generation models are mainly built in two ways: pipeline-based methods and deep-learning-network-based methods. A pipeline-based dialogue generation method mainly comprises three parts: natural language understanding, dialogue state management, and natural language generation. Because the overall performance of such a model is limited by each of these parts, its generalization capability is poor. A deep-learning-based multi-round dialogue generation method is mainly limited by how previous-round dialogue information is stored and used: the background information grows as the number of dialogue rounds increases, and basic factors such as the dialogue pattern and sequence length are not controlled. How to solve the storage problem of previous-round dialogue information and improve its utilization rate and mining degree has therefore become an urgent problem.
Disclosure of Invention
The embodiments of the present application provide a method, a system, an electronic device, and a medium for building a multi-round dialogue generative model, which at least solve problems such as low dialogue generation quality, low utilization rate and mining degree of previous-round dialogue information, and unreasonable storage of dialogue information.
The invention provides a method for building a multi-round dialogue generative model, comprising the following steps:
constructing an initial multi-round dialogue generating model based on an encoding layer of an attention mechanism and a decoding layer of an LSTM network;
processing a text through the coding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
and performing back-propagation calculation on the initial multi-round dialogue generative model through the response text to update the encoding layer, obtaining the final multi-round dialogue generative model.
In the above method for establishing a multi-round dialog generation model, the processing the text by the coding layer to obtain a text vector and an attention distribution vector, and the processing the text vector and the attention distribution vector by the decoding layer to obtain a response text includes:
performing feature extraction on the text to obtain text features, and performing vectorization on the text features to obtain text vectors;
the text vectors are subjected to scoring processing through the key matrix and the value matrix of the coding layer to obtain the attention distribution vectors;
splicing the text vector and the attention distribution vector to obtain a spliced vector;
and obtaining response text according to the splicing vector through a decoding layer.
In the above method for establishing a multi-round dialog generation model, the step of obtaining a concatenation vector after concatenating the text vector and the attention distribution vector includes:
and processing the text vector and the attention distribution vector based on a jump connection mode to obtain the splicing vector.
In the above method for establishing a multi-round dialog generation model, the step of obtaining the attention distribution vector after scoring the text vector by the key matrix and the value matrix of the coding layer includes:
performing product operation on the text vector according to the key matrix to obtain an operation result;
and performing product operation on the operation result according to the value matrix to obtain the attention distribution vector.
In the above method for establishing a multi-round dialog generation model, the processing the text by the coding layer to obtain a text vector and an attention distribution vector, and the processing the text vector and the attention distribution vector by the decoding layer to obtain a response text further includes:
and adjusting the dimension of the splicing vector through a multi-layer perceptron structure of the coding layer.
In the above method for establishing a multi-round dialog generating model, the step of updating the coding layer by performing back propagation calculation on the multi-round dialog generating model through the response text to obtain a final multi-round dialog generating model includes:
and after the initial multi-turn dialog generating model is subjected to back propagation calculation according to the response text to obtain a loss function value, updating model parameters according to the loss function value to obtain the final multi-turn dialog generating model.
In the above method for building a multi-round dialogue generative model, the model parameters include at least one of the key matrix and the value matrix.
The present invention also provides a multi-round dialogue generating model building system, wherein the multi-round dialogue generating model building system is suitable for the multi-round dialogue generating model building method, and comprises:
the encoding layer construction unit is used for constructing an encoding layer based on an attention mechanism, processing a text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
and the decoding layer construction unit is used for constructing a decoding layer based on an LSTM network, and updating the coding layer by performing back propagation calculation on the initial multi-turn dialogue generating model through the response text to obtain a final multi-turn dialogue generating model.
The present invention also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements any one of the above-mentioned methods for creating a multi-turn dialog generation model when executing the computer program.
The invention further provides an electronic-device-readable storage medium storing computer program instructions which, when executed by a processor, implement any one of the above-mentioned methods for building a multi-round dialogue generative model.
Compared with the prior art, in the method, system, electronic device, and medium provided by the invention, when forward-propagation calculation is performed on the text during model training, attention calculation is performed for each round of text using the value matrix and key matrix of the previous round, which contain the information of all previous dialogue rounds. During back-propagation, the information of the current round is written into the value matrix and key matrix for use in later rounds, so that previous-round dialogue information is stored simply, using only two matrices. This solves the problem that unreasonable storage of previous-round dialogue information leads to its low utilization rate and mining degree, and improves natural language processing capability.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flowchart of a multi-round dialogue generative model building method according to an embodiment of the present application;
FIG. 2 is a framework diagram of a multi-round dialogue generative model according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a multi-round dialogue generative model building system according to the present invention;
FIG. 4 is a framework diagram of an electronic device according to an embodiment of the present application.
Wherein the reference numerals are:
an encoding layer construction unit: 51;
a decoding layer construction unit: 52;
a bus: 80;
a processor: 81;
a memory: 82;
a communication interface: 83.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that such a development effort might be complex and tedious, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as a limitation of this disclosure.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
Through the design and introduction of an attention mechanism, the invention builds a model that reasonably stores and utilizes previous-round dialogue information, which has positive significance for analyzing user dialogue scenarios, building user portraits, and improving dialogue generation quality.
The present invention will be described with reference to specific examples.
Example one
The present embodiment provides a multi-round dialogue generative model building method. Referring to fig. 1 and fig. 2, fig. 1 is a flowchart of the method according to an embodiment of the present application, and fig. 2 is a framework diagram of the multi-round dialogue generative model according to an embodiment of the present application. As shown in fig. 1 and fig. 2, the method includes the following steps:
step S1: constructing an initial multi-round dialogue generating model based on an encoding layer of an attention mechanism and a decoding layer of an LSTM network;
step S2: processing the text through the coding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
step S3: performing back-propagation calculation on the initial multi-round dialogue generative model through the response text to update the encoding layer, so as to obtain the final multi-round dialogue generative model.
In one embodiment, after the initial multi-round dialogue generative model is constructed from the attention-based encoding layer and the LSTM decoding layer, the initialized model is trained as follows. First, forward-propagation calculation is performed on the text data input by the user: the text is processed through the encoding layer to obtain a text vector and an attention distribution vector, and these are processed through the decoding layer to obtain a response text. Then, back-propagation calculation is performed on the model according to the forward-propagation result: a loss function value is computed from the response text, and the model parameters are updated according to that value. The final multi-round dialogue generative model is thereby obtained.
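The round-by-round flow described in this embodiment can be sketched as follows. This is a minimal, illustrative sketch, not the patent's implementation: all names are invented, the normalization of attention scores is omitted, and the gradients are stand-in constants where a real system would back-propagate from the decoder's loss.

```python
import random

random.seed(0)

def init_matrix(n):
    # Round 1: the key and value matrices are initialized randomly.
    return [[random.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n)]

def matvec(m, v):
    # Multiply matrix m (a list of rows) by vector v.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def forward(text_vec, key_m, value_m):
    # Attention calculation: score with the key matrix, then apply the
    # value matrix (normalization omitted in this sketch).
    return matvec(value_m, matvec(key_m, text_vec))

def update(matrix, grad, lr=0.01):
    # Back-propagation step: nudge each entry against its gradient.
    return [[m - lr * g for m, g in zip(mr, gr)] for mr, gr in zip(matrix, grad)]

n = 3
key_m, value_m = init_matrix(n), init_matrix(n)
rounds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy per-round text vectors
for text_vec in rounds:
    attn = forward(text_vec, key_m, value_m)  # uses the previous round's matrices
    grad = [[1.0] * n for _ in range(n)]      # stand-in for real gradients
    key_m = update(key_m, grad)
    value_m = update(value_m, grad)
```

Because the updated matrices are carried into the next round, the two matrices alone store the accumulated dialogue history, which is the storage scheme the description emphasizes.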
In an embodiment, the step S2 of processing the text through the encoding layer to obtain the text vector and the attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain the response text includes:
performing feature extraction on the text to obtain text features, and performing vectorization on the text features to obtain text vectors;
scoring the text vector through the key matrix and the value matrix of the encoding layer to obtain the attention distribution vector;
splicing the text vector and the attention distribution vector to obtain a splicing vector;
and obtaining response text according to the splicing vector through a decoding layer.
In an embodiment, the step of obtaining the attention distribution vector after scoring the text vector by the key matrix and the value matrix of the coding layer includes:
performing product operation on the text vector according to the key matrix to obtain an operation result;
and performing product operation on the operation result according to the value matrix to obtain an attention distribution vector.
In a specific implementation, after a text is input into the encoding layer, the encoding layer performs feature extraction on the text to obtain text features and vectorizes them to obtain a text vector. The text vector is multiplied by the key matrix to obtain an operation result, and the operation result is multiplied by the value matrix; this scores the text vector, that is, it determines where the attention of the text vector should be distributed, yielding the attention distribution vector. During training through forward- and back-propagation calculation, the value matrix and key matrix are initialized randomly in the first dialogue round; after the final loss function result is obtained, the model is updated and the two matrices are optimized. In each later dialogue round, the forward-propagation calculation uses the value matrix and key matrix obtained in the previous round of training.
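The scoring step above can be sketched as below. This is an illustrative reading of the text, not the patent's code: the softmax normalization is an assumption (the description only says the text vector is "scored"), and all names and example values are invented.

```python
import math

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(scores):
    """Turn raw scores into a probability-like attention distribution."""
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_distribution(text_vec, key_matrix, value_matrix):
    scores = matvec(key_matrix, text_vec)   # product with the key matrix
    weights = softmax(scores)               # where attention is distributed
    return matvec(value_matrix, weights)    # product with the value matrix

text_vec = [1.0, 0.0, 1.0]
K = [[0.5, 0.1, 0.2], [0.3, 0.7, 0.1], [0.2, 0.2, 0.6]]
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity, for illustration
attn = attention_distribution(text_vec, K, V)
```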
In an embodiment, the step of obtaining a stitching vector after stitching the text vector and the attention distribution vector includes:
and processing the text vector and the attention distribution vector based on a jump connection mode to obtain a splicing vector.
In a specific implementation, following the skip-connection idea of residual networks, the text vector and the attention distribution vector are summed, that is, vector addition is performed, to obtain the spliced vector.
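A minimal sketch of this residual-style combination (the function name is illustrative, not from the patent):

```python
def splice(text_vec, attn_vec):
    """Skip-connection combination: element-wise sum of the two vectors."""
    if len(text_vec) != len(attn_vec):
        raise ValueError("vectors must have the same dimension")
    return [t + a for t, a in zip(text_vec, attn_vec)]

spliced = splice([1.0, 2.0, 3.0], [0.5, -0.5, 0.0])
```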
In an embodiment, the step of processing the text by the encoding layer to obtain the text vector and the attention distribution vector, and the step of processing the text vector and the attention distribution vector by the decoding layer to obtain the response text further includes:
and adjusting the dimension of the splicing vector through a multi-layer perceptron structure of the coding layer.
In a specific implementation, the dimension of the spliced vector is adjusted through the multi-layer perceptron structure of the encoding layer to match the input-vector dimension expected by the LSTM network in the decoding layer.
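The dimension adjustment can be sketched as a single linear layer; a real multi-layer perceptron would stack several such layers with nonlinearities, and the weights would be learned rather than random, as assumed in this hypothetical example:

```python
import random

random.seed(1)

def linear_project(vec, out_dim):
    """Map vec to out_dim (the LSTM's expected input size).
    Weights are random placeholders for learned parameters."""
    in_dim = len(vec)
    weights = [[random.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

adjusted = linear_project([1.0, 2.0, 3.0, 4.0], out_dim=2)
```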
In an embodiment, the step S3 of updating the encoding layer by performing a back propagation calculation on the multi-turn dialog generating model through the response text to obtain a final multi-turn dialog generating model includes:
and after the initial multi-turn dialogue generating model is subjected to back propagation calculation according to the response text to obtain a loss function value, updating the model parameters according to the loss function value to obtain a final multi-turn dialogue generating model.
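The loss-function step might look like the following. Cross-entropy on the response tokens is one common choice for generative dialogue models, but the patent does not name a specific loss, so this is an assumption; all names are invented.

```python
import math

def cross_entropy(pred_probs, target_idx):
    """Negative log-likelihood of the target response token."""
    return -math.log(pred_probs[target_idx])

def response_loss(per_token_probs, target_ids):
    """Average token loss over a response text."""
    losses = [cross_entropy(p, t) for p, t in zip(per_token_probs, target_ids)]
    return sum(losses) / len(losses)

loss = response_loss([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]], [1, 0])
```

The resulting scalar is what back-propagation differentiates to update the model parameters, including the key and value matrices.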
In an embodiment, the model parameters include at least one of a key matrix and a value matrix.
Example two
Referring to fig. 3, fig. 3 is a schematic structural diagram of a multi-round dialog generation model building system according to the present invention. As shown in fig. 3, the multi-round dialogue generating model building system according to the present invention is applicable to the above-mentioned multi-round dialogue generating model building method, and includes:
the encoding layer construction unit 51 is configured to construct an encoding layer based on an attention mechanism, process a text through the encoding layer to obtain a text vector and an attention distribution vector, and process the text vector and the attention distribution vector through the decoding layer to obtain a response text;
and a decoding layer construction unit 52 which constructs a decoding layer based on the LSTM network, and updates the encoding layer by performing back propagation calculation on the initial multi-round dialog generation model through the response text to obtain a final multi-round dialog generation model.
EXAMPLE III
Referring to fig. 4, this embodiment discloses a specific implementation of an electronic device. The electronic device may include a processor 81 and a memory 82 storing computer program instructions.
Specifically, the processor 81 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
The memory 82 may include mass storage for data or instructions. By way of example, and not limitation, the memory 82 may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory 82 may include removable or non-removable (or fixed) media, where appropriate. The memory 82 may be internal or external to the electronic device, where appropriate. In a particular embodiment, the memory 82 is non-volatile memory. In particular embodiments, the memory 82 includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate. The RAM may be Static Random-Access Memory (SRAM) or Dynamic Random-Access Memory (DRAM), where the DRAM may be Fast Page Mode DRAM (FPM DRAM), Extended Data Out DRAM (EDO DRAM), Synchronous DRAM (SDRAM), and the like.
The memory 82 may be used to store or cache various data files for processing and/or communication use, as well as possible computer program instructions executed by the processor 81.
The processor 81 implements any of the multi-round dialog-generating model building methods of the above embodiments by reading and executing computer program instructions stored in the memory 82.
In some of these embodiments, the electronic device may also include a communication interface 83 and a bus 80. As shown in fig. 4, the processor 81, the memory 82, and the communication interface 83 are connected via the bus 80 to complete communication therebetween.
The communication interface 83 is used for implementing communication between modules, devices, units, and/or equipment in the embodiments of the present application. The communication interface 83 may also be used for data communication with external components such as external devices, databases, external storage, and workstations.
The bus 80 includes hardware, software, or both, coupling the components of the electronic device to one another. The bus 80 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example, and not limitation, the bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. The bus 80 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The electronic device may be connected to a multi-round dialogue generative model building system to implement the method described in conjunction with figs. 1 and 2.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
In summary, through the storage scheme for previous-round dialogue information, the attention design, and the introduction of the residual network, the invention builds a model that reasonably stores and utilizes previous-round dialogue information. It solves problems such as low dialogue generation quality, low utilization rate and mining degree of previous-round dialogue information, and unreasonable storage of dialogue information; it analyzes the user's dialogue situation to build a user portrait, improves the utilization rate and mining degree of previous-round dialogue information, and has positive significance for improving dialogue generation quality.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent application shall be subject to the protection scope of the appended claims.

Claims (10)

1. A method for building a multi-round dialogue generative model, characterized in that the method, applied to a multi-round dialogue scenario, comprises the following steps:
constructing an initial multi-turn dialogue generative model from an attention-mechanism-based encoding layer and an LSTM-network-based decoding layer;
processing a text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
and updating the encoding layer by performing back-propagation calculation on the initial multi-turn dialogue generative model through the response text, to obtain a final multi-turn dialogue generative model.
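The architecture recited in claim 1 can be sketched as follows. This is an illustrative NumPy toy, not the patent's actual implementation: the layer sizes, the single LSTM cell, and the softmax normalization of the attention scores are all assumptions.

```python
import numpy as np

class AttentionEncoderLSTMDecoder:
    """Toy skeleton: attention-based encoding layer + LSTM-cell decoding layer."""

    def __init__(self, dim=8, hidden=8, vocab=16, seed=0):
        rng = np.random.default_rng(seed)
        # Encoding layer: key and value matrices used for attention scoring
        self.K = rng.normal(size=(dim, dim)) * 0.1
        self.V = rng.normal(size=(dim, dim)) * 0.1
        # Decoding layer: a single LSTM cell whose input is the text vector
        # concatenated with the attention distribution vector
        self.W = rng.normal(size=(4 * hidden, 2 * dim + hidden)) * 0.1
        self.b = np.zeros(4 * hidden)
        self.out = rng.normal(size=(vocab, hidden)) * 0.1

    def encode(self, x):
        scores = self.K @ x                      # score with the key matrix
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                 # softmax (an assumption)
        attn = self.V @ weights                  # weight with the value matrix
        return x, attn

    def decode_step(self, x, attn, h, c):
        sig = lambda v: 1.0 / (1.0 + np.exp(-v))
        z = self.W @ np.concatenate([x, attn, h]) + self.b
        i, f, g, o = np.split(z, 4)              # LSTM input/forget/cell/output gates
        c = sig(f) * c + sig(i) * np.tanh(g)
        h = sig(o) * np.tanh(c)
        return self.out @ h, h, c                # token logits + new state

model = AttentionEncoderLSTMDecoder()
text_vec, attn_vec = model.encode(np.ones(8))
logits, h, c = model.decode_step(text_vec, attn_vec, np.zeros(8), np.zeros(8))
print(logits.shape)
```

In a real system `decode_step` would be applied repeatedly to emit the response text token by token; the back-propagation step of claim 1 would then update `K` and `V`.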
2. The method as claimed in claim 1, wherein the step of processing the text through the encoding layer to obtain the text vector and the attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain the response text comprises:
performing feature extraction on the text to obtain text features, and vectorizing the text features to obtain the text vector;
scoring the text vector through a key matrix and a value matrix of the encoding layer to obtain the attention distribution vector;
concatenating the text vector and the attention distribution vector to obtain a concatenated vector;
and obtaining the response text through the decoding layer according to the concatenated vector.
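The four steps of claim 2 can be traced end to end in a small sketch. The hashing-trick vectorizer and the linear readout are stand-ins (the claim does not specify the feature extractor, and the real decoding layer is an LSTM); all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, vocab = 8, 16

# Step 1: feature extraction and vectorization (a hashing-trick stand-in)
def vectorize(text, dim=dim):
    v = np.zeros(dim)
    for tok in text.split():
        v[hash(tok) % dim] += 1.0
    return v / max(1.0, float(np.linalg.norm(v)))

K = rng.normal(size=(dim, dim))        # key matrix of the encoding layer
V = rng.normal(size=(dim, dim))        # value matrix of the encoding layer

text_vec = vectorize("how is the weather today")

# Step 2: score the text vector with the key/value matrices to obtain
# the attention distribution vector
scores = K @ text_vec
weights = np.exp(scores - scores.max())
weights /= weights.sum()
attn_vec = V @ weights

# Step 3: concatenate the text vector and the attention distribution vector
concat_vec = np.concatenate([text_vec, attn_vec])

# Step 4: the decoding layer maps the concatenation to response-token scores
# (a linear readout stands in for the LSTM decoder here)
readout = rng.normal(size=(vocab, 2 * dim))
response_logits = readout @ concat_vec
print(concat_vec.shape, response_logits.shape)
```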
3. The method of claim 2, wherein the step of concatenating the text vector and the attention distribution vector to obtain the concatenated vector comprises:
processing the text vector and the attention distribution vector in a skip-connection manner to obtain the concatenated vector.
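One plausible reading of the skip-connection combination in claim 3, shown against plain concatenation; the residual-add-then-concatenate form below is an assumption about how the two vectors are combined, not a detail from the patent.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])        # text vector
attn = np.array([0.1, 0.2, 0.3])     # attention distribution vector

# Plain concatenation: both signals side by side
concat = np.concatenate([x, attn])

# Skip-connection style: the original text vector also flows past the
# attention output (residual add), and the sum is concatenated with x
skip = np.concatenate([x, x + attn])
print(skip)
```

The skip path lets gradients reach the text vector directly during the back-propagation step of claim 1, which is the usual motivation for residual connections.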
4. The method as claimed in claim 2, wherein the step of scoring the text vector through the key matrix and the value matrix of the encoding layer to obtain the attention distribution vector comprises:
performing a product operation on the text vector with the key matrix to obtain an operation result;
and performing a product operation on the operation result with the value matrix to obtain the attention distribution vector.
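The two product operations of claim 4 amount to two matrix-vector products. The softmax normalization between them is a common choice in attention mechanisms but is an assumption here; the claim itself only recites the two products.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
K = rng.normal(size=(dim, dim))      # key matrix
V = rng.normal(size=(dim, dim))      # value matrix
text_vec = rng.normal(size=dim)

# First product operation: key matrix x text vector -> operation result
scores = K @ text_vec

# Normalize the scores into a distribution (softmax; assumed, not recited)
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Second product operation: value matrix x operation result
attn_vec = V @ weights
print(attn_vec.shape)
```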
5. The method as claimed in claim 2, wherein the step of processing the text through the encoding layer to obtain the text vector and the attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain the response text further comprises:
adjusting the dimension of the concatenated vector through a multi-layer perceptron structure of the encoding layer.
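The dimension adjustment of claim 5 can be sketched as a small perceptron that projects the 2*dim concatenation back down to dim, so the decoder input size stays fixed. The two-layer shape, ReLU activation, and all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
concat_vec = rng.normal(size=2 * dim)   # text vector ++ attention vector

# Two-layer perceptron: 2*dim -> dim -> dim (sizes are illustrative)
W1, b1 = rng.normal(size=(dim, 2 * dim)) * 0.1, np.zeros(dim)
W2, b2 = rng.normal(size=(dim, dim)) * 0.1, np.zeros(dim)

hidden = np.maximum(0.0, W1 @ concat_vec + b1)   # ReLU activation
adjusted = W2 @ hidden + b2                      # back to the decoder's dim
print(adjusted.shape)
```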
6. The method as claimed in claim 5, wherein the step of updating the encoding layer by performing back-propagation calculation on the initial multi-turn dialogue generative model through the response text to obtain the final multi-turn dialogue generative model comprises:
performing back-propagation calculation on the initial multi-turn dialogue generative model according to the response text to obtain a loss function value, and then updating the model parameters according to the loss function value to obtain the final multi-turn dialogue generative model.
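The loss-then-update cycle of claim 6 can be sketched for a single trainable matrix (standing in for the key/value matrices of claim 7). The cross-entropy loss, learning rate, and gradient form below are conventional assumptions, not details recited in the claim.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 6, 4
V = rng.normal(size=(vocab, dim)) * 0.1   # stand-in trainable matrix
x = np.full(dim, 0.5)                     # encoded context vector
target = 2                                # index of the gold response token

def train_step(V, lr=0.1):
    logits = V @ x
    p = np.exp(logits - logits.max())
    p /= p.sum()
    loss = -np.log(p[target])             # cross-entropy loss function value
    # Back-propagation: gradient of the loss w.r.t. V, then a gradient step
    grad = np.outer(p - np.eye(vocab)[target], x)
    return V - lr * grad, loss

losses = []
for _ in range(50):
    V, loss = train_step(V)
    losses.append(loss)
print(round(losses[0], 3), round(losses[-1], 3))
```

The loss value shrinks across updates, which is the observable effect of the parameter update the claim recites.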
7. The method of claim 6, wherein the model parameters comprise at least one of the key matrix and the value matrix.
8. A multi-turn dialogue generative model establishing system implementing the multi-turn dialogue generative model establishing method of any one of claims 1 to 7, comprising:
an encoding layer construction unit, configured to construct the encoding layer based on an attention mechanism, process a text through the encoding layer to obtain a text vector and an attention distribution vector, and process the text vector and the attention distribution vector through the decoding layer to obtain a response text;
and a decoding layer construction unit, configured to construct the decoding layer based on an LSTM network, and to update the encoding layer by performing back-propagation calculation on the initial multi-turn dialogue generative model through the response text, to obtain a final multi-turn dialogue generative model.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the multi-turn dialogue generative model establishing method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the multi-turn dialogue generative model establishing method of any one of claims 1 to 7.
CN202111180118.3A 2021-10-11 2021-10-11 Multi-round dialogue generation type model establishing method and system, electronic equipment and medium Pending CN113868395A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111180118.3A CN113868395A (en) 2021-10-11 2021-10-11 Multi-round dialogue generation type model establishing method and system, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111180118.3A CN113868395A (en) 2021-10-11 2021-10-11 Multi-round dialogue generation type model establishing method and system, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN113868395A true CN113868395A (en) 2021-12-31

Family

ID=79002470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111180118.3A Pending CN113868395A (en) 2021-10-11 2021-10-11 Multi-round dialogue generation type model establishing method and system, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN113868395A (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776578A (en) * 2017-01-03 2017-05-31 竹间智能科技(上海)有限公司 Method and device for improving the dialogue performance of a conversational system
US20170372200A1 (en) * 2016-06-23 2017-12-28 Microsoft Technology Licensing, Llc End-to-end memory networks for contextual language understanding
US20180329884A1 (en) * 2017-05-12 2018-11-15 Rsvp Technologies Inc. Neural contextual conversation learning
US20180329998A1 (en) * 2017-05-15 2018-11-15 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
EP3486842A1 (en) * 2017-11-17 2019-05-22 Digital Genius Limited Template generation for a conversational agent
CN110032633A (en) * 2019-04-17 2019-07-19 腾讯科技(深圳)有限公司 Multi-turn dialogue processing method, apparatus and device
CN110413752A (en) * 2019-07-22 2019-11-05 中国科学院自动化研究所 Multi-turn spoken language understanding method, system and device based on dialogue logic
US20200098353A1 (en) * 2018-09-28 2020-03-26 Capital One Services, Llc Adversarial learning framework for persona-based dialogue modeling
CN110929476A (en) * 2019-09-27 2020-03-27 中国人民解放军63626部队 Task type multi-round dialogue model construction method based on mixed granularity attention mechanism
US20200285705A1 (en) * 2019-03-05 2020-09-10 Salesforce.Com, Inc. Agent persona grounded chit-chat generation framework
CN112231457A (en) * 2020-10-19 2021-01-15 北京明略昭辉科技有限公司 Multi-turn dialogue generation method and device for chatting robot and chatting robot
US20210082398A1 (en) * 2019-09-13 2021-03-18 Mitsubishi Electric Research Laboratories, Inc. System and Method for a Dialogue Response Generation System
CN113239174A (en) * 2021-06-09 2021-08-10 华南师范大学 Hierarchical multi-round conversation generation method and device based on double-layer decoding
CN113342947A (en) * 2021-05-26 2021-09-03 华南师范大学 Multi-round dialog text generation method capable of sensing dialog context relative position information
US11132988B1 (en) * 2020-10-22 2021-09-28 PolyAI Limited Dialogue system, a dialogue method, and a method of training



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination