CN111611352A - Dialog generation method and device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN111611352A
CN111611352A (application CN201910138890.5A)
Authority
CN
China
Prior art keywords: current; statement; sentence; user input; training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910138890.5A
Other languages
Chinese (zh)
Inventor
黄林豪 (Huang Linhao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority: CN201910138890.5A
Publication: CN111611352A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

The application provides a dialog generation method and apparatus, an electronic device, and a readable storage medium. The method includes: acquiring a historical associated output sentence and a current user input sentence, where the historical associated output sentence is the associated output sentence corresponding to the last user input sentence adjacent to the current user input sentence; and taking the historical associated output sentence and the current user input sentence as the input of a first deep learning model to generate a current associated output sentence, where the current associated output sentence is the associated output sentence corresponding to the current user input sentence. In the embodiments of the application, when replying to the current user input sentence, both the content of the current user input sentence and the dialog content of the adjacent previous turn are considered, so that two adjacent dialog turns are more tightly connected.

Description

Dialog generation method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a dialog generation method, an apparatus, an electronic device, and a readable storage medium.
Background
In the service field, customer service plays a very important role: it directly affects the user experience, so enterprises generally invest considerable cost in customer service to maintain relationships with customers. However, with rising personnel costs and the urgent need to improve user experience, automatic, intelligent customer service robots have emerged, which can reduce the personnel cost that enterprises invest in customer service.
Disclosure of Invention
In view of the above, an object of the embodiments of the present application is to provide a dialog generation method, apparatus, electronic device, and readable storage medium that can make the content connection between two adjacent dialog turns closer.
According to an aspect of the present application, there is provided a dialog generation method, the method including:
acquiring a historical associated output statement and a current user input statement, wherein the historical associated output statement is an associated output statement corresponding to a last user input statement adjacent to the current user input statement;
and taking the historical associated output statement and the current user input statement as the input of a first deep learning model to generate a current associated output statement, wherein the current associated output statement is an associated output statement corresponding to the current user input statement.
In some embodiments, the first deep learning model comprises an encoding model and a decoding model;
the step of generating a current association output sentence by taking the historical association output sentence and the current user input sentence as the input of a first deep learning model comprises the following steps:
the coding model combines vectors corresponding to the historical associated output statement and the current user input statement to obtain a current associated input vector;
and the decoding model processes the current association input vector to generate the current association output statement.
In some embodiments, the coding model comprises a first coding submodel, a second coding submodel, and a first vector combination submodel;
the coding model combines the vectors corresponding to the historical associated output statement and the current user input statement to obtain a current associated input vector, and the coding model comprises the following steps:
the first coding sub-model codes the historical association output statement into a historical association output vector;
the second coding sub-model codes the current user input statement into a current user input vector;
and the first vector combination sub-model combines the historical association output vector and the current user input vector to obtain the current association input vector.
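The encode-combine-decode pipeline in the steps above can be sketched as follows. This is only an illustrative toy, not the patented implementation: the bag-of-words encoder stands in for the first and second encoding sub-models (a real system would use an RNN or Transformer encoder), and concatenation is an assumed choice for the first vector-combination sub-model, which the patent does not specify.

```python
def encode(sentence, dim=4):
    # Toy stand-in for an encoding sub-model: hash each word into a
    # fixed-size bag-of-words vector.
    vec = [0.0] * dim
    for word in sentence.split():
        vec[hash(word) % dim] += 1.0
    return vec

def combine(history_vec, current_vec):
    # First vector-combination sub-model: plain concatenation
    # (an assumption; the patent does not name the combination method).
    return history_vec + current_vec

history_vec = encode("your refund has been processed")  # first sub-model
current_vec = encode("when will it arrive")             # second sub-model
fused = combine(history_vec, current_vec)  # current associated input vector
assert len(fused) == 8
```

The decoding model would then consume `fused` to produce the current associated output sentence.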
In some embodiments, the first deep learning model comprises an encoding model and a decoding model;
the step of generating a current association output sentence by taking the historical association output sentence and the current user input sentence as the input of a first deep learning model comprises the following steps:
the coding model codes both the historical associated output statement and the current user input statement to respectively obtain a historical associated output vector and a current user input vector;
and the decoding model takes the historical association output vector and the current user input vector as input to be processed to generate the current association output statement.
In some embodiments, the coding model comprises a first coding sub-model and a second coding sub-model;
the step of the encoding model encoding both the historical associated output statement and the current user input statement to obtain a historical associated output vector and a current user input vector respectively comprises the following steps:
the first coding sub-model codes the historical association output statement to obtain the historical association output vector;
the second coding sub-model codes the current user input statement to obtain the current user input vector;
the decoding model comprises a second vector combination submodel and a sentence decoding submodel;
the decoding model processes the historical association output vector and the current user input vector as input to generate the current association output statement, and the decoding model comprises the following steps:
the second vector combination sub-model combines the historical association output vector and the current user input vector to obtain a current association input vector;
and the sentence decoding submodel processes the current association input vector to generate the current association output sentence.
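The second variant above moves the combination inside the decoding model. A minimal sketch, again under assumptions (concatenation for the second vector-combination sub-model, and a trivial stand-in for the sentence-decoding sub-model):

```python
def decode(history_vec, current_vec):
    # Second vector-combination sub-model: concatenate the two vectors
    # inside the decoder (an assumed choice of combination).
    fused = history_vec + current_vec
    # Toy sentence-decoding sub-model: emit a token named after the
    # strongest feature of the fused vector.
    return f"token_{fused.index(max(fused))}"

assert decode([0.0, 2.0], [1.0, 0.0]) == "token_1"
```

The practical difference from the first variant is only where the fusion happens; the fused vector it produces plays the same role as the current associated input vector.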
In some embodiments, prior to the step of generating a current associated output statement using the historical associated output statement and the current user input statement as inputs to a first deep learning model, the method further comprises:
judging whether the time difference value between the historical associated output statement and the current user input statement reaches a time threshold value or not;
and if the time difference value between the historical associated output statement and the current user input statement does not reach the time threshold, executing a step of generating the current associated output statement by taking the historical associated output statement and the current user input statement as the input of a first deep learning model.
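The time-threshold gate described above can be sketched as a simple comparison. `THRESHOLD_SECONDS` and the function name are illustrative assumptions; the patent does not name a concrete threshold value.

```python
from datetime import datetime, timedelta

THRESHOLD_SECONDS = 300  # e.g. 5 minutes; illustrative only

def history_is_relevant(history_time, current_time,
                        threshold=THRESHOLD_SECONDS):
    # If the time difference has reached the threshold, the historical
    # associated output statement is treated as unrelated, and only the
    # current user input statement would be fed to the first model.
    return (current_time - history_time).total_seconds() < threshold

t0 = datetime(2019, 2, 25, 10, 0, 0)
assert history_is_relevant(t0, t0 + timedelta(seconds=60))
assert not history_is_relevant(t0, t0 + timedelta(seconds=600))
```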
In some embodiments, prior to the step of obtaining the historical associated output statement and the current user input statement, the method further comprises:
obtaining a plurality of training sample sets, wherein each training sample set comprises a plurality of groups of adjacent question-answer conversations, and each group of question-answer conversations comprises training question sentences and training answer sentences;
training a second deep learning model by using the plurality of training sample sets until the training of the second deep learning model is finished, and taking the second deep learning model with the finished training as the first deep learning model, wherein when the second deep learning model is trained, the input of the second deep learning model comprises training question sentences of a target group question-answering conversation and training answer sentences of a last group question-answering conversation adjacent to the target group question-answering conversation.
In some embodiments, the step of training the second deep learning model using each training sample set comprises:
taking the training question sentences in the current group of question-answering conversations and the training answer sentences in the previous group of question-answering conversations adjacent to the current group of question-answering conversations as the input of the second deep learning model to obtain the current answer sentences;
if the deviation value between the current answer sentence and the training answer sentence in the current group of question-answer sessions is larger than the deviation threshold, adjusting the parameters of the second deep learning model to obtain a new deep learning model as the second deep learning model, taking the next group of question-answer sessions adjacent to the current group as the new current group of question-answer sessions, and returning to continue training;
and if the deviation value between the current answer sentence and the training answer sentence in the current group of question-answer sessions is less than or equal to the deviation threshold, taking the deep learning model corresponding to the current group of question-answer sessions as the trained second deep learning model.
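The training procedure above can be sketched with a toy model whose "parameters" are a single adjustable number. Everything here is a stand-in under stated assumptions: `predict` mimics one forward pass on the previous pair's answer plus the current pair's question, `deviation` scores the prediction, and `adjust` mimics a parameter update.

```python
# Toy stand-in for the second deep learning model's parameters.
state = {"bias": 5}

def predict(prev_answer, question):
    # Inputs are the adjacent previous pair's answer and the current
    # pair's question, as the claim specifies.
    return state["bias"]

def deviation(predicted, target):
    return abs(predicted - target)

def adjust():
    state["bias"] -= 1  # crude parameter update

def train(pairs, threshold):
    for i, (question, answer) in enumerate(pairs):
        prev_answer = pairs[i - 1][1] if i > 0 else ""
        predicted = predict(prev_answer, question)
        if deviation(predicted, answer) > threshold:
            adjust()  # the adjusted model becomes the second model
        else:
            return "trained"  # threshold met: training is finished
    return "exhausted"

assert train([("q1", 3), ("q2", 3), ("q3", 3)], threshold=1) == "trained"
```

In this toy run the first pair's deviation (2) exceeds the threshold, triggering one adjustment; the second pair's deviation (1) meets the threshold and training ends.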
In some embodiments, after the step of obtaining a plurality of training sample sets, the method further comprises:
aligning each set of question-answering sessions such that each training question sentence and each training answer sentence in the plurality of training sample sets contains n words, where n is a positive integer.
In some embodiments, the step of aligning each set of question-answering sessions comprises:
if the number of words in the target training question sentence or the target training answer sentence is larger than n, reserving n words in the target training question sentence or the target training answer sentence, and removing other words;
if the number of words in the target training question sentence or the target training answer sentence is less than n, at least one word is supplemented to the target training question sentence or the target training answer sentence, so that the number of words in the target training question sentence or the target training answer sentence is n.
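The truncate-or-pad alignment described above can be sketched as follows. Two details are assumptions not fixed by the patent: keeping the *first* n words when truncating (the text only says n words are reserved), and the `<pad>` filler token (the text only says "at least one word is supplemented").

```python
def align(sentence, n, pad_token="<pad>"):
    words = sentence.split()
    if len(words) > n:
        words = words[:n]  # reserve n words, remove the others
    else:
        words += [pad_token] * (n - len(words))  # supplement up to n
    return words

assert align("how do I get a refund for this order", 5) == \
    ["how", "do", "I", "get", "a"]
assert align("thanks", 3) == ["thanks", "<pad>", "<pad>"]
```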
According to another aspect of the present application, there is provided a dialog generation apparatus, the apparatus comprising:
an obtaining module, configured to obtain a historical associated output statement and a current user input statement, where the historical associated output statement is an associated output statement corresponding to a last user input statement adjacent to the current user input statement;
and the processing module is used for taking the historical associated output statement and the current user input statement as the input of a first deep learning model to generate a current associated output statement, wherein the current associated output statement is an associated output statement corresponding to the current user input statement.
In some embodiments, the first deep learning model comprises an encoding model and a decoding model;
the processing module is specifically configured to:
the coding model combines vectors corresponding to the historical associated output statement and the current user input statement to obtain a current associated input vector;
and the decoding model processes the current association input vector to generate the current association output statement.
In some embodiments, the coding model comprises a first coding submodel, a second coding submodel, and a first vector combination submodel;
the processing module is specifically configured to:
the first coding sub-model codes the historical association output statement into a historical association output vector;
the second coding sub-model codes the current user input statement into a current user input vector;
and the first vector combination sub-model combines the historical association output vector and the current user input vector to obtain the current association input vector.
In some embodiments, the first deep learning model comprises an encoding model and a decoding model;
the processing module is specifically configured to:
the coding model codes both the historical associated output statement and the current user input statement to respectively obtain a historical associated output vector and a current user input vector;
and the decoding model takes the historical association output vector and the current user input vector as input to be processed to generate the current association output statement.
In some embodiments, the coding model comprises a first coding sub-model and a second coding sub-model;
the processing module is specifically configured to:
the first coding sub-model codes the historical association output statement to obtain the historical association output vector;
the second coding sub-model codes the current user input statement to obtain the current user input vector;
the decoding model comprises a second vector combination submodel and a sentence decoding submodel;
the processing module is further specifically configured to:
the second vector combination sub-model combines the historical association output vector and the current user input vector to obtain a current association input vector;
and the sentence decoding submodel processes the current association input vector to generate the current association output sentence.
In some embodiments, the processing module is further to:
judging whether the time difference value between the historical associated output statement and the current user input statement reaches a time threshold value or not;
and if the time difference value between the historical associated output statement and the current user input statement does not reach the time threshold, taking the historical associated output statement and the current user input statement as the input of a first deep learning model to generate the current associated output statement.
In some embodiments, the obtaining module is further configured to obtain a plurality of training sample sets, wherein each training sample set includes a plurality of sets of contiguous question-and-answer sessions, and each set of question-and-answer sessions includes a training question sentence and a training answer sentence;
the device further comprises:
and the training module is used for training a second deep learning model by using the plurality of training sample sets until the second deep learning model is trained, and taking the second deep learning model after the training as the first deep learning model, wherein when the second deep learning model is trained, the input of the second deep learning model comprises training question sentences of a target group question-answering conversation and training answer sentences of a last group question-answering conversation adjacent to the target group question-answering conversation.
In some embodiments, the training module is specifically configured to:
taking the training question sentences in the current group of question-answering conversations and the training answer sentences in the previous group of question-answering conversations adjacent to the current group of question-answering conversations as the input of the second deep learning model to obtain the current answer sentences;
if the deviation value between the current answer sentence and the training answer sentence in the current group of question-answer sessions is larger than the deviation threshold, adjusting the parameters of the second deep learning model to obtain a new deep learning model as the second deep learning model, taking the next group of question-answer sessions adjacent to the current group as the new current group of question-answer sessions, and returning to continue training;
and if the deviation value between the current answer sentence and the training answer sentence in the current group of question-answer sessions is less than or equal to the deviation threshold, taking the deep learning model corresponding to the current group of question-answer sessions as the trained second deep learning model.
In some embodiments, the training module is further to:
aligning each set of question-answering sessions such that each training question sentence and each training answer sentence in the plurality of training sample sets contains n words, where n is a positive integer.
In some embodiments, the training module is further specifically configured to:
if the number of words in the target training question sentence or the target training answer sentence is larger than n, reserving n words in the target training question sentence or the target training answer sentence, and removing other words;
if the number of words in the target training question sentence or the target training answer sentence is less than n, at least one word is supplemented to the target training question sentence or the target training answer sentence, so that the number of words in the target training question sentence or the target training answer sentence is n.
According to another aspect of the present application, there is provided an electronic device including: a processor, a storage medium, and a bus, where the storage medium stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to perform the steps of the dialog generation method described above.
According to another aspect of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the dialog generation method described above.
Based on any one of the above aspects, according to the dialog generation method, apparatus, electronic device, and readable storage medium provided in the embodiments of the present application, the associated output sentence corresponding to the last user input sentence adjacent to the current user input sentence is used as the historical associated output sentence, and is used as the input of the first deep learning model together with the current user input sentence, so as to obtain the current associated output sentence as the reply to the current user input sentence.
In addition, in some embodiments, the relevance between the historical associated output sentence and the current user input sentence is judged by determining whether the time difference between them reaches a time threshold. When their contents are judged to be related, both are used as the input of the first deep learning model to generate the current associated output sentence; when they are judged to be unrelated, only the current user input sentence is used as the input of the first deep learning model, which at least partially reduces the amount of computation.
In addition, in some embodiments, each training question sentence and each training answer sentence in the training samples is aligned to the same number of words, so that the number of input words is the same in every training step; the structure of the model therefore does not need to be changed between steps, which improves the training speed of the model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram illustrating a dialog generation method provided by an embodiment of the present application;
FIG. 3 is a schematic block diagram of a first deep learning model according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of the substeps of S208 of FIG. 2;
FIG. 5 is a schematic flow chart of the substeps of S2081 of FIG. 4;
FIG. 6 is another schematic block diagram of a first deep learning model in an embodiment of the present application;
FIG. 7 is another schematic flow chart of S208 of FIG. 2;
FIG. 8 is a schematic flow chart of the substeps of S2086 of FIG. 7;
FIG. 9 is a schematic flow chart of the substeps of S2087 of FIG. 7;
FIG. 10 is another schematic flow chart diagram of a dialog generation method according to an embodiment of the present application;
FIG. 11 is a schematic flow diagram of training a first deep learning model;
FIG. 12 is a schematic flow chart of the substeps of S204 of FIG. 11;
FIG. 13 is another schematic flow chart of training to obtain a first deep learning model;
FIG. 14 is a schematic flow chart of the substeps of S201 of FIG. 13;
fig. 15 is a schematic structural diagram illustrating a dialog generating device according to an embodiment of the present application.
In the figure: 100-an electronic device; 101-a memory; 102-a processor; 103-a memory controller; 104-peripheral interfaces; 105-a radio frequency unit; 106-communication bus/signal line; 300-dialog generating means; 301-an obtaining module; 302-a processing module; 303-training module.
Detailed Description
In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be performed out of order, and steps without logical context may be performed in reverse order or simultaneously. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, the flowchart.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
In order to enable a person skilled in the art to use the present disclosure, the following embodiments are given in connection with the application-specific scenario "intelligent dialog customer service system". It will be apparent to those skilled in the art that the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the application. Although the present application is primarily described in the context of an intelligent dialog customer service system, it should be understood that this is merely one exemplary embodiment. The application can be applied to any other automatic-dialog scenario. For example, the present application can be applied to various dialog environments, including e-government question answering, automatic dialog robots, automatic customer service robots, and the like. The present application may also include any service system for intelligent conversations, for example, a system for sending and/or receiving couriers, or a service system for transactions between business parties. Applications of the system or method of the present application may include web pages, browser plug-ins, client terminals, customized systems, internal analysis systems, or artificial intelligence robots, among others, or any combination thereof.
It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
Before the present application was proposed, in the application scenario of an intelligent customer service dialog system, a task-based dialog system was generally adopted: during a dialog with a user, an output statement matching the user's current input statement is produced based on technologies such as rule matching and slot extraction.
However, a user's successive dialog turns generally have a certain continuity, whereas the rule-matching approach works in a "one question, one answer" fashion, matching rules against only the current user input sentence. The resulting output sentences therefore lack contextual understanding of the user's meaning, and the logic between successive output sentences is not tight enough.
Based on the above defects, an implementation manner provided by the embodiment of the present application is as follows: and taking the associated output sentence corresponding to the last user input sentence adjacent to the current user input sentence as a historical associated output sentence, and taking the associated output sentence and the current user input sentence as the input of the first deep learning model, so as to obtain the current associated output sentence which is used as a reply to the current user input sentence.
Referring to fig. 1, a schematic block diagram of an electronic device 100 provided in an embodiment of the present application is shown, in which the electronic device 100 may be, but is not limited to, a smart phone, a Personal Computer (PC), a tablet computer, a laptop computer, a Personal Digital Assistant (PDA), and the like. The electronic device 100 includes a memory 101, a memory controller 103, one or more processors 102 (only one is shown), a peripheral interface 104, a radio frequency unit 105, and the like. These components communicate with each other via one or more communication buses/signal lines 106.
The memory 101 may be used to store software programs and modules, such as program instructions/modules corresponding to the dialog generating device 300 provided in the embodiment of the present application, and the processor 102 executes various functional applications and image processing by running the software programs and modules stored in the memory 101, such as the dialog generating method provided in the embodiment of the present application.
The Memory 101 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 102 may be an integrated circuit chip having signal processing capabilities. The processor 102 may be a general-purpose processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), a voice processor, a video processor, and the like; but may also be a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor 102 may be any conventional processor or the like.
The peripheral interface 104 couples various input/output devices to the processor 102 as well as to the memory 101. In some embodiments, the peripheral interface 104, the processor 102, and the memory controller 103 may be implemented in a single chip. In other embodiments of the present application, they may be implemented by separate chips.
The rf unit 105 is used for receiving and transmitting electromagnetic waves, and implementing interconversion between the electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices.
It will be appreciated that the configuration shown in FIG. 1 is merely illustrative and that electronic device 100 may include more or fewer components than shown in FIG. 1 or have a different configuration than shown in FIG. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
Referring to fig. 2, fig. 2 shows a schematic flowchart of a dialog generating method provided in an embodiment of the present application, and as a possible implementation manner, the dialog generating method includes the following steps:
S206, acquiring the historical associated output statement and the current user input statement.
S208, taking the historical associated output statement and the current user input statement as the input of the first deep learning model, and generating the current associated output statement.
In the embodiment of the present application, taking the application scenario of an intelligent customer service dialog system as an example, when providing a question-answering conversation service for a user, every output sentence used to answer the user is stored.
When the current user input statement is obtained, obtaining a historical associated output statement associated with the current user input statement, wherein the historical associated output statement is an associated output statement corresponding to a last user input statement adjacent to the current user input statement.
For example, if the user has a conversation with the intelligent customer service dialog system as follows:
The user: input sentence 1.
The system: intelligent reply 1.
The user: input sentence 2.
The system: intelligent reply 2.
The user: input sentence 3.
The system: intelligent reply 3.
If the current user input statement is input statement 2, the historical associated output statement is intelligent reply 1; and if the current user input statement is input statement 3, the historical associated output statement is intelligent reply 2.
In this application, when a current user input sentence is obtained, the obtained history associated output sentence and the current user input sentence are used as input of a first deep learning model, so that the first deep learning model generates a current associated output sentence according to the history associated output sentence and the current user input sentence, where the current associated output sentence is an associated output sentence corresponding to the current user input sentence, that is: and the intelligent customer service dialogue system replies to the current user input statement.
For example, in the above example, if the current user input sentence is input sentence 2, the intelligent customer service dialog system uses input sentence 2 and intelligent reply 1 together as the input of the first deep learning model, so as to obtain intelligent reply 2; or, if the current user input sentence is the input sentence 3, the intelligent customer service dialog system takes the input sentence 3 and the intelligent reply 2 as the input of the first deep learning model, so as to obtain the intelligent reply 3.
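The turn-by-turn data flow described above can be sketched as follows. This is a minimal illustration only: `toy_model` is a hypothetical stand-in for the first deep learning model, and the empty string stands for the missing history before the first turn.

```python
# Minimal sketch of the dialog loop described above: each reply is stored
# and fed back in, together with the next user input, as the
# "historical associated output sentence".
# `toy_model` is a hypothetical stand-in for the first deep learning model.

def run_dialog(user_inputs, model):
    replies = []
    previous_reply = ""  # no historical reply before the first turn
    for current_input in user_inputs:
        current_reply = model(previous_reply, current_input)
        replies.append(current_reply)
        previous_reply = current_reply  # becomes the history for the next turn
    return replies

def toy_model(history_reply, user_input):
    # A real model would generate text; this just makes both inputs visible.
    return f"[{history_reply}]+[{user_input}]"

replies = run_dialog(["input sentence 1", "input sentence 2"], toy_model)
```

The second element of `replies` shows that the first reply was part of the input when the second turn was answered, which is exactly the association the method relies on.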
Based on the above design, in the dialog generation method provided in the embodiment of the present application, the associated output sentence corresponding to the last user input sentence adjacent to the current user input sentence is used as the historical associated output sentence, and is used as the input of the first deep learning model together with the current user input sentence, so as to obtain the current associated output sentence as the reply to the current user input sentence.
Generally, when a deep learning model is used to reply to a user input sentence, the user input sentence needs to be encoded as a vector, and the encoded vector is then processed and decoded into an output sentence that replies to the user input sentence.
Optionally, as a possible implementation manner, please refer to fig. 3, where fig. 3 is a schematic structural diagram of a first deep learning model in an embodiment of the present application, in the embodiment of the present application, the first deep learning model includes a coding model (Encode) and a decoding model (Decode), the coding model is used to code a user input sentence into a vector, and the decoding model is used to process the coded vector to generate an output sentence.
Referring to fig. 4, fig. 4 is a schematic flow chart of the sub-steps of S208 in fig. 2, and as a possible implementation, S208 includes the following sub-steps:
S2081, the coding model combines the vectors corresponding to the historical associated output statement and the current user input statement to obtain the current associated input vector.
S2082, the decoding model processes the current association input vector to generate a current association output statement.
For example, in the above example, if the current user input statement is input statement 2, and the historical associated output statement is intelligent reply 1, the coding model combines vectors corresponding to input statement 2 and intelligent reply 1, so as to obtain a current associated input vector; the decoding model further processes the current association input vector to generate a current association output statement (the current association output statement is smart reply 2).
In a possible implementation manner, please continue to refer to fig. 3, the first deep learning model may adopt a sequence-to-sequence (seq2seq) structure based on LSTM (Long Short-Term Memory) networks, constructed and trained on the TensorFlow deep learning framework.
And the coding model comprises a first coding submodel, a second coding submodel and a first vector combination submodel, wherein the first coding submodel and the second coding submodel respectively comprise a plurality of LSTMs.
Referring to fig. 5, fig. 5 is a schematic flow chart of the sub-step of S2081 in fig. 4, and as a possible implementation manner, S2081 includes the following sub-steps:
S20811, the first coding sub-model codes the historical associated output statement into a historical associated output vector.
S20812, the second coding sub-model codes the current user input sentence into the current user input vector.
S20813, the first vector combination sub-model combines the historical association output vector and the current user input vector to obtain the current association input vector.
For example, in the example that the input statement 2 is used as the current user input statement and the intelligent reply 1 is used as the historical associated output statement, when the current associated input vector is obtained, the first encoding sub-model encodes the intelligent reply 1 to obtain the historical associated output vector; the second coding sub-model codes the input statement 2 to obtain a current user input vector; and the first vector combination sub-model combines the historical associated output vector obtained after the intelligent reply 1 is coded with the current user input vector obtained after the input statement 2 is coded to obtain the current associated input vector.
As a possible implementation manner, the current associated input vector is a concatenation of the current user input vector and the historical associated output vector; in this case, the dimension of the current associated input vector is the sum of the dimensions of the current user input vector and the historical associated output vector. For example, if the current user input vector and the historical associated output vector are each vectors of dimension 512, the current associated input vector is a vector of dimension 1024.
As another possible implementation manner, the current associated input vector may also be the element-wise sum of the current user input vector and the historical associated output vector; in this case, the dimension of the current associated input vector is the same as that of the current user input vector or the historical associated output vector. For example, if the current user input vector and the historical associated output vector are each vectors of dimension 512, the current associated input vector is also a vector of dimension 512.
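The two combination schemes above can be illustrated with NumPy. The random 512-dimensional vectors are placeholders for the real encoder outputs, not part of the original method.

```python
import numpy as np

# Placeholders for the encoder outputs described above: a 512-dimensional
# current user input vector and a 512-dimensional historical associated
# output vector (random values, for illustration only).
rng = np.random.default_rng(0)
current_user_input_vec = rng.standard_normal(512)
history_output_vec = rng.standard_normal(512)

# Scheme 1: concatenation -- the dimensions add up (512 + 512 = 1024).
combined_concat = np.concatenate([history_output_vec, current_user_input_vec])

# Scheme 2: element-wise sum -- the dimension stays 512.
combined_sum = history_output_vec + current_user_input_vec
```

Concatenation preserves both vectors separately but doubles the width of the decoder input; the element-wise sum keeps the decoder unchanged at the cost of mixing the two signals.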
As described above, the first deep learning model can be implemented by improving only the coding model, without improving the decoding model. As another possible implementation, both the encoding model and the decoding model may be improved.
For example, referring to fig. 6, fig. 6 is another schematic structural diagram of a first deep learning model in the embodiment of the present application, where the first deep learning model also includes a coding model and a decoding model, the coding model is used to code a user input sentence into a vector, and the decoding model is used to process the coded vector to generate an output sentence.
However, in the first deep learning model shown in fig. 6, the coding model only codes the current user input sentence and the historical associated output sentence, and does not combine the resulting vectors into the current associated input vector; instead, the decoding model combines the two vectors to obtain the current associated input vector and thereby generates the current associated output statement.
At this time, as another possible implementation manner, please refer to fig. 7, where fig. 7 is another schematic flowchart of S208 in fig. 2, and S208 may further include the following sub-steps:
S2086, the coding model codes both the historical associated output statement and the current user input statement to respectively obtain a historical associated output vector and a current user input vector.
S2087, the decoding model takes the historical association output vector and the current user input vector as input to process, and generates a current association output statement.
For example, in the above example, if the current user input statement is the input statement 2, and the history associated output statement is the intelligent reply 1, the coding model codes both the input statement 2 and the intelligent reply 1 to obtain the current user input vector corresponding to the input statement 2 and the history associated output vector corresponding to the intelligent reply 1, respectively.
The decoding model takes the current user input vector corresponding to input statement 2 and the historical associated output vector corresponding to intelligent reply 1 as input, such as the two 512-dimensional vectors described above, and performs data processing so as to generate the current associated output statement.
As shown in fig. 6, as another structure of the first deep learning model, the coding model includes a first coding sub-model and a second coding sub-model, and the decoding model includes a second vector combination sub-model and a sentence decoding sub-model.
When executing S2086, please refer to fig. 8, fig. 8 is a schematic flowchart of the sub-steps of S2086 in fig. 7, and S2086 includes the following sub-steps as a possible implementation manner:
S20861, the first coding sub-model codes the historical associated output statement to obtain a historical associated output vector.
S20862, the second coding sub-model codes the current user input sentence to obtain the current user input vector.
For example, in the above example, if the current user input statement is input statement 2, and the history association output statement is intelligent reply 1, the first encoding sub-model encodes the intelligent reply 1 to obtain a history association output vector; and the second coding sub-model codes the input statement 2 to obtain the current user input vector.
While the decoding model executes S2087, please refer to fig. 9, fig. 9 is a schematic flowchart of the sub-step of S2087 in fig. 7, and S2087 includes the following sub-steps as a possible implementation manner:
S20871, the second vector combination sub-model combines the historical associated output vector and the current user input vector to obtain the current associated input vector.
S20872, the sentence decoding submodel processes the current association input vector to generate the current association output sentence.
For example, in the above example, if the current user input vector and the historical associated output vector are each vectors of dimension 512, then after the decoding model obtains these two vectors, the second vector combination submodel first combines them to obtain the current associated input vector, such as a vector of dimension 1024; the sentence decoding submodel then processes this 1024-dimensional vector to generate the current associated output statement, obtaining intelligent reply 2.
It should be noted that the difference between the two first deep learning models shown in fig. 3 and fig. 6 is that, in the first deep learning model shown in fig. 3, the combination of the history correlation output vector and the current user input vector is implemented by the coding model, and the decoding model is not improved; in the first deep learning model shown in fig. 6, the combination of the history correlation output vector and the current user input vector is implemented by the decoding model, and both the encoding model and the decoding model are improved.
Moreover, it is worth mentioning that, in some other possible implementation manners of the embodiment of the present application, the combination of the historical associated output vector and the current user input vector may be implemented by a separate vector combination model; in that case, the coding model only converts the historical associated output statement and the current user input statement into the corresponding historical associated output vector and current user input vector, respectively, and the decoding model does not need to be improved.
The above strategy, in which the associated output statement corresponding to the last user input statement adjacent to the current user input statement serves as the historical associated output statement and, together with the current user input statement, is used as the input of the first deep learning model to obtain the current associated output statement as a reply to the current user input statement, is based on an application scenario of continuous dialog by the user. In other application scenarios, if the interval between two adjacent dialogs of the user with the intelligent customer service dialog system is too long, there may be no logical association between the two dialogs.
For example, in the application scenario of the user interacting with the smart customer service dialog system, if the time interval between the input statement 1 and the input statement 2 of the user is too long, the input statement 1 and the input statement 2 may not have a relationship, for example, the input statement 1 may be asking weather, and the input statement 2 is asking for a phone number.
Therefore, as a possible implementation manner, please refer to fig. 10, fig. 10 is another schematic flowchart of a dialog generating method provided in an embodiment of the present application, based on the flowchart shown in fig. 2, before executing S208, the dialog generating method further includes the following steps:
S207, judging whether the time difference value between the historical associated output statement and the current user input statement reaches a time threshold; if not, go to S208; if so, go to S209.
And S209, taking the current user input sentence as the input of the first deep learning model, and generating a current association output sentence.
For example, in the application scenario of the user interacting with the intelligent customer service dialog system, when the intelligent customer service dialog system receives input sentence 2, if it determines that the time difference between input sentence 2 and intelligent reply 1 does not reach the time threshold, the contents of intelligent reply 1 and input sentence 2 may be associated; at this time, S208 is executed, and on the first deep learning model shown in fig. 3, both the first coding sub-model and the second coding sub-model have input and output. Conversely, if the intelligent customer service dialog system determines that the time difference between input sentence 2 and intelligent reply 1 reaches the time threshold, the contents of intelligent reply 1 and input sentence 2 may not be associated; at this time, S209 is executed, and only the current user input sentence is used as the input of the first deep learning model to generate the current associated output sentence, which on the first deep learning model shown in fig. 3 means that the first coding sub-model has no input and output while the second coding sub-model has input and output.
It should be noted that, in some possible implementations, the time threshold may be a value predetermined by the intelligent customer service dialog system, such as 2 minutes or 5 minutes, or may be a value received from the user, as long as the intelligent customer service dialog system stores the time threshold.
Based on the above design, in the dialog generation method provided in the embodiment of the present application, whether the time difference between the historical associated output statement and the current user input statement reaches the time threshold is used to judge the association between the two. When their contents are judged to be associated, the historical associated output statement and the current user input statement are together used as the input of the first deep learning model to generate the current associated output statement; when their contents are judged not to be associated, only the current user input statement is used as the input of the first deep learning model to generate the current associated output statement, which at least partially reduces the amount of computation.
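The S207/S208/S209 branching can be sketched as follows. The 2-minute threshold and the function names are illustrative assumptions, not part of the original method.

```python
# Sketch of the S207 decision: feed the historical reply to the model only
# when the current input arrived before the time threshold was reached.
# The threshold value and function names are illustrative assumptions.

TIME_THRESHOLD_SECONDS = 120  # e.g. 2 minutes, one of the values mentioned above

def build_model_inputs(history_reply, history_time, current_input, current_time,
                       threshold=TIME_THRESHOLD_SECONDS):
    if history_reply is not None and (current_time - history_time) < threshold:
        # Threshold not reached: the two turns may be associated (S208).
        return (history_reply, current_input)
    # Threshold reached, or no history at all: current input alone (S209).
    return (current_input,)

recent = build_model_inputs("intelligent reply 1", 0, "input sentence 2", 60)
stale = build_model_inputs("intelligent reply 1", 0, "input sentence 2", 600)
```

With a 60-second gap both sentences are passed to the model; with a 600-second gap only the current input is.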
The implementation manner is a process of generating a dialog by using the first deep learning model, and before the dialog is generated by specifically applying the first deep learning model, the model needs to be trained, so that the first deep learning model specifically used for generating the dialog is obtained.
Referring to fig. 11, fig. 11 is a schematic flowchart of a process for training a first deep learning model, as a possible implementation manner, the process for training the first deep learning model includes the following steps:
S200, a plurality of training sample sets are obtained.
S202, training the second deep learning model by using a plurality of training sample sets.
And S204, taking the second deep learning model after training as the first deep learning model.
In S200, a plurality of training sample sets are obtained, where each training sample set includes a plurality of groups of adjacent question-answer sessions, and each group of question-answer sessions includes a training question sentence and a training answer sentence.
For example, in the above example of the user's dialog with the intelligent customer service dialog system, there are 3 groups of adjacent question-answer sessions: input sentence 1-intelligent reply 1, input sentence 2-intelligent reply 2, and input sentence 3-intelligent reply 3. Among these 3 groups, input sentence 1, input sentence 2 and input sentence 3 are all training question sentences, and intelligent reply 1, intelligent reply 2 and intelligent reply 3 are all training answer sentences.
When the second deep learning model is trained, similar to the process of generating a dialog with the first deep learning model, the input of the second deep learning model includes the training question sentence of the target group question-answer session and the training answer sentence of the previous group question-answer session adjacent to the target group.
For example, when input sentence 2-intelligent reply 2 is used as the target group question-answer session, the inputs for training the second deep learning model are input sentence 2 and intelligent reply 1.
It should be noted that the first group of question-answer sessions has no adjacent previous group; for example, in the above example, input sentence 1 has no adjacent preceding intelligent reply. In this case, the training answer sentence of the previous group can be set to 0.
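Assembling the training examples from a session might look like the sketch below. The empty-string placeholder stands in for the "set to 0" previous answer of the first group and is an assumption of this illustration.

```python
# Sketch of building training examples: the input of each example is the
# current training question sentence plus the training answer sentence of
# the previous group, with a placeholder for the very first group.

PLACEHOLDER = ""  # stands in for the "set to 0" previous answer

def build_training_examples(qa_sessions):
    examples = []
    previous_answer = PLACEHOLDER
    for question, answer in qa_sessions:
        # model input: (previous answer, current question); training target: answer
        examples.append(((previous_answer, question), answer))
        previous_answer = answer
    return examples

session = [("input sentence 1", "intelligent reply 1"),
           ("input sentence 2", "intelligent reply 2"),
           ("input sentence 3", "intelligent reply 3")]
examples = build_training_examples(session)
```

Each example pairs the model input with the reply the model should learn to produce, mirroring how the trained model will be used at inference time.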
Optionally, referring to fig. 12, fig. 12 is a schematic flowchart of the sub-steps of S204 in fig. 11, and as a possible implementation, S204 includes the following sub-steps:
S2041, taking the training question sentences in the current group of question-answering conversations and the training answer sentences in the previous group of question-answering conversations adjacent to the current group as the input of a second deep learning model to obtain the current answer sentences.
S2042, judging whether the deviation value of the current answer sentence and the training answer sentences in the current group of question-answer sessions is greater than a deviation threshold value; if so, go to S2043; if not, go to S2044.
S2043, adjusting parameters of the second deep learning model to obtain a new deep learning model serving as the second deep learning model, taking the next group of question-answer conversations adjacent to the current group of question-answer conversations as a new current group of question-answer conversations, and returning to retrain.
And S2044, taking the deep learning model corresponding to the current group of question-answering sessions as a second deep learning model after training is finished.
For example, taking the dialog between the user and the intelligent customer service dialog system as an example, when input sentence 2-intelligent reply 2 is used as the target group question-answer session to train the second deep learning model, input sentence 2 and intelligent reply 1 are used as the input of the second deep learning model to obtain the current answer sentence of the second deep learning model. Then a deviation value between the current answer sentence and intelligent reply 2 is calculated, for example by using a cross entropy function: the current answer sentence and intelligent reply 2 are taken as the input of the cross entropy function, and the calculated cross entropy value is taken as the deviation value. If the deviation value is greater than the deviation threshold, the second deep learning model has not finished learning and needs to continue training; at this time, S2043 is executed, the parameters of the second deep learning model are adjusted to obtain a new deep learning model as the second deep learning model, input sentence 3-intelligent reply 3 is taken as the new current group question-answer session, and training is repeated. If the calculated deviation value is less than or equal to the deviation threshold, the learning of the second deep learning model is ended, and the deep learning model corresponding to the current group of question-answer sessions is the second deep learning model whose training is finished.
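The cross-entropy deviation check can be illustrated on a single word position. The vocabulary, probabilities, and threshold value below are toy assumptions, not values from the original method.

```python
import math

# Toy illustration of the deviation check in S2042: the cross entropy
# between the model's output distribution and the one-hot training answer
# serves as the deviation value. Vocabulary, probabilities, and the
# threshold are illustrative assumptions.

def cross_entropy(predicted_probs, target_index):
    # For a one-hot target, cross entropy reduces to -log p(target word).
    return -math.log(predicted_probs[target_index])

DEVIATION_THRESHOLD = 0.1

confident = [0.01, 0.98, 0.01]  # most mass on the correct word (index 1)
uncertain = [0.30, 0.40, 0.30]

# Training continues (S2043) while the deviation exceeds the threshold,
# and stops (S2044) once it does not.
continue_training = cross_entropy(uncertain, 1) > DEVIATION_THRESHOLD
stop_training = cross_entropy(confident, 1) <= DEVIATION_THRESHOLD
```

An uncertain prediction yields a large deviation (about 0.92 here) and keeps training going; a confident, correct prediction yields a small one (about 0.02) and ends it.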
Referring back to fig. 3, as shown in fig. 3, the first deep learning model is built by using an LSTM-based structure, where, for example, in the coding model, each LSTM is used to code one word in a training sentence as a vector, and when training, if the numbers of words in consecutive adjacent training question sentences or consecutive adjacent training answer sentences are different, the number of LSTMs needs to be adjusted, which results in low training efficiency.
For example, if two adjacent training sentences contain 5 and 6 words respectively, the number of LSTM units in the coding model needs to be adjusted to 5 and then to 6 when the second deep learning model is trained on these two sentences, resulting in low training efficiency.
Based on this, optionally, referring to fig. 13, fig. 13 is another schematic flowchart for obtaining the first deep learning model through training, and as a possible implementation manner, after performing S200, the process for obtaining the first deep learning model through training further includes the following steps:
S201, aligning the plurality of training sample sets.
After alignment, each training question sentence and each training answer sentence contains n words, where n is a positive integer.
Based on the above design, in the dialog generation method provided by the embodiment of the application, each training question sentence and each training answer sentence in the training samples is aligned to the same number of words, so that the number of input words is the same in each round of training of the deep learning model; the structure of the model does not need to be changed, which speeds up model training.
Optionally, referring to fig. 14, fig. 14 is a schematic flowchart of the sub-steps of S201 in fig. 13, and as a possible implementation, S201 includes the following sub-steps:
S2011, comparing the number of words in the target training question sentence or target training answer sentence with n; if it is greater than n, go to S2012; if it is less than n, go to S2013.
S2012, retaining n words in the target training question sentence or the target training answer sentence, and removing the other words.
And S2013, supplementing at least one word to the target training question sentence or the target training answer sentence, so that the number of words of the target training question sentence or the target training answer sentence is n.
In one possible example, assuming n equals 10: if a sentence contains 6 words, fewer than 10, the sentence is padded to 10 words, for example by appending the preset default character "dnn_padding" four times, where "dnn_padding" as a whole counts as one word; if a sentence contains 12 words, more than 10, then 10 words in the sentence are retained and the other words are removed, where, as a possible implementation, the words at the end of the sentence can be removed and the first 10 words retained.
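The alignment of S2012/S2013 can be sketched as a small helper. The word lists here are illustrative placeholders.

```python
# Sketch of the alignment in S2012/S2013: sentences longer than n words
# are truncated to their first n words; shorter ones are padded with the
# preset default character "dnn_padding", which counts as a single word.

PAD_TOKEN = "dnn_padding"

def align_sentence(words, n):
    if len(words) > n:
        return words[:n]  # keep the first n words, remove the rest
    return words + [PAD_TOKEN] * (n - len(words))  # pad up to n words

padded = align_sentence(["a", "b", "c", "d", "e", "f"], 10)   # 6 -> 10 words
truncated = align_sentence([f"w{i}" for i in range(12)], 10)  # 12 -> 10 words
```

After alignment every sentence has exactly n words, so the number of LSTM units in the coding model never needs to change between training steps.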
Referring to fig. 15, fig. 15 is a schematic structural diagram illustrating a dialog generating device 300 according to an embodiment of the present application, where the dialog generating device 300 includes an obtaining module 301 and a processing module 302.
The obtaining module 301 is configured to obtain a historical associated output statement and a current user input statement, where the historical associated output statement is an associated output statement corresponding to a previous user input statement adjacent to the current user input statement.
The processing module 302 is configured to use the historical associated output statement and the current user input statement as inputs of the first deep learning model, and generate a current associated output statement, where the current associated output statement is an associated output statement corresponding to the current user input statement.
Optionally, as a possible implementation, the first deep learning model includes an encoding model and a decoding model;
the processing module 302 is specifically configured to:
combining vectors corresponding to the historical associated output sentences and the current user input sentences by the coding model to obtain current associated input vectors;
the decoding model processes the current association input vector to generate a current association output statement.
Optionally, as a possible implementation manner, the coding model includes a first coding sub-model, a second coding sub-model, and a first vector combination sub-model;
the processing module 302 is specifically configured to:
the first coding sub-model codes the historical association output statement into a historical association output vector;
the second coding sub-model codes the current user input statement into a current user input vector;
and the first vector combination sub-model combines the historical association output vector and the current user input vector to obtain the current association input vector.
Optionally, as a possible implementation, the first deep learning model includes an encoding model and a decoding model;
the processing module 302 is specifically configured to:
the coding model codes both the historical associated output statement and the current user input statement to respectively obtain a historical associated output vector and a current user input vector;
the decoding model takes the historical associated output vector and the current user input vector as input to be processed to generate a current associated output statement.
Optionally, as a possible implementation manner, the coding model includes a first coding sub-model and a second coding sub-model;
the processing module 302 is specifically configured to:
the first coding sub-model codes the historical association output statement to obtain a historical association output vector;
the second coding sub-model codes the current user input sentence to obtain the current user input vector;
the decoding model comprises a second vector combination submodel and a sentence decoding submodel;
the processing module 302 is further specifically configured to:
the second vector combination sub-model combines the historical association output vector and the current user input vector to obtain a current association input vector;
and the sentence decoding submodel processes the current association input vector to generate a current association output sentence.
Optionally, as a possible implementation manner, the processing module 302 is further configured to:
judging whether the time difference value between the historical associated output statement and the current user input statement reaches a time threshold value or not;
and if the time difference value between the historical associated output statement and the current user input statement does not reach the time threshold value, taking the historical associated output statement and the current user input statement as the input of the first deep learning model, and generating the current associated output statement.
Optionally, as a possible implementation manner, the obtaining module 301 is further configured to obtain a plurality of training sample sets, where each training sample set includes a plurality of sets of adjacent question-answering sessions, and each set of question-answering sessions includes a training question sentence and a training answer sentence;
the dialog generating device 300 further includes:
the training module 303 is configured to train the second deep learning model using a plurality of training sample sets until the second deep learning model is trained, and use the second deep learning model after training as the first deep learning model, where when training the second deep learning model, inputs of the second deep learning model include training question statements of a target group question-and-answer session and training answer statements of a previous group question-and-answer session adjacent to the target group question-and-answer session.
Optionally, as a possible implementation manner, the training module 303 is specifically configured to:
taking the training question sentences in the current group of question-answering conversations and the training answer sentences in the previous group of question-answering conversations adjacent to the current group of question-answering conversations as the input of a second deep learning model to obtain current answer sentences;
if the deviation value of the current answer sentence and the training answer sentences in the current group of question-answer conversations is larger than the deviation threshold value, adjusting the parameters of the second deep learning model to obtain a new deep learning model as the second deep learning model, taking the next group of question-answer conversations adjacent to the current group of question-answer conversations as the new current group of question-answer conversations, and returning to retraining;
and if the deviation value of the current answer sentence and the training answer sentences in the current group of question-answer conversations is less than or equal to the deviation threshold value, taking the deep learning model corresponding to the current group of question-answer conversations as a second deep learning model for finishing training.
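A minimal numeric sketch of this training loop, assuming a toy linear model in place of the (unspecified) second deep learning model and a simple gradient-style parameter adjustment:

```python
import numpy as np

def train(sample_set, model_params, deviation_threshold=0.1, lr=0.5):
    """Toy version of the training loop: the current group's question and
    the previous adjacent group's answer are fed jointly to the model;
    training finishes once the deviation falls to the threshold or below."""
    i = 1  # start at the second group so an adjacent previous answer exists
    while i < len(sample_set):
        question, answer = sample_set[i]
        _, previous_answer = sample_set[i - 1]
        x = np.concatenate([previous_answer, question])
        predicted = model_params @ x  # stand-in for the second model
        deviation = np.abs(predicted - answer).mean()
        if deviation > deviation_threshold:
            # Adjust parameters (toy gradient step), then take the next
            # adjacent group as the new current group and continue.
            model_params = model_params - lr * np.outer(predicted - answer, x) / x.size
            i += 1
        else:
            # Deviation within threshold: training is finished.
            return model_params
    return model_params

rng = np.random.default_rng(1)
samples = [(rng.standard_normal(3), rng.standard_normal(3)) for _ in range(4)]
trained = train(samples, np.zeros((3, 6)))
print(trained.shape)  # → (3, 6)
```

The shapes are the only point of contact with the patent: each model input concatenates a previous answer with a current question, and the stopping rule compares a deviation value against a threshold.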
Optionally, as a possible implementation manner, the training module 303 is further configured to:
align each set of question-answering sessions so that each training question sentence and each training answer sentence in the plurality of training sample sets contains n words, where n is a positive integer.
Optionally, as a possible implementation manner, the training module 303 is further specifically configured to:
if the number of words in the target training question sentence or the target training answer sentence is larger than n, reserving n words in the target training question sentence or the target training answer sentence, and removing other words;
and if the number of words of the target training question sentence or the target training answer sentence is less than n, supplementing at least one word in the target training question sentence or the target training answer sentence so as to enable the number of words of the target training question sentence or the target training answer sentence to be n.
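The alignment rule can be sketched directly. The `<pad>` token and the choice to keep the first n words are assumptions, since the patent says only that n words are reserved or that at least one word is supplemented:

```python
PAD_TOKEN = "<pad>"  # illustrative padding symbol; the patent does not name one

def align_sentence(sentence, n):
    """Truncate or pad a whitespace-tokenized sentence to exactly n words."""
    words = sentence.split()
    if len(words) > n:
        words = words[:n]  # reserve n words, remove the others
    elif len(words) < n:
        words += [PAD_TOKEN] * (n - len(words))  # supplement words up to n
    return " ".join(words)

print(align_sentence("how long until my driver arrives", 4))  # → how long until my
print(align_sentence("thanks", 4))  # → thanks <pad> <pad> <pad>
```

Fixed-length inputs are what allow the model structure to stay unchanged across training steps, which is the speed benefit the embodiments claim.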
The modules may be connected or in communication with each other via a wired or wireless connection. The wired connection may include a metal cable, an optical cable, a hybrid cable, etc., or any combination thereof. The wireless connection may comprise a connection over a LAN, WAN, Bluetooth, ZigBee, NFC, or the like, or any combination thereof. Two or more modules may be combined into a single module, and any one module may be divided into two or more units.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application, or the portion thereof that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
In summary, according to the dialog generation method, apparatus, electronic device, and readable storage medium provided in the embodiments of the present application, the associated output sentence corresponding to the previous user input sentence adjacent to the current user input sentence is taken as the historical associated output sentence, and the historical associated output sentence and the current user input sentence are jointly used as the input of the first deep learning model to obtain the current associated output sentence as the reply to the current user input sentence. Compared with the prior art, the reply to the current user input sentence takes into account not only the content of the current user input sentence but also the content of the adjacent previous dialog turn, so that two adjacent dialog turns are connected more tightly. In addition, the relevance between the historical associated output statement and the current user input statement is judged by whether the time difference between them reaches a time threshold: when the two are judged to be related, both are used as the input of the first deep learning model to generate the current associated output statement; when they are judged to be unrelated, only the current user input statement is used as the input, which at least partially reduces the amount of calculation. Finally, each training question sentence and each training answer sentence in the training samples is aligned to the same number of words, so that every training input has the same word count, the structure of the model does not need to be changed, and the training speed of the model is improved.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (22)

1. A dialog generation method, characterized in that the method comprises:
acquiring a historical associated output statement and a current user input statement, wherein the historical associated output statement is an associated output statement corresponding to a last user input statement adjacent to the current user input statement;
and taking the historical associated output statement and the current user input statement as the input of a first deep learning model to generate a current associated output statement, wherein the current associated output statement is an associated output statement corresponding to the current user input statement.
2. The method of claim 1, wherein the first deep learning model comprises an encoding model and a decoding model;
the step of generating a current association output sentence by taking the historical association output sentence and the current user input sentence as the input of a first deep learning model comprises the following steps:
the coding model combines vectors corresponding to the historical associated output statement and the current user input statement to obtain a current associated input vector;
and the decoding model processes the current association input vector to generate the current association output statement.
3. The method of claim 2, wherein the coding model comprises a first coding sub-model, a second coding sub-model, and a first vector combination sub-model;
the coding model combines the vectors corresponding to the historical associated output statement and the current user input statement to obtain a current associated input vector, and the coding model comprises the following steps:
the first coding sub-model codes the historical association output statement into a historical association output vector;
the second coding sub-model codes the current user input statement into a current user input vector;
and the first vector combination sub-model combines the historical association output vector and the current user input vector to obtain the current association input vector.
4. The method of claim 1, wherein the first deep learning model comprises an encoding model and a decoding model;
the step of generating a current association output sentence by taking the historical association output sentence and the current user input sentence as the input of a first deep learning model comprises the following steps:
the coding model codes both the historical associated output statement and the current user input statement to respectively obtain a historical associated output vector and a current user input vector;
and the decoding model takes the historical association output vector and the current user input vector as input to be processed to generate the current association output statement.
5. The method of claim 4, wherein the coding model comprises a first coding sub-model and a second coding sub-model;
the step of coding both the history correlation output statement and the current user input statement by the coding model to respectively obtain a history correlation output vector and a current user input vector comprises the following steps:
the first coding sub-model codes the historical association output statement to obtain the historical association output vector;
the second coding sub-model codes the current user input statement to obtain the current user input vector;
the decoding model comprises a second vector combination submodel and a sentence decoding submodel;
the decoding model processes the historical association output vector and the current user input vector as input to generate the current association output statement, and the decoding model comprises the following steps:
the second vector combination sub-model combines the historical association output vector and the current user input vector to obtain a current association input vector;
and the sentence decoding submodel processes the current association input vector to generate the current association output sentence.
6. The method of any of claims 1-5, wherein prior to the step of generating a current associated output sentence using the historical associated output sentence and the current user input sentence as inputs to a first deep learning model, the method further comprises:
judging whether the time difference value between the historical associated output statement and the current user input statement reaches a time threshold value or not;
and if the time difference value between the historical associated output statement and the current user input statement does not reach the time threshold, executing a step of generating the current associated output statement by taking the historical associated output statement and the current user input statement as the input of a first deep learning model.
7. The method of claim 1, wherein prior to the step of obtaining the historical associated output statement and the current user input statement, the method further comprises:
obtaining a plurality of training sample sets, wherein each training sample set comprises a plurality of groups of adjacent question-answer conversations, and each group of question-answer conversations comprises training question sentences and training answer sentences;
training a second deep learning model by using the plurality of training sample sets until the training of the second deep learning model is finished, and taking the second deep learning model with the finished training as the first deep learning model, wherein when the second deep learning model is trained, the input of the second deep learning model comprises training question sentences of a target group question-answering conversation and training answer sentences of a last group question-answering conversation adjacent to the target group question-answering conversation.
8. The method of claim 7, wherein the step of training the second deep learning model using each training sample set comprises:
taking the training question sentences in the current group of question-answering conversations and the training answer sentences in the previous group of question-answering conversations adjacent to the current group of question-answering conversations as the input of the second deep learning model to obtain the current answer sentences;
if the deviation value of the current answer sentence and the training answer sentences in the current group of question-answer sessions is larger than the deviation threshold value, adjusting the parameters of the second deep learning model to obtain a new deep learning model as the second deep learning model, taking the next group of question-answer sessions adjacent to the current group of question-answer sessions as the new current group of question-answer sessions, and returning to retraining;
and if the deviation value between the current answer sentence and the training answer sentence in the current group of question-answer conversations is less than or equal to the deviation threshold value, taking the deep learning model corresponding to the current group of question-answer conversations as the second deep learning model for finishing the training.
9. The method of claim 7, wherein after the step of obtaining a plurality of training sample sets, the method further comprises:
aligning each set of question-answering sessions such that each training question sentence and each training answer sentence in the plurality of training sample sets contains n words, where n is a positive integer.
10. The method of claim 9, wherein the step of aligning each set of question-answering sessions comprises:
if the number of words in the target training question sentence or the target training answer sentence is larger than n, reserving n words in the target training question sentence or the target training answer sentence, and removing other words;
if the number of words in the target training question sentence or the target training answer sentence is less than n, at least one word is supplemented to the target training question sentence or the target training answer sentence, so that the number of words in the target training question sentence or the target training answer sentence is n.
11. A dialog generation apparatus, characterized in that the apparatus comprises:
an obtaining module, configured to obtain a historical associated output statement and a current user input statement, where the historical associated output statement is an associated output statement corresponding to a last user input statement adjacent to the current user input statement;
and the processing module is used for taking the historical associated output statement and the current user input statement as the input of a first deep learning model to generate a current associated output statement, wherein the current associated output statement is an associated output statement corresponding to the current user input statement.
12. The apparatus of claim 11, wherein the first deep learning model comprises an encoding model and a decoding model;
the processing module is specifically configured to:
the coding model combines vectors corresponding to the historical associated output statement and the current user input statement to obtain a current associated input vector;
and the decoding model processes the current association input vector to generate the current association output statement.
13. The apparatus of claim 12, wherein the coding model comprises a first coding sub-model, a second coding sub-model, and a first vector combination sub-model;
the processing module is specifically configured to:
the first coding sub-model codes the historical association output statement into a historical association output vector;
the second coding sub-model codes the current user input statement into a current user input vector;
and the first vector combination sub-model combines the historical association output vector and the current user input vector to obtain the current association input vector.
14. The apparatus of claim 11, wherein the first deep learning model comprises an encoding model and a decoding model;
the processing module is specifically configured to:
the coding model codes both the historical associated output statement and the current user input statement to respectively obtain a historical associated output vector and a current user input vector;
and the decoding model takes the historical association output vector and the current user input vector as input to be processed to generate the current association output statement.
15. The apparatus of claim 14, wherein the coding model comprises a first coding sub-model and a second coding sub-model;
the processing module is specifically configured to:
the first coding sub-model codes the historical association output statement to obtain the historical association output vector;
the second coding sub-model codes the current user input statement to obtain the current user input vector;
the decoding model comprises a second vector combination submodel and a sentence decoding submodel;
the processing module is further specifically configured to:
the second vector combination sub-model combines the historical association output vector and the current user input vector to obtain a current association input vector;
and the sentence decoding submodel processes the current association input vector to generate the current association output sentence.
16. The apparatus of any of claims 11-15, wherein the processing module is further configured to:
judging whether the time difference value between the historical associated output statement and the current user input statement reaches a time threshold value or not;
and if the time difference value between the historical associated output statement and the current user input statement does not reach the time threshold, taking the historical associated output statement and the current user input statement as the input of a first deep learning model to generate the current associated output statement.
17. The apparatus of claim 11,
the obtaining module is further configured to obtain a plurality of training sample sets, where each training sample set includes a plurality of sets of adjacent question-answer sessions, and each set of question-answer sessions includes a training question sentence and a training answer sentence;
the device further comprises:
and the training module is used for training a second deep learning model by using the plurality of training sample sets until the second deep learning model is trained, and taking the second deep learning model after the training as the first deep learning model, wherein when the second deep learning model is trained, the input of the second deep learning model comprises training question sentences of a target group question-answering conversation and training answer sentences of a last group question-answering conversation adjacent to the target group question-answering conversation.
18. The apparatus of claim 17, wherein the training module is specifically configured to:
taking the training question sentences in the current group of question-answering conversations and the training answer sentences in the previous group of question-answering conversations adjacent to the current group of question-answering conversations as the input of the second deep learning model to obtain the current answer sentences;
if the deviation value of the current answer sentence and the training answer sentences in the current group of question-answer sessions is larger than the deviation threshold value, adjusting the parameters of the second deep learning model to obtain a new deep learning model as the second deep learning model, taking the next group of question-answer sessions adjacent to the current group of question-answer sessions as the new current group of question-answer sessions, and returning to retraining;
and if the deviation value between the current answer sentence and the training answer sentence in the current group of question-answer conversations is less than or equal to the deviation threshold value, taking the deep learning model corresponding to the current group of question-answer conversations as the second deep learning model for finishing the training.
19. The apparatus of claim 17, wherein the training module is further configured to:
aligning each set of question-answering sessions such that each training question sentence and each training answer sentence in the plurality of training sample sets contains n words, where n is a positive integer.
20. The apparatus of claim 19, wherein the training module is further specifically configured to:
if the number of words in the target training question sentence or the target training answer sentence is larger than n, reserving n words in the target training question sentence or the target training answer sentence, and removing other words;
if the number of words in the target training question sentence or the target training answer sentence is less than n, at least one word is supplemented to the target training question sentence or the target training answer sentence, so that the number of words in the target training question sentence or the target training answer sentence is n.
21. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the dialog generation method according to any of claims 1 to 10.
22. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the dialog generation method according to one of the claims 1 to 10.
CN201910138890.5A 2019-02-25 2019-02-25 Dialog generation method and device, electronic equipment and readable storage medium Pending CN111611352A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910138890.5A CN111611352A (en) 2019-02-25 2019-02-25 Dialog generation method and device, electronic equipment and readable storage medium


Publications (1)

Publication Number Publication Date
CN111611352A true CN111611352A (en) 2020-09-01

Family

ID=72199949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910138890.5A Pending CN111611352A (en) 2019-02-25 2019-02-25 Dialog generation method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111611352A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364665A (en) * 2020-10-11 2021-02-12 广州九四智能科技有限公司 Semantic extraction method and device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060129409A1 (en) * 2004-10-08 2006-06-15 Kenji Mizutani Dialog supporting apparatus
CN106777013A (en) * 2016-12-07 2017-05-31 科大讯飞股份有限公司 Dialogue management method and apparatus
CN109086329A (en) * 2018-06-29 2018-12-25 出门问问信息科技有限公司 Dialogue method and device are taken turns in progress based on topic keyword guidance more


Similar Documents

Publication Publication Date Title
CN108427771B (en) Abstract text generation method and device and computer equipment
CN107590192B (en) Mathematical processing method, device, equipment and storage medium for text questions
CN109785824B (en) Training method and device of voice translation model
JP2022177220A (en) Method for training text recognition model, method for recognizing text, and device for recognizing text
CN110147435B (en) Dialogue generation method, device, equipment and storage medium
CN111190600B (en) Method and system for automatically generating front-end codes based on GRU attention model
CN112509555B (en) Dialect voice recognition method, device, medium and electronic equipment
EP4087239A1 (en) Image compression method and apparatus
CN115309877B (en) Dialogue generation method, dialogue model training method and device
CN112633003A (en) Address recognition method and device, computer equipment and storage medium
CN108959388B (en) Information generation method and device
CN111400481B (en) Method and device for generating reply sentences for multiple rounds of conversations
CN113886643A (en) Digital human video generation method and device, electronic equipment and storage medium
CN116884391B (en) Multimode fusion audio generation method and device based on diffusion model
CN113240510A (en) Abnormal user prediction method, device, equipment and storage medium
CN109829040B (en) Intelligent conversation method and device
CN108388549B (en) Information conversion method, information conversion device, storage medium and electronic device
CN111611352A (en) Dialog generation method and device, electronic equipment and readable storage medium
CN112579760A (en) Man-machine conversation method and device, computer equipment and readable storage medium
CN113963358B (en) Text recognition model training method, text recognition device and electronic equipment
CN116189678A (en) Voice processing method and device and computer equipment
CN116108810A (en) Text data enhancement method and device
CN111310460B (en) Statement adjusting method and device
CN114358023A (en) Intelligent question-answer recall method and device, computer equipment and storage medium
CN114911911A (en) Multi-turn conversation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination