Disclosure of Invention
The invention aims to provide an artificial-intelligence-based dialog generation method, device, equipment, and storage medium that improve the output accuracy of a chatbot.
To solve the above technical problem, an embodiment of the present invention provides an artificial-intelligence-based dialog generation method, including the following steps: acquiring a sentence to be replied and inputting it into a retrieval model, wherein the retrieval model screens out, from a preset dialogue corpus, K candidate replies responding to the sentence to be replied, K being a positive integer; acquiring the K candidate replies output by the retrieval model, inputting the sentence to be replied and the K candidate replies into a generative model, screening out predicted words with the generative model according to the sentence to be replied, the K candidate replies, and the inverse document frequency of each word in a dictionary, and outputting a predicted reply composed of the predicted words; and acquiring a reply sentence according to the predicted reply.
An embodiment of the present invention further provides an artificial-intelligence-based dialog generation device, including: a candidate reply retrieval module, configured to acquire a sentence to be replied and input it into a retrieval model, wherein the retrieval model screens out, from a preset dialogue corpus, K candidate replies responding to the sentence to be replied, K being a positive integer; a predicted reply acquisition module, configured to acquire the K candidate replies output by the retrieval model and input the sentence to be replied and the K candidate replies into a generative model, wherein the generative model screens out predicted words according to the sentence to be replied, the K candidate replies, and the inverse document frequency of each word in a dictionary, and outputs a predicted reply composed of the predicted words; and a reply sentence acquisition module, configured to acquire a reply sentence according to the predicted reply.
An embodiment of the present invention further provides a network device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the artificial intelligence based dialog generation method described above.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program, which when executed by a processor implements the above-mentioned artificial intelligence-based dialog generation method.
Compared with the prior art, the embodiment of the invention inputs the sentence to be replied into the retrieval model to obtain K candidate replies; the sentence to be replied and the K candidate replies are then input into the generative model, so that the generative model can exploit both the sentence to be replied and the K retrieved candidate replies, making its output better formed; furthermore, because the generative model incorporates the inverse document frequency of each word in the dictionary when screening predicted words, the probability of high-frequency words being chosen as replies is reduced, fewer generic answers are generated, and the output accuracy of the chatbot is improved.
In addition, after the sentence to be replied and the K candidate replies are input into the generative model, the method further comprises: encoding the sentence to be replied and the K candidate replies to obtain a vector to be replied and K candidate reply vectors; obtaining a context vector from the vector to be replied and the K candidate reply vectors; calculating a composite score for each word in the dictionary according to the context vector and the inverse document frequency, and taking the word with the highest composite score as the predicted word; and obtaining a predicted reply composed of the predicted words. Because the sentence to be replied and the K candidate replies are encoded into vectors, the context vector is obtained from the encoding result, and the context vector serves as the input of the decoder of the generative model, the decoder input is optimized; the generative model can fully learn the information of the candidate replies found by the retrieval model as well as the expression patterns of the sentence to be replied and the K candidate replies, so its output is better formed and more accurate.
In addition, encoding the sentence to be replied and the K candidate replies includes: encoding the sentence to be replied and the K candidate replies with the same encoder. Inputting the sentence to be replied and the K candidate replies into the same encoder of the generative model lets the encoder fully learn their expression patterns, which optimizes the output accuracy of the generative model and gives the encoder model stronger generalization capability.
In addition, obtaining a context vector from the vector to be replied and the K candidate reply vectors includes: mapping the vector to be replied and the K candidate reply vectors to different vector spaces, splicing them, and obtaining the context vector from the splicing result. Mapping the vector to be replied and the K candidate reply vectors to different vector spaces lets the generative model distinguish the information conveyed by the sentence to be replied from that conveyed by the K candidate replies; obtaining the context vector from the splicing result lets the generative model generate the dialog's reply sentence using the information and expression patterns carried by the context vector, improving the output accuracy of the generative model.
In addition, the generative model comprises a first parameter matrix and a second parameter matrix; mapping the vector to be replied and the K candidate reply vectors to different vector spaces, splicing them, and obtaining the context vector from the splicing result includes the following steps: multiplying the vector to be replied by the first parameter matrix to obtain a transformed vector to be replied; multiplying the K candidate reply vectors by the second parameter matrix to obtain K transformed candidate reply vectors; and splicing the transformed vector to be replied with the K transformed candidate reply vectors to obtain the context vector. Multiplying the vector to be replied and the K candidate reply vectors by the first and second parameter matrices respectively can lower the weight of meaningless answers among the K candidate replies and raise the weight of the sentence to be replied and of meaningful answers among the K candidate replies, optimizing the decoder input and hence the output of the generative model.
In addition, calculating a composite score for each word in the dictionary according to the context vector and the inverse document frequency includes: calculating the composite score of each word according to the following first calculation formula:
P(y_t | y_{t-1}, q, r) = α*softmax_score(w) + β*idf(w);

wherein P(y_t | y_{t-1}, q, r) is the composite score of each word, y_t is the predicted word at time t, q is the sentence to be replied, r is a candidate reply, α and β are preset parameters of the generative model, idf(w) is the inverse document frequency of each word, and softmax_score(w) is the normalized exponential function value of each word, calculated with the following second calculation formula:

softmax_score(w) = softmax(h_t), with h_t = f(h_{t-1}, y_{t-1}, C_input);

wherein h_t is the output of the hidden layer of the generative model at time t, f is the decoder's recurrent unit, and C_input is the context vector.
In addition, acquiring a reply sentence according to the predicted reply includes: inputting the predicted reply and the K candidate replies into a preset classification model, and taking the result output by the preset classification model as the reply sentence.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the various embodiments to provide a better understanding of the present application; the technical solution claimed in the present application can nevertheless be implemented without these technical details, and with various changes and modifications based on the following embodiments.
A first embodiment of the present invention relates to an artificial-intelligence-based dialog generation method: a sentence to be replied is acquired and input into a retrieval model, which screens out, from a preset dialogue corpus, K candidate replies responding to the sentence to be replied, K being a positive integer; the K candidate replies output by the retrieval model are acquired, the sentence to be replied and the K candidate replies are input into a generative model, which screens out predicted words according to the sentence to be replied, the K candidate replies, and the inverse document frequency of each word in a dictionary, and outputs a predicted reply composed of the predicted words; and a reply sentence is acquired according to the predicted reply. Because the candidate replies obtained by the retrieval model are fed into the generative model, the generative model can exploit the information in the retrieved candidate replies, combining the retrieval model with the generative model and optimizing the generative model's output; in addition, because the generative model screens predicted words using the inverse document frequency of each word, the probability of high-frequency words being chosen as replies is reduced, fewer generic answers are generated, and the output accuracy of the chatbot is improved.
It should be noted that the execution subject of this embodiment may be a server, or a chip in a specific product, for example, a chip in a chatbot. The following description takes the server as an example.
A flow diagram of the artificial-intelligence-based dialog generation method provided by this embodiment is shown in fig. 1; the method specifically includes the following steps:
S101: Acquire a sentence to be replied and input it into a retrieval model, wherein the retrieval model screens out, from a preset dialogue corpus, K candidate replies responding to the sentence to be replied, K being a positive integer.
S102: Acquire the K candidate replies output by the retrieval model and input the sentence to be replied and the K candidate replies into the generative model, wherein the generative model screens out predicted words according to the sentence to be replied, the K candidate replies, and the inverse document frequency of each word in the dictionary, and outputs a predicted reply composed of the predicted words.
S103: Acquire a reply sentence according to the predicted reply.
The preset dialogue corpus may be pre-stored in a database and may consist of a number of dialogues or question-answer groups. The retrieval model can be obtained by extracting features of questions and replies and then training on those features with a machine learning algorithm. The generative model can be obtained by training a neural network model on the dialogues or question-answer groups of the preset dialogue corpus. This embodiment places no specific limitation on the machine learning algorithm used for the retrieval model or on the neural network model trained for the generative model.
The sentence to be replied is the sentence for which a reply is generated; it may be interrogative or non-interrogative, that is, it need not take the form of a question and may be any sentence. Optionally, the sentence to be replied is input by the user at a client and then sent by the client to the server, so that the server obtains it. The sentence to be replied may be in text or speech form; optionally, when it is in speech form, the client or the server converts the speech into text for input into the retrieval model and the subsequent computations. A candidate reply is a reply sentence, screened out by retrieval, that responds to the sentence to be replied; there are K of them, K is a positive integer, and the value of K can be set according to actual conditions without specific limitation here. The predicted reply is the reply sentence composed of the predicted words generated by the generative model; it is understood that there is one predicted reply.
The dictionary may be generated from the preset dialogue corpus, that is, it may consist of all distinct words appearing in the preset dialogue corpus. The Inverse Document Frequency (IDF) of a word is the logarithm of the reciprocal of its document frequency; the general calculation formula is:

idf(w) = log(N / N_w);

wherein N is the total number of documents and N_w is the number of documents containing the word w. In this embodiment, since the preset dialogue corpus consists entirely of sentences, the inverse document frequency of each word in the dictionary is calculated as:

idf(w) = log(total number of sentences in the corpus / number of sentences containing w).
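As an illustrative aid (not part of the claimed embodiment), a minimal Python sketch of this sentence-level idf computation might look as follows; the toy corpus and the natural-logarithm choice are assumptions:

```python
import math
from collections import Counter

def build_idf(sentences):
    """idf(w) = log(N / N_w), with N the number of sentences in the
    corpus and N_w the number of sentences containing the word w."""
    n = len(sentences)
    doc_freq = Counter()
    for sent in sentences:
        doc_freq.update(set(sent.split()))  # count each word once per sentence
    return {w: math.log(n / df) for w, df in doc_freq.items()}

# Toy preset dialogue corpus (illustrative only).
corpus = ["i do not know", "the weather is nice today", "i like nice weather"]
idf = build_idf(corpus)
# High-frequency words such as "i" or "nice" get lower idf values,
# which later lowers their composite score during word prediction.
```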
it is understood that the neural network model in the generative model is generally selected by calculating a normalized exponential function value (softmax value) of the word to select the predicted word, i.e., the word with the largest softmax value is used as the predicted word. In this embodiment, alternatively, the softmax value and the IDF value of the word may be given different weighting coefficients, respectively, and then the calculation may be performed, and the word corresponding to the value having the largest calculation result may be used as the predicted word. Because the predictive words are screened out by combining the generated model with the inverse document frequency of each word, the probability of high-frequency words as replies can be reduced, thereby reducing the generation of universal answers such as 'I does not know' and 'Ha' and the like, and further leading the reply sentences output by the chat robot to be more reasonable.
Specifically, the server inputs the sentence to be replied into the trained retrieval model, which screens out, from the preset dialogue corpus, K candidate replies responding to it; the server obtains the K candidate replies output by the retrieval model and inputs them, together with the sentence to be replied, into the trained generative model, which screens out predicted words one by one according to the sentence to be replied, the K candidate replies, and the inverse document frequency of each word in the dictionary, and then composes the predicted words into a predicted reply. It is understood that the information in the sentence to be replied and in the K candidate replies retrieved by the retrieval model can thus be exploited by the generative model, improving how well formed the predicted reply is. Optionally, the sentence to be replied and the K candidate replies may be combined into one matrix as the input of the neural network model in the generative model. Alternatively, the generative model may include an encoder and a decoder, with the sentence to be replied and the K candidate replies encoded into vector form by the encoder and then fed into the decoder as a matrix. After the generative model produces the predicted reply, the server obtains it and then acquires a reply sentence from it. Optionally, the server takes the predicted reply as the reply sentence and outputs it to the client. The reply sentence may also be converted from text to speech and output to the client in speech form.
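To make the overall flow concrete, a schematic sketch of the server-side pipeline is given below; retrieve, generate, and rerank are hypothetical stand-ins for the trained retrieval model, the generative model, and the optional classification model described later:

```python
def reply_to(sentence_to_reply: str, k: int = 5) -> str:
    """S101-S103 in outline: retrieve K candidates, generate a predicted
    reply conditioned on them, then choose the final reply sentence."""
    candidates = retrieve(sentence_to_reply, k)          # K candidate replies
    predicted = generate(sentence_to_reply, candidates)  # predicted reply
    # Optionally re-rank the predicted reply together with the candidates.
    return rerank(sentence_to_reply, [predicted] + candidates)
```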
Compared with the prior art, this embodiment inputs the acquired sentence to be replied into the retrieval model to obtain K candidate replies; the sentence to be replied and the K candidate replies are then input into the generative model, so that the generative model can exploit both, making its output better formed; furthermore, because the generative model incorporates the inverse document frequency of each word in the dictionary when screening predicted words, the probability of high-frequency words being chosen as replies is reduced, fewer generic answers are generated, and the output accuracy of the chatbot is improved.
In a specific example, fig. 2 shows another schematic flow chart of the artificial-intelligence-based dialog generation method provided in this embodiment, which specifically includes the following steps:
S101: Acquire a sentence to be replied and input it into a retrieval model, wherein the retrieval model screens out, from a preset dialogue corpus, K candidate replies responding to the sentence to be replied, K being a positive integer.
S1021: Acquire the K candidate replies output by the retrieval model and input the sentence to be replied and the K candidate replies into the generative model.
S1022: Encode the sentence to be replied and the K candidate replies to obtain a vector to be replied and K candidate reply vectors.
S1023: Obtain a context vector from the vector to be replied and the K candidate reply vectors.
S1024: Calculate a composite score for each word in the dictionary according to the context vector and the inverse document frequency, and take the word with the highest composite score as the predicted word.
S1025: Obtain a predicted reply composed of the predicted words.
S103: Acquire a reply sentence according to the predicted reply.
Steps S101, S103, and S1021 are the same as described above and are not repeated here.
In S1022, optionally, the generative model includes an encoder; the server encodes the sentence to be replied and the K candidate replies through the encoder, obtaining the vector to be replied from the sentence to be replied and the K candidate reply vectors from the K candidate replies. The encoder model may be an LSTM, GRU, or Transformer model, without limitation here.
It is to be understood that the generative model may encode the sentence to be replied and the K candidate replies through different encoders. Optionally, encoding the sentence to be replied and the K candidate replies means encoding both with the same encoder.
Inputting the sentence to be replied and the K candidate replies into the same encoder of the generative model lets the encoder fully learn their expression patterns, optimizing the output accuracy of the generative model. It is understood that the encoder is trained in the same way; as the encoder is used more, its internal model keeps learning, so the encoder model's generalization capability grows stronger.
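For illustration only, here is a minimal PyTorch sketch of one shared encoder (a bidirectional GRU, anticipating the worked example below) applied to both the sentence to be replied and the K candidate replies; the vocabulary size, the dimensions, and the use of the final hidden states are assumptions:

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """A single encoder reused for the query and every candidate reply."""
    def __init__(self, vocab_size, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden, bidirectional=True, batch_first=True)

    def forward(self, token_ids):                   # (batch, seq_len)
        _, h_n = self.gru(self.embed(token_ids))    # h_n: (2, batch, hidden)
        # Concatenate the forward and backward final states into one vector.
        return torch.cat([h_n[0], h_n[1]], dim=-1)  # (batch, 2*hidden)

encoder = SharedEncoder(vocab_size=10000)
q_ids = torch.randint(0, 10000, (1, 12))                      # toy ids for q
r_ids = [torch.randint(0, 10000, (1, 10)) for _ in range(3)]  # K=3 candidates
h_q = encoder(q_ids)                   # vector to be replied
h_r = [encoder(ids) for ids in r_ids]  # K candidate reply vectors
```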
In S1023, obtaining a context vector from the vector to be replied and the K candidate reply vectors may specifically be: mapping the vector to be replied and the K candidate reply vectors to different vector spaces, splicing them, and obtaining the context vector from the splicing result.
Specifically, mapping the vector to be replied and the K candidate reply vectors to different vector spaces may be implemented by multiplying them by different parameter matrices. Optionally, when splicing the mapped vector to be replied with the K mapped candidate reply vectors, the mapped vector to be replied may be spliced with each mapped candidate reply vector in turn and the K splicing results assembled into a matrix used as the context vector; alternatively, the mapped vector to be replied may be directly spliced with the K mapped candidate reply vectors into one long vector used as the context vector.
Mapping the vector to be replied and the K candidate reply vectors to different vector spaces lets the generative model distinguish the information conveyed by the sentence to be replied from that conveyed by the K candidate replies; obtaining the context vector from the splicing result lets the generative model generate the dialog reply using the information and expression patterns carried by the context vector, improving the output accuracy of the generative model.
Optionally, the generative model comprises a first parameter matrix and a second parameter matrix, which can be obtained by training on the existing question-answer groups in the preset dialogue corpus. Mapping the vector to be replied and the K candidate reply vectors to different vector spaces, splicing them, and obtaining the context vector from the splicing result then specifically includes: multiplying the vector to be replied by the first parameter matrix to obtain a transformed vector to be replied; multiplying the K candidate reply vectors by the second parameter matrix to obtain K transformed candidate reply vectors; and splicing the transformed vector to be replied with the K transformed candidate reply vectors to obtain the context vector.
Splicing the transformed vector to be replied with the K transformed candidate reply vectors means splicing the transformed vector to be replied with each of the K transformed candidate reply vectors and assembling the K splicing results into a matrix used as the context vector. Optionally, the generative model further comprises a decoder; the context vector is input into the decoder, which produces the output of the generative model from the context vector and the neural network model it adopts. The decoder may be an LSTM, GRU, Transformer, or similar model.
Multiplying the vector to be replied and the K candidate reply vectors by the first and second parameter matrices respectively can lower the weight of meaningless answers among the K candidate replies and raise the weight of the sentence to be replied and of meaningful answers among the K candidate replies, optimizing the decoder input and hence the output of the generative model.
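Continuing the encoder sketch above (and reusing its nn, torch, h_q, and h_r), the two parameter matrices can be realized as learned linear maps; splicing the transformed query with each transformed candidate and stacking the K results follows the matrix-form splicing option described above, which is an assumption about the layout:

```python
W_q = nn.Linear(512, 256, bias=False)   # first parameter matrix
W_r = nn.Linear(512, 256, bias=False)   # second parameter matrix

v_q = W_q(h_q)                 # transformed vector to be replied
v_r = [W_r(h) for h in h_r]    # K transformed candidate reply vectors
# Splice the transformed query with each transformed candidate, then
# stack the K results into the matrix used as the context vector.
C_input = torch.stack([torch.cat([v_q, v], dim=-1) for v in v_r], dim=1)
# C_input: (batch, K, 512) -- one row per (query, candidate) pair
```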
In S1024, the server computes a softmax value from the context vector, assigns different weighting coefficients to the context vector's softmax value and to the inverse document frequency of each word in the dictionary, multiplies the softmax value and the inverse document frequency by their respective weighting coefficients, sums them to obtain the composite score of each word, and selects the word with the highest composite score as the predicted word.
Alternatively, the composite score for each word may be calculated using the following first calculation formula:
P(y_t | y_{t-1}, q, r) = α*softmax_score(w) + β*idf(w);

wherein P(y_t | y_{t-1}, q, r) is the composite score of each word, y_t is the predicted word at time t, q is the sentence to be replied, r is a candidate reply, α and β are preset parameters of the generative model (i.e., the weighting coefficients above), idf(w) is the inverse document frequency of each word, and softmax_score(w) is the normalized exponential function value of each word, calculated with the following second calculation formula:

softmax_score(w) = softmax(h_t), with h_t = f(h_{t-1}, y_{t-1}, C_input);

wherein h_t is the output of the hidden layer of the generative model at time t, f is the decoder's recurrent unit, and C_input is the context vector.
It can be understood that the generative model generates the reply word by word; the server obtains the predicted reply from all the predicted words and then acquires the reply sentence from the predicted reply. Optionally, the server may take the predicted reply as the reply sentence.
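A simplified numpy sketch of the word-selection rule of S1024 follows; the decoder logits, the dictionary size, and the α/β values are toy assumptions, and a real model would learn α and β during training:

```python
import numpy as np

def pick_word(decoder_logits, idf_vec, alpha=0.8, beta=0.2):
    """Composite score: alpha * softmax_score(w) + beta * idf(w)."""
    e = np.exp(decoder_logits - decoder_logits.max())
    softmax_score = e / e.sum()
    composite = alpha * softmax_score + beta * idf_vec
    return int(composite.argmax())        # index of the predicted word

# Toy 5-word dictionary: word 0 is a high-frequency word with the top
# softmax score, but its low idf lets the more specific word 3 win.
logits = np.array([2.0, 0.5, 0.1, 1.9, 0.2])
idf_vec = np.array([0.1, 1.2, 2.0, 1.5, 1.8])
print(pick_word(logits, idf_vec))         # -> 3
```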
Because the sentence to be replied and the K candidate replies are encoded into vectors, the context vector is obtained from the encoding result, and the context vector serves as the input of the decoder of the generative model, the decoder input is optimized; the generative model can fully learn the information of the candidate replies found by the retrieval model as well as the expression patterns of the sentence to be replied and the K candidate replies, so its output is better formed and more accurate.
In a specific example, in S103, acquiring the reply sentence according to the predicted reply may include: inputting the predicted reply and the K candidate replies into a preset classification model and taking the result output by the preset classification model as the reply sentence.
The preset classification model may be an algorithm model such as a decision tree, a support vector machine, or a random forest. Preferably, the preset classification model is an xgboost model.
Specifically, the server combines the predicted reply obtained by the generative model and the K candidate replies into a larger candidate answer set, inputs these answers into the preset classification model, and takes the result produced by the preset classification model as the final result, i.e., the reply sentence.
Further screening the predicted reply and the K candidate replies through the preset classification model makes the reply sentence more accurate and improves the output accuracy of the chatbot.
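A hedged sketch of this re-ranking step with an xgboost classifier is given below; the overlap features, training data, and hyperparameters are illustrative assumptions, since the embodiment does not fix how candidate answers are featurized:

```python
import numpy as np
from xgboost import XGBClassifier

def pair_features(query, answer):
    """Toy (query, answer) features: word overlap and length ratio."""
    q, a = set(query.split()), set(answer.split())
    return [len(q & a) / max(len(q), 1), len(a) / max(len(q), 1)]

# Tiny illustrative training set of labeled question-answer pairs.
train_pairs = [("how are you", "i am fine thanks"), ("how are you", "blue cat")]
train_labels = [1, 0]
clf = XGBClassifier(n_estimators=50, max_depth=3)
clf.fit(np.array([pair_features(q, a) for q, a in train_pairs]), train_labels)

def rerank(query, answers):
    """Score the predicted reply plus the K candidates; return the best."""
    scores = clf.predict_proba(
        np.array([pair_features(query, a) for a in answers]))[:, 1]
    return answers[int(np.argmax(scores))]
```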
Please refer to fig. 3, which illustrates the specific principle of the artificial-intelligence-based dialog generation method of this embodiment. In the figure, the Retrieval model is the retrieval model, the Encoder model and Decoder model are the encoder and decoder in the generative model, and Word idf refers to the inverse document frequency of a word. A specific example follows:
the problem vector cosine value is taken as a search model, and the bidirectional GRU is taken as a model of a generative model encoder and decoder to be explained as an example, and the specific flow is as follows:
(1) Denote the sentence to be replied input by the user as q and encode it as a vector v_q; denote the questions in all question-answer pairs in the preset dialogue corpus as Q_i (i = 1, 2, …, n), n being the number of question-answer pairs, and encode each as a vector v_{Q_i}, as follows:

v_q = sentence_encoding(q), v_{Q_i} = sentence_encoding(Q_i);

the sentence_encoding model here may be a word-vector summation model, namely:

v_s = Σ_{w∈s} word_embedding(w);

wherein s is a sentence to be encoded as a vector (the sentence to be replied or any of the questions in (1)), w is a word in s, and word_embedding may adopt a pre-trained model such as word2vec.
(2) Compute the cosine similarity between v_q and each v_{Q_i}, select the answers corresponding to the K questions with the largest values as candidate replies, and denote them as {r_1, r_2, …, r_K};
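Steps (1) and (2) can be sketched compactly in Python; the random embedding table stands in for a pre-trained word2vec model, and the feature dimension is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
word_embedding = {}   # stand-in for a pre-trained word2vec lookup table

def sentence_encoding(sentence):
    """v_s = sum of the word vectors of the words in s (word-vector summation)."""
    return np.sum([word_embedding.setdefault(w, rng.normal(size=50))
                   for w in sentence.split()], axis=0)

def retrieve(q, questions, answers, k=2):
    """Return the answers of the K questions most cosine-similar to q."""
    v_q = sentence_encoding(q)
    v_Q = [sentence_encoding(Q) for Q in questions]
    cos = [float(v_q @ v) / (np.linalg.norm(v_q) * np.linalg.norm(v))
           for v in v_Q]
    top = np.argsort(cos)[::-1][:k]       # indices of the K best matches
    return [answers[i] for i in top]      # candidate replies {r_1, ..., r_K}
```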
(3) Count the words in the preset dialogue corpus to generate a dictionary, and compute the idf value of each word in the dictionary;
(4) Feed q and {r_1, r_2, …, r_K} into the same bidirectional GRU model to obtain the corresponding sentence vectors h_q and {h_{r_1}, h_{r_2}, …, h_{r_K}}, wherein

h_q = [h_q^forward ; h_q^backward];

h_q^forward denotes the encoding result of the forward GRU on q and h_q^backward denotes the encoding result of the backward GRU on q (the h_{r_i} are formed in the same way).
(5) Apply a space transformation to the results of (4) using W_q (the first parameter matrix) and W_r (the second parameter matrix) to obtain the final q vector and r vectors, and splice them, as follows:

v_q = W_q · h_q, v_i = W_r · h_{r_i} (i = 1, 2, …, K);

C_input = [v_q, v_1, v_2, …, v_K];

C_input denotes the context vector jointly obtained from the user query q and the results r of the retrieval model, and serves as one of the inputs to the decoder.
(6) The decoder generates a response from the context vector provided in (5). The calculation method is as follows:
P(y_t | y_{t-1}, q, r) = α*softmax_score(w) + β*idf(w);

specifically,

h_t = GRU(h_{t-1}, y_{t-1}, C_input), softmax_score(w) = softmax(h_t);

wherein y_init is a random initialization vector, and α and β are model parameters representing the weights of the softmax and idf values, which need to be learned during model training.

Specifically, the hidden-layer result h_t of the current timestamp is computed from the context vector of (5), the last predicted word y_{t-1}, and the hidden-layer output h_{t-1} of the GRU at the last timestamp; the softmax value of each word in the dictionary is then computed, giving each word's softmax_score; a new score is computed together with the idf value of (3), and the result is each word's final score, the highest of which is selected as the current predicted word y_t. The predicted word, together with all previous predictions and the context vector of (5), serves as the input for the next prediction, and in this way a complete sentence is generated as the output of the generative model.
(7) Combine the retrieval results of the retrieval model in (2) and the output of the generative model in (6) into a larger candidate answer set, then use an xgboost model to screen out, from the candidate answer set, the answer that best matches the sentence to be replied, and return that answer to the user as the reply sentence.
The steps of the above methods are divided for clarity of description; in implementation they may be combined into one step, or a step may be split into multiple steps, and as long as the same logical relationship is preserved this falls within the protection scope of this patent; adding insignificant modifications to, or introducing insignificant designs into, an algorithm or process without changing its core design also falls within the protection scope of this patent.
A second embodiment of the present invention relates to an artificial-intelligence-based dialog generation device, as shown in fig. 4, including: a candidate reply retrieval module 301, a predicted reply acquisition module 302, and a reply sentence acquisition module 303. Specifically:
the candidate reply retrieval module 301 is configured to acquire a sentence to be replied and input it into a retrieval model, wherein the retrieval model screens out, from a preset dialogue corpus, K candidate replies responding to the sentence to be replied, K being a positive integer;
the predicted reply acquisition module 302 is configured to acquire the K candidate replies output by the retrieval model and input the sentence to be replied and the K candidate replies into a generative model, wherein the generative model screens out predicted words according to the sentence to be replied, the K candidate replies, and the inverse document frequency of each word in the dictionary, and outputs a predicted reply composed of the predicted words;
the reply sentence acquisition module 303 is configured to acquire a reply sentence according to the predicted reply.
Further, the predicted reply acquisition module 302 is further configured to: encode the sentence to be replied and the K candidate replies to obtain a vector to be replied and K candidate reply vectors; obtain a context vector from the vector to be replied and the K candidate reply vectors; calculate a composite score for each word in the dictionary according to the context vector and the inverse document frequency, and take the word with the highest composite score as the predicted word; and obtain a predicted reply composed of the predicted words.
Further, encoding the sentence to be replied and the K candidate replies includes: encoding the sentence to be replied and the K candidate replies with the same encoder.
Further, obtaining a context vector from the vector to be replied and the K candidate reply vectors includes: mapping the vector to be replied and the K candidate reply vectors to different vector spaces, splicing them, and obtaining the context vector from the splicing result.
Further, the generative model comprises a first parameter matrix and a second parameter matrix; mapping the vector to be replied and the K candidate reply vectors to different vector spaces, splicing them, and obtaining the context vector from the splicing result includes: multiplying the vector to be replied by the first parameter matrix to obtain a transformed vector to be replied; multiplying the K candidate reply vectors by the second parameter matrix to obtain K transformed candidate reply vectors; and splicing the transformed vector to be replied with the K transformed candidate reply vectors to obtain the context vector.
Further, calculating a composite score for each word in the dictionary according to the context vector and the inverse document frequency includes: calculating the composite score of each word according to the following first calculation formula:
P(y_t | y_{t-1}, q, r) = α*softmax_score(w) + β*idf(w);

wherein P(y_t | y_{t-1}, q, r) is the composite score of each word, y_t is the predicted word at time t, q is the sentence to be replied, r is a candidate reply, α and β are preset parameters of the generative model, idf(w) is the inverse document frequency of each word, and softmax_score(w) is the normalized exponential function value of each word, calculated with the following second calculation formula:

softmax_score(w) = softmax(h_t), with h_t = f(h_{t-1}, y_{t-1}, C_input);

wherein h_t is the output of the hidden layer of the generative model at time t, f is the decoder's recurrent unit, and C_input is the context vector.
Further, the reply sentence acquisition module 303 is further configured to: input the predicted reply and the K candidate replies into a preset classification model and take the result output by the preset classification model as the reply sentence.
It should be understood that this embodiment is the device embodiment corresponding to the first embodiment and can be implemented in cooperation with it. The related technical details mentioned in the first embodiment remain valid in this embodiment and are not repeated here to reduce repetition; correspondingly, the related technical details mentioned in this embodiment also apply to the first embodiment.
It should be noted that each module in this embodiment is a logical module; in practical applications, a logical unit may be one physical unit, part of one physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present invention, units less closely related to solving the technical problem proposed by the present invention are not introduced in this embodiment, which does not mean that no other units exist in this embodiment.
A third embodiment of the invention is directed to a network device, as shown in fig. 5, comprising at least one processor 401; and a memory 402 communicatively coupled to the at least one processor 401; the memory 402 stores instructions executable by the at least one processor 401, and the instructions are executable by the at least one processor 401 to enable the at least one processor 401 to perform the artificial intelligence based dialog generation method described above.
Where the memory 402 and the processor 401 are coupled by a bus, which may include any number of interconnected buses and bridges that couple one or more of the various circuits of the processor 401 and the memory 402 together. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 401 may be transmitted over a wireless medium via an antenna, which may receive the data and transmit the data to the processor 401.
The processor 401 is responsible for managing the bus and general processing and may provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 402 may be used to store data used by processor 401 in performing operations.
A fourth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.
That is, those skilled in the art can understand that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware; the program is stored in a storage medium and includes several instructions to make a device (which may be a microcontroller, a chip, or the like) or a processor execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It will be understood by those of ordinary skill in the art that the above embodiments are specific examples of carrying out the invention, and that various changes in form and detail may be made in practice without departing from the spirit and scope of the invention.