WO2021051508A1 - Robot dialogue generating method and apparatus, readable storage medium, and robot - Google Patents

Robot dialogue generating method and apparatus, readable storage medium, and robot Download PDF

Info

Publication number
WO2021051508A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
word
vector
dialogue
cluster
Prior art date
Application number
PCT/CN2019/116630
Other languages
French (fr)
Chinese (zh)
Inventor
于凤英
王健宗
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021051508A1 publication Critical patent/WO2021051508A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification

Definitions

  • This application belongs to the field of computer technology, and in particular relates to a method and device for generating a robot dialogue, a computer non-volatile readable storage medium, and a robot.
  • In robot intelligent dialogue technology, clustering the sentences generated during a dialogue is the basis for ensuring that the robot can conduct an effective dialogue.
  • In the prior art, sentence clustering is generally based on keyword matching. This approach only considers local features of a sentence, namely the keyword features, and lacks an overall view of the sentence, so the accuracy of the clustering results is low; the accuracy of the reply sentences the robot generates from such clustering results is accordingly low, which can hardly meet the needs of real dialogue scenarios.
  • In view of this, the embodiments of the present application provide a robot dialogue generation method and device, a computer non-volatile readable storage medium, and a robot, so as to solve the prior-art problem of the low accuracy of the reply sentences generated by robots.
  • the first aspect of the embodiments of the present application provides a method for generating a robot dialogue, which is applied to a preset robot, and the method includes:
  • Obtain a set of sentences to be processed from a preset database, the sentence set including SN sentences, where SN is an integer greater than 1;
  • the second aspect of the embodiments of the present application provides a device for generating a robot dialogue, which may include modules for implementing the steps of the method for generating a robot dialogue.
  • a third aspect of the embodiments of the present application provides a computer non-volatile readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the robot dialogue generation method described above.
  • a fourth aspect of the embodiments of the present application provides a robot including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the steps of the robot dialogue generation method described above are implemented.
  • the embodiments of the application have the beneficial effect that, in the calculation of the sentence vectors, the word vector of each word and the probability of each word are fully considered, so the features of a sentence can be characterized as a whole.
  • the embodiments of the application thereby greatly improve the accuracy of the clustering results.
  • correspondingly, the accuracy of the reply sentences generated by the robot based on such clustering results will also be greatly improved.
  • FIG. 1 is a flowchart of an embodiment of a method for generating a robot dialog in an embodiment of the application
  • Figure 2 is a schematic flow chart of clustering each sentence according to the sentence vector
  • FIG. 3 is a structural diagram of an embodiment of an apparatus for generating a robot dialog in an embodiment of the application
  • Fig. 4 is a schematic block diagram of a robot in an embodiment of the application.
  • the method for generating a robot dialogue in the embodiments of the present application can be applied to a preset dialogue robot (Chatterbot), which is a robot used to simulate human dialogue or chat.
  • the dialogue robot can be set in an exhibition hall, a company reception desk, an airport information desk, a hospital information desk, etc., to provide convenient consulting services for passing users.
  • an embodiment of a method for generating a robot dialogue in an embodiment of the present application may include:
  • Step S101 After receiving a preset dialogue instruction, collect the dialogue sentence of the user.
  • the user can issue a dialogue instruction to the dialogue robot in the form of voice.
  • when the dialogue robot receives the user's voice, it can determine whether the voice includes preset keywords.
  • the keywords include, but are not limited to, words such as "please ask", "consult", and "help"; if the user's voice includes such a keyword, the voice can be determined to be a dialogue instruction issued by the user.
  • the user can also issue a dialogue instruction to the dialogue robot through physical buttons or virtual buttons in a designated human-computer interaction interface.
  • the dialogue robot may include a touch screen for interacting with the user; to issue a dialogue instruction to the dialogue robot, the user can click a specific button displayed on it.
  • after the dialogue robot receives the dialogue instruction issued by the user, it can collect the user's dialogue sentence through its own voice collection device, such as a microphone.
  • Step S102 Obtain a set of sentences to be processed from a preset database.
  • the sentence set includes SN sentences, and SN is an integer greater than 1.
  • a database including massive instant messaging (IM) data can be established in advance, and the database contains as many instant messaging data generated during a certain statistical time period as possible.
  • the statistical time period can be set according to the actual situation, for example, it can be set to a time period within a week, a month, a quarter, or a year from the current moment.
  • Step S103 Perform word segmentation processing on each sentence in the sentence set to obtain each word set corresponding to each sentence.
  • Word segmentation processing refers to segmenting a sentence into individual words.
  • the sentence can be segmented according to a general dictionary to ensure that the separated words are all normal vocabulary; characters that are not in the dictionary are split off as single characters.
  • when the characters can form words in both the forward and backward directions, for example "要求神", the split is chosen according to the statistical word frequency: if "要求" has the higher frequency, the sentence is split as "要求 / 神"; if "求神" has the higher frequency, it is split as "要 / 求神".
  • the segmented words can be formed into a word set corresponding to the sentence.
  • Step S104 query the word vector of each word in each word set in the preset word vector database.
  • the word vector database is a database that records the correspondence between words and word vectors.
  • the word vector may be a corresponding word vector obtained by training the word according to the word2vec model. That is, the probability of occurrence of the word is expressed according to the context information of the word.
  • the training of word vectors follows the word2vec idea: each word is first represented as a 0-1 (one-hot) vector, the word2vec model is then trained on these vectors, and n-1 words are used to predict the n-th word; the intermediate representation produced by the neural network model is taken as the word vector.
  • for example, the one-hot vector of "庆祝" (celebrate) is assumed to be [1,0,0,0,...,0], the one-hot vector of "大会" (meeting) is [0,1,0,0,...,0], the one-hot vector of "顺利" (smooth) is [0,0,1,0,...,0], and the vector of the word to be predicted, "闭幕" (closing), is [0,0,0,1,...,0].
  • the model is trained to generate the coefficient matrix W of the hidden layer.
  • the product of the one-hot vector of each word and the coefficient matrix is the word vector of the word.
  • the final form will be a multi-dimensional vector similar to "庆祝 (celebrate): [-0.28, 0.34, -0.02, ..., 0.92]".
  • Step S105 Count the probability of each word in each word set appearing in the sentence set.
  • Step S106 Calculate the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set.
  • the maximum likelihood method can be used to estimate the sentence vector.
  • assuming that all word vectors are roughly uniformly distributed over the whole vector space, a likelihood function is constructed for each sentence in terms of the following quantities:
  • s is the index of each sentence in the sentence set, 1 ≤ s ≤ SN
  • w indexes the words of a sentence, 1 ≤ w ≤ WN_s, where WN_s is the number of words in the s-th sentence
  • α is a preset constant
  • p(w) is the probability that the w-th word of the s-th sentence appears in the sentence set
  • v_w is the word vector of the w-th word of the s-th sentence
  • v_s is the sentence vector of the s-th sentence
  • <v_s, v_w> is the angle between the two vectors v_s and v_w
  • Z is a preset constant
  • Sentence_s is the s-th sentence and f(Sentence_s | v_s) is the likelihood function of the s-th sentence
  • taking the logarithm of the sentence likelihood gives the likelihood function of each word, where ln is the natural logarithm function, word_w is the w-th word of the s-th sentence, and f(word_w | v_s) is the likelihood function of the w-th word of the s-th sentence.
  • the sentence vector of each sentence can be expressed as a weighted average of the word vectors of the words it contains.
  • the sentence vector obtained in this way is its maximum likelihood estimate or, from a Bayesian point of view, its maximum a posteriori estimate; compared with a traditional simple average, this method takes semantic-level information into account and clusters better, and compared with a deep learning system it is more efficient and effective and, more importantly, needs no labelled training data; vectorizing sentences makes them computable and therefore clusterable.
  • the sentence vector of each sentence can also be assembled into a sentence matrix of the sentence set, in which the sentence vector of each sentence forms one row, so the number of rows of the sentence matrix equals the number of sentences in the sentence set.
  • the principal component of the sentence matrix is then computed by Principal Component Analysis (PCA), where u denotes the principal component of the sentence matrix.
  • each sentence vector is then updated by removing its component along u, yielding the sentence vector with the principal-component interference removed; the sentence vectors used in the subsequent steps refer to the updated sentence vectors.
  • the sentence vector of each sentence can also be normalized, where mean is the averaging function, var is the variance function, and ε_norm is a preset constant.
  • the normalized sentence vectors are thus obtained; the sentence vectors used in the subsequent steps refer to the normalized sentence vectors.
  • the sentence vector of each sentence can also be whitened according to the eigendecomposition V D V^T = cov(Matrix), where Matrix is the sentence matrix constructed from the sentence vectors of the sentences, cov is the function that computes the covariance matrix, ε_zca is a preset constant, and I is the identity matrix.
  • the whitened sentence vectors are thus obtained; the sentence vectors used in the subsequent steps refer to the whitened sentence vectors.
  • Step S107 Perform clustering processing on each sentence according to the sentence vector to obtain each cluster group.
  • step S107 may specifically include the following steps:
  • Step S1071 initialize the cluster center set.
  • a cluster center set Centre(0) = {c_1(0), c_2(0), ..., c_k(0), ..., c_KN(0)} can be initialized
  • k is the index of each cluster center, 1 ≤ k ≤ KN
  • KN is the preset number of cluster centers, 1 < KN < SN
  • its specific value can be set according to the actual situation, for example 5, 10, 15, 20 or another value
  • c_k(0) is the initialization vector of the k-th cluster center
  • T is the transpose symbol
  • Centre(0) is the initialized cluster center set.
  • Step S1072 update the cluster center set for the g th time.
  • the g-th update of the cluster center set can be performed according to the following formula:
  • Step S1073 Determine whether the set of cluster centers meets a preset convergence condition.
  • the g-th update distance of each cluster center is UpGdDis_{k,g} = VecDis(c_k(g), c_k(g-1)), where VecDis is a function that computes the distance between two vectors and UpGdDis_{k,g} is the g-th update distance of the k-th cluster center.
  • the maximum update distance of the cluster center set at the g-th update is MaxDis_g = Max(UpGdDis_{0,g}, UpGdDis_{1,g}, ..., UpGdDis_{k,g}, ..., UpGdDis_{KN,g}), where Max is the maximum-value function and MaxDis_g is the g-th maximum update distance of the cluster center set.
  • the convergence condition is MaxDis_g < Thresh, where Thresh is a preset distance threshold whose specific value can be set according to actual conditions, for example 0.1, 0.01, 0.001 or another value.
  • if the cluster center set does not meet the convergence condition, step S1074 and its subsequent steps are performed; if the cluster center set meets the convergence condition, step S1075 is performed.
  • Step S1074 Increase g by one counting unit.
  • Step S1075 Perform clustering processing on each sentence according to the cluster center set after the gth update to obtain each cluster group.
  • for any sentence, the distance between its sentence vector and the vector of each cluster center in the cluster center set can be calculated, and the sentence is clustered to the cluster center with the smallest distance from it.
  • by traversing all sentences, the sentences clustered to the same cluster center form one cluster group, and the final clustering result is obtained.
  • Step S108 Calculate the similarity between the dialogue sentence of the user and each cluster group respectively.
  • the sentence vector of the dialogue sentence of the user can be calculated, which is denoted as SenVec here.
  • the specific calculation process is similar to the process in step S103 to step S106, and you can refer to the foregoing content, which will not be repeated here.
  • the similarity is computed as SimDeg_k = Recip(VecDis(SenVec, c_k)), where Recip is the function that computes the reciprocal and SimDeg_k is the similarity between the user's dialogue sentence and the k-th cluster group.
  • Step S109 Select a preferred group from each cluster group.
  • the preferred group is the cluster group with the greatest similarity to the dialogue sentences of the user, namely:
  • argmax is the maximum-argument function
  • SelGroup is the serial number of the preferred group.
  • Step S110 Query the reply sentence corresponding to the dialogue sentence of the user in the preset preferred reply sentence set, and use the reply sentence to respond to the dialogue sentence of the user.
  • the set of preferred reply sentences is a set of reply sentences corresponding to the preferred group.
  • a set of reply sentences corresponding to each cluster group can be separately set in advance.
  • each set of reply sentences includes a predetermined number of reply sentences, and each reply sentence is used to answer a certain preset specified sentence.
  • when the dialogue robot needs to respond to the user's dialogue sentence, it can calculate the distance between the sentence vector of the user's dialogue sentence and the sentence vector of each specified sentence, determine the specified sentence for which this distance is smallest, query the reply sentence corresponding to that specified sentence in the preferred reply sentence set, and use this reply sentence to respond to the user's dialogue sentence.
  • the word vector of each word and the probability of each word appearing are fully considered, which can characterize the characteristics of the sentence as a whole, and greatly improve the accuracy of the clustering result.
  • the accuracy of the reply sentences generated by the robot based on such clustering results will also be greatly improved.
  • FIG. 3 shows a structural diagram of an embodiment of a device for generating a robot dialog provided by an embodiment of the present application.
  • an apparatus for generating a robot dialog may include:
  • the dialogue sentence collection module 301 is used to collect the user's dialogue sentence after receiving a preset dialogue instruction
  • the sentence set obtaining module 302 is configured to obtain a sentence set to be processed from a preset database, the sentence set includes SN sentences, and SN is an integer greater than 1;
  • the word segmentation processing module 303 is configured to perform word segmentation processing on each sentence in the sentence set to obtain each word set corresponding to each sentence respectively;
  • the word vector query module 304 is used to query the word vector of each word in each word set in a preset word vector database
  • the probability statistics module 305 is used to separately count the probability of each word in each word set appearing in the sentence set;
  • the sentence vector calculation module 306 is configured to calculate the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set;
  • the clustering processing module 307 is configured to perform clustering processing on each sentence according to the sentence vector to obtain each clustering group;
  • the similarity calculation module 308 is configured to calculate the similarity between the dialogue sentence of the user and each cluster group respectively;
  • the preferred group selection module 309 is configured to select a preferred group from each cluster group, and the preferred group is the cluster group with the greatest similarity to the dialog sentence of the user;
  • the dialogue response module 310 is used to query, in a preset preferred reply sentence set, the reply sentence corresponding to the user's dialogue sentence, and to use the reply sentence to respond to the user's dialogue sentence, the preferred reply sentence set being the set of reply sentences corresponding to the preferred group.
  • the clustering processing module may include:
  • the initialization unit is used to initialize the cluster center set
  • An update unit configured to update the cluster center set for the g th time
  • a convergence judging unit configured to judge whether the set of cluster centers meets a preset convergence condition
  • a counting unit configured to increase g by one counting unit if the set of cluster centers does not meet the convergence condition
  • the clustering processing unit is configured to, if the cluster center set meets the convergence condition, perform clustering processing on each sentence according to the cluster center set after the gth update to obtain each cluster group.
  • the convergence judgment unit may include:
  • the first calculation subunit is used to calculate the g-th update distance of each cluster center
  • the second calculation subunit is used to calculate the g-th maximum update distance of the cluster center set
  • the convergence judgment subunit is used to judge whether the set of cluster centers meets the convergence condition.
  • the device for generating a robot dialogue may further include:
  • the matrix construction module is used to construct the sentence vector of each sentence into the sentence matrix of the sentence set;
  • the principal component calculation module is used to calculate the principal components in the sentence matrix
  • the vector update module is used to update the statement vector of each statement.
  • Fig. 4 shows a schematic block diagram of a robot provided by an embodiment of the present application. For ease of description, only parts related to the embodiment of the present application are shown.
  • the robot 4 may include: a processor 40, a memory 41, and computer-readable instructions 42 stored in the memory 41 and executable on the processor 40, for example computer-readable instructions for executing the aforementioned robot dialogue generation method.
  • the processor 40 executes the computer-readable instructions 42, the steps in the above embodiments of the robot dialog generation method are implemented.
  • the computer-readable instructions 42 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 41 and executed by the processor 40 to complete this application.
  • the one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 42 in the robot 4.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A robot dialogue generating method and apparatus, a computer non-volatile readable storage medium, and a robot, relating to the technical field of computers. After a preset dialogue instruction is received, a dialogue statement of a user is acquired; a statement set to be processed is obtained; word segmentation processing is respectively performed on each statement in the statement set to obtain each word set corresponding to each statement; a word vector of each word in each word set is queried respectively; the occurrence probability of each word in each word set in the statement set is counted respectively; a statement vector of each statement is calculated respectively; clustering processing is performed on each statement according to the statement vector to obtain each clustering group; the similarity between the dialogue statement of the user and each clustering group is calculated respectively; and a reply statement corresponding to the dialogue statement of the user is queried, and the dialogue statement of the user is responded to by using the reply statement, so that the accuracy of a reply statement generated by a robot according to a clustering result is improved.

Description

Robot dialogue generation method, device, readable storage medium and robot
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on September 18, 2019, with application number 201910880859.9 and invention title "Robot dialogue generation method, device, readable storage medium and robot", the entire contents of which are incorporated herein by reference.
Technical field
This application belongs to the field of computer technology, and in particular relates to a robot dialogue generation method and device, a computer non-volatile readable storage medium, and a robot.
Background
In robot intelligent dialogue technology, clustering the sentences generated during a dialogue is the basis for ensuring that the robot can conduct an effective dialogue. In the prior art, sentence clustering is generally based on keyword matching. This approach only considers local features of a sentence, namely the keyword features, and lacks an overall view of the sentence, so the accuracy of the clustering results is low; the accuracy of the reply sentences the robot generates from such clustering results is accordingly low, which can hardly meet the needs of real dialogue scenarios.
Technical problem
In view of this, the embodiments of the present application provide a robot dialogue generation method and device, a computer non-volatile readable storage medium, and a robot, so as to solve the prior-art problem of the low accuracy of the reply sentences generated by robots.
Technical solution
A first aspect of the embodiments of the present application provides a robot dialogue generation method, applied to a preset robot, the method including:
after receiving a preset dialogue instruction, collecting the user's dialogue sentence;
obtaining a set of sentences to be processed from a preset database, the sentence set including SN sentences, SN being an integer greater than 1;
performing word segmentation on each sentence in the sentence set to obtain the word set corresponding to each sentence;
querying, in a preset word vector database, the word vector of each word in each word set;
counting the probability of each word in each word set appearing in the sentence set;
calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set;
clustering the sentences according to the sentence vectors to obtain the cluster groups;
calculating the similarity between the user's dialogue sentence and each cluster group;
selecting a preferred group from the cluster groups, the preferred group being the cluster group with the greatest similarity to the user's dialogue sentence;
querying, in a preset preferred reply sentence set, the reply sentence corresponding to the user's dialogue sentence, and using the reply sentence to respond to the user's dialogue sentence, the preferred reply sentence set being the set of reply sentences corresponding to the preferred group.
A second aspect of the embodiments of the present application provides a robot dialogue generation device, which may include modules for implementing the steps of the above robot dialogue generation method.
A third aspect of the embodiments of the present application provides a computer non-volatile readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the above robot dialogue generation method.
A fourth aspect of the embodiments of the present application provides a robot including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the steps of the above robot dialogue generation method are implemented.
Beneficial effects
Compared with the prior art, the embodiments of the present application have the following beneficial effect: in the calculation of the sentence vectors, both the word vector of each word and the probability with which each word appears are fully considered, so the features of a sentence can be characterized as a whole, which greatly improves the accuracy of the clustering results; correspondingly, the accuracy of the reply sentences the robot generates from such clustering results is also greatly improved.
Description of the drawings
FIG. 1 is a flowchart of an embodiment of a robot dialogue generation method in an embodiment of the application;
FIG. 2 is a schematic flowchart of clustering the sentences according to the sentence vectors;
FIG. 3 is a structural diagram of an embodiment of a robot dialogue generation device in an embodiment of the application;
FIG. 4 is a schematic block diagram of a robot in an embodiment of the application.
Embodiments of the present invention
The robot dialogue generation method in the embodiments of the present application can be applied to a preset dialogue robot (chatterbot), i.e. a robot used to simulate human dialogue or chat. In a specific application scenario of the embodiments of the present application, the dialogue robot can be placed in an exhibition hall, at a company reception desk, at an airport information desk, at a hospital information desk, and so on, to provide convenient consulting services for passing users.
Referring to FIG. 1, an embodiment of a robot dialogue generation method in an embodiment of the present application may include:
Step S101: after receiving a preset dialogue instruction, collect the user's dialogue sentence.
In this embodiment, the user can issue a dialogue instruction to the dialogue robot by voice. When the dialogue robot receives the user's voice, it can determine whether the voice includes preset keywords; the keywords include, but are not limited to, words such as "please ask", "consult", and "help". If the user's voice contains such a keyword, the voice can be determined to be a dialogue instruction issued by the user. The user can also issue a dialogue instruction to the dialogue robot through physical buttons or virtual buttons in a designated human-computer interaction interface; for example, the dialogue robot may include a touch screen for interacting with the user, and when the user needs to issue a dialogue instruction to the dialogue robot, the user can click a specific button displayed on it. After receiving the dialogue instruction issued by the user, the dialogue robot can collect the user's dialogue sentence through its own voice collection device, such as a microphone.
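As a rough illustration of the keyword check described above, the following sketch assumes the utterance has already been transcribed to text by a speech recognizer; the keyword list and the is_dialogue_instruction helper are illustrative names, not part of the patent.
```python
# Hypothetical keyword-spotting check: decide whether a transcribed utterance
# should be treated as a dialogue instruction.
TRIGGER_KEYWORDS = ["请问", "咨询", "帮助"]  # "please ask", "consult", "help"

def is_dialogue_instruction(transcribed_text: str) -> bool:
    """Return True if the transcribed utterance contains any trigger keyword."""
    return any(keyword in transcribed_text for keyword in TRIGGER_KEYWORDS)

if is_dialogue_instruction("请问洗手间在哪里"):
    print("start collecting the user's dialogue sentence")
```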
Step S102: obtain a set of sentences to be processed from a preset database.
The sentence set includes SN sentences, where SN is an integer greater than 1.
In this embodiment, a database containing a large amount of instant messaging (IM) data can be established in advance; the database contains as much of the instant messaging data generated within a certain statistical time period as possible, and these instant messaging data are stored in the form of sentences. The statistical time period can be set according to the actual situation; for example, it can be set to the week, month, quarter, or year preceding the current moment.
Step S103: perform word segmentation on each sentence in the sentence set to obtain the word set corresponding to each sentence.
Word segmentation means splitting a sentence into individual words. In this embodiment, a sentence can be segmented against a general dictionary so that the separated words are all normal vocabulary; characters that are not in the dictionary are split off as single characters. When the characters can form words in both the forward and backward directions, for example "要求神", the split is chosen according to the statistical word frequency: if "要求" has the higher frequency, the sentence is split as "要求 / 神"; if "求神" has the higher frequency, it is split as "要 / 求神". For any sentence in the sentence set, after word segmentation the segmented words form the word set corresponding to that sentence.
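The patent does not spell out the segmentation algorithm beyond the use of a dictionary and word frequencies, so the following is only a toy sketch of one way to realize it: a dynamic-programming segmenter that prefers high-frequency dictionary words and falls back to single characters. The dictionary and the frequency values are made up for the example.
```python
import math

# Toy dictionary with made-up corpus frequencies; characters outside the
# dictionary fall back to single-character tokens, as described above.
WORD_FREQ = {"要求": 120, "求神": 30, "要": 50, "神": 80}

def segment(sentence: str) -> list[str]:
    """Dynamic-programming segmentation that prefers high-frequency dictionary words."""
    n = len(sentence)
    best_score = [-math.inf] * (n + 1)  # best_score[i]: best score for sentence[:i]
    best_prev = [0] * (n + 1)
    best_score[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - 4), end):  # candidate words up to 4 characters
            word = sentence[start:end]
            freq = WORD_FREQ.get(word)
            if freq is None and len(word) > 1:
                continue  # multi-character strings must be dictionary words
            score = best_score[start] + math.log(freq if freq else 1)
            if score > best_score[end]:
                best_score[end], best_prev[end] = score, start
    words, i = [], n
    while i > 0:
        words.append(sentence[best_prev[i]:i])
        i = best_prev[i]
    return list(reversed(words))

print(segment("要求神"))  # -> ['要求', '神'] because "要求" has the higher frequency here
```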
Step S104: query, in a preset word vector database, the word vector of each word in each word set.
The word vector database is a database that records the correspondence between words and word vectors. A word vector may be obtained by training on the word with the word2vec model, i.e. the probability of the word occurring is expressed in terms of its context. Word vector training follows the word2vec idea: each word is first represented as a 0-1 (one-hot) vector, the word2vec model is trained on these vectors, and n-1 words are used to predict the n-th word; the intermediate representation produced by the neural network model is taken as the word vector. For example, suppose the one-hot vector of "庆祝" (celebrate) is [1,0,0,0,...,0], the one-hot vector of "大会" (meeting) is [0,1,0,0,...,0], the one-hot vector of "顺利" (smooth) is [0,0,1,0,...,0], and the vector of the word to be predicted, "闭幕" (closing), is [0,0,0,1,...,0]. Training the model produces the hidden-layer coefficient matrix W, and the product of a word's one-hot vector with this coefficient matrix is that word's word vector; the final form is a multi-dimensional vector such as "庆祝: [-0.28, 0.34, -0.02, ..., 0.92]".
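The relationship between a one-hot vector and the hidden-layer coefficient matrix W can be shown with a small NumPy sketch. The vocabulary, the dimensionality, and the randomly generated W are placeholders; in practice W would come from word2vec training, which is not reproduced here.
```python
import numpy as np

vocab = ["庆祝", "大会", "顺利", "闭幕"]       # toy vocabulary
index = {word: i for i, word in enumerate(vocab)}
dim = 5                                        # word-vector dimensionality
rng = np.random.default_rng(0)
W = rng.normal(size=(len(vocab), dim))         # hidden-layer coefficient matrix (would come from training)

def word_vector(word: str) -> np.ndarray:
    """one-hot(word) @ W simply selects the row of W belonging to that word."""
    one_hot = np.zeros(len(vocab))
    one_hot[index[word]] = 1.0
    return one_hot @ W

assert np.allclose(word_vector("庆祝"), W[0])  # e.g. a vector like [-0.28, 0.34, -0.02, ...]
```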
Step S105: count, for each word in each word set, the probability of the word appearing in the sentence set.
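One straightforward reading of step S105 is the relative frequency of each word among all words of the sentence set; the patent does not fix the exact estimator, so the Counter-based sketch below is an assumption.
```python
from collections import Counter

def word_probabilities(word_sets: list[list[str]]) -> dict[str, float]:
    """Relative frequency of every word over all words appearing in the sentence set."""
    counts = Counter(word for words in word_sets for word in words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# word_sets[s] is the word set produced by segmenting the s-th sentence.
probs = word_probabilities([["要求", "神"], ["庆祝", "大会", "顺利", "闭幕"]])
```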
Step S106: calculate the sentence vector of each sentence from the word vector of each word and the probability of each word appearing in the sentence set.
In this embodiment, the maximum likelihood method can be used to estimate the sentence vectors. It is assumed here that all word vectors are roughly uniformly distributed over the whole vector space, and a likelihood function is constructed for each sentence as follows:
Figure PCTCN2019116630-appb-000001
where s is the index of each sentence in the sentence set, 1 ≤ s ≤ SN; 1 ≤ w ≤ WN_s, with WN_s the number of words in the s-th sentence; α is a preset constant; p(w) is the probability that the w-th word of the s-th sentence appears in the sentence set; v_w is the word vector of the w-th word of the s-th sentence; v_s is the sentence vector of the s-th sentence; <v_s, v_w> is the angle between the vectors v_s and v_w; Z is a preset constant; Sentence_s is the s-th sentence; and f(Sentence_s | v_s) is the likelihood function of the s-th sentence.
Taking the logarithm of this likelihood function gives the likelihood function of each word:
Figure PCTCN2019116630-appb-000002
where ln is the natural logarithm function, word_w is the w-th word of the s-th sentence, and f(word_w | v_s) is the likelihood function of the w-th word of the s-th sentence.
Taking the maximization of the above expression as the objective, namely:
Figure PCTCN2019116630-appb-000003
we obtain:
Figure PCTCN2019116630-appb-000004
where
Figure PCTCN2019116630-appb-000005
is a constant and ∝ denotes proportionality.
In this embodiment, the sentence vector of each sentence can be expressed as a weighted average of the word vectors of the words it contains, namely:
Figure PCTCN2019116630-appb-000006
The sentence vector obtained in this way is its maximum likelihood estimate or, from a Bayesian point of view, its maximum a posteriori estimate. Compared with a traditional simple average, this method takes semantic-level information into account and clusters better; compared with a deep learning system, it is more efficient and effective and, more importantly, requires no labelled training data. Vectorizing sentences makes them computable and therefore clusterable.
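The weighted-average formula itself is only reproduced as an image above, so the weight used in the sketch below, α/(α + p(w)), is an assumption borrowed from the well-known smooth inverse frequency (SIF) embedding that this derivation closely resembles; the text only confirms the general shape of a probability-dependent weighting of the word vectors.
```python
import numpy as np

def sentence_vector(words: list[str],
                    word_vec: dict[str, np.ndarray],
                    p: dict[str, float],
                    alpha: float = 1e-3) -> np.ndarray:
    """Probability-weighted average of word vectors (SIF-style weights, assumed)."""
    weighted = [alpha / (alpha + p[w]) * word_vec[w] for w in words if w in word_vec]
    return np.mean(weighted, axis=0)
```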
Further, after the sentence vector of each sentence has been calculated, the sentence vectors can be assembled into the sentence matrix of the sentence set, in which the sentence vector of each sentence forms one row, so the number of rows of the sentence matrix equals the number of sentences in the sentence set. Principal Component Analysis (PCA) is then used to compute the principal component of the sentence matrix, and the sentence vector of each sentence is updated according to:
v_s = v_s - u u^T v_s
where u is the principal component of the sentence matrix. The updated sentence vectors, i.e. the sentence vectors with the principal-component interference removed, are thus obtained; the sentence vectors used in the subsequent steps refer to these updated sentence vectors.
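A NumPy sketch of the update v_s = v_s - u u^T v_s quoted above; the first right singular vector of the sentence matrix is taken as the principal component u, which is the usual way to obtain it, although the text only says that u is computed by PCA.
```python
import numpy as np

def remove_principal_component(sentence_matrix: np.ndarray) -> np.ndarray:
    """Rows are sentence vectors; subtract each row's projection onto the principal component u."""
    _, _, vt = np.linalg.svd(sentence_matrix, full_matrices=False)
    u = vt[0]                                                   # first principal direction (unit norm)
    return sentence_matrix - np.outer(sentence_matrix @ u, u)   # v_s - u u^T v_s for every row
```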
Further, the sentence vector of each sentence can also be normalized according to the following formula:
Figure PCTCN2019116630-appb-000007
where mean is the averaging function, var is the variance function, and ε_norm is a preset constant; the normalized sentence vectors are thus obtained. The sentence vectors used in the subsequent steps refer to these normalized sentence vectors.
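The normalization formula is likewise only available as an image; a common reading consistent with the named quantities (mean, var and ε_norm) is a per-vector standardization, sketched below as an assumption.
```python
import numpy as np

def normalize(v_s: np.ndarray, eps_norm: float = 1e-8) -> np.ndarray:
    """Assumed standardization: subtract the mean and divide by sqrt(variance + eps_norm)."""
    return (v_s - np.mean(v_s)) / np.sqrt(np.var(v_s) + eps_norm)
```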
Further, the sentence vector of each sentence can also be whitened according to the following formulas:
[V, D] = eig(cov(Matrix))
V D V^T = cov(Matrix)
v_s = V (D + ε_zca I)^(-0.5) V^T v_s
where Matrix is the sentence matrix constructed from the sentence vectors, cov is the function that computes the covariance matrix, [V, D] = eig(cov(Matrix)) denotes computing all eigenvalues of cov(Matrix), which form the diagonal matrix D, and the corresponding eigenvectors, which form the columns of V, ε_zca is a preset constant, and I is the identity matrix; the whitened sentence vectors are thus obtained. The sentence vectors used in the subsequent steps refer to these whitened sentence vectors.
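A NumPy sketch of the whitening step following the formulas quoted above; np.cov is used for cov(Matrix) and normalizes by N-1, a detail the text does not specify.
```python
import numpy as np

def zca_whiten(sentence_matrix: np.ndarray, eps_zca: float = 1e-5) -> np.ndarray:
    """Whiten every sentence vector: v_s <- V (D + eps_zca * I)^(-0.5) V^T v_s."""
    cov = np.cov(sentence_matrix, rowvar=False)      # cov(Matrix)
    eigvals, eigvecs = np.linalg.eigh(cov)           # D (eigenvalues) and V (eigenvectors)
    transform = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps_zca)) @ eigvecs.T
    return sentence_matrix @ transform.T             # transform is symmetric, applied row-wise
```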
Step S107: cluster the sentences according to their sentence vectors to obtain the cluster groups.
As shown in FIG. 2, step S107 may specifically include the following steps:
Step S1071: initialize the cluster center set.
Specifically, a cluster center set of the following form can be initialized:
Centre(0) = {c_1(0), c_2(0), ..., c_k(0), ..., c_KN(0)}
where k is the index of each cluster center, 1 ≤ k ≤ KN; KN is the preset number of cluster centers, 1 < KN < SN, whose specific value can be set according to the actual situation, for example 5, 10, 15, 20 or another value; c_k(0) is the initialization vector of the k-th cluster center, whose dimensionality is the same as that of the sentence vectors and which satisfies c_k(0)^T c_k(0) = 1, its specific entries being set randomly; T is the transpose symbol; and Centre(0) is the initialized cluster center set.
Step S1072: perform the g-th update of the cluster center set.
Specifically, the g-th update of the cluster center set can be performed according to the following formula:
Figure PCTCN2019116630-appb-000008
where
Figure PCTCN2019116630-appb-000009
argmax is the maximum-argument function, g ≥ 1 (g = 1 in the initial state), and c_k(g) is the vector of the k-th cluster center after its g-th update.
Step S1073: determine whether the cluster center set meets a preset convergence condition.
Specifically, the g-th update distance of each cluster center is first calculated according to:
UpGdDis_{k,g} = VecDis(c_k(g), c_k(g-1))
where VecDis is a function that computes the distance between two vectors and UpGdDis_{k,g} is the g-th update distance of the k-th cluster center.
Then, the g-th maximum update distance of the cluster center set is calculated according to:
MaxDis_g = Max(UpGdDis_{0,g}, UpGdDis_{1,g}, ..., UpGdDis_{k,g}, ..., UpGdDis_{KN,g})
where Max is the maximum-value function and MaxDis_g is the g-th maximum update distance of the cluster center set.
Next, it is determined whether the cluster center set meets the following convergence condition:
MaxDis_g < Thresh
where Thresh is a preset distance threshold whose specific value can be set according to the actual situation, for example 0.1, 0.01, 0.001 or another value.
If the cluster center set does not meet the convergence condition, step S1074 and its subsequent steps are performed; if the cluster center set meets the convergence condition, step S1075 is performed.
Step S1074: increase g by one counting unit.
That is, g = g + 1 is executed, and the procedure then returns to step S1072 and its subsequent steps until the cluster center set meets the convergence condition.
Step S1075: cluster the sentences according to the cluster center set after the g-th update to obtain the cluster groups.
The cluster center set at this point is the finally determined cluster center set, denoted Centre = {c_1, c_2, ..., c_k, ..., c_KN}, where c_k is the finally determined k-th cluster center.
For any sentence, the distance between its sentence vector and the vector of each cluster center in the cluster center set can be calculated, and the sentence is clustered to the cluster center with the smallest distance from it. By traversing all sentences, the sentences clustered to the same cluster center form one cluster group, and the final clustering result is obtained.
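Since the update formula of step S1072 is only reproduced as an image, the sketch below fills it in with a standard k-means-style assignment and mean update, which is one plausible reading; the random unit-norm initialization and the MaxDis_g < Thresh convergence test do follow the text.
```python
import numpy as np

def cluster_sentences(S: np.ndarray, KN: int, thresh: float = 0.01, seed: int = 0):
    """S is the (SN, dim) matrix of sentence vectors; returns (labels, centres)."""
    rng = np.random.default_rng(seed)
    centres = rng.normal(size=(KN, S.shape[1]))
    centres /= np.linalg.norm(centres, axis=1, keepdims=True)   # c_k(0)^T c_k(0) = 1
    while True:
        # assign every sentence to its nearest current centre (assumed assignment rule)
        dists = np.linalg.norm(S[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # move each centre to the mean of its members (assumed update rule)
        new_centres = np.array([S[labels == k].mean(axis=0) if np.any(labels == k) else centres[k]
                                for k in range(KN)])
        # convergence test of step S1073: MaxDis_g < Thresh
        max_dis = np.max(np.linalg.norm(new_centres - centres, axis=1))  # max over UpGdDis_{k,g}
        centres = new_centres
        if max_dis < thresh:
            break
    return labels, centres
```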
Step S108: calculate the similarity between the user's dialogue sentence and each cluster group.
First, the sentence vector of the user's dialogue sentence, denoted SenVec here, can be calculated; the calculation is similar to the process of steps S103 to S106 and is not repeated here.
Then, the similarity between the user's dialogue sentence and each cluster group can be calculated according to:
SimDeg_k = Recip(VecDis(SenVec, c_k))
where Recip is the function that computes the reciprocal and SimDeg_k is the similarity between the user's dialogue sentence and the k-th cluster group.
Step S109: select the preferred group from the cluster groups.
The preferred group is the cluster group with the greatest similarity to the user's dialogue sentence, namely:
SelGroup = argmax(SimDeg_1, SimDeg_2, ..., SimDeg_k, ..., SimDeg_KN)
where argmax is the maximum-argument function and SelGroup is the index of the preferred group.
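A small sketch combining steps S108 and S109: the similarity is the reciprocal of the vector distance, and the cluster group with the greatest similarity (equivalently, the smallest distance to its center) is selected.
```python
import numpy as np

def preferred_group(sen_vec: np.ndarray, centres: np.ndarray) -> int:
    """SimDeg_k = Recip(VecDis(SenVec, c_k)); return the index of the most similar group."""
    dists = np.linalg.norm(centres - sen_vec, axis=1)   # VecDis(SenVec, c_k) for every k
    sim = 1.0 / (dists + 1e-12)                         # Recip(...); epsilon guards a zero distance
    return int(np.argmax(sim))                          # greatest similarity == smallest distance
```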
Step S110: query, in the preset preferred reply sentence set, the reply sentence corresponding to the user's dialogue sentence, and use the reply sentence to respond to the user's dialogue sentence.
The preferred reply sentence set is the set of reply sentences corresponding to the preferred group. In this embodiment, a set of reply sentences can be configured in advance for each cluster group; each set contains a predetermined number of reply sentences, and each reply sentence is used to answer a certain preset specified sentence. When the dialogue robot needs to respond to the user's dialogue sentence, it can calculate the distance between the sentence vector of the user's dialogue sentence and the sentence vector of each specified sentence, determine the specified sentence for which this distance is smallest, query the reply sentence corresponding to that specified sentence in the preferred reply sentence set, and use this reply sentence to respond to the user's dialogue sentence.
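A sketch of the reply lookup of step S110; the specified sentences and their replies are hypothetical data structures, since the text only states that each cluster group has a preset set of specified sentences with corresponding replies.
```python
import numpy as np

def pick_reply(sen_vec: np.ndarray,
               specified_vecs: np.ndarray,   # sentence vectors of the preset specified sentences
               replies: list[str]) -> str:
    """Return the reply attached to the specified sentence closest to the user's sentence vector."""
    dists = np.linalg.norm(specified_vecs - sen_vec, axis=1)
    return replies[int(np.argmin(dists))]
```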
In summary, in the calculation of the sentence vectors the embodiments of the present application fully consider both the word vector of each word and the probability with which each word appears, so the features of a sentence can be characterized as a whole, which greatly improves the accuracy of the clustering results; correspondingly, the accuracy of the reply sentences the robot generates from such clustering results is also greatly improved.
Corresponding to the robot dialogue generation method described in the above embodiment, FIG. 3 shows a structural diagram of an embodiment of a robot dialogue generation device provided by an embodiment of the present application.
In this embodiment, a robot dialogue generation device may include:
a dialogue sentence collection module 301, configured to collect the user's dialogue sentence after receiving a preset dialogue instruction;
a sentence set obtaining module 302, configured to obtain a set of sentences to be processed from a preset database, the sentence set including SN sentences, SN being an integer greater than 1;
a word segmentation processing module 303, configured to perform word segmentation on each sentence in the sentence set to obtain the word set corresponding to each sentence;
a word vector query module 304, configured to query, in a preset word vector database, the word vector of each word in each word set;
a probability statistics module 305, configured to count the probability of each word in each word set appearing in the sentence set;
a sentence vector calculation module 306, configured to calculate the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set;
a clustering processing module 307, configured to cluster the sentences according to the sentence vectors to obtain the cluster groups;
a similarity calculation module 308, configured to calculate the similarity between the user's dialogue sentence and each cluster group;
a preferred group selection module 309, configured to select the preferred group from the cluster groups, the preferred group being the cluster group with the greatest similarity to the user's dialogue sentence;
a dialogue response module 310, configured to query, in a preset preferred reply sentence set, the reply sentence corresponding to the user's dialogue sentence, and to use the reply sentence to respond to the user's dialogue sentence, the preferred reply sentence set being the set of reply sentences corresponding to the preferred group.
Further, the clustering processing module may include:
an initialization unit, configured to initialize the cluster center set;
an update unit, configured to perform the g-th update of the cluster center set;
a convergence judgment unit, configured to determine whether the cluster center set meets a preset convergence condition;
a counting unit, configured to increase g by one counting unit if the cluster center set does not meet the convergence condition;
a clustering processing unit, configured to cluster the sentences according to the cluster center set after the g-th update to obtain the cluster groups if the cluster center set meets the convergence condition.
Further, the convergence judgment unit may include:
a first calculation subunit, configured to calculate the g-th update distance of each cluster center;
a second calculation subunit, configured to calculate the g-th maximum update distance of the cluster center set;
a convergence judgment subunit, configured to determine whether the cluster center set meets the convergence condition.
Further, the robot dialogue generation device may also include:
a matrix construction module, configured to assemble the sentence vectors of the sentences into the sentence matrix of the sentence set;
a principal component calculation module, configured to calculate the principal component of the sentence matrix;
a vector update module, configured to update the sentence vector of each sentence.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the devices, modules and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
Fig. 4 shows a schematic block diagram of a robot provided by an embodiment of the present application. For ease of description, only the parts related to the embodiment of the present application are shown.
In this embodiment, the robot 4 may include a processor 40, a memory 41, and computer-readable instructions 42 stored in the memory 41 and executable on the processor 40, for example computer-readable instructions for executing the robot dialogue generating method described above. When the processor 40 executes the computer-readable instructions 42, the steps in the above embodiments of the robot dialogue generating method are implemented.
Exemplarily, the computer-readable instructions 42 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 41 and executed by the processor 40 to complete the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 42 in the robot 4.
Persons of ordinary skill in the art can understand that all or part of the processes in the above-mentioned embodiments may be implemented by computer-readable instructions instructing relevant hardware. The computer-readable instructions may be stored in a computer non-volatile readable storage medium, and when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above-mentioned embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. A robot dialogue generating method, characterized in that it is applied to a preset robot, the method comprising:
    after receiving a preset dialogue instruction, collecting a dialogue sentence of a user;
    obtaining a sentence set to be processed from a preset database, the sentence set comprising SN sentences, SN being an integer greater than 1;
    performing word segmentation on each sentence in the sentence set to obtain a word set corresponding to each sentence;
    querying, in a preset word vector database, the word vector of each word in each word set;
    counting the probability of each word in each word set appearing in the sentence set;
    calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set;
    clustering the sentences according to the sentence vectors to obtain cluster groups;
    calculating the similarity between the user's dialogue sentence and each cluster group;
    selecting a preferred group from the cluster groups, the preferred group being the cluster group with the greatest similarity to the user's dialogue sentence;
    querying, in a preset preferred reply sentence set, a reply sentence corresponding to the user's dialogue sentence, and responding to the user's dialogue sentence with the reply sentence, the preferred reply sentence set being the reply sentence set corresponding to the preferred group.
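As an illustration of how the last three steps of claim 1 fit together, the following sketch scores the user's dialogue sentence against each cluster group, selects the preferred group, and hands back the corresponding reply sentence set. Cosine similarity against a per-group centroid and the dictionary layouts are assumptions; the claim only requires a similarity measure and a preset preferred reply sentence set.

import numpy as np

def cosine(a, b):
    # Cosine similarity; the claim leaves the similarity measure open, so this is an assumed choice.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def answer(user_vector, group_centroids, preferred_reply_sets):
    # group_centroids: {group_id: centroid vector of the cluster group} (assumed representation)
    # preferred_reply_sets: {group_id: reply sentence set for that group} (assumed representation)
    similarities = {gid: cosine(user_vector, c) for gid, c in group_centroids.items()}
    preferred_group = max(similarities, key=similarities.get)  # group with greatest similarity
    # The reply corresponding to the user's dialogue sentence is then looked up in the
    # preferred reply sentence set; how that set is keyed is left open by the claim.
    return preferred_group, preferred_reply_sets[preferred_group]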
  2. The robot dialogue generating method according to claim 1, wherein calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set comprises:
    calculating the sentence vector of each sentence according to the following formula:
    v_s = (1/WN_s) · Σ_{w=1}^{WN_s} [const/(const + p(w))] · v_w
    where s is the index of each sentence in the sentence set, 1 ≤ s ≤ SN, 1 ≤ w ≤ WN_s, WN_s is the number of words in the s-th sentence, const is a preset constant, p(w) is the probability of the w-th word of the s-th sentence appearing in the sentence set, v_w is the word vector of the w-th word of the s-th sentence, and v_s is the sentence vector of the s-th sentence.
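A minimal sketch of the weighted averaging of claim 2, under the formula as reconstructed above (the published claim gives the formula only as an image):

import numpy as np

def sentence_vector(words, word_vectors, p, const=1e-3):
    # words: segmented words of one sentence; word_vectors: {word: vector} from the word
    # vector database; p: {word: probability in the sentence set}.
    # Each word vector is weighted by const / (const + p(w)) and the result is averaged
    # over the words of the sentence; the default value of const is an assumption.
    weighted = [const / (const + p[w]) * np.asarray(word_vectors[w])
                for w in words if w in word_vectors and w in p]
    return np.mean(weighted, axis=0) if weighted else None

In practice the probabilities p would come from the probability statistics step of claim 1.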
  3. The robot dialogue generating method according to claim 1, wherein clustering the sentences according to the sentence vectors to obtain cluster groups comprises:
    initializing a cluster center set as follows:
    Centre(0) = {c_1(0), c_2(0), ..., c_k(0), ..., c_KN(0)}
    where k is the index of each cluster center, 1 ≤ k ≤ KN, KN is the preset number of cluster centers, 1 < KN < SN, c_k(0) is the initialization vector of the k-th cluster center and satisfies c_k(0)^T·c_k(0) = 1, T denotes transposition, and Centre(0) is the initialized cluster center set;
    performing the g-th update on the cluster center set according to the following formula:
    [formula PCTCN2019116630-appb-100002: the g-th update of the cluster center vectors c_k(g)]
    where
    [formula PCTCN2019116630-appb-100003: the argmax-based term used in the update]
    argmax is the maximum-argument function, g ≥ 1, and c_k(g) is the vector of the k-th cluster center after the g-th update;
    judging whether the cluster center set satisfies a preset convergence condition;
    if the cluster center set does not satisfy the convergence condition, increasing g by one counting unit and returning to the step of performing the g-th update on the cluster center set;
    if the cluster center set satisfies the convergence condition, clustering the sentences according to the cluster center set after the g-th update to obtain the cluster groups.
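The g-th update formula of claim 3 is published only as an image, so the sketch below should be read as one plausible realisation that is consistent with the stated constraints, namely unit-norm centers (c_k(0)^T·c_k(0) = 1) and an argmax-based assignment: a spherical k-means style update. The stopping test is the convergence condition of claim 4.

import numpy as np

def update_centers(sentence_vectors, centers):
    # One g-th update. Each sentence vector is assigned to the center that maximises the
    # dot product (the argmax step named in the claim); each center is then re-estimated
    # as the normalised sum of its assigned vectors so that it stays unit-norm.
    # Spherical k-means is an assumption; the exact formula is not recoverable from the text.
    V = np.asarray(sentence_vectors, dtype=float)  # shape (SN, dim)
    C = np.asarray(centers, dtype=float)           # shape (KN, dim), rows unit-norm
    assignment = np.argmax(V @ C.T, axis=1)
    new_C = C.copy()
    for k in range(C.shape[0]):
        members = V[assignment == k]
        if len(members) > 0:
            total = members.sum(axis=0)
            new_C[k] = total / (np.linalg.norm(total) + 1e-12)
    return new_C, assignment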
  4. The robot dialogue generating method according to claim 3, wherein judging whether the cluster center set satisfies the preset convergence condition comprises:
    calculating the g-th update distance of each cluster center according to the following formula:
    UpGdDis_{k,g} = VecDis(c_k(g), c_k(g-1))
    where VecDis is a function giving the distance between two vectors, and UpGdDis_{k,g} is the g-th update distance of the k-th cluster center;
    calculating the g-th maximum update distance of the cluster center set according to the following formula:
    MaxDis_g = Max(UpGdDis_{0,g}, UpGdDis_{1,g}, ..., UpGdDis_{k,g}, ..., UpGdDis_{KN,g})
    where Max is the maximum function and MaxDis_g is the g-th maximum update distance of the cluster center set;
    judging whether the cluster center set satisfies the following convergence condition:
    MaxDis_g < Thresh
    where Thresh is a preset distance threshold.
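A short sketch of the convergence test of claim 4, with Euclidean distance standing in for VecDis, which the claim leaves unspecified:

import numpy as np

def satisfies_convergence(old_centers, new_centers, thresh):
    # UpGdDis_{k,g} = VecDis(c_k(g), c_k(g-1)); MaxDis_g is their maximum; the set has
    # converged when MaxDis_g < Thresh. Euclidean distance is an assumed choice for VecDis.
    update_distances = np.linalg.norm(np.asarray(new_centers) - np.asarray(old_centers), axis=1)
    return float(update_distances.max()) < thresh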
  5. The robot dialogue generating method according to any one of claims 1 to 4, further comprising, after calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set:
    constructing the sentence vectors of the sentences into a sentence matrix of the sentence set;
    calculating the principal component of the sentence matrix;
    updating the sentence vector of each sentence according to the following formula:
    v_s = v_s - u·u^T·v_s
    where u is the principal component of the sentence matrix.
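Claim 5 removes the principal component u of the sentence matrix from every sentence vector. A sketch using the leading singular vector of the uncentred sentence matrix as u:

import numpy as np

def remove_principal_component(sentence_vectors):
    # Sentence matrix: one sentence vector per row. The leading right singular vector is
    # taken as the principal component u (an assumed choice), and every row is updated as
    # v_s - u (u^T v_s), matching the update formula in the claim.
    X = np.asarray(sentence_vectors, dtype=float)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u = vt[0]
    return X - np.outer(X @ u, u)

This mirrors the matrix construction, principal component calculation and vector update modules of the device claims.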
  6. A robot dialogue generating device, characterized in that it comprises:
    a dialogue sentence collection module, configured to collect a dialogue sentence of a user after receiving a preset dialogue instruction;
    a sentence set acquisition module, configured to obtain a sentence set to be processed from a preset database, the sentence set comprising SN sentences, SN being an integer greater than 1;
    a word segmentation processing module, configured to perform word segmentation on each sentence in the sentence set to obtain a word set corresponding to each sentence;
    a word vector query module, configured to query, in a preset word vector database, the word vector of each word in each word set;
    a probability statistics module, configured to count the probability of each word in each word set appearing in the sentence set;
    a sentence vector calculation module, configured to calculate the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set;
    a clustering processing module, configured to cluster the sentences according to the sentence vectors to obtain cluster groups;
    a similarity calculation module, configured to calculate the similarity between the user's dialogue sentence and each cluster group;
    a preferred group selection module, configured to select a preferred group from the cluster groups, the preferred group being the cluster group with the greatest similarity to the user's dialogue sentence;
    a dialogue response module, configured to query, in a preset preferred reply sentence set, a reply sentence corresponding to the user's dialogue sentence, and to respond to the user's dialogue sentence with the reply sentence, the preferred reply sentence set being the reply sentence set corresponding to the preferred group.
  7. The robot dialogue generating device according to claim 6, wherein the sentence vector calculation module is specifically configured to calculate the sentence vector of each sentence according to the following formula:
    v_s = (1/WN_s) · Σ_{w=1}^{WN_s} [const/(const + p(w))] · v_w
    where s is the index of each sentence in the sentence set, 1 ≤ s ≤ SN, 1 ≤ w ≤ WN_s, WN_s is the number of words in the s-th sentence, const is a preset constant, p(w) is the probability of the w-th word of the s-th sentence appearing in the sentence set, v_w is the word vector of the w-th word of the s-th sentence, and v_s is the sentence vector of the s-th sentence.
  8. The robot dialogue generating device according to claim 6, wherein the clustering processing module comprises:
    an initialization unit, configured to initialize a cluster center set as follows:
    Centre(0) = {c_1(0), c_2(0), ..., c_k(0), ..., c_KN(0)}
    where k is the index of each cluster center, 1 ≤ k ≤ KN, KN is the preset number of cluster centers, 1 < KN < SN, c_k(0) is the initialization vector of the k-th cluster center and satisfies c_k(0)^T·c_k(0) = 1, T denotes transposition, and Centre(0) is the initialized cluster center set;
    an update unit, configured to perform the g-th update on the cluster center set according to the following formula:
    [formula PCTCN2019116630-appb-100005: the g-th update of the cluster center vectors c_k(g)]
    where
    [formula PCTCN2019116630-appb-100006: the argmax-based term used in the update]
    argmax is the maximum-argument function, g ≥ 1, and c_k(g) is the vector of the k-th cluster center after the g-th update;
    a convergence judging unit, configured to judge whether the cluster center set satisfies a preset convergence condition;
    a counting unit, configured to increase g by one counting unit if the cluster center set does not satisfy the convergence condition;
    a clustering processing unit, configured to, if the cluster center set satisfies the convergence condition, cluster the sentences according to the cluster center set after the g-th update to obtain the cluster groups.
  9. The robot dialogue generating device according to claim 8, wherein the convergence judging unit comprises:
    a first calculation subunit, configured to calculate the g-th update distance of each cluster center according to the following formula:
    UpGdDis_{k,g} = VecDis(c_k(g), c_k(g-1))
    where VecDis is a function giving the distance between two vectors, and UpGdDis_{k,g} is the g-th update distance of the k-th cluster center;
    a second calculation subunit, configured to calculate the g-th maximum update distance of the cluster center set according to the following formula:
    MaxDis_g = Max(UpGdDis_{0,g}, UpGdDis_{1,g}, ..., UpGdDis_{k,g}, ..., UpGdDis_{KN,g})
    where Max is the maximum function and MaxDis_g is the g-th maximum update distance of the cluster center set;
    a convergence judging subunit, configured to judge whether the cluster center set satisfies the following convergence condition:
    MaxDis_g < Thresh
    where Thresh is a preset distance threshold.
  10. The robot dialogue generating device according to any one of claims 6 to 9, further comprising:
    a matrix construction module, configured to construct the sentence vectors of the sentences into a sentence matrix of the sentence set;
    a principal component calculation module, configured to calculate the principal component of the sentence matrix;
    a vector update module, configured to update the sentence vector of each sentence according to the following formula:
    v_s = v_s - u·u^T·v_s
    where u is the principal component of the sentence matrix.
  11. A computer non-volatile readable storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by a processor, implement the following steps:
    after receiving a preset dialogue instruction, collecting a dialogue sentence of a user;
    obtaining a sentence set to be processed from a preset database, the sentence set comprising SN sentences, SN being an integer greater than 1;
    performing word segmentation on each sentence in the sentence set to obtain a word set corresponding to each sentence;
    querying, in a preset word vector database, the word vector of each word in each word set;
    counting the probability of each word in each word set appearing in the sentence set;
    calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set;
    clustering the sentences according to the sentence vectors to obtain cluster groups;
    calculating the similarity between the user's dialogue sentence and each cluster group;
    selecting a preferred group from the cluster groups, the preferred group being the cluster group with the greatest similarity to the user's dialogue sentence;
    querying, in a preset preferred reply sentence set, a reply sentence corresponding to the user's dialogue sentence, and responding to the user's dialogue sentence with the reply sentence, the preferred reply sentence set being the reply sentence set corresponding to the preferred group.
  12. The computer non-volatile readable storage medium according to claim 11, wherein calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set comprises:
    calculating the sentence vector of each sentence according to the following formula:
    v_s = (1/WN_s) · Σ_{w=1}^{WN_s} [const/(const + p(w))] · v_w
    where s is the index of each sentence in the sentence set, 1 ≤ s ≤ SN, 1 ≤ w ≤ WN_s, WN_s is the number of words in the s-th sentence, const is a preset constant, p(w) is the probability of the w-th word of the s-th sentence appearing in the sentence set, v_w is the word vector of the w-th word of the s-th sentence, and v_s is the sentence vector of the s-th sentence.
  13. The computer non-volatile readable storage medium according to claim 11, wherein clustering the sentences according to the sentence vectors to obtain cluster groups comprises:
    initializing a cluster center set as follows:
    Centre(0) = {c_1(0), c_2(0), ..., c_k(0), ..., c_KN(0)}
    where k is the index of each cluster center, 1 ≤ k ≤ KN, KN is the preset number of cluster centers, 1 < KN < SN, c_k(0) is the initialization vector of the k-th cluster center and satisfies c_k(0)^T·c_k(0) = 1, T denotes transposition, and Centre(0) is the initialized cluster center set;
    performing the g-th update on the cluster center set according to the following formula:
    [formula PCTCN2019116630-appb-100008: the g-th update of the cluster center vectors c_k(g)]
    where
    [formula PCTCN2019116630-appb-100009: the argmax-based term used in the update]
    argmax is the maximum-argument function, g ≥ 1, and c_k(g) is the vector of the k-th cluster center after the g-th update;
    judging whether the cluster center set satisfies a preset convergence condition;
    if the cluster center set does not satisfy the convergence condition, increasing g by one counting unit and returning to the step of performing the g-th update on the cluster center set;
    if the cluster center set satisfies the convergence condition, clustering the sentences according to the cluster center set after the g-th update to obtain the cluster groups.
  14. The computer non-volatile readable storage medium according to claim 13, wherein judging whether the cluster center set satisfies the preset convergence condition comprises:
    calculating the g-th update distance of each cluster center according to the following formula:
    UpGdDis_{k,g} = VecDis(c_k(g), c_k(g-1))
    where VecDis is a function giving the distance between two vectors, and UpGdDis_{k,g} is the g-th update distance of the k-th cluster center;
    calculating the g-th maximum update distance of the cluster center set according to the following formula:
    MaxDis_g = Max(UpGdDis_{0,g}, UpGdDis_{1,g}, ..., UpGdDis_{k,g}, ..., UpGdDis_{KN,g})
    where Max is the maximum function and MaxDis_g is the g-th maximum update distance of the cluster center set;
    judging whether the cluster center set satisfies the following convergence condition:
    MaxDis_g < Thresh
    where Thresh is a preset distance threshold.
  15. The computer non-volatile readable storage medium according to any one of claims 11 to 14, wherein the steps further comprise, after calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set:
    constructing the sentence vectors of the sentences into a sentence matrix of the sentence set;
    calculating the principal component of the sentence matrix;
    updating the sentence vector of each sentence according to the following formula:
    v_s = v_s - u·u^T·v_s
    where u is the principal component of the sentence matrix.
  16. A robot, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor, when executing the computer-readable instructions, implements the following steps:
    after receiving a preset dialogue instruction, collecting a dialogue sentence of a user;
    obtaining a sentence set to be processed from a preset database, the sentence set comprising SN sentences, SN being an integer greater than 1;
    performing word segmentation on each sentence in the sentence set to obtain a word set corresponding to each sentence;
    querying, in a preset word vector database, the word vector of each word in each word set;
    counting the probability of each word in each word set appearing in the sentence set;
    calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set;
    clustering the sentences according to the sentence vectors to obtain cluster groups;
    calculating the similarity between the user's dialogue sentence and each cluster group;
    selecting a preferred group from the cluster groups, the preferred group being the cluster group with the greatest similarity to the user's dialogue sentence;
    querying, in a preset preferred reply sentence set, a reply sentence corresponding to the user's dialogue sentence, and responding to the user's dialogue sentence with the reply sentence, the preferred reply sentence set being the reply sentence set corresponding to the preferred group.
  17. The robot according to claim 16, wherein calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set comprises:
    calculating the sentence vector of each sentence according to the following formula:
    v_s = (1/WN_s) · Σ_{w=1}^{WN_s} [const/(const + p(w))] · v_w
    where s is the index of each sentence in the sentence set, 1 ≤ s ≤ SN, 1 ≤ w ≤ WN_s, WN_s is the number of words in the s-th sentence, const is a preset constant, p(w) is the probability of the w-th word of the s-th sentence appearing in the sentence set, v_w is the word vector of the w-th word of the s-th sentence, and v_s is the sentence vector of the s-th sentence.
  18. The robot according to claim 16, wherein clustering the sentences according to the sentence vectors to obtain cluster groups comprises:
    initializing a cluster center set as follows:
    Centre(0) = {c_1(0), c_2(0), ..., c_k(0), ..., c_KN(0)}
    where k is the index of each cluster center, 1 ≤ k ≤ KN, KN is the preset number of cluster centers, 1 < KN < SN, c_k(0) is the initialization vector of the k-th cluster center and satisfies c_k(0)^T·c_k(0) = 1, T denotes transposition, and Centre(0) is the initialized cluster center set;
    performing the g-th update on the cluster center set according to the following formula:
    [formula PCTCN2019116630-appb-100011: the g-th update of the cluster center vectors c_k(g)]
    where
    [formula PCTCN2019116630-appb-100012: the argmax-based term used in the update]
    argmax is the maximum-argument function, g ≥ 1, and c_k(g) is the vector of the k-th cluster center after the g-th update;
    judging whether the cluster center set satisfies a preset convergence condition;
    if the cluster center set does not satisfy the convergence condition, increasing g by one counting unit and returning to the step of performing the g-th update on the cluster center set;
    if the cluster center set satisfies the convergence condition, clustering the sentences according to the cluster center set after the g-th update to obtain the cluster groups.
  19. The robot according to claim 18, wherein judging whether the cluster center set satisfies the preset convergence condition comprises:
    calculating the g-th update distance of each cluster center according to the following formula:
    UpGdDis_{k,g} = VecDis(c_k(g), c_k(g-1))
    where VecDis is a function giving the distance between two vectors, and UpGdDis_{k,g} is the g-th update distance of the k-th cluster center;
    calculating the g-th maximum update distance of the cluster center set according to the following formula:
    MaxDis_g = Max(UpGdDis_{0,g}, UpGdDis_{1,g}, ..., UpGdDis_{k,g}, ..., UpGdDis_{KN,g})
    where Max is the maximum function and MaxDis_g is the g-th maximum update distance of the cluster center set;
    judging whether the cluster center set satisfies the following convergence condition:
    MaxDis_g < Thresh
    where Thresh is a preset distance threshold.
  20. The robot according to any one of claims 16 to 19, wherein the steps further comprise, after calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set:
    constructing the sentence vectors of the sentences into a sentence matrix of the sentence set;
    calculating the principal component of the sentence matrix;
    updating the sentence vector of each sentence according to the following formula:
    v_s = v_s - u·u^T·v_s
    where u is the principal component of the sentence matrix.
PCT/CN2019/116630 2019-09-18 2019-11-08 Robot dialogue generating method and apparatus, readable storage medium, and robot WO2021051508A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910880859.9A CN110750629A (en) 2019-09-18 2019-09-18 Robot dialogue generation method and device, readable storage medium and robot
CN201910880859.9 2019-09-18

Publications (1)

Publication Number Publication Date
WO2021051508A1 true WO2021051508A1 (en) 2021-03-25

Family

ID=69276649

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116630 WO2021051508A1 (en) 2019-09-18 2019-11-08 Robot dialogue generating method and apparatus, readable storage medium, and robot

Country Status (2)

Country Link
CN (1) CN110750629A (en)
WO (1) WO2021051508A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694941B (en) * 2020-05-22 2024-01-05 腾讯科技(深圳)有限公司 Reply information determining method and device, storage medium and electronic equipment
CN111858891A (en) * 2020-07-23 2020-10-30 平安科技(深圳)有限公司 Question-answer library construction method and device, electronic equipment and storage medium
CN112100677B (en) * 2020-11-13 2021-02-05 支付宝(杭州)信息技术有限公司 Privacy data protection method and device and electronic equipment
CN112328776A (en) * 2021-01-04 2021-02-05 北京百度网讯科技有限公司 Dialog generation method and device, electronic equipment and storage medium
CN113239668B (en) * 2021-05-31 2023-06-23 平安科技(深圳)有限公司 Keyword intelligent extraction method and device, computer equipment and storage medium
CN113723115B (en) * 2021-09-30 2024-02-09 平安科技(深圳)有限公司 Open domain question-answer prediction method based on pre-training model and related equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017037602A (en) * 2015-08-14 2017-02-16 Psソリューションズ株式会社 Dialog interface
CN106844587A (en) * 2017-01-11 2017-06-13 北京光年无限科技有限公司 A kind of data processing method and device for talking with interactive system
CN108345672A (en) * 2018-02-09 2018-07-31 平安科技(深圳)有限公司 Intelligent response method, electronic device and storage medium
CN109102809A (en) * 2018-06-22 2018-12-28 北京光年无限科技有限公司 A kind of dialogue method and system for intelligent robot
CN109766437A (en) * 2018-12-07 2019-05-17 中科恒运股份有限公司 A kind of Text Clustering Method, text cluster device and terminal device
CN109885664A (en) * 2019-01-08 2019-06-14 厦门快商通信息咨询有限公司 A kind of Intelligent dialogue method, robot conversational system, server and storage medium
CN109977207A (en) * 2019-03-21 2019-07-05 网易(杭州)网络有限公司 Talk with generation method, dialogue generating means, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810218B (en) * 2012-11-14 2018-06-08 北京百度网讯科技有限公司 A kind of automatic question-answering method and device based on problem cluster
CN105955965A (en) * 2016-06-21 2016-09-21 上海智臻智能网络科技股份有限公司 Question information processing method and device
JP6709748B2 (en) * 2017-04-13 2020-06-17 日本電信電話株式会社 Clustering device, answer candidate generation device, method, and program
CN110008465B (en) * 2019-01-25 2023-05-12 网经科技(苏州)有限公司 Method for measuring semantic distance of sentence
CN110096580B (en) * 2019-04-24 2022-05-24 北京百度网讯科技有限公司 FAQ conversation method and device and electronic equipment


Also Published As

Publication number Publication date
CN110750629A (en) 2020-02-04

Similar Documents

Publication Publication Date Title
WO2021051508A1 (en) Robot dialogue generating method and apparatus, readable storage medium, and robot
WO2020143844A1 (en) Intent analysis method and apparatus, display terminal, and computer readable storage medium
US20240202446A1 (en) Method for training keyword extraction model, keyword extraction method, and computer device
US11093854B2 (en) Emoji recommendation method and device thereof
WO2020042925A1 (en) Man-machine conversation method and apparatus, electronic device, and computer readable medium
US11295090B2 (en) Multi-scale model for semantic matching
Geng et al. Facial age estimation by learning from label distributions
WO2020232877A1 (en) Question answer selection method and apparatus, computer device, and storage medium
CN105183833B (en) Microblog text recommendation method and device based on user model
WO2020082560A1 (en) Method, apparatus and device for extracting text keyword, as well as computer readable storage medium
JP2021532499A (en) Machine learning-based medical data classification methods, devices, computer devices and storage media
US20170150235A1 (en) Jointly Modeling Embedding and Translation to Bridge Video and Language
US20020194158A1 (en) System and method for context-dependent probabilistic modeling of words and documents
CN110222560B (en) Text person searching method embedded with similarity loss function
CN107832326B (en) Natural language question-answering method based on deep convolutional neural network
US11461613B2 (en) Method and apparatus for multi-document question answering
Li et al. A general framework for association analysis of heterogeneous data
CN107992477A (en) Text subject determines method, apparatus and electronic equipment
JP2022063250A (en) Super loss: general loss for robust curriculum learning
US20170193197A1 (en) System and method for automatic unstructured data analysis from medical records
CN110795542A (en) Dialogue method and related device and equipment
CN110705247A (en) Based on x2-C text similarity calculation method
Peng et al. Learning on probabilistic labels
Han et al. Conditional word embedding and hypothesis testing via bayes-by-backprop
WO2021253938A1 (en) Neural network training method and apparatus, and video recognition method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19945647

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19945647

Country of ref document: EP

Kind code of ref document: A1