WO2021051508A1 - Robot dialogue generating method and apparatus, readable storage medium, and robot - Google Patents

Robot dialogue generating method and apparatus, readable storage medium, and robot Download PDF

Info

Publication number
WO2021051508A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
word
vector
dialogue
cluster
Prior art date
Application number
PCT/CN2019/116630
Other languages
French (fr)
Chinese (zh)
Inventor
于凤英
王健宗
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021051508A1 publication Critical patent/WO2021051508A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification

Definitions

  • This application belongs to the field of computer technology, and in particular relates to a method and device for generating a robot dialogue, a computer non-volatile readable storage medium, and a robot.
  • In robot intelligent dialogue technology, clustering the sentences generated during a dialogue is the basis for ensuring that the robot can conduct an effective dialogue.
  • In the prior art, sentence clustering is generally based on keyword matching. This approach only considers local features of a sentence, namely the keyword features, and lacks an overall view of the sentence, so the accuracy of the clustering results is low; the accuracy of the reply sentences the robot generates from such clustering results is accordingly low, which can hardly meet the needs of real dialogue scenarios.
  • In view of this, the embodiments of the present application provide a robot dialogue generation method and device, a computer non-volatile readable storage medium, and a robot, so as to solve the prior-art problem of the low accuracy of the reply sentences generated by robots.
  • the first aspect of the embodiments of the present application provides a method for generating a robot dialogue, which is applied to a preset robot, and the method includes:
  • Obtain a set of sentences to be processed from a preset database, the sentence set including SN sentences, where SN is an integer greater than 1;
  • the second aspect of the embodiments of the present application provides a device for generating a robot dialogue, which may include modules for implementing the steps of the method for generating a robot dialogue.
  • a third aspect of the embodiments of the present application provides a computer non-volatile readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the robot dialogue generation method described above.
  • a fourth aspect of the embodiments of the present application provides a robot including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the steps of the robot dialogue generation method described above are implemented.
  • the embodiments of the application have the beneficial effect that, in the calculation of the sentence vectors, the word vector of each word and the probability of each word are fully considered, so the features of a sentence can be characterized as a whole.
  • the embodiments of the application thereby greatly improve the accuracy of the clustering results.
  • correspondingly, the accuracy of the reply sentences generated by the robot based on such clustering results will also be greatly improved.
  • FIG. 1 is a flowchart of an embodiment of a method for generating a robot dialog in an embodiment of the application
  • Figure 2 is a schematic flow chart of clustering each sentence according to the sentence vector
  • FIG. 3 is a structural diagram of an embodiment of an apparatus for generating a robot dialog in an embodiment of the application
  • Fig. 4 is a schematic block diagram of a robot in an embodiment of the application.
  • the method for generating a robot dialogue in the embodiments of the present application can be applied to a preset dialogue robot (Chatterbot), which is a robot used to simulate human dialogue or chat.
  • the dialogue robot can be set in an exhibition hall, a company reception desk, an airport information desk, a hospital information desk, etc., to provide convenient consulting services for passing users.
  • an embodiment of a method for generating a robot dialogue in an embodiment of the present application may include:
  • Step S101 After receiving a preset dialogue instruction, collect the dialogue sentence of the user.
  • the user can issue a dialogue instruction to the dialogue robot in the form of voice.
  • when the dialogue robot receives the user's voice, it can determine whether the voice includes preset keywords.
  • the keywords include, but are not limited to, words such as "please ask", "consult", and "help"; if the user's voice includes such a keyword, the voice can be determined to be a dialogue instruction issued by the user.
  • the user can also issue a dialogue instruction to the dialogue robot through physical buttons or virtual buttons in a designated human-computer interaction interface.
  • the dialogue robot may include a touch screen for interacting with the user; to issue a dialogue instruction to the dialogue robot, the user can click a specific button displayed on it.
  • after the dialogue robot receives the dialogue instruction issued by the user, it can collect the user's dialogue sentence through its own voice collection device, such as a microphone.
  • Step S102 Obtain a set of sentences to be processed from a preset database.
  • the sentence set includes SN sentences, and SN is an integer greater than 1.
  • a database including massive instant messaging (IM) data can be established in advance, and the database contains as many instant messaging data generated during a certain statistical time period as possible.
  • the statistical time period can be set according to the actual situation, for example, it can be set to a time period within a week, a month, a quarter, or a year from the current moment.
  • Step S103 Perform word segmentation processing on each sentence in the sentence set to obtain each word set corresponding to each sentence.
  • Word segmentation processing refers to segmenting a sentence into individual words.
  • the sentence can be segmented according to a general dictionary to ensure that the separated words are all normal vocabulary; characters that are not in the dictionary are split off as single characters.
  • when the characters can form words in both the forward and backward directions, for example "要求神", the split is chosen according to the statistical word frequency: if "要求" has the higher frequency, the sentence is split as "要求 / 神"; if "求神" has the higher frequency, it is split as "要 / 求神".
  • the segmented words can be formed into a word set corresponding to the sentence.
  • Step S104 query the word vector of each word in each word set in the preset word vector database.
  • the word vector database is a database that records the correspondence between words and word vectors.
  • the word vector may be a corresponding word vector obtained by training the word according to the word2vec model. That is, the probability of occurrence of the word is expressed according to the context information of the word.
  • the training of word vectors follows the word2vec idea: each word is first represented as a 0-1 (one-hot) vector, the word2vec model is then trained on these vectors, and n-1 words are used to predict the n-th word; the intermediate representation produced by the neural network model is taken as the word vector.
  • for example, the one-hot vector of "庆祝" (celebrate) is assumed to be [1,0,0,0,...,0], the one-hot vector of "大会" (meeting) is [0,1,0,0,...,0], the one-hot vector of "顺利" (smooth) is [0,0,1,0,...,0], and the vector of the word to be predicted, "闭幕" (closing), is [0,0,0,1,...,0].
  • the model is trained to generate the coefficient matrix W of the hidden layer.
  • the product of the one-hot vector of each word and the coefficient matrix is the word vector of the word.
  • the final form will be a multi-dimensional vector similar to "庆祝 (celebrate): [-0.28, 0.34, -0.02, ..., 0.92]".
  • Step S105 Count the probability of each word in each word set appearing in the sentence set.
  • Step S106 Calculate the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set.
  • the maximum likelihood method can be used to estimate the sentence vector.
  • assuming that all word vectors are roughly uniformly distributed over the whole vector space, a likelihood function is constructed for each sentence in terms of the following quantities:
  • s is the index of each sentence in the sentence set, 1 ≤ s ≤ SN
  • w indexes the words of a sentence, 1 ≤ w ≤ WN_s, where WN_s is the number of words in the s-th sentence
  • α is a preset constant
  • p(w) is the probability that the w-th word of the s-th sentence appears in the sentence set
  • v_w is the word vector of the w-th word of the s-th sentence
  • v_s is the sentence vector of the s-th sentence
  • <v_s, v_w> is the angle between the two vectors v_s and v_w
  • Z is a preset constant
  • Sentence_s is the s-th sentence and f(Sentence_s | v_s) is the likelihood function of the s-th sentence
  • taking the logarithm of the sentence likelihood gives the likelihood function of each word, where ln is the natural logarithm function, word_w is the w-th word of the s-th sentence, and f(word_w | v_s) is the likelihood function of the w-th word of the s-th sentence.
  • the sentence vector of each sentence can be expressed as a weighted average of the word vectors of the words it contains.
  • the sentence vector obtained in this way is its maximum likelihood estimate or, from a Bayesian point of view, its maximum a posteriori estimate; compared with a traditional simple average, this method takes semantic-level information into account and clusters better, and compared with a deep learning system it is more efficient and effective and, more importantly, needs no labelled training data; vectorizing sentences makes them computable and therefore clusterable.
  • the sentence vector of each sentence can also be assembled into a sentence matrix of the sentence set, in which the sentence vector of each sentence forms one row, so the number of rows of the sentence matrix equals the number of sentences in the sentence set.
  • the principal component of the sentence matrix is then computed by Principal Component Analysis (PCA), where u denotes the principal component of the sentence matrix.
  • each sentence vector is then updated by removing its component along u, yielding the sentence vector with the principal-component interference removed; the sentence vectors used in the subsequent steps refer to the updated sentence vectors.
  • the sentence vector of each sentence can also be normalized, where mean is the averaging function, var is the variance function, and ε_norm is a preset constant.
  • the normalized sentence vectors are thus obtained; the sentence vectors used in the subsequent steps refer to the normalized sentence vectors.
  • the sentence vector of each sentence can also be whitened according to the eigendecomposition V D V^T = cov(Matrix), where Matrix is the sentence matrix constructed from the sentence vectors of the sentences, cov is the function that computes the covariance matrix, ε_zca is a preset constant, and I is the identity matrix.
  • the whitened sentence vectors are thus obtained; the sentence vectors used in the subsequent steps refer to the whitened sentence vectors.
  • Step S107 Perform clustering processing on each sentence according to the sentence vector to obtain each cluster group.
  • step S107 may specifically include the following steps:
  • Step S1071 initialize the cluster center set.
  • a cluster center set Centre(0) = {c_1(0), c_2(0), ..., c_k(0), ..., c_KN(0)} can be initialized
  • k is the index of each cluster center, 1 ≤ k ≤ KN
  • KN is the preset number of cluster centers, 1 < KN < SN
  • its specific value can be set according to the actual situation, for example 5, 10, 15, 20 or another value
  • c_k(0) is the initialization vector of the k-th cluster center
  • T is the transpose symbol
  • Centre(0) is the initialized cluster center set.
  • Step S1072 update the cluster center set for the g th time.
  • the g-th update of the cluster center set can be performed according to the following formula:
  • Step S1073 Determine whether the set of cluster centers meets a preset convergence condition.
  • the g-th update distance of each cluster center is UpGdDis_{k,g} = VecDis(c_k(g), c_k(g-1)), where VecDis is a function that computes the distance between two vectors and UpGdDis_{k,g} is the g-th update distance of the k-th cluster center.
  • the maximum update distance of the cluster center set at the g-th update is MaxDis_g = Max(UpGdDis_{0,g}, UpGdDis_{1,g}, ..., UpGdDis_{k,g}, ..., UpGdDis_{KN,g}), where Max is the maximum-value function and MaxDis_g is the g-th maximum update distance of the cluster center set.
  • the convergence condition is MaxDis_g < Thresh, where Thresh is a preset distance threshold whose specific value can be set according to actual conditions, for example 0.1, 0.01, 0.001 or another value.
  • if the cluster center set does not meet the convergence condition, step S1074 and its subsequent steps are performed; if the cluster center set meets the convergence condition, step S1075 is performed.
  • Step S1074 Increase g by one counting unit.
  • Step S1075 Perform clustering processing on each sentence according to the cluster center set after the gth update to obtain each cluster group.
  • for any sentence, the distance between its sentence vector and the vector of each cluster center in the cluster center set can be calculated, and the sentence is clustered to the cluster center with the smallest distance from it.
  • by traversing all sentences, the sentences clustered to the same cluster center form one cluster group, and the final clustering result is obtained.
  • Step S108 Calculate the similarity between the dialogue sentence of the user and each cluster group respectively.
  • the sentence vector of the dialogue sentence of the user can be calculated, which is denoted as SenVec here.
  • the specific calculation process is similar to the process in step S103 to step S106, and you can refer to the foregoing content, which will not be repeated here.
  • the similarity is computed as SimDeg_k = Recip(VecDis(SenVec, c_k)), where Recip is the function that computes the reciprocal and SimDeg_k is the similarity between the user's dialogue sentence and the k-th cluster group.
  • Step S109 Select a preferred group from each cluster group.
  • the preferred group is the cluster group with the greatest similarity to the dialogue sentences of the user, namely:
  • argmax is the maximum-argument function
  • SelGroup is the serial number of the preferred group.
  • Step S110 Query the reply sentence corresponding to the dialogue sentence of the user in the preset preferred reply sentence set, and use the reply sentence to respond to the dialogue sentence of the user.
  • the set of preferred reply sentences is a set of reply sentences corresponding to the preferred group.
  • a set of reply sentences corresponding to each cluster group can be separately set in advance.
  • each set of reply sentences includes a predetermined number of reply sentences, and each reply sentence is used to answer a certain preset specified sentence.
  • when the dialogue robot needs to respond to the user's dialogue sentence, it can calculate the distance between the sentence vector of the user's dialogue sentence and the sentence vector of each specified sentence, determine the specified sentence for which this distance is smallest, query the reply sentence corresponding to that specified sentence in the preferred reply sentence set, and use this reply sentence to respond to the user's dialogue sentence.
  • the word vector of each word and the probability of each word appearing are fully considered, which can characterize the characteristics of the sentence as a whole, and greatly improve the accuracy of the clustering result.
  • the accuracy of the reply sentences generated by the robot based on such clustering results will also be greatly improved.
  • FIG. 3 shows a structural diagram of an embodiment of a device for generating a robot dialog provided by an embodiment of the present application.
  • an apparatus for generating a robot dialog may include:
  • the dialogue sentence collection module 301 is used to collect the user's dialogue sentence after receiving a preset dialogue instruction
  • the sentence set obtaining module 302 is configured to obtain a sentence set to be processed from a preset database, the sentence set includes SN sentences, and SN is an integer greater than 1;
  • the word segmentation processing module 303 is configured to perform word segmentation processing on each sentence in the sentence set to obtain each word set corresponding to each sentence respectively;
  • the word vector query module 304 is used to query the word vector of each word in each word set in a preset word vector database
  • the probability statistics module 305 is used to separately count the probability of each word in each word set appearing in the sentence set;
  • the sentence vector calculation module 306 is configured to calculate the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set;
  • the clustering processing module 307 is configured to perform clustering processing on each sentence according to the sentence vector to obtain each clustering group;
  • the similarity calculation module 308 is configured to calculate the similarity between the dialogue sentence of the user and each cluster group respectively;
  • the preferred group selection module 309 is configured to select a preferred group from each cluster group, and the preferred group is the cluster group with the greatest similarity to the dialog sentence of the user;
  • the dialogue response module 310 is used to query, in a preset preferred reply sentence set, the reply sentence corresponding to the user's dialogue sentence, and to use the reply sentence to respond to the user's dialogue sentence, the preferred reply sentence set being the set of reply sentences corresponding to the preferred group.
  • the clustering processing module may include:
  • the initialization unit is used to initialize the cluster center set
  • An update unit configured to update the cluster center set for the g th time
  • a convergence judging unit configured to judge whether the set of cluster centers meets a preset convergence condition
  • a counting unit configured to increase g by one counting unit if the set of cluster centers does not meet the convergence condition
  • the clustering processing unit is configured to, if the cluster center set meets the convergence condition, perform clustering processing on each sentence according to the cluster center set after the gth update to obtain each cluster group.
  • the convergence judgment unit may include:
  • the first calculation subunit is used to calculate the g-th update distance of each cluster center
  • the second calculation subunit is used to calculate the g-th maximum update distance of the cluster center set
  • the convergence judgment subunit is used to judge whether the set of cluster centers meets the convergence condition.
  • the device for generating a robot dialogue may further include:
  • the matrix construction module is used to construct the sentence vector of each sentence into the sentence matrix of the sentence set;
  • the principal component calculation module is used to calculate the principal components in the sentence matrix
  • the vector update module is used to update the statement vector of each statement.
  • Fig. 4 shows a schematic block diagram of a robot provided by an embodiment of the present application. For ease of description, only parts related to the embodiment of the present application are shown.
  • the robot 4 may include: a processor 40, a memory 41, and computer-readable instructions 42 stored in the memory 41 and executable on the processor 40, for example computer-readable instructions for executing the aforementioned robot dialogue generation method.
  • the processor 40 executes the computer-readable instructions 42, the steps in the above embodiments of the robot dialog generation method are implemented.
  • the computer-readable instructions 42 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 41 and executed by the processor 40 to complete this application.
  • the one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 42 in the robot 4.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A robot dialogue generating method and apparatus, a computer non-volatile readable storage medium, and a robot, relating to the technical field of computers. After a preset dialogue instruction is received, a dialogue statement of a user is acquired; a statement set to be processed is obtained; word segmentation processing is respectively performed on each statement in the statement set to obtain each word set corresponding to each statement; a word vector of each word in each word set is queried respectively; the occurrence probability of each word in each word set in the statement set is counted respectively; a statement vector of each statement is calculated respectively; clustering processing is performed on each statement according to the statement vector to obtain each clustering group; the similarity between the dialogue statement of the user and each clustering group is calculated respectively; and a reply statement corresponding to the dialogue statement of the user is queried, and the dialogue statement of the user is responded to by using the reply statement, so that the accuracy of a reply statement generated by a robot according to a clustering result is improved.

Description

Robot dialogue generation method, device, readable storage medium and robot
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on September 18, 2019, with application number 201910880859.9 and invention title "Robot dialogue generation method, device, readable storage medium and robot", the entire contents of which are incorporated herein by reference.
Technical field
This application belongs to the field of computer technology, and in particular relates to a robot dialogue generation method and device, a computer non-volatile readable storage medium, and a robot.
Background
In robot intelligent dialogue technology, clustering the sentences generated during a dialogue is the basis for ensuring that the robot can conduct an effective dialogue. In the prior art, sentence clustering is generally based on keyword matching. This approach only considers local features of a sentence, namely the keyword features, and lacks an overall view of the sentence, so the accuracy of the clustering results is low; the accuracy of the reply sentences the robot generates from such clustering results is accordingly low, which can hardly meet the needs of real dialogue scenarios.
Technical problem
In view of this, the embodiments of the present application provide a robot dialogue generation method and device, a computer non-volatile readable storage medium, and a robot, so as to solve the prior-art problem of the low accuracy of the reply sentences generated by robots.
Technical solution
A first aspect of the embodiments of the present application provides a robot dialogue generation method, applied to a preset robot, the method including:
after receiving a preset dialogue instruction, collecting the user's dialogue sentence;
obtaining a set of sentences to be processed from a preset database, the sentence set including SN sentences, SN being an integer greater than 1;
performing word segmentation on each sentence in the sentence set to obtain the word set corresponding to each sentence;
querying, in a preset word vector database, the word vector of each word in each word set;
counting the probability of each word in each word set appearing in the sentence set;
calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set;
clustering the sentences according to the sentence vectors to obtain the cluster groups;
calculating the similarity between the user's dialogue sentence and each cluster group;
selecting a preferred group from the cluster groups, the preferred group being the cluster group with the greatest similarity to the user's dialogue sentence;
querying, in a preset preferred reply sentence set, the reply sentence corresponding to the user's dialogue sentence, and using the reply sentence to respond to the user's dialogue sentence, the preferred reply sentence set being the set of reply sentences corresponding to the preferred group.
A second aspect of the embodiments of the present application provides a robot dialogue generation device, which may include modules for implementing the steps of the above robot dialogue generation method.
A third aspect of the embodiments of the present application provides a computer non-volatile readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the above robot dialogue generation method.
A fourth aspect of the embodiments of the present application provides a robot including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the steps of the above robot dialogue generation method are implemented.
Beneficial effects
Compared with the prior art, the embodiments of the present application have the following beneficial effect: in the calculation of the sentence vectors, both the word vector of each word and the probability with which each word appears are fully considered, so the features of a sentence can be characterized as a whole, which greatly improves the accuracy of the clustering results; correspondingly, the accuracy of the reply sentences the robot generates from such clustering results is also greatly improved.
Description of the drawings
FIG. 1 is a flowchart of an embodiment of a robot dialogue generation method in an embodiment of the application;
FIG. 2 is a schematic flowchart of clustering the sentences according to the sentence vectors;
FIG. 3 is a structural diagram of an embodiment of a robot dialogue generation device in an embodiment of the application;
FIG. 4 is a schematic block diagram of a robot in an embodiment of the application.
Embodiments of the present invention
The robot dialogue generation method in the embodiments of the present application can be applied to a preset dialogue robot (chatterbot), i.e. a robot used to simulate human dialogue or chat. In a specific application scenario of the embodiments of the present application, the dialogue robot can be placed in an exhibition hall, at a company reception desk, at an airport information desk, at a hospital information desk, and so on, to provide convenient consulting services for passing users.
Referring to FIG. 1, an embodiment of a robot dialogue generation method in an embodiment of the present application may include:
Step S101: after receiving a preset dialogue instruction, collect the user's dialogue sentence.
In this embodiment, the user can issue a dialogue instruction to the dialogue robot by voice. When the dialogue robot receives the user's voice, it can determine whether the voice includes preset keywords; the keywords include, but are not limited to, words such as "please ask", "consult", and "help". If the user's voice contains such a keyword, the voice can be determined to be a dialogue instruction issued by the user. The user can also issue a dialogue instruction to the dialogue robot through physical buttons or virtual buttons in a designated human-computer interaction interface; for example, the dialogue robot may include a touch screen for interacting with the user, and when the user needs to issue a dialogue instruction to the dialogue robot, the user can click a specific button displayed on it. After receiving the dialogue instruction issued by the user, the dialogue robot can collect the user's dialogue sentence through its own voice collection device, such as a microphone.
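As a rough illustration of the keyword check described above, the following sketch assumes the utterance has already been transcribed to text by a speech recognizer; the keyword list and the is_dialogue_instruction helper are illustrative names, not part of the patent.
```python
# Hypothetical keyword-spotting check: decide whether a transcribed utterance
# should be treated as a dialogue instruction.
TRIGGER_KEYWORDS = ["请问", "咨询", "帮助"]  # "please ask", "consult", "help"

def is_dialogue_instruction(transcribed_text: str) -> bool:
    """Return True if the transcribed utterance contains any trigger keyword."""
    return any(keyword in transcribed_text for keyword in TRIGGER_KEYWORDS)

if is_dialogue_instruction("请问洗手间在哪里"):
    print("start collecting the user's dialogue sentence")
```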
Step S102: obtain a set of sentences to be processed from a preset database.
The sentence set includes SN sentences, where SN is an integer greater than 1.
In this embodiment, a database containing a large amount of instant messaging (IM) data can be established in advance; the database contains as much of the instant messaging data generated within a certain statistical time period as possible, and these instant messaging data are stored in the form of sentences. The statistical time period can be set according to the actual situation; for example, it can be set to the week, month, quarter, or year preceding the current moment.
Step S103: perform word segmentation on each sentence in the sentence set to obtain the word set corresponding to each sentence.
Word segmentation means splitting a sentence into individual words. In this embodiment, a sentence can be segmented against a general dictionary so that the separated words are all normal vocabulary; characters that are not in the dictionary are split off as single characters. When the characters can form words in both the forward and backward directions, for example "要求神", the split is chosen according to the statistical word frequency: if "要求" has the higher frequency, the sentence is split as "要求 / 神"; if "求神" has the higher frequency, it is split as "要 / 求神". For any sentence in the sentence set, after word segmentation the segmented words form the word set corresponding to that sentence.
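The patent does not spell out the segmentation algorithm beyond the use of a dictionary and word frequencies, so the following is only a toy sketch of one way to realize it: a dynamic-programming segmenter that prefers high-frequency dictionary words and falls back to single characters. The dictionary and the frequency values are made up for the example.
```python
import math

# Toy dictionary with made-up corpus frequencies; characters outside the
# dictionary fall back to single-character tokens, as described above.
WORD_FREQ = {"要求": 120, "求神": 30, "要": 50, "神": 80}

def segment(sentence: str) -> list[str]:
    """Dynamic-programming segmentation that prefers high-frequency dictionary words."""
    n = len(sentence)
    best_score = [-math.inf] * (n + 1)  # best_score[i]: best score for sentence[:i]
    best_prev = [0] * (n + 1)
    best_score[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - 4), end):  # candidate words up to 4 characters
            word = sentence[start:end]
            freq = WORD_FREQ.get(word)
            if freq is None and len(word) > 1:
                continue  # multi-character strings must be dictionary words
            score = best_score[start] + math.log(freq if freq else 1)
            if score > best_score[end]:
                best_score[end], best_prev[end] = score, start
    words, i = [], n
    while i > 0:
        words.append(sentence[best_prev[i]:i])
        i = best_prev[i]
    return list(reversed(words))

print(segment("要求神"))  # -> ['要求', '神'] because "要求" has the higher frequency here
```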
Step S104: query, in a preset word vector database, the word vector of each word in each word set.
The word vector database is a database that records the correspondence between words and word vectors. A word vector may be obtained by training on the word with the word2vec model, i.e. the probability of the word occurring is expressed in terms of its context. Word vector training follows the word2vec idea: each word is first represented as a 0-1 (one-hot) vector, the word2vec model is trained on these vectors, and n-1 words are used to predict the n-th word; the intermediate representation produced by the neural network model is taken as the word vector. For example, suppose the one-hot vector of "庆祝" (celebrate) is [1,0,0,0,...,0], the one-hot vector of "大会" (meeting) is [0,1,0,0,...,0], the one-hot vector of "顺利" (smooth) is [0,0,1,0,...,0], and the vector of the word to be predicted, "闭幕" (closing), is [0,0,0,1,...,0]. Training the model produces the hidden-layer coefficient matrix W, and the product of a word's one-hot vector with this coefficient matrix is that word's word vector; the final form is a multi-dimensional vector such as "庆祝: [-0.28, 0.34, -0.02, ..., 0.92]".
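The relationship between a one-hot vector and the hidden-layer coefficient matrix W can be shown with a small NumPy sketch. The vocabulary, the dimensionality, and the randomly generated W are placeholders; in practice W would come from word2vec training, which is not reproduced here.
```python
import numpy as np

vocab = ["庆祝", "大会", "顺利", "闭幕"]       # toy vocabulary
index = {word: i for i, word in enumerate(vocab)}
dim = 5                                        # word-vector dimensionality
rng = np.random.default_rng(0)
W = rng.normal(size=(len(vocab), dim))         # hidden-layer coefficient matrix (would come from training)

def word_vector(word: str) -> np.ndarray:
    """one-hot(word) @ W simply selects the row of W belonging to that word."""
    one_hot = np.zeros(len(vocab))
    one_hot[index[word]] = 1.0
    return one_hot @ W

assert np.allclose(word_vector("庆祝"), W[0])  # e.g. a vector like [-0.28, 0.34, -0.02, ...]
```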
Step S105: count, for each word in each word set, the probability of the word appearing in the sentence set.
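One straightforward reading of step S105 is the relative frequency of each word among all words of the sentence set; the patent does not fix the exact estimator, so the Counter-based sketch below is an assumption.
```python
from collections import Counter

def word_probabilities(word_sets: list[list[str]]) -> dict[str, float]:
    """Relative frequency of every word over all words appearing in the sentence set."""
    counts = Counter(word for words in word_sets for word in words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# word_sets[s] is the word set produced by segmenting the s-th sentence.
probs = word_probabilities([["要求", "神"], ["庆祝", "大会", "顺利", "闭幕"]])
```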
Step S106: calculate the sentence vector of each sentence from the word vector of each word and the probability of each word appearing in the sentence set.
In this embodiment, the maximum likelihood method can be used to estimate the sentence vectors. It is assumed here that all word vectors are roughly uniformly distributed over the whole vector space, and a likelihood function is constructed for each sentence as follows:
Figure PCTCN2019116630-appb-000001
where s is the index of each sentence in the sentence set, 1 ≤ s ≤ SN; 1 ≤ w ≤ WN_s, with WN_s the number of words in the s-th sentence; α is a preset constant; p(w) is the probability that the w-th word of the s-th sentence appears in the sentence set; v_w is the word vector of the w-th word of the s-th sentence; v_s is the sentence vector of the s-th sentence; <v_s, v_w> is the angle between the vectors v_s and v_w; Z is a preset constant; Sentence_s is the s-th sentence; and f(Sentence_s | v_s) is the likelihood function of the s-th sentence.
Taking the logarithm of this likelihood function gives the likelihood function of each word:
Figure PCTCN2019116630-appb-000002
where ln is the natural logarithm function, word_w is the w-th word of the s-th sentence, and f(word_w | v_s) is the likelihood function of the w-th word of the s-th sentence.
Taking the maximization of the above expression as the objective, namely:
Figure PCTCN2019116630-appb-000003
we obtain:
Figure PCTCN2019116630-appb-000004
where
Figure PCTCN2019116630-appb-000005
is a constant and ∝ denotes proportionality.
In this embodiment, the sentence vector of each sentence can be expressed as a weighted average of the word vectors of the words it contains, namely:
Figure PCTCN2019116630-appb-000006
The sentence vector obtained in this way is its maximum likelihood estimate or, from a Bayesian point of view, its maximum a posteriori estimate. Compared with a traditional simple average, this method takes semantic-level information into account and clusters better; compared with a deep learning system, it is more efficient and effective and, more importantly, requires no labelled training data. Vectorizing sentences makes them computable and therefore clusterable.
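The weighted-average formula itself is only reproduced as an image above, so the weight used in the sketch below, α/(α + p(w)), is an assumption borrowed from the well-known smooth inverse frequency (SIF) embedding that this derivation closely resembles; the text only confirms the general shape of a probability-dependent weighting of the word vectors.
```python
import numpy as np

def sentence_vector(words: list[str],
                    word_vec: dict[str, np.ndarray],
                    p: dict[str, float],
                    alpha: float = 1e-3) -> np.ndarray:
    """Probability-weighted average of word vectors (SIF-style weights, assumed)."""
    weighted = [alpha / (alpha + p[w]) * word_vec[w] for w in words if w in word_vec]
    return np.mean(weighted, axis=0)
```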
Further, after the sentence vector of each sentence has been calculated, the sentence vectors can be assembled into the sentence matrix of the sentence set, in which the sentence vector of each sentence forms one row, so the number of rows of the sentence matrix equals the number of sentences in the sentence set. Principal Component Analysis (PCA) is then used to compute the principal component of the sentence matrix, and the sentence vector of each sentence is updated according to:
v_s = v_s - u u^T v_s
where u is the principal component of the sentence matrix. The updated sentence vectors, i.e. the sentence vectors with the principal-component interference removed, are thus obtained; the sentence vectors used in the subsequent steps refer to these updated sentence vectors.
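A NumPy sketch of the update v_s = v_s - u u^T v_s quoted above; the first right singular vector of the sentence matrix is taken as the principal component u, which is the usual way to obtain it, although the text only says that u is computed by PCA.
```python
import numpy as np

def remove_principal_component(sentence_matrix: np.ndarray) -> np.ndarray:
    """Rows are sentence vectors; subtract each row's projection onto the principal component u."""
    _, _, vt = np.linalg.svd(sentence_matrix, full_matrices=False)
    u = vt[0]                                                   # first principal direction (unit norm)
    return sentence_matrix - np.outer(sentence_matrix @ u, u)   # v_s - u u^T v_s for every row
```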
Further, the sentence vector of each sentence can also be normalized according to the following formula:
Figure PCTCN2019116630-appb-000007
where mean is the averaging function, var is the variance function, and ε_norm is a preset constant; the normalized sentence vectors are thus obtained. The sentence vectors used in the subsequent steps refer to these normalized sentence vectors.
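The normalization formula is likewise only available as an image; a common reading consistent with the named quantities (mean, var and ε_norm) is a per-vector standardization, sketched below as an assumption.
```python
import numpy as np

def normalize(v_s: np.ndarray, eps_norm: float = 1e-8) -> np.ndarray:
    """Assumed standardization: subtract the mean and divide by sqrt(variance + eps_norm)."""
    return (v_s - np.mean(v_s)) / np.sqrt(np.var(v_s) + eps_norm)
```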
Further, the sentence vector of each sentence can also be whitened according to the following formulas:
[V, D] = eig(cov(Matrix))
V D V^T = cov(Matrix)
v_s = V (D + ε_zca I)^(-0.5) V^T v_s
where Matrix is the sentence matrix constructed from the sentence vectors, cov is the function that computes the covariance matrix, [V, D] = eig(cov(Matrix)) denotes computing all eigenvalues of cov(Matrix), which form the diagonal matrix D, and the corresponding eigenvectors, which form the columns of V, ε_zca is a preset constant, and I is the identity matrix; the whitened sentence vectors are thus obtained. The sentence vectors used in the subsequent steps refer to these whitened sentence vectors.
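A NumPy sketch of the whitening step following the formulas quoted above; np.cov is used for cov(Matrix) and normalizes by N-1, a detail the text does not specify.
```python
import numpy as np

def zca_whiten(sentence_matrix: np.ndarray, eps_zca: float = 1e-5) -> np.ndarray:
    """Whiten every sentence vector: v_s <- V (D + eps_zca * I)^(-0.5) V^T v_s."""
    cov = np.cov(sentence_matrix, rowvar=False)      # cov(Matrix)
    eigvals, eigvecs = np.linalg.eigh(cov)           # D (eigenvalues) and V (eigenvectors)
    transform = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps_zca)) @ eigvecs.T
    return sentence_matrix @ transform.T             # transform is symmetric, applied row-wise
```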
Step S107: cluster the sentences according to their sentence vectors to obtain the cluster groups.
As shown in FIG. 2, step S107 may specifically include the following steps:
Step S1071: initialize the cluster center set.
Specifically, a cluster center set of the following form can be initialized:
Centre(0) = {c_1(0), c_2(0), ..., c_k(0), ..., c_KN(0)}
where k is the index of each cluster center, 1 ≤ k ≤ KN; KN is the preset number of cluster centers, 1 < KN < SN, whose specific value can be set according to the actual situation, for example 5, 10, 15, 20 or another value; c_k(0) is the initialization vector of the k-th cluster center, whose dimensionality is the same as that of the sentence vectors and which satisfies c_k(0)^T c_k(0) = 1, its specific entries being set randomly; T is the transpose symbol; and Centre(0) is the initialized cluster center set.
Step S1072: perform the g-th update of the cluster center set.
Specifically, the g-th update of the cluster center set can be performed according to the following formula:
Figure PCTCN2019116630-appb-000008
where
Figure PCTCN2019116630-appb-000009
argmax is the maximum-argument function, g ≥ 1 (g = 1 in the initial state), and c_k(g) is the vector of the k-th cluster center after its g-th update.
Step S1073: determine whether the cluster center set meets a preset convergence condition.
Specifically, the g-th update distance of each cluster center is first calculated according to:
UpGdDis_{k,g} = VecDis(c_k(g), c_k(g-1))
where VecDis is a function that computes the distance between two vectors and UpGdDis_{k,g} is the g-th update distance of the k-th cluster center.
Then, the g-th maximum update distance of the cluster center set is calculated according to:
MaxDis_g = Max(UpGdDis_{0,g}, UpGdDis_{1,g}, ..., UpGdDis_{k,g}, ..., UpGdDis_{KN,g})
where Max is the maximum-value function and MaxDis_g is the g-th maximum update distance of the cluster center set.
Next, it is determined whether the cluster center set meets the following convergence condition:
MaxDis_g < Thresh
where Thresh is a preset distance threshold whose specific value can be set according to the actual situation, for example 0.1, 0.01, 0.001 or another value.
If the cluster center set does not meet the convergence condition, step S1074 and its subsequent steps are performed; if the cluster center set meets the convergence condition, step S1075 is performed.
Step S1074: increase g by one counting unit.
That is, g = g + 1 is executed, and the procedure then returns to step S1072 and its subsequent steps until the cluster center set meets the convergence condition.
Step S1075: cluster the sentences according to the cluster center set after the g-th update to obtain the cluster groups.
The cluster center set at this point is the finally determined cluster center set, denoted Centre = {c_1, c_2, ..., c_k, ..., c_KN}, where c_k is the finally determined k-th cluster center.
For any sentence, the distance between its sentence vector and the vector of each cluster center in the cluster center set can be calculated, and the sentence is clustered to the cluster center with the smallest distance from it. By traversing all sentences, the sentences clustered to the same cluster center form one cluster group, and the final clustering result is obtained.
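Since the update formula of step S1072 is only reproduced as an image, the sketch below fills it in with a standard k-means-style assignment and mean update, which is one plausible reading; the random unit-norm initialization and the MaxDis_g < Thresh convergence test do follow the text.
```python
import numpy as np

def cluster_sentences(S: np.ndarray, KN: int, thresh: float = 0.01, seed: int = 0):
    """S is the (SN, dim) matrix of sentence vectors; returns (labels, centres)."""
    rng = np.random.default_rng(seed)
    centres = rng.normal(size=(KN, S.shape[1]))
    centres /= np.linalg.norm(centres, axis=1, keepdims=True)   # c_k(0)^T c_k(0) = 1
    while True:
        # assign every sentence to its nearest current centre (assumed assignment rule)
        dists = np.linalg.norm(S[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # move each centre to the mean of its members (assumed update rule)
        new_centres = np.array([S[labels == k].mean(axis=0) if np.any(labels == k) else centres[k]
                                for k in range(KN)])
        # convergence test of step S1073: MaxDis_g < Thresh
        max_dis = np.max(np.linalg.norm(new_centres - centres, axis=1))  # max over UpGdDis_{k,g}
        centres = new_centres
        if max_dis < thresh:
            break
    return labels, centres
```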
Step S108: calculate the similarity between the user's dialogue sentence and each cluster group.
First, the sentence vector of the user's dialogue sentence, denoted SenVec here, can be calculated; the calculation is similar to the process of steps S103 to S106 and is not repeated here.
Then, the similarity between the user's dialogue sentence and each cluster group can be calculated according to:
SimDeg_k = Recip(VecDis(SenVec, c_k))
where Recip is the function that computes the reciprocal and SimDeg_k is the similarity between the user's dialogue sentence and the k-th cluster group.
Step S109: select the preferred group from the cluster groups.
The preferred group is the cluster group with the greatest similarity to the user's dialogue sentence, namely:
SelGroup = argmax(SimDeg_1, SimDeg_2, ..., SimDeg_k, ..., SimDeg_KN)
where argmax is the maximum-argument function and SelGroup is the index of the preferred group.
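A small sketch combining steps S108 and S109: the similarity is the reciprocal of the vector distance, and the cluster group with the greatest similarity (equivalently, the smallest distance to its center) is selected.
```python
import numpy as np

def preferred_group(sen_vec: np.ndarray, centres: np.ndarray) -> int:
    """SimDeg_k = Recip(VecDis(SenVec, c_k)); return the index of the most similar group."""
    dists = np.linalg.norm(centres - sen_vec, axis=1)   # VecDis(SenVec, c_k) for every k
    sim = 1.0 / (dists + 1e-12)                         # Recip(...); epsilon guards a zero distance
    return int(np.argmax(sim))                          # greatest similarity == smallest distance
```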
Step S110: query, in the preset preferred reply sentence set, the reply sentence corresponding to the user's dialogue sentence, and use the reply sentence to respond to the user's dialogue sentence.
The preferred reply sentence set is the set of reply sentences corresponding to the preferred group. In this embodiment, a set of reply sentences can be configured in advance for each cluster group; each set contains a predetermined number of reply sentences, and each reply sentence is used to answer a certain preset specified sentence. When the dialogue robot needs to respond to the user's dialogue sentence, it can calculate the distance between the sentence vector of the user's dialogue sentence and the sentence vector of each specified sentence, determine the specified sentence for which this distance is smallest, query the reply sentence corresponding to that specified sentence in the preferred reply sentence set, and use this reply sentence to respond to the user's dialogue sentence.
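A sketch of the reply lookup of step S110; the specified sentences and their replies are hypothetical data structures, since the text only states that each cluster group has a preset set of specified sentences with corresponding replies.
```python
import numpy as np

def pick_reply(sen_vec: np.ndarray,
               specified_vecs: np.ndarray,   # sentence vectors of the preset specified sentences
               replies: list[str]) -> str:
    """Return the reply attached to the specified sentence closest to the user's sentence vector."""
    dists = np.linalg.norm(specified_vecs - sen_vec, axis=1)
    return replies[int(np.argmin(dists))]
```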
In summary, in the calculation of the sentence vectors the embodiments of the present application fully consider both the word vector of each word and the probability with which each word appears, so the features of a sentence can be characterized as a whole, which greatly improves the accuracy of the clustering results; correspondingly, the accuracy of the reply sentences the robot generates from such clustering results is also greatly improved.
Corresponding to the robot dialogue generation method described in the above embodiment, FIG. 3 shows a structural diagram of an embodiment of a robot dialogue generation device provided by an embodiment of the present application.
In this embodiment, a robot dialogue generation device may include:
a dialogue sentence collection module 301, configured to collect the user's dialogue sentence after receiving a preset dialogue instruction;
a sentence set obtaining module 302, configured to obtain a set of sentences to be processed from a preset database, the sentence set including SN sentences, SN being an integer greater than 1;
a word segmentation processing module 303, configured to perform word segmentation on each sentence in the sentence set to obtain the word set corresponding to each sentence;
a word vector query module 304, configured to query, in a preset word vector database, the word vector of each word in each word set;
a probability statistics module 305, configured to count the probability of each word in each word set appearing in the sentence set;
a sentence vector calculation module 306, configured to calculate the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set;
a clustering processing module 307, configured to cluster the sentences according to the sentence vectors to obtain the cluster groups;
a similarity calculation module 308, configured to calculate the similarity between the user's dialogue sentence and each cluster group;
a preferred group selection module 309, configured to select the preferred group from the cluster groups, the preferred group being the cluster group with the greatest similarity to the user's dialogue sentence;
a dialogue response module 310, configured to query, in a preset preferred reply sentence set, the reply sentence corresponding to the user's dialogue sentence, and to use the reply sentence to respond to the user's dialogue sentence, the preferred reply sentence set being the set of reply sentences corresponding to the preferred group.
Further, the clustering processing module may include:
an initialization unit, configured to initialize the cluster center set;
an update unit, configured to perform the g-th update of the cluster center set;
a convergence judgment unit, configured to determine whether the cluster center set meets a preset convergence condition;
a counting unit, configured to increase g by one counting unit if the cluster center set does not meet the convergence condition;
a clustering processing unit, configured to cluster the sentences according to the cluster center set after the g-th update to obtain the cluster groups if the cluster center set meets the convergence condition.
Further, the convergence judgment unit may include:
a first calculation subunit, configured to calculate the g-th update distance of each cluster center;
a second calculation subunit, configured to calculate the g-th maximum update distance of the cluster center set;
a convergence judgment subunit, configured to determine whether the cluster center set meets the convergence condition.
Further, the robot dialogue generation device may also include:
a matrix construction module, configured to assemble the sentence vectors of the sentences into the sentence matrix of the sentence set;
a principal component calculation module, configured to calculate the principal component of the sentence matrix;
a vector update module, configured to update the sentence vector of each sentence.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the devices, modules and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
Fig. 4 shows a schematic block diagram of a robot provided by an embodiment of the present application. For ease of description, only the parts related to the embodiment of the present application are shown.
In this embodiment, the robot 4 may include a processor 40, a memory 41, and computer-readable instructions 42 stored in the memory 41 and executable on the processor 40, for example computer-readable instructions for executing the robot dialogue generating method described above. When the processor 40 executes the computer-readable instructions 42, the steps in the above embodiments of the robot dialogue generating method are implemented.
Exemplarily, the computer-readable instructions 42 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 41 and executed by the processor 40 to complete the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 42 in the robot 4.
Persons of ordinary skill in the art can understand that all or part of the processes in the above-mentioned embodiments may be implemented by computer-readable instructions instructing relevant hardware. The computer-readable instructions may be stored in a computer non-volatile readable storage medium, and when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above-mentioned embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. A robot dialogue generating method, characterized in that it is applied to a preset robot, the method comprising:
    after receiving a preset dialogue instruction, collecting a dialogue sentence of a user;
    obtaining a sentence set to be processed from a preset database, the sentence set comprising SN sentences, SN being an integer greater than 1;
    performing word segmentation on each sentence in the sentence set to obtain a word set corresponding to each sentence;
    querying, in a preset word vector database, the word vector of each word in each word set;
    counting the probability of each word in each word set appearing in the sentence set;
    calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set;
    clustering the sentences according to the sentence vectors to obtain cluster groups;
    calculating the similarity between the user's dialogue sentence and each cluster group;
    selecting a preferred group from the cluster groups, the preferred group being the cluster group with the greatest similarity to the user's dialogue sentence;
    querying, in a preset preferred reply sentence set, a reply sentence corresponding to the user's dialogue sentence, and responding to the user's dialogue sentence with the reply sentence, the preferred reply sentence set being the reply sentence set corresponding to the preferred group.
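As an illustration of how the last three steps of claim 1 fit together, the following sketch scores the user's dialogue sentence against each cluster group, selects the preferred group, and hands back the corresponding reply sentence set. Cosine similarity against a per-group centroid and the dictionary layouts are assumptions; the claim only requires a similarity measure and a preset preferred reply sentence set.

import numpy as np

def cosine(a, b):
    # Cosine similarity; the claim leaves the similarity measure open, so this is an assumed choice.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def answer(user_vector, group_centroids, preferred_reply_sets):
    # group_centroids: {group_id: centroid vector of the cluster group} (assumed representation)
    # preferred_reply_sets: {group_id: reply sentence set for that group} (assumed representation)
    similarities = {gid: cosine(user_vector, c) for gid, c in group_centroids.items()}
    preferred_group = max(similarities, key=similarities.get)  # group with greatest similarity
    # The reply corresponding to the user's dialogue sentence is then looked up in the
    # preferred reply sentence set; how that set is keyed is left open by the claim.
    return preferred_group, preferred_reply_sets[preferred_group]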
  2. The robot dialogue generating method according to claim 1, wherein calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set comprises:
    calculating the sentence vector of each sentence according to the following formula:
    v_s = (1/WN_s) · Σ_{w=1}^{WN_s} [const/(const + p(w))] · v_w
    where s is the index of each sentence in the sentence set, 1 ≤ s ≤ SN, 1 ≤ w ≤ WN_s, WN_s is the number of words in the s-th sentence, const is a preset constant, p(w) is the probability of the w-th word of the s-th sentence appearing in the sentence set, v_w is the word vector of the w-th word of the s-th sentence, and v_s is the sentence vector of the s-th sentence.
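A minimal sketch of the weighted averaging of claim 2, under the formula as reconstructed above (the published claim gives the formula only as an image):

import numpy as np

def sentence_vector(words, word_vectors, p, const=1e-3):
    # words: segmented words of one sentence; word_vectors: {word: vector} from the word
    # vector database; p: {word: probability in the sentence set}.
    # Each word vector is weighted by const / (const + p(w)) and the result is averaged
    # over the words of the sentence; the default value of const is an assumption.
    weighted = [const / (const + p[w]) * np.asarray(word_vectors[w])
                for w in words if w in word_vectors and w in p]
    return np.mean(weighted, axis=0) if weighted else None

In practice the probabilities p would come from the probability statistics step of claim 1.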
  3. The robot dialogue generating method according to claim 1, wherein clustering the sentences according to the sentence vectors to obtain cluster groups comprises:
    initializing a cluster center set as follows:
    Centre(0) = {c_1(0), c_2(0), ..., c_k(0), ..., c_KN(0)}
    where k is the index of each cluster center, 1 ≤ k ≤ KN, KN is the preset number of cluster centers, 1 < KN < SN, c_k(0) is the initialization vector of the k-th cluster center and satisfies c_k(0)^T·c_k(0) = 1, T denotes transposition, and Centre(0) is the initialized cluster center set;
    performing the g-th update on the cluster center set according to the following formula:
    [formula PCTCN2019116630-appb-100002: the g-th update of the cluster center vectors c_k(g)]
    where
    [formula PCTCN2019116630-appb-100003: the argmax-based term used in the update]
    argmax is the maximum-argument function, g ≥ 1, and c_k(g) is the vector of the k-th cluster center after the g-th update;
    judging whether the cluster center set satisfies a preset convergence condition;
    if the cluster center set does not satisfy the convergence condition, increasing g by one counting unit and returning to the step of performing the g-th update on the cluster center set;
    if the cluster center set satisfies the convergence condition, clustering the sentences according to the cluster center set after the g-th update to obtain the cluster groups.
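The g-th update formula of claim 3 is published only as an image, so the sketch below should be read as one plausible realisation that is consistent with the stated constraints, namely unit-norm centers (c_k(0)^T·c_k(0) = 1) and an argmax-based assignment: a spherical k-means style update. The stopping test is the convergence condition of claim 4.

import numpy as np

def update_centers(sentence_vectors, centers):
    # One g-th update. Each sentence vector is assigned to the center that maximises the
    # dot product (the argmax step named in the claim); each center is then re-estimated
    # as the normalised sum of its assigned vectors so that it stays unit-norm.
    # Spherical k-means is an assumption; the exact formula is not recoverable from the text.
    V = np.asarray(sentence_vectors, dtype=float)  # shape (SN, dim)
    C = np.asarray(centers, dtype=float)           # shape (KN, dim), rows unit-norm
    assignment = np.argmax(V @ C.T, axis=1)
    new_C = C.copy()
    for k in range(C.shape[0]):
        members = V[assignment == k]
        if len(members) > 0:
            total = members.sum(axis=0)
            new_C[k] = total / (np.linalg.norm(total) + 1e-12)
    return new_C, assignment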
  4. The robot dialogue generating method according to claim 3, wherein judging whether the cluster center set satisfies the preset convergence condition comprises:
    calculating the g-th update distance of each cluster center according to the following formula:
    UpGdDis_{k,g} = VecDis(c_k(g), c_k(g-1))
    where VecDis is a function giving the distance between two vectors, and UpGdDis_{k,g} is the g-th update distance of the k-th cluster center;
    calculating the g-th maximum update distance of the cluster center set according to the following formula:
    MaxDis_g = Max(UpGdDis_{0,g}, UpGdDis_{1,g}, ..., UpGdDis_{k,g}, ..., UpGdDis_{KN,g})
    where Max is the maximum function and MaxDis_g is the g-th maximum update distance of the cluster center set;
    judging whether the cluster center set satisfies the following convergence condition:
    MaxDis_g < Thresh
    where Thresh is a preset distance threshold.
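A short sketch of the convergence test of claim 4, with Euclidean distance standing in for VecDis, which the claim leaves unspecified:

import numpy as np

def satisfies_convergence(old_centers, new_centers, thresh):
    # UpGdDis_{k,g} = VecDis(c_k(g), c_k(g-1)); MaxDis_g is their maximum; the set has
    # converged when MaxDis_g < Thresh. Euclidean distance is an assumed choice for VecDis.
    update_distances = np.linalg.norm(np.asarray(new_centers) - np.asarray(old_centers), axis=1)
    return float(update_distances.max()) < thresh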
  5. The robot dialogue generating method according to any one of claims 1 to 4, further comprising, after calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set:
    constructing the sentence vectors of the sentences into a sentence matrix of the sentence set;
    calculating the principal component of the sentence matrix;
    updating the sentence vector of each sentence according to the following formula:
    v_s = v_s - u·u^T·v_s
    where u is the principal component of the sentence matrix.
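Claim 5 removes the principal component u of the sentence matrix from every sentence vector. A sketch using the leading singular vector of the uncentred sentence matrix as u:

import numpy as np

def remove_principal_component(sentence_vectors):
    # Sentence matrix: one sentence vector per row. The leading right singular vector is
    # taken as the principal component u (an assumed choice), and every row is updated as
    # v_s - u (u^T v_s), matching the update formula in the claim.
    X = np.asarray(sentence_vectors, dtype=float)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u = vt[0]
    return X - np.outer(X @ u, u)

This mirrors the matrix construction, principal component calculation and vector update modules of the device claims.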
  6. A robot dialogue generating device, characterized in that it comprises:
    a dialogue sentence collection module, configured to collect a dialogue sentence of a user after receiving a preset dialogue instruction;
    a sentence set acquisition module, configured to obtain a sentence set to be processed from a preset database, the sentence set comprising SN sentences, SN being an integer greater than 1;
    a word segmentation processing module, configured to perform word segmentation on each sentence in the sentence set to obtain a word set corresponding to each sentence;
    a word vector query module, configured to query, in a preset word vector database, the word vector of each word in each word set;
    a probability statistics module, configured to count the probability of each word in each word set appearing in the sentence set;
    a sentence vector calculation module, configured to calculate the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set;
    a clustering processing module, configured to cluster the sentences according to the sentence vectors to obtain cluster groups;
    a similarity calculation module, configured to calculate the similarity between the user's dialogue sentence and each cluster group;
    a preferred group selection module, configured to select a preferred group from the cluster groups, the preferred group being the cluster group with the greatest similarity to the user's dialogue sentence;
    a dialogue response module, configured to query, in a preset preferred reply sentence set, a reply sentence corresponding to the user's dialogue sentence, and to respond to the user's dialogue sentence with the reply sentence, the preferred reply sentence set being the reply sentence set corresponding to the preferred group.
  7. The robot dialogue generating device according to claim 6, wherein the sentence vector calculation module is specifically configured to calculate the sentence vector of each sentence according to the following formula:
    v_s = (1/WN_s) · Σ_{w=1}^{WN_s} [const/(const + p(w))] · v_w
    where s is the index of each sentence in the sentence set, 1 ≤ s ≤ SN, 1 ≤ w ≤ WN_s, WN_s is the number of words in the s-th sentence, const is a preset constant, p(w) is the probability of the w-th word of the s-th sentence appearing in the sentence set, v_w is the word vector of the w-th word of the s-th sentence, and v_s is the sentence vector of the s-th sentence.
  8. The robot dialogue generating device according to claim 6, wherein the clustering processing module comprises:
    an initialization unit, configured to initialize a cluster center set as follows:
    Centre(0) = {c_1(0), c_2(0), ..., c_k(0), ..., c_KN(0)}
    where k is the index of each cluster center, 1 ≤ k ≤ KN, KN is the preset number of cluster centers, 1 < KN < SN, c_k(0) is the initialization vector of the k-th cluster center and satisfies c_k(0)^T·c_k(0) = 1, T denotes transposition, and Centre(0) is the initialized cluster center set;
    an update unit, configured to perform the g-th update on the cluster center set according to the following formula:
    [formula PCTCN2019116630-appb-100005: the g-th update of the cluster center vectors c_k(g)]
    where
    [formula PCTCN2019116630-appb-100006: the argmax-based term used in the update]
    argmax is the maximum-argument function, g ≥ 1, and c_k(g) is the vector of the k-th cluster center after the g-th update;
    a convergence judging unit, configured to judge whether the cluster center set satisfies a preset convergence condition;
    a counting unit, configured to increase g by one counting unit if the cluster center set does not satisfy the convergence condition;
    a clustering processing unit, configured to, if the cluster center set satisfies the convergence condition, cluster the sentences according to the cluster center set after the g-th update to obtain the cluster groups.
  9. The robot dialogue generating device according to claim 8, wherein the convergence judging unit comprises:
    a first calculation subunit, configured to calculate the g-th update distance of each cluster center according to the following formula:
    UpGdDis_{k,g} = VecDis(c_k(g), c_k(g-1))
    where VecDis is a function giving the distance between two vectors, and UpGdDis_{k,g} is the g-th update distance of the k-th cluster center;
    a second calculation subunit, configured to calculate the g-th maximum update distance of the cluster center set according to the following formula:
    MaxDis_g = Max(UpGdDis_{0,g}, UpGdDis_{1,g}, ..., UpGdDis_{k,g}, ..., UpGdDis_{KN,g})
    where Max is the maximum function and MaxDis_g is the g-th maximum update distance of the cluster center set;
    a convergence judging subunit, configured to judge whether the cluster center set satisfies the following convergence condition:
    MaxDis_g < Thresh
    where Thresh is a preset distance threshold.
  10. The robot dialogue generating device according to any one of claims 6 to 9, further comprising:
    a matrix construction module, configured to construct the sentence vectors of the sentences into a sentence matrix of the sentence set;
    a principal component calculation module, configured to calculate the principal component of the sentence matrix;
    a vector update module, configured to update the sentence vector of each sentence according to the following formula:
    v_s = v_s - u·u^T·v_s
    where u is the principal component of the sentence matrix.
  11. A computer non-volatile readable storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by a processor, implement the following steps:
    after receiving a preset dialogue instruction, collecting a dialogue sentence of a user;
    obtaining a sentence set to be processed from a preset database, the sentence set comprising SN sentences, SN being an integer greater than 1;
    performing word segmentation on each sentence in the sentence set to obtain a word set corresponding to each sentence;
    querying, in a preset word vector database, the word vector of each word in each word set;
    counting the probability of each word in each word set appearing in the sentence set;
    calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set;
    clustering the sentences according to the sentence vectors to obtain cluster groups;
    calculating the similarity between the user's dialogue sentence and each cluster group;
    selecting a preferred group from the cluster groups, the preferred group being the cluster group with the greatest similarity to the user's dialogue sentence;
    querying, in a preset preferred reply sentence set, a reply sentence corresponding to the user's dialogue sentence, and responding to the user's dialogue sentence with the reply sentence, the preferred reply sentence set being the reply sentence set corresponding to the preferred group.
  12. The computer non-volatile readable storage medium according to claim 11, wherein calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set comprises:
    calculating the sentence vector of each sentence according to the following formula:
    v_s = (1/WN_s) · Σ_{w=1}^{WN_s} [const/(const + p(w))] · v_w
    where s is the index of each sentence in the sentence set, 1 ≤ s ≤ SN, 1 ≤ w ≤ WN_s, WN_s is the number of words in the s-th sentence, const is a preset constant, p(w) is the probability of the w-th word of the s-th sentence appearing in the sentence set, v_w is the word vector of the w-th word of the s-th sentence, and v_s is the sentence vector of the s-th sentence.
  13. The computer non-volatile readable storage medium according to claim 11, wherein clustering the sentences according to the sentence vectors to obtain cluster groups comprises:
    initializing a cluster center set as follows:
    Centre(0) = {c_1(0), c_2(0), ..., c_k(0), ..., c_KN(0)}
    where k is the index of each cluster center, 1 ≤ k ≤ KN, KN is the preset number of cluster centers, 1 < KN < SN, c_k(0) is the initialization vector of the k-th cluster center and satisfies c_k(0)^T·c_k(0) = 1, T denotes transposition, and Centre(0) is the initialized cluster center set;
    performing the g-th update on the cluster center set according to the following formula:
    [formula PCTCN2019116630-appb-100008: the g-th update of the cluster center vectors c_k(g)]
    where
    [formula PCTCN2019116630-appb-100009: the argmax-based term used in the update]
    argmax is the maximum-argument function, g ≥ 1, and c_k(g) is the vector of the k-th cluster center after the g-th update;
    judging whether the cluster center set satisfies a preset convergence condition;
    if the cluster center set does not satisfy the convergence condition, increasing g by one counting unit and returning to the step of performing the g-th update on the cluster center set;
    if the cluster center set satisfies the convergence condition, clustering the sentences according to the cluster center set after the g-th update to obtain the cluster groups.
  14. The computer non-volatile readable storage medium according to claim 13, wherein judging whether the cluster center set satisfies the preset convergence condition comprises:
    calculating the g-th update distance of each cluster center according to the following formula:
    UpGdDis_{k,g} = VecDis(c_k(g), c_k(g-1))
    where VecDis is a function giving the distance between two vectors, and UpGdDis_{k,g} is the g-th update distance of the k-th cluster center;
    calculating the g-th maximum update distance of the cluster center set according to the following formula:
    MaxDis_g = Max(UpGdDis_{0,g}, UpGdDis_{1,g}, ..., UpGdDis_{k,g}, ..., UpGdDis_{KN,g})
    where Max is the maximum function and MaxDis_g is the g-th maximum update distance of the cluster center set;
    judging whether the cluster center set satisfies the following convergence condition:
    MaxDis_g < Thresh
    where Thresh is a preset distance threshold.
  15. The computer non-volatile readable storage medium according to any one of claims 11 to 14, wherein the steps further comprise, after calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set:
    constructing the sentence vectors of the sentences into a sentence matrix of the sentence set;
    calculating the principal component of the sentence matrix;
    updating the sentence vector of each sentence according to the following formula:
    v_s = v_s - u·u^T·v_s
    where u is the principal component of the sentence matrix.
  16. A robot, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor, when executing the computer-readable instructions, implements the following steps:
    after receiving a preset dialogue instruction, collecting a dialogue sentence of a user;
    obtaining a sentence set to be processed from a preset database, the sentence set comprising SN sentences, SN being an integer greater than 1;
    performing word segmentation on each sentence in the sentence set to obtain a word set corresponding to each sentence;
    querying, in a preset word vector database, the word vector of each word in each word set;
    counting the probability of each word in each word set appearing in the sentence set;
    calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set;
    clustering the sentences according to the sentence vectors to obtain cluster groups;
    calculating the similarity between the user's dialogue sentence and each cluster group;
    selecting a preferred group from the cluster groups, the preferred group being the cluster group with the greatest similarity to the user's dialogue sentence;
    querying, in a preset preferred reply sentence set, a reply sentence corresponding to the user's dialogue sentence, and responding to the user's dialogue sentence with the reply sentence, the preferred reply sentence set being the reply sentence set corresponding to the preferred group.
  17. The robot according to claim 16, wherein calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set comprises:
    calculating the sentence vector of each sentence according to the following formula:
    v_s = (1/WN_s) · Σ_{w=1}^{WN_s} [const/(const + p(w))] · v_w
    where s is the index of each sentence in the sentence set, 1 ≤ s ≤ SN, 1 ≤ w ≤ WN_s, WN_s is the number of words in the s-th sentence, const is a preset constant, p(w) is the probability of the w-th word of the s-th sentence appearing in the sentence set, v_w is the word vector of the w-th word of the s-th sentence, and v_s is the sentence vector of the s-th sentence.
  18. The robot according to claim 16, wherein clustering the sentences according to the sentence vectors to obtain cluster groups comprises:
    initializing a cluster center set as follows:
    Centre(0) = {c_1(0), c_2(0), ..., c_k(0), ..., c_KN(0)}
    where k is the index of each cluster center, 1 ≤ k ≤ KN, KN is the preset number of cluster centers, 1 < KN < SN, c_k(0) is the initialization vector of the k-th cluster center and satisfies c_k(0)^T·c_k(0) = 1, T denotes transposition, and Centre(0) is the initialized cluster center set;
    performing the g-th update on the cluster center set according to the following formula:
    [formula PCTCN2019116630-appb-100011: the g-th update of the cluster center vectors c_k(g)]
    where
    [formula PCTCN2019116630-appb-100012: the argmax-based term used in the update]
    argmax is the maximum-argument function, g ≥ 1, and c_k(g) is the vector of the k-th cluster center after the g-th update;
    judging whether the cluster center set satisfies a preset convergence condition;
    if the cluster center set does not satisfy the convergence condition, increasing g by one counting unit and returning to the step of performing the g-th update on the cluster center set;
    if the cluster center set satisfies the convergence condition, clustering the sentences according to the cluster center set after the g-th update to obtain the cluster groups.
  19. The robot according to claim 18, wherein judging whether the cluster center set satisfies the preset convergence condition comprises:
    calculating the g-th update distance of each cluster center according to the following formula:
    UpGdDis_{k,g} = VecDis(c_k(g), c_k(g-1))
    where VecDis is a function giving the distance between two vectors, and UpGdDis_{k,g} is the g-th update distance of the k-th cluster center;
    calculating the g-th maximum update distance of the cluster center set according to the following formula:
    MaxDis_g = Max(UpGdDis_{0,g}, UpGdDis_{1,g}, ..., UpGdDis_{k,g}, ..., UpGdDis_{KN,g})
    where Max is the maximum function and MaxDis_g is the g-th maximum update distance of the cluster center set;
    judging whether the cluster center set satisfies the following convergence condition:
    MaxDis_g < Thresh
    where Thresh is a preset distance threshold.
  20. The robot according to any one of claims 16 to 19, wherein the steps further comprise, after calculating the sentence vector of each sentence according to the word vector of each word and the probability of each word appearing in the sentence set:
    constructing the sentence vectors of the sentences into a sentence matrix of the sentence set;
    calculating the principal component of the sentence matrix;
    updating the sentence vector of each sentence according to the following formula:
    v_s = v_s - u·u^T·v_s
    where u is the principal component of the sentence matrix.
PCT/CN2019/116630 2019-09-18 2019-11-08 Robot dialogue generating method and apparatus, readable storage medium, and robot WO2021051508A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910880859.9A CN110750629A (en) 2019-09-18 2019-09-18 Robot dialogue generation method and device, readable storage medium and robot
CN201910880859.9 2019-09-18

Publications (1)

Publication Number Publication Date
WO2021051508A1 true WO2021051508A1 (en) 2021-03-25

Family

ID=69276649

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116630 WO2021051508A1 (en) 2019-09-18 2019-11-08 Robot dialogue generating method and apparatus, readable storage medium, and robot

Country Status (2)

Country Link
CN (1) CN110750629A (en)
WO (1) WO2021051508A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694941B (en) * 2020-05-22 2024-01-05 腾讯科技(深圳)有限公司 Reply information determining method and device, storage medium and electronic equipment
CN111858891A (en) * 2020-07-23 2020-10-30 平安科技(深圳)有限公司 Question-answer library construction method and device, electronic equipment and storage medium
CN112100677B (en) * 2020-11-13 2021-02-05 支付宝(杭州)信息技术有限公司 Privacy data protection method and device and electronic equipment
CN112328776A (en) * 2021-01-04 2021-02-05 北京百度网讯科技有限公司 Dialog generation method and device, electronic equipment and storage medium
CN113239668B (en) * 2021-05-31 2023-06-23 平安科技(深圳)有限公司 Keyword intelligent extraction method and device, computer equipment and storage medium
CN113723115B (en) * 2021-09-30 2024-02-09 平安科技(深圳)有限公司 Open domain question-answer prediction method based on pre-training model and related equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017037602A (en) * 2015-08-14 2017-02-16 Psソリューションズ株式会社 Dialog interface
CN106844587A (en) * 2017-01-11 2017-06-13 北京光年无限科技有限公司 A kind of data processing method and device for talking with interactive system
CN108345672A (en) * 2018-02-09 2018-07-31 平安科技(深圳)有限公司 Intelligent response method, electronic device and storage medium
CN109102809A (en) * 2018-06-22 2018-12-28 北京光年无限科技有限公司 A kind of dialogue method and system for intelligent robot
CN109766437A (en) * 2018-12-07 2019-05-17 中科恒运股份有限公司 A kind of Text Clustering Method, text cluster device and terminal device
CN109885664A (en) * 2019-01-08 2019-06-14 厦门快商通信息咨询有限公司 A kind of Intelligent dialogue method, robot conversational system, server and storage medium
CN109977207A (en) * 2019-03-21 2019-07-05 网易(杭州)网络有限公司 Talk with generation method, dialogue generating means, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810218B (en) * 2012-11-14 2018-06-08 北京百度网讯科技有限公司 A kind of automatic question-answering method and device based on problem cluster
CN105955965A (en) * 2016-06-21 2016-09-21 上海智臻智能网络科技股份有限公司 Question information processing method and device
JP6709748B2 (en) * 2017-04-13 2020-06-17 日本電信電話株式会社 Clustering device, answer candidate generation device, method, and program
CN110008465B (en) * 2019-01-25 2023-05-12 网经科技(苏州)有限公司 Method for measuring semantic distance of sentence
CN110096580B (en) * 2019-04-24 2022-05-24 北京百度网讯科技有限公司 FAQ conversation method and device and electronic equipment


Also Published As

Publication number Publication date
CN110750629A (en) 2020-02-04

Similar Documents

Publication Publication Date Title
WO2021051508A1 (en) Robot dialogue generating method and apparatus, readable storage medium, and robot
WO2020143844A1 (en) Intent analysis method and apparatus, display terminal, and computer readable storage medium
US20240202446A1 (en) Method for training keyword extraction model, keyword extraction method, and computer device
US11093854B2 (en) Emoji recommendation method and device thereof
WO2020042925A1 (en) Man-machine conversation method and apparatus, electronic device, and computer readable medium
US11295090B2 (en) Multi-scale model for semantic matching
Geng et al. Facial age estimation by learning from label distributions
WO2020232877A1 (en) Question answer selection method and apparatus, computer device, and storage medium
CN105183833B (en) Microblog text recommendation method and device based on user model
WO2020082560A1 (en) Method, apparatus and device for extracting text keyword, as well as computer readable storage medium
JP2021532499A (en) Machine learning-based medical data classification methods, devices, computer devices and storage media
US20170150235A1 (en) Jointly Modeling Embedding and Translation to Bridge Video and Language
US20020194158A1 (en) System and method for context-dependent probabilistic modeling of words and documents
CN110222560B (en) Text person searching method embedded with similarity loss function
CN107832326B (en) Natural language question-answering method based on deep convolutional neural network
US11461613B2 (en) Method and apparatus for multi-document question answering
Li et al. A general framework for association analysis of heterogeneous data
CN107992477A (en) Text subject determines method, apparatus and electronic equipment
JP2022063250A (en) Super loss: general loss for robust curriculum learning
US20170193197A1 (en) System and method for automatic unstructured data analysis from medical records
CN110795542A (en) Dialogue method and related device and equipment
CN110705247A (en) Based on x2-C text similarity calculation method
Peng et al. Learning on probabilistic labels
Han et al. Conditional word embedding and hypothesis testing via bayes-by-backprop
WO2021253938A1 (en) Neural network training method and apparatus, and video recognition method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19945647

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19945647

Country of ref document: EP

Kind code of ref document: A1