WO2020191828A1 - Graph-based context-related reply generation method, computer and medium - Google Patents

Graph-based context-related reply generation method, computer and medium Download PDF

Info

Publication number
WO2020191828A1
Authority
WO
WIPO (PCT)
Prior art keywords
dialogue information
context
graph
information
subgraph
Prior art date
Application number
PCT/CN2019/082913
Other languages
English (en)
French (fr)
Inventor
邱楠
宋亚楠
严汉明
梁剑华
邹创华
邓婧文
Original Assignee
深圳狗尾草智能科技有限公司
深圳琥珀虚颜智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳狗尾草智能科技有限公司, 深圳琥珀虚颜智能科技有限公司
Publication of WO2020191828A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation

Definitions

  • the invention belongs to the technical field of human-computer interaction, and specifically relates to a graph-based context correlation response generation method, computer and medium.
  • a graph is a commonly used data structure or storage method in computer systems. If, in human-computer interaction, the content of the machine's reply is related to the content of the previous n rounds of human-computer interaction, then the machine's reply is context-related.
  • the present invention provides a graph-based context correlation response generation method, computer and medium, which have the ability of context analysis and memory, and are flexible in response.
  • a graph-based context correlation response generation method includes the following steps:
  • the generating of the contextual subgraph according to multiple rounds of dialogue information specifically includes:
  • if the context subgraph is not empty, the triples of the previous round of dialogue information and the extracted entities are added to the context subgraph, and the context subgraph is updated.
  • the method further includes:
  • the incremental set includes entities that exist in the general graph of the dialogue information but do not exist in the context subgraph ;
  • the entities in the incremental set are added to the context subgraph, and the context subgraph is updated.
  • said segmenting the dialogue information and extracting triples specifically includes:
  • the method further includes:
  • the method further includes:
  • the triples and their corresponding entities whose automatic destruction time arrives are stored in a preset long-term memory storage area.
  • the generating reply information according to the intersection specifically includes:
  • the initial response information whose scoring result meets the preset scoring requirements is extracted, and the response information is generated.
  • in a second aspect, a computer includes a processor, an input device, an output device, and a memory, where the processor, input device, output device, and memory are connected to each other, the memory is used to store a computer program including program instructions, and the processor is configured to call the program instructions to execute the method described in the first aspect.
  • in a third aspect, a computer-readable storage medium stores a computer program, and the computer program includes program instructions that, when executed by a processor, cause the processor to perform the method described in the first aspect.
  • the graph-based contextual reply generation method, computer, and medium provided by the present invention can shift topics substantially without triggering context-related replies, and have the functions of context analysis and memory.
  • the reply topic can constantly wander between different entities and topics, constantly load old and new topics into memory, and reply flexibly.
  • Fig. 1 is a flowchart of a method for generating a context-related reply provided in the first embodiment.
  • Fig. 2 is a flowchart of a method for constructing a context subgraph provided in the second embodiment.
  • Fig. 3 is a block diagram of a computer module provided in the fourth embodiment.
  • the term "if" can be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context.
  • the phrase "if it is determined" or "if [the described condition or event] is detected" can be interpreted, depending on the context, as meaning "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
  • a graph-based context correlation response generation method includes the following steps:
  • the context subgraph contains the topics and entities of multiple rounds of historical dialogue information, and can reflect the topics and related entities that the user paid attention to during several interactions. If this multi-round dialogue is the first dialogue, there is no historical record of multi-round dialogue information, and the context subgraph defaults to empty or to a subgraph of the relevant domain.
  • the context subgraph can be initially set as a subgraph related to the core topics of the specific domain, so that the context subgraph covers most of the core topics in that domain; in this way, during the initial conversation the user can conduct multiple rounds of dialogue using this subgraph.
  • S2 Receive dialogue information input by the user, and perform word segmentation on the dialogue information to obtain a word segmentation entity;
  • the core information includes main information such as subject, predicate and object.
  • the general graph of the dialogue information can completely reflect the topics and entities of the dialogue information.
  • a general graph can be constructed according to grammatical rules.
  • the user enters "Li Bai is a great romantic poet in the Tang Dynasty, and is hailed as a "poetry fairy" by later generations, and is called “Li Du” together with Du Fu”; for the dialogue information, the following multiple triples are extracted, of which triples
  • the empty content in the group is represented by a blank space: [ ⁇ ,, ⁇ ], [ ⁇ , ⁇ ,Romantic poet], [ ⁇ , ⁇ , ⁇ ], [ ⁇ ,, ⁇ ], [ ⁇ ], and Said, Li Du], to establish a general picture of this sentence.
  • the core entity in the general picture is Li Bai, which includes entities such as "Poets in Tang Dynasty", “Romantic Poets", “Du Fu", and "Li Du”.
  • the intersection includes the set of entities that exist in both the general graph and the context subgraph.
  • the context subgraph concerns topics about a person named Zhou, such as zodiac sign, occupation, hobbies, height, and weight. If the user's message in this dialogue is "What is Zhou's zodiac sign", then because the zodiac sign is an entity that exists in both the general graph and the context subgraph, the intersection of the two includes the zodiac sign, and reply information is generated from the entity related to "zodiac sign" in the context subgraph (for example, Aries), such as the reply "Zhou is an Aries".
  • This method can largely shift the topic without triggering relevant contextual answers, and has the functions of context analysis and memory.
  • the reply topic can constantly wander between different entities and topics, constantly load old and new topics into memory, and reply flexibly.
  • the generating of the context sub-graph based on multiple rounds of dialogue information specifically includes:
  • when the user conducts each round of dialogue, the dialogue information of that round is segmented and the triples of that dialogue information are extracted. After the current conversation ends, the context subgraph is updated with the triples from the ended conversation information.
  • an entity directly connected in the knowledge graph to the target entity (that is, to a triple of the previous round of dialogue information) is within one hop of the target entity, i.e., one-hop reachable. An entity connected to the target entity through one intermediate entity is two hops away from the target entity. For example, in the example above, if the target entity is "occupation", an entity reachable in one hop is "singer".
  • the one-hop reachable entities include the reply the user expects, information related to the user's input, the question the user may ask or input next, and other entities closely related to the target entity. If "Zhou" is the target entity, the entities reachable in one hop include occupation, hobbies, zodiac sign, and so on.
  • the general knowledge graph contains the topics or entities involved in common human-computer interaction content. Extracting from the general knowledge graph the entities that are one-hop reachable from the target entity makes it possible to predict well the reply the user expects, the information related to the user's input, and the question the user may ask or input next.
  • the context subgraph does not yet exist or the context subgraph is empty.
  • a contextual sub-graph is generated based on the content of the first interaction.
  • the context subgraph contains the user's multiple interaction records, that is, the context sub-graph includes all entities and topics in the user's dialogue process.
  • the method further includes:
  • the incremental set includes entities that exist in the general graph of the dialogue information but do not exist in the context subgraph ;
  • the entities in the incremental set are added to the context subgraph, and the context subgraph is updated.
  • the incremental set reflects the newly added entities or topics in the dialogue information.
  • the context subgraph can be updated according to the incremental set of each round of dialogue.
  • Said word segmentation of dialogue information and extraction of triples specifically include:
  • the word segmentation tool may be a jieba word segmentation tool.
  • Part-of-speech tagging is a text data processing technique in corpus linguistics that marks the part of speech of the words in a corpus according to their meaning and context. Part-of-speech tagging can be done by specific algorithms; common part-of-speech tagging algorithms include the Hidden Markov Model (HMM), Conditional Random Fields (CRFs), etc. In this method, part-of-speech tagging is mainly used to mark the core information.
  • the method further includes:
  • the automatic destruction time of the triplet is set to simulate the forgetting function of the person.
  • the automatic destruction time can be set to 1 minute, which means that each group of triples only exists for 1 minute.
  • a context subgraph can be formed from all the triples of the dialogue information within the last minute, while the triples of dialogue information older than one minute and their related entities are deleted, which means the brain only remembers the most recent (within 1 minute) data.
  • the method further includes:
  • the triples and their corresponding entities whose automatic destruction time arrives are stored in a preset long-term memory storage area.
  • a timeout deletion set can be constructed based on the triples whose automatic destruction time has arrived.
  • the timeout deletion set represents the entities that are deleted during the timeout in the user conversation.
  • the timeout deletion set is stored separately in the long-term memory storage area to simulate human long-term memory and the subconscious, and represents the topics or entities that the user interacted with over a long period in the past.
  • the generating reply information according to the intersection specifically includes:
  • the initial response information whose scoring result meets the preset scoring requirements is extracted, and the response information is generated.
  • the method uses intersection as the dialogue information to generate facts, and generates replies according to the rule template.
  • the rule template is predefined, and the rule template can also be continuously optimized to improve the naturalness of the generated expression. Since multiple pieces of initial response information can often be generated based on the intersection, this method can score the initial response information to select the most appropriate one as the final response information. For example, a ranking algorithm is used for sorting; after sorting, the top-ranked answer is the most suitable answer.
  • a computer see FIG. 3, includes a processor 801, an input device 802, an output device 803, and a memory 804.
  • the processor 801, input device 802, output device 803, and memory 804 are connected to each other via a bus 805, wherein the
  • the memory 804 is configured to store a computer program, the computer program including program instructions, and the processor 801 is configured to call the program instructions to execute the above-mentioned method.
  • the processor 801 may be a central processing unit (Central Processing Unit, CPU), and the processor may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the input device 802 may include a touch panel, a fingerprint sensor (used to collect user fingerprint information and fingerprint orientation information), a microphone, etc.
  • the output device 803 may include a display (LCD, etc.), a speaker, etc.
  • the memory 804 may include a read-only memory and a random access memory, and provides instructions and data to the processor 801. A part of the memory 804 may also include a non-volatile random access memory. For example, the memory 804 may also store device type information.
  • a computer-readable storage medium stores a computer program
  • the computer program includes program instructions, and when executed by a processor, the program instructions cause the processor to perform the above-mentioned method.
  • the computer-readable storage medium may be the internal storage unit of the terminal described in any of the foregoing embodiments, such as the hard disk or memory of the terminal.
  • the computer-readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk equipped on the terminal, a Smart Media Card (SMC), or a Secure Digital (SD) card , Flash Card, etc.
  • the computer-readable storage medium may also include both an internal storage unit of the terminal and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the terminal.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
  • the medium provided in the embodiment of the present invention is described only briefly; for the parts not mentioned in the embodiment, please refer to the corresponding content in the foregoing method embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A graph-based context-related reply generation method, computer and medium. When historical multi-round dialogue information exists, the method generates a context subgraph according to the multi-round dialogue information (S1); receives dialogue information input by a user and performs word segmentation on the dialogue information to obtain word-segmentation entities (S2); extracts core information of the dialogue information from the word-segmentation entities, defined as triples (S3); constructs a general graph of the dialogue information according to the triples in the dialogue information and the other word-segmentation entities (S4); compares the general graph of the dialogue information with the context subgraph to obtain the intersection of the two (S5); and generates reply information according to the intersection (S6). The method has context analysis and memory capabilities, and its replies are flexible.

Description

Graph-based context-related reply generation method, computer and medium
Technical Field
The present invention belongs to the technical field of human-computer interaction, and specifically relates to a graph-based context-related reply generation method, computer, and medium.
Background Art
A graph is a commonly used data structure or storage method in computer systems. If, in human-computer interaction, the content of the machine's reply is related to the content of the previous n rounds of human-computer interaction, the machine's reply is said to be context-related.
At present, in the field of human-computer interaction there are many methods for a machine to generate replies, for example, presetting a question-and-answer library so that the machine retrieves an answer from the library and returns it to the user, or using artificial-intelligence methods such as machine learning and deep learning to let the machine generate replies.
Although these methods have been studied in depth and widely applied, their shortcomings are notable. The former requires a sufficiently large preset question-and-answer library, and in essence there is no intelligence in the machine's reply process; the machine cannot understand the user's input. With the latter, machine-generated replies often contain grammatical errors, or the replies are highly repetitive.
Summary of the Invention
In view of the defects in the prior art, the present invention provides a graph-based context-related reply generation method, computer, and medium, which have context analysis and memory capabilities and reply flexibly.
In a first aspect, a graph-based context-related reply generation method includes the following steps:
when historical multi-round dialogue information exists, generating a context subgraph according to the multi-round dialogue information;
receiving dialogue information input by a user, and performing word segmentation on the dialogue information to obtain word-segmentation entities;
extracting core information of the dialogue information from the word-segmentation entities, defined as triples;
constructing a general graph of the dialogue information according to the triples in the dialogue information and the other word-segmentation entities;
comparing the general graph of the dialogue information with the context subgraph to obtain the intersection of the two;
generating reply information according to the intersection.
Preferably, generating the context subgraph according to the multi-round dialogue information specifically includes:
obtaining the triples extracted from the previous round of dialogue information;
extracting, from a preset general knowledge graph, entities within one hop of the entities corresponding to the triples in the previous round of dialogue information;
if the context subgraph is empty or does not exist, filling the context subgraph with the triples of the previous round of dialogue information and the extracted entities;
if the context subgraph is not empty, adding the triples of the previous round of dialogue information and the extracted entities to the context subgraph, and updating the context subgraph.
Preferably, after comparing the general graph of the dialogue information with the context subgraph to obtain the intersection of the two, the method further includes:
comparing the general graph of the dialogue information with the context subgraph to obtain the incremental set of the two, the incremental set including entities that exist in the general graph of the dialogue information but do not exist in the context subgraph;
adding the entities in the incremental set to the context subgraph, and updating the context subgraph.
Preferably, performing word segmentation on the dialogue information and extracting the triples specifically includes:
performing word segmentation on the dialogue information with a word segmentation tool to obtain the word-segmentation entities;
performing part-of-speech tagging on the word-segmentation entities, and extracting the triples.
Preferably, after generating the context subgraph, the method further includes:
setting an automatic destruction time for the triples;
when it is detected that the automatic destruction time of a triple has arrived, deleting from the context subgraph the triple whose automatic destruction time has arrived and its corresponding entities.
Preferably, after deleting from the context subgraph the triples whose automatic destruction time has arrived and their corresponding entities, the method further includes:
storing the triples whose automatic destruction time has arrived and their corresponding entities in a preset long-term memory storage area.
Preferably, generating reply information according to the intersection specifically includes:
generating multiple pieces of initial reply information according to a preset rule template and the intersection;
scoring all the initial reply information;
extracting the initial reply information whose scoring result meets a preset scoring requirement, and generating the reply information.
In a second aspect, a computer includes a processor, an input device, an output device, and a memory, which are connected to each other, wherein the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method described in the first aspect.
In a third aspect, a computer-readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the method described in the first aspect.
It can be seen from the above technical solutions that the graph-based context-related reply generation method, computer, and medium provided by the present invention can shift topics substantially without triggering context-related answers, and have context analysis and memory functions. Taking continuous corpora related to the preceding text as input, the reply topic can move continually between different entities and themes, constantly loading old and new topics into memory, and the replies are flexible.
Brief Description of the Drawings
In order to explain the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. In all drawings, similar elements or parts are generally identified by similar reference signs. In the drawings, the elements or parts are not necessarily drawn to actual scale.
Fig. 1 is a flowchart of the context-related reply generation method provided in Embodiment 1.
Fig. 2 is a flowchart of the context subgraph construction method provided in Embodiment 2.
Fig. 3 is a block diagram of the computer provided in Embodiment 4.
Detailed Description of the Embodiments
Embodiments of the technical solutions of the present invention will be described in detail below with reference to the drawings. The following embodiments are only used to illustrate the technical solutions of the present invention more clearly and are therefore only examples; they cannot be used to limit the scope of protection of the present invention. It should be noted that, unless otherwise stated, the technical or scientific terms used in this application shall have the ordinary meanings understood by those skilled in the art to which the present invention belongs.
It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as meaning "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
Embodiment 1:
A graph-based context-related reply generation method, referring to Fig. 1, includes the following steps:
S1: when historical multi-round dialogue information exists, generating a context subgraph according to the multi-round dialogue information;
Specifically, the context subgraph contains the topics and entities of multiple rounds of historical dialogue information and can reflect the topics and related entities that the user paid attention to during several interactions. If this multi-round dialogue is the first dialogue, there is no historical record of multi-round dialogue information, and the context subgraph defaults to empty or to a subgraph of the relevant domain. For example, when the user selects a specific domain, the context subgraph can initially be set to a subgraph related to the core topics of that domain, so that the context subgraph covers most of the core topics in the domain; in this way, during the initial dialogue, the user can conduct multiple rounds of conversation using this subgraph.
S2: receiving dialogue information input by the user, and performing word segmentation on the dialogue information to obtain word-segmentation entities;
S3: extracting core information of the dialogue information from the word-segmentation entities, defined as triples;
Specifically, the dialogue information is segmented so that triples can be better extracted afterwards. The core information includes backbone information such as the subject, predicate, and object.
S4: constructing a general graph of the dialogue information according to the triples in the dialogue information and the other word-segmentation entities;
Specifically, the general graph of the dialogue information can completely reflect the topics and entities of this dialogue information. For example, the general graph can be constructed according to grammatical rules. Suppose the user inputs "李白是唐代伟大的浪漫主义诗人，被后人誉为"诗仙"，与杜甫并称为"李杜"" (Li Bai was a great romantic poet of the Tang Dynasty, hailed by later generations as the 'Poetry Immortal', and referred to together with Du Fu as 'Li Du'). From this dialogue information, the following triples are extracted, where an empty slot in a triple is represented by a blank space: [Li Bai, , Tang Dynasty], [Li Bai, is, romantic poet], [Li Bai, is, Poetry Immortal], [Li Bai, , Du Fu], [Li Bai and Du Fu, jointly called, Li Du]. A general graph of this sentence is built; the core entity of the general graph is Li Bai, and the graph includes entities such as "Tang Dynasty poet", "romantic poet", "Du Fu", and "Li Du".
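For illustration only (Python is used as an example language; the disclosure does not prescribe one), the following minimal sketch shows how the triples of the Li Bai example could be loaded into a simple general-graph structure. The data layout and function name are assumptions, not the patented implementation.

```python
from collections import defaultdict

def build_general_graph(triples, other_entities=()):
    """Build a simple general graph: {entity: {(relation, neighbour), ...}}."""
    graph = defaultdict(set)
    for head, relation, tail in triples:
        if head and tail:                      # an edge needs both ends; the predicate slot may be empty
            graph[head].add((relation, tail))
            graph[tail].add((relation, head))  # an undirected view is enough for intersection tests
    for entity in other_entities:              # remaining word-segmentation entities become isolated nodes
        graph.setdefault(entity, set())
    return graph

# Triples extracted from the Li Bai example (empty slots kept as "")
triples = [
    ("Li Bai", "", "Tang Dynasty"),
    ("Li Bai", "is", "romantic poet"),
    ("Li Bai", "is", "Poetry Immortal"),
    ("Li Bai", "", "Du Fu"),
    ("Li Bai and Du Fu", "jointly called", "Li Du"),
]
general_graph = build_general_graph(triples)
print(sorted(general_graph))   # core entity "Li Bai" plus its related entities
```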
S5: comparing the general graph of the dialogue information with the context subgraph to obtain the intersection of the two;
S6: generating reply information according to the intersection.
Specifically, the intersection includes the set of entities that exist in both the general graph and the context subgraph. For example, suppose the context subgraph concerns topics about a person named Zhou, such as zodiac sign, occupation, hobbies, height, and weight. If the user's dialogue information this time is "What is Zhou's zodiac sign?", then since the zodiac sign is an entity that exists in both the general graph and the context subgraph, the intersection of the two includes the zodiac sign, and reply information is generated from the entity related to "zodiac sign" in the context subgraph (for example, Aries), such as the reply "Zhou is an Aries". As another example, in the Li Bai example above, if the context subgraph contains the entities "Du Fu" and "Li Du", then "Du Fu" and "Li Du" form the intersection of the context subgraph and the general graph; in other words, the core entity expands or shifts from Li Bai to Du Fu, providing a reference for shifting the topic later and making the user feel that the robot is intelligent and able to think.
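Continuing the Zhou example, the following is a minimal sketch of steps S5 and S6, assuming the context subgraph is stored as a mapping from an entity to its related entities; all names and values here are illustrative, not part of the original disclosure.

```python
def intersect_and_reply(general_entities, context_subgraph, person="Zhou"):
    """S5/S6 sketch: find shared entities and build a naive reply from the subgraph."""
    shared = set(general_entities) & set(context_subgraph)   # e.g. {"zodiac sign"}
    replies = []
    for entity in shared:
        for related in context_subgraph[entity]:             # e.g. "Aries"
            replies.append(f"{person} is {related}.")
    return shared, replies

context_subgraph = {"zodiac sign": {"Aries"}, "occupation": {"singer"}, "hobby": {"hiking"}}
general_entities = {"Zhou", "zodiac sign"}                   # from "What is Zhou's zodiac sign?"
print(intersect_and_reply(general_entities, context_subgraph))
```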
This method can shift topics substantially without triggering context-related answers, and has context analysis and memory functions. Taking continuous corpora related to the preceding text as input, the reply topic can move continually between different entities and themes, constantly loading old and new topics into memory, and the replies are flexible.
Embodiment 2:
Embodiment 2 adds the following content on the basis of Embodiment 1:
Referring to Fig. 2, generating the context subgraph according to the multi-round dialogue information specifically includes:
S11: obtaining the triples extracted from the previous round of dialogue information;
Specifically, when the user conducts each round of dialogue, the dialogue information of that round is segmented and the triples of that dialogue information are extracted. After the current conversation ends, the context subgraph is updated with the triples in the finished conversation information.
S12: extracting, from a preset general knowledge graph, entities within one hop of the entities corresponding to the triples in the previous round of dialogue information;
Specifically, an entity directly connected in the knowledge graph to the target entity (that is, to a triple of the previous round of dialogue information) is within one hop of the target entity, i.e., it is one-hop reachable. An entity connected to the target entity through one other entity in between is two hops away from the target entity. For example, in the example above, if the target entity is "occupation", an entity reachable in one hop is "singer". The one-hop-reachable entities include the reply the user expects, information related to the user's input, the question the user may ask or input next, and other entities closely related to the target entity. If "Zhou" is the target entity, the entities reachable in one hop include occupation, hobbies, zodiac sign, and so on.
The general knowledge graph contains the topics or entities involved in common human-computer interaction content. Extracting from the general knowledge graph the entities that are one-hop reachable from the target entity makes it possible to predict well the reply the user expects, the information related to the user's input, and the question the user may ask or input next.
S13: if the context subgraph is empty or does not exist, filling the context subgraph with the triples of the previous round of dialogue information and the extracted entities.
Specifically, if the previous round of dialogue information was the user's first interaction, the context subgraph does not yet exist or is empty. In this case, a context subgraph is generated from the content of the first interaction.
S14: if the context subgraph is not empty, adding the triples of the previous round of dialogue information and the extracted entities to the context subgraph, and updating the context subgraph.
Specifically, if the previous round of dialogue information was not the user's first interaction, the context subgraph already contains data. In this case, the context subgraph is updated with the triples of the previous round of dialogue information and the extracted entities. In this way the context subgraph contains the user's multiple interaction records, that is, the context subgraph includes all the entities and topics in the user's dialogue process.
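A sketch of steps S11 to S14 follows, under the assumption that the general knowledge graph is held as an in-memory adjacency mapping and the context subgraph as a set of entities; the function names and data layout are illustrative only.

```python
def one_hop_entities(knowledge_graph, target_entities):
    """Collect entities directly connected (one-hop reachable) to any target entity."""
    reachable = set()
    for entity in target_entities:
        reachable |= knowledge_graph.get(entity, set())
    return reachable

def update_context_subgraph(context_subgraph, last_round_triples, knowledge_graph):
    """S11-S14: fill an empty subgraph or merge new triples and their one-hop neighbours."""
    targets = {e for triple in last_round_triples for e in (triple[0], triple[2]) if e}
    expansion = one_hop_entities(knowledge_graph, targets)
    if context_subgraph is None:               # S13: the subgraph does not exist yet, create it
        context_subgraph = set()
    context_subgraph |= targets | expansion    # S13/S14: fill or update the subgraph
    return context_subgraph

knowledge_graph = {"Zhou": {"occupation", "hobby", "zodiac sign"}, "occupation": {"singer"}}
subgraph = update_context_subgraph(None, [("Zhou", "", "occupation")], knowledge_graph)
print(sorted(subgraph))   # ['Zhou', 'hobby', 'occupation', 'singer', 'zodiac sign']
```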
Preferably, after comparing the general graph of the dialogue information with the context subgraph to obtain the intersection of the two, the method further includes:
comparing the general graph of the dialogue information with the context subgraph to obtain the incremental set of the two, the incremental set including entities that exist in the general graph of the dialogue information but do not exist in the context subgraph;
adding the entities in the incremental set to the context subgraph, and updating the context subgraph.
Specifically, the incremental set reflects the entities or topics newly added in this dialogue information. After each round of dialogue ends, the context subgraph can be updated according to the incremental set of that round of dialogue.
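A minimal sketch of the incremental-set update, assuming both graphs have been reduced to plain entity sets for the comparison (an illustrative simplification):

```python
def apply_incremental_set(general_entities, context_subgraph):
    """Entities present in the general graph but absent from the context subgraph are added."""
    incremental_set = set(general_entities) - set(context_subgraph)
    context_subgraph |= incremental_set        # update the subgraph in place
    return incremental_set

subgraph = {"Li Bai", "Tang Dynasty"}
new_entities = apply_incremental_set({"Li Bai", "Du Fu", "Li Du"}, subgraph)
print(sorted(new_entities), sorted(subgraph))  # ['Du Fu', 'Li Du'] added to the subgraph
```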
The method provided in this embodiment of the present invention is described only briefly; for parts not mentioned in this embodiment, reference may be made to the corresponding content in the foregoing method embodiments.
Embodiment 3:
Embodiment 3 adds the following content on the basis of the other embodiments:
Performing word segmentation on the dialogue information and extracting the triples specifically includes:
performing word segmentation on the dialogue information with a word segmentation tool to obtain the word-segmentation entities;
performing part-of-speech tagging on the word-segmentation entities, and extracting the triples.
Specifically, the word segmentation tool may be the jieba word segmentation tool. Part-of-speech tagging is a text data processing technique in corpus linguistics in which the parts of speech of the words in a corpus are marked according to their meaning and context. Part-of-speech tagging can be done by specific algorithms; common part-of-speech tagging algorithms include the Hidden Markov Model (HMM), Conditional Random Fields (CRFs), and so on. In this method, part-of-speech tagging is mainly used to mark the core information.
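Since the description names jieba as one possible word segmentation tool, the sketch below uses jieba's posseg module for segmentation and part-of-speech tagging; the simple subject/predicate/object heuristic that follows is only an illustration of how tagged words might be turned into a triple, not the extraction rule claimed here.

```python
import jieba.posseg as pseg   # pip install jieba

def tag_and_extract(sentence):
    """Segment a sentence, tag parts of speech, and form a naive (subject, predicate, object) triple."""
    tagged = [(w.word, w.flag) for w in pseg.cut(sentence)]
    nouns = [w for w, flag in tagged if flag.startswith("n")]   # nouns / proper nouns
    verbs = [w for w, flag in tagged if flag.startswith("v")]   # verbs as candidate predicates
    subject = nouns[0] if nouns else ""
    predicate = verbs[0] if verbs else ""
    obj = nouns[1] if len(nouns) > 1 else ""
    return tagged, (subject, predicate, obj)

tags, triple = tag_and_extract("李白是唐代伟大的浪漫主义诗人")
print(tags)
print(triple)   # roughly ('李白', '是', ...) depending on jieba's segmentation and tag set
```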
Preferably, after generating the context subgraph, the method further includes:
setting an automatic destruction time for the triples;
when it is detected that the automatic destruction time of a triple has arrived, deleting from the context subgraph the triple whose automatic destruction time has arrived and its corresponding entities.
Specifically, when a triple is loaded into memory, that is, when the triples in new dialogue information are stored, the automatic destruction time of the triples is set in order to simulate a person's forgetting. For example, the automatic destruction time can be set to 1 minute, meaning that each group of triples only exists for 1 minute. In this way, a context subgraph can be formed from all the triples of the dialogue information within the last minute, while the triples of dialogue information from before one minute ago, together with their related entities (for example, connected entities), are deleted; this means the brain only remembers recent data (within 1 minute).
Preferably, after deleting from the context subgraph the triples whose automatic destruction time has arrived and their corresponding entities, the method further includes:
storing the triples whose automatic destruction time has arrived and their corresponding entities in a preset long-term memory storage area.
Specifically, a timeout deletion set can be constructed from the triples whose automatic destruction time has arrived; the timeout deletion set thus represents the entities deleted due to timeout during the user's dialogue. The timeout deletion set is stored separately in the long-term memory storage area to simulate human long-term memory and the subconscious, and represents the topics or entities that the user interacted with over a long period in the past.
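A sketch of the forgetting mechanism, assuming each triple records the moment it was loaded into memory; names such as LONG_TERM_MEMORY are illustrative and not taken from the disclosure.

```python
import time

AUTO_DESTRUCTION_SECONDS = 60          # e.g. 1 minute, as in the description
LONG_TERM_MEMORY = []                  # preset long-term memory storage area (timeout deletion set)

def load_triple(store, triple):
    """Record the triple together with the moment it was loaded into memory."""
    store[triple] = time.time()

def forget_expired(store, now=None):
    """Delete triples whose automatic destruction time has arrived and keep them in long-term memory."""
    now = time.time() if now is None else now
    expired = [t for t, loaded in store.items() if now - loaded >= AUTO_DESTRUCTION_SECONDS]
    for triple in expired:
        LONG_TERM_MEMORY.append(triple)    # simulate long-term memory / the subconscious
        del store[triple]                  # simulate forgetting in the context subgraph
    return expired

short_term = {}
load_triple(short_term, ("Li Bai", "is", "romantic poet"))
print(forget_expired(short_term, now=time.time() + 61))   # the triple is moved to LONG_TERM_MEMORY
```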
Preferably, generating reply information according to the intersection specifically includes:
generating multiple pieces of initial reply information according to a preset rule template and the intersection;
scoring all the initial reply information;
extracting the initial reply information whose scoring result meets a preset scoring requirement, and generating the reply information.
Specifically, the method uses the intersection as the facts for generating dialogue information, and generates replies according to rule templates. The rule templates are predefined, and they can also be continuously optimized and refined to improve the naturalness of the generated expressions. Since multiple pieces of initial reply information can often be generated from the intersection, the method can score the initial reply information in order to select the most appropriate one as the final reply information; for example, a ranking algorithm is used to sort them, and after sorting, the top-ranked answer is the most suitable answer.
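A minimal sketch of template-based reply generation and scoring; the templates and the length-based score stand in for whatever rule templates and ranking algorithm are actually used.

```python
def generate_replies(intersection, context_subgraph, templates):
    """Fill each rule template with every fact (entity, related value) found in the intersection."""
    candidates = []
    for entity in intersection:
        for value in context_subgraph.get(entity, ()):
            for template in templates:
                candidates.append(template.format(entity=entity, value=value))
    return candidates

def pick_reply(candidates, score=len):
    """Score all initial replies and return the one meeting the requirement (here: highest score)."""
    return max(candidates, key=score) if candidates else None

templates = ["{entity}: {value}.", "As far as I remember, the {entity} is {value}."]
subgraph = {"zodiac sign": {"Aries"}}
candidates = generate_replies({"zodiac sign"}, subgraph, templates)
print(pick_reply(candidates))   # the highest-scoring candidate is returned as the reply
```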
The method provided in this embodiment of the present invention is described only briefly; for parts not mentioned in this embodiment, reference may be made to the corresponding content in the foregoing method embodiments.
Embodiment 4:
A computer, referring to Fig. 3, includes a processor 801, an input device 802, an output device 803, and a memory 804. The processor 801, the input device 802, the output device 803, and the memory 804 are connected to each other via a bus 805, wherein the memory 804 is used to store a computer program, the computer program includes program instructions, and the processor 801 is configured to call the program instructions to execute the above-described method.
It should be understood that, in this embodiment of the present invention, the processor 801 may be a central processing unit (CPU), and the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The input device 802 may include a touch panel, a fingerprint sensor (used to collect the user's fingerprint information and fingerprint orientation information), a microphone, and the like; the output device 803 may include a display (LCD, etc.), a speaker, and the like.
The memory 804 may include a read-only memory and a random access memory, and provides instructions and data to the processor 801. A part of the memory 804 may also include a non-volatile random access memory. For example, the memory 804 may also store information on the device type.
The computer provided in this embodiment of the present invention is described only briefly; for parts not mentioned in this embodiment, reference may be made to the corresponding content in the foregoing method embodiments.
Embodiment 5:
A computer-readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the above-described method.
The computer-readable storage medium may be an internal storage unit of the terminal described in any of the foregoing embodiments, such as the hard disk or memory of the terminal. The computer-readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal. Further, the computer-readable storage medium may include both an internal storage unit of the terminal and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the terminal. The computer-readable storage medium can also be used to temporarily store data that has been output or is to be output.
The medium provided in this embodiment of the present invention is described only briefly; for parts not mentioned in this embodiment, reference may be made to the corresponding content in the foregoing method embodiments.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some or all of the technical features therein; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention, and they shall all be covered by the scope of the claims and the specification of the present invention.

Claims (9)

  1. A graph-based context-related reply generation method, characterized by comprising the following steps:
    when historical multi-round dialogue information exists, generating a context subgraph according to the multi-round dialogue information;
    receiving dialogue information input by a user, and performing word segmentation on the dialogue information to obtain word-segmentation entities;
    extracting core information of the dialogue information from the word-segmentation entities, defined as triples;
    constructing a general graph of the dialogue information according to the triples in the dialogue information and the other word-segmentation entities;
    comparing the general graph of the dialogue information with the context subgraph to obtain the intersection of the two;
    generating reply information according to the intersection.
  2. The graph-based context-related reply generation method according to claim 1, characterized in that generating the context subgraph according to the multi-round dialogue information specifically comprises:
    obtaining the triples extracted from the previous round of dialogue information;
    extracting, from a preset general knowledge graph, entities within one hop of the entities corresponding to the triples in the previous round of dialogue information;
    if the context subgraph is empty or does not exist, filling the context subgraph with the triples of the previous round of dialogue information and the extracted entities;
    if the context subgraph is not empty, adding the triples of the previous round of dialogue information and the extracted entities to the context subgraph, and updating the context subgraph.
  3. The graph-based context-related reply generation method according to claim 2, characterized in that, after comparing the general graph of the dialogue information with the context subgraph to obtain the intersection of the two, the method further comprises:
    comparing the general graph of the dialogue information with the context subgraph to obtain the incremental set of the two, the incremental set comprising entities that exist in the general graph of the dialogue information but do not exist in the context subgraph;
    adding the entities in the incremental set to the context subgraph, and updating the context subgraph.
  4. The graph-based context-related reply generation method according to claim 2, characterized in that performing word segmentation on the dialogue information and extracting the triples specifically comprises:
    performing word segmentation on the dialogue information with a word segmentation tool to obtain the word-segmentation entities;
    performing part-of-speech tagging on the word-segmentation entities, and extracting the triples.
  5. The graph-based context-related reply generation method according to claim 1, characterized in that, after generating the context subgraph, the method further comprises:
    setting an automatic destruction time for the triples;
    when it is detected that the automatic destruction time of a triple has arrived, deleting from the context subgraph the triple whose automatic destruction time has arrived and its corresponding entities.
  6. The graph-based context-related reply generation method according to claim 5, characterized in that, after deleting from the context subgraph the triples whose automatic destruction time has arrived and their corresponding entities, the method further comprises:
    storing the triples whose automatic destruction time has arrived and their corresponding entities in a preset long-term memory storage area.
  7. The graph-based context-related reply generation method according to claim 1, characterized in that generating reply information according to the intersection specifically comprises:
    generating multiple pieces of initial reply information according to a preset rule template and the intersection;
    scoring all the initial reply information;
    extracting the initial reply information whose scoring result meets a preset scoring requirement, and generating the reply information.
  8. A computer, characterized by comprising a processor, an input device, an output device, and a memory, the processor, the input device, the output device, and the memory being connected to each other, wherein the memory is used to store a computer program, the computer program comprises program instructions, and the processor is configured to call the program instructions to execute the method according to any one of claims 1 to 7.
  9. A computer-readable storage medium, characterized in that the computer storage medium stores a computer program, the computer program comprises program instructions, and the program instructions, when executed by a processor, cause the processor to execute the method according to any one of claims 1 to 7.
PCT/CN2019/082913 2019-03-22 2019-04-16 Graph-based context-related reply generation method, computer and medium WO2020191828A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910224022.9 2019-03-22
CN201910224022.9A CN109918494B (zh) 2019-03-22 2019-03-22 Graph-based context-related reply generation method, computer and medium

Publications (1)

Publication Number Publication Date
WO2020191828A1 (zh)

Family

ID=66966436

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/082913 WO2020191828A1 (zh) 2019-03-22 2019-04-16 Graph-based context-related reply generation method, computer and medium

Country Status (2)

Country Link
CN (1) CN109918494B (zh)
WO (1) WO2020191828A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114328956A (zh) * 2021-12-23 2022-04-12 北京百度网讯科技有限公司 文本信息的确定方法、装置、电子设备及存储介质

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325034A (zh) * 2020-02-12 2020-06-23 平安科技(深圳)有限公司 多轮对话中语义补齐的方法、装置、设备及存储介质
CN111831801B (zh) * 2020-05-27 2022-02-25 北京市农林科学院 一种人机对话方法及系统
CN111985237A (zh) * 2020-06-29 2020-11-24 联想(北京)有限公司 一种实体抽取方法、装置及设备
CN111930916B (zh) * 2020-09-18 2021-02-05 北京百度网讯科技有限公司 对话生成方法、装置、电子设备和存储介质
CN112507728A (zh) * 2020-12-11 2021-03-16 平安科技(深圳)有限公司 智能对话方法、装置、电子设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140012795A1 (en) * 2006-12-21 2014-01-09 Support Machines Ltd. Method and computer program product for providing a response to a statement of a user
US20160124937A1 (en) * 2014-11-03 2016-05-05 Service Paradigm Pty Ltd Natural language execution system, method and computer readable medium
CN108170749A (zh) * 2017-12-21 2018-06-15 北京百度网讯科技有限公司 基于人工智能的对话方法、装置及计算机可读介质
CN108763568A (zh) * 2018-06-05 2018-11-06 北京玄科技有限公司 智能机器人交互流程的管理方法、多轮对话方法及装置
CN109033318A (zh) * 2018-07-18 2018-12-18 北京市农林科学院 智能问答方法及装置

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804525B (zh) * 2018-04-27 2021-04-13 出门问问信息科技有限公司 一种智能回答方法及装置


Also Published As

Publication number Publication date
CN109918494B (zh) 2022-11-04
CN109918494A (zh) 2019-06-21

Similar Documents

Publication Publication Date Title
WO2020191828A1 (zh) 基于图的上下文关联回复生成方法、计算机及介质
US11816438B2 (en) Context saliency-based deictic parser for natural language processing
US11394667B2 (en) Chatbot skills systems and methods
US11062270B2 (en) Generating enriched action items
JP6971853B2 (ja) コミュニケーション及びコンテンツからのコミットメント及びリクエストの自動抽出
US20190103111A1 (en) Natural Language Processing Systems and Methods
US10223355B2 (en) Knowledge-based editor with natural language interface
WO2019153522A1 (zh) 智能交互方法、电子装置及存储介质
US8818795B1 (en) Method and system for using natural language techniques to process inputs
US11586689B2 (en) Electronic apparatus and controlling method thereof
US10169466B2 (en) Persona-based conversation
WO2024103609A1 (zh) 一种对话模型的训练方法及装置、对话响应方法及装置
US20150149391A1 (en) System and a method for prividing a dialog with a user
WO2021063089A1 (zh) 规则匹配方法、规则匹配装置、存储介质及电子设备
JP7488871B2 (ja) 対話推薦方法、装置、電子機器、記憶媒体ならびにコンピュータプログラム
CN110188189B (zh) 一种基于知识的自适应事件索引认知模型提取文档摘要的方法
WO2021114836A1 (zh) 一种文本通顺度确定方法、装置、设备及介质
US11544467B2 (en) Systems and methods for identification of repetitive language in document using linguistic analysis and correction thereof
WO2023246719A1 (zh) 会议记录处理方法、装置、设备及存储介质
CN109388695B (zh) 用户意图识别方法、设备及计算机可读存储介质
JP4824043B2 (ja) 自然言語対話エージェントの知識構造構成方法、知識構造を用いた自動応答の作成方法および自動応答作成装置
US11978458B2 (en) Electronic apparatus and method for recognizing speech thereof
US11328719B2 (en) Electronic device and method for controlling the electronic device
JP5717103B2 (ja) 文書間関係推定装置、方法、及びプログラム
JP7442217B1 (ja) プログラム、情報処理方法、および情報処理装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19921278

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 040222)

122 Ep: pct application non-entry in european phase

Ref document number: 19921278

Country of ref document: EP

Kind code of ref document: A1