WO2018006472A1 - Human-computer interaction method and system based on knowledge graph - Google Patents

Human-computer interaction method and system based on knowledge graph

Info

Publication number
WO2018006472A1
Authority
WO
WIPO (PCT)
Prior art keywords
record
knowledge graph
module
input record
parsing
Prior art date
Application number
PCT/CN2016/094908
Other languages
English (en)
French (fr)
Inventor
邱楠
王昊奋
Original Assignee
深圳狗尾草智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳狗尾草智能科技有限公司 filed Critical 深圳狗尾草智能科技有限公司
Priority to JP2017538655A priority Critical patent/JP2018525691A/ja
Publication of WO2018006472A1 publication Critical patent/WO2018006472A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • The invention relates to the field of artificial intelligence, and in particular to a human-computer interaction method and system based on a knowledge graph.
  • Existing knowledge-graph-based human-computer interaction technology cannot keep multi-round conversations going smoothly. For example, when the user inputs "turn the sound up", the robot cannot tell whether to turn up the music being played or its own voice. The resulting ambiguity prevents the multi-round conversation from continuing.
  • The object of the present invention is to provide a human-computer interaction method and system based on a knowledge graph, which solve the problem that existing knowledge-graph-based human-computer interaction technology cannot continue the conversation when a user instruction is ambiguous.
  • A human-computer interaction method based on a knowledge graph includes:
  • acquiring a multimodal input record of the user, and processing the multimodal input record;
  • associating the multimodal input record with intent modules according to the knowledge graph to obtain an association record, the association record including the multimodal input record and the associated intent module(s);
  • when the associated intent module is unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in that intent module, and taking the parsing result of that intent module as the parsing result of the multimodal input record; or, when the associated intent module is not unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in each intent module, and taking the parsing result of each intent module as a parsing result of the multimodal input record;
  • when the parsing result of the multimodal input record is unique, outputting that parsing result; or, when the parsing result of the multimodal input record is not unique, evaluating the multiple parsing results, obtaining the optimal parsing result, and outputting the optimal parsing result.
  • Preferably, the multimodal input record is text, voice, animation, expression, action, gesture, picture, or video.
  • When the parsing result of the multimodal input record is unique, the knowledge graph is updated according to the multimodal input record and its parsing result; or,
  • when the parsing result of the multimodal input record is not unique, the knowledge graph is updated according to the multimodal input record and its optimal parsing result.
  • The update records of the knowledge graph are stored on a cloud server.
  • The intent modules and the knowledge graph are stored on a cloud server.
  • A human-computer interaction system based on a knowledge graph includes:
  • an input module for acquiring a multimodal input record of the user and processing the multimodal input record;
  • an intent recognition module configured to associate the multimodal input record with intent modules according to the knowledge graph to obtain an association record, the association record including the multimodal input record and the associated intent module(s);
  • a parsing module configured to, when the associated intent module is unique, parse the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in that intent module, and take the parsing result of that intent module as the parsing result of the multimodal input record; or, when the associated intent module is not unique, parse the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in each intent module, and take the parsing result of each intent module as a parsing result of the multimodal input record;
  • an output module configured to output the parsing result when the parsing result of the multimodal input record is unique; or, when the parsing result of the multimodal input record is not unique, to evaluate the multiple parsing results, obtain the optimal parsing result, and output the optimal parsing result.
  • Preferably, the multimodal input record is text, voice, animation, expression, action, gesture, picture, or video.
  • The output module is further configured to update the knowledge graph according to the multimodal input record and its parsing result when the parsing result of the multimodal input record is unique; or, when the parsing result of the multimodal input record is not unique, to update the knowledge graph according to the multimodal input record and its optimal parsing result.
  • An update record module is configured to store the update records of the knowledge graph on the cloud server.
  • The intent modules and the knowledge graph are stored on a cloud server.
  • The invention provides a human-computer interaction method and system based on a knowledge graph which, according to the knowledge graph, perform intent recognition on the user's input and parse the intent recognition result to obtain a parsing result, thereby eliminating ambiguity and enabling multi-round conversation.
  • FIG. 1 is a flowchart of a human-computer interaction method based on a knowledge graph provided by an embodiment of the present invention;
  • FIG. 2 is a schematic structural diagram of a human-computer interaction system based on a knowledge graph provided by an embodiment of the present invention.
  • An embodiment of the present invention provides a human-computer interaction method based on a knowledge graph, including:
  • Step S101: acquiring a multimodal input record of the user, and processing the multimodal input record;
  • Step S102: associating the multimodal input record with intent modules according to the knowledge graph to obtain an association record;
  • the association record includes the multimodal input record and the associated intent module(s);
  • Step S103: when the associated intent module is unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in that intent module, and taking the parsing result of that intent module as the parsing result of the multimodal input record; or, when the associated intent module is not unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in each intent module, and taking the parsing result of each intent module as a parsing result of the multimodal input record;
  • Step S104: when the parsing result of the multimodal input record is unique, outputting that parsing result; or, when the parsing result of the multimodal input record is not unique, evaluating the multiple parsing results, obtaining the optimal parsing result, and outputting the optimal parsing result.
  • In step S101, because the incoming information is diversified (pictures, text, voice, motion, gestures, video, etc.), the various inputs must be processed; the goal of this processing is a form that the intent recognition module can use directly.
  • In step S102, the input information is associated with the corresponding intent (which can be understood as a scene, a state, a certain skill, etc.). It should be noted that the input information may be associated with multiple intents; for example, "volume up" may mean "turn up the TV volume" or "turn up the volume of the machine itself". In addition, the input information alone sometimes cannot identify the intent;
  • in that case, recognition can draw on the knowledge graph (the knowledge graph here can be roughly divided into two parts: one part is static, unchanged over the long term, and common to all users; the other part is dynamic, continuously updated, and differs from user to user and over time).
  • In step S103, after intent recognition, the multimodal input record is associated with one or more intent modules, and each associated intent module parses it based on its own state, scene, defined templates, trained models, methods, and so on. Different intent modules parse in different ways; one way is to parse according to the input information and the knowledge graph.
  • In step S104, after intent recognition, one or more intent modules may have been associated. If there is a single intent module, the parsing result is unique and the output module simply outputs that module's parsing result; when multiple intent modules are associated after intent recognition, there are multiple parsing results, and the output module must evaluate them and select an optimal parsing result to output.
  • The knowledge graph in the embodiment of the present invention can be divided into two parts: a public knowledge graph and user-private knowledge graphs. The public knowledge graph is built from knowledge common to all users, while a private knowledge graph is private to a user;
  • the knowledge in private graphs is not shared between different users.
  • According to the knowledge graph, the embodiment of the invention performs intent recognition on the user's input and parses the intent recognition result to obtain a parsing result, thereby eliminating ambiguity and enabling multi-round conversation.
  • The embodiment of the present invention does not limit the form of the multimodal input record in step S101.
  • Preferably, the multimodal input record may be text, voice, animation, expression, action, gesture, picture, or video.
  • The embodiment of the present invention may further include:
  • when the parsing result of the multimodal input record is unique, updating the knowledge graph according to the multimodal input record and its parsing result; or,
  • when the parsing result of the multimodal input record is not unique, updating the knowledge graph according to the multimodal input record and its optimal parsing result.
  • While outputting the result, the embodiment of the invention can copy certain data to the knowledge graph, so that the knowledge graph is updated and keeps learning.
  • The embodiment of the present invention may further include storing the update records of the knowledge graph on a cloud server.
  • The advantage of this is that it is convenient for users to query historical update records.
  • The embodiment of the present invention does not limit the storage location of the intent modules and the knowledge graph.
  • Preferably, the intent modules and the knowledge graph may be stored on a cloud server. Storing the intent modules and the knowledge graph on a cloud server enables information sharing when a user uses different devices.
  • An embodiment of the present invention provides a human-computer interaction system based on a knowledge graph, including:
  • an input module for acquiring a multimodal input record of the user and processing the multimodal input record;
  • an intent recognition module configured to associate the multimodal input record with intent modules according to the knowledge graph to obtain an association record, the association record including the multimodal input record and the associated intent module(s);
  • a parsing module configured to, when the associated intent module is unique, parse the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in that intent module, and take the parsing result of that intent module as the parsing result of the multimodal input record; or, when the associated intent module is not unique, parse the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in each intent module, and take the parsing result of each intent module as a parsing result of the multimodal input record;
  • an output module configured to output the parsing result when the parsing result of the multimodal input record is unique; or, when the parsing result of the multimodal input record is not unique, to evaluate the multiple parsing results, obtain the optimal parsing result, and output the optimal parsing result.
  • In the input module, because the incoming information is diversified (pictures, text, voice, motion, gestures, video, etc.), the function of this module is to process the various inputs; the goal of the processing is a form that the intent recognition module can use directly.
  • The main function of the intent recognition module is to associate the input information with the corresponding intent (which can be understood as a scene, a state, a certain skill, etc.). It should be noted that the input information may be associated with multiple intents; for example, "volume up" may mean "turn up the TV volume" or "turn up the volume of the machine itself". In addition, the input information alone sometimes cannot identify the intent, and in that case recognition can draw on the knowledge graph
  • (the knowledge graph here can be roughly divided into two parts: one part is static, unchanged over the long term, and common to all users; the other part is dynamic, continuously updated, and differs from user to user and over time).
  • The parsing module is at the heart of multi-round conversation.
  • After the intent recognition module performs intent recognition on the multimodal input record, the input is associated with one or more intent modules, and each associated intent module parses it based on its own state, scene, defined templates, trained models, methods, and so on. Different intent modules parse in different ways, but the common approach is to parse according to the input information and the knowledge graph.
  • Output module: after intent recognition, one or more intent modules may have been associated. If there is a single intent module, the parsing result is unique, and the output module simply outputs that module's parsing result; when multiple intent modules are associated, there are multiple parsing results, and the output module must evaluate them and select an optimal parsing result to output.
  • The knowledge graph in the embodiment of the present invention can be divided into two parts: a public knowledge graph and user-private knowledge graphs. The public knowledge graph is built from knowledge common to all users, while a private knowledge graph is private to a user;
  • the knowledge in private graphs is not shared between different users.
  • According to the knowledge graph, the embodiment of the invention performs intent recognition on the user's input and parses the intent recognition result to obtain a parsing result, thereby eliminating ambiguity and enabling multi-round conversation.
  • The embodiment of the present invention does not limit the form of the multimodal input record in the input module.
  • Preferably, the multimodal input record may be text, voice, animation, expression, action, gesture, picture, or video.
  • The output module in the embodiment of the present invention may be further configured to update the knowledge graph according to the multimodal input record and its parsing result when the parsing result of the multimodal input record is unique; or, when the parsing result of the multimodal input record is not unique, to update the knowledge graph according to the multimodal input record and its optimal parsing result.
  • While outputting the result, certain data can be copied to the knowledge graph, so that the knowledge graph is updated and keeps learning.
  • The embodiment of the present invention may further include an update record module configured to store the update records of the knowledge graph on the cloud server.
  • The embodiment of the present invention does not limit the storage location of the intent modules and the knowledge graph.
  • Preferably, the intent modules and the knowledge graph may be stored on a cloud server. Storing the intent modules and the knowledge graph on a cloud server enables information sharing when a user uses different devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a human-computer interaction method and system based on a knowledge graph. The method includes: acquiring a multimodal input record of the user; associating the multimodal input record with intent modules according to the knowledge graph; when the associated intent module is unique, obtaining the parsing result of that intent module and outputting it; or, when the associated intent module is not unique, obtaining the parsing result of each intent module, evaluating the multiple parsing results, obtaining the optimal parsing result, and outputting the optimal parsing result. According to the knowledge graph, the present invention performs intent recognition on the user's input and parses the intent recognition result to obtain a parsing result, thereby eliminating ambiguity and enabling multi-round conversation.

Description

Human-computer interaction method and system based on knowledge graph
Technical Field
The present invention relates to the field of artificial intelligence, and in particular to a human-computer interaction method and system based on a knowledge graph.
Background Art
Existing knowledge-graph-based human-computer interaction technology cannot keep multi-round conversations going smoothly. For example, when the user inputs "turn the sound up", the robot cannot tell whether to turn up the music being played or its own voice. The resulting ambiguity prevents the multi-round conversation from continuing.
Summary of the Invention
The object of the present invention is to provide a human-computer interaction method and system based on a knowledge graph, which solve the problem that existing knowledge-graph-based human-computer interaction technology cannot continue the conversation when a user instruction is ambiguous.
The technical solution adopted by the present invention to solve its technical problem is as follows:
A human-computer interaction method based on a knowledge graph, including:
acquiring a multimodal input record of the user, and processing the multimodal input record;
associating the multimodal input record with intent modules according to the knowledge graph to obtain an association record, the association record including the multimodal input record and the associated intent module(s);
when the associated intent module is unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in that intent module, and taking the parsing result of that intent module as the parsing result of the multimodal input record; or, when the associated intent module is not unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in each intent module, and taking the parsing result of each intent module as a parsing result of the multimodal input record;
when the parsing result of the multimodal input record is unique, outputting that parsing result; or, when the parsing result of the multimodal input record is not unique, evaluating the multiple parsing results, obtaining the optimal parsing result, and outputting the optimal parsing result.
On this basis, further, the multimodal input record is text, voice, animation, expression, action, gesture, picture, or video.
On the basis of any of the above embodiments, further, the method also includes:
when the parsing result of the multimodal input record is unique, updating the knowledge graph according to the multimodal input record and its parsing result; or,
when the parsing result of the multimodal input record is not unique, updating the knowledge graph according to the multimodal input record and its optimal parsing result.
On this basis, further, the method also includes:
storing the update records of the knowledge graph on a cloud server.
On the basis of any of the above embodiments, further, the intent modules and the knowledge graph are stored on a cloud server.
A human-computer interaction system based on a knowledge graph, including:
an input module for acquiring a multimodal input record of the user and processing the multimodal input record;
an intent recognition module for associating the multimodal input record with intent modules according to the knowledge graph to obtain an association record, the association record including the multimodal input record and the associated intent module(s);
a parsing module for, when the associated intent module is unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in that intent module, and taking the parsing result of that intent module as the parsing result of the multimodal input record; or, when the associated intent module is not unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in each intent module, and taking the parsing result of each intent module as a parsing result of the multimodal input record;
an output module for outputting the parsing result when the parsing result of the multimodal input record is unique; or, when the parsing result of the multimodal input record is not unique, evaluating the multiple parsing results, obtaining the optimal parsing result, and outputting the optimal parsing result.
On this basis, further, the multimodal input record is text, voice, animation, expression, action, gesture, picture, or video.
On the basis of any of the above embodiments, further, the output module is also used, when the parsing result of the multimodal input record is unique, to update the knowledge graph according to the multimodal input record and its parsing result; or, when the parsing result of the multimodal input record is not unique, to update the knowledge graph according to the multimodal input record and its optimal parsing result.
On this basis, further, the system also includes:
an update record module for storing the update records of the knowledge graph on a cloud server.
On the basis of any of the above embodiments, further, the intent modules and the knowledge graph are stored on a cloud server.
The beneficial effects of the present invention are as follows:
The present invention provides a human-computer interaction method and system based on a knowledge graph which, according to the knowledge graph, perform intent recognition on the user's input and parse the intent recognition result to obtain a parsing result, thereby eliminating ambiguity and enabling multi-round conversation.
Brief Description of the Drawings
The present invention is further described below with reference to the drawings and embodiments.
FIG. 1 shows a flowchart of a human-computer interaction method based on a knowledge graph provided by an embodiment of the present invention;
FIG. 2 shows a schematic structural diagram of a human-computer interaction system based on a knowledge graph provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it.
Specific Embodiment 1
As shown in FIG. 1, an embodiment of the present invention provides a human-computer interaction method based on a knowledge graph, including:
Step S101: acquiring a multimodal input record of the user, and processing the multimodal input record;
Step S102: associating the multimodal input record with intent modules according to the knowledge graph to obtain an association record, the association record including the multimodal input record and the associated intent module(s);
Step S103: when the associated intent module is unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in that intent module, and taking the parsing result of that intent module as the parsing result of the multimodal input record; or, when the associated intent module is not unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in each intent module, and taking the parsing result of each intent module as a parsing result of the multimodal input record;
Step S104: when the parsing result of the multimodal input record is unique, outputting that parsing result; or, when the parsing result of the multimodal input record is not unique, evaluating the multiple parsing results, obtaining the optimal parsing result, and outputting the optimal parsing result.
In step S101, because the incoming information is diversified (pictures, text, voice, motion, gestures, video, etc.), the various inputs must be processed; the goal of this processing is a form that the intent recognition module can use directly.
In step S102, the input information is associated with the corresponding intent (which can be understood as a scene, a state, a certain skill, etc.). It should be noted that the input information may be associated with multiple intents; for example, "volume up" may mean "turn up the TV volume" or "turn up the volume of the machine itself". In addition, the input information alone sometimes cannot identify the intent; in that case, recognition can draw on the knowledge graph (the knowledge graph here can be roughly divided into two parts: one part is static, unchanged over the long term, and common to all users; the other part is dynamic, continuously updated, and differs from user to user and over time).
In step S103, after intent recognition, the multimodal input record is associated with one or more intent modules, and each associated intent module parses it based on its own state, scene, defined templates, trained models, methods, and so on. Different intent modules parse in different ways; one way is to parse according to the input information and the knowledge graph.
In step S104, after intent recognition, one or more intent modules may have been associated. If there is a single intent module, the parsing result is unique and the output module simply outputs that module's parsing result; when multiple intent modules are associated after intent recognition, there are multiple parsing results, and the output module must evaluate them and select an optimal parsing result to output.
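By way of illustration only, the flow of steps S101 to S104 can be pictured with the following minimal Python sketch. It is not part of the claimed method: the class names, the keyword matching, and the scoring rule are all invented here, and a real system would use the states, scenes, templates, and trained models described above. The example input is the ambiguous "volume up" from the background section.

    from dataclasses import dataclass


    @dataclass
    class Parse:
        intent: str    # which intent module produced this parse
        action: str    # the concrete action the parse proposes
        score: float   # confidence used when several parses compete


    class IntentModule:
        """One intent module: matched in step S102, parses in step S103."""

        def __init__(self, name, keywords, action):
            self.name = name
            self.keywords = keywords
            self.action = action

        def matches(self, text, graph):
            # Step S102: associate via simple keyword templates.
            return any(k in text for k in self.keywords)

        def parse(self, text, graph):
            # Step S103: a real module would use its own state, scene,
            # templates, and models; here the score only reflects whether
            # the dynamic knowledge graph says this intent was active last.
            score = 1.0 if graph.get("last_intent") == self.name else 0.5
            return Parse(self.name, self.action, score)


    def interact(text, graph, modules):
        # Step S102: associate the input with one or more intent modules.
        candidates = [m for m in modules if m.matches(text, graph)]
        # Step S103: every associated module parses the input.
        parses = [m.parse(text, graph) for m in candidates]
        if not parses:
            return None  # no intent associated; a real system might ask back
        # Step S104: output directly if unique, otherwise pick the best.
        return parses[0] if len(parses) == 1 else max(parses, key=lambda p: p.score)


    modules = [
        IntentModule("tv_volume", ["volume"], "set_tv_volume(+1)"),
        IntentModule("self_volume", ["volume"], "set_own_volume(+1)"),
    ]
    graph = {"last_intent": "tv_volume"}  # dynamic part of the knowledge graph
    print(interact("volume up", graph, modules).action)  # -> set_tv_volume(+1)

Because both toy modules match "volume up", step S104 has to evaluate two parses; the dynamic context recorded in the knowledge graph makes the TV interpretation win, which is how the ambiguity in the background example is resolved.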
The knowledge graph in the embodiment of the present invention can be divided into two parts: a public knowledge graph and user-private knowledge graphs. The public knowledge graph is built from knowledge common to all users, while a private knowledge graph is private to a user, and the knowledge in private graphs is not shared between different users.
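As a hedged sketch of this public/private split (the patent does not prescribe any storage format; the triples and names below are invented), a per-user view of the graph can be formed by overlaying the user's private, continuously updated triples on the shared public ones:

    # Public graph: common to all users, changes rarely (the "static" part).
    PUBLIC_GRAPH = {
        ("TV", "has_property", "volume"),
        ("robot", "has_property", "volume"),
    }

    # Private graphs: one per user, continuously updated (the "dynamic" part).
    PRIVATE_GRAPHS = {
        "alice": {("alice", "last_controlled", "TV")},
        "bob": {("bob", "last_controlled", "robot")},
    }


    def user_view(user):
        """Overlay the user's private triples on the public graph."""
        return PUBLIC_GRAPH | PRIVATE_GRAPHS.get(user, set())


    def volume_target(user):
        """Use the private part of the graph to disambiguate "volume up"."""
        for subj, pred, obj in user_view(user):
            if pred == "last_controlled":
                return obj
        return "robot"  # an arbitrary default when no private context exists


    print(volume_target("alice"))  # -> TV
    print(volume_target("bob"))    # -> robot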
According to the knowledge graph, the embodiment of the invention performs intent recognition on the user's input and parses the intent recognition result to obtain a parsing result, thereby eliminating ambiguity and enabling multi-round conversation.
The embodiment of the present invention does not limit the form of the multimodal input record in step S101. Preferably, the multimodal input record may be text, voice, animation, expression, action, gesture, picture, or video.
On the basis of any of the above embodiments, preferably, the embodiment of the present invention may further include:
when the parsing result of the multimodal input record is unique, updating the knowledge graph according to the multimodal input record and its parsing result; or,
when the parsing result of the multimodal input record is not unique, updating the knowledge graph according to the multimodal input record and its optimal parsing result.
While outputting the result, the embodiment of the invention can copy certain data to the knowledge graph, so that the knowledge graph is updated and keeps learning.
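A minimal sketch of this write-back, reusing the triple-set representation from the sketch above (what exactly gets copied is an assumption; the patent only states that certain data is copied to the knowledge graph alongside the output):

    def update_graph(private_graph, input_record, parse_result):
        """Copy the output back into the user's dynamic graph so that the
        next round of conversation starts from the disambiguated state."""
        private_graph.add(("user", "last_intent", parse_result["intent"]))
        private_graph.add(("user", "last_input", input_record))
        return private_graph


    # After outputting the unique (or optimal) parsing result:
    graph = set()
    update_graph(graph, "volume up", {"intent": "tv_volume"})
    print(graph)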
On the basis of the above embodiments, preferably, the embodiment of the present invention may further include storing the update records of the knowledge graph on a cloud server. The advantage of this is that it is convenient for users to query historical update records.
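One plausible shape for such an update record, serialized so that a client could push it to the cloud server and query the history later (the field names are invented for illustration; the patent does not specify a schema):

    import json
    from datetime import datetime, timezone


    def make_update_record(user, triple):
        """Serialize one knowledge-graph update; keeping these records on the
        cloud server is what lets the user query historical updates."""
        return json.dumps({
            "user": user,
            "triple": triple,
            "updated_at": datetime.now(timezone.utc).isoformat(),
        })


    print(make_update_record("alice", ["user", "last_intent", "tv_volume"]))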
The embodiment of the present invention does not limit the storage location of the intent modules and the knowledge graph. On the basis of any of the above embodiments, preferably, the intent modules and the knowledge graph may be stored on a cloud server. Storing the intent modules and the knowledge graph on a cloud server enables information sharing when a user uses different devices.
Specific Embodiment 2
As shown in FIG. 2, an embodiment of the present invention provides a human-computer interaction system based on a knowledge graph, including:
an input module for acquiring a multimodal input record of the user and processing the multimodal input record;
an intent recognition module for associating the multimodal input record with intent modules according to the knowledge graph to obtain an association record, the association record including the multimodal input record and the associated intent module(s);
a parsing module for, when the associated intent module is unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in that intent module, and taking the parsing result of that intent module as the parsing result of the multimodal input record; or, when the associated intent module is not unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in each intent module, and taking the parsing result of each intent module as a parsing result of the multimodal input record;
an output module for outputting the parsing result when the parsing result of the multimodal input record is unique; or, when the parsing result of the multimodal input record is not unique, evaluating the multiple parsing results, obtaining the optimal parsing result, and outputting the optimal parsing result.
In the "input" module, because the incoming information is diversified (pictures, text, voice, motion, gestures, video, etc.), the role of this module is to process the various inputs; the goal of the processing is a form that the intent recognition module can use directly.
The main role of the intent recognition module is to associate the input information with the corresponding intent (which can be understood as a scene, a state, a certain skill, etc.). It should be noted that the input information may be associated with multiple intents; for example, "volume up" may mean "turn up the TV volume" or "turn up the volume of the machine itself". In addition, the input information alone sometimes cannot identify the intent; in that case, recognition can draw on the knowledge graph (the knowledge graph here can be roughly divided into two parts: one part is static, unchanged over the long term, and common to all users; the other part is dynamic, continuously updated, and differs from user to user and over time).
The parsing module is the core of multi-round dialogue. After the intent recognition module performs intent recognition on the multimodal input record, the input is associated with one or more intent modules, and each associated intent module parses it based on its own state, scene, defined templates, trained models, methods, and so on. Different intent modules parse in different ways, but the common approach is to parse according to the input information and the knowledge graph.
Output module: after intent recognition, one or more intent modules may have been associated. If there is a single intent module, the parsing result is unique, and the output module simply outputs that module's parsing result; when multiple intent modules are associated after intent recognition, there are multiple parsing results, and the output module must evaluate them and select an optimal parsing result to output.
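For the output module's evaluation step, the following is a sketch of one possible scoring rule (the patent leaves the evaluation criterion open; the context bonus and confidence fields are assumptions made for illustration):

    def evaluate(parses, graph):
        """Score each candidate parse and return the optimal one."""

        def score(parse):
            # Prefer parses whose intent matches the dynamic context recorded
            # in the knowledge graph, then fall back to module confidence.
            bonus = 1.0 if parse["intent"] == graph.get("last_intent") else 0.0
            return bonus + parse["confidence"]

        return max(parses, key=score)


    candidates = [
        {"intent": "tv_volume", "action": "set_tv_volume(+1)", "confidence": 0.6},
        {"intent": "self_volume", "action": "set_own_volume(+1)", "confidence": 0.7},
    ]
    print(evaluate(candidates, {"last_intent": "tv_volume"})["action"])
    # -> set_tv_volume(+1): the recorded context outweighs the confidence gap

Preferring the intent recorded in the dynamic graph is one way to realize the multi-round behaviour described above: what the user manipulated in the previous round disambiguates the current one.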
The knowledge graph in the embodiment of the present invention can be divided into two parts: a public knowledge graph and user-private knowledge graphs. The public knowledge graph is built from knowledge common to all users, while a private knowledge graph is private to a user, and the knowledge in private graphs is not shared between different users.
According to the knowledge graph, the embodiment of the invention performs intent recognition on the user's input and parses the intent recognition result to obtain a parsing result, thereby eliminating ambiguity and enabling multi-round conversation.
The embodiment of the present invention does not limit the form of the multimodal input record in the input module. Preferably, the multimodal input record may be text, voice, animation, expression, action, gesture, picture, or video.
On the basis of any of the above embodiments, preferably, the output module in the embodiment of the present invention may also be used, when the parsing result of the multimodal input record is unique, to update the knowledge graph according to the multimodal input record and its parsing result; or, when the parsing result of the multimodal input record is not unique, to update the knowledge graph according to the multimodal input record and its optimal parsing result. While outputting the result, the embodiment of the present invention can copy certain data to the knowledge graph, so that the knowledge graph is updated and keeps learning.
On the basis of the above embodiments, preferably, the embodiment of the present invention may further include an update record module for storing the update records of the knowledge graph on a cloud server. The advantage of this is that it is convenient for users to query historical update records.
The embodiment of the present invention does not limit the storage location of the intent modules and the knowledge graph. On the basis of any of the above embodiments, preferably, the intent modules and the knowledge graph may be stored on a cloud server. Storing the intent modules and the knowledge graph on a cloud server enables information sharing when a user uses different devices.
Although the present invention has been described to a certain extent, it is apparent that appropriate changes to the various conditions may be made without departing from the spirit and scope of the present invention. It should be understood that the present invention is not limited to the described embodiments, but falls within the scope of the claims, which includes equivalent substitutions of each of the elements described.

Claims (10)

  1. A human-computer interaction method based on a knowledge graph, characterized by comprising:
    acquiring a multimodal input record of the user, and processing the multimodal input record;
    associating the multimodal input record with intent modules according to the knowledge graph to obtain an association record, the association record comprising the multimodal input record and the associated intent module(s);
    when the associated intent module is unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in that intent module, and taking the parsing result of that intent module as the parsing result of the multimodal input record; or, when the associated intent module is not unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in each intent module, and taking the parsing result of each intent module as a parsing result of the multimodal input record;
    when the parsing result of the multimodal input record is unique, outputting that parsing result; or, when the parsing result of the multimodal input record is not unique, evaluating the multiple parsing results, obtaining the optimal parsing result, and outputting the optimal parsing result.
  2. The human-computer interaction method based on a knowledge graph according to claim 1, characterized in that the multimodal input record is text, voice, animation, expression, action, gesture, picture, or video.
  3. The human-computer interaction method based on a knowledge graph according to claim 1 or 2, characterized by further comprising:
    when the parsing result of the multimodal input record is unique, updating the knowledge graph according to the multimodal input record and its parsing result; or,
    when the parsing result of the multimodal input record is not unique, updating the knowledge graph according to the multimodal input record and its optimal parsing result.
  4. The human-computer interaction method based on a knowledge graph according to claim 3, characterized by further comprising:
    storing the update records of the knowledge graph on a cloud server.
  5. The human-computer interaction method based on a knowledge graph according to claim 1 or 2, characterized in that the intent modules and the knowledge graph are stored on a cloud server.
  6. A human-computer interaction system based on a knowledge graph, characterized by comprising:
    an input module for acquiring a multimodal input record of the user and processing the multimodal input record;
    an intent recognition module for associating the multimodal input record with intent modules according to the knowledge graph to obtain an association record, the association record comprising the multimodal input record and the associated intent module(s);
    a parsing module for, when the associated intent module is unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in that intent module, and taking the parsing result of that intent module as the parsing result of the multimodal input record; or, when the associated intent module is not unique, parsing the multimodal input record according to the input information and the knowledge graph, based on the state, scene, templates, models, and methods in each intent module, and taking the parsing result of each intent module as a parsing result of the multimodal input record;
    an output module for outputting the parsing result when the parsing result of the multimodal input record is unique; or, when the parsing result of the multimodal input record is not unique, evaluating the multiple parsing results, obtaining the optimal parsing result, and outputting the optimal parsing result.
  7. The human-computer interaction system based on a knowledge graph according to claim 6, characterized in that the multimodal input record is text, voice, animation, expression, action, gesture, picture, or video.
  8. The human-computer interaction system based on a knowledge graph according to claim 6 or 7, characterized in that the output module is further used, when the parsing result of the multimodal input record is unique, to update the knowledge graph according to the multimodal input record and its parsing result; or, when the parsing result of the multimodal input record is not unique, to update the knowledge graph according to the multimodal input record and its optimal parsing result.
  9. The human-computer interaction system based on a knowledge graph according to claim 8, characterized by further comprising:
    an update record module for storing the update records of the knowledge graph on a cloud server.
  10. The human-computer interaction system based on a knowledge graph according to claim 6 or 7, characterized in that the intent modules and the knowledge graph are stored on a cloud server.
PCT/CN2016/094908 2016-07-07 2016-08-12 Human-computer interaction method and system based on knowledge graph WO2018006472A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2017538655A JP2018525691A (ja) Human-computer interaction method and system based on knowledge map

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610539546.3 2016-07-07
CN201610539546.3A CN107589828A (zh) Human-computer interaction method and system based on knowledge graph

Publications (1)

Publication Number Publication Date
WO2018006472A1 true WO2018006472A1 (zh) 2018-01-11

Family

ID=60901744

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/094908 WO2018006472A1 (zh) Human-computer interaction method and system based on knowledge graph

Country Status (3)

Country Link
JP (1) JP2018525691A (zh)
CN (1) CN107589828A (zh)
WO (1) WO2018006472A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108536861A (zh) * 2018-04-19 2018-09-14 中国科学院重庆绿色智能技术研究院 Interactive training method and system for medical guidelines
CN110442694A (zh) * 2019-02-26 2019-11-12 北京蓦然认知科技有限公司 Intelligent interaction platform training method, apparatus, and device
CN110795532A (zh) * 2019-10-18 2020-02-14 珠海格力电器股份有限公司 Voice information processing method and apparatus, intelligent terminal, and storage medium
CN111694965A (zh) * 2020-05-29 2020-09-22 中国科学院上海微系统与信息技术研究所 Image scene retrieval system and method based on a multimodal knowledge graph
CN112131405A (zh) * 2020-09-28 2020-12-25 中国科学技术大学 Multimodal AR tumor knowledge graph presentation method based on intelligent search

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108214513A (zh) * 2018-01-23 2018-06-29 深圳狗尾草智能科技有限公司 Robot multi-dimensional response interaction method and apparatus
WO2019184103A1 (zh) * 2018-03-30 2019-10-03 深圳狗尾草智能科技有限公司 Character-IP-based human-computer interaction method, system, medium, and device
CN108920497B (zh) * 2018-05-23 2021-10-15 北京奇艺世纪科技有限公司 Human-computer interaction method and apparatus
CN109272999B (zh) * 2018-09-19 2019-08-16 三角兽(北京)科技有限公司 Information processing apparatus, human-machine dialogue method therefor, and storage medium
CN110162641B (zh) * 2019-05-17 2022-01-04 五竹科技(北京)有限公司 Marketing method and apparatus based on audio interaction, and storage medium
CN110516050A (zh) * 2019-07-15 2019-11-29 上海文思海辉金信软件有限公司 Method for constructing multi-path training scenarios based on a knowledge graph
CN112115252B (zh) * 2020-08-26 2023-06-02 罗彤 Intelligent writing-assistance processing method and apparatus, electronic device, and storage medium
CN112464814A (zh) * 2020-11-27 2021-03-09 北京百度网讯科技有限公司 Video processing method and apparatus, electronic device, and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104871150A (zh) * 2012-07-20 2015-08-26 韦韦欧股份有限公司 Method and system for inferring user intent in search input in a conversational interaction system
CN105426436A (zh) * 2015-11-05 2016-03-23 百度在线网络技术(北京)有限公司 Information providing method and apparatus based on an artificial intelligence robot
CN105630917A (zh) * 2015-12-22 2016-06-01 成都小多科技有限公司 Intelligent response method and apparatus

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104965592A (zh) * 2015-07-08 2015-10-07 苏州思必驰信息科技有限公司 Multimodal touchless human-computer interaction method and system based on voice and gesture recognition

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104871150A (zh) * 2012-07-20 2015-08-26 韦韦欧股份有限公司 Method and system for inferring user intent in search input in a conversational interaction system
CN105426436A (zh) * 2015-11-05 2016-03-23 百度在线网络技术(北京)有限公司 Information providing method and apparatus based on an artificial intelligence robot
CN105630917A (zh) * 2015-12-22 2016-06-01 成都小多科技有限公司 Intelligent response method and apparatus

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108536861A (zh) * 2018-04-19 2018-09-14 中国科学院重庆绿色智能技术研究院 Interactive training method and system for medical guidelines
CN108536861B (zh) * 2018-04-19 2022-03-18 中国科学院重庆绿色智能技术研究院 Interactive training method and system for medical guidelines
CN110442694A (zh) * 2019-02-26 2019-11-12 北京蓦然认知科技有限公司 Intelligent interaction platform training method, apparatus, and device
CN110795532A (zh) * 2019-10-18 2020-02-14 珠海格力电器股份有限公司 Voice information processing method and apparatus, intelligent terminal, and storage medium
CN111694965A (zh) * 2020-05-29 2020-09-22 中国科学院上海微系统与信息技术研究所 Image scene retrieval system and method based on a multimodal knowledge graph
CN111694965B (zh) * 2020-05-29 2023-06-13 中国科学院上海微系统与信息技术研究所 Image scene retrieval system and method based on a multimodal knowledge graph
CN112131405A (zh) * 2020-09-28 2020-12-25 中国科学技术大学 Multimodal AR tumor knowledge graph presentation method based on intelligent search
CN112131405B (zh) * 2020-09-28 2024-05-17 中国科学技术大学 Multimodal AR tumor knowledge graph presentation method based on intelligent search

Also Published As

Publication number Publication date
JP2018525691A (ja) 2018-09-06
CN107589828A (zh) 2018-01-16

Similar Documents

Publication Publication Date Title
WO2018006472A1 (zh) Human-computer interaction method and system based on knowledge graph
US11321535B2 (en) Hierarchical annotation of dialog acts
US11308934B2 (en) Hotword-aware speech synthesis
US9691379B1 (en) Selecting from multiple content sources
US10339715B2 (en) Virtual reality system
US11188586B2 (en) Organization, retrieval, annotation and presentation of media data files using signals captured from a viewing environment
WO2020078098A1 (zh) Model training method and apparatus based on gradient boosting decision trees
JP2022547704A (ja) 訓練を減らした意図認識技術
WO2017084185A1 (zh) Intelligent terminal control method and system based on semantic analysis, and intelligent terminal
JP2019507417A (ja) 多変数検索のためのユーザインターフェース
JP2017068861A5 (zh)
US20160275952A1 (en) Communicating metadata that identifies a current speaker
US10276158B2 (en) System and method for initiating multi-modal speech recognition using a long-touch gesture
US20190042185A1 (en) Flexible voice-based information retrieval system for virtual assistant
JP7300435B2 (ja) Method, apparatus, electronic device, and computer-readable storage medium for voice interaction
CN109741755A (zh) Voice wake-up word threshold management apparatus and method for managing voice wake-up word thresholds
US20200357382A1 (en) Oral, facial and gesture communication devices and computing architecture for interacting with digital media content
US10402647B2 (en) Adapted user interface for surfacing contextual analysis of content
US10135950B2 (en) Creating a cinematic storytelling experience using network-addressable devices
TW201915665A (zh) Robot interaction method and device
US10762902B2 (en) Method and apparatus for synthesizing adaptive data visualizations
US11762451B2 (en) Methods and apparatus to add common sense reasoning to artificial intelligence in the context of human machine interfaces
WO2018094952A1 (zh) Content recommendation method and apparatus
US10693944B1 (en) Media-player initialization optimization
US8798996B2 (en) Splitting term lists recognized from speech

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2017538655

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16907976

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16907976

Country of ref document: EP

Kind code of ref document: A1