WO2017031860A1 - Artificial intelligence-based control method and system for intelligent interaction device - Google Patents


Info

Publication number
WO2017031860A1
WO2017031860A1 PCT/CN2015/096587 CN2015096587W WO2017031860A1 WO 2017031860 A1 WO2017031860 A1 WO 2017031860A1 CN 2015096587 W CN2015096587 W CN 2015096587W WO 2017031860 A1 WO2017031860 A1 WO 2017031860A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
willingness
interact
interaction
sound source
Application number
PCT/CN2015/096587
Inventor
葛行飞
李峥
林汉权
Original Assignee
百度在线网络技术(北京)有限公司
Application filed by 百度在线网络技术(北京)有限公司
Publication of WO2017031860A1

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B15/00 Systems controlled by a computer
    • G05B15/02 Systems controlled by a computer electric

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An artificial intelligence-based control method and system for an intelligent interaction device, and an intelligent interaction device. The method comprises: receiving multi-modal input signals, the multi-modal input signals comprising an image signal, a sound signal, and/or a distance signal input by a user (S101); performing human face detection according to the image signal, and acquiring, when a human face is detected, a human face image and human face information (S102); performing lip region detection according to the human face image, to determine the motion condition of the lip region (S103); positioning a sound source according to the sound signal, to obtain information about the sound source (S104); determining the interaction intention of the user and the intensity degree of the interaction intention according to the human face information, the motion condition of the lip region, the information about the sound source, and/or the distance signal (S105); and controlling, according to the interaction intention of the user and the intensity degree of the interaction intention, an intelligent interaction device to perform a corresponding interaction response (S106). By means of the method, the interaction experience of a user during interaction with an intelligent interaction device is improved, and the intelligence of the intelligent interaction device is improved.

Description

Artificial intelligence-based control method and system for intelligent interaction device
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201510523179.3, entitled "Artificial intelligence-based control method and system for intelligent interaction device", filed by Baidu Online Network Technology (Beijing) Co., Ltd. on August 24, 2015.
Technical field
The present invention relates to the field of intelligent terminal technologies. In particular, it relates to a control method and a control system for an intelligent interaction device based on artificial intelligence (AI), and to an intelligent interaction device.
Background
Current intelligent interaction devices, such as televisions and household appliances, usually perform actions in response to a remote control or according to a preset program. Devices that operate in this way have the following drawbacks:
Their interaction with people uses a single mode and is poorly interactive. Remote-control functions are limited, so the device cannot perform any action beyond those functions. Likewise, a device that follows a preset program cannot perform actions outside that program and cannot adapt its behavior to the needs of different users. Moreover, all of these interactions happen only after the user operates the remote control or presses a function button, so the interaction is entirely passive.
Some systems, such as video-conference tracking systems, can turn a camera toward a speaker based on the speaker's voice. However, they cannot accurately determine whether the speaker actually intends to interact, and they cannot respond appropriately according to that willingness.
Summary of the invention
The present invention aims to solve at least one of the above technical drawbacks.
To this end, a first object of the present invention is to provide an artificial intelligence-based method for controlling an intelligent interaction device. The method improves the user's interaction experience with the device and makes the device more intelligent.
A second object of the present invention is to provide an artificial intelligence-based control system for an intelligent interaction device.
A third object of the present invention is to provide an intelligent interaction device.
A fourth object of the present invention is to provide an apparatus.
A fifth object of the present invention is to provide a non-volatile computer storage medium.
To achieve the above objects, an embodiment of the first aspect of the present invention discloses an artificial intelligence-based method for controlling an intelligent interaction device. The method comprises the following steps:
- receiving multi-modal input signals, which comprise an image signal, a sound signal, and/or a distance signal input by a user;
- performing face detection on the image signal and, when a face is detected, acquiring the face image and face information;
- performing lip region detection on the face image to determine the motion of the lip region;
- performing sound source localization on the sound signal to obtain sound source information;
- determining the user's willingness to interact, and the strength of that willingness, according to the face information, the lip motion, the sound source information, and/or the distance signal; and
- controlling the intelligent interaction device to make a corresponding interaction response according to the user's willingness to interact and its strength.
The method according to embodiments of the present invention collects the user's sound, image, and/or distance signals in real time. Through artificial intelligence analysis, it determines whether the user is willing to interact and how strong that willingness is, and then autonomously controls the intelligent interaction device to act accordingly. The device thus interacts with the user proactively and through a richer set of means, which improves the user experience.
An embodiment of the second aspect of the present invention discloses an artificial intelligence-based control system for an intelligent interaction device, comprising:
- a receiving module, configured to receive multi-modal input signals, which comprise an image signal, a sound signal, and/or a distance signal input by a user;
- a face detection module, configured to perform face detection on the image signal and, when a face is detected, acquire the face image and face information;
- a lip region detection module, configured to perform lip region detection on the face image to determine the motion of the lip region;
- a sound source localization module, configured to perform sound source localization on the sound signal to obtain sound source information;
- a decision module, configured to determine the user's willingness to interact, and its strength, according to the face information, the lip motion, the sound source information, and/or the distance signal; and
- a composite output control module, configured to control the intelligent interaction device to make a corresponding interaction response according to the user's willingness to interact and its strength.
The control system according to embodiments of the present invention offers the same benefits as the method. It collects the user's sound, image, and/or distance signals in real time and uses artificial intelligence analysis to determine whether, and how strongly, the user is willing to interact. It then autonomously controls the device to act accordingly, interacting with the user proactively through a richer set of means and improving the user experience.
An embodiment of the third aspect of the present invention discloses an intelligent interaction device, which comprises the artificial intelligence-based control system according to the embodiment of the second aspect. This device likewise collects the user's sound, image, and/or distance signals in real time and uses artificial intelligence analysis to determine whether, and how strongly, the user is willing to interact. It then acts autonomously, interacting with the user proactively through a richer set of means and improving the user experience.
An embodiment of the fourth aspect of the present invention provides an apparatus comprising one or more processors, a memory, and one or more programs stored in the memory. When executed by the one or more processors, the programs perform the artificial intelligence-based method for controlling an intelligent interaction device according to the embodiment of the first aspect.
An embodiment of the fifth aspect of the present invention provides a non-volatile computer storage medium storing one or more programs. When the programs are executed by a device, they cause the device to perform the artificial intelligence-based method for controlling an intelligent interaction device according to the embodiment of the first aspect.
Additional aspects and advantages of the present invention will be given in part in the following description. Others will become apparent from that description or will be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of an artificial intelligence-based method for controlling an intelligent interaction device according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of an artificial intelligence-based control system for an intelligent interaction device according to an embodiment of the present invention; and
FIG. 3 is a schematic diagram of an artificial intelligence-based control system for an intelligent interaction device according to an embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below, and examples of them are shown in the accompanying drawings. Throughout the drawings, identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary. They serve only to explain the present invention and are not to be construed as limiting it.
In the description of the present invention, unless otherwise specified and defined, the terms "mounted", "coupled", and "connected" are to be understood broadly. For example, a connection may be mechanical or electrical. It may be internal communication between two elements, and it may be direct or indirect through an intermediate medium. Those of ordinary skill in the art can understand the specific meaning of these terms according to the circumstances.
The related art has a problem: intelligent interaction devices are not very intelligent and cannot interact well with people. To solve it, the present invention uses artificial intelligence to provide a control method, a control system, and an intelligent interaction device that are highly intelligent and offer a good interaction experience. Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. AI is a branch of computer science. It attempts to understand the essence of intelligence and to produce new intelligent machines that respond in ways similar to human intelligence. Research in the field includes robotics, speech recognition, image recognition, natural language processing, and expert systems.
Artificial intelligence simulates the information processes of human consciousness and thinking. AI is not human intelligence, but it can think like a human and may even surpass human intelligence. AI is a very broad science made up of different fields, such as machine learning and computer vision. In general, one of the main goals of AI research is to enable machines to perform complex tasks that usually require human intelligence.
The artificial intelligence-based control method, control system, and intelligent interaction device according to embodiments of the present invention are described below with reference to the accompanying drawings.
FIG. 1 is a flowchart of an artificial intelligence-based method for controlling an intelligent interaction device according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
S101: Receive multi-modal input signals, which include an image signal, a sound signal, and/or a distance signal input by the user.
Specifically, the user may input the sound signal through a microphone. The image signal may be captured by a camera, and the distance signal may be acquired by an infrared distance sensor.
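One way to handle the three S101 inputs is to group them into a single record per time step. The Python sketch below is illustrative only: the class name `MultimodalInput` and its field layout are assumptions and are not taken from the disclosure. Every field is optional to match the "and/or" wording, so any subset of the signals may be present.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MultimodalInput:
    """One time step of S101 input (hypothetical layout); any field may be None."""
    image: Optional[List[List[int]]] = None    # camera frame as rows of grayscale pixels
    audio: Optional[List[List[float]]] = None  # one list of samples per microphone
    distance_m: Optional[float] = None         # infrared distance reading, in metres

    def modalities(self) -> List[str]:
        """Names of the signals actually present in this sample."""
        present = []
        if self.image is not None:
            present.append("image")
        if self.audio is not None:
            present.append("sound")
        if self.distance_m is not None:
            present.append("distance")
        return present


sample = MultimodalInput(audio=[[0.0, 0.1], [0.0, 0.05]], distance_m=0.8)
print(sample.modalities())  # ['sound', 'distance']
```

Downstream steps can check `modalities()` to decide which of S102 to S104 can run for a given time step.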
S102: Perform face detection on the image signal and, when a face is detected, acquire the face image and face information. The face information includes, but is not limited to, the face area and the degree to which the face is turned toward the device.
Specifically, for an image captured by a camera, for example, face detection can determine whether the image contains a face, how much of the image the face occupies, and whether the face is directly facing the intelligent interaction device.
Once a face is detected in the image, the face image can be cropped from the image and the face information saved.
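The patent does not say how the face area or the frontal degree is computed. The sketch below shows one possibility. It assumes a face detector that returns a bounding box `(x, y, w, h)` together with the x-coordinates of both eyes and the nose tip. The area is taken as the fraction of the frame covered by the box. The frontal degree is taken as how symmetrically the nose sits between the eyes. The function names and the symmetry heuristic are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class FaceInfo:
    area_ratio: float      # share of the frame covered by the face box
    frontal_degree: float  # 1.0 = nose centred between the eyes; falls toward 0 as the head turns


def face_info(box: Box, frame_w: int, frame_h: int,
              left_eye_x: float, right_eye_x: float, nose_x: float) -> FaceInfo:
    """Hypothetical face-information heuristic built from a box and three landmarks."""
    _, _, w, h = box
    area_ratio = (w * h) / float(frame_w * frame_h)
    d_left = nose_x - left_eye_x    # horizontal nose offset from the left eye
    d_right = right_eye_x - nose_x  # horizontal nose offset from the right eye
    span = d_left + d_right         # equals the inter-eye distance
    if span <= 0:                   # degenerate landmarks: treat as not facing the device
        return FaceInfo(area_ratio, 0.0)
    frontal = max(0.0, 1.0 - abs(d_left - d_right) / span)
    return FaceInfo(area_ratio, frontal)


def crop_face(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the face region out of a frame stored as a list of pixel rows."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]
```

For a 200 x 200 box in a 640 x 480 frame, `area_ratio` is about 0.13. A nose exactly halfway between the eyes gives a `frontal_degree` of 1.0. A downstream rule could then label a face "frontal" above some chosen cutoff; that cutoff is not specified in the source.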
S103: Perform lip region detection on the face image to determine the motion of the lip region.
Specifically, once step S102 has detected a face in the image, lip region detection can be applied to the cropped face image to detect lip motion. The result is, for example, either that the lip region is moving or that it is not.
In one embodiment of the present invention, lip motion can be determined from differences in lip shape across multiple frames of face images. For example, suppose the lip region in one frame shows the upper and lower lips closed and the lip region in the next frame shows them open. It can then be determined that the user's lips are moving, possibly because the user is speaking.
Note that the upper and lower lips may move at some moment even when the user is not speaking, for example during a yawn. Such movement should not be treated as speech-related lip motion. To avoid this misjudgment, the lip regions of consecutive frames can be compared to determine whether the lips are moving in a way that indicates speaking. Whether the user is speaking can also be determined by voice activity detection on the sound signal, that is, by determining whether the sound signal contains the user's speech. This can be implemented with the speech recognition function of artificial intelligence. When the sound signal is recognized as containing the speaker's voice, it can be determined that the user is speaking, which also avoids the above misjudgment.
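The text compares lip regions across consecutive frames but gives no criterion for doing so. The sketch below is hypothetical. It assumes a per-frame mouth-opening ratio (lip gap divided by mouth width) obtained from facial landmarks. Speech tends to open and close the mouth repeatedly, whereas a yawn opens and closes it once. The sketch therefore counts reversals in the direction of significant changes. Both threshold values are illustrative assumptions.

```python
from typing import Sequence


def lips_indicate_speech(open_ratios: Sequence[float], min_delta: float = 0.05,
                         min_reversals: int = 2) -> bool:
    """Heuristic: True if the mouth repeatedly opens and closes across consecutive frames.

    open_ratios holds one mouth-opening ratio per frame, in time order.
    """
    signs = []
    for prev, cur in zip(open_ratios, open_ratios[1:]):
        diff = cur - prev
        if abs(diff) > min_delta:  # ignore frame-to-frame jitter below the threshold
            signs.append(1 if diff > 0 else -1)
    # A reversal is an opening change followed by a closing change, or vice versa.
    reversals = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return reversals >= min_reversals


yawn = [0.1, 0.3, 0.5, 0.7, 0.7, 0.5, 0.3, 0.1]  # one slow open-close cycle
talk = [0.1, 0.4, 0.1, 0.4, 0.1, 0.4]            # rapid repeated opening and closing
print(lips_indicate_speech(yawn), lips_indicate_speech(talk))  # False True
```

In practice this visual check could be combined with the voice activity check described above, for example by requiring both before treating the lips as "moving" for S105.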
S104: Perform sound source localization on the sound signal to obtain sound source information. The sound source information includes, but is not limited to, the sound source direction and the sound intensity.
Specifically, suppose sound signals are received from multiple directions through a microphone array, for example. Sound source localization can then be performed on these signals to determine the sound source direction (that is, the sound source angle) and the sound intensity.
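The localization method is not specified. A common technique for one microphone pair is time difference of arrival (TDOA). It finds the lag that maximizes the cross-correlation of the two channels, then converts that lag to an angle with θ = arcsin(c·τ/d). Here c is the speed of sound, τ the delay, and d the microphone spacing. The sketch below makes several assumptions: it uses brute-force correlation, a two-microphone geometry, and a level reported in dBFS. A dBFS value needs microphone calibration before it can be compared with a threshold in dB SPL, such as the 50 dB example in S105.

```python
import math
from typing import Sequence

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C


def estimate_delay(left: Sequence[float], right: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) by which `right` trails `left`, via brute-force cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    # Try small lags first so that ties (e.g. silence) resolve to the smallest delay.
    for lag in sorted(range(-max_lag, max_lag + 1), key=abs):
        score = 0.0
        for i, x in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += x * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival(left: Sequence[float], right: Sequence[float],
                         sample_rate: int, mic_spacing: float) -> float:
    """Angle in degrees from broadside; positive values point toward the left microphone."""
    max_lag = int(math.ceil(mic_spacing / SPEED_OF_SOUND * sample_rate))
    tau = estimate_delay(left, right, max_lag) / sample_rate
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing))  # clamp rounding overshoot
    return math.degrees(math.asin(s))


def level_dbfs(samples: Sequence[float]) -> float:
    """RMS level relative to full scale (1.0); -inf for digital silence."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")
```

With microphones 0.1 m apart sampled at 16 kHz, a 3-sample delay maps to roughly 40 degrees toward the microphone that heard the sound first. A real array would use more microphone pairs, or a frequency-domain method, to resolve direction more robustly.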
Note that a sound signal usually contains several kinds of sound, such as speech and other noise. To localize the speaker's voice accurately, the sound signal can therefore be denoised before localization. This filters out interfering noise and improves the accuracy of localizing the speaker's voice. Specifically, it is first determined whether the sound signal contains the user's speech. If it does, the speech is retained and other interfering noise is filtered out of the signal. In this example, the step can be implemented with the speech recognition function of artificial intelligence: speech recognition identifies the speaker's voice in the sound signal, and the remaining noise is filtered out. This improves the accuracy of localizing the speaker's voice.
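The patent relies on speech recognition to keep speech and discard noise. The sketch below uses a much simpler energy-based voice activity gate as a stand-in. It splits the signal into frames, estimates a noise floor from the quietest frame, and keeps only frames well above that floor. This is a substituted technique, not the patent's method. It removes noise-only segments but not noise that overlaps speech. The frame length and energy ratio are assumptions.

```python
from typing import List, Tuple


def keep_speech_frames(samples: List[float], frame_len: int = 160,
                       ratio: float = 4.0) -> Tuple[List[float], List[bool]]:
    """Energy-gated VAD: keep frames whose mean energy exceeds ratio * noise floor.

    frame_len=160 is 10 ms at 16 kHz. Returns the retained samples and a per-frame speech mask.
    """
    if not samples:
        return [], []
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    energies = [sum(x * x for x in f) / len(f) for f in frames]
    noise_floor = min(energies)                  # quietest frame taken as background noise
    threshold = max(noise_floor * ratio, 1e-12)  # floor avoids a zero threshold on digital silence
    mask = [e > threshold for e in energies]
    kept = [x for f, is_speech in zip(frames, mask) if is_speech for x in f]
    return kept, mask


noise = [0.01, -0.01] * 80   # one 160-sample frame of low-level noise
speech = [0.5, -0.5] * 80    # one 160-sample frame standing in for speech
kept, mask = keep_speech_frames(noise + speech + noise)
print(mask)  # [False, True, False]
```

The retained frames could then be passed to a localization routine, so that only speech segments contribute to the direction estimate.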
S105: Determine the user's willingness to interact, and its strength, according to the face information, the lip motion, the sound source information, and/or the distance signal.
The willingness to interact and its strength may be determined from any one of these four inputs: the face information, the lip motion, the sound source information, or the distance signal. They may also be determined from several or all of the inputs together. A judgment based on several or all of the inputs is more accurate and reliable than one based on a single input or only a few.
Examples are as follows:
1. Suppose it is determined that the user is directly facing the intelligent interaction device, the user's lips are not moving, there is no sound above the predetermined intensity, and the user is closer to the device than the preset distance. The user is then determined to have a weak willingness to interact. The predetermined intensity can be set empirically to distinguish high-intensity sound from relatively low-intensity sound. It may be expressed in decibels, for example 50 dB, so that a sound below 50 dB is regarded as a low-intensity sound source and any other sound as a high-intensity sound source. In other examples of the present invention, the sound intensity may be replaced by a voice activity index. The preset distance can also be set empirically, for example 1 meter. In other words, suppose the user directly faces the device at close range (for example, within 1 meter), the lips are not moving, and there is no high-intensity sound source. The user is then judged to be interested in the device and to have a weak willingness to interact.
2. Suppose it is determined that the user is directly facing the device, the user's lips are moving, the user is making a sound below the predetermined intensity, and the user is closer to the device than the preset distance. The user is then determined to have a suspected willingness to interact. The predetermined intensity (for example, 50 dB) and the preset distance (for example, 1 meter) are the empirically set thresholds described in item 1. In other words, if the user directly faces the device at close range (for example, within 1 meter) and the lips are moving but there is no high-intensity sound source, a suspected willingness to interact is determined.
3. Suppose it is determined that the user is directly facing the device, the user's lips are moving, the user is making a sound above the predetermined intensity, and the user is closer to the device than the preset distance. The user is then determined to have a strong willingness to interact. The thresholds are those described in item 1 (for example, 50 dB and 1 meter), and the sound intensity may again be replaced by a voice activity index. In other words, suppose the user directly faces the device at close range (for example, within 1 meter), the lips are moving, and there is a high-intensity sound source. The user is then determined to have a strong willingness to interact.
4. Suppose it is determined that the user is turned sideways to the device (the face is seen in profile), the user is making a sound above the predetermined intensity, and the user is closer to the device than the preset distance. The user is then determined to have an accompanying willingness to interact. The thresholds are those described in item 1 (for example, 50 dB and 1 meter), and the sound intensity may be replaced by a voice activity index. In other words, if the user's face is in profile to the device at close range (for example, within 1 meter) and there is a high-intensity sound source, the user is determined to have an accompanying willingness to interact.
5. Suppose no face image is detected, the user is making a sound above the predetermined intensity, and the user is closer to the device than the preset distance. The user is then determined to have a strong suspected willingness to interact. The thresholds are those described in item 1 (for example, 50 dB and 1 meter), and the sound intensity may be replaced by a voice activity index. In other words, suppose there is a high-intensity sound source, the camera detects no face, and the user is close (for example, within 1 meter). The user is then judged to have a strong suspected willingness to interact, that is, a possible strong willingness that still needs to be confirmed.
6、当未检测到人脸图像、用户发声且声音强度大于预定强度以及用户与智能交互设备之间的距离大于预设距离时,判断用户具有弱疑似交互意愿。其中,预定强度可以根据经验确定,其目的是区别高强度的声音和相对低强度的声音,例如:预定强度可以以分贝的形式存在,预定强度例如为50分贝,当声音强度小于50分贝,则认为是低强度声源,反之则认为其为高强度声源,当然,在本发明的其它示例中,声音强度也可以用语音活动性指数来代替;预设距离也可以根据经验确定,例如:预设距离为1米。也就是是说,如果有高强度声源,检测不到人脸,距离远(如大于1米),则判定为弱疑似交互意愿(即弱疑 似交互意愿)。6. When the face image is not detected, the user utters the voice and the sound intensity is greater than the predetermined strength, and the distance between the user and the smart interaction device is greater than the preset distance, the user is judged to have a weak suspected interaction willingness. Wherein, the predetermined intensity can be determined empirically, the purpose of which is to distinguish between a high-intensity sound and a relatively low-intensity sound, for example, the predetermined intensity may exist in the form of decibels, the predetermined intensity is, for example, 50 decibels, and when the sound intensity is less than 50 decibels, It is considered to be a low-intensity sound source, but it is considered to be a high-intensity sound source. Of course, in other examples of the present invention, the sound intensity can also be replaced by a voice activity index; the preset distance can also be determined empirically, for example: The preset distance is 1 meter. That is to say, if there is a high-intensity sound source, no face can be detected, and if the distance is far (such as more than 1 meter), it is judged to be weakly suspected (ie, weakly willing) Like interactive willingness).
7、以上为各种示例情况,综合来讲,是根据输入的多个独立特征构造针对多种交互意愿的多分类器,并依据多模态输入信号的值进行综合判断,来准确判定交互意愿并做出相应的反应。7. The above are various example cases. Generally speaking, a multi-classifier for multiple interaction intentions is constructed according to multiple independent features input, and comprehensive judgment is performed according to the value of the multi-modal input signal to accurately determine the interaction intention. And respond accordingly.
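As a rough illustration, the cases above, together with the frontal-face rules for weak, suspected and strong intention set out for decision module 250, can be written as a hand-coded rule table. This is a minimal Python sketch, not the trained multi-class classifier the description refers to; the function name `classify_intention`, the `face` encoding (`"frontal"`, `"side"` or `None`) and the enum labels are hypothetical, and the 50 dB and 1 m thresholds are only the example values given in the text.

```python
from enum import Enum
from typing import Optional

class Intention(Enum):
    NONE = "none"
    WEAK = "weak"
    SUSPECTED = "suspected"
    STRONG = "strong"
    ACCOMPANYING = "accompanying"
    STRONG_SUSPECTED = "strong_suspected"
    WEAK_SUSPECTED = "weak_suspected"

# Example values from the description; both are meant to be tuned empirically.
PREDETERMINED_INTENSITY_DB = 50.0
PRESET_DISTANCE_M = 1.0

def classify_intention(face: Optional[str], lips_moving: bool, voiced: bool,
                       intensity_db: float, distance_m: float) -> Intention:
    """face is 'frontal', 'side', or None when no face was detected (hypothetical encoding)."""
    near = distance_m < PRESET_DISTANCE_M
    far = distance_m > PRESET_DISTANCE_M
    loud = voiced and intensity_db > PREDETERMINED_INTENSITY_DB
    quiet = voiced and intensity_db < PREDETERMINED_INTENSITY_DB

    if face == "frontal" and near:
        if not lips_moving and loud:
            return Intention.WEAK          # facing, lips still, loud sound
        if lips_moving and quiet:
            return Intention.SUSPECTED     # facing, speaking too softly
        if lips_moving and loud:
            return Intention.STRONG        # facing, speaking clearly
    if face == "side" and loud and near:
        return Intention.ACCOMPANYING
    if face is None and loud:
        if near:
            return Intention.STRONG_SUSPECTED
        if far:
            return Intention.WEAK_SUSPECTED
    return Intention.NONE

print(classify_intention("frontal", True, True, 62.0, 0.6))  # Intention.STRONG
```

A learned classifier would replace the hard-coded branches with a decision function over the same features; the rule form simply makes each listed case easy to check.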
S106: Control the smart interaction device to make a corresponding interaction response according to the user's interaction intention and the intensity of that intention.
For example, when a weak interaction intention is determined in the above steps, the smart interaction device can be controlled to give a silent response, such as displaying different facial expressions or performing simple mechanical motions, without producing any speech.
When a suspected interaction intention is determined in the above steps, the smart interaction device can be controlled to give a raise-volume prompt response, i.e., to prompt the user to speak louder.
When a strong interaction intention is determined in the above steps, the smart interaction device can be controlled to give a formal interaction response, i.e., to begin formally interacting with the user.
When an accompanying interaction intention is determined in the above steps, the smart interaction device can be controlled to give a voice/chat interaction response, i.e., to interact mainly through voice or chat.
When a strong suspected interaction intention is determined in the above steps, the smart interaction device can be controlled to turn toward the sound source and give a prompt response, for example by turning the microphone toward the sound source and prompting the user.
When a weak suspected interaction intention is determined in the above steps, the smart interaction device may simply be controlled to turn toward the sound source, for example by turning the microphone toward the sound source without prompting the user.
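The response policy of step S106 amounts to a lookup from intention category to device actions. The sketch below assumes hypothetical action names (`show_expression`, `prompt_raise_volume`, `turn_to_source` and so on) standing in for whatever commands a real device exposes, and a sound source azimuth in degrees as produced by sound source localization.

```python
from enum import Enum

class Intention(Enum):
    WEAK = "weak"
    SUSPECTED = "suspected"
    STRONG = "strong"
    ACCOMPANYING = "accompanying"
    STRONG_SUSPECTED = "strong_suspected"
    WEAK_SUSPECTED = "weak_suspected"

# Hypothetical device commands for each intention category.
RESPONSES = {
    Intention.WEAK: ["show_expression"],                 # silent response, no speech
    Intention.SUSPECTED: ["prompt_raise_volume"],
    Intention.STRONG: ["start_formal_interaction"],
    Intention.ACCOMPANYING: ["start_voice_chat"],
    Intention.STRONG_SUSPECTED: ["turn_to_source", "prompt_user"],
    Intention.WEAK_SUSPECTED: ["turn_to_source"],        # turn only, no prompt
}

def respond(intention: Intention, source_azimuth_deg: float) -> list:
    """Expand the action list, attaching the source direction to turn commands."""
    actions = []
    for name in RESPONSES.get(intention, []):
        if name == "turn_to_source":
            actions.append(f"turn_to_source({source_azimuth_deg:.0f})")
        else:
            actions.append(name)
    return actions

print(respond(Intention.STRONG_SUSPECTED, 30.0))  # ['turn_to_source(30)', 'prompt_user']
```

Keeping the policy in a table rather than in branching code makes it easy to change a response per scenario without touching the decision logic.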
In addition, to determine the user's interaction intention and its intensity more accurately and avoid misjudgment, in one embodiment of the invention it is first checked, before the interaction intention and its intensity are determined from the face information, the lip motion, the sound source information and/or the distance signal, whether these inputs satisfy a predetermined condition; the determination is performed only if the predetermined condition is satisfied.
Specifically, this condition check can be implemented with a timer. For example, when a face is detected frontally facing the smart interaction device, a timer is started, and only after the face has remained frontal toward the device for longer than a specific time (e.g., 3 seconds) is the user judged to be genuinely facing the device. This avoids misjudgment: a user who is merely moving their head may face the device frontally for a moment, and the timing check ignores such momentary frontal views, which reduces or even eliminates misjudgments.
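The timer-based check can be sketched as a small debounce object that confirms a frontal face only after it has persisted for longer than a hold time (3 s in the example above). The class name and the explicit timestamp argument are assumptions; on a real device `t` would typically come from `time.monotonic()`.

```python
from typing import Optional

class FrontalFaceTimer:
    """Confirm a frontal face only after it has lasted longer than hold_s seconds."""

    def __init__(self, hold_s: float = 3.0):
        self.hold_s = hold_s
        self._start: Optional[float] = None  # when the current frontal run began

    def update(self, frontal: bool, t: float) -> bool:
        if not frontal:
            self._start = None               # face turned away: restart the timer
            return False
        if self._start is None:
            self._start = t                  # frontal run begins now
        return t - self._start > self.hold_s

timer = FrontalFaceTimer(hold_s=3.0)
readings = [(0.0, True), (1.0, True), (2.0, False), (2.5, True), (5.0, True), (5.6, True)]
for t, frontal in readings:
    print(t, timer.update(frontal, t))
# Only the reading at t = 5.6 confirms the user, 3.1 s after the run restarted at 2.5.
```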
Furthermore, to further improve the accuracy of determining the user's interaction intention and its intensity, the face information and lip motion can be quantized before the determination is made from the face information, lip motion, sound source information and/or distance signal, e.g., "30% frontal toward the smart interaction device", "50% frontal toward the smart interaction device", and so on. Quantization provides a uniform standard for determining the interaction intention and its intensity, thereby improving the precision of the determination.
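The text gives only example quantized levels (30 %, 50 % frontal). One hypothetical way to obtain such levels is to map the head yaw angle reported by a face pose estimator linearly onto a 0 to 100 % frontal score and round it to fixed steps; the linear mapping, the 90° cutoff and the 10 % step are assumptions, not part of the described method.

```python
def quantize_frontal_degree(yaw_deg: float, step: int = 10) -> int:
    """Map head yaw (0 means looking straight at the device) to a stepped 0-100 % frontal score."""
    yaw = min(abs(yaw_deg), 90.0)          # beyond 90 degrees the face is treated as not frontal
    pct = (1.0 - yaw / 90.0) * 100.0       # linear falloff from 100 % to 0 %
    return int(round(pct / step) * step)   # snap to the nearest step

print(quantize_frontal_degree(0.0), quantize_frontal_degree(45.0), quantize_frontal_degree(63.0))
# 100 50 30
```

Because every cue is then reported on the same stepped scale, downstream thresholds and weights can be compared directly across users and cameras.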
In one embodiment of the invention, the method further includes adjusting the weights of the face information, the lip motion, the sound source information and/or the distance signal, where the weights influence the result of determining the user's interaction intention and its intensity; determining the user's interaction intention and its intensity then further includes making the determination according to these weights. Specifically, by adjusting the sensitivity (i.e., the weight) of each input signal, for example raising the weights of the frontal-face signal and the lip motion and lowering the weight of the sound source intensity, the user can be judged to have an interaction intention even when they only move their lips without actually making a sound. In this way the device can respond with different interaction behaviors in different scenarios, improving its interaction experience.
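The effect of the adjustable weights can be illustrated with a weighted average of normalized cues. In this sketch the cue names, the normalization to the range 0..1, the weight values and the 0.8 decision threshold are all hypothetical; it only demonstrates the behavior described above, where raising the frontal-face and lip weights and lowering the sound weight lets silent lip movement count as an interaction intention.

```python
from dataclasses import dataclass

@dataclass
class Cues:
    frontal: float  # quantized frontal degree, 0..1
    lip: float      # lip motion score, 0..1
    sound: float    # normalized sound intensity, 0..1
    near: float     # 1.0 if within the preset distance, else 0.0

def intention_score(c: Cues, w: dict) -> float:
    """Weighted average of the cues; the weights need not sum to 1."""
    total = w["frontal"] + w["lip"] + w["sound"] + w["near"]
    return (w["frontal"] * c.frontal + w["lip"] * c.lip
            + w["sound"] * c.sound + w["near"] * c.near) / total

def has_intention(c: Cues, w: dict, threshold: float = 0.8) -> bool:
    return intention_score(c, w) >= threshold

DEFAULT_WEIGHTS = {"frontal": 1.0, "lip": 1.0, "sound": 1.0, "near": 1.0}
# Scenario in which silent lip movement should still count as intention.
LIP_FOCUSED_WEIGHTS = {"frontal": 2.0, "lip": 2.0, "sound": 0.25, "near": 1.0}

silent_mouthing = Cues(frontal=1.0, lip=1.0, sound=0.0, near=1.0)
print(has_intention(silent_mouthing, DEFAULT_WEIGHTS))      # False (score 0.75)
print(has_intention(silent_mouthing, LIP_FOCUSED_WEIGHTS))  # True (score about 0.95)
```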
It should be noted that the smart interaction device may be an ordinary household appliance, an information appliance (such as a computer or a television), a video conferencing system, an intelligent robot, or the like.
The AI-based control method for a smart interaction device according to embodiments of the invention can collect the user's sound signal, image signal and/or distance signal in real time, use AI analysis to determine whether the user has an interaction intention and how strong that intention is, and then autonomously control the smart interaction device to take corresponding actions, interacting with the user proactively through a rich set of interaction means and thereby improving the user experience.
FIG. 2 is a structural block diagram of an AI-based control system for a smart interaction device according to an embodiment of the present invention.
As shown in FIG. 2, and with reference also to FIG. 3, the AI-based control system 200 for a smart interaction device according to an embodiment of the invention includes: a receiving module 210 (e.g., a camera, an infrared distance sensor and a microphone array), a face detection module 220, a lip region detection module 230, a sound source localization module 240, a decision module 250 (i.e., the decision center), and a composite output control module 260.
The receiving module 210 is configured to receive multi-modal input signals, which include an image signal, a sound signal and/or a distance signal input by the user. The face detection module 220 is configured to perform face detection on the image signal and, when a face is detected, to acquire the face image and face information. The lip region detection module 230 is configured to perform lip region detection on the face image to determine the lip motion. The sound source localization module 240 is configured to localize the sound source from the sound signal to obtain sound source information. The decision module 250 is configured to determine the user's interaction intention and its intensity according to the face information, the lip motion, the sound source information and/or the distance signal. The composite output control module 260 is configured to control the smart interaction device to make a corresponding interaction response according to the user's interaction intention and its intensity.
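The data flow between modules 210 to 260 can be sketched as a pipeline in which each module is a pluggable callable. The `Inputs` and `ControlSystem` names and the callable signatures are hypothetical; the stub lambdas in the usage part stand in for real face detection, lip detection and sound source localization models.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class Inputs:                    # what receiving module 210 delivers per step
    image: Any                   # camera frame, or None
    audio: Any                   # microphone-array samples
    distance_m: Optional[float]  # infrared distance reading

@dataclass
class ControlSystem:
    detect_face: Callable    # 220: image -> (face_image, face_info) or None
    detect_lips: Callable    # 230: face_image -> lip motion
    locate_source: Callable  # 240: audio -> sound source info
    decide: Callable         # 250: (face_info, lips, source, distance) -> intention
    output: Callable         # 260: intention -> interaction response

    def step(self, x: Inputs):
        face_info, lips = None, None
        face = self.detect_face(x.image)
        if face is not None:  # lip detection needs a face image
            face_image, face_info = face
            lips = self.detect_lips(face_image)
        source = self.locate_source(x.audio)
        intention = self.decide(face_info, lips, source, x.distance_m)
        return self.output(intention)

# Usage with stub modules.
system = ControlSystem(
    detect_face=lambda img: ("face_crop", {"frontal": 1.0}) if img is not None else None,
    detect_lips=lambda face_image: True,
    locate_source=lambda audio: {"azimuth_deg": 30.0, "db": 60.0},
    decide=lambda f, lips, src, d: "strong" if f and lips and src["db"] > 50 and d < 1.0 else "none",
    output=lambda intention: f"respond:{intention}",
)
print(system.step(Inputs(image="frame", audio=[0.0], distance_m=0.5)))  # respond:strong
print(system.step(Inputs(image=None, audio=[0.0], distance_m=0.5)))     # respond:none
```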
In one embodiment of the invention, the system further includes a voice activity detection module (not shown in FIG. 2), which, before the sound source localization module 240 localizes the sound source from the sound signal to obtain the sound source information, determines whether the sound signal contains the user's speech; if it does, the module retains the user's speech in the sound signal and filters other interfering noise out of the sound signal.
Specifically, a sound signal usually contains a mixture of sounds, such as speech and other noise. To localize the speaker's voice accurately, the sound signal can therefore be denoised before sound source localization to remove other noise interference, which subsequently improves the accuracy of localizing the speaker's voice. In detail: it is determined whether the sound signal contains the user's speech; if so, the user's speech is retained and other interfering noise is filtered out of the sound signal. In this example, this can be achieved with the speech recognition capability of the AI system: speech recognition identifies the speaker's voice in the sound signal so that other noise can be filtered out, which improves the accuracy of the subsequent localization of the speaker's voice.
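The description performs this filtering with speech recognition. As a much simpler stand-in (a swapped-in technique, not the one described), the sketch below gates a mono signal frame by frame on RMS energy, keeping loud frames as presumed speech and zeroing the rest. The 160-sample frame (10 ms at an assumed 16 kHz rate) and the 0.02 RMS threshold are hypothetical, and energy gating cannot tell speech apart from other loud noise.

```python
import math

def frame_rms(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0

def gate_speech(samples, frame_len=160, threshold=0.02):
    """Return (has_speech, gated), where frames at or below the threshold are zeroed."""
    gated, has_speech = [], False
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        if frame_rms(frame) > threshold:
            gated.extend(frame)             # keep presumed speech unchanged
            has_speech = True
        else:
            gated.extend([0.0] * len(frame))
    return has_speech, gated

quiet = [0.001] * 160
loud = [0.5, -0.5] * 80
has_speech, gated = gate_speech(quiet + loud)
print(has_speech)                  # True
print(gated[:160] == [0.0] * 160)  # True
```

Only when `has_speech` is true would the gated signal be passed on to sound source localization, mirroring the order of steps in claim 2.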
In one embodiment of the invention, the decision module 250 is further configured to determine, before determining the user's interaction intention and its intensity according to the face information, the lip motion, the sound source information and/or the distance signal, whether the face information, the lip motion, the sound source information and/or the distance signal satisfy a predetermined condition, and to perform the determination of the user's interaction intention and its intensity only if the predetermined condition is satisfied.
In one embodiment of the invention, the decision module 250 is further configured to quantize the face information and the lip motion before determining the user's interaction intention and its intensity according to the face information, the lip motion, the sound source information and/or the distance signal.
In one embodiment of the invention, the decision module 250 is further configured to adjust the weights of the face information, the lip motion, the sound source information and/or the distance signal, where the weights influence the result of determining the user's interaction intention and its intensity; determining the user's interaction intention and its intensity then includes making the determination according to the weights of the face information, the lip motion, the sound source information and/or the distance signal.
In one embodiment of the invention, the face information includes face area information and the degree to which the face is frontal toward the device, and the sound source information includes sound source direction information and sound intensity information.
In one embodiment of the invention, the decision module 250 is configured to judge that the user has a weak interaction intention when it determines that the user is directly facing the smart interaction device, the user's lips are not moving, the user is making a sound whose intensity is greater than the predetermined intensity, and the distance between the user and the smart interaction device is less than the preset distance; the composite output control module 260 is configured to control the smart interaction device to give a silent response.
In one embodiment of the invention, the decision module 250 is configured to judge that the user has a suspected interaction intention when it determines that the user is directly facing the smart interaction device, the user's lips are moving, the user is making a sound whose intensity is less than the predetermined intensity, and the distance between the user and the smart interaction device is less than the preset distance; the composite output control module 260 is configured to control the smart interaction device to give a raise-volume prompt response.
In one embodiment of the invention, the decision module 250 is configured to judge that the user has a strong interaction intention when it determines that the user is directly facing the smart interaction device, the user's lips are moving, the user is making a sound whose intensity is greater than the predetermined intensity, and the distance between the user and the smart interaction device is less than the preset distance; the composite output control module 260 is configured to control the smart interaction device to give a formal interaction response.
In one embodiment of the invention, the decision module 250 is configured to judge that the user has an accompanying interaction intention when it determines that the side of the user's face is toward the smart interaction device, the user is making a sound whose intensity is greater than the predetermined intensity, and the distance between the user and the smart interaction device is less than the preset distance; the composite output control module 260 is configured to control the smart interaction device to give a voice/chat interaction response.
In one embodiment of the invention, the decision module 250 is configured to judge that the user has a strong suspected interaction intention when no face image is detected, the user is making a sound whose intensity is greater than the predetermined intensity, and the distance between the user and the smart interaction device is less than the preset distance; the composite output control module 260 is configured to control the smart interaction device to turn toward the sound source and give a prompt response.
In one embodiment of the invention, the decision module 250 is configured to judge that the user has a weak suspected interaction intention when no face image is detected, the user is making a sound whose intensity is greater than the predetermined intensity, and the distance between the user and the smart interaction device is greater than the preset distance; the composite output control module 260 is configured to control the smart interaction device to turn toward the sound source.
In one embodiment of the invention, the lip region detection module 230 is configured to determine the lip motion from differences in lip shape across multiple frames of face images.
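One common way to quantify lip-shape differences between frames is a mouth aspect ratio (lip opening height divided by mouth width) computed from lip landmarks. The sketch assumes four hypothetical landmark points per frame (left corner, right corner, upper lip, lower lip) from some landmark detector and a hypothetical change threshold of 0.05; the description itself does not specify the shape measure.

```python
import math

def mouth_aspect_ratio(left, right, top, bottom):
    """Lip opening height divided by mouth width, from (x, y) landmark points."""
    width = math.dist(left, right)
    height = math.dist(top, bottom)
    return height / width if width else 0.0

def lips_moving(frames, threshold=0.05):
    """frames: one (left, right, top, bottom) landmark tuple per face image, in time order."""
    ratios = [mouth_aspect_ratio(*f) for f in frames]
    diffs = [abs(b - a) for a, b in zip(ratios, ratios[1:])]  # frame-to-frame shape change
    return bool(diffs) and max(diffs) > threshold

closed = ((0, 0), (10, 0), (5, 0.5), (5, -0.5))   # ratio 0.1
opened = ((0, 0), (10, 0), (5, 2.0), (5, -2.0))   # ratio 0.4
print(lips_moving([closed, closed, closed]))  # False
print(lips_moving([closed, opened, closed]))  # True
```

Normalizing by mouth width makes the measure largely independent of how far the user stands from the camera, which matters because the distance cue is evaluated separately.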
The AI-based control system for a smart interaction device according to embodiments of the invention can collect the user's sound signal, image signal and/or distance signal in real time, use AI analysis to determine whether the user has an interaction intention and how strong that intention is, and then autonomously control the smart interaction device to take corresponding actions, interacting with the user proactively through a rich set of interaction means and thereby improving the user experience.
It should be noted that the specific implementation of the AI-based control system for a smart interaction device in the embodiments of the invention is similar to that of the AI-based control method in the embodiments of the invention; see the description of the method for details, which are not repeated here to avoid redundancy.
Further, an embodiment of the invention discloses a smart interaction device including the AI-based control system for a smart interaction device according to any one of the above embodiments. The smart interaction device can collect the user's sound signal, image signal and/or distance signal in real time, use AI analysis to determine whether the user has an interaction intention and how strong that intention is, and then autonomously take corresponding actions, interacting with the user proactively through a rich set of interaction means and thereby improving the user experience.
In the description of the present invention, it should be understood that orientation or positional terms such as "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial" and "circumferential" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the invention and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they are therefore not to be construed as limiting the invention.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means at least two, for example two or three, unless expressly and specifically defined otherwise.
In the description of this specification, references to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples" and the like mean that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, where they do not contradict each other, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of those different embodiments or examples.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the invention pertain.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transport a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that parts of the invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented with any one or a combination of the following techniques known in the art: discrete logic circuits having logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gate circuits, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.
Those of ordinary skill in the art can understand that all or some of the steps of the methods of the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the invention may be integrated into one processing module, may each exist physically on their own, or two or more of them may be integrated into one module. Such an integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as a stand-alone product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that they are exemplary and are not to be construed as limiting the invention; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the invention.

Claims (29)

  1. An artificial intelligence-based control method for a smart interaction device, characterized by comprising the following steps:
    receiving multi-modal input signals, the multi-modal input signals comprising an image signal, a sound signal and/or a distance signal input by a user;
    performing face detection according to the image signal, and acquiring a face image and face information when a human face is detected;
    performing lip region detection according to the face image to determine a lip region motion condition;
    performing sound source localization according to the sound signal to obtain sound source information;
    determining the user's interaction intention and the intensity of the interaction intention according to the face information, the lip region motion condition, the sound source information and/or the distance signal; and
    controlling the smart interaction device to make a corresponding interaction response according to the user's interaction intention and the intensity of the interaction intention.
  2. The artificial intelligence-based control method for an intelligent interaction device according to claim 1, further comprising, before performing sound source localization according to the sound signal to obtain the sound source information:
    determining whether the sound signal contains speech uttered by the user;
    if so, retaining the user's speech in the sound signal and filtering out other interfering noise from the sound signal.
  3. The artificial intelligence-based control method for an intelligent interaction device according to claim 1 or 2, further comprising, before determining the user's interaction intention and the strength of the interaction intention according to the face information, the lip region motion, the sound source information, and/or the distance signal:
    determining whether the face information, the lip region motion, the sound source information, and/or the distance signal satisfy a predetermined condition;
    if the predetermined condition is satisfied, performing the determination of the user's interaction intention and the strength of the interaction intention.
  4. The artificial intelligence-based control method for an intelligent interaction device according to any one of claims 1 to 3, further comprising, before determining the user's interaction intention and the strength of the interaction intention according to the face information, the lip region motion, the sound source information, and/or the distance signal: quantizing the face information and the lip region motion.
  5. The artificial intelligence-based control method for an intelligent interaction device according to any one of claims 1 to 4, further comprising:
    adjusting weights of the face information, the lip region motion, the sound source information, and/or the distance signal, wherein the weights influence the result of determining the user's interaction intention and the strength of the interaction intention;
    wherein determining the user's interaction intention and the strength of the interaction intention further comprises:
    determining the user's interaction intention and the strength of the interaction intention according to the weights of the face information, the lip region motion, the sound source information, and/or the distance signal.
  6. The artificial intelligence-based control method for an intelligent interaction device according to any one of claims 1 to 5, wherein the face information comprises face area information and the degree to which the face frontally faces the device, and the sound source information comprises sound source direction information and sound intensity information.
  7. The artificial intelligence-based control method for an intelligent interaction device according to any one of claims 1 to 6, wherein determining the user's interaction intention and the strength of the interaction intention according to the face information, the lip region motion, the sound source information, and/or the distance signal comprises: when it is determined that the user is facing the intelligent interaction device, the user's lips are not moving, the user is vocalizing with a sound intensity greater than a predetermined intensity, and the distance between the user and the intelligent interaction device is less than a preset distance, determining that the user has a weak interaction intention;
    and controlling the intelligent interaction device to make a corresponding interaction response according to the user's interaction intention and the strength of the interaction intention comprises: controlling the intelligent interaction device to make a silent response.
  8. The artificial intelligence-based control method for an intelligent interaction device according to any one of claims 1 to 6, wherein determining the user's interaction intention and the strength of the interaction intention according to the face information, the lip region motion, the sound source information, and/or the distance signal comprises: when it is determined that the user is facing the intelligent interaction device, the user's lips are moving, the user is vocalizing with a sound intensity less than the predetermined intensity, and the distance between the user and the intelligent interaction device is less than the preset distance, determining that the user has a suspected interaction intention;
    and controlling the intelligent interaction device to make a corresponding interaction response according to the user's interaction intention and the strength of the interaction intention comprises: controlling the intelligent interaction device to make a prompt response asking for the volume to be raised.
  9. The artificial intelligence-based control method for an intelligent interaction device according to any one of claims 1 to 6, wherein determining the user's interaction intention and the strength of the interaction intention according to the face information, the lip region motion, the sound source information, and/or the distance signal comprises: when it is determined that the user is facing the intelligent interaction device, the user's lips are moving, the user is vocalizing with a sound intensity greater than the predetermined intensity, and the distance between the user and the intelligent interaction device is less than the preset distance, determining that the user has a strong interaction intention;
    and controlling the intelligent interaction device to make a corresponding interaction response according to the user's interaction intention and the strength of the interaction intention comprises: controlling the intelligent interaction device to make a formal interaction response.
  10. The artificial intelligence-based control method for an intelligent interaction device according to any one of claims 1 to 6, wherein determining the user's interaction intention and the strength of the interaction intention according to the face information, the lip region motion, the sound source information, and/or the distance signal comprises: when it is determined that the user is side-on to the intelligent interaction device (facing it in profile), the user is vocalizing with a sound intensity greater than the predetermined intensity, and the distance between the user and the intelligent interaction device is less than the preset distance, determining that the user has an accompanying interaction intention;
    and controlling the intelligent interaction device to make a corresponding interaction response according to the user's interaction intention and the strength of the interaction intention comprises: controlling the intelligent interaction device to make a voice/chat interaction response.
  11. The artificial intelligence-based control method for an intelligent interaction device according to any one of claims 1 to 6, wherein determining the user's interaction intention and the strength of the interaction intention according to the face information, the lip region motion, the sound source information, and/or the distance signal comprises: when no face image is detected, the user is vocalizing with a sound intensity greater than the predetermined intensity, and the distance between the user and the intelligent interaction device is less than the preset distance, determining that the user has a strong suspected interaction intention;
    and controlling the intelligent interaction device to make a corresponding interaction response according to the user's interaction intention and the strength of the interaction intention comprises: controlling the intelligent interaction device to turn toward the direction of the sound source and make a prompt response.
  12. The artificial intelligence-based control method for an intelligent interaction device according to any one of claims 1 to 6, wherein determining the user's interaction intention and the strength of the interaction intention according to the face information, the lip region motion, the sound source information, and/or the distance signal comprises: when no face image is detected, the user is vocalizing with a sound intensity greater than the predetermined intensity, and the distance between the user and the intelligent interaction device is greater than the preset distance, determining that the user has a weak suspected interaction intention;
    and controlling the intelligent interaction device to make a corresponding interaction response according to the user's interaction intention and the strength of the interaction intention comprises: controlling the intelligent interaction device to respond by turning toward the sound source.
  13. The artificial intelligence-based control method for an intelligent interaction device according to any one of claims 1 to 12, wherein performing lip region detection according to the face image to determine the lip region motion specifically comprises: determining the lip region motion according to differences in lip region shape between multiple frames of face images.
  14. An artificial intelligence-based control system for an intelligent interaction device, comprising:
    a receiving module, configured to receive multi-modal input signals, the multi-modal input signals comprising an image signal, a sound signal, and/or a distance signal input by a user;
    a face detection module, configured to perform face detection according to the image signal and, when a human face is detected, acquire the face image and face information;
    a lip region detection module, configured to perform lip region detection according to the face image to determine the lip region motion;
    a sound source localization module, configured to perform sound source localization according to the sound signal to obtain sound source information;
    a decision module, configured to determine the user's interaction intention and the strength of the interaction intention according to the face information, the lip region motion, the sound source information, and/or the distance signal; and
    a composite output control module, configured to control the intelligent interaction device to make a corresponding interaction response according to the user's interaction intention and the strength of the interaction intention.
  15. The artificial intelligence-based control system for an intelligent interaction device according to claim 14, further comprising:
    a voice activity detection module, configured to determine, before the sound source localization module performs sound source localization according to the sound signal to obtain the sound source information, whether the sound signal contains speech uttered by the user and, if so, to retain the user's speech in the sound signal and filter out other interfering noise from the sound signal.
  16. The artificial intelligence-based control system for an intelligent interaction device according to claim 14 or 15, wherein the decision module is further configured to determine, before determining the user's interaction intention and the strength of the interaction intention according to the face information, the lip region motion, the sound source information, and/or the distance signal, whether the face information, the lip region motion, the sound source information, and/or the distance signal satisfy a predetermined condition, and, if the predetermined condition is satisfied, to perform the determination of the user's interaction intention and the strength of the interaction intention.
  17. The artificial intelligence-based control system for an intelligent interaction device according to any one of claims 14 to 16, wherein the decision module is further configured to quantize the face information and the lip region motion before determining the user's interaction intention and the strength of the interaction intention according to the face information, the lip region motion, the sound source information, and/or the distance signal.
  18. The artificial intelligence-based control system for an intelligent interaction device according to any one of claims 14 to 17, wherein the decision module is further configured to:
    adjust weights of the face information, the lip region motion, the sound source information, and/or the distance signal, wherein the weights influence the result of determining the user's interaction intention and the strength of the interaction intention;
    wherein determining the user's interaction intention and the strength of the interaction intention comprises:
    determining the user's interaction intention and the strength of the interaction intention according to the weights of the face information, the lip region motion, the sound source information, and/or the distance signal.
  19. The artificial intelligence-based control system for an intelligent interaction device according to any one of claims 14 to 18, wherein the face information comprises face area information and the degree to which the face frontally faces the device, and the sound source information comprises sound source direction information and sound intensity information.
  20. The artificial intelligence-based control system for an intelligent interaction device according to any one of claims 14 to 19, wherein the decision module is configured to: when it is determined that the user is facing the intelligent interaction device, the user's lips are not moving, the user is vocalizing with a sound intensity greater than a predetermined intensity, and the distance between the user and the intelligent interaction device is less than a preset distance, determine that the user has a weak interaction intention;
    and the composite output control module is configured to control the intelligent interaction device to make a silent response.
  21. The artificial intelligence-based control system for an intelligent interaction device according to any one of claims 14 to 19, wherein the decision module is configured to: when it is determined that the user is facing the intelligent interaction device, the user's lips are moving, the user is vocalizing with a sound intensity less than the predetermined intensity, and the distance between the user and the intelligent interaction device is less than the preset distance, determine that the user has a suspected interaction intention;
    and the composite output control module is configured to control the intelligent interaction device to make a prompt response asking for the volume to be raised.
  22. The artificial intelligence-based control system for an intelligent interaction device according to any one of claims 14 to 19, wherein the decision module is configured to: when it is determined that the user is facing the intelligent interaction device, the user's lips are moving, the user is vocalizing with a sound intensity greater than the predetermined intensity, and the distance between the user and the intelligent interaction device is less than the preset distance, determine that the user has a strong interaction intention;
    and the composite output control module is configured to control the intelligent interaction device to make a formal interaction response.
  23. The artificial intelligence-based control system for an intelligent interaction device according to any one of claims 14 to 19, wherein the decision module is configured to: when it is determined that the user is side-on to the intelligent interaction device (facing it in profile), the user is vocalizing with a sound intensity greater than the predetermined intensity, and the distance between the user and the intelligent interaction device is less than the preset distance, determine that the user has an accompanying interaction intention;
    and the composite output control module is configured to control the intelligent interaction device to make a voice/chat interaction response.
  24. The artificial intelligence-based control system for an intelligent interaction device according to any one of claims 14 to 19, wherein the decision module is configured to: when no face image is detected, the user is vocalizing with a sound intensity greater than the predetermined intensity, and the distance between the user and the intelligent interaction device is less than the preset distance, determine that the user has a strong suspected interaction intention;
    and the composite output control module is configured to control the intelligent interaction device to turn toward the direction of the sound source and make a prompt response.
  25. The artificial intelligence-based control system for an intelligent interaction device according to any one of claims 14 to 19, wherein the decision module is configured to: when no face image is detected, the user is vocalizing with a sound intensity greater than the predetermined intensity, and the distance between the user and the intelligent interaction device is greater than the preset distance, determine that the user has a weak suspected interaction intention;
    and the composite output control module is configured to control the intelligent interaction device to respond by turning toward the sound source.
  26. The artificial intelligence-based control system for an intelligent interaction device according to any one of claims 14 to 25, wherein the lip region detection module is configured to determine the lip region motion according to differences in lip region shape between multiple frames of face images.
  27. An intelligent interaction device, comprising the artificial intelligence-based control system for an intelligent interaction device according to any one of claims 14 to 26.
  28. An apparatus, comprising:
    one or more processors;
    a memory; and
    one or more programs stored in the memory which, when executed by the one or more processors, perform the artificial intelligence-based control method for an intelligent interaction device according to any one of claims 1 to 13.
  29. A non-volatile computer storage medium storing one or more programs which, when executed by a device, cause the device to perform the artificial intelligence-based control method for an intelligent interaction device according to any one of claims 1 to 13.
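The rule set of claims 7 to 12 (mirrored by system claims 20 to 25), together with the frame-to-frame lip test of claim 13, can be illustrated with a small decision function. This is a minimal sketch under stated assumptions, not the applicant's implementation. The claims give no numeric values for the "predetermined intensity" or the "preset distance", so `INTENSITY_THRESHOLD` and `DISTANCE_THRESHOLD` are hypothetical, as are the names `Observation`, `Intention`, `classify`, `detect_lip_motion`, and `RESPONSES`, the "frontal"/"profile" orientation labels, and the landmark-vector format for lip shapes. Returning `UNDETERMINED` when no rule matches is also an assumption, because the claims do not say what happens in that case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class Intention(Enum):
    WEAK = "weak"
    SUSPECTED = "suspected"
    STRONG = "strong"
    ACCOMPANYING = "accompanying"
    STRONG_SUSPECTED = "strong suspected"
    WEAK_SUSPECTED = "weak suspected"
    UNDETERMINED = "undetermined"  # assumption: no rule in claims 7-12 matches


# Interaction response for each intention, following claims 7-12 / 20-25.
RESPONSES = {
    Intention.WEAK: "silent response",
    Intention.SUSPECTED: "prompt to raise the volume",
    Intention.STRONG: "formal interaction response",
    Intention.ACCOMPANYING: "voice/chat interaction response",
    Intention.STRONG_SUSPECTED: "turn toward sound source and prompt",
    Intention.WEAK_SUSPECTED: "turn toward sound source",
    Intention.UNDETERMINED: "no response",
}

# Hypothetical values: the claims name a "predetermined intensity" and a
# "preset distance" but give no numbers or units.
INTENSITY_THRESHOLD = 60.0  # e.g. dB
DISTANCE_THRESHOLD = 1.5    # e.g. metres


@dataclass
class Observation:
    facing: Optional[str]  # "frontal", "profile", or None if no face was detected
    lips_moving: bool
    voiced: bool           # the user is vocalizing
    intensity: float
    distance: float


def detect_lip_motion(lip_shapes: Sequence[Sequence[float]], threshold: float = 0.1) -> bool:
    """Claim 13 sketch: report motion when the lip shape changes between frames.

    Each shape is a flat list of lip-landmark coordinates (hypothetical format).
    """
    for prev, cur in zip(lip_shapes, lip_shapes[1:]):
        mean_abs_diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if mean_abs_diff > threshold:
            return True
    return False


def classify(obs: Observation) -> Intention:
    """Rule table of claims 7-12; every rule requires the user to be vocalizing."""
    if not obs.voiced:
        return Intention.UNDETERMINED
    loud = obs.intensity > INTENSITY_THRESHOLD
    quiet = obs.intensity < INTENSITY_THRESHOLD
    near = obs.distance < DISTANCE_THRESHOLD
    far = obs.distance > DISTANCE_THRESHOLD

    if obs.facing is None:                           # claims 11 and 12: no face image
        if loud and near:
            return Intention.STRONG_SUSPECTED
        if loud and far:
            return Intention.WEAK_SUSPECTED
    elif obs.facing == "frontal" and near:           # claims 7, 8 and 9
        if not obs.lips_moving and loud:
            return Intention.WEAK
        if obs.lips_moving and quiet:
            return Intention.SUSPECTED
        if obs.lips_moving and loud:
            return Intention.STRONG
    elif obs.facing == "profile" and loud and near:  # claim 10
        return Intention.ACCOMPANYING
    return Intention.UNDETERMINED
```

For example, `classify(Observation("frontal", True, True, 70.0, 0.8))` returns `Intention.STRONG`, which `RESPONSES` maps to the formal interaction response of claim 9. Writing the claims as explicit, non-overlapping rules keeps each branch traceable to a single claim.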
PCT/CN2015/096587 2015-08-24 2015-12-07 Artificial intelligence-based control method and system for intelligent interaction device WO2017031860A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510523179.3A CN105159111B (en) 2015-08-24 2015-08-24 Intelligent interaction device control method and system based on artificial intelligence
CN201510523179.3 2015-08-24

Publications (1)

Publication Number Publication Date
WO2017031860A1 true WO2017031860A1 (en) 2017-03-02

Family

ID=54799999

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/096587 WO2017031860A1 (en) 2015-08-24 2015-12-07 Artificial intelligence-based control method and system for intelligent interaction device

Country Status (2)

Country Link
CN (1) CN105159111B (en)
WO (1) WO2017031860A1 (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912128B (en) * 2016-04-29 2019-05-24 北京光年无限科技有限公司 Multi-modal interaction data processing method and device towards intelligent robot
CN106055105A (en) * 2016-06-02 2016-10-26 上海慧模智能科技有限公司 Robot and man-machine interactive system
CN107643509B (en) * 2016-07-22 2019-01-11 腾讯科技(深圳)有限公司 Localization method, positioning system and terminal device
CN106231234B (en) * 2016-08-05 2019-07-05 广州小百合信息技术有限公司 The image pickup method and system of video conference
CN107273944A (en) * 2017-05-16 2017-10-20 北京元视觉科技有限公司 Autonomous social smart machine, autonomous exchange method and storage medium
CN107404682B (en) * 2017-08-10 2019-11-05 京东方科技集团股份有限公司 A kind of intelligent earphone
CN109767774A (en) * 2017-11-08 2019-05-17 阿里巴巴集团控股有限公司 A kind of exchange method and equipment
CN109087636A (en) * 2017-12-15 2018-12-25 蔚来汽车有限公司 Interactive device
CN108388594A (en) * 2018-01-31 2018-08-10 上海乐愚智能科技有限公司 It wears the clothes reminding method and intelligent appliance
CN108388138A (en) * 2018-02-02 2018-08-10 宁夏玲杰科技有限公司 Apparatus control method, apparatus and system
CN108461084A (en) * 2018-03-01 2018-08-28 广东美的制冷设备有限公司 Speech recognition system control method, control device and computer readable storage medium
CN108957392A * 2018-04-16 2018-12-07 深圳市沃特沃德股份有限公司 Sound source direction estimation method and device
CN110634486A (en) * 2018-06-21 2019-12-31 阿里巴巴集团控股有限公司 Voice processing method and device
CN109035968B (en) * 2018-07-12 2020-10-30 杜蘅轩 Piano learning auxiliary system and piano
CN109166575A (en) * 2018-07-27 2019-01-08 百度在线网络技术(北京)有限公司 Exchange method, device, smart machine and the storage medium of smart machine
CN110875060A (en) * 2018-08-31 2020-03-10 阿里巴巴集团控股有限公司 Voice signal processing method, device, system, equipment and storage medium
CN111230891B (en) * 2018-11-29 2021-07-27 深圳市优必选科技有限公司 Robot and voice interaction system thereof
CN109803013B (en) * 2019-01-21 2020-10-23 浙江大学 Weak interaction system based on artificial intelligence and control method thereof
CN111724772A (en) * 2019-03-20 2020-09-29 阿里巴巴集团控股有限公司 Interaction method and device of intelligent equipment and intelligent equipment
CN110187766A (en) * 2019-05-31 2019-08-30 北京猎户星空科技有限公司 A kind of control method of smart machine, device, equipment and medium
CN110309799B (en) * 2019-07-05 2022-02-08 四川长虹电器股份有限公司 Camera-based speaking judgment method
CN110335603A (en) * 2019-07-12 2019-10-15 四川长虹电器股份有限公司 Multi-modal exchange method applied to tv scene
CN111091823A (en) * 2019-11-28 2020-05-01 广州赛特智能科技有限公司 Robot control system and method based on voice and human face actions and electronic equipment
CN112102546A (en) * 2020-08-07 2020-12-18 浙江大华技术股份有限公司 Man-machine interaction control method, talkback calling method and related device
CN111933136A (en) * 2020-08-18 2020-11-13 南京奥拓电子科技有限公司 Auxiliary voice recognition control method and device
CN113608449B (en) * 2021-08-18 2023-09-15 四川启睿克科技有限公司 Speech equipment positioning system and automatic positioning method in smart home scene

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008126329A (en) * 2006-11-17 2008-06-05 Toyota Motor Corp Voice recognition robot and its control method
JP2008152125A (en) * 2006-12-19 2008-07-03 Toyota Central R&D Labs Inc Utterance detection device and utterance detection method
CN102298443A (en) * 2011-06-24 2011-12-28 华南理工大学 Smart home voice control system combined with video channel and control method thereof
CN102360187A (en) * 2011-05-25 2012-02-22 吉林大学 Chinese speech control system and method with mutually interrelated spectrograms for driver
CN103745723A (en) * 2014-01-13 2014-04-23 苏州思必驰信息科技有限公司 Method and device for identifying audio signal

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6964023B2 (en) * 2001-02-05 2005-11-08 International Business Machines Corporation System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input
JP5911796B2 (en) * 2009-04-30 2016-04-27 サムスン エレクトロニクス カンパニー リミテッド User intention inference apparatus and method using multimodal information
KR101568347B1 (en) * 2011-04-12 2015-11-12 한국전자통신연구원 Computing device with robotic functions and operating method for the same
CA2904359A1 (en) * 2013-03-15 2014-09-25 JIBO, Inc. Apparatus and methods for providing a persistent companion device
CN104777910A (en) * 2015-04-23 2015-07-15 福州大学 Method and system for applying expression recognition to display device

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657852A (en) * 2017-11-14 2018-02-02 翟奕雲 Childhood teaching machine people based on recognition of face, tutoring system, storage medium
CN107657852B (en) * 2017-11-14 2023-09-22 翟奕雲 Infant teaching robot, teaching system and storage medium based on face recognition
CN111124109A (en) * 2019-11-25 2020-05-08 北京明略软件系统有限公司 Interactive mode selection method, intelligent terminal, equipment and storage medium
CN111694433A (en) * 2020-06-11 2020-09-22 北京百度网讯科技有限公司 Voice interaction method and device, electronic equipment and storage medium
CN111880854A (en) * 2020-07-29 2020-11-03 百度在线网络技术(北京)有限公司 Method and apparatus for processing speech
CN111880854B (en) * 2020-07-29 2024-04-30 百度在线网络技术(北京)有限公司 Method and device for processing voice
CN114329654A (en) * 2022-03-15 2022-04-12 深圳英鸿骏智能科技有限公司 Interactive display method and system based on intelligent mirror
CN114329654B (en) * 2022-03-15 2022-05-20 深圳英鸿骏智能科技有限公司 Interactive display method and system based on intelligent mirror

Also Published As

Publication number Publication date
CN105159111A (en) 2015-12-16
CN105159111B (en) 2019-01-25

Similar Documents

Publication Publication Date Title
WO2017031860A1 (en) Artificial intelligence-based control method and system for intelligent interaction device
US10467509B2 (en) Computationally-efficient human-identifying smart assistant computer
CN107077847B (en) Enhancement of key phrase user identification
US10019992B2 (en) Speech-controlled actions based on keywords and context thereof
US9881610B2 (en) Speech recognition system adaptation based on non-acoustic attributes and face selection based on mouth motion using pixel intensities
CN112074901A (en) Speech recognition login
US11699442B2 (en) Methods and systems for speech detection
KR102230667B1 (en) Method and apparatus for speaker diarisation based on audio-visual data
CN110808048A (en) Voice processing method, device, system and storage medium
WO2014209262A1 (en) Speech detection based upon facial movements
JP6562790B2 (en) Dialogue device and dialogue program
US10325600B2 (en) Locating individuals using microphone arrays and voice pattern matching
KR20100086262A (en) Robot and control method thereof
KR20200085696A (en) Method of processing video for determining emotion of a person
JP6891601B2 (en) Robot control programs, robot devices, and robot control methods
CN111326152A (en) Voice control method and device
US20220335937A1 (en) Acoustic zoning with distributed microphones
CN115461811A (en) Multi-modal beamforming and attention filtering for multi-party interaction
US11743588B1 (en) Object selection in computer vision
CN114449320A (en) Playing control method and device, storage medium and electronic equipment
EP3839719B1 (en) Computing device and method of operating the same
US20210392427A1 (en) Systems and Methods for Live Conversation Using Hearing Devices
JP2022147989A (en) Utterance control device, utterance control method and utterance control program
CN117116250A (en) Voice interaction refusing method and device

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application; Ref document number: 15902129; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase; Ref country code: DE
122 Ep: PCT application non-entry in European phase; Ref document number: 15902129; Country of ref document: EP; Kind code of ref document: A1