WO2017035768A1 - Voice control method based on visual wake-up - Google Patents


Info

Publication number
WO2017035768A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
voice control
control device
image
voice signal
Application number
PCT/CN2015/088723
Other languages
French (fr)
Chinese (zh)
Inventor
涂悦
Original Assignee
涂悦
Application filed by 涂悦
Priority to PCT/CN2015/088723
Publication of WO2017035768A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • FIG. 4 is a flowchart of the visual wake-up based voice control method of the present invention using the voice control device shown in FIG. 3. As shown in FIG. 4, the method includes:
  • Step 1: after the voice receiving unit 1 of the voice control device has received at least part of the voice signal, for example just after the first 1-2 syllables, the image receiving unit 21 is started. At the same time, the voice recognition unit 2 determines the source direction of the voice signal from the partial voice signal received.
  • Step 2: the image receiving unit 21 acquires an image and transmits it to the image recognition unit 22. If the source direction of the voice signal was determined in step 1, the processing unit 23 turns the rotatable camera toward that direction and acquires an image of the corresponding region. If the source direction could not be determined in step 1, the processing unit 23 rotates the camera within its maximum rotation range, i.e. acquires images of the entire area, until a face whose line of sight is directed toward the voice control device is detected in the image.
  • Step 1.1: the processing unit 23 turns the rotatable camera toward the source direction of the voice signal and acquires an image of the corresponding region;
  • Step 1.2: the image recognition unit 22 analyzes the acquired image to determine whether it contains a face and, if so, whether the line of sight of that face is directed toward the voice control device. If yes, proceed to step 3 below; if not, return to step 1.1.
  • Step 2.1: the processing unit 23 rotates the rotatable camera within its maximum rotation range, acquires images containing faces over the entire area, and stops the rotation of the camera once the search is complete;
  • Step 2.2: the image recognition unit 22 analyzes the acquired images of faces and determines whether the line of sight of a face is directed toward the voice control device. If yes, proceed to step 3 below; if not, return to step 2.1.
  • Step 3: the image recognition unit 22 recognizes the image. If the source direction of the voice signal was determined in step 1 and a face whose line of sight is directed toward the voice control device is detected in the image acquired from the corresponding region, the image recognition unit 22 sends this recognition result to the processing unit 23, which wakes the voice control device.
  • The processing unit 23 then activates the voice recognition unit 2, which receives the complete voice signal and recognizes it; the voice recognition unit 2 sends the recognition result to the processing unit 23, which causes the voice control device to reply to the voice signal.
  • This case can also be handled in more detail, as shown in FIG. 4, by determining the point in time at which the voice recognition unit 2 has received the complete voice signal, specifically:
  • if, when the image recognition unit 22 confirms that an image of a face whose line of sight is directed toward the voice control device has been acquired, the voice recognition unit 2 has already received the complete voice signal (i.e. the speech has stopped), the voice recognition unit 2 recognizes the received voice signal;
  • if, when the image recognition unit 22 confirms that such an image has been acquired, the voice recognition unit 2 has not yet received the complete voice signal (i.e. the speech has not stopped), the processing unit 23 causes the image recognition unit 22 to judge whether the face in the acquired image is speaking.
  • The processing unit 23 keeps the camera aimed at the face until the voice signal has been received.
  • The voice recognition unit 2 then receives the complete voice signal, recognizes it, and sends the recognition result to the processing unit 23, which causes the voice control device to reply to the voice signal;
  • The processing unit 23 controls the rotatable camera to rotate within its maximum rotation range and capture images of the entire area.
  • The image recognition unit 22 sends the recognition result to the processing unit 23, which wakes the voice control device.
  • The processing unit 23 keeps the camera aimed at the face until the voice signal has been received, and then activates the voice recognition unit 2, which receives the complete voice signal, recognizes it, and sends the recognition result to the processing unit 23; the processing unit 23 then causes the voice control device to reply to the voice signal.
  • The image recognition unit 22 sends the recognition result to the processing unit 23, which wakes the voice control device. The processing unit 23 then activates the voice recognition unit 2, which receives the complete voice signal and recognizes it.
  • If the voice recognition unit 2 can recognize the voice signal, it sends the recognition result to the processing unit 23, and if the processing unit 23 can correctly understand the result (for example, it matches one of the built-in operation instructions), the processing unit 23 causes the voice control device to reply to the voice signal. If the voice recognition unit 2 cannot recognize the voice signal, the processing unit 23 causes the voice control device not to reply.
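The final check in the FIG. 4 flow, replying only when the recognition result is "correctly understood", can be illustrated as a lookup against an instruction set. This is a minimal Python sketch, assuming the voice recognition unit returns plain text (or `None` when it cannot recognize the signal); the phrases in `INSTRUCTIONS` and the normalization rule are hypothetical examples, not the device's actual instruction set.

```python
from typing import Callable, Dict, Optional

# Hypothetical built-in operation instruction set: normalized phrase -> action.
INSTRUCTIONS: Dict[str, Callable[[], str]] = {
    "turn on the light": lambda: "light on",
    "turn off the light": lambda: "light off",
    "what time is it": lambda: "reporting the time",
}


def respond(recognized: Optional[str]) -> Optional[str]:
    """Return the device's reply, or None when it should stay silent."""
    if recognized is None:
        # The voice recognition unit could not recognize the signal: no reply.
        return None
    # Light normalization: trim, lowercase, drop trailing punctuation.
    key = recognized.strip().lower().rstrip("?.!")
    action = INSTRUCTIONS.get(key)
    # Text that matches no instruction is treated as "not understood": no reply.
    return action() if action else None
```

Exact matching after light normalization is the simplest reading of "matching one of the built-in operation instructions"; a real device might use a more tolerant intent classifier, which the text does not describe.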

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

Provided is a voice control method based on visual wake-up, used to wake up a voice controlled device so that the device responds to a voice signal it receives. The voice control method of the present invention comprises: upon receiving at least part of a voice signal, the voice controlled device starts an image receiving unit mounted on it; the image receiving unit acquires an image and transmits it to an image recognition unit; the image recognition unit recognizes the image, and when a human face whose line of sight is directed toward the voice controlled device is detected in the image, the voice controlled device is woken up to recognize the voice signal. The present invention wakes up a voice recognition unit by means of a visual wake-up function that searches for a human face whose line of sight is directed toward the voice controlled device; this better matches users' everyday voice interaction habits and makes the device more convenient and intelligent to use.

Description

A voice control method based on visual wake-up
Technical field
The present invention relates to the field of intelligent control technology, and in particular to a voice control method based on visual wake-up.
Background art
With the development of technology, control is moving from manual operation to voice operation, and intelligent voice technology is gradually spreading into televisions, home appliances, automobiles, wearable devices and other fields; more and more devices support voice control. Future smart homes are likely to be based entirely or largely on voice control.
FIG. 1 shows the structure of a typical voice control device, which comprises a voice receiving unit 1, typically a microphone, a voice recognition unit 2 and a processing unit 3. The voice recognition unit 2 acquires the voice signal from the voice receiving unit 1, recognizes it, and sends the recognition result to the processing unit 3, which instructs the voice control device to execute the command corresponding to the voice signal.
When a user controls several nearby voice control devices such as the one shown in FIG. 1, voice wake-up is an important function of the voice interaction with these devices. The reason is clear: to treat the devices separately, so that a command reaches exactly one intended device without affecting the others, it is a necessary precondition that only that device is woken up to receive the command. At present, voice wake-up of a voice control device is generally based on a wake-up word, such as the device's name or code name.
However, current voice wake-up has several inherent defects. For example, if the user says a word that is the same as or similar to the wake-up word, the device is woken up even though the user did not intend to wake it. In addition, the user has to say the wake-up word every time the device is to be woken, which makes for a poor user experience.
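To make the false-wake defect concrete, the sketch below models a wake-word check as fuzzy text matching with Python's standard `difflib.SequenceMatcher`. It is a hypothetical simplification: real wake-word detectors work on acoustic features rather than transcripts, and the wake word `jarvis` and the 0.8 threshold are invented for illustration.

```python
from difflib import SequenceMatcher

WAKE_WORD = "jarvis"  # hypothetical device name used as the wake-up word
THRESHOLD = 0.8       # hypothetical similarity threshold


def is_wake_word(token: str, wake_word: str = WAKE_WORD,
                 threshold: float = THRESHOLD) -> bool:
    """Return True if a transcribed token is 'close enough' to the wake-up word."""
    ratio = SequenceMatcher(None, token.lower(), wake_word).ratio()
    return ratio >= threshold


def wakes_device(transcript: str) -> bool:
    """The device wakes if any token in the utterance matches the wake-up word."""
    return any(is_wake_word(tok) for tok in transcript.split())
```

With these settings, "Jarvis turn on the lights" wakes the device as intended, but so does a passing mention of someone named "Jervis" (similarity about 0.83), which is the kind of unintended wake-up described above.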
People habitually look at whatever they are talking to, so a user who controls a voice control device by voice also tends to look at that device. Compared with current voice wake-up, determining which device to wake by detecting the user's gaze therefore fits the user's everyday experience better.
Accordingly, those skilled in the art are committed to developing a voice control method based on visual wake-up that wakes the target device more intelligently.
Summary of the invention
To achieve the above object, the present invention provides a voice control method based on visual wake-up, used to wake up a voice control device so that the voice control device replies to a voice signal it receives, characterized in that the voice control method comprises:
Step 1: after receiving at least part of the voice signal, the voice control device starts an image receiving unit mounted on it;
Step 2: the image receiving unit acquires an image and transmits it to an image recognition unit;
Step 3: the image recognition unit recognizes the image, and when a face whose line of sight is directed toward the voice control device is detected in the image, the voice control device is woken up to recognize the voice signal.
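The three steps can be sketched as a small controller object. This is a minimal Python sketch, not the patent's implementation: `Camera`, `GazeDetector` and `Recognizer` are hypothetical interfaces standing in for the image receiving unit, the image recognition unit and the voice recognition unit, and the stub classes at the end exist only so the example runs.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


class Camera(Protocol):        # image receiving unit (hypothetical interface)
    def capture(self) -> object: ...


class GazeDetector(Protocol):  # image recognition unit (hypothetical interface)
    def face_looking_at_device(self, image: object) -> bool: ...


class Recognizer(Protocol):    # voice recognition unit (hypothetical interface)
    def recognize(self, audio: List[float]) -> Optional[str]: ...


@dataclass
class VoiceControlDevice:
    camera: Camera
    gaze: GazeDetector
    recognizer: Recognizer
    camera_on: bool = False
    awake: bool = False

    def on_partial_speech(self) -> None:
        """Step 1: part of a voice signal has arrived, so start the camera."""
        self.camera_on = True

    def check_gaze(self) -> bool:
        """Steps 2-3: capture an image and wake only if a face is looking at the device."""
        if not self.camera_on:
            return False
        self.awake = self.gaze.face_looking_at_device(self.camera.capture())
        return self.awake

    def handle_speech(self, audio: List[float]) -> Optional[str]:
        """Only an awake device passes the voice signal on for recognition."""
        return self.recognizer.recognize(audio) if self.awake else None


# Stub units, for illustration only.
class FixedCamera:
    def __init__(self, image: str) -> None:
        self.image = image

    def capture(self) -> str:
        return self.image


class LabelGaze:
    def face_looking_at_device(self, image: object) -> bool:
        return image == "face looking at device"


class CountingRecognizer:
    def recognize(self, audio: List[float]) -> Optional[str]:
        return f"{len(audio)} samples recognized"
```

For example, a device built from `FixedCamera("face looking at device")`, `LabelGaze()` and `CountingRecognizer()` wakes after `on_partial_speech()` and `check_gaze()`, while the same device given an image of an empty room stays asleep and returns `None` for any speech.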
Optionally, the image receiving unit is a camera.
Further, the camera is a wide-angle camera.
Optionally, the image receiving unit is a rotatable camera, the rotatable camera comprises a pan/tilt head, and the pan/tilt head is mounted on the voice control device.
Further, the pan/tilt head is 2-axis driven.
Further, step 1 comprises: the voice control device determines the source direction of the voice signal from the at least part of the voice signal it has received; when the voice control device can determine the source direction, it instructs the camera to turn toward that direction and acquire an image; when it cannot determine the source direction, it instructs the camera to rotate within its maximum rotation range and acquire images.
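The branching in step 1 (turn toward the source if its direction is known, otherwise sweep the whole range) can be sketched as a generator of the pan angles at which the camera captures images. Only the horizontal axis is modeled, and the ±170° limit and 30° sweep step are hypothetical values; the patent gives no figures.

```python
from typing import Iterator, Optional

MAX_PAN_DEG = 170.0     # hypothetical mechanical limit in each direction
SWEEP_STEP_DEG = 30.0   # hypothetical angle between captured frames in a sweep


def clamp(angle: float, limit: float = MAX_PAN_DEG) -> float:
    """Keep a requested pan angle inside the camera's rotation range."""
    return max(-limit, min(limit, angle))


def pan_targets(source_azimuth: Optional[float]) -> Iterator[float]:
    """Yield the pan angles at which the camera should capture images.

    If the voice source direction is known, the camera turns only toward it;
    otherwise it sweeps its whole range from one limit toward the other.
    """
    if source_azimuth is not None:
        yield clamp(source_azimuth)
        return
    angle = -MAX_PAN_DEG
    while angle <= MAX_PAN_DEG:
        yield angle
        angle += SWEEP_STEP_DEG
```

A caller would capture and analyze one image per yielded angle and stop early once a face looking at the device is found.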
Further, step 3 comprises:
when the voice control device can determine the source direction of the voice signal and the image recognition unit detects in the image a face whose line of sight is directed toward the voice control device, the voice control device recognizes the voice signal once it has been completely received, and replies;
when the voice control device cannot determine the source direction of the voice signal: if the image recognition unit detects in the image a face whose line of sight is directed toward the voice control device, the face is speaking, and the voice signal has not yet been completely received, the voice control device recognizes the voice signal once it has been completely received, and replies; if the image recognition unit detects in the image a face whose line of sight is directed toward the voice control device, the face is not speaking, and the voice signal has already been completely received, the voice control device recognizes the voice signal and replies, and does not reply if it cannot recognize the voice signal.
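The case analysis in step 3 can be condensed into one decision function. This Python sketch is hedged in two ways: the boolean inputs and the `Action` names are hypothetical labels for the conditions described in the text, and the `KEEP_SEARCHING` outcome covers combinations the text does not address (for example, a face that looks at the device while someone else is still talking), which is an assumption rather than part of the claimed method.

```python
from enum import Enum


class Action(Enum):
    STAY_ASLEEP = "no face is looking at the device; do not wake"
    WAIT_THEN_RECOGNIZE = "wait for the end of speech, then recognize and reply"
    RECOGNIZE_NOW = "speech is complete; recognize it and reply if recognizable"
    KEEP_SEARCHING = "combination not covered by the text; keep looking (assumption)"


def step_three(direction_known: bool, gaze_detected: bool,
               face_speaking: bool, speech_complete: bool) -> Action:
    if not gaze_detected:
        return Action.STAY_ASLEEP
    if direction_known:
        # The face found in the source direction is taken to be the speaker.
        return Action.RECOGNIZE_NOW if speech_complete else Action.WAIT_THEN_RECOGNIZE
    # Direction unknown: lip movement and speech state link the face to the voice.
    if face_speaking and not speech_complete:
        return Action.WAIT_THEN_RECOGNIZE
    if not face_speaking and speech_complete:
        return Action.RECOGNIZE_NOW
    return Action.KEEP_SEARCHING
```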
Further, when no face whose line of sight is directed toward the voice control device is detected in the image in step 3, the voice control device is not woken up.
Further, the voice control device receives the voice signal through a voice receiving unit and recognizes it through a voice recognition unit.
Further, the voice receiving unit is a microphone.
In the voice control method based on visual wake-up of the present invention, the voice control device starts a visual wake-up function when it begins to receive a voice signal from a user: using the image receiving unit and the image recognition unit, it searches the source direction of the voice signal, or the entire area, for a face whose line of sight is directed toward the voice control device, and thereby decides whether to wake up. The woken voice control device recognizes the received voice signal through the voice recognition unit and replies accordingly. Because the voice recognition unit is woken by this visual wake-up function, the invention better matches the user's everyday voice interaction habits and is more convenient and intelligent to use.
The concept, specific structure and technical effects of the present invention are further described below with reference to the accompanying drawings, so that its objects, features and effects can be fully understood.
Brief description of the drawings
FIG. 1 is a structural block diagram of a prior-art voice control device.
FIG. 2 is a structural block diagram of one form of voice control device to which the voice control method based on visual wake-up of the present invention is applied.
FIG. 3 is a structural block diagram of another form of voice control device to which the voice control method based on visual wake-up of the present invention is applied.
FIG. 4 is a flowchart of the voice control method based on visual wake-up of the present invention applied to the voice control device shown in FIG. 3.
Detailed description of the embodiments
As shown in FIG. 2, in a preferred embodiment of the present invention, the voice control device to which the voice control method based on visual wake-up is applied comprises a voice receiving unit 1, an image receiving unit 11, a voice recognition unit 2, an image recognition unit 12 and a processing unit 13. The voice receiving unit 1 is a microphone; the image receiving unit 11 is a camera, preferably a wide-angle camera; the voice receiving unit 1 and the image receiving unit 11 are mounted on the housing of the voice control device. The voice recognition unit 2 acquires the voice signal from the voice receiving unit 1, recognizes it, and sends the recognition result to the processing unit 13. The voice recognition unit 2 used in this example can be any prior-art software (and hardware) with a voice recognition function.
The image recognition unit 12 acquires the image from the image receiving unit 11, performs image recognition, and sends the recognition result to the processing unit 13. The image recognition unit 12 used in this example can be any prior-art software capable of recognizing faces and line-of-sight direction, for example that of Chinese patent application 'A human-computer interaction method and system based on line-of-sight judgment' (Application No. CN201210261378.8) or Chinese patent application 'Fast and accurate human eye positioning method and line-of-sight estimation method based on human eye positioning' (Application No. CN201510152613.1). In addition, the processing unit 13 can issue instructions to the voice recognition unit 2 and the image recognition unit 12 to direct their operation.
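The patent delegates gaze estimation to existing software such as the two cited applications, but whatever estimator is used, the final "is this face looking at the device?" decision can be reduced to an angle test. The sketch below assumes, hypothetically, that the estimator reports the eye position and a gaze direction vector in the camera's coordinate frame (camera, and hence device, at the origin), and it uses an arbitrary 10° tolerance.

```python
import math
from typing import Sequence

GAZE_TOLERANCE_DEG = 10.0  # hypothetical tolerance cone around the device


def looking_at_device(eye_pos: Sequence[float], gaze_dir: Sequence[float],
                      device_pos: Sequence[float] = (0.0, 0.0, 0.0),
                      tolerance_deg: float = GAZE_TOLERANCE_DEG) -> bool:
    """Return True if the gaze ray from eye_pos points at device_pos within tolerance."""
    to_device = [d - e for d, e in zip(device_pos, eye_pos)]
    dot = sum(g * t for g, t in zip(gaze_dir, to_device))
    norm = math.hypot(*gaze_dir) * math.hypot(*to_device)
    if norm == 0.0:
        return False  # degenerate input: zero gaze vector, or eye at the device
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard acos against rounding
    return math.degrees(math.acos(cos_angle)) <= tolerance_deg
```

An angular threshold keeps the decision independent of how far the user stands from the device.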
The visual wake-up based voice control method of the present invention, applied to the voice control device shown in FIG. 2, includes:
Step 1: The image receiving unit 11 is started once the voice receiving unit 1 of the voice control device has received at least part of the voice signal, for example just after it has received the first 1-2 syllables.
Step 2: The image receiving unit 11 acquires an image and transmits it to the image recognition unit 12. That is, the camera serving as the image receiving unit 11 captures an image of its field of view and sends it to the image recognition unit 12.
Step 3: The image recognition unit 12 recognizes the image. If it detects a face in the image whose line of sight is directed toward the voice control device, it sends this result to the processing unit 13, and the processing unit 13 wakes up the voice control device. The processing unit 13 then activates the voice recognition unit 2. The voice recognition unit 2 receives the complete voice signal, recognizes it, and sends the recognition result to the processing unit 13. The processing unit 13 then causes the voice control device to reply to the voice signal.
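The following is a minimal Python sketch of the FIG. 2 flow. It is illustrative only. `GazeResult`, `visual_wake`, the callback arguments and the `min_syllables` threshold are hypothetical names. The face/gaze detector and the speech recognizer, which the description leaves to prior-art software, are stubbed out as callables.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GazeResult:
    """Hypothetical output of the image recognition unit for one frame."""
    face_found: bool
    looking_at_device: bool


def visual_wake(
    syllables_heard: int,
    capture_and_analyze: Callable[[], GazeResult],
    recognize_speech: Callable[[], Optional[str]],
    min_syllables: int = 1,
) -> Optional[str]:
    """Return the recognized utterance if the device wakes up, else None."""
    # Step 1: the camera is only started once part of the speech has arrived
    # (the description gives "1-2 syllables" as an example threshold).
    if syllables_heard < min_syllables:
        return None
    # Step 2: capture a frame and hand it to the image recognition unit.
    gaze = capture_and_analyze()
    # Step 3: wake up only if a face in the frame is looking at the device;
    # otherwise speech recognition is never run for this utterance.
    if gaze.face_found and gaze.looking_at_device:
        return recognize_speech()
    return None
```

The sketch is structured so that speech recognition is never invoked unless the gaze check passes. In this method, that gaze check takes the place of a spoken wake word.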
More preferably, as shown in FIG. 3, in a preferred embodiment of the present invention, the voice control device to which the visual wake-up based voice control method is applied includes a voice receiving unit 1, an image receiving unit 21, a voice recognition unit 2, an image recognition unit 22 and a processing unit 23. The voice receiving unit 1 is a microphone. The image receiving unit 21 is a rotatable camera, for example one carried on a 2-axis pan/tilt head that can rotate about a horizontal axis and a vertical axis. The voice receiving unit 1 and the image receiving unit 21 are mounted on the housing of the voice control device; for the rotatable camera, it is the pan/tilt head that is attached to the housing. The voice recognition unit 2 acquires the voice signal from the voice receiving unit 1, recognizes it, and sends the recognition result to the processing unit 23. The voice recognition unit 2 used in this example can be any prior-art software (and hardware) that performs speech recognition and can also determine the direction the speech comes from.
The image recognition unit 22 acquires an image from the image receiving unit 21, performs image recognition, and sends the recognition result to the processing unit 23. As in the previous example, the image recognition unit 22 can be any prior-art software that can recognize faces and line-of-sight direction. In addition, the processing unit 23 can issue instructions to the voice recognition unit 2 and the image recognition unit 22 to direct their operation. It can also control the rotation of the pan/tilt head of the rotatable camera serving as the image receiving unit 21, and thereby the direction and angle through which the camera turns.
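The description does not say how the voice recognition unit 2 determines the source direction. One common technique, shown here purely as an assumption, estimates the time difference of arrival (TDOA) between two microphones by cross-correlation. It then converts that delay to a far-field bearing θ = arcsin(c·τ/d), where c is the speed of sound, τ the delay and d the microphone spacing. This needs at least two microphones, whereas the embodiment only mentions "a microphone". The two-channel input, the spacing and the sample rate are therefore hypothetical.

```python
import math
from typing import Optional, Sequence

SPEED_OF_SOUND_M_S = 343.0  # approximate value in air at about 20 °C


def estimate_lag(left: Sequence[float], right: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) that maximizes the cross-correlation of two channels.

    A positive lag means the sound reached the left microphone first,
    i.e. right[i + lag] best matches left[i].
    """
    n = min(len(left), len(right))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_degrees(lag: int, sample_rate: float, mic_spacing_m: float) -> Optional[float]:
    """Far-field bearing from broadside; positive means toward the left microphone.

    Returns None when the delay is physically impossible for this spacing,
    which a caller can treat as "source direction could not be determined".
    """
    s = SPEED_OF_SOUND_M_S * (lag / sample_rate) / mic_spacing_m
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))
```

With d = 0.1 m at 16 kHz, the physically possible lag is only about ±4.7 samples, so this brute-force version yields coarse angles. Practical arrays typically use more microphones and a weighted correlation such as GCC-PHAT. Returning None maps onto the "direction could not be determined" branch of the method.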
FIG. 4 shows the flowchart of the visual wake-up based voice control method of the present invention, applied to the voice control device shown in FIG. 3. The method includes:
Step 1: The image receiving unit 21 is started once the voice receiving unit 1 of the voice control device has received at least part of the voice signal, for example just after the first 1-2 syllables. At the same time, the voice recognition unit 2 uses the partial voice signal received so far to determine the direction the voice signal comes from.
Step 2: The image receiving unit 21 acquires an image and transmits it to the image recognition unit 22. If the source direction of the voice signal was determined in step 1, the processing unit 23 turns the rotatable camera toward that direction and acquires an image of the corresponding region. If the source direction could not be determined in step 1, the processing unit 23 rotates the camera through its maximum range of rotation, acquiring images over the whole area, until a face whose line of sight is directed toward the voice control device is detected in an image.
In the former case, this can be carried out in two steps:
Step 1.1: the processing unit 23 turns the rotatable camera toward the source direction of the voice signal and acquires an image of the corresponding region;
Step 1.2: the image recognition unit 22 analyzes the acquired image to determine whether it contains a face and, if so, whether that face's line of sight is directed toward the voice control device. If both answers are yes, proceed to step 3 below; if either is no, return to step 1.1.
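Steps 1.1 and 1.2 amount to a look-and-check loop. In this sketch, `look(angle)` is a hypothetical callback that turns the pan/tilt head to a pan angle and returns the image recognition unit's verdict for the captured frame. The description simply loops back to step 1.1. The `max_attempts` cap is an added assumption so that the caller can give up and fall back to a full sweep.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Frame:
    """Hypothetical per-frame verdict from the image recognition unit."""
    face_found: bool
    looking_at_device: bool


def targeted_search(
    bearing_deg: float,
    look: Callable[[float], Frame],
    max_attempts: int = 3,
) -> bool:
    """Steps 1.1/1.2: return True once a face looking at the device is seen."""
    for _ in range(max_attempts):
        # Step 1.1: turn the camera toward the voice and capture that region.
        frame = look(bearing_deg)
        # Step 1.2: a face must be present AND its gaze must be on the device.
        if frame.face_found and frame.looking_at_device:
            return True  # proceed to step 3
    # The description loops back to step 1.1 indefinitely; the cap is an
    # assumption that lets the caller fall back to a full-range sweep.
    return False
```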
In the latter case, this can likewise be carried out in two steps:
Step 2.1: the processing unit 23 rotates the camera through its maximum range of rotation and acquires images containing faces over the whole area. It stops the camera once the search is complete;
Step 2.2: the image recognition unit 22 analyzes the acquired face images to determine whether a face's line of sight is directed toward the voice control device. If so, proceed to step 3 below; if not, return to step 2.1.
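The full-range search of steps 2.1 and 2.2 can be sketched the same way. `sweep_angles` picks pan positions whose spacing (`step_deg`) is assumed to be smaller than the camera's horizontal field of view, so adjacent views overlap. The ±170° pan range, the 48° step, the `look` callback and the `max_sweeps` cap are all hypothetical values and names, not taken from the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Frame:
    """Hypothetical per-frame verdict from the image recognition unit."""
    face_found: bool
    looking_at_device: bool


def sweep_angles(max_pan_deg: float = 170.0, step_deg: float = 48.0) -> List[float]:
    """Pan positions from -max_pan_deg to +max_pan_deg.

    step_deg is assumed to be smaller than the camera's horizontal field
    of view, so neighbouring views overlap and no region is skipped.
    """
    angles: List[float] = []
    angle = -max_pan_deg
    while angle < max_pan_deg:
        angles.append(angle)
        angle += step_deg
    angles.append(max_pan_deg)  # always cover the far end of the range
    return angles


def sweep_search(
    angles: List[float],
    look: Callable[[float], Frame],
    max_sweeps: int = 2,
) -> Optional[float]:
    """Steps 2.1/2.2: return the pan angle of a face looking at the device."""
    for _ in range(max_sweeps):
        # Step 2.1: rotate through the whole range, keeping views with a face.
        faces: List[Tuple[float, Frame]] = []
        for angle in angles:
            frame = look(angle)
            if frame.face_found:
                faces.append((angle, frame))
        # The camera stops here; Step 2.2: check each face's line of sight.
        for angle, frame in faces:
            if frame.looking_at_device:
                return angle
        # Not found: back to step 2.1 (the sweep cap is an assumption).
    return None
```

Returning the angle lets the processing unit re-aim the camera at the gazing face. The later steps need this when they keep the camera on the speaker.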
Step 3: The image recognition unit 22 recognizes the image. Suppose the source direction of the voice signal was determined in step 1, and a face whose line of sight is directed toward the voice control device is detected in the image of the corresponding region. The image recognition unit 22 then sends this result to the processing unit 23, which wakes up the voice control device. The processing unit 23 then activates the voice recognition unit 2. The voice recognition unit 2 receives the complete voice signal, recognizes it, and sends the recognition result to the processing unit 23, which causes the voice control device to reply to the voice signal. Preferably, as shown in FIG. 4, this case can be handled in finer detail by also checking whether the voice recognition unit 2 has finished receiving the complete voice signal, specifically:
1. Suppose that, when the image recognition unit 22 confirms it has acquired an image of a face whose line of sight is directed toward the voice control device, the voice recognition unit 2 has already received the complete voice signal (the speech has stopped). The voice recognition unit 2 then recognizes the received voice signal;
2. Suppose instead that, when the image recognition unit 22 confirms it has acquired such an image, the voice recognition unit 2 has not yet received the complete voice signal (the speech has not stopped). The processing unit 23 then has the image recognition unit 22 determine whether the face in the acquired image is speaking:
if it is, the voice signal being received can be attributed to that person. The processing unit 23 keeps the camera aimed at that face until the voice signal has been fully received. The voice recognition unit 2 receives the complete voice signal, recognizes it, and sends the recognition result to the processing unit 23, which causes the voice control device to reply to the voice signal;
if it is not, the voice signal being received did not come from that person, so the search must be repeated; that is, the method returns to step 2.
This finer-grained handling lets the device cope with more complex environments. For example, when several people in the scene are talking, it can still correctly identify the person who produced the voice signal.
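The FIG. 4 refinement of step 3 reduces to a small decision based on two observations, which the sketch below encodes. The description does not specify how the image recognition unit decides whether a face is speaking (for example, from lip movement). `face_is_speaking` is therefore simply an input here, and the enum names are hypothetical.

```python
from enum import Enum, auto


class NextAction(Enum):
    RECOGNIZE_NOW = auto()         # case 1: the utterance has already ended
    TRACK_THEN_RECOGNIZE = auto()  # case 2, this face is the speaker
    SEARCH_AGAIN = auto()          # case 2, this face is silent: back to step 2


def after_gaze_detected(speech_finished: bool, face_is_speaking: bool) -> NextAction:
    """Decide what to do once a face looking at the device has been found."""
    if speech_finished:
        # Case 1: recognize the already-received voice signal.
        return NextAction.RECOGNIZE_NOW
    if face_is_speaking:
        # Case 2, speaking: attribute the voice to this face and keep the
        # camera on it until the voice signal has been fully received.
        return NextAction.TRACK_THEN_RECOGNIZE
    # Case 2, not speaking: someone else is talking, so search again.
    return NextAction.SEARCH_AGAIN
```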
Suppose the source direction of the voice signal was determined in step 1, but no face whose line of sight is directed toward the voice control device is detected in the image of the corresponding region. The situation is then treated as if the source direction could not be determined, and the method returns to step 2. That is, the processing unit 23 rotates the camera through its maximum range of rotation and acquires images over the whole area.
Suppose the source direction of the voice signal could not be determined in step 1, and a face whose line of sight is directed toward the voice control device is detected in the images acquired over the whole area. If that face is speaking and the voice receiving unit 1 has not yet finished receiving the voice signal, the image recognition unit 22 sends this result to the processing unit 23, which wakes up the voice control device. The processing unit 23 keeps the camera aimed at that face until the voice signal has been fully received, and then activates the voice recognition unit 2. The voice recognition unit 2 receives the complete voice signal, recognizes it, and sends the recognition result to the processing unit 23, which causes the voice control device to reply to the voice signal.
Now suppose the source direction could not be determined in step 1 and a face whose line of sight is directed toward the voice control device is detected in the images acquired over the whole area, but that face is not speaking and the voice receiving unit 1 has already finished receiving the voice signal. The image recognition unit 22 sends this result to the processing unit 23, which wakes up the voice control device. The processing unit 23 then activates the voice recognition unit 2, which receives the complete voice signal and recognizes it. If the voice recognition unit 2 can recognize the voice signal, it sends the recognition result to the processing unit 23. If the processing unit 23 can correctly interpret that result (for example, it matches one of the device's built-in operation instructions), it causes the voice control device to reply to the voice signal. If the voice recognition unit 2 cannot recognize the voice signal, the processing unit 23 causes the voice control device not to reply.
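The two branches for an undetermined source direction can likewise be expressed as one function. In this sketch, `transcript` is None when speech recognition fails. `commands` is a hypothetical table standing in for the device's built-in operation instructions, and case-insensitive exact matching is an assumption. The description spells out the "must match an instruction" check only for the second branch; the sketch applies it to both for simplicity. It also treats the two speaking/finished combinations the description does not cover as "stay asleep", which is likewise an assumption.

```python
from typing import Dict, Optional


def reply_without_direction(
    face_is_speaking: bool,
    speech_finished: bool,
    transcript: Optional[str],
    commands: Dict[str, str],
) -> Optional[str]:
    """Reply (or None) when a face looking at the device was found by a full sweep.

    transcript is None when speech recognition failed; commands is a
    hypothetical table of built-in instructions mapped to replies.
    """
    speaking_mid_utterance = face_is_speaking and not speech_finished
    silent_after_utterance = not face_is_speaking and speech_finished
    if not (speaking_mid_utterance or silent_after_utterance):
        # Combinations the description does not cover: stay asleep (assumption).
        return None
    # In the first case the camera stays on the speaker until speech ends;
    # either way the complete signal is then recognized.
    if transcript is None:
        return None  # unrecognized speech gets no reply
    # Reply only if the result matches a built-in instruction
    # (case-insensitive exact match is an assumption).
    return commands.get(transcript.strip().lower())
```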
The preferred embodiments of the invention have been described in detail above. A person of ordinary skill in the art can make many modifications and variations according to the concept of the invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain from the prior art by logical analysis, reasoning or limited experimentation according to the concept of the invention shall fall within the scope of protection defined by the claims.

Claims (10)

  1. A voice control method based on visual wake-up, for waking up a voice control device so that the voice control device replies to a voice signal it receives, characterized in that the voice control method comprises:
    Step 1: after receiving at least part of the voice signal, the voice control device starts an image receiving unit mounted on it;
    Step 2: the image receiving unit acquires an image and transmits it to an image recognition unit;
    Step 3: the image recognition unit recognizes the image, and when a face whose line of sight is directed toward the voice control device is detected in the image, the voice control device is woken up to recognize the voice signal.
  2. The visual wake-up based voice control method according to claim 1, wherein the image receiving unit is a camera.
  3. The visual wake-up based voice control method according to claim 2, wherein the camera is a wide-angle camera.
  4. The visual wake-up based voice control method according to claim 1, wherein the image receiving unit is a rotatable camera comprising a pan/tilt head, the pan/tilt head being mounted on the housing of the voice control device.
  5. The visual wake-up based voice control method according to claim 4, wherein the pan/tilt head is 2-axis driven.
  6. The visual wake-up based voice control method according to claim 4 or 5, wherein step 1 comprises: the voice control device determining the source direction of the voice signal from the at least part of the voice signal it has received; when the voice control device can determine the source direction of the voice signal, the voice control device instructing the camera to turn toward that source direction and acquire an image; and when the voice control device cannot determine the source direction of the voice signal, the voice control device instructing the camera to rotate through its maximum range of rotation angles and acquire images.
  7. The visual wake-up based voice control method according to claim 6, wherein step 3 comprises:
    when the voice control device can determine the source direction of the voice signal: when the image recognition unit detects in the image a face whose line of sight is directed toward the voice control device, the voice control device recognizes the voice signal after it has finished receiving it, and replies;
    when the voice control device cannot determine the source direction of the voice signal: when the image recognition unit detects in the image a face whose line of sight is directed toward the voice control device, the face is speaking, and the voice signal has not yet been completely received, the voice control device recognizes the voice signal after it has finished receiving it, and replies; when the image recognition unit detects in the image a face whose line of sight is directed toward the voice control device, the face is not speaking, and the voice signal has been completely received, the voice control device recognizes the voice signal and replies, and if the voice control device cannot recognize the voice signal, it does not reply.
  8. The visual wake-up based voice control method according to claim 7, wherein, when no face whose line of sight is directed toward the voice control device is detected in the image in step 3, the voice control device is not woken up.
  9. The visual wake-up based voice control method according to claim 1, wherein the voice control device receives the voice signal through a voice receiving unit and recognizes the voice signal through a voice recognition unit.
  10. The visual wake-up based voice control method according to claim 9, wherein the voice receiving unit is a microphone.
PCT/CN2015/088723 2015-09-01 2015-09-01 Voice control method based on visual wake-up WO2017035768A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/088723 WO2017035768A1 (en) 2015-09-01 2015-09-01 Voice control method based on visual wake-up

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/088723 WO2017035768A1 (en) 2015-09-01 2015-09-01 Voice control method based on visual wake-up

Publications (1)

Publication Number Publication Date
WO2017035768A1 true WO2017035768A1 (en) 2017-03-09

Family

ID=58186536

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/088723 WO2017035768A1 (en) 2015-09-01 2015-09-01 Voice control method based on visual wake-up

Country Status (1)

Country Link
WO (1) WO2017035768A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108198553A (en) * 2018-01-23 2018-06-22 北京百度网讯科技有限公司 Voice interactive method, device, equipment and computer readable storage medium
CN108735216A (en) * 2018-06-12 2018-11-02 广东小天才科技有限公司 A kind of voice based on semantics recognition searches topic method and private tutor's equipment
CN109992237A (en) * 2018-01-03 2019-07-09 腾讯科技(深圳)有限公司 Intelligent sound apparatus control method, device, computer equipment and storage medium
CN110136714A (en) * 2019-05-14 2019-08-16 北京探境科技有限公司 Natural interaction sound control method and device
CN111369988A (en) * 2018-12-26 2020-07-03 华为终端有限公司 Voice awakening method and electronic equipment
CN111767785A (en) * 2020-05-11 2020-10-13 南京奥拓电子科技有限公司 Man-machine interaction control method and device, intelligent robot and storage medium
CN111880854A (en) * 2020-07-29 2020-11-03 百度在线网络技术(北京)有限公司 Method and apparatus for processing speech
CN113539265A (en) * 2021-07-13 2021-10-22 中国第一汽车股份有限公司 Control method, device, equipment and storage medium
WO2023231211A1 (en) * 2022-06-01 2023-12-07 合众新能源汽车股份有限公司 Voice recognition method and apparatus, electronic device, storage medium, and product

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000278626A (en) * 1999-03-29 2000-10-06 Sanyo Electric Co Ltd Multiple screens sound output controller
JP2000347692A (en) * 1999-06-07 2000-12-15 Sanyo Electric Co Ltd Person detecting method, person detecting device, and control system using it
JP2002135642A (en) * 2000-10-24 2002-05-10 Atr Onsei Gengo Tsushin Kenkyusho:Kk Speech translation system
US20020105575A1 (en) * 2000-12-05 2002-08-08 Hinde Stephen John Enabling voice control of voice-controlled apparatus
CN1981257A (en) * 2004-07-08 2007-06-13 皇家飞利浦电子股份有限公司 A method and a system for communication between a user and a system
CN1983389A (en) * 2005-12-14 2007-06-20 台达电子工业股份有限公司 Speech controlling method
CN103376891A (en) * 2012-04-23 2013-10-30 凹凸电子(武汉)有限公司 Multimedia system, control method for display device and controller
CN103853329A (en) * 2012-12-06 2014-06-11 Lg电子株式会社 Mobile terminal and controlling method thereof
CN104157284A (en) * 2013-05-13 2014-11-19 佳能株式会社 Voice command detecting method and system and information processing system
CN104820556A (en) * 2015-05-06 2015-08-05 广州视源电子科技股份有限公司 Method and device for waking up voice assistant
CN105204628A (en) * 2015-09-01 2015-12-30 涂悦 Voice control method based on visual awakening

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YONEZAWA, T. ET AL.: "Evaluating Crossmodal Awareness of Daily-partner Robot to User's Behaviors with Gaze and Utterance Detection", PROCEEDINGS OF THE ACM INTERNATIONAL WORKSHOP ON CONTEXT-AWARENESS FOR SELF-MANAGING SYSTEMS - CASEMANS 2009, 31 December 2009 (2009-12-31), pages 1 - 8, XP058164605 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992237B (en) * 2018-01-03 2022-04-22 腾讯科技(深圳)有限公司 Intelligent voice equipment control method and device, computer equipment and storage medium
CN109992237A (en) * 2018-01-03 2019-07-09 腾讯科技(深圳)有限公司 Intelligent sound apparatus control method, device, computer equipment and storage medium
CN108198553A (en) * 2018-01-23 2018-06-22 北京百度网讯科技有限公司 Voice interactive method, device, equipment and computer readable storage medium
CN108198553B (en) * 2018-01-23 2021-08-06 北京百度网讯科技有限公司 Voice interaction method, device, equipment and computer readable storage medium
US10991372B2 (en) 2018-01-23 2021-04-27 Beijing Baidu Netcom Scienc And Technology Co., Ltd. Method and apparatus for activating device in response to detecting change in user head feature, and computer readable storage medium
CN108735216B (en) * 2018-06-12 2020-10-16 广东小天才科技有限公司 Voice question searching method based on semantic recognition and family education equipment
CN108735216A (en) * 2018-06-12 2018-11-02 广东小天才科技有限公司 A kind of voice based on semantics recognition searches topic method and private tutor's equipment
CN111369988A (en) * 2018-12-26 2020-07-03 华为终端有限公司 Voice awakening method and electronic equipment
CN110136714A (en) * 2019-05-14 2019-08-16 北京探境科技有限公司 Natural interaction sound control method and device
CN111767785A (en) * 2020-05-11 2020-10-13 南京奥拓电子科技有限公司 Man-machine interaction control method and device, intelligent robot and storage medium
CN111880854A (en) * 2020-07-29 2020-11-03 百度在线网络技术(北京)有限公司 Method and apparatus for processing speech
CN111880854B (en) * 2020-07-29 2024-04-30 百度在线网络技术(北京)有限公司 Method and device for processing voice
CN113539265A (en) * 2021-07-13 2021-10-22 中国第一汽车股份有限公司 Control method, device, equipment and storage medium
CN113539265B (en) * 2021-07-13 2022-09-16 中国第一汽车股份有限公司 Control method, device, equipment and storage medium
WO2023231211A1 (en) * 2022-06-01 2023-12-07 合众新能源汽车股份有限公司 Voice recognition method and apparatus, electronic device, storage medium, and product

Similar Documents

Publication Publication Date Title
WO2017035768A1 (en) Voice control method based on visual wake-up
US11100929B2 (en) Voice assistant devices
WO2017188801A1 (en) Optimum control method based on multi-mode command of operation-voice, and electronic device to which same is applied
CN105204628A (en) Voice control method based on visual awakening
WO2018210219A1 (en) Device-facing human-computer interaction method and system
WO2019235863A1 (en) Methods and systems for passive wakeup of a user interaction device
RU2656096C1 (en) Method and apparatus for activation of the electronic device
WO2020197062A1 (en) Multi-modal interaction with intelligent assistants in voice command devices
US20170048077A1 (en) Intelligent electric apparatus and controlling method of same
WO2014094437A1 (en) Security monitoring system and corresponding alarm triggering method
WO2017078361A1 (en) Electronic device and method for recognizing speech
US20170139470A1 (en) Method for intelligently controlling controlled equipment and device
CN105278380B (en) The control method and device of smart machine
WO2018174437A1 (en) Electronic device and controlling method thereof
WO2017101439A1 (en) Electronic device control method and apparatus
US11240057B2 (en) Alternative output response based on context
WO2018053908A1 (en) Dual-camera photographic method, system, and terminal
WO2019112240A1 (en) Electronic apparatus and control method thereof
WO2017054599A1 (en) A household appliance control method and apparatus
EP3923275B1 (en) Device wakeup method and apparatus, electronic device, and storage medium
WO2014183529A1 (en) Mobile terminal talk mode switching method, device and storage medium
WO2017166462A1 (en) Method and system for reminding about change in environment, and head-mounted vr device
TW201743241A (en) Portable electronic device and operation method thereof
EP3738025A1 (en) Electronic device and method for controlling external electronic device based on use pattern information corresponding to user
WO2017079891A1 (en) Smart door viewer system and smart door viewer

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15902575

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15902575

Country of ref document: EP

Kind code of ref document: A1