WO2015154419A1 - Human-machine interaction device and method - Google Patents

Human-machine interaction device and method

Info

Publication number
WO2015154419A1
Authority
WO
WIPO (PCT)
Prior art keywords
lip
human
microphone
camera
voice
Prior art date
Application number
PCT/CN2014/089020
Other languages
English (en)
Chinese (zh)
Inventor
陈军
姚立哲
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2015154419A1

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis

Definitions

  • the present invention relates to the field of human-computer interaction technology, and more particularly to a human-computer interaction device and method.
  • the technical problem addressed by the present invention is the low reliability of speech recognition in environments with noise interference, which is solved by providing a human-machine interaction device and method.
  • a human-computer interaction method comprising:
  • while the microphone in the human-machine interaction device is acquiring a voice signal, if a valid voice input is detected, the camera in the human-machine interaction device is activated to acquire lip-reading images in real time;
  • the human-machine interaction device processes the sequence formed by the acquired lip-reading images to obtain lip-motion feature data;
  • the human-machine interaction device fuses the lip-motion feature data with the voice feature data extracted from the voice signal to recognize the input voice.
  • the step of detecting valid voice input includes:
  • the microphone detects a sound source, converts the natural voice of the detected sound source into an electrical signal, and determines that there is a valid voice input when the converted electrical signal exceeds a set threshold, wherein the electrical signal includes a voltage signal or a current signal.
  • after the step of activating the camera in the human-machine interaction device to acquire lip-reading images in real time, the method further includes:
  • when the microphone acquires a voice signal but invalid lip-motion feature data is obtained from the sequence formed by the lip-reading images acquired by the camera, the human-machine interaction device controls the microphone to enter a listening state and controls the camera to stop working until the microphone detects a valid voice input again, after which the camera resumes normal operation.
  • a human-computer interaction method comprising:
  • the microphone in the human-machine interaction device acquires a voice signal, and the camera acquires a lip-reading image in real time;
  • the human-machine interaction device processes the sequence formed by the acquired lip-reading image to obtain lip-moving feature data.
  • the human-machine interaction device fuses the lip-motion feature data and the voice feature data extracted from the voice signal to recognize the input voice, wherein, when the microphone acquires a voice signal but invalid lip-motion feature data is obtained from the sequence formed by the lip-reading images acquired by the camera, the microphone is controlled to enter a listening state and the camera is controlled to stop working.
  • the method further includes:
  • when the microphone enters the listening state, if a valid voice input is detected, the microphone enters the working state and the camera is started to acquire lip-reading images in real time.
  • a human-machine interaction device includes a microphone, a camera, a lip-reading image processing module, and a fusion recognition module, wherein:
  • the microphone is configured to: acquire a voice signal, and activate the camera when a valid voice input is detected;
  • the camera is configured to: acquire a lip reading image in real time according to the control of the microphone;
  • the lip-reading image processing module is configured to: process the sequence formed by the acquired lip-reading image to obtain lip-moving feature data;
  • the fusion recognition module is configured to: fuse the lip-motion feature data with the voice feature data extracted from the voice signal to recognize the input voice.
  • the microphone is arranged to detect valid voice input as follows:
  • the microphone detects a sound source, converts the natural voice of the detected sound source into an electrical signal, and determines that there is a valid voice input when the converted electrical signal exceeds a set threshold, wherein the electrical signal includes a voltage signal or a current signal.
  • the apparatus further includes a control module, wherein:
  • the control module is configured to: when the microphone acquires a voice signal but the lip-reading image processing module obtains invalid lip-motion feature data from the sequence formed by the acquired lip-reading images, control the microphone to enter a listening state and control the camera to stop working until the microphone detects a valid voice input again, after which the camera resumes normal operation.
  • the device is assembled in any of the following devices:
  • Wearable devices, portable devices, smart terminals, smart home appliances, and security monitoring devices.
  • a human-machine interaction device includes a microphone and a camera, and further includes a lip-reading image processing module, a fusion recognition module, and a control module, wherein:
  • the lip-reading image processing module is configured to: process a sequence formed by the lip-reading image acquired by the camera to obtain lip-moving feature data;
  • the fusion identification module is configured to: fuse the lip motion feature data with the voice feature data extracted from the voice signal acquired by the microphone, and identify the input voice;
  • the control module is configured to: when the microphone acquires a voice signal but the lip-reading image processing module obtains invalid lip-motion feature data from the sequence formed by the acquired lip-reading images, control the microphone to enter a listening state and control the camera to stop working.
  • the microphone is further configured to: after entering the listening state according to the control of the control module, if a valid voice input is detected, enter a working state, and start the camera to acquire a lip-reading image in real time.
  • the device is assembled in any of the following devices:
  • Wearable devices, portable devices, smart terminals, smart home appliances, and security monitoring devices.
  • a computer program comprising program instructions that, when executed by a human-machine interaction device, cause the human-machine interaction device to perform a corresponding human-computer interaction method.
  • a carrier carrying the computer program.
  • a computer program comprising program instructions that, when executed by a human-machine interaction device, cause the human-machine interaction device to perform a corresponding human-computer interaction method.
  • a carrier carrying the computer program.
  • Compared with the conventional technique of recognition using speech feature data alone, the technical solution of the present application combines lip reading and speech in a noisy environment, improving speech recognition and the machine recognition rate; moreover, because the camera is started only when a valid voice input is confirmed, the power consumption of the device is greatly reduced.
  • FIG. 1 is a structural diagram of an interaction apparatus implemented according to an embodiment of the present invention.
  • This embodiment provides a human-computer interaction method for combining lip reading and speech to perform speech recognition in a noisy environment.
  • the method mainly includes the following operations:
  • while the microphone in the human-machine interaction device is acquiring a voice signal, if a valid voice input is detected, the camera in the human-machine interaction device is activated to acquire lip-reading images in real time;
  • the human-machine interaction device processes the sequence formed by the acquired lip-reading images to obtain lip-motion feature data;
  • the human-machine interaction device fuses the lip-motion feature data with the voice feature data extracted from the voice signal to recognize the input voice.
  • the lip-reading image, which may also be called a lip-motion image, refers to an image capturing the changing movement of the speaker's lips while the person speaks.
  • the lip-reading images constitute an image sequence, or lip-reading video.
  • the sequence formed by the lip-reading images refers to the lip-reading video over a period of time.
  • the characteristic parameters, that is, the lip-motion feature data obtained by specific processing operations on the lip-motion image sequence, are common knowledge to those skilled in the art and are not described herein.
  • the speech feature data is obtained after the speech signal is acquired and processed; many representations are possible, for example, the spectral parameters of the speech can serve as one kind of feature data.
  • the speech feature data processing is performed after the speech signal is acquired, and is executed by the speech processing module. Speech feature data processing and lip-reading image processing are performed independently.
  • the process of detecting valid voice input is as follows:
  • the microphone detects the sound source and converts the natural voice of the detected sound source into an electrical signal. When the converted electrical signal exceeds the set threshold, it is determined that there is a valid voice input.
  • the electrical signals involved include a current signal or a voltage signal.
  • a feedback mechanism for lip-reading processing is also proposed: when the microphone acquires a speech signal but invalid lip-motion feature data is obtained from the sequence formed by the lip-reading images acquired by the camera (that is, the user's lips show no movement and the user may not be speaking),
  • the human-machine interaction device controls the microphone to enter the listening state and controls the camera to stop working until the microphone detects a valid voice input again, and then restarts the camera for normal operation.
  • This mechanism mainly targets cases of heavy noise: combined with the user's lip-motion features, it accurately distinguishes user voice from noise, and when noise is recognized it stops the camera to improve device utilization.
  • the human-machine interaction device may further, according to a user instruction, retain only the microphone for acquiring the voice signal and notify the camera to cancel acquisition of lip-reading images. Thus, in special scenarios, the user can select the recognition mode, which improves the user experience.
  • for example, when the user wears a headset to communicate with a smart device by voice, recognition of the lip-reading images can be utilized to further improve the accuracy of speech recognition, helping the machine better understand the user's language and execute the user's voice commands.
  • the human-computer interaction process is as follows:
  • Step 1: The microphone acquires a voice signal and, when there is a valid voice input, starts the camera;
  • the microphone mainly uses a sound pressure sensor to detect the sound source and convert the natural voice into an electrical signal.
  • a threshold value of the sound pressure sensor electrical signal may be set to determine whether there is a valid voice input.
  • when the converted sound-pressure-sensor electrical signal is greater than (or not less than) the set threshold, it is determined that there is a valid voice input, and the camera is notified to start and begin normal operation.
  • only when the microphone detects a valid voice input does it notify the camera to start and acquire lip-reading images; this reduces the power consumption of the device. A minimal sketch of this detection logic follows.
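To make the threshold test concrete, here is a minimal sketch of valid-voice detection, assuming the microphone delivers digitized sound-pressure samples as a normalized numpy array; the frame length and threshold are illustrative assumptions, not values from this disclosure.

```python
import numpy as np

FRAME_LEN = 512          # samples per analysis frame (assumed value)
VOICE_THRESHOLD = 0.02   # normalized amplitude threshold (assumed value)

def has_valid_voice_input(samples: np.ndarray) -> bool:
    """Return True if any frame's mean absolute amplitude exceeds the threshold."""
    n_frames = len(samples) // FRAME_LEN
    for i in range(n_frames):
        frame = samples[i * FRAME_LEN:(i + 1) * FRAME_LEN]
        if np.mean(np.abs(frame)) > VOICE_THRESHOLD:
            return True   # valid voice input: notify the camera to start
    return False          # otherwise stay in the listening state
```

In a real device the comparison would run directly on the sound-pressure sensor's voltage or current signal; the normalized-sample version above is only for illustration.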
  • Step 2: The camera acquires lip-reading images.
  • lip-reading images are usually acquired by first performing face recognition on the image sequence, determining the position of the lips, and then obtaining lip-motion data.
  • alternatively, a directional microphone can be used with the camera built into the microphone (or the microphone built into the camera). In a headset, for example, the camera can be located at the microphone and aimed directly at the user's lips during use, which makes it easy to obtain lip images.
  • Step 3: Process the sequence formed by the acquired lip-reading images to obtain lip-motion feature data; a sketch of one way to do this follows.
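As an illustration of this step, here is a minimal sketch using OpenCV's bundled Haar cascade for face detection. Taking the lower third of the face box as the mouth region and using mean inter-frame differences as the lip-motion feature are simplifying assumptions, not the specific algorithm of this disclosure.

```python
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def mouth_roi(gray):
    """Detect a face and return the lower third of its box (the lip area)."""
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    return cv2.resize(gray[y + 2 * h // 3 : y + h, x : x + w], (64, 32))

def lip_motion_features(frames):
    """Mean absolute difference of consecutive mouth regions, one value per pair."""
    rois = [mouth_roi(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)) for f in frames]
    rois = [r for r in rois if r is not None]
    return np.array([np.mean(cv2.absdiff(a, b)) for a, b in zip(rois, rois[1:])])
```

An all-near-zero feature vector would correspond to the "invalid lip-motion feature data" case used by the feedback mechanism described below.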
  • a feedback mechanism for lip-reading processing can be set. For example, in a noisy environment or a cross-talk scenario, if the microphone picks up other sound signals while the user is not speaking, the camera starts to acquire lip images, but processing the lip-reading images extracts no lip-motion features. At this point, the human-machine interaction device can notify the camera, the voice processing module, the lip-reading processing module, and the fusion recognition module to stop working, leaving only the microphone in the listening state.
  • alternatively, human-computer interaction can be performed by voice alone, preventing the lip-reading recognition result from interfering with speech recognition; for special scenarios or special users, human-computer interaction can also be configured to use lip reading alone.
  • Step 4: Process the acquired voice to obtain voice feature data.
  • the processing of the lip-reading images and the processing of the voice are performed by two independent components, so steps 3 and 4 can be reordered or executed at the same time; a sketch of a typical voice-feature computation follows.
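A minimal sketch of Step 4, using spectral parameters as suggested above; librosa and MFCCs are assumptions chosen for illustration, since the disclosure does not prescribe a particular representation.

```python
import librosa

def voice_features(wav_path: str):
    """Load audio and return per-frame MFCC vectors, a common spectral feature."""
    y, sr = librosa.load(wav_path, sr=16000)            # resample to 16 kHz (assumed rate)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape (13, n_frames)
    return mfcc.T                                       # one 13-dim vector per frame
```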
  • Step 5: The fusion recognition module performs fusion recognition on the voice feature data and the lip-motion feature data.
  • Lip reading and speech are complementary channels: for example, the phonemes /m/ and /n/, which are indistinguishable in the speech signal, are visually distinguishable, while /b/, /p/, and /m/, which are visually indistinguishable, are distinguishable in the speech signal.
  • the auxiliary information of the lip-reading image can significantly improve the speech recognition rate of the machine.
  • related recognition processing techniques for lip reading and speech are used to resolve inconsistencies between lip-reading recognition and speech recognition results.
  • a trained recognition library can be used to determine which channel's information is more reliable, thereby improving the speech recognition rate; a sketch of such a weighted fusion follows.
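A minimal sketch of the fusion step, assuming each channel's recognizer outputs per-command log-probabilities and that a per-channel reliability weight has been learned offline; both assumptions stand in for the trained recognition library mentioned above.

```python
import numpy as np

def fuse_and_recognize(voice_logp: np.ndarray,
                       lip_logp: np.ndarray,
                       voice_weight: float = 0.7) -> int:
    """Weighted sum of per-class log-probabilities; returns the winning class index."""
    combined = voice_weight * voice_logp + (1.0 - voice_weight) * lip_logp
    return int(np.argmax(combined))

# Example: three candidate commands scored by each channel. The voice channel
# prefers command 0, the lip channel prefers command 1; the weights decide.
voice_scores = np.log(np.array([0.5, 0.3, 0.2]))
lip_scores = np.log(np.array([0.2, 0.6, 0.2]))
print(fuse_and_recognize(voice_scores, lip_scores))  # prints 0 with weight 0.7
```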
  • the human-machine interaction device involved in the above method can also be installed in devices such as wearable devices (such as smart glasses, smart helmets), portable devices, smart terminals, smart home appliances, and security monitoring devices.
  • This embodiment provides a human-computer interaction method, and the method includes the following steps:
  • the microphone in the human-machine interaction device acquires a voice signal, and the camera acquires a lip-reading image in real time;
  • the human-machine interaction device processes the sequence formed by the acquired lip-reading images to obtain lip-motion feature data;
  • the human-machine interaction device fuses the lip-motion feature data with the voice feature data extracted from the voice signal to recognize the input voice, wherein, when the microphone acquires a voice signal but invalid lip-motion feature data is obtained from the sequence formed by the lip-reading images acquired by the camera, the microphone is controlled to enter a listening state and the camera is controlled to stop working.
  • after the microphone enters the listening state and the camera stops working, the microphone continues to detect whether there is a valid voice input; if one is detected, the microphone enters the working state and the camera starts to work, as the state-machine sketch below illustrates.
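The listening/working behavior described in both embodiments amounts to a small state machine. Here is a minimal sketch; the state names and transition function are illustrative assumptions.

```python
from enum import Enum

class State(Enum):
    LISTENING = 1   # microphone only; camera and processing modules stopped
    WORKING = 2     # microphone, camera, and recognition pipeline all active

def step(state: State, valid_voice: bool, valid_lip_features: bool) -> State:
    """One transition of the feedback mechanism."""
    if state is State.LISTENING:
        # A valid voice input restarts the camera and the full pipeline.
        return State.WORKING if valid_voice else State.LISTENING
    # WORKING: sound present but no lip motion suggests noise, not the user.
    if valid_voice and not valid_lip_features:
        return State.LISTENING
    return State.WORKING
```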
  • This embodiment provides a human-machine interaction device. As shown in FIG. 1, the interaction device includes the following parts.
  • the microphone 11 acquires a voice signal and activates the camera when a valid voice input is detected.
  • the microphone 11 detects the sound source and converts the natural voice into a voltage or current signal; when the voltage or current signal is greater than (or not less than) the set threshold, a valid voice input is considered detected.
  • the camera 12 acquires a lip-reading image in real time according to the control of the microphone 11;
  • the lip-reading image processing module 13 processes the sequence formed by the acquired lip-reading images to obtain lip-motion feature data;
  • the voice processing module 14 processes the voice signal to obtain voice feature data.
  • the fusion recognition module 15 fuses the lip motion feature data and the voice feature data to recognize the input voice.
  • the trained model library is used to perform fusion recognition on the lip motion feature data and the voice feature data.
  • the above device may also adopt the lip-reading feedback mechanism, in which case a control module is added: when the microphone acquires a voice signal but the lip-reading image processing module obtains invalid lip-motion feature data from the sequence formed by the acquired lip-reading images, the control module controls the microphone 11 to enter the listening state and controls the camera 12 to stop working.
  • the lip-reading image processing module 13, the voice processing module 14, and the fusion recognition module 15 are also controlled to stop working, thereby reducing the power consumption of the device.
  • in the listening state, the microphone 11 continues to detect whether there is a valid voice input; if one is detected, the working state is entered, and the camera 12, the lip-reading image processing module 13, the voice processing module 14, and the fusion recognition module 15 resume normal operation. Such a scheme not only improves the reliability of speech recognition in a noisy environment but also reduces the power consumption of the device.
  • the control module may further, according to a user instruction, retain only the microphone for acquiring the voice signal and notify the camera 12 to cancel acquisition of lip-reading images. That is, the control module can select the recognition mode according to the user instruction, for example recognition by the microphone 11 alone, by the camera 12 (lip reading) alone, or by both channels together, as sketched below.
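A minimal sketch of the mode selection just described, assuming a simple enum dispatch; the mode names are illustrative, not taken from this disclosure.

```python
from enum import Enum

class RecognitionMode(Enum):
    VOICE_ONLY = 1   # microphone 11 alone
    LIP_ONLY = 2     # camera 12 (lip reading) alone
    FUSED = 3        # both channels, fused recognition

def select_result(mode: RecognitionMode, voice_result: str,
                  lip_result: str, fused_result: str) -> str:
    """Return the recognition output matching the user-selected mode."""
    if mode is RecognitionMode.VOICE_ONLY:
        return voice_result
    if mode is RecognitionMode.LIP_ONLY:
        return lip_result
    return fused_result
```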
  • the above devices can be built into any of the following devices:
  • Wearable devices, portable devices, smart terminals, smart home appliances, and security monitoring devices.
  • the microphone 11 and the camera 12 are optionally arranged on the same side of the device; for example, the camera 12 is mounted at the microphone of the headset, and the other components can be mounted on the smart device.
  • This embodiment provides a human-machine interaction device, including the following parts.
  • a lip-reading image processing module, which processes the sequence formed by the acquired lip-reading images to obtain lip-motion feature data;
  • the voice processing module processes the voice signal to obtain voice feature data.
  • the fusion recognition module combines the lip motion feature data and the voice feature data to recognize the input voice.
  • the trained model library is used to perform fusion recognition on the lip motion feature data and the voice feature data.
  • the control module, when the microphone acquires a voice signal but the lip-reading image processing module obtains invalid lip-motion feature data from the acquired lip-reading images (i.e., no lip-motion feature data can be recognized), controls the microphone to enter the listening state and controls the camera to stop working.
  • the control module may further, according to a user instruction, retain only the microphone for acquiring the voice signal and notify the camera to cancel acquisition of lip-reading images. That is, the control module can select the recognition mode according to the user instruction, for example recognition by the microphone alone, by the camera (lip reading) alone, or by both channels together.
  • the above microphone starts the camera only when there is a valid voice input, to reduce the power consumption of the device.
  • the microphone detects the sound source and converts the natural voice into an electrical signal; when the electrical signal is greater than (or not less than) the set threshold, a valid voice input is considered detected.
  • the above devices can be built into any of the following devices:
  • Wearable devices, portable devices, smart terminals, smart home appliances, and security monitoring devices.
  • the microphone and the camera are optionally arranged on the same side of the device; for example, the camera is assembled at the microphone of the headset, and the other components can be assembled on the smart device.
  • In summary, compared with the conventional technique of recognition using speech feature data alone, the technical solution of the present application combines lip reading and voice in a noisy environment, improving speech recognition and the machine recognition rate; and because camera operation is started only when a valid voice input is confirmed, the power consumption of the device is greatly reduced.

Abstract

Disclosed are a human-machine interaction device and method, a corresponding computer program, and a carrier for the computer program. The method comprises: while a microphone in the human-machine interaction device is acquiring a voice signal, if a valid voice input is detected, a camera in the human-machine interaction device is activated to acquire lip-reading images in real time; the human-machine interaction device processes a sequence formed by the collected lip-reading images to obtain lip-reading feature data; and the human-machine interaction device fuses the lip-reading feature data with the voice feature data extracted from the voice signal to recognize the input speech. The technical solution of the present application effectively improves speech recognition and increases the machine recognition rate.
PCT/CN2014/089020 2014-09-03 2014-10-21 Human-machine interaction device and method WO2015154419A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410446967.2 2014-09-03
CN201410446967.2A CN105389097A (zh) 2014-09-03 2014-09-03 Human-machine interaction apparatus and method

Publications (1)

Publication Number Publication Date
WO2015154419A1 (fr) 2015-10-15

Family

ID=54287187

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/089020 WO2015154419A1 (fr) 2014-10-21 Human-machine interaction device and method

Country Status (2)

Country Link
CN (1) CN105389097A (fr)
WO (1) WO2015154419A1 (fr)


Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107452381B (zh) * 2016-05-30 2020-12-29 中国移动通信有限公司研究院 Multimedia speech recognition apparatus and method
CN108227903B (zh) * 2016-12-21 2020-01-10 深圳市掌网科技股份有限公司 Virtual reality language interaction system and method
CN107293300A (zh) * 2017-08-01 2017-10-24 珠海市魅族科技有限公司 Speech recognition method and apparatus, computer apparatus, and readable storage medium
CN107679449B (zh) * 2017-08-17 2018-08-03 平安科技(深圳)有限公司 Lip motion capture method, apparatus, and storage medium
US11836592B2 * 2017-12-15 2023-12-05 International Business Machines Corporation Communication model for cognitive systems
CN108154140A (zh) * 2018-01-22 2018-06-12 北京百度网讯科技有限公司 Lip-language-based voice wakeup method, apparatus, device, and computer-readable medium
CN111326152A (zh) * 2018-12-17 2020-06-23 南京人工智能高等研究院有限公司 Voice control method and apparatus
WO2020172828A1 (fr) * 2019-02-27 2020-09-03 华为技术有限公司 Sound source separation method, apparatus, and device
CN110111783A (zh) * 2019-04-10 2019-08-09 天津大学 Multimodal speech recognition method based on a deep neural network
CN110335600A (zh) * 2019-07-09 2019-10-15 四川长虹电器股份有限公司 Multimodal interaction method and system for home appliances
CN110765868A (zh) * 2019-09-18 2020-02-07 平安科技(深圳)有限公司 Lip-reading model generation method, apparatus, device, and storage medium
CN111063354B (zh) * 2019-10-30 2022-03-25 云知声智能科技股份有限公司 Human-machine interaction method and apparatus
CN111190484B (zh) * 2019-12-25 2023-07-21 中国人民解放军军事科学院国防科技创新研究院 Multimodal interaction system and method
CN111312217A (zh) * 2020-02-28 2020-06-19 科大讯飞股份有限公司 Speech recognition method, apparatus, device, and storage medium
CN111539270A (zh) * 2020-04-10 2020-08-14 贵州合谷信息科技有限公司 High-recognition-rate micro-expression recognition method for voice input methods
CN112908334A (zh) * 2021-01-31 2021-06-04 云知声智能科技股份有限公司 Hearing-aid method, apparatus, and device based on directional sound pickup
CN114708642B (zh) * 2022-05-24 2022-11-18 成都锦城学院 Business English simulation training apparatus, system, method, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100189305A1 (en) * 2009-01-23 2010-07-29 Eldon Technology Limited Systems and methods for lip reading control of a media device
CN101937268A (zh) * 2009-06-30 2011-01-05 索尼公司 Device control based on visual lip-shape recognition
CN102023703A (zh) * 2009-09-22 2011-04-20 现代自动车株式会社 Multimodal interface system combining lip reading and speech recognition
CN103456303A (zh) * 2013-08-08 2013-12-18 四川长虹电器股份有限公司 Voice control method and intelligent air-conditioning system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7860718B2 (en) * 2005-12-08 2010-12-28 Electronics And Telecommunications Research Institute Apparatus and method for speech segment detection and system for speech recognition
CN102324035A (zh) * 2011-08-19 2012-01-18 广东好帮手电子科技股份有限公司 Method and system for applying lip-shape-assisted speech recognition in in-vehicle navigation
CN103745723A (zh) * 2014-01-13 2014-04-23 苏州思必驰信息科技有限公司 Audio signal recognition method and apparatus


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319912A (zh) * 2018-01-30 2018-07-24 歌尔科技有限公司 Lip-language recognition method, apparatus, system, and smart glasses
CN112053690A (zh) * 2020-09-22 2020-12-08 湖南大学 Cross-modal multi-feature fusion audio-visual speech recognition method and system
CN112053690B (zh) * 2020-09-22 2023-12-29 湖南大学 Cross-modal multi-feature fusion audio-visual speech recognition method and system
CN112541956A (zh) * 2020-11-05 2021-03-23 北京百度网讯科技有限公司 Animation synthesis method and apparatus, mobile terminal, and electronic device

Also Published As

Publication number Publication date
CN105389097A (zh) 2016-03-09

Similar Documents

Publication Publication Date Title
WO2015154419A1 (fr) Human-machine interaction device and method
US9779725B2 (en) Voice wakeup detecting device and method
JP6230726B2 (ja) Speech recognition apparatus and speech recognition method
US10109300B2 (en) System and method for enhancing speech activity detection using facial feature detection
KR102216048B1 (ko) Voice command recognition apparatus and method
JP6504808B2 (ja) Imaging apparatus, method for setting a voice command function, computer program, and storage medium
WO2018210219A1 (fr) Device-facing human-computer interaction method and system
WO2018049782A1 (fr) Home appliance control method, device, and system, and intelligent air conditioner
KR101501183B1 (ko) Dual-mode AGC for single and multiple speakers
US11699442B2 (en) Methods and systems for speech detection
US20150279369A1 (en) Display apparatus and user interaction method thereof
US11423896B2 (en) Gaze-initiated voice control
WO2021184549A1 (fr) Monaural earphone, intelligent electronic device, method, and computer-readable medium
US20180009118A1 (en) Robot control device, robot, robot control method, and program recording medium
CN110730115B (zh) Voice control method and apparatus, terminal, and storage medium
CN111131601B (zh) Audio control method, electronic device, chip, and computer storage medium
US9516429B2 (en) Hearing aid and method for controlling hearing aid
WO2017219450A1 (fr) Information processing method and device, and mobile terminal
JP2006251266A (ja) Audio-visual coordinated recognition method and apparatus
JP5797009B2 (ja) Speech recognition apparatus, robot, and speech recognition method
KR20210011146A (ko) Apparatus and method for providing a service based on a non-voice wakeup signal
WO2022199405A1 (fr) Voice control method and apparatus
CN113643707A (zh) Identity verification method and apparatus, and electronic device
CN104423992A (zh) Method for activating speech recognition on a display
KR102265874B1 (ko) Multimodal-based user distinction method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14888851

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14888851

Country of ref document: EP

Kind code of ref document: A1