WO2020073403A1 - Silent voice input identification method, computing apparatus, and computer-readable medium - Google Patents

Silent voice input identification method, computing apparatus, and computer-readable medium

Info

Publication number
WO2020073403A1
WO2020073403A1 (PCT/CN2018/114608)
Authority
WO
WIPO (PCT)
Prior art keywords
mouth
user
silent
input
feature
Prior art date
Application number
PCT/CN2018/114608
Other languages
French (fr)
Chinese (zh)
Inventor
喻纯
孙科
史元春
Original Assignee
清华大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 清华大学 (Tsinghua University)
Publication of WO2020073403A1 publication Critical patent/WO2020073403A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Definitions

  • The present invention relates generally to lip-reading ("lip language") input technology, and in particular to silent speech input recognition methods, devices, and computer-readable media.
  • Silent Speech Input
  • Silent speech input refers to an input interaction with a computing device in which the user communicates through speech but does not actually vocalize, only forming the mouth shapes corresponding to the spoken content.
  • Silent speech input is well suited to settings such as meetings, where speaking aloud is inappropriate and extended finger-based input is inconvenient, and it offers very good privacy.
  • A device that supports silent speech input recognizes what the user is saying by capturing the signals (or images) produced by the user's mouth movements through one or more specific sensors (such as electromyography sensors or cameras).
  • The devices addressed here capture and recognize image sequences of the user's moving mouth through a camera (the patent is not tied to a specific capture method; any method may be used, the camera being one important option).
  • When using a smartphone, computer, or head-mounted device, the user issues a voice command or content in the form of silent speech; the camera on the device captures it, the command or content is recognized, and the computing device then responds with corresponding feedback.
  • A key issue is how the computing device determines whether the user is actually performing silent speech input, rather than the user's mouth performing other natural movements or producing voiced speech.
  • A device that supports silent speech input captures a signal produced by the user's mouth movements through one or more specific sensors and analyzes that signal to recognize what the user is saying.
  • The prior art focuses mainly on how to process the mouth motion signal to recognize the content spoken by the user; there is as yet no technique by which a computing device judges whether the user is actually performing silent speech input.
  • The inventors observe that humans make many mouth movements, such as chewing, yawning, and unconscious movements such as pursing the lips. Recognizing speech input directly from such movements would cause very large errors, so distinguishing them from speech input is a prerequisite for accurate recognition of speech input.
  • This document therefore proposes a technique by which a computing device determines whether the user is actually performing silent speech input, rather than the user's mouth performing other natural movements or producing voiced speech.
  • A silent speech input recognition method comprising: obtaining a feature sequence of the user's moving mouth; using a pre-trained mouth motion discriminator to determine whether the feature sequence represents language input or other mouth movement; when the feature sequence is determined to represent language input, determining whether the user is performing silent language input; and when the user is determined to be performing silent language input, recognizing the content of the silent language input.
  • The moving-mouth feature sequence may be extracted from a moving-mouth image sequence captured by an electromyography sensor.
  • The moving-mouth feature sequence may be extracted from a moving-mouth image sequence captured by a camera.
  • The moving-mouth image data may be one or a combination of RGB data, structured light, infrared point cloud data, and depth point cloud data.
  • The moving-mouth image sequence may be obtained as follows: recognizing the position of the user's face based on machine learning, extracting the user's facial feature points, and acquiring real-time images of the user's mouth through the feature points.
  • The user moving-mouth feature sequence input to the mouth motion discriminator includes at least three feature data segments identifying three states: the first segment characterizes the mouth beginning to move, the second characterizes the mouth moving continuously, and the third characterizes the mouth stopping.
  • The discriminator is a binary classifier, trained on collected user data using machine learning methods.
  • When the user moving-mouth feature sequence is determined to represent language input, determining whether the user is performing silent language input includes: determining the degree of matching between the mouth feature sequence and the sound signal sequence according to a predetermined matching model between mouth features and sound signals under silent language input, and determining that the user is performing silent language input if the degree of matching exceeds a predetermined threshold.
  • The silent speech input recognition method may further include: after recognizing the content of the silent language input, responding with the recognized instruction or content.
  • A computing device comprising: a sensor capable of capturing mouth motion signals; and a controller and a memory, the memory storing computer-executable instructions that, when executed by the controller, are operable to perform the aforementioned silent speech input recognition method.
  • A computer-readable storage medium having computer-executable instructions stored thereon that, when executed by a computer, are operable to perform the aforementioned silent speech input recognition method.
  • The computing device first determines whether the user is actually performing silent speech input, rather than the user's mouth performing other natural movements or producing voiced speech; by filtering out irrelevant input, the recognition accuracy of the silent speech input content can be improved.
  • FIG. 1 shows an overall flowchart of a computer-implemented silent speech input recognition method 1000 according to an embodiment of the present invention.
  • FIG. 2 shows a schematic diagram of the operation and signal flow of hardware and/or software modules according to an embodiment of the present invention.
  • Silent speech input refers to the input behavior of making speaking movements with the mouth without vocalizing; it is sometimes called "lip language".
  • In step S1100, a feature sequence of the user's moving mouth is obtained.
  • The user moving-mouth feature sequence here may be any feature sequence describing the movement of the user's mouth.
  • For example, it may be a feature sequence extracted from a moving-mouth image sequence captured by a camera.
  • The image data can be one or a combination of RGB data, structured-light data, infrared point cloud data, and depth point cloud data.
  • A moving-mouth image sequence may be obtained, for example, by recognizing the position of the user's face based on machine learning, extracting the user's facial feature points, and acquiring real-time images of the user's mouth through the feature points.
  • In step S1200, a pre-trained mouth motion discriminator is used to determine whether the user moving-mouth feature sequence represents language input or other mouth movement.
  • In one example, the user moving-mouth feature sequence input to the mouth motion discriminator includes at least three feature data segments identifying three states: the first segment characterizes the mouth beginning to move, the second characterizes the mouth moving continuously, and the third characterizes the mouth stopping.
  • The mouth motion discriminator extracts the user's mouth motion sequence from the input mouth image sequence; specifically, based on the mouth feature points and image information, it determines which of the following four states applies: (1) the mouth starts moving, (2) the mouth keeps moving, (3) the mouth stops moving, or (4) other.
  • The result of extracting the user's mouth motion sequence is the mouth image sequence from state (1) to state (3).
  • The discriminator requires collecting user data and using machine learning to train and apply the model.
  • The discriminator is a binary classifier trained on collected user data using machine learning methods. It determines whether the mouth movement corresponds to speaking a natural language, rather than to confusable mouth movements arising in other situations, including but not limited to the user eating, yawning, or making unconscious movements.
  • In step S1300, when the user moving-mouth feature sequence is determined to represent language input, it is determined whether the user is performing silent language input.
  • Determining whether the user is performing silent language input includes: determining the degree of matching between the mouth feature sequence and the sound signal sequence according to a predetermined matching model between mouth features and sound signals under silent language input, and determining that the user is performing silent language input if the degree of matching exceeds a predetermined threshold.
  • In one example, the input is the mouth motion image sequence and the human voice signal collected by the microphone over the same interval, and the output is the degree of matching p between the two signals. If p is greater than a certain threshold, the mouth motion image sequence is determined to be a voiced sequence, i.e., the user is performing voiced speech input; otherwise, it is determined that the user is indeed performing silent speech input.
  • This determiner requires collecting user data and using machine learning to train and apply the model.
  • In step S1400, when it is determined that the user is performing silent language input, the content of the silent language input is recognized.
  • A device that supports silent speech input (such as a mobile phone or tablet) recognizes what the user says by capturing, through one or more specific sensors (such as electromyography sensors or cameras), the signals (or images) produced by the user's mouth movements.
  • In one example, the computing device captures and recognizes image sequences of the user's moving mouth through a camera.
  • The user issues a voice command or content in the form of silent speech; the camera on the device captures it, the command or content is recognized, and the computing device then responds with corresponding feedback.
  • The camera 104 acquires image sequences of the user 102 in real time; the image information may include, but is not limited to, RGB data, structured-light or infrared point cloud data, and depth point cloud data.
  • Face recognition module: uses machine learning and computer vision to recognize the position of the user's face and extract the user's facial feature points, and obtains real-time images of the user's mouth through the feature points; the image information may still include, but is not limited to, RGB and point cloud data.
  • Mouth motion sequence extraction module: one example is a discriminator that, based on the mouth feature points and image information, determines which of the following four states applies: (1) the mouth starts moving, (2) the mouth keeps moving, (3) the mouth stops moving, or (4) other.
  • The output of this module is the mouth image sequence from state (1) to state (3).
  • The discriminator requires collecting user data and using machine learning to train and apply the model.
  • Language input detection module: one example is a binary classifier that, based on the mouth motion image sequence output by the mouth motion sequence extraction module 108, determines whether the mouth movement corresponds to speaking a natural language rather than to confusable mouth movements arising in other situations, including but not limited to the user eating, yawning, or making unconscious movements.
  • This classifier requires collecting user data and using machine learning to train and apply the model.
  • The input is the mouth motion image sequence output by the mouth motion sequence extraction module 108 and the human voice signal collected by the microphone over the same interval; the output is the degree of matching p of these two signals. If p is greater than a certain threshold, the mouth motion image sequence is determined to be a voiced sequence, i.e., the user is performing voiced speech input; otherwise, it is determined that the user is indeed performing silent speech input.
  • This module requires collecting user data and using machine learning to train and apply the model.
  • The final recognition model recognizes the instruction or content issued by the user.
  • A silent speech input recognition apparatus comprising: a user moving-mouth feature sequence obtaining component that obtains the user moving-mouth feature sequence; a language input detection module that uses a pre-trained mouth motion discriminator to determine whether the feature sequence represents language input or other mouth movement; a silent language input judgment module that, when the feature sequence is determined to represent language input, determines whether the user is performing silent language input; and a silent language input content recognition module that, when the user is determined to be performing silent language input, recognizes the content of the silent language input.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

A silent voice input identification method, a computing apparatus, and a computer-readable medium. The silent voice input identification method comprises: obtaining a feature sequence of the user's moving mouth; using a pre-trained mouth movement discriminator to determine whether the feature sequence represents language input or other mouth movements; when the feature sequence is determined to represent language input, determining whether the user is performing silent language input; and when the user is determined to be performing silent language input, identifying the content of the silent language input. It is first determined whether the user is truly performing silent voice input, rather than the user's mouth performing other natural movements or the user speaking aloud; by filtering out irrelevant input, the accuracy of identifying the content of silent voice input can be improved.

Description

Silent speech input recognition method, computing device, and computer-readable medium
Technical Field
The present invention relates generally to lip-reading ("lip language") input technology, and in particular to silent speech input recognition methods, devices, and computer-readable media.
Background Art
With the development of machine learning technology and the improvement of computing device performance, silent speech input has become a promising mode of user input interaction.
Silent speech input refers to an input interaction with a computing device in which the user communicates through speech but does not actually vocalize, only forming the mouth shapes corresponding to the spoken content.
Silent speech input is well suited to settings such as meetings, where speaking aloud is inappropriate and extended finger-based input is inconvenient, and it offers very good privacy.
A device that supports silent speech input recognizes what the user is saying by capturing the signals (or images) produced by the user's mouth movements through one or more specific sensors (such as electromyography sensors or cameras).
The devices addressed here capture and recognize image sequences of the user's moving mouth through a camera (the patent is not tied to a specific capture method; any method may be used, the camera being one important option). For example, when using a smartphone, computer, or head-mounted device, the user issues a voice command or content in the form of silent speech; the camera on the device captures it, the command or content is recognized, and the computing device then responds with corresponding feedback.
A key issue is how the computing device determines whether the user is actually performing silent speech input, rather than the user's mouth performing other natural movements or producing voiced speech.
Summary of the Invention
A device that supports silent speech input captures a signal produced by the user's mouth movements through one or more specific sensors and analyzes that signal to recognize what the user is saying.
In the prior art, the main focus is on how to process the mouth motion signal to recognize the content spoken by the user; there is as yet no technique by which a computing device judges whether the user is actually performing silent speech input.
The inventors of the present invention observe that humans make many mouth movements, such as chewing, yawning, and unconscious movements such as pursing the lips. Recognizing speech input directly from such movements would cause very large errors, so distinguishing them from speech input is a prerequisite for accurate recognition of speech input.
To this end, this document proposes a technique by which a computing device determines whether the user is actually performing silent speech input, rather than the user's mouth performing other natural movements or producing voiced speech.
The devices addressed here capture and recognize image sequences of the user's moving mouth through a camera (the patent is not tied to a specific capture method; any method may be used, the camera being one important option). For example, when using a smartphone, computer, or head-mounted device, the user issues a voice command or content in the form of silent speech; the camera on the device captures it, the command or content is recognized, and the computing device then responds with corresponding feedback.
The present invention has been made in view of the above circumstances.
According to one aspect of the present invention, a silent speech input recognition method is provided, comprising: obtaining a feature sequence of the user's moving mouth; using a pre-trained mouth motion discriminator to determine whether the feature sequence represents language input or other mouth movement; when the feature sequence is determined to represent language input, determining whether the user is performing silent language input; and when the user is determined to be performing silent language input, recognizing the content of the silent language input.
Optionally, the moving-mouth feature sequence is extracted from a moving-mouth image sequence captured by an electromyography sensor.
Optionally, the moving-mouth feature sequence is extracted from a moving-mouth image sequence captured by a camera.
Optionally, the moving-mouth image data is one or a combination of RGB data, structured light, infrared point cloud data, and depth point cloud data.
Optionally, the moving-mouth image sequence is obtained as follows: recognizing the position of the user's face based on machine learning, extracting the user's facial feature points, and acquiring real-time images of the user's mouth through the feature points.
Optionally, the user moving-mouth feature sequence input to the mouth motion discriminator includes at least three feature data segments identifying three states: the first segment characterizes the mouth beginning to move, the second characterizes the mouth moving continuously, and the third characterizes the mouth stopping.
Optionally, the discriminator is a binary classifier, trained on collected user data using machine learning methods.
Optionally, when the user moving-mouth feature sequence is determined to represent language input, determining whether the user is performing silent language input includes: determining the degree of matching between the mouth feature sequence and the sound signal sequence according to a predetermined matching model between mouth features and sound signals under silent language input, and determining that the user is performing silent language input if the degree of matching exceeds a predetermined threshold.
Optionally, the silent speech input recognition method further includes: after recognizing the content of the silent language input, responding with the recognized instruction or content.
According to another aspect of the present invention, a computing device is provided, comprising: a sensor capable of capturing mouth motion signals; and a controller and a memory, the memory storing computer-executable instructions that, when executed by the controller, are operable to perform the aforementioned silent speech input recognition method.
According to yet another aspect of the present invention, a computer-readable storage medium is provided, having computer-executable instructions stored thereon that, when executed by a computer, are operable to perform the aforementioned silent speech input recognition method.
With the silent speech input recognition method of the present invention, the computing device first determines whether the user is actually performing silent speech input, rather than the user's mouth performing other natural movements or producing voiced speech; by filtering out irrelevant input, the recognition accuracy of the silent speech input content can be improved.
Brief Description of the Drawings
These and/or other aspects and advantages of the present invention will become clearer and easier to understand from the following detailed description of embodiments of the present invention with reference to the drawings, in which:
FIG. 1 shows an overall flowchart of a computer-implemented silent speech input recognition method 1000 according to an embodiment of the present invention.
FIG. 2 shows a schematic diagram of the operation and signal flow of hardware and/or software modules according to an embodiment of the present invention.
Detailed Description
To enable those skilled in the art to better understand the present invention, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Before the description, the meanings of relevant terms used herein are explained.
Silent speech input refers to the input behavior of making speaking movements with the mouth without vocalizing; it is sometimes called "lip language".
FIG. 1 shows an overall flowchart of a computer-implemented silent speech input recognition method 1000 according to an embodiment of the present invention.
In step S1100, a feature sequence of the user's moving mouth is obtained.
The user moving-mouth feature sequence here may be any feature sequence describing the movement of the user's mouth. For example, it may be a feature sequence extracted from a moving-mouth image sequence captured by a camera. Depending on the light source and/or camera employed (ordinary camera, structured light source, infrared imaging device, stereo camera), the obtained image data can be one or a combination of RGB data, structured-light data, infrared point cloud data, and depth point cloud data.
When a camera is used to obtain images of the moving mouth, the moving-mouth image sequence may be obtained, for example, as follows: the position of the user's face is recognized based on machine learning, the user's facial feature points are extracted, and real-time images of the user's mouth are acquired through the feature points.
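By way of illustration only (the following sketch is not part of the original disclosure), this face-and-mouth localization step could be realized with an off-the-shelf landmark detector. The sketch assumes the dlib library and its separately downloaded 68-point landmark model; the model path and crop margin are assumptions.

```python
# Sketch: mouth-region extraction via facial landmarks. Assumes dlib with
# its 68-point landmark model (points 48-67 cover the mouth region).
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Assumed model path; the file must be downloaded separately.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def mouth_roi(frame_bgr, margin=10):
    """Return (mouth crop, 20x2 landmark array) for the first detected face, or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(48, 68)],
                   dtype=np.int32)
    x, y, w, h = cv2.boundingRect(pts)
    crop = frame_bgr[max(y - margin, 0):y + h + margin,
                     max(x - margin, 0):x + w + margin]
    return crop, pts
```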
In step S1200, a pre-trained mouth motion discriminator is used to determine whether the user moving-mouth feature sequence represents language input or other mouth movement.
In one example, the user moving-mouth feature sequence input to the mouth motion discriminator includes at least three feature data segments identifying three states: the first segment characterizes the mouth beginning to move, the second characterizes the mouth moving continuously, and the third characterizes the mouth stopping.
For example, when the moving-mouth feature sequence is extracted from images of the user's mouth, the mouth motion discriminator extracts the user's mouth motion sequence from the input mouth image sequence. Specifically, based on the mouth feature points and image information, it determines which of the following four states applies: (1) the mouth starts moving, (2) the mouth keeps moving, (3) the mouth stops moving, or (4) other. The result of extracting the mouth motion sequence is the mouth image sequence from state (1) to state (3). This discriminator requires collecting user data and using machine learning to train and apply the model.
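The patent does not prescribe the segmentation logic. Assuming a trained per-frame model already labels each frame with one of the four states (that model is not shown), extracting the subsequences from state (1) to state (3) could look like this sketch:

```python
# Sketch: cut a labeled frame stream into motion segments running from
# state START (1) through CONTINUE (2) to STOP (3); frames labeled
# OTHER (4) reset the current segment. The per-frame labels are assumed
# to come from a trained classifier not shown here.
START, CONTINUE, STOP, OTHER = 1, 2, 3, 4

def extract_motion_segments(frames, states):
    segments, current = [], None
    for frame, state in zip(frames, states):
        if state == START:
            current = [frame]                 # open a new segment
        elif state == CONTINUE and current is not None:
            current.append(frame)
        elif state == STOP and current is not None:
            current.append(frame)
            segments.append(current)          # complete segment: (1) .. (3)
            current = None
        else:
            current = None                    # OTHER or stray label: reset
    return segments
```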
The discriminator is a binary classifier, trained on collected user data using machine learning methods. It determines whether the mouth movement corresponds to speaking a natural language, rather than to confusable mouth movements arising in other situations, including but not limited to the user eating, yawning, or making unconscious movements. This discriminator requires collecting user data and using machine learning to train and apply the model.
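As a hedged illustration of one possible discriminator (the patent does not fix a model), pooled geometric features per segment could feed an off-the-shelf binary classifier; the feature design below, mouth-opening and mouth-width statistics, is an assumption of this sketch:

```python
# Sketch: a speech-vs-other binary classifier over pooled mouth features.
# The feature design (mean/std of mouth opening and width per segment) is
# an illustrative assumption, not something the patent specifies.
import numpy as np
from sklearn.svm import SVC

def segment_features(landmark_seq):
    """landmark_seq: array of shape (T, 20, 2), mouth landmarks per frame."""
    opening = np.linalg.norm(landmark_seq[:, 14] - landmark_seq[:, 18], axis=1)
    width = np.linalg.norm(landmark_seq[:, 0] - landmark_seq[:, 6], axis=1)
    return np.array([opening.mean(), opening.std(), width.mean(), width.std()])

def train_discriminator(segments, labels):
    """segments: list of landmark sequences; labels: 1 = speaking, 0 = other."""
    feats = np.stack([segment_features(seq) for seq in segments])
    return SVC(probability=True).fit(feats, labels)
```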
In step S1300, when the user moving-mouth feature sequence is determined to represent language input, it is determined whether the user is performing silent language input.
In one example, determining whether the user is performing silent language input includes: determining the degree of matching between the mouth feature sequence and the sound signal sequence according to a predetermined matching model between mouth features and sound signals under silent language input, and determining that the user is performing silent language input if the degree of matching exceeds a predetermined threshold.
Specifically, in one example, the input is the mouth motion image sequence and the human voice signal collected by the microphone over the same interval, and the output is the degree of matching p between the two signals. If p is greater than a certain threshold, the mouth motion image sequence is determined to be a voiced sequence, i.e., the user is performing voiced speech input; otherwise, it is determined that the user is indeed performing silent speech input. This determiner requires collecting user data and using machine learning to train and apply the model.
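The matching model itself is learned from user data. As a crude, illustrative stand-in only, the degree of matching p could be approximated by correlating the per-frame mouth-opening envelope with the short-time audio energy over the same interval:

```python
# Sketch: a naive audio-visual matching degree p, computed as the
# correlation between per-frame mouth opening and short-time audio energy
# over the same interval. A real system would learn this matching model
# from user data; this proxy is for illustration only.
import numpy as np

def matching_degree(mouth_opening, audio, sample_rate, fps):
    hop = int(sample_rate / fps)             # audio samples per video frame
    n = min(len(mouth_opening), len(audio) // hop)
    if n < 2:
        return 0.0
    energy = np.array([np.sum(audio[i * hop:(i + 1) * hop] ** 2) for i in range(n)])
    p = np.corrcoef(mouth_opening[:n], energy)[0, 1]
    return 0.0 if np.isnan(p) else float(p)

def is_silent(mouth_opening, audio, sample_rate, fps, threshold=0.5):
    # p above the threshold suggests voiced speech; below it, the segment
    # is treated as silent speech input (threshold value is an assumption).
    return matching_degree(mouth_opening, audio, sample_rate, fps) < threshold
```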
In step S1400, when it is determined that the user is performing silent language input, the content of the silent language input is recognized.
There is no restriction here on the technique used to recognize the content of the silent language input; any technique capable of doing so may be employed, whether existing or developed in the future.
Regarding application scenarios of the silent speech input technique of the present invention, in one example a device that supports silent speech input (such as a mobile phone or tablet) recognizes what the user says by capturing, through one or more specific sensors (such as electromyography sensors or cameras), the signals (or images) produced by the user's mouth movements.
In a more specific example, the computing device captures and recognizes image sequences of the user's moving mouth through a camera. For example, when using a smartphone, computer, or head-mounted device, the user issues a voice command or content in the form of silent speech; the camera on the device captures it, the command or content is recognized, and the computing device responds with corresponding feedback. For example, if the user mouths "打开微信" ("Open WeChat"), the phone recognizes the command and launches the WeChat application.
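Responding to a recognized instruction can be as simple as a lookup from recognized text to a device action; the command table and the launch_app helper below are hypothetical, not a real platform API:

```python
# Sketch: map recognized silent-speech commands to device actions.
# launch_app is a hypothetical platform helper, not a real API.
def launch_app(package_name):
    raise NotImplementedError("platform-specific app launcher")

COMMANDS = {
    "打开微信": lambda: launch_app("com.tencent.mm"),  # "Open WeChat"
}

def respond(recognized_text):
    action = COMMANDS.get(recognized_text)
    if action is not None:
        action()
```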
FIG. 2 shows a schematic diagram of the operation and signal flow of hardware and/or software modules according to an embodiment of the present invention.
102, 104: The camera 104 acquires image sequences of the user 102 in real time; the image information may include, but is not limited to, RGB data, structured-light or infrared point cloud data, and depth point cloud data.
106 Face recognition module: uses machine learning and computer vision methods to recognize the position of the user's face and extract the user's facial feature points, and obtains real-time images of the user's mouth through the feature points; the image information may still include, but is not limited to, RGB and point cloud data.
108 Mouth motion sequence extraction module: one example is a discriminator that, based on the mouth feature points and image information, determines which of the following four states applies: (1) the mouth starts moving, (2) the mouth keeps moving, (3) the mouth stops moving, or (4) other. The output of this module is the mouth image sequence from state (1) to state (3). This discriminator requires collecting user data and using machine learning to train and apply the model.
110 Language input detection module: one example is a binary classifier that, based on the mouth motion image sequence output by the mouth motion sequence extraction module 108, determines whether the mouth movement corresponds to speaking a natural language rather than to confusable mouth movements arising in other situations, including but not limited to the user eating, yawning, or making unconscious movements. This classifier requires collecting user data and using machine learning to train and apply the model.
112 Sound signal detection module: the input is the mouth motion image sequence output by the mouth motion sequence extraction module 108 and the human voice signal collected by the microphone over the same interval; the output is the degree of matching p of these two signals. If p is greater than a certain threshold, the mouth motion image sequence is determined to be a voiced sequence, i.e., the user is performing voiced speech input; otherwise, it is determined that the user is indeed performing silent speech input. This module requires collecting user data and using machine learning to train and apply the model.
114: The final recognition model recognizes the instruction or content issued by the user.
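Read end to end, modules 104 through 114 form a pipeline. The sketch below wires the earlier illustrative helpers into that flow; label_states (the per-frame state model), discriminator, and recognize are assumed stand-ins, and audio/video alignment is deliberately simplified:

```python
# Sketch: end-to-end flow of FIG. 2 built from the illustrative helpers
# above. label_states, discriminator, and recognize are assumed stand-ins;
# audio/video synchronization is deliberately simplified.
import numpy as np

def silent_speech_pipeline(frames, audio, sample_rate, fps,
                           label_states, discriminator, recognize):
    rois = [r for r in (mouth_roi(f) for f in frames) if r is not None]   # module 106
    landmarks = np.stack([pts for _, pts in rois])
    states = label_states(landmarks)                                      # module 108 (assumed model)
    for segment in extract_motion_segments(list(range(len(states))), states):
        seg_lm = landmarks[segment]
        feats = segment_features(seg_lm)
        if discriminator.predict(feats[None])[0] != 1:                    # module 110: not speech
            continue
        opening = np.linalg.norm(seg_lm[:, 14] - seg_lm[:, 18], axis=1)
        if is_silent(opening, audio, sample_rate, fps):                   # module 112
            return recognize(seg_lm)                                      # module 114 (assumed)
    return None
```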
According to another aspect of the present invention, a computing device is provided, comprising: a sensor capable of capturing mouth motion signals; and a controller and a memory, the memory storing computer-executable instructions that, when executed by the controller, are operable to perform the aforementioned silent speech input recognition method.
According to yet another aspect of the present invention, a computer-readable storage medium is provided, having computer-executable instructions stored thereon that, when executed by a computer, are operable to perform the aforementioned silent speech input recognition method.
According to another aspect of the present invention, a silent speech input recognition apparatus is provided, comprising: a user moving-mouth feature sequence obtaining component that obtains the user moving-mouth feature sequence; a language input detection module that uses a pre-trained mouth motion discriminator to determine whether the feature sequence represents language input or other mouth movement; a silent language input judgment module that, when the feature sequence is determined to represent language input, determines whether the user is performing silent language input; and a silent language input content recognition module that, when the user is determined to be performing silent language input, recognizes the content of the silent language input.
With the silent speech input recognition method of the present invention, the computing device first determines whether the user is actually performing silent speech input, rather than the user's mouth performing other natural movements or producing voiced speech; by filtering out irrelevant input, the recognition accuracy of the silent speech input content can be improved.
Embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. Therefore, the protection scope of the present invention should be determined by the scope of the claims.

Claims (11)

  1. A silent speech input recognition method, comprising:
    obtaining a feature sequence of the user's moving mouth;
    using a pre-trained mouth motion discriminator to determine whether the user moving-mouth feature sequence represents language input or other mouth movement;
    when the user moving-mouth feature sequence is determined to represent language input, determining whether the user is performing silent language input; and
    when the user is determined to be performing silent language input, recognizing the content of the silent language input.
  2. The silent speech input recognition method according to claim 1, wherein the moving-mouth feature sequence is extracted from a moving-mouth image sequence captured by an electromyography sensor.
  3. The silent speech input recognition method according to claim 1, wherein the moving-mouth feature sequence is extracted from a moving-mouth image sequence captured by a camera.
  4. The silent speech input recognition method according to claim 3, wherein the moving-mouth image data is one or a combination of RGB data, structured light, infrared point cloud data, and depth point cloud data.
  5. The silent speech input recognition method according to claim 3, wherein the moving-mouth image sequence is obtained as follows:
    recognizing the position of the user's face based on machine learning and extracting the user's facial feature points, and acquiring real-time images of the user's mouth through the feature points.
  6. The silent speech input recognition method according to claim 1, wherein the user moving-mouth feature sequence input to the mouth motion discriminator includes at least three feature data segments identifying three states: the first feature data segment characterizes the mouth beginning to move, the second feature data segment characterizes the mouth moving continuously, and the third feature data segment characterizes the mouth stopping.
  7. The silent speech input recognition method according to claim 1, wherein the discriminator is a binary classifier, trained on collected user data using machine learning methods.
  8. The silent speech input recognition method according to claim 1, wherein, when the user moving-mouth feature sequence is determined to represent language input, determining whether the user is performing silent language input comprises:
    determining the degree of matching between the mouth feature sequence and the sound signal sequence according to a predetermined matching model between mouth features and sound signals under silent language input, and determining that the user is performing silent language input if the degree of matching exceeds a predetermined threshold.
  9. The silent speech input recognition method according to claim 8, further comprising:
    after recognizing the content of the silent language input, responding with the recognized instruction or content.
  10. A computing device, comprising:
    a sensor capable of capturing mouth motion signals;
    a controller and a memory, the memory storing computer-executable instructions that, when executed by the controller, are operable to perform the silent speech input recognition method according to any one of claims 1 to 8.
  11. A computer-readable storage medium having computer-executable instructions stored thereon that, when executed by a computer, are operable to perform the silent speech input recognition method according to any one of claims 1 to 8.
PCT/CN2018/114608 2018-10-08 2018-11-08 Silent voice input identification method, computing apparatus, and computer-readable medium WO2020073403A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811168994.2A CN109558788B (en) 2018-10-08 2018-10-08 Silence voice input identification method, computing device and computer readable medium
CN201811168994.2 2018-10-08

Publications (1)

Publication Number Publication Date
WO2020073403A1 true WO2020073403A1 (en) 2020-04-16

Family

ID=65864802

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/114608 WO2020073403A1 (en) 2018-10-08 2018-11-08 Silent voice input identification method, computing apparatus, and computer-readable medium

Country Status (2)

Country Link
CN (1) CN109558788B (en)
WO (1) WO2020073403A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220051676A1 (en) * 2020-08-14 2022-02-17 Lenovo (Singapore) Pte. Ltd. Headset boom with infrared lamp(s) and/or sensor(s)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110223711B (en) * 2019-06-03 2021-06-01 清华大学 Microphone signal based voice interaction wake-up electronic device, method, and medium
CN110865705B (en) * 2019-10-24 2023-09-19 中国人民解放军军事科学院国防科技创新研究院 Multi-mode fusion communication method and device, head-mounted equipment and storage medium
CN113160813B (en) * 2021-02-24 2022-12-27 北京三快在线科技有限公司 Method and device for outputting response information, electronic equipment and storage medium
CN113810819B (en) * 2021-09-23 2022-06-28 中国科学院软件研究所 Method and equipment for acquiring and processing silent voice based on ear cavity vibration
CN115857706B (en) * 2023-03-03 2023-06-06 浙江强脑科技有限公司 Character input method and device based on facial muscle state and terminal equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102023703A (en) * 2009-09-22 2011-04-20 现代自动车株式会社 Combined lip reading and voice recognition multimodal interface system
CN104808794A (en) * 2015-04-24 2015-07-29 北京旷视科技有限公司 Method and system for inputting lip language
CN105335755A (en) * 2015-10-29 2016-02-17 武汉大学 Media segment-based speaking detection method and system
WO2016148322A1 (en) * 2015-03-19 2016-09-22 삼성전자 주식회사 Method and device for detecting voice activity based on image information
CN106250829A (en) * 2016-07-22 2016-12-21 中国科学院自动化研究所 Digit recognition method based on lip texture structure
CN107358167A (en) * 2017-06-19 2017-11-17 西南科技大学 A kind of method of discrimination of yawning based on active infrared video

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101101752B (en) * 2007-07-19 2010-12-01 华中科技大学 Monosyllabic language lip-reading recognition system based on vision character
CN101950249B (en) * 2010-07-14 2012-05-23 北京理工大学 Input method and device for code characters of silent voice notes
US20140379351A1 (en) * 2013-06-24 2014-12-25 Sundeep Raniwala Speech detection based upon facial movements
DE112014007265T5 (en) * 2014-12-18 2017-09-07 Mitsubishi Electric Corporation Speech recognition device and speech recognition method
CN105912092B (en) * 2016-04-06 2019-08-13 北京地平线机器人技术研发有限公司 Voice awakening method and speech recognition equipment in human-computer interaction
CN108154140A (en) * 2018-01-22 2018-06-12 北京百度网讯科技有限公司 Voice awakening method, device, equipment and computer-readable medium based on lip reading
CN108537207B (en) * 2018-04-24 2021-01-22 Oppo广东移动通信有限公司 Lip language identification method, device, storage medium and mobile terminal

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102023703A (en) * 2009-09-22 2011-04-20 现代自动车株式会社 Combined lip reading and voice recognition multimodal interface system
WO2016148322A1 (en) * 2015-03-19 2016-09-22 삼성전자 주식회사 Method and device for detecting voice activity based on image information
CN104808794A (en) * 2015-04-24 2015-07-29 北京旷视科技有限公司 Method and system for inputting lip language
CN105335755A (en) * 2015-10-29 2016-02-17 武汉大学 Media segment-based speaking detection method and system
CN106250829A (en) * 2016-07-22 2016-12-21 中国科学院自动化研究所 Digit recognition method based on lip texture structure
CN107358167A (en) * 2017-06-19 2017-11-17 西南科技大学 A kind of method of discrimination of yawning based on active infrared video

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220051676A1 (en) * 2020-08-14 2022-02-17 Lenovo (Singapore) Pte. Ltd. Headset boom with infrared lamp(s) and/or sensor(s)
US11935538B2 (en) * 2020-08-14 2024-03-19 Lenovo (Singapore) Pte. Ltd. Headset boom with infrared lamp(s) and/or sensor(s)

Also Published As

Publication number Publication date
CN109558788A (en) 2019-04-02
CN109558788B (en) 2023-10-27

Similar Documents

Publication Publication Date Title
WO2020073403A1 (en) Silent voice input identification method, computing apparatus, and computer-readable medium
CN109192204B (en) Voice control method based on intelligent equipment camera and intelligent equipment
CN104361276B (en) A kind of multi-modal biological characteristic identity identifying method and system
TWI661363B (en) Smart robot and human-computer interaction method
CN103824481B (en) Method and device for detecting user recitation
JP5323770B2 (en) User instruction acquisition device, user instruction acquisition program, and television receiver
US20130054240A1 (en) Apparatus and method for recognizing voice by using lip image
TW201937344A (en) Smart robot and man-machine interaction method
WO2017219450A1 (en) Information processing method and device, and mobile terminal
JP2010256391A (en) Voice information processing device
US11062126B1 (en) Human face detection method
WO2020140840A1 (en) Method and apparatus for awakening wearable device
Patil et al. LSTM Based Lip Reading Approach for Devanagiri Script
Shinde et al. Real time two way communication approach for hearing impaired and dumb person based on image processing
JP2007199552A (en) Device and method for speech recognition
KR101187600B1 (en) Speech Recognition Device and Speech Recognition Method using 3D Real-time Lip Feature Point based on Stereo Camera
KR20220041891A (en) How to enter and install facial information into the database
JP6147198B2 (en) robot
US20140037150A1 (en) Information processing device
KR101950721B1 (en) Safety speaker with multiple AI module
KR20210066774A (en) Method and Apparatus for Distinguishing User based on Multimodal
JP7032284B2 (en) A device, program and method for estimating the activation timing based on the image of the user's face.
CN110653812B (en) Interaction method of robot, robot and device with storage function
JP6396813B2 (en) Program, apparatus and method for estimating learning items spent on learning from learning video
CN112567455A (en) Method and system for cleansing sound using depth information and computer readable medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18936269

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18936269

Country of ref document: EP

Kind code of ref document: A1