CN111091823A - Robot control system and method based on voice and human face actions and electronic equipment - Google Patents
- Publication number
- CN111091823A (Application CN201911188246.5A)
- Authority
- CN
- China
- Prior art keywords
- voice
- image
- action
- information
- collector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Manipulator (AREA)
Abstract
The invention discloses a robot control system based on voice and human face actions. The system comprises a voice collector, a video collector, an algorithm analysis processor, a memory, a controller and a driving module; the voice collector and the video collector are connected to the algorithm analysis processor, the memory and the controller are connected to the algorithm analysis processor, and the driving module is connected to the controller. The algorithm analysis processor recognizes, analyzes and processes the voice information from the voice collector and the image information from the video collector to obtain action control signals, which it sends to the controller so that the controller drives the robot, through the driving module, to execute the corresponding actions. By recognizing and processing both facial actions and voice, the invention tracks the trend of a person's movements over time, which not only improves recognition accuracy but also reduces the computational load of the algorithm.
Description
Technical Field
The invention relates to the technical field of voice, video and image recognition, and in particular to a robot control system and method based on voice and human face actions.
Background
At present, domestic robot-interaction methods acquire the information needed for a robot's next action either through voice recognition alone or by supplementing it with recognition of conventional lip movements.
In the prior art, however, controlling a robot by voice recognition alone gives poor accuracy. Even when lip-reading is added, recognition is limited to simple, conventional lip movements; unusual or complex lip actions still cannot be recognized, so accuracy remains low. Moreover, the algorithms used to match lip movements are overly complex and slow to process.
Disclosure of Invention
To overcome the defects of the prior art, a first objective of the present invention is to provide a robot control system based on voice and human face actions that addresses the prior art's problems of complex algorithms, slow processing and low accuracy.
A second objective of the present invention is to provide a robot control method based on voice and human face actions that addresses the same problems of complex algorithms, slow processing and low accuracy.
A third objective of the present invention is to provide an electronic device that addresses the same problems of complex algorithms, slow processing and low accuracy.
The first objective of the invention is achieved by the following technical solution:
the robot control system based on voice and human face actions comprises a voice collector, a video collector, an algorithm analysis processor, a memory, a controller and a driving module, wherein the voice collector and the video collector are connected with the algorithm analysis processor, the memory and the controller are connected with the algorithm analysis processor, and the driving module is connected with the controller;
the algorithm analysis processor is used for identifying, analyzing and processing the voice information from the voice collector and the image information from the video collector to obtain action control signals and sending the action control signals to the controller, so that the controller controls the driving module to drive the robot to execute corresponding actions according to the action control signals.
Preferably, the system further comprises a voice player, and the voice player is connected with the algorithm analysis processor.
Preferably, the system further comprises a human-computer interaction device connected to the algorithm analysis processor, wherein the algorithm analysis processor displays the action control signal on the human-computer interaction device and receives a confirmation signal from it before storing the action control signal in the memory.
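The component wiring described above can be sketched in code. This is a minimal, hedged illustration with invented class and method names (the patent specifies components, not an API): the processor cross-checks voice content against lip-read text, looks up the bound action in memory, and hands it to the controller, which drives the robot.

```python
class DriveModule:
    """Stand-in for the driving module; records executed actions."""
    def __init__(self):
        self.executed = []

    def execute(self, action):
        self.executed.append(action)


class Controller:
    """Forwards action control signals to the driving module."""
    def __init__(self, drive):
        self.drive = drive

    def on_control_signal(self, signal):
        self.drive.execute(signal)


class AlgorithmAnalysisProcessor:
    """Cross-checks voice content against lip-read text, then looks up
    the bound control action and hands it to the controller."""
    def __init__(self, memory, controller):
        self.memory = memory          # instruction -> control-action bindings
        self.controller = controller

    def process(self, voice_text, lip_text):
        # Agreement between the two channels yields instruction information
        # (simplified here to exact string equality).
        if voice_text != lip_text:
            return None
        action = self.memory.get(voice_text)
        if action is not None:
            self.controller.on_control_signal(action)
        return action


drive = DriveModule()
processor = AlgorithmAnalysisProcessor(
    {"move forward": "MOVE_FORWARD"}, Controller(drive))
processor.process("move forward", "move forward")
```

In this sketch a mismatch between the two channels yields no action at all; the patent itself does not say how disagreements are resolved.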
The second objective of the invention is achieved by the following technical solution:
the robot control method based on voice and human face actions comprises the following steps:
receiving a voice signal from a voice collector and receiving an image signal from a video collector;
recognizing and analyzing the voice signal and processing it into voice content; recognizing and analyzing the image information and processing it into text content;
processing and comparing the voice content and the text content to obtain instruction information;
receiving confirmation information input by the user and a control action signal selected by the user, binding the instruction information to the control action signal to form action information, and storing the action information;
matching newly received instruction information against the stored action information to obtain the corresponding control action signal;
and sending the control action signal to the controller, so that the controller controls the driving module to drive the robot to execute the corresponding action.
Preferably, the step of "recognizing and analyzing the image information and processing it into text content" specifically comprises:
performing grayscale processing on the image information to obtain a lip image of the face;
acquiring, as a function of time, the image of any single line of the lip image taken along the lip width direction, recorded as X(t); the collection of these line images across the width direction is recorded as H(t), i.e. H(t) is the set of the X(t);
acquiring, as a function of time, the image of any single line of the lip image taken along the lip height direction, recorded as Y(t); the collection of these line images across the height direction is recorded as V(t), i.e. V(t) is the set of the Y(t);
and comparing how the lips in the lip image change over time in the height and width directions, and analyzing these trends to obtain the current text content.
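The lip-line sampling just described can be sketched as follows. This is a rough illustration, assuming each frame is a 2-D grayscale array and approximating the lip extent along a fixed row (width direction) and column (height direction) by counting dark pixels; the threshold and all names are assumptions, not the patent's actual algorithm.

```python
# Approximate the H(t)/V(t) trend collections: for each frame, measure the
# dark-pixel extent along one row (lip width) and one column (lip height).
def lip_extents(frames, row, col, threshold=128):
    """Return (H, V): lip extent trends along width and height over time."""
    H, V = [], []
    for frame in frames:
        x_t = frame[row]                    # line along the lip width direction
        y_t = [r[col] for r in frame]       # line along the lip height direction
        H.append(sum(1 for p in x_t if p < threshold))
        V.append(sum(1 for p in y_t if p < threshold))
    return H, V


# Two tiny 3x3 "frames": the dark lip region widens between frame 1 and 2.
f1 = [[255, 255, 255],
      [255,   0, 255],
      [255, 255, 255]]
f2 = [[0, 0, 0],
      [0, 0, 0],
      [0, 0, 0]]
H, V = lip_extents([f1, f2], row=1, col=1)
print(H, V)  # prints [1, 3] [1, 3]
```

The rising values in H and V are the kind of width/height change trend the method then maps to text content; how that mapping is performed is not detailed in the patent.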
The third objective of the invention is achieved by the following technical solution:
an electronic device having a memory, a processor, and a computer readable program stored in the memory and executable by the processor, wherein the computer readable program when executed by the processor implements a robot control method according to a second aspect of the present invention.
Compared with the prior art, the invention has the beneficial effects that:
the invention recognizes and processes the face action and the voice to know the action change trend of the person, thereby not only improving the recognition accuracy, but also simplifying the calculated amount of the algorithm.
Drawings
Fig. 1 is a flowchart of a robot control method based on voice and human face actions according to the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and the detailed description below:
the invention provides a robot control system based on voice and human face actions, which comprises a voice collector, a video collector, an algorithm analysis processor, a memory, a controller and a driving module, wherein the voice collector and the video collector are connected with the algorithm analysis processor, the memory and the controller are connected with the algorithm analysis processor, and the driving module is connected with the controller.
The algorithm analysis processor is used for identifying, analyzing and processing the voice information from the voice collector and the image information from the video collector to obtain action control signals and sending the action control signals to the controller, so that the controller controls the driving module to drive the robot to execute corresponding actions according to the action control signals.
In the present invention, the voice collector includes, but is not limited to, a recording device, a microphone or a sound pickup; any commercially available model may be used. The video collector can be a camera that captures both still images and dynamic video and collects facial image changes in real time.
This embodiment further comprises a voice player connected to the algorithm analysis processor, and a human-computer interaction device also connected to the algorithm analysis processor; the algorithm analysis processor displays the action control signal on the human-computer interaction device and, on receiving a confirmation signal from it, stores the action control signal in the memory.
The invention operates in two modes: a learning mode and a control mode. In the learning mode, the robot enters a learning state: voice information is captured by the voice collector while the video collector records the face, chiefly the dynamics of the mouth. The video stream is converted to grayscale and processed frame by frame to obtain the multi-frame trend of the video, i.e. how the lips change over time in the width and height directions (the spatial positions of the lips at different times). From these trends the current text content is derived; the voice content is obtained from the voice information; the two are compared and matched, and finally processed into instruction information. The user then inputs, through the human-computer interaction device, the control action corresponding to the instruction information, and the algorithm analysis processor binds the control action to the instruction information and stores the pair in the storage module.
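The learning-mode binding step can be sketched minimally. The function and all names below are illustrative assumptions, not the patent's implementation: the matched voice and lip-read text form the instruction, and a user-selected control action is stored against it only after confirmation.

```python
# Learning mode: bind instruction information to a user-chosen control
# action and persist the binding in an in-memory store (a dict here).
def learn(store, voice_text, lip_text, user_action, confirmed=True):
    """Bind instruction information to a control action on confirmation."""
    if voice_text != lip_text:   # the two contents must agree to form an instruction
        return False
    if not confirmed:            # user declined via the interaction device
        return False
    store[voice_text] = user_action
    return True


bindings = {}
learn(bindings, "wave", "wave", "ACTION_WAVE")
```

A rejected or mismatched attempt leaves the store untouched, mirroring the confirmation step the human-computer interaction device provides.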
Through the learning mode, every control action the robot may need is bound to its corresponding instruction information. In the control mode, the voice collector and video collector work exactly as in the learning mode, and the algorithm analysis processor handles the voice and video information by the same principle. The difference is that in the control mode, once the processor has derived instruction information from the text and voice content, it directly retrieves the matching control action from the storage module and drives the robot, through the controller and the driving module, to execute the corresponding action. The voice player, for example a speaker, can announce the instruction information, voice content and text content. Matching instruction information to a stored control action can use existing techniques, for example encoding: the instruction information of each control action is encoded, and the control action bound to the instruction information with the identical code is the match.
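The matching-by-encoding idea mentioned above might look like the following. The choice of a hash as the "code" is our assumption, not the patent's scheme: any encoding that is identical for identical instruction information would serve.

```python
import hashlib


def encode(instruction: str) -> str:
    """Encode instruction information as a reproducible code."""
    return hashlib.sha256(instruction.encode("utf-8")).hexdigest()


def match_action(code_table, instruction):
    """Look up the control action stored under the identical code."""
    return code_table.get(encode(instruction))


# The learning stage stores actions keyed by their instruction's code;
# the control stage matches a new instruction by recomputing the code.
table = {encode("turn left"): "ACTION_TURN_LEFT"}
```

Because identical instruction information always produces the identical code, the lookup reduces matching to a single dictionary access rather than a search.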
The invention also provides a robot control method based on voice and human face actions, as shown in fig. 1, comprising the following steps:
S1: receiving a voice signal from the voice collector and an image signal from the video collector;
S2: recognizing and analyzing the voice signal and processing it into voice content; recognizing and analyzing the image information and processing it into text content;
in this step, the steps of identifying and analyzing the image information and processing the image information into text content specifically include the following steps:
carrying out gray level processing on the image information to obtain a lip image of the face;
acquiring an image of any line in the lip width direction in a lip image along with time change, and recording the image as X (t), and collecting an image of any line in the lip width direction in the lip image along with time change, and recording the image as H (t), wherein H (t) is a set of X (t);
acquiring an image of any line in the height direction of the lips in the lip image, recording the image as Y (t), collecting an image of any line in the height direction of the lips in the lip image, recording the image as V (t), wherein V (t) is a set of Y (t);
and comparing the change trends of the lips in the lip image along with time in the height direction and the width direction, and analyzing to obtain the current text content.
S3: processing and comparing the voice content and the text content to obtain instruction information;
S4: receiving confirmation information input by the user and a control action signal selected by the user, binding the instruction information to the control action signal to form action information, and storing the action information;
S5: matching newly received instruction information against the stored action information to obtain the corresponding control action signal;
S6: sending the control action signal to the controller, so that the controller controls the driving module to drive the robot to execute the corresponding action.
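Steps S1 to S6 can be sketched end to end. This is a hedged illustration with placeholder recognizers (real speech and lip recognition are far more involved); the learning step S4 is modeled as a dictionary insert and the matching step S5 as a lookup.

```python
def recognize_speech(voice_signal):
    return voice_signal                  # stand-in for real speech recognition


def recognize_lips(image_signal):
    return image_signal                  # stand-in for lip-reading analysis


def derive_instruction(voice_content, text_content):
    # S3: compare the two contents; agreement yields instruction information
    return voice_content if voice_content == text_content else None


def run_pipeline(memory, voice_signal, image_signal, learn_action=None):
    voice_content = recognize_speech(voice_signal)   # S2, speech branch
    text_content = recognize_lips(image_signal)      # S2, lip branch
    instruction = derive_instruction(voice_content, text_content)
    if instruction is None:
        return None
    if learn_action is not None:                     # S4: learning mode
        memory[instruction] = learn_action
        return learn_action
    return memory.get(instruction)                   # S5: control-mode lookup


mem = {}
run_pipeline(mem, "stop", "stop", learn_action="ACTION_STOP")  # learning mode
run_pipeline(mem, "stop", "stop")                              # control mode
```

Running the pipeline once in learning mode and once in control mode returns the same bound action, which in the full system would then be sent to the controller (S6).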
The present embodiment also includes a learning mode and a control mode, and the flow and principle executed in the learning mode and the control mode are the same as those of the robot control system provided by the present invention.
The present invention also provides an electronic device having a memory, a processor, and a computer readable program stored in the memory and executable by the processor, wherein the computer readable program, when executed by the processor, implements the robot control method according to the present invention.
Various other modifications and changes may be made by those skilled in the art based on the above-described technical solutions and concepts, and all such modifications and changes should fall within the scope of the claims of the present invention.
Claims (6)
1. The robot control system based on voice and human face actions is characterized by comprising a voice collector, a video collector, an algorithm analysis processor, a memory, a controller and a driving module, wherein the voice collector and the video collector are connected with the algorithm analysis processor, the memory and the controller are connected with the algorithm analysis processor, and the driving module is connected with the controller;
the algorithm analysis processor is used for identifying, analyzing and processing the voice information from the voice collector and the image information from the video collector to obtain action control signals and sending the action control signals to the controller, so that the controller controls the driving module to drive the robot to execute corresponding actions according to the action control signals.
2. The robotic control system according to claim 1, further comprising a voice player coupled to the algorithmic analysis processor.
3. The robot control system of claim 1, further comprising a human-machine interaction device coupled to the algorithm analysis processor, the algorithm analysis processor being configured to display the motion control signal on the human-machine interaction device and to receive a confirmation signal from the human-machine interaction device so as to store the motion control signal in the memory.
4. The robot control method based on voice and human face actions is characterized by comprising the following steps:
receiving a voice signal from a voice collector and receiving an image signal from a video collector;
identifying and analyzing the voice signal, and processing the voice signal into voice content; identifying and analyzing the image information, and processing the image information into text content;
processing and comparing the voice content and the text content to obtain instruction information;
receiving confirmation information input by a user and a control action signal selected by the user, binding the instruction information and the control action signal to form action information, and storing the action information;
matching the newly received instruction information with the stored action information or corresponding control action signals;
and sending the control action to the controller so that the controller controls the driving module to drive the robot to execute the corresponding action according to the control action signal.
5. The robot control method according to claim 4, wherein the step of recognizing, analyzing, and processing the image information into text specifically comprises the steps of:
carrying out gray level processing on the image information to obtain a lip image of the face;
acquiring, as a function of time, the image of any single line of the lip image taken along the lip width direction, recorded as X(t), the collection of these line images being recorded as H(t), wherein H(t) is the set of the X(t);
acquiring, as a function of time, the image of any single line of the lip image taken along the lip height direction, recorded as Y(t), the collection of these line images being recorded as V(t), wherein V(t) is the set of the Y(t);
and comparing the change trends of the lips in the lip image along with time in the height direction and the width direction, and analyzing to obtain the current text content.
6. An electronic device having a memory, a processor, and a computer readable program stored in the memory and executable by the processor, wherein the computer readable program, when executed by the processor, implements the robot control method of claim 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911188246.5A CN111091823A (en) | 2019-11-28 | 2019-11-28 | Robot control system and method based on voice and human face actions and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111091823A true CN111091823A (en) | 2020-05-01 |
Family
ID=70393250
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911188246.5A Pending CN111091823A (en) | 2019-11-28 | 2019-11-28 | Robot control system and method based on voice and human face actions and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111091823A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101101752A (en) * | 2007-07-19 | 2008-01-09 | 华中科技大学 | Monosyllabic language lip-reading recognition system based on vision character |
CN104951730A (en) * | 2014-03-26 | 2015-09-30 | 联想(北京)有限公司 | Lip movement detection method, lip movement detection device and electronic equipment |
CN105159111A (en) * | 2015-08-24 | 2015-12-16 | 百度在线网络技术(北京)有限公司 | Artificial intelligence-based control method and control system for intelligent interaction equipment |
CN106023993A (en) * | 2016-07-29 | 2016-10-12 | 西安旭天电子科技有限公司 | Robot control system based on natural language and control method thereof |
CN107122646A (en) * | 2017-04-26 | 2017-09-01 | 大连理工大学 | A kind of method for realizing lip reading unblock |
CN108073875A (en) * | 2016-11-14 | 2018-05-25 | 广东技术师范学院 | A kind of band noisy speech identifying system and method based on monocular cam |
CN109741815A (en) * | 2018-12-25 | 2019-05-10 | 广州天高软件科技有限公司 | A kind of medical guide robot and its implementation |
CN110276259A (en) * | 2019-05-21 | 2019-09-24 | 平安科技(深圳)有限公司 | Lip reading recognition methods, device, computer equipment and storage medium |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
2020-05-01 | PB01 | Publication | Application publication date: 20200501 |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Address after: 510000 201, building a, No.19 nanxiangsan Road, Huangpu District, Guangzhou City, Guangdong Province; Applicant after: GUANGZHOU SAITE INTELLIGENT TECHNOLOGY Co.,Ltd. Address before: 510000 Room 303, 36 Kaitai Avenue, Huangpu District, Guangzhou City, Guangdong Province; Applicant before: GUANGZHOU SAITE INTELLIGENT TECHNOLOGY Co.,Ltd. |
| RJ01 | Rejection of invention patent application after publication | |