CN111091823A - Robot control system and method based on voice and human face actions and electronic equipment

Robot control system and method based on voice and human face actions and electronic equipment

Info

Publication number
CN111091823A
CN111091823A (application CN201911188246.5A)
Authority
CN
China
Prior art keywords
voice, image, action, information, collector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911188246.5A
Other languages
Chinese (zh)
Inventor
赖志林
陈桂芳
李睿
俞锦涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Saite Intelligent Technology Co Ltd
Original Assignee
Guangzhou Saite Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Saite Intelligent Technology Co Ltd
Priority to CN201911188246.5A
Publication of CN111091823A
Legal status: Pending


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a robot control system based on voice and human face actions. The system comprises a voice collector, a video collector, an algorithm analysis processor, a memory, a controller and a driving module; the voice collector and the video collector are connected with the algorithm analysis processor, the memory and the controller are connected with the algorithm analysis processor, and the driving module is connected with the controller. The algorithm analysis processor identifies, analyzes and processes the voice information from the voice collector and the image information from the video collector to obtain action control signals, and sends the action control signals to the controller, so that the controller controls the driving module to drive the robot to execute the corresponding actions. By recognizing and processing both facial actions and voice, the invention learns the trend of a person's mouth movements, which not only improves recognition accuracy but also reduces the computational load of the algorithm.

Description

Robot control system and method based on voice and human face actions and electronic equipment
Technical Field
The invention relates to the technical field of voice, video and image recognition, in particular to a robot control system and method based on voice and human face actions.
Background
At present, domestic robot interaction methods acquire the relevant information through voice recognition, sometimes matched against conventional lip movements, in order to obtain the execution instruction for the robot's next action.
However, in the prior art, controlling the robot by voice recognition alone yields low accuracy; even when combined with lip-reading, recognition is limited to simple, conventional lip movements, and unusual or complex lip movements still cannot be recognized, so the accuracy remains low. Moreover, the lip-matching algorithms are overly complex and slow to process.
Disclosure of Invention
In order to overcome the defects of the prior art, one objective of the present invention is to provide a robot control system based on voice and human face actions, which solves the prior-art problems of complex algorithms, slow processing and low accuracy.
The second objective of the present invention is to provide a robot control method based on voice and human face actions, which solves the same prior-art problems.
A further objective of the present invention is to provide electronic equipment which likewise solves these problems.
One of the purposes of the invention is realized by adopting the following technical scheme:
the robot control system based on voice and human face actions comprises a voice collector, a video collector, an algorithm analysis processor, a memory, a controller and a driving module, wherein the voice collector and the video collector are connected with the algorithm analysis processor, the memory and the controller are connected with the algorithm analysis processor, and the driving module is connected with the controller;
the algorithm analysis processor is used for identifying, analyzing and processing the voice information from the voice collector and the image information from the video collector to obtain action control signals and sending the action control signals to the controller, so that the controller controls the driving module to drive the robot to execute corresponding actions according to the action control signals.
Preferably, the system further comprises a voice player, and the voice player is connected with the algorithm analysis processor.
Preferably, the system further comprises a human-computer interaction device connected with the algorithm analysis processor, wherein the algorithm analysis processor is used for displaying the action control signal on the human-computer interaction device and receiving a confirmation signal from the human-computer interaction device, so as to store the action control signal in the memory.
The second purpose of the invention is realized by adopting the following technical scheme:
the robot control method based on voice and human face actions comprises the following steps:
receiving a voice signal from a voice collector and receiving an image signal from a video collector;
identifying and analyzing the voice signal, and processing the voice signal into voice content; identifying and analyzing the image information, and processing the image information into text content;
processing and comparing the voice content and the text content to obtain instruction information;
receiving confirmation information input by a user and a control action signal selected by the user, binding the instruction information and the control action signal to form action information, and storing the action information;
matching newly received instruction information against the stored action information to obtain the corresponding control action signal;
and sending the control action signal to the controller, so that the controller controls the driving module to drive the robot to execute the corresponding action according to the control action signal.
Preferably, the "recognizing, analyzing and processing the image information into text" specifically includes the following steps:
carrying out gray level processing on the image information to obtain a lip image of the face;
acquiring the change over time of any one pixel line along the lip width direction in the lip image, denoted X(t), and collecting the changes of all such lines in the width direction, denoted H(t), where H(t) is the set of the X(t);
acquiring the change over time of any one pixel line along the lip height direction in the lip image, denoted Y(t), and collecting the changes of all such lines in the height direction, denoted V(t), where V(t) is the set of the Y(t);
and comparing the change trends of the lips over time in the height and width directions, and analyzing them to obtain the current text content.
The third purpose of the invention is realized by adopting the following technical scheme:
An electronic device having a memory, a processor, and a computer-readable program stored in the memory and executable by the processor, wherein the computer-readable program, when executed by the processor, implements the robot control method according to the second objective of the present invention.
Compared with the prior art, the invention has the beneficial effects that:
the invention recognizes and processes the face action and the voice to know the action change trend of the person, thereby not only improving the recognition accuracy, but also simplifying the calculated amount of the algorithm.
Drawings
Fig. 1 is a flowchart of a robot control method based on voice and human face actions according to the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and the detailed description below:
the invention provides a robot control system based on voice and human face actions, which comprises a voice collector, a video collector, an algorithm analysis processor, a memory, a controller and a driving module, wherein the voice collector and the video collector are connected with the algorithm analysis processor, the memory and the controller are connected with the algorithm analysis processor, and the driving module is connected with the controller.
The algorithm analysis processor is used for identifying, analyzing and processing the voice information from the voice collector and the image information from the video collector to obtain action control signals and sending the action control signals to the controller, so that the controller controls the driving module to drive the robot to execute corresponding actions according to the action control signals.
In the present invention, the voice collector includes, but is not limited to, a recording tool, a microphone or a sound pickup, and may be an off-the-shelf commercial model. The video collector may be a camera that collects both image information and dynamic video information, capturing changes in the face image in real time.
The embodiment further comprises a voice player connected with the algorithm analysis processor, and a human-computer interaction device connected with the algorithm analysis processor; the algorithm analysis processor displays the action control signal on the human-computer interaction device and receives a confirmation signal from it, whereupon the action control signal is stored in the memory.
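To make the signal flow among these components concrete, the following minimal Python sketch wires them together. The patent specifies only which components are connected, not any implementation, so every class, method and signal name here is a hypothetical illustration:

```python
# Hypothetical wiring sketch of the claimed system; the patent defines only
# the connections (collectors -> processor -> controller -> driving module,
# with the memory and human-computer interaction on the processor side).

class DrivingModule:
    def drive(self, action_signal: str) -> None:
        print(f"executing action: {action_signal}")  # stand-in for motor control

class Controller:
    def __init__(self, driving_module: DrivingModule):
        self.driving_module = driving_module

    def execute(self, action_signal: str) -> None:
        # The controller drives the robot according to the action control signal.
        self.driving_module.drive(action_signal)

class AlgorithmAnalysisProcessor:
    def __init__(self, memory: dict, controller: Controller):
        self.memory = memory          # stored instruction -> action bindings
        self.controller = controller

    def process(self, voice_info: str, image_info: str) -> None:
        # Recognition of the voice and face-image information is elided here;
        # assume it yields instruction information, which is looked up in memory.
        instruction = f"{voice_info}|{image_info}"
        action_signal = self.memory.get(instruction)
        if action_signal is not None:
            self.controller.execute(action_signal)

# Usage: a previously learned binding is retrieved and executed.
memory = {"come here|come here": "MOVE_TO_SPEAKER"}
processor = AlgorithmAnalysisProcessor(memory, Controller(DrivingModule()))
processor.process("come here", "come here")
```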
The invention comprises two scenarios: a learning mode and a control mode. In the learning mode, the robot enters a learning state: voice information is captured by the voice collector, and face video is collected by the video collector, chiefly to obtain the dynamic mouth information in the face image. The video stream is gray-level processed to obtain the multi-frame change trend of the video, i.e. the change of the lips over time in the width and height directions (the spatial positions of the lips at different times), from which the current text content is analyzed. The voice content is obtained from the voice information, compared and matched with the text content, and finally processed into instruction information. The user then inputs the control action corresponding to the instruction information through the human-computer interaction device, and the algorithm analysis processor binds the control action to the instruction information and stores them in the storage module.
Through the learning mode, all control actions that the robot may need are bound to corresponding instruction information. In the control mode, the voice collector and the video collector work exactly as in the learning mode, and the algorithm analysis processor processes the voice and video information on the same principle. The difference is that in the control mode, after the algorithm analysis processor analyzes the text content and the voice content to obtain the instruction information, it directly retrieves the control action matching that instruction information from the storage module and drives the robot to execute the corresponding action through the controller and the driving module. The voice player, for example a loudspeaker, may broadcast the instruction information, the voice content and the text content. Matching instruction information to control actions in the storage module can use existing techniques, for example an encoding scheme: the instruction information of each control action is encoded, and the control action bound to the instruction information whose code matches is the matched control action.
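As an illustration of the encoding-based binding and matching just described, here is a short, hypothetical sketch of the learning-mode store and control-mode lookup. The patent does not prescribe an encoding, so the agreement check between voice content and lip-read text content, and all names, are assumptions:

```python
# Hypothetical sketch of the learning/control-mode instruction store. The
# encoding (normalized text) and the voice/lip agreement check are assumptions.
from typing import Optional

class InstructionActionStore:
    def __init__(self):
        self._actions: dict[str, str] = {}  # instruction code -> control action

    @staticmethod
    def encode(voice_content: str, text_content: str) -> str:
        # Compare the voice content with the lip-read text content; accept the
        # instruction only when both sources agree, then use the normalized
        # text as the instruction code.
        if voice_content.strip().lower() != text_content.strip().lower():
            raise ValueError("voice content and lip-read text do not match")
        return voice_content.strip().lower()

    def learn(self, code: str, control_action: str) -> None:
        # Learning mode: after user confirmation via the human-computer
        # interaction device, bind the instruction code to a control action.
        self._actions[code] = control_action

    def lookup(self, code: str) -> Optional[str]:
        # Control mode: retrieve the control action whose instruction code
        # matches; None means the instruction was never learned.
        return self._actions.get(code)

# Usage: learning mode binds "move forward"; control mode retrieves it later.
store = InstructionActionStore()
store.learn(store.encode("Move forward", "move forward"), "DRIVE_FORWARD")
assert store.lookup(store.encode("move forward", "MOVE FORWARD")) == "DRIVE_FORWARD"
```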
The invention also provides a robot control method based on voice and human face actions, as shown in fig. 1, comprising the following steps:
S1: receiving a voice signal from the voice collector and an image signal from the video collector;
S2: identifying and analyzing the voice signal, and processing it into voice content; identifying and analyzing the image information, and processing it into text content;
in this step, the steps of identifying and analyzing the image information and processing the image information into text content specifically include the following steps:
carrying out gray level processing on the image information to obtain a lip image of the face;
acquiring the change over time of any one pixel line along the lip width direction in the lip image, denoted X(t), and collecting the changes of all such lines in the width direction, denoted H(t), where H(t) is the set of the X(t);
acquiring the change over time of any one pixel line along the lip height direction in the lip image, denoted Y(t), and collecting the changes of all such lines in the height direction, denoted V(t), where V(t) is the set of the Y(t);
and comparing the change trends of the lips over time in the height and width directions, and analyzing them to obtain the current text content.
S3: processing and comparing the voice content and the character content to obtain instruction information;
s4: receiving confirmation information input by a user and a control action signal selected by the user, binding the instruction information and the control action signal to form action information, and storing the action information;
s5: matching the newly received instruction information with the stored action information or corresponding control action signals;
s6: and sending the control action to the controller so that the controller controls the driving module to drive the robot to execute the corresponding action according to the control action signal.
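The code sketch promised in step S2: a hypothetical rendering of the lip-trajectory extraction. The patent states only that pixel lines of the lip region are tracked over time in the width direction (each line's change X(t), collected as the set H(t)) and the height direction (Y(t), collected as V(t)); the grayscale thresholding and the per-line lip-extent measure used here are illustrative assumptions:

```python
# Hypothetical sketch of the lip-trajectory extraction in step S2. Counting
# dark "lip" pixels per row/column per frame is an assumed measure; the patent
# does not specify how each pixel line's change over time is quantified.
import numpy as np

def lip_trajectories(frames: np.ndarray, lip_threshold: int = 80):
    """frames: array of shape (T, H, W), grayscale lip-region crops over time.

    Returns (H_t, V_t):
      H_t[r] is X(t) for row r: lip extent along the width vs. time.
      V_t[c] is Y(t) for column c: lip extent along the height vs. time.
    """
    lip_mask = frames < lip_threshold          # (T, H, W) boolean lip pixels
    # X(t) for each horizontal line r: lip pixels in that row, per frame.
    H_t = lip_mask.sum(axis=2).T               # shape (H, T): row r -> X(t)
    # Y(t) for each vertical line c: lip pixels in that column, per frame.
    V_t = lip_mask.sum(axis=1).T               # shape (W, T): column c -> Y(t)
    return H_t, V_t

def opening_trend(H_t: np.ndarray, V_t: np.ndarray) -> np.ndarray:
    # Compare the change trends in the width and height directions over time:
    # frame-to-frame differences of the overall lip extent in each direction.
    width_trend = np.diff(H_t.sum(axis=0))     # shape (T-1,)
    height_trend = np.diff(V_t.sum(axis=0))    # shape (T-1,)
    return np.stack([width_trend, height_trend])  # fed to a recognizer

# Usage with synthetic data: 30 frames of a 64x96 grayscale lip crop.
frames = np.random.randint(0, 256, size=(30, 64, 96), dtype=np.uint8)
H_t, V_t = lip_trajectories(frames)
trends = opening_trend(H_t, V_t)               # (2, 29) width/height trends
```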
The present embodiment also includes a learning mode and a control mode, and the flow and principle executed in the learning mode and the control mode are the same as those of the robot control system provided by the present invention.
The present invention also provides an electronic device having a memory, a processor, and a computer readable program stored in the memory and executable by the processor, wherein the computer readable program, when executed by the processor, implements the robot control method according to the present invention.
Various other modifications and changes may be made by those skilled in the art based on the above-described technical solutions and concepts, and all such modifications and changes should fall within the scope of the claims of the present invention.

Claims (6)

1. The robot control system based on voice and human face actions is characterized by comprising a voice collector, a video collector, an algorithm analysis processor, a memory, a controller and a driving module, wherein the voice collector and the video collector are connected with the algorithm analysis processor, the memory and the controller are connected with the algorithm analysis processor, and the driving module is connected with the controller;
the algorithm analysis processor is used for identifying, analyzing and processing the voice information from the voice collector and the image information from the video collector to obtain action control signals and sending the action control signals to the controller, so that the controller controls the driving module to drive the robot to execute corresponding actions according to the action control signals.
2. The robot control system according to claim 1, further comprising a voice player connected with the algorithm analysis processor.
3. The robot control system according to claim 1, further comprising a human-computer interaction device connected with the algorithm analysis processor, wherein the algorithm analysis processor is configured to display the action control signal on the human-computer interaction device and to receive a confirmation signal from the human-computer interaction device so as to store the action control signal in the memory.
4. The robot control method based on voice and human face actions is characterized by comprising the following steps:
receiving a voice signal from a voice collector and receiving an image signal from a video collector;
identifying and analyzing the voice signal, and processing the voice signal into voice content; identifying and analyzing the image information, and processing the image information into text content;
processing and comparing the voice content and the text content to obtain instruction information;
receiving confirmation information input by a user and a control action signal selected by the user, binding the instruction information and the control action signal to form action information, and storing the action information;
matching newly received instruction information against the stored action information to obtain the corresponding control action signal;
and sending the control action signal to the controller, so that the controller controls the driving module to drive the robot to execute the corresponding action according to the control action signal.
5. The robot control method according to claim 4, wherein the step of identifying and analyzing the image information and processing it into text content specifically comprises the following steps:
carrying out gray level processing on the image information to obtain a lip image of the face;
acquiring the change over time of any one pixel line along the lip width direction in the lip image, denoted X(t), and collecting the changes of all such lines in the width direction, denoted H(t), where H(t) is the set of the X(t);
acquiring the change over time of any one pixel line along the lip height direction in the lip image, denoted Y(t), and collecting the changes of all such lines in the height direction, denoted V(t), where V(t) is the set of the Y(t);
and comparing the change trends of the lips over time in the height and width directions, and analyzing them to obtain the current text content.
6. An electronic device having a memory, a processor, and a computer readable program stored in the memory and executable by the processor, wherein the computer readable program, when executed by the processor, implements the robot control method of claim 4.
CN201911188246.5A 2019-11-28 2019-11-28 Robot control system and method based on voice and human face actions and electronic equipment Pending CN111091823A (en)

Priority Applications (1)

CN201911188246.5A, priority date 2019-11-28, filing date 2019-11-28: Robot control system and method based on voice and human face actions and electronic equipment

Applications Claiming Priority (1)

CN201911188246.5A, priority date 2019-11-28, filing date 2019-11-28: Robot control system and method based on voice and human face actions and electronic equipment

Publications (1)

Publication number CN111091823A, publication date 2020-05-01

Family

Family ID: 70393250

Family Applications (1)

CN201911188246.5A (pending, published as CN111091823A): Robot control system and method based on voice and human face actions and electronic equipment

Country Status (1)

Country: CN, link: CN111091823A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101101752A (en) * 2007-07-19 2008-01-09 Huazhong University of Science and Technology: Monosyllabic language lip-reading recognition system based on vision character
CN104951730A (en) * 2014-03-26 2015-09-30 Lenovo (Beijing) Co., Ltd.: Lip movement detection method, lip movement detection device and electronic equipment
CN105159111A (en) * 2015-08-24 2015-12-16 Baidu Online Network Technology (Beijing) Co., Ltd.: Artificial intelligence-based control method and control system for intelligent interaction equipment
CN106023993A (en) * 2016-07-29 2016-10-12 Xi'an Xutian Electronic Technology Co., Ltd.: Robot control system based on natural language and control method thereof
CN107122646A (en) * 2017-04-26 2017-09-01 Dalian University of Technology: Method for realizing lip-reading unlock
CN108073875A (en) * 2016-11-14 2018-05-25 Guangdong Polytechnic Normal University: Noisy-speech recognition system and method based on a monocular camera
CN109741815A (en) * 2018-12-25 2019-05-10 Guangzhou Tiangao Software Technology Co., Ltd.: Medical guide robot and implementation method thereof
CN110276259A (en) * 2019-05-21 2019-09-24 Ping An Technology (Shenzhen) Co., Ltd.: Lip-reading recognition method, device, computer equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510000 Room 201, Building A, No. 19 Nanxiangsan Road, Huangpu District, Guangzhou City, Guangdong Province

Applicant after: GUANGZHOU SAITE INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: 510000 Room 303, 36 Kaitai Avenue, Huangpu District, Guangzhou City, Guangdong Province

Applicant before: GUANGZHOU SAITE INTELLIGENT TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 2020-05-01