CN107297745B - Voice interactive method, voice interaction device and robot - Google Patents

Voice interactive method, voice interaction device and robot

Info

Publication number
CN107297745B
Authority
CN
China
Prior art keywords
angle
robot
voice signal
speaker
sound source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710505552.1A
Other languages
Chinese (zh)
Other versions
CN107297745A (en)
Inventor
蒋化冰
陈岳峰
廖凯
齐鹏举
方园
米万珠
舒剑
吴琨
管伟
罗璇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JIANGSU MUMENG INTELLIGENT TECHNOLOGY Co.,Ltd.
Original Assignee
Shanghai Wood Wood Robot Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Wood Wood Robot Technology Co Ltd filed Critical Shanghai Wood Wood Robot Technology Co Ltd
Priority to CN201710505552.1A priority Critical patent/CN107297745B/en
Publication of CN107297745A publication Critical patent/CN107297745A/en
Application granted granted Critical
Publication of CN107297745B publication Critical patent/CN107297745B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00 Manipulators not otherwise provided for
    • B25J11/0005 Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/10 Image acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Robotics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)
  • Toys (AREA)

Abstract

An embodiment of the present invention provides a voice interaction method, a voice interaction device, and a robot. The method is applied to a robot and includes: when the sound source angle of a received voice signal is within a preset angle range of the robot, acquiring an image and identifying the angle of one or more faces in the image; selecting the face whose angle is closest to the sound source angle as the speaker; and adjusting the angle of the robot so that the center of the speaker's face falls at the front center of the robot, in order to respond to the voice signal. With this method, device, and robot, the voice interaction function of the robot is made more intelligent and human-like.

Description

Voice interactive method, voice interaction device and robot
Technical field
The present invention relates to the field of robotics, and in particular to a voice interaction method, a voice interaction device, and a robot.
Background technique
With the rapid development of modern science and technology, intelligent robots are used more and more widely; they can be seen both in homes and in public places such as shopping malls and banks.
Voice interaction between a robot and a speaker has always been an important aspect of making a robot intelligent and human-like. Beyond the conversational exchange itself, the robot's orientation and standing position relative to the speaker are also important embodiments of intelligence.
In existing voice interaction between a robot and a speaker, the speaker usually has to move to stand directly in front of the robot so that the interaction proceeds smoothly. The robot, by contrast, cannot automatically position itself according to the direction of the speaker's voice; its intelligence and human-likeness leave much room for improvement.
Summary of the invention
In conclusion the embodiment of the present invention provides a kind of voice interactive method, voice interaction device and robot, to reality Existing robot and speak person speech interaction when, it is more intelligent and personalize accurately towards speaker.
In a first aspect, the embodiment of the present invention provides a kind of voice interactive method, it is applied to robot, comprising: when received When the sound source angle of voice signal is within the scope of the predetermined angle of the robot, image is obtained, identifies one in described image The angle of a or multiple faces;It chooses the facial angle and the immediate face of the sound source angle is speaker;And it adjusts The angle of the whole robot, so that the face center of the speaker falls in center in front of the robot, in order to The voice signal is responded.
Further, the method also includes: receiving the voice signal; detecting the energy of the voice signal; and, when the energy of the voice signal reaches a threshold of the robot, localizing the sound source angle of the voice signal.
Further, the method also includes: when the sound source angle of the received voice signal is not within the preset angle range of the robot, adjusting the angle of the robot so that the voice signal is received at an angle within the preset angle range of the robot.
Further, after the step of adjusting the angle of the robot so that the center of the speaker's face falls at the front center of the robot, the method also includes: judging whether a preset time interval of the robot has elapsed; and, when the preset time interval of the robot has elapsed, acquiring a facial image of the speaker, identifying the speaker's face angle, and adjusting the angle of the robot so that the center of the speaker's face falls at the front center of the robot.
Further, in the method, responding to the voice signal includes: performing speech recognition on the voice signal; performing natural language understanding on the speech recognition result and retrieving a corresponding answer; and delivering the answer to the speaker by speech synthesis or by a limb action.
Further, the sound source angle is the angle between the sound source direction and the front center direction of the robot.
Further, the face angle is the angle between the face and the shooting direction of the image.
In a second aspect, an embodiment of the present invention provides a voice interaction device applied to a robot, comprising: a pickup module, configured to judge whether the sound source angle of a received voice signal is within a preset angle range of the robot; an image acquisition module, configured to acquire an image when the sound source angle of the received voice signal is within the preset angle range of the robot, identify the angle of one or more faces in the image, and select the face whose angle is closest to the sound source angle as the speaker; and an angle adjustment module, configured to adjust the angle of the robot so that the center of the speaker's face falls at the front center of the robot, in order to respond to the voice signal.
Further, the pickup module further comprises: an energy detection submodule, configured to detect the energy of the received voice signal; a sound source localization submodule, configured to localize the sound source angle of the voice signal when the energy of the voice signal reaches a threshold of the robot; and a speech recognition submodule, configured to perform speech recognition on the voice signal.
Further, the device also includes: an interaction response module, configured to perform natural language understanding on the speech recognition result, retrieve a corresponding answer, and deliver the answer to the speaker by speech synthesis or by a limb action.
Further, the angle adjustment module is also configured to adjust the angle of the robot when the sound source angle of the received voice signal is not within the preset angle range of the robot, so that the voice signal is received at an angle within the preset angle range of the robot.
Further, the device also includes: a timing module, configured to trigger, at regular time intervals, the image acquisition module to reacquire a facial image of the speaker and identify the speaker's face angle, and to trigger the angle adjustment module to adjust the angle of the robot so that the center of the speaker's face falls at the front center of the robot.
Further, the sound source angle is the angle between the sound source direction and the front center direction of the robot.
Further, the face angle is the angle between the face and the shooting direction of the image.
In a third aspect, an embodiment of the present invention provides a robot. The robot includes a voice interaction device arranged in the robot; the voice interaction device is implemented using the technical solutions provided by the above embodiments.
With the voice interaction method, voice interaction device, and robot provided by the embodiments of the present invention, the sound source angle and the face angles are localized, the speaker is determined from a multi-person scene by matching the two, and the front of the robot is automatically and accurately turned toward the speaker before the voice interaction proceeds. This makes the robot more intelligent and human-like when interacting with a speaker by voice. In addition, because the front of the robot is turned accurately toward the speaker, the pickup direction of the robot coincides with the sound source direction, so the pickup angle is optimal, which is also highly beneficial to accurate acquisition of the voice signal.
Brief description of the drawings
In order to more clearly illustrate the solutions of the present invention or of the prior art, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a voice interaction method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of a voice interaction method provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic structural diagram of a voice interaction device provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic structural diagram of a voice interaction device provided by Embodiment 4 of the present invention.
Detailed description of the embodiments
In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention, and the drawings show preferred embodiments of the present invention. The present invention may be implemented in many different forms and is not limited to the embodiments described herein; on the contrary, these embodiments are provided so that the disclosure of the present invention is more thorough and comprehensive. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which the present invention belongs. The terms used in the specification of the present invention are only for the purpose of describing specific embodiments and are not intended to limit the present invention. The terms "first", "second", and the like in the specification, claims, and drawings are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have", and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present invention. The appearances of this phrase at various places in the specification do not necessarily all refer to the same embodiment, nor are they separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
Embodiment 1
Embodiment 1 of the present invention provides a voice interaction method applied to a robot. In this embodiment, the robot is typically designed with a height similar to that of an average adult, for example 165 centimeters. To facilitate sound pickup, a ring-shaped microphone array is usually used, for example a 6-microphone arrangement placed at some position on the head, such as the crown, in a horizontal layout, with the main microphone (microphone No. 0) located at the front center of the head; the designated beam-enhancement direction of the microphone array is exactly the 0° direction. After the microphone array acquires a voice signal, automatic speech recognition (ASR) can be performed, the sound source angle can be localized, and the sound energy can be evaluated. The speech recognition engine stays awake for long periods. Two mechanisms are possible: in one, speech recognition starts only after a specific wake word wakes the engine; in the other, the speech recognition engine is always in the awake state. This embodiment chooses the latter, so the sound source angle and energy can be obtained at any time. After the speech recognition result is obtained, a natural language understanding (NLU) module retrieves the answer corresponding to the speech recognition result and lets the robot feed it back to the speaker. In particular, the speech recognition result may not match any response instruction in the system, in which case the result is called meaningless; otherwise, it is called meaningful.
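As a concrete illustration of this always-awake pickup, the following Python sketch gates each audio frame by its energy and its localized angle before handing it to recognition. It is only a sketch under assumed interfaces; the Frame structure and the recognizer object are placeholders invented here, not part of the patent or of any particular SDK.

```python
import math
from dataclasses import dataclass

@dataclass
class Frame:
    samples: list        # mono PCM samples for one pickup window, e.g. floats in [-1, 1]
    source_angle: float  # localized angle in degrees relative to the robot's front center (0 deg)

def rms_energy(samples):
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(s * s for s in samples) / max(len(samples), 1))

def pickup_loop(frames, recognizer, energy_threshold=0.02, pickup_range_deg=60.0):
    """Always-awake pickup: no wake word is required. Every frame whose energy
    reaches the threshold is localized, and frames arriving from within the
    preset +/-60 degree range are handed to speech recognition."""
    for frame in frames:                                   # frames: iterable of Frame
        if rms_energy(frame.samples) < energy_threshold:
            continue                                       # too weak, treated as invalid
        if abs(frame.source_angle) <= pickup_range_deg:
            yield recognizer.recognize(frame.samples), frame.source_angle
```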
A camera is mounted directly in front of the robot's crown, at the front center point of the head. The camera has a certain viewing angle range, for example 100-120°, which is sufficient to capture the faces in an image within that range; the angle of one or more faces in the image can then be identified, giving the angle between a given face and the shooting direction of the image, that is, the angle of the face relative to the front center point of the head.
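For illustration only, the face angle can be approximated from the horizontal position of the face in the image and the camera's field of view. The linear mapping and the function below are an assumption made here, not a formula from the patent:

```python
def face_angle(face_center_x, image_width, horizontal_fov_deg=110.0):
    """Approximate angle of a detected face relative to the camera's optical
    axis (the head-front center point), in degrees. Negative values lie to
    the left of center, positive to the right. A simple linear mapping over
    the horizontal field of view is assumed."""
    offset = (face_center_x - image_width / 2) / (image_width / 2)  # normalized to -1 .. +1
    return offset * (horizontal_fov_deg / 2)

# Example: a face centred at pixel 960 in a 1280-pixel-wide image lies
# roughly 27.5 degrees to the right of the robot's front center.
print(face_angle(960, 1280))  # ~27.5
```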
The robot chassis has the ability to rotate, allowing the robot to turn 360 degrees. The robot can also be rotated by the rollers on its base.
The robot can respond to the speaker by speech synthesis, limb actions, and the like.
Referring to Fig. 1, which shows a schematic flowchart of a voice interaction method provided by an embodiment of the present invention, the method can be applied to the robot described above and comprises:
Step S1001: when the sound source angle of a received voice signal is within a preset angle range, acquire an image and identify the angle of one or more faces in the image.
The robot can pick up sound through the ring-shaped microphone array, and the pickup angle range is preset; in this embodiment it is set to within a ±60 degree angle directly in front of the robot. If the received voice signal falls within this angle range, it is considered valid, and an image is further acquired by the camera at the front center point of the robot's head in order to further locate the speaker. Normally, the shooting angle range of the robot's camera is larger than the robot's preset pickup angle range, so that the sounding object is captured in the image.
Step S1002: select the face whose angle is closest to the sound source angle as the speaker.
If the robot is in a scene where several people are standing around it, the face in the image whose angle is closest to the sound source angle is further selected as the speaker.
In the embodiments of the present invention, the "speaker" is the sounding object that emits the sound signal, including but not limited to: a natural person, a robot, or another sounding object; the "face" is the front of the main body of the sounding object, for example: a human face, a machine face, or the front of a sounding object.
Step S1003: adjust the angle of the robot so that the center of the speaker's face falls at the front center of the robot, in order to respond to the voice signal.
In the embodiments of the present invention, the center of the speaker's face falling at the front center of the robot ideally means that the center of the speaker's face falls exactly at the front dead center of the robot, or within roughly ±15-30 degrees of that center position.
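A minimal sketch of this adjustment, assuming a hypothetical rotate_chassis command and using the ±15 degree tolerance mentioned above, might look as follows:

```python
def center_on_speaker(robot, face_angle_deg, tolerance_deg=15.0):
    """Rotate the base until the speaker's face center lies within the
    tolerance band around the robot's front center (0 degrees).
    `robot.rotate_chassis(angle)` is an assumed command that turns the
    chassis by `angle` degrees toward the given direction."""
    if abs(face_angle_deg) > tolerance_deg:
        robot.rotate_chassis(face_angle_deg)  # turn toward the speaker's face
        return True                           # an adjustment was made
    return False                              # already facing the speaker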
In the embodiments of the present invention, speech recognition is performed on the received voice signal while step S1001 is being executed; natural language understanding is performed on the speech recognition result, a corresponding answer is retrieved, and the answer is delivered to the speaker by speech synthesis or by a limb action. In other embodiments, the speech recognition can also be performed after step S1003 is executed, that is, when the front of the robot is facing the speaker: speech recognition is performed on the voice signal, natural language understanding is performed on the recognition result, a corresponding answer is retrieved, and the answer is delivered to the speaker by speech synthesis or by a limb action.
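The response path can be pictured as a three-stage pipeline: recognize, understand and retrieve, then reply by speech synthesis or a limb action. The sketch below uses assumed asr, nlu, and robot interfaces; the case where no answer is found corresponds to the "meaningless" result described above.

```python
def respond(asr, nlu, robot, audio):
    """Minimal response path: speech recognition, natural language
    understanding with answer retrieval, then a spoken and/or gestural
    reply. `asr`, `nlu` and `robot` are assumed interfaces."""
    text = asr.recognize(audio)
    answer = nlu.retrieve_answer(text)   # returns None when no answer matches
    if answer is None:
        return None                      # "meaningless" result: no response instruction found
    robot.speak(answer.speech)           # speech synthesis
    if answer.gesture is not None:
        robot.perform(answer.gesture)    # optional limb action
    return answer
```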
With the voice interaction method provided by this embodiment of the present invention, the sound source angle and the face angles are localized, the speaker is determined from a multi-person scene by matching the two, and the front of the robot is automatically and accurately turned toward the speaker before the voice interaction proceeds. This makes the robot more intelligent and human-like when interacting with a speaker by voice. In addition, because the front of the robot is turned accurately toward the speaker, the pickup direction of the robot coincides with the sound source direction, so the pickup angle is optimal, which is also highly beneficial to accurate acquisition of the voice signal.
Embodiment 2
Embodiment 2 of the present invention provides a voice interaction method applied to a robot. Referring to Fig. 2, which shows a schematic flowchart of a voice interaction method provided by this embodiment of the present invention:
Step S2001: voice signal is received.
In embodiments of the present invention, the design of robot generallys use similar to general adult height, such as 165 centimetres It is high.For the ease of receiving voice signal, pickup usually is carried out using annular microphone array, for example 6 Mike's schemes, array are in Some position on head, such as the crown are horizontally arranged, and main microphon (No. 0 microphone) is located at head front center;Microphone array The specified wave beam enhancing direction of column is just in 0 ° of direction.
Step S2002: detect the energy of the voice signal and judge whether the energy of the voice signal reaches the threshold. If so, execute step S2003; if not, the process ends.
After the voice signal is acquired by the ring-shaped microphone array, the energy of the voice signal is further detected and compared against the threshold. If the threshold is reached, the voice signal is considered valid and step S2003 is executed; if not, the received voice signal is considered invalid.
Step S2003: localize the sound source angle of the voice signal and judge whether the sound source angle is within the preset angle range. If so, execute step S2005; if not, execute step S2004.
After the voice signal is acquired by the ring-shaped microphone array, it is judged whether the sound source angle of the voice signal is within the preset angle range. The sound source angle is the angle between the sound source direction and the front center direction of the robot. Only for a voice signal within the preset angle range can the robot perform speech recognition and interaction well. If the sound source angle of the voice signal is not within the preset angle range, the angle of the robot is adjusted.
In this embodiment of the present invention, the preset sound source angle range of the robot is within a ±60 degree angle directly in front of the robot.
Step S2004: adjust the angle of the robot so that the sound source angle of the received voice signal falls within the preset range.
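A minimal sketch of step S2004, again assuming a hypothetical rotate_chassis command, is shown below; it only turns the base when the localized source angle falls outside the ±60 degree pickup range.

```python
def face_sound_source(robot, source_angle_deg, pickup_range_deg=60.0):
    """If the localized source angle lies outside the +/-60 degree pickup
    range, rotate the base toward the sound so that subsequent voice
    signals arrive within range. `robot.rotate_chassis(angle)` is an
    assumed command that turns the chassis by `angle` degrees."""
    if abs(source_angle_deg) > pickup_range_deg:
        robot.rotate_chassis(source_angle_deg)
        return True   # the robot turned toward the sound source
    return False      # the source was already within the pickup range
```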
In this embodiment of the present invention, the robot chassis has the ability to rotate, allowing the robot to turn 360 degrees. The robot can also be rotated by the rollers on its base.
Step S2005: acquire an image and identify the angle of one or more faces in the image.
In this embodiment of the present invention, a camera is mounted directly in front of the robot's crown, at the front center point of the head, with a viewing angle in the 100-120° range. It acquires images and is sufficient to capture the faces in an image; the angle of one or more faces in the image can then be identified, giving the angle between a given face and the shooting direction of the image, that is, the angle of the face relative to the front center point of the robot's head. Normally, the shooting angle range of the robot's camera is larger than the robot's preset pickup angle range, so that the sounding object is captured in the image.
Step S2006: select the face whose angle is closest to the sound source angle as the speaker.
One or more facial images are identified from the image, and for each facial image the angle between the face and the image acquisition direction (i.e., the positive direction of the camera) can be calculated. The face whose angle is closest to the sound source angle, for example with an angle difference of less than 15 degrees, is identified as coinciding with the sound source, and the person corresponding to that facial image is regarded as the speaker.
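The selection in step S2006 can be sketched as finding the face whose angle differs least from the sound source angle, treating the two as coincident only when the difference is under roughly 15 degrees. The function name and example values below are illustrative assumptions:

```python
def pick_speaker(face_angles, source_angle, max_diff_deg=15.0):
    """Return the index of the face whose angle best matches the sound
    source angle, or None if no face lies within `max_diff_deg` degrees.
    Angles are in degrees relative to the camera's positive direction."""
    best_idx, best_diff = None, max_diff_deg
    for idx, angle in enumerate(face_angles):
        diff = abs(angle - source_angle)
        if diff < best_diff:
            best_idx, best_diff = idx, diff
    return best_idx

# Example: with faces at -20, 5 and 40 degrees and a source localized at
# 8 degrees, the face at 5 degrees (index 1) is taken as the speaker.
print(pick_speaker([-20.0, 5.0, 40.0], 8.0))  # 1
```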
In this embodiment of the present invention, the "speaker" is the sounding object that emits the sound signal, including but not limited to: a natural person, a robot, or another sounding object; the "face" is the front of the main body of the sounding object, for example: a human face, a machine face, or the front of a sounding object.
Step S2007: adjust the angle of the robot so that the center of the speaker's face falls at the front center of the robot, in order to respond to the voice signal.
In this embodiment of the present invention, the center of the speaker's face falling at the front center of the robot ideally means that the center of the speaker's face falls exactly at the front dead center of the robot, or within roughly ±15-30 degrees of that center position.
In this embodiment of the present invention, the robot chassis has the ability to rotate, allowing the robot to turn 360 degrees. The robot can also be rotated by the rollers on its base. When the robot is rotated so that the center of the speaker's face falls at the front center of the robot, the pickup angle of the robot is better, and the robot appears more intelligent and human-like in its voice interaction with the speaker.
Step S2008: judge whether the preset time interval has elapsed. If so, execute step S2009; if not, this process ends.
Step S2009: acquire a facial image of the speaker and identify the speaker's face angle. Then step S2007 is executed again.
During the turning process, the robot of this embodiment of the present invention updates at regular intervals, for example with a preset time interval of 0.5 seconds, checking whether the position and angle of the speaker have changed and whether the angle of the robot needs to be adjusted.
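The periodic update of steps S2008-S2009 can be pictured as a small tracking loop that re-acquires the speaker's face angle every preset interval (0.5 seconds in the example above) and re-centers only when necessary. All interfaces in this sketch (get_face_angle, rotate_chassis) are assumptions made for illustration:

```python
import time

def track_speaker(robot, get_face_angle, interval_s=0.5, tolerance_deg=15.0):
    """Every `interval_s` seconds, re-acquire the speaker's face angle and
    re-center the robot if the speaker has drifted outside the tolerance
    band. `get_face_angle()` returns the current face angle in degrees
    relative to the front center, or None once the face is lost;
    `robot.rotate_chassis(angle)` is an assumed base-rotation command."""
    while True:
        time.sleep(interval_s)
        angle = get_face_angle()
        if angle is None:
            break                          # speaker left the field of view
        if abs(angle) > tolerance_deg:
            robot.rotate_chassis(angle)    # follow the speaker's new position
```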
In this embodiment of the present invention, speech recognition is performed on the received voice signal while step S2005 is being executed, and a natural language understanding (NLU) module retrieves the answer corresponding to the speech recognition result and lets the robot feed it back to the speaker. In particular, the speech recognition result may not match any response instruction in the system, in which case the result is called meaningless; otherwise, it is called meaningful. The robot responds to the speaker by speech synthesis, limb actions, and the like. In other embodiments of the present invention, the speech recognition can also be performed after step S2007 is executed, that is, when the front of the robot is facing the speaker: speech recognition is performed on the voice signal, natural language understanding is performed on the recognition result, a corresponding answer is retrieved, and the answer is delivered to the speaker by speech synthesis or by a limb action.
With the voice interaction method provided by this embodiment of the present invention, the sound source angle and the face angles are localized, the speaker is determined from a multi-person scene by matching the two, and the front of the robot is automatically and accurately turned toward the speaker before the voice interaction proceeds. This makes the robot more intelligent and human-like when interacting with a speaker by voice. In addition, because the front of the robot is turned accurately toward the speaker, the pickup direction of the robot coincides with the sound source direction, so the pickup angle is optimal, which is also highly beneficial to accurate acquisition of the voice signal.
Embodiment 3
Embodiment 3 of the present invention provides a voice interaction device; see Fig. 3. The voice interaction device is applied to a robot and comprises: a pickup module 100, an image acquisition module 120, and an angle adjustment module 140. The voice interaction device provided by this embodiment can implement the voice interaction methods provided by Embodiments 1 and 2 above.
The pickup module 100 is configured to judge whether the sound source angle of a received voice signal is within the preset sound source angle range of the robot.
The image acquisition module 120 is configured to acquire an image when the sound source angle of the received voice signal is within the preset angle range of the robot, identify the angle of one or more faces in the image, and select the face whose angle is closest to the sound source angle as the speaker.
The angle adjustment module 140 is configured to adjust the angle of the robot so that the center of the speaker's face falls at the front center of the robot, in order to respond to the voice signal.
With the voice interaction device provided by this embodiment of the present invention, the sound source angle and the face angles are localized, the speaker is determined from a multi-person scene by matching the two, and the front of the robot is automatically and accurately turned toward the speaker before the voice interaction proceeds. This makes the robot more intelligent and human-like when interacting with a speaker by voice. In addition, because the front of the robot is turned accurately toward the speaker, the pickup direction of the robot coincides with the sound source direction, so the pickup angle is optimal, which is also highly beneficial to accurate acquisition of the voice signal.
Embodiment 4
Embodiment 4 of the present invention provides a voice interaction device; see Fig. 4. The voice interaction device is applied to a robot and comprises: a pickup module 200, an image acquisition module 220, an angle adjustment module 240, and an interaction response module 260. The voice interaction device provided by this embodiment can implement the voice interaction methods provided by Embodiments 1 and 2 above.
The pickup module 200 is configured to judge whether the sound source angle of a received voice signal is within the preset sound source angle range of the robot. The sound source angle is the angle between the sound source direction and the front center direction of the robot. The preset sound source angle range of the robot is within a ±60 degree angle directly in front of the robot.
The pickup module 200 further comprises: an energy detection submodule 202, configured to detect the energy of the received voice signal;
a sound source localization submodule 204, configured to localize the sound source angle of the voice signal when the energy of the voice signal reaches the threshold of the robot; and
a speech recognition submodule 206, configured to perform speech recognition on the voice signal.
The image acquisition module 220 is configured to acquire an image when the sound source angle of the received voice signal is within the preset angle range of the robot, identify the angle of one or more faces in the image, and select the face whose angle is closest to the sound source angle as the speaker. The face angle is the angle between the face and the shooting direction of the image. Normally, the shooting angle range of the robot's camera is larger than the robot's preset pickup angle range, so that the sounding object is captured in the image.
The angle adjustment module 240 is configured to adjust the angle of the robot so that the center of the speaker's face falls at the front center of the robot. In addition, when the sound source angle of the received voice signal is not within the preset angle range of the robot, it adjusts the angle of the robot so that the voice signal is received at an angle within the preset angle range of the robot.
In this embodiment of the present invention, the "speaker" is the sounding object that emits the sound signal, including but not limited to: a natural person, a robot, or another sounding object; the "face" is the front of the main body of the sounding object, for example: a human face, a machine face, or the front of a sounding object.
The interaction response module 260 is configured to respond to the voice signal. Specifically, it performs natural language understanding on the speech recognition result of the speech recognition submodule 206, retrieves a corresponding answer, and delivers the answer to the speaker by speech synthesis or by a limb action.
As a further improvement of this embodiment of the present invention, the device may also include a timing module (not shown), configured to trigger, at regular time intervals, the image acquisition module 220 to reacquire a facial image of the speaker and identify the speaker's face angle, and to trigger the angle adjustment module 240 to adjust the angle of the robot so that the center of the speaker's face falls at the front center of the robot.
In the voice interaction device provided by this embodiment of the present invention, the pickup module 200 can be a ring-shaped microphone array mounted on the robot's head, for example a 6-microphone arrangement placed at some position on the head, such as the crown, in a horizontal layout, with the main microphone located at the front center of the head; the designated beam-enhancement direction of the microphone array is exactly the 0° direction. The image acquisition module 220 can be a camera mounted at the top of the robot's head, at the front center point of the head, with a viewing angle in the 100-120° range; it can acquire images, identify the faces in them, and compute, through a software application, the angle of each face relative to the image shooting direction (i.e., the front center point of the head). The angle adjustment module 240 can be implemented by a rotatable chassis mounted at the bottom of the robot.
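Pulling these modules together, the device of this embodiment can be thought of as a thin composition that wires the pickup, image acquisition, angle adjustment, and response responsibilities onto the hardware just described (ring microphone array, crown camera, rotatable chassis). The class below is an illustrative arrangement under assumed method names, not an implementation taken from the patent:

```python
class VoiceInteractionDevice:
    """Illustrative composition of the modules of Embodiment 4.
    Each constructor argument stands in for one module described above;
    all of their methods are assumptions made for this sketch."""

    def __init__(self, pickup, image_acquisition, angle_adjustment, respond):
        self.pickup = pickup             # ring microphone array: energy check + localization
        self.images = image_acquisition  # crown camera: face detection + face angles
        self.angles = angle_adjustment   # rotatable chassis / base rollers
        self.respond = respond           # ASR result -> NLU -> speech/gesture reply

    def handle(self, audio_frame):
        source_angle = self.pickup.localise(audio_frame)
        if source_angle is None:
            return                                   # energy below threshold: invalid signal
        if not self.pickup.in_range(source_angle):
            self.angles.rotate(source_angle)         # bring the source into the +/-60 deg range
            return
        face_angle = self.images.closest_face(source_angle)
        if face_angle is not None:
            self.angles.rotate(face_angle)           # center the speaker's face in front
        self.respond(audio_frame)
```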
With the voice interaction device provided by this embodiment of the present invention, the sound source angle and the face angles are localized, the speaker is determined from a multi-person scene by matching the two, and the front of the robot is automatically and accurately turned toward the speaker before the voice interaction proceeds. This makes the robot more intelligent and human-like when interacting with a speaker by voice. In addition, because the front of the robot is turned accurately toward the speaker, the pickup direction of the robot coincides with the sound source direction, so the pickup angle is optimal, which is also highly beneficial to accurate acquisition of the voice signal.
Embodiment 5
Embodiment 5 of the present invention provides a robot. The robot includes a voice interaction device arranged in the robot; the voice interaction device is the voice interaction device described in Embodiment 3 or Embodiment 4 above. A robot provided according to this embodiment makes the robot's voice interaction function more intelligent and human-like.
In the embodiments provided by the present invention, it should be understood that the disclosed device and method can be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into modules is only a division by logical function, and there may be other ways of dividing them in actual implementation. For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed.
The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above are only embodiments of the present invention and are not intended to limit the patent scope of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions recorded in the foregoing specific embodiments, or make equivalent replacements of some of their technical features. Any equivalent structure made using the contents of the specification and drawings of the present invention, whether used directly or indirectly in other related technical fields, likewise falls within the patent protection scope of the present invention.

Claims (12)

1. A voice interaction method, applied to a robot, characterized by comprising:
receiving a voice signal;
detecting the energy of the voice signal; and
when the energy of the voice signal reaches a threshold of the robot, localizing the sound source angle of the voice signal;
when the sound source angle of the received voice signal is within a preset angle range of the robot, acquiring an image and identifying the angle of one or more faces in the image;
selecting the face whose angle is closest to the sound source angle as the speaker; and
adjusting the angle of the robot so that the center of the speaker's face falls at the front center of the robot, in order to respond to the voice signal;
judging whether a preset time interval of the robot has elapsed; and
when the preset time interval of the robot has elapsed, acquiring a facial image of the speaker, identifying the speaker's face angle, and adjusting the angle of the robot so that the center of the speaker's face falls at the front center of the robot.
2. The method according to claim 1, characterized in that the method further comprises:
when the sound source angle of the received voice signal is not within the preset angle range of the robot, adjusting the angle of the robot so that the voice signal is received at an angle within the preset angle range of the robot.
3. The method according to any one of claims 1 to 2, characterized in that responding to the voice signal comprises:
performing speech recognition on the voice signal;
performing natural language understanding on the speech recognition result and retrieving a corresponding answer; and
delivering the answer to the speaker by speech synthesis or by a limb action.
4. The method according to any one of claims 1 to 2, characterized in that the sound source angle is the angle between the sound source direction and the front center direction of the robot.
5. The method according to any one of claims 1 to 2, characterized in that the face angle is the angle between the face and the shooting direction of the image.
6. A voice interaction device, applied to a robot, characterized by comprising:
a pickup module, configured to judge whether the sound source angle of a received voice signal is within a preset sound source angle range of the robot, the pickup module further comprising:
an energy detection submodule, configured to detect the energy of the received voice signal;
a sound source localization submodule, configured to localize the sound source angle of the voice signal when the energy of the voice signal reaches a threshold of the robot; and
a speech recognition submodule, configured to perform speech recognition on the voice signal;
an image acquisition module, configured to acquire an image when the sound source angle of the received voice signal is within the preset range of the robot, identify the angle of one or more faces in the image, and select the face whose angle is closest to the sound source angle as the speaker; and
an angle adjustment module, configured to adjust the angle of the robot so that the center of the speaker's face falls at the front center of the robot, in order to respond to the voice signal.
7. The device according to claim 6, characterized by further comprising: an interaction response module, configured to perform natural language understanding on the speech recognition result, retrieve a corresponding answer, and deliver the answer to the speaker by speech synthesis or by a limb action.
8. The device according to claim 6, characterized in that the angle adjustment module is also configured to adjust the angle of the robot when the sound source angle of the received voice signal is not within the preset angle range of the robot, so that the voice signal is received at an angle within the preset angle range of the robot.
9. The device according to claim 6, characterized in that the device further comprises: a timing module, configured to trigger, at regular time intervals, the image acquisition module to reacquire a facial image of the speaker and identify the speaker's face angle, and to trigger the angle adjustment module to adjust the angle of the robot so that the center of the speaker's face falls at the front center of the robot.
10. The device according to any one of claims 6 to 9, characterized in that the sound source angle is the angle between the sound source direction and the front center direction of the robot.
11. The device according to any one of claims 6 to 9, characterized in that the face angle is the angle between the face and the shooting direction of the image.
12. A robot, characterized by comprising:
a voice interaction device arranged in the robot;
wherein the voice interaction device is the voice interaction device described in any one of claims 6 to 11.
CN201710505552.1A 2017-06-28 2017-06-28 Voice interactive method, voice interaction device and robot Active CN107297745B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710505552.1A CN107297745B (en) 2017-06-28 2017-06-28 Voice interactive method, voice interaction device and robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710505552.1A CN107297745B (en) 2017-06-28 2017-06-28 Voice interactive method, voice interaction device and robot

Publications (2)

Publication Number Publication Date
CN107297745A CN107297745A (en) 2017-10-27
CN107297745B true CN107297745B (en) 2019-08-13

Family

ID=60136518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710505552.1A Active CN107297745B (en) 2017-06-28 2017-06-28 Voice interactive method, voice interaction device and robot

Country Status (1)

Country Link
CN (1) CN107297745B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107765858B (en) * 2017-11-06 2019-12-31 Oppo广东移动通信有限公司 Method, device, terminal and storage medium for determining face angle
JP7065353B2 (en) * 2017-12-07 2022-05-12 パナソニックIpマネジメント株式会社 Head-mounted display and its control method
CN107945799A (en) * 2017-12-18 2018-04-20 深圳市今视通数码科技有限公司 A kind of multifunction speech interactive intelligence machine
CN109981970B (en) * 2017-12-28 2021-07-27 深圳市优必选科技有限公司 Method and device for determining shooting scene and robot
CN108594987A (en) * 2018-03-20 2018-09-28 中国科学院自动化研究所 More man-machine coordination Behavior Monitor Systems based on multi-modal interaction and its control method
CN108733420B (en) * 2018-03-21 2022-04-29 北京猎户星空科技有限公司 Awakening method and device of intelligent equipment, intelligent equipment and storage medium
CN108564943B (en) * 2018-04-27 2021-02-12 京东方科技集团股份有限公司 Voice interaction method and system
CN108419124B (en) * 2018-05-08 2020-11-17 北京酷我科技有限公司 Audio processing method
CN109166575A (en) * 2018-07-27 2019-01-08 百度在线网络技术(北京)有限公司 Exchange method, device, smart machine and the storage medium of smart machine
CN109192050A (en) * 2018-10-25 2019-01-11 重庆鲁班机器人技术研究院有限公司 Experience type language teaching method, device and educational robot
CN109508687A (en) * 2018-11-26 2019-03-22 北京猎户星空科技有限公司 Man-machine interaction control method, device, storage medium and smart machine
CN109584871B (en) * 2018-12-04 2021-09-03 北京蓦然认知科技有限公司 User identity recognition method and device of voice command in vehicle
CN109640224B (en) * 2018-12-26 2022-01-21 北京猎户星空科技有限公司 Pickup method and device
JP2020144209A (en) * 2019-03-06 2020-09-10 シャープ株式会社 Speech processing unit, conference system and speech processing method
CN109801632A (en) * 2019-03-08 2019-05-24 北京马尔马拉科技有限公司 A kind of artificial intelligent voice robot system and method based on big data
CN110000791A (en) * 2019-04-24 2019-07-12 深圳市三宝创新智能有限公司 A kind of motion control device and method of desktop machine people
CN110491411B (en) * 2019-09-25 2022-05-17 上海依图信息技术有限公司 Method for separating speaker by combining microphone sound source angle and voice characteristic similarity
CN112711331A (en) * 2020-12-28 2021-04-27 京东数科海益信息科技有限公司 Robot interaction method and device, storage equipment and electronic equipment
CN114242072A (en) * 2021-12-21 2022-03-25 上海帝图信息科技有限公司 Voice recognition system for intelligent robot
CN115862668B (en) * 2022-11-28 2023-10-24 之江实验室 Method and system for judging interactive object based on sound source positioning by robot

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101604204A (en) * 2009-07-09 2009-12-16 北京科技大学 Distributed cognitive technology for intelligent emotional robot
JP4784351B2 (en) * 2006-03-14 2011-10-05 セイコーエプソン株式会社 Human identification using robots
CN105957521A (en) * 2016-02-29 2016-09-21 青岛克路德机器人有限公司 Voice and image composite interaction execution method and system for robot
CN106292732A (en) * 2015-06-10 2017-01-04 上海元趣信息技术有限公司 Intelligent robot rotating method based on sound localization and Face datection
CN106328130A (en) * 2015-06-30 2017-01-11 芋头科技(杭州)有限公司 Robot voice addressed rotation system and method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4784351B2 (en) * 2006-03-14 2011-10-05 セイコーエプソン株式会社 Human identification using robots
CN101604204A (en) * 2009-07-09 2009-12-16 北京科技大学 Distributed cognitive technology for intelligent emotional robot
CN106292732A (en) * 2015-06-10 2017-01-04 上海元趣信息技术有限公司 Intelligent robot rotating method based on sound localization and Face datection
CN106328130A (en) * 2015-06-30 2017-01-11 芋头科技(杭州)有限公司 Robot voice addressed rotation system and method
CN105957521A (en) * 2016-02-29 2016-09-21 青岛克路德机器人有限公司 Voice and image composite interaction execution method and system for robot

Also Published As

Publication number Publication date
CN107297745A (en) 2017-10-27

Similar Documents

Publication Publication Date Title
CN107297745B (en) Voice interactive method, voice interaction device and robot
Vecchiotti et al. End-to-end binaural sound localisation from the raw waveform
US6549122B2 (en) Portable orientation system
Freiberger Development and evaluation of source localization algorithms for coincident microphone arrays
CN106292732A (en) Intelligent robot rotating method based on sound localization and Face datection
Young et al. Effects of pinna position on head‐related transfer functions in the cat
CN104469154B (en) A kind of camera guide device and bootstrap technique based on microphone array
CN103581606B (en) A kind of multimedia collection device and method
US10582117B1 (en) Automatic camera control in a video conference system
CN107346661A (en) A kind of distant range iris tracking and acquisition method based on microphone array
CN106710603A (en) Speech recognition method and system based on linear microphone array
CN109318243A (en) A kind of audio source tracking system, method and the clean robot of vision robot
CN107430868A (en) The Real-time Reconstruction of user speech in immersion visualization system
CN103607550B (en) A kind of method according to beholder's position adjustment Television Virtual sound channel and TV
CN101198194A (en) Ear speaker device
AU2009287421A1 (en) A microphone array system and method for sound acquisition
CN109307856A (en) A kind of sterically defined exchange method of robot and device
JP2003251583A (en) Robot audio-visual system
WO2019105238A1 (en) Method and terminal for speech signal reconstruction and computer storage medium
CN110503045A (en) A kind of Face detection method and device
CN110345610A (en) Control method and device of air conditioner and air conditioning equipment
CN104731325B (en) Relative direction based on intelligent glasses determines method, apparatus and intelligent glasses
CN109031200A (en) A kind of sound source dimensional orientation detection method based on deep learning
CN109665111A (en) Continuation of the journey artificial intelligence line holographic projections aircraft when a kind of overlength
EP1269746A1 (en) Hands-free home video production camcorder

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 402, Building 33 Guangshun Road, Changning District, Shanghai, 2003

Applicant after: SHANGHAI MROBOT TECHNOLOGY Co.,Ltd.

Address before: Room 402, Building 33 Guangshun Road, Changning District, Shanghai 200050

Applicant before: SHANGHAI MUYE ROBOT TECHNOLOGY Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Room 402, Building 33 Guangshun Road, Changning District, Shanghai, 2003

Patentee after: Shanghai zhihuilin Medical Technology Co.,Ltd.

Address before: Room 402, Building 33 Guangshun Road, Changning District, Shanghai, 2003

Patentee before: Shanghai Zhihui Medical Technology Co.,Ltd.

Address after: Room 402, Building 33 Guangshun Road, Changning District, Shanghai, 2003

Patentee after: Shanghai Zhihui Medical Technology Co.,Ltd.

Address before: Room 402, Building 33 Guangshun Road, Changning District, Shanghai, 2003

Patentee before: SHANGHAI MROBOT TECHNOLOGY Co.,Ltd.

CP01 Change in the name or title of a patent holder
TR01 Transfer of patent right

Effective date of registration: 20201118

Address after: 226500 Jiangsu city of Nantong province Rugao City North Street Flower Market Road No. 20

Patentee after: JIANGSU MUMENG INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: Room 402, Building 33 Guangshun Road, Changning District, Shanghai, 2003

Patentee before: Shanghai zhihuilin Medical Technology Co.,Ltd.

TR01 Transfer of patent right