Voice interaction method, voice interaction device, and robot
Technical field
The invention belongs to the field of robotics, and more particularly relates to a voice interaction method, a voice interaction device, and a robot.
Background art
With the rapid development of modern science and technology, intelligent robots are being used ever more widely; whether at home or in public places such as shopping malls and banks, intelligent robots can be seen.
Voice interaction between a robot and a speaker has always been an important part of making robots intelligent and humanlike. Beyond the dialogue itself, the orientation and standing position of the robot relative to the speaker are also important signs of intelligence.
In existing voice interaction between a robot and a speaker, the speaker usually moves to stand directly in front of the robot so that the interaction goes more smoothly. The robot, by contrast, cannot intelligently position itself according to the direction of the speaker's voice, so its humanlike behavior leaves much room for improvement.
Summary of the invention
In view of the above, embodiments of the present invention provide a voice interaction method, a voice interaction device, and a robot, so that the robot accurately faces the speaker during voice interaction and behaves in a more intelligent and humanlike manner.
In a first aspect, an embodiment of the present invention provides a voice interaction method applied to a robot, comprising: when the sound source angle of a received voice signal is within a preset angle range of the robot, acquiring an image and identifying the angles of one or more faces in the image; selecting the face whose face angle is closest to the sound source angle as the speaker; and adjusting the angle of the robot so that the center of the speaker's face falls at the front center position of the robot, in order to respond to the voice signal.
Further, the method also comprises: receiving the voice signal; detecting the energy of the voice signal; and, when the energy of the voice signal reaches a threshold of the robot, locating the sound source angle of the voice signal.
Further, the method also comprises: when the sound source angle of the received voice signal is not within the preset angle range of the robot, adjusting the angle of the robot so that the angle at which the voice signal is received falls within the preset angle range of the robot.
Further, after the step of adjusting the angle of the robot so that the center of the speaker's face falls at the front center position of the robot, the method also comprises: determining whether a preset time interval of the robot has elapsed; and, when the preset time interval has elapsed, acquiring a face image of the speaker, identifying the face angle of the speaker, and adjusting the angle of the robot so that the center of the speaker's face falls at the front center position of the robot.
Further, in the method, responding to the voice signal comprises: performing speech recognition on the voice signal; performing natural language understanding on the speech recognition result and retrieving a corresponding answer; and delivering the answer to the speaker by speech synthesis or body movement.
Further, the sound source angle is the angle between the sound source direction and the front center direction of the robot.
Further, the face angle is the angle between the face and the shooting direction of the image.
In a second aspect, an embodiment of the present invention provides a voice interaction device applied to a robot, comprising: a sound pickup module, configured to determine whether the sound source angle of a received voice signal is within a preset angle range of the robot; an image acquisition module, configured to, when the sound source angle of the received voice signal is within the preset angle range of the robot, acquire an image, identify the angles of one or more faces in the image, and select the face whose face angle is closest to the sound source angle as the speaker; and an angle adjustment module, configured to adjust the angle of the robot so that the center of the speaker's face falls at the front center position of the robot, in order to respond to the voice signal.
Further, the sound pickup module further comprises: an energy detection submodule, configured to detect the energy of the received voice signal; a sound source localization submodule, configured to locate the sound source angle of the voice signal when the energy of the voice signal reaches a threshold of the robot; and a speech recognition submodule, configured to perform speech recognition on the voice signal.
Further, the device also comprises: an interaction response module, configured to perform natural language understanding on the speech recognition result, retrieve a corresponding answer, and deliver the answer to the speaker by speech synthesis or body movement.
Further, the angle adjustment module is also configured to, when the sound source angle of the received voice signal is not within the preset angle range of the robot, adjust the angle of the robot so that the angle at which the voice signal is received falls within the preset angle range of the robot.
Further, the device also comprises: a timing module, configured to, at regular intervals, trigger the image acquisition module to reacquire a face image of the speaker and identify the face angle of the speaker, and trigger the angle adjustment module to adjust the angle of the robot so that the center of the speaker's face falls at the front center position of the robot.
Further, the sound source angle is the angle between the sound source direction and the front center direction of the robot.
Further, the face angle is the angle between the face and the shooting direction of the image.
In a third aspect, an embodiment of the present invention provides a robot. The robot includes a voice interaction device arranged in the robot; the voice interaction device is implemented using the technical solution provided by the above embodiments.
With the voice interaction method, voice interaction device, and robot provided by the embodiments of the present invention, the sound source angle and the face angles are located, the speaker is identified in a multi-person scene by how closely the two coincide, and the front of the robot is automatically and accurately turned toward the speaker before voice interaction is carried out. The robot therefore appears more intelligent and humanlike when interacting with the speaker. In addition, because the front of the robot is accurately turned toward the speaker, the sound pickup direction of the robot coincides with the sound source direction, which optimizes the pickup angle and also greatly benefits accurate acquisition of the voice signal.
Brief description of the drawings
In order to explain the solutions of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a voice interaction method provided by Embodiment one of the present invention;
Fig. 2 is a schematic flowchart of a voice interaction method provided by Embodiment two of the present invention;
Fig. 3 is a schematic structural diagram of a voice interaction device provided by Embodiment three of the present invention;
Fig. 4 is a schematic structural diagram of a voice interaction device provided by Embodiment four of the present invention.
Detailed description of the embodiments
In order to enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention; the drawings show preferred embodiments of the present invention. The present invention can be implemented in many different forms and is not limited to the embodiments described herein; rather, these embodiments are provided so that the disclosure of the present invention will be more thorough and complete. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used in the specification of the present invention are intended only to describe specific embodiments and are not intended to limit the present invention. The terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to such a process, method, product, or device.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present invention. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
Embodiment one
Embodiment one of the present invention provides a voice interaction method applied to a robot. In the embodiments of the present invention, the robot is generally designed to be about the height of an average adult, for example 165 centimeters. For ease of sound pickup, a ring microphone array is usually used, for example a 6-microphone scheme in which the array is mounted horizontally at some position on the head, such as the top of the head, with the main microphone (microphone No. 0) located at the front center of the head; the designated beam enhancement direction of the microphone array is exactly the 0° direction. After the voice signal is obtained through the microphone array, automatic speech recognition (ASR) can be performed, the sound source angle can be located, and the sound energy can be evaluated. The speech recognition engine is kept awake for long periods. There are two wake-up mechanisms: in one, speech recognition starts after a specific wake word wakes the engine; in the other, the speech recognition engine is always awake. This embodiment uses the latter, so the sound source angle and energy can be obtained at any time. After the speech recognition result is obtained, a natural language understanding (NLU) module retrieves the answer corresponding to the speech recognition result, and the robot feeds it back to the speaker. In particular, the speech recognition result may have no corresponding response instruction in the system; such a result is called a meaningless result, and otherwise the result is meaningful.
A camera is mounted at the front of the top of the robot's head, at the front center point of the head. The camera has a certain field of view, for example in the range of 100-120°, within which faces captured in the image can be recognized and the angles of one or more faces in the image can be identified. That is, the angle between a given face and the shooting direction of the image can be provided, which is the angle of the face relative to the front center point of the head.
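By way of illustration only, the face angle described above could be estimated from the horizontal position of a detected face in the image. The following is a minimal sketch that assumes an ideal pinhole camera without lens distortion; the function name `face_angle_deg`, its parameters, and the sign convention (positive to the right of center) are hypothetical and do not appear in the embodiment.

```python
import math


def face_angle_deg(face_center_x: float, image_width: int,
                   horizontal_fov_deg: float) -> float:
    """Estimate the angle between a face and the camera's shooting direction.

    Assumes a pinhole camera: a pixel at the image edge lies at half the
    horizontal field of view. Positive values mean right of center.
    """
    if not 0.0 < horizontal_fov_deg < 180.0:
        raise ValueError("horizontal_fov_deg must be between 0 and 180")
    half_width = image_width / 2.0
    # Focal length expressed in pixels, derived from the field of view.
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg / 2.0))
    return math.degrees(math.atan((face_center_x - half_width) / focal_px))
```

For a 640-pixel-wide image and a 110° field of view, a face centered at pixel 320 lies at 0°, and a face at the right edge lies at about 55°.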
The robot chassis can rotate, allowing the robot to turn through 360 degrees. The robot can also be rotated by means of the rollers on its base.
The robot can respond to the speaker by speech synthesis, body movement, and the like.
Referring to Fig. 1, which is a schematic flowchart of a voice interaction method provided by an embodiment of the present invention, the method can be applied to the robot described above and comprises:
Step S1001: when the sound source angle of the received voice signal is within the preset angle range, acquire an image and identify the angles of one or more faces in the image.
The robot can pick up sound through the ring microphone array, with the angle range for sound pickup set in advance; in this embodiment it is set to within plus or minus 60 degrees of directly in front of the robot. If the received voice signal is within this angle range, it is considered valid, and an image is then acquired by the camera at the front center point of the robot's head so that the speaker can be located further. Normally, the shooting angle range of the robot's camera is greater than the robot's preset pickup angle range, so that the sound-emitting object can be captured in the image.
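As a non-limiting illustration, the range check of step S1001 can be sketched as follows. The sketch assumes that the sound source angle is reported in degrees with 0° directly in front of the robot; the helper names `normalize_angle` and `in_pickup_range` are hypothetical.

```python
def normalize_angle(angle_deg: float) -> float:
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def in_pickup_range(source_angle_deg: float,
                    half_range_deg: float = 60.0) -> bool:
    """Return True if the sound source lies within the preset pickup range.

    The default of 60 degrees corresponds to the plus or minus 60 degree
    range of this embodiment; 0 degrees is directly in front of the robot.
    """
    return abs(normalize_angle(source_angle_deg)) <= half_range_deg
```

Normalizing first means that an angle reported as 350° is treated as -10° and therefore falls inside the range.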
Step S1002: select the face whose face angle is closest to the sound source angle as the speaker.
If the robot is in a scene where several people are standing, the face in the image whose face angle is closest to the sound source angle is selected as the speaker.
In the embodiments of the present invention, the "speaker" is the sound-emitting object that produces the voice signal, including but not limited to a natural person, a robot, or another sound-emitting object; the "face" is the front of the main body of the sound-emitting object, for example a human face, a robot face, or the front of a sound-emitting object.
Step S1003: adjust the angle of the robot so that the center of the speaker's face falls at the front center position of the robot, in order to respond to the voice signal.
In the embodiments of the present invention, the center of the speaker's face falling at the front center position of the robot is best realized by the face center falling exactly at the center directly in front of the robot; alternatively, it may fall within plus or minus 15 to 30 degrees to the left or right of the center position.
In the embodiments of the present invention, speech recognition may be performed on the received voice signal while step S1001 is executed; natural language understanding is performed on the speech recognition result, the corresponding answer is retrieved, and the answer is delivered to the speaker by speech synthesis or body movement. In other embodiments, this may instead be done after step S1003 has been executed, that is, once the robot is facing the speaker: speech recognition is performed on the voice signal, natural language understanding is performed on the result, the corresponding answer is retrieved, and the answer is delivered to the speaker by speech synthesis or body movement.
With the voice interaction method provided by this embodiment of the present invention, the sound source angle and the face angles are located, the speaker is identified in a multi-person scene by how closely the two coincide, and the front of the robot is automatically and accurately turned toward the speaker before voice interaction is carried out. The robot therefore appears more intelligent and humanlike when interacting with the speaker. In addition, because the front of the robot is accurately turned toward the speaker, the sound pickup direction of the robot coincides with the sound source direction, which optimizes the pickup angle and also greatly benefits accurate acquisition of the voice signal.
Embodiment two
Embodiment two of the present invention provides a voice interaction method applied to a robot. Fig. 2 is a schematic flowchart of a voice interaction method provided by an embodiment of the present invention.
Step S2001: receive a voice signal.
In the embodiments of the present invention, the robot is generally designed to be about the height of an average adult, for example 165 centimeters. For ease of receiving the voice signal, a ring microphone array is usually used for sound pickup, for example a 6-microphone scheme in which the array is mounted horizontally at some position on the head, such as the top of the head, with the main microphone (microphone No. 0) located at the front center of the head; the designated beam enhancement direction of the microphone array is exactly the 0° direction.
Step S2002: detect the energy of the voice signal and determine whether the energy reaches the threshold. If so, execute step S2003; if not, the process ends.
After the voice signal is obtained through the ring microphone array, the energy of the voice signal is detected and compared with the threshold. If the threshold is reached, the voice signal is considered valid and step S2003 is executed; if not, the received voice signal is considered invalid.
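The embodiment does not specify how the energy is measured or what the threshold is. As one possible sketch, the energy of a frame can be taken as its mean-square value expressed in decibels relative to full scale; the -40 dB default threshold and the names `frame_energy_db` and `is_valid_signal` are assumptions made for illustration.

```python
import math
from typing import Sequence


def frame_energy_db(samples: Sequence[float]) -> float:
    """Mean-square energy of a frame in dB relative to full scale.

    Samples are assumed to be floats in the range [-1.0, 1.0].
    An empty or all-zero frame yields negative infinity.
    """
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def is_valid_signal(samples: Sequence[float],
                    threshold_db: float = -40.0) -> bool:
    """Step S2002: treat the frame as valid if its energy reaches the threshold."""
    return frame_energy_db(samples) >= threshold_db
```

A frame alternating between 0.5 and -0.5 has an energy of about -6 dB and passes the check, whereas a frame of constant amplitude 0.001 is at -60 dB and is rejected.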
Step S2003: locate the sound source angle of the voice signal and determine whether the sound source angle is within the preset angle range. If so, execute step S2005; if not, execute step S2004.
After the voice signal is obtained through the ring microphone array, it is determined whether the sound source angle of the voice signal is within the preset angle range. The sound source angle is the angle between the sound source direction and the front center direction of the robot. Only for voice signals within the preset angle range can the robot perform speech recognition and interaction well. If the sound source angle of the voice signal is not within the preset angle range, the angle of the robot is adjusted.
In the embodiments of the present invention, the preset sound source angle range of the robot is within plus or minus 60 degrees of directly in front of the robot.
Step S2004: adjust the angle of the robot so that the sound source angle of the received voice signal falls within the preset range.
In the embodiments of the present invention, the robot chassis can rotate, allowing the robot to turn through 360 degrees. The robot can also be rotated by means of the rollers on its base.
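The embodiment does not state how far the robot turns in step S2004. One possible sketch, given purely as an assumption, computes the smallest signed rotation that brings the sound source to the edge of the preset range; the function name `rotation_into_range` is hypothetical, and a positive rotation is assumed to turn the front of the robot toward positive angles.

```python
def rotation_into_range(source_angle_deg: float,
                        half_range_deg: float = 60.0) -> float:
    """Smallest signed rotation (degrees) that puts the source inside the range.

    After the robot rotates by r, the source appears at (angle - r).
    Returns 0.0 when the source is already inside the range.
    """
    # Wrap the reported angle into [-180, 180).
    angle = (source_angle_deg + 180.0) % 360.0 - 180.0
    if abs(angle) <= half_range_deg:
        return 0.0
    if angle > 0:
        return angle - half_range_deg
    return angle + half_range_deg
```

For a source at 100° this returns 40°, after which the source sits at the 60° boundary. An implementation could equally rotate by the full source angle so that the source ends up directly ahead.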
Step S2005: acquire an image and identify the angles of one or more faces in the image.
In the embodiments of the present invention, a camera is mounted at the front of the top of the robot's head, at the front center point of the head, with a field of view in the range of 100-120°. It is used to acquire images; faces captured in the image can be recognized and the angles of one or more faces in the image can be identified. That is, the angle between a given face and the shooting direction of the image can be provided, which is the angle of the face relative to the front center point of the robot's head. Normally, the shooting angle range of the robot's camera is greater than the robot's preset pickup angle range, so that the sound-emitting object can be captured in the image.
Step S2006: select the face whose face angle is closest to the sound source angle as the speaker.
One or more face images are identified in the image. For each face image, the angle between the face and the image acquisition direction (that is, the forward direction of the camera) can be calculated. The face whose angle is closest to the sound source angle is selected; if, for example, the angular difference is less than 15 degrees, the two are deemed to coincide, and the person corresponding to that face image is determined to be the speaker.
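The selection in step S2006 can be sketched as follows. This is a minimal illustration that assumes face angles and the sound source angle are expressed in degrees in the same reference frame; the function name `select_speaker` is hypothetical, and the 15 degree default is the example value given above.

```python
from typing import Optional, Sequence


def select_speaker(face_angles_deg: Sequence[float],
                   source_angle_deg: float,
                   max_diff_deg: float = 15.0) -> Optional[int]:
    """Return the index of the face closest to the sound source angle.

    A face only qualifies if its angular difference from the source is
    strictly less than max_diff_deg; otherwise None is returned.
    """
    best_index: Optional[int] = None
    best_diff = max_diff_deg
    for index, face_angle in enumerate(face_angles_deg):
        diff = abs(face_angle - source_angle_deg)
        if diff < best_diff:
            best_index, best_diff = index, diff
    return best_index
```

With faces at -40°, 10°, and 35° and a sound source at 12°, the face at index 1 is chosen. If no face is within 15° of the source, the function returns None and no speaker is identified.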
In the embodiments of the present invention, the "speaker" is the sound-emitting object that produces the voice signal, including but not limited to a natural person, a robot, or another sound-emitting object; the "face" is the front of the main body of the sound-emitting object, for example a human face, a robot face, or the front of a sound-emitting object.
Step S2007: adjust the angle of the robot so that the center of the speaker's face falls at the front center position of the robot, in order to respond to the voice signal.
In the embodiments of the present invention, the center of the speaker's face falling at the front center position of the robot is best realized by the face center falling exactly at the center directly in front of the robot; alternatively, it may fall within plus or minus 15 to 30 degrees to the left or right of the center position.
In the embodiments of the present invention, the robot chassis can rotate, allowing the robot to turn through 360 degrees; the robot can also be rotated by means of the rollers on its base. When the robot is rotated so that the center of the speaker's face falls at the front center position of the robot, the robot's pickup angle is better, and the robot appears more intelligent and humanlike in voice interaction with the speaker.
Step S2008: determine whether the preset time interval has elapsed. If so, execute step S2009; if not, the process ends.
Step S2009: acquire a face image of the speaker and identify the face angle of the speaker. Then execute step S2007 again.
While turning, the robot of this embodiment performs an update at regular intervals, for example at a preset time interval of 0.5 seconds, to check whether the position and angle of the speaker have changed and whether the angle of the robot needs to be adjusted.
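The periodic update of steps S2008 and S2009 can be sketched as a simple loop. This is an illustrative sketch only: the callbacks `get_face_angle` and `rotate_by`, the 15 degree tolerance, and the `max_updates` bound are assumptions, and only the 0.5 second interval is taken from the embodiment.

```python
import time
from typing import Callable, Optional


def track_speaker(get_face_angle: Callable[[], Optional[float]],
                  rotate_by: Callable[[float], None],
                  interval_s: float = 0.5,
                  tolerance_deg: float = 15.0,
                  max_updates: int = 10,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Re-check the speaker's face angle at a fixed interval and re-center.

    get_face_angle returns the current face angle in degrees, or None if
    the speaker is no longer visible. Returns the number of rotations made.
    """
    rotations = 0
    for _ in range(max_updates):
        sleep(interval_s)
        angle = get_face_angle()
        if angle is None:
            break  # Speaker lost; stop tracking.
        if abs(angle) > tolerance_deg:
            rotate_by(angle)  # Turn so the face returns to the front center.
            rotations += 1
    return rotations
```

Passing the `sleep` function as a parameter lets the loop be exercised without real delays, for example with `sleep=lambda s: None`.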
In the embodiments of the present invention, speech recognition may be performed on the received voice signal while step S2005 is executed; a natural language understanding (NLU) module then retrieves the answer corresponding to the speech recognition result, and the robot feeds it back to the speaker. In particular, the speech recognition result may have no corresponding response instruction in the system; such a result is called a meaningless result, and otherwise the result is meaningful. The robot responds to the speaker by speech synthesis, body movement, or the like. In other embodiments of the present invention, this may instead be done after step S2007 has been executed, that is, once the robot is facing the speaker: speech recognition is performed on the voice signal, natural language understanding is performed on the result, the corresponding answer is retrieved, and the answer is delivered to the speaker by speech synthesis or body movement.
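The distinction between meaningful and meaningless results can be illustrated with a deliberately simplified sketch. Here a dictionary lookup stands in for the natural language understanding module, which is an assumption made only for illustration; the names `retrieve_answer`, `respond`, and `speak` are hypothetical.

```python
from typing import Callable, Dict, Optional


def retrieve_answer(recognized_text: str,
                    answers: Dict[str, str]) -> Optional[str]:
    """Look up the answer for a recognition result.

    Returns None when no response exists, i.e. the result is 'meaningless'.
    """
    key = recognized_text.strip().lower()
    return answers.get(key)


def respond(recognized_text: str,
            answers: Dict[str, str],
            speak: Callable[[str], None]) -> bool:
    """Deliver the answer through the speak callback if the result is meaningful."""
    answer = retrieve_answer(recognized_text, answers)
    if answer is None:
        return False  # Meaningless result: nothing is said.
    speak(answer)
    return True
```

In a real system the `speak` callback would be replaced by a speech synthesis engine or a body movement controller.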
With the voice interaction method provided by this embodiment of the present invention, the sound source angle and the face angles are located, the speaker is identified in a multi-person scene by how closely the two coincide, and the front of the robot is automatically and accurately turned toward the speaker before voice interaction is carried out. The robot therefore appears more intelligent and humanlike when interacting with the speaker. In addition, because the front of the robot is accurately turned toward the speaker, the sound pickup direction of the robot coincides with the sound source direction, which optimizes the pickup angle and also greatly benefits accurate acquisition of the voice signal.
Embodiment three
Embodiment three of the present invention provides a voice interaction device; see Fig. 3. The voice interaction device is applied to a robot and comprises a sound pickup module 100, an image acquisition module 120, and an angle adjustment module 140. The voice interaction device provided by this embodiment can implement the voice interaction methods provided by Embodiment one and Embodiment two above.
The sound pickup module 100 is configured to determine whether the sound source angle of the received voice signal is within the preset sound source angle range of the robot.
The image acquisition module 120 is configured to, when the sound source angle of the received voice signal is within the preset angle range of the robot, acquire an image, identify the angles of one or more faces in the image, and select the face whose face angle is closest to the sound source angle as the speaker.
The angle adjustment module 140 is configured to adjust the angle of the robot so that the center of the speaker's face falls at the front center position of the robot, in order to respond to the voice signal.
With the voice interaction device provided by this embodiment of the present invention, the sound source angle and the face angles are located, the speaker is identified in a multi-person scene by how closely the two coincide, and the front of the robot is automatically and accurately turned toward the speaker before voice interaction is carried out. The robot therefore appears more intelligent and humanlike when interacting with the speaker. In addition, because the front of the robot is accurately turned toward the speaker, the sound pickup direction of the robot coincides with the sound source direction, which optimizes the pickup angle and also greatly benefits accurate acquisition of the voice signal.
Embodiment four
Embodiment four of the present invention provides a voice interaction device; see Fig. 4. The voice interaction device is applied to a robot and comprises a sound pickup module 200, an image acquisition module 220, an angle adjustment module 240, and an interaction response module 260. The voice interaction device provided by this embodiment can implement the voice interaction methods provided by Embodiment one and Embodiment two above.
The sound pickup module 200 is configured to determine whether the sound source angle of the received voice signal is within the preset sound source angle range of the robot. The sound source angle is the angle between the sound source direction and the front center direction of the robot. The preset sound source angle range of the robot is within plus or minus 60 degrees of directly in front of the robot.
The sound pickup module 200 further comprises: an energy detection submodule 202, configured to detect the energy of the received voice signal;
a sound source localization submodule 204, configured to locate the sound source angle of the voice signal when the energy of the voice signal reaches the threshold of the robot; and
a speech recognition submodule 206, configured to perform speech recognition on the voice signal.
The image acquisition module 220 is configured to, when the sound source angle of the received voice signal is within the preset angle range of the robot, acquire an image, identify the angles of one or more faces in the image, and select the face whose face angle is closest to the sound source angle as the speaker. The face angle is the angle between the face and the shooting direction of the image. Normally, the shooting angle range of the robot's camera is greater than the robot's preset pickup angle range, so that the sound-emitting object can be captured in the image.
The angle adjustment module 240 is configured to adjust the angle of the robot so that the center of the speaker's face falls at the front center position of the robot. In addition, when the sound source angle of the received voice signal is not within the preset angle range of the robot, it adjusts the angle of the robot so that the angle at which the voice signal is received falls within the preset angle range of the robot.
In the embodiments of the present invention, the "speaker" is the sound-emitting object that produces the voice signal, including but not limited to a natural person, a robot, or another sound-emitting object; the "face" is the front of the main body of the sound-emitting object, for example a human face, a robot face, or the front of a sound-emitting object.
The interaction response module 260 is configured to respond to the voice signal. Specifically, it performs natural language understanding on the speech recognition result of the speech recognition submodule 206, retrieves the corresponding answer, and delivers the answer to the speaker by speech synthesis or body movement.
As an improvement on this embodiment of the present invention, a timing module (not shown) may further be included, configured to, at regular intervals, trigger the image acquisition module 220 to reacquire a face image of the speaker and identify the face angle of the speaker, and trigger the angle adjustment module 240 to adjust the angle of the robot so that the center of the speaker's face falls at the front center position of the robot.
In the voice interaction device provided by this embodiment of the present invention, the sound pickup module 200 may be a ring microphone array mounted on the robot's head. Sound is picked up by the ring microphone array, for example a 6-microphone scheme in which the array is mounted horizontally at some position on the head, such as the top of the head, with the main microphone located at the front center of the head; the designated beam enhancement direction of the microphone array is exactly the 0° direction. The image acquisition module 220 may be a camera mounted on the top of the robot's head, at the front center point of the head, with a field of view in the range of 100-120°. It is able to acquire pictures and recognize faces in the image, and software calculates the angle of each face relative to the shooting direction of the image (that is, the front center point of the head). The angle adjustment module 240 may be implemented by a rotatable chassis mounted at the bottom of the robot.
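To show how the three modules might cooperate, the following sketch wires them together as callbacks. It is an illustrative assumption rather than the structure of the device itself: the class name `VoiceInteractionDevice`, its fields, and the choice to rotate fully toward an out-of-range source are all hypothetical, while the 60 degree range and the 15 degree coincidence limit are the example values from the embodiments.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VoiceInteractionDevice:
    locate_source: Callable[[], float]             # sound pickup module 200
    detect_face_angles: Callable[[], List[float]]  # image acquisition module 220
    rotate_by: Callable[[float], None]             # angle adjustment module 240
    half_range_deg: float = 60.0
    max_diff_deg: float = 15.0

    def face_speaker(self) -> Optional[float]:
        """Turn toward the speaker; return the face angle used, or None."""
        source = self.locate_source()
        if abs(source) > self.half_range_deg:
            # Assumed behavior: turn straight toward the source, after
            # which the source lies directly ahead at 0 degrees.
            self.rotate_by(source)
            source = 0.0
        faces = self.detect_face_angles()
        candidates = [a for a in faces if abs(a - source) < self.max_diff_deg]
        if not candidates:
            return None
        speaker = min(candidates, key=lambda a: abs(a - source))
        self.rotate_by(speaker)  # Center the speaker's face in front.
        return speaker
```

With a sound source at 20° and faces at -35°, 17°, and 50°, the device selects the face at 17° and rotates by that amount.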
With the voice interaction device provided by this embodiment of the present invention, the sound source angle and the face angles are located, the speaker is identified in a multi-person scene by how closely the two coincide, and the front of the robot is automatically and accurately turned toward the speaker before voice interaction is carried out. The robot therefore appears more intelligent and humanlike when interacting with the speaker. In addition, because the front of the robot is accurately turned toward the speaker, the sound pickup direction of the robot coincides with the sound source direction, which optimizes the pickup angle and also greatly benefits accurate acquisition of the voice signal.
Embodiment five
Embodiment five of the present invention provides a robot. The robot includes a voice interaction device arranged in the robot; the voice interaction device is the voice interaction device described in Embodiment three or Embodiment four above. The robot provided by this embodiment has a voice interaction function that is more intelligent and humanlike.
In the above embodiments provided by the present invention, it should be understood that the disclosed device and method can be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above are only embodiments of the present invention and do not limit the patent scope of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions recorded in the foregoing specific embodiments or make equivalent replacements for some of their technical features. Any equivalent structure made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present invention.