CN107297745B

CN107297745B - Voice interaction method, voice interaction device and robot

Info

Publication number: CN107297745B
Application number: CN201710505552.1A
Authority: CN
Inventors: 蒋化冰; 陈岳峰; 廖凯; 齐鹏举; 方园; 米万珠; 舒剑; 吴琨; 管伟; 罗璇
Original assignee: Shanghai Wood Wood Robot Technology Co Ltd
Current assignee: JIANGSU MUMENG INTELLIGENT TECHNOLOGY Co.,Ltd.
Priority date: 2017-06-28
Filing date: 2017-06-28
Publication date: 2019-08-13
Anticipated expiration: 2037-06-28
Also published as: CN107297745A

Abstract

An embodiment of the present invention provides a voice interaction method, a voice interaction device and a robot. The method is applied to a robot and comprises: when the sound source angle of a received sound signal is within a preset angle range of the robot, acquiring an image and identifying the angles of one or more faces in the image; selecting the face whose face angle is closest to the sound source angle as the speaker; and adjusting the angle of the robot so that the center of the speaker's face falls at the center position in front of the robot, so as to respond to the sound signal. With this method, device and robot, the robot's voice interaction function can be made more intelligent and more human-like.

Description

Voice interaction method, voice interaction device and robot

Technical field

The invention belongs to the field of robotics, and in particular relates to a voice interaction method, a voice interaction device and a robot.

Background art

With the rapid development of modern science and technology, intelligent robots are being used ever more widely; they can be seen both in homes and in public places such as shopping malls and banks.

Voice interaction between a robot and a speaker has always been an important part of making robots intelligent and human-like. Beyond the interaction in the dialogue itself, the robot's orientation and standing position relative to the speaker are also an important expression of intelligence.

In existing voice interaction between a robot and a speaker, the speaker usually takes the initiative to stand directly in front of the robot so that the voice interaction goes more smoothly. The robot, by contrast, cannot automatically position itself according to the direction of the speaker's voice, so its intelligence and human-likeness leave room for improvement.

Summary of the invention

In conclusion the embodiment of the present invention provides a kind of voice interactive method, voice interaction device and robot, to reality Existing robot and speak person speech interaction when, it is more intelligent and personalize accurately towards speaker.

In a first aspect, the embodiment of the present invention provides a kind of voice interactive method, it is applied to robot, comprising: when received When the sound source angle of voice signal is within the scope of the predetermined angle of the robot, image is obtained, identifies one in described image The angle of a or multiple faces；It chooses the facial angle and the immediate face of the sound source angle is speaker；And it adjusts The angle of the whole robot, so that the face center of the speaker falls in center in front of the robot, in order to The voice signal is responded.

Further, the method also includes: receive the voice signal；Detect the energy of the voice signal；And When the energy of the voice signal reaches the threshold value of the robot, the sound source angle of the voice signal is positioned.

Further, the method also includes: when the sound source angle of received voice signal is not in the pre- of the robot If when in angular range, adjusting the angle of the robot, so that receiving the angle of the voice signal in the robot Within the scope of predetermined angle.

Further, the method is in the angle for adjusting the robot, so that the face center of the speaker It falls in front of the robot after the step of center, further includes: judge whether more than between the robot preset time Every；And when being more than the preset time interval of the robot, the facial image of the speaker is obtained, identifies speaker's Facial angle adjusts the angle of the robot, so that the face center of the speaker falls in center in front of the robot Position.

Further, in the method, carrying out response to the voice signal includes: to carry out language to the voice signal Sound identification；According to speech recognition as a result, progress natural language understanding, retrieves corresponding answer；And by the answer with language The mode of sound synthesis or limb action is response to the speaker.

Further, angle of the sound source angle between Sounnd source direction and robot front center direction.

Further, angle of the facial angle between the face and described image shooting direction.

Second aspect, the embodiment of the present invention provide a kind of voice interaction device, are applied to robot, comprising: pickup module, For judging the sound source angle of received voice signal whether within the scope of the predetermined angle of the robot；Image obtains mould Block, for when the sound source angle of received voice signal is within the scope of the predetermined angle of the robot, obtaining image, identification The angle of one or more faces in described image；And choose the facial angle and the immediate people of the sound source angle Face is speaker；And angle adjusts module, for adjusting the angle of the robot, so that the face center of the speaker Center in front of the robot is fallen in, in order to respond to the voice signal.

Further, the pickup module further comprises: energy measuring submodule, receives the sound for detecting The energy of signal；Auditory localization submodule reaches the threshold value of the robot for the energy when the voice signal, positions institute State the sound source angle of voice signal；And speech recognition submodule, for carrying out speech recognition to the voice signal.

Further, described device further include: interaction response module, for according to the speech recognition as a result, carry out Corresponding answer is retrieved in natural language understanding, by the answer response to described in a manner of speech synthesis or limb action Speaker.

Further, the angle adjustment module is also used to, when the sound source angle of the received voice signal is not in institute When stating within the scope of the predetermined angle of robot, the angle of the robot is adjusted, so that the angle for receiving the voice signal exists Within the scope of the predetermined angle of the robot.

Further, described device further include: timing module, for being spaced at regular intervals, triggering described image is obtained Modulus block reacquires the facial image of speaker, identifies the facial angle of speaker；And trigger angle adjustment module adjustment The angle of the robot, so that the face center of the speaker falls in center in front of the robot.

The third aspect, the embodiment of the present invention provide a kind of robot.The robot includes being arranged in the robot Voice interaction device；The voice interaction device is realized using technical solution provided by the above embodiment.

Voice interactive method, voice interaction device and the robot provided through the embodiment of the present invention, passes through localization of sound source Angle and facial angle, and speaker is determined from the scene of more people by the registration of the two, and automatically by robot Positive accurate steering speaker, then carries out interactive voice.So that robot and speak person speech interaction when it is more intelligent and It personalizes.In addition, because the front of robot is accurate to turn to speaker, the pickup direction of robot just with Sounnd source direction weight It closes, so that pickup angle is optimal, is also advantageous to the accurate acquisition of voice signal.

Brief description of the drawings

To illustrate the solutions of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a voice interaction method provided by Embodiment One of the present invention;

Fig. 2 is a schematic flowchart of a voice interaction method provided by Embodiment Two of the present invention;

Fig. 3 is a schematic structural diagram of a voice interaction device provided by Embodiment Three of the present invention;

Fig. 4 is a schematic structural diagram of a voice interaction device provided by Embodiment Four of the present invention.

Detailed description of the embodiments

To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention; the drawings show preferred embodiments of the present invention. The present invention can be implemented in many different forms and is not limited to the embodiments described herein; rather, these embodiments are provided so that the disclosure of the present invention is understood more thoroughly and completely. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used in the specification are only for the purpose of describing specific embodiments and are not intended to limit the present invention. The terms "first", "second" and the like in the specification, the claims and the above drawings are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that contains a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to such a process, method, product or device.

Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present invention. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.

Embodiment one

Embodiment One of the present invention provides a voice interaction method applied to a robot. In embodiments of the present invention, the robot is usually designed with a height similar to that of an average adult, for example 165 centimeters. To make sound pickup easier, an annular microphone array is usually used, for example a six-microphone scheme. The array sits at some position on the head, such as the top of the head, and is mounted horizontally; the main microphone (microphone No. 0) is located at the front center of the head, and the designated beam enhancement direction of the microphone array is the 0° direction. After a sound signal is obtained through the microphone array, automatic speech recognition (ASR) can be performed, the sound source angle can be located, and the sound energy can be judged. The speech recognition engine here is kept awake continuously. There are two wake-up mechanisms: in one, speech recognition starts only after the engine is woken by a specific wake word; in the other, the speech recognition engine is always awake. This embodiment uses the latter, so the sound source angle and energy can be obtained at any time. Once the speech recognition result is available, a natural language understanding (NLU) module retrieves the answer corresponding to the speech recognition result and has the robot feed it back to the speaker. In particular, a speech recognition result may have no corresponding response instruction in the system; such a result is called a meaningless result, and any other result is called a meaningful result.

A camera is mounted at the front of the top of the robot's head, at the front center point of the head. The camera has a certain field of view, for example in the range of 100° to 120°. Within this field of view, the faces captured in the image can be recognized and the angles of one or more faces in the image can be identified, giving the angle between a given face and the shooting direction of the image, that is, the angle of the face relative to the front center point of the head.
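The text does not say how the face angle is computed from the image. As one possible illustration, the sketch below assumes an ideal pinhole camera with no lens distortion and derives the horizontal angle of a face from the pixel column of its center. The function name, the 110° default field of view and the pixel-based interface are all assumptions made for this example, not details from the patent.

```python
import math


def face_angle_deg(face_center_x: float, image_width: int,
                   horizontal_fov_deg: float = 110.0) -> float:
    """Horizontal angle of a face relative to the camera's shooting direction.

    Assumes an ideal pinhole camera (hypothetical model, not from the patent).
    Negative values are left of the image center, positive values right of it.
    """
    half_width = image_width / 2.0
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg / 2.0))
    return math.degrees(math.atan((face_center_x - half_width) / focal_px))
```

Under this model a face centered in the image is at 0°, and a face at the very edge of the image is at half the field of view.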

The robot's chassis is rotatable and allows the robot to rotate through 360 degrees. The robot can also be rotated by means of the rollers on its base.

The robot can respond to the speaker by speech synthesis, limb movements and the like.

Referring to Fig. 1, which is a schematic flowchart of a voice interaction method provided by an embodiment of the present invention, the method can be applied to the robot described above and comprises:

Step S1001: when the sound source angle of the received sound signal is within the preset angle range, acquire an image and identify the angles of one or more faces in the image.

The robot can pick up sound through the annular microphone array, with a preset pickup angle range; in this embodiment the range is set to within plus or minus 60 degrees of the direction directly in front of the robot. If the received sound signal lies within this angle range, it is considered valid, and an image is then acquired by the camera at the front center point of the robot's head so that the speaker can be located further. Normally, the shooting angle range of the robot's camera is larger than the robot's preset pickup angle range, so that the sound-emitting object can be captured.
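The range check in step S1001 can be sketched as follows. This is a minimal illustration that assumes angles are given in degrees, with 0° directly in front of the robot; the function names and the normalization to [-180°, 180°) are choices made for the example rather than details from the patent.

```python
def normalize_angle_deg(angle_deg: float) -> float:
    """Map any angle to the interval [-180, 180), with 0 directly ahead."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def in_pickup_range(source_angle_deg: float, half_range_deg: float = 60.0) -> bool:
    """True if the sound source lies within +/- half_range_deg of straight ahead.

    The 60-degree default follows the example range given in this embodiment.
    """
    return abs(normalize_angle_deg(source_angle_deg)) <= half_range_deg
```

For instance, `in_pickup_range(45.0)` is true, while a source at 90° falls outside the range and would trigger a rotation of the robot in Embodiment Two.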

Step S1002: select the face whose face angle is closest to the sound source angle as the speaker.

If the robot is in a scene where several people are standing, the face in the image whose face angle is closest to the sound source angle is selected as the speaker.

In embodiments of the present invention, the "speaker" is the sound-emitting object that produces the sound signal, including but not limited to a natural person, a robot or another sound-emitting object; the "face" is the front of the main body of the sound-emitting object, for example a human face, a robot's face or the front of a sound-emitting object.

Step S1003: adjust the angle of the robot so that the center of the speaker's face falls at the center position in front of the robot, so as to respond to the sound signal.

In embodiments of the present invention, the center of the speaker's face falls at the center position in front of the robot. Ideally the face center falls exactly at the center position directly in front of the robot; alternatively, it falls within plus or minus 15 to 30 degrees to the left or right of that center direction.

In this embodiment, speech recognition is performed on the received sound signal while step S1001 is executed; natural language understanding is performed on the speech recognition result, a corresponding answer is retrieved, and the answer is delivered to the speaker by speech synthesis or limb movement. In other embodiments, speech recognition on the sound signal can instead be performed after step S1003 has been executed, that is, once the robot is facing the speaker; natural language understanding is then performed on the speech recognition result, a corresponding answer is retrieved, and the answer is delivered to the speaker by speech synthesis or limb movement.

With the voice interaction method provided by this embodiment of the present invention, the sound source angle and the face angles are located, the speaker is identified in a multi-person scene by how closely the two coincide, the front of the robot is automatically and accurately turned toward the speaker, and voice interaction is then carried out. This makes the robot more intelligent and human-like during voice interaction with the speaker. In addition, because the front of the robot is turned accurately toward the speaker, the robot's pickup direction coincides with the sound source direction, so the pickup angle is optimal, which also greatly benefits accurate acquisition of the sound signal.

Embodiment two

Embodiment Two of the present invention provides a voice interaction method applied to a robot. Fig. 2 is a schematic flowchart of the voice interaction method provided by this embodiment.

Step S2001: receive a sound signal.

In embodiments of the present invention, the robot is usually designed with a height similar to that of an average adult, for example 165 centimeters. To make it easier to receive sound signals, an annular microphone array is usually used for pickup, for example a six-microphone scheme. The array sits at some position on the head, such as the top of the head, and is mounted horizontally; the main microphone (microphone No. 0) is located at the front center of the head, and the designated beam enhancement direction of the microphone array is the 0° direction.

Step S2002: detect the energy of the sound signal and judge whether the energy of the sound signal reaches a threshold. If so, execute step S2003; if not, the process ends.

After the sound signal is obtained through the annular microphone array, its energy is detected and compared with the threshold. If the energy reaches the threshold, the sound signal is considered valid and step S2003 is executed; if not, the received sound signal is considered invalid.
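The energy gate of step S2002 can be illustrated with a short sketch. The patent does not define how energy is measured or what the threshold is; the example below assumes root-mean-square (RMS) amplitude over one frame of samples normalized to [-1, 1], and the 0.05 threshold is an arbitrary placeholder.

```python
import math
from typing import Sequence


def rms_energy(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of one audio frame (assumed energy measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_valid_sound(samples: Sequence[float], threshold: float = 0.05) -> bool:
    """True if the frame is loud enough to continue to sound source localization.

    The threshold value is hypothetical; a real robot would calibrate it.
    """
    return rms_energy(samples) >= threshold
```

A frame of silence is rejected and the process ends, while a frame such as `[0.5, -0.5]` passes the gate and proceeds to step S2003.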

Step S2003: locate the sound source angle of the sound signal and judge whether the sound source angle is within the preset angle range. If so, execute step S2005; if not, execute step S2004.

After the sound signal is obtained through the annular microphone array, it is judged whether the sound source angle of the sound signal is within the preset angle range. The sound source angle is the angle between the sound source direction and the front center direction of the robot. The robot can perform speech recognition and interaction well only for sound signals within the preset angle range. If the sound source angle of the sound signal is not within the preset angle range, the angle of the robot is adjusted.

In this embodiment, the preset sound source angle range of the robot is within plus or minus 60 degrees of the direction directly in front of the robot.

Step S2004: adjust the angle of the robot so that the sound source angle of the received sound signal falls within the preset range.

In this embodiment, the robot's chassis is rotatable and allows the robot to rotate through 360 degrees. The robot can also be rotated by means of the rollers on its base.
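The patent does not state how far the robot turns in step S2004. One plausible choice, shown here purely as an assumption, is the smallest rotation that brings the sound source back to the edge of the preset range. The sketch assumes that a positive rotation turns the robot toward positive source angles, so that after rotating by `r` the source appears at `angle - r`.

```python
import math


def normalize_angle_deg(angle_deg: float) -> float:
    """Map any angle to the interval [-180, 180), with 0 directly ahead."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def rotation_to_enter_range(source_angle_deg: float,
                            half_range_deg: float = 60.0) -> float:
    """Smallest signed rotation that puts the source within +/- half_range_deg.

    Returns 0.0 when the source is already inside the range. The minimal
    rotation strategy is an illustrative assumption, not the patent's method.
    """
    angle = normalize_angle_deg(source_angle_deg)
    excess = abs(angle) - half_range_deg
    if excess <= 0:
        return 0.0
    return math.copysign(excess, angle)
```

A robot could equally rotate by the full source angle so that it faces the sound directly; the minimal rotation is used here only because it leaves the fine alignment to the later face-centering step.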

Step S2005: acquire an image and identify the angles of one or more faces in the image.

In this embodiment, a camera is mounted at the front of the top of the robot's head, at the front center point of the head, with a field of view in the range of 100° to 120°. It is used to acquire the image; within this field of view, the faces captured in the image can be recognized and the angles of one or more faces in the image can be identified, giving the angle between a given face and the shooting direction of the image, that is, the angle of the face relative to the front center point of the robot's head. Normally, the shooting angle range of the robot's camera is larger than the robot's preset pickup angle range, so that the sound-emitting object can be captured.

Step S2006: select the face whose face angle is closest to the sound source angle as the speaker.

One or more face images are identified in the image. For each face image, the angle between the face and the image acquisition direction (that is, the forward direction of the camera) can be calculated. The face whose angle is closest to the sound source angle is selected; for example, if the angular difference is less than 15 degrees, the two are deemed to coincide, and the person corresponding to that face image is deemed to be the speaker.
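The selection rule of step S2006 can be sketched as a nearest-angle search with a coincidence limit. The function name and the choice to return an index (or `None` when no face is close enough) are assumptions for the example; the 15-degree limit is the example value given above.

```python
from typing import Optional, Sequence


def select_speaker(face_angles_deg: Sequence[float],
                   source_angle_deg: float,
                   max_diff_deg: float = 15.0) -> Optional[int]:
    """Index of the face whose angle is closest to the sound source angle.

    Returns None if there are no faces, or if even the closest face differs
    from the source angle by max_diff_deg or more (no coincidence).
    """
    best_index: Optional[int] = None
    best_diff = float("inf")
    for index, face_angle in enumerate(face_angles_deg):
        diff = abs(face_angle - source_angle_deg)
        if diff < best_diff:
            best_index, best_diff = index, diff
    if best_index is None or best_diff >= max_diff_deg:
        return None
    return best_index
```

With faces at -40°, 5° and 30° and a sound source at 10°, the face at index 1 is chosen because it differs from the source by only 5°.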

Step S2007: adjust the angle of the robot so that the center of the speaker's face falls at the center position in front of the robot, so as to respond to the sound signal.

In this embodiment, the center of the speaker's face falls at the center position in front of the robot. Ideally the face center falls exactly at the center position directly in front of the robot; alternatively, it falls within plus or minus 15 to 30 degrees to the left or right of that center direction.

In this embodiment, the robot's chassis is rotatable and allows the robot to rotate through 360 degrees. The robot can also be rotated by means of the rollers on its base. When the robot is rotated so that the center of the speaker's face falls at the center position in front of the robot, the robot's pickup angle is better, and the robot appears more intelligent and human-like in its voice interaction with the speaker.

Step S2008: judge whether the preset time interval has been exceeded. If so, execute step S2009; if not, the process ends.

Step S2009: acquire a face image of the speaker and identify the speaker's face angle. Then execute step S2007 again.

While rotating, the robot of this embodiment performs an update at regular intervals (for example, a preset time interval of 0.5 seconds) to check whether the speaker's position and angle have changed and whether the robot's angle needs to be adjusted.
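The periodic update of steps S2008 and S2009 can be sketched as a simple loop. Everything about the interface is hypothetical: `get_face_angle_deg` stands in for the camera and face recognition, `rotate_by_deg` stands in for the chassis drive, and the 15-degree tolerance borrows the lower bound of the acceptable off-center range mentioned for step S2007. The `max_updates` limit and the injectable `sleep` exist only so that the example terminates and can be tested.

```python
import time
from typing import Callable, Optional


def track_speaker(get_face_angle_deg: Callable[[], Optional[float]],
                  rotate_by_deg: Callable[[float], None],
                  interval_s: float = 0.5,
                  tolerance_deg: float = 15.0,
                  max_updates: int = 10,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Re-check the speaker's face angle at a fixed interval and re-center.

    Returns the number of rotations issued. Stops early if the face is lost
    (get_face_angle_deg returns None).
    """
    rotations = 0
    for _ in range(max_updates):
        sleep(interval_s)                 # wait for the preset time interval
        angle = get_face_angle_deg()      # step S2009: re-identify face angle
        if angle is None:
            break                         # speaker no longer visible
        if abs(angle) > tolerance_deg:
            rotate_by_deg(angle)          # step S2007: turn toward the face
            rotations += 1
    return rotations
```

Passing the sensing and actuation functions in as parameters keeps the loop independent of any particular camera or motor API, which the patent does not specify.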

In this embodiment, speech recognition is performed on the received sound signal while step S2005 is executed, and a natural language understanding (NLU) module retrieves the answer corresponding to the speech recognition result and has the robot feed it back to the speaker. In particular, a speech recognition result may have no corresponding response instruction in the system; such a result is called a meaningless result, and any other result is called a meaningful result. The robot responds to the speaker by speech synthesis, limb movement and the like. In other embodiments of the present invention, speech recognition on the sound signal can instead be performed after step S2007 has been executed, that is, once the robot is facing the speaker; natural language understanding is then performed on the speech recognition result, a corresponding answer is retrieved, and the answer is delivered to the speaker by speech synthesis or limb movement.
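The distinction between meaningful and meaningless recognition results can be illustrated with a toy lookup. This is not a natural language understanding module: it assumes a hypothetical table that maps normalized recognized text to answers, and it assumes that meaningless results are simply ignored, which the patent does not state.

```python
from typing import Mapping, Optional, Tuple


def retrieve_answer(recognized_text: str,
                    answers: Mapping[str, str]) -> Optional[str]:
    """Look up an answer for the recognized text; None means 'meaningless'."""
    key = " ".join(recognized_text.lower().split())  # normalize case and spaces
    return answers.get(key)


def respond(recognized_text: str,
            answers: Mapping[str, str]) -> Tuple[str, str]:
    """Return an (action, payload) pair for the robot to carry out."""
    answer = retrieve_answer(recognized_text, answers)
    if answer is None:
        return ("ignore", "")      # meaningless result: no response instruction
    return ("speak", answer)       # meaningful result: hand to speech synthesis
```

In a real system the `"speak"` action would be passed to a speech synthesis engine, and a further action type could drive limb movements.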

Embodiment three

Embodiment Three of the present invention provides a voice interaction device; see Fig. 3. The voice interaction device is applied to a robot and comprises a pickup module 100, an image acquisition module 120 and an angle adjustment module 140. The voice interaction device provided by this embodiment can implement the voice interaction methods provided by Embodiments One and Two above.

The pickup module 100 is configured to judge whether the sound source angle of a received sound signal is within the preset sound source angle range of the robot.

The image acquisition module 120 is configured to acquire an image when the sound source angle of the received sound signal is within the preset angle range of the robot, identify the angles of one or more faces in the image, and select the face whose face angle is closest to the sound source angle as the speaker.

The angle adjustment module 140 is configured to adjust the angle of the robot so that the center of the speaker's face falls at the center position in front of the robot, so as to respond to the sound signal.

With the voice interaction device provided by this embodiment of the present invention, the sound source angle and the face angles are located, the speaker is identified in a multi-person scene by how closely the two coincide, the front of the robot is automatically and accurately turned toward the speaker, and voice interaction is then carried out. This makes the robot more intelligent and human-like during voice interaction with the speaker. In addition, because the front of the robot is turned accurately toward the speaker, the robot's pickup direction coincides with the sound source direction, so the pickup angle is optimal, which also greatly benefits accurate acquisition of the sound signal.

Embodiment four

Embodiment Four of the present invention provides a voice interaction device; see Fig. 4. The voice interaction device is applied to a robot and comprises a pickup module 200, an image acquisition module 220, an angle adjustment module 240 and an interaction response module 260. The voice interaction device provided by this embodiment can implement the voice interaction methods provided by Embodiments One and Two above.

The pickup module 200 is configured to judge whether the sound source angle of a received sound signal is within the preset sound source angle range of the robot. The sound source angle is the angle between the sound source direction and the front center direction of the robot. The preset sound source angle range of the robot is within plus or minus 60 degrees of the direction directly in front of the robot.

The pickup module 200 further comprises: an energy detection submodule 202, configured to detect the energy of the received sound signal;

a sound source localization submodule 204, configured to locate the sound source angle of the sound signal when the energy of the sound signal reaches the threshold of the robot; and

a speech recognition submodule 206, configured to perform speech recognition on the sound signal.

The image acquisition module 220 is configured to acquire an image when the sound source angle of the received sound signal is within the preset angle range of the robot, identify the angles of one or more faces in the image, and select the face whose face angle is closest to the sound source angle as the speaker. The face angle is the angle between the face and the shooting direction of the image. Normally, the shooting angle range of the robot's camera is larger than the robot's preset pickup angle range, so that the sound-emitting object can be captured.

The angle adjustment module 240 is configured to adjust the angle of the robot so that the center of the speaker's face falls at the center position in front of the robot. In addition, when the sound source angle of the received sound signal is not within the preset angle range of the robot, it adjusts the angle of the robot so that the angle at which the sound signal is received falls within the preset angle range of the robot.

The interaction response module 260 is configured to respond to the sound signal. Specifically, it performs natural language understanding on the speech recognition result of the speech recognition submodule 206, retrieves a corresponding answer, and delivers the answer to the speaker by speech synthesis or limb movement.

As a further improvement of this embodiment, the device can also include a timing module (not shown), configured to trigger, at regular intervals, the image acquisition module 220 to reacquire a face image of the speaker and identify the speaker's face angle, and to trigger the angle adjustment module 240 to adjust the angle of the robot so that the center of the speaker's face falls at the center position in front of the robot.

In the voice interaction device provided by this embodiment of the present invention, the pickup module 200 can be an annular microphone array mounted on the robot's head. Sound is picked up through the annular microphone array, for example a six-microphone scheme. The array sits at some position on the head, such as the top of the head, and is mounted horizontally; the main microphone is located at the front center of the head, and the designated beam enhancement direction of the microphone array is the 0° direction. The image acquisition module 220 can be a camera mounted on top of the robot's head, at the front center point of the head, with a field of view in the range of 100° to 120°. It can acquire pictures and recognize the faces in the image, and a software application calculates the angle of each face relative to the shooting direction of the image (that is, the front center point of the head). The angle adjustment module 240 can be implemented by a rotatable chassis mounted at the bottom of the robot.

Embodiment five

Embodiment Five of the present invention provides a robot. The robot includes a voice interaction device arranged in the robot; the voice interaction device is the voice interaction device described in Embodiment Three or Embodiment Four above. The robot provided by this embodiment can make the robot's voice interaction function more intelligent and human-like.

In the above embodiments provided by the present invention, it should be understood that the disclosed device and method can be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple modules or components can be combined or integrated into another system, or some features can be omitted or not executed.

The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they can be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

The above are only embodiments of the present invention and do not limit the patent scope of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions recorded in the foregoing specific embodiments, or replace some of their technical features with equivalents. Any equivalent structure made using the contents of the specification and drawings of the present invention, whether used directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present invention.

Claims

1. A voice interaction method, applied to a robot, characterized by comprising:

receiving a voice signal;

detecting the energy of the voice signal;

when the energy of the voice signal reaches a threshold of the robot, locating the sound source angle of the voice signal;

when the sound source angle of the received voice signal is within a predetermined angle range of the robot, acquiring an image and identifying the angles of one or more faces in the image;

selecting the face whose face angle is closest to the sound source angle as the speaker;

adjusting the angle of the robot so that the center of the speaker's face falls at the center position in front of the robot, in order to respond to the voice signal;

determining whether a preset time interval of the robot has been exceeded; and

when the preset time interval of the robot has been exceeded, acquiring a facial image of the speaker, identifying the speaker's face angle, and adjusting the angle of the robot so that the center of the speaker's face falls at the center position in front of the robot.
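The energy gating, speaker selection, and centering steps recited in this claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that all angles are in degrees relative to the robot's front center direction, that root-mean-square amplitude serves as the energy measure, and that the predetermined angle range is symmetric about the front. The names `ENERGY_THRESHOLD`, `PRESET_HALF_RANGE`, `signal_energy`, `select_speaker`, and `rotation_to_center` are hypothetical, as are the numeric values.

```python
import math
from typing import Optional, Sequence

ENERGY_THRESHOLD = 0.1    # hypothetical RMS threshold of the robot
PRESET_HALF_RANGE = 30.0  # hypothetical: predetermined range is +/-30 degrees


def signal_energy(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of one audio frame (an assumed energy measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_speaker(face_angles: Sequence[float], source_angle: float) -> Optional[int]:
    """Index of the face whose angle is closest to the sound source angle."""
    if not face_angles:
        return None
    return min(range(len(face_angles)),
               key=lambda i: abs(face_angles[i] - source_angle))


def rotation_to_center(samples: Sequence[float],
                       source_angle: float,
                       face_angles: Sequence[float]) -> Optional[float]:
    """Rotation in degrees that centers the chosen speaker, or None if no turn applies."""
    if signal_energy(samples) < ENERGY_THRESHOLD:
        return None  # energy has not reached the threshold
    if abs(source_angle) > PRESET_HALF_RANGE:
        return None  # source is outside the predetermined range
    idx = select_speaker(face_angles, source_angle)
    if idx is None:
        return None  # no face was detected in the image
    # Turning by the speaker's face angle brings the face to the front center.
    return face_angles[idx]
```

For example, with faces detected at -20, 5, and 25 degrees and a sound source located at 8 degrees, the face at 5 degrees is chosen and the sketch returns a 5-degree turn.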

2. The method according to claim 1, characterized in that the method further comprises:

when the sound source angle of the received voice signal is not within the predetermined angle range of the robot, adjusting the angle of the robot so that the angle at which the voice signal is received is within the predetermined angle range of the robot.
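One possible reading of this coarse adjustment is sketched below in Python. The claim only requires that the sound source end up inside the predetermined range; turning to face the source directly is one policy among several and is an assumption here, as are the function names `normalize_angle` and `coarse_turn` and the 30-degree half range.

```python
def normalize_angle(angle: float) -> float:
    """Map an angle in degrees to the interval [-180, 180)."""
    return ((angle + 180.0) % 360.0) - 180.0


def coarse_turn(source_angle: float, half_range: float = 30.0) -> float:
    """Rotation in degrees applied before the face detection step.

    Returns 0.0 when the source is already inside the predetermined range;
    otherwise returns the turn that points the robot straight at the source
    (an assumed policy, chosen for simplicity).
    """
    angle = normalize_angle(source_angle)
    if abs(angle) <= half_range:
        return 0.0
    return angle
```

After this turn the sound source lies at roughly 0 degrees, so the image-based selection of the speaker can proceed as in claim 1.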

3. The method according to any one of claims 1 to 2, characterized in that responding to the voice signal comprises:

performing speech recognition on the voice signal;

performing natural language understanding according to the speech recognition result, and retrieving a corresponding answer; and

responding to the speaker with the answer by means of speech synthesis or body movement.

4. The method according to any one of claims 1 to 2, characterized in that the sound source angle is the angle between the sound source direction and the front center direction of the robot.

5. The method according to any one of claims 1 to 2, characterized in that the face angle is the angle between the face and the shooting direction of the image.
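The face angle defined above could be estimated from the horizontal pixel position of the face center. The Python sketch below is illustrative only. It assumes a pinhole camera model, a shooting direction aligned with the robot's front center direction, and a known horizontal field of view. The function name `face_angle_deg` and its parameters are hypothetical and do not appear in the description.

```python
import math


def face_angle_deg(face_center_x: float,
                   image_width: float,
                   horizontal_fov_deg: float) -> float:
    """Angle in degrees between a face and the shooting direction.

    Positive values mean the face is to the right of the image center.
    Assumes a pinhole camera whose optical axis passes through the image center.
    """
    half_width = image_width / 2.0
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.degrees(math.atan((face_center_x - half_width) / focal_px))
```

Under these assumptions a face at the image center has an angle of 0 degrees, and a face at the right edge of an image with a 60-degree field of view has an angle of about 30 degrees, which can be compared directly with a sound source angle measured from the same front center direction.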

6. A voice interaction device, applied to a robot, characterized by comprising:

a pickup module, configured to determine whether the sound source angle of a received voice signal is within a preset sound source angle range of the robot; the pickup module further comprising:

an energy detection submodule, configured to detect the energy of the received voice signal;

a sound source localization submodule, configured to locate the sound source angle of the voice signal when the energy of the voice signal reaches a threshold of the robot; and

a speech recognition submodule, configured to perform speech recognition on the voice signal;

an image acquisition module, configured to, when the sound source angle of the received voice signal is within the preset range of the robot, acquire an image, identify the angles of one or more faces in the image, and select the face whose face angle is closest to the sound source angle as the speaker; and

an angle adjustment module, configured to adjust the angle of the robot so that the center of the speaker's face falls at the center position in front of the robot, in order to respond to the voice signal.

7. The device according to claim 6, characterized by further comprising: an interaction response module, configured to perform natural language understanding according to the speech recognition result, retrieve a corresponding answer, and respond to the speaker with the answer by means of speech synthesis or body movement.

8. The device according to claim 6, characterized in that the angle adjustment module is further configured to, when the sound source angle of the received voice signal is not within the predetermined angle range of the robot, adjust the angle of the robot so that the angle at which the voice signal is received is within the predetermined angle range of the robot.

9. The device according to claim 6, characterized in that the device further comprises: a timing module, configured to, at intervals of a period of time, trigger the image acquisition module to reacquire a facial image of the speaker and identify the speaker's face angle, and trigger the angle adjustment module to adjust the angle of the robot so that the center of the speaker's face falls at the center position in front of the robot.
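The periodic re-centering performed by the timing module can be sketched in Python as follows. This is a hedged illustration rather than the claimed structure. The class `RecenterTimer`, the function `maybe_recenter`, and the callbacks `detect_face_angle` and `rotate` are hypothetical stand-ins for the timing, image acquisition, and angle adjustment modules.

```python
import time
from typing import Callable, Optional


class RecenterTimer:
    """Reports when a preset time interval has been exceeded."""

    def __init__(self, interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s
        self.clock = clock
        self.last = clock()

    def due(self) -> bool:
        """True once per elapsed interval; restarts the interval when it fires."""
        now = self.clock()
        if now - self.last > self.interval_s:
            self.last = now
            return True
        return False


def maybe_recenter(timer: RecenterTimer,
                   detect_face_angle: Callable[[], Optional[float]],
                   rotate: Callable[[float], None]) -> bool:
    """Re-detect the speaker's face angle and turn toward it when the timer is due."""
    if not timer.due():
        return False
    angle = detect_face_angle()  # stand-in for the image acquisition module
    if angle is not None:
        rotate(angle)            # stand-in for the angle adjustment module
    return True
```

Injecting the clock as a parameter is a design choice made only so that the sketch can be exercised without waiting in real time. In a control loop, `maybe_recenter` would be called on every iteration.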

10. The device according to any one of claims 6 to 9, characterized in that the sound source angle is the angle between the sound source direction and the front center direction of the robot.

11. The device according to any one of claims 6 to 9, characterized in that the face angle is the angle between the face and the shooting direction of the image.

12. A robot, characterized by comprising:

a voice interaction device arranged in the robot;

wherein the voice interaction device is the voice interaction device according to any one of claims 6 to 11.