CN110751946A - Robot and voice recognition device and method thereof - Google Patents

Publication number
CN110751946A
Authority
CN
China
Prior art keywords
signal
voice signal
voice
sound source
source direction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911061814.5A
Other languages
Chinese (zh)
Inventor
蒲东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cloudminds Robotics Co Ltd
Original Assignee
As Science And Technology Chengdu Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by As Science And Technology Chengdu Co Ltd
Priority to CN201911061814.5A
Publication of CN110751946A

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/20Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00Manipulators not otherwise provided for
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00Controls for manipulators
    • B25J13/003Controls for manipulators by means of an audio-responsive input
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L2021/02082Noise filtering the noise being echo, reverberation of the speech
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)

Abstract

The disclosure relates to a robot and a voice recognition device and method thereof, belongs to the field of voice processing, and enables accurate voice recognition in a variety of scenes. A voice recognition apparatus applied to a robot comprises: a distributed microphone array comprising a first microphone array located on the front face of the robot and a second microphone array located on the back face of the robot, for acquiring a first speech signal and a second speech signal, respectively; and a voice processor for fusing the first voice signal and the second voice signal to perform voice recognition.

Description

Robot and voice recognition device and method thereof
Technical Field
The present disclosure relates to the field of speech processing, and in particular, to a robot and a speech recognition apparatus and method thereof.
Background
Currently, a single microphone or a linear microphone array is usually installed on the robot body to perform voice recognition. However, when the robot is deployed in an environment such as an exhibition or a business hall, voice recognition accuracy is poor because the environment is noisy and the robot is constantly moving.
Disclosure of Invention
An object of the present disclosure is to provide a robot, and a voice recognition apparatus and method thereof, capable of accurately performing voice recognition in various scenes.
According to a first embodiment of the present disclosure, there is provided a voice recognition apparatus applied to a robot, including: a distributed microphone array comprising a first microphone array located on a front face of the robot and a second microphone array located on a back face of the robot, for acquiring a first speech signal and a second speech signal, respectively; and a voice processor for fusing the first voice signal and the second voice signal to perform voice recognition.
Optionally, the first microphone array and the second microphone array are each one of: linear microphone arrays, annular microphone arrays, and spherical microphone arrays.
Optionally, the first microphone array is located on a chest of the robot and the second microphone array is located on a back of the robot.
Optionally, the speech processor comprises: a sound source direction determining unit for determining a first sound source direction based on the first voice signal and a second sound source direction based on the second voice signal; a beamforming unit configured to perform beamforming on the first voice signal for which the first sound source direction has been determined, and on the second voice signal for which the second sound source direction has been determined; a signal-to-noise ratio calculation unit for calculating the signal-to-noise ratio of the beamformed first voice signal and of the beamformed second voice signal, respectively; a noise reduction processing unit for using the voice signal with the better signal-to-noise ratio as a noise reference signal and performing noise reduction processing on the voice signal with the poorer signal-to-noise ratio by using the noise reference signal; and a voice recognition unit for performing voice recognition based on the voice signal after the noise reduction processing.
Optionally, the beamforming unit is configured to: calculating a first spatial delay of the first voice signal by using a first area array corresponding to the first microphone array, and calculating a second spatial delay of the second voice signal by using a second area array corresponding to the second microphone array; and calculating the weight of the direction vector of the first voice signal according to the first spatial delay and updating the corresponding blocking matrix, and calculating the weight of the direction vector of the second voice signal according to the second spatial delay and updating the corresponding blocking matrix.
Optionally, the speech processor further includes a final sound source direction determining unit configured to determine the sound source direction of the speech signal with the better signal-to-noise ratio as the final sound source direction.
Optionally, the speech processor further comprises an echo cancellation unit for selecting the microphone array that is farther from the loudspeaker to perform echo cancellation before beamforming is performed.
According to a second embodiment of the present disclosure, there is provided a robot including the voice recognition apparatus according to the first embodiment of the present disclosure.
According to a third embodiment of the present disclosure, there is provided a voice recognition method applied to a robot, including: acquiring a first voice signal by a first microphone array located on a front side of the robot and acquiring a second voice signal by a second microphone array located on a back side of the robot; and fusing the first voice signal and the second voice signal for voice recognition.
Optionally, fusing the first voice signal and the second voice signal for voice recognition includes: determining a first sound source direction based on the first voice signal, and determining a second sound source direction based on the second voice signal; performing beamforming on the first voice signal for which the first sound source direction has been determined, and on the second voice signal for which the second sound source direction has been determined; calculating the signal-to-noise ratio of the beamformed first voice signal and of the beamformed second voice signal, respectively; using the voice signal with the better signal-to-noise ratio as a noise reference signal, and performing noise reduction processing on the voice signal with the poorer signal-to-noise ratio by using the noise reference signal; and performing voice recognition based on the voice signal after the noise reduction processing.
Optionally, the performing beamforming on the first voice signal with the first sound source direction determined and performing beamforming on the second voice signal with the second sound source direction determined includes: calculating a first spatial delay of the first voice signal by using a first annular area array corresponding to the first microphone array, and calculating a second spatial delay of the second voice signal by using a second annular area array corresponding to the second microphone array; and calculating the weight of the direction vector of the first voice signal according to the first spatial delay and updating the corresponding blocking matrix, and calculating the weight of the direction vector of the second voice signal according to the second spatial delay and updating the corresponding blocking matrix.
Optionally, the method further comprises: determining the sound source direction of the voice signal with the better signal-to-noise ratio as the final sound source direction.
Optionally, the method further comprises: selecting the microphone array that is farther from the loudspeaker to perform echo cancellation before beamforming is performed.
By adopting the technical scheme, the voice recognition device and the voice recognition method according to the embodiment of the disclosure utilize the distributed microphone arrays on the front and the back of the robot to pick up the voice and fuse the first voice signal and the second voice signal to perform voice recognition, so that 360-degree positioning and pickup can be performed in strong noise (such as in the environments of exhibition, business hall and the like) and the scenes of robot motion, voice recognition can be accurately performed, and the robustness of voice interaction is enhanced.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
Fig. 1 shows a schematic block diagram of a speech recognition apparatus applied to a robot according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of a first microphone array and a second microphone array that are located on the chest and back of a robot, respectively, and are each an 8-microphone annular microphone array.
Fig. 3a and 3b show schematic diagrams of an annular microphone array placed flat and upright, respectively.
Fig. 4 shows a flowchart of a voice recognition method applied to a robot according to an embodiment of the present disclosure.
Fig. 5 shows a flow chart of how a first speech signal and a second speech signal are fused for speech recognition.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
Fig. 1 shows a schematic block diagram of a speech recognition apparatus applied to a robot according to an embodiment of the present disclosure. As shown in Fig. 1, the speech recognition apparatus includes: a distributed microphone array 1 comprising a first microphone array 11 located on the front face of the robot and a second microphone array 12 located on the back face of the robot, for acquiring a first voice signal and a second voice signal, respectively; and a voice processor 2 for fusing the first voice signal and the second voice signal to perform voice recognition.
The first microphone array 11 may be arranged at one or more locations such as the chest or the front of the legs of the robot, preferably on the chest. The second microphone array 12 may be arranged at one or more locations such as the back, the back of the head, or the back of the legs of the robot, preferably on the back.
The first microphone array 11 and the second microphone array 12 may each be one of: a linear microphone array, an annular microphone array, and a spherical microphone array. For example, the first microphone array 11 and the second microphone array 12 may both be implemented by an annular microphone array, or the first microphone array 11 may be implemented by a linear microphone array and the second microphone array 12 by an annular microphone array, and so on. In addition, in order to achieve 360-degree speech recognition, the linear microphone array may be an array of n rows and m columns, where n and m are both positive integers greater than 2; the annular microphone array may be a j-microphone array, where j is a positive integer not less than 4, such as a 4-microphone annular microphone array, a 5-microphone annular microphone array, an 8-microphone annular microphone array, and so forth.
The first microphone array 11 provides three-dimensional localization and sound pickup in the space in front of the robot, and the second microphone array 12 provides the same in the space behind it. Combining the first microphone array 11 and the second microphone array 12 therefore gives localization and pickup all around the robot with no dead angle, allows more tightly focused beamforming, and improves the noise reduction effect. Moreover, the distributed arrangement addresses two problems: inconsistent microphone aperture depth caused by the uneven, undulating surface of the robot body, and the inability to place microphones so that they effectively receive voice from every direction because of the posture of the robot product.
Fig. 2 shows a schematic diagram of a first microphone array 11 and a second microphone array 12, which are located on the chest and back of a robot, respectively, and are each an 8-microphone annular microphone array. The double arrow in Fig. 2 indicates that the 8-microphone annular microphone array denoted by reference numeral 12 is located on the back of the robot. Accordingly, the first voice signal acquired by the first microphone array 11 is an 8-channel voice signal, and the second voice signal acquired by the second microphone array 12 is also an 8-channel voice signal.
By adopting the technical scheme, because the voice recognition device according to the embodiment of the disclosure comprises the distributed microphone arrays on the front and back of the robot, and the voice processor 2 performs voice recognition by fusing the first voice signal and the second voice signal, 360-degree positioning and sound pickup can be performed in scenes of strong noise (such as in environments of exhibition, business hall and the like) and robot motion, voice recognition can be accurately performed, and robustness of voice interaction is enhanced.
In one embodiment, the speech processor 2 may include a sound source direction determining unit, a beam forming unit, a signal-to-noise ratio calculating unit, a noise reduction processing unit, and a speech recognition unit.
The sound source direction determining unit is configured to determine a first sound source direction based on the first speech signal and a second sound source direction based on the second speech signal. The sound source direction may be determined using, for example, a Direction of Arrival (DOA) estimation algorithm.
The beamforming unit is configured to perform beamforming on the first voice signal for which the first sound source direction has been determined, and on the second voice signal for which the second sound source direction has been determined.
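The disclosure does not say which DOA algorithm is used. As one possible illustration only, the sketch below estimates the arrival angle from the time difference between two microphones of the same array by brute-force cross-correlation. The function names, the 0.08 m spacing, the 16 kHz sample rate, and the far-field assumption are all hypothetical choices, not details taken from the disclosure.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_delay_samples(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive lag means `other` received the sound later than `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(ref)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += ref[i] * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def doa_from_delay(lag_samples, sample_rate, mic_spacing_m):
    """Convert a two-microphone delay into a far-field arrival angle in degrees."""
    sin_theta = SPEED_OF_SOUND * lag_samples / (sample_rate * mic_spacing_m)
    sin_theta = max(-1.0, min(1.0, sin_theta))  # guard against overshoot
    return math.degrees(math.asin(sin_theta))


# Synthetic check: a pulse reaches the second microphone 3 samples later.
pulse = [0.0] * 64
pulse[20] = 1.0
delayed = [0.0] * 64
delayed[23] = 1.0
lag = estimate_delay_samples(pulse, delayed, max_lag=8)
angle = doa_from_delay(lag, sample_rate=16000, mic_spacing_m=0.08)
```

In the arrangement described here, such an estimate would be computed once for the first microphone array and once for the second, giving the first and second sound source directions. A practical system would more likely use all channels of each array rather than a single pair.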
The signal-to-noise ratio calculating unit is used for respectively calculating the signal-to-noise ratio of the first voice signal and the signal-to-noise ratio of the second voice signal after beam forming.
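The disclosure does not specify how the signal-to-noise ratio is computed. A minimal sketch, assuming that a voice activity detector (not shown, and not described in the disclosure) has already separated speech-active samples from noise-only samples of each beamformed output, might look as follows. The function names and sample values are illustrative only.

```python
import math


def power(samples):
    """Mean squared amplitude of a block of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech_samples, noise_samples):
    """Estimate the SNR in dB from speech-active and noise-only samples.

    Assumption: the speech-active samples contain speech plus noise, so the
    noise power is subtracted to approximate the speech power.
    """
    noisy_power = power(speech_samples)
    noise_power = max(power(noise_samples), 1e-12)
    signal_power = max(noisy_power - noise_power, 1e-12)
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical beamformed outputs of the front and back arrays.
front_snr = snr_db([0.5, -0.5, 0.5, -0.5], [0.05, -0.05, 0.05, -0.05])
back_snr = snr_db([0.2, -0.2, 0.2, -0.2], [0.1, -0.1, 0.1, -0.1])
better = "front" if front_snr >= back_snr else "back"
```

The comparison on the last line is the decision the later units depend on: it determines which beamformed signal serves as the noise reference and which sound source direction is kept.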
The noise reduction processing unit is configured to use the speech signal with the better signal-to-noise ratio as a noise reference signal and to perform noise reduction processing on the speech signal with the poorer signal-to-noise ratio by using that noise reference signal. For example, if the signal-to-noise ratio of the first speech signal is better than that of the second speech signal, the noise reduction processing unit may use the first speech signal as the noise reference signal; that is, in the post-filtering stage that follows beamforming, the first speech signal serves as the noise spectrum input of the post-filter, and the stationary noise in the second speech signal is then removed by, for example, Wiener filtering, a statistical model, or another method. In an actual application scene, because of the posture of the robot, one of the arrays necessarily faces the actual sound source during interaction, so the microphone array facing the actual sound source can be used for sound pickup and noise reduction, and the microphone array facing away from the actual sound source can be used to provide the reference signal. The noise reduction processing unit may be implemented using various suitable filters.
The voice recognition unit is configured to perform voice recognition based on the voice signal after the noise reduction processing. Continuing the example above, when the first speech signal is used as the noise spectrum input to eliminate stationary noise in the second speech signal, the speech recognition unit performs speech recognition based on the noise-reduced second speech signal.
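The post-filtering idea described above can be illustrated with a per-bin spectral gain. This is only a sketch: the disclosure names Wiener filtering as one option but gives no formula, so the gain rule, the `gain_floor` parameter, and the assumption that a noise power spectrum has already been derived from the reference signal are all hypothetical.

```python
def wiener_gains(target_power, reference_noise_power, gain_floor=0.1):
    """Per-frequency-bin Wiener-style gains.

    target_power: power spectrum of the signal to be cleaned.
    reference_noise_power: noise power spectrum taken from the reference signal.
    gain_floor: hypothetical lower bound that limits over-suppression.
    """
    gains = []
    for p, n in zip(target_power, reference_noise_power):
        if p <= 0.0:
            gains.append(gain_floor)
            continue
        gain = (p - n) / p  # estimated speech power over total power
        gains.append(max(gain_floor, min(1.0, gain)))
    return gains


def apply_gains(spectrum, gains):
    """Scale each complex spectral bin by its real-valued gain."""
    return [g * x for g, x in zip(gains, spectrum)]


# Four illustrative frequency bins.
target_power = [4.0, 1.0, 0.5, 2.0]
noise_power = [1.0, 1.0, 1.0, 0.0]
gains = wiener_gains(target_power, noise_power)
cleaned = apply_gains([2 + 0j, 1 + 0j, 0.5 + 0.5j, 1 - 1j], gains)
```

Bins where the reference noise dominates are attenuated down to the floor, while bins with little reference noise pass through almost unchanged.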
In the prior art, only a single microphone array is used for sound pickup, so only one sound source direction needs to be located, and a noise spectrum obtained from a statistical model has to be used as the noise reference signal during noise reduction processing. In the present disclosure, the distributed microphone arrays pick up voice signals from the front and the back of the robot, respectively, so the sound source direction has to be located separately for the voice signal picked up by each microphone array; during noise reduction, the voice signal with the better signal-to-noise ratio is used as the noise reference signal, and that noise reference signal is used to perform noise reduction processing on the voice signal with the poorer signal-to-noise ratio.
In one embodiment, the speech processor 2 further includes a final sound source direction determining unit configured to determine the sound source direction of the speech signal with the better signal-to-noise ratio as the final sound source direction. This improves the accuracy of target tracking while the robot is moving.
In the prior art, planar microphone arrays, annular microphone arrays and the like are all placed flat, so a linear-array or ring-array model is used for calculation during beamforming. In the present disclosure, the microphone arrays are mounted upright on the robot body. Fig. 3a and 3b show schematic diagrams of an annular microphone array placed flat and upright, respectively. The inventor found that the conventional linear-array and ring-array calculations are no longer suitable for an upright array and, if used, give inaccurate beamforming results. The existing beamforming therefore has to be adapted to process the voice signals picked up by the upright microphone arrays. That is, the beamforming unit is configured to: calculate a first spatial delay of the first voice signal by using a first area array corresponding to the first microphone array 11, and calculate a second spatial delay of the second voice signal by using a second area array corresponding to the second microphone array 12 (for example, when the first microphone array 11 and the second microphone array 12 are both annular microphone arrays, the first area array and the second area array are both annular area arrays); and calculate the weight of the direction vector of the first voice signal according to the first spatial delay and update the corresponding blocking matrix, and calculate the weight of the direction vector of the second voice signal according to the second spatial delay and update the corresponding blocking matrix. With this technical solution, the beamforming result is more accurate, and so is the voice recognition.
In one embodiment, the speech processor 2 further comprises an echo cancellation unit for selecting the microphone array that is farther from the loudspeaker to perform echo cancellation before beamforming is performed. In places such as exhibitions and business halls, the sound played by the loudspeaker fills the whole venue, so the echo cancellation effect is basically the same whichever microphone array is selected. In principle, however, the microphone array farther from the loudspeaker is selected for echo cancellation, because it is least affected by the vibration and nonlinear behaviour of the loudspeaker cavity, and beamforming can also be used to better advantage.
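To make the upright-array calculation concrete, the sketch below computes spatial delays, direction-vector (steering) weights, and a blocking matrix for an 8-microphone ring standing in a vertical plane. The disclosure gives no formulas, so everything here is an assumption: the coordinate convention (x forward, y left, z up), the 0.04 m radius, the far-field model, the delay-and-sum weights, and the adjacent-difference blocking matrix in the style of a generalized sidelobe canceller.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def upright_ring(num_mics=8, radius_m=0.04):
    """Microphone coordinates for a ring standing in the y-z plane."""
    return [(0.0,
             radius_m * math.cos(2 * math.pi * k / num_mics),
             radius_m * math.sin(2 * math.pi * k / num_mics))
            for k in range(num_mics)]


def unit_direction(azimuth_deg, elevation_deg):
    """Unit vector from the array centre towards a far-field source."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def spatial_delays(positions, direction):
    """Arrival delay of each microphone relative to the centre, in seconds."""
    return [-(px * direction[0] + py * direction[1] + pz * direction[2])
            / SPEED_OF_SOUND for px, py, pz in positions]


def steering_vector(delays, freq_hz):
    """Per-microphone phase factors at one frequency."""
    return [cmath.exp(-2j * math.pi * freq_hz * tau) for tau in delays]


def delay_and_sum_weights(steering):
    """Weights w such that w^H a = 1 for the look direction."""
    return [a / len(steering) for a in steering]


def blocking_matrix(steering):
    """Rows that cancel the look direction (phase-aligned adjacent differences)."""
    m = len(steering)
    rows = []
    for i in range(m - 1):
        row = [0j] * m
        row[i] = steering[i].conjugate()
        row[i + 1] = -steering[i + 1].conjugate()
        rows.append(row)
    return rows


mics = upright_ring()
delays = spatial_delays(mics, unit_direction(azimuth_deg=30.0, elevation_deg=10.0))
a = steering_vector(delays, freq_hz=1000.0)
w = delay_and_sum_weights(a)
B = blocking_matrix(a)
look_response = sum(wk.conjugate() * ak for wk, ak in zip(w, a))
leak = max(abs(sum(bk * ak for bk, ak in zip(row, a))) for row in B)
frontal_delays = spatial_delays(mics, unit_direction(0.0, 0.0))
```

With this geometry a source straight ahead reaches every microphone of the upright ring at the same time (`frontal_delays` is all zeros), whereas a flat ring would see different delays for the same source. This is one way to see why a flat-array delay model gives wrong results for an upright array.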
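The selection rule above, followed by echo cancellation, can be sketched as follows. The disclosure does not name an echo cancellation algorithm; the normalized least-mean-squares (NLMS) filter used here is a common choice substituted for illustration, and the array and loudspeaker coordinates, tap count, and step size are hypothetical.

```python
import math
import random


def farther_array(array_positions, loudspeaker_position):
    """Return the name of the array whose centre is farthest from the loudspeaker."""
    return max(array_positions,
               key=lambda name: math.dist(array_positions[name],
                                          loudspeaker_position))


def nlms_echo_cancel(mic, playback, taps=4, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `playback` from `mic`."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent playback samples, newest first
    out = []
    for m, p in zip(mic, playback):
        history = [p] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = m - echo_estimate
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm
                   for w, h in zip(weights, history)]
        out.append(error)
    return out


# Hypothetical layout in metres: loudspeaker near the chest.
arrays = {"front": (0.10, 0.0, 1.0), "back": (-0.10, 0.0, 1.0)}
speaker = (0.12, 0.0, 0.9)
chosen = farther_array(arrays, speaker)

# Synthetic echo: the playback signal delayed by one sample, gain 0.6.
rng = random.Random(0)
playback = [rng.uniform(-1.0, 1.0) for _ in range(500)]
mic = [0.0] + [0.6 * p for p in playback[:-1]]
residual = nlms_echo_cancel(mic, playback)
tail_level = sum(abs(e) for e in residual[-100:]) / 100
```

In this toy setup the back array is chosen, and the residual echo decays towards zero once the filter has adapted. A real loudspeaker path is longer and partly nonlinear, which is the reason the text gives for preferring the array farther from the loudspeaker cavity.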
According to still another embodiment of the present disclosure, there is provided a robot including a voice recognition apparatus according to an embodiment of the present disclosure.
Fig. 4 shows a flowchart of a voice recognition method applied to a robot according to an embodiment of the present disclosure. As shown in fig. 4, the method includes:
in step S41, acquiring a first voice signal by a first microphone array located on the front face of the robot, and acquiring a second voice signal by a second microphone array located on the back face of the robot;
in step S42, the first speech signal and the second speech signal are fused for speech recognition.
By adopting the technical scheme, the voice recognition method according to the embodiment of the disclosure utilizes the distributed microphone arrays on the front and back of the robot to pick up voice and fuses the first voice signal and the second voice signal to perform voice recognition, so that 360-degree positioning and pickup can be performed in strong noise (such as in the environments of exhibitions, business halls and the like) and the scenes of robot motion, voice recognition can be accurately performed, and the robustness of voice interaction is enhanced.
Fig. 5 shows a flow chart of how a first speech signal and a second speech signal are fused for speech recognition.
As shown in Fig. 5, the fusion includes:
in step S42a, determining a first sound source direction based on the first voice signal and a second sound source direction based on the second voice signal;
in step S42b, beamforming is performed on the first voice signal in which the first sound source direction is determined, and beamforming is performed on the second voice signal in which the second sound source direction is determined;
in step S42c, calculating the signal-to-noise ratio of the beamformed first voice signal and the signal-to-noise ratio of the second voice signal respectively;
in step S42d, the speech signal with the better signal-to-noise ratio is used as a noise reference signal, and the noise reference signal is used to perform noise reduction processing on the speech signal with the poorer signal-to-noise ratio; and
in step S42e, speech recognition is performed based on the speech signal after the noise reduction processing.
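The flow of steps S42a to S42e can be sketched as a small orchestration function. The individual stages are passed in as callables because the disclosure does not fix their implementations; the names `fuse_and_recognize` and `ArrayResult`, and the toy stand-ins in the usage lines, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ArrayResult:
    name: str
    direction_deg: float
    beamformed: Sequence[float]
    snr_db: float


def fuse_and_recognize(first, second, estimate_doa, beamform,
                       estimate_snr, denoise, recognize):
    """Run steps S42a to S42e on the two multi-channel signals."""
    results = []
    for name, channels in (("first", first), ("second", second)):
        direction = estimate_doa(channels)       # S42a: sound source direction
        beam = beamform(channels, direction)     # S42b: beamforming
        snr = estimate_snr(beam)                 # S42c: SNR after beamforming
        results.append(ArrayResult(name, direction, beam, snr))
    better, poorer = sorted(results, key=lambda r: r.snr_db, reverse=True)
    # S42d: the better-SNR signal is the noise reference for the poorer one.
    cleaned = denoise(poorer.beamformed, better.beamformed)
    # S42e: recognition on the noise-reduced signal; the direction of the
    # better-SNR signal is returned as the final sound source direction.
    return recognize(cleaned), better.direction_deg


# Toy stand-ins for each stage, for illustration only.
text, direction = fuse_and_recognize(
    first=[[1.0, 1.0], [1.0, 1.0]],
    second=[[0.2, 0.2], [0.2, 0.2]],
    estimate_doa=lambda ch: 0.0 if ch[0][0] > 0.5 else 180.0,
    beamform=lambda ch, d: [sum(col) / len(ch) for col in zip(*ch)],
    estimate_snr=lambda beam: sum(abs(x) for x in beam),
    denoise=lambda target, reference: list(target),
    recognize=lambda signal: f"{len(signal)} samples",
)
```

Keeping the stages as injected callables mirrors the unit structure of the apparatus (direction determining, beamforming, SNR calculation, noise reduction, recognition), so each unit can be replaced without touching the control flow.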
Optionally, performing beamforming on the first voice signal for which the first sound source direction has been determined and on the second voice signal for which the second sound source direction has been determined in step S42b includes: calculating a first spatial delay of the first voice signal by using a first annular area array corresponding to the first microphone array, and calculating a second spatial delay of the second voice signal by using a second annular area array corresponding to the second microphone array; and calculating the weight of the direction vector of the first voice signal according to the first spatial delay and updating the corresponding blocking matrix, and calculating the weight of the direction vector of the second voice signal according to the second spatial delay and updating the corresponding blocking matrix.
Optionally, the method according to the embodiment of the present disclosure further includes: determining the sound source direction of the voice signal with the better signal-to-noise ratio as the final sound source direction.
Optionally, the method according to the embodiment of the present disclosure further includes: selecting the microphone array that is farther from the loudspeaker to perform echo cancellation before beamforming is performed.
Specific implementation manners of the steps involved in the speech recognition method according to the embodiment of the present disclosure have been described in detail in the apparatus according to the embodiment of the present disclosure, and are not described herein again.
The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, various possible combinations will not be separately described in this disclosure.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (12)

1. A speech recognition apparatus applied to a robot, comprising:
a distributed microphone array comprising a first microphone array located on a front face of the robot and a second microphone array located on a back face of the robot for acquiring a first speech signal and a second speech signal, respectively;
and a voice processor for fusing the first voice signal and the second voice signal to perform voice recognition.
2. The apparatus of claim 1, wherein the first microphone array and the second microphone array are each one of: linear microphone arrays, annular microphone arrays, and spherical microphone arrays.
3. The apparatus of claim 1 or 2, wherein the speech processor comprises:
a sound source direction determining unit for determining a first sound source direction based on the first voice signal and a second sound source direction based on the second voice signal;
a beamforming unit configured to perform beamforming on the first voice signal for which the first sound source direction has been determined, and on the second voice signal for which the second sound source direction has been determined;
a signal-to-noise ratio calculation unit for calculating the signal-to-noise ratio of the beamformed first voice signal and of the beamformed second voice signal, respectively;
a noise reduction processing unit for using the voice signal with the better signal-to-noise ratio as a noise reference signal and performing noise reduction processing on the voice signal with the poorer signal-to-noise ratio by using the noise reference signal; and
a voice recognition unit for performing voice recognition based on the voice signal after the noise reduction processing.
4. The apparatus of claim 2, wherein the beamforming unit is configured to:
calculate a first spatial delay of the first voice signal by using a first area array corresponding to the first microphone array, and calculate a second spatial delay of the second voice signal by using a second area array corresponding to the second microphone array; and
calculate the weight of the direction vector of the first voice signal according to the first spatial delay and update the corresponding blocking matrix, and calculate the weight of the direction vector of the second voice signal according to the second spatial delay and update the corresponding blocking matrix.
5. The apparatus of claim 3, wherein the speech processor further comprises a final sound source direction determining unit configured to determine the sound source direction of the voice signal with the better signal-to-noise ratio as the final sound source direction.
6. The apparatus of claim 3, wherein the speech processor further comprises an echo cancellation unit configured to select the microphone array farther from a loudspeaker to perform echo cancellation before beamforming is performed.
7. A robot, comprising the speech recognition apparatus according to any one of claims 1 to 6.
8. A speech recognition method applied to a robot, comprising:
acquiring a first voice signal by a first microphone array located on a front side of the robot and acquiring a second voice signal by a second microphone array located on a back side of the robot;
and fusing the first voice signal and the second voice signal for voice recognition.
9. The method of claim 8, wherein said fusing the first voice signal and the second voice signal for voice recognition comprises:
determining a first sound source direction based on the first voice signal, and determining a second sound source direction based on the second voice signal;
performing beamforming on the first voice signal in which the first sound source direction is determined, and performing beamforming on the second voice signal in which the second sound source direction is determined;
respectively calculating the signal-to-noise ratio of the first voice signal and the signal-to-noise ratio of the second voice signal after beam forming;
using the voice signal with the better signal-to-noise ratio as a noise reference signal, and performing noise reduction processing on the voice signal with the poorer signal-to-noise ratio by using the noise reference signal; and
performing voice recognition based on the voice signal after the noise reduction processing.
10. The method of claim 9, wherein performing beamforming on the first voice signal for which the first sound source direction is determined and on the second voice signal for which the second sound source direction is determined comprises:
calculating a first spatial delay of the first voice signal by using a first annular area array corresponding to the first microphone array, and calculating a second spatial delay of the second voice signal by using a second annular area array corresponding to the second microphone array;
and calculating the weight of the direction vector of the first voice signal according to the first spatial delay and updating the corresponding blocking matrix, and calculating the weight of the direction vector of the second voice signal according to the second spatial delay and updating the corresponding blocking matrix.
11. The method of claim 9, further comprising: determining the sound source direction of the voice signal with the better signal-to-noise ratio as the final sound source direction.
12. The method of claim 9, further comprising: selecting the microphone array farther from the loudspeaker to perform echo cancellation before beamforming is performed.
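
As an illustration only, and not part of the claims, the spatial-delay step of claim 10 and the signal-to-noise ratio comparison of claims 9 and 11 might be sketched as follows. The function names, the far-field plane-wave model, the evenly spaced circular geometry, the speed of sound, and the example power and direction values are all assumptions introduced for this sketch; the disclosure does not specify them. The beamforming weights, blocking matrix update, and noise reduction filter are not shown.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def circular_array_delays(num_mics, radius_m, azimuth_rad):
    """Far-field delay (seconds) of each microphone relative to the array centre.

    Assumes microphones evenly spaced on a circle and a plane wave arriving
    from azimuth_rad in the plane of the array (hypothetical geometry).
    """
    delays = []
    for m in range(num_mics):
        mic_angle = 2.0 * math.pi * m / num_mics
        # Microphones on the source side receive the wavefront earlier,
        # hence the negative sign.
        delays.append(-radius_m * math.cos(azimuth_rad - mic_angle) / SPEED_OF_SOUND)
    return delays


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two power estimates."""
    return 10.0 * math.log10(signal_power / noise_power)


def select_by_snr(front, back):
    """Return (better, poorer) beamformed channel; ties go to the front array."""
    if front["snr_db"] >= back["snr_db"]:
        return front, back
    return back, front


# Hypothetical per-array results after direction finding and beamforming.
front = {"name": "front", "direction_deg": 30.0, "snr_db": snr_db(4.0, 0.1)}
back = {"name": "back", "direction_deg": 150.0, "snr_db": snr_db(1.0, 0.5)}

# The better-SNR channel serves as the reference signal for denoising the
# other channel, and its direction is taken as the final sound source direction.
reference, to_denoise = select_by_snr(front, back)
final_direction_deg = reference["direction_deg"]
```

In this sketch the per-microphone delays would feed whatever beamformer is used to steer each array, while `select_by_snr` only decides which array's output plays which role in the later noise reduction step.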
CN201911061814.5A 2019-11-01 2019-11-01 Robot and voice recognition device and method thereof Pending CN110751946A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911061814.5A CN110751946A (en) 2019-11-01 2019-11-01 Robot and voice recognition device and method thereof

Publications (1)

Publication Number Publication Date
CN110751946A true CN110751946A (en) 2020-02-04

Family

ID=69281888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911061814.5A Pending CN110751946A (en) 2019-11-01 2019-11-01 Robot and voice recognition device and method thereof

Country Status (1)

Country Link
CN (1) CN110751946A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022135130A1 (en) * 2020-12-24 2022-06-30 北京有竹居网络技术有限公司 Voice extraction method and apparatus, and electronic device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101911724A (en) * 2008-03-18 2010-12-08 高通股份有限公司 Speech enhancement using multiple microphones on multiple devices
US20130216064A1 (en) * 2010-10-29 2013-08-22 Mightyworks Co., Ltd. Multi-beam sound system
CN104936091A (en) * 2015-05-14 2015-09-23 科大讯飞股份有限公司 Intelligent interaction method and system based on circle microphone array
CN106653039A (en) * 2016-12-02 2017-05-10 上海木爷机器人技术有限公司 Audio signal processing system and audio signal processing method
CN107017003A (en) * 2017-06-02 2017-08-04 厦门大学 A kind of microphone array far field speech sound enhancement device
CN206489876U (en) * 2016-11-04 2017-09-12 北京声智科技有限公司 Self-alignment far field interactive voice equipment
CN211529608U (en) * 2019-11-01 2020-09-18 达闼科技成都有限公司 Robot and voice recognition device thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210301

Address after: 201111 2nd floor, building 2, no.1508, Kunyang Road, Minhang District, Shanghai

Applicant after: Dalu Robot Co.,Ltd.

Address before: 610094 West Section of Fucheng Avenue, Chengdu High-tech District, Sichuan Province

Applicant before: CLOUDMINDS (CHENGDU) TECHNOLOGIES Co.,Ltd.

CB02 Change of applicant information

Address after: 201111 Building 8, No. 207, Zhongqing Road, Minhang District, Shanghai

Applicant after: Dayu robot Co.,Ltd.

Address before: 201111 2nd floor, building 2, no.1508, Kunyang Road, Minhang District, Shanghai

Applicant before: Dalu Robot Co.,Ltd.