WO2019200722A1 - Sound source direction estimation method and apparatus - Google Patents

Sound source direction estimation method and apparatus

Info

Publication number
WO2019200722A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound source
angle
sound
function
microphones
Prior art date
Application number
PCT/CN2018/094132
Other languages
English (en)
French (fr)
Inventor
邹黄辉
Original Assignee
深圳市沃特沃德股份有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市沃特沃德股份有限公司
Publication of WO2019200722A1

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • G01S3/80Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
    • G01S3/802Systems for determining direction or deviation from predetermined direction

Definitions

  • the present invention relates to the field of electronic technology, and in particular to a sound source direction estimation method and apparatus.
  • at present, the most effective sound source direction estimation method is to use dual-microphone technology, that is, to collect sound signals with two microphones and estimate the sound source direction according to the phase difference between the two collected sound signals.
  • the main object of the present invention is to provide a sound source direction estimating method and apparatus for improving the accuracy of sound source direction estimation, aiming at solving the technical problem of inaccurate current sound source direction estimation.
  • an embodiment of the present invention provides a method for estimating a sound source direction, and the method includes the following steps:
  • the step of acquiring location coordinates of the sound source in the image includes:
  • the step of acquiring the position coordinates of the lips of the face in the image includes:
  • the step of calculating, according to the position coordinates, a first angle between the line connecting the camera and the sound source and the projection surface of the camera includes:
  • the first angle is calculated using the following formula:
  • A1 = atan((x*x + y*y)^0.5/(c*f));
  • A1 is the first angle
  • (x, y) is the position coordinate
  • c is the distance between the image and the projection surface
  • f is the focal length of the camera.
  • the step of calculating the direction of the sound source according to the first angle and the preset second angle includes:
  • the direction of the sound source is calculated using the following formula:
  • A = arccos(cos(A1)*cos(A2));
  • A1 is the first angle
  • A2 is the second angle
  • A is the angle between the line connecting the sound source and the microphone and the line connecting the two microphones, and represents the direction of the sound source.
  • after the step of calculating the direction of the sound source according to the first angle and the preset second angle, the method further includes: calculating, according to the direction of the sound source, the time delay with which the two microphones receive the sound signal of the sound source.
  • the step of calculating, according to the direction of the sound source, the time delay with which the two microphones receive the sound signal of the sound source includes:
  • the time delay is calculated using the following formula:
  • t = d*cos(A)/340;
  • t is the time delay
  • d is the distance between the two microphones
  • A is the angle between the line connecting the sound source and the microphone and the line connecting the two microphones.
  • after the step of calculating the time delay, the method further includes:
  • the wave functions of the two sound signals received by the two microphones are aligned according to the time delay;
  • a coherence function is acquired according to the wave functions of the two sound signals, and a noise function of the sound signal is acquired;
  • a wave function of the noise-reduced speech signal is calculated based on the wave function of the sound signal, the coherence function, and the noise function.
  • the step of acquiring a coherence function according to a wave function of the two sound signals includes:
  • the coherence function is obtained using the following formula:
  • r(w) = 2*y1(w)*y2(w)/(y1(w)*y1(w) + y2(w)*y2(w));
  • r(w) is the coherence function
  • y1(w) is the wave function of the sound signal received by one of the microphones
  • y2(w) is the wave function of the sound signal received by the other microphone.
  • the step of calculating the wave function of the noise-reduced speech signal according to the wave function of the sound signal, the coherence function, and the noise function includes:
  • the wave function of the noise-reduced speech signal is calculated using the following formula:
  • y(w) = r(w)*(y1(w) - n1(w));
  • y(w) is the wave function of the speech signal after noise reduction
  • y1(w) is the wave function of the sound signal received by one of the microphones
  • n1(w) is the noise function of the sound signal received by one of the microphones.
  • the embodiment of the invention simultaneously provides a sound source direction estimating device, the device comprising:
  • An image acquisition module configured to collect an image through a camera when a sound signal is detected
  • a position obtaining module configured to acquire position coordinates of the sound source in the image
  • a first calculating module configured to calculate, according to the position coordinates, a first angle between a connection between the camera and the sound source and a projection surface of the camera;
  • a second calculating module configured to calculate a direction of the sound source according to the first angle and a preset second angle; wherein the second angle is the angle between the line connecting the two microphones and the horizontal axis of the camera.
  • the location obtaining module includes:
  • a recognition unit configured to identify a face in the image
  • an obtaining unit configured to acquire position coordinates of a lip of the face in the image, and use position coordinates of the lip as position coordinates of the sound source in the image.
  • the obtaining unit includes:
  • a detecting subunit configured to detect, when there are at least two faces in the image, whether the lips of each face are moving
  • an obtaining subunit configured to acquire the position coordinates of the lips of the face whose lips are moving.
  • the first computing module is configured to:
  • calculate the first angle using the following formula:
  • A1 = atan((x*x + y*y)^0.5/(c*f));
  • A1 is the first angle
  • (x, y) is the position coordinate
  • c is the distance between the image and the projection surface
  • f is the focal length of the camera.
  • the second computing module is configured to:
  • calculate the direction of the sound source using the following formula:
  • A = arccos(cos(A1)*cos(A2));
  • A1 is the first angle
  • A2 is the second angle
  • A is the angle between the line connecting the sound source and the microphone and the line connecting the two microphones, and represents the direction of the sound source.
  • the device further includes a third calculating module, configured to: calculate a time delay of the sound signals received by the two microphones from the sound source according to the direction of the sound source.
  • the third computing module is configured to:
  • the time delay is calculated using the following formula:
  • t = d*cos(A)/340;
  • t is the time delay
  • d is the distance between the two microphones
  • A is the angle between the line connecting the sound source and the microphone and the line connecting the two microphones.
  • the device further includes:
  • An alignment processing module configured to perform alignment processing on a wave function of two sound signals received by the two microphones according to the time delay
  • a function obtaining module configured to acquire a coherence function according to a wave function of the two sound signals, and acquire a noise function of the sound signal
  • a function calculation module configured to calculate a wave function of the denoised speech signal according to the wave function of the sound signal, the coherence function, and the noise function.
  • the function obtaining module is configured to:
  • the coherence function is obtained using the following formula:
  • r(w) = 2*y1(w)*y2(w)/(y1(w)*y1(w) + y2(w)*y2(w));
  • r(w) is the coherence function
  • y1(w) is the wave function of the sound signal received by one of the microphones
  • y2(w) is the wave function of the sound signal received by the other microphone.
  • the function calculation module is configured to:
  • calculate the wave function of the noise-reduced speech signal using the following formula:
  • y(w) = r(w)*(y1(w) - n1(w));
  • y(w) is the wave function of the speech signal after noise reduction
  • y1(w) is the wave function of the sound signal received by one of the microphones
  • n1(w) is the noise function of the sound signal received by one of the microphones.
  • Embodiments of the present invention also provide a terminal device including a memory, a processor, and at least one application stored in the memory and configured to be executed by the processor, the application being configured to perform the aforementioned sound source direction estimation method.
  • in the sound source direction estimation method of the embodiments of the present invention, when a sound signal is detected, image recognition technology is used to obtain the position coordinates of the sound source in an image, and the direction of the sound source is estimated accordingly. This avoids the influence of environmental noise on the sound source direction estimation, improves the accuracy of the estimation, and lays a foundation for improving the effect of subsequent speech noise reduction or sound source localization.
  • FIG. 1 is a flow chart of the first embodiment of the sound source direction estimating method of the present invention;
  • FIG. 2 is a flow chart of the second embodiment of the sound source direction estimating method of the present invention;
  • FIG. 3 is a flow chart of the third embodiment of the sound source direction estimating method of the present invention;
  • FIG. 4 is a block diagram of the first embodiment of the sound source direction estimating device of the present invention;
  • FIG. 5 is a block diagram of the position acquisition module of FIG. 4;
  • FIG. 6 is a block diagram of the acquisition unit of FIG. 5;
  • FIG. 7 is a block diagram of the second embodiment of the sound source direction estimating device of the present invention;
  • FIG. 8 is a block diagram of the third embodiment of the sound source direction estimating device of the present invention.
  • the terms "terminal" and "terminal device" used herein include both devices having only a wireless signal receiver without transmitting capability and devices having both receiving and transmitting hardware.
  • Such devices may include: cellular or other communication devices with a single-line display or a multi-line display, or without a multi-line display; PCS (Personal Communications Service) terminals, which may combine voice, data processing, fax, and/or data communication capabilities; PDAs (Personal Digital Assistant), which may include a radio frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar, and/or GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that include a radio frequency receiver.
  • the terminal may be portable, transportable, installed in a vehicle (aviation, sea, and/or land), or adapted and/or configured to operate locally and/or in a distributed form at any other location on the earth and/or in space.
  • the "terminal" or "terminal device" used herein may also be a communication terminal, an internet terminal, or a music/video playing terminal, for example, a PDA, a MID (Mobile Internet Device), and/or a mobile phone with music/video playback functions, and may also be a device such as a smart TV or a set-top box.
  • the server used herein includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers.
  • here, the cloud is composed of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
  • communication between the server, the terminal device, and the WNS server can be implemented by any communication means, including but not limited to mobile communication based on 3GPP, LTE, or WIMAX, computer network communication based on the TCP/IP and UDP protocols, and short-range wireless transmission based on Bluetooth and infrared transmission standards.
  • the sound source direction estimation method of the embodiments of the present invention can be applied to various electronic devices, including terminal devices (such as cameras, mobile phones, tablets, etc.), smart home devices (such as audio devices, smart TVs, etc.), robot devices, and security devices (such as monitoring devices, etc.). The following is a detailed description of the application to a terminal device.
  • the method includes the following steps:
  • the terminal device detects sound through the dual microphones, and when a sound signal is detected, immediately collects an image through the camera.
  • the terminal device uses face recognition technology to recognize a face in the image.
  • when a face is recognized, the position coordinates of the lips of the face in the image are acquired, and the position coordinates of the lips are used as the position coordinates of the sound source in the image.
  • when there are at least two faces in the image, it is detected whether the lips of each face are moving; when the lips are moving, that person is speaking, so the position coordinates of the lips of the face whose lips are moving are obtained as the position coordinates of the sound source in the image.
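The lip-movement check can be sketched as follows (a minimal Python illustration only; the patent does not specify the detection algorithm, so the frame-differencing approach, the function name, and the threshold are all assumptions):

```python
def lips_moving(prev_mouth, curr_mouth, threshold=8.0):
    """Crude lip-movement detector: mean absolute pixel difference
    between the mouth regions of two consecutive frames, compared
    against a threshold. Both arguments are equal-length flat lists
    of grayscale pixel values for the cropped mouth region.
    (Hypothetical sketch; a real system would first locate the mouth
    region with a face/landmark detector.)
    """
    diff = sum(abs(a - b) for a, b in zip(prev_mouth, curr_mouth))
    return diff / len(prev_mouth) > threshold

print(lips_moving([10, 10, 10], [10, 10, 10]))     # False: static mouth
print(lips_moving([10, 200, 10], [200, 10, 200]))  # True: large change
```

With several faces in frame, this check would be run once per detected face, and only the face whose mouth region changes between frames is taken as the speaker.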
  • the terminal device acquires the preset focal length of the camera and the distance between the image and the projection surface, and calculates the first angle between the line connecting the camera and the sound source and the projection surface of the camera according to the position coordinates of the sound source in the image, the focal length of the camera, and the distance between the image and the projection surface.
  • the terminal device can calculate the first angle by using the following formula:
  • A1 = atan((x*x + y*y)^0.5/(c*f));
  • A1 is the first angle
  • (x, y) is the position coordinate of the sound source in the image
  • c is the distance between the image and the projection surface (the plane in which the camera's focal point lies, parallel to the camera)
  • f is the camera focal length.
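The first-angle formula can be sketched in Python as follows (an illustrative implementation only; the function name and the convention that (x, y) is measured in the image coordinate system are assumptions):

```python
import math

def first_angle(x, y, c, f):
    """First angle A1 between the camera-to-source line and the
    camera's projection surface: A1 = atan(sqrt(x^2 + y^2) / (c * f)).

    (x, y): position coordinates of the sound source in the image
    c:      distance between the image and the projection surface
    f:      focal length of the camera
    Returns A1 in radians.
    """
    return math.atan(math.sqrt(x * x + y * y) / (c * f))

# A source at the origin of the image coordinates lies on the optical
# axis, so the first angle is zero.
print(first_angle(0.0, 0.0, c=1.0, f=3.5))  # 0.0
```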
  • the angle between the line connecting the two microphones and the horizontal axis of the camera may be calculated in advance according to the hardware design, and this angle is preset in the terminal device as the second angle.
  • the terminal device calculates the direction of the sound source according to the first angle and the second angle.
  • the terminal device can calculate the direction of the sound source by using the following formula:
  • A = arccos(cos(A1)*cos(A2));
  • A1 is the first angle
  • A2 is the second angle
  • A is the angle between the line connecting the sound source and a microphone and the line connecting the two microphones, and represents the direction of the sound source. Since the distance between the two microphones is extremely small relative to the distance between the sound source and the microphones, A may be taken as the angle between the line connecting the sound source and either microphone and the line connecting the two microphones.
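The direction formula above can be sketched as follows (illustrative Python only; the function name and the radian convention are assumptions):

```python
import math

def source_direction(a1, a2):
    """Sound source direction A = arccos(cos(A1) * cos(A2)), where
    A1 is the first angle (camera-to-source line vs. projection
    surface) and A2 is the preset second angle (microphone axis vs.
    the camera's horizontal axis). All angles are in radians; A is
    the angle between the source-microphone line and the line
    connecting the two microphones.
    """
    return math.acos(math.cos(a1) * math.cos(a2))

# When the microphone axis is aligned with the camera (A2 = 0),
# the source direction reduces to the first angle itself.
print(source_direction(0.5, 0.0))  # ~0.5
```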
  • the image recognition technology is used to obtain the position coordinates of the sound source in the image, and the direction of the sound source is estimated accordingly, thereby avoiding the influence of environmental noise on the sound source direction estimation and improving the accuracy of the sound source direction estimation.
  • in the second embodiment of the sound source direction estimating method of the present invention, after step S14, the following steps are further included:
  • the terminal device calculates the time delay with which the two microphones receive the sound signal of the sound source, according to the distance between the two microphones and the angle between the line connecting the sound source and one microphone and the line connecting the two microphones (i.e., the sound source direction).
  • the terminal device can calculate the time delay by using the following formula:
  • t = d*cos(A)/340;
  • t is the time delay
  • d is the distance between the two microphones
  • A is the angle between the line connecting the sound source and one microphone and the line connecting the two microphones (the sound source direction).
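The delay formula can be sketched as follows (illustrative Python; the function name and SI units are assumptions, and 340 m/s is the speed of sound as used in the patent's formula):

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, the constant used in the patent's formula

def time_delay(d, a):
    """Time delay t = d * cos(A) / 340 between the two microphones'
    reception of the source signal, with d the microphone spacing in
    meters and A the source direction in radians; t is in seconds.
    """
    return d * math.cos(a) / SPEED_OF_SOUND

# Mics 0.34 m apart with the source on the microphone axis (A = 0):
# the wavefront crosses the full spacing, giving about a 1 ms delay.
print(time_delay(0.34, 0.0))
```

A source broadside to the array (A = 90 degrees) reaches both microphones at once, so the delay is essentially zero.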
  • the time delay of the sound signals collected by the two microphones can be accurately calculated, thereby laying a foundation for improving the effect of subsequent speech noise reduction.
  • in the third embodiment of the sound source direction estimating method of the present invention, after step S15, the following steps are further included:
  • the terminal device aligns the wave functions of the two sound signals according to the time delay t of the sound signals received by the two microphones, such as shifting one of the wave functions forward by t, or shifting the other wave function backward by t.
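A simplified sketch of this alignment on sampled signals (an assumption-laden illustration: real signals would convert the delay t to samples via the sampling rate, possibly with fractional-delay filtering; the integer-sample shift here is the minimal case):

```python
def align(y1, y2, delay_samples):
    """Align two sampled microphone signals, where delay_samples is
    the (integer) number of samples by which mic 2 receives the
    wavefront after mic 1. Shifts the later signal forward and trims
    both lists to equal length.
    """
    if delay_samples >= 0:
        y2 = y2[delay_samples:]
    else:
        y1 = y1[-delay_samples:]
    n = min(len(y1), len(y2))
    return y1[:n], y2[:n]

# The same ramp captured 2 samples later on mic 2 lines up after alignment.
a, b = align([0, 1, 2, 3, 4], [9, 9, 0, 1, 2], 2)
print(a, b)  # [0, 1, 2] [0, 1, 2]
```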
  • the terminal device performs Fourier transform on the wave functions of the two sound signals, and then calculates the coherence of the two wave functions to obtain a coherence function.
  • the terminal device acquires a noise function of the sound signal by detecting a non-speech portion of the sound signal.
  • the terminal device only needs to acquire the noise function of the sound signal collected by any one of the microphones, such as the noise function n1(w) of the sound signal collected by the microphone 1.
  • the terminal device obtains the coherence function by using the following formula:
  • r(w) = 2*y1(w)*y2(w)/(y1(w)*y1(w) + y2(w)*y2(w));
  • r(w) is the coherence function
  • y1(w) is the wave function of the sound signal received by one of the microphones (microphone 1)
  • y2(w) is the wave function of the sound signal received by the other microphone (microphone 2).
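The coherence formula can be sketched per frequency bin as follows (illustrative Python over real-valued spectra; the function name and the zero-denominator handling are assumptions not stated in the patent):

```python
def coherence(y1w, y2w):
    """Per-bin coherence r(w) = 2*y1(w)*y2(w) / (y1(w)^2 + y2(w)^2).
    For real spectral values this equals 1 when the two bins agree
    exactly (coherent speech reaching both mics) and decreases as
    they diverge (uncorrelated noise). Inputs are equal-length lists
    of spectral values; bins where both values are 0 are left at 0.
    """
    out = []
    for a, b in zip(y1w, y2w):
        denom = a * a + b * b
        out.append(0.0 if denom == 0 else 2 * a * b / denom)
    return out

# Identical bins give coherence 1; mismatched bins give less.
print(coherence([1.0, 2.0], [1.0, 0.5]))
```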
  • the terminal device performs signal processing according to the mapping relationship among the coherence function r(w), the noise function n1(w), and the wave function y1(w) of the sound signal, obtains the wave function y(w) of the noise-reduced speech signal, and performs an inverse Fourier transform on the wave function y(w) to obtain the noise-reduced speech signal.
  • the terminal device can calculate the wave function of the noise-reduced speech signal by using the following formula:
  • y(w) = r(w)*(y1(w) - n1(w));
  • y(w) is the wave function of the noise-reduced speech signal
  • y1(w) is the wave function of the sound signal received by one of the microphones (microphone 1)
  • n1(w) is the noise function of the sound signal received by one of the microphones (microphone 1)
  • y1(w) and n1(w) may also be replaced by y2(w) and n2(w), respectively.
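The noise-reduction step can be sketched per frequency bin as follows (illustrative Python only; the function name is an assumption, and real-valued lists stand in for the complex spectra a full implementation would use):

```python
def denoise_bins(y1w, n1w, rw):
    """y(w) = r(w) * (y1(w) - n1(w)): subtract the noise estimate
    from the received spectrum, then weight each bin by its
    coherence so that noise-dominated (low-coherence) bins are
    attenuated. All inputs are equal-length lists over frequency
    bins; an inverse Fourier transform of the result would yield
    the noise-reduced time-domain signal.
    """
    return [r * (y - n) for y, n, r in zip(y1w, n1w, rw)]

# A fully coherent bin keeps the noise-subtracted value; an
# incoherent bin is suppressed entirely.
print(denoise_bins([3.0, 2.0], [1.0, 1.0], [1.0, 0.0]))  # [2.0, 0.0]
```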
  • in the sound source direction estimating method, when a sound signal is detected, image recognition technology is used to acquire the position coordinates of the sound source in the image, and the direction of the sound source is estimated accordingly, thereby avoiding the influence of environmental noise on the sound source direction estimation, improving the accuracy of the sound source direction estimation, and laying a foundation for improving the effect of subsequent speech noise reduction or sound source localization.
  • the apparatus includes an image acquisition module 10, a position acquisition module 20, a first calculation module 30, and a second calculation module 40, wherein: the image acquisition module 10 is configured to collect an image through the camera when a sound signal is detected;
  • the position acquisition module 20 is configured to acquire the position coordinates of the sound source in the image;
  • the first calculation module 30 is configured to calculate, according to the position coordinates, the first angle between the line connecting the camera and the sound source and the projection surface of the camera;
  • the second calculation module 40 is configured to calculate the direction of the sound source according to the first angle and the preset second angle.
  • the terminal device detects sound through the dual microphones.
  • when a sound signal is detected, the image acquisition module 10 immediately collects an image through the camera, and the position acquisition module 20 acquires the position coordinates of the sound source in the image.
  • the location obtaining module 20 includes an identifying unit 21 and an obtaining unit 22, wherein: the identifying unit 21 is configured to recognize a face in the image by using face recognition technology; and the obtaining unit 22 is configured to, when a face is recognized, acquire the position coordinates of the lips of the face in the image and use the position coordinates of the lips as the position coordinates of the sound source in the image.
  • the obtaining unit 22 includes a detecting subunit 221 and an obtaining subunit 222, as shown in FIG. 6, wherein: the detecting subunit 221 is configured to detect, when there are at least two faces in the image, whether the lips of each face are moving; the obtaining subunit 222 is configured to acquire the position coordinates of the lips of the face whose lips are moving as the position coordinates of the sound source in the image.
  • the first calculating module 30 acquires the preset focal length of the camera and the distance between the image and the projection surface, and calculates the first angle according to the position coordinates of the sound source in the image, the focal length of the camera, and the distance between the image and the projection surface.
  • the first calculation module 30 calculates the first angle by using the following formula:
  • A1 = atan((x*x + y*y)^0.5/(c*f));
  • A1 is the first angle
  • (x, y) is the position coordinate of the sound source in the image
  • c is the distance between the image and the projection surface (the plane in which the camera's focal point lies, parallel to the camera)
  • f is the camera focal length.
  • the angle between the line connecting the two microphones and the horizontal axis of the camera may be calculated in advance according to the hardware design, and this angle is preset in the terminal device as the second angle.
  • the second calculating module 40 calculates the direction of the sound source according to the first angle and the second angle.
  • the second calculating module 40 can calculate the direction of the sound source by using the following formula:
  • A = arccos(cos(A1)*cos(A2));
  • A1 is the first angle
  • A2 is the second angle
  • A is the angle between the line connecting the sound source and the microphone and the line connecting the two microphones, and represents the direction of the sound source. Since the distance between the two microphones is extremely small relative to the distance between the sound source and the microphones, A may be taken as the angle between the line connecting the sound source and either microphone and the line connecting the two microphones.
  • the image recognition technology is used to obtain the position coordinates of the sound source in the image, and the direction of the sound source is estimated accordingly, thereby avoiding the influence of environmental noise on the sound source direction estimation and improving the accuracy of the sound source direction estimation.
  • the apparatus further includes a third calculating module 50, wherein the third calculating module 50 is configured to: calculate, according to the direction of the sound source, the time delay with which the two microphones receive the sound signal of the sound source.
  • the third calculating module 50 calculates the time delay with which the two microphones receive the sound signal of the sound source, according to the distance between the two microphones and the angle between the line connecting the sound source and one microphone and the line connecting the two microphones (i.e., the sound source direction).
  • the third calculation module 50 can calculate the time delay by using the following formula:
  • t = d*cos(A)/340;
  • t is the time delay
  • d is the distance between the two microphones
  • A is the angle between the line connecting the sound source and the microphone and the line connecting the two microphones.
  • the time delay of the sound signals collected by the two microphones can be accurately calculated, thereby laying a foundation for improving the effect of subsequent speech noise reduction.
  • the apparatus further includes an alignment processing module 60, a function obtaining module 70, and a function calculation module 80, wherein: the alignment processing module 60 is configured to align the wave functions of the two sound signals received by the two microphones according to the time delay; the function obtaining module 70 is configured to acquire a coherence function according to the wave functions of the two sound signals, and to acquire the noise function of the sound signal;
  • the function calculation module 80 is configured to calculate a wave function of the denoised speech signal according to a wave function, a coherence function, and a noise function of the sound signal.
  • the alignment processing module 60 aligns the wave functions of the two sound signals according to the time delay t of the sound signals received by the two microphones, such as shifting one of the wave functions forward by t, or shifting the other wave function backward by t.
  • the function acquisition module 70 first performs a Fourier transform on the wave functions of the two sound signals, and then calculates the coherence of the two wave functions to obtain a coherence function. At the same time, the function acquisition module 70 acquires the noise function of the sound signal by detecting the non-speech portion of the sound signal. The function acquisition module 70 only needs to acquire the noise function of the sound signal collected by any one of the microphones, such as the noise function n1(w) of the sound signal acquired by the microphone 1.
  • the function obtaining module 70 can obtain the coherence function by using the following formula:
  • r(w) = 2*y1(w)*y2(w)/(y1(w)*y1(w) + y2(w)*y2(w));
  • r(w) is the coherence function
  • y1(w) is the wave function of the sound signal received by one of the microphones (microphone 1)
  • y2(w) is the wave function of the sound signal received by the other microphone (microphone 2).
  • the function calculation module 80 calculates the wave function y(w) of the noise-reduced speech signal, and performs inverse Fourier transform on the wave function y(w) to obtain a denoised speech signal.
  • the function calculation module 80 can calculate the wave function of the noise-reduced speech signal by using the following formula:
  • y(w) = r(w)*(y1(w) - n1(w));
  • y(w) is the wave function of the noise-reduced speech signal
  • y1(w) is the wave function of the sound signal received by one of the microphones (microphone 1)
  • n1(w) is the noise function of the sound signal received by one of the microphones (microphone 1)
  • y1(w) and n1(w) may also be replaced by y2(w) and n2(w), respectively.
  • the sound source direction estimating device of the embodiment of the present invention, when a sound signal is detected, uses image recognition technology to acquire the position coordinates of the sound source in the image and estimates the direction of the sound source accordingly, thereby avoiding the influence of environmental noise on the sound source direction estimation, improving the accuracy of the sound source direction estimation, and laying a foundation for improving the effect of subsequent speech noise reduction or sound source localization.
  • the present invention also provides a terminal device including a memory, a processor, and at least one application stored in the memory and configured to be executed by the processor, the application being configured to perform the sound source direction estimation method.
  • the sound source direction estimating method comprises the following steps: when a sound signal is detected, collecting an image through a camera; acquiring the position coordinates of the sound source in the image; calculating, according to the position coordinates, the first angle between the line connecting the camera and the sound source and the projection surface of the camera; and calculating the direction of the sound source according to the first angle and the preset second angle, wherein the second angle is the angle between the line connecting the two microphones and the horizontal axis of the camera.
  • the sound source direction estimation method described in this embodiment is the sound source direction estimation method according to the above embodiment of the present invention, and details are not described herein again.
  • the present invention includes apparatuses directed to performing one or more of the operations described herein. These apparatuses may be specially designed and manufactured for the required purposes, or may include known devices in a general-purpose computer, which store computer programs that are selectively activated or reconfigured.
  • Such computer programs may be stored in a device-readable (e.g., computer-readable) medium, or in any type of medium suitable for storing electronic instructions and respectively coupled to a bus, including but not limited to any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards.
  • that is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
  • each block of the structural diagrams and/or block diagrams and/or flow diagrams, and combinations of blocks therein, can be implemented by computer program instructions.
  • these computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the solutions specified in one or more blocks of the structural diagrams and/or block diagrams and/or flow diagrams are implemented when the instructions are executed by the processor.
  • steps, measures, and solutions in the various operations, methods, and processes that have been discussed in the present invention may be alternated, changed, combined, or deleted. Further, other steps, measures, and schemes of the various operations, methods, and processes that have been discussed in the present invention may be alternated, modified, rearranged, decomposed, combined, or deleted. Further, the steps, measures, and solutions in the prior art having various operations, methods, and processes disclosed in the present invention may also be alternated, changed, rearranged, decomposed, combined, or deleted.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

A sound source direction estimation method and apparatus are provided. The method includes: when a sound signal is detected, collecting an image through a camera (S11); acquiring the position coordinates of the sound source in the image (S12); calculating, according to the position coordinates of the sound source in the image, a first angle between the line connecting the camera and the sound source and the projection surface of the camera (S13); and calculating the direction of the sound source according to the first angle and a preset second angle (S14). Estimating the direction of the sound source by this method avoids the influence of environmental noise and improves the accuracy of the estimation.

Description

Sound source direction estimation method and apparatus — Technical Field
The present invention relates to the field of electronic technology, and in particular to a sound source direction estimation method and apparatus.
Background Art
In application scenarios such as speech noise reduction and sound source tracking, the direction of the sound source must be estimated first. At present, the most effective sound source direction estimation method uses dual-microphone technology: two microphones collect sound signals, and the sound source direction is estimated from the phase difference between the two collected signals.
However, in far-field speech noise reduction, or when background noise is loud, the effectiveness of this method drops sharply; the estimated direction becomes inaccurate, impairing subsequent speech noise reduction or sound source tracking.
Technical Problem
The main object of the present invention is to provide a sound source direction estimation method and apparatus that improve estimation accuracy, aiming to solve the technical problem that existing sound source direction estimates are inaccurate.
Technical Solution
To achieve the above object, an embodiment of the present invention proposes a sound source direction estimation method comprising the following steps:
when a sound signal is detected, capturing an image with a camera;
obtaining the position coordinates of the sound source in the image;
calculating, from the position coordinates, a first angle between the line connecting the camera and the sound source and the projection plane of the camera;
calculating the direction of the sound source from the first angle and a preset second angle, wherein the second angle is the angle between the line connecting the two microphones and the horizontal axis of the camera.
Optionally, the step of obtaining the position coordinates of the sound source in the image comprises:
recognizing a human face in the image;
obtaining the position coordinates of the lips of the face in the image, and taking the lip position coordinates as the position coordinates of the sound source in the image.
Optionally, the step of obtaining the position coordinates of the lips of the face in the image comprises:
when there are at least two faces in the image, detecting whether the lips of the faces are moving;
obtaining the position coordinates of the lips of the face whose lips are moving.
Optionally, the step of calculating, from the position coordinates, the first angle between the line connecting the camera and the sound source and the projection plane of the camera comprises:
calculating the first angle with the following formula:
A1 = atan((x*x + y*y)^0.5/(c*f));
where A1 is the first angle, (x, y) are the position coordinates, c is the distance between the image and the projection plane, and f is the focal length of the camera.
Optionally, the step of calculating the direction of the sound source from the first angle and the preset second angle comprises:
calculating the direction of the sound source with the following formula:
A = arccos(cos(A1)*cos(A2));
where A1 is the first angle, A2 is the second angle, and A is the angle between the line connecting the sound source and a microphone and the line connecting the two microphones, representing the direction of the sound source.
Optionally, after the step of calculating the direction of the sound source from the first angle and the preset second angle, the method further comprises:
calculating, from the direction of the sound source, the time delay with which the two microphones receive the sound signal of the sound source.
Optionally, the step of calculating, from the direction of the sound source, the time delay with which the two microphones receive the sound signal of the sound source comprises:
calculating the time delay with the following formula:
t = d*cos(A)/340;
where t is the time delay, d is the distance between the two microphones, and A is the angle between the line connecting the sound source and a microphone and the line connecting the two microphones.
Optionally, after the step of calculating the time delay, the method further comprises:
aligning the wave functions of the two sound signals received by the two microphones according to the time delay;
obtaining a coherence function from the wave functions of the two sound signals, and obtaining a noise function of the sound signal;
calculating the wave function of the noise-reduced speech signal from the wave function of the sound signal, the coherence function and the noise function.
Optionally, the step of obtaining the coherence function from the wave functions of the two sound signals comprises:
obtaining the coherence function with the following formula:
r(w) = 2*y1(w)*y2(w)/(y1(w)*y1(w) + y2(w)*y2(w));
where r(w) is the coherence function, y1(w) is the wave function of the sound signal received by one microphone, and y2(w) is the wave function of the sound signal received by the other microphone.
Optionally, the step of calculating the wave function of the noise-reduced speech signal from the wave function of the sound signal, the coherence function and the noise function comprises:
calculating the wave function of the noise-reduced speech signal with the following formula:
y(w) = r(w)*(y1(w) - n1(w));
where y(w) is the wave function of the noise-reduced speech signal, y1(w) is the wave function of the sound signal received by one microphone, and n1(w) is the noise function of the sound signal received by that microphone.
An embodiment of the present invention also proposes a sound source direction estimation apparatus, comprising:
an image acquisition module for capturing an image with a camera when a sound signal is detected;
a position obtaining module for obtaining the position coordinates of the sound source in the image;
a first calculation module for calculating, from the position coordinates, a first angle between the line connecting the camera and the sound source and the projection plane of the camera;
a second calculation module for calculating the direction of the sound source from the first angle and a preset second angle, wherein the second angle is the angle between the line connecting the two microphones and the horizontal axis of the camera.
Optionally, the position obtaining module comprises:
a recognition unit for recognizing a human face in the image;
an obtaining unit for obtaining the position coordinates of the lips of the face in the image and taking the lip position coordinates as the position coordinates of the sound source in the image.
Optionally, the obtaining unit comprises:
a detection subunit for detecting, when there are at least two faces in the image, whether the lips of the faces are moving;
an obtaining subunit for obtaining the position coordinates of the lips of the face whose lips are moving.
Optionally, the first calculation module is configured to:
calculate the first angle with the following formula:
A1 = atan((x*x + y*y)^0.5/(c*f));
where A1 is the first angle, (x, y) are the position coordinates, c is the distance between the image and the projection plane, and f is the focal length of the camera.
Optionally, the second calculation module is configured to:
calculate the direction of the sound source with the following formula:
A = arccos(cos(A1)*cos(A2));
where A1 is the first angle, A2 is the second angle, and A is the angle between the line connecting the sound source and a microphone and the line connecting the two microphones, representing the direction of the sound source.
Optionally, the apparatus further comprises a third calculation module configured to: calculate, from the direction of the sound source, the time delay with which the two microphones receive the sound signal of the sound source.
Optionally, the third calculation module is configured to:
calculate the time delay with the following formula:
t = d*cos(A)/340;
where t is the time delay, d is the distance between the two microphones, and A is the angle between the line connecting the sound source and a microphone and the line connecting the two microphones.
Optionally, the apparatus further comprises:
an alignment module for aligning the wave functions of the two sound signals received by the two microphones according to the time delay;
a function obtaining module for obtaining a coherence function from the wave functions of the two sound signals and obtaining a noise function of the sound signal;
a function calculation module for calculating the wave function of the noise-reduced speech signal from the wave function of the sound signal, the coherence function and the noise function.
Optionally, the function obtaining module is configured to:
obtain the coherence function with the following formula:
r(w) = 2*y1(w)*y2(w)/(y1(w)*y1(w) + y2(w)*y2(w));
where r(w) is the coherence function, y1(w) is the wave function of the sound signal received by one microphone, and y2(w) is the wave function of the sound signal received by the other microphone.
Optionally, the function calculation module is configured to:
calculate the wave function of the noise-reduced speech signal with the following formula:
y(w) = r(w)*(y1(w) - n1(w));
where y(w) is the wave function of the noise-reduced speech signal, y1(w) is the wave function of the sound signal received by one microphone, and n1(w) is the noise function of the sound signal received by that microphone.
An embodiment of the present invention also proposes a terminal device comprising a memory, a processor, and at least one application program stored in the memory and configured to be executed by the processor, the application program being configured to perform the aforementioned sound source direction estimation method.
Beneficial Effects
In the sound source direction estimation method provided by the embodiments of the present invention, when a sound signal is detected, image recognition is used to obtain the position coordinates of the sound source in an image, and the direction of the sound source is estimated accordingly. This avoids the influence of environmental noise on the direction estimate, improves its accuracy, and thereby lays the foundation for better subsequent speech noise reduction or sound source localization.
Brief Description of the Drawings
Fig. 1 is a flowchart of the first embodiment of the sound source direction estimation method of the present invention;
Fig. 2 is a flowchart of the second embodiment of the sound source direction estimation method of the present invention;
Fig. 3 is a flowchart of the third embodiment of the sound source direction estimation method of the present invention;
Fig. 4 is a block diagram of the first embodiment of the sound source direction estimation apparatus of the present invention;
Fig. 5 is a block diagram of the position obtaining module in Fig. 4;
Fig. 6 is a block diagram of the obtaining unit in Fig. 5;
Fig. 7 is a block diagram of the second embodiment of the sound source direction estimation apparatus of the present invention;
Fig. 8 is a block diagram of the third embodiment of the sound source direction estimation apparatus of the present invention.
The realization of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Best Mode for Carrying Out the Invention
It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
Embodiments of the present invention are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals denote, throughout, identical or similar elements or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and are not to be construed as limiting it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used here may also include the plural. It should further be understood that the word "comprising" used in this specification indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. When an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. Moreover, "connected" or "coupled" as used here may include a wireless connection or wireless coupling. The expression "and/or" as used here includes all or any units and all combinations of one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used here (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries are to be understood as having meanings consistent with their meaning in the context of the prior art and, unless specifically defined as here, are not to be interpreted in an idealized or overly formal sense.
Those skilled in the art will understand that "terminal" and "terminal device" as used here include both devices having only a wireless signal receiver without transmitting capability and devices with receiving and transmitting hardware capable of two-way communication over a two-way communication link. Such devices may include: cellular or other communication devices, with or without a single-line or multi-line display; PCS (Personal Communications Service) devices, which may combine voice, data processing, fax and/or data communication capabilities; PDAs (Personal Digital Assistants), which may include a radio-frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar and/or GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio-frequency receiver. "Terminal" and "terminal device" as used here may be portable, transportable, installed in a vehicle (air, sea and/or land), or suited and/or configured to operate locally and/or, in distributed form, at any other location on Earth and/or in space. "Terminal" and "terminal device" as used here may also be a communication terminal, an Internet terminal, or a music/video playback terminal, for example a PDA, an MID (Mobile Internet Device) and/or a mobile phone with music/video playback functions, or a device such as a smart TV or set-top box.
Those skilled in the art will understand that a server as used here includes, but is not limited to, a computer, a network host, a single network server, a set of network servers, or a cloud composed of multiple servers. Here, the cloud is composed of a large number of computers or network servers based on cloud computing, cloud computing being a form of distributed computing in which a group of loosely coupled computers forms one super virtual computer. In embodiments of the present invention, communication between the server, the terminal device and a WNS server may be realized by any means of communication, including but not limited to mobile communication based on 3GPP, LTE or WIMAX, computer network communication based on TCP/IP or UDP, and short-range wireless transmission based on Bluetooth or infrared transmission standards.
The sound source direction estimation method of the embodiments of the present invention can be applied to various electronic devices, including terminal devices (such as cameras, mobile phones and tablets), smart home devices (such as audio equipment and smart TVs), robots, and security monitoring equipment (such as surveillance devices). The following takes application to a terminal device as an example for detailed description.
Referring to Fig. 1, a first embodiment of the sound source direction estimation method of the present invention is proposed, the method comprising the following steps:
S11: When a sound signal is detected, capture an image with a camera.
In this embodiment of the present invention, the terminal device detects sound with two microphones; as soon as a sound signal is detected, it captures an image with the camera.
S12: Obtain the position coordinates of the sound source in the image.
In this embodiment, the terminal device uses face recognition to identify a human face in the image; when a face is recognized, it obtains the position coordinates of the lips of the face in the image and takes the lip coordinates as the position coordinates of the sound source in the image.
Optionally, when there are at least two faces in the image, the terminal device detects whether the lips of each face are moving; only moving lips indicate that a person is speaking, so the lip coordinates of the face whose lips are moving are taken as the position coordinates of the sound source in the image.
S13: Calculate, from the position coordinates of the sound source in the image, a first angle between the line connecting the camera and the sound source and the projection plane of the camera.
In this embodiment, the terminal device obtains the preset focal length of the camera and the distance between the image and the projection plane, and calculates the first angle from these together with the position coordinates of the sound source in the image.
Specifically, the terminal device may calculate the first angle with the following formula:
A1 = atan((x*x + y*y)^0.5/(c*f));
where A1 is the first angle, (x, y) are the position coordinates of the sound source in the image, c is the distance between the image and the projection plane (the plane that contains the focal point of the camera and is parallel to the camera), and f is the focal length of the camera.
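As an illustrative sketch, the formula above can be implemented literally as follows (the function name, the argument order, and the use of radians are assumptions for illustration, not part of the patent):

```python
import math

def first_angle(x, y, c, f):
    """First angle A1 between the camera-to-source line and the camera's
    projection plane, computed from the patent's formula
    A1 = atan((x*x + y*y)^0.5 / (c*f)), where (x, y) are the source's
    image coordinates, c the image-to-projection-plane distance and
    f the focal length. Result is in radians."""
    return math.atan(math.sqrt(x * x + y * y) / (c * f))
```

For example, a source at image coordinates (3, 4) with c*f = 5 gives atan(5/5) = atan(1), i.e. a 45-degree first angle.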
S14: Calculate the direction of the sound source from the first angle and a preset second angle.
In this embodiment, the angle between the line connecting the two microphones and the horizontal axis of the camera can be calculated in advance from the hardware design and preset in the terminal device as the second angle. The terminal device then calculates the direction of the sound source from the first angle and the second angle.
Specifically, the terminal device may calculate the direction of the sound source with the following formula:
A = arccos(cos(A1)*cos(A2));
where A1 is the first angle, A2 is the second angle, and A is the angle between the line connecting the sound source and one microphone and the line connecting the two microphones, representing the direction of the sound source. Since the distance between the two microphones is negligible compared with the distance from the sound source to a microphone, the angle A may be taken between the line connecting the sound source and either microphone and the line connecting the two microphones.
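The direction formula above can likewise be sketched as follows (the function name and radian units are illustrative assumptions):

```python
import math

def source_direction(a1, a2):
    """Sound-source direction A = arccos(cos(A1) * cos(A2)), where a1 is
    the first angle and a2 the preset angle between the microphone
    baseline and the camera's horizontal axis, both in radians."""
    return math.acos(math.cos(a1) * math.cos(a2))
```

As a sanity check, when A2 = 0 (microphone baseline parallel to the camera's horizontal axis) the direction A reduces to the first angle A1.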
Thus, image recognition is used to obtain the position coordinates of the sound source in the image, and the direction of the sound source is estimated accordingly; this avoids the influence of environmental noise on the direction estimate and improves its accuracy.
Further, as shown in Fig. 2, in the second embodiment of the sound source direction estimation method of the present invention, the following step follows step S14:
S15: Calculate, from the direction of the sound source, the time delay with which the two microphones receive the sound signal of the sound source.
In this embodiment, the terminal device calculates the time delay from the distance between the two microphones and the angle between the line connecting the sound source and one microphone and the line connecting the two microphones (i.e. the sound source direction).
Specifically, the terminal device may calculate the time delay with the following formula:
t = d*cos(A)/340;
where t is the time delay, d is the distance between the two microphones, and A is the angle between the line connecting the sound source and one microphone and the line connecting the two microphones (the sound source direction).
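A minimal sketch of the delay formula, with 340 m/s as the assumed speed of sound (d in metres, A in radians, t in seconds; the function name is illustrative):

```python
import math

def time_delay(d, a, speed_of_sound=340.0):
    """Inter-microphone time delay t = d * cos(A) / 340 for microphone
    spacing d and sound-source direction A."""
    return d * math.cos(a) / speed_of_sound
```

For instance, microphones 0.34 m apart with the source on the microphone baseline (A = 0) give a delay of 1 ms, while a source broadside to the baseline (A = 90 degrees) gives zero delay.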
Thus, even in a noisy environment, the time delay between the sound signals collected by the two microphones can be calculated accurately, laying the foundation for better subsequent speech noise reduction.
Furthermore, as shown in Fig. 3, in the third embodiment of the sound source direction estimation method of the present invention, the following steps follow step S15:
S16: Align the wave functions of the two sound signals received by the two microphones according to the time delay.
In this embodiment, the terminal device aligns the wave functions of the two sound signals according to the time delay t of the signals received by the two microphones, for example by shifting one wave function forward by t or shifting the other backward by t.
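A simple time-domain sketch of this alignment step, assuming a non-negative delay t and ignoring sub-sample shifts (the sample rate parameter fs and the zero-padding policy are illustrative assumptions):

```python
import numpy as np

def align(w2, t, fs):
    """Advance the delayed microphone's waveform w2 by the delay t
    (seconds), sampled at fs Hz, padding the tail with zeros so the two
    microphone signals line up before further processing."""
    shift = int(round(t * fs))
    out = np.zeros_like(w2)
    out[:len(w2) - shift] = w2[shift:]
    return out
```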
S17: Obtain a coherence function from the wave functions of the two sound signals, and obtain a noise function of the sound signal.
In this embodiment, the terminal device first applies a Fourier transform to the wave functions of the two sound signals, then computes the coherence of the two transformed wave functions to obtain the coherence function. Meanwhile, the terminal device obtains the noise function of the sound signal by detecting the speech-free portions of the signal. The terminal device only needs the noise function of the signal collected by either microphone, for example the noise function n1(w) of the signal collected by microphone 1.
Specifically, the terminal device obtains the coherence function with the following formula:
r(w) = 2*y1(w)*y2(w)/(y1(w)*y1(w) + y2(w)*y2(w));
where r(w) is the coherence function, y1(w) is the wave function of the sound signal received by one microphone (microphone 1), and y2(w) is the wave function of the sound signal received by the other microphone (microphone 2).
S18: Calculate the wave function of the noise-reduced speech signal from the wave function of the sound signal, the coherence function and the noise function.
In this embodiment, the terminal device updates the signal according to the mapping among the coherence function r(w), the noise function n1(w) and the wave function y1(w) of the sound signal, obtains the wave function y(w) of the noise-reduced speech signal, and applies an inverse Fourier transform to y(w) to obtain the noise-reduced speech signal.
Specifically, the terminal device may calculate the wave function of the noise-reduced speech signal with the following formula:
y(w) = r(w)*(y1(w) - n1(w));
where y(w) is the wave function of the noise-reduced speech signal, y1(w) is the wave function of the sound signal received by one microphone (microphone 1), and n1(w) is the noise function of the sound signal received by that microphone (microphone 1). Optionally, y1(w) and n1(w) may be replaced by y2(w) and n2(w), respectively.
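Steps S17 and S18 can be sketched per frequency bin as follows (real-valued magnitude spectra are assumed for simplicity; a full implementation would operate on complex Fourier spectra and guard against a zero denominator):

```python
import numpy as np

def denoise(y1, y2, n1):
    """Coherence-weighted noise reduction: computes
    r(w) = 2*y1*y2 / (y1^2 + y2^2) and returns y(w) = r(w) * (y1 - n1),
    where y1, y2 are the aligned spectra of the two microphones and n1
    is the noise spectrum estimated from speech-free frames of mic 1."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    n1 = np.asarray(n1, dtype=float)
    r = 2.0 * y1 * y2 / (y1 * y1 + y2 * y2)  # equals 1.0 where the mics agree
    return r * (y1 - n1)
```

Where the two aligned spectra agree (y1 = y2), r(w) = 1 and the update reduces to plain spectral subtraction; where they disagree, r(w) < 1 and the bin is attenuated further.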
Thus, even in far-field speech noise reduction or with loud background noise, a good noise reduction effect can be achieved, improving the user experience.
In the sound source direction estimation method of the embodiments of the present invention, when a sound signal is detected, image recognition is used to obtain the position coordinates of the sound source in an image and the direction of the sound source is estimated accordingly; this avoids the influence of environmental noise on the direction estimate, improves its accuracy, and lays the foundation for better subsequent speech noise reduction or sound source localization.
Referring to Fig. 4, a first embodiment of the sound source direction estimation apparatus of the present invention is proposed. The apparatus comprises an image acquisition module 10, a position obtaining module 20, a first calculation module 30 and a second calculation module 40, wherein: the image acquisition module 10 captures an image with a camera when a sound signal is detected; the position obtaining module 20 obtains the position coordinates of the sound source in the image; the first calculation module 30 calculates, from the position coordinates, a first angle between the line connecting the camera and the sound source and the projection plane of the camera; and the second calculation module 40 calculates the direction of the sound source from the first angle and a preset second angle.
In this embodiment of the present invention, the terminal device detects sound with two microphones; as soon as a sound signal is detected, the image acquisition module 10 captures an image with the camera, and the position obtaining module 20 obtains the position coordinates of the sound source in the image.
As shown in Fig. 5, the position obtaining module 20 comprises a recognition unit 21 and an obtaining unit 22, wherein: the recognition unit 21 uses face recognition to identify a human face in the image; and the obtaining unit 22, when a face is recognized, obtains the position coordinates of the lips of the face in the image and takes the lip coordinates as the position coordinates of the sound source in the image.
Optionally, as shown in Fig. 6, the obtaining unit 22 comprises a detection subunit 221 and an obtaining subunit 222, wherein: the detection subunit 221 detects, when there are at least two faces in the image, whether the lips of each face are moving; and the obtaining subunit 222 takes the lip coordinates of the face whose lips are moving as the position coordinates of the sound source in the image.
In this embodiment, the first calculation module 30 obtains the preset focal length of the camera and the distance between the image and the projection plane, and calculates the first angle from these together with the position coordinates of the sound source in the image.
Specifically, the first calculation module 30 calculates the first angle with the following formula:
A1 = atan((x*x + y*y)^0.5/(c*f));
where A1 is the first angle, (x, y) are the position coordinates of the sound source in the image, c is the distance between the image and the projection plane (the plane that contains the focal point of the camera and is parallel to the camera), and f is the focal length of the camera.
In this embodiment, the angle between the line connecting the two microphones and the horizontal axis of the camera can be calculated in advance from the hardware design and preset in the terminal device as the second angle. The second calculation module 40 then calculates the direction of the sound source from the first angle and the second angle.
Specifically, the second calculation module 40 may calculate the direction of the sound source with the following formula:
A = arccos(cos(A1)*cos(A2));
where A1 is the first angle, A2 is the second angle, and A is the angle between the line connecting the sound source and a microphone and the line connecting the two microphones, representing the direction of the sound source. Since the distance between the two microphones is negligible compared with the distance from the sound source to a microphone, the angle A may be taken between the line connecting the sound source and either microphone and the line connecting the two microphones.
Thus, image recognition is used to obtain the position coordinates of the sound source in the image, and the direction of the sound source is estimated accordingly; this avoids the influence of environmental noise on the direction estimate and improves its accuracy.
Further, as shown in Fig. 7, in the second embodiment of the sound source direction estimation apparatus of the present invention, the apparatus further comprises a third calculation module 50 configured to: calculate, from the direction of the sound source, the time delay with which the two microphones receive the sound signal of the sound source.
In this embodiment, the third calculation module 50 calculates the time delay from the distance between the two microphones and the angle between the line connecting the sound source and one microphone and the line connecting the two microphones (i.e. the sound source direction).
Specifically, the third calculation module 50 may calculate the time delay with the following formula:
t = d*cos(A)/340;
where t is the time delay, d is the distance between the two microphones, and A is the angle between the line connecting the sound source and a microphone and the line connecting the two microphones.
Thus, even in a noisy environment, the time delay between the sound signals collected by the two microphones can be calculated accurately, laying the foundation for better subsequent speech noise reduction.
Furthermore, as shown in Fig. 8, in the third embodiment of the sound source direction estimation apparatus of the present invention, the apparatus further comprises an alignment module 60, a function obtaining module 70 and a function calculation module 80, wherein: the alignment module 60 aligns the wave functions of the two sound signals received by the two microphones according to the time delay; the function obtaining module 70 obtains a coherence function from the wave functions of the two sound signals and obtains a noise function of the sound signal; and the function calculation module 80 calculates the wave function of the noise-reduced speech signal from the wave function of the sound signal, the coherence function and the noise function.
In this embodiment, the alignment module 60 aligns the wave functions of the two sound signals according to the time delay t of the signals received by the two microphones, for example by shifting one wave function forward by t or shifting the other backward by t.
In this embodiment, the function obtaining module 70 first applies a Fourier transform to the wave functions of the two sound signals, then computes their coherence to obtain the coherence function. Meanwhile, the function obtaining module 70 obtains the noise function of the sound signal by detecting the speech-free portions of the signal. It only needs the noise function of the signal collected by either microphone, for example the noise function n1(w) of the signal collected by microphone 1.
Specifically, the function obtaining module 70 may obtain the coherence function with the following formula:
r(w) = 2*y1(w)*y2(w)/(y1(w)*y1(w) + y2(w)*y2(w));
where r(w) is the coherence function, y1(w) is the wave function of the sound signal received by one microphone (microphone 1), and y2(w) is the wave function of the sound signal received by the other microphone (microphone 2).
In this embodiment, the function calculation module 80 calculates the wave function y(w) of the noise-reduced speech signal and applies an inverse Fourier transform to y(w) to obtain the noise-reduced speech signal.
Specifically, the function calculation module 80 may calculate the wave function of the noise-reduced speech signal with the following formula:
y(w) = r(w)*(y1(w) - n1(w));
where y(w) is the wave function of the noise-reduced speech signal, y1(w) is the wave function of the sound signal received by one microphone (microphone 1), and n1(w) is the noise function of the sound signal received by that microphone (microphone 1). Optionally, y1(w) and n1(w) may be replaced by y2(w) and n2(w), respectively.
Thus, even in far-field speech noise reduction or with loud background noise, a good noise reduction effect can be achieved, improving the user experience.
In the sound source direction estimation apparatus of the embodiments of the present invention, when a sound signal is detected, image recognition is used to obtain the position coordinates of the sound source in an image and the direction of the sound source is estimated accordingly; this avoids the influence of environmental noise on the direction estimate, improves its accuracy, and lays the foundation for better subsequent speech noise reduction or sound source localization.
The present invention also proposes a terminal device comprising a memory, a processor, and at least one application program stored in the memory and configured to be executed by the processor, the application program being configured to perform the sound source direction estimation method. The sound source direction estimation method comprises the following steps: when a sound signal is detected, capturing an image with a camera; obtaining the position coordinates of the sound source in the image; calculating, from the position coordinates, a first angle between the line connecting the camera and the sound source and the projection plane of the camera; and calculating the direction of the sound source from the first angle and a preset second angle, wherein the second angle is the angle between the line connecting the two microphones and the horizontal axis of the camera. The method described in this embodiment is the sound source direction estimation method of the above embodiments of the present invention and is not repeated here.
Those skilled in the art will understand that the present invention covers apparatus for performing one or more of the operations described in this application. Such apparatus may be specially designed and manufactured for the required purposes, or may include known devices in a general-purpose computer, selectively activated or reconfigured by computer programs stored in them. Such computer programs may be stored in a device-readable (e.g. computer-readable) medium or in any type of medium suitable for storing electronic instructions and coupled to a bus, the computer-readable medium including but not limited to any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards or optical cards. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g. a computer).
Those skilled in the art will understand that each block of these structure diagrams and/or block diagrams and/or flow diagrams, and combinations of blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer or another programmable data processing apparatus, so that the schemes specified in one or more blocks of the structure diagrams and/or block diagrams and/or flow diagrams disclosed by the present invention are executed by the processor of the computer or other programmable data processing apparatus.
Those skilled in the art will understand that the steps, measures and schemes in the various operations, methods and flows discussed in the present invention may be alternated, changed, combined or deleted. Further, other steps, measures and schemes in the various operations, methods and flows discussed in the present invention may also be alternated, changed, rearranged, decomposed, combined or deleted. Further, steps, measures and schemes in the prior art corresponding to the various operations, methods and flows disclosed in the present invention may also be alternated, changed, rearranged, decomposed, combined or deleted.
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, without thereby limiting the scope of the present invention. Those skilled in the art can implement the present invention in various variants without departing from its scope and essence; for example, a feature of one embodiment may be used in another embodiment to obtain yet another embodiment. Any modification, equivalent substitution and improvement made within the technical concept of the present invention shall fall within the scope of the present invention.

Claims (20)

  1. A sound source direction estimation method, characterized in that it comprises the following steps:
    when a sound signal is detected, capturing an image with a camera;
    obtaining the position coordinates of a sound source in the image;
    calculating, from the position coordinates, a first angle between the line connecting the camera and the sound source and the projection plane of the camera;
    calculating the direction of the sound source from the first angle and a preset second angle; wherein the second angle is the angle between the line connecting two microphones and the horizontal axis of the camera.
  2. The sound source direction estimation method according to claim 1, characterized in that the step of obtaining the position coordinates of the sound source in the image comprises:
    recognizing a human face in the image;
    obtaining the position coordinates of the lips of the face in the image, and taking the lip position coordinates as the position coordinates of the sound source in the image.
  3. The sound source direction estimation method according to claim 2, characterized in that the step of obtaining the position coordinates of the lips of the face in the image comprises:
    when there are at least two faces in the image, detecting whether the lips of the faces are moving;
    obtaining the position coordinates of the lips of the face whose lips are moving.
  4. The sound source direction estimation method according to claim 1, characterized in that the step of calculating, from the position coordinates, the first angle between the line connecting the camera and the sound source and the projection plane of the camera comprises:
    calculating the first angle with the following formula:
    A1 = atan((x*x + y*y)^0.5/(c*f));
    where A1 is the first angle, (x, y) are the position coordinates, c is the distance between the image and the projection plane, and f is the focal length of the camera.
  5. The sound source direction estimation method according to claim 1, characterized in that the step of calculating the direction of the sound source from the first angle and the preset second angle comprises:
    calculating the direction of the sound source with the following formula:
    A = arccos(cos(A1)*cos(A2));
    where A1 is the first angle, A2 is the second angle, and A is the angle between the line connecting the sound source and a microphone and the line connecting the two microphones, representing the direction of the sound source.
  6. The sound source direction estimation method according to claim 1, characterized in that, after the step of calculating the direction of the sound source from the first angle and the preset second angle, the method further comprises:
    calculating, from the direction of the sound source, the time delay with which the two microphones receive the sound signal of the sound source.
  7. The sound source direction estimation method according to claim 6, characterized in that the step of calculating, from the direction of the sound source, the time delay with which the two microphones receive the sound signal of the sound source comprises:
    calculating the time delay with the following formula:
    t = d*cos(A)/340;
    where t is the time delay, d is the distance between the two microphones, and A is the angle between the line connecting the sound source and a microphone and the line connecting the two microphones.
  8. The sound source direction estimation method according to claim 6, characterized in that, after the step of calculating the time delay, the method further comprises:
    aligning the wave functions of the two sound signals received by the two microphones according to the time delay;
    obtaining a coherence function from the wave functions of the two sound signals, and obtaining a noise function of the sound signal;
    calculating the wave function of the noise-reduced speech signal from the wave function of the sound signal, the coherence function and the noise function.
  9. The sound source direction estimation method according to claim 8, characterized in that the step of obtaining the coherence function from the wave functions of the two sound signals comprises:
    obtaining the coherence function with the following formula:
    r(w) = 2*y1(w)*y2(w)/(y1(w)*y1(w) + y2(w)*y2(w));
    where r(w) is the coherence function, y1(w) is the wave function of the sound signal received by one microphone, and y2(w) is the wave function of the sound signal received by the other microphone.
  10. The sound source direction estimation method according to claim 8, characterized in that the step of calculating the wave function of the noise-reduced speech signal from the wave function of the sound signal, the coherence function and the noise function comprises:
    calculating the wave function of the noise-reduced speech signal with the following formula:
    y(w) = r(w)*(y1(w) - n1(w));
    where y(w) is the wave function of the noise-reduced speech signal, y1(w) is the wave function of the sound signal received by one microphone, and n1(w) is the noise function of the sound signal received by that microphone.
  11. A sound source direction estimation apparatus, characterized in that it comprises:
    an image acquisition module for capturing an image with a camera when a sound signal is detected;
    a position obtaining module for obtaining the position coordinates of a sound source in the image;
    a first calculation module for calculating, from the position coordinates, a first angle between the line connecting the camera and the sound source and the projection plane of the camera;
    a second calculation module for calculating the direction of the sound source from the first angle and a preset second angle; wherein the second angle is the angle between the line connecting two microphones and the horizontal axis of the camera.
  12. The sound source direction estimation apparatus according to claim 11, characterized in that the position obtaining module comprises:
    a recognition unit for recognizing a human face in the image;
    an obtaining unit for obtaining the position coordinates of the lips of the face in the image and taking the lip position coordinates as the position coordinates of the sound source in the image.
  13. The sound source direction estimation apparatus according to claim 12, characterized in that the obtaining unit comprises:
    a detection subunit for detecting, when there are at least two faces in the image, whether the lips of the faces are moving;
    an obtaining subunit for obtaining the position coordinates of the lips of the face whose lips are moving.
  14. The sound source direction estimation apparatus according to claim 11, characterized in that the first calculation module is configured to:
    calculate the first angle with the following formula:
    A1 = atan((x*x + y*y)^0.5/(c*f));
    where A1 is the first angle, (x, y) are the position coordinates, c is the distance between the image and the projection plane, and f is the focal length of the camera.
  15. The sound source direction estimation apparatus according to claim 11, characterized in that the second calculation module is configured to:
    calculate the direction of the sound source with the following formula:
    A = arccos(cos(A1)*cos(A2));
    where A1 is the first angle, A2 is the second angle, and A is the angle between the line connecting the sound source and a microphone and the line connecting the two microphones, representing the direction of the sound source.
  16. The sound source direction estimation apparatus according to claim 11, characterized in that the apparatus further comprises a third calculation module configured to:
    calculate, from the direction of the sound source, the time delay with which the two microphones receive the sound signal of the sound source.
  17. The sound source direction estimation apparatus according to claim 16, characterized in that the third calculation module is configured to:
    calculate the time delay with the following formula:
    t = d*cos(A)/340;
    where t is the time delay, d is the distance between the two microphones, and A is the angle between the line connecting the sound source and a microphone and the line connecting the two microphones.
  18. The sound source direction estimation apparatus according to claim 11, characterized in that the apparatus further comprises:
    an alignment module for aligning the wave functions of the two sound signals received by the two microphones according to the time delay;
    a function obtaining module for obtaining a coherence function from the wave functions of the two sound signals and obtaining a noise function of the sound signal;
    a function calculation module for calculating the wave function of the noise-reduced speech signal from the wave function of the sound signal, the coherence function and the noise function.
  19. The sound source direction estimation apparatus according to claim 18, characterized in that the function obtaining module is configured to:
    obtain the coherence function with the following formula:
    r(w) = 2*y1(w)*y2(w)/(y1(w)*y1(w) + y2(w)*y2(w));
    where r(w) is the coherence function, y1(w) is the wave function of the sound signal received by one microphone, and y2(w) is the wave function of the sound signal received by the other microphone.
  20. The sound source direction estimation apparatus according to claim 18, characterized in that the function calculation module is configured to:
    calculate the wave function of the noise-reduced speech signal with the following formula:
    y(w) = r(w)*(y1(w) - n1(w));
    where y(w) is the wave function of the noise-reduced speech signal, y1(w) is the wave function of the sound signal received by one microphone, and n1(w) is the noise function of the sound signal received by that microphone.
PCT/CN2018/094132 2018-04-16 2018-07-02 Sound source direction estimation method and apparatus WO2019200722A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810339205.0A CN108957392A (zh) 2018-04-16 2018-04-16 Sound source direction estimation method and apparatus
CN201810339205.0 2018-04-16

Publications (1)

Publication Number Publication Date
WO2019200722A1 true WO2019200722A1 (zh) 2019-10-24

Family

ID=64498687

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094132 WO2019200722A1 (zh) 2018-04-16 2018-07-02 声源方向估计方法和装置

Country Status (2)

Country Link
CN (1) CN108957392A (zh)
WO (1) WO2019200722A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109506568B (zh) * 2018-12-29 2021-06-18 思必驰科技股份有限公司 一种基于图像识别和语音识别的声源定位方法及装置
CN110493690B (zh) * 2019-08-29 2021-08-13 北京搜狗科技发展有限公司 一种声音采集方法及装置
CN113450769B (zh) * 2020-03-09 2024-06-25 杭州海康威视数字技术股份有限公司 语音提取方法、装置、设备和存储介质
CN112492430B (zh) * 2020-12-17 2023-12-15 维沃移动通信有限公司 电子设备和电子设备的录音方法
CN113301294B (zh) * 2021-05-14 2023-04-25 深圳康佳电子科技有限公司 一种通话控制方法、装置及智能终端

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008126329A (ja) * 2006-11-17 2008-06-05 Toyota Motor Corp 音声認識ロボットおよび音声認識ロボットの制御方法
CN105159111A (zh) * 2015-08-24 2015-12-16 百度在线网络技术(北京)有限公司 基于人工智能的智能交互设备控制方法及系统
CN105679328A (zh) * 2016-01-28 2016-06-15 苏州科达科技股份有限公司 一种语音信号处理方法、装置及系统
CN105812969A (zh) * 2014-12-31 2016-07-27 展讯通信(上海)有限公司 一种拾取声音信号的方法、系统及装置
CN105976826A (zh) * 2016-04-28 2016-09-28 中国科学技术大学 应用于双麦克风小型手持设备的语音降噪方法
CN107680593A (zh) * 2017-10-13 2018-02-09 歌尔股份有限公司 一种智能设备的语音增强方法及装置

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6593956B1 (en) * 1998-05-15 2003-07-15 Polycom, Inc. Locating an audio source
JP4689107B2 (ja) * 2001-08-22 2011-05-25 本田技研工業株式会社 自律行動ロボット
JP2003255993A (ja) * 2002-03-04 2003-09-10 Ntt Docomo Inc 音声認識システム、音声認識方法、音声認識プログラム、音声合成システム、音声合成方法、音声合成プログラム
CN1212608C (zh) * 2003-09-12 2005-07-27 中国科学院声学研究所 一种采用后置滤波器的多通道语音增强方法
US8582783B2 (en) * 2008-04-07 2013-11-12 Dolby Laboratories Licensing Corporation Surround sound generation from a microphone array
CN102854494B (zh) * 2012-08-08 2015-09-09 Tcl集团股份有限公司 一种声源定位方法及装置
CN103841357A (zh) * 2012-11-21 2014-06-04 中兴通讯股份有限公司 基于视频跟踪的麦克风阵列声源定位方法、装置及系统
WO2016183791A1 (zh) * 2015-05-19 2016-11-24 华为技术有限公司 一种语音信号处理方法及装置
CN106292732A (zh) * 2015-06-10 2017-01-04 上海元趣信息技术有限公司 基于声源定位和人脸检测的智能机器人转动方法
CN105184214B (zh) * 2015-07-20 2019-02-01 北京进化者机器人科技有限公司 一种基于声源定位和人脸检测的人体定位方法和系统
CN106338711A (zh) * 2016-08-30 2017-01-18 康佳集团股份有限公司 一种基于智能设备的语音定向方法及系统
US9674453B1 (en) * 2016-10-26 2017-06-06 Cisco Technology, Inc. Using local talker position to pan sound relative to video frames at a remote location
CN107369456A (zh) * 2017-07-05 2017-11-21 南京邮电大学 数字助听器中基于广义旁瓣抵消器的噪声消除方法
CN107677992B (zh) * 2017-09-30 2021-06-22 深圳市沃特沃德股份有限公司 移动侦测方法、装置和监控设备


Also Published As

Publication number Publication date
CN108957392A (zh) 2018-12-07

Similar Documents

Publication Publication Date Title
WO2019200722A1 (zh) 2019-10-24 Sound source direction estimation method and apparatus
CN109506568B (zh) 2021-06-18 Sound source localization method and device based on image recognition and speech recognition
KR101659712B1 (ko) 2016-09-30 Estimating a sound source location using particle filtering
US20180376273A1 (en) System and method for determining audio context in augmented-reality applications
JP4872871B2 (ja) Sound source direction detection device, sound source direction detection method, and sound source direction detection camera
US6525993B2 (en) Speaker direction detection circuit and speaker direction detection method used in this circuit
US9282399B2 (en) Listen to people you recognize
US9900685B2 (en) Creating an audio envelope based on angular information
JP6703525B2 (ja) Method and device for enhancing a sound source
WO2021037129A1 (zh) Sound collection method and apparatus
CN107677992B (zh) Motion detection method and apparatus, and monitoring device
WO2015184893A1 (zh) Method and apparatus for noise reduction of mobile terminal call speech
CN105611167B (zh) Focusing plane adjustment method and electronic device
US20130332156A1 (en) Sensor Fusion to Improve Speech/Audio Processing in a Mobile Device
WO2014161309A1 (zh) Method and apparatus for sound source localization on a mobile terminal
EP3542549A1 (en) Distributed audio capture and mixing controlling
EP3576430A1 (en) Audio signal processing method and device, and storage medium
TWI678696B (zh) Method, system and apparatus for receiving voice information
WO2017152601A1 (zh) Microphone determination method and terminal
WO2011081527A1 (en) Method and system for determining the direction between a detection point and an acoustic source
KR101508092B1 (ko) Method and system for supporting video conferencing
CN101685153A (zh) Microphone spacing measurement method and apparatus
WO2016078415A1 (zh) Terminal sound pickup control method, terminal, and terminal sound pickup control system
WO2023056905A1 (zh) Sound source localization method, apparatus and device
CN111933182B (zh) Sound source tracking method, apparatus, device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18915437

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18915437

Country of ref document: EP

Kind code of ref document: A1