CN113053408B - Sound source separation method and device - Google Patents


Info

Publication number: CN113053408B (application CN202110268230.6A)
Authority: CN (China)
Prior art keywords: signal, DOA, error, end signal, far
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Original language: Chinese (zh)
Other versions: CN113053408A
Inventors: 丁少为, 关海欣, 梁家恩
Current assignees: Unisound Intelligent Technology Co Ltd; Shenzhen Yunzhisheng Information Technology Co Ltd (also the original assignees)
Application filed by Unisound Intelligent Technology Co Ltd and Shenzhen Yunzhisheng Information Technology Co Ltd
Priority to application CN202110268230.6A; published as CN113053408A, granted as CN113053408B

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272: Voice signal separating
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161: Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166: Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)

Abstract

The invention relates to a sound source separation method comprising the following steps: according to a preset array element spacing of a microphone array, arranging a first differential beamformer at a first end of the array and a second differential beamformer at the second end; transforming the received mixed signal to the short-time frequency domain to obtain a first signal; calculating a direction-of-arrival (DOA) estimate for each frame of the first signal; calculating a first DOA error and a second DOA error; feeding the first signal to the first and second differential beamformers to obtain a first far-end signal, a first near-end signal, a second far-end signal, and a second near-end signal; according to the first DOA error, performing first adaptive cancellation on the first near-end signal and the second far-end signal to obtain a first output signal; according to the second DOA error, performing second adaptive cancellation on the first far-end signal and the second near-end signal to obtain a second output signal; and applying a short-time inverse Fourier transform to each output signal to obtain a first separated signal and a second separated signal.

Description

Sound source separation method and device
Technical Field
The present invention relates to the field of speech processing technologies, and in particular, to a sound source separation method and apparatus.
Background
In the prior art, sound sources can be separated by designing two fixed beamformers for a uniform linear microphone array, with their main lobes pointing at the two endfire directions of the array; the two sets of fixed weights are applied to the array's received signals, and the two resulting output channels are the separated signals. Alternatively, blind source separation can be performed by independent component analysis or similar methods.
However, with such fixed beamformers, when the number of array elements is small and the array aperture is small, the low-frequency main lobe is wide and suppression of the signal from the other end is weak, so a large amount of the other-end signal remains in each separated signal.
Blind source separation methods, meanwhile, incur high computational complexity because they must solve for a separation matrix.
Disclosure of Invention
To address these defects of the prior art, the invention aims to provide a sound source separation method that reduces the amount of other-end signal remaining in each separated signal while avoiding the high computational complexity of blind source separation methods.
In a first aspect, the present invention provides a sound source separation method, including:
according to an array element distance preset in a microphone array, arranging a first differential beam former at a first end of the microphone array and arranging a second differential beam former at a second end of the microphone array; wherein the main lobe direction of the first differential beamformer is towards the first end, the null direction of the first differential beamformer is towards the second end, the main lobe direction of the second differential beamformer is towards the second end, and the null direction of the second differential beamformer is towards the first end;
carrying out short-time Fourier transform on a mixed signal received by a microphone array, and transforming the mixed signal to a short-time frequency domain to obtain a first signal; wherein the mixed signal is a mixed signal generated by a first sound source at the first end and a second sound source at the second end;
calculating a direction of arrival (DOA) estimate for each frame in the first signal;
calculating a first DOA error corresponding to a first sound source and a second DOA error corresponding to a second sound source according to the DOA estimation;
inputting the first signal into a first differential beam former and a second differential beam former respectively to obtain a first far-end signal and a first near-end signal output by the first differential beam former and a second far-end signal and a second near-end signal output by the second differential beam former;
according to the first DOA error, performing first adaptive cancellation processing on the first near-end signal and the second far-end signal to obtain a first output signal; and according to the second DOA error, performing second adaptive cancellation processing on the first far-end signal and the second near-end signal to obtain a second output signal;
and respectively carrying out short-time Fourier inverse transformation on the first output signal and the second output signal to obtain a first separation signal and a second separation signal.
Preferably, the array element spacing is in the range of 2.0 cm to 3.5 cm.
Preferably, the calculating, according to the DOA estimation, a first DOA error corresponding to a first sound source and a second DOA error corresponding to a second sound source specifically includes:
calculating the first DOA error according to the formula err_A = |0 - θ|;
calculating the second DOA error according to the formula err_B = |180 - θ|;
where err_A is the first DOA error, err_B is the second DOA error, and θ is the DOA estimate in degrees.
Preferably, the performing, according to the first DOA error, a first adaptive cancellation process on the first near-end signal and the second far-end signal to obtain a first output signal specifically includes:
comparing the first DOA error with a preset error threshold, and when the first DOA error is not greater than the preset error threshold, taking the first near-end signal as the first output signal of the current frame without updating the first adaptive processing filter coefficients;
and when the first DOA error is greater than the preset error threshold, treating the current frame as the second far-end signal, not retaining it in the output, and updating the first adaptive processing filter coefficients.
Preferably, the performing, according to the second DOA error, second adaptive cancellation processing on the first far-end signal and the second near-end signal to obtain a second output signal specifically includes:
comparing the second DOA error with a preset error threshold, and when the second DOA error is not greater than the preset error threshold, taking the second near-end signal as the second output signal of the current frame without updating the second adaptive processing filter coefficients;
and when the second DOA error is greater than the preset error threshold, treating the current frame as the first far-end signal, not retaining it in the output, and updating the second adaptive processing filter coefficients.
In a second aspect, the present invention provides a sound source separating apparatus comprising:
a setting unit, configured to arrange a first differential beamformer at a first end of the microphone array and a second differential beamformer at a second end of the microphone array according to an array element spacing preset in the microphone array; wherein the main lobe direction of the first differential beamformer is towards the first end, the null direction of the first differential beamformer is towards the second end, the main lobe direction of the second differential beamformer is towards the second end, and the null direction of the second differential beamformer is towards the first end;
a transformation unit, configured to perform a short-time Fourier transform on the mixed signal received by the microphone array, transforming it to the short-time frequency domain to obtain a first signal; wherein the mixed signal is generated by a first sound source at the first end and a second sound source at the second end;
a calculation unit for calculating a direction of arrival, DOA, estimate for each frame in the first signal;
the calculating unit is further used for calculating a first DOA error corresponding to the first sound source and a second DOA error corresponding to the second sound source according to the DOA estimation;
a processing unit, configured to input the first signal into a first differential beam former and a second differential beam former respectively, so as to obtain a first far-end signal and a first near-end signal output by the first differential beam former, and a second far-end signal and a second near-end signal output by the second differential beam former;
the processing unit is further configured to perform first adaptive cancellation processing on the first near-end signal and the second far-end signal according to the first DOA error to obtain a first output signal, and to perform second adaptive cancellation processing on the first far-end signal and the second near-end signal according to the second DOA error to obtain a second output signal;
the transformation unit is further configured to perform short-time inverse fourier transformation on the first output signal and the second output signal, respectively, to obtain a first separated signal and a second separated signal.
Preferably, the array element spacing is in the range of 2.0 cm to 3.5 cm.
Preferably, the computing unit is specifically configured to:
calculating the first DOA error according to the formula err_A = |0 - θ|;
calculating the second DOA error according to the formula err_B = |180 - θ|;
where err_A is the first DOA error, err_B is the second DOA error, and θ is the DOA estimate in degrees.
Preferably, the processing unit is specifically configured to:
comparing the first DOA error with a preset error threshold, and when the first DOA error is not greater than the preset error threshold, taking the first near-end signal as the first output signal of the current frame without updating the first adaptive processing filter coefficients;
and when the first DOA error is greater than the preset error threshold, treating the current frame as the second far-end signal and updating the first adaptive processing filter coefficients.
Preferably, the processing unit is specifically configured to:
comparing the second DOA error with a preset error threshold, and when the second DOA error is not greater than the preset error threshold, taking the second near-end signal as the second output signal of the current frame without updating the second adaptive processing filter coefficients;
and when the second DOA error is greater than the preset error threshold, treating the current frame as the first far-end signal and updating the second adaptive processing filter coefficients.
In a third aspect, the invention provides an apparatus comprising a memory for storing a program and a processor for performing the method of any of the first aspects.
In a fourth aspect, the present invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of the first aspect.
In a fifth aspect, the invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any of the first aspects.
According to the sound source separation method provided by the embodiment of the invention, adaptive cancellation processing is added after the differential beamformer outputs, so that less interference from the other end remains in each output signal, and updates of the adaptive cancellation filter parameters are gated by the DOA errors, which reduces speech damage after separation. In addition, because the method uses fixed beamforming plus adaptive cancellation and does not involve solving a separation matrix, its computational complexity is lower than that of blind source separation methods such as independent component analysis.
Drawings
Fig. 1 is a schematic flow chart of a sound source separation method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a microphone array according to an embodiment of the invention;
fig. 3 is a schematic structural diagram of a sound source separation apparatus according to a second embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Fig. 1 is a schematic flow chart of a sound source separation method according to an embodiment of the present invention, where an execution subject of the method is a device with a computing function, such as a terminal and a server. The technical solution of the present invention is described in detail below with reference to fig. 1.
Step 110, according to an array element spacing preset in a microphone array, arranging a first differential beamformer at a first end of the microphone array and a second differential beamformer at a second end of the microphone array; the main lobe direction of the first differential beamformer faces the first end and its null direction faces the second end, while the main lobe direction of the second differential beamformer faces the second end and its null direction faces the first end.
specifically, in the present application, the array element spacing may be set to 2.0cm to 3.5cm based on a small-spacing microphone array, and the speakers a and B are respectively located at two ends of the microphone array, such as the 2mic array shown in fig. 2. Two first-order differential beamformers can be designed according to the array element spacing, namely a first differential beamformer and a second differential beamformer, wherein the main lobe direction of the first differential beamformer is 0 degrees, namely the talker a direction, and the null direction is 180 degrees, namely the talker B direction, the first differential beamformer is opposite to the first differential beamformer, namely the main lobe direction of the second differential beamformer is 180 degrees, and the null direction is 0 degree. Speaker a corresponds to a first sound source and speaker B corresponds to a second sound source.
Step 120, performing short-time fourier transform on the mixed signal received by the microphone array, and transforming the mixed signal to a short-time frequency domain to obtain a first signal; the mixed signal is generated by a first sound source at the first end and a second sound source at the second end.
Specifically, the microphone array receives the mixed sound signal of the first sound source and the second sound source. Since a speech signal is short-time stationary and is generally analyzed in the short-time frequency domain, a short-time Fourier transform is applied to the mixed signal to obtain the first signal.
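A minimal STFT of one microphone channel might look like the following; the Hann window, 512-sample frame, and 50% hop are common choices assumed here, not values fixed by the text:

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed short-time Fourier transform (sketch)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win
                       for i in range(n_frames)])
    # one row per frame, one column per frequency bin
    return np.fft.rfft(frames, axis=1)
```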
Step 130, calculating a direction of arrival (DOA) estimate of each frame in the first signal.
Specifically, DOA estimation can be performed with any commonly used method, such as a beamforming algorithm, a subspace algorithm, or a deconvolution algorithm, yielding a DOA estimate for each frame of the signal in real time.
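As one concrete choice of estimator (the text leaves the method open), a GCC-PHAT time-difference-of-arrival estimate for a 2-mic endfire pair can be sketched as follows; the function name and parameters are assumptions of this sketch:

```python
import numpy as np

C = 343.0  # speed of sound (m/s)

def gcc_phat_doa(x1, x2, fs, d):
    """DOA estimate in degrees for a 2-mic pair via GCC-PHAT;
    0 degrees is the endfire direction nearest microphone 1."""
    n = 2 * len(x1)
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    R = np.conj(X1) * X2
    R /= np.abs(R) + 1e-12               # PHAT weighting whitens the spectrum
    cc = np.fft.irfft(R, n)
    max_shift = int(round(fs * d / C))   # keep physically possible delays only
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = np.argmax(np.abs(cc)) - max_shift
    cos_theta = np.clip((lag / fs) * C / d, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))
```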
Step 140, calculating a first DOA error corresponding to the first sound source and a second DOA error corresponding to the second sound source according to the DOA estimation;
specifically, the DOA estimate may be denoted as θ, and a first DOA error err corresponding to a first sound source, i.e., the talker a direction, may be calculated respectivelyASecond DOA error err corresponding to the direction of the second sound source, i.e. speaker BBWherein, errA=|0-θ|,errB=|180-θ|。
Step 150, inputting the first signal into a first differential beam former and a second differential beam former respectively to obtain a first far-end signal and a first near-end signal output by the first differential beam former, and a second far-end signal and a second near-end signal output by the second differential beam former;
specifically, the first main lobe direction of the first differential beamformer is a, and the null direction is B, i.e., the a-direction signal is retained and the B-direction signal is suppressed in the output signal. The second differential beamformer, in contrast, retains the B-direction signals and suppresses the a-direction signals. The first near-end signal output by the first differential beamformer may be denoted as SA1 and the first far-end signal as SB1, and the second near-end signal SA2 and the second far-end signal output by the second differential beamformer may be denoted as SB 2.
Step 160, according to the first DOA error, performing a first adaptive cancellation process on the first near-end signal and the second far-end signal to obtain a first output signal; according to the second DOA error, second self-adaptive cancellation processing is carried out on the first far-end signal and the second near-end signal to obtain a second output signal;
for the first adaptive cancellation processing, the first DOA error may be compared with a preset error threshold, when the first DOA error is not greater than the preset error threshold, the first output signal of the current frame is a first near-end signal, and at this time, the coefficient of the first adaptive processing filter is not updated;
when the first DOA error is larger than the preset error threshold value, the current frame is the second far-end signal, and the first self-adaptive processing filter coefficient is updated, so that the first near-end signal of the current frame is continuously processed through the updated first self-adaptive processing filter coefficient.
Specifically, the first adaptive processing filter is updated from the input signals during the first adaptive cancellation process. The error threshold is an experimental value and can be set based on repeated experiments; for example, it may be set to θ_th = 30. Updating the filter during target-direction speech would damage the target-direction speech signal, so whether the first adaptive processing filter is updated is controlled by the real-time first DOA error: the filter is updated only on non-target frames, while at the target-signal stage the current filter coefficients are merely applied to the data without being changed. If err_A ≤ θ_th, the frame is an A-direction signal and must be retained, i.e., SA1 is kept, and the output is denoted T_A. If err_A > θ_th, the frame is interference noise or a B-direction signal and must be canceled; the filter coefficients are updated at this time, and the first output signal is thus determined frame by frame.
Correspondingly, for the second adaptive cancellation processing, the second DOA error is compared with the preset error threshold. When the second DOA error is not greater than the preset error threshold, the second output signal of the current frame is the second near-end signal, and the second adaptive processing filter coefficients are not updated;
when the second DOA error is greater than the preset error threshold, the current frame is treated as the first far-end signal and the second adaptive processing filter coefficients are updated.
Specifically, the second adaptive cancellation processing is performed on the first far-end signal SB1 and the second near-end signal SA2, with err_B likewise controlling whether the second adaptive processing filter coefficients are updated: if err_B > θ_th, the frame is interference noise or an A-direction signal and must be canceled, so the coefficients are updated; if err_B ≤ θ_th, the frame is a B-direction signal and must be retained, i.e., SA2 is kept, and the output is denoted T_B.
The first adaptive cancellation processing may use any one of the least mean square (LMS) algorithm, the normalized LMS (NLMS) algorithm, or the recursive least squares (RLS) algorithm. The second adaptive cancellation processing uses the same algorithm as the first.
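A DOA-gated NLMS canceller can be sketched as below. Note this sketch runs sample-by-sample in the time domain for brevity, whereas the method above works per frame in the STFT domain; the function name, filter order, and step size are assumptions:

```python
import numpy as np

THETA_TH = 30.0  # example error threshold in degrees (an experimental value, per the text)

def gated_nlms(near, ref, err_doa, order=8, mu=0.5, eps=1e-8, theta_th=THETA_TH):
    """DOA-gated NLMS canceller (sketch). `near` carries the target plus
    leaked interference, `ref` is the interference reference from the
    opposite beamformer, and err_doa[n] is the DOA error attached to
    sample n. The filter adapts only when err_doa exceeds theta_th,
    i.e. on interference-dominated stretches, so target speech is spared."""
    w = np.zeros(order)
    out = np.empty_like(near)
    for n in range(len(near)):
        # most recent `order` reference samples, newest first
        x = ref[max(0, n - order + 1): n + 1][::-1]
        x = np.pad(x, (0, order - len(x)))
        e = near[n] - w @ x           # subtract the predicted interference
        out[n] = e
        if err_doa[n] > theta_th:     # update only on non-target stretches
            w += mu * e * x / (x @ x + eps)
    return out
```

When the DOA error never exceeds the threshold, the coefficients stay frozen and the near-end signal passes through the current filter unchanged, matching the gating rule above.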
And 170, performing short-time inverse Fourier transform on the first output signal and the second output signal respectively to obtain a first separation signal and a second separation signal.
Specifically, a short-time inverse Fourier transform is applied to each of the two output signals T_A and T_B, yielding the final first separated signal A and second separated signal B.
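The reconstruction step can be sketched as a weighted overlap-add inverse STFT; the Hann window and 50% overlap are assumed to match the forward transform sketch, not values fixed by the text:

```python
import numpy as np

def istft(X, n_fft=512, hop=256):
    """Inverse short-time Fourier transform by weighted overlap-add with
    a Hann synthesis window (pairs with a Hann-analysis STFT at 50% overlap)."""
    win = np.hanning(n_fft)
    n_out = n_fft + (X.shape[0] - 1) * hop
    x = np.zeros(n_out)
    norm = np.zeros(n_out)
    frames = np.fft.irfft(X, n_fft, axis=1)
    for i, frame in enumerate(frames):
        x[i * hop: i * hop + n_fft] += frame * win
        norm[i * hop: i * hop + n_fft] += win ** 2
    return x / np.maximum(norm, 1e-12)   # undo the analysis/synthesis windowing
```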
Furthermore, the array can be extended to more elements; it is only necessary to design the two corresponding differential beamformers for the linear microphone array.
According to the sound source separation method provided by the embodiment of the invention, adaptive cancellation processing is added after the differential beamformer outputs, so that less interference from the other end remains in each output signal, and updates of the adaptive cancellation filter parameters are gated by the DOA errors, which reduces speech damage after separation. In addition, because the method uses fixed beamforming plus adaptive cancellation and does not involve solving a separation matrix, its computational complexity is lower than that of blind source separation methods such as independent component analysis.
Fig. 3 is a schematic structural diagram of a sound source separation apparatus according to a second embodiment of the present invention, as shown in fig. 3, the sound source separation apparatus includes: a setting unit 310, a transformation unit 320, a calculation unit 330 and a processing unit 340.
The setting unit 310 is configured to set a first differential beamformer at a first end of the microphone array and a second differential beamformer at a second end of the microphone array according to an array element spacing preset in the microphone array; the main lobe direction of the first differential beamformer faces the first end and its null direction faces the second end, while the main lobe direction of the second differential beamformer faces the second end and its null direction faces the first end;
the transformation unit 320 is configured to perform short-time fourier transformation on the mixed signal received by the microphone array, and transform the mixed signal to a short-time frequency domain to obtain a first signal; the mixed signal is generated by a first sound source at a first end and a second sound source at a second end;
the calculating unit 330 is configured to calculate a direction of arrival DOA estimate for each frame in the first signal;
the calculating unit 330 is further configured to calculate, according to the DOA estimation, a first DOA error corresponding to the first sound source and a second DOA error corresponding to the second sound source;
the processing unit 340 is configured to input the first signal into the first differential beam former and the second differential beam former respectively to obtain a first far-end signal and a first near-end signal output by the first differential beam former, and a second far-end signal and a second near-end signal output by the second differential beam former;
the processing unit 340 is further configured to perform first adaptive cancellation processing on the first near-end signal and the second far-end signal according to the first DOA error to obtain a first output signal, and to perform second adaptive cancellation processing on the first far-end signal and the second near-end signal according to the second DOA error to obtain a second output signal;
the transforming unit 320 is further configured to perform short-time inverse fourier transform on the first output signal and the second output signal, respectively, to obtain a first separated signal and a second separated signal.
Wherein the array element spacing is in the range of 2.0 cm to 3.5 cm.
Wherein, the calculating unit 330 is specifically configured to:
calculating the first DOA error according to the formula err_A = |0 - θ|;
calculating the second DOA error according to the formula err_B = |180 - θ|;
where err_A is the first DOA error, err_B is the second DOA error, and θ is the DOA estimate in degrees.
Wherein the processing unit 340 is specifically configured to:
comparing the first DOA error with a preset error threshold, and when the first DOA error is not greater than the preset error threshold, taking the first near-end signal as the first output signal of the current frame without updating the first adaptive processing filter coefficients;
and when the first DOA error is greater than the preset error threshold, treating the current frame as the second far-end signal and updating the first adaptive processing filter coefficients.
Wherein, the processing unit 340 is specifically configured to:
comparing the second DOA error with a preset error threshold, and when the second DOA error is not greater than the preset error threshold, taking the second near-end signal as the second output signal of the current frame without updating the second adaptive processing filter coefficients;
and when the second DOA error is greater than the preset error threshold, treating the current frame as the first far-end signal and updating the second adaptive processing filter coefficients.
According to the sound source separation device provided by the embodiment of the invention, adaptive cancellation processing is added after the differential beamformer outputs, so that less interference from the other end remains in each output signal, and updates of the adaptive cancellation filter parameters are gated by the DOA errors, which reduces speech damage after separation. In addition, because the device uses fixed beamforming plus adaptive cancellation and does not involve solving a separation matrix, its computational complexity is lower than that of blind source separation methods such as independent component analysis.
A third embodiment of the invention provides a device comprising a memory and a processor, where the memory may be connected to the processor through a bus. The memory may be a non-volatile memory, such as a hard disk drive or flash memory, storing a software program and device drivers. The software program can perform the various functions of the methods provided by the embodiments of the invention; the device drivers may be network and interface drivers. The processor is configured to execute the software program, which, when executed, implements the method provided by the first embodiment of the invention.
A fourth embodiment of the present invention provides a computer program product including instructions which, when run on a computer, cause the computer to execute the method provided in the first embodiment of the present invention.
A fifth embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method provided in the first embodiment of the present invention.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A sound source separation method, characterized by comprising:
according to the array element distance preset in a microphone array, arranging a first differential beam former at a first end of the microphone array and arranging a second differential beam former at a second end of the microphone array; wherein the main lobe direction of the first differential beamformer is towards the first end, the null direction of the first differential beamformer is towards the second end, the main lobe direction of the second differential beamformer is towards the second end, and the null direction of the second differential beamformer is towards the first end;
carrying out short-time Fourier transform on a mixed signal received by a microphone array, and transforming the mixed signal to a short-time frequency domain to obtain a first signal; wherein the mixed signal is a mixed signal generated by a first sound source at the first end and a second sound source at the second end;
calculating a direction of arrival (DOA) estimate for each frame in the first signal;
calculating a first DOA error corresponding to a first sound source and a second DOA error corresponding to a second sound source according to the DOA estimation;
inputting the first signal into a first differential beam former and a second differential beam former respectively to obtain a first far-end signal and a first near-end signal output by the first differential beam former and a second far-end signal and a second near-end signal output by the second differential beam former;
performing first adaptive cancellation processing on the first near-end signal and the second far-end signal according to the first DOA error to obtain a first output signal; and performing second adaptive cancellation processing on the first far-end signal and the second near-end signal according to the second DOA error to obtain a second output signal;
and performing an inverse short-time Fourier transform on the first output signal and the second output signal, respectively, to obtain a first separated signal and a second separated signal.
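Claim 1 requires a per-frame DOA estimate but leaves the estimator open. One common two-microphone approach, sketched here under that assumption (it is not stated to be the patent's method), fits the cross-spectrum phase slope to a time delay and converts it to an angle, with 0 degrees at the first end and 180 degrees at the second:

```python
import numpy as np

def doa_estimate(X1, X2, freqs, d=0.03, c=343.0):
    """Phase-based DOA estimate (degrees) for one STFT frame of a two-mic
    endfire array: least-squares fit of the cross-spectrum phase slope to a
    time delay, then conversion to an angle."""
    cross = X1 * np.conj(X2)
    phase = np.angle(cross)          # ideally 2*pi*f*tau in each bin
    w = 2 * np.pi * freqs
    tau = np.sum(w * phase) / np.sum(w * w)  # least-squares delay estimate
    cos_theta = np.clip(tau * c / d, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))
```

This simple least-squares fit assumes the per-bin phase stays within ±π, i.e. frequencies below the spatial-aliasing limit c/(2d), which small spacings such as those of claim 2 satisfy over most of the speech band.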
2. The method of claim 1, wherein the array element spacing is in the range of 2.0 cm to 3.5 cm.
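The claimed spacing range is consistent with the usual spatial-aliasing constraint for a two-microphone endfire pair: the inter-microphone phase difference stays unambiguous only up to f = c/(2d). A minimal sketch (the constant and function name are illustrative, not from the patent):

```python
SPEED_OF_SOUND = 343.0  # m/s, at roughly 20 degrees C

def alias_free_upper_freq(d_m):
    """Highest frequency (Hz) at which the inter-mic phase difference of an
    endfire arrival stays within +/- pi for spacing d_m (metres)."""
    return SPEED_OF_SOUND / (2.0 * d_m)
```

With the claimed bounds, 3.5 cm gives about 4.9 kHz and 2.0 cm about 8.6 kHz, which covers the main speech band.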
3. The method according to claim 1, wherein said calculating, from said DOA estimates, a first DOA error for a first acoustic source and a second DOA error for a second acoustic source specifically comprises:
calculating the first DOA error according to the formula err_A = |0 - θ|;
calculating the second DOA error according to the formula err_B = |180 - θ|;
wherein err_A is the first DOA error, err_B is the second DOA error, and θ is the DOA estimate.
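The error formulas of claim 3 translate directly into code; this trivial sketch assumes θ is expressed in degrees, with 0 degrees at the first end and 180 degrees at the second:

```python
def doa_errors(theta_deg):
    """DOA errors per claim 3: err_A = |0 - theta|, err_B = |180 - theta|."""
    err_a = abs(0.0 - theta_deg)
    err_b = abs(180.0 - theta_deg)
    return err_a, err_b
```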
4. The method of claim 1, wherein the performing a first adaptive cancellation process on the first near-end signal and the second far-end signal according to the first DOA error to obtain a first output signal specifically comprises:
comparing the first DOA error with a preset error threshold, and when the first DOA error is not greater than the preset error threshold, taking the first output signal of the current frame as the first near-end signal without updating the coefficients of the first adaptive processing filter;
and when the first DOA error is greater than the preset error threshold, treating the current frame as containing the second far-end signal and updating the coefficients of the first adaptive processing filter.
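Claim 4 specifies when the adaptive filter may update (DOA error above the threshold, i.e. the frame is dominated by the interfering end) but not the adaptation rule itself. The sketch below assumes a one-tap-per-bin frequency-domain NLMS filter, a common choice that is not necessarily the patent's; the output is the cancellation residual in both branches, and the class name is hypothetical.

```python
import numpy as np

class GatedCanceller:
    """One-tap-per-bin frequency-domain adaptive canceller whose coefficient
    update is gated by the DOA error, in the spirit of claim 4. NLMS is an
    assumed adaptation rule."""

    def __init__(self, n_bins, mu=0.5, eps=1e-8):
        self.w = np.zeros(n_bins, dtype=complex)
        self.mu, self.eps = mu, eps

    def process(self, near, ref, doa_err, thr):
        e = near - self.w * ref  # subtract estimated interference leakage
        if doa_err > thr:        # frame dominated by the far end: adapt
            self.w += self.mu * np.conj(ref) * e / (np.abs(ref) ** 2 + self.eps)
        return e
```

Gating the update this way plays the role of a double-talk detector: the filter learns the leakage path only while the interferer is active, so near-end speech is not cancelled by mistake.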
5. The method according to claim 1, wherein the performing, according to the second DOA error, second adaptive cancellation processing on the first far-end signal and the second near-end signal to obtain a second output signal specifically includes:
comparing the second DOA error with a preset error threshold, and when the second DOA error is not greater than the preset error threshold, taking the second output signal of the current frame as the second near-end signal without updating the coefficients of the second adaptive processing filter;
and when the second DOA error is greater than the preset error threshold, treating the current frame as containing the first far-end signal and updating the coefficients of the second adaptive processing filter.
6. A sound source separation device, characterized by comprising:
a setting unit, configured to arrange a first differential beamformer at a first end of a microphone array and a second differential beamformer at a second end of the microphone array according to an array element spacing preset in the microphone array; wherein the main lobe direction of the first differential beamformer is towards the first end, the null direction of the first differential beamformer is towards the second end, the main lobe direction of the second differential beamformer is towards the second end, and the null direction of the second differential beamformer is towards the first end;
a transformation unit, configured to perform a short-time Fourier transform on the mixed signal received by the microphone array, transforming it to the short-time frequency domain to obtain a first signal; wherein the mixed signal is generated by a first sound source at the first end and a second sound source at the second end;
a calculation unit, configured to calculate a direction of arrival (DOA) estimate for each frame in the first signal;
the calculating unit is further used for calculating a first DOA error corresponding to the first sound source and a second DOA error corresponding to the second sound source according to the DOA estimation;
a processing unit, configured to input the first signal into a first differential beam former and a second differential beam former respectively, so as to obtain a first far-end signal and a first near-end signal output by the first differential beam former, and a second far-end signal and a second near-end signal output by the second differential beam former;
the processing unit is further configured to perform first adaptive cancellation processing on the first near-end signal and the second far-end signal according to the first DOA error to obtain a first output signal, and to perform second adaptive cancellation processing on the first far-end signal and the second near-end signal according to the second DOA error to obtain a second output signal;
the transformation unit is further configured to perform an inverse short-time Fourier transform on the first output signal and the second output signal, respectively, to obtain a first separated signal and a second separated signal.
7. The apparatus of claim 6, wherein the array element spacing is in the range of 2.0 cm to 3.5 cm.
8. The apparatus according to claim 6, wherein the computing unit is specifically configured to:
calculating the first DOA error according to the formula err_A = |0 - θ|;
calculating the second DOA error according to the formula err_B = |180 - θ|;
wherein err_A is the first DOA error, err_B is the second DOA error, and θ is the DOA estimate.
9. The apparatus according to claim 6, wherein the processing unit is specifically configured to:
comparing the first DOA error with a preset error threshold, and when the first DOA error is not greater than the preset error threshold, taking the first output signal of the current frame as the first near-end signal without updating the coefficients of the first adaptive processing filter;
and when the first DOA error is greater than the preset error threshold, treating the current frame as containing the second far-end signal and updating the coefficients of the first adaptive processing filter.
10. The apparatus according to claim 6, wherein the processing unit is specifically configured to:
comparing the second DOA error with a preset error threshold, and when the second DOA error is not greater than the preset error threshold, taking the second output signal of the current frame as the second near-end signal without updating the coefficients of the second adaptive processing filter;
and when the second DOA error is greater than the preset error threshold, treating the current frame as containing the first far-end signal and updating the coefficients of the second adaptive processing filter.
CN202110268230.6A 2021-03-12 2021-03-12 Sound source separation method and device Active CN113053408B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110268230.6A CN113053408B (en) 2021-03-12 2021-03-12 Sound source separation method and device

Publications (2)

Publication Number Publication Date
CN113053408A CN113053408A (en) 2021-06-29
CN113053408B (en) 2022-06-14

Family

ID=76511725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110268230.6A Active CN113053408B (en) 2021-03-12 2021-03-12 Sound source separation method and device

Country Status (1)

Country Link
CN (1) CN113053408B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831898A (en) * 2012-08-31 2012-12-19 厦门大学 Microphone array voice enhancement device with sound source direction tracking function and method thereof
CN106710603A (en) * 2016-12-23 2017-05-24 上海语知义信息技术有限公司 Speech recognition method and system based on linear microphone array
CN110554357A (en) * 2019-09-12 2019-12-10 苏州思必驰信息科技有限公司 Sound source positioning method and device
CN110931036A (en) * 2019-12-07 2020-03-27 杭州国芯科技股份有限公司 Microphone array beam forming method
CN111429939A (en) * 2020-02-20 2020-07-17 西安声联科技有限公司 Sound signal separation method of double sound sources and sound pickup

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101470528B1 (en) * 2008-06-09 2014-12-15 삼성전자주식회사 Adaptive mode controller and method of adaptive beamforming based on detection of desired sound of speaker's direction
US9210499B2 (en) * 2012-12-13 2015-12-08 Cisco Technology, Inc. Spatial interference suppression using dual-microphone arrays

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Speech Enhancement Based on the General Transfer Function GSC and Postfiltering; Sharon Gannot, Israel Cohen; IEEE Transactions on Speech and Audio Processing; 2004-12-31 *
A microphone array anti-reverberation algorithm using sidelobe enhancement; Li Jianwen et al.; Journal of Xiamen University (Natural Science); 2017-12-31 (No. 05) *
Robust adaptive microphone array algorithm for speech recognition; Zhao Xianyu et al.; Journal of Tsinghua University (Science and Technology); 2004-10-30 (No. 10) *

Similar Documents

Publication Publication Date Title
KR101339592B1 (en) Sound source separator device, sound source separator method, and computer readable recording medium having recorded program
US10827263B2 (en) Adaptive beamforming
EP2222091B1 (en) Method for determining a set of filter coefficients for an acoustic echo compensation means
US8374358B2 (en) Method for determining a noise reference signal for noise compensation and/or noise reduction
US7366662B2 (en) Separation of target acoustic signals in a multi-transducer arrangement
EP2237271B1 (en) Method for determining a signal component for reducing noise in an input signal
US9002027B2 (en) Space-time noise reduction system for use in a vehicle and method of forming same
CN109285557B (en) Directional pickup method and device and electronic equipment
CN111128220B (en) Dereverberation method, apparatus, device and storage medium
JP3795610B2 (en) Signal processing device
CN110660404B (en) Voice communication and interactive application system and method based on null filtering preprocessing
Reuven et al. Dual-source transfer-function generalized sidelobe canceller
CN113050035B (en) Two-dimensional directional pickup method and device
EP3545691B1 (en) Far field sound capturing
JP2005318518A (en) Double-talk state judging method, echo cancel method, double-talk state judging apparatus, echo cancel apparatus, and program
CN113053408B (en) Sound source separation method and device
CN112017680A (en) Dereverberation method and device
Priyanka et al. Adaptive Beamforming Using Zelinski-TSNR Multichannel Postfilter for Speech Enhancement
Comminiello et al. A novel affine projection algorithm for superdirective microphone array beamforming
Lim et al. MINTFormer: A spatially aware channel equalizer
CN110661510B (en) Beam former forming method, beam forming device and electronic equipment
Yoshioka et al. Speech dereverberation and denoising based on time varying speech model and autoregressive reverberation model
CN111863017B (en) In-vehicle directional pickup method based on double microphone arrays and related device
KR102649227B1 (en) Double-microphone array echo eliminating method, device and electronic equipment
Wang et al. A Joint Speech Enhancement Algorithm Based on the Tri-Microphone

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant