CN113053408B - Sound source separation method and device - Google Patents
Publication number: CN113053408B (application CN202110268230.6A)
Authority: CN (China)
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Abstract
The invention relates to a sound source separation method comprising the following steps: according to an array element spacing preset in a microphone array, setting a first differential beamformer at a first end of the array and a second differential beamformer at a second end; transforming the received mixed signal to the short-time frequency domain to obtain a first signal; calculating a DOA estimate for each frame of the first signal; calculating a first DOA error and a second DOA error; feeding the first signal into the first and second differential beamformers to obtain a first far-end signal, a first near-end signal, a second far-end signal, and a second near-end signal; performing, according to the first DOA error, first adaptive cancellation processing on the first near-end signal and the second far-end signal to obtain a first output signal; performing, according to the second DOA error, second adaptive cancellation processing on the first far-end signal and the second near-end signal to obtain a second output signal; and applying short-time inverse Fourier transforms to the first and second output signals to obtain a first separated signal and a second separated signal.
Description
Technical Field
The present invention relates to the field of speech processing technologies, and in particular, to a sound source separation method and apparatus.
Background
In the prior art, sound sources can be separated by designing two fixed beamformers for a uniform linear microphone array, with main lobes pointing to the two endfire directions of the linear array; the array's received signals are weighted by the two fixed weight vectors to produce two output signals, which are the separated signals. Alternatively, blind source separation is performed, for example by independent component analysis.
However, with a conventional fixed beamformer, when the number of array elements is small and the array aperture is small, the low-frequency main lobe is wide and suppression of the opposite-end signal is weak, so a large amount of opposite-end signal remains in the separated signal.
Other blind source separation methods, meanwhile, must solve for a separation matrix and therefore have high computational complexity.
Disclosure of Invention
The invention aims to provide a sound source separation method that addresses the defects of the prior art, namely the large amount of opposite-end signal remaining in the separated signal and the high computational complexity of blind source separation methods.
In a first aspect, the present invention provides a sound source separation method, including:
according to an array element distance preset in a microphone array, arranging a first differential beam former at a first end of the microphone array and arranging a second differential beam former at a second end of the microphone array; wherein the main lobe direction of the first differential beamformer is towards the first end, the null direction of the first differential beamformer is towards the second end, the main lobe direction of the second differential beamformer is towards the second end, and the null direction of the second differential beamformer is towards the first end;
carrying out short-time Fourier transform on a mixed signal received by a microphone array, and transforming the mixed signal to a short-time frequency domain to obtain a first signal; wherein the mixed signal is a mixed signal generated by a first sound source at the first end and a second sound source at the second end;
calculating a direction of arrival (DOA) estimate for each frame in the first signal;
calculating a first DOA error corresponding to a first sound source and a second DOA error corresponding to a second sound source according to the DOA estimation;
inputting the first signal into a first differential beam former and a second differential beam former respectively to obtain a first far-end signal and a first near-end signal output by the first differential beam former and a second far-end signal and a second near-end signal output by the second differential beam former;
according to the first DOA error, performing first adaptive cancellation processing on the first near-end signal and the second far-end signal to obtain a first output signal; according to the second DOA error, second self-adaptive cancellation processing is carried out on the first far-end signal and the second near-end signal to obtain a second output signal;
and respectively carrying out short-time Fourier inverse transformation on the first output signal and the second output signal to obtain a first separation signal and a second separation signal.
Preferably, the array element spacing is in the range of 2.0 cm to 3.5 cm.
Preferably, the calculating, according to the DOA estimation, a first DOA error corresponding to a first sound source and a second DOA error corresponding to a second sound source specifically includes:
calculating the first DOA error according to the formula err_A = |0° - θ|;
calculating the second DOA error according to the formula err_B = |180° - θ|;
wherein err_A is the first DOA error, err_B is the second DOA error, and θ is the DOA estimate.
Preferably, performing the first adaptive cancellation processing on the first near-end signal and the second far-end signal according to the first DOA error to obtain the first output signal specifically includes:
comparing the first DOA error with a preset error threshold, and when the first DOA error is not greater than the preset error threshold, taking the first near-end signal as the first output signal of the current frame without updating the first adaptive processing filter coefficients;
and when the first DOA error is greater than the preset error threshold, treating the current frame as the second far-end signal, which is not retained, and updating the first adaptive processing filter coefficients.
Preferably, performing the second adaptive cancellation processing on the first far-end signal and the second near-end signal according to the second DOA error to obtain the second output signal specifically includes:
comparing the second DOA error with a preset error threshold, and when the second DOA error is not greater than the preset error threshold, taking the second near-end signal as the second output signal of the current frame without updating the second adaptive processing filter coefficients;
and when the second DOA error is greater than the preset error threshold, treating the current frame as the first far-end signal, which is not retained, and updating the second adaptive processing filter coefficients.
In a second aspect, the present invention provides a sound source separating apparatus comprising:
a setting unit, configured to set a first differential beamformer at the first end of the microphone array and a second differential beamformer at the second end of the microphone array according to an array element spacing preset in the microphone array; wherein the main lobe direction of the first differential beamformer faces the first end, its null direction faces the second end, the main lobe direction of the second differential beamformer faces the second end, and its null direction faces the first end;
a transformation unit, configured to perform short-time Fourier transform on a mixed signal received by the microphone array and transform it to the short-time frequency domain to obtain a first signal; wherein the mixed signal is generated by a first sound source at the first end and a second sound source at the second end;
a calculation unit for calculating a direction of arrival, DOA, estimate for each frame in the first signal;
the calculating unit is further used for calculating a first DOA error corresponding to the first sound source and a second DOA error corresponding to the second sound source according to the DOA estimation;
a processing unit, configured to input the first signal into a first differential beam former and a second differential beam former respectively, so as to obtain a first far-end signal and a first near-end signal output by the first differential beam former, and a second far-end signal and a second near-end signal output by the second differential beam former;
the processing unit is further configured to perform a first adaptive cancellation process on the first near-end signal and the second far-end signal according to the first DOA error to obtain a first output signal; according to the second DOA error, second self-adaptive cancellation processing is carried out on the first far-end signal and the second near-end signal to obtain a second output signal;
the transformation unit is further configured to perform short-time inverse fourier transformation on the first output signal and the second output signal, respectively, to obtain a first separated signal and a second separated signal.
Preferably, the array element spacing is in the range of 2.0 cm to 3.5 cm.
Preferably, the computing unit is specifically configured to:
according to the formula err_A = |0° - θ|, calculating the first DOA error;
according to the formula err_B = |180° - θ|, calculating the second DOA error;
wherein err_A is the first DOA error, err_B is the second DOA error, and θ is the DOA estimate.
Preferably, the processing unit is specifically configured to:
comparing the first DOA error with a preset error threshold, and when the first DOA error is not greater than the preset error threshold, taking the first near-end signal as the first output signal of the current frame without updating the first adaptive processing filter coefficients;
and when the first DOA error is greater than the preset error threshold, treating the current frame as the second far-end signal and updating the first adaptive processing filter coefficients.
Preferably, the processing unit is specifically configured to:
comparing the second DOA error with a preset error threshold, and when the second DOA error is not greater than the preset error threshold, taking the second near-end signal as the second output signal of the current frame without updating the second adaptive processing filter coefficients;
and when the second DOA error is greater than the preset error threshold, treating the current frame as the first far-end signal and updating the second adaptive processing filter coefficients.
In a third aspect, the invention provides an apparatus comprising a memory for storing a program and a processor for performing the method of any of the first aspects.
In a fourth aspect, the present invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of the first aspect.
In a fifth aspect, the invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any of the first aspects.
According to the sound source separation method provided by the embodiment of the invention, adaptive cancellation processing is added after the outputs of the differential beamformers, so that less opposite-end interference remains in the output signals, and updates of the adaptive cancellation filter coefficients are gated by the DOA error, which reduces speech damage after separation. In addition, the method uses fixed beamforming and adaptive cancellation and does not involve solving a separation matrix, so its computational complexity is lower than that of blind source separation methods such as independent component analysis.
Drawings
Fig. 1 is a schematic flow chart of a sound source separation method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a microphone array according to an embodiment of the invention;
fig. 3 is a schematic structural diagram of a sound source separation apparatus according to a second embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Fig. 1 is a schematic flow chart of a sound source separation method according to an embodiment of the present invention, where an execution subject of the method is a device with a computing function, such as a terminal and a server. The technical solution of the present invention is described in detail below with reference to fig. 1.
Step 110, according to an array element spacing preset in a microphone array, setting a first differential beamformer at a first end of the microphone array and a second differential beamformer at a second end of the microphone array; the main lobe direction of the first differential beamformer faces the first end and its null direction faces the second end, while the main lobe direction of the second differential beamformer faces the second end and its null direction faces the first end.
specifically, in the present application, the array element spacing may be set to 2.0cm to 3.5cm based on a small-spacing microphone array, and the speakers a and B are respectively located at two ends of the microphone array, such as the 2mic array shown in fig. 2. Two first-order differential beamformers can be designed according to the array element spacing, namely a first differential beamformer and a second differential beamformer, wherein the main lobe direction of the first differential beamformer is 0 degrees, namely the talker a direction, and the null direction is 180 degrees, namely the talker B direction, the first differential beamformer is opposite to the first differential beamformer, namely the main lobe direction of the second differential beamformer is 180 degrees, and the null direction is 0 degree. Speaker a corresponds to a first sound source and speaker B corresponds to a second sound source.
Step 120, performing short-time Fourier transform on the mixed signal received by the microphone array and transforming it to the short-time frequency domain to obtain a first signal. Specifically, the microphone array receives the mixed sound signal of the first sound source and the second sound source; since a speech signal is short-time stationary and is generally analyzed in the short-time frequency domain, the mixed sound signal is short-time Fourier transformed to obtain the first signal.
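As an illustrative sketch of this transform step (the frame length, hop, and Hann window are our assumptions, not values from the patent):

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Short-time Fourier transform: frame the signal, window each frame,
    and FFT it. Returns an array of shape (n_frames, frame_len // 2 + 1)."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)
```

Each microphone channel would be transformed this way, giving the per-frame, per-bin first signal on which the DOA estimation and beamforming below operate.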
Step 130, calculating a direction of arrival (DOA) estimate for each frame of the first signal. Specifically, the DOA estimate of each frame can be obtained in real time by any commonly used DOA estimation method, such as a beamforming algorithm, a subspace algorithm, or a deconvolution algorithm.
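The patent leaves the estimator open; as one concrete possibility for a 2-mic array, a per-frame phase-difference estimator might look like the following sketch (the function name, the 100 Hz cutoff, and the median pooling are our assumptions):

```python
import numpy as np

C = 343.0  # speed of sound, m/s

def doa_per_frame(X, freqs, d=0.03):
    """Broadband DOA estimate (degrees) for one STFT frame of a 2-mic array.

    X: array of shape (2, n_bins) holding the two channels' spectra.
    Converts the inter-channel phase difference of each bin to an angle
    and takes the median; one of many admissible estimators.
    """
    phase = np.angle(X[0] * np.conj(X[1]))  # phase lead of mic 1 over mic 2
    valid = freqs > 100.0                   # skip unreliable very-low bins
    cos_theta = C * phase[valid] / (2.0 * np.pi * freqs[valid] * d)
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    return np.rad2deg(np.arccos(np.median(cos_theta)))
```

A frame dominated by speaker A would yield an estimate near 0°, and one dominated by speaker B an estimate near 180°.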
Step 140, calculating the first DOA error corresponding to the first sound source and the second DOA error corresponding to the second sound source according to the DOA estimate. Specifically, the DOA estimate may be denoted θ; the first DOA error err_A, corresponding to the first sound source (the speaker A direction), and the second DOA error err_B, corresponding to the second sound source (the speaker B direction), are then calculated as err_A = |0° - θ| and err_B = |180° - θ|.
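As a minimal illustration, the two DOA errors are simply the angular distances of the frame's estimate from the two endfire directions (a sketch; the function name is ours, angles in degrees):

```python
def doa_errors(theta_deg):
    """DOA errors toward the two endfire directions (speakers A and B)."""
    err_a = abs(0.0 - theta_deg)    # distance to speaker A at 0 degrees
    err_b = abs(180.0 - theta_deg)  # distance to speaker B at 180 degrees
    return err_a, err_b
```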
Step 150, inputting the first signal into a first differential beam former and a second differential beam former respectively to obtain a first far-end signal and a first near-end signal output by the first differential beam former, and a second far-end signal and a second near-end signal output by the second differential beam former;
specifically, the first main lobe direction of the first differential beamformer is a, and the null direction is B, i.e., the a-direction signal is retained and the B-direction signal is suppressed in the output signal. The second differential beamformer, in contrast, retains the B-direction signals and suppresses the a-direction signals. The first near-end signal output by the first differential beamformer may be denoted as SA1 and the first far-end signal as SB1, and the second near-end signal SA2 and the second far-end signal output by the second differential beamformer may be denoted as SB 2.
Step 160, according to the first DOA error, performing a first adaptive cancellation process on the first near-end signal and the second far-end signal to obtain a first output signal; according to the second DOA error, second self-adaptive cancellation processing is carried out on the first far-end signal and the second near-end signal to obtain a second output signal;
for the first adaptive cancellation processing, the first DOA error may be compared with a preset error threshold, when the first DOA error is not greater than the preset error threshold, the first output signal of the current frame is a first near-end signal, and at this time, the coefficient of the first adaptive processing filter is not updated;
when the first DOA error is larger than the preset error threshold value, the current frame is the second far-end signal, and the first self-adaptive processing filter coefficient is updated, so that the first near-end signal of the current frame is continuously processed through the updated first self-adaptive processing filter coefficient.
Specifically, the first adaptive processing filter is updated from the input signals during the first adaptive cancellation process. The error threshold is an experimental value and can be set from experiments; for example, it may be set to θ_th = 30°. Updating the filter during target-direction speech would damage the target speech signal, so whether the first adaptive processing filter is updated is controlled by the real-time first DOA error: the filter is updated only on non-target signal, and during target-signal frames the current filter coefficients are merely used to process the data without being changed. If err_A ≤ θ_th, the frame is an A-direction signal and is to be retained, i.e. SA1 is kept, and the output is denoted T_A. If err_A > θ_th, the frame is interference noise or a B-direction signal and is to be cancelled; the filter coefficients are then updated, and the first output signal is determined frame by frame.
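The DOA-gated adaptive cancellation described above can be sketched as a per-bin, single-tap NLMS canceller whose coefficient update is frozen on target frames. The single-tap structure, step size, and threshold default are our assumptions; the patent only requires an LMS/NLMS/RLS-type canceller with DOA-gated updates:

```python
import numpy as np

def gated_nlms_cancel(SA1, SB2, err_a, theta_th=30.0, mu=0.5, eps=1e-8):
    """First adaptive cancellation with DOA-gated filter updates.

    SA1: first near-end beam output, complex STFT of shape (frames, bins).
    SB2: second far-end beam output, used as the interference reference.
    err_a: first DOA error per frame, in degrees.
    """
    n_frames, n_bins = SA1.shape
    w = np.zeros(n_bins, dtype=complex)  # one adaptive tap per frequency bin
    out = np.empty_like(SA1)
    for t in range(n_frames):
        e = SA1[t] - w * SB2[t]          # subtract estimated interference leakage
        out[t] = e
        if err_a[t] > theta_th:          # interference-dominated frame: adapt
            w += mu * np.conj(SB2[t]) * e / (np.abs(SB2[t]) ** 2 + eps)
        # target-speech frame: freeze w to avoid damaging the target voice
    return out
```

On interference frames the filter adapts toward the leakage path; on target frames the current coefficients are applied unchanged. The second cancellation would use the same routine with SB1 as the reference and SA2 as the primary input.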
Correspondingly, for the second adaptive cancellation processing, the second DOA error is compared with the preset error threshold. When the second DOA error is not greater than the preset error threshold, the second output signal of the current frame is the second near-end signal, and the second adaptive processing filter coefficients are not updated;
when the second DOA error is greater than the preset error threshold, the current frame is treated as the first far-end signal and the second adaptive processing filter coefficients are updated.
Specifically, the second adaptive cancellation processing is performed on the first far-end signal SB1 and the second near-end signal SA2, with err_B likewise controlling whether the second adaptive processing filter coefficients are updated: if err_B > θ_th, the frame is interference noise or an A-direction signal and is to be cancelled, so the second adaptive processing filter coefficients are updated; if err_B ≤ θ_th, the frame is a B-direction signal and is to be retained, i.e. SA2 is kept, and the output is denoted T_B.
The first adaptive cancellation processing may use any one of the Least Mean Square (LMS) algorithm, the Normalized LMS (NLMS) algorithm, or the Recursive Least Squares (RLS) algorithm. The second adaptive cancellation processing uses the same algorithm as the first.
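For reference, the per-frequency-bin NLMS coefficient update in such a canceller conventionally takes the following textbook form (not quoted from the patent; the symbols are ours):

```latex
e(n) = d(n) - w(n)\,x(n), \qquad
w(n+1) = w(n) + \mu \, \frac{x^{*}(n)\, e(n)}{\lvert x(n)\rvert^{2} + \varepsilon}
```

where d(n) is the near-end beam output, x(n) the opposite-end reference, w(n) the filter coefficient, μ the step size, and ε a small regularization constant; in this method the update is applied only on frames whose DOA error exceeds the preset threshold.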
And 170, performing short-time inverse Fourier transform on the first output signal and the second output signal respectively to obtain a first separation signal and a second separation signal.
Specifically, short-time inverse Fourier transforms are applied to the two output signals T_A and T_B respectively to obtain the final first separated signal A and second separated signal B.
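The reconstruction step can be sketched as a windowed overlap-add inverse STFT (the window and hop are assumptions, chosen to match a Hann-windowed analysis stage):

```python
import numpy as np

def istft(S, frame_len=512, hop=256):
    """Inverse STFT via windowed overlap-add with least-squares normalization.

    S: complex spectrogram of shape (n_frames, frame_len // 2 + 1), as
    produced by a Hann-windowed analysis STFT with the same hop.
    """
    win = np.hanning(frame_len)
    n_frames = S.shape[0]
    x = np.zeros((n_frames - 1) * hop + frame_len)
    norm = np.zeros_like(x)
    for t in range(n_frames):
        frame = np.fft.irfft(S[t], n=frame_len)
        x[t * hop : t * hop + frame_len] += frame * win   # synthesis window
        norm[t * hop : t * hop + frame_len] += win ** 2   # window-overlap energy
    return x / np.maximum(norm, 1e-12)                    # normalize interior
```

Running T_A and T_B through this routine would yield the two time-domain separated signals; the interior of the signal is reconstructed exactly when analysis and synthesis windows match.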
Furthermore, the scheme can be extended to arrays with more elements; it is only necessary to design the two corresponding differential beamformers for the linear microphone array.
According to the sound source separation method provided by the embodiment of the invention, adaptive cancellation processing is added after the outputs of the differential beamformers, so that less opposite-end interference remains in the output signals, and updates of the adaptive cancellation filter coefficients are gated by the DOA error, which reduces speech damage after separation. In addition, the method uses fixed beamforming and adaptive cancellation and does not involve solving a separation matrix, so its computational complexity is lower than that of blind source separation methods such as independent component analysis.
Fig. 3 is a schematic structural diagram of a sound source separation apparatus according to a second embodiment of the present invention, as shown in fig. 3, the sound source separation apparatus includes: a setting unit 310, a transformation unit 320, a calculation unit 330 and a processing unit 340.
The setting unit 310 is configured to set a first differential beamformer at the first end of the microphone array and a second differential beamformer at the second end of the microphone array according to an array element spacing preset in the microphone array; the main lobe direction of the first differential beamformer faces the first end and its null direction faces the second end, while the main lobe direction of the second differential beamformer faces the second end and its null direction faces the first end.
the transformation unit 320 is configured to perform short-time fourier transformation on the mixed signal received by the microphone array, and transform the mixed signal to a short-time frequency domain to obtain a first signal; the mixed signal is generated by a first sound source at a first end and a second sound source at a second end;
the calculating unit 330 is configured to calculate a direction of arrival DOA estimate for each frame in the first signal;
the calculating unit 330 is further configured to calculate, according to the DOA estimation, a first DOA error corresponding to the first sound source and a second DOA error corresponding to the second sound source;
the processing unit 340 is configured to input the first signal into the first differential beam former and the second differential beam former respectively to obtain a first far-end signal and a first near-end signal output by the first differential beam former, and a second far-end signal and a second near-end signal output by the second differential beam former;
the processing unit 340 is further configured to perform a first adaptive cancellation process on the first near-end signal and the second far-end signal according to the first DOA error to obtain a first output signal; according to the second DOA error, second self-adaptive cancellation processing is carried out on the first far-end signal and the second near-end signal to obtain a second output signal;
the transforming unit 320 is further configured to perform short-time inverse fourier transform on the first output signal and the second output signal, respectively, to obtain a first separated signal and a second separated signal.
Wherein the array element spacing is in the range of 2.0 cm to 3.5 cm.
Wherein, the calculating unit 330 is specifically configured to:
according to the formula err_A = |0° - θ|, calculating the first DOA error;
according to the formula err_B = |180° - θ|, calculating the second DOA error;
wherein err_A is the first DOA error, err_B is the second DOA error, and θ is the DOA estimate.
Wherein the processing unit 340 is specifically configured to:
comparing the first DOA error with a preset error threshold, and when the first DOA error is not greater than the preset error threshold, taking the first near-end signal as the first output signal of the current frame without updating the first adaptive processing filter coefficients;
and when the first DOA error is greater than the preset error threshold, treating the current frame as the second far-end signal and updating the first adaptive processing filter coefficients.
Wherein, the processing unit 340 is specifically configured to:
comparing the second DOA error with a preset error threshold, and when the second DOA error is not greater than the preset error threshold, taking the second near-end signal as the second output signal of the current frame without updating the second adaptive processing filter coefficients;
and when the second DOA error is greater than the preset error threshold, treating the current frame as the first far-end signal and updating the second adaptive processing filter coefficients.
According to the sound source separation device provided by the embodiment of the invention, adaptive cancellation processing is added after the outputs of the differential beamformers, so that less opposite-end interference remains in the output signals, and updates of the adaptive cancellation filter coefficients are gated by the DOA error, which reduces speech damage after separation. In addition, the device uses fixed beamforming and adaptive cancellation and does not involve solving a separation matrix, so its computational complexity is lower than that of blind source separation methods such as independent component analysis.
The third embodiment of the invention provides equipment, which comprises a memory and a processor, wherein the memory is used for storing programs, and the memory can be connected with the processor through a bus. The memory may be a non-volatile memory such as a hard disk drive and a flash memory, in which a software program and a device driver are stored. The software program is capable of performing various functions of the above-described methods provided by embodiments of the present invention; the device drivers may be network and interface drivers. The processor is used for executing a software program, and the software program can realize the method provided by the first embodiment of the invention when being executed.
A fourth embodiment of the present invention provides a computer program product containing instructions which, when the product runs on a computer, cause the computer to execute the method provided in the first embodiment of the present invention.
A fifth embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method provided in the first embodiment of the present invention.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A sound source separation method, characterized by comprising:
according to the array element distance preset in a microphone array, arranging a first differential beam former at a first end of the microphone array and arranging a second differential beam former at a second end of the microphone array; wherein the main lobe direction of the first differential beamformer is towards the first end, the null direction of the first differential beamformer is towards the second end, the main lobe direction of the second differential beamformer is towards the second end, and the null direction of the second differential beamformer is towards the first end;
carrying out short-time Fourier transform on a mixed signal received by a microphone array, and transforming the mixed signal to a short-time frequency domain to obtain a first signal; wherein the mixed signal is a mixed signal generated by a first sound source at the first end and a second sound source at the second end;
calculating a direction of arrival (DOA) estimate for each frame in the first signal;
calculating a first DOA error corresponding to a first sound source and a second DOA error corresponding to a second sound source according to the DOA estimation;
inputting the first signal into a first differential beam former and a second differential beam former respectively to obtain a first far-end signal and a first near-end signal output by the first differential beam former and a second far-end signal and a second near-end signal output by the second differential beam former;
according to the first DOA error, performing first adaptive cancellation processing on the first near-end signal and the second far-end signal to obtain a first output signal; and according to the second DOA error, performing second adaptive cancellation processing on the first far-end signal and the second near-end signal to obtain a second output signal;
and respectively carrying out short-time Fourier inverse transformation on the first output signal and the second output signal to obtain a first separation signal and a second separation signal.
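As an illustrative sketch only (not part of the claims), the pair of first-order differential beamformers of claim 1 can be realized for a two-element array in the short-time frequency domain. The function name and the default spacing and sound-speed values below are assumptions chosen for the example:

```python
import numpy as np

def differential_beam_pair(X1, X2, freqs, d=0.025, c=343.0):
    """First-order differential beams for a two-element endfire array.

    X1, X2 : complex STFT coefficients of one frame from the two microphones
    freqs  : bin centre frequencies in Hz
    d      : element spacing in metres (within the 2.0 cm to 3.5 cm range of claim 2)
    c      : speed of sound in m/s
    Returns one beam per end; each beam's null points at the opposite end.
    """
    # Phase factor of the acoustic propagation delay across one spacing.
    delay = np.exp(-1j * 2 * np.pi * freqs * d / c)
    beam_a = X1 - X2 * delay  # main lobe toward end 1, null toward end 2
    beam_b = X2 - X1 * delay  # main lobe toward end 2, null toward end 1
    return beam_a, beam_b
```

A plane wave arriving from one end is cancelled exactly by the beam whose null faces that end, which is the property the subsequent cancellation stage relies on.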
2. The method of claim 1, wherein the array element spacing is in the range of 2.0 cm to 3.5 cm.
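The claims do not spell out how the per-frame DOA estimate is obtained. One common two-microphone approach, sketched here as an assumption, infers the angle from the inter-channel phase difference; the claim-2 spacing keeps the array free of spatial aliasing below roughly c/(2d) ≈ 6.9 kHz at d = 2.5 cm. The function name and the 100 Hz lower cutoff are illustrative choices:

```python
import numpy as np

def doa_estimate(X1, X2, freqs, d=0.025, c=343.0):
    """Per-frame DOA in degrees from the inter-microphone phase difference.

    Convention: 0 deg is endfire toward microphone 1 and 180 deg is endfire
    toward microphone 2, matching the error formulas of claim 3. Only bins
    below the spatial-aliasing limit f < c / (2 d) are used, so that the
    per-bin phase difference is unambiguous.
    """
    valid = (freqs > 100.0) & (freqs < c / (2 * d))
    cross = X1[valid] * np.conj(X2[valid])              # cross-spectrum per bin
    tau = np.angle(cross) / (2 * np.pi * freqs[valid])  # inter-mic delay per bin (s)
    cos_theta = np.clip(np.mean(tau) * c / d, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))
```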
3. The method according to claim 1, wherein said calculating, from said DOA estimates, a first DOA error for a first acoustic source and a second DOA error for a second acoustic source specifically comprises:
calculating the first DOA error according to the formula err_A = |0° - θ|;
calculating the second DOA error according to the formula err_B = |180° - θ|;
wherein err_A is the first DOA error, err_B is the second DOA error, and θ is the DOA estimate in degrees.
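The two formulas of claim 3 simply measure how far the frame's DOA estimate lies from the two endfire directions. A direct transcription (symbol names assumed):

```python
def doa_errors(theta_deg):
    """err_A = |0 - theta| and err_B = |180 - theta| per claim 3, theta in degrees."""
    return abs(0.0 - theta_deg), abs(180.0 - theta_deg)
```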
4. The method of claim 1, wherein the performing a first adaptive cancellation process on the first near-end signal and the second far-end signal according to the first DOA error to obtain a first output signal specifically comprises:
comparing the first DOA error with a preset error threshold; when the first DOA error is not greater than the preset error threshold, taking the first output signal of the current frame as the first near-end signal without updating the coefficients of the first adaptive processing filter;
and when the first DOA error is greater than the preset error threshold, determining that the current frame is the second far-end signal, and updating the coefficients of the first adaptive processing filter.
5. The method according to claim 1, wherein the performing, according to the second DOA error, second adaptive cancellation processing on the first far-end signal and the second near-end signal to obtain a second output signal specifically includes:
comparing the second DOA error with a preset error threshold; when the second DOA error is not greater than the preset error threshold, taking the second output signal of the current frame as the second near-end signal without updating the coefficients of the second adaptive processing filter;
and when the second DOA error is greater than the preset error threshold, determining that the current frame is the first far-end signal, and updating the coefficients of the second adaptive processing filter.
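Claims 4 and 5 gate the adaptation of the cancellation filter on the DOA error: the filter that removes far-end leakage from a near-end beam is updated only in frames dominated by the interfering source, and frozen otherwise, so adaptation cannot distort the desired speech. A per-bin normalized-LMS sketch of this gating; the step size, threshold, and regularizer are arbitrary illustrative values, not taken from the patent:

```python
import numpy as np

def gated_cancel(near, far, w, err_doa, err_thresh=30.0, mu=0.5, eps=1e-8):
    """One frame of DOA-gated adaptive cancellation (cf. claims 4 and 5).

    near, far : complex beamformer outputs for this frame, shape (n_bins,)
    w         : per-bin cancellation filter coefficients, shape (n_bins,)
    err_doa   : DOA error of the desired source for this frame, in degrees
    Returns (output frame, possibly updated coefficients).
    """
    out = near - w * far  # subtract the estimated far-end leakage
    if err_doa > err_thresh:
        # Frame is dominated by the interfering (far-end) source:
        # adapt with a normalized LMS step, independently per bin.
        w = w + mu * np.conj(far) * out / (np.abs(far) ** 2 + eps)
    # Otherwise the frame carries the desired source: freeze the filter.
    return out, w
```

When the far-end beam leaks into the near-end beam through a fixed per-bin gain, repeated updates during interferer-dominant frames drive the filter toward that gain, after which the leakage is cancelled and desired-source frames pass through with the filter frozen.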
6. A sound source separation device, characterized by comprising:
a setting unit, configured to arrange a first differential beam former at a first end of a microphone array and a second differential beam former at a second end of the microphone array according to an array element spacing preset in the microphone array; wherein the main lobe direction of the first differential beam former is toward the first end, the null direction of the first differential beam former is toward the second end, the main lobe direction of the second differential beam former is toward the second end, and the null direction of the second differential beam former is toward the first end;
a conversion unit, configured to perform a short-time Fourier transform on a mixed signal received by the microphone array and transform it to the short-time frequency domain to obtain a first signal; wherein the mixed signal is generated by a first sound source at the first end and a second sound source at the second end;
a calculation unit for calculating a direction of arrival, DOA, estimate for each frame in the first signal;
the calculating unit is further used for calculating a first DOA error corresponding to the first sound source and a second DOA error corresponding to the second sound source according to the DOA estimation;
a processing unit, configured to input the first signal into a first differential beam former and a second differential beam former respectively, so as to obtain a first far-end signal and a first near-end signal output by the first differential beam former, and a second far-end signal and a second near-end signal output by the second differential beam former;
the processing unit is further configured to perform first adaptive cancellation processing on the first near-end signal and the second far-end signal according to the first DOA error to obtain a first output signal, and to perform second adaptive cancellation processing on the first far-end signal and the second near-end signal according to the second DOA error to obtain a second output signal;
the transformation unit is further configured to perform short-time inverse fourier transformation on the first output signal and the second output signal, respectively, to obtain a first separated signal and a second separated signal.
7. The apparatus of claim 6, wherein the array element spacing is in the range of 2.0 cm to 3.5 cm.
8. The apparatus according to claim 6, wherein the computing unit is specifically configured to:
calculating the first DOA error according to the formula err_A = |0° - θ|;
calculating the second DOA error according to the formula err_B = |180° - θ|;
wherein err_A is the first DOA error, err_B is the second DOA error, and θ is the DOA estimate in degrees.
9. The apparatus according to claim 6, wherein the processing unit is specifically configured to:
comparing the first DOA error with a preset error threshold; when the first DOA error is not greater than the preset error threshold, taking the first output signal of the current frame as the first near-end signal without updating the coefficients of the first adaptive processing filter;
and when the first DOA error is greater than the preset error threshold, determining that the current frame is the second far-end signal, and updating the coefficients of the first adaptive processing filter.
10. The apparatus according to claim 6, wherein the processing unit is specifically configured to:
comparing the second DOA error with a preset error threshold; when the second DOA error is not greater than the preset error threshold, taking the second output signal of the current frame as the second near-end signal without updating the coefficients of the second adaptive processing filter;
and when the second DOA error is greater than the preset error threshold, determining that the current frame is the first far-end signal, and updating the coefficients of the second adaptive processing filter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110268230.6A CN113053408B (en) | 2021-03-12 | 2021-03-12 | Sound source separation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113053408A CN113053408A (en) | 2021-06-29 |
CN113053408B true CN113053408B (en) | 2022-06-14 |
Family
ID=76511725
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110268230.6A Active CN113053408B (en) | 2021-03-12 | 2021-03-12 | Sound source separation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113053408B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102831898A (en) * | 2012-08-31 | 2012-12-19 | 厦门大学 | Microphone array voice enhancement device with sound source direction tracking function and method thereof |
CN106710603A (en) * | 2016-12-23 | 2017-05-24 | 上海语知义信息技术有限公司 | Speech recognition method and system based on linear microphone array |
CN110554357A (en) * | 2019-09-12 | 2019-12-10 | 苏州思必驰信息科技有限公司 | Sound source positioning method and device |
CN110931036A (en) * | 2019-12-07 | 2020-03-27 | 杭州国芯科技股份有限公司 | Microphone array beam forming method |
CN111429939A (en) * | 2020-02-20 | 2020-07-17 | 西安声联科技有限公司 | Sound signal separation method of double sound sources and sound pickup |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101470528B1 (en) * | 2008-06-09 | 2014-12-15 | 삼성전자주식회사 | Adaptive mode controller and method of adaptive beamforming based on detection of desired sound of speaker's direction |
US9210499B2 (en) * | 2012-12-13 | 2015-12-08 | Cisco Technology, Inc. | Spatial interference suppression using dual-microphone arrays |
Non-Patent Citations (3)
Title |
---|
Speech Enhancement Based on the General Transfer Function GSC and Postfiltering; Sharon Gannot, Israel Cohen; IEEE Transactions on Speech and Audio Processing; 20041231; full text *
A microphone array anti-reverberation algorithm using sidelobe enhancement (in Chinese); Li Jianwen et al.; Journal of Xiamen University (Natural Science); 20171231 (No. 05); full text *
A robust adaptive microphone array algorithm for speech recognition (in Chinese); Zhao Xianyu et al.; Journal of Tsinghua University (Science and Technology); 20041030 (No. 10); full text *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||