CN114863944B - Low-delay audio signal overdetermined blind source separation method and separation device - Google Patents

Low-delay audio signal overdetermined blind source separation method and separation device

Info

Publication number
CN114863944B
CN114863944B (application CN202210174605.7A)
Authority
CN
China
Prior art keywords: separated, sound source, time, omega, signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210174605.7A
Other languages
Chinese (zh)
Other versions
CN114863944A (en)
Inventor
王泰辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS filed Critical Institute of Acoustics CAS
Priority to CN202210174605.7A priority Critical patent/CN114863944B/en
Publication of CN114863944A publication Critical patent/CN114863944A/en
Application granted granted Critical
Publication of CN114863944B publication Critical patent/CN114863944B/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering, the noise being echo or reverberation of the speech
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D30/70 Reducing energy consumption in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention belongs to the technical field of frequency-domain blind source separation and audio signal processing, and in particular relates to a low-delay overdetermined blind source separation method for audio signals, comprising the following steps: each microphone array element in a microphone array picks up the sound signals of N sound sources to be separated in the target environment and converts them into corresponding digital signals, which are then short-time Fourier transformed to obtain the corresponding time-frequency domain observation signals; the obtained time-frequency domain observation signals are iteratively updated until convergence, yielding the variance and demixing vectors of each sound source to be separated; a demixing matrix is constructed from the obtained demixing vectors; the demixing matrix is inverted to obtain an estimate of the mixing matrix; for each sound source to be separated, a multi-channel wiener filter is constructed based on the mixing matrix and applied, giving the time-frequency domain signal to be separated; the inverse short-time Fourier transform then yields the time-domain waveform of the separated signal.

Description

Low-delay audio signal overdetermined blind source separation method and separation device
Technical Field
The invention belongs to the technical field of frequency domain blind source separation (Blind source separation, BSS) and audio signal processing, and particularly relates to a low-delay audio signal overdetermined blind source separation method and a separation device.
Background
In a scenario where multiple speakers talk simultaneously, a listener can focus on the voice of one speaker of interest while automatically ignoring the voices of the others, the well-known "cocktail party" problem. The problem was first studied by the British cognitive scientist Professor Cherry in the 1950s, yet it long remained unsolved. Blind source separation is a field that emerged to address it. Blind source separation of audio signals has broad application prospects, including human-machine voice interaction, automatic meeting minutes, music separation, and so on.
Frequency domain blind source separation technology has evolved rapidly over the last two decades as a representative class of audio separation solutions, with representative algorithms including independent component analysis (ICA), independent vector analysis (IVA), independent low-rank matrix analysis (ILRMA), and so on. These algorithms essentially exploit the higher-order statistics of the signals. To achieve good separation performance, enough data must be accumulated for accurate higher-order statistic estimation. In an off-line implementation, the required statistics can be estimated from a long stretch of already-collected data, so these algorithms achieve good performance. Many practical systems, however, require blind source separation to run online, with as little delay as possible between system input and output. For example, high-end hearing aids require a system delay of less than 5 milliseconds. This is a demanding requirement for current blind source separation algorithms.
Most current blind source separation algorithms rely on a so-called narrowband assumption, which requires the short-time Fourier transform window to be much longer than the mixing filter of the system. In a conference room, a typical reverberation time is 600 milliseconds, which would require a short-time Fourier transform window longer than 600 milliseconds. The resulting system delay is clearly too large for many applications, and existing real-time blind source separation algorithms cannot significantly reduce it. A low-delay blind source separation technology for audio signals is therefore urgently needed to meet the requirements of real-time processing.
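As a rough illustration of the delay arithmetic above, the sketch below converts STFT window lengths into buffered samples; the 16 kHz sampling rate is an assumption for illustration only and does not come from the patent:

```python
# Hypothetical illustration: the STFT analysis-window length sets the minimum
# input-output delay of a frame-based real-time system.
FS = 16000  # assumed sampling rate in Hz (not specified by the patent)

def window_samples(window_ms: float, fs: int = FS) -> int:
    """Number of samples buffered for an STFT window of the given duration."""
    return int(round(window_ms * fs / 1000))

# Narrowband assumption: the window must cover a ~600 ms reverberation tail.
narrowband = window_samples(600)   # samples buffered before a frame is ready
# Window length used in the patent's later example: 128 ms.
low_delay = window_samples(128)

print(narrowband, low_delay)  # 9600 2048
```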
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a low-delay audio signal overdetermined blind source separation method, which comprises the following steps:
each microphone array element in the microphone array picks up the acoustic signals of N sound sources to be separated in the target environment, converts the acoustic signals into corresponding digital signals, and then performs short-time Fourier transform on the digital signals to obtain corresponding time-frequency domain observation signals;
repeatedly iterating and updating on the obtained time-frequency domain observation signals until convergence, obtaining the variance and demixing vectors of each sound source to be separated; constructing a demixing matrix from the obtained demixing vectors; inverting the demixing matrix to obtain an estimate of the mixing matrix; for each sound source to be separated, constructing a multi-channel wiener filter based on the mixing matrix and performing filtering to obtain the time-frequency domain signal to be separated; and then performing the inverse short-time Fourier transform to obtain the time-domain waveform of the separated signal.
The invention also provides a device for separating the overdetermined blind source of the low-delay audio signal, which comprises:
the microphone array comprises M microphone array elements and is used for picking up acoustic signals of N sound sources to be separated in a target environment; wherein M > N;
the A/D module is used for converting the sound signals of the N sound sources to be separated picked up by the microphone array into corresponding digital signals;
the short-time Fourier transform module is used for caching the signals acquired by the microphone array and performing short-time Fourier transform to obtain corresponding time-frequency domain signals;
the sound source variance and demixing matrix estimation module, used to iterate continuously on the obtained time-frequency domain observation signals until convergence, estimating the variance and demixing vectors of the nth sound source to be separated, constructing a demixing matrix from the obtained demixing vectors, and updating the demixing matrix;
a mixing matrix estimation module, used to invert the demixing matrix to obtain the mixing matrix;
the multi-channel wiener filtering module, used to construct, based on the mixing matrix, the multi-channel wiener filter of the nth sound source to be separated and to perform filtering, obtaining the time-frequency domain signal of the nth sound source to be separated; and
the inverse short-time Fourier transform module, used to transform the N separated time-frequency domain sound source signals into time-domain waveforms, taken as the sound signals of the real sound sources to be separated, thereby completing the low-delay overdetermined blind source separation of the audio signals.
As one of the improvements of the above technical solutions, the apparatus further includes: a D/A module and a speaker array module;
the D/A module is used for converting the separated time domain digital signals of each channel output by the short-time inverse Fourier transform module into analog signals;
and the loudspeaker array module plays the analog separation signal through the loudspeaker array and sends the separation signal to the post-processing module for further processing.
Compared with the prior art, the invention has the beneficial effects that:
1. the invention provides a low-delay blind source separation method for audio signals, suitable for real-time processing systems requiring short delay, such as remote online conference systems;
2. the audio signal obtained by the separation contains only the direct sound and early reflected sound portions, so the method performs both signal separation and dereverberation.
Drawings
FIG. 1 is a schematic diagram of the method for separating the overdetermined blind source of a low-delay audio signal;
FIG. 2 is a method flow chart of a low-delay audio signal overdetermined blind source separation method of the present invention;
FIG. 3 is a specific flowchart of step 2) of a low-delay audio signal overdetermined blind source separation method of the present invention;
fig. 4 is a schematic structural diagram of a low-delay audio signal overdetermined blind source separation device according to the present invention.
Detailed Description
The invention will now be further described with reference to the accompanying drawings.
The invention provides a low-delay overdetermined blind source separation method for audio signals; it addresses the overdetermined blind source separation problem and therefore requires more microphones than sound sources. The method needs a short-time Fourier transform window shorter than the reverberation time of the room, thereby reducing the delay between the input and output of a real-time processing system.
The method comprises the following steps:
each microphone array element in the microphone array picks up the acoustic signals of N sound sources to be separated in the target environment, converts the acoustic signals into corresponding digital signals, and then performs short-time Fourier transform on the digital signals to obtain corresponding time-frequency domain observation signals;
repeatedly iterating and updating on the obtained time-frequency domain observation signals until convergence, obtaining the variance and demixing vectors of each sound source to be separated; constructing a demixing matrix from the obtained demixing vectors; inverting the demixing matrix to obtain an estimate of the mixing matrix; constructing a multi-channel wiener filter for each sound source to be separated and performing filtering to obtain the time-frequency domain signal to be separated; and performing the inverse short-time Fourier transform to obtain the time-domain waveform of the separated signal.
As shown in FIG. 1, there are N sound sources to be separated in a target environment, with sound signals s_n(t), where 1 ≤ n ≤ N and t is discrete time. The sound signals s_n(t) are received simultaneously by each microphone array element in a microphone array comprising M microphones; the signal received by the mth microphone is denoted x_m(t), 1 ≤ m ≤ M. The method of the invention is restricted to overdetermined blind source separation, i.e. the total number of microphone array elements must be greater than the number of sound sources. Denoting the time-domain transfer function from the nth sound source to be separated to the mth microphone array element as h_nm(t), the signal received by the mth microphone array element is expressed as

x_m(t) = Σ_{n=1}^{N} h_nm(t) * s_n(t)

where * represents the convolution operation.
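The convolutive mixing model above can be sketched in numpy as follows; all sizes and signals here are illustrative placeholders, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, T, Lh = 2, 4, 1000, 64   # sources, mics, signal length, RIR length (illustrative)

s = rng.standard_normal((N, T))             # sound-source signals s_n(t)
h = rng.standard_normal((N, M, Lh)) * 0.1   # time-domain transfer functions h_nm(t)

# x_m(t) = sum_n h_nm(t) * s_n(t): discrete linear convolution, truncated to T samples
x = np.zeros((M, T + Lh - 1))
for n in range(N):
    for m in range(M):
        x[m] += np.convolve(s[n], h[n, m])
x = x[:, :T]
print(x.shape)  # (4, 1000)
```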
In the method of the present invention, as shown in FIG. 1, blind source separation 101 is performed using only the signals x_m(t), 1 ≤ m ≤ M, received by the microphone array elements, in order to recover estimates of the true sound source signals to be separated.
In practice, however, it is difficult to obtain a clean sound source signal, and the method of the invention does not seek an exact estimate of the sound source signal to be separated; rather, it estimates the direct-sound and early-reflected-sound portion of the sound source signal received by the microphone array elements, or its mirror image at the microphone array elements.
It is difficult to perform the separation task directly in the time domain because the reverberation time in a closed space can be relatively long, resulting in slow convergence of the blind source separation algorithm in the time domain and unsatisfactory performance after convergence. The method of the invention obtains the corresponding time-frequency domain signal after the time domain signal is subjected to the short-time Fourier transform, so that the blind source separation of the audio signal can be more efficiently executed in the time-frequency domain.
As shown in fig. 2, the method specifically includes:
step 1) the mth microphone array element in the microphone array picks up the sound signal s_n(t) of the nth sound source to be separated in the target environment and converts it into a corresponding digital signal, denoted the mth microphone signal x_m(t); a short-time Fourier transform of this signal yields the corresponding time-frequency domain observation signal X_m(ω, k), where 1 ≤ n ≤ N; t is discrete time; 1 ≤ m ≤ M; M is the total number of microphone array elements in the microphone array; k is the frame index; and ω is the frequency. The sound signal s_n(t) of the nth sound source to be separated is an analog signal;
the microphone array comprises M microphone array elements, where the number M of array elements is greater than the total number N of sound sources to be separated, written M > N; i.e. overdetermined source separation.
Step 2) using the obtained time-frequency domain observation signal X m (omega, k) performing continuous iterative updating until convergence is reached, and estimating the variance lambda of the nth sound source to be separated n (omega, k-l) and the unmixed vector w n,l (omega) constructing a de-mixing matrix by using the obtained de-mixing vector, and updating the de-mixing matrix W (omega), wherein N is more than or equal to 1 and less than or equal to N; l is more than or equal to 0 and less than or equal to L n ;L n Representing the number of reflected sounds to be estimated of the nth sound source to be separated, wherein N represents the number of sound sources to be estimated;
specifically, step 2) includes:
step 201) using the obtained time-frequency domain observation signals X_m(ω, k), update the variance λ_n(ω, k-l) of the nth sound source to be separated over the most recent L_n frames:

λ_n(ω, k-l) = (1/F) Σ_{ω′} |w_{n,l}^H(ω′) x(ω′, k-l)|²

where F is the window length of the short-time Fourier transform, the sum runs over the F frequency bins, and x(ω, k) = [X_1(ω, k), …, X_M(ω, k)]^T.
Step 202) utilize lambda n (omega, k-L), updating the nth sound source to be separated at the nearest L n Weighted covariance matrix V of frame n,l (ω,k):
Figure GDA0003687940230000052
Where α is a smoothing factor very close to 1; v (V) n,l (ω, k-1) is a weighted covariance matrix of the (k-1) th frame; h is conjugate transpose;
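The recursive weighted-covariance update of step 202) can be sketched per frequency bin as follows; shapes and names are illustrative:

```python
import numpy as np

def update_weighted_cov(V_prev, x_frame, lam, alpha=0.98):
    """V_{n,l}(omega,k) = alpha * V_{n,l}(omega,k-1)
                          + (1 - alpha) * x(omega,k-l) x^H(omega,k-l) / lambda_n(omega,k-l).

    V_prev:  (M, M) previous weighted covariance matrix.
    x_frame: (M,)   microphone observation vector x(omega, k-l).
    lam:     scalar variance lambda_n(omega, k-l)."""
    outer = np.outer(x_frame, x_frame.conj())
    return alpha * V_prev + (1.0 - alpha) * outer / lam
```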
step 203) using V_{n,l}(ω, k), update the L_n demixing vectors w_{n,l}(ω) corresponding to the nth sound source to be separated:

w_{n,l}(ω) = (W(ω) V_{n,l}(ω, k))^{-1} e_{n,l}

with the convention L_0 = 0, where e_{n,l} is the column vector whose ((L_0 + … + L_{n-1}) + l)-th element is 1 and whose remaining elements are all 0, and W(ω) = [w_{1,0}(ω), …, w_{1,L_1-1}(ω), …, w_{N,0}(ω), …, w_{N,L_N-1}(ω)]^H is the demixing matrix.
Step 204) for the updated L corresponding to the nth sound source to be separated n Individual de-mixing vectors w n,l (omega) performing normalization operation to obtain a normalized solution mixing vector;
Figure GDA0003687940230000055
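Steps 203) and 204) together amount to one iterative-projection update followed by a normalization. A sketch at a single frequency bin, under the reconstructed equations above (function and variable names are illustrative):

```python
import numpy as np

def ip_update(W, V, idx):
    """One iterative-projection update of a demixing vector at one frequency bin.

    W:   (K, K) current demixing matrix, K = L_1 + ... + L_N.
    V:   (K, K) weighted covariance matrix V_{n,l}(omega, k).
    idx: position of the 1 in the selection vector e_{n,l}.
    Returns the updated demixing vector, normalized so that w^H V w = 1."""
    K = W.shape[0]
    e = np.zeros(K)
    e[idx] = 1.0
    w = np.linalg.solve(W @ V, e)                  # w = (W V)^{-1} e
    denom = np.sqrt(np.real(w.conj() @ V @ w))     # sqrt(w^H V w)
    return w / denom
```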
step 205) use the demixing vectors w_{n,l}(ω) obtained in step 204) to construct the demixing matrix W(ω).
Steps 201) to 205) are repeated as continuous iterative updates:
if the number of iterations reaches a preset value P and convergence is reached, the iteration ends and the demixing matrix is obtained;
otherwise, steps 201) to 205) are performed again.
Step 3) inverting the unmixed matrix W (omega) to obtain a mixed matrix H (omega);
specifically, the step 3) specifically includes:
inverting the unmixed matrix W (omega) to obtain a mixed matrix H (omega);
H(ω)=[H 1 (ω),…,H N (ω)]=W -1 (ω)
wherein,,
Figure GDA0003687940230000056
is of dimension M x L n Matrix of (h), h n,l Is a column vector of dimension mx1.
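Step 3) is a plain matrix inversion followed by slicing the result into per-source blocks of width L_n; a sketch with illustrative shapes:

```python
import numpy as np

def estimate_mixing(W, L_list):
    """Invert the demixing matrix and split it into per-source blocks H_n.

    W:      (K, K) demixing matrix at one frequency bin, K = sum(L_list).
    L_list: [L_1, ..., L_N] numbers of reflected sounds per source.
    Returns (H, blocks) where blocks[n] is the M x L_n matrix H_n(omega)."""
    H = np.linalg.inv(W)          # H(omega) = W^{-1}(omega)
    blocks, start = [], 0
    for Ln in L_list:
        blocks.append(H[:, start:start + Ln])
        start += Ln
    return H, blocks

H, blocks = estimate_mixing(np.eye(4), [2, 2])
print([b.shape for b in blocks])  # [(4, 2), (4, 2)]
```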
Step 4) constructing a multichannel wiener filter omega of the nth sound source to be separated based on the mixed matrix H (omega) aiming at the nth sound source to be separated n (omega, k) and performing filtering to obtain the time-frequency domain signal of the nth sound source to be separated
Figure GDA0003687940230000061
Wherein the sum of the number of all reflected sounds to be estimated is equal to the total number of microphone elements, i.e. there is a constraint +.>
Figure GDA0003687940230000062
And suggest L n The value of N is more than or equal to 1 and less than or equal to N is as close as possible;
specifically, step 4) includes:
for the nth sound source to be separated, constructing the multichannel wiener filter Ω_n(ω, k) of the nth sound source to be separated based on the mixing matrix H(ω):

Ω_n(ω, k) = λ_n(ω, k) h_{n,0}(ω) h_{n,0}^H(ω) Σ_x^{-1}(ω, k)

where

Σ_x(ω, k) = Σ_{n=1}^{N} Σ_{l=0}^{L_n-1} λ_n(ω, k-l) h_{n,l}(ω) h_{n,l}^H(ω)

is the covariance matrix of the current-frame frequency-domain microphone signal vector.
Using the Ω_n(ω, k) thus obtained, the current-frame frequency-domain microphone received-signal vector x(ω, k) = [X_1(ω, k), …, X_M(ω, k)]^T is filtered to obtain the filtered signal c_{n,0}(ω, k):

c_{n,0}(ω, k) = Ω_n(ω, k) x(ω, k)

From the resulting filtered signal c_{n,0}(ω, k), the time-frequency domain signal of the nth sound source to be separated is obtained; that is, the estimated time-frequency domain signal of the nth sound source to be separated is c_{n,0}(ω, k).
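A sketch of the multichannel wiener filtering for the first reflected-sound portion, under the reconstructed filter expression and the assumed model form of Σ_x above; all function names are illustrative:

```python
import numpy as np

def model_covariance(lams, H_blocks):
    """Sigma_x = sum_n sum_l lambda_n(k-l) h_{n,l} h_{n,l}^H (assumed model form).

    lams:     per source, a list of variances [lambda_n(k-0), ..., lambda_n(k-L_n+1)].
    H_blocks: per source, the M x L_n mixing block H_n(omega)."""
    M = H_blocks[0].shape[0]
    S = np.zeros((M, M), dtype=complex)
    for lam_nl, Hn in zip(lams, H_blocks):
        for l in range(Hn.shape[1]):
            S += lam_nl[l] * np.outer(Hn[:, l], Hn[:, l].conj())
    return S

def mwf_first_reflection(lam_n, h_n0, Sigma_x, x_frame):
    """Omega_n = lambda_n * h_{n,0} h_{n,0}^H * Sigma_x^{-1};  c_{n,0} = Omega_n x."""
    Omega = lam_n * np.outer(h_n0, h_n0.conj()) @ np.linalg.inv(Sigma_x)
    return Omega @ x_frame  # (M,) filtered signal c_{n,0}(omega, k)
```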
In other specific embodiments, step 4) may instead include:
for the nth sound source to be separated, constructing the multichannel wiener filter Ω_n(ω, k) of the nth sound source to be separated based on the mixing matrix H(ω):

Ω_n(ω, k) = [Σ_{l=0}^{L_n-1} λ_n(ω, k-l) h_{n,l}(ω) h_{n,l}^H(ω)] Σ_x^{-1}(ω, k)

Using the multichannel wiener filter thus obtained, the received-signal vector x(ω, k) = [X_1(ω, k), …, X_M(ω, k)]^T is filtered to obtain the filtered signal c_n(ω, k):

c_n(ω, k) = Ω_n(ω, k) x(ω, k)

From the resulting filtered signal c_n(ω, k), the time-frequency domain signal of the nth sound source to be separated is obtained; that is, the estimated time-frequency domain signal of the nth sound source to be separated is c_n(ω, k).
Step 5) for the nth time-frequency domain signal of the sound source to be separated
Figure GDA00036879402300000610
Performing short-time inverse Fourier transform to obtain corresponding time domain waveform +.>
Figure GDA00036879402300000611
And the sound source is used as a real sound signal of a sound source to be separated, so that the ultra-stationary blind source separation of the low-delay audio signal is completed.
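The final time-domain reconstruction of step 5) is completed by the overlap-add method (as the embodiment's inverse short-time Fourier transform 205 mentions). A minimal sketch of the overlap-add stage, with the per-frame inverse DFT assumed already done; frame sizes are illustrative:

```python
import numpy as np

def istft_overlap_add(frames, hop):
    """Reconstruct a time-domain waveform from time-domain frames by overlap-add.

    frames: (K, F) real frames (each already inverse-DFT'd and windowed).
    hop:    frame shift in samples."""
    K, F = frames.shape
    out = np.zeros(hop * (K - 1) + F)
    for k in range(K):
        out[k * hop:k * hop + F] += frames[k]
    return out

y = istft_overlap_add(np.ones((3, 4)), hop=2)
print(y)  # overlapping halves of adjacent frames sum together
```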
Example 1.
Fig. 2 is a system block diagram of the real-time blind separation method for audio signals according to the present invention, comprising a short-time Fourier transform 201, sound source variance and demixing matrix estimation 202, a mixing matrix estimation module 203, multi-channel wiener filtering 204, and an inverse short-time Fourier transform 205.
The invention provides a low-delay audio signal overdetermined blind source separation method, which comprises the following steps:
short-time fourier transform 201
A short-time Fourier transform is performed on each time-domain signal acquired by the microphone array in the target environment to obtain the corresponding current-frame time-frequency domain observation signals. Specifically, the short-time Fourier transform 201 is applied to each signal x_m(t) received by a microphone array element, yielding X_m(ω, k), where k is the frame index and ω is the frequency; the window length of the short-time Fourier transform is F. Unlike other existing real-time processing algorithms, the short-time Fourier transform window used in the invention can be much shorter than the reverberation time, thereby reducing the delay between the input and output of the real-time system.
Sound source variance and demixing matrix estimation 202
Using the short-time Fourier transform signals X_m(ω, k), the variance λ_n(ω, k) and demixing vectors w_{n,l}(ω) of each sound source to be separated in the current frame are computed; the variance of each sound source to be separated and the corresponding demixing vectors are iteratively updated, and the demixing matrix is updated.
Define the M × 1 frequency-domain microphone received-signal vector

x(ω, k) = [X_1(ω, k), …, X_M(ω, k)]^T. (2)

Define the frequency-domain signal vector of the sound signal of the nth sound source to be separated, as received by the microphone array, as

c_n(ω, k) = [c_{n1}(ω, k), …, c_{nM}(ω, k)]^T (3)
This vector is also called the mirror image of the nth sound source to be separated. The invention models the mirror image c_n(ω, k) as the sum of a series of reflected sounds:

c_n(ω, k) = Σ_{l=0}^{L_n-1} c_{n,l}(ω, k) (4)

where L_n is the number of reflected sounds to be estimated for the nth sound source to be separated, c_{n,0}(ω, k) is the first reflected-sound portion (including the direct sound) of the nth sound source to be separated, c_{n,1}(ω, k) is the second reflected-sound portion of the nth sound source to be separated, and so on. The technique disclosed in this patent realizes estimation of both the first reflected-sound portion and the mirror image c_n(ω, k).
To ensure that the method works well, the numbers of reflected-sound portions of all recovered sound sources to be separated must satisfy the constraint:

L_1 + L_2 + … + L_N = M (5)

In addition, in practice the values L_n, 1 ≤ n ≤ N, should be kept as close to one another as possible. For example, if the number of sound sources to be separated is N = 2 and the total number of microphones is M = 4, it is preferable to set L_1 = L_2 = 2; if the number of sound sources to be separated is N = 2 and the total number of microphone array elements is M = 5, it is preferable to set L_1 = 2, L_2 = 3 or L_1 = 3, L_2 = 2, while L_1 = 1, L_2 = 4 and L_1 = 4, L_2 = 1 are not recommended.
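The recommendation that the L_n sum to M and be as close to one another as possible amounts to an even split of the microphones among the sources; a tiny sketch (the function name is illustrative):

```python
def allocate_reflections(M: int, N: int):
    """Split M microphones into N reflection counts L_n that sum to M
    and differ from one another by at most 1 (the patent's recommendation)."""
    base, rem = divmod(M, N)
    return [base + 1 if n < rem else base for n in range(N)]

print(allocate_reflections(4, 2))  # [2, 2]
print(allocate_reflections(5, 2))  # [3, 2]
```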
Currently existing real-time blind source separation methods are mostly based on the so-called narrowband assumption, which requires a very long short-time Fourier transform window to cover the main energy of the mixing filter. The invention divides the whole mixing impulse response into several parts and separates the first L_n reflected-sound portions. For example, suppose two sound sources are to be separated in a room with a reverberation time of 470 milliseconds; with the short-time Fourier transform window length set to 128 milliseconds and the number of reflected sounds to be separated per source set to L_n = 2, good performance can be achieved with only 4 microphones, whereas existing blind source separation methods would require a short-time Fourier transform window length approaching 470 milliseconds. The real-time blind source separation method provided by the invention therefore greatly reduces the delay of a real-time processing system, which is a great advantage for an online system.
To realize the separation of all L_n reflected-sound portions of the nth sound source to be separated, L_n demixing vectors w_{n,l}(ω), 0 ≤ l ≤ L_n - 1, are required. It will be apparent to those skilled in the art, however, that the w_{n,l}(ω) cannot be used directly to separate the L_n reflected-sound portions.
With this background, a specific flow chart implementing the sound source variance and demixing matrix estimation 202 is shown in fig. 3.
More specifically, the sound source variance and demixing matrix estimation 202 is implemented by iteration, with the number of iterations set to P. For example, setting P = 2 can already achieve good separation performance. In each iteration, the following 5 steps are performed in sequence:
step 202-1) using the current time-frequency domain observation signals X_m(ω, k), update the variance λ_n(ω, k-l) of the nth sound source to be separated over the most recent L_n frames; the computational expression is

λ_n(ω, k-l) = (1/F) Σ_{ω′} |w_{n,l}^H(ω′) x(ω′, k-l)|² (6)
Step 202-2) utilize lambda n (omega, k-L), updating the nth sound source to be separated at the nearest L n Weighted covariance matrix V of frame n,l (ω, k) by
Figure GDA0003687940230000083
Where α is a smoothing factor very close to 1.
Step 202-3) utilizing V n,l (omega, k) updating the L corresponding to the nth sound source to be separated n Individual de-mixing vectors w n,l (ω)
Figure GDA0003687940230000091
Upper contract L 0 =0, column vector
Figure GDA0003687940230000092
(L) 0 +…+L n-1 ) +l elements are 1 and the other elements are all 0, w (ω) = [ w) 1,0 (ω),…,w 1,L-1 (ω),…,w N,0 (ω),…,w N,L-1 (ω)] H Is a de-mixing matrix;
step 202-4) normalize the L_n demixing vectors w_{n,l}(ω) of the nth sound source to be separated:

w_{n,l}(ω) ← w_{n,l}(ω) / sqrt(w_{n,l}^H(ω) V_{n,l}(ω, k) w_{n,l}(ω)) (9)
step 202-5) use the demixing vectors w_{n,l}(ω) obtained in step 202-4) to construct the demixing matrix W(ω).
If the number of iterations reaches the preset value P, the iterative updating process ends; otherwise, steps 202-1 to 202-5 shown in fig. 3 are performed again.
Mixing matrix estimation 203
The demixing matrix is inverted to obtain the mixing matrix;
specifically, the inverse of the demixing matrix is used to construct the mixing matrix H(ω) of dimension M × M:

H(ω) = [H_1(ω), …, H_N(ω)] = W^{-1}(ω) (10)

where

H_n(ω) = [h_{n,0}(ω), …, h_{n,L_n-1}(ω)] (11)

is a matrix of dimension M × L_n, and h_{n,l} is a column vector of dimension M × 1.
Multi-channel wiener filtering 204
Constructing a multi-channel wiener filter aiming at each sound source to be separated to obtain estimation of time-frequency domain signals of the sound source to be separated;
specifically, the variances λ_n(ω, k) of all N sound sources obtained by the sound source variance and demixing matrix estimation 202, together with the mixing matrix H(ω) estimated by 203, are used to construct N multichannel wiener filters. The multichannel wiener filter Ω_n(ω, k) for the nth sound source is

Ω_n(ω, k) = λ_n(ω, k) h_{n,0}(ω) h_{n,0}^H(ω) Σ_x^{-1}(ω, k)

where

Σ_x(ω, k) = Σ_{n=1}^{N} Σ_{l=0}^{L_n-1} λ_n(ω, k-l) h_{n,l}(ω) h_{n,l}^H(ω)

is the covariance matrix of the microphone signal.
Furthermore, the multichannel wiener filter Ω_n(ω, k) is used to filter the current-frame frequency-domain microphone received-signal vector x(ω, k), giving

c_{n,0}(ω, k) = Ω_n(ω, k) x(ω, k) (12)

Equation (12) outputs the first reflected-sound portion of the nth sound source to be separated. The invention therefore performs both signal separation and dereverberation, which improves the speech quality of the separated speech.
Alternatively, the mirror image c_n(ω, k) containing all reflected-sound portions can be output; in this case the multichannel wiener filter corresponding to the nth sound source to be separated is

Ω_n(ω, k) = [Σ_{l=0}^{L_n-1} λ_n(ω, k-l) h_{n,l}(ω) h_{n,l}^H(ω)] Σ_x^{-1}(ω, k) (13)

and the mirror image recovered with the estimated multichannel wiener filter of (13) is

c_n(ω, k) = Ω_n(ω, k) x(ω, k) (14)
The vector c_{n,0}(ω,k) or c_n(ω,k) obtained by formula (12) or (14) contains M channel signals of the nth sound source to be separated; in practice, each sound source needs only one output signal.
For convenience, the present patent uniformly selects the mirror image at the first microphone or its first reflected sound portion as the output, i.e. ĉ_n(ω,k) or ĉ_{n,0}(ω,k), where ĉ_n(ω,k) and ĉ_{n,0}(ω,k) are the first elements of the vectors c_n(ω,k) and c_{n,0}(ω,k), respectively.
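For the mirror-image variant of equations (13)-(14), a sketch under the assumption that the filter uses the full block H_n(ω) with a diagonal variance matrix, consistent with the first-reflection filter of equation (11):

```python
import numpy as np

def mwf_mirror_image(lams, H_n, Sigma_x, x):
    """Multi-channel Wiener filter keeping all L_n reflected sound parts.

    lams    : (L_n,) variances lambda_n(omega, k), ..., lambda_n(omega, k-L_n+1)
    H_n     : (M, L_n) mixing block H_n(omega)
    Sigma_x : (M, M) microphone covariance Sigma_x(omega, k)
    x       : (M,) current-frame microphone vector
    """
    Lambda_n = np.diag(lams)  # diagonal variance matrix, assumed form of eq. (13)
    Omega = H_n @ Lambda_n @ H_n.conj().T @ np.linalg.inv(Sigma_x)
    c_n = Omega @ x           # mirror image c_n(omega, k), equation (14)
    return c_n[0]             # output: first element (mirror at microphone 1)
```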
Inverse short-time Fourier transform 205
An inverse short-time Fourier transform is performed on the time-frequency domain signal of each sound source to be separated to obtain the corresponding time-domain waveform, which is taken as the real sound signal of the sound source to be separated, completing the low-delay overdetermined blind source separation of the audio signals.
Specifically, the estimate ĉ_{n,0}(ω,k) or ĉ_n(ω,k) of the sound source signal to be separated output by the multi-channel Wiener filtering 204 is inverse short-time Fourier transformed, and the corresponding time-domain signal ĉ_{n,0}(t) or ĉ_n(t) is obtained by the overlap-add method.
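A minimal overlap-add analysis/synthesis pair illustrating block 205; the 50%-overlap square-root Hann window is a common choice assumed here, since the patent only requires the window length F to be much shorter than the reverberation time.

```python
import numpy as np

def stft(x, F):
    """Analysis: frame the signal with hop F//2, window, and take the rFFT."""
    hop = F // 2
    # periodic square-root Hann analysis window (assumed, common choice)
    win = np.sqrt(0.5 * (1 - np.cos(2 * np.pi * np.arange(F) / F)))
    n_frames = (len(x) - F) // hop + 1
    X = np.stack([np.fft.rfft(win * x[k * hop:k * hop + F])
                  for k in range(n_frames)])
    return X, win

def istft_overlap_add(X, win):
    """Synthesis: windowed inverse FFT of each frame, accumulated by overlap-add."""
    F, hop = len(win), len(win) // 2
    y = np.zeros((X.shape[0] - 1) * hop + F)
    for k, Xk in enumerate(X):
        y[k * hop:k * hop + F] += win * np.fft.irfft(Xk, F)
    return y
```

With this window pair, the shifted squared windows sum to one, so interior samples are reconstructed exactly; only the first and last half-frames lack full overlap.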
Embodiment 2.
As shown in fig. 4, the present invention further provides a low-delay audio signal overdetermined blind source separation device, which includes:
the microphone array 401 includes M microphone array elements for picking up acoustic signals of N sound sources to be separated in the target environment, where the total number of microphone array elements is required to be greater than the number of sound sources to be separated, i.e., M > N; M ≥ 3; N ≥ 2;
an A/D module 402, configured to convert the acoustic signals (analog signals) of the N sound sources to be separated picked up by the microphone array 401 into corresponding digital signals, so that they can be sent to a processor or other device executing the separation algorithm; in a MEMS microphone, the A/D module 402 may be integrated into the microphone.
The short-time Fourier transform module 403 is configured to buffer the signals collected by the microphone array and perform a short-time Fourier transform to obtain the corresponding time-frequency domain signals; the real-time blind source separation method operates in the time-frequency domain. The window length of the short-time Fourier transform required by the present invention may be much shorter than the reverberation time of the space in which the microphone array is located.
The sound source variance and demixing matrix estimation module 404 is configured to perform continuous iterative updating using the obtained time-frequency domain observation signals until convergence is reached, estimate the variance and demixing vectors of the nth sound source to be separated, construct a demixing matrix from the obtained demixing vectors, and update the demixing matrix; the specific iteration process comprises the following steps:
1) Respectively calculating the variances of all sound sources to be separated; specifically, the variance of the nth sound source to be separated is calculated using the obtained M demixing vectors;
2) Updating the weighted covariance matrix of the nth sound source to be separated;
3) Updating all the demixing vectors of the nth sound source to be separated;
4) Normalizing all the demixing vectors of the nth sound source to be separated;
5) Constructing the demixing matrix using the demixing vectors obtained in the previous step;
a mixing matrix estimation module 405, configured to invert the demixing matrix to obtain the mixing matrix;
The multi-channel Wiener filtering module 406 is configured to construct a multi-channel Wiener filter for the nth sound source to be separated based on the mixing matrix and perform filtering to obtain the time-frequency domain signal of the nth sound source to be separated; that is, it computes the multi-channel Wiener filter corresponding to each sound source to be separated, multiplies it with the microphone time-frequency domain vector to obtain the mirror image of the sound source to be separated or the first early reflected sound part of that mirror image, and takes the first element of the resulting vector as the sound source separation output signal.
The short-time inverse Fourier transform module 407 is configured to transform the N separated time-frequency domain sound source signals into time-domain waveforms, which are taken as the real sound signals of the sound sources to be separated, completing the low-delay overdetermined blind source separation of the audio signals.
Wherein the apparatus further comprises: a D/a module 408, a speaker array module 409, and a post-processing module 410;
the D/a module 408 is configured to convert the separated time domain digital signals of each channel output by the inverse short-time fourier transform module 407 into analog signals;
the speaker array module 409 plays the analog split signal through the speaker array and sends the split signal to the post-processing module 410 (e.g., a speech recognition engine, keyword recognition engine, etc.) for further processing.
It should be noted that the real-time blind source separation method described in the present invention can be implemented in various ways, such as hardware, software, or a combination of hardware and software. The hardware platform may be an FPGA, a PLD, or another application-specific integrated circuit (ASIC). The software platform may include a DSP, an ARM core, or another microprocessor. In a combination of software and hardware, for example, some modules are implemented in DSP software and others in hardware accelerators.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the appended claims.

Claims (6)

1. A method for overdetermined blind source separation of low-delay audio signals, the method comprising:
each microphone array element in the microphone array picks up the acoustic signals of N sound sources to be separated in the target environment, converts the acoustic signals into corresponding digital signals, and then performs short-time Fourier transform on the digital signals to obtain corresponding time-frequency domain observation signals;
repeatedly iterating and updating with the obtained time-frequency domain observation signals until convergence is reached, obtaining the variance and demixing vectors of each sound source to be separated; constructing a demixing matrix from the obtained demixing vectors; inverting the demixing matrix to obtain an estimate of the mixing matrix; constructing, for each sound source to be separated, a multi-channel Wiener filter based on the mixing matrix and performing filtering to obtain the time-frequency domain signal to be separated; and then performing an inverse short-time Fourier transform to obtain the time-domain waveform of the signal to be separated;
the method specifically comprises the following steps:
step 1) the mth microphone array element in the microphone array picks up the acoustic signal s_n(t) of the nth sound source to be separated in the target environment and converts it into a corresponding digital signal, denoted as the mth microphone signal x_m(t), and a short-time Fourier transform is performed to obtain the corresponding time-frequency domain observation signal X_m(ω,k), wherein 1 ≤ n ≤ N; t is the discrete time; 1 ≤ m ≤ M; M is the total number of microphone array elements in the microphone array, k is the frame index, and ω is the frequency;
step 2) performing continuous iterative updating with the obtained time-frequency domain observation signals X_m(ω,k) until convergence is reached, estimating the variance λ_n(ω,k−l) and the demixing vectors w_{n,l}(ω) of the nth sound source to be separated, and constructing a demixing matrix from the obtained demixing vectors w_{n,l}(ω); and updating the demixing matrix W(ω), wherein 1 ≤ n ≤ N; 0 ≤ l ≤ L_n − 1; L_n represents the number of reflected sounds to be estimated of the nth sound source to be separated, and N represents the number of sound sources to be estimated;
step 3) inverting the demixing matrix W(ω) to obtain the mixing matrix H(ω);
step 4) for the nth sound source to be separated, constructing a multi-channel Wiener filter Ω_n(ω,k) of the nth sound source to be separated based on the mixing matrix H(ω) and performing filtering to obtain the time-frequency domain signal ĉ_n(ω,k) of the nth sound source to be separated;
step 5) performing an inverse short-time Fourier transform on the time-frequency domain signal ĉ_n(ω,k) of the nth sound source to be separated to obtain the corresponding time-domain waveform ĉ_n(t), which is taken as the sound signal of the real sound source to be separated, completing the low-delay overdetermined blind source separation of the audio signal;
the microphone array comprises M microphone array elements, wherein the number M of microphone array elements is greater than the total number N of sound sources to be separated, denoted as M > N;
the step 2) specifically comprises the following steps:
step 201) updating the variance λ_n(ω,k−l) of the (k−l)th frame of the nth sound source to be separated using the obtained time-frequency domain observation signal x(ω,k):

λ_n(ω,k−l) = (1/F) Σ_{ω'=1}^{F} |w_{n,l}^H(ω') x(ω',k)|²

wherein F is the window length of the short-time Fourier transform; x(ω,k) = [X_1(ω,k), …, X_M(ω,k)]^T; and w_{n,l}(ω) denotes the lth of the L_n demixing vectors corresponding to the nth sound source to be separated;
step 202) using λ_n(ω,k−l), updating the weighted covariance matrix V_{n,l}(ω,k) of the nth sound source to be separated over the most recent L_n frames:

V_{n,l}(ω,k) = α V_{n,l}(ω,k−1) + (1 − α) x(ω,k) x^H(ω,k) / λ_n(ω,k−l)

wherein α is a smoothing factor close to 1; V_{n,l}(ω,k−1) is the weighted covariance matrix of the (k−1)th frame; and H denotes the conjugate transpose;
step 203) using V_{n,l}(ω,k), updating the L_n demixing vectors w_{n,l}(ω) corresponding to the nth sound source to be separated:

w_{n,l}(ω) = (W(ω) V_{n,l}(ω,k))^{-1} e_{(L_0+…+L_{n−1})+l}

wherein, with the convention L_0 = 0, e_{(L_0+…+L_{n−1})+l} is the column vector whose (L_0+…+L_{n−1})+l-th element is 1 and whose remaining elements are all 0, and W(ω) = [w_{1,0}(ω), …, w_{1,L_1−1}(ω), …, w_{N,0}(ω), …, w_{N,L_N−1}(ω)]^H is the demixing matrix;
step 204) normalizing the updated L_n demixing vectors w_{n,l}(ω) corresponding to the nth sound source to be separated to obtain the normalized demixing vectors:

w_{n,l}(ω) ← w_{n,l}(ω) / √(w_{n,l}^H(ω) V_{n,l}(ω,k) w_{n,l}(ω))
step 205) constructing the demixing matrix W(ω) from the demixing vectors w_{n,l}(ω) obtained in step 204);
repeating steps 201) to 205) for continuous iterative updating;
if the number of iterations reaches a preset value P and convergence is reached, ending the iteration to obtain the demixing matrix;
otherwise, re-executing steps 201) to 205);
the step 4) specifically comprises:
for the nth sound source to be separated, constructing the multi-channel Wiener filter Ω_n(ω,k) of the nth sound source to be separated based on the mixing matrix H(ω):

Ω_n(ω,k) = λ_n(ω,k) h_{n,0}(ω) h_{n,0}^H(ω) Σ_x^{-1}(ω,k)

wherein

Σ_x(ω,k) = Σ_{n=1}^{N} Σ_{l=0}^{L_n−1} λ_n(ω,k−l) h_{n,l}(ω) h_{n,l}^H(ω)

is the covariance matrix of the current-frame frequency-domain microphone received signal vector x(ω,k);
filtering the current-frame frequency-domain microphone received signal vector x(ω,k) = [X_1(ω,k), …, X_M(ω,k)]^T with the obtained Ω_n(ω,k) to obtain the filtered signal c_{n,0}(ω,k):

c_{n,0}(ω,k) = Ω_n(ω,k) x(ω,k)

obtaining the time-frequency domain signal ĉ_{n,0}(ω,k) of the nth sound source to be separated from the resulting filtered signal c_{n,0}(ω,k), wherein the time-frequency domain signal ĉ_{n,0}(ω,k) of the nth sound source to be separated is the first element of c_{n,0}(ω,k).
2. The method of claim 1, wherein the sum of the numbers of all reflected sounds to be estimated is equal to the total number of microphone array elements, denoted as Σ_{n=1}^{N} L_n = M.
3. The method for overdetermined blind source separation of low-delay audio signals according to claim 1, wherein the step 3) specifically comprises:
inverting the demixing matrix W(ω) to obtain the mixing matrix H(ω):

H(ω) = [H_1(ω), …, H_N(ω)] = W^{-1}(ω)

wherein H_n(ω) = [h_{n,0}(ω), …, h_{n,L_n−1}(ω)] is a matrix of dimension M×L_n, and h_{n,l} is a column vector of dimension M×1.
4. The method for overdetermined blind source separation of low-delay audio signals according to claim 1, wherein the step 4) specifically comprises:
for the nth sound source to be separated, constructing the multi-channel Wiener filter Ω_n(ω,k) of the nth sound source to be separated based on the mixing matrix H(ω):

Ω_n(ω,k) = H_n(ω) Λ_n(ω,k) H_n^H(ω) Σ_x^{-1}(ω,k),  Λ_n(ω,k) = diag(λ_n(ω,k), …, λ_n(ω,k−L_n+1))

filtering the received signal vector x(ω,k) = [X_1(ω,k), …, X_M(ω,k)]^T with the obtained multi-channel Wiener filter to obtain the filtered signal c_n(ω,k):

c_n(ω,k) = Ω_n(ω,k) x(ω,k)

obtaining the time-frequency domain signal ĉ_n(ω,k) of the nth sound source to be separated from the resulting filtered signal c_n(ω,k), wherein the time-frequency domain signal ĉ_n(ω,k) of the nth sound source to be separated is the first element of c_n(ω,k).
5. A low-delay audio signal overdetermined blind source separation device, characterized in that the device comprises:
the microphone array (401) comprises M microphone array elements, and is used for picking up acoustic signals of N sound sources to be separated in a target environment; wherein M is greater than N;
an A/D module (402) for converting the acoustic signals of N sound sources to be separated picked up by the microphone array (401) into corresponding digital signals;
the short-time Fourier transform module (403) is used for buffering the signals acquired by the microphone array and performing short-time Fourier transform to obtain corresponding time-frequency domain signals;
the sound source variance and demixing matrix estimation module (404) is used for performing continuous iterative updating with the obtained time-frequency domain observation signals until convergence is reached, estimating the variance and demixing vectors of the nth sound source to be separated, constructing a demixing matrix from the obtained demixing vectors, and updating the demixing matrix;
a mixing matrix estimation module (405) for inverting the demixing matrix to obtain the mixing matrix;
the multi-channel Wiener filtering module (406) is used for constructing a multi-channel Wiener filter of the nth sound source to be separated based on the mixing matrix and performing filtering to obtain the time-frequency domain signal of the nth sound source to be separated; and
the short-time inverse Fourier transform module (407) is used for transforming the N separated time-frequency domain sound source signals into time-domain waveforms, which are taken as the sound signals of the real sound sources to be separated, completing the low-delay overdetermined blind source separation of the audio signals;
the method for performing overdetermined source separation on the low-delay audio signal by the device comprises the following steps:
step 1) the mth microphone array element in the microphone array picks up the acoustic signal s_n(t) of the nth sound source to be separated in the target environment and converts it into a corresponding digital signal, denoted as the mth microphone signal x_m(t), and a short-time Fourier transform is performed on it to obtain the corresponding time-frequency domain observation signal X_m(ω,k), wherein 1 ≤ n ≤ N; t is the discrete time; 1 ≤ m ≤ M; M is the total number of microphone array elements in the microphone array, k is the frame index, and ω is the frequency;
step 2) performing continuous iterative updating with the obtained time-frequency domain observation signals X_m(ω,k) until convergence is reached, estimating the variance λ_n(ω,k−l) and the demixing vectors w_{n,l}(ω) of the nth sound source to be separated, and constructing a demixing matrix from the obtained demixing vectors w_{n,l}(ω); and updating the demixing matrix W(ω), wherein 1 ≤ n ≤ N; 0 ≤ l ≤ L_n − 1; L_n represents the number of reflected sounds to be estimated of the nth sound source to be separated, and N represents the number of sound sources to be estimated;
step 3) inverting the demixing matrix W(ω) to obtain the mixing matrix H(ω);
step 4) for the nth sound source to be separated, constructing a multi-channel Wiener filter Ω_n(ω,k) of the nth sound source to be separated based on the mixing matrix H(ω) and performing filtering to obtain the time-frequency domain signal ĉ_n(ω,k) of the nth sound source to be separated;
step 5) performing an inverse short-time Fourier transform on the time-frequency domain signal ĉ_n(ω,k) of the nth sound source to be separated to obtain the corresponding time-domain waveform ĉ_n(t), which is taken as the sound signal of the real sound source to be separated, completing the low-delay overdetermined blind source separation of the audio signal;
the microphone array comprises M microphone array elements, wherein the number M of microphone array elements is greater than the total number N of sound sources to be separated, denoted as M > N;
the step 2) specifically comprises the following steps:
step 201) updating the variance λ_n(ω,k−l) of the (k−l)th frame of the nth sound source to be separated using the obtained time-frequency domain observation signal x(ω,k):

λ_n(ω,k−l) = (1/F) Σ_{ω'=1}^{F} |w_{n,l}^H(ω') x(ω',k)|²

wherein F is the window length of the short-time Fourier transform; x(ω,k) = [X_1(ω,k), …, X_M(ω,k)]^T; and w_{n,l}(ω) denotes the lth of the L_n demixing vectors corresponding to the nth sound source to be separated;
step 202) using λ_n(ω,k−l), updating the weighted covariance matrix V_{n,l}(ω,k) of the nth sound source to be separated over the most recent L_n frames:

V_{n,l}(ω,k) = α V_{n,l}(ω,k−1) + (1 − α) x(ω,k) x^H(ω,k) / λ_n(ω,k−l)

wherein α is a smoothing factor close to 1; V_{n,l}(ω,k−1) is the weighted covariance matrix of the (k−1)th frame; and H denotes the conjugate transpose;
step 203) using V_{n,l}(ω,k), updating the L_n demixing vectors w_{n,l}(ω) corresponding to the nth sound source to be separated:

w_{n,l}(ω) = (W(ω) V_{n,l}(ω,k))^{-1} e_{(L_0+…+L_{n−1})+l}

wherein, with the convention L_0 = 0, e_{(L_0+…+L_{n−1})+l} is the column vector whose (L_0+…+L_{n−1})+l-th element is 1 and whose remaining elements are all 0, and W(ω) = [w_{1,0}(ω), …, w_{1,L_1−1}(ω), …, w_{N,0}(ω), …, w_{N,L_N−1}(ω)]^H is the demixing matrix;
step 204) normalizing the updated L_n demixing vectors w_{n,l}(ω) corresponding to the nth sound source to be separated to obtain the normalized demixing vectors:

w_{n,l}(ω) ← w_{n,l}(ω) / √(w_{n,l}^H(ω) V_{n,l}(ω,k) w_{n,l}(ω))
step 205) constructing the demixing matrix W(ω) from the demixing vectors w_{n,l}(ω) obtained in step 204);
repeating steps 201) to 205) for continuous iterative updating;
if the number of iterations reaches a preset value P and convergence is reached, ending the iteration to obtain the demixing matrix;
otherwise, re-executing steps 201) to 205);
the step 4) specifically comprises:
for the nth sound source to be separated, constructing the multi-channel Wiener filter Ω_n(ω,k) of the nth sound source to be separated based on the mixing matrix H(ω):

Ω_n(ω,k) = λ_n(ω,k) h_{n,0}(ω) h_{n,0}^H(ω) Σ_x^{-1}(ω,k)

wherein

Σ_x(ω,k) = Σ_{n=1}^{N} Σ_{l=0}^{L_n−1} λ_n(ω,k−l) h_{n,l}(ω) h_{n,l}^H(ω)

is the covariance matrix of the current-frame frequency-domain microphone received signal vector x(ω,k);
filtering the current-frame frequency-domain microphone received signal vector x(ω,k) = [X_1(ω,k), …, X_M(ω,k)]^T with the obtained Ω_n(ω,k) to obtain the filtered signal c_{n,0}(ω,k):

c_{n,0}(ω,k) = Ω_n(ω,k) x(ω,k)

obtaining the time-frequency domain signal ĉ_{n,0}(ω,k) of the nth sound source to be separated from the resulting filtered signal c_{n,0}(ω,k), wherein the time-frequency domain signal ĉ_{n,0}(ω,k) of the nth sound source to be separated is the first element of c_{n,0}(ω,k).
6. The low-delay audio signal overdetermined blind source separation device of claim 5, further comprising: a D/A module (408), a speaker array module (409), and a post-processing module (410);
the D/A module (408) is used for converting the separated time domain digital signals of each channel output by the short-time inverse Fourier transform module (407) into analog signals;
the speaker array module (409) plays the analog split signal through the speaker array and sends the split signal to the post-processing module (410) for further processing.
CN202210174605.7A 2022-02-24 2022-02-24 Low-delay audio signal overdetermined blind source separation method and separation device Active CN114863944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210174605.7A CN114863944B (en) 2022-02-24 2022-02-24 Low-delay audio signal overdetermined blind source separation method and separation device


Publications (2)

Publication Number Publication Date
CN114863944A CN114863944A (en) 2022-08-05
CN114863944B true CN114863944B (en) 2023-07-14

Family

ID=82627900


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117202077B (en) * 2023-11-03 2024-03-01 恩平市海天电子科技有限公司 Microphone intelligent correction method

Citations (4)

Publication number Priority date Publication date Assignee Title
EP2437517A1 (en) * 2010-09-30 2012-04-04 Nxp B.V. Sound scene manipulation
CN102568493A (en) * 2012-02-24 2012-07-11 大连理工大学 Underdetermined blind source separation (UBSS) method based on maximum matrix diagonal rate
CN105355212A (en) * 2015-10-14 2016-02-24 天津大学 Firm underdetermined blind separation source number and hybrid matrix estimating method and device
CN111986695A (en) * 2019-05-24 2020-11-24 中国科学院声学研究所 Non-overlapping sub-band division fast independent vector analysis voice blind separation method and system

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8880395B2 (en) * 2012-05-04 2014-11-04 Sony Computer Entertainment Inc. Source separation by independent component analysis in conjunction with source direction information
US10770091B2 (en) * 2016-12-28 2020-09-08 Google Llc Blind source separation using similarity measure


Non-Patent Citations (1)

Title
Taihui Wang et al., "Convolutive Transfer Function-Based Multichannel Nonnegative Matrix Factorization for Overdetermined Blind Source Separation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, pp. 802-815. *


Similar Documents

Publication Publication Date Title
Zhang et al. ADL-MVDR: All deep learning MVDR beamformer for target speech separation
CN111133511B (en) sound source separation system
Erdogan et al. Improved MVDR beamforming using single-channel mask prediction networks.
Xiao et al. Deep beamforming networks for multi-channel speech recognition
US9668066B1 (en) Blind source separation systems
CN107393550B (en) Voice processing method and device
JP5124014B2 (en) Signal enhancement apparatus, method, program and recording medium
Krueger et al. Model-based feature enhancement for reverberant speech recognition
CN109427328B (en) Multichannel voice recognition method based on filter network acoustic model
JP2002510930A (en) Separation of unknown mixed sources using multiple decorrelation methods
CN108109617A (en) A kind of remote pickup method
JP2007526511A (en) Method and apparatus for blind separation of multipath multichannel mixed signals in the frequency domain
GB2548325A (en) Acoustic source seperation systems
Zhang et al. Multi-channel multi-frame ADL-MVDR for target speech separation
Bertrand et al. Adaptive distributed noise reduction for speech enhancement in wireless acoustic sensor networks
CN114863944B (en) Low-delay audio signal overdetermined blind source separation method and separation device
KR20220022286A (en) Method and apparatus for extracting reverberant environment embedding using dereverberation autoencoder
Li et al. Taylorbeamformer: Learning all-neural beamformer for multi-channel speech enhancement from taylor's approximation theory
CN113823316B (en) Voice signal separation method for sound source close to position
CN108962276B (en) Voice separation method and device
Giacobello et al. Speech dereverberation based on convex optimization algorithms for group sparse linear prediction
CN113409804A (en) Multichannel frequency domain speech enhancement algorithm based on variable-span generalized subspace
Aroudi et al. Cognitive-driven convolutional beamforming using EEG-based auditory attention decoding
CN109243476B (en) Self-adaptive estimation method and device for post-reverberation power spectrum in reverberation voice signal
Yoshioka et al. Dereverberation by using time-variant nature of speech production system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Wang Taihui

Inventor after: Yang Feiran

Inventor after: Sun Guohua

Inventor after: Yang Jun

Inventor before: Wang Taihui