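WO2021179416A1 - Blind source separation method and system based on separation matrix initialization frequency point selection - Google Patents
Blind source separation method and system based on separation matrix initialization frequency point selection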
- Publication number
- WO2021179416A1 (application PCT/CN2020/087639)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- separation
- separation matrix
- frequency point
- frequency
- matrix
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
Definitions
- The present disclosure belongs to the technical field of audio signal processing, and in particular relates to a blind source separation method and system based on separation matrix initialization frequency point selection.
- Blind source separation (BSS) is the process of recovering the original source signals from the received mixed signals using only the statistical characteristics of the source signals, without knowing any parameters of the source signals or the transmission channel. Because BSS algorithms place few requirements on the source signals and have a very wide range of applications, they have attracted growing attention from experts and researchers.
- BSS can preserve the binaural cues of all sound sources through post-processing while performing speech enhancement to suppress interfering voices.
- The cocktail party problem, that is, how to pick out the sound of interest in a noisy venue, is very difficult for hearing-impaired patients. Because of the time delay caused by sound propagation and the multipath caused by sound reflection, the signal received by a microphone in a real reverberant environment is a convolutional mixture of the source signals. Because multi-channel convolution is involved, time-domain algorithms are difficult to implement, converge slowly, and rarely converge to the global optimum.
- FDBSS: frequency-domain blind source separation.
- The computational complexity of the algorithm can be reduced without affecting separation performance in three ways: (a) reducing the number of ICA iterations; (b) reducing the number of frequency points on which ICA iterations are executed; (c) combining (a) and (b), that is, reducing both.
- DOA: direction of arrival.
- The DOA information of the unknown source signals is estimated through covariance fitting.
- Using the estimated DOA information to form an accurate initial separation matrix can reduce the number of ICA iterations and speed up convergence.
- The present disclosure provides a blind source separation method and system based on separation matrix initialization frequency point selection.
- The method initializes the separation matrix using the DOA information of the source signals, which accelerates the convergence of the algorithm and improves separation performance.
- one or more embodiments of the present disclosure provide the following technical solutions:
- a method for blind source separation based on initial frequency point selection of a separation matrix including the following steps:
- the frequency point is selected according to the determinant of the mixed signal covariance matrix, and is classified into the primary frequency point set;
- One or more embodiments provide a blind source separation system based on initial frequency point selection of a separation matrix, including:
- a data acquisition module, which acquires the audio signal to be separated;
- a data preprocessing module, which converts the audio signal to be separated into the frequency domain;
- a DOA information estimation module, which performs an ICA iteration on the frequency points in the frequency range where spatial aliasing does not occur to obtain a separation matrix, and estimates the DOA information of each source signal based on the separation matrix;
- a primary frequency point selection module, which, at each frequency point in the entire frequency range, selects frequency points according to the determinant of the mixed signal covariance matrix and assigns them to the primary frequency point set;
- a separation matrix initialization module, which uses the DOA information of the source signals for initialization to obtain the initial separation matrix used in the ICA iteration on the primary frequency points;
- a frequency point separation module, which uses the initial separation matrix to perform ICA iteration on the primary frequency points to obtain their separation matrices, re-estimates the DOA information of the source signals, and constructs the separation matrices of the unselected frequency points based on the re-estimated DOA information;
- the signal reconstruction module performs inverse Fourier transform according to the separation matrix of all frequency points to reconstruct the separated signal.
- One or more embodiments provide a computer-readable storage medium having a computer program stored thereon, and when the program is executed by a processor, the blind source separation method based on the initialization frequency point selection of the separation matrix is realized.
- One or more embodiments provide a binaural hearing aid system, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the blind source separation method based on separation matrix initialization frequency point selection.
- the above technical solution provides a blind source separation method suitable for a binaural hearing aid system.
- the separation matrix is initialized to accelerate the convergence speed of the algorithm and reduce the amount of calculation for calculating the separation matrix.
- The running time of the proposed separation matrix initialization frequency selection FDBSS method is significantly shortened in both non-reverberant and reverberant environments. At the same time, the separation performance is improved.
- FIG. 1 is a flowchart of a blind source separation method based on frequency point selection of a separation matrix initialization according to one or more embodiments of the present disclosure
- Figure 4 shows the estimated value of the source signal DOA when the incident angle is 0° in the simulation experiment
- Figure 5 shows the directional patterns at different frequency points before solving the arrangement uncertainty problem in the simulation experiment
- Figure 6 shows the directional patterns at different frequency points after solving the arrangement uncertainty problem in the simulation experiment
- Figure 7 shows the simulation experiment room setting
- Figure 11 is the distribution diagram of the determinant of the normalized covariance matrix with frequency in the simulation experiment.
- Figure 12 is a distribution diagram of the number of initially selected frequency points versus the threshold in the simulation experiment
- Figures 13(a) and 13(b) are performance comparison diagrams of the method provided by the embodiment and the traditional method under different iteration times in the simulation experiment;
- Fig. 14(a) and Fig. 14(b) are the curves of dN and running time decreasing percentage with threshold value under 4 pairs of different signal arrival directions in the simulation experiment;
- Figure 15(a) and Figure 15(b) show the performance comparison between the proposed algorithm and the traditional algorithm under different iteration times in the simulation experiment.
- the blind source separation algorithm has three basic models: instantaneous mixing model, non-reverberation mixing model and convolutional mixing model.
- Instantaneous mixing model: here we assume that the mixing of the voice signals is instantaneous, that is, the time differences between the signals reaching each microphone are negligible.
- The signal received by each microphone is then a linear mixture of the source signals, which can be expressed as:
- Expression (1) can be expressed in the form of matrix and vector as:
- A is the N × M mixing matrix.
- W·A is the identity matrix, so the separation matrix W can be expressed as the inverse of the mixing matrix A.
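The relationship between the mixing matrix and the separation matrix in the instantaneous model can be illustrated numerically. This is a minimal sketch: the Laplacian sources and the 2 × 2 matrix `A` are hypothetical values chosen only for illustration, and in real blind separation `A` is unknown, so W has to be estimated rather than computed by inversion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical source signals (rows) with 1000 samples each.
S = rng.laplace(size=(2, 1000))

# Hypothetical instantaneous mixing matrix (known here only for the demo).
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

X = A @ S                 # microphone signals: linear mixture of the sources
W = np.linalg.inv(A)      # ideal separation matrix, so that W @ A = I
Y = W @ X                 # separated signals

print(np.allclose(W @ A, np.eye(2)))
print(np.allclose(Y, S))
```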
- the quantized natural gradient algorithm is an improvement from the Infomax algorithm.
- the Infomax algorithm uses a nonlinear function to transform the separation matrix from the perspective of information theory, and completes the separation by maximizing the output entropy.
- the iterative formula for calculating the separation matrix using the quantized natural gradient algorithm can be expressed as:
- the nonlinear function is selected as:
- ⁇ is a factor for adjusting the nonlinear gain
- ⁇ ( ⁇ ) represents the argument
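The iteration described above can be sketched as follows. The extracted text does not preserve the iteration formula or the nonlinear function, so this sketch assumes a common natural-gradient form, W ← W + μ[I − ⟨Φ(y)yᴴ⟩]W, with a polar-coordinate nonlinearity Φ(y) = tanh(η|y|)·exp(j·arg(y)). The names `mu`, `eta`, `polar_nonlinearity`, and `natural_gradient_step` are illustrative and are not taken from the patent.

```python
import numpy as np

def polar_nonlinearity(Y, eta=1.0):
    """Assumed nonlinearity: tanh applied to the magnitude, phase kept unchanged."""
    return np.tanh(eta * np.abs(Y)) * np.exp(1j * np.angle(Y))

def natural_gradient_step(W, X, mu=0.1, eta=1.0):
    """One natural-gradient update of the separation matrix W at a single frequency point.

    W : (M, M) complex separation matrix
    X : (M, T) complex mixed-signal frames at this frequency point
    """
    Y = W @ X                                   # current separated outputs
    T = X.shape[1]
    corr = polar_nonlinearity(Y, eta) @ Y.conj().T / T   # time average of phi(y) y^H
    return W + mu * (np.eye(W.shape[0]) - corr) @ W

# Usage with random data (illustrative only).
rng = np.random.default_rng(1)
X = rng.standard_normal((2, 200)) + 1j * rng.standard_normal((2, 200))
W0 = np.eye(2, dtype=complex)
W1 = natural_gradient_step(W0, X)
```

The step size `mu` controls how far each iteration moves; a good initial W, such as the DOA-based initialization described later, is what allows fewer such steps.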
- Step 1 Obtain an audio signal to be separated, and perform Fourier transform on the audio signal to be separated.
- the source signal vector, mixed signal vector and mixed matrix in the frequency domain can be expressed as:
- ⁇ is the delay parameter
- ⁇ is the attenuation parameter
- It represents the arrival delay of the second source signal observed at the first microphone from the ⁇ 2 direction
- ⁇ 12 represents the arrival attenuation of the second source signal observed at the first microphone from the ⁇ 2 direction
- d is the distance between the microphones
- ⁇ is the DOA of the source signal
- the value of ⁇ is put into formula (10) to obtain:
- Step 2 Perform an ICA (Independent Component Analysis) iteration on the frequency points in the frequency range where spatial aliasing does not occur, to obtain a separation matrix. This frequency range is determined by the distance between the two microphones of the binaural hearing aid. Specifically, the frequency range FL in which spatial aliasing does not occur can be calculated as:
- c is the speed of sound, about 340 m/s
- d is the distance between the microphones, about 15 cm.
- The frequency range where spatial aliasing does not occur is therefore 0 Hz < f < 1133 Hz.
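The quoted limit of about 1133 Hz is consistent with the usual half-wavelength condition f < c / (2d). A short check, assuming that condition is the one intended (the formula itself is missing from the extracted text):

```python
SPEED_OF_SOUND = 340.0   # m/s, value given in the text
MIC_SPACING = 0.15       # m, value given in the text

def aliasing_free_limit(c=SPEED_OF_SOUND, d=MIC_SPACING):
    """Upper frequency (Hz) below which a two-microphone array has no spatial aliasing."""
    return c / (2.0 * d)

f_limit = aliasing_free_limit()
print(round(f_limit))  # 1133
```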
- Step 3 Estimate DOA (Direction of Arrival, DOA) information of each source signal based on the separation matrix.
- The steering vector is defined as:
- The directivity pattern of the separation matrix contains nulls (zero directions) in each source direction. When the number of microphones and the number of source signals both equal 2, at each frequency point the nulls exist only in two specific directions, and these zero directions represent the DOA information of the source signals.
- The DOA information of each sound source can thus be estimated. We assume that the smaller angle corresponds to the direction of arrival of the first sound source and the larger angle to that of the second sound source. The DOA estimate of the first source signal is then defined as:
- N is the number of frequency points in the effective frequency range
- ⁇ l (f m ) represents the estimated value of DOA information of the l-th source signal at the m-th frequency point:
- max[x, y] (min[x, y]) denotes the larger (smaller) of two numbers.
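The null-based DOA estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions: a far-field steering vector with no attenuation, a phase sign convention chosen here, a 0.5° search grid, and a synthetic separation matrix built by inverting a known mixing matrix. The function names are hypothetical.

```python
import numpy as np

C = 340.0                              # speed of sound (m/s)
D = 0.15                               # microphone spacing (m)
GRID = np.arange(-90.0, 90.5, 0.5)     # assumed search grid in degrees

def steering(freq, theta_deg):
    """Assumed far-field steering vector for a two-microphone array."""
    phase = 2j * np.pi * freq * D * np.sin(np.deg2rad(theta_deg)) / C
    return np.array([np.ones_like(phase), np.exp(phase)])

def null_directions(W, freq):
    """Angle of the deepest null in the directivity pattern of each row of W."""
    pattern = np.abs(W @ steering(freq, GRID))       # shape (2, len(GRID))
    return GRID[np.argmin(pattern, axis=1)]

def estimate_doas(W_per_freq, freqs):
    """Average the sorted null directions over frequency: (smaller, larger) angle."""
    per_bin = np.array([np.sort(null_directions(W, f))
                        for W, f in zip(W_per_freq, freqs)])
    return per_bin.mean(axis=0)

# Synthetic check: sources at 30 and 0 degrees, frequencies below the aliasing limit.
freqs = [250.0, 500.0, 750.0, 1000.0]
W_per_freq = [np.linalg.inv(np.column_stack([steering(f, 30.0), steering(f, 0.0)]))
              for f in freqs]
doa_est = estimate_doas(W_per_freq, freqs)
```

Sorting the two nulls at each frequency point mirrors the max/min convention in the text: the smaller angle is attributed to the first source and the larger to the second.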
- DOA estimation plays a significant role in this embodiment.
- First, the DOA estimate is used to initialize the separation matrix; second, it is used to solve the uncertainty of the arrangement order; finally, it is used to calculate the separation matrices of the unselected frequency points. The accuracy of the DOA estimate therefore directly affects the stability and convergence of the algorithm.
- Figure 2(a)- Figure 2(b) show the directivity pattern and DOA estimation value of the source signal in an experiment corresponding to the position of the source signal at (2,3) in a non-reverberation environment.
- Step 4 Calculate the determinant of the mixed signal covariance matrix at each frequency point in the entire frequency range, and assign the frequency points whose determinant is greater than the set value to the primary frequency point set; this completes the primary frequency point selection.
- the determinant of the mixed signal covariance matrix is:
- R s (f) is the covariance matrix of the source signal.
- the source signals are independent of each other.
- the covariance matrix of the source signal is expressed as:
- p 1 (f) and p 2 (f) represent the power of the first source signal and the second source signal, respectively, and the determinant of the covariance matrix can be expressed as:
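The determinant-based selection in Step 4 can be sketched as follows. The claims state that the determinant is normalized before thresholding but the extracted text does not say how, so dividing by the maximum over frequency is an assumption, as are the function name and the `(F, M, T)` array layout.

```python
import numpy as np

def select_primary_bins(X, threshold):
    """Select frequency points whose normalized covariance determinant exceeds a threshold.

    X : (F, M, T) complex STFT of the mixed signals (F bins, M microphones, T frames)
    Returns the indices of the selected bins and the normalized determinants.
    """
    T = X.shape[2]
    R = X @ X.conj().transpose(0, 2, 1) / T     # (F, M, M) covariance per frequency point
    det = np.abs(np.linalg.det(R))              # determinant per frequency point
    det_norm = det / det.max()                  # assumed normalization: divide by the maximum
    return np.flatnonzero(det_norm > threshold), det_norm

# Synthetic demo: bins 0-2 carry strong signals, bins 3-7 carry weak ones.
rng = np.random.default_rng(2)
X = rng.standard_normal((8, 2, 500)) + 1j * rng.standard_normal((8, 2, 500))
X[:3] *= 10.0
X[3:] *= 0.1
selected, det_norm = select_primary_bins(X, threshold=0.5)
```

Because a bin is kept only when its normalized determinant exceeds the threshold, raising the threshold can only shrink the selected set.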
- Step 5 Initialize using the DOA information of the source signal to obtain the initial separation matrix.
- The DOA information obtained from the separation matrix is used to construct a null beamformer, which forms the initial separation matrix W ini (f).
- Because the null beamformer sets the gain in the direction of the undesired source signal to zero, each row of W ini (f) is assumed to look toward one source direction while pointing its zero direction toward the other source direction. Under this assumption, the initial separation matrix W ini (f m ) satisfies the following equation:
- f m represents the frequency of any primary frequency point
- I 2×2 is the 2 × 2 identity matrix
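The condition above can be written out explicitly under a simplifying assumption. The equation itself is missing from the extracted text, so the following is a sketch that assumes a far-field steering vector without attenuation; the patent's own formula may also involve the attenuation parameter, and the symbols a, θ̂₁, θ̂₂ are introduced here only for illustration.

```latex
% Assumed far-field steering vector for microphone spacing d and sound speed c
\mathbf{a}(f,\theta) =
\begin{bmatrix} 1 \\ e^{\,j 2\pi f d \sin\theta / c} \end{bmatrix},
\qquad
% Null-beamformer condition: unit gain toward one source, zero gain toward the other
\mathbf{W}_{\mathrm{ini}}(f_m)
\begin{bmatrix} \mathbf{a}(f_m,\hat\theta_1) & \mathbf{a}(f_m,\hat\theta_2) \end{bmatrix}
= \mathbf{I}_{2\times 2}
\;\;\Longrightarrow\;\;
\mathbf{W}_{\mathrm{ini}}(f_m) =
\begin{bmatrix} \mathbf{a}(f_m,\hat\theta_1) & \mathbf{a}(f_m,\hat\theta_2) \end{bmatrix}^{-1}
```

Row 1 of W ini (f m ) then has unit response toward θ̂₁ and a zero toward θ̂₂, and row 2 the reverse.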
- Step 6 Use the initial separation matrix to perform ICA iteration on the primary frequency points to obtain the separation matrix of the primary frequency points, and estimate the DOA information of the source signal again.
- The accurate initial separation matrix derived from the DOA is used to iterate on the primary frequency points according to formula (13).
- The DOA information of the source signals is then estimated again from the resulting separation matrices; it is used to solve the uncertainty of the arrangement order of the signals and to calculate the separation matrices of the unselected frequency points, which completes the separation of the unselected frequency points.
- Step 7 Perform outlier detection on the DOA information of each source signal, move the detected outliers into the unselected frequency point set, and complete the secondary frequency point selection.
- The DOA information of one of the source signals estimated in one experiment is shown in Figure 4; the true incident angle of the corresponding source signal is 0°. The figure shows that the histogram is approximately normally distributed. Frequency points whose estimates deviate strongly from the mean of about 0° are regarded as outliers and should be classified as unselected frequency points. For the primary frequency points, the DOA information of each source signal is checked in this way; the detected outliers are moved to the unselected frequency point set, and the remaining frequency points are the finally selected frequency points.
- the average value of the DOA of the l-th source signal in the final frequency point set can be calculated as:
- N f is the number of frequency points finally selected.
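The secondary selection in Step 7 can be sketched with a simple mean-and-standard-deviation rule. The text says only that the detection is based on the normal distribution, so the cutoff of `k` standard deviations (here `k = 2.0`) and the function name are assumptions.

```python
import numpy as np

def remove_doa_outliers(doa, k=2.0):
    """Flag DOA estimates within k standard deviations of the mean.

    doa : per-frequency-point DOA estimates (degrees) of one source signal
    Returns a boolean keep-mask and the mean DOA over the kept frequency points.
    """
    doa = np.asarray(doa, dtype=float)
    mu, sigma = doa.mean(), doa.std()
    keep = np.abs(doa - mu) <= k * sigma
    return keep, doa[keep].mean()

# Demo: 20 estimates near 0 degrees plus one clear outlier at 60 degrees.
doa = np.array([0.5, -0.5] * 10 + [60.0])
keep, doa_mean = remove_doa_outliers(doa)
```

With two sources, one reasonable choice is to keep a frequency point only if it passes the check for both sources (a logical AND of the two masks); the text does not spell this out.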
- Step 8 Construct the mixing matrix based on the DOA information after the outliers are removed, and solve for the separation matrices of the unselected frequency points from the mixing matrix.
- the mixing matrix can be expressed by the DOA of the source signal as:
- ⁇ 1 and ⁇ 2 are the DOA estimated values from the first source signal and the second source signal, respectively.
- The separation matrix of the unselected frequency points can be obtained by inverting the mixing matrix:
- W us (f) is the separation matrix of the unselected frequency points
- inv(·) denotes matrix inversion
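Step 8 can be sketched as building a mixing matrix from the averaged DOA values at each unselected frequency point and inverting it. The sketch assumes a far-field model without attenuation and a particular phase sign convention; the patent's mixing matrix may include attenuation terms that are not recoverable from the extracted text. Function names are hypothetical.

```python
import numpy as np

C = 340.0    # speed of sound (m/s)
D = 0.15     # microphone spacing (m)

def mixing_matrix(freq, doas_deg):
    """Assumed 2x2 far-field mixing matrix: rows are microphones, columns are sources."""
    phase = 2j * np.pi * freq * D * np.sin(np.deg2rad(doas_deg)) / C
    return np.vstack([np.ones(len(doas_deg)), np.exp(phase)])

def unselected_separation_matrices(freqs, doas_deg):
    """W_us(f) = inv(A(f)) for every unselected frequency point; no ICA iteration needed."""
    return np.stack([np.linalg.inv(mixing_matrix(f, doas_deg)) for f in freqs])

# Demo with two hypothetical unselected frequencies and DOAs of 0 and 30 degrees.
freqs = [2000.0, 4000.0]
doas = [0.0, 30.0]
W_us = unselected_separation_matrices(freqs, doas)
```

Under this model the matrix is singular at f = 0 Hz, and at any frequency where the two steering vectors coincide, so those frequency points would need separate handling.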
- Step 9 Use the method of estimating the DOA information of the signal to solve the problem of permutation uncertainty.
- Figure 5 shows the directivity pattern of the source signal in an experiment where the position of the source signal is (2,3) at the 35th frequency point before solving the arrangement uncertainty problem.
- Figure 6 shows the directivity pattern of the source signal in an experiment where the position of the source signal is (2,3) at the 35th frequency point after solving the arrangement uncertainty problem.
- the DOA of the first source signal is 30°
- the DOA of the second source signal is 0°. From Figure 4-7, we can see that the angle corresponding to the first source signal s 1 (f,t) is 0°, and the angle corresponding to the second source signal s 2 (f,t) is 30°.
- the problem of disorderly arrangement is solved.
- the method of clustering by using the DOA information of the source signal solves the problem of arrangement uncertainty as shown in Figure 6, so that the separation results of the same mixed signal at different frequency points are kept consistent.
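The DOA-based alignment can be sketched as a per-frequency row reordering. It assumes that a DOA has already been estimated for each output row at each frequency point, and it uses the smaller-angle-first convention from the DOA estimation step. The function name and the array layout are hypothetical.

```python
import numpy as np

def align_permutation(W, doa_per_output):
    """Reorder the rows of each separation matrix so outputs are sorted by DOA.

    W              : (F, 2, 2) separation matrices
    doa_per_output : (F, 2) DOA (degrees) associated with each output row
    """
    order = np.argsort(doa_per_output, axis=1)                    # (F, 2) row order per bin
    W_aligned = np.take_along_axis(W, order[:, :, None], axis=1)  # permute rows
    doa_aligned = np.take_along_axis(doa_per_output, order, axis=1)
    return W_aligned, doa_aligned

# Demo: the second frequency point has its outputs swapped.
W = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]], dtype=complex)
doa = np.array([[0.0, 30.0],
                [30.0, 0.0]])
W_aligned, doa_aligned = align_permutation(W, doa)
```

After alignment, output 1 corresponds to the same source direction at every frequency point, which is what keeps the separation results consistent across frequency.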
- Step 10 Use the principle of minimum distortion to solve the problem of amplitude uncertainty.
- diag(·) means to take the elements on the main diagonal.
- the initial separated signal at each frequency point can be expressed as:
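Steps 10 and the separated-signal expression can be sketched together. The formulas are missing from the extracted text, so this assumes the standard minimal distortion form W(f) ← diag(W(f)⁻¹)·W(f), followed by y(f, t) = W(f)·x(f, t). The function names are illustrative.

```python
import numpy as np

def minimal_distortion(W):
    """Rescale each row of W by the matching diagonal element of inv(W).

    W : (F, M, M) separation matrices for all frequency points
    """
    W_inv = np.linalg.inv(W)
    scale = np.einsum('fii->fi', W_inv)     # main diagonal of each inverse, shape (F, M)
    return scale[:, :, None] * W            # same as diag(inv(W)) @ W per frequency point

def apply_separation(W, X):
    """Separated signals per frequency point: (F, M, M) @ (F, M, T) -> (F, M, T)."""
    return W @ X

# Demo: two separation matrices differing only by an arbitrary row scaling
# give the same result after the minimal distortion correction.
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 2, 2)) + 1j * rng.standard_normal((3, 2, 2))
d1 = rng.uniform(0.5, 2.0, size=(3, 2))
d2 = rng.uniform(0.5, 2.0, size=(3, 2))
W1 = d1[:, :, None] * np.linalg.inv(A)
W2 = d2[:, :, None] * np.linalg.inv(A)
X = rng.standard_normal((3, 2, 10)) + 1j * rng.standard_normal((3, 2, 10))
Y = apply_separation(minimal_distortion(W1), X)
```

The demo shows why this resolves the amplitude uncertainty: the arbitrary scaling that ICA leaves on each row cancels out.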
- Step 11 Perform inverse Fourier transform according to the separation matrix of all frequency points to reconstruct the separated signal.
- the purpose of this embodiment is to provide a blind source separation system based on initial frequency point selection of the separation matrix.
- the system includes:
- a data acquisition module, which acquires the audio signal to be separated;
- a data preprocessing module, which converts the audio signal to be separated into the frequency domain;
- a DOA information estimation module, which performs an ICA iteration on the frequency points in the frequency range where spatial aliasing does not occur to obtain a separation matrix, and estimates the DOA information of each source signal based on the separation matrix;
- a primary frequency point selection module, which, at each frequency point in the entire frequency range, selects frequency points according to the determinant of the mixed signal covariance matrix and assigns them to the primary frequency point set;
- a separation matrix initialization module, which uses the DOA information of the source signals for initialization to obtain the initial separation matrix used in the ICA iteration on the primary frequency points;
- a selected frequency point separation module, which uses the initial separation matrix to perform ICA iteration on the primary frequency points, obtains the separation matrices of the primary frequency points, and estimates the DOA information of the source signals again;
- a secondary frequency point selection module, which performs outlier detection on the DOA information of each source signal and removes the detected outliers to complete the secondary frequency point selection, where the outlier detection uses a normal-distribution-based outlier detection method;
- an unselected frequency point separation module, which constructs the separation matrices of the unselected frequency points based on the re-estimated DOA information;
- the signal reconstruction module performs inverse Fourier transform according to the separation matrix of all frequency points to reconstruct the separated signal.
- The purpose of this embodiment is to provide a binaural hearing aid system, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the described blind source separation method based on separation matrix initialization frequency point selection.
- the reverberation room setup used in the simulation experiment is shown in Figure 7.
- the room size is 5.73m*3.56m*2.7m
- the distance between the two microphones is 15cm
- the height is 1.35m.
- the voice signal can be incident from 5 different angles.
- four simulation experiments are set up from different angles, and their corresponding angles are (30°, 0°) ,(30°,-40°),(30°,-80°),(70°,-80°)
- the corresponding source signal positions are (2,3),(2,4),(2, 5),(1,5).
- The source signals used in the experiment are English male and female voices selected from the open speech corpus VoxForge, and they are processed into speech signals 3 s long to ensure the consistency of the experimental data.
- The signal received by the microphone is the convolution of the source voice signal with the impulse response produced by the interaction of the source, the sensor, and the surrounding environment.
- This article uses the image source (mirror source) method to generate the room impulse response.
- Reverberation time (RT) is defined as the time required for the energy of the voice signal to decay by 60 dB.
- The reflection and absorption coefficients can be changed indirectly by changing the materials of the walls, floor, and ceiling to obtain different RTs.
- When RT > 0 ms, the speech signal is convolved with the room impulse response to simulate the mixing process in a reverberant environment.
- different RTs will be set for simulation experiments.
- The sampling frequency of the voice signals used in the simulation experiment is 16 kHz
- the frame length is 512
- the frame shift is 256
- A Hamming window is used for the short-time Fourier transform. All simulation experiments are run on a computer with an Intel(R) Xeon(R) E5-2643 v4 CPU @ 3.40 GHz and 128 GB of memory, and the software platform is MATLAB 2015b.
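The analysis settings above translate into a short-time Fourier transform along these lines. This is a NumPy sketch rather than the MATLAB code used in the experiments; the function name and the choice to drop the incomplete last frame are assumptions.

```python
import numpy as np

FS = 16000          # sampling frequency (Hz)
FRAME_LEN = 512     # frame length in samples
FRAME_SHIFT = 256   # frame shift in samples

def stft(x):
    """Hamming-windowed STFT; returns an array of shape (bins, frames)."""
    window = np.hamming(FRAME_LEN)
    n_frames = 1 + (len(x) - FRAME_LEN) // FRAME_SHIFT
    frames = np.stack([x[i * FRAME_SHIFT:i * FRAME_SHIFT + FRAME_LEN] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

# A 3 s signal at 16 kHz, matching the length of the test utterances.
x = np.zeros(3 * FS)
X = stft(x)
print(X.shape)  # (257, 186)
```

A 512-point real FFT yields 257 bins including DC and Nyquist, spaced 31.25 Hz apart; the text counts 256 frequency points, which presumably omits one of the edge bins.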
- the non-reverberation mixing model is very simple. You only need to set the relative position of the source signal and the microphone.
- the signal received by the microphone is just a simple first-order weighted summation of the source signal, that is, the number of taps of the room impulse response is 1. Therefore, the amplitude response of the mixing matrix has nothing to do with frequency, and the phase response has a linear relationship with frequency. Therefore, the actual values of the relative attenuation and delay parameters are equal at any frequency point.
- RT is set to 0 ms.
- Figure 8 shows the room impulse response from the first source signal to the first microphone in an experiment.
- the convolutional mixing model is relatively complicated.
- the signal received by the microphone is the convolution of the source signal and the impulse response of the room.
- The more taps the impulse response has, the more severe the reverberation of the room.
- RT is set to 100 ms.
- Figure 9 shows the room impulse response from the first source signal to the first microphone.
- The noise reduction rate (NRR) is defined as the output signal-to-noise ratio (SNR) minus the input SNR, both in dB.
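The performance index can be computed as sketched below. It assumes that the target and interference components are available separately at the input and at the output, which is normally only the case in simulation; the function names are illustrative.

```python
import numpy as np

def snr_db(target, interference):
    """Signal-to-noise ratio in dB from separate target and interference components."""
    p_target = np.mean(np.abs(target) ** 2)
    p_interf = np.mean(np.abs(interference) ** 2)
    return 10.0 * np.log10(p_target / p_interf)

def noise_reduction_rate(target_in, interf_in, target_out, interf_out):
    """NRR in dB: output SNR minus input SNR."""
    return snr_db(target_out, interf_out) - snr_db(target_in, interf_in)

# Demo: the interference amplitude is reduced by a factor of 10, the target is unchanged.
target = np.ones(100)
interf_in = np.ones(100)
interf_out = 0.1 * np.ones(100)
nrr = noise_reduction_rate(target, interf_in, target, interf_out)
```

In this demo the input SNR is 0 dB and the output SNR is 20 dB, so the NRR is 20 dB.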
- the separation matrix, the mixing matrix A(f) is a description of the room impulse response expressed in the frequency domain.
- The number of primary frequency points must balance the complexity reduction against the overall separation performance of the algorithm.
- The number of primary frequency points cannot be too large, otherwise the reduction in complexity is diminished.
- The number of primary frequency points cannot be too small either, otherwise the estimated normalized attenuation and delay parameters may be inaccurate, and the separation performance at the unselected frequency points may be reduced.
- the curve of the average value of the mixed signal covariance matrix determinant with frequency is shown in Figure 11. It can reflect the energy distribution of the speech signal to a certain extent. Since the energy of the speech signal is concentrated in the low frequency region, it can be expected that the separation performance of these frequency points is better.
- the total number of frequency points is 256.
- The curve of the average number of primary frequency points versus the threshold is shown in Figure 12. Because only frequency points whose determinant exceeds the threshold are selected, the number of primary frequency points increases as the threshold is lowered. It can be expected that the separation performance of the algorithm will also increase with the number of selected frequency points.
- the algorithm can set different thresholds as needed to meet different performance requirements.
- the initial frequency points of the separation matrix initialization frequency selection FDBSS algorithm proposed in this paper account for 4.81% of the total frequency points, the running time is reduced by 84.4%, and the performance index NRR increases by 44.16%.
- the algorithm proposed in this paper not only greatly reduces the computational complexity, but also significantly improves the separation performance.
- the separation matrix initialization frequency selection FDBSS algorithm proposed in this paper greatly reduces the computational complexity by improving these two aspects.
- We select only a few frequency points with good separation performance for ICA iteration. The separation matrices of the many unselected frequency points are simple to calculate and do not require ICA iteration.
- The separation matrices of the unselected frequency points are estimated from the already ordered DOA parameters, so no sorting uncertainty problem arises for them. The computational complexity is therefore reduced again.
- Table 2 compares the NRR and running time of the separation matrix initialization frequency selection FDBSS algorithm proposed in this paper with those of the traditional FDBSS algorithm.
- the values in Table 2 are the average of the results of 1000 experiments.
- One or more embodiments of the present disclosure propose a method for fast blind separation of speech signals based on frequency point selection of separation matrix initialization.
- a frequency point selection is performed within the range.
- When the traditional ICA algorithm is used for separation in the frequency domain, the convergence and separation performance of the algorithm are not ideal if the separation matrix is not well initialized. Therefore, we use the DOA information of the source signals to initialize the separation matrix of each selected frequency point, and then perform ICA iteration to obtain the separation matrix.
- The primary frequency point selection may select frequency points with poor separation performance.
- the average value of DOA information obtained from the final selected frequency points is used to construct the separation matrix of the unselected frequency points and solve the sorting uncertainty problem.
- the problem of amplitude uncertainty is solved for the separation matrix of all frequency points, and the initial separation of the mixed signal is completed.
- the above technical solution provides a blind source separation method suitable for binaural hearing aid systems, which uses separation matrix initialization to reduce the number of iterations and accelerate the convergence speed of the algorithm;
- a two-stage frequency point selection algorithm is used to select frequency points with good separation performance, which reduces the number of frequency points for performing ICA iteration, thereby reducing the amount of calculation to calculate the separation matrix;
- The running time of the proposed separation matrix initialization frequency selection FDBSS method is significantly shortened in both non-reverberant and reverberant environments. At the same time, the separation performance is improved.
- The modules or steps of the present disclosure can be implemented by a general-purpose computing device. Alternatively, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they can be fabricated separately as individual integrated circuit modules, or multiple modules or steps among them can be fabricated as a single integrated circuit module.
- the present disclosure is not limited to any specific combination of hardware and software.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
Claims (10)
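- A blind source separation method based on separation matrix initialization and frequency point selection, characterized by comprising the following steps: acquiring an audio signal to be separated, and performing a Fourier transform on the audio signal to be separated; performing ICA iteration once on the frequency points within the frequency range in which spatial aliasing does not occur, to obtain separation matrices, and estimating the DOA information of each source signal based on the separation matrices; at each frequency point over the entire frequency range, performing frequency point selection according to the determinant of the covariance matrix of the mixed signals, and assigning the selected frequency points to a primary-selected frequency point set; performing initialization using the DOA information of the source signals to obtain initial separation matrices; then performing ICA iteration on the primary-selected frequency points using the initial separation matrices, to obtain the separation matrices of the primary-selected frequency points, and re-estimating the DOA information of the source signals; solving the permutation uncertainty problem and constructing the separation matrices of the unselected frequency points based on the re-estimated DOA information; and performing an inverse Fourier transform according to the separation matrices of all frequency points, to reconstruct the separated signals.
- The blind source separation method based on separation matrix initialization and frequency point selection according to claim 1, characterized in that performing frequency point selection according to the determinant of the covariance matrix of the mixed signals comprises: for each frequency point over the entire frequency range, calculating and normalizing the determinant of the covariance matrix of the mixed signals, assigning the frequency points whose normalized determinant value is greater than a set value to the primary-selected frequency point set, and assigning the remaining frequency points to an unselected frequency point set.
- The blind source separation method based on separation matrix initialization and frequency point selection according to claim 1, characterized in that estimating the DOA information of each source signal based on the separation matrices comprises: for each frequency point, obtaining a directivity pattern by multiplying the array weights of the corresponding separation matrix by the steering vector; and collecting statistics on the null directions in the directivity patterns to estimate the DOA information of each source signal.
- The blind source separation method based on separation matrix initialization and frequency point selection according to claim 1, characterized in that, after the DOA information of the source signals is re-estimated, outlier detection is further performed according to the DOA information of each source signal, and the detected outliers are removed to complete the secondary frequency point selection; wherein the outlier detection uses an outlier detection method based on the normal distribution.
- The blind source separation method based on separation matrix initialization and frequency point selection according to claim 4, characterized in that constructing the separation matrices of the unselected frequency points based on the re-estimated DOA information comprises: constructing a mixing matrix based on the DOA information after outlier removal; and inverting the mixing matrix to obtain the separation matrices of the unselected frequency points.
- The blind source separation method based on separation matrix initialization and frequency point selection according to claim 4, characterized in that the permutation uncertainty problem is solved as follows: for the directivity patterns of the selected frequency points, the source signals are clustered according to the pointing of the null directions, so that each source signal separated at different frequency points corresponds to the same DOA.
- The blind source separation method based on separation matrix initialization and frequency point selection according to claim 1, characterized in that the minimal distortion principle is applied to the separation matrices of all frequency points to solve the amplitude uncertainty problem.
- A blind source separation system based on separation matrix initialization and frequency point selection, characterized by comprising: a data acquisition module, which acquires an audio signal to be separated; a data preprocessing module, which converts the audio signal to be separated into the frequency domain; a DOA information estimation module, which performs ICA iteration once on the frequency points within the frequency range in which spatial aliasing does not occur, to obtain separation matrices, and estimates the DOA information of each source signal based on the separation matrices; a primary frequency point selection module, which, at each frequency point over the entire frequency range, performs frequency point selection according to the determinant of the covariance matrix of the mixed signals and assigns the selected frequency points to a primary-selected frequency point set; a separation matrix initialization module, which performs ICA iteration on the primary-selected frequency points and performs initialization using the DOA information of the source signals to obtain initial separation matrices; a frequency point separation module, which performs ICA iteration on the primary-selected frequency points using the initial separation matrices, to obtain the separation matrices of the primary-selected frequency points, re-estimates the DOA information of the source signals, and constructs the separation matrices of the unselected frequency points based on the re-estimated DOA information; and a signal reconstruction module, which performs an inverse Fourier transform according to the separation matrices of all frequency points, to reconstruct the separated signals.
- A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the blind source separation method based on separation matrix initialization and frequency point selection according to any one of claims 1-7.
- A binaural hearing aid system, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the blind source separation method based on separation matrix initialization and frequency point selection according to any one of claims 1-7.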
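The two selection stages recited in the claims above (determinant-based primary selection, then removal of DOA outliers under a normal-distribution assumption) can be sketched as follows. This is only an illustrative sketch: the claims do not state how the determinant is normalized or which threshold is used, so dividing by the maximum over frequency points, the `threshold` value, and the `n_sigma` rule are assumptions, as are the function names.

```python
import numpy as np

def select_bins_by_determinant(X, threshold=0.05):
    """Primary selection of frequency points.

    X: STFT of the mixed signals, shape (n_bins, n_mics, n_frames).
    Returns a boolean mask over frequency points.
    """
    n_frames = X.shape[-1]
    # Sample covariance matrix of the mixtures at every frequency point.
    R = X @ X.conj().transpose(0, 2, 1) / n_frames
    det = np.abs(np.linalg.det(R))
    peak = det.max()
    if peak == 0:
        return np.zeros(det.shape, dtype=bool)
    # Assumed normalization: divide by the largest determinant.
    return (det / peak) > threshold

def remove_doa_outliers(doas_deg, n_sigma=2.0):
    """Secondary selection: keep DOA estimates within n_sigma standard
    deviations of the mean (normal-distribution assumption).

    doas_deg: DOA estimates of one source, one value per selected point.
    Returns a boolean mask of inliers.
    """
    doas = np.asarray(doas_deg, dtype=float)
    mu, sigma = doas.mean(), doas.std()
    if sigma == 0:
        return np.ones(doas.shape, dtype=bool)
    return np.abs(doas - mu) <= n_sigma * sigma

# Hypothetical DOA estimates (degrees) for one source across selected points.
inliers = remove_doa_outliers([30, 31, 29, 30, 30, 31, 29, 80])
```

Frequency points whose mixture covariance is nearly singular carry little usable spatial information, which is why a small normalized determinant is treated here as a reason to skip ICA at that point.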
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010161022.1 | 2020-03-10 | ||
CN202010161022.1A CN111415676B (zh) | 2020-03-10 | 2020-03-10 | Blind source separation method and system based on separation matrix initialization frequency point selection |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021179416A1 (zh) | 2021-09-16 |
Family
ID=71492893
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/087639 WO2021179416A1 (zh) | 2020-03-10 | 2020-04-29 | Blind source separation method and system based on separation matrix initialization frequency point selection |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111415676B (zh) |
WO (1) | WO2021179416A1 (zh) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
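CN114220453A (zh) * | 2022-01-12 | 2022-03-22 | 中国科学院声学研究所 | Multi-channel non-negative matrix factorization method and system based on frequency-domain convolutive transfer function |
CN114333897A (zh) * | 2022-03-14 | 2022-04-12 | 青岛科技大学 | BrBCA blind source separation method based on multi-channel noise variance estimation |
CN116935883A (zh) * | 2023-09-14 | 2023-10-24 | 北京探境科技有限公司 | Sound source localization method and apparatus, storage medium, and electronic device |
CN117560663A (zh) * | 2024-01-12 | 2024-02-13 | 数海信息技术有限公司 | Information interaction method and system based on 5G messages |
CN117609746A (zh) * | 2023-11-22 | 2024-02-27 | 江南大学 | Blind source separation estimation method based on machine learning and clustering algorithm |
CN117609746B (zh) * | 2023-11-22 | 2024-06-07 | 江南大学 | Blind source separation estimation method based on machine learning and clustering algorithm |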
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
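CN112285641B (zh) * | 2020-09-16 | 2023-12-29 | 西安空间无线电技术研究所 | ICA-based direction of arrival (DOA) estimation method and device |
CN112349292B (zh) * | 2020-11-02 | 2024-04-19 | 深圳地平线机器人科技有限公司 | Signal separation method and apparatus, computer-readable storage medium, and electronic device |
CN112633427B (zh) * | 2021-03-15 | 2021-05-28 | 四川大学 | Ultra-high-order harmonic emission signal detection method based on outlier detection |
CN113660594B (zh) * | 2021-08-21 | 2024-05-17 | 武汉左点科技有限公司 | Self-adjusting noise reduction method and device for a hearing aid system |
CN113804981B (zh) * | 2021-09-15 | 2022-06-24 | 电子科技大学 | Time-frequency joint optimization multi-source multi-channel signal separation method |
CN113783813B (zh) * | 2021-11-11 | 2022-02-08 | 煤炭科学技术研究院有限公司 | Method and apparatus for processing 5G communication signal interference, electronic device, and storage medium |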
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090222262A1 (en) * | 2006-03-01 | 2009-09-03 | The Regents Of The University Of California | Systems And Methods For Blind Source Signal Separation |
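CN108735227A (zh) * | 2018-06-22 | 2018-11-02 | 北京三听科技有限公司 | Method and system for sound source separation of speech signals picked up by a microphone array |
CN109616138A (zh) * | 2018-12-27 | 2019-04-12 | 山东大学 | Blind speech signal separation method based on segmented frequency point selection and binaural hearing aid system |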
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
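JP2007033804A (ja) * | 2005-07-26 | 2007-02-08 | Kobe Steel Ltd | Sound source separation device, sound source separation program, and sound source separation method |
CN101667425A (zh) * | 2009-09-22 | 2010-03-10 | 山东大学 | Method for blind source separation of convolutively mixed speech signals |
CN106057210B (zh) * | 2016-07-01 | 2017-05-10 | 山东大学 | Fast speech blind source separation method based on frequency point selection under binaural spacing |
CN108364659B (zh) * | 2018-02-05 | 2021-06-01 | 西安电子科技大学 | Frequency-domain convolutive blind signal separation method based on multi-objective optimization |
CN110010148B (zh) * | 2019-03-19 | 2021-03-16 | 中国科学院声学研究所 | Low-complexity frequency-domain blind separation method and system |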
2020
- 2020-03-10 CN CN202010161022.1A patent/CN111415676B/zh active Active
- 2020-04-29 WO PCT/CN2020/087639 patent/WO2021179416A1/zh active Application Filing
Non-Patent Citations (2)
Title |
---|
HIROSHI SARUWATARI , TOSHIYA KAWAMURA , TSUYOKI NISHIKAWA , AKINOBU LEE , KIYOHIRO SHIKANO: "Blind Source Separation Based on a Fast-Convergence Algorithm Combining ICA and Beamforming", IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 14, no. 2, 1 March 2006 (2006-03-01), pages 666 - 678, XP008131945, ISSN: 1558-7916, DOI: 10.1109/TSA.2005.855832 * |
LIU BAIYUN; WEI YING: "A fast blind source separation algorithm for binaural hearing aids based on frequency bin selection", 2018 IEEE 23RD INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 19 November 2018 (2018-11-19), pages 1 - 5, XP033512516, ISSN: 2165-3577, DOI: 10.1109/ICDSP.2018.8631688 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
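CN114220453A (zh) * | 2022-01-12 | 2022-03-22 | 中国科学院声学研究所 | Multi-channel non-negative matrix factorization method and system based on frequency-domain convolutive transfer function |
CN114333897A (zh) * | 2022-03-14 | 2022-04-12 | 青岛科技大学 | BrBCA blind source separation method based on multi-channel noise variance estimation |
CN116935883A (zh) * | 2023-09-14 | 2023-10-24 | 北京探境科技有限公司 | Sound source localization method and apparatus, storage medium, and electronic device |
CN116935883B (zh) * | 2023-09-14 | 2023-12-29 | 北京探境科技有限公司 | Sound source localization method and apparatus, storage medium, and electronic device |
CN117609746A (zh) * | 2023-11-22 | 2024-02-27 | 江南大学 | Blind source separation estimation method based on machine learning and clustering algorithm |
CN117609746B (zh) * | 2023-11-22 | 2024-06-07 | 江南大学 | Blind source separation estimation method based on machine learning and clustering algorithm |
CN117560663A (zh) * | 2024-01-12 | 2024-02-13 | 数海信息技术有限公司 | Information interaction method and system based on 5G messages |
CN117560663B (zh) * | 2024-01-12 | 2024-03-12 | 数海信息技术有限公司 | Information interaction method and system based on 5G messages |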
Also Published As
Publication number | Publication date |
---|---|
CN111415676B (zh) | 2022-10-18 |
CN111415676A (zh) | 2020-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
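WO2021179416A1 (zh) | Blind source separation method and system based on separation matrix initialization frequency point selection | |
CN107452389B (zh) | General-purpose monaural real-time noise reduction method | |
CN109616138B (zh) | Blind speech signal separation method based on segmented frequency point selection and binaural hearing aid system | |
CN107703486B (zh) | Sound source localization method based on convolutional neural network (CNN) | |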
US8363850B2 (en) | Audio signal processing method and apparatus for the same | |
US9654894B2 (en) | Selective audio source enhancement | |
US9570087B2 (en) | Single channel suppression of interfering sources | |
WO2021179424A1 (zh) | Speech enhancement method combined with an AI model, system, electronic device, and medium | |
US20220068288A1 (en) | Signal processing apparatus, signal processing method, and program | |
WO2020224226A1 (zh) | Speech enhancement method based on speech processing and related device | |
Cord-Landwehr et al. | Monaural source separation: From anechoic to reverberant environments | |
WO2019014890A1 (zh) | General-purpose monaural real-time noise reduction method | |
WO2015129760A1 (ja) | Signal processing device, method, and program | |
Pujol et al. | BeamLearning: An end-to-end deep learning approach for the angular localization of sound sources using raw multichannel acoustic pressure data | |
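CN110544490A (zh) | Sound source localization method based on Gaussian mixture model and spatial power spectrum features | |
JP6748304B2 (ja) | Signal processing device using a neural network, signal processing method using a neural network, and signal processing program | |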
Aroudi et al. | Dbnet: Doa-driven beamforming network for end-to-end reverberant sound source separation | |
JP5911101B2 (ja) | Acoustic signal analysis device, method, and program | |
JP6538624B2 (ja) | Signal processing device, signal processing method, and signal processing program | |
Fu et al. | Sparse modeling of the early part of noisy room impulse responses with sparse bayesian learning | |
Higuchi et al. | Unified approach for audio source separation with multichannel factorial HMM and DOA mixture model | |
Dwivedi et al. | Joint doa estimation in spherical harmonics domain using low complexity cnn | |
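CN112802490A (zh) | Beamforming method and device based on a microphone array | |
CN116052702A (zh) | Low-complexity multi-channel dereverberation and noise reduction method based on Kalman filtering | |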
Aroudi et al. | DBNET: DOA-driven beamforming network for end-to-end farfield sound source separation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20924341; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 20924341; Country of ref document: EP; Kind code of ref document: A1 |
| | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.06.2023) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 20924341; Country of ref document: EP; Kind code of ref document: A1 |