WO2021179416A1 - Blind source separation method and system based on frequency point selection with separation matrix initialization - Google Patents

A blind source separation method and system based on frequency point selection with separation matrix initialization

Info

Publication number
WO2021179416A1
Authority
WO
WIPO (PCT)
Prior art keywords
separation
separation matrix
frequency point
frequency
matrix
Prior art date
Application number
PCT/CN2020/087639
Other languages
English (en)
French (fr)
Inventor
Wei Ying (魏莹)
Liu Baiyun (刘百云)
Original Assignee
Shandong University (山东大学)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University (山东大学)
Publication of WO2021179416A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation

Definitions

  • The present disclosure belongs to the technical field of audio signal processing, and in particular relates to a blind source separation method and system based on frequency point selection with separation matrix initialization.
  • Blind source separation (BSS) is the process of separating the original source signals solely from the received mixed signals, based on the statistical properties of the source signals, without knowing any parameters of the source signals or the transmission channel. Because BSS algorithms place few requirements on the source signals and have a very wide range of applications, they have attracted increasing attention from experts and researchers.
  • BSS can suppress interfering voices for speech enhancement while preserving the binaural cues of all sound sources through post-processing.
  • In the cocktail party problem, picking out the sound of interest in a noisy venue is very difficult for hearing-impaired patients. Because of the time delays caused by sound propagation and the multipath caused by sound reflections, the signals received by the microphones in a real reverberant environment are convolutive mixtures of the source signals. Because multichannel convolution is involved, these mixtures are difficult to handle in the time domain: the algorithms converge slowly and have difficulty reaching the global optimum.
  • FDBSS: Frequency Domain Blind Source Separation
  • The computational complexity of the algorithm can be reduced without affecting separation performance in three ways: (a) reducing the number of ICA iterations; (b) reducing the number of frequency points at which ICA iterations are performed; (c) combining (a) and (b), reducing both the number of ICA iterations and the number of frequency points at which they are performed.
  • DOA: Direction of Arrival
  • The DOA information of the unknown source signal is estimated through covariance fitting.
  • Using the estimated DOA information to form an accurate initial separation matrix reduces the number of ICA iterations and speeds up convergence.
  • The present disclosure provides a blind source separation method and system based on frequency point selection with separation matrix initialization.
  • The method initializes the separation matrix using the DOA information of the source signals, which accelerates the convergence of the algorithm and improves separation performance.
  • One or more embodiments of the present disclosure provide the following technical solutions:
  • A blind source separation method based on frequency point selection with separation matrix initialization, comprising the following steps:
  • frequency points are selected according to the determinant of the mixed-signal covariance matrix and placed into the primary frequency point set;
  • One or more embodiments provide a blind source separation system based on frequency point selection with separation matrix initialization, comprising:
  • a data acquisition module, which acquires the audio signal to be separated;
  • a data preprocessing module, which converts the audio signal to be separated into the frequency domain;
  • a DOA information estimation module, which performs a first round of ICA iterations on the frequency points within the frequency range where spatial aliasing does not occur to obtain separation matrices, and estimates the DOA information of each source signal based on those separation matrices;
  • a primary frequency point selection module, which, at each frequency point in the entire frequency range, selects frequency points according to the mixed-signal covariance matrix and places them into the primary frequency point set;
  • a separation matrix initialization module, which initializes with the DOA information of the source signals to obtain the initial separation matrix used for ICA iteration on the primary frequency points;
  • a frequency point separation module, which performs ICA iterations on the primary frequency points starting from the initial separation matrix to obtain their separation matrices, re-estimates the DOA information of the source signals, and constructs the separation matrices of the unselected frequency points from the re-estimated DOA information;
  • a signal reconstruction module, which performs an inverse Fourier transform according to the separation matrices of all frequency points to reconstruct the separated signals.
  • One or more embodiments provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the blind source separation method based on frequency point selection with separation matrix initialization.
  • One or more embodiments provide a binaural hearing aid system, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When executing the program, the processor implements the blind source separation method based on frequency point selection with separation matrix initialization.
  • The above technical solutions provide a blind source separation method suitable for binaural hearing aid systems.
  • Initializing the separation matrix accelerates the convergence of the algorithm and reduces the computation needed to obtain the separation matrices.
  • In both non-reverberant and reverberant environments, the running time of the proposed separation-matrix-initialization frequency-selection FDBSS method is markedly shorter than that of the conventional FDBSS algorithm, while separation performance is also improved.
  • Figure 1 is a flowchart of a blind source separation method based on frequency point selection with separation matrix initialization according to one or more embodiments of the present disclosure;
  • Figure 4 shows the DOA estimates of the source signal whose incident angle is 0° in the simulation experiment;
  • Figure 5 shows the directivity patterns at different frequency points before the permutation ambiguity is resolved in the simulation experiment;
  • Figure 6 shows the directivity patterns at different frequency points after the permutation ambiguity is resolved in the simulation experiment;
  • Figure 7 shows the room setup of the simulation experiment;
  • Figure 11 shows the distribution of the determinant of the normalized covariance matrix over frequency in the simulation experiment;
  • Figure 12 shows the number of initially selected frequency points versus the threshold in the simulation experiment;
  • Figures 13(a) and 13(b) compare the performance of the method of the embodiment with that of the conventional method for different numbers of iterations in the simulation experiment;
  • Figures 14(a) and 14(b) show dN and the percentage reduction in running time versus the threshold for four different pairs of signal arrival directions in the simulation experiment;
  • Figures 15(a) and 15(b) compare the performance of the proposed algorithm with that of the conventional algorithm for different numbers of iterations in the simulation experiment.
  • Blind source separation algorithms have three basic models: the instantaneous mixing model, the non-reverberant mixing model, and the convolutive mixing model.
  • In the instantaneous mixing model, we assume that the mixing of the speech signals is instantaneous, i.e., the differences in the arrival times of different signals at each microphone are negligible.
  • The signal received by each microphone is then a linear mixture of the source signals, which can be expressed as:
  • Expression (1) can be written in matrix-vector form as:
  • where A is the N × M mixing matrix.
  • For separation, W·A must be the identity matrix, so (for N = M) the separation matrix W can be expressed as the inverse of the mixing matrix A.
  • The quantized natural gradient algorithm is an improvement on the Infomax algorithm.
  • From an information-theoretic perspective, the Infomax algorithm passes the separated outputs through a nonlinear function and achieves separation by maximizing the output entropy.
  • The iterative formula for computing the separation matrix with the quantized natural gradient algorithm can be expressed as:
  • The nonlinear function is selected as:
  • where a gain factor adjusts the gain of the nonlinearity, and
  • arg(·) denotes the argument (phase angle) of a complex number.
  • Step 1: Acquire the audio signal to be separated and apply the Fourier transform to it.
  • In the frequency domain, the source signal vector, the mixed signal vector, and the mixing matrix can be expressed as:
  • where the delay parameters describe the arrival delays,
  • the attenuation parameters describe the arrival attenuations,
  • the delay parameter with subscript 12 is the arrival delay of the second source signal, incident from direction θ2, as observed at the first microphone,
  • the attenuation parameter with subscript 12 is the corresponding arrival attenuation of the second source signal, incident from direction θ2, as observed at the first microphone,
  • d is the distance between the microphones, and
  • θ is the DOA of the source signal.
  • Substituting this value into formula (10) gives:
  • Step 2: Perform a first round of ICA (Independent Component Analysis) iterations on the frequency points within the frequency range where spatial aliasing does not occur, to obtain separation matrices. This alias-free range is determined by the distance between the two microphones of the binaural hearing aid. Specifically, the upper limit F_L of the alias-free range can be calculated as F_L = c/(2d),
  • where c is the speed of sound, about 340 m/s, and
  • d is the distance between the microphones, about 15 cm.
  • The frequency range where spatial aliasing does not occur is therefore 0 Hz ≤ f ≤ 1133 Hz (340/(2 × 0.15) ≈ 1133 Hz).
  • Step 3: Estimate the DOA (Direction of Arrival) information of each source signal based on the separation matrices.
  • The steering vector is defined as:
  • The directivity pattern of the separation matrix has nulls in the source directions. When the number of microphones equals the number of source signals, both equal to 2, the nulls at each frequency point lie in only two specific directions, and these null directions represent the DOA information of the source signals.
  • From these nulls the DOA of each sound source can be estimated. We assume that the smaller angle corresponds to the direction of arrival of the first sound source and the larger angle to that of the second sound source. The DOA estimate of the first source signal is then defined as:
  • where N is the number of frequency points in the effective frequency range, and
  • θ_l(f_m) is the DOA estimate of the l-th source signal at the m-th frequency point:
  • max[x, y] (min[x, y]) returns the larger (smaller) of two numbers.
  • DOA estimation plays a central role in this embodiment.
  • First, the DOA estimates are used to initialize the separation matrix; second, they are used to resolve the permutation ambiguity; finally, they are used to compute the separation matrices of the unselected frequency points.
  • The accuracy of the DOA estimates therefore directly affects the stability and convergence of the algorithm.
  • Figures 2(a) and 2(b) show the directivity patterns and DOA estimates of the source signals in one run with the source signals at position (2,3) in a non-reverberant environment.
  • Step 4: Calculate the determinant of the mixed-signal covariance matrix at each frequency point in the entire frequency range, and place the frequency points whose determinant exceeds a set value into the primary frequency point set; this completes the primary frequency point selection.
  • The determinant of the mixed-signal covariance matrix is:
  • where R_s(f) is the covariance matrix of the source signals.
  • Because the source signals are mutually independent,
  • the covariance matrix of the source signals is diagonal and can be expressed as:
  • where p_1(f) and p_2(f) are the powers of the first and second source signals, respectively; the determinant of the covariance matrix can then be expressed as:
  • Step 5: Initialize with the DOA information of the source signals to obtain the initial separation matrix.
  • The DOA information obtained from the separation matrices is used to construct a null beamformer, which forms the initial separation matrix W_ini(f).
  • A null beamformer sets the gain toward the undesired source direction to zero. For the first row of W_ini(f), we assume that the look direction is θ1 and the null points toward θ2; for the second row, the look direction is θ2 and the null points toward θ1. Under this assumption, the initial separation matrix W_ini(f_m) satisfies the following equation:
  • where f_m is the frequency of any primary frequency point, and
  • I_{2×2} is the 2 × 2 identity matrix.
  • Step 6: Starting from the initial separation matrix, perform ICA iterations on the primary frequency points to obtain their separation matrices, and estimate the DOA information of the source signals again.
  • The accurate initial separation matrix derived from the DOA information is used to iterate on the primary frequency points according to formula (13).
  • The DOA information of the source signals is then re-estimated from the resulting separation matrices; it is used to resolve the permutation ambiguity of the signals and to compute the separation matrices of the unselected frequency points, completing the separation at those points.
  • Step 7: Perform outlier detection on the DOA information of each source signal, move the detected outliers into the unselected frequency point set, and thereby complete the secondary frequency point selection.
  • The DOA estimates of one source signal from one run are shown in Figure 4; the true incident angle of that source is 0°. The histogram approximately follows a normal distribution. Frequency points whose DOA deviates from the mean (about 0°) by a large angle are regarded as outliers and are classified as unselected frequency points. This detection is applied to the DOA information of each source signal at the primary frequency points; the detected outliers are moved into the unselected frequency point set, and the remaining frequency points are the final selected frequency points.
  • The mean DOA of the l-th source signal over the final frequency point set can be calculated as:
  • where N_f is the number of finally selected frequency points.
  • Step 8: Construct the mixing matrix from the DOA information after the outliers are removed, and solve for the separation matrices of the unselected frequency points from this mixing matrix.
  • The mixing matrix can be expressed in terms of the source DOAs as:
  • where θ1 and θ2 are the DOA estimates of the first and second source signals, respectively.
  • The separation matrix at an unselected frequency point is obtained by inverting the mixing matrix:
  • where W_us(f) is the separation matrix of the unselected frequency point, and
  • inv(·) denotes matrix inversion.
  • Step 9: Resolve the permutation ambiguity using the estimated DOA information of the signals.
  • Figure 5 shows the directivity patterns of the source signals at the 35th frequency point, before the permutation ambiguity is resolved, in one run with the source signals at position (2,3).
  • Figure 6 shows the directivity patterns of the source signals at the 35th frequency point, after the permutation ambiguity is resolved, in the same run with the source signals at position (2,3).
  • The DOA of the first source signal is 30°, and
  • the DOA of the second source signal is 0°. From Figures 4-7, the angle corresponding to the first source signal s_1(f,t) is 0°, and the angle corresponding to the second source signal s_2(f,t) is 30°.
  • The permutation problem is thus resolved.
  • Clustering by the DOA information of the source signals resolves the permutation ambiguity, as shown in Figure 6, so that the separation results for the same mixed signal are consistent across frequency points.
  • Step 10: Resolve the amplitude (scaling) ambiguity using the minimal distortion principle.
  • diag(·) denotes taking the elements on the main diagonal.
  • The initial separated signal at each frequency point can then be expressed as:
  • Step 11: Perform an inverse Fourier transform according to the separation matrices of all frequency points to reconstruct the separated signals.
  • The purpose of this embodiment is to provide a blind source separation system based on frequency point selection with separation matrix initialization.
  • The system comprises:
  • a data acquisition module, which acquires the audio signal to be separated;
  • a data preprocessing module, which converts the audio signal to be separated into the frequency domain;
  • a DOA information estimation module, which performs a first round of ICA iterations on the frequency points within the frequency range where spatial aliasing does not occur to obtain separation matrices, and estimates the DOA information of each source signal based on those separation matrices;
  • a primary frequency point selection module, which, at each frequency point in the entire frequency range, selects frequency points according to the determinant of the mixed-signal covariance matrix and places them into the primary frequency point set;
  • a separation matrix initialization module, which initializes with the DOA information of the source signals to obtain the initial separation matrix used for ICA iteration on the primary frequency points;
  • a selected-frequency-point separation module, which performs ICA iterations on the primary frequency points starting from the initial separation matrix, obtains their separation matrices, and estimates the DOA information of the source signals again;
  • a secondary frequency point selection module, which performs outlier detection on the DOA information of each source signal, removes the detected outliers, and completes the secondary frequency point selection, the outlier detection being a method based on the normal distribution;
  • an unselected-frequency-point separation module, which constructs the separation matrices of the unselected frequency points from the re-estimated DOA information;
  • a signal reconstruction module, which performs an inverse Fourier transform according to the separation matrices of all frequency points to reconstruct the separated signals.
  • The purpose of this embodiment is to provide a binaural hearing aid system, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor;
  • when executing the program, the processor implements the described blind source separation method based on frequency point selection with separation matrix initialization.
  • The reverberant room used in the simulation experiment is shown in Figure 7.
  • The room measures 5.73 m × 3.56 m × 2.7 m;
  • the distance between the two microphones is 15 cm;
  • and the height is 1.35 m.
  • Speech signals can be incident from 5 different angles.
  • Four simulation experiments are set up with different angle pairs: (30°, 0°), (30°, -40°), (30°, -80°), and (70°, -80°);
  • the corresponding source positions are (2,3), (2,4), (2,5), and (1,5).
  • The source signals are English male and female speech selected from the open speech corpus VoxForge, processed into 3 s segments to keep the experimental data consistent.
  • The signal received by a microphone is the convolution of the source speech signal with the impulse response produced by the interaction of the source, the sensor, and the surrounding environment.
  • The image source method is used to generate the room impulse responses.
  • The reverberation time (RT) is defined as the time required for the energy of the sound to decay by 60 dB.
  • RT: Reverberation Time
  • The reflection and absorption coefficients, and hence the RT, can be changed indirectly by changing the materials of the walls, floor, and ceiling.
  • When RT > 0 ms, the speech signals are convolved with the room impulse responses to simulate the mixing process in a reverberant environment.
  • Different RTs are set for the simulation experiments.
  • The sampling frequency of the speech signals is 16 kHz,
  • the frame length is 512 samples,
  • the frame shift is 256 samples,
  • and a Hamming window is used for the short-time Fourier transform. All simulations were run on a computer with an Intel(R) Xeon(R) E5-2643 v4 @ 3.40 GHz CPU and 128.0 GB of memory, and the software platform is MATLAB 2015b.
  • The non-reverberant mixing model is very simple: only the relative positions of the sources and the microphones need to be set.
  • The signal received by a microphone is simply a first-order weighted sum of the source signals, i.e., the room impulse response has a single tap. Therefore, the amplitude response of the mixing matrix is independent of frequency and its phase response is linear in frequency, so the true values of the relative attenuation and delay parameters are the same at every frequency point.
  • RT is set to 0 ms.
  • Figure 8 shows the room impulse response from the first source signal to the first microphone in one run.
  • The convolutive mixing model is more complicated.
  • The signal received by a microphone is the convolution of the source signal with the room impulse response.
  • The more taps the impulse response has, the more severe the reverberation of the room.
  • RT is set to 100 ms.
  • Figure 9 shows the room impulse response from the first source signal to the first microphone.
  • The noise reduction rate (NRR) is defined as the output signal-to-noise ratio minus the input signal-to-noise ratio, in dB.
  • SNR_out: output signal-to-noise ratio
  • SNR_in: input signal-to-noise ratio
  • In computing the separation matrix, the mixing matrix A(f) is the frequency-domain description of the room impulse responses.
  • The number of initially selected frequency points should balance the reduction in complexity against the overall separation performance of the algorithm.
  • It cannot be too large, otherwise the complexity reduction is diminished.
  • Nor can it be too small, otherwise the estimated normalized attenuation and delay parameters may be inaccurate, degrading the separation performance at the unselected frequency points.
  • Figure 11 shows how the mean determinant of the mixed-signal covariance matrix varies with frequency. To some extent it reflects the energy distribution of the speech signal. Since speech energy is concentrated in the low-frequency region, these frequency points can be expected to separate better.
  • The total number of frequency points is 256.
  • Figure 12 shows how the average number of primary frequency points varies with the threshold. The number of primary frequency points clearly increases as the threshold increases, and the separation performance of the algorithm can be expected to increase with the number of selected frequency points.
  • The threshold can therefore be set as needed to meet different performance requirements.
  • In the proposed separation-matrix-initialization frequency-selection FDBSS algorithm, the primary frequency points account for 4.81% of all frequency points, the running time is reduced by 84.4%, and the performance index NRR increases by 44.16%.
  • The proposed algorithm therefore not only greatly reduces the computational complexity but also significantly improves separation performance.
  • The proposed algorithm reduces the computational complexity through improvements in the following two respects.
  • First, only a few frequency points with good separation performance are selected for ICA iteration; the separation matrices of the majority, the unselected frequency points, are simple to compute and require no ICA iteration.
  • Second, the separation matrices of the unselected frequency points are estimated from the already-aligned DOA parameters, so they have no permutation ambiguity, which reduces the computational complexity further.
  • Table 2 compares the NRR and running time of the proposed separation-matrix-initialization frequency-selection FDBSS algorithm with those of the conventional FDBSS algorithm.
  • The values in Table 2 are averages over 1000 experiments.
  • One or more embodiments of the present disclosure propose a fast blind separation method for speech signals based on frequency point selection with separation matrix initialization.
  • A primary frequency point selection is performed within the frequency range.
  • When the conventional ICA algorithm is used for separation in the frequency domain and the separation matrix is poorly initialized, the convergence and separation performance of the algorithm are unsatisfactory. We therefore use the DOA information of the source signals to initialize the separation matrix at each selected frequency point, and then perform ICA iterations to obtain the separation matrix.
  • The primary frequency point selection may still select frequency points with poor separation performance.
  • The mean DOA information obtained from the finally selected frequency points is used to construct the separation matrices of the unselected frequency points and to resolve the permutation ambiguity.
  • The amplitude ambiguity is resolved for the separation matrices of all frequency points, completing the initial separation of the mixed signals.
  • The above technical solutions provide a blind source separation method suitable for binaural hearing aid systems, which uses separation matrix initialization to reduce the number of iterations and accelerate the convergence of the algorithm;
  • a two-stage frequency point selection algorithm selects frequency points with good separation performance, which reduces the number of frequency points at which ICA iterations are performed and hence the computation needed to obtain the separation matrices;
  • in both non-reverberant and reverberant environments, the running time of the proposed separation-matrix-initialization frequency-selection FDBSS method is markedly shorter than that of the conventional FDBSS algorithm, while separation performance is also improved.
  • The modules or steps of the present disclosure can be implemented by a general-purpose computing device. Alternatively, they can be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they can be fabricated as individual integrated circuit modules; or multiple modules or steps among them can be fabricated as a single integrated circuit module.
  • The present disclosure is not limited to any specific combination of hardware and software.
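Steps 2 and 6 above run ICA iterations at each frequency point with the quantized natural gradient algorithm, but the iteration formula and the nonlinearity are not reproduced in this text. The Python sketch below therefore assumes the standard complex Infomax natural-gradient update W ← W + η(I - E[φ(y)y^H])W with the polar-coordinate nonlinearity φ(y) = tanh(g|y|)·exp(j·arg(y)), which is consistent with the description of a nonlinear gain factor and an argument operator. It does not model whatever the "quantized" variant adds; the step size η, gain g, iteration count, demo data, and function names are hypothetical choices for illustration.

```python
import numpy as np

def polar_nonlinearity(y, g=1.0):
    """phi(y) = tanh(g*|y|) * exp(j*arg(y)): tanh acts on the magnitude, the phase is kept."""
    return np.tanh(g * np.abs(y)) * np.exp(1j * np.angle(y))

def natural_gradient_ica(X, W0=None, eta=0.1, n_iter=300, g=1.0):
    """Complex Infomax ICA with a natural-gradient update at ONE frequency bin.

    Assumed standard form, not the patent's formula (13):
        W <- W + eta * (I - E[phi(y) y^H]) W,   y = W x
    X:  mixtures at one bin, shape (n_mics, n_frames).
    W0: initial separation matrix; identity if None (a DOA-based
        null-beamformer matrix could be passed here instead).
    """
    n, T = X.shape
    W = np.eye(n, dtype=complex) if W0 is None else np.array(W0, dtype=complex)
    I = np.eye(n)
    for _ in range(n_iter):
        Y = W @ X
        R = polar_nonlinearity(Y, g) @ Y.conj().T / T   # sample estimate of E[phi(y) y^H]
        W = W + eta * (I - R) @ W
    return W

# Demo on synthetic super-Gaussian complex sources (placeholder data, not speech).
rng = np.random.default_rng(0)
T = 5000
S = rng.exponential(1.0, (2, T)) * np.exp(2j * np.pi * rng.random((2, T)))
A_demo = np.array([[1.0, 0.6 * np.exp(0.7j)], [0.5 * np.exp(-1.2j), 1.0]])
W = natural_gradient_ica(A_demo @ S)
P = np.abs(W @ A_demo) ** 2                  # power of the global (separation x mixing) matrix
print(P.max(axis=1) / P.sum(axis=1))         # close to 1 when each output holds one source
```

In this sketch, passing a DOA-derived matrix as `W0` is how Step 6 would reuse Step 5; starting nearer the solution is what would allow fewer iterations.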
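Steps 3 and 7 above can be illustrated together. This is a minimal sketch under stated assumptions: the same two-microphone far-field geometry as Step 2 (c = 340 m/s, d = 0.15 m) and a steering vector [1, exp(-j2πf·d·sinθ/c)]^T with microphone 1 as phase reference, which is our assumption because the steering-vector formula is not reproduced here. It locates the null of each row's directivity pattern on a 1° grid, assigns the smaller angle to source 1 and the larger to source 2 as Step 3 describes, and applies a simple mean ± k·σ rule as one possible reading of the normal-distribution outlier test of Step 7 (k = 2 is a hypothetical choice). All function names are hypothetical.

```python
import numpy as np

C, D = 340.0, 0.15             # speed of sound (m/s) and mic spacing (m), as in Step 2
GRID = np.arange(-90, 91)      # candidate DOAs in degrees (1-degree grid: an assumption)

def steering(f, thetas_deg):
    """Assumed far-field 2-mic steering vectors as columns (mic 1 = phase reference, no attenuation)."""
    tau = D * np.sin(np.deg2rad(np.asarray(thetas_deg, dtype=float))) / C
    return np.vstack([np.ones_like(tau), np.exp(-2j * np.pi * f * tau)])

def null_directions(W, f):
    """Angle of minimum gain (the null) of each row's directivity pattern |W a(f, theta)|."""
    gains = np.abs(W @ steering(f, GRID))          # shape (2, len(GRID))
    return GRID[np.argmin(gains, axis=1)].astype(float)

def estimate_doas(W_bins, freqs):
    """Per-bin DOAs: smaller null angle -> source 1, larger -> source 2 (as in Step 3)."""
    nulls = np.array([null_directions(W, f) for W, f in zip(W_bins, freqs)])
    return nulls.min(axis=1), nulls.max(axis=1)

def keep_inliers(doas, k=2.0):
    """Hypothetical normal-distribution outlier test: keep bins within k std of the mean."""
    mu, sigma = doas.mean(), doas.std()
    return np.abs(doas - mu) <= k * sigma

# Demo: ideal separation matrices for sources at -20 and 30 degrees, bins below about 1133 Hz.
freqs = np.array([200.0, 500.0, 800.0, 1000.0])
W_bins = [np.linalg.inv(steering(f, [-20.0, 30.0])) for f in freqs]
theta1, theta2 = estimate_doas(W_bins, freqs)
print(theta1.mean(), theta2.mean())

# Demo of the outlier step on placeholder per-bin DOAs with one stray value.
doas = np.array([29.0, 30.0, 31.0, 30.0, 29.5, 30.5, 80.0])
keep = keep_inliers(doas)
print(keep, doas[keep].mean())                  # the 80-degree bin is moved to the unselected set
```

The grid search is chosen over a closed-form null formula because it works directly on the directivity pattern that Step 3 describes and needs no special handling when the row entries differ in magnitude.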
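Step 10 above applies the minimal distortion principle with diag(·) taking the main-diagonal elements, but the formula itself is not reproduced in this text. The sketch assumes the commonly used form W(f) ← diag(W^{-1}(f))·W(f), under which each output becomes its source as observed at one microphone; this removes the arbitrary per-bin scaling that ICA leaves. The function name and demo matrices are hypothetical.

```python
import numpy as np

def minimal_distortion(W):
    """Fix the ICA scaling ambiguity: W <- diag(W^{-1}) W (assumed standard form).

    Accepts one (n, n) matrix or a stack of shape (n_bins, n, n).
    """
    d = np.diagonal(np.linalg.inv(W), axis1=-2, axis2=-1)   # main diagonal of W^{-1}
    return d[..., :, None] * W                               # scale row l of W by d[l]

# Demo: a separator that is correct up to arbitrary per-row scaling.
A = np.array([[1.0, 0.4 - 0.3j], [0.7j, 0.9]])             # placeholder mixing matrix
W = np.diag([3.0, -0.5j]) @ np.linalg.inv(A)                 # arbitrarily scaled separator
print(np.round(minimal_distortion(W) @ A, 6))                # diagonal, equal to diag(A)
```

Because the result equals diag(A), output l reproduces source l exactly as microphone l received it, which is why this rescaling is described as introducing minimal distortion.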
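The NRR definition above (output SNR minus input SNR, in dB) is simple arithmetic once the target and interference components of the input and output are available separately. In simulation they can be, because each source can be passed through the mixing and separation system on its own; this sketch assumes that decomposition, since how the patent splits the signals is not stated here. The function names and data are hypothetical.

```python
import numpy as np

def snr_db(target, interference):
    """Power ratio of a target component to an interference component, in dB."""
    return 10.0 * np.log10(np.sum(np.abs(target) ** 2) / np.sum(np.abs(interference) ** 2))

def nrr_db(in_target, in_interf, out_target, out_interf):
    """Noise reduction rate: NRR = SNR_out - SNR_in (dB)."""
    return snr_db(out_target, out_interf) - snr_db(in_target, in_interf)

# Demo with placeholder components: the separator keeps 0.9 of the target amplitude
# and leaks 0.1 of the interference amplitude.
rng = np.random.default_rng(0)
s1 = rng.standard_normal(16000)    # target component at the microphone
s2 = rng.standard_normal(16000)    # interference component at the microphone
print(nrr_db(s1, s2, 0.9 * s1, 0.1 * s2))   # ~19.08 dB: 10*log10(0.9**2 / 0.1**2)
```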

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

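The present disclosure discloses a blind source separation method and system based on frequency point selection with separation matrix initialization, comprising: acquiring an audio signal to be separated and performing a Fourier transform on it; performing a first round of ICA iterations on the frequency points within the frequency range where spatial aliasing does not occur to obtain separation matrices, and estimating the DOA information of each source signal; selecting frequency points within the frequency range according to the mixed-signal covariance matrix and placing them into a primary frequency point set; initializing with the DOA information of the source signals to obtain an initial separation matrix for ICA iteration on the primary frequency points; then performing ICA iterations on the primary frequency points starting from the initial separation matrix to obtain their separation matrices, and estimating the DOA information of the source signals again; constructing the separation matrices of the unselected frequency points based on the re-estimated DOA information; and performing an inverse Fourier transform according to the separation matrices of all frequency points to reconstruct the separated signals. By initializing the separation matrix, the present disclosure accelerates the convergence of the algorithm and improves separation performance.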

Description

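Blind source separation method and system based on frequency point selection with separation matrix initialization
Technical Field
The present disclosure belongs to the technical field of audio signal processing, and in particular relates to a blind source separation method and system based on frequency point selection with separation matrix initialization.
Background
The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.
With the development of computer technology and the introduction of the fast Fourier transform, digital signal processing has been widely applied in mobile communications, speech signal processing, biomedical signal processing, and other fields. Blind source separation (BSS) has emerged as a new research direction within digital signal processing. BSS is the process of separating the original source signals solely from the received mixed signals, based on the statistical properties of the source signals, without knowing any parameters of the source signals or of the transmission channel. Because BSS algorithms place few requirements on the source signals and have a very wide range of applications, they have attracted growing attention from experts and researchers.
Notably, BSS can suppress interfering voices for speech enhancement while preserving the binaural cues of all sound sources through post-processing, which gives it great potential in binaural hearing aid systems. Consider the cocktail party problem: picking out the sound of interest in a noisy venue is very difficult for hearing-impaired people. Because of the time delays caused by sound propagation and the multipath caused by sound reflections, the signals received by the microphones in a real reverberant environment are convolutive mixtures of the source signals. Because multichannel convolution is involved, these mixtures are difficult to handle in the time domain, and the algorithms converge slowly and have difficulty reaching the global optimum. One way to simplify convolutive mixing is to transform the task into the frequency domain, where time-domain convolution becomes frequency-domain multiplication. Frequency-domain blind source separation (FDBSS) algorithms obtain a separation matrix by iterating separately at each frequency point, thereby separating the mixed signals. However, the computational complexity of these algorithms is very high, which conflicts with the low-latency and low-power requirements of hearing aid systems. Therefore, the foremost problem in applying FDBSS to binaural hearing aid systems is to reduce its computational complexity.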
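The statement that time-domain convolution becomes frequency-domain multiplication can be checked numerically. The minimal Python/NumPy sketch below uses random placeholder data (not the experimental signals or impulse responses) and compares a direct linear convolution with the inverse FFT of the product of zero-padded spectra. The identity is exact only for full-length, zero-padded transforms; with the short STFT frames used in FDBSS, per-bin multiplication is an approximation that holds best when the frame is long compared with the room impulse response.

```python
import numpy as np

def convolve_via_fft(h, s):
    """Linear convolution computed as a per-bin product of spectra (convolution theorem)."""
    n = len(h) + len(s) - 1            # full linear-convolution length avoids circular wrap-around
    H = np.fft.rfft(h, n)              # zero-padded spectrum of the impulse response
    S = np.fft.rfft(s, n)              # zero-padded spectrum of the source
    return np.fft.irfft(H * S, n)      # multiply bin by bin, then return to the time domain

# Placeholder data: a random "room impulse response" and a random source signal.
rng = np.random.default_rng(0)
h = rng.standard_normal(64)
s = rng.standard_normal(1000)

direct = np.convolve(h, s)             # time-domain convolution
via_fft = convolve_via_fft(h, s)       # frequency-domain multiplication
print(np.max(np.abs(direct - via_fft)))  # only floating-point rounding error remains
```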
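According to previous related work, the computational complexity of the algorithm can be reduced without affecting separation performance in three ways: (a) reducing the number of ICA iterations; (b) reducing the number of frequency points at which ICA iterations are performed; (c) combining (a) and (b), reducing both the number of ICA iterations and the number of frequency points at which they are performed. For a semi-blind system in which the direction of arrival (DOA) of one source signal is known, the DOA of the unknown source signal is estimated by covariance fitting. Using the estimated DOA to form an accurate initial separation matrix reduces the number of ICA iterations and speeds up convergence. Then only frequency points with good separation performance are selected for ICA iteration, which further reduces the computation of the separation matrices. The main limitation of this approach, however, is that it requires the direction of one of the source signals to be known; it is suitable only for semi-blind systems with closely spaced microphones, and applying it directly to binaural hearing aid systems is problematic. Second, the conventional frequency-domain independent component analysis (FDICA) algorithm estimates the separation matrix by iterative optimization at each frequency point and suffers from slow nonlinear convergence. Without a good initial separation matrix, the error between the estimated and actual separation matrices can grow during iteration, so the algorithm diverges and can hardly converge quickly to the global optimum, leading to unsatisfactory final separation performance.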
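The DOA-based initialization discussed above (also used in Step 5 of the embodiment, where W_ini(f_m) applied to the steering vectors of the two sources gives I_{2×2}) can be sketched for a two-microphone array. This is a minimal sketch under assumptions not taken from the patent: far-field plane waves, a steering vector [1, exp(-j2πf·d·sinθ/c)]^T with microphone 1 as reference, and no level difference between the microphones. Under these assumptions the null beamformer reduces to inverting the 2 × 2 steering matrix. The helper names `steering_matrix` and `null_beamformer_init` are hypothetical.

```python
import numpy as np

C = 340.0   # speed of sound in m/s (value given in the embodiment)
D = 0.15    # microphone spacing in m (value given in the embodiment)

def steering_matrix(f, thetas_deg, c=C, d=D):
    """Far-field 2-mic steering vectors a(f, theta) as columns.

    Assumptions (not taken from the patent): plane waves, microphone 1 as
    phase reference, no level difference between microphones, theta
    measured from broadside.
    """
    tau = d * np.sin(np.deg2rad(np.asarray(thetas_deg, dtype=float))) / c
    return np.vstack([np.ones_like(tau), np.exp(-2j * np.pi * f * tau)])

def null_beamformer_init(f, theta1_deg, theta2_deg):
    """Hypothetical helper: initial separation matrix with W @ [a(theta1), a(theta2)] = I.

    Row 1 has unit gain toward theta1 and a null toward theta2; row 2 the
    reverse. Undefined at f = 0 Hz, where the two steering vectors coincide.
    """
    return np.linalg.inv(steering_matrix(f, [theta1_deg, theta2_deg]))

# Demo at 500 Hz with DOAs of 30 and 0 degrees (one of the angle pairs in the experiments).
W_ini = null_beamformer_init(500.0, 30.0, 0.0)
A_doa = steering_matrix(500.0, [30.0, 0.0])
print(np.round(np.abs(W_ini @ A_doa), 6))                  # identity: unit gain on each look direction
print(abs((W_ini[0] @ steering_matrix(500.0, [0.0]))[0]))  # ~0: row 1 nulls the 0-degree source
```

The same inversion of a DOA-based steering matrix also describes Step 8 of the embodiment, where the separation matrices of the unselected frequency points are obtained by inverting the mixing matrix built from the final DOA estimates.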
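Summary of the Invention
To overcome the above deficiencies of the prior art, the present disclosure provides a blind source separation method and system based on frequency point selection with separation matrix initialization. The method initializes the separation matrix using the DOA information of the source signals, which accelerates the convergence of the algorithm and improves separation performance.
To achieve the above objective, one or more embodiments of the present disclosure provide the following technical solutions:
A blind source separation method based on frequency point selection with separation matrix initialization, comprising the following steps:
acquiring an audio signal to be separated, and performing a Fourier transform on the audio signal to be separated;
performing a first round of ICA iterations on the frequency points within the frequency range where spatial aliasing does not occur, to obtain separation matrices, and estimating the DOA information of each source signal based on the separation matrices;
at each frequency point in the entire frequency range, selecting frequency points according to the determinant of the mixed-signal covariance matrix and placing them into the primary frequency point set;
initializing with the DOA information of the source signals to obtain an initial separation matrix for ICA iteration on the primary frequency points; then performing ICA iterations on the primary frequency points starting from the initial separation matrix to obtain the separation matrices of the primary frequency points, and estimating the DOA information of the source signals again;
resolving the permutation ambiguity and constructing the separation matrices of the unselected frequency points based on the re-estimated DOA information;
performing an inverse Fourier transform according to the separation matrices of all frequency points to reconstruct the separated signals.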
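One or more embodiments provide a blind source separation system based on separation matrix initialization and frequency bin selection, comprising:
a data acquisition module, which acquires the audio signal to be separated;
a data preprocessing module, which transforms the audio signal to be separated into the frequency domain;
a DOA information estimation module, which performs one ICA iteration on the frequency bins within the frequency range where spatial aliasing does not occur to obtain separation matrices, and estimates the DOA information of each source signal based on the separation matrices;
a first frequency bin selection module, which, at every frequency bin over the entire frequency range, performs frequency bin selection according to the mixed-signal covariance matrix and places the selected bins into a preselected bin set;
a separation matrix initialization module, which initializes with the DOA information of the source signals to obtain the initial separation matrices used for the ICA iterations on the preselected bins;
a frequency bin separation module, which:
- performs ICA iterations on the preselected bins using the initial separation matrices to obtain the separation matrices of the preselected bins;
- re-estimates the DOA information of the source signals;
- constructs the separation matrices of the unselected bins based on the re-estimated DOA information;
a signal reconstruction module, which performs an inverse Fourier transform according to the separation matrices of all frequency bins to reconstruct the separated signals.
One or more embodiments provide a computer-readable storage medium storing a computer program. When executed by a processor, the program implements the above blind source separation method based on separation matrix initialization and frequency bin selection.
One or more embodiments provide a binaural hearing aid system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the above blind source separation method based on separation matrix initialization and frequency bin selection.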
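The one or more technical solutions above have the following beneficial effects:
The above technical solutions provide a blind source separation method suitable for binaural hearing aid systems. Initializing the separation matrices speeds up convergence and reduces the computation needed to obtain the separation matrices. In both anechoic and reverberant environments, the proposed FDBSS method with separation matrix initialization and frequency bin selection runs markedly faster than the conventional FDBSS algorithm while also improving separation performance.
Brief Description of the Drawings
The accompanying drawings constitute a part of the present disclosure and are provided for a further understanding of it. The exemplary embodiments and their description explain the disclosure and do not constitute an undue limitation on it.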
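FIG. 1 is a flowchart of a blind source separation method based on separation matrix initialization and frequency bin selection provided by one or more embodiments of the present disclosure;
FIG. 2(a) shows the directivity patterns of the two source signals under the anechoic condition RT = 0 ms in the simulation experiment;
FIG. 2(b) shows the DOA estimates of the two source signals under the anechoic condition RT = 0 ms in the simulation experiment;
FIG. 3(a) shows the directivity patterns of the two source signals under the reverberant condition RT = 100 ms in the simulation experiment;
FIG. 3(b) shows the DOA estimates of the two source signals under the reverberant condition RT = 100 ms in the simulation experiment;
FIG. 4 shows the DOA estimates of a source signal whose true incident angle is 0° in the simulation experiment;
FIG. 5 shows the directivity patterns at different frequency bins before the permutation ambiguity is resolved in the simulation experiment;
FIG. 6 shows the directivity patterns at different frequency bins after the permutation ambiguity is resolved in the simulation experiment;
FIG. 7 shows the room setup of the simulation experiment;
FIG. 8 shows the room impulse response from the first source to the first microphone in one trial with RT = 0 ms;
FIG. 9 shows the room impulse response from the first source to the first microphone in one trial with RT = 100 ms;
FIG. 10 shows the room impulse response from the first source to the first microphone in one trial with RT = 200 ms;
FIG. 11 shows the determinant of the normalized covariance matrix as a function of frequency in the simulation experiment;
FIG. 12 shows the number of initially selected frequency bins as a function of the threshold in the simulation experiment;
FIG. 13(a) and FIG. 13(b) compare the performance of the method provided by the embodiments with that of the conventional method under different numbers of iterations in the anechoic environment;
FIG. 14(a) and FIG. 14(b) show dN and the percentage reduction in running time as functions of the threshold for 4 pairs of source directions of arrival;
FIG. 15(a) and FIG. 15(b) compare the performance of the proposed algorithm and the conventional algorithm under different numbers of iterations with RT = 100 ms.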
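Detailed Description of the Embodiments
It should be noted that the following detailed description is exemplary and intended to further explain the present disclosure. Unless otherwise specified, all technical and scientific terms used herein have the same meanings as commonly understood by those of ordinary skill in the art to which the disclosure belongs.
The terminology used here describes specific embodiments only and is not intended to limit the exemplary embodiments of the disclosure. As used herein, singular forms also include plural forms unless the context clearly indicates otherwise. The terms "comprise" and/or "include" in this specification indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
The embodiments of the present disclosure and the features in them may be combined with one another where no conflict arises.
Blind source separation uses three basic mixing models: the instantaneous mixing model, the anechoic mixing model, and the convolutive mixing model. Here we assume that the speech signals are mixed instantaneously, i.e., the differences in arrival time of the signals at the microphones are negligible. The microphone signals are then linear mixtures of the source signals: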
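$$x_i(n)=\sum_{j=1}^{M} a_{ij}\,s_j(n),\qquad i=1,\dots,N \qquad (1)$$
where j = 1, ..., M indexes the source signals, i = 1, ..., N indexes the microphones, and a_ij is the mixing coefficient from source j to microphone i. Expression (1) can be written in matrix-vector form as: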
x(n)=As(n)     (2)
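where x(n) = [x_1(n), ..., x_N(n)]^T is the mixture vector, s(n) = [s_1(n), ..., s_M(n)]^T is the source vector, and A is the N×M mixing matrix. The blind source separation problem is to estimate s(n) and A from the microphone mixtures x(n) alone, with both s(n) and A unknown.
For demixing, we need to find a separation matrix W whose linear transformation of the mixtures x(n),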
y(n)=Wx(n)=WAs(n)     (3)
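is an estimate of the source signals s(n), which separates the mixtures. Here y(n) = [y_1(n), ..., y_M(n)]^T is the vector of separated signals, i.e., the estimate of s(n). Ideally W·A is the identity matrix, so (for N = M) the separation matrix W is the inverse of the mixing matrix A.
In practice, the FDBSS algorithm obtains the separation matrix by running ICA iterations independently at each frequency bin. There is therefore no guarantee that the first separated output is the first source. The permutation and scaling ambiguities of the separated signals must be resolved so that the separated signals approximate the original sources.
As research on BSS has deepened, many algorithms for different scenarios have appeared. The main families are independent component analysis, sparse component analysis, and nonnegative matrix factorization. Solving for the separation matrix in the time domain is complex and converges poorly, so we solve it in the frequency domain. This work uses a quantized natural gradient algorithm to obtain good separation matrices.
The quantized natural gradient algorithm is an improvement on the Infomax algorithm. Infomax approaches separation from an information-theoretic perspective: it passes the outputs of the separation matrix through a nonlinear function and separates the sources by maximizing the output entropy. The iteration formulas for computing the separation matrix with the quantized natural gradient algorithm are: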
Figure PCTCN2020087639-appb-000002
Figure PCTCN2020087639-appb-000003
Figure PCTCN2020087639-appb-000004
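where b denotes the quantization factor, μ the learning step size, and I the identity matrix. Because speech signals have a super-Gaussian distribution, the nonlinear function is chosen as: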
Figure PCTCN2020087639-appb-000005
where η is a factor that adjusts the nonlinear gain and θ(·) denotes the argument (phase angle).
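The iteration formulas and the nonlinearity above survive only as images in this copy. To make the idea concrete, the sketch below implements the standard, non-quantized complex natural-gradient Infomax step W ← W + μ(I − ⟨φ(y)y^H⟩)W at a single frequency bin. It uses the polar nonlinearity φ(y) = tanh(η|y|)·e^{jθ(y)}, a common choice for super-Gaussian speech that uses the same symbols η and θ(·) defined above. It is an assumption-laden stand-in, not the patent's exact update:
- the quantization factor b of the patent's variant is not modeled;
- the expectation is replaced by an average over frames;
- all function names are hypothetical.

```python
import cmath
import math


def phi(y, eta=1.0):
    """Polar nonlinearity tanh(eta*|y|) * exp(j*arg(y)) for one complex sample (assumed form)."""
    return math.tanh(eta * abs(y)) * cmath.exp(1j * cmath.phase(y))


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def natural_gradient_step(W, frames, mu=0.1, eta=1.0):
    """One complex natural-gradient Infomax update at a single frequency bin.

    W      -- 2x2 separation matrix (nested lists of complex numbers)
    frames -- list of observation vectors x(f, t), each a 2-element list
    Returns W + mu * (I - <phi(y) y^H>) W, where <.> is the average over frames.
    """
    T = len(frames)
    # Separated outputs y(f, t) = W(f) x(f, t)
    ys = [[W[i][0] * x[0] + W[i][1] * x[1] for i in range(2)] for x in frames]
    # Frame average of phi(y) y^H
    C = [[sum(phi(y[i], eta) * y[j].conjugate() for y in ys) / T for j in range(2)]
         for i in range(2)]
    # Natural-gradient direction (I - <phi(y) y^H>) W
    G = [[(1.0 if i == j else 0.0) - C[i][j] for j in range(2)] for i in range(2)]
    dW = matmul2(G, W)
    return [[W[i][j] + mu * dW[i][j] for j in range(2)] for i in range(2)]


# Usage: a few updates on toy data, starting from the identity
W = [[1 + 0j, 0j], [0j, 1 + 0j]]
frames = [[1 + 0.5j, 0.2 - 0.1j], [-0.3 + 1j, 0.9 + 0.2j], [0.4 - 0.8j, -1.1 + 0.3j]]
for _ in range(5):
    W = natural_gradient_step(W, frames, mu=0.05)
```

A caller would apply such a step repeatedly at each bin. The starting matrix is where the DOA-based initialization described in Step 5 enters.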
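Embodiment 1
Conventional frequency-domain BSS algorithms suffer from two problems that this embodiment addresses.

First, if the separation matrix is not well initialized, the error between the estimated and actual separation matrices grows during the iterations. The algorithm then diverges, struggles to converge quickly to the global optimum, and yields unsatisfactory final separation performance.

Second, conventional frequency-domain BSS must iteratively solve for a separation matrix at every frequency bin. This is computationally very expensive and suffers from slow nonlinear-optimization convergence, which makes it poorly suited to devices with low-latency requirements.

This embodiment discloses a blind source separation method based on separation matrix initialization and frequency bin selection, comprising the following steps:
Step 1: Acquire the audio signal to be separated and apply a Fourier transform to it.
With two microphones and two sources, and a microphone spacing of 15 cm, the source vector, mixture vector, and mixing matrix in the frequency domain can be written as:
s(f,t) = [s_1(f,t), s_2(f,t)]^T    (8)
x(f,t) = [x_1(f,t), x_2(f,t)]^T     (9)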
Figure PCTCN2020087639-appb-000006
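where τ is a delay parameter and λ is an attenuation parameter.
τ_12 denotes the arrival delay, observed at the first microphone, of the second source arriving from direction θ_2, and λ_12 denotes the corresponding arrival attenuation. d is the microphone spacing and θ is the DOA of a source. Substituting θ into Equation (10) gives: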
Figure PCTCN2020087639-appb-000008
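Step 2: Perform one ICA (Independent Component Analysis) iteration on the frequency bins within the frequency range where spatial aliasing does not occur, to obtain separation matrices. The alias-free frequency range is determined by the distance between the two microphones of the binaural hearing aid. Specifically, the upper limit F_L of the alias-free frequency range can be computed as:
$$F_L=\frac{c}{2d} \qquad (12)$$
where c is the speed of sound, about 340 m/s, and d is the microphone spacing, about 15 cm. In this embodiment the alias-free frequency range is therefore 0 Hz < f < 1133 Hz. One ICA iteration is performed on the bins in this range by iterating the objective function (6), and the separation matrix is computed as: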
Figure PCTCN2020087639-appb-000010
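The alias-free limit in Equation (12) is easy to check numerically. The sketch below takes c = 340 m/s and d = 0.15 m from the text, and the 16 kHz sampling rate and 512-sample frame from the simulation section. It treats the bin frequencies as f_k = k·fs/512 and excludes the DC bin to match 0 Hz < f; both of these are assumptions. The function names are hypothetical.

```python
def alias_free_limit(c=340.0, d=0.15):
    """Upper frequency F_L = c / (2 d) below which a two-microphone pair does not alias spatially."""
    return c / (2.0 * d)


def alias_free_bins(fs=16000, n_fft=512, c=340.0, d=0.15):
    """Indices k of the STFT bins with 0 < f_k < F_L, where f_k = k * fs / n_fft."""
    f_lim = alias_free_limit(c, d)
    df = fs / n_fft  # bin spacing: 31.25 Hz for 16 kHz and 512-sample frames
    return [k for k in range(1, n_fft // 2 + 1) if k * df < f_lim]


print(round(alias_free_limit(), 1))  # 1133.3
bins = alias_free_bins()
print(len(bins), bins[-1])           # 36 36
```

With these values the bin spacing is 31.25 Hz, so only 36 bins (k = 1 to 36) take part in the initial ICA pass.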
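Step 3: Estimate the DOA (Direction of Arrival) information of each source signal based on the separation matrices.
Array signal processing offers many DOA estimation methods, including classical spectral estimation, minimum variance estimation, and multiple signal classification (MUSIC). A blind system, however, has no prior information about the sources, so these methods usually cannot be used. Experiments show that the separation matrix of a blind system typically forms directional nulls toward the undesired sources: the null directions point to the DOAs of the suppressed sources. This embodiment therefore locates the null directions in the directivity patterns of the separation matrix to approximately estimate the DOA of each source. A directivity pattern is obtained as the product of the separation matrix array weights and the steering vector. The directivity pattern of the output for the l-th source, F_l(f,θ), is:
[F_1(f,θ), F_2(f,θ)]^T = W(f) e(f,θ)    (14)
where the steering vector is defined as: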
Figure PCTCN2020087639-appb-000011
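The directivity pattern of the separation matrix contains a null in each source direction. With two microphones and two sources, the nulls at each frequency bin lie in only two specific directions, and these null directions carry the DOA information of the sources. Collecting statistics of the null directions over all bins in the effective frequency range yields the DOA of each source. Assuming that the smaller angle corresponds to the DOA of the first source and the larger angle to the DOA of the second source, the DOA estimate of the l-th source is defined as:
$$\hat{\theta}_l=\frac{1}{N}\sum_{m=1}^{N}\theta_l(f_m) \qquad (16)$$
where N is the number of bins in the effective frequency range and θ_l(f_m) is the DOA estimate of the l-th source at the m-th frequency bin: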
Figure PCTCN2020087639-appb-000013
Figure PCTCN2020087639-appb-000014
where max[x, y] (min[x, y]) returns the larger (smaller) of two numbers.
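The null-based DOA estimate of Equations (14)-(17) can be sketched as follows. The steering vector (15) is only an image here, so the sketch rests on several assumptions:
- the two microphones sit at -d/2 and +d/2 on a line;
- angles are measured from broadside;
- nulls are searched on a 1° grid.

`steering`, `null_direction`, `estimate_doas`, and `nulling_row` are hypothetical names. Each row of W(f) is scanned for its angle of minimum gain. The smaller of the two nulls is assigned to source 1 and the larger to source 2, and the per-bin values are averaged as in Equation (16).

```python
import cmath
import math

C_SOUND = 340.0  # speed of sound in m/s, as in the text
D_MIC = 0.15     # microphone spacing in m, as in the text


def steering(f, theta_deg, d=D_MIC, c=C_SOUND):
    """Assumed steering vector for microphones at -d/2 and +d/2, angle from broadside."""
    phase = math.pi * f * d * math.sin(math.radians(theta_deg)) / c
    return [cmath.exp(-1j * phase), cmath.exp(1j * phase)]


def null_direction(w_row, f, grid=range(-90, 91)):
    """Grid angle (degrees) where the directivity pattern |w_row . e(f, theta)| is smallest."""
    def gain(theta):
        e = steering(f, theta)
        return abs(w_row[0] * e[0] + w_row[1] * e[1])
    return min(grid, key=gain)


def estimate_doas(W_by_freq):
    """Average per-bin null directions; smaller angle -> source 1, larger -> source 2.

    W_by_freq -- dict {frequency in Hz: 2x2 separation matrix}, alias-free bins only
    """
    th1, th2 = [], []
    for f, W in W_by_freq.items():
        a, b = null_direction(W[0], f), null_direction(W[1], f)
        th1.append(min(a, b))
        th2.append(max(a, b))
    return sum(th1) / len(th1), sum(th2) / len(th2)


# Usage: rows built to null 0 deg and 30 deg exactly, at three alias-free bins
def nulling_row(f, theta_null):
    e = steering(f, theta_null)
    return [e[1], -e[0]]  # w . e(theta_null) = e1*e0 - e0*e1 = 0


Ws = {f: [nulling_row(f, 0), nulling_row(f, 30)] for f in (250.0, 500.0, 1000.0)}
print(estimate_doas(Ws))  # (0.0, 30.0)
```

Below F_L each row of a 2×2 separation matrix has exactly one null in [-90°, 90°], which is why the search is restricted to the alias-free bins.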
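DOA estimation plays three roles in this embodiment:
- the DOA estimates initialize the separation matrices;
- they resolve the permutation ambiguity;
- they are used to compute the separation matrices of the unselected bins.

The accuracy of the DOA estimates therefore directly affects the stability and convergence of the algorithm.

FIG. 2(a) and FIG. 2(b) plot the directivity patterns and DOA estimates of the sources in one anechoic trial under the simulation setup, with source positions (2,3). FIG. 3(a) and FIG. 3(b) plot the same quantities for one trial with RT = 100 ms and source positions (2,3). Source positions (2,3) correspond to incident angles of (30°, 0°). Panels (a) of FIG. 2 and FIG. 3 show that the directivity patterns of the separation matrix can be used to estimate the source DOAs under both reverberant and anechoic conditions.

Because the microphone spacing is 15 cm, spatial aliasing occurs in the high-frequency region. The source DOAs cannot be estimated correctly there, as FIG. 2(b) and FIG. 3(b) show. Only the DOAs from bins in the effective frequency range can therefore be used to initialize the separation matrices.
Step 4: At every frequency bin over the entire frequency range, compute the determinant of the mixed-signal covariance matrix. Place the bins whose determinant exceeds a set value into the preselected bin set; this completes the first frequency bin selection.
Over the entire frequency range, the determinant of the mixed-signal covariance matrix serves as the selection criterion for picking bins with relatively high energy. If only one source is present at a given bin, the mixed-signal covariance matrix is rank-deficient and its determinant is zero. If both sources are present, the covariance matrix has full rank and its determinant is nonzero. With two microphones and two sources, the determinant thus indicates how many sources are active. The mixed-signal covariance matrix is computed as: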
$$R_x(f)=E\left[x(f,t)\,x^H(f,t)\right]=A(f)\,R_s(f)\,A^H(f) \qquad (18)$$
where R_s(f) is the covariance matrix of the source signals. Assuming the sources are mutually independent, the source covariance matrix is:
$$R_s(f)=\begin{bmatrix}p_1(f)&0\\0&p_2(f)\end{bmatrix} \qquad (19)$$
where p_1(f) and p_2(f) are the powers of the first and second sources. The determinant of the covariance matrix can then be written as:
$$\det R_x(f)=p_1(f)\,p_2(f)\,\left|\det A(f)\right|^2 \qquad (20)$$
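The determinant of the mixed-signal covariance matrix is computed at every bin over the whole frequency range and normalized by its maximum value. Bins with larger normalized determinant values are placed into the preselected bin set, and the remaining bins go into the unselected bin set.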
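A minimal sketch of the first-stage selection in Step 4 follows. It assumes the STFT frames of each bin are available as lists of 2-element complex vectors, and it replaces the expectation in the covariance formula with a frame average. The threshold ε = 0.1 is the value used later in the simulations. The function names are hypothetical.

```python
def covariance_2x2(frames):
    """Frame-averaged covariance R_x(f) = <x x^H> for one bin (frames: 2-element vectors)."""
    T = len(frames)
    return [[sum(x[i] * x[j].conjugate() for x in frames) / T for j in range(2)]
            for i in range(2)]


def det_2x2(R):
    return R[0][0] * R[1][1] - R[0][1] * R[1][0]


def preselect_bins(X, eps=0.1):
    """First-stage selection: keep bins whose max-normalized det R_x(f) exceeds eps.

    X -- list over bins; X[k] is the list of frames x(f_k, t) of bin k
    Returns (selected, unselected) lists of bin indices.
    """
    # R_x is Hermitian positive semidefinite, so its determinant is real and >= 0;
    # abs() only discards rounding noise in the imaginary part.
    dets = [abs(det_2x2(covariance_2x2(frames))) for frames in X]
    peak = max(dets) or 1.0  # avoid dividing by zero if every bin is silent
    selected = [k for k, v in enumerate(dets) if v / peak > eps]
    unselected = [k for k, v in enumerate(dets) if v / peak <= eps]
    return selected, unselected


# Usage: bin 0 holds a single source (rank-1 covariance), bin 1 holds two
X = [
    [[1.0, 0.5], [2.0, 1.0], [-1.0, -0.5]],  # x = a * s with a fixed a = [1, 0.5]
    [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]],   # two independent-looking components
]
print(preselect_bins(X))  # ([1], [0])
```

Because the normalized determinant is compared against ε, raising the threshold can only shrink the preselected set.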
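Step 5: Initialize with the DOA information of the source signals to obtain the initial separation matrices.
This embodiment uses the DOA information obtained from the separation matrices to build a null beamformer, which forms the initial separation matrix W_ini(f). The (i, j)-th element of W_ini(f) is written as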
Figure PCTCN2020087639-appb-000018
A null beamformer sets the gain toward an undesired source direction to zero. For the first row of W_ini(f), we therefore assume that the look direction is the estimated DOA of the first source, θ̂_1, and that the null is steered toward the estimated DOA of the second source, θ̂_2. For the second row, the look direction is θ̂_2 and the null is steered toward θ̂_1.
Under these assumptions, the initial separation matrix W_ini(f_m) satisfies:
Figure PCTCN2020087639-appb-000025
where f_m is the frequency of any preselected bin and I_{2×2} is the 2×2 identity matrix. From Equation (22) we obtain:
Figure PCTCN2020087639-appb-000026
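The formulas of Step 5, including Equation (22), are images in this copy. The constraints described above are unit gain toward the look direction and zero gain toward the other source. They imply that W_ini(f) times the matrix whose columns are the steering vectors of the two estimated DOAs is the identity, so W_ini(f) is that matrix's inverse. The sketch below computes it under an assumed steering vector (microphones at ±d/2, d = 0.15 m, c = 340 m/s). The helper names are hypothetical.

```python
import cmath
import math


def steering(f, theta_deg, d=0.15, c=340.0):
    """Assumed steering vector for microphones at -d/2 and +d/2, angle from broadside."""
    phase = math.pi * f * d * math.sin(math.radians(theta_deg)) / c
    return [cmath.exp(-1j * phase), cmath.exp(1j * phase)]


def inv_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if abs(det) < 1e-12:
        raise ValueError("singular matrix: the two DOAs coincide or the bin aliases")
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def null_beamformer_init(f, theta1, theta2):
    """W_ini(f): row 1 passes theta1 with unit gain and nulls theta2; row 2 the reverse."""
    e1, e2 = steering(f, theta1), steering(f, theta2)
    steer = [[e1[0], e2[0]],
             [e1[1], e2[1]]]  # columns are the steering vectors of the two DOAs
    return inv_2x2(steer)


# Usage: check the look-direction and null constraints at 800 Hz
W = null_beamformer_init(800.0, 30.0, 0.0)
for theta in (30.0, 0.0):
    e = steering(800.0, theta)
    print(theta, [round(abs(W[i][0] * e[0] + W[i][1] * e[1]), 6) for i in range(2)])
# 30.0 [1.0, 0.0]
# 0.0 [0.0, 1.0]
```

The same inversion, applied to the mixing matrix built from the mean DOAs, is what Step 8 uses for the unselected bins in Equation (26).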
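Step 6: Perform ICA iterations on the preselected bins starting from the initial separation matrices to obtain the separation matrices of the preselected bins, and re-estimate the DOA information of the source signals.
In this embodiment, the accurate initial separation matrices derived from the DOAs are used to iterate the preselected bins according to Equation (13). After the iterations, the DOA information of the sources is re-estimated from the resulting separation matrices. This information resolves the permutation ambiguity and is used to compute the separation matrices of the unselected bins, completing their separation.
Step 7: Perform outlier detection on the DOA information of each source and move the detected outliers into the unselected bin set, completing the second frequency bin selection.
The first-stage selection may pick a few bins with poor separation performance or with spatial aliasing, and the DOA information extracted from them is inaccurate. Because these inaccurate DOA values deviate strongly from the true values, we treat them as outliers. To find them, we compute a histogram of each source's DOA values and apply a normal-distribution-based outlier detection method as the second-stage selection.

Under the 3σ rule, if the data follow a normal distribution, the probability of a value falling outside 3σ is less than 0.003. An outlier is therefore defined as a value that deviates from the mean μ by more than 3σ. Removing the outliers from the preselected bin set and moving them into the unselected bin set ensures the accuracy of the DOA information.
FIG. 4 shows the DOA estimates of one source in one trial under the experimental setup; the true incident angle of this source is 0°. The histogram resembles a normal distribution. Bins that deviate by a large angle from the mean (about 0°) are regarded as outliers and should be placed into the unselected set. For the preselected bins, the DOA values of each source are screened for outliers in this way and the detected outliers are moved into the unselected bin set. The remaining bins are the finally selected bins. The mean DOA of the l-th source over the finally selected bin set is: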
$$\bar{\theta}_l=\frac{1}{N_f}\sum_{m=1}^{N_f}\theta_l(f_m) \qquad (24)$$
where N_f is the number of finally selected frequency bins.
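The second-stage selection of Step 7 amounts to a 3σ filter on each source's per-bin DOA estimates, followed by the mean of Equation (24). The sketch below uses Python's `statistics` module. Two choices are assumptions: the population standard deviation (`pstdev`) rather than the sample one, and the dictionary input. The function name is hypothetical.

The rule only works when the preselected set is reasonably large. By Samuelson's inequality, no value can lie more than √(n−1)·σ from the mean, so with 10 or fewer bins no value can exceed 3σ.

```python
import statistics


def three_sigma_filter(doas):
    """Split one source's per-bin DOA estimates by the 3-sigma rule.

    doas -- dict {bin index: DOA estimate in degrees}
    Returns (kept bins, outlier bins, mean DOA over the kept bins as in Eq. (24)).
    """
    values = list(doas.values())
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    kept = [k for k, v in doas.items() if abs(v - mu) <= 3 * sigma]
    outliers = [k for k, v in doas.items() if abs(v - mu) > 3 * sigma]
    mean_kept = statistics.fmean(doas[k] for k in kept)
    return kept, outliers, mean_kept


# Usage: 20 bins scattered around 0 deg plus one aliased bin at 60 deg
vals = [-2.0, -1.0, 0.0, 1.0, 2.0] * 4 + [60.0]
kept, outliers, mean_doa = three_sigma_filter(dict(enumerate(vals)))
print(outliers, mean_doa)  # [20] 0.0
```

In the method, a bin flagged for either source would be moved to the unselected set.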
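Step 8: Construct the mixing matrix from the DOA information remaining after outlier removal, and solve for the separation matrices of the unselected bins from it.
The mean DOA of each source, estimated from the separation matrices of the finally selected bins, is used to compute the separation matrices of the unselected bins. The mixing matrix can be expressed in terms of the source DOAs as: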
Figure PCTCN2020087639-appb-000028
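where θ_1 and θ_2 are the DOA estimates of the first and second sources, respectively. As with null beamforming, the separation matrix of an unselected bin is obtained by inverting the mixing matrix:
W_us(f) = inv(A(f))     (26)
where W_us(f) is the separation matrix of the unselected bin and inv(·) denotes matrix inversion.
Step 9: Resolve the permutation ambiguity using the estimated source DOA information.
From the directivity patterns of the selected bins, this embodiment gathers all patterns whose null points toward s_1(f,t), and likewise all patterns whose null points toward s_2(f,t). After this grouping, the two signals separated at each bin correspond to the same pair of DOAs across bins, which resolves the permutation ambiguity.
Simulations were run under the experimental setup for one trial with source positions (2,3). FIG. 5 plots the directivity patterns of the sources at the 35th frequency bin before the permutation ambiguity is resolved, and FIG. 6 plots them after it is resolved. Ideally, the DOA of the first source is 30° and that of the second source is 0°. FIG. 5 shows the first source s_1(f,t) at 0° and the second source s_2(f,t) at 30°, i.e., the order has been swapped. As FIG. 6 shows, clustering by source DOA resolves the permutation ambiguity well and keeps the separation results for the same mixture consistent across bins.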
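One way to implement the DOA-based alignment of Step 9 is sketched below. It assumes that the null direction of each row has already been found, for example by scanning the directivity pattern, and that the reference DOAs are the means from Step 7. It relies on the convention that the row extracting source l places its null on the other source: row 0 should null θ_2 and row 1 should null θ_1. The rows are swapped when the opposite assignment fits better. The function name is hypothetical.

```python
def align_permutation(W, nulls, theta1, theta2):
    """Reorder the rows of a 2x2 separation matrix so that output 1 carries source 1.

    W      -- 2x2 separation matrix (list of two rows)
    nulls  -- (null direction of row 0, null direction of row 1), degrees
    theta1, theta2 -- reference DOAs of sources 1 and 2, degrees
    Convention: the row extracting source l nulls the other source,
    so row 0 should null theta2 and row 1 should null theta1.
    """
    keep_cost = abs(nulls[0] - theta2) + abs(nulls[1] - theta1)
    swap_cost = abs(nulls[0] - theta1) + abs(nulls[1] - theta2)
    if swap_cost < keep_cost:
        return [W[1], W[0]], (nulls[1], nulls[0])
    return [W[0], W[1]], (nulls[0], nulls[1])


# Usage: at this bin row 0 nulls 30 deg (source 1), so it is really extracting source 2
W_bin = [[1 + 1j, 0.5], [0.2, 1 - 1j]]
aligned, new_nulls = align_permutation(W_bin, (30.0, 0.0), theta1=30.0, theta2=0.0)
# rows are swapped: aligned[0] is the former row 1, and new_nulls is (0.0, 30.0)
```

The unselected bins are built from the already-ordered mean DOAs in Step 8, so they need no such correction.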
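Step 10: Resolve the scaling ambiguity using the minimal distortion principle.
Another important issue in frequency-domain BSS is the scaling ambiguity. It is resolved by applying the minimal distortion principle to the separation matrices W(f) of all bins, i.e., by transforming each separation matrix as follows:
W(f) = diag(W^{-1}(f)) W(f)    (27)
where diag(·) keeps the main-diagonal elements of a matrix and sets the off-diagonal elements to zero.
After the scaling ambiguity is resolved, the initial separated signals at each bin can be expressed as: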
y(f,t)=W(f)x(f,t)      (28)
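Equation (27) scales row i of W(f) by the i-th diagonal element of W^{-1}(f). The sketch below applies it to a 2×2 complex matrix. The usage lines check a useful property: the result does not depend on any arbitrary row scaling left over from ICA, which is exactly the scaling ambiguity being removed. The helper names are hypothetical.

```python
def inv_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def minimal_distortion(W):
    """Eq. (27): W <- diag(W^{-1}) W, i.e. scale row i of W by (W^{-1})_ii."""
    W_inv = inv_2x2(W)
    return [[W_inv[i][i] * W[i][j] for j in range(2)] for i in range(2)]


# Usage: two separation matrices that differ only by an arbitrary row scaling
A = [[1.0, 0.5], [0.3, 2.0]]                 # toy mixing matrix
W0 = inv_2x2(A)
W1 = [[3.0 * v for v in W0[0]], [-0.5j * v for v in W0[1]]]
M0, M1 = minimal_distortion(W0), minimal_distortion(W1)
print(max(abs(M0[i][j] - M1[i][j]) for i in range(2) for j in range(2)) < 1e-12)  # True
```

When W = A^{-1}, the result is diag(A)A^{-1}, so output i is source i as observed at microphone i.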
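Step 11: Perform an inverse Fourier transform according to the separation matrices of all frequency bins to reconstruct the separated signals.
Embodiment 2
This embodiment provides a blind source separation system based on separation matrix initialization and frequency bin selection. The system comprises:
a data acquisition module, which acquires the audio signal to be separated;
a data preprocessing module, which transforms the audio signal to be separated into the frequency domain;
a DOA information estimation module, which performs one ICA iteration on the frequency bins within the frequency range where spatial aliasing does not occur to obtain separation matrices, and estimates the DOA information of each source signal based on the separation matrices;
a first frequency bin selection module, which, at every frequency bin over the entire frequency range, performs frequency bin selection according to the determinant of the mixed-signal covariance matrix and places the selected bins into a preselected bin set;
a separation matrix initialization module, which initializes with the DOA information of the source signals to obtain the initial separation matrices used for the ICA iterations on the preselected bins;
a selected-bin separation module, which performs ICA iterations on the preselected bins using the initial separation matrices to obtain their separation matrices, and re-estimates the DOA information of the source signals;
a second frequency bin selection module, which performs outlier detection on the DOA information of each source signal and removes the detected outliers, completing the second frequency bin selection. The outlier detection uses a normal-distribution-based method;
an unselected-bin separation module, which constructs the separation matrices of the unselected bins based on the re-estimated DOA information;
a signal reconstruction module, which performs an inverse Fourier transform according to the separation matrices of all frequency bins to reconstruct the separated signals.
Embodiment 3
This embodiment provides a binaural hearing aid system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the blind source separation method based on separation matrix initialization and frequency bin selection described in Embodiment 1.
The steps involved in Embodiments 2 and 3 correspond to method Embodiment 1; for their specific implementation, refer to the relevant description of Embodiment 1.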
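Simulation Experiments
FIG. 7 shows the reverberant room used in the simulations. The room measures 5.73 m × 3.56 m × 2.7 m, and the two microphones are 15 cm apart at a height of 1.35 m. Speech can arrive from 5 different angles. With two sources and two microphones, 4 configurations with different incident angles were simulated: (30°, 0°), (30°, -40°), (30°, -80°), and (70°, -80°). These correspond to source positions (2,3), (2,4), (2,5), and (1,5). The source signals are English male and female utterances selected from the open speech corpus VoxForge, each trimmed to 3 s to keep the experimental data consistent.
The microphone signals are the convolution of the source speech with the impulse responses produced by the sensors and the surrounding environment. The room impulse responses are generated with the image source method. The reverberation time (RT) is defined as the time required for the energy of a sound to decay by 60 dB. In a real room, different RTs can be obtained indirectly by changing the materials of the walls, floor, and ceiling, which changes their reflection and absorption coefficients.

When RT = 0 ms, the sources are not convolved with a room impulse response; only the attenuation and delay along the direct path are considered, which simulates anechoic mixing. When RT > 0 ms, the speech signals are convolved with the room impulse responses to simulate reverberant mixing. Simulations are run with several RTs.

The speech signals are sampled at 16 kHz, and the short-time Fourier transform uses a frame length of 512, a frame shift of 256, and a Hamming window. All simulations were run in MATLAB 2015b on a computer with an Intel(R) Xeon(R) E5-2643 v4 @ 3.40 GHz CPU and 128.0 GB of memory.
The anechoic mixing model is very simple: only the relative positions of the sources and microphones need to be set. Each microphone signal is a first-order weighted sum of the sources, i.e., the room impulse responses have a single tap. The magnitude response of the mixing matrix is therefore independent of frequency and its phase response is linear in frequency, so the true relative attenuation and delay parameters are the same at every bin. FIG. 8 plots the room impulse response from the first source to the first microphone in one trial with RT = 0 ms.
The convolutive mixing model is more complex: the microphone signals are convolutions of the sources with the room impulse responses. More taps in the impulse response mean more severe reverberation and a smaller contribution from the direct path. Correct separation then becomes harder and separation performance degrades. We ran experiments with different RTs. FIG. 9 shows the room impulse response from the first source to the first microphone for RT = 100 ms, and FIG. 10 shows it for RT = 200 ms. The number of impulse-response taps grows as RT increases.
Program running time is used as the measure of computational complexity. The average noise reduction rate (NRR) is used as the measure of separation performance. The NRR is defined as the output signal-to-noise ratio (SNR) minus the input SNR, both in dB. A larger NRR indicates better separation, i.e., the recovered signals are closer to the original sources. It is computed as: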
Figure PCTCN2020087639-appb-000029
where
Figure PCTCN2020087639-appb-000030
is the output SNR of the l-th source,
Figure PCTCN2020087639-appb-000031
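is the input SNR of the l-th source, H_ij(f) is the element in row i, column j of H(f) = W(f)A(f), W(f) is the final separation matrix, and the mixing matrix A(f) describes the room impulse responses in the frequency domain.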
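The SNR definitions above appear only as images. The sketch below assumes the commonly used form SNR_{O,l} = 10 log10(Σ_f |H_ll(f)|² / Σ_f |H_ln(f)|²) with n ≠ l. SNR_{I,l} uses the same expression with A(f) in place of H(f). Source spectra are omitted, which amounts to assuming equally powered, spectrally flat sources. NRR is the average of SNR_O − SNR_I over the two sources. The function names are hypothetical.

```python
import math


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def snr_db(M_by_freq, src):
    """10*log10( sum_f |M_ll(f)|^2 / sum_f |M_ln(f)|^2 ) with l = src and n = the other source."""
    other = 1 - src
    target = sum(abs(M[src][src]) ** 2 for M in M_by_freq)
    leak = sum(abs(M[src][other]) ** 2 for M in M_by_freq)
    return math.inf if leak == 0 else 10.0 * math.log10(target / leak)


def nrr_db(W_by_freq, A_by_freq):
    """Average over the two sources of SNR_out - SNR_in, with H(f) = W(f) A(f)."""
    H = [matmul2(W, A) for W, A in zip(W_by_freq, A_by_freq)]
    return sum(snr_db(H, s) - snr_db(A_by_freq, s) for s in range(2)) / 2


# Usage: identical toy mixing and separation matrices at two bins
A = [[1.0, 0.5], [0.5, 1.0]]
W = [[1.0, -0.4], [-0.4, 1.0]]  # H = [[0.8, 0.1], [0.1, 0.8]]
print(round(nrr_db([W, W], [A, A]), 2))  # 12.04, i.e. 10*log10(16)
```

Here the input SNR is 10 log10(4) ≈ 6.02 dB and the output SNR is 10 log10(64) ≈ 18.06 dB, so the improvement is about 12.04 dB.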
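Number of preselected frequency bins
In the first-stage selection scheme, the number of initially selected bins must balance the reduction in complexity against the overall separation performance. It cannot be too large, or the complexity reduction diminishes. It cannot be too small either, or the estimated normalized attenuation and delay parameters may be inaccurate and the separation performance of the unselected bins may drop.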
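Two sets of 920 trials were run:
- FIG. 11 shows the average determinant of the mixed-signal covariance matrix as a function of frequency. To some extent it reflects the energy distribution of speech well. Because speech energy is concentrated at low frequencies, these bins can be expected to separate well.
- With the STFT settings above there are 256 bins in total. FIG. 12 shows the average number of preselected bins as a function of the threshold. The number of preselected bins clearly decreases as the threshold increases, since only bins whose normalized determinant exceeds the threshold are kept.

One might expect the separation performance to increase with the number of selected bins. The threshold can be set as needed to meet different performance requirements.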
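Experimental results and performance comparison in the anechoic environment
To demonstrate the effectiveness of the proposed FDBSS algorithm with separation matrix initialization and frequency bin selection, we ran simulations in the anechoic environment (RT = 0 ms).
First, we compared the separation performance of the proposed algorithm with that of the conventional FDBSS algorithm. Under the experimental setup, 1000 trials were run for each of the 4 pairs of source directions, 4000 trials in total. For the proposed algorithm, the threshold was set to ε = 0.1. Table 1 compares the NRR and running time of the two algorithms for the 4 pairs of source directions; the values in Table 1 are averages over 1000 trials.
Table 1. Comparison of NRR and running time of the two algorithms for 4 pairs of source directions of arrival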
Figure PCTCN2020087639-appb-000032
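Compared with the conventional FDBSS algorithm, the proposed algorithm:
- preselects about 4.81% of all bins;
- reduces running time by 84.4%;
- increases the NRR by 44.16%.

In other words, the proposed algorithm not only greatly reduces computational complexity but also significantly improves separation performance.
Next, we compared the separation performance of the proposed algorithm and the conventional FDBSS algorithm under different numbers of iterations, shown in FIG. 13(a) and FIG. 13(b). Each point is the average over 4000 trials. The proposed algorithm both improves separation performance and accelerates convergence: it converges well after about 10 iterations, roughly 10 times faster than the conventional algorithm. The reason is that the separation matrices are initialized with the source DOA information early in the iterative learning, so the updates start from a more accurate point.
We then analyzed why the separation performance improves and the computational complexity drops. Most of the complexity of the conventional FDBSS algorithm lies in two places: the iterative computation of the separation matrices and the resolution of the permutation ambiguity. The proposed algorithm improves both, in three ways:
- An iterative algorithm struggles to converge to the global optimum from a poor initial separation matrix, so we initialize the separation matrices to accelerate convergence.
- Only a few bins with good separation performance undergo ICA iterations. The separation matrices of most unselected bins are simple to compute and require no ICA iterations.
- The permutation ambiguity needs to be resolved only for the preselected bins. The separation matrices of the unselected bins are derived from the already-aligned DOA parameters and have no permutation ambiguity, which further reduces the computational complexity.
Under the experimental setup, 200 trials were run for each of the 4 pairs of source directions at each frequency bin selection threshold. dN and the percentage reduction in running time express the differences in NRR and running time between the proposed algorithm and the conventional FDBSS algorithm. FIG. 14(a) and FIG. 14(b) plot both quantities against the threshold for the 4 pairs of directions. As the threshold increases and the number of selected bins decreases, separation performance first rises and then falls, so selecting either too many or too few bins is undesirable. The NRR reaches its global maximum at a threshold of about 0.1, where the running time is reduced by about 90%.
Experimental results and performance comparison in reverberant environments
To compare the separation performance of the proposed algorithm and the conventional FDBSS algorithm in reverberant environments, we ran experiments with different RTs.
With RT = 100 ms, we set the threshold ε = 0.1 and ran 4000 trials under the experimental setup. Table 2 compares the NRR and running time of the proposed algorithm and the conventional FDBSS algorithm; the values in Table 2 are averages over 1000 trials.
Table 2. Comparison of NRR and running time of the two algorithms for 4 pairs of source directions of arrival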
Figure PCTCN2020087639-appb-000033
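Table 2 shows that at RT = 100 ms the proposed FDBSS algorithm outperforms the conventional FDBSS algorithm: running time is reduced by 88.68% while the NRR increases by 3.03%. The proposed algorithm is therefore also superior in reverberant environments.
Next, with RT = 100 ms, we compared the separation performance of the two algorithms under different numbers of iterations, shown in FIG. 15(a) and FIG. 15(b). Each point is the average over 4000 trials. Even with reverberation, the proposed algorithm both improves separation performance and accelerates convergence. It converges well after about 10 iterations, with an NRR of about 20 dB versus only about 2 dB for the conventional algorithm. Its running time is also much lower than that of the conventional algorithm. The proposed algorithm thus also separates better than the conventional algorithm in reverberant environments.
The one or more embodiments above have the following technical effects:
One or more embodiments of the present disclosure propose a fast blind separation method for speech signals based on separation matrix initialization and frequency bin selection. It proceeds as follows:
- One ICA iteration is performed on the bins within the alias-free frequency range to obtain separation matrices, from which the source DOAs are estimated.
- A first frequency bin selection is performed over the entire frequency range based on the determinant of the mixed-signal covariance matrix.
- When conventional ICA separates in the frequency domain without a good initialization of the separation matrices, both convergence and separation performance are unsatisfactory. The source DOA information is therefore used to initialize the separation matrices of the selected bins, and ICA iterations are then run to obtain the separation matrices.
- The first selection may pick bins with poor separation performance, so a second-stage selection based on outlier detection is introduced to ensure the accuracy of the DOA information. The removed outliers are placed into the unselected bin set.
- The mean DOAs obtained from the finally selected bins are used to construct the separation matrices of the unselected bins and to resolve the permutation ambiguity.
- The scaling ambiguity is resolved for the separation matrices of all bins, completing the initial separation of the mixture.
The above technical solutions provide a blind source separation method suitable for binaural hearing aid systems. Separation matrix initialization reduces the number of iterations and accelerates convergence;
a two-stage frequency bin selection algorithm selects bins with good separation performance, reducing the number of bins on which ICA iterations are performed and thus the computation of the separation matrices;
in both anechoic and reverberant environments, the proposed FDBSS method with separation matrix initialization and frequency bin selection runs markedly faster than the conventional FDBSS algorithm while also improving separation performance.
Those skilled in the art should understand that the modules or steps of the present disclosure described above can be implemented by general-purpose computing devices. Optionally, they can be implemented as program code executable by a computing device and stored in a storage device for execution. They can also be made into individual integrated circuit modules, or several of them can be combined into a single integrated circuit module. The present disclosure is not limited to any specific combination of hardware and software.
The above are merely preferred embodiments of the present disclosure and are not intended to limit it; those skilled in the art may make various modifications and changes to the disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the disclosure shall fall within its scope of protection.
Specific embodiments of the present disclosure have been described above with reference to the drawings, but they do not limit its scope of protection. Those skilled in the art should understand that various modifications or variations that can be made without creative effort on the basis of the technical solutions of the disclosure still fall within its scope of protection.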

Claims (10)

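  1. A blind source separation method based on separation matrix initialization and frequency bin selection, characterized by comprising the following steps:
    acquiring an audio signal to be separated, and applying a Fourier transform to the audio signal to be separated;
    performing one ICA iteration on the frequency bins within the frequency range in which spatial aliasing does not occur to obtain separation matrices, and estimating the DOA information of each source signal based on the separation matrices;
    at each frequency bin over the entire frequency range, performing frequency bin selection according to the determinant of the mixed-signal covariance matrix, and placing the selected bins into a preselected bin set;
    initializing with the DOA information of the source signals to obtain initial separation matrices; then performing ICA iterations on the preselected bins using the initial separation matrices to obtain the separation matrices of the preselected bins, and re-estimating the DOA information of the source signals;
    resolving the permutation ambiguity and constructing the separation matrices of the unselected frequency bins based on the re-estimated DOA information;
    performing an inverse Fourier transform according to the separation matrices of all frequency bins to reconstruct the separated signals.
  2. The blind source separation method based on separation matrix initialization and frequency bin selection according to claim 1, characterized in that performing frequency bin selection according to the determinant of the mixed-signal covariance matrix comprises: for each frequency bin over the entire frequency range, computing and normalizing the determinant of the mixed-signal covariance matrix; placing the bins whose normalized determinant value exceeds a set value into the preselected bin set; and placing the remaining bins into an unselected bin set.
  3. The blind source separation method based on separation matrix initialization and frequency bin selection according to claim 1, characterized in that estimating the DOA information of each source signal based on the separation matrices comprises: for each frequency bin, obtaining a directivity pattern by multiplying the array weights of the corresponding separation matrix by the steering vector; and collecting statistics of the null directions in the directivity patterns to estimate the DOA information of each source signal.
  4. The blind source separation method based on separation matrix initialization and frequency bin selection according to claim 1, characterized in that, after the DOA information of the source signals is re-estimated, outlier detection is further performed according to the DOA information of each source signal and the detected outliers are removed, completing a second frequency bin selection; wherein the outlier detection uses a normal-distribution-based outlier detection method.
  5. The blind source separation method based on separation matrix initialization and frequency bin selection according to claim 4, characterized in that constructing the separation matrices of the unselected frequency bins based on the re-estimated DOA information comprises:
    constructing a mixing matrix based on the DOA information after outlier removal;
    inverting the mixing matrix to obtain the separation matrices of the unselected frequency bins.
  6. The blind source separation method based on separation matrix initialization and frequency bin selection according to claim 4, characterized in that the permutation ambiguity is resolved by grouping the source signals in the directivity patterns of the selected frequency bins according to where the null directions point, so that the source signals separated at different frequency bins each correspond to the same DOA.
  7. The blind source separation method based on separation matrix initialization and frequency bin selection according to claim 1, characterized in that the minimal distortion principle is applied to the separation matrices of all frequency bins to resolve the scaling ambiguity.
  8. A blind source separation system based on separation matrix initialization and frequency bin selection, characterized by comprising:
    a data acquisition module, which acquires the audio signal to be separated;
    a data preprocessing module, which transforms the audio signal to be separated into the frequency domain;
    a DOA information estimation module, which performs one ICA iteration on the frequency bins within the frequency range in which spatial aliasing does not occur to obtain separation matrices, and estimates the DOA information of each source signal based on the separation matrices;
    a first frequency bin selection module, which, at each frequency bin over the entire frequency range, performs frequency bin selection according to the determinant of the mixed-signal covariance matrix and places the selected bins into a preselected bin set;
    a separation matrix initialization module, which performs ICA iterations on the preselected bins and performs initialization using the DOA information of the source signals to obtain initial separation matrices;
    a frequency bin separation module, which performs ICA iterations on the preselected bins using the initial separation matrices to obtain the separation matrices of the preselected bins, re-estimates the DOA information of the source signals, and constructs the separation matrices of the unselected frequency bins based on the re-estimated DOA information;
    a signal reconstruction module, which performs an inverse Fourier transform according to the separation matrices of all frequency bins to reconstruct the separated signals.
  9. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the blind source separation method based on separation matrix initialization and frequency bin selection according to any one of claims 1-7.
  10. A binaural hearing aid system, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the blind source separation method based on separation matrix initialization and frequency bin selection according to any one of claims 1-7.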
PCT/CN2020/087639 2020-03-10 2020-04-29 Blind source separation method and system based on separation matrix initialization and frequency bin selection WO2021179416A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010161022.1 2020-03-10
CN202010161022.1A CN111415676B (zh) 2020-03-10 2020-03-10 Blind source separation method and system based on separation matrix initialization and frequency bin selection

Publications (1)

Publication Number Publication Date
WO2021179416A1 true WO2021179416A1 (zh) 2021-09-16

Family

ID=71492893

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087639 WO2021179416A1 (zh) 2020-03-10 2020-04-29 Blind source separation method and system based on separation matrix initialization and frequency bin selection

Country Status (2)

Country Link
CN (1) CN111415676B (zh)
WO (1) WO2021179416A1 (zh)

Also Published As

Publication number Publication date
CN111415676B (zh) 2022-10-18
CN111415676A (zh) 2020-07-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20924341

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20924341

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.06.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20924341

Country of ref document: EP

Kind code of ref document: A1