CN117037836B

CN117037836B - Real-time sound source separation method and device based on signal covariance matrix reconstruction

Info

Publication number: CN117037836B
Application number: CN202311278673.9A
Authority: CN
Inventors: 朱世强; 肖永雄; 宛敏红; 宋伟; 付强; 李特; 顾建军
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2023-10-07
Filing date: 2023-10-07
Publication date: 2023-12-29
Anticipated expiration: 2043-10-07
Also published as: CN117037836A

Abstract

A real-time sound source separation method and device based on signal covariance matrix reconstruction. The method includes: when multiple sound source signals are detected, the exponential smoothing method is used to calculate the covariance matrix of the mixed signal; and the covariance matrix of the mixed signal is characterized Value decomposition, using eigenvalues to calculate the noise power of different frequency components; using the subspace and theoretical steering vectors composed of the main eigenvalue vectors to calculate the steering vector of the sound source; using the eigenvalue vectors and eigenvalues of the mixed signal covariance matrix, calculating The inverse matrix of the mixed signal covariance matrix; use the inverse matrix of the mixed signal covariance matrix and the steering vector of the sound source to calculate the power of each sound source, and reconstruct the signal covariance of each sound source according to the definition of the covariance matrix Matrix; use the signal covariance matrix of each sound source and the theoretical guidance vector of the signal to calculate the separation coefficient matrix; obtain the separated sound source signal based on the mixed sound signal vector and the separation coefficient matrix.

Description

Real-time sound source separation method and device based on signal covariance matrix reconstruction

技术领域Technical field

本发明涉及阵列声源信号处理领域，特别涉及一种基于信号协方差矩阵重构的实时声源分离方法与装置。The invention relates to the field of array sound source signal processing, and in particular to a real-time sound source separation method and device based on signal covariance matrix reconstruction.

背景技术Background technique

在具有背景噪声、混响和多说话人干扰的复杂声学场景中，人耳具备提取目标说话人声音的能力，被称为“鸡尾酒会效应”。然而，这对于机器人来说，是一项困难的任务，是声源信号处理领域的难题。利用信号处理方法，将一个或多个目标说话人的声源信号从混合声源信号中提取出来的技术，称为声源分离技术。The human ear’s ability to extract the target speaker’s voice in complex acoustic scenes with background noise, reverberation, and interference from multiple speakers is known as the “cocktail party effect.” However, this is a difficult task for robots and a difficult problem in the field of sound source signal processing. The technology of extracting the sound source signals of one or more target speakers from the mixed sound source signals using signal processing methods is called sound source separation technology.

声源分离主要包括单通道声源分离和多通道声源分离。基于麦克风阵列的多通道声源分离方法能更好的利用声场的空间信息。因此，无论是采用传统信号处理方法还是深度学习方法，多通道声源分离方法通常能取得比单通道声源分离更好的性能。目前大部分人机交互设备均采用多通道麦克风阵列作为收音硬件。Sound source separation mainly includes single-channel sound source separation and multi-channel sound source separation. The multi-channel sound source separation method based on microphone array can better utilize the spatial information of the sound field. Therefore, whether using traditional signal processing methods or deep learning methods, multi-channel sound source separation methods can usually achieve better performance than single-channel sound source separation. At present, most human-computer interaction devices use multi-channel microphone arrays as sound reception hardware.

理论上，利用基于多通道麦克风阵列的波束成形，可以自适应的完成不同方向上声源信号的分离和提取任务。然而对于多人同时说话的场景，针对不同方向的目标声源信号，准确地估计出其他声源信号构成的干扰信号和噪声信号的协方差矩阵仍然较为困难。此外，由于声场中存在散射和反射、麦克风位置和声源定位等误差的存在，无法获得准确的导向矢量。这两个问题直接影响了波束成形方法的声源分离性能。Theoretically, beamforming based on multi-channel microphone arrays can adaptively complete the separation and extraction of sound source signals in different directions. However, for scenes where multiple people are talking at the same time, it is still difficult to accurately estimate the covariance matrix of interference signals and noise signals composed of other sound source signals for target sound source signals in different directions. In addition, accurate guidance vectors cannot be obtained due to errors such as scattering and reflection in the sound field, microphone position, and sound source localization. These two problems directly affect the sound source separation performance of the beamforming method.

为较准确的估计信号协方差矩阵，一种方法是引入信号存在概率（SignalPresence Probability, SPP）估计。当估计的目标声源的SPP低时，更新信号协方差矩阵；当估计的目标声源的SPP高时，不更新信号协方差矩阵。然而这种方法需要知道先验的声源信号存在概率，并且对于多个声源同时存在的场景，不同声源信号的SPP难以准确估计。In order to estimate the signal covariance matrix more accurately, one method is to introduce signal presence probability (SignalPresence Probability, SPP) estimation. When the estimated SPP of the target sound source is low, the signal covariance matrix is updated; when the estimated SPP of the target sound source is high, the signal covariance matrix is not updated. However, this method needs to know the a priori probability of the existence of sound source signals, and for scenarios where multiple sound sources exist at the same time, it is difficult to accurately estimate the SPP of different sound source signals.

另一种估计信号协方差矩阵的方法是在除目标声源信号所在角度区域外的整个空间区域对Capon谱估计进行积分，对积分获得的协方差矩阵进行特征值分解，再次重构信号协方差矩阵。虽然可以把积分离散为求和来减少计算量，但对于宽频带的声源信号，需要在每个频率上进行多次求和运算、求逆运算和特征值分解运算，计算复杂度非常高，无法满足实时声源信号处理要求。Another method to estimate the signal covariance matrix is to integrate the Capon spectrum estimate in the entire space area except the angular area where the target sound source signal is located, perform eigenvalue decomposition on the covariance matrix obtained by integration, and reconstruct the signal covariance again. matrix. Although the integral can be discretized into a summation to reduce the amount of calculation, for wide-band sound source signals, multiple summation operations, inversion operations, and eigenvalue decomposition operations need to be performed at each frequency, and the computational complexity is very high. It cannot meet the requirements of real-time sound source signal processing.

发明内容Contents of the invention

针对现有技术的不足，本发明提出一种基于信号协方差矩阵重构的实时声源信号分离方法与装置。通过特征值分解估计噪声功率，并利用特征值分解求逆矩阵，利用理论导向矢量在主要特征值向量构成的子空间的投影修正声源的导向矢量，按照协方差矩阵的定义重构每一个声源的协方差矩阵，不但极大的降低了信号协方差矩阵重构的计算量，也显著提高了协方差矩阵的重构精度。In view of the shortcomings of the existing technology, the present invention proposes a real-time sound source signal separation method and device based on signal covariance matrix reconstruction. Estimate the noise power through eigenvalue decomposition, use eigenvalue decomposition to find the inverse matrix, use the projection of the theoretical guidance vector in the subspace composed of the main eigenvalue vectors to correct the guidance vector of the sound source, and reconstruct each sound according to the definition of the covariance matrix. The covariance matrix of the source not only greatly reduces the calculation amount of signal covariance matrix reconstruction, but also significantly improves the reconstruction accuracy of the covariance matrix.

本发明的第一方面提供了基于信号协方差矩阵重构的声源分离方法，包括以下步骤：The first aspect of the present invention provides a sound source separation method based on signal covariance matrix reconstruction, including the following steps:

在检测到多个声源信号时，采用指数平滑方法计算混合信号的协方差矩阵；When multiple sound source signals are detected, the exponential smoothing method is used to calculate the covariance matrix of the mixed signal;

对混合信号的协方差矩阵作特征值分解，利用特征值计算不同频率分量的噪声功率；Perform eigenvalue decomposition on the covariance matrix of the mixed signal, and use the eigenvalues to calculate the noise power of different frequency components;

利用主要特征值向量构成的子空间和理论导向矢量，计算声源的导向矢量；Calculate the guidance vector of the sound source using the subspace composed of the main eigenvalue vectors and the theoretical guidance vector;

利用混合信号协方差矩阵的特征值向量和特征值，计算混合信号协方差矩阵的逆矩阵；Calculate the inverse matrix of the mixed signal covariance matrix using the eigenvalue vector and eigenvalue of the mixed signal covariance matrix;

利用混合信号协方差矩阵的逆矩阵和声源的导向矢量，计算每个声源的功率，并按照协方差矩阵的定义重构每个声源的信号协方差矩阵；Use the inverse matrix of the mixed signal covariance matrix and the steering vector of the sound source to calculate the power of each sound source, and reconstruct the signal covariance matrix of each sound source according to the definition of the covariance matrix;

利用每个声源的信号协方差矩阵和信号的理论导向矢量，计算分离系数矩阵；基于混合声信号向量与分离系数矩阵得到分离后的声源信号。The signal covariance matrix of each sound source and the theoretical steering vector of the signal are used to calculate the separation coefficient matrix; the separated sound source signal is obtained based on the mixed sound signal vector and the separation coefficient matrix.

进一步，在检测到多个声源信号时，采用指数平滑方法计算混合信号的协方差矩阵的公式如下：Furthermore, when multiple sound source signals are detected, the formula for calculating the covariance matrix of the mixed signal using the exponential smoothing method is as follows:

（1） (1)

其中代表时间帧；/>代表频率；/>为指数平滑因子，取值范围为（0，1），取值越大，则历史预测数据的作用越大，当前实际数据的作用越小；in Represents the time frame;/> Represents frequency;/> It is an exponential smoothing factor, with a value range of (0, 1). The larger the value, the greater the role of historical forecast data, and the smaller the role of current actual data;

（2） (2)

为维声信号向量；/>为第/>个麦克风观测到的信号；/>为常规矩阵转置；并且当/>时，/>为：for Dimensional sound signal vector;/> For the first/> Signals observed by microphones;/> is the transpose of a regular matrix; and when/> When,/> for:

（3） (3)

其中为/>阶单位矩阵；/>为随频率变化的正则化参数，取值范围为（0，1），频率越大，取值越小；/>为球面扩散噪声的协方差矩阵，其/>元素为：in for/> Order identity matrix;/> It is a regularization parameter that changes with frequency. The value range is (0, 1). The greater the frequency, the smaller the value;/> is the covariance matrix of spherical diffusion noise, whose/> The elements are:

（4） (4)

其中为相邻麦克风阵元的时延，/>为相邻麦克风阵元的间距，/>为物理量的传播速度，/>为采样函数。in is the delay of adjacent microphone array elements,/> is the spacing between adjacent microphone array elements,/> is the propagation speed of physical quantity,/> is the sampling function.

进一步，对混合信号的协方差矩阵作特征值分解，利用特征值计算不同频率分量的噪声功率的过程包括：Furthermore, the covariance matrix of the mixed signal is decomposed into eigenvalues, and the process of using eigenvalues to calculate the noise power of different frequency components includes:

对当前时间帧的混合信号协方差矩阵作特征值分解：Mixed-signal covariance matrix for the current time frame Do eigenvalue decomposition:

（5） (5)

其中为特征值向量矩阵；/>为对角阵，其对角元素为从大到小排列的特征值；为矩阵的共轭转置。in is the eigenvalue vector matrix;/> It is a diagonal matrix, and its diagonal elements are the eigenvalues arranged from large to small; is the conjugate transpose of the matrix.

计算噪声的功率为最小的/>个特征值的平均值，其中/>为声源的总数量。Calculate the power of noise is the smallest/> The average of eigenvalues, where/> is the total number of sound sources.

进一步，利用主要特征值向量构成的子空间和理论导向矢量，计算每个声源的导向矢量的公式如下：Furthermore, the guidance vector of each sound source is calculated using the subspace composed of the main eigenvalue vectors and the theoretical guidance vector. The formula is as follows:

（6） (6)

其中为第/>个声源的来波方向，在三维球坐标系中，/>，/>和/>分别为的俯仰角和方位角；/>为第/>个声源的理论导向矢量，根据阵列的拓扑结构和自由声场传播模型计算得到；/>为/>的前/>个列向量组成的/>维矩阵。in For the first/> The direction of the incoming wave of a sound source, in the three-dimensional spherical coordinate system,/> ,/> and/> are the pitch angle and azimuth angle respectively;/> For the first/> The theoretical guidance vector of each sound source is calculated based on the topology of the array and the free sound field propagation model;/> for/> The front/> composed of column vectors/> dimensional matrix.

进一步，利用特征值向量和特征值，计算混合信号协方差矩阵的逆矩阵的公式如下：Furthermore, using the eigenvalue vector and eigenvalues, the formula for calculating the inverse matrix of the mixed signal covariance matrix is as follows:

（7） (7)

其中为对/>的对角线元素求倒数得到。in for/> Obtained by taking the reciprocal of the diagonal elements of .

进一步，利用混合信号协方差矩阵的逆矩阵和声源的导向矢量，计算每个声源的功率，并按照协方差矩阵的定义重构每个声源的信号协方差矩阵的过程包括：Furthermore, the process of calculating the power of each sound source using the inverse matrix of the mixed signal covariance matrix and the steering vector of the sound source, and reconstructing the signal covariance matrix of each sound source according to the definition of the covariance matrix includes:

计算第个声源的功率/>的公式为：Calculate the first The power of a sound source/> The formula is:

（8） (8)

按照协方差矩阵的定义重构第个声源的信号协方差矩阵/>的公式如下：According to the definition of covariance matrix, reconstruct the Signal covariance matrix of sound sources/> The formula is as follows:

（9） (9)

进一步，利用每个声源的信号协方差矩阵和目标信号的理论导向矢量，计算分离系数矩阵；基于混合声信号向量与分离系数矩阵得到分离后的声源信号的过程包括：Furthermore, the signal covariance matrix of each sound source and the theoretical guidance vector of the target signal are used to calculate the separation coefficient matrix; the process of obtaining the separated sound source signal based on the mixed sound signal vector and the separation coefficient matrix includes:

计算维分离系数矩阵的公式如下：calculate The formula of the dimensional separation coefficient matrix is as follows:

（10） (10)

其中为/>个声源的理论导向矢量组成的/>维矩阵，/>为最终用于计算分离系数的信号协方差矩阵：in for/> composed of theoretical guidance vectors of sound sources/> dimensional matrix, /> is the signal covariance matrix ultimately used to calculate the separation coefficient:

（11） (11)

其中为对角加载权重因子，取值范围为（0,1），取值越大，对信号的变化越敏感；其中：in Load the weight factor for the diagonal, the value range is (0,1), the larger the value, the more sensitive it is to signal changes; where:

（12） (12)

上式中，计算第个声源的分离系数时，/>中不包括第/>个声源的信号协方差矩阵/>。In the above formula, calculate the When the separation coefficient of a sound source is Excludes No./> Signal covariance matrix of sound sources/> .

对麦克风接收到的混合信号进行声源分离计算的公式如下：The formula for calculating the sound source separation of the mixed signal received by the microphone is as follows:

（13） (13)

进一步，所述声源分离方法还包括：对每帧分离后的频域信号乘以自定义的窗函数，进行逆傅里叶变换，得到/>个分离后的时域声信号。Further, the sound source separation method also includes: separating the frequency domain signal of each frame Multiply by the custom window function and perform inverse Fourier transform to get/> separated time domain acoustic signal.

本发明的第二方面提供了一种基于信号协方差矩阵重构的声源分离装置，包括一个或多个处理器，用于实现上述的信号协方差矩阵重构的实时声源分离方法。A second aspect of the present invention provides a sound source separation device based on signal covariance matrix reconstruction, including one or more processors for implementing the above-mentioned real-time sound source separation method of signal covariance matrix reconstruction.

本发明的第三方面提供了一种计算机可读存储介质，其上存储有程序，该程序被处理器执行时，用于实现上述的基于信号协方差矩阵重构的声源分离方法。A third aspect of the present invention provides a computer-readable storage medium on which a program is stored. When the program is executed by a processor, it is used to implement the above-mentioned sound source separation method based on signal covariance matrix reconstruction.

本发明的有益效果是：本发明提出了一种信号协方差矩阵重构的实时声源分离方法。通过特征值分解估计噪声功率，并利用特征值分解求逆矩阵，利用理论导向矢量在主要特征值向量构成的子空间的投影修正声源的导向矢量，不但极大的降低了信号协方差矩阵重构的计算量，协方差矩阵的重构精度，适用于机器人对不同声源信号的实时提取。The beneficial effects of the present invention are: the present invention proposes a real-time sound source separation method for signal covariance matrix reconstruction. The noise power is estimated through eigenvalue decomposition, and the inverse matrix is obtained using eigenvalue decomposition. The projection of the theoretical steering vector in the subspace composed of the main eigenvalue vectors is used to correct the steering vector of the sound source, which not only greatly reduces the signal covariance matrix repetition The calculation amount of the structure and the reconstruction accuracy of the covariance matrix are suitable for the real-time extraction of different sound source signals by the robot.

附图说明Description of the drawings

为了更清楚地说明本发明实施例中的技术方案，下面将对实施例描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动性的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without exerting any creative effort.

图1是本发明提出的信号协方差矩阵重构的实时声源分离方法的流程图；Figure 1 is a flow chart of the real-time sound source separation method for signal covariance matrix reconstruction proposed by the present invention;

图2是本发明方法的语音分离性能指标和其他方法的语音分离性能指标对比图；Figure 2 is a comparison chart between the speech separation performance indicators of the method of the present invention and the speech separation performance indicators of other methods;

图3（a）和图3（b）是本发明方法参考目标语音信号和从混合信号中分离的目标语音信号的波形对比图，其中图3（a）为参考目标语音信号的波形图，图3（b）为从混合信号中分离的目标语音信号的波形图；Figure 3(a) and Figure 3(b) are waveform comparison diagrams of the reference target speech signal and the target speech signal separated from the mixed signal by the method of the present invention, where Figure 3(a) is the waveform diagram of the reference target speech signal, Fig. 3(b) is the waveform diagram of the target speech signal separated from the mixed signal;

图4（a）和图4（b）是本发明方法参考干扰信号和语音分离抑制的干扰信号的波形对比图，其中图4（a）为参考干扰信号的波形图，图4（b）为本发明语音分离方法抑制后的干扰信号的波形；Figure 4(a) and Figure 4(b) are waveform comparison diagrams of the reference interference signal and the interference signal suppressed by speech separation according to the method of the present invention. Figure 4(a) is the waveform diagram of the reference interference signal, and Figure 4(b) is the waveform diagram of the reference interference signal. The waveform of the interference signal after suppression by the speech separation method of the present invention;

图5是本发明提出的信号协方差矩阵重构的实时声源分离方法装置的结构示意图。Figure 5 is a schematic structural diagram of the real-time sound source separation method device for signal covariance matrix reconstruction proposed by the present invention.

具体实施方式Detailed ways

下面结合附图和具体实例，详细说明本发明的实施步骤和有益效果。The implementation steps and beneficial effects of the present invention will be described in detail below with reference to the accompanying drawings and specific examples.

实施例1Example 1

本发明提出了一种信号协方差矩阵重构的实时声源分离方法，如图1和图2所示，所述方法具体包括以下步骤：The present invention proposes a real-time sound source separation method for signal covariance matrix reconstruction, as shown in Figures 1 and 2. The method specifically includes the following steps:

1、计算混合信号的协方差矩阵；1. Calculate the covariance matrix of the mixed signal ;

已知声源数量为，第/>个声源的来波方向矢量为/>，其中/>和/>分别为三维球坐标系的俯仰角和方位角，麦克风阵列的阵元数量为/>。The number of known sound sources is , No./> The incoming wave direction vector of a sound source is/> , of which/> and/> are the pitch angle and azimuth angle of the three-dimensional spherical coordinate system respectively, and the number of elements of the microphone array is/> .

对麦克风阵元接收到的第帧时域信号，作/>个点的快速傅里叶变换。对于每一个频率分量，按以下公式计算/>维混合信号的协方差矩阵/>：The microphone array element receives the Frame time domain signal, for/> Fast Fourier transform of points. For each frequency component, calculate according to the following formula/> Covariance matrix of dimensional mixed signals/> :

（1） (1)

其中代表时间帧；/>代表频率；/>为指数平滑因子，取值范围为（0,1），取值越大，则历史预测数据的作用越大，当前实际数据的作用越小；in Represents the time frame;/> Represents frequency;/> It is an exponential smoothing factor with a value range of (0,1). The larger the value, the greater the role of historical forecast data and the smaller the role of current actual data;

（2） (2)

为维声信号向量；/>为第/>个麦克风的信号；/>为常规矩阵转置；当时，/>为：for Dimensional sound signal vector;/> For the first/> signal from a microphone;/> is the transpose of a regular matrix; when When,/> for:

（3） (3)

其中为/>阶单位矩阵；/>为随频率变化的正则化参数，取值范围为（0,1），频率越大，取值越小；/>为球面扩散噪声的协方差矩阵，其/>元素为：in for/> Order identity matrix;/> It is a regularization parameter that changes with frequency. The value range is (0,1). The greater the frequency, the smaller the value;/> is the covariance matrix of spherical diffusion noise, whose/> The elements are:

（4） (4)

其中为相邻麦克风阵元的时延，/>为相邻麦克风阵元的间距，/>为物理量的传播速度。in is the delay of adjacent microphone array elements,/> is the spacing between adjacent microphone array elements,/> is the propagation speed of physical quantities.

2、计算噪声的功率；2. Calculate the power of noise ;

（5） (5)

其中为/>维特征值向量矩阵；/>为对角阵，其对角元素为从大到小排列的特征值；/>为矩阵的共轭转置。in for/> dimensional eigenvalue vector matrix;/> is a diagonal matrix, and its diagonal elements are the eigenvalues arranged from large to small;/> is the conjugate transpose of the matrix.

计算噪声的功率为最小的/>个特征值的平均值。Calculate the power of noise is the smallest/> the average of the eigenvalues.

3、利用主要特征值向量构成的子空间和理论导向矢量，计算每个声源的导向矢量：3. Calculate the guidance vector of each sound source using the subspace composed of the main eigenvalue vectors and the theoretical guidance vector. :

（6） (6)

4、利用下式计算混合信号协方差矩阵的逆矩阵：4. Use the following formula to calculate the inverse matrix of the mixed signal covariance matrix:

（7） (7)

5、利用下式计算第个声源的功率/>：5. Use the following formula to calculate the The power of a sound source/> :

（8） (8)

按照协方差矩阵的定义重构第个声源的信号协方差矩阵/>：（9）。According to the definition of covariance matrix, reconstruct the Signal covariance matrix of sound sources/> : (9).

6、利用每个声源的信号协方差矩阵和目标信号的理论导向矢量，计算维分离系数矩阵：6. Using the signal covariance matrix of each sound source and the theoretical guidance vector of the target signal, calculate Dimensional separation coefficient matrix:

（10） (10)

（11） (11)

（12） (12)

对麦克风接收到的混合信号进行声源分离计算的公式如下：（13）。The formula for calculating the sound source separation of the mixed signal received by the microphone is as follows: (13).

7、对每帧分离后的频域信号乘以自定义的窗函数，进行逆傅里叶变换，得到/>个分离后的时域声信号。7. Separate the frequency domain signal of each frame Multiply by the custom window function and perform inverse Fourier transform to get/> separated time domain acoustic signal.

以均匀圆环形六阵元麦克风阵列为例，麦克风全指向，，麦克风阵元的俯仰角度均为90°，即圆环面水平安装。Taking a uniform annular six-element microphone array as an example, the microphone is omnidirectional. , the pitch angles of the microphone array elements are all 90°, that is, the torus is installed horizontally.

对麦克风阵元接收到的第帧时域信号，作/>个点的快速傅里叶变换，在本实施例中，/>。按照公式（1），计算每一个频率分量的/>维混合信号的协方差矩阵/>。在本实施例中，/>。按照公式（3）和（4）计算初始信号协方差矩阵。The microphone array element receives the Frame time domain signal, for/> Fast Fourier transform of points, in this embodiment,/> . According to formula (1), calculate the /> of each frequency component Covariance matrix of dimensional mixed signals/> . In this embodiment,/> . Calculate the initial signal covariance matrix according to formulas (3) and (4) .

在本实施例中，采样频率为1.6 kHz，频率的范围为0到8000 Hz，等间隔62.5 Hz取值；/>的取值随着频率/>的增大，等间隔地从0.01减小到0.001。In this embodiment, the sampling frequency is 1.6 kHz, and the frequency The range is 0 to 8000 Hz, with values at equal intervals of 62.5 Hz;/> The value of with frequency/> increases and decreases from 0.01 to 0.001 at equal intervals.

按照公式（5）对当前时间帧的混合信号协方差矩阵作特征值分解，并按照公式（7）利用下式计算混合信号协方差矩阵的逆矩阵/>。According to formula (5), the mixed signal covariance matrix of the current time frame Perform eigenvalue decomposition and use the following formula to calculate the inverse matrix of the mixed signal covariance matrix/> .

计算噪声的功率为最小的/>个特征值的平均值，在本实施例中，/>。Calculate the power of noise is the smallest/> The average value of the characteristic values, in this embodiment,/> .

假设自由声场远场传播模型，圆环形麦克风阵列的理论导向矢量为：Assuming a free sound field far-field propagation model, the theoretical guidance vector of a circular microphone array is:

（14） (14)

其中为波数，/>为麦克风阵列的半径，为麦克风阵元的方位角度，且：in is the wave number,/> is the radius of the microphone array, is the azimuth angle of the microphone array element, and:

（15） (15)

按照公式（6）计算第个声源的导向矢量/>，并按照公式（8）计算第/>个声源的功率。然后按照公式（9）计算第/>个声源的信号协方差矩阵/>。According to formula (6), calculate the Guidance vectors of sound sources/> , and calculate the /> The power of a sound source . Then calculate the /> Signal covariance matrix of sound sources/> .

按照公式（10）-（12）计算维分离系数矩阵，在本实施例中，/>。Calculate according to formula (10)-(12) Dimension separation coefficient matrix, in this embodiment, /> .

最后，按照公式（13）对麦克风接收到的混合信号进行声源分离，并对得到的每帧分离后的频域信号乘以自定义的窗函数，进行逆傅里叶变换，得到/>个分离后的时域声信号。在本实施例中，自定义的窗函数的汉宁窗。Finally, perform sound source separation on the mixed signal received by the microphone according to formula (13), and obtain the separated frequency domain signal of each frame. Multiply by the custom window function and perform inverse Fourier transform to get/> separated time domain acoustic signal. In this example, the custom window function is the Hanning window.

在本发明实施例中，采用的声源信号为两个从语料库中随机抽取的语音信号。两个语音信号的来波方向不同，其中俯仰角固定为88°，方位角为随机，但保持间隔为45°。两个语音信号长度固定为3秒，信号的能量比为[0,3] dB随机。采用镜像源法 ( ImageSource Method )仿真环形6麦接收到的混合语音信号，仿真声场环境的混响时间为0.6 s。生成混合语音信号共100条。In the embodiment of the present invention, the sound source signals used are two speech signals randomly selected from the corpus. The two voice signals come in different directions, with the pitch angle fixed at 88° and the azimuth angle random, but the interval is maintained at 45°. The length of the two voice signals is fixed to 3 seconds, and the energy ratio of the signals is [0,3] dB randomly. The ImageSource Method is used to simulate the mixed speech signal received by the ring 6 microphone, and the reverberation time of the simulated sound field environment is 0.6 s. A total of 100 mixed speech signals are generated.

图2是采用本发明方法实现的语音分离性能指标与其他方法的对比，包括感知语音质量评价指标（PESQ），短时客观可懂度(STOI)和尺度不变信号失真比改善（SI-SDRi）。PESQ的取值范围为-0.5~4.5，PESQ值越高则表明被测试的语音具有越好的听觉语音质量；STOI的取值范围0~1，越接近1表示语音越能够被充分理解；SI-SNR代表语音分离后的信号与原信号的接近程度，越大越好。本发明方法的平均PESQ，平均STOI和平均SI-SNR均与基于中国专利CN105182298A的波束成形声源分离方法相当。Figure 2 is a comparison of speech separation performance indicators achieved by the method of the present invention and other methods, including perceptual speech quality evaluation index (PESQ), short-term objective intelligibility (STOI) and scale-invariant signal-to-distortion ratio improvement (SI-SDRi). ). The value range of PESQ is -0.5~4.5. The higher the PESQ value, the better the auditory speech quality of the tested speech. The value range of STOI is 0~1. The closer to 1, the more fully understood the speech is. SI -SNR represents the closeness of the speech separated signal to the original signal, the bigger the better. The average PESQ, average STOI and average SI-SNR of the method of the present invention are all comparable to the beamforming sound source separation method based on Chinese patent CN105182298A.

表1是本发明方法处理3s数据的平均总耗时与其他方法的比较，处理3s数据总耗时可以反应一个方法能否用于实时处理。当处理总耗时远大于数据的时长3s时，无法用于实时声源分离处理。从表1可以看到，本发明方法的处理平均总耗时相比基于中国专利CN105182298A的波束成形声源分离方法减小了93.6%，可用于实时处理。在本实施例中，计算所用CPU为Intel Xeon processor icelake，主频2.6GHz。每帧快速傅里叶变换的点数为256，帧移为128，信号采样频率为16 kHz，每4帧信号重构一次协方差矩阵并更新分离矩阵系数。Table 1 is a comparison of the average total time consuming for processing 3s data of the method of the present invention and other methods. The total time consuming for processing 3s data can reflect whether a method can be used for real-time processing. When the total processing time is much greater than the data duration of 3 seconds, it cannot be used for real-time sound source separation processing. As can be seen from Table 1, the average total processing time of the method of the present invention is reduced by 93.6% compared with the beamforming sound source separation method based on Chinese patent CN105182298A, and can be used for real-time processing. In this embodiment, the CPU used for calculation is Intel Xeon processor icelake, with a main frequency of 2.6GHz. The number of fast Fourier transform points per frame is 256, the frame shift is 128, the signal sampling frequency is 16 kHz, the covariance matrix is reconstructed every 4 frames of signal and the separation matrix coefficients are updated.

图3（a）和图3（b）是本发明方法参考目标语音信号和从混合信号中分离的目标语音信号的波形对比图，其中图3（a）为参考目标语音信号的波形图，图3（b）为从混合信号中分离的目标语音信号的波形图；可以看到从混合信号中分离的目标语音信号与参考目标语音信号的波形很接近，取得了较好的语音分离效果。Figure 3(a) and Figure 3(b) are waveform comparison diagrams of the reference target speech signal and the target speech signal separated from the mixed signal by the method of the present invention, where Figure 3(a) is the waveform diagram of the reference target speech signal, Fig. 3(b) is the waveform diagram of the target speech signal separated from the mixed signal; it can be seen that the waveform of the target speech signal separated from the mixed signal is very close to the waveform of the reference target speech signal, and a good speech separation effect is achieved.

图4（a）和图4（b）是本发明方法参考干扰信号和语音分离抑制的干扰信号的波形对比图，其中图4（a）为参考干扰信号的波形图，图4（b）为本发明语音分离方法抑制后的干扰信号的波形；可以看到干扰语音信号经过本发明处理后几乎完全被抑制。Figure 4(a) and Figure 4(b) are waveform comparison diagrams of the reference interference signal and the interference signal suppressed by speech separation according to the method of the present invention. Figure 4(a) is the waveform diagram of the reference interference signal, and Figure 4(b) is the waveform diagram of the reference interference signal. The waveform of the interference signal after being suppressed by the voice separation method of the present invention; it can be seen that the interference voice signal is almost completely suppressed after being processed by the present invention.

与前述基于信号协方差矩阵重构的实时声源分离方法的实施例相对应，本发明还提供了基于信号协方差矩阵重构的实时声源分离方法装置的实施例。Corresponding to the foregoing embodiments of the real-time sound source separation method based on signal covariance matrix reconstruction, the present invention also provides embodiments of a real-time sound source separation method and device based on signal covariance matrix reconstruction.

实施例2Example 2

参见图5，本实施例提供的一种信号协方差矩阵重构的实时声源分离方法装置，包括一个或多个处理器，用于实现实施例1中的信号协方差矩阵重构的实时声源分离方法。Referring to Figure 5, this embodiment provides a real-time sound source separation method device for signal covariance matrix reconstruction, including one or more processors, used to implement the real-time sound source separation method for signal covariance matrix reconstruction in Embodiment 1. Source separation methods.

本发明信号协方差矩阵重构的实时声源分离方法装置的实施例可以应用在任意具备数据处理能力的设备上，该任意具备数据处理能力的设备可以为诸如计算机等设备或装置。装置实施例可以通过软件实现，也可以通过硬件或者软硬件结合的方式实现。以软件实现为例，作为一个逻辑意义上的装置，是通过其所在任意具备数据处理能力的设备的处理器将非易失性存储器中对应的计算机程序指令读取到内存中运行形成的。从硬件层面而言，如图5所示，为本发明信号协方差矩阵重构的实时声源分离方法装置所在任意具备数据处理能力的设备的一种硬件结构图，除了图5所示的处理器、内存、网络接口、以及非易失性存储器之外，实施例中装置所在的任意具备数据处理能力的设备通常根据该任意具备数据处理能力的设备的实际功能，还可以包括其他硬件，对此不再赘述。The embodiments of the real-time sound source separation method and device for signal covariance matrix reconstruction of the present invention can be applied to any device with data processing capabilities, and any device with data processing capabilities can be a device or device such as a computer. The device embodiments may be implemented by software, or may be implemented by hardware or a combination of software and hardware. Taking software implementation as an example, as a logical device, it is formed by reading the corresponding computer program instructions in the non-volatile memory into the memory and running them through the processor of any device with data processing capabilities. From the hardware level, as shown in Figure 5, it is a hardware structure diagram of any device with data processing capabilities where the real-time sound source separation method for signal covariance matrix reconstruction of the present invention is located. In addition to the processing shown in Figure 5 In addition to the processor, memory, network interface, and non-volatile memory, any device with data processing capabilities where the device in the embodiment is located may also include other hardware based on the actual functions of any device with data processing capabilities. This will not be described again.

上述装置中各个单元的功能和作用的实现过程具体详见上述方法中对应步骤的实现过程，在此不再赘述。For details on the implementation process of the functions and effects of each unit in the above device, please refer to the implementation process of the corresponding steps in the above method, and will not be described again here.

对于装置实施例而言，由于其基本对应于方法实施例，所以相关之处参见方法实施例的部分说明即可。以上所描述的装置实施例仅仅是示意性的，其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本发明方案的目的。本领域普通技术人员在不付出创造性劳动的情况下，即可以理解并实施。As for the device embodiment, since it basically corresponds to the method embodiment, please refer to the partial description of the method embodiment for relevant details. The device embodiments described above are only illustrative. The units described as separate components may or may not be physically separated. The components shown as units may or may not be physical units, that is, they may be located in One location, or it can be distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the present invention. Persons of ordinary skill in the art can understand and implement the method without any creative effort.

实施例3Example 3

本发明实施例提供一种计算机可读存储介质，其上存储有程序，该程序被处理器执行时，实现上述实施例1中的信号协方差矩阵重构的实时声源分离方法。An embodiment of the present invention provides a computer-readable storage medium on which a program is stored. When the program is executed by a processor, the real-time sound source separation method for signal covariance matrix reconstruction in the above-mentioned Embodiment 1 is implemented.

所述计算机可读存储介质可以是前述任一实施例所述的任意具备数据处理能力的设备的内部存储单元，例如硬盘或内存。所述计算机可读存储介质也可以是任意具备数据处理能力的设备，例如所述设备上配备的插接式硬盘、智能存储卡（Smart Media Card，SMC）、SD卡、闪存卡（Flash Card）等。进一步的，所述计算机可读存储介质还可以既包括任意具备数据处理能力的设备的内部存储单元也包括外部存储设备。所述计算机可读存储介质用于存储所述计算机程序以及所述任意具备数据处理能力的设备所需的其他程序和数据，还可以用于暂时地存储已经输出或者将要输出的数据。The computer-readable storage medium may be an internal storage unit of any device with data processing capabilities as described in any of the foregoing embodiments, such as a hard disk or a memory. The computer-readable storage medium can also be any device with data processing capabilities, such as a plug-in hard disk, a smart media card (SMC), an SD card, and a flash card (Flash Card) equipped on the device. wait. Furthermore, the computer-readable storage medium may also include both an internal storage unit and an external storage device of any device with data processing capabilities. The computer-readable storage medium is used to store the computer program and other programs and data required by any device with data processing capabilities, and can also be used to temporarily store data that has been output or is to be output.

Claims

1. A real-time sound source separation method based on signal covariance matrix reconstruction, which is characterized by including the following steps:

When multiple sound source signals are detected, the exponential smoothing method is used to calculate the covariance matrix of the mixed signal;

Perform eigenvalue decomposition on the covariance matrix of the mixed signal, and use the eigenvalues to calculate the noise power of different frequency components;

Calculate the guidance vector of the sound source using the subspace composed of the main eigenvalue vectors and the theoretical guidance vector;

Calculate the inverse matrix of the mixed signal covariance matrix using the eigenvalue vector and eigenvalue of the mixed signal covariance matrix;

Use the inverse matrix of the mixed signal covariance matrix and the steering vector of the sound source to calculate the power of each sound source, and reconstruct the signal covariance matrix of each sound source according to the definition of the covariance matrix;

The signal covariance matrix of each sound source and the theoretical steering vector of the signal are used to calculate the separation coefficient matrix; the separated sound source signal is obtained based on the mixed sound signal vector and the separation coefficient matrix.

2. The real-time sound source separation method based on signal covariance matrix reconstruction according to claim 1, characterized in that when multiple sound source signals are detected, the exponential smoothing method is used to calculate the formula of the covariance matrix of the mixed signal. as follows:

(1)

in Represents the time frame;/> Represents frequency;/> is the exponential smoothing factor, the value range is/> , the larger the value, the greater the role of historical forecast data, and the smaller the role of current actual data;

(2)

for Dimensional sound signal vector;/> For the first/> Signals observed by microphones;/> is the transpose of a regular matrix; and when/> =1,/> for

(3)

in for/> Order identity matrix;/> is a regularization parameter that changes with frequency, and its value range is/> , the greater the frequency, the smaller the value;/> is the covariance matrix of spherical diffusion noise, whose/> The elements are:

(4)

in is the delay of adjacent microphone array elements,/> is the spacing between adjacent microphone array elements,/> is the propagation speed of physical quantity,/> is the sampling function.

3. The real-time sound source separation method based on signal covariance matrix reconstruction according to claim 1, characterized in that the covariance matrix of the mixed signal is decomposed into eigenvalues, and the eigenvalues are used to calculate the noise power of different frequency components. The process includes:

Mixed-signal covariance matrix for the current time frame Do eigenvalue decomposition:

(5)

in is the eigenvalue vector matrix;/> is a diagonal matrix, and its diagonal elements are the eigenvalues arranged from large to small;/> is the conjugate transpose of the matrix;

Calculate the power of noise ：/> is the smallest/> The average of eigenvalues, where/> is the total number of sound sources.

4. The real-time sound source separation method based on signal covariance matrix reconstruction according to claim 1, characterized in that the steering vector of each sound source is calculated using a subspace composed of main eigenvalue vectors and a theoretical steering vector. The formula is as follows:

(6)

in For the first/> The direction of the incoming wave of a sound source, in the three-dimensional spherical coordinate system,/> ,/> and/> are the pitch angle and azimuth angle respectively;/> For the first/> The theoretical guidance vector of each sound source is calculated based on the topology of the array and the free sound field propagation model;/> for/> before/> composed of column vectors/> dimensional matrix.

5. The real-time sound source separation method based on signal covariance matrix reconstruction according to claim 1, characterized in that, using eigenvalue vectors and eigenvalues, the formula for calculating the inverse matrix of the mixed signal covariance matrix is as follows:

(7)

in for/> Obtained by taking the reciprocal of the diagonal elements of .

6. The real-time sound source separation method based on signal covariance matrix reconstruction according to claim 1, characterized in that the power of each sound source is calculated using the inverse matrix of the mixed signal covariance matrix and the steering vector of the sound source. , and the process of reconstructing the signal covariance matrix of each sound source according to the definition of the covariance matrix includes:

Calculate the first The power of a sound source/> The formula is:

(8)

According to the definition of covariance matrix, reconstruct the Signal covariance matrix of sound sources/> The formula is as follows:

(9).

7. The real-time sound source separation method based on signal covariance matrix reconstruction according to claim 1, characterized in that the separation coefficient matrix is calculated using the signal covariance matrix of each sound source and the theoretical guidance vector of the target signal; The process of obtaining the separated sound source signal based on the mixed sound signal vector and the separation coefficient matrix includes:

calculate The formula of the dimensional separation coefficient matrix is as follows:

(10)

in for/> composed of theoretical guidance vectors of sound sources/> dimensional matrix, /> is the signal covariance matrix ultimately used to calculate the separation coefficient:

(11)

in Load the weight factor for the diagonal, the value range is/> , the larger the value, the more sensitive it is to signal changes; where

(12)

In the above formula, calculate the When the separation coefficient of a sound source is Excludes No./> Signal covariance matrix of sound sources/> ;

The formula for calculating the sound source separation of the mixed signal received by the microphone is as follows:

(13).

8. The real-time sound source separation method based on signal covariance matrix reconstruction according to claim 7, characterized in that the sound source separation method further includes: separating the frequency domain signal of each frame Multiply by the custom window function and perform inverse Fourier transform to get/> separated time domain acoustic signal.

9. A real-time sound source separation device based on signal covariance matrix reconstruction, characterized in that it includes one or more processors for implementing the signal covariance matrix-based method described in any one of claims 1-8. A reconstructed real-time sound source separation method.

10. A computer-readable storage medium, characterized in that a program is stored thereon, and when the program is executed by a processor, it is used to implement the signal covariance matrix reconstruction according to any one of claims 1-8. A real-time sound source separation method.