CN117037836B - Real-time sound source separation method and device based on signal covariance matrix reconstruction - Google Patents

Real-time sound source separation method and device based on signal covariance matrix reconstruction Download PDF

Info

Publication number
CN117037836B
CN117037836B CN202311278673.9A CN202311278673A CN117037836B CN 117037836 B CN117037836 B CN 117037836B CN 202311278673 A CN202311278673 A CN 202311278673A CN 117037836 B CN117037836 B CN 117037836B
Authority
CN
China
Prior art keywords
covariance matrix
sound source
signal
matrix
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311278673.9A
Other languages
Chinese (zh)
Other versions
CN117037836A (en
Inventor
朱世强
肖永雄
宛敏红
宋伟
付强
李特
顾建军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202311278673.9A priority Critical patent/CN117037836B/en
Publication of CN117037836A publication Critical patent/CN117037836A/en
Application granted granted Critical
Publication of CN117037836B publication Critical patent/CN117037836B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

A real-time sound source separation method and device based on signal covariance matrix reconstruction, the method comprises: when a plurality of sound source signals are detected, calculating a covariance matrix of the mixed signal by adopting an exponential smoothing method; performing eigenvalue decomposition on the covariance matrix of the mixed signal, and calculating the noise power of different frequency components by using the eigenvalue; calculating a guiding vector of a sound source by utilizing a subspace formed by the main eigenvalue vectors and a theoretical guiding vector; calculating an inverse matrix of the mixed signal covariance matrix by using the eigenvalue vector and the eigenvalue of the mixed signal covariance matrix; calculating the power of each sound source by using the inverse matrix of the mixed signal covariance matrix and the steering vector of the sound source, and reconstructing the signal covariance matrix of each sound source according to the definition of the covariance matrix; calculating a separation coefficient matrix by utilizing the signal covariance matrix of each sound source and the theoretical steering vector of the signal; and obtaining the separated sound source signal based on the mixed sound signal vector and the separation coefficient matrix.

Description

Real-time sound source separation method and device based on signal covariance matrix reconstruction
Technical Field
The invention relates to the field of array sound source signal processing, in particular to a real-time sound source separation method and device based on signal covariance matrix reconstruction.
Background
In complex acoustic scenarios with background noise, reverberation, and multi-speaker interference, the human ear has the ability to extract the target speaker's voice, known as the "cocktail party effect". However, this is a difficult task for robots, which is a problem in the field of sound source signal processing. A technique of extracting sound source signals of one or more target speakers from the mixed sound source signals using a signal processing method is called a sound source separation technique.
The sound source separation mainly comprises single-channel sound source separation and multi-channel sound source separation. The multichannel sound source separation method based on the microphone array can better utilize the spatial information of the sound field. Therefore, the multi-channel sound source separation method generally achieves better performance than single-channel sound source separation, whether using a conventional signal processing method or a deep learning method. Most of the man-machine interaction devices at present adopt a multichannel microphone array as sound receiving hardware.
Theoretically, the task of separating and extracting sound source signals in different directions can be adaptively completed by utilizing beam forming based on a multichannel microphone array. However, for a scenario where multiple people speak at the same time, it is still difficult to accurately estimate covariance matrices of interference signals and noise signals formed by other sound source signals for target sound source signals in different directions. In addition, due to the presence of scattering and reflection in the sound field, errors in microphone position and sound source positioning, etc., an accurate steering vector cannot be obtained. These two problems directly affect the sound source separation performance of the beamforming method.
To more accurately estimate the signal covariance matrix, one approach is to introduce a probability of signal presence (Signal Presence Probability, SPP) estimate. Updating the signal covariance matrix when the estimated SPP of the target sound source is low; when the estimated SPP of the target sound source is high, the signal covariance matrix is not updated. However, this method requires knowledge of the prior sound source signal existence probability, and for a scene where a plurality of sound sources exist at the same time, the SPPs of different sound source signals are difficult to estimate accurately.
Another method for estimating the signal covariance matrix is to integrate Capon spectrum estimation in the whole space region except the angle region where the target sound source signal is located, decompose eigenvalue of the covariance matrix obtained by integration, and reconstruct the signal covariance matrix again. Although the integration can be discretized into summation to reduce the calculated amount, for the broadband sound source signal, the summation operation, the inversion operation and the eigenvalue decomposition operation need to be carried out on each frequency for multiple times, the calculation complexity is very high, and the real-time sound source signal processing requirement cannot be met.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a real-time sound source signal separation method and device based on signal covariance matrix reconstruction. Noise power is estimated through eigenvalue decomposition, an eigenvalue decomposition inversion matrix is utilized, a guide vector of a sound source is corrected by utilizing projection of a theoretical guide vector in a subspace formed by main eigenvalue vectors, and a covariance matrix of each sound source is reconstructed according to definition of the covariance matrix, so that the calculated amount of signal covariance matrix reconstruction is greatly reduced, and the reconstruction accuracy of the covariance matrix is remarkably improved.
The first aspect of the present invention provides a sound source separation method based on signal covariance matrix reconstruction, comprising the following steps:
when a plurality of sound source signals are detected, calculating a covariance matrix of the mixed signal by adopting an exponential smoothing method;
performing eigenvalue decomposition on the covariance matrix of the mixed signal, and calculating the noise power of different frequency components by using the eigenvalue;
calculating a guiding vector of a sound source by utilizing a subspace formed by the main eigenvalue vectors and a theoretical guiding vector;
calculating an inverse matrix of the mixed signal covariance matrix by using the eigenvalue vector and the eigenvalue of the mixed signal covariance matrix;
calculating the power of each sound source by using the inverse matrix of the mixed signal covariance matrix and the steering vector of the sound source, and reconstructing the signal covariance matrix of each sound source according to the definition of the covariance matrix;
calculating a separation coefficient matrix by utilizing the signal covariance matrix of each sound source and the theoretical steering vector of the signal; and obtaining the separated sound source signal based on the mixed sound signal vector and the separation coefficient matrix.
Further, when a plurality of sound source signals are detected, the formula for calculating the covariance matrix of the mixed signal by using the exponential smoothing method is as follows:
(1)
wherein the method comprises the steps ofRepresenting a time frame; />Representing the frequency; />The value range is (0, 1) for the exponential smoothing factor, the larger the value is, the larger the effect of the historical prediction data is, and the smaller the effect of the current actual data is;
(2)
is thatA vector of dimensional acoustic signals; />Is->Signals observed by the microphones; />Is normalA matrix transposition; and when->When (I)>The method comprises the following steps:
(3)
wherein the method comprises the steps ofIs->A rank identity matrix; />The value range is (0, 1) for regularization parameters which change along with the frequency, and the larger the frequency is, the smaller the value is; />Covariance matrix of spherical diffusion noise, which is +.>The elements are as follows:
(4)
wherein the method comprises the steps ofFor the delay of adjacent microphone elements +.>For the spacing between adjacent microphone elements, +.>For the propagation speed of the physical quantity, +.>As a sampling function.
Further, the process of calculating noise power of different frequency components by using eigenvalues by decomposing the covariance matrix of the mixed signal as eigenvalues includes:
mixed signal covariance matrix for current time frameAnd (3) performing eigenvalue decomposition:
(5)
wherein the method comprises the steps ofIs a eigenvalue vector matrix; />The diagonal elements of the diagonal array are characteristic values arranged from large to small;is the conjugate transpose of the matrix.
Calculating the power of noiseIs minimum->Average value of individual characteristic values, wherein->Is the total number of sound sources.
Further, the guiding vector of each sound source is calculated by utilizing the subspace formed by the main eigenvalue vectors and the theoretical guiding vectorThe formula of (2) is as follows:
(6)
wherein the method comprises the steps ofIs->The direction of the incoming wave of the individual sound sources, in three-dimensional spherical coordinates,/->,/>And->Pitch angle and azimuth angle respectively; />Is->Theoretical steering vectors of the individual sound sources are calculated according to the topological structure of the array and a free sound field propagation model; />Is->Before->A>A dimension matrix.
Further, the inverse matrix of the mixed signal covariance matrix is calculated using the eigenvalue vector and eigenvalue as follows:
(7)
wherein the method comprises the steps ofFor->Is obtained by inverting the diagonal elements of (a).
Further, the process of calculating the power of each sound source using the inverse of the mixed signal covariance matrix and the steering vector of the sound source and reconstructing the signal covariance matrix of each sound source according to the definition of the covariance matrix includes:
calculate the firstPower of individual sound source->The formula of (2) is:
(8)
reconstructing the first according to the definition of the covariance matrixSignal covariance matrix of individual sound sources->The formula of (2) is as follows:
(9)
further, calculating a separation coefficient matrix by utilizing the signal covariance matrix of each sound source and the theoretical steering vector of the target signal; the process of obtaining the separated sound source signal based on the mixed sound signal vector and the separation coefficient matrix comprises the following steps:
calculation ofThe formula of the dimension separation coefficient matrix is as follows:
(10)
wherein the method comprises the steps ofIs->Theoretical steering vector composition of individual sound sources +.>Dimension matrix->For the signal covariance matrix that is ultimately used to calculate the separation coefficient:
(11)
wherein the method comprises the steps ofThe weight factors are loaded diagonally, the value range is (0, 1), and the larger the value is, the more sensitive the value is to the change of the signal; wherein:
(12)
in the above, calculate the firstThe separation coefficient of the individual sound sources->Does not include->Signal covariance matrix of individual sound sources->
The formula for performing sound source separation calculation on the mixed signal received by the microphone is as follows:
(13)
further, the sound source separation method further includes: for each frame of separated frequency domain signalMultiplying by a custom window function, performing an inverse Fourier transform to obtain +.>And separating the time domain acoustic signals.
The second aspect of the present invention provides a sound source separation device based on signal covariance matrix reconstruction, which comprises one or more processors, and is used for implementing the real-time sound source separation method based on signal covariance matrix reconstruction.
A third aspect of the present invention provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, is adapted to carry out the above-described method of sound source separation based on signal covariance matrix reconstruction.
The beneficial effects of the invention are as follows: the invention provides a real-time sound source separation method for signal covariance matrix reconstruction. Noise power is estimated through eigenvalue decomposition, an eigenvalue decomposition inversion matrix is utilized, and the guide vector of a sound source is corrected by utilizing projection of a theoretical guide vector in a subspace formed by main eigenvalue vectors, so that the calculated amount of signal covariance matrix reconstruction is greatly reduced, the covariance matrix reconstruction precision is improved, and the method is suitable for real-time extraction of different sound source signals by a robot.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flow chart of a real-time sound source separation method for signal covariance matrix reconstruction according to the invention;
FIG. 2 is a comparison of speech separation performance metrics of the method of the present invention with speech separation performance metrics of other methods;
fig. 3 (a) and 3 (b) are waveform comparison diagrams of a reference target voice signal and a target voice signal separated from a mixed signal according to the method of the present invention, wherein fig. 3 (a) is a waveform diagram of the reference target voice signal and fig. 3 (b) is a waveform diagram of the target voice signal separated from the mixed signal;
fig. 4 (a) and fig. 4 (b) are waveform comparison diagrams of a reference interference signal and a speech separation suppressed interference signal according to the method of the present invention, wherein fig. 4 (a) is a waveform diagram of the reference interference signal, and fig. 4 (b) is a waveform of the interference signal after the speech separation method of the present invention is suppressed;
fig. 5 is a schematic structural diagram of a real-time sound source separation method device for signal covariance matrix reconstruction.
Detailed Description
The steps and advantages of the present invention will be described in detail below with reference to the drawings and specific examples.
Example 1
The invention provides a real-time sound source separation method for signal covariance matrix reconstruction, which specifically comprises the following steps as shown in fig. 1 and 2:
1. calculating covariance matrix of mixed signal
The number of sound sources is known asFirst->The incoming wave direction vector of the individual sound sources is +.>Wherein->And->The pitch angle and the azimuth angle of the three-dimensional spherical coordinate system are respectively, and the number of array elements of the microphone array is +.>
Received by microphone array elementFrame time domain signal, do->Fast fourier transform of individual points. For each frequency component +.>Covariance matrix of the dimensional mixed signal>
(1)
Wherein the method comprises the steps ofRepresenting a time frame; />Representing the frequency; />The value range is (0, 1) for the exponential smoothing factor, the larger the value is, the larger the effect of the historical prediction data is, and the smaller the effect of the current actual data is;
(2)
is thatA vector of dimensional acoustic signals; />Is->Signals of the microphones; />Transpose the conventional matrix; when (when)When (I)>The method comprises the following steps:
(3)
wherein the method comprises the steps ofIs->A rank identity matrix; />The value range is (0, 1) for regularization parameters which change along with the frequency, and the larger the frequency is, the smaller the value is; />Covariance matrix of spherical diffusion noise, which is +.>The elements are as follows:
(4)
wherein the method comprises the steps ofFor the delay of adjacent microphone elements +.>For the spacing between adjacent microphone elements, +.>Is the propagation velocity of the physical quantity.
2. Calculating the power of noise
Mixed signal covariance matrix for current time frameAnd (3) performing eigenvalue decomposition:
(5)
wherein the method comprises the steps ofIs->A vector matrix of dimensional eigenvalues; />The diagonal elements of the diagonal array are characteristic values arranged from large to small; />Is the conjugate transpose of the matrix.
Calculating the power of noiseIs minimum->Average of the individual characteristic values.
3. Calculating the director of each sound source by using the subspace formed by the principal eigenvalue vectors and the theoretical director vectorMeasuring amount
(6)
Wherein the method comprises the steps ofIs->The direction of the incoming wave of the individual sound sources, in three-dimensional spherical coordinates,/->,/>And->Pitch angle and azimuth angle respectively; />Is->Theoretical steering vectors of the individual sound sources are calculated according to the topological structure of the array and a free sound field propagation model; />Is->Before->A>A dimension matrix.
4. The inverse of the mixed signal covariance matrix is calculated using:
(7)
wherein the method comprises the steps ofFor->Is obtained by inverting the diagonal elements of (a).
5. Calculate the first usingPower of individual sound source->
(8)
Reconstructing the first according to the definition of the covariance matrixSignal covariance matrix of individual sound sources->(9)。
6. Calculating by using the signal covariance matrix of each sound source and the theoretical steering vector of the target signalDimensional separation coefficient matrix:
(10)
wherein the method comprises the steps ofIs->Theoretical steering vector composition of individual sound sources +.>Dimension matrix->For the signal covariance matrix that is ultimately used to calculate the separation coefficient:
(11)
wherein the method comprises the steps ofThe weight factors are loaded diagonally, the value range is (0, 1), and the larger the value is, the more sensitive the value is to the change of the signal; wherein:
(12)
in the above, calculate the firstThe separation coefficient of the individual sound sources->Does not include->Signal covariance matrix of individual sound sources->
The formula for performing sound source separation calculation on the mixed signal received by the microphone is as follows:(13)。
7. for each frame of separated frequency domain signalMultiplying by a custom window function, performing an inverse Fourier transform to obtain +.>And separating the time domain acoustic signals.
Taking a uniform circular six-array element microphone array as an example, the microphones are all directional,the pitching angles of the microphone array elements are 90 degrees, namely the circular ring surface is horizontally arranged.
Received by microphone array elementFrame time domain signal, do->Fast fourier transform of individual points, in this embodiment,/->. According to formula (1), calculating +/for each frequency component>Covariance matrix of the dimensional mixed signal>. In this embodiment, <' > a->. Calculating an initial signal covariance matrix according to formulas (3) and (4)
In the present embodiment, the sampling frequency is 1.6 kHz, the frequencyRanging from 0 to 8000 Hz, equally spaced 62.5 Hz values; />Is dependent on the frequency>Is reduced from 0.01 to 0.001 at equal intervals.
A mixed signal covariance matrix of the current time frame according to formula (5)Decomposing the eigenvalue and calculating the inverse of the covariance matrix of the mixed signal according to the formula (7)>
Calculating the power of noiseIs minimum->Average value of individual characteristic values, < > in this embodiment>
Assuming a free sound field far field propagation model, the theoretical steering vector of the circular microphone array is:
(14)
wherein the method comprises the steps ofFor wave number, < >>For the radius of the microphone array,is the azimuth angle of the microphone array element, and:
(15)
calculate the first according to equation (6)Steering vector of individual sound source +.>And calculate +.>Power of individual sound source. Then calculate +.sup.th according to equation (9)>Signal covariance matrix of individual sound sources->
Calculated according to formulas (10) - (12)A matrix of dimension separation coefficients, in this embodiment, < ->
Finally, the mixed signal received by the microphone is subjected to sound source separation according to the formula (13), and the obtained frequency domain signal after each frame separation is subjected toMultiplying by a custom window function, performing an inverse Fourier transform to obtain +.>And separating the time domain acoustic signals. In this embodiment, the hanning window of the custom window function.
In the embodiment of the invention, the adopted sound source signals are two voice signals randomly extracted from the corpus. The incoming wave directions of the two voice signals are different, wherein the pitch angle is fixed to 88 degrees, the azimuth angle is random, and the interval is kept to be 45 degrees. The two speech signals are fixed to be 3 seconds in length, and the energy ratio of the signals is [0,3] dB random. The mixed voice signal received by the ring-shaped 6 microphone is simulated by adopting a mirror image source method (Image Source Method), and the reverberation time of the simulated sound field environment is 0.6 s. A total of 100 mixed speech signals are generated.
FIG. 2 is a comparison of speech separation performance metrics achieved using the method of the present invention with other methods, including perceptual speech quality assessment metrics (PESQ), short-term objective intelligibility (STOI) and scale-invariant signal-to-distortion ratio improvement (SI-SDRi). The value range of the PESQ is-0.5-4.5, and the higher the PESQ value is, the better the hearing voice quality of the tested voice is; the value range of STOI is 0-1, and the closer to 1 is the more fully understood the voice is; SI-SNR represents the closeness of the speech-separated signal to the original signal, the greater the better. The average PESQ, average STOI and average SI-SNR of the method of the present invention are both comparable to the beamforming sound source separation method based on chinese patent CN 105182298A.
Table 1 shows the average total time consumed to process 3s data for the method of the present invention compared to other methods, and the total time consumed to process 3s data can reflect whether one method can be used for real-time processing. When the total processing time is far longer than the duration of 3s of data, the method cannot be used for real-time sound source separation processing. It can be seen from table 1 that the average total consumption of the process of the present invention is reduced by 93.6% compared to the beam forming sound source separation method based on chinese patent CN105182298A, which can be used for real-time processing. In this example, the CPU used for the calculation was Intel Xeon processor icelake, the main frequency was 2.6GHz. The number of points per frame of the fast fourier transform is 256, the frame shift is 128, the signal sampling frequency is 16 kHz, the covariance matrix is reconstructed every 4 frames of the signal, and the separation matrix coefficients are updated.
Fig. 3 (a) and 3 (b) are waveform comparison diagrams of a reference target voice signal and a target voice signal separated from a mixed signal according to the method of the present invention, wherein fig. 3 (a) is a waveform diagram of the reference target voice signal and fig. 3 (b) is a waveform diagram of the target voice signal separated from the mixed signal; the waveform of the target voice signal separated from the mixed signal is very close to that of the reference target voice signal, and a good voice separation effect is achieved.
Fig. 4 (a) and fig. 4 (b) are waveform comparison diagrams of a reference interference signal and a speech separation suppressed interference signal according to the method of the present invention, wherein fig. 4 (a) is a waveform diagram of the reference interference signal, and fig. 4 (b) is a waveform of the interference signal after the speech separation method of the present invention is suppressed; it can be seen that the interfering speech signal is almost completely suppressed after the inventive processing.
Corresponding to the embodiment of the real-time sound source separation method based on the signal covariance matrix reconstruction, the invention also provides an embodiment of the real-time sound source separation method based on the signal covariance matrix reconstruction.
Example 2
Referring to fig. 5, the apparatus for a real-time sound source separation method for signal covariance matrix reconstruction provided in this embodiment includes one or more processors configured to implement the real-time sound source separation method for signal covariance matrix reconstruction in embodiment 1.
The embodiment of the real-time sound source separation method and device for signal covariance matrix reconstruction can be applied to any device with data processing capability, and the device with data processing capability can be a device or device such as a computer. The apparatus embodiments may be implemented by software, or may be implemented by hardware or a combination of hardware and software. Taking software implementation as an example, the device in a logic sense is formed by reading corresponding computer program instructions in a nonvolatile memory into a memory by a processor of any device with data processing capability. In terms of hardware, as shown in fig. 5, a hardware structure diagram of an apparatus with any data processing capability where the apparatus for a real-time sound source separation method for signal covariance matrix reconstruction of the present invention is located is shown in fig. 5, and in addition to the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 5, any apparatus with data processing capability in the embodiment generally includes other hardware according to the actual function of the any apparatus with data processing capability, which is not described herein again.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
For the device embodiments, reference is made to the description of the method embodiments for the relevant points, since they essentially correspond to the method embodiments. The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
Example 3
An embodiment of the present invention provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the real-time sound source separation method of signal covariance matrix reconstruction in embodiment 1 described above.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any of the data processing enabled devices described in any of the previous embodiments. The computer readable storage medium may be any device having data processing capability, for example, a plug-in hard disk, a Smart Media Card (SMC), an SD Card, a Flash memory Card (Flash Card), or the like, which are provided on the device. Further, the computer readable storage medium may include both internal storage units and external storage devices of any data processing device. The computer readable storage medium is used for storing the computer program and other programs and data required by the arbitrary data processing apparatus, and may also be used for temporarily storing data that has been output or is to be output.

Claims (10)

1. The real-time sound source separation method based on signal covariance matrix reconstruction is characterized by comprising the following steps of:
when a plurality of sound source signals are detected, calculating a covariance matrix of the mixed signal by adopting an exponential smoothing method;
performing eigenvalue decomposition on the covariance matrix of the mixed signal, and calculating the noise power of different frequency components by using the eigenvalue;
calculating a guiding vector of a sound source by utilizing a subspace formed by the main eigenvalue vectors and a theoretical guiding vector;
calculating an inverse matrix of the mixed signal covariance matrix by using the eigenvalue vector and the eigenvalue of the mixed signal covariance matrix;
calculating the power of each sound source by using the inverse matrix of the mixed signal covariance matrix and the steering vector of the sound source, and reconstructing the signal covariance matrix of each sound source according to the definition of the covariance matrix;
calculating a separation coefficient matrix by utilizing the signal covariance matrix of each sound source and the theoretical steering vector of the signal; and obtaining the separated sound source signal based on the mixed sound signal vector and the separation coefficient matrix.
2. The real-time sound source separation method based on signal covariance matrix reconstruction according to claim 1, wherein when a plurality of sound source signals are detected, a formula for calculating a covariance matrix of a mixed signal using an exponential smoothing method is as follows:
(1)
wherein the method comprises the steps ofRepresenting a time frame; />Representing the frequency; />Is an exponential smoothing factor with a value range of +.>The larger the value is, the larger the effect of the history prediction data is, and the smaller the effect of the current actual data is;
(2)
is thatA vector of dimensional acoustic signals; />Is->Signals observed by the microphones; />Transpose the conventional matrix; and when->When=1,>is that
(3)
Wherein the method comprises the steps ofIs->A rank identity matrix; />For regularization parameters varying with frequency, the value range is +.>The larger the frequency is, the smaller the value is; />Covariance matrix of spherical diffusion noise, which is +.>The elements are as follows:
(4)
wherein the method comprises the steps ofFor the delay of adjacent microphone elements +.>For the spacing between adjacent microphone elements, +.>For the propagation speed of the physical quantity, +.>As a sampling function.
3. The method for real-time sound source separation based on signal covariance matrix reconstruction according to claim 1, wherein the process of performing eigenvalue decomposition on the covariance matrix of the mixed signal and calculating noise power of different frequency components using the eigenvalues comprises:
mixed signal covariance matrix for current time frameAnd (3) performing eigenvalue decomposition:
(5)
wherein the method comprises the steps ofIs a eigenvalue vector matrix; />The diagonal elements of the diagonal array are characteristic values arranged from large to small; />Is the conjugate transpose of the matrix;
calculating the power of noise:/>Is minimum->Average value of individual characteristic values, wherein->Is the total number of sound sources.
4. The method for real-time sound source separation based on signal covariance matrix reconstruction according to claim 1, wherein the steering vector of each sound source is calculated using a subspace formed by the principal eigenvalue vectors and theoretical steering vectorsThe formula of (2) is as follows:
(6)
wherein the method comprises the steps ofIs->The direction of the incoming wave of the individual sound sources, in three-dimensional spherical coordinates,/->,/>And->Pitch angle and azimuth angle respectively; />Is->Theoretical steering vectors of the individual sound sources are calculated according to the topological structure of the array and a free sound field propagation model; />Is->Before->A>A dimension matrix.
5. The method for real-time sound source separation based on signal covariance matrix reconstruction according to claim 1, wherein the formula for calculating the inverse matrix of the mixed signal covariance matrix using eigenvalue vectors and eigenvalues is as follows:
(7)
wherein the method comprises the steps ofFor->Is obtained by inverting the diagonal elements of (a).
6. The method for real-time sound source separation based on signal covariance matrix reconstruction according to claim 1, wherein the process of calculating power of each sound source using an inverse matrix of the mixed signal covariance matrix and a steering vector of the sound source and reconstructing the signal covariance matrix of each sound source according to definition of the covariance matrix comprises:
calculate the firstPower of individual sound source->The formula of (2) is:
(8)
reconstructing the first according to the definition of the covariance matrixSignal covariance matrix of individual sound sources->The formula of (2) is as follows:
(9)。
7. the real-time sound source separation method based on signal covariance matrix reconstruction according to claim 1, wherein a separation coefficient matrix is calculated using the signal covariance matrix of each sound source and a theoretical steering vector of the target signal; the process of obtaining the separated sound source signal based on the mixed sound signal vector and the separation coefficient matrix comprises the following steps:
calculation ofThe formula of the dimension separation coefficient matrix is as follows:
(10)
wherein the method comprises the steps ofIs->Theoretical steering vector composition of individual sound sources +.>Dimension matrix->For the signal covariance matrix that is ultimately used to calculate the separation coefficient:
(11)
wherein the method comprises the steps ofFor loading weight factors diagonally, the value range is +.>The larger the value, the more sensitive to the change of the signal; wherein the method comprises the steps of
(12)
In the above, calculate the firstThe separation coefficient of the individual sound sources->Does not include->Signal covariance matrix of individual sound sources->
The formula for performing sound source separation calculation on the mixed signal received by the microphone is as follows:
(13)。
8. the method for real-time sound source separation based on signal covariance matrix reconstruction according to claim 7, further comprising: for each frame of separated frequency domain signalMultiplying by a custom window function, performing an inverse Fourier transform to obtain +.>And separating the time domain acoustic signals.
9. A real-time sound source separation device based on signal covariance matrix reconstruction, characterized by comprising one or more processors for implementing the real-time sound source separation method based on signal covariance matrix reconstruction according to any one of claims 1-8.
10. A computer readable storage medium, having stored thereon a program which, when executed by a processor, is adapted to carry out the method for real-time sound source separation based on signal covariance matrix reconstruction according to any one of claims 1-8.
CN202311278673.9A 2023-10-07 2023-10-07 Real-time sound source separation method and device based on signal covariance matrix reconstruction Active CN117037836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311278673.9A CN117037836B (en) 2023-10-07 2023-10-07 Real-time sound source separation method and device based on signal covariance matrix reconstruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311278673.9A CN117037836B (en) 2023-10-07 2023-10-07 Real-time sound source separation method and device based on signal covariance matrix reconstruction

Publications (2)

Publication Number Publication Date
CN117037836A CN117037836A (en) 2023-11-10
CN117037836B true CN117037836B (en) 2023-12-29

Family

ID=88635765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311278673.9A Active CN117037836B (en) 2023-10-07 2023-10-07 Real-time sound source separation method and device based on signal covariance matrix reconstruction

Country Status (1)

Country Link
CN (1) CN117037836B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010054728A (en) * 2008-08-27 2010-03-11 Hitachi Ltd Sound source extracting device
CN105182298A (en) * 2015-10-19 2015-12-23 电子科技大学 Interfering noise covariance matrix reconstruction method aiming at incoming wave direction error
WO2022172441A1 (en) * 2021-02-15 2022-08-18 日本電信電話株式会社 Sound source separation device, sound source separation method, and program
CN115775564A (en) * 2023-01-29 2023-03-10 北京探境科技有限公司 Audio processing method and device, storage medium and intelligent glasses
CN116312602A (en) * 2022-12-07 2023-06-23 之江实验室 Voice signal beam forming method based on interference noise space spectrum matrix
CN116343808A (en) * 2023-03-28 2023-06-27 之江实验室 Flexible microphone array voice enhancement method and device, electronic equipment and medium
CN116846440A (en) * 2023-07-11 2023-10-03 之江实验室 Beam forming method and system for calculating covariance matrix based on singular value decomposition

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10770091B2 (en) * 2016-12-28 2020-09-08 Google Llc Blind source separation using similarity measure

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010054728A (en) * 2008-08-27 2010-03-11 Hitachi Ltd Sound source extracting device
CN105182298A (en) * 2015-10-19 2015-12-23 电子科技大学 Interfering noise covariance matrix reconstruction method aiming at incoming wave direction error
WO2022172441A1 (en) * 2021-02-15 2022-08-18 日本電信電話株式会社 Sound source separation device, sound source separation method, and program
CN116312602A (en) * 2022-12-07 2023-06-23 之江实验室 Voice signal beam forming method based on interference noise space spectrum matrix
CN115775564A (en) * 2023-01-29 2023-03-10 北京探境科技有限公司 Audio processing method and device, storage medium and intelligent glasses
CN116343808A (en) * 2023-03-28 2023-06-27 之江实验室 Flexible microphone array voice enhancement method and device, electronic equipment and medium
CN116846440A (en) * 2023-07-11 2023-10-03 之江实验室 Beam forming method and system for calculating covariance matrix based on singular value decomposition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Adaptive Beamforming Of Interference Covariance Matrix Reconstruction Based On K-means Clustering;Hao Wangshen et al.;2021 Global Reliability and Prognostics and Health Management (PHM-Nanjing);全文 *
基于协方差矩阵重构和导向矢量估计的稳健自适应波束形成;谢菊兰;李昕亚;李会勇;王旭;;电波科学学报(第02期);全文 *

Also Published As

Publication number Publication date
CN117037836A (en) 2023-11-10

Similar Documents

Publication Publication Date Title
CN107039045B (en) Globally optimized least squares post-filtering for speech enhancement
CN109839612B (en) Sound source direction estimation method and device based on time-frequency masking and deep neural network
Schwartz et al. Multi-microphone speech dereverberation and noise reduction using relative early transfer functions
US7626889B2 (en) Sensor array post-filter for tracking spatial distributions of signals and noise
CN109102822B (en) Filtering method and device based on fixed beam forming
CN106537501B (en) Reverberation estimator
CN110223708B (en) Speech enhancement method based on speech processing and related equipment
Schwartz et al. An expectation-maximization algorithm for multimicrophone speech dereverberation and noise reduction with coherence matrix estimation
Koldovský et al. Spatial source subtraction based on incomplete measurements of relative transfer function
EP2884491A1 (en) Extraction of reverberant sound using microphone arrays
Braun et al. A multichannel diffuse power estimator for dereverberation in the presence of multiple sources
CN111681665A (en) Omnidirectional noise reduction method, equipment and storage medium
CN112712818A (en) Voice enhancement method, device and equipment
Schwartz et al. Nested generalized sidelobe canceller for joint dereverberation and noise reduction
CN116612776A (en) Signal processing method and device for microphone array
CN114242104A (en) Method, device and equipment for voice noise reduction and storage medium
Levin et al. Near-field signal acquisition for smartglasses using two acoustic vector-sensors
Neri et al. Speaker Distance Estimation in Enclosures from Single-Channel Audio
CN113223552A (en) Speech enhancement method, speech enhancement device, speech enhancement apparatus, storage medium, and program
CN117037836B (en) Real-time sound source separation method and device based on signal covariance matrix reconstruction
Pfeifenberger et al. Blind source extraction based on a direction-dependent a-priori SNR.
CN106448693B (en) A kind of audio signal processing method and device
Ji et al. Coherence-Based Dual-Channel Noise Reduction Algorithm in a Complex Noisy Environment.
Firoozabadi et al. Combination of nested microphone array and subband processing for multiple simultaneous speaker localization
Laufer et al. ML estimation and CRBs for reverberation, speech, and noise PSDs in rank-deficient noise field

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant