CN111711918B - Coherent sound and environmental sound extraction method and system of multichannel signal

Info

Publication number: CN111711918B
Application number: CN202010448458.9A
Authority: CN
Inventors: 吴彦琴; 桑晋秋; 郑成诗; 张芳杰; 李晓东
Original assignee: Institute of Acoustics CAS
Current assignee: Institute of Acoustics CAS
Priority date: 2020-05-25
Filing date: 2020-05-25
Publication date: 2021-05-18
Anticipated expiration: 2040-05-25
Also published as: CN111711918A

Abstract

The invention discloses a method and a system for extracting the coherent sound and the ambient sound of a multichannel signal. The method comprises: calculating a weight expression for the coherent sound of an N-channel signal, estimating the coherent sound according to the weight expression, and thereby calculating the coherent sound of each channel; calculating the ambient sound of each channel from the coherent sound of that channel; and performing an inverse Fourier transform on the N channels of coherent sound and the N channels of ambient sound to obtain the coherent sound and the ambient sound in the time domain. The method can extract the coherent sound and the ambient sound regardless of whether the coherent-sound energy proportions of the channels are equal and regardless of whether the ambient-sound energies of the channels are equal, with small extraction error and high precision.

Description

Coherent sound and environmental sound extraction method and system of multichannel signal

Technical Field

The invention relates to the field of spatial sound reproduction, in particular to a method and a system for extracting coherent sound and environmental sound of a multi-channel signal.

Background

Spatial sound reproduction must not only meet certain requirements on sound source localization and sound image width, but also convey a good sense of space and immersion. Spatial sound mainly comprises two components: coherent sound, which is directional, and ambient sound, which is diffuse. Because coherent sound and ambient sound have different characteristics and are perceived differently, achieving a better reproduction effect requires separating the two components, a task known as primary-ambient extraction (PAE), and processing them differently.

The PAE technology can be fused with spatial audio coding systems such as spatial audio scene coding, directional audio coding and the like, and has become one of the key technologies of a spatial sound reproduction system. In general, PAE techniques, as a front-end for audio encoding or decoding, can enable complex, efficient, and immersive spatial sound playback. First, the PAE technique separates the coherent sound from the ambient sound in the spatial sound scene, which can make the audio format for replaying the spatial sound independent from the original audio format, increasing the flexibility of spatial sound replay. Secondly, for the object-based audio format, the PAE-based sound reproduction system can reproduce a sound scene with better spatial sense without separating a single sound source object, and the efficiency of spatial sound reproduction is maintained. Finally, two important components in the sound scene, namely a coherent sound component and an environmental sound component, are separated by the PAE technology, and the two important components are respectively processed to improve the auditory experience when the sound scene is reconstructed.

PAE may be implemented by principal component analysis (PCA). Using the correlation between channels, PCA identifies the eigenvector corresponding to the largest eigenvalue of the covariance matrix of the input signal as the coherent-sound vector, normalizes it to a unit vector, and projects the input signal onto that unit vector to obtain the coherent sound of each channel. PCA presupposes that the coherent sound dominates the signal energy, so the extraction error increases when the coherent-sound energy is small. In addition, when the number of channels is large, the eigenvector corresponding to the largest eigenvalue of the covariance matrix is not easy to solve for. Another method widely used in PAE is the least-squares (LS) method. Because computing the estimation weights with the LS method is expensive, and the weights cannot readily be calculated when the number of channels is large, the LS method is currently used only for PAE of stereo signals. The pairwise correlation method is a PAE method designed specifically for multichannel signals: it pairs the channels two by two, exploits a linear relation between the coherent-sound energy proportions of the channels and the inter-channel correlation values, solves for the coherent-sound energy proportion of each channel from the correlation values, and thereby completes PAE of the multichannel signal. However, this method uses only the amplitude information of the correlation values, so its coherent-sound extraction accuracy is not high.
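The PCA-based extraction described in this paragraph can be sketched in a few lines of NumPy. This is a minimal illustration of the background technique only, not the method of the invention; the function name `pca_extract` and the layout of `X` (channels by spectral samples) are assumptions made for the example.

```python
import numpy as np

def pca_extract(X):
    """Split X (channels x spectral samples, complex) into a rank-one
    principal part and the residual, following the PCA description."""
    # Sample covariance matrix of the channels (N x N, Hermitian).
    C = X @ X.conj().T / X.shape[1]
    # eigh returns eigenvalues in ascending order, so the last column is
    # the unit-norm eigenvector of the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(C)
    u = eigvecs[:, -1]
    # Project every spectral sample onto u: coherent = u (u^H X).
    coherent = np.outer(u, u.conj() @ X)
    return coherent, X - coherent

# Noise-free toy mixture (hypothetical values): ambient sound is zero,
# so the principal part should reproduce the input.
rng = np.random.default_rng(0)
S = rng.standard_normal(512) + 1j * rng.standard_normal(512)
beta = np.array([1.0, 0.5, 2.0])
X = np.outer(beta, S)
coherent, ambient = pca_extract(X)
```

When ambient sound is added and its energy approaches that of the coherent sound, the leading eigenvector no longer aligns with the coherent direction, which is the limitation the paragraph points out.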

Disclosure of Invention

The invention aims to overcome the technical defects and provides a coherent sound and environment sound extraction method of a multi-channel signal. According to the method, when the number of channels is small, the weight of coherent sound is estimated by using a least square method, and a weight expression when the coherent sound estimation is carried out on a multi-channel signal with any number of channels is obtained according to the regularity of the change of the weight along with the number of channels. In addition, the method of the invention utilizes the signal energy of each channel and the correlation value among the channels to calculate each unknown parameter in the weight expression, thereby realizing PAE of the multichannel signal.

To achieve the above object, embodiment 1 of the present invention provides a coherent acoustic and ambient acoustic extraction method for a multichannel signal, including:

calculating weight expressions of the N channel signal coherent sounds, and estimating the coherent sounds according to the weight expressions, thereby calculating the coherent sounds of each channel;

calculating the environment sound of each channel according to the coherent sound of each channel;

and carrying out inverse Fourier transform on the N channels of coherent sound and the N channels of environment sound to obtain coherent sound and environment sound represented by a time domain.

As an improvement of the above method, calculating the weight expression of the coherent sound of the N-channel signal, estimating the coherent sound according to the weight expression, and thereby calculating the coherent sound of each channel specifically comprises the following steps:

performing a Fourier transform on the time-domain multichannel signal, the input signal $X_n$ of the nth channel being expressed as:

$$X_n = \beta_n S + A_n$$

wherein $S$ represents the spectrum of the coherent sound, $\beta_n$ represents the amplitude difference factor between the coherent sound of the nth channel and the coherent sound of the first channel, $1 \le n \le N$, $\beta_1 = 1$, and $A_n$ represents the spectrum of the ambient sound of the nth channel;

calculating the short-time energy $P_{X_n}$ of the nth channel input signal $X_n$:

$$P_{X_n} = E\{X_n X_n^{*}\}$$

wherein $E\{\cdot\}$ represents a short-time average and $^{*}$ denotes complex conjugation;

calculating the correlation between any two channels:

$$r_{n_1 n_2} = E\{X_{n_1} X_{n_2}^{*}\}$$

wherein $r_{n_1 n_2}$ is the correlation value between the $n_1$th channel and the $n_2$th channel, $n_1 = 1, 2, \dots, N$, $n_2 = 1, 2, \dots, N$, $n_1 \ne n_2$; there are $N(N-1)/2$ different cross-correlation values in total;

using the relation

$$\ln \eta_{n_1} + \ln \eta_{n_2} = \ln \frac{|r_{n_1 n_2}|^2}{P_{X_{n_1}} P_{X_{n_2}}}$$

selecting N groups of cross-correlation values to jointly solve for the proportion $\eta_n$ of coherent sound in each channel;

for the first channel, $\beta_1 = 1$ is known, therefore:

$$P_S = \eta_1 P_{X_1}, \qquad P_{A_1} = P_{X_1} - P_S$$

wherein $P_S$ represents the short-time energy of the coherent sound and $P_{A_1}$ represents the short-time energy of the ambient sound of the first channel;

for the other channels, based on the short-time energy $P_{X_n}$ of the input signal $X_n$ and the inter-channel correlation values:

$$\beta_n = \frac{r_{1n}}{P_S}, \qquad P_{A_n} = P_{X_n} - \beta_n^2 P_S$$

wherein $P_{A_n}$ represents the short-time energy of the ambient sound of the nth channel, $n = 2, 3, \dots, N$;

calculating the weight $w_n$ of the nth channel:

$$w_n = \frac{\beta_n P_S \prod_{m \ne n} P_{A_m}}{\prod_{m=1}^{N} P_{A_m} + P_S \sum_{m=1}^{N} \beta_m^2 \prod_{k \ne m} P_{A_k}}$$

then the estimate $\hat{S}$ of the coherent sound is:

$$\hat{S} = \sum_{n=1}^{N} w_n X_n$$

and the coherent sound $S_n$ of the nth channel is:

$$S_n = \beta_n \hat{S}$$

As an improvement of the above method, calculating the ambient sound of each channel from the coherent sound of each channel specifically comprises:

the ambient sound $A_n$ of the nth channel is:

$$A_n = X_n - S_n$$

embodiment 2 of the present invention provides a coherent acoustic and ambient acoustic extraction system of a multichannel signal, including:

the coherent sound extraction module is used for calculating weight expressions of the coherent sounds of the signals of the N channels, estimating the coherent sounds according to the weight expressions, and calculating the coherent sounds of each channel;

the environment sound extraction module is used for calculating the environment sound of each channel according to the coherent sound of each channel;

and the frequency domain to time domain module is used for carrying out inverse Fourier transform on the N channels of coherent sound and the N channels of environment sound to obtain coherent sound and environment sound represented by time domain.

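As an improvement of the above system, the implementation process of the coherent sound extraction module is as follows:

performing a Fourier transform on the time-domain multichannel signal, the input signal $X_n$ of the nth channel being expressed as:

$$X_n = \beta_n S + A_n$$

wherein $S$ represents the spectrum of the coherent sound, $\beta_n$ represents the amplitude difference factor between the coherent sound of the nth channel and the coherent sound of the first channel, $1 \le n \le N$, $\beta_1 = 1$, and $A_n$ represents the spectrum of the ambient sound of the nth channel;

calculating the short-time energy $P_{X_n}$ of the nth channel input signal $X_n$:

$$P_{X_n} = E\{X_n X_n^{*}\}$$

wherein $E\{\cdot\}$ represents a short-time average and $^{*}$ denotes complex conjugation;

calculating the correlation between any two channels:

$$r_{n_1 n_2} = E\{X_{n_1} X_{n_2}^{*}\}$$

wherein $r_{n_1 n_2}$ is the correlation value between the $n_1$th channel and the $n_2$th channel, $n_1 = 1, 2, \dots, N$, $n_2 = 1, 2, \dots, N$, $n_1 \ne n_2$; there are $N(N-1)/2$ different cross-correlation values in total;

using the relation

$$\ln \eta_{n_1} + \ln \eta_{n_2} = \ln \frac{|r_{n_1 n_2}|^2}{P_{X_{n_1}} P_{X_{n_2}}}$$

selecting N groups of cross-correlation values to jointly solve for the proportion $\eta_n$ of coherent sound in each channel;

for the first channel, $\beta_1 = 1$ is known, therefore:

$$P_S = \eta_1 P_{X_1}, \qquad P_{A_1} = P_{X_1} - P_S$$

wherein $P_S$ represents the short-time energy of the coherent sound and $P_{A_1}$ represents the short-time energy of the ambient sound of the first channel;

for the other channels, based on the short-time energy $P_{X_n}$ of the input signal $X_n$ and the inter-channel correlation values:

$$\beta_n = \frac{r_{1n}}{P_S}, \qquad P_{A_n} = P_{X_n} - \beta_n^2 P_S$$

wherein $P_{A_n}$ represents the short-time energy of the ambient sound of the nth channel, $n = 2, 3, \dots, N$;

calculating the weight $w_n$ of the nth channel:

$$w_n = \frac{\beta_n P_S \prod_{m \ne n} P_{A_m}}{\prod_{m=1}^{N} P_{A_m} + P_S \sum_{m=1}^{N} \beta_m^2 \prod_{k \ne m} P_{A_k}}$$

then the estimate $\hat{S}$ of the coherent sound is:

$$\hat{S} = \sum_{n=1}^{N} w_n X_n$$

and the coherent sound $S_n$ of the nth channel is:

$$S_n = \beta_n \hat{S}$$

As an improvement of the above system, the specific implementation process of the ambient sound extraction module is as follows:

the ambient sound $A_n$ of the nth channel is:

$$A_n = X_n - S_n$$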

The advantages of the invention are as follows:

The method can extract the coherent sound and the ambient sound regardless of whether the coherent-sound energy proportions of the channels are equal and regardless of whether the ambient-sound energies of the channels are equal, with small extraction error and high precision.

Drawings

FIG. 1 is a flow chart of the coherent sound and ambient sound extraction method for a multichannel signal according to the present invention;

FIG. 2(a) is an error plot of coherent sound extraction for mixed five-channel signal 1 using the method of the present invention and the pairwise correlation method;

FIG. 2(b) is an error plot of ambient sound extraction for mixed five-channel signal 1 using the method of the present invention and the pairwise correlation method;

FIG. 3(a) is an error plot of coherent sound extraction for mixed five-channel signal 2 using the method of the present invention and the pairwise correlation method;

FIG. 3(b) is an error plot of ambient sound extraction for mixed five-channel signal 2 using the method of the present invention and the pairwise correlation method.

Detailed Description

The technical solution of the present invention will be described in detail below with reference to the accompanying drawings.

Example 1

As shown in fig. 1, embodiment 1 of the present invention proposes a coherent sound and ambient sound extraction method for a multichannel signal, including the following steps:

Step 1) framing the multichannel signal, performing a Fourier transform to obtain its spectrum, and expressing the short-time energy of each channel and the correlation value between any two channels according to the multichannel signal model, which specifically comprises:

In the multichannel signal model, the input signal is represented as the superposition of coherent sound and ambient sound. Because the two have different characteristics, the coherent sounds of the channels are assumed to be completely correlated, that is, linearly related; the coherent sound is assumed to be uncorrelated with the ambient sound of every channel, and the ambient sounds of different channels are assumed to be uncorrelated with one another.

Step 1-1) performing a Fourier transform on the time-domain multichannel signal to obtain the spectrum:

$$X_n = \beta_n S + A_n, \qquad n = 1, 2, \dots, N$$

where $N$ is the number of channels, $S$ represents the spectrum of the coherent sound, $\beta_n$ represents the amplitude difference factor between the coherent sound of the nth channel and the coherent sound of the first channel, with $\beta_1 = 1$, and $A_n$ represents the spectrum of the ambient sound of the nth channel;
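To make the signal model concrete, the following sketch draws synthetic spectra that satisfy X_n = β_n S + A_n with mutually uncorrelated S and A_n. It is only an illustration under stated assumptions: the function name `synthesize_mixture`, the use of circular complex Gaussian samples, and the parameter values are all hypothetical and not taken from the patent.

```python
import numpy as np

def synthesize_mixture(num_bins, betas, ambient_powers, coherent_power=1.0, seed=0):
    """Draw spectra X_n = beta_n * S + A_n with independent S and A_n.

    Returns (X, S, A) with shapes (N, num_bins), (num_bins,), (N, num_bins).
    """
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    ambient_powers = np.asarray(ambient_powers, dtype=float)
    n_ch = betas.size

    def complex_noise(shape, power):
        # Circular complex Gaussian samples with E{|z|^2} = power.
        scale = np.sqrt(np.asarray(power) / 2.0)
        return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

    S = complex_noise(num_bins, coherent_power)
    A = complex_noise((n_ch, num_bins), ambient_powers[:, None])
    # Each channel carries a scaled copy of S plus its own ambient part.
    X = betas[:, None] * S[None, :] + A
    return X, S, A

# Hypothetical three-channel example; beta_1 = 1 by convention.
X, S, A = synthesize_mixture(256, [1.0, 0.5, 2.0], [0.2, 0.3, 0.1])
```

Because every channel shares the same S, the coherent parts are perfectly correlated across channels, while the independently drawn A_n are uncorrelated with S and with each other, matching the model assumptions.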

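Step 1-2) the signal energy of each channel can be expressed as:

$$P_{X_n} = E\{X_n X_n^{*}\}, \qquad n = 1, 2, \dots, N$$

wherein $E\{\cdot\}$ represents a short-time average and $^{*}$ denotes complex conjugation.

Step 1-3) the correlation values between the channels can be expressed as:

$$r_{n_1 n_2} = E\{X_{n_1} X_{n_2}^{*}\}$$

wherein $r_{n_1 n_2}$ is the correlation value between the $n_1$th channel and the $n_2$th channel, $n_1 = 1, 2, \dots, N$, $n_2 = 1, 2, \dots, N$, $n_1 \ne n_2$; there are $N(N-1)/2$ different cross-correlation values in total;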
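The short-time energies and inter-channel correlation values of steps 1-2) and 1-3) can be estimated by averaging over a block of spectral samples. The sketch below assumes the energy is the mean of |X_n|² and the correlation is the mean of X_i times the conjugate of X_j; the function names and the array layout (channels by samples) are hypothetical.

```python
import numpy as np
from itertools import combinations

def short_time_statistics(X):
    """X: complex array (N, K) holding K spectral samples per channel.

    Returns (P_X, R): P_X[n] is the average of |X_n|^2 and
    R[i, j] is the average of X_i * conj(X_j).
    """
    X = np.asarray(X)
    R = X @ X.conj().T / X.shape[1]
    P_X = np.real(np.diag(R))
    return P_X, R

def distinct_pairs(n_ch):
    """All unordered channel pairs; there are N(N-1)/2 of them."""
    return list(combinations(range(n_ch), 2))

# Tiny hand-checkable example with two channels and two samples.
X = np.array([[1.0, 1j], [1.0, 1.0]])
P_X, R = short_time_statistics(X)
pairs = distinct_pairs(5)   # a five-channel signal has 10 distinct pairs
```

Since R is Hermitian, only the entries above the diagonal carry distinct cross-correlation values, which is why the count is N(N-1)/2.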

Step 2) using the least-squares method to derive the weights for estimating the coherent sound of two-channel and three-channel signals, and examining how the weights vary with the number of channels, thereby obtaining the weights for the coherent sound of an N-channel signal;

Step 2-1) for a two-channel signal, deriving the weights for estimating the coherent sound from the input signals $X_1$ and $X_2$:

Step 2-1-1) the estimate $\hat{S}$ of the coherent sound of the two-channel signal is

$$\hat{S} = w_1 X_1 + w_2 X_2$$

wherein $w_1$ and $w_2$ are the estimation weights to be found.

Step 2-1-2) calculating the estimation error $\sigma_S$ of $\hat{S}$:

$$\sigma_S = S - \hat{S} = S - w_1 X_1 - w_2 X_2$$

Step 2-1-3) solving with the least-squares algorithm, that is, the weights are optimal when the estimation error is completely uncorrelated with the input stereo signal ($^{*}$ denotes complex conjugation):

$$E\{\sigma_S X_1^{*}\} = 0$$

$$E\{\sigma_S X_2^{*}\} = 0$$

The optimal weights are then:

$$w_1 = \frac{P_S P_{A_2}}{P_{A_1} P_{A_2} + P_S \left(P_{A_2} + \beta_2^2 P_{A_1}\right)}, \qquad w_2 = \frac{\beta_2 P_S P_{A_1}}{P_{A_1} P_{A_2} + P_S \left(P_{A_2} + \beta_2^2 P_{A_1}\right)}$$

wherein $P_S$ represents the short-time energy of the coherent sound, and $P_{A_1}$ and $P_{A_2}$ represent the short-time energies of the ambient sounds of the two channels, respectively.
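The two-channel least-squares solution can be checked numerically. The sketch below builds the normal equations implied by the orthogonality conditions under the model assumptions (β₁ = 1, real β₂, uncorrelated components) and compares the numerical solution with a closed form derived from those same conditions. The function names and the numeric values are hypothetical, and the closed form is a derivation under these assumptions rather than a quotation from the patent.

```python
import numpy as np

def solve_normal_equations(P_S, beta2, P_A1, P_A2):
    """Solve E{(S - w1 X1 - w2 X2) conj(X_m)} = 0 for m = 1, 2."""
    # Second-order statistics of X1 = S + A1 and X2 = beta2 * S + A2.
    R = np.array([[P_S + P_A1, beta2 * P_S],
                  [beta2 * P_S, beta2 ** 2 * P_S + P_A2]])
    # Cross-correlation between the target S and each input channel.
    p = np.array([P_S, beta2 * P_S])
    return np.linalg.solve(R, p)

def closed_form_weights(P_S, beta2, P_A1, P_A2):
    """Closed form obtained by solving the 2x2 system by hand."""
    denom = P_A1 * P_A2 + P_S * (P_A2 + beta2 ** 2 * P_A1)
    return np.array([P_S * P_A2, beta2 * P_S * P_A1]) / denom

w_numeric = solve_normal_equations(1.0, 0.5, 0.2, 0.3)
w_closed = closed_form_weights(1.0, 0.5, 0.2, 0.3)
```

The channel with the smaller ambient energy relative to its coherent content receives the larger weight, which is the behaviour one would expect from a minimum-error combiner.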

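Step 2-2) for a three-channel signal, deriving the weights for estimating the coherent sound $\hat{S}$ from the input signals $X_1$, $X_2$ and $X_3$:

Step 2-2-1) the estimate of the coherent sound is

$$\hat{S} = w_1 X_1 + w_2 X_2 + w_3 X_3$$

wherein $w_1$, $w_2$ and $w_3$ are the estimation weights to be found.

Step 2-2-2) applying the same procedure as in step 2-1) gives the weights for estimating the coherent sound of the three-channel signal:

$$w_1 = \frac{P_S P_{A_2} P_{A_3}}{D_3}, \qquad w_2 = \frac{\beta_2 P_S P_{A_1} P_{A_3}}{D_3}, \qquad w_3 = \frac{\beta_3 P_S P_{A_1} P_{A_2}}{D_3}$$

with the common denominator

$$D_3 = P_{A_1} P_{A_2} P_{A_3} + P_S \left(P_{A_2} P_{A_3} + \beta_2^2 P_{A_1} P_{A_3} + \beta_3^2 P_{A_1} P_{A_2}\right)$$

wherein $P_S$ represents the short-time energy of the coherent sound, and $P_{A_1}$, $P_{A_2}$ and $P_{A_3}$ represent the short-time energies of the ambient sounds of the three channels, respectively.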

Step 2-3) for a multichannel signal with N channels, calculating the weight of each channel for estimating the coherent sound;

for a multichannel signal with N channels, the estimated coherent sound is expressed as:

$$\hat{S} = \sum_{n=1}^{N} w_n X_n$$

wherein the weights can be expressed as:

$$w_n = \frac{\beta_n P_S \prod_{m \ne n} P_{A_m}}{\prod_{m=1}^{N} P_{A_m} + P_S \sum_{m=1}^{N} \beta_m^2 \prod_{k \ne m} P_{A_k}}$$

wherein $P_S$ represents the short-time energy of the coherent sound, and $P_{A_1}, \dots, P_{A_N}$ represent the short-time energies of the ambient sounds of the N channels, respectively.
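For N channels, the least-squares weights can be written without forming or inverting an N×N matrix. The sketch below uses the form w_n = (β_n P_S / P_{A_n}) / (1 + P_S Σ_m β_m² / P_{A_m}), which follows from the orthogonality conditions under the model assumptions, and checks it against a direct solution of the normal equations. Function names and test values are hypothetical, and strictly positive ambient energies are assumed.

```python
import numpy as np

def ls_weights(P_S, beta, P_A):
    """Closed-form least-squares weights for S_hat = sum_n w_n X_n.

    Assumes real beta (beta[0] = 1) and strictly positive P_A.
    """
    beta = np.asarray(beta, dtype=float)
    P_A = np.asarray(P_A, dtype=float)
    ratio = beta * P_S / P_A
    return ratio / (1.0 + np.sum(beta * ratio))

def ls_weights_numeric(P_S, beta, P_A):
    """Reference solution: solve the N x N normal equations directly."""
    beta = np.asarray(beta, dtype=float)
    R = P_S * np.outer(beta, beta) + np.diag(P_A)   # E{X_n conj(X_m)}
    p = P_S * beta                                  # E{S conj(X_m)}
    return np.linalg.solve(R, p)

beta = [1.0, 1.0, 2.0, 0.5, 0.5]
P_A = [0.5, 0.4, 0.3, 0.2, 0.1]
w_closed = ls_weights(1.3, beta, P_A)
w_numeric = ls_weights_numeric(1.3, beta, P_A)
```

The closed form costs O(N) per frequency bin, which is what makes the approach practical for signals with many channels.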

Step 3) calculating and estimating each unknown parameter in the weight of the coherent sound, and completing the extraction of the coherent sound and the environmental sound of the multichannel signal, wherein the method specifically comprises the following steps:

Step 3-1) since the coherent sounds of the channels are completely correlated, the coherent sound is uncorrelated with the ambient sound of every channel, and the ambient sounds of different channels are mutually uncorrelated, the signal energy of each channel can be expressed as:

$$P_{X_n} = \beta_n^2 P_S + P_{A_n}$$

wherein $P_S$ represents the short-time energy of the coherent sound and $P_{A_n}$ represents the short-time energy of the ambient sound of the nth channel.

The correlation value between two different channels is:

$$r_{n_1 n_2} = \beta_{n_1} \beta_{n_2} P_S, \qquad n_1 \ne n_2$$

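Step 3-2) defining the proportion of coherent sound in each channel as $\eta_n$ and calculating $\eta_n$ from the inter-channel correlation values, which comprises:

Step 3-2-1) grouping the N channels in pairs and calculating their correlation values $r_{n_1 n_2}$. According to the definition of $\eta_n$:

$$\eta_n = \frac{\beta_n^2 P_S}{P_{X_n}}$$

wherein $P_S$ is the short-time energy of the coherent sound and $P_{X_n}$ is the signal energy of the nth channel, the following relationship is obtained:

$$\eta_{n_1} \eta_{n_2} = \frac{|r_{n_1 n_2}|^2}{P_{X_{n_1}} P_{X_{n_2}}}$$

Taking the logarithm of both sides gives:

$$\ln \eta_{n_1} + \ln \eta_{n_2} = \ln \frac{|r_{n_1 n_2}|^2}{P_{X_{n_1}} P_{X_{n_2}}}$$

Step 3-2-2) an N-channel signal has $N(N-1)/2$ different cross-correlation values, so solving for the N unknown proportions is a determined problem when N = 3 and an overdetermined problem when N > 3. Therefore, when N > 3, N groups of highly reliable cross-correlation values are selected to solve for the N unknown coherent-sound proportions.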
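The proportions η_n can be recovered from pairwise correlations by solving a linear system in ln η_n. The sketch below assumes the relation η_i η_j = |r_ij|² / (P_{X_i} P_{X_j}), which follows from the signal model. Note the deliberate substitution: instead of selecting N reliable pairs as the text describes (the selection rule is not spelled out), it uses all N(N-1)/2 pairs in a least-squares sense. The function name `coherent_ratios` and the test values are hypothetical, and N ≥ 3 is required.

```python
import numpy as np
from itertools import combinations

def coherent_ratios(P_X, R, floor=1e-12):
    """Estimate eta_n from channel energies P_X (N,) and correlations R (N, N).

    Assumption: all pairs are used via least squares, not a selected subset.
    """
    P_X = np.asarray(P_X, dtype=float)
    R = np.asarray(R)
    n_ch = P_X.size
    pairs = list(combinations(range(n_ch), 2))
    M = np.zeros((len(pairs), n_ch))
    rhs = np.zeros(len(pairs))
    for row, (i, j) in enumerate(pairs):
        # One equation per pair: ln eta_i + ln eta_j = ln(|r_ij|^2 / (P_i P_j)).
        M[row, i] = M[row, j] = 1.0
        ratio = np.abs(R[i, j]) ** 2 / (P_X[i] * P_X[j])
        rhs[row] = np.log(max(ratio, floor))
    log_eta, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return np.clip(np.exp(log_eta), 0.0, 1.0)

# Exact model statistics for a hypothetical five-channel signal.
beta = np.array([1.0, 1.0, 2.0, 0.5, 0.5])
P_A = np.array([0.5, 0.4, 0.3, 0.2, 0.1])
P_S = 1.0
P_X = beta ** 2 * P_S + P_A
R = P_S * np.outer(beta, beta) + np.diag(P_A)
eta = coherent_ratios(P_X, R)
expected = beta ** 2 * P_S / P_X
```

With exact statistics the system is consistent and every pair gives the same answer; with estimated statistics, restricting the system to the most reliable pairs, as the text proposes, is one way to limit the influence of noisy correlation estimates.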

Step 3-3) for the first channel, $\beta_1 = 1$ is known, therefore:

$$P_S = \eta_1 P_{X_1}, \qquad P_{A_1} = P_{X_1} - P_S$$

For the other channels, the signal energy of each channel and the inter-channel correlation values give:

$$\beta_n = \frac{r_{1n}}{P_S}, \qquad P_{A_n} = P_{X_n} - \beta_n^2 P_S, \qquad n = 2, 3, \dots, N$$

Step 3-4) substituting all the parameters obtained in step 3-3) into the weight expression of step 2-3), so that the coherent sound $S$ of the first channel can be estimated.
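The parameter computation of step 3-3) can be sketched as follows. The relations used here are assumptions consistent with the signal model rather than formulas quoted from the text: P_S = η₁ P_{X_1}, β_n = r_{1n} / P_S, and P_{A_n} = P_{X_n} - β_n² P_S. The function name `model_parameters` and the numeric values are hypothetical.

```python
import numpy as np

def model_parameters(eta1, P_X, r_1n):
    """Recover P_S, beta_n and P_A_n.

    eta1 : coherent-sound proportion of channel 1
    P_X  : (N,) channel energies
    r_1n : (N,) correlation of channel 1 with channel n (entry 0 is ignored)
    """
    P_X = np.asarray(P_X, dtype=float)
    P_S = eta1 * P_X[0]                       # beta_1 = 1
    beta = np.real(np.asarray(r_1n)) / P_S    # from r_1n = beta_n * P_S
    beta[0] = 1.0
    P_A = P_X - beta ** 2 * P_S               # remaining energy is ambient
    return P_S, beta, P_A

# Exact statistics for a hypothetical three-channel signal.
true_beta = np.array([1.0, 0.5, 2.0])
true_P_A = np.array([0.3, 0.4, 0.5])
true_P_S = 2.0
P_X = true_beta ** 2 * true_P_S + true_P_A
r_1n = true_beta * true_P_S
P_S, beta, P_A = model_parameters(true_P_S / P_X[0], P_X, r_1n)
```

With estimated rather than exact statistics, P_A can come out slightly negative, so a practical implementation would likely clamp it to a small positive value before computing the weights.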

Step 4) performing PAE on a multichannel signal with any number of channels, which specifically comprises:

Step 4-1) calculating the coherent sound of each channel:

Because step 2) gives the weight expression for estimating the coherent sound $\hat{S}$ when performing PAE on a multichannel signal with any number of channels, and step 3) calculates each unknown parameter in that weight expression, the coherent sound $S$ can be estimated directly from the weight expression once the number of channels is fixed. The estimate is itself the coherent sound of the first channel; the coherent sound of each other channel is obtained by linear scaling of $S$, namely $\beta_n S$ ($n = 2, \dots, N$).

Step 4-2) calculating the ambient sound of each channel:

the remaining component of each channel is regarded as ambient sound, that is, $A_n = X_n - \beta_n S$.

Step 4-3) performing an inverse Fourier transform on the resulting N channels of coherent sound and N channels of ambient sound to obtain the coherent sound and the ambient sound in the time domain.
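Steps 1) to 4) can be strung together in a compact end-to-end sketch. Several simplifications are assumptions of this example and not part of the patent: frames are non-overlapping with a rectangular window, the short-time average runs over all frames of the signal, η_n is solved from all channel pairs by least squares instead of N selected pairs, and β_n is taken as Re(r_{1n}) / P_S. The function name and the test signal are hypothetical, and at least three channels are required.

```python
import numpy as np
from itertools import combinations

def extract_coherent_ambient(x, frame_len=64, eps=1e-12):
    """x: real array (N, T). Returns time-domain (coherent, ambient),
    each of shape (N, T'), where T' is T truncated to whole frames."""
    n_ch = x.shape[0]
    n_frames = x.shape[1] // frame_len
    x = x[:, :n_frames * frame_len]
    X = np.fft.rfft(x.reshape(n_ch, n_frames, frame_len), axis=-1)

    # Per-bin statistics averaged over frames: R[i, j, k] = mean X_i conj(X_j).
    R = np.einsum('itk,jtk->ijk', X, X.conj()) / n_frames
    P_X = np.real(np.einsum('iik->ik', R)) + eps

    # Solve ln eta_i + ln eta_j = ln(|r_ij|^2 / (P_i P_j)) for every bin.
    pairs = list(combinations(range(n_ch), 2))
    M = np.zeros((len(pairs), n_ch))
    for row, (i, j) in enumerate(pairs):
        M[row, i] = M[row, j] = 1.0
    rhs = np.stack([np.log(np.maximum(np.abs(R[i, j]) ** 2 / (P_X[i] * P_X[j]), eps))
                    for i, j in pairs])
    eta = np.clip(np.exp(np.linalg.lstsq(M, rhs, rcond=None)[0]), eps, 1.0 - eps)

    # Model parameters, with beta_1 = 1 and ambient energy kept positive.
    P_S = eta[0] * P_X[0]
    beta = np.real(R[0]) / P_S
    beta[0] = 1.0
    P_A = np.maximum(P_X - beta ** 2 * P_S, eps * P_X)

    # Least-squares weights, coherent estimate, per-channel components.
    ratio = beta * P_S / P_A
    w = ratio / (1.0 + np.sum(beta * ratio, axis=0))
    S_hat = np.einsum('ik,itk->tk', w, X)
    S_n = beta[:, None, :] * S_hat[None, :, :]
    A_n = X - S_n

    def to_time(Z):
        return np.fft.irfft(Z, n=frame_len, axis=-1).reshape(n_ch, -1)

    return to_time(S_n), to_time(A_n)

# Hypothetical five-channel test signal: shared source plus independent noise.
rng = np.random.default_rng(0)
n_samples = 64 * 200
s = rng.standard_normal(n_samples)
betas = np.array([1.0, 1.0, 2.0, 0.5, 0.5])
x = betas[:, None] * s + 0.5 * rng.standard_normal((5, n_samples))
coherent, ambient = extract_coherent_ambient(x)
```

Because the ambient part is defined as the residual X_n - S_n and the transform is linear, the two outputs always sum back to the input; how well `coherent` matches the true shared source depends on the quality of the estimated statistics.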

The following describes the performance of the method proposed by the present invention with reference to the simulation example:

Completely correlated coherent sound and completely uncorrelated ambient sound are synthesized in a given proportion into a mixed five-channel signal, and the components are extracted using both the multichannel PAE method proposed by the invention and the pairwise correlation method. Two mixed multichannel signals are synthesized: mixed five-channel signal 1, with clean speech as the coherent sound and sea-wave sound as the ambient sound, and mixed five-channel signal 2, with pure music as the coherent sound and forest background sound as the ambient sound. During mixing, the distribution of coherent-sound energy among the channels is controlled by setting each inter-channel coherent-sound amplitude difference factor $\beta_n$ in a fixed proportion to its reference value $\beta_0$; the distribution of ambient-sound energy among the channels is controlled by setting the ambient-sound energy of each channel in a fixed proportion to its reference value; and the proportion of coherent sound in the mixed signal is controlled by setting different coherent-sound energy ratios $\gamma$. The reference value $\beta_0$ is determined by $\gamma$.

In this experimental setup, the coherent-sound amplitude factors of the channels are $\beta_1 = \beta_2 = \beta_0$, $\beta_3 = 2\beta_0$ and $\beta_4 = \beta_5 = 0.5\beta_0$, and the ambient-sound energies of the channels satisfy

The coherent-sound energy ratio $\gamma$ ranges from 0.05 to 0.95 in steps of 0.1. The extraction error $\varepsilon_P$ of the coherent sound is expressed as:

The extraction error $\varepsilon_a$ of the ambient sound is expressed as:

FIG. 2(a) and FIG. 2(b) show the extraction errors of the coherent sound and the ambient sound, respectively, when the algorithm proposed by the present invention and the pairwise correlation method perform PAE on mixed five-channel signal 1; FIG. 3(a) and FIG. 3(b) show the corresponding extraction errors for mixed five-channel signal 2. Over the whole range of the coherent-sound energy ratio $\gamma$ from 0.05 to 0.95 (in steps of 0.1), the extraction errors of the proposed algorithm are smaller than those of the pairwise correlation method.

Example 2

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method of coherent acoustic and ambient acoustic extraction of a multichannel signal, the method comprising:

carrying out inverse Fourier transform on the N channels of coherent sound and the N channels of environment sound to obtain coherent sound and environment sound represented by a time domain;

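calculating the weight expression of the coherent sound of the N-channel signal, estimating the coherent sound according to the weight expression, and thereby calculating the coherent sound of each channel, which specifically comprises the following steps:

performing a Fourier transform on the time-domain multichannel signal, the input signal $X_n$ of the nth channel being expressed as:

$$X_n = \beta_n S + A_n$$

wherein $S$ represents the spectrum of the coherent sound, $\beta_n$ represents the amplitude difference factor between the coherent sound of the nth channel and the coherent sound of the first channel, $1 \le n \le N$, $\beta_1 = 1$, and $A_n$ represents the spectrum of the ambient sound of the nth channel;

calculating the short-time energy $P_{X_n}$ of the nth channel input signal $X_n$:

$$P_{X_n} = E\{X_n X_n^{*}\}$$

wherein $E\{\cdot\}$ represents a short-time average and $^{*}$ denotes complex conjugation;

calculating the correlation between any two channels:

$$r_{n_1 n_2} = E\{X_{n_1} X_{n_2}^{*}\}$$

wherein $r_{n_1 n_2}$ is the correlation value between the $n_1$th channel and the $n_2$th channel, $n_1 = 1, 2, \dots, N$, $n_2 = 1, 2, \dots, N$, $n_1 \ne n_2$; there are $N(N-1)/2$ different cross-correlation values in total;

using the relation

$$\ln \eta_{n_1} + \ln \eta_{n_2} = \ln \frac{|r_{n_1 n_2}|^2}{P_{X_{n_1}} P_{X_{n_2}}}$$

selecting N groups of cross-correlation values to jointly solve for the proportion $\eta_n$ of coherent sound in each channel;

for the first channel, $\beta_1 = 1$ is known, therefore:

$$P_S = \eta_1 P_{X_1}, \qquad P_{A_1} = P_{X_1} - P_S$$

wherein $P_S$ represents the short-time energy of the coherent sound and $P_{A_1}$ represents the short-time energy of the ambient sound of the first channel;

for the other channels, based on the short-time energy $P_{X_n}$ of the input signal $X_n$ and the inter-channel correlation values:

$$\beta_n = \frac{r_{1n}}{P_S}, \qquad P_{A_n} = P_{X_n} - \beta_n^2 P_S$$

wherein $P_{A_n}$ represents the short-time energy of the ambient sound of the nth channel, $2 \le n \le N$;

calculating the weight $w_n$ of the nth channel:

$$w_n = \frac{\beta_n P_S \prod_{m \ne n} P_{A_m}}{\prod_{m=1}^{N} P_{A_m} + P_S \sum_{m=1}^{N} \beta_m^2 \prod_{k \ne m} P_{A_k}}$$

then the estimate $\hat{S}$ of the coherent sound is:

$$\hat{S} = \sum_{n=1}^{N} w_n X_n$$

and the coherent sound $S_n$ of the nth channel is:

$$S_n = \beta_n \hat{S}$$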

2. The method according to claim 1, wherein the method calculates the ambient sound of each channel from the coherent sound of each channel; the method specifically comprises the following steps:

ambient sound of nth channel A_nComprises the following steps:

A_n＝X_n-S_n。

3. a coherent acoustic and ambient acoustic extraction system for a multichannel signal, the system comprising:

the frequency domain to time domain conversion module is used for carrying out inverse Fourier transform on the N channels of coherent sound and the N channels of environment sound to obtain coherent sound and environment sound represented by a time domain;

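the specific implementation process of the coherent sound extraction module is as follows:

performing a Fourier transform on the time-domain multichannel signal, the input signal $X_n$ of the nth channel being expressed as:

$$X_n = \beta_n S + A_n$$

wherein $S$ represents the spectrum of the coherent sound, $\beta_n$ represents the amplitude difference factor between the coherent sound of the nth channel and the coherent sound of the first channel, $1 \le n \le N$, $\beta_1 = 1$, and $A_n$ represents the spectrum of the ambient sound of the nth channel;

calculating the short-time energy $P_{X_n}$ of the nth channel input signal $X_n$:

$$P_{X_n} = E\{X_n X_n^{*}\}$$

wherein $E\{\cdot\}$ represents a short-time average and $^{*}$ denotes complex conjugation;

calculating the correlation between any two channels:

$$r_{n_1 n_2} = E\{X_{n_1} X_{n_2}^{*}\}$$

wherein $r_{n_1 n_2}$ is the correlation value between the $n_1$th channel and the $n_2$th channel, $n_1 = 1, 2, \dots, N$, $n_2 = 1, 2, \dots, N$, $n_1 \ne n_2$; there are $N(N-1)/2$ different cross-correlation values in total;

using the relation

$$\ln \eta_{n_1} + \ln \eta_{n_2} = \ln \frac{|r_{n_1 n_2}|^2}{P_{X_{n_1}} P_{X_{n_2}}}$$

selecting N groups of cross-correlation values to jointly solve for the proportion $\eta_n$ of coherent sound in each channel;

for the first channel, $\beta_1 = 1$ is known, therefore:

$$P_S = \eta_1 P_{X_1}, \qquad P_{A_1} = P_{X_1} - P_S$$

wherein $P_S$ represents the short-time energy of the coherent sound and $P_{A_1}$ represents the short-time energy of the ambient sound of the first channel;

for the other channels, based on the short-time energy $P_{X_n}$ of the input signal $X_n$ and the inter-channel correlation values:

$$\beta_n = \frac{r_{1n}}{P_S}, \qquad P_{A_n} = P_{X_n} - \beta_n^2 P_S$$

wherein $P_{A_n}$ represents the short-time energy of the ambient sound of the nth channel, $2 \le n \le N$;

calculating the weight $w_n$ of the nth channel:

$$w_n = \frac{\beta_n P_S \prod_{m \ne n} P_{A_m}}{\prod_{m=1}^{N} P_{A_m} + P_S \sum_{m=1}^{N} \beta_m^2 \prod_{k \ne m} P_{A_k}}$$

then the estimate $\hat{S}$ of the coherent sound is:

$$\hat{S} = \sum_{n=1}^{N} w_n X_n$$

and the coherent sound $S_n$ of the nth channel is:

$$S_n = \beta_n \hat{S}$$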

4. The system for extracting coherent sound and environmental sound of a multi-channel signal according to claim 3, wherein the environmental sound extraction module is implemented by:

ambient sound of nth channel A_nComprises the following steps:

A_n＝X_n-S_n。