CN111711918B - Coherent sound and environmental sound extraction method and system of multichannel signal - Google Patents

Publication number
CN111711918B
Authority
CN
China
Prior art keywords
sound
channel
coherent
channels
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010448458.9A
Other languages
Chinese (zh)
Other versions
CN111711918A (en)
Inventor
吴彦琴
桑晋秋
郑成诗
张芳杰
李晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS
Priority to CN202010448458.9A
Publication of CN111711918A
Application granted
Publication of CN111711918B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a method and a system for extracting the coherent sound and the environmental sound of a multichannel signal. The method comprises the following steps: calculating the weight expression for the coherent sound of an N-channel signal, estimating the coherent sound according to the weight expression, and thereby calculating the coherent sound of each channel; calculating the environmental sound of each channel from the coherent sound of each channel; and performing an inverse Fourier transform on the N channels of coherent sound and the N channels of environmental sound to obtain the coherent sound and the environmental sound in the time domain. The method can extract the coherent sound and the environmental sound regardless of whether the coherent sound energy proportions of the channels are equal and regardless of whether the environmental sound energies of the channels are equal, with small extraction error and high precision.

Description

Coherent sound and environmental sound extraction method and system of multichannel signal
Technical Field
The invention relates to the field of spatial sound reproduction, in particular to a method and a system for extracting coherent sound and environmental sound of a multi-channel signal.
Background
Spatial sound reproduction must not only meet certain requirements on sound source localization and sound image width, but also provide a good sense of space and immersion. Spatial sound mainly comprises two components: coherent sound, which is directional, and ambient sound, which is diffuse. Because coherent sound and ambient sound have different characteristics and are perceived differently, achieving a better spatial sound reproduction effect requires primary-ambient extraction (PAE), that is, extracting the coherent sound and the ambient sound so that they can be processed separately.
PAE can be combined with spatial audio coding systems such as spatial audio scene coding and directional audio coding, and has become one of the key technologies of spatial sound reproduction systems. In general, PAE, used as a front end for audio encoding or decoding, enables flexible, efficient, and immersive spatial sound playback. First, PAE separates the coherent sound from the ambient sound in the spatial sound scene, which makes the audio format used for playback independent of the original audio format and increases the flexibility of spatial sound reproduction. Second, compared with object-based audio formats, a PAE-based sound reproduction system can reproduce a sound scene with a good sense of space without separating individual sound source objects, which preserves the efficiency of spatial sound reproduction. Finally, PAE separates the two important components of the sound scene, the coherent sound component and the ambient sound component, so that each can be processed separately to improve the listening experience when the sound scene is reconstructed.
PAE may be implemented by principal component analysis (PCA). Using the correlation between channels, PCA identifies the eigenvector corresponding to the largest eigenvalue of the covariance matrix of the input signal as the coherent sound vector, normalizes it to a unit vector, and projects the input signal onto that unit vector to obtain the coherent sound of each channel. PCA presupposes that the coherent sound dominates the signal energy, so the extraction error increases when the coherent sound energy is small. In addition, when the number of channels is large, the eigenvector corresponding to the largest eigenvalue of the covariance matrix is not easy to compute. Besides PCA, another method widely used in PAE is the least-squares (LS) method. Because computing the estimation weights is expensive when the LS method is used to estimate the coherent sound, and becomes intractable when the number of channels is large, the LS method is currently used only for PAE of stereo signals. The pairwise correlation method is a PAE method designed specifically for multichannel signals: the channels are paired two by two, a linear relation between the coherent sound energy proportions of the channels and the inter-channel correlation values is established, and the coherent sound energy proportion of each channel is solved from the inter-channel correlation values, which completes the PAE of the multichannel signal. However, this method uses only the magnitude of the correlation values, so the accuracy of the extracted coherent sound is not high.
Disclosure of Invention
The invention aims to overcome the above technical defects and provides a method for extracting the coherent sound and the environmental sound of a multichannel signal. In this method, the weights of the coherent sound are first estimated by the least-squares method for small numbers of channels, and the regularity with which the weights change with the number of channels is then used to obtain a weight expression for coherent sound estimation on a multichannel signal with any number of channels. In addition, the method uses the signal energy of each channel and the correlation values between channels to calculate each unknown parameter in the weight expression, thereby realizing PAE of the multichannel signal.
To achieve the above object, embodiment 1 of the present invention provides a coherent sound and ambient sound extraction method for a multichannel signal, including:
calculating weight expressions of the N channel signal coherent sounds, and estimating the coherent sounds according to the weight expressions, thereby calculating the coherent sounds of each channel;
calculating the environment sound of each channel according to the coherent sound of each channel;
and carrying out inverse Fourier transform on the N channels of coherent sound and the N channels of environment sound to obtain coherent sound and environment sound represented by a time domain.
As an improvement of the above method, calculating the weight expression of the coherent sound of the N-channel signal, estimating the coherent sound according to the weight expression, and thereby calculating the coherent sound of each channel specifically comprises the following steps:
performing a Fourier transform on the time-domain multichannel signal, the input signal $X_n$ of the n-th channel being expressed as:
$$X_n = \beta_n S + A_n$$
wherein $S$ represents the spectrum of the coherent sound, $\beta_n$ represents the amplitude difference factor between the coherent sound of the n-th channel and the coherent sound of the first channel, $1 \le n \le N$, $\beta_1 = 1$, and $A_n$ represents the spectrum of the ambient sound of the n-th channel;
calculating the short-time energy $P_{X_n}$ of the n-th channel input signal $X_n$:
$$P_{X_n} = E\{X_n X_n^*\}$$
calculating the correlation between any two channels:
$$r_{n_1 n_2} = E\{X_{n_1} X_{n_2}^*\}$$
wherein $r_{n_1 n_2}$ is the correlation value between the $n_1$-th channel and the $n_2$-th channel, $n_1 = 1, 2, \dots, N$, $n_2 = 1, 2, \dots, N$, $n_1 \ne n_2$; in total there are $N(N-1)/2$ different cross-correlation values;
using the relation
$$\eta_{n_1} \eta_{n_2} = \frac{|r_{n_1 n_2}|^2}{P_{X_{n_1}} P_{X_{n_2}}}$$
selecting N groups of cross-correlation values and solving them simultaneously for the proportion $\eta_n$ of coherent sound in each channel;
for the first channel, $\beta_1 = 1$ is known, therefore:
$$P_S = \eta_1 P_{X_1}$$
$$P_{A_1} = P_{X_1} - P_S$$
wherein $P_S$ represents the short-time energy of the coherent sound and $P_{A_1}$ represents the short-time energy of the ambient sound of the first channel;
for the other channels, based on the short-time energy $P_{X_n}$ of the input signal $X_n$ and the inter-channel correlation values, obtaining:
$$\beta_n = \frac{r_{1n}}{P_S}$$
$$P_{A_n} = P_{X_n} - \beta_n^2 P_S$$
wherein $P_{A_n}$ represents the short-time energy of the ambient sound of the n-th channel, $n = 2, 3, \dots, N$;
calculating the weight $w_n$ of the n-th channel:
$$w_n = \frac{\beta_n P_S / P_{A_n}}{1 + P_S \sum_{m=1}^{N} \beta_m^2 / P_{A_m}}$$
the estimate $\hat{S}$ of the coherent sound then being:
$$\hat{S} = \sum_{n=1}^{N} w_n X_n$$
and the coherent sound $S_n$ of the n-th channel being:
$$S_n = \beta_n \hat{S}$$
As an improvement of the above method, calculating the ambient sound of each channel from the coherent sound of each channel specifically comprises:
the ambient sound $A_n$ of the n-th channel is:
$$A_n = X_n - S_n$$
Embodiment 2 of the present invention provides a coherent sound and ambient sound extraction system for a multichannel signal, including:
the coherent sound extraction module is used for calculating weight expressions of the coherent sounds of the signals of the N channels, estimating the coherent sounds according to the weight expressions, and calculating the coherent sounds of each channel;
the environment sound extraction module is used for calculating the environment sound of each channel according to the coherent sound of each channel;
and the frequency domain to time domain module is used for carrying out inverse Fourier transform on the N channels of coherent sound and the N channels of environment sound to obtain coherent sound and environment sound represented by time domain.
As an improvement of the above system, the implementation process of the coherent sound extraction module is as follows:
performing a Fourier transform on the time-domain multichannel signal, the input signal $X_n$ of the n-th channel being expressed as:
$$X_n = \beta_n S + A_n$$
wherein $S$ represents the spectrum of the coherent sound, $\beta_n$ represents the amplitude difference factor between the coherent sound of the n-th channel and the coherent sound of the first channel, $1 \le n \le N$, $\beta_1 = 1$, and $A_n$ represents the spectrum of the ambient sound of the n-th channel;
calculating the short-time energy $P_{X_n}$ of the n-th channel input signal $X_n$:
$$P_{X_n} = E\{X_n X_n^*\}$$
calculating the correlation between any two channels:
$$r_{n_1 n_2} = E\{X_{n_1} X_{n_2}^*\}$$
wherein $r_{n_1 n_2}$ is the correlation value between the $n_1$-th channel and the $n_2$-th channel, $n_1 = 1, 2, \dots, N$, $n_2 = 1, 2, \dots, N$, $n_1 \ne n_2$; in total there are $N(N-1)/2$ different cross-correlation values;
using the relation
$$\eta_{n_1} \eta_{n_2} = \frac{|r_{n_1 n_2}|^2}{P_{X_{n_1}} P_{X_{n_2}}}$$
selecting N groups of cross-correlation values and solving them simultaneously for the proportion $\eta_n$ of coherent sound in each channel;
for the first channel, $\beta_1 = 1$ is known, therefore:
$$P_S = \eta_1 P_{X_1}$$
$$P_{A_1} = P_{X_1} - P_S$$
wherein $P_S$ represents the short-time energy of the coherent sound and $P_{A_1}$ represents the short-time energy of the ambient sound of the first channel;
for the other channels, based on the short-time energy $P_{X_n}$ of the input signal $X_n$ and the inter-channel correlation values, obtaining:
$$\beta_n = \frac{r_{1n}}{P_S}$$
$$P_{A_n} = P_{X_n} - \beta_n^2 P_S$$
wherein $P_{A_n}$ represents the short-time energy of the ambient sound of the n-th channel, $n = 2, 3, \dots, N$;
calculating the weight $w_n$ of the n-th channel:
$$w_n = \frac{\beta_n P_S / P_{A_n}}{1 + P_S \sum_{m=1}^{N} \beta_m^2 / P_{A_m}}$$
the estimate $\hat{S}$ of the coherent sound then being:
$$\hat{S} = \sum_{n=1}^{N} w_n X_n$$
and the coherent sound $S_n$ of the n-th channel being:
$$S_n = \beta_n \hat{S}$$
As an improvement of the above system, the specific implementation process of the ambient sound extraction module is as follows:
the ambient sound $A_n$ of the n-th channel is:
$$A_n = X_n - S_n$$
The invention has the following advantages:
The method can extract the coherent sound and the environmental sound regardless of whether the coherent sound energy proportions of the channels are equal and regardless of whether the environmental sound energies of the channels are equal, with small extraction error and high precision.
Drawings
FIG. 1 is a flow chart of the coherent sound and ambient sound extraction method for a multichannel signal of the present invention;
FIG. 2(a) is an error plot of coherent sound component extraction for mixed five-channel signal 1 using the method of the present invention and the pairwise correlation method;
FIG. 2(b) is an error plot of ambient sound component extraction for mixed five-channel signal 1 using the method of the present invention and the pairwise correlation method;
FIG. 3(a) is an error plot of coherent sound component extraction for mixed five-channel signal 2 using the method of the present invention and the pairwise correlation method;
FIG. 3(b) is an error plot of ambient sound component extraction for mixed five-channel signal 2 using the method of the present invention and the pairwise correlation method.
Detailed Description
The technical solution of the present invention will be described in detail below with reference to the accompanying drawings.
Example 1
As shown in fig. 1, embodiment 1 of the present invention proposes a coherent sound and ambient sound extraction method for a multichannel signal, including the following steps:
Step 1) The multichannel signal is divided into frames and Fourier-transformed to obtain its spectrum, and the short-time energy of each channel and the correlation value between any two channels are expressed according to the multichannel signal model, specifically as follows:
In the multichannel signal model, the input signal is represented as the superposition of coherent sound and ambient sound. Because coherent sound and environmental sound have different characteristics, the coherent sounds of the channels are assumed to be completely correlated, that is, linearly related; the coherent sound is assumed to be uncorrelated with the ambient sound of every channel, and the ambient sounds of different channels are assumed to be mutually uncorrelated.
Step 1-1) A Fourier transform is performed on the time-domain multichannel signal to obtain the spectrum:
$$X_n = \beta_n S + A_n, \quad n = 1, 2, \dots, N$$
where $N$ is the number of channels, $S$ represents the spectrum of the coherent sound, $\beta_n$ represents the amplitude difference factor between the coherent sound of the n-th channel and the coherent sound of the first channel, with $\beta_1 = 1$, and $A_n$ represents the spectrum of the ambient sound of the n-th channel;
Step 1-2) The signal energy of each channel can be expressed as:
$$P_{X_n} = E\{X_n X_n^*\}$$
where $E\{\cdot\}$ represents a short-time average.
Step 1-3) The correlation values between channels can be expressed as:
$$r_{n_1 n_2} = E\{X_{n_1} X_{n_2}^*\}$$
where $r_{n_1 n_2}$ is the correlation value between the $n_1$-th channel and the $n_2$-th channel, $n_1 = 1, 2, \dots, N$, $n_2 = 1, 2, \dots, N$, $n_1 \ne n_2$; in total there are $N(N-1)/2$ different cross-correlation values;
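The statistics of steps 1-2) and 1-3) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `short_time_stats`, the synthetic test signal, and the choice to approximate the short-time average $E\{\cdot\}$ by a plain mean over frames of a single frequency bin are all assumptions.

```python
import numpy as np

def short_time_stats(X):
    """X: complex array of shape (N channels, T frames) for one frequency bin.

    Returns the per-channel energies P_X (shape (N,)) and the correlation
    matrix r (shape (N, N)). The short-time average E{.} is approximated
    here by a mean over the T frames (an assumption of this sketch).
    """
    T = X.shape[1]
    r = X @ X.conj().T / T         # r[n1, n2] = E{X_n1 X_n2^*}
    P_X = np.real(np.diag(r))      # P_X[n]   = E{X_n X_n^*}
    return P_X, r

# Synthetic bin following X_n = beta_n * S + A_n (values are illustrative).
rng = np.random.default_rng(0)
N, T = 5, 20000
beta = np.array([1.0, 1.0, 2.0, 0.5, 0.5])
S = rng.standard_normal(T) + 1j * rng.standard_normal(T)            # P_S close to 2
A = 0.5 * (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T)))  # P_A close to 0.5
X = beta[:, None] * S + A

P_X, r = short_time_stats(X)
n_pairs = len(np.triu_indices(N, k=1)[0])   # N(N-1)/2 distinct channel pairs
```

For the synthetic values above, `P_X` should be close to `beta**2 * 2 + 0.5` and the off-diagonal entries of `r` close to `2 * beta[n1] * beta[n2]`, in line with the model's uncorrelatedness assumptions.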
Step 2) The weights for estimating the coherent sound are calculated for two channels and for three channels by the least-squares method, and the regularity of these weights is examined, which gives the weights for estimating the coherent sound of N channels;
Step 2-1) For a two-channel signal, the input signals $X_1$ and $X_2$ are used to estimate the weights of the coherent sound:
Step 2-1-1) The coherent sound is estimated from the two channels as $\hat{S}$:
$$\hat{S} = w_1 X_1 + w_2 X_2$$
where $w_1$ and $w_2$ represent the estimation weights to be found.
Step 2-1-2) The estimation error $\sigma_S$ of $\hat{S}$ is calculated:
$$\sigma_S = \hat{S} - S$$
Step 2-1-3) The weights are solved by the least-squares algorithm, that is, the weights are optimal when the estimation error is completely uncorrelated with the input stereo signal:
$$E\{\sigma_S X_1^*\} = 0$$
$$E\{\sigma_S X_2^*\} = 0$$
The optimal estimation weights are then expressed as:
$$w_1 = \frac{P_S P_{A_2}}{P_{A_1} P_{A_2} + P_S P_{A_2} + \beta_2^2 P_S P_{A_1}}$$
$$w_2 = \frac{\beta_2 P_S P_{A_1}}{P_{A_1} P_{A_2} + P_S P_{A_2} + \beta_2^2 P_S P_{A_1}}$$
where $P_S$ represents the short-time energy of the coherent sound, and $P_{A_1}$ and $P_{A_2}$ respectively represent the short-time energies of the ambient sounds of the two channels.
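As a worked check of step 2-1-3), the two orthogonality conditions can be expanded using only the signal model ($X_1 = S + A_1$, $X_2 = \beta_2 S + A_2$, with real $\beta_2$, and $S$, $A_1$, $A_2$ mutually uncorrelated). The shorthand $D$ for the common denominator is introduced here for illustration and is not a symbol from the source.

```latex
\begin{aligned}
E\{\sigma_S X_1^*\} = 0 \;&\Rightarrow\; w_1\,(P_S + P_{A_1}) + w_2\,\beta_2 P_S = P_S,\\
E\{\sigma_S X_2^*\} = 0 \;&\Rightarrow\; w_1\,\beta_2 P_S + w_2\,(\beta_2^2 P_S + P_{A_2}) = \beta_2 P_S,\\[4pt]
\text{so that}\quad w_1 &= \frac{P_S P_{A_2}}{D},\qquad
w_2 = \frac{\beta_2 P_S P_{A_1}}{D},\\
D &= P_{A_1} P_{A_2} + P_S P_{A_2} + \beta_2^2 P_S P_{A_1}.
\end{aligned}
```

Substituting $w_1$ and $w_2$ back into the first equation gives $P_S\,[P_{A_2}(P_S + P_{A_1}) + \beta_2^2 P_S P_{A_1}]/D = P_S$, which confirms the solution; the second equation checks in the same way.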
Step 2-2) For a three-channel signal, the weights with which the input signals $X_1$, $X_2$ and $X_3$ estimate the coherent sound $\hat{S}$ are calculated:
Step 2-2-1) The coherent sound is estimated as $\hat{S}$:
$$\hat{S} = w_1 X_1 + w_2 X_2 + w_3 X_3$$
where $w_1$, $w_2$ and $w_3$ represent the estimation weights to be found.
Step 2-2-2) Using a processing method similar to step 2-1), the weights for estimating the coherent sound from the three-channel signal are obtained:
$$w_1 = \frac{P_S P_{A_2} P_{A_3}}{P_{A_1} P_{A_2} P_{A_3} + P_S \left( P_{A_2} P_{A_3} + \beta_2^2 P_{A_1} P_{A_3} + \beta_3^2 P_{A_1} P_{A_2} \right)}$$
$$w_2 = \frac{\beta_2 P_S P_{A_1} P_{A_3}}{P_{A_1} P_{A_2} P_{A_3} + P_S \left( P_{A_2} P_{A_3} + \beta_2^2 P_{A_1} P_{A_3} + \beta_3^2 P_{A_1} P_{A_2} \right)}$$
$$w_3 = \frac{\beta_3 P_S P_{A_1} P_{A_2}}{P_{A_1} P_{A_2} P_{A_3} + P_S \left( P_{A_2} P_{A_3} + \beta_2^2 P_{A_1} P_{A_3} + \beta_3^2 P_{A_1} P_{A_2} \right)}$$
where $P_S$ represents the short-time energy of the coherent sound, and $P_{A_1}$, $P_{A_2}$ and $P_{A_3}$ respectively represent the short-time energies of the ambient sounds of the three channels.
Step 2-3) For a multichannel signal with N channels, the weight with which each channel estimates the coherent sound is calculated;
For a multichannel signal with N channels, the estimated coherent sound is represented as:
$$\hat{S} = \sum_{n=1}^{N} w_n X_n$$
where the weights can be expressed as:
$$w_n = \frac{\beta_n P_S / P_{A_n}}{1 + P_S \sum_{m=1}^{N} \beta_m^2 / P_{A_m}}$$
where $P_S$ represents the short-time energy of the coherent sound, and $P_{A_1}, P_{A_2}, \dots, P_{A_N}$ respectively represent the short-time energies of the ambient sounds of the N channels.
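The N-channel weights can be checked numerically. The sketch below evaluates one closed form that satisfies the least-squares orthogonality condition under the signal model, $w_n = (\beta_n P_S / P_{A_n}) / (1 + P_S \sum_m \beta_m^2 / P_{A_m})$, and compares it with a direct solution of the normal equations. The function name `ls_weights` and the parameter values are illustrative assumptions, and the closed form is a reconstruction from the model rather than a quotation of the source's equation image.

```python
import numpy as np

def ls_weights(beta, P_S, P_A):
    """Least-squares weights for estimating S from X_n = beta_n * S + A_n.

    beta: amplitude factors (N,), P_S: coherent-sound energy (scalar),
    P_A: ambient energies per channel (N,).
    """
    beta = np.asarray(beta, dtype=float)
    P_A = np.asarray(P_A, dtype=float)
    g = beta / P_A
    return P_S * g / (1.0 + P_S * np.sum(beta * g))

# Illustrative parameter values (not taken from the patent).
beta = np.array([1.0, 1.0, 2.0, 0.5, 0.5])
P_A = np.array([0.5, 1.0, 0.8, 0.3, 0.6])
P_S = 2.0
w = ls_weights(beta, P_S, P_A)

# Cross-check: the orthogonality condition E{(S_hat - S) X_n^*} = 0 is the
# linear system R w = p, with R = E{X X^H} and p = E{S X^*} under the model.
R = P_S * np.outer(beta, beta) + np.diag(P_A)
p = P_S * beta
w_ref = np.linalg.solve(R, p)
```

Because the closed form costs only O(N) operations per frequency bin, no matrix inversion is needed as the channel count grows, which is the practical point of generalizing from the two- and three-channel cases.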
Step 3) Each unknown parameter in the weights for estimating the coherent sound is calculated, which completes the extraction of the coherent sound and the environmental sound of the multichannel signal, specifically as follows:
Step 3-1) Since the coherent sounds of the channels are completely correlated, the coherent sound is uncorrelated with the ambient sound of every channel, and the ambient sounds of different channels are mutually uncorrelated, the signal energy of each channel can be expressed as:
$$P_{X_n} = \beta_n^2 P_S + P_{A_n}$$
where $P_S$ represents the short-time energy of the coherent sound and $P_{A_n}$ represents the short-time energy of the ambient sound of the n-th channel.
The correlation value between two different channels is:
$$r_{n_1 n_2} = \beta_{n_1} \beta_{n_2} P_S$$
Step 3-2) The proportion of coherent sound in each channel is defined as $\eta_n$, and $\eta_n$ is calculated from the correlation values between channels, as follows:
Step 3-2-1) The N channels are grouped in pairs and their correlation values $r_{n_1 n_2}$ are calculated. According to the definition of $\eta_n$:
$$\eta_n = \frac{\beta_n^2 P_S}{P_{X_n}}$$
the following relation is obtained:
$$\frac{|r_{n_1 n_2}|^2}{P_{X_{n_1}} P_{X_{n_2}}} = \eta_{n_1} \eta_{n_2}$$
Taking the logarithm of both sides gives:
$$\ln \eta_{n_1} + \ln \eta_{n_2} = \ln \frac{|r_{n_1 n_2}|^2}{P_{X_{n_1}} P_{X_{n_2}}}$$
Step 3-2-2) An N-channel signal has $N(N-1)/2$ different cross-correlation values, so the problem is exactly determined when N = 3 (three equations in three unknowns) and overdetermined when N > 3. Therefore, when N > 3, the N groups of cross-correlation values with the highest reliability are selected to solve for the N unknown proportions of coherent sound in the channels.
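The log-linear system of step 3-2) can be sketched as below. Two points are assumptions of this sketch rather than the source's procedure: the function name `estimate_eta`, and the default of solving all $N(N-1)/2$ pair equations in the least-squares sense when no subset is given. The source instead selects N reliable pairs; that case is illustrated with a hypothetical ring of N pairs, which is solvable here because N = 5 is odd.

```python
import numpy as np
from itertools import combinations

def estimate_eta(P_X, r, pairs=None):
    """Solve ln(eta_i) + ln(eta_j) = ln(|r_ij|^2 / (P_X_i * P_X_j)) for eta.

    pairs: list of channel index pairs (i, j). If None, all N(N-1)/2 pairs
    are used and solved by least squares (an assumption of this sketch).
    """
    N = len(P_X)
    if pairs is None:
        pairs = list(combinations(range(N), 2))
    M = np.zeros((len(pairs), N))
    b = np.zeros(len(pairs))
    for k, (i, j) in enumerate(pairs):
        M[k, i] = M[k, j] = 1.0
        b[k] = np.log(np.abs(r[i, j]) ** 2 / (P_X[i] * P_X[j]))
    log_eta, *_ = np.linalg.lstsq(M, b, rcond=None)
    return np.exp(log_eta)

# Exact model statistics for illustrative parameter values.
beta = np.array([1.0, 0.8, 1.2, 0.6, 0.9])
P_S, P_A = 2.0, np.array([0.5, 1.0, 0.8, 0.3, 0.6])
P_X = beta**2 * P_S + P_A
r = P_S * np.outer(beta, beta)          # only off-diagonal entries are used
eta_true = beta**2 * P_S / P_X

eta_all = estimate_eta(P_X, r)                          # all 10 pairs
ring = [(n, (n + 1) % 5) for n in range(5)]             # N = 5 selected pairs
eta_ring = estimate_eta(P_X, r, pairs=ring)
```

With exact statistics both variants recover `eta_true`. With estimated statistics the choice of pairs matters, which is presumably why the source restricts itself to the most reliable correlation values.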
Step 3-3) For the first channel, $\beta_1 = 1$ is known, therefore:
$$P_S = \eta_1 P_{X_1}$$
$$P_{A_1} = P_{X_1} - P_S$$
For the other channels, according to the signal energy of each channel and the correlation values between channels (from step 3-1, $r_{1n} = \beta_n P_S$ and $P_{X_n} = \beta_n^2 P_S + P_{A_n}$), the following can be obtained:
$$\beta_n = \frac{r_{1n}}{P_S}$$
$$P_{A_n} = P_{X_n} - \beta_n^2 P_S$$
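Step 3-3) can be sketched as a small helper. The function name `model_parameters` is hypothetical, and the expression $\beta_n = r_{1n} / P_S$ is an assumption of this sketch: it follows from the model relation $r_{1n} = \beta_n P_S$ for real $\beta_n$, but the source's own equation is not legible here, so its exact form may differ.

```python
import numpy as np

def model_parameters(P_X, r, eta_1):
    """Recover P_S, beta_n and P_A_n from channel energies and correlations.

    P_X: channel energies (N,), r: correlation matrix (N, N),
    eta_1: proportion of coherent sound in the first channel.
    """
    P_X = np.asarray(P_X, dtype=float)
    P_S = eta_1 * P_X[0]               # beta_1 = 1, so P_S = eta_1 * P_X_1
    beta = np.real(r[0, :]) / P_S      # assumed form: beta_n = r_1n / P_S
    beta[0] = 1.0                      # fixed by the model
    P_A = P_X - beta**2 * P_S          # from P_X_n = beta_n^2 P_S + P_A_n
    return P_S, beta, P_A

# Exact model statistics for illustrative parameter values.
beta_true = np.array([1.0, 0.8, 1.2, 0.6, 0.9])
P_S_true, P_A_true = 2.0, np.array([0.5, 1.0, 0.8, 0.3, 0.6])
P_X = beta_true**2 * P_S_true + P_A_true
r = P_S_true * np.outer(beta_true, beta_true)
eta_1 = P_S_true / P_X[0]

P_S, beta, P_A = model_parameters(P_X, r, eta_1)
```

Using the signed value of $r_{1n}$ rather than its magnitude keeps the sign of $\beta_n$, which a magnitude-only approach would lose.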
Step 3-4) Substituting all the parameters obtained in step 3-3) into the weight expression of step 2-3) gives the estimate of the coherent sound $S$, which is the coherent sound of the first channel.
Step 4) PAE is performed on a multichannel signal with any number of channels, specifically as follows:
Step 4-1) The coherent sound of each channel is calculated, specifically as follows:
Because step 2) gives the expression of the estimation weights for PAE of a multichannel signal with any number of channels, and step 3) calculates each unknown parameter in the weight expression, the coherent sound $S$ can be estimated directly from the weight expression once the number of channels of the multichannel signal is determined. The estimated coherent sound is directly the coherent sound of the first channel, and the coherent sound of each other channel is obtained from $S$ by linear scaling, that is, $\beta_n S$ ($n = 2, \dots, N$).
Step 4-2) The environmental sound of each channel is calculated, specifically as follows:
The remaining component of each channel is regarded as its ambient sound, that is, $A_n = X_n - \beta_n S$.
Step 4-3) An inverse Fourier transform is performed on the obtained N channels of coherent sound and N channels of environmental sound to obtain the coherent sound and the environmental sound in the time domain.
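Steps 1) to 4) can be put together in a compact end-to-end sketch. Everything below is a sketch under stated assumptions and not the patent's implementation: the function name `pae`, non-overlapping rectangular frames, statistics averaged over all frames of a stationary synthetic signal, all channel pairs solved by least squares instead of N selected pairs, $\beta_n = r_{1n}/P_S$, and a small floor on the ambient energy to keep it positive.

```python
import numpy as np
from itertools import combinations

def pae(x, frame_len=32):
    """x: real array (N, L) with L a multiple of frame_len.
    Returns (coherent, ambient), each of shape (N, L)."""
    N, L = x.shape
    T = L // frame_len
    X = np.fft.rfft(x.reshape(N, T, frame_len), axis=2)       # (N, T, K)
    # Step 1: per-bin energies and correlations, averaged over frames.
    r = np.einsum('ntk,mtk->knm', X, X.conj()) / T            # (K, N, N)
    P_X = np.real(np.einsum('knn->kn', r))                    # (K, N)
    # Step 3-2: log-linear system for eta, solved for all bins at once.
    pairs = list(combinations(range(N), 2))
    M = np.zeros((len(pairs), N))
    for q, (i, j) in enumerate(pairs):
        M[q, i] = M[q, j] = 1.0
    i_idx, j_idx = np.array(pairs).T
    b = np.log(np.abs(r[:, i_idx, j_idx]) ** 2
               / (P_X[:, i_idx] * P_X[:, j_idx]))             # (K, pairs)
    eta = np.exp(np.linalg.lstsq(M, b.T, rcond=None)[0]).T    # (K, N)
    # Step 3-3: model parameters (beta_n = r_1n / P_S is an assumption).
    P_S = eta[:, 0] * P_X[:, 0]                               # (K,)
    beta = np.real(r[:, 0, :]) / P_S[:, None]                 # (K, N)
    beta[:, 0] = 1.0
    P_A = np.maximum(P_X - beta**2 * P_S[:, None], 1e-9 * P_X)  # floor: sketch guard
    # Step 2-3: least-squares weights.
    g = beta / P_A
    w = P_S[:, None] * g / (1.0 + P_S * np.sum(beta * g, axis=1))[:, None]
    # Step 4: coherent estimate, per-channel coherent and ambient spectra.
    S_hat = np.einsum('kn,ntk->tk', w, X)                     # (T, K)
    S_n = beta.T[:, None, :] * S_hat[None, :, :]              # (N, T, K)
    A_n = X - S_n

    def to_time(Y):
        return np.fft.irfft(Y, n=frame_len, axis=2).reshape(N, L)

    return to_time(S_n), to_time(A_n)

# Synthetic five-channel mixture (illustrative values only).
rng = np.random.default_rng(0)
N, L = 5, 32 * 8000
beta_true = np.array([1.0, 0.8, 1.2, 0.6, 0.9])
s = rng.standard_normal(L)
x = beta_true[:, None] * s + np.sqrt(0.5) * rng.standard_normal((N, L))

coh, amb = pae(x)
coh_true = beta_true[:, None] * s
rel_err = np.linalg.norm(coh - coh_true) / np.linalg.norm(coh_true)
```

By construction the two outputs sum back to the input, since $A_n = X_n - S_n$ and the transform is linear. A practical system would more likely use overlapping windowed frames and a recursive short-time average; those are design choices this sketch leaves out.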
The following describes the performance of the method proposed by the present invention with reference to the simulation example:
Fully correlated coherent sound and fully uncorrelated environmental sound are mixed in set proportions into a five-channel signal, and component extraction is performed with the multichannel PAE method proposed by the invention and with the pairwise correlation method. Two mixed multichannel signals are synthesized: mixed five-channel signal 1, with clean speech as the coherent sound and sea-wave sound as the environmental sound, and mixed five-channel signal 2, with pure music as the coherent sound and forest background sound as the environmental sound. During mixing, to control the distribution of coherent sound energy among the channels, the coherent sound amplitude difference factor $\beta_n$ of each channel is set in a fixed ratio to its reference value $\beta_0$; to control the distribution of environmental sound energy among the channels, the environmental sound energy $P_{A_n}$ of each channel is set in a fixed ratio to its reference value; and to control the proportion of the coherent sound component in the mixed signal, different coherent sound energy ratios $\gamma$ are set. The reference value $\beta_0$ is determined by $\gamma$.
In this experiment, the coherent sound amplitudes of the channels are set to $\beta_1 = \beta_2 = \beta_0$, $\beta_3 = 2\beta_0$, $\beta_4 = \beta_5 = 0.5\beta_0$, and the environmental sound energies of the channels are set to
Figure BDA0002506801810000103
The coherent sound energy ratio $\gamma$ ranges from 0.05 to 0.95 in steps of 0.1. The extraction error $\epsilon_P$ of the coherent sound is expressed as:
Figure BDA0002506801810000104
The extraction error $\epsilon_a$ of the environmental sound is expressed as:
Figure BDA0002506801810000105
FIG. 2(a) and FIG. 2(b) show the extraction errors of the coherent sound and the ambient sound, respectively, when the proposed algorithm and the pairwise correlation method perform PAE on mixed five-channel signal 1; FIG. 3(a) and FIG. 3(b) show the corresponding extraction errors for mixed five-channel signal 2. Over the whole range of the coherent sound energy ratio $\gamma$ from 0.05 to 0.95 (in steps of 0.1), the extraction errors of the proposed algorithm are smaller than those of the pairwise correlation method.
Example 2
Embodiment 2 of the present invention provides a coherent sound and ambient sound extraction system for a multichannel signal, including:
the coherent sound extraction module is used for calculating weight expressions of the coherent sounds of the signals of the N channels, estimating the coherent sounds according to the weight expressions, and calculating the coherent sounds of each channel;
the environment sound extraction module is used for calculating the environment sound of each channel according to the coherent sound of each channel;
and the frequency domain to time domain module is used for carrying out inverse Fourier transform on the N channels of coherent sound and the N channels of environment sound to obtain coherent sound and environment sound represented by time domain.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (4)

1. A method for coherent sound and ambient sound extraction of a multichannel signal, the method comprising:
calculating weight expressions of the N channel signal coherent sounds, and estimating the coherent sounds according to the weight expressions, thereby calculating the coherent sounds of each channel;
calculating the environment sound of each channel according to the coherent sound of each channel;
carrying out inverse Fourier transform on the N channels of coherent sound and the N channels of environment sound to obtain coherent sound and environment sound represented by a time domain;
wherein calculating the weight expression of the coherent sound of the N-channel signal, estimating the coherent sound according to the weight expression, and thereby calculating the coherent sound of each channel specifically comprises the following steps:
performing a Fourier transform on the time-domain multichannel signal, the input signal $X_n$ of the n-th channel being expressed as:
$$X_n = \beta_n S + A_n$$
wherein $S$ represents the spectrum of the coherent sound, $\beta_n$ represents the amplitude difference factor between the coherent sound of the n-th channel and the coherent sound of the first channel, $1 \le n \le N$, $\beta_1 = 1$, and $A_n$ represents the spectrum of the ambient sound of the n-th channel;
calculating the short-time energy $P_{X_n}$ of the n-th channel input signal $X_n$:
$$P_{X_n} = E\{X_n X_n^*\}$$
calculating the correlation between any two channels:
$$r_{n_1 n_2} = E\{X_{n_1} X_{n_2}^*\}$$
wherein $r_{n_1 n_2}$ is the correlation value between the $n_1$-th channel and the $n_2$-th channel, $n_1 = 1, 2, \dots, N$, $n_2 = 1, 2, \dots, N$, $n_1 \ne n_2$; in total there are $N(N-1)/2$ different cross-correlation values;
using the relation
$$\eta_{n_1} \eta_{n_2} = \frac{|r_{n_1 n_2}|^2}{P_{X_{n_1}} P_{X_{n_2}}}$$
selecting N groups of cross-correlation values and solving them simultaneously for the proportion $\eta_n$ of coherent sound in each channel;
for the first channel, $\beta_1 = 1$ is known, therefore:
$$P_S = \eta_1 P_{X_1}$$
$$P_{A_1} = P_{X_1} - P_S$$
wherein $P_S$ represents the short-time energy of the coherent sound and $P_{A_1}$ represents the short-time energy of the ambient sound of the first channel;
for the other channels, based on the short-time energy $P_{X_n}$ of the input signal $X_n$ and the inter-channel correlation values, obtaining:
$$\beta_n = \frac{r_{1n}}{P_S}$$
$$P_{A_n} = P_{X_n} - \beta_n^2 P_S$$
wherein $P_{A_n}$ represents the short-time energy of the environment sound of the n-th channel, $n \ge 2$;
calculating the weight $w_n$ of the n-th channel:
$$w_n = \frac{\beta_n P_S / P_{A_n}}{1 + P_S \sum_{m=1}^{N} \beta_m^2 / P_{A_m}}$$
the estimate $\hat{S}$ of the coherent sound then being:
$$\hat{S} = \sum_{n=1}^{N} w_n X_n$$
and the coherent sound $S_n$ of the n-th channel being:
$$S_n = \beta_n \hat{S}$$
2. The method according to claim 1, wherein calculating the ambient sound of each channel from the coherent sound of each channel specifically comprises:
the ambient sound $A_n$ of the n-th channel is:
$$A_n = X_n - S_n$$
3. A coherent sound and ambient sound extraction system for a multichannel signal, the system comprising:
the coherent sound extraction module is used for calculating weight expressions of the coherent sounds of the signals of the N channels, estimating the coherent sounds according to the weight expressions, and calculating the coherent sounds of each channel;
the environment sound extraction module is used for calculating the environment sound of each channel according to the coherent sound of each channel;
the frequency domain to time domain conversion module is used for carrying out inverse Fourier transform on the N channels of coherent sound and the N channels of environment sound to obtain coherent sound and environment sound represented by a time domain;
the specific implementation process of the coherent sound extraction module is as follows:
a Fourier transform is applied to the time-domain multichannel signal, and the input signal $X_n$ of the nth channel is expressed as:
$X_n = \beta_n S + A_n$
wherein $S$ represents the spectrum of the coherent sound, $\beta_n$ represents the amplitude difference factor between the coherent sound of the nth channel and the coherent sound of the first channel, $1 \le n \le N$, $\beta_1 = 1$, and $A_n$ represents the spectrum of the ambient sound of the nth channel;
calculating the short-time energy of the input signal $X_n$ of the nth channel,
Figure FDA0002921795040000031
Figure FDA0002921795040000032
calculating the correlation between any two channels:
Figure FDA0002921795040000033
wherein
Figure FDA0002921795040000034
is the correlation value between the $n_1$th channel and the $n_2$th channel, $n_1 = 1, 2, \dots, N$, $n_2 = 1, 2, \dots, N$, $n_1 \ne n_2$; in total there are
Figure FDA0002921795040000035
different cross-correlation values;
using
Figure FDA0002921795040000036
N groups of cross-correlation values are selected and solved simultaneously to obtain the proportion $\eta_n$ of coherent sound in each channel;
for the first channel, $\beta_1 = 1$ is known, and therefore:
Figure FDA0002921795040000037
Figure FDA0002921795040000038
wherein $P_S$ represents the short-time energy of the coherent sound, and
Figure FDA0002921795040000039
represents the short-time energy of the ambient sound of the first channel;
for the other channels, from the short-time energy of the input signal $X_n$,
Figure FDA00029217950400000310
and the inter-channel correlation values, the following are obtained:
Figure FDA00029217950400000311
Figure FDA00029217950400000312
wherein
Figure FDA00029217950400000313
represents the short-time energy of the ambient sound of the nth channel, with $n \ge 2$;
calculating the weight $w_n$ of the nth channel:
Figure FDA0002921795040000041
the estimate of the coherent sound,
Figure FDA0002921795040000042
is then:
Figure FDA0002921795040000043
and the coherent sound $S_n$ of the nth channel is:
Figure FDA0002921795040000044
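The processing chain of the coherent sound extraction module can be sketched for a single frequency bin as follows. This is a minimal illustration under stated assumptions, not the claimed formulas, which appear only as figures: short-time averages are taken over a list of frames, ambient components are assumed uncorrelated with the coherent sound and with each other, only three cross-correlation values are used to solve for $P_S$, the weights are a minimum-variance distortionless choice substituted for the claimed weight formula, and $S_n = \beta_n \hat{S}$ is assumed. All function and variable names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def cross_correlation(a, b):
    """Real part of the frame average of a * conj(b)."""
    return mean([(x * y.conjugate()).real for x, y in zip(a, b)])

def short_time_energy(frames):
    """Frame average of |X|^2 for one channel at one frequency bin."""
    return cross_correlation(frames, frames)

def estimate_parameters(X):
    """X[n] holds the frames of channel n; needs at least three channels."""
    N = len(X)
    P_X = [short_time_energy(ch) for ch in X]
    # Assumption: r(1,n) = beta_n * P_S and r(2,3) = beta_2 * beta_3 * P_S.
    r12 = cross_correlation(X[0], X[1])
    r13 = cross_correlation(X[0], X[2])
    r23 = cross_correlation(X[1], X[2])
    P_S = r12 * r13 / r23  # breaks down if r23 is close to zero
    beta = [1.0] + [cross_correlation(X[0], X[n]) / P_S for n in range(1, N)]
    eta = [beta[n] ** 2 * P_S / P_X[n] for n in range(N)]
    P_A = [P_X[n] - beta[n] ** 2 * P_S for n in range(N)]
    return P_S, beta, eta, P_A

def extract(X, beta, P_A):
    """Weighted-sum estimate of S, then per-channel coherent and ambient parts."""
    # Hypothetical weights (not the claimed formula): proportional to
    # beta_n / P_A_n and scaled so that sum(w_n * beta_n) == 1.
    raw = [b / p for b, p in zip(beta, P_A)]
    scale = sum(b * v for b, v in zip(beta, raw))
    w = [v / scale for v in raw]
    frames = range(len(X[0]))
    S_hat = [sum(w[n] * X[n][t] for n in range(len(X))) for t in frames]
    S = [[beta[n] * s for s in S_hat] for n in range(len(X))]
    A = [[X[n][t] - S[n][t] for t in frames] for n in range(len(X))]
    return w, S_hat, S, A

# Synthetic single-bin data built from mutually orthogonal sequences, so the
# uncorrelatedness assumption holds exactly over these four frames.
S_true = [2, 2j, -2, -2j]
A_true = [[1, 1, 1, 1], [0.5, -0.5, 0.5, -0.5], [1, -1j, -1, 1j]]
beta_true = [1.0, 0.5, 2.0]
X = [[b * s + a for s, a in zip(S_true, ch)] for b, ch in zip(beta_true, A_true)]

P_S, beta, eta, P_A = estimate_parameters(X)
w, S_hat, S, A = extract(X, beta, P_A)
print(P_S, beta, P_A)
```

On real recordings the averages would be taken over a sliding window of short-time Fourier transform frames, and the estimates would be noisy rather than exact, so some regularisation of the division by the cross-correlation would likely be needed.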
4. The system for extracting coherent sound and environmental sound of a multi-channel signal according to claim 3, wherein the environmental sound extraction module is specifically implemented as follows:
the ambient sound $A_n$ of the nth channel is:
$A_n = X_n - S_n$.
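Because the ambient spectrum is a residual and the inverse Fourier transform is linear, the time-domain coherent and ambient outputs of the system add back to the input frame. The sketch below checks this for one frame of one channel. The naive transform functions, the sample frame and the constant `gain` standing in for the coherent sound extraction are all assumptions made for illustration; a practical system would use a fast short-time transform with overlap-add.

```python
import cmath

def dft(x):
    """Discrete Fourier transform of one time-domain frame."""
    N = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / N) for t in range(N))
            for k in range(N)]

def idft(X):
    """Inverse discrete Fourier transform back to the time domain."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / N) for k in range(N)) / N
            for t in range(N)]

# One illustrative frame of the nth channel.
x_n = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.25, -0.5]
X_n = dft(x_n)

gain = 0.6  # hypothetical placeholder for the coherent sound extraction step
S_n = [gain * v for v in X_n]                 # coherent spectrum (stand-in)
A_n = [xv - sv for xv, sv in zip(X_n, S_n)]   # ambient spectrum, A_n = X_n - S_n

s_n = idft(S_n)  # time-domain coherent sound
a_n = idft(A_n)  # time-domain ambient sound

# Largest deviation between the reconstructed and the original frame.
residual = max(abs(s + a - x) for s, a, x in zip(s_n, a_n, x_n))
print(residual)
```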

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010448458.9A CN111711918B (en) 2020-05-25 2020-05-25 Coherent sound and environmental sound extraction method and system of multichannel signal

Publications (2)

Publication Number Publication Date
CN111711918A (en) 2020-09-25
CN111711918B (en) 2021-05-18

Family

ID=72538330

Country Status (1)

Country Link
CN (1) CN111711918B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101401151A (en) * 2006-03-15 2009-04-01 法国电信公司 Device and method for graduated encoding of a multichannel audio signal based on a principal component analysis
EP2523473A1 (en) * 2011-05-11 2012-11-14 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for generating an output signal employing a decomposer
CN110534129A (en) * 2018-05-23 2019-12-03 哈曼贝克自动系统股份有限公司 The separation of dry sound and ambient sound

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8588427B2 (en) * 2007-09-26 2013-11-19 Frauhnhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for extracting an ambient signal in an apparatus and method for obtaining weighting coefficients for extracting an ambient signal and computer program
CN103474066B (en) * 2013-10-11 2016-01-06 福州大学 Based on the ecological of multi-band signal reconstruct
CN103902822B (en) * 2014-03-28 2017-09-08 西安交通大学苏州研究院 Sources number detection method in the case of the mixing of incoherent and coherent signal
CN110531310B (en) * 2019-07-25 2021-07-13 西安交通大学 Far-field coherent signal direction-of-arrival estimation method based on subspace and interpolation transformation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Objective performance evaluation of coherent sound and ambient sound extraction methods; 吴彦琴 et al.; 《声学技术》; 2019-10-31; Vol. 38, No. 5; full text *

Also Published As

Publication number Publication date
CN111711918A (en) 2020-09-25

Similar Documents

Publication Publication Date Title
CN110089134B (en) Method, system and computer readable medium for reproducing spatially distributed sound
CN102523551B (en) An apparatus for determining a spatial output multi-channel audio signal
KR101828138B1 (en) Segment-wise Adjustment of Spatial Audio Signal to Different Playback Loudspeaker Setup
CN102124513B (en) Apparatus for determining converted spatial audio signal
KR101341523B1 (en) Method to generate multi-channel audio signals from stereo signals
CN102138342B (en) Apparatus for merging spatial audio streams
US8705750B2 (en) Device and method for converting spatial audio signal
EP2524370B1 (en) Extraction of a direct/ambience signal from a downmix signal and spatial parametric information
JP2019134475A (en) Rendering method, rendering device, and recording medium
CN111316354A (en) Determination of target spatial audio parameters and associated spatial audio playback
CN103650537A (en) Apparatus and method for generating an output signal employing a decomposer
CN104094613A (en) Apparatus and method for microphone positioning based on a spatial power density
US20100169102A1 (en) Low complexity mpeg encoding for surround sound recordings
CN101263742A (en) Audio coding
CN101933344A (en) Method and apparatus for generating a binaural audio signal
EP2543199B1 (en) Method and apparatus for upmixing a two-channel audio signal
Khan et al. Video-aided model-based source separation in real reverberant rooms
WO2008018689A1 (en) Method, medium, and system encoding/decoding a multi-channel audio signal, and method medium, and system decoding a down-mixed signal to a 2-channel signal
KR20110018108A (en) Residual signal encoding and decoding method and apparatus
CN111711918B (en) Coherent sound and environmental sound extraction method and system of multichannel signal
WO2020057050A1 (en) Method for extracting direct sound and background sound, and loudspeaker system and sound reproduction method therefor
CN111669697B (en) Coherent sound and environmental sound extraction method and system of multichannel signal
Cobos et al. Stereo to wave-field synthesis music up-mixing: An objective and subjective evaluation
He et al. Time-shifting based primary-ambient extraction for spatial audio reproduction
CN113449255B (en) Improved method and device for estimating phase angle of environmental component under sparse constraint and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant