CN109640242B - Audio source component and environment component extraction method - Google Patents


Info

Publication number
CN109640242B
Authority
CN
China
Prior art keywords
component
environment
source
frequency point
energy
Prior art date
Legal status
Active
Application number
CN201811507726.9A
Other languages
Chinese (zh)
Other versions
CN109640242A (en)
Inventor
史创
陈璐
方惠
李会勇
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201811507726.9A
Publication of CN109640242A
Application granted
Publication of CN109640242B
Status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 1/00 Two-channel systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The invention discloses a method for extracting the source component and the ambient component of an audio signal, belonging to the technical field of audio and video processing. The method first solves for the component values of the ambient component and the source component at each positive frequency point of each frame, under two cases, based on the positive-frequency component values of the left and right channels of the stereo audio signal in the complex frequency domain and on the source panning factor; it then determines the true solution by comparing the energies of the source and ambient components in the two sets of results, and constructs the corresponding negative-frequency component values through conjugate symmetry; finally, each frame's component values are converted from the frequency domain back to the time domain to obtain the ambient-component and source-component signals of the left and right channels of the stereo audio signal being decomposed. The invention can be used for stereo expansion, and the time-domain waveforms of the extracted source and ambient components agree closely with those of the left channel source and ambient components of the original speech.

Description

Audio source component and environment component extraction method
Technical Field
The invention belongs to the technical field of audio processing, and particularly relates to a technology for decomposing a stereo sound scene.
Background
Channel-based audio formats are widely used in daily life and are adopted by most mobile phones, computers and earphones. Audio in this format typically requires a specific sound system for playback. Given today's diverse sound systems, the audio signal must be decomposed and reconstructed to suit different playback setups and obtain better spatial quality. For example, to obtain a better listening experience, two-channel stereo from a mobile phone may be played back over a multi-channel headset. The conventional approach is to process the two-channel signal with audio re-synthesis and virtualization techniques to obtain a multi-channel audio output. According to the literature "upper and lower two-channel stereo-audio for consumer electronics" and the book "Spatial Audio Processing: MPEG Surround and Other Applications", the conventional approach solves the compatibility problem of the playback system, but the spatial quality of the reconstructed sound scene still needs improvement.
An improved approach to this problem treats the sound scene as a linear combination of a source component (primary component) and an ambient component. Denoting the left and right channels of the stereo signal by xL and xR respectively, we have xL = pL + aL and xR = pR + aR, where pL and pR are the source components of the left and right channels, and aL and aR are the corresponding ambient components. Chinese patent application CN101902679A discloses an acoustic processing technique that converts a two-channel input signal into a 5.1-channel surround output: it takes the difference of the left and right channel signals and then filters and delays that difference to obtain the ambient component of the sound scene, but this method estimates the ambient component with a large error. For channel-based audio formats, the following reasonable assumptions can be made: the source components of the left and right channels satisfy a linear relationship, pR = k·pL, where k is defined as the source panning factor; and the ambient components are uncorrelated and of equal magnitude, i.e. aL ⊥ aR and |aL| = |aR|. Based on these assumptions, Michael M. Goodwin and Jean-Marc Jot proposed the Principal Component Analysis (PCA) algorithm, a source/ambience extraction method that estimates the source component and the ambient component of the mixed signal separately. The quality of sound-scene reconstruction can then be improved by rendering the source and ambient components with different methods. PCA, however, has drawbacks: the error of the source component is large, the uncorrelatedness between the ambient components is not satisfied, and loudness distortion is present.
Disclosure of Invention
The invention aims to: in view of the existing problems, a new source environmental component extraction method based on uncorrelated environmental components is provided to further improve the accuracy of source component and environmental component extraction, and simultaneously ensure loudness equalization between channels.
The method for extracting the audio source component and the environmental component comprises the following steps:
Step 1: frame the left and right channel signals of the stereo audio signal to be decomposed, convert each frame to the frequency domain, and extract the positive-frequency component values xL[m,f] and xR[m,f] of the left and right channel signals in each frame, where m denotes the frame index and f the frequency value;
Step 2: in the complex frequency domain of the signal, obtain from the positive-frequency component values of each frame the coordinates (x1, y1) of xL[m,f] and the coordinates (x2, y2) of xR[m,f];
Step 3: solve for the component values of the ambient component and the source component at each positive frequency point of each frame under two cases:
(1) for the case where the left channel ambient component lags the right channel ambient component by 90°:
a1 = (k²·x1 - k·x2 - k·y1 + y2)/(1 + k²)
b1 = (k·x1 - x2 + k²·y1 - k·y2)/(1 + k²)
(a2, b2) = (-b1, a1),  pL = (x1 - a1) + j·(y1 - b1),  pR = k·pL
(2) for the case where the left channel ambient component leads the right channel ambient component by 90°:
a1 = (k²·x1 - k·x2 + k·y1 - y2)/(1 + k²)
b1 = (x2 - k·x1 + k²·y1 - k·y2)/(1 + k²)
(a2, b2) = (b1, -a1),  pL = (x1 - a1) + j·(y1 - b1),  pR = k·pL
where (a1, b1) and (a2, b2) are the coordinates, in the complex frequency domain, of the positive-frequency component values of the left and right channel ambient components respectively, pL and pR denote the positive-frequency component values of the left and right channel source components, and k denotes the source panning factor;
Step 4: determine the true solution at each positive frequency point of each frame: compute the source-component energy and the ambient-component energy for both sets of solutions; if a solution exists whose source-component energy is greater than its ambient-component energy, take that solution as the true solution at the current positive frequency point; otherwise take the solution whose ambient-component energy is greater than its source-component energy;
here the source-component energy is the sum of the energies of the left and right channel source-component values at the current positive frequency point, and the ambient-component energy is the sum of the energies of the left and right channel ambient-component values at that point;
Step 5: based on the true solution at each positive frequency point of each frame, construct the negative-frequency component values of the source and ambient components of the left and right channels through the conjugate-symmetry relation;
Step 6: convert each frame's component values from the frequency domain back to the time domain to obtain the ambient-component and source-component signals of the left and right channels of the stereo audio signal being decomposed.
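The six steps above can be sketched end to end in NumPy. This is an illustrative reading of the method rather than the patented implementation: the panning factor k is assumed known, frames are non-overlapping and unwindowed, and when neither candidate solution has dominant source energy (a case the text leaves open) the sketch simply falls back to the first candidate.

```python
import numpy as np

def _solve_case(x1, y1, x2, y2, k, lag):
    """Closed-form ambience solution per positive-frequency bin.
    lag=True  -> aL lags aR by 90 degrees, (a2, b2) = (-b1, a1)
    lag=False -> aL leads aR by 90 degrees, (a2, b2) = (b1, -a1)"""
    d = 1.0 + k * k
    if lag:
        a1 = (k*k*x1 - k*x2 - k*y1 + y2) / d
        b1 = (k*x1 - x2 + k*k*y1 - k*y2) / d
        return a1 + 1j*b1, -b1 + 1j*a1
    a1 = (k*k*x1 - k*x2 + k*y1 - y2) / d
    b1 = (x2 - k*x1 + k*k*y1 - k*y2) / d
    return a1 + 1j*b1, b1 - 1j*a1

def extract_components(xL, xR, k, frame_len=4096):
    """Steps 1-6: frame, FFT, solve both cases, keep the solution whose
    source energy exceeds its ambient energy, invert per frame."""
    n_frames = len(xL) // frame_len
    out = {n: np.zeros(n_frames * frame_len) for n in ("pL", "pR", "aL", "aR")}
    for m in range(n_frames):
        sl = slice(m * frame_len, (m + 1) * frame_len)
        XL, XR = np.fft.rfft(xL[sl]), np.fft.rfft(xR[sl])
        cand = [_solve_case(XL.real, XL.imag, XR.real, XR.imag, k, lag)
                for lag in (True, False)]
        AL, AR = np.empty_like(XL), np.empty_like(XL)
        for f in range(XL.size):
            pick = None
            for aLf, aRf in ((c[0][f], c[1][f]) for c in cand):
                Ep = (1 + k*k) * abs(XL[f] - aLf)**2   # |pL|^2 + |k*pL|^2
                Ea = abs(aLf)**2 + abs(aRf)**2
                if Ep > Ea:                            # energy criterion, step 4
                    pick = (aLf, aRf)
                    break
            if pick is None:          # neither candidate dominant: assumption,
                pick = (cand[0][0][f], cand[0][1][f])  # fall back to case 1
            AL[f], AR[f] = pick
        PL = XL - AL
        out["aL"][sl] = np.fft.irfft(AL, frame_len)
        out["aR"][sl] = np.fft.irfft(AR, frame_len)
        out["pL"][sl] = np.fft.irfft(PL, frame_len)
        out["pR"][sl] = np.fft.irfft(k * PL, frame_len)
    return out
```

Here `np.fft.rfft` and `np.fft.irfft` operate on the positive-frequency half-spectrum only, so steps 1-2 and 5-6 (conjugate symmetry and inversion) come for free.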
In summary, the adopted technical scheme has the following beneficial effects: the time-domain waveforms of the extracted source and ambient components agree closely with the waveforms of the source and ambient components of the left channel of the original speech; the extracted left- and right-channel ambient components exhibit no amplitude distortion, so loudness balance between them is ensured; and the source and ambient components extracted by the invention can reconstruct the original audio signal with high fidelity.
Drawings
FIG. 1 is a geometric representation of the source environment extraction method of the present invention;
FIG. 2 is a process flow diagram of a source environment extraction method of the present invention;
FIG. 3 is a time domain waveform of an original left channel source component;
FIG. 4 is a time domain waveform of an original left channel ambient component;
FIG. 5 is a time domain waveform of the left channel source component extracted by the new method of the present invention;
FIG. 6 is a time domain waveform of the left channel environmental component extracted by the new method of the present invention;
FIG. 7 is a time domain waveform of the environmental component of the right channel extracted by the new method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
Aiming at the defects of the conventional PCA algorithm, and based on the uncorrelatedness of the ambient components (that is, the condition that the ambient components are orthogonal in the frequency domain), the invention provides a new source/ambient extraction method, termed UAPAE (primary and ambient component estimation based on uncorrelated ambient components), to improve the extraction accuracy of the source and ambient components and to ensure loudness balance between channels.
A stereo signal can be viewed as a linear combination of source and ambient components, where the source components satisfy a linear relationship with factor k and the ambient components are uncorrelated and of equal magnitude. Under these conditions, the ambient and source components can be separated using their geometric relationship. The signal is transformed to the frequency domain by the short-time Fourier transform, and at each time-frequency point:
xL[m,f] = pL[m,f] + aL[m,f]    (1)
xR[m,f] = pR[m,f] + aR[m,f]    (2)
where m is the frame index, f is the frequency value, xL[m,f] and xR[m,f] denote the corresponding left and right channel signals, pL[m,f] and pR[m,f] denote the source components of the left and right channels, and aL[m,f] and aR[m,f] denote the ambient components of the left and right channels.
Signal decomposition is performed at each time-frequency point. Because the Fourier transform of a real signal is conjugate-symmetric, the solution is computed only over the positive frequencies; the negative-frequency components are then constructed through the conjugate-symmetry relation, yielding the full spectrum of the signal, and the inverse Fourier transform produces a time-domain solution that is again a real signal.
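This positive-frequency-only bookkeeping is exactly what NumPy's `rfft`/`irfft` pair automates; a minimal check of the conjugate-symmetry property (NumPy assumed here, not part of the patent):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(8)          # a real time-domain signal
X = np.fft.fft(x)

# conjugate symmetry of a real signal's spectrum: X[N-f] == conj(X[f])
assert np.allclose(X[1:], np.conj(X[1:][::-1]))

# hence only the positive-frequency half need be solved; irfft rebuilds
# the negative-frequency half by conjugate symmetry and returns a real signal
assert np.allclose(np.fft.irfft(np.fft.rfft(x), 8), x)
```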
In the complex frequency domain of the signal, the coordinate relationship shown in fig. 1 can be established from the frequency-domain values: for a given time-frequency point, (x1, y1) denotes the coordinates of the left channel signal xL, (x2, y2) those of the right channel signal xR, (a1, b1) those of the left channel ambient component aL, and (a2, b2) those of the right channel ambient component aR; in fig. 1, Im denotes the imaginary part and Re the real part.
Since aL ⊥ aR and |aL| = |aR|, we have:
a1·a2 + b1·b2 = 0,  a1² + b1² = a2² + b2²    (3)
Thus we obtain:
(a2, b2) = (-b1, a1)    (4)
or
(a2, b2) = (b1, -a1)    (5)
Equations (4) and (5) correspond to the two cases in which aL lags aR by 90° and aL leads aR by 90°, respectively.
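Both relations can be verified numerically: equation (4) is a +90° rotation of the ambience coefficient (aR = j·aL) and equation (5) a -90° rotation (aR = -j·aL), and either rotation preserves magnitude and makes the real inner product a1·a2 + b1·b2 vanish. A quick sketch with arbitrary illustrative bin values:

```python
import numpy as np

aL = np.array([0.3 - 1.2j, 2.0 + 0.5j, -0.7 + 0.7j])  # arbitrary bin values
for aR in (1j * aL, -1j * aL):   # eq (4): aR = j*aL; eq (5): aR = -j*aL
    # equal magnitude: |aL| == |aR|
    assert np.allclose(np.abs(aL), np.abs(aR))
    # orthogonality in the complex plane: a1*a2 + b1*b2 == 0
    assert np.allclose(aL.real * aR.real + aL.imag * aR.imag, 0.0)
```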
Without loss of generality, in the present embodiment, one of them is selected for solution, and the solution in the other case can be obtained by a similar method.
From pR = k·pL we obtain (x2 - a2, y2 - b2) = k·(x1 - a1, y1 - b1); combining this with equation (4), we can solve:
a1 = (k²·x1 - k·x2 - k·y1 + y2)/(1 + k²)    (6)
b1 = (k·x1 - x2 + k²·y1 - k·y2)/(1 + k²)    (7)
When aL leads aR by 90°, we obtain:
a1 = (k²·x1 - k·x2 + k·y1 - y2)/(1 + k²)    (8)
b1 = (x2 - k·x1 + k²·y1 - k·y2)/(1 + k²)    (9)
the frequency spectrum of the positive frequency component of each frame of signal can be obtained through the relation.
The method yields two candidate solutions, and without an additional condition it cannot be determined which is the true one. The invention introduces a selection criterion to choose the appropriate solution: if one of the candidate solutions has source-component energy greater than its ambient-component energy, that solution is selected; otherwise, the solution whose ambient-component energy is greater than its source-component energy is selected.
Examples
Preparation of the stereo audio to be decomposed:
the source component of the left channel uses a recorded mono speech audio signal (the time domain waveform is shown in fig. 3), and the source component of the right channel multiplies the source component of the left channel by a source panning factor k, where k is 2 in this example. The left channel audio signal of the binaural sound is taken as a left channel environment component (the time domain waveform is shown in fig. 4), and the environment component of the right channel is obtained by performing hilbert transform on the left channel environment component.
Then the powers of the source and ambient components are computed, and the left and right channel source components are scaled so that the ratio of total source-component power to total power is 0.8.
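The construction of this test signal can be sketched as follows. The recorded speech and binaural ambience of the embodiment are replaced here by synthetic noise stand-ins (an assumption, not the data used in the patent), and SciPy's `hilbert`, which returns the analytic signal, is assumed; only the structure of the construction follows the text:

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(1)
n = 4096
pL = rng.standard_normal(n)            # stand-in for the mono speech source
aL = 0.5 * rng.standard_normal(n)      # stand-in for the binaural ambience
k = 2.0
pR = k * pL                            # right source: panned by k = 2
# right ambience: 90-degree phase-shifted copy of aL via the Hilbert transform
aR = np.imag(hilbert(aL))
# scale the source so that source power / total power = 0.8
Pp = np.mean(pL**2) + np.mean(pR**2)
Pa = np.mean(aL**2) + np.mean(aR**2)
g = np.sqrt(0.8 / 0.2 * Pa / Pp)
pL, pR = g * pL, g * pR
xL, xR = pL + aL, pR + aR              # stereo mixture to decompose
ratio = (np.mean(pL**2) + np.mean(pR**2)) / (
    np.mean(pL**2) + np.mean(pR**2) + np.mean(aL**2) + np.mean(aR**2))
```

After the scaling step the source-to-total power ratio equals 0.8 by construction, since the gain g is computed from the measured powers.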
The source and ambient components of each channel are then mixed to obtain the left and right output signals, i.e. the stereo audio signal to be processed.
Referring to fig. 2, the specific operation steps of implementing sound scene decomposition on the stereo audio signal to be processed by using the extraction method of the present invention are as follows:
first, frame division processing is performed on left and right output signals of a stereo audio signal, and in this embodiment, each frame after frame division processing includes 4096 sampling points.
Then, 4096-point Fast Fourier Transform (FFT) is performed on each frame of audio signal to obtain the frequency spectrum of the left and right channel output signals.
All frames are traversed, and for all positive frequency points xL[m,f] and xR[m,f] within each frame the two cases are solved according to equations (6)-(7) and equations (8)-(9) respectively, yielding the positive-frequency components of the left and right channel ambient components, aL = a1 + j·b1 and aR = a2 + j·b2, and of the source components, pL = xL - aL and pR = xR - aR, where j denotes the imaginary unit.
The source-component energy and the ambient-component energy of the two candidate solutions are then compared to determine the true solution at each positive frequency point of each frame: if a candidate's source-component energy (summed over the left and right channels) exceeds its ambient-component energy (likewise summed), the true solution at the current positive frequency point is that candidate; otherwise the candidate whose ambient-component energy exceeds its source-component energy is taken.
In this embodiment, the source-component energy and the ambient-component energy at each positive frequency point are computed as Ep = |pL|² + |pR|² and Ea = |aL|² + |aR|².
Then, the negative-frequency component values are constructed through the conjugate-symmetry relation, based on the true solution at each positive frequency point.
Finally, the frequency-domain signals of all frames of the left and right channel source and ambient components are inverse-transformed to time-domain signals and concatenated, and the extracted components can be used for stereo expansion.
The example extracts the left channel source component and the left and right channel ambient components and plots their time-domain waveforms, shown in figs. 5-7. Comparison with the original left channel source component in fig. 3 and the ambient component in fig. 4 shows that the time-domain waveforms of the extracted left channel source and ambient components agree closely with those of the original speech, and that the extracted left and right channel ambient components exhibit no amplitude distortion, so loudness balance between them is ensured. Moreover, played back over headphones, the extracted source and ambient components are almost indistinguishable from the originals, and the original audio signal can be reconstructed with high fidelity. In conclusion, the proposed component extraction method has practical value.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims (2)

1. An audio source component and environment component extraction method, comprising the steps of:
step 1: framing the left and right channel signals of a stereo audio signal to be subjected to component extraction, converting each frame to the frequency domain, and extracting the positive-frequency component values xL[m,f] and xR[m,f] of the left and right channel signals in each frame, where m denotes the frame index and f the frequency value;
step 2: in the complex frequency domain of the signal, obtaining from the positive-frequency component values of each frame the coordinates (x1, y1) of xL[m,f] and the coordinates (x2, y2) of xR[m,f];
step 3: solving for the component values of the ambient component and the source component at each positive frequency point of each frame under two cases:
(1) for the case where the left channel ambient component lags the right channel ambient component by 90°:
a1 = (k²·x1 - k·x2 - k·y1 + y2)/(1 + k²)
b1 = (k·x1 - x2 + k²·y1 - k·y2)/(1 + k²)
(a2, b2) = (-b1, a1),  pL = (x1 - a1) + j·(y1 - b1),  pR = k·pL
(2) for the case where the left channel ambient component leads the right channel ambient component by 90°:
a1 = (k²·x1 - k·x2 + k·y1 - y2)/(1 + k²)
b1 = (x2 - k·x1 + k²·y1 - k·y2)/(1 + k²)
(a2, b2) = (b1, -a1),  pL = (x1 - a1) + j·(y1 - b1),  pR = k·pL
where (a1, b1) and (a2, b2) are the coordinates, in the complex frequency domain, of the positive-frequency component values of the left and right channel ambient components respectively, pL and pR denote the positive-frequency component values of the left and right channel source components, and k denotes the source panning factor;
step 4: determining the true solution at each positive frequency point of each frame: computing the source-component energy and the ambient-component energy for both sets of solutions; if a solution exists whose source-component energy is greater than its ambient-component energy, taking that solution as the true solution at the current positive frequency point; otherwise taking the solution whose ambient-component energy is greater than its source-component energy;
the source-component energy being the sum of the energies of the left and right channel source-component values at the current positive frequency point, and the ambient-component energy being the sum of the energies of the left and right channel ambient-component values at that point;
step 5: constructing the negative-frequency component values of the source and ambient components of the left and right channels in each frame through the conjugate-symmetry relation, based on the true solution at each positive frequency point of each frame;
step 6: converting each frame's component values from the frequency domain back to the time domain to obtain the ambient-component and source-component signals of the left and right channels of the stereo audio signal to be subjected to component extraction.
2. The method of claim 1, wherein in step 1 each frame is set to contain 4096 sampling points when performing the framing process.
CN201811507726.9A 2018-12-11 2018-12-11 Audio source component and environment component extraction method Active CN109640242B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811507726.9A CN109640242B (en) 2018-12-11 2018-12-11 Audio source component and environment component extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811507726.9A CN109640242B (en) 2018-12-11 2018-12-11 Audio source component and environment component extraction method

Publications (2)

Publication Number Publication Date
CN109640242A CN109640242A (en) 2019-04-16
CN109640242B true CN109640242B (en) 2020-05-12

Family

ID=66072455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811507726.9A Active CN109640242B (en) 2018-12-11 2018-12-11 Audio source component and environment component extraction method

Country Status (1)

Country Link
CN (1) CN109640242B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113518299B (en) * 2021-04-30 2022-06-03 电子科技大学 Improved method, equipment and computer readable storage medium for extracting source component and environment component
CN113449255B (en) * 2021-06-15 2022-11-11 电子科技大学 Improved method and device for estimating phase angle of environmental component under sparse constraint and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06113388A (en) * 1992-08-11 1994-04-22 Sutajio Ion:Kk Acoustic signal generator
US5970152A (en) * 1996-04-30 1999-10-19 Srs Labs, Inc. Audio enhancement system for use in a surround sound environment
ES2358786T3 * 2007-06-08 2011-05-13 Dolby Laboratories Licensing Corporation Hybrid derivation of surround sound audio channels by controllably combining ambient sound signal components and matrix-decoded signal components.
EP2191462A4 (en) * 2007-09-06 2010-08-18 Lg Electronics Inc A method and an apparatus of decoding an audio signal
CN104240711B (en) * 2013-06-18 2019-10-11 杜比实验室特许公司 For generating the mthods, systems and devices of adaptive audio content
CN105917674B (en) * 2013-10-30 2019-11-22 华为技术有限公司 For handling the method and mobile device of audio signal
US9602946B2 (en) * 2014-12-19 2017-03-21 Nokia Technologies Oy Method and apparatus for providing virtual audio reproduction
US10042038B1 (en) * 2015-09-01 2018-08-07 Digimarc Corporation Mobile devices and methods employing acoustic vector sensors

Also Published As

Publication number Publication date
CN109640242A (en) 2019-04-16

Similar Documents

Publication Publication Date Title
JP6703525B2 (en) Method and device for enhancing sound source
KR101935183B1 (en) A signal processing apparatus for enhancing a voice component within a multi-channal audio signal
EP2612322B1 (en) Method and device for decoding a multichannel audio signal
US8332229B2 (en) Low complexity MPEG encoding for surround sound recordings
CN106971738A (en) The method and device that compression and decompression high-order ambisonics signal are represented
WO2013090463A1 (en) Audio processing method and audio processing apparatus
EP1779385B1 (en) Method and apparatus for encoding and decoding multi-channel audio signal using virtual source location information
US10798511B1 (en) Processing of audio signals for spatial audio
CN109640242B (en) Audio source component and environment component extraction method
CN110024421A (en) Method and apparatus for self adaptive control decorrelation filters
EP3808106A1 (en) Spatial audio capture, transmission and reproduction
WO2021130405A1 (en) Combining of spatial audio parameters
CN106960672B (en) Bandwidth extension method and device for stereo audio
EP2941770B1 (en) Method for determining a stereo signal
CN113646836A (en) Sound field dependent rendering
WO2021120795A1 (en) Sampling rate processing method, apparatus and system, and storage medium and computer device
TWI762949B (en) Method for loss concealment, method for decoding a dirac encoding audio scene and corresponding computer program, loss concealment apparatus and decoder
CN109036456B (en) Method for extracting source component environment component for stereo
CN112133316A (en) Spatial audio representation and rendering
WO2018234623A1 (en) Spatial audio processing
JP6832095B2 (en) Channel number converter and its program
KR20110127783A (en) Apparatus for separating voice and method for separating voice of single channel using the same
Bae et al. A New Non-uniform Sampling & Quantization by using a Modified Correlation
CN113808608A (en) Single sound channel noise suppression method and device based on time-frequency masking smoothing strategy
Marin-Hurtado et al. Preservation of localization cues in BSS-based noise reduction: Application in binaural hearing aids

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant