CN109640242B - Audio source component and environment component extraction method - Google Patents
- Publication number: CN109640242B (application CN201811507726.9A)
- Authority
- CN
- China
- Prior art keywords: component, environment, source, frequency point, energy
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
Abstract
The invention discloses a method for extracting the source component and the environment component of an audio signal, belonging to the technical field of audio and video processing. Based on the positive-frequency component values of a stereo audio signal in the left and right channels of the signal complex frequency domain and on the source panning factor, the method first solves for the positive-frequency component values of the environment component and the source component in each frame under two cases. It then determines the true solution by comparing the energies of the source component and the environment component in the two sets of results, and constructs the corresponding negative-frequency component values through the conjugate-symmetry relationship. Finally, frequency-to-time-domain conversion of the component values of each frame yields the environment-component and source-component signals of the left and right channels of the stereo audio signal being decomposed. The invention can be used for stereo expansion, and the time-domain waveforms of the source and environment components extracted by the method agree closely with the waveforms of the left-channel source and environment components of the original speech.
Description
Technical Field
The invention belongs to the technical field of audio processing, and particularly relates to a technique for decomposing a stereo sound scene.
Background
Channel-based audio formats are widely used in everyday life and are adopted in most mobile phones, computers, and headphones. Audio in this format typically requires a specific sound system for playback. For today's diverse sound systems, the audio signal needs to be decomposed and reconstructed to suit different playback setups and obtain better spatial quality. For example, to obtain a better listening experience, two-channel stereo from a mobile phone may be played back over a multi-channel headset. The conventional approach is to process the two-channel signal with audio re-synthesis and virtualization techniques to obtain a multi-channel audio output. According to the literature "upper and lower two-channel stereo-audio for consumer electronics" and "Spatial Audio Processing: MPEG Surround and Other Applications", this conventional approach solves the adaptability problem of the playback system, but the spatial quality of the reconstructed sound scene still needs improvement.
An improved approach to the above problem is to regard the acoustic scene as a linear combination of a source component (primary component) and an ambient component. Denote the left and right channels of the stereo signal by x_L and x_R respectively; then x_L = p_L + a_L and x_R = p_R + a_R, where p_L and p_R denote the source components of the left and right channels, and a_L and a_R denote the left- and right-channel ambient components. Chinese patent application publication No. CN101902679A discloses an acoustic processing technique that converts a two-channel input signal into a 5.1-channel surround output: it takes the difference of the left- and right-channel signals and applies filtering and delay to the difference to obtain the ambient component of the acoustic scene, but this method has a large error in estimating the ambient component. In the channel-based audio format the following reasonable assumptions can be made: the source components in the left and right channels satisfy a linear relationship, namely p_R = k·p_L, where k is defined as the source panning factor; and the ambient components are uncorrelated and of equal magnitude, namely a_L ⊥ a_R and |a_L| = |a_R|. Based on these assumptions, Michael M. Goodwin and Jean-Marc Jot proposed a Principal Component Analysis (PCA) algorithm that estimates the source component and the ambient component of the mixed signal by a source/ambient extraction method. Processing the source and ambient components with different rendering methods can improve the quality of sound-scene reconstruction. However, PCA has the drawbacks that the error of the source component is large, the uncorrelatedness between the ambient components is not satisfied, and loudness distortion exists.
Disclosure of Invention
The object of the invention is, in view of the above problems, to provide a new source/environment component extraction method based on uncorrelated environment components, so as to further improve the accuracy of source-component and environment-component extraction while ensuring loudness balance between channels.
The method for extracting the audio source component and the environmental component comprises the following steps:
Step 1: Frame the left- and right-channel signals of the stereo audio signal to be decomposed, transform each frame to the frequency domain, and extract the positive-frequency component values x_L[m,f] and x_R[m,f] of the left- and right-channel signals in each frame, where m denotes the frame index and f the frequency;
Step 2: In the signal complex frequency domain, obtain from the positive-frequency component values of each frame the coordinates (x1, y1) of x_L[m,f] and the coordinates (x2, y2) of x_R[m,f];
Step 3: Solve for the component values of the environment component and the source component at each positive frequency point of each frame under two cases:
(1) for the case where the left-channel environment component lags the right-channel environment component by 90°:
a1 = (k²·x1 − k·x2 − k·y1 + y2)/(1 + k²), b1 = (k·x1 − x2 + k²·y1 − k·y2)/(1 + k²), (a2, b2) = (−b1, a1);
(2) for the case where the left-channel environment component leads the right-channel environment component by 90°:
a1 = (k²·x1 − k·x2 + k·y1 − y2)/(1 + k²), b1 = (x2 − k·x1 + k²·y1 − k·y2)/(1 + k²), (a2, b2) = (b1, −a1);
in both cases p_L = (x1 − a1) + j·(y1 − b1) and p_R = k·p_L;
where (a1, b1) and (a2, b2) are the coordinates in the signal complex frequency domain of the positive-frequency component values of the left- and right-channel environment components, p_L and p_R denote the positive-frequency component values of the left- and right-channel source components, and k denotes the source panning factor;
Step 4: Determine the true solution at each positive frequency point of each frame: compute the source-component energy and the environment-component energy for each of the two candidate solutions; if a candidate exists whose source-component energy is greater than its environment-component energy, that candidate is the true solution at the current positive frequency point; otherwise the true solution is the candidate whose environment-component energy is greater than its source-component energy;
here the source-component energy is the sum of the energies of the left and right source-component values at the current positive frequency point, and the environment-component energy is the sum of the energies of the left- and right-channel environment-component values at the current positive frequency point;
Step 5: Based on the true solution at each positive frequency point of each frame, construct the negative-frequency component values of the source and environment components of the left and right channels in each frame through the conjugate-symmetry relationship;
Step 6: Apply frequency-to-time-domain conversion to the component values of each frame to obtain the environment-component and source-component signals of the left and right channels of the stereo audio signal being decomposed.
In summary, owing to the adoption of the above technical scheme, the invention has the following beneficial effects: the time-domain waveforms of the source component and the environment component extracted by the method agree closely with the waveforms of the left-channel source component and environment component of the original speech, and the extracted left- and right-channel environment components exhibit no amplitude distortion, so loudness balance between them is ensured. In addition, the source and environment components extracted by the invention can reconstruct the original audio signal with high fidelity.
Drawings
FIG. 1 is a geometric representation of the source environment extraction method of the present invention;
FIG. 2 is a process flow diagram of a source environment extraction method of the present invention;
FIG. 3 is a time domain waveform of an original left channel source component;
FIG. 4 is a time domain waveform of an original left channel ambient component;
FIG. 5 is a time domain waveform of the left channel source component extracted by the new method of the present invention;
FIG. 6 is a time domain waveform of the left channel environmental component extracted by the new method of the present invention;
FIG. 7 is a time domain waveform of the environmental component of the right channel extracted by the new method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
Aiming at the defects of the conventional PCA algorithm, and exploiting the condition that the ambient components are orthogonal in the frequency domain, the invention provides a new source/ambient component extraction method (primary and ambient component estimation based on uncorrelated ambient components, UAPAE) to improve the extraction accuracy of the source and ambient components and to ensure loudness balance between channels.
A stereo signal can be regarded as a linear combination of source and ambient components, where the source components satisfy a linear relationship with factor k and the ambient components are uncorrelated and of equal magnitude. Under these conditions the ambient and source components can be separated using their geometric relationship. The signal is transformed to the frequency domain by the short-time Fourier transform, and at each time-frequency point:
x_L[m,f] = p_L[m,f] + a_L[m,f]   (1)
x_R[m,f] = p_R[m,f] + a_R[m,f]   (2)
where m is the frame index, f is the frequency, x_L[m,f] and x_R[m,f] denote the left- and right-channel signals, p_L[m,f] and p_R[m,f] denote the left- and right-channel source components, and a_L[m,f] and a_R[m,f] denote the left- and right-channel ambient components.
Signal decomposition is performed at each time-frequency point. Because the Fourier transform of a real signal is conjugate-symmetric, the solution is computed only over the positive frequencies; the negative-frequency components are then constructed through the conjugate-symmetry relationship, yielding the full spectrum of the signal, and the inverse Fourier transform returns a time-domain solution that is still a real signal.
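This conjugate-symmetry step can be illustrated with a short NumPy sketch (not part of the patent; a generic illustration): solving only the non-negative-frequency bins and mirroring them recovers the original real frame exactly.

```python
import numpy as np

# A real frame: its DFT is conjugate-symmetric, so only the
# non-negative-frequency bins need to be solved explicitly.
rng = np.random.default_rng(0)
frame = rng.standard_normal(4096)

spec = np.fft.fft(frame)
pos = spec[: 4096 // 2 + 1]              # bins 0 .. N/2 (incl. Nyquist)

# Rebuild the full spectrum from the positive bins alone: the
# negative bins are the conjugate mirror of bins 1 .. N/2 - 1.
full = np.concatenate([pos, np.conj(pos[-2:0:-1])])
rebuilt = np.fft.ifft(full)

print(np.allclose(rebuilt.real, frame))      # True
print(np.max(np.abs(rebuilt.imag)) < 1e-9)   # True: result is real
```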
In the signal complex frequency domain, the coordinate relationship shown in fig. 1 can be established from the frequency-domain values: for a given time-frequency point, (x1, y1) denotes the coordinates of the left-channel signal x_L, (x2, y2) the coordinates of the right-channel signal x_R, (a1, b1) the coordinates of the left-channel ambient component a_L, and (a2, b2) the coordinates of the right-channel ambient component a_R. In fig. 1, Im denotes the imaginary axis and Re the real axis.
due to aL⊥aRAnd | aL|=|aR|,Then there are:
thus, it is possible to obtain:
or
The formula (4) and the formula (5) correspond to aLRatio aRLags by 90 °, andLratio aRLeading by 90 deg. two cases.
Without loss of generality, one of the two cases is solved below; the solution for the other case is obtained in the same way.
From p_R = k·p_L we obtain (x2 − a2, y2 − b2) = k·(x1 − a1, y1 − b1). Combining this with equation (4), one can solve:
a1 = (k²·x1 − k·x2 − k·y1 + y2)/(1 + k²)   (6)
b1 = (k·x1 − x2 + k²·y1 − k·y2)/(1 + k²)   (7)
When a_L leads a_R by 90°, combining with equation (5) gives:
a1 = (k²·x1 − k·x2 + k·y1 − y2)/(1 + k²)   (8)
b1 = (x2 − k·x1 + k²·y1 − k·y2)/(1 + k²)   (9)
Through these relations the spectrum of the positive-frequency components of each frame of the signal can be obtained.
The method yields two candidate solutions, and without an additional condition it cannot be determined which is the true one. The invention introduces a selection criterion to judge the appropriate one: if a candidate exists whose source-component energy is higher than its environment-component energy, that candidate is selected as the decomposed signal; otherwise, the candidate whose environment-component energy is higher than its source-component energy is selected.
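The selection criterion can be sketched as follows (illustrative code; the tie-breaking policy when both or neither candidate is source-dominant is an assumption, since the patent leaves it unspecified):

```python
def pick_true_solution(xL, xR, candidates):
    """Choose between two candidate (aL, aR) decompositions of one bin.

    xL, xR: complex bin values; candidates: iterable of (aL, aR) pairs.
    Returns (pL, pR, aL, aR) for the chosen candidate.
    """
    scored = []
    for aL, aR in candidates:
        pL, pR = xL - aL, xR - aR
        Ep = abs(pL) ** 2 + abs(pR) ** 2    # source energy, both channels
        Ea = abs(aL) ** 2 + abs(aR) ** 2    # ambient energy, both channels
        scored.append((Ep - Ea, (pL, pR, aL, aR)))
    if any(gap > 0 for gap, _ in scored):
        # A source-dominant candidate exists: take the most dominant one.
        return max(scored, key=lambda s: s[0])[1]
    # Otherwise take the ambient-dominant candidate.
    return min(scored, key=lambda s: s[0])[1]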
Examples
Construction of the stereo audio to be decomposed:
The left-channel source component is a recorded mono speech signal (time-domain waveform shown in fig. 3); the right-channel source component is the left-channel source component multiplied by the source panning factor k, with k = 2 in this example. The left channel of a binaural recording is taken as the left-channel environment component (time-domain waveform shown in fig. 4), and the right-channel environment component is obtained by applying the Hilbert transform to the left-channel environment component.
The powers of the source and environment components are then computed, and the source components of the left and right channels are scaled so that the ratio of total source-component power to total power is 0.8.
Finally the source and environment components of each channel are mixed to give the left- and right-channel output signals, i.e., the stereo audio signal to be processed.
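The construction of this test mixture can be sketched as follows. White noise stands in for the recorded speech and ambience clips (which are not available here), and the Hilbert transform is implemented directly with the FFT:

```python
import numpy as np

def hilbert_shift(x):
    """90-degree phase shift of a real signal (discrete Hilbert transform)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[1 : n // 2] = -1.0      # positive frequencies: multiply by -j
    h[n // 2 + 1 :] = 1.0     # negative frequencies: multiply by +j
    return np.real(np.fft.ifft(X * 1j * h))

rng = np.random.default_rng(1)
n = 16384
pL = rng.standard_normal(n)   # stand-in for the mono speech clip
aL = rng.standard_normal(n)   # stand-in for the binaural ambience clip

k = 2.0                       # source panning factor
pR = k * pL                   # right source: panned copy of the left
aR = hilbert_shift(aL)        # right ambient: 90-degree-shifted copy

# Scale the source so that its share of the total power is 0.8.
Ep = np.sum(pL**2 + pR**2)
Ea = np.sum(aL**2 + aR**2)
g = np.sqrt(4.0 * Ea / Ep)    # 0.8 / 0.2 = 4
pL, pR = g * pL, g * pR

xL, xR = pL + aL, pR + aR     # stereo mixture to be decomposed
```

The shifted copy is orthogonal to aL and has (almost) the same energy, matching the a_L ⊥ a_R, |a_L| = |a_R| assumption.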
Referring to fig. 2, the specific operation steps of implementing sound scene decomposition on the stereo audio signal to be processed by using the extraction method of the present invention are as follows:
first, frame division processing is performed on left and right output signals of a stereo audio signal, and in this embodiment, each frame after frame division processing includes 4096 sampling points.
Then, 4096-point Fast Fourier Transform (FFT) is performed on each frame of audio signal to obtain the frequency spectrum of the left and right channel output signals.
All frames are traversed; for all positive frequency points x_L[m,f] and x_R[m,f] within each frame, the two cases are solved according to equations (6) and (7), and equations (8) and (9), respectively, giving the positive-frequency environment components a_L = a1 + j·b1 and a_R = a2 + j·b2 and the positive-frequency source components p_L = x_L − a_L and p_R = x_R − a_R, where j denotes the imaginary unit.
The energies of the source and environment components of the two candidate solutions are then compared to determine the true solution of each frame at each positive frequency point: if the source-component energy (summed over the left and right channels) of a candidate exceeds its environment-component energy (summed over the left and right channels), that candidate is the true solution at the current positive frequency point; otherwise the candidate whose environment-component energy exceeds its source-component energy is taken.
In this embodiment, the energies of the source and environment components at each positive frequency point are computed as E_p = |p_L|² + |p_R|² and E_a = |a_L|² + |a_R|².
Next, the negative-frequency component values are constructed from the true solution at each positive frequency point through the conjugate-symmetry relationship.
Finally, the frequency-domain signals of all frames of the left- and right-channel source and environment components are inverse-transformed to the time domain and concatenated, and the extracted components can be used for stereo expansion.
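The complete per-frame loop of the embodiment can be sketched as follows (illustrative code: non-overlapping rectangular frames are assumed, since the patent does not fix a windowing scheme, and the candidate formulas are the reconstructed equations (6)-(9) above):

```python
import numpy as np

def extract_components(xL, xR, k, frame_len=4096):
    """Split a stereo pair into source and ambient components, frame by frame."""
    n = len(xL) - len(xL) % frame_len
    out = {name: np.zeros(n) for name in ("pL", "pR", "aL", "aR")}
    half = frame_len // 2
    d = 1.0 + k * k
    for s in range(0, n, frame_len):
        XL = np.fft.fft(xL[s:s + frame_len])[: half + 1]
        XR = np.fft.fft(xR[s:s + frame_len])[: half + 1]
        x1, y1, x2, y2 = XL.real, XL.imag, XR.real, XR.imag
        # Candidate ambient components for the two 90-degree cases,
        # vectorised over all positive bins at once.
        AL1 = (k*k*x1 - k*x2 - k*y1 + y2) / d + 1j * (k*x1 - x2 + k*k*y1 - k*y2) / d
        AR1 = 1j * AL1                       # left ambient lags right by 90 deg
        AL2 = (k*k*x1 - k*x2 + k*y1 - y2) / d + 1j * (x2 - k*x1 + k*k*y1 - k*y2) / d
        AR2 = -1j * AL2                      # left ambient leads right by 90 deg
        # Per-bin selection: prefer the candidate whose source energy
        # most exceeds its ambient energy (simple form of step 4).
        def gap(AL, AR):
            PL, PR = XL - AL, XR - AR
            return (np.abs(PL)**2 + np.abs(PR)**2) - (np.abs(AL)**2 + np.abs(AR)**2)
        use1 = gap(AL1, AR1) >= gap(AL2, AR2)
        AL = np.where(use1, AL1, AL2)
        AR = np.where(use1, AR1, AR2)
        specs = {"pL": XL - AL, "pR": XR - AR, "aL": AL, "aR": AR}
        # Mirror the positive bins onto the negative bins (step 5) and
        # return each component to the time domain (step 6).
        for name, spec in specs.items():
            full = np.concatenate([spec, np.conj(spec[-2:0:-1])])
            out[name][s:s + frame_len] = np.real(np.fft.ifft(full))
    return out["pL"], out["pR"], out["aL"], out["aR"]
```

By construction the extracted components sum back to the input channels exactly (p_L + a_L = x_L, p_R + a_R = x_R in every frame), which matches the decomposition model of equations (1) and (2).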
The example extracts the left-channel source component and the left- and right-channel environment components and plots their time-domain waveforms, shown in figs. 5-7. Comparison with the original left-channel source component of fig. 3 and the environment component of fig. 4 shows that the time-domain waveforms of the extracted left-channel source and environment components agree closely with those of the original speech, and the extracted left- and right-channel environment components exhibit no amplitude distortion, so loudness balance between them is ensured. Moreover, in headphone playback the extracted source and environment components are almost indistinguishable from the originals, and the original audio signal can be restored with high fidelity. In conclusion, the proposed component extraction method has practical value.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.
Claims (2)
1. An audio source component and environment component extraction method, comprising the steps of:
Step 1: Frame the left- and right-channel signals of the stereo audio signal to be decomposed, transform each frame to the frequency domain, and extract the positive-frequency component values x_L[m,f] and x_R[m,f] of the left- and right-channel signals in each frame, where m denotes the frame index and f the frequency;
Step 2: In the signal complex frequency domain, obtain from the positive-frequency component values of each frame the coordinates (x1, y1) of x_L[m,f] and the coordinates (x2, y2) of x_R[m,f];
Step 3: Solve for the component values of the environment component and the source component at each positive frequency point of each frame under two cases:
(1) for the case where the left-channel environment component lags the right-channel environment component by 90°:
a1 = (k²·x1 − k·x2 − k·y1 + y2)/(1 + k²), b1 = (k·x1 − x2 + k²·y1 − k·y2)/(1 + k²), (a2, b2) = (−b1, a1);
(2) for the case where the left-channel environment component leads the right-channel environment component by 90°:
a1 = (k²·x1 − k·x2 + k·y1 − y2)/(1 + k²), b1 = (x2 − k·x1 + k²·y1 − k·y2)/(1 + k²), (a2, b2) = (b1, −a1);
in both cases p_L = (x1 − a1) + j·(y1 − b1) and p_R = k·p_L;
where (a1, b1) and (a2, b2) are the coordinates in the signal complex frequency domain of the positive-frequency component values of the left- and right-channel environment components, p_L and p_R denote the positive-frequency component values of the left- and right-channel source components, and k denotes the source panning factor;
Step 4: Determine the true solution at each positive frequency point of each frame: compute the source-component energy and the environment-component energy for each of the two candidate solutions; if a candidate exists whose source-component energy is greater than its environment-component energy, that candidate is the true solution at the current positive frequency point; otherwise the true solution is the candidate whose environment-component energy is greater than its source-component energy;
here the source-component energy is the sum of the energies of the left and right source-component values at the current positive frequency point, and the environment-component energy is the sum of the energies of the left- and right-channel environment-component values at the current positive frequency point;
Step 5: Based on the true solution at each positive frequency point of each frame, construct the negative-frequency component values of the source and environment components of the left and right channels in each frame through the conjugate-symmetry relationship;
Step 6: Apply frequency-to-time-domain conversion to the component values of each frame to obtain the environment-component and source-component signals of the left and right channels of the stereo audio signal being decomposed.
2. The method of claim 1, wherein in the framing process of step 1 each frame is set to contain 4096 sampling points.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811507726.9A | 2018-12-11 | 2018-12-11 | Audio source component and environment component extraction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109640242A CN109640242A (en) | 2019-04-16 |
CN109640242B true CN109640242B (en) | 2020-05-12 |
Family
ID=66072455
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811507726.9A Active CN109640242B (en) | 2018-12-11 | 2018-12-11 | Audio source component and environment component extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109640242B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113518299B (en) * | 2021-04-30 | 2022-06-03 | 电子科技大学 | Improved method, equipment and computer readable storage medium for extracting source component and environment component |
CN113449255B (en) * | 2021-06-15 | 2022-11-11 | 电子科技大学 | Improved method and device for estimating phase angle of environmental component under sparse constraint and storage medium |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH06113388A (en) * | 1992-08-11 | 1994-04-22 | Sutajio Ion:Kk | Acoustic signal generator |
US5970152A (en) * | 1996-04-30 | 1999-10-19 | Srs Labs, Inc. | Audio enhancement system for use in a surround sound environment |
ES2358786T3 (en) * | 2007-06-08 | 2011-05-13 | Dolby Laboratories Licensing Corporation | HYBRID DERIVATION OF SURROUND SOUND AUDIO CHANNELS COMBINING CONTROLLING SOUND COMPONENTS OF ENVIRONMENTAL SOUND SIGNALS AND WITH MATRICIAL DECODIFICATION. |
EP2191462A4 (en) * | 2007-09-06 | 2010-08-18 | Lg Electronics Inc | A method and an apparatus of decoding an audio signal |
CN104240711B (en) * | 2013-06-18 | 2019-10-11 | 杜比实验室特许公司 | For generating the mthods, systems and devices of adaptive audio content |
CN105917674B (en) * | 2013-10-30 | 2019-11-22 | 华为技术有限公司 | For handling the method and mobile device of audio signal |
US9602946B2 (en) * | 2014-12-19 | 2017-03-21 | Nokia Technologies Oy | Method and apparatus for providing virtual audio reproduction |
US10042038B1 (en) * | 2015-09-01 | 2018-08-07 | Digimarc Corporation | Mobile devices and methods employing acoustic vector sensors |
- 2018-12-11: application CN201811507726.9A filed in China; granted as CN109640242B (status: active)
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||