WO2015060652A1 - Audio signal processing method and apparatus - Google Patents
Audio signal processing method and apparatus
- Publication number
- WO2015060652A1 (PCT/KR2014/009975)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- subband
- filter coefficients
- audio signal
- signal
- filter
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/008—Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03H—IMPEDANCE NETWORKS, e.g. RESONANT CIRCUITS; RESONATORS
- H03H17/00—Networks using digital techniques
- H03H17/02—Frequency selective networks
- H03H17/06—Non-recursive filters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/033—Headphones for stereophonic communication
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/002—Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/002—Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
- H04S3/004—For headphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/055—Filters for musical processing or musical effects; Filter responses, filter architecture, filter coefficients or control parameters therefor
- G10H2250/111—Impulse response, i.e. filters defined or specifed by their temporal impulse response features, e.g. for echo or reverberation applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/131—Mathematical functions for musical analysis, processing, synthesis or composition
- G10H2250/145—Convolution, e.g. of a music input signal with a desired impulse response to compute an output
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/01—Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/03—Application of parametric coding in stereophonic audio systems
Definitions
- The present invention relates to a signal processing method and apparatus for effectively reproducing an audio signal, and more particularly, to an audio signal processing method and apparatus for filtering an input audio signal with a low amount of computation.
- Binaural rendering, which lets a multichannel signal be heard in stereo, requires more computation as the length of the target filter increases.
- The filter length may range from 48,000 to 96,000 samples.
- In such cases the amount of computation is very large.
- binaural filtering can be expressed as follows.
- The above time-domain convolution is generally performed using fast convolution based on the Fast Fourier Transform (FFT).
- FFTs equal in number to the input channels and inverse FFTs equal in number to the output channels must be performed.
- Because delay must also be taken into account, block-wise fast convolution must be performed, which can consume more computation than a single fast convolution over the entire filter length.
- An object of the present invention is to implement, with a very low amount of computation and minimal loss of sound quality, the computationally demanding filtering that binaural rendering requires in order to preserve the spatial impression of the original signal.
- Another object of the present invention is to minimize the spread of distortion by using a high-quality filter when the input signal itself contains distortion.
- Another object of the present invention is to implement a very long finite impulse response (FIR) filter as a shorter filter.
- Another object of the present invention is to minimize the distortion caused by the omitted filter coefficients when filtering with the truncated FIR filter.
- the present invention provides an audio signal processing method and an audio signal processing apparatus as follows.
- The present invention provides an audio signal processing method comprising: receiving an input audio signal; receiving truncated subband filter coefficients for filtering each subband signal of the input audio signal, wherein the truncated subband filter coefficients are at least a portion of subband filter coefficients obtained from Binaural Room Impulse Response (BRIR) filter coefficients for binaural filtering of the input audio signal, the length of the truncated subband filter coefficients is determined based on filter order information obtained at least in part from characteristic information extracted from the corresponding subband filter coefficients, and the truncated subband filter coefficients comprise at least one FFT filter coefficient obtained by performing a Fast Fourier Transform (FFT) on a predetermined block basis in the corresponding subband; performing a fast Fourier transform on the subband signal in units of a predetermined subframe in the corresponding subband; generating a filtered subframe by multiplying the fast Fourier transformed subframe by the FFT filter coefficients; inverse fast Fourier transforming the filtered subframe; and generating a filtered subband signal by overlap-adding at least one inverse fast Fourier transformed subframe.
- An audio signal processing apparatus for performing binaural rendering of an input audio signal, wherein the input audio signal includes a plurality of subband signals, and the audio signal processing apparatus includes a fast convolution unit that renders the direct sound and early reflection parts of each subband signal, wherein the fast convolution unit: receives the input audio signal; receives truncated subband filter coefficients for filtering each subband signal of the input audio signal, the truncated subband filter coefficients being at least a portion of subband filter coefficients obtained from Binaural Room Impulse Response (BRIR) filter coefficients for binaural filtering of the input audio signal, the length of the truncated subband filter coefficients being determined based on filter order information obtained at least in part from characteristic information extracted from the corresponding subband filter coefficients, and the truncated subband filter coefficients including at least one FFT filter coefficient obtained by performing a Fast Fourier Transform (FFT) on a predetermined block basis in the corresponding subband; performs a fast Fourier transform on the subband signal in units of a predetermined subframe in the corresponding subband; generates a filtered subframe by multiplying the fast Fourier transformed subframe by the FFT filter coefficients; inverse fast Fourier transforms the filtered subframe; and generates a filtered subband signal by overlap-adding at least one inverse fast Fourier transformed subframe.
- According to another embodiment, a method includes: receiving an input audio signal; receiving truncated subband filter coefficients for filtering each subband signal of the input audio signal, wherein the truncated subband filter coefficients are at least a portion of subband filter coefficients obtained from Binaural Room Impulse Response (BRIR) filter coefficients for binaural filtering of the input audio signal, and the length of the truncated subband filter coefficients is determined based on filter order information obtained at least in part from characteristic information extracted from the corresponding subband filter coefficients; obtaining at least one FFT filter coefficient by performing a Fast Fourier Transform (FFT) on the truncated subband filter coefficients in predetermined block units in the corresponding subband; performing a fast Fourier transform on the subband signal in units of a predetermined subframe in the corresponding subband; generating a filtered subframe by multiplying the fast Fourier transformed subframe by the FFT filter coefficients; inverse fast Fourier transforming the filtered subframe; and generating a filtered subband signal by overlap-adding at least one inverse fast Fourier transformed subframe.
- An audio signal processing apparatus for performing binaural rendering of an input audio signal, wherein the input audio signal includes a plurality of subband signals, and the audio signal processing apparatus includes a fast convolution unit that renders the direct sound and early reflection parts of each subband signal, wherein the fast convolution unit: receives the input audio signal; receives truncated subband filter coefficients for filtering each subband signal of the input audio signal, the truncated subband filter coefficients being at least a portion of subband filter coefficients obtained from Binaural Room Impulse Response (BRIR) filter coefficients for binaural filtering of the input audio signal, and the length of the truncated subband filter coefficients being determined based on filter order information obtained at least in part from characteristic information extracted from the corresponding subband filter coefficients; obtains at least one FFT filter coefficient by performing a Fast Fourier Transform (FFT) on the truncated subband filter coefficients in predetermined block units in the corresponding subband; performs a fast Fourier transform on the subband signal in units of a predetermined subframe in the corresponding subband; generates a filtered subframe by multiplying the fast Fourier transformed subframe by the FFT filter coefficients; inverse fast Fourier transforms the filtered subframe; and generates a filtered subband signal by overlap-adding at least one inverse fast Fourier transformed subframe.
- the characteristic information may include reverberation time information of a corresponding subband filter coefficient, and the filter order information may have one value for each subband.
- the length of at least one truncated subband filter coefficient is different from the length of the truncated subband filter coefficients of other subbands.
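The text states that the filter order is derived from characteristic information such as reverberation time and that truncated lengths can differ between subbands, but it does not give a formula here. The following is only a minimal sketch of one plausible approach: it keeps taps until the energy remaining in the tail has dropped by a fixed number of decibels. The function names and the 30 dB threshold are assumptions made for illustration, not values from the source.

```python
def filter_order_from_decay(coeffs, drop_db=30.0):
    """Return the number of leading taps to keep for one subband filter.

    Assumption for this sketch: the order is the first tap index at which
    the energy remaining in the tail has dropped by `drop_db` decibels
    relative to the total energy of the filter.
    """
    energy = [abs(c) ** 2 for c in coeffs]
    total = sum(energy)
    threshold = total * 10.0 ** (-drop_db / 10.0)
    remaining = total
    for n, e in enumerate(energy):
        if remaining <= threshold:
            return n
        remaining -= e
    return len(coeffs)


def truncate_subband_filters(subband_filters, drop_db=30.0):
    """Truncate every subband filter to its own order (one order per subband)."""
    return [f[:filter_order_from_decay(f, drop_db)] for f in subband_filters]


# Toy filters: the low band decays slowly, the high band decays quickly.
low_band = [0.9 ** n for n in range(200)]
high_band = [0.5 ** n for n in range(200)]
truncated = truncate_subband_filters([low_band, high_band])
print([len(f) for f in truncated])
```

With these toy inputs the quickly decaying band ends up shorter than the slowly decaying one, which is the kind of per-subband length difference the text describes.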
- The length of the predetermined block and the length of the predetermined subframe are each a power of two.
- The length of the predetermined subframe may be determined based on the length of the predetermined block in the corresponding subband.
- The performing of the fast Fourier transform may include: dividing the subband signal into units of the predetermined subframe; generating a temporary subframe whose first half consists of the divided subframe and whose second half consists of zero-padded values; and fast Fourier transforming the generated temporary subframe.
- A method for generating a filter of an audio signal, comprising: receiving at least one prototype filter coefficient for filtering each subband signal of an input audio signal; converting the prototype filter coefficients into a plurality of subband filter coefficients; truncating each of the subband filter coefficients based on filter order information obtained at least in part from characteristic information extracted from the corresponding subband filter coefficients, the length of at least one of the truncated subband filter coefficients being different from the length of the truncated subband filter coefficients of another subband; and generating FFT filter coefficients by performing a Fast Fourier Transform (FFT) on the truncated subband filter coefficients in predetermined block units in the corresponding subband.
- A parameterization unit for generating a filter of an audio signal, wherein the parameterization unit: receives at least one prototype filter coefficient for filtering each subband signal of the input audio signal; converts the prototype filter coefficients into a plurality of subband filter coefficients; truncates each of the subband filter coefficients based on filter order information obtained at least in part from characteristic information extracted from the corresponding subband filter coefficients, the length of at least one of the truncated subband filter coefficients being different from the length of the truncated subband filter coefficients of another subband; and generates the FFT filter coefficients by performing a Fast Fourier Transform (FFT) on the truncated subband filter coefficients in predetermined block units in the corresponding subband.
- The length of the predetermined block is the smaller of twice the reference filter length of the truncated subband filter coefficients and a preset maximum FFT size, where the reference filter length represents either the true value or an approximation of the filter order in the form of a power of two.
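The block-length rule above can be written out in a few lines. This is a sketch under one assumption: the power-of-two approximation of the filter order is taken by rounding up, whereas the text only says the reference filter length is a true value or an approximation in power-of-two form. The function names are hypothetical.

```python
def reference_filter_length(filter_order):
    """Power-of-two approximation of the filter order.

    Assumption for this sketch: round up to the next power of two.
    """
    length = 1
    while length < filter_order:
        length *= 2
    return length


def block_length(filter_order, max_fft_size):
    """Smaller of twice the reference filter length and the maximum FFT size."""
    return min(2 * reference_filter_length(filter_order), max_fft_size)


# A short filter gets a small block; a long filter is capped by the FFT size.
print(block_length(48, max_fft_size=1024))
print(block_length(1000, max_fft_size=1024))
```

Under this rule, subbands with short truncated filters use small FFTs, while long filters are split across several blocks of the maximum size.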
- The generating of the FFT filter coefficients may include: dividing the truncated subband filter coefficients into units of half the predetermined block; generating temporary filter coefficients of the predetermined block size from the divided filter coefficients, wherein the first half of the temporary filter coefficients consists of the divided filter coefficients and the second half consists of zero-padded values; and fast Fourier transforming the generated temporary filter coefficients.
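The filter-side steps above, together with the subframe processing described earlier (split into half-block subframes, zero-pad the second half, FFT, multiply, inverse FFT, overlap-add), can be sketched as follows. This is an illustration, not the implementation from the patent: the small radix-2 `fft` helper and all function names are assumptions, and the partial results are accumulated in the time domain for clarity.

```python
import cmath


def fft(x, inverse=False):
    """Unnormalized recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Inverse FFT with 1/N normalization."""
    return [v / len(x) for v in fft(x, inverse=True)]


def make_fft_filter_coeffs(truncated, block_len):
    """Split the truncated filter into half-block pieces, zero-pad, and FFT."""
    half = block_len // 2
    blocks = []
    for start in range(0, len(truncated), half):
        piece = list(truncated[start:start + half])
        temp = piece + [0.0] * (block_len - len(piece))  # second half is zeros
        blocks.append(fft(temp))
    return blocks


def fast_convolve(signal, fft_coeffs, block_len):
    """Filter one subband signal by block-wise fast convolution (overlap-add)."""
    half = block_len // 2
    n_sub = -(-len(signal) // half)  # number of subframes, rounded up
    out = [0j] * ((n_sub + len(fft_coeffs)) * half)
    for s in range(n_sub):
        subframe = list(signal[s * half:(s + 1) * half])
        temp = subframe + [0.0] * (block_len - len(subframe))  # temporary subframe
        spectrum = fft(temp)
        for p, coeffs in enumerate(fft_coeffs):
            filtered = ifft([a * b for a, b in zip(spectrum, coeffs)])
            offset = (s + p) * half  # delay contributed by filter block p
            for i, v in enumerate(filtered):
                out[offset + i] += v
    return out


# Complex-valued toy subband signal and truncated subband filter.
signal = [1 + 1j, 2, -1j, 0.5, 3, -2, 1j]
truncated = [0.5, 0.25j, -0.125, 0.1, 0.05j]
blocks = make_fft_filter_coeffs(truncated, block_len=4)
result = fast_convolve(signal, blocks, block_len=4)
```

Because each temporary subframe and each temporary filter block is half zeros, the circular convolution performed by the FFT product equals the linear convolution of the two halves, so the overlap-added output matches direct convolution of the signal with the truncated filter.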
- The prototype filter coefficient may be a BRIR filter coefficient in the time domain.
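- According to another embodiment, a method includes: receiving an input audio signal that includes a plurality of subband signals, the plurality of subband signals comprising, with respect to a predetermined frequency band, signals of a low-frequency first subband group and signals of a high-frequency second subband group; and receiving truncated subband filter coefficients for filtering each subband signal of the first subband group, the truncated subband filter coefficients being at least a portion of subband filter coefficients obtained from prototype filter coefficients for filtering the input audio signal, wherein the length of the truncated subband filter coefficients is determined based on filter order information obtained at least in part from characteristic information extracted from the corresponding subband filter coefficients.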
- An audio signal processing apparatus for filtering an input audio signal, wherein the input audio signal includes a plurality of subband signals, the plurality of subband signals comprising, with respect to a preset frequency band, signals of a low-frequency first subband group and signals of a high-frequency second subband group, the apparatus comprising: a fast convolution unit that filters each subband signal of the first subband group; and a tap-delay line processing unit that filters each subband signal of the second subband group, wherein the fast convolution unit receives the input audio signal and receives truncated subband filter coefficients for filtering each subband signal of the first subband group, the truncated subband filter coefficients being at least a portion of subband filter coefficients obtained from prototype filter coefficients for filtering the input audio signal, and the length of the truncated subband filter coefficients being determined based on filter order information obtained at least in part from characteristic information extracted from the corresponding subband filter coefficients.
- The audio signal processing method may include: receiving at least one parameter corresponding to each subband signal of the second subband group, wherein the at least one parameter is extracted from the subband filter coefficients corresponding to each subband signal; and performing tap-delay line filtering on the subband signals of the second subband group by using the received parameters.
- The tap-delay line processing unit may receive at least one parameter corresponding to each subband signal of the second subband group, wherein the at least one parameter is extracted from the subband filter coefficients corresponding to each subband signal, and may perform tap-delay line filtering on the subband signals of the second subband group by using the received parameters.
- the tap-delay line filtering may be one-tap-delay line filtering using the parameter.
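One-tap-delay line filtering replaces a full convolution with a single delayed, scaled copy of the subband signal. The sketch below assumes that the parameters are one delay and one complex gain, taken from the largest-magnitude tap of the subband filter coefficients; the text only says the parameters are extracted from those coefficients, so this choice and the function names are illustrative.

```python
def extract_parameters(subband_coeffs):
    """Pick one delay and one complex gain from a subband filter.

    Assumption for this sketch: the tap with the largest magnitude is used.
    """
    delay = max(range(len(subband_coeffs)), key=lambda n: abs(subband_coeffs[n]))
    return delay, subband_coeffs[delay]


def one_tap_delay_line(subband_signal, delay, gain):
    """y[n] = gain * x[n - delay], with zeros before the first delayed sample."""
    return [
        gain * subband_signal[n - delay] if n >= delay else 0j
        for n in range(len(subband_signal))
    ]


# Toy high-frequency subband filter and signal.
coeffs = [0.1, 0.2j, -0.9 + 0.3j, 0.05]
delay, gain = extract_parameters(coeffs)
filtered = one_tap_delay_line([1, 2, 3, 4], delay, gain)
```

This costs one complex multiplication per sample, which is why it suits the high-frequency subband group where a detailed filter contributes less.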
- the amount of computation can be dramatically lowered while minimizing sound loss when performing binaural rendering on a multichannel or multiobject signal.
- the present invention provides a method for efficiently performing various types of filtering of a multimedia signal including an audio signal with a low calculation amount.
- FIG. 1 is a block diagram illustrating an audio signal decoder according to an embodiment of the present invention.
- FIG. 2 is a block diagram illustrating each component of a binaural renderer according to an embodiment of the present invention.
- FIGS. 3 to 7 illustrate various embodiments of an audio signal processing apparatus according to the present invention.
- FIGS. 8 to 10 illustrate a method for generating an FIR filter for binaural rendering according to an embodiment of the present invention.
- FIGS. 11 to 14 illustrate various embodiments of the P-part rendering unit of the present invention.
- FIGS. 17 and 18 illustrate an embodiment of an audio signal processing method using block-wise fast convolution.
- FIG. 19 illustrates an embodiment of an audio signal processing procedure in the fast convolution unit of the present invention.
- the audio signal decoder of the present invention includes a core decoder 10, a rendering unit 20, a mixer 30, and a post processing unit 40.
- the core decoder 10 decodes a loudspeaker channel signal, a discrete object signal, an object downmix signal, a pre-rendered signal, and the like.
- the core decoder 10 may use a Unified Speech and Audio Coding (USAC) based codec.
- the rendering unit 20 renders the signal decoded by the core decoder 10 using reproduction layout information.
- the rendering unit 20 may include a format converter 22, an object renderer 24, an OAM decoder 25, a SAOC decoder 26, and a HOA decoder 28.
- the rendering unit 20 performs rendering using any one of the above configurations according to the type of the decoded signal.
- The format converter 22 converts the transmitted channel signal into an output speaker channel signal. That is, the format converter 22 performs conversion between the transmitted channel configuration and the speaker channel configuration to be reproduced. If the number of output speaker channels (such as 5.1 channels) is less than the number of transmitted channels (such as 22.2 channels), or the transmitted channel configuration differs from the channel configuration to be reproduced, the format converter 22 performs a downmix on the transmitted channel signal.
- the audio signal decoder of the present invention may generate an optimal downmix matrix using a combination of an input channel signal and an output speaker channel signal, and perform a downmix using the matrix.
- the channel signal processed by the format converter 22 may include a pre-rendered object signal.
- at least one object signal may be pre-rendered and mixed with the channel signal before encoding the audio signal.
- the mixed object signal may be converted into an output speaker channel signal by the format converter 22 together with the channel signal.
- the object renderer 24 and the SAOC decoder 26 perform rendering for the object based audio signal.
- the object-based audio signal may include individual object waveforms and parametric object waveforms.
- each object signal is provided to the encoder as a monophonic waveform, and the encoder transmits the respective object signals using single channel elements (SCEs).
- In the case of a parametric object waveform, a plurality of object signals are downmixed into at least one channel signal, and the characteristics of each object and the relationships between them are represented by spatial audio object coding (SAOC) parameters.
- compressed object metadata corresponding thereto may be transmitted together.
- Object metadata quantizes object attributes in units of time and space to specify the position and gain of each object in three-dimensional space.
- the OAM decoder 25 of the rendering unit 20 receives the compressed object metadata, decodes it, and passes it to the object renderer 24 and / or the SAOC decoder 26.
- the object renderer 24 uses object metadata to render each object signal in accordance with a given playback format.
- each object signal may be rendered to specific output channels based on the object metadata.
- the SAOC decoder 26 recovers the object / channel signal from the decoded SAOC transport channels and parametric information.
- the SAOC decoder 26 may generate an output audio signal based on the reproduction layout information and the object metadata. As such, the object renderer 24 and the SAOC decoder 26 may render the object signal as a channel signal.
- the HOA decoder 28 receives a Higher Order Ambisonics (HOA) signal and HOA side information and decodes it.
- the HOA decoder 28 generates a sound scene by modeling a channel signal or an object signal with a separate equation. When the location of the speaker in the generated sound scene is selected, rendering may be performed with the speaker channel signal.
- the channel-based audio signal and the object-based audio signal processed by the rendering unit 20 are transferred to the mixer 30.
- the mixer 30 adjusts delays of the channel-based waveform and the rendered object waveform and sums them in units of samples.
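The mixer step, delay alignment followed by sample-wise summation, can be illustrated with a short sketch. The function name and the integer-sample delays are assumptions made for illustration; the text does not specify how the delays are determined.

```python
def mix(channel_waveform, object_waveform, channel_delay=0, object_delay=0):
    """Delay each waveform by a whole number of samples, then add sample by sample."""
    a = [0.0] * channel_delay + list(channel_waveform)
    b = [0.0] * object_delay + list(object_waveform)
    length = max(len(a), len(b))
    a += [0.0] * (length - len(a))  # pad the shorter waveform with silence
    b += [0.0] * (length - len(b))
    return [x + y for x, y in zip(a, b)]


# The rendered object waveform arrives one sample later than the channel waveform.
mixed = mix([1.0, 2.0, 3.0], [0.5, 0.5], object_delay=1)
```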
- the audio signal summed by the mixer 30 is passed to the post processing unit 40.
- the post processing unit 40 includes a speaker renderer 100 and a binaural renderer 200.
- The speaker renderer 100 performs post processing for outputting the multichannel and/or multiobject audio signal transmitted from the mixer 30.
- Such post processing may include dynamic range control (DRC), loudness normalization (LN), and a peak limiter (PL).
- the binaural renderer 200 generates a binaural downmix signal of the multichannel and / or multiobject audio signal.
- the binaural downmix signal is a two-channel audio signal such that each input channel / object signal is represented by a virtual sound source located in three dimensions.
- the binaural renderer 200 may receive an audio signal supplied to the speaker renderer 100 as an input signal.
- Binaural rendering is performed based on a Binaural Room Impulse Response (BRIR) filter and may be performed in the time domain or in the QMF domain.
- The binaural renderer 200 may include a BRIR parameterization unit 210, a fast convolution unit 230, a late reverberation generation unit 240, a QTDL processing unit 250, and a mixer & combiner 260.
- The binaural renderer 200 performs binaural rendering on various types of input signals to generate 3D audio headphone signals (i.e., 3D audio two-channel signals).
- The input signal may be an audio signal including at least one of a channel signal (i.e., a speaker channel signal), an object signal, and an HOA signal.
- When the binaural renderer 200 includes a separate decoder, the input signal may be an encoded bitstream of the aforementioned audio signal.
- Binaural rendering converts the decoded input signal into a binaural downmix signal, so that surround sound can be experienced through headphones.
- the binaural renderer 200 may perform binaural rendering of the input signal on the QMF domain.
- the binaural renderer 200 may receive a multi-channel (N channels) signal of a QMF domain and perform binaural rendering on the multi-channel signal using a BRIR subband filter of the QMF domain.
- Here, the BRIR subband filter is the time domain BRIR filter transformed into a subband filter in the QMF domain.
- binaural rendering may be performed by dividing a channel signal or an object signal of a QMF domain into a plurality of subband signals, convolving each subband signal with a corresponding BRIR subband filter, and then summing them.
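The per-subband operation described above (convolve each channel's subband signal with its BRIR subband filter, then sum over channels) can be sketched with direct convolution. The function names and toy values are assumptions for illustration; in the described system this step would be run once per subband and per ear, and the convolution itself would be replaced by the faster methods discussed elsewhere in the text.

```python
def convolve(x, h):
    """Direct linear convolution of two (possibly complex) sequences."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, a in enumerate(x):
        for j, b in enumerate(h):
            y[i + j] += a * b
    return y


def render_subband(channel_signals, brir_subband_filters):
    """Sum, over input channels, each signal convolved with its BRIR subband filter.

    channel_signals[i] is the subband signal of channel i, and
    brir_subband_filters[i] is the filter from channel i to one ear.
    """
    out = []
    for x, h in zip(channel_signals, brir_subband_filters):
        y = convolve(x, h)
        if len(y) > len(out):
            out += [0j] * (len(y) - len(out))  # grow the accumulator if needed
        for n, v in enumerate(y):
            out[n] += v
    return out


# Two input channels, one subband, left ear only.
signals = [[1, 2], [0, 1j]]
left_filters = [[1, 0.5], [2]]
left_ear = render_subband(signals, left_filters)
```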
- the BRIR parameterization unit 210 converts and edits BRIR filter coefficients and generates various parameters for binaural rendering in the QMF domain.
- the BRIR parameterization unit 210 receives time domain BRIR filter coefficients for a multichannel or multiobject and converts them into QMF domain BRIR filter coefficients.
- the QMF domain BRIR filter coefficients include a plurality of subband filter coefficients respectively corresponding to the plurality of frequency bands.
- the subband filter coefficients indicate each BRIR filter coefficient of the QMF transformed subband domain.
- Subband filter coefficients may also be referred to herein as BRIR subband filter coefficients.
- The BRIR parameterization unit 210 may edit each of the plurality of BRIR subband filter coefficients of the QMF domain and transmit the edited subband filter coefficients to the fast convolution unit 230.
- the BRIR parameterization unit 210 may be included as one component of the binaural renderer 200, or may be provided as a separate device.
- The configuration comprising the fast convolution unit 230, the late reverberation generation unit 240, the QTDL processing unit 250, and the mixer & combiner 260, excluding the BRIR parameterization unit 210, may be classified as the binaural rendering unit 220.
- the BRIR parameterization unit 210 may receive, as an input, a BRIR filter coefficient corresponding to at least one position of the virtual reproduction space.
- Each position of the virtual reproduction space may correspond to each speaker position of the multichannel system.
- each BRIR filter coefficient received by the BRIR parameterization unit 210 may be directly matched to each channel or each object of the input signal of the binaural renderer 200.
- each of the received BRIR filter coefficients may have a configuration independent of the input signal of the binaural renderer 200.
- The BRIR filter coefficients received by the BRIR parameterization unit 210 may not directly match the input signal of the binaural renderer 200, and the number of received BRIR filter coefficients may be smaller or larger than the total number of channels and/or objects of the input signal.
- The BRIR parameterization unit 210 converts and edits the BRIR filter coefficients corresponding to each channel or each object of the input signal of the binaural renderer 200 and transmits them to the binaural rendering unit 220.
- the corresponding BRIR filter coefficients may be matching BRIR or fallback BRIR for each channel or each object.
- BRIR matching may be determined according to whether or not there is a BRIR filter coefficient targeting the position of each channel or each object in the virtual reproduction space. In this case, location information of each channel (or object) may be obtained from an input parameter signaling a channel configuration.
- The corresponding BRIR filter coefficient may be a matching BRIR of the input signal. However, if there are no BRIR filter coefficients targeting the position of a particular channel or object, the BRIR parameterization unit 210 may provide, as a fallback BRIR for that channel or object, the BRIR filter coefficients targeting the position most similar to it.
- A BRIR filter coefficient close to the desired position may be selected. For example, a BRIR filter coefficient having the same altitude as the desired position and an azimuth deviation within ±20° may be selected. If there is no such BRIR filter coefficient, the BRIR filter coefficient having the minimum geometric distance from the desired position may be selected from the set of BRIR filter coefficients. That is, a BRIR filter coefficient may be selected that minimizes the geometric distance between the position of the BRIR and the desired position.
- the position of the BRIR represents the position of the speaker corresponding to the corresponding BRIR filter coefficients.
- the geometric distance between the two positions may be defined as the sum of the absolute value of the altitude deviation of the two positions and the absolute value of the azimuth deviation.
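The fallback selection described above can be sketched as a two-stage search. Positions are represented here as `(altitude, azimuth)` pairs in degrees, and azimuth differences are wrapped around 360°; both the representation and the wrap-around are assumptions of this sketch, as are the function names. The ±20° tolerance is the example value from the text.

```python
def azimuth_deviation(a, b):
    """Absolute azimuth difference in degrees, wrapped to [0, 180] (assumption)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def geometric_distance(p, q):
    """|altitude deviation| + |azimuth deviation| for (altitude, azimuth) pairs."""
    return abs(p[0] - q[0]) + azimuth_deviation(p[1], q[1])


def select_brir(desired, brir_positions, max_azimuth_dev=20.0):
    """Return the BRIR position to use for the desired channel/object position."""
    # Stage 1: same altitude and azimuth deviation within the tolerance.
    near = [
        p for p in brir_positions
        if p[0] == desired[0]
        and azimuth_deviation(p[1], desired[1]) <= max_azimuth_dev
    ]
    if near:
        return min(near, key=lambda p: azimuth_deviation(p[1], desired[1]))
    # Stage 2: minimum geometric distance over the whole BRIR set.
    return min(brir_positions, key=lambda p: geometric_distance(p, desired))


# Toy BRIR set given as (altitude, azimuth) in degrees.
positions = [(0, 30), (0, -30), (0, 110), (35, 45)]
print(select_brir((0, 45), positions))
print(select_brir((35, 90), positions))
```

An exact match is covered by the first stage, since a position with zero azimuth deviation at the same altitude is always the closest candidate.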
- the BRIR parameterization unit 210 may convert and edit all of the received BRIR filter coefficients and transmit the converted BRIR filter coefficients to the binaural rendering unit 220.
- the screening operation of the BRIR filter coefficients (or the edited BRIR filter coefficients) corresponding to each channel or each object of the input signal may be performed by the binaural rendering unit 220.
- The binaural rendering unit 220 includes a fast convolution unit 230, a late reverberation generator 240, and a QTDL processing unit 250, and receives a multi-audio signal including a multichannel and/or multiobject signal.
- In this specification, an input signal including a multichannel and/or multiobject signal is referred to as a multi-audio signal.
- The binaural rendering unit 220 receives the multi-channel signal of the QMF domain according to an embodiment.
- However, the input signal of the binaural rendering unit 220 may also be a time domain multi-channel signal, a multi-object signal, and the like.
- The input signal may also be an encoded bitstream of the multi-audio signal.
- the present invention will be described based on the case of performing BRIR rendering on the multi-audio signal, but the present invention is not limited thereto. That is, the features provided by the present invention may be applied to other types of rendering filters other than BRIR, and may be applied to an audio signal of a single channel or a single object rather than a multi-audio signal.
- the fast convolution unit 230 performs fast convolution between the input signal and the BRIR filter to process direct sound and early reflection on the input signal.
- The fast convolution unit 230 may perform fast convolution using a truncated BRIR.
- the truncated BRIR includes a plurality of subband filter coefficients truncated depending on each subband frequency, and is generated by the BRIR parameterization unit 210. In this case, the length of each truncated subband filter coefficient is determined depending on the frequency of the corresponding subband.
- the fast convolution unit 230 may perform variable order filtering in the frequency domain by using truncated subband filter coefficients having different lengths according to subbands.
- fast convolution may be performed between the QMF domain subband audio signal and the truncated subband filters of the corresponding QMF domain for each frequency band.
- The direct sound and early reflection (D&E) part may be referred to as a front (F)-part.
- the late reverberation generator 240 generates a late reverberation signal with respect to the input signal.
- The late reverberation signal represents the output signal that follows the direct sound and the early reflections generated by the fast convolution unit 230.
- the late reverberation generator 240 may process the input signal based on the reverberation time information determined from each subband filter coefficient transmitted from the BRIR parameterization unit 210.
- the late reverberation generator 240 may generate a mono or stereo downmix signal for the input audio signal and perform late reverberation processing on the generated downmix signal.
- The late reverberation (LR) part herein may be referred to as a parametric (P)-part.
- The QMF domain tapped delay line (QTDL) processing unit 250 processes the signals of the high frequency band among the input audio signals.
- The QTDL processing unit 250 receives at least one parameter corresponding to each subband signal of the high frequency band from the BRIR parameterization unit 210 and performs tapped delay line filtering in the QMF domain using the received parameter.
- The binaural renderer 200 separates the input audio signal into a low frequency band signal and a high frequency band signal based on a predetermined constant or a predetermined frequency band. The low frequency band signal may be processed by the fast convolution unit 230 and the late reverberation generator 240, and the high frequency band signal may be processed by the QTDL processing unit 250.
- The fast convolution unit 230, the late reverberation generator 240, and the QTDL processing unit 250 each output a two-channel QMF domain subband signal.
- the mixer & combiner 260 performs mixing by combining the output signal of the fast convolution unit 230, the output signal of the late reverberation generator 240, and the output signal of the QTDL processing unit 250. At this time, the combination of the output signal is performed separately for the left and right output signals of the two channels.
- the binaural renderer 200 QMF synthesizes the combined output signal to produce a final output audio signal in the time domain.
- the audio signal processing apparatus may refer to the binaural renderer 200 or the binaural rendering unit 220 illustrated in FIG. 2.
- the audio signal processing apparatus may broadly refer to the audio signal decoder of FIG. 1 including a binaural renderer.
- Each binaural renderer illustrated in FIGS. 3 to 7 may represent only a partial configuration of the binaural renderer 200 illustrated in FIG. 2 for convenience of description.
- The embodiments are described mainly for a multichannel input signal, but unless otherwise stated, the terms channel, multichannel, and multichannel input signal may be used as concepts that include an object, a multiobject, and a multiobject input signal, respectively.
- the multichannel input signal may be used as a concept including a HOA decoded and rendered signal.
- FIG. 3 illustrates a binaural renderer 200A according to an embodiment of the present invention.
- In general terms, binaural rendering using the BRIR is M-to-O processing that obtains O output signals from a multi-channel input signal with M channels.
- Binaural filtering can be regarded as filtering using filter coefficients corresponding to each input channel and output channel in this process.
- the original filter set H denotes transfer functions from the speaker position of each channel signal to the left and right ear positions.
- Such a transfer function measured in a general listening room, that is, a room with reverberation, is called a Binaural Room Impulse Response (BRIR).
- the BRIR contains not only the direction information but also the information of the reproduction space.
- the HRTF and an artificial reverberator may be used to replace the BRIR.
- the binaural rendering using the BRIR is described, but the present invention is not limited thereto and may be applied to the binaural rendering using various types of FIR filters including HRIR and HRTF.
- the present invention is applicable not only to binaural rendering of an audio signal but also to various types of filtering operations of an input signal.
- The BRIR may have a length of 96K samples, and because multi-channel binaural rendering uses M * O different filters, a large amount of computation is required.
- the BRIR parameterization unit 210 may generate modified filter coefficients from the original filter set H to optimize the calculation amount.
- the BRIR parameterization unit 210 separates the original filter coefficients into F (front) -part coefficients and P (parametric) -part coefficients.
- The F-part represents the direct sound and early reflection (D&E) part, and the P-part represents the late reverberation (LR) part.
- For example, an original filter coefficient having a length of 96K samples may be separated into an F-part truncated to the first 4K samples and a P-part corresponding to the remaining 92K samples.
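As a minimal sketch of this separation, the split is a slice at the truncation point. The function name is hypothetical, and reading "K" as 1024 samples is an assumption made only for the usage example.

```python
from typing import List, Sequence, Tuple


def split_front_rear(coeffs: Sequence[float],
                     front_len: int) -> Tuple[List[float], List[float]]:
    """Split filter coefficients into an F-part (front) and a P-part (rear)."""
    if not 0 <= front_len <= len(coeffs):
        raise ValueError("front_len must lie within the filter length")
    front = list(coeffs[:front_len])   # direct sound and early reflections
    rear = list(coeffs[front_len:])    # late reverberation
    return front, rear


# Usage: a 96K-sample filter split at 4K samples (K assumed to be 1024).
K = 1024
original = [0.0] * (96 * K)
f_part, p_part = split_front_rear(original, 4 * K)
```

The two parts together always contain exactly the original coefficients, so the split itself loses no information; any saving comes from how the P-part is subsequently processed.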
- the binaural rendering unit 220 receives the F-part coefficients and the P-part coefficients from the BRIR parameterization unit 210 and renders the multi-channel input signal using the F-part coefficients.
- The fast convolution unit 230 illustrated in FIG. 2 renders the multi-audio signal using the F-part coefficients received from the BRIR parameterization unit 210, and the late reverberation generator 240 may render the multi-audio signal using the P-part coefficients received from the BRIR parameterization unit 210.
- F-part rendering (binaural rendering using F-part coefficients) is implemented with a conventional Finite Impulse Response (FIR) filter, and P-part rendering (binaural rendering using P-part coefficients) can be implemented in a parametric way.
- A complexity-quality control input provided by the user or a control system may be used to determine the information generated for the F-part and/or the P-part.
- FIG. 4 illustrates a more detailed method of implementing F-part rendering as a binaural renderer 200B according to another embodiment of the present invention.
- the P-part rendering unit is omitted in FIG. 4.
- Although FIG. 4 shows a filter implemented in the QMF domain, the present invention is not limited thereto and may be applied to subband processing in other domains.
- F-part rendering may be performed by the fast convolution unit 230 on the QMF domain.
- The QMF analyzer 222 converts the time domain input signals x0, x1, ..., x_M-1 into the QMF domain signals X0, X1, ..., X_M-1.
- The input signals x0, x1, ..., x_M-1 may be a multi-channel audio signal, for example, channel signals corresponding to a 22.2 channel speaker layout.
- the QMF domain may use 64 subbands in total, but the present invention is not limited thereto.
- the QMF analyzer 222 may be omitted from the binaural renderer 200B.
- In that case, the binaural renderer 200B may receive the QMF domain signals X0, X1, ..., X_M-1 directly as input without QMF analysis. When the QMF domain signals are received directly as input in this way, the QMF used in the binaural renderer according to the present invention is the same as the QMF used in the preceding processing unit (for example, SBR).
- the QMF synthesizing unit 244 performs QMF synthesizing of the left and right signals Y_L and Y_R of the two channels on which the binaural rendering is performed to generate the two-channel output audio signals yL and yR of the time domain.
- FIGS. 5 to 7 illustrate embodiments of binaural renderers 200C, 200D, and 200E, respectively, that perform both F-part rendering and P-part rendering.
- In these embodiments, the F-part rendering is performed by the fast convolution unit 230 in the QMF domain, and the P-part rendering is performed by the late reverberation generator 240 in the QMF domain or the time domain.
- In the embodiments of FIGS. 5 to 7, detailed descriptions of parts overlapping with the embodiments of the previous drawings are omitted.
- The binaural renderer 200C may perform both F-part rendering and P-part rendering in the QMF domain. That is, the QMF analyzer 222 of the binaural renderer 200C converts the time domain input signals x0, x1, ..., x_M-1 into the QMF domain signals X0, X1, ..., X_M-1 and delivers them to the fast convolution unit 230 and the late reverberation generator 240, respectively.
- The fast convolution unit 230 and the late reverberation generator 240 render the QMF domain signals X0, X1, ..., X_M-1 to generate the two-channel output signals Y_L, Y_R and Y_Lp, Y_Rp, respectively.
- The fast convolution unit 230 and the late reverberation generator 240 may perform rendering using the F-part filter coefficients and the P-part filter coefficients, respectively, received from the BRIR parameterization unit 210.
- The output signals Y_L, Y_R of the F-part rendering and the output signals Y_Lp, Y_Rp of the P-part rendering are combined separately for the left and right channels in the mixer & combiner 260 and transmitted to the QMF synthesis unit 224.
- the QMF synthesizing unit 224 QMF synthesizes the input two left and right signals to generate two channel output audio signals yL and yR in the time domain.
- the binaural renderer 200D may perform F-part rendering in the QMF domain and P-part rendering in the time domain, respectively.
- The QMF analyzer 222 of the binaural renderer 200D QMF-converts the time domain input signal and delivers it to the fast convolution unit 230.
- the fast convolution unit 230 generates the output signals Y_L and Y_R of two channels by F-part rendering the QMF domain signal.
- the QMF synthesizing unit 224 converts the output signal of the F-part rendering into a time domain output signal and delivers it to the mixer & combiner 260.
- the late reverberation generator 240 directly receives the time domain input signal and performs P-part rendering.
- the output signals yLp and yRp of the P-part rendering are sent to the mixer & combiner 260.
- the mixer & combiner 260 combines the F-part rendering output signal and the P-part rendering output signal in the time domain, respectively, to generate the two-channel output audio signals yL and yR in the time domain.
- the F-part rendering and the P-part rendering are performed in parallel, respectively.
- The binaural renderer 200E may perform the F-part rendering and the P-part rendering sequentially. That is, the fast convolution unit 230 performs F-part rendering on the QMF-converted input signal, and the F-part rendered two-channel signals Y_L and Y_R are converted into time domain signals by the QMF synthesis unit 224 and then delivered to the late reverberation generator 240.
- the late reverberation generator 240 performs P-part rendering on the input two-channel signal to generate two-channel output audio signals yL and yR in the time domain.
- FIGS. 5 to 7 each illustrate an embodiment of performing F-part rendering and P-part rendering, and binaural rendering may be performed by combining or modifying the embodiments of these drawings.
- The binaural renderer may perform P-part rendering separately for each of the input multi-audio signals, but it may also downmix the input signal into a two-channel left and right signal or a mono signal and then perform P-part rendering on the downmixed signal.
- FIGS. 8 to 10 illustrate methods for generating an FIR filter for binaural rendering according to embodiments of the present invention.
- an FIR filter converted to a plurality of subband filters of the QMF domain may be used for binaural rendering in the QMF domain.
- Subband filters truncated depending on the subband frequencies may be used for F-part rendering. That is, the fast convolution unit of the binaural renderer may perform variable order filtering in the QMF domain by using truncated subband filters having different lengths according to subbands. The filter generation embodiments of FIGS. 8 to 10 described below may be performed by the BRIR parameterization unit 210 of FIG. 2.
- FIG. 8 shows an embodiment of the length according to each QMF band of the QMF domain filter used for binaural rendering.
- The FIR filter is converted into I QMF subband filters, and Fi represents the truncated subband filter of QMF subband i.
- the QMF domain may use 64 subbands in total, but the present invention is not limited thereto.
- N represents the length (number of taps) of the original subband filter, and the lengths of the truncated subband filters are represented by N1, N2, and N3, respectively. Here, the lengths N, N1, N2, and N3 represent the number of taps in the downsampled QMF domain.
- truncated subband filters having different lengths N1, N2, N3 according to each subband may be used for F-part rendering.
- the truncated subband filter is a front filter cut from the original subband filter, and may also be referred to as a front subband filter.
- The rear part remaining after truncation of the original subband filter may be referred to as a rear subband filter and may be used for P-part rendering.
- The filter order for each subband may be determined based on parameters extracted from the original BRIR filter, for example, reverberation time (RT) information, an Energy Decay Curve (EDC) value, and energy decay time information for each subband filter.
- The reverberation time may vary with frequency because attenuation in air and the sound absorption of wall and ceiling materials differ for each frequency. In general, a lower frequency signal has a longer reverberation time. A long reverberation time means that a lot of information remains in the rear part of the FIR filter.
- Accordingly, the length of each truncated subband filter of the present invention is determined based at least in part on the characteristic information (e.g., reverberation time information) extracted from the corresponding subband filter.
- each subband may be classified into a plurality of groups, and the length of each truncated subband filter may be determined according to the classified group.
- For example, each subband may be classified into three zones (Zone 1, Zone 2, and Zone 3), and the truncated subband filters of Zone 1, corresponding to the low frequencies, may have a longer filter order (i.e., filter length) than the truncated subband filters of Zone 2 and Zone 3, corresponding to the high frequencies. Also, the filter order of the truncated subband filters in a zone may gradually decrease toward the higher frequency zones.
- the length of each truncated subband filter may be determined independently and variably for each subband according to the characteristic information of the original subband filter.
- the length of each truncated subband filter is determined based on the truncation length determined in that subband and is not affected by the length of the truncated subband filter of neighboring or other subbands.
- the length of some or all truncated subband filters of Zone 2 may be longer than the length of at least one truncated subband filter of Zone 1.
- frequency domain variable order filtering may be performed only on a part of subbands classified into a plurality of groups. That is, truncated subband filters having different lengths may be generated only for subbands belonging to some of the classified at least two groups.
- a truncated subband filter may be generated only for a total of 32 subbands having indices of 0 to 31 in the order of low frequency bands, that is, subbands corresponding to 0-12 kHz bands, which are half of the entire 0-24 kHz band.
- the length of the truncated subband filter of the subband having the index 0 is longer than the length of the truncated subband filter of the subband having the index 31 according to the embodiment of the present invention.
- the length of the truncated filter may be determined based on additional information obtained by the audio signal processing apparatus, such as complexity of the decoder, complexity level (profile), or required quality information.
- the complexity may be determined according to hardware resources of the audio signal processing apparatus or based on a value directly input by the user.
- the quality may be determined according to a user's request, or may be determined by referring to a value transmitted through the bitstream or other information included in the bitstream.
- the quality may be determined according to an estimated value of the quality of the transmitted audio signal. For example, the higher the bit rate, the higher the quality.
- the length of each truncated subband filter may increase proportionally according to complexity and quality, or may vary at different rates for each band.
- The length of each truncated subband filter may be determined in a corresponding size unit, for example a multiple of a power of 2, so as to obtain an additional gain from fast processing such as the FFT described later.
- If the determined length of a truncated subband filter exceeds the total length of the actual subband filter, the length of the truncated subband filter may be adjusted to the length of the actual subband filter.
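The length adjustment described here can be illustrated with a short sketch that rounds a desired truncation length up to a power of 2 and then limits it to the length of the actual subband filter. The function names are hypothetical, and rounding up (rather than down or to the nearest power) is an assumption.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of 2 that is greater than or equal to n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


def truncation_length(desired_taps: int, original_taps: int) -> int:
    """Round the desired length up to a power of 2, then clamp to the filter length."""
    length = next_power_of_two(desired_taps)
    # The truncated filter can never be longer than the actual subband filter.
    return min(length, original_taps)
```

A desired length of 5 taps in a 64-tap subband filter becomes 8 taps, while a desired length of 40 taps in a 48-tap filter would round up to 64 and is therefore limited to 48.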
- the BRIR parameterization unit generates truncated subband filter coefficients (F-part coefficients) corresponding to each truncated subband filter determined according to the above-described embodiment, and transfers them to the fast convolution unit.
- the fast convolution unit performs frequency domain variable order filtering on each subband signal of the multi-audio signal using the truncated subband filter coefficients.
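Variable order filtering means that each subband signal is filtered with a filter of a different length. The sketch below shows this with plain direct-form convolution per subband; it deliberately swaps the FFT-based fast convolution of the text for a simpler technique so that the per-subband lengths are easy to see. All names are hypothetical.

```python
from typing import List, Sequence


def convolve(signal: Sequence[complex], taps: Sequence[complex]) -> List[complex]:
    """Direct-form linear convolution of one subband signal with one subband filter."""
    if not signal or not taps:
        return []
    out = [0j] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(taps):
            out[n + k] += x * h
    return out


def variable_order_filter(subband_signals: Sequence[Sequence[complex]],
                          subband_filters: Sequence[Sequence[complex]],
                          truncation_lengths: Sequence[int]) -> List[List[complex]]:
    """Filter subband i with only the first truncation_lengths[i] taps of its filter."""
    if not (len(subband_signals) == len(subband_filters) == len(truncation_lengths)):
        raise ValueError("one signal, filter and length is required per subband")
    return [convolve(x, h[:n])
            for x, h, n in zip(subband_signals, subband_filters, truncation_lengths)]
```

Because the cost of each inner loop is proportional to the truncated length, shortening the filters of the higher subbands reduces the total work roughly in proportion to the taps removed.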
- FIG. 9 shows another embodiment of the length of each QMF band of the QMF domain filter used for binaural rendering.
- Duplicate descriptions of parts that are the same as or correspond to the embodiment of FIG. 8 are omitted.
- Fi represents a truncated subband filter (front subband filter) used for F-part rendering of QMF subband i, and Pi represents a rear subband filter used for P-part rendering of QMF subband i.
- N denotes the length (number of taps) of the original subband filter
- NiF and NiP denote lengths of the front subband filter and the rear subband filter of subband i, respectively.
- NiF and NiP represent the number of taps in the down sampled QMF domain.
- the length of the rear subband filter as well as the front subband filter may be determined based on parameters extracted from the original subband filter. That is, the lengths of the front subband filter and the rear subband filter of each subband are determined based at least in part on the characteristic information extracted from the corresponding subband filter. For example, the length of the front subband filter may be determined based on the first reverberation time information of the corresponding subband filter, and the length of the rear subband filter may be determined based on the second reverberation time information.
- The front subband filter is the front part of the original subband filter, truncated based on the first reverberation time information, and the rear subband filter may be the rear part that follows the front subband filter, corresponding to the interval between the first reverberation time and the second reverberation time.
- the first reverberation time information may be RT20 and the second reverberation time information may be RT60, but the present invention is not limited thereto.
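One way to derive such truncation points from a subband filter is to compute its energy decay curve and look for the taps at which the remaining energy has dropped by 20 dB and by 60 dB. The sketch below assumes Schroeder backward integration for the decay curve and reads RT20 and RT60 as those two drop points; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def energy_decay_curve_db(h: Sequence[complex]) -> List[float]:
    """Energy remaining from tap n onward, in dB relative to the total energy."""
    tails = [0.0] * len(h)
    tail = 0.0
    for n in range(len(h) - 1, -1, -1):   # backward integration
        tail += abs(h[n]) ** 2
        tails[n] = tail
    if not tails or tails[0] == 0.0:
        raise ValueError("filter has no energy")
    total = tails[0]
    return [10.0 * math.log10(t / total) if t > 0.0 else float("-inf")
            for t in tails]


def decay_point(edc_db: Sequence[float], drop_db: float) -> int:
    """First tap at which the decay curve has fallen by at least drop_db."""
    for n, level in enumerate(edc_db):
        if level <= -drop_db:
            return n
    return len(edc_db)   # the filter never decays that far


def front_rear_lengths(h: Sequence[complex]) -> tuple:
    """Front length from the 20 dB point, rear length up to the 60 dB point."""
    edc = energy_decay_curve_db(h)
    first, second = decay_point(edc, 20.0), decay_point(edc, 60.0)
    return first, second - first
```

With these definitions the front subband filter covers taps `0` to `first - 1`, and the rear subband filter covers the interval between the two points.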
- Within the second reverberation time there is a point at which the early reflection part switches to the late reverberation part.
- That is, the point of transition from a section having a deterministic characteristic to a section having a stochastic characteristic is called the mixing time in terms of the BRIR of the entire band.
- Before the mixing time, information that provides directionality for each position is mainly present, and this information is unique for each channel.
- In contrast, since the late reverberation part has a characteristic common to all channels, it may be efficient to process a plurality of channels at once. Therefore, it is possible to estimate the mixing time for each subband, perform fast convolution through F-part rendering before the mixing time, and perform processing that reflects the common characteristics of the channels through P-part rendering after the mixing time.
- The length of the F-part, that is, the length of the front subband filter, may be longer or shorter than the length corresponding to the mixing time according to the complexity-quality control.
- Modeling that reduces the filter of a subband to a lower order is also possible.
- A representative method is FIR filter modeling using frequency sampling, with which a filter that minimizes the error in the least squares sense can be designed.
- the lengths of the front subband filter and / or the rear subband filter for each subband may have the same value for each channel of the corresponding subband.
- the length of the filter may be determined based on the inter-channel or sub-band interrelationships to reduce this effect.
- The BRIR parameterization unit extracts first characteristic information (e.g., first reverberation time information) from the subband filters corresponding to the respective channels of the same subband, and combines the extracted first characteristic information to obtain one piece of filter order information (or first truncation point information) for the corresponding subband.
- the front subband filter for each channel of the corresponding subband may be determined to have the same length based on the obtained filter order information (or the first truncation point information).
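A minimal sketch of this step follows. The text does not state how the per-channel values are combined, so the arithmetic mean used here is purely an assumption (a maximum or a median would fit the description equally well), and the function names are hypothetical.

```python
from typing import List, Sequence


def common_truncation_point(per_channel_points: Sequence[int],
                            original_taps: int) -> int:
    """Combine per-channel truncation points into one value for the subband."""
    if not per_channel_points:
        raise ValueError("at least one channel is required")
    mean = sum(per_channel_points) / len(per_channel_points)  # assumed combiner
    # Keep the result within the valid range of the original subband filter.
    return max(1, min(original_taps, round(mean)))


def truncate_channels(filters: Sequence[Sequence[complex]],
                      point: int) -> List[List[complex]]:
    """Apply the same front length to the subband filter of every channel."""
    return [list(f[:point]) for f in filters]
```

Using one length per subband means the front subband filters of all channels can share the same transform size in that subband.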
- The BRIR parameterization unit extracts second characteristic information (e.g., second reverberation time information) from the subband filters corresponding to the respective channels of the same subband, and combines the extracted second characteristic information to obtain second truncation point information to be commonly applied to the rear subband filters corresponding to the respective channels of the corresponding subband.
- Here, the front subband filter is the front part of the original subband filter, truncated based on the first truncation point information, and the rear subband filter may be the rear part that follows the front subband filter, corresponding to the interval between the first truncation point and the second truncation point.
- only F-part processing may be performed on subbands of a specific subband group.
- In this case, when processing is performed using only the filter up to the first truncation point of the corresponding subband, distortion at a level perceptible to the user may occur because of the energy difference compared with processing that uses the entire subband filter.
- To prevent this distortion, energy compensation may be performed for the region not used for processing in the corresponding subband filter, that is, the region after the first truncation point.
- The energy compensation can be performed by dividing the F-part coefficients (front subband filter coefficients) by the filter power up to the first truncation point of the corresponding subband filter and multiplying by the energy of the desired region, i.e., the total power of the corresponding subband filter.
- the energy of the F-part coefficients can be adjusted to be equal to the energy of the entire subband filter.
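The compensation can be sketched as a single gain applied to the front coefficients. For the energy of the compensated F-part to equal the energy of the entire subband filter, the sketch applies the square root of the power ratio to the coefficient amplitudes; this amplitude-domain reading is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def energy(coeffs: Sequence[complex]) -> float:
    """Sum of squared magnitudes of the coefficients."""
    return sum(abs(c) ** 2 for c in coeffs)


def compensate_front(subband_filter: Sequence[complex],
                     first_truncation_point: int) -> List[complex]:
    """Scale the front coefficients so their energy matches the whole filter."""
    front = list(subband_filter[:first_truncation_point])
    front_energy = energy(front)
    if front_energy == 0.0:
        return front   # nothing to scale
    # Gain on amplitudes is the square root of the power ratio (assumption).
    gain = math.sqrt(energy(subband_filter) / front_energy)
    return [c * gain for c in front]
```

For the filter `[1.0, 0.5, 0.5, 0.25, 0.25]` truncated after two taps, the front energy is 1.25 and the total energy is 1.625, so the two retained taps are scaled by `sqrt(1.625 / 1.25)`.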
- the binaural rendering unit may not perform the P-part processing based on the complexity-quality control. In this case, the binaural rendering unit may perform the energy compensation for the F-part coefficients using the P-part coefficients.
- The filter coefficients of the truncated subband filters having different lengths for each subband are obtained from one time-domain filter (i.e., a prototype filter). That is, since one time-domain filter is converted into a plurality of QMF subband filters and the lengths of the filters corresponding to each subband are varied, each truncated subband filter is obtained from one prototype filter.
- the BRIR parameterization unit generates front subband filter coefficients (F-part coefficients) corresponding to each front subband filter determined according to the above-described embodiment, and transfers them to the fast convolution unit.
- the fast convolution unit performs frequency domain variable order filtering on each subband signal of the multi-audio signal using the received front subband filter coefficients.
- the BRIR parameterization unit may generate rear subband filter coefficients (P-part coefficients) corresponding to each rear subband filter determined according to the above-described embodiments, and may transfer them to the late reverberation generation unit.
- the late reverberation generator may perform reverberation processing for each subband signal using the received rear subband filter coefficients.
- the BRIR parameterization unit may generate a downmix subband filter coefficient (downmix P-part coefficient) by combining rear subband filter coefficients for each channel, and transmit the downmix subband filter coefficients to the late reverberation generator.
- the late reverberation generator may generate two channels of left and right subband reverberation signals using the received downmix subband filter coefficients.
- FIG. 10 illustrates another embodiment of a method for generating an FIR filter used for binaural rendering.
- Duplicate descriptions of parts that are the same as or correspond to the embodiments of FIGS. 8 and 9 are omitted.
- a plurality of QMF transformed subband filters may be classified into a plurality of groups, and different processing may be applied to each classified group.
- The plurality of subbands may be classified into a first subband group Zone 1 of low frequencies and a second subband group Zone 2 of high frequencies based on a preset frequency band (QMF band i).
- In this case, F-part rendering may be performed on the input subband signals of the first subband group, and the QTDL processing described below may be performed on the input subband signals of the second subband group.
- the BRIR parameterization unit generates front subband filter coefficients for each subband of the first subband group, and transfers the front subband filter coefficients to the fast convolution unit.
- the fast convolution unit performs F-part rendering on the subband signals of the first subband group by using the received front subband filter coefficients.
- P-part rendering of subband signals of the first subband group may be additionally performed by the late reverberation generator.
- the BRIR parameterization unit obtains at least one parameter from each subband filter coefficient of the second subband group and transfers it to the QTDL processing unit.
- The QTDL processing unit performs tapped delay line filtering on each subband signal of the second subband group using the obtained parameter, as described below.
- The predetermined frequency (QMF band i) that separates the first subband group from the second subband group may be determined based on a predetermined constant value, or may be determined according to the bitstream characteristics of the transmitted audio input signal. For example, in the case of an audio signal using SBR, the second subband group may be set to correspond to the SBR band.
- The plurality of subbands may be classified into three subband groups based on a first frequency band (QMF band i) and a second frequency band (QMF band j). That is, the plurality of subbands may be classified into a first subband group Zone 1, which is a low frequency zone lower than or equal to the first frequency band; a second subband group Zone 2, which is an intermediate frequency zone higher than the first frequency band and lower than or equal to the second frequency band; and a third subband group Zone 3, which is a high frequency zone higher than the second frequency band.
- For example, the first subband group may include a total of 32 subbands having indices of 0 to 31, the second subband group may include a total of 16 subbands having indices of 32 to 47, and the third subband group may include the remaining subbands having indices of 48 to 63.
- the subband index has a lower value as the subband frequency is lower.
- Binaural rendering may be performed only on the subband signals of the first subband group and the second subband group. That is, F-part rendering and P-part rendering may be performed on the subband signals of the first subband group, and QTDL processing may be performed on the subband signals of the second subband group. Binaural rendering may not be performed on the subband signals of the third subband group.
- The first frequency band (QMF band i) is set to the subband of index Kconv-1, and the second frequency band (QMF band j) is set to the subband of index Kproc-1.
- the values of the information Kproc of the maximum frequency band and the information Kconv of the frequency band performing the convolution may vary depending on the sampling frequency of the original BRIR input, the sampling frequency of the input audio signal, and the like.
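The three-way classification can be sketched directly from Kconv and Kproc. The function name and the dictionary keys are hypothetical; the example values Kconv = 32 and Kproc = 48 reproduce the 0 to 31, 32 to 47, and 48 to 63 grouping given as an example in the text.

```python
from typing import Dict, List


def classify_subbands(num_subbands: int, k_conv: int, k_proc: int) -> Dict[str, List[int]]:
    """Group subband indices into the three zones bounded by Kconv and Kproc."""
    if not 0 <= k_conv <= k_proc <= num_subbands:
        raise ValueError("expected 0 <= Kconv <= Kproc <= number of subbands")
    return {
        # Zone 1: indices up to Kconv-1, F-part and P-part rendering.
        "zone1_convolution": list(range(0, k_conv)),
        # Zone 2: indices Kconv to Kproc-1, QTDL processing.
        "zone2_qtdl": list(range(k_conv, k_proc)),
        # Zone 3: indices from Kproc upward, not binaurally rendered.
        "zone3_unprocessed": list(range(k_proc, num_subbands)),
    }


zones = classify_subbands(64, k_conv=32, k_proc=48)
```

Every subband index falls into exactly one zone, so a renderer can dispatch each subband signal to a single processing path.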
- Next, various embodiments of the P-part rendering of the present invention are described with reference to FIGS. 11 to 14. That is, various embodiments of the late reverberation generator 240 of FIG. 2 performing P-part rendering in the QMF domain are described with reference to FIGS. 11 to 14.
- In the embodiments of FIGS. 11 to 14, it is assumed that a multichannel input signal is received as a subband signal of the QMF domain. Accordingly, the processing of each component of FIGS. 11 to 14, that is, the decorrelator 241, the subband filtering unit 242, the IC matching unit 243, the downmixing unit 244, and the energy attenuation matching unit 246, may be performed for each QMF subband.
- In the embodiments of FIGS. 11 to 14, detailed descriptions of parts overlapping with the embodiments of the previous drawings are omitted.
- Pi (P1, P2, P3, ...) corresponding to the P-part corresponds to the rear portion of each subband filter removed according to the frequency variable truncation.
- The length of the P-part may be defined as the entire filter after the truncation point of each subband filter, or may be defined as a smaller length with reference to the second reverberation time information of the corresponding subband filter.
- P-part rendering may be performed independently for each channel, or may be performed for downmixed channels.
- the P-part rendering may be applied through different processing for each preset subband group or for each subband, or the same processing may be applied to all subbands.
- the processing applicable to the P-part includes energy reduction compensation for the input signal, tap-delay line filtering, processing using an Infinite Impulse Response (IIR) filter, processing using an artificial reverberator, frequency-independent interaural coherence (FIIC) compensation, and frequency-dependent interaural coherence (FDIC) compensation.
- IIR Infinite Impulse Response
- FIIC Frequency-independent Interaural Coherence
- EDR Energy Decay Relief
- FDIC Frequency-dependent Interaural Coherence
- STFT Short Time Fourier Transform (of the impulse response)
- n time index
- i frequency index
- k frame index
- m output channel index (L, R).
- the function in the numerator outputs the real value of its input x, and x* denotes the complex conjugate of x.
- the numerator in the above formula may be replaced with a function that takes an absolute value instead of a real value.
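The formula referred to above is not reproduced in this text. As a hedged sketch, the following assumes the usual normalized cross-correlation form: the real part of the sum over frames of H_L(i,k) times the conjugate of H_R(i,k), divided by the square root of the product of the two channel energies. The function name `fdic` and its argument layout are illustrative assumptions.

```python
import math


def fdic(h_left, h_right):
    """FDIC for one frequency index from per-frame complex STFT coefficients.

    Assumed form: Re(sum_k H_L * conj(H_R)) / sqrt(sum|H_L|^2 * sum|H_R|^2).
    """
    num = sum(hl * hr.conjugate() for hl, hr in zip(h_left, h_right)).real
    e_left = sum(abs(hl) ** 2 for hl in h_left)
    e_right = sum(abs(hr) ** 2 for hr in h_right)
    den = math.sqrt(e_left * e_right)
    # Return 0 for silent input to avoid dividing by zero.
    return num / den if den > 0.0 else 0.0
```

Identical left and right responses give a value of 1, and responses that are 90 degrees out of phase give 0. Taking `abs(...)` of the sum instead of `.real` would correspond to the absolute-value variant mentioned above.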
- since the binaural rendering in the present invention is performed in the QMF domain, the FDIC may be defined by the following equation.
- i is the subband index
- k is the time index in the subband
- the FDIC of the late reverberation part is a parameter that is mainly influenced by the positions of the two microphones when the BRIR is recorded. Assuming the listener's head is a sphere, the theoretical FDIC of the BRIR (IC_ideal) can satisfy the following equation:
- r is the distance between the listener's ears, i.e., the distance between the two microphones, and k is the frequency index.
- the initial reflection sound mainly included in the F-part is very different for each channel.
- the FDIC of the F-part differs greatly from channel to channel.
- the FDIC varies greatly, but this is because a large measurement error occurs due to the characteristics of the high frequency band signal, which rapidly decays energy, and when the average of each channel is taken, the FDIC converges to almost zero.
- the difference in FDIC between channels arises from measurement error, but it can be seen that the average converges to the sinc function as shown in Equation 5.
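Equation 5 itself is not reproduced in this text. As a rough sketch, the following assumes the unnormalized sinc form sin(kr)/(kr), with k treated as a continuous quantity proportional to frequency. The function name `ideal_ic` and the example spacing of 0.18 are illustrative assumptions.

```python
import math


def ideal_ic(k: float, r: float) -> float:
    """Assumed theoretical coherence sin(k*r) / (k*r) for ear spacing r."""
    x = k * r
    # The limit of sin(x)/x at x = 0 is 1.
    return 1.0 if x == 0.0 else math.sin(x) / x


# Coherence falls from 1 at low frequencies toward 0 as k*r grows.
curve = [ideal_ic(k, 0.18) for k in (0.0, 5.0, 10.0, 20.0, 40.0)]
```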
- the late reverberation generation unit for P-part rendering may be implemented based on the above characteristics.
- the late reverberation generation unit 240A may include a subband filtering unit 242 and downmixing units 244a and 244b.
- the subband filtering unit 242 filters the multi-channel input signals X0, X1, ..., X_M-1 for each subband by using the P-part coefficients.
- the P-part coefficient is received from a BRIR parameterization unit (not shown) as described above, and may include coefficients of a rear subband filter having different lengths for each subband.
- the subband filtering unit 242 performs fast convolution between the QMF domain subband signal and the rear subband filter of the QMF domain corresponding to each frequency.
- the length of the rear subband filter may be determined based on the RT60 as described above, but may be set to a value larger or smaller than the RT60 according to the complexity-quality control.
- the multi-channel input signals are rendered by the subband filtering unit 242 into the left channel signals X_L0, X_L1, ..., X_L_M-1 and the right channel signals X_R0, X_R1, ..., X_R_M-1, respectively.
- the downmix units 244a and 244b downmix the rendered plurality of left channel signals and the plurality of right channel signals by left and right channels, respectively, to generate two channels of left and right output signals Y_Lp and Y_Rp.
- the late reverberation generation unit 240B may include a decorrelator 241, an IC matching unit 243, downmixing units 244a and 244b, and energy attenuation matching units 246a and 246b.
- the BRIR parameterization unit (not shown) may include an IC estimator 213 and a downmix subband filter generator 216.
- the late reverberation generation unit 240B may reduce the amount of calculation by using the fact that the energy decay characteristic of the late reverberation part is the same for each channel. That is, the late reverberation generation unit 240B performs decorrelation and interaural coherence (IC) adjustment for each multichannel signal, downmixes the adjusted input signals and decorrelation signals for each channel into left and right channel signals, and then generates the two-channel left and right output signals by compensating for the energy attenuation of the downmixed signals. More specifically, the decorrelator 241 generates the decorrelation signals D0, D1, ..., D_M-1 for the multichannel input signals X0, X1, ..., X_M-1. The decorrelator 241 is a kind of preprocessor for adjusting the coherence between both ears; a phase randomizer may be used, and for computational efficiency the phase of the input signal may be changed in units of 90 degrees.
- the IC estimator 213 of the BRIR parameterization unit estimates an IC value and transmits the IC value to the binaural rendering unit (not shown).
- the binaural rendering unit may store the received IC value in the memory 255 and transmit the received IC value to the IC matching unit 243.
- the IC matching unit 243 may directly receive an IC value from the BRIR parameterization unit, or may obtain an IC value previously stored in the memory 255.
- the input signal and the decorrelation signal for each channel are rendered in the IC matching unit 243 into the left channel signals X_L0, X_L1, ..., X_L_M-1 and the right channel signals X_R0, X_R1, ..., X_R_M-1.
- the IC matching unit 243 performs weighted summation between the decorrelated signal and the original input signal for each channel by referring to the IC value, and adjusts the coherence between the two channel signals.
- since the input signal for each channel is a signal of the subband domain, the above-described FDIC can be matched.
- the left and right channel signals X_L and X_R on which IC matching is performed may be expressed by the following equation.
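The equation referred to above is not reproduced in this text. As an illustration only, the following sketch shows one common way to perform a weighted sum of a signal and its decorrelated version so that the two outputs reach a target coherence. The weights sqrt((1+IC)/2) and sqrt((1-IC)/2) are an assumption of this sketch and are not taken from the source; they yield the target coherence when the input and the decorrelation signal are uncorrelated and have equal energy. The function name `ic_match` is hypothetical.

```python
import math


def ic_match(x, d, ic):
    """Mix signal x with its decorrelated version d toward coherence ic."""
    if not -1.0 <= ic <= 1.0:
        raise ValueError("ic must lie in [-1, 1]")
    a = math.sqrt((1.0 + ic) / 2.0)  # weight of the original signal
    b = math.sqrt((1.0 - ic) / 2.0)  # weight of the decorrelated signal
    left = [a * xs + b * ds for xs, ds in zip(x, d)]
    right = [a * xs - b * ds for xs, ds in zip(x, d)]
    return left, right
```

Because the decorrelated component enters the two outputs with opposite signs, IC = 1 reproduces the input in both channels, while IC = 0 gives two uncorrelated outputs of equal energy.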
- the downmix units 244a and 244b downmix the plurality of left channel signals and the plurality of right channel signals rendered through IC matching for each left and right channel to generate two left and right rendering signals.
- the energy attenuation matching units 246a and 246b generate the two channel left and right output signals Y_Lp and Y_Rp by reflecting the energy decay of the two channel left and right rendering signals, respectively.
- the energy attenuation matching units 246a and 246b perform energy attenuation matching using the downmix subband filter coefficients obtained from the downmix subband filter generation unit 216.
- the downmix subband filter coefficients are generated by a combination of rear subband filter coefficients for each channel of the corresponding subband.
- the downmix subband filter coefficients may include subband filter coefficients obtained by taking the square root of the average of the squared amplitude responses of the rear subband filter coefficients of each channel for the corresponding subband. Accordingly, the downmix subband filter coefficients reflect the energy reduction characteristics of the late reverberation part for the corresponding subband signal.
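The root-of-mean-square combination described above can be sketched as follows. This sketch assumes the averaging is applied tap by tap across channels and that all channels have the same number of taps; the function name `downmix_subband_filter` is hypothetical.

```python
import math


def downmix_subband_filter(rear_filters):
    """Combine per-channel rear subband filters into one energy-matched filter.

    rear_filters: one list of complex taps per channel, all of equal length.
    Each output tap is sqrt(mean over channels of |tap|^2).
    """
    n_ch = len(rear_filters)
    n_taps = len(rear_filters[0])
    return [
        math.sqrt(sum(abs(ch[n]) ** 2 for ch in rear_filters) / n_ch)
        for n in range(n_taps)
    ]
```

The result is real-valued and carries only the energy decay of the late reverberation, which is why coherence has to be restored separately by the IC matching step.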
- the downmix subband filter coefficients may include downmix subband filter coefficients that are downmixed in mono or stereo, depending on the embodiment, and may be received directly from the BRIR parameterization unit, as with the FDIC, or obtained from values previously stored in the memory 225.
- FIG. 13 illustrates a late reverberation generation unit 240C according to another embodiment of the present invention.
- Each configuration of the late reverberation generation unit 240C of FIG. 13 may be the same as each configuration of the late reverberation generation unit 240B described in the embodiment of FIG. 12, and the data processing order between the elements may be partially different.
- the late reverberation generation unit 240C may further reduce the amount of calculation by using the fact that the FDIC of the late reverberation part is the same for each channel. That is, the late reverberation generation unit 240C downmixes each multichannel signal into left and right channel signals, adjusts the IC of the downmixed left and right channel signals, and then compensates for the energy attenuation of the adjusted left and right channel signals to generate the two-channel left and right output signals.
- the decorrelator 241 generates the decorrelation signals D0, D1, ..., D_M-1 for the multichannel input signals X0, X1, ..., X_M-1.
- the downmix units 244a and 244b downmix the multi-channel input signals and the decorrelation signals to generate the two-channel downmix signals X_DMX and D_DMX, respectively.
- the IC matching unit 243 weights and sums the two-channel downmix signal with reference to the IC value, thereby adjusting the coherence between the two channel signals.
- the energy attenuation matching units 246a and 246b perform energy compensation on each of the left and right channel signals X_L and X_R on which IC matching has been performed by the IC matching unit 243, and output the two-channel left and right output signals Y_Lp and Y_Rp.
- the energy compensation information used for energy compensation may include downmix subband filter coefficients for each subband.
- Each configuration of the late reverberation generation unit 240D of FIG. 14 may be the same as each configuration of the late reverberation generation units 240B and 240C described in the embodiments of FIGS. 12 and 13, but has a more simplified feature.
- the downmix unit 244 downmixes the multichannel input signals X0, X1, ..., X_M-1 for each subband to generate a mono downmix signal (i.e., a mono subband signal) X_DMX.
- the energy decay matching unit 246 reflects the energy decay of the generated mono downmix signal.
- downmix subband filter coefficients for each subband may be used to reflect energy attenuation.
- the decorrelator 241 generates a decorrelation signal D_DMX of the mono downmix signal reflecting the energy decay.
- the IC matching unit 243 weights and sums the mono downmix signal reflecting the energy decay and the decorrelation signal with reference to the FDIC value, thereby generating the two-channel left and right output signals Y_Lp and Y_Rp. According to the embodiment of FIG. 14, the energy attenuation matching is performed only once for the mono downmix signal X_DMX, thereby further reducing the amount of computation.
- FIGS. 15 and 16 assume that the multi-channel input signal is received as a subband signal in the QMF domain. Accordingly, in the embodiments of FIGS. 15 and 16, the tap-delay line filter and the one-tap-delay line filter may perform processing for each QMF subband. In addition, QTDL processing may be performed only on the input signals of the high frequency band classified based on a predetermined constant or a predetermined frequency band as described above. If SBR (Spectral Band Replication) is applied to the input audio signal, the high frequency band may correspond to the SBR band. In the embodiments of FIGS. 15 and 16, detailed descriptions of parts overlapping with those of the previous drawings will be omitted.
- SBR Spectral Band Replication
- the high frequency band is generated using information of the low frequency band that is encoded and transmitted and additional information of the high frequency band signal transmitted by the encoder.
- the SBR band is a high frequency band, and as described above, the reverberation time of that frequency band is very short. That is, the BRIR subband filter of the SBR band contains little valid information and decays quickly. Therefore, performing BRIR rendering for the high frequency band corresponding to the SBR band with a small number of valid taps, rather than by convolution, may be very effective in terms of the amount of computation relative to sound quality.
- the QTDL processing unit 250A performs subband filtering on the multi-channel input signals X0, X1, ..., X_M-1 by using a tap-delay line filter.
- the tap-delay line filter performs convolution on only a small number of preset taps for each channel signal. In this case, the small number of taps used may be determined based on a parameter directly extracted from the BRIR subband filter coefficients corresponding to the corresponding subband signal.
- the parameter includes delay information for each tap to be used in the tap-delay line filter and gain information corresponding thereto.
- the number of taps used in the tap-delay line filter can be determined by complexity-quality control.
- the QTDL processing unit 250A receives, from the BRIR parameterization unit, a set of parameters (gain information and delay information) corresponding to the number of taps for each channel and subband based on the predetermined number of taps.
- the received parameter set is extracted from the BRIR subband filter coefficients corresponding to the corresponding subband signal, and may be determined according to various embodiments. For example, among the plurality of peaks of the corresponding BRIR subband filter coefficients, parameter sets may be received for as many peaks as the predetermined number of taps, extracted in order of absolute value magnitude, real value magnitude, or imaginary value magnitude.
- the delay information of each parameter represents position information of a corresponding peak, and has an integer value of a sample unit in the QMF domain.
- the gain information is determined based on the magnitude of the peak corresponding to the delay information.
- the weight value of the corresponding peak after energy compensation for the entire subband filter coefficients may be used.
- the gain information is obtained by using both real weight and imaginary weight for the corresponding peak, and thus has a complex value.
- the plurality of channel signals filtered by the tap-delay line filter are summed into two channel left and right output signals Y_L and Y_R for each subband.
- parameters used in each tap-delay line filter of the QTDL processing unit 250A may be stored in a memory during initialization of binaural rendering, and QTDL processing may be performed without additional calculation for parameter extraction.
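The parameter extraction and sparse filtering described above can be sketched as follows. This is a minimal sketch that assumes peaks are ranked by absolute value and that the complex coefficient at each peak is used directly as the gain (no energy compensation). The function names `extract_qtdl_params` and `tap_delay_line` are hypothetical.

```python
def extract_qtdl_params(coeffs, num_taps):
    """Pick the num_taps largest-magnitude taps as (delay, complex gain) pairs."""
    order = sorted(range(len(coeffs)), key=lambda n: abs(coeffs[n]), reverse=True)
    picked = [(n, complex(coeffs[n])) for n in order[:num_taps]]
    # Return the pairs in order of increasing delay.
    return sorted(picked, key=lambda p: p[0])


def tap_delay_line(signal, params):
    """Filter one subband signal with a sparse tap-delay line."""
    out = [0j] * len(signal)
    for delay, gain in params:
        # Each tap contributes a delayed, complex-scaled copy of the input.
        for t in range(delay, len(signal)):
            out[t] += gain * signal[t - delay]
    return out
```

Because the parameters depend only on the BRIR, `extract_qtdl_params` would be called once at initialization, and only `tap_delay_line` runs per frame.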
- the QTDL processing unit 250B performs subband filtering on the multi-channel input signals X0, X1, ..., X_M-1 by using the one-tap-delay line filter.
- One-tap-delay line filters can be understood to perform convolution on only one tap for each channel signal.
- the tap used may be determined based on a parameter directly extracted from a BRIR subband filter coefficient corresponding to the corresponding subband signal.
- the parameter includes delay information extracted from the BRIR subband filter coefficients and corresponding gain information.
- L_0, L_1, ..., L_M-1 represent the delays of the BRIRs from the M channels to the left ear, respectively, and R_0, R_1, ..., R_M-1 represent the delays of the BRIRs from the M channels to the right ear, respectively.
- the delay information indicates position information of the maximum peak among the corresponding BRIR subband filter coefficients in order of absolute value, real value, or imaginary value.
- G_L_0, G_L_1, ..., G_L_M-1 represent the gains corresponding to the respective delay information of the left channel, and G_R_0, G_R_1, ..., G_R_M-1 represent the gains corresponding to the respective delay information of the right channel.
- each gain information is determined based on the magnitude of the peak corresponding to the corresponding delay information.
- the weight value of the corresponding peak after energy compensation for the entire subband filter coefficients may be used.
- the gain information is obtained by using both real weight and imaginary weight for the corresponding peak, and thus has a complex value.
- the plurality of channel signals filtered by the one-tap-delay line filter are added to the left and right output signals Y_L and Y_R of two channels for each subband.
- parameters used in each one-tap-delay line filter of the QTDL processing unit 250B may be stored in a memory during initialization of binaural rendering, and QTDL processing may be performed without additional operations for parameter extraction.
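The one-tap case can be sketched as follows: each of the M channel signals is delayed and scaled once per ear, and the results are summed into Y_L and Y_R. The function name `one_tap_qtdl` and the argument layout are assumptions of this sketch; the delays and gains are taken as already extracted.

```python
def one_tap_qtdl(channels, delays_l, gains_l, delays_r, gains_r):
    """Sum M one-tap-delayed channel signals into left and right outputs.

    channels: M equal-length subband signals; delays are in QMF samples and
    gains are complex, one pair per channel and ear.
    """
    length = len(channels[0])
    y_l = [0j] * length
    y_r = [0j] * length
    for x, dl, gl, dr, gr in zip(channels, delays_l, gains_l, delays_r, gains_r):
        for t in range(length):
            if t >= dl:
                y_l[t] += gl * x[t - dl]  # contribution to the left ear
            if t >= dr:
                y_r[t] += gr * x[t - dr]  # contribution to the right ear
    return y_l, y_r
```

The cost per output sample is one complex multiplication per channel and ear, independent of the BRIR length.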
- FIGS. 17 through 19 illustrate an audio signal processing method using block-wise high-speed convolution according to an embodiment of the present invention.
- in the embodiments of FIGS. 17 to 19, detailed descriptions of parts overlapping with the embodiments of the previous drawings will be omitted.
- fast convolution may be performed in a predetermined block unit for optimal binaural rendering in terms of efficiency and performance.
- FFT-based fast convolution requires less computation as the FFT size increases, but the overall processing delay and the memory usage increase. If a BRIR with a length of 1 second is fast-convolved with an FFT size of twice that length, it is efficient in terms of computation, but a delay corresponding to 1 second occurs and a correspondingly large buffer and processing memory are required. An audio signal processing method with a long delay is not suitable for applications that process data in real time. Since the minimum unit in which the audio signal processing apparatus can perform decoding is a frame, it is preferable that binaural rendering also performs fast convolution in block units whose size corresponds to the frame unit.
- FIG. 17 illustrates an embodiment of an audio signal processing method using fast convolution in units of blocks.
- the prototype FIR filter is converted into I subband filters, and Fi represents the truncated subband filter of subband i.
- Each subband Band 0 to Band I-1 may represent a subband in the frequency domain, that is, a QMF subband.
- the QMF domain may use 64 subbands in total, but the present invention is not limited thereto.
- N represents the length (number of taps) of the original subband filter
- the length of the truncated subband filter is represented by N1, N2, and N3, respectively.
- the truncated subband filter coefficients of a subband i included in Zone 1 have a length of N1, those of a subband i included in Zone 2 have a length of N2, and those of a subband i included in Zone 3 have a length of N3.
- the lengths N, N1, N2 and N3 represent the number of taps in the downsampled QMF domain.
- the length of the truncated subband filter may be independently determined for each subband group (Zone 1, Zone 2, Zone 3) as shown in FIG. 17, or may be independently determined for each subband.
- the BRIR parameterization unit (or binaural rendering unit) of the present invention performs a fast Fourier transform on the truncated subband filter coefficients in preset block units in the corresponding subband (or subband group) to generate FFT filter coefficients.
- the length M_i of the preset block in each subband i is determined based on the preset maximum FFT size L. More specifically, the length M_i of the preset block in the subband i may be represented by the following equation.
- L is a preset maximum FFT size and N_i is a reference filter length of truncated subband filter coefficients.
- the length M_i of the preset block may be determined as the smaller of twice the reference filter length N_i of the truncated subband filter coefficients and the preset maximum FFT size L. If, as in Zone 1 and Zone 2 of FIG. 17, twice the reference filter length N_i is greater than or equal to (or greater than) the maximum FFT size L, the length M_i of the preset block is determined as the maximum FFT size L. However, if, as in Zone 3 of FIG. 17, twice the reference filter length N_i is smaller than (or smaller than or equal to) the maximum FFT size L, the length M_i of the preset block is determined as twice the reference filter length N_i.
- since the truncated subband filter coefficients are extended to twice their length through zero-padding before the fast Fourier transform is performed, the length M_i of the block for the fast Fourier transform may be determined based on the result of comparing twice the reference filter length N_i with the preset maximum FFT size L.
- the reference filter length N_i represents either the true value or a power-of-2 approximation of the filter order (that is, the length of the truncated subband filter coefficients) in the corresponding subband. That is, if the filter order of subband i is a power of 2, the filter order is used as the reference filter length N_i in subband i; if it is not a power of 2, the filter order rounded up or down to a power of 2 is used as the reference filter length N_i.
- since N3, the filter order of subband I-1 of Zone 3, is not a power of 2, N3', an approximation in the form of a power of 2, may be used as the reference filter length N_I-1 of the corresponding subband.
- the preset length M_I-1 of the subband I-1 is set to the double value of N3'.
- both the length M_i and the reference filter length N_i of the preset block may be powers of two.
- the BRIR parameterization unit divides the truncated subband filter coefficients into units of half the length of the preset block (M_i / 2).
- the region of the dotted line boundary of the F-part shown in FIG. 17 represents subband filter coefficients divided into half units of the preset block.
- the BRIR parameterization unit generates temporary filter coefficients of a predetermined block unit M_i by using each divided filter coefficient. In this case, the first half of the temporary filter coefficients is composed of the divided filter coefficients, and the second half is composed of zero-padded values.
- the temporary filter coefficient of the preset block length M_i is generated using the filter coefficient of the half length M_i / 2 of the preset block.
- the BRIR parameterization unit performs fast Fourier transform on the generated temporary filter coefficients to generate FFT filter coefficients.
- the FFT filter coefficients generated as described above may be used for fast convolution of a predetermined block unit for the input audio signal. That is, the fast convolution unit of the binaural renderer may perform the fast convolution by multiplying (e.g., complex multiplying) the generated FFT filter coefficients and the corresponding multi-audio signal in units of subframes, as described below.
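The division into half-block pieces and the zero-padding that precede the fast Fourier transform can be sketched as follows. The function name `temporary_filter_blocks` is hypothetical, and the sketch stops before the transform itself; it also assumes that a final piece shorter than M_i / 2 is simply padded with more zeros.

```python
def temporary_filter_blocks(truncated, block_len):
    """Split coefficients into pieces of block_len/2, each zero-padded to block_len."""
    if block_len < 2 or block_len % 2:
        raise ValueError("block_len must be a positive even number")
    half = block_len // 2
    blocks = []
    for start in range(0, len(truncated), half):
        piece = list(truncated[start:start + half])
        # First half holds filter taps, the rest is zero-padding.
        piece += [0] * (block_len - len(piece))
        blocks.append(piece)
    return blocks
```

Each returned block has length M_i and would then be transformed to give one set of FFT filter coefficients. The zero-padded second half leaves room for the linear convolution of a half-block of signal with a half-block of filter taps without circular wrap-around.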
- the BRIR parameterization unit may generate FFT filter coefficients by performing a fast Fourier transform on the truncated subband filter coefficients in block units of a length determined independently for each subband (or for each subband group). Accordingly, fast convolution using a different number of blocks for each subband (or for each subband group) may be performed. In this case, the number k_i of blocks in the subband i may satisfy the following equation.
- the number k_i of blocks in the subband i may be determined as a value obtained by dividing twice the reference filter length N_i in the corresponding subband by the length M_i of the preset block.
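The rules stated in the text for the reference filter length, the block length M_i = min(L, 2 N_i), and the block count k_i = 2 N_i / M_i can be sketched as follows. The function names and the `round_up` switch are illustrative assumptions; the text allows rounding either up or down.

```python
def reference_filter_length(filter_order: int, round_up: bool = True) -> int:
    """Return filter_order if it is a power of 2, else a neighbouring power of 2."""
    if filter_order < 1:
        raise ValueError("filter_order must be positive")
    if (filter_order & (filter_order - 1)) == 0:
        return filter_order  # already a power of 2
    upper = 1 << filter_order.bit_length()  # next power of 2 above
    return upper if round_up else upper >> 1


def block_length(n_ref: int, max_fft: int) -> int:
    """M_i: the smaller of twice the reference filter length and L."""
    return min(max_fft, 2 * n_ref)


def num_blocks(n_ref: int, max_fft: int) -> int:
    """k_i: twice the reference filter length divided by M_i."""
    return (2 * n_ref) // block_length(n_ref, max_fft)
```

For example, with a maximum FFT size of 64, a reference filter length of 256 gives M_i = 64 and k_i = 8, whereas a reference filter length of 8 gives M_i = 16 and k_i = 1.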
- FIG. 18 illustrates another embodiment of an audio signal processing method using fast convolution in units of blocks.
- redundant descriptions of parts that are the same as or correspond to those of the embodiment of FIG. 10 or 17 will be omitted.
- a plurality of subbands in the frequency domain may be classified into a first subband group (Zone 1) of low frequencies and a second subband group (Zone 2) of high frequencies based on a preset frequency band (QMF band i).
- alternatively, the plurality of subbands may be classified into three subband groups, that is, the first subband group (Zone 1), the second subband group (Zone 2), and the third subband group (Zone 3), based on a preset first frequency band (QMF band i) and a second frequency band (QMF band j).
- F-part rendering using fast convolution in block units may be performed on the input subband signals of the first subband group, and QTDL processing may be performed on the input subband signals of the second subband group.
- the subband signals of the third subband group may not be rendered.
- the above-described generation of the FFT filter coefficients in block units may be limitedly performed on the front subband filters Fi of the first subband group.
- the P-part rendering of the subband signals of the first subband group may be performed by the late reverberation generator according to the exemplary embodiment.
- the late reverberation generator may also perform P-part rendering on a predetermined block basis.
- the BRIR parameterization unit may generate FFT filter coefficients in predetermined block units corresponding to the rear subband filters Pi of the first subband group, respectively.
- the BRIR parameterization unit may generate at least one FFT filter coefficient by performing a fast Fourier transform, in predetermined block units, on the coefficients of each rear subband filter Pi or of the downmix subband filter (downmix P-part coefficients).
- the generated FFT filter coefficients may be passed to the late reverberation generator to be used for P-part rendering of the input audio signal. That is, the late reverberation generator may perform P-part rendering by complex-multiplying the obtained FFT filter coefficients and the subband signals of the first subband group corresponding thereto in subframe units.
- the BRIR parameterization unit obtains at least one parameter from each subband filter coefficient of the second subband group and transfers it to the QTDL processing unit.
- the QTDL processing unit performs tap-delay line filtering on each subband signal of the second subband group by using the acquired parameter.
- the BRIR parameterization unit may generate at least one FFT filter coefficient by performing a fast Fourier transform of a predetermined block unit on the obtained parameter.
- the BRIR parameterization unit transfers the FFT filter coefficients corresponding to each subband of the second subband group to the QTDL processing unit.
- the QTDL processing unit may perform filtering by complex multiplying the obtained FFT filter coefficients and subband signals of the second subband group corresponding thereto by subframe units.
- the FFT filter coefficient generation process described above with reference to FIGS. 17 and 18 may be performed by the BRIR parameterization unit included in the binaural renderer.
- the present invention is not limited thereto and may be performed in a BRIR parameterization unit separate from the binaural rendering unit.
- the BRIR parameterization unit transmits the truncated subband filter coefficients to the binaural rendering unit in the form of FFT filter coefficients in units of blocks. That is, the truncated subband filter coefficients transmitted from the BRIR parameterization unit to the binaural rendering unit are composed of at least one FFT filter coefficient having fast Fourier transforms performed on a block basis.
- the FFT filter coefficient generation process using the fast Fourier transform on a block basis has been described as being performed in the BRIR parameterization unit, but the present invention is not limited thereto. That is, according to another embodiment of the present invention, the above-described FFT filter coefficient generation process may be performed in the binaural rendering unit.
- the BRIR parameterization unit transmits the truncated subband filter coefficients obtained by cutting the BRIR subband filter coefficients to the binaural rendering unit.
- the binaural rendering unit may obtain at least one FFT filter coefficient by receiving the truncated subband filter coefficients from the BRIR parameterization unit and performing a fast Fourier transform on the truncated subband filter coefficients in predetermined block units.
- the fast convolution unit of the present invention may perform fast convolution on a block basis to filter the input audio signal.
- the fast convolution unit obtains at least one FFT filter coefficient constituting truncated subband filter coefficients for filtering each subband signal.
- the fast convolution unit may receive the FFT filter coefficients from the BRIR parameterization unit.
- alternatively, the fast convolution unit (or the binaural rendering unit including it) may receive the truncated subband filter coefficients from the BRIR parameterization unit and generate the FFT filter coefficients by fast Fourier transforming the truncated subband filter coefficients in preset block units.
- the length M_i of the preset block in each subband is determined, and FFT filter coefficients (FFT coef. 1 to FFT coef. k_i) corresponding in number to the number k_i of blocks in the corresponding subband are obtained.
- the fast convolution unit performs fast Fourier transform on each subband signal of the input audio signal based on a predetermined subframe unit in the corresponding subband.
- the fast convolution unit divides the subband signal into predetermined subframe units.
- the length of the subframe is determined based on the length M_i of the predetermined block in the corresponding subband.
- the length of the subframe may be determined as half the length of the preset block (M_i / 2).
- the length of the subframe may be set to have a power of two.
- the fast convolution unit generates temporary subframes, each having a length twice the length of the subframe (that is, the length M_i), by using the divided subframes (subframe 1 to subframe K_i).
- the first half of the temporary subframe consists of the divided subframes, and the second half consists of zero-padded values.
- the fast convolution unit generates an FFT subframe by fast Fourier transforming the generated temporary subframe.
- the fast convolution unit generates a filtered subframe by multiplying the fast Fourier transformed subframe (ie, the FFT subframe) and the FFT filter coefficients.
- the complex multiplier CMPY of the fast convolution unit may generate a filtered subframe by performing a complex multiplication between the FFT subframe and the FFT filter coefficients.
- the fast convolution unit inverse fast Fourier transforms each filtered subframe to generate a fast-convolved subframe (fast conv. subframe).
- the fast convolution unit overlap-adds at least one inverse fast Fourier transformed subframe (that is, fast conv. subframe) to generate a filtered subband signal.
- the filtered subband signal may constitute an output audio signal in the corresponding subband.
- the subframes for each channel of the same subband may be summed into subframes for the two output channels.
- filtered subframes obtained by performing complex multiplication with the m-th FFT filter coefficients, i.e., FFT coef. m, are stored in a memory (buffer), summed when the subframes after the current subframe are processed, and then inverse fast Fourier transformed.
- for example, the filtered subframe obtained through complex multiplication between the first FFT subframe (FFT subframe 1) and the second FFT filter coefficients (FFT coef. 2) is stored in a buffer and is then summed, at the time point corresponding to the second subframe, with the filtered subframe obtained through complex multiplication with the first FFT filter coefficients (FFT coef. 1).
- similarly, each filtered subframe obtained through complex multiplication with later FFT filter coefficients (e.g., FFT coef. 2) may be stored in a buffer.
- the filtered subframes stored in the buffer are summed with the filtered subframe obtained by complex multiplication between the third FFT subframe (FFT subframe 3) and the first FFT filter coefficients (FFT coef. 1) at the time point corresponding to the third subframe.
- the inverse fast Fourier transform may be performed on the summed subframe.
- the length of the subframe may have a value smaller than the half length (M_i / 2) of the preset block.
- each subframe may be extended to a preset length M_i through zero-padding, and then fast Fourier transform may be performed.
- in this case, the overlap-add may be performed using an overlap interval equal to the half length of the preset block (M_i / 2) rather than the length of the subframe.
- the present invention can be applied to a multimedia signal processing apparatus including various types of audio signal processing apparatuses and video signal processing apparatuses.
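The fast convolution steps described above (zero-padding each subframe to the block length M_i, FFT, complex multiplication with each block of FFT filter coefficients, buffering products that belong to later subframes, inverse FFT, and overlap-add) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function names `fft_filter_blocks` and `fast_convolve` are hypothetical, the subframe length is fixed at M_i / 2, and only one channel of one subband is processed.

```python
import numpy as np


def fft_filter_blocks(h, block_len):
    """Split filter taps into blocks of block_len // 2 taps and FFT each block.

    Each block is zero-padded to block_len, giving FFT coef. 1, 2, ..., K.
    (Hypothetical helper; the name is not taken from the source.)
    """
    half = block_len // 2
    n_blocks = -(-len(h) // half)  # ceiling division
    padded = np.zeros(n_blocks * half, dtype=complex)
    padded[:len(h)] = h
    return np.fft.fft(padded.reshape(n_blocks, half), n=block_len, axis=1)


def fast_convolve(x, h, block_len):
    """Filter x with h by block-wise fast convolution (assumed hop: block_len // 2)."""
    if block_len < 2 or block_len & (block_len - 1):
        raise ValueError("block_len is assumed to be a power of two >= 2")
    x = np.asarray(x, dtype=complex)
    h = np.asarray(h, dtype=complex)
    half = block_len // 2
    coefs = fft_filter_blocks(h, block_len)
    n_blocks = coefs.shape[0]
    n_sub = -(-len(x) // half)  # number of input subframes
    y = np.zeros((n_sub + n_blocks) * half, dtype=complex)
    # buffer[k] accumulates filtered subframes that are due k subframes from now.
    buffer = np.zeros((n_blocks, block_len), dtype=complex)
    for n in range(n_sub + n_blocks - 1):
        # Temporary subframe: signal samples in the first half, zeros in the second.
        tmp = np.zeros(block_len, dtype=complex)
        seg = x[n * half:(n + 1) * half]
        tmp[:len(seg)] = seg
        fft_subframe = np.fft.fft(tmp)
        # Complex multiplication with every FFT coef. block (broadcast over blocks).
        buffer += fft_subframe * coefs
        # buffer[0] now holds the sum of all products due at this subframe.
        fast_conv_subframe = np.fft.ifft(buffer[0])
        y[n * half:n * half + block_len] += fast_conv_subframe  # overlap-add
        # Advance the buffer by one subframe.
        buffer = np.roll(buffer, -1, axis=0)
        buffer[-1] = 0
    return y[:len(x) + len(h) - 1]
```

Comparing the result with `np.convolve(x, h)` on random complex input is a quick way to check the sketch; the two should agree to floating-point precision. Summing the products in the frequency domain before the inverse FFT means only one inverse transform is needed per subframe, regardless of the number of filter blocks.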
Claims (14)
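- A method for processing an audio signal, comprising: receiving an input audio signal; receiving truncated subband filter coefficients for filtering each subband signal of the input audio signal, wherein the truncated subband filter coefficients are at least a portion of subband filter coefficients obtained from binaural room impulse response (BRIR) filter coefficients for binaural filtering of the input audio signal, the length of the truncated subband filter coefficients is determined based on filter order information obtained by at least partially using characteristic information extracted from the corresponding subband filter coefficients, and the truncated subband filter coefficients consist of at least one FFT filter coefficient on which a fast Fourier transform (FFT) has been performed in units of a preset block in the corresponding subband; performing a fast Fourier transform on the subband signal based on a preset subframe unit in the corresponding subband; generating a filtered subframe by multiplying the fast Fourier transformed subframe by the FFT filter coefficients; inverse fast Fourier transforming the filtered subframe; and generating a filtered subband signal by overlap-adding at least one inverse fast Fourier transformed subframe.
- The method of claim 1, wherein the characteristic information includes reverberation time information of the corresponding subband filter coefficients, and the filter order information has a single value for each subband.
- The method of claim 1, wherein the length of at least one of the truncated subband filter coefficients differs from the length of the truncated subband filter coefficients of another subband.
- The method of claim 1, wherein the length of the preset block and the length of the preset subframe each have a power-of-2 value.
- The method of claim 1, wherein the length of the preset subframe is determined based on the length of the preset block in the corresponding subband.
- The method of claim 5, wherein performing the fast Fourier transform comprises: dividing the subband signal into the preset subframe units; generating a temporary subframe including a first half consisting of the divided subframe and a second half consisting of zero-padded values; and fast Fourier transforming the generated temporary subframe.
- A method for processing an audio signal, comprising: receiving an input audio signal; receiving truncated subband filter coefficients for filtering each subband signal of the input audio signal, wherein the truncated subband filter coefficients are at least a portion of subband filter coefficients obtained from binaural room impulse response (BRIR) filter coefficients for binaural filtering of the input audio signal, and the length of the truncated subband filter coefficients is determined based on filter order information obtained by at least partially using characteristic information extracted from the corresponding subband filter coefficients; obtaining at least one FFT filter coefficient by fast Fourier transforming (FFT) the truncated subband filter coefficients in units of a preset block in the corresponding subband; performing a fast Fourier transform on the subband signal based on a preset subframe unit in the corresponding subband; generating a filtered subframe by multiplying the fast Fourier transformed subframe by the FFT filter coefficients; inverse fast Fourier transforming the filtered subframe; and generating a filtered subband signal by overlap-adding at least one inverse fast Fourier transformed subframe.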
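- An audio signal processing apparatus for performing binaural rendering on an input audio signal, wherein each input audio signal includes a plurality of subband signals, the apparatus comprising a fast convolution unit that renders the direct sound and early reflection parts for each subband signal, wherein the fast convolution unit is configured to: receive the input audio signal; receive truncated subband filter coefficients for filtering each subband signal of the input audio signal, wherein the truncated subband filter coefficients are at least a portion of subband filter coefficients obtained from binaural room impulse response (BRIR) filter coefficients for binaural filtering of the input audio signal, the length of the truncated subband filter coefficients is determined based on filter order information obtained by at least partially using characteristic information extracted from the corresponding subband filter coefficients, and the truncated subband filter coefficients consist of at least one FFT filter coefficient on which a fast Fourier transform (FFT) has been performed in units of a preset block in the corresponding subband; perform a fast Fourier transform on the subband signal based on a preset subframe unit in the corresponding subband; generate a filtered subframe by multiplying the fast Fourier transformed subframe by the FFT filter coefficients; inverse fast Fourier transform the filtered subframe; and generate a filtered subband signal by overlap-adding at least one inverse fast Fourier transformed subframe.
- The apparatus of claim 8, wherein the characteristic information includes reverberation time information of the corresponding subband filter coefficients, and the filter order information has a single value for each subband.
- The apparatus of claim 8, wherein the length of at least one of the truncated subband filter coefficients differs from the length of the truncated subband filter coefficients of another subband.
- The apparatus of claim 8, wherein the length of the preset block and the length of the preset subframe each have a power-of-2 value.
- The apparatus of claim 8, wherein the length of the preset subframe is determined based on the length of the preset block in the corresponding subband.
- The apparatus of claim 12, wherein the fast Fourier transform of the subband signal is performed by dividing the subband signal into the preset subframe units, generating a temporary subframe including a first half consisting of the divided subframe and a second half consisting of zero-padded values, and fast Fourier transforming the generated temporary subframe.
- An audio signal processing apparatus for performing binaural rendering on an input audio signal, wherein each input audio signal includes a plurality of subband signals, the apparatus comprising a fast convolution unit that renders the direct sound and early reflection parts for each subband signal, wherein the fast convolution unit is configured to: receive the input audio signal; receive truncated subband filter coefficients for filtering each subband signal of the input audio signal, wherein the truncated subband filter coefficients are at least a portion of subband filter coefficients obtained from binaural room impulse response (BRIR) filter coefficients for binaural filtering of the input audio signal, and the length of the truncated subband filter coefficients is determined based on filter order information obtained by at least partially using characteristic information extracted from the corresponding subband filter coefficients; obtain at least one FFT filter coefficient by fast Fourier transforming (FFT) the truncated subband filter coefficients in units of a preset block in the corresponding subband; perform a fast Fourier transform on the subband signal based on a preset subframe unit in the corresponding subband; generate a filtered subframe by multiplying the fast Fourier transformed subframe by the FFT filter coefficients; inverse fast Fourier transform the filtered subframe; and generate a filtered subband signal by overlap-adding at least one inverse fast Fourier transformed subframe.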
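The claims state that each subband's truncation length follows from filter order information derived from characteristic information such as reverberation time, with one value per subband and lengths that may differ between subbands. A minimal sketch of one way such lengths might be computed is shown below. The claims do not specify the rule, so everything here is an assumption for illustration: the energy-decay threshold `decay_db`, the rounding up to a power of two, and the function names `filter_order` and `truncate_subband_filters` are all hypothetical.

```python
import numpy as np


def filter_order(coefs, decay_db=-20.0):
    """Pick a truncation length for one subband from its energy decay.

    Assumed rule (not from the source): keep taps until the remaining
    energy falls decay_db below the total, then round up to a power of two.
    """
    energy = np.abs(np.asarray(coefs)) ** 2
    if not energy.any():
        return min(1, len(energy))
    # Backward-integrated energy decay curve (remaining energy from tap n on).
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(np.maximum(edc / edc[0], 1e-12))
    below = np.nonzero(edc_db <= decay_db)[0]
    n = int(below[0]) if below.size else len(energy)
    n = max(n, 1)
    order = 1 << (n - 1).bit_length()  # next power of two >= n
    return min(order, len(energy))     # never longer than the filter itself


def truncate_subband_filters(subband_coefs, decay_db=-20.0):
    """Truncate each subband filter to its own order (one value per subband)."""
    return [np.asarray(c)[:filter_order(c, decay_db)] for c in subband_coefs]
```

With this rule, a subband whose response decays quickly gets a shorter truncated filter than one that decays slowly, which is the behaviour the dependent claims on differing lengths describe.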
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201480058320.0A CN105900455B (zh) | 2013-10-22 | 2014-10-22 | 用于处理音频信号的方法和设备 |
US15/031,275 US10580417B2 (en) | 2013-10-22 | 2014-10-22 | Method and apparatus for binaural rendering audio signal using variable order filtering in frequency domain |
KR1020167009852A KR101804744B1 (ko) | 2013-10-22 | 2014-10-22 | 오디오 신호 처리 방법 및 장치 |
EP14856742.3A EP3062535B1 (en) | 2013-10-22 | 2014-10-22 | Method and apparatus for processing audio signal |
US16/747,533 US11195537B2 (en) | 2013-10-22 | 2020-01-21 | Method and apparatus for binaural rendering audio signal using variable order filtering in frequency domain |
US17/517,630 US20220059105A1 (en) | 2013-10-22 | 2021-11-02 | Method and apparatus for binaural rendering audio signal using variable order filtering in frequency domain |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR20130125933 | 2013-10-22 | ||
KR10-2013-0125933 | 2013-10-22 | ||
KR10-2013-0125930 | 2013-10-22 | ||
KR20130125930 | 2013-10-22 | ||
US201461973868P | 2014-04-02 | 2014-04-02 | |
US61/973,868 | 2014-04-02 |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/031,275 A-371-Of-International US10580417B2 (en) | 2013-10-22 | 2014-10-22 | Method and apparatus for binaural rendering audio signal using variable order filtering in frequency domain |
US16/747,533 Continuation US11195537B2 (en) | 2013-10-22 | 2020-01-21 | Method and apparatus for binaural rendering audio signal using variable order filtering in frequency domain |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015060652A1 true WO2015060652A1 (ko) | 2015-04-30 |
Family
ID=52993176
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2014/009975 WO2015060652A1 (ko) | 2013-10-22 | 2014-10-22 | 오디오 신호 처리 방법 및 장치 |
PCT/KR2014/009978 WO2015060654A1 (ko) | 2013-10-22 | 2014-10-22 | 오디오 신호의 필터 생성 방법 및 이를 위한 파라메터화 장치 |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2014/009978 WO2015060654A1 (ko) | 2013-10-22 | 2014-10-22 | 오디오 신호의 필터 생성 방법 및 이를 위한 파라메터화 장치 |
Country Status (5)
Country | Link |
---|---|
US (5) | US10204630B2 (ko) |
EP (2) | EP3062535B1 (ko) |
KR (2) | KR101804744B1 (ko) |
CN (4) | CN105900455B (ko) |
WO (2) | WO2015060652A1 (ko) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105792090A (zh) * | 2016-04-27 | 2016-07-20 | 华为技术有限公司 | 一种增加混响的方法与装置 |
CN110100460A (zh) * | 2017-01-30 | 2019-08-06 | 谷歌有限责任公司 | 基于头部位置和时间的具有非头部跟踪立体声的高保真度立体声响复制音频 |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10075795B2 (en) | 2013-04-19 | 2018-09-11 | Electronics And Telecommunications Research Institute | Apparatus and method for processing multi-channel audio signal |
WO2014171791A1 (ko) | 2013-04-19 | 2014-10-23 | 한국전자통신연구원 | 다채널 오디오 신호 처리 장치 및 방법 |
US9319819B2 (en) | 2013-07-25 | 2016-04-19 | Etri | Binaural rendering method and apparatus for decoding multi channel audio |
US10469969B2 (en) | 2013-09-17 | 2019-11-05 | Wilus Institute Of Standards And Technology Inc. | Method and apparatus for processing multimedia signals |
US10204630B2 (en) | 2013-10-22 | 2019-02-12 | Electronics And Telecommunications Research Institute | Method for generating filter for audio signal and parameterizing device therefor |
US9805704B1 (en) | 2013-12-02 | 2017-10-31 | Jonathan S. Abel | Method and system for artificial reverberation using modal decomposition |
US11488574B2 (en) | 2013-12-02 | 2022-11-01 | Jonathan Stuart Abel | Method and system for implementing a modal processor |
US11087733B1 (en) | 2013-12-02 | 2021-08-10 | Jonathan Stuart Abel | Method and system for designing a modal filter for a desired reverberation |
JP6151866B2 (ja) | 2013-12-23 | 2017-06-21 | ウィルス インスティテュート オブ スタンダーズ アンド テクノロジー インコーポレイティド | オーディオ信号のフィルタ生成方法およびそのためのパラメータ化装置 |
US9832585B2 (en) | 2014-03-19 | 2017-11-28 | Wilus Institute Of Standards And Technology Inc. | Audio signal processing method and apparatus |
US9848275B2 (en) | 2014-04-02 | 2017-12-19 | Wilus Institute Of Standards And Technology Inc. | Audio signal processing method and device |
US9961475B2 (en) * | 2015-10-08 | 2018-05-01 | Qualcomm Incorporated | Conversion from object-based audio to HOA |
US10249312B2 (en) | 2015-10-08 | 2019-04-02 | Qualcomm Incorporated | Quantization of spatial vectors |
US10142755B2 (en) * | 2016-02-18 | 2018-11-27 | Google Llc | Signal processing methods and systems for rendering audio on virtual loudspeaker arrays |
US10559295B1 (en) * | 2017-12-08 | 2020-02-11 | Jonathan S. Abel | Artificial reverberator room size control |
JP7031543B2 (ja) * | 2018-09-21 | 2022-03-08 | 株式会社Jvcケンウッド | 処理装置、処理方法、再生方法、及びプログラム |
KR102458962B1 (ko) | 2018-10-02 | 2022-10-26 | 한국전자통신연구원 | 가상 현실에서 음향 확대 효과 적용을 위한 음향 신호 제어 방법 및 장치 |
US10841728B1 (en) * | 2019-10-10 | 2020-11-17 | Boomcloud 360, Inc. | Multi-channel crosstalk processing |
CN111211759B (zh) * | 2019-12-31 | 2022-03-25 | 京信网络系统股份有限公司 | 滤波器系数确定方法、装置和数字das系统 |
KR102500157B1 (ko) | 2020-07-09 | 2023-02-15 | 한국전자통신연구원 | 오디오 신호의 바이노럴 렌더링 방법 및 장치 |
CN114650033B (zh) * | 2021-09-13 | 2022-11-15 | 中国科学院地质与地球物理研究所 | 一种基于dsp的快速滤波方法 |
DE102021211278B3 (de) * | 2021-10-06 | 2023-04-06 | Sivantos Pte. Ltd. | Verfahren zur Bestimmung einer HRTF und Hörgerät |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090012638A1 (en) * | 2007-07-06 | 2009-01-08 | Xia Lou | Feature extraction for identification and classification of audio signals |
KR100924576B1 (ko) * | 2004-10-20 | 2009-11-02 | 프라운호퍼-게젤샤프트 츄어 푀르더룽 데어 안게반텐 포르슝에.파우. | 바이노럴 큐 코딩 방법 등을 위한 개별 채널 시간 엔벌로프정형 |
KR20100063113A (ko) * | 2007-10-09 | 2010-06-10 | 코닌클리즈케 필립스 일렉트로닉스 엔.브이. | 바이노럴 오디오 신호를 생성하기 위한 방법 및 장치 |
KR20120013893A (ko) * | 2010-08-06 | 2012-02-15 | 삼성전자주식회사 | 디코딩 방법 및 그에 따른 디코딩 장치 |
KR20130045414A (ko) * | 2005-09-13 | 2013-05-03 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | 3d 사운드를 발생시키기 위한 방법 및 디바이스 |
Family Cites Families (100)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0472907A (ja) * | 1990-07-13 | 1992-03-06 | Sony Corp | ノイズシェーピングフィルタの係数設定方法 |
US5329587A (en) | 1993-03-12 | 1994-07-12 | At&T Bell Laboratories | Low-delay subband adaptive filter |
US5371799A (en) | 1993-06-01 | 1994-12-06 | Qsound Labs, Inc. | Stereo headphone sound source localization system |
DE4328620C1 (de) | 1993-08-26 | 1995-01-19 | Akg Akustische Kino Geraete | Verfahren zur Simulation eines Raum- und/oder Klangeindrucks |
WO1995034883A1 (fr) | 1994-06-15 | 1995-12-21 | Sony Corporation | Processeur de signaux et dispositif de reproduction sonore |
IT1281001B1 (it) | 1995-10-27 | 1998-02-11 | Cselt Centro Studi Lab Telecom | Procedimento e apparecchiatura per codificare, manipolare e decodificare segnali audio. |
US5956674A (en) * | 1995-12-01 | 1999-09-21 | Digital Theater Systems, Inc. | Multi-channel predictive subband audio coder using psychoacoustic adaptive bit allocation in frequency, time and over the multiple channels |
WO1999014983A1 (en) | 1997-09-16 | 1999-03-25 | Lake Dsp Pty. Limited | Utilisation of filtering effects in stereo headphone devices to enhance spatialization of source around a listener |
US7421304B2 (en) * | 2002-01-21 | 2008-09-02 | Kenwood Corporation | Audio signal processing device, signal recovering device, audio signal processing method and signal recovering method |
CN1328707C (zh) * | 2002-07-19 | 2007-07-25 | 日本电气株式会社 | 音频解码设备以及解码方法 |
FI118247B (fi) | 2003-02-26 | 2007-08-31 | Fraunhofer Ges Forschung | Menetelmä luonnollisen tai modifioidun tilavaikutelman aikaansaamiseksi monikanavakuuntelussa |
SE0301273D0 (sv) * | 2003-04-30 | 2003-04-30 | Coding Technologies Sweden Ab | Advanced processing based on a complex-exponential-modulated filterbank and adaptive time signalling methods |
US7680289B2 (en) | 2003-11-04 | 2010-03-16 | Texas Instruments Incorporated | Binaural sound localization using a formant-type cascade of resonators and anti-resonators |
US7949141B2 (en) | 2003-11-12 | 2011-05-24 | Dolby Laboratories Licensing Corporation | Processing audio signals with head related transfer function filters and a reverberator |
JP4867914B2 (ja) | 2004-03-01 | 2012-02-01 | ドルビー ラボラトリーズ ライセンシング コーポレイション | マルチチャンネルオーディオコーディング |
KR100634506B1 (ko) | 2004-06-25 | 2006-10-16 | 삼성전자주식회사 | 저비트율 부호화/복호화 방법 및 장치 |
WO2006003891A1 (ja) * | 2004-07-02 | 2006-01-12 | Matsushita Electric Industrial Co., Ltd. | 音声信号復号化装置及び音声信号符号化装置 |
CN1731694A (zh) * | 2004-08-04 | 2006-02-08 | 上海乐金广电电子有限公司 | 数字音频编码方法以及装置 |
CN101241701B (zh) * | 2004-09-17 | 2012-06-27 | 广州广晟数码技术有限公司 | 用于对音频信号进行解码的方法和设备 |
JP2006189298A (ja) * | 2005-01-05 | 2006-07-20 | Shimadzu Corp | ガスクロマトグラフ質量分析装置及びそれを用いたバックグラウンドの低減方法 |
US7715575B1 (en) | 2005-02-28 | 2010-05-11 | Texas Instruments Incorporated | Room impulse response |
US7707034B2 (en) * | 2005-05-31 | 2010-04-27 | Microsoft Corporation | Audio codec post-filter |
JP2006337767A (ja) * | 2005-06-02 | 2006-12-14 | Matsushita Electric Ind Co Ltd | 低演算量パラメトリックマルチチャンネル復号装置および方法 |
EP1740016B1 (en) | 2005-06-28 | 2010-02-24 | AKG Acoustics GmbH | Method for the simulation of a room impression and/or sound impression |
CA2621175C (en) | 2005-09-13 | 2015-12-22 | Srs Labs, Inc. | Systems and methods for audio processing |
KR101512995B1 (ko) | 2005-09-13 | 2015-04-17 | 코닌클리케 필립스 엔.브이. | 공간 디코더 유닛, 공간 디코더 장치, 오디오 시스템, 한 쌍의 바이노럴 출력 채널들을 생성하는 방법 |
WO2007031905A1 (en) | 2005-09-13 | 2007-03-22 | Koninklijke Philips Electronics N.V. | Method of and device for generating and processing parameters representing hrtfs |
US7917561B2 (en) | 2005-09-16 | 2011-03-29 | Coding Technologies Ab | Partially complex modulated filter bank |
US8443026B2 (en) | 2005-09-16 | 2013-05-14 | Dolby International Ab | Partially complex modulated filter bank |
US8811627B2 (en) | 2005-10-26 | 2014-08-19 | Nec Corporation | Echo suppressing method and apparatus |
CN1996811A (zh) * | 2005-12-31 | 2007-07-11 | 北京三星通信技术研究有限公司 | 用于判决传输模式转换的测量报告实现方法及设备 |
WO2007080211A1 (en) | 2006-01-09 | 2007-07-19 | Nokia Corporation | Decoding of binaural audio signals |
CN101361117B (zh) * | 2006-01-19 | 2011-06-15 | Lg电子株式会社 | 处理媒体信号的方法和装置 |
CN101379554B (zh) * | 2006-02-07 | 2012-09-19 | Lg电子株式会社 | 用于编码/解码信号的装置和方法 |
DE602007004451D1 (de) | 2006-02-21 | 2010-03-11 | Koninkl Philips Electronics Nv | Audiokodierung und audiodekodierung |
CN101030845B (zh) * | 2006-03-01 | 2011-02-09 | 中国科学院上海微系统与信息技术研究所 | 频分多址系统的发射、接收装置及其方法 |
KR100754220B1 (ko) * | 2006-03-07 | 2007-09-03 | 삼성전자주식회사 | Mpeg 서라운드를 위한 바이노럴 디코더 및 그 디코딩방법 |
WO2007106553A1 (en) | 2006-03-15 | 2007-09-20 | Dolby Laboratories Licensing Corporation | Binaural rendering using subband filters |
FR2899424A1 (fr) | 2006-03-28 | 2007-10-05 | France Telecom | Procede de synthese binaurale prenant en compte un effet de salle |
JP2007264154A (ja) * | 2006-03-28 | 2007-10-11 | Sony Corp | オーディオ信号符号化方法、オーディオ信号符号化方法のプログラム、オーディオ信号符号化方法のプログラムを記録した記録媒体及びオーディオ信号符号化装置 |
US8374365B2 (en) | 2006-05-17 | 2013-02-12 | Creative Technology Ltd | Spatial audio analysis and synthesis for binaural reproduction and format conversion |
EP2036201B1 (en) | 2006-07-04 | 2017-02-01 | Dolby International AB | Filter unit and method for generating subband filter impulse responses |
US7876903B2 (en) | 2006-07-07 | 2011-01-25 | Harris Corporation | Method and apparatus for creating a multi-dimensional communication space for use in a binaural audio system |
DE102006047197B3 (de) * | 2006-07-31 | 2008-01-31 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Vorrichtung und Verfahren zum Verarbeiten eines reellen Subband-Signals zur Reduktion von Aliasing-Effekten |
US9496850B2 (en) * | 2006-08-04 | 2016-11-15 | Creative Technology Ltd | Alias-free subband processing |
EP4325724A3 (en) | 2006-10-25 | 2024-04-17 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for generating audio subband values |
WO2008069597A1 (en) | 2006-12-07 | 2008-06-12 | Lg Electronics Inc. | A method and an apparatus for processing an audio signal |
FR2912249A1 (fr) * | 2007-02-02 | 2008-08-08 | France Telecom | Codage/decodage perfectionnes de signaux audionumeriques. |
KR20080076691A (ko) | 2007-02-14 | 2008-08-20 | 엘지전자 주식회사 | 멀티채널 오디오신호 복호화방법 및 그 장치, 부호화방법및 그 장치 |
US8363843B2 (en) * | 2007-03-01 | 2013-01-29 | Apple Inc. | Methods, modules, and computer-readable recording media for providing a multi-channel convolution reverb |
KR100955328B1 (ko) | 2007-05-04 | 2010-04-29 | 한국전자통신연구원 | 반사음 재생을 위한 입체 음장 재생 장치 및 그 방법 |
CN101743586B (zh) * | 2007-06-11 | 2012-10-17 | 弗劳恩霍夫应用研究促进协会 | 音频编码器、编码方法、解码器、解码方法 |
KR100899836B1 (ko) * | 2007-08-24 | 2009-05-27 | 광주과학기술원 | 실내 충격응답 모델링 방법 및 장치 |
CN101884065B (zh) * | 2007-10-03 | 2013-07-10 | 创新科技有限公司 | 用于双耳再现和格式转换的空间音频分析和合成的方法 |
KR100971700B1 (ko) | 2007-11-07 | 2010-07-22 | 한국전자통신연구원 | 공간큐 기반의 바이노럴 스테레오 합성 장치 및 그 방법과,그를 이용한 바이노럴 스테레오 복호화 장치 |
US8125885B2 (en) | 2008-07-11 | 2012-02-28 | Texas Instruments Incorporated | Frequency offset estimation in orthogonal frequency division multiple access wireless networks |
CA2732079C (en) | 2008-07-31 | 2016-09-27 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Signal generation for binaural signals |
TWI475896B (zh) | 2008-09-25 | 2015-03-01 | Dolby Lab Licensing Corp | 單音相容性及揚聲器相容性之立體聲濾波器 |
EP2175670A1 (en) | 2008-10-07 | 2010-04-14 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Binaural rendering of a multi-channel audio signal |
FR2938947B1 (fr) * | 2008-11-25 | 2012-08-17 | A Volute | Procede de traitement du signal, notamment audionumerique. |
KR20100062784A (ko) | 2008-12-02 | 2010-06-10 | 한국전자통신연구원 | 객체 기반 오디오 컨텐츠 생성/재생 장치 |
CN102257562B (zh) * | 2008-12-19 | 2013-09-11 | 杜比国际公司 | 用空间线索参数对多通道音频信号应用混响的方法和装置 |
EP3751570B1 (en) * | 2009-01-28 | 2021-12-22 | Dolby International AB | Improved harmonic transposition |
WO2010091077A1 (en) | 2009-02-03 | 2010-08-12 | University Of Ottawa | Method and system for a multi-microphone noise reduction |
TWI662788B (zh) * | 2009-02-18 | 2019-06-11 | 瑞典商杜比國際公司 | 用於高頻重建或參數立體聲之複指數調變濾波器組 |
EP2237270B1 (en) | 2009-03-30 | 2012-07-04 | Nuance Communications, Inc. | A method for determining a noise reference signal for noise compensation and/or noise reduction |
KR20120006060A (ko) | 2009-04-21 | 2012-01-17 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | 오디오 신호 합성 |
JP4893789B2 (ja) | 2009-08-10 | 2012-03-07 | ヤマハ株式会社 | 音場制御装置 |
US9432790B2 (en) | 2009-10-05 | 2016-08-30 | Microsoft Technology Licensing, Llc | Real-time sound propagation for dynamic sources |
WO2010004056A2 (en) * | 2009-10-27 | 2010-01-14 | Phonak Ag | Method and system for speech enhancement in a room |
EP2365630B1 (en) | 2010-03-02 | 2016-06-08 | Harman Becker Automotive Systems GmbH | Efficient sub-band adaptive fir-filtering |
PL3570278T3 (pl) | 2010-03-09 | 2023-03-20 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Rekonstrukcja wysokiej częstotliwości wejściowego sygnału audio przy użyciu kaskadowych banków filtrów |
KR101844511B1 (ko) | 2010-03-19 | 2018-05-18 | 삼성전자주식회사 | 입체 음향 재생 방법 및 장치 |
JP5850216B2 (ja) | 2010-04-13 | 2016-02-03 | ソニー株式会社 | 信号処理装置および方法、符号化装置および方法、復号装置および方法、並びにプログラム |
US8693677B2 (en) | 2010-04-27 | 2014-04-08 | Freescale Semiconductor, Inc. | Techniques for updating filter coefficients of an adaptive filter |
CN102256200A (zh) * | 2010-05-19 | 2011-11-23 | 上海聪维声学技术有限公司 | 全数字助听器的基于wola滤波器组的信号处理方法 |
US8762158B2 (en) * | 2010-08-06 | 2014-06-24 | Samsung Electronics Co., Ltd. | Decoding method and decoding apparatus therefor |
NZ587483A (en) | 2010-08-20 | 2012-12-21 | Ind Res Ltd | Holophonic speaker system with filters that are pre-configured based on acoustic transfer functions |
KR101744621B1 (ko) | 2010-09-16 | 2017-06-09 | 돌비 인터네셔널 에이비 | 교차 곱 강화된 서브밴드 블록 기반 고조파 전위 |
JP5707842B2 (ja) | 2010-10-15 | 2015-04-30 | ソニー株式会社 | 符号化装置および方法、復号装置および方法、並びにプログラム |
EP2444967A1 (en) * | 2010-10-25 | 2012-04-25 | Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. | Echo suppression comprising modeling of late reverberation components |
EP2464146A1 (en) | 2010-12-10 | 2012-06-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for decomposing an input signal using a pre-calculated reference curve |
CN103329576B (zh) * | 2011-01-05 | 2016-12-07 | 皇家飞利浦电子股份有限公司 | 音频系统及其操作方法 |
EP2541542A1 (en) | 2011-06-27 | 2013-01-02 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for determining a measure for a perceived level of reverberation, audio processor and method for processing a signal |
EP2503800B1 (en) | 2011-03-24 | 2018-09-19 | Harman Becker Automotive Systems GmbH | Spatially constant surround sound |
JP5704397B2 (ja) | 2011-03-31 | 2015-04-22 | ソニー株式会社 | 符号化装置および方法、並びにプログラム |
EP2710588B1 (en) | 2011-05-19 | 2015-09-09 | Dolby Laboratories Licensing Corporation | Forensic detection of parametric audio coding schemes |
EP2530840B1 (en) * | 2011-05-30 | 2014-09-03 | Harman Becker Automotive Systems GmbH | Efficient sub-band adaptive FIR-filtering |
KR101809272B1 (ko) * | 2011-08-03 | 2017-12-14 | 삼성전자주식회사 | 다 채널 오디오 신호의 다운 믹스 방법 및 장치 |
US9826328B2 (en) | 2012-08-31 | 2017-11-21 | Dolby Laboratories Licensing Corporation | System for rendering and playback of object based audio in various listening environments |
US9319764B2 (en) | 2013-03-08 | 2016-04-19 | Merry Electronics Co., Ltd. | MEMS microphone packaging structure |
US20140270189A1 (en) | 2013-03-15 | 2014-09-18 | Beats Electronics, Llc | Impulse response approximation methods and related systems |
US9420393B2 (en) | 2013-05-29 | 2016-08-16 | Qualcomm Incorporated | Binaural rendering of spherical harmonic coefficients |
EP2840811A1 (en) | 2013-07-22 | 2015-02-25 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Method for processing an audio signal; signal processing unit, binaural renderer, audio encoder and audio decoder |
US9319819B2 (en) | 2013-07-25 | 2016-04-19 | Etri | Binaural rendering method and apparatus for decoding multi channel audio |
US10469969B2 (en) | 2013-09-17 | 2019-11-05 | Wilus Institute Of Standards And Technology Inc. | Method and apparatus for processing multimedia signals |
US10204630B2 (en) * | 2013-10-22 | 2019-02-12 | Electronics And Telecommunications Research Institute | Method for generating filter for audio signal and parameterizing device therefor |
JP6151866B2 (ja) | 2013-12-23 | 2017-06-21 | ウィルス インスティテュート オブ スタンダーズ アンド テクノロジー インコーポレイティド | オーディオ信号のフィルタ生成方法およびそのためのパラメータ化装置 |
US9832585B2 (en) * | 2014-03-19 | 2017-11-28 | Wilus Institute Of Standards And Technology Inc. | Audio signal processing method and apparatus |
US9848275B2 (en) * | 2014-04-02 | 2017-12-19 | Wilus Institute Of Standards And Technology Inc. | Audio signal processing method and device |
-
2014
- 2014-10-22 US US15/031,274 patent/US10204630B2/en active Active
- 2014-10-22 EP EP14856742.3A patent/EP3062535B1/en active Active
- 2014-10-22 WO PCT/KR2014/009975 patent/WO2015060652A1/ko active Application Filing
- 2014-10-22 CN CN201480058320.0A patent/CN105900455B/zh active Active
- 2014-10-22 CN CN201810179405.4A patent/CN108449704B/zh active Active
- 2014-10-22 EP EP14855415.7A patent/EP3062534B1/en active Active
- 2014-10-22 KR KR1020167009852A patent/KR101804744B1/ko active IP Right Grant
- 2014-10-22 US US15/031,275 patent/US10580417B2/en active Active
- 2014-10-22 CN CN201810180321.2A patent/CN108347689B/zh active Active
- 2014-10-22 CN CN201480058172.2A patent/CN105874819B/zh active Active
- 2014-10-22 KR KR1020167009853A patent/KR101804745B1/ko active IP Right Grant
- 2014-10-22 WO PCT/KR2014/009978 patent/WO2015060654A1/ko active Application Filing
-
2018
- 2018-12-19 US US16/224,820 patent/US10692508B2/en active Active
-
2020
- 2020-01-21 US US16/747,533 patent/US11195537B2/en active Active
-
2021
- 2021-11-02 US US17/517,630 patent/US20220059105A1/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100924576B1 (ko) * | 2004-10-20 | 2009-11-02 | 프라운호퍼-게젤샤프트 츄어 푀르더룽 데어 안게반텐 포르슝에.파우. | 바이노럴 큐 코딩 방법 등을 위한 개별 채널 시간 엔벌로프정형 |
KR20130045414A (ko) * | 2005-09-13 | 2013-05-03 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | 3d 사운드를 발생시키기 위한 방법 및 디바이스 |
US20090012638A1 (en) * | 2007-07-06 | 2009-01-08 | Xia Lou | Feature extraction for identification and classification of audio signals |
KR20100063113A (ko) * | 2007-10-09 | 2010-06-10 | 코닌클리즈케 필립스 일렉트로닉스 엔.브이. | 바이노럴 오디오 신호를 생성하기 위한 방법 및 장치 |
KR20120013893A (ko) * | 2010-08-06 | 2012-02-15 | 삼성전자주식회사 | 디코딩 방법 및 그에 따른 디코딩 장치 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105792090A (zh) * | 2016-04-27 | 2016-07-20 | 华为技术有限公司 | 一种增加混响的方法与装置 |
CN105792090B (zh) * | 2016-04-27 | 2018-06-26 | 华为技术有限公司 | 一种增加混响的方法与装置 |
CN110100460A (zh) * | 2017-01-30 | 2019-08-06 | 谷歌有限责任公司 | 基于头部位置和时间的具有非头部跟踪立体声的高保真度立体声响复制音频 |
Also Published As
Publication number | Publication date |
---|---|
KR20160083859A (ko) | 2016-07-12 |
CN105900455B (zh) | 2018-04-06 |
US20160275956A1 (en) | 2016-09-22 |
US20220059105A1 (en) | 2022-02-24 |
KR101804744B1 (ko) | 2017-12-06 |
EP3062534B1 (en) | 2021-03-03 |
US10204630B2 (en) | 2019-02-12 |
US10692508B2 (en) | 2020-06-23 |
WO2015060654A1 (ko) | 2015-04-30 |
CN108347689A (zh) | 2018-07-31 |
US20190122676A1 (en) | 2019-04-25 |
EP3062535A4 (en) | 2017-07-05 |
KR101804745B1 (ko) | 2017-12-06 |
CN108449704B (zh) | 2021-01-01 |
CN105900455A (zh) | 2016-08-24 |
EP3062534A4 (en) | 2017-07-05 |
EP3062534A1 (en) | 2016-08-31 |
CN105874819B (zh) | 2018-04-10 |
CN105874819A (zh) | 2016-08-17 |
EP3062535A1 (en) | 2016-08-31 |
US11195537B2 (en) | 2021-12-07 |
CN108449704A (zh) | 2018-08-24 |
US10580417B2 (en) | 2020-03-03 |
CN108347689B (zh) | 2021-01-01 |
KR20160083860A (ko) | 2016-07-12 |
EP3062535B1 (en) | 2019-07-03 |
US20160277865A1 (en) | 2016-09-22 |
US20200152211A1 (en) | 2020-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2015060652A1 (ko) | 오디오 신호 처리 방법 및 장치 | |
WO2015041476A1 (ko) | 오디오 신호 처리 방법 및 장치 | |
WO2015099429A1 (ko) | 오디오 신호 처리 방법, 이를 위한 파라메터화 장치 및 오디오 신호 처리 장치 | |
WO2015142073A1 (ko) | 오디오 신호 처리 방법 및 장치 | |
WO2015152665A1 (ko) | 오디오 신호 처리 방법 및 장치 | |
KR102216657B1 (ko) | 오디오 신호 처리 방법 및 장치 | |
KR102317732B1 (ko) | 오디오 신호 처리 방법 및 장치 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14856742 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 20167009852 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 15031275 Country of ref document: US |
|
REEP | Request for entry into the european phase |
Ref document number: 2014856742 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2014856742 Country of ref document: EP |