EP4133477A1 - Method of processing an input audio signal for generating an output audio signal having the reverberation characteristics of a specific acoustic environment - Google Patents
Method of processing an input audio signal for generating an output audio signal having the reverberation characteristics of a specific acoustic environment
- Publication number
- EP4133477A1 EP20717647.0A
- Authority
- EP
- European Patent Office
- Prior art keywords
- audio signal
- rir
- block
- samples
- input audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K15/00—Acoustics not otherwise provided for
- G10K15/08—Arrangements for producing a reverberation or echo sound
- G10K15/12—Arrangements for producing a reverberation or echo sound using electronic time-delay networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/04—Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/305—Electronic adaptation of stereophonic audio signals to reverberation of the listening space
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2499/00—Aspects covered by H04R or H04S not otherwise provided for in their subgroups
- H04R2499/10—General applications
- H04R2499/13—Acoustic transducers and sound field adaptation in vehicles
Definitions
- the invention relates to a method of processing an input audio signal for generating an output audio signal having the reverberation characteristics of a specific acoustic environment represented by its pre-recorded Room-Impulse-Response (“RIR”).
- RIR Room-Impulse-Response
- Audio signal processing generally comprises processing of input audio signals, i.e. audio signals which are input to a digital audio signal processing unit, having specific input audio signal properties so as to generate output audio signals, i.e. audio signals which are output by the audio signal processing unit, having specific output audio signal properties at least partly different from the input audio signal properties.
- audio signal processing may comprise modifying one or more properties of an input audio signal so as to obtain an output audio signal having one or more properties which are modified relative to the respective properties of the input audio signal.
- One specific aim in audio signal processing comprises processing an input audio signal for generating an output audio signal having the reverberation characteristics of a specific acoustic environment, e.g. a specific room or venue.
- a respective room or venue can form part of a specific building.
- audio signal processing comprising real-time convolution-based artificial reverberation using pre-recorded RIR data from a real acoustic environment, e.g. a real room, may require more computing power, i.e. more computing operations, such as Floating-Point Operations Per Input Sample (“FLOPIS”), and more memory throughput, i.e. memory access operations, than those feasible in real-time.
- the required number of computing operations and the memory size typically depend on the physical size of the respective acoustic environment and on the sampling rate used during the recording of the RIR data and the play-back of the audio signal to be reverberated.
- for the sampling rates typically used for audio and for the reverberation times of large acoustic environments, such as large buildings or venues, e.g. cathedrals, the length of the RIR data typically turns out to be very large.
- the monophonic RIR Finite Impulse Response (“FIR”) model may have 192 × 10³ samples.
- a direct real convolution would thus require 384 × 10³ FLOPIS, and 384 × 10³ memory locations to store the 192 × 10³ RIR samples plus the 192 × 10³ most recent samples of the input signal.
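The figures above are consistent with counting one multiplication and one addition per RIR sample for every input sample, and one memory location per stored RIR sample and per retained input sample. A minimal sketch of that arithmetic follows; the helper name `direct_convolution_cost` is hypothetical and the counting model is an assumption, not something the source spells out.

```python
def direct_convolution_cost(rir_length):
    """Return (FLOPIS, memory locations) for a direct real FIR convolution.

    Assumed counting model: one multiply and one add per RIR sample for
    every input sample, plus storage for the RIR and an equally long
    history of the most recent input samples.
    """
    flopis = 2 * rir_length            # multiply + add per RIR sample
    memory = rir_length + rir_length   # RIR samples + recent input samples
    return flopis, memory


flopis, memory = direct_convolution_cost(192_000)
print(flopis, memory)
```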
- stereophonic configuration stereo input signal and stereo RIR FIR model
- the Uniform Partition Overlap-Save (“UPOLS”) method is a widely used uniform partition algorithm for real-time artificial reverberation. UPOLS may significantly reduce the computing operations compared with the direct convolution; however, it doubles the required memory because UPOLS works with complex data.
- the Non-Uniform Partition Overlap-Save (“NUPOLS”) method is known. NUPOLS is a non-uniform partition algorithm for real-time artificial reverberation. It has the same memory requirements as UPOLS but reduces the computing operations even further.
- NUPOLS is a multi-thread algorithm which can be very challenging to implement and cannot be used at all when the real-time processing needs to be done in a single thread.
- it is the object of the present invention to provide an improved method of processing an input audio signal for generating an output audio signal having the reverberation characteristics of a specific acoustic environment, particularly with respect to the computing power and memory requirements for the respective digital audio signal processing unit and with respect to the ease of implementation.
- the object is achieved by a method of processing an input audio signal for generating an output audio signal having the reverberation characteristics of a specific acoustic environment according to Claim 1.
- the Claims depending on Claim 1 refer to possible embodiments of the method of Claim 1.
- a first aspect of the invention refers to a method of processing an input audio signal for generating an output audio signal having the reverberation characteristics of a specific acoustic environment represented by its pre-recorded Room-Impulse-Response (“RIR”).
- the method thus enables, by processing an input audio signal, generating an output audio signal having the reverberation characteristics of a specific acoustic environment, e.g. a specific room of a specific building, represented by its pre-recorded RIR.
- the method can be implemented by a hardware- and/or software-embodied digital signal processing unit configured to perform the method.
- the digital signal processing unit may comprise at least one processing unit, such as a processor, and at least one memory unit.
- the digital signal processing unit may form part of an apparatus for processing an audio signal.
- a respective apparatus can form a vehicle audio system or a car audio system, i.e. an audio system that is to be installed or is installed in a vehicle or a car, respectively or form part of a respective vehicle audio system or car audio system, respectively.
- a pre-recorded RIR of a specific acoustic environment e.g. a specific room of a specific building
- the pre-recorded RIR is or can be represented by its RIR samples.
- the pre-recorded RIR of a respective specific acoustic environment can be obtained through known methods for recording the RIR of acoustic environments.
- the actual recording of a respective acoustic environment is typically not a step of the method.
- the first step of the method can be implemented by a hardware- and/or software-embodied RIR provision unit which is configured to provide a pre-recorded RIR of a specific acoustic environment.
- the pre-recorded RIR provided by the RIR provision unit is or can be represented by its RIR samples.
- a discrete input audio signal i.e. a signal representative of a specific audio content, e.g. a musical piece.
- the input audio signal is or can be represented by its incoming audio signal samples.
- the second step of the method can be implemented by a hardware- and/or software-embodied input audio signal provision unit which is configured to provide a discrete input audio signal from a physical or non-physical input audio signal source, such as data carrier source, a network source, etc.
- the discrete input audio signal provided by the input audio signal provision unit is or can be represented by its incoming audio signal samples.
- the incoming audio signal samples of the discrete input audio signal are divided in a number of input audio signal blocks, whereby each input audio signal block has the same size in audio signal samples and/or the same number of audio signal samples.
- every input audio signal block can comprise the same number of audio signal samples.
- the third step of the method can be implemented by a hardware- and/or software-embodied sample dividing unit which is configured to divide the incoming audio signal samples of the discrete input audio signal in a number of input audio signal blocks, whereby each input audio signal block has the same number of audio signal samples.
- the samples of the RIR are divided in a number of RIR blocks, whereby each RIR block has the same number of RIR samples.
- every RIR block can comprise the same number of RIR samples.
- the number of RIR samples of the RIR blocks is equal to the size in audio signal samples and/or the number of audio signal samples of the input audio signal blocks.
- the fourth step of the method can be implemented by a hardware- and/or software-embodied sample dividing unit which is configured to divide the RIR samples of the RIR in a number of RIR blocks, whereby each RIR block has the same number of RIR samples.
- in a fifth step of the method, it is determined if/when an input audio signal block becomes available, and, if an input audio signal block has become available, an output audio signal block is produced by processing the respective input audio signal block, whereby the output audio signal block has the same size and/or the same number of audio signal samples as the input audio signal block. As such, it is determined if/when a sufficient number of audio signal samples have been input to build an input audio signal block, and when the input audio signal block is built, an output audio signal block is produced by processing the respective input audio signal block.
- the fifth step of the method can be implemented by a hardware- and/or software-embodied sample determining unit which is configured to determine if/when an input audio signal block becomes available, and by a hardware- and/or software-embodied input block processing unit which is configured to process the respective input audio signal block so as to produce an output audio signal block, whereby the output audio signal block has the same size and/or the same number of audio signal samples as the input audio signal block.
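The block-availability check of the fifth step can be illustrated with a small buffer that collects incoming samples and releases a block as soon as enough samples have arrived. This is only a sketch: the class `BlockAccumulator` and the placeholder `process_block` are hypothetical names, and the placeholder simply copies its input where the real method would apply the reverberation processing.

```python
class BlockAccumulator:
    """Collects incoming samples and releases fixed-size blocks."""

    def __init__(self, block_size):
        self.block_size = block_size
        self._pending = []

    def push(self, samples):
        """Add samples; return every complete block that became available."""
        self._pending.extend(samples)
        blocks = []
        while len(self._pending) >= self.block_size:
            blocks.append(self._pending[:self.block_size])
            del self._pending[:self.block_size]
        return blocks


def process_block(block):
    """Placeholder processing (assumption): identity, same block size out."""
    return list(block)


acc = BlockAccumulator(block_size=4)
outputs = []
for chunk in ([1, 2, 3], [4, 5], [6, 7, 8, 9]):
    for block in acc.push(chunk):
        outputs.append(process_block(block))
```

Samples that do not yet fill a block stay in the buffer until later input completes it, so every output block has the same number of samples as its input block.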
- a number of RIR operating coefficients, particularly transformation coefficients, more particularly Discrete-Fourier-Transform (“DFT”) coefficients is determined for each RIR block, where this number is the same for all RIR blocks, on basis of a first processing rule.
- a first processing rule is applied on basis of which a number of RIR operating coefficients, particularly transformation coefficients, more particularly DFT coefficients, is determined for each RIR block, where this number is the same for all RIR blocks.
- the sixth step of the method can be implemented by a hardware- and/or software-embodied operating coefficient determining unit which is configured to determine a number of RIR operating coefficients, particularly transformation coefficients, more particularly DFT coefficients, for each RIR block, where this number is the same for all RIR blocks, on basis of a first processing rule.
- a number of determined RIR operating coefficients is assigned to each RIR block. Typically, these coefficients are selected from those already determined for this RIR block. As such, each RIR block is assigned with at least one RIR operating coefficient which has been previously determined for this block.
- the seventh step of the method can be implemented by a hardware- and/or software-embodied operating coefficient assigning unit which is configured to assign a number of RIR operating coefficients to each RIR block, selected from those already determined for this RIR block.
- the RIR operating coefficients which have been assigned to the respective RIR blocks are stored as static values in at least one memory unit.
- the eighth step of the method can be implemented by a hardware- and/or software-embodied memory unit which is configured to store the RIR operating coefficients which have been assigned to the respective RIR blocks as static values.
- the stored RIR operating coefficients of the RIR are utilized together with corresponding time-varying operating coefficients of the input audio signal for determining and/or generating an output audio signal having the reverberation characteristics of the specific acoustic environment on basis of a second processing rule.
- each RIR operating coefficient has its corresponding input audio signal operating coefficient and, based on this relation, the RIR operating coefficients and the corresponding time-varying operating coefficients of the input audio signal are utilized for determining and/or generating an output audio signal having the reverberation characteristics of the specific acoustic environment on basis of a second processing rule.
- the ninth step of the method can be implemented by a hardware- and/or software-embodied processing unit which is configured to use the stored static RIR operating coefficients of the RIR together with corresponding time-varying operating coefficients of the input audio signal for determining and/or generating an output audio signal having the reverberation characteristics of the specific acoustic environment on basis of a second processing rule.
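The second processing rule is not spelled out in this excerpt. As a point of reference only, the sketch below shows the exact UPOLS baseline, in which static DFT coefficients of the RIR blocks are multiplied with the time-varying DFT coefficients of the most recent input blocks and accumulated. It is not the AUPOLS rule itself, which, per the description, stores only a selected number of coefficients per RIR block. The function names are hypothetical, and a naive DFT is used to keep the example free of dependencies.

```python
import cmath


def dft(x):
    """Naive O(n^2) forward DFT (for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(X):
    """Naive O(n^2) inverse DFT."""
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]


def upols_convolve(signal, rir, B):
    """Uniformly partitioned overlap-save convolution with block size B."""
    K = -(-len(rir) // B)                                # number of RIR blocks
    rir = list(rir) + [0.0] * (K * B - len(rir))         # extend RIR with zeros
    # Static coefficients: DFT of each zero-padded (double-sized) RIR block.
    H = [dft(rir[k * B:(k + 1) * B] + [0.0] * B) for k in range(K)]
    history = [[0j] * (2 * B) for _ in range(K)]         # time-varying input DFTs
    previous = [0.0] * B
    out = []
    for start in range(0, len(signal), B):
        block = list(signal[start:start + B])
        block += [0.0] * (B - len(block))
        X = dft(previous + block)                        # 2B-sample sliding window
        history = [X] + history[:-1]                     # newest spectrum first
        Y = [sum(history[k][f] * H[k][f] for k in range(K)) for f in range(2 * B)]
        y = idft(Y)
        out.extend(v.real for v in y[B:])                # keep the last B samples
        previous = block
    return out
```

The output covers only the processed input blocks; the reverberation tail after the last input block is not flushed in this sketch.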
- the method thus allows for implementing an Approximate Uniform Partition Overlap-Save (“AUPOLS”) method which is different from and operates in between the abovementioned UPOLS- and NUPOLS-methods.
- the AUPOLS-method typically has the same latency as the UPOLS- and NUPOLS-methods, is a single-thread approach which is simple in its implementation, and requires less memory than UPOLS and NUPOLS.
- the AUPOLS-method allows for forming an approximate model of the pre-recorded RIR, in contrast to the error-free UPOLS- and NUPOLS-methods, which form an exact model of the pre-recorded RIR.
- the AUPOLS-method provides an approximate model of the RIR, instead of providing an exact model of it.
- the AUPOLS-method then operates with the DFT transformed data of the K blocks.
- the transformed data can have the following time-frequency properties:
- the first processing rule can be applied based on an energy-based time-frequency tiling-process (“EBTFT-process”).
- an EBTFT-process can be used for determining the parameters for implementing the AUPOLS method and the corresponding AUPOLS structure for a given pre-recorded RIR.
- the AUPOLS structure is particularly beneficial in view of the resources required, namely the number of computing operations and the memory size.
- the EBTFT-process can comprise applying a time-domain window function to each RIR block to modify the first and last samples of each block so as to generate blocks that increase, particularly gradually, from a zero absolute value at the first sample and decrease, particularly gradually, to a zero absolute value at the last sample.
- the EBTFT-process can further comprise appending a number of zero samples after the last sample of each block so as to generate double-sized blocks.
- the EBTFT-process can further comprise arranging the double-sized blocks as columns of a real matrix having a number of rows and a number of columns, whereby the number of rows corresponds to the number of samples of each double-sized block and the number of columns corresponds to the number of RIR blocks.
- the EBTFT-process can further comprise applying a DFT transformation to each column of the real matrix, and applying a replacement rule to each of the columns so as to replace each column by the squared magnitude of its DFT transformation, resulting in a matrix of the same size having only real positive elements.
- the EBTFT-process can further comprise removing all last rows comprising redundant information and doubling the elements of all rows except those of the first and the last row, so as to generate a matrix of real positive elements, whereby the elements of the matrix represent the energy distribution function of the particular RIR in the time-frequency domain.
- the EBTFT-process can further comprise applying a smoothing function to the energy distribution function.
- the EBTFT-process can further comprise applying an energy threshold rule to the elements of each column, such that only the first elements of each column that sum up to a threshold energy, e.g. 90%, of the total energy of the respective column, are kept, whereas the remaining elements of the respective column are set to zero, resulting in a modified matrix having zeros at the last locations of each column.
- the EBTFT-process can further comprise generating a strictly-monotonically-decreasing sequence, indicating for each column of the matrix the remaining energy of the matrix starting from the particular column, and normalizing this sequence with the sum of all energies of all columns of the matrix, i.e. the sum of all elements of the matrix, thereby generating a strictly-monotonically-decreasing sequence in the interval between 0 and 1.
- the EBTFT-process can further comprise modifying the decay rate of the strictly-monotonically-decreasing sequence by applying a transformation on the sequence which converts an arbitrary strictly-monotonically-decreasing sequence to another sequence with the same property that takes values in the same interval.
- the EBTFT-process can further comprise determining a second sequence based on the modified matrix, that for each particular column of the modified matrix expresses the sum of all elements of the respective column of the modified matrix.
- the EBTFT-process can further comprise determining a third sequence based on the modified matrix, the strictly-monotonically decreasing sequence, and the second sequence, whereby the third sequence is a monotonically decreasing sequence. It is possible that two or more consecutive values of this third sequence are equal to each other, which means that the third sequence is not a strictly-monotonically-decreasing sequence.
- the EBTFT-process can further comprise applying a grouping rule to the samples of the third sequence, so as to group together P_g consecutive samples having the same value N_g.
- this value N_g represents the number of RIR operating coefficients for each of the P_g RIR blocks in the respective group of RIR blocks.
- the number of samples P_g grouped together represents the number of RIR blocks using the same number N_g of RIR operating coefficients.
- this unique value N_g represents the number of RIR operating coefficients for the respective RIR block.
- a respective EBTFT-process typically analyzes the provided pre-recorded RIR samples on the time-frequency plane and determines the numbers P_g and N_g that best match how the energy of the RIR samples is distributed in time and frequency. It therefore allows for yielding an AUPOLS structure that best matches the time-frequency properties of the RIR samples.
- the pre-recorded RIR can be extended with zeros if its length is not a multiple of the block size B.
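Dividing the RIR into equally sized blocks, with the zero extension just mentioned, might look as follows. The function name `split_into_blocks` is hypothetical; the same helper could be applied to the input audio signal samples, since both use the block size B.

```python
def split_into_blocks(samples, block_size):
    """Split samples into blocks of block_size, zero-extending the tail."""
    samples = list(samples)
    remainder = len(samples) % block_size
    if remainder:
        # Extend with zeros so the length becomes a multiple of block_size.
        samples += [0.0] * (block_size - remainder)
    return [samples[i:i + block_size]
            for i in range(0, len(samples), block_size)]


rir_blocks = split_into_blocks([1.0, 0.5, 0.25, 0.125, 0.0625], block_size=2)
```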
- a time-domain window function can optionally be applied to each RIR block.
- the time-domain window function allows for modifying the RIR samples at the vicinity of the block boundaries.
- the time-domain window function allows for modifying the first and the last RIR samples of each RIR block so that the power of the RIR samples decreases, particularly gradually, to zero when approaching the start or the end of the RIR block.
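The source does not name a particular window. As one possible illustration, the sketch below uses a raised-cosine taper over the first and last `ramp` samples, so that the block starts and ends at zero while the samples in between are left unchanged. Both the window shape and the parameter `ramp` are assumptions made for this example.

```python
import math


def taper_block(block, ramp):
    """Fade the first and last `ramp` samples of a block towards zero.

    Assumed window: raised cosine; `ramp` must be at least 1.
    """
    n = len(block)
    tapered = []
    for i, sample in enumerate(block):
        distance = min(i, n - 1 - i)          # distance to the nearest block edge
        if distance >= ramp:
            weight = 1.0                      # interior samples are unchanged
        else:
            weight = 0.5 * (1.0 - math.cos(math.pi * distance / ramp))
        tapered.append(weight * sample)
    return tapered


windowed = taper_block([1.0] * 8, ramp=2)
```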
- each of the RIR blocks, zero-padded to 2B samples, can be arranged as a column of a real matrix having 2B rows and K columns. Each of the K columns can then be replaced by the squared magnitude of its transform, typically its DFT transform.
- the result is a real matrix of the same size, having non-negative elements.
- the last (B-1) rows of this matrix can be removed so as to yield a real matrix E[l, b] of non-negative elements, having (B+1) rows and K columns.
- the elements of all (B-1) rows in between the first and the last row of the matrix can be doubled so as to compensate for the last (B-1) removed rows.
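The construction of the matrix E[l, b] described above can be sketched as follows: each block is zero-padded to 2B samples, transformed, replaced by its squared magnitude, reduced to its first (B+1) rows, and the (B-1) inner rows are doubled. The function names are hypothetical, the optional window is omitted, and a naive DFT is used to avoid dependencies.

```python
import cmath


def dft(x):
    """Naive O(n^2) forward DFT (for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def energy_matrix(rir_blocks):
    """Return E as a list of rows, E[l][b], with B+1 rows and K columns."""
    B = len(rir_blocks[0])
    columns = []
    for block in rir_blocks:
        padded = list(block) + [0.0] * B             # double-sized block
        power = [abs(c) ** 2 for c in dft(padded)]   # squared magnitude, 2B values
        kept = power[:B + 1]                         # drop the last B-1 rows
        for l in range(1, B):
            kept[l] *= 2.0                           # compensate for removed rows
        columns.append(kept)
    return [[column[l] for column in columns] for l in range(B + 1)]


E = energy_matrix([[1.0, 2.0], [0.5, -0.5]])
```

Because the DFT of a real block is conjugate-symmetric, doubling the inner rows keeps each column sum equal to the sum over all 2B squared magnitudes, which by Parseval's relation is 2B times the block energy.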
- the non-negative elements of the real matrix E[l, b] describe how the energy of the RIR is distributed in the time-frequency plane. More specifically, they describe how the energy contained in a certain frequency region fades in time, and how the energy contained in a certain time region fades in frequency.
- the elements of the real matrix E[l, b], being considered as the samples of a two-dimensional function, do not necessarily correspond to a smooth and well-behaved surface, but to a surface which may exhibit sudden minima and maxima along the time and the frequency direction. It is possible to smooth the two-dimensional surface with a suitable filter, such as a two-dimensional low-pass filter.
- a thresholding rule is applied to the elements of each of the K columns of the real matrix E[l, b], whereby only the first elements (for the lower values of l) of the column, containing T_pow % of the total column energy, are maintained, T_pow being a configuration or threshold parameter.
- the last elements (for the higher values of l) of the column can be set to zero.
- the configuration or threshold parameter T_pow % is between 0% and 100%.
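The thresholding rule for a single column can be sketched as below. The function name `threshold_column` and the argument `t_pow_percent` are hypothetical; the sketch keeps the leading elements until their running sum reaches the requested share of the column energy and sets the rest to zero.

```python
def threshold_column(column, t_pow_percent):
    """Keep the first elements holding t_pow_percent of the column energy."""
    target = sum(column) * t_pow_percent / 100.0
    kept, running = [], 0.0
    for value in column:
        if running < target:
            kept.append(value)        # still below the energy target: keep
            running += value
        else:
            kept.append(0.0)          # target reached: zero the remainder
    return kept


thresholded = threshold_column([6.0, 2.0, 1.0, 1.0], 75)
```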
- a first sequence D[b] is constructed.
- This first sequence D[b] expresses the remaining total power of the real matrix E[l, b] from column b to the last column (K-1). Thereby, column 0 is the first column.
- the sequence D[b] is divided by the total power of the real matrix E[l, b] (the sum of all the elements of the real matrix E[l, b]).
- the sequence D[b] has length K (because the real matrix E[l, b] has K columns).
- the last property states that the last sample of the sequence D[b] is typically positive and (very) close to zero.
- the sequence D[b] is a strictly-monotonically-decreasing sequence and takes values in the interval (0, 1].
- the decay rate of the sequence D[b] affects the complexity and the memory requirements of the AUPOLS structure that results from implementing the EBTFT-process. It is possible to modify the decay rate of the sequence D[b] by applying a transformation on the sequence that maps the interval (0, 1] to the same interval and converts an arbitrary strictly-monotonically-decreasing sequence in this interval to another sequence with the same property.
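A sketch of the sequence D[b] is given below: for each column b it accumulates the energy from column b to the last column and divides by the total. The source does not say which transformation is used to modify the decay rate; the power law in `reshape_decay` is merely one assumed example that maps (0, 1] onto itself and preserves strict monotonic decrease for any positive exponent. Both function names are hypothetical.

```python
def remaining_energy_sequence(E):
    """D[b]: energy from column b to the last column, divided by the total.

    E is a list of rows, E[l][b]; every column is assumed to hold some energy.
    """
    K = len(E[0])
    column_energy = [sum(row[b] for row in E) for b in range(K)]
    total = sum(column_energy)
    D, tail = [0.0] * K, 0.0
    for b in range(K - 1, -1, -1):    # accumulate from the last column backwards
        tail += column_energy[b]
        D[b] = tail / total
    return D


def reshape_decay(D, gamma):
    """Assumed example transform: d -> d ** gamma with gamma > 0."""
    return [d ** gamma for d in D]


D = remaining_energy_sequence([[3.0, 2.0, 1.0, 0.5],
                               [1.0, 1.0, 1.0, 0.5]])
faster = reshape_decay(D, 2.0)
```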
- a second sequence C[b] is constructed.
- the second sequence C[b] expresses the sum of all elements of column b of the real matrix E[l, b].
- the second sequence C[b] has length K (because the real matrix E[l, b] has K columns).
- N[0] is the number of first elements of the first column of the real matrix E[l, b] that, when added together, yield a value that is at least equal to the product D[0]C[0]. This is similar for all other columns of the real matrix E[l, b].
- N[K-1] is the number of first elements of the last column of the real matrix E[l, b] that, when added together, yield a value that is at least equal to the product D[K-1]C[K-1].
- the selection of the elements starts from the first row of the real matrix E[l, b], whereby only the necessary number of elements is selected.
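The selection rule for N[b] can be sketched as follows: for each column, elements are taken from the first row onwards until their sum is at least D[b]C[b], where C[b] is the column sum. The function name `coefficient_counts` is hypothetical, and the matrix is passed in as a list of rows E[l][b].

```python
def coefficient_counts(E, D):
    """N[b]: how many leading elements of column b reach D[b] * C[b]."""
    K = len(E[0])
    counts = []
    for b in range(K):
        column = [row[b] for row in E]
        target = D[b] * sum(column)       # D[b] * C[b]
        running, n = 0.0, 0
        for value in column:              # start from the first row
            if running >= target:
                break                     # only the necessary number is selected
            running += value
            n += 1
        counts.append(n)
    return counts


N = coefficient_counts([[4.0, 4.0],
                        [3.0, 3.0],
                        [2.0, 2.0],
                        [1.0, 1.0]], D=[1.0, 0.5])
```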
- a lower bound N_min can be imposed on the sequence N[b].
- the lower bound N_min can also be set to the value of 1 such that, in fact, no lower bound is imposed on the K integers which are the samples of the sequence N[b].
- the symbols N_b and N[b] have the same meaning and denote the samples of the same sequence.
- N_b is a monotonically decreasing sequence with only a few exceptions.
- the sequence N_b can be processed so as to yield a sequence that is monotonically decreasing everywhere (i.e. with no exceptions).
- the way that the processing is done is not crucial, as long as the resulting sequence is monotonically decreasing everywhere and its samples are close to the samples of the original sequence. In any case, the processing must yield values in the range N_min ≤ N_b ≤ (B+1) for all values of the index b.
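Since the exact processing is left open, one simple possibility is a running minimum followed by clamping to the allowed range. The sketch below is such an assumed choice, with the hypothetical name `make_monotone`; it removes the exceptions while leaving all other samples unchanged.

```python
def make_monotone(n_seq, n_min, block_size):
    """Return a non-increasing sequence with values in [n_min, block_size + 1]."""
    ceiling = block_size + 1              # upper bound (B + 1)
    result = []
    for n in n_seq:
        ceiling = min(ceiling, n)         # running minimum removes exceptions
        result.append(max(n_min, ceiling))
    return result


N_b = make_monotone([9, 7, 8, 5, 5, 6, 2], n_min=3, block_size=8)
```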
- the samples of the sequence N_b yield the sequences P_g and N_g as follows: a number of P_g consecutive samples of N_b which are all equal to N_g are grouped together to form the group g with population P_g. This is repeated for all consecutive samples that can be grouped together because they are equal. Samples of N_b with unique values each form their own group, with population one, since they cannot be grouped with any other sample. The total number of groups formed in this way is G. For the group index g, it then holds that 0 ≤ g < G.
- the symbols P_g and P[g] have the same meaning and denote the population of the group with index g.
- the symbols N_g and N[g] have the same meaning and denote the value of the elements, or the value of the element, of the sequence N[b] that formed the group with index g.
- the previous step may yield a group g’ with a population that is deemed to be too small. If groups with a population smaller than a minimum P_min > 1 are not allowed, then this group g’ must be merged with a neighbouring group. Thus, group g’ can be merged either with its previous group (g’ - 1) or with its next group (g’ + 1). This increases the population of the previous group (g’ - 1) or of the next group (g’ + 1) by the population of group g’ and also eliminates the samples of the sequences P_g and N_g associated with group g’.
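The grouping and merging steps can be sketched as a run-length encoding followed by a merge pass. The source leaves open which neighbour is chosen and which value the merged group keeps; the sketch assumes that a small group is merged into its previous group when one exists (otherwise into the next one) and that the merged group keeps the larger of the two values, so that no block ends up with fewer coefficients than before. Both function names are hypothetical.

```python
from itertools import groupby


def group_runs(n_seq):
    """Return (P, N): populations and values of runs of equal samples."""
    populations, values = [], []
    for value, run in groupby(n_seq):
        values.append(value)
        populations.append(len(list(run)))
    return populations, values


def merge_small_groups(populations, values, p_min):
    """Merge groups with fewer than p_min blocks into a neighbouring group."""
    P, N = list(populations), list(values)
    g = 0
    while g < len(P):
        if P[g] >= p_min or len(P) == 1:
            g += 1
            continue
        target = g - 1 if g > 0 else g + 1    # assumed choice of neighbour
        P[target] += P[g]
        N[target] = max(N[target], N[g])      # assumed: keep the larger value
        del P[g], N[g]                        # index g now holds the next group
    return P, N


P, N = group_runs([9, 7, 7, 5, 5, 5, 3])
P_merged, N_merged = merge_small_groups(P, N, p_min=2)
```

The total population is unchanged by merging, so the merged groups still cover all K RIR blocks.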
- the second processing rule (see the ninth step of the method) can be applied for every incoming input audio signal block.
- the second processing rule can comprise the following steps:
- a second aspect of the invention refers to a non-transitory computer readable medium comprising or storing computer-executable instructions, which when executed by a processing unit of a digital signal processing unit cause the digital signal processing unit to perform the method according to the first aspect of the invention.
- a non-transitory computer-readable medium can refer to any tangible computer-based device implemented in any method of technology for short-term and long-term storage of information, such as, computer-readable instructions, data structures, program modules and sub-modules, or other data in any device. Therefore, the method described herein may be encoded as executable instructions embodied in a tangible, non-transitory, computer-readable medium, including, without limitation, a storage device and/or a memory device.
- a non-transitory computer-readable medium generally includes all tangible, computer-readable media, including, without limitation, non-transitory computer storage devices, including, without limitation, volatile and non-volatile media, and removable and non-removable media such as firmware, physical and virtual storage, CD-ROMs, DVDs, and any other digital source such as a local or global network, as well as yet to be developed digital means, with the sole exception being a transitory, propagating signal.
- a third aspect of the invention refers to a processing unit comprising at least one processor having computer-executable instructions, which when executed by the processor cause the digital signal processing unit to perform the method according to the first aspect of the invention.
- a fourth aspect of the invention refers to an apparatus for processing an audio signal, comprising a processing unit according to the third aspect of the invention.
- a fifth aspect of the invention refers to a vehicle, particularly a car, comprising an apparatus for processing an audio signal according to the fourth aspect of the invention.
- FIG. 1 and FIG. 2 each show a principle drawing of a structure allowing for implementing a method of processing an input audio signal for generating an output audio signal having the reverberation characteristics of a specific acoustic environment according to an exemplary embodiment.
- Fig. 1 and Fig. 2 each show a principle drawing of a, particularly software-embodied, structure allowing for implementing a method of processing an input audio signal for generating an output audio signal having the reverberation characteristics of a specific acoustic environment according to an exemplary embodiment.
- the method is a method of processing an input audio signal for generating an output audio signal having the reverberation characteristics of a specific acoustic environment represented by its pre-recorded Room-Impulse-Response (“RIR”).
- the method thus enables, by processing an input audio signal, generating an output audio signal having the reverberation characteristics of a specific acoustic environment, e.g. a specific room of a specific building, such as an interior of a specific cathedral, represented by its pre-recorded RIR.
- the method can be implemented by a hardware- and/or software-embodied digital signal processing unit configured to perform the method.
- the digital signal processing unit may comprise at least one processing unit (not shown), such as a processor, and at least one memory unit (not shown), such as a memory.
- the digital signal processing unit may form part of an apparatus for processing an audio signal (not shown).
- a respective apparatus can form a vehicle audio system or a car audio system, i.e. an audio system that is to be installed or is installed in a vehicle or a car, respectively, or can form part of a respective vehicle audio system or car audio system.
- in a first step of the method, a pre-recorded RIR of a specific acoustic environment, e.g. a specific room, such as a specific room of a specific building, is provided.
- the pre-recorded RIR is or can be represented by its RIR samples.
- the first step of the method can be implemented by a hardware- and/or software-embodied RIR provision unit (not shown) which is configured to provide a pre-recorded RIR of a specific acoustic environment.
- the pre-recorded RIR provided by the RIR provision unit is or can be represented by its RIR samples.
- in a second step of the method, a discrete input audio signal, i.e. a signal representative of a specific audio content, e.g. a musical piece, is provided.
- the input audio signal is or can be represented by its incoming audio signal samples.
- the second step of the method can be implemented by a hardware- and/or software-embodied input audio signal provision unit which is configured to provide a discrete input audio signal from a physical or non-physical input audio signal source, such as a data carrier source, a network source, etc.
- the discrete input audio signal provided by the input audio signal provision unit is or can be represented by its incoming audio signal samples.
- the incoming audio signal samples of the discrete input audio signal are divided into a number of input audio signal blocks, whereby each input audio signal block has the same size in audio signal samples and/or the same number of audio signal samples.
- every input audio signal block can comprise the same number of audio signal samples.
- the third step of the method can be implemented by a hardware- and/or software-embodied sample dividing unit (not shown) which is configured to divide the incoming audio signal samples of the discrete input audio signal in a number of input audio signal blocks, whereby each input audio signal block has the same number of audio signal samples.
- the samples of the RIR are divided into a number of RIR blocks, whereby each RIR block has the same size in RIR samples and/or the same number of RIR samples.
- every RIR block can comprise the same size in RIR samples and/or the same number of RIR samples.
- the size in RIR samples and/or the number of RIR samples of the RIR blocks is equal to the size in audio signal samples and/or the number of audio signal samples of the input audio signal blocks.
- the fourth step of the method can be implemented by a hardware- and/or software-embodied sample dividing unit (not shown) which is configured to divide the RIR samples of the RIR in a number of RIR blocks, whereby each RIR block has the same size in RIR samples and/or the same number of RIR samples.
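The block division of the third and fourth steps can be sketched as follows. This is a minimal illustration, not the patented implementation: the helper name `split_into_blocks`, the sample values, and the choice to zero-pad a short final block are assumptions made here so that every block has the same number of samples.

```python
def split_into_blocks(samples, block_size):
    """Split a sequence into consecutive blocks of exactly block_size samples.

    Assumption: a final, shorter block is padded with zeros so that all
    blocks end up with the same number of samples.
    """
    blocks = []
    for start in range(0, len(samples), block_size):
        block = list(samples[start:start + block_size])
        block += [0.0] * (block_size - len(block))
        blocks.append(block)
    return blocks


B = 4  # illustrative block size, shared by RIR blocks and audio blocks
rir = [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
rir_blocks = split_into_blocks(rir, B)
audio_blocks = split_into_blocks([0.1 * n for n in range(8)], B)
```

Using the same `B` for both calls reflects the requirement that RIR blocks and input audio signal blocks have the same size.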
- in a fifth step of the method, it is determined if/when an input audio signal block becomes available, and, if an input audio signal block has become available, an output audio signal block is produced by processing the respective input audio signal block, whereby the output audio signal block has the same size and/or the same number of audio signal samples as the input audio signal block. As such, it is determined if/when a sufficient number of audio signal samples has been input to build an input audio signal block, and, once the input audio signal block is built, an output audio signal block is produced by processing the respective input audio signal block.
- the fifth step of the method can be implemented by a hardware- and/or software-embodied sample determining unit (not shown) which is configured to determine if/when an input audio signal block becomes available, and by a hardware- and/or software-embodied input block processing unit (not shown) which is configured to process the respective input audio signal block so as to produce an output audio signal block, whereby the output audio signal block has the same size and/or the same number of audio signal samples as the input audio signal block.
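One way to picture the "block becomes available" check of the fifth step is a small collector that buffers incoming samples and hands out a block only once enough samples have arrived. The class name `BlockCollector` and its `push` method are hypothetical and serve only as an illustration.

```python
class BlockCollector:
    """Buffers incoming samples and reports when a full block is available."""

    def __init__(self, block_size):
        self.block_size = block_size
        self._pending = []

    def push(self, sample):
        """Add one sample; return a complete block, or None if not ready yet."""
        self._pending.append(sample)
        if len(self._pending) < self.block_size:
            return None
        block, self._pending = self._pending, []
        return block


collector = BlockCollector(block_size=4)
ready = [collector.push(float(n)) for n in range(9)]
completed = [block for block in ready if block is not None]
```

In this run the first block is returned together with the fourth sample, and the ninth sample stays pending until three more samples arrive.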
- a number of RIR coefficients, particularly transformation coefficients, more particularly Discrete-Fourier-Transform (“DFT”) coefficients is determined for each RIR block, where this number is the same for all RIR blocks, on basis of a first processing rule.
- a first processing rule is applied on basis of which a number of RIR coefficients, particularly transformation coefficients, more particularly DFT coefficients, is determined for each RIR block, where this number is the same for all RIR blocks.
- the sixth step of the method can be implemented by a hardware- and/or software-embodied coefficient determining unit (not shown) which is configured to determine a number of RIR coefficients, particularly transformation coefficients, more particularly DFT coefficients, for each RIR block, where this number is the same for all RIR blocks, on basis of a first processing rule.
- a or the number of RIR operating coefficients is assigned to each RIR block, where these coefficients are selected from those already determined for this RIR block.
- each RIR block is assigned with at least one RIR operating coefficient which has been previously determined for this RIR block.
- the seventh step of the method can be implemented by a hardware- and/or software-embodied operating coefficient assigning unit (not shown) which is configured to assign a or the number of RIR operating coefficients to each RIR block selected from those already determined for this block.
- the RIR operating coefficients which have been assigned to the respective RIR blocks are stored as static values in at least one memory unit.
- the eighth step of the method can be implemented by a hardware- and/or software-embodied memory unit which is configured to store the RIR operating coefficients which have been assigned to the respective RIR blocks as static values.
- the stored RIR operating coefficients of the RIR are utilized together with corresponding time-varying operating coefficients of the input audio signal for determining and/or generating an output audio signal having the reverberation characteristics of the specific acoustic environment on basis of a second processing rule.
- each RIR operating coefficient has its corresponding input audio signal operating coefficient and, based on this relation, the RIR operating coefficients of the RIR and the corresponding time-varying operating coefficients of the input audio signal are utilized for determining and/or generating an output audio signal having the reverberation characteristics of the specific acoustic environment on basis of a second processing rule.
- the ninth step of the method can be implemented by a hardware- and/or software-embodied processing unit (not shown) which is configured to use the stored static RIR operating coefficients of the RIR together with corresponding time-varying operating coefficients of the input audio signal for determining and/or generating an output audio signal having the reverberation characteristics of the specific acoustic environment on basis of a second processing rule.
- the method thus allows for implementing an Approximate Uniform Partition Overlap Save (“AUPOLS”) method which is different from and operates in between the abovementioned UPOLS- and NUPOLS-methods.
- the AUPOLS-method has the same latency as the UPOLS- and NUPOLS-methods. It is a single-thread approach which is simple in its implementation and uses/requires less memory than UPOLS and NUPOLS.
- the AUPOLS-method allows for forming an approximate model of the pre-recorded RIR, in contrast to the error-free UPOLS- and NUPOLS-methods.
- the AUPOLS-method provides an approximate model of the pre-recorded RIR, instead of providing an exact model of the pre-recorded RIR.
- the first processing rule can advantageously be applied based on an energy-based time-frequency tiling-process (“EBTFT-process”).
- the EBTFT-process can be used for determining the parameters for implementing the AUPOLS method and the corresponding AUPOLS structure for a given pre-recorded RIR.
- the EBTFT-process can comprise applying a time-domain window function to each RIR block to modify the first and last samples of each block so as to generate blocks, particularly gradually, increasing from a zero absolute value at a first sample and, particularly gradually, decreasing to a zero absolute value at a last sample.
- the EBTFT-process can further comprise appending a number of zero samples after the last sample of each block so as to generate double-sized blocks.
- the EBTFT-process can further comprise arranging the double-sized blocks as columns of a real matrix having a number of rows and a number of columns, whereby the number of rows corresponds to the number of samples of each double-sized block and the number of columns corresponds to the number of RIR blocks.
- the EBTFT-process can further comprise applying a DFT transformation to each column of the real matrix, and applying a replacement rule to each of the columns so as to replace each column by the squared magnitude of its DFT transformation, resulting in a matrix of the same size having only real positive elements.
- the EBTFT-process can further comprise removing all last rows comprising redundant information and doubling the elements of all rows except those of the first and the last row, so as to generate a matrix of real positive elements, whereby the elements of the matrix represent the energy distribution (function) of the particular RIR in the time-frequency domain.
- the EBTFT-process can further comprise applying a filter function or operation, particularly a smoothing function or operation, to the energy distribution function.
- the EBTFT-process can further comprise applying an energy threshold rule to the elements of each column, such that only the first elements of each column that sum up to a threshold energy, e.g. 90%, of the total energy of the respective column, are kept, whereas the remaining elements of the respective column are set to zero, resulting in a modified matrix having zeros at the last locations of each column.
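The windowing, zero-padding, DFT, squared-magnitude, row-reduction and threshold steps above can be sketched as follows. This is a sketch under stated assumptions, not the patented implementation: the helper names are hypothetical, the window is assumed to be a Hann window (the text only requires that it rise from zero and fall back to zero), the optional smoothing step is omitted, and the element that makes the running sum reach the threshold is assumed to be kept.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def energy_columns(rir_blocks):
    """One column per RIR block: one-sided energy spectrum of the block."""
    columns = []
    for block in rir_blocks:
        b = len(block)  # requires b >= 2
        # Hann window (assumed): zero at the first and at the last sample.
        windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * n / (b - 1)))
                    for n, s in enumerate(block)]
        padded = windowed + [0.0] * b            # double-sized block
        power = [abs(c) ** 2 for c in dft(padded)]
        one_sided = power[:b + 1]                # drop the redundant last rows
        for i in range(1, b):                    # double all but first/last row
            one_sided[i] *= 2.0
        columns.append(one_sided)
    return columns


def apply_energy_threshold(column, fraction=0.9):
    """Keep leading elements until they hold `fraction` of the column energy."""
    total = sum(column)
    kept, running = [], 0.0
    for value in column:
        if running < fraction * total:
            kept.append(value)
            running += value
        else:
            kept.append(0.0)
    return kept


rir_blocks = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.4, 0.3, 0.2]]
columns = energy_columns(rir_blocks)
modified = [apply_energy_threshold(column) for column in columns]
```

Because the interior rows are doubled, each column sums to the same energy as the full two-sided spectrum of the padded block, which is a convenient sanity check.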
- the EBTFT-process can further comprise generating a strictly-monotonically decreasing sequence indicating, for each column of the matrix, the remaining energy of the matrix starting from the particular column, and normalizing this sequence with the sum of all energies of all columns (the sum of all elements of the matrix), thereby generating a strictly-monotonically decreasing sequence in the interval between 0 and 1.
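A small sketch of this remaining-energy sequence is shown below. The function name `remaining_energy_sequence` and the toy matrix are hypothetical; the sketch assumes that "starting from the particular column" includes that column itself, so the first value is 1, and that every column carries some energy, which is what makes the sequence strictly decreasing.

```python
def remaining_energy_sequence(matrix_columns):
    """For each column, the normalized energy of that column and all later ones."""
    column_energy = [sum(column) for column in matrix_columns]
    total = sum(column_energy)
    sequence, remaining = [], total
    for energy in column_energy:
        sequence.append(remaining / total)
        remaining -= energy
    return sequence


# Toy matrix given column by column, with column energies 4, 3, 2 and 1.
toy_columns = [[4], [1, 2], [2], [1]]
decay = remaining_energy_sequence(toy_columns)
```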
- the EBTFT-process can further comprise modifying the decay rate of the strictly-monotonically-decreasing sequence by applying a transformation on the sequence which converts an arbitrary strictly-monotonically-decreasing sequence to another sequence with the same property that takes values in the same interval.
- the EBTFT-process can further comprise determining a second sequence based on the modified matrix, that for each particular column of the modified matrix expresses the sum of all elements of the respective column of the modified matrix.
- the EBTFT-process can further comprise determining a third sequence based on the modified matrix, the strictly-monotonically decreasing sequence, and the second sequence, whereby the third sequence is a monotonically decreasing sequence. Two or more consecutive values of this third sequence may be equal to each other, meaning that the third sequence is not necessarily a strictly-monotonically-decreasing sequence.
- the EBTFT-process can further comprise applying a grouping rule to the samples of the third sequence, so as to group together consecutive samples having the same value, whereby this value represents the number of RIR transformation operating coefficients in the respective group of RIR blocks, and whereby the number of samples grouped together represents the number of RIR blocks in the respective group of RIR blocks.
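The grouping rule can be illustrated with a run-length grouping of consecutive equal values. The function name `group_blocks` and the example values of the third sequence are hypothetical; how the third sequence itself is computed is not reproduced here.

```python
from itertools import groupby


def group_blocks(third_sequence):
    """Group runs of equal values into (coefficients_per_block, block_count)."""
    return [(value, len(list(run))) for value, run in groupby(third_sequence)]


# Hypothetical third sequence: number of coefficients kept for each RIR block.
third_sequence = [129, 129, 65, 65, 65, 33]
groups = group_blocks(third_sequence)
```

In this example the first group holds two RIR blocks with 129 coefficients each, the second group three blocks with 65 coefficients each, and the third group a single block with 33 coefficients.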
- Fig. 1 shows a structure allowing for implementing an AUPOLS-method according to an exemplary embodiment of the method.
- the method processes incoming samples $x_n$ frame by frame.
- $x_n$ represents the value of the input audio signal at time $n$, where $n \ge 0$.
- the input audio signal is assumed to be zero at time $n < 0$.
- the frame size is $B \ge 1$ samples.
- the $k$-th frame to be processed, whereby $k \ge 0$ is the frame index and frame 0 is the first frame, is the vector of samples $\mathbf{x}_k = [x_{kB+0}, x_{kB+1}, \ldots, x_{kB+(B-1)}]_{1\times B}$.
- Buffer 1 contains the samples of the vector $\mathbf{x}_k$.
- the first sample $x_{kB}$ of the vector $\mathbf{x}_k$ is located at the first (leftmost) location of buffer 1. This is the first buffer location. This convention is followed for all buffers and vectors shown in the illustration of Fig. 1.
- the transformation indicated at 4 represents a size 2B Discrete-Fourier-Transform (size 2B R-C DFT transform) of the real time-domain vector $[\mathbf{x}_{k-1}\ \mathbf{x}_k] = [x_{kB-B}, \ldots, x_{kB+(B-1)}]_{1\times 2B}$, which is the row vector formed by the samples of the vector $\mathbf{x}_{k-1}$ located in buffer 3 followed by the samples of the vector $\mathbf{x}_k$ located in buffer 2.
- This transform maps a real time-domain vector to a complex frequency-domain vector of the same size.
- the first transformation coefficient (DC term) is $X_{k2B}$ (the first element of the vector) and the last transformation coefficient is $X_{k2B+(2B-1)}$ (the last element of the vector).
- $\mathbf{y}_k]$ $[d_{kB}, \ldots, d_{kB+(B-1)}$
- the transformation indicated at 6 represents a size 2B Inverse-Discrete-Fourier-Transform (size 2B C-R IDFT transform). This transform maps a complex frequency-domain vector to a real time-domain vector of the same size.
- the elements of vector $\mathbf{d}_k$ are collected in buffer 8 and are all discarded.
- $\mathbf{x}_k = [x_{kB+0}, x_{kB+1}, \ldots, x_{kB+(B-1)}]_{1\times B}$.
- This output has an inherent delay of (B-1) samples, since a total of B input samples needs to be collected to build up a respective block before the processing can start. Only when an input block is complete and available can the output for this block be calculated. The latency of the AUPOLS-method is thus (B-1) samples.
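As a quick arithmetic illustration of this latency, the sketch below converts (B-1) samples into milliseconds. The function name `block_latency`, the block size of 256 and the sampling rate of 48 kHz are assumptions chosen only for the example.

```python
def block_latency(block_size, sample_rate_hz):
    """Return the block-collection latency in samples and in milliseconds."""
    samples = block_size - 1
    return samples, 1000.0 * samples / sample_rate_hz


# Illustrative values: B = 256 samples at an assumed 48 kHz sampling rate.
latency_samples, latency_ms = block_latency(256, 48_000)
```

With these assumed values the latency is 255 samples, i.e. a little over 5 ms.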
- buffer 18 contains the samples $\mathbf{h}_0$ of block 1 (first block) of group 0 (first group)
- buffer 19 contains the samples $\mathbf{h}_1$ of block 2 of group 0
- buffer 20 contains the samples $\mathbf{h}_{P[0]-1}$ of block $P_0$ (last block) of group 0. The same applies to buffers 21, 22, 23, but this time for the blocks within group $g$.
- the transformation indicated at 25 represents a Discrete-Fourier-Transform (size 2B R-C DFT transform) of the zero-padded vector $[\mathbf{h}_0\ \mathbf{0}_B]_{1\times 2B}$; it maps a real time-domain vector of size 2B to a complex frequency-domain vector of the same size.
- the time-domain vectors $[\mathbf{h}_k\ \mathbf{0}_B]_{1\times 2B}$, $0 \le k < K$, are transformed into the frequency-domain vectors $\mathbf{H}_k = [H_{k2B+0}, H_{k2B+1}, \ldots, H_{k2B+(2B-1)}]_{1\times 2B}$, $0 \le k < K$.
- K represents the number of the RIR blocks.
- the first DFT coefficient (DC term) that results from the transformation is $H_{k2B}$ and the last DFT coefficient that results from the transformation is $H_{k2B+(2B-1)}$.
- Buffer 26 contains the first $N_0$ elements of the vector $\mathbf{H}_0$ and buffer 27 contains the first $N_g$ elements of the vector $\mathbf{H}_k$, where $k$ is equal to the number of buffers located above buffer 27, all of them marked with the sequence symbol N. For group 0, there are $P_0$ buffers having the same size $N_0$ as buffer 26, and for group $g$ there are $P_g$ buffers having the same size $N_g$ as buffer 27.
- Buffers and transformations not explicitly shown in Fig. 1 are indicated at 28 and 29.
- Groups not explicitly shown in Fig. 1 are indicated at 30 and 31.
- buffer 26 as an exemplary buffer and all the buffers underneath.
- a total of K buffers containing complex data comprise the RIR transformation operating coefficients.
- the values of the RIR transformation operating coefficients can be calculated off-line and stay constant throughout the streaming and the processing of the real-time data.
- the static RIR transformation operating coefficients are stored in a memory unit.
- the output $\mathbf{X}_k$ of the transformation indicated at 4 is calculated.
- the last (B-1) elements of $\mathbf{X}_k$ are implied by the complex-conjugate symmetry property of the transformation indicated at 4. These are all discarded immediately after the output of the transformation indicated at 4 has been calculated. A total of (B+1) elements (complex numbers) remain after discarding these last (B-1) elements (complex numbers) of $\mathbf{X}_k$.
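The complex-conjugate symmetry that makes the last (B-1) elements redundant can be checked numerically. The sketch below uses a naive DFT and arbitrary sample values; the names `dft`, `frame`, `kept` and `implied` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


B = 4
# Real frame of size 2B: previous block followed by the current block.
frame = [0.5, -1.0, 2.0, 0.25, 1.5, 0.0, -0.75, 1.0]
spectrum = dft(frame)

kept = spectrum[:B + 1]  # the (B+1) elements that remain
# The last (B-1) elements follow from the kept ones: X[i] = conj(X[2B - i]).
implied = [kept[2 * B - i].conjugate() for i in range(B + 1, 2 * B)]
```

Comparing `implied` with `spectrum[B + 1:]` shows that discarding the last (B-1) elements loses no information for a real input frame.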
- of the remaining (B+1) elements, the first $N_0$ are shifted into buffer 10 and the last $(B+1-N_0)$ are discarded.
- the previous elements of buffer 10 are shifted to the next buffer, namely one buffer below. Every time that elements are shifted by moving downwards into any of the buffers, the elements of the buffer where the elements get shifted to, are also shifted downwards one buffer further below.
- the calculation of the output frame $\mathbf{y}_0$ for the input frame $\mathbf{x}_0$ is done using the initial zero values in all (K-1) buffers below buffer 10 and the non-zero values in buffer 10 resulting from the transformation indicated at 4.
- the last $(N_g - N_{g+1})$ elements of the last buffer of group $g$ are discarded, since $N_g > N_{g+1}$, meaning that the buffers of group $(g+1)$ can each only accommodate $N_{g+1}$ elements.
- the first $N_{g+1}$ elements of the last buffer of group $g$ are shifted into the first buffer of group $(g+1)$ and the last $(N_g - N_{g+1})$ elements of the last buffer of group $g$ are simply discarded.
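The downward shifting with truncation at group boundaries can be sketched as follows. The function name `shift_delay_line`, the list `sizes` (one buffer size per RIR block, assumed non-increasing) and the example values are hypothetical.

```python
def shift_delay_line(buffers, new_spectrum, sizes):
    """Shift every buffer one position down and insert the new spectrum on top.

    buffers[k] holds sizes[k] complex values. When a buffer moves into a
    smaller one (a group boundary), only its first sizes[k] elements are kept.
    """
    shifted = [list(new_spectrum[:sizes[0]])]
    for k in range(1, len(buffers)):
        shifted.append(list(buffers[k - 1][:sizes[k]]))
    return shifted


sizes = [3, 3, 2]                      # two buffers in group 0, one in group 1
buffers = [[1j, 2j, 3j], [4j, 5j, 6j], [7j, 8j]]
new_spectrum = [10 + 0j, 11 + 0j, 12 + 0j, 13 + 0j, 14 + 0j]  # B + 1 = 5 values
buffers = shift_delay_line(buffers, new_spectrum, sizes)
```

After the call, the top buffer holds the first three new values, the second buffer holds the former top buffer, and the last buffer holds only the first two elements of the former second buffer.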
- the complex multiplier indicated at 11 forms (outputs) the complex vector $[H_0 X_{k2B}, H_1 X_{k2B+1}, \ldots, H_{N[0]-1} X_{k2B+N[0]-1}]_{1\times N[0]}$
- This is a complex vector with N[0] elements.
- each of the multipliers under the multiplier indicated at 11 forms in a similar way the element-by-element complex product between the complex contents of its corresponding pair of buffers. These are the buffers marked with the symbol N.
- accumulator 5 appends $(B+1-N_0)$ complex zero samples to the complex vector sum. This corresponds to the removal of the last $(B+1-N_0)$ complex elements when feeding buffer 10 from the output of the transformation indicated at 4.
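The per-frame processing described above can be sketched end to end as follows. This is a minimal, unoptimized sketch and not the patented implementation: the function name `aupols_process`, the naive `dft` helper and the list `sizes` (the number of coefficients kept for each RIR block, assumed non-increasing and at most B+1) are hypothetical. With every entry of `sizes` equal to B+1 nothing is truncated and the result equals the plain linear convolution; smaller entries yield the approximation.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT; with inverse=True the scaled inverse DFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * f * t / n)
               for t in range(n)) for f in range(n)]
    return [v / n for v in out] if inverse else out


def aupols_process(signal, rir, B, sizes):
    """Overlap-save convolution with a truncated spectrum per RIR block."""
    K = len(sizes)
    # Static RIR coefficients: DFT of each zero-padded block, truncated.
    H = []
    for k in range(K):
        block = list(rir[k * B:(k + 1) * B])
        block += [0.0] * (B - len(block))
        H.append(dft(block + [0.0] * B)[:sizes[k]])
    delay = [[0j] * sizes[k] for k in range(K)]   # time-varying buffers
    previous = [0.0] * B                          # signal is zero for n < 0
    output = []
    for start in range(0, len(signal), B):
        current = list(signal[start:start + B])
        current += [0.0] * (B - len(current))
        X = dft(previous + current)[:B + 1]       # keep B + 1 elements
        # Shift the buffers downwards, truncating at group boundaries.
        delay = [X[:sizes[0]]] + [delay[k - 1][:sizes[k]] for k in range(1, K)]
        # Multiply-accumulate; positions beyond sizes[k] stay zero.
        acc = [0j] * (B + 1)
        for k in range(K):
            for i in range(sizes[k]):
                acc[i] += H[k][i] * delay[k][i]
        # Rebuild the full spectrum via conjugate symmetry, then invert.
        full = acc + [acc[2 * B - i].conjugate() for i in range(B + 1, 2 * B)]
        y = dft(full, inverse=True)
        output.extend(v.real for v in y[B:])      # first B samples discarded
        previous = current
    return output


signal = [1.0, -0.5, 0.25, 2.0, 0.0, 1.5, -1.0, 0.75]
rir = [1.0, 0.6, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
exact = aupols_process(signal, rir, B=4, sizes=[5, 5])    # no truncation
approx = aupols_process(signal, rir, B=4, sizes=[5, 3])   # second block truncated
```

A production implementation would use a fast real-input FFT instead of the naive transform; the naive version is used here only to keep the sketch self-contained.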
- Fig. 2 shows a principle drawing of a structure allowing for implementing a method of processing an input audio signal for generating an output audio signal having the reverberation characteristics of a specific acoustic environment according to another exemplary embodiment.
- the structure of Fig. 2 is a structure for a specific numerical scenario, whereby the numbers in Fig. 2 indicate exemplary buffer sizes and exemplary numbers of Discrete Fourier Transform operating coefficients used in the respective buffers.
- the block size B needs to be no less than 256.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Stereophonic System (AREA)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2020/059889 WO2021204363A1 (en) | 2020-04-07 | 2020-04-07 | Method of processing an input audio signal for generating an output audio signal having the reverberation characteristics of a specific acoustic environment |
Publications (1)
Publication Number | Publication Date |
---|---|
EP4133477A1 (en) | 2023-02-15 |
Family
ID=70224390
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20717647.0A Pending EP4133477A1 (en) | 2020-04-07 | 2020-04-07 | Method of processing an input audio signal for generating an output audio signal having the reverberation characteristics of a specific acoustic environment |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP4133477A1 (en) |
WO (1) | WO2021204363A1 (en) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2004203538B2 (en) * | 1998-09-25 | 2006-11-16 | Sony Corporation | Sound effect adding apparatus |
-
2020
- 2020-04-07 EP EP20717647.0A patent/EP4133477A1/en active Pending
- 2020-04-07 WO PCT/EP2020/059889 patent/WO2021204363A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
WO2021204363A1 (en) | 2021-10-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
 | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
 | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
 | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
 | 17P | Request for examination filed | Effective date: 20221011 |
 | AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
 | DAV | Request for validation of the european patent (deleted) | |
 | DAX | Request for extension of the european patent (deleted) | |
 | P01 | Opt-out of the competence of the unified patent court (upc) registered | Effective date: 20230530 |
 | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
 | 17Q | First examination report despatched | Effective date: 20231123 |