EP3583786A1 - Apparatus and method for downmixing multichannel audio signals - Google Patents
Info
- Publication number
- EP3583786A1 (application EP18754857.3A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- channel
- component
- input
- input channel
- computing device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/008—Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/03—Aspects of down-mixing multi-channel audio to configurations with lower numbers of playback channels, e.g. 7.1 -> 5.1
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/07—Generation or adaptation of the Low Frequency Effect [LFE] channel, e.g. distribution or signal processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/03—Application of parametric coding in stereophonic audio systems
Definitions
- the present application relates generally to audio signal processing, and in particular to a computer-implemented method, apparatus, and computer usable program code for downmixing multichannel audio signals.
- a 5.1 surround sound configuration consists of a pair of front speakers (L and R), one center front channel (C), a pair of side speakers (Ls and Rs), and one Low Frequency Effect (LFE), with a conventional order of L, C, R, Ls, Rs, LFE.
- a 7.1 surround sound configuration consists of a pair of front speakers (L and R), one center front channel (C), a pair of side surround speakers (Lss and Rss), a pair of rear surround speakers (Lrs and Rrs), and one Low Frequency Effect (LFE), with a conventional order of L, C, R, Lss, Rss, Lrs, Rrs, and LFE.
- Downmixing is a process that converts a program with a multiple-channel configuration (e.g., a multichannel audio file) into a program with fewer channels.
- a 5.1 surround sound file or a 7.1 surround sound file can be downmixed and played using a two-channel stereophonic playback system while providing a good listening experience to a listener using the two-channel stereophonic playback system.
- each channel of the multichannel audio input is processed individually and separately with a respective processor to produce a one- or two-channel output.
- the process on each channel does not properly consider the meaningful information shared between the speaker pairs. Consequently, the audio output obtained using these conventional downmixing processes is less accurate and can compromise the immersive listening experience.
- An object of the present application is to develop an audio downmix pipeline that processes the input audio channels by pairs.
- the resulting downmixed output has better accuracy while retaining the spatial information of the original multichannel audio stream. While the number of input and output channels can be any meaningful number under this definition, the following description uses 5.1-to-stereo and 7.1-to-stereo downmixes as examples.
- a method for processing a multichannel input audio signal is performed at a computing device having one or more processors, memory, and a plurality of program modules stored in the memory and to be executed by the one or more processors.
- the method includes the following steps: selecting, from the multi-channel input audio signal, a left input channel and a right input channel, wherein the left input channel and the right input channel correspond to a pair of spatially symmetrical signal sources; generating one or more cross-channel features from the left input channel and the right input channel; processing, in accordance with the cross-channel features, the left input channel and the right input channel to generate a left intermediate channel and a right intermediate channel; and combining each of the left intermediate channel and the right intermediate channel with a third input channel of the multi-channel input audio signal to form a two-channel output audio signal.
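As a rough illustration, the claimed steps might be sketched as follows; the function name, the RMS-based cross-channel feature, the level-matching process, and the -3 dB center split are assumptions for illustration only, not the patent's actual algorithm:

```python
import numpy as np

def downmix_pair(left, right, center, eps=1e-12):
    """Sketch of the claimed method: process a spatially symmetric
    L/R pair using a cross-channel feature, then fold in a third
    input channel (illustrative processing, not the patent's)."""
    # Cross-channel feature: RMS level of each side of the pair.
    rms_l = np.sqrt(np.mean(left ** 2))
    rms_r = np.sqrt(np.mean(right ** 2))

    # Process the pair in accordance with the feature: here, pull
    # both levels toward their geometric mean (an assumed example).
    target = np.sqrt(rms_l * rms_r)
    left_mid = left * (target / (rms_l + eps))
    right_mid = right * (target / (rms_r + eps))

    # Combine each intermediate channel with the third (center)
    # channel, using a common -3 dB split for the center.
    c_gain = 1.0 / np.sqrt(2.0)
    return left_mid + c_gain * center, right_mid + c_gain * center
```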
- a computing device comprises: one or more processors; memory; and a plurality of program modules stored in the memory and to be executed by the one or more processors.
- the plurality of program modules, when executed by the one or more processors, causes the computing device to perform the method described above for processing a multichannel input audio signal.
- a computer program product stored in a non-transitory computer-readable storage medium in conjunction with a computing device having one or more processors, the computer program product including a plurality of program modules that, when executed by the one or more processors, cause the computing device to perform the method described above for processing a multichannel audio signal.
- Figure 1A is a block diagram illustrating a conventional downmixing process performed on a 5.1 input signal, in accordance with some embodiments.
- Figure 1B is a block diagram illustrating a conventional LoRo downmixing process performed on a 5.1 input signal, in accordance with some embodiments.
- Figure 1C is a block diagram illustrating a conventional downmixing process of surround sound virtualization or spatialization from a 7.1 input signal, in accordance with some embodiments.
- Figure 2 illustrates a block diagram of a data processing system configured to perform audio downmixing in accordance with an illustrative embodiment of the present application.
- Figures 3A-3B are block diagrams illustrating audio downmix pipelines of processing multichannel input signals in accordance with some embodiments.
- Figure 4A is a block diagram illustrating a signal workflow including a PROC applied to an input pair in accordance with some embodiments.
- Figure 4B is a block diagram illustrating a signal workflow applied to a 7.1 surround sound file in accordance with some embodiments.
- Figure 5 illustrates a user interface of a software application, or a plugin component of a software application, that is used to manage implementing the signal pipeline discussed with reference to Figure 4B for a 7.1 surround sound file in accordance with some embodiments.
- Figures 6A-6C are a flowchart illustrating a process of downmixing a multichannel input audio signal in accordance with some embodiments.
- FIG. 1A is a block diagram illustrating a conventional downmixing process performed on a 5.1 input signal. Each input channel (i.e., L, C, R, Ls, Rs, and LFE) is sent to a respective processor module (PROC).
- the processor can include one or more sub- modules (not shown), such as a gain, a time delay, a low pass filter, and/or other audio processing modules.
- the output of each process module for a respective channel can include one or more channels, depending on the type of the process(es) implemented in the process module. At the end, these outputs are summed up (Σ) to form, in this example, a two-channel audio signal (i.e., L and R output channels).
- Figure 1B is a block diagram illustrating a conventional Left only/Right only (LoRo) downmixing process. Each input channel (i.e., L, C, R, Ls, Rs, and LFE) is passed through a gain module individually.
- the adjustment of the gain depends on the physical location of the channel as if it were reproduced by a surround sound system. While a surround channel may be attenuated more than an L/R channel, the relationship between the left side and the right side is ignored.
- the left channel output is created by adding all channels from the left side, plus attenuated C and LFE signals.
- the center channel is split into two because there is no physical speaker located on the center line in a stereo reproduction setting.
- the LFE channel is also split into two channels.
- the right channel output is created by adding all channels from the right side, plus attenuated C and LFE signals.
- every input channel is treated differently by its individual processor, a simple gain module, which takes a one-channel input and produces a one- or two-channel output. Finally, all the PROC outputs are summed up based on the intended reproduction location of the input.
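The LoRo scheme above amounts to per-channel gains followed by two sums. A minimal sketch, assuming the common -3 dB (0.707) gains for the center, surround, and LFE contributions (the actual coefficients are implementation-dependent, and note how each pair is treated as independent mono channels):

```python
import numpy as np

def loro_downmix(L, C, R, Ls, Rs, LFE,
                 c_gain=0.707, s_gain=0.707, lfe_gain=0.707):
    """Conventional Left-only/Right-only downmix: each input channel
    gets an individual gain; the left/right sums ignore any
    relationship between the symmetric speaker pairs."""
    Lo = L + c_gain * C + s_gain * Ls + lfe_gain * LFE
    Ro = R + c_gain * C + s_gain * Rs + lfe_gain * LFE
    return Lo, Ro
```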
- FIG. 1C is a block diagram illustrating a conventional downmixing process of surround sound virtualization or spatialization from a 7.1 input signal. Each channel (i.e., L, C, R, Lss, Rss, Lrs, Rrs, and LFE) is processed with a head-related transfer function (HRTF). For example, the left channel input is processed by the HRTF representing the left speaker of a surround sound system. A similar process is applied to all other input channels.
- all of the two-channel output sets based on the respective input channels are summed together to become the left channel output and the right channel output, respectively.
- every input channel is also treated differently by an individual processor (e.g., consisting of a gain module and an HRTF filter).
- the processor takes a one-channel input and produces a two-channel output. All the two-channel outputs are summed up to become the final two-channel output.
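The virtualization workflow above can be sketched as a per-channel convolution with a two-ear filter pair followed by summation; the dictionary layout and placeholder impulse responses are assumptions, not measured HRTFs:

```python
import numpy as np

def virtualize(channels, hrtfs):
    """Conventional virtualization sketch: each input channel is
    convolved with its own (left-ear, right-ear) impulse-response
    pair, and all two-channel results are summed. `channels` maps a
    channel name to a signal; `hrtfs` maps the same name to a pair
    of FIR impulse responses (illustrative placeholders)."""
    out_l = out_r = None
    for name, x in channels.items():
        h_l, h_r = hrtfs[name]
        yl = np.convolve(x, h_l)
        yr = np.convolve(x, h_r)
        out_l = yl if out_l is None else out_l + yl
        out_r = yr if out_r is None else out_r + yr
    return out_l, out_r
```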
- FIG. 2 illustrates a block diagram of a data processing system 100 configured to perform audio downmixing in accordance with an illustrative embodiment of the present application.
- the data processing system 100 includes communications fabric 102, which provides communications between processor unit 104, memory 106, persistent storage 108, communications unit 110, input/output (I/O) unit 112, display 114, and one or more speakers 116.
- the speakers 116 may be built into the data processing system 100 or external to the data processing system 100.
- the data processing system 100 takes the form of a laptop computer, a desktop computer, a tablet computer, a mobile phone (such as a smartphone), a multimedia player device, a navigation device, an educational device (such as a child's learning toy), a gaming system, an audio/video (AV) receiver, or a control device (e.g., a home or industrial controller).
- Processor unit 104 serves to execute instructions for software programs that may be loaded into memory 106.
- Processor unit 104 may be a set of one or more processors or may be a multi-processor core, depending on the particular implementation. Further, processor unit 104 may be implemented using one or more heterogeneous processor systems in which a main processor is present with secondary processors on a single chip. As another illustrative example, processor unit 104 may be a symmetric multi-processor system containing multiple processors of the same type.
- Memory 106 may be a random access memory or any other suitable volatile or non- volatile storage device.
- Persistent storage 108 may take various forms depending on the particular implementation.
- persistent storage 108 may contain one or more components or devices such as a hard drive, a flash memory, a rewritable optical disk, a rewritable magnetic tape, or some combination of the above.
- the media used by persistent storage 108 may also be removable.
- a removable hard drive may be used for persistent storage 108.
- Communications unit 110, in these examples, provides for communications with other data processing systems or devices. In these examples, communications unit 110 is a network interface card. Communications unit 110 may provide communications through the use of either or both physical and wireless communications links.
- Input/output unit 112 allows for input and output of data with other devices that may be connected to data processing system 100.
- input/output unit 112 may provide a connection for user input through a keyboard and mouse. Further, input/output unit 112 may send output to a printer.
- Display 114 provides a mechanism to display information to a user. Speakers 116 play out sounds to the user.
- Instructions for the operating system and applications or programs are located on persistent storage 108. These instructions may be loaded into memory 106 for execution by processor unit 104. The processes of the different embodiments as described below may be performed by processor unit 104 using computer implemented instructions, which may be located in a memory, such as memory 106. These instructions are referred to as program code (or module), computer usable program code (or module), or computer readable program code (or module) that may be read and executed by a processor in processor unit 104.
- the program code (or module) in the different embodiments may be embodied on different physical or tangible computer readable media, such as memory 106 or persistent storage 108.
- Program code/module 120 is located in a functional form on the computer readable storage media 118 that is selectively removable and may be loaded onto or transferred to data processing system 100 for execution by processor unit 104.
- Program code/module 120 and computer readable storage media 118 form computer program product 122 in these examples.
- computer readable storage media 118 may be in a tangible form, such as, for example, an optical or magnetic disc that is inserted or placed into a drive or other device for transfer onto a storage device, such as a hard drive, that is part of the persistent storage 108.
- the computer readable storage media 118 may also take the form of a persistent storage, such as a hard drive, a thumb drive, or a flash memory that is connected to data processing system 100.
- the tangible form of computer readable storage media 118 is also referred to as computer recordable storage media. In some instances, the computer readable storage media 118 may not be removable from the data processing system 100.
- program code/module 120 may be transferred to data processing system 100 from computer readable storage media 118 through a communications link to communications unit 110 and/or through a connection to input/output unit 112.
- the communications link and/or the connection may be physical or wireless in the illustrative examples.
- the computer readable media also may take the form of non-tangible media, such as communications links or wireless transmissions containing the program code/module.
- the different components illustrated for data processing system 100 are not meant to provide architectural limitations to the manner in which different embodiments may be implemented.
- the different illustrative embodiments may be implemented in a data processing system including components in addition to or in place of those illustrated for data processing system 100.
- Other components shown in Figure 2 can be varied from the illustrative examples shown.
- a storage device in data processing system 100 is any hardware apparatus that may store data.
- Memory 106, persistent storage 108, and computer readable storage media 118 are examples of storage devices in a tangible form.
- a bus system may be used to implement communications fabric 102 and may be comprised of one or more buses, such as a system bus or an input/output bus.
- the bus system may be implemented using any suitable type of architecture that provides for a transfer of data between different components or devices attached to the bus system.
- a communications unit may include one or more devices used to transmit and receive data, such as a modem or a network adapter.
- a memory may be, for example, memory 106 or a cache such as found in an interface and memory controller hub that may be present in communications fabric 102.
- FIGs 3A-3B are block diagrams illustrating audio downmix pipelines of processing multichannel input signals in accordance with some embodiments.
- the multichannel input signal in Figure 3 A is a 5.1 surround sound file 210 including a left front channel (L), a center front channel (C), a right front channel (R), a left side channel (Ls), a right side channel (Rs), and a Low Frequency Effect (LFE).
- the multichannel input signal in Figure 3B is a 7.1 surround sound file 240 including a left front channel (L), a center front channel (C), a right front channel (R), a left side surround channel (Lss), a right side surround channel (Rss), a left rear surround channel (Lrs), a right rear surround channel (Rrs), and a Low Frequency Effect (LFE).
- the system selects one or more input pairs from the multichannel input signal.
- an input pair corresponds to two sets of audio streams that are intended to be reproduced at speakers placed on symmetrical sides.
- the input pair includes a pair of spatially symmetrical signal sources. In some embodiments, an input pair includes two audio channels on the two sides (i.e., left side and right side) at the same angle from the center line.
- the front pair includes the left front (L) channel and the right front (R) channel, which are 30° to the left and right of the center line respectively.
- the rear pair in a Dolby 7.1 surround sound setup places the left rear surround (Lrs) channel and the right rear surround (Rrs) channel at 135° to the left and right of the center line respectively.
- Each selected input pair is then sent to a respective processor (PROC) that produces an output audio pair.
- the system selects the left front channel (L) and the right front channel (R) as a pair 222, and the left side channel (Ls) and the right side channel (Rs) as a pair 224.
- the system selects the left front channel (L) and the right front channel (R) as a pair 242, the left side surround channel (Lss) and the right side surround channel (Rss) as a pair 244, and the left rear surround channel (Lrs) and the right rear surround channel (Rrs) as a pair 246.
- the channel sitting on the center line (e.g., the center front C channel) and the omnidirectional channel (e.g., LFE) are single channels and will not be paired with any other channel.
- each pair will be passed into a processor respectively.
- pairs 222 and 224 are sent to PROCs 232 and 234 respectively
- pairs 242, 244, and 246 are sent to PROCs 252, 254, and 256 respectively.
- the two channels of a pair are compared and analyzed against each other in order to create a more solid sound image and a better spatial outcome.
- at least one of the one or more modules in each processor PROC needs to cross-reference information between two channels within the pair.
- the output of each processor PROC includes two channels, and the two-channel output of each pair is summed (Σ) with the single channels to create an output signal including a left channel output (L') and a right channel output (R') (as shown in Figures 3A and 3B, respectively).
- the processor consists of at least one module that incorporates cross-channel features from an input pair.
- a pair processor includes one or more modules that retrieve signal information from an input pair. Based on the input pair signal information, the pair processor (PROC) then modifies the output stream on a pair basis.
- a pair processor consists of a plurality of different components (or modules), including channel-dependent components and/or channel-independent components.
- a channel-dependent component performs a multichannel-in-multichannel-out process, e.g., a two-in-two-out process in a pair processor.
- the channel-dependent component produces multiple output channels with each based on more than one input channel.
- a channel-dependent component uses the information of the input signal and adjusts the process based on extracted cross-channel features.
- the cross-channel features include a comparison of the volumes of the respective input channels, the relationship of the frequency spectrum characteristics (e.g., magnitude and/or phase) of the left and right input channels, and/or the time and amplitude differences of the signal onsets of the left and right input channels.
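A sketch of how such cross-channel features might be computed for an input pair; the concrete feature definitions below (RMS ratio, spectral-magnitude correlation, cross-correlation lag) are illustrative assumptions, not the patent's specification:

```python
import numpy as np

def cross_channel_features(left, right, eps=1e-12):
    """Illustrative cross-channel features for a symmetric pair."""
    # Volume comparison: ratio of RMS levels of the two channels.
    rms_l = np.sqrt(np.mean(left ** 2))
    rms_r = np.sqrt(np.mean(right ** 2))
    level_ratio = rms_l / (rms_r + eps)

    # Spectral relationship: correlation of the magnitude spectra.
    mag_l = np.abs(np.fft.rfft(left))
    mag_r = np.abs(np.fft.rfft(right))
    spectral_corr = float(np.corrcoef(mag_l, mag_r)[0, 1])

    # Onset timing: lag of the cross-correlation peak (samples).
    xcorr = np.correlate(left, right, mode="full")
    onset_lag = int(np.argmax(np.abs(xcorr)) - (len(left) - 1))

    return {"level_ratio": level_ratio,
            "spectral_corr": spectral_corr,
            "onset_lag": onset_lag}
```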
- the pair processor includes one or more channel-dependent components such as a mid/side (M/S) mixer and/or a width controller (WC).
- an M/S mixer can produce mid and side signals using the sum and difference of the input left and right signals.
- an M/S mixer can produce mid and side signals by comparing the overlap region of the frequency spectra of the input signals.
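The sum/difference form of an M/S mixer can be written directly; the 1/2 scaling is one common convention, chosen here so that decoding reconstructs the inputs exactly:

```python
import numpy as np

def ms_encode(left, right):
    """Mid/side from the sum and difference of the input pair."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    return mid, side

def ms_decode(mid, side):
    """Inverse: recover the left/right pair from mid/side."""
    return mid + side, mid - side
```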
- a channel-independent component is a multichannel-in-multichannel-out process (including two-in-two-out) that treats each respective channel of the multichannel input file separately.
- the channel-independent component processes the multichannel input file as if the multiple channels were split into the same number of mono signals, each mono signal processed independently, and the processed mono signals of the respective channels then summed together.
- the pair processor includes one or more channel-independent components, such as an equalizer (EQ) and/or a dynamic range compressor (DRC).
- an equalizer (EQ) module takes in each channel respectively and produces the corresponding channel without using any information from other channels.
- the result of an equalizer (EQ) is as if each channel was equalized separately with the same input parameter.
- Figure 4A is a block diagram illustrating a signal workflow including a PROC 420 applied to an input pair 410 in accordance with some embodiments.
- the PROC 420 includes a plurality of modules (also referred to as components) that are configured to process the input pair signal 410.
- the PROC 420 can be any pair processor, such as PROC 232, PROC 234, PROC 252, PROC 254, or PROC 256 as illustrated in Figures 3A-3B. Accordingly, the pair processor 420 can be applied to any input pair, such as pair 222, pair 224, pair 242, pair 244, or pair 246 as illustrated in Figures 3A-3B.
- the pair processor 420 includes a mid/side (M/S) mixer 422, an equalizer (EQ) 428, a dynamic range compressor (DRC) 430, and a crosstalk cancellation (XTC) module 432 coupled to each other as shown in Figure 4A.
- the output signal of the PROC 420 is further processed with a width controller (WC) 434 and another dynamic range compressor (DRC) 436 to obtain the output pair 440 as shown in Figure 4A.
- the data processing system 100 first sends the input left and right signals 410 into the M/S mixer 422.
- the M/S mixer 422 is a mixing tool configured to generate three components (two side components (S) 424, i.e., left side and right side, and one mid component (M) 426) from the input pair 410.
- the left side component represents the sound source that appears only at the left channel, whereas the right side component corresponds to the sound that appears only at the right channel.
- the middle component is the sound source that appears only in the phantom center of the soundstage, e.g., main musical element and dialog.
- the M/S mixer 422 separates out the information that is useful for various subsequent soundstage enhancement and minimizes unnecessary distortion in the sound quality (e.g., coloration). Moreover, this step also helps lower the correlation between the left and right components.
- the M/S mixer 422 analyzes the sound image and estimates the sound coming from the center, left, and right. The M/S mixer 422 then splits the two input channels, i.e., the left and right signals 410, into a one-channel mid signal 426 and two-channel side signals 424.
- a more detailed description of the M/S mixer 422 can be found in PCT Application No. PCT/US2015/057616, entitled "APPARATUS AND METHOD FOR SOUND STAGE ENHANCEMENT" and filed on October 27, 2015, which is incorporated by reference in its entirety.
- the system sends the side signals 424 to the equalizer (EQ) 428 to adjust the frequency component of the side signals 424.
- the EQ 428 applied to the two side signals 424 includes one or more multi-band equalizers for performing bandpass filtering on the two side signals.
- the multi-band equalizers applied to each side signal are the same.
- the multi-band equalizers applied to one side signal are not the same as those applied to the other side signal. Nonetheless, their functions are to keep the original color of the sound signals and to avoid ambiguous spatial cues present in these two signals.
- this EQ 428 can also be used to select the target sound source based on the spectral analysis of the two side components.
- the EQ 428 produces two output signals 450 and 452.
- each of the output signals 450 and 452 is a two-channel audio signal.
- the EQ 428 applies respective bandpass filters to each of the two-channel side signals 424 to obtain bandpass-filtered signals 450.
- the EQ 428 also produces residual signals 452 based on the difference in the frequency band between the input signals of the EQ 428 (i.e., the two-channel side signals 424) and the output signals of the EQ 428 (i.e., the two-channel bandpass-filtered signals 450).
- the data processing system 100 generates a left-side residual component and a right-side residual component by subtracting the bandpass-filtered left-side component and the bandpass-filtered right-side component from the left-side component and the right-side component, respectively.
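The residual computation (EQ input minus bandpass-filtered EQ output, per side) can be sketched as follows; the short FIR smoothing kernel below merely stands in for the EQ's actual bandpass filter, which is an implementation detail:

```python
import numpy as np

def bandpass(x):
    """Stand-in for the EQ's bandpass stage (illustrative only: a
    short FIR smoothing kernel, not the patent's actual filter)."""
    kernel = np.array([0.25, 0.5, 0.25])
    return np.convolve(x, kernel, mode="same")

def side_residuals(left_side, right_side):
    """Residual = EQ input minus the bandpass-filtered EQ output,
    computed separately for the left and right side components."""
    res_l = left_side - bandpass(left_side)
    res_r = right_side - bandpass(right_side)
    return res_l, res_r
```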
- a respective amplifier is applied to the residual signal and the result signal from the crosstalk cancellation to adjust the gains of the two signals before they are combined together.
- the bandpass-filtered signals 450 are sent to a dynamic range compressor (DRC) 430.
- the DRC 430 includes a bandpass filter (different from the bandpass filter of EQ 428) for amplifying the two sound signals (i.e., the two-channel bandpass-filtered signals 450) within a predefined frequency range in order to maximize the soundstage enhancement effect achieved by the crosstalk cancellation block (XTC) 432.
- after performing equalization on the left-side component and the right-side component using the first bandpass filter of the EQ 428, the data processing system 100 removes a predefined frequency band from the left-side component and the right-side component using a second bandpass filter of the DRC 430.
- Representative bandpass filters used in the EQ block 428 and the DRC block 430 include a biquadratic filter or a Butterworth filter.
- after performing equalization on the left-side component and the right-side component using the first bandpass filter of the EQ 428, the data processing system 100 performs a first dynamic range compression with the DRC 430 on the left-side component and the right-side component to highlight a predefined frequency band with respect to other frequencies.
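A minimal, memoryless dynamic range compression sketch; the threshold and ratio are placeholder values, and a real DRC would add attack/release smoothing and the band-highlighting described above:

```python
import numpy as np

def compress(x, threshold=0.5, ratio=4.0):
    """Instantaneous compression sketch: sample magnitudes above the
    threshold are scaled down by the ratio; samples at or below the
    threshold pass through unchanged."""
    mag = np.abs(x)
    over = mag > threshold
    gain = np.ones_like(mag)
    # Above threshold: output magnitude = threshold + excess / ratio.
    gain[over] = (threshold + (mag[over] - threshold) / ratio) / mag[over]
    return x * gain
```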
- a more detailed description of the DRC 430 can be found in PCT Application No. PCT/US2015/057616, entitled "APPARATUS AND METHOD FOR SOUND STAGE ENHANCEMENT" and filed on October 27, 2015, which is incorporated by reference in its entirety.
- the output signals from the DRC 430 are then sent to the crosstalk cancellation (XTC) module 432 for performing the crosstalk cancellation process.
- Crosstalk is an inherent problem in stereo (i.e., two-channel) loudspeaker playback. It occurs when the sound from each speaker reaches the ear on the opposite side, and introduces unwanted spectral coloration to the original signal.
- the solution to this problem is a crosstalk cancellation (XTC) algorithm.
- One type of the XTC algorithm is to use a generalized directional binaural transfer function, such as Head-Related Transfer Functions (HRTFs) and/or Binaural Room Impulse Response (BRIR), to represent the angles of the two physical loudspeakers with respect to the listener's position.
- Another type of the XTC algorithm system is a recursive crosstalk cancellation method that does not require head-related transfer function (HRTF), binaural room impulse response (BRIR), or any other binaural transfer functions.
- the basic algorithm can be formulated as follows:
- left′[n] = left[n] − A_L · right[n − d_L]
- right′[n] = right[n] − A_R · left[n − d_R]
- where A_L and A_R are the attenuation gains and d_L and d_R are the delays (in samples) of the two crosstalk paths.
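The basic cancellation step can be sketched as subtracting an attenuated, delayed copy of the opposite channel from each channel. The gain `a` and delay `d` below are illustrative placeholders; a real system would derive per-channel values from the loudspeaker geometry:

```python
def cancel_crosstalk(left, right, a=0.7, d=8):
    """One cancellation pass: subtract an attenuated, delayed copy of the
    opposite channel. Gain `a` and delay `d` (in samples) are illustrative."""
    n = len(left)
    out_l = list(left)
    out_r = list(right)
    for i in range(n):
        if i >= d:
            out_l[i] = left[i] - a * right[i - d]
            out_r[i] = right[i] - a * left[i - d]
    return out_l, out_r
```

The recursive method described in the text would repeat such passes (or feed back the processed channels) so that each cancellation signal is itself cancelled at the opposite ear.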
- the XTC 432 as shown in Figure 4A uses the recursive crosstalk cancellation method or the generalized directional binaural transfer function. A more detailed description of the XTC 432 can be found in US Patent Application No. 14/569,490, entitled
- the output signals of XTC 432 are fed into an amplifier 462, the pair of residual signals 452 is fed into an amplifier 464, and the mid component (M) 426 is also fed into an amplifier 466 before they are sent to the width controller (WC) 434 for processing and being combined together.
- the output signals of the amplifiers 462, 464, and 466 are sent to the WC 434 to adjust the width of the stage.
- WC 434 uses the analyzed information of the input signal pair to control the soundstage width of the output audio signal.
- a stage width can range from as narrow as 0° to as wide as 360° for a totally immersive sound.
- the cross-channel information of a pair (e.g., the output pair 472 or 474)
- a user can assign the desired stage width to the Width Controller, as illustrated below in Figure 5.
- the assigned stage width may affect the pair summing matrix of the WC 434 based on the previously analyzed information.
- a width of the soundstage can be adjusted using the following equations:
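A common mid/side formulation of stage-width adjustment, which a width controller such as the WC 434 may resemble (the patent's exact equations may differ), scales the side component relative to the mid component:

```python
def adjust_width(left, right, width=1.0):
    """Mid/side width control sketch: width=0 collapses to mono,
    width=1 leaves the image unchanged, width>1 widens it.
    This is a common formulation, not necessarily the patent's."""
    out_l, out_r = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)
        side = 0.5 * (l - r)
        out_l.append(mid + width * side)
        out_r.append(mid - width * side)
    return out_l, out_r
```

A user-assigned stage width (as on the UI of Figure 5) would map onto the `width` factor applied to the pair.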
- the output signals 476 of the WC 434 are sent to the second dynamic range compressor (DRC) 436 to amplify the overall output level of the sound signals in the audio mastering process.
- the data processing system 100 performs a second dynamic range compression by DRC 436 to the left-side component and the right-side component to preserve the localization cues in the digital audio output signals.
- the output of the pipeline is a stereo audio signal 440 including a left channel (L') and a right channel (R').
- the signal workflow including PROC 420 can be applied to a 5.1 surround sound file as shown in Figure 3A.
- PROC 232 and/or PROC 234 may be similar to PROC 420 as illustrated in Figure 4A.
- FIG. 4B is a block diagram illustrating a signal workflow applied to a 7.1 surround sound file in accordance with some embodiments.
- the input signals of the 7.1 surround sound file are grouped into pairs, i.e., L/R pair 242, Lss/Rss pair 244, and Lrs/Rrs pair 246 respectively.
- the L/R pair 242, the Lss/Rss pair 244, and the Lrs/Rrs pair 246 are then sent to PROC 252, PROC 254, and PROC 256, respectively.
- a respective PROC of the PROC 252, PROC 254, and PROC 256 is similar to PROC 420 as discussed with reference to Figure 4A.
- PROC 252, PROC 254, or PROC 256 may include one or more other modules (or components) in different arrangements.
- the respective output signals of PROC 252, PROC 254, and PROC 256 are sent to a respective Width Control, e.g., WC 482, WC 484, and WC 486 respectively.
- the WC 482, WC 484, or WC 486 may be similar to WC 434 as discussed with reference to Figure 4A.
- the outputs of WC 482, WC 484, and WC 486 are combined with the center channel C and the Low Frequency Effect channel LFE to generate the output stereo audio signal 488.
- FIG. 5 illustrates a user interface (UI) 500 of a software application, or a plugin component of a software application, that is used to manage implementing the signal pipeline as discussed with reference to Figure 4B for a 7.1 surround sound file in accordance with some embodiments.
- the signal pipeline may include a plurality of pair processors, such as PROC 420 as shown in Figure 4A. There are three pairs in this case: front pair (L, R), side pair (Lss, Rss), and rear pair (Lrs, Rrs).
- the left panel 510 of the UI 500 controls the gain of respective input channels.
- the controlling area 520 controls the frequency component of the equalizer (EQ).
- the controlling area 530 controls the width of the width controller.
- each pair passes through a different pair processor (PROC) with different parameters; thus, the controlling area 540 can be used to select which pair (e.g., front pair, side pair, or rear pair) receives the input parameters for PROC processing.
- A multi-channel audio format utilizes multiple audio tracks in order to reconstruct sound on a corresponding multi-channel sound reproduction system. Downmixing can happen on both the production end and the reproduction end. On the production end, sound mixers normally start mixing with the highest channel count and downmix to lower channel counts. On the reproduction end, a multi-channel audio track can be downmixed to a lower channel count in order to fit the channel number of the reproduction system. In both cases, the goal is to maintain the use and placement of sound that matches the original creative intent. [0056] Conventional downmixing methods, as shown in Figures 1A-1C, apply a process to each individual track.
- the relationship between tracks is not taken into account.
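The conventional per-track approach can be sketched as a fixed gain matrix applied to each channel independently. The -3 dB coefficients below follow the common ITU-R BS.775 convention and are illustrative; the point is that no cross-channel analysis occurs:

```python
import math

def conventional_downmix(ch):
    """Conventional 5.1-to-stereo downmix sketch: each track is scaled and
    summed independently (ITU-R BS.775-style -3 dB gains for center and
    surrounds, assumed here for illustration). `ch` maps channel names
    'L', 'R', 'C', 'Ls', 'Rs' to equal-length sample lists."""
    g = 1 / math.sqrt(2)  # -3 dB
    n = len(ch["L"])
    out_l = [ch["L"][i] + g * ch["C"][i] + g * ch["Ls"][i] for i in range(n)]
    out_r = [ch["R"][i] + g * ch["C"][i] + g * ch["Rs"][i] for i in range(n)]
    return out_l, out_r
```

Each output sample depends only on the corresponding samples of individual tracks, so inter-channel cues (relative level, timing, phase) are never examined, which is the limitation the pair-by-pair method addresses.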
- the method and system presented herein take audio input on a pair-by-pair basis.
- the multichannel audio tracks are first grouped into pairs depending on the physical placement of the reproduction system.
- Second, the relationship between the two channels of a pair will be analyzed.
- the pair input is processed based on the analyzing result.
- the multichannel audio output downmixed with the system and method introduced herein creates a more accurate soundscape, closer to the original multi-channel audio input, than the same audio input downmixed with conventional methods.
- Figures 6A-6C are a flowchart illustrating a process of downmixing a multi-channel input audio signal in accordance with some embodiments.
- the data processing system 100 selects (602) a left input channel and a right input channel from the multi-channel input audio signal.
- the left input channel and the right input channel correspond to a pair of spatially symmetrical signal sources.
- the multi-channel audio input signal is the 5.1 surround sound file 210 of Figure 3A or the 7.1 surround sound file 240 of Figure 3B.
- the 5.1 surround input signal 210 includes a left front channel L and a right front channel R as a pair 222, and a left side channel Ls and a right side channel Rs as a pair 224.
- the 7.1 surround input signal 240 includes a left front channel L and a right front channel R as a pair 242, a left side surround channel Lss and a right side surround channel Rss as a pair 244, and a left rear surround channel Lrs and a right rear surround channel Rrs as a pair 246.
- the data processing system 100 then generates (604) one or more cross- channel features from the left input channel and the right input channel of a selected pair.
- the one or more cross-channel features include a comparison of volumes of the left and right input channels of a pair, relationship of the frequency spectrum characteristics (e.g., magnitude and/or phase) of the left and right input channels, and/or time and amplitude differences of the signal onset of the left and right input channels.
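Two of the features listed above can be sketched directly: a volume comparison via an RMS ratio, and the time difference of signal onsets. The half-peak onset criterion is an illustrative assumption; the patent's analysis may use richer features (e.g., per-band magnitude and phase):

```python
import math

def cross_channel_features(left, right):
    """Sketch of the cross-channel analysis of step 604: compare channel
    volumes (RMS ratio) and estimate the onset-time difference in samples.
    The onset criterion (first sample reaching half the peak) is assumed."""
    def rms(x):
        return math.sqrt(sum(s * s for s in x) / len(x))

    volume_ratio = rms(left) / max(rms(right), 1e-12)

    def onset(x, frac=0.5):
        peak = max(abs(s) for s in x)
        for i, s in enumerate(x):
            if abs(s) >= frac * peak:
                return i
        return 0

    onset_lag = onset(left) - onset(right)  # positive: left arrives later
    return {"volume_ratio": volume_ratio, "onset_lag": onset_lag}
```

Features of this kind give the downstream PROC and width controller the inter-channel relationship that per-track downmixing discards.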
- the data processing system 100 then processes (606), in accordance with the cross-channel features, the left input channel and the right input channel of the selected pair to generate a left intermediate channel and a right intermediate channel.
- the left input channel and the right input channel of a pair are processed using a processor (PROC) as illustrated in Figures 3A-3B and 4A-4B.
- the PROC includes one or more modules as illustrated in Figure 4A.
- the data processing system 100 combines (608) each of the left intermediate channel and the right intermediate channel with a third input channel of the multi-channel input audio signal to form a two-channel output audio signal.
- For example, as illustrated in Figure 3A, the left intermediate channel and the right intermediate channel (e.g., the L/R pair or the Ls/Rs pair) processed by a respective PROC are combined with the center channel C and/or the Low Frequency Effect (LFE) channel to produce the two-channel output audio signal L'/R'.
- the left intermediate channel and the right intermediate channel (e.g., the L/R pair, the Lss/Rss pair, or the Lrs/Rrs pair) processed by a respective PROC are combined with the center channel C and/or the Low Frequency Effect (LFE) channel to produce the two-channel output audio signal L'/R'.
- a soundstage is normally defined as the area between the left-most perceived location and right-most perceived location in an audio reproduction. In other words, the soundstage is the limit of how far a sound object can be perceived.
- the soundstage width is defined as the distance between the left boundary and the right boundary.
- the soundstage width of a stereophonic reproduction is the separation distance between the two loudspeakers.
- the concept of soundstage is adapted to apply to each symmetric pair of channels independently.
- the data processing system 100 further adjusts (610) a soundstage width associated with the left intermediate channel and the right intermediate channel using a width controller (e.g., the WC 434) before combining the left intermediate channel and the right intermediate channel with the third input channel.
- the data processing system 100 receives (612) a user input specifying the soundstage width of the two-channel output audio signal. The user input can be received on UI 500 as illustrated in Figure 5.
- the step 606 of processing the left input channel and the right input channel further comprises extracting (614) a middle component, a left side component, and a right side component from the left input channel and the right input channel.
- the input pair 410 is processed by the M/S mixer 422 to produce a middle component (M) 426 and a side component pair (S) 424 comprising a left side component and a right side component.
- the data processing system 100 processes (616) the left side component and the right side component before combining them with the middle component to generate the left intermediate channel and the right intermediate channel.
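The extraction of step 614 can be sketched with one common mid/side decomposition, assumed here: a shared middle component as the channel average, with each side component as that channel's remainder, so the pair can be reconstructed exactly:

```python
def ms_extract(left, right):
    """Sketch of the M/S mixer 422: a shared middle component plus
    per-channel side components. M = (L + R) / 2 is one common choice,
    assumed here; the patent's exact mixer may differ.
    Reconstruction: L = M + side_l, R = M + side_r."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side_l = [l - m for l, m in zip(left, mid)]
    side_r = [r - m for r, m in zip(right, mid)]
    return mid, side_l, side_r
```

Because the decomposition is exactly invertible, the side components can be equalized, compressed, and crosstalk-cancelled independently and later recombined with the untouched middle component.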
- the step 606 of processing the left input channel and the right input channel further comprises performing (618) equalization (e.g., by the EQ block 428 of Figure 4A) to the left side component and the right side component using a bandpass filter to obtain a left bandpass-filtered component and a right bandpass-filtered component (e.g., the bandpass-filtered signals 450).
- the equalization process further generates (620) a left-side residual component based on a difference between the left side component and the left bandpass-filtered component, and a right-side residual component based on a difference between the right side component and the right bandpass- filtered component, such as the left-side residual component and the right-side residual component 452 as illustrated in Figure 4A.
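The residual generation of step 620 is a simple subtraction against the bandpass output, as sketched below with the filter passed in as a callable (the filter itself is whatever the EQ 428 uses):

```python
def split_band(side, bandpass):
    """Steps 618/620 sketch: equalize a side component with a bandpass
    filter and keep the residual (side minus bandpassed part) so the two
    halves can be recombined losslessly after further processing."""
    filtered = bandpass(side)
    residual = [s - f for s, f in zip(side, filtered)]
    return filtered, residual
```

Since `filtered + residual` reproduces the original side component sample-for-sample, only the bandpassed portion needs to pass through the DRC/XTC chain while the residual bypasses it, matching the amplifier-and-sum arrangement described earlier.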
- the data processing system 100 after performing equalization to the left side component and the right side component, performs (622) a first dynamic range compression (e.g., by the DRC 430 of Figure 4A) to the left bandpass- filtered component and the right bandpass-filtered component (e.g., produced by the EQ 428 of Figure 4A), respectively, to obtain a left compressed component and a right compressed component correspondingly.
- the data processing system 100 after performing the first dynamic range compression, performs (624) crosstalk cancellation (e.g., by the XTC 432 of Figure 4A) to the left compressed component and the right compressed component (e.g., produced by DRC 430 of Figure 4A), respectively, to obtain a crosstalk-cancelled left-side component and a crosstalk-cancelled right-side component.
- the data processing system 100 combines (626) the crosstalk-cancelled left-side component and the crosstalk-cancelled right-side component, the left-side residual component and the right-side residual component, and the middle component to generate the left intermediate channel and the right intermediate channel.
- the combining step further comprises adjusting (628) a soundstage width (e.g., by the WC 434 of Figure 4A) associated with the left intermediate channel and the right intermediate channel before combining them with the third input channel.
- the left and right intermediate channels produced by a respective PROC are sent to a respective WC for adjusting the soundstage width.
- the adjusted signals are then combined with a third input channel, e.g., the C or the LFE channel, to produce the output stereo signal 488.
- the data processing system 100 after adjusting the soundstage width, performs (630) a second dynamic range compression (e.g., by DRC 436 of Figure 4A) to generate the left intermediate channel and the right intermediate channel.
- the invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements.
- the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
- the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
- a computer-usable or computer readable medium can be any tangible apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
- Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk.
- Current examples of optical disks include compact disk - read only memory (CD-ROM), compact disk - read/write (CD-R/W) and DVD.
- a data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus.
- the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
- the data processing system is implemented in the form of a semiconductor chip (e.g., a system-on-chip) that integrates all components of a computer or other electronic system into a single chip substrate.
- I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
- Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
- the terms first, second, etc. may be used herein to describe various elements, but these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
- a first port could be termed a second port, and, similarly, a second port could be termed a first port, without departing from the scope of the embodiments.
- the first port and the second port are both ports, but they are not the same port.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Stereophonic System (AREA)
Abstract
Description
Claims
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762460584P | 2017-02-17 | 2017-02-17 | |
PCT/US2018/000075 WO2018151858A1 (en) | 2017-02-17 | 2018-02-16 | Apparatus and method for downmixing multichannel audio signals |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3583786A1 true EP3583786A1 (en) | 2019-12-25 |
EP3583786A4 EP3583786A4 (en) | 2020-12-23 |
Family
ID=63169877
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP18754857.3A Withdrawn EP3583786A4 (en) | 2017-02-17 | 2018-02-16 | Apparatus and method for downmixing multichannel audio signals |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP3583786A4 (en) |
JP (1) | JP2020508590A (en) |
KR (1) | KR20190109726A (en) |
CN (1) | CN109644315A (en) |
TW (1) | TW201843675A (en) |
WO (1) | WO2018151858A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR3091636B1 (en) * | 2019-01-04 | 2020-12-11 | Parrot Faurecia Automotive Sas | Multichannel audio signal processing method |
US10841728B1 (en) * | 2019-10-10 | 2020-11-17 | Boomcloud 360, Inc. | Multi-channel crosstalk processing |
US11032644B2 (en) | 2019-10-10 | 2021-06-08 | Boomcloud 360, Inc. | Subband spatial and crosstalk processing using spectrally orthogonal audio components |
CN110853658B (en) * | 2019-11-26 | 2021-12-07 | 中国电影科学技术研究所 | Method and apparatus for downmixing audio signal, computer device, and readable storage medium |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2000242253A1 (en) * | 2000-04-10 | 2001-10-23 | Harman International Industries Incorporated | Creating virtual surround using dipole and monopole pressure fields |
CN1233200C (en) * | 2003-03-04 | 2005-12-21 | Tcl王牌电子(深圳)有限公司 | FPGA 5.1 channel virtual speech reproducing method and device |
KR100644617B1 (en) * | 2004-06-16 | 2006-11-10 | 삼성전자주식회사 | Apparatus and method for reproducing 7.1 channel audio |
US7853022B2 (en) * | 2004-10-28 | 2010-12-14 | Thompson Jeffrey K | Audio spatial environment engine |
US7751572B2 (en) * | 2005-04-15 | 2010-07-06 | Dolby International Ab | Adaptive residual audio coding |
JP5053849B2 (en) * | 2005-09-01 | 2012-10-24 | パナソニック株式会社 | Multi-channel acoustic signal processing apparatus and multi-channel acoustic signal processing method |
US8050434B1 (en) * | 2006-12-21 | 2011-11-01 | Srs Labs, Inc. | Multi-channel audio enhancement system |
EP2169664A3 (en) * | 2008-09-25 | 2010-04-07 | LG Electronics Inc. | A method and an apparatus for processing a signal |
UA101542C2 (en) * | 2008-12-15 | 2013-04-10 | Долби Лабораторис Лайсензин Корпорейшн | Surround sound virtualizer and method with dynamic range compression |
KR101387195B1 (en) * | 2009-10-05 | 2014-04-21 | 하만인터내셔날인더스트리스인코포레이티드 | System for spatial extraction of audio signals |
JP5955862B2 (en) * | 2011-01-04 | 2016-07-20 | ディーティーエス・エルエルシーDts Llc | Immersive audio rendering system |
TWI479905B (en) * | 2012-01-12 | 2015-04-01 | Univ Nat Central | Multi-channel down mixing device |
CN106170991B (en) * | 2013-12-13 | 2018-04-24 | 无比的优声音科技公司 | Device and method for sound field enhancing |
CN108293165A (en) * | 2015-10-27 | 2018-07-17 | 无比的优声音科技公司 | Enhance the device and method of sound field |
-
2018
- 2018-02-16 KR KR1020197007657A patent/KR20190109726A/en unknown
- 2018-02-16 WO PCT/US2018/000075 patent/WO2018151858A1/en unknown
- 2018-02-16 EP EP18754857.3A patent/EP3583786A4/en not_active Withdrawn
- 2018-02-16 JP JP2019503460A patent/JP2020508590A/en active Pending
- 2018-02-16 CN CN201880003285.0A patent/CN109644315A/en active Pending
- 2018-02-21 TW TW107105810A patent/TW201843675A/en unknown
Also Published As
Publication number | Publication date |
---|---|
TW201843675A (en) | 2018-12-16 |
CN109644315A (en) | 2019-04-16 |
EP3583786A4 (en) | 2020-12-23 |
JP2020508590A (en) | 2020-03-19 |
WO2018151858A1 (en) | 2018-08-23 |
KR20190109726A (en) | 2019-09-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10313813B2 (en) | Apparatus and method for sound stage enhancement | |
US9794715B2 (en) | System and methods for processing stereo audio content | |
CN111131970B (en) | Audio signal processing apparatus and method for filtering audio signal | |
RU2685041C2 (en) | Device of audio signal processing and method of audio signal filtering | |
EP3583786A1 (en) | Apparatus and method for downmixing multichannel audio signals | |
EP2484127B1 (en) | Method, computer program and apparatus for processing audio signals | |
KR102355770B1 (en) | Subband spatial processing and crosstalk cancellation system for conferencing | |
KR20120067294A (en) | Speaker array for virtual surround rendering | |
CN112313970B (en) | Method and system for enhancing an audio signal having a left input channel and a right input channel | |
CN109923877B (en) | Apparatus and method for weighting stereo audio signal | |
JP2024502732A (en) | Post-processing of binaural signals | |
US11924628B1 (en) | Virtual surround sound process for loudspeaker systems | |
US11373662B2 (en) | Audio system height channel up-mixing | |
JP2020039168A (en) | Device and method for sound stage extension | |
CN116615919A (en) | Post-processing of binaural signals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20190318 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
RIN1 | Information on inventor provided before grant (corrected) |
Inventor name: HSIEH, PEI-LUN Inventor name: WU, TSAI-YI |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
A4 | Supplementary search report drawn up and despatched |
Effective date: 20201125 |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: H04S 3/00 20060101AFI20201119BHEP Ipc: G10L 19/008 20130101ALI20201119BHEP |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20210626 |