EP3451332A1 - Decoder-provided time domain aliasing cancellation during lossy/lossless transitions


Info

Publication number
EP3451332A1
Authority
EP
European Patent Office
Prior art keywords
lossy
lossless
decoder
aliasing cancellation
cancellation component
Prior art date
Legal status
Granted
Application number
EP18191910.1A
Other languages
German (de)
French (fr)
Other versions
EP3451332B1 (en)
Inventor
Arijit Biswas
Current Assignee
Dolby International AB
Original Assignee
Dolby International AB
Priority date
Filing date
Publication date
Application filed by Dolby International AB filed Critical Dolby International AB
Publication of EP3451332A1 publication Critical patent/EP3451332A1/en
Application granted granted Critical
Publication of EP3451332B1 publication Critical patent/EP3451332B1/en
Status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/0017: Lossless audio signal coding; Perfect reconstruction of coded audio signal by transmission of coding error
    • G10L19/04: using predictive techniques
    • G10L19/16: Vocoder architecture
    • G10L19/18: Vocoders using multiple modes
    • G10L19/24: Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
    • G10L19/02: using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212: using orthogonal transformation

Definitions

  • Embodiments herein relate generally to audio signal processing, and more specifically to switching between lossy coded time segments and a lossless stream of the same source audio.
  • a decoder may receive a stream of lossy coded time segments that includes audio encoded using a frequency-domain lossy coding method over a network.
  • the decoder may also receive, over the network, a lossless stream that includes the audio encoded using a lossless coding method.
  • the decoder may provide audio playback of the lossless stream.
  • the lossy coded time segments and the lossless stream may be encoded from the same source audio, may also be time-aligned, and may have a same sampling rate.
  • the decoder may generate an aliasing cancellation component based on a previously-decoded frame of the lossless stream.
  • the generated aliasing cancellation component may be added to a lossy time segment at a transition frame.
  • the sum of the aliasing cancellation component and the lossy time segment may be normalized using a weight caused by an encoding window applied to the transition frame, thereby providing aliasing cancellation on the transition frame. Audio playback of the lossy time segment may then be provided by the decoder, beginning with the aliasing-canceled transition frame.
  • the aliasing cancellation component may be generated based on reconstructing an adjacent frame of the previously-decoded frames of the lossless stream as a sum of a lossy component and an aliasing cancellation component for the adjacent frame.
  • the aliasing cancellation component for the adjacent frame may then be extrapolated to generate the aliasing cancellation component for the transition frame.
  • TDAC: time domain aliasing cancelation
  • the proposed solution does not require additional bits to be sent from the encoder side (as metadata) because adjacent decoded lossless samples (past frames, in the case of lossless to lossy switching, and future frames, in the case of lossy to lossless switching) are utilized to generate aliasing cancelation terms by the decoder.
  • AC-4 lossless mode may be used for music delivery over a network protocol, such as an Internet protocol.
  • acute network bandwidth constraints may require transition to and from a fallback lossy AC-4 sub-stream.
  • fallback to ASF mode can be sufficient to preserve high-quality playback. Therefore, transitions between a frequency-domain lossy modified discrete cosine transform ("MDCT")-coded time segment, which may use overlapping windows, and a time segment coded by the lossless coder, which may use rectangular non-overlapping windows, should be handled efficiently.
  • a lossy MDCT frame relies on TDAC of adjacent windows (which is why overlapping windows are commonly used).
  • the MDCT removes the aliasing part of the current frame by combining with the signal decoded in the following frame. Therefore, if the encoding mode of the next frame is lossless coding, the aliasing term of the frame coded with lossy coding is not canceled, since the frame coded with the lossless codec does not have the corresponding time domain alias cancelation components to cancel out the time domain aliasing of the previous lossy frame.
  • the aliasing cancellation components for the lossy MDCT encoding are generally forwarded to the decoder by the encoder. This side information will not be available if it is not sent by the encoder in advance. Furthermore, forwarding aliasing cancellation components is not an option for responding to bandwidth constraints, because the decoder performing the switching does not know a priori the transition points between encoding methods.
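As a concrete illustration of why TDAC depends on the following block, the sketch below builds a toy MDCT/IMDCT pair (textbook definition; the frame length N = 8 and the sine window are illustrative assumptions, not values from the patent). A single decoded block remains time-aliased over a frame, while overlap-add with the adjacent block cancels the aliasing exactly:

```python
import numpy as np

def mdct(block):
    """Forward MDCT: 2N samples -> N coefficients (textbook definition)."""
    two_n = len(block)
    n_half = two_n // 2
    n = np.arange(two_n)
    k = np.arange(n_half)
    basis = np.cos(np.pi / n_half * (n[None, :] + 0.5 + n_half / 2) * (k[:, None] + 0.5))
    return basis @ block

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n_half = len(coeffs)
    n = np.arange(2 * n_half)
    k = np.arange(n_half)
    basis = np.cos(np.pi / n_half * (n[:, None] + 0.5 + n_half / 2) * (k[None, :] + 0.5))
    return (2.0 / n_half) * (basis @ coeffs)

N = 8                                                    # hypothetical frame length
w = np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))   # sine window (Princen-Bradley)
rng = np.random.default_rng(0)
x0, x1, x2 = rng.standard_normal((3, N))

# Two overlapping blocks: analysis-window, MDCT, IMDCT, synthesis-window,
# as a lossy MDCT codec would do (quantization ignored).
blk_a = w * imdct(mdct(w * np.concatenate([x0, x1])))
blk_b = w * imdct(mdct(w * np.concatenate([x1, x2])))

# One block alone is aliased over frame x1 ...
assert not np.allclose(blk_a[N:], x1)
# ... but overlap-add with the next block cancels the aliasing exactly:
assert np.allclose(blk_a[N:] + blk_b[:N], x1)
```

If the block covering frames x1 and x2 is never decoded (for instance because x2 onward is coded losslessly), the aliasing term in `blk_a[N:]` is left uncanceled, which is exactly the problem the patent addresses.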
  • FIGS. 1A-B show exemplary signal segments where forwarding aliasing cancelation is used to switch between a lossless stream and lossy coded time segments by an encoder, according to conventional forward aliasing cancelation.
  • the transition is made from lossless coded stream 115 to the lossy coded time segment 120 (in diagram 100), and vice-versa (in diagram 150) by the encoder, and necessary steps required to do the seamless switching for the overall encoder-decoder system are managed on the encoder side, prior to transmitting the streams to the decoder.
  • lossless-coded time segments 115 and 170 are rectangular-windowed segments.
  • the MDCT windowed lossy-coded time segments 120 and 165 are also shown.
  • the encoder determines and transmits forwarding aliasing cancellation (FAC) signals 125 and 175 in the frames 105 and 110, and similarly in the frames 155 and 160, where the transitions occur.
  • the FAC signal 125 may include an aliasing cancellation component 129 and a symmetric windowed signal 127.
  • the FAC signal 125 may be forwarded to the decoder from the encoder, where the FAC signals 125 and 175 are added to the corresponding lossy time segments 120 and 165 at the frames 105 and 110 and 155 and 160 where the transition occurs.
  • the FAC signals 125 and 175 may be symmetric windowed signals to the lossy time segments 120 and 165.
  • unaliased signals 130 and 180 are generated at the frames 110 and 155, where the transitions respectively occur.
  • the last rows of diagrams 100 and 150 represent lossless signals in the same frame as the lossy time segment 140. Since the lossless signals (dummy signal 115 in frame X₀ 105 of diagram 100, and dummy signal 170 in frame X₃ 160 of diagram 150) are available to the decoder for reconstruction, the FAC signals are not needed to cancel aliasing in the lossy time segments. Omitting transmission of the dummy signal by the encoder may reduce the need for side information transmission in encoder-side switching applications.
  • a decoded signal of adjacent frames may be used to generate the relevant aliasing cancelation signals.
  • Output audio signals may be reconstructed by adding a generated aliasing cancelation component to the decoded lossy time segment, and by normalizing the sum using a weight caused by the encoding window.
  • FIG. 2 shows a flow diagram for a method 200 of switching between a lossy coded time segment and a lossless stream of the same source audio, according to an embodiment.
  • a decoder may receive lossy coded time segments that include audio encoded using a frequency-domain lossy coding method over a network at step 205.
  • the decoder may also receive, over the network, a lossless stream that includes the audio encoded using a lossless coding method at step 210.
  • the lossless coding method may be a time domain coding method, as is commonly the case.
  • the decoder may also provide audio playback of the lossless stream.
  • the lossy and lossless streams may be transmitted in parallel over the network, so switching may be performed at any time desired by a user interacting with the decoder.
  • the lossy coded time segments and the lossless stream may be encoded from the same source audio and may also be time-aligned.
  • the lossy and lossless sub-streams (when streamed together) may share a same video frame rate and may have a same sampling rate.
  • FIG. 3 shows a simplified block diagram of a decoder 300 for switching between lossy coded time segments and a lossless stream of the same source audio, according to an embodiment.
  • Decoder 300 may include lossy decoder 315, which receives and decodes lossy coded time segments 305, and lossless decoder 320, which receives and decodes lossless stream 310.
  • FIG. 3 also shows typical peripheral components of AC-4 lossless and lossy decoders; a high-level summary of the components shown in FIG. 3 follows.
  • Lossy decoder 315 includes an MDCT spectral front end decoder, complex quadrature mirror filters (CQMF) 325, and an SRC.
  • the MDCT spectral front end decoder may use an MDCT domain signal buffer to predict each bin of the lossy coded time segments.
  • the CQMF 325 may include three modules as shown: modules for parametric decoders, object audio renderer and upmixer module, and dialogue and loudness management module.
  • the parametric decoders may include a plurality of coding tools, including one or more of companding, advanced spectral extension algorithms, advanced coupling, advanced joint object coding, and advanced joint channel coding.
  • the object audio renderer and upmixer module may perform spatial rendering of decoded audio based on metadata associated with the received lossy coded time segments.
  • the dialogue and loudness management module may allow users to adjust the relative level of voice and adjust loudness filtering and/or processing.
  • the SRC (sampling rate conversion) module may perform video frame synchronizing at a desired frame rate.
  • the exemplary lossless decoder 320 may include a core decoder, an SRC module (which operates substantially similarly to the SRC of the lossy decoder 315, though it may be likely that the SRC of the lossless decoder 320 operates in the time domain, rather than the frequency domain), a CQMF 330, and a second SRC module, applied after the CQMF 330 has been applied to the received lossless stream 310.
  • the core decoder may be any suitable lossless decoder.
  • the CQMF 330 may include an object audio renderer and upmixer module and a dialogue and loudness management module.
  • the sub-modules of CQMF 330 may function substantially similarly to the corresponding modules of CQMF 325, again with the caveat that objects CQMF 330 operates on may be encoded in the time domain, while the objects that CQMF 325 operate on may be in the frequency domain.
  • a first potential switching point may be achieved by running MDCT on the pulse-code modulation (PCM) output 340 of the lossless decoder 320, and splicing the MDCT output of the lossless decoder with the MDCT output of the lossy decoder 315. Switching after running MDCT on the output 340 of the lossless decoder 320 may advantageously provide built-in MDCT overlap/add to facilitate smooth transitions.
  • a second potential switching point may be at the PCM stage between MDCT and the CQMF 325 of the lossy decoder.
  • switching before the CQMF 325 may necessitate a smooth fading strategy, and in addition may suffer from the same problems as switching after running MDCT on the output 340 of the lossless decoder 320 described above.
  • a third potential switching point may take place at the indicated switch/crossfade block 350, before the peak-limiter 360 (which may be any suitable post-processing module) is applied to the output of the decoder 380. While switching at block 350 may also require a smooth fading strategy, there are several key benefits to switching at 350. Notably, since all content is rendered to the same number of output speakers, programs with different numbers/arrangements of objects may be switched, thereby avoiding a major drawback of the first two switching points described above.
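The "smooth fading strategy" required at switch/crossfade block 350 could, for instance, be a short crossfade between the two rendered outputs. The sketch below is a hypothetical linear crossfade (an equal-power cosine ramp is another common choice); the names and ramp shape are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def linear_crossfade(old: np.ndarray, new: np.ndarray) -> np.ndarray:
    """Fade out `old` while fading in `new` over their common length."""
    assert old.shape == new.shape
    n = len(old)
    g_in = (np.arange(n) + 0.5) / n   # ramps from near 0 up to near 1
    g_out = 1.0 - g_in
    return g_out * old + g_in * new

# When both streams carry the same time-aligned content (as the lossy and
# lossless sub-streams do here), the crossfade is transparent.
x = np.linspace(-1.0, 1.0, 64)
assert np.allclose(linear_crossfade(x, x), x)
```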
  • the decoder may generate an aliasing cancellation component based on previously-decoded frames of the lossless stream at step 220.
  • FIG. 4A shows a diagram 400 of exemplary signal segments where minimized forwarding aliasing cancelation (AC) is used to switch between lossy coded time segments and a lossless stream, according to an embodiment.
  • AC signal 425 may be derived, without side information from the encoder, by expressing the lossless segment 415 before transition frame X₁ 410, during frame X₀ 405, as a sum of an aliased signal and an aliasing cancellation component. To do so, time domain lossy aliased samples may be derived in terms of the original lossless data samples, based on research published by Britanak and Lincklaen Arriëns.
  • J in equations (1)-(4) may refer to the exchange matrix (the identity matrix with its rows reversed), which time-reverses a signal vector.
  • the aliased lossy signal x̂₁ for frame X₁ 410 may be rewritten as −JX₀ + X₁.
  • an MDCT window vector Wₖ may be introduced, causing the above equation to be rewritten as equation (5): x̂₁ = −JX₀ ∘ W₀ + X₁ ∘ W₁.
  • the ∘ indicates element-wise multiplication of the window vectors W₀ and W₁ with the lossless signal segment vectors X₀ and X₁, respectively.
  • the decoder may reconstruct a transition frame lossless signal during time segment X 1 410 as a sum of a lossy time segment component 420 and an aliasing cancellation component based on adjacent (previous, in the case of switching from lossless to lossy) lossless time segment 415.
  • the determined aliasing cancellation component from segment 415 may then be used to extrapolate the aliasing cancellation component for frame X 1 410.
  • the unused determined AC signal 440 can be discarded, because this particular time segment can be reconstructed by the lossless decoder.
  • the generated aliasing cancellation component 425 may be added to the lossy time segment 420 at a transition frame 410 at step 240.
  • the sum of the aliasing cancellation component and the lossy time segment may be normalized using a weight caused by the encoding window applied to the transition frame at step 250, thereby providing aliasing cancellation at the transition frame.
  • An exemplary unaliased signal 430 for frame X₁ may be expressed as shown below, in equation (6): X₁ = (JX₀ ∘ W₀ + x̂₁) ∘ W₁⁻¹.
  • In equation (6), the aliasing cancellation component is the leftmost term, JX₀ ∘ W₀, derived from equation (2), and the rightmost term, x̂₁, is the lossy time segment component.
  • −JX₀ ∘ W₀ is the aliasing component in the time-domain aliased signal of equation (5) for frame X₁. To correct for this aliasing component, the leftmost term in equation (6), generated based on the decoder previously decoding frame X₀ of the lossless stream 415, is added to the lossy time segment component for frame X₁.
  • Equation (6) also illustrates the normalizing step 250, as the terms are multiplied by the inverse window function term W₁⁻¹ for transition frame X₁ 410. Audio playback of the lossy coded time segment may then be provided by the decoder at step 260, beginning with the unaliased signal 430 at the transition frame.
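The algebra of equations (5) and (6) can be checked numerically. In the sketch below, the aliased transition-frame signal is synthesized directly from equation (5) rather than by an actual MDCT codec, and the frame length, random frame contents, and sine/cosine window halves are illustrative assumptions:

```python
import numpy as np

N = 8
rng = np.random.default_rng(1)
x0 = rng.standard_normal(N)   # last decoded lossless frame X0, available at the decoder
x1 = rng.standard_normal(N)   # ground-truth samples of transition frame X1
J = np.eye(N)[::-1]           # exchange (time-reversal) matrix
n = np.arange(N)
w0 = np.sin(np.pi * (n + 0.5) / (2 * N))   # window half over frame X0 (assumed)
w1 = np.cos(np.pi * (n + 0.5) / (2 * N))   # window half over frame X1 (assumed, nonzero)

# Equation (5): the time-aliased lossy signal for the transition frame.
x1_aliased = -(J @ x0) * w0 + x1 * w1

# Equation (6): the decoder generates the aliasing cancellation term from the
# previously decoded lossless frame, adds it to the lossy segment, and
# normalizes by the inverse window weight.
ac_term = (J @ x0) * w0
x1_unaliased = (ac_term + x1_aliased) / w1

assert np.allclose(x1_unaliased, x1)   # transition frame recovered exactly
```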
  • FIG. 5 shows a flow diagram for a method 500 of switching back from lossy coded time segments to a lossless stream of the same source audio, according to an embodiment.
  • FIG. 4B shows a diagram 450 of exemplary signal segments where minimized forwarding aliasing cancelation is used to switch back from a lossy coded time segment to a lossless stream, according to an embodiment.
  • the decoder may receive a lossy time segment 465 that includes audio encoded using a lossy coding method over a network at step 505.
  • the decoder may also provide audio playback of the lossy coded time segments.
  • the decoder may also receive, over the network, a lossless stream 470 that includes the audio encoded using a lossless coding method at step 510.
  • the decoder may switch the playback from the lossy coded time segments to the lossless stream.
  • the decoder may perform the switch automatically, after determining that network bandwidth exceeds a predetermined threshold for providing adequate performance for the lossless stream, or in response to a user-provided indication on an interface in communication with the decoder.
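The automatic switch decision described above might be sketched as follows. The function name, threshold value, and bit-rate units are hypothetical illustrations, not taken from the patent:

```python
# Hypothetical decision logic for choosing a sub-stream: switch back to the
# lossless stream when measured bandwidth exceeds an assumed threshold, unless
# the user has explicitly requested the lossy fallback.
LOSSLESS_THRESHOLD_BPS = 1_500_000  # assumed minimum rate for the lossless sub-stream

def choose_stream(bandwidth_bps: int, user_forced_lossy: bool = False) -> str:
    """Return which sub-stream the decoder should play."""
    if user_forced_lossy or bandwidth_bps < LOSSLESS_THRESHOLD_BPS:
        return "lossy"
    return "lossless"

assert choose_stream(800_000) == "lossy"        # constrained network: fall back
assert choose_stream(3_000_000) == "lossless"   # headroom restored: switch back
assert choose_stream(3_000_000, user_forced_lossy=True) == "lossy"
```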
  • the decoder may generate an aliasing cancellation component 475 based on previously-decoded frames of the lossless stream at step 520.
  • when switching back to lossless, the previously-decoded frame used for aliasing cancellation may be the frame subsequent to the transition frame (i.e., the first decoded frame of the resumed lossless stream).
  • the aliased lossy time segment for frame X₂ 455 may be rewritten, based on equation (3), as X₂ + JX₃; with the MDCT window vectors applied, this becomes equation (7): x̂₂ = X₂ ∘ W₂ + JX₃ ∘ W₃.
  • the decoder may reconstruct transition frame X 2 455 as a sum of a lossy time segment component 465 and aliasing cancellation component for adjacent time segment X 3 460.
  • Using lossless signals from frames after the transition frame is possible due to the decoder receiving both the lossy and the lossless streams, and by buffering decoded time segments of the lossless stream.
  • the determined aliasing cancellation component for segment X 3 460 may then be used to extrapolate the aliasing cancellation component for frame X 2 455.
  • the generated aliasing cancellation component 475 may be added to the lossy time segment 465 at a transition frame 455 at step 540.
  • the sum of the aliasing cancellation component and the lossy time segment may be normalized using a weight caused by the encoding window applied to the transition frame at step 550, thereby providing aliasing cancellation at the transition frame.
  • An exemplary unaliased signal 480 for frame X₂ may be expressed as shown below, in equation (8): X₂ = (x̂₂ − JX₃ ∘ W₃) ∘ W₂⁻¹.
  • In equation (8), the aliasing cancellation component is the rightmost term inside the parentheses, −JX₃ ∘ W₃, derived from equation (3), and the leftmost term, x̂₂, is the lossy time segment component.
  • JX₃ ∘ W₃ is the aliasing component in the time-domain aliased signal of equation (7) for transition frame X₂.
  • Equation (8) illustrates the normalizing step 550 as well, where the terms are multiplied by the inverse window function term W₂⁻¹ for transition frame X₂ 455. Audio playback of the lossless stream may then be provided by the decoder, after the unaliased signal 480, at step 560.
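The lossy-to-lossless direction of equations (7) and (8) admits the same numerical check. As before, the aliased signal is synthesized directly from equation (7), and the frame length and window halves are illustrative assumptions:

```python
import numpy as np

N = 8
rng = np.random.default_rng(2)
x2 = rng.standard_normal(N)   # ground-truth samples of transition frame X2
x3 = rng.standard_normal(N)   # first (buffered) decoded lossless frame X3
J = np.eye(N)[::-1]           # exchange (time-reversal) matrix
n = np.arange(N)
w2 = np.cos(np.pi * (n + 0.5) / (2 * N))   # window half over frame X2 (assumed, nonzero)
w3 = np.sin(np.pi * (n + 0.5) / (2 * N))   # window half over frame X3 (assumed)

# Equation (7): the time-aliased lossy signal for the transition frame.
x2_aliased = x2 * w2 + (J @ x3) * w3

# Equation (8): the decoder subtracts the cancellation term generated from the
# buffered future lossless frame and normalizes by the inverse window weight.
x2_unaliased = (x2_aliased - (J @ x3) * w3) / w2

assert np.allclose(x2_unaliased, x2)   # transition frame recovered exactly
```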
  • FIG. 6 is a block diagram of an exemplary system for providing decoder-side switching between lossy coded time segments and a lossless stream of the same source audio as described above.
  • an exemplary system for implementing the subject matter disclosed herein, including the methods described above, includes a hardware device 600, including a processing unit 602, memory 604, storage 606, data entry module 608, display adapter 610, communication interface 612, and a bus 614 that couples elements 604-612 to the processing unit 602.
  • the bus 614 may comprise any type of bus architecture. Examples include a memory bus, a peripheral bus, a local bus, etc.
  • the processing unit 602 is an instruction execution machine, apparatus, or device and may comprise a microprocessor, a digital signal processor, a graphics processing unit, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), etc.
  • the processing unit 602 may be configured to execute program instructions stored in memory 604 and/or storage 606 and/or received via data entry module 608.
  • the memory 604 may include read only memory (ROM) 616 and random access memory (RAM) 618.
  • Memory 604 may be configured to store program instructions and data during operation of device 600.
  • memory 604 may include any of a variety of memory technologies such as static random access memory (SRAM) or dynamic RAM (DRAM), including variants such as dual data rate synchronous DRAM (DDR SDRAM), error correcting code synchronous DRAM (ECC SDRAM), or RAMBUS DRAM (RDRAM), for example.
  • Memory 604 may also include nonvolatile memory technologies such as nonvolatile flash RAM (NVRAM) or ROM.
  • the storage 606 may include a flash memory data storage device for reading from and writing to flash memory, a hard disk drive for reading from and writing to a hard disk, a magnetic disk drive for reading from or writing to a removable magnetic disk, and/or an optical disk drive for reading from or writing to a removable optical disk such as a CD ROM, DVD or other optical media.
  • the drives and their associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the hardware device 600.
  • the methods described herein can be embodied in executable instructions stored in a non-transitory computer readable medium for use by or in connection with an instruction execution machine, apparatus, or device, such as a computer-based or processor-containing machine, apparatus, or device. It will be appreciated by those skilled in the art that, for some embodiments, other types of computer readable media which can store data accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, RAM, ROM, and the like, may also be used in the exemplary operating environment.
  • a "computer-readable medium" can include one or more of any suitable media for storing the executable instructions of a computer program in one or more of an electronic, magnetic, optical, and electromagnetic format, such that the instruction execution machine, system, apparatus, or device can read (or fetch) the instructions from the computer readable medium and execute the instructions for carrying out the described methods.
  • a non-exhaustive list of conventional exemplary computer readable media includes: a portable computer diskette; a RAM; a ROM; an erasable programmable read only memory (EPROM or flash memory); optical storage devices, including a portable compact disc (CD), a portable digital video disc (DVD), a high definition DVD (HD-DVD™), a BLU-RAY disc; and the like.
  • a number of program modules may be stored on the storage 606, ROM 616 or RAM 618, including an operating system 622, one or more applications programs 624, program data 626, and other program modules 628.
  • a user may enter commands and information into the hardware device 600 through data entry module 608.
  • Data entry module 608 may include mechanisms such as a keyboard, a touch screen, a pointing device, etc.
  • Other external input devices (not shown) are connected to the hardware device 600 via external data entry interface 630.
  • external input devices may include a microphone, joystick, game pad, satellite dish, scanner, or the like.
  • external input devices may include video or audio input devices such as a video camera, a still camera, etc.
  • Data entry module 608 may be configured to receive input from one or more users of device 600 and to deliver such input to processing unit 602 and/or memory 604 via bus 614.
  • the hardware device 600 may operate in a networked environment using logical connections to one or more remote nodes (not shown) via communication interface 612.
  • the remote node may be another computer, a server, a router, a peer device or other common network node, and typically includes many or all of the elements described above relative to the hardware device 600.
  • the communication interface 612 may interface with a wireless network and/or a wired network. Examples of wireless networks include, for example, a BLUETOOTH network, a wireless personal area network, a wireless 802.11 local area network (LAN), and/or wireless telephony network (e.g., a cellular, PCS, or GSM network).
  • wired networks include, for example, a LAN, a fiber optic network, a wired personal area network, a telephony network, and/or a wide area network (WAN).
  • communication interface 612 may include logic configured to support direct memory access (DMA) transfers between memory 604 and other devices.
  • program modules depicted relative to the hardware device 600 may be stored in a remote storage device, such as, for example, on a server. It will be appreciated that other hardware and/or software to establish a communications link between the hardware device 600 and other devices may be used.
  • At least one component defined by the claims is implemented at least partially as an electronic hardware component, such as an instruction execution machine (e.g., a processor-based or processor-containing machine) and/or as specialized circuits or circuitry (e.g., discrete logic gates interconnected to perform a specialized function), such as those illustrated in FIG. 6 .
  • Other components may be implemented in software, hardware, or a combination of software and hardware. Moreover, some or all of these other components may be combined, some may be omitted altogether, and additional components can be added while still achieving the functionality described herein.
  • the subject matter described herein can be embodied in many different variations, and all such variations are contemplated to be within the scope of what is claimed.
  • the terms “component,” “module,” and “process,” may be used interchangeably to refer to a processing unit that performs a particular function and that may be implemented through computer program code (software), digital or analog circuitry, computer firmware, or any combination thereof.
  • the lossy coding method may use MDCT with overlapping windows, and the lossless coding method may use rectangular non-overlapping windows that are different from the windows used by the lossy coding method.
  • the switching may be performed in various locations in the decoding process, including, in response to receiving the determination that network bandwidth is constrained, selecting the transition frame to be before a peak-limiter is applied to the lossless stream.
  • the aliasing cancellation component may be generated based on reconstructing an adjacent frame of the previously-decoded frames of the lossless stream as a sum of a lossy component and an aliasing cancellation component for the adjacent frame.
  • the aliasing cancellation component for the adjacent frame may then be extrapolated to generate the aliasing cancellation component for the transition frame.
  • the normalizing may include multiplying the sum of the aliasing cancellation component and the lossy time segment by an inverse window function vector determined for the transition frame.
  • the determination that network bandwidth is constrained may be received using several different methods. For example, the determination may be received from a user-provided indication on an interface in communication with the decoder (e.g., using a software application running on a computer). The determination may alternatively be automatically generated by software that monitors network bandwidth.
  • Computer program products comprising computer-readable program code to be executed by one or more processors when retrieved from a non-transitory computer-readable medium are also described for switching between lossy coded time segments and a lossless stream of the same source audio.
  • the program code may include instructions to receive a stream of lossy coded time segments that includes audio encoded using a frequency-domain lossy coding method over a network.
  • the program code may also include instructions to receive, over the network, a lossless stream that includes the audio encoded using a lossless coding method.
  • the decoder may provide audio playback of the lossless stream.
  • the lossy coded time segments and the lossless stream may be encoded from the same source audio, may also be time-aligned, and may have a same sampling rate.
  • the decoder may generate an aliasing cancellation component based on a previously-decoded frame of the lossless stream.
  • the generated aliasing cancellation component may be added to a lossy time segment at a transition frame.
  • the sum of the aliasing cancellation component and the lossy time segment may be normalized using a weight caused by an encoding window applied to the transition frame, thereby providing aliasing cancellation on the transition frame. Audio playback of the lossy time segment may then be provided by the decoder, beginning with the aliasing-canceled transition frame.
  • an exemplary decoder may include a lossy decoder circuit, a lossless decoder circuit, and an analysis circuit.
  • the lossy decoder circuit may receive lossy coded time segments from a network connection, the lossy coded time segments comprising audio encoded using a frequency-domain lossy coding method.
  • the lossless decoder circuit may receive a lossless stream from the network connection, the lossless stream comprising the audio encoded using a lossless coding method, the lossy coded time segments and the lossless stream being encoded from the same source audio, being time-aligned, and having a same sampling rate, the decoder providing audio playback of the lossless stream.
  • the analysis circuit may be coupled to both the lossy decoder circuit and the lossless decoder circuit.
  • the analysis circuit may, in response to a determination that network bandwidth is constrained, generate an aliasing cancellation component based on previously-decoded frames of the lossless stream, add the generated aliasing cancellation component to a lossy time segment at a transition frame, normalize the sum of the generated aliasing cancellation component and the lossy time segment using a weight caused by an encoding window applied to the transition frame to provide aliasing cancellation on the transition frame, and provide audio playback of the lossy coded time segments beginning with the aliasing-canceled transition frame.
  • the lossy coding method may use MDCT with overlapping windows, and the lossless coding method may use rectangular non-overlapping windows that are different from the windows used by the lossy coding method.
  • the switching may be performed in various locations in the decoding process.
  • the analysis circuit may, in response to receiving the determination that network bandwidth is constrained, select the transition frame to be before a peak-limiter is applied to the lossless stream.
  • the aliasing cancellation component may be generated based on reconstructing an adjacent frame of the previously-decoded frames of the lossless stream as a sum of a lossy component and an aliasing cancellation component for the adjacent frame.
  • the aliasing cancellation component for the adjacent frame may then be extrapolated to generate the aliasing cancellation component for the transition frame.
  • the normalizing may include multiplying the sum of the aliasing cancellation component and the lossy time segment by an inverse window function vector determined for the transition frame.
  • The foregoing items may be referred to as enumerated example embodiments (EEEs).

Abstract

Systems and methods are described for switching between lossy coded time segments and a lossless stream of the same source audio. A decoder may receive lossy coded time segments that include audio encoded using frequency-domain lossy coding. The decoder may also receive a lossless stream, which the decoder plays back, that includes audio from the same source encoded using lossless coding. In response to receiving a determination that network bandwidth is constrained, the decoder may generate an aliasing cancellation component based on a previously-decoded frame of the lossless stream, which may be added to a lossy time segment at a transition frame. The sum of the aliasing cancellation component and the lossy time segment may be normalized using a weight caused by an encoding window. Audio playback of the lossy coded time segments may then be provided, beginning with the aliasing-canceled transition frame.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims priority of the following priority applications: US provisional application 62/553,042 (reference: D16126USP1), filed 31 August 2017 and EP application 17188694.8 (reference: D16126EP), filed 31 August 2017.
  • TECHNICAL FIELD
  • Embodiments herein relate generally to audio signal processing, and more specifically to switching between lossy coded time segments and a lossless stream of the same source audio.
  • SUMMARY OF THE INVENTION
  • Systems and methods are described for switching between lossy coded time segments and a lossless stream of the same source audio. A decoder may receive a stream of lossy coded time segments that includes audio encoded using a frequency-domain lossy coding method over a network. The decoder may also receive, over the network, a lossless stream that includes the audio encoded using a lossless coding method. The decoder may provide audio playback of the lossless stream. The lossy coded time segments and the lossless stream may be encoded from the same source audio, may also be time-aligned, and may have a same sampling rate.
  • In response to receiving a determination that network bandwidth is constrained, the decoder may generate an aliasing cancellation component based on a previously-decoded frame of the lossless stream. The generated aliasing cancellation component may be added to a lossy time segment at a transition frame. The sum of the aliasing cancellation component and the lossy time segment may be normalized using a weight caused by an encoding window applied to the transition frame, thereby providing aliasing cancellation on the transition frame. Audio playback of the lossy time segment may then be provided by the decoder, beginning with the aliasing-canceled transition frame.
  • In an embodiment, the aliasing cancellation component may be generated based on reconstructing an adjacent frame of the previously-decoded frames of the lossless stream as a sum of a lossy component and an aliasing cancellation component for the adjacent frame. The aliasing cancellation component for the adjacent frame may then be extrapolated to generate the aliasing cancellation component for the transition frame.
  • BRIEF DESCRIPTION OF THE FIGURES
  • This disclosure is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which:
    • FIGS. 1A-B show exemplary signal segments where forwarding aliasing cancelation is used to switch between a lossless stream and a lossy coded time segment, according to an embodiment.
    • FIG. 2 shows a flow diagram for a method of switching between a lossy coded time segment and a lossless stream of the same source audio, according to an embodiment.
    • FIG. 3 shows a simplified block diagram of a system for switching between a lossy coded time segment and a lossless stream of the same source audio, according to an embodiment.
    • FIGS. 4A-B show exemplary signal segments where minimized forwarding aliasing cancelation is used to switch between a lossy coded time segment and a lossless stream, according to an embodiment.
    • FIG. 5 shows a flow diagram for a method of switching back from a lossy coded time segment to a lossless stream of the same source audio, according to an embodiment.
    • FIG. 6 is a block diagram of an exemplary system for switching between a lossy coded time segment and a lossless stream of the same source audio, according to an embodiment.
    DETAILED DESCRIPTION
  • Systems and methods are described for switching between lossy coded time segments (such as those encoded by the perceptually lossy AC-4 codec, developed by Dolby Laboratories, Inc., of San Francisco, California) and time segments of a lossless sub-stream (such as the lossless AC-4 codec) originating from the same source audio. A Time Domain Aliasing Cancelation (TDAC) process may be applied during the transition between non-aliased lossless coding (again, e.g., AC-4 lossless coding) and MDCT transform-based lossy coding (such as the coding used in the Audio Spectral Frontend, hereinafter referred to as ASF, in AC-4). The proposed solution does not require additional bits to be sent from the encoder side (as metadata), because adjacent decoded lossless samples (past frames, in the case of lossless-to-lossy switching, and future frames, in the case of lossy-to-lossless switching) are utilized by the decoder to generate aliasing cancelation terms.
  • If AC-4 lossless mode is used for music delivery over a network protocol, such as an Internet protocol, acute network bandwidth constraints may require a transition to and from a fallback lossy AC-4 sub-stream. In many cases, falling back to ASF mode can be sufficient to preserve high-quality playback. Therefore, transitions between a time segment coded with the frequency-domain lossy modified discrete cosine transform ("MDCT"), which may use overlapping windows, and a time segment coded by the lossless coder, which may use rectangular non-overlapping windows, should be handled efficiently.
  • Transitioning between lossless and lossy coding may present several challenges. To compute the decoded signal, a lossy MDCT frame relies on TDAC of adjacent windows (which is why overlapping windows are commonly used). The MDCT removes the aliasing part of the current frame by combining it with the signal decoded in the following frame. Therefore, if the encoding mode of the next frame is lossless coding, the aliasing term of the frame coded with lossy coding is not canceled, since the frame coded with the lossless codec does not have the corresponding time domain alias cancelation components to cancel out the time domain aliasing of the previous lossy frame.
  • To handle the transitions between the two modes seamlessly using conventional techniques, the aliasing cancellation components for the lossy MDCT encoding are generally forwarded to the decoder by the encoder. This side information will not be available if it is not sent by the encoder in advance. Furthermore, forwarding aliasing cancellation components is not an option for responding to bandwidth constraints, because the transition points are chosen by the decoder performing the switching and are not known a priori.
  • FIGS. 1A-B show exemplary signal segments where forwarding aliasing cancelation is used by an encoder to switch between a lossless stream and lossy coded time segments, according to conventional forward aliasing cancelation. For designing seamless switching in the decoder, it is assumed that the transition is made by the encoder from lossless coded stream 115 to the lossy coded time segment 120 (in diagram 100), and vice-versa (in diagram 150), and that the steps required for seamless switching in the overall encoder-decoder system are managed on the encoder side, prior to transmitting the streams to the decoder. As seen in diagrams 100 and 150, lossless-coded time segments 115 and 170 are rectangular-windowed segments, while lossy-coded time segments 120 and 165 are MDCT-windowed segments.
  • To compensate for the aliasing caused by switching between streams (handled at transition time segments X 1 110 in diagram 100 and X 2 155 in diagram 150), the encoder determines and transmits forwarding aliasing cancellation (FAC) signals 125 and 175 in the frames 105 and 110, and similarly in the frames 155 and 160, where the transitions occur. The FAC signal 125 may include an aliasing cancellation component 129 and a symmetric windowed signal 127. The FAC signal 125 may be forwarded to the decoder from the encoder, where the FAC signals 125 and 175 are added to the corresponding lossy time segments 120 and 165 at the frames 105, 110, 155, and 160 where the transitions occur. As seen in diagrams 100 and 150, the FAC signals 125 and 175 may be symmetric windowed signals relative to the lossy time segments 120 and 165. When the FAC signals 125 and 175 are added by the decoder, unaliased signals 130 and 180 are generated at the frames 110 and 155, where the transitions respectively occur.
  • Assuming there is no quantization error (of the FAC signal), the last rows of diagrams 100 and 150 represent lossless signals in the same frame as the lossy time segment 140. Since the lossless signals (dummy signal 115 in frame X 0 105 in diagram 100, and dummy signal 170 in frame X 3 160 in diagram 150) are available to the decoder for reconstruction, the FAC signals are not needed to cancel aliasing in the lossy time segments. Omitting transmission of the dummy signal by the encoder may reduce the need for side information transmission in encoder-side switching applications.
  • To avoid the shortcomings of conventional switching between lossy and lossless-encoded streams described above, a decoded signal of adjacent frames may be used to generate the relevant aliasing cancelation signals. Output audio signals may be reconstructed by adding a generated aliasing cancelation component to the decoded lossy time segment, and by normalizing the sum using a weight caused by the encoding window.
  • FIG. 2 shows a flow diagram for a method 200 of switching between a lossy coded time segment and a lossless stream of the same source audio, according to an embodiment. A decoder may receive lossy coded time segments that include audio encoded using a frequency-domain lossy coding method over a network at step 205. The decoder may also receive, over the network, a lossless stream that includes the audio encoded using a lossless coding method at step 210. In some embodiments, the lossless coding method may be a time domain coding method, as is commonly the case. The decoder may also provide audio playback of the lossless stream. In a specific application, the lossy and lossless streams may be transmitted in parallel over the network, so switching may be performed at any time desired by a user interacting with the decoder. To further facilitate switching, the lossy coded time segments and the lossless stream may be encoded from the same source audio and may also be time-aligned. Furthermore, the lossy and lossless sub-streams (when streamed together) may share a same video frame rate and may have a same sampling rate.
  • In response to receiving a determination that network bandwidth is constrained, the decoder may switch the playback from the lossless stream to the lossy coded time segments. FIG. 3 shows a simplified block diagram of a decoder 300 for switching between lossy coded time segments and a lossless stream of the same source audio, according to an embodiment. Decoder 300 may include lossy decoder 315, which receives and decodes lossy coded time segments 305, and lossless decoder 320, which receives and decodes lossless stream 310. FIG. 3 also shows typical peripheral components of AC-4 lossless and lossy decoders. While a high-level summary of the components shown in FIG. 3 is given below, further detail may be found in Riedmiller et al., Delivering Scalable Audio Experiences using AC-4, IEEE Transactions on Broadcasting, Vol. 63, No. 1, March 2017, pp. 179-198, incorporated by reference herein.
  • Lossy decoder 315 includes an MDCT spectral front end decoder, complex quadrature mirror filters (CQMF) 325, and a sampling rate conversion (SRC) module. The MDCT spectral front end decoder may use an MDCT domain signal buffer to predict each bin of the lossy coded time segments. The CQMF 325 may include three modules as shown: modules for parametric decoders, an object audio renderer and upmixer module, and a dialogue and loudness management module. The parametric decoders may include a plurality of coding tools, including one or more of companding, advanced spectral extension algorithms, advanced coupling, advanced joint object coding, and advanced joint channel coding. The object audio renderer and upmixer module may perform spatial rendering of decoded audio based on metadata associated with the received lossy coded time segments. The dialogue and loudness management module may allow users to adjust the relative level of voice and adjust loudness filtering and/or processing. The SRC module may perform video frame synchronization at a desired frame rate.
  • The exemplary lossless decoder 320 may include a core decoder, an SRC module (which operates substantially similarly to the SRC of the lossy decoder 315, though the SRC of the lossless decoder 320 likely operates in the time domain, rather than the frequency domain), a CQMF 330, and a second SRC module, applied after the CQMF 330 has been applied to the received lossless stream 310. The core decoder may be any suitable lossless decoder. The CQMF 330 may include an object audio renderer and upmixer module and a dialogue and loudness management module. The sub-modules of CQMF 330 may function substantially similarly to the corresponding modules of CQMF 325, again with the caveat that the objects CQMF 330 operates on may be encoded in the time domain, while the objects that CQMF 325 operates on may be in the frequency domain.
  • There are several potential points in the decoding process where the lossy/lossless switching of method 200 may be inserted. A first potential switching point may be achieved by running MDCT on the pulse-code modulation (PCM) output 340 of the lossless decoder 320, and splicing the MDCT output of the lossless decoder with the MDCT output of the lossy decoder 315. Switching after running MDCT on the output 340 of the lossless decoder 320 may advantageously provide built-in MDCT overlap/add to facilitate smooth transitions. However, running an additional MDCT module on the output 340 of the lossless decoder 320 adds complexity to the decoder, and the result would also have to go through the sample rate converter (SRC), often required to support the video-frame-synchronous audio coding feature in AC-4, if a video frame rate is used. Switching after running MDCT on the output 340 of the lossless decoder 320 may also be problematic for object-based audio if programs have different numbers/arrangements of objects, and risks being non-seamless if the switching takes place before application of parametric decoding tools.
  • A second potential switching point may be at the PCM stage between MDCT and the CQMF 325 of the lossy decoder. However, switching before the CQMF 325 may necessitate a smooth fading strategy, and in addition may suffer from the same problems as switching after running MDCT on the output 340 of the lossless decoder 320 described above.
  • A third potential switching point may take place at the indicated switch/crossfade block 350, before the peak-limiter 360 (which may be any suitable post-processing module) is applied to the output of the decoder 380. While switching at block 350 may also require a smooth fading strategy, there are several key benefits to switching at 350. Notably, since all content is rendered to the same number of output speakers, programs with different numbers/arrangements of objects may be switched, thereby avoiding a major drawback of the first two switching points described above.
  • Returning to FIG. 2, in response to receiving a determination that network bandwidth is constrained, the decoder may generate an aliasing cancellation component based on previously-decoded frames of the lossless stream at step 220. In the discussion below, reference is also made to FIG. 4A, a diagram 400 which shows exemplary signal segments where minimized forwarding aliasing cancelation (AC) is used to switch between lossy coded time segments and a lossless stream, according to an embodiment.
  • AC signal 425 may be derived, without side information from the encoder, by expressing the lossless segment 415 before transition frame X 1 410, during frame X 0 405, as the sum of an aliased signal and an aliasing cancellation component. To do so, time domain lossy aliased samples may be derived in terms of the original lossless data samples. Based on research published in Britanak, Vladimir, and Huibert J. Lincklaen Arriëns, "Fast computational structures for an efficient implementation of the complete TDAC analysis/synthesis MDCT/MDST filter banks," Signal Processing 89.7 (2009): 1379-1394 (hereinafter referred to as "Britanak," and incorporated by reference herein), the aliased data samples for each lossy signal segment in diagram 100 may be expressed as:
    \hat{x}_n^{MDCT} = x_n - J_{N/4} x_{n+N/4} = x_n - x_{N/2-1-n} \quad (1)
    \hat{x}_{n+N/4}^{MDCT} = -J_{N/4} x_n + x_{n+N/4}, \quad \hat{x}_{N/2-1-n}^{MDCT} = -\hat{x}_n^{MDCT} \quad (2)
    \hat{x}_{n+N/2}^{MDCT} = x_{n+N/2} + J_{N/4} x_{n+3N/4} = x_{n+N/2} + x_{N-1-n} \quad (3)
    \hat{x}_{n+3N/4}^{MDCT} = J_{N/4} x_{n+N/2} + x_{n+3N/4}, \quad \hat{x}_{N-1-n}^{MDCT} = \hat{x}_{n+N/2}^{MDCT} \quad (4)
    That is, equations (1) - (4) refer to lossy time segment signals in each of frames X0-X3, for example. J in equations (1) - (4) refers to a reversal (exchange) matrix that time-reverses a signal vector. In an exemplary embodiment, J may be the matrix: J_4 = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}.
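    For illustration only (not part of the described embodiments), the aliasing identities of equations (1) - (4) can be verified numerically. The sketch below assumes the common MDCT convention with a rectangular window; the transform helpers and their scaling are choices made for this example:

```python
import numpy as np

def mdct(x):
    """Forward MDCT of one length-N frame -> N/2 coefficients."""
    N = len(x)
    M = N // 2
    n = np.arange(N)
    k = np.arange(M)
    basis = np.cos(np.pi / M * (n[None, :] + 0.5 + M / 2) * (k[:, None] + 0.5))
    return basis @ x

def imdct(X):
    """Inverse MDCT, scaled so the output is the time-aliased frame."""
    M = len(X)
    N = 2 * M
    n = np.arange(N)
    k = np.arange(M)
    basis = np.cos(np.pi / M * (n[None, :] + 0.5 + M / 2) * (k[:, None] + 0.5))
    return (2.0 / M) * (basis.T @ X)

rng = np.random.default_rng(0)
N = 16                                 # frame length
x = rng.standard_normal(N)
q = N // 4                             # quarter-frame length (N/4 in the text)
a, b, c, d = x[:q], x[q:2*q], x[2*q:3*q], x[3*q:]

y = imdct(mdct(x))                     # time-aliased reconstruction
J = np.eye(q)[::-1]                    # reversal matrix J (time-reverses a vector)

assert np.allclose(y[:q],      a - J @ b)    # eq. (1)
assert np.allclose(y[q:2*q],  -J @ a + b)    # eq. (2)
assert np.allclose(y[2*q:3*q], c + J @ d)    # eq. (3)
assert np.allclose(y[3*q:],    J @ c + d)    # eq. (4)
```

    With this scaling, the inverse transform of a single frame reproduces exactly the aliased quarter-frame combinations that equations (1) - (4) predict.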
    Based on equation (2), the aliased lossy signal for frame X 1 410 may be rewritten as: -J X_0 + X_1.
    An MDCT window vector W_k may be introduced, causing the above equation to be rewritten as: -J(X_0 \circ W_0) + X_1 \circ W_1. \quad (5)
    In equation (5), the \circ indicates element-wise multiplication of the window vectors W_0 and W_1 with the lossless signal segment vectors X_0 and X_1, respectively. As described in Britanak, the following constraints exist upon the windowing vectors for perfect reconstruction of the lossy signal segment to occur:
    W_0 J = W_3 \quad \text{and} \quad W_1 J = W_2
    W_k \circ W_k + W_{k+2} \circ W_{k+2} = [1, \ldots, 1]
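    As an illustrative check (the embodiments do not mandate a particular window), the common sine window satisfies both of the perfect-reconstruction constraints above when split into four segment windows W0-W3:

```python
import numpy as np

N = 16                                   # analysis window length
n = np.arange(N)
w = np.sin(np.pi / N * (n + 0.5))        # sine window, a common MDCT choice
q = N // 4                               # segment (quarter-window) length
W0, W1, W2, W3 = w[:q], w[q:2*q], w[2*q:3*q], w[3*q:]
J = np.eye(q)[::-1]                      # reversal matrix

# Symmetry constraints: W0 J = W3 and W1 J = W2
assert np.allclose(J @ W0, W3)
assert np.allclose(J @ W1, W2)

# Princen-Bradley constraint: Wk o Wk + W(k+2) o W(k+2) = [1, ..., 1]
assert np.allclose(W0 * W0 + W2 * W2, 1.0)
assert np.allclose(W1 * W1 + W3 * W3, 1.0)
```

    The second constraint holds for the sine window because segments offset by half the window length obey sin^2 + cos^2 = 1.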
  • Based on the foregoing, the decoder may reconstruct a transition frame lossless signal during time segment X 1 410 as the sum of a lossy time segment component 420 and an aliasing cancellation component based on the adjacent (previous, in the case of switching from lossless to lossy) lossless time segment 415. The determined aliasing cancellation component from segment 415 may then be used to extrapolate the aliasing cancellation component for frame X 1 410. The unused determined AC signal 440 can be discarded, because this particular time segment can be reconstructed by the lossless decoder. Returning to FIG. 2, the generated aliasing cancellation component 425 may be added to the lossy time segment 420 at a transition frame 410 at step 240. The sum of the aliasing cancellation component and the lossy time segment may be normalized using a weight caused by the encoding window applied to the transition frame at step 250, thereby providing aliasing cancellation at the transition frame.
  • An exemplary unaliased signal 430 for frame X1 may be expressed as shown below, in equation (6): \left( J(X_0 \circ W_0) + \left[ -J(X_0 \circ W_0) + X_1 \circ W_1 \right] \right) \circ W_1^{-1}. \quad (6)
    In equation (6), the aliasing cancellation component is the leftmost term, derived from equation (2), and the bracketed rightmost term is the lossy time segment component. From equation (2), -J(X_0 \circ W_0) is the aliasing component in the time-domain aliased signal of equation (5) for frame X1. To correct for this aliasing component, the leftmost term in equation (6), generated based on the decoder previously decoding frame X0 of the lossless stream 415, is added to the lossy time segment component for frame X1. Equation (6) also illustrates the normalizing step 250, as the terms are multiplied element-wise by the inverse window function W_1^{-1} for transition frame X 1 410. Audio playback of the lossy coded time segment may then be provided by the decoder at step 260, beginning with the unaliased signal 430 at the transition frame.
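    The reconstruction of equation (6) can be exercised end to end in a short numeric sketch. The example below is illustrative only: it assumes a sine window and models the decoded lossy transition segment directly as the windowed, aliased signal of equation (5):

```python
import numpy as np

q = 8                                    # segment (quarter-frame) length
n = np.arange(2 * q)
w = np.sin(np.pi / (2 * q) * (n + 0.5))  # sine window over segments X0, X1
W0, W1 = w[:q], w[q:]
J = np.eye(q)[::-1]                      # reversal matrix

rng = np.random.default_rng(1)
X0 = rng.standard_normal(q)              # lossless segment already decoded
X1 = rng.standard_normal(q)              # signal wanted at the transition frame

# Decoded lossy segment at the transition frame, eq. (5): -J(X0 o W0) + X1 o W1
lossy = -J @ (X0 * W0) + X1 * W1

# Aliasing cancellation component generated from the decoded lossless X0
ac = J @ (X0 * W0)

# Eq. (6): add the AC term, then normalize by the inverse window for the frame
recon = (ac + lossy) / W1
assert np.allclose(recon, X1)            # aliasing fully canceled
```

    Because the cancellation term is built entirely from the already-decoded lossless segment X0, no side information from the encoder is needed, which is the central point of the decoder-side approach.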
  • While the above discussion focuses on the transition from lossless encoding to lossy encoding, the reverse operation may be performed as well using the principles of the present invention. FIG. 5 shows a flow diagram for a method 500 of switching back from lossy coded time segments to a lossless stream of the same source audio, according to an embodiment. In the discussion below, reference is also made to FIG. 4B, a diagram 450 which shows exemplary signal segments where minimized forwarding aliasing cancelation is used to switch between a lossy coded time segment and a lossless stream, according to an embodiment.
  • As in the discussion of FIG. 2 above, the decoder may receive a lossy time segment 465 that includes audio encoded using a lossy coding method over a network at step 505. The decoder may also provide audio playback of the lossy coded time segments. The decoder may also receive, over the network, a lossless stream 470 that includes the audio encoded using a lossless coding method at step 510. In response to receiving a determination that network bandwidth is no longer constrained, the decoder may switch the playback from the lossy coded time segments to the lossless stream. The decoder may perform the switch automatically, after determining that network bandwidth exceeds a predetermined threshold for providing adequate performance for the lossless stream, or in response to a user-provided indication on an interface in communication with the decoder.
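    One possible way to produce the bandwidth determination automatically (not specified by the embodiments; the function name and thresholds below are hypothetical) is to compare measured throughput against the lossless bitrate with hysteresis, so the decoder does not toggle rapidly between streams:

```python
# Hypothetical switching policy with hysteresis; all numbers are illustrative.
LOSSLESS_KBPS = 4000     # assumed bitrate needed for the lossless stream
UP_MARGIN = 1.25         # headroom required before switching back to lossless
DOWN_MARGIN = 1.0

def choose_stream(measured_kbps, currently_lossless):
    """Return True to play the lossless stream, False for the lossy fallback."""
    if currently_lossless:
        # Fall back only when bandwidth drops below what lossless needs.
        return measured_kbps >= LOSSLESS_KBPS * DOWN_MARGIN
    # Switch back up only with headroom, to avoid rapid toggling.
    return measured_kbps >= LOSSLESS_KBPS * UP_MARGIN

assert choose_stream(5500, currently_lossless=False) is True
assert choose_stream(4500, currently_lossless=False) is False   # no headroom yet
assert choose_stream(4500, currently_lossless=True) is True     # keep lossless
assert choose_stream(3000, currently_lossless=True) is False    # constrained
```

    The asymmetric thresholds give a simple form of the "predetermined threshold" mentioned above while preventing oscillation near the boundary.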
  • To switch from a lossy coded time segment to the lossless stream, the decoder may generate an aliasing cancellation component 475 based on previously-decoded frames of the lossless stream at step 520. In the case of switching from lossy coding to lossless coding, the previously-decoded frame may be the subsequent frame (i.e., the first decoded frame of the lossless stream). To derive the aliasing cancellation component 475, the aliased lossy time segment for frame X 2 455 may be rewritten, based on equation (3), as: X_2 + J X_3.
    As described above, an MDCT window vector W_k may be introduced, causing the above equation to be rewritten as: X_2 \circ W_2 + J(X_3 \circ W_3). \quad (7)
    Based on the conditions on perfect reconstruction described above, the decoder may reconstruct transition frame X 2 455 as a sum of a lossy time segment component 465 and aliasing cancellation component for adjacent time segment X 3 460. Using lossless signals from frames after the transition frame is possible due to the decoder receiving both the lossy and the lossless streams, and by buffering decoded time segments of the lossless stream. The determined aliasing cancellation component for segment X 3 460 may then be used to extrapolate the aliasing cancellation component for frame X 2 455. Returning to FIG. 5, the generated aliasing cancellation component 475 may be added to the lossy time segment 465 at a transition frame 455 at step 540. The sum of the aliasing cancellation component and the lossy time segment may be normalized using a weight caused by the encoding window applied to the transition frame at step 550, thereby providing aliasing cancellation at the transition frame.
  • An exemplary unaliased signal 480 for transition frame X2 may be expressed as shown below, in equation (8): \left( \left[ X_2 \circ W_2 + J(X_3 \circ W_3) \right] - J(X_3 \circ W_3) \right) \circ W_2^{-1}. \quad (8)
    In equation (8), the aliasing cancellation component is the rightmost term, derived from equation (3), and the bracketed leftmost term is the lossy time segment component. From equation (3), J(X_3 \circ W_3) is the aliasing component in the time-domain aliased signal of equation (7) for transition frame X2. To correct for this aliasing component, the rightmost term in equation (8), generated based on the decoder having already decoded the (temporally subsequent) frame X3 of the lossless stream 470, is added to the lossy time segment component for transition frame X2. Equation (8) illustrates the normalizing step 550 as well, where the terms are multiplied element-wise by the inverse window function W_2^{-1} for transition frame X 2 455. Audio playback of the lossless stream may then be provided by the decoder at step 560, beginning after the unaliased signal 480.
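    The lossy-to-lossless direction of equation (8) admits the same kind of numeric check as equation (6). In this illustrative sketch (sine window assumed), the buffered lossless segment X3 supplies the cancellation term for transition frame X2:

```python
import numpy as np

q = 8                                    # segment (quarter-frame) length
n = np.arange(2 * q)
w = np.sin(np.pi / (2 * q) * (n + 0.5))  # sine window over segments X2, X3
W2, W3 = w[:q], w[q:]
J = np.eye(q)[::-1]                      # reversal matrix

rng = np.random.default_rng(2)
X2 = rng.standard_normal(q)              # signal wanted at the transition frame
X3 = rng.standard_normal(q)              # lossless segment buffered by the decoder

# Decoded lossy segment at the transition frame, eq. (7): X2 o W2 + J(X3 o W3)
lossy = X2 * W2 + J @ (X3 * W3)

# Eq. (8): remove the aliasing term derived from X3, then normalize by W2
recon = (lossy - J @ (X3 * W3)) / W2
assert np.allclose(recon, X2)            # aliasing fully canceled
```

    As in the forward direction, the cancellation term requires only decoded lossless samples that the decoder already holds in its buffer.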
  • FIG. 6 is a block diagram of an exemplary system for providing decoder-side switching between lossy coded time segments and a lossless stream of the same source audio as described above. With reference to FIG. 6, an exemplary system for implementing the subject matter disclosed herein, including the methods described above, includes a hardware device 600, including a processing unit 602, memory 604, storage 606, data entry module 608, display adapter 610, communication interface 612, and a bus 614 that couples elements 604-612 to the processing unit 602.
  • The bus 614 may comprise any type of bus architecture. Examples include a memory bus, a peripheral bus, a local bus, etc. The processing unit 602 is an instruction execution machine, apparatus, or device and may comprise a microprocessor, a digital signal processor, a graphics processing unit, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), etc. The processing unit 602 may be configured to execute program instructions stored in memory 604 and/or storage 606 and/or received via data entry module 608.
  • The memory 604 may include read only memory (ROM) 616 and random access memory (RAM) 618. Memory 604 may be configured to store program instructions and data during operation of device 600. In various embodiments, memory 604 may include any of a variety of memory technologies such as static random access memory (SRAM) or dynamic RAM (DRAM), including variants such as dual data rate synchronous DRAM (DDR SDRAM), error correcting code synchronous DRAM (ECC SDRAM), or RAMBUS DRAM (RDRAM), for example. Memory 604 may also include nonvolatile memory technologies such as nonvolatile flash RAM (NVRAM) or ROM. In some embodiments, it is contemplated that memory 604 may include a combination of technologies such as the foregoing, as well as other technologies not specifically mentioned. When the subject matter is implemented in a computer system, a basic input/output system (BIOS) 620, containing the basic routines that help to transfer information between elements within the computer system, such as during start-up, is stored in ROM 616.
  • The storage 606 may include a flash memory data storage device for reading from and writing to flash memory, a hard disk drive for reading from and writing to a hard disk, a magnetic disk drive for reading from or writing to a removable magnetic disk, and/or an optical disk drive for reading from or writing to a removable optical disk such as a CD ROM, DVD or other optical media. The drives and their associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the hardware device 600.
  • It is noted that the methods described herein can be embodied in executable instructions stored in a non-transitory computer readable medium for use by or in connection with an instruction execution machine, apparatus, or device, such as a computer-based or processor-containing machine, apparatus, or device. It will be appreciated by those skilled in the art that, for some embodiments, other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, RAM, ROM, and the like, may also be used in the exemplary operating environment. As used here, a "computer-readable medium" can include one or more of any suitable media for storing the executable instructions of a computer program in one or more of an electronic, magnetic, optical, and electromagnetic format, such that the instruction execution machine, system, apparatus, or device can read (or fetch) the instructions from the computer readable medium and execute the instructions for carrying out the described methods. A non-exhaustive list of conventional exemplary computer readable media includes: a portable computer diskette; a RAM; a ROM; an erasable programmable read only memory (EPROM or flash memory); optical storage devices, including a portable compact disc (CD), a portable digital video disc (DVD), a high definition DVD (HD-DVD™), a BLU-RAY disc; and the like.
  • A number of program modules may be stored on the storage 606, ROM 616 or RAM 618, including an operating system 622, one or more application programs 624, program data 626, and other program modules 628. A user may enter commands and information into the hardware device 600 through data entry module 608. Data entry module 608 may include mechanisms such as a keyboard, a touch screen, a pointing device, etc. Other external input devices (not shown) are connected to the hardware device 600 via external data entry interface 630. By way of example and not limitation, external input devices may include a microphone, joystick, game pad, satellite dish, scanner, or the like. In some embodiments, external input devices may include video or audio input devices such as a video camera, a still camera, etc. Data entry module 608 may be configured to receive input from one or more users of device 600 and to deliver such input to processing unit 602 and/or memory 604 via bus 614.
  • The hardware device 600 may operate in a networked environment using logical connections to one or more remote nodes (not shown) via communication interface 612. The remote node may be another computer, a server, a router, a peer device or other common network node, and typically includes many or all of the elements described above relative to the hardware device 600. The communication interface 612 may interface with a wireless network and/or a wired network. Examples of wireless networks include, for example, a BLUETOOTH network, a wireless personal area network, a wireless 802.11 local area network (LAN), and/or a wireless telephony network (e.g., a cellular, PCS, or GSM network). Examples of wired networks include, for example, a LAN, a fiber optic network, a wired personal area network, a telephony network, and/or a wide area network (WAN). Such networking environments are commonplace in intranets, the Internet, offices, enterprise-wide computer networks and the like. In some embodiments, communication interface 612 may include logic configured to support direct memory access (DMA) transfers between memory 604 and other devices.
  • In a networked environment, program modules depicted relative to the hardware device 600, or portions thereof, may be stored in a remote storage device, such as, for example, on a server. It will be appreciated that other hardware and/or software to establish a communications link between the hardware device 600 and other devices may be used.
  • It should be understood that the arrangement of hardware device 600 illustrated in FIG. 6 is but one possible implementation and that other arrangements are possible. It should also be understood that the various system components (and means) defined by the claims, described above, and illustrated in the various block diagrams represent logical components that are configured to perform the functionality described herein. For example, one or more of these system components (and means) can be realized, in whole or in part, by at least some of the components illustrated in the arrangement of hardware device 600. In addition, while at least one of these components is implemented at least partially as an electronic hardware component, and therefore constitutes a machine, the other components may be implemented in software, hardware, or a combination of software and hardware. More particularly, at least one component defined by the claims is implemented at least partially as an electronic hardware component, such as an instruction execution machine (e.g., a processor-based or processor-containing machine) and/or as specialized circuits or circuitry (e.g., discrete logic gates interconnected to perform a specialized function), such as those illustrated in FIG. 6. Other components may be implemented in software, hardware, or a combination of software and hardware. Moreover, some or all of these other components may be combined, some may be omitted altogether, and additional components can be added while still achieving the functionality described herein. Thus, the subject matter described herein can be embodied in many different variations, and all such variations are contemplated to be within the scope of what is claimed.
  • In the description above, the subject matter may be described with reference to acts and symbolic representations of operations that are performed by one or more devices, unless indicated otherwise. As such, it will be understood that such acts and operations, which are at times referred to as being computer-executed, include the manipulation by the processing unit of data in a structured form. This manipulation transforms the data or maintains it at locations in the memory system of the computer, which reconfigures or otherwise alters the operation of the device in a manner well understood by those skilled in the art. The data structures where data is maintained are physical locations of the memory that have particular properties defined by the format of the data. However, while the subject matter is being described in the foregoing context, it is not meant to be limiting as those of skill in the art will appreciate that various of the acts and operations described herein may also be implemented in hardware.
  • For purposes of the present description, the terms "component," "module," and "process," may be used interchangeably to refer to a processing unit that performs a particular function and that may be implemented through computer program code (software), digital or analog circuitry, computer firmware, or any combination thereof.
  • It should be noted that the various functions disclosed herein may be described using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, physical (non-transitory), non-volatile storage media in various forms, such as optical, magnetic or semiconductor storage media.
  • Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise," "comprising," and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of "including, but not limited to." Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words "herein," "hereunder," "above," "below," and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word "or" is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list.
  • In the description above and throughout, numerous specific details are set forth in order to provide a thorough understanding of the disclosure. It will be evident, however, to one of ordinary skill in the art, that the disclosure may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form to facilitate explanation. The description of the preferred embodiment is not intended to limit the scope of the claims appended hereto. Further, in the methods disclosed herein, various steps are disclosed illustrating some of the functions of the disclosure. One will appreciate that these steps are merely exemplary and are not meant to be limiting in any way. Other steps and functions may be contemplated without departing from this disclosure.
  • Systems and methods are described for switching between lossy coded time segments and a lossless stream of the same source audio. A decoder may receive, over a network, a stream of lossy coded time segments that includes audio encoded using a frequency-domain lossy coding method. The decoder may also receive, over the network, a lossless stream that includes the audio encoded using a lossless coding method. The decoder may provide audio playback of the lossless stream. The lossy coded time segments and the lossless stream may be encoded from the same source audio, may also be time-aligned, and may have a same sampling rate.
  • In response to receiving a determination that network bandwidth is constrained, the decoder may generate an aliasing cancellation component based on a previously-decoded frame of the lossless stream. The generated aliasing cancellation component may be added to a lossy time segment at a transition frame. The sum of the aliasing cancellation component and the lossy time segment may be normalized using a weight caused by an encoding window applied to the transition frame, thereby providing aliasing cancellation on the transition frame. Audio playback of the lossy time segment may then be provided by the decoder, beginning with the aliasing-canceled transition frame.
  • Different embodiments may provide variations on the basic principles outlined above. For example, the lossy coding method may use MDCT with overlapping windows, and the lossless coding method may use rectangular non-overlapping windows that are different from windows used by the lossy coding method. The switching may be performed in various locations in the decoding process, including, in response to receiving the determination that network bandwidth is constrained, selecting the transition frame to be before a peak-limiter is applied to the lossless stream. In some embodiments, the aliasing cancellation component may be generated based on reconstructing an adjacent frame of the previously-decoded frames of the lossless stream as a sum of a lossy component and an aliasing cancellation component for the adjacent frame. The aliasing cancellation component for the adjacent frame may then be extrapolated to generate the aliasing cancellation component for the transition frame. The normalizing may include multiplying the sum of the aliasing cancellation component and the lossy time segment by an inverse window function vector determined for the transition frame. Also, the determination that network bandwidth is constrained may be received using several different methods. For example, the determination may be received from a user-provided indication on an interface in communication with the decoder (e.g., using a software application running on a computer). The determination may alternatively be automatically generated by software that monitors network bandwidth.
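  • The aliasing-cancellation principle summarized above can be illustrated with a short numerical sketch. The following NumPy fragment is illustrative only and is not the claimed implementation: the frame length N, the sine window, the MDCT normalization, and the absence of quantization are assumptions made for the example. The inverse MDCT of the transition frame contains a time-domain aliasing term in its first half; because the overlapping samples were already decoded losslessly, the decoder can fold those known samples to synthesize the missing cancellation component, add it to the lossy time segment, and normalize by the weight imposed by the encoding window.

```python
import numpy as np

N = 256
rng = np.random.default_rng(0)
x = rng.standard_normal(3 * N)  # time-aligned source (stand-in for decoded lossless audio)

# Sine window satisfying the Princen-Bradley condition w(n)^2 + w(n+N)^2 = 1
w = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))

def mdct(windowed):
    """Naive O(N^2) MDCT of a windowed 2N-sample frame (N coefficients)."""
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5)) @ windowed

def imdct(coeffs):
    """Matching inverse MDCT; its output carries time-domain aliasing."""
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return (2.0 / N) * (coeffs @ np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5)))

# Transition frame: the first lossy-coded frame after the switch. No quantization
# is applied here, so only the missing-overlap aliasing remains to be canceled.
frame = x[N:3 * N]
z = w * frame
y = w * imdct(mdct(z))  # lossy decoder output; first half is w(n) * (z(n) - z(N-1-n))

# Decoder-generated aliasing cancellation component, folded from samples that
# were already decoded losslessly: ac(n) = w(n) * z(N-1-n) over the first half.
ac = w[:N] * z[N - 1::-1]

# Add the component, then normalize by the weight w(n)^2 the encoding window
# imposed (multiplication by an inverse window function vector).
left = (y[:N] + ac) / w[:N] ** 2
assert np.allclose(left, frame[:N], atol=1e-6)  # transition half reconstructed
```

  With all frames present, the overlapping halves of adjacent inverse-MDCT outputs cancel the aliasing terms automatically; the sketch shows how the decoder substitutes folded lossless samples for the missing preceding lossy frame.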
  • Computer program products, comprising computer-readable program code to be executed by one or more processors when retrieved from a non-transitory computer-readable medium, are also described for switching between lossy coded time segments and a lossless stream of the same source audio. The program code may include instructions to receive, over a network, a stream of lossy coded time segments that includes audio encoded using a frequency-domain lossy coding method. The program code may also include instructions to receive, over the network, a lossless stream that includes the audio encoded using a lossless coding method. The decoder may provide audio playback of the lossless stream. The lossy coded time segments and the lossless stream may be encoded from the same source audio, may also be time-aligned, and may have a same sampling rate.
  • In response to receiving a determination that network bandwidth is constrained, the decoder may generate an aliasing cancellation component based on a previously-decoded frame of the lossless stream. The generated aliasing cancellation component may be added to a lossy time segment at a transition frame. The sum of the aliasing cancellation component and the lossy time segment may be normalized using a weight caused by an encoding window applied to the transition frame, thereby providing aliasing cancellation on the transition frame. Audio playback of the lossy time segment may then be provided by the decoder, beginning with the aliasing-canceled transition frame.
  • Regarding the above-described decoder, an exemplary decoder may include a lossy decoder circuit, a lossless decoder circuit, and an analysis circuit. The lossy decoder circuit may receive lossy coded time segments from a network connection, the lossy coded time segments comprising audio encoded using a frequency-domain lossy coding method. The lossless decoder circuit may receive a lossless stream from the network connection, the lossless stream comprising the audio encoded using a lossless coding method, the lossy coded time segments and the lossless stream being encoded from the same source audio, being time-aligned, and having a same sampling rate, the decoder providing audio playback of the lossless stream. The analysis circuit may be coupled to both the lossy decoder circuit and the lossless decoder circuit. The analysis circuit may, in response to a determination that network bandwidth is constrained, generate an aliasing cancellation component based on previously-decoded frames of the lossless stream, add the generated aliasing cancellation component to a lossy time segment at a transition frame, normalize the sum of the generated aliasing cancellation component and the lossy time segment using a weight caused by an encoding window applied to the transition frame to provide aliasing cancellation on the transition frame, and provide audio playback of the lossy coded time segments beginning with the aliasing-canceled transition frame.
  • Different embodiments may provide variations on the decoder outlined above. For example, the lossy coding method may use MDCT with overlapping windows, and the lossless coding method may use rectangular non-overlapping windows that are different from windows used by the lossy coding method. The switching may be performed in various locations in the decoding process. For example, the analysis circuit may, in response to receiving the determination that network bandwidth is constrained, select the transition frame to be before a peak-limiter is applied to the lossless stream. In some embodiments, the aliasing cancellation component may be generated based on reconstructing an adjacent frame of the previously-decoded frames of the lossless stream as a sum of a lossy component and an aliasing cancellation component for the adjacent frame. The aliasing cancellation component for the adjacent frame may then be extrapolated to generate the aliasing cancellation component for the transition frame. Furthermore, the normalizing may include multiplying the sum of the aliasing cancellation component and the lossy time segment by an inverse window function vector determined for the transition frame.
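  • The "weight caused by an encoding window" used in the normalizing step follows from the window design itself. The sketch below assumes the common sine window (the described embodiments do not fix a particular window): in steady-state overlap-add the squared windows of adjacent frames sum to one, whereas on a transition frame only one window contributes, leaving a residual gain of w(n)^2 that multiplication by an inverse window function vector removes.

```python
import numpy as np

N = 256
n = np.arange(2 * N)
w = np.sin(np.pi / (2 * N) * (n + 0.5))  # assumed MDCT analysis/synthesis window

# Steady state: the overlapping halves of adjacent windows jointly apply unit
# gain (Princen-Bradley condition), so overlap-add needs no extra scaling.
assert np.allclose(w[:N] ** 2 + w[N:] ** 2, 1.0)

# Transition frame: the preceding lossy frame is absent, so the overlap region
# carries only this frame's weight w(n)^2. Multiplying by the inverse window
# function vector restores unit gain on the transition frame.
inv_window = 1.0 / w[:N] ** 2
assert np.allclose(inv_window * (w[:N] ** 2), 1.0)
```

  The same inverse-window normalization applies whichever power-complementary window the lossy coder uses; only the vector's values change.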
  • Various aspects of the present invention may be appreciated from the following enumerated example embodiments (EEEs):
    1. A method comprising:
      • receiving, by a decoder over a network, lossy coded time segments comprising audio encoded using a frequency-domain lossy coding method;
      • receiving, by the decoder over the network, a lossless stream, the lossless stream comprising the audio encoded using a lossless coding method, the lossy coded time segments and the lossless stream being encoded from the same source audio, being time-aligned, and having a same sampling rate, the decoder providing audio playback of the lossless stream;
      • in response to receiving a determination that network bandwidth is constrained:
        • generating, by the decoder, an aliasing cancellation component based on previously-decoded frames of the lossless stream;
        • adding the generated aliasing cancellation component to a lossy time segment at a transition frame;
        • normalizing the sum of the generated aliasing cancellation component and the lossy time segment using a weight caused by an encoding window applied to the transition frame, thereby providing aliasing cancellation on the transition frame; and
        • providing audio playback of the lossy coded time segments beginning with the aliasing-canceled transition frame.
    2. The method of EEE 1, wherein the lossy coding method uses MDCT with overlapping windows.
    3. The method of any preceding EEE, wherein the lossless coding method uses rectangular non-overlapping windows that are different from windows used by the lossy coding method.
    4. The method of any preceding EEE, further comprising, in response to receiving the determination that network bandwidth is constrained, selecting the transition frame to be before a peak-limiter is applied to the lossless stream.
    5. The method of any preceding EEE, the generating the aliasing cancellation component comprising:
      • reconstructing an adjacent frame of the previously-decoded frames of the lossless stream as a sum of a lossy component and an aliasing cancellation component for the adjacent frame; and
      • extrapolating the aliasing cancellation component for the adjacent frame to generate the aliasing cancellation component.
    6. The method of any preceding EEE, the normalizing comprising multiplying by an inverse window function vector determined for the transition frame.
    7. The method of any preceding EEE, the determination that network bandwidth is constrained being received from a user-provided indication on an interface in communication with the decoder.
    8. A computer program product comprising computer-readable program code to be executed by one or more processors of a decoder when retrieved from a non-transitory computer-readable medium, the program code including instructions to:
      • receive lossy coded time segments comprising audio encoded using a frequency-domain lossy coding method;
      • receive a lossless stream, the lossless stream comprising the audio encoded using a lossless coding method, the lossy coded time segments and the lossless stream being encoded from the same source audio, being time-aligned, and having a same sampling rate, the decoder providing audio playback of the lossless stream;
      • in response to receiving a determination that network bandwidth is constrained:
        • generate, by the decoder, an aliasing cancellation component based on previously-decoded frames of the lossless stream;
        • add the generated aliasing cancellation component to a lossy time segment at a transition frame;
        • normalize the sum of the generated aliasing cancellation component and the lossy time segment using a weight caused by an encoding window applied to the transition frame, thereby providing aliasing cancellation on the transition frame; and
      • provide audio playback of the lossy coded time segments beginning with the aliasing-canceled transition frame.
    9. The computer program product of EEE 8, wherein the lossy coding method uses MDCT with overlapping windows.
    10. The computer program product of EEE 8 or 9, wherein the lossless coding method uses rectangular non-overlapping windows that are different from windows used by the lossy coding method.
    11. The computer program product of any one of EEEs 8 to 10, the program code further including instructions to, in response to receiving the determination that network bandwidth is constrained, select the transition frame to be before a peak-limiter is applied to the lossless stream.
    12. The computer program product of any one of EEEs 8 to 11, wherein the instructions to generate the aliasing cancellation component include instructions to:
      • reconstruct an adjacent frame of the previously-decoded frames of the lossless stream as a sum of a lossy component and an aliasing cancellation component for the adjacent frame; and
      • extrapolate the aliasing cancellation component for the adjacent frame to generate the aliasing cancellation component.
    13. The computer program product of any one of EEEs 8 to 12, wherein the instructions to normalize the aliasing-canceled transition frame include instructions to multiply the aliasing-canceled transition frame by an inverse window function vector determined for the transition frame.
    14. The computer program product of any one of EEEs 8 to 13, the determination that network bandwidth is constrained being received from a user-provided indication on an interface in communication with the decoder.
    15. A decoder system for audio streams comprising:
      a lossy decoder circuit that receives lossy coded time segments from a network connection, the lossy coded time segments comprising audio encoded using a frequency-domain lossy coding method;
      a lossless decoder circuit that receives a lossless stream from the network connection, the lossless stream comprising the audio encoded using a lossless coding method, the lossy coded time segments and the lossless stream being encoded from the same source audio, being time-aligned, and having a same sampling rate, the decoder providing audio playback of the lossless stream; and
      an analysis circuit coupled to both the lossy decoder circuit and the lossless decoder circuit, the analysis circuit generating, in response to a determination that network bandwidth is constrained, an aliasing cancellation component based on previously-decoded frames of the lossless stream, adding the generated aliasing cancellation component to a lossy time segment at a transition frame, normalizing the sum of the generated aliasing cancellation component and the lossy time segment using a weight caused by an encoding window applied to the transition frame to provide aliasing cancellation on the transition frame, and providing audio playback of the lossy coded time segments beginning with the aliasing-canceled transition frame.
    16. The system of EEE 15, wherein the lossy coding method uses MDCT with overlapping windows.
    17. The system of EEE 15 or 16, wherein the lossless coding method uses rectangular non-overlapping windows that are different from windows used by the lossy coding method.
    18. The system of any one of EEEs 15 to 17, the analysis circuit selecting the transition frame to be before a peak-limiter is applied to the lossless stream.
    19. The system of any one of EEEs 15 to 18, the generating the aliasing cancellation component comprising:
      • reconstructing an adjacent frame of the previously-decoded frames of the lossless stream as a sum of a lossy component and an aliasing cancellation component for the adjacent frame; and
      • extrapolating the aliasing cancellation component for the adjacent frame to generate the aliasing cancellation component.
    20. The system of any one of EEEs 15 to 19, the normalizing comprising multiplying by an inverse window function vector determined for the transition frame.

Claims (15)

  1. A method comprising:
    receiving (205), by a decoder over a network, lossy coded time segments comprising audio encoded using a frequency-domain lossy coding method;
    receiving (210), by the decoder over the network, a lossless stream, the lossless stream comprising the audio encoded using a lossless coding method, the lossy coded time segments and the lossless stream being encoded from the same source audio, being time-aligned, and having a same sampling rate, the decoder providing audio playback of the lossless stream;
    in response to receiving a determination that network bandwidth is constrained:
    generating (220), by the decoder, an aliasing cancellation component based on previously-decoded frames of the lossless stream;
    adding (240) the generated aliasing cancellation component to a lossy time segment at a transition frame;
    normalizing (250) the sum of the generated aliasing cancellation component and the lossy time segment using a weight caused by an encoding window applied to the transition frame, thereby providing aliasing cancellation on the transition frame; and
    providing (260) audio playback of the lossy coded time segments beginning with the aliasing-canceled transition frame.
  2. The method of claim 1, wherein the lossy coding method uses MDCT with overlapping windows.
  3. The method of any preceding claim, wherein the lossless coding method uses rectangular non-overlapping windows that are different from windows used by the lossy coding method.
  4. The method of any preceding claim, further comprising, in response to receiving the determination that network bandwidth is constrained, selecting the transition frame to be before a peak-limiter is applied to the lossless stream.
  5. The method of any preceding claim, the generating the aliasing cancellation component comprising:
    reconstructing an adjacent frame of the previously-decoded frames of the lossless stream as a sum of a lossy component and an aliasing cancellation component for the adjacent frame; and
    extrapolating the aliasing cancellation component for the adjacent frame to generate the aliasing cancellation component.
  6. The method of any preceding claim, the normalizing comprising multiplying by an inverse window function vector determined for the transition frame.
  7. The method of any preceding claim, the determination that network bandwidth is constrained being received from a user-provided indication on an interface in communication with the decoder.
  8. A computer program product comprising computer-readable program code to be executed by one or more processors of a decoder when retrieved from a non-transitory computer-readable medium, the program code including instructions to:
    receive (205) lossy coded time segments comprising audio encoded using a frequency-domain lossy coding method;
    receive (210) a lossless stream, the lossless stream comprising the audio encoded using a lossless coding method, the lossy coded time segments and the lossless stream being encoded from the same source audio, being time-aligned, and having a same sampling rate, the decoder providing audio playback of the lossless stream;
    in response to receiving a determination that network bandwidth is constrained:
    generate (220), by the decoder, an aliasing cancellation component based on previously-decoded frames of the lossless stream;
    add (240) the generated aliasing cancellation component to a lossy time segment at a transition frame;
    normalize (250) the sum of the generated aliasing cancellation component and the lossy time segment using a weight caused by an encoding window applied to the transition frame, thereby providing aliasing cancellation on the transition frame; and
    provide (260) audio playback of the lossy coded time segments beginning with the aliasing-canceled transition frame.
  9. The computer program product of claim 8, wherein the instructions to generate the aliasing cancellation component include instructions to:
    reconstruct an adjacent frame of the previously-decoded frames of the lossless stream as a sum of a lossy component and an aliasing cancellation component for the adjacent frame; and
    extrapolate the aliasing cancellation component for the adjacent frame to generate the aliasing cancellation component.
  10. A decoder for audio streams comprising:
    a lossy decoder circuit that receives (205) lossy coded time segments from a network connection, the lossy coded time segments comprising audio encoded using a frequency-domain lossy coding method;
    a lossless decoder circuit that receives (210) a lossless stream from the network connection, the lossless stream comprising the audio encoded using a lossless coding method, the lossy coded time segments and the lossless stream being encoded from the same source audio, being time-aligned, and having a same sampling rate, the decoder providing audio playback of the lossless stream; and
    an analysis circuit coupled to both the lossy decoder circuit and the lossless decoder circuit, the analysis circuit generating (220), in response to a determination that network bandwidth is constrained, an aliasing cancellation component based on previously-decoded frames of the lossless stream, adding (240) the generated aliasing cancellation component to a lossy time segment at a transition frame, normalizing (250) the sum of the generated aliasing cancellation component and the lossy time segment using a weight caused by an encoding window applied to the transition frame to provide aliasing cancellation on the transition frame, and providing (260) audio playback of the lossy coded time segments beginning with the aliasing-canceled transition frame.
  11. The decoder of claim 10, wherein the lossy coding method uses MDCT with overlapping windows.
  12. The decoder of claim 10 or 11, wherein the lossless coding method uses rectangular non-overlapping windows that are different from windows used by the lossy coding method.
  13. The decoder of any one of claims 10 to 12, the analysis circuit selecting the transition frame to be before a peak-limiter is applied to the lossless stream.
  14. The decoder of any one of claims 10 to 13, the generating the aliasing cancellation component comprising:
    reconstructing an adjacent frame of the previously-decoded frames of the lossless stream as a sum of a lossy component and an aliasing cancellation component for the adjacent frame; and
    extrapolating the aliasing cancellation component for the adjacent frame to generate the aliasing cancellation component.
  15. The decoder of any one of claims 10 to 14, the normalizing comprising multiplying by an inverse window function vector determined for the transition frame.
EP18191910.1A 2017-08-31 2018-08-31 Decoder-provided time domain aliasing cancellation during lossy/lossless transitions Active EP3451332B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762553042P 2017-08-31 2017-08-31
EP17188694 2017-08-31

Publications (2)

Publication Number Publication Date
EP3451332A1 true EP3451332A1 (en) 2019-03-06
EP3451332B1 EP3451332B1 (en) 2020-03-25

Family

ID=59745775

Family Applications (1)

Application Number Title Priority Date Filing Date
EP18191910.1A Active EP3451332B1 (en) 2017-08-31 2018-08-31 Decoder-provided time domain aliasing cancellation during lossy/lossless transitions

Country Status (1)

Country Link
EP (1) EP3451332B1 (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7424434B2 (en) * 2002-09-04 2008-09-09 Microsoft Corporation Unified lossy and lossless audio compression
US7617097B2 (en) * 2002-03-09 2009-11-10 Samsung Electronics Co., Ltd. Scalable lossless audio coding/decoding apparatus and method
US20120022880A1 (en) * 2010-01-13 2012-01-26 Bruno Bessette Forward time-domain aliasing cancellation using linear-predictive filtering


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BRITANAK, VLADIMIR; HUIBERT J. LINCKLAEN ARRIENS: "Fast computational structures for an efficient implementation of the complete TDAC analysis/synthesis MDCT/MDST filter banks", SIGNAL PROCESSING, vol. 89.7, 2009, pages 1379 - 1394, XP026073418, DOI: doi:10.1016/j.sigpro.2009.01.014
RIEDMILLER ET AL.: "Delivering Scalable Audio Experiences using AC-4", IEEE TRANSACTIONS ON BROADCASTING, vol. 63, no. 1, March 2017 (2017-03-01), pages 179 - 198

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111049829A (en) * 2019-12-13 2020-04-21 南方科技大学 Video streaming transmission method and device, computer equipment and storage medium
CN111049829B (en) * 2019-12-13 2021-12-03 南方科技大学 Video streaming transmission method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
EP3451332B1 (en) 2020-03-25

Similar Documents

Publication Publication Date Title
US20200335117A1 (en) Encoder, decoder and methods for encoding and decoding data segments representing a time-domain data stream
RU2625444C2 (en) Audio processing system
EP2619758B1 (en) Audio signal transformer and inverse transformer, methods for audio signal analysis and synthesis
US20140046670A1 (en) Audio encoding method and apparatus, audio decoding method and apparatus, and multimedia device employing the same
JP6486962B2 (en) Method, encoder and decoder for linear predictive encoding and decoding of speech signals by transitioning between frames with different sampling rates
JP2014505272A (en) Low-delay acoustic coding alternating between predictive coding and transform coding
JPWO2009081567A1 (en) Stereo signal conversion apparatus, stereo signal inverse conversion apparatus, and methods thereof
US20150187361A1 (en) Smooth configuration switching for multichannel audio
EP3553777B1 (en) Low-complexity packet loss concealment for transcoded audio signals
WO2013061584A1 (en) Hybrid sound-signal decoder, hybrid sound-signal encoder, sound-signal decoding method, and sound-signal encoding method
US10438597B2 (en) Decoder-provided time domain aliasing cancellation during lossy/lossless transitions
US20110087494A1 (en) Apparatus and method of encoding audio signal by switching frequency domain transformation scheme and time domain transformation scheme
EP3451332B1 (en) Decoder-provided time domain aliasing cancellation during lossy/lossless transitions
JP6654236B2 (en) Encoder, decoder and method for signal adaptive switching of overlap rate in audio transform coding
Helmrich et al. Low-delay transform coding using the MPEG-H 3D audio codec
US20220165283A1 (en) Time-varying time-frequency tilings using non-uniform orthogonal filterbanks based on mdct analysis/synthesis and tdar
KR101601906B1 (en) Apparatus and method for coding audio signal by switching transform scheme among frequency domain transform and time domain transform
JP7420829B2 (en) Method and apparatus for low cost error recovery in predictive coding
KR102654181B1 (en) Method and apparatus for low-cost error recovery in predictive coding
KR101805631B1 (en) Apparatus and method for coding audio signal by switching transform scheme among frequency domain transform and time domain transform
KR101702565B1 (en) Apparatus and method for coding audio signal by switching transform scheme among frequency domain transform and time domain transform
US20220172732A1 (en) Method and apparatus for error recovery in predictive coding in multichannel audio frames
KR101178222B1 (en) Method for encoding and decoding audio and apparatus thereof
KR20240046634A (en) Method and apparatus for low cost error recovery in predictive coding

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20190906

RBV Designated contracting states (corrected)

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 19/24 20130101ALI20190925BHEP

Ipc: G10L 19/00 20130101AFI20190925BHEP

Ipc: G10L 19/02 20130101ALN20190925BHEP

INTG Intention to grant announced

Effective date: 20191023

RIN1 Information on inventor provided before grant (corrected)

Inventor name: BISWAS, ARIJIT

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: AT

Ref legal event code: REF

Ref document number: 1249490

Country of ref document: AT

Kind code of ref document: T

Effective date: 20200415

Ref country code: IE

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602018003260

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: RS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200625

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200625

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200626

REG Reference to a national code

Ref country code: NL

Ref legal event code: MP

Effective date: 20200325

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG4D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200725

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200818

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 1249490

Country of ref document: AT

Kind code of ref document: T

Effective date: 20200325

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602018003260

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

26N No opposition filed

Effective date: 20210112

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200831

REG Reference to a national code

Ref country code: BE

Ref legal event code: MM

Effective date: 20200831

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200831

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200831

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20210831

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20210831

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: MT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

Ref country code: AL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20200325

REG Reference to a national code

Ref country code: DE

Ref legal event code: R081

Ref document number: 602018003260

Country of ref document: DE

Owner name: DOLBY INTERNATIONAL AB, IE

Free format text: FORMER OWNER: DOLBY INTERNATIONAL AB, AMSTERDAM ZUIDOOST, NL

Ref country code: DE

Ref legal event code: R081

Ref document number: 602018003260

Country of ref document: DE

Owner name: DOLBY INTERNATIONAL AB, NL

Free format text: FORMER OWNER: DOLBY INTERNATIONAL AB, AMSTERDAM ZUIDOOST, NL

REG Reference to a national code

Ref country code: DE

Ref legal event code: R081

Ref document number: 602018003260

Country of ref document: DE

Owner name: DOLBY INTERNATIONAL AB, IE

Free format text: FORMER OWNER: DOLBY INTERNATIONAL AB, DP AMSTERDAM, NL

P01 Opt-out of the competence of the unified patent court (upc) registered

Effective date: 20230512

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20230720

Year of fee payment: 6

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20230720

Year of fee payment: 6

Ref country code: DE

Payment date: 20230720

Year of fee payment: 6