WO2013073810A1 - Encoding apparatus and decoding apparatus supporting a scalable multichannel audio signal, and method for the apparatuses performing the encoding and decoding - Google Patents

Publication number
WO2013073810A1
Authority
WO
WIPO (PCT)
Prior art keywords
bitstream
signal
audio signal
multichannel audio
encoding
Prior art date
Application number
PCT/KR2012/009543
Other languages
English (en)
Korean (ko)
Inventor
서정일
백승권
강경옥
이태진
이용주
유재현
최근우
김진웅
Original Assignee
한국전자통신연구원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020120127499A (KR102172279B1)
Application filed by 한국전자통신연구원
Priority to US14/358,104 (US20140310010A1)
Publication of WO2013073810A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16: Vocoder architecture
    • G10L19/18: Vocoders using multiple modes
    • G10L19/24: Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing

Definitions

  • The present invention relates to an encoding apparatus and a decoding apparatus supporting a scalable multichannel audio signal, and to a method performed by these apparatuses. More specifically, it relates to an apparatus and a method for compressing and restoring a multichannel audio signal in order to provide three-dimensional audio in a realistic broadcasting environment.
  • Multichannel audio signals such as 5.1 channel signals
  • This encoding / decoding technique is based on a psychological audio model and a perceptual audio coding technique using time / frequency transform.
  • a channel coding technique that uses the correlation between adjacent signals in a multichannel audio signal is additionally used.
  • Spatial audio encoding technology refers to a technique for downmixing a multichannel audio signal into a mono or stereo signal and encoding and representing a spatial parameter necessary for reconstructing the multichannel audio signal with additional information.
  • A representative spatial audio coding technology is MPEG Surround, which is standardized by MPEG.
  • a loudspeaker of 10 channels or more may be required.
  • a 22.2 channel multichannel audio reproduction system may be an example.
  • A newly proposed multichannel audio signal encoding and reproduction technique needs to maintain compatibility with, or provide conversion to, the two-channel stereo and 5.1 channel systems that are widely used in existing reproduction environments.
  • the present invention proposes a method of compressing and reconstructing a multichannel audio signal for providing 3D audio in a realistic broadcasting environment that provides a realistic feeling, such as 3DTV or Ultra High Definition TeleVision (UHDTV).
  • the present invention provides an apparatus and method for performing scalable sound quality encoding and decoding for providing adaptive sound quality according to a transmission environment, performance of a terminal, and taste of a listener.
  • the present invention provides an apparatus and method for performing scalable channel encoding and decoding for providing an adaptive multi-channel sound according to a transmission environment, a reproduction environment of a terminal (speaker arrangement environment), and a taste of a listener.
  • the present invention provides an apparatus and method capable of processing an audio object signal for providing an interactive function to a listener or for providing a three-dimensional effect independent of a specific audio object signal.
  • An encoding apparatus includes: a signal generator for generating a compatible multichannel audio signal using an audio object signal and an input multichannel audio signal; a first encoder configured to hierarchically encode the compatible multichannel audio signal to generate a first bitstream; a second encoder configured to encode the audio object signal to generate a second bitstream; and a bitstream formatter configured to generate an output bitstream using the first bitstream and the second bitstream.
  • A decoding apparatus includes: a bitstream demultiplexer for extracting, from an output bitstream, a first bitstream including an encoded compatible multichannel audio signal and a second bitstream including an encoded audio object signal; a first decoder which decodes the first bitstream and outputs a compatible multichannel audio signal; a second decoder which decodes the second bitstream and outputs an audio object signal; and a renderer configured to synthesize the output compatible multichannel audio signal and the audio object signal.
  • An encoding method includes: generating a compatible multichannel audio signal using an audio object signal and an input multichannel audio signal; hierarchically encoding the compatible multichannel audio signal to generate a first bitstream; encoding the audio object signal to generate a second bitstream; and generating an output bitstream using the first bitstream and the second bitstream.
  • A decoding method includes: extracting, from an output bitstream, a first bitstream including an encoded compatible multichannel audio signal and a second bitstream including an encoded audio object signal; decoding the first bitstream to output a compatible multichannel audio signal; decoding the second bitstream to output an audio object signal; and synthesizing the output compatible multichannel audio signal and the audio object signal.
  • An output bitstream includes: a first bitstream in which a multichannel audio signal and an audio object signal are encoded; a second bitstream in which the audio object signal is encoded; first additional information for editing the audio object signal in the compatible multichannel audio signal; and additional information including at least one of second additional information related to the compatible multichannel audio signal and third additional information related to the audio object signal.
  • a multi-channel audio signal for providing 3D audio may be compressed and reconstructed in a realistic broadcast environment that provides a realistic feeling, such as 3DTV or Ultra High Definition TeleVision (UHDTV).
  • scalable sound quality encoding and decoding may be performed to provide adaptive sound quality according to a transmission environment, performance of a terminal, and taste of a listener.
  • scalable channel encoding and decoding may be performed to provide an adaptive multichannel sound according to a transmission environment, a reproduction environment of a terminal (speaker arrangement environment), and a taste of a listener.
  • an audio object signal may be processed to provide an interactive function to a listener or to provide a three-dimensional effect independent of a specific audio object signal.
  • FIG. 1 is a diagram illustrating an encoding device and a decoding device according to an embodiment of the present invention.
  • FIG. 2 is a diagram illustrating a detailed configuration of an encoding apparatus according to an embodiment of the present invention.
  • FIG. 3 is a diagram illustrating a detailed configuration of a decoding apparatus according to an embodiment of the present invention.
  • FIG. 4 is a diagram for describing a scalable channel coding method according to an embodiment of the present invention.
  • FIG. 5 is a diagram for describing a scalable channel decoding method according to an embodiment of the present invention.
  • FIG. 6 is a diagram for describing a scalable sound quality coding scheme according to an embodiment of the present invention.
  • FIG. 7 is a diagram for describing a scalable sound quality decoding method according to an embodiment of the present invention.
  • FIG. 8 illustrates components of an output bitstream according to an embodiment of the present invention.
  • FIG. 9 is a diagram illustrating a modularized bitstream according to an embodiment of the present invention.
  • FIG. 10 illustrates the basic structure of a modular bitstream according to an embodiment of the present invention.
  • FIG. 11 is a diagram illustrating the types of processing unit (PU) payloads in a bitstream basic structure according to an embodiment of the present invention.
  • FIG. 12 illustrates a process of restoring an audio signal according to an audio reproduction environment according to an embodiment of the present invention.
  • FIG. 13 is a diagram illustrating an encoding method according to an embodiment of the present invention.
  • FIG. 14 is a diagram illustrating a decoding method according to an embodiment of the present invention.
  • FIG. 1 is a diagram illustrating an encoding device and a decoding device according to an embodiment of the present invention.
  • the encoding apparatus 101 may receive an audio object signal and a multichannel audio signal.
  • the encoding apparatus 101 may generate an output bitstream by encoding the audio object signal and the compatible multichannel audio signal in which the audio object signal and the multichannel audio signal are combined.
  • the encoding apparatus 101 may add additional information for the audio object signal and additional information for the compatible multichannel audio signal to the output bitstream.
  • the encoding apparatus 101 may add additional information for removing or extracting the audio object signal from the compatible multichannel audio signal to the output bitstream.
  • the encoding apparatus 101 may apply scalable channel encoding and scalable sound quality encoding in the encoding process.
  • Scalable channel coding and scalable sound quality coding will be described in detail later.
  • Such an output bitstream may be transmitted to the decoding apparatus 102 in real time, or may be previously transmitted to the decoding apparatus 102 and stored in a storage medium such as a buffer or a memory of the decoding apparatus 102.
  • the output bitstream may be stored and distributed in an optical recording medium such as a CD-ROM, CD-RW, DVD-R, DVD-RW, or the like.
  • The decoding apparatus 102 may extract the audio object signal and the compatible multichannel audio signal from the received output bitstream. The decoding apparatus 102 may output the extracted compatible multichannel audio signal as it is, or output a rendered signal in which it is combined with the audio object signal. Here, the rendering process may take into account the sound reproduction environment associated with the decoding apparatus.
  • The decoding apparatus 102 refers to a playback terminal that can be connected to a wired or wireless network. The decoding apparatus 102 may also be connected to at least one speaker to reproduce the audio signal in various forms.
  • FIG. 2 is a diagram illustrating a detailed configuration of an encoding apparatus according to an embodiment of the present invention.
  • the encoding apparatus 101 may include a signal generator 201, a first encoder 202, a second encoder 203, and a bitstream formatter 204.
  • the signal generator 201 may generate a backward compatible multichannel audio signal by mixing the audio object signal and the input multichannel audio signal.
  • the signal generator 201 may predict first additional information necessary for removing or extracting the audio object signal from the compatible multichannel audio signal. If the audio object signal is already included in the multichannel audio signal input to the encoding apparatus 101, the signal generator 201 may output the input multichannel audio signal as a compatible multichannel audio signal. In this case, the signal generator 201 may predict only the first additional information for removing or extracting the audio object signal from the compatible multichannel audio signal.
  • the predicted first additional information may include a spatial parameter and a residual signal per grid of time or frequency.
  • third additional information associated with the audio object signal may be further used to predict the first additional information.
  • the third additional information may include rendering information.
  • the audio object signal is associated with a sound source of the audio signal.
  • the audio object signal may include any one of an audio object signal corresponding to the time domain or an audio object signal converted into the frequency domain in the encoding process of the second encoder 203.
  • the multi-channel audio signal refers to an audio signal composed of a plurality of channels such as 2 channels, 5.1 channels, 7.1 channels, 10.2 channels, and 22.2 channels.
  • the first encoder 202 may hierarchically encode a compatible multichannel audio signal to generate a first bitstream.
  • the first bitstream may be represented as a scalable channel bitstream.
  • the first encoder 202 may predict second additional information for supporting a channel format that is not represented in the hierarchical encoding process of the compatible multichannel audio signal.
  • the second additional information may include a downmix matrix, a downmix parameter, an upmix matrix, and an upmix parameter.
  • the second encoder 203 may generate a second bitstream by encoding the audio object signal.
  • the bitstream formatter 204 may generate an output bitstream by multiplexing the first bitstream of the first encoder 202 and the second bitstream of the second encoder 203.
  • The bitstream formatter 204 may add to the output bitstream the first additional information for editing the audio object signal in the compatible multichannel audio signal, the second additional information related to the compatible multichannel audio signal, and the third additional information related to the audio object signal.
  • FIG. 3 is a diagram illustrating a detailed configuration of a decoding apparatus according to an embodiment of the present invention.
  • the decoding device 102 may include a bitstream demultiplexer 301, a first decoder 302, a second decoder 303, and a renderer 304.
  • Using a legacy multichannel decoding unit (not shown), the decoding apparatus 102 can restore commonly used multichannel audio signals such as stereo and 5.1 channel signals.
  • the bitstream demultiplexer 301 may extract a first bitstream including the encoded compatible multichannel audio signal and a second bitstream including the encoded audio object signal from the output bitstream.
  • the bitstream demultiplexer 301 may divide the output bitstream into blocks of a plurality of bitstreams for each decoding block.
  • the block of the divided bitstream may include a scalable channel bitstream, an object bitstream, a scalable sound quality bitstream, additional information for the bitstreams, and header information related to an output bitstream.
  • the header information may include information necessary for initializing the entirety of the decoding apparatus 102 and for initializing each component of the decoding apparatus 102.
  • the first decoder 302 may output a compatible multichannel audio signal by decoding the first bitstream.
  • the first decoder 302 may extract a compatible multichannel audio signal corresponding to a sound reproduction environment of the decoding apparatus by using additional information related to the compatible multichannel audio signal.
  • the additional information related to the compatible multichannel audio signal may mean additional information for the scalable channel.
  • the extracted compatible multichannel audio signal may be output as the first output signal as it is or transmitted to the renderer 304.
  • the sound reproduction environment of the decoding apparatus refers to a reproduction environment of a multichannel audio signal associated with the decoding apparatus 102.
  • the sound reproducing environment is determined according to the number and positions of speakers associated with the decoding apparatus.
  • the second decoder 303 may output the audio object signal by decoding the second bitstream.
  • The renderer 304 may synthesize the compatible multichannel audio signal output from the first decoder 302 and the audio object signal output from the second decoder 303.
  • The renderer 304 may synthesize the compatible multichannel audio signal and the audio object signal in consideration of the sound reproduction environment of the decoding apparatus 102.
  • The renderer 304 may remove the audio object signal from the compatible multichannel audio signal by using the additional information for removing the audio object signal. The renderer 304 may then render the audio object signal transmitted from the second decoder 303 onto the compatible multichannel audio signal and output the result as the second output signal.
  • The renderer 304 does not need to perform a process of removing the audio object signal from the compatible multichannel audio signal. Meanwhile, the renderer 304 may render the audio object signal onto the compatible multichannel audio signal based on the position at which the audio object signal is to be rendered. This position may be included in the additional information related to the audio object signal.
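  • The remove-then-render behaviour of the renderer 304 can be illustrated with a minimal sketch. It assumes a simple per-channel gain model in which the object was mixed into each channel with a known gain; the text only says that removal uses additional information, so the gain model and the names `remove_object`, `render_object`, `encoder_gains`, and `listener_gains` are all hypothetical.

```python
# Hypothetical sketch: the gain-based mixing model and all names are assumptions,
# not the method defined by the source text.

def render_object(bed, obj, gains):
    """Add the object signal to each channel of `bed`, scaled by that channel's gain."""
    return [[b + g * o for b, o in zip(channel, obj)]
            for channel, g in zip(bed, gains)]

def remove_object(mix, obj, gains):
    """Subtract the object's contribution from each channel of `mix`."""
    return [[m - g * o for m, o in zip(channel, obj)]
            for channel, g in zip(mix, gains)]

# Two-channel example with three samples per channel.
bed = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.1]]   # multichannel signal without the object
obj = [1.0, 0.5, -0.5]                      # decoded audio object signal

encoder_gains = [1.0, 0.0]                  # assumed: object was mixed fully into channel 0
compatible_mix = render_object(bed, obj, encoder_gains)

# Decoder side: remove the object, then render it at a listener-chosen position.
recovered_bed = remove_object(compatible_mix, obj, encoder_gains)
listener_gains = [0.0, 1.0]                 # assumed: listener moves the object to channel 1
second_output = render_object(recovered_bed, obj, listener_gains)
```

  • In this sketch the removal is exact only because the decoder is assumed to know the gains and the object signal precisely; a real system would rely on the transmitted spatial parameters and residual signal instead.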
  • FIG. 4 is a diagram for describing a scalable channel coding method according to an embodiment of the present invention.
  • the scalable channel coding method may be applied to the first encoder 202 of FIG. 2.
  • the first encoder 202 may generate a first bitstream that is a scalable channel bitstream by hierarchically encoding a compatible multichannel audio signal according to the scalable channel encoding scheme.
  • FIG. 4 illustrates a process of encoding according to a scalable channel encoding method when the multichannel audio signal is a 22.2 channel signal. Specifically, FIG. 4 illustrates a process in which 22.2 channel signals are hierarchically encoded into 5.1 channel signals, 10.2 channel signals, and 22.2 channel signals.
  • FIG. 4 is a block diagram of a scalable channel decoder 204, and illustrates a process of decoding a 5.1-channel, 10.2-channel, and 22.2-channel hierarchical encoding bitstream through the encoding process of FIG.
  • the input 22.2 channel signal is downmixed into a 10.2 channel signal through the first downmixing 401.
  • The 22.2 channel signal is converted into a 12 channel signal through the first channel conversion 402, which also receives the downmixed 10.2 channel signal as input.
  • the downmix 10.2 channel signal is downmixed into the downmix 5.1 channel signal through the second downmixing 403.
  • the downmix 5.1 channel signal output through the second downmixing 403 may be encoded according to the base layer encoding 405.
  • The result of the base layer encoding 405 is the base layer bitstream.
  • The downmixed 10.2 channel signal output by the first downmixing 401 is converted into a 5.1 channel signal through the second channel conversion 404, which also receives as input the downmixed 5.1 channel signal output by the second downmixing 403.
  • The converted 5.1 channel signal may be encoded through the first enhancement layer encoding 406.
  • The result of the first enhancement layer encoding 406 is the first enhancement layer bitstream.
  • The 12 channel signal output by the first channel conversion 402 may be encoded through the second enhancement layer encoding 407.
  • The result of the second enhancement layer encoding 407 is the second enhancement layer bitstream.
  • the base layer bitstream, the first enhancement layer bitstream, and the second enhancement layer bitstream may then be multiplexed via bitstream formatting 408 to produce a first bitstream.
  • Information related to downmix and channel conversion generated in the scalable channel encoding process is provided as scalable channel side information for the decoding process of the decoding apparatus 102.
  • the scalable channel coding method refers to a method of encoding a multichannel audio signal of a base layer and a multichannel audio signal of an enhancement layer derived through at least one downmixing and channel conversion.
  • the number of downmixes and channel conversions shown in FIG. 4 may vary depending on the input multichannel audio signal.
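  • The layering of FIG. 4 can be sketched as follows. The text does not define the downmix or channel conversion operations, so this sketch assumes a pairwise-average downmix and a pairwise half-difference channel conversion computed from the higher-channel signal alone. It also treats 22.2, 10.2, and 5.1 simply as 24, 12, and 6 channels and omits the layer encoders 405 to 407. All function names are hypothetical.

```python
# Hypothetical sketch: pairwise averaging and half-differences are assumed
# stand-ins for the unspecified downmix and channel conversion operations.

def downmix(channels):
    """Halve the channel count by averaging adjacent channel pairs."""
    return [[(a + b) / 2 for a, b in zip(channels[i], channels[i + 1])]
            for i in range(0, len(channels), 2)]

def channel_conversion(channels):
    """Keep, per channel pair, the half-difference needed to undo downmix()."""
    return [[(a - b) / 2 for a, b in zip(channels[i], channels[i + 1])]
            for i in range(0, len(channels), 2)]

def scalable_channel_encode(signal_24ch):
    mix_12 = downmix(signal_24ch)               # first downmixing 401
    enh2_12 = channel_conversion(signal_24ch)   # first channel conversion 402
    mix_6 = downmix(mix_12)                     # second downmixing 403
    enh1_6 = channel_conversion(mix_12)         # second channel conversion 404
    # A real encoder would now run the base and enhancement layer encoders
    # and multiplex the three bitstreams; here the raw signals are returned.
    return {"base": mix_6, "enh1": enh1_6, "enh2": enh2_12}

# 24 channels with two samples each.
signal = [[float(ch + 1), float(-ch)] for ch in range(24)]
layers = scalable_channel_encode(signal)
print({name: len(chs) for name, chs in layers.items()})
```

  • The point of the sketch is the layer structure: a 6 channel base layer, a 6 channel first enhancement layer, and a 12 channel second enhancement layer together carry the information of the 24 input channels.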
  • FIG. 5 is a diagram for describing a scalable channel decoding method according to an embodiment of the present invention.
  • the first bitstream may be demultiplexed into a base layer bitstream, a first enhancement layer bitstream, and a second enhancement layer bitstream through bitstream demultiplexing 501.
  • The base layer bitstream may be decoded through base layer decoding 502 to output a compatible 5.1 channel signal. The compatible 5.1 channel signal may then be output as the 5.1 channel output sound through the first signal conversion 507. When the compatible 5.1 channel signal is a frequency-domain signal, the first signal conversion 507 converts it from the frequency domain to the time domain.
  • The first enhancement layer bitstream may be decoded into a 5.1 channel signal through the first enhancement layer decoding 503. The compatible 5.1 channel signal output through the base layer decoding 502 and the 5.1 channel signal output through the first enhancement layer decoding 503 may then be synthesized into a 10.2 channel signal by the first channel synthesis 505. The first channel synthesis 505 may be processed according to the scalable channel additional information. The synthesized 10.2 channel signal may be output as the 10.2 channel output sound through the second signal conversion 508.
  • The second enhancement layer bitstream may be decoded into a 12 channel signal through the second enhancement layer decoding 504. The 10.2 channel signal output through the first channel synthesis 505 and the 12 channel signal output through the second enhancement layer decoding 504 may then be synthesized into a 22.2 channel signal by the second channel synthesis 506. The second channel synthesis 506 may be processed according to the scalable channel additional information. The synthesized 22.2 channel signal may be output as the 22.2 channel output sound through the third signal conversion 509.
  • All processes of FIG. 5 may be performed by the first decoder 302 of the decoding apparatus 102.
  • All operations in FIG. 5 are controlled based on the reproduction environment information transmitted from the encoding apparatus 101 or provided by the decoding apparatus 102 itself.
  • For another channel configuration (for example, 7.1 channel), the first channel synthesis 505 and the second channel synthesis 506 may include a downmix or upmix process according to that channel configuration. Information necessary for performing the downmix or upmix may be transmitted as additional information from the encoding apparatus 101 or may be predicted in the decoding apparatus 102.
  • the scalable channel decoding method refers to a decoding method of decoding a multichannel audio signal of a base layer and a multichannel audio signal of an enhancement layer through at least one upmixing and channel synthesis.
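  • The layered decoding of FIG. 5 can be sketched in the same spirit. This sketch assumes that each enhancement layer carries per-pair half-differences that invert a pairwise-average downmix, which is an illustrative choice rather than the synthesis defined by the source. It also omits the layer decoders and the signal conversions 507 to 509. The function names and the `speaker_count` parameter are hypothetical.

```python
# Hypothetical sketch: the sum/difference synthesis is an assumption, and the
# layers are given as already-decoded channel signals.

def channel_synthesis(lower, enhancement):
    """Rebuild 2N channels from N downmix channels and N half-difference channels."""
    out = []
    for mix, diff in zip(lower, enhancement):
        out.append([m + d for m, d in zip(mix, diff)])   # first channel of the pair
        out.append([m - d for m, d in zip(mix, diff)])   # second channel of the pair
    return out

def scalable_channel_decode(layers, speaker_count):
    """Decode only as many layers as the reproduction environment can use."""
    output = layers["base"]                                # base layer decoding 502
    if speaker_count < 12 or "enh1" not in layers:
        return output                                      # 6 channel output
    output = channel_synthesis(output, layers["enh1"])     # first channel synthesis 505
    if speaker_count < 24 or "enh2" not in layers:
        return output                                      # 12 channel output
    return channel_synthesis(output, layers["enh2"])       # second channel synthesis 506

# One sample per channel, values chosen to be exactly representable.
layers = {
    "base": [[float(i)] for i in range(6)],
    "enh1": [[0.5] for _ in range(6)],
    "enh2": [[0.25] for _ in range(12)],
}
for speakers in (6, 12, 24):
    print(speakers, len(scalable_channel_decode(layers, speakers)))
```

  • A terminal with only six speakers never touches the enhancement layers, which is the sense in which the reproduction environment controls the decoding process.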
  • FIG. 6 is a diagram for describing a scalable sound quality coding scheme according to an embodiment of the present invention.
  • The scalable sound quality coding method of FIG. 6 may be applied to the first encoder 202 and the second encoder 203 of the encoding apparatus 101.
  • an input signal may mean an audio object signal or a compatible multichannel audio signal.
  • the input signal may be processed according to base layer coding 601 and base layer decoding 602.
  • the base layer bitstream may be generated through the base layer encoding 601.
  • a first residual signal which is a difference between the input signal and the synthesized signal output through the base layer decoding 602, is generated.
  • the first residual signal may be processed according to the first enhancement layer encoding 603 and the first enhancement layer decoding 604.
  • the first enhancement layer bitstream may be generated through the first enhancement layer encoding 603.
  • a second residual signal which is a difference between the first residual signal and the synthesized signal output through the first enhancement layer decoding 604, is generated.
  • the second residual signal may be processed according to the second enhancement layer encoding 605 and the second enhancement layer decoding 606.
  • the second enhancement layer bitstream may be generated through the second enhancement layer encoding 605.
  • a third residual signal which is a difference between the second residual signal and the synthesized signal output through the second enhancement layer decoding 606, is generated.
  • the above process is repeated until the output signal of the preset sound quality is derived.
  • The base layer bitstream, the first enhancement layer bitstream, and the second enhancement layer bitstream may be multiplexed through the bitstream formatting 607 and output as the first bitstream or the second bitstream.
  • The process of FIG. 6 is performed to provide scalability of sound quality.
  • the scalable sound quality coding method of FIG. 6 may mean repeatedly performing base layer coding and at least one enhancement layer coding on an input compatible multichannel audio signal or an audio object signal.
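  • The residual-based layering of FIG. 6 can be sketched with a uniform scalar quantizer standing in for each layer codec. The quantizer, the step sizes, and the function names are assumptions made for illustration. The source only states that each layer encodes the difference between its input and the locally decoded (synthesized) signal of the previous layer.

```python
# Hypothetical sketch: a uniform quantizer replaces the unspecified layer codecs.

def quantize(signal, step):
    """Layer 'encoding': map each sample to an integer index."""
    return [round(x / step) for x in signal]

def dequantize(indices, step):
    """Layer 'decoding': reconstruct the synthesized signal from the indices."""
    return [i * step for i in indices]

def scalable_quality_encode(signal, steps):
    """Encode `signal` into one layer per step size; each layer codes the previous residual."""
    layers = []
    residual = list(signal)
    for step in steps:
        indices = quantize(residual, step)          # base / enhancement layer encoding
        synthesized = dequantize(indices, step)     # matching local layer decoding
        residual = [r - s for r, s in zip(residual, synthesized)]
        layers.append((step, indices))              # this layer's "bitstream"
    return layers, residual

signal = [0.30, -0.72, 0.55, 0.10]
layers, final_residual = scalable_quality_encode(signal, steps=[0.5, 0.125, 0.03125])
print(layers)
```

  • Each added layer bounds the remaining residual by half of its step size, so the loop could equally be stopped once a preset quality target is reached instead of after a fixed number of layers.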
  • FIG. 7 is a diagram for describing a scalable sound quality decoding method according to an embodiment of the present invention.
  • an input bitstream means a result of encoding an audio object signal or a compatible multichannel audio signal according to scalable sound quality coding.
  • the input bitstream may be divided into bitstreams for each layer through bitstream demultiplexing 701.
  • the input bitstream may be divided into one base layer bitstream and a plurality of enhancement layer bitstreams through the bitstream demultiplexing 701.
  • the base layer bitstream is output as a base layer output signal through base layer decoding 702.
  • the first enhancement layer bitstream corresponding to the first enhancement layer is decoded through the first enhancement layer decoding 703.
  • the output signal decoded through the first enhancement layer decoding 703 is summed with the base layer output signal and output as the first enhancement layer output signal.
  • the second enhancement layer bitstream corresponding to the second enhancement layer is decoded through the second enhancement layer decoding 704.
  • the output signal decoded through the second enhancement layer decoding 704 is summed with the first enhancement layer output signal and output as a second enhancement layer output signal.
  • the process of FIG. 7 is repeatedly performed according to the input bitstream.
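  • The summation in FIG. 7 can be sketched as follows, assuming each layer bitstream is a pair of a quantizer step size and integer indices. That layer format and the names `scalable_quality_decode` and `num_layers` are hypothetical, chosen only to make the progressive refinement visible.

```python
# Hypothetical sketch: each layer is modelled as (step, indices); a real layer
# would be a coded bitstream decoded by the corresponding layer decoder.

def scalable_quality_decode(layers, num_layers):
    """Decode the first `num_layers` layers and sum their outputs."""
    if num_layers < 1:
        raise ValueError("at least the base layer must be decoded")
    output = None
    for step, indices in layers[:num_layers]:
        decoded = [i * step for i in indices]       # base / enhancement layer decoding
        if output is None:
            output = decoded                        # base layer output signal
        else:
            output = [o + d for o, d in zip(output, decoded)]   # add the refinement
    return output

layers = [
    (0.5, [1, -1, 1, 0]),         # base layer
    (0.125, [-2, -2, 0, 1]),      # first enhancement layer
    (0.03125, [2, 1, 2, -1]),     # second enhancement layer
]
for n in (1, 2, 3):
    print(n, scalable_quality_decode(layers, n))
```

  • A terminal on a constrained link can stop after the base layer, while a terminal that receives every layer obtains the finest reconstruction.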
  • FIG. 8 illustrates components of an output bitstream according to an embodiment of the present invention.
  • The bitstreams produced by the first encoder 202 and the second encoder 203 of the encoding apparatus 101 are multiplexed through the bitstream formatter 204 to generate the output bitstream. FIG. 8 shows an output bitstream that results from multiplexing these bitstreams while maintaining compatibility with a decoding apparatus that supports a conventional stereo audio signal or a 5.1 channel audio signal.
  • To maintain compatibility, the output bitstream includes a compatible bitstream structure (legacy 2 / 5.1) for a stereo (2 channel) or 5.1 channel signal, which is an MPEG-2 Audio backward-compatible bitstream structure.
  • the compatible bitstream structure may include a scalable channel signal, a scalable quality signal, an audio object signal, and additional information associated with a stereo channel (2 channel) or 5.1 channel signal.
  • the output bitstream may include a scalable channel signal, a scalable sound quality signal, an audio object signal, and additional information in an additional data area such as an ancillary data area of the MPEG-2 Audio Backward compatibility bitstream structure.
  • The container of the scalable channel signal consists of a bitstream for each layer in which the channels are increased or improved, additional information, and the like.
  • the container of the scalable sound quality signal includes a bitstream for each layer, additional information, and the like, for improving sound quality.
  • the container of the audio object signal is composed of an audio object signal, additional information related to the audio object signal, extraction information of the audio object signal, and the like.
  • the container of additional information may be configured with additional information inserted into each container of the scalable channel signal, the scalable sound quality signal, and the audio object signal.
  • The container of the additional information is composed of header information, metadata, and the like necessary for initializing the decoding apparatus and each of its components.
  • FIG. 9 is a diagram illustrating a modularized bitstream according to an embodiment of the present invention.
  • FIG. 9 illustrates a case in which an encoded output bitstream can be processed according to the transmission environment, in the manner of the network abstraction layer (NAL) units used in H.264 / AVC.
  • FIG. 9 illustrates a result of modularizing the bitstreams output from each component of the encoding apparatus so that the decoding apparatus can easily select and process the necessary information from the output bitstream.
  • FIG. 9 illustrates, for an output bitstream composed of a core layer (basic multichannel signal), two channel enhancement layers, one sound quality enhancement layer, and two object signal layers, the configuration of the processing units (PUs, illustrated in FIG. 10) included in a frame and the order in which they are transmitted. dependency_id (dependency ID) indicates that information of a previous layer is needed to decode the processing unit.
  • The number in each block indicates the pu_type of FIG. 11.
  • A sequence header containing the information necessary for initializing the decoding apparatus is delivered first, followed by a frame header and frame metadata.
  • The bitstreams output from the respective coding blocks are divided into core block data and channel, sound quality, and object enhancement layer data.
  • Information necessary for each coding block (the first encoder and the second encoder) or bitstream is also included.
  • The decoding apparatus may generate the audio signal to be output after selecting from the delivered processing units according to its sound reproduction environment or the user's preference.
  • FIG. 10 illustrates the basic structure of a modular bitstream according to an embodiment of the present invention.
  • FIG. 10 illustrates the basic structure that results from modularizing the bitstream illustrated in FIG. 8; this structure may serve as the basic unit of an output bitstream.
  • This basic unit is defined as a processing unit (PU). The header of a processing unit carries fields such as random_access (1 bit), dependency_id (3 bits), and pu_type (4 bits).
  • random_access is a flag indicating whether the processing unit can be decoded without information of the previous layer.
  • dependency_id (dependency ID) indicates that information of the previous layer is needed to decode the processing unit. For example, if dependency_id is 1, one previous layer (that is, the base layer) is required.
  • pu_type indicates the type of bitstream carried in the payload of the processing unit. The pu_type will be described in detail with reference to FIG. 11.
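  • As a non-limiting illustration, the processing unit header described above could be packed into a single byte, since the three fields total 1 + 3 + 4 = 8 bits. The field order within the byte (random_access in the most significant bit, pu_type in the low nibble) and the names PUHeader, pack_pu_header, and parse_pu_header are assumptions made for this sketch; the description gives only the field widths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PUHeader:
    """Processing unit header fields as listed in the description."""
    random_access: bool  # 1 bit: decodable without the previous layer
    dependency_id: int   # 3 bits: range 0..7
    pu_type: int         # 4 bits: range 0..15, payload type (FIG. 11)


def pack_pu_header(header: PUHeader) -> int:
    """Pack the header into one byte (assumed MSB-first field order)."""
    if not 0 <= header.dependency_id <= 0b111:
        raise ValueError("dependency_id must fit in 3 bits")
    if not 0 <= header.pu_type <= 0b1111:
        raise ValueError("pu_type must fit in 4 bits")
    return (int(header.random_access) << 7) | (header.dependency_id << 4) | header.pu_type


def parse_pu_header(byte: int) -> PUHeader:
    """Inverse of pack_pu_header for a single header byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("header must be a single byte")
    return PUHeader(
        random_access=bool(byte >> 7),
        dependency_id=(byte >> 4) & 0b111,
        pu_type=byte & 0b1111,
    )
```

  • A decoder built along these lines would read one header byte per processing unit and use pu_type to decide how to interpret the payload that follows.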
  • FIG. 11 is a diagram illustrating a type of payload of a processing unit (PU) in a bitstream basic structure according to an embodiment of the present invention.
  • pu_type indicates the type of bitstream carried in the payload of the processing unit.
  • A sequence header indicates a header of an output bitstream input to the encoding apparatus.
  • The frame header indicates a header for each frame.
  • The payload of the processing unit is an access unit (AU), which is an encoded bitstream extracted from the components of the encoding apparatus.
  • FIG. 12 illustrates a process of restoring an audio signal according to an audio reproduction environment according to an embodiment of the present invention.
  • A 7.1-channel audio signal may be split and encoded as three components: 2-channel stereo, 3.1-channel extension A, and 2-channel extension B.
  • The encoded components may be multiplexed and transmitted as a single overall bitstream.
  • A terminal capable of reproducing a stereo signal can extract and reproduce only the bitstream related to the 2-channel stereo from the entire bitstream.
  • A terminal capable of reproducing a 5.1-channel signal reproduces the 5.1-channel signal using the 2-channel stereo bitstream and the 3.1-channel extension A bitstream.
  • A terminal capable of reproducing a 7.1-channel signal can reproduce the 7.1-channel signal using all the bitstreams included in the entire bitstream.
  • In this way, the audio signal can be restored to suit the reproduction environment of the terminal by using only the required bitstreams from the entire bitstream, without any additional conversion process.
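  • As a non-limiting illustration of the FIG. 12 example, the selection of bitstreams by terminal capability could be sketched as a simple lookup. The component keys, the COMPONENTS_BY_LAYOUT table, and the select_components function are hypothetical names introduced for this sketch, and the entire bitstream is modeled as an already demultiplexed mapping from component name to bytes.

```python
# Components needed for each reproduction layout, following the FIG. 12 example:
# stereo = 2 ch; 5.1 = 2 ch + 3.1 ch extension A; 7.1 = 5.1 + 2 ch extension B.
COMPONENTS_BY_LAYOUT = {
    "stereo": ["stereo_2ch"],
    "5.1": ["stereo_2ch", "extension_a_3.1ch"],
    "7.1": ["stereo_2ch", "extension_a_3.1ch", "extension_b_2ch"],
}


def select_components(full_bitstream: dict[str, bytes], layout: str) -> list[bytes]:
    """Return only the sub-bitstreams a terminal needs for the given layout."""
    try:
        names = COMPONENTS_BY_LAYOUT[layout]
    except KeyError:
        raise ValueError(f"unsupported layout: {layout}") from None
    # No conversion is performed: unused components are simply skipped.
    return [full_bitstream[name] for name in names]
```

  • The point of the sketch is that each terminal only filters the entire bitstream; it never transcodes or converts the components it does not need.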
  • FIG. 13 is a diagram illustrating an encoding method according to an embodiment of the present invention.
  • The encoding apparatus 101 may synthesize the input audio object signal and the multichannel audio signal to generate a compatible multichannel audio signal.
  • The encoding apparatus 101 may generate a bitstream related to the audio object signal by encoding the input audio object signal.
  • The encoding apparatus 101 may hierarchically encode the audio object signal according to the scalable sound quality encoding scheme.
  • The encoding apparatus 101 may encode the compatible multichannel audio signal to generate a bitstream associated with the compatible multichannel audio signal.
  • The encoding apparatus 101 may hierarchically encode the compatible multichannel audio signal according to the scalable sound quality encoding scheme or the scalable channel encoding scheme.
  • The encoding apparatus 101 may finally generate an output bitstream by multiplexing the generated bitstreams. The encoding apparatus 101 may also include additional information related to the audio object signal and the compatible multichannel audio signal in the output bitstream.
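  • As a non-limiting illustration, the multiplexing step could be sketched with a simple tag-length-value framing. This framing (a 1-byte stream identifier followed by a 4-byte big-endian length) is an assumption chosen for brevity and is not the processing unit format of FIG. 10; the stream identifiers and the function names multiplex and demultiplex are likewise hypothetical.

```python
import struct

# Hypothetical stream identifiers, not taken from the description.
STREAM_MULTICHANNEL = 1  # bitstream of the compatible multichannel audio signal
STREAM_OBJECT = 2        # bitstream of the audio object signal
STREAM_SIDE_INFO = 3     # additional information

_FRAME_HEADER = ">BI"    # 1-byte stream id, 4-byte big-endian payload length
_FRAME_HEADER_SIZE = struct.calcsize(_FRAME_HEADER)


def multiplex(streams: list[tuple[int, bytes]]) -> bytes:
    """Concatenate (stream_id, payload) pairs into one output bitstream."""
    out = bytearray()
    for stream_id, payload in streams:
        out += struct.pack(_FRAME_HEADER, stream_id, len(payload))
        out += payload
    return bytes(out)


def demultiplex(data: bytes) -> list[tuple[int, bytes]]:
    """Split an output bitstream back into (stream_id, payload) pairs."""
    streams = []
    offset = 0
    while offset < len(data):
        if offset + _FRAME_HEADER_SIZE > len(data):
            raise ValueError("truncated frame header")
        stream_id, length = struct.unpack_from(_FRAME_HEADER, data, offset)
        offset += _FRAME_HEADER_SIZE
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated payload")
        streams.append((stream_id, payload))
        offset += length
    return streams
```

  • Because every sub-bitstream is length-prefixed, a receiver can skip streams it does not need without parsing their contents.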
  • FIG. 14 is a diagram illustrating a decoding method according to an embodiment of the present invention.
  • The decoding apparatus 102 may demultiplex an output bitstream transmitted from the encoding apparatus 101. The first bitstream, in which the compatible multichannel audio signal is encoded, and the second bitstream, in which the audio object signal is encoded, may then be separated from the output bitstream.
  • The decoding apparatus 102 may output a compatible multichannel audio signal by decoding the first bitstream.
  • The decoding apparatus 102 may extract the compatible multichannel audio signal from the first bitstream according to the scalable sound quality decoding scheme or the scalable channel decoding scheme.
  • The decoded compatible multichannel audio signal may be output externally as is.
  • The decoding apparatus 102 may output an audio object signal by decoding the second bitstream.
  • The decoding apparatus 102 may output the audio object signal from the second bitstream according to the scalable sound quality decoding scheme.
  • The decoding apparatus 102 may derive the rendered result by combining the compatible multichannel audio signal and the audio object signal.
  • The decoding apparatus 102 may combine the audio object signals in consideration of the position or arrangement of the loudspeakers, which constitutes the sound reproduction environment.
  • The decoding apparatus 102 may derive the multichannel audio signal to be finally output through repeated channel conversion and synthesis from the compatible multichannel audio signal, in consideration of the position or arrangement of the loudspeakers, which constitutes the sound reproduction environment.
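  • As a non-limiting illustration, combining the decoded compatible multichannel audio signal with decoded audio object signals could be sketched as a gain-weighted mix. The description does not specify how the gains are obtained from the loudspeaker arrangement, so this sketch assumes they are supplied by the caller; the function name render and its argument layout are hypothetical.

```python
def render(bed: list[list[float]],
           objects: list[list[float]],
           gains: list[list[float]]) -> list[list[float]]:
    """Mix mono object signals into a multichannel bed.

    bed[c][i]    sample i of loudspeaker channel c
    objects[o][i] sample i of audio object o
    gains[o][c]  gain of object o on loudspeaker c (assumed to be derived
                 elsewhere from the loudspeaker position or arrangement)
    """
    if not bed:
        raise ValueError("bed must contain at least one channel")
    if len(objects) != len(gains):
        raise ValueError("one gain row is required per object")
    num_samples = len(bed[0])
    out = [list(channel) for channel in bed]  # copy so the input is untouched
    for obj, obj_gains in zip(objects, gains):
        if len(obj) != num_samples:
            raise ValueError("object length must match the bed length")
        if len(obj_gains) != len(bed):
            raise ValueError("one gain is required per loudspeaker channel")
        for c, gain in enumerate(obj_gains):
            for i, sample in enumerate(obj):
                out[c][i] += gain * sample
    return out
```

  • Changing a row of gains moves or attenuates a single object without touching the channel bed, which is the kind of individual object control the description attributes to the object signal layers.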
  • Methods according to an embodiment of the present invention can be implemented in the form of program instructions that can be executed by various computer means and recorded in a computer readable medium.
  • The computer readable medium may include program instructions, data files, data structures, and the like, alone or in combination.
  • Program instructions recorded on the media may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well-known and available to those having skill in the computer software arts.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Stereophonic System (AREA)

Abstract

The invention relates to an encoding apparatus and a decoding apparatus supporting scalable multichannel audio signals, and to a method performed by these apparatuses. When compressing and restoring multichannel audio signals in order to compress and recreate high-quality three-dimensional sound, the apparatuses and the method can provide the following functions in an integrated structure: (1) a sound quality scalability function that provides an audio signal at a range of sound qualities suited to the transmission environment, the performance of the terminal, and the listening environment; (2) a channel scalability function that provides multichannel signals in a range of formats suited to the transmission environment, the performance of the terminal, and the reproduction environment (loudspeaker arrangement) of the terminal; and (3) an object scalability function for individually controlling specific audio objects so as to maximize the three-dimensional sound effect.
PCT/KR2012/009543 2011-11-14 2012-11-13 Apparatus for encoding and apparatus for decoding supporting scalable multichannel audio signal, and method for apparatuses performing same WO2013073810A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/358,104 US20140310010A1 (en) 2011-11-14 2012-11-13 Apparatus for encoding and apparatus for decoding supporting scalable multichannel audio signal, and method for apparatuses performing same

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR20110118102 2011-11-14
KR10-2011-0118102 2011-11-14
KR10-2012-0127499 2012-11-12
KR1020120127499A KR102172279B1 (ko) 2011-11-14 2012-11-12 Encoding apparatus and decoding apparatus supporting scalable multichannel audio signal, and method performed by the apparatuses

Publications (1)

Publication Number Publication Date
WO2013073810A1 (fr) 2013-05-23

Family

ID=48429830

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2012/009543 WO2013073810A1 (fr) 2011-11-14 2012-11-13 Apparatus for encoding and apparatus for decoding supporting scalable multichannel audio signal, and method for apparatuses performing same

Country Status (1)

Country Link
WO (1) WO2013073810A1 (fr)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050157883A1 (en) * 2004-01-20 2005-07-21 Jurgen Herre Apparatus and method for constructing a multi-channel output signal or for generating a downmix signal
US20110022402A1 (en) * 2006-10-16 2011-01-27 Dolby Sweden Ab Enhanced coding and parameter representation of multichannel downmixed object coding
US20100324915A1 (en) * 2009-06-23 2010-12-23 Electronic And Telecommunications Research Institute Encoding and decoding apparatuses for high quality multi-channel audio codec

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9805727B2 (en) 2013-04-03 2017-10-31 Dolby Laboratories Licensing Corporation Methods and systems for generating and interactively rendering object based audio
US10276172B2 (en) 2013-04-03 2019-04-30 Dolby Laboratories Licensing Corporation Methods and systems for generating and interactively rendering object based audio
US10553225B2 (en) 2013-04-03 2020-02-04 Dolby Laboratories Licensing Corporation Methods and systems for rendering object based audio
US10832690B2 (en) 2013-04-03 2020-11-10 Dolby Laboratories Licensing Corporation Methods and systems for rendering object based audio
US11270713B2 (en) 2013-04-03 2022-03-08 Dolby Laboratories Licensing Corporation Methods and systems for rendering object based audio
US11769514B2 (en) 2013-04-03 2023-09-26 Dolby Laboratories Licensing Corporation Methods and systems for rendering object based audio
CN110677369A (zh) * 2014-07-09 2020-01-10 韩国电子通信研究院 Apparatus and method for transmitting broadcast signals using layered division multiplexing

Similar Documents

Publication Publication Date Title
KR101283783B1 (ko) High-quality multichannel audio encoding and decoding apparatus
KR102172279B1 (ko) Encoding apparatus and decoding apparatus supporting scalable multichannel audio signal, and method performed by the apparatuses
WO2014021588A1 (fr) Method and device for processing audio signal
JP6088444B2 (ja) Encoding and decoding of three-dimensional audio soundtracks
WO2009123409A2 (fr) Method and apparatus for generating additional information bitstream of multi-object audio signal
JP5174527B2 (ja) Audio signal multiplex transmission system with added sound image localization acoustic meta-information, production apparatus, and reproduction apparatus
US20100324915A1 (en) Encoding and decoding apparatuses for high quality multi-channel audio codec
RU2323551C1 (ru) Frequency-based channel coding in parametric multichannel coding systems
TW202007189A (zh) Synchronizing enhanced audio transport with backward compatible audio transport
US20050273322A1 (en) Audio signal encoding and decoding apparatus
TW201907391A (zh) Layered intermediate compression of audio data for higher order ambisonics
WO2009134085A2 (fr) Method and apparatus for transmitting/receiving multichannel audio signals using a super frame
TW202007191A (zh) Embedding enhanced audio transport in backward compatible audio bitstreams
WO2014021586A1 (fr) Method and device for processing audio signal
WO2013073810A1 (fr) Apparatus for encoding and apparatus for decoding supporting scalable multichannel audio signal, and method for apparatuses performing same
KR101003415B1 (ko) Method of decoding DMB signal and decoding apparatus therefor
KR100917844B1 (ko) Apparatus and method for transmitting or reproducing multichannel audio signal
TW202002679A (zh) Rendering different portions of audio data using different renderers
US11062713B2 (en) Spatially formatted enhanced audio data for backward compatible audio bitstreams
WO2014058275A1 (fr) Device and method for generating audio data, and device and method for playing audio data
JP7182751B1 (ja) System, method, and apparatus for conversion from channel-based audio to object-based audio
WO2015147433A1 (fr) Apparatus and method for processing audio signal
KR20130078534A (ko) Front sound field synthesis system and method for providing surround sound using 7.1-channel codec
KR20110085155A (ko) Audio generation apparatus, audio reproduction apparatus, and method thereof for real-time streaming
KR100208004B1 (ko) Apparatus and method for reproducing stereophonic sound using upper and lower channel audio

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12848852

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 14358104

Country of ref document: US

122 Ep: pct application non-entry in european phase

Ref document number: 12848852

Country of ref document: EP

Kind code of ref document: A1