KR20210145832A

KR20210145832A - Optimizing loudness and dynamic range across different playback devices

Info

Publication number: KR20210145832A
Application number: KR1020217037771A
Authority: KR
Inventors: Jeffrey Riedmiller; Scott Gregory Norcross; Karl Jonas Roeden
Original assignee: Dolby Laboratories Licensing Corporation; Dolby International AB
Priority date: 2013-01-21
Filing date: 2014-01-15
Publication date: 2021-12-02
Also published as: BR122020007931B1; KR20170001717A; RU2018128291A3; RU2631139C2; HK1213374A1; US20220019404A1; JP6680858B2; JP2022166331A; KR102473260B1; KR102194120B1; EP2946469B1; RU2665873C1; US9841941B2; JP2023175019A; US20240103801A1; JP2016507779A; CN104937844B; JP2021089444A; BR112015017064B1; RU2018128291A

Abstract

Embodiments relate to a method and system for receiving metadata associated with audio data in a bitstream and analyzing the metadata to determine whether a loudness parameter for a first group of audio playback devices is available in the bitstream. In response to determining that the parameters are present for the first group, the system uses the parameters and the audio data to render audio. In response to determining that the loudness parameters are not present for the first group, the system analyzes one or more characteristics of the first group and determines the parameter based on the one or more characteristics.

Description

Optimizing loudness and dynamic range across different playback devices

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Application No. 61/754,882, filed January 21, 2013; U.S. Provisional Application No. 61/809,250, filed April 5, 2013; and U.S. Provisional Application No. 61/824,010, filed May 16, 2013, all of which are incorporated herein by reference.

One or more embodiments relate generally to audio signal processing, and more particularly to processing audio data bitstreams with metadata indicative of the loudness and dynamic range characteristics of the audio content based on playback environments and devices.

The subject matter discussed in the background section should not be assumed to be prior art merely because it is mentioned in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.

The dynamic range of an audio signal is the ratio between the largest and smallest possible values of the sound embodied in the signal, and is usually measured as a decibel (base-10) value. In many audio processing systems, dynamic range control (or dynamic range compression, DRC) is used to reduce the level of loud sounds and/or amplify the level of quiet sounds, so that wide dynamic range source content fits into a narrower recorded dynamic range that can be more easily stored and reproduced using electronic equipment. For audio/visual (AV) content, a dialog reference level may be used to define the "null" point for compression through the DRC mechanism. DRC acts to boost content below the dialog reference level and cut content above the reference level.
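The boost-and-cut behavior around the dialog reference level can be sketched as a static gain curve. This is only an illustration under stated assumptions: the function names `drc_gain_db` and `dynamic_range_db`, the 2:1 ratios, and the -31 dB null point are hypothetical choices and are not taken from this disclosure.

```python
import math


def dynamic_range_db(max_amplitude: float, min_amplitude: float) -> float:
    """Dynamic range of a signal as a base-10 decibel ratio of two amplitudes."""
    return 20.0 * math.log10(max_amplitude / min_amplitude)


def drc_gain_db(level_db: float, null_db: float = -31.0,
                boost_ratio: float = 2.0, cut_ratio: float = 2.0) -> float:
    """Gain (in dB) that a simple static compressor would apply.

    Content below the null point (the assumed dialog reference level) is
    boosted and content above it is cut, so the output moves toward the
    null point. All default values are illustrative assumptions.
    """
    offset = level_db - null_db          # distance from the null point
    ratio = cut_ratio if offset > 0 else boost_ratio
    # With ratio R, an input offset of d dB becomes d / R dB at the output.
    return offset / ratio - offset


for level in (-51.0, -31.0, -11.0):
    print(level, drc_gain_db(level))
```

With these assumed ratios, content 20 dB below the null point is boosted by 10 dB, content at the null point is left untouched, and content 20 dB above it is cut by 10 dB.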

In known audio encoding systems, metadata associated with the audio signal is used to set the DRC level based on the type and intended use of the content. The DRC mode sets the amount of compression applied to the audio signal and defines the output reference level of the decoder. Such systems may be limited to two DRC level settings that are programmed into the encoder and selected by the user. For example, a Dialnorm (dialog normalization) value of -31 dB (line) is traditionally used for content played back on an AVR or on devices capable of full dynamic range, and a Dialnorm value of -20 dB (RF) is used for content played back on television sets or similar devices. This type of system allows a single audio bitstream to be used in two common but very different playback scenarios through the use of two different sets of DRC metadata. However, such systems are limited to the preset Dialnorm values and are not optimized for playback across the wide variety of playback devices and listening environments that are now possible through the advent of digital media and Internet-based streaming technology.

In current metadata-based audio encoding systems, a stream of audio data may include both audio content (e.g., one or more channels of audio content) and metadata indicative of at least one characteristic of the audio content. For example, in an AC-3 bitstream there are several audio metadata parameters that are specifically intended for use in changing the sound of the program delivered to a listening environment. One of these metadata parameters is the Dialnorm parameter, which indicates the mean loudness level of dialog occurring in an audio program (or the mean loudness of the content) and is used to determine the audio playback signal level.

During playback of a bitstream comprising a sequence of different audio program segments (each having a different Dialnorm parameter), an AC-3 decoder uses the Dialnorm parameter of each segment to perform a type of loudness processing that modifies the playback level or loudness of the segment so that the perceived loudness of the segment's dialog is at a consistent level. Each encoded audio segment (item) in a sequence of encoded audio items will (in general) have a different Dialnorm parameter, and playback may require applying different amounts of gain to different items, but the decoder scales the level of each item so that the playback level or loudness of the dialog for each item is the same or very similar.
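The per-segment scaling can be sketched as follows, assuming the decoder gain is simply the target dialog level minus the signalled Dialnorm value. The `Segment` class, the `playback_gain_db` helper, the example Dialnorm values, and the -31 dB target are illustrative assumptions rather than details given in this disclosure.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    dialnorm_db: float  # average dialog loudness signalled in the metadata


def playback_gain_db(segment: Segment, target_db: float = -31.0) -> float:
    """Gain that moves the segment's dialog to the common target level."""
    return target_db - segment.dialnorm_db


# Two hypothetical program segments with different Dialnorm values.
segments = [Segment("movie", -27.0), Segment("ad", -20.0)]
gains = {s.name: playback_gain_db(s) for s in segments}
print(gains)  # each segment receives a different gain
```

Although the two segments receive different gains, their dialog ends up at the same level (`dialnorm_db + gain` equals the target for both), which is the consistency the paragraph describes.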

In some embodiments, the Dialnorm parameter is set by a user and is not generated automatically, although there is a default Dialnorm value if no value is set by the user. For example, a content creator may make loudness measurements with a device external to an AC-3 encoder and then transfer the result (indicative of the loudness of the spoken dialog of an audio program) to the encoder to set the Dialnorm value. Thus, there is reliance on the content creator to set the Dialnorm parameter correctly.

There are several different reasons why the Dialnorm parameter in an AC-3 bitstream may be incorrect. First, each AC-3 encoder has a default Dialnorm value that is used during generation of the bitstream if a Dialnorm value is not set by the content creator. This default value may be substantially different from the actual dialog loudness level of the audio. Second, even if a content creator measures loudness and sets the Dialnorm value accordingly, a loudness measurement algorithm or meter that does not conform to the recommended loudness measurement method may have been used, resulting in an incorrect Dialnorm value. Third, even if an AC-3 bitstream has been created with the Dialnorm value measured and set correctly by the content creator, it may have been changed to an incorrect value by an intermediate module during transmission and/or storage of the bitstream. For example, it is not uncommon in television broadcast applications for AC-3 bitstreams to be decoded, modified, and then re-encoded using incorrect Dialnorm metadata information. Thus, a Dialnorm value included in an AC-3 bitstream may be incorrect or inaccurate and may therefore have a negative impact on the quality of the listening experience.

Furthermore, the Dialnorm parameter does not indicate the loudness processing state of the corresponding audio data (e.g., what type(s) of loudness processing have been performed on the audio data). Additionally, currently deployed loudness and DRC systems, such as those in Dolby Digital (DD) and Dolby Digital Plus (DD+) systems, were designed to render AV content in a consumer's living room or a movie theater. To adapt such content for playback in other environments and on other listening equipment (e.g., a mobile device), post-processing must be applied "blindly" in the playback device to adapt the AV content to that listening environment. In other words, a post-processor (or a decoder) assumes that the loudness level of the received content is at a particular level (e.g., -31 or -20 dB), and the post-processor sets the level to a predetermined fixed target level suitable for a particular device. If the assumed loudness level or the predetermined target level is incorrect, the post-processing may have the opposite of its intended effect; that is, the post-processing may make the output audio less desirable for a user.

Although the disclosed embodiments are not limited to use with an AC-3 bitstream, an E-AC-3 bitstream, or a Dolby E bitstream, for convenience such bitstreams will be discussed in conjunction with a system that includes loudness processing state metadata. Dolby, Dolby Digital, Dolby Digital Plus, and Dolby E are trademarks of Dolby Laboratories Licensing Corporation. Dolby Laboratories provides proprietary implementations of AC-3 and E-AC-3 known as Dolby Digital and Dolby Digital Plus, respectively.

The present invention provides an improved system and method for optimizing loudness and dynamic range across different playback devices.

Embodiments are directed to a method for decoding audio data by receiving a bitstream that contains metadata associated with the audio data, and analyzing the metadata in the bitstream to determine whether a loudness parameter for a first group of audio playback devices is available in the bitstream. In response to determining that the parameters are present for the first group, a processing component uses the parameters and the audio data to render audio. In response to determining that the loudness parameters are not present for the first group, the processing component analyzes one or more characteristics of the first group and determines the parameter based on the one or more characteristics. The method may further use the parameters and audio data to render audio by transmitting the parameter and audio data to a downstream module that renders the audio for playback. The parameter and audio data may also be used to render audio by rendering the audio data based on the parameter and audio data.
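The decision described in this paragraph can be sketched as a lookup with a fallback. This is a minimal sketch under stated assumptions: the metadata is modeled as a plain dictionary, and the names `loudness_params_for` and `derive_params`, the `small_speakers` characteristic, and every numeric value are hypothetical placeholders, not the bitstream syntax or the derivation rule of this disclosure.

```python
def derive_params(characteristics: dict) -> dict:
    """Fallback: derive loudness parameters from device-group characteristics.

    Placeholder rule (an assumption): groups with small speakers get a
    louder target and a narrower dynamic range than full-range groups.
    """
    if characteristics.get("small_speakers", False):
        return {"target_loudness_db": -16.0, "dynamic_range_db": 10.0}
    return {"target_loudness_db": -31.0, "dynamic_range_db": 30.0}


def loudness_params_for(metadata: dict, group: str, characteristics: dict):
    """Return (parameters, source) for the given playback device group."""
    params = metadata.get("loudness", {}).get(group)
    if params is not None:
        # Parameters for this group are present in the bitstream metadata.
        return params, "bitstream"
    # Not present: analyze the group's characteristics instead.
    return derive_params(characteristics), "derived"


metadata = {"loudness": {"tv": {"target_loudness_db": -20.0,
                                "dynamic_range_db": 20.0}}}
print(loudness_params_for(metadata, "tv", {}))
print(loudness_params_for(metadata, "mobile", {"small_speakers": True}))
```

Either way the caller ends up with one set of parameters plus the audio data, which it can render directly or pass to a downstream rendering module.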

In an embodiment, the method also comprises determining an output device that will render the received audio stream, and determining whether or not the output device belongs to the first group of audio playback devices, wherein the step of analyzing the metadata in the stream to determine whether a loudness parameter for the first group of audio playback devices is available is executed after the step of determining that the output device belongs to the first group of audio playback devices. In one embodiment, the step of determining that the output device belongs to the first group of audio playback devices comprises: receiving, from a module connected to the output device, an indication of the identity of the output device or of the identity of a group of devices that includes the output device, and determining that the output device belongs to the first group of audio playback devices based on the received indication.

Embodiments are also directed to an apparatus or system that includes processing components that perform the acts described in the above encoding method embodiments.

Embodiments are also directed to a method of decoding audio data by receiving the audio data and metadata associated with the audio data; analyzing the metadata in the bitstream to determine whether loudness information associated with loudness parameters for a first group of audio devices is available in the stream; in response to determining that the loudness information is present for the first group, determining the loudness information from the stream and transmitting the audio data and loudness information for use in rendering audio; or, if the loudness information is not present for the first group, determining loudness information associated with an output profile and transmitting the determined loudness information for the output profile for use in rendering audio. In one embodiment, the step of determining loudness information associated with an output profile includes analyzing the characteristics of the output profile and determining the parameters based on the characteristics, and the step of transmitting the determined loudness information comprises transmitting the determined parameters. The loudness information may include loudness parameters for, or characteristics of, an output profile. In an embodiment, the method may further comprise determining a low bit rate encoded stream to be transmitted, wherein the loudness information comprises characteristics for one or more output profiles.

Embodiments are also directed to an apparatus or system that includes processing components that perform the acts described in the above decoding method embodiments.

The system according to the present invention generates appropriate gains based on loudness control and dynamic range requirements in the encoder, or generates the gains in the decoder under control from the encoder through parameterization of the original gains in order to reduce the data rate. It also allows other metadata parameters (internal or external) to be used to properly control the loudness and dynamic range gains and/or profiles.

In the following drawings, like reference numbers are used to refer to like elements. Although the following figures depict various examples, the implementations described herein are not limited to the examples depicted in the figures.
FIG. 1 is a block diagram of an embodiment of an audio processing system configured to perform optimization of loudness and dynamic range, under some embodiments.
FIG. 2 is a block diagram of an encoder for use in the system of FIG. 1, under some embodiments.
FIG. 3 is a block diagram of a decoder for use in the system of FIG. 1, under some embodiments.
FIG. 4 is a diagram of an AC-3 frame, including the segments into which it is divided.
FIG. 5 is a diagram of the Synchronization Information (SI) segment of an AC-3 frame, including the segments into which it is divided.
FIG. 6 is a diagram of the Bitstream Information (BSI) segment of an AC-3 frame, including the segments into which it is divided.
FIG. 7 is a diagram of an E-AC-3 frame, including the segments into which it is divided.
FIG. 8 is a table illustrating certain frames of an encoded bitstream and the format of metadata, under some embodiments.
FIG. 9 is a table illustrating a format of loudness processing state metadata, under some embodiments.
FIG. 10 is a more detailed block diagram of the audio processing system of FIG. 1 that may be configured to perform optimization of loudness and dynamic range, under some embodiments.
FIG. 11 is a table illustrating different dynamic range requirements for a variety of playback devices and background listening environments in an example use case.
FIG. 12 is a block diagram of a dynamic range optimization system, under an embodiment.
FIG. 13 is a block diagram illustrating an interface between different profiles for a variety of different playback device classes, under an embodiment.
FIG. 14 is a table illustrating the correlation between long-term loudness and short-term dynamic range for a plurality of defined profiles, under an embodiment.
FIG. 15 illustrates examples of loudness profiles for different types of audio content, under an embodiment.
FIG. 16 is a flowchart illustrating a method of optimizing loudness and dynamic range across playback devices and applications, under an embodiment.

Definitions and nomenclature

Throughout this disclosure, including in the claims, the expression performing an operation "on" a signal or data (e.g., filtering, scaling, transforming, or applying gain to the signal or data) is used in a broad sense to denote performing the operation directly on the signal or data, or on a processed version of the signal or data (e.g., on a version of the signal that has undergone preliminary filtering or pre-processing prior to performance of the operation). The expression "system" is used in a broad sense to denote a device, system, or subsystem. For example, a subsystem that implements a decoder may be referred to as a decoder system, and a system including such a subsystem (e.g., a system that generates X output signals in response to multiple inputs, in which the subsystem generates M of the inputs and the other X-M inputs are received from an external source) may also be referred to as a decoder system. The term "processor" is used in a broad sense to denote a system or device programmable or otherwise configurable (e.g., with software or firmware) to perform operations on data (e.g., audio, or video or other image data). Examples of processors include a field-programmable gate array (or other configurable integrated circuit or chip set), a digital signal processor programmed and/or otherwise configured to perform pipelined processing on audio or other sound data, a programmable general-purpose processor or computer, and a programmable microprocessor chip or chip set.

The expressions "audio processor" and "audio processing unit" are used interchangeably and, in a broad sense, denote a system configured to process audio data. Examples of audio processing units include, but are not limited to, encoders (e.g., transcoders), decoders, codecs, pre-processing systems, post-processing systems, and bitstream processing systems (sometimes referred to as bitstream processing tools). The expression "processing state metadata" (e.g., as in the expression "loudness processing state metadata") refers to data that is separate and different from the corresponding audio data (the audio content of an audio data stream that also includes processing state metadata). Processing state metadata is associated with audio data, indicates the loudness processing state of the corresponding audio data (e.g., what type(s) of processing have already been performed on the audio data), and optionally also indicates at least one feature or characteristic of the audio data. In some embodiments, the association of the processing state metadata with the audio data is time-synchronous. Thus, the present (most recently received or updated) processing state metadata indicates that the corresponding audio data contemporaneously comprises the results of the indicated type(s) of audio data processing. In some cases, processing state metadata may include processing history and/or some or all of the parameters that are used in and/or derived from the indicated types of processing. Additionally, processing state metadata may include at least one feature or characteristic of the corresponding audio data that has been computed or extracted from the audio data. Processing state metadata may also include other metadata that is not related to or derived from any processing of the corresponding audio data. For example, third-party data, tracking information, identifiers, proprietary or standard information, user annotation data, user preference data, and so on may be added by a particular audio processing unit to pass on to other audio processing units.

The expression "loudness processing state metadata" (or "LPSM") denotes processing state metadata indicative of the loudness processing state of corresponding audio data (e.g., what type(s) of loudness processing have been performed on the audio data) and optionally also at least one feature or characteristic (e.g., loudness) of the corresponding audio data. Loudness processing state metadata may include data (e.g., other metadata) that is not (i.e., when considered alone) loudness processing state metadata. The term "couples" or "coupled" is used to mean either a direct or an indirect connection.

Systems and methods are described for an audio encoder/decoder that non-destructively normalizes the loudness and dynamic range of audio across various devices that require or use different target loudness values and have differing dynamic range capabilities. Methods and functional components according to some embodiments send information about the audio content from the encoder to the decoder for one or more device profiles. A device profile specifies the desired target loudness and dynamic range for one or more devices. The system is extensible, so that new device profiles with different "nominal" loudness targets can be supported.
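One way to picture an extensible set of device profiles is a small registry from which per-profile normalization gains are computed and carried as metadata, leaving the audio itself untouched. The sketch below is an assumption-laden illustration: `DeviceProfile`, `register`, and `normalization_gains` are hypothetical names, the -31 dB and -20 dB targets echo the line and RF modes mentioned in the background, and the "mobile" profile values are invented placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    name: str
    target_loudness_db: float  # desired "nominal" loudness for the devices
    dynamic_range_db: float    # desired dynamic range for the devices


PROFILES: dict = {}


def register(profile: DeviceProfile) -> None:
    """Add (or replace) a device profile; new profiles need no other changes."""
    PROFILES[profile.name] = profile


def normalization_gains(measured_loudness_db: float) -> dict:
    """Per-profile gain to send as metadata; the audio samples are not modified."""
    return {name: p.target_loudness_db - measured_loudness_db
            for name, p in PROFILES.items()}


register(DeviceProfile("avr", -31.0, 30.0))
register(DeviceProfile("tv", -20.0, 20.0))
gains = normalization_gains(-24.0)      # program measured at -24 dB (assumed)

# Extending the system: a new profile with a different nominal target.
register(DeviceProfile("mobile", -16.0, 10.0))
extended_gains = normalization_gains(-24.0)
print(gains, extended_gains)
```

Because only gains travel with the stream, each decoder can apply the entry matching its own profile and the original audio remains recoverable, which is the sense in which the normalization is non-destructive.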

In an embodiment, the system generates appropriate gains based on loudness control and dynamic range requirements in the encoder, or generates the gains in the decoder under control from the encoder through parameterization of the original gains in order to reduce the data rate. The dynamic range system includes two mechanisms for implementing loudness control: an artistic dynamic range profile that gives content creators control over how the audio will be played back, and a separate protection mechanism to ensure that overloading does not occur for the various playback profiles. The system is also configured to allow other metadata parameters (internal or external) to be used to properly control the loudness and dynamic range gains and/or profiles. The decoder is configured to support an n-channel auxiliary input that will leverage decoder-side loudness and dynamic range settings/processing.

In some embodiments, loudness processing state metadata (LPSM) is embedded in one or more reserved fields (or slots) of metadata segments of an audio bitstream that also includes audio data in other segments (audio data segments). For example, at least one segment of each frame of the bitstream includes LPSM, and at least one other segment of the frame includes the corresponding audio data (i.e., the audio data whose loudness processing state and loudness are indicated by the LPSM). In some embodiments, the data volume of the LPSM may be small enough to be carried without affecting the bit rate allocated to carry the audio data.

Communicating loudness processing state metadata in an audio data processing chain is particularly useful when two or more audio processing units need to work in tandem with one another throughout the processing chain (or content lifecycle). Without loudness processing state metadata in the audio bitstream, media processing problems such as quality, level, and spatial degradations may occur, for example, when two or more audio codecs are used in the chain and single-ended volume leveling is applied more than once during the bitstream's journey to a media consuming device (or a rendering point of the audio content of the bitstream).

Loudness and dynamic range metadata processing system

FIG. 1 is a block diagram of an embodiment of an audio processing system that may be configured to perform optimization of loudness and dynamic range, in some embodiments using certain metadata processing (e.g., pre-processing and post-processing) components. FIG. 1 illustrates an example audio processing chain (an audio data processing system) in which one or more elements of the system may be configured in accordance with an embodiment of the present invention. System 10 of FIG. 1 includes the following elements, coupled together as shown: a pre-processing unit 12, an encoder 14, a signal analysis and metadata correction unit 16, a transcoder 18, a decoder 20, and a post-processing unit 22. In variations on the system shown, one or more of the elements are omitted, or additional audio data processing units are included. For example, in one embodiment, the post-processing unit 22 is part of the decoder 20 instead of being a separate unit.

In some implementations, the pre-processing unit of FIG. 1 is configured to accept PCM (time-domain) samples comprising audio content as an input 11, and to output processed PCM samples. The encoder 14 may be configured to accept the PCM samples as input and to output an encoded (e.g., compressed) audio bitstream indicative of the audio content. The data of the bitstream that are indicative of the audio content are sometimes referred to herein as "audio data." In one embodiment, the audio bitstream output from the encoder includes loudness processing state metadata (and optionally also other metadata) as well as audio data.

The signal analysis and metadata correction unit 16 may accept one or more encoded audio bitstreams as input and determine (e.g., verify), by performing signal analysis, whether the processing state metadata in each encoded audio bitstream is correct. In some embodiments, the verification may be performed by a state verifier component, such as element 102 shown in FIG. 2, and one such verification technique is described below in the context of state verifier 102. In some embodiments, unit 16 is included in the encoder, and verification is done by either unit 16 or verifier 102. If the signal analysis and metadata correction unit finds that included metadata is invalid, the metadata correction unit 16 performs signal analysis to determine the correct value(s) and replaces the incorrect value(s) with the determined correct value(s). Thus, each encoded audio bitstream output from the signal analysis and metadata correction unit may include corrected processing state metadata as well as encoded audio data. The signal analysis and metadata correction unit 16 may be part of the pre-processing unit 12, the encoder 14, the transcoder 18, the decoder 20, or the post-processing unit 22. Alternatively, the signal analysis and metadata correction unit 16 may be a separate unit or part of another unit in the audio processing chain.
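The verify-then-correct behavior of unit 16 can be illustrated with a minimal sketch. Everything named here is hypothetical: the `Lpsm` record, the `measure_level_db` helper (a plain mean-square level, not a standards-compliant loudness meter), and the 0.5 dB tolerance are assumptions made for illustration and do not come from the specification.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Lpsm:
    """Hypothetical, heavily simplified stand-in for loudness metadata."""
    dialog_loudness: float  # loudness value claimed by the metadata, in dB


def measure_level_db(samples):
    """Placeholder signal analysis: mean-square level in dB (assumption)."""
    if not samples:
        raise ValueError("need at least one sample")
    power = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(power, 1e-12))


def verify_and_correct(lpsm, samples, tolerance_db=0.5):
    """Return (metadata, was_corrected).

    Metadata that agrees with the measured signal is kept unchanged;
    otherwise the incorrect value is replaced by the measured one.
    """
    measured = measure_level_db(samples)
    if abs(measured - lpsm.dialog_loudness) <= tolerance_db:
        return lpsm, False
    return replace(lpsm, dialog_loudness=measured), True
```

A caller would run `verify_and_correct` on each frame and write the returned metadata back into the outgoing bitstream, so downstream units always see either the original valid values or corrected ones.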

The transcoder 18 may accept encoded audio bitstreams as input and output modified (e.g., differently encoded) audio bitstreams in response (e.g., by decoding an input stream and re-encoding the decoded stream in a different encoding format). The audio bitstream output from the transcoder includes loudness processing state metadata (and optionally also other metadata) as well as encoded audio data. The metadata may be included in the bitstream.

The decoder 20 of FIG. 1 may accept encoded (e.g., compressed) audio bitstreams as input and output (in response) streams of decoded PCM audio samples. In one embodiment, the output of the decoder is or includes any of the following: a stream of audio samples and a corresponding stream of loudness processing state metadata (and optionally also other metadata) extracted from an input encoded bitstream; a stream of audio samples and a corresponding stream of control bits determined from loudness processing state metadata (and optionally also other metadata) extracted from an input encoded bitstream; or a stream of audio samples without a corresponding stream of processing state metadata or of control bits determined from processing state metadata. In this last case, the decoder may extract loudness processing state metadata (and/or other metadata) from the input encoded bitstream and perform at least one operation (e.g., verification) on the extracted metadata, even though it does not output the extracted metadata or control bits determined from it.

By configuring the post-processing unit of FIG. 1 in accordance with an embodiment of the present invention, the post-processing unit 22 is configured to accept a stream of decoded PCM audio samples and to perform post-processing on it (e.g., volume leveling of the audio content) using loudness processing state metadata (and optionally also other metadata) received with the samples, or control bits (determined by the decoder from loudness processing state metadata and optionally also other metadata) received with the samples. The post-processing unit 22 is optionally also configured to render the post-processed audio content for playback by one or more speakers. These speakers may be embodied in any of a variety of different listening devices or items of playback equipment, such as computers, televisions, stereo systems (home or cinema), mobile phones, and other portable playback devices. The speakers may be of any appropriate size and power rating, and may be provided in the form of free-standing drivers, speaker enclosures, surround-sound systems, soundbars, headphones, earbuds, and so on.

Some embodiments provide an enhanced audio processing chain in which audio processing units (e.g., encoders, decoders, transcoders, and pre- and post-processing units) adapt the processing they apply to audio data according to the contemporaneous state of the media data, as indicated by the loudness processing state metadata received by each audio processing unit. The audio data input 11 to any audio processing unit of the system 10 (e.g., the encoder or transcoder of FIG. 1) may include loudness processing state metadata (and optionally also other metadata) as well as audio data (e.g., encoded audio data). According to some embodiments, this metadata may have been included in the input audio by another element or another source. The processing unit that receives the input audio (with metadata) may be configured to perform at least one operation on the metadata (e.g., verification) or in response to the metadata (e.g., adaptive processing of the input audio), and optionally also to include in its output audio the metadata, a processed version of the metadata, or control bits determined from the metadata.

An embodiment of the audio processing unit (or audio processor) is configured to perform adaptive processing of audio data based on the state of the audio data as indicated by the loudness processing state metadata corresponding to the audio data. In some embodiments, the adaptive processing is (or includes) loudness processing if the metadata indicates that loudness processing, or processing similar to it, has not yet been performed on the audio data, but is not (and does not include) loudness processing if the metadata indicates that such loudness processing, or processing similar to it, has already been performed on the audio data. In some embodiments, the adaptive processing is or includes metadata verification (e.g., performed in a metadata verification sub-unit) to ensure that the audio processing unit performs other adaptive processing of the audio data based on the state of the audio data as indicated by the loudness processing state metadata. In some embodiments, the verification determines the reliability of the loudness processing state metadata associated with (e.g., included in a bitstream with) the audio data. For example, if the metadata is verified to be reliable, results from a previously performed type of audio processing may be reused, and additional performance of the same type of audio processing may be avoided. On the other hand, if the metadata is found to have been tampered with (or to be otherwise unreliable), the type of media processing purportedly performed previously (as indicated by the unreliable metadata) may be repeated by the audio processing unit, and/or other processing may be performed by the audio processing unit on the metadata and/or the audio data. If the unit determines that the processing state metadata is valid (e.g., based on a match of an extracted cryptographic value and a reference cryptographic value), the audio processing unit may also be configured to signal to other audio processing units downstream in an enhanced media processing chain that the loudness processing state metadata (e.g., present in a media bitstream) is valid.
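The adaptive behavior described above amounts to a small decision rule: skip loudness processing only when the metadata is both trusted and reports that the processing was already done. The following sketch shows that rule under stated assumptions; the `ProcessingState` record and the `loudness_processing` callable are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ProcessingState:
    """Hypothetical summary of what the metadata says about the audio."""
    loudness_processed: bool  # metadata claims loudness processing was done
    trusted: bool             # outcome of metadata verification


def adaptive_process(
    samples: List[float],
    state: ProcessingState,
    loudness_processing: Callable[[List[float]], List[float]],
) -> Tuple[List[float], ProcessingState]:
    """Apply loudness processing only when it is still needed."""
    if state.trusted and state.loudness_processed:
        # Reliable metadata says the work is done: reuse the earlier result.
        return list(samples), state
    # Either the processing was never applied or the claim cannot be trusted,
    # so perform (or repeat) it and report the new state downstream.
    processed = loudness_processing(samples)
    return processed, ProcessingState(loudness_processed=True, trusted=True)
```

Returning the updated state alongside the audio mirrors the idea that a unit can signal to downstream units that the metadata it emits is valid.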

For the embodiment of FIG. 1, the pre-processing component 12 may be part of the encoder 14, and the post-processing component 22 may be part of the decoder 20. Alternatively, the pre-processing component 12 may be embodied in a functional component that is separate from the encoder 14. Similarly, the post-processing component 22 may be embodied in a functional component that is separate from the decoder 20.

FIG. 2 is a block diagram of an encoder 100 that may be used in conjunction with the system 10 of FIG. 1. Any of the components or elements of encoder 100 may be implemented as one or more processors and/or one or more circuits (e.g., ASICs, FPGAs, or other integrated circuits), in hardware, software, or a combination of hardware and software. Encoder 100 includes a frame buffer 110, a parser 111, a decoder 101, an audio state verifier 102, a loudness processing stage 103, an audio stream selection stage 104, an encoder 105, a stuffer/formatter stage 107, a metadata generation stage 106, a dialog loudness measurement subsystem 108, and a frame buffer 109, connected as shown. Optionally, encoder 100 also includes other processing elements (not shown). Encoder 100 (which is a transcoder) is configured to convert an input audio bitstream (which may be, for example, one of an AC-3 bitstream, an E-AC-3 bitstream, or a Dolby E bitstream) into an encoded output audio bitstream (which may be, for example, another one of an AC-3 bitstream, an E-AC-3 bitstream, or a Dolby E bitstream), including by performing adaptive and automated loudness processing using the loudness processing state metadata included in the input bitstream. For example, encoder 100 may be configured to convert an input Dolby E bitstream (a format typically used in production and broadcast facilities but not in consumer devices that receive broadcast audio programs) into an encoded output audio bitstream in AC-3 or E-AC-3 format (suitable for broadcasting to consumer devices).

The system of FIG. 2 also includes an encoded audio delivery subsystem 150 (which stores and/or delivers the encoded bitstreams output from encoder 100) and a decoder 152. An encoded audio bitstream output from encoder 100 may be stored by subsystem 150 (e.g., in the form of a DVD or Blu-ray Disc), transmitted by subsystem 150 (which may implement a transmission link or network), or both stored and transmitted by subsystem 150. Decoder 152 is configured to decode an encoded audio bitstream (generated by encoder 100) that it receives via subsystem 150, including by extracting loudness processing state metadata (LPSM) from each frame of the bitstream and generating decoded audio data. In one embodiment, decoder 152 is configured to perform adaptive loudness processing on the decoded audio data using the LPSM, and/or to forward the decoded audio data and LPSM to a pre-processor configured to perform adaptive loudness processing on the decoded audio data using the LPSM. Optionally, decoder 152 includes a buffer that stores (e.g., in a non-transitory manner) the encoded audio bitstream received from subsystem 150.

Various implementations of encoder 100 and decoder 152 are configured to perform the different embodiments described herein. Frame buffer 110 is a buffer memory coupled to receive an encoded input audio bitstream. In operation, buffer 110 stores (e.g., in a non-transitory manner) at least one frame of the encoded audio bitstream, and a sequence of the frames of the encoded audio bitstream is asserted from buffer 110 to parser 111. Parser 111 is coupled and configured to extract loudness processing state metadata (LPSM) and other metadata from each frame of the encoded input audio, to assert at least the LPSM to audio state verifier 102, loudness processing stage 103, stage 106, and subsystem 108, to extract audio data from the encoded input audio, and to assert the audio data to decoder 101. Decoder 101 of encoder 100 is configured to decode the audio data to generate decoded audio data, and to assert the decoded audio data to loudness processing stage 103, audio stream selection stage 104, subsystem 108, and optionally also to state verifier 102.

State verifier 102 is configured to authenticate and verify the LPSM (and optionally other metadata) asserted to it. In some embodiments, the LPSM is (or is included in) a data block that has been included in the input bitstream (e.g., in accordance with an embodiment of the present invention). The block may include a cryptographic hash (a hash-based message authentication code, or "HMAC") for processing the LPSM (and optionally also other metadata) and/or the underlying audio data (provided from decoder 101 to verifier 102). The data block may be digitally signed in these embodiments, so that a downstream audio processing unit may relatively easily authenticate and verify the processing state metadata.

For example, the HMAC is used to generate a digest, and the protection value(s) included in the inventive bitstream may include the digest. The digest may be generated as follows for an AC-3 frame: (1) After the AC-3 data and LPSM are encoded, the frame data bytes (concatenated frame_data #1 and frame_data #2) and the LPSM data bytes are used as input for the hashing function HMAC. Other data that may be present inside an auxdata field are not taken into consideration for calculating the digest. Such other data may be bytes that belong neither to the AC-3 data nor to the LPSM data. Protection bits included in the LPSM may not be considered for calculating the HMAC digest. (2) After the digest is calculated, it is written into the bitstream in a field reserved for protection bits. (3) The last step in generating the complete AC-3 frame is the calculation of the CRC check. This is written at the very end of the frame, and all data belonging to the frame, including the LPSM bits, is taken into consideration.
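Step (1) of the digest procedure can be sketched with Python's standard `hmac` module. The text does not say which hash function or key provisioning is used, so SHA-256 and the `key` argument below are assumptions, and the function names are hypothetical. The byte arguments stand for frame_data #1, frame_data #2, and the LPSM bytes with their protection bits excluded; auxdata bytes that belong to neither are simply never passed in.

```python
import hashlib
import hmac


def lpsm_digest(key: bytes, frame_data_1: bytes, frame_data_2: bytes,
                lpsm_bytes: bytes) -> bytes:
    """Compute an HMAC digest over the frame data and LPSM bytes.

    Assumption: HMAC-SHA256. The input is the concatenated frame data
    followed by the LPSM data bytes, as in step (1) above.
    """
    message = frame_data_1 + frame_data_2 + lpsm_bytes
    return hmac.new(key, message, hashlib.sha256).digest()


def digest_matches(key: bytes, frame_data_1: bytes, frame_data_2: bytes,
                   lpsm_bytes: bytes, stored_digest: bytes) -> bool:
    """Recompute the digest and compare it with the one read from the frame."""
    expected = lpsm_digest(key, frame_data_1, frame_data_2, lpsm_bytes)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, stored_digest)
```

A verifier in a downstream unit would call `digest_matches` with the digest read from the field reserved for protection bits; any change to the audio or LPSM bytes after the digest was written makes the comparison fail.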

Other cryptographic methods, including but not limited to any of one or more non-HMAC cryptographic methods, may be used for verification of LPSM (e.g., in verifier 102) to ensure secure transmission and receipt of the LPSM and/or the underlying audio data. For example, verification (using such a cryptographic method) can be performed in each audio processing unit that receives an embodiment of the audio bitstream, to determine whether the loudness processing state metadata and corresponding audio data included in the bitstream have undergone (and/or have resulted from) specific loudness processing (as indicated by the metadata) and have not been modified after that specific loudness processing was performed.

State verifier 102 asserts control data to audio stream selection stage 104, metadata generator 106, and dialog loudness measurement subsystem 108 to indicate the results of the verification operation. In response to the control data, stage 104 may select (and pass through to encoder 105) either: (1) the adaptively processed output of loudness processing stage 103 (e.g., when the LPSM indicates that the audio data output from decoder 101 has not undergone a specific type of loudness processing, and the control bits from verifier 102 indicate that the LPSM is valid); or (2) the audio data output from decoder 101 (e.g., when the LPSM indicates that the audio data output from decoder 101 has already undergone the specific type of loudness processing that would be performed by stage 103, and the control bits from verifier 102 indicate that the LPSM is valid). In an embodiment, loudness processing stage 103 corrects the loudness to a specified target and loudness range.
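The choice made by selection stage 104 can be written as a short function. This is a sketch, not the patented implementation: the function and parameter names are hypothetical, and routing audio with invalid LPSM through the loudness stage is an assumption (the text lists only the two valid-LPSM cases, and elsewhere says that processing may be repeated when metadata is unreliable).

```python
from typing import Callable, List


def select_stream(
    lpsm_valid: bool,
    already_processed: bool,
    decoded_audio: List[float],
    loudness_stage: Callable[[List[float]], List[float]],
) -> List[float]:
    """Pick the audio that is passed on to the encoder.

    Case (2): valid LPSM says the loudness processing was already applied,
    so the decoder output is passed through untouched.
    Case (1): valid LPSM says it was not applied, so the adaptively
    processed output is used. Invalid LPSM is treated like case (1)
    here, which is an assumption of this sketch.
    """
    if lpsm_valid and already_processed:
        return decoded_audio
    return loudness_stage(decoded_audio)
```

Passing the loudness stage in as a callable keeps the selection logic independent of how the stage corrects loudness to its target and range.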

Stage 103 of encoder 100 is configured to perform adaptive loudness processing on the decoded audio data output from decoder 101, based on one or more audio data characteristics indicated by the LPSM extracted by decoder 101. Stage 103 may be an adaptive transform-domain real-time loudness and dynamic range control processor. Stage 103 may receive user input (e.g., user target loudness/dynamic range values or dialnorm values), other metadata input (e.g., one or more types of third-party data, tracking information, identifiers, proprietary or standard information, user annotation data, user preference data, etc.), and/or other input (e.g., from a fingerprinting process), and may use such input to process the decoded audio data output from decoder 101.

Dialog loudness measurement subsystem 108 may operate to determine the loudness of segments of the decoded audio (from decoder 101) that are indicative of dialog (or other speech), e.g., using the LPSM (and/or other metadata) extracted by decoder 101, when the control bits from verifier 102 indicate that the LPSM is invalid. Operation of dialog loudness measurement subsystem 108 may be disabled when the LPSM indicates the previously determined loudness of dialog (or other speech) segments of the decoded audio (from decoder 101) and the control bits from verifier 102 indicate that the LPSM is valid.

Useful tools (e.g., the Dolby LM100 loudness meter) exist for measuring the level of dialog in audio content conveniently and easily. Some embodiments of the APU (e.g., stage 108 of encoder 100) are implemented to include (or to perform the functions of) such a tool to measure the mean dialog loudness of the audio content of an audio bitstream (e.g., a decoded AC-3 bitstream asserted to stage 108 from decoder 101 of encoder 100). If stage 108 is implemented to measure the true mean dialog loudness of audio data, the measurement may include a step of isolating segments of the audio content that predominantly contain speech. The audio segments that are predominantly speech are then processed in accordance with a loudness measurement algorithm. For audio data decoded from an AC-3 bitstream, this algorithm may be a standard K-weighted loudness measure (in accordance with international standard ITU-R BS.1770). Alternatively, other loudness measures may be used (e.g., those based on psychoacoustic models of loudness).

The isolation of speech segments is not essential for measuring the mean dialog loudness of audio data. However, it improves the accuracy of the measurement and provides more satisfactory results from a listener's perspective. Because not all audio content contains dialog (speech), the loudness measure of the whole audio content may provide a sufficient approximation of the dialog level of the audio, had speech been present.

Metadata generator 106 generates metadata to be included by stage 107 in the encoded bitstream to be output from encoder 100. Metadata generator 106 may pass through to stage 107 the LPSM (and/or other metadata) extracted by decoder 101 (e.g., when control bits from verifier 102 indicate that the LPSM and/or other metadata are valid), or generate new LPSM (and/or other metadata) and assert the new metadata to stage 107 (e.g., when control bits from verifier 102 indicate that the LPSM and/or other metadata extracted by decoder 101 are invalid), or assert to stage 107 a combination of metadata extracted by decoder 101 and newly generated metadata. Metadata generator 106 may include loudness data generated by subsystem 108, and at least one value indicating the type of loudness processing performed by subsystem 108, in the LPSM that it asserts to stage 107 for inclusion in the encoded bitstream to be output from encoder 100. Metadata generator 106 may generate protection bits (which may consist of or include a hash-based message authentication code or "HMAC") useful for at least one of decryption, authentication, or validation of the LPSM (and optionally also other metadata) to be included in the encoded bitstream and/or the underlying audio data to be included in the encoded bitstream. Metadata generator 106 may provide such protection bits to stage 107 for inclusion in the encoded bitstream.
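The pass-through versus regenerate choice made by metadata generator 106 can be illustrated with a minimal Python sketch. The `Lpsm` class, its two fields, and the function name `select_metadata` are hypothetical names introduced only for illustration; the "combination" case mentioned above is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lpsm:
    # Both fields are hypothetical stand-ins for the real LPSM contents.
    dialog_loudness: float   # measured dialog loudness
    processing_type: str     # label for the loudness processing performed


def select_metadata(extracted: Optional[Lpsm], is_valid: bool, measured: Lpsm) -> Lpsm:
    """Return the LPSM that would be asserted to the stuffer/formatter stage.

    extracted: LPSM recovered from the incoming bitstream, if any.
    is_valid:  result of the verifier's check on the extracted LPSM.
    measured:  LPSM newly built from the loudness measurement subsystem.
    """
    if extracted is not None and is_valid:
        return extracted     # pass the upstream metadata through unchanged
    return measured          # otherwise fall back to newly generated metadata
```

In this sketch the upstream metadata is trusted only when the verifier has marked it valid; in every other case the freshly measured values are used.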

In one embodiment, dialog loudness measurement subsystem 108 processes the audio data output from decoder 101 to generate, in response, loudness values (e.g., gated and ungated dialog loudness values) and dynamic range values. In response to these values, metadata generator 106 may generate loudness processing state metadata (LPSM) for inclusion (by stuffer/formatter 107) in the encoded bitstream to be output from encoder 100. In an embodiment, loudness may be calculated based on techniques specified by the ITU-R BS.1770-1 and ITU-R BS.1770-2 standards, or other similar loudness measurement standards. Gated loudness may be dialog-gated loudness or relative-gated loudness, or a combination of these gated loudness types, and the system may employ appropriate gating blocks depending on application requirements and system constraints.

Additionally, optionally, or alternatively, subsystems 106 and/or 108 of encoder 100 may perform additional analysis of the audio data to generate metadata indicative of at least one characteristic of the audio data for inclusion in the encoded bitstream to be output from stage 107. Encoder 105 encodes (e.g., by performing compression thereon) the audio data output from selection stage 104, and asserts the encoded audio to stage 107 for inclusion in the encoded bitstream to be output from stage 107.

Stage 107 multiplexes the encoded audio from encoder 105 and the metadata (including LPSM) from generator 106 to generate the encoded bitstream to be output from stage 107, such that the encoded bitstream has a format as specified by an embodiment. Frame buffer 109 is a buffer memory that stores (e.g., in a non-transitory manner) at least one frame of the encoded audio bitstream output from stage 107, and a sequence of the frames of the encoded audio bitstream is then asserted from buffer 109 as output from encoder 100 to delivery system 150.

The LPSM generated by metadata generator 106 and included in the encoded bitstream by stage 107 is indicative of the loudness processing state of the corresponding audio data (e.g., what type(s) of loudness processing have been performed on the audio data) and the loudness of the corresponding audio data (e.g., measured dialog loudness, gated and/or ungated loudness, and/or dynamic range). Herein, "gating" of loudness and/or level measurements performed on the audio data refers to a specific level or loudness threshold, where computed value(s) that exceed the threshold are included in the final measurement (e.g., ignoring short-term loudness values below -60 dBFS in the final measured values). Gating on an absolute value refers to a fixed level or loudness, whereas gating on a relative value refers to a value that depends on a current "ungated" measurement value.
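The distinction between absolute and relative gating can be sketched in Python as follows. This is a simplified illustration, not a conformant ITU-R BS.1770 implementation: it assumes per-block loudness values are already available in dB, takes the -60 dB absolute threshold from the example above, and assumes a relative offset of -10 dB. The function names and both default parameters are assumptions.

```python
import math


def mean_loudness(blocks_db):
    """Energy-average a non-empty list of per-block loudness values (in dB)."""
    power = sum(10 ** (b / 10) for b in blocks_db) / len(blocks_db)
    return 10 * math.log10(power)


def gated_loudness(blocks_db, abs_threshold_db=-60.0, rel_offset_db=-10.0):
    """Return the gated mean loudness, or None if every block is gated out."""
    # Absolute gate: a fixed threshold, independent of the signal.
    kept = [b for b in blocks_db if b > abs_threshold_db]
    if not kept:
        return None
    # Relative gate: the threshold depends on the measurement taken
    # before this gate is applied.
    rel_threshold = mean_loudness(kept) + rel_offset_db
    kept = [b for b in kept if b > rel_threshold]
    return mean_loudness(kept) if kept else None
```

With these assumed defaults, a very quiet block (for example -80 dB) is dropped by the absolute gate, while a moderately quiet block (for example -45 dB among -20 dB blocks) is dropped only by the relative gate.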

In some implementations of encoder 100, the encoded bitstream buffered in memory 109 (and output to delivery system 150) is an AC-3 bitstream or an E-AC-3 bitstream, and comprises audio data segments (e.g., the AB0 to AB5 segments of the frame shown in FIG. 4) and metadata segments, where the audio data segments are indicative of audio data, and each of at least some of the metadata segments includes loudness processing state metadata (LPSM). Stage 107 inserts LPSM into the bitstream in the following format. Each of the metadata segments which includes LPSM is included in an "addbsi" field of the bitstream information ("BSI") segment of a frame of the bitstream, or in an auxdata field (e.g., the AUX segment shown in FIG. 4) at the end of a frame of the bitstream.

A frame of the bitstream includes one or two metadata segments, each of which includes LPSM, and if the frame includes two metadata segments, one is present in the addbsi field of the frame and the other in the AUX field of the frame. Each metadata segment including LPSM includes an LPSM payload (or container) segment having the following format: a header (e.g., including a syncword identifying the start of the LPSM payload, followed by at least one identification value, e.g., the LPSM format version, length, period, count, and substream association values indicated in Table 2 below); and, after the header, at least one dialog indication value (e.g., the parameter "Dialog channel(s)" of Table 2) indicating whether the corresponding audio data indicates dialog or does not indicate dialog (e.g., which channels of the corresponding audio data indicate dialog); at least one loudness regulation compliance value (e.g., the parameter "Loudness Regulation Type" of Table 2) indicating whether the corresponding audio data complies with an indicated set of loudness regulations; at least one loudness processing value (e.g., one or more of the parameters "Dialog gated Loudness Correction flag" and "Loudness Correction Type" of Table 2) indicating at least one type of loudness processing which has been performed on the corresponding audio data; and at least one loudness value (e.g., one or more of the parameters "ITU Relative Gated Loudness," "ITU Speech Gated Loudness," "ITU (EBU 3341) Short-term 3s Loudness," and "True Peak" of Table 2) indicating at least one loudness (e.g., peak or average loudness) characteristic of the corresponding audio data.

In some implementations, each of the metadata segments inserted by stage 107 into an "addbsi" field or an auxdata field of a frame of the bitstream has the following format: a core header (e.g., including a syncword identifying the start of the metadata segment, followed by identification values, e.g., the core element version, length, and period, extended element count, and substream association values indicated in Table 1 below); after the core header, at least one protection value (e.g., the HMAC digest and audio fingerprint values of Table 1) useful for at least one of decryption, authentication, or validation of at least one of the loudness processing state metadata or the corresponding audio data; and, also after the core header, if the metadata segment includes LPSM, LPSM payload identification ("ID") and LPSM payload size values which identify the following metadata as an LPSM payload and indicate the size of the LPSM payload.

The LPSM payload (or container) segment (e.g., having the above-specified format) follows the LPSM payload ID and LPSM payload size values.

In some embodiments, each of the metadata segments in the auxdata field (or "addbsi" field) of a frame has three levels of structure: a high-level structure, including a flag indicating whether the auxdata (or addbsi) field includes metadata, at least one ID value indicating what type(s) of metadata are present, and optionally also a value indicating how many bits of metadata (e.g., of each type) are present (if metadata is present), where one type of metadata that could be present is LPSM and another type that could be present is media research metadata (e.g., Nielsen Media Research metadata); an intermediate-level structure, comprising a core element for each identified type of metadata (e.g., a core header, protection values, and LPSM payload ID and LPSM payload size values, as mentioned above, for each identified type of metadata); and a low-level structure, comprising each payload for one core element (e.g., an LPSM payload, if one is identified by the core element as being present, and/or a metadata payload of another type, if one is identified by the core element as being present).

The data values in such a three-level structure can be nested. For example, the protection value(s) for an LPSM payload and/or another metadata payload identified by a core element can be included after each payload identified by the core element (and thus after the core header of the core element). In one example, a core header could identify an LPSM payload and another metadata payload; the payload ID and payload size values for the first payload (e.g., the LPSM payload) could follow the core header; the first payload itself could follow the ID and size values; the payload ID and payload size value for the second payload could follow the first payload; the second payload itself could follow these ID and size values; and protection bits for both payloads (or for the core element values and both payloads) could follow the last payload.
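The ordering in this example (core header, then an ID/size pair before each payload, then protection bits after the last payload) can be sketched as a small Python parser. The layout used here is purely hypothetical: a one-byte payload count stands in for the core header, and the ID and size values are one byte each. The actual field widths are defined by Tables 1 and 2, which this sketch does not attempt to reproduce.

```python
def parse_segment(data: bytes):
    """Parse [count][id][size][payload]...[protection bytes].

    Returns (payloads, protection), where payloads is a list of
    (payload_id, payload_bytes) tuples in bitstream order.
    """
    if not data:
        raise ValueError("empty segment")
    count = data[0]              # hypothetical stand-in for the core header
    pos = 1
    payloads = []
    for _ in range(count):
        if pos + 2 > len(data):
            raise ValueError("truncated payload header")
        payload_id, size = data[pos], data[pos + 1]
        pos += 2
        if pos + size > len(data):
            raise ValueError("truncated payload")
        payloads.append((payload_id, data[pos:pos + size]))
        pos += size
    protection = data[pos:]      # protection bits follow the last payload
    return payloads, protection
```

For instance, a segment built as `bytes([2, 1, 3]) + b"abc" + bytes([2, 2]) + b"xy" + b"\xaa\xbb"` carries two payloads followed by two protection bytes.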

In some embodiments, if decoder 101 receives an audio bitstream generated in accordance with an embodiment of the invention with a cryptographic hash, the decoder is configured to parse and retrieve the cryptographic hash from a data block determined from the bitstream, said block comprising loudness processing state metadata (LPSM). Verifier 102 may use the cryptographic hash to validate the received bitstream and/or associated metadata. For example, verifier 102 may find the LPSM to be valid based on a match between a reference cryptographic hash and the cryptographic hash retrieved from the data block, and may then disable operation of processor 103 on the corresponding audio data and cause selection stage 104 to pass through the (unchanged) audio data. Additionally, optionally, or alternatively, other types of cryptographic techniques may be used in place of a method based on a cryptographic hash.

Encoder 100 of FIG. 2 may determine (in response to LPSM extracted by decoder 101) that a post/pre-processing unit has performed a type of loudness processing on the audio data to be encoded (in elements 105, 106, and 107), and hence may create (in generator 106) loudness processing state metadata that includes the specific parameters used in and/or derived from the previously performed loudness processing. In some implementations, encoder 100 may create (and include in the encoded bitstream output therefrom) processing state metadata indicative of the processing history on the audio content, so long as the encoder is aware of the types of processing that have been performed on the audio content.

FIG. 3 is a block diagram of a decoder that may be used in conjunction with system 10 of FIG. 1. Any of the components or elements of decoder 200 and post-processor 300 may be implemented as one or more processes and/or one or more circuits (e.g., ASICs, FPGAs, or other integrated circuits), in hardware, software, or a combination of hardware and software. Decoder 200 comprises frame buffer 201, parser 205, audio decoder 202, audio state validation stage (verifier) 203, and control bit generation stage 204, connected as shown. Decoder 200 may include other processing elements (not shown). Frame buffer 201 (a buffer memory) stores (e.g., in a non-transitory manner) at least one frame of the encoded audio bitstream received by decoder 200. A sequence of the frames of the encoded audio bitstream is asserted from buffer 201 to parser 205. Parser 205 is coupled and configured to extract loudness processing state metadata (LPSM) and other metadata from each frame of the encoded input audio, to assert at least the LPSM to audio state verifier 203 and stage 204, to assert the LPSM as output (e.g., to post-processor 300), to extract audio data from the encoded input audio, and to assert the extracted audio data to decoder 202. The encoded audio bitstream input to decoder 200 may be one of an AC-3 bitstream, an E-AC-3 bitstream, or a Dolby E bitstream.

The system of FIG. 3 also includes post-processor 300. Post-processor 300 comprises frame buffer 301 and other processing elements (not shown), including at least one processing element coupled to buffer 301. Frame buffer 301 stores (e.g., in a non-transitory manner) at least one frame of the decoded audio bitstream received by post-processor 300 from decoder 200. The processing elements of post-processor 300 are coupled and configured to receive and adaptively process a sequence of the frames of the decoded audio bitstream output from buffer 301, using metadata (including LPSM values) output from decoder 202 and/or control bits output from stage 204 of decoder 200. In one embodiment, post-processor 300 is configured to perform adaptive loudness processing on the decoded audio data using the LPSM values (e.g., based on the loudness processing state and/or one or more audio data characteristics indicated by the LPSM). Various implementations of decoder 200 and post-processor 300 are configured to perform different embodiments of the methods according to the embodiments described herein.

Audio decoder 202 of decoder 200 is configured to decode the audio data extracted by parser 205 to generate decoded audio data, and to assert the decoded audio data as output (e.g., to post-processor 300). State verifier 203 is configured to authenticate and validate the LPSM (and optionally other metadata) asserted thereto. In some embodiments, the LPSM is (or is included in) a data block that has been included in the input bitstream (e.g., in accordance with an embodiment of the present invention). The block may comprise a cryptographic hash (a hash-based message authentication code or "HMAC") for processing the LPSM (and optionally also other metadata) and/or the underlying audio data (provided from parser 205 and/or decoder 202 to verifier 203). The data block may be digitally signed in these embodiments, so that a downstream audio processing unit may relatively easily authenticate and validate the processing state metadata.
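An HMAC-based check of this kind can be sketched with Python's standard `hmac` and `hashlib` modules. The choice of SHA-256, the shared key, and the decision to compute the digest over the LPSM bytes concatenated with the audio bytes are all assumptions made for illustration; the function names are hypothetical.

```python
import hashlib
import hmac


def compute_digest(key: bytes, lpsm: bytes, audio: bytes) -> bytes:
    """Compute an HMAC over the LPSM bytes followed by the audio bytes."""
    # SHA-256 and the concatenation order are assumptions of this sketch.
    return hmac.new(key, lpsm + audio, hashlib.sha256).digest()


def lpsm_is_valid(key: bytes, lpsm: bytes, audio: bytes, received: bytes) -> bool:
    """Compare a recomputed reference digest with the one carried in the block."""
    reference = compute_digest(key, lpsm, audio)
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(reference, received)
```

Because the digest in this sketch covers both the metadata and the audio, a change to either one after the digest was computed causes the check to fail.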

Other cryptographic methods, including but not limited to any of one or more non-HMAC cryptographic methods, may be used for validation of LPSM (e.g., in verifier 203) to ensure secure transmission and receipt of the LPSM and/or the underlying audio data. For example, validation (using such a cryptographic method) can be performed in each audio processing unit which receives an embodiment of the inventive audio bitstream to determine whether the loudness processing state metadata and corresponding audio data included in the bitstream have undergone (and/or have resulted from) specific loudness processing (as indicated by the metadata) and have not been modified after performance of such specific loudness processing.

State verifier 203 asserts control data to control bit generator 204, and/or asserts the control data as output (e.g., to post-processor 300), to indicate the results of the validation operation. In response to the control data (and optionally also other metadata extracted from the input bitstream), stage 204 may generate (and assert to post-processor 300) either: control bits indicating that the decoded audio data output from decoder 202 has undergone a specific type of loudness processing (when the LPSM indicates that the audio data output from decoder 202 has undergone the specific type of loudness processing, and the control bits from verifier 203 indicate that the LPSM is valid); or control bits indicating that the decoded audio data output from decoder 202 should undergo a specific type of loudness processing (e.g., when the LPSM indicates that the audio data output from decoder 202 has not undergone the specific type of loudness processing, or when the LPSM indicates that the audio data output from decoder 202 has undergone the specific type of loudness processing but the control bits from verifier 203 indicate that the LPSM is not valid).
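The decision made by control bit generation stage 204 reduces to a two-input rule, sketched below in Python. The function name and the boolean encoding of the control bits are assumptions made for illustration.

```python
def needs_loudness_processing(lpsm_says_processed: bool, lpsm_valid: bool) -> bool:
    """Return True if downstream loudness processing should be applied.

    Processing is skipped only when the LPSM reports that the specific
    loudness processing was already performed AND the verifier has found
    the LPSM valid. In every other case the audio should be processed.
    """
    return not (lpsm_says_processed and lpsm_valid)
```

Of the four input combinations, only "already processed and valid" yields `False`; an invalid LPSM is never trusted, whatever it claims.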

Alternatively, decoder 200 asserts the LPSM (and any other metadata) extracted by decoder 202 from the input bitstream to post-processor 300, and post-processor 300 either performs loudness processing on the decoded audio data using the LPSM, or performs validation of the LPSM and then performs loudness processing on the decoded audio data using the LPSM if the validation indicates that the LPSM is valid.

In some embodiments, if decoder 200 receives an audio bitstream generated in accordance with an embodiment of the invention with a cryptographic hash, the decoder is configured to parse and retrieve the cryptographic hash from a data block determined from the bitstream, said block comprising loudness processing state metadata (LPSM). Verifier 203 may use the cryptographic hash to validate the received bitstream and/or associated metadata. For example, if verifier 203 finds the LPSM to be valid based on a match between a reference cryptographic hash and the cryptographic hash retrieved from the data block, it may signal to a downstream audio processing unit (e.g., post-processor 300, which may be or include a volume leveling unit) to pass through the (unchanged) audio data of the bitstream. Additionally, optionally, or alternatively, other types of cryptographic techniques may be used in place of a method based on a cryptographic hash.

In some implementations of decoder 200, the encoded bitstream received (and buffered in memory 201) is an AC-3 bitstream or an E-AC-3 bitstream, and comprises audio data segments (e.g., the AB0 to AB5 segments of the frame shown in FIG. 4) and metadata segments, where the audio data segments are indicative of audio data, and each of at least some of the metadata segments includes loudness processing state metadata (LPSM). Decoder stage 202 is configured to extract from the bitstream LPSM having the following format. Each of the metadata segments which includes LPSM is included in an "addbsi" field of the bitstream information ("BSI") segment of a frame of the bitstream, or in an auxdata field (e.g., the AUX segment shown in FIG. 4) at the end of a frame of the bitstream. A frame of the bitstream may include one or two metadata segments, each of which includes LPSM, and if the frame includes two metadata segments, one is present in the addbsi field of the frame and the other in the AUX field of the frame. Each metadata segment including LPSM includes an LPSM payload (or container) segment having the following format: a header (e.g., including a syncword identifying the start of the LPSM payload, followed by identification values, e.g., the LPSM format version, length, period, count, and substream association values indicated in Table 2 below); and, after the header, at least one dialog indication value (e.g., the parameter "Dialog channel(s)" of Table 2) indicating whether the corresponding audio data indicates dialog or does not indicate dialog (e.g., which channels of the corresponding audio data indicate dialog); at least one loudness regulation compliance value (e.g., the parameter "Loudness Regulation Type" of Table 2) indicating whether the corresponding audio data complies with an indicated set of loudness regulations; at least one loudness processing value (e.g., one or more of the parameters "Dialog gated Loudness Correction flag" and "Loudness Correction Type" of Table 2) indicating at least one type of loudness processing which has been performed on the corresponding audio data; and at least one loudness value (e.g., one or more of the parameters "ITU Relative Gated Loudness," "ITU Speech Gated Loudness," "ITU (EBU 3341) Short-term 3s Loudness," and "True Peak" of Table 2) indicating at least one loudness (e.g., peak or average loudness) characteristic of the corresponding audio data.

In some implementations, decoder stage 202 is configured to extract, from the "addbsi" field or an auxdata field of a frame of the bitstream, each metadata segment having the following format: a core header (e.g., including a syncword identifying the start of the metadata segment, followed by at least one identification value, e.g., the core element version, length, and period, extended element count, and substream association values indicated in Table 1 below); after the core header, at least one protection value (e.g., the HMAC digest and audio fingerprint values of Table 1) useful for at least one of decryption, authentication, or validation of at least one of the loudness processing state metadata or the corresponding audio data; and, also after the core header, if the metadata segment includes LPSM, LPSM payload identification ("ID") and LPSM payload size values which identify the following metadata as an LPSM payload and indicate the size of the LPSM payload. The LPSM payload (or container) segment (e.g., having the above-specified format) follows the LPSM payload ID and LPSM payload size values.

More generally, the encoded audio bitstream generated by an embodiment has a structure which provides a mechanism to label metadata elements and sub-elements as core (mandatory) or expanded (optional) elements. This allows the data rate of the bitstream (including its metadata) to scale across numerous applications. The core (mandatory) elements of the bitstream syntax should also be capable of signaling that expanded (optional) elements associated with the audio content are present (in-band) and/or in a remote location (out-of-band).

In some embodiments, the core element(s) are required to be present in every frame of the bitstream. Some sub-elements of the core elements are optional and may be present in any combination. Extension elements are not required to be present in every frame (to limit bitrate overhead). Thus, extension elements may be present in some frames and not in others. Some sub-elements of an extension element are optional and may be present in any combination, whereas other sub-elements of an extension element may be mandatory (i.e., if the extension element is present in a frame of the bitstream).

In some embodiments, an encoded audio bitstream comprising a sequence of audio data segments and metadata segments is generated (e.g., by an audio processing unit that embodies the invention). The audio data segments are indicative of audio data, each of at least some of the metadata segments includes loudness processing state metadata (LPSM), and the audio data segments are time-division multiplexed with the metadata segments. In some embodiments in this class, each of the metadata segments has a format described herein. In one format, the encoded bitstream is an AC-3 bitstream or an E-AC-3 bitstream, and each metadata segment that includes LPSM is included (e.g., by stage 107 of encoder 100) as additional bitstream information in the "addbsi" field (shown in FIG. 6) of the Bitstream Information ("BSI") segment of a frame of the bitstream, or in an auxdata field of a frame of the bitstream. Each of the frames includes, in the addbsi field of the frame, a core element having the format shown in Table 1 of FIG. 8.

In one format, each addbsi (or auxdata) field that contains LPSM contains a core header (and optionally also additional core elements) and, after the core header (or the core header and other core elements), the following LPSM values (parameters): a payload ID (identifying the metadata as LPSM) following the core element values (e.g., as specified in Table 1); a payload size (indicating the size of the LPSM payload) following the payload ID; and LPSM data (following the payload ID and payload size values) having the format indicated in Table 2 of FIG. 9.

In a second format of the encoded bitstream, the bitstream is an AC-3 bitstream or an E-AC-3 bitstream, and each metadata segment that includes LPSM is included (e.g., by stage 107 of encoder 100) in either the "addbsi" field (shown in FIG. 6) of the Bitstream Information ("BSI") segment of a frame of the bitstream, or an auxdata field (e.g., the AUX segment shown in FIG. 4) at the end of a frame of the bitstream. A frame may include one or two metadata segments, each of which includes LPSM; if the frame includes two metadata segments, one is present in the addbsi field of the frame and the other in the AUX field of the frame. Each metadata segment that includes LPSM has the format specified above with reference to Tables 1 and 2 (i.e., it includes the core elements specified in Table 1, followed by the payload ID (identifying the metadata as LPSM) and payload size values specified above, followed by the payload (the LPSM data, which has the format indicated in Table 2)).

In another format, the encoded bitstream is a Dolby E bitstream, and each metadata segment that includes LPSM occupies the first N sample locations of the Dolby E guard band interval. A Dolby E bitstream including such a metadata segment includes a value indicative of the LPSM payload length, signaled, for example, in the Pd word of the SMPTE 337M preamble (the SMPTE 337M Pa word repetition rate may remain identical to the associated video frame rate).

In a format in which the encoded bitstream is an E-AC-3 bitstream, each metadata segment that includes LPSM is included (e.g., by stage 107 of encoder 100) as additional bitstream information in the "addbsi" field of the Bitstream Information ("BSI") segment of a frame of the bitstream. Additional aspects of encoding an E-AC-3 bitstream with LPSM in this format are as follows: (1) during generation of an E-AC-3 bitstream, while the E-AC-3 encoder (which inserts the LPSM values into the bitstream) is "active," for every frame (syncframe) generated, the bitstream should include a metadata block (including LPSM) carried in the addbsi field of the frame. The bits required to carry the metadata block should not increase the encoder bitrate (frame length); (2) every metadata block (containing LPSM) should contain the following information: loudness_correction_type_flag: where '1' indicates that the loudness of the corresponding audio data was corrected upstream from the encoder, and '0' indicates that the loudness was corrected by a loudness corrector embedded in the encoder (e.g., loudness processor 103 of encoder 100 of FIG. 2); speech_channel: indicates which source channel(s) contain speech (over the previous 0.5 seconds). If no speech is detected, this shall be indicated as such; speech_loudness: indicates the integrated speech loudness of each corresponding audio channel that contains speech (over the previous 0.5 seconds); ITU_loudness: indicates the integrated ITU BS.1770-2 loudness of each corresponding audio channel; gain: loudness composite gain(s) for reversal in a decoder (to demonstrate reversibility).

While the E-AC-3 encoder (which inserts the LPSM values into the bitstream) is "active" and is receiving an AC-3 frame with a 'trust' flag, the loudness controller in the encoder (e.g., loudness processor 103 of encoder 100 of FIG. 2) is bypassed. The 'trusted' source Dialnorm and DRC values are passed through (e.g., by generator 106 of encoder 100) to the E-AC-3 encoder component (e.g., stage 107 of encoder 100). LPSM block generation continues, and the loudness_correction_type_flag is set to '1'. The loudness controller bypass sequence is synchronized to the start of the decoded AC-3 frame in which the 'trust' flag appears. The loudness controller bypass sequence is implemented as follows: the leveler_amount control is decremented from a value of 9 to a value of 0 over 10 audio block periods (i.e., 53.3 milliseconds), and the leveler_back_end_meter control is placed into bypass mode (this operation should result in a seamless transition). The term 'trusted' bypass of the leveler implies that the Dialnorm value of the source bitstream is also re-used at the output of the encoder (e.g., if the 'trusted' source bitstream has a Dialnorm value of -30, then the output of the encoder should use -30 for the outbound Dialnorm value).

While the E-AC-3 encoder (which inserts the LPSM values into the bitstream) is "active" and is receiving an AC-3 frame without the 'trust' flag, the loudness controller embedded in the encoder (e.g., loudness processor 103 of encoder 100 of FIG. 2) is active. LPSM block generation continues, and the loudness_correction_type_flag is set to '0'. The loudness controller activation sequence is synchronized to the start of the decoded AC-3 frame in which the 'trust' flag disappears. The loudness controller activation sequence is implemented as follows: the leveler_amount control is incremented from a value of 0 to a value of 9 over 1 audio block period (i.e., 5.3 milliseconds), and the leveler_back_end_meter control is placed into 'active' mode (this operation should result in a seamless transition and include a back_end_meter integration reset). During encoding, a graphical user interface (GUI) indicates the following parameters to the user: "Input Audio Program: [Trusted/Untrusted]", whose state is based on the presence of the 'trust' flag within the input signal; and "Real-time Loudness Correction: [Enabled/Disabled]", whose state is based on whether the loudness controller embedded in the encoder is active.
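The timing of the bypass and activation sequences can be checked with a short calculation. The following is a minimal sketch, not the encoder's actual implementation: it assumes a 48 kHz sample rate and 256 samples per audio block (which reproduces the 5.3 ms and 53.3 ms figures given above), and it assumes the leveler_amount control moves linearly, which the text does not state. The function names are hypothetical.

```python
# Sketch of the leveler_amount ramps described above.
# Assumptions (not stated in the text): 48 kHz sample rate, 256 samples
# per audio block, and a linear ramp between the two end values.

SAMPLE_RATE_HZ = 48_000
SAMPLES_PER_BLOCK = 256


def block_period_ms(sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration of one audio block in milliseconds."""
    return 1000.0 * SAMPLES_PER_BLOCK / sample_rate_hz


def leveler_ramp(start: float, end: float, num_blocks: int) -> list:
    """leveler_amount value reached at the end of each audio block."""
    return [start + (end - start) * (i + 1) / num_blocks
            for i in range(num_blocks)]


# Bypass: 9 -> 0 over 10 audio blocks (a 'trust' flag appears).
bypass_ramp = leveler_ramp(9, 0, 10)

# Activation: 0 -> 9 over 1 audio block (the 'trust' flag disappears).
activation_ramp = leveler_ramp(0, 9, 1)

if __name__ == "__main__":
    print(f"block period: {block_period_ms():.1f} ms")
    print(f"bypass duration: {10 * block_period_ms():.1f} ms")
    print("bypass ramp:", bypass_ramp)
    print("activation ramp:", activation_ramp)
```

Under these assumptions the bypass fades the leveler out gradually over ten blocks, while activation re-engages it within a single block.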

When decoding an AC-3 or E-AC-3 bitstream that has LPSM (in the format described) included in the "addbsi" field of the Bitstream Information ("BSI") segment of each frame of the bitstream, the decoder parses the LPSM block data (in the addbsi field) and passes the extracted LPSM values to a graphical user interface (GUI). The set of extracted LPSM values is refreshed every frame.

In yet another format, the encoded bitstream is an AC-3 bitstream or an E-AC-3 bitstream, and each metadata segment that includes LPSM is included (e.g., by stage 107 of encoder 100) as additional bitstream information in the "addbsi" field (shown in FIG. 6) of the Bitstream Information ("BSI") segment of a frame of the bitstream (or in the Aux segment). In this format (which is a variation on the format described above with reference to Tables 1 and 2), each addbsi (or Aux) field that contains LPSM contains the following LPSM values: the core elements specified in Table 1, followed by the payload ID (identifying the metadata as LPSM) and payload size values, followed by the payload (LPSM data), which has the following format (similar to the elements indicated in Table 2 above):

version of LPSM payload: a 2-bit field that indicates the version of the LPSM payload;

dialchan: a 3-bit field that indicates whether the Left, Right, and/or Center channels of the corresponding audio data contain spoken dialog. The bit allocation of the dialchan field may be as follows: bit 0, which indicates the presence of dialog in the left channel, is stored in the most significant bit of the dialchan field; and bit 2, which indicates the presence of dialog in the center channel, is stored in the least significant bit of the dialchan field. Each bit of the dialchan field is set to '1' if the corresponding channel contains spoken dialog during the preceding 0.5 seconds of the program;

loudregtyp: a 3-bit field that indicates which loudness regulation standard the program loudness complies with. Setting the "loudregtyp" field to '000' indicates that the LPSM does not indicate loudness regulation compliance. For example, one value of this field (e.g., 000) may indicate that compliance with a loudness regulation standard is not indicated, another value (e.g., 001) may indicate that the audio data of the program complies with the ATSC A/85 standard, and another value (e.g., 010) may indicate that the audio data of the program complies with the EBU R128 standard. In this example, if the field is set to any value other than '000', the loudcorrdialgat and loudcorrtyp fields should follow in the payload;

loudcorrdialgat: a 1-bit field that indicates whether dialog-gated loudness correction has been applied. If the loudness of the program has been corrected using dialog gating, the value of the loudcorrdialgat field is set to '1'; otherwise it is set to '0';

loudcorrtyp: a 1-bit field that indicates the type of loudness correction applied to the program. If the loudness of the program has been corrected with an infinite look-ahead (file-based) loudness correction process, the value of the loudcorrtyp field is set to '0'. If the loudness of the program has been corrected using a combination of real-time loudness measurement and dynamic range control, the value of this field is set to '1';

loudrelgate: a 1-bit field that indicates whether relative-gated loudness data (ITU) exists. If the loudrelgate field is set to '1', a 7-bit ituloudrelgat field should follow in the payload;

loudrelgat: a 7-bit field that indicates relative-gated program loudness (ITU). This field indicates the integrated loudness of the audio program, measured according to ITU-R BS.1770-2 without any gain adjustments due to Dialnorm and dynamic range compression being applied. Values of 0 to 127 are interpreted as -58 LKFS to +5.5 LKFS, in 0.5 LKFS steps;

loudspchgate: a 1-bit field that indicates whether speech-gated loudness data (ITU) exists. If the loudspchgate field is set to '1', a 7-bit loudspchgat field should follow in the payload;

loudspchgat: a 7-bit field that indicates speech-gated program loudness. This field indicates the integrated loudness of the entire corresponding audio program, measured according to formula (2) of ITU-R BS.1770-3 and without any gain adjustments due to Dialnorm and dynamic range compression being applied. Values of 0 to 127 are interpreted as -58 LKFS to +5.5 LKFS, in 0.5 LKFS steps;

loudstrm3se: a 1-bit field that indicates whether short-term (3 second) loudness data exists. If the field is set to '1', a 7-bit loudstrm3s field should follow in the payload;

loudstrm3s: a 7-bit field that indicates the ungated loudness of the preceding 3 seconds of the corresponding audio program, measured according to ITU-R BS.1771-1 and without any gain adjustments due to Dialnorm and dynamic range compression being applied. Values of 0 to 256 are interpreted as -116 LKFS to +11.5 LKFS, in 0.5 LKFS steps;

truepke: a 1-bit field that indicates whether true peak loudness data exists. If the truepke field is set to '1', an 8-bit truepk field should follow in the payload; and

truepk: an 8-bit field that indicates the true peak sample value of the program, measured according to Annex 2 of ITU-R BS.1770-3 and without any gain adjustments due to Dialnorm and dynamic range compression being applied. Values of 0 to 256 are interpreted as -116 LKFS to +11.5 LKFS, in 0.5 LKFS steps.
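The conditional layout of this payload, in which each presence flag decides whether a value field follows, can be illustrated with a small parser. This is a sketch under stated assumptions rather than a conforming decoder: it assumes the fields appear in the order listed above, that bits are packed most-significant-bit first, and that the middle bit of dialchan denotes the right channel (the text only fixes the left and center positions). The `BitReader` class and the function names are hypothetical. The value mapping is the linear rule implied by the text, LKFS = offset + 0.5 × code; note that the stated endpoints of -116 and +11.5 LKFS correspond to codes 0 and 255.

```python
# Sketch of a parser for the LPSM payload fields listed above.
# Assumptions: field order as enumerated in the text, MSB-first bit
# packing, and dialchan's middle bit meaning "right channel".


class BitReader:
    """Reads unsigned integers MSB-first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def code_to_lkfs(code: int, offset_lkfs: float) -> float:
    """Map a field code to LKFS in 0.5 LKFS steps from the given offset."""
    return offset_lkfs + 0.5 * code


def parse_lpsm_payload(data: bytes) -> dict:
    r = BitReader(data)
    out = {"version": r.read(2)}

    dialchan = r.read(3)
    out["dialog"] = {
        "left": bool(dialchan & 0b100),    # most significant bit
        "right": bool(dialchan & 0b010),   # assumed position
        "center": bool(dialchan & 0b001),  # least significant bit
    }

    out["loudregtyp"] = r.read(3)
    if out["loudregtyp"] != 0:
        # Correction fields follow only when compliance is indicated.
        out["loudcorrdialgat"] = r.read(1)
        out["loudcorrtyp"] = r.read(1)

    if r.read(1):  # loudrelgate
        out["loudrelgat_lkfs"] = code_to_lkfs(r.read(7), -58.0)
    if r.read(1):  # loudspchgate
        out["loudspchgat_lkfs"] = code_to_lkfs(r.read(7), -58.0)
    if r.read(1):  # loudstrm3se
        out["loudstrm3s_lkfs"] = code_to_lkfs(r.read(7), -116.0)
    if r.read(1):  # truepke
        out["truepk"] = code_to_lkfs(r.read(8), -116.0)
    return out
```

Because absent fields consume no bits, a payload that carries only the version, dialchan, loudregtyp = '000', and four cleared presence flags occupies 12 bits under these assumptions.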

In some embodiments, the core element of a metadata segment in an auxdata field (or "addbsi" field) of a frame of an AC-3 bitstream or an E-AC-3 bitstream comprises a core header (optionally including identification values, e.g., the core element version) and, after the core header: values indicative of whether fingerprint data (or other protection values) is included for the metadata of the metadata segment; values indicative of whether external data (related to the audio data corresponding to the metadata of the metadata segment) exists; payload ID and payload size values for each type of metadata (e.g., LPSM and/or metadata of a type other than LPSM) identified by the core element; and protection values for at least one type of metadata identified by the core element. The metadata payload(s) of the metadata segment follow the core header and are (in some cases) nested within values of the core element.

Optimized Loudness and Dynamic Range System

The secure metadata coding and transport scheme described above is used in conjunction with a scalable and extensible system for optimizing loudness and dynamic range across different playback devices, applications, and listening environments, as illustrated in FIG. 1. In an embodiment, system 10 is configured to normalize the loudness levels and dynamic range of input audio 11 across various devices that require different target loudness values and have differing dynamic range capabilities. To normalize loudness levels and dynamic range, system 10 includes different device profiles with the audio content, and the normalization is done based on these profiles. The profiles may be included by one of the audio processing units in the audio processing chain, and the included profiles may be used by a downstream processing unit in that chain to determine the desired target loudness and dynamic range for a target device. Additional processing components may provide or process information for device profile management (including, but not limited to, the parameters null band range, true peak threshold, loudness range, fast/slow time constants (coefficients), and max boost), gain control, and broadband and/or multiband gain generation functions.

FIG. 10 illustrates a more detailed diagram of the system of FIG. 1 for a system that provides optimized loudness and dynamic range control, under some embodiments. For system 321 of FIG. 10, the encoder stage comprises a core encoder component 304 that encodes audio input 303 in a digital format suitable for transmission to decoder 312. The audio is processed so that it can be played back in a variety of different listening environments, each of which may require different loudness and/or dynamic range target settings. Thus, as shown in FIG. 10, the decoder outputs a digital signal that is converted to analog format by a digital-to-analog converter 316 for playback through a variety of different driver types, including full range speakers 320, miniature speakers 322, and headphones 324. These drivers illustrate only some examples of possible playback drivers, and any transducer or driver of any appropriate size and type may be used. In addition, the drivers/transducers 320 to 324 of FIG. 10 may be embodied in any appropriate playback device for use in any corresponding listening environment. The device types may include, for example, AVRs, televisions, stereo equipment, computers, mobile phones, tablet computers, MP3 players, and so on, and the listening environments may include, for example, auditoriums, homes, cars, listening booths, and so on.

Since the range of playback environments and driver types can vary from very small private contexts to very large public venues, the span of possible and optimal playback loudness and dynamic range configurations can vary significantly depending on the content type, background noise levels, and the like. For example, in a home theater environment, wide dynamic range content can be played through surround sound equipment and narrower dynamic range content can be played through a regular television system (such as a flat-panel LED/LCD type), while a very narrow dynamic range mode may be used for certain listening conditions in which large level variations are undesirable (e.g., at night, or on a device with severe acoustic output power limitations, such as a mobile phone/tablet internal speaker or headphone output). In portable or mobile listening contexts, such as using small computer or dock speakers, or headphones/earbuds, the optimal dynamic range of the playback may vary depending on the environment. For example, in a quiet environment the optimal dynamic range may be larger than in a noisy environment. Embodiments of the adaptive audio processing system of FIG. 10 vary the dynamic range to render the audio content more intelligible, depending on parameters such as the listening environment and the playback device type.

FIG. 11 is a table that illustrates different dynamic range requirements for a variety of playback devices and background listening environments in an example use case. Similar requirements can be derived for loudness. The different dynamic range and loudness requirements generate different profiles that are used by optimization system 321. System 321 includes a loudness and dynamic range measurement component 302 that analyzes and measures the loudness and dynamic range of the input audio. In an embodiment, the system analyzes the overall program content to determine the overall loudness parameter. In this context, loudness refers to the long-term program loudness or average loudness of a program, where a program is a single unit of audio content, such as a movie, television show, commercial, or similar program content. The loudness is used to provide an indication of the artistic dynamic range profile that is used by content creators to control how the audio will be played back. Loudness is related to the Dialnorm metadata value in that Dialnorm represents the average dialog loudness of a single program (e.g., movie, show, commercial, etc.). Short-term dynamic range quantifies variations in signals over a much shorter period of time than the program loudness. For example, short-term dynamic range may be measured on the order of seconds, whereas program loudness may be measured over a span of minutes or even hours. The short-term dynamic range provides a protection mechanism that is independent of the program loudness, to ensure that overloading does not occur for various playback profiles and device types. In an embodiment, the loudness (long-term program loudness) target is based on dialog loudness, and the short-term dynamic range is based on relative-gated and/or ungated loudness. In this case, certain DRC and loudness components in the system are context-aware with regard to the content type and/or the target device types and characteristics. As part of this context-aware capability, the system is configured to analyze one or more characteristics of the output device to determine whether the device is a member of particular groups of devices that are optimized for certain DRC and loudness playback conditions, such as AVR-type devices, televisions, computers, portable devices, and so on.

A pre-processing component analyzes the program content to determine loudness, peaks, true peaks, and quiet periods in order to generate unique metadata for each profile of a plurality of different profiles. In an embodiment, the loudness may be dialog-gated loudness and/or relative-gated loudness. The different profiles define various DRC (dynamic range control) and target loudness modes, in which different gain values are generated at the encoder depending on the characteristics of the source audio content, the desired target loudness, and the playback device type and/or environment.

The decoder can provide different DRC and target loudness modes (enabled by the profiles mentioned above), which may include: DRC and target loudness off/disabled, which allows full dynamic range listening with no compression of the audio signal and no loudness normalization; DRC off/disabled and loudness normalization with a target of -31 LKFS line mode for playback on home theater systems, which provides moderate dynamic range compression through gain values generated at the encoder (specifically for this playback mode and/or device profile) with loudness normalization to a target of -31 LKFS; RF mode for playback through TV speakers, which provides a heavy amount of dynamic range compression with loudness normalization to a target of -24, -23, or -20 LKFS; an intermediate mode for playback over computers or similar devices, which provides compression with loudness normalization to a target of -14 LKFS; and a portable mode, which provides very heavy dynamic range compression with a loudness normalization target of -11 LKFS. The target loudness values of -31, -23/-20, -14, and -11 LKFS are intended to be examples of different playback/device profiles that may be defined for the system under some embodiments; any other suitable target loudness values may be used, and the system generates appropriate gain values specific to these playback modes and/or device profiles. Moreover, the system is extensible and adaptable, so that different playback devices and listening environments can be accommodated by defining a new profile, at the encoder or elsewhere, and loading it into the encoder. In this way, new and unique playback/device profiles can be created to support improved or different playback devices for future applications.
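The relation between a profile's target loudness and the normalization gain can be illustrated with a minimal sketch. It assumes that normalization is a single static gain equal to the difference between the target and the measured program loudness; the dictionary keys and the function name are hypothetical and chosen only for this illustration, and -24 LKFS is used for RF mode as one of the three example targets.

```python
# Example target loudness values (LKFS) taken from the modes listed above.
# The mode names used as keys are hypothetical labels for this sketch.
TARGET_LOUDNESS_LKFS = {
    "line": -31.0,
    "rf": -24.0,           # -23 and -20 LKFS are the other RF-mode examples
    "intermediate": -14.0,
    "portable": -11.0,
}


def normalization_gain_db(program_loudness_lkfs: float, mode: str) -> float:
    """Static gain (dB) that moves the program loudness to the mode's target.

    Assumption: normalization is a simple level offset, target minus measured.
    """
    return TARGET_LOUDNESS_LKFS[mode] - program_loudness_lkfs


if __name__ == "__main__":
    # A program measured at -24 LKFS, rendered in each example mode.
    for mode in TARGET_LOUDNESS_LKFS:
        print(mode, normalization_gain_db(-24.0, mode))
```

A new playback/device profile would, in this sketch, amount to adding one more entry to the table.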

In an embodiment, the gain values may be calculated in any suitable processing component of system 321, such as the encoder 304, decoder 312, or transcoder 308, or in any pre-processing component associated with the encoder or any post-processing component associated with the decoder.

FIG. 13 is a block diagram illustrating the interface between different profiles for a variety of different playback device classes, under an embodiment. As shown in FIG. 13, the encoder 502 receives an audio input 501 and one of several different possible profiles 506. The encoder combines the audio data with the selected profile to produce an output bitstream file that is processed in a decoder component present in, or associated with, the target playback device. In the example of FIG. 13, the different playback devices may be a computer 510, a mobile phone 512, an AVR 514, and a television 516, although many other output devices are also possible. Each of the devices 510 to 516 includes or is coupled to speakers (including drivers and/or transducers), such as drivers 320 to 324. The combination of the processing, power ratings, and sizes of the playback devices and associated speakers generally dictates which profile is optimal for that particular target. Thus, the profiles 506 may be specifically defined for playback through AVRs, TVs, mobile speakers, mobile headphones, and so on. They may also be defined for specific operating modes or conditions, such as quiet mode, night mode, outdoor, indoor, and so on. The profiles shown in FIG. 13 are merely example modes, and any suitable profile may be defined, including custom profiles for specific targets and environments.

Although FIG. 13 illustrates an embodiment in which the encoder 502 receives the profiles 506 and generates the appropriate parameters for loudness and DRC processing, it should be noted that the generation of parameters based on a profile and the audio content may be performed on any suitable audio processing unit, such as an encoder, decoder, transcoder, pre-processor, post-processor, and so on. For example, each output device 510 to 516 of FIG. 13 has or is coupled to a decoder component that processes the metadata in the bitstream of the file 504 sent from the encoder 502, so that the loudness and dynamic range can be adapted to match the device or the device type of the target output device.

In an embodiment, the dynamic range and loudness of the audio content are optimized for each possible playback device. This is achieved by maintaining the long-term loudness at the target and controlling the short-term dynamic range (by controlling signal dynamics, sample peaks, and/or true peaks) to optimize the audio experience for each of the target playback modes. Different metadata elements are defined for the long-term loudness and the short-term dynamic range. As shown in FIG. 10, component 302 analyzes the entire input audio signal (or parts of it, such as the speech component, if applicable) to derive the relevant characteristics for both of these separate DR components. This allows different gain values to be defined for artistic gains versus clip (overload protection) gain values.

These gain values for the long-term loudness and short-term dynamic range are then mapped to a profile 305 to generate parameters describing the loudness and dynamic range control gain values. These parameters are combined with the encoded audio signal from the encoder 304 in a multiplexer 306, or similar component, to create a bitstream that is transmitted through a transcoder 308 to the decoder stage. The bitstream input to the decoder stage is demultiplexed in the demultiplexer 310 and then decoded in the decoder 312. The gain component 314 then applies the gains corresponding to the appropriate profile to generate digital audio data, which is processed through the DACS unit 416 for playback through the appropriate playback devices and drivers or transducers 320 to 324.

FIG. 14 is a table illustrating the correlation between long-term loudness and short-term dynamic range for a plurality of defined profiles, under an embodiment. As shown in Table 4 of FIG. 14, each profile comprises a set of gain values that dictate the amount of dynamic range compression (DRC) applied in the decoder of the system or in each of the target devices. Each of the N profiles, denoted Profile 1 to Profile N, sets particular long-term loudness parameters (e.g., Dialnorm) and overload compression parameters by dictating the corresponding gain values applied in the decoder stage. The DRC gain values for the profiles may be defined by an external source that is accepted by the encoder, or they may be generated internally within the encoder as default gain values if no external values are provided.

In an embodiment, the gain values for each profile are embodied in DRC gain words that are computed from an analysis of certain characteristics of the audio signal, such as peak, true peak, short-term loudness of dialog, overall short-term loudness, or a combination (hybrid) of both. The analysis is used to compute static gains based on the chosen profile (i.e., its transfer characteristic or curve), together with the time constants needed to implement fast/slow attack and fast/slow release of the final DRC gains for each possible device profile and/or target loudness. As stated above, these profiles may be preset in the encoder or decoder, or may be generated externally and carried to the encoder via external metadata from the content creator.
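How attack and release time constants turn static gains into final DRC gains can be sketched as follows. This is only an illustration under stated assumptions: the document does not specify the smoothing filter, so a one-pole smoother is assumed here, and the selection between fast and slow constants is not modeled. The function name, the per-frame gain representation, and the numeric values in the usage example are all hypothetical.

```python
import math


def smooth_gains(static_gains_db, attack_s, release_s, frame_s):
    """One-pole smoothing of per-frame static DRC gains (all values in dB).

    Assumption: the attack constant applies while the requested gain is
    falling (more attenuation), the release constant while it is rising.
    """
    attack_coeff = math.exp(-frame_s / attack_s)
    release_coeff = math.exp(-frame_s / release_s)
    smoothed = []
    state = 0.0  # start from unity gain (0 dB)
    for gain in static_gains_db:
        coeff = attack_coeff if gain < state else release_coeff
        state = coeff * state + (1.0 - coeff) * gain
        smoothed.append(state)
    return smoothed


if __name__ == "__main__":
    # Five frames asking for 6 dB of attenuation, then five asking for none.
    requested = [-6.0] * 5 + [0.0] * 5
    for value in smooth_gains(requested, attack_s=0.01, release_s=0.5, frame_s=0.032):
        print(round(value, 3))
```

With a short attack and a long release, the smoothed gain drops quickly toward the requested attenuation and recovers slowly once the attenuation is no longer requested.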

In an embodiment, the gain values may be wideband gains that apply the same gain across all frequencies of the audio content. Alternatively, the gain may be composed of multi-band gain values, such that different gain values are applied to different frequencies or frequency bands of the audio content. In the multi-channel case, each profile may constitute a matrix of gain values indicating gains for different frequency bands instead of a single gain value.
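The difference between a wideband gain and multi-band gain values can be shown with a small sketch. It assumes the audio is already described as a list of per-band levels in dB; that representation and the function name are hypothetical and serve only to illustrate the two cases.

```python
def apply_gain_db(band_levels_db, gain_db):
    """Apply a wideband gain (one number) or multi-band gains (one per band).

    Assumption: levels and gains are in dB, so applying a gain is an addition.
    """
    if isinstance(gain_db, (int, float)):
        # Wideband: the same gain for every frequency band.
        gains = [float(gain_db)] * len(band_levels_db)
    else:
        # Multi-band: one gain value per frequency band.
        gains = list(gain_db)
        if len(gains) != len(band_levels_db):
            raise ValueError("need exactly one gain per frequency band")
    return [level + gain for level, gain in zip(band_levels_db, gains)]


if __name__ == "__main__":
    levels = [-20.0, -30.0, -40.0]
    print(apply_gain_db(levels, -6.0))               # wideband
    print(apply_gain_db(levels, [-3.0, 0.0, 2.0]))   # multi-band
```

For multi-channel content, the per-band list would become one row of a matrix, with one row per channel.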

Referring to FIG. 10, in an embodiment, information regarding the properties or characteristics of the listening environments and/or the capabilities and configurations of the playback devices is provided by the decoder stage to the encoder stage over a feedback link 330. Profile information 332 is also input to the encoder 304. In an embodiment, the decoder analyzes the metadata in the bitstream to determine whether a loudness parameter for a first group of audio playback devices is available in the bitstream. If so, it transmits the parameters downstream for use in rendering the audio. Otherwise, the encoder analyzes certain characteristics of the devices to derive the parameters. These parameters are then sent to a downstream rendering component for playback. The encoder also determines the output device (or a group of output devices that includes the output device) that will render the received audio stream. For example, the output device may be determined to be a cell phone or to belong to a group such as portable devices.

In an embodiment, the decoder uses the feedback link 330 to indicate the determined output device or group of output devices to the encoder. For this feedback, a module connected to the output device (e.g., a module in a sound card connected to headsets or to the speakers of a laptop) may indicate to the decoder the identity of the output device or the identity of a group of devices that includes the output device. The decoder transmits this information to the encoder over the feedback link 330. In an embodiment, the decoder determines the loudness and DRC parameters. In this embodiment, instead of transmitting the information over the feedback link 330, the decoder uses the information about the determined device or group of output devices to determine the loudness and DRC parameters. In another embodiment, another audio processing unit determines the loudness and DRC parameters, and the decoder transmits the information to that audio processing unit instead of to the encoder.
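The decision described above (use the loudness parameters carried in the bitstream if they exist for the device group, otherwise derive them from device characteristics) can be sketched as follows. The metadata layout, the device-group and speaker-class names, and the fallback target values are all assumptions made for this illustration; the document does not define them.

```python
# Assumed fallback targets (LKFS) keyed by a single device characteristic,
# the speaker class. These values are illustrative only.
FALLBACK_TARGET_LKFS = {
    "full_range": -31.0,
    "small": -14.0,
    "headphones": -11.0,
}


def select_parameters(metadata, device_group, speaker_class):
    """Return loudness parameters for a device group.

    metadata: hypothetical mapping of device-group name -> parameter dict,
    standing in for the metadata parsed from the bitstream.
    """
    params = metadata.get(device_group)
    if params is not None:
        # Parameters for this group are present in the bitstream: use them.
        return {**params, "source": "bitstream"}
    # Not present: derive parameters from a characteristic of the device.
    return {"target_lkfs": FALLBACK_TARGET_LKFS[speaker_class], "source": "derived"}


if __name__ == "__main__":
    metadata = {"portable": {"target_lkfs": -11.0}}
    print(select_parameters(metadata, "portable", "small"))
    print(select_parameters(metadata, "television", "small"))
```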

FIG. 12 is a block diagram of a dynamic range optimization system, under an embodiment. As shown in FIG. 12, an encoder 402 receives input audio 401. The encoded audio is combined in a multiplexer 409 with parameters 404 generated from a selected compression curve 422 and a Dialnorm value 424. The resulting bitstream is transmitted to a demultiplexer 411, which produces the audio signals that are decoded by the decoder 406. The parameters and Dialnorm values are used by a gain calculation unit 408 to generate the gain levels that drive the amplifier 410, which amplifies the decoder output. FIG. 12 illustrates how dynamic range control is parameterized and inserted into the bitstream. Loudness may also be parameterized and inserted into the bitstream using similar components. In an embodiment, an output reference level control (not shown) may also be provided to the decoder. Although the figure illustrates the loudness and dynamic range parameters as being determined and inserted at the encoder, a similar determination may be performed in other audio processing units, such as a pre-processor, decoder, or post-processor.
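The last stage of FIG. 12, in which computed gain levels drive an amplifier on the decoder output, can be illustrated with a minimal sketch. It assumes that the loudness gain and the DRC gain are both expressed in dB, that they simply add, and that the amplifier multiplies each sample by the equivalent linear factor; the function names are hypothetical.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert an amplitude gain in dB to a linear multiplier."""
    return 10.0 ** (gain_db / 20.0)


def amplify(samples, loudness_gain_db, drc_gain_db):
    """Scale decoded samples by the combined loudness and DRC gain.

    Assumption: the two gains are independent dB offsets that add.
    """
    factor = db_to_linear(loudness_gain_db + drc_gain_db)
    return [sample * factor for sample in samples]


if __name__ == "__main__":
    decoded = [0.5, -0.25, 0.125]
    print(amplify(decoded, loudness_gain_db=-7.0, drc_gain_db=-3.0))
```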

FIG. 15 illustrates examples of loudness profiles for different types of audio content, under an embodiment. As shown in FIG. 15, example curves 600 and 602 plot input loudness (in LKFS) against gain, centered around 0 LKFS. Different types of content exhibit different curves, as shown in FIG. 15, in which curve 600 may represent speech and curve 602 may represent standard film content. As shown in FIG. 15, the speech content is subject to a greater amount of gain than the film content. FIG. 15 is intended to show examples of representative profile curves for certain types of audio content, and other profile curves may also be used. Certain aspects of the profile characteristics, such as those shown in FIG. 15, are used to derive the relevant parameters for the optimization system. In an embodiment, these parameters include: null bandwidth, cut ratio, boost ratio, maximum boost, FS attack, FS decay, holdoff, peak limit, and target level loudness. Other parameters may be used in addition to, or as alternatives to, at least some of these parameters, depending on application requirements and system constraints.
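A static profile curve built from some of the listed parameters can be sketched as follows. The document names the parameters but does not define them, so this sketch adopts one common interpretation and should be read as an assumption: the null band is a region around the target in which no gain is applied, the cut ratio compresses levels above it, and the boost ratio raises levels below it, limited by the maximum boost. Attack, decay, holdoff, and peak limiting are not modeled.

```python
def static_drc_gain_db(input_lkfs, target_lkfs, null_band_db,
                       cut_ratio, boost_ratio, max_boost_db):
    """Static gain (dB) for one loudness value under an assumed curve shape."""
    offset = input_lkfs - target_lkfs
    half_band = null_band_db / 2.0
    if offset > half_band:
        # Louder than the null band: reduce the excess by the cut ratio.
        excess = offset - half_band
        return -(excess - excess / cut_ratio)
    if offset < -half_band:
        # Quieter than the null band: raise the deficit by the boost ratio,
        # but never by more than the maximum boost.
        deficit = -half_band - offset
        return min(deficit - deficit / boost_ratio, max_boost_db)
    # Inside the null band: leave the signal untouched.
    return 0.0


if __name__ == "__main__":
    for level in (-61.0, -41.0, -31.0, -21.0, -11.0):
        print(level, static_drc_gain_db(level, -31.0, 10.0, 2.0, 2.0, 6.0))
```

Two content types, such as speech and film, would in this sketch differ only in the parameter values passed to the function.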

FIG. 16 is a flow diagram illustrating a method of optimizing loudness and dynamic range across playback devices and applications, under an embodiment. Although the figure illustrates the loudness and dynamic range optimization as being performed at the encoder, a similar optimization may be performed in other audio processing units, such as a pre-processor, decoder, or post-processor. As shown in process 620, the method begins with the encoder stage receiving an input signal from a source (603). The encoder or a pre-processing component then determines whether the source signal has undergone a process that achieves a target loudness and/or dynamic range (604). The target loudness corresponds to the long-term loudness and may be defined externally or internally. If the source signal has not undergone a process to achieve the target loudness and/or dynamic range, the system performs the appropriate loudness and/or dynamic range control operation (608). Otherwise, if the source signal has undergone such a loudness and/or dynamic range control operation, the system enters a bypass mode that skips the loudness and/or dynamic range control operations, allowing the original process to dictate the appropriate long-term loudness and/or dynamic range (606). The appropriate gain values for either the bypass mode (606) or the performed mode (608), which may be single wideband gain values or frequency-dependent multi-band gain values, are then applied in the decoder (612).
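The branch structure of process 620 can be written down as a small sketch. It assumes that bypass mode corresponds to unity gain (0 dB), which the document does not state explicitly, and it takes the outcome of the check in step 604 as a ready-made boolean; the function name is hypothetical.

```python
def select_gains(already_processed: bool, computed_gains_db):
    """Steps 604 to 612 of FIG. 16, reduced to a single decision.

    already_processed: result of step 604, i.e. whether the source has already
    undergone a process that achieves the target loudness and/or dynamic range.
    """
    if already_processed:
        # Step 606: bypass, so the original process dictates the result.
        # Assumption: bypass is represented as 0 dB for every gain value.
        mode, gains = "bypass", [0.0] * len(computed_gains_db)
    else:
        # Step 608: perform loudness and/or dynamic range control.
        mode, gains = "perform", list(computed_gains_db)
    # Step 612: these are the gain values applied in the decoder.
    return mode, gains


if __name__ == "__main__":
    print(select_gains(True, [-3.0, -6.0]))
    print(select_gains(False, [-3.0, -6.0]))
```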

Bitstream Format

As previously described, the system for optimizing loudness and dynamic range employs a secure, extensible metadata format to ensure that the metadata and the audio content transmitted in the bitstream between the encoder and decoder, or between the source and the rendering/playback devices, have not been separated from each other or otherwise corrupted during transmission over networks or through other proprietary equipment, such as service provider interfaces. This bitstream provides a mechanism for signaling the encoder and/or decoder components to adapt the loudness and dynamic range of the audio signal to suit the audio content and the output device characteristics through the appropriate profile information. In an embodiment, the system is configured to determine a low bit rate encoded bitstream to be transmitted between the encoder and decoder, and the loudness information encoded through the metadata comprises characteristics for one or more output profiles. A description of a bitstream format for use with a loudness and dynamic range optimization system under an embodiment follows.

An AC-3 encoded bitstream comprises metadata and one to six channels of audio content. The audio content is audio data that has been compressed using perceptual audio coding. The metadata includes several audio metadata parameters that are intended for use in changing the sound of a program delivered to a listening environment. Each frame of an AC-3 encoded audio bitstream contains the audio content and metadata for 1536 samples of digital audio. For a sampling rate of 48 kHz, this represents 32 milliseconds of digital audio, or a rate of 31.25 frames per second of audio.

Each frame of an E-AC-3 encoded audio bitstream contains the audio content and metadata for 256, 512, 768, or 1536 samples of digital audio, depending on whether the frame contains one, two, three, or six blocks of audio data, respectively. For a sampling rate of 48 kHz, this represents 5.333, 10.667, 16, or 32 milliseconds of digital audio, respectively, or a rate of 187.5, 93.75, 62.5, or 31.25 frames per second of audio, respectively.
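The frame durations and frame rates quoted for AC-3 and E-AC-3 follow from the block size of 256 samples and the sampling rate. The short sketch below reproduces that arithmetic; the function and constant names are hypothetical.

```python
SAMPLE_RATE_HZ = 48_000
SAMPLES_PER_BLOCK = 256


def frame_timing(blocks_per_frame, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (samples per frame, frame duration in ms, frames per second)."""
    samples = blocks_per_frame * SAMPLES_PER_BLOCK
    duration_ms = 1000.0 * samples / sample_rate_hz
    frames_per_second = sample_rate_hz / samples
    return samples, duration_ms, frames_per_second


if __name__ == "__main__":
    # E-AC-3 frames carry 1, 2, 3, or 6 blocks; AC-3 frames always carry 6.
    for blocks in (1, 2, 3, 6):
        samples, duration_ms, fps = frame_timing(blocks)
        print(blocks, samples, round(duration_ms, 3), fps)
```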

As indicated in FIG. 4, each AC-3 frame is divided into sections (segments), including: a Synchronization Information (SI) section, which contains (as shown in FIG. 5) a synchronization word (SW) and the first of two error correction words (CRC1); a Bitstream Information (BSI) section, which contains most of the metadata; six audio blocks (AB0 to AB5), which contain the data-compressed audio content (and may also include metadata); waste bits (W), which contain any unused bits left over after the audio content is compressed; an Auxiliary (AUX) information section, which may contain more metadata; and the second of the two error correction words (CRC2).
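Reading the SI section of an AC-3 frame can be sketched as follows. The field widths and the sync word value used here are taken from the ATSC A/52 specification rather than from this document, so they should be treated as outside assumptions: a 16-bit sync word of 0x0B77, a 16-bit CRC1, a 2-bit sample rate code, and a 6-bit frame size code. The function name is hypothetical, and the sketch applies to AC-3 frames only, not to E-AC-3.

```python
import struct

# Sync word value per ATSC A/52; it is not stated in this document.
AC3_SYNC_WORD = 0x0B77


def parse_sync_info(frame: bytes) -> dict:
    """Parse the 5-byte Synchronization Information section of an AC-3 frame."""
    if len(frame) < 5:
        raise ValueError("need at least 5 bytes for the SI section")
    # Two big-endian 16-bit fields: the sync word (SW) and CRC1.
    sync_word, crc1 = struct.unpack(">HH", frame[:4])
    if sync_word != AC3_SYNC_WORD:
        raise ValueError("sync word not found at start of frame")
    fscod = frame[4] >> 6          # 2-bit sample rate code
    frmsizecod = frame[4] & 0x3F   # 6-bit frame size code
    return {"crc1": crc1, "fscod": fscod, "frmsizecod": frmsizecod}


if __name__ == "__main__":
    example = bytes([0x0B, 0x77, 0x12, 0x34, 0x1E])
    print(parse_sync_info(example))
```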

As indicated in FIG. 7, each E-AC-3 frame is divided into sections (segments), including: a Synchronization Information (SI) section, which contains (as shown in FIG. 5) a synchronization word (SW); a Bitstream Information (BSI) section, which contains most of the metadata; between one and six audio blocks (AB0 to AB5), which contain the data-compressed audio content (and may also include metadata); waste bits (W), which contain any unused bits left over after the audio content is compressed; an Auxiliary (AUX) information section, which may contain more metadata; and an error correction word (CRC).

In an AC-3 (or E-AC-3) bitstream, there are several audio metadata parameters that are specifically intended for use in changing the sound of the program delivered to a listening environment. One of these metadata parameters is the Dialnorm parameter, which is included in the BSI segment.

As shown in FIG. 6, the BSI segment of an AC-3 frame includes a five-bit parameter ("Dialnorm") indicating the Dialnorm value for the program. A five-bit parameter ("Dialnorm2"), indicating the Dialnorm value for a second audio program carried in the same AC-3 frame, is included if the audio coding mode ("acmod") of the AC-3 frame is "0", indicating that a dual-mono or "1+1" channel configuration is in use.
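How the five-bit Dialnorm codes might be interpreted can be sketched as follows. The mapping from code to level is taken from the ATSC A/52 specification and is not stated in this document: codes 1 to 31 indicate -1 to -31 dB, and the reserved code 0 is treated as -31 dB. The function names are hypothetical, and the fields are assumed to have been extracted from the BSI segment already.

```python
def dialnorm_to_db(code: int) -> int:
    """Interpret a 5-bit Dialnorm code as a dialog level in dB (per A/52)."""
    if not 0 <= code <= 31:
        raise ValueError("Dialnorm is a 5-bit value (0..31)")
    # Code 0 is reserved; A/52 directs decoders to treat it as -31 dB.
    return -31 if code == 0 else -code


def program_dialnorms(acmod: int, dialnorm: int, dialnorm2=None):
    """Return the Dialnorm level for each program carried in the frame.

    Dialnorm2 is only present when acmod is 0 (dual-mono, "1+1").
    """
    levels = [dialnorm_to_db(dialnorm)]
    if acmod == 0:
        if dialnorm2 is None:
            raise ValueError("acmod 0 requires a Dialnorm2 value")
        levels.append(dialnorm_to_db(dialnorm2))
    return levels


if __name__ == "__main__":
    print(program_dialnorms(acmod=2, dialnorm=24))
    print(program_dialnorms(acmod=0, dialnorm=27, dialnorm2=31))
```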

The BSI segment also includes a flag ("addbsie") indicating the presence (or absence) of additional bit stream information following the "addbsie" bit, a parameter ("addbsil") indicating the length of any additional bit stream information following the "addbsil" value, and up to 64 bits of additional bit stream information ("addbsi") following the "addbsil" value. The BSI segment may include other metadata values not specifically shown in FIG. 6.

Aspects of the one or more embodiments described herein may be implemented in an audio system that processes audio signals for transmission across a network that includes one or more computers or processing devices executing software instructions. Any of the described embodiments may be used alone or together with one another in any combination. Although various embodiments may have been motivated by various deficiencies of the prior art, which may be discussed or alluded to in one or more places in the specification, the embodiments do not necessarily address any of these deficiencies. In other words, different embodiments may address different deficiencies that may be discussed in the specification. Some embodiments may only partially address some deficiencies, or just one deficiency, that may be discussed in the specification, and some embodiments may not address any of these deficiencies.

Aspects of the systems described herein may be implemented in an appropriate computer-based sound processing network environment for processing digital or digitized audio files. Portions of the adaptive audio system may include one or more networks that comprise any desired number of individual machines, including one or more routers (not shown) that serve to buffer and route the data transmitted among the computers. Such a network may be built on various different network protocols, and may be the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or any combination thereof.

One or more of the components, blocks, processes, or other functional components may be implemented through a computer program that controls execution of a processor-based computing device of the system. It should also be noted that the various functions disclosed herein may be described using any number of combinations of hardware and firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, physical (non-transitory), non-volatile storage media in various forms, such as optical, magnetic, or semiconductor storage media.

Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise," "comprising," and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of "including, but not limited to." Words using the singular or plural number also include the plural or singular number, respectively. Additionally, the words "herein," "hereunder," "above," "below," and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word "or" is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list.

While one or more implementations have been described by way of example and in terms of specific embodiments, it is to be understood that the one or more implementations are not limited to the disclosed embodiments. On the contrary, they are intended to cover various modifications and similar arrangements, as would be apparent to those skilled in the art. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.

10: system 11: input
12: pre-processing unit 14: encoder
16: signal analysis and metadata correction unit 18: transcoder
20: decoder 24: post-processing unit
100: encoder 101: decoder
102: audio state verifier 103: loudness processing stage
104: audio stream selection stage 105: encoder
106: metadata generator 107: stuffer/formatter stage
108: dialog loudness measurement subsystem 109: frame buffer
110: frame buffer 111: parser
150: subsystem 152: decoder
200: decoder 201: frame buffer
202: audio decoder 203: audio state verifier
204: control bit generation stage 205: parser
300: post-processor 301: frame buffer
303: audio input 304: core encoder component
306: multiplexer 308: transcoder
310: demultiplexer 312: decoder
314: gain component 316: digital-to-analog converter
320: full range speaker 322: small speaker
324: headphones 401: input audio
402: encoder 406: decoder
408: gain calculation unit 409: multiplexer
410: amplifier 411: demultiplexer
416: DACS unit 501: audio input
502: encoder 506: profile
510: computer 512: mobile phone
514: AVR 516: television

Claims

An audio processing device for decoding one or more frames of an encoded audio bitstream, the encoded audio bitstream comprising audio data and metadata for a plurality of dynamic range control (DRC) profiles, the audio processing device comprising:
a bitstream parser configured to parse the encoded audio bitstream and extract encoded audio data and metadata for one or more of the DRC profiles; and
an audio decoder configured to decode the encoded audio data and apply a DRC gain to the decoded audio data;
wherein each DRC profile is suitable for at least one device type or listening environment;
wherein the audio decoder selects one or more DRC profiles in response to information about the audio processing device or a listening environment;
wherein the DRC gain applied to the decoded audio data corresponds to the one or more selected DRC profiles; and
wherein the DRC gain corresponding to the one or more selected DRC profiles is determined from DRC parameters for the one or more selected DRC profiles included in metadata of the encoded audio bitstream.
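
The selection step recited above can be illustrated with a minimal sketch. Everything named here is hypothetical and is not taken from the bitstream syntax: the `DrcProfile` container, the device-type strings, the fallback to a profile named "default", and the use of a single gain value in place of a full set of DRC parameters are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class DrcProfile:
    """Hypothetical stand-in for one DRC profile parsed from metadata."""
    name: str
    device_types: FrozenSet[str]  # device types the profile is suitable for
    gain_db: float                # placeholder for the profile's DRC parameters


def select_profile(profiles: List[DrcProfile], device_type: str,
                   default_name: str = "default") -> Optional[DrcProfile]:
    """Pick the first profile suited to the device, else a named fallback."""
    for profile in profiles:
        if device_type in profile.device_types:
            return profile
    for profile in profiles:
        if profile.name == default_name:
            return profile
    return None


def apply_gain(samples: List[float], gain_db: float) -> List[float]:
    """Scale decoded samples by a gain given in decibels."""
    scale = 10.0 ** (gain_db / 20.0)
    return [s * scale for s in samples]


profiles = [
    DrcProfile("default", frozenset(), 0.0),
    DrcProfile("portable", frozenset({"headphones", "mobile phone"}), 6.0),
    DrcProfile("home theater", frozenset({"AVR"}), -3.0),
]
chosen = select_profile(profiles, "headphones")
rendered = apply_gain([0.1, -0.2], chosen.gain_db)
```

In this sketch a device type with no matching profile falls back to the "default" entry; that fallback policy is an assumption of the example, not something the claim specifies.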

The audio processing device of claim 1,
wherein the DRC parameters represent a static gain transfer characteristic and gain smoothing time constants.

The audio processing device of claim 2,
wherein the time constants include slow and fast attack time constants and slow and fast release time constants.
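
One way to picture how slow and fast attack and release time constants could be used is a one-pole gain smoother, sketched below. This is an assumption-laden illustration rather than the claimed processing: the one-pole filter form, the rule that "attack" means the target gain is falling, and the `fast_threshold_db` switch between fast and slow constants are all hypothetical choices.

```python
import math
from typing import List


def smooth_gain(target_db: List[float], sample_rate: float,
                attack_slow: float, attack_fast: float,
                release_slow: float, release_fast: float,
                fast_threshold_db: float = 10.0) -> List[float]:
    """Smooth a sequence of target gains (dB) with a one-pole filter.

    Time constants are in seconds. The fast constants are used while the
    smoothed gain is more than fast_threshold_db away from the target
    (an assumed switching rule).
    """
    smoothed = []
    gain = target_db[0] if target_db else 0.0
    for target in target_db:
        delta = target - gain
        is_attack = delta < 0.0            # gain must drop: attack phase
        is_fast = abs(delta) > fast_threshold_db
        if is_attack:
            tau = attack_fast if is_fast else attack_slow
        else:
            tau = release_fast if is_fast else release_slow
        alpha = math.exp(-1.0 / (tau * sample_rate))
        gain = alpha * gain + (1.0 - alpha) * target
        smoothed.append(gain)
    return smoothed


# A sudden 20 dB gain reduction: the fast attack acts first, then the slow one.
step = [0.0] + [-20.0] * 200
trace = smooth_gain(step, sample_rate=100.0,
                    attack_slow=0.5, attack_fast=0.05,
                    release_slow=2.0, release_fast=0.2)
```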

The audio processing device of claim 2,
wherein the static gain transfer characteristic comprises a null band range and a maximum boost.
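
A static gain transfer characteristic with a null band and a maximum boost can be sketched as a piecewise-linear curve. The shape below is an assumption for illustration only: the parameter names, the compression-ratio form of the boost and cut segments, and the example numbers are hypothetical and are not drawn from the metadata definition.

```python
def static_gain_db(level_db: float, null_lo_db: float, null_hi_db: float,
                   boost_ratio: float, cut_ratio: float,
                   max_boost_db: float) -> float:
    """Map an input level (dB) to a static DRC gain (dB).

    Inside the null band [null_lo_db, null_hi_db] no gain is applied.
    Below it, quiet signals are boosted, limited to max_boost_db.
    Above it, loud signals are cut.
    """
    if level_db < null_lo_db:
        boost = (null_lo_db - level_db) * (1.0 - 1.0 / boost_ratio)
        return min(boost, max_boost_db)
    if level_db > null_hi_db:
        return -(level_db - null_hi_db) * (1.0 - 1.0 / cut_ratio)
    return 0.0


# Example curve: null band from -36 dB to -26 dB, 2:1 ratios, 6 dB max boost.
for level in (-80.0, -40.0, -31.0, -16.0):
    print(level, static_gain_db(level, -36.0, -26.0, 2.0, 2.0, 6.0))
```

With these example values, levels inside the null band pass unchanged, very quiet levels are boosted by no more than 6 dB, and levels above the band are attenuated.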

An audio processing method for decoding one or more frames of an encoded audio bitstream, performed by an audio processing device, wherein the encoded audio bitstream comprises audio data and metadata for a plurality of dynamic range control (DRC) profiles, the audio processing method comprising:
parsing the encoded audio bitstream and extracting encoded audio data and metadata for one or more of the DRC profiles; and
decoding the encoded audio data and applying a DRC gain to the decoded audio data;
wherein each DRC profile is suitable for at least one device type or listening environment;
wherein one or more DRC profiles are selected in response to information about the audio processing device or a listening environment;
wherein the DRC gain applied to the decoded audio data corresponds to the one or more selected DRC profiles; and
wherein the DRC gain corresponding to the one or more selected DRC profiles is determined from DRC parameters for the one or more selected DRC profiles included in metadata of the encoded audio bitstream.

A non-transitory computer-readable recording medium comprising a series of instructions that, when executed by an audio decoding apparatus, cause the audio decoding apparatus to perform the method of claim 2.

A software program suitable for execution on a processor comprising a series of instructions that, when executed by an audio decoding apparatus, cause the method of claim 5 to be performed.

A computer program product comprising a series of instructions that, when executed on a computer, perform the method of claim 5.