RU2376656C1

RU2376656C1 - Audio signal coding and decoding method and device to this end

Info

Publication number: RU2376656C1
Application number: RU2008112226/09A
Authority: RU
Inventors: Хее Сук ПАНГ (KR); Хиеон О ОХ (KR); Донг Соо КИМ (KR); Дзае Хиун ЛИМ (KR); Йанг Вон ДЗУНГ (KR)
Original assignee: ЭлДжи ЭЛЕКТРОНИКС ИНК.
Priority date: 2005-08-30
Filing date: 2006-08-30
Publication date: 2009-12-20
Also published as: RU2473062C2; RU2009131769A; RU2008112226A

Abstract

FIELD: physics; acoustics.

SUBSTANCE: the invention relates to audio signal processing. Spatial information related to an audio signal is coded into a bitstream that can be transmitted to a decoder or recorded on a storage medium. The bitstream may use different syntax related to the time, frequency and spatial domains, and includes one or more data structures (for example, frames) that contain ordered sets of time slots to which particular parameters apply. The data structures can be of fixed or variable type. A data structure type indicator is included in the bitstream so that the decoder can determine the type of data structure and start the corresponding decoding process. The data structure includes position information, which the decoder uses to identify the correct time slot to which a given parameter set applies. The time slot position information is coded using a fixed or a variable number of bits, depending on the type of data structure indicated by the data structure type indicator. For a variable-type data structure, the position information is coded using a variable number of bits based on the position of the time slot in the ordered set of time slots.

EFFECT: transmission of a multi-channel audio signal at a low bit rate.

19 cl, 26 dwg

Description

Field of the Invention

The subject matter of this application relates generally to audio signal processing.

State of the Art

Research is currently underway, and new approaches are being developed, for perceptual coding of multi-channel audio, commonly referred to as spatial audio coding (SAC). SAC makes it possible to transmit multi-channel audio at low bit rates, which makes it suitable for many popular audio applications (for example, Internet streaming and music downloads).

Rather than discretely coding the individual input audio channels, SAC captures the spatial image of a multi-channel audio signal in a compact set of parameters. These parameters can be transmitted to a decoder, where they are used to synthesize or reconstruct the spatial properties of the audio signal.

In some SAC applications, the spatial parameters are transmitted to the decoder as part of a bitstream. The bitstream includes spatial frames that contain ordered sets of time slots to which sets of spatial parameters can be applied. The bitstream also includes position information that the decoder can use to identify the correct time slot to which a given parameter set applies.

Some SAC applications use conceptual elements in the encoding/decoding paths. One element is commonly called a one-to-two (OTT) element and another a two-to-three (TTT) element, where the names reflect the number of input and output signals of the corresponding decoder element. The OTT encoder element extracts two spatial parameters and creates a downmix signal and a residual signal. The TTT element downmixes three audio signals into a stereo downmix signal plus a residual signal. These elements can be combined to create a variety of spatial audio configurations (for example, surround sound).
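How OTT and TTT elements combine can be illustrated with a small channel-counting sketch. It models only the decoder-side input/output counts named above (OTT: 1 in, 2 out; TTT: 2 in, 3 out). The `Element` class, the helper names, and the particular tree (one TTT feeding three OTT elements) are assumptions chosen for illustration, not a configuration stated in this passage.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Element:
    """Hypothetical decoder-side element: signal counts plus downstream elements."""
    name: str
    n_in: int
    n_out: int
    children: Tuple["Element", ...] = ()


def ott(*children: Element) -> Element:
    """One-to-two element: 1 input signal, 2 output signals."""
    return Element("OTT", 1, 2, children)


def ttt(*children: Element) -> Element:
    """Two-to-three element: 2 input signals, 3 output signals."""
    return Element("TTT", 2, 3, children)


def count_output_channels(elem: Element) -> int:
    """Count final channels: an output either feeds one child or is a final channel."""
    if len(elem.children) > elem.n_out:
        raise ValueError("more children than outputs")
    total = elem.n_out - len(elem.children)  # outputs not consumed by a child
    for child in elem.children:
        if child.n_in != 1:
            raise ValueError("each output carries exactly one signal")
        total += count_output_channels(child)
    return total


# Illustrative tree: a stereo downmix enters one TTT, whose three outputs
# each feed an OTT, giving six output channels.
tree = ttt(ott(), ott(), ott())
print(count_output_channels(tree))
```

The sketch tracks signal counts only; it does not model the spatial parameters or residual signals that the real elements carry.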

Some SAC applications can operate in a non-guided mode, in which only a stereo downmix signal is transmitted from the encoder to the decoder, with no need to transmit spatial parameters. The decoder synthesizes spatial parameters from the downmix signal and uses them to create a multi-channel audio signal.

Summary of the Invention

Spatial information associated with an audio signal is encoded into a bitstream that can be transmitted to a decoder or recorded on a storage medium. The bitstream may contain different syntax relating to the time, frequency and spatial domains. In some embodiments, the bitstream includes one or more data structures (e.g., frames) that contain ordered sets of time slots to which particular parameters can be applied. These data structures can be of fixed or variable type. A data structure type indicator can be inserted into the bitstream to enable the decoder to determine the type of the data structure and invoke the corresponding decoding process. The data structure may include position information that the decoder can use to identify the correct time slot to which a given parameter set applies. The time slot position information can be encoded using a fixed number of bits or a variable number of bits, depending on the data structure type indicated by the data structure type indicator. For a variable-type data structure, the time slot position information can be encoded using a variable number of bits based on the position of the time slot in the ordered set of time slots.

In some embodiments, a method of encoding an audio signal includes: determining a number of time slots and a number of parameter sets, the parameter sets including one or more parameters; generating information indicating the position of at least one time slot in an ordered set of time slots to which a parameter set is applied; encoding the audio signal as a bitstream that includes a frame, the frame containing the ordered set of time slots; and inserting into the bitstream a variable number of bits that represent the position of the time slot in the ordered set of time slots, where the variable number of bits is determined by the time slot position.

In some embodiments, a method of decoding an audio signal includes: receiving a bitstream representing an audio signal, the bitstream containing a frame; determining a number of time slots and a number of parameter sets from the bitstream, the parameter sets including one or more parameters; determining position information from the bitstream, the position information indicating the position of a time slot in an ordered set of time slots to which a parameter set is applied, where the ordered set of time slots is contained in the frame; and decoding the audio signal based on the number of time slots, the number of parameter sets and the position information, where the position information is represented by a variable number of bits based on the time slot position.
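The idea of spending a position-dependent number of bits on each time slot position can be sketched as follows. This is a minimal illustration under stated assumptions, not the syntax defined by the patent: positions are assumed to be strictly increasing 0-based slot indices, each position is coded as an offset from the previous one, and the bit width is assumed to be the smallest width that covers the slots still available to that parameter set. All function names are hypothetical.

```python
from typing import List


def position_bits(num_slots: int, num_sets: int, index: int, prev_pos: int) -> int:
    """Bit width for parameter set `index` (0-based); prev_pos is -1 for the first set.

    Assumption: enough slots must remain after this one for the later sets,
    so the candidate positions are prev_pos + 1 .. num_slots - num_sets + index.
    """
    candidates = num_slots - num_sets + index - prev_pos
    return (candidates - 1).bit_length()  # ceil(log2(candidates)) for candidates >= 1


def encode_positions(positions: List[int], num_slots: int) -> str:
    """Encode strictly increasing slot positions as a string of '0'/'1' characters."""
    num_sets = len(positions)
    chunks = []
    prev = -1
    for i, pos in enumerate(positions):
        max_pos = num_slots - num_sets + i
        if not prev < pos <= max_pos:
            raise ValueError(f"position {pos} is not valid for parameter set {i}")
        width = position_bits(num_slots, num_sets, i, prev)
        if width:  # a single remaining candidate needs no bits at all
            chunks.append(format(pos - prev - 1, f"0{width}b"))
        prev = pos
    return "".join(chunks)


def decode_positions(bits: str, num_slots: int, num_sets: int) -> List[int]:
    """Inverse of encode_positions; widths are recomputed from decoded positions."""
    positions = []
    prev = -1
    cursor = 0
    for i in range(num_sets):
        width = position_bits(num_slots, num_sets, i, prev)
        offset = int(bits[cursor:cursor + width], 2) if width else 0
        cursor += width
        prev = prev + 1 + offset
        positions.append(prev)
    return positions


bits = encode_positions([1, 4, 7], num_slots=8)
print(bits, decode_positions(bits, num_slots=8, num_sets=3))
```

Under these assumptions the decoder needs only the number of time slots and the number of parameter sets, both read from the bitstream, to recompute every bit width. When a parameter set has a single possible slot left, its position costs no bits.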

Other embodiments of time slot position coding are disclosed that relate to systems, methods, devices, data structures and computer-readable media.

It should be understood that both the foregoing general description and the following detailed description of embodiments are illustrative and explanatory and are intended to provide further explanation of the claimed invention.

Brief Description of the Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and form part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:

FIG. 1 is a diagram illustrating the principle of generating spatial information according to one embodiment of the present invention;

FIG. 2 is a block diagram of an encoder for encoding an audio signal according to one embodiment of the present invention;

FIG. 3 is a block diagram of a decoder for decoding an audio signal according to one embodiment of the present invention;

FIG. 4 is a block diagram of a channel conversion module contained in an upmix unit of a decoder according to one embodiment of the present invention;

FIG. 5 is a diagram explaining a method of configuring an audio signal bitstream according to one embodiment of the present invention;

FIGS. 6A and 6B are a diagram and a time/frequency graph, respectively, explaining the relationships between a parameter set, a time slot and parameter bands according to one embodiment of the present invention;

FIG. 7A illustrates a syntax representing configuration information of a spatial information signal according to one embodiment of the present invention;

FIG. 7B is a table for the number of parameter bands of a spatial information signal according to one embodiment of the present invention;

FIG. 8A illustrates a syntax representing the number of parameter bands applied to an OTT block as a fixed number of bits according to one embodiment of the present invention;

FIG. 8B illustrates a syntax representing the number of parameter bands applied to an OTT block using a variable number of bits according to one embodiment of the present invention;

FIG. 9A illustrates a syntax representing the number of parameter bands applied to a TTT block as a fixed number of bits according to one embodiment of the present invention;

FIG. 9B illustrates a syntax representing the number of parameter bands applied to a TTT block using a variable number of bits according to one embodiment of the present invention;

FIG. 10A illustrates a syntax of spatial extension configuration information for a spatial extension frame according to one embodiment of the present invention;

FIGS. 10B and 10C illustrate syntaxes of spatial extension configuration information for a residual signal when the residual signal is contained in a spatial extension frame according to one embodiment of the present invention;

FIG. 10D illustrates a syntax for a method of representing the number of parameter bands for a residual signal according to one embodiment of the present invention;

FIG. 11A is a block diagram of a decoding apparatus using non-guided coding according to one embodiment of the present invention;

FIG. 11B is a diagram of a method of representing the number of parameter bands as a group according to one embodiment of the present invention;

FIG. 12 illustrates a syntax of spatial frame configuration information according to one embodiment of the present invention;

FIG. 13A illustrates a syntax of position information of a time slot to which a parameter set is applied according to one embodiment of the present invention;

FIG. 13B illustrates a syntax for representing position information of a time slot to which a parameter set is applied as an absolute value and a difference value according to one embodiment of the present invention;

FIG. 13C is a diagram representing multiple pieces of position information of time slots to which parameter sets are applied as a group according to one embodiment of the present invention;

FIG. 14 is a flowchart of an encoding method according to one embodiment of the present invention;

FIG. 15 is a flowchart of a decoding method according to one embodiment of the present invention; and

FIG. 16 is a block diagram of a device architecture for implementing the encoding and decoding processes described with reference to FIGS. 1-15.

Best Mode for Carrying Out the Invention

FIG. 1 is a diagram illustrating the principle of generating spatial information according to one embodiment of the present invention. Perceptual coding of multi-channel audio signals is based on the fact that a person perceives audio signals in three-dimensional space. The three-dimensional space of an audio signal can be represented using spatial information, including, but not limited to, the following known spatial parameters: channel level differences (CLD), inter-channel correlation/coherence (ICC), channel time differences (CTD), channel prediction coefficients (CPC), and so on. The CLD parameter describes the energy (level) difference between two audio channels, the ICC parameter describes the amount of correlation or coherence between two audio channels, and the CTD parameter describes the time difference between two audio channels.

The generation of the CTD and CLD parameters is shown in FIG. 1. A first direct sound wave 103 from a distant sound source 101 arrives at the listener's left ear 107, and a second direct sound wave 102 diffracts around the listener's head to reach the right ear 106. The direct sound waves 102 and 103 differ from each other in arrival time and energy level. The CTD and CLD parameters can be generated from the differences in arrival time and energy level, respectively, of the sound waves 102 and 103. In addition, reflected sound waves 104 and 105, which are not mutually correlated, arrive at ears 106 and 107, respectively. The ICC parameter can be generated from the correlation between the sound waves 104 and 105.
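The three parameters can be made concrete with a small time-domain sketch for a single pair of channels. The formulas are common textbook definitions used here as assumptions: CLD as an energy ratio in decibels, ICC as the normalized zero-lag cross-correlation, and CTD as the lag that maximizes the cross-correlation. The function names are hypothetical, and the text above does not say that the codec computes the parameters this way or over the whole signal at once.

```python
import math
from typing import Sequence


def energy(x: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(v * v for v in x)


def cld_db(x: Sequence[float], y: Sequence[float]) -> float:
    """Channel level difference: energy ratio of x to y in dB (both must be non-silent)."""
    return 10.0 * math.log10(energy(x) / energy(y))


def icc(x: Sequence[float], y: Sequence[float]) -> float:
    """Inter-channel correlation: normalized cross-correlation at zero lag."""
    cross = sum(a * b for a, b in zip(x, y))
    return cross / math.sqrt(energy(x) * energy(y))


def ctd_samples(x: Sequence[float], y: Sequence[float], max_lag: int) -> int:
    """Channel time difference in samples: lag of y relative to x with maximum correlation.

    A positive result means y arrives later than x.
    """
    def xcorr(lag: int) -> float:
        return sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))

    return max(range(-max_lag, max_lag + 1), key=xcorr)


left = [1.0, 2.0, 3.0, 4.0]
right = [0.5, 1.0, 1.5, 2.0]          # same waveform at half the amplitude
print(round(cld_db(left, right), 2))  # about 6.02 dB
print(icc(left, right))               # fully correlated

early = [0, 0, 1, 0, 0, 0, 0, 0]
late = [0, 0, 0, 0, 1, 0, 0, 0]       # same impulse, two samples later
print(ctd_samples(early, late, max_lag=3))
```

Halving the amplitude quarters the energy, hence the roughly 6 dB level difference, while the correlation stays at its maximum because the waveform shape is unchanged.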

In the encoder, spatial information (for example, spatial parameters) is extracted from the multi-channel input audio signal, and a downmix signal is generated. The downmix signal and the spatial parameters are transmitted to the decoder. The downmix signal can have any number of audio channels, including, but not limited to, a mono signal, a stereo signal or a multi-channel audio signal. In the decoder, a multi-channel upmix signal is created from the downmix signal and the spatial parameters.

FIG. 2 is a block diagram of an encoder for encoding an audio signal according to one embodiment of the present invention. The encoder includes a downmix unit 202, a spatial information generating unit 203, a downmix signal encoding unit 207 and a multiplexing unit 209. Other encoder configurations are possible. Encoders can be implemented in hardware, software, or a combination of hardware and software. Encoders can be implemented in integrated circuits, chipsets, a system on chip (SoC), digital signal processors, general-purpose processors, and various digital and analog devices.

The downmix unit 202 generates a downmix signal 204 from the multi-channel audio signal 201. In FIG. 2, x₁, ..., xₙ denote the input audio channels. As mentioned above, the downmix signal 204 may be a mono signal, a stereo signal or a multi-channel audio signal. In the example shown, x'₁, ..., x'ₘ denote the channels of the downmix signal 204. In some embodiments, the encoder processes an externally supplied downmix signal 205 (for example, an artistic downmix) instead of the downmix signal 204.
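As a rough illustration of what a downmix unit does, the sketch below averages n input channels into a single mono channel. Equal-gain averaging is an assumption made for simplicity, and `downmix_mono` is a hypothetical name. The text does not specify the downmix rule, and it also allows stereo or multi-channel downmixes.

```python
from typing import List, Sequence


def downmix_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equally long input channels sample by sample into one mono channel."""
    if not channels:
        raise ValueError("at least one input channel is required")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same number of samples")
    n = len(channels)
    return [sum(ch[t] for ch in channels) / n for t in range(length)]


# Two input channels of two samples each, mixed down to one channel.
print(downmix_mono([[1.0, 2.0], [3.0, 4.0]]))
```

Dividing by the channel count keeps the result within the range of the inputs. A real encoder may weight channels differently.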

The spatial information generating unit 203 extracts spatial information from the multi-channel audio signal 201. Here, the term "spatial information" means information about the audio signal channels that is used in the decoder to upmix the downmix signal 204 into a multi-channel audio signal. The downmix signal 204 is generated by downmixing the multi-channel audio signal. The spatial information is encoded to provide an encoded spatial information signal 206.

The downmix signal encoding unit 207 generates an encoded downmix signal 208 by encoding the downmix signal 204 generated by the downmix unit 202.

The multiplexing unit 209 generates a bitstream 210 that includes the encoded downmix signal 208 and the encoded spatial information signal 206. The bitstream 210 can be transmitted to a downstream decoder and/or recorded on a storage medium.

FIG. 3 is a block diagram of a decoder for decoding an encoded audio signal according to one embodiment of the present invention. The decoder includes a demultiplexing unit 302, a downmix signal decoding unit 305, a spatial information decoding unit 307 and an upmix unit 309. Decoders can be implemented in hardware, software, or a combination of hardware and software. Decoders can be implemented in integrated circuits, chipsets, a system on chip (SoC), digital signal processors, general-purpose processors, and various digital and analog devices.

In some embodiments, the demultiplexing unit 302 receives a bitstream 301 representing an audio signal and then extracts from the bitstream 301 an encoded downmix signal 303 and an encoded spatial information signal 304. In FIG. 3, x′₁, ..., x′_m indicate the channels of the downmix signal 303. The downmix signal decoding unit 305 outputs a decoded downmix signal 306 by decoding the encoded downmix signal 303. If the decoder is not capable of outputting a multi-channel audio signal, the downmix signal decoding unit 305 can output the downmix signal 306 directly. In FIG. 3, y′₁, ..., y′_m indicate the direct output channels of the downmix signal decoding unit 305.

The spatial information decoding unit 307 extracts the configuration information of the spatial information signal from the encoded spatial information signal 304 and then decodes the spatial information signal 304 using the extracted configuration information.

The upmix unit 309 can upmix the downmix signal 306 into a multi-channel audio signal 310 using the extracted spatial information 308. In FIG. 3, y₁, ..., y_n indicate the output channels of the upmix unit 309.

FIG. 4 is a block diagram of a channel conversion module that can be included in the upmix unit 309 of the decoder shown in FIG. 3. In some embodiments, the upmix unit 309 may include a plurality of channel conversion modules. A channel conversion module is a conceptual device whose number of output channels can differ from its number of input channels, the conversion being driven by specific information.

In some embodiments, the channel conversion module may include an OTT (one-to-two) block for converting one channel into two channels and vice versa, and a TTT (two-to-three) block for converting two channels into three channels and vice versa. The OTT and/or TTT blocks can be arranged in a variety of useful configurations. For example, the upmix unit 309 shown in FIG. 3 may use a 5-1-5 configuration, a 5-2-5 configuration, a 7-2-7 configuration, a 7-5-7 configuration, etc. In the 5-1-5 configuration, a one-channel downmix signal is created by downmixing five channels into one channel, which can then be upmixed back to five channels. Other configurations can be built in the same way from various combinations of OTT and TTT blocks.

Referring to FIG. 4, an example 5-2-5 configuration of an upmix unit 400 is shown. In the 5-2-5 configuration, a two-channel downmix signal 401 is input to the upmix unit 400. In the example shown, a left channel (L) and a right channel (R) are provided as inputs to the upmix unit 400. In this embodiment, the upmix unit 400 includes one TTT block 402 and three OTT blocks 406, 407 and 408. The two-channel downmix signal 401 is supplied as input to the TTT block (TTT0) 402, which processes the downmix signal 401 and provides three channels 403, 404 and 405 as outputs. One or more spatial parameters (for example, CPC, CLD, ICC) used to process the downmix signal 401 can be provided as input to the TTT block 402, as described below. In some embodiments, a residual signal may optionally be provided as input to the TTT block 402. In that case, the CPC parameter can be defined as a prediction coefficient for creating three channels from two channels.

Channel 403, which is provided as an output of the TTT block 402, is the input to the OTT block 406, which creates two output channels using one or more spatial parameters. In the example shown, the two output channels represent the positions of the front left (FL) and back left (BL) speakers, for example in a surround sound environment. Channel 404 is provided as input to the OTT block 407, which creates two output channels using one or more spatial parameters. In the example shown, the two output channels represent the positions of the front right (FR) and back right (BR) speakers. Channel 405 is provided as input to the OTT block 408, which creates two output channels. In the example shown, the two output channels represent the position of the center (C) speaker and the low-frequency enhancement (LFE) channel. In this case, spatial information (for example, CLD, ICC) may be provided as input to each of the OTT blocks. In some embodiments, residual signals (Res1, Res2) may be provided as inputs to the OTT blocks 406 and 407. In such an embodiment, a residual signal need not be provided as input to the OTT block 408, which outputs the center channel and the LFE channel.
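The wiring of the 5-2-5 example of FIG. 4 can be summarized as a small data structure. The following is a minimal structural sketch only: it tracks channel labels and performs no signal processing. The block and channel labels come from the text above, while the dictionary layout and the helper name `upmix_channel_layout` are illustrative assumptions.

```python
# Structural sketch of the 5-2-5 tree of FIG. 4 (labels only, no audio processing).
# The dict layout and function name are assumptions made for illustration.
TREE_5_2_5 = {
    "TTT 402": {"inputs": ["L", "R"], "outputs": ["403", "404", "405"]},
    "OTT 406": {"inputs": ["403"], "outputs": ["FL", "BL"]},
    "OTT 407": {"inputs": ["404"], "outputs": ["FR", "BR"]},
    "OTT 408": {"inputs": ["405"], "outputs": ["C", "LFE"]},
}


def upmix_channel_layout(tree, downmix):
    """Return the output channel labels obtained by walking the tree."""
    channels = list(downmix)
    pending = dict(tree)
    progressed = True
    while pending and progressed:
        progressed = False
        for name, block in list(pending.items()):
            # A block can fire once all of its input channels are available.
            if all(ch in channels for ch in block["inputs"]):
                first = channels.index(block["inputs"][0])
                channels = [ch for ch in channels if ch not in block["inputs"]]
                channels[first:first] = block["outputs"]
                del pending[name]
                progressed = True
    return channels


print(upmix_channel_layout(TREE_5_2_5, ["L", "R"]))
```

Starting from the two downmix channels L and R, the walk ends with the six labels FL, BL, FR, BR, C and LFE, matching the outputs of the three OTT blocks described above.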

The configuration shown in FIG. 4 is one example of a configuration for a channel conversion module. Other configurations of the channel conversion module are possible, including various combinations of OTT and TTT blocks. Since each channel conversion module can operate in the frequency domain, the number of parametric ranges applicable to each channel conversion module can be defined. A parametric range means at least one frequency band to which one parameter applies. The number of parametric ranges is described with reference to FIG. 6B.

FIG. 5 is a diagram illustrating a method of configuring the bitstream of an audio signal according to one embodiment of the present invention. FIG. 5(a) shows a bitstream of an audio signal that includes only a spatial information signal, while FIGS. 5(b) and 5(c) show bitstreams of an audio signal that include both a downmix signal and a spatial information signal.

Referring to FIG. 5(a), the audio bitstream may include configuration information 501 and a frame 503. The frame 503 may be repeated in the bitstream and, in some embodiments, includes a single spatial frame 502 containing spatial audio information.

In some embodiments, the configuration information 501 includes information describing the total number of time intervals in one spatial frame 502, the total number of parametric ranges spanning the frequency range of the audio signal, the number of parametric ranges in an OTT block, the number of parametric ranges in a TTT block, and the number of parametric ranges in a residual signal. Other information may be included in the configuration information 501 as needed.

In some embodiments, the spatial frame 502 includes one or more spatial parameters (for example, CLD, ICC), a frame type, the number of parameter sets in one frame, and the time intervals to which the parameter sets can be applied. Other information may be included in the spatial frame 502 as needed. The meaning and use of the configuration information 501 and of the information contained in the spatial frame 502 are explained below with reference to FIGS. 6 through 10.

Referring to FIG. 5(b), the audio bitstream may include configuration information 504, a downmix signal 505 and a spatial frame 506. In this case, one frame 507 may include the downmix signal 505 and the spatial frame 506, and the frame 507 may be repeated in the bitstream.

Referring to FIG. 5(c), the audio bitstream may include a downmix signal 508, configuration information 509 and a spatial frame 510. In this case, one frame 511 may include the configuration information 509 and the spatial frame 510, and the frame 511 may be repeated in the bitstream. If the configuration information 509 is inserted in each frame 511, a playback device can start reproducing the audio signal from an arbitrary position.

Although FIG. 5(c) shows the configuration information 509 inserted into the bitstream in every frame 511, it should be apparent that the configuration information 509 can instead be inserted once per plurality of frames, at positions that repeat periodically or non-periodically.

FIGS. 6A and 6B are diagrams illustrating the relationships between a parameter set, a time interval and parametric ranges according to one embodiment of the present invention. A parameter set denotes one or more spatial parameters applied to one time interval. The spatial parameters may include spatial information such as CLD, ICC, CPC, etc. A time interval means an interval of the audio signal to which spatial parameters can be applied. One spatial frame may include one or more time intervals.

Referring to FIG. 6A, several parameter sets 1, ..., P can be used in a spatial frame, and each parameter set can include one or more data fields 1, ..., Q-1. A parameter set can be applied to the entire frequency range of the audio signal, and each spatial parameter in the parameter set can be applied to one or more portions of the frequency band. For example, if a parameter set includes 20 spatial parameters, the entire frequency band of the audio signal can be divided into 20 zones (hereinafter referred to as "parametric ranges"), and the 20 spatial parameters of the parameter set can be applied to the 20 parametric ranges. Parameters can be assigned to parametric ranges according to specific requirements. For example, spatial parameters can be applied densely to low-frequency parametric ranges and sparsely to high-frequency parametric ranges.

Referring to FIG. 6B, a time/frequency graph shows the relationship between parameter sets and time intervals. In the example shown, three parameter sets (parameter set 1, parameter set 2, parameter set 3) are applied to an ordered set of 12 time intervals in one spatial frame. In this case, the entire frequency range of the audio signal is divided into 9 parametric ranges. Thus, the horizontal axis indicates the number of time intervals, and the vertical axis indicates the number of parametric ranges. Each of the three parameter sets is applied to a specific time interval. For example, the first parameter set (parameter set 1) is applied to time interval #1, the second parameter set (parameter set 2) is applied to time interval #5, and the third parameter set (parameter set 3) is applied to time interval #9. The parameter sets can be extended to the other time intervals by interpolating and/or copying them. In general, the number of parameter sets may be less than or equal to the number of time intervals, and the number of parametric ranges may be less than or equal to the number of frequency bands of the audio signal. By encoding spatial information for only some parts of the time-frequency domain of the audio signal, rather than for the entire time-frequency domain, the amount of spatial information sent from the encoder to the decoder can be reduced. This data reduction is possible because, according to known principles of perceptual audio coding, sparse information in the time-frequency domain is often sufficient for human auditory perception.
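The FIG. 6B example (three parameter sets at time intervals #1, #5 and #9, 12 time intervals, 9 parametric ranges) can be sketched as follows. The text only says that the remaining time intervals are filled by "interpolating and/or copying"; the choice of linear interpolation between neighbouring sets and copying before the first and after the last set is an assumption, as are the function name `expand_parameter_sets` and the numeric parameter values.

```python
def expand_parameter_sets(num_slots, positions, param_sets):
    """Fill every time interval of a frame from a few transmitted parameter sets.

    positions  -- 1-based time-interval indices, ascending (e.g. [1, 5, 9])
    param_sets -- one list of per-range values for each position
    Assumption: linear interpolation between sets, copying outside them.
    """
    grid = []
    for slot in range(1, num_slots + 1):
        if slot <= positions[0]:
            grid.append(list(param_sets[0]))      # copy the first set
        elif slot >= positions[-1]:
            grid.append(list(param_sets[-1]))     # copy the last set
        else:
            # Index of the last transmitted set at or before this interval.
            k = max(i for i, p in enumerate(positions) if p <= slot)
            p0, p1 = positions[k], positions[k + 1]
            w = (slot - p0) / (p1 - p0)
            grid.append([(1 - w) * a + w * b
                         for a, b in zip(param_sets[k], param_sets[k + 1])])
    return grid


# Made-up values: 9 parametric ranges, sets sent for time intervals 1, 5 and 9.
sets = [[0.0] * 9, [4.0] * 9, [8.0] * 9]
grid = expand_parameter_sets(12, [1, 5, 9], sets)
print(len(grid), len(grid[0]))   # 12 time intervals x 9 parametric ranges
print(grid[2][0], grid[11][0])   # interval #3 is interpolated, #12 is copied
```

Only 3 × 9 values are transmitted here, while the decoder reconstructs a 12 × 9 grid, which is the data reduction the paragraph above describes.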

An important feature of the embodiments disclosed herein is the encoding and decoding of the positions of the time intervals to which parameter sets are applied, using either a fixed or a variable number of bits. The number of parametric ranges can likewise be represented with a fixed or a variable number of bits. The variable-bit coding scheme can also be applied to other information used in spatial audio coding, including, but not limited to, information related to the time, spatial and/or frequency domains (for example, the number of frequency subbands at the output of a filter bank).

FIG. 7A shows a syntax for representing the configuration information of a spatial information signal according to one embodiment of the present invention. The configuration information includes a plurality of fields 701 through 718, each of which can be assigned a number of bits.

The “bsSamplingFreqencyIndex” field 701 indicates the sampling frequency obtained from the sampling process of the audio signal. Four bits are allocated to the “bsSamplingFreqencyIndex” field 701 to represent the sampling frequency. If the value of the “bsSamplingFreqencyIndex” field 701 is 15, that is, binary 1111, a “bsSamplingFreqency” field 702 is added to represent the sampling frequency. In this case, 24 bits are allocated to the “bsSamplingFreqency” field 702.

The “bsFrameLength” field 703 indicates the total number of time intervals (hereinafter referred to as “numSlots”) in one spatial frame, where “numSlots” and the “bsFrameLength” field 703 may be related by numSlots = bsFrameLength + 1.
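Reading fields 701 through 703 can be sketched as below. This is a minimal illustration under stated assumptions, not the normative parser: the MSB-first bit order, the assumption that the three fields follow each other directly, and the names `BitReader` and `parse_config_head` are all hypothetical. The width of “bsFrameLength” is not stated in this passage, so it is passed in as a parameter, and the value 7 in the usage example is only a placeholder.

```python
class BitReader:
    """Reads unsigned integers MSB-first from a bytes object (assumed bit order)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_config_head(reader: BitReader, frame_length_bits: int) -> dict:
    """Parse fields 701-703 as described in the text.

    frame_length_bits is an assumption supplied by the caller; the passage
    does not give the width of bsFrameLength.
    """
    index = reader.read(4)                       # bsSamplingFreqencyIndex, 4 bits
    explicit = reader.read(24) if index == 15 else None  # bsSamplingFreqency
    bs_frame_length = reader.read(frame_length_bits)
    return {
        "bsSamplingFreqencyIndex": index,
        "bsSamplingFreqency": explicit,
        "bsFrameLength": bs_frame_length,
        "numSlots": bs_frame_length + 1,         # numSlots = bsFrameLength + 1
    }


# Usage: escape index 15, explicit 48000 Hz, bsFrameLength = 31 (7 bits assumed).
bits = "1111" + format(48000, "024b") + format(31, "07b")
payload = int(bits.ljust(40, "0"), 2).to_bytes(5, "big")
print(parse_config_head(BitReader(payload), frame_length_bits=7))
```

When the index is not 15, the 24-bit field is simply absent and the reader moves straight on to “bsFrameLength”; mapping such an index to an actual sampling frequency needs a table that this passage does not give, so the sketch returns the raw index.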

The “bsFreqRes” field 704 indicates the total number of parametric ranges spanning the entire frequency domain of the audio signal. The “bsFreqRes” field 704 is explained below with reference to FIG. 7B.

The “bsTreeConfig” field 705 indicates information on a tree configuration that includes a plurality of channel conversion modules, such as those described with reference to FIG. 4. The tree configuration information includes the type of each channel conversion module, the number of channel conversion modules, the type of spatial information used in each channel conversion module, the number of input/output channels of the audio signal, etc.

The tree configuration can be one of a 5-1-5 configuration, a 5-2-5 configuration, a 7-2-7 configuration, a 7-5-7 configuration and the like, depending on the type of channel conversion module or the number of channels. FIG. 4 shows the 5-2-5 tree configuration.

The “bsQuantMode” field 706 indicates the quantization mode of the spatial information.

The “bsOneIcc” field 707 indicates whether a single ICC parameter subset is used for all OTT blocks. Here, a parameter subset means a set of parameters applied to a specific time interval and a specific channel conversion module.

The “bsArbitraryDownmix” field 708 indicates the presence or absence of an arbitrary downmix gain. The “bsFixedGainSur” field 709 indicates the gain applied to the surround channels, for example LS (left surround) and RS (right surround).

The “bsFixedGainLF” field 710 indicates the gain applied to the LFE channel.

The “bsFixedGainDM” field 711 indicates the gain applied to the downmix signal.

The “bsMatrixMode” field 712 indicates whether a matrix-compatible stereo downmix signal is created in the encoder.

The “bsTempShapeConfig” field 713 indicates the operating mode of temporal shaping (for example, TES (temporal envelope shaping) and/or TP (temporal shaping)) in the decoder.

The “bsDecorrConfig” field 714 indicates the operating mode of the decoder's decorrelator.

The “bs3DaudioMode” field 715 indicates whether the downmix signal is encoded as a 3D (three-dimensional) signal and whether inverse HRTF (head-related transfer function) processing is used.

After the information of each field has been determined (in the encoder) or extracted (in the decoder), the information on the number of parametric ranges applied to the channel conversion modules is determined or extracted. First, the number of parametric ranges applied to an OTT block is determined/extracted (716), and then the number of parametric ranges applied to a TTT block is determined/extracted (717). The number of parametric ranges for the OTT block and/or the TTT block is described in detail with reference to FIGS. 8A through 9B.

If an extension frame exists, the “spatialExtensionConfig” block 718 includes configuration information for the extension frame. The information included in the “spatialExtensionConfig” block 718 is described below with reference to Figures 10A through 10D.

FIG. 7B is a table of the number of parametric ranges of a spatial information signal according to one embodiment of the present invention. “numBands” indicates the number of parametric ranges for the entire frequency domain of the audio signal, and “bsFreqRes” indicates index information for the number of parametric ranges. For example, the entire frequency domain of an audio signal can be divided into any desired number of parametric ranges (e.g., 4, 5, 7, 10, 14, 20, 28, etc.).

In some embodiments, one parameter can be applied to each parametric range. For example, if “numBands” is 28, the entire frequency domain of the audio signal is divided into 28 parametric ranges, and each of the 28 parameters can be applied to the corresponding one of the 28 parametric ranges. In another example, if “numBands” is 4, the entire frequency domain of the audio signal is divided into 4 parametric ranges, and each of the 4 parameters can be applied to the corresponding one of the 4 parametric ranges. In FIG. 7B, the term “Reserved” means that the number of parametric ranges for the entire frequency domain of the audio signal is not defined.

It should be noted that human hearing is not sensitive to the number of parametric ranges used in the coding scheme. Thus, using a small number of parametric ranges can give the listener the same spatial audio effect as if a larger number of parametric ranges were used.

Unlike “numBands”, “numSlots”, which is represented by the “bsFrameLength” field 703 shown in FIG. 7A, can represent all values. However, the values of “numSlots” may be limited if the number of samples in one spatial frame is exactly divisible by “numSlots”. Thus, if the maximum value of “numSlots” to be represented is 'b', each value of the “bsFrameLength” field 703 can be represented by ceil{log₂(b)} bits. Here, 'ceil(x)' denotes the smallest integer greater than or equal to 'x'. For example, if one spatial frame includes 72 time slots, ceil{log₂(72)} = 7 bits can be allocated to the “bsFrameLength” field 703, and the number of parametric ranges applied to the channel conversion module can be set to a value within “numBands”.
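The bit count ceil{log₂(b)} can be computed without floating-point arithmetic. The following is a minimal Python sketch, not part of the described syntax; the function name `bits_for_count` is hypothetical and chosen only for illustration.

```python
def bits_for_count(b: int) -> int:
    """Return ceil(log2(b)) for b >= 1 using integer arithmetic only."""
    if b < 1:
        raise ValueError("b must be at least 1")
    # (b - 1).bit_length() equals ceil(log2(b)) for every positive integer b.
    return (b - 1).bit_length()


# The example from the text: 72 time slots per spatial frame.
frame_length_bits = bits_for_count(72)
print(frame_length_bits)  # 7
```

Using `int.bit_length` avoids rounding problems that a floating-point logarithm could in principle introduce for very large values.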

FIG. 8A shows a syntax for representing the number of parametric ranges applied to an OTT block with a fixed number of bits according to one embodiment of the present invention. Referring to Figures 7A and 8A, 'i' takes values from zero to numOttBoxes-1, where 'numOttBoxes' is the total number of OTT blocks. That is, the value of 'i' identifies each OTT block, and the number of parametric ranges applied to each OTT block is represented according to the value of 'i'. If an OTT block has an LFE channel mode, the number of parametric ranges applied to the LFE channel of the OTT block (hereinafter “bsOttBands”) can be represented with a fixed number of bits. In the example shown in FIG. 8A, 5 bits are allocated to the “bsOttBands” field 801. If an OTT block does not have an LFE channel mode, the total number of parametric ranges (numBands) can be applied to the channel of the OTT block.
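The per-block loop described for FIG. 8A can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the syntax of the figure itself: the `BitReader` class, the MSB-first bit order, the `is_lfe_box` flag list and the function name `read_ott_bands` are all hypothetical; only the 5-bit field width and the fallback to numBands come from the text.

```python
class BitReader:
    """Hypothetical MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_ott_bands(reader, is_lfe_box, num_bands, field_bits=5):
    """Return the number of parametric ranges for each OTT block.

    is_lfe_box[i] is True when OTT block i has the LFE channel mode
    (how that flag is obtained is outside this sketch).
    """
    bands = []
    for is_lfe in is_lfe_box:
        if is_lfe:
            # LFE block: bsOttBands is read from the bit stream (fixed width).
            bands.append(reader.read(field_bits))
        else:
            # Non-LFE block: all numBands parametric ranges apply.
            bands.append(num_bands)
    return bands


reader = BitReader(bytes([0b00010000]))  # first five bits are 00010
print(read_ott_bands(reader, [False, False, True], num_bands=28))  # [28, 28, 2]
```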

FIG. 8B shows a syntax for representing the number of parametric ranges applied to an OTT block with a variable number of bits according to one embodiment of the present invention. FIG. 8B is similar to FIG. 8A, except that the “bsOttBands” field 802 shown in FIG. 8B is represented by a variable number of bits. In particular, the “bsOttBands” field 802, whose value is less than or equal to “numBands”, can be represented by a variable number of bits that depends on “numBands”.

If “numBands” is greater than or equal to 2^(n-1) and less than 2^(n), the “bsOttBands” field 802 can be represented by n bits.

For example: (a) if “numBands” is 40, the “bsOttBands” field 802 is represented by 6 bits; (b) if “numBands” is 28 or 20, the “bsOttBands” field 802 is represented by 5 bits; (c) if “numBands” is 14 or 10, the “bsOttBands” field 802 is represented by 4 bits; and (d) if “numBands” is 7, 5 or 4, the “bsOttBands” field 802 is represented by 3 bits.

Alternatively, if “numBands” is greater than 2^(n-1) and less than or equal to 2^(n), the “bsOttBands” field 802 can be represented by n bits.

For example: (a) if “numBands” is 40, the “bsOttBands” field 802 is represented by 6 bits; (b) if “numBands” is 28 or 20, the “bsOttBands” field 802 is represented by 5 bits; (c) if “numBands” is 14 or 10, the “bsOttBands” field 802 is represented by 4 bits; (d) if “numBands” is 7 or 5, the “bsOttBands” field 802 is represented by 3 bits; and (e) if “numBands” is 4, the “bsOttBands” field 802 is represented by 2 bits.
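The two range rules above differ only when “numBands” is an exact power of two (here, 4). A minimal Python sketch that reproduces both sets of examples is given below; the function names `bits_rule_a` and `bits_rule_b` are hypothetical labels, not names used in the syntax.

```python
def bits_rule_a(num_bands: int) -> int:
    """n such that 2^(n-1) <= num_bands < 2^n."""
    return num_bands.bit_length()


def bits_rule_b(num_bands: int) -> int:
    """n such that 2^(n-1) < num_bands <= 2^n."""
    return (num_bands - 1).bit_length()


# numBands values taken from the examples in the text.
for num_bands in (40, 28, 20, 14, 10, 7, 5, 4):
    print(num_bands, bits_rule_a(num_bands), bits_rule_b(num_bands))
```

For numBands = 4 the first rule gives 3 bits and the second gives 2 bits; for every other listed value both rules agree.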

The “bsOttBands” field 802 can be represented by a variable number of bits through a function that rounds up to the nearest integer (hereinafter the “smallest integer function”, i.e., the ceiling function), taking “numBands” as its variable.

In particular: (i) if 0 < bsOttBands ≤ numBands or 0 ≤ bsOttBands < numBands, the “bsOttBands” field 802 is represented by ceil(log₂(numBands)) bits; or (ii) if 0 ≤ bsOttBands ≤ numBands, the “bsOttBands” field 802 can be represented by ceil(log₂(numBands+1)) bits.
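Both cases follow from counting how many distinct values the field can take: numBands values in case (i) and numBands+1 values in case (ii). The Python sketch below illustrates this; the helper name `field_bits` and its parameters are hypothetical.

```python
import math


def field_bits(lowest: int, highest: int) -> int:
    """Bits needed to represent every integer in [lowest, highest]."""
    count = highest - lowest + 1  # number of distinct values
    return math.ceil(math.log2(count))


num_bands = 4
print(field_bits(1, num_bands))      # case (i), 0 < v <= numBands: 2 bits
print(field_bits(0, num_bands - 1))  # case (i), 0 <= v < numBands: 2 bits
print(field_bits(0, num_bands))      # case (ii), 0 <= v <= numBands: 3 bits
```

The same helper applies unchanged when an arbitrarily chosen bound is used in place of “numBands”.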

If a value less than or equal to “numBands” (hereinafter “numberBands”) is chosen arbitrarily, the “bsOttBands” field 802 can be represented by a variable number of bits through the smallest integer function, taking “numberBands” as its variable.

In particular: (i) if 0 < bsOttBands ≤ numberBands or 0 ≤ bsOttBands < numberBands, the “bsOttBands” field 802 is represented by ceil(log₂(numberBands)) bits; or (ii) if 0 ≤ bsOttBands ≤ numberBands, the “bsOttBands” field 802 can be represented by ceil(log₂(numberBands+1)) bits.

If more than one OTT block is used, the combination of the “bsOttBands” values can be expressed by Formula 1 below.

[Formula 1]

where bsOttBands_i indicates the i-th “bsOttBands”. For example, assume that there are three OTT blocks and three values (N=3) for the “bsOttBands” field 802. In this example, the three values of the “bsOttBands” field 802 (hereinafter a1, a2 and a3, respectively), applied to the three corresponding OTT blocks, can be represented by 2 bits each. Hence, a total of 6 bits is needed to express the values a1, a2 and a3. However, if the values a1, a2 and a3 are represented as a group, 27 (= 3 × 3 × 3) cases can occur, which can be represented by 5 bits, saving one bit. If “numBands” is 3 and the group value represented by the 5 bits is 15, the group value can be written as 15 = 1 × (3^2) + 2 × (3^1) + 0 × (3^0). Hence, by applying Formula 1 inversely, the decoder can determine from the group value 15 that the three values a1, a2 and a3 of the “bsOttBands” field 802 are 1, 2 and 0, respectively.
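The grouping in this example amounts to treating the per-block values as digits of a base-“numBands” number. The Python sketch below reproduces the worked example (values 1, 2, 0 and group value 15); it follows the digit order of that example, and the function names `pack_group` and `unpack_group` are hypothetical.

```python
def pack_group(values, base):
    """Combine per-block values into one group value (first value most significant)."""
    group = 0
    for value in values:
        if not 0 <= value < base:
            raise ValueError("each value must satisfy 0 <= value < base")
        group = group * base + value
    return group


def unpack_group(group, base, count):
    """Recover `count` per-block values from a group value."""
    values = []
    for _ in range(count):
        group, digit = divmod(group, base)
        values.append(digit)
    values.reverse()  # digits were extracted least significant first
    return values


group_value = pack_group([1, 2, 0], base=3)
print(group_value)                      # 15
print(unpack_group(group_value, 3, 3))  # [1, 2, 0]
print((3 ** 3 - 1).bit_length())        # 5 bits cover all 27 group values
```

Coding the three values separately would take 2 bits each, or 6 bits in total, so the grouped form saves one bit in this case.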

If there are multiple OTT blocks, the combination of the “bsOttBands” values can be expressed by one of Formulas 2 through 4 (defined below) using “numberBands”. Since the representation of “bsOttBands” using “numberBands” is similar to the representation using “numBands” in Formula 1, a detailed explanation of the formulas below is omitted.

[Formula 2]

[Formula 3]

[Formula 4]

FIG. 9A shows a syntax for representing the number of parametric ranges applied to a TTT block with a fixed number of bits according to one embodiment of the present invention. Referring to Figures 7A and 9A, 'i' takes values from zero to numTttboxes-1, where 'numTttboxes' is the total number of TTT blocks. That is, the value of 'i' identifies each TTT block. The number of parametric ranges applied to each TTT block is represented according to the value of 'i'. In some embodiments, the TTT block can be divided into a low-frequency range and a high-frequency range, and different processing can be used for the low-frequency and high-frequency ranges. Other divisions are possible.

The “bsTttDualMode” field 901 indicates whether the given TTT block operates in different modes for the low-frequency range and the high-frequency range (hereinafter “dual mode”). For example, if the value of the “bsTttDualMode” field 901 is zero, one mode is used for the entire range without distinguishing between the low-frequency range and the high-frequency range. If the value of the “bsTttDualMode” field 901 is 1, different modes can be used for the low-frequency range and the high-frequency range.

The “bsTttModeLow” field 902 indicates the operating mode of the given TTT block, which can have various operating modes. For example, the TTT block can operate in a prediction mode, which uses, for example, CPC and ICC parameters, in an energy-based mode, which uses, for example, CLD parameters, and so on. If the TTT block has the dual mode, additional information may be needed for the high-frequency range.

The “bsTttModeHigh” field 903 indicates the operating mode of the high-frequency range in the case where the TTT block has the dual mode.

The “bsTttBandsLow” field 904 indicates the number of parametric ranges applied to the TTT block.

The “bsTttBandsHigh” field 905 has the value “numBands”.

If the TTT block has the dual mode, the low-frequency range covers the parametric ranges with indices greater than or equal to zero and less than “bsTttBandsLow”, while the high-frequency range covers those with indices greater than or equal to “bsTttBandsLow” and less than “bsTttBandsHigh”.

If the TTT block does not have the dual mode, the number of parametric ranges applied to the TTT block can be greater than or equal to zero and less than “numBands” (907).
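The split between the low-frequency and high-frequency ranges can be sketched as half-open index intervals. The Python sketch below is illustrative only; the function name `ttt_band_ranges` and the dictionary layout are hypothetical, and it assumes that “bsTttBandsHigh” equals “numBands”, as stated above.

```python
def ttt_band_ranges(dual_mode: bool, bands_low: int, num_bands: int):
    """Return the parametric-range indices handled by each TTT mode."""
    bands_high = num_bands  # bsTttBandsHigh has the value numBands
    if not dual_mode:
        # Single mode: one range covering indices 0 .. numBands-1.
        return {"all": range(0, num_bands)}
    if not 0 <= bands_low <= bands_high:
        raise ValueError("bsTttBandsLow must lie between 0 and bsTttBandsHigh")
    return {
        "low": range(0, bands_low),            # 0 <= index < bsTttBandsLow
        "high": range(bands_low, bands_high),  # bsTttBandsLow <= index < bsTttBandsHigh
    }


ranges = ttt_band_ranges(dual_mode=True, bands_low=10, num_bands=28)
print(list(ranges["low"]))   # indices 0 .. 9
print(list(ranges["high"]))  # indices 10 .. 27
```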

The “bsTttBandsLow” field 904 can be represented by a fixed number of bits. For example, as shown in FIG. 9A, 5 bits can be allocated to represent the “bsTttBandsLow” field 904.

FIG. 9B shows a syntax for representing the number of parametric ranges applied to a TTT block with a variable number of bits according to one embodiment of the present invention. FIG. 9B is similar to FIG. 9A, except that the “bsTttBandsLow” field 907 in FIG. 9B is represented by a variable number of bits, whereas the “bsTttBandsLow” field 904 in FIG. 9A is represented by a fixed number of bits. In particular, since the “bsTttBandsLow” field 907 has a value less than or equal to “numBands”, the “bsTttBandsLow” field 907 can be represented by a variable number of bits that depends on “numBands”.

In particular, if “numBands” is greater than or equal to 2^(n-1) and less than 2^(n), the “bsTttBandsLow” field 907 can be represented by n bits.

For example: (i) if “numBands” is 40, the “bsTttBandsLow” field 907 is represented by 6 bits; (ii) if “numBands” is 28 or 20, the “bsTttBandsLow” field 907 is represented by 5 bits; (iii) if “numBands” is 14 or 10, the “bsTttBandsLow” field 907 is represented by 4 bits; and (iv) if “numBands” is 7, 5 or 4, the “bsTttBandsLow” field 907 is represented by 3 bits.

Alternatively, if “numBands” is greater than 2^(n-1) and less than or equal to 2^(n), the “bsTttBandsLow” field 907 can be represented by n bits.

For example: (i) if “numBands” is 40, the “bsTttBandsLow” field 907 is represented by 6 bits; (ii) if “numBands” is 28 or 20, the “bsTttBandsLow” field 907 is represented by 5 bits; (iii) if “numBands” is 14 or 10, the “bsTttBandsLow” field 907 is represented by 4 bits; (iv) if “numBands” is 7 or 5, the “bsTttBandsLow” field 907 is represented by 3 bits; and (v) if “numBands” is 4, the “bsTttBandsLow” field 907 is represented by 2 bits.

The “bsTttBandsLow” field 907 can be represented by a number of bits determined by the smallest integer function, taking “numBands” as its variable.

For example: (i) if 0 < bsTttBandsLow ≤ numBands or 0 ≤ bsTttBandsLow < numBands, the “bsTttBandsLow” field 907 is represented by ceil(log₂(numBands)) bits; or (ii) if 0 ≤ bsTttBandsLow ≤ numBands, the “bsTttBandsLow” field 907 can be represented by ceil(log₂(numBands+1)) bits.

If a value less than or equal to “numBands” (i.e., “numberBands”) is chosen arbitrarily, the “bsTttBandsLow” field 907 can be represented by a variable number of bits using “numberBands”.

In particular: (i) if 0 < bsTttBandsLow ≤ numberBands or 0 ≤ bsTttBandsLow < numberBands, the “bsTttBandsLow” field 907 is represented by ceil(log₂(numberBands)) bits; or (ii) if 0 ≤ bsTttBandsLow ≤ numberBands, the “bsTttBandsLow” field 907 can be represented by ceil(log₂(numberBands+1)) bits.

If there are multiple TTT blocks, the combination of the “bsTttBandsLow” values can be expressed by Formula 5, defined below.

[Formula 5]

Here, bsTttBandsLow_i indicates the i-th “bsTttBandsLow”. Since Formula 5 is identical to Formula 1, a detailed explanation of Formula 5 is omitted from the following description.

If there are multiple TTT blocks, the combination of the “bsTttBandsLow” values can be expressed by one of Formulas 6 through 8 using “numberBands”. Since Formulas 6 through 8 are identical to Formulas 2 through 4, a detailed explanation of Formulas 6 through 8 is omitted from the following description.

[Formula 6]

[Formula 7]

[Formula 8]

The number of parametric ranges applied to the channel conversion module (e.g., an OTT block and/or a TTT block) can be represented as a divided value of “numBands”. In this case, the divided value is half of “numBands” or the result of dividing “numBands” by a specific number.

Once the number of parametric ranges applied to the OTT and/or TTT block is determined, the parameter sets that can be applied to each OTT block and/or each TTT block within that number of parametric ranges can be determined. Each parameter set can be applied to each OTT block and/or each TTT block with the time slot as the unit of time. That is, one parameter set can be applied to one time slot.

As mentioned in the foregoing description, one spatial frame can include a plurality of time slots. If the spatial frame is of a fixed frame type, a parameter set can be applied to a plurality of time slots at equal intervals. If the frame is of a variable frame type, position information is needed for the time slot to which the parameter set is applied. This is explained in detail below with reference to Figures 13A through 13C.

FIG. 10A shows a syntax for spatial extension configuration information for a spatial extension frame according to one embodiment of the present invention. The spatial extension configuration information can include a “bsSacExtType” field 1001, a “bsSacExtLen” field 1002, a “bsSacExtLenAdd” field 1003, a “bsSacExtLenAddAdd” field 1004 and a “bsFillBits” field 1007. Other fields are possible.

The “bsSacExtType” field 1001 indicates the data type of the spatial extension frame. For example, the spatial extension frame can be filled with zeros, residual signal data, arbitrary downmix residual signal data or arbitrary tree data.

The “bsSacExtLen” field 1002 indicates the number of bytes of the spatial extension configuration information.

The “bsSacExtLenAdd” field 1003 indicates an additional number of bytes of the spatial extension configuration information if the number of bytes of the spatial extension configuration information is greater than or equal to, for example, 15.

The “bsSacExtLenAddAdd” field 1004 indicates an additional number of bytes of the spatial extension configuration information if the number of bytes of the spatial extension configuration information is greater than or equal to, for example, 270.
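The three length fields form an escape chain: each further field is read only when the length accumulated so far has reached its threshold. The Python sketch below illustrates this under explicit assumptions that are not stated in the text: field widths of 4, 8 and 16 bits (chosen so that the thresholds 15 and 270 = 15 + 255 are the saturated values of the preceding fields), and a caller-supplied `read_bits(n)` callable that returns the next n-bit unsigned value. The function name `read_ext_length` is hypothetical.

```python
def read_ext_length(read_bits) -> int:
    """Return the byte length of the spatial extension configuration information."""
    length = read_bits(4)             # bsSacExtLen (assumed 4 bits, max 15)
    if length >= 15:
        length += read_bits(8)        # bsSacExtLenAdd (assumed 8 bits, max 255)
        if length >= 270:
            length += read_bits(16)   # bsSacExtLenAddAdd (assumed 16 bits)
    return length


# Usage with pre-parsed field values standing in for a real bit stream.
fields = iter([15, 255, 30])
print(read_ext_length(lambda n: next(fields)))  # 300
```

With these widths a short extension costs only 4 bits of length information, while longer extensions pay for the extra fields only when they are needed.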

After the respective fields have been determined in the encoder or extracted in the decoder, the configuration information for the data type included in the spatial extension frame is determined (1005).

As mentioned above, the spatial extension frame may contain residual signal data, arbitrary downmix residual signal data, tree configuration data, or the like.

Next, the number of unused bits is calculated (1006) from the length of the spatial extension configuration information.

The “bsFillBits” field 1007 indicates the number of data bits that can be ignored in order to fill the unused bits.
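One way to picture steps 1006 and 1007 is as a subtraction: the signalled length gives the space reserved for the configuration data, and whatever the type-specific data does not consume is left as fill bits. The sketch below assumes that the signalled length counts whole bytes reserved for that data; the function and parameter names are hypothetical.

```python
def fill_bit_count(ext_len_bytes, bits_read):
    """Unused bits left after the type-specific configuration data.

    ext_len_bytes: signalled length in bytes (assumed to cover the data).
    bits_read:     bits actually consumed by the configuration data.
    """
    total_bits = 8 * ext_len_bytes
    if bits_read > total_bits:
        raise ValueError("configuration data exceeds the signalled length")
    return total_bits - bits_read
```

For example, if 2 bytes are signalled and the configuration data occupies 11 bits, 5 bits remain to be filled and skipped by the decoder.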

FIGS. 10B and 10C show syntaxes for the spatial extension configuration information for a residual signal in the case where the residual signal is included in the spatial extension frame, according to one embodiment of the present invention.

Referring to FIG. 10B, a “bsResidualSamplingFrequencyIndex” field 1008 indicates the sampling frequency of the residual signal.

The “bsResidualFramesPerSpetialFrame” field 1009 indicates the number of residual frames per spatial frame. For example, one spatial frame may contain 1, 2, 3 or 4 residual frames.

The “ResidualConfig” block 1010 indicates the number of parameter bands for the residual signal applied to each OTT and/or TTT block.

Referring to FIG. 10C, a “bsResidualPresent” field 1011 indicates whether a residual signal is applied to each OTT and/or TTT block.

The “bsResidualBands” field 1012 indicates the number of parameter bands of the residual signal present in each OTT and/or TTT block, if a residual signal exists in that block. The number of parameter bands of the residual signal can be represented by a fixed number of bits or by a variable number of bits. When it is represented by a fixed number of bits, the number of parameter bands of the residual signal may take a value less than or equal to the total number of parameter bands of the audio signal. Accordingly, the number of bits needed to represent all the parameter bands can be allocated (for example, 5 bits in FIG. 10C).

FIG. 10D shows a syntax for representing the number of parameter bands of the residual signal with a variable number of bits according to one embodiment of the present invention. The “bsResidualBands” field 1014 can be represented by a variable number of bits using “numBands”. If numBands is greater than or equal to 2^(n-1) and less than 2^(n), the “bsResidualBands” field 1014 can be represented by n bits.

For example: (i) if “numBands” is 40, the “bsResidualBands” field 1014 is represented by 6 bits; (ii) if “numBands” is 28 or 20, the field is represented by 5 bits; (iii) if “numBands” is 14 or 10, the field is represented by 4 bits; and (iv) if “numBands” is 7, 5 or 4, the field is represented by 3 bits.

Alternatively, if numBands is greater than 2^(n-1) and less than or equal to 2^(n), the number of parameter bands of the residual signal can be represented by n bits.

For example: (i) if “numBands” is 40, the “bsResidualBands” field 1014 is represented by 6 bits; (ii) if “numBands” is 28 or 20, the field is represented by 5 bits; (iii) if “numBands” is 14 or 10, the field is represented by 4 bits; (iv) if “numBands” is 7 or 5, the field is represented by 3 bits; and (v) if “numBands” is 4, the field is represented by 2 bits.
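The two rules differ only in which end of the interval is inclusive, which matters only when numBands is an exact power of two. The following sketch computes both bit widths with integer arithmetic; the function names are hypothetical and chosen for illustration.

```python
def bits_lower_inclusive(num_bands):
    """Return n such that 2**(n-1) <= num_bands < 2**n."""
    if num_bands < 1:
        raise ValueError("num_bands must be positive")
    return num_bands.bit_length()


def bits_upper_inclusive(num_bands):
    """Return n such that 2**(n-1) < num_bands <= 2**n."""
    if num_bands < 1:
        raise ValueError("num_bands must be positive")
    return (num_bands - 1).bit_length()


for num_bands in (40, 28, 20, 14, 10, 7, 5, 4):
    print(num_bands,
          bits_lower_inclusive(num_bands),
          bits_upper_inclusive(num_bands))
```

For the listed values the two functions agree everywhere except at numBands = 4, where the first rule gives 3 bits and the second gives 2 bits, matching the two example lists.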

In addition, the “bsResidualBands” field 1014 can be represented by a number of bits determined by a ceiling function (rounding up to the next integer) that takes “numBands” as its variable.

Specifically: i) when 0 < bsResidualBands ≤ numBands or 0 ≤ bsResidualBands < numBands, the “bsResidualBands” field 1014 is represented by ceil(log₂(numBands)) bits; or ii) when 0 ≤ bsResidualBands ≤ numBands, the “bsResidualBands” field 1014 can be represented by ceil(log₂(numBands+1)) bits.
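As a worked illustration of the two cases (the values 28 and 4 are taken from the examples above): in case i) the field distinguishes numBands different values, while in case ii) it distinguishes numBands + 1 values, so the two bit counts differ only when numBands is an exact power of two.

```latex
\begin{align*}
\text{numBands} = 28:&\quad \lceil \log_2 28 \rceil = 5, \qquad \lceil \log_2 (28+1) \rceil = 5 \\
\text{numBands} = 4:&\quad \lceil \log_2 4 \rceil = 2, \qquad \lceil \log_2 (4+1) \rceil = 3
\end{align*}
```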

In some embodiments, the “bsResidualBands” field 1014 can be represented using a value (numberBands) that is less than or equal to numBands.

Specifically: i) when 0 < bsResidualBands ≤ numberBands or 0 ≤ bsResidualBands < numberBands, the “bsResidualBands” field 1014 is represented by ceil(log₂(numberBands)) bits; or ii) when 0 ≤ bsResidualBands ≤ numberBands, the “bsResidualBands” field 1014 can be represented by ceil(log₂(numberBands+1)) bits.

If there are a plurality (N) of residual signals, the combination of the “bsResidualBands” values can be expressed as Formula 9 below.

[Formula 9]

Here, bsResidualBands_i denotes the i-th “bsResidualBands”. Since Formula 9 is identical to Formula 1, a detailed explanation of Formula 9 is omitted.

If there are a plurality of residual signals, the combination of the “bsResidualBands” values can be represented by one of Formulas 10 to 12 using “numberBands”. Since the representation of “bsResidualBands” using “numberBands” is identical to the representation in Formulas 2 to 4, a detailed explanation is omitted.

[Formula 10]

[Formula 11]

[Formula 12]

The number of parameter bands of the residual signal can also be represented as a divided value of “numBands”. In this case, the divided value may be half of “numBands” or the result of dividing “numBands” by a specific value.

The residual signal can be included in a bitstream of an audio signal together with a downmix signal and a spatial information signal, and the bitstream can be transmitted to a decoder. The decoder can extract the downmix signal, the spatial information signal and the residual signal from the bitstream.

The downmix signal is then upmixed using the spatial information, and the residual signal is applied to the downmix signal during the upmixing. Specifically, the downmix signal is upmixed in a plurality of channel conversion modules using the spatial information, and the residual signal is supplied to a channel conversion module while this is done. As mentioned above, a channel conversion module has a number of parameter bands, and a parameter set is supplied to the channel conversion module for each time slot. When the residual signal is supplied to the channel conversion module, the residual signal may be required to update the inter-channel correlation information of the audio signal to which the residual signal is applied. The updated inter-channel correlation information is then used in the upmixing process.

FIG. 11A is a block diagram of a decoder for non-guided coding according to one embodiment of the present invention. Non-guided coding means that spatial information is not included in the bitstream of the audio signal.

In some embodiments, the decoder includes an analysis filter bank 1102, an analysis unit 1104, a spatial synthesis unit 1106 and a synthesis filter bank 1108. Although FIG. 11A shows a stereo downmix signal, other types of downmix signals can be used.

In operation, the decoder receives a downmix signal 1101, and the analysis filter bank 1102 converts the received downmix signal 1101 into a frequency-domain signal 1103. The analysis unit 1104 generates spatial information from the converted downmix signal 1103. The analysis unit 1104 performs its processing per time slot, and the spatial information 1105 can be generated for a plurality of time slots. In this case, a time slot corresponds to one time interval.

The spatial information can be generated in two steps. First, a downmix parameter is generated from the downmix signal. Second, the downmix parameter is converted into spatial information, such as a spatial parameter. In some embodiments, the downmix parameter can be generated through a matrix calculation on the downmix signal.

The spatial synthesis unit 1106 generates a multi-channel audio signal 1107 by synthesizing the generated spatial information 1105 with the downmix signal 1103. The generated multi-channel audio signal 1107 passes through the synthesis filter bank 1108 to be converted into a time-domain audio signal 1109.

The spatial information can be generated at predetermined time slot positions. The distances between these positions may be equal (that is, the positions are equidistant). For example, the spatial information can be generated once every 4 time slots. The spatial information can also be generated at variable time slot positions. In this case, the position information of the time slot at which the spatial information is generated can be extracted from the bitstream. The position information can be represented by a variable number of bits. The position information can be represented as an absolute value and as a difference value relative to the position information of the previous time slot.
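The absolute-plus-difference representation can be illustrated with a short sketch. It assumes one particular reading of the paragraph above, namely that the first position is sent as an absolute value and each later position as its distance from the previous one, and that positions are strictly increasing. The function names are hypothetical.

```python
def to_differences(positions):
    """Code the first position absolutely and the rest as differences."""
    pairs = list(zip(positions, positions[1:]))
    if any(later <= earlier for earlier, later in pairs):
        raise ValueError("positions must be strictly increasing")
    return list(positions[:1]) + [later - earlier for earlier, later in pairs]


def from_differences(coded):
    """Recover the absolute positions from the coded sequence."""
    positions = []
    current = 0
    for index, value in enumerate(coded):
        current = value if index == 0 else current + value
        positions.append(current)
    return positions
```

With positions 3, 7, 12 and 15, the coded sequence is 3, 4, 5, 3. The differences are smaller than the absolute positions, which is what makes a shorter, variable-length representation possible.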

When non-guided coding is used, the number of parameter bands (hereinafter “bsNumguidedBlindBands”) for each channel of the audio signal can be represented by a fixed number of bits. Alternatively, “bsNumguidedBlindBands” can be represented by a variable number of bits using “numBands”. For example, if “numBands” is greater than or equal to 2^(n-1) and less than 2^(n), “bsNumguidedBlindBands” can be represented by n bits.

Specifically: (a) if “numBands” is 40, “bsNumguidedBlindBands” is represented by 6 bits; (b) if “numBands” is 28 or 20, “bsNumguidedBlindBands” is represented by 5 bits; (c) if “numBands” is 14 or 10, “bsNumguidedBlindBands” is represented by 4 bits; and (d) if “numBands” is 7, 5 or 4, “bsNumguidedBlindBands” is represented by 3 bits.

Alternatively, if “numBands” is greater than 2^(n-1) and less than or equal to 2^(n), “bsNumguidedBlindBands” can be represented by n bits.

For example: (a) if “numBands” is 40, “bsNumguidedBlindBands” is represented by 6 bits; (b) if “numBands” is 28 or 20, “bsNumguidedBlindBands” is represented by 5 bits; (c) if “numBands” is 14 or 10, “bsNumguidedBlindBands” is represented by 4 bits; (d) if “numBands” is 7 or 5, “bsNumguidedBlindBands” is represented by 3 bits; and (e) if “numBands” is 4, “bsNumguidedBlindBands” is represented by 2 bits.

In addition, “bsNumguidedBlindBands” can be represented by a variable number of bits using a ceiling function that takes “numBands” as its variable.

For example: i) if 0 < bsNumguidedBlindBands ≤ numBands or 0 ≤ bsNumguidedBlindBands < numBands, “bsNumguidedBlindBands” is represented by ceil(log₂(numBands)) bits; or ii) if 0 ≤ bsNumguidedBlindBands ≤ numBands, “bsNumguidedBlindBands” can be represented by ceil(log₂(numBands+1)) bits.

If a value less than or equal to “numBands”, namely “numberBands”, is chosen arbitrarily, “bsNumguidedBlindBands” can be represented as follows.

Specifically: i) if 0 < bsNumguidedBlindBands ≤ numberBands or 0 ≤ bsNumguidedBlindBands < numberBands, “bsNumguidedBlindBands” is represented by ceil(log₂(numberBands)) bits; or ii) if 0 ≤ bsNumguidedBlindBands ≤ numberBands, “bsNumguidedBlindBands” can be represented by ceil(log₂(numberBands+1)) bits.

If there are N channels, the combination of the “bsNumguidedBlindBands” values can be expressed as Formula 13.

[Formula 13]

Here, bsNumguidedBlindBands_i denotes the i-th “bsNumguidedBlindBands”. Since Formula 13 is identical to Formula 1, a detailed explanation of Formula 13 is omitted. If there are a plurality of channels, “bsNumguidedBlindBands” can be represented by one of Formulas 14 to 16 using “numberBands”. Since the representation of “bsNumguidedBlindBands” using “numberBands” is identical to the representations in Formulas 2 to 4, a detailed explanation of Formulas 14 to 16 is omitted.

[Formula 14]

[Formula 15]

[Formula 16]

FIG. 11B is a diagram of a method for representing numbers of parameter bands as a group according to one embodiment of the present invention. The information on the number of parameter bands includes information on the number of parameter bands applied to a channel conversion module, information on the number of parameter bands applied to the residual signal, and information on the number of parameter bands for each channel of the audio signal when non-guided coding is used. When there are a plurality of such items of number information (for example, “bsOttbands”, “bsTttbands”, “bsResidualBand” and/or “bsNumguidedBlindBands”), they can be represented as one or more groups.

Referring to FIG. 11B, if there are (kN+L) items of information on the number of parameter bands, and Q bits are needed to represent each item, the items can be represented in groups as follows. Here, 'k' and 'N' are arbitrary non-zero integers, and 'L' is an arbitrary integer satisfying 0 ≤ L < N.

The grouping method includes forming k groups by binding N items of information on the number of parameter bands into each group, and forming a last group by binding the remaining L items. Each of the k groups can be represented by M bits, and the last group can be represented by p bits. In this case, M is preferably smaller than the N*Q bits that would be needed to represent the N items individually without grouping. Likewise, p is preferably less than or equal to the L*Q bits that would be needed to represent the L items individually without grouping.

Suppose, for example, that two items of information on the number of parameter bands are b1 and b2. If each of b1 and b2 can take five values, 3 bits are needed to represent each of them. Although 3 bits can represent eight values, only five values are actually required, so each of b1 and b2 has three redundant values. If b1 and b2 are instead bound together and represented as a group, 5 bits can be used instead of 6 bits (= 3 bits + 3 bits). Specifically, since there are 25 (= 5*5) combinations of b1 and b2, the group of b1 and b2 can be represented by 5 bits. Since 5 bits can represent 32 values, the group representation has seven redundant values. The redundancy of the grouped representation is therefore smaller than that of representing each of b1 and b2 with 3 bits. The method of representing a plurality of items of information on the number of parameter bands as groups can be implemented in various ways, as follows.
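The b1/b2 example can be made concrete with a mixed-radix code, in which the grouped value is b1 + 5*b2. This is one possible packing rule, shown only as a sketch; the text above does not prescribe it, and the function names are hypothetical.

```python
def pack_group(values, num_values):
    """Combine values, each in range(num_values), into a single integer.

    The first value is the least significant digit in base num_values.
    """
    code = 0
    for value in reversed(values):
        if not 0 <= value < num_values:
            raise ValueError("value out of range")
        code = code * num_values + value
    return code


def unpack_group(code, num_values, count):
    """Recover `count` values from a code produced by pack_group."""
    values = []
    for _ in range(count):
        code, value = divmod(code, num_values)
        values.append(value)
    return values
```

With five possible values per item, the largest code is 4 + 5*4 = 24, so every pair (b1, b2) fits in 5 bits, and `unpack_group(pack_group([3, 4], 5), 5, 2)` recovers the original pair.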

If each item of information on the number of parameter bands can take 40 values, the k groups are formed using 2, 3, 4, 5 or 6 as N. Each of the k groups can then be represented by 11, 16, 22, 27 or 32 bits, respectively. Alternatively, the k groups can be represented by combining these cases.

If each item of information on the number of parameter bands can take 28 values, the k groups are formed using 6 as N, and each of the k groups can be represented by 29 bits.

If each item of information on the number of parameter bands can take 20 values, the k groups are formed using 2, 3, 4, 5, 6 or 7 as N. Each of the k groups can then be represented by 9, 13, 18, 22, 26 or 31 bits, respectively. Alternatively, the k groups can be represented by combining these cases.

Если множество информаций о количестве параметрических диапазонов имеет 14 значений для каждой, то создается k групп с использованием в качестве N значения 6. k групп можно представить 23 битами.If the set of information on the number of parametric ranges has 14 values for each, then k groups are created using N as the value 6. 6. k groups can be represented by 23 bits.

Если множество информаций о количестве параметрических диапазонов имеет 10 значений для каждой, то создается k групп с использованием в качестве N значений 2, 3, 4, 5, 6, 7, 8 или 9. k групп могут быть представлены 7, 10, 14, 17, 20, 24, 27 и 30 битами соответственно. В альтернативном варианте k групп могут быть представлены путем комбинирования соответствующих случаев.If a lot of information about the number of parametric ranges has 10 values for each, then k groups are created using N, values 2, 3, 4, 5, 6, 7, 8, or 9. k groups can be represented 7, 10, 14, 17, 20, 24, 27 and 30 bits respectively. Alternatively, k groups may be represented by combining appropriate cases.

Если множество информаций о количестве параметрических диапазонов имеет 7 значений для каждой, то создается k групп с использованием в качестве N значений 6, 7, 8, 9, 10 или 11. k групп могут быть представлены 17, 20, 23, 26, 29 и 31 битами соответственно. В альтернативном варианте k групп представляются путем комбинирования соответствующих случаев.If the set of information about the number of parametric ranges has 7 values for each, then k groups are created using N, values 6, 7, 8, 9, 10 or 11. k groups can be represented 17, 20, 23, 26, 29 and 31 bits respectively. Alternatively, k groups are represented by combining appropriate cases.

Если множество информаций о количестве параметрических диапазонов имеет, например, 5 значений для каждой, то создается k групп с использованием в качестве N значений 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 или 13. k групп могут быть представлены 5, 7, 10, 12, 14, 17, 19, 21, 24, 26, 28 и 31 битами соответственно. В альтернативном варианте k групп представляются путем комбинирования соответствующих случаев.If a lot of information about the number of parametric ranges has, for example, 5 values for each, then k groups are created using N, values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13. k groups can be represented by 5, 7, 10, 12, 14, 17, 19, 21, 24, 26, 28, and 31 bits, respectively. Alternatively, k groups are represented by combining appropriate cases.
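
The bit counts listed in the cases above are consistent with taking the smallest number of bits that can index all combinations of N values, that is, ceil(log2(V^N)) for V possible values per item. The following is a minimal sketch of that arithmetic; the function names `bits_for_group` and `bits_table` are illustrative and do not appear in the description.

```python
def bits_for_group(num_values: int, group_size: int) -> int:
    """Smallest number of bits that can index num_values ** group_size combinations."""
    combinations = num_values ** group_size
    # For combinations >= 1, (combinations - 1).bit_length() equals
    # ceil(log2(combinations)) and avoids floating-point rounding.
    return (combinations - 1).bit_length()


def bits_table(num_values: int, group_sizes) -> list:
    """Bit counts for one value count over several group sizes N."""
    return [bits_for_group(num_values, n) for n in group_sizes]


if __name__ == "__main__":
    # Value counts and group sizes taken from the cases in the text.
    cases = [
        (40, range(2, 7)),
        (28, [6]),
        (20, range(2, 8)),
        (14, [6]),
        (10, range(2, 10)),
        (7, range(6, 12)),
        (5, range(2, 14)),
    ]
    for num_values, sizes in cases:
        print(num_values, bits_table(num_values, sizes))
```

Integer arithmetic is used instead of `math.log2` so that large powers are not affected by rounding.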

In addition, the plurality of pieces of information on the number of parametric ranges can be represented as the groups described above, or represented sequentially by forming each piece of information on the number of parametric ranges as an independent bit sequence.

FIG. 12 shows a syntax representing configuration information of a spatial frame according to one embodiment of the present invention. The spatial frame includes a “FramingInfo” block 1201, a “bsIndependencyfield” field 1202, an “OttData” block 1203, a “TttData” block 1204, an “SmgData” block 1205 and a “tempShapeData” block 1206.

The “FramingInfo” block 1201 includes information on the number of parameter sets and information on the time interval to which each parameter set is applied. The “FramingInfo” block 1201 is explained in detail with reference to FIG. 13A.

The “bsIndependencyfield” field 1202 indicates whether the current frame can be decoded without knowledge of the previous frame.

The “OttData” block 1203 includes all spatial parameter information for all OTT blocks.

The “TttData” block 1204 includes all spatial parameter information for all TTT blocks.

The “SmgData” block 1205 includes information for temporal smoothing applied to a dequantized spatial parameter.

The “tempShapeData” block 1206 includes information for shaping the temporal envelope applied to a decorrelated signal.

FIG. 13A shows a syntax for representing position information of a time interval to which a parameter set is applied, according to one embodiment of the present invention. The “bsFramingType” field 1301 indicates whether a spatial frame of the audio signal is a fixed frame or a variable frame. A fixed frame is a frame in which a parameter set is applied to a preset time interval. For example, a parameter set is applied to time intervals preset at equal spacing. A variable frame is a frame that selectively receives position information of the time interval to which a parameter set is applied.

The “bsNumParamSets” field 1302 indicates the number of parameter sets in one spatial frame (hereinafter “NumParamSets”), where “NumParamSets” and “bsNumParamSets” are related by “NumParamSets = bsNumParamSets + 1”.

For example, since 3 bits are allocated to the “bsNumParamSets” field 1302 in FIG. 13A, a maximum of eight parameter sets can be provided in one spatial frame. Since the number of allocated bits is not limited to this value, more parameter sets can be provided in a spatial frame.

If the spatial frame is of the fixed type, the position of the time interval to which a parameter set is applied can be determined according to a preset rule, and no additional position information is needed. If the spatial frame is of the variable type, however, position information of the time interval to which each parameter set is applied is required.

The “bsParamSlot” field 1303 indicates position information of the time interval to which a parameter set is applied. The “bsParamSlot” field 1303 can be represented by a variable number of bits that depends on the number of time intervals in one spatial frame, that is, “numSlots”. In particular, when “numSlots” is greater than or equal to 2^(n-1) and less than 2^(n), the “bsParamSlot” field 1303 can be represented by n bits.

For example: (i) if “numSlots” is in the range from 64 to 127, the “bsParamSlot” field 1303 can be represented by 7 bits; (ii) if “numSlots” is in the range from 32 to 63, the field can be represented by 6 bits; (iii) if “numSlots” is in the range from 16 to 31, the field can be represented by 5 bits; (iv) if “numSlots” is in the range from 8 to 15, the field can be represented by 4 bits; (v) if “numSlots” is in the range from 4 to 7, the field can be represented by 3 bits; (vi) if “numSlots” is 2 or 3, the field can be represented by 2 bits; (vii) if “numSlots” is 1, the field can be represented by 1 bit; and (viii) if “numSlots” is 0, the field can be represented by 0 bits.
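
The rule above (n bits when 2^(n-1) <= numSlots < 2^n, and 0 bits when numSlots is 0) coincides with the bit length of the integer “numSlots”. A minimal sketch, assuming a helper named `bs_param_slot_bits` that is not part of the described syntax:

```python
def bs_param_slot_bits(num_slots: int) -> int:
    """Return n such that 2**(n-1) <= num_slots < 2**n, or 0 when num_slots is 0."""
    if num_slots < 0:
        raise ValueError("num_slots must be non-negative")
    # int.bit_length() is the number of bits needed to write num_slots in binary,
    # which is exactly the n defined by the rule in the text.
    return num_slots.bit_length()


if __name__ == "__main__":
    for num_slots in (0, 1, 2, 3, 4, 7, 8, 15, 16, 31, 32, 63, 64, 127):
        print(num_slots, bs_param_slot_bits(num_slots))
```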

If there are a plurality (N) of parameter sets, the combination of “bsParamSlot” values can be represented according to Formula 9.

[Formula 9]

In this case, “bsParamSlot_i” indicates the time interval to which the i-th parameter set is applied. Suppose, for example, that “numSlots” is 3 and that the “bsParamSlot” field 1303 can have ten values. In this case, three pieces of information (hereinafter c1, c2 and c3) are needed for the “bsParamSlot” field 1303. Since 4 bits are needed to represent each of c1, c2 and c3, a total of 12 (= 4*3) bits is required. If c1, c2 and c3 are represented as a group by binding them together, 1000 (= 10*10*10) cases can occur, which can be represented by 10 bits, saving 2 bits. If “numSlots” is 3 and the value read as 5 bits is 31, this value can be represented as 31 = 1*(3^2) + 5*(3^1) + 7*(3^0). The decoding device can determine that c1, c2 and c3 are 1, 5 and 7, respectively, by applying the inverse of Formula 9.
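
Binding several values into one group code and recovering them again can be sketched as a positional (radix) encoding. Formula 9 is not reproduced in this text, so the sketch below is an assumption rather than the formula itself: it uses as the radix the number of values each item can take (ten for c1, c2 and c3), and the names `pack_group`, `unpack_group` and `group_bits` are illustrative. Each value must be smaller than the radix for the inverse to be unique.

```python
def pack_group(values, base: int) -> int:
    """Combine values (each in 0..base-1) into one integer, first value most significant."""
    code = 0
    for value in values:
        if not 0 <= value < base:
            raise ValueError("each value must lie in 0..base-1")
        code = code * base + value
    return code


def unpack_group(code: int, base: int, count: int) -> list:
    """Inverse of pack_group: recover `count` values from the group code."""
    values = []
    for _ in range(count):
        code, digit = divmod(code, base)
        values.append(digit)
    values.reverse()  # digits were extracted least significant first
    return values


def group_bits(base: int, count: int) -> int:
    """Bits needed to index base ** count combinations."""
    return (base ** count - 1).bit_length()


if __name__ == "__main__":
    code = pack_group([1, 5, 7], base=10)
    print(code, group_bits(10, 3), unpack_group(code, base=10, count=3))
```

With ten possible values per item, the three values occupy 10 bits as a group instead of 12 bits when coded separately, which matches the saving of 2 bits stated in the text.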

FIG. 13B shows a syntax for representing position information of the time interval to which a parameter set is applied as an absolute value and a difference value, according to one embodiment of the present invention. If the spatial frame is a variable-type frame, the “bsParamSlot” field 1303 in FIG. 13A can be represented as an absolute value and a difference value, using the fact that the “bsParamSlot” information is monotonically increasing.

For example: (i) the position of the time interval to which the first parameter set is applied can be formed as an absolute value, that is, “bsParamSlot[0]”; and (ii) the position of the time interval to which the second or a subsequent parameter set is applied can be formed as a difference value, that is, the “difference value” between “bsParamSlot[ps]” and “bsParamSlot[ps-1]”, or “difference value - 1” (hereinafter “bsDiffParamSlot[ps]”). In this case, “ps” denotes a parameter set.

The “bsParamSlot[0]” field 1304 can be represented by a number of bits (hereinafter “nBitsParamSlot(0)”) calculated using “numSlots” and “numParamSets”.

The “bsDiffParamSlot[ps]” field 1305 can be represented by a number of bits (hereinafter “nBitsParamSlot(ps)”) calculated using “numSlots”, “numParamSets” and the position of the time interval to which the previous parameter set is applied, that is, “bsParamSlot[ps-1]”.

In particular, to represent “bsParamSlot[ps]” with the minimum number of bits, the number of bits can be determined based on the following rules: (i) the “bsParamSlot[ps]” values are in strictly ascending order (bsParamSlot[ps] > bsParamSlot[ps-1]); (ii) the maximum value of “bsParamSlot[0]” is “numSlots - numParamSets”; and (iii) for 0 < ps < numParamSets, “bsParamSlot[ps]” can only take a value between “bsParamSlot[ps-1] + 1” and “numSlots - numParamSets + ps”.

For example, if “numSlots” is 10 and “numParamSets” is 3, then, because “bsParamSlot[ps]” is increasing, the maximum value of “bsParamSlot[0]” is 10 - 3 = 7. That is, “bsParamSlot[0]” must be selected from the values 0 to 7. The reason is that if “bsParamSlot[0]” were greater than 7, there would not be enough time intervals left for the remaining parameter sets (ps equal to 1 or 2).

If “bsParamSlot[0]” is 5, the position “bsParamSlot[1]” of the time interval for the second parameter set must be selected from the values between 5 + 1 = 6 and 10 - 3 + 1 = 8.

If “bsParamSlot[1]” is 7, “bsParamSlot[2]” can be 8 or 9. If “bsParamSlot[1]” is 8, “bsParamSlot[2]” can only be 9.
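
Rules (i) to (iii) and the worked example above can be checked with a short sketch that returns the inclusive range of admissible values for “bsParamSlot[ps]”. The function name `allowed_slot_range` and its argument names are illustrative and are not part of the described syntax.

```python
def allowed_slot_range(ps: int, prev_slot, num_slots: int, num_param_sets: int):
    """Inclusive (low, high) range of bsParamSlot[ps].

    prev_slot is bsParamSlot[ps-1]; it is ignored when ps == 0.
    """
    if not 0 <= ps < num_param_sets:
        raise ValueError("ps must satisfy 0 <= ps < num_param_sets")
    # Rule (i): positions strictly increase, so the next one starts after the previous one.
    low = 0 if ps == 0 else prev_slot + 1
    # Rules (ii) and (iii): leave one time interval for every parameter set still to come.
    high = num_slots - num_param_sets + ps
    return low, high


if __name__ == "__main__":
    print(allowed_slot_range(0, None, 10, 3))
    print(allowed_slot_range(1, 5, 10, 3))
    print(allowed_slot_range(2, 7, 10, 3))
    print(allowed_slot_range(2, 8, 10, 3))
```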

Consequently, “bsParamSlot[ps]” can be represented by a variable number of bits using the above properties, instead of a fixed number of bits.

When “bsParamSlot[ps]” is configured in the bitstream, if “ps” is 0, “bsParamSlot[0]” can be represented as an absolute value using the number of bits given by “nBitsParamSlot(0)”. If “ps” is greater than 0, “bsParamSlot[ps]” can be represented as a difference value using the number of bits given by “nBitsParamSlot(ps)”. When “bsParamSlot[ps]” configured in this way is read from the bitstream, the bit length of each data item, that is, “nBitsParamSlot(ps)”, can be found using Formula 10.

[Formula 10]

In particular, “nBitsParamSlot(ps)” can be found as nBitsParamSlot(0) = f_b(numSlots - numParamSets + 1) and, for 0 < ps < numParamSets, as nBitsParamSlot(ps) = f_b(numSlots - numParamSets + ps - bsParamSlot[ps-1]). “nBitsParamSlot(ps)” can be determined using Formula 11, which extends Formula 10 up to 7 bits.

[Formula 11]

The following is an example of the function f_b(x). If “numSlots” is 15 and “numParamSets” is 3, the function evaluates to nBitsParamSlot(0) = f_b(15-3+1) = 4 bits.

If “bsParamSlot[0]”, represented by 4 bits, is 7, the function evaluates to nBitsParamSlot(1) = f_b(15-3+1-7) = 3 bits. In this case, the “bsDiffParamSlot[1]” field 1305 can be represented by 3 bits.

If the value represented by the 3 bits is 3, “bsParamSlot[1]” becomes 7 + 3 = 10. Therefore, nBitsParamSlot(2) = f_b(15-3+2-10) = 2 bits. In this case, the “bsDiffParamSlot[2]” field 1305 can be represented by 2 bits. If the number of remaining time intervals equals the number of remaining parameter sets, 0 bits can be allocated to the “bsDiffParamSlot[ps]” field. In other words, no additional information is needed to represent the position of the time interval to which that parameter set is applied.
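
The bit counts in this example can be reproduced with a short sketch. It assumes f_b(x) = ceil(log2(x)), which the description names only as one possible choice; the Python names `f_b` and `n_bits_param_slot` are illustrative.

```python
def f_b(x: int) -> int:
    """ceil(log2(x)) for x >= 1, computed with integer arithmetic."""
    if x < 1:
        raise ValueError("x must be at least 1")
    return (x - 1).bit_length()


def n_bits_param_slot(ps: int, num_slots: int, num_param_sets: int, prev_slot=None) -> int:
    """Number of bits for bsParamSlot[0] (ps == 0) or bsDiffParamSlot[ps] (ps > 0).

    prev_slot is bsParamSlot[ps-1] and is only used when ps > 0.
    """
    if ps == 0:
        return f_b(num_slots - num_param_sets + 1)
    return f_b(num_slots - num_param_sets + ps - prev_slot)


if __name__ == "__main__":
    print(n_bits_param_slot(0, 15, 3))      # bsParamSlot[0]
    print(n_bits_param_slot(1, 15, 3, 7))   # after bsParamSlot[0] = 7
    print(n_bits_param_slot(2, 15, 3, 10))  # after bsParamSlot[1] = 10
    print(n_bits_param_slot(2, 15, 3, 13))  # only one position left
```

The last call covers the case in which the number of remaining time intervals equals the number of remaining parameter sets, so the argument of f_b is 1 and no bits are needed.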

Thus, the number of bits for “bsParamSlot[ps]” can be made variable. The decoder can use the function f_b(x) to determine how many bits to read from the bitstream for “bsParamSlot[ps]”. In some embodiments, the function f_b(x) can include the function ceil(log₂(x)).

When the decoder reads the information for “bsParamSlot[ps]”, represented as an absolute value and difference values, from the bitstream, “bsParamSlot[0]” can be read first, and then “bsDiffParamSlot[ps]” can be read for 0 < ps < numParamSets. “bsParamSlot[ps]” can then be found for 0 ≤ ps < numParamSets using “bsParamSlot[0]” and “bsDiffParamSlot[ps]”. For example, as shown in FIG. 13B, “bsParamSlot[ps]” can be found by adding “bsParamSlot[ps-1]” to “bsDiffParamSlot[ps] + 1”.
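
The reading procedure can be sketched as an encode and decode round trip. This is a minimal sketch under stated assumptions, not the syntax of FIG. 13B itself: the bitstream is modelled as a Python string of '0' and '1' characters, f_b(x) is taken to be ceil(log2(x)), “bsDiffParamSlot[ps]” is taken to store the difference minus 1, and the function names are illustrative.

```python
def f_b(x: int) -> int:
    """ceil(log2(x)) for x >= 1."""
    return (x - 1).bit_length()


def encode_param_slots(slots, num_slots: int) -> str:
    """Write bsParamSlot[0] as an absolute value and later positions as difference - 1."""
    num_sets = len(slots)
    bits = ""
    for ps, slot in enumerate(slots):
        if ps == 0:
            count = num_slots - num_sets + 1            # admissible values of bsParamSlot[0]
            value = slot
        else:
            count = num_slots - num_sets + ps - slots[ps - 1]
            value = slot - slots[ps - 1] - 1            # bsDiffParamSlot[ps]
        if not 0 <= value < count:
            raise ValueError("slot positions violate the ordering rules")
        n = f_b(count)
        if n:                                           # zero-bit fields write nothing
            bits += format(value, f"0{n}b")
    return bits


def decode_param_slots(bits: str, num_slots: int, num_sets: int) -> list:
    """Recover bsParamSlot[0..num_sets-1] from the bit string."""
    slots, pos = [], 0
    for ps in range(num_sets):
        if ps == 0:
            count = num_slots - num_sets + 1
        else:
            count = num_slots - num_sets + ps - slots[-1]
        n = f_b(count)
        value = int(bits[pos:pos + n], 2) if n else 0
        pos += n
        slots.append(value if ps == 0 else slots[-1] + value + 1)
    return slots


if __name__ == "__main__":
    stream = encode_param_slots([7, 10, 12], num_slots=15)
    print(stream, decode_param_slots(stream, num_slots=15, num_sets=3))
```

Because each field width depends on the previously decoded position, the decoder must decode the positions strictly in order.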

FIG. 13C shows a syntax for representing position information of the time interval to which a parameter set is applied as a group, according to one embodiment of the present invention. When there are a plurality of parameter sets, the plurality of “bsParamSlots” 1307 for the plurality of parameter sets can be represented by one or more groups.

If the number of “bsParamSlots” 1307 is (kN+L) and Q bits are needed to represent each of the “bsParamSlots” 1307, the “bsParamSlots” 1307 can be represented as the following groups. In this case, 'k' and 'N' are arbitrary non-zero integers, and 'L' is an arbitrary integer satisfying 0 ≤ L < N.

The grouping method can include the steps of creating k groups by binding N “bsParamSlots” 1307 each and creating a last group by binding the last L “bsParamSlots” 1307. Each of the k groups can be represented by M bits, and the last group can be represented by p bits. In this case, M is preferably smaller than the N*Q bits used when each of the “bsParamSlots” 1307 is represented without grouping, and p is preferably smaller than or equal to the L*Q bits used when each of the “bsParamSlots” 1307 is represented without grouping.
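
The split into k groups of N items plus a last group of L items, together with the resulting bit counts, can be sketched as follows. The sketch assumes that every item can take the same number of values V, so that Q = ceil(log2(V)), M = ceil(log2(V^N)) and p = ceil(log2(V^L)); the function name `group_plan` and the returned keys are illustrative.

```python
def group_plan(num_items: int, group_size: int, num_values: int) -> dict:
    """Bit budget for grouping num_items = k*N + L items with num_values values each."""
    k, last = divmod(num_items, group_size)               # k full groups, L leftover items
    q = (num_values - 1).bit_length()                     # Q: bits per item without grouping
    m = (num_values ** group_size - 1).bit_length()       # M: bits per full group
    p = (num_values ** last - 1).bit_length()             # p: bits for the last group (0 if L == 0)
    return {
        "k": k,
        "L": last,
        "Q": q,
        "M": m,
        "p": p,
        "grouped_total": k * m + p,
        "ungrouped_total": num_items * q,
    }


if __name__ == "__main__":
    # Seven items with ten possible values each, grouped three at a time.
    print(group_plan(num_items=7, group_size=3, num_values=10))
```

In this configuration M = 10 is smaller than N*Q = 12 and p = 4 equals L*Q = 4, in line with the stated preferences.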

Suppose, for example, that the pair of “bsParamSlots” 1307 for two parameter sets is d1 and d2. If each of d1 and d2 can take five values, 3 bits are needed to represent each of them. Although 3 bits can represent eight values, only five are actually needed, so each of d1 and d2 has three redundant values. If d1 and d2 are instead represented as a group by binding them together, 5 bits can be used instead of 6 bits (= 3 bits + 3 bits). Specifically, since there are 25 (= 5*5) combinations of d1 and d2, the group of d1 and d2 can be represented by only 5 bits. Since 5 bits can represent 32 values, the grouped representation has seven redundant values. The redundancy of the grouped representation is therefore smaller than that of representing each of d1 and d2 with 3 bits.

When a group is configured, the data for the group can be formed using “bsParamSlot[0]” for the initial value and the difference between consecutive “bsParamSlot[ps]” values for the second and subsequent values.

When a group is configured, bits can be allocated directly without grouping if the number of parameter sets is 1, and bits can be allocated after grouping is completed if the number of parameter sets is greater than or equal to 2.

FIG. 14 is a flowchart of an encoding method according to one embodiment of the present invention. The method of encoding an audio signal and the operation of an encoder according to the present invention are explained below.

First (S1401), the total number of time intervals (numSlots) in one spatial frame and the total number of parametric ranges (numBands) of the audio signal are determined.

Then (S1402), the number of parametric ranges applied to the channel conversion module (OTT block and/or TTT block) and/or to the residual signal is determined.

If the OTT block has the LFE channel mode, the number of parametric ranges applied to the OTT block is determined separately.

If the OTT block does not have the LFE channel mode, “numBands” is used as the number of parametric ranges applied to the OTT block.

Далее определяется тип пространственного кадра. В этом случае пространственный кадр может быть отнесен к кадру фиксированного типа или кадру переменного типа.Next, the type of spatial frame is determined. In this case, the spatial frame may be assigned to a frame of a fixed type or a frame of variable type.

If the spatial frame is of the variable type (S1403), the number of parameter sets used in one spatial frame is determined (S1406). In this case, a parameter set can be applied to the channel conversion module on a per-time-slot basis.

Next, the position of the time slot to which each parameter set is applied is determined (S1407). This position can be represented as an absolute value or as a difference value. For example, the position of the time slot to which the first parameter set is applied can be represented as an absolute value, and the position of the time slot to which the second or any subsequent parameter set is applied can be represented as a difference relative to the position of the previous time slot. In this case, the position of the time slot to which a parameter set is applied can be represented by a variable number of bits.

In particular, the position of the time slot to which the first parameter set is applied can be represented by a number of bits calculated from the total number of time slots and the total number of parameter sets. The position of the time slot to which the second or any subsequent parameter set is applied can be represented by a number of bits calculated from the total number of time slots, the total number of parameter sets, and the position of the time slot to which the previous parameter set is applied.
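One way to realize such a variable-length representation is sketched below. It rests on assumptions that the text does not state explicitly: slot positions are 0-based and strictly increasing, the field width is the smallest number of bits that can index every position still available, and each later position is stored as its difference from the previous position minus one, so that code 0 is usable. The names `bits_for` and `encode_positions` are hypothetical.

```python
def bits_for(count):
    """Smallest number of bits that can index `count` distinct values."""
    return (count - 1).bit_length()


def encode_positions(positions, num_slots):
    """Return a list of (value, width) fields, one per parameter set.

    positions: strictly increasing 0-based time-slot positions.
    num_slots: total number of time slots in the spatial frame.
    """
    num_sets = len(positions)
    fields = []
    prev = -1  # no previous position before the first parameter set
    for i, pos in enumerate(positions):
        lowest = prev + 1
        # Leave one slot for each parameter set that still follows.
        highest = num_slots - (num_sets - i)
        if not lowest <= pos <= highest:
            raise ValueError(f"position {pos} is not valid for set {i}")
        # First field: absolute position. Later fields: offset from prev.
        fields.append((pos - lowest, bits_for(highest - lowest + 1)))
        prev = pos
    return fields


fields = encode_positions([2, 5, 7], num_slots=8)
total_bits = sum(width for _, width in fields)
```

With 8 time slots and positions 2, 5 and 7, the fields are 3, 2 and 1 bits wide (6 bits in total), compared with 9 bits if each position were written with a fixed 3 bits. When the number of parameter sets equals the number of time slots, every field is 0 bits wide, so no position information needs to be written.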

If the spatial frame is of the fixed type, the number of parameter sets used in one spatial frame is determined (S1404). In this case, the position of the time slot to which each parameter set is applied is chosen using a predetermined rule. For example, the position can be chosen so that each parameter set lies at an equal distance from the position of the time slot to which the previous parameter set is applied (S1405).
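A predetermined rule of this kind could look like the sketch below. The exact rule is not given in the text; this version assumes 0-based slot indices and places each parameter set at the end of one of `num_sets` roughly equal segments of the frame, so the decoder can recompute the positions without any transmitted position information. The function name `fixed_frame_positions` is hypothetical.

```python
def fixed_frame_positions(num_slots, num_sets):
    """Evenly spaced 0-based slot positions for a fixed-type frame (assumed rule)."""
    if not 1 <= num_sets <= num_slots:
        raise ValueError("need 1 <= num_sets <= num_slots")
    # ceil((i + 1) * num_slots / num_sets) - 1, using integer arithmetic only.
    return [((i + 1) * num_slots + num_sets - 1) // num_sets - 1
            for i in range(num_sets)]
```

For a frame of 8 time slots, two parameter sets land on slots 3 and 7, and four parameter sets land on slots 1, 3, 5 and 7.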

Next, the downmixing unit and the spatial information generating unit generate a downmix signal and spatial information, respectively, using the values determined above: the total number of time slots, the total number of parameter bands, the number of parameter bands applied to the channel conversion module, the total number of parameter sets in one spatial frame, and the position information of the time slot to which each parameter set is applied (S1408).

Finally, the multiplexing unit generates a bitstream that includes the downmix signal and the spatial information (S1409), and then transfers the generated bitstream to a decoder (S1409).

FIG. 15 is a flowchart of a decoding method according to one embodiment of the present invention. The method of decoding an audio signal and the operation of the decoder according to the present invention are explained below.

First, the decoder receives the bitstream of the audio signal (S1501). The demultiplexing unit separates the downmix signal and the spatial information signal from the received bitstream (S1502). Next, the spatial information signal decoding unit extracts, from the configuration information of the spatial information signal, the total number of time slots in one spatial frame, the total number of parameter bands, and the number of parameter bands applied to the channel conversion module (S1503).

If the spatial frame is of the variable type (S1504), the number of parameter sets in one spatial frame and the position information of the time slot to which each parameter set is applied are extracted from the spatial frame (S1505). The position information of a time slot can be represented by a fixed or a variable number of bits. In this case, the position information of the time slot to which the first parameter set is applied can be represented as an absolute value, and the position information of the time slots to which the second and subsequent parameter sets are applied can be represented as difference values. The actual position of a time slot to which the second or a subsequent parameter set is applied is found by adding the difference value to the position of the time slot to which the previous parameter set is applied.
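The decoder-side reconstruction can be sketched as follows. The sketch assumes 0-based, strictly increasing slot positions, a field width equal to the smallest number of bits that can index the positions still available, and a stored difference that is offset by one so that code 0 means "the very next slot". None of these conventions is stated in the text, and the function name `decode_positions` and the use of a string of '0'/'1' characters as the bit source are purely illustrative.

```python
def decode_positions(bits, num_slots, num_sets):
    """Recover time-slot positions from a string of '0'/'1' characters.

    Returns (positions, bits_consumed).
    """
    positions = []
    cursor = 0
    prev = -1  # no previous position before the first parameter set
    for i in range(num_sets):
        # Candidate positions left for set i, keeping room for later sets.
        count = (num_slots - num_sets + i) - prev
        width = (count - 1).bit_length()
        value = int(bits[cursor:cursor + width], 2) if width else 0
        cursor += width
        # Absolute value for the first set; previous position plus the
        # (offset) difference for every later set.
        prev = prev + 1 + value
        positions.append(prev)
    return positions, cursor


positions, used = decode_positions("010101", num_slots=8, num_sets=3)
```

Here the three fields are read with widths of 3, 2 and 1 bits, giving positions 2, 5 and 7. Each width depends on the position decoded just before it, which is why the fields must be read in order.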

Finally, the downmix signal is converted into a multi-channel audio signal using the extracted information (S1506).

The embodiments disclosed above provide several advantages over conventional audio coding schemes.

First, when encoding a multi-channel audio signal, the disclosed embodiments can reduce the amount of transmitted data by representing the position of the time slot to which a parameter set is applied with a variable number of bits.

Second, the disclosed embodiments can reduce the amount of transmitted data by representing the position of the time slot to which the first parameter set is applied as an absolute value and the positions of the time slots to which the second and subsequent parameter sets are applied as difference values.

Third, the disclosed embodiments can reduce the amount of transmitted data by representing the number of parameter bands applied to a channel conversion module, such as an OTT block and/or a TTT block, with a fixed or variable number of bits. In this case, the positions of the time slots to which parameter sets are applied can be represented using the principle discussed above, where the parameter sets may lie within the range given by the number of parameter bands.

FIG. 16 is a block diagram of an example device architecture 1600 for implementing the audio encoder/decoder described with reference to FIGS. 1-15. The device architecture 1600 is applicable to a variety of devices, including but not limited to: personal computers, server computers, consumer electronic devices, mobile phones, personal digital assistants (PDAs), electronic tablets, television systems, television set-top boxes, game consoles, media players, music players, navigation systems, and any other device capable of decoding audio signals. Some of these devices may implement a modified architecture that uses a combination of hardware and software.

The architecture 1600 includes one or more processors 1602 (e.g., PowerPC®, Intel Pentium® 4, etc.), one or more display devices 1604 (e.g., a cathode ray tube (CRT), a liquid crystal display (LCD)), an audio subsystem 1606 (e.g., audio hardware/software), one or more network interfaces 1608 (e.g., Ethernet, FireWire®, USB, etc.), input devices 1610 (e.g., keyboard, mouse, etc.), and one or more computer-readable media 1612 (e.g., RAM, ROM, synchronous dynamic RAM (SDRAM), hard disk, optical disk, flash memory, etc.). These components can exchange messages and data over one or more buses 1614 (e.g., EISA, PCI, PCI Express, etc.).

The term "computer-readable medium" refers to any medium that participates in providing instructions to the processor 1602 for execution, including but not limited to non-volatile media (e.g., optical or magnetic disks), volatile media (e.g., memory), and transmission media. Transmission media include, but are not limited to, coaxial cables, copper wire, and optical fiber. Transmission media can also take the form of acoustic, light, or radio-frequency waves.

The computer-readable medium 1612 further includes an operating system 1616 (e.g., Mac OS®, Windows®, Linux, etc.), a network communication module 1618, an audio codec 1620, and one or more applications 1622. The operating system 1616 can be multi-user, multiprocessing, multitasking, multithreaded, real-time, and the like. The operating system 1616 performs basic tasks, including but not limited to: recognizing input from the input devices 1610; sending output to the display devices 1604 and the audio subsystem 1606; keeping track of files and directories on the computer-readable media 1612 (e.g., memory or a storage device); controlling peripheral devices (e.g., disk drives, printers, etc.); and managing traffic on the one or more buses 1614.

The network communication module 1618 includes various components for establishing and maintaining network connections (e.g., software implementing communication protocols such as TCP/IP, HTTP, Ethernet, etc.). The network communication module 1618 may include a browser that enables operators of the device architecture 1600 to search a network (e.g., the Internet) for information (e.g., audio content).

The audio codec 1620 is responsible for implementing all or part of the encoding and/or decoding processes described with reference to FIGS. 1-15. In some embodiments, the audio codec works together with hardware (e.g., the processor(s) 1602, the audio subsystem 1606) to process audio signals, including encoding and/or decoding audio signals, in accordance with the invention described herein.

The applications 1622 may include any software application related to audio content and/or to the encoding and/or decoding of audio content, including but not limited to applications for media players, music players (e.g., MP3 players), mobile phones, PDAs, television systems, and set-top boxes. In one embodiment, the audio codec may be used by an application service provider to offer encoding/decoding services over a network (e.g., the Internet).

In the above description, numerous specific details are set forth for purposes of explanation in order to provide a thorough understanding of the invention. It will be apparent to those skilled in the art, however, that the invention can be practiced without these specific details. In other instances, structures and devices are shown in block diagram form to avoid obscuring the invention.

In particular, those skilled in the art will recognize that other architectures and graphical environments may be used, and that the present invention can be implemented using graphical tools and products other than those described above. In particular, the client/server approach is merely one example of an architecture that provides the instrumental functionality of the present invention; it will be apparent to those skilled in the art that approaches other than client/server may also be used.

Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to convey the substance of their work most effectively to others skilled in the art. An algorithm is here, and generally, understood to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. In practice, it is convenient to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

Industrial applicability

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to those quantities. Unless specifically stated otherwise, as is apparent from this discussion, passages throughout the description that use terms such as "processing", "computing", "calculating", "determining", or "displaying" refer to the actions and processes of a computer system, or a similar electronic computing device, that manipulates data represented as physical (electronic) quantities in the registers and memories of the computer system and transforms it into other data similarly represented as physical quantities in the memories or registers of the computer system or in other information storage, transmission, or display devices.

The present invention also relates to an apparatus for performing the operations described herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored on a computer-readable medium, such as, but not limited to: any type of disk, including floppy disks, optical disks, compact discs (CD-ROMs), and magneto-optical disks; read-only memories (ROMs); random-access memories (RAMs); erasable programmable read-only memories (EPROMs); electrically erasable programmable read-only memories (EEPROMs); magnetic or optical cards; or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.

The algorithms and modules presented here are not inherently related to any particular computer or other apparatus. General-purpose systems may be used with programs written in accordance with the principles set forth here, or it may prove more convenient to construct more specialized apparatus to perform the steps of the methods disclosed here. The required structure for a variety of these systems follows from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the principles of the invention described here. Furthermore, it will be clear to those skilled in the art that the modules, features, attributes, methodologies, and other aspects of the invention can be implemented as software, hardware, firmware, or any combination of these. Of course, wherever a component of the present invention is implemented as software, it can be implemented as a standalone program, as part of a larger program, as a plurality of separate programs, as a statically or dynamically linked library, as a kernel-loadable module, and/or in any other way known now or in the future to those skilled in computer programming. In addition, the present invention is not limited to implementation in any particular operating system or environment.

Those skilled in the art will appreciate that various changes and modifications may be made to the embodiments discussed here without departing from the spirit or scope of the invention. Thus, the present invention is intended to cover all such modifications and variations of the embodiments disclosed here, provided they fall within the scope of the appended claims and their equivalents.

Claims

1. A method of decoding an audio signal, comprising:
receiving a bit stream representing an audio signal, the bit stream having a frame;
determining the number of time slots and the number of parameter sets from the bitstream, the parameter sets including one or more parameters;
determining position information from the bitstream, wherein the position information indicates a position of a time interval in an ordered set of time intervals for which a set of parameters is applied where an ordered set of time intervals is included in the frame; and
decoding an audio signal based on the number of time slots, the number of parameter sets and position information,
where the position information is represented by a variable number of bits based on the position of the time interval.

2. The method according to claim 1, in which a variable number of bits is determined using the number of time slots.

3. The method according to claim 1, in which, if the number of time intervals to be decoded is equal to the number of parameter sets to be applied, the position information of the time interval to which the parameter set is applied is not determined.

4. The method according to claim 3, in which, if the number of parameter sets is greater than or equal to 2^(n-1) and less than 2^n, the variable number of bits is defined as n bits.

5. The method according to claim 3, in which, if the number of parameter sets is greater than 2^(n-1) and less than or equal to 2^n, the variable number of bits is defined as n bits.
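
The two boundary conventions stated in claims 4 and 5 differ only when the count is an exact power of two. The following is a minimal Python sketch of both rules, offered as an illustration rather than as the claimed implementation; the function names `bits_lower_inclusive` and `bits_upper_inclusive` are hypothetical and do not appear in the specification.

```python
def bits_lower_inclusive(count: int) -> int:
    """Return n such that 2**(n-1) <= count < 2**n (the rule of claim 4)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # int.bit_length() is the number of binary digits needed to write count.
    return count.bit_length()


def bits_upper_inclusive(count: int) -> int:
    """Return n such that 2**(n-1) < count <= 2**n (the rule of claim 5)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Equivalent to ceil(log2(count)), computed without floating point.
    return (count - 1).bit_length()


if __name__ == "__main__":
    for count in (1, 2, 3, 4, 5, 8):
        print(count, bits_lower_inclusive(count), bits_upper_inclusive(count))
```

Under the second rule a count of 4 needs 2 bits, whereas the first rule assigns it 3 bits; for counts that are not powers of two the two rules agree.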

6. The method according to claim 2, in which the position information is represented as the sum of a previous value and a difference value, where the previous value indicates the position information of the time interval to which a first parameter set is applied, and the difference value indicates the position information of the time interval to which a second parameter set is applied.

7. The method according to claim 6, in which the previous value is represented by a variable number of bits determined using at least one of: the number of time intervals and the number of sets of parameters.

8. The method according to claim 7, in which the variable number of bits is determined using the difference between the number of time intervals and the number of parameter sets.

9. The method according to claim 6, in which the difference value is represented by a variable number of bits determined using at least one of: the number of time intervals, the number of parameter sets and the information about the position of the time interval for which the previous parameter set is applied.

10. The method according to claim 9, in which the variable number of bits is determined using the difference between the number of time intervals and at least one of: the number of parameter sets and the position information of the time interval to which the previous parameter set is applied.
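
One way to read claims 6 to 10 together is that each position is coded relative to the previous one, with a bit width that shrinks as fewer positions remain possible. The Python sketch below illustrates that reading under explicit assumptions that are not taken from the claims: positions are 0-indexed and strictly increasing, each parameter set needs its own time interval, and the stored value is the offset from the lowest position still allowed. The names `width`, `encode_positions` and `decode_positions` are hypothetical.

```python
from typing import List


def width(num_values: int) -> int:
    """Bits needed to distinguish num_values alternatives (0 if only one)."""
    return (num_values - 1).bit_length()


def encode_positions(positions: List[int], num_intervals: int) -> List[int]:
    """Encode strictly increasing interval positions as a list of bits."""
    num_sets = len(positions)
    bits: List[int] = []
    previous = -1
    for i, position in enumerate(positions):
        lowest = previous + 1
        # Leave room for the parameter sets that still follow (assumption).
        highest = num_intervals - num_sets + i
        if not lowest <= position <= highest:
            raise ValueError("positions must be increasing and fit the frame")
        n = width(highest - lowest + 1)
        offset = position - lowest
        # Most significant bit first.
        bits.extend((offset >> k) & 1 for k in reversed(range(n)))
        previous = position
    return bits


def decode_positions(bits: List[int], num_intervals: int,
                     num_sets: int) -> List[int]:
    """Inverse of encode_positions."""
    positions: List[int] = []
    cursor = 0
    previous = -1
    for i in range(num_sets):
        lowest = previous + 1
        highest = num_intervals - num_sets + i
        n = width(highest - lowest + 1)
        offset = 0
        for bit in bits[cursor:cursor + n]:
            offset = (offset << 1) | bit
        cursor += n
        previous = lowest + offset
        positions.append(previous)
    return positions


if __name__ == "__main__":
    coded = encode_positions([2, 5, 7], num_intervals=8)
    print(coded, decode_positions(coded, num_intervals=8, num_sets=3))
```

In this sketch the width of the first position depends on the difference between the number of time intervals and the number of parameter sets, and later widths also depend on the previous position. When the two counts are equal every width is zero, so no position bits are written at all, which matches the situation described in claim 3.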

11. An apparatus for decoding an audio signal, comprising:
a demultiplexer separating a downmix signal and spatial information from a bitstream representing the audio signal, the bitstream comprising a frame;
a downmix decoding unit decoding the downmix signal;
a spatial information decoding unit determining a number of time intervals, a number of parameter sets and position information indicating the position of a time interval, in an ordered set of time intervals included in the frame, to which a parameter set is applied; and
an upmixing unit decoding the audio signal based on the number of time intervals, the number of parameter sets and the position information,
wherein the position information is represented by a variable number of bits based on the position of the time interval.

12. The device according to claim 11, in which the variable number of bits is determined using the number of time intervals.

13. The device according to claim 11, in which if the number of time intervals to be decoded is equal to the number of parameter sets to be applied, the spatial information decoding unit does not determine information about the position of the time interval to which the parameter set is applied.

14. The device according to claim 11, in which, if the number of parameter sets is greater than or equal to 2^(n-1) and less than 2^n, the variable number of bits is defined as n bits.

15. The device according to claim 11, in which the position information is represented as the sum of a previous value and a difference value, where the previous value indicates the position information of the time interval to which a first parameter set is applied, and the difference value indicates the position information of the time interval to which a second parameter set is applied.

16. The device according to claim 15, in which the previous value is represented by a variable number of bits determined using at least one of: the number of time intervals and the number of parameter sets.

17. The device according to claim 16, in which the variable number of bits is determined using the difference between the number of time intervals and the number of parameter sets.

18. A method of encoding an audio signal, comprising:
determining a number of time intervals and a number of parameter sets, each parameter set including one or more parameters;
generating information indicating the position of at least one time interval, in an ordered set of time intervals, to which a parameter set is applied;
encoding the audio signal as a bitstream including a frame, the frame including the ordered set of time intervals; and
inserting into the bitstream a variable number of bits that represent the position of said time interval in the ordered set of time intervals, the variable number of bits being determined by the position of the time interval.

19. An apparatus for encoding an audio signal, comprising:
a downmix unit downmixing an audio signal into a downmix signal;
a downmix signal coding unit encoding the downmix signal;
a spatial information generation unit generating spatial information defining a number of time intervals and a number of parameter sets, and indicating the position of at least one time interval, in an ordered set of time intervals, to which a parameter set is applied; and
a multiplexer that multiplexes the encoded downmix signal and the spatial information into a bitstream and inserts into the bitstream a variable number of bits that represent the position of the time interval in the ordered set of time intervals, the variable number of bits being determined by the position of the time interval.