RU2792944C2 - Methods, device and systems for generating, transmitting and processing immediate playback frames (IPF)

Info

Publication number: RU2792944C2
Application number: RU2021106407A
Authority: RU
Inventors: Christof Fersch; Daniel Fischer
Original assignee: Dolby International AB
Priority date: 2018-08-21
Filing date: 2019-08-20
Publication date: 2023-03-28

Abstract

FIELD: computing technology.

SUBSTANCE: the technical result is achieved by decoding an encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and contains a plurality of frames, each frame containing associated encoded audio sample values; by removing immediate playback frames from the encoded audio data bitstream; and by a corresponding non-transitory digital storage medium.

EFFECT: the technical result consists in enabling the processing of immediate playback frames (IPFs) in the MPEG-4 Audio standard.

15 cl, 9 dwg

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to the following priority application: U.S. provisional application 62/720,680 (reference: D18080USP1), filed August 21, 2018, which is incorporated herein by reference.

TECHNICAL FIELD

The present invention relates generally to audio encoders, encoding methods, audio decoders and decoding methods, including a method for decoding an encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame comprising associated encoded audio sample values, and a method for generating an encoded audio data bitstream with immediate playback frames. The present invention also relates to an apparatus for generating immediate playback frames in an encoded audio data bitstream or for removing immediate playback frames from an encoded audio data bitstream.

Although some embodiments are described herein with specific reference to the present invention, it should be understood that the present invention is not limited to such a field of use and is applicable in broader contexts.

BACKGROUND OF THE INVENTION

The MPEG-4 Audio standard, as set out in ISO/IEC 14496-3, "Coding of audio-visual objects, Part 3: Audio", currently has a shortcoming with respect to generating, transmitting and processing immediate playback frames (IPFs). An IPF carries information in a special frame that allows a decoder to be initialized immediately, and thus allows immediate playback upon switching to a data stream containing the special frame. In other words, an IPF is a frame upon receipt of which the decoder can immediately produce correct samples, starting from the first sample encoded in that IPF, because the frame contains all the information needed to do so. An IPF thus denotes an independently decodable frame that can be decoded using only the information it contains.

An encoded audio signal usually arrives in frames or chunks of data. In the context of audio standardized under MPEG-4, the frames/chunks may be known as granules, the encoded chunks/frames are called access units (AUs), and the decoded chunks are called composition units (CUs). In transmission systems, the audio signal can be accessed and addressed only at the granularity of these encoded chunks (access units).

In the context of adaptive streaming, when the audio signal switches to a different configuration (e.g., a different bit rate, such as a bit rate configured within an adaptation set in MPEG-DASH), reproducing the audio samples accurately from the very beginning requires feeding the decoder with AU_n, which represents the corresponding time segment of the audio program, together with the additional access units AU_n-1, AU_n-2, … AU and the configuration data preceding AU_n. Otherwise, because of differing coding configurations (e.g., windowing data, SBR-related data, PS-related data), it cannot be guaranteed that the decoder produces correct output when decoding AU_n alone. Thus, the first AU_n to be decoded with the new configuration must carry the new configuration data and all pre-roll data (in the form of AU_n-x, representing time segments before AU_n) needed to initialize the decoder with the new configuration. This can be done by means of an immediate playback frame (IPF), as defined in the MPEG-H 3D Audio standard or in the MPEG-D USAC standard.
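The pre-roll requirement described above can be illustrated with a minimal Python sketch. The function name `access_units_needed` and the parameter `num_preroll` are hypothetical and introduced only for illustration; the actual number of preceding access units depends on the codec configuration and is not specified here.

```python
from typing import List, Sequence


def access_units_needed(stream: Sequence[str], n: int, num_preroll: int) -> List[str]:
    """Return the access units a decoder must be fed to output AU_n correctly.

    num_preroll is a hypothetical parameter: the number of earlier access
    units the decoder needs in order to build up its internal state.
    """
    if not 0 <= n < len(stream):
        raise IndexError("n is outside the stream")
    # Clamp at the start of the stream: there is nothing before AU_0.
    start = max(0, n - num_preroll)
    return list(stream[start:n + 1])


stream = [f"AU_{i}" for i in range(6)]
print(access_units_needed(stream, n=4, num_preroll=2))
```

In this sketch an IPF for AU_4 would have to carry AU_2 and AU_3 as pre-roll data, in addition to the new configuration data.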

In view of the above, it is an object of the present invention to provide an audio decoder and a decoding method, as well as an audio encoder, a system of audio encoders, an apparatus and an encoding method, that can process IPFs in MPEG-4 Audio.

SUMMARY OF THE INVENTION

According to a first aspect of the present invention, an audio decoder is provided for decoding an encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame comprising associated encoded audio sample values.

The audio decoder may comprise a determination unit configured to determine whether a frame of the encoded audio data bitstream is an immediate playback frame comprising encoded audio sample values associated with a current frame and additional information, wherein the additional information may comprise encoded audio sample values of a number of frames preceding the immediate playback frame, wherein the encoded audio sample values of the preceding frames may be encoded using the same codec configuration as the current frame, and wherein the number of preceding frames, corresponding to pre-roll frames, may correspond to the number of frames needed by the decoder to build up the full signal so as to be able to output valid audio sample values associated with the current frame whenever an immediate playback frame is decoded.

The audio decoder may further comprise an initialization unit configured to initialize the decoder if the determination unit determines that the frame is an immediate playback frame, wherein initializing the decoder may comprise decoding the encoded audio sample values included in the additional information before decoding the encoded audio sample values associated with the current frame, wherein the initialization unit may be configured to switch the audio decoder from a current codec configuration to a different codec configuration if the determination unit determines that the frame is an immediate playback frame and if the audio sample values of the current frame have been encoded using the different codec configuration, and wherein the decoder may be configured to decode the current frame using the current codec configuration and to discard the additional information if the determination unit determines that the frame is an immediate playback frame and if the audio sample values of the current frame have been encoded using the current codec configuration.
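The decision logic of the determination and initialization units can be sketched roughly as follows. This is a toy model under stated assumptions, not the claimed implementation: the `Frame` and `SketchDecoder` classes, the string payloads and the `log` list are hypothetical stand-ins, and a frame counts as an IPF here simply when it carries pre-roll data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    payload: str                          # encoded samples of the current frame
    config: str                           # codec configuration used for this frame
    preroll: Optional[List[str]] = None   # additional information; set only for an IPF

    @property
    def is_ipf(self) -> bool:
        return self.preroll is not None


@dataclass
class SketchDecoder:
    config: str                           # current codec configuration
    log: List[str] = field(default_factory=list)

    def _decode(self, payload: str, emit: bool) -> None:
        # Pre-roll frames only build up decoder state; their output is not played.
        self.log.append(("out:" if emit else "init:") + payload)

    def process(self, frame: Frame) -> None:
        if frame.is_ipf and frame.config != self.config:
            # IPF with a different configuration: switch, then decode the pre-roll first.
            self.config = frame.config
            for access_unit in frame.preroll:
                self._decode(access_unit, emit=False)
        # IPF in the current configuration: the additional information is
        # discarded and the frame is decoded like any other frame.
        self._decode(frame.payload, emit=True)


decoder = SketchDecoder(config="A")
decoder.process(Frame("f0", "A"))
decoder.process(Frame("f1", "A", preroll=["p0"]))           # same config: pre-roll ignored
decoder.process(Frame("f2", "B", preroll=["f0b", "f1b"]))   # new config: initialize first
print(decoder.log)
```

The sketch deliberately leaves out the case of a non-IPF frame arriving in a different configuration, which the passage above does not address.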

In some embodiments, the additional information may further comprise information on the codec configuration used for encoding the audio sample values associated with the current frame, and the determination unit may further be configured to determine whether the codec configuration of the additional information differs from the current codec configuration.

In some embodiments, the immediate playback frame may comprise the additional information as an extension payload, and the determination unit may be configured to evaluate the extension payload of the immediate playback frame.

In some embodiments, the encoded audio data bitstream may be an MPEG-4 Audio bitstream.

In some embodiments, the additional information may be conveyed via an MPEG-4 Audio bitstream extension mechanism that is either a data stream element (DSE) or an extension_payload element.

In some embodiments, either the data stream element (DSE) or the extension_payload element may be located at a predetermined position in the MPEG-4 Audio bitstream and/or may carry a specific tag signaling that the payload of the data stream element (DSE) or of the extension_payload element is the additional information.

The extension_payload element may, for example, be located at different places in the MPEG-4 Audio bitstream syntax. Accordingly, this makes the immediate playback frame functionality available in MPEG-4 Audio as well.

In some embodiments, the extension_payload element may be contained inside a fill element (ID_FIL).

In some embodiments, the additional information may further comprise a unique identifier, and optionally the unique identifier may be used to detect a different codec configuration.

In some embodiments, the decoder may further comprise a crossfading unit configured to crossfade output sample values obtained by flushing the decoder in the previous codec configuration with output sample values obtained by decoding the encoded audio sample values associated with the current frame.
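A crossfade of this kind can be sketched as below. The linear fade curve, the function name `crossfade` and the equal-length requirement are assumptions made for illustration; the text above does not specify the fade shape or length.

```python
from typing import List, Sequence


def crossfade(flushed: Sequence[float], fresh: Sequence[float]) -> List[float]:
    """Linearly fade from the flushed output of the previous codec
    configuration to the output decoded in the new configuration.

    The linear weighting is an assumption, not taken from the source.
    """
    if len(flushed) != len(fresh):
        raise ValueError("both inputs must cover the same number of samples")
    n = len(flushed)
    out = []
    for i in range(n):
        # Weight of the new configuration rises from 1/(n+1) to n/(n+1).
        w = (i + 1) / (n + 1)
        out.append((1.0 - w) * flushed[i] + w * fresh[i])
    return out


mixed = crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print(mixed)
```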

In some embodiments, the earliest frame of the number of frames included in the additional information may be neither time-differentially encoded nor entropy encoded relative to any frame preceding the earliest frame, and the immediate playback frame may be neither time-differentially encoded nor entropy encoded relative to any frame preceding the earliest frame of the number of frames preceding the immediate playback frame, or relative to any frame preceding the immediate playback frame.

According to a second aspect of the present invention, a method is provided for decoding an encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame comprising associated encoded audio sample values.

The method may include determining whether a frame of the encoded audio data bitstream is an immediate playback frame comprising encoded audio sample values associated with a current frame and additional information, wherein the additional information may comprise encoded audio sample values of a number of frames preceding the immediate playback frame, wherein the encoded audio sample values of the preceding frames may be encoded using the same codec configuration as the immediate playback frame, and wherein the number of preceding frames, corresponding to pre-roll frames, may correspond to the number of frames needed by the decoder to build up the full signal so as to be able to output valid audio sample values associated with the current frame whenever an immediate playback frame is decoded.

The method may further include initializing the decoder if the frame is determined to be an immediate playback frame, wherein the initializing may comprise decoding the encoded audio sample values included in the additional information before decoding the encoded audio sample values associated with the current frame.

The method may further include switching the audio decoder from a current codec configuration to a different codec configuration if the frame is determined to be an immediate playback frame and if the audio sample values of the immediate playback frame have been encoded using the different codec configuration.

The method may also include decoding the immediate playback frame using the current codec configuration and discarding the additional information if the frame is determined to be an immediate playback frame and if the audio sample values of the immediate playback frame have been encoded using the current codec configuration.

Configured as proposed, the method allows, for example, switching of AudioObjectTypes (AOTs), as defined in ISO/IEC 14496-3, while continuously producing correct output samples and without introducing gaps of silence into the audio output.

In some embodiments, the additional information may further comprise information on the codec configuration used for encoding the audio sample values associated with the current frame, and the method may further include determining whether the codec configuration of the additional information differs from the current codec configuration used for encoding the audio sample values associated with the frames in the bitstream that precede the immediate playback frame.

In some embodiments, the additional information may be conveyed via an MPEG-4 Audio bitstream extension mechanism that is either a data stream element (ID_DSE) or an extension_payload element.

In some embodiments, either the data stream element (ID_DSE) or the extension_payload element may be located at a predetermined position in the MPEG-4 Audio bitstream and/or may carry a specific tag signaling that the payload of the data stream element (ID_DSE) or of the extension_payload element is the additional information.

In some embodiments, the encoded audio data bitstream may comprise a first number of frames encoded using a first codec configuration and a second number of frames following the first number of frames and encoded using a second codec configuration, wherein the first frame of the second number of frames may be the immediate playback frame.

According to a third aspect of the present invention, an audio encoder is provided for generating an encoded audio data bitstream with immediate playback frames, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame comprising associated encoded audio sample values.

The audio encoder may comprise a core encoder configured to encode uncompressed audio sample values associated with the plurality of frames using a predetermined codec configuration.

The audio encoder may further comprise a buffer configured to store the encoded audio sample values of a number of frames preceding a current frame of the plurality of frames encoded using the predetermined codec configuration.

The audio encoder may also comprise an embedder configured to write an immediate playback frame in the current frame of the plurality of frames, wherein the immediate playback frame may comprise the encoded audio sample values associated with said current frame and additional information corresponding to the encoded audio sample values of the number of frames preceding said current frame.
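The interplay of core encoder, buffer and embedder can be sketched as follows. Everything named here is hypothetical: `SketchEncoder`, the string-based `_core_encode` placeholder, `num_preroll`, and in particular `ipf_interval`, since the text above does not say when the encoder chooses to write an IPF.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Frame:
    payload: str                          # encoded samples of the current frame
    preroll: Optional[List[str]] = None   # additional information; set only for an IPF


class SketchEncoder:
    def __init__(self, num_preroll: int, ipf_interval: int) -> None:
        if ipf_interval < 1:
            raise ValueError("ipf_interval must be at least 1")
        # Buffer: keeps only the most recent num_preroll encoded frames.
        self.buffer: Deque[str] = deque(maxlen=num_preroll)
        self.ipf_interval = ipf_interval   # assumed policy: every k-th frame is an IPF
        self.count = 0

    def _core_encode(self, pcm: str) -> str:
        # Placeholder for the core encoder; a real codec would compress samples.
        return f"enc({pcm})"

    def encode(self, pcm: str) -> Frame:
        payload = self._core_encode(pcm)
        make_ipf = self.count % self.ipf_interval == 0
        # Embedder: copy the buffered preceding frames into the IPF.
        frame = Frame(payload, list(self.buffer) if make_ipf else None)
        self.buffer.append(payload)
        self.count += 1
        return frame


encoder = SketchEncoder(num_preroll=2, ipf_interval=3)
frames = [encoder.encode(f"x{i}") for i in range(4)]
print(frames[3])
```

With these settings the fourth frame is an IPF whose additional information holds the two frames encoded just before it, while the very first IPF has empty pre-roll because nothing precedes it.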

In some embodiments, the embedder may be further configured to include information on the predetermined codec configuration in the additional information.

In some embodiments, the embedder may be further configured to include the additional information in the immediate playback frame.

In some embodiments, the generated encoded audio data bitstream may be an MPEG-4 Audio bitstream.

In some embodiments, the embedder may be further configured to insert the additional information into the bitstream via an MPEG-4 Audio bitstream extension mechanism that is either a data stream element (ID_DSE) or an extension_payload element.

In some embodiments, the embedder may be further configured to place either the data stream element (ID_DSE) or the extension_payload element at a predetermined position in the MPEG-4 Audio bitstream and/or to assign a specific tag signaling that the payload of the data stream element (ID_DSE) or of the extension_payload element is the additional information.

In some embodiments, the embedder may be further configured to insert the extension_payload element inside a fill element (ID_FIL).

In some embodiments, the embedder may be further configured to include a unique identifier in the additional information, and optionally the unique identifier may signal the predetermined codec configuration.

В некоторых вариантах осуществления звуковой кодер может быть дополнительно выполнен с возможностью не подвергать временному дифференциальному кодированию или энтропийному кодированию самый ранний кадр из количества кадров, содержащихся в дополнительной информации, относительно любого кадра до самого раннего кадра, и звуковой кодер может быть дополнительно выполнен с возможностью не подвергать временному дифференциальному кодированию или энтропийному кодированию кадр немедленного воспроизведения относительно любого кадра до самого раннего кадра из количества кадров, предшествующих кадру немедленного воспроизведения, или относительно любого кадра до кадра немедленного воспроизведения.In some embodiments, the audio encoder may be further configured to not temporally differential or entropy encode the earliest frame of the number of frames contained in the side information relative to any frame prior to the earliest frame, and the audio encoder may be further configured not to temporal differential encoding or entropy encoding of the instant playback frame with respect to any frame up to the earliest frame of the number of frames preceding the instant playback frame, or with respect to any frame before the instant playback frame.

According to a fourth aspect of the present invention, there is provided a system comprising two or more audio encoders for generating a plurality of encoded audio data bitstreams, each having immediate playback frames, wherein each encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, and wherein each frame contains associated encoded audio sample values.

In some embodiments, a predetermined sampling rate may be the same for each of the core encoders of the two or more audio encoders. Resampling and additional delay processing at the decoder can thus be avoided.

In some embodiments, the system may further comprise a delay alignment unit for aligning the delay of the plurality of bitstreams. This permits seamless switching at the decoder by compensating for the delays of the different encoders.

According to a fifth aspect of the present invention, there is provided a method for generating, by an audio encoder, an encoded audio data bitstream with immediate playback frames, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame containing associated encoded audio sample values.

The method may include the step of encoding, by a core encoder, uncompressed audio sample values associated with the plurality of frames using a predetermined codec configuration.

The method may further include the step of storing, by a buffer, the encoded audio sample values of a number of frames, preceding a current frame, of the plurality of frames encoded using the predetermined codec configuration.

And the method may include the step of writing, by an embedder, an immediate playback frame into the current frame of the plurality of frames, wherein the immediate playback frame may contain the encoded audio sample values associated with said current frame and additional information corresponding to the encoded audio sample values of the number of frames preceding said current frame.
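The three steps above (encode, buffer, embed) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Frame` container, the `encode` callable, and the `num_preroll` and `ipf_interval` parameters are hypothetical names introduced only for this example, and real frames would be MPEG-4 Audio access units rather than Python objects.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class Frame:
    """Hypothetical container for one frame of the encoded bitstream."""
    payload: bytes                                       # encoded samples of this frame
    is_ipf: bool = False                                 # True for an immediate playback frame
    preroll: List[bytes] = field(default_factory=list)   # additional information (IPF only)
    config: Optional[bytes] = None                       # codec configuration (IPF only)


def generate_stream(pcm_frames: Iterable,
                    encode: Callable[[object], bytes],
                    config: bytes,
                    num_preroll: int,
                    ipf_interval: int) -> List[Frame]:
    """Encode frames and turn every ipf_interval-th frame into an IPF."""
    history = deque(maxlen=num_preroll)   # buffer of the most recent encoded frames
    stream = []
    for index, pcm in enumerate(pcm_frames):
        payload = encode(pcm)             # core encoder step
        if index % ipf_interval == 0:
            # Embedder step: attach copies of the buffered preceding frames
            # and the codec configuration to the current frame.
            stream.append(Frame(payload, True, list(history), config))
        else:
            stream.append(Frame(payload))
        history.append(payload)           # buffer step
    return stream
```

With `num_preroll=2` and `ipf_interval=3`, frames 0 and 3 become immediate playback frames; frame 3 carries the encoded payloads of frames 1 and 2, while frame 0 has no predecessors to carry.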

In some embodiments, the additional information may further comprise information about the predetermined codec configuration.

In some embodiments, the immediate playback frame may further comprise the additional information.

In some embodiments, the additional information may be inserted into the bitstream by the embedder by means of an MPEG-4 Audio bitstream extension mechanism, which may be either a data stream element (ID_DSE) or an extension_payload element.

In some embodiments, either the data stream element (ID_DSE) or the extension_payload element may be placed by the embedder at a predetermined position in the MPEG-4 Audio bitstream and/or may be assigned a specific tag signalling that the payload of the data stream element (ID_DSE) or of the extension_payload element is the additional information.

In some embodiments, the extension_payload element may be inserted by the embedder inside a fill element (ID_FIL).

In some embodiments, the additional information may further comprise a unique identifier, and optionally the unique identifier may signal the predetermined codec configuration.

In some embodiments, the audio encoder may not apply time-differential coding or entropy coding to the earliest frame of the number of frames contained in the additional information relative to any frame preceding that earliest frame, and the audio encoder may not apply time-differential coding or entropy coding to the immediate playback frame relative to any frame preceding the earliest frame of the number of frames preceding the immediate playback frame, or relative to any frame preceding the immediate playback frame.

According to a sixth aspect of the present invention, there is provided an apparatus for generating immediate playback frames in an encoded audio data bitstream or for removing immediate playback frames from an encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame containing associated encoded audio sample values.

The apparatus may comprise a receiver configured to receive the encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame containing associated encoded audio sample values.

And the apparatus may comprise an embedder configured to write an immediate playback frame into a current frame of the plurality of frames, wherein the immediate playback frame may contain the encoded audio sample values associated with said current frame and additional information corresponding to the encoded audio sample values of a number of frames preceding said current frame.

Configured as proposed, the apparatus allows immediate playback frames to be generated separately in any already existing encoded audio data bitstream, i.e., prior to distribution, if required.

In some embodiments, the apparatus may further comprise a buffer configured to store the encoded audio sample values of the number of frames, preceding the current frame, of the plurality of frames.

In some embodiments, the embedder may be further configured to remove from the immediate playback frame the additional information corresponding to the encoded audio sample values of the number of frames preceding said current frame.

This allows immediate playback frames to be removed separately from the encoded audio data bitstream, for example where the audio encoder generates only immediate playback frames.

According to a seventh aspect of the present invention, there is provided a permanent digital storage medium storing a computer program which, when executed by a computer or processor, performs a method for decoding an encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame containing associated encoded audio sample values.

According to an eighth aspect of the present invention, there is provided a permanent digital storage medium storing a computer program which, when executed by a computer or processor, performs a method for generating, by an audio encoder, an encoded audio data bitstream with immediate playback frames, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame containing associated encoded audio sample values.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments of the present invention are described below, by way of example only, with reference to the accompanying drawings, in which:

Fig. 1 shows an example of an immediate playback frame in an MPEG-4 Audio bitstream of encoded audio data;

Fig. 2 shows an example of a method for decoding an encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame containing associated encoded audio sample values;

Fig. 3 shows another example of a method for decoding an encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame containing associated encoded audio sample values;

Fig. 4 shows an example of an audio decoder for decoding an encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame containing associated encoded audio sample values;

Fig. 5 shows an example of an audio encoder for generating an encoded audio data bitstream with random access points (immediate playback frames, IPFs);

Fig. 6 shows an example of a method for generating an encoded audio data bitstream with random access points (immediate playback frames, IPFs);

Fig. 7 shows an example of a system of audio encoders for generating a plurality of encoded audio data bitstreams, each having time-synchronized random access points (immediate playback frames, IPFs);

Fig. 8 shows an example of an apparatus for generating random access points (immediate playback frames, IPFs) in an encoded audio data bitstream or for removing random access points (immediate playback frames, IPFs) from an encoded audio data bitstream;

Fig. 9 shows an example of an apparatus having a processor for executing a computer program stored on a permanent digital storage medium.

DETAILED DESCRIPTION

The present invention relates to the generation (encoding), transmission (bitstream) and processing (decoding) of IPFs in MPEG-4 Audio, for example in the context of a data stream containing audio standardized according to another standard, such as the MPEG-H 3D Audio standard. Hereinafter, MPEG-4 Audio bitstreams may refer to bitstreams compliant with the standard set out in ISO/IEC 14496-3, "Coding of audio-visual objects, Part 3: Audio", and all future editions, revisions and amendments thereto (hereinafter "MPEG-4 Audio"). To enable IPF functionality in MPEG-4 Audio, there are several options for generating and transmitting the AUs and configuration data that precede AU_n in time as part of the same payload packet that is used for AU_n. This is done to ensure correct output of audio samples starting from the first sample obtained by decoding AU_n.

The encoded audio data bitstream may contain a sequence of audio sample values, for example payload packets. The encoded audio data bitstream may further comprise a plurality of frames. Each frame may contain associated encoded audio sample values. In other words, each payload packet may belong to a corresponding frame or AU.

In one example, an audio payload packet may conform to the following syntax, as defined in ISO/IEC 14496-3:

raw_data_block() {
    while( (id = id_syn_ele) != ID_END ) {    /* id_syn_ele: 3 bits, uimsbf */
        switch (id) {
            case ID_SCE: single_channel_element(); break;
            case ID_CPE: channel_pair_element(); break;
            case ID_CCE: coupling_channel_element(); break;
            case ID_LFE: lfe_channel_element(); break;
            case ID_DSE: data_stream_element(); break;
            case ID_PCE: program_config_element(); break;
            case ID_FIL: fill_element();
        }
    }
    byte_align()
}
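The element loop in the syntax above can be illustrated with a small bit reader. The sketch below is not a decoder: the `BitReader` class and the `element_parsers` table are hypothetical helpers introduced here, and the caller must supply a parser for every element type, because the real element syntaxes defined in ISO/IEC 14496-3 are far more involved. The numeric values of `id_syn_ele` (0 for ID_SCE through 7 for ID_END) follow the standard.

```python
# 3-bit id_syn_ele values as assigned in ISO/IEC 14496-3.
ID_NAMES = {0: "ID_SCE", 1: "ID_CPE", 2: "ID_CCE", 3: "ID_LFE",
            4: "ID_DSE", 5: "ID_PCE", 6: "ID_FIL"}
ID_END = 7


class BitReader:
    """Hypothetical MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, nbits: int) -> int:
        """Read nbits as an unsigned integer, most significant bit first."""
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def byte_align(self) -> None:
        """Advance to the next byte boundary (no-op if already aligned)."""
        self.pos = (self.pos + 7) & ~7


def walk_raw_data_block(reader: BitReader, element_parsers: dict) -> list:
    """Return the element names found in one raw_data_block().

    element_parsers maps an element name to a callable that consumes that
    element's bits from the reader; these callables are assumptions of
    this sketch and are not provided here.
    """
    seen = []
    while (element_id := reader.read(3)) != ID_END:
        name = ID_NAMES[element_id]
        element_parsers[name](reader)  # consume the element body
        seen.append(name)
    reader.byte_align()
    return seen
```

A walker of this kind is enough to locate a data_stream_element() or fill_element() within an access unit, provided the other element parsers skip their bodies correctly.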

The audio payload packet described above may be compatible with current and future versions of the MPEG standard, such as the MPEG-4 Audio standard. In one embodiment, the encoded audio data bitstream may be an MPEG-4 Audio bitstream (i.e., a bitstream compliant with the MPEG-4 Audio standard).

A frame of the encoded audio data bitstream may be an immediate playback frame (random access point, special frame) containing the encoded audio sample values associated with the current frame and additional information. The additional information may contain the encoded audio sample values of a number of frames preceding the immediate playback frame, wherein the encoded audio sample values of the preceding frames may be encoded using the same codec configuration as the current frame. The number of preceding frames, corresponding to pre-roll frames, may correspond to the number of frames the decoder needs in order to build up the full signal, so that it can output valid audio sample values associated with the current frame whenever an immediate playback frame is decoded. The full signal may, for example, be built up during start-up or restart of the decoder. The immediate playback frame may be, for example, the first frame after start-up of the decoder.
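Why the preceding frames are needed can be shown with a toy model. In the sketch below, `ToyDecoder` is a hypothetical stand-in for a decoder whose output for a frame depends on the previous frame (as with overlapping transforms); it is not an MPEG-4 Audio decoder, and integer "access units" are used only to keep the arithmetic visible. The helper `decode_ipf` feeds the pre-roll frames through the decoder, discards their output, and returns only the output for the current frame.

```python
class ToyDecoder:
    """Toy stateful decoder: each output combines the current and previous frame."""

    def __init__(self):
        self.prev = None  # state carried over from the previously decoded frame

    def decode(self, au: int):
        """Return (output, valid); output is valid only once state has built up."""
        valid = self.prev is not None
        output = (self.prev or 0) + au
        self.prev = au
        return output, valid


def decode_ipf(decoder: ToyDecoder, preroll_aus: list, current_au: int):
    """Decode the pre-roll frames for their side effect, then the current frame."""
    for au in preroll_aus:
        decoder.decode(au)  # output discarded; only the decoder state matters
    return decoder.decode(current_au)
```

Started cold without pre-roll, the toy decoder flags its first output as invalid; with one pre-roll frame, the first output it returns for the current frame is already valid.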

In one embodiment, the additional information may be conveyed by means of an MPEG-4 bitstream extension mechanism, which may be either a data stream element (ID_DSE) or an extension_payload element. The extension_payload element may, for example, be located at different places in the MPEG-4 Audio bitstream syntax, for example at different levels. In one embodiment, the extension_payload element may be located inside a fill element (ID_FIL).

The additional information may thus be conveyed by means of the MPEG-4 Audio bitstream extension mechanism, for example according to one of the following options:

Option 1:

raw_data_block() -> case ID_DSE -> data_stream_element() -> data_stream_byte[element_instance_tag][i];

Option 2:

raw_data_block() -> case ID_FIL -> fill_element() -> extension_payload(cnt) -> switch( extension_type ) -> extension_type == EXT_DATA_ELEMENT (+ convention how to identify) or EXT_AUDIO_PRE_ROLL

Option 3:

er_raw_data_block() or er_raw_data_block_eld() -> extension_payload(cnt) -> switch( extension_type ) -> extension_type == EXT_DATA_ELEMENT (+ convention how to identify) or EXT_AUDIO_PRE_ROLL

A data stream element (signalled by id_syn_ele equal to ID_DSE) or a fill element (signalled by id_syn_ele equal to ID_FIL), or their equivalents as defined in ISO/IEC 14496-3 and/or future standards, may be used to carry extension payloads, which can be used to further extend the information conveyed in such a payload packet without breaking compatibility with legacy decoders.

Thus, in the context of MPEG-4 Audio, either the data stream element (ID_DSE) or the extension_payload element, which may be located inside a fill element (ID_FIL), may be used to convey the AUs and configuration information that represent time segments before AU_n (i.e., the additional information) in the same payload packet as AU_n (i.e., the current frame, the immediate playback frame). This may further be used to apply processing that makes IPF functionality available in MPEG-4 Audio as well. This is similar to MPEG-D USAC, where an extension mechanism (usacExtElement) can be used to convey the AudioPreRoll() payload, but with some differences.

In one embodiment, an extension element (e.g., an extension_payload element) may be inserted into an AU, with the element type being signalled in the bitstream for each element and AU. In one example, the extension element may be the first element in the payload packet, preceding the first audio element.

Each audio pre-roll element, defined below, may be identified by a universally unique identifier (UUID). The UUID field may be used by the audio encoder to signal, and by the audio decoder to detect, a switch of stream configuration. If the UUID field has changed relative to the previous frame or to the initial state (e.g., when the decoder is first started), the stream configuration may have changed, and the pre-roll payload must be evaluated to ensure correct decoding. If the UUID has not changed relative to the previous frame, the decoder may skip the audio_preroll_element() payload and proceed with normal decoding.
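The UUID rule described above amounts to a comparison against the last identifier seen. The following is a minimal sketch of that decision only; `PrerollGate` and `needs_preroll` are hypothetical names, and the UUID is treated as an opaque byte string.

```python
from typing import Optional


class PrerollGate:
    """Tracks the last seen UUID and reports when pre-roll must be evaluated."""

    def __init__(self):
        # Initial state: no UUID seen yet, so the first frame always differs.
        self.last_uuid: Optional[bytes] = None

    def needs_preroll(self, uuid: bytes) -> bool:
        """Return True if the UUID changed relative to the previous frame
        or to the initial state; False means the payload may be skipped."""
        changed = uuid != self.last_uuid
        self.last_uuid = uuid
        return changed
```

A decoder using such a gate would evaluate the pre-roll payload on start-up and after every stream switch, and would otherwise skip it.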

In one example, if the UUID is absent, the decoder may compare the AudioSpecificConfig contained in audio_preroll_element() with the current decoder configuration to detect a stream configuration switch.

The flags field is an 8-bit bit field that may be used to signal additional information to the decoder, for example whether a crossfade should be applied, or the type of crossfade (e.g., linear, logarithmic). In the example below, one bit signals whether the UUID is present in the bitstream and another bit signals whether the pre-roll payload is present.
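As an illustration only, the two presence bits tested in the syntax below (0x01 before the uuid field, 0x02 before the payload) could be decoded as in the following sketch. The names `PrerollFlags` and `decode_flags` are hypothetical, and the remaining six bits are simply reported as reserved because the text does not fix their meaning.

```python
from dataclasses import dataclass

UUID_PRESENT = 0x01     # bit tested before the uuid field in the syntax
PAYLOAD_PRESENT = 0x02  # bit tested before the pre-roll payload


@dataclass(frozen=True)
class PrerollFlags:
    uuid_present: bool
    payload_present: bool
    reserved: int  # remaining six bits; their meaning is not fixed by the text


def decode_flags(flags: int) -> PrerollFlags:
    """Split the 8-bit flags field into the two presence bits and the rest."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit in 8 bits")
    return PrerollFlags(
        uuid_present=bool(flags & UUID_PRESENT),
        payload_present=bool(flags & PAYLOAD_PRESENT),
        reserved=flags >> 2,
    )
```

A crossfade indication, if used, would have to be assigned to one of the reserved bits by convention between encoder and decoder.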

The stream configuration must be known to the decoder before any AU is processed. In MPEG-4 Audio, the decoder configuration is conveyed in the AudioSpecificConfig element. The decoder configuration and the UUID are part of the pre-roll payload. In addition, the pre-roll payload contains a configuration-dependent number of MPEG-4 Audio AUs (raw_data_block).

An immediate playback frame (IPF) in MPEG-4 may contain exactly one audio_preroll_element() payload, as described below, and one or more audio element streams (e.g., single_channel_element()) as defined in ISO/IEC 14496-3. The audio element streams are associated with the current timestamp. The pre-roll payload may be carried in one of the extension payload mechanisms of MPEG-4 Audio.

FIG. 1 shows an immediate playback frame (AU_n) 1 containing two pre-roll frames (AU_n-1, AU_n-2) 2, 3, together with the corresponding stream configuration 4 (AudioSpecificConfig) and stream identifier 5 (UUID).

The audio pre-roll element may be defined as follows:

audio_preroll_element() {
    flags;                                      8 uimsbf
    if ((flags & 0x01) == 1)
        uuid;                                   128 uimsbf
    if ((flags & 0x02) == 0) return;            // No payload present
    asc_size = bs_asc_size;                     8 uimsbf
    if (asc_size == 255)
        asc_size += esc;                        8 uimsbf
    AudioSpecificConfig();                      asc_size * 8
    n_preroll_frames;                           8 uimsbf
    for (f = 0; f < n_preroll_frames; ++f) {
        au_size = bs_au_size;                   8 uimsbf
        if (au_size == 255)
            au_size += esc;                     8 uimsbf
        raw_data_block();                       au_size * 8
    }
}
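The syntax above can be read byte by byte, since every field is a whole number of bytes. The following is a minimal parsing sketch, assuming the element starts at a byte boundary of the supplied buffer; the names `AudioPreroll` and `parse_audio_preroll_element` are hypothetical, and the AudioSpecificConfig and raw_data_block contents are returned as opaque bytes rather than decoded.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

UUID_PRESENT = 0x01
PAYLOAD_PRESENT = 0x02


@dataclass
class AudioPreroll:
    flags: int
    uuid: Optional[bytes] = None
    asc: Optional[bytes] = None  # raw AudioSpecificConfig bytes
    preroll_aus: List[bytes] = field(default_factory=list)


def _take(buf: bytes, pos: int, n: int) -> Tuple[bytes, int]:
    """Return n bytes starting at pos and the new position; fail on truncation."""
    if pos + n > len(buf):
        raise ValueError("truncated audio_preroll_element")
    return buf[pos:pos + n], pos + n


def _read_size(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read an 8-bit size; the value 255 is followed by an 8-bit escape that is added."""
    raw, pos = _take(buf, pos, 1)
    size = raw[0]
    if size == 255:
        esc, pos = _take(buf, pos, 1)
        size += esc[0]
    return size, pos


def parse_audio_preroll_element(buf: bytes) -> AudioPreroll:
    raw, pos = _take(buf, 0, 1)
    element = AudioPreroll(flags=raw[0])
    if element.flags & UUID_PRESENT:
        element.uuid, pos = _take(buf, pos, 16)  # 128-bit uuid
    if not element.flags & PAYLOAD_PRESENT:
        return element  # no payload present
    asc_size, pos = _read_size(buf, pos)
    element.asc, pos = _take(buf, pos, asc_size)
    raw, pos = _take(buf, pos, 1)  # n_preroll_frames
    for _ in range(raw[0]):
        au_size, pos = _read_size(buf, pos)
        au, pos = _take(buf, pos, au_size)
        element.preroll_aus.append(au)
    return element
```

With the escape mechanism as written, a single AudioSpecificConfig or pre-roll AU can be at most 255 + 255 = 510 bytes long.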

In one example, the audio pre-roll element (e.g., audio_preroll_element()) is byte-aligned and can therefore be carried, without additional byte alignment, either by an extension_payload element (e.g., inside a fill element) or by a data stream element.

In one example, an IPF relates to the pre-roll element as follows: an IPF contains both the current AU and the additional AUs (i.e., the number of preceding frames) that are needed for correct decoding. The additional AUs are packed as part of the pre-roll element, which in turn is packed into a raw_data_block() (via ID_DSE or ID_FIL). Such a raw_data_block may constitute an IPF.

IPFs may be encoded in various ways. In one example, a pre-roll frame must be independently decodable; for example, if SBR is used, an SBR header must be present. In one embodiment, audio_preroll_element() may be encapsulated in a data stream element, for example based on the following syntax:

data_stream_element() {
    element_instance_tag;                       4 uimsbf
    data_byte_align_flag;                       1 uimsbf
    cnt = count;                                8 uimsbf
    if (cnt == 255)
        cnt += esc_count;                       8 uimsbf
    if (data_byte_align_flag)
        byte_alignment();
    for (i = 0; i < cnt; i++)
        data_stream_byte[element_instance_tag][i];    8 uimsbf
}

A convention may be used to identify the data stream element (ID_DSE) that carries audio_preroll_element(). In one embodiment, the ID_DSE may be located at a predetermined position in the stream and/or may have a specific tag signaling that the payload is an audio_preroll_element().

In another embodiment, audio_preroll_element() may be encapsulated in an extension_payload element inside a fill element, for example based on the following syntax:

fill_element() {
    cnt = count;                                4 uimsbf
    if (cnt == 15)
        cnt += esc_count - 1;                   8 uimsbf
    while (cnt > 0) {
        cnt -= extension_payload(cnt);
    }
}
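The count arithmetic of the fill element has a small quirk: the escape value is added minus one, so the largest byte count that can be expressed is 15 + 255 - 1 = 269. The following sketch, with the hypothetical helper names `encode_fill_count` and `decode_fill_count`, shows how a writer might split a byte count into the count and esc_count fields and how a reader recovers it. It only covers the count fields, not the payload itself.

```python
from typing import Optional, Tuple

MAX_FILL_CNT = 15 + 255 - 1  # largest cnt expressible by count and esc_count


def encode_fill_count(cnt: int) -> Tuple[int, Optional[int]]:
    """Return (count, esc_count); esc_count is None when no escape byte is written."""
    if not 0 <= cnt <= MAX_FILL_CNT:
        raise ValueError("cnt does not fit in a single fill_element")
    if cnt < 15:
        return cnt, None
    return 15, cnt - 14  # decoder computes 15 + esc_count - 1


def decode_fill_count(count: int, esc_count: Optional[int] = None) -> int:
    """Mirror of the cnt computation in the fill_element() syntax."""
    cnt = count
    if cnt == 15:
        if esc_count is None:
            raise ValueError("count == 15 requires esc_count")
        cnt += esc_count - 1
    return cnt
```

A pre-roll payload larger than this limit would not fit into one fill element under this syntax, which is one practical difference from the data stream element, whose count can reach 255 + 255 = 510.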

As in the previous example, a convention may be used to identify the type of the extension_payload element; for example, if an EXT_DATA_ELEMENT is transmitted at a certain predetermined position in the payload packet, then its payload is an audio_preroll_element(). Thus, in one embodiment, the extension_payload element may be located at a predetermined position in the stream and/or may have a specific tag signaling that the payload is an audio_preroll_element().

In one example, an extension payload in accordance with the present invention may be signaled using a new extension payload type, for example extension_type = EXT_AUDIO_PRE_ROLL = 1010b.

extension_payload(cnt) {
    extension_type;                             4 uimsbf
    align = 4;
    switch (extension_type) {
        case EXT_AUDIO_PRE_ROLL:
            // Always byte-aligned, do not modify align
            audio_preroll_element();
            break;
        [...]

In one example, the data stream element (ID_DSE) or extension_payload element (e.g., inside a fill element (ID_FIL)) that carries audio_preroll_element() may be inserted into the bitstream before any audio element in the same payload packet. Some examples of inserting the IPF payload in this way are:

Mono: <ID_(DSE|FIL)><ID_SCE>…<ID_END>
5.1: <ID_(DSE|FIL)><ID_SCE><ID_CPE><ID_CPE><ID_LFE>…<ID_END>
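A decoder relying on this ordering convention could check the sequence of syntactic element identifiers in a frame as sketched below. The numeric identifier values follow the id_syn_ele table of ISO/IEC 14496-3; the function name `preroll_carrier_index` and the idea of receiving the identifiers as a plain list are assumptions made for illustration.

```python
from typing import Optional, Sequence

# Syntactic element identifiers (id_syn_ele) of ISO/IEC 14496-3
ID_SCE, ID_CPE, ID_CCE, ID_LFE, ID_DSE, ID_PCE, ID_FIL, ID_END = range(8)

AUDIO_ELEMENTS = frozenset({ID_SCE, ID_CPE, ID_CCE, ID_LFE})
CARRIERS = frozenset({ID_DSE, ID_FIL})


def preroll_carrier_index(element_ids: Sequence[int]) -> Optional[int]:
    """Return the index of a DSE/FIL element that precedes every audio element.

    Returns None when an audio element or ID_END is seen first, i.e. when the
    frame does not follow the ordering convention.
    """
    for index, element_id in enumerate(element_ids):
        if element_id in CARRIERS:
            return index
        if element_id in AUDIO_ELEMENTS or element_id == ID_END:
            return None
    return None
```

For the mono and 5.1 layouts listed above the function returns 0, meaning the first element of the frame is the candidate carrier of audio_preroll_element().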

One aspect of the present invention relates to the decoding of IPFs. Referring to the example of FIG. 2, a decoding method may include step S101 of receiving an encoded audio data bitstream. The encoded audio data bitstream may represent a sequence of audio sample values and may comprise a plurality of frames, wherein each frame may contain associated encoded audio sample values.

The method may further include step S102 of determining whether a frame of the encoded audio data bitstream is an immediate playback frame. An immediate playback frame may contain encoded audio sample values associated with the current frame and additional information. The additional information may contain encoded audio sample values of a number of frames preceding the immediate playback frame, wherein the encoded audio sample values of the preceding frames are encoded using the same codec configuration as the immediate playback frame. The number of preceding frames, which correspond to pre-roll frames, may correspond to the number of frames the decoder needs to build up the full signal so that it can output valid audio sample values associated with the current frame whenever an immediate playback frame is decoded. The full signal may, for example, be built up during start-up or restart of the decoder. The immediate playback frame may be, for example, the first frame after start-up of the decoder.

The method may also include step S103 of initializing the decoder if the frame is determined to be an immediate playback frame. The initialization may include decoding the encoded audio sample values contained in the additional information before decoding the encoded audio sample values associated with the current frame.

Referring now to the example of FIG. 3, the audio decoder may be switched from the current codec configuration to a different codec configuration if the frame is determined to be an immediate playback frame and the audio sample values of the immediate playback frame were encoded using a different codec configuration. If the frame is determined to be an immediate playback frame and its audio sample values were encoded using the current codec configuration, the immediate playback frame may be decoded using the current codec configuration and the additional information may be discarded.

In one example, the decoding method may reuse elements of IPF decoding in MPEG-D USAC. Referring again to the example of FIG. 3, the decoding method may, more precisely, be performed as follows.

• If the payload is present and the uuid has changed relative to the previous frame (decision in block S104)
OR
• If the payload is present and the uuid is not present (decision in block S105)
    1. Read the new stream configuration S106, i.e., AudioSpecificConfig() from audio_preroll_element()
    2. Flush the decoder states and store the result in a buffer S107, e.g., by using a "NULL" access unit
        ▪ Store the result in a buffer (crossfade buffer A)
    3. Reconfigure (re-initialize) the decoder S108
    4. Decode the n_preroll_frames pre-roll frames in audio_preroll_element() S109 and discard the output
    5. Decode the next audio element (e.g., SCE/CPE/LFE) in the bitstream and store the result in a buffer (crossfade buffer B)
        ▪ If signaled by flags, apply a crossfade S110 between crossfade buffer A and the decoder output, and write the result to the PCM output buffer. The crossfade result constitutes the composition unit for this frame
        ▪ Otherwise, write the decoder output directly to the PCM buffer
    6. Continue with the next frame
• ELSE
    1. Skip audio_preroll_element() and decode the frame S111
    2. Write the result to the PCM buffer S112

The decoding method may further include the following:

• If no payload is present, decode the corresponding frame and continue with the next frame
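The branching in the procedure above can be summarized as a small decision function. This is a sketch of the decision logic only, under the assumption that the decoder keeps the uuid of the previous frame (or None at start-up); the names `Action` and `choose_action` are hypothetical and the actual decoding steps S106 to S112 are not modeled.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    DECODE_ONLY = "no payload: decode the frame and continue"
    SKIP_PREROLL = "payload present, uuid unchanged: skip pre-roll, decode frame"
    RECONFIGURE = "read config, flush, re-initialize, decode pre-roll, decode frame"


def choose_action(
    payload_present: bool,
    uuid: Optional[bytes],
    previous_uuid: Optional[bytes],
) -> Action:
    """Pick the processing path for one frame.

    uuid is None when the uuid field is absent from audio_preroll_element();
    previous_uuid is None in the initial state (e.g., right after start-up).
    """
    if not payload_present:
        return Action.DECODE_ONLY
    if uuid is None:
        return Action.RECONFIGURE  # decision S105: payload present, no uuid
    if uuid != previous_uuid:
        return Action.RECONFIGURE  # decision S104: uuid changed
    return Action.SKIP_PREROLL
```

In the variant where the uuid is absent and the decoder instead compares the AudioSpecificConfig with its current configuration, the second branch would be replaced by that comparison.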

In one example, this processing may allow switching between AudioObjectTypes (AOTs), as defined in ISO/IEC 14496-3, while continuously producing correct output samples and without introducing periods of silence into the audio output. In one example, the AOTs that are switched between may include AOT 2 (AAC), AOT 5 (SBR), AOT 29 (PS), and other compatible types.
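The crossfade in step 5 of the procedure is what avoids an audible gap at the switch. The text does not prescribe a particular window, so the following sketch assumes a plain linear crossfade over one frame of floating-point PCM samples; `linear_crossfade` is a hypothetical helper, with `fade_out` playing the role of crossfade buffer A (flushed output of the old configuration) and `fade_in` the role of the new decoder output.

```python
from typing import List, Sequence


def linear_crossfade(fade_out: Sequence[float], fade_in: Sequence[float]) -> List[float]:
    """Blend two equally long sample blocks with linearly changing weights.

    The weight of fade_in rises from 0.5/n to (n - 0.5)/n, so the two weights
    always sum to one and a signal present in both buffers passes unchanged.
    """
    if len(fade_out) != len(fade_in):
        raise ValueError("both buffers must hold the same number of samples")
    n = len(fade_out)
    mixed = []
    for i, (old, new) in enumerate(zip(fade_out, fade_in)):
        w = (i + 0.5) / n  # weight of the new configuration's output
        mixed.append((1.0 - w) * old + w * new)
    return mixed
```

A logarithmic or equal-power curve could be substituted by changing only the weight computation.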

Referring now to the example of FIG. 4, one aspect of the present invention relates to an audio decoder for decoding IPFs. The audio decoder 100 may comprise a determiner 101. The determiner 101 may be configured to determine whether a frame of the encoded audio data bitstream is an immediate playback frame that contains encoded audio sample values associated with the current frame and additional information. The additional information may contain encoded audio sample values of a number of frames preceding the immediate playback frame. The encoded audio sample values of the preceding frames may be encoded using the same codec configuration as the current frame. The number of preceding frames, which correspond to pre-roll frames, may correspond to the number of frames the decoder 100 needs to build up the full signal so that it can output valid audio sample values associated with the current frame whenever an immediate playback frame is decoded. The full signal may, for example, be built up during start-up or restart of the decoder 100. The immediate playback frame may be, for example, the first frame after start-up of the decoder 100.

The audio decoder 100 may also comprise an initializer 102. The initializer 102 may be configured to initialize the decoder 100 if the determiner 101 determines that the frame is an immediate playback frame. Initializing the decoder 100 may include decoding the encoded audio sample values contained in the additional information before decoding the encoded audio sample values associated with the current frame. The initializer 102 may further be configured to switch the audio decoder 100 from the current codec configuration to a different codec configuration if the determiner 101 determines that the frame is an immediate playback frame and the audio sample values of the current frame were encoded using a different codec configuration. The decoder 100 may also be configured to decode the current frame using the current codec configuration and to discard the additional information if the determiner 101 determines that the frame is an immediate playback frame and the audio sample values of the current frame were encoded using the current codec configuration.

Referring now to the example of FIG. 5, one aspect of the present invention relates to an audio encoder for generating an encoded audio data bitstream with immediate playback frames (random access points), wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame containing associated encoded audio sample values.

The audio encoder 200 may comprise a core encoder 202 configured to encode uncompressed audio sample values associated with the plurality of frames using a predetermined codec configuration. Using a predetermined codec configuration may, for example, include using a predetermined sampling rate. In one example, the core encoder 202 may encode the uncompressed audio samples so that the result is compliant with decoding according to the MPEG-4 Audio standard.

The audio encoder 200 may further comprise a buffer 203 configured to store the encoded audio sample values of a number of frames, preceding a current frame of the plurality of frames, that were encoded using the predetermined codec configuration (as indicated by the dashed lines).

When encoding, for example, frame N, the respective previous frames N-1, N-2, … may always be buffered/stored. When an instruction is received to write an IPF into frame N (e.g., every 2 seconds to enable dynamic switching), the stored previous frames N-1, N-2, … may be retrieved and packed into the current frame N.
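The buffering described here amounts to keeping a short history of already encoded AUs. A minimal sketch follows, assuming a fixed pre-roll depth and an IPF interval expressed in frames; the class name `PrerollHistory`, both constructor parameters, and the oldest-first ordering of the returned pre-roll frames are illustrative assumptions, not taken from the text.

```python
from collections import deque
from typing import List, Optional, Tuple


class PrerollHistory:
    """Keeps the last n encoded AUs and marks every ipf_interval-th frame as an IPF."""

    def __init__(self, n_preroll_frames: int, ipf_interval: int) -> None:
        if n_preroll_frames < 0 or ipf_interval < 1:
            raise ValueError("invalid pre-roll depth or IPF interval")
        self._history = deque(maxlen=n_preroll_frames)  # frames N-1, N-2, ...
        self._interval = ipf_interval
        self._index = 0

    def push(self, encoded_au: bytes) -> Tuple[bytes, Optional[List[bytes]]]:
        """Return (current AU, pre-roll AUs oldest first) or (current AU, None)."""
        make_ipf = self._index % self._interval == 0
        preroll = list(self._history) if make_ipf else None
        self._history.append(encoded_au)  # current frame becomes history
        self._index += 1
        return encoded_au, preroll
```

At 48 kHz with 1024-sample frames, an interval of roughly 94 frames would correspond to the 2-second example mentioned above. The very first frame is marked as an IPF with an empty pre-roll list because no earlier frames exist yet.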

The audio encoder 200 may also comprise an embedder 204 configured to write an immediate playback frame into a current frame of the plurality of frames, wherein the immediate playback frame may contain the encoded audio sample values associated with said current frame and additional information corresponding to the encoded audio sample values of the number of frames preceding said current frame.

Although in the example of FIG. 5 the embedder 204 is shown as part of the audio encoder 200, it should be noted that, alternatively or additionally, the embedder 204 may also be implemented separately, either to write immediate playback frames into any current frame of an encoded audio data bitstream or to convert immediate playback frames in an encoded audio data bitstream into "normal" frames by removing the additional information from them. In this case, the embedder 204 may be, but need not be, part of the encoder chain.

In one embodiment, the embedder 204 may further be configured to include information on the predetermined codec configuration in the additional information. In this case, the additional information may provide the decoder with information on the predetermined codec configuration.

In one embodiment, the embedder 204 may further be configured to include the additional information in the immediate playback frame. The additional information may thus be conveyed to the decoder in the bitstream.

In one embodiment, the generated encoded audio data bitstream may be an MPEG-4 Audio bitstream.

In one embodiment, the embedder 204 may further be configured to insert the additional information into the bitstream (e.g., for transmission) via an MPEG-4 Audio bitstream extension mechanism, which may be either a data stream element (ID_DSE) or an extension_payload element.

In one embodiment, the embedder 204 may further be configured to place either the data stream element (ID_DSE) or the extension_payload element at a predetermined position in the MPEG-4 Audio bitstream and/or to assign it a specific tag signaling that the payload of the data stream element (ID_DSE) or extension_payload element is the additional information. The predetermined position may correspond to the first position in the MPEG-4 Audio bitstream, i.e., always first within a frame, because the element may carry the decoder configuration that may be needed to decode frame n (FIG. 1, raw_data_block()[n]). The decoder may thus assume that, if the first element in a frame is an ID_DSE or an extension_payload element (which may reside inside an ID_FIL element), that element carries pre-roll data (preceding frames, pre-roll frames).

In one embodiment, the embedder 204 may be further configured to include a unique identifier in the additional information. Optionally, the unique identifier may signal the predetermined codec configuration. The decoder may then use the predetermined codec configuration to decode frame n, as described above. Based on the unique identifier, the decoder may be able to identify the additional information in the bitstream and parse the bitstream accordingly.

In one embodiment, the audio encoder 200 may be further configured not to time-differentially encode or entropy encode the earliest frame of the number of frames contained in the additional information relative to any frame preceding that earliest frame, and the audio encoder 200 may be further configured not to time-differentially encode or entropy encode the immediate playback frame relative to any frame preceding the earliest frame of the number of frames preceding the immediate playback frame, or relative to any frame preceding the immediate playback frame.

Referring now to the example of FIG. 6, one aspect of the present invention relates to a method for generating, by an audio encoder, an encoded audio data bitstream with immediate playback frames (random access points), wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame comprising associated encoded audio sample values.

The method may include step S201 of encoding, by a core encoder, uncompressed audio sample values associated with the plurality of frames using a predetermined codec configuration. Using the predetermined codec configuration may, for example, include using a predetermined sampling rate. The method may further include step S202 of storing, by a buffer, the encoded audio sample values of a number of frames preceding a current frame of the plurality of frames encoded using the predetermined codec configuration.

The method may further include step S203 of writing, by an embedder, an immediate playback frame into the current frame of the plurality of frames, wherein the immediate playback frame comprises the encoded audio sample values associated with said current frame and additional information corresponding to the encoded audio sample values of the number of frames preceding said current frame.
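Steps S201 to S203 can be sketched as a small Python generator. This is an illustrative sketch only: the `Frame` class, the `encode` callback standing in for the core encoder, and the parameters `num_preload` and `ipf_interval` are assumptions, and the requirement that certain frames be independently decodable is not modeled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Iterable, Iterator, List


@dataclass
class Frame:
    coded: bytes                                        # coded samples of this frame
    preload: List[bytes] = field(default_factory=list)  # non-empty only for an IPF


def generate_stream(
    pcm_frames: Iterable[bytes],
    encode: Callable[[bytes], bytes],
    num_preload: int,
    ipf_interval: int,
) -> Iterator[Frame]:
    """Yield coded frames, turning every ipf_interval-th frame into an IPF."""
    history: Deque[bytes] = deque(maxlen=num_preload)
    for n, pcm in enumerate(pcm_frames):
        coded = encode(pcm)                  # S201: core encoding
        frame = Frame(coded)
        if n % ipf_interval == 0 and len(history) == num_preload:
            frame.preload = list(history)    # S203: embed the preceding coded frames
        history.append(coded)                # S202: buffer the coded frame
        yield frame
```

With `num_preload=2` and `ipf_interval=2`, frame 2 carries coded frames 0 and 1, frame 4 carries coded frames 2 and 3, and so on; frame 0 stays a normal frame in this sketch because no preceding frames exist yet.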

In one embodiment, the additional information may further comprise information about the predetermined codec configuration. The predetermined codec configuration may be used by a decoder in the decoding method, as described in detail above.

In one embodiment, the immediate playback frame may further comprise the additional information. The additional information may thus be transmitted in the bitstream.

In one embodiment, the additional information may be inserted into the bitstream (e.g., for transmission) by the embedder via an MPEG-4 Audio bitstream extension mechanism, which is either a data stream element (ID_DSE) or an extension_payload element. The extension_payload element may, for example, occur at different places in the MPEG-4 Audio bitstream syntax. In one embodiment, the extension_payload element may be inserted (e.g., for transmission) by the embedder inside a fill element (ID_FIL).

As described in detail above, in one embodiment either the data stream element (ID_DSE) or the extension_payload element may be placed by the embedder at a predetermined position in the MPEG-4 Audio bitstream and/or may be assigned a specific tag signaling that the payload of the data stream element (ID_DSE) or of the extension_payload element is the additional information. The predetermined position may always be the first position in the frame, because the element may carry the decoder configuration that may be required to decode the current frame.

In one embodiment, the additional information may further comprise a unique identifier. Optionally, the unique identifier may signal the predetermined codec configuration.

In one embodiment, the earliest frame of the number of frames contained in the additional information may not be time-differentially encoded or entropy encoded by the audio encoder relative to any frame preceding that earliest frame, and the immediate playback frame may not be time-differentially encoded or entropy encoded by the audio encoder relative to any frame preceding the earliest frame of the number of frames preceding the immediate playback frame, or relative to any frame preceding the immediate playback frame.

Referring now to the example of FIG. 7, one aspect of the present invention relates to a system comprising two or more audio encoders for generating a plurality of encoded audio data bitstreams, each having immediate playback frames (time-synchronized random access points), wherein each encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, and wherein each frame comprises associated encoded audio sample values. Although the number of audio encoders in the system is not limited, the example of FIG. 7 shows a system comprising two audio encoders. The system may run two audio encoders in parallel on the same content, inserting immediate playback frames with the same cadence but with different configurations, for example with respect to bit rate. Each encoding chain may output its own bitstream. Both bitstreams created in this way may be stored, for example, on a web server. A client may start playing stream A (e.g., at a higher bit rate). At some point in time, the client may decide to switch to stream B with a lower bit rate and may thus request the same content at a different bit rate. When the first segment of stream B arrives at the decoder, that segment may always begin with an IPF (this may be signaled, for example, via a manifest file according to MPEG-DASH), which allows the decoder to output correct audio from the very start.
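How a decoder might react to an IPF when the client switches from stream A to stream B can be sketched as follows, loosely following the decoder behavior recited in the claims of this document. The classes `Frame` and `SketchDecoder` are hypothetical, codec configurations are reduced to plain strings, and "decoding" is reduced to logging, so this is a toy model rather than a real decoder.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Frame:
    coded: str                                        # coded samples of this frame
    preload: List[str] = field(default_factory=list)  # preceding coded frames (IPF only)
    config: Optional[str] = None                      # codec configuration (IPF only)


class SketchDecoder:
    def __init__(self, config: str) -> None:
        self.config = config
        # Each log entry: (configuration used, coded frame, output produced?)
        self.log: List[Tuple[str, str, bool]] = []

    def _decode(self, coded: str, output: bool) -> None:
        self.log.append((self.config, coded, output))

    def process(self, frame: Frame) -> None:
        is_ipf = bool(frame.preload)
        if is_ipf and frame.config is not None and frame.config != self.config:
            # Different configuration: switch, then decode the preloaded
            # frames first so that the decoder state is built up.
            self.config = frame.config
            for coded in frame.preload:
                self._decode(coded, output=False)
        # Same configuration (or a normal frame): the preload is simply ignored.
        self._decode(frame.coded, output=True)
```

In this toy model an IPF received without a configuration change costs nothing extra, because its additional information is discarded, while an IPF with a new configuration triggers the switch and the preload decoding.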

Still referring to the example of FIG. 7, two audio encoders are shown in parallel, each comprising a core encoder 202 (core encoder #1, core encoder #2), a buffer (not shown), and an embedder 204 (IPF insertion).

In one embodiment, the predetermined sampling rate may be the same for each of the core encoders 202. Otherwise, resampling and additional delay handling may be required on the decoder side. The core encoders 202 may, however, be configured to operate at different frame rates (e.g., AAC-LC 1024; HE-AAC 2048). In addition, the core encoder configurations may require different numbers of preloaded frames p. Both frame n-p and frame n may be required to be independently decodable, i.e., they may not rely on information from previous frames (for HE-AAC, they may contain an SBR header). After core encoding, the IPF decoding times may be synchronized across the different streams.
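One way to obtain the same IPF cadence for core encoders with different frame sizes is to pick an IPF period, in samples, that is a common multiple of all frame sizes. The following Python sketch illustrates this arithmetic; the function name `ipf_intervals`, the minimum period of 96000 samples and the reading of 1024 and 2048 as samples per frame are assumptions made for illustration, not values taken from the source.

```python
import math
from functools import reduce
from typing import List, Sequence, Tuple


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a // math.gcd(a, b) * b


def ipf_intervals(frame_sizes: Sequence[int], min_period_samples: int) -> Tuple[int, List[int]]:
    """Return (IPF period in samples, IPF interval in frames per stream).

    The period is the smallest common multiple of all frame sizes that is
    not shorter than min_period_samples, so that IPFs of all streams start
    at the same sample positions.
    """
    common = reduce(lcm, frame_sizes)
    period = -(-min_period_samples // common) * common  # round up to a multiple
    return period, [period // size for size in frame_sizes]


period, intervals = ipf_intervals([1024, 2048], 96000)
print(period, intervals)  # 96256 [94, 47]
```

Under these assumptions, the stream with 1024-sample frames would carry an IPF every 94 frames and the stream with 2048-sample frames every 47 frames, both corresponding to 96256 samples.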

In one embodiment, the system may further comprise a delay alignment unit 201 (delay #1, delay #2) for aligning the delays of the plurality of bitstreams. For seamless switching at the decoder, the decoding times of the immediate playback frames (IPFs) may need to be synchronized. The delay alignment stage may delay the input PCM samples (uncompressed audio sample values, input audio sample values) to compensate for different encoder/decoder delays.
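The delay alignment stage can be sketched in a few lines of Python. The sketch assumes that the total encoder/decoder delay of each chain is known in samples; the delay values used in the example and the function names `alignment_delays` and `delay_pcm` are hypothetical.

```python
from typing import List, Sequence


def alignment_delays(codec_delays: Sequence[int]) -> List[int]:
    """Extra input delay (in samples) per chain so that all chains
    end up with the same overall delay as the slowest chain."""
    longest = max(codec_delays)
    return [longest - d for d in codec_delays]


def delay_pcm(samples: Sequence[int], delay: int) -> List[int]:
    """Delay a PCM signal by prepending `delay` zero-valued samples."""
    return [0] * delay + list(samples)


# Hypothetical total delays of chain #1 and chain #2, in samples.
extra = alignment_delays([1600, 2500])
print(extra)  # [900, 0]
```

Here the input of chain #1 would be delayed by 900 samples before core encoding, while chain #2 is left untouched, so that frames decoded from either stream refer to the same time instants.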

Referring now to the example of FIG. 8, one aspect of the present invention relates to an apparatus for generating immediate playback frames (random access points) in an encoded audio data bitstream or for removing immediate playback frames (random access points) from an encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame comprising associated encoded audio sample values.

The apparatus 300 may comprise a receiver 301 configured to receive the encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame comprising associated encoded audio sample values.

The apparatus 300 may further comprise an embedder 302 configured to write an immediate playback frame into a current frame of the plurality of frames, wherein the immediate playback frame comprises the encoded audio sample values associated with said current frame and additional information corresponding to the encoded audio sample values of a number of frames preceding said current frame. When generating IPFs, the embedder 302 may operate according to the principles described above in connection with FIGS. 5–7. This operation may be said to correspond to converting a "normal" (non-IPF) frame into an IPF.

The IPF insertion may perform the actual copying and packing. By separating core encoding from IPF insertion, as in the apparatus described above, bitstreams may be stored without IPFs, and IPFs may be inserted only before distribution, if required. If configured for seamless switching, each IPF may contain the ASC stream configuration.

In one embodiment, the apparatus 300 may further comprise a buffer configured to store the encoded audio sample values of the number of frames preceding the current frame of the plurality of frames.

In one embodiment, the embedder 302 may be further configured to remove from an immediate playback frame the additional information corresponding to the encoded audio sample values of the number of frames preceding said current frame. This may be said to correspond to converting an IPF into a "normal" frame. For example, an audio encoder may generate only IPFs, which may then be removed by the embedder depending on the constraints that the respective channel/service may have.
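The two conversions performed by the embedder 302, "normal" frame to IPF and IPF back to "normal" frame, can be illustrated with a minimal Python sketch. The immutable `Frame` class and the helper functions `to_ipf` and `to_normal` are hypothetical, and the packing of the additional information into an extension element is not shown.

```python
from dataclasses import dataclass, replace
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Frame:
    coded: bytes                     # coded samples of this frame
    preload: Tuple[bytes, ...] = ()  # additional information; empty for a normal frame


def to_ipf(frame: Frame, preceding: Sequence[bytes]) -> Frame:
    """Attach copies of the preceding coded frames, yielding an IPF."""
    return replace(frame, preload=tuple(preceding))


def to_normal(frame: Frame) -> Frame:
    """Remove the additional information, yielding a normal frame."""
    return replace(frame, preload=())
```

Because the coded samples of the current frame are never touched in either direction, both conversions can be applied to a stored bitstream without re-encoding in this model.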

Referring now to the example of FIG. 9, aspects of the present invention may include a non-transitory digital storage medium storing a computer program for performing the methods described herein when said computer program is executed by a computer or processor. FIG. 9 shows, for illustrative purposes, a device 400 having a processor 401 that can execute said computer program. Alternatively, the device 400 may represent the corresponding computer.

The methods and systems described herein may be implemented as software, firmware, and/or hardware. Some components may be implemented, for example, as software running on a digital signal processor or on a microprocessor. Other components may be implemented, for example, as hardware and/or as application-specific integrated circuits. The signals encountered in the described methods and systems may be stored on media such as random access memory or optical storage media. They may be transmitted over networks such as radio networks, satellite networks, wireless networks, or wired networks, e.g., the Internet. Typical devices making use of the methods, apparatus, and systems described herein are portable electronic devices or other consumer equipment used to store and/or render audio signals.

It should be noted that the description and drawings/figures merely illustrate the principles of the proposed methods, systems, and apparatuses. Those skilled in the art will be able to implement various arrangements that, although not explicitly described or shown herein, embody the principles of the present invention and are included within its spirit and scope. Furthermore, all examples and embodiments set forth herein are expressly intended primarily for explanatory purposes, to aid the reader in understanding the principles of the proposed method. In addition, all statements herein reciting principles, aspects, and embodiments of the present invention, as well as specific examples thereof, are intended to encompass their equivalents.

Claims

1. An audio decoder (100) for decoding an encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame comprising associated encoded audio sample values, the audio decoder (100) comprising:

a determination unit (101) configured to determine whether a frame of the encoded audio data bitstream is an immediate playback frame (1) that comprises encoded audio sample values associated with a current frame and additional information, wherein the encoded audio data bitstream is an MPEG-4 Audio bitstream, wherein the additional information is conveyed via an MPEG-4 Audio bitstream extension mechanism, which is an extension payload element of a new extension payload element type EXT_AUDIO_PRE_ROLL, wherein the extension payload element is located at the first position in the MPEG-4 Audio bitstream, and wherein the extension payload element is located inside a fill element (ID_FIL),

wherein the additional information comprises encoded audio sample values of a number of frames preceding the immediate playback frame (1), and the encoded audio sample values of the preceding frames can be encoded using the same codec configuration as the current frame,

wherein the number of preceding frames, corresponding to preloaded frames (2, 3), corresponds to the number of frames that the decoder needs in order to generate a complete signal so as to be able to output the actual audio sample values associated with the current frame each time an immediate playback frame (1) is decoded; and

an initialization unit (102) configured to initialize the decoder if the determination unit (101) determines that the frame is an immediate playback frame (1),

wherein initializing the decoder includes decoding the encoded audio sample values contained in the additional information before decoding the encoded audio sample values associated with the current frame,

wherein the initialization unit (102) is configured to switch the audio decoder (100) from a current codec configuration to a different codec configuration if the determination unit (101) determines that the frame is an immediate playback frame (1) and if the audio sample values of the current frame were encoded using the different codec configuration, and

wherein the decoder (100) is configured to decode the current frame using the current codec configuration and to discard the additional information if the determination unit (101) determines that the frame is an immediate playback frame (1) and if the audio sample values of the current frame were encoded using the current codec configuration.

2. The audio decoder (100) according to claim 1, characterized in that the additional information further comprises information about the codec configuration used to encode the audio sample values associated with the current frame, and the determination unit (101) is further configured to determine whether the codec configuration of the additional information differs from the current codec configuration, and/or

wherein the immediate playback frame (1) comprises the additional information as an extension payload, and the determination unit (101) is configured to evaluate the extension payload of the immediate playback frame (1).

3. The audio decoder (100) according to claim 1 or 2, characterized in that the extension payload element has a specific tag signaling that the payload of the extension payload element is the additional information, and/or

wherein the additional information further comprises a unique identifier, and optionally the unique identifier is used to detect the different codec configuration.

4. The audio decoder (100) according to any one of claims 1-3, characterized in that it further comprises a cross-fading unit configured to cross-fade output sample values obtained by resetting the decoder (100) in the previous codec configuration and output sample values obtained by decoding the encoded audio sample values associated with the current frame, and/or

wherein the earliest frame of the number of frames contained in the additional information is not time-differentially encoded or entropy encoded relative to any frame preceding that earliest frame, and wherein the immediate playback frame (1) is not time-differentially encoded or entropy encoded relative to any frame preceding the earliest frame of the number of frames preceding the immediate playback frame (1), or relative to any frame preceding the immediate playback frame (1).

5. A method for decoding an encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and comprises a plurality of frames, each frame comprising associated encoded audio sample values, the method comprising:

determining (S102) whether a frame of the encoded audio data bitstream is an immediate playback frame that comprises encoded audio sample values associated with a current frame and additional information, wherein the encoded audio data bitstream is an MPEG-4 Audio bitstream, wherein the additional information is conveyed via an MPEG-4 Audio bitstream extension mechanism, which is an extension payload element of a new extension payload element type EXT_AUDIO_PRE_ROLL, wherein the extension payload element is located at the first position in the MPEG-4 Audio bitstream, and wherein the extension payload element is located inside a fill element (ID_FIL),

wherein the additional information comprises encoded audio sample values of a number of frames preceding the immediate playback frame,

wherein the encoded audio sample values of the preceding frames are encoded using the same codec configuration as the immediate playback frame,

wherein the number of preceding frames, corresponding to preloaded frames, corresponds to the number of frames that the decoder needs in order to generate a complete signal so as to be able to output the actual audio sample values associated with the current frame each time an immediate playback frame is decoded;

initializing (S103) a decoder if the frame is determined to be an immediate playback frame, wherein the initializing comprises decoding the encoded audio sample values contained in the additional information before decoding the encoded audio sample values associated with the current frame;

switching the audio decoder from a current codec configuration to a different codec configuration if the frame is determined to be an immediate playback frame and if the audio sample values of the immediate playback frame were encoded using the different codec configuration; and

decoding the immediate playback frame using the current codec configuration and discarding the additional information if the frame is determined to be an immediate playback frame and if the audio sample values of the immediate playback frame were encoded using the current codec configuration.

6. The method according to claim 5, characterized in that the encoded audio data bitstream contains a first number of frames encoded using a first codec configuration and a second number of frames following the first number of frames and encoded using a second codec configuration, wherein the first frame of the second number of frames is an immediate playback frame.

7. An audio encoder (200) for generating an encoded audio data bitstream with immediate playback frames (1), wherein the encoded audio data bitstream represents a sequence of audio sample values and contains a plurality of frames, each frame containing associated encoded audio sample values, wherein the audio encoder (200) comprises:

a base encoder (202) configured to encode uncompressed audio sample values associated with the plurality of frames using a predetermined codec configuration;

a buffer (203) configured to store encoded audio sample values of a number of frames preceding the current frame from the plurality of frames encoded using the predetermined codec configuration; and

an embedder (204) configured to write an immediate playback frame (1) into the current frame from the plurality of frames, wherein the immediate playback frame (1) contains encoded audio sample values associated with said current frame and additional information corresponding to the encoded audio sample values of the number of frames preceding said current frame, wherein the generated encoded audio data bitstream is an MPEG-4 Audio bitstream, and the embedder (204) is further configured to insert the additional information into the bitstream via an MPEG-4 Audio bitstream extension mechanism, which is an extension payload element of a new extension payload element type EXT_AUDIO_PRE_ROLL, to place the extension payload element in the first position in the MPEG-4 Audio bitstream, and to insert the extension payload element inside a fill element (ID_FIL).

8. The audio encoder (200) according to claim 7, characterized in that the embedder (204) is further configured to include information about the predetermined codec configuration in the additional information, and/or

wherein the embedder (204) is further configured to include the additional information in the immediate playback frame (1).

9. The audio encoder (200) according to claim 7 or 8, characterized in that the embedder (204) is further configured to assign a specific tag indicating that the payload of the extension payload element is the additional information, and/or

wherein the embedder (204) is further configured to include a unique identifier in the additional information, the unique identifier optionally signaling the predetermined codec configuration.

10. The audio encoder (200) according to any one of claims 7-9, characterized in that the audio encoder (200) is further configured not to apply time-differential coding or entropy coding to the earliest frame of the number of frames contained in the additional information relative to any frame preceding that earliest frame, and wherein the audio encoder (200) is further configured not to apply time-differential coding or entropy coding to the immediate playback frame (1) relative to any frame preceding the earliest frame of the number of frames preceding the immediate playback frame (1) or relative to any frame preceding the immediate playback frame (1).

11. A system comprising two or more audio encoders (200) according to any one of claims 7-10 for generating a plurality of encoded audio data bitstreams each having immediate playback frames (1), each encoded audio data bitstream representing a sequence of audio sample values and comprising a plurality of frames, each frame containing associated encoded audio sample values.

12. The system according to claim 11, characterized in that the predetermined sampling rate is the same for each of the base encoders (202) of two or more audio encoders (200), and/or

wherein the system further comprises a delay equalization block (201) for equalizing the delay of the plurality of bit streams.

13. A method for generating, by an audio encoder, an encoded audio data bitstream with immediate playback frames, the encoded audio data bitstream representing a sequence of audio sample values and comprising a plurality of frames, each frame containing associated encoded audio sample values, the method comprising the following steps:

encoding (S201), by a base encoder, uncompressed audio sample values associated with the plurality of frames using a predetermined codec configuration;

storing (S202), by a buffer, encoded audio sample values of a number of frames preceding the current frame from the plurality of frames encoded using the predetermined codec configuration; and

writing (S203), by an embedder, an immediate playback frame into the current frame from the plurality of frames, wherein the immediate playback frame contains encoded audio sample values associated with said current frame and additional information corresponding to the encoded audio sample values of the number of frames preceding said current frame, wherein the generated encoded audio data bitstream is an MPEG-4 Audio bitstream, wherein the additional information is inserted into the bitstream by the embedder via an MPEG-4 Audio bitstream extension mechanism, which is an extension payload element of a new extension payload element type EXT_AUDIO_PRE_ROLL, wherein the extension payload element is placed by the embedder in the first position in the MPEG-4 Audio bitstream, and wherein the extension payload element is inserted by the embedder inside a fill element (ID_FIL).

14. A device (300) for generating immediate playback frames (1) in an encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and contains a plurality of frames, each frame containing associated encoded audio sample values, wherein the device (300) comprises:

a receiver (301) configured to receive an MPEG-4 Audio encoded audio data bitstream, the encoded audio data bitstream representing a sequence of audio sample values and comprising a plurality of frames, each frame containing associated encoded audio sample values;

a buffer configured to store encoded audio sample values of a number of frames preceding the current frame from the plurality of frames; and

an embedder (302) configured to write an immediate playback frame (1) into the current frame from the plurality of frames, wherein the immediate playback frame (1) contains encoded audio sample values associated with said current frame and additional information corresponding to the encoded audio sample values of the number of frames preceding said current frame, wherein the embedder (302) is further configured to insert the additional information into the bitstream via an MPEG-4 Audio bitstream extension mechanism, which is an extension payload element of a new extension payload element type EXT_AUDIO_PRE_ROLL, to place the extension payload element in the first position in the MPEG-4 Audio bitstream, and to insert the extension payload element inside a fill element (ID_FIL).

15. A device (300) for removing immediate playback frames (1) from an encoded audio data bitstream, wherein the encoded audio data bitstream represents a sequence of audio sample values and contains a plurality of frames, each frame containing associated encoded audio sample values, wherein the device (300) comprises:

a receiver (301) configured to receive an encoded audio data bitstream, the encoded audio data bitstream representing a sequence of audio sample values and comprising a plurality of frames, each frame containing associated encoded audio sample values; and

an embedder (302) configured to convert an immediate playback frame (1) into a normal frame by removing from the immediate playback frame (1) the additional information corresponding to the encoded audio sample values of a number of frames preceding the current frame, from the plurality of frames, into which the immediate playback frame (1) was written.
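
For illustration only, and not as part of the claims, the frame handling recited in claims 5, 13 and 15 can be sketched in Python as follows. This is a minimal toy model under stated assumptions: frames are opaque byte strings, no real MPEG-4 Audio parsing or decoding is performed, and all names (`Frame`, `PreRoll`, `embed_ipf`, `strip_ipf`, `ToyDecoder`, `config_id`) are hypothetical stand-ins. The `PreRoll` object plays the role of the additional information carried in the EXT_AUDIO_PRE_ROLL extension payload, and `config_id` assumes that the codec configuration is signaled by an identifier in that information, as claims 8 and 9 allow.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PreRoll:
    """Hypothetical model of the additional information (EXT_AUDIO_PRE_ROLL)."""
    config_id: int              # assumed identifier of the codec configuration
    frames: Tuple[bytes, ...]   # encoded payloads of the preceding frames


@dataclass(frozen=True)
class Frame:
    payload: bytes                       # encoded samples of the current frame
    pre_roll: Optional[PreRoll] = None   # present only in an immediate playback frame


def embed_ipf(frame: Frame, history: List[Frame], config_id: int, n: int) -> Frame:
    """Write an IPF into the current frame from the last n buffered frames."""
    start = max(0, len(history) - n)     # avoids the history[-0:] pitfall when n == 0
    payloads = tuple(f.payload for f in history[start:])
    return Frame(frame.payload, PreRoll(config_id, payloads))


def strip_ipf(frame: Frame) -> Frame:
    """Convert an IPF back into a normal frame by removing the additional information."""
    return Frame(frame.payload)


class ToyDecoder:
    """Records what a decoder would do instead of producing audio."""

    def __init__(self, config_id: int) -> None:
        self.config_id = config_id
        self.log: List[tuple] = []

    def decode(self, frame: Frame) -> None:
        pre_roll = frame.pre_roll
        if pre_roll is not None and pre_roll.config_id != self.config_id:
            # IPF encoded with a different configuration: switch, then
            # initialize by decoding the pre-roll frames before the current one.
            self.config_id = pre_roll.config_id
            self.log.append(("switch", pre_roll.config_id))
            for payload in pre_roll.frames:
                self.log.append(("warmup", payload))
        # Normal frame, or IPF in the current configuration: any additional
        # information is simply ignored (discarded) here.
        self.log.append(("out", frame.payload))


frames = [Frame(b"a"), Frame(b"b"), Frame(b"c")]
ipf = embed_ipf(frames[2], frames[:2], config_id=2, n=2)
decoder = ToyDecoder(config_id=1)
decoder.decode(ipf)   # switches configuration and warms up on b"a", b"b"
decoder.decode(ipf)   # same configuration now, so the pre-roll is discarded
```

In this sketch the warm-up entries are logged but never emitted as output, reflecting that the pre-roll frames serve only to bring the decoder into a state from which the current frame can be output correctly.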